A novel method for estimating properties of attentional oscillators reveals an age-related decline in flexibility

Ece Kaya; Sonja A Kotz; Molly J Henry

doi:10.7554/eLife.90735

eLife. 2024 Jun 21;12:RP90735. doi: 10.7554/eLife.90735

Editors: Peter Kok, Barbara G Shinn-Cunningham

PMCID: PMC11192533 PMID: 38904659

Abstract

Dynamic attending theory proposes that the ability to track temporal cues in the auditory environment is governed by entrainment, the synchronization between internal oscillations and regularities in external auditory signals. Here, we focused on two key properties of internal oscillators: their preferred rate, the default rate in the absence of any input; and their flexibility, how they adapt to changes in rhythmic context. We developed methods to estimate oscillator properties (Experiment 1) and compared the estimates across tasks and individuals (Experiment 2). Preferred rates, estimated as the stimulus rates with peak performance, showed a harmonic relationship across measurements and were correlated with individuals’ spontaneous motor tempo. Estimates from motor tasks were slower than those from the perceptual task, and the degree of slowing was consistent for each individual. Task performance decreased with trial-to-trial changes in stimulus rate, and responses on individual trials were biased toward the preceding trial’s stimulus properties. Flexibility, quantified as an individual’s ability to adapt to faster-than-previous rates, decreased with age. These findings show domain-specific rate preferences for the assumed oscillatory system underlying rhythm perception and production, and that this system loses its ability to flexibly adapt to changes in the external rhythmic context during aging.

Research organism: Human

Introduction

Auditory tasks such as understanding speech and listening to music rely on our ability to allocate and adjust attention to rhythmic cues in complex auditory signals. However, listeners’ attention to rhythmic cues can fail when the signal is temporally disorganized (Zalta et al., 2020), or with advancing age (Schneider et al., 2005). These failures of attention might result in reduced speech comprehension (Schneider et al., 2005) as well as in diminished ability to solve the ‘cocktail party problem’ (Zion Golumbic et al., 2013). However, speech perception (Poeppel and Assaneo, 2020) and production of musical sequences are improved when stimuli are presented at specific rates (Zamm et al., 2018; Scheurich et al., 2018), indicating that these abilities might be ‘restored’ in certain conditions. Here, we aimed to understand factors that facilitate and impede auditory rhythm processing from two different perspectives: the factors that arise from stimulus properties in the external world and those that stem from individual differences (the perceiver). Specifically, we tested how stimulus rate and the rhythmic context in which a stimulus is presented affect rhythm perception and production, and how temporal adaptation abilities change with advancing age. We found (1) a range of rates specific for each individual that yielded best performance and (2) deteriorating performance when switching between stimulus rates that was further amplified by age.

Two main theoretical approaches explain how we perceive time and rhythm. A timekeeper account proposes that the duration between two events is represented by the count of accumulated pulses that are generated by an internal pacemaker (Scheurich et al., 2018). An entrainment account, dynamic attending theory (DAT), proposes that biological systems consist of internal oscillations, i.e., rhythms, that adjust their phase and period to the temporal regularities of an external signal (Jones, 2018; Jones, 1976; Jones and Boltz, 1989). Synchronization between internal and external rhythms, termed entrainment, is the underlying mechanism for time and rhythm perception. Predictions of DAT have been confirmed in a number of studies that reported rhythmic facilitation effects, where a rhythmic cue improves perceptual timing of subsequent targets, with the highest accuracy for targets aligning with the entraining attentional oscillator’s peaks (Large and Jones, 1999; Barnes and Jones, 2000; Jones et al., 2002; McAuley and Jones, 2003; Martin et al., 2005; Herrmann et al., 2016; Jones et al., 2017; Cheng and Creel, 2020).

The current study did not test whether timing abilities are governed by entrainment or timekeeper mechanisms. We rather adopt an entrainment approach as well as common assumptions of entrainment models (Jones, 2008) that derive from the general properties of limit-cycle oscillators:

Assumption 1: Oscillators are self-sustaining; they persist even when no stimulus is present. They induce a series of periodic expectations at the peaks of the oscillations.
Assumption 2: Oscillators are adaptive; they respond to timing perturbations (e.g. changes in stimulus rate) by correcting their phase and period.
Assumption 3: Each oscillator has an intrinsic period (Drake et al., 2000) at which it oscillates in the absence of any input (see Assumption 1) and is most stable against perturbations.
Assumption 4: Oscillators can respond to stimulus rates with integer-ratio relationships (i.e. in nested hierarchies).

Two key properties of internal oscillators that were the focus of the current study are their preferred rate and their flexibility. Preferred rate, also termed natural frequency or eigenfrequency in different literatures, refers to the intrinsic period of the oscillator (Assumption 3), or group of nested oscillators (Jones, 2008), in the absence of any input (Assumption 1). Oscillators accomplish synchronization to periodicities in the external signal better when the signal’s rate is similar to the oscillator’s preferred rate (or harmonics of the preferred rate; McAuley and Jones, 2003) than when it is dissimilar (Notbohm et al., 2016). The range of rates around the oscillator’s preferred rate for synchronization is referred to as the entrainment region (McAuley et al., 2006). Theoretically, knowing the preferred rate of an individual’s internal oscillator would allow predicting the rates at which they would most successfully interact in a real-world listening situation.

One common method to estimate the preferred rate is the spontaneous tapping task, where participants are asked to tap their finger (McAuley et al., 2006; Collyer et al., 1994; Schwartze and Kotz, 2015) or a drumstick (Drake et al., 2000), on a desk or a sensor at a ‘comfortable rate’. The preferred rate estimate, spontaneous motor tempo (SMT), measured as the mean or median of the intervals between the individual taps, tends to cluster around 500–600 ms in adults (McAuley et al., 2006). One potential shortcoming of using SMT as a direct measure of an internal oscillator’s preferred rate is that SMT reflects a ‘preference’ for producing periodic movements in the absence of any interaction with the environment. Although this is indeed the definition of preferred rate, a stronger test of the degree to which SMT reflects the preferred rate of an internal oscillator would be to observe successful synchronization within – but not outside of – an entrainment region. SMT does predict timing preference and performance in other tasks: participants tend to prefer stimulus rates (i.e. preferred perceptual tempo [PPT]; McAuley et al., 2006) closer to their SMT (McAuley et al., 2006), drift back to their SMT during continuation tapping in synchronization-continuation paradigms (Zamm et al., 2018), and over- and underproduce stimuli that are faster and slower than their SMT, respectively (Zamm et al., 2018; Scheurich et al., 2018). However, in paradigms that involve comparison of individuals’ rate preferences (McAuley et al., 2006) and tapping performance (Zamm et al., 2018; Scheurich et al., 2020) across stimulus rates, stimulus conditions are tailored to individuals’ SMT and are low in number. This results in a resolution that is too poor to observe an entrainment region, and often confounds SMT with the global mean stimulus rate in an experiment (Kliger Amrani and Zion Golumbic, 2020a). 
We have previously proposed a synchronization-continuation paradigm where individuals’ tapping behavior on a finely sampled, broad range of stimulus rates was assessed. We estimated preferred rate as the stimulus rate with minimum tapping errors during continuation tapping (Kaya and Henry, 2022). However, estimating preferred rates based on a tapping paradigm cannot disentangle preferred rates of an auditory oscillator, a motor oscillator, or a coupled oscillatory system whose preferred rate would be influenced by the preferences and coupling strengths of its components (Schneider et al., 2005). Thus, here we applied the fine rate sampling to a perceptual paradigm (Experiment 1), estimated preferred rates in perceptual and motor versions of the paradigm with the same stimulus rate conditions (Experiment 2), and compared the estimates to individuals’ SMT and PPT (Experiment 2).

Based on Assumption 2, we defined flexibility as the internal oscillator’s ability to adapt to rate changes in the external sound signal (Kaya and Henry, 2022). The logic is as follows: upon encountering a new rate, the oscillator gradually updates its phase and period to each upcoming interval. From a dynamical systems perspective, flexibility can be conceptualized as a complement to ‘stiffness’, and might be quantified based on the presence of hysteresis, which refers to a system’s tendency to stay in a previous state despite changes in stimulus parameters (Kelso, 1995). An inflexible oscillator would exhibit hysteresis and continue to respond in a way that reflects the properties of previously entrained stimuli. A fully flexible oscillator would not exhibit hysteresis as it would completely update its phase and period to the new stimulus, resulting in no discrepancy between the current stimulus and its internal representation. Thus, the extent to which timing performance would be affected by the stimulus history is inversely related to the underlying oscillator’s flexibility.

Prior research reveals effects of preceding context, also referred to as serial dependence (Kim and Alais, 2021; Motala et al., 2020) and carryover effects (Wiener et al., 2014), on timing behavior in tasks with and without a motor synchronization component. Within individual trials of synchronized tapping paradigms, changes in stimulus rate (period perturbation) and stimulus onset times (phase perturbation) result in increased asynchronies between stimulus and tap onsets. This effect is more pronounced for phase than period perturbations (Large et al., 2002; Loehr et al., 2011), and for sequences that speed up than those that slow down (Scheurich et al., 2020; Loehr et al., 2011). Across trials, the tapping rate in each trial is biased toward the previous trial’s stimulus rate (Kaya and Henry, 2022; Motala et al., 2020). Temporal judgments in the absence of motor synchronization are also affected by the stimulus properties presented in a preceding trial (Wiener et al., 2014; Jones and McAuley, 2005; Wiener and Thompson, 2015) and throughout the experiment (Jones and McAuley, 2005; McAuley and Miller, 2007), suggesting effects of local and global temporal contexts on duration perception. The majority of studies that revealed individual differences in proneness to history effects (Kim and Alais, 2021; Arzounian et al., 2017) have not aimed to explicitly estimate the extent and source of these individual differences, or have done so in shorter temporal contexts, using different operational definitions of flexibility than the one used here (Scheurich et al., 2018). Finally, similar to methods proposed to estimate preferred rate (Zamm et al., 2018; Scheurich et al., 2018; McAuley et al., 2006; Kaya and Henry, 2022; McPherson et al., 2018), previous attempts to measure flexibility (Scheurich et al., 2018; Scheurich et al., 2020; Kaya and Henry, 2022) involved only motor responses.
Thus, we presented the same stimulus history to participants in two tasks, one with and one without the motor demands of synchronize-continue tapping. This design allowed assessing the effects of the same predictor (trial-to-trial rate change) on performance in different tasks, and thereby performing systematic comparisons of oscillator flexibility across perceptual and motor domains.

From the perceiver’s side, we chose to focus on how properties of internal oscillators change with advancing age. Studies assessing age-related changes in timing abilities show that older, as compared to younger individuals, produce slower tapping rates when asked to tap at a comfortable rate (McAuley et al., 2006; Baudouin et al., 2004) and at the fastest rate (Turgeon et al., 2011) they can maintain, show worse performance in temporal-order judgments (Szymaszek et al., 2009), gap detection (Fitzgibbons and Gordon-Salant, 1995) and discrimination and reproduction of time intervals (Incao et al., 2022), and tend to prefer slower stimulus rates (McAuley et al., 2006), which manifests in a breakdown in understanding fast speech. From an entrainment perspective, these findings suggest that internal oscillators of older individuals have slower preferred rates, reduced flexibility, or both. While the current study did not incorporate neural measures, it is worth noting that literature on neural entrainment can offer insights into the dynamics of attention. This is particularly relevant as these physical measures often align with the predictions of DAT (see Haegens and Zion Golumbic, 2018; Henry and Herrmann, 2014 for reviews). Neural entrainment to external auditory signals is aberrant (Goossens et al., 2016; Herrmann et al., 2019; Purcell et al., 2004), and less responsive to top-down attention in older than younger adults (Henry et al., 2017). Moreover, older adults exhibit reduced neural adaptation (Herrmann et al., 2023) and sensory gating (Brinkmann et al., 2021), suggesting an age-related decline in neural inhibition (Herrmann et al., 2023) that leads to a reduced capacity of the auditory system to adapt based on context. 
Based on the behavioral findings converging on reduced temporal abilities and evidence for impaired neural entrainment in older individuals, we hypothesized that older adults would exhibit stronger hysteresis than younger adults, which should result in smaller estimates of oscillator flexibility.

The aim of the current study was to estimate individuals’ preferred rate and flexibility in rhythmic tasks with and without a motor synchronization component, and in both preference and performance contexts: here, preference refers to SMT and PPT, whereas performance refers to tasks that require listeners to either synchronize with or make a perceptual judgment about rhythmic stimuli. Moreover, we aimed to assess how internal oscillator properties, specifically oscillator flexibility, change with advancing age.

We conducted two experiments. The main goal of Experiment 1 was to develop methods to estimate preferred rate and flexibility in a paradigm without a motor synchronization component, as a complement to our recent tapping study (Kaya and Henry, 2022). The task was a duration discrimination paradigm where participants compared the duration of a single comparison interval to the duration of intervals making up a standard stimulus. We assessed the effect of stimulus history on responses by comparing performance across two sessions with the same finely sampled pool of stimulus rates, one where we maximized and the other where we minimized the amount of rate change across trials. Experiment 2 involved shorter versions of the duration discrimination (Experiment 1) and paced tapping (Kaya and Henry, 2022) tasks with matched stimulus rates and histories, unpaced tapping tasks including SMT, and two tasks where individuals’ rate preferences (PPT) were measured.

In line with the preferred period hypothesis (McAuley et al., 2006), if SMT captures the preferred rate of common mechanisms underlying rhythm perception and production, we should see better performance around an individual’s SMT, as has previously been observed for motor tasks (Zamm et al., 2018; Scheurich et al., 2018; McAuley et al., 2006; Kliger Amrani and Zion Golumbic, 2020b). However, we did not necessarily expect a one-to-one correspondence between preferred rate estimates across tasks with and without a motor component, as individual differences in motor contributions to synchronization abilities are well documented (Assaneo et al., 2021).

We hypothesized that larger trial-to-trial changes in stimulus rate would lead to poorer performance due to hysteresis, in that both tapping and duration discrimination responses should reflect the properties of the preceding stimuli. Thus, we expected that larger changes between consecutive trials’ stimulus rates should decrease discrimination accuracy and increase tapping errors. We expected that the strength of these effects – the degree of inflexibility – should increase with age.

Experiment 1

Methods

Participants

Participants (N=31) were recruited from the participant pool of Max Planck Institute for Empirical Aesthetics laboratories in Frankfurt, Germany. Written informed consent was obtained from all participants. The procedure was approved by the Ethics Council of the Max Planck Society (approval number 2019_04) and the Research Ethics Board at Toronto Metropolitan University in accordance with the Declaration of Helsinki. Out of 31 (age: M=33, SD = 11) individuals who were recruited for the study, 27 participants (age: M=33, SD = 12) completed both sessions. Upon completion of each session, participants received 7 euros for every 30 min of their participation (21 euros per session on average). Two participants volunteered to complete the study without compensation. Prior to the experimental sessions, participants completed an online survey. All participants self-reported normal hearing and proficiency in English.

Procedure

The study consisted of an online background survey that participants completed at home, and then two experimental sessions. During the in-lab experimental sessions, participants completed two types of tasks: a series of unpaced tapping tasks, consisting of SMT and a ‘forced’ motor tempo (FMT) task, which was used to assess the range of free tapping rates within the participants’ motor abilities; and the main task, duration discrimination, where participants judged whether a comparison interval was ‘shorter’ or ‘longer’ than the intervals making up a standard sequence. Details of all tasks are provided below. Sessions were separated by 4–19 days. A single session started with the SMT and FMT tasks. Participants then set the sound volume to a level that they found comfortable for completing the task. Then, participants were presented with instructions on a computer screen that explained the main task with text and figures. A practice block, simulating the duration discrimination task, followed the instructions (details below). All instructions were in English. Once participants indicated that they understood the task, the main task blocks were initiated. Finally, unpaced tapping tasks were repeated in the same order. Participants were debriefed upon their request, only after the second session. An individual session lasted 90 min on average.

Duration discrimination task

The main task was a duration discrimination paradigm, where participants judged whether a comparison interval was longer or shorter than the intervals making up an isochronous standard sequence, by pressing either the L (longer) or S (shorter) key on a computer keyboard. The task procedure is illustrated in Figure 1. In each experimental session, 400 unique trials of this task were presented, each consisting of a combination of the three main independent variables: the inter-onset interval, IOI; amount of deviation of the comparison interval from the standard, DEV, and the amount of change in stimulus IOI between consecutive trials, ΔIOI. We explain each of these variables in detail in the next paragraphs.

Figure 1. — Each trial consisted of an isochronous standard sequence of five sounds (four intervals), followed by silence and another pair of sounds. The comparison duration was either shorter or longer than the standard intervals and took on one of ten values (DEV) that were proportional to the inter-onset interval (IOI) between sounds making up the standard sequence. The task was to press the S or L key to indicate whether the comparison interval was shorter or longer than the standard IOI. Over the course of 400 unique trials of a single session, IOI ranged from 200 ms to 998 ms. In random-order sessions, change in stimulus rate between a given trial n and immediately preceding trial n–1 (ΔIOI) was maximized, and the distribution of ΔIOI ranged from –778 ms to +770 ms. In linear-order sessions, IOI increased in each trial in the first 200 trials and decreased in the other half of the trials (or vice versa, counterbalanced across participants) in steps of 4 ms.

Stimuli were made up of 50 ms woodblock sounds; first, an isochronous standard sequence and then a comparison interval, separated by a silent gap. The interval between the five woodblock sounds making up the ‘standard’ isochronous stimulus sequence is referred to as IOI. Each trial’s IOI was drawn (without replacement) from a pool of all possible stimulus rates, linearly spaced from 200 ms to 998 ms in 2 ms steps. The silent interval between the last stimulus onset of the standard sequence and the first stimulus onset of the comparison pair was six times the standard IOI.

The comparison interval on each trial was longer or shorter than the standard IOI. DEV refers to the magnitude of the comparison interval’s deviation from the standard IOI. DEV took on one of ten levels, which were proportional to IOI: ±2%, ±7%, ±11%, ±16%, and ±20%. Each DEV level was presented 40 times in each session. Since IOI was unique on each trial, IOI and DEV were not fully crossed factors. Instead, the IOI dimension was divided into 40 bins, each consisting of 10 consecutive IOIs. The 10 DEV levels were randomly assigned to the 10 IOI values in each bin. The correspondence between IOI and DEV pairs was unique for each participant.
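The binning scheme described above can be sketched as follows. This is a minimal illustration, not the authors' stimulus code: the function name `assign_dev`, the dictionary layout, and the per-participant seed are hypothetical, and only the IOI pool, the ten DEV levels, and the bin size come from the text.

```python
import random

# IOI pool as described in the text: 200 to 998 ms in 2 ms steps (400 values).
iois = list(range(200, 1000, 2))

# Ten DEV levels as signed proportions of IOI (negative = shorter comparison).
dev_levels = [sign * d for d in (0.02, 0.07, 0.11, 0.16, 0.20) for sign in (-1, 1)]

def assign_dev(iois, dev_levels, rng):
    """Assign every DEV level exactly once within each bin of consecutive IOIs."""
    n = len(dev_levels)
    assignment = {}
    for start in range(0, len(iois), n):
        shuffled = dev_levels[:]
        rng.shuffle(shuffled)  # random IOI-DEV pairing within the bin
        for ioi, dev in zip(iois[start:start + n], shuffled):
            assignment[ioi] = dev
    return assignment

rng = random.Random(1)  # hypothetical seed standing in for one participant
dev_by_ioi = assign_dev(iois, dev_levels, rng)

# Comparison interval in ms for each standard IOI.
comparison_ms = {ioi: ioi * (1 + dev) for ioi, dev in dev_by_ioi.items()}
```

With 40 bins of 10 IOIs, each DEV level occurs 40 times per session, matching the counts given in the text.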

While the mean (M=599 ms), standard deviation (SD = 231 ms), and range (200 ms, 998 ms) of the presented stimulus IOIs were identical between the sessions, the way IOI changed from trial to trial was different. Change in IOI between consecutive trials was referred to as ΔIOI. In one session, the ‘linear-order’ session, ΔIOI was always ±4 ms. In one half of the session, ΔIOI was fixed at +4 ms. That is, IOI was 200 ms in the first trial, 204 ms in the second, and so on. In the other half of the session, ΔIOI was fixed at –4 ms. On the first trial, IOI was 998 ms, 994 ms in the second, and so on. The starting point, either 200 ms or 998 ms (in fast-start and slow-start conditions, respectively), was counterbalanced across participants.
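As an illustration of the linear-order trajectory, the sketch below builds the IOI sequence for a fast-start or slow-start session and derives ΔIOI. The function name is hypothetical. The text does not describe the single transition between the two halves; in this sketch it follows directly from concatenating the two ramps and is 2 ms rather than 4 ms.

```python
import statistics

def linear_order_iois(fast_start=True):
    """IOIs (ms) for one linear-order session: one ramp in +4 ms steps, one in -4 ms steps."""
    ascending = list(range(200, 997, 4))    # 200, 204, ..., 996
    descending = list(range(998, 201, -4))  # 998, 994, ..., 202
    return ascending + descending if fast_start else descending + ascending

iois = linear_order_iois(fast_start=True)

# Change in IOI between each trial and the one before it.
delta_ioi = [current - previous for previous, current in zip(iois, iois[1:])]

mean_ioi = statistics.mean(iois)  # mean of the presented IOIs
sd_ioi = statistics.pstdev(iois)  # population SD of the presented IOIs
```

Together the two ramps cover every value in the 2 ms pool exactly once, which is why the mean and SD of the presented IOIs are the same as in the random-order session.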

In the other session, the ‘random-order’ session, ΔIOI was maximized, and the direction of the change (i.e. whether a trial was faster or slower than the previous) alternated on every trial. That is, if the stimulus IOI on one trial was faster than the previous (–ΔIOI), it would be slower (+ΔIOI) in the following trial, and vice versa. Note that stimulus IOI was stable within the standard sequence, and only changed between trials. Session order, i.e., whether a participant experienced the linear-order or random-order session first, was counterbalanced across participants. An example trajectory of stimulus IOI within random-order and linear-order sessions across trials is illustrated in Figure 1.

In each session, participants completed 407 trials, presented in 8 blocks with 50 trials in the first block, and 51 trials in the remaining 7 blocks. Except for the first block, the first trial of each block repeated the IOI that was presented as the last trial of the preceding block and was discarded from further analyses; this enabled preservation of the between-trial histories across blocks between which participants were allowed to take short breaks. Before the main task, participants were instructed about the task, and practiced the task for at least 6 trials. Instructions included two example trials with IOI of 500 ms, one with DEV of +0.3 and another with DEV of –0.3, illustrating ‘comparison longer’ and ‘comparison shorter’ conditions, respectively. DEV was fixed at +0.2 in half of the practice trials and at –0.2 in the other half. Two practice trials each were presented at fast, medium, and slow IOIs; randomly selected from ranges of [300–500 ms], [501–700 ms], and [701–900 ms], respectively. If participants failed on more than 3 of the first 6 practice trials, they completed another round of 6 practice trials. Both example and practice trials were randomly ordered within their respective blocks in each session.
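The block structure can be made concrete with a short sketch. The function name and the `(ioi, analyzed)` tuple layout are hypothetical; the block sizes and the repeated first trial follow the description above.

```python
def split_into_blocks(trial_iois, n_blocks=8):
    """Split the unique trials into blocks; blocks 2-8 open with a repeat of the
    preceding block's last IOI, flagged so it can be dropped from analysis."""
    per_block = len(trial_iois) // n_blocks  # 50 unique trials per block
    blocks = []
    for b in range(n_blocks):
        unique = [(ioi, True) for ioi in trial_iois[b * per_block:(b + 1) * per_block]]
        if b == 0:
            blocks.append(unique)
        else:
            repeat = (trial_iois[b * per_block - 1], False)  # not analyzed
            blocks.append([repeat] + unique)
    return blocks

blocks = split_into_blocks(list(range(200, 1000, 2)))
block_sizes = [len(block) for block in blocks]
```

Repeating the last IOI at the start of a block means the first analyzed trial of that block is preceded by the same stimulus rate it would have followed without a break.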

The dependent variables were accuracy and bias. Accuracy coded whether a response on a trial was correct or not (1=correct, 0=incorrect). Bias, on the other hand, could take on one of three values per trial: if the response was correct, bias was 0. If the comparison interval in a trial was longer than the standard, and the participant’s response was ‘shorter’, bias in that trial was –1. Similarly, if the participant’s response was ‘longer’ in a trial where the comparison interval was shorter, bias was +1.
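The coding of the two dependent variables can be written compactly. The function name and the string labels for responses are hypothetical; the coding rule itself is the one stated above.

```python
def score_trial(dev, response):
    """Return (accuracy, bias) for one trial.

    dev: signed deviation of the comparison from the standard IOI (never zero).
    response: 'shorter' or 'longer'.
    """
    correct = "longer" if dev > 0 else "shorter"
    if response == correct:
        return 1, 0
    # Incorrect: +1 if the interval was judged longer, -1 if judged shorter.
    return 0, (1 if response == "longer" else -1)
```

For example, `score_trial(0.07, "shorter")` yields accuracy 0 and bias -1, and `score_trial(-0.20, "longer")` yields accuracy 0 and bias +1.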

Unpaced tapping tasks

Unpaced tapping tasks consisted of a single SMT trial and two FMT trials, one each to estimate the ‘slowest’ and ‘fastest’ rates at which participants could maintain steady tapping. The unpaced tasks were repeated in the same order before and after completion of the duration discrimination task in both sessions. In the SMT task, participants were instructed to ‘tap on the desk at a rate that is comfortable to maintain’. In the FMT tasks, the instruction was ‘tap at the slowest rate that is comfortable to maintain’ (FMT-slowest) and to ‘tap at the fastest rate that is comfortable to maintain’ (FMT-fastest). Participants tapped for 30 s in the SMT task and FMT-fastest task, and 45 s in the FMT-slowest task. For all unpaced tapping tasks, the dependent measures were tapping rate (median of the produced intervals) and coefficient of variation (CV).
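A minimal sketch of the two dependent measures follows. The text does not define CV; this sketch assumes the conventional definition (standard deviation of the inter-tap intervals divided by their mean), and the function name is hypothetical.

```python
import statistics

def tapping_measures(tap_times):
    """Return (tapping rate, CV) from a list of tap onset times.

    Tapping rate is the median inter-tap interval, in the units of tap_times.
    CV is assumed here to be SD / mean of the inter-tap intervals.
    """
    itis = [later - earlier for earlier, later in zip(tap_times, tap_times[1:])]
    rate = statistics.median(itis)
    cv = statistics.stdev(itis) / statistics.mean(itis)
    return rate, cv
```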

Apparatus

Stimuli were generated and presented on a Windows desktop computer, using the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997) for MATLAB. Auditory stimuli were presented via Beyerdynamics 880 Pro headphones. The audio signal was presented and recorded by an RME Fireface UC soundcard. All instructions were presented on an ASUS VG24QE LCD screen. Keypress responses for the duration discrimination task were collected on a USB keyboard. Tapping responses for the unpaced tapping tasks were recorded via a Schaller Oyster S/P contact microphone at a sampling rate of 44,100 Hz. The contact microphone was attached on the right half of the desk by default. Prior to the sessions, participants were asked to specify if they would like the microphone to be moved to the left half of the desk. None of the participants requested a relocation of the microphone.

Background survey

Prior to the first experimental session, participants completed an online survey. The survey consisted of two parts: the first part included questions about participants’ demographics, language skills, hearing abilities, and psychological disorders. The second part was ‘The Goldsmiths Musical Sophistication Index’, ‘Gold-MSI’ (Müllensiefen et al., 2014). The survey language was English by default, with an option to change the language to German. One question in the Gold-MSI was removed from the analyses due to contrasting Likert coding between the different languages in which the survey was completed.

Analysis

Data cleaning and exclusion criteria

The raw format of the tapping data was audio, since tapping responses were collected by a microphone. Individual taps were extracted from the audio files after visual inspection of the soundwave of each trial to set the noise floor for the recording on that trial. All peaks that exceeded the noise floor were retained. Inter-tap intervals (ITIs) were calculated as the difference between neighboring taps’ timestamps. We developed an automated procedure that detects and removes single-trial ITI outliers while accounting for drift that may have occurred within tapping trials. The script first marked the ITIs whose deviation from the median ITI exceeded 3× the median absolute deviation (MAD) of all ITIs in the respective trial. Then, it fitted a linear regression to the unmarked ITIs as a function of tap count. Finally, it removed any ITI that was smaller than half or larger than 1.5 times the predicted ITI.
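The three steps of the outlier procedure can be sketched as below. This is one reading of the description, not the authors' script: it uses the unscaled MAD, fits an ordinary least-squares line by hand, and applies the final 0.5x to 1.5x criterion to every ITI in the trial, including those marked in the first step. The function name is hypothetical.

```python
import statistics

def clean_itis(itis):
    """Remove outlying inter-tap intervals while allowing for linear drift."""
    indices = list(range(len(itis)))
    med = statistics.median(itis)
    mad = statistics.median([abs(x - med) for x in itis])

    # Step 1: mark ITIs that deviate from the trial median by more than 3 x MAD.
    unmarked = [i for i in indices if abs(itis[i] - med) <= 3 * mad]

    # Step 2: least-squares line through the unmarked ITIs as a function of tap count.
    x_mean = statistics.mean(unmarked)
    y_mean = statistics.mean([itis[i] for i in unmarked])
    sxx = sum((i - x_mean) ** 2 for i in unmarked)
    sxy = sum((i - x_mean) * (itis[i] - y_mean) for i in unmarked)
    slope = sxy / sxx if sxx else 0.0
    intercept = y_mean - slope * x_mean

    # Step 3: keep only ITIs between 0.5 and 1.5 times the predicted ITI.
    kept = []
    for i in indices:
        predicted = intercept + slope * i
        if 0.5 * predicted <= itis[i] <= 1.5 * predicted:
            kept.append(itis[i])
    return kept
```

In a trial such as `[500, 505, 498, 1010, 502, 240, 507, 503]` (ms), the roughly doubled interval (a missed tap) and the roughly halved interval (a double tap) are removed, while ordinary variability around the fitted line is retained.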

Exclusion criteria for the main task were (1) a decrease in accuracy with increasing absolute DEV, and (2) chance level performance for both deviation directions (trials where comparison interval was shorter, and those where it was longer). To assess the first criterion at the participant level, we fitted separate models to each individual’s single-session data where accuracy was predicted by absolute deviation of the comparison interval for either shorter (|–DEV|) or longer (|+DEV|) comparison conditions. The models were fitted using MATLAB’s fitglm function, with the response variable distribution specified as ‘binomial’, and link function specified as ‘logit’, since the response variable, accuracy, was binary. Next, we compared the slopes (β) obtained from the separate models where either |–DEV| or |+DEV| predicted accuracy against zero, using one-tailed one-sample t-tests. All participants had positive slopes for both directions in both session types, indicating that the probability of correct response increased with |DEV| in all conditions. To test for chance level performance, for each session type, we split all trials into negative and positive DEV conditions and compared each group of trials’ accuracy against a mean of 0.5, using one-sample t-tests. Results showed that none of the participants had chance-level performance for both deviation directions. Finally, before applying group-level statistics such as t-tests and correlations, any data point that fell outside of the interquartile range was excluded from the respective distributions.

Preferred rate estimates

We conceptualized individuals’ preferred rates as the stimulus rates where duration discrimination accuracy was highest. To estimate preferred rate on an individual basis, we smoothed response accuracy across the stimulus rate (IOI) dimension for each session type, using the smoothdata function in MATLAB, which outputs the moving average of the neighboring data points within a specified window size. We used ‘Gaussian’ as the method for smoothing that calculates the Gaussian-weighted moving average over each window. This method gives higher weight to the midpoint of the window, enhancing the fluctuations in the data that were the focus of the current analysis. As we were interested in a single-point maximum accuracy for each individual and session, we optimized the window size for each session type such that the smoothed data revealed a single global maximum. An illustration of the optimization for an example participant’s dataset is shown in Figure 2—figure supplement 1. For small windows, smoothed data included multiple IOI values where accuracy was 1, especially in the linear-order sessions. The optimization procedure revealed that, to obtain a single global maximum for each individual’s dataset, accuracy should be smoothed by windows of 26 samples in the random-order sessions and 48 samples in linear-order sessions (Figure 2—figure supplement 1). To equalize the smoothing across the variables of accuracy and IOI, we also smoothed IOI with the same window size. Estimates of preferred rate were taken as the smoothed IOI that yielded maximum accuracy.
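The estimation logic can be illustrated with a small stand-in for the smoothing step. This sketch is not a reimplementation of MATLAB's smoothdata: the Gaussian width (one fifth of the window) and the truncation of windows at the edges are assumptions of the sketch, and both function names are hypothetical.

```python
import math

def gaussian_smooth(values, window):
    """Gaussian-weighted moving average; windows are truncated at the edges."""
    half = window // 2
    sigma = window / 5.0  # assumed width of the Gaussian, not taken from the paper
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        weights = [math.exp(-0.5 * ((j - i) / sigma) ** 2) for j in range(lo, hi)]
        weighted_sum = sum(w * v for w, v in zip(weights, values[lo:hi]))
        smoothed.append(weighted_sum / sum(weights))
    return smoothed

def preferred_rate(iois, accuracy, window):
    """Smoothed IOI at which smoothed accuracy is maximal (trials sorted by IOI)."""
    order = sorted(range(len(iois)), key=lambda k: iois[k])
    ioi_sorted = [iois[k] for k in order]
    acc_sorted = [accuracy[k] for k in order]
    acc_smooth = gaussian_smooth(acc_sorted, window)
    ioi_smooth = gaussian_smooth(ioi_sorted, window)  # same window as accuracy
    peak = max(range(len(acc_smooth)), key=lambda k: acc_smooth[k])
    return ioi_smooth[peak]
```

If several smoothed values tie for the maximum, this sketch returns the first of them; the window-size optimization described above exists to avoid such ties.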

To compare the preferred rate estimates between session types, we first conducted a paired-samples t-test. Then, we assessed the correspondence between the estimates. However, conventional correlation methods are not able to capture possible harmonic relationships between variables. Thus, we used a permutation test that accounted for the harmonic structure in the data, in addition to the assessment of one-to-one correspondence between the data points. The test first calculates the perpendicular distance of each data point to the closest line among the y=x, y=2*x, and y=x/2 theoretical lines (referred to as residuals here, as in Kaya and Henry, 2022). The sum of these residuals quantifies how much the data points deviate from a total harmonic correspondence. Then, the test shuffles the Y axis values with respect to the X axis values 1000 times and calculates summed residuals for each permutation. The p-value is the proportion of permuted summed residuals that are smaller than the value computed from the original data. To validate the results obtained from this test, we ran an additional analysis using a modular approach. We first calculated how much the slower estimate (larger IOI value) deviates, proportionally, from the faster estimate (smaller IOI value) or its multiples (i.e. harmonics) by normalizing the estimates from both sessions by the faster estimate. The outcome measure was the modulus of the slower estimate with respect to the faster estimate, divided by the faster estimate, described as mod(max(X), min(X))/min(X) where X = [session1_estimate session2_estimate]. For example, if a participant’s preferred rate estimate is 603 ms in one session and 295 ms in the other session, the slower estimate (603 ms) deviates from the nearest lower multiple of the faster estimate (590 ms) by 13 ms, a proportional deviation of about 4% of the faster estimate.
As the resulting distribution of percentage diversion values was non-normal, we used median to summarize the central tendency for percentage diversion of slow from fast preferred rate estimates. Then, we ran a permutation test where linear-order session estimates were shuffled over 1000 iterations, and median percentage diversion values for each iteration (Figure 2—figure supplement 2) were retrieved. This test statistic was significant (p=0.003), indicating that the harmonic relationships we observed in the estimates were not due to chance or dependent on the assessment method.
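The two harmonic analyses described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code used in the study. The function names, the random seed, and the toy estimates are hypothetical. The perpendicular distance of a point (x, y) to the line y = m*x is computed as |m*x - y| / sqrt(m^2 + 1).

```python
import math
import random

SLOPES = (1.0, 2.0, 0.5)   # the y=x, y=2*x, and y=x/2 theoretical lines

def harmonic_residual(x, y):
    """Perpendicular distance of (x, y) to the closest theoretical line."""
    return min(abs(m * x - y) / math.sqrt(m * m + 1.0) for m in SLOPES)

def summed_residuals(xs, ys):
    return sum(harmonic_residual(x, y) for x, y in zip(xs, ys))

def harmonic_permutation_p(xs, ys, n_perm=1000, seed=0):
    """Proportion of shuffled datasets whose summed residual is smaller than the observed one."""
    rng = random.Random(seed)          # fixed seed is an assumption, for reproducibility
    observed = summed_residuals(xs, ys)
    shuffled = list(ys)
    n_smaller = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # shuffle Y values with respect to X values
        if summed_residuals(xs, shuffled) < observed:
            n_smaller += 1
    return n_smaller / n_perm

def proportional_diversion(estimate_a, estimate_b):
    """mod(max(X), min(X)) / min(X) for a pair of preferred rate estimates (ms)."""
    fast, slow = min(estimate_a, estimate_b), max(estimate_a, estimate_b)
    return (slow % fast) / fast

# Worked example from the text: 603 ms and 295 ms give 13 / 295, about 4%.
print(proportional_diversion(603, 295))

# Hypothetical estimates (ms) that lie near the 1:1, 2:1, or 1:2 lines.
random_order = [300, 350, 400, 450, 500, 550, 600, 650]
linear_order = [310, 690, 395, 230, 505, 1090, 610, 330]
print(harmonic_permutation_p(random_order, linear_order))
```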

In addition to estimating preferred rate at stimulus rates with peak performance, we investigated whether accuracy decreased as a function of detuning, namely, the difference between stimulus rate and preferred rate, as predicted by the entrainment models (Jones, 2018; Large, 1994; McAuley, 1995). We tested this prediction by assessing the slopes of mixed-effects logistic regression models, where accuracy was regressed on the IOI condition, separately for stimulus rates that were faster or slower than an individual’s preferred rate estimate. To do so, we first z-scored IOIs that were faster and slower than the participant’s preferred rate estimates separately to render IOI scales comparable across participants. The detuning direction (i.e. whether stimulus IOI was faster or slower than the preferred rate estimate) was coded categorically. Accuracy (binary) was predicted by these variables (z-scored IOI, detuning direction), and their interaction. The model was fitted separately to datasets from random-order and linear-order sessions, using the fitglme function in MATLAB. Fixed effects were z-scored IOI and detuning direction and random effect was their interaction. We expected a systematic increase in performance toward the preferred rate, which would result in a significant interaction between stimulus rate and detuning direction. To decompose the significant interaction and to visualize the effects of detuning, we fitted separate models to each participant’s single-session datasets, and obtained slopes from each direction condition, hereafter denoted as the ‘relative-detuning slope’. We treated relative-detuning slope as an index of the magnitude of relative-detuning effects on accuracy. We then evaluated these models, using the glmval function in MATLAB to obtain predicted accuracy values for each participant and session.
To visualize the relative-detuning curves, we averaged the predicted accuracies across participants within each session, separately for each direction condition (faster or slower than the preferred rate). To obtain a single value of relative-detuning magnitude for each participant, we averaged relative-detuning slopes across direction conditions. However, since slopes from IOI > preferred rate conditions quantified an accuracy decrease as a function of detuning, we sign-flipped these slopes before averaging. The resulting average relative-detuning slopes, obtained from each participant’s single-session datasets, quantified how much the accuracy increase toward preferred rate was dependent on, in other words, sensitive to, relative detuning.
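The separate z-scoring of IOIs on either side of the preferred rate and the sign-flipped averaging of slopes can be sketched as follows. This is a minimal Python illustration with hypothetical function names and toy values. The use of the sample standard deviation is an assumption, and the slopes are taken as given rather than fitted here.

```python
from statistics import mean, stdev

def zscore(values):
    """Z-score using the sample standard deviation (an assumption of this sketch)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def split_and_zscore(iois, preferred_rate):
    """Z-score IOIs faster (smaller) and slower (larger) than the preferred rate separately."""
    faster = [v for v in iois if v < preferred_rate]
    slower = [v for v in iois if v > preferred_rate]
    return zscore(faster), zscore(slower)

def average_detuning_slope(slope_faster, slope_slower):
    """Sign-flip the slope from the IOI > preferred rate condition, then average."""
    return (slope_faster + (-slope_slower)) / 2.0

# Hypothetical participant: preferred rate 550 ms, slopes from the two direction conditions.
z_fast, z_slow = split_and_zscore([300, 400, 500, 600, 700, 800], preferred_rate=550)
print(z_fast, z_slow)
print(average_detuning_slope(0.4, -0.6))
```

After the sign flip, a larger average value indicates a steeper rise in accuracy toward the preferred rate from both sides.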

Flexibility estimates

We hypothesized that larger trial-to-trial changes in stimulus rate would reduce accuracy. To test this hypothesis, we first compared participants’ average accuracy between session types, using a paired-samples t-test. Then, we assessed the effect of absolute rate change (|±ΔIOI|) on accuracy for each individual. To do so, we fitted generalized linear models to each participant’s random-order session data and obtained slopes (β) that quantified the strength of the |±ΔIOI| effect for each participant. The models were fitted using MATLAB’s fitglm function, with the distribution of the response variable specified as ‘binomial’, and link function specified as ‘logit’, since the response variable, accuracy, was binary. We fitted separate models for trials where the stimulus was faster or slower than the previous trial’s stimulus, where the predictor was either |–ΔIOI| or |+ΔIOI|, respectively. The model formula was p(Y=1|X) = e^(α+βX)/(1+e^(α+βX)), where Y is accuracy and X is the amount of rate change in trials that were faster than previous (|–ΔIOI|) or in trials that were slower (|+ΔIOI|). Next, using one-tailed one-sample t-tests, we tested whether models’ β were smaller than zero, which would confirm a decrease in accuracy as a function of |–ΔIOI| or |+ΔIOI|. The β values, which quantified individuals’ ability to adapt to changes in stimulus rate from one trial to the next, served as our single-individual estimate of oscillator flexibility. Finally, to investigate whether responses were affected by the previous trial’s stimulus, we computed participants’ average bias in trials where stimulus was faster than the previous one (|–ΔIOI|), and in trials where it was slower (|+ΔIOI|). We compared the distribution of average bias values against zero, using one-sample t-tests.
Non-zero positive bias indicated that participants incorrectly responded as ‘comparison interval was longer’ in trials where comparison interval was in fact shorter than the standard interval, and non-zero negative bias indicated the opposite. We further tested the relationship between the flexibility estimates (β from models where |–ΔIOI| or |+ΔIOI| predicted accuracy) and average relative-detuning slopes (see Preferred rate estimates) from random-order sessions. We predicted that flexible oscillators (larger β) would be less severely affected by detuning, and thus have smaller detuning slopes. Conversely, inflexible oscillators (smaller β) should have more difficulty in adapting to a large range of stimulus rates, and their adaptive abilities should be constrained around the preferred rate, as indexed by steeper relative-detuning slopes.
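To make the flexibility estimate concrete, the following sketch fits the two-parameter logistic model p(Y=1|X) = 1/(1+e^(-(α+βX))) to one participant's trials. It uses a hand-written Newton-Raphson procedure in Python as a stand-in for MATLAB's fitglm and is not the analysis code used in the study. The function name, the iteration count, and the toy trials are hypothetical.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Maximum-likelihood (alpha, beta) for a binomial model with a logit link."""
    alpha, beta = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            w = p * (1.0 - p)
            g0 += yi - p               # gradient with respect to alpha
            g1 += (yi - p) * xi        # gradient with respect to beta
            h00 += w                   # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break                      # degenerate data; stop updating
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta

# Hypothetical faster-than-previous trials: absolute rate change in units of 100 ms,
# and binary accuracy that tends to drop as the rate change grows.
abs_delta_ioi = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
correct = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
alpha, beta = fit_logistic(abs_delta_ioi, correct)
print(alpha, beta)   # a negative beta indicates accuracy declining with rate change
```

In this scheme, a β closer to zero (less negative) would correspond to a more flexible oscillator, since accuracy is then less affected by trial-to-trial rate change.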

Results

We first assessed whether accuracy increased with increasing DEV. Comparison of the distribution of slopes (β) against zero showed that for both DEV directions, β were greater than zero. Descriptive and inferential statistics are shown in Supplementary file 1a. Next, we compared participants’ average accuracies from ‘comparison shorter’ (|–DEV|) and ‘comparison longer’ (|+DEV|) conditions. Although average accuracy from the latter conditions was higher in both sessions, these differences were nonsignificant.

Preferred rate estimates

We expected that accuracy should depend on IOI differently for each participant, and estimated individuals’ preferred rate as the IOI where smoothed accuracy was maximum. Between-session comparisons showed that estimates did not significantly differ between sessions (p=0.129). When we directly compared preferred rate estimates from the two session types (Figure 2A), we found that for most participants, the estimates were numerically close to each other. Interestingly, for some participants, estimates from one session were close to double or half of those from the other session, suggesting a harmonic relationship between the estimates. We applied a permutation test that accounted for the harmonic structure of the data and found a significant relationship between estimates from two session types (p=0.008, Figure 2A).

Figure 2. (A) Left: Each circle represents a single participant’s preferred rate estimate from the random-order session (x axis) and linear-order session (y axis). The histograms along the top and right of the plot show the distributions of estimates for each session type. The dotted and dashed lines respectively represent 1:2 and 2:1 ratios between the axes, and the solid line represents one-to-one correspondence. Right: Permutation test results. The distribution of summed residuals (distance of data points to the closest of the y=x, y=2*x, and y=x/2 lines) of shuffled data over 1000 iterations, and the summed residual from the original data (dashed line), which fell in the lowest 0.8% of the permutation distribution (p=0.008). (B) Top: Illustration of the preferred rate estimation method from an example participant’s linear-order session dataset. Estimates were the stimulus rates (IOI) where smoothed accuracy (orange line) was maximum (arrow). The dotted lines originating from the IOI axis delineate the stimulus rates that were faster (left, IOI < preferred rate) and slower (right, IOI > preferred rate) than the preferred rate estimate and expand those separate axes, the values of which were z-scored for the relative-detuning analysis. Bottom: Predicted accuracy, calculated from single-participant models where accuracy in random-order (purple) and linear-order (orange) sessions was predicted by z-scored IOIs that were faster than a participant’s preferred rate estimate (left), and by those that were slower (right). Thin lines show predicted accuracy from single-participant models, solid lines show the averages across participants, and the shaded areas represent standard error of the mean. Predicted accuracy is maximal at the preferred rate and decreases as a function of detuning. (C) Average accuracy from random-order (left, purple) and linear-order (right, orange) sessions. Each circle represents a participant’s average accuracy. (D) Flexibility estimates. Each circle represents an individual’s slope (β) obtained from logistic models, fitted separately to conditions where |–ΔIOI| (left, green) or |+ΔIOI| (right, blue) predicted accuracy, with greater values (arrow’s direction) indicating better oscillator flexibility. The means of the distributions of β from both conditions were smaller than zero (dashed line), indicating a negative effect of between-trial absolute rate change on accuracy. (E) Participants’ average bias from |–ΔIOI| (green) and |+ΔIOI| (blue) conditions in random-order (left) and linear-order (right) sessions. Negative bias indicates underestimation of the comparison intervals, positive bias indicates the opposite. Box plots in C–E show median (black vertical line), 25th and 75th percentiles (box edges), and extreme data points (whiskers). In C and E, empty circles show outlier values that remained after data cleaning procedures. (F) Correlations between participants’ average relative-detuning slopes, indexing the steepness of the increase in accuracy toward the preferred rate estimate (from panel B), and flexibility estimates from |–ΔIOI| (top, green) and |+ΔIOI| (bottom, blue) conditions (from panel D). Solid black lines represent the best-fit line, dashed lines represent 95% confidence intervals.

Figure 2—figure supplement 1.

Logistic models assessing a systematic increase in accuracy toward the preferred rate estimate in each session type revealed significant main effects of IOI (linear-order session: β=0.26399, p=4.9546e-09; random-order session: β=0.17506, p=8.1406e-08), and significant interactions between IOI and direction (linear-order session: β=–0.44378, p=4.1998e-13; random-order session: β=–0.36437, p=5.0164e-15), indicating that accuracy increased as fast rates slowed toward the preferred rate (positive slopes) and decreased again as slow rates slowed further past the preferred rate (negative slopes), regardless of the session type. Figure 2B illustrates the preferred rate estimation method for an example participant’s dataset and shows the predicted accuracy values from models fitted to each participant’s single-session datasets. Note that the main effect and interaction were obtained from mixed-effects models that included aggregated datasets from all participants, whereas the slopes quantifying the accuracy increase as a function of detuning (i.e. relative-detuning slopes) were from models fitted to single-participant datasets.

Flexibility estimates

Average accuracy (Figure 2C) was higher in linear-order (M=0.834, SD = 0.039) sessions than in random-order (M=0.695, SD = 0.072) sessions (t(24) = 12.5964, p=4.5497e-12). β from models where |±ΔIOI| predicted accuracy was significantly smaller than zero for both |–ΔIOI| and |+ΔIOI| conditions and we found no significant differences between β from the former and latter conditions, showing that the probability of giving a correct response decreased with the amount of rate change across trials, regardless of whether a stimulus was faster or slower than the previous trial. Descriptive and inferential statistics are provided in Supplementary file 1a. The distributions of β from individual fits are shown in Figure 2D. To investigate the source of the negative relationship between |±ΔIOI| and accuracy, we analyzed how rate change affected bias. In both session types, participants’ average bias from faster-than-previous (|–ΔIOI|) conditions was significantly smaller than zero (random-order session: M=–0.179, SD = 0.144, t(26) = –6.4487, p=3.9085e-07; linear-order session: M=–0.065, SD = 0.078, t(26) = –4.3159, p=0.00010215), and average bias from slower-than-previous (|+ΔIOI|) conditions was significantly greater than zero (random-order session: M=0.195, SD = 0.096, t(26) = 10.5406, p=3.5025e-11; linear-order session: M=0.063, SD = 0.046, t(23) = 6.6472, p=4.4044e-07), as shown in Figure 2E. These results indicate that participants perceived longer comparison intervals as shorter on the trials where stimulus was faster than the previous trial, and vice versa on trials where stimulus was slower.

We tested the relationship between the flexibility estimates and single-participant relative-detuning slopes from random-order sessions (Figure 2B). The results revealed negative correlations between the relative-detuning slopes and flexibility estimates, both with β from models where |–ΔIOI| predicted accuracy (adapting to speeding-up trials; r(23) = –0.52905, p=0.0065428), and with β from models where |+ΔIOI| predicted accuracy (adapting to slowing-down trials; r(23) = –0.57999, p=0.0023735). That is, the performance of individuals with less flexible oscillators suffered more as detuning increased. These results are shown in Figure 2F.

Unpaced tapping

Individuals completed a series of unpaced tapping tasks at the beginning and at the end of each session. Here, we focused on tapping rate from the SMT task. We first compared individuals’ SMT before and after sessions. For both random- and linear-order sessions, SMT measured before and after the session were correlated and not significantly different. Given the consistency of the measure, we averaged participants’ SMT within sessions and compared the mean SMT across session types. We found a strong correlation between tapping rates from the random- and linear-order sessions. Test results of the unpaced tapping analyses are provided in Supplementary file 1b.

Discussion

The results of Experiment 1 showed that discrimination accuracy systematically increased with the difference between standard and comparison intervals (DEV) and decreased with the difference in stimulus rate between consecutive trials (|±ΔIOI|). Accuracy showed a nonlinear relationship with IOI: we observed improved accuracy at an individual-specific range of stimulus rates and, in some cases, at their (sub)harmonics.

For most participants, estimates from random-order sessions were close to the estimates from the linear-order sessions, and for some participants they were close to double or half of them (see Figure 2A). Correspondence between estimates from the two session types shows the reliability of the paradigm and robustness of the methods we developed for the preferred rate estimation, since we were able to obtain similar estimates in repeated measurements, and under conditions with major differences in stimulus history and task difficulty. The current findings support three key predictions of the entrainment account. First, similar estimates of preferred rate under different temporal contexts and repeated measurements as well as a systematic increase in accuracy toward the preferred rate suggest improved timing abilities in situations with smaller detuning between the oscillator’s preferred rate and the stimulus rate (Notbohm et al., 2016). Second, that the estimates from the more challenging random-order session were narrower while preserving the correspondence to those from other conditions indicates that the internal oscillators were able to adaptively (McAuley and Jones, 2003; McAuley, 1995) entrain to the range of rates around their preferred rate, i.e., their entrainment region (McAuley et al., 2006). Finally, the harmonic relationship between the estimates from the two session types suggests the oscillator’s ability to respond to multiple, nested rates, either due to the circular nature of oscillators (McAuley, 1995) or by involvement of multiple nested oscillators in rhythmic entrainment (Jones, 2008).

Two sets of results confirmed the presence of history effects on timing performance. Accuracy was lower in random-order sessions where absolute rate change (|±ΔIOI|) was maximum, than in linear-order sessions where it was minimum. Moreover, accuracy in random-order sessions decreased as rate change increased. The difference in discrimination accuracy between sessions cannot be attributed merely to the effects of the global context, given that the global context was identical across session types. If the duration representations were drawn toward the mean of the rates presented in the session (‘the central tendency effect’, Jazayeri and Shadlen, 2010), accuracy would be similar between the sessions with identical global means. Instead, we observed a drastic decrease in accuracy in the random-order session, which suggests a stronger influence of local than global context in the current paradigm. The analyses of bias confirmed this explanation by showing that internal duration representations on a given trial were biased toward the previous stimulus rate. Interestingly, rate change across trials affected bias even when it was small and fixed.