PLOS Biology
. 2025 Oct 23;23(10):e3003459. doi: 10.1371/journal.pbio.3003459

Human brain integrates both unconditional and conditional timing statistics to guide expectation and behavior

Yiyuan Teresa Huang 1,2,3,*, Zenas C Chao 1,*
Editor: Ruth de Diego Balaguer
PMCID: PMC12561982  PMID: 41129593

Abstract

Our brain uses prior experience to anticipate the timing of upcoming events. This dynamic process can be modeled using a hazard function derived from the probability distribution of event timings. However, the contexts of an event can lead to various probability distributions for the same event, and it remains unclear how the brain integrates these distributions into a coherent temporal prediction. In this study, we create a foreperiod sequence paradigm consisting of a sequence of paired trials, where in each trial, participants respond to a target signal after a specified time interval (i.e., foreperiod) following a warning cue. The prediction of the target onset in the second trial can be based on two probability distributions: the unconditional probability of the second foreperiod and its conditional probability given the foreperiod in the first trial. These probability distributions are then transformed into hazard functions to represent the unconditional and conditional temporal predictions. The behavioral model incorporating both predictions and their mutual modulation provides the best fit for reaction times to the target signal, indicating that both temporal statistics are integrated to make predictions. We further show that electroencephalographic source signals are also best reconstructed when integrating both predictions. Specifically, the unconditional and conditional predictions are encoded separately in the posterior and anterior brain regions, and integration of these two types of predictive processing requires a third region, particularly the right posterior cingulate area. Our study reveals brain networks that integrate multilevel temporal information, offering insight into the hierarchical predictive coding of time.


How does the brain integrate different temporal predictions based on prior experience? This study shows that humans combine unconditional and conditional timing expectations to guide behavior, with distinct brain regions encoding each type and the posterior cingulate cortex integrating them, revealing a hierarchical network for predictive coding of time.

Introduction

Predicting when an event will occur is a fundamental cognitive process that allows for the efficient allocation of attentional resources and optimal motor performance [1–3]. It has also been found that precise temporal predictions optimize the processing of sensory information by enhancing contrast sensitivity [4]. To study temporal prediction, researchers often use a foreperiod task where participants are required to press a button as soon as a target signal appears following a warning signal. The time interval between the warning and the target signals is referred to as the foreperiod. To achieve a fast response, it is crucial to learn the probability distribution of the foreperiod, as participants can use it to anticipate the target onset. In this task, temporal prediction is often modeled by a hazard function (HF), which represents the ongoing updates to predictions as time progresses [5,6]. This function, derived from the probability distribution of the foreperiod, describes how the probability of the target signal occurring is updated over time, given that it has not yet occurred.

For neural correlates of temporal prediction, higher hazard values (i.e., stronger prediction) have been associated with increased activity in the parietal area, as observed in single-neuron recordings and human MRI studies [7,8]. Additionally, increased alpha power in the human electroencephalogram (EEG) has also been linked to higher hazard values [9]. Moreover, as temporal predictions evolve over time, these dynamics can be tracked in the brain by modeling time-resolved EEG signals with a forward encoding model. This modeling approach can effectively distinguish brain signals associated with varying HFs derived from different probability distributions [10]. However, the timing of an event may be influenced by multiple probability distributions, particularly when specific contexts are considered. Take Beethoven’s Symphony No. 5 as an example. It begins with a “short-short-short-long” motif, commonly known as “fate knocking at the door.” To predict the fourth interval, one might expect a higher chance of “short” when considering only the probability distribution of each element, but there could be a higher chance of “long” when considering the conditional probability of the multi-element pattern that characterizes the motif. This raises the question of how the brain integrates various probability distributions for the same event across different levels to generate a coherent temporal prediction.

To understand how the temporal prediction is established by probability distributions of event timings across multiple levels, we introduce a foreperiod sequence paradigm. In this paradigm, two foreperiod trials (FP1 and FP2) are paired in a sequence, leading to two levels of statistical regularities. Predicting FP2 can be based not only on the probability distribution of the single foreperiod, but also on the probability distribution of FP2 conditioned on the preceding foreperiod (i.e., FP1) when the sequence structure is considered. We obtain two HFs from the two probability distributions to represent unconditional and conditional temporal predictions, respectively, and then model reaction times to the target and EEG source signals. It is worth noting that our paradigm aims to investigate multi-level temporal prediction, similar to the music example provided above. However, participants are required to actively respond to each target signal rather than passively listen. While this design may limit the applicability of our findings to real-world scenarios where an active response is not always required, it is essential for testing the integration of multiple statistics at the behavioral level.

Using a model-fitting approach, both behavioral and neural results indicate that the two statistics are learned jointly, rather than independently, for prediction establishment. Furthermore, we find that unconditional and conditional temporal predictions are encoded in distinct brain regions, while additional regions are necessary for processing their integration. Our study offers an experimental platform for exploring multilevel temporal prediction, and identifies key brain regions involved in the hierarchical predictive coding of time.

Results

Foreperiod sequence paradigm to establish multilevel temporal predictions

To control the predictability of an upcoming event based on two distinct probability distributions calculated at different levels, we created a novel foreperiod sequence paradigm. During each trial, participants received a warning signal (a low-pitched tone and a white dot on the screen) and were instructed to press the button as soon as the target signal (a high-pitched tone and a red dot) appeared. The delay between the warning and the target signal is referred to as the foreperiod (Fig 1A). Two consecutive trials are considered a “sequence” when the warning signal from the second trial occurs 1.2 s after the target signal from the first trial, with the interval between two consecutive sequences ranging from 3 to 3.2 s (Fig 1B, top). Within each sequence, the foreperiods for the first and the second trials are denoted as FP1 and FP2, respectively. This sequence design allows the predictability of FP2 to be established with two distinct probability distributions (Fig 1B, bottom). The first is an unconditional probability distribution, derived directly from the frequency of the single foreperiod (either FP1 or FP2), regardless of the sequence structure. The second is a conditional probability distribution, incorporating sequence structure by calculating the conditional probability of FP2 given FP1, the preceding trial.

Fig 1. Task design.

Fig 1

(A) A single foreperiod trial consisted of a white dot and a low-pitched tone as a warning signal, and a red dot and a high-pitched tone as a target signal. The foreperiod is the interval between the warning and the target, ranging from 0.4 to 2 s. Participants were instructed to press a button when the target appeared. (B) A sequence consisted of two foreperiod trials (denoted as FP1 and FP2) separated by a short blank, while a longer blank separated two sequences. There are two probability calculations. Regardless of the sequence structure, an unconditional probability can be computed based on single foreperiod occurrences (either FP1 or FP2) (highlighted with the black rectangle). A conditional probability can be computed based on FP2 occurrences (the purple rectangle) given the duration of FP1 (the light purple rectangle). (C) To vary the probabilities, four blocks were created, each containing 50 foreperiod sequences. Foreperiods between 0.4 and 1.1 s and between 1.3 and 2 s are denoted as S and L, respectively. There are four types of foreperiod sequences with different trial configurations for each of the four blocks. (D) Left column: The unconditional probability distributions of FP1 or FP2 are the same. Middle column: Two conditional probability distributions of FP2 were calculated. One was given FP1 < 1.2 s (in blue), and the other was given FP1 > 1.2 s (in red). The number on each plot (e.g., 50–50) represents the proportion of cumulative probabilities in the ranges of foreperiod > 1.2 s and foreperiod < 1.2 s. Right column: The unconditional and conditional probability distributions of each foreperiod were computed. The color indicates the probability value shown on the color bar. (E) The hazard functions (HFs) were computed based on the respective probability distributions in panel D. HFU and HFC were computed based on the unconditional and conditional probability distributions, respectively.
The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

Four sequence blocks were created, each with 50 sequences (100 trials) and a unique configuration of unconditional and conditional probability distributions (Fig 1C). The block design allows us to disentangle the effects of unconditional and conditional probability distributions. For example, in Block 1, the foreperiods were set in the range of 0.4–2 s, with a step of 0.1 s and the exclusion of 1.2 s. To simplify the explanation below, we categorized the foreperiods as short (between 0.4 and 1.1 s, denoted as S) or long (between 1.3 and 2 s, denoted as L). Two sequence types were used: LL (long FP1 and long FP2) and SS (short FP1 and short FP2), each presented 25 times. For the unconditional probability, there were 25 trials of S and 25 trials of L, resulting in an unconditional probability distribution of 50% for S and 50% for L (denoted as 50–50, Fig 1D, left column). For the conditional probability, long FP2 always followed long FP1, and thus the conditional probability distribution for FP2 given long FP1 was a skewed unimodal distribution of almost 100% L (denoted as 0–100, Fig 1D, middle column). Similarly, short FP2 always followed short FP1, and thus the conditional probability distribution for FP2 given short FP1 was a skewed unimodal distribution of almost 100% S (denoted as 100–0). The other three blocks consisted of different numbers of LL and SS sequences, with additional LS and SL sequences. While the unconditional probability was always controlled at an even 50–50 ratio, as in Block 1, the conditional probability varied. It is important to note that the conditional probability of FP2 is computed based on each FP1 duration, as shown in Fig 1D (right column), rather than based on the short or long labels. Therefore, our paradigm can accommodate any probability distribution and is not restricted to a bimodal structure.
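The two statistics for Block 1 can be illustrated with a short sketch using the S/L labels. Note that this is a simplification for illustration only: as stated above, the paper computes the conditional probability per FP1 duration, not per label.

```python
from collections import Counter

# Block 1 of the paradigm: 25 LL and 25 SS sequences, as (FP1, FP2) pairs.
seqs = [("L", "L")] * 25 + [("S", "S")] * 25

# Unconditional probability: pool all single-foreperiod occurrences,
# regardless of position in the sequence.
pooled = Counter(fp for pair in seqs for fp in pair)
total = sum(pooled.values())
p_uncond = {fp: c / total for fp, c in pooled.items()}   # 50-50

# Conditional probability of FP2 given FP1.
p_cond = {}
for label in ("S", "L"):
    fp2s = [fp2 for fp1, fp2 in seqs if fp1 == label]
    p_cond[label] = {k: v / len(fp2s) for k, v in Counter(fp2s).items()}
# In Block 1, FP2 is fully determined by FP1 (0-100 and 100-0),
# while the pooled distribution stays at 50-50.
```

The same two computations applied to the LS/SL mixtures of the other blocks reproduce the varied conditional distributions while keeping the unconditional distribution fixed.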

Next, to model temporal dynamics of the unconditional and conditional predictions, unconditional and conditional probability distributions in the four blocks (Fig 1D) were transformed into HFs denoted as HFU and HFC, respectively (Fig 1E; see the formula in Methods). Please note that predictability of FP1 is solely determined by its unconditional probability and the subsequent HFU, which are the same across all blocks.
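The probability-to-hazard transformation can be sketched as follows, using the standard discrete hazard h(t) = P(T = t) / P(T ≥ t); the exact formula in the Methods (e.g., any blurring for subjective temporal uncertainty) may differ, so treat this as a minimal sketch.

```python
import numpy as np

def hazard(prob):
    """Discrete hazard h(t) = P(T = t) / P(T >= t) over ordered time bins."""
    prob = np.asarray(prob, dtype=float)
    # Survival P(T >= t): one minus the cumulative mass of earlier bins.
    survival = 1.0 - np.concatenate(([0.0], np.cumsum(prob)[:-1]))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(survival > 0, prob / survival, 0.0)

# Unconditional distribution of Block 1: uniform over the 16 foreperiod
# bins (0.4-2 s in 0.1-s steps, 1.2 s excluded).
p_uncond = np.full(16, 1 / 16)
hfu = hazard(p_uncond)
# The hazard rises as time elapses without a target: weak prediction at
# the first bin, certainty (hazard = 1) at the last possible bin.
```

This rising profile for a uniform distribution is what makes the HF a natural model of prediction that strengthens as the target fails to appear.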

Thirty-one participants (16 males and 15 females; age: 23 ± 2.9 years old, mean ± standard deviation) were included in the study. During the experiment, the four blocks were delivered in a random order and repeated three times, each with a different random order, resulting in a total of 12 block presentations (runs). At the end of each run, participants were asked to identify which sequence types were more frequent, to maintain their engagement with the experiment. The accuracy was 87 ± 4% (mean ± std, n = 31 participants, chance level = 50%). During the task, we also recorded reaction times to the target signal and 64-channel EEG signals. The data underlying each table and figure are available in a public repository on the Open Science Framework (https://doi.org/10.17605/OSF.IO/VEDHP).

Reaction time modeled by both unconditional and conditional predictions and their interaction term

To investigate how unconditional and conditional temporal information can be used to predict FP2, we analyzed the correlations between the values of HFU and HFC and the reaction times to the target following FP2 (hereafter referred to as reaction times following FP2). The trials in which participants responded before the target onset were classified as false alarm trials, with their frequency detailed in S1 Table. After excluding the false alarm trials, the average reaction times following both FP1 and FP2 are shown in Fig 2 and S2 Table.

Fig 2. Average and predicted reaction times across participants and their corresponding trial numbers.

Fig 2

(A) The average reaction time following FP1 and FP2 for each block is shown by the black line, with dominant foreperiod durations marked by black circles. The gray shade represents the standard deviation. Predicted reaction times are shown by the red line, with dominant foreperiod durations marked by red squares. For FP1, the predicted reaction times were obtained from a model with HFU as the fixed effect. For FP2, the predicted reaction times were based on a model including HFU, HFC, and their interaction as fixed effects. (B) The actual trial distributions after the removal of false alarm trials are shown for FP1 and FP2 in each block. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

Before focusing on FP2, we first verified whether the reaction times following FP1 were correlated with HFU, as studies with a single foreperiod have shown that faster reaction times are associated with higher hazard rates [6,8,10]. To achieve this, we used a linear mixed-effect model to regress the reaction times following FP1 against HFU values as the fixed effect, with participant included as a random effect to account for individual differences. A significantly negative effect of HFU was found, indicating that faster reactions following FP1 were associated with stronger prediction (p < 0.001, n = 31 participants; see Table 1). This is consistent with results from the literature. We also visualized the observed and predicted reaction times across all foreperiods (Fig 2A). A general trend consistent with the HF can be observed, particularly in the foreperiods with dominant trial frequencies, where actual and predicted reaction times closely align. Note that data from all blocks were combined for the model analysis; however, we separated them in the figure for clearer illustration of the actual reaction times across foreperiods alongside the predicted values.

Table 1. Effect of HFU on reaction times following FP1.

Fixed effect    Estimate    SE      β        t value    p         Con R2
(Intercept)     0.244       0.006            42.88      <0.001    0.173
HFU             −0.021      0.002   −0.08    −11.67     <0.001

n = 17,705 observations. Random effect: participants. Estimate: regression coefficient. β: standardized beta coefficient. SE: standard error. Con R2: conditional R-squared.

Next, we examined how HFU and HFC contribute to the reaction time following FP2. We used linear mixed-effect models to regress the reaction times following FP2 against four different sets of variables: (1) HFU (denoted as HFU-only), (2) HFC (denoted as HFC-only), (3) HFU and HFC (denoted as HFU + HFC), and (4) HFU and HFC and their interaction term (denoted as HFU + HFC + HFU * HFC). Please note that in our regression analysis, the duration of FP1 (L or S) was included as a random effect. This adjustment was necessary because we observed that the duration of FP1 influenced the response following FP2, a phenomenon known as the asymmetrical sequential effect [11]. Specifically, reaction times following a short FP2 were consistently longer after a long FP1 (as in the sequence LS) compared to after a short FP1 (as in the sequence SS) (see S3 Table). By including the duration of FP1 as a random effect, we were able to control for this confounding effect. Additionally, participant was assigned as a random effect to control for individual differences.
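As a sketch of this analysis, the following fits the full model (4) to synthetic data generated with effect signs matching the reported results. The coefficients, noise level, and simplified random-effects structure (a participant intercept only, omitting the FP1-duration random effect) are illustrative assumptions, not the paper's actual data or model specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial data: RTs with negative main effects of HFU and HFC
# and a positive HFU x HFC interaction, plus per-trial noise.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "hfu": rng.uniform(0, 1, n),
    "hfc": rng.uniform(0, 1, n),
    "participant": rng.integers(0, 31, n).astype(str),
})
df["rt"] = (0.24 - 0.05 * df.hfu - 0.01 * df.hfc
            + 0.04 * df.hfu * df.hfc + rng.normal(0, 0.01, n))

# Linear mixed-effects model: fixed effects HFU + HFC + HFU * HFC,
# random intercept per participant.
model = smf.mixedlm("rt ~ hfu * hfc", df, groups=df["participant"])
result = model.fit()
print(result.params[["hfu", "hfc", "hfu:hfc"]])
```

Model comparison as in Table 2 would fit the four fixed-effect sets and compare their AIC/BIC values.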

We first visualized the actual and predicted reaction times for all models in Figs 2 and S1. Due to the varied block-wise probability distributions, visually inspecting model fit is more challenging, as it requires jointly evaluating all predicted curves across the four blocks. To statistically determine the best-fitting model, we performed model comparisons. We found that the model incorporating HFU, HFC, and their interaction term (i.e., HFU * HFC) best predicted the reaction time following FP2 (with the smallest AIC and BIC for model fitness and the highest conditional R-squared for model explanation; see Table 2). The results of the best model are shown in Table 3, and the predicted reaction times are shown in Fig 2 (the predicted reaction times from the remaining models are shown in S1 Fig). The interaction term had a notably positive coefficient (p < 0.001), indicating that the effect of HFU on the reaction time is moderated by the value of HFC. Specifically, the effect of HFU on the reaction time was weaker under greater HFC values, but stronger under smaller HFC values. In other words, unconditional prediction had a reduced impact on behavioral responses when conditional prediction was strong. For HFU and HFC, their main effects on FP2 were significantly negative (p < 0.001), indicating that higher hazard values, for either unconditional or conditional predictions, led to faster responses. These findings suggest that the prediction of FP2 involves both HFU and HFC. Importantly, unconditional and conditional predictions interact to influence behavioral responses, suggesting an integrative process that enables mutual modulation between the two predictions.

Table 2. Comparisons of linear mixed-effect models on reaction times following FP2.

Fixed effects               AIC         BIC         p         Con R2
~HFU                        −48388.5    −48349.6    <0.001    0.218
~HFC                        −48335.4    −48296.5    <0.001    0.215
~HFU + HFC                  −48386.8    −48340.1    <0.001    0.218
~HFU + HFC + HFU * HFC      −48449.7    −48395.1              0.221

n = 17,793 observations. Random effects: participants and FP1 durations. AIC: Akaike’s Information Criterion. BIC: Bayesian Information Criterion. p value obtained by comparing the model in the current row to the one in the last row.

Table 3. Effects of HFU and HFC on reaction times following FP2.

Fixed effect    Estimate    SE      β        t value    p         Con R2
(Intercept)     0.243       0.007            34.93      <0.001    0.221
HFU             −0.049      0.005   −0.21    −10.44     <0.001
HFC             −0.008      0.002   −0.04    −3.93      <0.001
HFU * HFC       0.042       0.005   0.19     8.06       <0.001

n = 17,793 observations.

Mapping unconditional and conditional temporal predictions to cortical activations

Building on the behavioral analyses, we further estimated how brain signals encode the unconditional prediction (HFU) and the conditional prediction (HFC) for FP2. Our approach was to identify brain areas where trial-by-trial EEG signals could be reconstructed based on the HF over time. To achieve this, we used a forward encoding model, also known as regularized linear regression [12].

In our EEG analysis, to ensure sufficient data length for effective model training, we included trials that lasted >0.7 s and did not contain a false alarm. These signals were then segmented, starting from 0.4 s after the onset of the warning signal and extending to the onset of the target signal. This segmentation minimized the influence of evoked responses to the warning and target signals. Then, we transformed the EEG signals from the channel level (64 EEG channels) to the source level (~3,000 sources), using individual 3D electrode locations and structural MR images (see details of the source analysis in Methods). For HFU and HFC, values from 0.4 to 2 s with a 0.1-s step (10 Hz) were spline-interpolated to share the same sampling rate as the EEG source signals, 250 Hz (see S2 Fig).
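The interpolation step might look like the following; a cubic spline and placeholder hazard values are assumptions here, since the paper only specifies "spline" interpolation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hazard values are defined on the foreperiod grid (0.4-2 s, 0.1-s step,
# i.e., 10 Hz) and upsampled to the 250-Hz EEG sampling rate.
t_hf = 0.4 + 0.1 * np.arange(17)          # 10-Hz grid, 0.4-2.0 s
hf = np.linspace(0.05, 1.0, t_hf.size)    # placeholder hazard values
t_eeg = 0.4 + np.arange(401) / 250        # 250-Hz grid over the same span
hf_250 = CubicSpline(t_hf, hf)(t_eeg)     # smooth, time-resolved regressor
```

The resulting 250-Hz regressor can then be aligned sample-by-sample with the EEG source time series.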

To identify which brain areas encode HFU, HFC, and their interaction term, we trained a forward-encoding model to reconstruct the EEG source signals from the hazard values. First, we modeled the EEG signals during FP1 on HFU only, which is similar to a typical single foreperiod scenario and was used as a benchmark. Then, we modeled the EEG signals during FP2 on (1) HFU-only, (2) HFC-only, (3) HFU + HFC, and (4) HFU + HFC + HFU*HFC, following the same approach used in the behavioral analysis.

The training was done for each trial and participant with a leave-one-out cross-validation approach. First, one trial was excluded and used as a testing trial, and the remaining trials were used for training (Fig 3A). For EEG signals, the training data comprised dimensions of n − 1 trials, m time points, and ~3,000 sources. For the HFs, the training data comprised dimensions of n − 1 trials by m time points. Then, we modeled the EEG signals from each source on the hazard values with different time lags (76 lags, from −100 to 200 ms with a 4-ms step), resulting in the temporal response function (TRF) (dimensions: ~3,000 sources and 76 lags) (Fig 3C). Each TRF value represents the weight of the HF on the EEG signal at a specific source and lag. Then, the source data were projected to the voxel level of the brain (181 × 217 × 181 = 7,109,137 voxels).
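A minimal single-source version of this TRF estimation can be written as ridge (regularized linear) regression on lagged copies of the hazard function, w = (XᵀX + λI)⁻¹Xᵀy. The lag range matches the text (76 lags at 250 Hz); the regularization strength, zero-padding, and the synthetic demonstration signal are assumptions of this sketch.

```python
import numpy as np

def build_lagged(x, lags):
    """Design matrix whose column j is stimulus x shifted by lags[j] samples."""
    n = len(x)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = x[:n - lag]   # response at t reflects x at t - lag
        else:
            X[:lag, j] = x[-lag:]      # negative lag: x leads the response
    return X

def trf_ridge(x, y, lags, lam=1.0):
    """Ridge solution for the TRF weights of one source."""
    X = build_lagged(x, lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ y)

# Demonstration: recover a known lag-0 weight of 0.5 from synthetic data.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)                # stand-in for a hazard regressor
y = 0.5 * x                              # stand-in for one source's EEG
lags = np.arange(-25, 51)                # -100 to +200 ms at 4-ms steps (76 lags)
w = trf_ridge(x, y, lags, lam=1e-6)      # w peaks at the lag-0 column (index 25)
```

In the actual analysis, the lagged design matrices from the n − 1 training trials would be stacked before solving, and the fit repeated per source.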

Fig 3. Procedure for the forward-encoding model.

Fig 3

See the main text for details. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

To evaluate the modeling performance, we used the TRF and the HFs from the testing trial (Fig 3C, bottom) to reconstruct the EEG signals for each source at a time lag of zero, focusing on the immediate effect of the HF on the EEG signals. We then calculated the Pearson correlation coefficients between the reconstructed and actual responses (Fig 3D) and averaged the correlation coefficients across all leave-one-out folds (i.e., n repetitions of the steps in Fig 3A–3D) for each source and participant. Sources with significant correlations across participants were identified as “significant areas” (n = 31 participants, Monte Carlo method, 1,000 permutations, cluster-based correction, two-tailed, α = 0.05; see more details in Methods).
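For a single source, the group-level test can be approximated by a sign-flipping permutation on the participant-wise correlation coefficients. This is a simplified stand-in for the paper's procedure: it omits the cluster-based correction across the ~3,000 sources, and the example input values are hypothetical.

```python
import numpy as np

def signflip_p(values, n_perm=1000, seed=0):
    """Two-tailed sign-flip permutation p value for a one-sample mean test."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    obs = abs(values.mean())
    # Under the null (mean = 0), each participant's sign is exchangeable.
    flips = rng.choice([-1.0, 1.0], size=(n_perm, values.size))
    null = np.abs((flips * values).mean(axis=1))
    return float((null >= obs).mean())

# e.g., 31 participants with consistently positive (if small) correlations
p = signflip_p(np.full(31, 0.01))
```

Cluster-based correction would additionally threshold neighboring sources and compare cluster masses against the permutation null, rather than testing each source alone.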

For FP1, the significant areas included the middle and posterior cingulate areas, superior and middle temporal areas, supramarginal area, calcarine, lingual, and fusiform areas (Fig 4A). In these significant areas, the TRF values were generally negative (see S3 Fig), indicating that higher HFU values (stronger predictions) led to smaller responses. Note that our prediction performances, indicated by the Pearson correlation coefficients between the actual and predicted signals, are below 0.015. While these may seem low, they fall within the range reported in comparable studies. For instance, Di Liberto and colleagues [13] reported correlation values around 0.06 using EEG electrode signals in relation to speech characteristics, while Herbst and colleagues [10] found a correlation of approximately 0.005 when using EEG source data and HFs.

Fig 4. Neural correlates for unconditional and conditional temporal predictions.

Fig 4

(A) EEG source responses during FP1 were modeled against HFU-only. Four cortical surfaces with significant correlation coefficients are shown. The color bar shows correlation coefficients, and the same scale is applied to panels B and C. (B) EEG source responses during FP2 were modeled against HFC-only, and significant cortical surfaces are shown. (C) EEG source responses during FP2 were modeled against HFU + HFC + HFU * HFC, and significant cortical surfaces are shown. (D) Within the significant areas in panel C, the box plots show the average TRFs between source responses and hazard values at a 0-s lag. Values were normalized between −1 and 1. Each box plot shows the median (the middle horizontal line), quartiles (the bottom and top edges), and the maximum and minimum (the horizontal lines outside the box). (E) The significant areas in panels A–C were compiled. There is no overlap between the significant areas for HFU-only and HFC-only. (F) The bar plots show the percentages of significant areas for each hazard function over all areas in the left and right hemispheres, respectively. The percentage is calculated using the sum of significant voxels in panel E. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

For FP2, significant areas were found only when modeling with (1) HFC-only, and (2) HFU + HFC + HFU * HFC. The significant areas for HFC-only included the anterior cingulate and medial orbitofrontal areas (Fig 4B). In these areas, we also observed negative TRF values (S4 Fig), indicating higher HFC values were associated with smaller responses. On the other hand, the significant areas for HFU + HFC + HFU * HFC included the anterior, middle, and posterior cingulate areas, superior and middle temporal areas, supramarginal area, inferior parietal area, calcarine, lingual, and fusiform (Fig 4C). This training yielded significantly higher correlation coefficients compared to HFC-only (n = 31 participants, Monte Carlo method, 1,000 permutations, cluster-based correction, two-tail, α = 0.05). This indicates that the EEG signals were better reconstructed when incorporating HFU, HFC, and their interaction term, rather than considering HFC alone, which is consistent with the behavioral model comparisons (Table 2).

For these significant areas for HFU + HFC + HFU * HFC, we show boxplots of the average TRF values for HFU, HFC, and their interaction term (n = 35,104 significant voxels for each TRF) (Fig 4D; also see S5 Fig for the spatial distribution of these values). First, the median TRF values for HFU and HFC were negative; again, these indicate that high unconditional and conditional predictions are associated with reduced responses. However, the median TRF value for their interaction term was positive, suggesting that the influence of HFU on the responses was weaker when the influence of HFC was strong, and vice versa. Again, these findings align with the beta values of the behavioral model.

We further compared the significant areas for HFU-only (as shown in Fig 4A), HFC-only (Fig 4B), and HFU + HFC + HFU * HFC (Fig 4C). Here, we refer to these as the HFU area, HFC area, and integration area, respectively. In Fig 4E, we find that the HFU area and HFC area (colored separately in dark blue and dark purple) were mutually exclusive. Furthermore, the HFU area and the integration area overlapped broadly (light blue, the intersection of dark blue and red), while the HFC area and the integration area were shared focally in the prefrontal area (light purple, the intersection of dark purple and red). For the part of the integration area that was not shared with the HFU and HFC areas (colored in red), several regions were identified, including the middle and posterior cingulate areas, the inferior parietal area, and calcarine.

Fig 4F further illustrates the percentages of these areas in the left and right hemispheres. The integration area (colored in red) was found to be dominant, accounting for more than half of the total significant areas in both hemispheres. In summary, our findings indicate that the unconditional and conditional predictions are encoded in distinct brain regions, while the integration of these predictions involves large areas as well as additional regions.

Discussion

Our foreperiod sequence paradigm was designed to create two levels of event timing regularities for temporal prediction. For both behavioral and EEG data, we demonstrated that predictions are made using statistics from both levels. Importantly, these statistics not only influence responses but also interact with each other. In other words, they are integrated to generate predictions. We further showed that these processes occur in distinct but overlapping brain regions. Our study establishes an experimental and analytical platform to isolate latent dynamics in hierarchical temporal prediction, a crucial step towards unifying predictive coding of “what” and “when” information (i.e., content and timing).

Prefrontal cortex involvement in processing information at a longer time scale

Our investigation used a forward encoding model to estimate how the brain encodes unconditional and conditional temporal predictions based on learning regularities of the single foreperiod and foreperiod sequence. The HF is a representation of a prediction updated over time, and this dynamic characteristic was kept and assigned as a time-resolved regressor during model training. By training the model with HFU and HFC values, we disentangled their corresponding neural correlates and extracted the correlates of their interaction term at the source-level areas.

Specifically, neural correlates in the anterior cingulate and medial frontal areas were identified only when the sequence structure was considered. This prefrontal involvement has been reported in previous neuroimaging studies showing differences when comparing responses in the variable foreperiod paradigm (i.e., a uniform distribution of the foreperiod) and in the fixed foreperiod paradigm [7,14,15]. Additionally, in rodents, inactivation of the bilateral dorsomedial prefrontal cortex led to premature motor responses, indicating top-down inhibition of the motor cortex during preparation [16,17]. While these previous studies localized prefrontal activity by manipulating single foreperiod occurrences (akin to the regularity of the single foreperiod in our design), we observed similar activity only during processing of the foreperiod sequence. In fact, increased responses were found in ERPs during both FP1 and FP2, and a significantly positive waveform was found at the frontal electrodes when comparing ERPs during FP1 and FP2 (S6 Fig). The activation around these frontal electrodes may imply a function of sustained monitoring. For instance, in previous studies, increased prefrontal activity was found in the condition of unimodal probability distributions, compared to the condition of a precisely invariant FP. Also, the retention of temporal information over a longer scale is supported by prefrontal neurons activating differentially to store different durations of the first and second cues [18]. In our findings, the ramping-up of prefrontal activity may reflect the tracking of FP1 and FP2 in the two-foreperiod sequence.

A distinct brain network for the integration of unconditional and conditional temporal predictions

We used the multiplication of the HFU and HFC values to represent the integration of the unconditional and conditional predictions. This approach was inspired by the interaction term in regression models, which allows for the combined effect of two or more dependent variables on the independent variable, in addition to their individual main effects. Interestingly, in both behavioral regression and neural encoding models, we observed a positive effect of the interaction term. This indicates that the prediction at one level influences the reaction time and brain response more strongly when the prediction at the other level is weak, and vice versa. Furthermore, a wide range of brain areas was found to process integration, as there were no shared areas between significant correlates to HFU and HFC. Particularly, the posterior cingulate cortex was notably identified with a large distribution among the important areas exclusively involved in the integration processing. The posterior cingulate cortex was part of the frontoparietal network comprising the precuneus and inferior frontal gyrus, and functional connectivity of this network mainly increased with update values in an ideal Bayesian observer model. The update is quantified as a divergence between prior and posterior probabilities of the current FP, compared to surprise, inversely and non-linearly correlated with the HF [19,20]. Furthermore, the neural correlate of the integration area localized preferentially at the right side of posterior cingulate cortex further strengthens a link to the top-down updates. Hemispheric lateralization in temporal processes has been proposed, where the right hemisphere would be better at learning previous information to predict future onset while the left hemisphere would be better at comparisons between a test stimulus interval and a target stimulus interval [21]. 
In sum, our finding in the cingulate area may imply that simultaneously processing multiple levels of prior information requires intense prediction updates in the posterior cingulate cortex.

In addition to the cingulate areas, the inferior parietal area was also identified, although its distribution was relatively sparse in our results. Neural activity in the lateral intraparietal area has been shown to change differently in blocks of unimodal and bimodal distributions [8]. Analogously, we created different bimodal distributions, e.g., the 50–50 unconditional probability distribution and the 80–20 conditional probability distribution, and discrimination between such probability distributions may also be a function of the integration hub. Here, it is important to clarify again that the integration of the two probabilities was supported by the model including the interaction term (i.e., HFU*HFC), which yielded the best fit relative to the other models without such a term. To directly investigate this integration effect within the brain, future research could alter activity in the identified areas through lesions, neuropharmacological manipulations, or brain stimulation techniques. By selectively enhancing or inhibiting each brain region associated with the HFU, HFC, or interaction area, one could assess its causal influence on behavior and on neural activity in the other regions.

Moreover, while our study provided insights into the brain networks involved in temporal predictions based on time-series responses, it is also critical to understand how these networks function and communicate within the oscillatory domain. The alpha/beta oscillation has been suggested to carry the top-down prediction signal, while the gamma oscillation is associated with the bottom-up prediction error signal [22]. This aligns with findings showing that alpha power prior to a target changed with the HF [9]. Furthermore, predictions of “what will happen” at the levels of single tone transitions and multi-tone sequences have been shown in different ranges of the beta oscillation [23]. To further investigate this frequency ordering of hierarchical predictions of “when an event will happen,” we should combine time-frequency analysis with functional connectivity analysis in future work. Finally, while we transformed EEG surface signals into source signals to enhance spatial resolution, a better understanding of areal brain functions could be achieved by combining EEG with functional magnetic resonance imaging. Such a combination would allow us to track the dynamics of temporal prediction signals and observe brain activity at precise locations.

We also acknowledge that asking participants to identify the sequence structure may have made them attend to time more explicitly, particularly for learning the conditional probability, although the instruction was intended to maintain their engagement with the experiment. To explore implicit temporal predictions more effectively, the sequence identification question after each block could be removed. We anticipate that this modification may reveal the integration of the two predictions in different brain areas, as well as different degrees of mutual modulation between the predictions.

Other potential temporal prediction models

We noted that Janssen and Shadlen [8] adopted a smoothed version of the HF, called the temporally blurred HF, assuming that the precision of timing decreases as time elapses. On the other hand, recent studies using a probabilistically blurred probability distribution instead of an HF provided a better explanation of reaction times [24,25]. Still, in long-foreperiod paradigms, where greater temporal uncertainty would be expected, reaction times have been well explained by temporally blurred HFs [26]. These seemingly conflicting findings may reflect key differences across studies [27], including: (1) the statistical properties of the foreperiod distributions (e.g., mean and standard deviation), (2) the temporal resolution of the experimental design, and (3) the use of catch trials. These factors can introduce varied temporal prediction profiles with different types and magnitudes of uncertainty. Regarding the first point, in a bimodal distribution created by overlapping two identical unimodal distributions, a greater distance between the peaks is expected to produce a stronger temporal blurring effect. For probabilistic blurring, the extent of the effect can be influenced by the distribution’s standard deviation; for instance, a sharper distribution may lead to more rapid changes over time than a flatter one. Regarding the second point, even when distributions share the same mean and standard deviation, the range of foreperiods can still affect uncertainty. For example, a longer range (e.g., 0–20 s, as in [26]) introduces greater temporal uncertainty than a shorter range (e.g., 0.4–2 s, as used in our study). It is also reasonable to infer that temporal and probabilistic blurring may coexist and interact, potentially modulating one another, when probability distributions with different means, standard deviations, and time ranges are used.
Third, the inclusion of catch trials not only adds uncertainty regarding whether the target will occur but also complicates the definition of the underlying probability distribution. For instance, Grabenhorst and colleagues [25] modeled reaction times by incorporating both the uncertainty of target occurrence and probabilistic uncertainty (i.e., probabilistic blurring), raising the question of how these two forms of uncertainty interact in the brain. In terms of probability distributions, catch trials may either be excluded or interpreted as extremely long foreperiods, making distribution estimation more complex.
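The temporal blurring discussed above can be made concrete with a short sketch. Following the idea in Janssen and Shadlen [8], the foreperiod distribution is smeared with a Gaussian whose width grows linearly with elapsed time before the hazard is recomputed. This is an illustrative sketch only; the Weber fraction `phi` and the discrete-time conventions below are our own assumptions, not parameters from the present study.

```python
import numpy as np

def subjective_hazard(f, t, phi=0.26):
    """Temporally blurred ("subjective") hazard function.

    f   : probability mass over the foreperiod times t (sums to 1)
    t   : time points (s), all strictly positive
    phi : Weber fraction scaling the blur width with elapsed time
          (an illustrative value, not fitted to the present data)
    """
    f = np.asarray(f, float)
    t = np.asarray(t, float)
    # Blur each time point with a Gaussian whose SD grows linearly with time
    fb = np.zeros_like(f)
    for i, tau in enumerate(t):
        kernel = np.exp(-0.5 * ((t - tau) / (phi * tau)) ** 2)
        fb[i] = np.sum(f * kernel) / np.sum(kernel)
    fb /= fb.sum()              # renormalize to a probability mass
    C = np.cumsum(fb) - fb      # cumulative probability before each time point
    return fb / (1.0 - C)       # hazard of the blurred distribution
```

With a uniform distribution, the blurred hazard still rises monotonically toward 1, as expected; the blurring mainly matters for peaked (e.g., bimodal) distributions, where it smooths and delays the hazard peaks.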

In our study, various smoothed versions of HFs and probability distribution functions with different blurring parameters were tested to fit the behavioral and neural data (see details in S1 Text). However, the results were not consistent: a statistically significant improvement was observed for the reaction times (S7 Fig) but not for the EEG data. Specifically, model performance quantified by the R2 value varied depending on the type of blurring (temporally or probabilistically blurred HF) and the dataset (FP1 or FP2), but the best models were obtained with the HF as the base. This inconsistency further supports our earlier points: in our study, we implemented distinct foreperiod distributions to examine whether the brain can learn and integrate both unconditional and conditional temporal statistics. This design inevitably introduced different types of uncertainty across conditions. Consistent with previous neurophysiological work demonstrating neural encoding of original HFs, and with our findings, we suggest that the brain may rely on HF-like computations as a neural mechanism for temporal prediction. These computations may be modulated by different forms of uncertainty in a context-dependent and potentially parallel manner. Nevertheless, we emphasize that while using nonblurred models allows us to isolate core computational principles, under the simplifying assumption that time is encoded without error, it also underrepresents the role of uncertainty. The modulatory effects of uncertainty should be explored in future research, for example, by varying foreperiod ranges (to examine temporal blurring) or by manipulating the standard deviation of the distribution (to examine probabilistic blurring).

We also acknowledge that the conditional R2 values in our behavioral data appear modest, and the relatively poor model fits can be attributed to several factors. First, we included reaction time outliers and used non-log-transformed data in our analysis (see details in Methods) to avoid introducing additional assumptions. We verified the contribution of trial-level noise from outliers to the model fit by demonstrating consistently improved conditional R2 values when outliers were removed (S3 Table). Second, unlike previous studies that used exponential or unimodal probability distributions, typically resulting in a single peak or a monotonic increase, we employed bimodal distributions that generate dual expectancy peaks. This probability structure produces more complex temporal expectations that cannot be adequately captured by a simple monotonic drift in brain signals. Third, to ensure a complete probability distribution structure, we included trials across the full foreperiod range in both our design and analysis, even though some foreperiod durations occurred in only a few trials, likely introducing trial noise. This allowed us to more comprehensively capture the evolution of the HF over time. Fourth, our design is the first to create two probability regularities, and to achieve this, we designed various bimodal distributions across blocks (e.g., equal 50–50 peak contributions versus asymmetric 20–80 peaks). Such complexity might increase trial-by-trial (and inter-individual) variability in learning and strategy, which in turn can increase behavioral noise and reduce the explained variance in models.

Finally, all the models we tested were framed within the HF approach and its derivative, the probability distribution, both of which rely solely on stimulus properties and lack a mechanistic foundation for how the brain processes them. More recently, several mechanistic computational and neural network-based models have emerged as promising alternatives, and their model predictions may better account for our observed data. For example, Bayesian frameworks describe temporal expectation as inference over time, with priors about event probability distributions continuously updated in light of new observations [20,28]. Similarly, reservoir computing models can be linearly or nonlinearly decoded to represent hazard-like signals or probabilistic expectations, making them strong candidates for biologically plausible temporal coding [29,30]. Future work may therefore benefit from incorporating these frameworks, which provide more mechanistic accounts of temporal expectation.

Still, we would like to emphasize that despite this added complexity, our key behavioral and neural findings remain robust. All major results, including the significant behavioral and neural correlates of HFU, HFC, and the interaction term, were consistently verified through control and additional analyses.

The asymmetrical sequential effect observed in bimodal distributions

While our results demonstrated that reaction times were faster as hazard values were higher, we also observed an asymmetrical sequential effect due to our design composed of long/short FP1 paired with long/short FP2. The effect describes how reaction times increase when the preceding foreperiod trial is longer than the current one in the variable foreperiod paradigm. A model based on the principle of trace conditioning has been proposed to account for this effect [11]. Initially, in the case of a uniform probability distribution, the chance of a target occurring at any time is equal, and thus the associated weights for each critical moment are equal. During the preparation for target occurrence, weights decrease as the corresponding critical moment passes (extinction). This extinction becomes weaker as time elapses, meaning that the weight associated with a late moment decreases less or remains relatively unchanged. At the imperative moment when the target is presented, the response is made, leading to an increase in the weight associated with the imperative moment (reinforcement). The associated weights are then carried over to the next foreperiod trial. If the imperative moment is earlier (i.e., the target occurs after a shorter foreperiod than the previous one), the associated weight is already weaker for that moment, resulting in a slower response.

Regarding the asymmetrical sequential effect, two points should be noted: (1) our regression results still show modulation by the HF when the durations of FP1 were controlled as a covariate, and (2) our forward encoding model results excluded the sequential effect. Regarding (1), we calculated the average reaction times following long and short FP2 after either short or long FP1 (i.e., LL, LS, SS, and SL) for the four blocks (S8 Fig, the visualization of S4 Table). The asymmetry effect was observed: reaction times following FP2 tended to be longer when FP1 had a longer duration (LS versus SS), but the effect differed among blocks because of our manipulation of the probability distributions. For example, in S8 Fig, the asymmetry between the reaction times following FP2 in LS and SS is smaller in Block 4 but larger in Block 3. This is because Block 4 comprised more LS trials (40%) than Block 3 (10%) and fewer SS trials (10%) than Block 3 (40%). Therefore, to carefully identify the effect of the foreperiod on reaction time (i.e., temporal hazards) and exclude the sequential effect, we controlled for the FP1 length in our regression analysis.

Regarding (2), it is known that the anatomical locations responsible for the effect of the foreperiod on reaction time and for the sequential effect are distinct [31]. Patients with lesions to the right frontal area had a weaker foreperiod-reaction-time effect while the sequential effect remained intact. In contrast, lesions to the left premotor area diminished the sequential effect. Here, we adopted the forward encoding model and identified the neural correlates of HFs mostly in the posterior cingulate and frontal areas; however, an involvement of the premotor area was not detected. This suggests that the forward encoding model is a plausible analysis tool for extracting responses of temporal predictions while eliminating confounds due to the sequential effect. Still, although we used individual MRI images to reduce the volume conduction effect and reconstruct source-level signals, we should carefully consider the spatial limitations of EEG when drawing conclusions.

In conclusion, to our knowledge, this study is the first to reveal how the brain integrates multi-level information about event timing to establish predictions. This also supports the hierarchical organization of the predictive-coding theory in the “when” domain. This paradigm was initially inspired by the auditory local-global oddball paradigm, which was designed to study the multi-level prediction of event pitches in the “what” domain [32,33]. As we solely manipulated event timing (i.e., the “when” domain), in order to generalize this well-known and fundamental theory, we plan to simultaneously manipulate multi-level information in both the “what” domain (e.g., tone pitches) and the “when” domain (e.g., tone onset) in future research.

Methods

Participants

We recruited 34 participants (17 males and 17 females; age: 23 ± 2.9 years old). The inclusion criteria were: (1) age between 20 and 50 years; (2) no severe deficit in hearing, eyesight, or color discrimination that could interfere with understanding the experimental procedures; (3) no self-reported medical history or diagnosis of neurological or psychological diseases. All participants signed consent forms after understanding the procedure and before the experiment started. The protocol was approved by the ethical committee of the University of Tokyo (No. 21-372) and was conducted according to the principles expressed in the Declaration of Helsinki. The data from three participants were excluded because they were easily distracted during the experiment or missed the structural MRI acquisition, leaving 31 participants (16 males and 15 females; age: 23 ± 3 years old).

Stimulus

The warning signal was composed of an auditory stimulus (three combined pitches: 350, 700, and 1,400 Hz; 55 dB) and a visual stimulus (a white dot with a 15-pixel diameter placed at the center). The duration of the warning signal was 0.1 s. The target signal was composed of an auditory stimulus (three combined pitches: 500, 1,000, and 1,500 Hz) and a visual stimulus (a white dot with a 15-pixel diameter placed at the center). The target signal continued until the zero button was pressed, or for 1 s if no press was detected. The stimulation layout on the monitor (resolution: 1,920 * 1,080 pixels) included (1) a gray background (RGB: [85, 85, 85]) and (2) a fixation cross (white, 40 pixels in diameter) placed 110 pixels below the center of the monitor. The auditory stimuli were delivered through a pair of desktop speakers. The stimulation was programmed using the MATLAB-based Psychtoolbox [34,35] and presented in a dim, sound-proof booth.

Foreperiod sequence paradigm

We paired two foreperiod trials as a sequence (FP1 and FP2) to establish two regularities: the single foreperiod and the foreperiod sequence. Within each sequence, the two foreperiod trials were separated by a 1.2-s interval between the offset of the target in FP1 and the onset of the warning in FP2. Between consecutive sequences, the interval between the offset of the target in FP2 and the onset of the warning in the next FP1 ranged from 3 to 3.2 s, in increments of 0.05 s. To make the foreperiod chunking obvious to the participant, a black square (400 * 400 pixels, placed 110 pixels below the center) was presented 0.5 s after the press response to the target in FP2, or 1 s after the onset of the target when the press response was absent. Foreperiod trials with durations between 0.4 and 1.1 s and between 1.3 and 2 s are denoted as S and L, respectively. The participants underwent prior testing to become familiar with foreperiod trials of relatively short or long durations. There were four sequence types: LL, SS, LS, and SL, where the first and second letters in each pair represent the durations of FP1 and FP2, respectively.

Four blocks were designed, each containing 50 trials of two-foreperiod sequences. In Block 1, there were 25 trials of LL and 25 trials of SS, leading to an unconditional probability distribution with two nonoverlapping unimodal probability distributions, respectively, peaking at 0.8 and 1.6 s. The 50–50 unconditional probability distribution represents 50% cumulative probabilities in the range of S and 50% cumulative probabilities in the range of L. For conditional probability distribution, a 0–100 distribution after long FP1 was calculated, representing close to 0% cumulative probabilities in the range of S and close to 100% cumulative probabilities in the range of L. Similarly, a 100–0 distribution after short FP1 was calculated. In Block 2, there were 25 trials of LS and 25 trials of SL, resulting in a 50–50 unconditional probability distribution, a 100–0 conditional probability distribution after long FP1, and a 0–100 conditional probability distribution after short FP1. In Block 3, there were 20 trials of LL, 5 trials of LS, 20 trials of SS, and 5 trials of SL, resulting in a 50–50 unconditional probability distribution, a 20–80 conditional probability distribution after long FP1, and an 80–20 conditional probability distribution after short FP1. In Block 4, there were 5 trials of LL, 20 trials of LS, 5 trials of SS, and 20 trials of SL, resulting in a 50–50 unconditional probability distribution, an 80–20 conditional probability distribution after long FP1, and a 20–80 conditional probability distribution after short FP1. S9 and S10 Figs detail the trial numbers of foreperiod with different durations across four blocks, respectively, for all participants and for one participant. As shown in S10A Fig, we assigned one trial each to foreperiods of 0.4, 0.5, 0.6, 1.0, 1.1, 1.3, 1.4, 1.9, and 2.0 s to maintain a nearly complete and continuous distribution of foreperiods within each block. 
This design allowed for more accurate estimation of HFs over time while minimally affecting the learning of conditional probabilities. In particular, foreperiods of 1.1 and 1.3 s—which are difficult to clearly categorize as short or long—were included with only one trial each. Their influence was confirmed to be minimal by control analyses excluding these foreperiods (S13 Table), which yielded results similar to the main findings (Table 3).
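The mapping from trial counts to the unconditional and conditional probabilities of FP2 can be verified with a small sketch (the dictionary layout and function names are ours, for illustration; the counts are those listed above):

```python
# Trial counts per sequence type (FP1, FP2) for each block, as described above
blocks = {
    1: {"LL": 25, "LS": 0,  "SS": 25, "SL": 0},
    2: {"LL": 0,  "LS": 25, "SS": 0,  "SL": 25},
    3: {"LL": 20, "LS": 5,  "SS": 20, "SL": 5},
    4: {"LL": 5,  "LS": 20, "SS": 5,  "SL": 20},
}

def unconditional_p_long_fp2(counts):
    """P(FP2 is long), pooling over FP1 (the unconditional statistic)."""
    n = sum(counts.values())
    return (counts["LL"] + counts["SL"]) / n

def conditional_p_long_fp2(counts, fp1):
    """P(FP2 is long | FP1), with fp1 in {"L", "S"} (the conditional statistic)."""
    n_given = counts[fp1 + "L"] + counts[fp1 + "S"]
    return counts[fp1 + "L"] / n_given
```

For Block 3, for instance, the unconditional probability of a long FP2 is 0.5 (the 50–50 distribution), while the probability of a long FP2 given a long FP1 is 20/25 = 0.8 (the 20–80 conditional distribution after long FP1).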

During the experiment, the participants were instructed to (1) sit comfortably, (2) rest their chins on a chin rest, (3) look at the fixation to minimize head and eye movements, and (4) press the zero button as soon as the target signal appeared. The four blocks were presented in a random order, and this set was repeated twice more. After each block, there was a short rest, and the participants were asked to identify which sequence type appeared most frequently, to ensure their engagement with the experiment. Each participant completed a total of 600 FP1 trials and 600 FP2 trials, with the number of trials determined primarily based on prior studies [10,26]. We also confirmed the stability of individual results using bootstrap resampling, which showed that the distribution of individual R-squared values was unimodal, with the actual values close to the bootstrap mean and within the 95% confidence intervals (S11 and S12 Figs).

Data acquisitions

We recorded EEG signals using the 64-channel actiCAP slim from Brain Products and captured individual electrode locations using a 3D camera (brand: STRUCTURE). To reconstruct EEG sources, we also collected individual structural MRI images (T1) using SIEMENS 3T Magnetom Prisma. These two measurements were conducted either on the same day or on two separate days.

EEG recording.

The 64-channel electrode cap was positioned according to anatomical landmarks, including the nasion and the left and right porions (the top of the ear canal). We captured the electrode locations and the three landmarks (labeled with red stickers) using the 3D camera mounted on an iPad. During EEG recording, event codes were sent at the onset of the warning signal in each foreperiod trial. Raw EEG signals were recorded at a 1,000-Hz sampling rate.

MRI acquisition.

Before image acquisition, the participants completed an MRI safety checklist, and a device was used to detect whether there was any metal on the body. The three anatomical landmarks were labeled using sugar candies (Kobayashi Pharmaceutical’s Breath Care), which can be identified in MRI images. The use of the three landmarks allowed us to align the electrode locations and MRI images to the same anatomical axes in later analyses. During acquisition, comfortable air-filled paddings were used to prevent severe head motion, and earplugs were used to reduce scanner noise. The T1 acquisition protocol included a field of view of 240 × 256 mm, a 300 × 320 matrix, TR: 2,400 ms, TE: 2.22 ms, flip angle: 8°, and 0.8-mm slice thickness.

Data analyses

Hazard function.

The HF, describing the conditional probability of an event occurring at time t given that it has not yet occurred, was used to estimate the dynamics of temporal predictions. The formula is listed below:

HF(t) = f(t) / (1 − C(t))  (1)

where t represents time points within the range of 0.4–2 s, f represents the probability distribution of the foreperiod in a block, and C represents the cumulative probability until time t.

When 1 − C(t) is close to zero, HF(t) tends toward infinity. In such cases, the hazard value was capped at the maximum of the preceding values:

HF(t) = max{HF(0), HF(1), ..., HF(t−1)}  (2)

Finally, we normalized the hazard values between 0 and 1.
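The computation in Eqs (1) and (2) can be sketched for a discretized foreperiod distribution as follows. Treating C as the cumulative probability before t and using min–max normalization are our implementation assumptions for this sketch.

```python
import numpy as np

def hazard(f):
    """Normalized hazard function from a discrete foreperiod distribution.

    f : probability mass over the discretized foreperiod times (sums to 1).
    Implements Eq (1); where 1 - C(t) approaches zero, the value is capped
    at the running maximum of earlier hazard values (Eq 2), and the result
    is then min-max normalized to [0, 1].
    """
    f = np.asarray(f, float)
    C = np.cumsum(f) - f          # cumulative probability before each time point
    denom = 1.0 - C
    h = np.zeros_like(f)
    for i in range(len(f)):
        if denom[i] > 1e-6:
            h[i] = f[i] / denom[i]
        else:                     # 1 - C(t) ~ 0: cap at the running maximum (Eq 2)
            h[i] = h[:i].max() if i else 0.0
    return (h - h.min()) / (h.max() - h.min())   # normalize to [0, 1]
```

For a uniform distribution, the hazard rises monotonically, reproducing the classic foreperiod effect; for a distribution with no mass in its tail, the cap in Eq (2) prevents the division by zero.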

Linear mixed-effect analysis.

We excluded only the false alarm trials from our analysis; that is, we did not disregard outlier trials or trials from the early phase of each block. When considering outlier trials, we could, for instance, exclude those with reaction times more than 2.5 standard deviations above or below the mean; in this case, there is no significant difference in the number of trials among the four blocks (S5 Table). However, it is crucial to recognize that removing outliers rests on certain assumptions, such as participants not paying attention during these trials. Also, alternative methods, such as excluding trials that fall outside the 25th and 75th percentiles, might yield different outcomes. Regarding trials in the early phase, our analysis found no significant difference in reaction times between the first 20 trials and the last 20 trials (S6 Table). Given the lack of significant differences and our concerns about potential data loss, we decided to keep these outliers and early-phase trials in our linear mixed-effects analysis. While trial-level variability from outliers attenuated the model fit (see the comparisons in S3 Table), we emphasize that the overall conclusions remain unchanged, and the robustness of our results is further demonstrated by analyses using nearly the entire dataset.

To estimate correlations between hazard values (i.e., temporal prediction) and reaction times, we used linear mixed-effects models while controlling for individual differences. The durations of FP1 (L or S) were also controlled when reaction times following FP2 were modeled. The regression models are as follows:

For FP1:

RTt ~ HFt + (1|Sub)  (3)

For FP2:

RTt ~ HFUt + (1|Sub) + (1|FP1)  (4)
RTt ~ HFCt + (1|Sub) + (1|FP1)  (5)
RTt ~ HFUt + HFCt + (1|Sub) + (1|FP1)  (6)
RTt ~ HFUt + HFCt + HFUt*HFCt + (1|Sub) + (1|FP1)  (7)

where RTt represents the reaction time to the target signal appearing at time t (relative to the onset of the warning signal); HFUt represents the HFU value at time t; and HFCt represents the HFC value at time t.
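To illustrate how the interaction term in model (7) behaves, the sketch below simulates reaction times with negative main effects of HFU and HFC and a positive interaction, then recovers the coefficients. This is a simplified fixed-effects (ordinary least squares) sketch on simulated data; the actual analysis used lmer with random intercepts for subject and FP1, and the coefficient values here are arbitrary.

```python
import numpy as np

# Simulated fixed-effects version of model (7): RT ~ HFU + HFC + HFU*HFC.
rng = np.random.default_rng(0)
n = 500
hfu = rng.uniform(0, 1, n)
hfc = rng.uniform(0, 1, n)

# Simulated RTs (s): faster when either hazard is high, with a positive
# interaction, so each hazard matters less when the other is already strong.
rt = 0.45 - 0.08 * hfu - 0.10 * hfc + 0.06 * hfu * hfc \
     + rng.normal(0, 0.01, n)

# Design matrix with intercept, main effects, and the interaction term
X = np.column_stack([np.ones(n), hfu, hfc, hfu * hfc])
beta, *_ = np.linalg.lstsq(X, rt, rcond=None)
intercept, b_hfu, b_hfc, b_inter = beta
```

The fitted signs mirror the reported result: negative main effects (higher hazard, faster responses) with a positive interaction term, meaning the marginal effect of one hazard, e.g., b_hfu + b_inter * HFC, weakens as the other hazard grows.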

Additionally, some studies have employed log-transformed reaction times in regression models to demonstrate a negative relationship between log-transformed reaction times and hazard values [9,10]. For completeness, we also conducted analyses using log-transformed reaction times, and our conclusions remained consistent (see S7, S8, and S9 Tables). Moreover, we performed analyses with HFs based on the probability distributions of the actual trial configurations (see the example in S10 Fig). The results aligned closely with the model utilizing the HFs in Fig 1E (see S10, S11, and S12 Tables and S13 and S14 Figs).

The analyses were conducted using R (4.2.2). The functions, lmer, anova, and r.squaredGLMM, were used for the regression analyses, model comparisons, and conditional R-squared, respectively.

EEG preprocessing.

The raw EEG signals were preprocessed using the MATLAB-based EEGLAB for each participant. First, channels showing high amplitudes (over 100 μV) in more than half of the dataset were removed (pop_select.m), and signals below 0.1 Hz were filtered out (pop_eegfiltnew.m). Independent component analysis was used for component extraction (pop_runica.m), and components containing artifacts were removed using the ADJUST plugin toolbox (interface_ADJ.m). Next, the data were segmented into epochs between −0.8 and 2.8 s relative to the onset of the warning for each foreperiod trial (pop_epoch.m), and epochs showing high amplitudes (over 100 μV) were manually removed. The epoch data were then referenced to the average of signals across the 64 channels (pop_reref.m), and 1-Hz high-pass and 55-Hz low-pass filters were applied (pop_eegfiltnew.m; default settings: linear noncausal finite impulse response filter, forward and backward directions, zero-phase, 25% of the lower passband edge as the transition band). Finally, the data were baseline-corrected (−0.2 to 0 s; pop_rmbase.m) and down-sampled to 250 Hz (pop_resample.m).

For further analyses, we disregarded trials with foreperiods shorter than 0.8 s and trials with false alarms. The processed EEG signals were analyzed between 0.4 s after the onset of the warning and the onset of the target.

Source reconstruction analysis.

This analysis aims to estimate volume conductivity (i.e., the forward model) and interpolate the sources of electrical potentials (i.e., the inverse model) for each participant. First, a head model (also known as a volume conduction model) was created from the structural MRI images using the finite element method. The images were resliced into isotropic dimensions (256 * 256 * 256 voxels) (ft_volumereslice.m) and realigned to the three candy-labeled markers (ft_volumerealign.m). Then, the images were segmented into five tissue types (gray matter, white matter, cerebrospinal fluid, skull, and scalp) (ft_volumesegment.m). The boundary was adjusted to create a hexahedron for each voxel, with the shifting parameter set to 0.3 (ft_prepare_mesh.m). The volume conductivities, [0.33 0.14 1.79 0.01 0.43], were assigned in the order of the above-mentioned tissue types (ft_prepare_headmodel.m). Second, the electrode locations captured in the 3D image were aligned according to the three red-dot-labeled markers (ft_meshrealign.m), and electrode names were assigned manually (ft_electrodeplacement.m). The locations were further adjusted according to the brain shape for a better fit (ft_plot_headshape.m; ft_electroderealign.m).

Third, the leadfield (the potential contributions from dipoles to electrodes) was created with the head model and the electrode locations, with a resolution of 1 cm (ft_prepare_leadfield.m). Then, linearly constrained minimum-variance (LCMV) beamforming was applied to extract spatial filters (ft_sourceanalysis.m). The inputs included the processed EEG signals, the head model, and the leadfield. The output was the set of spatial filters (dimensions: xyz-axes * electrodes * sources). Please note that the source numbers varied across participants because of the use of individual MRI images. To acquire source signals in dimensions of time points and sources, the spatial filter of each source (xyz-axes * electrodes) was multiplied by the EEG (electrodes * time points), resulting in a time course along three axes (xyz-axes * time points). We used principal component analysis to extract the time course along the dominant axis (time points).
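The final step, collapsing each source's three-axis time course onto its dominant orientation, can be sketched as follows. This is a minimal numpy sketch mirroring the description above; the variable shapes and the SVD-based PCA are our assumptions, not code from the actual FieldTrip pipeline.

```python
import numpy as np

def dominant_timecourse(spatial_filter, eeg):
    """Collapse a source's 3-axis (x, y, z) time course into one dimension.

    spatial_filter : (3, n_electrodes) LCMV filter for one source
    eeg            : (n_electrodes, n_times) sensor-level signals
    Returns the time course projected onto the dominant orientation
    (the first principal component of the 3 x n_times matrix).
    """
    xyz = spatial_filter @ eeg                    # (3, n_times)
    xyz = xyz - xyz.mean(axis=1, keepdims=True)   # demean per axis
    # SVD of the 3 x T matrix: the first left-singular vector gives the
    # dominant dipole orientation; project the time course onto it
    u, s, vt = np.linalg.svd(xyz, full_matrices=False)
    return u[:, 0] @ xyz                          # (n_times,)
```

Because the sign of a singular vector is arbitrary, the returned time course may be flipped relative to the underlying source; this does not affect the subsequent correlation-based analyses up to sign.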

Multivariate temporal response function analysis (mTRF).

The multivariate temporal response function (mTRF) analysis was used to evaluate correlations between hazard values (i.e., temporal prediction) and EEG source signals. This analysis enables a convolutional regression of a stimulus vector (e.g., HFC) against neural responses (e.g., the time course of a source) and has been widely used in speech research [13] and recently in time research [10]. The formulas are as follows:

r(t,n) = Σ_τ w(τ,n) s(t−τ) + ε(t,n)  (8)

w = (sᵀs + λI)⁻¹ sᵀr  (9)

where r represents the EEG source response for one trial of one participant; t represents a time point in the range between 0.8 and 2 s relative to the cue onset (to match the sampling rate of the source response (250 Hz), the hazard values (10 Hz) were spline-interpolated; interp1.m); n represents an EEG source (~3,000 sources in total); τ represents a time lag, with minimal and maximal lags of −0.1 and 0.2 s and a step of 4 ms; w represents the TRF with dimensions of 76 lags * ~3,000 sources; s represents the HF; ε represents the residual response not explained by the model; I represents the identity matrix; and λ represents the ridge parameter (smoothing constant), set to 1. The analysis was conducted using the MATLAB-based mTRF toolbox. We determined the parameters λ and τ based on the optimal correlation coefficients between hazard values and EEG signals in cross-validation (mTRFcrossval.m). Specifically, we tested the ridge parameter λ across a log-spaced range from 10^−3 to 10^3 (i.e., 10.^(−3:1:3)) and evaluated four combinations of time lag parameters (τ), with minimum lags set to either −0.2 or −0.1 s and maximum lags set to either 0.2 or 0.3 s. In each fold of the outer loop, one trial was held out as the test set, while the remaining N − 1 trials served as the training set. For each combination of the ridge and time lag parameters (inner loops), a model was trained on the training set and tested on the held-out trial. This procedure was repeated for all λ values and time lag combinations (inner loops) across all trials (outer loops), resulting in performance metrics (i.e., correlation coefficients) for each parameter set across all folds. The optimal λ and time lag parameters were selected as those that yielded the highest average correlation across the outer folds.
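Equations (8) and (9) amount to a lagged (convolutional) ridge regression, which can be sketched as follows. This is a simplified single-trial, single-channel version; the mTRF toolbox additionally handles multi-trial training, per-lag scaling, and cross-validation, and the lag and λ values below are arbitrary.

```python
import numpy as np

def lagged_design(s, lags):
    """Design matrix whose columns are the stimulus s shifted by each lag
    (in samples), zero-padded at the edges."""
    n = len(s)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = s[:n - lag]      # column j holds s(t - lag)
        else:
            X[:n + lag, j] = s[-lag:]
    return X

def trf_ridge(s, r, lags, lam=1.0):
    """TRF weights via Eq (9): w = (S'S + lambda*I)^{-1} S'r."""
    S = lagged_design(s, lags)
    return np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ r)
```

When the response is generated by a known lag kernel plus small noise, the ridge solution recovers that kernel, which is the basis of the leave-one-out prediction step described below.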

For each participant, leave-one-out correlation coefficients were evaluated as follows (Fig 3): (1) each trial was temporarily removed from the data; (2) for each source, the responses of the remaining (n − 1) trials were modeled against the corresponding hazard values (mTRFtrain.m); (3) the output was the TRF w; (4) for each source, the TRF and the HF of the removed trial were used to reconstruct a predicted response for the removed trial; and (5) the Pearson correlation coefficient between the predicted response and the actual response was estimated for each source (mTRFpredict.m). For FP1, the response was modeled against HFU. For FP2, the response was modeled against (1) HFU only, (2) HFC only, (3) HFU and HFC, and (4) both with the interaction term. For each participant, the final correlation coefficients (n trials × ~3,000 sources) were averaged across trials. The source data were then projected to the voxel level of the brain (7,109,137 voxels).
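The leave-one-out steps (1)–(5) above can be sketched as follows, assuming the lagged design matrix has already been built for each trial (a NumPy illustration on synthetic data with hypothetical names, not the toolbox code):

```python
import numpy as np

def ridge_fit(S, r, lam=1.0):
    # Closed-form ridge solution, as in Eq. (9).
    return np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ r)

rng = np.random.default_rng(1)
n_trials, n_samp, n_lags = 20, 300, 10
S_all = rng.standard_normal((n_trials, n_samp, n_lags))  # per-trial lagged HF design
w_true = rng.standard_normal(n_lags)
R_all = S_all @ w_true + 0.5 * rng.standard_normal((n_trials, n_samp))  # source responses

r_loo = np.empty(n_trials)
for k in range(n_trials):                               # (1) hold out trial k
    train = [i for i in range(n_trials) if i != k]
    S_tr = S_all[train].reshape(-1, n_lags)             # (2) pool the remaining n-1 trials
    r_tr = R_all[train].reshape(-1)
    w = ridge_fit(S_tr, r_tr)                           # (3) TRF w from the training trials
    pred = S_all[k] @ w                                 # (4) reconstruct the held-out response
    r_loo[k] = np.corrcoef(pred, R_all[k])[0, 1]        # (5) Pearson r for this trial

score = r_loo.mean()  # final per-source metric: average across trials
```

The per-trial correlations in `r_loo` correspond to the trial-wise coefficients that are averaged before projection to the voxel level.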

We also modeled the response against shuffled hazard values as a benchmark control. For each trial and participant, we segmented the hazard values into five parts (e.g., 250 values with 50 values per part) and rearranged the order of these parts. Compared with total randomization, this shuffling alters the HF only partially, preserving its local structure. For example, a shuffle with the order 2-1-3-5-4 would yield (51st–100th)-(1st–50th)-(101st–150th)-(201st–250th)-(151st–200th). For the shuffled HFs used as controls, we found significant correlates only for the shuffled HFU (S15 Fig) and excluded them from the correlates shown in Fig 4A.
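The segment-wise shuffle can be sketched as follows (a minimal illustration, assuming equal-length parts; function and variable names are hypothetical):

```python
import numpy as np

def segment_shuffle(hf, n_parts=5, rng=None):
    """Split the hazard series into n_parts equal segments and permute
    the segment order, preserving the values within each segment."""
    rng = rng or np.random.default_rng()
    parts = np.array_split(hf, n_parts)
    order = rng.permutation(n_parts)
    return np.concatenate([parts[i] for i in order]), order

hf = np.arange(250)  # 250 hazard values -> five parts of 50 values each
shuffled, order = segment_shuffle(hf, rng=np.random.default_rng(2))
# e.g., an order of [1, 0, 2, 4, 3] (0-indexed) corresponds to the
# (51st-100th)-(1st-50th)-(101st-150th)-(201st-250th)-(151st-200th) example.
```

Because only the order of whole segments changes, the shuffled control retains the local temporal dynamics of the HF while disrupting its global shape.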

Statistics and visualization.

To assess whether the correlation coefficients significantly differed from zero, we performed group-level analyses comprising source interpolation, volume normalization, and source statistics. For each participant, the sources, each containing a correlation coefficient, were interpolated back to the individual MRI image (ft_sourceinterpolate.m). The anatomical layout was then normalized to the standard brain map, with parameters set to T1.nii in SPM12 and a nonlinear transformation (ft_volumenormalise.m). We tested whether the correlation coefficient of each brain voxel was significantly above 0, or significantly different from the one obtained using different variables in the models (e.g., HFU only versus HFU + HFC), across participants. The parameters were set to the Monte Carlo method, cluster-based correction, 1,000 randomizations, and an alpha level of 0.05 (ft_sourcestatistics.m). Specifically, the Monte Carlo method is a nonparametric randomization approach that estimates statistical significance by generating a null distribution through repeated permutations of the data. In each permutation, the data from the two conditions (e.g., Model 1 performance versus zero performance) were either left unchanged or sign-flipped within each participant to simulate the null hypothesis. Group-level test statistics (i.e., t-values) were then computed across all voxels for each permutation. To correct for multiple comparisons, we employed the cluster-based maxsum method. Voxel-wise t-values were thresholded at a cluster-forming alpha level of 0.025 (0.05/2 for two-sided testing). Spatially adjacent supra-threshold voxels were grouped into clusters, and each cluster was assigned a statistic equal to the sum of its t-values. Only the largest cluster-level statistic was kept per permutation. We performed 1,000 permutations to form a null distribution of maximum cluster sums.
An observed cluster was considered statistically significant if its summed t-value exceeded the corresponding percentile of the null distribution (alpha of 0.05/2 for two-sided testing). Significant clusters were then visualized on cortical surfaces of the MNI brain, with parameters set to nearest projection and no lighting (ft_sourceplot.m).
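The sign-flip permutation with maxsum cluster correction can be illustrated with a simplified one-dimensional sketch (contiguous-run adjacency stands in for 3-D voxel neighborhoods, and positive and negative clusters are pooled via absolute t-values; function names are hypothetical, and this is not the FieldTrip implementation):

```python
import numpy as np

def maxsum_clusters(t, thresh):
    """Sum |t|-values within contiguous supra-threshold runs (1-D toy
    adjacency) and return the largest cluster sum (0 if none)."""
    best, cur = 0.0, 0.0
    for v in t:
        if abs(v) > thresh:
            cur += abs(v)
            best = max(best, cur)
        else:
            cur = 0.0
    return best

def sign_flip_test(X, n_perm=1000, thresh=2.0, seed=0):
    """One-sample cluster permutation test: X is participants x voxels of
    correlation coefficients (or condition differences) tested against zero."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    t_obs = X.mean(0) / (X.std(0, ddof=1) / np.sqrt(n))
    obs = maxsum_clusters(t_obs, thresh)
    null = np.empty(n_perm)
    for p in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(n, 1))  # sign-flip per participant
        Xp = X * flips
        tp = Xp.mean(0) / (Xp.std(0, ddof=1) / np.sqrt(n))
        null[p] = maxsum_clusters(tp, thresh)         # keep only the max cluster sum
    pval = (null >= obs).mean()                       # Monte Carlo p-value
    return obs, pval
```

Keeping only the maximum cluster sum per permutation is what controls the family-wise error rate: a single threshold drawn from the null distribution of maxima bounds the probability of any false-positive cluster.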

Supporting information

S1 Text. Supplementary methods for calculating different blurred models.

(DOCX)

S1 Table. False alarm trial numbers across blocks.

(DOCX)

S2 Table. Average reaction times.

(DOCX)

S3 Table. Comparisons of linear mixed-effects models on reaction times (false alarms only vs. false alarms and reaction times outliers excluded).

(DOCX)

S4 Table. Average reaction times following FP2.

(DOCX)

S5 Table. Outlier trial numbers across four blocks.

(DOCX)

S6 Table. Average reaction times across different trials.

(DOCX)

S7 Table. Effect of HFU on log-transformed reaction times following FP1.

(DOCX)

S8 Table. Comparisons of linear mixed-effect models on log-transformed reaction times following FP2.

(DOCX)

S9 Table. Effects of HFU and HFC on log-transformed reaction times following FP2.

(DOCX)

S10 Table. Effect of actual HFU on reaction times following FP1.

(DOCX)

S11 Table. Comparisons of linear mixed-effect models on reaction times following FP2.

(DOCX)

S12 Table. Effects of actual HFU and HFC on reaction times following FP2.

(DOCX)

S13 Table. Effects of HFU and HFC on reaction times following FP2 (1.1- and 1.3-sec FP removed).

(DOCX)

S1 Fig. Average and predicted reaction times across participants.

The average reaction time following FP2 for each block is shown by the black line, with dominant foreperiod durations marked by black circles. The shaded area represents the standard deviation. Predicted reaction times are shown by the red line, with dominant foreperiod durations marked by red squares. For FP2, predicted reaction times were obtained from Models 1, 2, and 3. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S2 Fig. Splined-interpolated hazard values.

The original hazard values at a 10-Hz sampling rate are represented in orange, while the interpolated values at a 250-Hz sampling rate are represented in blue. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S3 Fig. Temporal response function of HFU.

Within the significant area for HFU-only in Fig 4A, the positive and negative average TRF (i.e., weight) between source responses and HFU values at a 0-s lag is shown. Values were normalized between −1 and 1. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S4 Fig. Temporal response function of HFC.

Within the significant area for HFC-only in Fig 4B, the positive and negative average TRF between source responses and HFC values at a 0-s lag is shown. Values were normalized between −1 and 1. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S5 Fig. Temporal response functions of HFU, HFC, and their interaction.

Within the significant area for HFU+ + HFC+ + HFU*HFC in Fig 4C, the positive and negative average TRFs between source responses and each of those three values at a 0-s lag are shown. Values were normalized between −1 and 1. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S6 Fig. Event-related potentials (ERP) during FP1 and FP2.

We averaged processed EEG signals in foreperiod trials lasting >0.7 s. Time zero represents the onset of the warning signal. Differences between ERPs during FP1 and FP2 (FP1 − FP2) were tested using the Monte Carlo method with cluster-based correction (1,000 randomizations, a two-tailed test, and an alpha level of 0.05). Triangles in the topography and gray shades in the time course represent significance. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S7 Fig. R-squared obtained using different representations of the temporal predictions.

The probability distribution is denoted as PDF, the hazard function as HF, temporally blurred as temp-blurred, and probabilistically blurred as prob-blurred. (A) The adjusted R-squared was obtained from each participant for FP1, and the conditional R-squared was obtained (with FP1 duration as a random effect). The black circle represents the mean of the R-squared across participants. (B) The mean R-squared was obtained across participants across blurring parameters ranging from 0.15 to 0.35 in steps of 0.1. The results on the left and right columns were obtained from the regression model formulas (3) and (7), respectively. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S8 Fig. Average reaction times following FP2.

The visualization of S3 Table. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S9 Fig. Probability distributions of the actual trial configuration averaged across participants.

The red line represents the hazard function derived from the probability distribution function in Fig 1D. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S10 Fig. The actual trial configuration from one participant.

There were 50 trials of the two-foreperiod sequence, resulting in a total of 100 foreperiod trials for each block. (A) Initial trial numbers for each participant and block were determined based on a 50−50 probability distribution. (B) For each participant, foreperiods were randomly selected in the ranges of L and S, and paired as LL, SS, LS, or SL. An example of actual trial numbers for one participant is shown. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S11 Fig. Bootstrap distribution of adjusted R-squared values based on 1,000 resamples of reaction times following FP1 for each participant.

Reaction times were regressed against HFU in the linear regression model. The blue line represents the actual adjusted R-squared value; the green line represents the mean of the bootstrap distribution; and the red dashed lines represent the 95% confidence interval. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S12 Fig. Bootstrap distribution of conditional R-squared values based on 1,000 resamples of reaction times following FP2 for each participant.

Reaction times were regressed against HFU, HFC, and the interaction term in the linear mixed-effects model, with the FP1 duration as a random effect. The representation is the same as in S11 Fig. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S13 Fig. R-squared obtained using hazard functions from the actual trial configuration.

The representations are the same as those in S7A Fig. The results in the left and right columns were obtained from the regression model formulas (3) and (7), respectively. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S14 Fig. Average and predicted reaction times across participants.

The same representations as Fig 2 are used. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)

S15 Fig. Neural correlates of original and shuffled HFU.

EEG source responses during FP1 were modeled against the shuffled HFU as a control in panel A and against the original HFU in panel B. Four cortical surfaces with significant correlation coefficients are shown. The color bar shows correlation coefficients. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

(EPS)


Acknowledgments

We thank Lu Li and Junko Taniai for helping with participant recruitment and experiment preparation. We also thank Felix B. Kern for proofreading.

Abbreviations

AIC: Akaike’s Information Criterion
BIC: Bayesian Information Criterion
EEG: electroencephalogram
HF: hazard function
mTRF: multivariate temporal response function analysis
TRF: temporal response function

Data Availability

The raw data, along with the data underlying each table, figure, and supporting table and figure, are publicly available in the Open Science Framework (https://doi.org/10.17605/OSF.IO/VEDHP).

Funding Statement

This study was supported by World Premier International Research Center Initiative (WPI), MEXT, Japan (to Z.C.C.) (https://www.jsps.go.jp/english/e-toplevel/) and Japan Society for the Promotion of Science, Japan (to Y.T.H) (https://www.jsps.go.jp/english/e-pd/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Morillon B, Schroeder CE. Neuronal oscillations as a mechanistic substrate of auditory temporal prediction. Ann N Y Acad Sci. 2015;1337(1):26–31. doi: 10.1111/nyas.12629
2. Sørensen TA, Vangkilde S, Bundesen C. Components of attention modulated by temporal expectation. J Exp Psychol Learn Mem Cogn. 2015;41(1):178–92. doi: 10.1037/a0037268
3. Vangkilde S, Coull JT, Bundesen C. Great expectations: temporal expectation modulates perceptual processing speed. J Exp Psychol Hum Percept Perform. 2012;38(5):1183–91. doi: 10.1037/a0026343
4. Rohenkohl G, Cravo AM, Wyart V, Nobre AC. Temporal expectation improves the quality of sensory information. J Neurosci. 2012;32(24):8424–8. doi: 10.1523/JNEUROSCI.0804-12.2012
5. Luce RD. Response times: their role in inferring elementary mental organization. Oxford University Press; 1991. doi: 10.1093/acprof:oso/9780195070019.001.0001
6. Nobre A, Correa A, Coull J. The hazards of time. Curr Opin Neurobiol. 2007;17(4):465–70. doi: 10.1016/j.conb.2007.07.006
7. Coull JT, Cotti J, Vidal F. Differential roles for parietal and frontal cortices in fixed versus evolving temporal expectations: dissociating prior from posterior temporal probabilities with fMRI. Neuroimage. 2016;141:40–51. doi: 10.1016/j.neuroimage.2016.07.036
8. Janssen P, Shadlen MN. A representation of the hazard rate of elapsed time in macaque area LIP. Nat Neurosci. 2005;8(2):234–41. doi: 10.1038/nn1386
9. Herbst SK, Obleser J. Implicit variations of temporal predictability: shaping the neural oscillatory and behavioural response. Neuropsychologia. 2017;101:141–52. doi: 10.1016/j.neuropsychologia.2017.05.019
10. Herbst SK, Fiedler L, Obleser J. Tracking temporal hazard in the human electroencephalogram using a forward encoding model. eNeuro. 2018;5(2):ENEURO.0017-18.2018. doi: 10.1523/ENEURO.0017-18.2018
11. Los SA, van den Heuvel CE. Intentional and unintentional contributions to nonspecific preparation during reaction time foreperiods. J Exp Psychol Hum Percept Perform. 2001;27(2):370–86. doi: 10.1037//0096-1523.27.2.370
12. Crosse MJ, Di Liberto GM, Bednar A, Lalor EC. The Multivariate Temporal Response Function (mTRF) Toolbox: a MATLAB toolbox for relating neural signals to continuous stimuli. Front Hum Neurosci. 2016;10:604. doi: 10.3389/fnhum.2016.00604
13. Di Liberto GM, O’Sullivan JA, Lalor EC. Low-frequency cortical entrainment to speech reflects phoneme-level processing. Curr Biol. 2015;25(19):2457–65. doi: 10.1016/j.cub.2015.08.030
14. Vallesi A, McIntosh AR, Shallice T, Stuss DT. When time shapes behavior: fMRI evidence of brain correlates of temporal monitoring. J Cogn Neurosci. 2009;21(6):1116–26. doi: 10.1162/jocn.2009.21098
15. Vallesi A, McIntosh AR, Stuss DT. Temporal preparation in aging: a functional MRI study. Neuropsychologia. 2009;47(13):2876–81. doi: 10.1016/j.neuropsychologia.2009.06.013
16. Narayanan NS, Horst NK, Laubach M. Reversible inactivations of rat medial prefrontal cortex impair the ability to wait for a stimulus. Neuroscience. 2006;139(3):865–76. doi: 10.1016/j.neuroscience.2005.11.072
17. Narayanan NS, Laubach M. Top-down control of motor cortex ensembles by dorsomedial prefrontal cortex. Neuron. 2006;52(5):921–31. doi: 10.1016/j.neuron.2006.10.021
18. Oshio K, Chiba A, Inase M. Delay period activity of monkey prefrontal neurones during duration-discrimination task. Eur J Neurosci. 2006;23(10):2779–90. doi: 10.1111/j.1460-9568.2006.04781.x
19. Visalli A, Capizzi M, Ambrosini E, Mazzonetto I, Vallesi A. Bayesian modeling of temporal expectations in the human brain. Neuroimage. 2019;202:116097. doi: 10.1016/j.neuroimage.2019.116097
20. Visalli A, Capizzi M, Ambrosini E, Kopp B, Vallesi A. Electroencephalographic correlates of temporal Bayesian belief updating and surprise. Neuroimage. 2021;231:117867. doi: 10.1016/j.neuroimage.2021.117867
21. Coull J, Nobre A. Dissociating explicit timing from temporal expectation with fMRI. Curr Opin Neurobiol. 2008;18(2):137–44. doi: 10.1016/j.conb.2008.07.011
22. Bastos AM, Lundqvist M, Waite AS, Kopell N, Miller EK. Layer and rhythm specificity for predictive routing. Proc Natl Acad Sci U S A. 2020;117(49):31459–69. doi: 10.1073/pnas.2014868117
23. Chao ZC, Huang YT, Wu C-T. A quantitative model reveals a frequency ordering of prediction and prediction-error signals in the human brain. Commun Biol. 2022;5(1):1076. doi: 10.1038/s42003-022-04049-6
24. Grabenhorst M, Michalareas G, Maloney LT, Poeppel D. The anticipation of events in time. Nat Commun. 2019;10(1):5802. doi: 10.1038/s41467-019-13849-0
25. Grabenhorst M, Maloney LT, Poeppel D, Michalareas G. Two sources of uncertainty independently modulate temporal expectancy. Proc Natl Acad Sci U S A. 2021;118(16):e2019342118. doi: 10.1073/pnas.2019342118
26. Bueti D, Bahrami B, Walsh V, Rees G. Encoding of temporal probabilities in the human brain. J Neurosci. 2010;30(12):4343–52. doi: 10.1523/JNEUROSCI.2254-09.2010
27. Nobre AC, van Ede F. Anticipated moments: temporal structure in attention. Nat Rev Neurosci. 2018;19(1):34–48. doi: 10.1038/nrn.2017.141
28. Doelling KB, Arnal LH, Assaneo MF. Adaptive oscillators support Bayesian prediction in temporal processing. PLoS Comput Biol. 2023;19(11):e1011669. doi: 10.1371/journal.pcbi.1011669
29. Shahi S, Fenton FH, Cherry EM. Prediction of chaotic time series using recurrent neural networks and reservoir computing techniques: a comparative study. Mach Learn Appl. 2022;8:100300. doi: 10.1016/j.mlwa.2022.100300
30. Yuan Z, Wiggins G, Botteldooren D. A novel Reservoir Architecture for Periodic Time Series Prediction. In: 2024 International Joint Conference on Neural Networks (IJCNN); 2024. p. 1–8. doi: 10.1109/ijcnn60899.2024.10650592
31. Vallesi A, Mussoni A, Mondani M, Budai R, Skrap M, Shallice T. The neural basis of temporal preparation: insights from brain tumor patients. Neuropsychologia. 2007;45(12):2755–63. doi: 10.1016/j.neuropsychologia.2007.04.017
32. Bekinschtein TA, Dehaene S, Rohaut B, Tadel F, Cohen L, Naccache L. Neural signature of the conscious processing of auditory regularities. Proc Natl Acad Sci U S A. 2009;106(5):1672–7. doi: 10.1073/pnas.0809667106
33. Wacongne C, Labyt E, van Wassenhove V, Bekinschtein T, Naccache L, Dehaene S. Evidence for a hierarchy of predictions and prediction errors in human cortex. Proc Natl Acad Sci U S A. 2011;108(51):20754–9. doi: 10.1073/pnas.1117807108
34. Kleiner M, Brainard D, Pelli D. What’s new in psychtoolbox-3. Perception. 2007;36(14):1–16.
35. Pelli DG. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat Vis. 1997;10(4):437–42. doi: 10.1163/156856897x00366

Decision Letter 0

Christian Schnell, PhD

30 Jun 2025

Dear Dr Huang,

Thank you for submitting your manuscript entitled "Temporal Prediction through Integration of Probability Distributions of Event Timings at Multiple Levels" for consideration as a Research Article by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff as well as by an academic editor with relevant expertise and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. After your manuscript has passed the checks it will be sent out for review. To provide the metadata for your submission, please Login to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e. by Jul 02 2025 11:59PM.

If your manuscript has been previously peer-reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like us to consider previous reviewer reports, please edit your cover letter to let us know and include the name of the journal where the work was previously considered and the manuscript ID it was given. In addition, please upload a response to the reviews as a 'Prior Peer Review' file type, which should include the reports in full and a point-by-point reply detailing how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Christian

Christian Schnell, PhD

Senior Editor

PLOS Biology

cschnell@plos.org

Decision Letter 1

Christian Schnell, PhD

17 Sep 2025

Dear Dr Huang,

Thank you for your patience while we considered your revised manuscript "Temporal Prediction through Integration of Probability Distributions of Event Timings at Multiple Levels" for publication as a Research Article at PLOS Biology. This revised version of your manuscript has been evaluated by the PLOS Biology editors, the Academic Editor and the original reviewers.

Based on the reviews and on our Academic Editor's assessment of your revision, we are likely to accept this manuscript for publication, provided you satisfactorily address the following data and other policy-related requests. We note that Reviewer 1 still has a large number of concerns and is not convinced that your model provides a good fit for the experimental data. After discussing these concerns with the other reviewers and the Academic Editor, we think that these are important concerns and do not dismiss them, but still think that your study provides a valuable contribution to the literature, noting that further studies are necessary.

* We would like to suggest a different title to improve its accessibility for our broad audience:

"Humans integrate both unconditional and conditional timing expectations into the timing of their behavioral response"

* Please add the links to the funding agencies in the Financial Disclosure statement in the manuscript details.

* Please include information in the Methods section whether the study has been conducted according to the principles expressed in the Declaration of Helsinki.

* DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: 4D, S7A and S13.

NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

* CODE POLICY

Per journal policy, if you have generated any custom code during the course of this investigation, please make it available without restrictions. Please ensure that the code is sufficiently well documented and reusable, and that your Data Statement in the Editorial Manager submission system accurately describes where your code can be found.

Please note that we cannot accept sole deposition of code in GitHub, as this could be changed after publication. However, you can archive this version of your publicly available GitHub code to Zenodo. Once you do this, it will generate a DOI number, which you will need to provide in the Data Accessibility Statement (you are welcome to also provide the GitHub access information). See the process for doing this here: https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

In addition to these revisions, you may need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests shortly. If you do not receive a separate email within a few days, please assume that checks have been completed, and no additional changes are required.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable, if not applicable please do not delete your existing 'Response to Reviewers' file.)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://plos.org/published-peer-review-history/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Christian

Christian Schnell, PhD

Senior Editor

cschnell@plos.org

PLOS Biology

------------------------------------------------------------------------

Reviewer remarks:

Reviewer #1: ===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1:

The authors have replied to all of my comments in writing and provided several additional analyses. To facilitate easy reference to the previous round of review, I briefly reply to the authors in a comment-by-comment fashion below. But first, let me preface my review with an important aspect of the modeling work that has not yet been resolved in the revised manuscript.

Preface

The revised manuscript does not offer convincing evidence that the hazard-function-based models provide an adequate fit to the reaction time data over the range of foreperiods.

Quantitatively, the goodness-of-fit is weak (numerically small adj. R2: FP1: 0.173, Table 1; FP2: 0.215 to 0.221, Table 2). Also the beta values are very small (-0.008 to -0.049), indicating weak correlation. Suppl. Fig. 2 depicts this weak correlation.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

We thank the reviewer for the careful evaluation and for highlighting important issues regarding the model fit and interpretation of our results. Before addressing specific concerns, we would like to clarify a terminology issue. In the original manuscript, we mistakenly referred to the model fit metric as "adjusted R2". However, since we used a linear mixed-effects model, the appropriate metric is "conditional R2", which accounts for both fixed and random effects. Importantly, this was a labeling error only; the calculations and reported values are correct and remain unchanged. We have revised the terminology throughout the manuscript and supplementary materials to accurately reflect this correction. Below, we respond point by point to the comments.
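For reference, the distinction between the two metrics can be sketched with the variance-components formulation of Nakagawa and Schielzeth (2013); the variance values in the example below are hypothetical, not taken from the manuscript:

```python
def conditional_r2(var_fixed, var_random, var_residual):
    """Conditional R2 of a linear mixed-effects model: the proportion
    of variance explained jointly by fixed and random effects."""
    explained = var_fixed + var_random
    return explained / (explained + var_residual)

def marginal_r2(var_fixed, var_random, var_residual):
    """Marginal R2: the proportion explained by fixed effects alone."""
    return var_fixed / (var_fixed + var_random + var_residual)

# hypothetical variance components for illustration:
print(conditional_r2(0.02, 0.03, 0.05))  # fixed + random share
print(marginal_r2(0.02, 0.03, 0.05))     # fixed-effects share
```

Because the conditional R2 also credits variance captured by random effects (e.g., per-participant intercepts), it is always at least as large as the marginal R2 for the same model.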

Response 1: Model fit

We acknowledge the reviewer's concern regarding the modest conditional R2 values (FP1: 0.173; FP2: 0.215-0.221). These values were obtained from analyses in which only false alarm trials were excluded, while outliers and the first 20 trials of each block were retained. This decision was made in response to the previous Comment 4 and reflects our approach of minimizing assumptions about the underlying processes of outlier trials.

To examine whether trial-level noise from outliers contributed to the modest fit, we conducted a follow-up analysis using a stricter dataset in which both false alarms and RT outliers (defined as ±2.5 SD) were excluded. This resulted in consistently improved conditional R2 values (see details below):

* FP1: Conditional R2 increased to 0.262

* FP2: Conditional R2 increased to 0.295-0.304

These updated results are now reported in Supplementary Table 3. The main text has also been revised to indicate that trial-level variability attenuated the explained relationship between HF and RT in our original analysis. Additionally, we tested model fit using log-transformed RTs and obtained conditional R2 values of 0.301 for FP1 and 0.340-0.353 for FP2. However, in the main text, we chose to retain the original data selection and analysis approach, which excludes only false alarms and uses non-transformed RTs. This decision was based on two considerations: first, to avoid introducing additional assumptions about the underlying processes of outliers or about normally distributed reaction times; and second, although the R2 values are not high, they are reasonable compared to those reported in existing publications, as we discuss below.
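The stricter exclusion step described here can be sketched as follows (a minimal illustration; the per-participant and per-block grouping and the false-alarm handling used in the actual analysis are not reproduced):

```python
import numpy as np

def exclude_rt_outliers(rts, sd_threshold=2.5):
    """Drop reaction times more than sd_threshold SDs from the mean.

    Minimal sketch; real pipelines typically compute the threshold
    per participant (and per condition), which is omitted here."""
    rts = np.asarray(rts, dtype=float)
    mu, sd = rts.mean(), rts.std()
    keep = np.abs(rts - mu) <= sd_threshold * sd
    return rts[keep], keep

# 20 typical RTs around 0.30 s plus one slow outlier at 1.50 s:
rts = np.concatenate([np.full(20, 0.30), [1.50]])
clean, keep = exclude_rt_outliers(rts)
```

Note that with very few trials a single extreme value inflates the SD enough to survive the cutoff, which is one reason such thresholds are usually computed over a whole participant's data.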

-----SUPPL TABLE 3

Comparison with previous studies

Our study yielded modest conditional R2 values, but they are reasonable when compared to prior studies, particularly considering: (1) the use of a more complex probability distribution structure, (2) the inclusion of a full range of foreperiods, and (3) the introduction of both unconditional and conditional probability regularities within a single task. We elaborate on each of these below and summarize them in the table that follows.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

Before we get lost in the argumentative weeds here: Even after sub-selecting data, the fits remain quantitatively weak (R-squared <= 0.3). For the best-fitting model of behavior of the manuscript, these values are not "reasonable" at all.

In the table that compares R-squared across different studies, the authors provide a selection of four published studies. Two of these studies report R-squared values between 0.429 and 0.53 (Herbst 2018, Bueti 2010). Obviously, the authors included these two studies to argue that the small R-squared values observed in their own work are ok. First, these two studies offer better fits to data with substantially larger R-squared than the current manuscript. Second, the standard for what is a "reasonable" fit of HF to RT is much higher than the authors want to make us believe here by including these two papers. In the temporal prediction literature there are many papers that offer much better HF model fits to RT with larger R-squared. E.g.:

Pasquereau et al. J Neurophysiol 2015: Fig. 2e&f, R-squared = [0.82, 0.97]

Sharma et al. Cereb Cortex 2015: Fig 2b, R-squared = 0.972

Tsunoda et al. Neurosci Lett, 2008: Fig 4B, R-squared = [0.91, 0.93]

In the following sections, the authors try to explain why their HF fits are weak. It appears that the authors make a crucial mistake here: The question is not "Why does the hazard function not fit the data well?", as the authors seem to think, but rather: "What computational model fits the data really well?".

The very weak quantitative evidence for the hazard function hypothesis reported here does not justify the use of the model as a regressor on neural data.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

---TABLE

1. Studies using simple probability distributions yield better fit

For instance, Herbst et al. (2018) applied a linear mixed-effects model using log-transformed RTs and HF as a fixed effect, reporting conditional R2 values of 0.429-0.448 for unimodal and uniform distributions, respectively. On the other hand, Grabenhorst et al. (2019) reported even higher model fits (adjusted R2 values approaching 1; see their Supplementary Figure 13) using exponential and flipped distributions without catch trials.

Although these studies reported higher R2 values, a key difference lies in the simplicity of their probability structures, which typically produce hazard functions or probability distributions with a single peak or a monotonic increase used for model fitting. In contrast, our bimodal distributions generate dual expectancy peaks, resulting in more complex temporal expectations that cannot be adequately captured by, for example, a simple monotonic drift in brain signals.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The authors argue that the small R-squared values of their models result from the higher complexity of their task. This argument does not logically support their model choice. Whatever the complexity of the task, an adequate model needs to capture the behavior, resulting in larger R-squared. If the HF does not fit the data well, then it is not a good model.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

2. Design issues in studies using bimodal probability distributions

Some studies have reported high R2 values with bimodal distributions. For example, Janssen and Shadlen (2005) reported values of 0.95 and 0.96 in two non-human primates, and Bueti and Macaluso (2010) reported a value of 0.53 in humans. However, these studies did not include a full range of foreperiods. In Janssen and Shadlen (2005), for instance, the bimodal distribution had peaks around 0.27 and 1.93 seconds, but no trials were presented between 0.75 and 1.75 seconds (i.e., between the peaks). Intermediate foreperiods with low probability, where presumably few trials occur and uncertainty is high, were thus omitted. This omission reduces trial-by-trial variability and likely improves model fit.

In contrast, our design incorporated a nearly continuous range of foreperiods (excluding only 1.2 seconds to avoid categorical confusion between short and long FPs). This allowed us to capture the full dynamics of the hazard function, providing a more comprehensive and naturalistic test of temporal prediction over time. However, this approach also introduced greater variability, which can reduce R2. Notably, this comprehensive design was not unique to our study; it was also adopted in the studies above that used simpler probability structures.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The authors state that their design incorporates a full range of foreperiods and they argue that this introduces greater variability [in RT]. The authors again attribute their weak model fit to differences between other HF papers' tasks and their own task. I can only repeat myself: If the HF does not fit the data well, then it is not a good model. An adequate model needs to better capture the behavior.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

3. Task complexity

Finally and most critically, our study is the first to combine two probability distributions within a single task. Moreover, while employing a bimodal distribution, we also varied its structure across blocks (e.g., equal 50-50 peak contributions vs. asymmetric 20-80 peaks). This manipulation was essential for disentangling unconditional and conditional probability structures and also for assessing how the brain generalizes across different probabilistic rules.

Such complexity can increase trial-by-trial (and inter-individual) variability in learning and strategy, which in turn can reduce the explained variance in model fitting. Still, we would like to emphasize that despite this added complexity, our key behavioral and neural findings remain robust and statistically significant. All major results, including significant behavioral and neural correlations with HFU, HFC, and the interaction term, were consistently verified and shown through control and additional analyses.

We have included the Supplementary Table 3 in the revised Methods:

"While trial-level variability from outliers attenuated the model fit (see the comparisons in Supplementary Table 3), we emphasize that the overall conclusions remain unchanged, and the robustness of our results can be further demonstrated by analyses using nearly the entire dataset." (Page 30, Lines 15-18)

We have also discussed this in the revised Discussion:

"We also acknowledge that the conditional R2 values in our behavioral data appear modest, and the relatively poor model fits can be attributed to several factors. First, we included reaction time outliers and used non-log-transformed data in our analysis (see details in Methods) to avoid introducing additional assumptions. We verified the contribution of trial-level noise from outliers to the model fit by demonstrating consistently improved conditional R2 values when outliers were removed (Supplementary Table 3). Second, unlike previous studies that used exponential or unimodal probability distributions, typically resulting in a single peak or a monotonic increase, we employed bimodal distributions that generate dual expectancy peaks. This probability structure produces more complex temporal expectations that cannot be adequately captured by a simple monotonic drift in brain signals. Third, to ensure a complete probability distribution structure, we included trials across the full foreperiod range in both our design and analysis, even though some foreperiod durations had few trials, likely introducing trial noise. This allowed us to more comprehensively capture the evolution of the hazard function over time. Finally, our design is the first to create two probability regularities, and to achieve this, we designed various bimodal distributions across blocks (e.g., equal 50-50 peak contributions vs. asymmetric 20-80 peaks). Such complexity might increase trial-by-trial (and inter-individual) variability in learning and strategy, which in turn can increase behavioral noise and reduce the explained variance in models. Still, we would like to emphasize that despite this added complexity, our key behavioral and neural findings remain robust. All major results, including significant behavioral and neural correlates with HFU, HFC, and the interaction term, were consistently verified and shown through control and additional analyses."
(Pages 21-22 Lines 16-13)

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The authors claim: "our key behavioral and neural findings remain robust and statistically significant." I disagree with this assessment. The behavioral results are obviously not robust (yet again: R-squared < 0.3, systematic deviations between data and model, more on that later). The neural findings hinge entirely on this weak modeling of behavior and are thus also far from being robust. Here a few examples of much more robust behavioral modeling that is then related to neural data:

Janssen & Shadlen Nat Neurosci 2005: Figs. 2a&b, R-squared = 0.77 to 0.96

Pasquereau et al. J Neurophysiol 2015: Fig. 2e&f, R-squared = [0.82, 0.97]

Sharma et al. Cereb Cortex 2015: Fig 2b, R-squared = 0.972

I can only repeat myself: If the HF does not fit the data well, then it is not a good model. An adequate model needs to better capture the RT behavior, irrespective of the task complexity. In this manuscript, the authors developed and used a specific, complex task. The authors should also have developed a specific computational model from first principles, i.e. a model that reflects the computational demands of the task. But the authors opted for fitting an off-the-shelf HF to the data and now they are trying their best to argue that it is a good model.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

Response 2: Beta coefficients

The reported beta values were unstandardized and therefore influenced by the scale of the predictor variables. To improve interpretability, we have added standardized beta coefficients, which range from 0.04 to 0.21 across models. While modest in magnitude, these effects are statistically significant and consistent across models and analyses. The standardized beta values have been incorporated into Tables 1 and 3 (also shown below) as well as Supplementary Tables 7, 9, 10, and 12.
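To illustrate why unstandardized betas depend on the scale of the predictors, here is a single-predictor sketch on simulated data (ordinary least squares, not the manuscript's mixed-effects models; the hazard values and RTs below are hypothetical). In this univariate case the standardized slope coincides with the Pearson correlation:

```python
import numpy as np

def standardized_beta(x, y):
    """Slope after z-scoring both variables; scale-free by construction.
    In the single-predictor OLS case this equals the Pearson r."""
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    return float(np.polyfit(zx, zy, 1)[0])

rng = np.random.default_rng(0)
hf = rng.uniform(0.0, 1.0, 200)                      # hypothetical hazard values
rt = 0.35 - 0.05 * hf + rng.normal(0.0, 0.05, 200)   # simulated RTs (s)

b_unstd = float(np.polyfit(hf, rt, 1)[0])  # depends on the units of hf and rt
b_std = standardized_beta(hf, rt)          # bounded in [-1, 1]
```

Rescaling `hf` (say, multiplying it by 100) would shrink `b_unstd` a hundredfold while leaving `b_std` unchanged, which is the interpretability point made in the response above.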

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The authors acknowledge that their models' beta values are numerically "modest in magnitude". In other words: they are very small. Arguing from the standpoint of statistical significance is not convincing here. There will be many other models that yield such small, yet significant, correlation beta values (especially in light of the large data set which facilitates high statistical power...). Therefore models of RT over foreperiod are commonly assessed based on R-squared and on visual inspection of the fit according to commonly accepted criteria (more on that below).

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

---TABLE 1

---TABLE 3

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1:

Qualitatively, the RT modulation (Suppl. Fig. 1) does not resemble the dynamics of the (mirrored) HF variables (Fig 1E).

Since all of the manuscript's main results hinge on the HF fits to RT, more must be done to convince the reader that the HF is indeed an appropriate model of RT (and a neurobiologically relevant computation).

First, it is crucial for the reader to see the average RT over foreperiod (group-level) plots in the manuscript (not in the Supplement) so that the reader can assess the modulation of RT.

Second, these average RT over foreperiod (group-level) plots need to show the dynamics of the RT modulation in a clear way. The current Suppl. Fig. 1 does not do this sufficiently: The y-axis spans close to 1 s, while the RT modulation spans less than 0.1 s, making it hard to see any pattern. All the boxes, error bars, outlier dots etc. in the plots further obscure a clear view of the RT modulation.

The authors need to plot average RT over foreperiod and rescale the y-axis so that the RT modulation over foreperiods becomes clearly visible. See e.g. these papers for intuitive average RT over foreperiod plots: Janssen & Shadlen, Nat Neuro 2005, Bueti et al. J Neurosci 2010, Grabenhorst et al. Nat Comms 2019, Schoffelen et al., Science 2005.

Third, Suppl. Fig. 2 shows RT over hazard. This figure does not show the fits *over time*, i.e. over foreperiods. Instead, in Suppl. Fig. 2, data are collapsed across foreperiods. It is therefore not possible to visually inspect the model fits to data *over time* (i.e. over foreperiod). Such an assessment is crucial since a convincing model needs to capture RT dynamics over the range of foreperiods. The average RT over foreperiod plots are the common way to display fits to data in these types of experiments (e.g. Fig. 2 in Janssen & Shadlen, Nat Neuro 2007, which the authors cite).

The authors need to provide plots that show how the HF model fits *group-level* average RT *over time*, i.e. over foreperiods. This should be done for *all models* so that the reader can assess the choice of models based on visual inspection and not only based on fit metrics (adj. R2, AIC, BIC).

[n.b. The authors say that it is challenging to visualize the interaction term in one of their models (their reply to Comment 5). Some ideas: 3D surface plots, line plots across time for different levels of HFX(t), contour plots...]

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

Response 3: Visualizing average RT over foreperiods and model fit across all models

We fully agree with the reviewer that the original Supplementary Figure 1 did not clearly demonstrate RT changes due to its wide y-axis range and overly detailed plotting elements (e.g., boxplots and outliers). In response, we have created a new Figure 2 that shows group-level average RTs over foreperiods in FP1 and FP2 across all four blocks, using a more appropriate y-axis range and simplified plotting (mean with standard deviation shading only). This figure also includes: (1) the predicted RT values following FP1 from Model 1 (HFU only), (2) the predicted values following FP2 from Model 4 (HFU, HFC, and the interaction term), and (3) the actual trial distribution across all foreperiods at the bottom.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The authors now provide plots in which the y-axis is scaled to accommodate the large RT variance.

1) No one has asked for plots of variance. This, again, makes it unnecessarily hard to assess model fit and RT modulation: modulation of RT is much less than 100 ms on most plots, yet the y-axis spans ~170 ms.

2) No one has asked to mark "the foreperiods with dominant trial frequencies" with dots. This just distracts from inspecting the fits. And it is utterly nonsensical: a model of RT over foreperiod needs to fit all the data, not only the foreperiods with dominant trial frequencies.

At this point, after having asked twice for these simple plots, which should have been part of the manuscript from the initial submission, in line with the literature that the authors cite, the authors' plotting strategy seems intended to obscure the weak model fits.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

We elaborate further on them in Response 4 below.

We sincerely thank the reviewer for this suggestion, which helped us overcome the challenge of visualizing Model 4 (HFU, HFC, and the interaction term). Accordingly, we have also revised the visualization of model fits from other models in Supplementary Figure 1 and used actual HF values in Supplementary Figure 14.

---FIG 2

---SUPPL FIG 1

Response 4: Model fit interpretations

We agree with the reviewer that both visual inspection and statistical criteria (e.g., AIC, BIC) are critical for evaluating model performance. Several key points should be emphasized regarding the interpretation of model fits in the new Figure 2.

First, before discussing the model fit in detail, it is important to note that we performed the linear mixed-effects model analysis using data from all four blocks combined. Accordingly, the predicted and actual RTs across all blocks should be considered jointly when evaluating model fit. For example, inspecting the model fit for FP2 involves considering the eight actual and predicted RT curves across the various blocks together.

Second, we note that the predicted RT values (red and orange) generally fall within the range of the observed group mean (black) ± 1 standard deviation (shading), indicating an overall reasonable fit.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The authors argue that a model is reasonable if it predicts with a precision of 1 std of the mean of the to-be-predicted variable. This is definitely not an appropriate definition of a reasonable model fit in RT models in temporal prediction and it appears that the authors are making up their own rules by now.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

However, some discrepancies in the trend can be observed, particularly at short foreperiods (e.g., 0.4-0.6 s), where the actual RTs show a drop (black), while the predicted RTs remain relatively flat (red and orange). Several points are worth noting in this context:

1. The predicted values in these early foreperiods are flat because they were associated with very low probability values, deliberately designed to approach zero. As a result, the derived HF values (both blurred and unblurred) were also close to zero, producing flat predicted RTs.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

Fig 2 shows that there is a systematic deviation between data and model at short foreperiods. This means that the model is not doing a good job at predicting the data. Which means that the authors' computational assumptions about the brain are wrong.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

2. As discussed in Response 1, although these short foreperiod trials were few in number, they were retained to preserve the full hazard function change over time. This approach naturally led to increased variability, reflected in the wider standard deviation shading in those areas.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

This argument does not support the weak model fits. Of course one needs to keep the low probability trials in the analysis. They are part of the probability distribution that the brain has to learn. (n.b. in simple RT tasks average RT and variance of RT are positively correlated. This also holds in probability-based designs and is to be expected.)

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

3. A similar pattern, where RTs vary while HF remains stable, has also been observed in previous work using longer-tailed probability distributions with sparse trials at both ends, such as Herbst et al. (2018).

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The authors argue that their model fit is ok because a phenomenon that their model cannot capture was also not captured by some other model in some other paper. Wouldn't it be more scientific to model this phenomenon and thus offer a quantitative account?

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

To aid visual interpretation, we marked the foreperiods with dominant trial frequencies (0.7, 0.8, 0.9, 1.5, 1.6, and 1.7 s) using dots. These highlighted points more clearly demonstrate that the observed RTs correspond well with both the predicted trend and the peaks in the foreperiod distribution.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

Here the authors marked RT and model with dots at the high-probability-foreperiods. They argue that the model fits the data at these foreperiods. I disagree with this statement because this is obviously not the case. As one can see, the RT dots do not correspond well with the model dots. There are systematic differences between the "RT dots" and the "model dots" (more on that below).

In probabilistic designs, such as the one used here, the probability varies across foreperiods (uniform distribution being the exception). A convincing model needs to fit the data over the ENTIRE RANGE of foreperiods. Note that this is a core assumption of the Hazard Function hypothesis in temporal prediction: The brain computes the continuous HF variable across the entire range of foreperiods.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

Finally, although the model fits for FP2 in Figure 2 and Supplementary Figure 1 appear visually comparable across the four models, statistical comparisons using AIC and BIC nonetheless identified Model 4 (which includes the interaction term) as the best-fitting model.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

As I had already written, model comparison based on AIC/BIC can only be interpreted in *relative* terms. This relative comparison does not offer support for the hazard function as an adequate model at all, it merely indicates that among the compared models, one is the "best". An *absolute* assessment of the model needs to rely on R-squared and beta (both weak) and on visual inspection of the fits (also weak, see below).

Fig. 2 is a central data figure of the manuscript. But in their manuscript, the authors do not describe the dynamics of their main RT variable to the reader. It is an unconventional choice to not at all verbally assess how the experimental task shapes behavior and, instead, go straight to the modeling. But it shows how little the authors want to highlight their RT data. Here is an assessment of RT dynamics and how the models relate to it:

On the data:

- Over short foreperiods (< ~1.2s), RT decreases in all blocks, in "FP1" and "FP2" (block 3, rightmost plot deviates from this pattern to some degree).

- Over long foreperiods (> ~1.2s), in all of the blocks, the difference in RT across the range of foreperiods is numerically very small, spanning only ~0.025s (block 3 rightmost plot and block 4 middle plot deviate from this range since each has one strong outlier RT value). This is a very small RT modulation compared to the temporal prediction RT literature. Also, there are no clear patterns of RT modulation, so maybe we are only looking at random RT. This raises the question whether there are any causal effects of the task's probability manipulation on RT over long foreperiods at all.

On the modeling:

FP1 plots (HFu model):

- In blocks 1 to 4, RT decreases over short foreperiods (< ~1.2s). The model predicts a close to constant RT and does not capture the observed modulation.

- Across blocks 1 to 4, over long foreperiods (> ~1.2s), the model predicts a triphasic pattern of RT ("down-up-down"). This pattern is not observed in RT in any of the blocks and thus the model does not predict the data well. In all blocks the 3 HFu model "dots" predict a steeply decreasing RT which does not capture well the corresponding RT "dots" which either decrease with a much shallower slope (blocks 1 to 3) or show a biphasic pattern ("up-then-down", block 4).

FP2 plots (HFc + HFu*HFc model):

- Over short foreperiods (< ~1.2s), the model does not capture the decrease in RT well. It over-estimates the data in block 1, without capturing the decrease. In block 2, it predicts a close-to-constant RT, whereas RT decreases. In block 3 (middle plot), it under-estimates RT and does not capture its decrease. In block 3 (right plot), the model captures the data better. In block 4 (middle plot), it under-estimates the RT data. In block 4 (right plot), it does not capture the range of RT modulation.

- Over long foreperiods (> ~1.2s), the model predicts either a triphasic "down-up-down" (Blocks 2 & 3) or a tetraphasic "up-down-up-down" (Blocks 1 & 4) RT dynamic over a foreperiod range of ~0.8 s. The data do not justify this modeling assumption. In light of the miniscule difference in RT across the range of foreperiods (~0.025 s), such a polyphasic model is inadequate, at best capturing random noise by chance.

When considered jointly across all blocks, the HFc + HFu*HFc model does not capture the data well.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

Accordingly, we have revised the Results section.

"We also visualized the observed and predicted reaction times across all foreperiods (Figure 2A). A general trend consistent with the HF can be observed, particularly in the foreperiods with dominant trial frequencies, where actual and predicted reaction times closely align. Note that data from all blocks were combined for the model analysis; however, we separated them in the figure for clearer illustration of the actual reaction times across foreperiods alongside the predicted values." (Page 9, Lines 4-10)

"We first visualized the actual and predicted reaction times for all models in Figure 2 and Supplementary Figure 1. Due to the varied block-wise probability distributions, visually inspecting model fit is more challenging, as it requires jointly evaluating all predicted curves across the four blocks. To statistically determine the best-fitting model, we performed model comparisons." (Page 10, Lines 2-6)

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1:

Comment 1

The authors improved the terminology in their manuscript. This is very helpful, as is the table showing the changes. One minor point: In the intro (P3, l14) the authors say that the hazard rate itself is a conditional probability. While this makes sense in itself, it may throw off the reader, who may confuse it with the use of the terms (un-)conditional, as in e.g. Fig. 1D. Maybe rephrase the sentence in the intro to not use the word conditional?

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

We thank the reviewer for pointing this out. To avoid potential confusion with the terminology used later in the manuscript, we have revised the sentence as follows: "This function, derived from the probability distribution of the foreperiod, describes how the probability of the target signal occurring is updated over time, given that it has not yet occurred." (Page 3, Line 13-15)
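For concreteness, the discrete hazard computation that this revised sentence describes can be sketched as follows (the standard definition, illustrated here with a uniform foreperiod distribution; the blurred, subjective-hazard variant used in the manuscript is not implemented):

```python
import numpy as np

def hazard_function(p):
    """Discrete hazard h[t] = p[t] / P(T >= t): the probability of the
    target occurring at time t, given that it has not occurred before t."""
    p = np.asarray(p, dtype=float)
    survival = np.cumsum(p[::-1])[::-1]  # P(target has not yet occurred)
    return p / survival

# With a uniform foreperiod distribution, the hazard rises over time,
# reaching 1 at the last possible foreperiod, when the target is certain:
h = hazard_function([0.25, 0.25, 0.25, 0.25])
```

This monotonic rise under a uniform distribution is the classic explanation for RT decreasing with elapsed foreperiod.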

===========================================================================

REVISION ROUND 3 - RESPONSE REVIEWER #1:

Ok.

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1:

The authors performed new model fits to non-log-transformed RT. This is in line with the bulk of the hazard rate literature and does not add neuro-computational assumptions as the log-transform did. Both improve interpretability. The authors kept the fits to log-transformed RT and added them to the Supplement.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

We appreciate the reviewer's understanding on this point.

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1

The authors reply to my comment that they regressed HF on EEG data, not RT. I am aware of this fact. The crucial point is: if the HF fits to RT are not convincing, this does call into question the HF as an appropriate regressor and thus questions the EEG analysis and the interpretation of the results; see Preface.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

We thank the reviewer for pointing this out. We have addressed this comment by providing an improved visualization of the HF across all foreperiods along with the predicted values in Figure 2 and Supplementary Figure 1 (see Responses 3 and 4). We also show significant and moderate correlations between RT and HF (Response 2), and discuss prior literature that supports the use of HF as a regressor (Response 5 below).

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

Again: a "moderate correlation" between HF and RT does not justify a regression on neural data. At this point the manuscript has reached a dead end: The authors have not formally developed a model tailored to their task's computational demands but tacitly assumed the Hazard Function's validity in their specific context. But the HF model does not fit the data well and there are no better accounts of the behavior in the manuscript. Frankly, the authors still owe the reader a convincing computational account of the RT data.

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1:

Comment 2

Please see Preface above.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

We have addressed this comment by providing Figure 2 and Supplementary Figure 1 (see Responses 3 and 4).

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1:

Comment 3

The authors convincingly addressed this comment.

Comment 4

The authors convincingly addressed this comment.

Comment 5

i) On the model selection: Please see Preface.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

We have addressed the comment above in Responses 3 and 4.

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1:

ii) On the temporal uncertainty:

The authors tested temporally and probabilistically blurred HF models and find the results inconclusive.

Based on the authors' reply to my comment, I get the impression that they think that the different blurring regimes are just a means to improve model fit ("The winner in RT following FP1 is using probabilistically blurred HF. However, the winner in RT following FP2 is using temporally blurred and probabilistically blurred HF." and "However, the optimal blurring effect varies, depending on different blurring factors").

Instead, these blurring regimes serve to test specific hypotheses about the brain's uncertainty in time estimation.

It is widely accepted that there are errors to neural temporal estimates, e.g. in interval timing. These timing mechanisms are crucial for successful performance of the timing task that the authors report here. Therefore, the reader could expect the authors to take a stance on uncertainty in time estimation, especially since the manuscript's abstract closes with this bold statement: "Our study reveals brain networks that integrate multilevel temporal information, offering insight into the hierarchical predictive coding of time." It is worth noting that the manuscript's modeling work unrealistically implies that time is encoded *perfectly* in the brain, without any error....

The authors state in the Discussion (p 18, l. 18-20): "Therefore, although we believe that the time estimation during waiting is unlikely to be equally precise throughout, we chose to use a simpler model that does not include explicit blurring parameters or assumptions about precision."

To the contrary, the non-blurred models that the authors identified as the best-fitting ones make a specific assumption about precision, i.e. that the brains studied in this experiment are perfect clocks that make no timing errors (and can perfectly estimate two hazard rates simultaneously). This seems to be a neurobiologically implausible assumption, given the literature on time estimation (Gibbon Psychol Rev 1977, Gallistel & Gibbon Psychol Rev 2000).

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

Response 5: On using different blurring factors and parameters (also addressing the third question below)

We sincerely appreciate the reviewer's comments regarding the role of uncertainty in time estimation. We acknowledge that our previous response may have unintentionally given the impression that temporal and probabilistic blurring were used solely to improve model fit. We would like to clarify that our intention in testing these different blurring factors was to evaluate competing hypotheses about the neural representation of uncertainty during time processing.

To our knowledge, the question of how the brain represents uncertainty in temporal prediction remains unresolved in the existing literature. For example, neural activity in the primate brain has been shown to closely correlate with temporally blurred HFs under both unimodal and bimodal distributions (Janssen & Shadlen, 2005). In long-foreperiod paradigms, where greater temporal uncertainty is expected, RTs have also been successfully explained by temporally blurred HFs (Bueti et al., 2010; Bueti & Macaluso, 2010). Conversely, other studies suggest that the probabilistically blurred probability distribution itself (PDF) better accounts for human behavioral data (Grabenhorst et al., 2019, 2021). Still, unblurred HFs have also been effectively used to differentiate behavioral and neural responses across uniform and unimodal distributions (Herbst et al., 2018), in both explicit and implicit timing tasks (Coull & Nobre, 2008), and in modeling trial-by-trial neural responses to surprise in the foreperiod paradigm (Visalli et al., 2021).

These seemingly conflicting findings may reflect important differences across studies (Nobre & van Ede, 2018), including: (1) the statistical properties of the foreperiod distributions (e.g., mean and variance); (2) the temporal resolution of the experimental design; and (3) the use of catch trials. These factors can result in varied temporal prediction profiles, along with different types and magnitudes of uncertainty. For example, regarding the first point, in a bimodal distribution formed by overlapping two identical unimodal distributions, increasing the distance between the peaks can amplify temporal blurring. This is evident in the ~0.27- and ~1.93-second peaks in Janssen & Shadlen (2005) and the ~3.91- and ~13.75-second peaks in Bueti et al. (2010), compared to the 0.8- and 1.6-second peaks in the current study. For probabilistic blurring, the degree of blurring can be influenced by the standard deviation of the distribution: a narrower distribution produces sharper changes over time, whereas a flatter distribution leads to more gradual changes.
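One common formalization of temporal blurring, following the scalar-timing idea in Janssen & Shadlen (2005), convolves each probability mass with a Gaussian whose width grows with elapsed time before recomputing the hazard. The sketch below is illustrative only: the bimodal distribution (peaks at 0.8 and 1.6 s, matching this study's design) and the Weber-like coefficient `phi` are assumed values, not the manuscript's fitted parameters.

```python
import numpy as np

def hazard(pdf):
    """Hazard from a discrete PDF: f(t) / P(T >= t)."""
    survival = 1.0 - np.concatenate(([0.0], np.cumsum(pdf)[:-1]))
    return pdf / np.maximum(survival, 1e-12)

def temporally_blurred_pdf(times, pdf, phi=0.25):
    """Blur each probability mass with a Gaussian whose SD grows with
    elapsed time (scalar-timing assumption), then renormalize."""
    blurred = np.zeros_like(pdf)
    for t_i, p_i in zip(times, pdf):
        sd = phi * t_i                      # uncertainty scales with time
        kernel = np.exp(-0.5 * ((times - t_i) / sd) ** 2)
        blurred += p_i * kernel / kernel.sum()
    return blurred / blurred.sum()

times = np.arange(0.4, 2.01, 0.1)           # 0.4-2.0 s foreperiod range
pdf = np.full(times.size, 0.02)
pdf[np.isclose(times, 0.8)] = 0.4           # bimodal: peaks at 0.8 and 1.6 s
pdf[np.isclose(times, 1.6)] = 0.3
pdf /= pdf.sum()

h_raw = hazard(pdf)                          # unblurred hazard
h_blur = hazard(temporally_blurred_pdf(times, pdf))
```

Because the later peak is blurred by a wider Gaussian than the earlier one, blurring flattens the hazard more strongly at long foreperiods, which is the qualitative effect of peak distance described above.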

Regarding the second point, even when foreperiod distributions have identical means and standard deviations, the overall range can still affect uncertainty. For instance, a longer range (0-20 seconds, as in Bueti et al., 2010) can introduce greater temporal uncertainty than a shorter range (0.4-2 seconds), as used in our study. Furthermore, it is also reasonable to infer that both temporal and probabilistic blurring may coexist and interact with each other when using probability distributions with different means, standard deviations, and time ranges. As a side note, temporal resolution, defined by the time step used in the analysis (e.g., 0.1-second vs. 0.01-second), can also influence the dynamics of the HF, particularly around the later peaks. This effect is illustrated in our simulations based on Bueti et al. (2010) and our own study, as well as in comparisons with Herbst et al. (2018) (see green arrows in the figures below).

--- FIG

Third, the inclusion of catch trials in some foreperiod paradigms not only introduces additional uncertainty regarding whether the target will occur but also complicates the definition of the probability distribution for target trials. Regarding the former, Grabenhorst et al. (2021) modeled RT by incorporating both the uncertainty of target occurrence and probabilistic uncertainty (i.e., probabilistic blurring), raising important questions about how these two forms of uncertainty interact in the brain. Regarding the latter, distribution estimation becomes more complex, as catch trials may either be omitted or interpreted as extremely long foreperiods within the distribution.

Returning to our study, which aimed to test whether the brain can learn and integrate both unconditional and conditional temporal statistics, we implemented distinct foreperiod distributions that inevitably introduced different types of uncertainty across conditions. We found that no single blurring factor consistently outperformed others across FP1 and FP2, further supporting our first and second points.

Importantly, we do not claim that the original HF consistently yielded the best-fitting model. Rather, we emphasize that the integration of multilevel temporal information was consistently observed regardless of the blurring factor used. Notably, all better-fitting models were based on hazard functions, either temporally or probabilistically blurred. Consistent with prior neurophysiological findings demonstrating neural encoding of original hazard functions, our results revealed significant associations between RT/neural activity and original HF predictors. This suggests that the brain may perform HF-like computations as a mechanism for temporal prediction, with different forms of uncertainty modulating these estimates in a context-dependent and potentially parallel manner.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The authors acknowledge that "the question of how the brain represents uncertainty in temporal prediction remains unresolved in the existing literature" and in the following sections conjecture different hypotheses about why this may be the case. So the authors identified the uncertainty in time estimation as an important aspect in temporal prediction. Yet despite the authors' lengthy musings that follow this insight, the manuscript at hand does not offer any tangible progress on this aspect.

Instead, the manuscript leaves the reader puzzled by two points:

1) Their temporally / probabilistically blurred HF fits fail to yield consistent results across conditions. They conclude that they are not appropriate or some other thing is needed, yet they do not offer a modeling solution to this problem. At this point the reader wonders: What is the major contribution of this manuscript: There is a novel task and new data but there is just a weak fit of the nonblurred Hazard Function and a non-conclusive investigation of different hypotheses about temporal uncertainty.

2) The blurring should account for noise in the data and, if the temporal blurring/probabilistic blurring hypotheses are on the right track, there should be an improved model fit. Yet the results are inconsistent across conditions. This further questions the basic modeling approach of this manuscript. The off-the-shelf Hazard Function appears to be a wrong model choice in the first place.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

Once again, we fully acknowledge the reviewer's important point: non-blurred models imply a simplified assumption that time is encoded without error. While this assumption allows us to isolate core computational principles, it underrepresents the role of uncertainty.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The authors cannot claim to have "isolated core computational primitives". The fits are unconvincing both quantitatively (R-squared, beta) and qualitatively (visual inspection).

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

Its modulatory effects should be further investigated in future research, such as using foreperiod ranges of different lengths (to test temporal blurring) or manipulating the standard deviation of distributions (to test probabilistic blurring). Accordingly, we have explicitly discussed this limitation in the revised Discussion to better reflect the complexity of the underlying temporal processes.

Discussion section:

"Still, in long-foreperiod paradigms, where greater temporal uncertainty would be expected, RTs have been well explained by temporally blurred HFs (Bueti et al., 2010). These seemingly conflicting findings may reflect key differences across studies (Nobre & van Ede, 2018), including: (1) the statistical properties of the foreperiod distributions (e.g., mean and standard deviation), (2) the temporal resolution of the experimental design, and (3) the use of catch trials. These factors can introduce varied temporal prediction profiles with different types and magnitudes of uncertainty. Regarding the first point, in a bimodal distribution created by overlapping two identical unimodal distributions, a greater distance between the peaks is expected to produce a stronger temporal blurring effect. For probabilistic blurring, the extent of the effect can be influenced by the distribution's standard deviation; for instance, a sharper distribution may lead to more rapid changes over time than a flatter one. For the second point, even when distributions share the same mean and standard deviation, the range of foreperiods can still affect uncertainty. For example, a longer range (e.g., 0-20 seconds, as in Bueti et al., 2010) introduces greater temporal uncertainty than a shorter range (e.g., 0.4-2 seconds, as used in our study). It is also reasonable to infer that temporal and probabilistic blurring may coexist and interact, potentially modulating one another, when using probability distributions with different means, standard deviations, and time ranges. Third, the inclusion of catch trials not only adds uncertainty regarding whether the target will occur but also complicates the definition of the underlying probability distribution. For instance, Grabenhorst et al. (2021) modeled RT by incorporating both the uncertainty of target occurrence and probabilistic uncertainty (i.e., probabilistic blurring), raising the question of how these two forms of uncertainty interact in the brain. In terms of probability distributions, catch trials may either be excluded or interpreted as extremely long foreperiods, making distribution estimation more complex.

This inconsistency further supports our earlier points: in our study, we implemented distinct foreperiod distributions to examine whether the brain can learn and integrate both unconditional and conditional temporal statistics. This design inevitably introduced different types of uncertainty across conditions. As shown in previous neurophysiological work, which demonstrates neural encoding of original hazard functions, and consistent with our findings, we suggest that the brain may rely on HF-like computations as a neural mechanism for temporal prediction. These computations may be modulated by different forms of uncertainty in a context-dependent and potentially parallel manner. Nevertheless, we emphasize that while using non-blurred models allows us to isolate core computational principles, under the simplifying assumption that time is encoded without error, it also underrepresents the role of uncertainty. The modulatory effects of uncertainty should be explored in future research, for example by varying foreperiod ranges (to examine temporal blurring) or manipulating the standard deviation of the distribution (to examine probabilistic blurring)." (Pages 18-20, Lines 18-14)

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1:

In fact, it should worry the authors that all versions of their models (non-blurred, temporally blurred, probabilistically blurred, Suppl. Fig. 8) yield rather small goodness-of-fit values (adj. R2), as the authors acknowledge. The authors speculate that this may be due to the number of trials per participant, which they say may be too small. Three questions:

1) If this is the case, then how can the main results of the manuscript be conclusive?

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

Response 6: Trial Numbers

We appreciate the reviewer's concern. When designing our experiment, we determined the number of trials, 600 for FP1 and 600 for FP2, based primarily on prior studies. Specifically, we referred to: (1) Herbst et al. (2018), in which each participant completed a total of 546 trials during EEG recordings; and (2) Bueti et al. (2010), which included 600 trials in a training session and 200 trials in a testing session during fMRI acquisition. Notably, they reported no substantial differences between the results from the two sessions. Based on these references, we specified the rationale for our chosen trial numbers in the Methods section.

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The authors state that they presented 600 trials for FP1 and for FP2. When inspecting the data more closely (see my above comments to Fig. 2), it became obvious that there is not much of a patterned RT modulation over the longer half of foreperiods. Why may this be in light of the large trial numbers?

The experiment is structured in blocks 1 to 4 and in their manuscript, the authors write: "During the experiment, the four blocks were delivered in a random order and repeated three times, each with a different random order, resulting in a total of 12 block representations (run)."

If I understand this correctly, then there were no two same consecutive blocks in the experiment. Could then one explanation for this lack of a clear effect be that the authors presented too few trials per block before randomly switching to another block ("block" = experimental condition)? Compared to work in the literature, where several hundreds of trials are presented before switching condition, this would render the design underpowered: the authors may have switched condition before participants reached the end of learning.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

In addition, we acknowledge that our earlier attribution of low conditional R2 values to limited trial numbers per participant was speculative and not supported by direct evidence. To evaluate this more rigorously, we conducted a bootstrapping analysis to assess the stability of R2 values within our dataset. Specifically, for each participant and for both FP1 and FP2, we resampled their trials with replacement and re-estimated the R2 value 1,000 times to generate a bootstrap distribution and corresponding 95% confidence intervals. This allowed us to assess whether the observed R2 values were statistically stable and representative. Please see the results below and in Supplementary Figures 11 and 12. Our assessment is based on two criteria: (1) The bootstrap distribution of R2 should be unimodal. Bimodal or uniform distributions would suggest instability. While skewness toward zero may reflect limited informativeness, it does not necessarily imply instability. (2) The actual R2 value should fall within the 95% confidence interval of the bootstrap distribution and ideally be close to the bootstrap mean. Across participants, these criteria were generally met for both FP1 and FP2, suggesting that the number of trials per participant was sufficient to yield statistically stable R2 estimates. We believe these findings reinforce the robustness of our conclusions.
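The resampling scheme described above can be sketched as follows. The data are synthetic stand-ins for one participant's 600 trials, and an ordinary least-squares fit stands in for the actual mixed-effects model, so the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def r_squared(x, y):
    """R^2 of an ordinary least-squares fit y ~ x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# Synthetic stand-in for one participant: RT loosely driven by HF.
hf = rng.uniform(0, 1, size=600)                  # hazard value per trial
rt = 0.45 - 0.08 * hf + rng.normal(0, 0.05, 600)  # reaction times (s)

observed = r_squared(hf, rt)

# Bootstrap: resample trials with replacement, re-estimate R^2 1,000 times.
boot = np.empty(1000)
for i in range(1000):
    idx = rng.integers(0, hf.size, hf.size)
    boot[i] = r_squared(hf[idx], rt[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"R^2 = {observed:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The two stability criteria then amount to inspecting the histogram of `boot` for unimodality and checking that `observed` falls within `[lo, hi]`, close to the bootstrap mean.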

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

The results of the bootstrapping analysis reinforce the robustness of the authors' conclusions only in the sense that the HF model really offers only a weak model fit.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

Methods section:

"Each participant completed a total of 600 FP1 trials and 600 FP2 trials, with the number of trials determined primarily based on prior studies (Bueti et al., 2010; Herbst et al., 2018). We also confirmed the stability of individual results using bootstrap resampling, which showed that the distribution of individual R-squared values was unimodal, with the actual values close to the bootstrap mean and within the 95% confidence intervals (Supplementary Figures 11 and 12)." (Page 28, Lines 6-11)

---Suppl Fig 11

---Suppl Fig 12

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1

2) Do the HF models fit the group-level RT well (see Preface)?

3) Is the hazard function an adequate model for these data at all?

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

These two questions have been addressed in Responses 1-5.

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1:

Comment 6

The authors convincingly addressed this comment.

Comment 7

In the Discussion (p. 20 l. 2-3), the authors write: "Here, it is important to clarify again that the

integration of two probabilities was supported by the optimal model performance with the

inclusion of the interaction term (i.e., HFU * HFC)."

The term "optimal" is inappropriate here. The authors probably want to express that the model with the interaction term yielded the best fit relative to the other models that did not feature such a term. So this is a relative statement and should be formulated as such. The current sentence implies that the model is performing optimally, which does not make sense.

===========================================================================

REVISION ROUND 2 - AUTHORS' RESPONSE:

We thank the reviewer for pointing this out. Accordingly, we have revised the sentence to: "Here, it is important to clarify again that the integration of two probabilities was supported by the model including the interaction term (i.e., HFU * HFC), which yielded the best fit relative to the other models that did not feature such a term." (Page 18, Line 5-12)
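The relative nature of this comparison can be illustrated with a toy nested-model example. The data below are synthetic and the fit is ordinary least squares rather than the manuscript's linear mixed-effects models; the coefficients and variable names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600
hfu = rng.uniform(0, 1, n)   # unconditional hazard value per trial
hfc = rng.uniform(0, 1, n)   # conditional hazard value per trial
rt = 0.5 - 0.05*hfu - 0.04*hfc - 0.03*hfu*hfc + rng.normal(0, 0.03, n)

def fit(X, y):
    """Least-squares fit; return residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return ((y - X @ beta) ** 2).sum()

ones = np.ones(n)
X_main = np.column_stack([ones, hfu, hfc])            # additive model
X_int = np.column_stack([ones, hfu, hfc, hfu * hfc])  # + interaction term

rss_main = fit(X_main, rt)
rss_int = fit(X_int, rt)

# Nested-model F statistic for adding the single interaction term:
F = (rss_main - rss_int) / (rss_int / (n - 4))
print(f"RSS additive = {rss_main:.4f}, with interaction = {rss_int:.4f}, F = {F:.2f}")
```

The comparison only shows that the interaction model fits better than the additive one, exactly the relative claim the revised sentence now makes; it says nothing about the model being optimal in an absolute sense.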

===========================================================================

REVISION ROUND 3 - COMMENT REVIEWER #1:

Ok.

===========================================================================

REVISION ROUND 2 - COMMENT REVIEWER #1:

Comment 8

The authors convincingly addressed this comment.

Comment 9

The authors convincingly addressed this comment.

Comment 10

The authors convincingly addressed this comment.

Reviewer #2: In my view the authors have addressed all concerns with the current revision. The requested plots (RT over FP) and additional analyses are sufficiently convincing that the model explains important computational aspects of temporal predictions as the authors claim, even though not all variability in RT is captured (as in many existing studies).

It is inherently difficult to design probabilistic variations of foreperiods, and regress these to RT, due to the multiple facets of temporal prediction, and contributions to RT such as autocorrelations over trials, lapses, ..

My recommendation is to accept the paper for publication. I believe it is an important addition to the literature.

Decision Letter 2

Christian Schnell, PhD

9 Oct 2025

Dear Dr Huang,

Thank you for the submission of your revised Research Article "Human brain integrates both unconditional and conditional timing statistics to guide expectation and behavior" for publication in PLOS Biology. On behalf of my colleagues and the Academic Editor, Ruth de Diego Balaguer, I am pleased to say that we can in principle accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

When you attend to those requests to come, please also add a citation of the location of the source data clearly in all relevant main and supplementary Figure legends, e.g. “The data underlying this Figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP”.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Sincerely, 

Christian

Christian Schnell, PhD

Senior Editor

PLOS Biology

cschnell@plos.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Supplementary methods for calculating different blurred models.

    (DOCX)

    pbio.3003459.s001.docx (26.7KB, docx)
    S1 Table. False alarm trial numbers across blocks.

    (DOCX)

    pbio.3003459.s002.docx (19.6KB, docx)
    S2 Table. Average reaction times.

    (DOCX)

    pbio.3003459.s003.docx (20KB, docx)
    S3 Table. Comparisons of linear mixed-effects models on reaction times (false alarms only vs. false alarms and reaction times outliers excluded).

    (DOCX)

    pbio.3003459.s004.docx (21.5KB, docx)
    S4 Table. Average reaction times following FP2.

    (DOCX)

    pbio.3003459.s005.docx (20.5KB, docx)
    S5 Table. Outlier trial numbers across four blocks.

    (DOCX)

    pbio.3003459.s006.docx (19.9KB, docx)
    S6 Table. Average reaction times across different trials.

    (DOCX)

    pbio.3003459.s007.docx (20.9KB, docx)
    S7 Table. Effect of HFU on log-transformed reaction times following FP1.

    (DOCX)

    pbio.3003459.s008.docx (19.9KB, docx)
    S8 Table. Comparisons of linear mixed-effect models on log-transformed reaction times following FP2.

    (DOCX)

    pbio.3003459.s009.docx (20.5KB, docx)
    S9 Table. Effects of HFU and HFC on log-transformed reaction times following FP2.

    (DOCX)

    pbio.3003459.s010.docx (20.5KB, docx)
    S10 Table. Effect of actual HFU on reaction times following FP1.

    (DOCX)

    pbio.3003459.s011.docx (20.1KB, docx)
    S11 Table. Comparisons of linear mixed-effect models on reaction times following FP2.

    (DOCX)

    pbio.3003459.s012.docx (20.4KB, docx)
    S12 Table. Effects of actual HFU and HFC on reaction times following FP2.

    (DOCX)

    pbio.3003459.s013.docx (20.5KB, docx)
    S13 Table. Effects of HFU and HFC on reaction times following FP2 (1.1- and 1.3-sec FP removed).

    (DOCX)

    pbio.3003459.s014.docx (20.5KB, docx)
    S1 Fig. Average and predicted reaction times across participants.

    The average reaction time following FP2 for each block is shown by the black line, with dominant foreperiod durations marked by black circles. The shade represents the standard deviation. Predicted reaction times are shown by the red line, with dominant foreperiod durations marked by red squares. For FP2, predicted reaction times were obtained from Models 1, 2, and 3. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    S2 Fig. Splined-interpolated hazard values.

    The original hazard values at a 10-Hz sampling rate are represented in orange, while the interpolated values at a 250-Hz sampling rate are represented in blue. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s016.eps (1.5MB, eps)
    S3 Fig. Temporal response function of HFU.

    Within the significant area for HFU-only in Fig 4A, the positive and negative average TRF (i.e., weight) between source responses and HFU values at a 0-s lag is shown. Values were normalized between −1 and 1. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s017.eps (5.9MB, eps)
    S4 Fig. Temporal response function of HFC.

    Within the significant area for HFC-only in Fig 4B, the positive and negative average TRF between source responses and HFC values at a 0-s lag is shown. Values were normalized between −1 and 1. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s018.eps (5.9MB, eps)
    S5 Fig. Temporal response functions of HFU, HFC, and their interaction.

    Within the significant area for HFU+ + HFC+ + HFU*HFC in Fig 4C, the positive and negative average TRFs between source responses and each of those three values at a 0-s lag is shown. Values were normalized between −1 and 1. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s019.eps (15.6MB, eps)
    S6 Fig. Event-related potentials (ERP) during FP1 and FP2.

    We averaged processed EEG signals in foreperiod trials lasting >0.7 s. Time zero represents the onset of the warning signal. Differences between ERPs during FP1 and FP2 (FP1 − FP2) were tested using the Monte Carlo method and cluster-based correction (1,000 randomizations, a two-tailed test, and an alpha level of 0.05). Triangles in the topography and gray shades in the time course indicate significant clusters. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s020.eps (16.7MB, eps)
    S7 Fig. R-squared obtained using different representations of the temporal predictions.

    The probability distribution is denoted as PDF, the hazard function as HF, temporally blurred as temp-blurred, and probabilistically blurred as prob-blurred. (A) The adjusted R-squared was obtained from each participant for FP1, and the conditional R-squared was obtained for FP2 (with FP1 duration as a random effect). The black circle represents the mean of the R-squared across participants. (B) The mean R-squared was obtained across participants across blurring parameters ranging from 0.15 to 0.35 in steps of 0.1. The results in the left and right columns were obtained from the regression model formulas (3) and (7), respectively. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s021.eps (855.9KB, eps)
    S8 Fig. Average reaction times following FP2.

    A visualization of S3 Table. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s022.eps (1.4MB, eps)
    S9 Fig. Probability distributions of the actual trial configuration averaged across participants.

    The red line represents the hazard function derived from the probability distribution function in Fig 1D. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s023.eps (2.5MB, eps)
    S10 Fig. The actual trial configuration from one participant.

    There were 50 trials of the two-foreperiod sequence, resulting in a total of 100 foreperiod trials for each block. (A) Initial trial numbers for each participant and block were determined based on a 50−50 probability distribution. (B) For each participant, foreperiods were randomly selected in the ranges of L and S, and paired as LL, SS, LS, or SL. An example of actual trial numbers for one participant is shown. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s024.eps (1.6MB, eps)
    S11 Fig. Bootstrap distribution of adjusted R-squared values based on 1,000 resamples of reaction times following FP1 for each participant.

    Reaction times were regressed against HFU in the linear regression model. The blue line represents the actual adjusted R-squared value; the green line represents the mean of the bootstrap distribution; and the red dashed lines represent the 95% confidence interval. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s025.eps (2.1MB, eps)
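    The resampling procedure behind S11 Fig can be sketched as follows. This is a minimal illustration with synthetic data: the reaction times and HFU values are hypothetical, and an ordinary-least-squares fit stands in for the regression model in formula (3).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-trial data: HFU values and reaction times
n = 100
hfu = rng.uniform(0.2, 0.9, n)
rt = 0.45 - 0.15 * hfu + rng.normal(0, 0.05, n)

def adjusted_r2(x, y):
    """Adjusted R-squared of a simple linear regression of y on x."""
    beta = np.polyfit(x, y, 1)
    resid = y - np.polyval(beta, x)
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - 2)

# Bootstrap: resample trials with replacement 1,000 times
boot = np.empty(1000)
for i in range(1000):
    idx = rng.integers(0, n, n)
    boot[i] = adjusted_r2(hfu[idx], rt[idx])

ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
```

    The resulting distribution of resampled R-squared values, its mean, and the percentile confidence interval correspond to the green line and red dashed lines in the figure.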
    S12 Fig. Bootstrap distribution of conditional R-squared values based on 1,000 resamples of reaction times following FP2 for each participant.

    Reaction times were regressed against HFU, HFC, and their interaction term in the linear mixed-effects model, with the FP1 duration as a random effect. The representation is the same as S11 Fig. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s026.eps (2.1MB, eps)
    S13 Fig. R-squared obtained using hazard functions from the actual trial configuration.

    The representations are the same as those in S7A Fig. The results in the left and right columns were obtained from the regression model formulas (3) and (7), respectively. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s027.eps (476KB, eps)
    S14 Fig. Average and predicted reaction times across participants.

    The same representations as Fig 2 are used. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s028.eps (1.6MB, eps)
    S15 Fig. Neural correlates of original and shuffled HFU.

    EEG source responses during FP1 were modeled against shuffled HFU as a control in panel A, and against the original HFU in panel B. Four cortical surfaces with significant correlation coefficients are shown. The color bar shows correlation coefficients. The data underlying this figure can be found in https://doi.org/10.17605/OSF.IO/VEDHP.

    (EPS)

    pbio.3003459.s029.eps (6.1MB, eps)
    Attachment

    Submitted filename: reviewers_comments_v5.pdf

    pbio.3003459.s030.pdf (1.6MB, pdf)
    Attachment

    Submitted filename: reviewers_comments_v5_auresp_1.pdf

    pbio.3003459.s031.pdf (1.6MB, pdf)
    Attachment

    Submitted filename: reviewers_comments_v5_auresp_2.pdf

    pbio.3003459.s032.pdf (1.6MB, pdf)

    Data Availability Statement

    The raw data, along with the data underlying each table, figure, and supporting table and figure, are publicly available in the Open Science Framework (https://doi.org/10.17605/OSF.IO/VEDHP).
