Abstract
We use our sense of time to identify temporal relationships between events and to anticipate actions. How well we can exploit temporal contingencies depends on the variability of our measurements of time. We asked humans to reproduce time intervals drawn from different underlying distributions. As expected, production times were more variable for longer intervals. Surprisingly however, production times exhibited a systematic regression towards the mean. Consequently, estimates for a sample interval differed depending on the distribution from which it was drawn. A performance-optimizing Bayesian model that takes the underlying distribution of samples into account provided an accurate description of subjects’ performance, variability and bias. This finding suggests that the central nervous system incorporates knowledge about temporal uncertainty to adapt internal timing mechanisms to the temporal statistics of the environment.
From simple habitual responses to complex sensorimotor skills, our behavioral repertoire exhibits a remarkable sensitivity to timing information. To internalize temporal contingencies, and to put them to use in the control of conditioned and deliberative behavior, our nervous systems must be equipped with central mechanisms to process time.
Among the elementary aspects of temporal processing, and one that has been the focus of many psychophysical studies of time perception, is the ability to measure the duration between events; i.e., interval timing1. A common feature associated with repeated estimation (or production) of a sample interval is that the standard deviation of the estimated (or produced) intervals increases linearly with their mean, a property that is termed scalar variability2–4. While previous work has demonstrated how suitable forms of internal noise might lead to scalar variability5–8, we do not know whether and how the nervous system can make use of this lawful relationship to improve timing behavior.
Scalar variability implies that measurements of relatively longer intervals are less reliable and thus more uncertain. The question we address is whether subjects have knowledge about this uncertainty, and how they might exploit it to improve estimation and production of time intervals. This question is particularly important when one has prior expectations of how long an event might last. For instance, if one measures an interval to be ~1.5 s but, based on experience, expects it to be closer to 1.2 s, then s/he may conclude that the true interval was probably somewhere between 1.2 and 1.5 s. More generally, knowledge about the distribution of time intervals one may encounter – which we refer to as temporal context – could help reduce uncertainty. The extent to which temporal context should inform temporal judgments depends on how unreliable measurements of time are. While a metronome need not rely on temporal context to stay on the beat, a piano player may well use the tempo of a musical piece to coordinate finger movements in time. Thus, to make use of the oft-present temporal context, the brain must have knowledge about the reliability of its own measurements of time.
The question of how knowledge about temporal context may improve measurements of elapsed time can be posed rigorously in the framework of statistical inference. In this framework, to estimate a sample interval, the observer may take advantage of two sources of information: (1) the likelihood function, which quantifies the statistics of sample intervals consistent with a measurement, and (2) the prior probability distribution function of the sample intervals the observer may encounter. One possibility is for the observer to ignore the prior distribution, and to choose the most likely value directly from the likelihood function, a strategy known as the maximum-likelihood estimation (ML)9. Alternatively, a Bayesian observer would combine the likelihood function and the prior, and use some statistic to map the resulting posterior probability distribution onto an estimate. Common mapping rules are the maximum a posteriori (MAP) and Bayes Least Squares (BLS), which correspond to the mode and the mean of the posterior respectively.
To understand how humans evaluate their measurements of elapsed time in the presence of a temporal context, we asked subjects to estimate and subsequently reproduce time intervals in the sub-second to seconds range that were drawn from three different prior distributions. Subjects’ production times showed a clear dependence on both the sample intervals and the prior distribution from which they were drawn. We fitted subjects’ responses to various observer models such as ML, MAP and BLS and found that a Bayesian observer associated with the BLS could account for the bias, variability and overall performance of every subject in all three prior conditions. This suggests that subjects have implicit knowledge of the reliability of their measurements of time, and can use this information to adjust their timing behavior to the temporal regularities of the environment. Furthermore, our observer model shows that this sophisticated Bayesian behavior can be accounted for by a nonlinear transformation that simply and directly maps noisy measurement of time to optimal estimates.
Results
The Ready-Set-Go paradigm
Subjects had to measure, and immediately afterwards reproduce different sample intervals. A sample interval, ts, was demarcated by two brief flashes, a “Ready” cue followed by a “Set” cue. The corresponding production time, tp, was measured from the time of the “Set” cue to when subjects proactively responded via a manual key press (Fig. 1a). Subjects received feedback for sufficiently accurate production times (Fig. 1c).
In each session, sample intervals were drawn from a discrete uniform prior distribution. For each subject, three partially overlapping prior distributions (“Short”, “Intermediate” and “Long”) were tested (Fig. 1b). The main data for each prior condition were collected after an initial learning stage (typically 500 trials) to ensure subjects had time to adapt their responses to the range of sample intervals presented.
Subjects’ timing behavior exhibited three characteristic features (Fig. 2). First, production times monotonically increased with sample intervals. Second, for each prior condition, production times were systematically biased towards the mean of the prior as evident from their tendency to deviate from sample intervals (diagonal dashed line) and gravitate towards the mean interval (horizontal dashed line)10–12. Consequently, mean production times associated with a particular ts were differentially biased for the three prior conditions. Third, production time biases were more pronounced in the “Intermediate”, and more so, in the “Long” prior conditions, indicating that longer sample intervals were associated with progressively stronger prior-dependent biases. Similarly, within each prior condition, the magnitude of the bias was larger for the longest sample interval compared to the shortest sample interval (Supplementary Fig. S1).
Scalar variability implies that measurements of longer sample intervals engender more uncertainty. According to Bayesian theory, for these more uncertain measurements, subjects’ performance would improve if they relied more on their prior expectation (Supplementary Fig. S2). This is consistent with the observed increases in prior-dependent biases associated with longer sample intervals and suggests that subjects might have adopted a Bayesian strategy to reproduce time intervals. We thus developed probabilistic observer models to evaluate these observations quantitatively and to understand the computations from which they might arise.
The observer model
The observer model is presented with a sample interval, ts. Due to measurement noise, the measured interval, tm, may differ from ts. The observer must use tm to compute an estimate, te, for the sample interval, ts. To do so, the observer may use an estimator that relies on probabilistic sources of information such as the likelihood function and the prior distribution. Importantly however, the estimator itself is fully characterized by a deterministic function, f, that maps a measurement, tm, to an estimate, te; i.e. te = f(tm). Finally additional noise during the production phase may cause the production time, tp, to differ from te (Fig. 3a).
To formulate the model mathematically, we need to specify the relationship between the sample interval, ts, the measured interval, tm, the estimate, te, and the production time, tp. The relationship between tm and ts can be quantified by the conditional probability distribution, p(tm|ts), the probability of different measurements for a specific sample interval. This distribution also specifies the likelihood function, λtm (ts), a statistical description of the different sample intervals associated with a fixed measurement. We modelled p(tm|ts) as a Gaussian distribution centered at ts, and motivated by the scalar variability of timing, assumed that the standard deviation of p(tm|ts) grows linearly with its mean (Fig. 3a). The distribution of measurement noise was thus fully characterized by the ratio of the standard deviation to the mean of p(tm|ts), which we will refer to as the Weber fraction associated with the measurement, wm. With the same arguments in mind, we assumed that the distribution of tp conditioned on te, p(tp|te) is also Gaussian, is centered at te, and is associated with a constant Weber fraction, wp.
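The scalar-variability assumption for the measurement stage can be illustrated with a short simulation. This is a minimal sketch, not the authors' code: the function name `simulate_measurements` and the value `wm = 0.1` are hypothetical choices made for illustration only.

```python
import numpy as np

def simulate_measurements(ts_ms, wm, n_trials, rng):
    """Draw noisy measurements tm ~ Normal(ts, (wm * ts)^2)."""
    return rng.normal(loc=ts_ms, scale=wm * ts_ms, size=n_trials)

rng = np.random.default_rng(0)
wm = 0.1  # hypothetical Weber fraction, not a fitted value
for ts in (500.0, 800.0, 1200.0):
    tm = simulate_measurements(ts, wm, 100_000, rng)
    # The ratio sd/mean should stay close to wm at every interval,
    # while the standard deviation itself grows with ts.
    print(ts, tm.std(), tm.std() / tm.mean())
```

The same function, with wp in place of wm and te in place of ts, would serve as a sketch of the production stage.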
Finally, the relationship between tm and te was modelled by a deterministic mapping function, f, which we will refer to as the estimator. Different estimators are associated with different mapping rules. Among them, we focused on the ML, MAP and BLS because of their well-known properties, and because they were most germane to the development of our arguments with respect to the psychophysical data. We denote the corresponding estimators by fML, fMAP, and fBLS respectively (Fig. 3b–d).
The fML estimator assigns te to the peak of the likelihood function (Fig. 4a). In our model, with a Gaussian-distributed measurement noise and a constant Weber fraction, te would be proportional to tm (see Methods). The fMAP and fBLS estimators, on the other hand, rely on the posterior distribution, which is proportional to the product of the prior distribution and the likelihood function. Because the prior distribution we used was uniform, the posterior was a scaled replica of the likelihood function within the domain of the prior and zero elsewhere. The MAP rule extracts the mode of the posterior, which would correspond to the peak of the likelihood function except when the peak falls below/above the prior distribution’s shortest/longest sample interval. Thus, fMAP is the same as fML with the difference that its range is limited to the domain of the prior (Fig. 4b). For BLS, which is associated with the mean of the posterior, the estimator, fBLS, is a sigmoid function of tm (Fig. 4c).
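The three mapping rules can be sketched numerically as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the closed form used in `f_ml` is obtained by setting the derivative of the Gaussian log-likelihood (standard deviation wm·ts) to zero, the posterior mean in `f_bls` is approximated on a grid, and `wm = 0.1` is a hypothetical value. The prior range is that of the “Intermediate” condition.

```python
import numpy as np

def likelihood(ts, tm, wm):
    """Gaussian p(tm | ts) with standard deviation wm * ts, viewed as a function of ts."""
    sd = wm * ts
    return np.exp(-0.5 * ((tm - ts) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)

def f_ml(tm, wm):
    """Peak of the likelihood function; proportional to tm."""
    return tm * (-1.0 + np.sqrt(1.0 + 4.0 * wm**2)) / (2.0 * wm**2)

def f_map(tm, wm, ts_min, ts_max):
    """Posterior mode under a uniform prior: the ML estimate clipped to the prior's domain."""
    return np.clip(f_ml(tm, wm), ts_min, ts_max)

def f_bls(tm, wm, ts_min, ts_max, n_grid=2001):
    """Posterior mean for a scalar tm, by numerical integration over the prior's domain."""
    ts = np.linspace(ts_min, ts_max, n_grid)
    post = likelihood(ts, tm, wm)  # uniform prior: posterior is proportional to the likelihood
    return np.sum(ts * post) / np.sum(post)  # grid spacing cancels in the ratio

wm, ts_min, ts_max = 0.1, 671.0, 1023.0  # hypothetical wm; "Intermediate" prior range (ms)
tms = np.linspace(500.0, 1200.0, 8)
te_ml = f_ml(tms, wm)
te_map = f_map(tms, wm, ts_min, ts_max)
te_bls = np.array([f_bls(tm, wm, ts_min, ts_max) for tm in tms])
for row in zip(tms, te_ml, te_map, te_bls):
    print("tm=%.0f  ML=%.1f  MAP=%.1f  BLS=%.1f" % row)
```

In this sketch the BLS estimates stay strictly inside the prior's domain and bend smoothly toward its interior, whereas the MAP estimates follow the ML line and saturate abruptly at the prior's boundaries.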
Note that since the specification of these estimators does not invoke any additional free parameters, the observer model associated with each estimator was fully characterized by two free parameters only: wm and wp.
Comparing experimental data with the observer model
Our psychophysical data consisted of pairs of sample intervals and production times (ts and tp), but the observer model we created to relate ts to tp relies on two intervening and unobservable (hidden) variables, tm and te. We thus expressed these two hidden variables in terms of their probabilistic relationship to the observable variables ts and tp (see Methods), and derived a direct relationship between production times and sample intervals. This formulation was then used to examine which of the three observer models described human subjects’ responses best.
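One way to picture the elimination of the hidden variables is as a numerical marginalization: because te = f(tm) is deterministic, p(tp|ts) is the integral over tm of p(tp|f(tm)) p(tm|ts). The sketch below follows from the model structure described above rather than from the authors' derivation. The function names, the Weber fractions, and the clipped-identity stand-in for f are hypothetical.

```python
import numpy as np

def gauss(x, mean, sd):
    """Gaussian probability density."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)

def p_tp_given_ts(tp, ts, wm, wp, estimator, n_grid=4001):
    """Approximate p(tp | ts) by integrating p(tp | f(tm)) p(tm | ts) over tm."""
    # Integrate over measurements within +/- 6 standard deviations of ts
    # (requires wm < 1/6 so that all tm on the grid stay positive).
    tm = np.linspace(ts * (1 - 6 * wm), ts * (1 + 6 * wm), n_grid)
    dtm = tm[1] - tm[0]
    te = estimator(tm)  # deterministic mapping te = f(tm)
    integrand = gauss(tp, te, wp * te) * gauss(tm, ts, wm * ts)
    return np.sum(integrand) * dtm

# Hypothetical stand-in estimator: identity clipped to the "Intermediate" prior range.
estimator = lambda tm: np.clip(tm, 671.0, 1023.0)
tps = np.linspace(300.0, 1500.0, 1201)
density = np.array([p_tp_given_ts(tp, 847.0, 0.1, 0.08, estimator) for tp in tps])
total = np.sum(density) * (tps[1] - tps[0])
print(total)  # a valid density over tp should integrate to approximately 1
```

Evaluating this density at each observed (ts, tp) pair and summing the logarithms would give a log likelihood that depends only on wm and wp for a given estimator.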
To compare human subjects’ responses to those predicted by the observer models, we quantified production times with two statistics, their variance (VAR), and their bias (BIAS) (Fig. 5a), which together partition the overall root mean squared error (RMSE) as follows:
$$\mathrm{RMSE} = \sqrt{\mathrm{VAR} + \mathrm{BIAS}^2}$$
This relationship, which highlights the familiar trade-off between the VAR and BIAS, when written as the sum of squares, becomes the standard equation of a circle:
$$\mathrm{RMSE}^2 = \left(\mathrm{VAR}^{1/2}\right)^2 + \mathrm{BIAS}^2$$
This geometric description indicates that in a plot of VAR^{1/2} versus BIAS, a continuum of values along a quarter circle would lead to the same RMSE (Fig. 5b). It also provides a convenient graphical description for how a larger RMSE represented by a quarter circle with a larger radius may arise from increases in VAR^{1/2}, BIAS or both. We used this plot to summarize the statistics of production times, and also to evaluate the degree to which different observer models could capture those statistics.
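The partition of the squared RMSE into variance and squared bias can be checked numerically. The sketch below rests on assumptions that are not taken from the text: BIAS is computed as the root mean square of the per-interval biases, VAR as the mean of the per-interval variances, every interval has the same number of trials, and the sample intervals and toy responses are hypothetical.

```python
import numpy as np

def bias_var_rmse(ts, tp):
    """Summary statistics for production times tp grouped by sample interval ts."""
    intervals = np.unique(ts)
    biases = np.array([tp[ts == s].mean() - s for s in intervals])
    variances = np.array([tp[ts == s].var() for s in intervals])  # population variance
    bias = np.sqrt(np.mean(biases**2))
    var = np.mean(variances)
    rmse = np.sqrt(np.mean((tp - ts) ** 2))
    return bias, var, rmse

rng = np.random.default_rng(1)
intervals = np.linspace(671.0, 1023.0, 11)  # hypothetical equally spaced grid (ms)
ts = np.repeat(intervals, 200)              # equal number of trials per interval
# Toy responses: regress halfway toward the mean interval, plus scalar noise.
tp = 0.5 * ts + 0.5 * intervals.mean() + rng.normal(0.0, 0.08 * ts)
bias, var, rmse = bias_var_rmse(ts, tp)
print(rmse**2, var + bias**2)  # the two quantities should agree
```

Under these definitions the identity is exact, so any point (BIAS, VAR^{1/2}) lies on the quarter circle whose radius equals the RMSE.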
We fitted the parameters of the ML, MAP and BLS models (wm and wp) for each subject, based on the production times in the three prior conditions (Fig. 6a–c inset). We then simulated each subject’s behavior using the fitted observer models, and compared each model’s predictions to the actual responses using the BIAS, VAR^{1/2}, and RMSE statistics (Fig. 5c,e,g).
The ML model did not exhibit the prior-dependent biases present in production times (Fig. 5c,d), because the ML estimator does not take the prior into account. This failure cannot be attributed to an unsuccessful fitting procedure or a misrepresentation of the likelihood function. The fact that subjects’ production times depended on the prior condition would render any estimator that neglects the prior (e.g. ML) inadequate, the parametric form of the likelihood function notwithstanding. The MAP model was slightly better than the ML model at capturing the trade-off between BIAS and VAR (Fig. 5e,f), but it also underestimated the bias of the production times and overestimated their variance for all subjects (Fig. 6b). The BLS model on the other hand, mimicked the bias and variance of the production times quite well (Fig. 5g). It captured the overall RMSE as well as the trade off between the VAR and the BIAS (Fig. 5h), and was statistically superior to both ML and MAP estimators across our subjects (Fig. 6c).
We evaluated several variants of the BLS model by incorporating different assumptions concerning the measurement and production noise. In our main model (Fig. 4c), we fit Weber fractions for both sources of noise (wm and wp), consistent with the observation that, for all subjects, the standard deviation of the production times was roughly proportional to the mean (Supplementary Fig. S4). We also considered the possibility that the standard deviation of either the measurement noise or the production noise scales with the base interval, whereas the other noise source has constant standard deviation (Supplementary Table S2). For all subjects, the original BLS model outperformed the model in which the measurement noise had a constant standard deviation, and for 5 out of 6 subjects, it outperformed the alternative in which the production noise had a constant standard deviation (Akaike Information Criterion, Supplementary Table S1). Moreover, a BLS model in which Weber fractions were assumed identical (wm= wp) was inferior to the original BLS model (log likelihood ratio test for nested models; p< 0.03 for one subject and p<1e–7 for others). The importance of the measurement and production Weber fractions in accounting for the bias and variability of production times was also evident in model simulations (Supplementary Fig. S5).
Because our observer models were described by two parameters only (wm and wp), and all models used the same number of parameters, we were reasonably confident that the success of the BLS rule was not due to over-fitting. Nonetheless, we tested for this possibility by fitting the model to data from the “Short” and “Long” prior conditions. The fits captured the statistics of the “Intermediate” prior condition equally well. Finally, we note that the fits for the BLS and MAP rules did not differ systematically (Fig. 6a–c, insets). Therefore, the success of the BLS model cannot be attributed to the constraints inherent to our fitting procedure, but rather to its superior description of the estimator subjects adopted in this task.
Discussion
Our central finding is that humans can exploit the uncertainty associated with measurements of elapsed time to optimize their timed responses to the statistics of the intervals they encounter. This conclusion is based on the success of a Bayesian observer model that accurately captured the statistics of subjects’ production times in a simple time reproduction task.
A characteristic feature of subjects’ production times was that they were systematically biased towards the mean of the distribution of sample intervals. This observation is consistent with the ubiquitous central tendency of psychophysical responses in categorical judgment and motor production10–14. Previous work, such as the adaptation-level theory14, and range-frequency theory13 attributed these so-called range effects to subjects’ tendency to evaluate a stimulus based on its relation to the set of stimuli from which it is drawn. These theories, however, did not offer an explanation for what gives rise to such range effects in the first place, and whether or not they are of any value. In contrast, our work suggests that it is subjects’ (implicit) knowledge of their temporal uncertainty that determines the strength of the range effect. Moreover, the Bayesian account of range effects suggests that production time biases help, rather than harm, subjects’ overall performance (Supplementary Fig. S2,6). In what follows, we explain the novel aspects of our Bayesian model, and then discuss its implications for the neurobiology of interval timing.
Bayesian interval timing
Bayesian models have had great success in describing a variety of phenomena in vision and sensorimotor control15–18, as well as interval timing19, 20. Symptomatic of these models are prior-dependent biases whose magnitude increases for progressively less reliable measurements21. Motivated by the observation of such biases in our subjects’ behavior, and the success of a previous Bayesian model of coincidence timing19, we set out to formulate a Bayesian model for time reproduction.
The model consisted of three stages. The first stage emulated a noisy measurement process that quantified the probabilistic relationship between the sample intervals and the corresponding noise-perturbed measurements22. In the second stage, a Bayesian estimator computed an estimate of the sample interval from the measurement. Finally, a noisy production stage converted estimates to production times23, 24. In line with previous work on interval timing, the measurement and production noise exhibited scalar variability2, 3, 5, 7.
The estimator in the second stage of the model defines a deterministic mapping of measurements to estimates, and its functional form is determined precisely from the likelihood function, the prior distribution, and the cost (loss) function. The success of a Bayesian estimator thus depends on how well the likelihood, the prior and the cost function are constrained.
In psychophysical settings, because sensory measurements are not directly accessible, the likelihood function must be inferred from behavior and suitable assumptions about the distribution of noise. For example, cue combination studies make the reasonable assumption that measurements are perturbed by additive zero-mean Gaussian noise, and infer the width of the likelihood function from psychophysical thresholds25, 26. Alternatively, it is possible to model the likelihood function based on the uncertainty associated with external noise in the stimulus16, 27, 28. We modelled the likelihood based on the assumption that the distribution of measurements associated with a sample interval was Gaussian, was centered on the sample interval and had a standard deviation that scaled with the mean (see Methods).
To tease apart the roles of the likelihood function and the prior, it is important to be able to vary them independently. To manipulate likelihoods, one common strategy is to control factors that change psychophysical thresholds, such as varying the external noise in the stimulus16, 27. In our work, we exploited the scalar variability of timing to manipulate likelihoods. This property, which arises from internal noise only and is known to hold across tasks and species2–4 for the range of times we used10, allowed us to manipulate the likelihood function simply by changing the sample interval. To manipulate the prior independently, we collected data using three discrete uniform prior distributions. The priors were partially overlapping so that certain sample intervals were tested for two or three different priors, which enabled us to evaluate the effect of the prior independent of the likelihood function.
To convert the posterior distribution to an estimate, we needed to specify the cost function associated with the estimator. We considered two possibilities: (i) a cost function that penalizes all erroneous estimates similarly, which corresponds to the mode of the posterior (MAP), and (ii) a cost function that penalizes errors by the square of their magnitude, which corresponds to the mean of the posterior (BLS). We also considered a non-Bayesian ML estimator that ignores the prior altogether, and chooses the peak of the likelihood function for the estimate. To decide which of these estimators better described subjects’ behavior, it proved essential to consider both the bias and the variability of production times. This technique, which was originally introduced to estimate internal priors from psychophysical data22, provided a powerful constraint in the specification of the estimator’s mapping function.
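The correspondence between cost functions and posterior statistics can be verified by brute force on a discretized posterior. The posterior below is hypothetical (a scalar-noise likelihood for a 700 ms measurement, truncated to the “Intermediate” prior range with an assumed Weber fraction of 0.1) and serves only to illustrate the two rules.

```python
import numpy as np

# Hypothetical skewed posterior over candidate intervals (ms), for illustration only.
ts = np.linspace(671.0, 1023.0, 353)  # 1 ms grid over the prior's domain
post = np.exp(-0.5 * ((ts - 700.0) / (0.1 * ts)) ** 2) / ts
post /= post.sum()  # normalize to a discrete distribution

# Expected squared-error loss for every candidate estimate te on the grid.
expected_sq_loss = np.array([np.sum(post * (te - ts) ** 2) for te in ts])

te_bls = ts[np.argmin(expected_sq_loss)]  # minimizer of the squared-error loss
te_map = ts[np.argmax(post)]              # mode: minimizer of the all-or-none loss
posterior_mean = np.sum(post * ts)

print(te_map, te_bls, posterior_mean)
```

The grid minimizer of the squared-error loss falls within half a grid step of the posterior mean, while the mode sits closer to the measurement. The gap between the two arises here because the prior truncates the posterior asymmetrically.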
We used our three-stage model to estimate the measurement and production Weber fractions, and to decide which of the three mapping rules (ML, MAP, or BLS) better captured production times29. The ML estimator clearly failed to capture the pattern of prior-dependent biases evident in every subject’s production times, as expected from any estimator that neglects the prior. By incorporating the prior, both the MAP and BLS estimators exhibited contextual biases, but the BLS consistently outperformed the MAP model in explaining the trade off between the trial-to-trial variability and bias across our subjects (Fig. 6b,c). It is important to emphasize that, had we ignored the trial-to-trial variability, both BLS and MAP as well as a variety of other Bayesian models could have accounted for the prior-dependent biases in our data.
We also considered variants of the BLS model in which either the measurement or the production noise was modelled as Gaussian with a fixed standard deviation (not scalar). Overall, our original model outperformed these alternatives (Supplementary Table S1) because the measurement and production Weber fractions played relatively independent roles in controlling the increasing bias and variance of production times with sample interval (Supplementary Fig. S5). The degrading effect of formulating noise with a fixed standard deviation was more severe for the measurement stage than it was for the production stage (Supplementary Table S1).
Despite the success of our modelling exercise, further validation is required to substantiate the role of a BLS mapping in interval timing. Four considerations deserve scrutiny. First, formulation of the likelihood function might take into account factors other than scalar variability that could alter measurement noise. For example, task difficulty or reinforcement schedule (Supplementary Fig. S3) could motivate subjects to pay more attention to certain intervals, and to measure them more reliably, which could in turn strengthen the role of the likelihood function relative to the prior. Therefore, it is important to consider attention and other related cognitive factors as an integral part of how the nervous system could balance the relative effects of the likelihood function and the prior. Second, knowledge of the prior is itself subject to uncertainty, and the internalized prior distribution may differ from the one imposed experimentally. Third, the feedback subjects receive is likely to interact with the mapping rule they adopt. Our feedback schedule did not encourage the use of the BLS rule, but we cannot rule out the possibility that it influenced subjects’ behavior. Fourth, although the operation of a Bayesian estimator is formulated deterministically, its neural implementation is likely subject to biological noise. These different sources of variability must be parsed out before the estimator can be characterized definitively. These considerations, which concern all Bayesian models of psychophysical data, highlight the gap between ‘normative’ descriptions and their biological implementation.
We referred to our model as a Bayesian observer, and not a Bayesian observer-actor, because our formulation was only concerned with making optimal estimates. But since the full task of the observer was to reproduce those estimated intervals, we can formulate a Bayesian observer-actor whose objective is to directly optimize production times, and not the intervening estimates. This model has to take into account both the measurement and production uncertainty and integrate them with the prior to compute the probability of every possible pair of sample and production intervals. It would then use this joint posterior to minimize the cost of producing erroneous intervals. The derivations associated with the Bayesian observer-actor model are more involved and beyond the scope of the present work. Yet, we note that under suitable assumptions, the two models would behave similarly (see Methods).
Context-dependent central timing
Our finding suggests that the brain takes into account knowledge of temporal uncertainty and adapts its time keeping mechanisms to temporal statistics in the environment. What neural computations may lead to such sophisticated behavior? One possibility is that the brain implements a formal Bayesian algorithm. For example, populations of neurons might maintain an internal representation of the prior distribution and the likelihood function, multiply them to represent a posterior and produce an estimate by approximating its expectation. Related variants of this scheme are also conceivable. For instance, our results could be accommodated by an ML strategy if the prior would exert its effect indirectly by changing the statistics of noise associated with measurements. Another more attractive possibility that obviates the need for explicit representations of the likelihood function and the prior is for the brain to learn the sensorimotor transformation that would map measurements onto their corresponding Bayesian estimates directly30. This is what our observer model exemplifies: it establishes a deterministic nonlinear mapping function to directly transform measurements to estimates. Evidently, this form of learning must incorporate knowledge about (1) scalar variability, and (2) prior distribution.
Electrophysiological recordings from sensorimotor structures in monkeys have described computations akin to those our observer model utilizes. For instance, parietal association regions and subcortical neurons in Caudate have been shown to reflect flexible sensorimotor associations31, 32. The time course of activity across sensorimotor neurons is believed to represent sensory evidence33, its integration with the prior information34, and the preparatory signals in anticipation of instructed and self-generated action35–37. The importance of sensorimotor structures in time reproduction is further reinforced by their consistent activation in human neuroimaging studies that involve time sensitive computations38–41.
A variety of models have been proposed to explain the perception and use of an interval of time. Information theoretic models attribute the sense of time to the accumulation of ticks from a central clock11, 42, 43; physiological studies have noted a general role for rising neural activity for tracking elapsed time in the brain36, 37, 44–48; and biophysical models have been developed that suggest that time may be represented through the dynamics of neuronal networks49. Our work, which does not commit to a specific neural implementation, suggests that the internal sense of elapsed time in the sub-second to seconds range may arise from a plastic sensorimotor process that enables us to operate efficiently in different temporal contexts.
Methods
Psychophysical procedures
Six human subjects aged 19 to 40 yr participated in this study after giving informed consent. All had normal or corrected-to-normal vision, and all were naïve to the purpose of the experiment. Subjects viewed all stimuli binocularly from a distance of 52 cm on a 17-inch iiyama AS4311U LCD monitor at a resolution of 1024×768 driven by an Intel Macintosh G5 computer at a refresh rate of 85 Hz in a dark, quiet room.
In a “Ready-Set-Go” time reproduction task, subjects measured certain sample intervals demarcated by a pair of flashed stimuli, and reproduced those intervals by producing time-sensitive manual responses. Each trial began with the presentation of a central fixation point (FP) for 1 s, followed by the presentation of a warning stimulus at a variable distance to the left of the FP. After a variable delay ranging from 0.25 to 0.85 s drawn randomly from a truncated exponential distribution, two 100 ms flashes separated by the sample interval, ts, were presented. The first flash, which signified the “Ready” stimulus, was presented at the same distance as the warning stimulus but to the right of the FP. The “Set” stimulus was presented ts ms afterwards and 5 deg above the FP (Fig. 1a). Subjects were instructed to measure and reproduce the sample interval by pressing the space bar on the keyboard ts ms after the presentation of the “Set”. Production times, tp, were measured from the center of the “Set” flash (i.e. 50 ms after its onset) to when the key was pressed. When tp was sufficiently close to ts, the warning stimulus changed from white to green to provide positive feedback and encourage stable performance.
All stimuli were circular in shape and were presented on a dark grey background. Except for FP that subtended 0.5 deg of visual angle, all other stimuli were 1.5 deg. To ensure that subjects could not use the layout of the stimuli to adopt a spatial strategy for the time reproduction task (e.g. track an imaginary moving target), we varied the distance of the “Ready” and the warning stimulus from the FP on each trial (range 7.5 to 12.5 deg).
For each subject, three experimental conditions were tested separately. These conditions were the same in all respects except that, for each condition, the sample intervals were drawn from a different prior probability distribution. All priors were discrete Uniform distributions with 11 values, ranging from 494 to 847 ms for the “Short”, 671 to 1023 ms for the “Intermediate”, and 847 to 1200 ms for “Long” prior condition. Note that to help tease apart the effects of prior condition from sample interval, the priors were chosen to be partially overlapping.
For each subject, the order in which the three prior conditions were tested was randomized. For each prior condition, subjects were tested after they completed an initial learning stage. Learning was considered complete when the variance and bias of the production times had stabilized (less than 10% change between sessions). The main data for each prior condition were collected in two sessions after learning for that condition was complete. Learning for each subsequent prior condition started after testing for the preceding prior condition was completed. For 5 out of 6 subjects, the learning was completed by the end of the first session (less than 10% change between first and second sessions). For one subject, learning of the first prior condition was completed after 4 sessions. For this subject, the 5th and 6th sessions provided data for the first prior condition. For the other two prior conditions, similar to other subjects, responses stabilized after one practice session. All subjects typically participated in 3 sessions per week, and each session lasted ~45 minutes (i.e. nearly 500 trials).
Subjects received positive feedback for responses that fell within a specified window around ts (i.e., “correct” trials). To compensate for the increased difficulty associated with longer sample intervals – a natural consequence of scalar timing variability2–4 – the width of this window was scaled with the sample interval with a constant of proportionality, k. To ensure that the performance was comparable across different prior conditions, the value of k was controlled by an adaptive one-up one-down procedure which added/subtracted 0.015 to/from k for each “miss”/“correct” trial. As such, every subject’s performance for every session yielded approximately 50% positively reinforced trials (mean = 51.7%; std=1.33%). For each prior condition, the maximum (minimum) number of “correct” trials corresponded to the intermediate (extreme) sample intervals (Supplementary Fig. S3).
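The adaptive feedback rule can be illustrated with a short simulation. The sketch below is not the experimental code. It assumes a symmetric window (a response is “correct” when |tp − ts| < k·ts), equally spaced sample intervals, and a hypothetical unbiased subject with a coefficient of variation of 0.1; only the step size of 0.015 and the range of the “Intermediate” prior are taken from the text.

```python
import random

STEP = 0.015  # change in k after each trial, as stated in the text


def update_k(k, correct, step=STEP):
    """One-up one-down rule: narrow the window after a hit, widen it after a miss."""
    return k - step if correct else k + step


def is_correct(tp, ts, k):
    """Assumed window shape: correct if |tp - ts| < k * ts."""
    return abs(tp - ts) < k * ts


def simulate_session(n_trials=2000, k0=0.2, cv=0.1, seed=0):
    """Simulate a hypothetical subject whose production times scatter around ts."""
    rng = random.Random(seed)
    # "Intermediate" prior in ms; equal spacing of the 11 values is an assumption.
    intervals = [671 + i * (1023 - 671) / 10 for i in range(11)]
    k, hits = k0, 0
    for _ in range(n_trials):
        ts = rng.choice(intervals)
        tp = rng.gauss(ts, cv * ts)  # scalar noise, no bias (simplifying assumption)
        correct = is_correct(tp, ts, k)
        hits += correct
        k = update_k(k, correct)
    return hits / n_trials, k


hit_rate, final_k = simulate_session()
print(f"hit rate = {hit_rate:.3f}, final k = {final_k:.3f}")
```

Because k moves by the same amount after a hit and after a miss, the numbers of hits and misses can differ only by the net change in k divided by the step size, which is why the proportion of reinforced trials settles near 50%.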
The Bayesian estimator
The noise distribution associated with the measurement stage of the model determines the distribution of tm for a given ts, p(tm|ts). From the perspective of the observer who makes a measurement tm but does not know ts, this relationship becomes a function of ts known as the likelihood function, λtm(ts) ≡ p(tm|ts), in which tm is fixed. We modelled p(tm|ts) as a Gaussian distribution with mean ts and a standard deviation, wm·ts, that scaled linearly with ts (scalar variability) with a constant coefficient of variation, wm:
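$$p(t_m \mid t_s) = \frac{1}{\sqrt{2\pi (w_m t_s)^2}} \exp\left(-\frac{(t_s - t_m)^2}{2 (w_m t_s)^2}\right) \tag{1}$$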
Similarly, the production noise was assumed to be Gaussian with zero mean and a constant coefficient of variation, wp, so that p(tp|te) is centered on te with standard deviation wp·te:
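$$p(t_p \mid t_e) = \frac{1}{\sqrt{2\pi (w_p t_e)^2}} \exp\left(-\frac{(t_p - t_e)^2}{2 (w_p t_e)^2}\right) \tag{2}$$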
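To make the scalar property of the two noise stages concrete, the following sketch draws samples from Gaussian distributions whose standard deviation is proportional to their mean and reports the resulting coefficient of variation. The function names and the values of wm and wp are illustrative assumptions, not fitted parameters.

```python
import random
import statistics


def measure(ts, wm, rng):
    """Draw a noisy measurement tm from a Gaussian with mean ts and sd wm * ts."""
    return rng.gauss(ts, wm * ts)


def produce(te, wp, rng):
    """Draw a noisy production time tp from a Gaussian with mean te and sd wp * te."""
    return rng.gauss(te, wp * te)


def coefficient_of_variation(samples):
    return statistics.pstdev(samples) / statistics.mean(samples)


rng = random.Random(1)
wm, wp = 0.10, 0.06  # illustrative values only
cvs = {}
for ts in (494.0, 847.0, 1200.0):  # ms; endpoints of the priors used in the experiment
    tm = [measure(ts, wm, rng) for _ in range(20000)]
    cvs[ts] = coefficient_of_variation(tm)
    print(f"ts = {ts:6.0f} ms  sd = {statistics.pstdev(tm):6.1f} ms  cv = {cvs[ts]:.3f}")

tp = [produce(800.0, wp, rng) for _ in range(20000)]
cv_tp = coefficient_of_variation(tp)
print(f"production stage: cv = {cv_tp:.3f}")
```

The standard deviation grows with the interval while the coefficient of variation stays close to the chosen weight, which is the signature of scalar variability.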
To simplify derivations, we modelled the discrete uniform prior distributions used in the experiment as continuous. For each prior condition, we specified the domain of sample intervals between $t_s^{\min}$ and $t_s^{\max}$, based on the minimum and maximum values used in the experiment:
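$$\pi(t_s) = \begin{cases} \dfrac{1}{t_s^{\max} - t_s^{\min}} & \text{for } t_s^{\min} \le t_s \le t_s^{\max} \\ 0 & \text{otherwise} \end{cases} \tag{3}$$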
The resulting posterior, π(ts|tm), is the product of the prior and the likelihood function, appropriately normalized:
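$$\pi(t_s \mid t_m) = \frac{\pi(t_s)\,\lambda_{t_m}(t_s)}{\int \pi(t_s)\,\lambda_{t_m}(t_s)\,dt_s} = \frac{p(t_m \mid t_s)}{\int_{t_s^{\min}}^{t_s^{\max}} p(t_m \mid t_s)\,dt_s}, \qquad t_s^{\min} \le t_s \le t_s^{\max} \tag{4}$$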
The Bayesian estimator computes a single estimate, te, from the posterior by considering an objective cost function, l(te, ts), that quantifies the cost of erroneously estimating ts as te. The Bayesian estimate minimizes the posterior expected loss, which is the integral of the cost function for each ts, weighted by its posterior probability, π(ts|tm):
$$t_e = f_l(t_m) = \arg\min_{t_e} \int l(t_e, t_s)\,\pi(t_s \mid t_m)\,dt_s \tag{5}$$
Notice that the optimal estimate, te, is a deterministic function of the measured sample, fl(tm), in which the subscript l denotes the particular cost (loss) function.
For the ML model, the estimator fML(tm) is associated with the sample interval that maximizes the likelihood function, which can be derived from Equation (1):
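$$f_{ML}(t_m) = \arg\max_{t_s} \lambda_{t_m}(t_s) = t_m \left( \frac{-1 + \sqrt{1 + 4 w_m^2}}{2 w_m^2} \right) \tag{6}$$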
The ML estimate is proportional to the measurement. For a plausible range of values of wm, the constant of proportionality is less than one, and thus the ML estimator systematically underestimates the sample. For instance, for 0.1 < wm < 0.3, the constant of proportionality varies between 0.99 and 0.92.
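The constant of proportionality can be checked by maximizing the logarithm of the likelihood directly. The derivation below is a sketch that assumes only the Gaussian likelihood with mean ts and standard deviation wm·ts described above; terms that do not depend on ts are collected in a constant.

```latex
\begin{aligned}
\log \lambda_{t_m}(t_s) &= -\log t_s - \frac{(t_s - t_m)^2}{2 w_m^2 t_s^2} + \text{const} \\
\frac{\partial}{\partial t_s} \log \lambda_{t_m}(t_s) &= -\frac{1}{t_s} - \frac{t_m (t_s - t_m)}{w_m^2 t_s^3} = 0
\;\Longrightarrow\; w_m^2 t_s^2 + t_m t_s - t_m^2 = 0 \\
t_s &= t_m \, \frac{-1 + \sqrt{1 + 4 w_m^2}}{2 w_m^2} \quad \text{(positive root)} \\
w_m = 0.1 &: \; \frac{-1 + \sqrt{1.04}}{0.02} \approx 0.990, \qquad
w_m = 0.3 : \; \frac{-1 + \sqrt{1.36}}{0.18} \approx 0.923
\end{aligned}
```

The quadratic has a single positive root, so the likelihood has a single maximum for ts > 0, and the two numerical values reproduce the range quoted in the text.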
For the MAP rule, the cost function is −δ(te−ts), where δ(.) denotes the Dirac delta function. The corresponding estimator function, fMAP(tm), is specified by the mode of the posterior as follows:
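$$f_{MAP}(t_m) = \begin{cases} t_s^{\min} & \text{for } f_{ML}(t_m) < t_s^{\min} \\ f_{ML}(t_m) & \text{for } t_s^{\min} \le f_{ML}(t_m) \le t_s^{\max} \\ t_s^{\max} & \text{for } f_{ML}(t_m) > t_s^{\max} \end{cases} \tag{7}$$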
For the BLS rule, the cost function is the squared error, (te−ts)^2, and the estimator function, fBLS(tm), corresponds to the mean of the posterior:
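$$f_{BLS}(t_m) = \int_{t_s^{\min}}^{t_s^{\max}} t_s\,\pi(t_s \mid t_m)\,dt_s = \frac{\int_{t_s^{\min}}^{t_s^{\max}} t_s\,p(t_m \mid t_s)\,dt_s}{\int_{t_s^{\min}}^{t_s^{\max}} p(t_m \mid t_s)\,dt_s} \tag{8}$$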
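The three mapping rules can be compared numerically. The sketch below is an illustration in Python rather than the original MATLAB analysis. It assumes the Gaussian likelihood with standard deviation wm·ts and a continuous uniform prior; the function names, the grid size, and the value wm = 0.1 are hypothetical choices.

```python
import math


def likelihood(tm, ts, wm):
    """p(tm | ts): Gaussian with mean ts and standard deviation wm * ts."""
    sd = wm * ts
    return math.exp(-0.5 * ((tm - ts) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def trapz(ys, dx):
    """Trapezoidal rule on an evenly spaced grid."""
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def f_ml(tm, wm):
    """Maximum-likelihood estimate: the measurement scaled by a constant below one."""
    return tm * (-1 + math.sqrt(1 + 4 * wm**2)) / (2 * wm**2)


def f_map(tm, wm, ts_min, ts_max):
    """Posterior mode: the ML estimate clipped to the support of the uniform prior."""
    return min(max(f_ml(tm, wm), ts_min), ts_max)


def f_bls(tm, wm, ts_min, ts_max, n=400):
    """Posterior mean under the uniform prior, integrated numerically."""
    dx = (ts_max - ts_min) / n
    grid = [ts_min + i * dx for i in range(n + 1)]
    like = [likelihood(tm, ts, wm) for ts in grid]
    num = trapz([ts * l for ts, l in zip(grid, like)], dx)
    return num / trapz(like, dx)


wm = 0.1                         # illustrative value, not a fitted parameter
ts_min, ts_max = 671.0, 1023.0   # "Intermediate" prior, ms
for tm in (600.0, 850.0, 1100.0):
    print(tm, round(f_ml(tm, wm), 1), round(f_map(tm, wm, ts_min, ts_max), 1),
          round(f_bls(tm, wm, ts_min, ts_max), 1))
```

For measurements outside the prior range, the MAP estimate sits exactly on the nearest boundary, whereas the BLS estimate stays strictly inside the range and varies smoothly with tm.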
The Bayesian observer model
The Bayesian estimator specifies a deterministic mapping from a measurement, tm, to an estimate, te. But our psychophysical data consist of pairs of sample interval, ts, and production time, tp. Accordingly, we augmented the estimator with a measurement stage and a production stage, which together with the estimator provide a complete characterization of the relationship between ts and tp. The model, however, relies on two intermediate variables, tm and te, that are psychophysically unobservable (i.e., hidden variables). To remove these variables from the description of the model, we took advantage of a trick common to Bayesian inference, which is to integrate out the hidden variables (i.e., marginalization). Specifically, using the chain rule, we decomposed the joint conditional distribution of the variables tm, te, and tp into three intervening conditional probabilities:
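$$p(t_p, t_e, t_m \mid t_s, w_m, w_p) = p(t_p \mid t_e, t_m, t_s, w_m, w_p)\; p(t_e \mid t_m, t_s, w_m, w_p)\; p(t_m \mid t_s, w_m, w_p) \tag{9}$$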
We used the serial architecture of our model (Fig. 3a) to simplify the dependencies in the right hand side of Equation (9). In the first term, because the conditional probability of tp is fully specified by te and wp (from Equation (2)), we can safely omit the other conditional variables (tm, ts and wm). In the second term, the only relevant conditional variable is tm since it specifies te deterministically. And for the third term, wp has no bearing on tm. Incorporating these simplifications, the joint conditional distribution can be rewritten as follows:
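$$p(t_p, t_e, t_m \mid t_s, w_m, w_p) = p(t_p \mid t_e, w_p)\; p(t_e \mid t_m)\; p(t_m \mid t_s, w_m) \tag{10}$$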
Moreover, because te is a deterministic function of tm, i.e., te = f(tm), the conditional probability p(te|tm) can be written as a Dirac delta function as follows:
$$p(t_e \mid t_m) = \delta\big(t_e - f(t_m)\big) \tag{11}$$
We can eliminate the dependence on the two hidden variables tm and te by marginalization:
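$$p(t_p \mid t_s, w_m, w_p) = \iint p(t_p \mid t_e, w_p)\,\delta\big(t_e - f(t_m)\big)\,p(t_m \mid t_s, w_m)\,dt_m\,dt_e = \int p\big(t_p \mid f(t_m), w_p\big)\,p(t_m \mid t_s, w_m)\,dt_m \tag{12}$$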
The integrand is the product of the conditional probability distributions associated with the measurement and production stages. By substituting these distributions from Equations (1) and (2), and f(tm) from Equation (6), (7), or (8) (depending on the estimator of interest), Equation (12) provides the conditional probability of tp for a given ts as a function of the model parameters, wm and wp.
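The marginalization over tm can be approximated on a grid. The sketch below is one possible implementation under stated assumptions: it uses the BLS estimator as f(tm), truncates the tm integral at ±5 measurement standard deviations around ts (so span·wm must stay below one), and uses illustrative values for wm and wp. The function names and grid sizes are hypothetical.

```python
import math


def gauss_scalar(x, mu, w):
    """Gaussian density with mean mu and standard deviation w * mu."""
    sd = w * mu
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def trapz(ys, dx):
    """Trapezoidal rule on an evenly spaced grid."""
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def f_bls(tm, wm, ts_min, ts_max, n=200):
    """Posterior mean under a uniform prior on [ts_min, ts_max]."""
    dx = (ts_max - ts_min) / n
    grid = [ts_min + i * dx for i in range(n + 1)]
    like = [gauss_scalar(tm, ts, wm) for ts in grid]
    return trapz([ts * l for ts, l in zip(grid, like)], dx) / trapz(like, dx)


def observer_density(tp_values, ts, wm, wp, ts_min, ts_max, n=200, span=5.0):
    """p(tp | ts, wm, wp) for each tp, with tm integrated out numerically."""
    lo, hi = ts * (1 - span * wm), ts * (1 + span * wm)
    dx = (hi - lo) / n
    tms = [lo + i * dx for i in range(n + 1)]
    tes = [f_bls(tm, wm, ts_min, ts_max) for tm in tms]  # te = f(tm), computed once
    p_tm = [gauss_scalar(tm, ts, wm) for tm in tms]      # measurement stage
    return [trapz([gauss_scalar(tp, te, wp) * p for te, p in zip(tes, p_tm)], dx)
            for tp in tp_values]


wm, wp = 0.10, 0.06              # illustrative values only
ts_min, ts_max = 671.0, 1023.0   # "Intermediate" prior, ms
step = 2.0
tps = [400.0 + step * i for i in range(501)]  # 400 to 1400 ms
results = {}
for ts in (671.0, 847.0, 1023.0):
    dens = observer_density(tps, ts, wm, wp, ts_min, ts_max)
    mass = trapz(dens, step)
    mean_tp = trapz([tp * d for tp, d in zip(tps, dens)], step) / mass
    results[ts] = (mass, mean_tp)
    print(f"ts = {ts:6.0f}  total probability = {mass:.3f}  mean tp = {mean_tp:.1f}")
```

The mean production time for the shortest sample lies above it and that for the longest sample lies below it, which is the regression towards the mean that the model is meant to capture.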
The Bayesian observer-actor model
The observer model described in the previous section obtains an estimate that minimizes a cost built around the estimate and the actual time interval. It was formulated to minimize the expected loss associated with erroneous estimates, not production times. A more elaborate Bayesian “observer-actor” model would seek to minimize expected loss with respect to the ensuing production times (and not the intervening estimates). This elaboration demands two considerations. First, the uncertainty associated with both the measurement and the production phases must be taken into account. As such, the relevant probability distribution would be the joint posterior of the sample interval and production time conditioned on the measurement, π(tp, ts|tm). Second, the definition of the cost function should concern the sample interval and production time; i.e., l(tp, ts). The appropriate posterior expected loss could then be minimized as follows:
$$f(t_m) = \arg\min_{t_e} \iint l(t_p, t_s)\,\pi(t_p, t_s \mid t_m)\,dt_s\,dt_p \tag{13}$$
The delta and least-squares cost functions in this optimization problem do not correspond to the mode and mean of the joint posterior, and derivation of the optimal solution is more involved and beyond the scope of this manuscript. Nonetheless, we note that the corresponding estimators for the Bayesian observer-actor are qualitatively similar to those we derived for the MAP and BLS mapping rules in our simplified Bayesian observer model.
Fitting the model to the data
We assumed that tp values associated with any ts were independent across trials, and thus expressed the joint conditional probability of individual tp values across all N trials, and across the three prior conditions, by the product of their individual conditional probabilities:
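$$p(t_p^1, t_p^2, \ldots, t_p^N \mid t_s^1, t_s^2, \ldots, t_s^N, w_m, w_p) = \prod_{i=1}^{N} p(t_p^i \mid t_s^i, w_m, w_p) \tag{14}$$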
The products change to sums by taking the logarithm of both sides:
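$$\log p(t_p^1, t_p^2, \ldots, t_p^N \mid t_s^1, t_s^2, \ldots, t_s^N, w_m, w_p) = \sum_{i=1}^{N} \log p(t_p^i \mid t_s^i, w_m, w_p) \tag{15}$$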
Each term in the sum was derived from Equation (12), after substituting f(tm) with the appropriate estimator function (Equation (6), (7) or (8)).
We used this equation to maximize the likelihood of the model parameters, wm and wp, across all ts and tp values measured psychophysically. The maximization was done using the ‘fminsearch’ command in MATLAB, which implements the Nelder-Mead downhill simplex optimization method. The integrals in Equations (8) and (12) are not analytically solvable and were thus approximated numerically using the trapezoidal rule. We evaluated the success of the fitting exercise by repeating the search with different initial values; the likelihood function near the fitted parameters was highly concave, and the fitting procedure was stable with respect to initial values.
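The fitting procedure can be sketched end to end on synthetic data. The example below is an illustration only and departs from the original analysis in several labeled ways: it is written in Python rather than MATLAB, it replaces the Nelder-Mead search with a coarse grid search over (wm, wp), it uses a single prior condition, and it fits data simulated from the BLS observer model with hypothetical parameter values. The tm integral is evaluated on a shared grid from 0.5·ts_min to 1.5·ts_max, which is an approximation.

```python
import math
import random


def gauss_scalar(x, mu, w):
    """Gaussian density with mean mu and standard deviation w * mu."""
    sd = w * mu
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def trapz(ys, dx):
    """Trapezoidal rule on an evenly spaced grid."""
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def f_bls(tm, wm, ts_min, ts_max, n=100):
    """Posterior mean under a uniform prior on [ts_min, ts_max]."""
    dx = (ts_max - ts_min) / n
    grid = [ts_min + i * dx for i in range(n + 1)]
    like = [gauss_scalar(tm, ts, wm) for ts in grid]
    return trapz([ts * l for ts, l in zip(grid, like)], dx) / trapz(like, dx)


def neg_log_likelihood(wm, wp, trials, ts_min, ts_max, n=300):
    """Negative sum over trials of log p(tp | ts, wm, wp), with tm integrated out."""
    lo, hi = 0.5 * ts_min, 1.5 * ts_max
    dx = (hi - lo) / n
    tms = [lo + i * dx for i in range(n + 1)]
    tes = [f_bls(tm, wm, ts_min, ts_max) for tm in tms]
    nll = 0.0
    for ts, tp in trials:
        integrand = [gauss_scalar(tp, te, wp) * gauss_scalar(tm, ts, wm)
                     for tm, te in zip(tms, tes)]
        nll -= math.log(max(trapz(integrand, dx), 1e-300))  # guard against log(0)
    return nll


def simulate(n_trials, wm, wp, ts_min, ts_max, seed=0):
    """Generate (ts, tp) pairs from the BLS observer model (synthetic data)."""
    rng = random.Random(seed)
    values = [ts_min + i * (ts_max - ts_min) / 10 for i in range(11)]  # equal spacing assumed
    trials = []
    for _ in range(n_trials):
        ts = rng.choice(values)
        tm = rng.gauss(ts, wm * ts)
        te = f_bls(tm, wm, ts_min, ts_max)
        trials.append((ts, rng.gauss(te, wp * te)))
    return trials


ts_min, ts_max = 671.0, 1023.0  # "Intermediate" prior, ms
trials = simulate(200, wm=0.10, wp=0.06, ts_min=ts_min, ts_max=ts_max)
grid = [(wm, wp) for wm in (0.05, 0.10, 0.15) for wp in (0.03, 0.06, 0.09)]
scores = {pair: neg_log_likelihood(*pair, trials, ts_min, ts_max) for pair in grid}
best = min(scores, key=scores.get)
print("lowest negative log-likelihood on the grid at (wm, wp) =", best)
```

A simplex or gradient-based optimizer would replace the grid in practice; the grid is used here only to keep the sketch free of external dependencies.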
Acknowledgments
This work was supported by a fellowship from the Helen Hay Whitney Foundation, HHMI, and research grants EY11378 and RR000166 from the NIH. We are grateful to G. Horwitz (G.H.) for sharing resources and to G.H. and V. de Lafuente for their feedback on the manuscript.
Footnotes
Contributions
M.J. designed the experiment, collected and analyzed the data and performed the computational modelling. M.N.S. helped in data analysis and provided intellectual support throughout the study. M.J. and M.N.S. wrote the manuscript.
References
1. Mauk MD, Buonomano DV. The neural basis of temporal processing. Annu Rev Neurosci. 2004;27:307–340. doi: 10.1146/annurev.neuro.27.070203.144247.
2. Gallistel CR, Gibbon J. Time, rate, and conditioning. Psychol Rev. 2000;107:289–344. doi: 10.1037/0033-295x.107.2.289.
3. Rakitin BC, et al. Scalar expectancy theory and peak-interval timing in humans. J Exp Psychol Anim Behav Process. 1998;24:15–33. doi: 10.1037//0097-7403.24.1.15.
4. Brannon EM, Libertus ME, Meck WH, Woldorff MG. Electrophysiological measures of time processing in infant and adult brains: Weber’s Law holds. J Cogn Neurosci. 2008;20:193–203. doi: 10.1162/jocn.2008.20016.
5. Gibbon J, Church RM. Comparison of variance and covariance patterns in parallel and serial theories of timing. J Exp Anal Behav. 1992;57:393–406. doi: 10.1901/jeab.1992.57-393.
6. Reutimann J, Yakovlev V, Fusi S, Senn W. Climbing neuronal activity as an event-based cortical representation of time. J Neurosci. 2004;24:3295–3303. doi: 10.1523/JNEUROSCI.4098-03.2004.
7. Matell MS, Meck WH. Cortico-striatal circuits and interval timing: coincidence detection of oscillatory processes. Brain Res Cogn Brain Res. 2004;21:139–170. doi: 10.1016/j.cogbrainres.2004.06.012.
8. Ahrens M, Sahani M. Inferring Elapsed Time from Stochastic Neural Processes. In: Platt JC, Koller D, Singer Y, Roweis S, editors. Advances in Neural Information Processing Systems. MIT Press; 2008.
9. Casella G, Berger RL. Duxbury Resource Center; Pacific Grove, CA: 2002.
10. Lewis PA, Miall RC. The precision of temporal judgement: milliseconds, many minutes, and beyond. Philos Trans R Soc Lond B Biol Sci. 2009;364:1897–1905. doi: 10.1098/rstb.2009.0020.
11. Treisman M. Temporal discrimination and the indifference interval. Implications for a model of the “internal clock”. Psychol Monogr. 1963;77:1–31. doi: 10.1037/h0093864.
12. Hollingworth HL. The central tendency of judgement. Arch Psychol. 1913;4:44–52.
13. Parducci A. Category judgment: a range-frequency model. Psychol Rev. 1965;72:407–418. doi: 10.1037/h0022602.
14. Helson H. Adaptation-level as a basis for a quantitative theory of frames of reference. Psychol Rev. 1948;55:297–313. doi: 10.1037/h0056721.
15. Kersten D, Mamassian P, Yuille A. Object perception as Bayesian inference. Annu Rev Psychol. 2004;55:271–304. doi: 10.1146/annurev.psych.55.090902.142005.
16. Kording KP, Wolpert DM. Bayesian integration in sensorimotor learning. Nature. 2004;427:244–247. doi: 10.1038/nature02169.
17. Knill DC, Richards W. Perception as Bayesian Inference. Cambridge University Press; Cambridge: 1996.
18. Mamassian P, Landy MS, Maloney LT. Bayesian modelling of visual perception. In: RROB, LM, editors. Probabilistic Models of the Brain: Perception and Neural Function. MIT Press; Cambridge, MA: 2002. pp. 239–286.
19. Miyazaki M, Nozaki D, Nakajima Y. Testing Bayesian models of human coincidence timing. J Neurophysiol. 2005;94:395–399. doi: 10.1152/jn.01168.2004.
20. Hudson TE, Maloney LT, Landy MS. Optimal compensation for temporal uncertainty in movement planning. PLoS Comput Biol. 2008;4:e1000130. doi: 10.1371/journal.pcbi.1000130.
21. Bernardo JM, Smith AFM. Bayesian Theory. Wiley; New York: 1994.
22. Stocker AA, Simoncelli EP. Noise characteristics and prior expectations in human visual speed perception. Nat Neurosci. 2006;9:578–585. doi: 10.1038/nn1669.
23. Trommershauser J, Maloney LT, Landy MS. Statistical decision theory and the selection of rapid, goal-directed movements. J Opt Soc Am A Opt Image Sci Vis. 2003;20:1419–1433. doi: 10.1364/josaa.20.001419.
24. Mamassian P. Overconfidence in an objective anticipatory motor task. Psychol Sci. 2008;19:601–606. doi: 10.1111/j.1467-9280.2008.02129.x.
25. Ernst MO, Banks MS. Humans integrate visual and haptic information in a statistically optimal fashion. Nature. 2002;415:429–433. doi: 10.1038/415429a.
26. Jacobs RA. Optimal integration of texture and motion cues to depth. Vision Res. 1999;39:3621–3629. doi: 10.1016/s0042-6989(99)00088-7.
27. Tassinari H, Hudson TE, Landy MS. Combining priors and noisy visual cues in a rapid pointing task. J Neurosci. 2006;26:10154–10163. doi: 10.1523/JNEUROSCI.2779-06.2006.
28. Graf EW, Warren PA, Maloney LT. Explicit estimation of visual uncertainty in human motion processing. Vision Res. 2005;45:3050–3059. doi: 10.1016/j.visres.2005.08.007.
29. Kording KP, Wolpert DM. The loss function of sensorimotor learning. Proc Natl Acad Sci U S A. 2004;101:9839–9842. doi: 10.1073/pnas.0308394101.
30. Raphan M, Simoncelli EP. Neural Information Processing Systems. MIT Press; 2006. Learning to be Bayesian without Supervision; pp. 1145–1152.
31. Toth LJ, Assad JA. Dynamic coding of behaviourally relevant stimuli in parietal cortex. Nature. 2002;415:165–168. doi: 10.1038/415165a.
32. Lauwereyns J, et al. Feature-based anticipation of cues that predict reward in monkey caudate nucleus. Neuron. 2002;33:463–473. doi: 10.1016/s0896-6273(02)00571-8.
33. Shadlen MN, Newsome WT. Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J Neurophysiol. 2001;86:1916–1936. doi: 10.1152/jn.2001.86.4.1916.
34. Gold JI, Law CT, Connolly P, Bennur S. The relative influences of priors and sensory evidence on an oculomotor decision variable during perceptual learning. J Neurophysiol. 2008;100:2653–2668. doi: 10.1152/jn.90629.2008.
35. Janssen P, Shadlen MN. A representation of the hazard rate of elapsed time in macaque area LIP. Nat Neurosci. 2005;8:234–241. doi: 10.1038/nn1386.
36. Maimon G, Assad JA. A cognitive signal for the proactive timing of action in macaque LIP. Nat Neurosci. 2006;9:948–955. doi: 10.1038/nn1716.
37. Schultz W, Romo R. Role of primate basal ganglia and frontal cortex in the internal generation of movements. I. Preparatory activity in the anterior striatum. Exp Brain Res. 1992;91:363–384. doi: 10.1007/BF00227834.
38. Meck WH, Penney TB, Pouthas V. Cortico-striatal representation of time in animals and humans. Curr Opin Neurobiol. 2008;18:145–152. doi: 10.1016/j.conb.2008.08.002.
39. Cui X, Stetson C, Montague PR, Eagleman DM. Ready...go: Amplitude of the FMRI signal encodes expectation of cue arrival time. PLoS Biol. 2009;7:e1000167. doi: 10.1371/journal.pbio.1000167.
40. Nobre A, Correa A, Coull J. The hazards of time. Curr Opin Neurobiol. 2007;17:465–470. doi: 10.1016/j.conb.2007.07.006.
41. Rao SM, Mayer AR, Harrington DL. The evolution of brain activation during temporal processing. Nat Neurosci. 2001;4:317–323. doi: 10.1038/85191.
42. Allan LG. Perception of Time. Perception & Psychophysics. 1979;26:340–354.
43. Creelman CD. Human Discrimination of Auditory Duration. Journal of the Acoustical Society of America. 1962;34:582.
44. Lee IH, Assad JA. Putaminal activity for simple reactions or self-timed movements. J Neurophysiol. 2003;89:2528–2537. doi: 10.1152/jn.01055.2002.
45. Mita A, Mushiake H, Shima K, Matsuzaka Y, Tanji J. Interval time coding by neurons in the presupplementary and supplementary motor areas. Nat Neurosci. 2009;12:502–507. doi: 10.1038/nn.2272.
46. Okano K, Tanji J. Neuronal activities in the primate motor fields of the agranular frontal cortex preceding visually triggered and self-paced movement. Exp Brain Res. 1987;66:155–166. doi: 10.1007/BF00236211.
47. Tanaka M. Cognitive signals in the primate motor thalamus predict saccade timing. J Neurosci. 2007;27:12109–12118. doi: 10.1523/JNEUROSCI.1873-07.2007.
48. Tanaka M. Inactivation of the central thalamus delays self-timed saccades. Nat Neurosci. 2006;9:20–22. doi: 10.1038/nn1617.
49. Buonomano DV, Maass W. State-dependent computations: spatiotemporal processing in cortical networks. Nat Rev Neurosci. 2009;10:113–125. doi: 10.1038/nrn2558.