Abstract
Several experimental studies claim to be able to predict the outcome of simple decisions from brain signals measured before subjects are aware of their decision. Often, these studies use multivariate pattern recognition methods with the underlying assumption that the ability to classify the brain signal is equivalent to predicting the decision itself. Here we show instead that it is possible to correctly classify a signal even if it does not contain any predictive information about the decision. We first define a simple stochastic model that mimics the random decision process between two equivalent alternatives, and generate a large number of independent trials that contain no choice-predictive information. The trials are first time-locked to the time point of the final event and then classified using standard machine-learning techniques. The resulting classification accuracy is above chance level long before the time point of time-locking. We then analyze the same trials using information theory. We demonstrate that the high classification accuracy is a consequence of time-locking and that its time behavior is simply related to the large relaxation time of the process. We conclude that when time-locking is a crucial step in the analysis of neural activity patterns, both the emergence and the timing of the classification accuracy are affected by structural properties of the network that generates the signal.
The subjective feeling of consciously taking free decisions is questioned in several studies1,2,3,4,5,6 inspired by the seminal works of Libet7,8. These works1,2,3,4,5,6,7,8 have tried to relate the onset of neural activity preceding a voluntary action to the time of decision. In recent versions of these experimental works, subjects were asked to freely decide to press a right or left button2,3 while their brain activity was recorded with functional magnetic resonance imaging (fMRI). A linear support vector machine (SVM), a multivariate pattern recognition method often used to classify imaging signals6,9,10, was trained to classify left or right button-press trials. The resulting classification accuracy was found to be above chance several seconds before the time point of conscious decision2,3, both in the frontopolar (BA10) and in the parietal cortex. Analogous results were obtained also in the case of complex free decisions4 and supported by intracranial studies6. The emergence of these so-called choice-predictive signals4 before awareness has sparked a heated debate11,12 within the field of neuroscience, and in research fields concerned with moral responsibility and legal culpability when the decision process occurs beyond conscious control13. These findings are interpreted as evidence that specific brain activity patterns contain information about the upcoming decision even before subjective awareness2,11,14. At the basis of this interpretation lies the implicit assumption that the time course of the classification accuracy is exclusively affected by information about future choices. However, classifiers are not predictive models. To clarify the difference between classification and prediction, we introduce a conceptual model that generates mutually exclusive, independent “left” and “right” trials that are first classified by means of an SVM and then analyzed with information-theoretic methods.
The model developed here relates to the so-called WWW model, where intentional control of actions involves three main components15: The “what” component accounts for what type of action is going to be taken, e.g., pressing the left or right button2,3,4; the “when” and “whether” components are instead related to the timing of the action and the final event of the action, respectively. Traditionally, the readiness potential16 (RP) was considered to be predictive of when to move7,8 and to be related to the conscious intention to move12. The origin of the RP was recently investigated using a modeling approach17,18 based on a variant of the classic drift-diffusion model19. The RP was also used in a more recent study20 about vetoing voluntary decisions and shown to be a necessary but not sufficient condition for movement to take place. This is in line with those studies claiming that the RP may be a highly reproducible accident only fortuitously related to movement18.
In this study we introduce a model that contains the three WWW components, contains signals that are necessary but not sufficient for the final event “left” or “right” to take place, and is built in such a way that only predictions at chance level, i.e., 50%, are possible. As we shall see, after time-locking the trials to the time point of the final event, the classification accuracy will rise well above chance level long before the end of the trials. We will show that this happens despite the fact that the analyzed signal does not contain any choice-predictive information.
Materials and Methods
Model setup
As a metaphor of the decision process, our model describes a random walker in a unidimensional room, i.e., on a line. Time and space are discrete and the walker jumps either to the right or to the left with equal probability at each time step (Fig. 1a). On each of the two opposite walls, left and right, there is one light that the walker tries to switch on by pressing a button every time it reaches the wall (Fig. 1a). The two buttons, however, are connected to power only at random times and are not synchronized either with each other or with anything else. The random process that turns the power on and off accounts for an independent veto process that aborts a decision, preventing it from becoming action.
Figure 1. Scheme of the model, data generation, and time-locked average.
(a) The rules of the model. A random walker moves by discrete, equally sized steps with equal probability to the left or to the right in a room (I). During the walk, the walker eventually reaches one of the buttons placed on the right and left walls and presses it without interrupting its walk. A switch turns on and off randomly, and independently of the position of the walker. Four types of events are possible: either the walker reaches one of the lights (left or right) when the random switch is off and the light stays off (cases II and III), or the switch is on and the light turns on (cases IV and V). (b) The black line shows a short piece of a trajectory in which the “right” (blue) light on the right wall was successfully turned on. The N positions, colored in light blue, preceding the successful event constitute a “right” trial. (c) Averaging over many independent trials for both lights shows that the average position of the walker gets closer to the left and right walls as time grows towards t = 0, when the trajectories are time-locked to a successful light-on event.
The power circuit is on at each discrete time step with a fixed probability and stays on just for the duration of one time unit. If by chance the walker presses the button when the power is on, then the light will shine for just one time step. Otherwise, the light will stay off. The epoch of length N of the walker’s trajectory prior to a light flash corresponds to one trial (Fig. 1b). Here, we considered the limit case in which the time between two consecutive “power on” events is much longer than the time needed by the walker to visit the whole room uniformly. This last requirement is equivalent to asking subjects in an experiment to avoid correlations between consecutive button-press events.
To understand the effect of topology and of the intrinsic time scales, we have considered both rooms of different sizes (i.e., different distances between the walls) and a version of the model in which the walker can jump from any position to any other position at any step with equal probability. As we shall see, both variants are crucial for interpreting the result of the classification. An intermediate variant of the model is considered in the Supplementary materials.
Data simulations
In order to implement the metaphor discussed above quantitatively and to perform the simulations of the walker in the room (Fig. 1a), we have considered a stationary random walk process on a line with n + 2 positions {0, 1, …, n, n + 1}. One “left” trial is then generated as follows. (i) A very long time series of the stationary random walk is first generated in such a way that every state is visited a large number of times. (ii) Then, one out of the many occurrences of the position (or state) 0 is chosen at random with equal probability and the N preceding steps of the walk are stored.
The next “left” trial is generated by starting again from a new and independent time series at stationarity. The “right” trials are generated in a similar way by randomly choosing one of the many occurrences of the state n + 1 instead. Each trial is extracted from an independent stationary time series that does not contain any information about which trial will be eventually extracted. In this way, the states 0 and n + 1 are necessary but not sufficient conditions to generate a “left” and “right” trial, respectively. Once this correspondence is determined, one can easily generalize this process to any kind of network. The easiest generalization is for the complete graph, as described below. Any other network with non-homogeneous degree distribution but symmetric with respect to 0 and n + 1 delivers the same qualitative results (Supplementary materials).
With this setup, we have generated statistically independent trials simulating a random walk in discrete time on two types of networks, a linear chain and a complete graph (Fig. 2). The linear chain simulates the room contained within two walls. The complete graph, instead, is a topology that guarantees the possibility for the walker to jump with equal probability to any position in just one step. At steady state, the random walk visits all states {0, 1, …, n, n + 1} with uniform probability. Thus, the states 0 and n + 1 are two boundary states and the lights can flash only when the walker is in one of these states. As already mentioned, visiting the boundary is a necessary condition for the light to go on, but it is not sufficient. We interpret this both as the effect of a veto that can independently inhibit the light from shining and as a model for brain signals that are necessary but not sufficient for an event to happen.
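The trial-generation procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (the original analysis used Matlab): it exploits the time-reversal property of the walk by simulating forward from the boundary state and reversing the path, so that each stored trial ends exactly at the time-locking event. The function name `make_trial`, the boundary convention (staying put with probability 1/2 at the walls, which makes the stationary distribution uniform), and all parameter choices are our assumptions.

```python
import numpy as np

def make_trial(n, N, boundary, rng):
    # Simulate N steps of the walk *starting* from the boundary state and
    # reverse the path: by time reversibility the reversed path is a valid
    # stationary trajectory that ends exactly at the time-locking event.
    path = [boundary]
    for _ in range(N):
        x = path[-1]
        if x == 0:
            nxt = rng.choice([0, 1])            # boundary: stay or step inward
        elif x == n + 1:
            nxt = rng.choice([n, n + 1])
        else:
            nxt = x + rng.choice([-1, 1])       # interior: unbiased step
        path.append(nxt)
    return np.array(path[::-1])

rng = np.random.default_rng(0)
left_trials = np.array([make_trial(5, 50, 0, rng) for _ in range(20)])   # end at state 0
right_trials = np.array([make_trial(5, 50, 6, rng) for _ in range(20)])  # end at state n+1
```

Each trial ends at its boundary state by construction, while everything before the final step is an unbiased stationary walk that carries no information about which boundary will be reached.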
Figure 2. Classification.
(a) The solid lines show the time course of the cross-validated classification accuracy time-locked to t = 0, based on the average across k = 500 realizations (i.e., participants) for a random walk on a line with n = 5 and n = 10. For both n, the accuracy decreases from 100% moving backward from t = 0, and the smaller the system size, the faster the accuracy decreases. The inset is a scheme of the linear network. To ensure a uniform distribution, the walker can jump with equal probability either left or right from each position except the boundaries. (b) Time course of classification accuracy for a random walk on a complete graph. The average accuracy remains 50% and is independent of n. The inset shows the scheme of a complete graph; here the walker can jump from any position (or state) to any other position in just one step.
To mimic experimental conditions we generated k independent sets (participants) of 2M independent trials, M called “left” and M called “right”. Both for the linear chain and for the complete graph, we have generated the trials exploiting the time reversal property of the random walk process. The same holds also for a graph with non-homogeneous degree distribution (Supplementary materials).
Support-vector classification
We used time-resolved, cross-validated linear support-vector machines (SVMs) for classification (Supplementary Figure 1). The time point of the final event is set at t = 0; all previous positions of the walker are at negative times. After time-locking the 2M trials of each of the k participants at t = 0, the cross-validation approach first subdivides the trials into independent training and test groups. To avoid classification biases, training and test sets contained an equal number of “left” and “right” trials. At each time point, the training set was used to train a support vector machine to distinguish between “left” and “right” decisions2,3,4. The obtained model was then used to classify the test set. We used a leave-one-pair-out cross-validation: each pair of the 2M “left” plus “right” trials was successively used for testing the model learned on the remaining 2M − 2 trials. The classification performance was quantified in terms of accuracy: each cross-validation iteration produced 0%, 50% or 100% depending on whether the classifier attributed 0, 1 or 2 correct labels to the pair of test trials. At each time point t, the classification accuracy a_t was given as the average percentage across all iterations (M per participant). The same analysis was repeated at each time point, thus generating a time course of accuracy a_t for each of the k participants. The result is presented in terms of the average across all simulations of the time course of accuracy (Fig. 2a,b). The actual classification was performed using the standard Matlab (The MathWorks, Inc., Natick, Massachusetts, United States) library for support vector machines21. This classifier was chosen purely because we wanted to use the same analysis techniques that are commonly employed for similar experimental data, and especially those used in the experiment with the “left” and “right” button presses2.
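The time-resolved leave-one-pair-out procedure can be sketched as follows. Since the walker's position is one-dimensional, a linear SVM reduces to a threshold on the position; the sketch below therefore uses a midpoint-between-class-means threshold as a dependency-free stand-in for the Matlab SVM library used in the paper. Function and variable names are ours.

```python
import numpy as np

def accuracy_timecourse(left, right):
    # left, right: (M, T) arrays of time-locked trials; the last column is t = 0.
    M, T = left.shape
    acc = np.zeros(T)
    for t in range(T):
        correct = 0
        for m in range(M):                      # leave one left/right pair out
            keep = np.arange(M) != m
            mu_l = left[keep, t].mean()
            mu_r = right[keep, t].mean()
            thr = 0.5 * (mu_l + mu_r)           # 1-D linear decision boundary
            if mu_l < mu_r:
                correct += int(left[m, t] < thr) + int(right[m, t] > thr)
            else:
                correct += int(left[m, t] > thr) + int(right[m, t] < thr)
        acc[t] = 100.0 * correct / (2 * M)      # 0, 50 or 100 per pair, averaged
    return acc

# demo: positions are class-specific only at t = 0 (the last column)
left = np.array([[3., 0.], [2., 0.], [4., 0.], [1., 0.]])
right = np.array([[1., 6.], [4., 6.], [2., 6.], [3., 6.]])
acc = accuracy_timecourse(left, right)
```

On trials generated by the random walk on the line, this procedure reproduces the qualitative behavior of Fig. 2a: 100% at t = 0, decaying toward 50% going backward in time.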
The random walks
In this work we have considered two versions of the random walk. A third version is discussed in the Supplementary materials. In all cases, the state space of the walk is given by the set of n + 2 states {0, 1, …, n + 1}. Technically the random walk considered here is a Markov chain in discrete time on this state space. Let Xt be the random variable that gives the state visited by the process after step t = 1, 2, …. The transition probabilities governing the behavior of the process are formally defined from the conditional probabilities
$$P_{ij} = \Pr\{X_{t+1} = j \mid X_t = i\} \qquad (1)$$
as the elements of the (n + 2) × (n + 2) dimensional transition probability matrix P. We have chosen this model only because it is the simplest conceivable model conveying our main conclusions.
Random walk on a line
The random walk on a line is the one-dimensional random walk with two boundaries. To ensure a uniform stationary distribution on this state space, the transition probability matrix Pij defined in (1) takes the following explicit form
$$P_{ij} = \begin{cases} 1/2 & \text{if } |i - j| = 1,\\ 1/2 & \text{if } i = j \in \{0,\, n+1\},\\ 0 & \text{otherwise,} \end{cases} \qquad (2)$$
which can be more explicitly written as
$$P = \begin{pmatrix} 1/2 & 1/2 & 0 & \cdots & 0 & 0\\ 1/2 & 0 & 1/2 & \cdots & 0 & 0\\ 0 & 1/2 & 0 & \ddots & \vdots & \vdots\\ \vdots & \vdots & \ddots & \ddots & 1/2 & 0\\ 0 & 0 & \cdots & 1/2 & 0 & 1/2\\ 0 & 0 & \cdots & 0 & 1/2 & 1/2 \end{pmatrix} \qquad (3)$$
For later use, we also define the t-step transition probability matrix P(t) as the t-th power of P, with elements

$$P^{(t)}_{ij} = \Pr\{X_{t+s} = j \mid X_s = i\} = (P^t)_{ij}, \qquad (4)$$

whose limit as t → ∞ gives the stationary probability distribution $\pi_j = \lim_{t \to \infty} P^{(t)}_{ij}$. Given the choice of the transition probabilities (2), the stationary probability is uniform, with πi = 1/(n + 2) for all i = 0, 1, …, n + 1.
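As a concrete check, the transition matrix of Eq. (2) can be built and its uniform stationary distribution verified numerically. The boundary self-loops (probability 1/2 of staying put at states 0 and n + 1) are our reading of the construction; they make the matrix doubly stochastic, which is what guarantees the uniform stationary distribution.

```python
import numpy as np

def line_transition_matrix(n):
    # Eq. (2): hop left/right with probability 1/2; at the two boundary
    # states the walker stays put with probability 1/2, which makes the
    # matrix doubly stochastic and the stationary distribution uniform.
    size = n + 2
    P = np.zeros((size, size))
    for i in range(size):
        if i > 0:
            P[i, i - 1] = 0.5
        if i < size - 1:
            P[i, i + 1] = 0.5
    P[0, 0] = 0.5
    P[size - 1, size - 1] = 0.5
    return P

P = line_transition_matrix(5)
pi = np.full(7, 1.0 / 7.0)   # uniform distribution over the n + 2 = 7 states
```

Row sums of 1 make P stochastic; column sums of 1 (double stochasticity) make the uniform vector a left eigenvector, i.e., the stationary distribution.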
Random walk on a complete graph
The second random walk model considered in the manuscript is the random walk on a complete graph. On this graph, the transition probabilities are given by
$$P_{ij} = \frac{1}{n + 2} \qquad (5)$$
for any choice of i and j in the state space {0, 1, …, n + 1}. The generation of the trials proceeds exactly as described previously. For the random walk on a complete graph, each row of the transition matrix coincides with the stationary probability distribution. Technically, the transition matrix (5) can be seen as the transition matrix (2) taken to an infinitely large power: each row of Eq. (5) is identical to the stationary distribution $\vec{\pi}$ of the process described by Eq. (2). This correspondence implies that if the time resolution of a measurement is long compared to the internal timescale, a process whose connectivity is, for instance, the linear chain may seem more connected than it is in reality. This correspondence does not hold for more complex networks of states with non-homogeneous degree distribution.
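The correspondence between Eqs. (2) and (5) can be verified numerically: raising the line-chain matrix to a large power reproduces the complete-graph matrix to machine precision. The sketch below assumes the same boundary convention as above (probability 1/2 of staying put at the two boundary states).

```python
import numpy as np

n = 5
size = n + 2

# line-chain matrix of Eq. (2), staying put with probability 1/2 at the boundaries
P = np.zeros((size, size))
for i in range(size):
    if i > 0:
        P[i, i - 1] = 0.5
    if i < size - 1:
        P[i, i + 1] = 0.5
P[0, 0] = P[-1, -1] = 0.5

# complete-graph matrix of Eq. (5): every entry equals 1/(n+2)
P_complete = np.full((size, size), 1.0 / size)

# a large power of the line matrix reproduces the complete-graph matrix
err = np.max(np.abs(np.linalg.matrix_power(P, 400) - P_complete))
```

This is the numerical counterpart of the statement that coarse time resolution can make a sparsely connected process look fully connected.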
Relaxation timescales
Consider the orthogonal set of eigenvectors $\vec{v}_i$, for i = 0, 1, …, n + 1, of the transition matrix P and their associated eigenvalues λi, with the property that λ0 = 1 and all other λi are strictly smaller than one in absolute value. The eigenvalue λ1 is a real number and is the closest to λ0. Let us define the vector of initial conditions
$$\vec{p}(0) = \left(p_0(0),\, p_1(0),\, \ldots,\, p_{n+1}(0)\right) \qquad (6)$$

as the vector giving the probability mass function for the stochastic variable X0,

$$p_i(0) = \Pr\{X_0 = i\}, \qquad (7)$$

and the vector

$$\vec{p}(t) = \left(p_0(t),\, p_1(t),\, \ldots,\, p_{n+1}(t)\right) \qquad (8)$$

as the vector giving the probability mass function for the variable Xt under the (implicit) condition that X0 is distributed according to $\vec{p}(0)$. Then, these two vectors are related through

$$\vec{p}(t) = \vec{p}(0)\, P^t, \qquad (9)$$
whose long time behavior gives the unique stationary probability $\vec{\pi}$ as the normalized eigenvector for the unitary eigenvalue of P,

$$\vec{\pi} = \lim_{t \to \infty} \vec{p}(t), \qquad \vec{\pi}\, P = \vec{\pi}, \qquad (10)$$

where $\vec{\pi} = (\pi_0, \pi_1, \ldots, \pi_{n+1})$. The vector $\vec{p}(0)$ can be decomposed on the orthogonal space of the eigenvectors, as

$$\vec{p}(0) = \sum_{i=0}^{n+1} c_i\, \vec{v}_i, \qquad (11)$$

with the ci being the projection of $\vec{p}(0)$ on the vector $\vec{v}_i$. Therefore, it results that

$$\vec{p}(t) = \vec{\pi} + \sum_{i=1}^{n+1} c_i\, e^{-t/\tau_i}\, \vec{v}_i, \qquad (12)$$
where we have defined the time scales τi = −1/log(λi), whose real parts are positive. Eq. (12) means that the vector $\vec{p}(t)$ approaches the stationary state $\vec{\pi}$ as t → ∞ because all factors $e^{-t/\tau_i}$ tend to zero in this limit. The largest of the λi, namely λ1, is the one that governs the long-time behavior of this limit. It is therefore customary to associate a relaxation time scale τ1 to each Markov chain with a unique stationary state. The relaxation time scale is related to the largest eigenvalue λ1 smaller than unity of the transition matrix P as

$$\tau_1 = -\frac{1}{\log \lambda_1}, \qquad (13)$$
whose meaning is that τ1 gives a lower estimate of the time scale needed to cover the state space according to the stationary probability distribution. For the random walk on the line, λ1 grows with the number of states n and gets closer and closer to the value 1, so that the time scale τ1 becomes larger and larger (Fig. 3c). The short-time behavior of the system is however dominated by all involved time scales τi. A consequence of this discussion is that whenever a functional depends on the elements of the t-step matrix P(t), it depends on the factors $e^{-t/\tau_i}$ and therefore on the relaxation time scales τi. The relaxation time scale for the complete graph is again given by the inverse of the logarithm of the largest non-trivial eigenvalue. The relaxation time for the random walk on a complete graph does not depend on the size of the system and is virtually zero (Fig. 3c). This means that the process reaches the steady state after just one step, because the one-step transition probability matrix in Eq. (5) is already the matrix of the stationary state. For more complex networks that interpolate between the line and the complete graph, λ1 would depend non-trivially on both the topology of the network and its size. The analysis of time scales presented here is very similar to the one applied in the context of neural networks22.
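A short numerical check of Eq. (13): computing λ1 from the spectrum of P shows that τ1 grows with n for the line and vanishes for the complete graph, in agreement with Fig. 3c. The helper names and the boundary convention of the line matrix are our assumptions.

```python
import numpy as np

def line_P(n):
    # line-chain matrix of Eq. (2), staying put with prob. 1/2 at the boundaries
    size = n + 2
    P = np.zeros((size, size))
    for i in range(size):
        if i > 0:
            P[i, i - 1] = 0.5
        if i < size - 1:
            P[i, i + 1] = 0.5
    P[0, 0] = P[-1, -1] = 0.5
    return P

def relaxation_time(P):
    # Eq. (13): tau_1 = -1/log(lambda_1), with lambda_1 the largest
    # eigenvalue of P below one in absolute value
    lam = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    lam1 = lam[1]
    return 0.0 if lam1 < 1e-12 else -1.0 / np.log(lam1)

tau5 = relaxation_time(line_P(5))                        # line, n = 5
tau10 = relaxation_time(line_P(10))                      # line, n = 10: slower
tau_complete = relaxation_time(np.full((7, 7), 1.0 / 7.0))  # complete graph
```

For the complete graph all non-trivial eigenvalues vanish, so the relaxation time is zero: the walk equilibrates in a single step, exactly as stated in the text.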
Figure 3. Distribution of walker’s position, mutual information and relaxation time.
(a) The distribution of the walker’s position changes from uniform to bimodal while the walker approaches the end point of the trials. At t = 0 the walker necessarily visits either 0 or (n + 1). The rate at which the distribution becomes uniform going backward in time from the time point t = 0 is related to the relaxation time of the stochastic process (Materials and Methods). (b) The time-locked mutual information I0 decreases moving backward in time from t = 0. The decrease is faster for a linear chain of smaller size because of the shorter relaxation time. (c) For a random walk on a line the relaxation time increases with the number of states n. For a random walk on a complete graph the relaxation time is virtually zero and independent of the number of states. Nevertheless, the procedure to generate the “left” and “right” trials is the same for both.
Time reversibility
For a process at stationarity, the time-reversed transition matrix is defined as

$$P^{(-)}_{ij} = \Pr{}_{\pi}\{X_t = j \mid X_{t+1} = i\}, \qquad (14)$$

where $\Pr_{\pi}$ is a shorthand for X0 being chosen according to the stationary probability mass function $\vec{\pi}$. The reversed matrix can be rearranged using the definition of conditional probabilities as

$$P^{(-)}_{ij} = \frac{\pi_j\, P_{ji}}{\pi_i}, \qquad (15)$$
where Pij is defined in (1). For both random walk models considered here, the matrix P(−) coincides with the forward matrix P. In general, however, when the network of states has loops and cycles, the time reversibility property does not hold and the simulation of the process backward in time requires the use of Eq. (15).
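Eq. (15) can be implemented in a few lines; for the line chain the resulting P(−) indeed coincides with P, confirming the reversibility claim. The function name and the eigenvector-based computation of π are ours.

```python
import numpy as np

def time_reversed(P):
    # Eq. (15): P(-)_ij = pi_j * P_ji / pi_i, with pi the stationary
    # distribution, obtained here as the left eigenvector of P for eigenvalue 1
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    return (pi[None, :] * P.T) / pi[:, None]

# line-chain matrix of Eq. (2): the walk is reversible, so P(-) must equal P
n = 5
P = np.zeros((n + 2, n + 2))
for i in range(n + 2):
    if i > 0:
        P[i, i - 1] = 0.5
    if i < n + 1:
        P[i, i + 1] = 0.5
P[0, 0] = P[-1, -1] = 0.5
```

For a network with loops and cycles the returned matrix would differ from P, and backward simulation would have to use it explicitly, as the text notes.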
Information theory
For a stochastic variable Z, which we assume here to take values in a countable set σ, we denote with H(Z) its Shannon entropy, defined as
$$H(Z) = -\sum_{z \in \sigma} \Pr\{Z = z\}\, \log_2 \Pr\{Z = z\}, \qquad (16)$$
with Pr(⋅) being the probability mass function associated to the stochastic variable Z, i.e. Pr(z) = Pr{Z = z} for any z ∈ σ. In the following we will simplify the notation and write
$$H(Z) = -\sum_{z \in \sigma} \Pr(z)\, \log_2 \Pr(z) \qquad (17)$$
instead of Eq. (16). For the problem discussed here the mutual information
$$I(S; R) = H(S) - H(S \mid R) \qquad (18)$$
asks how much information about a stimulus S can be decoded from the response R.
Time-locked mutual information
Let us consider trials time-locked at t = 0. Time-locking the trials in our random walk means that the walker’s position X0 at time zero is either 0 or n + 1, given that the light flashes. We describe this condition by saying that the response R occurs at t = 0. The time-locked mutual information I0(S; R) implements this condition through

$$I_0(S; R) = I(S; R \mid X_0 \in \{0,\, n+1\}), \qquad (19)$$

where the response R takes values “left” or “right”, whereas S is the position Xt of the walker after step t. The calculation of I0 can be brought back into the framework of the standard definition given in (18) by introducing a new random variable Yt carrying the information about X0 being restricted to the two boundary states 0 and n + 1 with equal probability (Fig. 3a). The variable Yt is precisely defined as the variable Xt when the initial condition X0 is either in state 0 or in state n + 1. This is more precisely expressed by

$$\Pr\{Y_t = k\} = \frac{1}{2}\,\Pr\{X_t = k \mid X_0 = 0\} + \frac{1}{2}\,\Pr\{X_t = k \mid X_0 = n+1\}, \qquad (20)$$

which says that the event Yt = k is given by the two independent events Xt = k when X0 is either 0 or n + 1, for k ∈ {0, 1, …, n + 1}, where X0 is the position of the walker when the events “left” or “right” occur. Note that by construction X0 must be either 0 or n + 1 with equal probability and that the following condition holds

$$\Pr\{X_0 = 0 \mid R = \text{“left”}\} = 1, \qquad (21)$$

and similarly for X0 = n + 1 when R = “right”. Thanks to the variable Yt, we have the identity

$$I_0(S; R) = I(Y_t; R), \qquad (22)$$

which allows using Eq. (18) for the explicit calculation.
When the time t is measured relative to the time point in which the “left” and “right” trials end (both forward and backward in time), the distribution of Xt is not the stationary distribution $\vec{\pi}$ derived in Eq. (10). The choice of the time-locked trials forces a new distribution of the process (Fig. 3a), which leads to the use of the appropriate random variable defined in (20). We analyze now the mutual information

$$I_0(X_t; R) = I(Y_t; R) = H(Y_t) - H(Y_t \mid R), \qquad (23)$$
where the subscript 0 just reminds us that the use of the variable Yt is limited to time-locked trials, i.e., to the condition that R occurs at time t = 0. Here t can take any integer value: negative t means that Yt describes the process before the “left” or “right” event, whereas positive t means afterwards. When t > 0, the time-locked mutual information is used to make predictions23; when t < 0, it can be used to perform a classification or interpolation24. Due to the time-inversion symmetry of the random walks on the line and on the complete graph, there is no difference between the results for positive and negative t. For clarity, however, we will henceforth write the negative sign of t explicitly. To proceed, we need to derive two properties of the random variable Y. We start with
$$\Pr\{Y_{-t} = k\} = \frac{1}{2}\left(P^{(t)}_{0k} + P^{(t)}_{(n+1)\,k}\right), \qquad (24)$$

where the transition probability matrix P was defined in (2), where we exploit the time-reversibility property and Eq. (21), and where the t-step transition probabilities $P^{(t)}_{ij}$ were defined in Eq. (4). Given Eq. (24) it is now possible to define the Shannon entropy associated to Y−t as

$$H(Y_{-t}) = -\sum_{k=0}^{n+1} \Pr\{Y_{-t} = k\}\, \log_2 \Pr\{Y_{-t} = k\}, \qquad (25)$$
which is computed with elementary matrix algebra. Two limits can easily be computed by hand. At t = 0, the variable Y0 can be either 0 or n + 1 with equal probability; therefore, H(Y0) = 1. In the limit t → ±∞, instead, Y−t becomes stationary and takes the same stationary distribution $\vec{\pi}$ as X−t (Fig. 3a). In this case we obtain $H(Y_{-t}) = \log_2(n + 2)$. A second useful property of Y−t is the following:

$$\Pr\{Y_{-t} = k \mid R = \text{“left”}\} = \Pr\{X_{-t} = k \mid X_0 = 0\} = P^{(t)}_{0k}, \qquad (26)$$

where the last probability can be computed explicitly using the transition matrix P. A similar expression obviously holds also when R = “right”. Therefore, we can now compute the conditional Shannon entropy

$$H(Y_{-t} \mid R) = -\sum_{r \in \{l,\, r\}} \Pr\{R = r\} \sum_{k=0}^{n+1} \Pr\{Y_{-t} = k \mid R = r\}\, \log_2 \Pr\{Y_{-t} = k \mid R = r\}, \qquad (27)$$
using matrix algebra by means of Eq. (4). In the first sum, “left” and “right” are denoted with l and r, respectively. Plugging Eqs. (25) and (27) into the definition of I0 given in Eq. (23) finally leads to a time-dependent mutual information that depends solely on the t-step transition probabilities. Since the time behavior of these probabilities depends only on the intrinsic timescales, the mutual information (23) decays, going backward in time, according to the time scales of the process (Fig. 3b).
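Putting Eqs. (23)-(27) together gives a compact numerical recipe for the time-locked mutual information. The sketch below is our reconstruction: it assumes a time-reversible walk (so that Eq. (26) applies) and the boundary convention used above for the line matrix; I0 equals 1 bit at t = 0 and decays toward zero going backward in time, as in Fig. 3b.

```python
import numpy as np

def timelocked_mi(P, t):
    # I0 at time -t via Eqs. (23)-(27); assumes a time-reversible walk,
    # so the backward conditionals are rows of P^t (Eq. 26).
    Pt = np.linalg.matrix_power(P, t)
    p_left = Pt[0]                         # Pr{Y_-t = k | R = "left"}
    p_right = Pt[-1]                       # Pr{Y_-t = k | R = "right"}
    p = 0.5 * (p_left + p_right)           # Eq. (24)

    def H(q):                              # Shannon entropy in bits
        q = q[q > 0]
        return -(q * np.log2(q)).sum()

    return H(p) - 0.5 * (H(p_left) + H(p_right))   # Eqs. (25), (27) in (23)

# line-chain matrix of Eq. (2) for n = 5
n = 5
P = np.zeros((n + 2, n + 2))
for i in range(n + 2):
    if i > 0:
        P[i, i - 1] = 0.5
    if i < n + 1:
        P[i, i + 1] = 0.5
P[0, 0] = P[-1, -1] = 0.5
```

At t = 0 the two conditionals are point masses on the boundary states, giving exactly one bit; for large t both conditionals relax to the uniform distribution and I0 vanishes.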
Unconstrained mutual information
Also for the unconstrained mutual information I(S; R), the response R is one of the events “left”, “right” and the stimulus S is the position Xt of the walker after step t. In the calculation of I(S; R) there is no time-locking and the pattern Xt is sampled without any knowledge about the future. The mutual information (18) can be rewritten in the more useful form
$$I(S; R) = H(R) - H(R \mid S). \qquad (28)$$
For the random walk models considered here, the single terms of this formula can be computed as follows. The Shannon entropy of the variable R alone is given by

$$H(R) = -\Pr(l)\, \log_2 \Pr(l) - \Pr(r)\, \log_2 \Pr(r) = 1, \qquad (29)$$

since “left” and “right” (here denoted with l and r, respectively) occur with equal probability. Furthermore, since Xt is not time-locked to the event R, in our random walk the event R is independent of Xt by construction, and thus

$$H(R \mid X_t) = H(R) = 1 \qquad (30)$$

for any t, both positive and negative. Thus, putting everything together, it results that

$$I(X_t; R) = H(R) - H(R \mid X_t) = 0 \qquad (31)$$
identically. This is in agreement with our expectation, since the trials have been built to lead to “left” or “right” with equal probability independently of the values taken by the variable Xt. This calculation demonstrates why the unconstrained mutual information captures the true nature of the underlying process. Deriving the unconstrained mutual information for our model is simple and can be done analytically. However, an application to real data may be challenging due to the limitations of imaging techniques. Such an application would indeed require the analysis of relatively large sets of data. This is necessary in order to determine the structure of the network that reproduces the dynamics connecting the various recorded patterns. For instance, fMRI measurements deliver time series of spatial brain activity patterns. After associating each spatial pattern to a state, the time series can be seen as a walk on this network of states. Once this network is known and the associated transition probabilities and the order of the Markov process describing the dynamics25 are determined, the unconstrained mutual information can be computed. When Xt visits those states that are sufficient to generate/predict the response R, the mutual information will be larger than zero. The main challenge of this method lies in the limitations of brain imaging techniques. Probably, fMRI is not suitable for this analysis, since long time series must be collected, both in the presence and in the absence of the event one wants to study. However, EEG, intracranial EEG, and single-cell recordings are well-established methods and could allow this approach.
Results
To mimic the experimental procedure2, we have generated a large number of independent trials ending randomly with the event “left” or “right” with 50% probability (Fig. 1a,b). The trials were built in such a way that no prediction better than 50% is possible. Once all the trials have been time-locked at the time point of “left” or “right” event, they were classified using the same tools and approaches as in the experimental works2,3,4 (Supplementary Figure 1).
For trials generated using a random walk on the linear chain (Fig. 1a), we obtained a classification accuracy above 50% several time steps before the end of the trials, which climbed to 100% at t = 0 (Fig. 2a). When the trials were generated using a random walk on a complete graph, instead, the mean accuracy remained at the 50% level at all times (Fig. 2b). Had we not known the properties of the model that generates the trials, we would have interpreted the result for the random walk on the line (Fig. 2a) as evidence of an activity predicting the upcoming decision while approaching the “left” or “right” choice. However, for both networks only predictions at chance level are possible by construction. Therefore, the interpretation of accuracy above chance as reflecting “choice-predictive signals” must be wrong, and the accuracy time course calls for a different explanation.
Time-locking the trials such that the events “left” or “right” occur at time point t = 0 is equivalent to knowing that at time point t = 0 the position X0 of the random walk is either equal to 0 or to n + 1 (Fig. 3a). To understand the role of time-locking, we exploit decoding methods from information theory26,27, which are intrinsically related to classification28 but allow analytical treatment (Materials and Methods). Methods based on information theory are often exploited to extract predictive information from neural signals, especially when past stimuli are used to predict future events23. The mutual information
$$I(S; R) = H(S) - H(S \mid R)$$
tells how much information about a stimulus S can be decoded from the response R, where H(X) is the Shannon entropy associated to the random variable X (Materials and Methods). In our model, the stimulus S is the position Xt of the walker at time t prior to the left/right event. The response R is either “left” or “right”. For times t < 0, the time-locked mutual information
$$I_0(X_t; R) = I(X_t; R \mid X_0 \in \{0,\, n+1\})$$
contains the information that a response R has occurred at time t = 0. There is a profound difference between the time-locked mutual information and the unconstrained mutual information I = I(Xt; R), where no information beyond time point t is known. Using methods for future conditioned stochastic processes29,30,31, both functions I and I0 can be computed analytically for our random walk (Materials and Methods). The time course of the time-locked mutual information I0 (Fig. 3b) is qualitatively similar to the SVM classification accuracy (Fig. 2a): it is maximal at the time point of time-locking, i.e., t = 0, and decreases at times t < 0 at a rate that depends on the relaxation timescales of the process (Materials and Methods). In contrast to this, the unconstrained mutual information I(Xt; R) is zero at all times, consistent with the fact that the random walk trajectory does not contain information about whether R will be “left” or “right”. Therefore, only the unconstrained mutual information I(Xt; R) gives a faithful representation of the procedure employed to generate the trials. This result shows that time-locking combined with the slow relaxation time of the walk (Fig. 3c) produces classification accuracies significantly larger than 50% before t = 0.
Discussion
Our modeling approach allowed us to understand the effect that time-locking has on the analysis of the neural signal preceding the outcome of a decision. We have generated data with a simple stochastic model and analyzed them using the standard analysis techniques, based on the SVM classifier, typically exploited in experimental works. We have complemented the analysis with an original approach based on information theory, which allows a transparent mathematical treatment. While the accuracy of the SVM alone can be confusing, the treatment with mutual information offers more clarity and highlights the conditioning introduced by time-locking. In this way, no confusion can arise. However, when one believes to be computing unconstrained quantities and has overlooked the conditioning introduced by time-locking, a confusion in the interpretation of the result necessarily arises. Indeed, one would erroneously come to the conclusion that the time course of the accuracy is evidence of predictive signals, where instead it merely reflects time-locking and relaxation time. We have seen, indeed, that the classification accuracy is well above the chance level of 50% long before the end of the trials when the trials are generated with the linear network model. We have demonstrated that this time behavior can be explained through the combined effect of the network topology and the relaxation timescale of the modeled process. By construction, our model does not contain any predictive information. From this we have to conclude that the rise of the classification accuracy long before the time-locking event is not necessarily a signature of the emergence of predictive signals.
To fully capture the deceptive role of time-locking, consider the following instructive argument. Given a linear network with the buttons always connected to power, the light goes on each time one presses the button (Fig. 1a). If the walker is just one step away from, say, the left wall, the probability that the left light will shine at the next step is 0.5. However, if we know that the next time step will be a decision time, the same probability is 1. This effect is reflected in the analysis and becomes quantitatively evident when comparing the results of the conditioned, time-locked mutual information with those of the unconstrained mutual information. Only the latter approach is able to show that there is no predictive signal. Our model generates a signal that is necessary but not sufficient for the generation of the final event. It may be argued that brain activity does not contain such signals. However, a recent experimental study on vetoing20 has shown that there are necessary but not sufficient brain activity patterns related to the decision and execution of simple tasks. These signals can indeed deceive a classifier trained to recognize brain activity related to movement.
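The two probabilities in this argument can be checked by direct enumeration. The snippet below assumes, as in the argument, that the walker sits one step from the left wall and more than one step from the right wall:

```python
from fractions import Fraction

half = Fraction(1, 2)

# Unconditioned: from position 1 the walker steps onto the left wall
# (and the left light shines) only with probability 1/2.
p_left_next = half

# Conditioned on the next step being a decision time, the walker must
# then sit at *some* wall; from position 1 only the left wall is
# reachable in one step, so the conditional probability becomes 1.
p_reach_left_wall = half          # step to position 0
p_reach_right_wall = Fraction(0)  # the right wall is out of reach
p_left_given_decision = p_reach_left_wall / (p_reach_left_wall + p_reach_right_wall)
```

The same future-conditioning, applied trial by trial through time-locking, is what inflates the classification accuracy before t = 0.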
Beyond the technical aspects, our model belongs to a broad class of models often used to study neural activity related to decision processes17. In line with these models, we believe that our result is relevant to the common paradigms in the field, as we explain below.
When, What, Whether
The neural decision of “when” to move was recently investigated by modeling electrophysiological signals with a leaky stochastic accumulator model17, which may look somewhat similar to our model. However, our conceptual model is different. We aimed at introducing a conceptual model that captures all the fundamental ingredients of volition15. We were therefore interested in accounting not only for the “when”, but also for the “what” and, most importantly, for the “whether” decisions. For this, our model includes a stochastic process implementing the decision between “left” and “right”. This process has an intrinsic dynamics and a corresponding timescale. Moreover, the model describes the veto process, represented by the stochastic switch, which prevents intention from being systematically translated into action. This approach allowed us to expose the theoretical pitfalls in the debate about free decisions. In the present work, we describe the decision process as a simple diffusion without a drift term. This point is crucial to show that time-locking introduces a bias that generates apparent predictive signals. While our model does not allow predictions better than chance, in the stochastic accumulator model17 the presence of a drift term ensures by construction that eventually a decision will be taken. This is equivalent to saying that information about the decision accumulates in time, and therefore predictions are intrinsically possible in the drift-diffusion model.
Veto process
Our model includes a veto process, represented by the stochastic switch, which prevents intention from being systematically translated into action. Similar to other studies concerned with volition20, here we use the term veto because it was introduced by Libet in this tradition. However, we do not share the dualistic flavor of Libet’s interpretation of this process. In contrast to Libet, who regarded the veto as a control exerted by the conscious mind, uncorrelated with brain activity, we believe that the veto is implemented in specific brain networks32,33. Like the stochastic accumulator model17, our conceptual model aims at describing the decision process in its pre-motor phase, while the veto comes at a later stage and can inhibit the motor output of the decision. We considered the decision process and the veto as statistically independent. From experiments we know that proactive inhibition can slow down motor execution34. Because our model does not account for a motor phase extended in time, we introduced the veto as a binary process that can only allow or stop the execution of the intended action, instead of acting as a slowing-down mechanism.
Relationship to Libet-like experiments
Our result supports an alternative approach to investigating the neural determinants of free decisions. Besides confirming the bias of time-locking and suggesting a more appropriate analysis, our approach highlights the limitations of Libet-like experimental paradigms1,2,3,4,5,6,7,8. Already in his original work8, Libet reported that participants sometimes consciously felt the urge to move but inhibited their action before a movement occurred. Moreover, it was recently shown that even when their decisions are predicted in real time using brain signals preceding their actions, participants can veto their action before movement onset20. This experimental evidence confirms that the veto implicitly plays a crucial role in Libet-like experiments. From the analysis point of view, time-locking to the button press is equivalent to ignoring the veto, because only trials corresponding to non-vetoed actions are considered. We have shown that this approach leads to misleading results. From the experimental point of view, paradigms that simultaneously include all decisions (when, what, and whether) can make the effect of the veto explicit and are therefore more ecologically valid. When analyzing data from such paradigms, or in order to interpret previous results1,2,3,4,6,7,8, it is therefore fundamental, on the one hand, to quantify how the different WWW components modulate one another and, on the other hand, to quantify how this modulation changes in time.
Structural and topological effects
Finally, different brain regions are characterized by different intrinsic time scales, probably related to the structure of the underlying neural circuit35. Furthermore, several experiments2,3,4 show that there are brain areas in which the classification accuracy increases very late or does not increase at all, and that some of the brain regions showing significantly large classification accuracy are also particularly large in size11. Our approach allows an interpretation of these findings. The random walk teaches us that the classification accuracy can be enhanced by increasing the relaxation time of the process. The random walk on a line, e.g., has a long relaxation time that grows with the number of states (Fig. 3c). This type of walk generates trials that are easy to classify (Fig. 2a). In contrast to this, trials generated from a random walk on a complete graph cannot be classified, because the relaxation time is very short (Fig. 2b). In this latter case the accuracy always remains around chance level. Thus, small, fast, and highly connected networks will lead to small increases in accuracy; large, slow, and sparsely connected networks will produce a stronger increase in accuracy. Therefore, the classification accuracy is a useful quantity to study structural properties of the neural circuit involved in the generation of task-related brain signals.
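This topology dependence can be made concrete with a small spectral computation. In the sketch below (the sizes are arbitrary, and a lazy walk is used on the line to avoid the periodicity of the pure walk), the relaxation time is taken as −1/ln|λ2|, where λ2 is the second-largest eigenvalue modulus of the transition matrix: it grows roughly like the square of the number of states on the line, but stays below one step on the complete graph.

```python
import numpy as np

def relaxation_time(P):
    """Relaxation time -1/ln|lambda_2|, with lambda_2 the second-largest
    eigenvalue modulus of the stochastic matrix P."""
    lam2 = np.sort(np.abs(np.linalg.eigvals(P)))[-2]
    return -1.0 / np.log(lam2)

n = 50

# Lazy symmetric walk on a line of n states with reflecting walls.
line = np.zeros((n, n))
for i in range(n):
    line[i, i] += 0.5
    line[i, max(i - 1, 0)] += 0.25
    line[i, min(i + 1, n - 1)] += 0.25

# Walk on the complete graph: jump uniformly to any of the other nodes.
complete = (np.ones((n, n)) - np.eye(n)) / (n - 1)

tau_line = relaxation_time(line)          # grows roughly like n**2
tau_complete = relaxation_time(complete)  # less than one step for large n
```

With n = 50 the line walk relaxes over on the order of a thousand steps, while the complete-graph walk forgets its position in less than one step, matching the contrast between Fig. 2a and Fig. 2b.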
Conclusions
Taken together, our analyses show that classifying trials ending with a decision does not imply extracting predictive information about the decision itself. We have shown this by using the logic of a reductio ad absurdum proof. We have generated data that do not contain predictive information about the final event, i.e., “left” or “right” button press, and analyzed them with a standard classifier after time-locking the trials to the time point of the final event. We have shown that the time course of the classification accuracy prior to the final event is well above chance level, depending on the topology of the underlying network of states. Since by construction the data do not contain any information about the future outcome, the high level of the classification accuracy cannot be interpreted as prediction. We have then exploited a more transparent approach based on the mutual information and demonstrated that time-locking introduces a bias analogous to future-conditioning. This allowed us to claim that the time course of the classification accuracy at t ≤ 0 is a consequence of the network’s topology and of the time scales associated with the activity on the network. Our result adds to those critical positions questioning the existence of predictive signals of volition before awareness17,36 and their interpretation in terms of free will14,37. Instead of proving the existence of choice-predictive signals, the time course of the classification accuracy can be interpreted as the signature of task-specific structural properties of the local neural circuits generating the recorded brain activity.
Our analysis shows a limitation of “reverse-time” event-related studies, in which a signal S preceding a known event R is analyzed retrospectively from the occurrence of R. In these cases, signals S that are necessary but not sufficient for R will seem to be necessary and sufficient. Furthermore, the time scale of the decay, backward in time, of the classification accuracy is not necessarily related to the information about R contained in S, but also reflects a structural component. We have shown how this structural component produces a large and long-lasting classification accuracy even in a model where the signal S has, by construction, no information about R. Time-locking to R therefore generates two important biases. On the one hand, the role of the veto is bypassed; on the other hand, time-locking introduces a conditioning on the future that can create a long-lasting effect backward in time, depending on the network topology. As we have shown here, this second bias produces the emergence of high classification accuracies even in the absence of predictive signals. Our result, however, does not apply to those studies23 in which the effect of an event R on the upcoming signal S is studied.
We believe that a new analysis of the data, based on stochastic predictive models38, could help provide the time course of the unconstrained mutual information. We have discussed how our model is similar to a previously studied model17 but differs in several crucial aspects. Albeit simple, both models capture the essential aspects of the decision process. More complex models of neural activity could and must be introduced in the future to better quantify the neural processes leading to decisions. However, these models too will have to cope with the result discussed here as long as time-locked trajectories are analyzed backward in time.
Additional Information
How to cite this article: Rusconi, M. and Valleriani, A. Predict or classify: The deceptive role of time-locking in brain signal classification. Sci. Rep. 6, 28236; doi: 10.1038/srep28236 (2016).
Supplementary Material
Acknowledgments
We would like to thank S. Risse and C. Allefeld for useful comments at an early stage of this work.
Footnotes
Author Contributions A.V. conceived the study. M.R. interpreted the study in terms of the WWW model. A.V. performed the mathematical derivations. M.R. performed the SVM analyses. M.R. and A.V. produced all figures. Both authors interpreted the results, wrote and approved the manuscript.
References
- Haggard P. & Eimer M. On the relation between brain potentials and the awareness of voluntary movements. Experimental Brain Research 126, 128–33 (1999). [DOI] [PubMed] [Google Scholar]
- Soon C. S., Brass M., Heinze H.-J. & Haynes J.-D. Unconscious determinants of free decisions in the human brain. Nat Neurosci 11, 543–545 (2008). [DOI] [PubMed] [Google Scholar]
- Bode S. et al. Tracking the unconscious generation of free decisions using ultra-high field fMRI. PLOS ONE 6, e21612 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Soon C. S., He A. H., Bode S. & Haynes J.-D. Predicting free choices for abstract intentions. PNAS 110, 6217–6222 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Matsuhashi M. & Hallett M. The timing of the conscious intention to move. The European Journal of Neuroscience 28, 2344–51 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fried I., Mukamel R. & Kreiman G. Internally generated preactivation of single neurons in human medial frontal cortex predicts volition. Neuron 69, 548–62 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Libet B., Wright E. W. & Gleason C. A. Readiness-potentials preceding unrestricted ‘spontaneous’ vs. pre-planned voluntary acts. Electroencephalogr Clin Neurophysiol 54, 322–335 (1982). [DOI] [PubMed] [Google Scholar]
- Libet B., Gleason C. A., Wright E. W. & Pearl D. K. Time of conscious intention to act in relation to onset of cerebral activity (readiness-potential). Brain 106, 623–642 (1983). [DOI] [PubMed] [Google Scholar]
- Haynes J.-D. & Rees G. Decoding mental states from brain activity in humans. Nat Rev Neurosci 7, 523–34 (2006). [DOI] [PubMed] [Google Scholar]
- Herrojo Ruiz M. et al. Encoding of sequence boundaries in the subthalamic nucleus of patients with Parkinson’s disease. Brain: A Journal of Neurology 137, 2715–30 (2014). [DOI] [PubMed] [Google Scholar]
- Haynes J.-D. Decoding and predicting intentions. Ann N Y Acad Sci 1224, 9–21 (2011). [DOI] [PubMed] [Google Scholar]
- Haggard P. Human volition: towards a neuroscience of will. Nat Rev Neurosci 9, 934–46 (2008). [DOI] [PubMed] [Google Scholar]
- Sinnott-Armstrong W. & Nadel L. (eds.) Conscious will and responsibility (Oxford University Press, Oxford, 2011), Oxford Series in Neuroscience, Law, and Philosophy edn. [Google Scholar]
- Bode S. et al. Demystifying “free will”: the role of contextual information and evidence accumulation for predictive brain activity. Neurosci Biobehav Rev 47, 636–645 (2014). [DOI] [PubMed] [Google Scholar]
- Brass M. & Haggard P. The what, when, whether model of intentional action. The Neuroscientist 14, 319–25 (2008). [DOI] [PubMed] [Google Scholar]
- Kornhuber H. & Deecke L. Hirnpotentialänderungen bei Willkürbewegungen und passiven Bewegungen des Menschen: Bereitschaftspotential und reafferente Potentiale. Pflüger’s Archiv für die gesamte Physiologie … 284, 1–17 (1965). [PubMed] [Google Scholar]
- Schurger A., Sitt J. D. & Dehaene S. An accumulator model for spontaneous neural activity prior to self-initiated movement. PNAS 109, E2904–13 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schurger A., Mylopoulos M. & Rosenthal D. Neural Antecedents of Spontaneous Voluntary Movement: A New Perspective. Trends in Cognitive Sciences 20, 77–79 (2015). [DOI] [PubMed] [Google Scholar]
- Ratcliff R. A Theory of Memory Retrieval. Psychol Rev 85, 59–108 (1978). [Google Scholar]
- Schultze-Kraft M. et al. The point of no return in vetoing self-initiated movements. PNAS 113, 1080–1085 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chang C.-C. & Lin C.-J. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011). [Google Scholar]
- Toyoizumi T. & Huang H. Structure of attractors in randomly connected networks. Phys Rev E 91, 032802 (2015). [DOI] [PubMed] [Google Scholar]
- Palmer S. E., Marre O., Berry M. J. & Bialek W. Predictive information in a sensory population. Proc Natl Acad Sci USA 112, 6908–6913 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rudemo M. Prediction and smoothing for partially observed Markov chains. J Math Anal Appl 49, 1–23 (1975). [Google Scholar]
- Bettenbühl M., Rusconi M., Engbert R. & Holschneider M. Bayesian selection of Markov models for symbol sequences: application to microsaccadic eye movements. PLOS ONE 7, e43388 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cover T. M. & Thomas J. A. Elements of information theory (John Wiley & Sons, New York, 2006). [Google Scholar]
- MacKay D. J. Information theory, inference, and learning algorithms vol. 7 (Cambridge University Press, 2003). [Google Scholar]
- Quiroga R. Q. & Panzeri S. Extracting information from neuronal populations: information theory and decoding approaches. Nat Rev Neurosci 10, 173–185 (2009). [DOI] [PubMed] [Google Scholar]
- Valleriani A., Liepelt S. & Lipowsky R. Dwell time distributions for kinesin’s mechanical steps. EPL (Europhysics Letters) 82, 28011 (2008). [Google Scholar]
- Li X., Kolomeisky A. B. & Valleriani A. Stochastic kinetics on networks: when slow is fast. J Phys Chem B 118, 10419–10425 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Valleriani A. Circular analysis in complex stochastic systems. Sci. Rep. 5, 17986 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brass M. & Haggard P. To do or not to do: the neural signature of self-control. The Journal of Neuroscience 27, 9141–9145 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Filevich E., Kühn S. & Haggard P. There Is No Free Won’t: Antecedent Brain Activity Predicts Decisions to Inhibit. PLOS ONE 8, e53053 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aron A. R. From reactive to proactive and selective control: Developing a richer model for stopping inappropriate responses. Biol Psychiatry 69, e55–e68 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Murray J. D. et al. A hierarchy of intrinsic timescales across primate cortex. Nat Neurosci 17, 661–664 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guggisberg A. G. & Mottaz A. Timing and awareness of movement decisions: does consciousness really come too late? Frontiers in Human Neuroscience 7, 385 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Klemm W. Free will debates: Simple experiments are not so simple. Advances in Cognitive Psychology 6, 47–65 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Crutchfield J. P. Between order and chaos. Nature Physics 8, 17–24 (2011). [Google Scholar]