Classification of brain states that predicts future performance in visual tasks based on co-integration analysis of EEG data

Marie Levakova; Jeppe Høy Christensen; Susanne Ditlevsen

doi:10.1098/rsos.220621

2022 Nov 30;9(11):220621. doi: 10.1098/rsos.220621


PMCID: PMC9709569 PMID: 36465674

Abstract

Electroencephalogram (EEG) is a popular tool for studying brain activity. Numerous statistical techniques exist to enhance understanding of the complex dynamics underlying the EEG recordings. Inferring the functional network connectivity between EEG channels is of interest, and non-parametric inference methods are typically applied. We propose a fully parametric model-based approach via cointegration analysis. It not only estimates the network but also provides further insight through cointegration vectors, which characterize equilibrium states, and the corresponding loadings, which describe the mechanism of how the EEG dynamics is drawn to the equilibrium. We outline the estimation procedure in the context of EEG data, which faces specific challenges compared with the common econometric problems, for which cointegration analysis was originally conceived. In particular, the dimension is higher, typically around 64; there is usually access to repeated trials; and the data are artificially linearly dependent through the normalization done in EEG recordings. Finally, we illustrate the method on EEG data from a visual task experiment and show how brain states identified via cointegration analysis can be utilized in further investigations of determinants playing roles in sensory identifications.

Keywords: EEG data, cointegration analysis, functional network

1. Introduction

An electroencephalogram (EEG) records the electrical activity of the brain in terms of differences in electrical potentials between two points by electrodes attached to the scalp. EEG recordings are an invaluable source of information about neuronal activity on a global level. Traditional statistical techniques focus on both the time and the frequency domain. In the time domain, stimulus-evoked responses at a single or at a few electrodes are typically assessed using event-related response methods [1], which average the time-locked EEG signal across repeated trials. Frequency domain techniques investigate the amplitude or phase of oscillatory activity in selected electrode(s) for specific frequency bands.

More recently, methods for assessing the functional network connectivity between electrodes have been developed [2,3]. The functional network connectivity describes how activity at each recording site is related to activity measured at other sites. These methods include computing Pearson correlations between the time series, estimating oscillatory coupling from, e.g. coherence or phase locking values within a time window [4], or estimating the causal influence from one electrode to others with Granger causality [5]. Thus, functional network connectivity methods assess the EEG activity of all the electrodes, adhering to the idea that the brain is a dynamical system and that electrodes should not be examined in isolation [6].

In this article, we propose cointegration analysis as a novel tool to infer functional network connectivity. The starting point is a multi-dimensional continuous stochastic differential equation model for the underlying process leading to a vector autoregressive (VAR) model of order 1 of the discretely observed EEG signal. This model considers all channels jointly and takes into account the autocorrelation in the data. Cointegration analysis is based on the VAR model. VAR models have previously been used for EEG data; for example, autoregressive models were used to calculate a direct transfer function [7], partial directed coherence [8], or in classification algorithms [9]. However, the standard VAR model is assumed stationary, which is not the case for EEG data during stimulation, and the phenomenon of spurious regression may arise [10].

Cointegration analysis is able to address the four following aspects in EEG network analysis. First, most of the usual methods are non-parametric, whereas our approach is fully parametric. While a clear benefit of non-parametric and semi-parametric procedures is that they are (nearly) free from limiting assumptions and flexibly applicable under most circumstances, a fully model-based parametric approach offers a framework that can answer more specific scientific questions. Moreover, interpretation of parameters is more transparent, and estimation variance is reduced in the parametric setting.

Second, the most common methods study connectivity in the frequency domain, which requires observation intervals that are long enough to extract the most prominent cyclical components. However, the period of the alpha rhythm is between 80 and 125 ms. Here, we estimate the functional network connectivity during a short stimulation period lasting only between 20 and 110 ms, and we determine the brain state from only 100 ms before stimulus onset. Thus, inference in the frequency domain is not an option.

Third, a crucial assumption in standard time series analysis and a common assumption in general is that EEG data are stationary. The natural physical limitations of EEG data do not allow for persistent trends. However, there will be temporary deviations due to experimental inputs. This non-stationarity has implications for the statistical analysis, such as spurious regression [10]. Finally, networks are typically inferred among a limited number of channels or among a few larger brain areas. Our aim is to upscale the cointegration methodology so that all EEG channels (64 in our case) are included in the network.

Cointegration analysis was originally developed with econometrics applications in mind [11]. Recently, it has been applied to climate research [12], phase-coupled oscillating systems in physics [13] and to low-dimensional systems of coupled oscillators in neuroscience [14]. The idea of cointegration analysis is to discern which part of the data can be attributed to stochastic trends and which part stems from linear equilibrium relationships, termed cointegration relationships. This is particularly relevant for non-stationary data, which nevertheless exhibit some kind of stability or structure in the overall system. Such dynamics are observed in many biological systems, e.g. in processes in the brain measured through EEG, which we focus on here. The cointegration relationships then represent the functional network connectivity between electrodes in the EEG.

The estimated parameters of the cointegration model are relevant for interpreting the EEG: the cointegration rank gives the number of independent cointegration relationships related to the global connectivity network and the number of independent stochastic trends; the cointegration matrix contains coefficients of cointegration relationships (i.e. electrode connectivity strengths); the loading matrix describes how the underlying system (i.e. the brain) reacts to deviations from the cointegration relationships. Most importantly, the product of the loading and the cointegration matrices describes the functional network structure. The structure might change over time, which reflects for example how brain states change in response to changing tasks.

We apply the cointegration analysis to EEG data obtained from two human participants performing a simple visual identification task. In each trial, the participant was asked to report the orientation of a briefly presented Landolt ring. Exogenous factors such as the presentation time and stimulus luminance are the typical factors that predict performance in the task; however, pre-stimulus EEG activity has recently been shown also to have predictive value in similar tasks [15]. That is, the brain activity prior to the onset of the visual stimulus affects how well the stimulus is perceived, either from oscillatory activity in the visual cortex timed to the onset of sensory input [16,17] or from spontaneous activity [18]. We show that cointegration analysis can be used to investigate this further while giving an account of the functional networks involved. We then test if the pre-stimulus brain state estimated from a time interval as short as 100 ms predicts performance on the visual identification task.

2. Cointegration methodology

The vector $x_{t} = {(x_{1 t}, x_{2 t}, \dots, x_{p t})}^{'} \in R^{p}$ represents the EEG signals recorded at time t, p is the number of electrodes and ′ denotes the transpose. If not stated otherwise, p = 64. The data are recorded at time points t₀ < t₁ < … < t_n < … < t_N. We write x_n for $x_{t_{n}}$ , and x_0:N denotes the set of observations {x_n : n = 0, …, N}. For M repeated trials, $x_{n}^{(m)}$ is the vector of recordings at time t_n in trial m and x^(1:M) denotes the set of all observations in all trials $\{x_{n}^{(m)} : n = 0, \dots, N_{m}, m = 1, \dots, M\}$ , where N_m is the number of observations in trial m. A p-dimensional column vector of zeros is denoted by 0_p, a p-dimensional column vector of ones is denoted by 1_p and I_p is the p × p unit matrix. The trace of a generic square matrix A is tr(A), A_i· represents the ith row of the matrix A and A_·i is the ith column of A. The l₂-norm of a vector $v \in R^{p}$ is ${‖ v ‖}_{2} = \sqrt{\sum_{i = 1}^{p} v_{i}^{2}}$ and the l₁-norm is ${‖ v ‖}_{1} = \sum_{i = 1}^{p} | v_{i} |$ . Finally, a hat marks an estimate, i.e. $\hat{μ}$ is an estimate of parameter μ.

2.1. Model

Assume that the EEG signals evolve according to an Ornstein–Uhlenbeck process [19,20],

d x_{t} = P (x_{t} - m) d t + D d W_{t},

2.1

where m is a p-dimensional mean vector, P is a p × p matrix, D is a p × d matrix with d ≤ p such that DD′ is positive semidefinite and W is a d-dimensional Brownian motion. To ensure that the process generated by (2.1) is recurrent, the eigenvalues of P must all have negative real parts (positive recurrent) or some of them be equal to 0 (null recurrent). The matrix P can be factorized as P = ab′, where $a, b \in R^{p \times r}$ and r ≤ p. We assume furthermore that |b′a| ≠ 0 and all eigenvalues of b′a have negative real parts.

If the process is observed at equidistant time points t₁, …, t_N with timestep Δ = t_n − t_n−1, the observations x_n, n = 1, …, N satisfy the VAR model

x_{n} = μ + A x_{n - 1} + ε_{n}, ε_{n} \sim N_{p} (0_{p}, Σ), n = 1, \dots, N,

2.2

where $A = e^{P Δ}$ , μ = (I_p − A)m and $Σ = \int_{0}^{Δ} e^{u P} D D^{'} e^{u P^{'}} d u$ . It is more convenient to write the VAR process as a vector error–correction model (VECM),

Δ x_{n} = x_{n} - x_{n - 1} = μ + Π x_{n - 1} + ε_{n}, n = 1, \dots, N,

2.3

with $Π = A - I_{p}$ . The assumptions imposed on matrices a and b imply that all eigenvalues of A have modulus less than 1 or are equal to 1 [21].
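The mapping from the continuous-time parameters to the VAR and VECM parameters can be checked numerically. The following is a minimal sketch, not the authors' code: the helper names `expm_series` and `ou_to_vecm` and all numerical values are hypothetical, and the matrix exponential is approximated by a truncated Taylor series, which is adequate only because the timestep Δ is small.

```python
import numpy as np

def expm_series(M, n_terms=30):
    """Matrix exponential by a truncated Taylor series (adequate when ||M|| is small)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, n_terms):
        term = term @ M / k
        out = out + term
    return out

def ou_to_vecm(a, b, m, delta):
    """Map the OU drift P = a b' and mean m to the discrete-time A, mu and Pi."""
    p = a.shape[0]
    A = expm_series(a @ b.T * delta)     # A = exp(P * delta)
    mu = (np.eye(p) - A) @ m             # mu = (I_p - A) m
    Pi = A - np.eye(p)                   # Pi = A - I_p
    return A, mu, Pi

# Toy values (hypothetical): p = 3 channels, rank r = 1, 256 Hz sampling.
a = np.array([[-1.0], [0.5], [0.0]])
b = np.array([[1.0], [-1.0], [0.0]])
m = np.array([0.2, -0.1, 0.3])
A, mu, Pi = ou_to_vecm(a, b, m, delta=1 / 256)

print(np.linalg.matrix_rank(Pi, tol=1e-10))   # rank of Pi, to compare with rank(P)
print(np.abs(np.linalg.eigvals(A)))           # moduli of the eigenvalues of A
```

In this toy case b′a = −1.5 is negative, so the assumptions on a and b hold, and the rank of Π can be compared directly with the rank of P.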

The rank of $Π$ in (2.3), $r = rank (Π)$ , has fundamental implications for the properties of the system, directly linking to the properties of the original system (2.1), since $rank (Π) = rank (P)$ [21].

- If r = 0, $\{x_{n}\}_{n = 1}^{N}$ is a set of p random walks, as $Δ x_{n} = ε_{n}$ .
- If r = p, all eigenvalues of A have modulus less than 1, $\{x_{n}\}_{n = 1}^{N}$ is asymptotically stationary and contains neither a stochastic trend nor a linear trend [22].
- If 0 < r < p, $\{x_{n}\}_{n = 1}^{N}$ is driven by exactly p − r independent stochastic trends, and there exist r linear combinations of the vector process $\{x_{n}\}_{n = 1}^{N}$ that yield an asymptotically stationary one-dimensional process. Linear trends due to the drift μ are possible too.

The third case is termed cointegration and will be assumed from now on. If 0 < r < p, we can find matrices $α, β \in R^{p \times r}$ such that $Π = α β^{'}$ . The matrices α and β are not unique: for an arbitrary invertible matrix $Q \in R^{r \times r}$ , another valid decomposition is $Π = (α Q) (Q^{- 1} β^{'}) = α^{*} β^{*^{'}}$ . However, the subspaces spanned by the columns of α and β, sp(α) and sp(β), are unique [21].

Matrix $Π$ has a straightforward interpretation. An element $Π_{i j}$ quantifies the influence of channel j on the change in channel i in the following time point. The matrix $Π$ thus defines a functional network among the channels 1, …, p. The matrix β is called the cointegration matrix, since the columns of β contain cointegration vectors. A cointegration vector $v \in R^{p}$ is a vector such that v′x_n is an asymptotically stationary univariate process. Thus, a linear combination given by v represents an equilibrium, although single EEG signals can be subject to stochastic trends. Cointegration vectors form a linear subspace, meaning that any linear combination of two distinct cointegration vectors is also a cointegration vector. The columns of β are one possible basis of this subspace.
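A small simulation illustrates what a cointegration vector does. This is a sketch under assumed values: the two-channel system, the loadings `alpha`, the vector `beta` and the sample size are hypothetical and chosen only so that β′α is negative. Each channel carries a stochastic trend, whereas the combination β′x_n reverts to its mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-channel system with one cointegration relationship (r = 1).
alpha = np.array([[-0.2], [0.1]])      # loadings
beta = np.array([[1.0], [-1.0]])       # cointegration vector: x_1 - x_2
Pi = alpha @ beta.T

N = 5000
x = np.zeros((N + 1, 2))
for n in range(1, N + 1):
    eps = rng.normal(size=2)
    x[n] = x[n - 1] + Pi @ x[n - 1] + eps    # VECM with mu = 0

coint = x @ beta[:, 0]                 # beta' x_n, mean-reverting
var_channel = x[:, 0].var()            # a single channel wanders with the trend
var_coint = coint.var()
print(var_channel, var_coint)
```

The sample variance of a single channel grows with the length of the recording, while that of the cointegration relationship stays bounded.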

The matrix α is called the loading matrix. It describes the correcting mechanism ensuring that β′x_n is always pushed to the long-term mean $E (β^{'} x_{\infty}) : = lim_{n \to \infty} E (β^{'} x_{n})$ . Rewrite the VECM model (2.3) as follows:

Δ x_{n} = \underbrace{μ + α E (β^{'} x_{\infty})}_{\text{a constant vector}} + α \underbrace{[β^{'} x_{n - 1} - E (β^{'} x_{\infty})]}_{\text{disequilibrium error}} + ε_{n} .

2.4

The matrix α thus reveals the rate at which the system reacts to the deviations of the cointegration relationships β′x_n−1 from the asymptotic mean $E (β^{'} x_{\infty})$ to keep the stationary cointegration relationships satisfied in the long term.

2.2. Estimation procedure

We briefly review the most common estimation procedure termed the Johansen procedure [23]. We will show the non-uniqueness of some of the estimated parameters and how a particular choice of an estimate affects the rest, and address particular issues arising for EEG datasets.

2.2.1. Single trial

The Johansen procedure is based on the maximum likelihood method. Assuming $ε_{n} \sim N (0, Σ)$ and centring Δx_n and x_n as follows:

z_{0 n} = Δ x_{n} - \frac{1}{N} \sum_{n = 1}^{N} Δ x_{n}; z_{1 n} = x_{n - 1} - \frac{1}{N} \sum_{n = 1}^{N} x_{n - 1},

2.5

the log-likelihood becomes

ℓ (α, β, Σ; z_{0, 1 : N}, z_{1, 1 : N}) \propto - \frac{N}{2} \log | Σ | - \frac{1}{2} \sum_{n = 1}^{N} {(z_{0 n} - α β^{'} z_{1 n})}^{'} Σ^{- 1} (z_{0 n} - α β^{'} z_{1 n}) .

2.6

The parameter estimates can be expressed in terms of sufficient statistics S₀₀, S₀₁ and S₁₁, which are obtained as follows:

S_{i j} = N^{- 1} \sum_{n = 1}^{N} z_{i n} z_{j n}^{'}, i, j \in \{0, 1\} .

2.7

The estimation procedure has the following consecutive steps:

(i)
Estimate the cointegration rank r
(ii)
Estimate the cointegration matrix β
(iii)
Estimate the loading matrix α
(iv)
Estimate the covariance matrix $Σ$ and the drift μ

We explain the steps (ii)–(iv) first and discuss the topic of the cointegration rank determination in more detail in §2.2.5.

Estimation of β. For a given cointegration rank r, the task is to estimate $Π$ subject to $rank (Π) = r$ , which is a problem of reduced rank regression [24]. The columns of β can be found as the eigenvectors v₁, …, v_r corresponding to the r largest eigenvalues λ_i of the eigenvalue problem,

λ_{i} S_{11} v_{i} = S_{01}^{'} S_{00}^{- 1} S_{01} v_{i}, i = 1, \dots, p,

2.8

which are normalized so that

v_{i}^{'} S_{11} v_{j} = \begin{cases} 1 & \text{for } i = j, \\ 0 & \text{for } i \neq j . \end{cases}

2.9

Since any linear combination of the cointegration vectors yields a cointegration vector, any basis of $s p (\hat{β})$ can serve as $\hat{β}$ .

Estimation of α. For fixed $\hat{β}$ , we construct new stationary covariates ${\hat{β}}^{'} z_{1 n} = u_{n} \in R^{r}$ for each n ∈ {1, …, N} and turn the VECM model (2.3) into a standard linear regression model,

z_{0 n} = α u_{n} + ε_{n},

2.10

with no restrictions on the rank. The maximum likelihood estimator (MLE) as well as the standard least squares estimator are expressed as follows:

\hat{α} = S_{01} \hat{β} {({\hat{β}}^{'} S_{11} \hat{β})}^{- 1} .

2.11

The chosen form of $\hat{β}$ affects $\hat{α}$ ; however, $s p (\hat{α})$ is invariant. Furthermore, $\hat{Π} = \hat{α} {\hat{β}}^{'}$ is always unique. The estimation procedure thus gives some flexibility in choosing $\hat{β}$ , but not $\hat{α}$ . There also exist procedures where $\hat{α}$ is identified first with a certain degree of freedom, and the estimator of β given $\hat{α}$ is then unique [25].

Estimation of $Σ$ and μ. The closed-form expressions of the MLE of $Σ$ and μ for fixed $\hat{β}$ are as follows:

\hat{Σ} = S_{00} - S_{01} \hat{β} {({\hat{β}}^{'} S_{11} \hat{β})}^{- 1} {\hat{β}}^{'} S_{01}^{'}

2.12

and

\hat{μ} = \frac{1}{N} \sum_{n = 1}^{N} (Δ x_{n} - \hat{α} {\hat{β}}^{'} x_{n - 1}) .

2.13
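Steps (ii)–(iv) for a single trial can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `johansen_fit`, the returned dictionary and the simulated three-channel system are hypothetical. The generalized eigenvalue problem (2.8) is solved by a Cholesky transformation of S₁₁, which is one of several possible routes.

```python
import numpy as np

def johansen_fit(x, r):
    """Reduced-rank ML fit of dx_n = mu + alpha beta' x_{n-1} + eps_n.

    x : array of shape (N + 1, p), one row per time point.
    r : assumed cointegration rank, 0 < r <= p.
    """
    dx = np.diff(x, axis=0)            # rows are dx_n, n = 1..N
    lag = x[:-1]                       # rows are x_{n-1}
    z0 = dx - dx.mean(axis=0)          # centred differences, eq. (2.5)
    z1 = lag - lag.mean(axis=0)        # centred lagged levels, eq. (2.5)
    N = z0.shape[0]
    S00 = z0.T @ z0 / N
    S01 = z0.T @ z1 / N
    S11 = z1.T @ z1 / N

    # Eigenproblem (2.8): lam * S11 v = S01' S00^{-1} S01 v.
    # With S11 = L L' and w = L' v this becomes a symmetric problem in w.
    Linv = np.linalg.inv(np.linalg.cholesky(S11))
    M = Linv @ S01.T @ np.linalg.solve(S00, S01) @ Linv.T
    lam, W = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(lam)[::-1]      # descending eigenvalues
    lam, W = lam[order], W[:, order]
    V = Linv.T @ W                     # V' S11 V = I, i.e. normalization (2.9)

    beta = V[:, :r]
    alpha = S01 @ beta @ np.linalg.inv(beta.T @ S11 @ beta)   # eq. (2.11)
    Sigma = S00 - alpha @ beta.T @ S01.T                      # eq. (2.12)
    mu = dx.mean(axis=0) - alpha @ beta.T @ lag.mean(axis=0)  # eq. (2.13)
    return {"lam": lam, "beta": beta, "alpha": alpha,
            "Sigma": Sigma, "mu": mu, "S11": S11}

# Simulated example (hypothetical parameters): p = 3 channels, true rank 1.
rng = np.random.default_rng(1)
alpha_true = np.array([[-0.3], [0.2], [0.0]])
beta_true = np.array([[1.0], [-1.0], [0.5]])
Pi_true = alpha_true @ beta_true.T
x = np.zeros((5001, 3))
for n in range(1, 5001):
    x[n] = x[n - 1] + Pi_true @ x[n - 1] + rng.normal(size=3)

fit = johansen_fit(x, r=1)
Pi_hat = fit["alpha"] @ fit["beta"].T
print(np.round(Pi_hat, 2))
```

The estimate of β returned here is only one basis of the estimated cointegration space; the product `Pi_hat` is the quantity to compare with the simulated Π.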

2.2.2. Repeated trials

Cointegration analysis has mainly been used in econometrics, where processes of interest cannot be repeated, and statistical inference is based on a single time series, typically observed over a long period of time. The data in experimental neurobiology often differ in two respects: processes evolve over a short time interval, so only a few observations can be made, but the same experiment can be run repeatedly under controlled conditions. Here, we show how cointegration analysis can be performed with data from repeated trials.

Assume the process (2.3) was observed in M experimental trials with N_m observations in trial m. The log-likelihood becomes

\begin{aligned} ℓ (μ, α, β, Σ; x^{(1 : M)}) \propto - \frac{\log | Σ |}{2} \sum_{m = 1}^{M} N_{m} \\ - \frac{1}{2} \sum_{m = 1}^{M} \sum_{n = 1}^{N_{m}} {(Δ x_{n}^{(m)} - α β^{'} x_{n - 1}^{(m)} - μ)}^{'} Σ^{- 1} (Δ x_{n}^{(m)} - α β^{'} x_{n - 1}^{(m)} - μ), \end{aligned}

2.14

and the only change is that now we also sum over trials m. The sufficient statistics $S_{00}^{M}$ , $S_{01}^{M}$ and $S_{11}^{M}$ are as follows:

S_{i j}^{M} = {(\sum_{m = 1}^{M} N_{m})}^{- 1} \sum_{m = 1}^{M} \sum_{n = 1}^{N_{m}} z_{i n}^{(m)} z_{j n}^{{(m)}^{'}},

2.15

where

z_{0 n}^{(m)} = Δ x_{n}^{(m)} - {(\sum_{k = 1}^{M} N_{k})}^{- 1} \sum_{k = 1}^{M} \sum_{i = 1}^{N_{k}} Δ x_{i}^{(k)}

2.16

and

z_{1 n}^{(m)} = x_{n - 1}^{(m)} - {(\sum_{k = 1}^{M} N_{k})}^{- 1} \sum_{k = 1}^{M} \sum_{i = 1}^{N_{k}} x_{i - 1}^{(k)} .

2.17

Analogously to a single trial, $\hat{β}$ is constructed from r eigenvectors v_i, i = 1, …, r, corresponding to the r largest eigenvalues of the problem

λ_{i} S_{11}^{M} v_{i} = S_{01}^{M^{'}} {(S_{00}^{M})}^{- 1} S_{01}^{M} v_{i}, i = 1, \dots, p,

2.18

and the estimators of α, $Σ$ and μ are obtained as follows:

\hat{α} = S_{01}^{M} \hat{β} {({\hat{β}}^{'} S_{11}^{M} \hat{β})}^{- 1},

2.19

\hat{Σ} = S_{00}^{M} - S_{01}^{M} \hat{β} {({\hat{β}}^{'} S_{11}^{M} \hat{β})}^{- 1} {\hat{β}}^{'} S_{01}^{M^{'}}

2.20

and

\hat{μ} = {(\sum_{m = 1}^{M} N_{m})}^{- 1} \sum_{m = 1}^{M} \sum_{n = 1}^{N_{m}} (Δ x_{n}^{(m)} - \hat{α} {\hat{β}}^{'} x_{n - 1}^{(m)}) .

2.21
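The pooled statistics in (2.15)–(2.17) can be computed as in the sketch below. The function name `pooled_moments` and the toy trials are hypothetical. The point of the example is that differences and lags are formed within each trial before stacking, so no difference is ever taken across a trial boundary, and centring uses the grand mean over all trials.

```python
import numpy as np

def pooled_moments(trials):
    """Pooled S00, S01, S11 from a list of trials, each of shape (N_m + 1, p)."""
    dx = np.vstack([np.diff(x, axis=0) for x in trials])   # within-trial differences
    lag = np.vstack([x[:-1] for x in trials])               # within-trial lagged levels
    z0 = dx - dx.mean(axis=0)        # centred by the grand mean, eq. (2.16)
    z1 = lag - lag.mean(axis=0)      # centred by the grand mean, eq. (2.17)
    n_total = z0.shape[0]            # sum of N_m over trials
    S00 = z0.T @ z0 / n_total
    S01 = z0.T @ z1 / n_total
    S11 = z1.T @ z1 / n_total
    return S00, S01, S11

# Two toy trials of unequal length (hypothetical data, p = 2 channels).
rng = np.random.default_rng(3)
trials = [np.cumsum(rng.normal(size=(30, 2)), axis=0),
          np.cumsum(rng.normal(size=(45, 2)), axis=0)]
S00, S01, S11 = pooled_moments(trials)
print(S00.shape, S01.shape, S11.shape)
```

The estimators (2.18)–(2.21) then have the same form as in the single-trial case, with these pooled matrices in place of S₀₀, S₀₁ and S₁₁.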

2.2.3. Dealing with the reference level in electroencephalogram measurements

EEG measures differences in electrical potentials between two points. Thus, the signal at any channel is the difference to some recording site. This recording site is the baseline electrode, which is, however, prone to pick up electrical noise that does not reach the other electrodes. Consequently, the voltage differences between baseline and other electrodes are also affected by this noise.

To eliminate this noise, systems for recording EEG usually re-reference EEG signals with respect to another reference level that is chosen from the EEG channels. The signals at the other EEG channels are expressed as the differences in electrical potential to this reference instead of the baseline. This cancels out the noise stemming from the baseline circuit; however, it introduces linear dependence between the recordings and causes the cointegration model to be overparametrized. We will illustrate how this can be handled, for the case of a common average reference, which is a frequent choice of the reference level.

The new reference is the average electrical activity measured across all channels and re-referencing is achieved by subtracting the average from each channel. The electrical activity across all channels therefore sums to zero at each time point,

\sum_{i = 1}^{p} x_{i n} = 0, \forall n \in \{1, \dots, N\},

2.22

and the signal from one of the channels can always be derived from the other p − 1 channels.

Assume that the pth channel is excluded when the cointegration matrix is estimated. The new dataset consists of (p − 1)-dimensional observations $x_{n}^{(p - 1)} = {(x_{1 n}, \dots, x_{p - 1, n})}^{'}$ , and the model is

Δ x_{n}^{(p - 1)} = μ^{(p - 1)} + α^{(p - 1)} β^{{(p - 1)}^{'}} x_{n - 1}^{(p - 1)} + ε_{n}^{(p - 1)},

2.23

where $α^{(p - 1)}, β^{(p - 1)} \in R^{(p - 1) \times r}$ , $μ^{(p - 1)} \in R^{p - 1}$ and $ε_{n}^{(p - 1)} \sim N_{p - 1} (0, Σ^{(p - 1)})$ . A cointegration relationship in the (p − 1)-dimensional model given by the jth column of β^(p−1) can be written as a linear combination of the full p-dimensional vector x_n using (2.22) as follows:

β_{\cdot j}^{{(p - 1)}^{'}} x_{n}^{(p - 1)} = {(β_{\cdot j}^{(p - 1)} + c_{j} 1_{p - 1})}^{'} x_{n}^{(p - 1)} + c_{j} x_{p n}, j \in {1, \dots, r},

2.24

where $c_{j} \in R$ is an arbitrary constant. If β^(p−1) is chosen so that the normalization condition $β^{{(p - 1)}^{'}} S_{11}^{(p - 1)} β^{(p - 1)} = I_{r}$ holds, the p-dimensional cointegration matrix β with columns $β_{\cdot j} = {(β_{\cdot j}^{{(p - 1)}^{'}} + c_{j} 1_{p - 1}^{'}, c_{j})}^{'}$ satisfies the analogous p-dimensional condition β′S₁₁β = I_r for any $c_{j} \in R$ , j ∈ {1, …, r}. This is because the common average reference (2.22) implies S₁₁1_p = 0_p. In the analysis in §3, we use $c_{j} = - (1 / p) \sum_{i = 1}^{p - 1} β_{i j}^{(p - 1)}$ to minimize the norm ‖β_·j‖₂.

The matrix $\hat{α}$ is obtained from the full p-dimensional model using formula (2.19). Note that $\hat{Π} = \hat{α} {\hat{β}}^{'}$ in the full p-dimensional model is not invariant with respect to the choice of the excluded channel and with respect to the choice of constants c_j, j = 1, …, r. However, the product $\hat{Π} x_{n - 1}$ is invariant. Specifically, $\hat{Π} x_{n - 1}$ remains unchanged if an arbitrary constant d_i is added to all elements in a row ${\hat{Π}}_{i \cdot}$ , i ∈ {1, …, p}.
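The extension of the reduced cointegration matrix to all p channels can be sketched as follows. The function name `extend_beta` and the random test data are hypothetical. The code applies the choice c_j = −(1/p) Σ_i β_ij^(p−1) and checks that, for average-referenced data, the cointegration relationships are unchanged.

```python
import numpy as np

def extend_beta(beta_reduced):
    """Map a (p-1) x r cointegration matrix to p x r with the minimum-norm c_j."""
    p = beta_reduced.shape[0] + 1
    c = -beta_reduced.sum(axis=0) / p            # one constant c_j per column
    return np.vstack([beta_reduced + c, c])      # the last row holds c_j

# Toy check with p = 4 average-referenced channels and r = 2 (hypothetical data).
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))
x -= x.mean(axis=1, keepdims=True)               # rows sum to zero, as in (2.22)
beta_reduced = rng.normal(size=(3, 2))
beta_full = extend_beta(beta_reduced)

same = np.allclose(x @ beta_full, x[:, :3] @ beta_reduced)
print(same, beta_full.sum(axis=0))
```

With this choice of c_j each column of the extended matrix sums to zero, which is what makes it the shortest among all equivalent representations.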

2.2.4. Regularization

As the dimension p is typically high for EEG data, it is beneficial to impose regularization to obtain robust estimators. Several approaches have been suggested: imposing lasso penalty directly on $Π$ [26], on β [25] or regularizing certain decompositions of $Π$ [27,28]. Here, we penalize α. The reasons for choosing penalization of α over other possible methods are as follows: (i) easy computation that can be done with standard software packages, (ii) no bias is introduced into the estimated cointegration space $s p (\hat{β})$ , and (iii) better performance over a range of error measures compared with other penalization methods [29].

First, the cointegration matrix β is estimated by the Johansen procedure, yielding $\hat{β}$ . Then, we find $\hat{α}$ by minimizing the sum of squared errors with elastic net penalty [30]

\begin{aligned} \hat{α} & = \arg\min_{α \in R^{p \times r}} \frac{1}{2 N} \sum_{m = 1}^{M} \sum_{n = 1}^{N_{m}} {(z_{0 n}^{(m)} - α {\hat{β}}^{'} z_{1 n}^{(m)})}^{'} (z_{0 n}^{(m)} - α {\hat{β}}^{'} z_{1 n}^{(m)}) \\ & \quad + γ [\frac{1 - ω}{2} \sum_{i = 1}^{p} {‖ α_{i \cdot} ‖}_{2}^{2} + ω \sum_{i = 1}^{p} {‖ α_{i \cdot} ‖}_{1}], \end{aligned}

2.25

where γ ≥ 0 is a tuning parameter governing the overall amount of penalization and ω ∈ [0, 1] controls the balance between the ridge penalty ${‖ α_{i \cdot} ‖}_{2}^{2} / 2$ and the lasso penalty ‖α_i·‖₁. The lasso penalty pushes elements of $\hat{α}$ to exact zeros and thus yields a sparse representation, allowing for a more meaningful interpretation. The ridge penalty weakens the impact of potential correlation between the predictors ${\hat{β}}^{'} z_{1 n}^{(m)}$ . Note that a sparse estimate $\hat{α}$ does not imply that $\hat{Π} = \hat{α} {\hat{β}}^{'}$ is sparse.
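The penalized problem (2.25) can be solved with standard elastic-net software; the sketch below instead uses a plain proximal gradient iteration so that it depends on NumPy only. This is a swapped-in solver chosen for illustration, not the method used in the paper, and the names `soft_threshold` and `elastic_net_alpha`, the iteration count and the toy data are assumptions. The rows of `u` play the role of the predictors β̂′z₁ and the rows of `z0` the centred differences.

```python
import numpy as np

def soft_threshold(a, t):
    """Elementwise soft-thresholding, the proximal map of t * |.|."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def elastic_net_alpha(z0, u, gamma, omega, n_iter=2000):
    """Minimize (1/2N)||z0 - u alpha'||^2 + gamma[(1-omega)/2 ||alpha||^2 + omega ||alpha||_1]."""
    N, p = z0.shape
    r = u.shape[1]
    G = u.T @ u / N                  # r x r Gram matrix of the predictors
    C = z0.T @ u / N                 # p x r cross-moment
    step = 1.0 / (np.linalg.eigvalsh(G).max() + gamma * (1 - omega))
    alpha = np.zeros((p, r))
    for _ in range(n_iter):
        grad = alpha @ G - C + gamma * (1 - omega) * alpha   # gradient of the smooth part
        alpha = soft_threshold(alpha - step * grad, step * gamma * omega)
    return alpha

# Toy data (hypothetical): p = 4 responses, r = 2 predictors, sparse true loadings.
rng = np.random.default_rng(1)
u = rng.normal(size=(500, 2))
alpha_true = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, -0.5], [0.0, 0.0]])
z0 = u @ alpha_true.T + 0.1 * rng.normal(size=(500, 4))

alpha_sparse = elastic_net_alpha(z0, u, gamma=0.05, omega=1.0)
print(np.round(alpha_sparse, 3))
```

Setting γ = 0 reproduces the unpenalized least squares estimate, and increasing γ with ω close to 1 drives more entries of the estimate to exactly zero.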

2.2.5. Determination of the cointegration rank

In standard low-dimensional problems, the typical procedure to determine r is based on likelihood ratio tests that are applied sequentially over a range of possible cointegration ranks. The likelihood ratio test can be in two forms. The null hypothesis is the same, H₀ : r ≤ r₀, while the alternative is either H_a : r ≤ p (trace test) or H_a : r ≤ r₀ + 1 (maximum eigenvalue test). The test statistics depend only on eigenvalues λ_i, i = 1, …, p, of the eigenvalue problem (2.18). Assuming that the eigenvalues are in a descending order, λ₁ ≥ … ≥ λ_p, the test statistics are as follows:

- trace test: $- 2 [ℓ (r_{0}) - ℓ (p)] = - (\sum_{m = 1}^{M} N_{m}) \sum_{i = r_{0} + 1}^{p} \log (1 - λ_{i}),$ (2.26)
- maximum eigenvalue test: $- 2 [ℓ (r_{0}) - ℓ (r_{0} + 1)] = - (\sum_{m = 1}^{M} N_{m}) \log (1 - λ_{r_{0} + 1}) .$ (2.27)

The sequential testing starts with setting r₀ = 0. If H₀ is rejected, r₀ is increased by one, and the test is repeated until acceptance.
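Both statistics are simple functions of the ordered eigenvalues, as the following sketch shows. The function name `rank_test_statistics` and the eigenvalues in the example are hypothetical; critical values are not included, since they must be obtained separately.

```python
import math

def rank_test_statistics(eigenvalues, n_total, r0):
    """Trace and maximum eigenvalue statistics for H0: r <= r0.

    eigenvalues : the p eigenvalues of the reduced-rank problem, each in [0, 1).
    n_total     : total number of observations, i.e. the sum of N_m over trials.
    """
    lam = sorted(eigenvalues, reverse=True)                     # lambda_1 >= ... >= lambda_p
    trace = -n_total * sum(math.log1p(-l) for l in lam[r0:])    # sum over i = r0+1..p
    max_eig = -n_total * math.log1p(-lam[r0])                   # uses lambda_{r0+1}
    return trace, max_eig

# Hypothetical eigenvalues for p = 3 and 100 observations.
lam = [0.5, 0.1, 0.01]
for r0 in range(3):
    print(r0, rank_test_statistics(lam, 100, r0))
```

For r₀ = p − 1 the two statistics coincide, because only the smallest eigenvalue remains in the sum.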

This way of determining the rank is in general not applicable to EEG data because of their high dimension. The trace and the maximum eigenvalue test statistics do not follow any standard distribution, and their critical values depend on p and need to be calculated numerically. Currently, critical values are available for dimension p ≤ 11. Bootstrap methods could in principle overcome this, but the required computing time makes them impractical.

Eigenvalues. The eigenvalues of (2.18) are approximate indicators of the cointegration rank. If a cointegration relationship v_i exists, the corresponding eigenvalue λ_i is expected to be significantly larger than zero. A rough insight can therefore be gained from a scree plot, where ordered eigenvalues are plotted against the rank.

Bunea et al. [31] proposed a rank selection criterion that uses eigenvalues of $S_{01}^{M} {(S_{11}^{M})}^{- 1} S_{10}^{M}$ and identifies the cointegration rank as the number of eigenvalues larger than or equal to a threshold θ,

θ = \frac{2 (p + q)}{p (\sum_{m = 1}^{M} N_{m} - q)} \sum_{m = 1}^{M} \sum_{n = 1}^{N_{m}} {(z_{0 n}^{(m)} - {\hat{Π}}_{p} z_{1 n}^{(m)})}^{'} (z_{0 n}^{(m)} - {\hat{Π}}_{p} z_{1 n}^{(m)}),

2.28

where ${\hat{Π}}_{p}$ is the MLE of $Π$ assuming full rank p and q is the rank of the matrix of predictors $Z_{1} = {(z_{11}, \dots, z_{1 N_{M}})}^{'}$ . The cointegration rank selected by the rank selection criterion is equal to the rank of $\hat{Π}$ estimated by a penalized least squares estimator of the form

\hat{Π} = \arg min_{Π} [\sum_{m = 1}^{M} \sum_{n = 1}^{N_{m}} {(z_{0 n}^{(m)} - Π z_{1 n}^{(m)})}^{'} (z_{0 n}^{(m)} - Π z_{1 n}^{(m)}) + θ rank (Π)] .

2.29

Matrix angle. We use the following generalized version of the vector angle for quantifying the closeness of subspaces spanned by two matrices U and V,

Θ (U, V) = \arccos (\frac{{⟨ U, V ⟩}_{F}}{\sqrt{{⟨ U, U ⟩}_{F} {⟨ V, V ⟩}_{F}}}),

2.30

and we call it the angle between matrices $U, V \in R^{p \times p}$ . It uses the Frobenius inner product 〈U, V〉_F = tr(U′V). For U = 0 or V = 0, we set $Θ (U, V) = π / 2$ .
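The matrix angle (2.30) is straightforward to compute; a minimal sketch follows. The function name `matrix_angle` is hypothetical, and the clipping of the cosine to [−1, 1] is an added numerical safeguard against rounding, not part of the definition.

```python
import numpy as np

def matrix_angle(U, V):
    """Angle between two matrices based on the Frobenius inner product tr(U'V)."""
    norm_u = np.linalg.norm(U)       # Frobenius norm, sqrt of <U, U>_F
    norm_v = np.linalg.norm(V)
    if norm_u == 0 or norm_v == 0:
        return np.pi / 2             # convention for a zero matrix
    cos = np.trace(U.T @ V) / (norm_u * norm_v)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

U = np.array([[1.0, 0.0], [0.0, 0.0]])
V = np.array([[0.0, 0.0], [0.0, 1.0]])
print(matrix_angle(U, V), matrix_angle(U, 2 * U), matrix_angle(U, -U))
```

The angle ignores positive rescaling of either matrix, so it compares the direction of the two estimates rather than their magnitude.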

The angle $Θ (\hat{Π}, Π)$ between the estimate $\hat{Π}$ and the true $Π$ depends on how close the estimation rank is to the true rank. When the estimation rank is much smaller than the true rank, the estimate $\hat{Π}$ is far from the truth and the angle $Θ (\hat{Π}, Π)$ can be as high as π/2. When the estimation rank approaches the true rank, the angle $Θ (\hat{Π}, Π)$ tends to decrease. However, when the estimation rank increases beyond the true rank, the angle does not decrease further but fluctuates around a constant level due to sampling error.

Ideally, we could plot $Θ ({\hat{Π}}_{r}, Π)$ as a function of the estimation rank and inspect if the curve has a kink separating the decreasing and constant segments. However, the true $Π$ is unknown. A naive solution is to replace $Π$ with the estimate under full estimation rank ${\hat{Π}}_{p}$ , but this produces a curve that decreases to zero, becoming exactly 0 for rank p, since the matrices are then equal.

Instead we replace the true $Π$ with several estimates ${\hat{Π}}_{p}^{(- k)}$ . We split the original dataset into K folds and estimate $Π$ from data with the kth fold excluded, assuming full rank. Then we calculate estimates ${\hat{Π}}_{r}^{(k)}$ under ranks r ∈ {0, …, p}, using only data from fold k. The plots of $Θ ({\hat{Π}}_{r}^{(k)}, {\hat{Π}}_{p}^{(- k)})$ against r are useful for two reasons. First, $Θ ({\hat{Π}}_{r}^{(k)}, {\hat{Π}}_{p}^{(- k)})$ does not go to zero due to sampling error. Second, $Θ ({\hat{Π}}_{r}^{(k)}, {\hat{Π}}_{p}^{(- k)})$ tends to have little variance for ranks lower than the true one, but can vary a lot for ranks higher than the true one.

Cross-validation. Another option is to compare the prediction error of the cointegration model with different ranks when applied to independent data through cross-validation, using K folds as mentioned earlier. We use the following two criteria:

1. Mean squared error (MSE) of prediction:

$\begin{aligned} MSE (r) = \frac{1}{K} \sum_{k = 1}^{K} \Big[ \frac{1}{N_{k}} \sum_{n \in I_{k}} {(Δ x_{n}^{(k)} - {\hat{Π}}_{r}^{(- k)} x_{n - 1}^{(k)} - {\hat{μ}}^{(- k)})}^{'} (Δ x_{n}^{(k)} - {\hat{Π}}_{r}^{(- k)} x_{n - 1}^{(k)} - {\hat{μ}}^{(- k)}) \Big] . \end{aligned}$ (2.31)

2. Average cross-validated log-likelihood:

$\begin{aligned} ℓ (r) & = \frac{1}{K} \sum_{k = 1}^{K} \Big\{ - \frac{1}{2 N_{k}} \sum_{n \in I_{k}} \big[ \log | {\hat{Σ}}_{r}^{(- k)} | \\ & \quad + {(Δ x_{n}^{(k)} - {\hat{Π}}_{r}^{(- k)} x_{n - 1}^{(k)} - {\hat{μ}}^{(- k)})}^{'} {({\hat{Σ}}_{r}^{(- k)})}^{- 1} (Δ x_{n}^{(k)} - {\hat{Π}}_{r}^{(- k)} x_{n - 1}^{(k)} - {\hat{μ}}^{(- k)}) \big] \Big\}, \end{aligned}$ (2.32)

where $\{x_{n}^{(k)} : n \in I_{k}\}$ are the observations in the kth fold specified by an index set I_k, N_k is the number of observations in the kth fold and ${\hat{μ}}^{(- k)}$ , ${\hat{Π}}_{r}^{(- k)}$ and ${\hat{Σ}}_{r}^{(- k)}$ are estimates obtained when the kth fold is left out.
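The MSE criterion (2.31) can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: folds are formed by assigning whole trials to folds, the helper names `stack`, `fit_rank_r` and `cv_mse` are hypothetical, and the data are simulated from a three-channel system with true rank 1.

```python
import numpy as np

def stack(trials):
    """Within-trial differences and lagged levels, stacked over trials."""
    dx = np.vstack([np.diff(x, axis=0) for x in trials])
    lag = np.vstack([x[:-1] for x in trials])
    return dx, lag

def fit_rank_r(dx, lag, r):
    """Reduced-rank estimates of Pi and mu for a given rank r (0 <= r <= p)."""
    z0, z1 = dx - dx.mean(axis=0), lag - lag.mean(axis=0)
    N = z0.shape[0]
    S00, S01, S11 = z0.T @ z0 / N, z0.T @ z1 / N, z1.T @ z1 / N
    Linv = np.linalg.inv(np.linalg.cholesky(S11))
    M = Linv @ S01.T @ np.linalg.solve(S00, S01) @ Linv.T
    _, W = np.linalg.eigh((M + M.T) / 2)        # eigenvalues in ascending order
    beta = (Linv.T @ W[:, ::-1])[:, :r]         # r leading eigenvectors, beta' S11 beta = I
    Pi = S01 @ beta @ beta.T                    # alpha beta' with alpha = S01 beta
    mu = dx.mean(axis=0) - Pi @ lag.mean(axis=0)
    return Pi, mu

def cv_mse(trials, ranks, K=5):
    """K-fold cross-validated one-step prediction error for each rank."""
    folds = [trials[k::K] for k in range(K)]
    mse = {}
    for r in ranks:
        errs = []
        for k in range(K):
            train = [x for j in range(K) if j != k for x in folds[j]]
            Pi, mu = fit_rank_r(*stack(train), r)
            dx, lag = stack(folds[k])
            resid = dx - lag @ Pi.T - mu
            errs.append(np.mean(np.sum(resid ** 2, axis=1)))
        mse[r] = float(np.mean(errs))
    return mse

# Simulated repeated trials (hypothetical): 40 trials, 100 steps, true rank 1.
rng = np.random.default_rng(2)
Pi_true = np.array([[-0.3], [0.2], [0.0]]) @ np.array([[1.0, -1.0, 0.5]])
trials = []
for _ in range(40):
    x = np.zeros((101, 3))
    for n in range(1, 101):
        x[n] = x[n - 1] + Pi_true @ x[n - 1] + rng.normal(size=3)
    trials.append(x)

mse = cv_mse(trials, ranks=range(0, 4), K=5)
print(mse)
```

In such a simulation the prediction error typically drops sharply up to the true rank and changes little beyond it, which is the pattern one would look for in real data.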

2.2.6. Test of structural difference between two cointegrated networks

A natural question is whether the network is the same under two different experimental set-ups A and B, or whether the experimental conditions impose a structural difference of the network. We use the Chow test, with the null hypothesis being that the data from both subsets follow the same model with identical parameters μ₀ and $Π_{0}$ ,

H_{0} : Δ x_{n}^{(k)} = μ_{0} + Π_{0} x_{n - 1}^{(k)} + ε_{n}^{(k)}, n = 1, \dots, N_{k}, k \in \{A, B\},

2.33

while the alternative is that each subset is governed by parameters $(μ_{A}, Π_{A})$ or $(μ_{B}, Π_{B})$ , as follows:

H_{a} : Δ x_{n}^{(k)} = μ_{k} + Π_{k} x_{n - 1}^{(k)} + ε_{n}^{(k)}, n = 1, \dots, N_{k}, k \in \{A, B\} .

2.34

The test statistic is the usual likelihood ratio test statistic,

Q (H_{0} | H_{a}) = - 2 [ℓ (μ_{0}, Π_{0}; x^{(A, B)}) - ℓ (μ_{A}, Π_{A}; x^{(A)}) - ℓ (μ_{B}, Π_{B}; x^{(B)})]

2.35

and the asymptotic distribution is χ² with p(2r + 1) − r² degrees of freedom [22,32], given that the VAR model with lag order 1 is valid.
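The degrees of freedom can be motivated by a parameter count. The following is a sketch of that reasoning, assuming that both subsets share the same cointegration rank r and that the covariance matrix is not counted among the tested parameters; this is our reading of the stated formula rather than a derivation given in the text. A p × p matrix of rank r has 2pr − r² free parameters, so each model (μ, Π) contributes

```latex
\underbrace{p}_{\mu}
\;+\; \underbrace{pr}_{\alpha}
\;+\; \underbrace{pr - r^{2}}_{\beta \text{ after normalization}}
\;=\; p(2r + 1) - r^{2}
```

free parameters. The alternative fits two such models and the null hypothesis fits one, so the difference in the number of parameters is again p(2r + 1) − r².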

When the hypothesis is rejected, it is of interest to identify which elements are different. A range of likelihood ratio tests of constant loading matrix α or constant cointegration matrix β under either constant or variable cointegration rank can be found in the literature [32].

3. Application to electroencephalogram recordings from a visual task experiment

We applied the cointegration methodology to EEG recordings obtained during a visual identification experiment (figure 1), during which two participants were given the following task. First, the participant fixated on a centrally located fixation cross on a screen. The fixation lasted for a randomly chosen time between 1.5 and 2.5 s. Then two rings appeared, one on each side of the fixation cross. Either the left or the right ring was a Landolt C, i.e. it had a gap at a certain orientation. The orientation was chosen from eight possible angles evenly spaced from 0 to 315 degrees in intervals of 45 degrees. The luminance contrast of the rings was either 6.5% or 28%. The duration of the visual stimulus was 20, 40, 70 or 110 ms. The position, orientation, contrast and duration were selected randomly with equal probabilities. The gap in the Landolt C was then masked for 500 ms, followed by a blank screen. The participants had to report the orientation of the gap in the Landolt C. We classify participants’ answers into two categories: correct and incorrect.

EEG signals were recorded throughout the whole experimental session. They were obtained with a 64-channel BioSemi EEG recording device (figure 1c) at a sampling rate of 1024 Hz (post hoc downsampled to 256 Hz). The data were cleaned with automatic methods from EEGlab in Matlab and re-referenced using a common mode reference. Excessive kurtosis was detected in four channels for Participant 1 (A2, B1, B2 and B12) and in three channels for Participant 2 (B29, B30 and B31); these channels were excluded from further analysis. Each participant completed 768 trials (8 orientations × 2 contrasts × 4 durations × 12 repetitions). After cleaning the data, there were 609 trials for Participant 1 and 587 trials for Participant 2. The two study participants were healthy, young female university students (20 and 21 years old).
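The trial count can be reproduced by enumerating the factorial design. The following is a minimal sketch in Python; the variable names are illustrative, and the side on which the Landolt C appeared is left out because the text does not list it among the factors that define the 768 trials.

```python
from itertools import product

# Factor levels as described in the text.
orientations_deg = [45 * i for i in range(8)]   # 0, 45, ..., 315 degrees
contrasts_pct = [6.5, 28.0]                     # luminance contrast in percent
durations_ms = [20, 40, 70, 110]                # stimulus duration
repetitions = 12                                # repetitions per condition

# Every combination of orientation, contrast and duration is one condition.
conditions = list(product(orientations_deg, contrasts_pct, durations_ms))
n_trials = len(conditions) * repetitions

print(len(conditions), "conditions,", n_trials, "trials per participant")

# Share of trials retained after cleaning (counts taken from the text).
retained = {"Participant 1": 609, "Participant 2": 587}
for name, kept in retained.items():
    print(name, f"{kept / n_trials:.1%} of trials retained")
```

Roughly a fifth to a quarter of the trials were therefore discarded during cleaning.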

The statistical analysis consists of two parts: the cointegration analysis to infer the functional network and prediction of the accuracy of the response from pre-stimulus EEG activity.

3.1. Fitting the cointegration model

The dataset of each participant was stratified into six subsets, and for each of them, one network was estimated by the cointegration analysis. The split was done according to the response accuracy (two categories: correct and incorrect) and the period within the experimental trial (three categories: fixation, stimulation and masking period). The fixation period was the 500 ms time window prior to the stimulus onset; the stimulation period was the time window during which the stimulus was shown (between 20 and 110 ms long), and the masking period was the 500 ms time window starting with the mask onset (figure 1b). The number of experimental trials and selected characteristics of the fitted models are presented in table 1.

Table 1.

The number of observations N in the datasets after splitting the data according to the accuracy of the answer and the stage of the visual task, the cointegration ranks $\hat{r}_{RSC}$ estimated by the rank selection criterion and the tuning parameters γ and ω used in model fitting.

participant	accuracy	no. of trials	quantity	fixation	stimulus	masking
1	correct	416	N	53 248	7305	52 832
1	correct	416	$\hat{r}_{RSC}$	25	14	22
1	correct	416	γ	2 · 10⁻³	1 · 10⁻²	2 · 10⁻³
1	correct	416	ω	0.75	0.75	0.5
1	incorrect	193	N	24 704	1815	24 511
1	incorrect	193	$\hat{r}_{RSC}$	23	11	19
1	incorrect	193	γ	3 · 10⁻³	1 · 10⁻²	3 · 10⁻³
1	incorrect	193	ω	0.75	0.25	0.75
2	correct	414	N	52 992	7745	52 578
2	correct	414	$\hat{r}_{RSC}$	20	12	17
2	correct	414	γ	2 · 10⁻³	4 · 10⁻²	4 · 10⁻³
2	correct	414	ω	1	0.25	0.5
2	incorrect	173	N	22 144	1468	21 971
2	incorrect	173	$\hat{r}_{RSC}$	18	9	15
2	incorrect	173	γ	6 · 10⁻³	4 · 10⁻²	3 · 10⁻³
2	incorrect	173	ω	0.5	0.75	0.75
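The sample sizes N in table 1 follow from the number of trials and the sampling rate. The sketch below is a plain arithmetic check, not part of the original analysis: a 500 ms window at 256 Hz holds 128 samples, which reproduces the fixation counts exactly. The masking counts correspond to 127 samples per trial; the reason for the one-sample difference is not stated in the text. The dictionary name `TABLE1` is illustrative.

```python
SAMPLING_RATE_HZ = 256   # rate after downsampling
WINDOW_S = 0.5           # length of the fixation and masking windows

samples_per_window = round(SAMPLING_RATE_HZ * WINDOW_S)

# (participant, accuracy): (trials, N fixation, N stimulus, N masking), from table 1
TABLE1 = {
    (1, "correct"): (416, 53248, 7305, 52832),
    (1, "incorrect"): (193, 24704, 1815, 24511),
    (2, "correct"): (414, 52992, 7745, 52578),
    (2, "incorrect"): (173, 22144, 1468, 21971),
}

for key, (trials, n_fix, n_stim, n_mask) in TABLE1.items():
    print(
        key,
        n_fix / trials,             # samples per trial in the fixation window
        n_mask / trials,            # samples per trial in the masking window
        round(n_stim / trials, 1),  # average samples per trial while the stimulus is shown
    )
```

The average number of stimulus samples per trial is smaller for incorrect than for correct trials, which is what one would expect if short stimuli are harder to identify.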


Cointegration rank. The cointegration rank was investigated through the criteria described in §2.2.5. The bootstrap likelihood ratio tests were too time-consuming and were therefore not performed. We focused on the fixation period, where the process is expected to be most stable. We then used the same rank for the stimulation and masking periods.

The visual assessment of the scree plots of the eigenvalues (figure 2) reveals that the most rapid decrease is observed for ranks between 0 and 20, consistently across all six studied scenarios. The stimulation period has in general larger eigenvalues than the other periods, and the data from trials with incorrect answers have larger eigenvalues than the data from trials with a correct answer in the stimulation period. This is probably due to smaller sample sizes and not evidence of a change in the cointegration rank. The eigenvalues for the masking period decrease to zero earlier than for the fixation period. This could indicate that the activity in the masking period is less cointegrated and driven by more stochastic trends than in the fixation period, since the amount of data is similar.

The angles $Θ(\hat{Π}_{r}^{(k)}, \hat{Π}_{p}^{(-k)})$ after splitting the data into five folds (figure 3) are nearly constant for ranks larger than 15, and the variability of $Θ(\hat{Π}_{r}^{(k)}, \hat{Π}_{p}^{(-k)})$ across folds k is also higher for those ranks. The MSE of prediction and the average cross-validated log-likelihood were calculated for ranks r = 5, 10, …, 55. The MSE attains its minimum for r = 55, except for the trials of Participant 2 with an incorrect answer, where the minimum is achieved for r = 50. The cross-validated likelihood always attains its maximum for r = 55. However, both criteria change only slightly for r > 15. Finally, the ranks estimated by the rank selection criterion (table 1) were between 9 and 25. We continue the analysis with r = 15 in all set-ups.

The models were fitted by the Johansen procedure with elastic net penalization on α. In the following, we comment on the main results for Participant 1. For the complete results of both participants, see the electronic supplementary material.

Network. Two main patterns are visible in $\hat{Π}$ (figure 4a). First, the row for channel B6 stands out with many large elements, indicating that B6 is strongly affected by other channels. Second, there are a few regions with larger positive or negative elements, signalling clusters of interlinked activity. One such cluster consists of channels around channel A9, and another cluster consists of channels A20 to A30. The same pattern appears in figure 4b, which shows strong input into B6 and many connections in the left frontal region and in the occipital lobe.

Test of difference between networks for correct and incorrect trials. By eye, there seems to be little difference between the estimated networks. Therefore, we compared the cointegration models fitted separately to correct and incorrect trials with the model fitted to all trials, separately for the periods of fixation, stimulation and masking. The results for both participants (table 2) indicate a highly significant difference between the cointegration models for correct and incorrect trials in all three periods.

Table 2.

Likelihood ratio tests of structural differences between trials with correct and incorrect answers.

participant	period	Q(H₀ | H_a)	d.f.	$χ_{d.f.}^{2}(0.95)$	p-value
1	fixation	15593.4	1604	1698.3	<0.001
	stimulation	5728.7	1604	1698.3	<0.001
	masking	15124.3	1604	1698.3	<0.001
2	fixation	11205.2	1635	1730.2	<0.001
	stimulation	4579.9	1635	1730.2	<0.001
	masking	11632.8	1635	1730.2	<0.001
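The degrees of freedom and critical values in table 2 can be checked with a few lines of Python. This is a sketch under stated assumptions: the dimension p is taken to be 59 for Participant 1 and 60 for Participant 2 (one less than the number of retained channels), a choice made here only because it reproduces the d.f. column with r = 15. The χ² quantile is computed with the Wilson–Hilferty approximation rather than an exact routine, so it should land close to, but need not coincide with, the tabulated values.

```python
from statistics import NormalDist


def lr_degrees_of_freedom(p: int, r: int) -> int:
    """Degrees of freedom p(2r + 1) - r^2 of the likelihood ratio test."""
    return p * (2 * r + 1) - r ** 2


def chi2_quantile_wh(df: int, prob: float) -> float:
    """Wilson-Hilferty approximation to the chi-square quantile (accurate for large df)."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


# Assumed dimensions: retained channels minus one (60 - 1 and 61 - 1).
for participant, p in (("Participant 1", 59), ("Participant 2", 60)):
    df = lr_degrees_of_freedom(p, r=15)
    print(participant, df, round(chi2_quantile_wh(df, 0.95), 1))
```

All observed statistics in table 2 exceed these critical values by a wide margin, which is why every p-value is reported as below 0.001.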


Since the cointegration models differ little visually across the six subsets of data, in the following we comment on $\hat{β}$, $\hat{α}$ and $\hat{μ}$ only for the fixation period in trials with a correct answer to illustrate the main features. The estimates for the remaining five subsets can be found in the electronic supplementary material. The main pattern appears to be specific to the participant: the cointegration models for Participant 2 are clearly different from those for Participant 1, but show very similar features across the six subsets of data within each participant.

Cointegration vectors. Each of the 15 columns of $\hat{β}$ (figure 5a) represents weights of one cointegration vector. They have almost zero weights from channels B7 to B20. This indicates that these channels play a minor role in the joint dynamics of the system.

Figure 5b shows the first cointegration vector and how the channels with largest weights are located on the scalp. The most distinctive channels are two neighbouring channels A8 and A9 in the left frontal region and two neighbouring channels A4 and B6 in the centre of the frontal lobe. Neighbouring channels have weights of similar absolute value, but opposite signs, which indicates that they tend to have the same level of activity at equilibrium. This may result naturally from the presence of a strong connection between the neighbouring channels due to volume conduction [3]. However, there could be another mediator channel as well.
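As a worked illustration of this interpretation, consider a hypothetical cointegration vector that puts weight w on channel i, weight −w on a neighbouring channel j and zero weight on all other channels. This is an idealization; the estimated vectors in figure 5 have further non-zero entries. Ignoring constant terms, the cointegration relation then reads:

```latex
\beta^{\top} x_n = w\,x_{n,i} - w\,x_{n,j} = w\,(x_{n,i} - x_{n,j}),
\qquad
\beta^{\top} x_n = 0 \;\Longleftrightarrow\; x_{n,i} = x_{n,j}.
```

Stationarity of $β^{\top} x_n$ thus means that the difference between the two channels is stationary, so both channels share the same stochastic trend and have equal activity at equilibrium.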

Loadings. It is important to note that $\hat{α}$ can only be interpreted in the light of the accompanying matrix $\hat{β}$, which is not uniquely identifiable. For the estimate $\hat{β}$ satisfying the normalization condition (2.9), larger absolute values of $\hat{α}$ (figure 5c) appear only in channel B6, around A9 and in A20–A30, so the equilibrium cointegrated state specified by the columns of $\hat{β}$ that satisfies (2.9) is achieved primarily by adjustments of these channels. For example, the equilibrium in the first cointegration vector (figure 5d) is achieved through adjustments of channel B6, which has the largest loading in the first column of $\hat{α}$. The sign of the loading has no biological meaning because it depends on the particular form of the cointegration vector, where the signs of the weights could be reversed.

The channels having a noticeable non-zero weight in the cointegration vector usually also have a noticeable non-zero loading. Thus, if the equilibrium state is temporarily not met, the channels defining the equilibrium are also subject to the largest adjustments. However, this does not apply to channel A4, which plays a large role in the cointegration relationship, but has a loading close to zero. Thus, this channel’s activity evolves independently of whether the cointegration equilibrium is achieved, while the remaining channels adjust their activity to match the activity of A4 required in the cointegrated state.
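The role of a zero loading can be illustrated with a toy two-channel system. The sketch below iterates the noise-free error-correction recursion Δx_n = α β′x_{n−1} with hypothetical values β = (1, −1) and α = (0, 0.5); the numbers are chosen for illustration only and are not estimates from the data. Channel 1 plays the part of A4 (non-zero weight, zero loading), channel 2 the part of an adjusting channel.

```python
def simulate(x0, alpha, beta, n_steps):
    """Iterate the noise-free error-correction recursion dx_n = alpha * (beta' x_{n-1})."""
    x = list(x0)
    path = [tuple(x)]
    for _ in range(n_steps):
        # Deviation from the equilibrium defined by the cointegration vector.
        disequilibrium = sum(b * xi for b, xi in zip(beta, x))
        # Each channel adjusts in proportion to its loading.
        x = [xi + a * disequilibrium for a, xi in zip(alpha, x)]
        path.append(tuple(x))
    return path


beta = (1.0, -1.0)   # equilibrium: x1 - x2 = 0
alpha = (0.0, 0.5)   # channel 1 does not adjust, channel 2 does
path = simulate((1.0, 0.0), alpha, beta, n_steps=20)

for step in (0, 1, 2, 5, 20):
    print(step, path[step])
```

Channel 1 never moves, whereas channel 2 closes half of the remaining gap at every step, so the equilibrium is restored entirely through the channel with the non-zero loading.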

Drift. For EEG data, we do not expect large drifts. First, the range of EEG signals is limited by natural laws. Second, forcing the signals to sum to zero limits possible drifts. This is shown in figure 5e. The majority of channels have drifts close to zero. Only channel B6 stands out with a significant negative drift.

3.2. Brain networks identified as predictors of performance in the visual task

In this subsection, we investigate whether the brain states identified by the cointegration analysis for the period right before the stimulus onset are predictive of whether a participant can identify the visual stimulus correctly. We fit a logistic model, where the response is the accuracy of the answer. One predictor is then the goodness of fit of two alternative cointegration models.

We construct the variable d_m for trial m ∈ {1, …, M}, which quantifies how well data from the mth trial, x^(m), are fit by a model A with parameters $(μ_{A}, Π_{A}, Σ_{A})$ compared with a model B with parameters $(μ_{B}, Π_{B}, Σ_{B})$ . We capture it by the difference in the log-likelihoods under the two models [33],

$d_{m} = \log L(μ_{A}, Π_{A}, Σ_{A}; x^{(m)}) - \log L(μ_{B}, Π_{B}, Σ_{B}; x^{(m)}).$ (3.1)

Model A is the cointegration model (2.3) fitted to data in the 100 ms time window prior to the stimulus onset in the trials with a correct answer, whereas Model B is the same model fitted to the trials with an incorrect answer. We used 100 ms to capture the network right before the stimulus onset and simultaneously to allow for a full period of 10 Hz oscillatory activity, which has previously been suggested to drive the predictive value in the pre-stimulus EEG [17]. We applied a leave-one-trial-out principle, so if the answer in trial m was correct, the data from this trial were excluded when fitting Model A, and if the answer was incorrect, the data were left out when fitting Model B.
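The construction of d_m with the leave-one-trial-out principle can be sketched in a deliberately simplified setting. The code below uses a scalar (one-channel) version of the model, Δx_n = μ + π x_{n−1} + ε_n with Gaussian noise, fitted by pooled least squares. This is a stand-in chosen to keep the example self-contained, not the penalized Johansen procedure used in the paper, and all function names and simulation settings are hypothetical. In the multivariate case the squared residual is replaced by a quadratic form in Σ⁻¹ and log σ² by log det Σ.

```python
import math
import random


def increments(trial):
    """Pairs (x_{n-1}, dx_n) for one trial."""
    return [(trial[i - 1], trial[i] - trial[i - 1]) for i in range(1, len(trial))]


def fit(trials):
    """Pooled least-squares (Gaussian ML) estimates of (mu, pi, sigma2)."""
    pairs = [p for t in trials for p in increments(t)]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    pi = sxy / sxx
    mu = my - pi * mx
    sigma2 = sum((y - mu - pi * x) ** 2 for x, y in pairs) / n
    return mu, pi, sigma2


def loglik(params, trial):
    """Gaussian log-likelihood of one trial under given parameters."""
    mu, pi, sigma2 = params
    return sum(
        -0.5 * (math.log(2 * math.pi * sigma2) + (y - mu - pi * x) ** 2 / sigma2)
        for x, y in increments(trial)
    )


def loo_differences(trials, correct):
    """d_m for every trial; trial m is left out of whichever model it would inform."""
    d = []
    for m, trial in enumerate(trials):
        set_a = [t for k, t in enumerate(trials) if correct[k] and k != m]
        set_b = [t for k, t in enumerate(trials) if not correct[k] and k != m]
        d.append(loglik(fit(set_a), trial) - loglik(fit(set_b), trial))
    return d


def simulate(mu, pi, sigma, n, rng):
    """One synthetic trial of n increments started at zero."""
    x = [0.0]
    for _ in range(n):
        x.append(x[-1] + mu + pi * x[-1] + rng.gauss(0.0, sigma))
    return x


rng = random.Random(1)
correct = [True] * 20 + [False] * 20
# Synthetic data: the two trial types differ only in the strength of mean reversion.
trials = [simulate(0.0, -0.2 if c else -0.8, 1.0, 25, rng) for c in correct]
d = loo_differences(trials, correct)

mean_correct = sum(d[:20]) / 20
mean_incorrect = sum(d[20:]) / 20
print(round(mean_correct, 2), round(mean_incorrect, 2))
```

Because the held-out trial never contributes to the model of its own class, d_m is not biased towards the true label, which is what makes it usable as a predictor in the logistic model.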

Furthermore, the following covariates were included to control for the experimental conditions:

- Stimulus duration (sd_m), with four levels: {20, 40, 70, 110 ms}.
- Stimulus orientation (o_m), with eight levels: {0°, 45°, 90°, …, 315°}.
- Luminance contrast (c_m), with two levels: {high, low}.
- Fixation duration (fd_m), with 11 levels: {1.5, 1.6, …, 2.5 s}.
- Stimulus location (loc_m), with two levels: {left, right}.

All the covariates, except for d_m, are treated as categorical variables, even though some of them could be naturally treated as continuous variables. We do so to capture possible nonlinear effects and to explain the maximum portion of variability.

We considered a range of logistic models. The full model with all variables included (M₁) has the form

$\log\left(\frac{p_{m}}{1 - p_{m}}\right) = α + β_{sd_{m}} + β_{o_{m}} + β_{d} d_{m} + β_{c_{m}} + β_{fd_{m}} + β_{loc_{m}},$ (3.2)

where p_m is the probability of a correct answer in trial m. This model was compared with Model M₂, where the log-likelihood difference was excluded from the set of predictors. Moreover, we fitted six single-predictor models M₀₁–M₀₆ with only one variable at a time, to assess the predictive power of each covariate on its own.
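The number of free coefficients in these models follows from the coding of the covariates: a categorical variable with K levels contributes K − 1 coefficients, and the continuous log-likelihood difference contributes one. A small Python sketch of this bookkeeping, together with the inverse logit that maps the linear predictor of (3.2) to p_m, is given below; the dictionary and function names are illustrative.

```python
import math

# Number of levels of each categorical covariate, as listed in the text.
levels = {
    "stimulus duration": 4,
    "stimulus orientation": 8,
    "luminance contrast": 2,
    "fixation duration": 11,
    "stimulus location": 2,
}

# A categorical covariate with K levels needs K - 1 dummy coefficients.
df_categorical = {name: k - 1 for name, k in levels.items()}
df_m2 = sum(df_categorical.values())   # all covariates except d_m
df_m1 = df_m2 + 1                      # plus the slope of the continuous d_m


def probability_correct(linear_predictor: float) -> float:
    """Inverse logit: probability of a correct answer given the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


print(df_categorical)
print("d.f. of M1:", df_m1, "d.f. of M2:", df_m2)
print(probability_correct(0.0))
```

The intercept is not counted in these degrees of freedom, so M₁ has one more estimated parameter than its d.f. suggests.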

The fit of the models and their ability to predict the accuracy of the visual identification are quantified with the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the area under the curve (AUC) in table 3, and receiver operating characteristic (ROC) curves corresponding to the models are shown in figure 6 for Participant 1. As expected, the stimulus duration is the variable with the highest predictive ability for both participants when used as the only predictor (AUC = 0.78 for Participant 1 and AUC = 0.85 for Participant 2). It is followed by stimulus orientation (AUC = 0.61 and 0.63, respectively) and log-likelihood difference (AUC = 0.60 and 0.61, respectively). The predictive ability of luminance contrast, stimulus location and fixation duration is weaker. These predictors are also borderline significant (the contrast is significant for Participant 1 and insignificant for Participant 2; the stimulus location is insignificant for Participant 1, but significant for Participant 2) or even insignificant (fixation duration for both participants) in the one-predictor models M₀₄, M₀₅ and M₀₆ (see table 4).

Table 3.

Fitting and prediction performance of the logistic models.

model	covariates	d.f.	AIC (Participant 1)	BIC (Participant 1)	AUC (Participant 1)	AIC (Participant 2)	BIC (Participant 2)	AUC (Participant 2)
M₀₁	stim. duration	3	627.0	644.6	0.78	481.6	499.1	0.85
M₀₂	stim. orientation	7	757.4	792.7	0.61	703.3	738.3	0.63
M₀₃	loglik. diff.	1	741.2	750.0	0.60	700.9	709.7	0.61
M₀₄	contrast	1	755.4	764.3	0.57	712.2	721.0	0.54
M₀₅	fix. duration	10	776.3	824.8	0.56	724.2	772.3	0.57
M₀₆	left/right	1	761.9	770.7	0.54	709.1	717.9	0.56
M₁	all	23	598.8	704.7	0.84	439.8	544.8	0.91
M₂	all except d_m	22	616.6	718.1	0.83	448.6	549.2	0.91
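The AIC and BIC columns of table 3 are linked by BIC − AIC = k(ln n − 2), where k is the number of estimated parameters and n the number of observations. The sketch below checks this relation for every row, assuming k = d.f. + 1 (the intercept is counted) and n equal to the number of retained trials (609 and 587). Both are assumptions made for this check; discrepancies of up to about 0.1 are expected because the tabulated values are rounded to one decimal.

```python
import math

N_TRIALS = {1: 609, 2: 587}   # retained trials per participant

# model: (d.f., AIC P1, BIC P1, AIC P2, BIC P2), copied from table 3
TABLE3 = {
    "M01": (3, 627.0, 644.6, 481.6, 499.1),
    "M02": (7, 757.4, 792.7, 703.3, 738.3),
    "M03": (1, 741.2, 750.0, 700.9, 709.7),
    "M04": (1, 755.4, 764.3, 712.2, 721.0),
    "M05": (10, 776.3, 824.8, 724.2, 772.3),
    "M06": (1, 761.9, 770.7, 709.1, 717.9),
    "M1": (23, 598.8, 704.7, 439.8, 544.8),
    "M2": (22, 616.6, 718.1, 448.6, 549.2),
}


def bic_minus_aic(df: int, n: int) -> float:
    """Expected BIC - AIC for k = df + 1 parameters and n observations."""
    k = df + 1
    return k * (math.log(n) - 2.0)


discrepancies = []
for model, (df, aic1, bic1, aic2, bic2) in TABLE3.items():
    for participant, aic, bic in ((1, aic1, bic1), (2, aic2, bic2)):
        expected = bic_minus_aic(df, N_TRIALS[participant])
        discrepancies.append(abs((bic - aic) - expected))

print("largest discrepancy:", round(max(discrepancies), 2))
```

Since ln n exceeds 2 for both participants, BIC penalizes each additional parameter more heavily than AIC, which explains why BIC ranks the single-predictor model M₀₁ ahead of the full model M₁ while AIC prefers M₁.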


Figure 6. The ability of logistic models (a,b: M₀₁–M₀₆, c,d: M₁, M₂ and M₀₃) to predict correct answers of Participant 1 captured by ROC curves (a,c) and the corresponding areas under the curve AUC (b,d).
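The AUC equals the probability that a randomly chosen correct trial receives a higher predicted probability than a randomly chosen incorrect trial, with ties counted as one half. A minimal sketch of this rank-based computation follows; the scores and labels are made-up numbers for illustration, not values from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney statistic)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    # Count a win when the positive case scores higher, half a win for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical fitted probabilities and observed accuracy (1 = correct answer).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to a predictor that carries no information, so values around 0.6 for the log-likelihood difference alone indicate a modest but non-trivial predictive ability.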

Table 4.

Significance tests of the covariates in the full model M₁ and in the single covariate models M₀₁ − M₀₆. The reported numbers are p-values from drop-in-deviance tests.

covariate	Participant 1: M₁	Participant 1: M₀₁–M₀₆	Participant 2: M₁	Participant 2: M₀₁–M₀₆
stim. duration	<0.001	<0.001	<0.001	<0.001
stim. orientation	<0.001	0.007	<0.001	<0.001
loglik. difference	<0.001	<0.001	0.001	<0.001
contrast	<0.001	0.002	<0.001	0.058
fix. duration	0.915	0.784	0.628	0.470
left/right	0.020	0.094	0.008	0.010


Model M₁ with all covariates has good predictive power (AUC = 0.84 for Participant 1, AUC = 0.91 for Participant 2). The predictive power of Model M₂, where the log-likelihood difference is excluded, is nearly the same. However, the decrease of fit is statistically significant both for Participant 1 (p < 10⁻⁴) and Participant 2 (p = 0.002). This suggests that the network right before the stimulus onset impacts the performance in the visual task. Nevertheless, data from more participants would be needed to confirm that the effect of the brain state before the stimulus onset on the cognitive performance is general.
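The comparison of M₁ and M₂ is a drop-in-deviance test with one degree of freedom, and its order of magnitude can be recovered from table 3. The sketch below assumes that the deviance equals AIC − 2k with k = d.f. + 1 parameters, and uses the identity P(χ²₁ > x) = erfc(√(x/2)). Because the AIC values are rounded to one decimal, the resulting p-values are approximate and need not match the reported ones exactly.

```python
import math


def deviance_from_aic(aic: float, df: int) -> float:
    """Deviance = AIC - 2k, with k = df + 1 parameters (intercept included)."""
    return aic - 2 * (df + 1)


def chi2_sf_1df(x: float) -> float:
    """Survival function of the chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# AIC values of the full model M1 (d.f. 23) and of M2 without d_m (d.f. 22), from table 3.
aic = {1: {"M1": 598.8, "M2": 616.6}, 2: {"M1": 439.8, "M2": 448.6}}

results = {}
for participant, values in aic.items():
    drop = deviance_from_aic(values["M2"], 22) - deviance_from_aic(values["M1"], 23)
    results[participant] = (drop, chi2_sf_1df(drop))
    print(participant, round(drop, 1), f"{results[participant][1]:.2g}")
```

The drop in deviance is about 19.8 for Participant 1 and about 10.8 for Participant 2; the first gives a p-value well below 10⁻⁴, the second a p-value of roughly 0.001, in line with the entry for the log-likelihood difference in M₁ in table 4.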

4. Discussion

We have shown how cointegration methodology can be applied to infer functional networks in EEG data and have thereby expanded the available statistical toolbox for inferring functional connectivity. We have applied the cointegration analysis to EEG data obtained from a visual task experiment with two participants.

Cointegration analysis is based on a VAR model, which is not new in EEG data analysis [7–9]. Our use of VAR models differs in two ways. First, a standard VAR model has no restriction on the rank of $Π$, which means that $\hat{Π}$ has full rank almost surely and no stochastic trends are allowed. If the data are non-stationary, the fitted model may have no link to existing relations between EEG channels due to the phenomenon of spurious regression [10]. By contrast, the cointegration approach starts with estimating the rank of $Π$; therefore, if non-stationarity is present, it is taken into account correctly.

Second, the VAR model in our approach is not just an auxiliary intermediate step as in other procedures of network recovery, but the actual model of interactions between EEG units. The parameters of the rank-restricted VAR model fitted as part of the cointegration analysis can also be used in further analysis, as illustrated earlier, or used, for example, to deduce Granger causality. We only use a VAR model of order 1, since a higher-order VAR model of high dimension would have too many parameters to be fitted reliably. Furthermore, we did not achieve a substantially better fit by adding more lags (results not shown). A VAR(1) model moreover has the advantage of a direct link to the continuous-time Ornstein–Uhlenbeck model, which has a straightforward interpretation. Nevertheless, it should always be checked whether a VAR model with lag order 1 is sufficient to fit a particular dataset, e.g. by inspecting the autocorrelation function of the residuals. We did not find any substantial autocorrelation in the residuals from most channels, and adding more lags practically did not change the residuals (results not shown).
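The residual check mentioned above can be sketched as follows for the residuals of a single channel. The code computes the sample autocorrelation function and compares it with the usual approximate 95% white-noise band ±1.96/√n. The residuals here are simulated white noise standing in for real model residuals, and the function names are illustrative.

```python
import math
import random


def autocorrelation(residuals, max_lag):
    """Sample autocorrelation of a residual series at lags 1, ..., max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    c0 = sum(c * c for c in centred)
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / c0
        for lag in range(1, max_lag + 1)
    ]


def white_noise_band(n, z=1.96):
    """Approximate 95% band for the autocorrelations of white noise."""
    return z / math.sqrt(n)


rng = random.Random(0)
residuals = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # stand-in for model residuals
acf = autocorrelation(residuals, max_lag=20)
band = white_noise_band(len(residuals))

outside = [lag for lag, value in enumerate(acf, start=1) if abs(value) > band]
print("band:", round(band, 3), "lags outside the band:", outside)
```

For white noise about one lag in twenty is expected to fall outside the band by chance, so isolated exceedances are not by themselves evidence that more lags are needed; systematic exceedances at small lags would be.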

We found that the cointegration analysis produced a distinct subject-specific network that did not differ visually between experimental conditions (figure 4), but nevertheless was predictive of task performance. Specifically, the pre-stimulus network predicted the response accuracy on the subsequent visual identification task. This finding corroborates previous studies investigating the effect of specific spontaneous and oscillatory brain activity features prior to the onset of a sensory stimulus on identification performance [17]. Thus, the cointegration analysis offers an exploratory approach to understand if and how the brain state at one instance in time affects behavioural performance in a subsequent task.

The classification of brain states employs a logistic model, but other approaches to brain state classification exist, such as neural networks [9] or quadratic discriminant analysis [34]. However, to evaluate the specific contribution of the brain state compared with other covariates, a parametric approach such as logistic regression is more suitable.

We estimated the functional connectivity network between channels. However, the measured potentials are only indirect indicators of unknown sources of neuronal activity. The underlying sources can influence measured activity at several nearby channels and lead to spurious correlations between EEG channels. This problem could be potentially fixed by applying some of the algorithms of EEG source localization [35] and inferring the functional connectivity network between the reconstructed sources [36,37]. The reliability of the cointegration methodology with the reconstructed sources needs to be further investigated. Altogether, the network inferred from EEG channels should be interpreted cautiously. However, if changes in connectivity caused by an experimental manipulation are of main interest, as in the logistic model for predicting performance in the visual task, the ambiguity in the connectivity of the true sources is not an issue.

Acknowledgements

The authors would like to thank Jacob Stærk-Østergaard for providing R-code for fitting cointegration models.

Data accessibility

The EEG data from the visual task experiment, the full data analysis for both participants and the underlying R code are publicly available at the online repository https://doi.org/10.17894/ucph.f19b3ddd-ea40-4d96-8787-c41aac9bd2e7 [38].

Authors' contributions

M.L.: formal analysis, software, visualization, writing—original draft and writing—review and editing; J.H.C.: data curation and writing—review and editing; S.D.: conceptualization, methodology and writing—review and editing.

All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

Conflict of interest declaration

We declare we have no competing interests.

Funding

M.L. received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no. 887784. S.D. received funding from the Novo Nordisk Foundation NNF20OC0062958 and the Independent Research Fund Denmark no. 9040-00215B.

References

1. Luck SJ. 2014. An introduction to the event-related potential technique. Cambridge, MA: MIT Press.
2. Blinowska KJ. 2011. Review of the methods of determination of directed connectivity from multichannel data. Med. Biol. Eng. Comput. 49, 521-529. (doi:10.1007/s11517-011-0739-x)
3. Bastos AM, Schoffelen JM. 2016. A tutorial review of functional connectivity analysis methods and their interpretational pitfalls. Front. Sys. Neurosci. 9, 175. (doi:10.3389/fnsys.2015.00175)
4. Lachaux JP, Rodriguez E, Martinerie J, Varela FJ. 1999. Measuring phase synchrony in brain signals. Hum. Brain Map. 8, 194-208. (doi:10.1002/(SICI)1097-0193(1999)8:4<194::AID-HBM4>3.0.CO;2-C)
5. Bressler SL, Seth AK. 2011. Wiener–Granger causality: a well established methodology. Neuroimage 58, 323-329. (doi:10.1016/j.neuroimage.2010.02.059)
6. McKenna T, McMullen T, Shlesinger M. 1994. The brain as a dynamic physical system. Neuroscience 60, 587-605. (doi:10.1016/0306-4522(94)90489-8)
7. Kaminski MJ, Blinowska KJ. 1991. A new method of the description of the information flow in the brain structures. Biol. Cybern. 65, 203-210. (doi:10.1007/BF00198091)
8. Baccalá LA, Sameshima K. 2001. Partial directed coherence: a new concept in neural structure determination. Biol. Cybern. 84, 463-474. (doi:10.1007/pl00007990)
9. Anderson CW, Stolz EA, Shamsunder S. 1998. Multivariate autoregressive models for classification of spontaneous electroencephalographic signals during mental tasks. IEEE Trans. Biomed. Eng. 45, 277-286. (doi:10.1109/10.661153)
10. Granger CW, Newbold P. 1974. Spurious regressions in econometrics. J. Econom. 2, 109-118. (doi:10.1016/0304-4076(74)90034-7)
11. Granger CW. 1981. Some properties of time series data and their use in econometric model specification. J. Econom. 16, 121-130. (doi:10.1016/0304-4076(81)90079-8)
12. Schmith T, Johansen S, Thejll P. 2012. Statistical analysis of global surface temperature and sea level using cointegration methods. J. Climate 25, 7822-7833. (doi:10.1175/JCLI-D-11-00598.1)
13. Dahlhaus R, Kiss IZ, Neddermeyer JC. 2018. On the relationship between the theory of cointegration and the theory of phase synchronization. Statist. Sc. 33, 334-357. (doi:10.1214/18-sts659)
14. Østergaard J, Rahbek A, Ditlevsen S. 2017. Oscillating systems with cointegrated phase processes. J. Math. Biol. 75, 845-883. (doi:10.1007/s00285-017-1100-2)
15. VanRullen R, Koch C. 2003. Is perception discrete or continuous? Trends Cogn. Sci. 7, 207-213. (doi:10.1016/S1364-6613(03)00095-0)
16. Mathewson KE, Beck DM, Ro T, Maclin EL, Low KA, Fabiani M, Gratton G. 2014. Dynamics of alpha control: preparatory suppression of posterior alpha oscillations by frontal modulators revealed with combined EEG and event-related optical signal. J. Cogn. Neurosci. 26, 2400-2415. (doi:10.1162/jocn_a_00637)
17. Busch NA, Dubois J, VanRullen R. 2009. The phase of ongoing EEG oscillations predicts visual perception. J. Neurosci. 29, 7869-7876. (doi:10.1523/JNEUROSCI.0113-09.2009)
18. Fiser J, Berkes P, Orbán G, Lengyel M. 2010. Statistically optimal perception and learning: from behavior to neural representations. Trends Cogn. Sci. 14, 119-30. (doi:10.1016/j.tics.2010.01.003)
19. Ignaccolo M, Latka M, Jernajczyk W, Grigolini P, West BJ. 2010. Dynamics of electroencephalogram entropy and pitfalls of scaling detection. Phys. Rev. E 81, 031909. (doi:10.1103/PhysRevE.81.031909)
20. Ruse MG, Samson A, Ditlevsen S. 2020. Inference for biomedical data by using diffusion models with covariates and mixed effects. J. R. Statist. Soc. Ser. C Appl. Stat. 69, 167-193. (doi:10.1111/rssc.12386)
21. Kessler M, Rahbek A. 2004. Identification and inference for multivariate cointegrated and ergodic Gaussian diffusions. Stat. Inference Stoch. Process. 7, 137-151. (doi:10.1023/B:SISP.0000026044.28647.56)
22. Lütkepohl H. 2005. New introduction to multiple time series analysis. Berlin, Germany: Springer Science & Business Media.
23. Johansen S. 1995. Likelihood-based inference in cointegrated vector autoregressive models. Oxford, UK: Oxford University Press.
24. Velu R, Reinsel GC. 2013. Multivariate reduced-rank regression: theory and applications, vol. 136. Berlin, Germany: Springer Science & Business Media.
25. Wilms I, Croux C. 2016. Forecasting using sparse cointegration. Int. J. Forecast. 32, 1256-1267. (doi:10.1016/j.ijforecast.2016.04.005)
26. Smeekes S, Wijler E. 2018. Macroeconomic forecasting using penalized regression methods. Int. J. Forecast. 34, 408-430. (doi:10.1016/j.ijforecast.2018.01.001)
27. Liao Z, Phillips PC. 2015. Automated estimation of vector error correction models. Econom. Theory 31, 581-646. (doi:10.1017/S026646661500002X)
28. Liang C, Schienle M. 2019. Determination of vector error correction models in high dimensions. J. Econom. 208, 418-441. (doi:10.1016/j.jeconom.2018.09.018)
29. Levakova M, Ditlevsen S. Submitted. Penalization methods in fitting high-dimensional cointegrated VAR models: a review.
30. Zou H, Hastie T. 2005. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301-320. (doi:10.1111/j.1467-9868.2005.00503.x)
31. Bunea F, She Y, Wegkamp MH. 2012. Joint variable and rank selection for parsimonious estimation of high-dimensional matrices. Ann. Stat. 40, 2359-88. (doi:10.1214/12-AOS1039)
32. Hansen PR. 2003. Structural changes in the cointegrated vector autoregressive model. J. Econom. 114, 261-295. (doi:10.1016/S0304-4076(03)00085-X)
33. Li K, Ditlevsen S. In preparation. Clustering and inference in complicated mixture models.
34. Lee YY, Hsieh S. 2014. Classifying different emotional states by means of EEG-based functional connectivity patterns. PLoS ONE 9, e95415. (doi:10.1371/journal.pone.0095415)
35. Michel CM, He B. 2019. EEG source localization. Handb. Clin. Neurol. 160, 85-101.
36. Schoffelen JM, Gross J. 2009. Source connectivity analysis with MEG and EEG. Hum. Brain Map. 30, 1857-1865. (doi:10.1002/hbm.20745)
37. Mahjoory K, Nikulin VV, Botrel L, Linkenkaer-Hansen K, Fato MM, Haufe S. 2017. Consistency of EEG source localization and connectivity estimates. Neuroimage 152, 590-601. (doi:10.1016/j.neuroimage.2017.02.076)
38. Levakova M, Christensen JH, Ditlevsen S. 2022. Supplementary material from: Classification of brain states that predicts future performance in visual tasks based on co-integration analysis of EEG data. Electronic Research Data Archive. University of Copenhagen. (doi:10.17894/ucph.f19b3ddd-ea40-4d96-8787-c41aac9bd2e7)

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

[RSOS220621C1] 1.Luck SJ. 2014. An introduction to the event-related potential technique. Cambridge, MA: MIT Press. [Google Scholar]

[RSOS220621C2] 2.Blinowska KJ. 2011. Review of the methods of determination of directed connectivity from multichannel data. Med. Biol. Eng. Comput. 49, 521-529. ( 10.1007/s11517-011-0739-x) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS220621C3] 3.Bastos AM, Schoffelen JM. 2016. A tutorial review of functional connectivity analysis methods and their interpretational pitfalls. Front. Sys. Neurosci. 9, 175. ( 10.3389/fnsys.2015.00175) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS220621C4] 4.Lachaux JP, Rodriguez E, Martinerie J, Varela FJ. 1999. Measuring phase synchrony in brain signals. Hum. Brain Map. 8, 194-208. ( 10.1002/(SICI)1097-0193(1999)8:4<194::AID-HBM4>3.0.CO;2-C) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS220621C5] 5.Bressler SL, Seth AK. 2011. Wiener–Granger causality: a well established methodology. Neuroimage 58, 323-329. ( 10.1016/j.neuroimage.2010.02.059) [DOI] [PubMed] [Google Scholar]

[RSOS220621C6] 6.McKenna T, McMullen T, Shlesinger M. 1994. The brain as a dynamic physical system. Neuroscience 60, 587-605. ( 10.1016/0306-4522(94)90489-8) [DOI] [PubMed] [Google Scholar]

[RSOS220621C7] 7.Kaminski MJ, Blinowska KJ. 1991. A new method of the description of the information flow in the brain structures. Biol. Cybern. 65, 203-210. ( 10.1007/BF00198091) [DOI] [PubMed] [Google Scholar]

[RSOS220621C8] 8.Baccalá LA, Sameshima K. 2001. Partial directed coherence: a new concept in neural structure determination. Biol. Cybern. 84, 463-474. ( 10.1007/pl00007990) [DOI] [PubMed] [Google Scholar]

[RSOS220621C9] 9.Anderson CW, Stolz EA, Shamsunder S. 1998. Multivariate autoregressive models for classification of spontaneous electroencephalographic signals during mental tasks. IEEE Trans. Biomed. Eng. 45, 277-286. ( 10.1109/10.661153) [DOI] [PubMed] [Google Scholar]

[RSOS220621C10] 10.Granger CW, Newbold P. 1974. Spurious regressions in econometrics. J. Econom. 2, 109-118. ( 10.1016/0304-4076(74)90034-7) [DOI] [Google Scholar]

[RSOS220621C11] 11.Granger CW. 1981. Some properties of time series data and their use in econometric model specification. J. Econom. 16, 121-130. ( 10.1016/0304-4076(81)90079-8) [DOI] [Google Scholar]

[RSOS220621C12] 12.Schmith T, Johansen S, Thejll P. 2012. Statistical analysis of global surface temperature and sea level using cointegration methods. J. Climate 25, 7822-7833. ( 10.1175/JCLI-D-11-00598.1) [DOI] [Google Scholar]

[RSOS220621C13] 13.Dahlhaus R, Kiss IZ, Neddermeyer JC. 2018. On the relationship between the theory of cointegration and the theory of phase synchronization. Statist. Sc. 33, 334-357. ( 10.1214/18-sts659) [DOI] [Google Scholar]

[RSOS220621C14] 14.Østergaard J, Rahbek A, Ditlevsen S. 2017. Oscillating systems with cointegrated phase processes. J. Math. Biol. 75, 845-883. ( 10.1007/s00285-017-1100-2) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS220621C15] 15.VanRullen R, Koch C. 2003. Is perception discrete or continuous? Trends Cogn. Sci. 7, 207-213. ( 10.1016/S1364-6613(03)00095-0) [DOI] [PubMed] [Google Scholar]

[RSOS220621C16] 16.Mathewson KE, Beck DM, Ro T, Maclin EL, Low KA, Fabiani M, Gratton G. 2014. Dynamics of alpha control: preparatory suppression of posterior alpha oscillations by frontal modulators revealed with combined EEG and event-related optical signal. J. Cogn. Neurosci. 26, 2400-2415. ( 10.1162/jocn_a_00637) [DOI] [PMC free article] [PubMed] [Google Scholar]

17. Busch NA, Dubois J, VanRullen R. 2009. The phase of ongoing EEG oscillations predicts visual perception. J. Neurosci. 29, 7869-7876. (doi:10.1523/JNEUROSCI.0113-09.2009)

18. Fiser J, Berkes P, Orbán G, Lengyel M. 2010. Statistically optimal perception and learning: from behavior to neural representations. Trends Cogn. Sci. 14, 119-130. (doi:10.1016/j.tics.2010.01.003)

19. Ignaccolo M, Latka M, Jernajczyk W, Grigolini P, West BJ. 2010. Dynamics of electroencephalogram entropy and pitfalls of scaling detection. Phys. Rev. E 81, 031909. (doi:10.1103/PhysRevE.81.031909)

20. Ruse MG, Samson A, Ditlevsen S. 2020. Inference for biomedical data by using diffusion models with covariates and mixed effects. J. R. Statist. Soc. Ser. C Appl. Stat. 69, 167-193. (doi:10.1111/rssc.12386)

21. Kessler M, Rahbek A. 2004. Identification and inference for multivariate cointegrated and ergodic Gaussian diffusions. Stat. Inference Stoch. Process. 7, 137-151. (doi:10.1023/B:SISP.0000026044.28647.56)

22. Lütkepohl H. 2005. New introduction to multiple time series analysis. Berlin, Germany: Springer Science & Business Media.

23. Johansen S. 1995. Likelihood-based inference in cointegrated vector autoregressive models. Oxford, UK: Oxford University Press.

24. Velu R, Reinsel GC. 2013. Multivariate reduced-rank regression: theory and applications, vol. 136. Berlin, Germany: Springer Science & Business Media.

25. Wilms I, Croux C. 2016. Forecasting using sparse cointegration. Int. J. Forecast. 32, 1256-1267. (doi:10.1016/j.ijforecast.2016.04.005)

26. Smeekes S, Wijler E. 2018. Macroeconomic forecasting using penalized regression methods. Int. J. Forecast. 34, 408-430. (doi:10.1016/j.ijforecast.2018.01.001)

27. Liao Z, Phillips PC. 2015. Automated estimation of vector error correction models. Econom. Theory 31, 581-646. (doi:10.1017/S026646661500002X)

28. Liang C, Schienle M. 2019. Determination of vector error correction models in high dimensions. J. Econom. 208, 418-441. (doi:10.1016/j.jeconom.2018.09.018)

29. Levakova M, Ditlevsen S. Submitted. Penalization methods in fitting high-dimensional cointegrated VAR models: a review.

30. Zou H, Hastie T. 2005. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301-320. (doi:10.1111/j.1467-9868.2005.00503.x)

31. Bunea F, She Y, Wegkamp MH. 2012. Joint variable and rank selection for parsimonious estimation of high-dimensional matrices. Ann. Stat. 40, 2359-2388. (doi:10.1214/12-AOS1039)

32. Hansen PR. 2003. Structural changes in the cointegrated vector autoregressive model. J. Econom. 114, 261-295. (doi:10.1016/S0304-4076(03)00085-X)

33. Li K, Ditlevsen S. In preparation. Clustering and inference in complicated mixture models.

34. Lee YY, Hsieh S. 2014. Classifying different emotional states by means of EEG-based functional connectivity patterns. PLoS ONE 9, e95415. (doi:10.1371/journal.pone.0095415)

35. Michel CM, He B. 2019. EEG source localization. Handb. Clin. Neurol. 160, 85-101.

36. Schoffelen JM, Gross J. 2009. Source connectivity analysis with MEG and EEG. Hum. Brain Mapp. 30, 1857-1865. (doi:10.1002/hbm.20745)

37. Mahjoory K, Nikulin VV, Botrel L, Linkenkaer-Hansen K, Fato MM, Haufe S. 2017. Consistency of EEG source localization and connectivity estimates. Neuroimage 152, 590-601. (doi:10.1016/j.neuroimage.2017.02.076)

38. Levakova M, Christensen JH, Ditlevsen S. 2022. Supplementary material from: Classification of brain states that predicts future performance in visual tasks based on co-integration analysis of EEG data. Electronic Research Data Archive. University of Copenhagen. (doi:10.17894/ucph.f19b3ddd-ea40-4d96-8787-c41aac9bd2e7)
