Bayesian Inferences on Neural Activity in EEG-Based Brain-Computer Interface

Tianwen Ma; Yang Li; Jane E Huggins; Ji Zhu; Jian Kang

doi:10.1080/01621459.2022.2041422

Author manuscript; available in PMC: 2023 Mar 18.

Published in final edited form as: J Am Stat Assoc. 2022 Mar 18;117(539):1122–1133. doi: 10.1080/01621459.2022.2041422


PMCID: PMC9609845 NIHMSID: NIHMS1798425 PMID: 36313593

Abstract

A brain-computer interface (BCI) is a system that translates brain activity into commands to operate technology. A common design for an electroencephalogram (EEG) BCI relies on the classification of the P300 event-related potential (ERP), which is a response elicited by the rare occurrence of target stimuli among common non-target stimuli. Few existing ERP classifiers directly explore the underlying mechanism of the neural activity. To this end, we perform a novel Bayesian analysis of the probability distribution of multi-channel real EEG signals under the P300 ERP-BCI design. We aim to identify relevant spatial temporal differences of the neural activity, which provides statistical evidence of P300 ERP responses and helps design individually efficient and accurate BCIs. As one key finding of our single participant analysis, there is a 90% posterior probability that the target ERPs of the channels around visual cortex reach their negative peaks around 200 milliseconds post-stimulus. Our analysis identifies five important channels (PO7, PO8, Oz, P4, Cz) for the BCI speller leading to a 100% prediction accuracy. From the analyses of nine other participants, we consistently select the identified five channels, and the selection frequencies are robust to small variations of bandpass filters and kernel hyper-parameters.

Keywords: Bayesian Analysis, Neural Activity, Gaussian Process, Brain-computer Interface

1. Introduction

1.1. Background

A brain-computer interface (BCI) is a device that interprets brain activity to operate technology. An electroencephalogram (EEG)-based BCI speller system is a particular BCI device that enables a person to “type” words without using a physical keyboard by recording EEG brain activity. It has been used for assisting people with disabilities, such as amyotrophic lateral sclerosis (ALS), with regular communication (Wolpaw et al., 2018). The brain activity is measured with EEG signals, which have the features of non-invasiveness, low cost, and high temporal resolution.

The conventional BCI framework is based on the P300 event-related potential (ERP) BCI design, known as the P300 ERP-BCI design (Farwell and Donchin, 1988). However, we also include other types of ERPs that help interpret and classify the brain activity. An ERP is a signal pattern in the brain activity in response to an external event. The P300 ERP is a particular ERP that occurs in response to a rare, but relevant event (i.e., highlighting a group of characters on the screen). The relevant (target) P300 ERP has a positive deflection in voltage with the latency (the delay from the onset of the event to the first response peak) around 300 ms (Rodden and Stemmer, 2008). The rightmost plot in Figure 1 shows the typical target and non-target P300 ERPs from a real participant.

Fig. 1 — An illustration of the conventional procedure of the P300 ERP-BCI operation. The P300 ERP-BCI design presents a sequence of events on a virtual screen to the user. The user focuses on a specific character and responds to different events eliciting different brain signals (P300 or no P300). These brain signals are recorded by the EEG machine. Classifiers are then constructed to analyze EEG signals in a fixed time response window after each event to make a binary decision whether a P300 ERP response is produced. The binary classification results are converted into character-level probabilities, and the character with the highest probability is shown on the screen.

There are three challenges in making valid inferences on brain activity in the P300 ERP-BCI system. First, the signal-to-noise ratio of the EEG signals is quite low. A typical P300 ERP-BCI system requires collecting data from multi-dimensional input and repeated sequences of events. Second, to reduce the time to complete the sequence of events necessary to present all the keys on the virtual keyboard, we minimize the time between adjacent events within each sequence and between adjacent sequences. Thus, the time between events is shorter than the time required to produce a P300 ERP response. Therefore, the observed EEG signal is a mixture of overlapping ERP responses, which may or may not contain a P300 ERP. No formal statistical methods can resolve this mixture and make valid inferences on the overlapping responses. Finally, during the calibration time in the current P300 ERP-BCI system, participants may experience variations in attention from fatigue to boredom, leading to missed or delayed responses that may obscure statistical inferences.

1.2. Conventional Framework with Motivating Dataset

The conventional P300 ERP-BCI design presents a sequence of events on a virtual keyboard and analyzes the EEG signals in a fixed time response window after each event to make a binary decision whether a P300 ERP response is produced by that event, which forms the fundamental basis of the P300 ERP-BCI operation. For multi-channel EEG signals, channel-specific EEG signal segments are concatenated for binary classification. Here, an EEG channel is defined as an electrode capturing brain activity. Multiple electrodes are placed on the scalp to achieve stable prediction accuracy. The binary classification results are then converted into character-level probabilities. We denote “key” and “target key” as a generic character to be typed and the specific character that the user wants to type, respectively. Usually, events within each sequence cover all the possible keys, but multiple keys can exist in each event. Thus, the P300 ERP-BCI is designed to identify the unique key from the intersection of all events that produce P300 ERP responses within each sequence. Finally, the conventional P300 ERP-BCI design presents a fixed number of events (stimuli) with a fixed number of sequences before the final decision is made. Figure 1 describes the procedure of the conventional P300 ERP-BCI operation.

To better illustrate the framework, we briefly introduce the motivating dataset, which follows the experimental protocol of Thompson et al. (2014). It is part of the database of non-invasive experimental data in the P300 ERP-BCI experiments conducted at the University of Michigan Direct Brain Interface Laboratory (UM-DBI). Under the protocol mentioned above, each participant copied a multi-character phrase during the experimental session. The dataset of each participant consisted of the training (calibration) data and the testing (free-typing) data. We created a participant-specific classifier with the training data and tested it on the free-typing data. The study adopted the row-and-column paradigm (RCP) design developed by Farwell and Donchin (1988). The BCI display screen was a 6 × 6 grid of characters. Each event was either a row stimulus or a column stimulus. The order of the row and column stimuli was random, and it looped through all rows and columns every 12 consecutive stimuli, called a sequence. For each character of interest, participants were asked to mentally count when they saw a row or column stimulus containing the character of interest and to ignore stimuli that did not include the current character of interest. Thus, each sequence always had two events (stimuli) that were supposed to elicit P300 ERPs (one row and one column) out of every 12 events. In particular, the left side of Figure 1 shows 36 characters in a 6 × 6 grid with the fourth row being highlighted.

Many state-of-the-art machine learning (ML) methods such as stepwise linear discriminant analysis (swLDA) (Donchin et al., 2000; Krusienski et al., 2008), logistic regression (LR) (Viana et al., 2014), random forest (RF) (Okumuş and Aydemir, 2017), support vector machine (SVM) (Kaper et al., 2004), convolutional neural network (CNN) (Cecotti and Graser, 2010), independent component analysis (ICA) (Xu et al., 2004), and the recent XGBoost (Leoni et al., 2021) have successfully constructed binary classifiers for P300 ERPs. These discriminant approaches treat target or non-target stimuli as the response variable and the truncated-and-concatenated EEG signal segments as feature vectors. Although these approaches are straightforward to implement, it is difficult for them to make statistical inferences about brain activity with overlapping P300 ERP responses. The functional graphical model (FGM; Qiao et al., 2019) is a powerful tool to model the conditional dependency over functional variables, and it has been used to model multiple-subject EEG data in an alcoholism study for functional connectivity analysis. However, FGM cannot be directly adopted in our study due to the differences in the goal of analysis and the data structure.

As a flexible tool for Bayesian nonparametrics and machine learning, the Gaussian Process (GP), a stochastic process where every finite collection of its realizations follows a multivariate normal distribution, has been widely used for modeling functional and dependent data over time and space (Rasmussen, 2003). Different extensions of GPs have been proposed for different neuroscience applications. In particular, for feature selection in scalar-on-image regression, the soft-thresholded GP prior (Kang et al., 2018) models sparse, continuous and piece-wise smooth functions. This prior has also been extended to model the sparsity and dependence in the effects of nodes over a graph in the framework of Bayesian network marker selection (Cai et al., 2020). However, none of these existing GPs can be directly applied to detection of our P300 ERPs in EEG signals.

1.3. Our Contributions

To the best of our knowledge, we are among the first to study the probability distribution of multi-trial EEG signals from real participants in BCI experiments using a Bayesian generative model. Our Bayesian analysis explores the mechanism of neural activity in response to external stimuli. Our model explicitly addresses the challenge of overlapping ERPs between adjacent stimuli, and the model can be applied to multi-channel EEG signals without signal concatenation or segmentation. We develop a new GP-based prior for the spatial-temporal varying trajectories of P300 ERP responses. The proposed prior facilitates selecting important time windows in which the average brain activity in response to the target stimuli and non-target stimuli is different (split) or the same (merge); thus, it is termed the split-and-merge GP (SMGP). We make full posterior inferences on participant-and-channel-specific P300 ERPs in a fixed EEG response window.

Based on our Bayesian analysis, we first aim to identify significant split time windows for frontal, central, parietal, parietal-occipital, and occipital channels. We do not expect to identify significant split time windows for channels close to the ears. We study the neural activity patterns among both healthy controls and participants with amyotrophic lateral sclerosis (ALS) under the ERP-BCI design. Next, we rank the brain regions by the participant-specific information criterion. We hypothesize that brain regions associated with the cognitive function as well as the visual function will be selected with high reproducibility across participants (Brunner et al., 2010). In addition, we expect that the signal to detect target P300 ERPs for the participant with ALS is weaker than that for healthy controls, but it should still be significant for classification. Finally, we expect that it may take longer for senior participants than for young participants to reach the peak of target P300 ERP responses (Polich et al., 1985).

The paper is organized as follows: Section 2 presents the model for the probability distribution of EEG signals under the P300 ERP-BCI design along with the prior specifications. Section 3 develops the method for posterior inference. Sections 4 and 5 present the analyses of the multi-channel EEG data from real BCI users and simulations, respectively. Section 6 concludes the paper with a brief discussion.

2. Bayesian Modeling of EEG-BCI Data

2.1. Notation and Problem Setup

We begin with the notation. Denote by $ℝ$ the real line. For any interval $A \subset ℝ$, let $I_{A} (t) = 1$ if $t \in A$ and 0 otherwise. Denote by $N (μ, Σ)$ a normal distribution with mean $μ$ and variance (covariance) $Σ$. Denote by $\mathrm{GP} (μ, κ)$ the GP with mean function $μ$ and covariance kernel $κ$. All the time variables in this manuscript are multiples of a pre-specified unit time.

Our model focuses on the multi-channel EEG data for one participant. Suppose a total of $L$ target characters are typed for BCI calibration in the training data. For each character $l$ ($l = 1, \dots, L$), the BCI generates $I$ sequences of $J$ ($J = 12$) stimuli, consisting of six row stimuli, denoted as $1, \dots, 6$, and six column stimuli, denoted as $7, \dots, 12$, on the 6 × 6 keyboard in a random order (Figure 2a). Let $i$ ($i = 1, \dots, I$) index the sequence. For the $i$th sequence of the $l$th target character, let $W_{l, i} = {(W_{l, i, 1}, \dots, W_{l, i, 12})}^{⊤}$ represent the starting time points of the $J$ stimuli (stimulus-occurring indicators), which take values from permutations of $\{1, \dots, 12\}$. For example, $W_{l, i} = {(\underbrace{8, \dots, 3, 2, \dots, 11}_{J\text{-dimensional}})}^{⊤}$ indicates that the first row, …, the last row, the first column, …, and the last column appear in the 8th stimulus, …, 3rd stimulus, 2nd stimulus, …, and 11th stimulus, respectively. Let $Y_{l} = {(Y_{l, 1}, \dots, Y_{l, 12})}^{⊤}$ represent the stimulus-type indicators, where $Y_{l, j} \in \{0, 1\}$ with the constraint $\sum_{j = 1}^{6} Y_{l, j} = \sum_{j = 7}^{12} Y_{l, j} = 1$. The event $Y_{l, j} = 1$ indicates that the $l$th target letter is located in the $j$th row stimulus for $j = 1, \dots, 6$ and in the $(j - 6)$th column stimulus for $j = 7, \dots, 12$. Thus, each possible value of $Y_{l}$ uniquely determines one target character on the 6 × 6 keyboard. For example, $Y_{l} = {(\underbrace{0, 0, 0, 1, 0, 0}_{\text{row}}, \underbrace{0, 1, 0, 0, 0, 0}_{\text{column}})}^{⊤}$ indicates that the target letter is “T”, located at the fourth row and the second column. We drop the sequence index $i$ for $Y_{l}$ because the stimulus-type indicators are always the same given the same character $l$. For all the sequences, the time domain of the EEG signals is registered to $[0, T]$. Finally, suppose we consider $E$ channels of EEG signals and let $e$ ($e = 1, \dots, E$) index the channel. We denote by $X_{l, i, e} (t)$ the observed EEG signal intensity of the $i$th sequence and $l$th target character from channel $e$ at time $t \in [0, T]$.
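The two indicator vectors can be made concrete with a minimal sketch. The function names below are hypothetical and introduced only for illustration; the only facts taken from the text are the 12-stimulus layout (rows first, then columns) and the position of “T” at row 4, column 2.

```python
import numpy as np

def stimulus_type_indicator(row, col):
    """Build Y_l: entries 1-6 flag the target row, entries 7-12 the target column.

    `row` and `col` are 1-based positions of the target character on the 6 x 6 grid.
    """
    y = np.zeros(12, dtype=int)
    y[row - 1] = 1        # row stimuli occupy positions 1..6
    y[6 + col - 1] = 1    # column stimuli occupy positions 7..12
    return y

def random_stimulus_order(rng):
    """Build W_{l,i}: entry j is the presentation slot (1..12) of stimulus j."""
    return rng.permutation(12) + 1

rng = np.random.default_rng(0)
y_T = stimulus_type_indicator(row=4, col=2)   # the letter "T" in the text's example
w = random_stimulus_order(rng)                # one random sequence order
```

By construction, `y_T` has exactly one row entry and one column entry equal to 1, and `w` is a permutation of 1 to 12.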

Fig. 2 — (a). A figure showing a 6 × 6 grid screen of the ERP-BCI speller system, where only one row or one column was being flashed grey for each stimulus. (b). A figure from *Wikimedia Commons* (URL) by Brylie Christopher Oxley / CC0, 2017, demonstrating a 64-channel EEG locations using the International 10–20 standard developed by Jasper. Channels marked with red were used in our ERP-BCI design. (c). An illustration of the data generative mechanism of a single-channel EEG sequence under the ERP-BCI design. Red, blue, green, and yellow blocks represented target responses, non-target responses, background noise irrelevant to stimuli, and observed signals (X_t). W, Y were stimulus-occurring indicators and stimulus-type indicators. We assumed each stimulus-related potential could be characterized by β₁ or β₀ with a long and fixed response window; the observed signal was generated when we aligned different signal components and summed up at each time point. For example, given the target character was “T”, the fourth stimulus was the target one. The graph in the bottom right of the figure illustrates the empirical ERP estimates from channel Cz based on a real participant, where target and non-target ERP estimates were averaged over 570 and 2850 EEG signal segments, respectively. A significant magnitude difference between target and non-target ERPs was observed around 300 ms post-stimulus.

2.2. A Bayesian Generative Model

Suppose we are interested in making inferences on the P300 ERP in a window of length $T_{z}$ right after the onset of the stimulus. We refer to $T_{z}$ as the response window length and assume $T_{z}$ is a multiple of $d$ for simplicity, where $d$ is the stimulus-to-stimulus interval. The total length of time $T$ per sequence is then defined as $T = T_{z} + (J - 1) d$. We consider the observed EEG signals $X_{l, i, e} (t)$ as a mixture of the $J$ stimulus-induced potentials given the stimulus-type indicators $Y_{l}$ and the stimulus-occurring times $W_{l, i}$ as follows: for any $t \in [0, T]$,

\begin{array}{l} X_{l, i, e} (t) = M_{l, i, e} (t) + ϵ_{l, i, e} (t), τ_{l, i, j} = t - (W_{l, i, j} - 1) d, \\ M_{l, i, e} (t) = \sum_{j = 1}^{J} [β_{1, e} (τ_{l, i, j}) Y_{l, j} + β_{0, e} (τ_{l, i, j}) (1 - Y_{l, j})] I_{[0, T_{z}]} (τ_{l, i, j}), \end{array}

(1)

where $M_{l, i, e} (t)$ is the expected EEG signal at time $t$ from channel $e$ induced by the $J$ stimuli that occur at different time points. The two unknown functions $β_{1, e} (τ)$ and $β_{0, e} (τ)$ ($τ \in [0, T_{z}]$) represent the average brain activity responses to the target and the non-target stimulus, respectively. To simplify the problem, we assume that the shape and magnitude of the ERP functions depend only on the stimulus-type indicators, regardless of the stimulus location or the stimulus order. The random noise $ϵ_{l, i, e} (t)$ characterizes the intrinsic brain activity of channel $e$ that is unrelated to the stimulus responses. Assuming that $ϵ_{l, i, e} (t)$ is spatially correlated across channels and temporally dependent, we consider the following additive model:

\begin{array}{l} ϵ_{l, i, e} (t) = ζ_{l, i, e} + ε_{l, i} (t), \\ ζ_{l, i} = {(ζ_{l, i, 1}, \dots, ζ_{l, i, E})}^{⊤} \sim N (0, C_{s}), \\ ε_{l, i} (t) = ρ_{t, 0} + \sum_{m = 1}^{q} ρ_{t, m} ε_{l, i} (t - m d) + ε_{l, i, 0} (t), \quad ε_{l, i, 0} (t) \sim N (0, σ_{x}^{2}), \end{array}

where $ζ_{l, i, e}$ is the channel-specific random effect and $ζ_{l, i, 1}, \dots, ζ_{l, i, E}$ jointly follow a multivariate normal distribution with mean zero and covariance matrix $C_{s}$. The temporal random effect $ε_{l, i} (t)$ is assumed to follow an autoregressive model of order $q$ with noise variance $σ_{x}^{2}$. For a given channel $e$ and a letter $l$, Figure 2c illustrates the proposed Bayesian generative model for half the length of a sequence. Among the six consecutive stimuli, the target stimulus is the 4th one.
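The mean structure in equation (1) amounts to an overlap-add of shifted response curves. The following minimal sketch illustrates it on a discrete time grid, under two assumptions that are not part of the original text: the response window is represented by `T_z` samples (τ = 0, …, T_z − 1), and the curves `beta1` and `beta0` are arbitrary illustrative shapes rather than estimated ERPs. The function name `expected_signal` is hypothetical.

```python
import numpy as np

def expected_signal(beta1, beta0, w, y, d):
    """Discrete-time version of M_{l,i,e}(t) in equation (1) for one channel.

    beta1, beta0 : response curves to target / non-target stimuli (length T_z)
    w            : W_{l,i}, 1-based presentation slot of each of the J stimuli
    y            : Y_l, 0/1 stimulus-type indicators
    d            : stimulus-to-stimulus interval in samples
    """
    t_z = len(beta1)
    total = t_z + (len(w) - 1) * d          # T = T_z + (J - 1) d
    m = np.zeros(total)
    for j in range(len(w)):
        onset = (w[j] - 1) * d              # tau = t - (W_{l,i,j} - 1) d starts at 0 here
        m[onset:onset + t_z] += beta1 if y[j] == 1 else beta0
    return m

# Illustrative shapes only: a target bump and a flat non-target response.
beta1 = np.sin(np.linspace(0.0, np.pi, 25))
beta0 = np.zeros(25)
y = np.array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0])   # target "T": row 4, column 2
w = np.arange(1, 13)                                  # stimuli shown in index order
m = expected_signal(beta1, beta0, w, y, d=5)
```

With `T_z = 25`, `d = 5`, and `J = 12`, the output has length 80, and the two target responses (onsets at samples 15 and 35) overlap between samples 35 and 39, which is the overlap the model is designed to resolve.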

2.3. The Split-and-Merge GP

To identify the time window that contains major differences in brain activity responses between target and non-target stimuli, we develop a new GP-based prior for the joint distribution of $β_{0, e} (τ)$ and $β_{1, e} (τ)$, for $τ \in [0, T_{z}]$, named the split-and-merge GP (SMGP). For $k = 0, 1$, we assume that $\{β_{k, 1} (τ), \dots, β_{k, E} (τ)\}$ are independent and marginally follow the same prior distribution specified by the SMGP. For simplicity, we drop the channel-specific subscript $e$ and specify the SMGP as follows:

β_{k} (τ) = α_{k} (τ) ζ (τ) + α_{0} (τ) \{1 - ζ (τ)\},

(2)

where $α_{k} (τ) \sim \mathrm{GP} (0, κ_{α})$ and $ζ (τ) \in [0, 1]$. Note that $β_{0} (τ) = α_{0} (τ)$ and $β_{1} (τ)$ is the weighted average of $α_{1} (τ)$ and $α_{0} (τ)$ with weight $ζ (τ)$, so that $β_{1} (τ) - β_{0} (τ) = ζ (τ) \{α_{1} (τ) - α_{0} (τ)\}$. When $ζ (τ) = 0$, $β_{0} (τ) = β_{1} (τ)$ with probability one, i.e. the two processes are merged; when $ζ (τ) > 0$, $β_{0} (τ) \neq β_{1} (τ)$ with probability one, i.e. the two processes are split. Thus, we refer to $ζ (τ)$ as the split-and-merge indicator process. Let $W_{s} = \{τ : ζ (τ) > ζ_{0}\}$ and $W_{m} = \{τ : ζ (τ) \leq ζ_{0}\}$ represent the split time window and the merge time window, respectively, where $ζ_{0}$ is a hyper-parameter. For efficient posterior inference on $W_{s}$ and $W_{m}$, we define the truncated GP (TGP) analogously to the ordinary GP as follows. A time-continuous stochastic process $\{ζ (τ), τ \in \mathcal{T}\}$ is a truncated GP if and only if for every finite set of indices $τ_{1}, \dots, τ_{p}$ in the index set $\mathcal{T}$, $(ζ_{τ_{1}}, \dots, ζ_{τ_{p}})$ follows a multivariate truncated Gaussian distribution, where the truncated domain has a block rectangular shape. In this case, we assign to $ζ (τ)$ a TGP prior with mean 0.5 and covariance kernel $κ_{ζ}$ truncated on $[0, 1]$, i.e. $ζ (τ) \sim \mathrm{TGP}_{[0, 1]} (0.5, κ_{ζ})$.
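The split-and-merge construction in equation (2) can be sketched numerically as follows. This is an illustration only: `alpha1`, `alpha0`, and `zeta` are fixed, hand-picked curves rather than draws from the GP and TGP priors, and the function name `split_and_merge` is hypothetical.

```python
import numpy as np

def split_and_merge(alpha1, alpha0, zeta, zeta0=0.5):
    """Combine alpha_1, alpha_0 and the indicator process zeta as in equation (2).

    Returns beta_1, beta_0 and a boolean mask of the split window W_s = {tau: zeta > zeta0}.
    """
    alpha1 = np.asarray(alpha1, dtype=float)
    alpha0 = np.asarray(alpha0, dtype=float)
    zeta = np.asarray(zeta, dtype=float)
    beta0 = alpha0.copy()                               # beta_0 = alpha_0
    beta1 = alpha1 * zeta + alpha0 * (1.0 - zeta)       # weighted average by zeta
    split = zeta > zeta0                                # split window indicator
    return beta1, beta0, split

tau = np.arange(25)
alpha0 = 0.2 * np.sin(tau / 4.0)
alpha1 = alpha0 + np.exp(-0.5 * ((tau - 10) / 2.0) ** 2)   # extra bump for targets
zeta = np.zeros(25)
zeta[8:14] = 0.9                                            # split only on samples 8..13
beta1, beta0, split = split_and_merge(alpha1, alpha0, zeta, zeta0=0.5)
```

Outside the six split samples, `beta1` coincides with `beta0` exactly, which is the merge behavior described above.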

3. Posterior Inference

3.1. Model Representation and Prior Specification

Let $\mathrm{MN} (M, U, V)$ denote a matrix normal distribution with location matrix $M$ and two scale matrices $U$ and $V$ (Dawid, 1981). We rewrite equation (1) in matrix normal form such that

X_{l, i} \sim \mathrm{MN} (M_{l, i}, C_{t}, C_{s}),

(3)

where $X_{l, i} = {(X_{l, i, e})}_{e = 1}^{E}$ and $M_{l, i} = {(M_{l, i, e})}_{e = 1}^{E}$ are the matrices of observed EEG signals and of predicted EEG signals obtained by convolution for the $i$th sequence of the $l$th target character, respectively. $C_{s}$ and $C_{t}$ are the spatial and temporal covariance matrices, respectively, which jointly characterize the random error $ϵ_{l, i} = {(ϵ_{l, i, e})}_{e = 1}^{E}$. Equation (3) can be expressed as

\mathrm{vec} (X_{l, i}) \sim N (\mathrm{vec} (M_{l, i}), C_{s} \otimes C_{t}),

(4)

where ⊗ is the Kronecker product and vec(·) is the vectorization operator that converts the matrix to the column vector. The log-likelihood of the matrix normal model is

\sum_{l, i} \left\{ - \frac{T}{2} \log \det (C_{s}) - \frac{E}{2} \log \det (C_{t}) - \frac{1}{2} \mathrm{tr} [C_{s}^{- 1} {(X_{l, i} - M_{l, i})}^{⊤} C_{t}^{- 1} (X_{l, i} - M_{l, i})] \right\} .

(5)
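The equivalence between the matrix normal form (3), its vectorized form (4), and the log-likelihood (5) can be checked numerically. The sketch below evaluates the log-density of a single $T \times E$ matrix both ways; it includes the additive constant $-\frac{TE}{2} \log (2π)$, which equation (5) omits. The covariance matrices and dimensions are small illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def matrix_normal_logpdf(x, m, c_t, c_s):
    """Log-density of X ~ MN(M, C_t, C_s) with X of shape (T, E)."""
    n_t, n_e = x.shape
    r = x - m
    _, logdet_t = np.linalg.slogdet(c_t)
    _, logdet_s = np.linalg.slogdet(c_s)
    # tr[C_s^{-1} R^T C_t^{-1} R], computed with linear solves instead of inverses
    quad = np.trace(np.linalg.solve(c_s, r.T) @ np.linalg.solve(c_t, r))
    return -0.5 * (n_t * n_e * np.log(2 * np.pi)
                   + n_t * logdet_s + n_e * logdet_t + quad)

def vec_normal_logpdf(x, m, c_t, c_s):
    """Log-density of vec(X) ~ N(vec(M), C_s kron C_t), with column-stacking vec."""
    r = (x - m).reshape(-1, order="F")
    cov = np.kron(c_s, c_t)
    _, logdet = np.linalg.slogdet(cov)
    quad = r @ np.linalg.solve(cov, r)
    return -0.5 * (r.size * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(0)
n_t, n_e = 6, 3
idx = np.arange(n_t)
c_t = 0.5 ** np.abs(idx[:, None] - idx[None, :])      # illustrative temporal correlation
c_s = 0.7 * np.eye(n_e) + 0.3 * np.ones((n_e, n_e))   # compound symmetry, rho = 0.3
x = rng.normal(size=(n_t, n_e))
m = np.zeros((n_t, n_e))
lp_matrix = matrix_normal_logpdf(x, m, c_t, c_s)
lp_vec = vec_normal_logpdf(x, m, c_t, c_s)
```

The two values agree up to floating-point error, while the matrix form avoids building the $TE \times TE$ Kronecker product.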

Therefore, we rewrite the mean structure of M_l,i with convolution as follows:

\begin{array}{l} \mathrm{vec} (X_{l, i}) \sim N (\mathrm{Diag} (G_{l, i}) \mathrm{vec} (β), C_{s} \otimes C_{t}), \quad i = 1, \dots, I, \; l = 1, \dots, L, \\ β = {(β_{e})}_{e = 1}^{E}, \quad β_{e} = {(β_{1, e}^{⊤}, β_{0, e}^{⊤})}^{⊤} = S (ζ_{e}) α_{e} = A (α_{e}) ζ_{e}, \\ α = {(α_{e})}_{e = 1}^{E}, \quad α_{e} = {(α_{1, e}^{⊤}, α_{0, e}^{⊤})}^{⊤}, \end{array}

(6)

where $β_{1, e}$ and $β_{0, e}$ are the channel-specific response functions to target and non-target stimuli after applying the SMGP prior, and $α_{1, e}$ and $α_{0, e}$ are the channel-specific response functions to target and non-target stimuli before selection. The latter follow $\mathrm{GP} (0, κ_{α})$ with the scale parameters $σ_{0, 1, e}^{2}$ and $σ_{0, 0, e}^{2}$. We use the γ-exponential function in equation (7) to specify the covariance kernel.

k (x_{i}, x_{j}) = σ_{0}^{2} \exp \left\{ - {\left( \frac{{‖ x_{i} - x_{j} ‖}_{2}^{2}}{s_{0}} \right)}^{γ_{0}} \right\},

(7)

where $0 \leq γ_{0} < 2$ and $s_{0} > 0$. In practice, we treat them as hyper-parameters and select the optimal pair by the Bayes factor (Kass and Raftery, 1995). $ζ_{e}$ follows the truncated normal distribution $\mathrm{TN}_{D} (μ, Σ)$ with prior mean 0.5 and prior covariance matrix $Σ_{ζ}$ on the truncated domain $D = {[0, 1]}^{T_{z}}$. We use the method of Li and Ghosh (2015) for efficient sampling. $S$ and $A$ are linear transformations that map $α_{e}$ and $ζ_{e}$ to $β_{e}$. $G_{l, i}$ is the linear transformation that maps $β_{e}$ to the predicted EEG signals via convolution. We decompose $C_{s}$ as $σ_{x}^{2} {\tilde{C}}_{s}$, where $σ_{x}^{2}$ follows the inverse gamma distribution $Γ^{- 1} (a_{s}, b_{s})$ with shape parameter $a_{s}$ and rate parameter $b_{s}$, and ${\tilde{C}}_{s}$ is a positive definite matrix characterized by the distance measure among selected channels. To simplify, we assume that all pairs of selected channels share the same distance, so that ${\tilde{C}}_{s}$ has a compound symmetry structure governed by the scalar parameter $ρ_{s}$. We use an adaptive rejection sampling method (Gilks and Wild, 1992) to sample $ρ_{s}$, whose prior is the uniform distribution $U (0, 1)$. For $C_{t} (ρ_{t})$, we assume that $ρ_{t}$ follows a discrete uniform distribution $U_{d} (V_{ρ_{t}})$, where $ρ_{t}$ is a 2-dimensional vector taking values in a discrete set $V_{ρ_{t}}$ for which the correlation matrix is invertible, i.e., ${‖ ρ_{t} ‖}_{1} < 1$. Finally, the prior specification is as follows:

\begin{array}{l} α_{1, e} \sim \mathrm{GP} (0, σ_{1, e}^{2} κ_{α}), \quad α_{0, e} \sim \mathrm{GP} (0, σ_{0, e}^{2} κ_{α}), \quad ζ_{e} \sim \mathrm{TN}_{[0, 1]} (0.5, Σ_{ζ}), \\ σ_{x}^{2} \sim Γ^{- 1} (a_{s}, b_{s}), \quad ρ_{s} \sim U (0, 1), \quad ρ_{t} \sim U_{d} (V_{ρ_{t}}) . \end{array}

(8)
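Two of the covariance building blocks above are simple enough to write out directly: the γ-exponential kernel of equation (7) and the compound symmetry structure assumed for ${\tilde{C}}_{s}$. The sketch below is illustrative; the time grid `tau` (25 points rescaled to the unit interval) is an assumption, since the text does not state the scale on which the kernel inputs are measured, and the function names are hypothetical.

```python
import numpy as np

def gamma_exponential_kernel(x, sigma2=1.0, s0=0.5, gamma0=1.8):
    """Kernel matrix from equation (7) for one-dimensional inputs x."""
    x = np.asarray(x, dtype=float)
    sq_dist = (x[:, None] - x[None, :]) ** 2          # ||x_i - x_j||_2^2
    return sigma2 * np.exp(-(sq_dist / s0) ** gamma0)

def compound_symmetry(n, rho):
    """n x n correlation matrix with ones on the diagonal and rho elsewhere."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

tau = np.linspace(0.0, 1.0, 25)                       # assumed time scale
k_alpha = gamma_exponential_kernel(tau, sigma2=1.0, s0=0.5, gamma0=1.8)
c_tilde = compound_symmetry(5, rho=0.4)               # e.g. five selected channels
```

For `rho` in (0, 1) the compound symmetry matrix has eigenvalues `1 - rho` and `1 + (n - 1) * rho`, so it is positive definite, consistent with the $U (0, 1)$ prior on $ρ_{s}$.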

3.2. Markov Chain Monte Carlo

We perform the standard Markov chain Monte Carlo (MCMC) method to sample parameters from their posterior conditional distribution given the training set. We adopt the Gibbs sampler to simulate the posterior distribution of α, ζ, $σ_{x}^{2}$ , ρ_s, and ρ_t. Since ζ takes continuous values between 0 and 1, we average the posterior samples of β₁,β₀ whenever ζ samples are smaller than the threshold ζ₀ for the explicit split-and-merge effect, where ζ₀ is a hyper-parameter, and it takes discrete values in {0.1,0.2,…,0.8,0.9} and the optimal one is selected by the Bayes factor. For the convergence check, we run multiple chains with different seed values, and evaluate the conditional log-likelihood and Gelman-Rubin statistic of each parameter (Gelman and Rubin, 1992). Details of the Gibbs sampling scheme can be found in the Supplementary Material.
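For reference, the Gelman-Rubin statistic used in the convergence check can be computed from the retained draws of a scalar parameter as sketched below. This is the basic (non-split) form of the statistic; the chains here are synthetic draws used for illustration, not output of the Gibbs sampler, and the function name is hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    w = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * w + b / n            # pooled variance estimate
    return np.sqrt(var_hat / w)

rng = np.random.default_rng(1)
mixed = rng.normal(size=(3, 1000))               # three chains, same target
stuck = mixed + np.array([[0.0], [2.0], [4.0]])  # chains centered at different values
r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Values close to 1 indicate that the chains agree; chains centered at different values yield a statistic well above the commonly used threshold of 1.1.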

3.3. Posterior Predictive Probability for Character Classification

Under the RCP design, the selection of the target character requires the selection of the target row among six candidate rows and the target column among six candidate columns. Let W*,Y*, and X* be I* sequences of stimulus-occurring indicators, stimulus-type indicators, and I* sequences of matrix-wise EEG signals from new observations given the same target character ω, respectively. Let Θ be the parameter set defined in equation (1). Let y^ω ∈{0,1},r^ω,c^ω be the stimulus-type indicator, row index, and column index associated with the target character ω, respectively. The probability of ω as the target character is

\begin{array}{l} \Pr (Y^{*} = y^{ω} ∣ X^{*}, W^{*}, X, W, Y) = \int \Pr (Y^{*} = y^{ω} ∣ Θ, X^{*}, W^{*}) π (Θ ∣ X, W, Y) d Θ \\ \quad = \int \Pr (Y^{*} = y^{ω}, y_{r^{ω}}^{ω} = y_{c^{ω}}^{ω} = 1, y_{j}^{ω} = 0, j \notin \{r^{ω}, c^{ω}\} ∣ Θ; X^{*}, W^{*}) π (Θ ∣ X, W, Y) d Θ, \end{array}

where $\Pr (Y^{*} = y^{ω}, y_{r^{ω}}^{ω} = y_{c^{ω}}^{ω} = 1, y_{j}^{ω} = 0, j \notin \{r^{ω}, c^{ω}\} ∣ Θ; X^{*}, W^{*})$ is proportional to $\Pr (Y^{*} = y^{ω}) \prod_{i = 1}^{I^{*}} π (X_{i}^{*} ∣ Θ; y_{r^{ω}}^{ω} = y_{c^{ω}}^{ω} = 1, y_{j}^{ω} = 0, j \notin \{r^{ω}, c^{ω}\}, W_{i}^{*})$.

Here, Pr(Y* = y^ω) = 1/36 is the predictive prior on each candidate character if we do not have prior knowledge about the inferred target character. In practice, when we need multiple sequences to select the target character, we compute the cumulative character-based posterior conditional probability vector by multiplying sequence-specific posterior conditional likelihood estimates together.
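The character-level computation can be sketched in a simplified, plug-in form. The code below is not the full posterior predictive integral: it fixes the parameters at a single value instead of averaging over posterior draws, uses a single channel, and assumes independent Gaussian noise with known variance instead of the spatial-temporal covariance. All curves and data are synthetic, and the function names are hypothetical.

```python
import numpy as np

def expected_signal(beta1, beta0, w, y, d):
    """Overlap-add mean of one sequence (discrete-time version of equation (1))."""
    t_z = len(beta1)
    m = np.zeros(t_z + (len(w) - 1) * d)
    for j in range(len(w)):
        onset = (w[j] - 1) * d
        m[onset:onset + t_z] += beta1 if y[j] == 1 else beta0
    return m

def character_probabilities(x_seqs, w_seqs, beta1, beta0, d, sigma2):
    """Cumulative probabilities over the 36 candidate (row, column) pairs."""
    log_post = np.full((6, 6), -np.log(36.0))            # uniform prior 1/36
    for r in range(6):
        for c in range(6):
            y = np.zeros(12, dtype=int)
            y[r], y[6 + c] = 1, 1                        # candidate character
            for x, w in zip(x_seqs, w_seqs):             # multiply over sequences
                resid = x - expected_signal(beta1, beta0, w, y, d)
                log_post[r, c] -= 0.5 * np.sum(resid ** 2) / sigma2
    log_post -= log_post.max()                           # stabilize before exponentiating
    prob = np.exp(log_post)
    return prob / prob.sum()

rng = np.random.default_rng(0)
tau = np.linspace(0.0, np.pi, 25)
beta1, beta0 = 2.0 * np.sin(tau), 0.3 * np.sin(tau)      # illustrative curves
y_true = np.zeros(12, dtype=int)
y_true[3], y_true[6 + 1] = 1, 1                          # row 4, column 2
w_seqs = np.array([rng.permutation(12) + 1 for _ in range(7)])
x_seqs = np.array([expected_signal(beta1, beta0, w, y_true, 5) + rng.normal(size=80)
                   for w in w_seqs])
prob = character_probabilities(x_seqs, w_seqs, beta1, beta0, d=5, sigma2=1.0)
best_row, best_col = (int(v) for v in np.unravel_index(prob.argmax(), prob.shape))
```

Working on the log scale and summing over sequences corresponds to multiplying the sequence-specific likelihoods, as described above; in the full method, this computation would additionally be averaged over posterior draws of Θ.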

4. Analysis of EEG-BCI Data

We performed the analysis of EEG-BCI data and report detailed results for one real BCI participant, referred to as Participant A. Since the primary goal of our analysis was to identify the spatial-temporal pattern of P300 ERP response signals, participants with clear signal patterns (larger signal-to-noise ratios) were preferred. We selected ten participants from the total population under the RCP design such that the number of sequences needed to achieve 100% accuracy on the training data with the logistic model was smaller than five. The steps of the real-data analysis were as follows: First, we fitted the model to all 16 channels using a spatial correlation with the compound symmetry structure and identified the spatial-temporally activated locations. Next, we performed the channel selection based on our method and fitted the model to the data for the selected channels using the same spatial dependency assumption. Then, we fitted six existing ML methods to the dataset and compared the prediction accuracy of our method to that of the other ML methods to evaluate the goodness of model fit. Finally, we provided the cross-participant, sensitivity, and reproducibility analyses.

4.1. Dataset and Pre-processing

For the training session, each participant was asked to wear an EEG cap with 16 channels corresponding to different regions on the brain surface and sit approximately 0.8 m from a 17-inch monitor with the BCI display. Figure 2b shows the spatial distribution of channels. Channels marked with red were used for recording and analysis purposes. The abbreviated names were F3, Fz, F4, T7, C3, Cz, C4, T8, CP3, CP4, P3, Pz, P4, PO7, PO8, and Oz (Thompson et al., 2014). For the calibration dataset, each participant copied a 19-character phrase “THE_QUICK_BROWN_FOX” including three spaces. The stimulus presentation and recording were controlled using the BCI2000 software platform (Schalk et al., 2004). An event was defined as a row stimulus or column stimulus, which highlighted for 31.25 ms and paused for 125 ms afterwards, and the total of 156.25 ms was referred to as the stimulus-to-stimulus interval d. We defined the 12 stimuli flashing all rows and columns as a sequence and defined multiple sequences as a super-sequence. In our P300 ERP-BCI design, a super-sequence corresponded to the EEG signals associated with the given target character. During the training session, each super-sequence included 15 sequences, and a total of 19 super-sequences were collected. Extra time was recorded after the last stimulus in the super-sequence. The length of each super-sequence was about 29,000 ms with the sampling rate of 256 Hz.

The data pre-processing steps are summarized as follows: First, we applied a notch filter at 60 Hz to remove the power line noise and a band-pass filter between 0.5 Hz and 6 Hz to all 16 channels, and then down-sampled the raw signals with a decimation factor of eight. Second, we truncated each character-specific super-sequence into 15 sequence segments, where each sequence segment contained 12 consecutive stimuli and the subsequent 20 time points, so as to record the entire ERP response to the last stimulus within the single sequence. Each sequence segment spanned 2,500 ms, i.e., 80 sampling points.
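The timing quantities above are mutually consistent, as the following arithmetic sketch shows. The response window length of 25 samples is inferred here from the segment layout (the last stimulus slot plus the 20 trailing points) rather than quoted from the pre-processing description.

```python
fs_raw = 256                       # original sampling rate (Hz)
decimation = 8                     # decimation factor
fs = fs_raw / decimation           # 32 Hz after down-sampling
sample_ms = 1000 / fs              # 31.25 ms per retained sample

d_ms = 31.25 + 125                 # highlight + pause = stimulus-to-stimulus interval
d = round(d_ms / sample_ms)        # 5 samples between stimulus onsets

n_stimuli = 12                     # J
tail = 20                          # extra time points after the 12th stimulus slot
segment = n_stimuli * d + tail     # 80 sampling points per sequence segment
segment_ms = segment * sample_ms   # 2,500 ms

t_z = d + tail                     # inferred response window: 25 samples
total = t_z + (n_stimuli - 1) * d  # T = T_z + (J - 1) d
```

The last line reproduces the segment length of 80 points from the model's definition of $T$.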

4.2. Model Settings

To evaluate the model performance, we chose the odd sequences in the calibration dataset as the training set and used the even sequences as the testing set. This splitting scheme reduced the overlap between adjacent sequences and attenuated the effect of any shift in attention compared to a random training-testing-split scheme. Since it took time for participants to become familiar with the study design or to identify the target characters, we excluded the first sequence of each super-sequence from the training set. Therefore, for the SMGP method, the training set and the testing set each contained 133 (7 sequences for each of 19 characters) 80-dimensional sequence segments for each channel. We used the cumulative character-level accuracy at seven sequences for prediction evaluation.

For the other existing ML methods, we applied the same band-pass filter, down-sampling procedure, and splitting scheme, and truncated each original character-specific super-sequence into 180 stimulus signal segments, where each stimulus signal segment started from the onset of a single stimulus and lasted for 780 ms, i.e. 25 sampling points. Therefore, the training set and the testing set each contained 1,596 (19 characters, each with 7 sequences of 12 stimuli) 400-dimensional concatenated truncated signal segments for all 16 channels.
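The sample sizes quoted in this subsection follow from the splitting scheme, as the short bookkeeping sketch below shows. Variable names are illustrative only.

```python
n_chars, n_seq, n_stim, n_channels = 19, 15, 12, 16
window = 25                                            # sampling points per stimulus segment

sequences = list(range(1, n_seq + 1))
train_seq = [s for s in sequences if s % 2 == 1 and s != 1]   # odd, first sequence dropped
test_seq = [s for s in sequences if s % 2 == 0]               # even sequences

smgp_train = n_chars * len(train_seq)                  # sequence segments per channel
smgp_test = n_chars * len(test_seq)
ml_train = n_chars * len(train_seq) * n_stim           # stimulus segments
ml_features = window * n_channels                      # concatenated feature length
segments_per_char = n_seq * n_stim                     # segments per super-sequence
```

This gives 7 training and 7 testing sequences per character, 133 sequence segments for the SMGP method, and 1,596 stimulus segments of dimension 400 for the other ML methods.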

For the SMGP method, $κ_{α}$ was generated from a γ-exponential kernel with hyper-parameters $s_{0} = 0.5$, $γ_{0} = 1.8$, $σ_{0, 1}^{2} = 1$, and $σ_{0, 0}^{2} = 1$. For feature selection and prediction of the swLDA method, the inclusion and exclusion probabilities were 0.1 and 0.15, and at most 30% of the feature vector was selected. We ran the MCMC algorithm for 2,000 iterations, with the first 1,000 discarded as burn-in, for three chains with different seed values. We concluded that the algorithm converged, as the Gelman-Rubin statistics for the parameters of interest were all smaller than 1.1.

To rank the importance of the channels, we propose the following statistic based on the SMGP model fit to the multi-channel EEG data:

$$R_{e}^{2} = \frac{\mathrm{Var}\{E(X_{e}(t) \mid M_{e}(t))\}}{\mathrm{Var}\{X_{e}(t)\}}, \tag{9}$$

where the numerator measures the variability across sequences of the convolution component in equation (1), and the denominator measures the variability across sequences of the observed signals. Under our model assumption, $R_{e}^{2}$ took values between 0 and 1. To examine the proposed information criterion, we performed sub-channel analyses using the top two through the top five ranked channels. For each combination of channels, we refitted the model and reported the prediction accuracy.
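
A toy calculation may help fix ideas about how such a variance ratio ranks channels. The sketch below is not the authors' estimator: it replaces the posterior quantities with a known signal plus independent Gaussian noise, takes variances over all simulated values rather than across sequences, and uses hypothetical channel names and amplitudes.

```python
import numpy as np


def channel_r2(fitted, observed):
    """Ratio of the variance of the fitted signal to that of the observed data."""
    return np.var(fitted) / np.var(observed)


rng = np.random.default_rng(1)
n_points = 5000
amplitudes = {"ch1": 3.0, "ch2": 2.0, "ch3": 1.0}  # hypothetical channels

scores = {}
for name, amp in amplitudes.items():
    # Stand-in for the convolution component E(X_e(t) | M_e(t)).
    fitted = amp * np.sin(np.linspace(0.0, 20.0 * np.pi, n_points))
    # Observed signal = fitted component + independent noise (sd = 3).
    observed = fitted + rng.normal(scale=3.0, size=n_points)
    scores[name] = channel_r2(fitted, observed)

ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the noise is independent of the signal, the observed variance is approximately the sum of the two variances, so the ratio stays between 0 and 1 and increases with the signal-to-noise ratio.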

4.3. Single-Participant Results

We focus on the results of Participant A in this subsection.

ERP Estimates

The left panel of Figure 3 showed the mean estimated ERP functions of target and non-target stimuli and their 95% credible bands based on the 16-channel model fitting result. Channel-specific plots are arranged by their relative spatial locations. In general, we saw a clear separation of target against non-target ERP functions for all channels except channel T8. Between 400 ms and 500 ms post stimulus, the target ERP functions gradually declined to zero and collapsed with non-target ERP functions, which shows that our SMGP prior worked well in this case.

Fig. 3 — Left Panel: Channel-specific ERP function estimates of target and non-target stimuli with the 95% credible bands of Participant A. Right Panel: Channel-specific significant temporal intervals by varying thresholds of median split probabilities of Participant A. The result was produced by the 16-channel model fitting results. The varying thresholds included 0.6,0.75, and 0.9. We arranged the channel-specific plots by their spatial locations. The upper and lower rows represented the front and back of the head. A “z” (zero) referred to a channel placed on the mid-line sagittal plane of the skull. Channels with even numbers (2,4,6,8) referred to the electrode placement on the right side of the head, whereas channels with odd numbers (1,3,5,7) referred to those on the left.

Split Windows

The right panel of Figure 3 showed channel-specific significant split time windows with varying thresholds of median split probabilities of 0.6, 0.75, and 0.9. We arranged the channel-specific brain activity plots by their spatial locations. With 90% posterior probability, the split time windows appeared at 50–65 ms and 160–175 ms for channel F3, at 170–205 ms for channel PO7, at 160–170 ms for channel Oz, and at 150–190 ms for channel PO8 post-stimulus. These significant split time windows corresponded to the first negative peaks of their target ERP curve estimates. For channel Cz, the split time windows appeared at 370–430 ms post-stimulus with 75% posterior probability, which approximately corresponded to the first positive peaks of the target P300 ERP response curve estimates. For channel Pz, the split time windows appeared at 650–700 ms post-stimulus with 75% posterior probability. For channels T7, C4, and T8, which are close to the ears, moderate differences in brain activity between target and non-target stimuli were observed, but no split time window was identified with more than 60% posterior probability. A common gap of split time windows around 150 ms was observed, which corresponded to the time points where target and non-target ERP functions first crossed. For time points when target and non-target ERP functions were merged, fewer points were generally selected by the SMGP prior.
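
Reading split time windows off a vector of pointwise split probabilities amounts to finding contiguous runs above a threshold. The sketch below illustrates this under stated assumptions: the function `split_windows` is hypothetical, and the probabilities are made-up toy values rather than posterior summaries from the study.

```python
def split_windows(times_ms, split_prob, threshold):
    """Return (start_ms, end_ms) pairs for contiguous runs with prob > threshold."""
    windows = []
    start = None
    prev = None
    for t, p in zip(times_ms, split_prob):
        if p > threshold and start is None:
            start = t  # a new window opens at this time point
        elif p <= threshold and start is not None:
            windows.append((start, prev))  # window closed at the previous point
            start = None
        prev = t
    if start is not None:
        windows.append((start, prev))  # window still open at the end
    return windows


# Toy split probabilities on a 10 ms grid (illustrative values only).
times_ms = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
split_prob = [0.1, 0.7, 0.95, 0.92, 0.65, 0.2, 0.8, 0.91, 0.3, 0.1]

windows_90 = split_windows(times_ms, split_prob, 0.9)
windows_60 = split_windows(times_ms, split_prob, 0.6)
print(windows_90, windows_60)
```

Raising the threshold can only shrink or remove windows, which is why the 0.9 windows in Figure 3 are nested inside the 0.6 windows.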

Interpretation

Two common patterns were observed among the results of the ERP estimates. First, the target ERPs of the frontal and central channels (channel names starting with “F” and “C”) shared the negative drop around 100 ms and reached their first peak with the latency around 250 ms, which corresponded to the N100 and P300 pattern described by Rodden and Stemmer in 2008. Second, the target ERPs of parietal-occipital and occipital channels (channel names starting with “PO” and “O”) reached their negative peaks around 200 ms post-stimulus, and they gradually collapsed with non-target ERP functions without reaching a positive peak. Since channels PO7, PO8, and Oz represented the locations of the visual cortex, observing only the negative peaks might be indicative of the pattern of the N2 signal (Folstein and Van Petten, 2008). Several discrepancies were also observed. First, the lengths of the split time windows differed among channels. For example, the central channels and frontal channels had the split time window between the onset of the stimulus and 500 ms post-stimulus and between the onset of the stimulus and 400 ms post-stimulus, respectively. Second, the shapes of ERP functions differed among channels. For example, channels C3, CP3, and P3 had secondary peaks around 400 ms post-stimulus, while target ERP functions of other channels collapsed with the non-target ones without clear secondary peaks. Those secondary peaks might be indicative of the pattern of the P3b signals (van Dinteren et al., 2014).

Channel Ranking and Prediction

According to the 16-channel joint fitting result of the SMGP method and the proposed information criterion $R_{e}^{2}$ in equation (9), the top five selected channels for Participant A were PO7, PO8, Oz, P4, and Cz. We compared the prediction accuracy of our SMGP method to other ML methods for Participant A to evaluate the goodness of our model fit. Table 1 summarizes the cumulative testing prediction accuracy, comparing the SMGP method to other ML methods at seven sequences for the top five selected channels and all 16 channels. The SMGP method achieved 100% accuracy with channels PO8 and PO7, and maintained 100% with more channels included. It performed better than other ML methods. The SMGP method, swLDA and XGBoost performed perfectly when all channels were used.

Table 1.

Cumulative prediction accuracy of Participant A for 19 characters comparing the SMGP method with ζ₀ = 0.4 to other ML methods at seven sequences for the top five selected channels and all 16 channels.

Channels	SMGP	CNN	SVM	Logistic	RF	swLDA	XGBoost
PO8, PO7	1.00	0.89	0.95	0.95	0.95	0.95	0.95
PO8, PO7, Oz	1.00	0.89	1.00	1.00	0.95	1.00	0.95
PO8, PO7, Oz, P4	1.00	0.89	1.00	0.95	1.00	1.00	0.95
PO8, PO7, Oz, P4, Cz	1.00	0.89	1.00	0.95	1.00	1.00	0.95
All Channels	1.00	0.89	0.95	0.95	0.95	1.00	1.00

Sensitivity and Reproducibility

We performed a sensitivity analysis for the dataset of Participant A by changing the hyper-parameters of the γ-exponential kernel. We assigned 0.4, 0.5, and 0.6 to the scale parameter s₀ and 1.7, 1.8, and 1.9 to the gamma parameter γ₀. We selected channels PO7, PO8, Oz, P4, and Cz for the sensitivity analysis. Figures S3 and S4 showed the P300 ERP function estimates with 95% credible bands and channel-specific significant temporal intervals by different thresholds of median split probabilities for channels Cz and PO8 under nine variations of kernel hyper-parameters. Overall, the combination of s₀ and γ₀ had little effect on the ERP function estimates. For channel Cz, we observed the split window with the threshold of 0.90 when s₀ and γ₀ were in the middle of the hyper-parameter space. Table S4 shows the prediction accuracy with channels PO8, PO7, Oz, P4, and Cz at seven sequences under nine combinations of kernel hyper-parameters. The analysis suggested that a combination of moderate s₀ and γ₀ produced the best prediction performance for Participant A.

4.4. Cross-Participant Comparison

First, we applied our information criterion to each of the selected ten participants to identify the top 5 channels by the information criterion in equation (9), and selected the ultimate top 5 channels based on the frequency. Then, we identified spatial-temporal patterns of the neural activity based on selected ten participants. Among ten participants, we selected four typical participants to compare the neural activity patterns between participants with ALS and controls as well as between younger and older participants.
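
The frequency-based selection across participants can be sketched as a simple tally. The participant labels and per-participant channel lists below are made-up toy inputs, not the study's results, and the alphabetical tie-breaking rule is an assumption added to make the output deterministic.

```python
from collections import Counter


def most_frequent_channels(top_lists, k=5):
    """Pick the k channels that appear most often across per-participant lists."""
    counts = Counter(ch for channels in top_lists for ch in channels)
    # Sort by descending count, then by name (assumed tie-breaking rule).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [ch for ch, _ in ranked[:k]]


# Hypothetical top-5 lists for three participants (illustrative only).
toy_top5 = {
    "P1": ["PO7", "PO8", "Oz", "P4", "Cz"],
    "P2": ["PO7", "PO8", "Oz", "Pz", "P3"],
    "P3": ["PO8", "PO7", "Oz", "P4", "Fz"],
}

selected = most_frequent_channels(toy_top5.values(), k=5)
print(selected)
```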

We performed two sensitivity analyses on channel ranking with respect to bandpass filters and kernel hyper-parameters. Overall, the channel selection results were robust. For bandpass filters, we always identified channels PO7, PO8, and Oz, followed by channels P4 and Cz. For kernel hyper-parameters, we always identified channels PO7, PO8, and Oz, followed by channels P4 and P3. For common neural patterns, target ERPs of frontal and central channels shared the negative drops between 100 ms and 150 ms and reached their first positive peaks around 300 ms post-stimulus. Target ERP functions gradually declined to zero and collapsed with non-target ERP functions between 600 ms and 800 ms post-stimulus. Target ERP functions of parietal-occipital and occipital channels only reached their negative peaks between 200 ms and 250 ms post-stimulus without reaching further positive peaks.

In comparing the results of Participant E with ALS to the three healthy controls (A, B, and J), Figure 4 showed the ERP function estimates of channels Fz of the four participants. We identified a common positive peak for target ERP functions around 300 ms post stimulus although Participant E had the smallest peak magnitude of 0.6 μV compared to the remaining three above 2.0 μV. Finally, we compared the neural activity patterns of two young participants (A and B, around 25 years old) with two senior participants (E and J, around 60 years old). The split-and-merge time windows (SMTW) of frontal channels appeared significantly different between the young and senior participants. On channel Fz, target ERP functions of all participants showed another negative peak after the first major positive peak. For young participants (A and B), target ERP functions merged with non-target ERP functions after the second negative peak within the 800 ms post-stimulus window; however, for senior participants (E and J), target ERP functions were significantly below non-target ERP functions. One reason is that generally, it takes longer for senior participants to achieve the target P300 response peak (Pavarini et al., 2018). Therefore, for a senior participant, if the ERP response window is set to be longer, target ERP functions may merge with non-target ERP functions after 800 ms.

Fig. 4 — ERP function estimates of target and non-target stimuli with 95% credible bands of Participants A, B, E, and J at channel Fz. Participants A and B were young female healthy controls, while Participants E and J were elderly men, of whom only E was diagnosed with ALS.

5. Simulations

We performed several simulation studies to make statistical inferences and compare the prediction accuracy of our method to other ML methods. To make the simulated data resemble the real data, we generated the simulated data with an additive signal-and-noise structure. For the signal component, we applied the convolution rule and designed the ERP functions based on Hoffmann et al. (2008). For the noise component, we considered both Gaussian and Student-t distributions to mimic different tail behaviors with variances close to the real data. We also considered the autoregressive correlation structure to model the temporal association of the background noise. Finally, we considered a scenario where, given true stimulus-type indicators, a subset of target stimuli was randomly selected as non-target ones. This pattern mimicked a situation when participants missed target stimuli due to an attention shift in practical BCI use.
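
The convolution rule for the signal component can be sketched as a sum of stimulus-locked ERP functions that overlap in time. Everything specific below is an assumption for illustration: the function `convolve_erps`, the Gaussian-bump ERP shapes, the onset spacing of 5 time points, and the positions of the two target stimuli.

```python
import numpy as np


def convolve_erps(onsets, is_target, erp_target, erp_nontarget, n_time):
    """Sum stimulus-locked ERP functions, letting adjacent responses overlap."""
    signal = np.zeros(n_time)
    for t0, target in zip(onsets, is_target):
        erp = erp_target if target else erp_nontarget
        end = min(t0 + len(erp), n_time)
        signal[t0:end] += erp[: end - t0]
    return signal


WINDOW, N_STIMULI, STEP = 30, 12, 5  # STEP is an assumed onset spacing
t = np.arange(WINDOW)
erp_nontarget = np.exp(-0.5 * ((t - 10) / 3.0) ** 2)  # hypothetical shape
erp_target = 5.0 * erp_nontarget                      # peak ratio of 5

onsets = np.arange(N_STIMULI) * STEP
is_target = [i in (3, 8) for i in range(N_STIMULI)]   # two targets per sequence
n_time = (N_STIMULI - 1) * STEP + WINDOW

signal = convolve_erps(onsets, is_target, erp_target, erp_nontarget, n_time)
print(signal.shape)
```

Because the response window is longer than the onset spacing, most time points receive contributions from several stimuli, which is the overlap the generative model accounts for directly.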

Section 5.1 presents a multi-channel simulation study to examine channel ranking and selection by our information criterion, and to evaluate the SMTW with our inference-based criterion. Section 5.2 presents the single-channel simulation study with different mis-specification scenarios to test the robustness of our analysis.

5.1. Channel Selection and Ranking

Setup

We randomly generated stimulus-occurring indicators and stimulus-type indicators with 19 characters of interest, “THE_QUICK_BROWN_FOX,” including three spaces. To evaluate the performance and the channel ranking, we designed two groups of pre-specified mean response functions (MRFs 1 and 2). MRF 1 had different temporal separation effects, while MRF 2 had channel-specific SNR values (Figure S1). We considered a true generative scenario with two levels of noise variance, i.e., $σ_{x}^{2} \in \{20, 40\}$. We simulated the noise assuming a temporal relationship of AR(2) with the parameter ρ_t = (0.5, 0) and a spatial dependency relationship of compound symmetry structure with the parameter ρ_s = 0.5. The EEG signals were generated with a response window of length 935 ms, i.e., 30 time points. We performed 100 dataset replications for this scenario. For each dataset, we generated five sequences per character for training and testing.
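
One way to generate noise with AR(2) temporal dependence and compound symmetry spatial dependence is sketched below. This is an assumption-laden illustration rather than the authors' simulation code: the function `simulate_noise` is hypothetical, σ²ₓ is treated as the innovation variance (the paper may instead define it as the marginal variance), and a burn-in period is used to approach stationarity.

```python
import numpy as np


def simulate_noise(n_channels, n_time, sigma2, rho_t, rho_s, rng, burn_in=200):
    """Simulate (channels, time) noise: AR(2) in time, compound symmetry in space."""
    # Compound symmetry: unit diagonal, rho_s everywhere else.
    cs = np.full((n_channels, n_channels), rho_s)
    np.fill_diagonal(cs, 1.0)
    chol = np.linalg.cholesky(sigma2 * cs)

    total = n_time + burn_in
    # Spatially correlated innovations, independent across time points.
    innovations = chol @ rng.standard_normal((n_channels, total))
    noise = np.zeros((n_channels, total))
    for t in range(total):
        prev1 = noise[:, t - 1] if t >= 1 else 0.0
        prev2 = noise[:, t - 2] if t >= 2 else 0.0
        noise[:, t] = rho_t[0] * prev1 + rho_t[1] * prev2 + innovations[:, t]
    return noise[:, burn_in:]  # drop the burn-in period


rng = np.random.default_rng(2)
noise = simulate_noise(n_channels=3, n_time=30, sigma2=20.0,
                       rho_t=(0.5, 0.0), rho_s=0.5, rng=rng)
print(noise.shape)
```

With ρ_t = (0.5, 0) the second lag has no effect, so the process reduces to an AR(1) with lag-one autocorrelation 0.5.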

Model Settings and Diagnostics

All simulated datasets were fitted with equation (3). A feature vector was defined as a 3-dimensional super-sequence matrix with five replications, and the channel-specific response window was of length 935 ms, i.e., 30 time points. The covariance kernel κ_α was assumed to be a γ-exponential kernel. The length-scale, gamma, and scaling of non-target stimuli were s₀ = 0.5, γ₀ = 1.8, and $σ_{0, 0}^{2} = 0.5$, respectively. For simulation studies with MRF 1, the peak ratios of target to non-target stimuli were all 5; for simulation studies with MRF 2, the peak ratios of target to non-target stimuli were 5, 2, and 1, respectively. We ran the MCMC for 2,000 iterations with 1,000 burn-ins. The MCMC convergence was assessed by running three chains with different seeds and initial values. The Gelman-Rubin statistics for the parameters of interest were smaller than 1.1, indicating an approximate convergence for each model fit.
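
The convergence check can be illustrated with the basic form of the Gelman-Rubin potential scale reduction factor for a scalar parameter (Gelman and Rubin, 1992), omitting the degrees-of-freedom correction. The function `gelman_rubin` and the simulated chains are illustrative assumptions; the draws are random numbers, not MCMC output from the model.

```python
import numpy as np


def gelman_rubin(chains):
    """Basic potential scale reduction factor for an (m chains, n draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()   # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)        # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n
    return np.sqrt(var_hat / within)


rng = np.random.default_rng(3)
# Three chains of 1,000 post-burn-in draws, as in the MCMC setup above.
mixed = rng.normal(size=(3, 1000))                    # chains agree
stuck = mixed + np.array([[0.0], [2.0], [4.0]])       # chains disagree

print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values close to 1 indicate that the between-chain variability is negligible relative to the within-chain variability, which is the basis for the 1.1 cutoff used here.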

Results

To evaluate the SMTW, we defined two quantities, the inference-based split window ratio (ISWR) and the inference-based merge window ratio (IMWR) as follows:

$$\mathrm{ISWR}(ζ) = \frac{|\{t : \hat{ζ}(t) > ζ_{0} \text{ and } ζ(t) = 1\}|}{|\{t : ζ(t) = 1\}|}, \qquad \mathrm{IMWR}(ζ) = \frac{|\{t : \hat{ζ}(t) \leq ζ_{0} \text{ and } ζ(t) = 0\}|}{|\{t : ζ(t) = 0\}|}.$$

Since the swLDA method explicitly performed feature selection, we defined the estimation-based selection window ratio (ESWR) and the estimation-based exclusion window ratio (EEWR) as follows:

$$\mathrm{ESWR}(ζ) = \frac{|\{t : \hat{ζ}(t) = 1 \text{ and } ζ(t) = 1\}|}{|\{t : ζ(t) = 1\}|}, \qquad \mathrm{EEWR}(ζ) = \frac{|\{t : \hat{ζ}(t) = 0 \text{ and } ζ(t) = 0\}|}{|\{t : ζ(t) = 0\}|}.$$
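
The four window ratios are simple counting quantities, as the following sketch shows. The function names and the toy inputs are hypothetical; the split probabilities, selections, and true indicators are made-up values used only to exercise the definitions.

```python
import numpy as np


def iswr_imwr(split_prob, truth, threshold):
    """Inference-based split and merge window ratios for one channel."""
    split_prob = np.asarray(split_prob, dtype=float)
    truth = np.asarray(truth, dtype=int)
    selected = split_prob > threshold
    iswr = np.sum(selected & (truth == 1)) / np.sum(truth == 1)
    imwr = np.sum(~selected & (truth == 0)) / np.sum(truth == 0)
    return iswr, imwr


def eswr_eewr(selected, truth):
    """Estimation-based selection and exclusion window ratios (binary selection)."""
    selected = np.asarray(selected, dtype=int)
    truth = np.asarray(truth, dtype=int)
    eswr = np.sum((selected == 1) & (truth == 1)) / np.sum(truth == 1)
    eewr = np.sum((selected == 0) & (truth == 0)) / np.sum(truth == 0)
    return eswr, eewr


# Toy example with ten time points (illustrative values only).
truth = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
split_prob = [0.1, 0.6, 0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.55, 0.3]
binary_selection = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]

iswr, imwr = iswr_imwr(split_prob, truth, threshold=0.5)
eswr, eewr = eswr_eewr(binary_selection, truth)
print(iswr, imwr, eswr, eewr)
```

ISWR and ESWR play the role of sensitivity for detecting true split time points, while IMWR and EEWR play the role of specificity.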

Table 2 summarized the channel-specific ISWR, IMWR of the SMGP method and the ESWR, EEWR of the swLDA method, and the cumulative prediction accuracy over the number of testing sequences with $σ_{x}^{2} = 20$ comparing the SMGP method to other ML methods. The ISWR of the SMGP method was close to 100%, which indicated that our method identified relevant temporal features better than the swLDA method. Our method also had the highest and most precise prediction accuracy among all methods. Similar results were obtained when we used $σ_{x}^{2} = 40$. Plots of ERP function estimates for both $σ_{x}^{2} = 20, 40$, prediction accuracy, and the SMGP prior evaluation for $σ_{x}^{2} = 40$ were shown in the Supplementary Material. For simulation studies with varying SNR values, the means and standard errors of $R_{e}^{2}$ estimates were 20.52 (1.55), 9.94 (1.07), 4.81 (0.82) for $σ_{x}^{2} = 20$, and 10.66 (1.05), 4.90 (0.68), 2.48 (0.53) for $σ_{x}^{2} = 40$ (values multiplied by 100). The information criterion ranked three channels successfully for all the datasets, indicating that the information criterion worked well.

Table 2.

Upper Panel: Cumulative prediction accuracy for the multi-channel simulation study under the true generative mechanism with $σ_{x}^{2} = 20$ , ρ_t = (0.5, 0), ρ_s = 0.5 comparing the SMGP method to other ML methods. The split threshold of SMGP method was ζ₀ = 0.5. Point estimates and standard errors averaged over 100 datasets were reported. Results of the SMGP method were marked in bold. Overall, the SMGP method had the highest and most precise prediction accuracy. Lower Panel: The ISWR, IMWR of the SMGP method and the ESWR, EEWR of the swLDA method for the multi-channel simulation study under the true generative mechanism with $σ_{x}^{2} = 20$ , ρ_t = (0.5, 0). Channel-specific point estimates and standard errors averaged over 100 datasets were reported.

	Testing Sequences
Methods	3	4	5
SMGP	0.91 (0.07)	0.96 (0.04)	0.99 (0.03)
Neural Network	0.76 (0.10)	0.87 (0.08)	0.92 (0.07)
SVM	0.81 (0.09)	0.89 (0.07)	0.94 (0.06)
Logistic Regression	0.76 (0.08)	0.87 (0.07)	0.91 (0.06)
Random Forest	0.76 (0.10)	0.86 (0.08)	0.92 (0.06)
swLDA	0.85 (0.08)	0.93 (0.06)	0.97 (0.04)
XGBoost	0.67 (0.11)	0.77 (0.09)	0.85 (0.08)
	SMGP		swLDA
Channels	ISWR	IMWR	ESWR	EEWR
1	0.98 (0.03)	0.56 (0.11)	0.32 (0.07)	0.69 (0.08)
2	0.99 (0.03)	0.56 (0.12)	0.32 (0.07)	0.75 (0.09)
3	0.99 (0.02)	0.59 (0.11)	0.26 (0.07)	0.80 (0.09)

5.2. Mis-specification Scenarios

Setup

The stimulus-occurring indicators and stimulus-type indicators were generated randomly following the same rule as in Section 5.1. We illustrated the design of the pre-specified mean response functions in Figure 5. For the data generative mechanism, we considered the following five scenarios with the AR(2) temporal correlation parameter ρ_t = (0.5,0) and two levels of the noise variance $σ_{x}^{2} = 10, 20$ . (i). The true generative mechanism scenario simulated the data completely from equation (1). (ii). The mis-specified noise scenario simulated the data from equation (1) with the noise following a Student-t distribution with 5 degrees of freedom. (iii). The scenario of the shorter response window length simulated the data with pre-specified mean response functions of length 780 ms, i.e. 25 time points. (iv). The scenario of the longer response window length simulated the data with pre-specified mean response functions of length 1,090 ms, i.e. 35 time points. (v). The mis-specified signal scenario simulated the data with a disproportionate distribution of target and non-target stimuli. Given true stimulus-type indicators, a subset (10%) of target stimuli was randomly treated as non-target ones by mistake so that it produced the incorrect target P300 ERPs. The replication size, training sequences, and testing sequences were the same as in Section 5.1.
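
Scenario (v) can be sketched as randomly switching a fixed fraction of target indicators to non-target before generating the signal. The function `mislabel_targets` is hypothetical, and the layout of two targets among 12 stimuli per sequence is an assumption based on a row-and-column speller design; the paper's exact procedure may differ.

```python
import numpy as np


def mislabel_targets(is_target, fraction, rng):
    """Return a copy of the indicators with a fraction of targets set to non-target."""
    is_target = np.asarray(is_target, dtype=bool)
    target_idx = np.flatnonzero(is_target)
    n_flip = int(round(fraction * target_idx.size))
    flipped = rng.choice(target_idx, size=n_flip, replace=False)
    noisy = is_target.copy()
    noisy[flipped] = False  # these targets are treated as non-targets
    return noisy


rng = np.random.default_rng(4)
# Assumed layout: 19 characters x 5 sequences, 12 stimuli each, 2 targets.
is_target = np.tile([True, True] + [False] * 10, 19 * 5)
noisy_target = mislabel_targets(is_target, fraction=0.10, rng=rng)
print(is_target.sum(), noisy_target.sum())
```

Only target indicators are altered, so the non-target stimuli keep their original labels.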

Fig. 5 — The upper and lower panels showed the 95% credible bands of ERP functions to target and non-target stimuli under five simulation scenarios with true parameter $σ_{x}^{2} = 10$ , ρ = (0.5, 0) and $σ_{x}^{2} = 20$ , ρ = (0.5, 0) respectively. The split threshold was ζ₀ = 0.5. The dots and curves were the true curve values. For the true generative scenario, the credible bands covered the entire true curve. For the mis-specified scenarios, the credible bands almost covered the true curves.

Model Settings and Diagnostics

All simulated datasets were fitted with the proposed model with the estimated response window of length 935 ms, i.e., 30 time points. The covariance kernel κ_α was set to a squared exponential kernel. The length-scale, the scaling of target stimuli, and the scaling of non-target stimuli were s₀ = 0.5, $σ_{0, 1}^{2} = 10$, and $σ_{0, 0}^{2} = 0.5$, respectively. We ran the MCMC for 2,000 iterations with 1,000 burn-ins. The MCMC convergence was assessed by running three chains with different seeds and initial values. The Gelman-Rubin statistics for the parameters of interest were smaller than 1.1, indicating an approximate convergence.

Results

Figure 5 showed the estimated ERP functions for target and non-target stimuli under five scenarios with true parameters $σ_{x}^{2} = 10$ (the upper panel) and $σ_{x}^{2} = 20$ (the lower panel). For the true generative scenario, the credible bands covered the entire true curves. For the mis-specified scenarios, credible bands almost covered the true curves. The posterior distributions of σ_x and ρ concentrated around the true values. Table 3 summarizes the ISWR, IMWR of the SMGP method and the ESWR, EEWR of the swLDA method under five scenarios with $σ_{x}^{2} = 10$ (the upper panel) and $σ_{x}^{2} = 20$ (the lower panel). Both point estimates and standard errors over 100 datasets were computed. In the single-channel setting, both the ISWR and IMWR of our method were higher than the ESWR and EEWR of the swLDA method. This result implied that our method identified time windows better than the swLDA method. We also summarized the cumulative prediction accuracy under five scenarios comparing the SMGP method to other ML methods. The prediction accuracy of the SMGP method among the mis-specified scenarios was consistently higher than the other ML methods, suggesting that our analysis was relatively robust to moderate model mis-specifications.

Table 3.

The detection accuracy of the SMTW of the SMGP and swLDA methods for the single-channel simulation study under five scenarios with $σ_{x}^{2} = 10$ , ρ_t = (0.5, 0) in the upper panel and $σ_{x}^{2} = 20$ , ρ_t = (0.5, 0) in the lower panel. The split threshold of the SMGP method was ζ₀ = 0.5. Point estimates and standard errors averaged over 100 datasets were reported.

$σ_{x}^{2} = 10$	SMGP		swLDA
Scenarios	ISWR	IMWR	ESWR	EEWR
True Generative	0.89 (0.07)	0.96 (0.05)	0.53 (0.08)	0.75 (0.07)
Mis-specified Noise	0.86 (0.07)	0.94 (0.06)	0.48 (0.07)	0.78 (0.06)
Shorter Window	0.91 (0.07)	0.96 (0.04)	0.64 (0.07)	0.79 (0.07)
Longer Window	0.86 (0.06)	0.96 (0.06)	0.46 (0.07)	0.72 (0.09)
Mis-specified Signal	0.86 (0.07)	0.96 (0.04)	0.49 (0.07)	0.76 (0.07)
$σ_{x}^{2} = 20$	SMGP		swLDA
Scenarios	ISWR	IMWR	ESWR	EEWR
True Generative	0.86 (0.07)	0.94 (0.07)	0.47 (0.07)	0.79 (0.08)
Mis-specified Noise	0.82 (0.08)	0.91 (0.08)	0.41 (0.07)	0.81 (0.07)
Shorter Window	0.88 (0.08)	0.95 (0.05)	0.55 (0.08)	0.84 (0.06)
Longer Window	0.82 (0.07)	0.93 (0.08)	0.41 (0.08)	0.77 (0.08)
Mis-specified Signal	0.83 (0.08)	0.94 (0.07)	0.43 (0.07)	0.81 (0.07)

6. Discussion

We have applied a new Bayesian generative framework to model the conditional distribution of multi-sequence EEG signals from real participants under the P300 ERP design. Our Bayesian analysis explored the mechanism of brain activity in response to external stimuli by directly considering the overlapping ERPs between adjacent stimuli without signal concatenation and segmentation. We developed a new GP-based prior to identify the spatial-temporally activated intervals with the split-and-merge GP (SMGP) prior. We proposed an information criterion for channel ranking and confirmed it with existing literature.

We made full posterior inferences on participant- and channel-specific P300 ERPs with the SMGP prior given a fixed EEG response window. Although past studies by D’Avanzo et al. (2011) and Mowla et al. (2018) have developed Bayesian and frequentist filtering methods to estimate the amplitude and latency of P300 ERP responses, their results were based on single-trial (sequence) EEG signals, and both methods discarded the spatial dependence among channels. Our SMGP method handles multi-channel, multi-sequence, overlapping EEG signals, produces mean P300 ERP estimates with 95% credible bands, and achieves comparable prediction accuracy. When we compare the ERP function estimates of channel Pz for the three methods, they share a small negative drop in amplitude around 100 ms post-stimulus, followed by a major positive peak between 200 ms and 450 ms post-stimulus. Then, the ERP function estimates gradually decline to zero. The identification of channel-specific SMTW provides statistical evidence for the scientific findings of P300 ERP responses.

In terms of channel ranking and selection, McCann et al. (2015) pointed out that the difference in P300 ERP-BCI communication efficiency was subtle with five or more channels. Both that study and ours performed channel ranking and selection from the same cohort of participants. They identified Cz, Pz, PO7, PO8, and Oz as the top selected channels, which overlapped with our identification of PO7, PO8, Oz, and Cz. These shared selection results provide statistical evidence for spatial distributions of P300 ERP responses. In particular, the finding that channels PO8, PO7, and Oz appear the most frequently supports the finding that the performance of a P300 speller is associated with eye gaze (Brunner et al., 2010). Finally, the participant-specific channel selection helps establish user-specific profiles for efficient brain-computer communications. Thus, we can incorporate user-specific channel selection to design the EEG cap, which increases the implementation speed.

Several future directions could improve our work. First, we could modify the stimulus presentation paradigm from the current RCP design to the checkerboard design (Townsend et al., 2010). The checkerboard design avoids the refractory effect (Martens et al., 2009) in the RCP design, where participants might miss or fail to produce the second regular P300 ERP response when two target stimuli are too close. Second, we could measure the participant-specific brain connectivity under the no-control (NoC) condition to specify the prior spatial covariance matrix. For Participant A, we could assume a multi-block compound symmetry structure to estimate within-block, intra-block correlation parameters, and the scalar parameter σ². Third, it is also of interest to adjust for potential confounders in the model for single-participant analysis, which may include preferences over certain characters to type and the duration of BCI use. This analysis requires a new study design to collect that information. In addition, we could develop the framework of a multi-subject analysis to incorporate the age effect by modifying the priors.

Overall, the proposed generative modeling approach performs innovative statistical inferences on brain activity and provides a promising platform to develop the simulation study framework to test other online P300 ERP-BCI study designs. The Bayesian framework also incorporates prior information such as character-to-character relationships to increase the spelling speed.

Supplementary Material

NIHMS1798425-supplement-Supplementary_Material.zip^{(49.1MB, zip)}

Supplementary Figures and Tables

NIHMS1798425-supplement-Supplementary_Figures_and_Tables.pdf^{(6.4MB, pdf)}

References

Brunner P, Joshi S, Briskin S, Wolpaw JR, Bischof H, and Schalk G (2010). Does the “P300” speller depend on eye gaze? Journal of Neural Engineering, 7(5):056013.
Cai Q, Kang J, and Yu T (2020). Bayesian Network Marker Selection via the Thresholded Graph Laplacian Gaussian Prior. Bayesian Analysis, 15(1):79.
Cecotti H and Graser A (2010). Convolutional Neural Networks for P300 Detection with Application to Brain-computer Interfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3):433–445.
Dawid AP (1981). Some Matrix-variate Distribution Theory: Notational Considerations and a Bayesian Application. Biometrika, 68(1):265–274.
Donchin E, Spencer KM, and Wijesinghe R (2000). The Mental Prosthesis: Assessing the Speed of a P300-based Brain-computer Interface. IEEE Transactions on Rehabilitation Engineering, 8(2):174–179.
D’Avanzo C, Schiff S, Amodio P, and Sparacino G (2011). A Bayesian Method to Estimate Single-trial Event-related Potentials with Application to the Study of the P300 Variability. Journal of Neuroscience Methods, 198(1):114–124.
Farwell LA and Donchin E (1988). Talking off the Top of Your Head: Toward a Mental Prosthesis Utilizing Event-related Brain Potentials. Electroencephalography and Clinical Neurophysiology, 70(6):510–523.
Folstein JR and Van Petten C (2008). Influence of Cognitive Control and Mismatch on the N2 Component of the ERP: A Review. Psychophysiology, 45(1):152–170.
Gelman A and Rubin DB (1992). Inference from Iterative Simulation using Multiple Sequences. Statistical Science, 7(4):457–472.
Gilks WR and Wild P (1992). Adaptive Rejection Sampling for Gibbs Sampling. Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(2):337–348.
Hoffmann U, Vesin J-M, Ebrahimi T, and Diserens K (2008). An Efficient P300-based Brain-computer Interface for Disabled Subjects. Journal of Neuroscience Methods, 167(1):115–125.
Jasper HH (1958). The Ten-twenty Electrode System of the International Federation. Electroencephalography and Clinical Neurophysiology, 10:370–375.
Kang J, Reich BJ, and Staicu A-M (2018). Scalar-on-image Regression via the Soft-thresholded Gaussian Process. Biometrika, 105(1):165–184.
Kaper M, Meinicke P, Grossekathoefer U, Lingner T, and Ritter H (2004). BCI Competition 2003-data Set IIb: Support Vector Machines for the P300 Speller Paradigm. IEEE Transactions on Biomedical Engineering, 51(6):1073–1076.
Kass RE and Raftery AE (1995). Bayes Factors. Journal of the American Statistical Association, 90(430):773–795.
Krusienski DJ, Sellers EW, McFarland DJ, Vaughan TM, and Wolpaw JR (2008). Toward Enhanced P300 Speller Performance. Journal of Neuroscience Methods, 167(1):15–21.
Leoni J, Strada SC, Tanelli M, Jiang K, Brusa A, and Proverbio AM (2021). Automatic Stimuli Classification from ERP Data for Augmented Communication via Brain-Computer Interfaces. Expert Systems with Applications, page 115572.
Li Y and Ghosh SK (2015). Efficient Sampling Methods for Truncated Multivariate Normal and Student-t Distributions Subject to Linear Inequality Constraints. Journal of Statistical Theory and Practice, 9(4):712–732.
Martens S, Hill N, Farquhar J, and Schölkopf B (2009). Overlap and Refractory Effects in a Brain-computer Interface Speller Based on the Visual P300 Event-related Potential. Journal of Neural Engineering, 6(2):026003.
McCann MT, Thompson DE, Syed ZH, and Huggins JE (2015). Electrode Subset Selection Methods for an EEG-based P300 Brain-computer Interface. Disability and Rehabilitation: Assistive Technology, 10(3):216–220.
Mowla MR, Huggins JE, Natarajan B, and Thompson DE (2018). P300 Latency Estimation Using Least Mean Squares Filter. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 1976–1979. IEEE.
Okumuş H and Aydemir Ö (2017). Random Forest Classification for Brain Computer Interface Applications. In 2017 25th Signal Processing and Communications Applications Conference (SIU), pages 1–4. IEEE.
Pavarini SCI, Brigola AG, Luchesi BM, Souza ÉN, Rossetti ES, Fraga FJ, Guarisco LPC, Terassi M, Oliveira NA, Hortense P, et al. (2018). On the Use of the P300 as a Tool for Cognitive Processing Assessment in Healthy Aging: A Review. Dementia & Neuropsychologia, 12(1):1–11.
Polich J, Howard L, and Starr A (1985). Effects of Age on the P300 Component of the Event-related Potential from Auditory Stimuli: Peak Definition, Variation, and Measurement. Journal of Gerontology, 40(6):721–726.
Qiao X, Guo S, and James GM (2019). Functional Graphical Models. Journal of the American Statistical Association, 114(525):211–222.
Rasmussen CE (2003). Gaussian Processes in Machine Learning. In Summer School on Machine Learning, pages 63–71. Springer.
Rodden FA and Stemmer B (2008). A Brief Introduction to Common Neuroimaging Techniques. In Handbook of the Neuroscience of Language, pages 57–67. Elsevier.
Schalk G, McFarland DJ, Hinterberger T, Birbaumer N, and Wolpaw JR (2004). BCI2000: A General-purpose Brain-computer Interface (BCI) System. IEEE Transactions on Biomedical Engineering, 51(6):1034–1043.
Thompson DE, Gruis KL, and Huggins JE (2014). A Plug-and-play Brain-computer Interface to Operate Commercial Assistive Technology. Disability and Rehabilitation: Assistive Technology, 9(2):144–150.
Townsend G, LaPallo BK, Boulay CB, Krusienski DJ, Frye G, Hauser C, Schwartz NE, Vaughan TM, Wolpaw JR, and Sellers EW (2010). A Novel P300-based Brain-computer Interface Stimulus Presentation Paradigm: Moving beyond Rows and Columns. Clinical Neurophysiology, 121(7):1109–1120.
van Dinteren R, Arns M, Jongsma ML, and Kessels RP (2014). P300 Development across the Lifespan: A Systematic Review and Meta-analysis. PLoS ONE, 9(2):e87347.
Viana SS, Batista DM, and Melges DB (2014). Logistic Regression Models: Feature Selection for P300 Detection Improvement. In XXIV Brazilian Congress on Biomedical Engineering–CBEB, volume 2014, pages 979–982.
Wolpaw JR, Bedlack RS, Reda DJ, Ringer RJ, Banks PG, Vaughan TM, Heckman SM, McCane LM, Carmack CS, Winden S, et al. (2018). Independent Home Use of a Brain-computer Interface by People with Amyotrophic Lateral Sclerosis. Neurology, 91(3):e258–e267.
Xu N, Gao X, Hong B, Miao X, Gao S, and Yang F (2004). BCI Competition 2003-data Set IIb: Enhancing P300 Wave Detection Using ICA-based Subspace Projections for BCI Applications. IEEE Transactions on Biomedical Engineering, 51(6):1067–1072.

Associated Data

Supplementary Materials

Supplementary Material

NIHMS1798425-supplement-Supplementary_Material.zip (49.1 MB, zip)

Supplementary Figures and Tables

NIHMS1798425-supplement-Supplementary_Figures_and_Tables.pdf (6.4 MB, pdf)
