Abstract
A class of brain computer interfaces (BCIs) employs noninvasive recordings of electroencephalography (EEG) signals to enable users with severe speech and motor impairments to interact with their environment and social network. For example, EEG-based BCIs for typing popularly utilize event related potentials (ERPs) for inference. Presentation paradigms in current ERP-based letter by letter typing BCIs typically query the user with an arbitrary subset of characters. However, typing accuracy and speed can potentially be enhanced with more informed subset selection and flash assignment. In this manuscript, we introduce the active recursive Bayesian state estimation (active-RBSE) framework for inference and sequence optimization. Prior to presentation in each iteration, rather than showing a subset of randomly selected characters, the developed framework optimally selects a subset based on a query function. Selected queries are adaptively specialized for each user during each intent detection. Through a simulation-based study, we assess the effect of active-RBSE on the performance of a language-model assisted typing BCI in terms of typing speed and accuracy. To provide a baseline for comparison, we also utilize standard presentation paradigms, namely the row and column matrix presentation paradigm and the random rapid serial visual presentation paradigm. The results show that active-RBSE can enhance the online performance of the system in terms of both typing accuracy and speed. Moreover, we conduct real-time experiments with human participants to study the human-in-the-loop effect on the performance of the proposed active-RBSE framework; consistent with the simulation results, these experiments show improvement in both typing speed and accuracy.
Keywords: Brain computer interface, Matrix Speller, RSVP Keyboard™, Event Related Potential, P300, Active Learning, Recursive Bayesian State Estimation
I. Introduction
Noninvasive electroencephalography (EEG) based brain computer interfaces (BCIs) have shown promising results for establishing a safe alternative channel between people with severe speech/muscle impairment and their environment [1]. BCIs can be used for different applications such as communication, environment control and wheelchair navigation [1], [2]. For communication, letter by letter typing BCIs have been the subject of extensive research and development in the field [1], [2], [3], [4], [5]. For instance, the P300 matrix speller, first introduced by Farwell and Donchin, is a typical letter by letter typing BCI [3]. Even though the matrix-based presentation scheme is very commonly used and various paradigms for this scheme have been developed to improve typing speed and accuracy [6], [7], [8], [9], [10], [11], [12], [13], it has been shown that matrix-based presentation paradigms are highly gaze dependent and cannot operate well for the population that relies on covert attention [14]. As a minimally gaze dependent alternative, the rapid serial visual presentation (RSVP) paradigm has been offered, in which all symbols are presented in a pseudo-random order at a predefined location of the screen in a rapid serial manner [15], [16], [5], [17], [18]. A performance comparison of different users typing with the row and column presentation (RCP), single character presentation (SCP), and RSVP paradigms has demonstrated that, at least for healthy users, the best presentation paradigm should be defined separately for each individual [19].
We have developed a noninvasive EEG-based typing BCI that enables the user to choose among different matrix-based presentation paradigms and the RSVP paradigm [19]. This system detects the user intent using recursive Bayesian state estimation (RBSE) in which the state represents the user target symbol. To improve the detection performance, we have incorporated a 6-gram language model that provides context priors to be probabilistically fused with the EEG likelihoods. Our earlier studies demonstrated a great benefit in using the language model both in terms of typing speed and accuracy [19], [5]. In our system, a user is presented with a sequence of symbols and an event related potential in response to the desired character, in the recorded EEG, can indicate the user intent. To achieve a confident decision, our system might query the user with one or multiple sequences until a predefined confidence threshold is attained or an upper-bound on the number of sequences is reached [18], [19], [20]. In earlier versions of the system, each sequence contained the entire set of the symbols. However, experimentally, we observed that this method is very inefficient and adaptive subset selection methods are necessary to further improve the typing speed.
Here, we describe a sequence design strategy using the active learning concept. Active learning is built upon RBSE for dynamic and time-effective sequence optimization. The proposed query design and inference mechanism is denoted as the active-RBSE framework. In this framework, we develop query strategies to actively choose a subset of stimuli for every presentation sequence. Specifically, in this manuscript, we propose to use a query function that employs the observed EEG evidence along with context information. According to this function, queries are adaptively specialized for the user, at every iteration, during each target detection step. We show that this query function is a modular and monotonic set function and accordingly the optimal solution for this function can be achieved with guaranteed convergence. We utilized real EEG data to run Monte-Carlo simulations of the system with the active-RBSE framework in use. The results show that this query function, along with efficient optimization, can improve the online typing performance in terms of speed and accuracy for both matrix-based presentation and RSVP paradigms.
II. General BCI system specifications
In a previous study, we have shown that popular matrix-based presentation paradigms (namely 1. row and column presentation (RCP) paradigm, 2. single character presentation (SCP) paradigm), and rapid serial visual presentation (RSVP) paradigm in general provide comparable spelling performances and the optimal presentation paradigm is different for each user [19].
We assume that the user intent is to select a symbol (or character) x from a finite vocabulary (or alphabet) set. In a typical typing scenario this set can be defined as {a, b, …, z, <, _}, where < represents the backspace for error correction and _ is the space symbol. In a matrix presentation paradigm, we flash a subset A of the vocabulary in a “trial”. For example, in SCP the trials are singletons, |A| = 1. Accordingly, we can assess the EEG response to that query, e(A), for detection. A set of trials Φ = {A1, A2, · · ·, Α|Φ|} which is presented to the user in a rapid serial manner is called a “sequence”. After every sequence the system attempts to infer the target character intended by the user, but, due to the low signal to noise ratio (SNR) of EEG, the system usually needs to query the user with more sequences to reach a decision with a predefined confidence level. On the other hand, in order to limit the time spent on typing each character, the system will make a decision after a predefined maximum number of sequences, regardless of whether the required confidence threshold has been reached. A set of sequences that leads to a decision is called an “epoch”.
A. Probabilistic Graphical Model
The proposed probabilistic graphical model (PGM) is illustrated in Fig 1. In this figure, xk represents the system state at epoch k, is the set of trials selected (i.e. query set) to be flashed in the sth sequence, is the number of trials (i.e. the size of query set), Ck is the context information, which, in a typing scenario, can be provided by a language model, and is the EEG observation for the jth trial in sequence s. In our model the label for trial is a deterministic function of xk, i.e. , then if or otherwise. Finally, the maximum number of sequences allowed in an epoch is denoted as ms.
Figure 1.
Proposed probabilistic graphical model representing the kth epoch. Here, the dashed lines show a deterministic relation, the solid lines define a probabilistic correspondence, and red rectangles represent conditional independence.
Active-RBSE is built upon this graphical model. Within this framework, a generic AL and maximum a posteriori (MAP) inference loop will iterate by alternating between the following query (Q) and inference (I) steps:
| (1) |
| (2) |
Here, is a potential query set restricted to the set of feasible queries at time k, , which is a subset of all possible queries, , the power set of . The quality of a query from the perspective of AL is measured by the set function g. Note that we focus on MAP inference in this manuscript. Next, we describe how to obtain the MAP inference, and then in Section III we explain the query strategy.
1). Recursive Bayesian State Estimation and Intent Inference:
We employ the maximum a-posteriori (MAP) inference mechanism to detect user intent, using the Bayes rule “posterior ∝ prior × likelihood” to estimate the posterior probability mass function (PMF) over the state space (discrete-valued, in this application). For letter by letter typing, we use a symbol n-gram language model (LM) to estimate the prior PMF over the character set, and the likelihood is obtained from EEG observations. Given the system state, we assume that the EEG measurements for different sequences are independent from each other (see Fig 1). Based on this assumption we can update the posterior PMF recursively, after every sequence.
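The recursive update described above can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the function names, the Gaussian stand-ins for the class conditional densities, and the toy three-symbol vocabulary are all assumptions made for the example. Each trial multiplies the posterior of a symbol by the target likelihood if the symbol was shown in that trial, and by the non-target likelihood otherwise.

```python
import math

def gaussian_logpdf(mean, std):
    """Hypothetical stand-in for a class conditional log-density of EEG evidence."""
    def f(e):
        return -0.5 * ((e - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))
    return f

def update_log_posterior(log_post, trials, evidence, loglik_target, loglik_nontarget):
    """One recursive step: fold the trials of one sequence into the log posterior."""
    new = dict(log_post)
    for shown, e in zip(trials, evidence):
        lt, ln = loglik_target(e), loglik_nontarget(e)
        for x in new:
            # The trial label is deterministic given x: target iff x was shown.
            new[x] += lt if x in shown else ln
    return new

def normalize(log_post):
    """Convert unnormalized log posteriors into a PMF (stable softmax)."""
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {x: math.exp(v - m) / z for x, v in log_post.items()}

def decide(post, threshold, n_shown, max_sequences):
    """Return the MAP symbol if confident enough or out of sequences, else None."""
    best = max(post, key=post.get)
    if post[best] >= threshold or n_shown >= max_sequences:
        return best
    return None

# Toy example: language-model prior over three symbols, one sequence of two trials.
prior = {"a": 0.5, "b": 0.3, "c": 0.2}
log_post = {x: math.log(p) for x, p in prior.items()}
trials = [{"a"}, {"b"}]      # single-character trials
evidence = [2.0, -1.0]       # assumed EEG evidence values
log_post = update_log_posterior(log_post, trials, evidence,
                                gaussian_logpdf(1.0, 1.0), gaussian_logpdf(-1.0, 1.0))
post = normalize(log_post)
decision = decide(post, threshold=0.9, n_shown=1, max_sequences=8)
```

Working in the log domain avoids numerical underflow when many trials are accumulated within an epoch.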
The prior PMF from the LM along with the EEG observation likelihood is used in our system to calculate the posterior PMF over the vocabulary set after each sequence. Assume 1 ≤ s ≤ ms sequences have been shown to the user and define , where . Similarly, take then define . The MAP framework estimates the user intent by solving the following optimization problem:
| (3) |
The posterior probability defined in (3) can be factorized in terms of likelihood and context prior using the assumptions imposed in Figure 1.
| (4) |
But, for a given xk the for is deterministically defined. Hence, according to the conditional independence of and context information defined in PGM we obtain;
| (5) |
Then we can rewrite (3) as
| (6) |
In the next two subsections, we will describe how to obtain the context prior probabilities P(xk = x| C) and the class conditional likelihoods and .
2). Symbol n-gram Language Model for Context-based Prior Estimation:
We utilize an n-gram language model (LM) to estimate the prior probabilities over the vocabulary. The n-gram LM used in our system is essentially a Markov model of order n – 1 which estimates a PMF over the state space for the upcoming character, given the n – 1 previously typed letters. Let us define as the ordered set of n – 1 preceding typed characters where, represents the character at lag l. Then the probability of the current character can be defined as follows:
| (7) |
The LM used in our system is trained on the NY Times portion of the English Gigaword corpus [21]. This LM has been shown to enhance BCI typing performance in terms of typing accuracy and speed [18], [17].
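To make the idea of a context prior concrete, the sketch below builds a tiny character n-gram model from raw counts. It is only an illustration: the training string, the function name, and the add-one smoothing are assumptions for the example and say nothing about how the Gigaword-trained LM is actually estimated.

```python
from collections import Counter, defaultdict

def train_char_ngram(text, n, alphabet):
    """Count-based character n-gram model with add-one smoothing (an assumption)."""
    counts = defaultdict(Counter)
    for i in range(n - 1, len(text)):
        context, ch = text[i - n + 1:i], text[i]
        counts[context][ch] += 1

    def prior(typed):
        # Condition on the last n - 1 typed characters only (Markov order n - 1).
        context = typed[-(n - 1):] if n > 1 else ""
        c = counts.get(context, Counter())
        total = sum(c.values()) + len(alphabet)
        return {a: (c[a] + 1) / total for a in alphabet}

    return prior

alphabet = list("abcdefghijklmnopqrstuvwxyz_<")   # 28 symbols
prior_fn = train_char_ngram("the_cat_sat_on_the_mat", n=3, alphabet=alphabet)
p_next = prior_fn("th")                           # PMF over the next character
```

The returned PMF plays the role of the context prior that is fused with the EEG likelihoods during inference.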
3). Class Conditional Distributions for EEG Observations:
To estimate the class conditional distributions we collect labeled data in a “calibration session”. Typically, during a calibration session the user is presented with 100 sequences. Prior to each sequence the user is asked to focus on a predefined target character during that sequence.
We acquire EEG from 16 channels. To improve the SNR we apply a bandpass filter with a pass band of [1.5, 42] Hz and a notch filter at 60 Hz to further attenuate the line noise. Given the pass band of the filter and the data acquisition sampling rate (256 Hz), we down-sample the signal by a factor of 2 to eliminate non-informative time samples while avoiding aliasing. The preprocessed EEG in a time window of [0, 500) ms from the onset of each stimulus is assigned to that trial as its EEG measurement at each channel. Subsequently, we eliminate directions with zero or negligible variance by applying principal component analysis (PCA) to these EEG measurements and removing every direction whose eigenvalue is smaller than 1e-5 times the largest one, at each channel separately. Finally, we concatenate the measurements from every channel to form the EEG feature vector for the jth trial in sequence s in the current epoch (omitting the epoch index for simplicity of notation).
EEG is assumed to be a Gaussian process [22], [23], [19]; hence, we could utilize quadratic discriminant analysis (QDA) to project the EEG feature vector onto a one dimensional space with minimum expected classification risk. However, QDA requires full rank estimates of the class conditional covariance matrices, which is not feasible for a typical setup of our system because the feature vector dimensionality is higher than the number of observations in each class. Instead, we utilize regularized discriminant analysis (RDA), which applies regularization and shrinkage to the estimated class conditional covariance matrices and makes them invertible [24].
Assume fi is an m dimensional feature vector and yi ∈ {0, 1} is the binary label for fi; then the maximum likelihood estimator of the class conditional covariance matrix for class k ∈ {0, 1} is:
| (8) |
for which δ(.,.) represents the Kronecker-δ function and Nk is the number of observations in class k. Then the regularization and shrinkage steps are applied as follows:
| (9) |
where λ, γ ∈ [0, 1] are the shrinkage and regularization parameters, tr[·] is the trace operator and Im is an identity matrix of size m × m. To optimize the values of λ and γ, in our system we apply 10-fold cross validation to maximize the area under the receiver operating characteristics (ROC) curve (AUC). In our system the “EEG evidence” for trial query is computed as follows.
| (10) |
Here, is the EEG feature vector for trial query , is the Gaussian probability density function with mean μ and covariance Σ. To obtain the class conditional probability distributions and over the EEG evidence we further apply kernel density estimation (KDE) with a Gaussian kernel whose bandwidth is computed using Silverman's rule of thumb [25].
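The shrinkage and regularization steps of RDA can be illustrated with a small pure-Python sketch. The formulas used here follow the commonly cited form of Friedman's RDA (shrink each class covariance toward the pooled covariance with λ, then regularize toward a scaled identity with γ); this is an assumption about the form of the estimator, and the function names and toy data are hypothetical.

```python
def ml_cov(rows):
    """Maximum likelihood covariance (divides by N) of a list of feature vectors."""
    n = len(rows)
    mu = [sum(col) / n for col in zip(*rows)]
    m = len(mu)
    cov = [[0.0] * m for _ in range(m)]
    for r in rows:
        d = [a - b for a, b in zip(r, mu)]
        for i in range(m):
            for j in range(m):
                cov[i][j] += d[i] * d[j] / n
    return cov

def rda_cov(class_rows, k, lam, gam):
    """Shrunk and regularized covariance for class k.

    class_rows maps a label to its feature vectors; lam is the shrinkage
    parameter and gam the regularization parameter, both in [0, 1].
    """
    n = {c: len(r) for c, r in class_rows.items()}
    cov = {c: ml_cov(r) for c, r in class_rows.items()}
    m = len(cov[k])
    denom = (1 - lam) * n[k] + lam * sum(n.values())
    # Shrinkage: blend the class covariance with the pooled covariance.
    shrunk = [[((1 - lam) * n[k] * cov[k][i][j]
                + lam * sum(n[c] * cov[c][i][j] for c in cov)) / denom
               for j in range(m)] for i in range(m)]
    # Regularization: pull toward (trace / m) times the identity.
    trace = sum(shrunk[i][i] for i in range(m))
    return [[(1 - gam) * shrunk[i][j] + (gam * trace / m if i == j else 0.0)
             for j in range(m)] for i in range(m)]

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

# Toy data: class 0 has a rank-deficient covariance (two collinear points).
data = {0: [[0.0, 0.0], [2.0, 2.0]], 1: [[0.0, 1.0], [1.0, 0.0]]}
raw = ml_cov(data[0])                       # singular
reg = rda_cov(data, k=0, lam=0.0, gam=0.5)  # invertible after regularization
pooled = rda_cov(data, k=0, lam=1.0, gam=0.0)
```

In this toy case the raw class covariance is singular, while the regularized estimate has a strictly positive determinant, which is the property QDA needs.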
III. Query set Optimization Using Active Learning
We propose that optimizing the query set using the prior information from the language model and the EEG signal in response to earlier sequences in a specific epoch can improve the typing performance for that epoch. In our earlier studies we have shown that for the RSVP paradigm it is inefficient to present the whole or randomly selected subsets of vocabulary at every sequence [20]. Here, we use active query selection inspired by the active learning concept to define a combinatorial optimization problem, which exploits previously acquired information to appoint the query set elements in a timely manner.
A. Objective Function
We will consider a specific selection of g(.) to be used in active-RBSE framework, as specified in (1). Let us hypothesize that we know the actual user intent for current epoch (i.e. epoch k). Our objective is to define an optimal query set for sequence s + 1, while s sequences have been already queried for inference of but the required confidence threshold is not attained yet. Then, before presenting the (s + 1)st sequence, we can obtain a prediction of posterior PMF for a given as follows. We define , as:
| (11) |
where represents the prior probability of before observing sequence s + 1. Moving from the third line to the fourth of (11), we use the following
| (12) |
in which the denominator is the normalization constant.
Note that, computes the posterior probability of hypothesized target for a particular given previously observed EEG and context. But during the current epoch, x is yet to be estimated; and hence, we eliminate the dependency on this unobserved random variable by computing the expected value of with respect to the most recent estimate of state space posterior PMF, Πs+1(x).
Accordingly, the objective function for query set selection is then defined as follows.
| (13) |
B. The Solution of the Optimization Problem
In equation (11), for fixed x and , the argument inside the expectation is only a function of . Accordingly, we define , where
and use (6), to specify such that,
| (14) |
To simplify the optimization process –the reason for the simplification is described later in this section– we approximate using Taylor series expansion and the function defined in (14):
| (15) |
where . We propose to substitute the original function with its locally suboptimal linear approximation around the μσ. The proposed linear approximation is widely used in the field of signal processing [26], especially when higher order central moments of the distribution are negligible. Typically, in our system, the estimated class conditional distributions are unimodal with small variance, hence we assume this approximation is justifiable. Accordingly, we have:
| (16) |
Since we define , the second term in equation (16) is equal to zero. The next step is to compute . Recall that according to the proposed graphical model in Figure 1, for a given the EEG evidence for different symbols, , are independent from each other. Hence, is independent from such that is evaluated at samples from the following distributions:
Hence, by defining , such that
| (17) |
we can estimate 1 as follows:
| (18) |
in which,
where . We approximate with and rewrite the optimization problem as follows:
| (19) |
Here, since logarithm is a monotonically increasing function, taking the logarithm of the cost function does not change the solution. Then, to solve the optimization problem defined in (19), we define a lower-bound for the objective function using Jensen’s inequality as follows:
| (20) |
In this application, the class-conditional distributions are typically unimodal with small variances and different mean values; accordingly, we assume, and . Furthermore, in our system we introduce an upper bound to limit the number of trials in a sequence and provide a practical presentation paradigm. Then we get,
As a result . Hence,
| (21) |
Note that in (21), the only term that is a function of is ; therefore, from (19),
| (22) |
C. Combinatorial Optimization
The approximated objective function defined in (22) is a modular and monotonic set function; therefore, its global optimum can be found in a computationally efficient manner [27]. For definitions and proof, please see Appendix A.
It has previously been shown that a deterministic greedy algorithm can provide a good approximation of the optimal solution for an NP-hard optimization problem with submodular and monotone objective functions within a guaranteed bound relative to the optimal solution. Moreover, the greedy algorithm attains the optimal solution when the objective is a modular monotone set function [27].
For a fixed number of trials, Nt, in each sequence, the deterministic greedy algorithm is described in Algorithm 1. This algorithm provides the global solution to the optimization problem defined in (22).
Algorithm 1:
Greedy algorithm for maximization of Q
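The listing of Algorithm 1 is not reproduced here, so the following is only a generic sketch of a deterministic greedy maximizer for a set function, written in Python. The function name, the use of current posterior probabilities as per-symbol scores, and the toy values are assumptions for illustration; they stand in for the objective in (22) rather than reproduce it. For a modular objective, the marginal gain of an element does not depend on what has already been selected, so the greedy choice reduces to picking the Nt highest-scoring candidates.

```python
def greedy_maximize(candidates, g, n_trials):
    """Pick n_trials elements, adding the largest marginal gain of g at each step."""
    selected = []
    remaining = sorted(candidates)   # sorted for deterministic tie-breaking
    while remaining and len(selected) < n_trials:
        base = g(selected)
        best = max(remaining, key=lambda a: g(selected + [a]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical modular objective: sum of per-symbol scores.
scores = {"a": 0.4, "b": 0.1, "c": 0.3, "d": 0.2}

def modular_objective(subset):
    return sum(scores[x] for x in subset)

query = greedy_maximize(scores, modular_objective, n_trials=2)
```

With these toy scores the selected query is the two highest-scoring symbols, in decreasing order of score.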
IV. Experimental Results
In this study, we used EEG data collected from 12 healthy individuals according to an IRB approved protocol for an earlier study [19]. The data was acquired from 16 EEG locations: Fp1, Fp2, F3, F4, Fz, Fc1, Fc2, Cz, P1, P2, C1, C2, Cp3, Cp4, P5 and P6 according to the International 10/20 configuration. To record the data we utilized a g.USBamp bio-signal amplifier at a sampling rate of 256Hz with active g.Butterfly electrodes. Each user performed three calibration sessions, one for each presentation paradigm (RCP, SCP, RSVP), with 150ms intertrial-interval (ITI)2. A calibration session consisted of 100 sequences of 10 trials containing 1 target symbol, and prior to each sequence the designated target character is displayed. These data sets were used to obtain model parameters used in Monte-Carlo simulations.
For each calibration data set, we first estimate class conditional distributions for target and non-target EEG evidences. We used the samples drawn from these distributions to perform 20 Monte-Carlo simulations of the system. In each simulation, the system typed missing words in 10 different phrases with different difficulty levels3 from 1 (the easiest) to 5 (the most difficult). These phrases are selected uniformly across five difficulty levels. Here we report the results for simulated online performance of our system under proposed and baseline methods. In all cases, probability distribution from the language model is fused with EEG likelihoods, in a Bayesian framework, to perform MAP inference for intent detection. Hence the only differing factor among these cases is the sequence design and not the inference mechanism. One should note that the proposed method does not affect the calibration session as its goal is to enhance online performance by including the prior knowledge available in sequence design to avoid non-informative and irrelevant queries while obtaining most information about more probable choices. We report the results in terms of: (I) total typing duration (TTD) for typing 10 phrases -which is inversely proportional to typing speed-, (II) probability of phrase completion (PPC) -which is a measure of typing accuracy-, and (III) information transfer rate (ITR). We compute the ITR in the standard form consistent with the literature,
where Te is the average time for typing a letter (in seconds), and p is the probability of typing the correct letter, computed from simulations as the number of correctly typed letters over the number of all selections [28]. We report ITR in bits/s.
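As a worked illustration of the quantities just defined, the sketch below computes ITR in bits/s. It assumes the widely used Wolpaw definition, ITR = [log2 N + p log2 p + (1 − p) log2((1 − p)/(N − 1))] / Te, with N the vocabulary size; this is an assumption about which "standard form" is meant, and the function name is hypothetical.

```python
import math

def itr_bits_per_second(n_symbols, p, t_e):
    """Wolpaw-style information transfer rate (assumed form), in bits per second.

    n_symbols: vocabulary size N
    p: probability of typing the correct letter
    t_e: average time for typing a letter, in seconds
    """
    if n_symbols < 2 or not 0.0 <= p <= 1.0 or t_e <= 0:
        raise ValueError("need n_symbols >= 2, 0 <= p <= 1 and t_e > 0")
    bits = math.log2(n_symbols)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_symbols - 1))
    return bits / t_e

perfect = itr_bits_per_second(28, 1.0, 10.0)      # error-free typing
chance = itr_bits_per_second(28, 1.0 / 28, 10.0)  # chance-level accuracy
```

The two edge cases behave as expected: error-free typing transfers log2(28) bits per selection, and chance-level accuracy transfers none.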
Using the exact same setup we also tested the system performance in real experiments with another 12 participants who consented to an IRB approved protocol at Northeastern University. In our earlier work, we showed that there are no statistically significant differences between the simulation and real experiment results [29]. Hence these individuals used the system only for the two presentation paradigms, RSVP and SCP, that are aligned with the assumptions of the proposed method, while the system computed the decision employing active-RBSE. In these experiments, participants used the system to type the exact same 10 phrases used earlier in the Monte-Carlo simulations. We used PPC and TTD to compare the real experiments to simulations and assess the human-in-the-loop effect on the results.
A. RSVP Paradigm
We used two sets of Monte-Carlo simulations, (1) with random trial selection and (2) with optimal query selection, to assess the effect of active-RBSE on online system performance. The maximum number of sequences ms = 8 and the number of trials within a sequence k = 14 were selected based on our earlier experimental study [20], [19]. The TTD for active RSVP (ARSVP) vs. the random RSVP paradigm is presented in a scatter plot in Figure 2. In this figure, the horizontal axis represents the TTD for random RSVP and the vertical axis shows the TTD for typing with ARSVP. The width and the height of the box around each data point show the standard deviation of TTD from 20 Monte-Carlo simulations for the random and optimal sequence scenarios, respectively. According to this figure, 9 out of 12 users could achieve a higher typing speed with optimal sequence selection. The Wilcoxon signed-rank test confirms a statistically significant improvement (P < 0.03) in TTD among users.
Figure 2.
Scatter plot of average TTD in minutes from 20 Monte-Carlo simulations. The horizontal axis shows the TTD when the sequences are selected randomly and the vertical axis represents the TTD for the optimal sequence selection. The width of the box around every point shows +/− standard deviation of TTD for random sequences and the height shows the same when sequences are optimized.
Figure 9.
Scatter plot of TTD of 10 phrases in terms of minutes. The horizontal axis shows the TTD when the sequences are selected based on SCP paradigm and the vertical axis represents the TTD for optimal sequence selection. The width of the box around every point shows +/− standard deviation of TTD for SCP paradigm and the height shows the same when sequences are optimized.
In Monte-Carlo simulations, a phrase is assumed to be successfully completed if the system types the correct phrase in a predefined duration with no more than five consecutive mistakes; otherwise, that phrase is assumed to be incomplete. Using this setup, the PPCs obtained from the simulation sets are presented in Figure 3. In this figure, the horizontal axis represents the AUC values for different participants. The green “*” points represent the averaged PPC from 20 Monte-Carlo simulations and the error-bars represent the 90% area under a beta distribution fitted to the PPCs obtained from the ARSVP paradigm. Similarly, using the PPC values obtained from simulations with the random RSVP paradigm, the mean PPC values (red “o”) and the 90% confidence intervals (red bars) are computed.
Figure 3.
Average probability of phrase completion with 90% confidence intervals for RSVP paradigm. The confidence interval is calculated by fitting a Beta distribution on PPC obtained from 20 Monte-Carlo simulations.
The results in Figure 3 show that the optimal query strategy improves the typing accuracy. This effect is clearer for AUCs ∈ [0.7, 0.9], a range that usually includes most users in a healthy population. We measured the consistency of this improvement for all participants by performing the Wilcoxon signed-rank test on average PPCs. The result demonstrates a statistically significant improvement in PPC with P < 0.003.
In Figure 4 we present the bar-charts of ITR for ARSVP and random RSVP, side by side, to study the effect of sequence optimization. ITR helps us summarize TTD and PPC in a more standard measure and compare speed and accuracy at the same time. As the results in this plot suggest, sequence optimization has improved the ITR significantly for all participants except 2 individuals. Among those, as we can see in Figure 3, one participant has gained typing accuracy although the typing speed has decreased, as shown in Figure 2, which has led to a lower ITR. According to experts, the first objective of BCIs for the target population is to achieve accurate typing; hence sequence optimization has improved the primary performance measure for this individual.
Figure 4.
Median information transfer rate for active learning based adaptive sequences vs. random sequences for the RSVP paradigm, per subject. The statistical difference of the ITR distributions, computed by the Wilcoxon rank sum test from 20 Monte-Carlo simulations at each condition, is presented for each subject on top of the bar charts as a p value.
As shown in Figure 3, the probability of phrase completion for the second subject is close to 0 in both cases, which means that participant has not been successful in using our system with or without sequence optimization, due to low AUC, 59%, close to chance level. In general, when the AUC values are lower, typing duration is longer because the users are more successful in correcting their errors. But eventually, when the EEG classification accuracy is close to chance level, they will reach the time limit of epoch and system will mark the phrase as unsuccessful.
The results for user performance in an online typing scenario, against the background of results from 20 Monte-Carlo simulations, are presented in Figure 5. Although the simulations seem to overestimate the performance of actual typing, we think that the degradation in actual experimental performance is due to factors such as non-stationarity introduced in the EEG signal by user tiredness through the session and changes in electrode connection impedance as a result of conductive gel desiccation, rather than human factors introduced specifically by the proposed sequence optimization mechanism.
Figure 5.
Typing speed analysis results. TTD and PPC are shown for RSVP paradigm when sequence optimization is in use. Simulation results are used to define the shaded 90% confidence area shown. The dashed line shows the expected value from simulation for each variable and the solid line shows actual typing outcomes in a single experimental run that follows.
B. Matrix-based Presentation Paradigm with Overlapping Trials
Let us define a function such that . Accordingly, we define a code matrix C = [c(A1), · · ·, c(Ak)]. Then each row of the C matrix assigns a code word to each member of the vocabulary set which demonstrates its presence in trials of a sequence.
RCP is the most widely used matrix-based presentation paradigm, in which the trials have overlaps with |Ai ∩ Aj| ≤ 1 for i ≠ j. If we define the number of rows and columns as Nr and Nc respectively, then the RCP paradigm offers unique codewords of length Nr + Nc with two nonzero elements. In our experiments, we utilized a 4 × 7 background matrix of characters to efficiently distribute the 28 symbols of our vocabulary set over the space available in a wide-screen layout. In this layout Nr = 4 and Nc = 7, and the codeword length is 11 for the RCP paradigm.
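The codeword view of RCP can be checked with a short Python sketch. The function name and the row-major indexing of symbols are assumptions made for the example; the 4 × 7 layout is the one described above. Each symbol's codeword marks the trials (row flashes followed by column flashes) in which that symbol appears.

```python
def rcp_code_matrix(n_rows, n_cols):
    """Build RCP trials and per-symbol codewords for an n_rows x n_cols matrix.

    Symbols are indexed 0 .. n_rows * n_cols - 1 in row-major order.
    """
    trials = [{r * n_cols + c for c in range(n_cols)} for r in range(n_rows)]
    trials += [{r * n_cols + c for r in range(n_rows)} for c in range(n_cols)]
    n_symbols = n_rows * n_cols
    # Row s of the code matrix: 1 where symbol s is flashed, 0 otherwise.
    codewords = [[1 if s in t else 0 for t in trials] for s in range(n_symbols)]
    return trials, codewords

trials, code = rcp_code_matrix(4, 7)
max_overlap = max(len(a & b) for i, a in enumerate(trials)
                  for j, b in enumerate(trials) if i != j)
```

Every symbol receives a distinct codeword of length 11 with exactly two nonzero entries, one for its row and one for its column, and any two distinct trials share at most one symbol.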
In this study, we propose to define the search space such that each letter is uniquely identifiable from each sequence. When considering matrix presentation paradigms for ERP-based BCIs, one needs to consider some conditions for sequence set design, to satisfy the requirements for eliciting ERPs. Note that, as opposed to RSVP-based paradigms, matrix-based presentation paradigms can benefit from visual evoked potentials (VEPs), so we allowed for more frequent flashes of the same character in the matrix-based presentation paradigm [30]. This can improve the typing speed by reducing the length of sequences. Accordingly, we propose to define the feasible set such that a unique code word is assigned to each symbol while each symbol is presented with a probability of less than 0.5 in each sequence.
We set the codeword length k = 6, to get enough codewords with at most 3 non-zero elements. This setup offers unique code words to be assigned to each member of the vocabulary set. The TTD comparison of the RCP and actively learned presentation (ALP) paradigms is presented in Figure 6. The scatter plot suggests that ALP can offer shorter TTD. The benefit of ALP is more visible for the lower AUCs, which are demonstrated by the points concentrated in the center of the figure. Although the TTD improvement due to our proposed method is clear from this figure, we performed the Wilcoxon signed-rank test between average TTDs of ALP and RCP among all participants to demonstrate statistical evidence. The result confirms statistical significance with P < 0.0005. Figure 7 shows the effect of ALP on PPC in contrast to RCP. In the RCP paradigm the participants' AUCs are generally higher and consequently the PPCs are above 95% even without any sequence optimization. Thus our method did not offer any significant improvement for this case (P > 0.75). In conclusion, from Figures 6 and 7, ALP can significantly reduce the TTD while preserving the PPC.
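The claim that length-6 codewords with at most 3 non-zero elements suffice for the vocabulary can be verified by enumeration. The sketch below is illustrative only; the function name is hypothetical, and excluding the all-zero word is an assumption (a symbol that is never flashed in a sequence could not be identified from it).

```python
from itertools import combinations

def binary_codewords(length, max_weight):
    """Enumerate binary words of the given length with 1..max_weight ones."""
    words = []
    for weight in range(1, max_weight + 1):
        for ones in combinations(range(length), weight):
            words.append(tuple(1 if i in ones else 0 for i in range(length)))
    return words

words = binary_codewords(6, 3)
n_words = len(words)     # 6 + 15 + 20 words of weight one, two and three
vocabulary_size = 28
```

Under these assumptions there are 41 candidate codewords, more than the 28 needed, which leaves room to choose an assignment for each sequence.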
Figure 6.
Scatter plot of total typing duration of 10 phrases in terms of minutes. The horizontal axis shows the TTD when the sequences are selected based on RCP paradigm and the vertical axis represents the TTD for optimal sequence selection. The width of the box around every point shows +/− standard deviation of TTD for RCP paradigm and the height is the same when sequences are optimized.
Figure 7.
Average probability of phrase completion with 90% confidence intervals for random and optimal flashing subsets.
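For readers who want to reproduce a paired comparison of the kind reported above, the following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test. It is not the authors' analysis code. It assumes a small number of pairs (it enumerates all sign patterns), drops zero differences, and gives tied absolute differences their average rank. The function names and the input values are placeholders, not study data.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def wilcoxon_signed_rank_exact(x, y):
    """Return (W+, exact two-sided p value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    if not diffs:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Null distribution: every sign pattern is equally likely.
    null = [sum(r for r, s in zip(ranks, signs) if s)
            for signs in product((0, 1), repeat=len(ranks))]
    lower = sum(w <= w_plus for w in null) / len(null)
    upper = sum(w >= w_plus for w in null) / len(null)
    return w_plus, min(1.0, 2 * min(lower, upper))


# Placeholder values: 12 pairs in which the first condition is always larger.
first = [float(v) for v in range(1, 13)]
second = [0.0] * 12
w, p = wilcoxon_signed_rank_exact(first, second)
print(w, p)  # 78.0 0.00048828125
```

With 12 pairs, the smallest exact two-sided p value this test can produce is 2/4096, which occurs when all differences have the same sign.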
In a real typing scenario, based on these assumptions, one can further propose to present a smaller subset of the vocabulary while applying more restrictions on the feasible space to prevent repetition blindness, or to reduce the probability of each character in a sequence to attain oddball conditions. Further refinements are required to adopt active-RBSE for this presentation paradigm in real typing (this is a work in progress).
The bar chart of ITR for these presentation paradigms is shown in Figure 8. The ITR bars for ALP and RCP are plotted side by side, and the per-individual statistical significance of the difference (p values) is shown on top of each ITR pair. As the results in this plot suggest, sequence optimization has improved the ITR significantly for all participants. We can see from Figures 6 and 7 that the main reason for the ITR improvement is increased typing speed while typing accuracy remains comparable.
Figure 8.
Median information transfer rate for active learning based adaptive sequences vs. random sequences for the matrix-based paradigm with overlapping trials, per subject. The statistical difference of ITR distributions, computed by the Wilcoxon rank sum test from 20 Monte-Carlo simulations at each condition, is presented for each subject on top of the bar charts as a p value.
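As a rough illustration of how an information transfer rate couples accuracy and speed, the sketch below uses the widely cited Wolpaw formula for bits per selection. This is an assumption made for illustration: the ITR definition used in this manuscript is given earlier in the text and may differ. The function names, the symbol count of 28, and the selection rate are hypothetical.

```python
import math


def wolpaw_bits_per_selection(n_symbols, accuracy):
    """Bits per selection: log2 N + P log2 P + (1 - P) log2((1 - P) / (N - 1))."""
    if n_symbols < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_symbols >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n_symbols)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_symbols - 1))
    return bits


def itr_bits_per_minute(n_symbols, accuracy, selections_per_minute):
    """Scale bits per selection by the selection rate."""
    return wolpaw_bits_per_selection(n_symbols, accuracy) * selections_per_minute


# Hypothetical numbers: 28 symbols, 95% accuracy, 2 selections per minute.
print(itr_bits_per_minute(28, 0.95, 2.0))
```

Under this definition, shortening the time per selection raises the rate even when accuracy is unchanged, which is the pattern the comparison above describes.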
For this manuscript, we did not conduct actual experiments using the ALP presentation paradigm because (1) we cannot satisfy the oddball conditions required for eliciting ERPs; and (2) the mathematical framework that led to the sequence optimization algorithm assumes non-overlapping trials (which is not a correct assumption for this presentation paradigm).
C. Matrix-based Presentation Paradigm with Single Character Trials
In our earlier studies on optimal sequence length for the RSVP paradigm, we have shown that the best typing performance can be achieved when not all letters but a subset of the vocabulary is presented in each sequence [20]. The matrix SCP paradigm is closely related to the RSVP paradigm in the sense that each trial consists of a single letter and each letter is presented at most once in each sequence. Hence, we assume here that the best typing performance for SCP can be achieved with sequences of length 14, similar to the RSVP paradigm. Consequently, we assess the typing performance when optimizing sequences of length k = 14 where |Ai| = 1, and compare it to the typing performance obtained from the standard SCP paradigm, in which the entire vocabulary set is flashed in every sequence.
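When the objective is a monotone modular set function (Appendix A), choosing k single-character trials reduces to picking the k symbols with the largest individual scores. The sketch below illustrates this under one loud assumption: the per-symbol score is taken to be the current posterior probability of that symbol. The actual query function is the one defined earlier in the manuscript, and the names and posterior values here are hypothetical.

```python
import heapq


def select_query_symbols(posterior, k=14):
    """Return the k symbols with the largest posterior mass, highest first."""
    if not 0 < k <= len(posterior):
        raise ValueError("k must be between 1 and the vocabulary size")
    return heapq.nlargest(k, posterior, key=posterior.get)


def covered_mass(posterior, symbols):
    """Total posterior probability of the symbols shown in the sequence."""
    return sum(posterior[s] for s in symbols)


# Hypothetical posterior over a toy 4-symbol vocabulary.
posterior = {"A": 0.5, "B": 0.125, "C": 0.25, "D": 0.125}
sequence = select_query_symbols(posterior, k=2)
print(sequence, covered_mass(posterior, sequence))  # ['A', 'C'] 0.75
```

In a typing system the posterior would be recomputed after every sequence, so the selected subset changes as evidence accumulates.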
The results are summarized in Figures 9, 10, and 11. Figure 9 shows the scatter plot of TTD. The horizontal axis shows the TTD value for the standard SCP paradigm and the vertical axis shows the TTD when the adaptive single character presentation (ASCP) paradigm is used for typing. This figure suggests that the typing speed of the SCP paradigm can be significantly improved by optimally selecting a smaller subset of characters (P < 0.01). More interestingly, the PPC comparison of ASCP and SCP, shown in Figure 10, presents statistically significant improvements, with P < 0.008, when an optimized subset of characters is used instead of the full vocabulary set.
Figure 10.
Average probability of phrase completion with 90% confidence intervals for random and optimized sequences.
Figure 11.
Median information transfer rate for active learning based adaptive sequences vs. random sequences for SCP, per subject. The statistical difference of ITR distributions, computed by the Wilcoxon rank sum test from 20 Monte-Carlo simulations at each condition, is presented for each subject on top of the bar charts as a p value.
Also, as one expects from the TTD and PPC results of SCP, sequence optimization improves the ITR significantly for all users (see Figure 11).
Finally, the results for users' performance in an online typing scenario are compared against the results from 20 Monte-Carlo simulations in Figure 12. Typing simulations for the SCP paradigm seem to be more consistent with real experimental results. We think that this is due to the presence of the VEP component (in addition to ERP), which is more robust to nonstationarity caused by user tiredness.
Figure 12.
Typing speed analysis results. TTD and PPC are shown for SCP paradigm when sequence optimization is in use. Simulation results are used to define the shaded 90% confidence area shown. The dashed line shows the expected value from simulation for each variable and the solid line shows actual typing outcomes in a single experimental run that follows.
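The comparison between simulated and observed outcomes can be sketched as follows. This is a minimal illustration, assuming the shaded 90% area is taken as the range between the 5th and 95th percentiles of the simulated values and the dashed line as their mean. The function names are invented here and the numbers are placeholders, not study data.

```python
from statistics import mean, quantiles


def simulation_band(samples):
    """Return (5th percentile, mean, 95th percentile) of simulated values."""
    cuts = quantiles(samples, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], mean(samples), cuts[-1]


def inside_band(observed, samples):
    """True if an observed value falls inside the simulated 90% band."""
    low, _, high = simulation_band(samples)
    return low <= observed <= high


# Placeholder values standing in for 20 Monte-Carlo outcomes of one variable.
simulated = [float(v) for v in range(1, 21)]
low, expected, high = simulation_band(simulated)
print(low, expected, high)
print(inside_band(12.0, simulated))
```

`statistics.quantiles` uses the exclusive interpolation method by default, so other percentile conventions give slightly different band edges for a sample of only 20 values.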
V. Discussion and Future Work
We presented the active-RBSE framework, which utilizes active-learning concepts to optimize query sets in noninvasive visual ERP-based BCIs. This framework is a mathematical foundation for active state estimation developed based on our experimental observations in BCIs using the RSVP paradigm. We demonstrated the usefulness of active-RBSE and provided a generalization to assist matrix-based presentation paradigms, for which optimal subset selection for trials has been a question of interest recently.
For that, we used 36 supervised data sets collected from 12 healthy participants who utilized a language-model-assisted letter-by-letter typing interface with three different presentation paradigms: RSVP, SCP, and RCP. Initial assessment of the active-RBSE framework through Monte-Carlo simulations demonstrated that this framework offers a significant improvement in terms of typing speed and accuracy over well-known existing presentation paradigms. We have also experimentally validated these results with 12 participants in an online setting. We believe that this framework needs further improvements and analyses in at least three different directions.
(I) The alternative bound on the original objective function used to solve the optimization problem does not factor in overlaps between trials. This effect is mainly due to replacing the normalization factor with a fixed upper bound. In addition, we need to quantify the error introduced in the objective function value when the optimum point is estimated with this alternative bound.
(II) The objective function presented here was a particular choice based on intuition. It also corresponds to minimizing the expected value of the Rényi entropy of order infinity obtained from predicted posterior PMFs. In the future, we can optimize the queries based on well-established information theoretic concepts such as Shannon and Rényi entropy for exploration or exploitation. Moreover, we can relax the fixed sequence length assumption by allowing the algorithm to decide on query set sizes in every decision cycle.
(III) Earlier studies have shown system improvement when physiological factors, such as repetition blindness, have been accounted for [31]. This mathematical formulation allows us to consider human-in-the-loop effects both for decision making and query optimization by including physiological factors, such as repetition blindness and the effect of neighboring flashes, which will be studied in future work.
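To make the entropy measures named in direction (II) concrete, the sketch below computes the Shannon entropy and the Rényi entropy of a PMF. The Rényi entropy of order infinity equals minus the logarithm of the largest probability, so minimizing it is the same as maximizing the largest posterior probability. The function names and the example PMF are illustrative only and are not taken from the manuscript.

```python
import math


def shannon_entropy(pmf):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def renyi_entropy(pmf, alpha):
    """Renyi entropy of order alpha in bits (alpha >= 0)."""
    if alpha == 1:
        return shannon_entropy(pmf)  # limiting case as alpha -> 1
    if math.isinf(alpha):
        return -math.log2(max(pmf))  # order infinity (min-entropy)
    return math.log2(sum(p ** alpha for p in pmf if p > 0)) / (1 - alpha)


# Illustrative posterior PMF over three candidate symbols.
pmf = [0.5, 0.25, 0.25]
print(shannon_entropy(pmf))          # 1.5
print(renyi_entropy(pmf, math.inf))  # 1.0
```

The Rényi entropy is non-increasing in its order, so the order-infinity value is a lower bound on the Shannon entropy of the same PMF.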
The applicability of the active-RBSE framework is not limited to EEG-based spelling BCIs. We expect to utilize this framework not only under different sensor modalities and BCI applications, but also for many other systems which utilize recursive querying in a sticky state estimation scenario.
Acknowledgments
This work was supported by: NSF CNS-1136027, IIS-1149570; NIH 2R01DC009834–06A1; NIDRR H133E140026. A complete package containing code and data associated with this document can be found online at the Northeastern University Library Digital Repository: http://hdl.handle.net/2047/d20194049.
Appendix A
Objective function as modular monotone set function
Below, after providing relevant definitions, we prove that Q, defined in (22), is a monotone modular set function.
Definition 1.
(Discrete derivative [32])
Assume a set function f : 2^V → ℝ over a ground set V, B ⊆ V, and w ∈ V; then Δf(w|B) := f(B ∪ {w}) − f(B) is the “discrete derivative” of f at B with respect to w.
Definition 2.
(Modularity [32])
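A function f : 2^V → ℝ is “modular” if for every B1 ⊆ B2 ⊆ V and w ∈ V \ B2, Δf(w|B1) = Δf(w|B2),
or equivalently the function is “modular” if for every B1, B2 ⊆ V, f(B1 ∩ B2) + f(B1 ∪ B2) = f(B1) + f(B2).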
Lemma 1.
Take , then the function Q as defined in (22) is a modular set function.
Proof: Assume and , then
Since A ∉ Φ1 we use the definition of Q to write
Similarly as A ∉ Φ2, we have
■
Definition 3.
(Monotonicity [32])
A set function f : 2^V → ℝ is “monotone” if for every B1 ⊆ B2 ⊆ V, we get f(B1) ≤ f(B2).
Lemma 2.
Take , then the function Q as defined in (22) is a monotone set function.
Proof: Assume , and define Φ3 = Φ2 \ Φ1, then Φ3 ∪ Φ1 = Φ2 and we can write:
Moreover, Φ3 ∩ Φ1 = ∅; then, according to the definition of Q, we have
Based on our assumption, . Also due to definition, . Hence
■
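The two properties proved above can be checked numerically on a small example. The sketch below is a brute-force check for an additive set function with non-negative per-element scores, which is the generic form of a monotone modular function. The scores are hypothetical and are not the quantities in (22), and the helper names are invented for this illustration.

```python
from itertools import combinations


def discrete_derivative(f, w, subset):
    """Discrete derivative of f at subset with respect to w: f(B ∪ {w}) - f(B)."""
    base = frozenset(subset)
    return f(base | {w}) - f(base)


def all_subsets(items):
    """Yield every subset of items as a tuple."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def is_modular(f, ground, tol=1e-12):
    """True if the discrete derivative w.r.t. w is the same for every B excluding w."""
    for w in ground:
        rest = [v for v in ground if v != w]
        reference = discrete_derivative(f, w, ())
        for subset in all_subsets(rest):
            if abs(discrete_derivative(f, w, subset) - reference) > tol:
                return False
    return True


def is_monotone(f, ground, tol=1e-12):
    """True if adding an element never decreases f, i.e. f(B1) <= f(B2) for B1 ⊆ B2."""
    for w in ground:
        rest = [v for v in ground if v != w]
        for subset in all_subsets(rest):
            if discrete_derivative(f, w, subset) < -tol:
                return False
    return True


# Hypothetical non-negative per-element scores.
scores = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}


def additive(subset):
    """Additive set function: sum of the scores of the chosen elements."""
    return sum(scores[w] for w in subset)


ground = list(scores)
print(is_modular(additive, ground), is_monotone(additive, ground))  # True True
```

The check enumerates every subset of the ground set, so it is only practical for toy examples; its purpose is to make the definitions tangible, not to replace the proofs.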
Footnotes
Note that the approximation in (16) corresponds to defining a point estimate of EEG scores by calculating their mean value as computed in (17).
ITI refers to the time between the onsets of two consecutive trials.
Lower levels consist of copying letters with high probabilities according to the language model. As the level increases, the language model probabilities become increasingly adversarial. Level 3 is neutral on average.
Contributor Information
Mohammad Moghadamfalahi, Email: moghamdam@ece.neu.edu.
Murat Akcakaya, Email: akcakaya@pitt.edu.
Hooman Nezamfar, Email: nezamfar@ece.neu.edu.
Jamshid Sourati, Email: sourati@ece.neu.edu.
Deniz Erdogmus, Email: erdogmus@ece.neu.edu.
References
- [1] Akcakaya M, Peters B, Moghadamfalahi M, Mooney A, Orhan U, Oken B, Erdogmus D, and Fried-Oken M, “Noninvasive Brain Computer Interfaces for Augmentative and Alternative Communication,” Biomedical Engineering, IEEE Reviews in, vol. 7, no. 1, pp. 31–49, 2014.
- [2] Moghimi S, Kushki A, Guerguerian AM, and Chau T, “A Review of EEG-Based Brain-Computer Interfaces as Access Pathways for Individuals with Severe Disabilities,” Assistive Technology: The Official Journal of RESNA, vol. 25, no. 2, pp. 99–110, 2012.
- [3] Farwell L and Donchin E, “Talking off the top of your head: Toward a mental prosthesis utilizing event-related brain potentials,” Electroencephalography and Clinical Neurophysiology, vol. 70, pp. 510–523, 1988.
- [4] Sellers E, Schalk G, and Donchin E, “The P300 as a typing tool: tests of brain computer interface with an ALS patient,” Psychophysiology, vol. 40, p. 77, 2003.
- [5] Orhan U, Hild KE, Erdogmus D, Roark B, Oken B, and Fried-Oken M, “RSVP keyboard: An EEG based typing interface,” Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pp. 645–648, 2012.
- [6] Sellers EW, Krusienski DJ, McFarland DJ, Vaughan TM, and Wolpaw JR, “A P300 event-related potential brain-computer interface (BCI): the effects of matrix size and inter stimulus interval on performance,” Biological Psychology, vol. 73, no. 3, pp. 242–252, 2006.
- [7] Allison BZ, Pineda J, et al., “ERPs evoked by different matrix sizes: implications for a brain computer interface (BCI) system,” Neural Systems and Rehabilitation Engineering, IEEE Transactions on, vol. 11, no. 2, pp. 110–113, 2003.
- [8] Jin J, Allison BZ, Sellers EW, Brunner C, Horki P, Wang X, and Neuper C, “Optimized stimulus presentation patterns for an event-related potential EEG-based brain-computer interface,” Medical & Biological Engineering & Computing, vol. 49, no. 2, pp. 181–191, 2011.
- [9] Townsend G, LaPallo B, Boulay C, Krusienski D, Frye G, Hauser C, Schwartz N, Vaughan T, Wolpaw J, and Sellers E, “A novel P300-based brain-computer interface stimulus presentation paradigm: moving beyond rows and columns,” Clinical Neurophysiology, vol. 121, no. 7, pp. 1109–1120, 2010.
- [10] Townsend G, Shanahan J, Ryan DB, and Sellers EW, “A general P300 brain-computer interface presentation paradigm based on performance guided constraints,” Neuroscience Letters, vol. 531, no. 2, pp. 63–68, 2012.
- [11] Jin J, Sellers EW, Zhou S, Zhang Y, Wang X, and Cichocki A, “A P300 brain-computer interface based on a modification of the mismatch negativity paradigm,” International Journal of Neural Systems, vol. 25, no. 03, p. 1550011, 2015.
- [12] Li Y, Pan J, Long J, Yu T, Wang F, Yu Z, and Wu W, “Multimodal BCIs: target detection, multidimensional control, and awareness evaluation in patients with disorder of consciousness,” Proceedings of the IEEE, vol. 104, no. 2, pp. 332–352, 2016.
- [13] Yeom S-K, Fazli S, Muller K-R, and Lee S-W, “An efficient ERP-based brain-computer interface using random set presentation and face familiarity,” PloS One, vol. 9, no. 11, p. e111157, 2014.
- [14] Treder MS and Blankertz B, “(C)overt attention and visual speller design in an ERP-based brain-computer interface,” Behavioral & Brain Functions, vol. 6, 2010.
- [15] Acqualagna L, Treder MS, Schreuder M, and Blankertz B, “A novel brain-computer interface based on the rapid serial visual presentation paradigm,” in Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE. IEEE, 2010, pp. 2686–2689.
- [16] Acqualagna L and Blankertz B, “Gaze-independent BCI-spelling using rapid serial visual presentation (RSVP),” Clinical Neurophysiology, vol. 124, no. 5, pp. 901–908, 2013.
- [17] Orhan U, Erdogmus D, Roark B, Oken B, and Fried-Oken M, “Offline analysis of context contribution to ERP-based typing BCI performance,” Journal of Neural Engineering, vol. 10, no. 6, p. 066003, 2013.
- [18] Orhan U, Erdogmus D, Roark B, Oken B, Purwar S, Hild KE, Fowler A, and Fried-Oken M, “Improved accuracy using recursive Bayesian estimation based language model fusion in ERP-based BCI typing systems,” in Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE. IEEE, 2012, pp. 2497–2500.
- [19] Moghadamfalahi M, Orhan U, Akcakaya M, Nezamfar H, Fried-Oken M, and Erdogmus D, “Language-Model Assisted Brain Computer Interface for Typing: A Comparison of Matrix and Rapid Serial Visual Presentation,” Neural Systems and Rehabilitation Engineering, IEEE Transactions on, vol. 23, no. 5, pp. 910–920, September 2015.
- [20] Moghadamfalahi M, Gonzalez-Navarro P, Akcakaya M, Orhan U, and Erdogmus D, “The Effect of Limiting Trial Count in Context Aware BCIs: A Case Study with Language Model Assisted Spelling,” in Foundations of Augmented Cognition. Springer, 2015, pp. 281–292.
- [21] Roark B, Villiers JD, Gibbons C, and Fried-Oken M, “Scanning methods and language modeling for binary switch typing,” in Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies. Association for Computational Linguistics, 2010, pp. 28–36.
- [22] Faul S, Gregorcic G, Boylan G, Marnane W, Lightbody G, and Connolly S, “Gaussian process modeling of EEG for the detection of neonatal seizures,” IEEE Transactions on Biomedical Engineering, vol. 54, no. 12, pp. 2151–2162, 2007.
- [23] Zhong M, Lotte F, Girolami M, and Lécuyer A, “Classifying EEG for brain computer interfaces using Gaussian processes,” Pattern Recognition Letters, vol. 29, no. 3, pp. 354–359, 2008.
- [24] Friedman JH, “Regularized discriminant analysis,” Journal of the American Statistical Association, vol. 84, no. 405, pp. 165–175, 1989.
- [25] Silverman BW, Density estimation for statistics and data analysis. CRC Press, 1986, vol. 26.
- [26] Kay S, “Fundamentals of Statistical Signal Processing, Volume II: Detection Theory.” 2008.
- [27] Nemhauser GL, Wolsey LA, and Fisher ML, “An analysis of approximations for maximizing submodular set functions-I,” Mathematical Programming, vol. 14, no. 1, pp. 265–294, 1978.
- [28] Orhan U, “RSVP keyboard: An EEG based BCI typing system with context information fusion,” Ph.D. dissertation, Northeastern University, 2013.
- [29] Orhan U, Nezamfar H, Akcakaya M, Erdogmus D, Higger M, Moghadamfalahi M, Fowler A, Roark B, Oken B, and Fried-Oken M, “Probabilistic simulation framework for EEG-based BCI design,” Brain-Computer Interfaces, pp. 1–15, 2016.
- [30] Chennu S, Alsufyani A, Filetti M, Owen AM, and Bowman H, “The cost of space independence in P300-BCI spellers,” Journal of Neuroengineering and Rehabilitation, vol. 10, no. 82, pp. 1–13, 2013.
- [31] Verhoeven T, Buteneers P, Wiersema J, Dambre J, and Kindermans P, “Towards a symbiotic brain-computer interface: exploring the application-decoder interaction,” Journal of Neural Engineering, vol. 12, no. 6, p. 066027, 2015.
- [32] Krause A and Golovin D, “Submodular function maximization,” Tractability: Practical Approaches to Hard Problems, vol. 3, p. 19, 2012.













