Author manuscript; available in PMC: 2023 Dec 14.
Published in final edited form as: Physiol Meas. 2022 Dec 14;43(12):10.1088/1361-6579/aca6ca. doi: 10.1088/1361-6579/aca6ca

Seizure Forecasting using Machine Learning Models Trained by Seizure Diaries

Ezequiel Gleichgerrcht 1, Mircea Dumitru 2, David A Hartmann 3, Brent C Munsell 4, Ruben Kuzniecky 5, Leonardo Bonilha 6, Reza Sameni 2
PMCID: PMC9940727  NIHMSID: NIHMS1871064  PMID: 36541513

Abstract

Objectives:

People with refractory epilepsy are overwhelmed by the uncertainty of their next seizures. Accurate prediction of future seizures could greatly improve the quality of life for these patients. New evidence suggests that seizure occurrences can have cyclical patterns for some patients. Although these cyclic patterns are not intuitive, machine learning (ML) can detect them and thereby help distinguish patients with predictable seizure patterns from those with unpredictable ones.

Approach:

Self-reported seizure logs of 153 patients from the Human Epilepsy Project with more than three reported seizures (totaling 8,337 seizures) were used to obtain inter-seizure interval time-series for training and evaluation of the forecasting models. Two classes of prediction methods were studied: 1) statistical approaches using Bayesian fusion of population-wise and individual-wise seizure patterns; and 2) ML-based algorithms including least squares, least absolute shrinkage and selection operator, support vector machine (SVM) regression, and long short-term memory regression. Leave-one-person-out cross-validation was used for training and evaluation, by training on seizure diaries of all except one subject and testing on the left-out subject.

Main Results:

The leading forecasting models were the SVM regression and a statistical model that combined the median of population-wise seizure time-intervals with a test subject’s prior seizure intervals. SVM was able to forecast 50%, 70%, 81%, 84%, and 87% of seizures of unseen subjects within 0, 1, 2, 3, and 4 days of mean absolute forecasting error, respectively. The subject-wise performances show that patients with more frequent seizures were generally better predicted.

Significance:

ML models can leverage non-random patterns within self-reported seizure diaries to forecast future seizures. While diary-based seizure forecasting alone is only one of many aspects of clinical care of patients with epilepsy, studying the level of predictability across seizures and patients paves the path towards a better understanding of predictable vs unpredictable seizures on individualized and population-wise bases.

1. Introduction

Over 30–40% of patients with epilepsy continue to have seizures despite anti-seizure medications [1]. One of the most impairing aspects of drug-refractory epilepsy is the unpredictability of seizures [2], causing many individuals to live in apprehension of when their next seizure will occur [3], a phenomenon described as “seizure worry.” For this reason, the ability to predict seizures could greatly improve the quality of life for patients with refractory epilepsy. If accurate seizure forecasting were possible, patients could attain better control of their lives by planning activities based on forecast seizures and/or perhaps taking additional ‘rescue’ anti-seizure medications when ictal events are predicted to occur [4]. Furthermore, accurate seizure prediction would allow epileptologists to develop treatment plans tailored to impending seizure burden, as well as increase the yield of scheduled admissions to the epilepsy monitoring unit [5].

Recent studies have confirmed classic clinical observations that, at least for some patients, seizures tend to occur in cyclic patterns at timescales that vary between individuals and can range from hours to weeks [6, 7, 8, 9, 10]. The concept of seizure periodicity dates back to ancient Babylon [8], with supporting data coming from the turn of the 20th century showing that many patients’ seizures followed individualized schedules, occurring around the same time of day [11] with a seemingly consistent number of days between seizures [12]. Recent studies with electroencephalogram (EEG) recordings [13, 14] and patient-reported seizure diaries [7, 15] have also found that many patients’ seizures exhibit individualized rhythms. Furthermore, interictal epileptiform discharges detected with implanted electrodes exhibit a periodicity that relates to seizure events, providing further evidence that seizures follow non-random biologic cycles of neural excitation [6, 16, 17, 18]. Spontaneous [19, 20] and induced [21, 22] animal models of epilepsy also exhibit cycles in their seizure rate. Altogether, seizure susceptibility for some patients may be modulated by pacemakers that operate on timescales from hours to weeks, which remain to be identified.

Therefore, if seizures exhibit some degree of periodicity, could someone’s seizure regularity be leveraged to predict their next seizure? [23] Many studies have accurately predicted seizures minutes before they occur using subtle changes in continuous scalp EEG with a standard 10–20 scalp electrode array [24, 25]. Implanted electrodes [26] and responsive neurostimulation (RNS) devices [17] have also been used to create seizure forecasts hours to days in advance with accuracy far better than chance. Using EEG to predict seizures has great potential [27], but its current practical implementation is limited, because long-term, continuous EEG recordings are costly to obtain and analyze, and only a small proportion of patients with epilepsy have implanted electrodes. Importantly, Karoly, et al. recently showed that seizures could be accurately predicted using patient logs of clinical seizures, suggesting that EEG might not be necessary for predicting seizures in many patients [9]. In their study, pseudo-prospective predictions based on an individual’s fast (hours) and slow (days-weeks) seizure cycles achieved remarkable accuracy, with nearly 70% of seizures occurring during the 15% of the time when patients were in a predicted state of high seizure probability. Improvements in their model, such as allowing for more subtle cycles than the dominant ‘fast’ and ‘slow’ components, and providing predictions days in advance rather than minutes or hours, might provide more accurate and clinically useful results [9]. In these studies, a large proportion of patients had predictable seizures [17, 23]. Even if seizure forecasting were not accurate for all individuals (e.g., due to lack of cyclicality in their events), accurate seizure prediction could still have a tremendous impact among those patients in whom it is accurate, and could help in categorizing patients with predictable vs unpredictable seizures to better understand the underlying mechanisms of epilepsy.

Machine learning (ML) approaches can be used for seizure prediction. In particular, these approaches can be used for time-series forecasting using seizure diary logs, and can be extended to account for other features such as patient age and epilepsy type [28]. In this study, we use several statistical and ML forecasting models to evaluate if self-reported seizure occurrences of outpatients could be predicted for some individuals using seizure diaries collected as part of the Human Epilepsy Project [29]. By using a leave-one-person-out cross validation scheme, we assess how accurately ML models trained on the seizure diaries of a population of patients with epilepsy can forecast the future seizures of a new patient, assuming that one has access to the k (k ≥ 1) most recent seizure diary entries of the patient under test. The proposed forecasting algorithms are designed as proofs of concept for real-world scenarios. For example, if a new patient with a given seizure diary were admitted to a healthcare facility, one could forecast the most likely date of the patient’s future seizure, and depending on the actual outcome versus the forecast date, one may assess the patient’s level of seizure predictability. The results demonstrate that seizures of a large proportion of patients can be predicted using the proposed models, within a few days from the actual seizure dates. Numerous factors impact seizure outbreaks, and forecasting seizures based on seizure logs alone is evidently only one of many clinical aspects for relieving patients from seizure worries (as confirmed by our results and the existing literature on the predictability of seizures for inpatients). Nonetheless, the results provide the performance bounds of diary-based-only seizure forecasting, which can be used as benchmarks for forecasting perceived and unperceived seizures using biosignals such as EEG or RNS-based systems. Our framework also enables categorizing predictable vs unpredictable patients and seizure outbreaks, providing a better understanding of epilepsy at population-wise and individual-wise levels.

In Section 2, the seizure diaries dataset is described. In Section 3, we formulate the problem and detail the proposed ML and statistical seizure forecasting methods. The seizure forecasting results are presented and discussed in Sections 4 and 5, followed by concluding remarks and future perspectives in Section 6.

2. Dataset

Data for this study were obtained in de-identified format from the Human Epilepsy Project (HEP), a multi-site observational study of adult patients with focal epilepsy (ClinicalTrials.gov; NCT02126774). Patients who volunteered for the study were all treated for focal epilepsy by an epileptologist based on a diagnosis established clinically either based on a history of two seizures occurring more than twenty-four hours apart, or a single seizure with an abnormal MRI or EEG in the twelve months preceding enrollment. The subjects were newly diagnosed patients who were actively being managed for epilepsy. The participants were asked to track their seizures, symptoms, and medications on “My Epilepsy Diary,” a self-management Web-based service for seizure tracking, developed by Irody, Inc. For each seizure, participants entered the time and date of the event, an approximate estimation of its duration, and a label of the event based on each participant’s unique seizure type. Data were available to us for seizures occurring between November 2012 and July 2018. We combined all seizure types reported by each patient, and we did not utilize the time of seizure or the duration of each seizure to reduce subjective biases in reporting perceived seizures. The obtained data did not include demographics.

For the current study, the data was abstracted in the form of a table of subject IDs and seizure dates, and the days with one or more seizures were coded as one entry (i.e., multiple seizures per day are counted as one seizure). In total, there were seizure logs from 243 patients, with inter-seizure gaps ranging from zero to 1694 days, where zero corresponds to multiple seizures in one day.

3. Methods

3.1. Data model and problem formulation

We use the seizure diaries to derive subject-wise seizure interval time-series. Suppose that a patient has reported a total number of $n$ seizure occurrences at dates $d_1, d_2, \ldots, d_n$, sorted in chronological order of seizure dates. The objective is to design ML algorithms to obtain $\hat{d}_{n+1}$, which is an estimate of the next seizure’s actual date $d_{n+1}$. The design objective is to minimize

$$e_n = \left| d_{n+1} - \hat{d}_{n+1} \right| \tag{1}$$

i.e., the absolute error (in days) between the actual and estimated seizure dates on an average basis (mean or median, as used in the sequel). In our proposed model, we do not consider date-dependent cyclicalities, e.g., over years or seasons, months, weekdays, etc. Therefore, we simplify the data model by abstracting out the actual seizure dates and focusing on seizure time gaps. To this end, for each patient, we use the reported seizure dates to calculate sequences of seizure-intervals (in days):

$$\delta_1 = (d_2 - d_1), \quad \delta_2 = (d_3 - d_2), \quad \ldots, \quad \delta_{n-1} = (d_n - d_{n-1}) \tag{2}$$

Now, the objective is to estimate $\hat{\delta}_n$, which is an estimate of the next seizure-interval $\delta_n$. In this scheme, the next seizure date can be found from:

$$\hat{d}_{n+1} = \hat{\delta}_n + d_n. \tag{3}$$

The seizure-intervals approach is a simplified formulation for the actual seizure-dates estimation problem; because the actual date information (and any potential seasonal or annual cyclicality associated to the exact dates) is neglected in the seizure-intervals, and only the time-gap between successive seizures is used to design the seizure forecasting algorithms. Training algorithms with this scheme has the advantage of being generalizable to other subjects and future seizure dates, independent of the actual seizure dates. Therefore, in the sequel, our working hypothesis is the seizure time-interval forecasting model.
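The conversion in (2) and the date reconstruction in (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the dates are those of the sample diary in Table 1, and the 7-day forecast is an arbitrary placeholder value.

```python
from datetime import date, timedelta

def seizure_intervals(dates):
    """Day gaps between consecutive seizure dates, as in Eq. (2)."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

def next_seizure_date(last_date, forecast_interval_days):
    """Eq. (3): forecast date = last seizure date + forecast interval."""
    return last_date + timedelta(days=forecast_interval_days)

# Seizure dates of the sample diary in Table 1.
diary = [date(2016, 1, 15), date(2016, 1, 17), date(2016, 1, 23),
         date(2016, 2, 3), date(2016, 2, 17), date(2016, 2, 18),
         date(2016, 2, 27)]

intervals = seizure_intervals(diary)
# Hypothetical forecast of 7 days after the last reported seizure.
forecast = next_seizure_date(diary[-1], 7)
print(intervals, forecast)
```

Because only the gaps are kept, shifting every date in the diary by the same offset leaves `intervals` unchanged, which is the time-offset invariance the formulation relies on.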

In order to assess and compare the forecasting capacity of different models, for each subject, we define

$$\Delta_k = [\delta_{n-k}, \delta_{n-k+1}, \ldots, \delta_{n-1}], \tag{4}$$

which is the seizure-intervals sequence truncated to the last $k$ seizures. We refer to $k$ ($k \geq 1$) as the look-back window. Intuitively, using a patient’s entire seizure history should result in the most accurate estimate of $\hat{d}_{n+1}$ (the maximum $k$ per patient). We will assess this hypothesis by considering the impact of the look-back window length as a parameter in the forecast models.

In the available dataset, the time of the day at which the seizure occurred was not available. Therefore, we acknowledge the fact that due to the day-wise resolution of the seizure dates, there is an inevitable rounding error in the seizure time intervals. For example, consecutive seizures occurring at 6 am and 10 pm, with sixteen hours of time gap, count as two seizures in the same day, while consecutive seizures occurring at 11 pm and 1 am, with only two hours of time gap, are reported in two days. This results in an inevitable plus/minus half-a-day of uncertainty in the seizure date estimates, which our forecasting algorithms cannot resolve.
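The two cases in this example can be reproduced with a short Python sketch. The timestamps are hypothetical (the dataset contains no times of day), and `day_gap` is an illustrative helper name, not something taken from the study.

```python
from datetime import datetime

def day_gap(first, second):
    """Gap in whole calendar days, ignoring the time of day."""
    return (second.date() - first.date()).days

# 6 am and 10 pm on the same day: 16 hours apart, but a 0-day gap.
same_day = day_gap(datetime(2016, 1, 15, 6), datetime(2016, 1, 15, 22))

# 11 pm and 1 am the next day: 2 hours apart, but a 1-day gap.
overnight = day_gap(datetime(2016, 1, 15, 23), datetime(2016, 1, 16, 1))

print(same_day, overnight)
```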

3.2. Data preparation

The subject seizure dates data introduced in Section 2 is used to calculate $\Delta_k$ from (4), for all 243 subjects and for all $n$ (ranging from the first to the last seizure index of each patient). By removing seizures with more than 60 days of gap and the subjects with fewer than 3 seizures, 153 subjects with a total number of 8,337 seizures remain that are used for processing. The look-back window length $k$ is a hyper-parameter that we sweep from 1 to 21, to investigate the impact of seizure look-back history on the forecasting performance. For each patient, the look-back values of the seizure time gaps corresponding to $n < k$ are left-padded with the first available seizure time-gap, to compensate for missing leading values and for subjects who have fewer seizure reports (smaller than the look-back window length $k$). An example of self-reported seizure dates and the corresponding seizure time gaps is shown in Table 1.

Table 1.

A sample seizure diary report and the corresponding seizure time-intervals used for seizure forecasting. The variable terms denote unknown values to be estimated.

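| $n$ | Seizure date ($d_n$) | $\delta_n$ (days) | $\Delta_1$ | $\Delta_2$ | $\Delta_3$ |
|---|---|---|---|---|---|
| 1 | Jan 15, 2016 | 2 | 2 | [2, 2] | [2, 2, 2] |
| 2 | Jan 17, 2016 | 6 | 6 | [2, 6] | [2, 2, 6] |
| 3 | Jan 23, 2016 | 11 | 11 | [6, 11] | [2, 6, 11] |
| 4 | Feb 3, 2016 | 14 | 14 | [11, 14] | [6, 11, 14] |
| 5 | Feb 17, 2016 | 1 | 1 | [14, 1] | [11, 14, 1] |
| 6 | Feb 18, 2016 | 9 | 9 | [1, 9] | [14, 1, 9] |
| 7 | Feb 27, 2016 | $\delta_7$ | $\delta_7$ | [9, $\delta_7$] | [1, 9, $\delta_7$] |
| 8 | $d_8$ | $\delta_8$ | $\delta_8$ | [$\delta_7$, $\delta_8$] | [9, $\delta_7$, $\delta_8$] |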
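The look-back columns of Table 1, including the left-padding with the first available interval, can be reproduced with the following Python sketch. The helper name `lookback_window` and its 0-based `end` index are illustrative choices; following the table's layout, the window for a given row ends at that row's own interval.

```python
def lookback_window(intervals, end, k):
    """Last k intervals up to index `end` (0-based, inclusive),
    left-padded with the first interval when fewer than k exist."""
    window = intervals[max(0, end - k + 1): end + 1]
    padding = [intervals[0]] * (k - len(window))
    return padding + window

# Known seizure intervals (days) from Table 1.
intervals = [2, 6, 11, 14, 1, 9]

row1_k3 = lookback_window(intervals, end=0, k=3)  # padded twice with the first gap
row2_k3 = lookback_window(intervals, end=1, k=3)  # padded once
row6_k3 = lookback_window(intervals, end=5, k=3)  # no padding needed
print(row1_k3, row2_k3, row6_k3)
```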

A leave-one-person-out scheme is used for cross validation. Accordingly, the seizure diaries of all except one subject are used for training the forecasting algorithms, and the trained models are applied to the left-out subject for validation. In the test phase, we forecast every single seizure of the test subject, given a vector of the last k seizures of the subject and the ML models trained on the population from the seizure diaries of other subjects. This guarantees the generalizability of the models to unseen subjects.
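A minimal Python sketch of the leave-one-person-out split is given below. The subject IDs and interval values are made up for illustration, and the generator name is hypothetical; the point is only that the held-out subject never contributes to the training set of its own fold.

```python
def leave_one_person_out(diaries):
    """Yield (test_id, training_diaries, test_intervals), holding out
    each subject exactly once."""
    for test_id, test_intervals in diaries.items():
        training = {sid: iv for sid, iv in diaries.items() if sid != test_id}
        yield test_id, training, test_intervals

# Hypothetical seizure-interval diaries (days) keyed by subject ID.
diaries = {"A": [2, 6, 11, 14], "B": [3, 3, 4], "C": [1, 9, 7]}

folds = list(leave_one_person_out(diaries))
for test_id, training, test_intervals in folds:
    print(test_id, sorted(training), test_intervals)
```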

In ML terminology, the application of interest is a regression problem. We use the following notations in the sequel:

  • $\Delta_{\text{train}}^{k}$: the set of all seizure time-intervals of the training set for a look-back window length $k$ used as training predictors;

  • $\delta_{\text{train}}$: the set of seizure time-intervals of the training set, used as training predictands;

  • $\Delta_{\text{test}}^{k}$: the set of previous seizure time-intervals of the subject under test for look-back window $k$ (test subject predictor);

  • $\hat{\delta}$: the forecast seizure time-intervals of the subject under test (test subject predictand).

The distribution of the inter-seizure time-intervals $\delta_n = d_{n+1} - d_n$ across all the successive seizure dates of the entire population is depicted in Fig. 1. Accordingly, about 85% of seizures occur within 10 days of the preceding seizure, and more than 90% of them occur within 20 days of the preceding seizure. The algorithms proposed in the sequel are generalizations of such statistical inferences via sequences of individual seizure time-intervals by using ML and Bayesian estimation models, resulting in more accurate and fine-grained individualized forecasts of future seizure occurrence dates.

Figure 1.


The seizure interval time-series per patient (top), the histogram (bottom left) and the cumulative distribution (bottom right) of the seizure intervals for 8,337 seizure events from 153 patients with epilepsy, for seizure intervals up to 60 days. Statistically, about 85% of seizures occur within 10 days of the preceding seizure, and more than 90% of them occur within 20 days of the preceding seizure. The proposed algorithms are generalizations of such statistical inferences via sequences of individual seizure time-intervals by using ML algorithms. The regularity of inter-seizure intervals makes them predictable (in the stochastic sense). In the top panel, the number of seizures per subject has been truncated to 200 seizures, seizure intervals above 60 days have been omitted (replaced with not-a-number values), and the subject-wise time-series have been vertically offset for better visualization.

3.3. Statistical forecasting models

We start by proposing a set of ad hoc statistical forecasting models, which combine population-wise and individual-wise seizure diaries, using average statistics.

3.3.1. Population-wise forecasting (non-individualized)

If the seizure diaries of a test patient are not available or accurate, a naive estimator for the most likely future seizure date is to use the average population-wise inter-seizure intervals (cf. Fig. 1), i.e.

$$\hat{\delta}_{\text{mn\_train}} = \operatorname{mean}(\Delta_{\text{train}}^{k}) \tag{5}$$

$$\hat{\delta}_{\text{md\_train}} = \operatorname{median}(\Delta_{\text{train}}^{k}) \tag{6}$$

which are the mean and median of the seizure-intervals across the training population, respectively. Notice that in (5) and (6), the seizure history of the subject of interest $\Delta_{\text{test}}^{k}$ has not been considered and only the population-wise statistics are used. Therefore, there is no patient-wise individualization.

3.3.2. Individual-wise forecasting

Another basic seizure-interval estimator is to only use the test subject’s prior seizure history over the look-back window, regardless of the training population’s seizure-interval distribution. This results in the following estimators, which are totally individualized:

$$\hat{\delta}_{\text{mn\_test}} = \operatorname{mean}(\Delta_{\text{test}}^{k}) \tag{7}$$

$$\hat{\delta}_{\text{md\_test}} = \operatorname{median}(\Delta_{\text{test}}^{k}) \tag{8}$$
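The four baseline estimators in (5) to (8) amount to means and medians over two different sets of intervals. The following Python sketch uses made-up interval values purely for illustration; the variable names are not from the study.

```python
from statistics import mean, median

# Hypothetical pooled seizure intervals (days) of the training population.
train_intervals = [2, 6, 11, 14, 1, 9, 3, 3, 4]
# Hypothetical look-back window (days) of the subject under test.
test_window = [5, 7, 6]

# Population-wise estimators, Eqs. (5) and (6): ignore the test subject.
mn_train = mean(train_intervals)
md_train = median(train_intervals)

# Individual-wise estimators, Eqs. (7) and (8): ignore the population.
mn_test = mean(test_window)
md_test = median(test_window)

print(mn_train, md_train, mn_test, md_test)
```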

3.3.3. Weighted population-individual-wise forecasting

The population-wise and individual-wise estimators proposed in Sections 3.3.1 and 3.3.2 are extreme cases. The former discards individual-wise history and the latter discards population-level prior information. One way of aggregating both the individual-wise and the population-wise data is via inverse-variance weighting, which is a common data fusion method in Bayesian estimation schemes. Accordingly, we define:

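$$\hat{\delta}_{\text{weighted\_mn}} = \frac{\sigma_p^2\, \hat{\delta}_{\text{mn\_test}} + \sigma_i^2\, \hat{\delta}_{\text{mn\_train}}}{\sigma_p^2 + \sigma_i^2} \tag{9}$$

where $\sigma_p^2 \triangleq V(\Delta_{\text{train}}^{k})$ and $\sigma_i^2 \triangleq V(\Delta_{\text{test}}^{k})$ are the variances of the training set (the subscript $p$ denoting population-wise) and test set (the subscript $i$ denoting individual-wise) seizure-intervals, respectively. By construction, $\hat{\delta}_{\text{weighted\_mn}}$ is always between $\hat{\delta}_{\text{mn\_test}}$ and $\hat{\delta}_{\text{mn\_train}}$, and it is closer to the one with the smaller variance. Therefore, if a subject’s prior seizures are very regular, the seizure time-intervals tend to a constant ($\sigma_i^2$ is very small), and (9) anticipates the next seizure to continue the same average seizure-interval pattern, i.e., $\hat{\delta}_{\text{weighted\_mn}} \approx \hat{\delta}_{\text{mn\_test}}$; but if a subject’s seizures are highly irregular, $\sigma_i^2$ becomes very large and the forecast tends towards the population-wise average seizure-gap, i.e., $\hat{\delta}_{\text{weighted\_mn}} \approx \hat{\delta}_{\text{mn\_train}}$.

A similar estimate is a weighted average of the median values, which can be more robust to outliers as compared with (9):

$$\hat{\delta}_{\text{weighted\_md}} = \frac{\sigma_p^2\, \hat{\delta}_{\text{md\_test}} + \sigma_i^2\, \hat{\delta}_{\text{md\_train}}}{\sigma_p^2 + \sigma_i^2} \tag{10}$$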
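The behavior of the weighted estimators at the two extremes (very regular versus very irregular individual histories) can be checked numerically. The Python sketch below implements the weighting of (9) and (10) on made-up data; the function name `fuse`, the use of the population variance (`pvariance`), and the tie-break when both variances are zero are assumptions, since the text does not specify them.

```python
from statistics import median, pvariance

def fuse(individual_est, population_est, var_individual, var_population):
    """Inverse-variance weighting as in Eqs. (9) and (10)."""
    total = var_population + var_individual
    if total == 0:
        # Assumption: both sets are constant, so fall back to the individual estimate.
        return individual_est
    return (var_population * individual_est
            + var_individual * population_est) / total

# Hypothetical pooled training intervals and two hypothetical test subjects (days).
train = [1, 2, 3, 3, 4, 6, 9, 11, 14]
regular = [7, 7, 7]      # zero individual variance
irregular = [1, 30, 2]   # large individual variance

wmd_regular = fuse(median(regular), median(train),
                   pvariance(regular), pvariance(train))
wmd_irregular = fuse(median(irregular), median(train),
                     pvariance(irregular), pvariance(train))
print(wmd_regular, wmd_irregular)
```

With these values, the regular subject's forecast stays at the individual median of 7 days, while the irregular subject's forecast is pulled from the individual median (2 days) most of the way to the population median (4 days).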

3.4. Regression-based forecasting models

The estimators proposed in (5) to (10) were based on an average (mean or median) of individual-wise and population-wise seizure time intervals. From an estimation theoretical standpoint, the supporting hypothesis for these estimators is that the average seizure time-intervals may be the minimum sufficient statistics [30, Ch. 5.3] for seizure forecasting. This happens when there is no additional temporal structure in the seizure diaries (beyond the first-order time intervals), which could be used to forecast future seizures. We assess this hypothesis by comparing the statistical estimators detailed in Section 3.3 with ML-based regression models, which can potentially use longer (more than first-order) temporal structure of the seizure diaries. The order of the look-back window k will also be considered within the minimum sufficient statistics context.

Four ML-based regression algorithms are introduced in the sequel.

3.4.1. Least Squares (LS)

LS regression presumes a linear relationship between the seizure-intervals in the look-back window. Therefore, it can potentially capture beyond order-one dependencies in the seizure-time intervals. The LS regression coefficients $\alpha$ are obtained by solving $\alpha^* = \arg\min_{\alpha} \left\| \delta_{\text{train}} - \alpha\, \Delta_{\text{train}}^{k} \right\|$ and applied to the test subject as follows:

$$\hat{\delta}_{\text{LS}} = \alpha^* \Delta_{\text{test}}^{k} \tag{11}$$
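For intuition, the LS fit has a simple closed form in the special case of a look-back window of k = 1 with no intercept: the single coefficient is the ratio of the cross-product sum to the sum of squared predictors. The Python sketch below covers only this special case, on made-up training pairs; it is not the general multi-lag solver used for larger k, and the function name is hypothetical.

```python
def fit_ls_k1(prev_intervals, next_intervals):
    """Slope alpha minimizing sum((y - alpha * x)**2), without intercept."""
    sxy = sum(x * y for x, y in zip(prev_intervals, next_intervals))
    sxx = sum(x * x for x in prev_intervals)
    return sxy / sxx

# Hypothetical training pairs: previous interval -> next interval (days).
x_train = [2, 6, 11, 14, 1]
y_train = [6, 11, 14, 1, 9]

alpha = fit_ls_k1(x_train, y_train)
# Forecast for a test subject whose most recent interval was 9 days.
forecast = alpha * 9
print(alpha, forecast)
```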

3.4.2. Least absolute shrinkage and selection operator (LASSO)

We may hypothesize that while seizure time-intervals generally follow predictable patterns, there are also outliers that do not follow the general trend. Excluding such outliers may potentially improve the estimation quality as compared with ordinary LS. The LASSO regression is an estimator that can be used to assess this hypothesis. The LASSO cost function is a constrained version of the LS:

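$$[\beta_0^*, \beta_1^*] = \arg\min_{\beta_0, \beta_1} \left\| \delta_{\text{train}} - \beta_0 - \beta_1 \Delta_{\text{train}}^{k} \right\| + \lambda \left\| \beta_1 \right\|_1, \tag{12}$$

resulting in the following estimator on the test subject:

$$\hat{\delta}_{\text{LASSO}} = \beta_0^* + \beta_1^* \Delta_{\text{test}}^{k} \tag{13}$$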

In our implementations, the LASSO parameters were optimized via ten-fold cross-validation on the training set.

3.4.3. Support Vector Machine (SVM) regression

An SVM regression with a radial basis function (RBF) kernel is another regression scheme that we used for evaluation. We used the fitrsvm function in MATLAB, with a heuristic kernel scaling based on predictor subsampling (selected by the KernelScale parameter of fitrsvm). The SVM estimator is denoted as $\hat{\delta}_{\text{SVM}}$, for later reference.

3.4.4. Long short-term memory (LSTM) regression

The LSTM is a powerful state-of-the-art tool for regression. It takes into account potential linear/nonlinear temporal dependencies within time-series. We use a deep neural network architecture with the following layers: 1) a sequential input layer of order k (the same as the predictor feature vector length), 2) an LSTM layer with 25 hidden units, 3) a fully connected layer with a single output, 4) a Rectified Linear Unit (ReLU). The Adam optimizer is used to train the network, with 1000 maximum training epochs, gradient threshold of 1, initial learning rate of 0.05, tanh(·) state activation function, sigmoid(·) gate activation function, and a 10% learning rate drop rate applied every 50 training epochs.

4. Results

Fig. 2 compares the forecast accuracy of the regression-based forecasts and the statistical forecasts. The vertical axis corresponds to the cumulative distribution of the number of seizures predicted correctly up to the mean absolute forecasting error (in days) indicated on the horizontal axis. For example, a vertical value of 0.8 at 3 days indicates that 80% of the total number of the test set seizures were correctly estimated within an average of plus/minus three days of the actual seizure (using the leave-one-person-out cross validation scheme), and a value of 0.9 at 4 days indicates that 90% of the seizures are correctly predicted within four days of error (0 to 4 days). From the leading model performances in Fig. 2, for instance, if the model predicts a seizure on January 4th, there is approximately a 75% chance that the actual seizure will occur from Jan 3rd to Jan 5th, and approximately an 85% chance that the actual seizure will occur from Jan 1st to Jan 7th. The results correspond to a look-back window of up to 21 seizures per patient (whenever available). As shown in Fig. 2, the SVM regression ($\hat{\delta}_{\text{SVM}}$) and the weighted median (WMD) of the population-wise and individual-wise medians ($\hat{\delta}_{\text{weighted\_md}}$) are the algorithms that have the best performance within three days of mean absolute error. We study the performance of these two methods in further detail.
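The way such a cumulative accuracy curve is read can be illustrated with a small Python sketch. The absolute errors below are made up, and the function name is hypothetical; the sketch also assumes that errors are whole days, as implied by the day-wise resolution of the diaries.

```python
def accuracy_within(abs_errors, max_days):
    """Fraction of forecasts with an absolute error of at most max_days."""
    return sum(err <= max_days for err in abs_errors) / len(abs_errors)

# Hypothetical absolute forecasting errors (days) for ten test seizures.
abs_errors = [0, 0, 1, 1, 2, 3, 5, 8, 0, 1]

# One point of the cumulative curve per error tolerance from 0 to 4 days.
curve = [accuracy_within(abs_errors, days) for days in range(5)]
print(curve)
```

In this toy example, the value at 3 days is 0.8, meaning that 80% of the forecasts fall within three days of the actual seizure date.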

Figure 2.


Forecasting accuracy of the studied algorithms vs the mean absolute forecasting error in days. MEAN_TRAIN corresponds to the training set (population-wise) mean defined in (5), MEDIAN_TRAIN is the population-wise median defined in (6), MEAN_TEST is the test set (individual-wise) mean defined in (7), MEDIAN_TEST is the individual-wise median defined in (8), WEIGHTED_MEAN is the weighted average between the population-wise and individual-wise means defined in (9), WEIGHTED_MEDIAN is the weighted average between the population-wise and individual-wise medians defined in (10), LS is the least squares estimator defined in (11), SVM is support vector machine regression, LASSO is least absolute shrinkage and selection operator regression defined in (13), and LSTM denotes long short-term memory regression.

Fig. 3 demonstrates the impact of the look-back window length on WMD and SVM regression performances. Accordingly, increasing the look-back window length generally improves the performances of both methods. The distribution of the forecasting error for the WMD and SVM methods with a look-back window of 21 seizures is shown in Fig. 4, which is skewed towards positive errors, i.e., the forecasting algorithms tend to predict the seizures earlier than their actual dates (on average).

Figure 3.

Forecasting accuracy vs the mean absolute errors for a) the weighted median (WMD) of the training and test sets ($\hat{\delta}_{\text{weighted\_md}}$), and b) SVM regression ($\hat{\delta}_{\text{SVM}}$), for different look-back window values ranging from 1 to 20. For both methods, higher look-back orders improve the forecasting performances.

Figure 4.

Distribution of forecasting mean absolute errors for a) the weighted median (WMD) of the training and test sets ($\hat{\delta}_{\text{weighted\_md}}$), and b) SVM regression ($\hat{\delta}_{\text{SVM}}$) for a look-back window of twenty seizures.

Different subjects are expected to demonstrate different levels of seizure predictability. Fig. 5 demonstrates the subject-wise performances of the WMD ($\hat{\delta}_{\text{weighted\_md}}$) and SVM regression ($\hat{\delta}_{\text{SVM}}$) methods. The subjects are sorted in order of the mean absolute error (MAE) of the seizure forecasting errors per subject (blue bars), overlapped with the median absolute forecasting error (MED-AE) per subject (in red). The practical incentive for using the MED-AE criterion is that it reflects the number of correctly predicted seizures regardless of the outliers. Accordingly, using WMD, 69 subjects (out of 153) have an MAE below 3.4 days, which is the average inter-seizure interval of the entire population (excluding the outliers with more than 60 days of seizure gaps, as detailed in Section 3.2). The number of subjects with this feature is 60 for the SVM algorithm. By sorting the same results by MED-AE, 102 and 105 subjects have a median error below 3.4 days, for the WMD and SVM algorithms, respectively. As shown in Fig. 5, the MED-AE is significantly smaller than the MAE for the majority of the subjects, which indicates that while the forecasts are typically close to the actual values (reflected by the lower MED-AEs), there are also significant outliers.
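The gap between MAE and MED-AE for a single subject can be illustrated with a short Python sketch. The actual and forecast intervals are made up, with one deliberately large outlier, and the function name is hypothetical.

```python
from statistics import mean, median

def subject_errors(actual, forecast):
    """Return (MAE, MED-AE) of one subject's interval forecasts, in days."""
    abs_err = [abs(a - f) for a, f in zip(actual, forecast)]
    return mean(abs_err), median(abs_err)

# Hypothetical subject: mostly regular intervals with one 30-day outlier.
actual = [3, 4, 2, 30, 3]
forecast = [3, 3, 3, 3, 3]

mae, med_ae = subject_errors(actual, forecast)
print(mae, med_ae)
```

Here a single missed outlier raises the MAE to 5.8 days while the MED-AE remains 1 day, which mirrors the pattern described for the majority of the subjects.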

Figure 5.


The individual-wise performances of the weighted median between train and test sets (WMD) and SVM regression algorithms. The subjects are sorted in terms of mean absolute error across the seizures of each individual. The number of seizures per subject corresponding to the WMD (Fig. 5(a)) and SVM (Fig. 5(c)) methods are shown in Figs. 5(b) and 5(d), respectively.

Figs. 5(b) and 5(d) demonstrate the number of seizures for each patient sorted according to the corresponding WMD and SVM results reported in Figs. 5(a) and 5(c). It is generally observed that the “most predictable subjects” (the subjects with smaller forecasting errors) have more seizures, with the exceptions of the first 15 to 20 subjects (on the left sides of Figs. 5(b) and 5(d)), which have only had a few seizures and were nonetheless accurately forecast. This implies that the patients with more frequent seizures are generally better predicted (have more predictable seizure patterns).

5. Discussion

Using data from patients in the Human Epilepsy Project, we tested whether self-reported seizure diaries could train statistical and ML-based regression algorithms to successfully predict future seizures. Seizure dates of the patients were converted into inter-seizure date gaps, to remove time dependencies (due to years, seasons, and months) and to focus only on the inter-seizure intervals. This guarantees that the developed ML models are generalizable and invariant to time offset. We used a leave-one-person-out cross validation scheme to assess the generalizability of the forecasting models. The statistical models were based on the “sufficient statistics” hypothesis [30, Ch. 5.3], using only the mean or median values of seizure time-intervals of the training population and the previous seizure logs of the under-test subject up to a fixed seizure look-back window length. The ML-based regression algorithms used the temporal structure of the seizure-interval time-series. The results show that two methods were the most effective at accurately forecasting seizures (at the individual patient level): 1) the WMD model, which was inspired by a Bayesian inference-like formulation for fusing the median of the population-wise seizure time-intervals and a test subject’s prior seizure intervals; 2) an SVM regression trained over population-wise seizure time-intervals and applied to a test subject’s prior seizure intervals time-series. The SVM model was the most effective among all models and was able to forecast 50%, 70%, 81%, 84%, and 87% of seizures of unseen subjects within 0, 1, 2, 3, and 4 days of mean absolute forecasting errors, respectively.

The subject-wise performances presented in Fig. 5 indicate that each subject demonstrates a different level of seizure predictability. Seizures from the patients with more seizures were generally better predicted as compared with the subjects with fewer seizures. However, a number of patients with infrequent seizures were still able to have their ictal events accurately predicted, as their seizure intervals were statistically close to the population-wise average seizure intervals. In addition, it was shown that using longer seizure histories for time-series forecasting (the look-back parameter k) generally improved the performance of both the statistical and ML-based regression algorithms.

These findings are in line with the variability in seizure cyclicality seen across individuals. An undetermined ‘pacemaker’ that governs the timing of epileptic seizures has been theorized since antiquity, and recent evidence has accumulated in favor of individualized rhythmicity for many people with epilepsy [7, 8, 11, 15]. The rhythm can be challenging to identify by eye, but the problem is well-suited for time-series forecasting and ML-based regression models. The neurobiological underpinnings could be related to a number of factors that also occur in cycles, such as periodic behaviors (e.g., drinking alcohol on the weekends, forgetting to refill medications at the end of the month), coherence of biological clocks (e.g., circadian rhythms converging from different organs), or a longer-term and enigmatic regulator of brain activity [23]. Individual variability (such as that demonstrated here, which can be captured for a large number of individuals based on self-reported seizure events) has important implications for the care of patients with epilepsy. To help alleviate their “seizure worry”, the majority of patients report being amenable to using a seizure prediction device if it were to achieve 90% sensitivity and 50% specificity [31]. In the landmark NeuroVista trial, two out of fifteen patients with implanted electrodes received warnings about impending seizures with greater than 90% sensitivity and high specificity for more than four months [26]. This prospective trial was pivotal, but it also highlighted the difficulty in predicting seizures even with immediate feedback from an invasive and often undesirable intracranial EEG [31]. Towards less invasive methods of seizure prediction [32], recent studies have compared predictions derived from subdural EEG versus self-reported diaries, and have shown consistency in seizure cycles and seizure frequency variance across datasets [9, 29].

A limitation of our approach is that self-reported seizures are not entirely reliable, often underestimating seizure burden when compared with EEG [26, 33] or RNS recordings. It is likely that the participants logged their “most significant seizures” (the seizures that most affect their lives); however, there is no evidence that the HEP participants were consistent in their seizure logging over time or followed a similar seizure logging practice (i.e., the dataset does not discriminate between good, intermediate and bad trackers).

A relevant question is “how far before a seizure will patients find an alert useful?” While an accurate hazard lead-time of a few minutes can be transformative for patients with epilepsy, seizure forecasting based on self-reported seizure diaries alone is not expected to reach this level of accuracy. Arguably, only forecasting based on hemodynamic changes that precede seizure onsets [34], or technologies based on biosignals such as the EEG or RNS devices, may reach minute-wise accurate forecasts. Nonetheless, diary-based seizure forecasts remain transformative by providing baselines for comparison and coarse-level seizure forecasts for the long-term (nonemergency) planning of patients with epilepsy. The applications include 1) using additional medications (since medication effects do not last minutes, but rather many hours to days); 2) rescheduling of important events or daily/weekly/monthly workload ahead of a probable seizure date; 3) scheduling admissions to the epilepsy monitoring unit to increase the chances of seizure capture in the hospital; and 4) longitudinal hypothesis building, by combining coarse-level diary-based forecasts with fine-level biosignal-based forecasts to build accurate multi-modality seizure forecasting models. In future studies, using EEG or RNS data can reduce the subjectiveness of seizure logs. Additionally, the development of successful models might motivate patients to keep a more accurate seizure diary, and could accelerate interest in developing non-invasive seizure monitoring tools [32].

Another limitation of our study is that this cohort of HEP participants may not be representative of the general population of patients seen in epilepsy clinics. For instance, we expect higher adherence to record-keeping in our dataset given the willingness of the volunteers to participate in the trial, and the close follow-up and encouragement they receive as part of the trial.

6. Conclusion

In this study, we demonstrated that seizure diaries of patients with epilepsy can be used to train statistical and ML-based models to predict future seizures even for unseen patients (patients not included in the training phase). The model builds upon the promising combination of time-series forecasting algorithms for the purpose of seizure forecasting, a topic which has gained significant attention and interest in recent years [28].

Patients and clinicians would benefit immensely from accurate seizure prediction tools, and this study represents an important step towards finding the right tool for the right patients. In future research, the models developed here can be combined with short-term forecasting algorithms based on EEG and RNS recordings to further enhance predictive accuracy and to benefit from the synergistic effect of behavioral and neurophysiological data. In balancing cost and invasiveness with the greater accuracy offered by chronic EEG recordings [9], one could imagine a step-wise approach: behavioral data could be initially used to identify the subset of patients whose clinical seizures are most likely predictable; these patients could then be offered more invasive monitoring, such as an EEG-based or RNS-based warning system, if it were to provide more accurate warnings.

The presented results demonstrate that diary-based seizure forecasting based on self-reported seizures alone has potential applications in the clinical care of patients with epilepsy, especially by providing same-day estimates of seizure occurrence, or estimates with at most one day of error (totaling between 50% and 70% of the seizures, according to our results). Arguably, seizure predictions with higher errors could have potentially dangerous consequences or could increase the uncertainty, anxiety and stress for patients with epilepsy. Nonetheless, even for higher error rates, the framework provides the performance bounds of diary-based-only seizure forecasting, which can be used in future research as benchmarks for forecasting perceived and unperceived seizures using EEG-based or RNS-based systems. Overall, the cost-effectiveness of seizure diary-based forecasting could have important implications for seizure worry but also for management interventions, such as deciding when to prescribe additional anti-seizure medication doses or when to schedule life events or medical procedures such as admissions to the epilepsy monitoring unit.
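The kind of summary quoted above (the share of seizures forecast on the same day, within one day, and so on) can be computed from a list of absolute forecasting errors, as in the following minimal sketch. How errors were binned in this study is not restated here, so rounding each error to whole days is an assumption, and the function name and toy error values are hypothetical.

```python
def fraction_within(abs_errors_days, tolerances=(0, 1, 2, 3, 4)):
    """Return {tolerance: fraction of forecasts with |error| <= tolerance}.

    Errors are rounded to whole days first (an assumption made here so that
    a tolerance of 0 corresponds to a same-day forecast).
    """
    if not abs_errors_days:
        raise ValueError("at least one forecast error is required")
    rounded = [round(abs(e)) for e in abs_errors_days]
    n = len(rounded)
    return {t: sum(e <= t for e in rounded) / n for t in tolerances}


# Hypothetical absolute forecasting errors in days for ten seizures.
toy_errors = [0.2, 0.4, 1.1, 0.0, 3.6, 0.3, 2.2, 0.9, 6.0, 0.1]
coverage = fraction_within(toy_errors)
```

With these toy values, half of the forecasts fall on the same day and seven out of ten fall within one day; the resulting dictionary is cumulative, so each fraction is at least as large as the one for the previous tolerance.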

While the objective of this study was to assess the level of seizure predictability for unseen patients (solely from population-wise patterns and a subject’s prior seizure diaries), in future studies, the developed forecasting models can be extended to individualized models that are customized per patient. The advantages and disadvantages of early seizure warnings, which may lead to anxiety, patient stress and reluctance (due to occasional seizure false alarms), also need to be studied from the psychological perspective.

Acknowledgments

The authors sincerely thank the participants and principal investigators of the Human Epilepsy Project for making data available to researchers. L. Bonilha acknowledges the NIH-NINDS R01 NS110347 award for supporting this study.

Footnotes

§ Note that there was no accounting for the impact of medication changes in this study.

The monotonic improvement of the performances by increasing the look-back window lengths was observed with all the regression algorithms detailed in Section 3.

References

  • [1] West S, Nevitt SJ, Cotton J, Gandhi S, Weston J, Sudan A, Ramirez R, and Newton R, “Surgery for epilepsy,” Cochrane Database of Systematic Reviews, Jun. 2019. doi: 10.1002/14651858.cd010541.pub3
  • [2] Epilepsy Innovation Institute, 2016 Community Survey, published 2016. Accessed September 23, 2019. [Online]. Available: https://www.epilepsy.com/sites/default/files/atoms/files/community-survey-report-2016%20V2.pdf
  • [3] Fisher RS, “Epilepsy from the patient’s perspective: Review of results of a community-based survey,” Epilepsy & Behavior, vol. 1, no. 4, pp. S9–S14, Aug. 2000. doi: 10.1006/ebeh.2000.0107
  • [4] Baud MO and Rao VR, “Gauging seizure risk,” Neurology, vol. 91, no. 21, pp. 967–973, Oct. 2018. doi: 10.1212/wnl.0000000000006548
  • [5] Keezer MR, Simard-Tremblay E, and Veilleux M, “The diagnostic accuracy of prolonged ambulatory versus routine EEG,” Clinical EEG and Neuroscience, vol. 47, no. 2, pp. 157–161, Sep. 2015. doi: 10.1177/1550059415607108
  • [6] Spencer DC, Sun FT, Brown SN, Jobst BC, Fountain NB, Wong VSS, Mirro EA, and Quigg M, “Circadian and ultradian patterns of epileptiform discharges differ by seizure-onset location during long-term ambulatory intracranial monitoring,” Epilepsia, vol. 57, no. 9, pp. 1495–1502, Jul. 2016. doi: 10.1111/epi.13455
  • [7] Karoly PJ, Goldenholz DM, Freestone DR, Moss RE, Grayden DB, Theodore WH, and Cook MJ, “Circadian and circaseptan rhythms in human epilepsy: a retrospective cohort study,” The Lancet Neurology, vol. 17, no. 11, pp. 977–985, Nov. 2018. doi: 10.1016/s1474-4422(18)30274-6
  • [8] Wilson JVK and Reynolds EH, “Translation and analysis of a cuneiform text forming part of a babylonian treatise on epilepsy,” Medical History, vol. 34, no. 2, pp. 185–198, Apr. 1990. doi: 10.1017/s0025727300050651
  • [9] Karoly PJ, Cook MJ, Maturana M, Nurse ES, Payne D, Brinkmann BH, Grayden DB, Dumanis SB, Richardson MP, Worrell GA, Schulze-Bonhage A, Kuhlmann L, and Freestone DR, “Forecasting cycles of seizure likelihood,” Epilepsia, vol. 61, no. 4, pp. 776–786, Mar. 2020. doi: 10.1111/epi.16485
  • [10] Leguia MG, Andrzejak RG, Rummel C, Fan JM, Mirro EA, Tcheng TK, Rao VR, and Baud MO, “Seizure cycles in focal epilepsy,” JAMA Neurology, vol. 78, no. 4, pp. 454–463, Apr. 2021. doi: 10.1001/jamaneurol.2020.5370
  • [11] Langdon-Down M and Brain WR, “Time of day in relation to convulsions in epilepsy,” The Lancet, vol. 213, no. 5516, pp. 1029–1032, May 1929. doi: 10.1016/s0140-6736(00)79288-9
  • [12] Griffiths G and Fox J, “Rhythm in epilepsy,” The Lancet, vol. 232, no. 5999, pp. 409–416, Aug. 1938. doi: 10.1016/s0140-6736(00)41614-4
  • [13] Durazzo TS, Spencer SS, Duckrow RB, Novotny EJ, Spencer DD, and Zaveri HP, “Temporal distributions of seizure occurrence from various epileptogenic regions,” Neurology, vol. 70, no. 15, pp. 1265–1271, Apr. 2008. doi: 10.1212/01.wnl.0000308938.84918.3f
  • [14] Pavlova MK, Shea SA, and Bromfield EB, “Day/night patterns of focal seizures,” Epilepsy & Behavior, vol. 5, no. 1, pp. 44–49, Feb. 2004. doi: 10.1016/j.yebeh.2003.10.013
  • [15] Binnie C, Aarts J, Houtkooper M, Laxminarayan R, Silva AD, Meinardi H, Nagelkerke N, and Overweg J, “Temporal characteristics of seizures and epileptiform discharges,” Electroencephalography and Clinical Neurophysiology, vol. 58, no. 6, pp. 498–505, Dec. 1984. doi: 10.1016/0013-4694(84)90038-5
  • [16] Baud MO, Kleen JK, Mirro EA, Andrechak JC, King-Stephens D, Chang EF, and Rao VR, “Multi-day rhythms modulate seizure risk in epilepsy,” Nature Communications, vol. 9, no. 1, Jan. 2018. doi: 10.1038/s41467-017-02577-y
  • [17] Proix T, Truccolo W, Leguia MG, King-Stephens D, Rao VR, and Baud MO, “Forecasting seizure risk over days,” Lancet Neurol, Oct. 2019. doi: 10.1101/19008086
  • [18] Karoly PJ, Freestone DR, Boston R, Grayden DB, Himes D, Leyde K, Seneviratne U, Berkovic S, O’Brien T, and Cook MJ, “Interictal spikes and epileptic seizures: their relationship and underlying rhythmicity,” Brain, vol. 139, no. 4, pp. 1066–1078, Feb. 2016. doi: 10.1093/brain/aww019
  • [19] Baldassano SN, Brinkmann BH, Ung H, Blevins T, Conrad EC, Leyde K, Cook MJ, Khambhati AN, Wagenaar JB, Worrell GA, and Litt B, “Crowdsourcing seizure detection: algorithm development and validation on human implanted device recordings,” Brain, vol. 140, no. 6, pp. 1680–1691, Apr. 2017. doi: 10.1093/brain/awx098
  • [20] Gregg NM, Nasseri M, Kremen V, Patterson EE, Sturges BK, Denison TJ, Brinkmann BH, and Worrell GA, “Circadian and multiday seizure periodicities, and seizure clusters in canine epilepsy,” Brain Communications, vol. 2, no. 1, Jan. 2020. doi: 10.1093/braincomms/fcaa008
  • [21] Quigg M, Straume M, Menaker M, and Bertam EH, “Temporal distribution of partial seizures: Comparison of an animal model with human partial epilepsy,” Annals of Neurology, vol. 43, no. 6, pp. 748–755, Jun. 1998. doi: 10.1002/ana.410430609
  • [22] Pitsch J, Becker AJ, Schoch S, Müller JA, de Curtis M, and Gnatkovsky V, “Circadian clustering of spontaneous epileptic seizures emerges after pilocarpine-induced status epilepticus,” Epilepsia, vol. 58, no. 7, pp. 1159–1171, May 2017. doi: 10.1111/epi.13795
  • [23] Stirling RE, Cook MJ, Grayden DB, and Karoly PJ, “Seizure forecasting and cyclic control of seizures,” Epilepsia, vol. 62, no. S1, Jul. 2020. doi: 10.1111/epi.16541
  • [24] Tsiouris KM, Pezoulas VC, Zervakis M, Konitsiotis S, Koutsouris DD, and Fotiadis DI, “A long short-term memory deep learning network for the prediction of epileptic seizures using EEG signals,” Computers in Biology and Medicine, vol. 99, pp. 24–37, Aug. 2018. doi: 10.1016/j.compbiomed.2018.05.019
  • [25] Hussein R, Palangi H, Ward RK, and Wang ZJ, “Optimized deep neural network architecture for robust detection of epileptic seizures using EEG signals,” Clinical Neurophysiology, vol. 130, no. 1, pp. 25–37, Jan. 2019. doi: 10.1016/j.clinph.2018.10.010
  • [26] Cook MJ, O’Brien TJ, Berkovic SF, Murphy M, Morokoff A, Fabinyi G, D’Souza W, Yerra R, Archer J, Litewka L, Hosking S, Lightfoot P, Ruedebusch V, Sheffield WD, Snyder D, Leyde K, and Himes D, “Prediction of seizure likelihood with a long-term, implanted seizure advisory system in patients with drug-resistant epilepsy: a first-in-man study,” The Lancet Neurology, vol. 12, no. 6, pp. 563–571, Jun. 2013. doi: 10.1016/s1474-4422(13)70075-9
  • [27] Dumanis SB, French JA, Bernard C, Worrell GA, and Fureman BE, “Seizure forecasting from idea to reality. outcomes of the my seizure gauge epilepsy innovation institute workshop,” eneuro, vol. 4, no. 6, pp. ENEURO.0349–17.2017, Nov. 2017. doi: 10.1523/eneuro.0349-17.2017
  • [28] Chiang S, Goldenholz DM, Moss R, Rao VR, Haneef Z, Theodore WH, Kleen JK, Gavvala J, Vannucci M, and Stern JM, “Prospective validation study of an epilepsy seizure risk system for outpatient evaluation,” Epilepsia, vol. 61, no. 1, pp. 29–38, Dec. 2019. doi: 10.1111/epi.16397
  • [29] Goldenholz DM, Goldenholz SR, Moss R, French J, Lowenstein D, Kuzniecky R, Haut S, Cristofaro S, Detyniecki K, Hixson J, Karoly P, Cook M, Strashny A, and Theodore WH, “Is seizure frequency variance a predictable quantity?” Annals of Clinical and Translational Neurology, vol. 5, no. 2, pp. 201–207, Jan. 2018. doi: 10.1002/acn3.519
  • [30] Kay SM, Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall PTR, 1993.
  • [31] Schulze-Bonhage A, Sales F, Wagner K, Teotonio R, Carius A, Schelle A, and Ihle M, “Views of patients with epilepsy on seizure prediction devices,” Epilepsy & Behavior, vol. 18, no. 4, pp. 388–396, Aug. 2010. doi: 10.1016/j.yebeh.2010.05.008
  • [32] Beniczky S, Karoly P, Nurse E, Ryvlin P, and Cook M, “Machine learning and wearable devices of the future,” Epilepsia, vol. 62, no. S2, Jul. 2020. doi: 10.1111/epi.16555
  • [33] Blum DE, Eskola J, Bortz JJ, and Fisher RS, “Patient awareness of seizures,” Neurology, vol. 47, no. 1, pp. 260–264, Jul. 1996. doi: 10.1212/wnl.47.1.260
  • [34] Zhang T, Zhou J, Jiang R, Yang H, Carney PR, and Jiang H, “Pre-seizure state identified by diffuse optical tomography,” Scientific Reports, vol. 4, no. 1, Jan. 2014. doi: 10.1038/srep03798
