Abstract
Objective:
Manually distinguishing between seizure and non-seizure events in intracranial electroencephalography (iEEG) recordings is highly time-consuming. In this study, we explored AI-based approaches for electrographic seizure classification (ESC) and seizure onset detection (SOD) in treatment-resistant epilepsy patients. ESC involves distinguishing seizure events from non-seizure activity, while SOD focuses on pinpointing the exact moment a seizure begins.
Methods:
We assessed several image-based and time-series-based model architectures for ESC and SOD, including convolutional neural networks (CNNs), vision transformers (ViTs), and time-series transformers. We used a dataset of approximately 560,000 iEEG traces from 291 focal epilepsy patients implanted with the NeuroPace RNS® system. We conducted extensive experimentation, varying the number of trainable parameters, input formats, and weight initializations to assess performance.
Results:
ViTs outperformed the other models on both ESC and SOD. For ESC, ViTs achieved a mean seizure classification accuracy of 97% across five folds of data and an accuracy of 96% on a separate clinician-annotated test dataset. In the SOD task, ViTs showed an average median absolute error (MAE) of 1.4 seconds between model-predicted and human-labeled seizure onsets across five folds, and an MAE of 0.8 seconds on the separate clinician-annotated test dataset.
Conclusion:
These findings highlight the advantage of image-based AI approaches, particularly ViTs, in capturing nuanced seizure patterns from iEEG data.
Significance:
AI-driven ESC and SOD could support the development of more personalized treatment strategies for epilepsy patients while reducing the time required for manual review.
Index Terms: Electroencephalography, seizure detection, epilepsy, vision transformers, convolutional neural networks, time-series analysis
I. Introduction
Accurate assessment of seizure activity is crucial for managing and treating patients with treatment-resistant epilepsy, but effective monitoring remains difficult. Traditional methods rely on patient self-reports or caregiver observations, which are often incomplete or unreliable due to unnoticed seizure events [1]-[4]. This gap in accurate seizure tracking complicates the evaluation of treatment efficacy, making it difficult for clinicians to tailor interventions or adjust therapies effectively.
An implanted responsive neurostimulation device (RNS® System, NeuroPace) obtains valuable intracranial electroencephalography (iEEG) recordings. This information has the potential to enhance seizure management, especially in treatment-resistant epilepsy. However, advanced computational methods are essential to leverage this data fully and overcome the limitations of patient self-reports. Manual analysis of iEEG data is time-consuming and subject to human error, making automation a pathway to more precise monitoring and therapeutic interventions [5].
Among the computational techniques used in the field, Fourier and wavelet transform methods are commonly employed to facilitate seizure detection [6]-[9]. These methods provide a time-frequency analysis that captures both frequency and temporal characteristics of the signals, making them effective for detecting transient changes such as seizure onsets. In addition to spectral features, a variety of feature engineering techniques have been employed in conjunction with machine learning classifiers for seizure detection [10]-[14]. The extracted features serve as inputs for machine learning models. Classical machine learning algorithms, such as support vector machines and logistic regression, use these features to identify and learn patterns associated with impending seizures. While these techniques have been helpful, they do exhibit certain limitations. One notable shortfall lies in their dependence on handcrafted features. The process of manually selecting and engineering features from EEG signals is inherently subjective and may overlook subtle patterns or relationships that are not apparent to human experts [15].
Deep learning methods, particularly convolutional neural networks (CNNs), offer a powerful alternative to classical machine learning techniques. Unlike traditional approaches, CNNs can automatically learn and extract hierarchical features from raw data, reducing the need for manual feature engineering. The convolutional layers in CNNs are effective at capturing spatial relationships in data, enabling these models to identify complex patterns that may indicate seizure activity. CNNs have been successfully applied to both time-series and image-based representations of EEG data, where they have demonstrated the ability to recognize subtle shifts in brain activity associated with seizures [16]-[20]. However, despite their strengths, CNNs can struggle to capture long-term dependencies in EEG signals, as they are primarily designed to model local spatial and temporal correlations [21], [22].
Transformers have emerged as highly effective models in a variety of tasks due to their ability to capture long-range dependencies and context, making them well-suited for seizure detection [23], [24]. Unlike CNNs, which focus on local spatial or temporal patterns, transformers rely on attention mechanisms that allow the models to weigh the importance of different segments of the input, regardless of their distance from one another. This makes transformers particularly powerful for analyzing iEEG signals, which often require modeling long-term dependencies and complex temporal relationships to accurately detect seizures. Vision Transformers (ViTs), a specialized variant of transformers originally designed for image classification, have also shown promise in the context of seizure prediction and classification [25], [26]. By leveraging self-attention across time-frequency spectrogram image patches, ViTs can potentially identify subtle and complex patterns that are indicative of seizure activity.
In this paper, we compare different image-based and time-series-based AI model architectures for two important tasks related to iEEG recordings: electrographic seizure classification (ESC) and seizure onset detection (SOD). ESC is crucial for accurately identifying the presence of seizure events in iEEG recordings. SOD, on the other hand, focuses on pinpointing the precise moment a seizure begins, enabling accurate intervention strategies. We evaluate several state-of-the-art deep learning architectures, including CNNs, ViTs and time-series transformers. By comparing the performance of these architectures, we aim to identify the most effective model for each task.
II. Methods
A. Model Architectures
For the image-based models, we explored CNNs and ViTs. Within the CNN category, we used the ResNet-50 architecture, which is renowned for its deep residual learning capabilities, as well as a 2-dimensional CNN (2D CNN) with three convolutional layers to capture spatial patterns in the iEEG data. For ViTs, we employed the standard ViT-B/16 variant [28], which uses 12 transformer layers with 12 self-attention heads per layer, a hidden dimension of 768, and processes images by dividing them into 16x16 pixel patches. We experimented with two different configurations of trainable parameters for the ViT: one with 49 million and another with 86 million trainable parameters, which is the total capacity of ViT-B/16. For clarity, we refer to them as ViT (49M) and ViT (86M), respectively.
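As a concrete illustration of the ViT-B/16 patch-embedding stage described above, the sketch below splits a 224x224 RGB spectrogram into non-overlapping 16x16 patches with NumPy. The input resolution and the `image_to_patches` helper are illustrative, not taken from the paper; the point is that each patch flattens to 16 × 16 × 3 = 768 values, matching ViT-B/16's hidden dimension before the linear projection.

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an (H, W, C) image into flattened non-overlapping patches,
    as the ViT-B/16 patch-embedding stage does before linear projection.
    (Hypothetical helper for illustration.)"""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    rows, cols = h // patch, w // patch
    patches = (img.reshape(rows, patch, cols, patch, c)
                  .transpose(0, 2, 1, 3, 4)        # (rows, cols, patch, patch, c)
                  .reshape(rows * cols, patch * patch * c))
    return patches

# A 224x224 RGB spectrogram yields 196 patches of dimension 768.
x = np.random.rand(224, 224, 3)
p = image_to_patches(x)
print(p.shape)  # (196, 768)
```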
In the time-series domain, we applied a 1-dimensional CNN (1D CNN) with eight convolutional layers to capture temporal patterns in the iEEG signals. Additionally, we used a time-series transformer model that integrated four convolutional layers with six transformer blocks. This hybrid approach aimed to combine the strengths of convolutional layers for local feature extraction with the global sequence modeling capabilities of transformers.
B. Model Inputs
The NeuroPace RNS® System is an FDA-approved neurostimulation device for treating drug-resistant partial-onset epilepsy in adults aged 18 and older [27]. The device delivers targeted neurostimulation in response to the detection of abnormal brain activity unique to each patient, and can also record iEEG data. Figure 1 shows the RNS® System and an example of a 4-channel iEEG recording from the device.
Fig. 1.

A) The NeuroPace RNS® System. The device includes a neurostimulator placed within the skull that is connected to 2 leads, each containing 4 sensing and stimulating electrodes, that are placed at the region of seizure onset. B) Example 4-channel iEEG recording from an implanted device in a patient. The top panel displays the time-series data, while the bottom panel shows the corresponding time-frequency spectrograms.
iEEG recordings from 291 focal epilepsy patients who were implanted with the RNS® system were included in this analysis. Each patient typically underwent two RNS® lead implantations. Each iEEG record typically contained 4 channels of brain activity, with 2 channels corresponding to each implanted lead. The sampling rate for these recordings was 250 Hz, and the time-series activity for each iEEG channel was either 90 seconds or 180 seconds in duration. Individual iEEG channels (typically 4 per iEEG record) were used as inputs to the models.
We removed stimulation artifacts from each channel in all iEEG recordings. These artifacts, introduced by the device during neurostimulation, represent the primary source of outliers and noise in the raw iEEG signals. As the timing and duration of stimulation events were precisely annotated within each recording, we were able to reliably exclude these segments before further processing by the models.
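Because the timing of each stimulation event is annotated in the recording, excising those segments reduces to boolean masking. The sketch below shows one way to do this; the `remove_stim_segments` helper and the window format `(start_s, end_s)` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

FS = 250  # Hz, the RNS sampling rate reported above

def remove_stim_segments(trace, stim_windows, fs=FS):
    """Excise annotated stimulation windows (start_s, end_s) from a 1-D
    iEEG trace. Hypothetical helper sketching the masking idea."""
    keep = np.ones(trace.size, dtype=bool)
    for start_s, end_s in stim_windows:
        keep[int(start_s * fs):int(end_s * fs)] = False
    return trace[keep]

sig = np.arange(10 * FS, dtype=float)                 # 10 s dummy trace
clean = remove_stim_segments(sig, [(2.0, 3.0), (7.0, 7.5)])
print(clean.size)  # 2500 - 250 - 125 = 2125 samples remain
```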
For the image-based models, the time-series data were transformed into time-frequency spectrogram images, which were then used as inputs to the models. This transformation allowed the data to be represented in a format suitable for image-based processing.
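A time-frequency transform of this kind can be computed with `scipy.signal.spectrogram`. The paper does not specify window length, overlap, or scaling, so the parameters below are illustrative assumptions; the dB conversion is one common way to prepare a spectrogram for image-based models.

```python
import numpy as np
from scipy.signal import spectrogram

FS = 250  # Hz

# Illustrative parameters only; the paper does not state the exact
# window length or overlap used for its spectrograms.
rng = np.random.default_rng(0)
trace = rng.standard_normal(90 * FS)              # 90 s iEEG channel
f, t, Sxx = spectrogram(trace, fs=FS, nperseg=256, noverlap=128)
img = 10 * np.log10(Sxx + 1e-12)                  # dB scale for image input
print(f.max(), img.shape[0])                      # Nyquist = 125 Hz, 129 frequency bins
```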
In contrast, for the time-series-based models, the raw iEEG time-series data were initially scaled using min-max normalization. Since the stimulation artifact was removed from the signals, they were largely free of outliers, making min-max normalization an appropriate preprocessing step. The input data were standardized to a fixed length of 180 seconds. If the total iEEG channel activity exceeded this duration, it was truncated; if shorter, it was zero-padded towards the end to meet the required input size. The zero-padded values were not considered during training or inference.
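The scaling and length standardization described above can be sketched as follows, assuming min-max scaling to [0, 1] (the target range is not stated in the paper) and zero-padding at the end for short traces.

```python
import numpy as np

FS, TARGET_S = 250, 180  # sampling rate (Hz) and fixed input length (s)

def preprocess(trace, fs=FS, target_s=TARGET_S):
    """Min-max scale a trace, then truncate or zero-pad it to 180 s.
    Sketch only; target range [0, 1] is an assumption."""
    lo, hi = trace.min(), trace.max()
    scaled = (trace - lo) / (hi - lo) if hi > lo else np.zeros_like(trace)
    n = target_s * fs
    if scaled.size >= n:
        return scaled[:n]                  # truncate long traces
    return np.pad(scaled, (0, n - scaled.size))  # zero-pad short ones

short = preprocess(np.linspace(-1.0, 1.0, 90 * FS))  # 90 s trace -> padded
print(short.size, short.min(), short.max())  # 45000 0.0 1.0
```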
C. ESC Model
1). Datasets:
iEEG recordings from 113 of the 291 patients were included in the ESC training process. In total, 136,878 distinct iEEG records were collected from these patients. The ESC models were trained to classify seizure activity independently on each of the iEEG channels within these typically 4-channel records, resulting in approximately 550,000 individual iEEG channels used for training.
To evaluate the final performance of the models, we used a separate clinician-annotated test dataset consisting of 1,073 iEEG records, which resulted in 4,235 labeled iEEG channels due to clinicians’ uncertainty regarding the labels of some individual channels. This dataset was derived from 100 of the 291 patients. The data from these 100 patients was only used for final clinical evaluation and was not a part of the train/validation/test data splits. This dataset was independently labeled by three board-certified neurologists to ensure high reliability and clinical accuracy.
2). Labeling Process:
The iEEG channels included in the training dataset (136,878 records from 113 patients) were labeled as either electrographic seizure or non-seizure activity by a trained NeuroPace employee, assisted by a clustering-based workflow as described in [17]. This clustering approach involved extracting features from each iEEG channel using a pretrained GoogLeNet Inception-V3 model. The concatenated features for each record (4 channels) were then reduced in dimensionality using PCA followed by t-SNE, and subsequently clustered using a Bayesian Gaussian Mixture Model, applied on a per-patient basis. Clustering was performed independently and separately for each patient’s iEEG records to efficiently identify patient-specific patterns for labeling. Manual labeling was performed by first assigning seizure/non-seizure labels to the cluster centroids. These labels were then propagated to all iEEG records within the corresponding clusters. To ensure labeling quality, a manual verification step was conducted in which the labeler visually inspected all records within each cluster using a thumbnail-based review interface. Labels that appeared inconsistent with the cluster were corrected accordingly.
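The dimensionality-reduction-and-clustering chain described above (PCA, then t-SNE, then a Bayesian Gaussian Mixture Model) can be sketched with scikit-learn. Feature dimensions, component counts, and the random features standing in for Inception-V3 embeddings are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.mixture import BayesianGaussianMixture

# Stand-in for one patient's Inception-V3 record features; dimensions
# and component counts here are illustrative, not the paper's values.
rng = np.random.default_rng(0)
features = rng.standard_normal((300, 2048))

reduced = PCA(n_components=50, random_state=0).fit_transform(features)
embedded = TSNE(n_components=2, random_state=0).fit_transform(reduced)
bgmm = BayesianGaussianMixture(n_components=10, random_state=0).fit(embedded)
clusters = bgmm.predict(embedded)   # one cluster label per record
print(clusters.shape)  # (300,)
```

Labels would then be assigned per cluster centroid and propagated to the records in each cluster, followed by the manual verification pass described above.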
3). Data Splits:
From the 113 patients used for the ESC task, we initially reserved data from 23 patients for testing. From the remaining 90 patients, we generated five random data folds, each consisting of 72 patients for training and 18 patients for validation. The distribution of patients, total number of iEEG records (including seizures and non-seizures), along with the number of seizure and non-seizure iEEG channels in the training, validation and test sets for each fold, is shown in Table I. The training sets were class-balanced with equal representation of seizure and non-seizure channels. For each fold, the model was trained and validated using the respective splits, followed by an evaluation on the held-out test set. The performance metrics across all folds were averaged to report the final test performance. A visualization of the data splits for both ESC and SOD analyses is shown in Figure 4.
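The patient-level split described above can be sketched as follows. Patient IDs and the seeding are illustrative; the key property is that splits are made by patient, so no patient's records appear in both the training and validation sets of a fold.

```python
import numpy as np

# Sketch of the ESC split: 23 held-out test patients, then five random
# 72/18 train/validation folds drawn from the remaining 90 patients.
rng = np.random.default_rng(0)
patients = np.arange(113)          # illustrative patient IDs
test = patients[:23]
pool = patients[23:]               # 90 patients for cross-validation

folds = []
for _ in range(5):
    perm = rng.permutation(pool)
    folds.append((perm[:72], perm[72:]))   # (train, validation)

train, val = folds[0]
print(len(train), len(val), np.intersect1d(train, val).size)  # 72 18 0
```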
TABLE I.
Per-fold distribution of patients, seizures (Sz.), non-seizures (NSz.) and number of iEEG recordings across training, validation, and test sets in ESC. Sz. and NSz. show number of iEEG channels of each type.
| Fold | Training Pts. | Training Sz. | Training NSz. | Training # records | Validation Pts. | Validation Sz. | Validation NSz. | Validation # records | Test Pts. | Test Sz. | Test NSz. | Test # records |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 72 | 81573 | 81573 | 77482 | 18 | 21692 | 40239 | 21845 | 23 | 36918 | 41883 | 25123 |
| 2 | 72 | 78218 | 78218 | 72865 | 18 | 33398 | 35751 | 22045 | 23 | 36918 | 41883 | 25123 |
| 3 | 72 | 103572 | 103572 | 87347 | 18 | 8381 | 36832 | 14631 | 23 | 36918 | 41883 | 25123 |
| 4 | 72 | 91214 | 91214 | 75880 | 18 | 24957 | 56364 | 26475 | 23 | 36918 | 41883 | 25123 |
| 5 | 72 | 100983 | 100983 | 91080 | 18 | 15917 | 25066 | 12628 | 23 | 36918 | 41883 | 25123 |
Fig. 4.

A visualization of all the data used for ESC and SOD analysis is shown. The dataset consists of recordings from 291 patients with focal epilepsy. For the final clinical evaluation, data from 100 patients were reserved. Of the remaining 191 patients, data from 113 were used for ESC analysis, while data from 122 were used for SOD analysis. There was an overlap of 44 patients between the ESC and SOD analyses.
4). Model Outputs:
Both types of models—image-based and time-series-based—output seizure probabilities, providing a classification of the iEEG channels into seizure or non-seizure categories.
5). Performance Metrics:
The binary classification accuracy of the model, which distinguishes between seizure and non-seizure classes, was used as the primary performance metric. This metric evaluates the proportion of correctly classified instances among the total number of instances, providing a measure of the model’s effectiveness in distinguishing between the two classes. An example of the two classes is shown in Figure 2. The figure shows an example of a spectrogram image labeled by a human annotator as a seizure and another labeled non-seizure.
Fig. 2.

A) Example iEEG channel trace labeled ’seizure’ by a human annotator. B) Another example iEEG channel trace labeled ’non-seizure’ by the annotator.
6). Experiments:
Both image-based and time-series-based models were assessed for the task of ESC. Various experiments were conducted to evaluate model performance under different conditions. For each of the image-based models, both color and grayscale spectrogram images were used separately as inputs to assess the impact of color representation.
D. SOD Model
1). Datasets:
The dataset for SOD analysis comprised iEEG records from 122 of the 291 patients; 44 of these patients were also part of the ESC training data. From these 122 patients, 2,324 distinct iEEG seizure records were randomly selected for labeling. Because human labeling is time-consuming, only this limited number of records could be labeled. Many individual iEEG channels within these records did not contain seizures, resulting in a total of 7,808 individual iEEG seizure channels. Seizure onset times for these channels were labeled by NeuroPace employees under the supervision of a clinician board-certified in Neurology and Epilepsy, forming the original training dataset for SOD.
For evaluating the final performance of the models, the same clinician-annotated test dataset from 100 patients was used as in the case of ESC (see Figure 4). Seizure onset times for the subset of the 4,235 iEEG channels that contained seizures were labeled independently by the same board-certified neurologists.
2). Data Augmentation:
Deep learning models generally perform better with larger and more diverse datasets. To address the relatively small size of our original dataset, consisting of 7,808 iEEG channels, we applied two distinct data augmentation strategies.
Augmentation 1 involved translational shifting within individual iEEG channels: the seizure onset time was shifted by removing data from the end of the recording and appending an equivalent amount to the beginning. This resulted in 45,694 augmented iEEG channels of data. This approach effectively simulates a time-shifted seizure onset, allowing the model to learn from different temporal contexts within the same signal. The adjustment was one-way, always shifting the temporal activity to the right.
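One way to realize this shift is sketched below, under the assumption that the trimmed tail is wrapped to the front of the trace (the paper states only that an equivalent amount is appended to the beginning); the labeled onset then moves to the right by the shift duration.

```python
import numpy as np

def shift_onset(trace, onset_s, shift_s, fs=250):
    """Augmentation-1 sketch: drop `shift_s` seconds from the end and
    prepend an equal amount, moving the onset right by `shift_s`.
    Wrapping the removed tail to the front is an assumption."""
    k = int(shift_s * fs)
    shifted = np.concatenate([trace[-k:], trace[:-k]])
    return shifted, onset_s + shift_s

sig = np.arange(90 * 250, dtype=float)           # 90 s trace
aug, new_onset = shift_onset(sig, onset_s=30.0, shift_s=10.0)
print(aug.size, new_onset)  # 22500 40.0
```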
Augmentation 2 was performed by appending baseline non-seizure activity from randomly chosen iEEG records within the same patient to the beginning of their labeled electrographic seizure onsets. A total of 30 such augmentations were performed per original iEEG channel. Finally, the data from augmentation 1 were also included in this dataset, resulting in a total of 257,807 augmented iEEG channels of data. Two examples of each augmentation type, along with the original iEEG channel activity, are shown in Figure 3 for the image-based input.
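Prepending baseline activity delays the labeled onset by exactly the baseline duration, which is the bookkeeping the sketch below illustrates. The `prepend_baseline` helper is hypothetical.

```python
import numpy as np

def prepend_baseline(seizure_trace, onset_s, baseline_trace, fs=250):
    """Augmentation-2 sketch: prepend same-patient non-seizure activity,
    delaying the labeled onset by the baseline's duration."""
    out = np.concatenate([baseline_trace, seizure_trace])
    return out, onset_s + baseline_trace.size / fs

sz = np.ones(60 * 250)      # 60 s seizure-containing trace (dummy)
base = np.zeros(20 * 250)   # 20 s baseline activity (dummy)
aug, new_onset = prepend_baseline(sz, onset_s=5.0, baseline_trace=base)
print(aug.size, new_onset)  # 20000 25.0
```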
Fig. 3.

Data augmentation for SOD analysis. The top row presents an original iEEG spectrogram, which undergoes augmentation 1, resulting in two different augmented examples shown in panels A and B. Similarly, the bottom row displays the same original iEEG spectrogram that is subjected to augmentation 2, with two resulting examples shown in panels C and D. In each spectrogram, the seizure onset times are indicated by text and marked by red vertical lines. Both original and modified seizure onset times after augmentation are shown. Orig. = original; mod. = modified; aug. = augmentation.
3). Data Splits:
The patients were randomly shuffled, and 5 folds of data were created from the 122 patients. In each fold, 98 patients were allocated for training and 24 for testing. The results from each fold were reported to ensure robust performance evaluation across the dataset (see Figure 4 for more details).
4). Model Outputs:
Both types of models—image-based and time-series-based—output a seizure onset time in seconds for the input iEEG channel data.
5). Performance Metrics:
Median absolute error (MAE) between human-labeled seizure onset times and model predictions was used to compare performance for SOD, since this metric is robust to outliers and diverse preferences of human labelers. An example is shown in Figure 5. It displays a spectrogram image with both the human-labeled seizure onset and the model-predicted onset. The black arrow indicates the error between the two.
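Note that MAE here denotes the *median* (not mean) absolute error, which is why a single gross outlier barely moves it. A minimal sketch with invented onset values:

```python
import numpy as np

def median_absolute_error(pred_onsets, label_onsets):
    """Median absolute difference (seconds) between predicted and
    human-labeled onsets; robust to outliers by construction."""
    return np.median(np.abs(np.asarray(pred_onsets) - np.asarray(label_onsets)))

pred  = [10.2, 33.0, 12.5, 80.0]   # last prediction is a gross outlier
label = [10.0, 32.0, 12.0, 20.0]
print(median_absolute_error(pred, label))  # 0.75, unaffected by the 60 s miss
```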
Fig. 5.

An example iEEG channel trace is presented, with both human-labeled and model-predicted seizure onset times marked by red and blue vertical lines, respectively. The black arrow highlights the discrepancy between the predicted and actual seizure onset times, indicating the error in the model prediction.
6). Experiments:
We conducted a series of experiments to assess the performance of the models in detecting seizure onset times accurately. Specifically, we compared model performance when trained on the original dataset versus the two augmented datasets (augmentation 1 and augmentation 2). Importantly, the models were trained on augmented datasets but evaluated exclusively on the original, unaugmented data. For image-based models, we used both color and grayscale spectrograms as input images, while also comparing random weight initialization against pre-trained ImageNet weight initialization.
7). Hyperparameters:
A detailed list of hyperparameters used for both the ESC and SOD models is provided in Appendix A, in the Supplementary Materials.
III. Results
A. ESC Model Performance
1). Performance on Test Dataset:
The performances of different types of models on the test set (data from 23 patients as shown in Figure 4) under different experimental conditions are shown in Table II. The table reports the binary accuracy of the models in distinguishing between seizure and non-seizure iEEG channels. Binary accuracies are provided for 5 data folds, and the final column presents the mean ± standard deviation (SD) of test accuracy across the 5 folds. The best performing model in this comparison was ViT (86M) with color spectrogram inputs, achieving the highest mean test accuracy of ≈ 97%.
TABLE II.
ESC performance on the test set (23 patient data)
| Model type | Image input | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Mean ± SD |
|---|---|---|---|---|---|---|---|
| ResNet-50 | Color | 96.24 | 95.38 | 95.27 | 95.78 | 95.45 | 95.62 ± 0.35 |
| ResNet-50 | Grayscale | 95.42 | 93.97 | 94.49 | 94.58 | 95.24 | 94.74 ± 0.53 |
| ViT (49M) | Color | 96.55 | 96.06 | 94.91 | 96.70 | 95.70 | 95.98 ± 0.64 |
| ViT (49M) | Grayscale | 96.50 | 93.55 | 94.74 | 95.32 | 95.16 | 95.06 ± 0.95 |
| ViT (86M) | Color | 96.90 | 96.54 | 95.95 | 97.24 | 97.12 | 96.75 ± 0.47 |
| 2D CNN | Color | 94.65 | 93.78 | 93.89 | 95.53 | 94.15 | 94.40 ± 0.64 |
| 2D CNN | Grayscale | 94.91 | 94.23 | 94.20 | 94.46 | 92.58 | 94.08 ± 0.79 |
| 1D CNN | n/a | 92.12 | 84.52 | 93.64 | 90.08 | 83.04 | 88.68 ± 4.18 |
| Time-series transformer | n/a | 93.40 | 87.28 | 91.00 | 95.85 | 90.78 | 91.66 ± 2.86 |
We also evaluated the mean area under the receiver operating characteristic curve (AUROC) for each model type, as shown in Figure 6. The ViT (86M) and ResNet-50 models achieved the highest scores across 5 folds on the 23-patient held-out test set, both reaching mean AUROC of 0.99.
Fig. 6.

Mean ROC curves across 5 folds for each model on the 23-patient test set in the ESC task. Shaded areas represent ± 1 standard deviation. Legend shows mean AUROC ± standard deviation.
When comparing image-based models (ResNet-50, ViT, 2D CNN) to time-series-based models (1D CNN, time-series transformer), the image-based models clearly outperformed the time-series models (Table II). The best time-series model, the time-series transformer, achieved a mean test accuracy of 91.7%, significantly lower than the best image-based model, the ViT (86M) at 96.8% (Wilcoxon rank-sum test, p < 0.05). The performance comparison of models using color versus grayscale spectrogram image inputs is presented in Appendix B of the Supplementary Materials.
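This comparison can be reproduced directly from the per-fold accuracies in Table II using SciPy's rank-sum test (shown here as a quick check, not the paper's exact statistical procedure):

```python
from scipy.stats import ranksums

# Per-fold test accuracies from Table II.
vit_86m = [96.90, 96.54, 95.95, 97.24, 97.12]   # ViT (86M), color
tst     = [93.40, 87.28, 91.00, 95.85, 90.78]   # time-series transformer

stat, p = ranksums(vit_86m, tst)
print(p < 0.05)  # True: the difference is significant at the 0.05 level
```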
2). Performance on Clinician-Annotated Dataset:
Table III reports binary seizure classification accuracy and F1 scores on the clinician-annotated dataset, for the best-performing image-based and time-series-based models on the 23-patient test set. Model comparisons with each of the expert labelers are presented in separate columns (exp. denotes expert and TST denotes the time-series transformer). On this clinician-annotated dataset, we observed clear performance distinctions among the models across both accuracy and F1 score metrics. In terms of accuracy and F1 scores, ViT (86M) consistently outperformed the other models. ViT (86M) achieved accuracy values of 90.2%, 90.3%, and 87.2% when compared to expert 1, expert 2, and expert 3, respectively, making it the highest-performing model across all comparisons. The ResNet-50 model followed closely, with accuracies of 87.8%, 88.4%, and 85.1%, showing slightly lower but competitive performance. The 1D CNN model showed the lowest accuracy.
TABLE III.
ESC performance on the clinician-annotated dataset
| Model | Acc. vs. Exp. 1 | Acc. vs. Exp. 2 | Acc. vs. Exp. 3 | F1 vs. Exp. 1 | F1 vs. Exp. 2 | F1 vs. Exp. 3 |
|---|---|---|---|---|---|---|
| ResNet-50 | 87.82 | 88.37 | 85.14 | 85.99 | 86.25 | 84.38 |
| ViT (86M) | 90.17 | 90.34 | 87.15 | 88.25 | 88.12 | 85.99 |
| 2D CNN | 87.15 | 87.78 | 83.99 | 84.81 | 85.16 | 82.71 |
| 1D CNN | 84.33 | 84.25 | 80.01 | 80.94 | 80.29 | 77.97 |
| TST | 87.12 | 87.57 | 83.34 | 84.16 | 85.10 | 82.25 |
Out of the 4,235 iEEG channels annotated by clinicians for final model evaluation, only 3,225 channels showed full agreement among the three experts, where all experts labeled the channel as either seizure or non-seizure. For further analysis, we evaluated the performance of the top-performing models from the 23-patient test set on these channels with unanimous expert agreement. Accuracy and F1 scores for these channels are shown in Figure 7. The ViT (86M) again showed superior performance, achieving a seizure classification accuracy of 96% and an F1 score of 95%, outperforming the other models.
Fig. 7.

ESC model performance on iEEG channels within the clinician-annotated dataset where all experts agreed on the labels.
B. SOD Model Performance
1). Performance on Test Dataset:
In the SOD task, ViT (86M) with augmentation 2 achieved the best performance overall, with an average MAE between model predictions and human labels of 1.4 seconds (across the 5 folds).
Data augmentation led to improved model performance, with augmentation 2 yielding the greatest gains. The average MAE with augmentation 2 was generally lower than that of the original data for both image-based and time-series-based models, reaching statistical significance for image-based models (Wilcoxon rank-sum test, Bonferroni-corrected p < 0.005). Additionally, the average MAE for augmentation 2 was lower than that of augmentation 1 across both image-based and time-series-based models, with a significant difference observed in the case of ResNet-50 (Wilcoxon rank-sum test, Bonferroni-corrected p < 0.005). These results are also shown in Figure 8. Additional details on the SOD model performance across five folds under varying data augmentation and weight initialization conditions are provided in Appendix C of the Supplementary Materials.
Fig. 8.

Performance of the SOD models on original data compared to two types of data augmentation. Augmentation 2 consistently produced lower median absolute errors across all models. Error bars represent the average median absolute errors ± standard deviation (in seconds) between model predictions and human-labeled seizure onset times across five folds.
2). Performance on Clinician-Annotated Dataset:
We also evaluated the performance of different models on the clinician-annotated test dataset, in which three clinicians board-certified in Neurology and Epilepsy labeled seizure onset times on the subset of the 4,235 iEEG channels (described for ESC) that contained seizures. Of these, we identified 1,231 channels for which seizure onset time labels from all three experts were available; in the remaining cases, some experts were uncertain about the precise onset times, leaving those channels unlabeled. The performance of our models on this subset is shown in Table IV, which reports the MAE of each model's detected seizure onset times relative to each of the three experts. Both the image-based and time-series-based models shown here were trained with augmentation 2; the image-based models used color spectrogram inputs and were initialized with ImageNet weights. We defined inter-expert variability as the range of expert-labeled seizure onset times, calculated as the absolute difference between the maximum and minimum onset times for each individual iEEG channel across the three experts. Inter-expert variability reflects the level of consensus among experts, with lower values indicating higher consensus. The value "any" in the inter-expert variability column indicates that all 1,231 iEEG channels were included; the value "≤ 1 second" indicates that only channels with inter-expert variability of at most 1 second were included. When all 1,231 channels were included in the analysis, the ViT (86M) model achieved the lowest MAE across all three expert comparisons.
Furthermore, when inter-expert variability was constrained to at most 1 second (high consensus), the ViT (86M) model again showed superior performance, achieving an MAE of 0.8 seconds between the model-predicted seizure onset times and the expert labels. Figure 9 plots MAE against inter-expert variability for the best-performing ViT (86M) model. The three curves represent the MAE between the model and each expert as the allowed inter-expert variability varies. For example, a point at 20 seconds on the x-axis of this cumulative plot indicates that the analysis included only those iEEG channels with inter-expert variability of at most 20 seconds. The curves show that the MAE between the model and the experts was lowest when expert consensus was strongest; on channels where all experts labeled the same seizure onset time (inter-expert variability = 0), the MAE between model and experts was 0.6 seconds.
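The variability computation and the consensus filter reduce to a per-channel range over the three expert labels, as sketched below with invented onset values:

```python
import numpy as np

# Per-channel inter-expert variability: the range of the three expert
# onset labels (seconds). Channels pass the filter if the range is
# within the chosen cutoff. Onset values here are invented examples.
expert_onsets = np.array([
    [10.0, 10.5, 10.25],   # high consensus  -> range 0.5 s
    [25.0, 44.0, 30.00],   # low consensus   -> range 19.0 s
    [ 5.0,  5.0,  5.00],   # perfect agreement -> range 0.0 s
])
variability = expert_onsets.max(axis=1) - expert_onsets.min(axis=1)
keep = variability <= 1.0   # the "≤ 1 second" high-consensus subset
print(variability.tolist(), keep.tolist())  # [0.5, 19.0, 0.0] [True, False, True]
```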
TABLE IV.
SOD performance on the clinician-annotated dataset
| Model type | Inter-expert variability | Model vs. Expert 1 | Model vs. Expert 2 | Model vs. Expert 3 |
|---|---|---|---|---|
| ResNet-50 | any | 2.40 | 1.95 | 2.70 |
| ViT (49M) | any | 2.90 | 2.60 | 3.20 |
| ViT (86M) | any | 2.26 | 1.79 | 2.68 |
| 1D CNN | any | 9.43 | 11.92 | 13.6 |
| Time-series transformer | any | 8.6 | 8.5 | 9.8 |
| ResNet-50 | ≤ 1 second | 1.2 | 1.2 | 1.2 |
| ViT (49M) | ≤ 1 second | 1.5 | 1.5 | 1.5 |
| ViT (86M) | ≤ 1 second | 0.91 | 0.83 | 0.83 |
| 1D CNN | ≤ 1 second | 10.35 | 10.33 | 10.41 |
| Time-series transformer | ≤ 1 second | 8.36 | 8.32 | 8.38 |
Fig. 9.

MAE versus inter-expert variability for the ViT (86M) model. The three curves represent the MAE between the model and each expert as inter-expert variability in labeling seizure onset times is allowed to vary. The model achieves best performance when there is a high level of consensus among the experts.
IV. Discussion
In this study, we investigated various image- and time-series-based model architectures to improve electrographic seizure classification and seizure onset detection in iEEG recordings, addressing the time-intensive nature of manual review. The results underscore the effectiveness of image-based models, particularly ViT (86M) and ResNet-50, in detecting seizure activity. Notably, the ViT (86M) turned out to be the best performer for both ESC and SOD tasks.
The significant performance difference between image-based and time-series-based models underscores the benefit of using a 2D structure to represent iEEG data. While both types of models analyze temporal information, image-based models process the data as a 2D representation, such as a spectrogram, where one axis represents time and the other represents frequency or another derived feature. This allows them to capture complex patterns across both time and frequency, which time-series models, focusing strictly on the sequence of data over time, might miss. Thus, transforming iEEG data into image representations may improve interpretability and lead to more accurate seizure classification.
Although a clustering-based workflow (as described in [17]) was used to assist the labeling of iEEG recordings for the training dataset (113 patients), manual verification was performed to correct any labels in the clusters that appeared inconsistent. The clustering pipeline operated entirely independently of the supervised ESC models: the ESC models were trained only after manual labeling was finalized and shared no inputs, interactions, or features with the clustering process. This clustering method enabled manual labeling of 136,878 iEEG recordings in 320 hours, approximately five times faster than labeling the same dataset without clustering. Our model was evaluated on a completely independent clinician-annotated test dataset from 100 patients (as shown in Figure 4), in which all iEEG recordings were manually labeled by three board-certified neurologists without clustering assistance. Thus, there was no information leakage between the clustering pipeline used to assist labeling and the model training and evaluation processes.
In the SOD task, data augmentation improved model performance by increasing the number of available iEEG channels. Techniques such as translational shifting of seizure onsets and appending baseline non-seizure activity to them helped the models better capture variability in seizure presentations. Data augmentation was applied only to the SOD task because its labeled dataset was limited in size (7,808 iEEG channels), which necessitated augmentation for effective model training. In contrast, the ESC task had a substantially larger dataset (approximately 550,000 labeled iEEG channels), enabling strong model performance without augmentation. Future work may explore augmentation for ESC to further enhance generalization.
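The two augmentation techniques above can be sketched as follows. The window length, shift range, and the rule for prepending baseline activity are hypothetical choices for illustration, not the exact parameters of our training pipeline.

```python
import numpy as np

def augment_onset_window(channel, onset_idx, win_len, baseline, max_shift, rng):
    """One augmented (window, onset-label) pair for SOD training.

    Illustrative versions of the two techniques from the text:
    1. translational shift: move the analysis window by a random offset
       so the onset lands at a different position in the window;
    2. baseline prepending: pad the start with non-seizure samples when
       the shifted window would begin before the recording.
    """
    shift = int(rng.integers(-max_shift, max_shift + 1))
    start = onset_idx - win_len // 2 + shift
    if start < 0:
        pad = baseline[len(baseline) + start:]           # -start baseline samples
        window = np.concatenate([pad, channel[:win_len + start]])
    else:
        window = channel[start:start + win_len]
    return window, onset_idx - start                     # onset position in window

rng = np.random.default_rng(42)
channel = np.arange(2000, dtype=float)   # stand-in iEEG channel
baseline = np.zeros(1000)                # stand-in non-seizure activity
window, onset = augment_onset_window(channel, 100, 500, baseline, 200, rng)
print(len(window), onset)
```

Repeating this with different random shifts yields many distinct training windows per labeled onset.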
For ESC, the large dataset enabled a clear train/test split with cross-validation on the training set. In contrast, the smaller SOD dataset warranted cross-validation without a held-out test set. Given the complex and subjective nature of seizure onset time labeling, final SOD performance was only evaluated on a separate clinician-annotated dataset to ensure reliability.
We observed that models initialized with ImageNet weights performed better than those with random initialization, highlighting the benefit of pre-trained weights from related domains. In future work, we plan to use weights derived from the ESC models as a starting point for the SOD task. Given the similarity between these tasks, we expect that this transfer learning approach may further enhance model performance.
We observed that both the ESC and SOD models performed substantially better when the three experts agreed more strongly in their labels. For SOD in particular, strong inter-expert agreement on seizure onset times indicated that the onsets were clearer and more straightforward to identify, and this consensus was reflected in model performance, as shown in Figure 9.
In our SOD analysis, we chose median absolute error (MAE) as the performance metric due to the inter-expert variability in labels. The median provides a robust measure of typical model performance, minimizing the influence of outliers and extreme disagreements between experts. In some iEEG channels, experts disagreed on seizure onset labels by as much as 80 seconds. By focusing on the median, we ensured that our metric accurately represented the capability of the models to capture well-defined seizure onset times, thereby serving as a reliable indicator of their effectiveness.
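The metric, together with the agreement-based filtering used in Figure 9, can be expressed compactly. The function names and the toy onset times below are illustrative; note how one channel with an 80-second disagreement barely moves the median.

```python
import numpy as np

def median_abs_error(pred, ref):
    """Median absolute error: robust to the large (up to ~80 s)
    inter-expert disagreements described in the text."""
    return float(np.median(np.abs(np.asarray(pred) - np.asarray(ref))))

def mae_vs_experts(pred, expert_onsets, max_spread=None):
    """MAE against each expert, optionally restricted to channels where
    the experts agree to within `max_spread` seconds (an illustrative
    reconstruction of the Figure 9 style of analysis)."""
    pred = np.asarray(pred, dtype=float)
    experts = np.asarray(expert_onsets, dtype=float)     # (n_experts, n_channels)
    keep = np.ones(pred.shape, dtype=bool)
    if max_spread is not None:
        keep = (experts.max(axis=0) - experts.min(axis=0)) <= max_spread
    return [median_abs_error(pred[keep], e[keep]) for e in experts]

# Toy example: 5 channels, 3 experts, one channel with 80 s disagreement
pred    = np.array([10.0, 21.0, 30.0, 41.0, 50.0])
experts = np.array([[10.0, 20.0, 30.0, 40.0, 50.0],
                    [10.5, 20.0, 30.5, 40.0, 130.0],
                    [10.0, 20.5, 30.0, 40.5, 50.5]])
print(mae_vs_experts(pred, experts, max_spread=1.0))  # [0.5, 0.75, 0.25]
```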
Due to the large number of trainable parameters in our models (e.g., ViT with 86M and ResNet-50 with 23M) and the substantial amount of training data, each ESC training run required 1–2 days per fold on high-performance GPUs, making multiple runs per fold computationally infeasible in our final evaluation. However, preliminary experiments with repeated training using different initializations showed consistent performance. Additionally, we assessed model performance across various conditions, including pretrained vs. random initialization and color vs. grayscale spectrogram inputs. These results support the stability and reliability of our reported performance metrics.
In our study, we evaluated a diverse set of models including ResNet-50, 1D and 2D CNNs, time-series transformers, and vision transformers. While other temporal models such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and hybrid CNN-RNN architectures have been widely used for time-series tasks, we did not include them in this work. This decision was informed by prior studies and recent benchmarks showing that transformer-based models typically outperform RNNs in modeling long-range dependencies, while avoiding challenges such as vanishing gradients. Nonetheless, exploring RNN-based and hybrid models remains an interesting direction for future work.
In addition to supervised learning approaches, several alternative methodologies have been proposed for seizure localization and classification that do not rely on expert labels. Notably, Granger causality-based methods have been used to uncover directed connectivity and seizure onset zone localization. For example, high-frequency Granger causality (>80 Hz) has been shown to precede visible ictal activity and align well with clinically defined seizure onset zones, highlighting its potential for early seizure detection and network-level understanding of seizure propagation [29]-[31]. Additionally, unsupervised and domain adaptation techniques using Riemannian geometry have shown promise in cross-subject EEG classification. A recent model leveraging adversarial learning and Riemannian manifold alignment significantly reduced domain shift and improved generalization in seizure prediction and detection tasks [32]. Simultaneous EEG–fMRI studies have also shown the utility of automated methods for categorizing interictal epileptiform discharges into distinct classes, offering an efficient alternative to traditional visual labeling [33]-[36]. It is worth noting that most of these methods rely on high-density recordings (e.g., 20–64 channels), allowing for fine-grained spatial localization. In contrast, our study uses only 4 iEEG channels, limiting spatial resolution but providing a feasible solution for seizure classification in limited-channel or implantable systems. While our focus is on classification rather than localization, these alternative methods are relevant and suggest valuable future directions, especially those that reduce reliance on expert labels and thus mitigate rater bias.
We envision a pipeline in which iEEG records are first processed by an ESC module. When the ESC identifies a seizure, the data are directed to the SOD module, which detects the precise seizure onset time within each iEEG channel. This integrated approach could be important for identifying the channels of earliest seizure onset, knowledge that can inform targeted surgical interventions, such as resections, by focusing on the specific areas responsible for initiating seizures. In the future, we also plan to combine the ESC and SOD models into a single model capable of performing both tasks.
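A minimal sketch of this envisioned pipeline is shown below, with stand-in callables for the trained ESC and SOD models. The 0.5 decision threshold and the model interfaces (a seizure probability per recording; an onset time per channel, or `None` when no clear onset is found) are assumptions for illustration.

```python
def run_pipeline(ieeg_channels, esc_model, sod_model):
    """Chain ESC and SOD: gate on seizure classification, then locate
    per-channel onsets and report the channel of earliest onset."""
    if esc_model(ieeg_channels) < 0.5:                   # ESC gate (assumed threshold)
        return {"seizure": False, "onsets": None, "earliest_channel": None}
    onsets = {ch: sod_model(trace) for ch, trace in ieeg_channels.items()}
    detected = {ch: t for ch, t in onsets.items() if t is not None}
    earliest = min(detected, key=detected.get) if detected else None
    return {"seizure": True, "onsets": onsets, "earliest_channel": earliest}

# Toy stand-ins for the trained models
esc_stub = lambda rec: 0.9                   # pretend seizure probability
sod_stub = lambda trace: float(trace[0])     # pretend onset time in seconds
record = {"ch1": [4.2, 0.0], "ch2": [1.4, 0.0], "ch3": [2.8, 0.0], "ch4": [3.1, 0.0]}
result = run_pipeline(record, esc_stub, sod_stub)
print(result["earliest_channel"])  # ch2
```

With the four-channel RNS recordings used here, the earliest-onset channel is the pipeline's candidate for guiding targeted intervention.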
V. Conclusion
We have shown that image-based models using time-frequency spectrogram inputs, particularly vision transformers, are well-suited for the tasks of seizure classification and seizure onset detection, outperforming their time-series counterparts. We aim to extend the applicability of our models to other types of epilepsy, such as generalized epilepsy, as our current assessment has primarily focused on iEEG recordings from patients with focal epilepsy. Additionally, we aim to develop models capable of distinguishing different electrographic seizure morphologies, rather than merely distinguishing between seizure and non-seizure events. This enhancement would enable a more nuanced understanding of seizure characteristics and potentially lead to more personalized treatment approaches for patients.
Acknowledgment
MFA, WB, TKT, DG, CS, and MJM have equity ownership/stock options with NeuroPace and are employees of NeuroPace. SAD has equity ownership/stock options with NeuroPace and is a former employee of NeuroPace. JK, SWB, and CBT have received support from and served as paid consultants for NeuroPace. EJK has no conflicts of interest to disclose. This research was supported in part by the National Institutes of Health (NIH) under Grant Nos. UH3NS109557 and R61NS125568.
References
- [1] Inoue Y and Mihara T, "Awareness and Responsiveness During Partial Seizures," Epilepsia, vol. 39, no. S5, pp. 7–10, May 1998, doi: 10.1111/j.1528-1157.1998.tb05142.x.
- [2] Blum D et al., "Patient awareness of seizures," Neurology, vol. 47, no. 1, pp. 260–264, Jul. 1996, doi: 10.1212/wnl.47.1.260.
- [3] Kerling F et al., "When Do Patients Forget Their Seizures? An Electroclinical Study," Epilepsy & Behavior, vol. 9, no. 2, pp. 281–285, Sep. 2006, doi: 10.1016/j.yebeh.2006.05.010.
- [4] Hoppe C et al., "Epilepsy," Archives of Neurology, vol. 64, no. 11, p. 1595, Nov. 2007, doi: 10.1001/archneur.64.11.1595.
- [5] Litt B and Echauz J, "Prediction of epileptic seizures," The Lancet Neurology, vol. 1, no. 1, pp. 22–30, May 2002, doi: 10.1016/s1474-4422(02)00003-0.
- [6] Tzallas A et al., "Epileptic Seizure Detection in EEGs Using Time–Frequency Analysis," IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 5, pp. 703–710, Sep. 2009, doi: 10.1109/titb.2009.2017939.
- [7] Gotman J, "Automatic Detection of Seizures and Spikes," Journal of Clinical Neurophysiology, vol. 16, no. 2, pp. 130–140, Mar. 1999, doi: 10.1097/00004691-199903000-00005.
- [8] Amin H et al., "A novel approach based on wavelet analysis and arithmetic coding for automated detection and diagnosis of epileptic seizure in EEG signals using machine learning techniques," Biomedical Signal Processing and Control, vol. 56, p. 101707, Feb. 2020, doi: 10.1016/j.bspc.2019.101707.
- [9] Perez-Sanchez A et al., "Epileptic Seizure Prediction Using Wavelet Transform, Fractal Dimension, Support Vector Machine, and EEG Signals," Fractals, Jul. 2022, doi: 10.1142/s0218348x22501547.
- [10] Brabandere A et al., "Detecting Epileptic Seizures Using Hand-Crafted and Automatically Constructed EEG Features," IEEE Transactions on Biomedical Engineering, vol. 71, no. 1, pp. 318–325, Jan. 2024, doi: 10.1109/TBME.2023.3299821.
- [11] Boonyakitanont P et al., "A review of feature extraction and performance evaluation in epileptic seizure detection using EEG," Biomedical Signal Processing and Control, vol. 57, p. 101702, Mar. 2020, doi: 10.1016/j.bspc.2019.101702.
- [12] Siddiqui M et al., "A review of epileptic seizure detection using machine learning classifiers," Brain Informatics, vol. 7, no. 1, p. 5, May 2020, doi: 10.1186/s40708-020-00105-1.
- [13] Mirowski P et al., "Classification of patterns of EEG synchronization for seizure prediction," Clinical Neurophysiology, vol. 120, no. 11, pp. 1927–1940, Nov. 2009, doi: 10.1016/j.clinph.2009.09.002.
- [14] Shiao H et al., "SVM-Based System for Prediction of Epileptic Seizures from iEEG Signal," IEEE Transactions on Biomedical Engineering, vol. 64, no. 5, pp. 1011–1022, May 2017, doi: 10.1109/TBME.2016.2586475.
- [15] Xu Y et al., "An End-to-End Deep Learning Approach for Epileptic Seizure Prediction," Aug. 2020, doi: 10.1109/aicas48895.2020.9073988.
- [16] Desai S et al., "Expert and deep learning model identification of iEEG seizures and seizure onset times," Frontiers in Neuroscience, vol. 17, Jul. 2023, doi: 10.3389/fnins.2023.1156838.
- [17] Barry W et al., "A High Accuracy Electrographic Seizure Classifier Trained Using Semi-Supervised Labeling Applied to a Large Spectrogram Dataset," Frontiers in Neuroscience, vol. 15, Jun. 2021, doi: 10.3389/fnins.2021.667373.
- [18] Eberlein M et al., "Convolutional Neural Networks for Epileptic Seizure Prediction," 2018. Accessed: Jan. 26, 2020. [Online]. Available: https://arxiv.org/pdf/1811.00915
- [19] Jia M et al., "Efficient graph convolutional networks for seizure prediction using scalp EEG," Frontiers in Neuroscience, vol. 16, Aug. 2022, doi: 10.3389/fnins.2022.967116.
- [20] Truong N et al., "Convolutional neural networks for seizure prediction using intracranial and scalp electroencephalogram," Neural Networks, vol. 105, pp. 104–111, Sep. 2018, doi: 10.1016/j.neunet.2018.04.018.
- [21] Ma Y et al., "A Multi-Channel Feature Fusion CNN-Bi-LSTM Epilepsy EEG Classification and Prediction Model Based on Attention Mechanism," IEEE Access, vol. 11, pp. 62855–62864, Jan. 2023, doi: 10.1109/access.2023.3287927.
- [22] Wu X et al., "An end-to-end seizure prediction approach using long short-term memory network," Frontiers in Human Neuroscience, vol. 17, May 2023, doi: 10.3389/fnhum.2023.1187794.
- [23] Potter I et al., "Unsupervised Multivariate Time-Series Transformers for Seizure Identification on EEG," 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1304–1311, Dec. 2022, doi: 10.1109/icmla55696.2022.00208.
- [24] Busia P et al., "EEGformer: Transformer-Based Epilepsy Detection on Raw EEG Traces for Low-Channel-Count Wearable Continuous Monitoring Devices," 2022 IEEE Biomedical Circuits and Systems Conference (BioCAS), Taipei, Taiwan, 2022, pp. 640–644, doi: 10.1109/BioCAS54905.2022.9948637.
- [25] Zhang X and Li H, "Patient-Specific Seizure Prediction from Scalp EEG Using Vision Transformer," 2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 2022, pp. 1663–1667, doi: 10.1109/ITOEC53115.2022.9734546.
- [26] Hussein R et al., "Multi-Channel Vision Transformer for Epileptic Seizure Prediction," Biomedicines, vol. 10, no. 7, p. 1551, Jun. 2022, doi: 10.3390/biomedicines10071551.
- [27] Jarosiewicz B and Morrell M, "The RNS System: brain-responsive neurostimulation for the treatment of epilepsy," Expert Review of Medical Devices, vol. 18, no. 2, pp. 129–138, Sep. 2020, doi: 10.1080/17434440.2019.1683445.
- [28] Dosovitskiy A et al., "An image is worth 16x16 words: Transformers for image recognition at scale," Jun. 2021. [Online]. Available: https://arxiv.org/pdf/2010.11929
- [29] Adhikari B et al., "Localizing Epileptic Seizure Onsets with Granger Causality," Physical Review E, vol. 88, no. 3, Sep. 2013, doi: 10.1103/physreve.88.030701.
- [30] Epstein C et al., "Application of High-Frequency Granger Causality to Analysis of Epileptic Seizures and Surgical Decision Making," Epilepsia, vol. 55, no. 12, pp. 2038–2047, Nov. 2014, doi: 10.1111/epi.12831.
- [31] Protopapa F et al., "Children with well controlled epilepsy possess different spatio-temporal patterns of causal network connectivity during a visual working memory task," Cognitive Neurodynamics, vol. 10, no. 2, pp. 99–111, 2016, doi: 10.1007/s11571-015-9373-x.
- [32] Peng P et al., "Domain Adaptation for Epileptic EEG Classification Using Adversarial Learning and Riemannian Manifold," Biomedical Signal Processing and Control, vol. 75, p. 103555, May 2022, doi: 10.1016/j.bspc.2022.103555.
- [33] Bénar C et al., "Quality of EEG in Simultaneous EEG-fMRI for Epilepsy," Clinical Neurophysiology, vol. 114, no. 3, pp. 569–580, Mar. 2003, doi: 10.1016/s1388-2457(02)00383-8.
- [34] Bénar C et al., "EEG–fMRI of Epileptic Spikes: Concordance with EEG Source Localization and Intracranial EEG," NeuroImage, vol. 30, no. 4, pp. 1161–1170, May 2006, doi: 10.1016/j.neuroimage.2005.11.008.
- [35] Pedreira C et al., "Classification of EEG Abnormalities in Partial Epilepsy with Simultaneous EEG-fMRI Recordings," NeuroImage, vol. 99, pp. 461–476, Oct. 2014, doi: 10.1016/j.neuroimage.2014.05.009.
- [36] Galaris E et al., "Electroencephalography Source Localization Analysis in Epileptic Children during a Visual Working-Memory Task," International Journal for Numerical Methods in Biomedical Engineering, vol. 36, no. 12, p. e3404, Dec. 2020, doi: 10.1002/cnm.3404.