Abstract
Objective:
Seizures are a harmful complication of acute intracerebral hemorrhage (ICH). “Early” seizures in the first week after ICH are a risk factor for deterioration, later seizures, and herniation. Ideally, seizure medications after ICH would be administered only to patients with a high likelihood of having seizures. We developed and validated machine-learning (ML) models to predict early seizures after ICH.
Design:
We used two large datasets to train our models and then validate them in an entirely independent test set. The first model (“CAV”) predicted early seizures from a subset of the variables in the CAVE score (a prediction rule for late seizures): cortical hematoma location, age less than 65 years, and hematoma volume greater than 10 mL. Early seizure, the remaining CAVE component, was the dependent variable. We attempted to improve upon the “CAV” model by adding anti-coagulant use, anti-platelet use, Glasgow Coma Scale, international normalized ratio, and systolic blood pressure (“CAV+”). For each set of predictors we fit logistic regression, lasso (regularized) regression, support vector machine, boosted tree (Xgboost), and Random Forest models. Final model performance was reported as the area under the receiver operating characteristic curve (AUC) in the test data.
Setting:
Two large academic institutions.
Patients:
864 survivors of ICH – 634 from Institution 1 and 230 from Institution 2.
Interventions:
None.
Measurements and Main Results:
In the test data, the CAV model predicted early seizures similarly across ML methods (AUC 0.72, 95% CI 0.62–0.82). With Xgboost, CAV+ had the greatest performance and significantly improved on CAV (AUC 0.79, 95% CI 0.71–0.87; p=0.04 compared with the CAV model AUC).
Conclusions:
Early seizures after ICH are predictable. Models utilizing cortical hematoma location, age less than 65 years, and hematoma volume greater than 10 mL had very good accuracy, and performance improved with more independent variables. Additional methods to predict seizures could improve patient selection for monitoring and prophylactic seizure medications.
Indexing terms: seizures, epilepsy, cerebral hemorrhage, machine learning, critical care, neurology
Introduction
Non-traumatic intracerebral hemorrhage (ICH) accounts for approximately 10% of strokes.1 Alongside hematoma expansion, seizures are a serious complication of ICH.2 Early seizures are associated with increased midline shift on CT scan, worse health-related quality of life in survivors, and death.3–5 Preventing seizures is a rational strategy to improve outcomes after ICH, provided the patients most appropriate for treatment can be identified.
There are few strategies to predict seizures after ICH at presentation, before electroencephalography (EEG) can be obtained. Even among patients at high risk for seizures, the incidence of seizures is only about one in six, suggesting that scarce resources could be directed more precisely.6 Cortical hematoma is widely recognized as a risk factor for early seizures after ICH. The CAVE score predicts late seizures (more than one week after ICH) based on cortical hematoma location, age less than 65, volume greater than 10 mL, and early seizure.7 Guidelines do not recommend widespread prophylactic seizure medication, which is generally associated with worse patient outcomes.8–11 A model to predict early seizures could therefore prompt prophylaxis for the patients most likely to have seizures and spare treatment, with its potential side effects, in the patients least likely to benefit. We attempted to predict early seizures in patients with ICH.
Methods
Patient Identification
We used prospectively collected data from two institutions: Northwestern University (Institution 1) and Johns Hopkins University (Institution 2). The methods of patient identification have been described previously.8, 12 A board-certified neurologist confirmed the diagnosis of spontaneous ICH and the hematoma location using each patient’s head computed tomography (CT) and the relevant clinical history. Early seizure was defined as a seizure within 1 week of ICH. Seizures were defined by a characteristic clinical presentation observed by a clinician and reviewed by a study neurologist. Patients with trauma, hemorrhagic conversion of ischemic stroke, or structural lesions (e.g., tumor) were excluded. Patients with ICH were screened for inclusion from November 2006 to April 2021.
Standard Protocol Approvals, Registrations, and Patient Consents
Data collection for the registry study was approved separately by the Northwestern University Institutional Review Board and the Johns Hopkins University Institutional Review Board (IRB), as previously reported.12, 13 Patients or a legally appointed representative gave written consent, except when a patient unable to consent had no legally appointed representative, in which case the IRB approved an exemption. Decedents were also exempt from written consent.
Variables
Patient-level characteristics collected at both Institutions during the index hospital admission included early seizures, hematoma location, patient age, hematoma volume (by the ABC/2 method), pre-ICH anti-coagulant use, pre-ICH anti-platelet use, Glasgow Coma Scale (GCS), international normalized ratio (INR), and initial systolic blood pressure.14, 15 INR and systolic blood pressure were recorded as continuous variables. The predictor variables were collected prospectively at initial presentation. Only patients with complete observations were included in the analysis.
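The ABC/2 volume estimate treats the hematoma as an ellipsoid, with volume ≈ A × B × C / 2. The Python sketch below is a minimal illustration, assuming A (the largest hemorrhage diameter on an axial slice), B (the diameter perpendicular to A on the same slice), and C (the craniocaudal extent) are already measured in centimetres. The slice-weighting rule that the full method uses to derive C is omitted, and the function names are hypothetical rather than taken from the study's code.

```python
def abc2_volume_ml(a_cm: float, b_cm: float, c_cm: float) -> float:
    """Approximate hematoma volume with the ABC/2 ellipsoid formula.

    a_cm: largest hemorrhage diameter on the axial slice with the largest bleed
    b_cm: diameter perpendicular to A on that same slice
    c_cm: craniocaudal extent of the hemorrhage (assumed already computed)
    Since 1 cm^3 equals 1 mL, the result is in millilitres.
    """
    if min(a_cm, b_cm, c_cm) < 0:
        raise ValueError("diameters must be non-negative")
    return a_cm * b_cm * c_cm / 2.0


def volume_over_10_ml(a_cm: float, b_cm: float, c_cm: float) -> bool:
    """Dichotomize volume at the CAVE threshold (strictly greater than 10 mL)."""
    return abc2_volume_ml(a_cm, b_cm, c_cm) > 10.0


# Example: a 4 x 3 x 2 cm hemorrhage is estimated at 12 mL, above the threshold.
print(abc2_volume_ml(4.0, 3.0, 2.0), volume_over_10_ml(4.0, 3.0, 2.0))
```

Because 1 cm³ equals 1 mL, the result can be compared directly with the 10 mL cut-off; a 4 × 5 × 1 cm bleed (exactly 10 mL) would fall on the "not greater than 10 mL" side.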
We selected the additional CAV+ variables because previous evidence associates them with worse outcomes after ICH and because they were collected at both Institutions. Pre-ICH anti-coagulant use, anti-platelet use, and elevated INR generally predict more intracranial bleeding and worse patient outcomes.16–18 GCS is a component of the ICH score and is associated with increased mortality after ICH.19 Lowering systolic blood pressure is an early treatment goal in ICH, and higher systolic blood pressure is associated with worse outcomes after ICH.20 Other variables, including surgical management of ICH and subarachnoid extension, were considered but not selected given mixed evidence on their role in improving or worsening outcomes in ICH.11, 21 Data on seizures detected by EEG were available for Institution 1 but not for Institution 2, so they were not included. Further variable selection was limited because each model variable had to be available in both cohorts.
Statistical Analysis
Continuous variables that were not normally distributed (by the Shapiro-Wilk test) are presented as median [interquartile range, IQR]; normally distributed variables are presented as mean ± SD. Associations between individual predictors and seizure occurrence were tested with the Kruskal-Wallis H test for continuous variables and the chi-square test for categorical variables.
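For a binary predictor, the chi-square comparison reduces to a 2×2 table of seizure status by predictor status. The sketch below assumes Yates' continuity correction (the default in R's `chisq.test` for 2×2 tables); the Methods do not state whether the correction was applied, and the Kruskal-Wallis comparison for continuous variables is not shown. The counts in the usage lines come from the Institution 1 columns of Table 2.

```python
import math
from typing import Tuple


def chi2_2x2_yates(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Chi-square test with Yates' continuity correction for the table [[a, b], [c, d]].

    Returns (statistic, p_value) on 1 degree of freedom.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Yates: shrink |O - E| by 0.5, never below zero
            diff = max(0.0, abs(observed[i][j] - expected) - 0.5)
            stat += diff ** 2 / expected
    # Upper tail of chi-square(1): P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: seizure, no seizure; columns: cortical, non-cortical (Institution 1, Table 2).
stat, p = chi2_2x2_yates(34, 52 - 34, 200, 582 - 200)
print(round(stat, 1), p < 0.001)
```

With these counts the statistic is about 18.4 and p < 0.001 for cortical location, in line with Table 2; the same function gives p = 1 for age <65 years (26/26 vs. 290/292), where observed and expected counts differ by less than 0.5.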
Supervised machine learning (ML) models (i.e., models trained on observations whose label, early seizure vs. no early seizure, was known) were trained on data from Institution 1 and tested on independently obtained data from Institution 2. We created two sets of models to compare: (1) early seizure prediction from the components of the CAVE score other than early seizure, which is the dependent variable, i.e., cortical hematoma location, age less than 65 years, and hematoma volume greater than 10 mL (“CAV”); and (2) early seizure prediction from the three CAV predictors plus five additional variables associated with greater morbidity in patients with ICH (“CAV+”): anti-coagulant use, anti-platelet use, Glasgow Coma Scale (GCS), INR, and systolic blood pressure. Within each set we built logistic regression, regularized lasso regression, support vector machine (SVM), extreme gradient boosted tree (Xgboost), and Random Forest models to predict early seizures.
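The two predictor sets can be illustrated as a feature-construction step. In this minimal sketch the record type and all field names are hypothetical; it assumes the CAVE cut-offs are strict (age < 65 years, volume > 10 mL) and that GCS, INR, and systolic blood pressure enter the CAV+ models as raw values, which the Methods do not specify.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class AdmissionRecord:
    """Hypothetical admission record; field names are illustrative only."""
    cortical_location: bool
    age_years: float
    volume_ml: float      # ABC/2 estimate
    anticoagulant: bool   # pre-ICH use
    antiplatelet: bool    # pre-ICH use
    gcs: int              # Glasgow Coma Scale, 3-15
    inr: float
    sbp_mmhg: float       # initial systolic blood pressure


def cav_features(r: AdmissionRecord) -> Dict[str, Union[int, float]]:
    """Three CAVE components as 0/1 indicators; early seizure is the outcome, so it is excluded."""
    return {
        "cortical": int(r.cortical_location),
        "age_lt_65": int(r.age_years < 65),
        "volume_gt_10": int(r.volume_ml > 10),
    }


def cav_plus_features(r: AdmissionRecord) -> Dict[str, Union[int, float]]:
    """CAV indicators plus the five additional admission variables (kept unbinned)."""
    features = cav_features(r)
    features.update(
        anticoagulant=int(r.anticoagulant),
        antiplatelet=int(r.antiplatelet),
        gcs=r.gcs,
        inr=r.inr,
        sbp=r.sbp_mmhg,
    )
    return features


example = AdmissionRecord(True, 58, 12.0, False, True, 14, 1.1, 180.0)
print(cav_features(example))
print(cav_plus_features(example))
```

Keeping early seizure out of both feature sets is the key design point: it is the label the models predict, so including it would leak the outcome into the predictors.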
We used 5-fold cross-validation of the training set, repeated ten times with different random splits, to reduce the chance that any single split would unduly influence the results. We chose 5-fold over 10-fold cross-validation because additional folds increase computational cost with diminishing returns. Model hyperparameters were tuned over random grids. The fitted models were then tested on the observations from Institution 2. Final model performance was reported as the area under the receiver operating characteristic (ROC) curve (AUC) in the test data. We used DeLong’s test to compare the AUCs of the CAV and CAV+ versions of each model type. Variable importance scores were calculated to quantify each variable’s relative influence on seizure prediction, and we report the scores for the most predictive model. Statistical analysis was conducted using freely available software (R v4.0.2, package “tidymodels”).
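The resampling scheme (5 folds, repeated ten times with fresh random splits) can be sketched in a few lines of Python. This is an illustration, not the authors' tidymodels code (in tidymodels this scheme is usually requested with rsample's `vfold_cv(v = 5, repeats = 10)`); it assumes no stratification by outcome, which the Methods do not mention, and the function name is hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(
    n_obs: int, k: int = 5, repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, analysis_idx, assessment_idx) for repeated k-fold CV.

    Each repeat reshuffles the row indices, then deals them into k folds of
    near-equal size; every fold serves once as the assessment set while the
    other k - 1 folds form the analysis (fitting) set.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_obs))
        rng.shuffle(idx)  # a fresh random split for every repeat
        folds = [idx[f::k] for f in range(k)]
        for f, assessment in enumerate(folds):
            held_out = set(assessment)
            analysis = [i for i in idx if i not in held_out]
            yield r, f, analysis, assessment


# 634 training patients (Institution 1) -> 10 repeats x 5 folds = 50 resamples.
splits = list(repeated_kfold(634))
print(len(splits), sorted({len(s[3]) for s in splits}))  # 50 [126, 127]
```

In a typical tuning workflow, each hyperparameter candidate from the random grid is fit on the 50 analysis sets and scored on the matching assessment sets; only the selected configuration is then evaluated on the Institution 2 test data.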
Data Availability
Anonymized data and associated documentation will be made available by request from any qualified investigator for the purposes of reproducing the analysis.
Results
Nine hundred and thirty-one patients with ICH were screened for inclusion. On retrospective review of the data, 930 had a recorded seizure outcome, and 864 of these had no missing patient-level characteristics (Figure 1). Of the 864 patients with ICH, 634 were from Institution 1 and 230 were from Institution 2. Sex, age, and ethnicity were similar between the two Institutions. Detailed demographics of each dataset are shown in Table 1.
Figure 1:

CONSORT diagram of patient flow
Table 1.
Demographics of Patients from Institution 1 and Institution 2
| Variable | Institution 1 (N = 634), N (%) or mean ± SD | Institution 2 (N = 230), N (%) or mean ± SD |
|---|---|---|
| Age, years | 65.3 ± 14.1 | 64.1 ± 15.6 |
| Male sex | 342 (53.9%) | 129 (56.1%) |
| Ethnicity* | | |
| Black or African American | 256 (40.4%) | 115 (50.0%) |
| Not Black or African American | 376 (59.3%) | 115 (50.0%) |
| White | 338 (53.5%) | |
| Asian | 24 (3.8%) | |
| Native Pacific Islander | 11 (1.7%) | |
| American Indian/Native Alaskan | 3 (0.5%) | |
| Hispanic or Latino** | 53 (8.4%) | |
*The number of patients with an identified ethnicity does not equal the total number of Institution 1 patients because of missing data.
**Patients at Institution 2 were not asked to identify as Hispanic or Latino versus not Hispanic.
Early seizures after ICH were identified in 8.2% of patients from Institution 1 (training and validation) and 10% of patients from Institution 2 (test). Univariate associations with early seizures at Institution 1 and Institution 2 are shown in Table 2. As expected, hematoma location was the variable most prominently associated with seizures. The CAV+ Xgboost model ranked hematoma location (0.69), GCS (0.15), and INR (0.15) as the most important predictors of seizure. The fractional contributions of each feature’s importance in the Xgboost model are depicted in Figure 2.
Table 2.
Characteristics of Patients from Institution 1 and Institution 2, stratified by seizure occurrence
Values are N (%) or median [IQR].

| Variable | Institution 1: No seizure (N=582) | Institution 1: Seizure (N=52) | P | Institution 2: No seizure (N=207) | Institution 2: Seizure (N=23) | P |
|---|---|---|---|---|---|---|
| Age <65 years | 290 (49.8) | 26 (50.0) | 1 | 120 (58.0) | 14 (60.9) | 1 |
| ICH volume >10 mL | 341 (58.6) | 35 (67.3) | 0.3 | 119 (57.5) | 18 (78.3) | 0.09 |
| Hematoma location (cortical) | 200 (34.4) | 34 (65.4) | <0.001 | 51 (24.6) | 13 (56.5) | 0.003 |
| Anti-coagulant use | 71 (12.2) | 11 (21.2) | 0.1 | 17 (8.2) | 5 (21.7) | 0.09 |
| Anti-platelet use | 190 (32.6) | 22 (42.3) | 0.2 | 75 (36.2) | 9 (39.1) | 1 |
| GCS | | | 0.02 | | | 0.07 |
| 13–15 | 326 (56.0) | 18 (34.6) | | 128 (61.8) | 7 (30.4) | |
| 5–12 | 183 (31.4) | 28 (53.8) | | 52 (25.1) | 12 (52.2) | |
| 3–4 | 73 (12.5) | 6 (11.5) | | 27 (13.0) | 4 (17.4) | |
| INR | 1.1 [1.0–1.2] | 1.1 [1.0–1.4] | 0.1 | 1.1 [1.0–1.1] | 1.1 [1.0–1.6] | 0.3 |
| Initial SBP, mmHg | 175 [145–211] | 163 [144–201] | 0.07 | 192 [160–220] | 175 [144–210] | 0.08 |
GCS: Glasgow Coma scale, INR: international normalized ratio, SBP: systolic blood pressure
Figure 2:

Feature Importance in the CAV+ Xgboost model
GCS: Glasgow Coma Scale, SBP: Initial systolic blood pressure, INR: International normalized ratio
Apart from SVM, the CAV and CAV+ models predicted early seizures with high discrimination, with AUCs ranging from 0.69 to 0.79 across the logistic regression, lasso regression, Random Forest, and Xgboost models. CAV+ performed significantly better than CAV for the Xgboost model (AUC 0.72, 95% CI 0.62–0.82 vs. 0.79, 95% CI 0.71–0.87; p=0.04) and the lasso regression model (AUC 0.69, 95% CI 0.58–0.80 vs. 0.77, 95% CI 0.68–0.85; p=0.02). The CAV+ Xgboost model predicted seizure with the greatest AUC (0.79, 95% CI 0.71–0.87). Figure 3 shows the ROC curves of the prediction modeling methods for CAV and CAV+, and Table 3 summarizes the AUC results.
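DeLong's test compares two AUCs measured on the same patients by estimating the variance of their difference from each patient's placement values. The pure-Python sketch below illustrates the computation; it is not the implementation used in the study, it assumes at least two patients with and two without early seizures, and the toy scores are invented for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the seizure patient scores higher, 0.5 on ties."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _components(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, List[float], List[float]]:
    """Return the empirical AUC and DeLong's placement values V10 and V01."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(pos), v10, v01


def _var_of_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample variance (n - 1 denominator) of the paired differences a_i - b_i."""
    d = [x - y for x, y in zip(a, b)]
    mean = sum(d) / len(d)
    return sum((v - mean) ** 2 for v in d) / (len(d) - 1)


def delong_test(y_true: Sequence[int], score_a: Sequence[float], score_b: Sequence[float]):
    """Two-sided DeLong test comparing two correlated AUCs on the same patients.

    y_true holds 1 for early seizure and 0 otherwise; score_a and score_b are
    the two models' predicted risks for the same patients, in the same order.
    """
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    auc_a, v10_a, v01_a = _components([score_a[i] for i in pos], [score_a[i] for i in neg])
    auc_b, v10_b, v01_b = _components([score_b[i] for i in pos], [score_b[i] for i in neg])
    # Var(AUC_a - AUC_b) = Var(V10_a - V10_b) / m + Var(V01_a - V01_b) / n
    var = _var_of_difference(v10_a, v10_b) / len(pos) + _var_of_difference(v01_a, v01_b) / len(neg)
    if var == 0.0:
        if auc_a == auc_b:  # e.g. identical scores: no difference to test
            return auc_a, auc_b, 0.0, 1.0
        raise ValueError("zero variance: DeLong test undefined for these scores")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p


y = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.35]
model_b = [0.6, 0.5, 0.2, 0.7, 0.3, 0.1, 0.4, 0.55]
auc_a, auc_b, z, p = delong_test(y, model_a, model_b)
print(round(auc_a, 3), round(auc_b, 3), round(z, 2), round(p, 3))
```

The AUC here is the Mann-Whitney estimate (the proportion of seizure/no-seizure pairs ranked correctly, counting ties as one half), which equals the area under the empirical ROC curve; because both models are scored on the same patients, the covariance between their placement values is what distinguishes this paired test from comparing two independent AUCs.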
Figure 3:

Receiver Operating Characteristic Curves for predicting early seizures after ICH.
Red line represents logistic regression; Green line represents Random Forest; Dark blue line represents lasso regression; Teal blue line represents Xgboost
CAV (left plot): cortical hematoma, age <65, volume >10mL; CAV+ (right plot): CAV, anti-platelet use, anti-coagulant use, Glasgow Coma Scale, International normalized ratio, systolic blood pressure
Table 3.
Area under the receiver operating characteristic curve (AUC) for each set of models
| Model Type | CAV (95% CI) | CAV+ (95% CI) | P for comparison |
|---|---|---|---|
| Logistic Regression | 0.72 (0.61–0.82) | 0.73 (0.62–0.83) | 0.8 |
| Lasso Regression | 0.69 (0.58–0.80) | 0.77 (0.68–0.85) | 0.02 |
| Support Vector Machines | 0.50 (0.50–0.50) | 0.52 (0.38–0.67) | 0.8 |
| Random Forest | 0.70 (0.60–0.80) | 0.78 (0.68–0.88) | 0.2 |
| Xgboost | 0.72 (0.62–0.82) | 0.79 (0.71–0.87) | 0.04 |
CAV: cortical hematoma, age <65 years, volume >10mL; CAV+: CAV variables plus anti-platelet use, anti-coagulant use, Glasgow Coma Scale, international normalized ratio on admission, systolic blood pressure
Discussion
We found that early seizures were predictable after ICH. A model including cortical hematoma location, age less than 65, and hematoma volume greater than 10 mL (i.e., the CAVE score minus early seizures, “CAV”) had very good accuracy for predicting early seizures.7 Adding anti-coagulant use, anti-platelet use, GCS, INR, and systolic blood pressure, and using machine learning techniques other than SVM and non-regularized logistic regression (e.g., Xgboost), improved prediction to the “excellent” range (AUC ~0.8) when the model was tested using data from a separate Institution. The feature importance analysis of the Xgboost model shows the relative influence of each variable on the outcome of interest, in this case early seizures. Consistent with the CAVE model, hematoma location was the most important feature, followed by GCS and INR, suggesting that location plays a larger role in predicting early seizure than the other variables. Adding these variables, which are associated with worse patient outcomes after ICH, significantly improved the performance of the Xgboost and lasso regression models. These data and models are a further step towards identifying patients who may benefit from early monitoring and treatment of seizures after ICH using additional clinical variables.
In our study we used five types of models: logistic regression, lasso regression, support vector machines, random forest, and Xgboost. We started with logistic regression because it is a standard technique previously used in research on prediction models.7 We included lasso regression as a regularized form of logistic regression, and an SVM that used a nonlinear radial basis function kernel to create a decision boundary. Random forest and Xgboost are tree-based methods, which we used because of their performance with categorical predictors, their resistance to overfitting (i.e., failure to generalize to independent test data sets), and their accommodation of multiple independent variables.22–24 We did not apply nearest neighbor methods because of the predominance of categorical predictors in our dataset and the reliance of these methods on Euclidean distances, for which continuous variables are more appropriate. Nearest neighbor methods may be helpful for predicting other dichotomous events in patients with acute ICH (e.g., likelihood of hematoma expansion).25
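The concern about Euclidean distance with mostly binary predictors can be made concrete. In this sketch, with hypothetical patients and no feature scaling, a 20 mmHg difference in systolic blood pressure outweighs disagreement on all three CAV indicators, and binary-only profiles can be separated by only a handful of distinct distances, so nearest-neighbour rankings tie often.

```python
import math
from itertools import product
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain (unscaled) Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical patients as [cortical, age_lt_65, volume_gt_10, sbp_mmhg].
ref = [1, 1, 1, 180.0]
opposite_profile = [0, 0, 0, 180.0]  # differs on every binary CAV predictor
same_profile = [1, 1, 1, 200.0]      # identical CAV profile, SBP 20 mmHg higher

print(euclidean(ref, opposite_profile))  # sqrt(3), about 1.73
print(euclidean(ref, same_profile))      # 20.0: the unscaled SBP difference dominates

# With the three binary predictors alone, distances can only be sqrt(0), ..., sqrt(3),
# so many candidate "nearest" neighbours are exactly tied.
binary_profiles = list(product([0, 1], repeat=3))
distinct = {round(euclidean(binary_profiles[0], q), 6) for q in binary_profiles}
print(sorted(distinct))
```

Standardizing the continuous predictors would reduce the first problem but not the second, which is inherent to distances computed over a few 0/1 variables.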
This study has several limitations. The utility of machine learning methods depends on their reproducibility and generalizability in subsequent cohorts. To mitigate these concerns, we trained the models with 5-fold cross-validation repeated ten times to reduce overfitting, and our use of an independent test set suggests that the models generalize well. Finally, our models were trained on hundreds of patient observations, whereas machine learning models typically perform better with thousands. The size of our dataset is constrained by the incidence of ICH at two large academic centers. Additional data might improve performance further, although performance may ultimately be limited by the clinically observable variables.
The results of this study may inform improved prediction models that incorporate EEG data. EEG is now routinely used to detect subclinical seizures after ICH.26 Incorporating EEG information may therefore be informative. Limited EEG montages may be applied quickly after admission and could help focus attention on the patients most likely to have seizures after ICH.6 Quantitative EEG applies mathematical analysis to the EEG as a complementary approach to screening for epileptic seizures.27 The 2HELPS2B score incorporates EEG characteristics as well as prior seizure to predict seizure risk in hospitalized patients generally.28 Like the CAVE score, the 2HELPS2B score includes early seizure as part of the prediction of later seizures. Because early seizures are the outcome of interest, a modification of the 2HELPS2B score that removes early seizures, as we did in this study, may be useful for predicting early seizures.
An additional area of research would be to further delineate how hematoma location predicts early seizures. While a dichotomous description of lobar hematoma location is predictive of early seizures, the risk may not be the same for all areas of the cortex (e.g., the incidence of seizures after ICH in the temporal lobes may differ from that in the occipital lobes).7 Testing such a hypothesis would require neuroimaging mapping or deep learning methods, which were outside the scope of this investigation.
Conclusion
We found that early seizures after ICH were predictable from routinely collected clinical data. Existing models for late seizures may be modifiable to predict early seizures. Validated rules to predict early seizures could improve patient selection for EEG monitoring, prophylactic seizure medications, and treatment of seizures after ICH.
Acknowledgements
All those who meaningfully contributed to the manuscript are included as authors.
Sources of Funding
J.M. is supported in part by T32 LM012203.
R.F. is supported by K23 NS101124.
A.M.N. is supported in part by R01 NS110779 and U01 NS110772.
Footnotes
Disclosures
None
References
1. Virani SS, Alonso A, Aparicio HJ, et al. Heart Disease and Stroke Statistics-2021 Update: A Report From the American Heart Association. Circulation 2021:Cir0000000000000950.
2. Qureshi AI, Tuhrim S, Broderick JP, Batjer HH, Hondo H, Hanley DF. Spontaneous intracerebral hemorrhage. N Engl J Med 2001;344:1450–1460.
3. Sheikh ZB, Stretz C, Maciel CB, et al. Deep Versus Lobar Intraparenchymal Hemorrhage: Seizures, Hyperexcitable Patterns, and Clinical Outcomes. Crit Care Med 2020;48:e505–e513.
4. Vespa PM, O’Phelan K, Shah M, et al. Acute seizures after intracerebral hemorrhage: a factor in progressive midline shift and outcome. Neurology 2003;60:1441–1446.
5. Naidech A, Weaver B, Maas MB, Bleck TP, VanHaerents S, Schuele SU. Early Seizures Are Predictive of Worse Health-Related Quality of Life at Follow-Up After Intracerebral Hemorrhage. Crit Care Med 2021.
6. Vespa PM, Olson DM, John S, et al. Evaluating the Clinical Impact of Rapid Response Electroencephalography: The DECIDE Multicenter Prospective Observational Clinical Study. Crit Care Med 2020;48:1249–1257.
7. Haapaniemi E, Strbian D, Rossi C, et al. The CAVE score for predicting late seizures after intracerebral hemorrhage. Stroke 2014;45:1971–1976.
8. Naidech AM, Garg RK, Liebling S, et al. Anticonvulsant use and outcomes after intracerebral hemorrhage. Stroke 2009;40:3810–3815.
9. Naidech AM, Beaumont J, Muldoon K, et al. Prophylactic Seizure Medication and Health-Related Quality of Life After Intracerebral Hemorrhage. Crit Care Med 2018;46:1480–1485.
10. Messe SR, Sansing LH, Cucchiara BL, et al. Prophylactic antiepileptic drug use is associated with poor outcome following ICH. Neurocrit Care 2009;11:38–44.
11. Hemphill JC 3rd, Greenberg SM, Anderson CS, et al. Guidelines for the Management of Spontaneous Intracerebral Hemorrhage: A Guideline for Healthcare Professionals From the American Heart Association/American Stroke Association. Stroke 2015;46:2032–2060.
12. Faigle R, Marsh EB, Llinas RH, Urrutia VC, Gottesman RF. Novel score predicting gastrostomy tube placement in intracerebral hemorrhage. Stroke 2015;46:31–36.
13. Naidech AM, Beaumont JL, Berman M, et al. Dichotomous “Good Outcome” Indicates Mobility More Than Cognitive or Social Quality of Life. Crit Care Med 2015;43:1654–1659.
14. Kothari RU, Brott T, Broderick JP, et al. The ABCs of measuring intracerebral hemorrhage volumes. Stroke 1996;27:1304–1305.
15. Teasdale G, Jennett B. Assessment of coma and impaired consciousness. A practical scale. Lancet 1974;2:81–84.
16. Goldstein JN, Greenberg SM. Should anticoagulation be resumed after intracerebral hemorrhage? Cleve Clin J Med 2010;77:791–799.
17. Yildiz OK, Arsava EM, Akpinar E, Topcuoglu MA. Previous antiplatelet use is associated with hematoma expansion in patients with spontaneous intracerebral hemorrhage. J Stroke Cerebrovasc Dis 2012;21:760–766.
18. Burchell SR, Tang J, Zhang JH. Hematoma Expansion Following Intracerebral Hemorrhage: Mechanisms Targeting the Coagulation Cascade and Platelet Activation. Curr Drug Targets 2017;18:1329–1344.
19. Hemphill JC 3rd, Bonovich DC, Besmertis L, Manley GT, Johnston SC. The ICH score: a simple, reliable grading scale for intracerebral hemorrhage. Stroke 2001;32:891–897.
20. Sakamoto Y, Koga M, Todo K, et al. Relative systolic blood pressure reduction and clinical outcomes in hyperacute intracerebral hemorrhage: the SAMURAI-ICH observational study. J Hypertens 2015;33:1069–1073.
21. Maas MB, Nemeth AJ, Rosenberg NF, et al. Subarachnoid extension of primary intracerebral hemorrhage is associated with poor outcomes. Stroke 2013;44:653–657.
22. Bolandzadeh N, Kording K, Salowitz N. Predicting cognitive function from clinical measures of physical function and health status in older adults. 2015.
23. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Series B 2005:301–320.
24. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016:785–794.
25. Hall AN, Weaver B, Liotta E, et al. Identifying Modifiable Predictors of Patient Outcomes After Intracerebral Hemorrhage with Machine Learning. Neurocrit Care 2021;34:73–84.
26. Claassen J, Jette N, Chum F, et al. Electrographic seizures and periodic discharges after intracerebral hemorrhage. Neurology 2007;69:1356–1365.
27. Nuwer M. Assessment of digital EEG, quantitative EEG, and EEG brain mapping: report of the American Academy of Neurology and the American Clinical Neurophysiology Society. Neurology 1997;49:277–292.
28. Struck AF, Ustun B, Ruiz AR, et al. Association of an Electroencephalography-Based Risk Score With Seizure Probability in Hospitalized Patients. JAMA Neurol 2017;74:1419–1424.