Scientific Reports. 2026 Jan 22;16:5986. doi: 10.1038/s41598-026-37159-w

Machine learning for predicting functional outcomes in acute ischemic stroke: insights from a nationwide stroke registry

Taehoon Ko 1,2,3, Kanghyuk Lee 1, Yong Uk Kwon 4, Yu Ra Lee 5, So Young Han 6, Jae Sang Oh 2,7,
PMCID: PMC12901967  PMID: 41571745

Abstract

Accurately predicting the functional prognosis at discharge of patients with acute ischemic stroke who have received active treatment remains highly challenging. The aim of this retrospective, nationwide, registry-based study was to identify key predictors of favorable outcomes and to develop machine learning models for patient outcome prediction. Analysis of a dataset of 40,586 patients identified age (odds ratio [OR] per unit increase: 0.975; 95% confidence interval [CI]: 0.972–0.977; p < 0.001), initial National Institutes of Health Stroke Scale score (OR per unit increase: 0.862; 95% CI: 0.855–0.868; p < 0.001), mechanical thrombectomy (OR: 2.617; 95% CI: 2.185–3.134; p < 0.001), and rehabilitation therapy (OR: 2.765; 95% CI: 2.530–3.022; p < 0.001) as significant predictors of good functional outcome, with younger age and lower initial scores favoring a good outcome. We developed three machine learning models (random forest [RF], support vector machine [SVM], and logistic regression) to predict a favorable functional outcome (modified Rankin Scale score ≤ 2) at discharge. The RF model showed the best predictive performance, with an area under the curve (AUC) of 0.87, compared with 0.80 for both the SVM and logistic regression models. This study underscores the potential of machine learning in stroke management to predict, and ultimately help improve, patient outcomes and to streamline healthcare delivery.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-026-37159-w.

Keywords: Acute ischemic stroke, Functional outcomes, Stroke registry, Machine learning, Predictive modeling, Random forest

Subject terms: Data processing, Machine learning

Introduction

Stroke remains the second-leading cause of death and the third-leading cause of combined death and disability worldwide. Survivors often endure severe disabilities, including hemiparesis, aphasia, and cognitive impairment, which significantly affect their quality of life. Poor functional outcomes at hospital discharge are a critical determinant of long-term quality of life for stroke survivors1. In the emergency room, medical staff initiate active treatment for patients with acute ischemic stroke based on presenting signs, symptoms, and imaging findings. Before treatment, the medical team explains the treatment process to the patient and caregivers and clarifies that, without timely intervention, the prognosis may be poor. Although the goal of acute treatment is to achieve a favorable functional outcome, predicting the prognosis of each individual patient remains difficult. Existing prognostic tools derived from other stroke registries, such as the Acute Stroke Registry and Analysis of Lausanne (ASTRAL) score and scores combining the prestroke Rankin scale, onset-to-treatment time, glucose level, age, National Institutes of Health Stroke Scale (NIHSS) score, early infarct signs, and dense artery sign, often rely on data that may be difficult to obtain during the first hours after stroke onset, which limits their practical applicability2–5. While prognostic models built on limited early-stage data remain of constrained use in real-world clinical settings, there is increasing emphasis on improving patient outcomes through comprehensive and active management of acute ischemic stroke. Therefore, the entire therapeutic process, including acute interventions and rehabilitation, should be reflected in prognostic assessments. Providing accurate and integrated prognostic information to patients and their families is a key component of modern stroke care. However, in the emergency department, making such assessments promptly is often challenging, even for experienced clinicians.
This highlights the need for decision-support systems capable of integrating diverse clinical inputs to support timely and accurate decision-making. Accordingly, prognostic prediction should not rely solely on basic demographic or laboratory data but should also incorporate modifiable variables, such as intravenous thrombolysis, mechanical thrombectomy, and the initiation of rehabilitation therapy during hospitalization. In addition, existing tools are not tailored to individual patient characteristics or outcomes, leaving a gap in personalized stroke care. Unlike conventional prognostic tools, AI-based models can analyze large datasets without relying on strict assumptions and can identify complex, nonlinear relationships among variables. These models can provide personalized predictions and actionable insights, even when derived from heterogeneous real-world settings. Because they can be retrained and updated as new data accumulate, AI systems have considerable potential for real-time predictions tailored to individual patients.

South Korea’s proactive national policy on acute stroke treatment has contributed significantly to a decline in stroke mortality rates in recent years. As of 2023, stroke is no longer among the top three causes of mortality; cancer, heart disease, and pneumonia now lead. Since its implementation in 2006, South Korea’s Acute Stroke Assessment Registry (ASAR) policy has played a pivotal role in decreasing stroke mortality rates. ASAR, a national system that manages the initial active treatment of patients with acute stroke, collects and monitors clinical data from 220 selected stroke centers nationwide. Recent analyses of ASAR data have shown that early patient condition, initial treatment modality, and the quality of hospital care are critical determinants of long-term prognosis in stroke survivors6,7. Using this registry, which contains robust and highly reliable data, we conducted a study on ischemic stroke trends and long-term mortality. While conventional statistical approaches provide reliable, validated results based on representative population-based data, they often fall short in delivering individualized prognostic insights: they produce generalized outcomes based on aggregated group-level estimates, which may not capture the diverse clinical profiles of real-world patients requiring personalized care. Therefore, there is a pressing need for large-scale, data-driven approaches that can generate tailored prognostic predictions for individual patients. By leveraging heterogeneous clinical variables and machine learning (ML) algorithms, such personalized models can offer meaningful support for real-time decision-making and ultimately enhance patient-centered care.

In this study, we aimed to address these gaps by using the ASAR to develop and validate ML models for predicting stroke outcomes. Our primary objective was to create a reliable, automated model capable of predicting functional outcomes at discharge from readily accessible clinical and demographic data. By integrating ML techniques with real-world data, we aimed to enhance the precision and efficiency of stroke management. A second objective was to compare the predictive performance of the ML models with that of conventional, well-established statistical methods, in order to evaluate the methodological strengths and limitations of ML approaches and their added value in clinical prognosis. Additionally, we sought to identify key factors associated with favorable functional outcomes at discharge. We hope that our findings will provide actionable insights for clinicians and policymakers seeking to improve stroke management strategies.

Results

Patient characteristics and outcomes

Table 1 shows the baseline characteristics of the 40,586 patients with acute ischemic stroke. The mean age was 68.76 years (SD: 12.99), and 40.1% of patients were female. The median NIHSS score, used to assess stroke severity, was 3 (interquartile range [IQR]: 1–6), indicating a population with predominantly mild strokes. Regarding smoking history, most patients were current smokers (88.8%), with smaller proportions of ex-smokers (7.5%) and non-smokers (3.7%).

Table 1.

Baseline characteristics and variables of acute ischemic stroke patients.

Variables Total
Total patients (n) 40,586
Age, years (mean [SD]) 68.76 (12.99)
Female (n, %) 16,295 (40.1)
NIHSS score (median [IQR]) 3 [1, 6]
Smoking history (n, %)
 Current smoker 36,025 (88.8)
 Ex-smoker 3,049 (7.5)
 Non-smoker 1,512 (3.7)
Atrial fibrillation (n, %) 6,521 (16.1)
Charlson comorbidity index (n, %)
 0 8,642 (21.3)
 1 8,539 (21.0)
 2 7,553 (18.6)
 ≥ 3 15,852 (39.1)
Status of health insurance (n, %)
 Health insurance 37,697 (92.9)
 Medical aid 2,889 (7.1)
Visit route (n, %)
 Direct admission 37,277 (91.8)
 Transfer from other hospitals 3,309 (8.2)
Emergency medical service (n, %)
 Yes 37,277 (91.8)
 No 3,309 (8.2)
Visit method (n, %)
 EMS 20,411 (50.3)
 Other vehicles 20,112 (49.6)
 Other means 63 (0.2)
Hospital volume (n, %)
 Advanced general hospital 18,922 (46.6)
 General hospital 21,664 (53.4)
Stroke unit (n, %) 94,966 (66.6)
Onset-to-door time, min (median [IQR]) 411 [105, 1,358]
Onset-to-imaging time, min (median [IQR]) 444 [130, 1,398]
Diagnostic imaging modality (n, %)
 Computed tomography 29,666 (73.1)
 Magnetic resonance imaging 5,892 (14.5)
 No brain imaging 5 (0.0)
 No record 5,023 (12.4)
Lipid test (n, %)
 Tested within 1 month at the hospital 39,919 (98.4)
 Tested within 1 month at another hospital 62 (0.2)
 Not tested 603 (1.5)
Dysphagia screening (n, %) 35,574 (87.7)
Initiation of oral diet during admission (n, %) 37,454 (92.3)
Diet considering dysphagia (n, %)
 Dysphagia diet 10,216 (25.2)
 Normal diet 26,733 (65.9)
 Other 3,637 (9.0)
Consultation with rehabilitation medicine (n, %)
 No 3,260 (8.0)
 Yes 37,326 (92.0)
Rehabilitation treatment (n, %) 22,012 (54.2)
Antithrombotic medication (n, %)
 Anticoagulant 5,631 (13.9)
 Antiplatelet 33,616 (82.8)
 None 1,336 (3.3)
Intravenous thrombolysis (n, %)
 No IVT 35,944 (88.6)
 IVT within 120 min of onset 2,498 (6.2)
 IVT at 120–270 min after onset 2,144 (5.3)
Mechanical thrombectomy (n, %) 1,737 (4.3)
Surgery (n, %) 3,575 (8.8)
Good functional outcome at discharge, mRS ≤ 2 (n, %) 25,829 (63.6)

EMS, Emergency medical service; IQR, interquartile range; IVT, Intravenous thrombolysis; NIHSS, National Institutes of Health Stroke Scale; SD, standard deviation.

Atrial fibrillation, a key risk factor for stroke, was present in 16.1% of patients. According to the Charlson comorbidity index (CCI), a measure of overall comorbidity burden, 39.1% of patients had a score of 3 or higher. Most patients (92.9%) were covered by standard health insurance, whereas 7.1% relied on medical aid. Most patients were admitted directly (91.8%), and 8.2% were transferred from other facilities. The use of emergency medical services was high (91.8%). Patients were roughly evenly distributed between advanced general hospitals (46.6%) and general hospitals (53.4%). Notably, 66.6% of patients were treated in stroke units.

Timelines for intervention, such as onset-to-door time (median: 411 min [IQR: 105–1,358 min]) and onset-to-imaging time (median: 444 min [IQR: 130–1,398 min]), exhibited considerable variation.

Diagnostic imaging

Computed tomography was the predominant imaging modality (73.1% of patients), magnetic resonance imaging was used in 14.5%, and 12.4% had no imaging record. Lipid profiling within 1 month was almost universally performed (98.4%); only 1.5% of patients did not undergo this evaluation. Similarly, dysphagia screening, an essential component of stroke care, was performed in 87.7% of patients, and an oral diet was initiated during admission in 92.3%. Overall, 25.2% of patients received a dysphagia-specific diet.

Rehabilitation engagement was high: 92% of patients received a rehabilitation medicine consultation, and 54.2% received rehabilitation treatment. Regarding antithrombotic medication, 82.8% of patients received antiplatelet therapy, 13.9% received anticoagulants, and 3.3% received no antithrombotic medication.

Treatment modalities and effectiveness

Intravenous thrombolysis (IVT) was administered to 4,642 patients, accounting for 11.4% of the study population. Among these, 2,498 patients (6.2%) received IVT within 120 min of symptom onset. Patients who received IVT had significantly shorter onset-to-door times, with a median of 250 min. Furthermore, these patients were more likely to achieve favorable outcomes than those who did not receive IVT.

Mechanical thrombectomy, another critical intervention, was performed in 1,737 patients (4.3%). Despite higher baseline NIHSS scores (median: 8), 48.5% of patients who underwent mechanical thrombectomy achieved a favorable outcome, suggesting a benefit of this treatment in severe stroke. Surgical interventions were documented in 3,575 patients (8.8%), primarily in severe cases, such as malignant edema or hemorrhagic transformation.

Finally, 63.6% of patients achieved a favorable functional outcome (modified Rankin Scale [mRS] score ≤ 2). This outcome measure is a critical indicator of recovery and was associated with several factors, including early treatment, patient age, and comorbidities.

Predictors of good functional outcome

Of the 40,586 patients with acute ischemic stroke, 63.6% achieved a favorable functional outcome. In the multivariable logistic regression analysis, several variables were identified as significant predictors of good functional outcome (Table 2). Older age was associated with a significantly lower probability of good functional outcome (odds ratio [OR]: 0.975; 95% confidence interval [CI]: 0.972–0.977; p < 0.001). Female patients had a lower likelihood of achieving a good functional outcome (OR: 0.838; 95% CI: 0.794–0.883; p < 0.001). Higher NIHSS scores were strongly associated with worse functional outcomes (OR: 0.862; 95% CI: 0.855–0.868; p < 0.001). Patients who underwent mechanical thrombectomy had a significantly higher probability of good functional recovery (OR: 2.617; 95% CI: 2.185–3.134; p < 0.001). Smokers showed a slightly higher probability of good outcome (OR: 1.092; 95% CI: 1.030–1.158; p < 0.01). Admission to a stroke unit was associated with a lower probability of good outcome (OR: 0.790; 95% CI: 0.743–0.839; p < 0.001). Rehabilitation treatment was associated with significantly better functional outcomes (OR: 2.765; 95% CI: 2.530–3.022; p < 0.001). Patients who did not receive antithrombotic therapy had significantly worse functional outcomes (OR: 0.380; 95% CI: 0.316–0.457; p < 0.001). Variables such as onset-to-door time, onset-to-imaging time, and dysphagia screening did not show a significant association with good functional outcomes (p > 0.05).

Table 2.

Comparison between the random forest model and multivariable logistic regression analysis.

Variable; random forest: importance, category; multivariable logistic regression: odds ratio, 95% CI lower, 95% CI upper, significance
Initial NIHSS 1.00 Very high 0.86 0.86 0.87 p < 0.001
Age 0.90 Very high 0.98 0.97 0.98 p < 0.001
Onset to image time (min) 0.70 Very high 1.00 1.00 1.00
Onset to door time (min) 0.65 Very high 1.00 1.00 1.00
Charlson comorbidity index 0.55 High 0.76 0.75 0.78 p < 0.001
Intravenous thrombolysis 0.45 High 1.36 1.28 1.43 p < 0.001
Dysphagia 0.35 High 1.45 1.37 1.53 p < 0.001
Initiation of oral diet 0.30 High 0.11 0.09 0.14 p < 0.001
Rehabilitation treatment 0.28 Medium 1.73 1.64 1.83 p < 0.001
Screening of dysphagia 0.25 Medium 0.66 0.59 0.74 p < 0.001
Rehabilitation consult 0.23 Medium 2.76 2.53 3.02 p < 0.001
EMS 0.20 Medium 1.39 1.32 1.47 p < 0.001
Diagnostic image check 0.18 Medium 1.06 1.04 1.09 p < 0.001
Stroke unit 0.15 Medium 0.79 0.74 0.84 p < 0.001
Hospital volume 0.13 Medium 0.99 0.98 0.99 p < 0.001
Sex 0.10 Medium 0.84 0.79 0.88 p < 0.001
Antiplatelet medication 0.09 Low 0.96 0.88 1.05
Smoker 0.09 Low 1.09 1.03 1.16 p < 0.01
Anticoagulant medication 0.08 Low 0.97 0.91 1.02
Status of health insurance 0.08 Low 0.74 0.67 0.81 p < 0.001
Atrial fibrillation 0.07 Low 0.83 0.74 0.94 p < 0.01
Surgery 0.07 Low 0.99 0.98 1.00
No antithrombotic medication 0.06 Low 0.38 0.32 0.46 p < 0.001
ER visit route 0.05 Low
Mechanical thrombectomy 0.05 Low 2.62 2.19 3.13 p < 0.001
Lipid profile test 0.04 Low 0.90 0.81 1.00 p < 0.05

Statistical significance is reported as p < 0.001, p < 0.01, or p < 0.05.

CI, confidence interval; EMS, emergency medical service; ER, emergency room; NIHSS, National Institutes of Health Stroke Scale.

ML model performance

The receiver operating characteristic (ROC) curves illustrate the comparative performance of the logistic regression, random forest (RF), and support vector machine (SVM) models (Fig. 1). The RF model showed the highest predictive capability, with an area under the ROC curve (AUC) of 0.87, significantly outperforming logistic regression (AUC: 0.80) and SVM (AUC: 0.80) (Supplementary Table 1).
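The AUC values above summarize how well each model ranks patients with good outcomes above those with poor outcomes. The sketch below computes the AUC from predicted probabilities on a held-out test set using the rank-based (Mann–Whitney) formulation. The function name `roc_auc` and the toy scores are illustrative assumptions, not the authors' code; in practice a library routine (for example, scikit-learn's `roc_auc_score` or the R package pROC) would typically be used.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    good-outcome patient (y = 1) receives a higher score than a randomly chosen
    poor-outcome patient (y = 0). Tied scores receive their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical test-set labels and predicted probabilities from two models.
y_test = [0, 0, 1, 1]
print(roc_auc(y_test, [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(roc_auc(y_test, [0.1, 0.2, 0.7, 0.9]))   # 1.0
```

Because tied scores share their average rank, a model that assigns the same score to every patient gets exactly 0.5, the no-discrimination baseline.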

Fig. 1. Receiver operating characteristic curves for the machine learning models.

Variable importance in the RF model

The variable importance analysis showed that the initial NIHSS score was the most important predictor, reflecting its role in assessing stroke severity. Age was the second most important factor, consistent with the influence of biological aging on recovery potential. Timeliness of care, represented by onset-to-image and onset-to-door times, also ranked highly, suggesting the importance of rapid intervention. The CCI and IVT administration also contributed substantially to the model, indicating the interplay between chronic health conditions and targeted acute interventions in determining patient outcomes. Other important variables included dysphagia-related factors, rehabilitation treatment, and diagnostic imaging practices, which together contributed to the predictions of recovery (Fig. 2).

Fig. 2. Variable importance in the random forest model.

Comparison between random forest and multivariable logistic regression models

A comparative analysis was conducted between the RF model and multivariable logistic regression (MLR) to identify predictors of good functional outcome (mRS score ≤ 2 at discharge) in 40,586 patients with acute ischemic stroke. Table 2 summarizes the comparison between RF and MLR, including variable importance in RF and ORs with CIs in MLR. Both models identified initial NIHSS score, age, CCI, intravenous thrombolysis, dysphagia, rehabilitation treatment, and stroke unit admission as important predictors of functional recovery. Mechanical thrombectomy, screening of dysphagia, smoking status, and health insurance status were also associated with the outcome, albeit with varying degrees of importance.

However, there were notable differences between the RF and MLR models. The RF model ranked onset-to-door time and onset-to-image time as highly important variables, whereas these were not statistically significant in MLR. Initiation of oral diet and dysphagia screening were highly ranked in RF, suggesting a strong association with functional recovery, but MLR only partially supported this finding. RF provided a more comprehensive ranking of variables based on their contribution to model performance, while MLR provided ORs for direct clinical interpretation.
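Table 2 reports RF importance on a scale on which the top predictor equals 1.00 and groups the scores into qualitative categories, but the paper does not state how the scaling or the category boundaries were defined. The minimal sketch below assumes that raw scores were divided by the maximum and binned with hypothetical cut-offs (0.60, 0.30, 0.10), chosen only because they reproduce the categories shown in Table 2. Unlike an odds ratio, the resulting score carries no direction of effect.

```python
# Hypothetical cut-offs, chosen only so that the labels reproduce the
# "Category" column of Table 2; the paper does not state its rule.
CUTOFFS = [(0.60, "Very high"), (0.30, "High"), (0.10, "Medium")]


def relative_importance(raw: dict[str, float]) -> dict[str, float]:
    """Scale raw importance scores so that the top predictor equals 1.0."""
    top = max(raw.values())
    return {name: value / top for name, value in raw.items()}


def category(score: float) -> str:
    """Map a relative importance score to a qualitative category."""
    for cutoff, label in CUTOFFS:
        if score >= cutoff:
            return label
    return "Low"


# Relative importances copied from Table 2.
table2 = {"Initial NIHSS": 1.00, "Onset to door time (min)": 0.65,
          "Charlson comorbidity index": 0.55, "Rehabilitation treatment": 0.28,
          "Mechanical thrombectomy": 0.05}
for name, score in relative_importance(table2).items():
    print(f"{name}: {score:.2f} ({category(score)})")
```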

Discussion

We developed ML models using data from 40,586 patients with acute cerebral infarction collected at the initial emergency room presentation. Among the algorithms compared, RF showed the best predictive performance.

Recent advances in AI-driven predictive models for ischemic stroke have yielded promising findings. For instance, Zhao et al.8 developed a convolutional neural network-based model that predicts long-term outcomes from multimodal imaging and clinical data and reported an AUC of 0.85. Similarly, Zhang et al.9 introduced a deep learning model incorporating demographic and imaging features to predict 90-day mRS scores, reporting an AUC of 0.84. Although these models emphasize the importance of advanced imaging data, their reliance on specialized imaging modalities may limit their applicability in resource-constrained settings. In contrast, our study focused on readily available clinical and demographic variables, which makes it more broadly applicable across diverse healthcare environments.

Compared with a recent study that used 156 initial clinical data points for functional outcome prediction (AUC 0.818)10, our study is, to our knowledge, the first to leverage a verified, large-scale nationwide registry. Our findings underscore the potential of ML approaches for early outcome prediction and may inform future treatment directions, while continued expansion of this registry may further enhance predictive accuracy and clinical utility.

Our study highlights the efficacy of ML models in predicting functional outcomes in patients with acute ischemic stroke. Using data from the Korean ASAR, we developed three predictive models: RF, SVM, and logistic regression. The RF model showed superior predictive performance, with an AUC of 0.87, compared with 0.80 for both the SVM and logistic regression models. This suggests that the RF model can capture complex nonlinear interactions between variables, thereby offering individualized predictions of patient outcomes. The RF model appears particularly well suited to identifying patients likely to achieve good functional outcomes at discharge, offering higher sensitivity and specificity across decision thresholds. Key predictors such as the NIHSS score, age, onset-to-imaging time, and the CCI were identified as the most influential variables. Overall, our findings suggest that ML-based approaches can enhance the accuracy of prognostic assessments and facilitate personalized interventions in stroke management. Our ML model also indicates that patients' acute clinical condition and early treatment are key variables influencing functional outcomes. In the future, this model may serve as an emergency room tool for explaining to clinicians, patients, and caregivers the importance of timely initial treatment decisions, such as IVT administration or rapid treatment initiation, and for supporting the selection of active therapies.

The key predictors identified in our model (NIHSS score, age, onset-to-imaging time, onset-to-door time, the CCI, and IVT) reflect their critical roles in stroke prognosis. The NIHSS, a well-established marker of stroke severity, has consistently been shown to predict functional outcomes, with higher scores correlating with poorer outcomes, emphasizing the need for immediate intensive care in patients with severe strokes11. Biological aging also significantly influences recovery potential, as older patients often have lower neuroplasticity and a higher comorbidity burden, leading to poorer outcomes12. Onset-to-imaging time reflects the importance of rapid imaging for prompt diagnosis and treatment planning, with shorter times associated with improved reperfusion rates and better outcomes13. Delays in hospital arrival (onset-to-door time) directly affect the likelihood of receiving timely thrombolytic therapy, underscoring the predictive value of this variable14. A higher CCI score, reflecting the overall health burden, predicts worse outcomes owing to the compounded effects of multiple chronic conditions15. Timely administration of IVT is a cornerstone of acute ischemic stroke treatment and significantly improves functional recovery when administered within optimal time windows16. The inclusion of these variables in our model aligns with existing evidence, reinforcing their role as critical determinants of stroke outcomes.

One of the major differences between RF and MLR was the treatment of time-related variables (onset-to-door time and onset-to-image time). The RF model ranked these variables as highly important, suggesting that timely interventions may be critical in determining good functional outcomes. However, these variables were not statistically significant in the MLR analysis, possibly because of collinearity with the NIHSS score and treatment interventions. This discrepancy suggests that the effect of time may be nonlinear, with potential threshold effects that RF can capture but MLR cannot. Additionally, dysphagia and initiation of oral diet were highly influential in RF, emphasizing the importance of early dysphagia screening and dietary interventions in post-stroke recovery. MLR partially supported this finding: screening of dysphagia was significant (OR: 0.66; 95% CI: 0.59–0.74; p < 0.001), but initiation of oral diet was not directly included as a predictor. These findings suggest that ML models may detect complex interactions among dysphagia, rehabilitation, and nutrition that traditional regression models do not capture well. While MLR provides clinical interpretability through ORs that quantify the effect of each predictor, RF is better able to capture complex, nonlinear relationships. The high ranking of time-related variables and dysphagia factors in RF warrants further investigation of nonlinear effects in stroke recovery.
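The threshold argument can be made concrete with the building block of an RF: a single tree node that searches every cut-point of a continuous variable and keeps the one that best separates outcomes. The sketch below scores candidate splits of onset-to-door time by weighted Gini impurity. The data are invented toy values, not registry data, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a binary outcome list: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_time_split(times: List[float],
                    outcomes: List[int]) -> Tuple[Optional[float], float]:
    """Search every cut-point 'time <= t' and return the one that minimizes the
    weighted Gini impurity of the two child nodes, as a single tree node would."""
    pairs = sorted(zip(times, outcomes))
    n = len(pairs)
    best_t: Optional[float] = None
    best_impurity = gini([y for _, y in pairs])  # impurity with no split
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut-point exists between identical times
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_impurity = impurity
    return best_t, best_impurity


# Toy data (not registry data): good outcomes cluster at shorter delays.
onset_to_door = [60, 90, 120, 150, 300, 400, 500, 600]
good_outcome = [1, 1, 1, 1, 0, 0, 0, 0]
print(best_time_split(onset_to_door, good_outcome))  # (225.0, 0.0)
```

A logistic model with a single linear time term instead assumes a constant change in log-odds per minute, so it cannot represent such a step unless the variable is transformed or categorized.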

Our findings demonstrate the potential of AI-based prediction models in acute ischemic stroke. As the field advances, such models may complement or even replace existing stroke scoring systems, supporting clinicians in predicting individual patient outcomes and enhancing clinical decision-making in real-world practice17. Prospective multicenter studies are needed to refine individual outcome predictions and enhance clinical applicability.

In our study, 88.8% of patients were current smokers, and in the multivariable model, current smoking was associated with favorable functional outcomes. This is consistent with prior studies reporting a so-called ‘smoker’s paradox,’ in which smokers undergoing thrombolysis showed better prognoses18, and may partially explain the favorable outcomes observed in the smoking subgroup of our cohort. Additionally, admission to a stroke unit was associated with worse outcomes, which may reflect confounding by indication, as more severe cases are typically managed in intensive care units. These findings highlight the need for further investigation into institutional care patterns and patient severity at admission.

In conclusion, this study highlights the potential of ML models, particularly RF, in predicting functional outcomes for patients with stroke. By identifying critical predictors—such as NIHSS score, age, and time-related factors—this research offers actionable insights for optimizing stroke care delivery. These findings underscore the importance of early intervention, advanced hospital care, and personalized rehabilitation programs. Future studies should focus on integrating these models into clinical workflows to enhance patient outcomes.

Methods

Study design

This retrospective observational study analyzed data from the Korean National Stroke Registry collected over a 5-year period. The nationwide ASAR was established in Korea for quality control of acute stroke care. Registry data are collected for 6 months each year, with the collection period determined by the Health Insurance Review and Assessment Service (HIRA) to assess healthcare quality. The ASAR data are examined, graded, and analyzed by HIRA. These data are collected once every 2 years from approximately 220 hospitals nationwide, providing a comprehensive dataset of 40,586 patients diagnosed with acute stroke. Patients first diagnosed with acute cerebral infarction within 7 days of symptom onset and admitted to one of the 220 nationally certified stroke centers were included. Patients were excluded if they had a prior history of stroke, had previously been hospitalized at another institution for the same event, or had a trauma-related diagnosis.
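The inclusion and exclusion criteria amount to a per-patient filter. The minimal sketch below assumes a hypothetical `Admission` record; its field names are illustrative only, since the ASAR schema is not described in this paper.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    # All field names are hypothetical; the registry schema is not given here.
    ischemic_stroke: bool        # diagnosis of acute cerebral infarction
    days_from_onset: float       # symptom onset to admission, in days
    certified_center: bool       # one of the nationally certified stroke centers
    prior_stroke: bool           # history of stroke before this event
    prior_hospitalization: bool  # already admitted elsewhere for this event
    trauma_related: bool         # diagnosis related to trauma


def is_eligible(a: Admission) -> bool:
    """Apply the inclusion and exclusion criteria described in the study design."""
    included = a.ischemic_stroke and a.days_from_onset <= 7 and a.certified_center
    excluded = a.prior_stroke or a.prior_hospitalization or a.trauma_related
    return included and not excluded


cohort = [
    Admission(True, 1.5, True, False, False, False),  # eligible
    Admission(True, 9.0, True, False, False, False),  # onset more than 7 days ago
    Admission(True, 0.5, True, True, False, False),   # prior stroke
]
print([is_eligible(a) for a in cohort])  # [True, False, False]
```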

This study was conducted in partnership with the Health Insurance Review and Assessment Service under the National Registry Research Project. This study was approved by the Research Ethics Committee of the Catholic University Hospital (IRB No. UC23Z1SE0023). The requirement for informed consent was waived by the Institutional Review Board (IRB) of Uijeongbu St. Mary’s Hospital, The Catholic University of Korea, due to the retrospective nature of the study. All methods were performed in accordance with the relevant guidelines and regulations.

Data collection and variables

Clinical and demographic data, including age, sex, medical history (atrial fibrillation, smoking, and comorbidities), NIHSS score at admission, laboratory results, stroke management interventions (intravenous thrombolysis, mechanical thrombectomy, stroke unit admission, rehabilitation therapy, and dietary adjustments), and time metrics (onset-to-door and onset-to-imaging times), were retrieved from electronic medical records. Rigorous data cleaning was performed to ensure the inclusion of complete, high-quality records, which were essential for robust statistical and ML analyses. All data were anonymized using encrypted personal identification numbers in partnership with HIRA under the Joint Project on Quality Assessment Research in Korea. Variables with more than 5% missing data were excluded before analysis whenever possible, and patients with incomplete functional outcome data were removed entirely. Missing values for onset-to-image time and onset-to-door time affected less than 3% of patients and were imputed with the median value before analysis.
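The missing-data rules described above (remove patients without outcome data, drop variables with more than 5% missingness, and fill the remaining gaps in the time variables with the median) can be sketched as follows. The column names are hypothetical, and rows are plain dictionaries with `None` marking a missing value.

```python
from statistics import median
from typing import Any, Dict, List

Row = Dict[str, Any]


def drop_rows_missing(rows: List[Row], column: str) -> List[Row]:
    """Remove patients whose value for `column` (e.g. the outcome) is missing."""
    return [r for r in rows if r.get(column) is not None]


def drop_sparse_columns(rows: List[Row], max_missing: float = 0.05) -> List[Row]:
    """Drop variables whose share of missing values (None) exceeds max_missing."""
    n = len(rows)
    keep = [c for c in rows[0] if sum(r[c] is None for r in rows) / n <= max_missing]
    return [{c: r[c] for c in keep} for r in rows]


def impute_median(rows: List[Row], column: str) -> List[Row]:
    """Replace missing values in `column` with the median of the observed values."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Hypothetical data: "lab_x" is 10% missing, "onset_to_door" 2.5% missing.
rows = [{"onset_to_door": float(m), "lab_x": None if m < 4 else m, "good_outcome": 1}
        for m in range(40)]
rows[10]["onset_to_door"] = None
rows = impute_median(drop_sparse_columns(drop_rows_missing(rows, "good_outcome")),
                     "onset_to_door")
print(sorted(rows[0]))            # ['good_outcome', 'onset_to_door']
print(rows[10]["onset_to_door"])  # 20.0
```

Computing the median on the training split only, and then applying it to the test split, would keep test-set information out of preprocessing.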

Statistical analysis

Descriptive statistics were used to summarize patient demographics and baseline clinical characteristics. Continuous variables were summarized as means and standard deviations (or medians and IQRs), and categorical variables as frequencies and percentages. Outcomes, including the rate of favorable functional outcomes (mRS score ≤ 2), IVT, mechanical thrombectomy, and surgical interventions, were analyzed to understand treatment efficacy and prognostic factors. A multivariable logistic regression analysis was performed to determine the association between independent variables (age, sex, initial NIHSS, CCI, atrial fibrillation, smoking, stroke management interventions, and hospital characteristics) and good functional outcomes. ORs and 95% CIs were calculated for each predictor variable. A p-value < 0.05 was considered statistically significant. Statistical analyses were conducted using R software (version 4.4.2).
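The reported ORs and CIs follow from the fitted logistic-regression coefficients: for a coefficient β with standard error SE, OR = exp(β), and an approximate 95% Wald interval is exp(β ± 1.96·SE). The sketch below shows this conversion and how a per-unit OR for a continuous predictor rescales to a larger increment. The Wald interval is an assumption (R's `confint()` on a `glm` fit uses profile likelihood by default, and the paper does not say which was used), as is the assumption that age entered the model in years.

```python
import math
from typing import Tuple


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a logistic-regression coefficient (log-odds per unit) and its
    standard error into an odds ratio with a Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def rescale_or(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units` in a continuous predictor:
    exp(units * beta) = OR_per_unit ** units."""
    return or_per_unit ** units


# Reported age effect: OR 0.975, assumed to be per additional year of age.
print(round(rescale_or(0.975, 10), 3))  # 0.776 (per 10 additional years)
```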

ML model development

The predictive modeling process followed a structured, systematic approach. First, the data were preprocessed, including removal of unnecessary columns such as “ID” and “year.” Categorical variables, including antithrombotic medications, were translated and one-hot encoded for compatibility with ML algorithms. The dataset was then divided into training (70%) and testing (30%) subsets, stratified by the target variable (achievement of a good functional outcome). Feature selection then focused on clinically relevant variables, including NIHSS scores, age, and specific time intervals such as onset-to-image time, which were prioritized because of their well-established influence on stroke outcomes in the existing literature.
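The preprocessing and split described above can be sketched with the standard library only. The column names (`good_outcome`, `antithrombotic`) and category labels are hypothetical stand-ins for the registry fields. The split shuffles each outcome class separately, so both subsets keep roughly the cohort's proportion of good outcomes.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def one_hot(rows: List[Row], column: str, categories: Sequence[str]) -> List[Row]:
    """Replace a categorical column with one 0/1 indicator column per category."""
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}={c}"] = int(r[column] == c)
        encoded.append(new)
    return encoded


def stratified_split(rows: List[Row], target: str, train_frac: float = 0.7,
                     seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Split rows into train/test sets, preserving the share of each outcome class."""
    rng = random.Random(seed)
    train: List[Row] = []
    test: List[Row] = []
    for label in sorted({r[target] for r in rows}):
        group = [r for r in rows if r[target] == label]
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


drugs = ["anticoagulant", "antiplatelet", "none"]
rows = [{"id": i, "good_outcome": int(i < 64), "antithrombotic": drugs[i % 3]}
        for i in range(100)]
rows = [{k: v for k, v in r.items() if k != "id"} for r in rows]  # drop identifier
rows = one_hot(rows, "antithrombotic", drugs)
train, test = stratified_split(rows, "good_outcome")
print(len(train), len(test))  # 70 30
```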

Three ML algorithms were implemented for model development and evaluation: RF, SVM, and logistic regression. Hyperparameter tuning was conducted to optimize the key parameters of each algorithm. Model performance was evaluated using accuracy, sensitivity, specificity, and AUC. For each model, the decision threshold was optimized using Youden’s index (J = sensitivity + specificity - 1), and the threshold that maximized J was selected, thereby balancing sensitivity and specificity.
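Choosing a threshold with Youden's index means scanning candidate cut-offs and keeping the one with the largest J. The minimal sketch below assumes binary labels coded 0/1, predicted probabilities as scores, and both classes present. A case is called positive when its score is at or above the threshold. This is a generic illustration, not the authors' R implementation.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is predicted positive when its score is >= threshold.
    Assumes labels are 0/1 and both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = None
    for t in sorted(set(scores)):  # every observed score is a candidate threshold
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sens, spec = tp / pos, tn / neg
        j = sens + spec - 1
        if best is None or j > best[1]:  # ties keep the lower threshold
            best = (t, j, sens, spec)
    return best
```

The nested scan is quadratic and is written this way for clarity. A version for a full registry-sized test set would sort the scores once and update the true- and false-positive counts cumulatively.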

Finally, the model outputs were visualized and interpreted. ROC curves were plotted for all models to compare their performance, and variable importance was analyzed within the RF model to identify the most influential predictors and their relative contributions to the predictions.
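The ROC curves and AUC values used to compare the models can be computed from test-set labels and predicted probabilities alone. The sketch below lowers the threshold through each distinct score to trace (false-positive rate, true-positive rate) points and then integrates them with the trapezoidal rule. Tied scores are grouped, so ties earn half credit, which matches the rank-based (Mann-Whitney) AUC. This is a generic sketch under those assumptions. The AUC values reported in the study (0.87 for RF, 0.80 for SVM and logistic regression) came from the authors' own R workflow.

```python
def roc_points(labels, scores):
    """ROC curve as (FPR, TPR) points, lowering the threshold from the top score.

    Assumes labels are 0/1 and both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The last point is always (1, 1), because the lowest threshold classifies every case as positive.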

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (17.1KB, docx)
Supplementary Material 2 (18.9KB, docx)

Acknowledgements

We thank JY Lee and SW Park.

Author contributions

Taehoon Ko, Kanghyuk Lee, and Jae Sang Oh analyzed the data and participated in AI modeling. Yong Uk Kwon, Yu Ra Lee, and So Young Han provided the data under the national project. Jae Sang Oh designed the study and wrote the manuscript. All authors reviewed the manuscript.

Funding

This research was supported by the Patient-Centered Clinical Research Coordinating Center (PACEN) funded by the Ministry of Health & Welfare, Republic of Korea (RS-2022-KH131668 (HC22C0043)); by the Korean Neuroendovascular Society; by the Ministry of Health & Welfare of the Republic of Korea (RS-2024-00439915); and by Uijeongbu St. Mary’s Hospital of the Catholic University of Korea. The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data availability

Data that support the findings of this study are available from the Korean Stroke Registry (M20230323002) but are not publicly available due to licensing restrictions. Access to the data is restricted and was granted specifically for the study. Data may be made available from the corresponding author upon reasonable request and with permission of the Korean Stroke Registry.

Code availability

The Stroke Registry dataset was available only in a closed analytical environment operated by the government. Research outputs needed for manuscript preparation could be exported after analysis, but the data-processing code could not be exported because of the potential exposure of metadata. Code for training and validating the ML algorithms is available and can be supplied as supplementary material; questions about the code may be directed to the corresponding author. Analysis was conducted using R 4.1.2 in the closed analytical environment.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
