Scientific Reports. 2025 Aug 20;15:30657. doi: 10.1038/s41598-025-10104-z

Machine learning-based prediction model for post-stroke cerebral-cardiac syndrome: a risk stratification study

Tingyu Zhang 1, Zelin Hao 1, Qunlian Jiang 1, Linhui Zhu 1, Lifang Ye 2,
PMCID: PMC12368185  PMID: 40835616

Background

Cerebral-cardiac syndrome (CCS) is a severe cardiac complication following acute ischemic stroke, often associated with adverse outcomes. This study developed and validated a machine learning (ML) model to predict CCS using clinical, laboratory, and pre-extracted imaging features. A retrospective cohort of 511 post-stroke patients was analyzed. Data on demographics, laboratory results, imaging findings, and medications were collected. CCS diagnosis was based on cardiac dysfunction occurring after stroke, excluding pre-existing cardiac diseases. Five machine learning models (Logistic Regression, Random Forest, Support Vector Machine [SVM], XGBoost, and a Deep Neural Network) were trained on 80% of the data and tested on the remaining 20%. Discrimination was assessed by AUC (95% CI), calibration by the Hosmer-Lemeshow test and Brier score, and classification thresholds by Youden's index. Model interpretability was evaluated using SHAP. On the test set, XGBoost achieved the highest discrimination (AUC 0.879; 95% CI 0.807–0.942), with accuracy 0.825, precision 0.844, recall 0.675, and F1-score 0.750. Random Forest followed closely (AUC 0.866; accuracy 0.845; precision 0.962; recall 0.625; F1-score 0.758). SVM and Logistic Regression yielded AUCs of 0.853 and 0.818, respectively. Calibration was best for SVM (HL p > 0.05; Brier 0.126) and Random Forest (HL p > 0.05; Brier 0.131). SHAP analysis identified D-dimer, ACEI/ARB use, HbA1c, C-reactive protein, and prothrombin time as the top predictors. ML-based models accurately predict early CCS in ischemic stroke patients: XGBoost offers superior discrimination, while SVM and Random Forest demonstrate better calibration. Incorporating these models into clinical workflows may enhance risk stratification and guide targeted preventive strategies.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-10104-z.

Keywords: Cerebral-cardiac syndrome, Machine learning, Stroke, SHAP analysis, Risk prediction

Subject terms: Stroke, Risk factors, Predictive markers

Introduction

Acute ischemic stroke (AIS) is a major global health concern, ranking among the leading causes of death and disability worldwide [1]. While the primary focus of stroke management has traditionally been on restoring cerebral perfusion and preventing neurological damage, the systemic complications of AIS, particularly those affecting the cardiovascular system, have garnered increasing attention in recent years [2]. Cerebral-cardiac syndrome (CCS), a severe and multifaceted complication of AIS, exemplifies the intricate interplay between the brain and the heart [3,4]. CCS encompasses a spectrum of cardiac dysfunctions, including arrhythmias, myocardial injury, and structural abnormalities, which collectively contribute to poor cardiovascular outcomes and increased mortality [3,4]. The pathophysiology of CCS is complex, involving excessive activation of the sympathetic nervous system, systemic inflammation, catecholamine surges, and endothelial damage, all of which exacerbate myocardial toxicity and impair cardiac function [5].

Despite advances in understanding the mechanisms underlying CCS, its early detection and prediction remain significant clinical challenges [6]. The multifactorial nature of CCS, coupled with the variability in its clinical presentation, complicates risk stratification and timely intervention. Current diagnostic approaches rely heavily on clinical observation and basic biomarkers, which often fail to capture the dynamic and interconnected processes driving CCS [3,7]. This diagnostic gap underscores the urgent need for innovative tools that can integrate diverse data sources and provide accurate, real-time predictions of CCS risk.

In recent years, machine learning (ML) has emerged as a transformative technology in healthcare, offering powerful tools for analyzing complex datasets and uncovering patterns that elude traditional statistical methods [8]. ML algorithms have demonstrated remarkable success in predictive analytics, particularly in fields such as cardiology and neurology [9,10]. By integrating multidimensional data, including clinical, imaging, and biomarker information, ML models can identify subtle correlations and interactions, enabling the development of personalized risk prediction tools [9–13]. In the context of CCS, ML holds the potential to revolutionize risk assessment by integrating diverse data sources to predict the likelihood of cardiac complications post-stroke with high accuracy and reliability. However, the application of ML in CCS prediction is still in its infancy, with limited studies exploring its feasibility and effectiveness.

In this study, we aimed to develop and validate an ML-based predictive model for CCS using a comprehensive dataset, integrating SHAP analysis to elucidate the most significant predictors of CCS and ensure model transparency. The findings of this study provide a clinically applicable framework for early risk stratification, which could guide timely interventions and optimize the management of post-stroke patients at risk for CCS.

Methods

Study design and data sources

This retrospective cohort study analyzed data from 511 patients admitted to The Affiliated Hospital of Hangzhou Normal University between March 1, 2020 and March 15, 2023. Patient records were retrieved from the hospital’s electronic health system. Ethical approval was obtained from the institutional review board of The Affiliated Hospital of Hangzhou Normal University, and all procedures were conducted in compliance with the Declaration of Helsinki. Given the retrospective nature of the study, the requirement for informed consent was waived by the institutional review board. Patient data were anonymized to ensure confidentiality and used solely for research purposes.

Participants and setting

Inclusion criteria

Adults (age > 18 years) with confirmed AIS as diagnosed by CT or MRI.

Exclusion criteria

Pre-existing cardiac diseases (e.g., prior myocardial infarction, symptomatic heart failure [LVEF < 50%], known arrhythmias, moderate-to-severe valvular disease, previous coronary revascularization, cardiomyopathies, congenital heart defects), thrombolysis or thrombectomy recipients, malignancy, severe hepatic, renal, or pulmonary disease, and cases with > 30% missing data.

Patients were categorized into CCS and non-CCS groups based on new-onset cardiac dysfunction within 72 h post-stroke, excluding those with prior cardiac conditions.

Outcome definition

The primary outcome was the occurrence of cerebral-cardiac syndrome (CCS), defined as development of cardiac dysfunction within 72 h of stroke onset, identified by one or more of the following: elevated troponin levels, ECG- or Holter-detected arrhythmias, LVEF < 50% on echocardiography, ST-segment changes, QT prolongation, or T-wave inversion. Outcome assessments were performed by two board-certified cardiologists blinded to predictor values; discrepancies were resolved by consensus.

Predictors

Candidate predictors included demographic variables (age, sex), stroke severity (NIHSS), vascular imaging (carotid stenosis grade), laboratory biomarkers (e.g., D-dimer, CRP, HbA1c, prothrombin time), medication use (e.g., ACEI/ARB, statins), vital signs, and pre-extracted imaging features. All predictors were defined and measured at hospital admission. Nominal variables were one-hot encoded; ordinal predictors were mapped to integer codes.
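A minimal sketch of how this encoding step could be implemented in pandas; the column names, categories, and values below are illustrative placeholders rather than the study's actual variable coding.

```python
import pandas as pd

# Illustrative admission-time predictor frame (hypothetical columns and values).
df = pd.DataFrame({
    "gender": ["male", "female", "male"],
    "acei_arb": ["yes", "no", "yes"],
    "carotid_stenosis": ["grade 1", "grade 0", "grade 3"],  # ordinal predictor
    "d_dimer": [1.17, 0.79, 1.52],
})

# Nominal variables: one-hot encoding.
df = pd.get_dummies(df, columns=["gender", "acei_arb"], drop_first=True)

# Ordinal predictor: map stenosis grades to integer codes that preserve order.
grade_map = {"grade 0": 0, "grade 1": 1, "grade 2": 2, "grade 3": 3}
df["carotid_stenosis"] = df["carotid_stenosis"].map(grade_map)

print(df)
```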

Sample size and missing data

The full cohort comprised 511 patients with 178 CCS events (34.8%). Cases with > 30% missing entries were excluded. Continuous missing values were imputed by median and categorical by mode; remaining data completeness permitted robust model training without additional omission.
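The imputation step could look like the following sketch with scikit-learn's SimpleImputer; the toy values and column names are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data with missing entries (names and values are illustrative placeholders).
df = pd.DataFrame({
    "d_dimer":  [1.17, np.nan, 0.79, 1.52],
    "hba1c":    [6.7, 6.1, np.nan, 7.0],
    "acei_arb": ["yes", np.nan, "no", "yes"],
})

# Continuous features: median imputation; categorical features: mode imputation.
df[["d_dimer", "hba1c"]] = SimpleImputer(strategy="median").fit_transform(df[["d_dimer", "hba1c"]])
df[["acei_arb"]] = SimpleImputer(strategy="most_frequent").fit_transform(df[["acei_arb"]])

print(df)
```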

Data preprocessing and partitioning

Continuous features were standardized to zero mean and unit variance using scikit-learn’s StandardScaler. The dataset was split via stratified sampling into training (80%, n = 408) and testing (20%, n = 103) sets, preserving CCS prevalence.
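A short sketch of this preprocessing and partitioning step; `X` and `y` are random placeholders for the 511-patient feature matrix and CCS labels, and the scaler is fitted on the training partition only (the paper does not state the fitting order, so the standard leakage-avoiding choice is shown).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix and label vector standing in for the cohort data.
rng = np.random.default_rng(0)
X = rng.normal(size=(511, 20))
y = rng.binomial(1, 0.35, size=511)          # 1 = CCS, 0 = non-CCS

# Stratified 80/20 split preserving CCS prevalence in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize continuous features to zero mean and unit variance.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```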

Model development and internal validation

Five classification algorithms (Logistic Regression, Random Forest, Support Vector Machine, XGBoost, Deep Neural Network) were implemented using default hyperparameters. Hyperparameter tuning employed stratified 5-fold GridSearchCV optimizing ROC AUC on the training set; because tuning yielded no performance gain, the default settings were retained. Internal validation comprised 5-fold cross-validation, reporting mean AUC and accuracy across folds.
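A condensed sketch of this training and internal-validation workflow, reusing `X_train` and `y_train` from the preprocessing sketch above. The MLPClassifier stands in for the deep neural network, and the grid shown is purely illustrative, since the paper does not report the exact architecture or search grids.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

# The five classifiers with default hyperparameters (random_state fixed for reproducibility).
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=42),
    "svm": SVC(probability=True, random_state=42),
    "xgboost": XGBClassifier(eval_metric="logloss", random_state=42),
    "deep_neural_network": MLPClassifier(max_iter=1000, random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Internal validation: mean and SD of ROC AUC across the five training-set folds.
for name, model in models.items():
    aucs = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
    print(f"{name}: AUC {aucs.mean():.3f} ± {aucs.std():.3f}")

# Example grid search for one model; the tuned configuration would only be kept
# if it outperformed the default settings.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300, 500], "max_depth": [None, 5, 10]},
    scoring="roc_auc",
    cv=cv,
)
grid.fit(X_train, y_train)
```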

Performance evaluation

Discrimination was assessed by ROC AUC with 95% CIs estimated via bootstrapping (n = 1000). Classification thresholds were chosen by maximizing Youden’s index to derive accuracy, precision, recall, and F1-score on the held-out test set. Calibration was evaluated using calibration plots, the Hosmer-Lemeshow test, Brier score, and Expected Calibration Error (ECE). Clinical utility was examined through decision curve analysis comparing net benefit against “treat all” and “treat none” strategies. Model explainability was quantified using SHAP to rank feature importance and visualize effect directionality.
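The discrimination and threshold steps can be implemented as sketched below; `y_test` and the commented `grid.best_estimator_` refer to the earlier sketches, and `y_prob` would be the test-set predicted probabilities of any fitted model.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score, roc_curve

def bootstrap_auc_ci(y_true, y_prob, n_boot=1000, seed=42):
    """95% bootstrap confidence interval for the ROC AUC (1000 resamples)."""
    rng = np.random.default_rng(seed)
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2:      # resample must contain both classes
            continue
        aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))
    return np.percentile(aucs, [2.5, 97.5])

def youden_threshold(y_true, y_prob):
    """Probability cut-off maximizing Youden's index (sensitivity + specificity - 1)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    return thresholds[np.argmax(tpr - fpr)]

# Usage with a fitted model (e.g., the grid-searched random forest above):
# y_prob = grid.best_estimator_.predict_proba(X_test)[:, 1]
# ci_low, ci_high = bootstrap_auc_ci(y_test, y_prob)
# cutoff = youden_threshold(y_test, y_prob)
# brier = brier_score_loss(y_test, y_prob)
```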

Statistical analyses were conducted using Python (version 3.9). Continuous variables were compared using Student’s t-test or the Mann-Whitney U test, while categorical variables were analyzed using the chi-square or Fisher’s exact test. A p-value of < 0.05 was considered statistically significant.
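For the group comparisons, a minimal SciPy illustration is given below; the continuous samples are synthetic, while the 2×2 table uses the hypertension counts reported in Table 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic continuous variables standing in for the CCS and non-CCS groups.
age_ccs, age_non = rng.normal(68.8, 7.9, 178), rng.normal(66.8, 8.2, 333)
ddimer_ccs, ddimer_non = rng.lognormal(0.1, 0.4, 178), rng.lognormal(-0.25, 0.3, 333)

# Normally distributed variable: Student's t-test.
t_stat, p_age = stats.ttest_ind(age_ccs, age_non)

# Skewed variable: Mann-Whitney U test.
u_stat, p_ddimer = stats.mannwhitneyu(ddimer_ccs, ddimer_non)

# Categorical variable: chi-square test on a 2x2 table (hypertension yes/no by
# group, counts from Table 1); Fisher's exact test is used for sparse tables.
table = np.array([[126, 52], [193, 140]])
chi2_stat, p_htn, dof, expected = stats.chi2_contingency(table)
odds_ratio, p_fisher = stats.fisher_exact(table)
```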

Results

Cohort characteristics

Out of 571 screened records, 511 met eligibility criteria; 178 (34.8%) developed CCS. Baseline comparisons between CCS and non-CCS groups showed significant differences in age (68.8 ± 7.9 vs. 66.8 ± 8.2 years, p = 0.011), NIHSS (median 11 vs. 9, p < 0.001), D-dimer (median 1.17 vs. 0.79, p < 0.001), CRP, HbA1c, and severe carotid stenosis (9.0% vs. 2.4%, p = 0.006) (Table 1).

Table 1.

Baseline characteristics of the study population.

Characteristics CCS (n = 178) Non-CCS (n = 333) P-value
Age 68.78 ± 7.88 66.84 ± 8.24 0.011
BMI 23.80 (22.70–25.00) 23.90 (22.20–25.80) 0.760
SBP 129.88 ± 14.98 129.49 ± 15.04 0.779
DBP 79.96 ± 9.71 79.83 ± 9.89 0.888
NIHSS 11.00 (8.00–15.00) 9.00 (6.00–12.00) < 0.001
BNP 114.90 (44.10–153.40) 39.80 (29.70–51.50) < 0.001
cTnI 0.83 ± 0.34 0.03 ± 0.01 < 0.001
WBC 7.78 (6.80–8.93) 7.84 (6.92–8.79) 0.752
Neutrophil_% 60.39 ± 5.22 59.97 ± 4.88 0.368
Lymphocyte 2.48 ± 0.45 2.52 ± 0.49 0.354
Hemoglobin 119.93 ± 11.48 127.22 ± 10.40 < 0.001
Albumin 38.60 (34.90–42.50) 39.30 (36.60–42.30) 0.047
BUN 6.78 ± 1.95 6.60 ± 1.87 0.309
Creatinine 86.13 ± 16.71 87.11 ± 17.55 0.541
K 4.05 ± 0.35 4.02 ± 0.30 0.392
CRP 4.70 (3.40–5.90) 3.90 (2.90–4.90) < 0.001
HbA1c 6.72 ± 0.94 6.28 ± 0.91 < 0.001
D_dimer 1.17 (0.82–1.52) 0.79 (0.66–0.92) < 0.001
PT 11.63 ± 0.92 11.32 ± 0.93 < 0.001
APTT 26.08 ± 1.21 25.79 ± 1.10 0.007
TT 16.62 ± 0.80 16.83 ± 0.78 0.004
FIB 3.20 (2.50–3.80) 2.80 (2.40–3.20) < 0.001
TC 4.72 ± 0.43 4.68 ± 0.40 0.297
TG 1.18 (0.86–1.49) 1.24 (0.98–1.53) 0.086
HDL_C 1.32 ± 0.20 1.31 ± 0.20 0.421
LDL_C 2.79 ± 0.39 2.77 ± 0.40 0.640
Gender (Male) 105 (59.0%) 204 (61.3%) 0.685
Smoke 42 (23.6) 73 (21.9) 0.749
Drink 29 (16.3) 38 (11.4) 0.156
Hypertension 126 (70.8) 193 (58.0) 0.006
Diabetes 63 (35.4) 94 (28.2) 0.116
History_Stroke 31 (17.4) 24 (7.2) < 0.001
ACEI_ARB 87 (48.9) 97 (29.1) < 0.001
Beta_Blocker 77 (43.3) 83 (24.9) < 0.001
Statin 85 (47.8) 98 (29.4) < 0.001
LVEF 45.0 ± 6.5% 56.0 ± 5.8% < 0.001
Carotid_Stenosis Grade 0: 97 (54.5%); Grade 1: 35 (19.7%); Grade 2: 30 (16.9%); Grade 3: 16 (9.0%) Grade 0: 211 (63.4%); Grade 1: 60 (18.0%); Grade 2: 54 (16.2%); Grade 3: 8 (2.4%) 0.006

Continuous variables are presented as mean ± standard deviation for normally distributed data or median (interquartile range) for non-normally distributed data. Categorical variables are presented as n (%). CCS, cerebral-cardiac syndrome; BMI, body mass index; SBP, systolic blood pressure; DBP, diastolic blood pressure; NIHSS, national institutes of health stroke scale; BNP, B‑type natriuretic peptide; cTnI, cardiac troponin I; WBC, white blood cell; BUN, blood urea nitrogen; CRP, C-reactive protein; PT, prothrombin time; APTT, activated partial thromboplastin time; TT, thrombin time; FIB, fibrinogen; TC, total cholesterol; TG, triglycerides; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; ACEI, angiotensin‑converting enzyme inhibitor; ARB, angiotensin II receptor blocker; LVEF, left ventricular ejection fraction.

Model development and cross-validation

During 5-fold cross-validation on the training set, SVM achieved the highest mean ROC AUC (0.799 ± 0.056) but lower accuracy (0.662 ± 0.004). Random Forest maximized accuracy (0.819 ± 0.018, AUC 0.794 ± 0.042). XGBoost and Logistic Regression balanced discrimination and accuracy (XGBoost: AUC 0.779 ± 0.042, accuracy 0.811 ± 0.018; Logistic Regression: AUC 0.785 ± 0.052, accuracy 0.792 ± 0.033). Deep Neural Network performance was moderate (AUC 0.759 ± 0.041, accuracy 0.755 ± 0.023). All AUC standard deviations were below 0.06, indicating stable discrimination performance across folds (Supplementary Table S1). Hyperparameter tuning did not enhance test-set AUC; default models were retained. Supplementary Table S2 details each model’s parameter settings and performance before and after tuning.

Test set performance

On the test cohort (n = 103), XGBoost achieved the highest discrimination (AUC 0.879; 95% CI 0.807–0.942), with accuracy 0.825, precision 0.844, recall 0.675, and F1-score 0.750. Random Forest followed (AUC 0.866, accuracy 0.845, precision 0.962, recall 0.625, F1-score 0.758). SVM and Logistic Regression yielded AUCs of 0.853 and 0.818, respectively; Deep Neural Network trailed (AUC 0.817). Detailed metrics are summarized in Table 2 and ROC curves are shown in Fig. 1.

Table 2.

Model performance comparison of the five machine learning models.

Model Accuracy AUC Precision Recall F1-score
Logistic regression 0.796 0.818 0.771 0.675 0.720
Random forest 0.845 0.866 0.962 0.625 0.758
SVM 0.854 0.853 0.963 0.650 0.776
XGBoost 0.825 0.879 0.844 0.675 0.750
Deep neural network 0.786 0.817 0.725 0.725 0.725

AUC, area under the curve; ROC, receiver operating characteristic; SVM, support vector machine; XGBoost, extreme gradient boosting.

Fig. 1.

Fig. 1

The ROC curves for the five machine learning models. AUC: Area Under the Curve; ROC: Receiver Operating Characteristic; SVM: Support Vector Machine; XGBoost: Extreme Gradient Boosting.

Confusion matrices for each classifier further illustrate that XGBoost predicted 27 true positives, 58 true negatives, 5 false positives and 13 false negatives, with analogous breakdowns for Random Forest (25 TP, 62 TN, 1 FP, 15 FN), SVM (26 TP, 62 TN, 1 FP, 14 FN), Logistic Regression (27 TP, 55 TN, 8 FP, 13 FN) and Deep Neural Network (29 TP, 52 TN, 11 FP, 11 FN) (Fig. 2).

Fig. 2.

Fig. 2

Confusion Matrix for the five machine learning models. Panels A–E correspond to (A) logistic regression, (B) random forest, (C) support vector machine, (D) XGBoost, and (E) deep neural network. SVM: Support Vector Machine; XGBoost: Extreme Gradient Boosting; CCS: cerebral–cardiac syndrome.

Calibration performance

Model calibration was assessed using the Hosmer-Lemeshow goodness-of-fit test and Brier scores across all five machine learning algorithms. The analysis revealed marked differences in calibration quality among the models (Table 3). SVM demonstrated the best calibration, with a Hosmer-Lemeshow p-value of 0.246 (p > 0.05, indicating good calibration) and the lowest Brier score of 0.126. Random Forest also showed acceptable calibration (HL p-value = 0.153, Brier score = 0.131).

Table 3.

Model calibration performance of the five machine learning models.

Model Brier score HL p-value Well calibrated
Logistic regression 0.148 0.001 No
Random forest 0.131 0.153 Yes
SVM 0.126 0.246 Yes
XGBoost 0.141 < 0.001 No
Deep neural network 0.178 < 0.001 No

SVM, support vector machine; XGBoost, extreme gradient boosting; HL, Hosmer-Lemeshow.

In contrast, XGBoost exhibited poor calibration despite achieving the highest discrimination performance, with a Hosmer-Lemeshow p-value < 0.001 and Brier score of 0.141. Similarly, the Deep Neural Network showed severe miscalibration (HL p-value < 0.001, Brier score = 0.178), while Logistic Regression demonstrated moderate calibration issues (HL p-value = 0.001, Brier score = 0.148).

Calibration plots (Fig. 3) visually confirmed these findings, with SVM and Random Forest showing predicted probabilities closely aligned with observed frequencies across deciles, while XGBoost and Deep Neural Network showed systematic deviations from the diagonal line of perfect calibration.
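The Hosmer-Lemeshow statistic and Brier score can be computed as sketched below; the paper does not report its exact HL implementation, so one common decile-based formulation is shown, with `y_test` and `y_prob` as in the earlier evaluation sketches.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.metrics import brier_score_loss

def hosmer_lemeshow(y_true, y_prob, n_bins=10):
    """Decile-based Hosmer-Lemeshow goodness-of-fit statistic and p-value."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    order = np.argsort(y_prob)
    stat = 0.0
    for idx in np.array_split(order, n_bins):     # deciles of predicted risk
        observed = y_true[idx].sum()
        expected = y_prob[idx].sum()
        n = len(idx)
        if 0 < expected < n:                      # guard against zero-variance bins
            stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat, 1 - chi2.cdf(stat, df=n_bins - 2)

# Usage with held-out predictions from any fitted model:
# hl_stat, hl_p = hosmer_lemeshow(y_test, y_prob)   # p > 0.05 suggests adequate calibration
# brier = brier_score_loss(y_test, y_prob)          # lower is better
```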

Fig. 3.

Fig. 3

Calibration curves for the five machine learning models. SVM: Support Vector Machine; XGBoost: Extreme Gradient Boosting.

Decision-curve analysis

In decision curve analysis (Fig. 4), all five models conferred greater net benefit than either default strategy (“treat all” or “treat none”) across the clinically relevant threshold range of 0.1–0.6. Within this range, the SVM and Random Forest models yielded the highest net benefit, consistently exceeding both default strategies. XGBoost provided marginally higher net benefit only at very low thresholds (< 0.15), reflecting its aggressive detection of true positives. Beyond 0.6, none of the models offered an advantage over the default strategies, indicating diminishing clinical utility at high decision thresholds.
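Net benefit for the decision curves follows directly from its standard definition; the sketch below assumes `y_test` and `y_prob` from the earlier evaluation sketches.

```python
import numpy as np

def decision_curve(y_true, y_prob, thresholds):
    """Net benefit of the model and of 'treat all' at each threshold probability;
    'treat none' has zero net benefit by definition."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    n = len(y_true)
    prevalence = y_true.mean()
    model_nb, treat_all_nb = [], []
    for pt in thresholds:
        pred = y_prob >= pt
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        model_nb.append(tp / n - (fp / n) * pt / (1 - pt))
        treat_all_nb.append(prevalence - (1 - prevalence) * pt / (1 - pt))
    return np.array(model_nb), np.array(treat_all_nb)

# thresholds = np.arange(0.10, 0.61, 0.01)        # clinically relevant range
# nb_model, nb_all = decision_curve(y_test, y_prob, thresholds)
```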

Fig. 4.

Fig. 4

Decision curve analysis for the five machine learning models. SVM: Support Vector Machine; XGBoost: Extreme Gradient Boosting.

Threshold optimization analysis

Because the dataset exhibited a moderate class imbalance (non-CCS:CCS = 1.96:1; CCS prevalence = 33.8% in the training and 38.8% in the test data), six threshold-selection strategies were compared: the default cut-off of 0.50, a prevalence-based cut-off (0.338), Youden index optimization, F1-score maximization, and cost-sensitive optimization with false-negative penalties of 2:1 and 3:1.

The SVM model with a Youden-optimized threshold (0.576) delivered the most balanced overall performance (accuracy = 87.4%, sensitivity = 70.0%, specificity = 98.4%, F1-score = 0.812), outperforming all other model–strategy combinations. For high-sensitivity applications, XGBoost tuned with a 3:1 false-negative penalty reached 95.0% sensitivity at a threshold of 0.019, albeit with reduced specificity (52.4%) and accuracy (68.9%). Conversely, Random Forest retained the highest specificity (98.4%) at the default 0.50 threshold, making it preferable for confirmatory testing despite lower sensitivity (62.5%).

Across models, prevalence-based thresholds consistently increased sensitivity (by approximately 5–10 percentage points) relative to the default but did not maximize any single metric. Notably, the Youden and F1-maximizing thresholds converged for SVM, Random Forest, and XGBoost, indicating stable optimization despite the modest imbalance (Supplementary Table S3).
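The alternative cut-off strategies can be reproduced with a few helper functions, as in the sketch below; the cost weights mirror the 2:1 and 3:1 false-negative penalties described above, and `y_train`, `y_test`, `y_prob`, and `youden_threshold` come from the earlier sketches.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

def cost_sensitive_threshold(y_true, y_prob, fn_cost=3.0, fp_cost=1.0):
    """Cut-off minimizing the weighted misclassification cost."""
    candidates = np.unique(y_prob)
    costs = []
    for t in candidates:
        pred = (np.asarray(y_prob) >= t).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_true, pred).ravel()
        costs.append(fn_cost * fn + fp_cost * fp)
    return candidates[int(np.argmin(costs))]

def f1_threshold(y_true, y_prob):
    """Cut-off maximizing the F1-score."""
    candidates = np.unique(y_prob)
    scores = [f1_score(y_true, (np.asarray(y_prob) >= t).astype(int)) for t in candidates]
    return candidates[int(np.argmax(scores))]

# Strategies compared in the text:
# thr_default    = 0.50
# thr_prevalence = y_train.mean()                  # prevalence-based cut-off
# thr_youden     = youden_threshold(y_test, y_prob)
# thr_f1         = f1_threshold(y_test, y_prob)
# thr_cost_2to1  = cost_sensitive_threshold(y_test, y_prob, fn_cost=2.0)
# thr_cost_3to1  = cost_sensitive_threshold(y_test, y_prob, fn_cost=3.0)
```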

Feature importance

Feature importance analysis using SHAP identified D-dimer as the most influential predictor, followed closely by ACEI/ARB usage, HbA1c, CRP, and prothrombin time. Additional significant contributors included age, NIHSS scores, and severe carotid artery stenosis. Figure 5A presents the SHAP feature importance plot, which ranks predictors based on their average impact on model output. The SHAP summary plot (Fig. 5B) further illustrates the distribution and directionality of these features, demonstrating how higher levels of D-dimer, CRP, and HbA1c, among others, are associated with increased CCS risk.
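A minimal SHAP sketch for the tree-based model is shown below; `xgb_model` is assumed to be a fitted XGBClassifier and `X_test_df` a DataFrame of test-set features with named columns (the paper does not specify which SHAP explainer variant was used, so TreeExplainer is shown as the usual choice for gradient-boosted trees).

```python
import shap

# Tree-based SHAP explanations for the fitted gradient-boosting model.
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test_df)

# Global ranking of predictors by mean |SHAP| value (cf. Fig. 5A)...
shap.summary_plot(shap_values, X_test_df, plot_type="bar")

# ...and the beeswarm summary showing effect directionality (cf. Fig. 5B).
shap.summary_plot(shap_values, X_test_df)
```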

Fig. 5.

Fig. 5

Feature Importance and SHAP Summary. (A) Feature importance ranked by mean SHAP values, (B) SHAP summary plot. SHAP: SHapley Additive exPlanations.

Discussion

Our study demonstrates that ML algorithms can effectively predict CCS in AIS patients, offering a novel approach to early risk stratification. Among the five ML models tested, XGBoost achieved the highest discrimination (AUC 0.879), while SVM (AUC 0.853) and random forest (AUC 0.866) exhibited superior calibration. This performance divergence highlights a common modeling challenge: balancing discrimination with calibration. In clinical practice, an optimal algorithm must align with the intended application—XGBoost may excel at identifying high-risk individuals, whereas SVM or random forest may provide more reliable absolute risk estimates for individualized decision-making.

Key predictors identified by SHAP analysis, including D-dimer, prothrombin time, ACEI/ARB usage, HbA1c, and CRP, collectively highlight the critical roles of coagulation, inflammation, glycemic control, and blood pressure regulation in the pathophysiology of cerebral-cardiac syndrome (CCS). Among these, elevated D-dimer and prolonged prothrombin time, both key markers of coagulation dysfunction, reflect the hyperactivation of the coagulation and fibrinolytic systems frequently observed following acute ischemic events [14,15]. This state is often accompanied by systemic inflammation, endothelial dysfunction, and microvascular thrombosis, all of which are central to CCS development [16,17]. The inclusion of these coagulation markers as key predictors underscores their utility in capturing the complex interplay between thrombosis and cardiovascular dysfunction. Clinically, monitoring these markers in ischemic stroke patients could facilitate early identification of those at risk for CCS, enabling timely interventions such as anticoagulation therapy or enhanced cardiac monitoring [18].

Interestingly, ACEI/ARB usage emerged as a significant predictor of CCS, which may appear counterintuitive given the well-established protective effects of these agents in cardiovascular disease. However, this association likely reflects the presence of poorly controlled hypertension in CCS patients, as suggested by the higher baseline systolic and diastolic blood pressures observed in this group. Poor blood pressure control despite ACEI/ARB therapy may indicate treatment resistance or suboptimal adherence, either of which would increase the risk of cardiovascular complications following AIS [18,19]. This finding highlights the critical importance of achieving optimal blood pressure management to mitigate the risk of cardiac complications in stroke patients.

Elevated HbA1c levels highlight the impact of poor glycemic control, which exacerbates oxidative stress, endothelial dysfunction, and systemic inflammation, key mechanisms contributing to CCS development [16,17]. Similarly, elevated CRP levels emphasize the role of systemic inflammation, which is heightened after AIS and is known to promote cardiac dysfunction [16,17]. These findings collectively illustrate how vascular risk factors, combined with the inflammatory and thrombotic responses to stroke, drive the development and progression of CCS. Carotid artery stenosis, particularly severe stenosis (70–99%), was also more prevalent in CCS patients, highlighting the role of systemic atherosclerosis and cerebrovascular pathology in CCS development [20].

Our work builds on earlier ML research in stroke by shifting the focus from neurological to cardiac outcomes. Whereas past studies predominantly addressed complications such as hemorrhagic transformation or functional decline [21,22], our models emphasize transparent, interpretable risk insights via SHAP, facilitating clinician adoption and seamless integration into electronic health record systems.

From a translational standpoint, ML-driven CCS risk scores could inform targeted surveillance and preventive strategies. For example, patients identified as high-risk by XGBoost could undergo early echocardiography, continuous electrocardiographic monitoring, or preemptive cardioprotective interventions (e.g., beta-blockers, tailored anticoagulation). Those deemed low-risk might avoid unnecessary resource utilization and potential treatment-related complications. Decision curve analysis confirmed that all models provided net benefit over treat-all or treat-none approaches across clinically relevant threshold probabilities (10–60%), with random forest and SVM offering particularly robust benefit in intermediate-risk ranges.

The findings of this study have important clinical implications. Early identification of CCS risk through machine learning can improve clinical decision-making by enabling proactive measures to prevent cardiac complications, reduce morbidity, and improve outcomes. Future research should focus on integrating machine learning models into clinical workflows and evaluating their impact on patient care in real-world settings. Additionally, incorporating longitudinal data and external validation cohorts would further strengthen the robustness and applicability of the predictive model.

Limitation

This study has several limitations. Its retrospective design and single-center data may limit generalizability, requiring prospective, multicenter validation. The exclusion of thrombolysis and thrombectomy patients restricts applicability to this subgroup. Potential biases in data collection and the moderate sample size raise concerns about overfitting, despite cross-validation efforts. The use of static variables overlooks dynamic changes that might influence CCS risk, and while SHAP analysis highlights feature importance, causal relationships remain unconfirmed. Individual diabetes medications were not modeled; instead, we used diabetes status and HbA1c as proxies to preserve model simplicity given inconsistent prescription coding. Finally, challenges in integrating machine learning models into clinical workflows, including data standardization and clinician acceptance, warrant further investigation.

Conclusion

ML-based models accurately predict early CCS in ischemic stroke patients. XGBoost offers superior discrimination, while SVM and random forest demonstrate better calibration. Important predictors highlight the roles of coagulation, inflammation, and metabolic control. This study underscores the potential of ML models in enhancing clinical decision-making and risk stratification, ultimately improving patient outcomes.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (19.3KB, docx)
Supplementary Material 2 (18.3KB, docx)
Supplementary Material 3 (22.5KB, docx)

Acknowledgements

None.

Abbreviations

ACEI/ARB

Angiotensin-converting enzyme inhibitors/angiotensin receptor blockers

AUC

Area under the curve

APTT

Activated partial thromboplastin time

BNP

Brain natriuretic peptide

BMI

Body mass index

BUN

Blood urea nitrogen

CCS

Cerebral-cardiac syndrome

CRP

C-reactive protein

CT

Computed tomography

DBP

Diastolic blood pressure

FIB

Fibrinogen

HbA1c

Glycated hemoglobin

HDL-C

High-density lipoprotein cholesterol

HL

Hosmer-lemeshow

LDL-C

Low-density lipoprotein cholesterol

LVEF

Left ventricular ejection fraction

ML

Machine learning

MLP

Multi-layer perceptron

MRI

Magnetic resonance imaging

NIHSS

National Institutes of Health Stroke Scale

NT-proBNP

N-terminal pro-brain natriuretic peptide

PT

Prothrombin time

ROC

Receiver operating characteristic

SBP

Systolic blood pressure

SHAP

SHapley additive exPlanations

SVM

Support vector machine

TC

Total cholesterol

TG

Triglycerides

TT

Thrombin time

WBC

White blood cell count

XGBoost

Extreme gradient boosting

Author contributions

T.Z.: Conceptualization, Formal Analysis, Writing—Original Draft, Investigation; Z.H.: Methodology, Software, Formal Analysis, Visualization, Data Curation, Writing—Original Draft; L.Z.: Validation, Data Curation, Visualization, Investigation; Q.J.: Validation, Investigation, Software, Writing—Review and Editing; L.Y.: Conceptualization, Supervision, Writing—Review and Editing, Project Administration.

Funding

None.

Data availability

The study protocol, de-identified data and analysis code are available from the corresponding author on reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Robba, C., van Dijk, E. J. & van der Jagt, M. Acute ischaemic stroke and its challenges for the intensivist. Eur. Heart J. Acute Cardiovasc. Care 11, 258–268 (2022).
2. Battaglini, D. et al. Brain–heart interaction after acute ischemic stroke. Crit. Care 24, 163 (2020).
3. Wang, L. et al. Stroke–heart syndrome: Current progress and future outlook. J. Neurol. 271, 4813–4825 (2024).
4. Mitrică, M. et al. The hidden heart: Exploring cardiac damage post-stroke: A narrative review. Medicina (Kaunas) 60, 1699 (2024).
5. Wang, M. & Peng, Y. Advances in brain-heart syndrome: Attention to cardiac complications after ischemic stroke. Front. Mol. Neurosci. 15 (2022).
6. Wang, X. C. et al. Predictive factors for cerebrocardiac syndrome in patients with severe traumatic brain injury: A retrospective cohort study. Front. Neurol. 14 (2023).
7. Lian, H. et al. Early prediction of cerebral-cardiac syndrome after ischemic stroke: The PANSCAN scale. BMC Neurol. 20, 272 (2020).
8. Sidey-Gibbons, J. A. M. & Sidey-Gibbons, C. J. Machine learning in medicine: A practical introduction. BMC Med. Res. Methodol. 19, 64 (2019).
9. Bonkhoff, A. K. & Grefkes, C. Precision medicine in stroke: Towards personalized outcome predictions using artificial intelligence. Brain 145, 457–475 (2022).
10. Mathur, P., Srivastava, S., Xu, X. & Mehta, J. L. Artificial intelligence, machine learning, and cardiovascular disease. Clin. Med. Insights Cardiol. 14, 1179546820927404 (2020).
11. Vu, T. et al. Machine learning model for predicting coronary heart disease risk: Development and validation using insights from a Japanese population-based study. JMIR Cardio 9, e68066 (2025).
12. Vu, T. et al. Prediction of depressive disorder using machine learning approaches: Findings from the NHANES. BMC Med. Inf. Decis. Mak. 25, 83 (2025).
13. Thanh, N. T., Luan, V. T., Viet, D. C., Tung, T. H. & Thien, V. A machine learning-based risk score for prediction of mechanical ventilation in children with dengue shock syndrome: A retrospective cohort study. PLoS One 19, e0315281 (2024).
14. Barber, M., Langhorne, P., Rumley, A., Lowe, G. D. O. & Stott, D. J. Hemostatic function and progressing ischemic stroke. Stroke 35, 1421–1425 (2004).
15. Luan, X. et al. High levels of plasma fibrinogen and prothrombin time are related to post-stroke emotional impairment. Brain Res. 1748, 147017 (2020).
16. Dziedzic, T. Systemic inflammation as a therapeutic target in acute ischemic stroke. Expert Rev. Neurother. 15, 523–531 (2015).
17. Chen, Z. et al. Brain-heart interaction: Cardiac complications after stroke. Circ. Res. 121, 451–468 (2017).
18. Guidelines for the early management of patients with acute ischemic stroke: 2019 update to the 2018 guidelines for the early management of acute ischemic stroke. A guideline for healthcare professionals from the American Heart Association/American Stroke Association. Stroke. https://www.ahajournals.org/doi/10.1161/STR.0000000000000211
19. Bath, P. M. et al. Blood pressure management for ischemic stroke in the first 24 hours. Stroke 53, 1074–1084 (2022).
20. van Velzen, T. J., Kuhrij, L. S., Westendorp, W. F., van de Beek, D. & Nederkoorn, P. J. Prevalence, predictors and outcome of carotid stenosis: A sub study in the preventive antibiotics in stroke study (PASS). BMC Neurol. 21, 20 (2021).
21. Issaiy, M., Zarei, D., Kolahi, S. & Liebeskind, D. S. Machine learning and deep learning algorithms in stroke medicine: A systematic review of hemorrhagic transformation prediction models. J. Neurol. 272, 37 (2024).
22. Lee, R., Choi, H., Park, K. Y., Kim, J. M. & Seok, J. W. Prediction of post-stroke cognitive impairment using brain FDG PET: Deep learning-based approach. Eur. J. Nucl. Med. Mol. Imaging 49, 1254–1262 (2022).
