Machine learning-based models for prediction of the risk of stroke in coronary artery disease patients receiving coronary revascularization

Lulu Lin; Li Ding; Zhongguo Fu; Lijiao Zhang

doi:10.1371/journal.pone.0296402

. 2024 Feb 8;19(2):e0296402. doi: 10.1371/journal.pone.0296402

Machine learning-based models for prediction of the risk of stroke in coronary artery disease patients receiving coronary revascularization

Lulu Lin ^1,^#, Li Ding ^1,^#, Zhongguo Fu ², Lijiao Zhang ^3,^*

Editor: Aamna Mohammed AlShehhi⁴

PMCID: PMC10852291 PMID: 38330052

Abstract

Background

To construct several prediction models for the risk of stroke in coronary artery disease (CAD) patients receiving coronary revascularization based on machine learning methods.

Methods

In total, 5757 CAD patients receiving coronary revascularization admitted to ICU in Medical Information Mart for Intensive Care IV (MIMIC-IV) were included in this cohort study. All the data were randomly split into the training set (n = 4029) and testing set (n = 1728) at 7:3. Pearson correlation analysis and least absolute shrinkage and selection operator (LASSO) regression model were applied for feature screening. Variables with Pearson correlation coefficient<9 were included, and the regression coefficients were set to 0. Features more closely related to the outcome were selected from the 10-fold cross-validation, and features with non-0 Coefficent were retained and included in the final model. The predictive values of the models were evaluated by sensitivity, specificity, area under the curve (AUC), accuracy, and 95% confidence interval (CI).

Results

The Catboost model presented the best predictive performance with the AUC of 0.831 (95%CI: 0.811–0.851) in the training set, and 0.760 (95%CI: 0.722–0.798) in the testing set. The AUC of the logistic regression model was 0.789 (95%CI: 0.764–0.814) in the training set and 0.731 (95%CI: 0.686–0.776) in the testing set. The results of Delong test revealed that the predictive value of the Catboost model was significantly higher than the logistic regression model (P<0.05). Charlson Comorbidity Index (CCI) was the most important variable associated with the risk of stroke in CAD patients receiving coronary revascularization.

Conclusion

The Catboost model was the optimal model for predicting the risk of stroke in CAD patients receiving coronary revascularization, which might provide a tool to quickly identify CAD patients who were at high risk of postoperative stroke.

Introduction

Coronary artery disease (CAD) is the most common cardiovascular diseases wherein atherosclerosis occurs in one or more of the coronary arteries [1]. CAD was reported to be one of the major causes of mortality in both the developed and developing countries [2]. Currently, percutaneous coronary intervention (PCI) and coronary artery bypass grafting (CABG) are common coronary revascularization procedures [3]. With the development and application of drug-eluting stents and minimally invasive surgery, the prognosis of patients undergoing PCI or CABG was improved, but some patients still have postoperative adverse cardiovascular events, which result in worse prognosis [4]. Stroke is a cerebrovascular disorder which is the second leading cause of mortality and morbidity worldwide [5]. Stroke is a prevalent complication among surgical and ICU patients, with postoperative stroke incidence in cardiac surgery patients ranging from 0.8% to 9% [6]. Previous evidence suggested that the occurrence of stroke occurrence was associated with a significantly elevated risk of mortality in patients undergoing PCI or CABG procedures [7, 8]. Constructing predictive models to accurately identify patients receiving coronary revascularization who were at high risk of stroke is of great significance.

Recently, machine learning methods are gradually applied to the construction of clinical models in order to improve the accuracy of clinical diagnosis or prediction of diseases [9]. Machine learning method was widely applied in predicting poor prognosis after heart surgery and the risk of postoperative stroke, which presented better performance than traditional risk models such as logistic regression [10–12]. However, no studies have reported the use of machine learning to predict the risk of postoperative stroke in patients undergoing coronary revascularization.

This study intended to construct several prediction models for the risk of stroke in CAD patients who underwent coronary revascularization based on machine learning methods. The optimal prediction model was identified and the predictive value was compared with traditional logistic regression model.

Methods

Study design and population

In this cohort study, the records of 6289 CAD patients receiving coronary revascularization were obtained in Medical Information Mart for Intensive Care IV (MIMIC-IV). MIMIC-IV builds upon the success of MIMIC-III and incorporates numerous enhancements from 2008 to 2019. MIMIC-IV is a relational database that encompasses authentic hospitalizations of patients admitted to a tertiary academic medical center located in Boston, MA, USA. Each patient’s length of stay, laboratory tests, medication treatment, vital signs and other comprehensive information during their ICU stay were recorded [13]. Patients with age < 18 years old and those with the length of ICU stay less than 24 h were excluded. Finally, 5757 participants were included. The requirement of ethical approval for this was waived by the Institutional Review Board of The second hospital of Dalian medical university, because the data was accessed from MIMIC-IV (a publicly available database). The need for written informed consent was waived by the Institutional Review Board of The second hospital of Dalian medical university due to retrospective nature of the study.

Potential predictors

Age (years), gender (female or male), ethnicity (White, Black, others or unknown), insurance (Medicaid, Medicare or others), marital status (married, or no married), first care unit [Cardiac Care Unit (CCU), cardiovascular intensive care unit (CVICU) or others), family history of stroke (yes or no), personal history of stroke (yes or no), treatments (CABG and PCI, CABG alone or PCI alone), thrombolysis (yes or no), antiplatelet (yes or no), beta-blockers (yes or no), calcium channel blockers (yes or no), ventilation (yes or no), vasopressors (yes or no), Glasgow Coma Scale (GCS) score, Charlson Comorbidity Index (CCI), weight (kg), heart rate (bpm), systolic blood pressure (SBP) (mmHg), diastolic blood pressure (DBP) (mmHg), respiratory rate (bpm), temperature (°C), oxygen saturation (SpO₂) (%), white blood cell (WBC) (K/uL), platelet (K/uL), hemoglobin (g/dL), red blood cell distribution width (RDW) (%), hematocrit (%), creatinine (mg/dL), international normalized ratio (INR), prothrombin time (PT) (seconds), partial thromboplastin time (PTT) (seconds), blood urea nitrogen (BUN) (mg/dL), glucose (mg/dL), calcium (mmol/L), sodium (mEq/L), chloride (mEq/L), bicarbonate (mEq/L), and lactate (mmol/L) were potential predictors. All these data were collected within 24 hours of admission to ICU.

Construction of the prediction models

Logistic regression model is a classification algorithm evolved from linear regression, and belongs to a Sigmoid function normalization model of generalized linear regression model, which is commonly used to solve binary classification problems and has strong explanatory ability [14].

Support vector machine (SVM) is a classification algorithm, and it can also be classified. Different models can be made according to different input data. If the input label is classified value, SVC() is used for classification. This algorithm improves the generalization ability of learning machine by seeking the minimum structural risk, and minimizes the empirical risk and confidence range. Its basic model is defined as the linear classifier with the largest interval on the feature space, that is, the learning strategy of support vector machine is to maximize the interval, and it can be converted into the solution of a convex quadratic programming problem [15].

Random forest is an extended variant of Bagging ensemble learning. On the basis of constructing Bagging ensemble with decision tree as a base learner, it further adds the selection of random attributes to the training process of decision tree. For each node of the base decision tree, a subset containing k attributes is randomly selected from the candidate attribute set of the node. Then an optimal attribute is selected for division. The method of prediction stage of this algorithm is Bagging strategy. The classification model uses voting method to determine the final result, and the regression model uses mean method to determine the final result [16].

Extreme Gradient Boosting (XGBoost) is an efficient gradient lifting decision tree algorithm, which is improved on the basis of the original Gradient Boosting Decision Tree. As a forward addition model, its core is to adopt the Boosting thought, which integrates multiple weak learners into a strong learner by a certain method, that is, multiple trees make decisions together, and the result of each tree is the difference between the target value and the predicted result of all the previous trees, adding up all the results to get the final result. In this way, the effect of the whole model is improved [17].

Adaptive Boosting (Adaboost) is an iterative algorithm to train different classifiers for the same training set, and then set these weak classifiers together to form a stronger final classifier. The set strategy is to increase the weight of the samples that were classified wrong by the previous round of classifiers. Reducing the weight of samples with correct classification will get more attention from the following classifiers, and then weaker classifiers can be generated. By combining these weak classifiers with majority weighted voting, the classifier with small error rate is increased, and the classifier with large error rate is reduced, so that it plays a less role in voting [18].

Naive bayes is one of the most widely used classification algorithms. It is a classifier method based on Bayesian definition and independent assumption of feature conditions. Naive Bayes algorithm is based on Bayesian principle and uses the knowledge of probability statistics to classify sample data sets. It is characterized by the combination of prior probability and posterior probability, which avoids the subjective bias of using only prior probability, and also avoids the overfitting phenomenon of using sample information alone [19].

K-nearest neighbor (KNN) is one of the most basic and simplest algorithms in the machine learning algorithm model. It can be used for classification and regression by measuring the distance between different eigenvalues. The working principle is to use the training data to partition the eigenvector space and take the partition result as the final algorithm model.

Categorical boosting (Catboost) is a kind of gradient boosting algorithm library that can handle categorical features well. It has made some improvements on the basis of the original Gradient Boosting Decision Tree. Specifically, the algorithm has two characteristics of adaptive learning rate and categorical feature processing, which can help the algorithm better control the contribution of the weak learner in each iteration. In addition, the algorithm can deal with categorical features efficiently and reasonably, so it can deal with the influence of categorical features better [20].

Outcome variable

Postoperative stroke was the outcome, which was screened according to the ICD diagnosis codes, ICD-9 (430–436, and 997020, and ICD-10 three I60-I66, I9782, I97820.

Statistical analysis

Mean ± SD was used to describe the distribution of measurement data subject to normal distribution, and t-test was used to compare the difference between groups. Median and quartiles were used to describe the distribution of measurement data that did not follow normal distribution, and Wilcoxon rank sum test was used to compare the difference between groups. The enumeration data were expressed as number and percentage of cases [n (%)], and the Chi-square test or Fisher’s exact probability were used to compare the differences between the groups. Missing values <20% were dealt by Random forest interpolation, and ≥20% were deleted (Table 1). Sensitivity analysis were performed before and after interpolation (Table 2). All the data were randomly split into the training set (n = 4029) and testing set (n = 1728) at 7:3 with the random seed if 42 [21]. Pearson correlation analysis and least absolute shrinkage and selection operator (LASSO) regression model were applied for feature screening. Variables with Pearson correlation coefficient<9 were included, and the regression coefficients were set to 0. Features more closely related to the outcome were selected from the 10-fold cross-validation, and features with non-0 Coefficient were retained and included in the final model. Eight prediction models were constructed, and the parameter settings were shown in Table 3. The predictive values of the models were evaluated by sensitivity, specificity, area under the curve (AUC), accuracy, and 95% confidence interval (CI). The confidence level alpha = 0.05. Missing value interpolation, training set and testing set split, data modeling and result visualization were completed using Python 3.9.12. Sensitivity analysis and difference comparison were performed by SAS 9.4 (SAS Institute Inc., Cary, NC, USA).

Table 1. The number and percentage of missing values.

Variables	n	%	Total samples (n)	Manipulation
GCS	6	0.10%	5751	Random forest interpolation
SPO₂	6	0.10%	5751	Random forest interpolation
Systolic	7	0.12%	5750	Random forest interpolation
Diastolic	7	0.12%	5750	Random forest interpolation
Platelet	7	0.12%	5750	Random forest interpolation
Heart rate	8	0.14%	5749	Random forest interpolation
Chloride	8	0.14%	5749	Random forest interpolation
Sodium	9	0.16%	5748	Random forest interpolation
BUN	11	0.19%	5746	Random forest interpolation
Creatinine	11	0.19%	5746	Random forest interpolation
Hematocrit	12	0.21%	5745	Random forest interpolation
Bicarbonate	15	0.26%	5742	Random forest interpolation
Glucose	16	0.28%	5741	Random forest interpolation
RDW	18	0.31%	5739	Random forest interpolation
Hemoglobin	18	0.31%	5739	Random forest interpolation
WBC	18	0.31%	5739	Random forest interpolation
Calcium	35	0.61%	5722	Random forest interpolation
Weight	135	2.34%	5622	Random forest interpolation
PTT	147	2.55%	5610	Random forest interpolation
INR	149	2.59%	5608	Random forest interpolation
PT	150	2.61%	5607	Random forest interpolation
Marital status	358	6.22%	5399	Random forest interpolation
Temperature	685	11.90%	5072	Random forest interpolation
Lactate	734	12.75%	5023	Random forest interpolation
Respiratory rate	783	13.60%	4974	Random forest interpolation
Lymphocytes	2514	43.67%	3243	Deleting
Neutrophil	2514	43.67%	3243	Deleting
ALT	4976	86.43%	781	Deleting
Bilirubin total	4977	86.45%	780	Deleting
AST	4983	86.56%	774	Deleting
ALP	5012	87.06%	745	Deleting
Albumin	5377	93.40%	380	Deleting
Cholesterol	5460	94.84%	297	Deleting
Cholesterol-HDL	5464	94.91%	293	Deleting
Triglycerides	5466	94.95%	291	Deleting
Cholesterol-LDL	5467	94.96%	290	Deleting
Height	5498	95.50%	259	Deleting
Ntprobnp	5628	97.76%	129	Deleting
GGT	5753	99.93%	4	Deleting
Troponin-I	5757	100.00%	0	Deleting

Open in a new tab

GCS: Glasgow Coma Scale, SPO₂: oxygen saturation, BUN, blood urea nitrogen, RDW: red blood cell distribution width, WBC: white blood cell, PTT partial thromboplastin time, INR: international normalized ratio, PT prothrombin time, HDL: high density lipoprotein

Table 2. Sensitivity analysis of data before and after manipulation.

Variables	After manipulation (n = 5757)	Before manipulation (n = 5757)	Statistics	P
Platelet, K/uL, Mean ± SD	159.42 ± 51.43	157.92 ± 51.44	t = 1.55	0.121
Calcium, mmol/L, M (Q₁, Q₃)	1.20 (1.11, 1.35)	1.20 (1.11, 1.35)	Z = -0.413	0.679
SPO₂, %, Mean ± SD	98.81 ± 2.63	98.82 ± 2.63	t = -0.02	0.985
RDW, %, Mean ± SD	13.60 ± 0.99	13.57 ± 1.00	t = 1.44	0.151
BUN, mg/dL, M (Q₁, Q₃)	16.00 (13.00, 21.00)	16.00 (13.00, 21.00)	Z = -0.011	0.991
Glucose, mg/dL, Mean ± SD	139.92 ± 33.61	139.41 ± 34.17	t = 0.79	0.427
Marital status, n (%)			χ² = 1.459	0.227
Married	3854 (66.94)	3556 (65.86)
No married	1903 (33.06)	1843 (34.14)
Creatinine, mg/dL, M (Q₁, Q₃)	0.90 (0.70, 1.10)	0.90 (0.70, 1.10)	Z = 0.025	0.980
Lactate, mmol/L, M (Q₁, Q₃)	2.00 (1.60, 2.60)	2.10 (1.50, 2.70)	Z = 1.470	0.141
Weight, kg, Mean ± SD	84.67 ± 16.98	84.51 ± 17.21	t = 0.49	0.622
Hemoglobin, g/dL, Mean ± SD	10.27 ± 2.27	10.27 ± 2.27	t = 0.06	0.955
Temperature, °C, Mean ± SD	36.35 ± 0.47	36.36 ± 0.51	t = -1.75	0.080
INR, ratio, Mean ± SD	1.38 ± 0.19	1.38 ± 0.20	t = 0.50	0.620
Chloride, mEq/L, Mean ± SD	105.84 ± 3.60	105.91 ± 3.58	t = -0.97	0.334
PT, seconds, Mean ± SD	15.15 ± 2.02	15.12 ± 2.05	t = 0.89	0.375
PTT, seconds, Mean ± SD	37.23 ± 21.82	36.66 ± 21.72	t = 1.39	0.165
Hematocrit, %, Mean ± SD	30.77 ± 6.67	30.74 ± 6.66	t = 0.25	0.800
Heart rate, bpm, Mean ± SD	80.04 ± 9.38	80.00 ± 9.52	t = 0.19	0.848
SBP, mmHg, Mean ± SD	113.69 ± 16.67	113.49 ± 16.70	t = 0.65	0.516
DBP, mmHg, Mean±SD	59.74 ± 10.73	59.43 ± 10.66	t = 1.52	0.128
Respiratory rate, bpm, Mean ± SD	15.21 ± 2.07	15.18 ± 2.34	t = 0.61	0.542
Bicarbonate, mEq/L, Mean ± SD	23.18 ± 2.17	23.19 ± 2.20	t = -0.27	0.790
Sodium, mEq/L, Mean ± SD	135.76 ± 2.83	135.76 ± 2.84	t = -0.10	0.920
WBC, K/uL, M (Q₁, Q₃)	12.20 (9.20, 15.50)	12.10 (9.20, 15.50)	Z = -0.730	0.466
GCS, score, Mean ± SD	13.38 ± 3.54	13.38 ± 3.54	t = 0.02	0.985

Open in a new tab

SD: standard deviation, M: median, Q1: 1st quartile, Q3: 3st quartile, SPO₂: oxygen saturation, RDW: red blood cell distribution width, BUN, blood urea nitrogen, INR: international normalized ratio, PT prothrombin time, PTT partial thromboplastin time, SBP: systolic blood pressure, DBP: diastolic blood pressure, WBC: white blood cell, GCS: Glasgow Coma Scale

Table 3. Parameter settings for 8 machine learning models.

Model	Logistic regression	SVM	Random forest	XGBoost	Adaboost	Naive bayes	KNN	Catboost
Parameter 1	C = 1	kernel = ’rbf’	n_jobs = -1	learning_rate = 0.001	learning_rate = 0.001	alpha = 10	n_neighbors = 13	objective = ‘ CrossEntropy’
Parameter 2	max_iter = 1000	probability = True	oob_score = True	max_depth = 4	n_estimators = 178		weights = ‘ uniform’	colsample_bylevel = 0.503349
Parameter 3	n_jobs = -1	C = 0.732536	class_weight = ’balanced’	n_estimators = 494	random_state = 151		leaf_size = 35	depth = 4
Parameter 4	random_state = 151	gamma = 0.011125	n_estimators = 300	min_child_weight = 6				l2_leaf_reg = 1
Parameter 5	solver = ’newton-cg’	random_state = 151	max_depth = 5	subsample = 0.218971				boosting_type = ‘Ordered’
Parameter 6	penalty = ’l2’		random_state = 151	reg_lambda = 1				bootstrap_type = ‘MVS’
Parameter 7				Seed = 151				random_state = 151
Parameter 8								used_ram_limit = ‘3gb’
Parameter 9								learning_rate = 0.001

Open in a new tab

SVM: support vector machine, XGBoost: Extreme Gradient Boosting, Adaboost: Adaptive Boosting, KNN: K-nearest neighbor, Catboost: Categorical boosting

Results

Comparisons of the characteristics of participants with and without postoperative stroke in the training set

A total of 6289 CAD patients undergoing coronary revascularization were identified in MIMIC-IV. Among them, patients with the length of ICU stay less than 24 h were excluded (n = 532). Finally, 5757 participants were included. All patients were divided into the postoperative stroke group (n = 433) and postoperative non-stroke group according whether postoperative stroke occurred. The screen process of the participants was exhibited in Fig 1.

The percentage of patients with personal history of stroke in those with postoperative stroke was higher those without (15.86% vs 5.35%; P<0.001). The mean INR in patient with postoperative stroke was higher those without (1.41 vs 1.37; P = 0.001). The median length of stay in patents with postoperative stroke was higher those without (2.31 days vs 1.83 days; P<0.001). More detailed information was observed in Table 4.

Table 4. Comparisons of the characteristics of participants with and without postoperative stroke in the training set.

		Postoperative stroke
Variables	Total (n = 4029)	No (n = 3720)	yes (n = 309)	Statistics	P
Age, years, Mean ± SD	67.77 ± 10.96	67.45 ± 11.00	71.54 ± 9.85	t = -6.94	<0.001
Gender, n (%)				χ² = 8.998	0.003
Female	1005 (24.94)	906 (24.35)	99 (32.04)
Male	3024 (75.06)	2814 (75.65)	210 (67.96)
Ethnicity, n (%)				χ² = 4.675	0.197
White	2881 (71.51)	2650 (71.24)	231 (74.76)
Black	162 (4.02)	155 (4.17)	7 (2.27)
Others	483 (11.99)	453 (12.18)	30 (9.71)
Unknown	503 (12.48)	462 (12.42)	41 (13.27)
Insurance, n (%)				χ² = 13.545	0.001
Medicaid	130 (3.23)	120 (3.23)	10 (3.24)
Medicare	1819 (45.15)	1649 (44.33)	170 (55.02)
Others	2080 (51.63)	1951 (52.45)	129 (41.75)
Marital status, n (%)				χ² = 5.110	0.024
Married	2686 (66.67)	2498 (67.15)	188 (60.84)
No married	1343 (33.33)	1222 (32.85)	121 (39.16)
First care unit, n (%)				χ² = 4.078	0.130
CCU	595 (14.77)	561 (15.08)	34 (11.00)
CVICU	3388 (84.09)	3117 (83.79)	271 (87.70)
Others	46 (1.14)	42 (1.13)	4 (1.29)
Family history of stroke, n (%)				-	0.653
No	4010 (99.53)	3703 (99.54)	307 (99.35)
Yes	19 (0.47)	17 (0.46)	2 (0.65)
Personal history of stroke, n (%)				χ² = 54.537	<0.001
No	3781 (93.84)	3521 (94.65)	260 (84.14)
Yes	248 (6.16)	199 (5.35)	49 (15.86)
Treatments, n (%)				χ² = 4.197	0.123
CABG and PCI	22 (0.55)	21 (0.56)	1 (0.32)
CABG alone	3316 (82.30)	3049 (81.96)	267 (86.41)
PCI alone	691 (17.15)	650 (17.47)	41 (13.27)
Thrombolysis, n (%)				-	1.000
No	4027 (99.95)	3718 (99.95)	309 (100.00)
Yes	2 (0.05)	2 (0.05)	0 (0.00)
Antiplatelet, n (%)				-	0.321
No	3970 (98.54)	3663 (98.47)	307 (99.35)
Yes	59 (1.46)	57 (1.53)	2 (0.65)
Beta blockers, n (%)				χ² = 18.061	<0.001
No	2933 (72.80)	2740 (73.66)	193 (62.46)
Yes	1096 (27.20)	980 (26.34)	116 (37.54)
Calcium channel blockers, n (%)				χ² = 22.544	<0.001
No	3616 (89.75)	3363 (90.40)	253 (81.88)
Yes	413 (10.25)	357 (9.60)	56 (18.12)
Ventilation, n (%)				χ² = 4.717	0.030
No	176 (4.37)	170 (4.57)	6 (1.94)
Yes	3853 (95.63)	3550 (95.43)	303 (98.06)
Vasopressors, n (%)				χ² = 8.987	0.003
No	1060 (26.31)	1001 (26.91)	59 (19.09)
Yes	2969 (73.69)	2719 (73.09)	250 (80.91)
GCS, score, Mean ± SD	13.36 ± 3.55	13.40 ± 3.52	12.89 ± 3.90	t = 2.21	0.028
CCI, score, M (Q₁, Q₃)	2.00 (1.00, 3.00)	2.00 (1.00, 3.00)	3.00 (2.00, 5.00)	Z = 14.468	<0.001
Weight, kg, Mean ± SD	84.60 ± 16.96	84.92 ± 16.98	80.81 ± 16.34	t = 4.10	<0.001
Heart rate, bpm, Mean ± SD	80.03 ± 9.40	80.07 ± 9.44	79.58 ± 8.94	t = 0.88	0.379
SBP, mmHg, Mean ± SD	113.64 ± 16.80	113.68 ± 16.76	113.17 ± 17.33	t = 0.51	0.610
DBP, mmHg, Mean ± SD	59.69 ± 10.76	59.92 ± 10.67	56.98 ± 11.39	t = 4.63	<0.001
Respiratory rate, bpm, Mean ± SD	15.21 ± 2.08	15.21 ± 2.09	15.20 ± 1.98	t = 0.11	0.916
Temperature, °C, Mean ± SD	36.34 ± 0.47	36.34 ± 0.47	36.31 ± 0.49	t = 1.40	0.162
SPO₂, %, Mean ± SD	98.79 ± 2.74	98.78 ± 2.73	98.88 ± 2.83	t = -0.59	0.556
WBC, K/uL, M (Q₁, Q₃)	12.20 (9.30, 15.60)	12.30 (9.30, 15.60)	11.70 (8.70, 15.10)	Z = -1.343	0.179
Platelet, K/uL, M (Q₁, Q₃)	152.20 (121.00, 192.00)	153.00 (121.00, 192.00)	144.00 (118.00, 186.00)	Z = -1.714	0.086
Hemoglobin, g/dL, Mean ± SD	10.27 ± 2.28	10.33 ± 2.28	9.64 ± 2.24	t = 5.13	<0.001
RDW, %, Mean ± SD	13.60 ± 1.00	13.57 ± 1.00	13.86 ± 0.95	t = -4.79	<0.001
Hematocrit, %, Mean ± SD	30.77 ± 6.72	30.94 ± 6.71	28.79 ± 6.58	t = 5.40	<0.001
Creatinine, mg/dL, M (Q₁, Q₃)	0.90 (0.70, 1.10)	0.90 (0.70, 1.10)	0.90 (0.70, 1.10)	Z = -0.588	0.557
INR, ratio, Mean ± SD	1.38 ± 0.19	1.37 ± 0.19	1.41 ± 0.21	t = -3.24	0.001
PT, seconds, Mean ± SD	15.14 ± 2.00	15.11 ± 1.99	15.51 ± 2.11	t = -3.38	<0.001
PTT, seconds, Mean ± SD	37.36 ± 22.21	37.45 ± 22.57	36.28 ± 17.30	t = 1.12	0.266
BUN, mg/dL, M (Q₁, Q₃)	16.00 (13.00, 21.00)	16.00 (13.00, 21.00)	17.00 (14.00, 23.00)	Z = 1.946	0.052
Glucose, mg/dL, Mean ± SD	140.04 ± 33.37	140.03 ± 33.27	140.20 ± 34.62	t = -0.09	0.932
Calcium, mmol/L, M (Q₁, Q₃)	1.20 (1.11, 1.35)	1.20 (1.12, 1.36)	1.18 (1.11, 1.33)	Z = -1.864	0.062
Sodium, mEq/L, Mean ± SD	135.80 ± 2.85	135.82 ± 2.84	135.58 ± 2.90	t = 1.43	0.153
Chloride, mEq, Mean ± SD	105.86 ± 3.63	105.84 ± 3.63	106.09 ± 3.71	t = -1.16	0.245
Bicarbonate, mEq/L, Mean ± SD	23.20 ± 2.17	23.22 ± 2.17	22.98 ± 2.17	t = 1.83	0.068
Lactate, mmol/L, M (Q₁, Q₃)	2.01 (1.60, 2.60)	2.01 (1.60, 2.60)	2.01 (1.50, 2.80)	Z = 0.490	0.624
LOS, days, M (Q₁, Q₃)	1.92 (1.26, 3.11)	1.83 (1.25, 3.07)	2.31 (1.36, 3.64)	Z = 6.621	<0.001

Open in a new tab

SD: standard deviation, M: Median, Q₁: 1st quartile, Q₃: 3st quartile, CCU: Cardiac Care Unit, CVICU: cardiovascular intensive care unit, CABG: coronary artery bypass grafting, PCI: percutaneous coronary intervention, GCS: Glasgow Coma Scale, CCI: Charlson Comorbidity Index, SBP: systolic blood pressure, DBP: diastolic blood pressure, SPO₂: oxygen saturation, WBC: white blood cell, RDW: red blood cell distribution width, INR: international normalized ratio (INR), PT prothrombin time, PTT partial thromboplastin time, BUN, blood urea nitrogen, LOS: length of stay

Construction and the predictive values of the prediction models for the risk of stroke in CAD patients receiving coronary revascularization

Initially, 40 features included, and 45 features were identified after one-hot encoding during the discretization of classification features. There were 39 variables with Pearson correlation coefficient<9. In order to ensure the stability and efficiency of features, valuable feature sets were selected from the 10-fold cross-validation results. As λ gradually expanded from 10⁻¹⁰ to 10¹⁰, the number of variables entering the model decreased. When λ was 0.002984, LASSO regression model showed the best prediction performance. Finally, 20 features with non-0 Coefficient were retained, which were age, GCS, CCI, weight, heart rate, DBP, SPO₂, platelet, creatinine, INR, PTT, BUN, glucose, calcium, Ethnicity-unknown, Ethnicity-White, personal history of stroke-yes, beta-blockers-yes, calcium channel blockers-yes, and vasopressors-yes. Fig 2 presented the changes of mean-squared error (MSE), and Fig 3 showed the changes of Coefficients with Lambda in the Lasso regression.

The predictive values of the prediction models for the risk of stroke in CAD patients undergoing coronary revascularization

The predictive values of prediction models for stroke in CAD patients undergoing coronary revascularization were presented in Table 5. The results delineated that Catboost model presented the best predictive performance with the AUC of 0.831 (95%CI: 0.811–0.851) in the training set, and 0.760 (95%CI: 0.722–0.798) in the testing set. The AUC of the logistic regression model was 0.789 (95%CI: 0.764–0.814) in the training set and 0.731 (95%CI: 0.686–0.776) in the testing set (Table 6). The results of Delong test revealed that the predictive value of the Catboost model was significantly higher than the logistic regression model (P<0.05). The ROC curves of machine learning models in the training set and testing set were respectively shown in Figs 4 and 5.

Table 5. The predictive values of different machine learning prediction models.

Model	Cut off	Sensitivity (95%CI)	Specificity (95%CI)	AUC (95%CI)	Accuracy (95%CI)
Logistic regression Training set	0.076	0.764 (0.716–0.811)	0.696 (0.681–0.711)	0.789 (0.764–0.814)	0.701 (0.687–0.715)
Logistic regression Testing set	0.076	0.669 (0.587–0.752)	0.671 (0.648–0.694)	0.731 (0.686–0.776)	0.671 (0.649–0.693)
SVM Training set	0.114	0.417 (0.362–0.472)	0.787 (0.774–0.800)	0.629 (0.595–0.662)	0.759 (0.746–0.772)
SVM Testing set	0.114	0.331 (0.248–0.413)	0.816 (0.797–0.835)	0.591 (0.536–0.646)	0.781 (0.762–0.801)
Random forest Training set	0.466	0.877 (0.840–0.914)	0.717 (0.703–0.732)	0.883 (0.866–0.900)	0.730 (0.716–0.743)
Random forest Testing set	0.466	0.621 (0.536–0.706)	0.734 (0.713–0.756)	0.749 (0.712–0.786)	0.726 (0.705–0.747)
XGBoost Training set	0.339	0.838 (0.797–0.879)	0.622 (0.606–0.638)	0.814 (0.792–0.837)	0.639 (0.624–0.653)
XGBoost Testing set	0.339	0.782 (0.710–0.855)	0.577 (0.553–0.601)	0.744 (0.703–0.784)	0.591 (0.568–0.615)
Adaboost Training set	0.148	0.699 (0.648–0.750)	0.671 (0.656–0.686)	0.720 (0.696–0.743)	0.673 (0.659–0.688)
Adaboost Testing set	0.148	0.677 (0.595–0.760)	0.673 (0.650–0.696)	0.718 (0.681–0.755)	0.674 (0.652–0.696)
Naïve bayes Training set	0.071	0.599 (0.544–0.653)	0.708 (0.694–0.723)	0.708 (0.679–0.737)	0.700 (0.686–0.714)
Naïve bayes Testing set	0.071	0.524 (0.436–0.612)	0.646 (0.622–0.669)	0.629 (0.582–0.676)	0.637 (0.614–0.660)
KNN Training set	0.154	0.612 (0.557–0.666)	0.798 (0.785–0.811)	0.785 (0.761–0.809)	0.784 (0.771–0.797)
KNN Testing set	0.154	0.226 (0.152–0.299)	0.906 (0.892–0.920)	0.625 (0.573–0.677)	0.857 (0.841–0.874)
Catboost Training set	0.173	0.838 (0.797–0.879)	0.662 (0.646–0.677)	0.831 (0.811–0.851)	0.675 (0.661–0.690)
Catboost Testing set	0.173	0.718 (0.639–0.797)	0.660 (0.637–0.683)	0.760 (0.722–0.798)	0.664 (0.642–0.687)

Open in a new tab

Table 6. Comparisons of the prediction performance of Catboost model and logistic regression model.

Dataset	Model	AUC (95%CI)	Statistics	P
Training set	Logistic regression	0.789 (0.764–0.814)	Ref
Training set	Catboost	0.831 (0.811–0.851)	Z = 6.36	<0.001
Testing set	Logistic regression	0.731 (0.686–0.776)	Ref
Testing set	Catboost	0.760 (0.722–0.798)	Z = 3.12	0.002

Open in a new tab

The importance of each feature in the Catboost model was displayed in Fig 6, which depicted that CCI was the most important variable associated with the risk of stroke in CAD patients undergoing coronary revascularization.

The SHapley Additive exPlanations (SHAP) values of features in the Catboost model were visualized in Fig 7, with SHAP values on the X-axis, features on the Y-axis, and each point representing a sample. The redder color indicates a stronger effect of the feature on the outcome, while the bluer color indicates a weaker effect. CCI was an important factor that exhibited a positive correlation with the risk of stroke in CAD patients following coronary revascularization. Creatinine levels were found to be associated with the risk of stroke, as indicated by blue dots primarily concentrated in areas where SHAP values exceeded 0, suggesting that lower creatinine levels were linked to higher stroke risk. Fig 8 depicted the SHAP value analysis of each sample in the Catboost model, where blue represents negative feature contribution and red indicates positive contribution. The length of an arrow signifies the degree of influence that a feature has on output, and its reduction or increase can be observed through the scale value on the X-axis. Base value refers to the average output of the model and training data, while the number below the arrow represents the actual eigenvalue of a single sample. Referring to a single sample, we found that CCI had the reddest summary plots, indicating that patients with a CCI score of 6 might be at risk for postoperative stroke. Age exhibited the longest blue bar, suggesting that patients aged 53 might experience reduced risk of postoperative stroke. The final SHAP output value of the model was -1.58. Fig 9 illustrated the impact of each feature in the Catboost prediction model on the model’s output. The x-axis represents features, while the y-axis represents SHAP values, and each dot denotes a sample. When a feature’s SHAP value exceeded 0, the risk of postoperative stroke was increased. CCI≥2 was associated with an increased risk of stroke.

Discussion

The present study constructed several prediction models for the risk of stroke in CAD patients who received coronary revascularization based on machine learning methods. The results demonstrated that Catboost model was the optimal model for predicting the risk of stroke in CAD patients who received coronary revascularization. The AUC of Catboost model was 0.831 in the training set, and 0.760 in the testing set, which were higher than the logistic regression model. The findings might provide a novel and quick tool to identify CAD patients receiving coronary revascularization treatments who were at high risk of stroke, and offer timely interventions to prevent the poor prognosis.

Previously, several prediction models were constructed to predict the risk of cardiovascular events in CAD patients receiving coronary revascularization. Zhang et al. built a nomogram for predicting major adverse cardiovascular events after PCI in coronary heart disease patients with chronic kidney disease, and the AUC value of the model was 0.612 [22]. Another prediction model for predicting major adverse cardiac and cerebrovascular events among high-risk myocardial infarction patients undergoing primary PCI had an AUC of 0.883 in the testing set [23]. A very early prediction model for stroke patients undergoing CABG had an AUC of 0.70 [24]. A multicenter Spanish study established a multivariate prediction model for perioperative in-hospital cerebrovascular accident after coronary bypass surgery, and the AUC was 0.77 [25]. The present study constructed several prediction models using machine learning method, and the Catboost model had the optimum predictive value for the risk of stroke in CAD patients who underwent coronary revascularization. The prediction model can handle irregular data, missing values and other problems well, and has good robustness, and effectively prevent overfitting, which also makes the model more general; in addition, it can match any advanced machine learning algorithm in terms of model performance [20]. The prediction model might provide an easy tool for the clinicians to quickly identify CAD patients undergoing coronary revascularization who were at high risk of stroke. The success of deep learning and machine learning has brought excitement and high expectations in revolutionary changes in health care in CAD patients [26–28]. The deep learning and machine learning algorithm could achieve more accurate results and outperform statistical methods. The findings of this study might be interesting for other researchers from different fields.

CCI is a measure of comorbidity burden that facilitates the evaluation of the prognostic significance of various clinical conditions based on their quantity and individual prognostic impact [29]. CCI has been extensively investigated in various clinical conditions and its significance as a prognostic indicator has been demonstrated. A previous study depicted that CCI was higher in patients with a more diffuse extent of CAD than those with milder disease [30]. Rashid et al. indicated that CCI >2 significantly increased the risk of mortality in acute coronary syndrome [31]. CCI was also identified to be a predictor of readmission in CAD patients [32]. CCI was reported to be highly associated with long-term survival and almost equivalent to left ventricular ejection fraction [33]. CCI was independently associated with an increase in 30-day, 1-year and 5-year cardiac death and major adverse cardiovascular events [34]. Other studies also revealed that CCI was a reliable indicator for the mortality of ischemic stroke patients [35–37]. Herein, CCI was found to be a vital predictor for the risk of stroke in CAD patients who received coronary revascularization. Age was identified to be an important variable associated with the outcomes of post-ST-segment elevation myocardial infarction patients among patients without preexisting coronary artery disease [38]. In this study, age was also related to the risk of stroke in CAD patients who received coronary revascularization.

Several prediction model for stroke risk in CAD patients with coronary revascularization treatments was constructed based on a variety of machine learning methods, which might provide certain references for early identification of high-risk patients and management of postoperative complications. Some limitations existed in this study. Firstly, although all the data were divided into the training set and testing set, the samples were from a single center, and the model should be applied with caution. Secondly, due to the limitations of the MIMIC database, some preoperative and postoperative data, and data related to liver enzymes could not be obtained. More studies were required to verify the results in this study.

Conclusions

The current study established several prediction models and identified an optimal model for the risk of stroke in CAD patients who received coronary revascularization. The prediction model might offer quick tool to identify CAD patients receiving coronary revascularization who were at high risk of stroke, and make specific treatment strategies to prevent the occurrence of stroke.

Data Availability

The datasets generated and/or analyzed during the current study are available in the MIMIC-IV database, https://mimic.physionet.org/iv/.

Funding Statement

The author(s) received funding from Dalian Medical Science research project (No. 2112012) for this work.

References

1.Amin AM. Metabolomics applications in coronary artery disease personalized medicine. Advances in clinical chemistry. 2021;102:233–70. Epub 2021/05/29. doi: 10.1016/bs.acc.2020.08.003 . [DOI] [PubMed] [Google Scholar]
2.Malakar AK, Choudhury D, Halder B, Paul P, Uddin A, Chakraborty S. A review on coronary artery disease, its risk factors, and therapeutics. Journal of cellular physiology. 2019;234(10):16812–23. Epub 2019/02/23. doi: 10.1002/jcp.28350 . [DOI] [PubMed] [Google Scholar]
3.King SB 3rd, Marshall JJ, Tummala PE. Revascularization for coronary artery disease: stents versus bypass surgery. Annual review of medicine. 2010;61:199–213. Epub 2009/10/15. doi: 10.1146/annurev.med.032309.063039 . [DOI] [PubMed] [Google Scholar]
4.Bangalore S, Kumar S, Fusaro M, Amoroso N, Attubato MJ, Feit F, et al. Short- and long-term outcomes with drug-eluting and bare-metal coronary stents: a mixed-treatment comparison analysis of 117 762 patient-years of follow-up from randomized trials. Circulation. 2012;125(23):2873–91. Epub 2012/05/16. doi: 10.1161/CIRCULATIONAHA.112.097014 . [DOI] [PubMed] [Google Scholar]
5.Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet (London, England). 2016;388(10053):1459–544. Epub 2016/10/14. doi: 10.1016/S0140-6736(16)31012-1 . [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Arnan MK, Hsieh TC, Yeboah J, Bertoni AG, Burke GL, Bahrainwala Z, et al. Postoperative blood urea nitrogen is associated with stroke in cardiac surgical patients. The Annals of thoracic surgery. 2015;99(4):1314–20. Epub 2015/02/17. doi: 10.1016/j.athoracsur.2014.11.034 . [DOI] [PubMed] [Google Scholar]
7.Yamamoto K, Natsuaki M, Morimoto T, Shiomi H, Matsumura-Nakano Y, Nakatsuma K, et al. Periprocedural Stroke After Coronary Revascularization (from the CREDO-Kyoto PCI/CABG Registry Cohort-3). The American journal of cardiology. 2021;142:35–43. Epub 2020/12/07. doi: 10.1016/j.amjcard.2020.11.031 . [DOI] [PubMed] [Google Scholar]
8.Head SJ, Milojevic M, Daemen J, Ahn JM, Boersma E, Christiansen EH, et al. Stroke Rates Following Surgical Versus Percutaneous Coronary Revascularization. Journal of the American College of Cardiology. 2018;72(4):386–98. Epub 2018/07/22. doi: 10.1016/j.jacc.2018.04.071 . [DOI] [PubMed] [Google Scholar]
9.Silva GFS, Fagundes TP, Teixeira BC, Chiavegatto Filho ADP. Machine Learning for Hypertension Prediction: a Systematic Review. Current hypertension reports. 2022;24(11):523–33. Epub 2022/06/23. doi: 10.1007/s11906-022-01212-6 . [DOI] [PubMed] [Google Scholar]
10.Tseng PY, Chen YT, Wang CH, Chiu KM, Peng YS, Hsu SP, et al. Prediction of the development of acute kidney injury following cardiac surgery by machine learning. Critical care (London, England). 2020;24(1):478. Epub 2020/08/02. doi: 10.1186/s13054-020-03179-9 . [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Nistal-Nuño B. Machine learning applied to a Cardiac Surgery Recovery Unit and to a Coronary Care Unit for mortality prediction. Journal of clinical monitoring and computing. 2022;36(3):751–63. Epub 2021/04/17. doi: 10.1007/s10877-021-00703-2 . [DOI] [PubMed] [Google Scholar]
12.Zhang X, Fei N, Zhang X, Wang Q, Fang Z. Machine Learning Prediction Models for Postoperative Stroke in Elderly Patients: Analyses of the MIMIC Database. Frontiers in aging neuroscience. 2022;14:897611. Epub 2022/08/05. doi: 10.3389/fnagi.2022.897611 . [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Liu Q, Zheng HL, Wu MM, Wang QZ, Yan SJ, Wang M, et al. Association between lactate-to-albumin ratio and 28-days all-cause mortality in patients with acute pancreatitis: A retrospective analysis of the MIMIC-IV database. Frontiers in immunology. 2022;13:1076121. Epub 2023/01/03. doi: 10.3389/fimmu.2022.1076121 . [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Ding C, Guo Y, Mo Q, Ma J. Prediction Model of Postoperative Severe Hypocalcemia in Patients with Secondary Hyperparathyroidism Based on Logistic Regression and XGBoost Algorithm. Computational and mathematical methods in medicine. 2022;2022:8752826. Epub 2022/08/05. doi: 10.1155/2022/8752826 . [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Zhao X, Lu Y, Li S, Guo F, Xue H, Jiang L, et al. Predicting renal function recovery and short-term reversibility among acute kidney injury patients in the ICU: comparison of machine learning methods and conventional regression. Renal failure. 2022;44(1):1326–37. Epub 2022/08/06. doi: 10.1080/0886022X.2022.2107542 . [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Cabrera A, Bouterse A, Nelson M, Razzouk J, Ramos O, Chung D, et al. Use of random forest machine learning algorithm to predict short term outcomes following posterior cervical decompression with instrumented fusion. Journal of clinical neuroscience: official journal of the Neurosurgical Society of Australasia. 2023;107:167–71. Epub 2022/11/15. doi: 10.1016/j.jocn.2022.10.029 . [DOI] [PubMed] [Google Scholar]
17.Hauptman A, Balasubramaniam GM, Arnon S. Machine Learning Diffuse Optical Tomography Using Extreme Gradient Boosting and Genetic Programming. Bioengineering (Basel, Switzerland). 2023;10(3). Epub 2023/03/30. doi: 10.3390/bioengineering10030382 . [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Sorayaie Azar A, Babaei Rikan S, Naemi A, Bagherzadeh Mohasefi J, Pirnejad H, Bagherzadeh Mohasefi M, et al. Application of machine learning techniques for predicting survival in ovarian cancer. BMC medical informatics and decision making. 2022;22(1):345. Epub 2022/12/31. doi: 10.1186/s12911-022-02087-y . [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Saroj RK, Yadav PK, Singh R, Chilyabanyama ON. Machine Learning Algorithms for understanding the determinants of under-five Mortality. BioData mining. 2022;15(1):20. Epub 2022/09/25. doi: 10.1186/s13040-022-00308-8 . [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Qin Y, Wu J, Xiao W, Wang K, Huang A, Liu B, et al. Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type. International journal of environmental research and public health. 2022;19(22). Epub 2022/11/27. doi: 10.3390/ijerph192215027 . [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Chen Q, Pan T, Wang YN, Schoepf UJ, Bidwell SL, Qiao H, et al. A Coronary CT Angiography Radiomics Model to Identify Vulnerable Plaque and Predict Cardiovascular Events. Radiology. 2023;307(2):e221693. Epub 2023/02/15. doi: 10.1148/radiol.221693 . [DOI] [PubMed] [Google Scholar]
22.Zhang Y, Wang J, Zhai G, Zhou Y. Development and Validation of a Predictive Model for Chronic Kidney Disease After Percutaneous Coronary Intervention in Chinese. Clinical and applied thrombosis/hemostasis: official journal of the International Academy of Clinical and Applied Thrombosis/Hemostasis. 2022;28:10760296211069998. Epub 2022/01/25. doi: 10.1177/10760296211069998 . [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Zhao X, Liu C, Zhou P, Sheng Z, Li J, Zhou J, et al. Development and Validation of a Prediction Rule for Major Adverse Cardiac and Cerebrovascular Events in High-Risk Myocardial Infarction Patients After Primary Percutaneous Coronary Intervention. Clinical interventions in aging. 2022;17:1099–111. Epub 2022/07/27. doi: 10.2147/CIA.S358761 . [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Charlesworth DC, Likosky DS, Marrin CA, Maloney CT, Quinton HB, Morton JR, et al. Development and validation of a prediction model for strokes after coronary artery bypass grafting. The Annals of thoracic surgery. 2003;76(2):436–43. Epub 2003/08/07. doi: 10.1016/s0003-4975(03)00528-9 . [DOI] [PubMed] [Google Scholar]
25.Hornero F, Martín E, Rodríguez R, Castellà M, Porras C, Romero B, et al. A multicentre Spanish study for multivariate prediction of perioperative in-hospital cerebrovascular accident after coronary bypass surgery: the PACK2 score. Interactive cardiovascular and thoracic surgery. 2013;17(2):353–8; discussion 8. Epub 2013/05/01. doi: 10.1093/icvts/ivt102 . [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Chan HP, Samala RK, Hadjiiski LM, Zhou C. Deep Learning in Medical Image Analysis. Advances in experimental medicine and biology. 2020;1213:3–21. Epub 2020/02/08. doi: 10.1007/978-3-030-33128-3_1 . [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Huang AA, Huang SY. Use of machine learning to identify risk factors for coronary artery disease. PloS one. 2023;18(4):e0284103. Epub 2023/04/15. doi: 10.1371/journal.pone.0284103 . [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Jee J. Machine learning-based markers for CAD. Lancet (London, England). 2023;402(10397):183. Epub 2023/07/16. doi: 10.1016/S0140-6736(23)01062-0 . [DOI] [PubMed] [Google Scholar]
29.Niewiński G, Graczyńska A, Morawiec S, Raszeja-Wyszomirska J, Wójcicki M, Zieniewicz K, et al. Renaissance of Modified Charlson Comorbidity Index in Prediction of Short- and Long-Term Survival After Liver Transplantation? Medical science monitor: international medical journal of experimental and clinical research. 2019;25:4521–6. Epub 2019/06/19. doi: 10.12659/MSM.914669 . [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Karabağ T, Altuntaş E, Kalaycı B, Şahіn B, Somuncu MU, Çakır MO. The relationship of Charlson comorbidity index with stent restenosis and extent of coronary artery disease. Interventional medicine & applied science. 2018;10(2):70–5. Epub 2018/10/27. doi: 10.1556/1646.10.2018.20 . [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Rashid M, Kwok CS, Gale CP, Doherty P, Olier I, Sperrin M, et al. Impact of co-morbid burden on mortality in patients with coronary heart disease, heart failure, and cerebrovascular accident: a systematic review and meta-analysis. European heart journal Quality of care & clinical outcomes. 2017;3(1):20–36. Epub 2017/09/21. doi: 10.1093/ehjqcco/qcw025 . [DOI] [PubMed] [Google Scholar]
32.Albar HM, Alahmdi RA, Almedimigh AA, Shaik RA, Ahmad MS, Almutairi AB, et al. Prevalence of coronary artery disease and its risk factors in Majmaah City, Kingdom of Saudi Arabia. Frontiers in cardiovascular medicine. 2022;9:943611. Epub 2022/09/27. doi: 10.3389/fcvm.2022.943611 . [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
33.Sachdev M, Sun JL, Tsiatis AA, Nelson CL, Mark DB, Jollis JG. The prognostic importance of comorbidity for mortality in patients with stable coronary artery disease. Journal of the American College of Cardiology. 2004;43(4):576–82. Epub 2004/02/21. doi: 10.1016/j.jacc.2003.10.031 . [DOI] [PubMed] [Google Scholar]
34.Mamas MA, Fath-Ordoubadi F, Danzi GB, Spaepen E, Kwok CS, Buchan I, et al. Prevalence and Impact of Co-morbidity Burden as Defined by the Charlson Co-morbidity Index on 30-Day and 1- and 5-Year Outcomes After Coronary Stent Implantation (from the Nobori-2 Study). The American journal of cardiology. 2015;116(3):364–71. Epub 2015/06/04. doi: 10.1016/j.amjcard.2015.04.047 . [DOI] [PubMed] [Google Scholar]
35.Hall RE, Porter J, Quan H, Reeves MJ. Developing an adapted Charlson comorbidity index for ischemic stroke outcome studies. BMC health services research. 2019;19(1):930. Epub 2019/12/05. doi: 10.1186/s12913-019-4720-y . [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Bajic B, Galic I, Mihailovic N, Ristic S, Radevic S, Cupic VI, et al. Performance of Charlson and Elixhauser Comorbidity Index to Predict in-Hospital Mortality in Patients with Stroke in Sumadija and Western Serbia. Iranian journal of public health. 2021;50(5):970–7. Epub 2021/06/30. doi: 10.18502/ijph.v50i5.6114 . [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Falsetti L, Viticchi G, Tarquinio N, Silvestrini M, Capeci W, Catozzo V, et al. Charlson comorbidity index as a predictor of in-hospital death in acute ischemic stroke among very old patients: a single-cohort perspective study. Neurological sciences: official journal of the Italian Neurological Society and of the Italian Society of Clinical Neurophysiology. 2016;37(9):1443–8. Epub 2016/05/12. doi: 10.1007/s10072-016-2602-1 . [DOI] [PubMed] [Google Scholar]
38.Jang SJ, Kim LK, Sobti NK, Yeo I, Cheung JW, Feldman DN, et al. Mortality of patients with ST-segment-elevation myocardial infarction without standard modifiable risk factors among patients without known coronary artery disease: Age-stratified and sex-related analysis from nationwide readmissions database 2010–2014. American journal of preventive cardiology. 2023;14:100474. Epub 2023/03/17. doi: 10.1016/j.ajpc.2023.100474 . [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The datasets generated and/or analyzed during the current study are available in the MIMIC-IV database, https://mimic.physionet.org/iv/.

[pone.0296402.ref001] 1.Amin AM. Metabolomics applications in coronary artery disease personalized medicine. Advances in clinical chemistry. 2021;102:233–70. Epub 2021/05/29. doi: 10.1016/bs.acc.2020.08.003 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref002] 2.Malakar AK, Choudhury D, Halder B, Paul P, Uddin A, Chakraborty S. A review on coronary artery disease, its risk factors, and therapeutics. Journal of cellular physiology. 2019;234(10):16812–23. Epub 2019/02/23. doi: 10.1002/jcp.28350 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref003] 3.King SB 3rd, Marshall JJ, Tummala PE. Revascularization for coronary artery disease: stents versus bypass surgery. Annual review of medicine. 2010;61:199–213. Epub 2009/10/15. doi: 10.1146/annurev.med.032309.063039 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref004] 4.Bangalore S, Kumar S, Fusaro M, Amoroso N, Attubato MJ, Feit F, et al. Short- and long-term outcomes with drug-eluting and bare-metal coronary stents: a mixed-treatment comparison analysis of 117 762 patient-years of follow-up from randomized trials. Circulation. 2012;125(23):2873–91. Epub 2012/05/16. doi: 10.1161/CIRCULATIONAHA.112.097014 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref005] 5.Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet (London, England). 2016;388(10053):1459–544. Epub 2016/10/14. doi: 10.1016/S0140-6736(16)31012-1 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref006] 6.Arnan MK, Hsieh TC, Yeboah J, Bertoni AG, Burke GL, Bahrainwala Z, et al. Postoperative blood urea nitrogen is associated with stroke in cardiac surgical patients. The Annals of thoracic surgery. 2015;99(4):1314–20. Epub 2015/02/17. doi: 10.1016/j.athoracsur.2014.11.034 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref007] 7.Yamamoto K, Natsuaki M, Morimoto T, Shiomi H, Matsumura-Nakano Y, Nakatsuma K, et al. Periprocedural Stroke After Coronary Revascularization (from the CREDO-Kyoto PCI/CABG Registry Cohort-3). The American journal of cardiology. 2021;142:35–43. Epub 2020/12/07. doi: 10.1016/j.amjcard.2020.11.031 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref008] 8.Head SJ, Milojevic M, Daemen J, Ahn JM, Boersma E, Christiansen EH, et al. Stroke Rates Following Surgical Versus Percutaneous Coronary Revascularization. Journal of the American College of Cardiology. 2018;72(4):386–98. Epub 2018/07/22. doi: 10.1016/j.jacc.2018.04.071 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref009] 9.Silva GFS, Fagundes TP, Teixeira BC, Chiavegatto Filho ADP. Machine Learning for Hypertension Prediction: a Systematic Review. Current hypertension reports. 2022;24(11):523–33. Epub 2022/06/23. doi: 10.1007/s11906-022-01212-6 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref010] 10.Tseng PY, Chen YT, Wang CH, Chiu KM, Peng YS, Hsu SP, et al. Prediction of the development of acute kidney injury following cardiac surgery by machine learning. Critical care (London, England). 2020;24(1):478. Epub 2020/08/02. doi: 10.1186/s13054-020-03179-9 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref011] 11.Nistal-Nuño B. Machine learning applied to a Cardiac Surgery Recovery Unit and to a Coronary Care Unit for mortality prediction. Journal of clinical monitoring and computing. 2022;36(3):751–63. Epub 2021/04/17. doi: 10.1007/s10877-021-00703-2 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref012] 12.Zhang X, Fei N, Zhang X, Wang Q, Fang Z. Machine Learning Prediction Models for Postoperative Stroke in Elderly Patients: Analyses of the MIMIC Database. Frontiers in aging neuroscience. 2022;14:897611. Epub 2022/08/05. doi: 10.3389/fnagi.2022.897611 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref013] 13.Liu Q, Zheng HL, Wu MM, Wang QZ, Yan SJ, Wang M, et al. Association between lactate-to-albumin ratio and 28-days all-cause mortality in patients with acute pancreatitis: A retrospective analysis of the MIMIC-IV database. Frontiers in immunology. 2022;13:1076121. Epub 2023/01/03. doi: 10.3389/fimmu.2022.1076121 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref014] 14.Ding C, Guo Y, Mo Q, Ma J. Prediction Model of Postoperative Severe Hypocalcemia in Patients with Secondary Hyperparathyroidism Based on Logistic Regression and XGBoost Algorithm. Computational and mathematical methods in medicine. 2022;2022:8752826. Epub 2022/08/05. doi: 10.1155/2022/8752826 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref015] 15.Zhao X, Lu Y, Li S, Guo F, Xue H, Jiang L, et al. Predicting renal function recovery and short-term reversibility among acute kidney injury patients in the ICU: comparison of machine learning methods and conventional regression. Renal failure. 2022;44(1):1326–37. Epub 2022/08/06. doi: 10.1080/0886022X.2022.2107542 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref016] 16.Cabrera A, Bouterse A, Nelson M, Razzouk J, Ramos O, Chung D, et al. Use of random forest machine learning algorithm to predict short term outcomes following posterior cervical decompression with instrumented fusion. Journal of clinical neuroscience: official journal of the Neurosurgical Society of Australasia. 2023;107:167–71. Epub 2022/11/15. doi: 10.1016/j.jocn.2022.10.029 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref017] 17.Hauptman A, Balasubramaniam GM, Arnon S. Machine Learning Diffuse Optical Tomography Using Extreme Gradient Boosting and Genetic Programming. Bioengineering (Basel, Switzerland). 2023;10(3). Epub 2023/03/30. doi: 10.3390/bioengineering10030382 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref018] 18.Sorayaie Azar A, Babaei Rikan S, Naemi A, Bagherzadeh Mohasefi J, Pirnejad H, Bagherzadeh Mohasefi M, et al. Application of machine learning techniques for predicting survival in ovarian cancer. BMC medical informatics and decision making. 2022;22(1):345. Epub 2022/12/31. doi: 10.1186/s12911-022-02087-y . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref019] 19.Saroj RK, Yadav PK, Singh R, Chilyabanyama ON. Machine Learning Algorithms for understanding the determinants of under-five Mortality. BioData mining. 2022;15(1):20. Epub 2022/09/25. doi: 10.1186/s13040-022-00308-8 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref020] 20.Qin Y, Wu J, Xiao W, Wang K, Huang A, Liu B, et al. Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type. International journal of environmental research and public health. 2022;19(22). Epub 2022/11/27. doi: 10.3390/ijerph192215027 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref021] 21.Chen Q, Pan T, Wang YN, Schoepf UJ, Bidwell SL, Qiao H, et al. A Coronary CT Angiography Radiomics Model to Identify Vulnerable Plaque and Predict Cardiovascular Events. Radiology. 2023;307(2):e221693. Epub 2023/02/15. doi: 10.1148/radiol.221693 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref022] 22.Zhang Y, Wang J, Zhai G, Zhou Y. Development and Validation of a Predictive Model for Chronic Kidney Disease After Percutaneous Coronary Intervention in Chinese. Clinical and applied thrombosis/hemostasis: official journal of the International Academy of Clinical and Applied Thrombosis/Hemostasis. 2022;28:10760296211069998. Epub 2022/01/25. doi: 10.1177/10760296211069998 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref023] 23.Zhao X, Liu C, Zhou P, Sheng Z, Li J, Zhou J, et al. Development and Validation of a Prediction Rule for Major Adverse Cardiac and Cerebrovascular Events in High-Risk Myocardial Infarction Patients After Primary Percutaneous Coronary Intervention. Clinical interventions in aging. 2022;17:1099–111. Epub 2022/07/27. doi: 10.2147/CIA.S358761 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref024] 24.Charlesworth DC, Likosky DS, Marrin CA, Maloney CT, Quinton HB, Morton JR, et al. Development and validation of a prediction model for strokes after coronary artery bypass grafting. The Annals of thoracic surgery. 2003;76(2):436–43. Epub 2003/08/07. doi: 10.1016/s0003-4975(03)00528-9 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref025] 25.Hornero F, Martín E, Rodríguez R, Castellà M, Porras C, Romero B, et al. A multicentre Spanish study for multivariate prediction of perioperative in-hospital cerebrovascular accident after coronary bypass surgery: the PACK2 score. Interactive cardiovascular and thoracic surgery. 2013;17(2):353–8; discussion 8. Epub 2013/05/01. doi: 10.1093/icvts/ivt102 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref026] 26.Chan HP, Samala RK, Hadjiiski LM, Zhou C. Deep Learning in Medical Image Analysis. Advances in experimental medicine and biology. 2020;1213:3–21. Epub 2020/02/08. doi: 10.1007/978-3-030-33128-3_1 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref027] 27.Huang AA, Huang SY. Use of machine learning to identify risk factors for coronary artery disease. PloS one. 2023;18(4):e0284103. Epub 2023/04/15. doi: 10.1371/journal.pone.0284103 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref028] 28.Jee J. Machine learning-based markers for CAD. Lancet (London, England). 2023;402(10397):183. Epub 2023/07/16. doi: 10.1016/S0140-6736(23)01062-0 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref029] 29.Niewiński G, Graczyńska A, Morawiec S, Raszeja-Wyszomirska J, Wójcicki M, Zieniewicz K, et al. Renaissance of Modified Charlson Comorbidity Index in Prediction of Short- and Long-Term Survival After Liver Transplantation? Medical science monitor: international medical journal of experimental and clinical research. 2019;25:4521–6. Epub 2019/06/19. doi: 10.12659/MSM.914669 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref030] 30.Karabağ T, Altuntaş E, Kalaycı B, Şahіn B, Somuncu MU, Çakır MO. The relationship of Charlson comorbidity index with stent restenosis and extent of coronary artery disease. Interventional medicine & applied science. 2018;10(2):70–5. Epub 2018/10/27. doi: 10.1556/1646.10.2018.20 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref031] 31.Rashid M, Kwok CS, Gale CP, Doherty P, Olier I, Sperrin M, et al. Impact of co-morbid burden on mortality in patients with coronary heart disease, heart failure, and cerebrovascular accident: a systematic review and meta-analysis. European heart journal Quality of care & clinical outcomes. 2017;3(1):20–36. Epub 2017/09/21. doi: 10.1093/ehjqcco/qcw025 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref032] 32.Albar HM, Alahmdi RA, Almedimigh AA, Shaik RA, Ahmad MS, Almutairi AB, et al. Prevalence of coronary artery disease and its risk factors in Majmaah City, Kingdom of Saudi Arabia. Frontiers in cardiovascular medicine. 2022;9:943611. Epub 2022/09/27. doi: 10.3389/fcvm.2022.943611 . [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]

[pone.0296402.ref033] 33.Sachdev M, Sun JL, Tsiatis AA, Nelson CL, Mark DB, Jollis JG. The prognostic importance of comorbidity for mortality in patients with stable coronary artery disease. Journal of the American College of Cardiology. 2004;43(4):576–82. Epub 2004/02/21. doi: 10.1016/j.jacc.2003.10.031 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref034] 34.Mamas MA, Fath-Ordoubadi F, Danzi GB, Spaepen E, Kwok CS, Buchan I, et al. Prevalence and Impact of Co-morbidity Burden as Defined by the Charlson Co-morbidity Index on 30-Day and 1- and 5-Year Outcomes After Coronary Stent Implantation (from the Nobori-2 Study). The American journal of cardiology. 2015;116(3):364–71. Epub 2015/06/04. doi: 10.1016/j.amjcard.2015.04.047 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref035] 35.Hall RE, Porter J, Quan H, Reeves MJ. Developing an adapted Charlson comorbidity index for ischemic stroke outcome studies. BMC health services research. 2019;19(1):930. Epub 2019/12/05. doi: 10.1186/s12913-019-4720-y . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref036] 36.Bajic B, Galic I, Mihailovic N, Ristic S, Radevic S, Cupic VI, et al. Performance of Charlson and Elixhauser Comorbidity Index to Predict in-Hospital Mortality in Patients with Stroke in Sumadija and Western Serbia. Iranian journal of public health. 2021;50(5):970–7. Epub 2021/06/30. doi: 10.18502/ijph.v50i5.6114 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0296402.ref037] 37.Falsetti L, Viticchi G, Tarquinio N, Silvestrini M, Capeci W, Catozzo V, et al. Charlson comorbidity index as a predictor of in-hospital death in acute ischemic stroke among very old patients: a single-cohort perspective study. Neurological sciences: official journal of the Italian Neurological Society and of the Italian Society of Clinical Neurophysiology. 2016;37(9):1443–8. Epub 2016/05/12. doi: 10.1007/s10072-016-2602-1 . [DOI] [PubMed] [Google Scholar]

[pone.0296402.ref038] 38.Jang SJ, Kim LK, Sobti NK, Yeo I, Cheung JW, Feldman DN, et al. Mortality of patients with ST-segment-elevation myocardial infarction without standard modifiable risk factors among patients without known coronary artery disease: Age-stratified and sex-related analysis from nationwide readmissions database 2010–2014. American journal of preventive cardiology. 2023;14:100474. Epub 2023/03/17. doi: 10.1016/j.ajpc.2023.100474 . [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Machine learning-based models for prediction of the risk of stroke in coronary artery disease patients receiving coronary revascularization

Lulu Lin

Li Ding

Zhongguo Fu

Lijiao Zhang

Roles

Abstract

Background

Methods

Results

Conclusion

Introduction

Methods

Study design and population

Potential predictors

Construction of the prediction models

Outcome variable

Statistical analysis

Table 1. The number and percentage of missing values.

Table 2. Sensitivity analysis of data before and after manipulation.

Table 3. Parameter settings for 8 machine learning models.

Results

Comparisons of the characteristics of participants with and without postoperative stroke in the training set

Fig 1. The screen process of the participants.

Table 4. Comparisons of the characteristics of participants with and without postoperative stroke in the training set.

Construction and the predictive values of the prediction models for the risk of stroke in CAD patients receiving coronary revascularization

Fig 2. The changes of MSE with Lambda in the Lasso regression.

Fig 3. The changes of Coefficients with Lambda in the Lasso regression.

The predictive values of the prediction models for the risk of stroke in CAD patients undergoing coronary revascularization

Table 5. The predictive values of different machine learning prediction models.

Table 6. Comparisons of the prediction performance of Catboost model and logistic regression model.

Fig 4. The ROC curves of machine learning models in the training set.

Fig 5. The ROC curves of machine learning models in the testing set.

Fig 6. Absolut summary plot showing the importance of each feature in the Catboost model.

Fig 7. Summary plot for SHAP values of features in the Catboost model.

Fig 8. Force plot for SHAP values of features in the Catboost model.

Fig 9. Dependence plot for SHAP values of CCI.

Discussion

Conclusions

Data Availability

Funding Statement

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases