Abstract
Background
Despite the established link between metabolic syndrome (MetS) and stroke incidence, the effects of dynamic and cumulative MetS scores on stroke risk among middle‐aged and older populations in China remain inadequately explored. Furthermore, it is unclear whether MetS scores could serve as a more robust predictor of new‐onset stroke.
Methods
Using data from 4281 participants aged 45 years and older from Waves 1 and 3 of the CHARLS (China Health and Retirement Longitudinal Study), time‐varying MetS scores were classified via K‐means clustering into 4 distinct subgroups spanning 2012 to 2015. Associations between MetS scores and incident stroke were evaluated employing logistic regression, and machine learning predicted new‐onset stroke risk based on MetS score and other covariates.
Results
Elevated baseline and cumulative MetS scores were independently associated with an increased risk of stroke. Participants categorized within Class 3 (persistent moderate‐to‐high MetS levels) and Class 4 (highly fluctuating elevated MetS levels) exhibited significantly higher stroke risk relative to Class 1 (stable low MetS levels). The gradient boosting machine model achieved superior predictive accuracy, reflected by an area under the curve of 0.76 (95% CI, 0.72–0.79). Shapley additive explanations identified age, MetS score, and body mass index as the most influential predictors.
Conclusions
Fluctuations in MetS scores, along with baseline and cumulative MetS measurements, are independently associated with an elevated risk of stroke. Moreover, the MetS score is anticipated to be a reliable and clinically relevant indicator for the prediction of new‐onset stroke risk.
Keywords: CHARLS, dynamic changes, K‐means clustering, machine learning, metabolic syndrome, risk prediction, stroke
Subject Categories: Machine Learning, Big Data and Data Standards, Ischemic Stroke, Cerebrovascular Disease/Stroke
Nonstandard Abbreviations and Acronyms
- CHARLS
China Health and Retirement Longitudinal Study
- MetS
metabolic syndrome
- SHAP
Shapley additive explanations
Clinical Perspective.
What Is New?
Fluctuations in MetS scores, as well as baseline and cumulative metabolic syndrome scores, are independently associated with an increased risk of stroke.
The metabolic syndrome score serves as a robust and meaningful indicator for the assessment of stroke risk.
What Are the Clinical Implications?
The findings suggest that early intervention strategies focused on reducing metabolic syndrome scores may be effective in mitigating the risk of stroke development, and the identification of individuals experiencing worsening changes in metabolic syndrome presents a valuable opportunity for early intervention, which could potentially prevent the onset of stroke.
Metabolic syndrome (MetS) refers to a cluster of interrelated metabolic abnormalities, including abdominal obesity, hypertension, hyperglycemia, hyperlipidemia, and elevated triglycerides, which are of significant clinical importance. 1 , 2 With changes in lifestyle, the prevalence of MetS has been increasing steadily, with global prevalence ranging from 14% to 30%, making it a major public health challenge worldwide. 3 , 4 Stroke is one of the leading causes of death and long‐term disability worldwide. It affects approximately 1.2% of the global population, with around 12 million new cases, 7 million deaths, and 93.8 million stroke survivors in 2021. The annual global economic burden of stroke exceeds USD 890 billion, accounting for 0.66% of the global gross domestic product. 5 , 6
Numerous studies have established a strong association between MetS and the occurrence and recurrence of stroke. 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 A meta‐analysis encompassing 16 prospective cohort studies with 116 496 participants suggested that MetS is a significant risk factor for stroke, particularly in women and patients with ischemic stroke. 8 Furthermore, another meta‐analysis of 113 cohort studies involving 59 919 participants indicated that MetS and certain components, such as low high‐density lipoprotein cholesterol (HDL‐C) and the number of MetS components, may be risk factors for stroke recurrence. 11
However, large‐scale studies of the relationship between dynamic changes in MetS and the risk of stroke remain scarce. Existing research primarily treats MetS as a binary classification and stroke occurrence as a binary outcome, and a binary MetS classification does not accurately reflect the severity of metabolic disorders. The development and progression of MetS are typically gradual and change dynamically over time. To better quantify the severity of MetS and reflect its dynamic changes, Yang et al developed a MetS scoring system tailored for the Chinese population, based on age, sex, and ethnic specificity. 15 This scoring system assigns weights to 5 common metabolic measures (triglycerides, HDL‐C, fasting blood glucose, waist circumference, and mean arterial pressure) to quantitatively assess the severity of MetS.
The manifestation of MetS is not static, and an individual’s metabolic health status can change over different life stages, which may be closely related to the risk of stroke. Therefore, the dynamic changes in the MetS score, particularly its direction and speed of progression, are important for predicting stroke occurrence. Furthermore, studying the development trends and long‐term patterns of MetS is essential for early identification of modifiable risk factors and the formulation of effective prevention strategies.
This study aims to investigate the impact of MetS and its dynamic changes on incident stroke, with the goal of providing new theoretical insights and clinical guidance for the early prevention and intervention of stroke. In addition, we sought to develop and validate a machine learning‐based predictive model for new‐onset stroke using MetS scores and related clinical features. The data for this study are derived from the CHARLS (China Health and Retirement Longitudinal Study), a nationally representative prospective cohort study.
METHODS
Data Availability
The data sets used in this study are available in online repositories. Detailed information, including the names of the repositories and their accession numbers, can be accessed via the website at https://charls.pku.edu.cn/en. It should be noted that this study did not involve the generation or analysis of new data sets.
Declarations
Ethics Approval and Consent to Participate
The CHARLS study was performed in accordance with the principles of the Declaration of Helsinki and was approved by the Institutional Review Board of Peking University. All participants provided written informed consent before participating in the CHARLS study.
Study Population and Study Design
This study used data from the CHARLS, a nationally representative longitudinal survey of residents aged 45 and older in China. 16 The study used a multistage, stratified probability sampling method with probability proportional to size at baseline (2011–2012, Wave 1) to recruit 17 708 participants from 10 257 households across 28 provinces in China. CHARLS participants were followed up every 2 years through face‐to‐face, computer‐assisted personal interviews. All physicians involved in the study were trained by CHARLS staff at Peking University. Three subsequent follow‐ups were conducted among the surviving participants in 2013 to 2014 (Wave 2), 2015 to 2016 (Wave 3), and 2017 to 2018 (Wave 4).
A total of 8734 participants with complete MetS scores from Wave 1 were initially included. New‐onset stroke was defined as the absence of stroke before Wave 3 and the occurrence of a stroke event reported by Wave 4. Participants without MetS score data in Wave 3 (n=3831) were excluded from the follow‐up cohort. An additional 564 individuals were excluded because they were aged <45 years, had a documented stroke before Wave 3, or lacked stroke data before Wave 3. A further 58 participants were excluded because of incomplete stroke outcome data in Wave 4 or missing baseline information on sex or body mass index (BMI). Ultimately, 4281 participants remained for the main analysis. Separately, for supplementary analyses, 6928 participants without baseline stroke who had complete MetS scores at Wave 1 were included (Figure 1, Data S1).
Figure 1. Flow chart of the study population.

A total of 4281 participants were recruited for this survey. AUC indicates area under the curve; BMI, body mass index; DCA, decision curve analysis; MetS, metabolic syndrome; RCS, restricted cubic spline; and SHAP, Shapley additive explanations.
MetS Score Assessment and Time‐Varying MetS Score
Referring to the MetS score model developed for Chinese adults by Yang et al, we constructed the MetS score following standard procedures. 15 The MetS score was calculated as follows:
MetS score = −3.1436 + 0.0258 × waist circumference + 0.361 × triglycerides − 0.9348 × HDL‐C + 0.0128 × mean arterial pressure + 0.1224 × fasting blood glucose.
Because key biochemical parameters (triglycerides, HDL‐C, and fasting blood glucose) were measured only in Waves 1 (2011–2012) and 3 (2015–2016), MetS scores were calculated only for these 2 waves. To ensure analytical robustness and reliability, the MetS score, as a continuous variable, was standardized before regression modeling. The resulting regression coefficients represent the effect of a 1‐SD increase in the MetS score on the dependent variable. Additionally, the MetS score was categorized into quartiles (Q1–Q4) for further analysis. Furthermore, we calculated the cumulative MetS score between Wave 1 and Wave 3 to evaluate changes in the MetS score. 17 , 18 The calculation formula is:
Cumulative MetS score = (Wave 1 MetS score + Wave 3 MetS score)/2 × time (2015 − 2012).
The K‐means algorithm partitions the data set into K distinct clusters by iteratively minimizing the within‐cluster sum of squares. The optimal number of clusters (K) is typically determined from the elbow plot, where the point of diminishing returns in within‐cluster sum of squares reduction indicates the most appropriate value for K. 19 We employed this unsupervised machine learning technique, using Euclidean distance, to classify MetS scores from 2012 to 2015 and assess changes. 20 , 21 This method categorized participants into 4 clusters (Classes 1–4) by analyzing the distribution patterns of MetS scores at the 2 time points, and it was designed to capture the dynamic changes in MetS scores among participants.
Class 1 represented participants with stable and consistently low MetS scores, indicating favorable metabolic health. Class 2 comprised individuals whose MetS scores showed a mild upward trend over time, suggesting early signs of metabolic deterioration. Class 3 included those with persistently moderate‐to‐high MetS scores, reflecting sustained metabolic burden. Class 4 was characterized by high and fluctuating MetS scores, with initially elevated values and substantial variability, indicating the most unstable metabolic profile. This clustering approach captures the time‐varying patterns of MetS scores across different groups.
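The two formulas above can be written out as a minimal Python sketch. The coefficients and the 3‐year interval (2015 − 2012) are taken from the text; the `Visit` container, its field names, and the example values in the usage note are illustrative assumptions, and the measurement units must match those of the original scoring system, which are not restated here.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Measurements from one wave (hypothetical container; units as in the scoring system)."""
    waist_circumference: float
    triglycerides: float
    hdl_c: float
    mean_arterial_pressure: float
    fasting_blood_glucose: float


def mets_score(v: Visit) -> float:
    """MetS score using the coefficients quoted in the Methods text."""
    return (-3.1436
            + 0.0258 * v.waist_circumference
            + 0.361 * v.triglycerides
            - 0.9348 * v.hdl_c
            + 0.0128 * v.mean_arterial_pressure
            + 0.1224 * v.fasting_blood_glucose)


def cumulative_mets(score_wave1: float, score_wave3: float, years: float = 3.0) -> float:
    """Mean of the two wave scores multiplied by the elapsed time (2015 - 2012 = 3)."""
    return (score_wave1 + score_wave3) / 2 * years
```

As a consistency check, applying `cumulative_mets` to the mean Wave 1 and Wave 3 scores reported in Table 1 (1.04 and 0.98) gives (1.04 + 0.98)/2 × 3 = 3.03, which agrees with the reported mean cumulative MetS score.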
Covariates
The covariates in this study comprised age, sex, education level, residence, marital status, BMI, current smoking and alcohol consumption status, and medical history (hypertension, diabetes, cardiovascular disease, and dyslipidemia). Education level was categorized into 2 groups: below high school and high school or higher. Residence was categorized as rural or urban. Marital status was classified as married or living with a stable partner, versus other statuses, including divorced, separated, widowed, or single. Alcohol consumption was defined as drinking at least once per week. Diabetes was ascertained by hemoglobin A1c ≥6.5%, fasting plasma glucose ≥126 mg/dL, random blood glucose ≥200 mg/dL, diabetes diagnosis, or usage of antidiabetic drugs. Hypertension was defined as a mean systolic blood pressure of ≥140 mm Hg or a mean diastolic blood pressure of ≥90 mm Hg, or a documented history of physician‐diagnosed hypertension; the blood pressure criterion required multiple measurements (>2 times). Cardiovascular disease (CVD) was ascertained from participants’ answers to the question: “Has a doctor ever informed you of a heart disease diagnosis (such as angina, heart attack, heart failure, coronary heart disease, or other heart issues)?” Those who responded “Yes” were categorized as having CVD, to identify individuals with a history of significant cardiovascular events or conditions. The diagnosis of dyslipidemia was based on the criteria of the Chinese Guidelines on Dyslipidemia in adults: total cholesterol ≥5.18 mmol/L, triglycerides ≥1.70 mmol/L, HDL‐C <1.04 mmol/L, low‐density lipoprotein cholesterol ≥3.37 mmol/L, or a prior hyperlipidemia diagnosis.
Statistical Analysis
In the descriptive statistics section, continuous variables were presented as mean±SD or median (interquartile range), and categorical variables were expressed as count (percentage). Comparisons of baseline characteristics between groups were performed using 1‐way ANOVA or the Kruskal–Wallis test for continuous variables and the chi‐square test for categorical variables.
The continuous MetS score was standardized using the baseline mean±SD for further analysis. Logistic regression was employed to evaluate the associations between the MetS score (continuous), MetS score per SD increase, and cumulative MetS score with stroke incidence, estimating odds ratios (ORs) and their 95% CIs. The covariates used for adjustment in the logistic regression models were determined a priori based on existing literature and clinical relevance to control for potential confounding. Restricted cubic splines were used to explore potential nonlinear relationships between MetS scores and stroke risk. Additionally, the associations of the 4 clustering subtypes with stroke incidence were analyzed using logistic regression models. Subgroup analyses were conducted to investigate how different demographic characteristics (eg, age, sex, marital status, smoking status) and medical histories (eg, hypertension, diabetes) influence the relationship between MetS scores and stroke risk. Linear trend tests and interaction tests were performed to assess the linearity of the effects and potential interactions within subgroups. A P value of <0.05 was considered statistically significant. All statistical analyses were performed using R version 4.4.1.
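To make the link between logistic regression coefficients, odds ratios, and the per‐SD scale concrete, the following is a minimal Python sketch. It is not the authors' R code; the function names are hypothetical, and the 1.96 multiplier assumes a Wald‐type 95% CI.

```python
import math
import statistics


def standardize(values):
    """z-scores: subtract the mean and divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def per_sd_odds_ratio(or_per_unit, sd):
    """Rescale an OR per 1-unit increase to an OR per 1-SD increase: exp(beta * SD)."""
    return or_per_unit ** sd
```

For example, rescaling the crude OR per 1‐unit increase reported in Table 2 (1.50) with the baseline SD from Table 1 (0.79) via `per_sd_odds_ratio(1.50, 0.79)` gives approximately 1.38, in line with the crude per‐SD estimate in Table 2.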
Feature Selection in the Development of a Prediction Model
We applied machine learning algorithms for feature selection to determine the importance of variables within the prognostic model. The Boruta algorithm, 22 a sophisticated feature selection technique, employs “shadow features” in conjunction with a “binomial distribution” to evaluate the significance of predictors in the data set. This approach creates shadow features by permuting the values of the original variables, establishing a baseline for feature importance assessment. A predictor is classified as significant if its Z score surpasses the highest Z score of the shadow features. This rigorous selection criterion ensures that only features demonstrating a statistically meaningful contribution to the model are retained, thereby enhancing the robustness of predictive analyses. 23
Model Development, Evaluation, and Interpretation
In this study, we divided the data set into a 70% training set and a 30% test set using stratified random sampling. To balance the positive and negative samples, the training set was preprocessed with random undersampling and the synthetic minority oversampling technique. 24 We implemented 8 widely used machine learning algorithms for the prediction of new‐onset stroke from the selected clinical features: logistic regression, 25 support vector machine, 26 gradient boosting machine, 27 neural networks, 28 k‐nearest neighbors, 29 extreme gradient boosting, 30 AdaBoost, 31 and CatBoost. 32 Because the support vector machine, neural network, and k‐nearest neighbors methods are sensitive to feature scales, we performed feature scaling on the relevant variables before fitting these 3 algorithms and then reevaluated their predictive performance on the test set. Ten cross‐validation procedures were performed on the training set to determine the optimal hyperparameters for the 8 machine learning models. The test set was not used during model tuning; it was reserved exclusively for evaluating model performance after model selection and training were complete.
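The splitting and balancing steps can be illustrated with a minimal Python sketch. It shows only a stratified 70/30 split and random undersampling of the training indices; the synthetic minority oversampling step is not reproduced, and the function names and seeds are assumptions rather than the authors' implementation.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into train/test so each class keeps about the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def random_undersample(indices, labels, seed=0):
    """Randomly drop observations until every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    n_min = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_min))
    return sorted(kept)
```

Resampling is applied to the training indices only, so the test set keeps the original class proportions and remains an unbiased basis for evaluation.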
The predictive accuracy of the models was assessed using receiver operating characteristic curves, implemented with the pROC package. 33 Decision curve analysis and clinical impact curve analysis were performed and visualized using “dca.R” and the “rmda” package, respectively. 34 We also evaluated model performance with metrics derived from the confusion matrix, including accuracy, sensitivity, specificity, precision, and F1 score. Kernel Shapley additive explanations (SHAP) 35 can be applied to any classification model to explain its predictions, thereby improving model interpretability and mitigating the black‐box character of machine learning. This methodology assists clinicians in understanding the results produced by these models. 36
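The confusion‐matrix metrics listed above follow standard definitions, sketched here in Python. The function name and the example counts in the usage note are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (true/false positives/negatives)."""
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)   # recall, true-positive rate
    specificity = ratio(tn, tn + fp)   # true-negative rate
    precision = ratio(tp, tp + fp)     # positive predictive value
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

For instance, `classification_metrics(tp=40, fp=10, tn=45, fn=5)` yields an accuracy of 0.85 and a precision of 0.80. With an imbalanced outcome such as incident stroke, accuracy alone can be misleading, which is why sensitivity, specificity, precision, and F1 score are reported alongside it.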
Sensitivity Analyses
To assess the robustness of our findings and validate the conclusions, we conducted several sensitivity analyses: We systematically investigated the patterns and mechanisms underlying missingness in the MetS scores at Wave 1 and Wave 3, subsequently implementing multiple imputation 37 techniques appropriate to the identified missing data mechanisms. To evaluate the robustness of the association between baseline MetS score and incident stroke, we conducted sensitivity analyses using both complete case data and multiply imputed data sets from Wave 1. Additionally, to address potential clustering effects due to participants from the same household, we randomly selected 1 participant per household using unique household identifiers. Subsequently, K‐means clustering and the primary analyses were repeated on this restricted data set. Restricted cubic spline analysis was employed to assess potential nonlinear relationships between stroke risk and MetS scores, considering both baseline and cumulative measures. Moreover, subgroup and multiplicative interaction analyses were performed to examine the effects of five distinct MetS scoring methods on stroke risk across various stratifications, including dynamic change groups, baseline quartiles, cumulative quartiles, as well as treating baseline and cumulative MetS scores as continuous variables. To further mitigate confounding bias and balance baseline covariates, we applied the weighting method using the “CausalGPS” package. 38
RESULTS
Baseline Characteristics of Participants
Table 1 presents the baseline characteristics of the 4281 participants stratified by the occurrence of new‐onset stroke. Significant differences were observed between the 2 groups in age (P<0.001), baseline MetS score (Wave 1) (P<0.001), cumulative MetS score (P<0.001), BMI (P<0.001), marital status (P=0.03), smoking status (P=0.04), alcohol consumption (P=0.03), hypertension (P<0.001), diabetes (P<0.001), dyslipidemia (P<0.001), and prevalence of CVD (P<0.001), whereas sex (P=0.85), education (P=0.81), and place of residence (P=0.08) did not differ significantly. These results indicate differences in demographic, lifestyle, metabolic syndrome‐related, and clinical characteristics between individuals with and without new‐onset stroke. The missing variables and their respective proportions were summarized for participants with complete MetS scores at Wave 1 and Wave 3 (Table S1).
Table 1.
Baseline Characteristics of Study Participants by New‐Onset Stroke Occurrence
| Variable | Total (n=4281) | Nonstroke (n=3932) | New‐onset stroke (n=349) | Statistic | P value* |
|---|---|---|---|---|---|
| Age, y | 58.60±8.46 | 58.42±8.42 | 60.68±8.68 | −4.68 | <0.001 |
| Body mass index, kg/m² | 23.72±3.77 | 23.64±3.74 | 24.54±4.08 | −3.95 | <0.001 |
| MetS score (Wave 1) | 1.04±0.79 | 1.02±0.79 | 1.29±0.76 | −6.37 | <0.001 |
| MetS score (Wave 3) | 0.98±0.70 | 0.96±0.70 | 1.20±0.69 | −6.41 | <0.001 |
| Cumulative MetS score | 3.03±2.05 | 2.97±2.05 | 3.74±2.02 | −6.88 | <0.001 |
| Age group | | | | 14.70 | <0.001 |
| 45–60 y | 2399 (56.04) | 2238 (56.92) | 161 (46.13) | | |
| ≥60 y | 1882 (43.96) | 1694 (43.08) | 188 (53.87) | | |
| Baseline MetS score (quartile) | | | | 39.88 | <0.001 |
| Q1 | 1071 (25.02) | 1021 (25.97) | 50 (14.33) | | |
| Q2 | 1068 (24.95) | 995 (25.31) | 73 (20.92) | | |
| Q3 | 1071 (25.02) | 970 (24.67) | 101 (28.94) | | |
| Q4 | 1071 (25.02) | 946 (24.06) | 125 (35.82) | | |
| Cumulative MetS score (quartile) | | | | 41.12 | <0.001 |
| Q1 | 1070 (24.99) | 1020 (25.94) | 50 (14.33) | | |
| Q2 | 1070 (24.99) | 994 (25.28) | 76 (21.78) | | |
| Q3 | 1070 (24.99) | 976 (24.82) | 94 (26.93) | | |
| Q4 | 1071 (25.02) | 942 (23.96) | 129 (36.96) | | |
| Sex | | | | 0.04 | 0.85 |
| Female | 2407 (56.23) | 2213 (56.28) | 194 (55.58) | | |
| Male | 1874 (43.78) | 1719 (43.72) | 155 (44.41) | | |
| Marital status | | | | 4.78 | 0.03 |
| Married with spouse present | 3692 (86.24) | 3405 (86.60) | 287 (82.24) | | |
| Other | 589 (13.76) | 527 (13.40) | 62 (17.77) | | |
| Education | | | | 0.06 | 0.81 |
| Middle school and below | 3879 (90.61) | 3561 (90.57) | 318 (91.12) | | |
| High school or above | 402 (9.39) | 371 (9.44) | 31 (8.88) | | |
| Residence place | | | | 3.04 | 0.08 |
| Rural | 2881 (67.30) | 2631 (66.91) | 250 (71.63) | | |
| Urban | 1400 (32.70) | 1301 (33.09) | 99 (28.37) | | |
| Smoke | | | | 6.51 | 0.04 |
| Current smoker | 1233 (28.80) | 1133 (28.82) | 100 (28.65) | | |
| Former smoker | 341 (7.97) | 301 (7.66) | 40 (11.46) | | |
| Never | 2707 (63.23) | 2498 (63.53) | 209 (59.89) | | |
| Drink | | | | 4.49 | 0.03 |
| No | 2879 (67.25) | 2626 (66.79) | 253 (72.49) | | |
| Yes | 1402 (32.75) | 1306 (33.23) | 96 (27.51) | | |
| Hypertension | | | | 64.05 | <0.001 |
| No | 2617 (61.13) | 2474 (62.92) | 143 (40.97) | | |
| Yes | 1664 (38.87) | 1458 (37.08) | 206 (59.03) | | |
| Diabetes | | | | 16.86 | <0.001 |
| No | 3650 (85.26) | 3379 (85.94) | 271 (77.65) | | |
| Yes | 631 (14.74) | 553 (14.06) | 78 (22.35) | | |
| Cardiovascular disease | | | | 33.31 | <0.001 |
| No | 3821 (89.26) | 3542 (90.08) | 279 (79.94) | | |
| Yes | 460 (10.75) | 390 (9.92) | 70 (20.06) | | |
| Dyslipidemia | | | | 16.65 | <0.001 |
| No | 2532 (59.15) | 2362 (60.07) | 170 (48.71) | | |
| Yes | 1749 (40.86) | 1570 (39.93) | 179 (51.29) | | |
MetS indicates metabolic syndrome.
*ANOVA, the Kruskal–Wallis test, and the chi‐square test assume independence of observations and do not account for correlated data. Therefore, the results should be interpreted with caution.
Association Between Baseline MetS Score and New‐Onset Stroke
The association between baseline MetS scores and stroke risk was analyzed using crude and adjusted logistic regression models. For the baseline MetS score (Wave 1), the crude model showed a significant association with stroke risk (OR, 1.50 [95% CI, 1.32–1.71], P<0.001). After adjusting for covariates in Adjusted Model 1, the association remained significant (OR, 1.56 [95% CI, 1.36–1.78], P<0.001). However, in Adjusted Model 2, the association was no longer significant (OR, 1.17 [95% CI, 0.97–1.41], P=0.10). When the standardized MetS score (per 1‐SD increase) was analyzed, the association with stroke risk was significant across all models. The crude model showed an OR of 1.38 (95% CI, 1.24–1.53; P<0.001), which remained significant after adjustment in Adjusted Model 1 (OR, 1.41 [95% CI, 1.27–1.56], P<0.001) and Adjusted Model 2 (OR, 1.23 [95% CI, 1.09–1.38], P<0.001). For MetS score quartiles, stroke risk increased across the quartiles in the crude model and Adjusted Model 1, consistent with a dose–response relationship. In the crude model, the ORs for stroke risk were 1.50 (95% CI, 1.03–2.17), 2.13 (95% CI, 1.50–3.02), and 2.70 (95% CI, 1.92–3.79) for Q2, Q3, and Q4, respectively, compared with Q1 (reference group). Similar patterns were observed in Adjusted Model 1, where the ORs were 1.52 (95% CI, 1.05–2.21), 2.23 (95% CI, 1.57–3.18), and 2.87 (95% CI, 2.04–4.05). In Adjusted Model 2, the association remained significant for Q3 (OR, 1.58 [95% CI, 1.07–2.33], P=0.02) and was of borderline significance for Q4 (OR, 1.55 [95% CI, 0.99–2.43], P=0.06). Trend tests indicated a significant linear trend between increasing MetS quartiles and stroke risk (P for trend <0.0001 in the crude model and Adjusted Model 1; P for trend=0.04 in Adjusted Model 2) (Table 2). Restricted cubic spline analysis demonstrated a significant linear relationship between baseline MetS scores and stroke risk (P for overall <0.001) in the crude and adjusted models, with no evidence of nonlinearity (P for nonlinear >0.05) (Figure 2A–C).
Subgroup analyses revealed that higher MetS quartiles (Q2–Q4 versus Q1) were consistently associated with an increased risk of stroke, with a significant dose–response trend (P for trend <0.001) (Table S2). Age showed a significant interaction (P for interaction=0.028), with the 45 to 60 years group exhibiting a stronger association (Q4 versus Q1: OR, 3.83 [95% CI, 2.35–6.54]), whereas other subgroup variables, such as sex, hypertension, and diabetes, showed no significant interactions (Table S2). The associations of the continuous baseline MetS score with new‐onset stroke risk were also examined across these strata under the different adjustment sets. P for interaction was not significant in any subgroup in Adjusted Model 2, indicating that the association between baseline MetS scores and new‐onset stroke was robust in these subpopulations (Figure S1).
Table 2.
Association Between Baseline MetS Scores and New‐Onset Stroke Risk
| Characteristic | Crude model: OR (95% CI) | Crude model: P value | Adjusted model 1: OR (95% CI) | Adjusted model 1: P value | Adjusted model 2: OR (95% CI) | Adjusted model 2: P value |
|---|---|---|---|---|---|---|
| Baseline MetS score | 1.50 (1.32–1.71) | <0.0001 | 1.56 (1.36–1.78) | <0.0001 | 1.17 (0.97–1.41) | 0.10 |
| Baseline MetS score (per 1‐SD) | 1.38 (1.24–1.53) | <0.0001 | 1.41 (1.27–1.56) | <0.0001 | 1.23 (1.09–1.38) | <0.001 |
| Baseline MetS score (quartile) | | | | | | |
| Q1 | Ref | | Ref | | Ref | |
| Q2 | 1.50 (1.03–2.17) | 0.0300 | 1.52 (1.05–2.21) | 0.0300 | 1.27 (0.87–1.86) | 0.22 |
| Q3 | 2.13 (1.50–3.02) | <0.0001 | 2.23 (1.57–3.18) | <0.0001 | 1.58 (1.07–2.33) | 0.02 |
| Q4 | 2.70 (1.92–3.79) | <0.0001 | 2.87 (2.04–4.05) | <0.0001 | 1.55 (0.99–2.43) | 0.06 |
| P for trend | | <0.0001 | | <0.0001 | | 0.04 |
Crude model includes only the baseline MetS score variable. Adjusted model 1 adjusts for age, sex, education, residence (urban/rural), and marital status. Adjusted model 2 further adjusts for body mass index, alcohol consumption, smoking status, dyslipidemia, cardiovascular disease, diabetes, and hypertension. MetS indicates metabolic syndrome; and OR (95% CI), odds ratios and their 95% CIs.
Figure 2. Linear associations between baseline and cumulative MetS scores and the incidence of new‐onset stroke.

A, Crude RCS model analysis of the association between baseline MetS scores and the risk of new‐onset stroke. Crude model was adjusted with no covariates. B, Adjusted Model 1 adjusted for age, sex, educational attainment, residential status (urban/rural), and marital status. C, Adjusted Model 2 further adjusted for BMI, alcohol consumption, smoking status, dyslipidemia, CVD, diabetes, and hypertension. D, Crude RCS model analysis of the association between cumulative MetS scores and the risk of new‐onset stroke. Crude model was adjusted with no covariates. E, Adjusted Model 1 adjusted for age, sex, educational attainment, residential status (urban/rural), and marital status. F, Adjusted Model 2 further adjusted for BMI, alcohol consumption, smoking status, dyslipidemia, CVD, diabetes, and hypertension. BMI indicates body mass index; CVD, cardiovascular disease; MetS, metabolic syndrome; and RCS, restricted cubic spline.
Association Between Cumulative MetS Scores and New‐Onset Stroke Risk
The baseline characteristics of participants by the change in cumulative MetS score are listed in Table S3. The association between cumulative MetS scores and stroke risk was analyzed using crude and adjusted logistic regression models. Cumulative MetS scores were significantly associated with stroke risk in all models. In Adjusted Model 2, each 1‐SD increase in cumulative MetS scores was associated with 21% higher odds of stroke (OR, 1.21 [95% CI, 1.04–1.42], P=0.02). Quartile analysis further demonstrated a significant dose–response relationship, with participants in the highest quartile (Q4) showing a significantly increased stroke risk compared with the lowest quartile (Q1) in Adjusted Model 2 (OR, 1.69 [95% CI, 1.08–2.66], P=0.02). Trend tests indicated a significant linear association between cumulative MetS scores and stroke risk (P for trend <0.001 in the crude model and Adjusted Model 1; P for trend=0.03 in Adjusted Model 2) (Table 3). The results of the restricted cubic spline analysis in both crude and adjusted models indicated no evidence of a nonlinear relationship between cumulative MetS scores and stroke risk (P for nonlinearity >0.20) (Figure 2D–F). Subgroup analyses revealed that higher cumulative MetS quartiles (Q2–Q4 versus Q1) were significantly associated with increased stroke risk, showing a clear dose–response trend (P for trend <0.001). No significant interactions were found between cumulative MetS score quartiles and stroke risk across subgroup variables, including sex, place of residence, and marital status, among others (Table S4). The associations of the continuous cumulative MetS score with new‐onset stroke risk were also examined across these strata in the adjusted models. P for interaction was not significant in any subgroup in Adjusted Model 2, indicating that the association between cumulative MetS scores and new‐onset stroke was robust in these subpopulations (Figure S2).
Table 3.
Association Between Cumulative MetS Scores and New‐Onset Stroke Risk
| Characteristic | Crude model: OR (95% CI) | Crude model: P value | Adjusted model 1: OR (95% CI) | Adjusted model 1: P value | Adjusted model 2: OR (95% CI) | Adjusted model 2: P value |
|---|---|---|---|---|---|---|
| Cumulative MetS scores | 1.20 (1.14–1.26) | <0.001 | 1.22 (1.15–1.28) | <0.001 | 1.10 (1.02–1.19) | 0.02 |
| Cumulative MetS scores (per 1‐SD) | 1.44 (1.30–1.61) | <0.001 | 1.50 (1.34–1.67) | <0.001 | 1.21 (1.04–1.42) | 0.02 |
| Cumulative MetS score (quartile) | | | | | | |
| Q1 | Ref | | Ref | | Ref | |
| Q2 | 1.56 (1.08–2.25) | 0.02 | 1.62 (1.12–2.34) | 0.01 | 1.35 (0.93–1.98) | 0.12 |
| Q3 | 1.96 (1.38–2.80) | <0.001 | 2.11 (1.47–3.02) | <0.001 | 1.50 (1.01–2.23) | 0.05 |
| Q4 | 2.79 (1.99–3.92) | <0.001 | 3.07 (2.18–4.34) | <0.001 | 1.69 (1.08–2.66) | 0.02 |
| P for trend | | <0.001 | | <0.001 | | 0.03 |
Crude model includes only the cumulative MetS scores variable. Adjusted model 1 adjusts for age, sex, education, residence (urban/rural), and marital status. Adjusted model 2 further adjusts for body mass index, alcohol consumption, smoking status, dyslipidemia, cardiovascular disease, diabetes, and hypertension. MetS indicates metabolic syndrome; and OR (95% CI), odds ratios and their 95% CIs.
Relationship Between K‐Means Clustering Subclasses and Stroke Risk
To determine the optimal number of clusters (k) in our data set, we employed the elbow method, which evaluates the tradeoff between the number of clusters and the within‐cluster sum of squares. The optimal number of clusters for changes in MetS scores was determined to be 4 (Figure 3A). K‐means clustering was performed on MetS scores from Wave 1 and Wave 3, identifying 4 distinct subclasses (Class 1 to Class 4) based on participants’ MetS score changes (Figure 3B). The participants were categorized as follows: Class 1 (n=802), characterized by a consistently low and stable MetS score; Class 2 (n=1477), characterized by a moderate and increasing MetS score; Class 3 (n=1424), characterized by a persistent moderate‐to‐high MetS score; and Class 4 (n=578), characterized by highly fluctuating and high MetS scores (Figure 3C and Table S5). The distributions of the MetS score in 2011 and 2015 were visualized with density plots (Figure S3A and S3B). The association between clustering subclasses and stroke risk was assessed using crude and adjusted logistic regression models. Compared with Class 1 (reference group), stroke risk increased progressively across subclasses. In the crude model, ORs for stroke were 1.68 (95% CI, 1.13–2.48, P=0.01) for Class 2, 2.24 (95% CI, 1.53–3.28, P<0.001) for Class 3, and 3.37 (95% CI, 2.22–5.10, P<0.001) for Class 4 (Table 4). After adjusting for age, sex, education, residence, and marital status in adjusted model 1, the trend remained consistent, with significant associations observed for all subclasses (P for trend <0.001). In the fully adjusted Model 2, which additionally accounted for BMI, smoking, alcohol consumption, dyslipidemia, CVD, diabetes, and hypertension, Class 4 (OR, 1.79 [95% CI, 1.05–3.07], P=0.03) remained significantly associated with stroke risk. Class 3 showed a borderline significant association (OR, 1.56 [95% CI, 1.00–2.42], P=0.05), whereas the association for Class 2 did not reach statistical significance (OR, 1.44 [95% CI, 0.96–2.16], P=0.08) (Table 4).
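The elbow method and K‐means step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic pairs of (Wave 1 score, Wave 3 score), the function names are hypothetical, and a deterministic farthest‐first initialization is swapped in for whatever initialization the authors' software used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]

def init_centers(points, k):
    """Farthest-first initialization (an assumption, chosen for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centers, within-cluster sum of squares)."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wcss

# Synthetic (Wave 1 score, Wave 3 score) pairs: four made-up groups of five people.
group_means = [(1.0, 1.0), (3.0, 3.5), (5.0, 5.0), (7.0, 3.0)]
offsets = [(-0.2, 0.1), (0.1, -0.2), (0.2, 0.2), (-0.1, -0.1), (0.0, 0.0)]
points = [(mx + dx, my + dy) for mx, my in group_means for dx, dy in offsets]

# Elbow method: WCSS for k = 1..6; the "elbow" is the k after which gains flatten.
wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 7)}
labels, centers, wcss4 = kmeans(points, 4)
for k, w in wcss_by_k.items():
    print(k, round(w, 2))
```

On real data the elbow is rarely this clean, so the choice of k is usually checked against interpretability of the resulting classes as well.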
Figure 3. Clustering of changes in the MetS score between Wave 1 and Wave 3.

A, Determination of the optimal number of clusters (k=4) for changes in MetS scores using the K‐means clustering algorithm. B, Visualization of the 4 identified clusters based on K‐means clustering with Euclidean distance; the x and y axes represent the principal components of the changes in MetS scores. C, Data visualization for the classes of the change in the MetS score from 2011 to 2012 (Wave 1) and 2015 to 2016 (Wave 3). MetS indicates metabolic syndrome.
Table 4.
Relationship Between Categories of Time‐Varying MetS Scores and New‐Onset Stroke Incidence in the Complete Cohort
| Class category | Crude model OR (95% CI) | P value | Adjusted model 1 OR (95% CI) | P value | Adjusted model 2 OR (95% CI) | P value |
|---|---|---|---|---|---|---|
| Class 1 | Ref | | Ref | | Ref | |
| Class 2 | 1.68 (1.13–2.48) | 0.01 | 1.74 (1.18–2.59) | 0.01 | 1.44 (0.96–2.16) | 0.08 |
| Class 3 | 2.24 (1.53–3.28) | <0.001 | 2.42 (1.65–3.57) | <0.001 | 1.56 (1.00–2.42) | 0.05 |
| Class 4 | 3.37 (2.22–5.10) | <0.001 | 3.74 (2.45–5.70) | <0.001 | 1.79 (1.05–3.07) | 0.03 |
| P for trend | | <0.001 | | <0.001 | | 0.04 |
Crude model, no variables adjusted; Adjusted model 1, adjusted for age, sex, education, place of residence, and marital status; Adjusted model 2, additionally adjusted for body mass index, alcohol consumption, smoking status, dyslipidemia, cardiovascular disease, diabetes, and hypertension. MetS indicates metabolic syndrome; and OR (95% CI), odds ratios and their 95% CIs.
In the subgroup analysis (Table 5), a significant interaction was observed in the age subgroup (P for interaction=0.04). The association between Class 4 (versus Class 1) and stroke risk was stronger in individuals aged 45 to 60 years (OR, 3.83 [95% CI, 2.09–7.43], P<0.001) than in those aged ≥60 years (OR, 3.17 [95% CI, 1.83–5.63], P<0.001). Additionally, although former smokers showed the strongest association (Class 4 versus Class 1: OR, 5.80 [95% CI, 1.43–39.04], P=0.03), the interaction with smoking status was not statistically significant (P for interaction=0.83).
Table 5.
Subgroup Analysis of the Association Between Categories of Time‐Varying MetS Scores and New‐Onset Stroke Risk
| Characteristic | Class 1 | Class 2 OR (95% CI) | P value | Class 3 OR (95% CI) | P value | Class 4 OR (95% CI) | P value | P for trend | P for interaction |
|---|---|---|---|---|---|---|---|---|---|
| Age group | | | | | | | | | 0.04 |
| 45–60 | Ref | 1.35 (0.74–2.61) | 0.34 | 2.75 (1.58–5.14) | <0.001 | 3.83 (2.09–7.43) | <0.001 | <0.001 | |
| ≥60 | Ref | 2.06 (1.26–3.50) | 0.01 | 1.93 (1.17–3.28) | 0.01 | 3.17 (1.83–5.63) | <0.001 | <0.001 | |
| Sex | | | | | | | | | 0.43 |
| Male | Ref | 1.79 (1.08–3.10) | 0.03 | 1.89 (1.12–3.30) | 0.02 | 3.09 (1.72–5.67) | <0.001 | <0.001 | |
| Female | Ref | 1.61 (0.91–3.00) | 0.12 | 2.64 (1.55–4.81) | <0.001 | 3.76 (2.11–7.08) | <0.001 | <0.001 | |
| Marital status | | | | | | | | | 0.96 |
| Married with spouse present | Ref | 1.70 (1.118–2.70) | 0.02 | 2.25 (1.48–3.53) | <0.001 | 3.58 (2.28–5.76) | <0.001 | <0.001 | |
| Other | Ref | 1.68 (0.75–4.17) | 0.23 | 2.40 (1.09–5.82) | 0.04 | 2.93 (1.02–8.43) | 0.04 | 0.02 | |
| Residence place | | | | | | | | | 0.37 |
| Urban | Ref | 1.08 (0.50–2.52) | 0.85 | 1.63 (0.81–3.64) | 0.20 | 3.17 (1.52–7.24) | 0.00 | <0.001 | |
| Rural | Ref | 1.94 (1.25–3.10) | 0.00 | 2.62 (1.71–4.18) | <0.001 | 3.46 (2.12–5.77) | <0.001 | <0.001 | |
| Smoking | | | | | | | | | 0.83 |
| Never | Ref | 1.79 (1.04–3.26) | 0.04 | 2.605 (1.55–4.66) | <0.001 | 3.99 (2.28–7.35) | <0.001 | <0.001 | |
| Former smoker | Ref | 3.49 (0.92–22.87) | 0.11 | 3.446 (0.93–22.38) | 0.11 | 5.80 (1.43–39.04) | 0.03 | 0.04 | |
| Current smoker | Ref | 1.40 (0.78–2.60) | 0.28 | 1.762 (0.97–3.32) | 0.07 | 2.36 (1.16–4.81) | 0.02 | 0.01 | |
| Drink | | | | | | | | | 0.71 |
| No | Ref | 1.47 (0.92–2.43) | 0.12 | 2.14 (1.36–3.48) | 0.00 | 3.13 (1.92–5.26) | <0.001 | <0.001 | |
| Yes | Ref | 2.08 (1.09–4.24) | 0.03 | 2.18 (1.13–4.49) | 0.03 | 3.55 (1.70–7.70) | <0.001 | 0.00 | |
| Hypertension | | | | | | | | | 0.77 |
| No | Ref | 1.71 (1.06–2.86) | 0.03 | 1.95 (1.18–3.30) | 0.01 | 2.33 (1.15–4.57) | 0.02 | 0.01 | |
| Yes | Ref | 1.12 (0.59–2.29) | 0.73 | 1.36 (0.74–2.69) | 0.35 | 1.856 (0.99–3.72) | 0.06 | 0.01 | |
| Diabetes | | | | | | | | | 0.99 |
| No | Ref | 1.67 (1.11–2.57) | 0.02 | 2.17 (1.45–3.32) | <0.001 | 3.125 (1.96–5.05) | <0.001 | <0.001 | |
| Yes | Ref | 1.65 (0.56–6.02) | 0.40 | 2.02 (0.76–7.01) | 0.20 | 2.771 (1.04–9.63) | 0.07 | 0.03 | |
| Dyslipidemia | | | | | | | | | 0.48 |
| No | Ref | 1.50 (0.99–2.35) | 0.07 | 2.30 (1.48–3.64) | <0.001 | 2.75 (1.14–5.98) | 0.02 | <0.001 | |
| Yes | Ref | 2.15 (0.83–7.35) | 0.16 | 2.11 (0.85–7.05) | 0.16 | 3.32 (1.33–11.11) | 0.02 | 0.004 | |
| Cardiovascular disease | | | | | | | | | 0.86 |
| No | Ref | 1.750 (1.16–2.73) | 0.01 | 2.22 (1.47–3.43) | <0.001 | 3.11 (1.97–5.02) | <0.001 | <0.001 | |
| Yes | Ref | 1.271 (0.49–3.71) | 0.64 | 1.99 (0.83–5.55) | 0.15 | 3.15 (1.27–9.01) | 0.02 | 0.004 | |
MetS indicates metabolic syndrome; and OR (95% CI), odds ratios and their 95% CIs.
Machine Learning‐Based Stroke Risk Prediction
Categorical variables were encoded using standardized numerical schemes for compatibility with machine learning algorithms. Details of the encoding methods are provided in Table S6. The Boruta algorithm classified variables in the green area, including the MetS score, as important factors in the model, identifying the MetS score as a key predictor of new‐onset stroke in participants aged ≥45 years. Variables in the yellow area are considered potential contributors to adverse outcomes, whereas those in the red area are classified as unimportant (Figure 4A and 4B). The proportions of positive and negative outcome samples before and after applying the synthetic minority oversampling technique are shown in Figure S4; after this preprocessing, the distribution of outcome classes in the training set was approximately balanced. The grid search parameters and the corresponding optimal parameters for the 8 machine learning models used to predict incident stroke are presented in Table S7. Model performance was evaluated based on the area under the receiver operating characteristic curve. In the testing data set, GBM demonstrated the best performance with an area under the receiver operating characteristic curve of 0.75 (95% CI, 0.72–0.79), followed by CatBoost (area under the receiver operating characteristic curve, 0.72 [95% CI, 0.68–0.75]) (Figure 4C). Furthermore, the calibration curve for the GBM model showed close agreement with the reference line, suggesting good calibration (Figure 4D). The decision curve analysis showed that the GBM model conferred a net benefit, supporting its clinical usefulness (Figure 4E). Additionally, all key GBM metrics derived from the confusion matrix, including accuracy, sensitivity, specificity, precision, and F1 score, were at least 0.64 (Table 6).
The SHAP method was employed to evaluate the relative importance of each predictive feature. The SHAP summary plots illustrate the distribution of SHAP values for the 8 clinical variables, reflecting their respective impact and contribution to the GBM model (Figure 4F). Based on the absolute mean SHAP values, the 2 most influential factors associated with new‐onset stroke, ranked by importance, were age and MetS score (Figure 4G).
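SHAP attributes a prediction to each feature by averaging that feature's marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values for a toy three‐feature risk score and ranks features by absolute contribution. The model, its coefficients, the participant, and the background values are all hypothetical; the GBM in this study would normally be explained with a dedicated SHAP library rather than by brute‐force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, background):
    """Exact Shapley values of model(x) relative to model(background).

    Features outside a coalition are replaced by their background value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else background[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(f):
    """Hypothetical risk score with an age-by-MetS interaction (invented coefficients)."""
    age, mets, bmi = f
    return 0.05 * age + 0.10 * mets + 0.01 * bmi + 0.002 * age * mets

features = ["age", "MetS score", "BMI"]
x = [70.0, 3.0, 27.0]            # one made-up participant
background = [60.0, 1.0, 24.0]   # made-up reference values
phi = shapley_values(toy_model, x, background)
ranking = sorted(zip(features, phi), key=lambda t: abs(t[1]), reverse=True)
print(ranking)
```

The attributions sum to the difference between the prediction for the participant and the prediction at the background values, which is the additivity property that gives SHAP its name.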
Figure 4. An assessment of the discriminatory capabilities of 8 machine learning models, accompanied by a feature importance analysis using SHAP.

A, Feature importance scores derived from the Boruta algorithm across 100 classifier runs. Each line represents a different variable. B, Analysis of feature selection using the Boruta algorithm; the horizontal axis represents the names of each variable, and the vertical axis displays the Z‐values. The box plot illustrates the Z‐values for each variable in the model, with green boxes indicating the 8 selected important variables, yellow boxes representing tentative variables, and red boxes marking unimportant variables. Blue boxes are reference levels calculated during the Boruta run. Notably, MetS score emerged as the top predictor. C, Receiver operating characteristic curves for the machine learning models evaluated on the testing data set. D, Calibration curves for machine learning models applied to the testing data set. E, Decision curve analysis illustrating the net benefits of the various models. F, SHAP summary plot detailing the impact of each feature on stroke risk within the GBM model, as indicated by SHAP values. The higher the SHAP value of a variable, the greater its impact on and contribution to the model. G, Bar plot of the mean absolute SHAP values for each predictor, which indicate the importance of the top 8 significant variables most correlated with stroke risk in the GBM model. AUC indicates area under the curve; BMI, body mass index; GBM, gradient boosting machine; KNN, k‐nearest neighbors; MetS, metabolic syndrome; SHAP, Shapley additive explanations; and SVM, support vector machine.
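The net benefit plotted in a decision curve analysis weighs true positives against false positives at a chosen threshold probability. A minimal sketch of the standard formula follows, with the "treat all" strategy as the usual comparator. The outcomes, predicted risks, and function names are made up for illustration and are not drawn from the study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of acting on every participant regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Made-up outcomes (1 = stroke) and predicted risks for ten participants.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.80, 0.10, 0.30, 0.60, 0.20, 0.55, 0.05, 0.40, 0.15, 0.25]

for pt in (0.1, 0.3, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(pt, round(model_nb, 3), round(all_nb, 3))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat‐all value and zero, the net benefit of treating no one.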
Table 6.
Evaluation Metrics in Machine Learning Prediction Model
| Model | Threshold | Accuracy | Sensitivity | Specificity | Precision | F1 |
|---|---|---|---|---|---|---|
| Logistic | 0.39 | 0.61 | 0.75 | 0.51 | 0.53 | 0.62 |
| Support vector machine | 0.41 | 0.61 | 0.78 | 0.50 | 0.53 | 0.63 |
| Gradient boosting machine | 0.45 | 0.70 | 0.66 | 0.73 | 0.64 | 0.65 |
| Neural network | 0.44 | 0.65 | 0.62 | 0.67 | 0.58 | 0.60 |
| Extreme gradient boosting | 0.50 | 0.66 | 0.68 | 0.65 | 0.59 | 0.63 |
| K‐nearest neighbors | 0.50 | 0.72 | 0.69 | 0.74 | 0.67 | 0.68 |
| Adaboost | 0.50 | 0.63 | 0.72 | 0.56 | 0.55 | 0.62 |
| CatBoost | 0.61 | 0.67 | 0.66 | 0.68 | 0.61 | 0.63 |
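
The metrics in Table 6 all derive from the four cells of a confusion matrix at the stated threshold. The Python sketch below shows the standard definitions and checks that a reported precision and sensitivity imply the reported F1 score. The function names and the example counts are hypothetical; the study does not report raw confusion‐matrix counts in this section.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall, true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

def f1_from(precision, sensitivity):
    """F1 score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)

# Consistency check on the GBM row of Table 6 (precision 0.64, sensitivity 0.66).
gbm_f1 = f1_from(0.64, 0.66)

# Hypothetical counts chosen only to illustrate the definitions.
m = classification_metrics(tp=66, fp=27, tn=73, fn=34)
print(round(gbm_f1, 2), {k: round(v, 3) for k, v in m.items()})
```

For the GBM row, the harmonic mean of 0.64 and 0.66 rounds to 0.65, in line with the F1 value in Table 6. Because precision depends on outcome prevalence, metrics computed on a rebalanced set are not directly comparable with those from the original class distribution.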
Sensitivity Analysis
To ensure the robustness of our findings, several sensitivity analyses were conducted. First, intergroup comparisons of baseline characteristics between participants with and without missing MetS scores at Wave 1 and Wave 3 revealed significant differences in most variables, suggesting that the missingness was not completely at random but likely missing at random (Tables S8 and S9). Consistent results were observed when primary analyses were repeated using either multiple imputation for missing data or complete case data at Wave 1 (Tables S10 and S11). Second, sensitivity analyses restricted to participants with unique household IDs also demonstrated a consistent association between baseline, cumulative, and time‐varying MetS score classes and incident stroke (Figure S5, Table S12). Furthermore, propensity score weighting was applied to balance the groups with respect to sex, age, education, place of residence, alcohol consumption, smoking status, and CVD (Figure S6). Weighted logistic regression analyses confirmed a positive dose–response association between baseline, cumulative, and time‐varying MetS score classes and stroke risk (Table S13). These results corroborate the robustness of our primary findings and underscore the potential of MetS as a target for interventions aimed at preventing or delaying stroke onset.
DISCUSSION
This study investigated the relationship between MetS and stroke risk, focusing on the dynamic changes in MetS severity over time. Using data from the CHARLS, we applied K‐means clustering to classify the dynamic changes in participants’ MetS scores from 2011 to 2015. The results indicated that higher baseline and cumulative MetS scores, as well as membership in Class 3 (persistently moderate‐to‐high scores) or Class 4 (highly fluctuating and high scores), were significantly associated with an increased risk of stroke.
Our analysis revealed a positive association between baseline MetS scores and stroke risk, consistent with previous studies that have identified MetS as a significant risk factor for stroke. 8 , 9 , 12 , 39 , 40 , 41 Furthermore, cumulative MetS scores, reflecting the overall burden of metabolic abnormalities over time, demonstrated a dose–response relationship with stroke incidence. This underscores the importance of considering both the initial severity and the progression of MetS in assessing stroke risk.
Previous studies predominantly relied on binary diagnoses of MetS, overlooking the impact of disease progression and variability on outcomes. Research led by Sehoon Park 42 , 43 has examined how changes in MetS status affect the risk of major adverse cardiovascular events. That work underscores the significant impact of dynamic MetS status on cardiovascular health. Therefore, for patients with MetS, it may be more important to monitor the dynamic changes in their MetS status than to focus solely on static indicators. By employing K‐means clustering, we identified distinct patterns in MetS progression. Participants classified into high‐risk clusters, characterized by persistently high or highly fluctuating MetS scores (Class 3 and Class 4), exhibited a significantly elevated risk of stroke. These findings align with recent research emphasizing the prognostic value of metabolic health trajectories in predicting cardiovascular events.
Previous studies have shown that patients with MetS are at a significantly increased risk of stroke due to the combined effects of various metabolic disorders, such as chronic inflammation, insulin resistance, hyperglycemia, and dyslipidemia. 44 , 45 , 46 , 47 , 48 The review by Welty and colleagues 49 highlights that central obesity and high‐fat, high‐sugar diets induce insulin resistance, which serves as the common pathological basis of MetS. Chronic inflammation, through various mechanisms, links these metabolic abnormalities together, exacerbating the dysregulation and significantly increasing the risk of CVD and type 2 diabetes. These factors accelerate the progression of atherosclerosis, cerebral small vessel disease, and inflammatory responses. Therefore, early identification of core metabolic abnormalities (eg, blood glucose dysregulation, lipid disorders, and hypertension) and proactive interventions are essential strategies for stroke prevention in patients with MetS.
Subgroup analyses indicated that the association between the progression of MetS and stroke risk was particularly pronounced among individuals aged 45 to 60 years and former smokers. This suggests that early intervention in MetS progression may be particularly beneficial in these populations, underscoring the need for targeted preventive strategies. To date, the factors influencing prognosis in patients with stroke remain unclear, and existing risk stratification tools have limited applicability for this population. However, advancements in artificial intelligence have enabled more accurate predictions of these complex conditions using machine learning methods. In our study, we first identified the MetS score as a significant risk factor among participants. Second, the model efficiently detects patients at higher risk for stroke based on readily available variables. Lastly, we used SHAP to enhance the interpretability and practicality of our predictive models. This research could offer an effective tool for early identification and intervention in new‐onset stroke risk among older patients. Additionally, it may contribute to raising health awareness and encouraging healthier lifestyles, as well as aiding physicians in managing patients more effectively with limited resources, potentially reducing hospitalization rates and medical costs.
Our findings highlight the necessity of regular monitoring of MetS components and the implementation of interventions aimed at mitigating MetS progression. Health care providers should take into account both the current severity and the progression of MetS when evaluating stroke risk. Lifestyle modifications, including dietary changes, increased physical activity, and smoking cessation, should be emphasized to halt or reverse the progression of MetS.
A major strength of this study is the use of a nationally representative cohort with longitudinal data, allowing for the assessment of MetS progression over time. The application of advanced statistical modeling techniques, such as K‐means clustering, enabled the identification of distinct trends in the changes of MetS and their association with stroke risk. Furthermore, we pioneered the development and evaluation of an interpretable machine learning model that uses the MetS score to identify risk factors and predict new‐onset stroke in older patients. Several limitations should be acknowledged. First, stroke was defined based on self‐reported data, which may introduce reporting bias; future research should incorporate medical records or imaging diagnostics to improve accuracy and reduce misclassification. Second, blood samples were collected only during Wave 1 and Wave 3, and additional follow‐up data may be needed for a more in‐depth analysis. Third, the MetS score was developed using data from the Chinese older population; due to potential age, racial, and geographic differences, multicenter validation is required to ensure its applicability across diverse populations. Fourth, the CHARLS cohort primarily samples older individuals at county and village levels, which may affect generalizability and introduce selection bias. The analytic sample might also be biased toward healthier individuals due to differential mortality. Future studies with more complete follow‐up data, including exact death dates and causes, should apply competing risk methods to yield more precise stroke risk estimates. Nevertheless, despite these limitations, the CHARLS data set remains largely representative due to its extensive geographic coverage across numerous counties and villages, as well as its substantial sample size.
CONCLUSIONS
This study demonstrates that both the severity and progression of MetS are risk factors for new‐onset stroke. Identifying individuals with worsening MetS changes offers an opportunity for early intervention to prevent stroke. Future research should explore the underlying mechanisms linking MetS progression to stroke and evaluate the effectiveness of targeted interventions in high‐risk populations.
Sources of Funding
The work was supported by Major Scientific Instrument Development Project of the National Natural Science Foundation of China [grant number 32127802], National Natural Science Foundation of China [grant number 82170445] and the National Major Scientific Instruments and Equipments Development Project of National Natural Science Foundation of China [grant number 2015BAI01B00].
Disclosures
None.
Supporting information
Data S1
Supplemental Methods
Tables S1–S13
Figures S1–S6
Reference 50.
Acknowledgments
Qiaoqiao Li, Yimou Liu, Xueping Gao, and Jing Huang conceived the study. Qiaoqiao Li and Yimou Liu prepared and analyzed the data. Qinghua Fang, Yuan Xu, and Long Zeng carried out literature search. Qiaoqiao Li and Yimou Liu drafted the article. Qiaoqiao Li, Yimou Liu, and Xueping Gao revised the article. All authors have approved the final draft of the article.
This article was sent to Mahasin S. Mujahid, PhD, MS, FAHA, Associate Editor, for review by expert referees, editorial decision, and final disposition.
Supplemental Material is available at https://www.ahajournals.org/doi/suppl/10.1161/JAHA.125.041833
References
- 1. Stone NJ, Bilek S, Rosenbaum S. Recent National Cholesterol Education Program Adult Treatment Panel III update: adjustments and options. Am J Cardiol. 2005;96:53e–59e.
- 2. Zafar U, Khaliq S, Ahmad HU, Manzoor S, Lone KP. Metabolic syndrome: an update on diagnostic criteria, pathogenesis, and genetic links. Hormones (Athens). 2018;17:299–313. doi: 10.1007/s42000-018-0051-3
- 3. de Souza FH, Shinjo SK. The high prevalence of metabolic syndrome in polymyositis. Clin Exp Rheumatol. 2014;32:82–87.
- 4. Grundy SM, Cleeman JI, Daniels SR, Donato KA, Eckel RH, Franklin BA, Gordon DJ, Krauss RM, Savage PJ, Smith SC Jr, et al. Diagnosis and management of the metabolic syndrome: an American Heart Association/National Heart, Lung, and Blood Institute scientific statement. Circulation. 2005;112:2735–2752. doi: 10.1161/CIRCULATIONAHA.105.169404
- 5. Wolfe CD. The impact of stroke. Br Med Bull. 2000;56:275–286. doi: 10.1258/0007142001903120
- 6. Feigin VL, Brainin M, Norrving B, Martins S, Pandian J, Lindsay P, Grupper M, Rautalin I. World Stroke Organization: global stroke fact sheet 2025. Int J Stroke. 2025;20:132–144.
- 7. Chei CL, Yamagishi K, Tanigawa T, Kitamura A, Imano H, Kiyama M, Sato S, Iso H. Metabolic syndrome and the risk of ischemic heart disease and stroke among middle‐aged Japanese. Hypertens Res. 2008;31:1887–1894. doi: 10.1291/hypres.31.1887
- 8. Li X, Li X, Lin H, Fu X, Lin W, Li M, Zeng X, Gao Q. Metabolic syndrome and stroke: a meta‐analysis of prospective cohort studies. J Clin Neurosci. 2017;40:34–38. doi: 10.1016/j.jocn.2017.01.018
- 9. Moghadam‐Ahmadi A, Soltani N, Ayoobi F, Jamali Z, Sadeghi T, Jalali N, Vakilian A, Lotfi MA, Khalili P. Association between metabolic syndrome and stroke: a population based cohort study. BMC Endocr Disord. 2023;23:131. doi: 10.1186/s12902-023-01383-6
- 10. Roever L, Resende ES, Diniz ALD, Penha‐Silva N, O’Connell JL, Gomes PFS, Zanetti HR, Roerver‐Borges AS, Veloso FC, Fidale TM, et al. Metabolic syndrome and risk of stroke: protocol for an update systematic review and meta‐analysis. Medicine (Baltimore). 2018;97:e9862. doi: 10.1097/MD.0000000000009862
- 11. Zhang F, Liu L, Zhang C, Ji S, Mei Z, Li T. Association of metabolic syndrome and its components with risk of stroke recurrence and mortality: a meta‐analysis. Neurology. 2021;97:e695–e705. doi: 10.1212/WNL.0000000000012415
- 12. Liu L, Zhan L, Wang Y, Bai C, Guo J, Lin Q, Liang D, Xu E. Metabolic syndrome and the short‐term prognosis of acute ischemic stroke: a hospital‐based retrospective study. Lipids Health Dis. 2015;14:76. doi: 10.1186/s12944-015-0080-8
- 13. Mi D, Jia Q, Zheng H, Hoff K, Zhao X, Wang C, Liu G, Wang Y, Liu L, Wang X. Metabolic syndrome and stroke recurrence in Chinese ischemic stroke patients‐‐the ACROSS‐China study. PLoS One. 2012;7:e51406. doi: 10.1371/journal.pone.0051406
- 14. Zhu S, McClure LA, Lau H, Romero JR, White CL, Babikian V, Nguyen T, Benavente OR, Kase CS, Pikula A. Recurrent vascular events in lacunar stroke patients with metabolic syndrome and/or diabetes. Neurology. 2015;85:935–941. doi: 10.1212/WNL.0000000000001933
- 15. Yang S, Yu B, Yu W, Dai S, Feng C, Shao Y, Zhao X, Li X, He T, Jia P. Development and validation of an age‐sex‐ethnicity‐specific metabolic syndrome score in the Chinese adults. Nat Commun. 2023;14:6988. doi: 10.1038/s41467-023-42423-y
- 16. Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: the China Health and Retirement Longitudinal Study (CHARLS). Int J Epidemiol. 2014;43:61–68. doi: 10.1093/ije/dys203
- 17. Cui H, Liu Q, Wu Y, Cao L. Cumulative triglyceride‐glucose index is a risk for CVD: a prospective cohort study. Cardiovasc Diabetol. 2022;21:22. doi: 10.1186/s12933-022-01456-1
- 18. Huo RR, Zhai L, Liao Q, You XM. Changes in the triglyceride glucose‐body mass index estimate the risk of stroke in middle‐aged and older Chinese adults: a nationwide prospective cohort study. Cardiovasc Diabetol. 2023;22:254. doi: 10.1186/s12933-023-01983-5
- 19. Anderson SR, Roberts JM, Zhang J, Steele MR, Romero CO, Bosco A, Vetter ML. Developmental apoptosis promotes a disease‐related gene signature and independence from CSF1R signaling in retinal microglia. Cell Rep. 2019;27:2002‐13.e5. doi: 10.1016/j.celrep.2019.04.062
- 20. Sinaga KP, Yang M‐S. Unsupervised K‐means clustering algorithm. IEEE Access. 2020;8:80716–80727. doi: 10.1109/ACCESS.2020.2988796
- 21. Singh A, Yadav A, Rana A. K‐means with three different distance metrics. Int J Comput Appl. 2013;67:13–17. doi: 10.5120/11430-6785
- 22. Kursa MB, Rudnicki WR. Feature selection with the Boruta package. J Stat Softw. 2010;36:1–13.
- 23. Degenhardt F, Seifert S, Szymczak S. Evaluation of variable selection methods for random forests and omics data sets. Brief Bioinf. 2019;20:492–503. doi: 10.1093/bib/bbx124
- 24. Wang K, Tian J, Zheng C, Yang H, Ren J, Li C, Han Q, Zhang Y. Improving risk identification of adverse outcomes in chronic heart failure using SMOTE+ENN and machine learning. Risk Manag Healthc Policy. 2021;14:2453–2463. doi: 10.2147/RMHP.S310295
- 25. Cramer JS. The origins of logistic regression. 2002.
- 26. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intellig Syst Applic. 1998;13:18–28. doi: 10.1109/5254.708428
- 27. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statis. 2001;29:1189–1232. doi: 10.1214/aos/1013203451
- 28. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back‐propagating errors. Nature. 1986;323:533–536. doi: 10.1038/323533a0
- 29. Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inform Theory. 1967;13:21–27. doi: 10.1109/TIT.1967.1053964
- 30. Chen T, Guestrin C. Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 2016.
- 31. Favaro P, Vedaldi A. AdaBoost. Computer Vision: A Reference Guide. Springer; 2021:36–40.
- 32. Dorogush AV, Ershov V, Gulin A. CatBoost: gradient boosting with categorical features support. arXiv:1810.11363. 2018.
- 33. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Müller M. pROC: an open‐source package for R and S+ to analyze and compare ROC curves. BMC Bioinf. 2011;12:77. doi: 10.1186/1471-2105-12-77
- 34. Song J, Wang L, Ng NN, Zhao M, Shi J, Wu N, Li W, Liu Z, Yeom KW, Tian J. Development and validation of a machine learning model to explore tyrosine kinase inhibitor response in patients with stage IV EGFR variant‐positive non‐small cell lung cancer. JAMA Netw Open. 2020;3:e2030442. doi: 10.1001/jamanetworkopen.2020.30442
- 35. Lundberg SM, Lee S‐I. A unified approach to interpreting model predictions. Adv Neu Inf Process Syst. 2017;30:4768–4777.
- 36. Nohara Y, Matsumoto K, Soejima H, Nakashima N. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput Methods Prog Biomed. 2022;214:106584. doi: 10.1016/j.cmpb.2021.106584
- 37. Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, Wood AM, Carpenter JR. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393. doi: 10.1136/bmj.b2393
- 38. Wu X, Mealli F, Kioumourtzoglou MA, Dominici F, Braun D. Matching on generalized propensity scores with continuous exposures. J Am Stat Assoc. 2024;119:757–772. doi: 10.1080/01621459.2022.2144737
- 39. Chen Y, Wu J, Chen M, Zhu Y, Wang H, Cui T, Zhang S, Wang D. Association between metabolic syndrome and outcomes of large‐artery atherosclerosis stroke treated with reperfusion therapy. J Stroke Cerebrovasc Dis. 2024;33:107927. doi: 10.1016/j.jstrokecerebrovasdis.2024.107927
- 40. Li X, Li X, Fang F, Fu X, Lin H, Gao Q. Is metabolic syndrome associated with the risk of recurrent stroke: a meta‐analysis of cohort studies. J Stroke Cerebrovasc Dis. 2017;26:2700–2705. doi: 10.1016/j.jstrokecerebrovasdis.2017.03.014
- 41. Sarrafzadegan N, Gharipour M, Sadeghi M, Nezafati P, Talaie M, Oveisgharan S, Nouri F, Khosravi A. Metabolic syndrome and the risk of ischemic stroke. J Stroke Cerebrovasc Dis. 2017;26:286–294. doi: 10.1016/j.jstrokecerebrovasdis.2016.09.019
- 42. Finucane TE. Altered risk for cardiovascular events with changes in the metabolic syndrome status. Ann Intern Med. 2020;172:707. doi: 10.7326/L20-0075
- 43. Park S, Lee S, Kim Y, Lee Y, Kang MW, Han K, Han SS, Lee H, Lee JP, Joo KW, et al. Altered risk for cardiovascular events with changes in the metabolic syndrome status: a nationwide population‐based study of approximately 10 million persons. Ann Intern Med. 2019;171:875–884. doi: 10.7326/M19-0563
- 44. Arshad N, Lin TS, Yahaya MF. Metabolic syndrome and its effect on the brain: possible mechanism. CNS Neurol Disord Drug Targets. 2018;17:595–603. doi: 10.2174/1871527317666180724143258
- 45. Herisson F, Zhou I, Mawet J, Du E, Barfejani AH, Qin T, Cipolla MJ, Sun PZ, Rost NS, Ayata C. Posterior reversible encephalopathy syndrome in stroke‐prone spontaneously hypertensive rats on high‐salt diet. J Cereb Blood Flow Metab. 2019;39:1232–1246. doi: 10.1177/0271678X17752795
- 46. Kim CE, Shin S, Lee HW, Lim J, Lee JK, Shin A, Kang D. Association between sleep duration and metabolic syndrome: a cross‐sectional study. BMC Public Health. 2018;18:720. doi: 10.1186/s12889-018-5557-8
- 47. Natarajan R. Epigenetic mechanisms in diabetic vascular complications and metabolic memory: the 2020 Edwin Bierman award lecture. Diabetes. 2021;70:328–337. doi: 10.2337/dbi20-0030
- 48. Brown AE, Walker M. Genetics of insulin resistance and the metabolic syndrome. Curr Cardiol Rep. 2016;18:75. doi: 10.1007/s11886-016-0755-4
- 49. Welty FK, Alfaddagh A, Elajami TK. Targeting inflammation in metabolic syndrome. Transl Res. 2016;167:257–280. doi: 10.1016/j.trsl.2015.06.017
- 50. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over‐sampling technique. J Artific Intellig Res. 2002;16:321–357. doi: 10.1613/jair.953
Associated Data
Supplementary Materials
Data S1
Supplemental Methods
Tables S1–S13
Figures S1–S6
Reference 50.
Data Availability Statement
The data sets used in this study are available in online repositories. The names of the repositories and their accession numbers can be found at https://charls.pku.edu.cn/en. This study did not generate or analyze new data sets.
