BMC Geriatrics. 2025 Dec 12;26:67. doi: 10.1186/s12877-025-06688-w

Development of a sarcopenic obesity risk prediction model for older adults based on the CHARLS database

Biheng Feng 1,2, Yuanyuan Qin 1,2, Qingjiang Cai 1,2, Debin Huang 1,2,
PMCID: PMC12817618  PMID: 41382036

Abstract

Objective

To construct a risk prediction model for sarcopenic obesity in the elderly using different machine learning methods.

Methods

The research data were derived from the CHARLS 2015 national survey. According to the inclusion and exclusion criteria, 2,375 elderly people were selected and 42 research variables were included. Risk factors were first screened by univariate analysis. Variables with statistically significant differences (P < 0.05) were then passed to the Boruta feature selection method, which retained 16 features. A prediction model was then constructed based on six machine learning algorithms. The models were comprehensively evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, recall, precision, F1 score, and Brier score. The optimal machine learning model was interpreted with SHAP (Shapley Additive Explanations).

Results

Among the six models, XGBoost had the best comprehensive performance, with an AUC of 0.78, accuracy of 0.89, recall of 0.20, precision of 0.53, F1 score of 0.28, and Brier score of 0.11. SHAP feature importance analysis showed that waist circumference, pace, and uric acid were important risk factors.

Conclusion

The model constructed with the XGBoost machine learning algorithm had the best overall predictive performance and may facilitate early clinical evaluation and prevention of sarcopenic obesity.

Keywords: Elderly, Machine learning, Prediction model, Sarcopenic obesity

Introduction

Sarcopenic obesity (SO) is defined as a geriatric syndrome characterized by the coexistence of age-related sarcopenia and obesity, manifesting as decreased muscle mass, reduced strength, functional impairment, and typical obesity features [1]. Compared to patients with isolated sarcopenia or obesity alone, patients with SO face significantly higher risks of adverse outcomes, including cognitive decline, physical disability, overall mortality [2, 3], and elevated cardiometabolic morbidity, making this condition a growing public health concern in aging populations [4]. Epidemiological data indicate a global prevalence of SO of approximately 28.3% among elderly individuals [5]. The National Health and Nutrition Examination Survey (NHANES) reveals marked age and gender disparities in SO prevalence, escalating from 12.6% (males) and 33.5% (females) in adults over 60 years to 27.5% and 48.0%, respectively, in those over 80. This health challenge is expected to worsen substantially with population aging [6]. Therefore, early identification and prevention of SO are crucially important in clinical practice. In recent years, machine learning has been increasingly applied in medical research. Compared with traditional regression models, machine learning demonstrates superior capability in handling complex nonlinear relationships between variables and establishing more robust risk prediction models [7]. The purpose of this study is to develop six SO prediction models using machine learning methods, evaluate their predictive performance, identify the optimal model through comparative analysis, and investigate associated risk factors. This approach aims to provide an effective tool for assessing and preventing sarcopenic obesity, ultimately reducing its risk in elderly populations.

Related work

The rising prevalence of SO has made the development of effective risk prediction tools a research focus. Several studies have attempted to construct prediction models using machine learning and statistical models, employing diverse datasets and feature combinations, which provide valuable references for this field.

For instance, Bae et al. [8] developed a neural network model based on Korean national fitness data to predict SO. This model emphasized the central role of physical fitness indicators in prediction, particularly body fat percentage and grip strength, and demonstrated exceptionally high accuracy (up to 93.1%) in validation, highlighting the practicality of physical health indicators in real-world prediction. Zambon Azevedo et al. [9] proposed a diagnostic criterion based on body composition phenotypes. Their study, based on an overweight/obese clinical cohort, utilized dual-energy X-ray absorptiometry (DXA) for precise body composition assessment and employed unsupervised machine learning for cluster analysis. They ultimately constructed sex-specific diagnostic equations incorporating key indicators like fat mass and appendicular skeletal muscle mass, providing important methodological support for the standardized diagnosis of SO. Furthermore, the study by Xu et al. [10], which built a prediction model based on a clinical cohort, focused on more in-depth clinical indicators. It integrated body composition data such as body fat percentage, visceral fat area, and neck circumference, achieving high predictive performance, albeit with relatively high data acquisition costs. Lian et al. [11] developed and validated an interpretable multi-center prediction model demonstrating good generalization ability. They deployed the final model as an open-source web application, significantly enhancing the tool’s clinical accessibility.

Although the aforementioned studies have their respective focuses in terms of data sources, model objectives, or clinical applicability, they also possess certain limitations. For example, the model by Bae et al. relies on systematic physical fitness test data, the research by Zambon Azevedo primarily focuses on diagnostic criteria, while the studies by Xu and Lian are based on a clinical cohort and multi-center data from specific regions, respectively. Although these models exhibit good predictive performance, they require specific detection equipment or a clinical setting, posing challenges for implementation in resource-limited community healthcare scenarios. To our knowledge, no study has yet systematically utilized the nationally representative community-based CHARLS database to construct a dedicated SO prediction model while comprehensively comparing multiple machine learning algorithms. Therefore, this study aims to bridge this gap by leveraging routinely accessible indicators from CHARLS to develop a risk prediction tool that does not rely on complex equipment. This tool is more suitable for primary healthcare settings. Through algorithm comparison to determine the optimal model, combined with interpretability analysis, this study systematically reveals, for the first time in large-scale community data, the critical roles of novel predictors such as “pace” and “uric acid,” providing new perspectives and practical solutions for the early community screening and mechanistic exploration of SO.

Definitions of the main terms used in this study

(1) Sarcopenia: Sarcopenia is defined as the progressive loss of muscle strength, quality, and function, which will lead to a decline in physical function and increase the risk of disability and death. In the presence of multiple diagnostic criteria, it is generally believed that low muscle strength and function is a key component [12].

(2) Possible sarcopenia: Possible sarcopenia is defined as decreased muscle strength and/or impaired physical function. Muscle strength is assessed by grip strength level, and impaired physical function is assessed by the five-time repeated chair stand test [13].

(3) Sarcopenic obesity (SO): Owing to the limitations of the available data, SO in this study refers to individuals who meet the above "possible sarcopenia" diagnostic criteria and have a body mass index (BMI) ≥ 28 kg/m². The specific reasons are explained in the subsequent content.
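The working definition above can be written as a small rule. The following is a minimal sketch, assuming the AWGS 2019 cut-offs that the study adopts in its Methods (grip strength < 28 kg for men and < 18 kg for women, five chair stands taking ≥ 12 s) together with BMI ≥ 28 kg/m². The `Participant` class and the function names are hypothetical and are not taken from CHARLS or from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    sex: str              # "male" or "female"
    grip_kg: float        # handgrip strength in kg
    chair_stand_s: float  # time to complete five chair stands, in seconds
    bmi: float            # body mass index in kg/m^2


def low_grip_strength(p: Participant) -> bool:
    # Cut-offs quoted in the paper: < 28 kg for men, < 18 kg for women.
    if p.sex not in ("male", "female"):
        raise ValueError(f"unexpected sex value: {p.sex!r}")
    cutoff = 28.0 if p.sex == "male" else 18.0
    return p.grip_kg < cutoff


def possible_sarcopenia(p: Participant) -> bool:
    # Low muscle strength and/or impaired physical function (>= 12 s).
    return low_grip_strength(p) or p.chair_stand_s >= 12.0


def sarcopenic_obesity(p: Participant) -> bool:
    # Study definition: possible sarcopenia together with BMI >= 28 kg/m^2.
    return possible_sarcopenia(p) and p.bmi >= 28.0
```

Because the definition uses "and/or" for the sarcopenia component, a participant with normal grip strength but a slow chair stand still counts as having possible sarcopenia.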

Materials and methods

General information

The research data were extracted from the China Health and Retirement Longitudinal Study (CHARLS) database. CHARLS, a nationally representative longitudinal survey conducted by Peking University, targets residents aged ≥ 45 years in China and has served as a key resource for multidisciplinary research in medicine, socioeconomics, and related fields [14, 15]. The CHARLS national baseline survey was launched in 2011. It covered 150 county-level units, 450 village-level units, and approximately 10,000 households. Investigators collected high-quality micro-level data through structured questionnaires administered via face-to-face interviews, with biennial follow-up surveys conducted after the baseline study [16]. This study retrospectively analyzed data from 20,967 participants in the 2015 CHARLS Wave 3. The inclusion criterion was age ≥ 60 years; the exclusion criterion was missing data in any included variable. After screening, 2,375 participants were ultimately included for prediction model construction. The sample screening process is shown in Fig. 1.

Fig. 1. Sample selection flowchart

Research indicators and data collection

1) Research variables: The research variables comprised demographic characteristics (gender, age, education level, place of residence, marriage, health insurance), health status and functional capacity (falls in the past 12 months, smoking history, drinking history, hypertension, diabetes, dyslipidemia, heart disease, cancer, pulmonary disease, arthritis, stroke, liver disease, renal disease, depressive status, physical activity level, hospitalization in the past 12 months, self-care ability, sleep duration), family structure (living alone status), as well as physical examination and biochemical parameters (systolic blood pressure, diastolic blood pressure, pace, waist circumference, triglyceride-glucose index, total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, fasting blood glucose, C-reactive protein, triglycerides, glycated hemoglobin, hemoglobin, uric acid, creatinine, cystatin C, blood urea nitrogen).

2) Outcome variables: The primary outcome of this study was the presence of sarcopenic obesity. Diagnostic criteria were established as follows. (1) Possible sarcopenia: The Asian Working Group for Sarcopenia (AWGS) updated its consensus in 2019 (AWGS 2019), incorporating diagnostic criteria from the European Working Group on Sarcopenia in Older People and introducing the concept of “possible sarcopenia” [17, 18]. Given the large-scale nature of the CHARLS database and the challenges in obtaining muscle mass measurements, this study adopted the AWGS 2019 recommendation emphasizing the importance of early screening and intervention for populations with “possible sarcopenia” prior to formal muscle mass assessment. Consequently, our analysis specifically focused on investigating correlations between sarcopenia-related factors within this “possible sarcopenia” population. Possible sarcopenia was defined as decreased muscle strength and/or impaired physical function, assessed through two criteria: (a) decreased muscle strength, evaluated by grip strength, with diagnostic thresholds of < 28.0 kg for males and < 18.0 kg for females; (b) impaired physical function, measured by the five-repetition chair stand time, with a diagnostic cutoff of ≥ 12 s based on consensus guidelines [13]. (2) Obesity: Defined as a body mass index (BMI) ≥ 28 kg/m² [19]. Participants presenting with concurrent possible sarcopenia and obesity were classified as having SO [20].

3) Statistical Methods: Data processing was conducted using SPSS 23.0 software in combination with R Statistical Software (version 4.2.2; The R Foundation; http://www.R-project.org) and the Free Statistics analysis platform (Version 2.0; Beijing, China). For non-normally distributed measurement data, results were presented as medians with interquartile ranges [M (Q1, Q3)] and analyzed using the non-parametric Wilcoxon rank-sum test. Categorical data were expressed as frequencies with percentages (%), and statistical comparisons were performed using chi-square tests and Fisher’s exact test. Risk factors were preliminarily screened through univariate analysis. Boruta is a feature selection algorithm based on random forest. Unlike general feature selection algorithms, its goal is to select the set of features most relevant to the dependent variable, rather than the set most relevant to a specific model [21]. This approach can meet the need to comprehensively explore potential risk factors associated with SO and avoid omitting any clinically significant variables in the early stages. Therefore, variables showing statistically significant differences (P < 0.05) were subsequently selected using the Boruta feature selection algorithm. The final feature set was employed to construct predictive models through six machine learning algorithms: Random Forest (RF), Extreme Gradient Boosting (XGBoost), Categorical Gradient Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), Support Vector Machine (SVM), and Logistic Regression (LR). All patients were randomly divided into training and test sets at a 7:3 ratio. After model development, the predictive performance was evaluated using the test cohort. The SHAP (Shapley Additive Explanations) method was utilized to interpret feature importance in the final model. 
Finally, propensity score matching (PSM) and multivariate logistic regression analysis were used to explore the correlation between waist circumference and the occurrence of SO in the elderly. PSM used a 1:1 nearest neighbor matching method (caliper value = 0.2). The balance of variables was evaluated after matching. The matching variables included age, gender, education level, marriage, depressive status, self-care ability, and cystatin C.
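To make the matching step concrete, here is a minimal sketch of 1:1 nearest-neighbour matching with a caliper. It assumes that propensity scores have already been estimated for an exposed and an unexposed group, that matching is greedy and without replacement, and that the caliper is expressed in the same units as the scores passed in. The source does not state these details, so they are assumptions, and `match_nearest` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def match_nearest(
    treated: Dict[str, float],
    control: Dict[str, float],
    caliper: float,
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching without replacement.

    `treated` and `control` map a participant id to a propensity score.
    A pair is kept only if the two scores differ by at most `caliper`.
    """
    available = dict(control)
    pairs: List[Tuple[str, str]] = []
    # Process treated units in a fixed order so the result is reproducible.
    for t_id, t_score in sorted(treated.items()):
        if not available:
            break
        # Closest remaining control; ties are broken by id.
        c_id = min(available, key=lambda cid: (abs(available[cid] - t_score), cid))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical scores, for illustration only.
example_pairs = match_nearest(
    {"t1": 0.30, "t2": 0.80},
    {"c1": 0.32, "c2": 0.31, "c3": 0.50},
    caliper=0.05,
)
```

In this toy input, `t2` stays unmatched because no control lies within the caliper. This is how caliper matching trades sample size for better covariate balance.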

Results

Research data

A total of 2,375 participants were enrolled in this study, with their demographic and clinical characteristics summarized in Table 1. The cohort comprised 2,109 individuals without SO and 266 individuals with sarcopenic obesity, yielding a prevalence of 11.20%. Significant differences were observed between the two groups across 30 variables, including age (P = 0.019), gender (P < 0.001), marriage (P = 0.010), and place of residence (P = 0.027). These 30 variables were subsequently subjected to feature selection using the Boruta algorithm, which retained 16 of them as predictive factors for SO.

Table 1. Research participant characteristics

Variable Non-sarcopenic obesity (n = 2109) Sarcopenic obesity (n = 266) χ2/Z P
Age 66.00(63.00, 72.00) 68.00(64.00, 73.00) −2.352 0.019
Gender 25.073 < 0.001
 Male 1274(60.40) 118(44.40)
 Female 835(39.60) 148(55.60)
Marriage [n(%)] 6.686 0.010
 Divorced or widowed 563(26.70) 91(34.20)
 Married 1546(73.30) 175(65.80)
Place of residence [n(%)] 4.881 0.027
 Urban 773(36.70) 116(43.60)
 Rural 1336(63.30) 150(56.40)
Heart disease [n(%)] 25.593 < 0.001
 Yes 440(20.90) 92(34.60)
 No 1669(79.10) 174(65.40)
Stroke [n(%)] 9.335 0.002
 Yes 76(3.60) 20(7.50)
 No 2033(96.40) 246(92.50)
Arthritis [n(%)] 5.146 0.023
 Yes 947(44.90) 139(52.30)
 No 1162(55.10) 127(47.70)
Dyslipidemia [n(%)] 29.912 < 0.001
 Yes 427(20.20) 93(35.00)
 No 1682(79.80) 173(65.00)
Liver disease [n(%)] 4.196 0.041
 Yes 129(6.10) 25(9.40)
 No 1980(93.90) 241(90.60)
Hypertension [n(%)] 22.818 < 0.001
 Yes 421(20.00) 87(32.70)
 No 1688(80.00) 179(67.30)
Diabetes [n(%)] 17.324 < 0.001
 Yes 137(6.50) 36(13.50)
 No 1972(93.50) 230(86.50)
Drinking history [n(%)] 11.139 0.001
 Yes 808(38.30) 74(27.80)
 No 1301(61.70) 192(72.20)
Smoking history [n(%)] 28.696 < 0.001
 Yes 725(34.40) 48(18.00)
 No 1384(65.60) 218(82.00)
Hospitalization in the past 12 months [n(%)] 6.868 0.009
 Yes 341(16.20) 60(22.60)
 No 1768(83.80) 206(77.40)
Falls in the past 12 months [n(%)] 15.605 <0.001
 Yes 368(17.40) 73(27.40)
 No 1741(82.60) 193(72.60)
Depressive status [n(%)] 20.076 <0.001
 Yes 684(32.40) 123(46.20)
 No 1425(67.60) 143(53.80)
Education level [n(%)] 13.209 0.008
 Illiterate 990(46.90) 148(55.60)
 Primary school 556(26.40) 72(27.10)
 Junior high school 369(17.50) 30(11.30)
 Senior high school or vocational 156(7.40) 11(4.10)
 College or above 38(1.80) 5(1.90)
Self-care ability [n(%)] 39.122 <0.001
 Independent 1662(78.80) 163(61.30)
 Mildly dependent 354(16.80) 83(31.20)
 Moderately dependent 75(3.60) 14(5.30)
 Severely dependent 18(0.90) 6(2.30)
SBP(mmHg) 129.50(116.00,144.50) 134.00(121.00,150.63) -4.060 <0.001
DBP(mmHg) 74.00(67.00,82.00) 76.00(69.88,83.63) -3.605 <0.001
Waist circumference(cm) 84.40(77.80,91.20) 96.00(90.45,101.00) -16.350 <0.001
TG(mg/dl) 102.65(76.99,148.67) 134.51(93.81,184.51) -6.541 <0.001
HDL-C(mg/dl) 50.58(43.63,58.69) 45.95(40.83,53.28) -6.175 <0.001
FBG(mg/dl) 95.50(88.29,104.50) 99.10(91.89,109.91) -5.160 <0.001
UA(mg/dl) 4.90(4.00,5.90) 5.00(4.20,6.40) -2.046 0.041
CYSC(mg/l) 0.89(0.77,0.99) 0.93(0.81,1.03) -3.382 0.001
CRP(mg/l) 1.30(0.75,2.50) 1.90(1.00,3.00) -4.695 <0.001
HbA1c(%) 5.80(5.60,6.10) 6.00(5.60,6.50) -3.623 <0.001
TyG 8.51(8.17,8.92) 8.85(8.43,9.21) -7.065 <0.001
Pace(m/s) 3.11(2.65,3.73) 3.54(2.83,4.40) -6.327 <0.001

Remark: SBP Systolic blood pressure, DBP Diastolic blood pressure, TG Triglycerides, HDL-C High-density lipoprotein cholesterol, FBG Fasting blood glucose, UA Uric acid, CYSC Cystatin C, CRP C-reactive protein, HbA1c Glycated hemoglobin, TyG Triglyceride-glucose index
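The categorical test statistics in Table 1 can be spot-checked from the reported counts. The sketch below assumes an uncorrected Pearson chi-square test on a 2 × 2 table. The source does not say whether a continuity correction was applied, so this is an assumption, and `pearson_chi2_2x2` is a hypothetical helper.

```python
def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Gender row of Table 1: males without / with SO, then females without / with SO.
chi2_gender = pearson_chi2_2x2(1274, 118, 835, 148)

# Prevalence of SO in the cohort: 266 cases among 2,109 + 266 participants.
prevalence = 266 / (2109 + 266)

print(f"chi2 = {chi2_gender:.3f}, prevalence = {prevalence:.2%}")
```

Under this assumption the gender statistic agrees with the value of 25.073 reported in the table, and the prevalence agrees with the 11.20% given in the text.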

Predictor selection

Predictive factors were selected using the Boruta feature selection method, with results visualized in Fig. 2A and 2B. Following feature screening, 16 predictors associated with SO were incorporated into model construction. These included: marriage, high-density lipoprotein cholesterol (HDL-C), education level, depression status, glycated hemoglobin (HbA1c), self-care ability, uric acid, gender, systolic blood pressure, cystatin C, diastolic blood pressure, age, pace, triglycerides, triglyceride-glucose index (TyG index), and waist circumference.

Fig. 2. Predictor screening for SO in elderly populations using Boruta feature selection. Panel A: All 30 candidate variables. Green boxes indicate confirmed important features (n = 16), red boxes show the shadow (randomised) features used as the discrimination threshold, and blue boxes represent rejected variables. Panel B: Zoom-in view of the 16 finally retained predictors; all green boxes lie above the maximum importance of the shadow features and were therefore selected for model training
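The comparison against shadow features described in the caption can be sketched as a simple decision rule. The code below is a simplified illustration and not the authors' pipeline. It assumes that importances for the real features and the maximum shadow importance have already been computed for each iteration, counts how often each feature beats the best shadow, and applies a binomial test against chance (p = 0.5). It omits the multiple-comparison correction and the refitting after rejections that full Boruta implementations perform. All names and numbers are hypothetical.

```python
from math import comb
from typing import Dict, List


def binom_tail_ge(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_decisions(
    importances: List[Dict[str, float]],
    shadow_max: List[float],
    alpha: float = 0.01,
) -> Dict[str, str]:
    """Label each feature as confirmed, rejected, or tentative.

    importances[i][name] is the importance of a real feature in iteration i;
    shadow_max[i] is the largest importance among the shuffled (shadow)
    copies in the same iteration.
    """
    n = len(importances)
    decisions: Dict[str, str] = {}
    for name in importances[0]:
        # A "hit" means the feature beat the best shadow feature.
        hits = sum(imp[name] > smax for imp, smax in zip(importances, shadow_max))
        if binom_tail_ge(hits, n) < alpha:
            decisions[name] = "confirmed"  # beats the shadows more often than chance
        elif binom_tail_ge(n - hits, n) < alpha:
            decisions[name] = "rejected"   # beats the shadows less often than chance
        else:
            decisions[name] = "tentative"
    return decisions


# Made-up importances over 10 iterations, for illustration only.
toy_importances = [
    {"waist": 5.0, "noise": 0.1, "maybe": 1.0 if i % 2 == 0 else 3.0}
    for i in range(10)
]
toy_result = boruta_decisions(toy_importances, shadow_max=[2.0] * 10)
```

With these toy numbers, `waist` beats the shadow threshold in every iteration, `noise` never does, and `maybe` does so half the time and therefore remains undecided.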

Prediction model development and comparison

Prior to model training, hyperparameter optimization was conducted using the automated machine learning module within R Statistical Software (version 4.2.2; The R Foundation; http://www.R-project.org) and the Free Statistics analysis platform (Version 2.0; Beijing, China). This module internally employs a Bayesian optimization strategy to automatically identify the optimal hyperparameter combinations for each selected algorithm. The tuning criterion was set to maximize the weighted area under the receiver operating characteristic curve. Because the software is designed to be automated, the specific parameter search ranges and the final hyperparameter values are not exposed to the user. Six machine learning algorithms, namely random forest (RF), extreme gradient boosting (XGBoost), categorical gradient boosting (CatBoost), light gradient boosting machine (LightGBM), support vector machine (SVM), and logistic regression (LR), were subsequently employed to train prediction models based on the screened predictors. The training process employed five-fold cross-validation, allocating 70% of the data to the training set and 30% to the test set for each fold, and receiver operating characteristic (ROC) curves were generated to evaluate model performance. As shown in Fig. 3, five of the six models achieved area under the ROC curve (AUC) values greater than 0.75, indicating good discrimination. Ranked by descending AUC, these models were: CatBoost (AUC = 0.82), RF (AUC = 0.80), LightGBM (AUC = 0.79), LR (AUC = 0.79), and XGBoost (AUC = 0.78). To evaluate model performance holistically, we also computed accuracy, recall, precision, F1-score, and Brier score. As presented in Table 2, given that the goal is early screening for SO risk, the XGBoost model demonstrated the best comprehensive performance among the six models.

Fig. 3. Comparative ROC analysis of six machine learning models for SO risk prediction in elderly populations

Table 2. Performance indicators of six machine learning models

Model AUC Accuracy Recall Precision F1 Score Brier Score
CatBoost 0.8246 0.8913 0.1068 0.6140 0.1789 0.1087
RF 0.7954 0.8856 0.0337 0.3000 0.0601 0.1144
LR 0.7919 0.8875 0.0905 0.4583 0.1455 0.1125
LightGBM 0.7910 0.8843 0.1571 0.4593 0.2275 0.1157
XGBoost 0.7825 0.8887 0.1967 0.5250 0.2770 0.1113
SVM 0.6234 0.8869 0.0444 0.1000 0.0615 0.1131

Although the CatBoost model achieved the highest AUC value (0.82), the model selection was based on a comprehensive performance evaluation consistent with the clinical objective of early screening for sarcopenic obesity. Given that the primary goal of this study is to identify high-risk individuals (i.e., maximizing the true positive detection rate), Recall and F1-score were considered relatively critical evaluation metrics. Among all models, XGBoost demonstrated the highest Recall and F1-score, indicating its superior capability in correctly identifying patients with SO. Therefore, XGBoost was selected as the optimal model for subsequent interpretation and analysis.
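The reason recall and F1 carry more weight than accuracy here can be seen from a short calculation. The sketch below computes the Table 2 metrics from labels and predicted probabilities and applies them to a hypothetical "classifier" that labels everyone as negative, at the cohort's class balance of 266 cases among 2,375 participants. The function name and the 0.5 threshold are assumptions. The source does not report the threshold used.

```python
from typing import Dict, List


def screening_metrics(
    y_true: List[int], y_prob: List[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, recall, precision, F1 and Brier score for binary labels."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(t == 1 and h == 1 for t, h in zip(y_true, y_pred))
    fp = sum(t == 0 and h == 1 for t, h in zip(y_true, y_pred))
    fn = sum(t == 1 and h == 0 for t, h in zip(y_true, y_pred))
    tn = sum(t == 0 and h == 0 for t, h in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Brier score: mean squared difference between predicted probability and outcome.
    brier = sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "brier": brier,
    }


# Everyone predicted negative with probability 0, at the cohort's class balance.
y_true = [1] * 266 + [0] * 2109
all_negative = screening_metrics(y_true, [0.0] * len(y_true))
```

This trivial rule reaches an accuracy of about 0.888 while detecting no cases, so accuracies near 0.89 say little about screening value on their own. When the predicted probabilities are hard 0/1 values, the Brier score reduces to one minus the accuracy. The accuracy and Brier score in each row of Table 2 sum to 1.0000, which would be consistent with a Brier score computed on class labels. The source does not state how the score was calculated.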

Feature importance assessment

The SHAP method was used to determine the importance of each predictive feature in the model. In the SHAP plots, the vertical axis lists the independent variables and the horizontal axis shows the SHAP value. A positive SHAP value represents a positive contribution to the model prediction, and its magnitude indicates how strongly the feature pushes the final prediction. Fig. 4 shows the SHAP feature importance map from the XGBoost model. It can be seen from the figure that waist circumference is the most important feature, exerting a significant impact on SO in the elderly, with a feature weight of 2.11, followed by pace and uric acid, with feature weights of 0.57 and 0.40, respectively. Fig. 5 shows that the importance of the final model predictors ranks in the order of waist circumference, pace, uric acid, high-density lipoprotein cholesterol, systolic blood pressure, triglyceride, triglyceride-glucose index, depressive status, and cystatin C. Among them, waist circumference, pace, and uric acid, which have a greater impact on the prediction results, are positively correlated with the risk of SO. Specifically, the greater the waist circumference, the faster the pace, and the higher the uric acid level, the greater the risk of SO in the elderly.
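SHAP values are based on Shapley values from cooperative game theory. The sketch below computes exact Shapley values by brute force for a tiny, made-up scoring function. It is meant only to show what a per-feature contribution means and is not the procedure used for the XGBoost model, where SHAP libraries rely on much faster tree-specific algorithms. The feature names and effect sizes are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present: FrozenSet[str] = frozenset()
        for f in order:
            with_f = present | {f}
            totals[f] += value(with_f) - value(present)
            present = with_f
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_score(present: FrozenSet[str]) -> float:
    # Made-up risk score: two additive effects plus one interaction term.
    score = 0.0
    if "waist" in present:
        score += 2.0
    if "pace" in present:
        score += 0.5
    if "waist" in present and "uric_acid" in present:
        score += 0.4
    return score


phi = shapley_values(["waist", "pace", "uric_acid"], toy_score)
```

The interaction term is split equally between the two features involved, and the contributions add up to the difference between the full score and the empty baseline. Global importance plots such as Fig. 4 are commonly built by averaging the absolute per-participant values of this kind.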

Fig. 4. SHAP global variable feature importance map

Fig. 5. Importance of predictors in clinical prediction model of SO in the elderly

Association analysis between waist circumference and SO

Multivariable logistic regression was employed to further analyze the association between waist circumference and SO risk in older adults. The results are presented in Table 3. In the unadjusted crude model, waist circumference showed a significant positive association with sarcopenic obesity, with an odds ratio (OR) of 13.29 (95% CI: 8.59–20.56). After PSM (standardized mean difference, SMD, < 0.1 for all matched variables), waist circumference remained significantly associated with SO, with an OR of 14.42 (95% CI: 8.87–23.10).

Table 3. Correlation between waist circumference and SO in the elderly

Variable Overall OR (95% CI) P value Propensity-score matched OR (95% CI) P value
Crude Model 13.29 (8.59–20.56) < 0.001 14.42 (8.87–23.10) < 0.001
Model 1 14.38 (9.25–22.34) < 0.001 12.15 (7.83–18.85) < 0.001
Model 2 11.3 (7.16–17.84) < 0.001 9.59 (5.8–15.87) < 0.001
Model 3 11.55 (7.29–18.31) < 0.001 10.55 (6.21–17.91) < 0.001

Adjusted sequentially for residual confounding variables through three models:

  1. Model 1 (age, gender, marriage, education level);

  2. Model 2 (Model 1 covariates plus systolic blood pressure, diastolic blood pressure, high-density lipoprotein cholesterol, uric acid, triglyceride, triglyceride-glucose index, glycosylated hemoglobin, cystatin C);

  3. Model 3 (Model 2 covariates plus depressive status, self-care ability, pace).

After adjustment, there was still a significant correlation between waist circumference and sarcopenic obesity. The OR values of different models were consistent, and all models (P < 0.001) confirmed that there was a significant correlation between waist circumference and the risk of sarcopenic obesity, indicating that waist circumference was an important risk factor for SO in the elderly.
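The odds ratios and intervals in Table 3 can be related back to logistic regression coefficients. The sketch below assumes Wald-type 95% confidence intervals, which are symmetric on the log-odds scale. The source does not say how the intervals were obtained, so this is an assumption, and the function names are hypothetical.

```python
from math import exp, log
from typing import Tuple


def log_odds_from_ci(lower: float, upper: float, z: float = 1.96) -> Tuple[float, float]:
    """Recover the coefficient and its standard error from a Wald CI for an OR."""
    beta = (log(lower) + log(upper)) / 2  # midpoint of the CI on the log scale
    se = (log(upper) - log(lower)) / (2 * z)
    return beta, se


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald CI from a logistic regression coefficient."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Crude model, overall cohort (Table 3): OR 13.29, 95% CI 8.59 to 20.56.
beta, se = log_odds_from_ci(8.59, 20.56)
or_point, ci_low, ci_high = odds_ratio_with_ci(beta, se)
```

The point estimate recovered from the interval alone agrees with the reported OR of 13.29 to two decimals, so the crude estimate is at least internally consistent with a Wald interval.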

Discussions

SO is an age-related complex geriatric syndrome characterized by decreased skeletal muscle strength, reduced muscle mass, impaired physical function, and concurrent obesity [22, 23]. Multiple studies [24–27] have demonstrated that SO in elderly populations is frequently associated with comorbidities including chronic liver disease, type 2 diabetes mellitus, osteoporosis, and cardiovascular disease. Notably, elderly individuals with SO exhibit significantly higher risks of falls, hospitalizations, metabolic dysregulation, psychological depression, and cognitive impairment compared to those presenting with either sarcopenia or obesity alone [28, 29].

Epidemiological studies indicate that the prevalence of SO in Asian populations stands at 12% [30]. In the current study cohort of 2,375 participants, 266 cases were identified with sarcopenic obesity, yielding a prevalence of 11.20% that aligns with existing epidemiological data. Globally, over 20% of the elderly are affected by SO, imposing substantial burdens on individuals and healthcare systems. Early identification of high-risk elderly populations enables timely interventions, thereby mitigating adverse health outcomes [5, 20]. Amid rapid technological progress, information-based diagnostic tools have become vital for enhancing clinical efficiency and optimizing healthcare resources. Artificial intelligence, particularly machine learning, has gained prominence in medicine due to its adaptive, nonlinear processing and efficient learning capabilities, establishing itself as a crucial predictive analytical tool [31, 32].

To investigate the predictive capacity of individual variables for sarcopenic obesity, this study employed feature variables selected through the Boruta feature selection method and developed predictive models using six machine learning algorithms: Random Forest (RF), Extreme Gradient Boosting (XGBoost), Categorical Gradient Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), Support Vector Machine (SVM), and Logistic Regression (LR). The main purpose of this study is the early detection and screening of elderly populations at high risk of SO. Model performance comparisons showed that the recall rates and F1 scores of all six models were relatively low. This may be attributed to the samples being derived from a single database and to class imbalance, with far fewer positive than negative samples. Among these models, the XGBoost model demonstrated the best performance in predicting the risk of SO, achieving the highest recall rate and F1 score among the six models, with an AUC of 0.78 and an accuracy of 0.89, indicating relatively high predictive accuracy. Additionally, LightGBM and CatBoost also exhibited high AUC and accuracy, suggesting that gradient-boosting-type machine learning algorithms hold comparative advantages in addressing such complex nonlinear problems. In contrast, support vector machines (SVM), logistic regression (LR), and random forests (RF) showed relatively weaker performance, particularly in recall and precision. This limitation may stem from their inherent challenges in handling high-dimensional data and nonlinear relationships. Taken together, these results support the stability and reliability of the XGBoost model in predicting SO risk in the elderly.

Based on the CHARLS database, this study systematically compared multiple machine learning algorithms and innovatively identified indicators such as pace and uric acid as key predictors of SO through SHAP analysis. This finding differs from the focus of previous studies [8, 9], which primarily emphasized traditional body composition indicators (such as BMI and grip strength) or clinical biochemical data. It reveals predictive features with considerable potential that are often overlooked in community-based populations, offering new directions for exploring the early mechanisms of SO. In terms of model performance, our best model (XGBoost) achieved an AUC of 0.78. Although this value shows a certain gap compared to the performance demonstrated by models based on clinical data cohorts [10] and multicenter studies [11], the strength of our model lies in the fact that its input variables are all derived from routine community health surveys, making them more accessible and cost-effective. In resource-limited primary healthcare settings, it provides a practical and well-generalizable tool for large-scale, efficient preliminary screening of SO.

The SHAP analysis revealed that the three most influential characteristics contributing to the prediction of SO were waist circumference, pace, and uric acid. Among these, larger waist circumference and higher uric acid levels were associated with an increased risk of SO, which is consistent with previous studies [4, 33–35]. As a sensitive indicator of abdominal fat accumulation, waist circumference is closely related to both fat distribution and metabolic state. The increase in abdominal fat not only contributes to obesity but also triggers metabolic abnormalities such as chronic inflammation and insulin resistance [36, 37]. Hyperuricemia or elevated uric acid levels are associated with metabolic disorders, including obesity. Both in vivo and in vitro studies suggest that uric acid may induce insulin resistance and impair insulin signal transduction through reactive oxygen species-related pathways [38]. Given that insulin resistance is a core mechanism of SO [39], these abnormalities could further affect muscle mass and function, ultimately leading to SO. Contrary to conventional expectation and the consensus from previous studies [40–42], our analysis revealed a positive association between faster pace and an increased risk of SO in this specific cohort. It is imperative to interpret this observed correlation with caution, as it contradicts the well-established conclusion that slower pace predicts poor health outcomes. We hypothesize that this counterintuitive finding may be attributed to several plausible explanations. First, the study focuses on elderly populations, where short-distance walking speed measurements may not fully reflect real-world gait patterns. Some patients may exhibit shortened stride length and increased walking speed due to abnormal muscle function or disease-related factors (e.g., festinating gait), even though their muscle mass has actually declined. Similarly, obese patients may compensate by walking faster to maintain mobility, despite having relatively insufficient muscle mass, leading to a compensatory increase in pace. Second, potential confounding factors may also influence the results. For instance, chronic low-grade inflammation or specific previous exercise habits could independently affect both gait performance and body composition, thereby partially explaining the spurious association observed in the current study. Future studies should conduct further stratified analyses to explore whether the relationship between pace and SO varies across subgroups. Additionally, longitudinal tracking data should be used to analyze the temporal sequence between changes in pace and the onset of SO, thereby ruling out reverse causality. Furthermore, combining muscle biopsies or imaging techniques to assess muscle quality and fat infiltration in individuals with faster pace could provide a more comprehensive understanding of the relationship between pace and SO.

In terms of intervention, the risk of SO in the elderly may be mitigated by addressing waist circumference and uric acid levels. For waist circumference management, healthcare providers can assist the elderly in controlling abdominal fat accumulation through personalized dietary plans (e.g., calorie restriction, increased dietary fiber intake) and tailored exercise programs (such as Tai Chi, Baduanjin, or swimming). Regarding uric acid control, dietary management should include limiting high-purine foods (e.g., organ meats and seafood) while avoiding exercise modes that may elevate uric acid levels (e.g., sprinting or competitive ball sports). Regular monitoring of waist circumference, grip strength, and pace, together with periodic uric acid testing (with adjustments as needed), is essential to reduce the risk of SO in this population.

Although numerous studies have utilized the CHARLS database to explore sarcopenia and its related factors in older adults, most of these works have focused solely on sarcopenia—such as its influencing factors, or its associations with health outcomes like cognitive impairment, depression, or frailty in the elderly—while predictive model research specifically targeting SO, a complex syndrome, remains relatively scarce [15, 43, 44]. SO is not simply the co-occurrence of sarcopenia and obesity, but rather a distinct clinical syndrome with unique pathophysiological mechanisms; its risk factors and intervention strategies differ from those of either condition alone [5]. Therefore, developing prediction tools specifically for SO holds significant clinical importance. We specifically selected the CHARLS 2015 wave data because it concurrently includes physical performance indicators and comprehensive blood biomarkers, providing an essential data foundation for the comprehensive assessment of SO. On this basis, we systematically introduced and compared multiple machine learning algorithms and employed the SHAP method to enhance model interpretability, thereby identifying novel, easily accessible community-based predictors such as walking speed and uric acid. This approach not only overcomes the reliance of traditional models on complex body composition measurements but also provides a feasible and efficient tool for the early identification of SO in community settings.

Limitations and prospects

In this study, SO was defined as 'possible sarcopenia' (based on the AWGS 2019 criteria) combined with BMI ≥ 28 kg/m². It should be noted that the definition of 'possible sarcopenia' is mainly used for early screening and lacks direct assessment of muscle mass. Therefore, these findings may reflect a disease burden that is more closely related to the risk of sarcopenia than to SO defined by a comprehensive muscle mass assessment. When interpreting the conclusions of this study, this methodological limitation should be carefully considered. Future research should define SO in conjunction with direct muscle mass measurements to more accurately represent the disease and its associated health effects.
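
The case definition described above can be written as a simple labelling rule. The sketch below is a minimal illustration, assuming the AWGS 2019 screening thresholds for possible sarcopenia (handgrip strength below 28 kg in men or below 18 kg in women, or a five-time chair stand time of 12 s or more) together with the BMI cut-off of 28 kg/m² stated in this section. The function and argument names are hypothetical, and the exact variable coding used in the study may differ.

```python
def possible_sarcopenia(sex, grip_kg, chair_stand_s):
    """Assumed AWGS 2019 screening rule: low grip strength OR slow chair stand."""
    grip_cutoff = 28.0 if sex == "male" else 18.0  # kg, sex-specific threshold
    low_strength = grip_kg < grip_cutoff
    low_performance = chair_stand_s >= 12.0  # seconds for five chair stands
    return low_strength or low_performance


def sarcopenic_obesity(sex, grip_kg, chair_stand_s, weight_kg, height_m):
    """Label SO as possible sarcopenia combined with BMI >= 28 kg/m^2."""
    bmi = weight_kg / height_m ** 2
    return possible_sarcopenia(sex, grip_kg, chair_stand_s) and bmi >= 28.0
```

Note that no muscle mass measurement enters this rule, which is exactly the limitation discussed in the paragraph above.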

Furthermore, while the use of automated machine learning software streamlined the analytical workflow, it also introduced limitations in methodological transparency. The specific hyperparameter search spaces utilized by the optimization algorithm and their final determined values were not accessible to the user, which may affect the full reproducibility of the model tuning process. Future research aiming for utmost transparency should consider employing open-source frameworks that provide complete control and logging of the hyperparameter optimization.

The lack of external validation is an important limitation of this study. Although a strict internal validation method was used to evaluate the performance of the prediction model, the model was not validated on independent external populations (such as CHARLS 2018, CHARLS 2020, other regional populations, or individuals in different medical environments). As a result, the findings cannot determine whether the model can be generalized and applied to a wider population or different medical environments. This also limits the potential clinical application value of the model at the current stage. Future research should use prospectively collected datasets or established independent longitudinal cohorts (such as CHARLS, HRS, etc.) for strict external validation and temporal validation to evaluate the robustness and generalizability of the model in real-world settings.
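
For the external validation proposed above, discrimination and calibration would need to be recomputed on predictions for the independent cohort. The following minimal sketch shows the two metrics reported in this study, AUROC and Brier score, in plain Python. It assumes that binary outcome labels and predicted probabilities for an external cohort are available; the function names and inputs are hypothetical and no particular modelling library is implied.

```python
def auroc(y_true, y_prob):
    """Rank-based AUROC: chance that a random case scores above a random
    non-case, with ties counted as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

A marked drop in AUROC or rise in Brier score relative to the internal validation results would indicate limited generalizability.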

Additionally, this study is limited by the absence of a direct performance benchmark against previously established SO prediction models (e.g., those developed from the NHANES database or other clinical cohorts). Without such comparative analysis, it is challenging to precisely quantify the incremental value and comparative advantage of our model over existing tools. Future research should prioritize such benchmarking efforts on standardized datasets to clearly delineate the clinical utility and superiority of the proposed model.

Furthermore, future studies should employ multiple feature selection methods (e.g., LASSO, Recursive Feature Elimination) for comparative analysis to further verify the robustness of the predictors identified in this study.

As this study is based on a single-center database, the sample size is limited and potential category imbalance may exist. In future research, multicenter prospective cohort studies could be conducted to expand the sample size, balance category distribution, and incorporate more comprehensive research variables.
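
The degree of category imbalance mentioned above can be quantified before any rebalancing is attempted. The sketch below is a minimal illustration with hypothetical names; it reports outcome prevalence and the negative-to-positive ratio, which is one commonly suggested value for class weighting options such as the `scale_pos_weight` parameter in XGBoost. Whether such weighting was applied in this study is not stated, so this is an assumption about a possible approach rather than a description of the authors' method.

```python
def imbalance_summary(labels):
    """Summarize class balance for binary labels coded 1 (SO) and 0 (no SO)."""
    n_total = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n_total - n_pos
    if n_pos == 0:
        raise ValueError("no positive cases in labels")
    return {
        "prevalence": n_pos / n_total,     # share of SO cases
        "neg_pos_ratio": n_neg / n_pos,    # candidate positive-class weight
    }
```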

Acknowledgements

The authors acknowledge both the CHARLS research team and all participants for their invaluable contributions to this study.

Authors’ contributions

All authors have made substantial contributions to the reported work, including the conception and design of the study, execution, data acquisition, analysis, and interpretation. They have approved the submitted version of the manuscript and agreed to be accountable for all aspects of the research. All authors have reviewed and approved the final manuscript.

Funding

This study was supported by:

The 2024 Hospital Self-Funded Scientific Research Cultivation Project of the First Affiliated Hospital of Guangxi Medical University—Clinical Nursing Research Climbing Program (Project Title: Clinical Study on Early Resistance Training for Preventing Venous Thromboembolism in Mechanically Ventilated ICU Patients; Project Number: YYZS2023018).

The Guangxi Natural Science Foundation General Program (Regional High-Incidence Disease Special Project-Guangxi Medical University) (Project Title: Study on the Effects and Potential Mechanisms of Early Pulmonary Rehabilitation on Muscle Atrophy and Diaphragm Function in Mechanically Ventilated Patients; Project Number: 2025GXNSFAA069940).

Data availability

The research datasets are publicly available through the China Health and Retirement Longitudinal Study (CHARLS) repository: https://charls.pku.edu.cn.

Competing interests

The authors declare no competing interests.

Consent for publication

Not applicable.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Stenholm S, Harris TB, Rantanen T, Visser M, Kritchevsky SB, Ferrucci L. Sarcopenic obesity: definition, cause and consequences. Curr Opin Clin Nutr Metab Care. 2008;11(6):693–700. 10.1097/MCO.0b013e328312c37d.
  • 2. Cavazzotto TG, et al. Association between cognitive performance and sarcopenic obesity in older adults with Alzheimer’s disease. Dement Neuropsychol. 2022;16(1):28–32. 10.1590/1980-5764-dn-2021-0039.
  • 3. Huang S-W, Lee Y-H, Liao C-D, Escorpizo R, Liou T-H, Lin H-W. Association of physical functional activity impairment with severity of sarcopenic obesity: findings from National health and nutrition examination survey. Sci Rep. 2024;14(1):3787. 10.1038/s41598-024-54102-z.
  • 4. Chen JY, et al. Association between sarcopenic obesity, obesity, sarcopenia and quality of life in middle-aged and older Chinese: the Guangzhou biobank cohort study. Qual Life Res. 2025;34(7):1995–2004. 10.1007/s11136-025-03960-9.
  • 5. Axelrod CL, Dantas WS, Kirwan JP. Sarcopenic obesity: emerging mechanisms and therapeutic potential. Metabolism. 2023;146:155639. 10.1016/j.metabol.2023.155639.
  • 6. Batsis JA, Villareal DT. Sarcopenic obesity in older adults: aetiology, epidemiology and treatment strategies. Nat Rev Endocrinol. 2018;14(9):513–37. 10.1038/s41574-018-0062-9.
  • 7. Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med. 2018;284(6):603–19. 10.1111/joim.12822.
  • 8. Bae J-H, Seo J, Li X, Ahn S, Sung Y, Kim DY. Neural network model for prediction of possible sarcopenic obesity using Korean National fitness award data (2010–2023). Sci Rep. 2024;14(1):14565. 10.1038/s41598-024-64742-w.
  • 9. Zambon Azevedo V, Ponnaiah M, Bel Lassen P, Ratziu V, Oppert J-M. A diagnostic proposal for sarcopenic obesity in adults based on body composition phenotypes. Clin Nutr ESPEN. 2022;52:119–30. 10.1016/j.clnesp.2022.10.010.
  • 10. Xu M, et al. Construction of a prediction model for sarcopenic obesity based on machine learning. Front Public Health. 2025;13:1576338. 10.3389/fpubh.2025.1576338.
  • 11. Lian R, et al. Development and multi-center cross-setting validation of an explainable prediction model for sarcopenic obesity: a machine learning approach based on readily available clinical features. Aging Clin Exp Res. 2025;37(1):63. 10.1007/s40520-025-02975-z.
  • 12. Damluji AA, et al. Sarcopenia and cardiovascular diseases. Circulation. 2023;147(20):1534–53. 10.1161/circulationaha.123.064071.
  • 13. Chen L-K, et al. Asian working group for sarcopenia: 2019 consensus update on sarcopenia diagnosis and treatment. J Am Med Dir Assoc. 2020;21(3):300–7. 10.1016/j.jamda.2019.12.012.
  • 14. Si Y, Hanewald K, Chen S, Li B, Bateman H, Beard J. Life-course inequalities in intrinsic capacity and healthy ageing, China. Bull World Health Organ. 2023;101(5):307–316C. 10.2471/BLT.22.288888.
  • 15. Hu Y, Peng W, Ren R, Wang Y, Wang G. Sarcopenia and mild cognitive impairment among elderly adults: the first longitudinal evidence from CHARLS. J Cachexia Sarcopenia Muscle. 2022;13(6):2944–52. 10.1002/jcsm.13081.
  • 16. Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: the China health and retirement longitudinal study (CHARLS). Int J Epidemiol. 2014;43(1):61–8. 10.1093/ije/dys203.
  • 17. Han A, Bokshan S, Marcaccio S, DePasse J, Daniels A. Diagnostic criteria and clinical outcomes in sarcopenia research: a literature review. JCM. 2018;7(4):70. 10.3390/jcm7040070.
  • 18. Chen L-K, et al. Sarcopenia in Asia: consensus report of the Asian working group for sarcopenia. J Am Med Dir Assoc. 2014;15(2):95–101. 10.1016/j.jamda.2013.11.025.
  • 19. Chen C, Lu FC, Department of Disease Control Ministry of Health, China PR. The guidelines for prevention and control of overweight and obesity in Chinese adults. Biomed Environ Sci. 2004;17:1–36.
  • 20. Ji T, Li Y, Ma L. Sarcopenic obesity: an emerging public health problem. Aging Dis. 2022;13(2):379. 10.14336/AD.2021.1006.
  • 21. Zhou H, Xin Y, Li S. A diabetes prediction model based on Boruta feature selection and ensemble learning. BMC Bioinformatics. 2023;24(1):224. 10.1186/s12859-023-05300-5.
  • 22. Qiu H, Zheng W, Zhou X, Liu Q, Zhao X. Training modalities for elder sarcopenic obesity: a systematic review and network meta-analysis. Front Nutr. 2025;12:1537291. 10.3389/fnut.2025.1537291.
  • 23. Flores-Flores O, et al. Sarcopenia and sarcopenic obesity among community-dwelling Peruvian adults: a cross-sectional study. PLoS One. 2024;19(4):e0300224. 10.1371/journal.pone.0300224.
  • 24. Bunchorntavakul C. Sarcopenia and frailty in cirrhosis. Med Clin North Am. 2023;107(3):589–604. 10.1016/j.mcna.2022.12.007.
  • 25. Sanz-Cánovas J, et al. Management of type 2 diabetes mellitus in elderly patients with frailty and/or sarcopenia. IJERPH. 2022;19(14):8677. 10.3390/ijerph19148677.
  • 26. Liu C, Liu N, Xia Y, Zhao Z, Xiao T, Li H. Osteoporosis and sarcopenia-related traits: a bi-directional Mendelian randomization study. Front Endocrinol. 2022;13:975647. 10.3389/fendo.2022.975647.
  • 27. Zuo X, et al. Sarcopenia and cardiovascular diseases: a systematic review and meta-analysis. J Cachexia Sarcopenia Muscle. 2023;14(3):1183–98. 10.1002/jcsm.13221.
  • 28. Booranasuksakul U, Tsintzas K, Macdonald I, Stephan BCM, Siervo M. Application of a new definition of sarcopenic obesity in middle-aged and older adults and association with cognitive function: findings from the national health and nutrition examination survey 1999–2002. Clin Nutr ESPEN. 2024;63:919–28. 10.1016/j.clnesp.2024.08.017.
  • 29. Chen L-K. Unique compositional signature, pathophysiology and clinical implications of sarcopenic obesity. Arch Gerontol Geriatr. 2024;124:105501. 10.1016/j.archger.2024.105501.
  • 30. Gao Q, et al. Global prevalence of sarcopenic obesity in older adults: a systematic review and meta-analysis. Clin Nutr. 2021;40(7):4633–41. 10.1016/j.clnu.2021.06.009.
  • 31. Liew BXW, Kovacs FM, Rügamer D, Royuela A. Machine learning versus logistic regression for prognostic modelling in individuals with non-specific neck pain. Eur Spine J. 2022;31(8):2082–91. 10.1007/s00586-022-07188-w.
  • 32. Liu H, Tripathy RK. Machine learning and deep learning for healthcare data processing and analyzing: towards data-driven decision-making and precise medicine. Diagnostics. 2025;15(8):1051. 10.3390/diagnostics15081051.
  • 33. Luo Y, et al. Prevalence of sarcopenic obesity in the older non-hospitalized population: a systematic review and meta-analysis. BMC Geriatr. 2024;24(1):357. 10.1186/s12877-024-04952-z.
  • 34. Yu EH, Lee HJ, Kim HJ, Kim IH, Joo JK, Na YJ. Correlation of sarcopenic obesity on various cardiometabolic risk factors and fracture risk in mid-aged Korean women. J Menopausal Med. 2023. 10.6118/jmm.23014.
  • 35. Li Z, Yin S, Zhao G, Cao X. Association between sarcopenic obesity and osteoarthritis: the potential mediating role of insulin resistance. Exp Gerontol. 2024;197:112611. 10.1016/j.exger.2024.112611.
  • 36. Arif M, Gaur DK, Gemini N, Iqbal ZA, Alghadir AH. Correlation of percentage body fat, waist circumference and waist-to-hip ratio with abdominal muscle strength. Healthcare. 2022;10(12):2467. 10.3390/healthcare10122467.
  • 37. Han SY, Kim NH, Kim DH, Kim YH, Park YK, Kim SM. Associations between body mass index, waist circumference, and myocardial infarction in older adults aged over 75 years: a population-based cohort study. Medicina (B Aires). 2022;58(12):1768. 10.3390/medicina58121768.
  • 38. Wan X, et al. Uric acid regulates hepatic steatosis and insulin resistance through the NLRP3 inflammasome-dependent mechanism. J Hepatol. 2016;64(4):925–32. 10.1016/j.jhep.2015.11.022.
  • 39. Hong S, Choi KM. Sarcopenic obesity, insulin resistance, and their implications in cardiovascular and metabolic consequences. IJMS. 2020;21(2):494. 10.3390/ijms21020494.
  • 40. Booranasuksakul U, Macdonald IA, Stephan BCM, Siervo M. Body composition, sarcopenic obesity, and cognitive function in older adults: findings from the National Health and Nutrition Examination Survey (NHANES) 1999–2002 and 2011–2014. Journal of the American Nutrition Association. 2024;43(6):539–52. 10.1080/27697061.2024.2333310.
  • 41. Guimarães NS, Reis MG, Tameirão DR, De Castro Cezar NO, Leopoldino AAO, Magno LAV. Factors associated with sarcopenic obesity in Brazilian adults and older people: systematic review and meta-analysis of observational studies. Geriatr Gerontol Int. 2024;24(7):661–74. 10.1111/ggi.14918.
  • 42. Choi S, et al. The impact of the physical activity level on sarcopenic obesity in community-dwelling older adults. Healthcare. 2024;12(3):349. 10.3390/healthcare12030349.
  • 43. Xu W, et al. Sarcopenia and frailty among older Chinese adults: findings from the CHARLS study. PLoS One. 2024;19(11):e0312879. 10.1371/journal.pone.0312879.
  • 44. Liu Y, Cui J, Cao L, Stubbendorff A, Zhang S. Association of depression with incident sarcopenia and modified effect from healthy lifestyle: the first longitudinal evidence from the CHARLS. J Affect Disord. 2024;344:373–9. 10.1016/j.jad.2023.10.012.

Articles from BMC Geriatrics are provided here courtesy of BMC
