Summary
This study aims to identify and predict latent trajectories of depression and chronic disease among middle-aged and older adults in China using data-driven and interpretable machine learning methods, and to explore key factors that promote healthy aging. We analyzed longitudinal data from 13,073 middle-aged and older adults in the China Health and Retirement Longitudinal Study (CHARLS). Group-based multi-trajectory modeling (GBMTM) was applied to identify latent trajectory groups for depression and chronic disease status. Candidate predictors included sociodemographic characteristics, health conditions, and lifestyle factors. Machine learning models and dynamic nomograms were used to predict trajectory groups, and model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) and decision curve analysis (DCA). Three trajectory groups were identified: a normal healthy trajectory group (26.9%), a potential depression and disease increase trajectory group (55.6%), and a high depression and disease burden trajectory group (17.5%). Older age, disability, shorter sleep duration, and poor self-reported health status were associated with a higher likelihood of belonging to the potential depression and disease increase trajectory group or the high depression and disease burden trajectory group, particularly among urban women. In conclusion, this study demonstrates that GBMTM and machine learning models can effectively identify and predict depression and chronic disease trajectories. The identified predictors are crucial for developing targeted interventions to promote healthy aging among middle-aged and older adults.
Keywords: group-based multi-trajectory modeling, successful aging, depression, chronic disease, predictive models
Introduction
As global population aging accelerates, China is facing severe social and health challenges (1). The country's older adult population now exceeds that of all European nations combined, making aging a major public health concern (2). Chronic diseases are among the most serious consequences of aging, contributing to a substantial societal burden and imposing significant psychological and economic stress on both patients and their families (3). In China, 75.8% of older adults have at least one chronic condition (4), and the risk of multimorbidity increases with age (5).
Depression, characterized by low mood and anhedonia, is also common in older adults (6,7). It is strongly associated with chronic illness, and individuals with multiple conditions have a higher risk of depression (8,9). Furthermore, studies have shown that the incidence of depression exhibits clear temporal dynamics and that the risk of chronic diseases increases progressively with age (10). Despite the increasing prevalence of depression and chronic diseases, research on their joint developmental trajectories remains relatively limited. Investigating these joint trajectories is therefore crucial for formulating effective prevention and intervention strategies.
Although prior studies have investigated the trajectories of depression or chronic disease separately, few have integrated both to assess the heterogeneity and interrelation in their progression (11,12). Traditional statistical models, such as multiple regression, often fail to capture nonlinear trends and higher-order interactions (13). Machine learning methods address these limitations by modeling complex, nonlinear relationships and improving predictive accuracy (14). However, the application of machine learning to chronic disease trajectories remains limited (15).
This study aims to address the following key scientific questions. First, using group-based multi-trajectory modeling (GBMTM), we analyze the dynamic changes in depression and chronic disease trajectories over eight years among adults aged 45 and older in the China Health and Retirement Longitudinal Study (CHARLS). Second, based on feature selection techniques including Least Absolute Shrinkage and Selection Operator (LASSO) and Recursive Feature Elimination (RFE), we construct and evaluate nine machine learning algorithms, namely Logistic Regression (LR), Multi-Layer Perceptron (MLP), LightGBM, Elastic Net (Enet), Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to systematically classify and predict trajectory groups of depression and chronic disease. The performance of each algorithm is assessed in terms of predictive accuracy. Third, we use SHAP (SHapley Additive exPlanations), logistic regression, and nomograms to identify and visualize the most influential predictors of the identified trajectories.
Materials and Methods
Data source
This study is based on data from the CHARLS conducted in 2011, 2013, 2015, and 2018. CHARLS is a nationally representative cohort study, with the baseline survey conducted in 2011 through a multistage probability sampling method. The survey covers 28 provinces, 150 counties, and 450 villages or urban communities across China. Follow-up surveys were conducted in 2013, 2015, and 2018, aiming to comprehensively collect data on the health status and related factors of older adults in China. CHARLS was approved by the Biomedical Ethics Committee of Peking University (IRB00001052-11015), and all participants provided written informed consent (16). The present analysis adheres to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines (17).
In this study, the exclusion criteria were as follows: participants who were younger than 45 years of age at baseline (n = 2,547), those missing baseline depression scores or chronic disease data (n = 1,606), or those lacking key information (e.g., education level, marital status, smoking habits, drinking habits, self-reported health status, and related diseases) (n = 24,565). Additionally, 4,569 participants were excluded due to missing follow-up data on depression and chronic disease. To ensure the robustness of the results, 6,691 individuals who had data from only one measurement were also excluded. Ultimately, a total of 13,073 participants with complete data from at least two time points were included in the final analysis (Supplemental Figure S1, https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104).
Assessment of depression and chronic disease count
Depressive symptoms were assessed using the 10-item version of the Center for Epidemiologic Studies Depression Scale (CES-D 10), which has demonstrated validity for evaluating depression among Chinese adults (18). The scale consists of 10 items, each with four response options referring to the frequency of symptoms over the past week: i) rarely or none of the time (< 1 day), ii) some or a little of the time (1-2 days), iii) occasionally or a moderate amount of time (3-4 days), and iv) most or all of the time (5-7 days). The total score ranges from 0 to 30, with higher scores indicating more severe depressive symptoms. In this study, a CES-D 10 score of ≥ 10 was used to define the presence of depressive symptoms (19).
The number of chronic diseases was assessed using a standardized questionnaire, which asked participants whether they had ever been diagnosed by a physician with any of the following conditions: hypertension, dyslipidemia, diabetes, cancer, chronic pulmonary disease, liver disease, heart disease, stroke, kidney disease, digestive system diseases, emotional or psychiatric disorders, memory-related diseases, arthritis or rheumatic diseases, or asthma. The total number of chronic diseases was then calculated (ranging from 0 to 14). Participants were classified into four groups: 0 (no chronic diseases), 1 (one chronic disease), 2 (two chronic diseases), and ≥ 3 (three or more chronic diseases) (20).
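The scoring rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, the 0-3 item coding is inferred from the stated 0-30 score range, and the reverse scoring of items 5 and 8 (the two positively worded items of the standard CES-D 10) is an assumption that the text does not state.

```python
from typing import Sequence

# Assumption (not stated in the text): items 5 and 8 are the positively
# worded items of the standard CES-D 10 and are reverse scored.
REVERSE_ITEMS = (5, 8)


def cesd10_total(responses: Sequence[int], reverse_items=REVERSE_ITEMS) -> int:
    """Sum ten item responses coded 0..3 (0 = rarely/none, 3 = most/all of the time)."""
    if len(responses) != 10 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected ten item responses coded 0-3")
    total = 0
    for item_no, r in enumerate(responses, start=1):
        total += (3 - r) if item_no in reverse_items else r
    return total  # ranges from 0 to 30


def has_depressive_symptoms(total: int, cutoff: int = 10) -> bool:
    """Apply the >= 10 cutoff used in the study."""
    return total >= cutoff


def chronic_disease_category(n_conditions: int) -> str:
    """Collapse a count of the 14 listed conditions into 0, 1, 2, or >=3."""
    if not 0 <= n_conditions <= 14:
        raise ValueError("count must be between 0 and 14")
    return str(n_conditions) if n_conditions < 3 else ">=3"
```

For example, a respondent who answers "some or a little of the time" (coded 1) to the eight negatively worded items and "most or all of the time" (coded 3) to both positively worded items scores 8 under these assumptions, which is below the cutoff.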
Heterogeneous trajectory grouping
This study utilized the GBMTM to identify groups of individuals with similar trajectories of depressive symptoms and chronic disease count. GBMTM is primarily used to analyze longitudinal data, aiming to cluster individuals with comparable developmental patterns and to identify distinct trajectory subgroups (21). Detailed information on the implementation of the trajectory model is provided in Supplemental Method 1 (https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104).
The selection of the trajectory model was based on several criteria, including the Bayesian Information Criterion (BIC), Akaike's Information Criterion (AIC), log-likelihood (LL), and entropy. Additionally, to further validate the model's robustness, we used Average Posterior Probability (AvePP, requiring a value above 0.7) and the Predicted Probability of Group Membership (PPGM, requiring a value above 5%) as supplementary statistical indicators (22).
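The two adequacy checks can be illustrated with a short Python sketch. It assumes that each participant has a vector of posterior group-membership probabilities and is assigned to the group with the highest probability; the function name, the use of the assigned share as a stand-in for PPGM, and the example matrix are all hypothetical.

```python
def trajectory_adequacy(posteriors, min_avepp=0.7, min_share=0.05):
    """Check AvePP and group share for each trajectory group.

    posteriors: list of rows, one per participant, each row holding the
    posterior probability of membership in every group (rows sum to 1).
    """
    n_groups = len(posteriors[0])
    prob_sums = [0.0] * n_groups
    counts = [0] * n_groups
    for row in posteriors:
        # Maximum-probability assignment rule.
        g = max(range(n_groups), key=row.__getitem__)
        prob_sums[g] += row[g]
        counts[g] += 1
    n = len(posteriors)
    report = []
    for g in range(n_groups):
        # AvePP: mean posterior probability among participants assigned to g.
        avepp = prob_sums[g] / counts[g] if counts[g] else 0.0
        share = counts[g] / n
        report.append({
            "group": g + 1,
            "avepp": avepp,
            "share": share,
            "adequate": avepp > min_avepp and share > min_share,
        })
    return report


# Hypothetical posterior probabilities for four participants and three groups.
example = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.85, 0.05],
    [0.20, 0.10, 0.70],
]
report = trajectory_adequacy(example)
```

In this toy example the third group has an AvePP of exactly 0.7, so it fails the strict "above 0.7" requirement even though its share exceeds 5%.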
Predictive variables
To predict the trajectory groups of depression, this study initially screened 14 variables based on previous research (20,23). These variables were consistently recorded across all four waves of data and were considered relevant to depression status. The specific descriptions and definitions of the variables can be found in Supplemental Method 2 (https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104), with detailed information provided in Supplemental Table S1 (https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104).
The predictive variables in the study were divided into three major categories: sociodemographic characteristics, health status, and health behaviors.
i) Sociodemographic Characteristics: This category includes basic information such as age, gender, education level, marital status, and place of residence.
ii) Health Status: This category encompasses various factors closely related to both physical and mental health, including average daily sleep duration, self-reported health status (compared to the previous year), Activities of Daily Living (ADL), Instrumental Activities of Daily Living (IADL), cognitive status (MMSE), and the presence of any disability.
iii) Health Behaviors: This includes smoking status, drinking status, participation in exercise, and involvement in leisure activities.
To minimize the impact of missing data on model prediction performance, missing values were imputed using the missForest algorithm, which assumes that the data are missing at random. This algorithm performs well with mixed-type data, and its implementation is described in Supplemental Method 3 (https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104).
Analysis methods
Feature selection
Simplicity is one of the core principles in building predictive models to prevent overfitting, which can be achieved through feature selection (24). Thus, this study employed a two-stage selection approach, including LASSO and RFE. In RFE, RF, DT, and Naive Bayes (NB) were compared as base models. A detailed description of the feature selection process is provided in Supplemental Method 4 (https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104).
Initially, LASSO and RFE were used separately to perform feature selection on the initial set of variables. The LASSO model applies L1 regularization to select a sparse subset of features that are significantly associated with the target variable. LASSO has a notable advantage in handling multicollinearity among features and generates models with high interpretability (25). During the RFE phase, recursive feature selection was conducted based on DT, LR, and NB base models to leverage the ability of different models to assess feature importance.
Ultimately, considering both the LASSO and RFE selection results, six key variables were identified (Supplemental Figure S2, https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104). This multi-method feature selection strategy not only effectively improved the predictive performance of the model but also significantly reduced redundant features, further enhancing the model's simplicity and generalizability.
Development and validation of trajectory group prediction models
The development and validation of predictive models followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement (26). Although a single algorithm may suffice for prediction in practical applications, to avoid model selection bias, we tested multiple algorithms, including LR, Enet, KNN, LightGBM, DT, MLP, RF, SVM, and XGBoost (27-29). A detailed description of the methods is provided in Supplemental Method 5 (https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104).
The data were randomly divided into a training set (70%) and a testing set (30%). Given the imbalanced distribution of the depression and chronic disease trajectory groups, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to resample the training set to reduce predictive bias caused by data imbalance (30).
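The core idea of SMOTE can be shown with a minimal Python sketch: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified illustration for numeric features only, not the implementation used in the study; the function name, parameters, and toy data are hypothetical.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation.

    minority: list of numeric feature vectors from the minority class
    (training set only, so that no test information leaks into resampling).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
corners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(corners, n_new=10, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies within the range already covered by that class.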
To avoid data leakage and result bias, data preprocessing, including missing value imputation, feature selection, standardization, one-hot encoding, and resampling, was performed on the training set only after the testing set had been separated (31). Hyperparameters were optimized on the training set using 10-fold cross-validation and grid search, with the best hyperparameters selected based on predictive accuracy. Finally, internal validation was conducted on the testing set using 1,000 bootstrap resamples to assess the model's generalizability (32,33).
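The bootstrap step can be sketched as follows. This is a generic percentile bootstrap written in plain Python under the assumption that test-set labels and predictions are resampled together with replacement; the function names and the choice of accuracy as the metric are illustrative and are not taken from the study's code.

```python
import random


def accuracy(y_true, y_pred):
    """Proportion of predictions that match the observed labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a test-set metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample test-set indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Calling `bootstrap_ci(y_test, y_hat, accuracy)` with 1,000 resamples returns an approximate 95% interval; any other metric with the same two-argument signature can be passed in place of `accuracy`.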
We compared the performance of the nine machine learning models using the following evaluation metrics: area under the receiver operating characteristic curve (AUROC), accuracy, Kappa coefficient, sensitivity, specificity, Matthews correlation coefficient (MCC), Youden index, balanced accuracy, precision, recall, F1 score, and Brier score (Supplemental Method 6, https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104). These metrics allowed for a multi-dimensional assessment of the models' classification capabilities.
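Most of these metrics follow directly from a confusion matrix. The Python sketch below computes them for a single trajectory group treated as the positive class against the rest; the function names and example counts are hypothetical, and all denominators are assumed to be nonzero.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from one-vs-rest confusion matrix counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / n
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "youden": sensitivity + specificity - 1,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


m = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

With these balanced toy counts, sensitivity, specificity, precision, accuracy, and F1 all equal 0.8, while the Youden index, MCC, and kappa all equal 0.6.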
Furthermore, to evaluate the practical effectiveness of the models, we introduced decision curve analysis (DCA) to compare the net benefits of the models. This comprehensive evaluation helps to better understand the advantages and limitations of each model and provides a scientific basis for model selection in practical applications.
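The quantity plotted in a decision curve is the net benefit at a given threshold probability. The sketch below uses the standard definition (true positives per person minus false positives per person weighted by the odds of the threshold) together with the "treat all" reference strategy; function names and toy data are illustrative.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions with probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # harm-to-benefit weight
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that flags everyone."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: 1 = belongs to the trajectory group of interest.
y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.4, 0.1]
nb_model = net_benefit(y, p, threshold=0.2)
nb_all = net_benefit_treat_all(y, threshold=0.2)
```

A model is useful at a given threshold only if its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy); in this toy case the model gives 0.4375 against 0.375 for treating everyone.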
Additionally, based on the results of the LR model, we created a forest plot to display the key factors influencing the trajectory groups of depression and chronic disease count. We also developed a nomogram based on the LR model for practical application. Finally, a dynamic nomogram was constructed for the dynamic prediction of depression status and chronic disease trajectory groups (33). The overall workflow for model development and validation is shown in Figure 1.
Figure 1. Workflow of model development and validation.
Sensitivity analysis
A sensitivity analysis was performed to evaluate the robustness of the primary findings. Given that trajectory analysis yields more stable results for individuals with more frequent assessments, we included 8,241 participants who completed at least three waves of the CES-D 10 and chronic disease questionnaires. The results of the trajectory analysis were consistent with the main analysis (Supplemental Figure S3, https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104).
Statistical analysis
Statistical analyses were performed using R software (version 4.4.1) and Stata (version 18.0). Continuous variables are presented as Mean ± SD and were compared using the Student's t-test or the Mann-Whitney test; categorical variables are described as frequencies (percentages) and were compared using chi-square tests or Fisher's exact test.
The development of machine learning models was conducted using the "tidymodels" package in R. Static and dynamic nomograms were constructed using the "rms", "regplot", "DynNom", and "shiny" packages. A two-sided P-value < 0.05 was considered statistically significant.
Results
Depression symptoms and chronic disease count trajectory groups
Based on depression status and chronic disease count, three trajectory groups were identified as the best-fitting model (BIC = -232,734.10, AIC = -232,629.40, log-likelihood = -232,601.40, entropy = 0.927) (Figure 2). The single-trajectory analyses of depression status and chronic disease count, along with their estimated parameters, are provided in Supplemental Tables S2 and S3 and Supplemental Figure S4 (https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104).
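As a reading aid, the reported fit statistics can be checked for internal consistency. The sketch below assumes the sign convention common in group-based trajectory software, in which AIC = LL - k and BIC = LL - 0.5 k ln(N), so that values closer to zero indicate better fit. The number of parameters k is not reported in the text; it is inferred here from the difference between the log-likelihood and the AIC.

```python
import math


def traj_aic(loglik, k):
    """AIC under the assumed convention: log-likelihood minus parameter count."""
    return loglik - k


def traj_bic(loglik, k, n):
    """BIC under the assumed convention: LL - 0.5 * k * ln(N)."""
    return loglik - 0.5 * k * math.log(n)


LOGLIK = -232601.40       # reported log-likelihood
REPORTED_AIC = -232629.40
REPORTED_BIC = -232734.10
N = 13073                 # participants in the main analysis

# Inferred, not reported: number of free parameters implied by LL and AIC.
K = round(LOGLIK - REPORTED_AIC)

aic = traj_aic(LOGLIK, K)
bic = traj_bic(LOGLIK, K, N)
```

Under these assumptions the implied parameter count is 28, and the recomputed BIC agrees with the reported value to two decimal places when N is the 13,073 participants of the main analysis.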
Figure 2. Trend for the single outcome within (reading down) and between (reading across) the two groups (at least two waves, n = 13,073). Group 1: normal healthy trajectory group; Group 2: potential depression and disease increase trajectory group; Group 3: high depression and disease burden trajectory group. CES-D 10, the 10-item Center for Epidemiologic Studies Depression Scale.
As shown in Figure 2, 26.9% of individuals exhibited a relatively stable and low level of both CES-D 10 scores (indicating depression status) and chronic disease count, which was defined as the "normal healthy" trajectory group (Group 1). In addition, 55.6% of individuals showed an increasing trend in depression status and chronic disease count over time, with depression levels at a threshold that suggested potential depression; this group was defined as the "potential depression and disease increase" trajectory group (Group 2). On the other hand, 17.5% of individuals exhibited more severe depression and an increased chronic disease burden, with a rising trend in both factors during the follow-up period, which was defined as the "high depression and disease burden" trajectory group (Group 3).
Baseline characteristics of participants
At baseline, the mean age of participants was 57.31 ± 8.64 years, with 49.3% of participants being female. The average CES-D 10 score was 7.68 ± 5.99, and the average sleep duration was 6.40 ± 1.76 hours. Additionally, comparisons of other participant characteristics and baseline features across different trajectory groups are provided in Supplemental Table S4 (https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104).
Predictors of depression and chronic disease count trajectory groups
This study used both LASSO regression and RFE to optimize the selection of predictors for the trajectory groups. LASSO regression identified 6 key predictive variables from an initial set of 14 candidate variables. After weighing the simplicity and accuracy of the predictive model, and integrating the results from LASSO and RFE, a final set of 6 core predictive features was determined: baseline age, gender, disability status, place of residence, self-reported health status, and average sleep duration. The related results are detailed in Supplemental Figure S2 and Supplemental Tables S5 and S6 (https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104).
Performance evaluation of trajectory prediction models
The performance evaluation results of the machine learning models based on the test set are shown in Figure 3 and Supplemental Table S7 (https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104). Figure 3 illustrates the performance of the models across recall, sensitivity, specificity, F1 score, accuracy, balanced accuracy, AUROC, precision, and Brier score. The predictive models constructed using the 6 key features identified by LASSO regression and RFE achieved AUROCs of 0.65 or higher on the test set. Among these models, SVM demonstrated the highest performance, with an AUROC of 0.72. Other models, including LR and XGBoost, also exhibited AUROCs exceeding 0.7. Specificity remained consistently high across all models, around 0.70, indicating reliable identification of negative cases. XGBoost achieved the highest precision (0.54), while the F1 score and balanced accuracy remained stable across models, ranging from 0.47 to 0.50 and 0.56 to 0.59, respectively. Calibration, measured by the Brier score, revealed low predictive errors across the models, reflecting their overall reliability. While SVM exhibited the strongest discriminative power, other models showed distinct advantages: XGBoost achieved the highest precision, random forest excelled in specificity, and logistic regression and ensemble methods maintained consistent calibration, highlighting the complementary strengths of different machine learning models across performance metrics.
Figure 3. Heatmap for the performance of machine learning models.
Figure 4 (A-C) presents the ROC curves and their corresponding AUC values for different machine learning models in predicting the three trajectory groups. Overall, the models demonstrated strong predictive performance in Group 1 and Group 3, with AUC values approaching 0.7, indicating good overall predictive accuracy. Among these models, SVM, LR, and XGBoost performed particularly well, exhibiting stable and superior predictive capabilities. However, the models performed relatively poorly in Group 2, with notably lower AUC values. This phenomenon may be attributed to class imbalance or greater heterogeneity within Group 2.
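Group-specific AUC values of this kind are typically obtained with a one-vs-rest scheme, in which each trajectory group is in turn treated as the positive class. The sketch below assumes that scheme and computes the AUROC as the probability that a randomly chosen member of the group receives a higher predicted probability than a randomly chosen non-member (ties count one half). Function names and the toy predictions are hypothetical.

```python
def auroc(y_true, scores):
    """AUROC as the proportion of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(labels, prob_rows, classes):
    """AUROC for each class against all others.

    prob_rows[i][j] is the predicted probability that participant i
    belongs to classes[j].
    """
    return {
        c: auroc([1 if y == c else 0 for y in labels],
                 [row[j] for row in prob_rows])
        for j, c in enumerate(classes)
    }


# Toy predictions for six participants and three trajectory groups.
labels = [1, 1, 2, 2, 3, 3]
probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.5, 0.3, 0.2],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
]
aucs = one_vs_rest_auroc(labels, probs, classes=[1, 2, 3])
```

In this toy example Groups 1 and 3 are separated perfectly (AUROC 1.0), whereas Group 2 reaches only 0.875 because its predicted probabilities overlap with those of non-members, mirroring the pattern of a harder-to-separate middle group.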
Figure 4. ROC and DCA curves for the machine learning models. ROC, receiver operating characteristic; DCA, decision curve analysis.
Figure 4 (D-F) further evaluates the net benefit of different models at various decision thresholds through clinical DCA. The DCA results provide insights into the potential clinical utility of these models. In Group 1 (Figure 4D), most models showed higher net benefit within a lower decision threshold range (less than 0.3), particularly LR and XGBoost, demonstrating their potential for practical application in clinical decision-making. In Group 2 (Figure 4E), the net benefit of the models was generally lower and exhibited some degree of fluctuation. For Group 3 (Figure 4F), XGBoost and SVM showed higher net benefit within the lower threshold range, further confirming their predictive advantage and clinical applicability in Group 3.
Interpretability analysis of the predictive model
In this study, the logistic regression model exhibited excellent performance. To further interpret the predictive results of the logistic regression model, we introduced SHAP values, which help to elucidate the contribution of each variable to the model's predictions. SHAP decomposes the model's predictions into the individual contributions of each input feature, allowing for a quantification of how each variable affects the model's outcome. As shown in Figure 5 (A-C), the SHAP analysis revealed that self-reported health status, age, sleep duration, and disability status were the most influential variables in predicting the trajectory groups 1, 2, and 3. These features were central to the model's output and provide valuable insights for future strategies aimed at preventing depression and chronic disease burdens.
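For a linear model such as logistic regression, SHAP values have a closed form on the log-odds scale when features are treated as independent: the contribution of a feature is its coefficient multiplied by the deviation of the feature value from its background mean. The sketch below illustrates this property only; the coefficients, means, and participant values are hypothetical and are not the fitted values from this study.

```python
def linear_shap(coefs, x, background_means):
    """SHAP values of a linear model on the log-odds scale.

    Assumes independent features, so that
    phi_i = coef_i * (x_i - mean_i).
    """
    return [b * (xi - mu) for b, xi, mu in zip(coefs, x, background_means)]


def log_odds(intercept, coefs, x):
    """Linear predictor of a logistic regression model."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


# Hypothetical model with two predictors: age (years) and sleep duration (hours).
intercept = -1.0
coefs = [0.5, -0.2]
means = [60.0, 6.5]        # background (training-set) means
participant = [70.0, 5.0]

phi = linear_shap(coefs, participant, means)
baseline = log_odds(intercept, coefs, means)
prediction = log_odds(intercept, coefs, participant)
```

The contributions sum to the difference between the participant's log-odds and the log-odds at the background means, which is the additivity property that makes SHAP values interpretable as a decomposition of a single prediction. For a three-group outcome, the same calculation would be repeated for each group's coefficient vector.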
Figure 5. Interpretability analysis with SHapley Additive exPlanations and RCS based on logistic regression to analyze the relationship between important variables, including age and sleep duration, and depression and chronic disease trajectory. RCS, restricted cubic spline.
To further explore the relationship between the core variables and the burden of depression and chronic diseases, and to simplify the analysis, we combined trajectory groups 2 and 3 into a single group representing the increased burden of depression and chronic diseases. Based on restricted cubic splines (RCS) analysis (Figure 5D, 5E), the results showed that increasing age significantly elevated the risk of depression and chronic disease burden, whereas longer sleep duration was associated with a significantly reduced risk. This finding was further validated by logistic regression analysis. Overall, changes in age and sleep duration were found to be crucial determinants of the risk for depression and chronic disease burden, providing a scientific basis for the development of targeted intervention strategies.
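A restricted cubic spline represents a continuous predictor with a small set of basis functions that are cubic between knots and constrained to be linear beyond the outermost knots. The sketch below builds the commonly used truncated-power form of this basis; the knot locations for sleep duration are hypothetical, and the resulting columns would be entered into the logistic regression in place of the raw variable.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x.

    Returns k - 1 terms for k knots: the linear term plus k - 2
    nonlinear terms that become linear beyond the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")

    def pos3(v):
        # Truncated cubic: (v)_+^3
        return max(v, 0.0) ** 3

    d = t[-1] - t[-2]
    basis = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / d
                + pos3(x - t[-1]) * (t[-2] - t[j]) / d)
        basis.append(term)
    return basis


# Hypothetical knots for sleep duration in hours.
KNOTS = [4.0, 6.5, 9.0]
b10, b11, b12 = (rcs_basis(h, KNOTS)[1] for h in (10.0, 11.0, 12.0))
```

Beyond the last knot the nonlinear term increases by a constant amount per hour, which confirms the linear-tail restriction; below the first knot the nonlinear terms are zero, so only the linear term contributes.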
To further enhance the interpretability of the model, we incorporated the confusion matrix (Supplemental Figure S5, https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104), which provides a more intuitive representation of the model's predictive accuracy across different trajectory groups, thereby further supporting the validity and reliability of the model's results.
Nomogram for predicting trajectory groups
As shown in Figure 6A, logistic regression analysis (Supplemental Table S8, https://www.globalhealthmedicine.com/site/supplementaldata.html?ID=104) revealed that baseline characteristics play a significant role in predicting the trajectory groups of "potential depression and disease increase" and "high depression and disease burden". The results are presented in the form of a forest plot. Feature selection through LASSO and RFE methods identified six key factors as important predictors of these two trajectory groups. Specifically, younger age, male sex, rural residence, absence of disability, better self-reported health, and longer sleep duration were identified as protective factors, significantly reducing the likelihood of individuals entering the "potential depression and disease increase" and "high depression and disease burden" trajectory groups.
Figure 6. Forest plot, static and dynamic nomogram for the LR model. (A): Forest plot for Worsening Depression and Chronic Disease Burden based on LR; (B): Static Nomogram for Worsening Depression and Chronic Disease Burden; (C): Dynamic Nomogram for Worsening Depression and Chronic Disease Burden. LR, logistic regression.
To further simplify the model and enhance its practical utility, we combined the "potential depression and disease increase" and "high depression and disease burden" trajectory groups into a single group labeled "worsening depression and chronic disease burden", as both represent poor depression status and increased chronic disease burden (34). Based on this, we developed a static nomogram (Figure 6B) and a dynamic nomogram (Figure 6C, link: https://ranyandynamicnomogram.shinyapps.io/dynnomapp-2/) to predict the probability of future increases in depression and chronic disease burden for individuals.
Discussion
This study, utilizing data from the CHARLS spanning from 2011 to 2018, is the first to explore the development trajectories of depressive symptoms and the number of chronic diseases in middle-aged and older adults in China, as well as their key predictors. Using GBMTM, we identified three major trajectories of depression and chronic diseases, finding that only 26.9% of participants exhibited stable depressive symptoms and chronic disease conditions over the study period. Additionally, by incorporating machine learning algorithms, we successfully identified the following six key predictive factors: baseline age, place of residence, disability status, average sleep duration, self-reported health status, and gender. SHAP analysis was employed to explain the importance of these factors in predicting the different trajectory groups, and RCS analysis revealed the non-linear relationships between age, sleep duration, and the trajectories of increasing depression and chronic disease burden.
The findings indicate that women who are older, live in urban areas, sleep insufficiently, report poorer health, and have a disability are more likely to follow trajectories of increasing depression and chronic disease burden. We observed a positive correlation between depressive symptoms and age, with the number of chronic diseases also increasing with age. Existing research supports the notion that depression may lead to further deterioration in neuropsychological functioning among older adults, who also become more susceptible to chronic diseases as they age (35). Notably, women have a higher risk of chronic diseases such as cardiovascular disease and diabetes (36), and the interplay of social and physiological factors makes them more vulnerable to depression (37,38).
Both insufficient and excessive sleep have been linked to an elevated risk of depression and adverse physical health outcomes (39). Disability not only increases the prevalence of chronic diseases but is also associated with a higher incidence of depressive symptoms (40). Individuals with poorer self-reported health status often experience more significant depression and chronic disease issues due to physical disabilities and emotional distress (41). Therefore, maintaining basic health, improving quality of life, and preserving good sleep quality are crucial for promoting healthy aging.
This study validates the potential of integrating machine learning techniques with existing health data as an effective screening tool. This tool not only helps optimize the assessment of depression and chronic disease conditions in middle-aged and older adults but also provides guidance for the personalization and flexibility of prevention and treatment strategies. Furthermore, the study identifies low-cost and easily accessible predictive factors, such as good sleep quality and maintaining overall health, which provide a scientific basis for developing preventive strategies targeted at high-risk groups. These strategies may help delay the progression of depressive and chronic disease symptoms.
The practical value of this study is substantial. First, we used the GBMTM method to explore, for the first time, the group characteristics of depression and chronic diseases among the older population in China. Second, through LASSO and RFE feature selection, coupled with the explanation provided by SHAP values, this study clarifies how individual variables contribute to the predicted trajectory categories of depression and chronic disease. Additionally, we analyzed the non-linear relationship between age, sleep duration, and the high-risk trajectory groups for depression and chronic disease using RCS curves. Finally, the static and dynamic nomogram tools designed in this study provide technical support for personalized risk assessment in community healthcare services, laying the foundation for early prevention and intervention strategies.
Despite the significant progress made in this study, there are some limitations. First, while internal validation was conducted at multiple time points to assess the generalizability of the model, external validation was not performed to confirm the model's stability. Second, because the dataset includes only CES-D 10 depression assessments and chronic disease questionnaire data from up to four time points, the model's applicability to data from additional time points could not be verified. Furthermore, although existing studies have shown a significant association between cognitive decline and depressive symptoms (42), this study did not further investigate the potential role of cognitive status in predicting depression and chronic disease trajectories due to limitations in data resources and research design. Future research should further explore these issues and address these limitations to enhance our understanding of aging health trajectories and the reliability of predictive models.
Conclusion
This study identified three trajectory patterns of comorbid depression and chronic disease among middle-aged and older adults in China. The results indicate that women who are older, reside in urban areas, have disabilities, self-report poor health, and have shorter sleep durations are more likely to belong to the high-risk trajectory of increasing depression and chronic disease. Additionally, the dynamic nomogram proposed in this study provides a practical tool for early risk identification, offering new insights for the development of targeted mental health screening and intervention strategies for middle-aged and older adults.
Acknowledgements
We express our gratitude to the China Health and Retirement Longitudinal Study (CHARLS) team for providing the data. Publicly available datasets were analyzed in this study. These data can be found at http://charls.pku.edu.cn.
Funding
This research was funded by a university-level project at Renmin University of China: A Study on the Impact of Urban-Rural Differences on Depression among Middle-Aged and Older People in China.
Conflict of Interest
The authors have no conflicts of interest to disclose.
References
- 1. Chen X, Giles J, Yao Y, et al. The path to healthy ageing in China: A Peking University-Lancet Commission. Lancet. 2022; 400:1967-2006.
- 2. Guo C, Zheng X. Health challenges and opportunities for an aging China. Am J Public Health. 2018; 108:890-892.
- 3. Khera R, Valero-Elizondo J, Nasir K. Financial toxicity in atherosclerotic cardiovascular disease in the United States: Current state and future directions. J Am Heart Assoc. 2020; 9:e017793.
- 4. Wang LM, Chen ZH, Zhang M, Zhao ZP, Huang ZJ, Zhang X, Li C, Guan YQ, Wang X, Wang ZH, Zhou MG. Study of the prevalence and disease burden of chronic disease in the elderly in China. Zhonghua Liu Xing Bing Xue Za Zhi. 2019; 40:277-283. (in Chinese)
- 5. Arokiasamy P, Uttamacharya U, Jain K, et al. The impact of multimorbidity on adult physical and mental health in low- and middle-income countries: What does the study on global ageing and adult health (SAGE) reveal? BMC Med. 2015; 13:178.
- 6. Herrman H, Patel V, Kieling C, et al. Time for united action on depression: A Lancet-World Psychiatric Association Commission. Lancet. 2022; 399:957-1022.
- 7. Torrey EF, Simmons WW, Hancq ES, Snook J. The continuing decline of clinical research on serious mental illnesses at NIMH. Psychiatr Serv. 2021; 72:1342-1344.
- 8. Liu H, Zhou Z, Fan X, Shen C, Ma Y, Sun H, Xu Z. Association between multiple chronic conditions and depressive symptoms among older adults in China: Evidence from the China Health and Retirement Longitudinal Study (CHARLS). Int J Public Health. 2023; 68:1605572.
- 9. Li D, Su M, Guo X, Liu B, Zhang T. The association between chronic disease and depression in middle-aged and elderly people: The moderating effect of health insurance and health service quality. Front Public Health. 2023; 11:935969.
- 10. Bundy R, Mandy W, Crane L, Belcher H, Bourne L, Brede J, Hull L, Brinkert J, Cook J. The impact of early stages of COVID-19 on the mental health of autistic adults in the United Kingdom: A longitudinal mixed-methods study. Autism. 2022; 26:1765-1782.
- 11. Chen H, Zhou Y, Huang L, Xu X, Yuan C. Multimorbidity burden and developmental trajectory in relation to later-life dementia: A prospective study. Alzheimers Dement. 2023; 19:2024-2033.
- 12. You R, Li W, Ni L, Peng B. Study on the trajectory of depression among middle-aged and elderly disabled people in China: Based on group-based trajectory model. SSM Popul Health. 2023; 24:101510.
- 13. Speiser JL, Callahan KE, Ip EH, Miller ME, Tooze JA, Kritchevsky SB, Houston DK. Predicting future mobility limitation in older adults: A machine learning analysis of health ABC study data. J Gerontol A Biol Sci Med Sci. 2022; 77:1072-1078.
- 14. Leist AK, Klee M, Kim JH, Rehkopf DH, Bordas S, Muniz-Terrera G, Wade S. Mapping of machine learning approaches for description, prediction, and causal inference in the social and health sciences. Sci Adv. 2022; 8:eabk1942.
- 15. Ma Y, Liang L, Zheng F, Shi L, Zhong B, Xie W. Association between sleep duration and cognitive decline. JAMA Netw Open. 2020; 3:e2013573.
- 16. Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: The China Health and Retirement Longitudinal Study (CHARLS). Int J Epidemiol. 2014; 43:61-68.
- 17. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. Epidemiology. 2007; 18:800-804.
- 18. Cheng ST, Chan AC. The Center for Epidemiologic Studies Depression Scale in older Chinese: Thresholds for long and short forms. Int J Geriatr Psychiatry. 2005; 20:465-470.
- 19. Zhou L, Ma X, Wang W. Relationship between cognitive performance and depressive symptoms in Chinese older adults: The China Health and Retirement Longitudinal Study (CHARLS). J Affect Disord. 2021; 281:454-458.
- 20. Huang J, Xu T, Dai Y, Li Y, Tu R. Age-related differences in the number of chronic diseases in association with trajectories of depressive symptoms: A population-based cohort study. BMC Public Health. 2024; 24:2496.
- 21. Nagin DS, Odgers CL. Group-based trajectory modeling in clinical research. Annu Rev Clin Psychol. 2010; 6:109-138.
- 22. Nagin DS, Jones BL, Elmer J. Recent advances in group-based trajectory modeling for clinical research. Annu Rev Clin Psychol. 2024; 20:285-305.
- 23. Zhang J, Meiser-Stedman R, Jones B, Smith P, Dalgleish T, Boyle A, Edwards A, Subramanyam D, Dixon C, Sinclaire-Harding L, Schweizer S, Newby J, McKinnon A. Trajectory of post-traumatic stress and depression among children and adolescents following single-incident trauma. Eur J Psychotraumatol. 2022; 13:2037906.
- 24. Wiemken TL, Kelley RR. Machine learning in epidemiology and health outcomes research. Annu Rev Public Health. 2020; 41:21-36.
- 25. Lee SW, Lee HY, Bang HJ, Song HJ, Kong SW, Kim YM. An improved prediction model for ovarian cancer using urinary biomarkers and a novel validation strategy. Int J Mol Sci. 2019; 20:4938.
- 26. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMJ. 2015; 350:g7594.
- 27. Hu M, Shu X, Yu G, Wu X, Välimäki M, Feng H. A risk prediction model based on machine learning for cognitive impairment among Chinese community-dwelling elderly people with normal cognition: Development and validation study. J Med Internet Res. 2021; 23:e20298.
- 28. Khan T, Jacobs PG. Prediction of mild cognitive impairment using movement complexity. IEEE J Biomed Health Inform. 2021; 25:227-236.
- 29. Yadgir SR, Engstrom C, Jacobsohn GC, Green RK, Jones CMC, Cushman JT, Caprio TV, Kind AJH, Lohmeier M, Shah MN, Patterson BW. Machine learning-assisted screening for cognitive impairment in the emergency department. J Am Geriatr Soc. 2022; 70:831-837.
- 30. Hao M, Wang Y, Bryant SH. An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data. Anal Chim Acta. 2014; 806:117-127.
- 31. Zhu J, Wu Y, Lin S, Duan S, Wang X, Fang Y. Identifying and predicting physical limitation and cognitive decline trajectory group of older adults in China: A data-driven machine learning analysis. J Affect Disord. 2024; 350:590-599.
- 32. Dong BR, Gu XQ, Chen HY, Gu J, Pan ZG. Development and validation of a nomogram to predict frailty progression in nonfrail Chinese community-living older adults. J Am Med Dir Assoc. 2021; 22:2571-2578.e4.
- 33. Zhang L, Cui H, Chen Q, Li Y, Yang C, Yang Y. A web-based dynamic nomogram for predicting instrumental activities of daily living disability in older adults: A nationally representative survey in China. BMC Geriatr. 2021; 21:311.
- 34. Wu Y, Xiang C, Jia M, Fang Y. Interpretable classifiers for prediction of disability trajectories using a nationwide longitudinal database. BMC Geriatr. 2022; 22:627.
- 35. Black CL, Williams WW, Arbeloa I, Kordic N, Yang L, MaCurdy T, Worrall C, Kelman JA. Trends in influenza and pneumococcal vaccination among US nursing home residents, 2006-2014. J Am Med Dir Assoc. 2017; 18:731-735.e14.
- 36. Wang Y, O'Neil A, Jiao Y, Wang L, Huang J, Lan Y, Zhu Y, Yu C. Sex differences in the association between diabetes and risk of cardiovascular disease, cancer, and all-cause and cause-specific mortality: A systematic review and meta-analysis of 5,162,654 participants. BMC Med. 2019; 17:136.
- 37. Reisner SL, Katz-Wise SL, Gordon AR, Corliss HL, Austin SB. Social epidemiology of depression and anxiety by gender identity. J Adolesc Health. 2016; 59:203-208.
- 38. Noh JW, Kwon YD, Park J, Oh IH, Kim J. Relationship between physical disability and depression by gender: A panel regression model. PLoS One. 2016; 11:e0166238.
- 39. Chunnan L, Shaomei S, Wannian L. The association between sleep and depressive symptoms in US adults: Data from the NHANES (2007-2014). Epidemiol Psychiatr Sci. 2022; 31:e63.
- 40. Martin Ginis KA, Sharma R, Brears SL. Physical activity and chronic disease prevention: Where is the research on people living with disabilities? CMAJ. 2022; 194:E338-E340.
- 41. Picaza Gorrochategi M, Eiguren Munitis A, Dosil Santamaria M, Ozamiz Etxebarria N. Stress, anxiety, and depression in people aged over 60 in the COVID-19 outbreak in a sample collected in Northern Spain. Am J Geriatr Psychiatry. 2020; 28:993-998.
- 42. Formánek T, Csajbók Z, Wolfová K, Kučera M, Tom S, Aarsland D, Cermakova P. Trajectories of depressive symptoms and associated patterns of cognitive decline. Sci Rep. 2020; 10:20888.
