Abstract
Pain is common in middle-aged and older adults and has been identified as a risk factor for falls, but the mechanism linking pain to falls remains unclear. This study included 13,074 middle-aged and older adults from the China Health and Retirement Longitudinal Study (waves 2011–2015) and developed separate four-year fall risk prediction models for adults with and without pain, using five machine learning algorithms and 145 candidate input variables. Shapley Additive exPlanations (SHAP) was used to explain the prediction models. Adjusted logistic regression (LR) models showed that pain (OR 1.40 [1.29, 1.53]) was associated with a higher fall risk. Among pain characteristics, lower limb pain carried the highest risk (OR 1.71 [1.33, 2.18]), followed by severe pain (OR 1.53 [1.36, 1.73]) and multisite pain (OR 1.43 [1.28, 1.55]). Among the fall prediction models for the pain and non-pain populations, the LR model performed best, with AUC-ROC values of 0.732 and 0.692, respectively. Important features common to both models were fall history and height. Features unique to the pain model were functional limitation, Short Physical Performance Battery (SPPB) score, white blood cell (WBC) count, chronic disease score, life satisfaction, platelets, cooking fuel, and pain quantity, whereas marital status, age, depressive symptoms, cognitive function, hearing, rainy days, tidiness, and sleep duration were exclusive to the non-pain model. Pain characteristics are associated with falls among middle-aged and older adults. Prediction models can help identify people with pain who are at high risk of falls. Important features of falls differ between pain and non-pain populations, and fall prevention strategies should be tailored to each population.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-01651-6.
Keywords: Pain, Falls, Machine learning, Risk factors, Older adults
Subject terms: Trauma, Geriatrics, Fracture repair, Epidemiology
Introduction
Falls in older people are common worldwide, and their number continues to rise1. Approximately 30% of people aged 65 and older experience a fall each year, and the incidence increases with advancing age2. Falls are the leading cause of injury-related medical visits and death among people aged 65 years and above in China3. Apart from causing personal distress, falls are a serious healthcare issue because of their link to subsequent hospitalization, disability, and mortality4. Falls are widely recognized as multifactorial events, influenced by physiological decline (e.g., muscle weakness, balance impairment), chronic conditions (e.g., arthritis, diabetes), and environmental hazards (e.g., uneven surfaces, poor lighting)1,2. Recent studies have identified additional risk factors, including poor sleep quality and daytime sleepiness5, physical performance6, prior hospitalizations7, and post-acute care status8, further highlighting the complexity of fall mechanisms. Although many studies have shown that multifactorial interventions can reduce fall risk in some older adults, there remains room for improvement, and more precise, targeted strategies are still needed9,10. More effort is therefore needed to deepen our understanding of these risk factors and to develop tailored intervention strategies for specific populations11.
Pain is a prevalent, debilitating, and costly condition that disproportionately affects older adults and is increasingly recognized as a critical risk factor for falls12,13. Notably, both pain and falls are occurring at younger ages, extending their relevance to middle-aged populations2. While prior studies have established an association between pain and fall risk, it remains unclear whether the underlying mechanisms differ between individuals with and without pain13. In addition, pain-related factors may vary considerably, for example single versus multiple pain sites and the severity of pain. Hence, building predictive models of fall risk in both pain and non-pain populations and comparing their important predictive features can provide insight into the mechanisms involved and enable early and precise screening. Machine learning (ML) methods have been used extensively in clinical predictive models in recent years, facilitating early identification of high-risk populations14. Compared with traditional predictive models (e.g., logistic regression and Cox regression), the most prominent advantage of ML approaches is their ability to handle high-dimensional data15. Moreover, newer explanation frameworks have enhanced the interpretability of complex ML models, allowing actionable insights to be drawn from them16, which could shed light on the mechanisms influencing falls in pain and non-pain populations.
The present study aims to (1) examine the association between pain and falls among middle-aged and older adults; (2) use ML methods to build 4-year fall risk prediction models for populations with and without pain; and (3) compare the important predictors across the fall models to inform targeted strategies for fall prevention.
Methods
Study design
The data were obtained from the China Health and Retirement Longitudinal Study (CHARLS)17, publicly available at http://charls.pku.edu.cn. CHARLS, organized by the National Development Institute of Peking University, is a nationally representative longitudinal survey launched in 2011 to monitor the health of middle-aged and older adults (mainly those aged ≥ 45 years) in 450 villages or communities in 150 counties across 28 provinces in China19. The original CHARLS was approved by the Ethical Review Committee of Peking University (IRB00001052-11015), and all participants signed informed consent at the time of participation. To ensure an adequate sample size for the prediction phase, data from waves 1 to 3 (2011, 2013, and 2015) were selected. Participants were excluded if they met any of the following criteria: (1) age < 45 years or incomplete information on age or sex; (2) no participation in the 2013 and 2015 follow-ups; (3) incomplete information on pain in wave 2011 or on falls in waves 2013 and 2015. The final sample comprised 13,074 participants (Fig. 1).
Fig. 1.
Flow chart of participant selection.
Outcome variables and input variables
Falls
A fall is an unexpected event in which an individual comes to rest on the ground, floor, or lower level18. A fall-related injury is an injury from a fall that requires medical attention, including hospitalization, such as a fracture, joint dislocation, head injury, sprain or strain, bruising, swelling, laceration, or other serious injury. The outcome of this study was based on two questions from the follow-up surveys in 2013 and 2015: “Have you fallen down since the last interview?” and “Have you fallen down seriously enough to need medical treatment?”
Pain
Pain characteristics were assessed by three questions in wave 2011. The first was “Are you often troubled with any body pains?” Participants who answered “yes” were considered to have pain and were subsequently asked two follow-up questions: “On what part of your body do you feel pain [head, shoulder, arm, wrist, fingers, chest, stomach, back, waist, buttocks, leg, knees, ankles, toes, and neck]?” and “How bad is your pain [mild, moderate or severe]?” Participants who reported pain in two or more of the 15 listed body parts were categorized as having multisite pain.
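The pain-quantity classification described above can be sketched in a few lines. This is an illustrative sketch only: the function name, the string labels, and the handling of respondents who report pain but no site are assumptions, not part of the CHARLS data dictionary.

```python
# The 15 body parts listed in the wave 2011 pain question.
BODY_PARTS = (
    "head", "shoulder", "arm", "wrist", "fingers", "chest", "stomach", "back",
    "waist", "buttocks", "leg", "knees", "ankles", "toes", "neck",
)


def classify_pain_quantity(has_pain, sites):
    """Return the pain-quantity category for one respondent.

    has_pain: answer to the screening question (True for "yes").
    sites: iterable of reported body parts (hypothetical encoding).
    """
    if not has_pain:
        return "no pain"
    reported = {site for site in sites if site in BODY_PARTS}
    if len(reported) >= 2:
        return "multisite pain"      # two or more of the 15 listed parts
    if len(reported) == 1:
        return "single site pain"
    return "site not reported"       # assumption: kept as a separate label


print(classify_pain_quantity(True, ["knees", "waist"]))
```

Using a set makes a body part that is recorded twice count only once toward the two-site threshold.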
Input variables
Candidate input variables were chosen on the basis of literature reviews of fall risk factors and data availability in the current database2,11,19,20; variables with more than 20% missing information were excluded from the analysis. Notably, participants with missing values were not excluded. In total, 145 input variables from the baseline (wave 2011) were selected. We divided the input variables into four sets based on the Social-Ecological Model: a framework for prevention21, comprising individual, relationship, community, and societal factors. For the individual level, we included: (1) sociodemographic variables such as age and sex; (2) health and lifestyle variables including comorbidity and medication use (hypertension, dyslipidemia, diabetes, stroke, and other chronic conditions), smoking, alcohol consumption, sleep duration, and leisure activities; (3) psychological variables including depressive symptoms (evaluated by the 10-item Center for Epidemiologic Studies Depression Scale [CESD-10]22, with a score of 10 or higher indicating depressive symptoms) and cognitive function23 (covering three dimensions, namely orientation and attention, episodic memory, and visuo-construction, with scores ranging from 0 to 31); (4) physical condition including lung function, grip strength, Short Physical Performance Battery (SPPB), height, and weight, where the SPPB was evaluated using tests of gait speed, standing balance, and repeated chair stands, with the score (ranging from 0 to 12) divided into three categories: poor (0–6), fair (7–9), and good (10–12)24; and (5) blood indices including white blood cell (WBC) count, blood urea nitrogen, platelets, and cystatin C. For the relationship level, we included marital status, living alone, occupation, and residence.
For community level, we included: (1) Home environment variables such as handicapped facilities, type of toilet, cooking fuel, and tidiness; (2) Community environment variables including type of road, public facilities, socio-economic status, tidiness of the roads, and industrial pollution. For societal level, we included: income, insurance and retirement. Variables were primarily collected through questionnaires and measured parameters. Assignments of variables are presented in Appendix Table S1.
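The variable screening and score coding described in this section can be illustrated as follows. This is a minimal sketch under stated assumptions: missing values are assumed to be encoded as `None`, and all function and column names are hypothetical rather than taken from the study code.

```python
def missing_fraction(values):
    """Share of entries encoded as None in one variable's column."""
    return sum(v is None for v in values) / len(values)


def screen_variables(columns, max_missing=0.20):
    """Drop variables with more than `max_missing` missing; keep all rows."""
    return {name: vals for name, vals in columns.items()
            if missing_fraction(vals) <= max_missing}


def sppb_category(score):
    """Map an SPPB total (0-12) to poor (0-6), fair (7-9), or good (10-12)."""
    if not 0 <= score <= 12:
        raise ValueError("SPPB score must lie between 0 and 12")
    if score <= 6:
        return "poor"
    if score <= 9:
        return "fair"
    return "good"


def has_depressive_symptoms(cesd10_total):
    """CESD-10 total of 10 or higher indicates depressive symptoms."""
    return cesd10_total >= 10


# Toy columns: the second variable is 60% missing and is therefore dropped.
columns = {
    "grip_strength": [31.0, 28.5, None, 35.2, 30.1],
    "physical_activity": [None, None, None, 2.0, 1.0],
}
print(list(screen_variables(columns)))
```

Screening acts on columns only, which matches the statement that participants themselves were not excluded; their remaining gaps are left for the imputation step.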
Statistical analysis
Association between pain characteristics and falls stages
In examining the association between pain characteristics and falls, we conducted analyses using logistic regression (LR) models, controlling for the following confounding factors based on other references12,25,26: sociodemographic characteristics (i.e., age, sex, education, and residence), health and lifestyle conditions (i.e., chronic disease score, polypharmacy score, vision, hearing, smoking, alcohol consumption, and body mass index [BMI]), and physical condition (i.e., SPPB). Additional sex-stratified analyses were performed to explore the association between pain-related variables and falls within each sex group separately.
Machine-learning stages
Based on biomedical research guideline recommendations and model characteristics, five commonly used ML methods (logistic regression [LR], naive Bayesian [NB], random forest [RF], extreme gradient boosting [XGBoost], and artificial neural network [ANN]) were used to build fall risk prediction models for older adults with and without pain. LR is widely recognized as a classical statistical algorithm27. NB is a probabilistic classifier based on Bayes' theorem that can handle small-sample data and multi-class classification problems28. RF29 and XGBoost30 are ensemble techniques built on decision trees, using bagging and boosting, respectively, to reduce the risks of underfitting and overfitting. ANN is a fundamental neural network model that supports parallel processing and can capture intricate nonlinear relationships31.
Model building and evaluation followed the TRIPOD process32. The original dataset was randomly divided into training and test sets in a 7:3 ratio. After the test set was set aside, pre-processing, feature selection, and hyperparameter tuning were performed on the training set only, to avoid data leakage and biased results. Data pre-processing comprised outlier elimination, imputation, and normalization. Missing values were imputed with the MissForest algorithm, which handles collinearity and fills continuous and categorical variables simultaneously33. Features were selected with the widely used least absolute shrinkage and selection operator (LASSO)34. Hyperparameters of each ML model were then tuned using 5-fold cross-validation with Bayesian optimization, and the test set was used only to evaluate the final performance of the classifiers. The optimal prediction models were selected on two dimensions: discrimination and calibration. Discrimination was assessed with four metrics: accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC). AUC-ROC was the primary performance measure, with a higher value indicating better discrimination35. Sensitivity was considered an additional important metric, particularly when comparing models with similar performance on other indicators36. Calibration was assessed with the Brier score, with a lower score indicating a better fit37. Shapley Additive exPlanations (SHAP) values were used to evaluate the contribution of each predictor in the prediction models38.
We also utilized partial dependence plots to visualize the impact of individual predictors on the predicted outcome, showing how variations in specific predictors influence the model’s predictions.
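The leakage-avoidance step can be illustrated with a small standard-library sketch: the data are split 7:3 first, and the normalization parameters are then estimated from the training portion only and reused unchanged on the test portion. The z-score scaling, the random seed, and the toy height values are assumptions for illustration; the study does not state which normalization was applied.

```python
import random
import statistics


def split_70_30(rows, seed=0):
    """Shuffle a copy of the rows and split it 7:3 into train and test."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(train_values):
    """Estimate mean and SD from the training set only."""
    mean = statistics.mean(train_values)
    sd = statistics.pstdev(train_values)
    return mean, (sd if sd > 0 else 1.0)


def apply_scaler(values, mean, sd):
    """Standardize values with previously fitted parameters."""
    return [(v - mean) / sd for v in values]


heights = [150.0 + i for i in range(20)]        # toy data, not CHARLS values
train, test = split_70_30(heights, seed=42)
mean, sd = fit_scaler(train)                    # the test set is never seen here
train_z = apply_scaler(train, mean, sd)
test_z = apply_scaler(test, mean, sd)           # reuses the training parameters
print(len(train), len(test))
```

The same pattern applies to imputation, feature selection, and hyperparameter tuning: each is fitted on the training set and only applied to the test set.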
For the baseline variables, continuous variables were described using the mean (standard deviation, SD) or median (interquartile range, IQR), and categorical variables using the number (percentage). The t-test, Wilcoxon rank-sum test, and Chi-square test were used to compare differences between participants with and without falls. DeLong’s test was conducted to evaluate differences in AUC-ROC.
Descriptive analysis and logistic regression analyses were conducted using R 4.3.1, while data pre-processing, feature selection, ML model building, and evaluation were executed using Python 3.7.6. All tests were two-sided, and P < 0.05 was considered statistically significant.
Results
A total of 17,708 participants were enrolled in the baseline survey (wave 2011). After the exclusion criteria were applied, the final sample comprised 13,074 participants. Among them, 4,358 (33.3%) had pain and 8,716 (66.7%) did not; the mean age was 59.6 years (SD = 9.2 years) in the pain group and 58.7 years (SD = 9.4 years) in the non-pain group. The proportions of males were 38.5% (1679/4358) and 52.7% (4592/8716) in the two groups, respectively (Appendix Table S2). The 4-year occurrence of falls was significantly higher in middle-aged and older adults with pain than in those without pain (42.9 and 29.5%, P < 0.001) (Appendix Table S3), as was that of fall-related injuries (45.6 and 31.4%, P < 0.001) (Appendix Table S4).
After adjustment for confounders, logistic regression showed that middle-aged and older adults with pain had a higher risk of falls than those without pain (adj. OR 1.40, 95% CI 1.29, 1.53) (Table 1). Pain site, pain severity, and pain quantity were each associated with falls. Among these characteristics, lower limb pain was associated with the highest fall risk (adj. OR 1.71, 95% CI 1.33, 2.18), followed by severe pain (adj. OR 1.53, 95% CI 1.36, 1.73) and multisite pain (adj. OR 1.43, 95% CI 1.28, 1.55). For fall-related injuries (Appendix Table S5), the associations were significant except for head/neck and lower limb pain. In the sex-stratified analysis (Appendix Table S6), the associations were significant except for single site pain in males and mild pain in females.
Table 1.
Multivariate analysis of the association between pain characteristics and falls (N = 13,074).
Pain characteristics | n | No. of falls | Model 1a Adj. OR (95% CI) | Model 2b Adj. OR (95% CI) | Model 3c Adj. OR (95% CI) | Model 4d Adj. OR (95% CI) |
---|---|---|---|---|---|---|
Pain | ||||||
No | 8716 | 2135 | 1.0 | 1.0 | 1.0 | 1.0 |
Yes | 4358 | 1603 | 1.79 (1.66, 1.94) | 1.67 (1.54, 1.81) | 1.43 (1.31, 1.56) | 1.40 (1.29, 1.53) |
Pain site | ||||||
No pain | 8716 | 2135 | 1.0 | 1.0 | 1.0 | 1.0 |
Head/neck | 771 | 271 | 1.67 (1.43, 1.95) | 1.59 (1.36, 1.86) | 1.38 (1.17, 1.62) | 1.36 (1.15, 1.59) |
Trunk | 990 | 334 | 1.57 (1.36, 1.81) | 1.49 (1.29, 1.72) | 1.34 (1.16, 1.55) | 1.32 (1.14, 1.52) |
Upper limb | 2308 | 882 | 1.91 (1.73, 2.10) | 1.76 (1.59, 1.95) | 1.45 (1.30, 1.61) | 1.42 (1.27, 1.58) |
Lower limb | 289 | 116 | 2.01 (1.63, 2.63) | 1.89 (1.49, 2.41) | 1.75 (1.37, 2.24) | 1.71 (1.33, 2.18) |
Pain severity | ||||||
No pain | 8716 | 2135 | 1.0 | 1.0 | 1.0 | 1.0 |
Mild | 1078 | 348 | 1.47 (1.28, 1.69) | 1.40 (1.21, 1.60) | 1.25 (1.09, 1.44) | 1.25 (1.08, 1.43) |
Moderate | 1576 | 574 | 1.77 (1.58, 1.98) | 1.64 (1.46, 1.85) | 1.40 (1.24, 1.58) | 1.39 (1.23, 1.57) |
Severe | 1704 | 681 | 2.05 (1.84, 2.29) | 1.90 (1.70, 2.13) | 1.59 (1.41, 1.79) | 1.53 (1.36, 1.73) |
Pain quantity | ||||||
No pain | 8716 | 2135 | 1.0 | 1.0 | 1.0 | 1.0 |
Single site pain | 928 | 308 | 1.53 (1.32, 1.77) | 1.49 (1.29, 1.73) | 1.40 (1.20, 1.62) | 1.37 (1.18, 1.59) |
Multisite pain | 3430 | 1295 | 1.87 (1.72, 2.04) | 1.73 (1.58, 1.88) | 1.44 (1.31, 1.58) | 1.43 (1.28, 1.55) |
Adj. OR adjusted odds ratio, CI confidence interval, SPPB short physical performance battery.
aModel 1 estimated unadjusted odds ratio from logistic regression models.
bModel 2 was adjusted for age, sex, education, marital status, residence.
cModel 3 was additionally adjusted for chronic disease score, polypharmacy score, vision, hearing, smoking, alcohol consumption, and BMI.
dModel 4 was additionally adjusted for SPPB.
The values in bold indicate statistically significant results (p < 0.05).
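As a worked check on the unadjusted estimates (Model 1), an odds ratio and its Wald 95% confidence interval can be computed directly from the counts in Table 1. The sketch below uses the "Pain: Yes" and "Pain: No" rows; the function name is hypothetical, and the adjusted models (Models 2 to 4) cannot be reproduced this way because they require fitting a regression with the covariates.

```python
import math


def odds_ratio_wald(exposed_events, exposed_total, ref_events, ref_total, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table."""
    a = exposed_events                   # falls among participants with pain
    b = exposed_total - exposed_events   # no falls, with pain
    c = ref_events                       # falls among participants without pain
    d = ref_total - ref_events           # no falls, without pain
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from Table 1: 1603 falls among 4358 with pain, 2135 among 8716 without.
or_, lo, hi = odds_ratio_wald(1603, 4358, 2135, 8716)
print(f"{or_:.2f} ({lo:.2f}, {hi:.2f})")  # 1.79 (1.66, 1.94)
```

The interval is symmetric around the estimate on the log scale, which is a quick consistency check for any reported odds ratio and confidence interval.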
Building on the association between pain and falls, our machine learning models further revealed population-specific risk profiles. From the 145 candidate features measured at baseline, the least absolute shrinkage and selection operator regression algorithm selected 24 input variables for the pain model and 27 for the non-pain model (Appendix Table S7 and S8). The comparison of machine learning models (Table 2) indicated that LR achieved the best predictive performance, with an AUC-ROC of 0.732 in the pain population and 0.692 in the non-pain population (Fig. 2 and Appendix Table S9). Notably, the higher AUC-ROC in the pain group suggests that pain status may serve as a strong predictive factor for falls. LR also had the lowest Brier score in both populations, further confirming its superior calibration (pain group: 0.197; non-pain group: 0.165).
Table 2.
Performance of the five ML models for predicting falls among pain and non-pain populations on the test set.
ML models | Threshold | AUC-ROC (95% CI) | Accuracy | Sensitivity | Specificity | Brier score |
---|---|---|---|---|---|---|
Pain | ||||||
LR | 0.385 | 0.732 (0.695–0.766) | 0.708 | 0.706 | 0.714 | 0.197 |
NB | 0.239 | 0.711 (0.678–0.756) | 0.687 | 0.716 | 0.623 | 0.240 |
RF | 0.378 | 0.731 (0.691–0.763) | 0.687 | 0.669 | 0.815 | 0.201 |
XGBoost | 0.358 | 0.727 (0.681–0.758) | 0.686 | 0.667 | 0.831 | 0.202 |
ANN | 0.385 | 0.729 (0.689–0.763) | 0.709 | 0.703 | 0.729 | 0.198 |
Non-pain | ||||||
LR | 0.242 | 0.692 (0.647–0.738) | 0.772 | 0.781 | 0.638 | 0.165 |
NB | 0.223 | 0.682 (0.626–0.741) | 0.737 | 0.813 | 0.459 | 0.220 |
RF | 0.247 | 0.691 (0.636–0.747) | 0.774 | 0.773 | 0.784 | 0.166 |
XGBoost | 0.220 | 0.651 (0.599–0.698) | 0.770 | 0.772 | 0.730 | 0.171 |
ANN | 0.273 | 0.688 (0.638–0.743) | 0.760 | 0.760 | 0.733 | 0.168 |
ML machine learning, LR logistic regression, NB naive Bayesian, RF random forest, XGBoost extreme gradient boosting, ANN artificial neural network, AUC-ROC area under the receiver operating characteristic curve.
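The metrics reported in Table 2 can be computed from predicted probabilities as sketched below. The labels and probabilities are toy values, not study data, and the rule that a probability at or above the threshold counts as a predicted fall is an assumption; the threshold 0.385 is borrowed from the LR row for the pain population purely for illustration.

```python
def auc_roc(y_true, probs):
    """Probability that a random faller is scored above a random non-faller."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count half
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def sensitivity_specificity(y_true, probs, threshold):
    """Classify as a fall when the probability reaches the threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, probs):
        predicted_fall = p >= threshold
        if y == 1:
            tp += predicted_fall
            fn += not predicted_fall
        else:
            fp += predicted_fall
            tn += not predicted_fall
    return tp / (tp + fn), tn / (tn + fp)


y_true = [1, 1, 1, 0, 0, 0, 0, 0]                  # toy outcomes (1 = fall)
probs = [0.8, 0.5, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1]   # toy predicted probabilities
auc = auc_roc(y_true, probs)
brier = brier_score(y_true, probs)
sens, spec = sensitivity_specificity(y_true, probs, threshold=0.385)
print(round(auc, 3), round(brier, 5), round(sens, 3), round(spec, 3))
```

AUC-ROC and the Brier score do not depend on the threshold, whereas accuracy, sensitivity, and specificity do, which is why Table 2 reports a threshold for each model.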
Fig. 2.
Receiver operating characteristic curve performance of five models on the test set. (A) Pain populations. (B) Non-pain populations. ML machine learning, LR logistic regression, NB naive Bayesian, RF random forest, XGBoost extreme gradient boosting, ANN artificial neural network, AUC-ROC area under the receiver operating characteristic curve.
Among the top 10 important features identified by SHAP analysis of the LR models (Fig. 3), fall history and height were shared predictors in both populations. The pain model uniquely prioritized biomarkers (WBC and platelets), physical function (functional limitation and SPPB), chronic disease score, life satisfaction, cooking fuel, and pain quantity. In contrast, the non-pain model emphasized mental health factors (depressive symptoms and cognitive function), environmental factors (rainy days and tidiness), sociodemographic factors (age and marital status), hearing, and sleep duration. We further analyzed the effect of the 10 important predictors in the pain population on fall prediction using partial dependence plots (Appendix Figure S1). Specifically, fall history, functional limitation, lower SPPB, higher WBC, higher chronic disease score, lower life satisfaction, unclean cooking fuel, and greater pain quantity were associated with a higher predicted probability of falls. In contrast, higher platelet counts and greater height were associated with a lower predicted probability of falls. In the SHAP analysis of the pain population, 7 features consistently ranked among the top 10 predictors across the five ML models (Appendix Figure S2): fall history, functional limitation, SPPB, WBC, chronic disease score, height, and pain quantity.
Fig. 3.
Feature Importance Ranking (top 10) with the SHAP summary plot for the logistic regression models. (A) Pain population model, (B) Non-pain population model. SPPB short physical performance battery, WBC white blood cell.
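For a linear model such as LR, SHAP values have a simple closed form on the log-odds scale when features are treated as independent: the contribution of feature j for one participant is its coefficient multiplied by the deviation of that participant's value from the background mean. The sketch below illustrates this property only. The coefficients, intercept, and background data are hypothetical, and the study does not state which SHAP explainer was used.

```python
def log_odds(intercept, coefs, x):
    """Linear predictor of a logistic regression model."""
    return intercept + sum(b * xj for b, xj in zip(coefs, x))


def linear_shap(coefs, x, background):
    """SHAP values for a linear model, assuming independent features."""
    n = len(background)
    means = [sum(row[j] for row in background) / n for j in range(len(coefs))]
    return [b * (xj - m) for b, xj, m in zip(coefs, x, means)]


# Hypothetical model with two features: fall history (0/1) and height (cm).
intercept = 2.0
coefs = [0.9, -0.02]
background = [[0, 160.0], [1, 150.0], [0, 170.0], [1, 160.0]]
x = [1, 150.0]                      # a participant with a prior fall, 150 cm

phi = linear_shap(coefs, x, background)
baseline = sum(log_odds(intercept, coefs, row) for row in background) / len(background)

# Additivity: the SHAP values sum to the prediction minus the average prediction.
print([round(v, 2) for v in phi], round(log_odds(intercept, coefs, x) - baseline, 2))
```

In this toy example both contributions are positive: a prior fall and a below-average height each push the predicted log-odds of falling above the baseline, which mirrors the direction of the effects reported for the pain population.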
In the age-stratified prediction models (Appendix Table S10), the AUC-ROC of the LR model was higher for the pain population than for the non-pain population in both age groups (0.720 vs. 0.677 for 45–60 years and 0.722 vs. 0.676 for ≥ 60 years). Common predictors for both age groups in the pain population included fall history, functional limitations, chronic disease score, platelets, and WBC (Appendix Figure S3).
Discussion
In this study of 13,074 Chinese middle-aged and older adults, we found significant associations between pain characteristics (i.e., status, site, severity, and quantity) and falls. In the 4-year fall risk prediction models for the 4,358 participants with pain and the 8,716 without pain, LR performed best in both populations, with an AUC-ROC of 0.73 in the pain group and 0.69 in the non-pain group. LR was chosen as the optimal model not only for its higher AUC-ROC but also for its better overall performance on metrics such as sensitivity and Brier score, suggesting that it is effective in identifying older adults at high risk of falling. Baseline fall history and height were important predictive features shared by both groups.
The superior performance of LR over complex machine learning models (e.g., NB, RF, and XGBoost) and deep learning models (e.g., ANN) may result from its ability to balance interpretability, computational efficiency, and robustness with moderate sample sizes27,39. In clinical settings where transparency is paramount, LR provides actionable insights through feature weights while avoiding the overfitting risks associated with high-dimensional data. It is also worth noting that retrospective self-reports of falls over a two-year period may underestimate the true incidence40, particularly for non-injurious or single falls41. In addition, the four-year prediction window may not accurately reflect short-term fall risk, because changes in lifestyle, environment, and other uncontrollable factors can affect prediction accuracy. Caution is therefore needed when applying this model. Future studies should prioritize prospective monitoring to improve temporal resolution and accuracy18.
In the SHAP analysis of the LR models, baseline fall history and height were key predictive features shared by both groups. These two features also consistently appeared as key predictors across all five models built for the pain population. Numerous studies have shown that a history of falls is the most significant risk factor for future falls11,42,43. Although a previous fall may prompt some older adults to take preventive steps, for others (e.g., older adults with pain) the underlying causes are likely to result in another fall or even a fall-related injury. Regarding the association between height and falls, one study found that older adults with height loss have an increased risk of falling44. On the one hand, shorter older adults have a poorer field of vision45; on the other hand, factors such as osteoporosis, vertebral disc compression, posture problems, kyphosis, and muscle atrophy contribute to height reduction46. Together, these factors make older adults more prone to gait and balance problems, increasing the likelihood of falls.
In terms of physical condition, pain can contribute to falls by affecting balance and functional activity. A systematic review of 39 articles revealed an association between pain and impaired static, dynamic, multi-component, and reactive balance among older adults47. Another study of 600 older adults found that multisite pain was associated with weak lower extremity function (assessed by SPPB)48. An Indian study found that the likelihood of difficulty in activities of daily living and in instrumental activities of daily living among older adults with pain was 2.28 and 1.67 times that of those without pain, respectively49. These indicators of physical condition are essential factors in falls and are vital in evaluating life satisfaction in older adults.
Regarding blood indices, WBC and platelets were important features in the predictive model. Most studies indicate that pain is often accompanied by inflammation, which leads to elevated WBC levels50. Autoimmune disorders (e.g., platelet abnormalities) or prolonged chronic inflammation can lead to muscle weakness and to decreased balance and responsiveness through increased protein metabolism, oxidative stress, and interference with neural control and endocrine regulation. Additionally, inflammation may lead to vascular damage and nerve inflammation, affecting blood flow and nerve conduction and increasing the risk of falls51. Some studies have also found that unclean cooking fuels are associated with health outcomes in older adults52,53, and the mechanisms underlying these associations point towards inflammation and oxidative stress. Notably, pain is prevalent in chronic conditions such as cancer, heart failure, kidney diseases, and musculoskeletal disorders, occasionally surpassing the prominence of their cardinal symptoms54,55.
In the non-pain model, the important features are mainly sociodemographic, psychological, and environmental variables. The main reason why psychological factors (i.e., depressive symptoms and cognitive function) contribute to falls is that mood problems and cognitive decline tend to lead to distraction and hesitancy to act, and a reduced ability to perceive the surrounding environment, which makes it difficult to recognize potential fall risk factors11. Moreover, environmental risk factors, whether in the home or community environment, are important causes of falls, especially for younger older adults1.
The strength of this study is that, to the authors’ knowledge, it is the first study on the association and risk prediction of pain and falls among Chinese older adults, with a large sample size and comprehensive variables (especially including community environment variables). There are also some limitations to this research. First, relying on retrospective self-reporting of falls over a two-year period may introduce recall bias. Future research should collect fall information through weekly or monthly diaries or phone follow-ups, in order to minimize the underestimation of fall incidence. Second, while our study analyzed pain by subgroups, we did not develop model predictions for different subgroups. Future studies may build predictive models for different subgroups to more accurately capture different fall risk factors and provide more precise risk predictions for subgroups. Third, while CHARLS includes a physical activity questionnaire, incomplete responses from half of the participants led us to exclude this variable. Instead, we incorporated leisure activity participation, which includes a “participation in exercise” subcomponent, and SPPB to partially compensate for this limitation. Fourth, the current ML models used in this study could identify only associations between falls and potential factors, but they could not establish causal inferences. Moreover, this study only conducted internal validation and lacked external validation to assess the model’s generalization ability. Future research should incorporate samples from various regions to externally validate the model.
In conclusion, this study suggests that older adults with pain, whether characterized by pain status, pain site, pain severity, or multisite pain, have an increased 4-year risk of falls. Important features of falls differ between pain and non-pain populations, and fall prevention strategies should be tailored to specific populations. Machine learning methods have application value in fall risk prediction in older adults with pain and can provide a scientific reference for early screening of falls, but future studies with external validation and refined datasets are needed to confirm their broader applicability.
Acknowledgements
This work was supported by the Shantou Science and Technology Plan Medical and Health Project (240506186498668, 240922169602002). The funders were not involved in the study design, data collection, analysis, decision to publish, or manuscript preparation. We express our gratitude to the CHARLS research team, the field team, and every respondent.
Author contributions
SC: Conceptualization, Data curation, Formal Analysis, Visualization, Software, Writing—original draft, Writing—review & editing. YG: Data curation, Software, Visualization, Writing—review & editing. LD: Data curation, Resources, Writing—review & editing. MM: Formal Analysis, Resources, Writing—review & editing. LX: Writing—review & editing. LL: Conceptualization, Writing—review & editing. XC: Conceptualization, Formal Analysis, Methodology, Writing—original draft, Writing—review & editing. ZZ: Project administration, Methodology, Supervision, Funding acquisition, Writing—review & editing.
Data availability
The datasets generated and/or analysed during the current study are available in the CHARLS repository, http://charls.pku.edu.cn.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Xiaodong Chen, Email: xdchen0754@163.com.
Zhigang Zhong, Email: stzzg@163.com.
References
- 1.Who. Step Safely: Strategies for Preventing and Managing Falls across the life-course. Geneva: World Health Organization (2021).
- 2.Who. Who Global Report on Falls Prevention in Older Age. Geneva: World Health Organization (2007).
- 3.China disease prevention and control center. Technical guidelines for falls intervention in the elderly. http://www.nhc.gov.cn/cms-search/xxgk/getManuscriptXxgk.htm?id=52857
- 4.Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the global burden of disease study 2019. Lancet 396 (10258), 1204–1222. 10.1016/S0140-6736(20)30925-9 (2020).
- 5.Salis, F. et al. Sleep quality, daytime sleepiness, and risk of falling: results from an exploratory cross-sectional study. Eur. Geriatr. Med. 16 (1), 197–204. 10.1007/s41999-024-01092-w (2025).
- 6.Salis, F. & Mandas, A. Physical performance and falling risk are associated with five-year mortality in older adults: an observational cohort study. Medicina-Lithuania 59 (5). 10.3390/medicina59050964 (2023).
- 7.Adams, C. M., Tancredi, D. J., Bell, J. F., Catz, S. L. & Romano, P. S. Associations between home injury falls and prior hospitalizations in community dwelling older adults: a population case-crossover study. Injury 51 (2), 260–266. 10.1016/j.injury.2019.11.035 (2020).
- 8.Adams, C. M., Tancredi, D. J., Bell, J. F., Catz, S. L. & Romano, P. S. Risk of home falls among older adults after acute care hospitalization: a cohort study. J. Trauma. Nurs. 31 (6), 281–289. 10.1097/JTN.0000000000000816 (2024).
- 9.Pillay, J. et al. Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences. Syst. Rev-London. 13 (1), 289. 10.1186/s13643-024-02681-3 (2024).
- 10.Guirguis-Blake, J. M., Perdue, L. A., Coppola, E. L. & Bean, S. I. Interventions to prevent falls in older adults: updated evidence report and systematic review for the US Preventive Services Task Force. Jama-J Am. Med. Assoc. 332 (1), 58–69. 10.1001/jama.2024.4166 (2024).
- 11.Montero-Odasso, M. et al. World guidelines for falls prevention and management for older adults: a global initiative. Age Ageing 51 (9). 10.1093/ageing/afac205 (2022).
- 12.Hirase, T., Okubo, Y., Menant, J., Lord, S. R. & Sturnieks, D. L. Impact of pain on reactive balance and falls in community-dwelling older adults: a prospective cohort study. Age Ageing. 49 (6), 982–988. 10.1093/ageing/afaa070 (2020).
- 13.Munch, T. et al. Pain and falls and fractures in community-dwelling older men. Age Ageing. 44 (6), 973–979. 10.1093/ageing/afv125 (2015).
- 14.Chekroud, A. M. et al. The promise of machine learning in predicting treatment outcomes in psychiatry. World Psychiatry. 20 (2), 154–170. 10.1002/wps.20882 (2021).
- 15.Hawkins, D. M. The problem of overfitting. J. Chem. Inf. Comput. Sci. 44 (1), 1–12 (2004).
- 16.Bi, Q., Goodman, K. E., Kaminsky, J. & Lessler, J. What is machine learning? A primer for the epidemiologist. Am. J. Epidemiol. 188 (12), 2222–2239. 10.1093/aje/kwz189 (2019).
- 17.Zhao, Y., Hu, Y., Smith, J. P., Strauss, J. & Yang, G. Cohort profile: the China health and retirement longitudinal study (CHARLS). Int. J. Epidemiol. 43 (1), 61–68. 10.1093/ije/dys203 (2014).
- 18.Jehu, D. A. & Skelton, D. A. The measurement and reporting of falls: recommendations for research and practice on defining faller types. J. Frailty Sarcopenia Falls. 8 (4), 200–203. 10.22540/JFSF-08-200 (2023).
- 19.Wu, Y., Wang, X., Gu, C., Zhu, J. & Fang, Y. Investigating predictors of progression from mild cognitive impairment to Alzheimer’s disease based on different time intervals. Age Ageing 52 (9). 10.1093/ageing/afad182 (2023).
- 20.Chen, X. & Li, L. Prediction of sarcopenia at different time intervals: an interpretable machine learning analysis of modifiable factors. BMC Geriatr. 25 (1), 133. 10.1186/s12877-025-05792-1 (2025).
- 21.Centers for disease control and prevention. The social-ecological model: a framework for prevention. https://www.cdc.gov/violenceprevention/about/social-ecologicalmodel.html
- 22.Andresen, E. M., Malmgren, J. A., Carter, W. B. & Patrick, D. L. Screening for depression in well older adults: evaluation of a short form of the CES-D (Center for Epidemiologic Studies Depression Scale). Am. J. Prev. Med. 10 (2), 77–84 (1994).
- 23.Zhai, D. et al. The effect of water source on cognitive functioning in Chinese adults: a cross-sectional and follow-up study. Ecotox Environ. Safe 230, 113156. 10.1016/j.ecoenv.2021.113156 (2021).
- 24.Guralnik, J. M. et al. Lower extremity function and subsequent disability: consistency across studies, predictive models, and value of gait speed alone compared with the short physical performance battery. J. Gerontol. A-Biol. 55 (4), M221–M231. 10.1093/gerona/55.4.m221 (2000).
- 25.Cai, Y., Leveille, S. G., Shi, L., Chen, P. & You, T. Chronic pain and risk of injurious falls in community-dwelling older adults. J. Gerontol. A-Biol. 76 (9), e179–e186. 10.1093/gerona/glaa249 (2021).
- 26.Zhu, X. et al. Associations of pain and sarcopenia with successful aging among older people in China: evidence from CHARLS. J. Nutr. Health Aging. 27 (3), 196–201. 10.1007/s12603-023-1892-2 (2023).
- 27.Christodoulou, E. et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 110, 12–22. 10.1016/j.jclinepi.2019.02.004 (2019).
- 28.Berger, J. O. Statistical Decision Theory and Bayesian Analysis (Statistical Decision Theory and Bayesian Analysis, 1985).
- 29.Breiman, L. Random forests. Mach. Learn. 45 (1), 5–32 (2001).
- 30.Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).
- 31.Basheer, I. A. & Hajmeer, M. Artificial neural networks: fundamentals, computing, design, and application. J. Microbiol. Meth. 43 (1), 3–31. 10.1016/s0167-7012(00)00201-3 (2000).
- 32.Luo, W. et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J. Med. Internet Res. 18 (12), e323. 10.2196/jmir.5870 (2016).
- 33.Stekhoven, D. J. & Bühlmann, P. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics 28 (1), 112–118. 10.1093/bioinformatics/btr597 (2012).
- 34.Tibshirani, R. Regression shrinkage and selection via the Lasso. J. Royal Stat. Soc. Ser. B 58 (1) (1996).
- 35.van den Bosch, T. et al. Predictors of 30-day mortality among Dutch patients undergoing colorectal cancer surgery, 2011–2016. Jama Netw. Open 4 (4), e217737. 10.1001/jamanetworkopen.2021.7737 (2021).
- 36.Park, S. H. Tools for assessing fall risk in the elderly: a systematic review and meta-analysis. Aging Clin. Exp. Res. 30 (1), 1–16. 10.1007/s40520-017-0749-0 (2018).
- 37.Redelmeier, D. A., Bloch, D. A. & Hickam, D. H. Assessing predictive accuracy: how to compare Brier scores. J. Clin. Epidemiol. 44 (11), 1141–1146. 10.1016/0895-4356(91)90146-z (1991).
- 38.Lundberg, S. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 4766–4775 (2017).
- 39.Hawkins, D. M. The problem of overfitting. J. Chem. Inf. Comput. Sci. 44 (1), 1–12. 10.1021/ci0342472 (2004).
- 40.Cummings, S. R., Nevitt, M. C. & Kidd, S. Forgetting falls. The limited accuracy of recall of falls in the elderly. J. Am. Geriatr. Soc. 36 (7), 613–616. 10.1111/j.1532-5415.1988.tb06155.x (1988).
- 41.Ganz, D. A., Higashi, T. & Rubenstein, L. Z. Monitoring falls in cohort studies of community-dwelling older people: effect of the recall interval. J. Am. Geriatr. Soc. 53 (12), 2190–2194. 10.1111/j.1532-5415.2005.00509.x (2005).
- 42.Hirsch, C. In older adults with a history of falls, interventions to reduce use of fall risk-increasing drugs do not reduce falls. Ann. Intern. Med. 172 (12), JC68. 10.7326/ACPJ202006160-068 (2020).
- 43.Wapp, C., Mittaz, H. A., Hilfiker, R. & Zysset, P. History of falls and fear of falling are predictive of future falls: outcome of a fall rate model applied to the Swiss CHEF trial cohort. Front. Aging 3, 1056779. 10.3389/fragi.2022.1056779 (2022).
- 44.Arai, T. et al. Loss of height predicts fall risk in elderly Japanese: a prospective cohort study. J. Bone Min. Metab. 41 (1), 88–94. 10.1007/s00774-022-01383-x (2023).
- 45.Lin, G., Al, A. R. & Niechwiej-Szwedo, E. Age-related deficits in binocular vision are associated with poorer inhibitory control in healthy older adults. Front. Neurosci. Switz. 14, 605267. 10.3389/fnins.2020.605267 (2020).
- 46.McDaniels-Davidson, C., Nichols, J. F., Vaida, F., Marshall, L. M. & Kado, D. M. Kyphosis and 3-year fall risk in community-dwelling older men. Osteoporos. Int. 31 (6), 1097–1104. 10.1007/s00198-019-05155-8 (2020).
- 47.Hirase, T., Okubo, Y., Sturnieks, D. L. & Lord, S. R. Pain is associated with poor balance in community-dwelling older adults: a systematic review and meta-analysis. J. Am. Med. Dir. Assoc. 21 (5), 597–603.e8. 10.1016/j.jamda.2020.02.011 (2020).
- 48.Eggermont, L. H., Bean, J. F., Guralnik, J. M. & Leveille, S. G. Comparing pain severity versus pain location in the MOBILIZE Boston Study: chronic pain and lower extremity function. J. Gerontol. A-Biol. 64 (7), 763–770. 10.1093/gerona/glp016 (2009).
- 49.Muhammad, T., Rashid, M. & Zanwar, P. P. Examining the association of pain and pain frequency with self-reported difficulty in activities of daily living and instrumental activities of daily living among community-dwelling older adults: findings from the longitudinal aging study in India. J. Gerontol. B-Psychol. 78 (9), 1545–1554. 10.1093/geronb/gbad085 (2023).
- 50.Erlinger, T. P. et al. Leukocytosis, hypoalbuminemia, and the risk for chronic kidney disease in US adults. Am. J. Kidney Dis. 42 (2), 256–263. 10.1016/s0272-6386(03)00650-4 (2003).
- 51.Abete, I. et al. Association of lifestyle factors and inflammation with sarcopenic obesity: data from the PREDIMED-Plus trial. J. Cachexia Sarcopeni. 10 (5), 974–984. 10.1002/jcsm.12442 (2019).
- 52.Li, C., Lao, W. & Wang, S. Risk assessment of unclean cooking energy usage from the perspective of subjective wellbeing: the mediating role of perceived physical and mental health. Ecotox Environ. Safe. 281, 116603. 10.1016/j.ecoenv.2024.116603 (2024).
- 53.Wen, Q. et al. Self-reported primary cooking fuels use and risk of chronic digestive diseases: a prospective cohort study of 0.5 million Chinese adults. Environ. Health Persp. 131 (4), 47002. 10.1289/EHP10486 (2023).
- 54.Viderman, D., Tapinova, K., Aubakirova, M. & Abdildin, Y. G. The prevalence of pain in chronic diseases: an umbrella review of systematic reviews. J. Clin. Med. 12 (23). 10.3390/jcm12237302 (2023).
- 55.Smith, T. O. et al. The prevalence, impact and management of musculoskeletal disorders in older people living in care homes: a systematic review. Rheumatol. Int. 36 (1), 55–64. 10.1007/s00296-015-3322-1 (2016).