BMJ Open. 2025 Sep 3;15(9):e095312. doi: 10.1136/bmjopen-2024-095312

Development and validation of a machine learning-based prediction model for frailty in older adults with diabetes: a study protocol for a retrospective cohort study

An Luo 1, Yiting Pan 1, Yaqing Liu 1, Longhan Zhang 1, Hao Bai 1, Zeyuan Long 1, Lingqiao Song 1, Xingyu Wei 2, Li Liao 1
PMCID: PMC12410600  PMID: 40903080

Abstract

Introduction

Frailty is common in older adults with diabetes and significantly increases the risk of adverse health outcomes. Early identification of frailty in this population is crucial for implementing timely interventions. However, prediction models developed specifically for frailty in older adults with diabetes are lacking. This study aims to develop and validate a prediction model for frailty in this high-risk group.

Methods and analysis

This study uses data from the national follow-up surveys of the China Health and Retirement Longitudinal Study (CHARLS) conducted between 2011 and 2020. The study population comprises older adults with diabetes aged 60 and above. Frailty is assessed using Fried’s frailty phenotype. Potential predictors will be identified through a systematic review and expert consultation. Eight machine learning models will be developed to predict frailty, and model performance will be evaluated using receiver operating characteristic curves and calibration plots, with internal validation by leave-one-out cross-validation. Finally, the optimal model will be deployed as an electronic risk calculator with Shapley Additive Explanation-based visualisations.

Ethics and dissemination

The CHARLS was approved by the Biomedical Ethics Committee of Peking University (approval number: IRB00001052-11015), and all participants were required to sign informed consent. This study was approved by the Medical Research Ethics Committee of the University of South China (approval number: 2023NHHL006). We will disseminate results via presentations at scientific meetings and publication in peer-reviewed journals.

PROSPERO registration number

CRD42023470933.

Keywords: Aged, Frail Elderly, Frailty, General diabetes


Strengths and limitations of this study.

  • The protocol follows the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis—Artificial Intelligence reporting guideline to ensure methodological transparency.

  • A predefined statistical analysis plan includes multiple machine learning methods for model development.

  • Predictor selection is informed by a systematic review and expert consultation.

  • The retrospective design may result in missing data and limit the range of available predictors.

  • The black-box nature of machine learning may limit transparency despite Shapley Additive Explanation-based interpretability tools.

Introduction

Diabetes imposes a growing health burden worldwide, especially in ageing populations. The 10th edition of the IDF Diabetes Atlas estimates that nearly one in four adults aged 75–79 will have diabetes by 2045, a projected prevalence of 24.7%.1 As populations continue to age, the proportion of adults aged over 60 living with diabetes is projected to keep rising.2 Frailty is increasingly recognised as the third major complication of diabetes, after microvascular and macrovascular complications.3 Frailty is a multidimensional condition marked by reduced physiological reserve and increased vulnerability to stressors, resulting from impairments across multiple systems.4 Among older adults with diabetes, frailty is widely recognised as being linked to elevated risks of both mortality and disability.5 Importantly, the prevalence of frailty is 3–5 times higher in older adults with diabetes than in those without diabetes.6 Current research suggests a reciprocal interaction between diabetes and frailty, which may contribute to a cycle of worsening health. Frailty plays a critical role in shaping the prognosis of older adults with diabetes.7 Although frailty is commonly linked to ageing, it can arise independently and does not always progress over time.8 In line with this, evidence indicates that frailty-related impairments in older adults are potentially modifiable and may fluctuate over time.9 A tool that detects and predicts frailty at an early stage could therefore support healthcare professionals’ clinical decision-making and resource allocation.

Machine learning (ML) is a field of computer science that uses existing data to build models that predict outcomes when new data are presented.10 With growing access to data and the availability of off-the-shelf software for applying ML methods, developing prediction models has become faster and more straightforward.11 Numerous studies have developed frailty prediction models, using a variety of ML algorithms to detect, classify and predict the onset of frailty.12–15 These algorithms, including random forests, support vector machines, logistic regression and artificial neural networks, have shown promise in identifying risk factors and patterns that precede the development of frailty. Despite these advances, the application of such models to older adults with diabetes remains limited. This population is characterised by a complex interplay of metabolic dysregulation, chronic inflammation and multiple comorbidities,16 which may confound the prediction of frailty. A prediction model tailored to this subgroup is therefore needed to address the unique challenges faced by older adults with diabetes. This study aims to fill this gap by developing a prediction model that accounts for diabetes-related pathophysiology alongside the phenotypic characteristics of frailty. In doing so, we aim to provide clinicians with a robust tool for the early identification of frailty, enabling personalised care plans that mitigate its negative impacts on the health and well-being of older adults with diabetes.

Objectives

Primary objectives

The primary objective is to develop and validate a predictive model for frailty in older adults with diabetes using data from the 2011 to 2020 China Health and Retirement Longitudinal Study (CHARLS), enabling early identification and timely interventions to improve health outcomes.

Secondary objectives

The secondary objectives are to identify key predictors through systematic review and expert consultation to ensure the model’s relevance; to compare the performance of eight ML algorithms and select the most effective model; and to enhance clinical applicability by developing an electronic risk calculator with Shapley Additive Explanation (SHAP)-based visualisations that support interpretation and decision-making in clinical practice.

Methods

We will conduct a prediction model development and validation study using a retrospective cohort design (figure 1). This study will be reported following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis—Artificial Intelligence (TRIPOD-AI) statement.11

Figure 1. The flowchart of developing the prediction model. CHARLS, China Health and Retirement Longitudinal Study.


Source of data

The data source is CHARLS, led by Peking University, covering the period from 2011 to 2020. At baseline (2011), CHARLS surveyed 17 000 middle-aged and older adults aged 45 and above in about 10 000 households.17 Participants were interviewed face-to-face in their homes using computer-assisted personal interviewing. The CHARLS questionnaire collects individual and household information, including physical and mental health status (such as depressive symptoms, chronic disease status and self-rated health) and demographic variables such as gender, age, marital status and education level. These data will be used to develop and validate the frailty prediction model for older adults with diabetes. The CHARLS datasets are available for download from the CHARLS homepage: http://charls.pku.edu.cn/en.

Participants

Participants must meet the following inclusion criteria: (a) aged ≥60 years (CHARLS variable: ba004_w3_1≤1951) and (b) diagnosed with diabetes,18 defined as meeting at least one of the following: fasting plasma glucose ≥126 mg/dL (CHARLS variable: newglu), HbA1c ≥6.5% (CHARLS variable: newhba1c) or self-reported physician diagnosis of diabetes (CHARLS variable: DA007). Exclusion criteria are: (a) abnormal data, defined as biologically implausible or logically inconsistent values; (b) more than 10% of frailty items missing and (c) more than 10% of predictor items missing. Participants from the 2011 and 2015 CHARLS waves will form the training set used to develop the frailty prediction models, and other waves will be used for validation.
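The eligibility rules above can be expressed as a simple filter. The sketch below is a minimal illustration, not the authors' code: the `Participant` record and its field names are hypothetical stand-ins for the CHARLS variables named in the text (ba004_w3_1, newglu, newhba1c, DA007), and the check for implausible or inconsistent values is omitted because the protocol does not define its rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical field names; comments give the CHARLS variable from the text
    birth_year: Optional[int]                # ba004_w3_1
    fasting_glucose_mg_dl: Optional[float]   # newglu
    hba1c_percent: Optional[float]           # newhba1c
    self_reported_diabetes: Optional[bool]   # DA007 (physician diagnosis)
    frailty_missing_fraction: float          # share of frailty items missing
    predictor_missing_fraction: float        # share of predictor items missing


def has_diabetes(p: Participant) -> bool:
    """Any one of the three diabetes criteria is sufficient."""
    return (
        (p.fasting_glucose_mg_dl is not None and p.fasting_glucose_mg_dl >= 126)
        or (p.hba1c_percent is not None and p.hba1c_percent >= 6.5)
        or p.self_reported_diabetes is True
    )


def is_eligible(p: Participant, max_missing: float = 0.10) -> bool:
    """Inclusion: born in 1951 or earlier (ba004_w3_1 <= 1951) and diabetes.
    Exclusion: more than 10% of frailty or of predictor items missing.
    The implausible-value check is omitted (rules not specified)."""
    if p.birth_year is None or p.birth_year > 1951:
        return False
    if not has_diabetes(p):
        return False
    return (p.frailty_missing_fraction <= max_missing
            and p.predictor_missing_fraction <= max_missing)


# Example: HbA1c meets the threshold although fasting glucose does not
example = Participant(1948, 110.0, 6.8, False, 0.0, 0.05)
print(is_eligible(example))  # True
```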

Study period

This study is scheduled to be conducted from 1 July 2025 to 1 May 2026, including systematic review and expert consultation (July–September 2025), model development and validation (October 2025–January 2026) and manuscript preparation (February–May 2026).

Outcome

We will use the frailty phenotype (FP) developed by Fried19 to assess frailty status. FP is currently the most widely used frailty scale and has been applied in previous studies using CHARLS data.20 21 The five criteria are defined as follows: (1) weakness, assessed through the self-reported item ‘have difficulty with lifting or carrying weights over 5 kg, like a heavy bag of groceries’; (2) slowness, defined as difficulty walking 100 metres or climbing several flights of stairs without resting; (3) exhaustion, present if the participant answered ‘Most or all of the time’ or ‘Occasionally or a moderate amount of the time’ to either of the questions ‘I felt everything I did was an effort’ or ‘I could not get going’; (4) low physical activity, defined as either no physical activity at all or walking only in sessions lasting <10 min each throughout the week and (5) weight loss, defined as an involuntary loss of 5 kg or more over the past year, or a current body mass index of 18.5 kg/m² or lower. Participants meeting three or more of these criteria will be classified as frail.
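The frailty phenotype scoring can be sketched as follows. This is an illustrative reading of the five criteria, not the authors' code: the `FrailtyItems` fields are hypothetical, and the assumed coding of the two CES-D items (1 = rarely or none, 2 = some or a little, 3 = occasionally or a moderate amount, 4 = most or all of the time) should be checked against the CHARLS codebook.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed CES-D answer codes: 1 rarely/none, 2 some or a little,
# 3 occasionally or a moderate amount, 4 most or all of the time
OCCASIONALLY_OR_MORE = 3


@dataclass
class FrailtyItems:
    # Hypothetical field names for the five phenotype components
    difficulty_lifting_5kg: bool
    difficulty_walking_100m: bool
    difficulty_climbing_stairs: bool
    effort_frequency: int                # 'I felt everything I did was an effort'
    could_not_get_going_frequency: int   # 'I could not get going'
    any_physical_activity: bool
    walking_bouts_under_10min: bool      # all weekly walking bouts last <10 min
    involuntary_weight_loss_kg: float    # over the past year
    bmi: Optional[float]                 # kg/m^2


def frailty_criteria(x: FrailtyItems) -> Dict[str, bool]:
    """Evaluate each of the five Fried criteria as described in the text."""
    return {
        "weakness": x.difficulty_lifting_5kg,
        "slowness": x.difficulty_walking_100m or x.difficulty_climbing_stairs,
        "exhaustion": max(x.effort_frequency,
                          x.could_not_get_going_frequency) >= OCCASIONALLY_OR_MORE,
        "low_activity": (not x.any_physical_activity) or x.walking_bouts_under_10min,
        "weight_loss": (x.involuntary_weight_loss_kg >= 5
                        or (x.bmi is not None and x.bmi <= 18.5)),
    }


def is_frail(x: FrailtyItems) -> bool:
    """Frail when three or more of the five criteria are present."""
    return sum(frailty_criteria(x).values()) >= 3


example = FrailtyItems(True, True, False, 4, 1, True, False, 0.0, 22.0)
print(is_frail(example))  # True: weakness, slowness and exhaustion
```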

Predictors

The predictors are based on our previous systematic review and expert consultation. From the systematic review, we included variables that were significantly predictive of frailty among older adults with diabetes. We then consulted two diabetologists (LZ and XW) to finalise the selection of predictors. Online supplemental table 1 presents the domains researched, the predictors, their measurement methods and the planned handling of each variable in the statistical analysis.22–24

Sample size

The development dataset must be large enough for the resulting prediction model to be reliable when applied to new individuals in the target population.25 For a prediction model with a binary outcome, the minimum sample size is calculated as described in a previous study26:

$$n = \exp\left(\frac{-0.508 + 0.259\ln(\phi) + 0.504\ln(P) - \ln(\mathrm{MAPE})}{0.544}\right)$$

where φ is the anticipated outcome proportion, P is the number of candidate predictor parameters and MAPE is the target mean absolute prediction error. Based on this formula, the estimated minimum sample size required for the prediction models to estimate outcome probabilities in the target population with low prediction error (as measured by MAPE) is 375. In addition, the external validation sample must be large enough to assess model performance accurately.27 We therefore calculated the external validation sample size according to the recommendations of Riley et al27 and the Stata code provided by Ensor.28 The Stata command used for this calculation is:

pmvalsampsize, type(b) prevalence(.28) cstatistic(.88) lpbeta(1.55,1.85) oeciwidth(.30) csciwidth(.4) cstatciwidth(.1)

As a result, at least 558 participants (153 events) are required for the external validation study to ensure calibration is properly evaluated.
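To make the two calculations concrete, the sketch below implements the development sample size formula and one of the external validation criteria. It is an illustration under stated assumptions, not a reproduction of the protocol's numbers: the protocol does not report the φ, P and MAPE values behind n = 375, so the inputs in the usage lines are placeholders. For validation, only the precision criterion for the observed/expected (O/E) ratio is shown, assuming the standard approximation SE(ln O/E) ≈ √((1 − φ)/(nφ)) and a target CI width on the O/E scale around an expected O/E of 1. pmvalsampsize also applies calibration slope and C statistic criteria, so this single criterion is not expected to return 558.

```python
import math


def n_for_mape(phi: float, p: int, mape: float) -> int:
    """Development sample size from the MAPE formula.
    phi: anticipated outcome proportion; p: candidate predictor parameters;
    mape: target mean absolute prediction error."""
    log_n = (-0.508 + 0.259 * math.log(phi) + 0.504 * math.log(p)
             - math.log(mape)) / 0.544
    return math.ceil(math.exp(log_n))


def expected_mape(phi: float, p: int, n: int) -> float:
    """The same relationship rearranged to give MAPE for a given n."""
    return math.exp(-0.508 + 0.259 * math.log(phi) + 0.504 * math.log(p)
                    - 0.544 * math.log(n))


def n_for_oe_precision(phi: float, ci_width: float, oe: float = 1.0,
                       z: float = 1.959964) -> int:
    """Validation sample size for one criterion only: a 95% CI for O/E of
    total width ci_width. Assumes SE(ln O/E) = sqrt((1 - phi) / (n * phi))
    and a CI of oe * exp(+/- z * SE), whose width oe * 2 * sinh(z * SE)
    is solved for SE."""
    se = math.asinh(ci_width / (2 * oe)) / z
    return math.ceil((1 - phi) / (phi * se ** 2))


# Placeholder inputs: the protocol does not report the phi, P or MAPE it used
print(n_for_mape(phi=0.28, p=20, mape=0.05))
print(n_for_oe_precision(phi=0.28, ci_width=0.30))
```

Rounding up with `math.ceil` guarantees that the returned n meets the target, which is the usual convention for minimum sample sizes.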

Patient and public involvement

Patients and the public are not involved in the design, conduct, reporting or dissemination plans of this research, as the study is a secondary analysis of publicly available data from the CHARLS cohort.

Missing data

If more than 10% of records are incomplete, missing values will be imputed. If data are missing at random (or missing completely at random), multiple imputation will be used. For continuous variables, missing values will be imputed using the mean of nearby points.
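The mean-of-nearby-points rule can be sketched in a few lines. This is one possible reading of the method, not the authors' procedure: it follows the SPSS-style definition (the mean of up to `span` observed values on each side of a gap), and both the record ordering and `span = 2` are assumptions. Multiple imputation for data missing at random would need a separate model-based procedure and is not shown.

```python
from typing import List, Optional


def impute_nearby_mean(values: List[Optional[float]],
                       span: int = 2) -> List[Optional[float]]:
    """Replace each None with the mean of up to `span` observed values
    before it and `span` observed values after it (record order assumed)."""
    # Only originally observed values are used, never earlier imputations
    observed = [(i, v) for i, v in enumerate(values) if v is not None]
    result = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = [x for j, x in observed if j < i][-span:]
        after = [x for j, x in observed if j > i][:span]
        neighbours = before + after
        if neighbours:  # a variable with no observed values stays missing
            result[i] = sum(neighbours) / len(neighbours)
    return result


print(impute_nearby_mean([1.0, 2.0, None, 4.0, 5.0, 6.0]))
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```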

Statistical analysis methods and model-building procedures

Predictors will be used as input features for the prediction model. Continuous variables will be standardised, and categorical variables will be one-hot encoded. Recursive feature elimination will then be used to identify the subset of features that gives the most favourable model performance.

Eight ML algorithms will be used to develop the frailty prediction model for older adults with diabetes: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), K-nearest Neighbour (KNN), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM) and Decision Tree (ID3). These eight models were chosen to represent diverse methodological paradigms, each with distinct strengths in handling complex data. Comparing them facilitates identification of the most suitable model and supports the transparent and robust prediction modelling recommended by the TRIPOD-AI statement.

Model performance will be evaluated using receiver operating characteristic (ROC) curves and calibration plots, with internal validation by leave-one-out cross-validation. To select the optimal model, the area under the ROC curve (AUC) will be the primary criterion for discrimination. If models have similar AUC values, calibration will be compared using calibration plots and Brier scores. When discrimination and calibration are comparable, decision curve analysis will guide selection on the basis of clinical utility. This approach follows TRIPOD-AI recommendations to ensure transparency and clinical relevance in model development. The selected model will then be evaluated in an external validation dataset drawn from later CHARLS waves (eg, 2018 or 2020) that are not used for model development; this temporal separation provides a test of the model's generalisability.
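The evaluation and selection steps above rely on a handful of standard quantities. The stdlib-only sketch below computes them directly so that each definition is explicit; it is not the authors' pipeline. `fit` and `predict` are hypothetical callables standing in for any of the eight algorithms, and the AUC tolerance used to decide when models are "similar" is a hypothetical choice, since the protocol does not define one.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Mann-Whitney form of the ROC AUC: the probability that a random event
    receives a higher predicted risk than a random non-event (ties = 0.5).
    O(n_events * n_non_events), fine for a sketch."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of intervening when risk >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def loocv_predictions(X: List[list], y: List[int],
                      fit: Callable, predict: Callable) -> List[float]:
    """Leave-one-out: refit on all records but one, predict the held-out one."""
    preds = []
    for i in range(len(y)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, X[i]))
    return preds


def select_model(results: Dict[str, Tuple[float, float]],
                 auc_tol: float = 0.01) -> str:
    """results: model name -> (AUC, Brier). Models within auc_tol of the best
    AUC count as similar (hypothetical tolerance); among them the lowest
    Brier score wins. Decision curve analysis would break remaining ties."""
    best_auc = max(a for a, _ in results.values())
    similar = {m: b for m, (a, b) in results.items() if best_auc - a <= auc_tol}
    return min(similar, key=similar.get)


y_obs = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
print(auc(y_obs, risk), brier(y_obs, risk), net_benefit(y_obs, risk, 0.3))
```

In practice, AUC, Brier score and net benefit would be computed on the leave-one-out predictions, so that each record is scored by a model that never saw it.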
SHAP will be used to show the relative importance of each variable and to make the model's predictions interpretable. Finally, the optimal model will be deployed as an electronic risk calculator with SHAP-based visualisations to support clinical interpretation and decision-making in routine practice.
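SHAP attributes each prediction to the input features using Shapley values. To show what those values are, the sketch below computes them exactly by enumerating feature coalitions, with features outside a coalition set to baseline values (an interventional simplification assumed here). This is not the shap library, which uses faster model-specific or sampling-based estimators; exact enumeration needs on the order of 2^M model evaluations and is practical only for a few features. The `risk` model and its coefficients are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature for prediction f(x), where features
    outside a coalition are set to their baseline values."""
    m = len(x)

    def value(coalition: set) -> float:
        # Coalition members take the patient's values, the rest the baseline
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return f(z)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for k in range(m):  # coalition sizes 0 .. m-1 drawn from the others
            weight = math.factorial(k) * math.factorial(m - k - 1) / math.factorial(m)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def risk(z: List[float]) -> float:
    # Hypothetical logistic model: z[0] = age in years, z[1] = a 0/1 comorbidity
    return 1 / (1 + math.exp(-(-3.0 + 0.04 * z[0] + 0.8 * z[1])))


contributions = shapley_values(risk, x=[75, 1], baseline=[68, 0.3])
# Efficiency property: contributions sum to risk(x) - risk(baseline)
print(contributions, sum(contributions), risk([75, 1]) - risk([68, 0.3]))
```

Because the contributions add up to the difference between the patient's predicted risk and the baseline prediction, they can be displayed as additive bars, which is the idea behind SHAP-based visualisations in a risk calculator.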

Discussion

China currently has the largest number of older adults with diabetes in the world, totalling 35.5 million, and this number is projected to increase to 78.1 million by 2045.1 Early identification of individuals at high risk of frailty is important for delaying its onset, reducing complications and improving the prognosis of older adults with diabetes. This study aims to provide a thorough assessment of frailty risk in older adults with diabetes using a range of ML algorithms. Whereas other studies have identified predictive variables through univariable logistic regression analysis, this study selects model predictors on the basis of a systematic review and meta-analysis. Meta-analysis is an important research method and one of the strongest sources of evidence in evidence-based medicine,29 so this approach may help identify more useful predictor variables. Furthermore, data are the essential component driving model training and performance in ML. Our data source, CHARLS (2011–2020), led by Peking University, is a high-quality database that is broadly representative of China’s middle-aged and older adults. Consequently, we believe the model has significant potential for clinical application.

This study has some anticipated limitations. First, because of its retrospective design, some data may be missing. Second, the number of available predictors may be limited, potentially leading to model underfitting. Despite these limitations, we expect this study to make a valuable contribution to geriatric diabetes care.

Supplementary material

online supplemental table 1
bmjopen-15-9-s001.docx (23.1KB, docx)
DOI: 10.1136/bmjopen-2024-095312

Acknowledgements

We thank the China Health and Retirement Longitudinal Study team for providing the data and for their training in dataset usage.

Footnotes

Funding: This study is supported by the Natural Science Foundation of Hunan Province (grant number: 2023JJ50137).

Prepublication history and additional supplemental material for this paper are available online. To view these files, please visit the journal online (https://doi.org/10.1136/bmjopen-2024-095312).

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient consent for publication: Not applicable.

Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

References

  • 1. Magliano DJ, Boyko EJ; IDF Diabetes Atlas 10th edition scientific committee. IDF Diabetes Atlas [Internet]. 10th edition. International Diabetes Federation; 2021. Global picture.
  • 2. Sun H, Saeedi P, Karuranga S, et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183:109119. doi: 10.1016/j.diabres.2021.109119.
  • 3. Ulley J, Abdelhafiz AH. Frailty predicts adverse outcomes in older people with diabetes. Practitioner. 2017;261:17–20.
  • 4. Pilotto A, Custodero C, Maggi S, et al. A multidimensional approach to frailty in older people. Ageing Res Rev. 2020;60:101047. doi: 10.1016/j.arr.2020.101047.
  • 5. Castro-Rodríguez M, Carnicero JA, Garcia-Garcia FJ, et al. Frailty as a Major Factor in the Increased Risk of Death and Disability in Older People With Diabetes. J Am Med Dir Assoc. 2016;17:949–55. doi: 10.1016/j.jamda.2016.07.013.
  • 6. Yanase T, Yanagita I, Muta K, et al. Frailty in elderly diabetes patients. Endocr J. 2018;65:1–11. doi: 10.1507/endocrj.EJ17-0390.
  • 7. Strain WD, Down S, Brown P, et al. Diabetes and Frailty: An Expert Consensus Statement on the Management of Older Adults with Type 2 Diabetes. Diabetes Ther. 2021;12:1227–47. doi: 10.1007/s13300-021-01035-9.
  • 8. Rodriguez-Mañas L, Laosa O, Vellas B, et al. Effectiveness of a multimodal intervention in functionally impaired older people with type 2 diabetes mellitus. J Cachexia Sarcopenia Muscle. 2019;10:721–33. doi: 10.1002/jcsm.12432.
  • 9. Gill TM, Gahbauer EA, Allore HG, et al. Transitions between frailty states among community-living older persons. Arch Intern Med. 2006;166:418–23. doi: 10.1001/archinte.166.4.418.
  • 10. Wiens J, Shenoy ES. Machine Learning for Healthcare: On the Verge of a Major Shift in Healthcare Epidemiology. Clin Infect Dis. 2018;66:149–53. doi: 10.1093/cid/cix731.
  • 11. Collins GS, Moons KGM, Dhiman P, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. doi: 10.1136/bmj-2023-078378.
  • 12. Akbari G, Nikkhoo M, Wang L, et al. Frailty Level Classification of the Community Elderly Using Microsoft Kinect-Based Skeleton Pose: A Machine Learning Approach. Sensors (Basel). 2021;21:4017. doi: 10.3390/s21124017.
  • 13. Ambagtsheer RC, Shafiabady N, Dent E, et al. The application of artificial intelligence (AI) techniques to identify frailty within a residential aged care administrative data set. Int J Med Inform. 2020;136:104094. doi: 10.1016/j.ijmedinf.2020.104094.
  • 14. Aponte-Hao S, Wong ST, Thandi M, et al. Machine learning for identification of frailty in Canadian primary care practices. Int J Popul Data Sci. 2021;6:1650. doi: 10.23889/ijpds.v6i1.1650.
  • 15. Blanes-Selva V, Doñate-Martínez A, Linklater G, et al. Complementary frailty and mortality prediction models on older patients as a tool for assessing palliative care needs. Health Informatics J. 2022;28. doi: 10.1177/14604582221092592.
  • 16. Ji L, Hu D, Pan C, et al. Primacy of the 3B approach to control risk factors for cardiovascular disease in type 2 diabetes patients. Am J Med. 2013;126:925. doi: 10.1016/j.amjmed.2013.02.035.
  • 17. Zhao Y, Hu Y, Smith JP, et al. Cohort profile: the China Health and Retirement Longitudinal Study (CHARLS). Int J Epidemiol. 2014;43:61–8. doi: 10.1093/ije/dys203.
  • 18. Clinical Guidelines for the Prevention and Control of Type 2 Diabetes Mellitus in the Elderly in China (2022 Edition). Chin J Diabetes. 2022;30:2–51. doi: 10.3969/j.issn.1006-6187.2022.01.002.
  • 19. Fried LP, Tangen CM, Walston J, et al. Frailty in older adults: evidence for a phenotype. J Gerontol A Biol Sci Med Sci. 2001;56:M146–56. doi: 10.1093/gerona/56.3.m146.
  • 20. Ning H, Zhang H, Xie Z, et al. Relationship of hearing impairment, social participation and depressive symptoms to the incidence of frailty in a community cohort. J Am Geriatr Soc. 2023;71:1167–76. doi: 10.1111/jgs.18164.
  • 21. Bu F, Deng X-H, Zhan N-N, et al. Development and validation of a risk prediction model for frailty in patients with diabetes. BMC Geriatr. 2023;23:172. doi: 10.1186/s12877-023-03823-3.
  • 22. Andresen EM, Malmgren JA, Carter WB, et al. Screening for depression in well older adults: evaluation of a short form of the CES-D (Center for Epidemiologic Studies Depression Scale). Am J Prev Med. 1994;10:77–84.
  • 23. Zeng Z, Bian Y, Cui Y, et al. Physical Activity Dimensions and Its Association with Risk of Diabetes in Middle and Older Aged Chinese People. Int J Environ Res Public Health. 2020;17:7803. doi: 10.3390/ijerph17217803.
  • 24. Craig CL, Marshall AL, Sjöström M, et al. International physical activity questionnaire: 12-country reliability and validity. Med Sci Sports Exerc. 2003;35:1381–95. doi: 10.1249/01.MSS.0000078924.61453.FB.
  • 25. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020;368:m441. doi: 10.1136/bmj.m441.
  • 26. van Smeden M, Moons KG, de Groot JA, et al. Sample size for binary logistic prediction models: Beyond events per variable criteria. Stat Methods Med Res. 2019;28:2455–74. doi: 10.1177/0962280218784726.
  • 27. Riley RD, Snell KIE, Archer L, et al. Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study. BMJ. 2024;384:e074821. doi: 10.1136/bmj-2023-074821.
  • 28. Ensor J. PMSAMPSIZE: Stata module to calculate the minimum sample size required for developing a multivariable prediction model. 2023.
  • 29. Hernandez AV, Marti KM, Roman YM. Meta-Analysis. Chest. 2020;158:S97–102. doi: 10.1016/j.chest.2020.03.003.


    Articles from BMJ Open are provided here courtesy of BMJ Publishing Group
