BMJ Open. 2026 Jan 27;16(1):e103171. doi: 10.1136/bmjopen-2025-103171

Development and validation of a two-stage machine learning model for personalised type 2 diabetes screening in the All of Us Research Program and UK Biobank

Ahmed Khattab 1,2, Shang-Fu Chen 1,2, Hossein Javedani Sadaei 1,2, Nathan E Wineinger 1,2, Ali Torkamani 1,2
PMCID: PMC12853511  PMID: 41592831

Abstract

Objective

To develop and externally validate a two-stage machine learning framework that integrates polygenic risk and clinical variables for early identification of individuals at risk of developing type 2 diabetes.

Methods

We conducted a prospective prediction study using data from the All of Us Research Program for model development and the UK Biobank for external validation. Two models were constructed. Stage 1 used gradient boosted decision trees (XGBoost) with cross validation, automated hyperparameter optimisation and class weighting to predict 5-year incident type 2 diabetes using demographic, clinical and polygenic predictors. Stage 2 incorporated glycated haemoglobin or fasting glucose measurements to refine risk estimates. Model interpretation used SHapley Additive exPlanations values and permutation importance, and logistic regression and random forest models served as comparators. Discrimination of all models was compared using the DeLong test.

Results

The Stage 1 model achieved an area under the receiver operating characteristic curve (AUROC) of 0.81 in All of Us and 0.82 in UK Biobank, performing significantly better than the phenotype-only model in UK Biobank (DeLong p=1.05×10⁻⁷⁶). Higher polygenic risk quartiles were associated with increased incidence of type 2 diabetes in both cohorts (global χ2 p<0.001). The Stage 2 model achieved AUROC values of 0.78 in All of Us and 0.77 in UK Biobank. Subgroup performance was consistent across sex and ancestry groups, with CIs reported. Cost analysis suggested potential net savings compared with the American Diabetes Association test.

Conclusion

A two-stage machine learning framework that integrates genetic and clinical information can support personalised screening for type 2 diabetes across diverse populations. The approach demonstrated robust performance across cohorts and offers a practical structure for early risk identification.

Keywords: Diabetes Mellitus, Type 2; Risk Assessment; Primary Prevention


STRENGTHS AND LIMITATIONS OF THIS STUDY.

  • The models were developed using a large, diverse real-world cohort (All of Us Research Program (AoU)) and externally validated in an independent research cohort (UK Biobank (UKBB)), allowing rigorous assessment of transportability.

  • Multiple complementary explainability methods (impurity importance, permutation importance and SHapley Additive exPlanations (SHAP)) were applied and demonstrated stable predictor behaviour, supporting the robustness of model interpretation.

  • The two-stage framework mirrors clinical workflows by separating initial screening from glycaemic refinement, enabling evaluation of predictors across different stages of risk assessment.

  • Predictor sets were restricted to variables available in both AoU and UKBB to ensure comparable external validation, which may have excluded additional relevant risk factors.

  • Some ancestry subgroups had limited numbers of incident cases, reducing the precision of subgroup performance estimates.

Introduction

Type 2 diabetes mellitus (T2DM) accounts for over 90% of diabetes cases worldwide, posing a significant public health challenge due to its rising prevalence, complications and healthcare costs.1 2 Early detection is critical for preventing complications such as cardiovascular disease, neuropathy, nephropathy and retinopathy.3 Current screening tools, such as the American Diabetes Association (ADA) risk assessment test, rely on traditional clinical risk factors like age, body mass index, family history and lifestyle behaviours.4 While accessible and easy to use, these tools may lack the capacity for personalised risk evaluation and may not fully capture individual heterogeneity, potentially limiting the effectiveness of targeted preventive interventions.

Advancements in genomics have introduced polygenic risk scores (PRS), which aggregate the effects of numerous genetic variants across the genome to quantify an individual’s predisposition to T2DM.5 By identifying genetic susceptibility, PRS provides an opportunity for early and personalised risk assessment, particularly when combined with clinical factors. For example, Läll et al demonstrated that integrating PRS with traditional clinical risk factors modestly improved T2DM prediction compared with clinical models alone.6 Similarly, Khera et al developed a genome-wide PRS incorporating millions of genetic variants, showing that individuals in the highest risk percentiles had a markedly elevated risk of developing T2DM.7

Machine learning (ML) models have further improved T2DM risk prediction by capturing complex nonlinear relationships and handling large, high-dimensional datasets.8–10 Recent studies using gradient boosting, random forests and neural networks have consistently outperformed traditional regression-based tools by capturing interactions among demographic, anthropometric, behavioural and laboratory predictors.11 Integrating PRS into ML frameworks has shown promise, offering individualised risk assessments superior to models relying solely on clinical factors.12 13 However, many existing ML models lack external validation, rely on cross-sectional rather than prospective outcomes or are trained on relatively homogeneous populations, which limits their real-world applicability.11 13–15 These gaps underscore the need for ML approaches that integrate genetic and clinical factors, are trained in diverse real-world cohorts and undergo rigorous external evaluation across populations.

The All of Us (AoU) Research Program16 provides such an opportunity by offering a large, ethnically diverse cohort with real-world electronic health record (EHR) and genomic data. Its representation of demographic, socioeconomic and clinical variability allows for the development of screening models that better reflect the complexities encountered in practical, person-centred clinical environments. In contrast, research-centric cohorts such as the UK Biobank (UKBB) offer highly structured and complete data suitable for robust external validation.

In this study, we develop and evaluate a two-stage ML framework designed to improve personalised T2DM screening in real-world settings. Our approach addresses a prospective classification problem: predicting 5-year incident T2DM among individuals free of diabetes at baseline, using baseline phenotypic and genetic information. In the first stage, a PRS-Integrated Screening Model combines polygenic scores with readily available clinical predictors to identify individuals at elevated risk without requiring immediate glycaemic testing. Those flagged as at risk then undergo the second stage, where glycated haemoglobin (HbA1c) and fasting plasma glucose (FPG) values are incorporated into a PRS-Integrated Advanced Risk Model to refine risk estimation across the full glycaemic spectrum. We compare this framework with the ADA risk test and with a Phenotype-Only Model trained using the same predictors but excluding PRS, and externally validate performance in the UKBB. This design enables a clinically aligned evaluation of PRS within a two-stage workflow and directly addresses the need for prospective, externally validated screening tools for diverse populations.

Methods

We developed predictive models to estimate the 5-year risk of incident T2DM among individuals free of diabetes at baseline. Both the AoU and UKBB cohorts were restricted to participants without any evidence of prevalent diabetes based on EHR diagnoses, laboratory measurements or self-reported diabetes. The outcome was defined as incident T2DM within 5 years of baseline assessment, identified using longitudinal EHR and laboratory measurements as detailed below. Further methodological details are provided in online supplemental materials.

Study population and data sources

All of Us Research Program

The AoU dataset was used for model training, excluding participants without whole genome sequencing (WGS) data and those with type 1 diabetes mellitus (T1DM) codes. The final cohort included 11 567 incident T2DM cases and 117 896 controls, totalling 129 463 participants (figure 1a). The data represent approximately 5 years of longitudinal follow-up within the AoU programme, from which incident T2DM cases were identified based on EHRs and laboratory data. AoU is a demographically and socioeconomically diverse cohort that reflects real-world clinical practice in the USA, including substantial variability in health status, care access and data completeness. This diversity and natural missingness make AoU well suited for developing a screening model intended for broad clinical implementation.

Figure 1. Overview of the study design and two-stage screening framework. (a) Workflow for model development in the All of Us cohort, external validation in UK Biobank and comparison with the ADA risk test. (b) The proposed two-stage screening approach, consisting of an initial PRS-Integrated Screening Model followed by a PRS-Integrated Advanced Risk Model incorporating glycaemic measures. ADA, American Diabetes Association; HbA1c, glycated haemoglobin; ICD, International Classification of Diseases; PRS, polygenic risk scores; SHAP, SHapley Additive exPlanations; T1D, type 1 diabetes; T2D, type 2 diabetes; UKBB, UK Biobank; WGS, whole genome sequencing; XGBoost, Extreme Gradient Boosting. Created in BioRender by Khattab.21 Source: Khattab, A. (2026) https://BioRender.com/e09ywha.

UK Biobank

UKBB was used exclusively for external validation. Of 502 414 participants, we excluded individuals who withdrew consent, lacked EHRs, had T1DM codes, had no HbA1c measurements or developed T2DM more than 5 years after recruitment. The final sample included 73 281 participants (6718 incident T2DM cases; 66 563 controls). These individuals were selected from the broader UKBB incidence cohort (n=166 623) to match the 5-year follow-up window used in AoU. UKBB represents a structured research cohort with more complete phenotype and laboratory data and predominantly European ancestry, enabling robust validation in a contrasting data environment.

Definition of medical diagnoses

T2DM was defined using several criteria. A diagnosis was assigned if participants had ICD-10 (International Classification of Diseases, Tenth Revision) codes E11.0 to E11.9 or ICD-9 (International Classification of Diseases, Ninth Revision) codes 250.xx in their EHRs. Laboratory measurements were also used, with an HbA1c value of 6.5% (48 mmol/mol) or higher, or an FPG value of 126 mg/dL (7.0 mmol/L) or higher, indicating T2DM. In the UKBB, Read codes were also used to define T2DM. Additionally, individuals who self-reported T2DM during the recruitment interview were considered to have the condition. Other medical diagnoses were defined using both ICD-9 and ICD-10 codes present in the EHRs. The specific codes used for all conditions are listed in online supplemental table 1.
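The criteria above amount to a simple "any of" rule. The following is a minimal sketch of that rule, not the authors' code: the function and argument names are hypothetical, prefix matching on E11 and 250 is a simplification of the full code lists in the supplement, and Read codes are omitted.

```python
from typing import Iterable, Optional

HBA1C_THRESHOLD_PERCENT = 6.5   # equivalent to 48 mmol/mol
FPG_THRESHOLD_MG_DL = 126.0     # equivalent to 7.0 mmol/L


def has_t2dm_evidence(
    icd10_codes: Iterable[str] = (),
    icd9_codes: Iterable[str] = (),
    hba1c_percent: Optional[float] = None,
    fpg_mg_dl: Optional[float] = None,
    self_reported: bool = False,
) -> bool:
    """Return True if any criterion described in the text is met."""
    # Diagnosis codes: ICD-10 E11.0 to E11.9, or ICD-9 250.xx.
    if any(code.upper().startswith("E11") for code in icd10_codes):
        return True
    if any(code.startswith("250") for code in icd9_codes):
        return True
    # Laboratory thresholds; missing measurements are simply skipped.
    if hba1c_percent is not None and hba1c_percent >= HBA1C_THRESHOLD_PERCENT:
        return True
    if fpg_mg_dl is not None and fpg_mg_dl >= FPG_THRESHOLD_MG_DL:
        return True
    return self_reported
```

For example, `has_t2dm_evidence(hba1c_percent=6.4, fpg_mg_dl=125.0)` is false because both values fall just below their thresholds, whereas a single E11.9 code is sufficient on its own.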

Data imputation

Imputation was not conducted in the AoU dataset due to the degree and structure of missingness. Instead, the dataset was used in its entirety, allowing for missing values, as XGBoost models can handle missing data points during predictions. In contrast, the UKBB dataset underwent imputation for variables with less than 20% missing data or responses recorded as ‘Do not know’ or ‘Prefer not to answer’ using the multiple imputation by chained equations algorithm.17 The imputation process incorporated all available variables, including those with complete data, to maximise the use of the information present. Ten iterations were performed using the ‘miceRanger’ package in R to produce three imputed datasets, which were subsequently averaged into a final dataset. The convergence between imputed and reported values for all variables is illustrated in online supplemental figure 1.

To evaluate the robustness of our findings to the missing-data strategy, we performed a sensitivity analysis by comparing model performance on the imputed and non-imputed UKBB datasets. The same model was applied to both datasets after aligning individuals and outcomes. We compared the area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC) between the two datasets, using a paired DeLong test.18

Predictor selection

Predictors included demographic characteristics, body composition measures, family history of diabetes, lifestyle factors, clinical conditions, laboratory biomarkers and 73 PRS selected from the PRS Catalog19 covering various biological pathways relevant to T2DM (online supplemental tables 2 and 3). These PRS represented genetic predispositions related to cardiovascular diseases, lipid metabolism, glycaemic traits, anthropometric measures, cognitive disorders, renal function, psychological conditions, sleep disorders and various metabolic and endocrine disorders, including insulin resistance and secretion. This set explicitly included T2DM-specific PRS in addition to related metabolic and glycaemic traits, ensuring that both direct and pathway-level genetic susceptibility to diabetes were represented. By including a broad spectrum of PRS, we aimed to capture the genetic heterogeneity and complex aetiology of T2DM. Only variables available in both UKBB and AoU datasets were included, resulting in 39 mutual phenotypic variables and the set of PRS.

PRS derived from genome-wide association studies (GWAS) conducted on the UKBB dataset were excluded to avoid circularity in model validation. This ensured that the PRS used in the study were independent of the validation dataset, strengthening the external validity of the results.

PRS calculation

PRS were generated using the array-based genotyping data from the UKBB after imputation, using the standard weighted sum of allele effects. This method involves summing the products of risk allele dosages and their corresponding effect sizes from GWAS, followed by standardisation to ensure comparability across individuals.20 Similarly, PRS were calculated from WGS data in the AoU dataset using our AoU-specific PRS calculator, AoUPRS.21 The AoUPRS tool employs the same standard weighted sum of allele effects and standardisation approach as used in the UKBB. By applying consistent calculation methodologies across both datasets, we ensured that the PRS were comparable, facilitating reliable external validation of our models.
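The weighted-sum calculation can be illustrated in a few lines. This is a sketch under stated assumptions rather than the AoUPRS implementation: the variant identifiers and effect sizes are invented, a variant missing from a genotype is treated as dosage 0, and standardisation is a within-cohort z-score.

```python
from statistics import mean, pstdev
from typing import Dict, List


def raw_prs(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0 to 2) times GWAS effect sizes."""
    # Assumption: a variant absent from `dosages` contributes nothing.
    return sum(weights[v] * dosages.get(v, 0.0) for v in weights)


def standardise(scores: List[float]) -> List[float]:
    """Convert raw scores to z-scores within a cohort."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sigma for s in scores]


# Hypothetical variants and effect sizes, for illustration only.
weights = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}
cohort = [
    {"rs_a": 0, "rs_b": 2, "rs_c": 1},
    {"rs_a": 1, "rs_b": 1, "rs_c": 1},
    {"rs_a": 2, "rs_b": 0, "rs_c": 2},
]
raw = [raw_prs(person, weights) for person in cohort]
z = standardise(raw)
```

Because the z-scores are computed within each cohort, a score of 1.0 means "one standard deviation above that cohort's mean", which is what makes scores from array-based and WGS-based pipelines comparable in the sense described above.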

Baseline models for comparative analysis

To benchmark the independent contributions of genetic predictors and linear modelling approaches, we constructed several baseline reference models that were not part of the two-stage XGBoost screening workflow. These baseline models were used solely to quantify the incremental value of adding PRS and to provide interpretable traditional benchmarks. They included:

  1. Baseline Phenotype-Only Logistic Regression Model, incorporating demographic, anthropometric, behavioural and clinical predictors without genetic variables. This model establishes the performance of a traditional linear risk-prediction framework based on routine clinical factors.

  2. Baseline Composite PRS Logistic Regression Model, which adds a single aggregated PRS summary score to the baseline phenotype-only model. This tests whether a simplified one-number genetic measure improves discrimination over clinical predictors alone.

  3. Baseline LASSO PRS Model, which incorporates all 73 curated T2DM-related PRS within an L1-regularised logistic regression framework. This model evaluates whether jointly modelling multiple PRS with automatic coefficient shrinkage and selection enhances prediction beyond the composite score.

All baseline models were trained on the AoU dataset and evaluated in the UKBB cohort using identical preprocessing to ensure fair comparison. These models are conceptually distinct from the XGBoost-based Phenotype-Only Model and PRS-Integrated Screening Model used in the primary two-stage workflow; the baseline models serve as linear reference points for benchmarking.

Machine learning pipeline

The primary predictive models used in the two-stage screening framework were built using XGBoost, a gradient-boosted decision tree algorithm well suited for large, heterogeneous clinical datasets with mixed variable types and missingness.8 XGBoost was selected a priori for all main models because of its strong performance in high-dimensional tabular data and its native ability to handle missing values without imputation.

All models were trained exclusively on the AoU dataset to leverage its real-world population diversity. To ensure a rigorous evaluation of generalisability, all trained models were then externally validated in the UKBB cohort using identical preprocessing steps, aligned predictors and a harmonised outcome definition.

Model training and hyperparameter tuning

The AoU dataset was randomly split into training (80%) and testing (20%) sets. Hyperparameter tuning was performed using Optuna, an automated optimisation framework.22 Each trial used five-fold cross-validation within the training set to estimate out-of-fold performance. We explored a predefined parameter search space covering gamma, max_depth, min_child_weight, subsample, colsample_bytree, learning_rate, n_estimators, scale_pos_weight, reg_alpha and reg_lambda. Early stopping was applied with a patience of 10 rounds to prevent overfitting.

Handling class imbalance

Because incident T2DM was less frequent than non-diabetes cases, we addressed class imbalance by adjusting the scale_pos_weight parameter. This method increases the weight of positive cases during training, enabling the model to better learn minority-class patterns without introducing synthetic data.
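As a rough illustration of what positive-class weighting does, the sketch below computes the commonly used starting value for scale_pos_weight (the ratio of negatives to positives) and shows how such a weight enters a log-loss. The study tuned scale_pos_weight within the hyperparameter search, so the ratio here is only a heuristic reference point, and the function names are hypothetical.

```python
import math
from typing import Sequence


def scale_pos_weight_heuristic(n_negative: int, n_positive: int) -> float:
    """Common starting value: ratio of negative to positive examples."""
    if n_positive <= 0:
        raise ValueError("need at least one positive case")
    return n_negative / n_positive


def weighted_log_loss(
    y_true: Sequence[int], p_pred: Sequence[float], pos_weight: float
) -> float:
    """Mean log-loss in which each positive example counts pos_weight times."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Counts for the full AoU cohort as reported in the text (roughly 10.19).
aou_weight = scale_pos_weight_heuristic(n_negative=117_896, n_positive=11_567)
```

With a weight near 10, an error on one incident case costs about as much as the same error on ten controls, which pushes the trees to fit the minority class without generating synthetic samples.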

Model interpretation

To enhance interpretability, we used SHapley Additive exPlanations (SHAP) to quantify each predictor’s contribution to individual predictions.23 SHAP provides locally accurate, model-agnostic feature attributions, allowing us to visualise the predictors driving risk across the T2DM spectrum.

To assess the stability and robustness of model interpretability across ensemble methods, we conducted a secondary analysis using a Random Forest classifier trained on the same feature set, training split and outcome definition as the XGBoost screening model. This model was not used for primary prediction but served to compare impurity-based, permutation-based and SHAP-based feature attributions across bagging and boosting approaches. A Random Forest with 500 estimators, balanced class weights and max_features = ‘sqrt’ was trained without hyperparameter tuning, as the purpose was to evaluate the consistency of feature attribution methods rather than optimise predictive performance.

Primary models in the two-stage workflow

This XGBoost pipeline was applied to construct:

Phenotype-only model

Uses demographic, anthropometric, behavioural and clinical predictors available in both AoU and UKBB. This model serves as the primary comparator for evaluating the incremental value of PRS.

PRS-integrated screening model

Incorporates all phenotypic predictors plus 73 curated PRS. This model estimates incident T2DM risk without requiring immediate glycaemic testing and forms Stage 1 of the proposed screening framework.

PRS-integrated advanced risk model

Adds HbA1c and FPG to the feature set to refine risk estimation for individuals flagged as at risk in Stage 1. This constitutes Stage 2 of the screening workflow. All models were trained and evaluated using the same pipeline, ensuring comparability across stages and between phenotype-only and PRS-enhanced approaches.

Implementation details

All analyses were conducted using the Python programming language (V.3.11.4), with XGBoost implemented through the xgboost library (V.2.0.1). Optuna (V.3.5.0) was used for hyperparameter optimisation, and SHAP (V.0.46.0) was employed for model interpretation. A fixed random seed of 42 was set to ensure the reproducibility of results.

Screening approach and risk stratification

Our study employs a two-stage framework designed for practical integration into real-world clinical workflows. In Stage 1, the PRS-Integrated Screening Model combines polygenic risk with demographic and routinely available clinical variables to estimate risk without requiring immediate laboratory testing. This enables early identification of individuals who may benefit from targeted glycaemic evaluation.

Individuals flagged as at risk in Stage 1 proceed to glycaemic assessment, such as HbA1c or FPG. In Stage 2, the PRS-Integrated Advanced Risk Model incorporates glycaemic measures alongside PRS and clinical features to refine individual risk estimates across the full glycaemic spectrum. This two-stage structure supports both broad population screening and precise stratification among those with normal, borderline or elevated glycaemic values, enabling earlier and more personalised intervention strategies (figure 1b).
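The control flow of the two stages can be sketched as follows. This is an illustrative outline only: the toy scoring functions stand in for the trained XGBoost models, and all names, feature keys and the 0.3 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Features = Dict[str, float]


@dataclass
class ScreeningResult:
    stage1_risk: float
    flagged: bool
    stage2_risk: Optional[float] = None  # only set when glycaemic data are used


def two_stage_screen(
    baseline: Features,
    stage1_model: Callable[[Features], float],
    stage2_model: Callable[[Features], float],
    stage1_threshold: float,
    get_glycaemic: Callable[[], Features],
) -> ScreeningResult:
    """Run Stage 1; request glycaemic tests and run Stage 2 only if flagged."""
    risk1 = stage1_model(baseline)
    if risk1 < stage1_threshold:
        return ScreeningResult(stage1_risk=risk1, flagged=False)
    # The laboratory test is only ordered for flagged individuals.
    combined = {**baseline, **get_glycaemic()}
    return ScreeningResult(risk1, True, stage2_model(combined))


# Toy stand-ins for the trained models (assumptions, not fitted classifiers).
def toy_stage1(f: Features) -> float:
    return min(1.0, 0.01 * f["bmi"] + 0.1 * f["prs_z"])


def toy_stage2(f: Features) -> float:
    return min(1.0, 0.01 * f["hba1c_mmol_mol"])


low = two_stage_screen({"bmi": 22.0, "prs_z": -1.0}, toy_stage1, toy_stage2,
                       0.3, lambda: {"hba1c_mmol_mol": 36.0})
high = two_stage_screen({"bmi": 33.0, "prs_z": 1.5}, toy_stage1, toy_stage2,
                        0.3, lambda: {"hba1c_mmol_mol": 44.0})
```

Passing the glycaemic lookup as a callable reflects the design intent that laboratory testing is deferred until Stage 1 has flagged the individual.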

Thresholds for both models were selected using Youden’s index to optimise the balance between sensitivity and specificity; however, operating points may be adjusted by health systems to prioritise early detection, manage clinical workload or align with prevention programme capacity.
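Youden's index is J = sensitivity + specificity − 1, and the chosen threshold is the one that maximises J. A minimal sketch of that selection, using invented labels and scores and a simple quadratic-time scan over candidate thresholds, might look like this:

```python
from typing import List, Tuple


def youden_threshold(y_true: List[int], scores: List[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A participant is classified positive when score >= threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Invented example: 3 cases and 5 controls.
y = [0, 0, 0, 0, 1, 0, 1, 1]
s = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]
threshold, j_stat = youden_threshold(y, s)  # threshold 0.35, J = 0.8
```

A health system that prefers higher sensitivity could instead pick the largest threshold that keeps sensitivity above a target, at the cost of a lower J.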

Comparison of models

The performance of the PRS-Integrated Screening Model was compared with a Phenotype-Only Model and the ADA risk assessment test. These comparisons were conducted at matched specificity levels, focusing on differences in the number of true positives (TP) and false negatives (FN) identified by each method. Additionally, the PRS-Integrated Screening Model’s performance was analysed across quartiles of the PRS distribution to evaluate its ability to capture individuals at varying levels of genetic risk.

To compare the discriminative performance of the baseline logistic regression models and the PRS-Integrated XGBoost model, we performed pairwise DeLong tests for the difference in ROC-AUC.18 These tests were conducted in the UKBB external validation cohort, which served as an independent dataset not used for model training. Six prespecified model pairs were evaluated to quantify the incremental value of (1) adding a composite PRS, (2) incorporating multiple PRS simultaneously and (3) applying nonlinear machine-learning approaches. DeLong tests and corresponding CIs were computed using the MLstatkit package.24

Model evaluation and validation

Model performance was assessed using ROC-AUC, PR-AUC and the F1 score. The models were trained on the AoU dataset to reflect real-world data constraints and externally validated on the UKBB cohort to evaluate generalisability across diverse populations. This validation approach ensured robustness and consistency of the models’ predictive accuracy in identifying T2DM risk across heterogeneous cohorts.
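For readers less familiar with these metrics, the sketch below computes ROC-AUC (in its pairwise, Mann-Whitney form) and the F1 score from first principles on invented data. It is meant to show what the numbers measure, not to replace a library implementation; PR-AUC is omitted for brevity.

```python
from typing import List


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """ROC-AUC as the probability that a random case outscores a random control.

    Ties count as one half (Mann-Whitney formulation). O(n_pos * n_neg).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


labels = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, risk)                        # 3 of 4 case-control pairs ordered correctly
predicted = [1 if r >= 0.35 else 0 for r in risk]  # threshold chosen for illustration
f1 = f1_score(labels, predicted)
```

ROC-AUC is threshold-free, whereas F1 depends on the operating point, which is why the two can move differently between cohorts with different case rates.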

Patient and public involvement

None.

Results

Cohort characteristics

Within a 5-year post-recruitment window in the UKBB and using longitudinal EHR data in AoU, 6718 UKBB participants (9.16%) developed T2DM compared with 11 567 AoU participants (8.93%). The AoU cohort was younger (mean age 48.7 vs 56.9 years) and included a smaller proportion of males (36.8% vs 45.2%) compared with UKBB (table 1).

Table 1. Baseline characteristics of the UK Biobank and All of Us study participants. Values are mean (±SD) or count (%).

| Predictors | UK Biobank (n=73 281): Case (n=6718) | UK Biobank: Control (n=66 563) | All of Us (n=129 463): Case (n=11 567) | All of Us: Control (n=117 896) |
|---|---|---|---|---|
| Demographics | | | | |
| Age (year) | 59.65±7.09 | 56.70±8.09 | 54.29±16.10 | 48.10±16.95 |
| Sex (male) | 3954 (58.86%) | 29 190 (43.85%) | 4662 (40.30%) | 43 050 (36.52%) |
| Body compositions | | | | |
| BMI | 31.68±5.58 | 27.11±4.52 | 30.84±7.96 | 28.79±7.10 |
| Hip circumference | 109.25±11.23 | 102.96±8.84 | 110.39±15.86 | 106.89±14.37 |
| Waist circumference | 102.73±13.38 | 89.32±12.86 | 99.42±17.69 | 93.04±16.81 |
| Family history | | | | |
| Type 2 diabetes | 2479 (36.90%) | 13 423 (20.17%) | 906 (7.83%) | 10 042 (8.52%) |
| Clinical conditions | | | | |
| Essential hypertension | 4755 (70.78%) | 19 324 (29.03%) | 6360 (54.98%) | 24 841 (21.07%) |
| Hyperlipidaemia | 2844 (42.33%) | 9227 (13.86%) | 5508 (47.62%) | 26 912 (22.83%) |
| Obesity | 1450 (21.58%) | 3968 (5.96%) | 3908 (33.79%) | 16 829 (14.27%) |
| NAFLD | 371 (5.52%) | 833 (1.25%) | 1399 (12.09%) | 3792 (3.22%) |
| Chronic ischaemic heart disease | 1788 (26.62%) | 5998 (9.01%) | 1838 (15.89%) | 4796 (4.07%) |
| Sleep disorders | 528 (7.86%) | 1356 (2.04%) | 3675 (31.77%) | 16 948 (14.38%) |
| Depression | 794 (11.82%) | 4065 (6.11%) | 3795 (32.81%) | 20 565 (17.44%) |
| Anxiety | 495 (7.37%) | 2936 (4.41%) | 3656 (31.61%) | 19 395 (16.45%) |
| Acute renal failure | 749 (11.15%) | 2343 (3.52%) | 1291 (11.16%) | 1263 (1.07%) |
| Chronic renal failure | 771 (11.48%) | 2340 (3.52%) | 1144 (9.89%) | 2551 (2.16%) |
| Alcohol abuse | 292 (4.35%) | 1367 (2.05%) | 1006 (8.70%) | 4053 (3.44%) |
| Tobacco harmful use | 627 (9.33%) | 3280 (4.93%) | 1022 (8.84%) | 5587 (4.74%) |
| Biomarker | | | | |
| HbA1c (mmol/mol) | 40.98±4.29 | 34.87±3.60 | 39.21±10.33 | 35.60±4.24 |
| FPG (mmol/L) | 5.61±1.26 | 4.93±0.58 | 5.92±1.82 | 5.10±0.85 |
| Creatinine (μmol/L) | 75.51±23.33 | 71.95±17.68 | 82.69±61.83 | 74.21±27.93 |
| HDL cholesterol (mmol/L) | 1.23±0.32 | 1.47±0.39 | 1.41±0.46 | 1.51±0.45 |
| Triglycerides (mmol/L) | 2.32±1.27 | 1.71±0.98 | 3.37±2.15 | 2.90±1.76 |
| Alanine aminotransferase (U/L) | 30.97±19.20 | 22.89±13.42 | 31.93±73.56 | 24.96±57.10 |
| Aspartate aminotransferase (U/L) | 29.74±15.52 | 26.01±9.74 | 35.28±382.15 | 24.92±46.01 |

BMI, body mass index; FPG, fasting plasma glucose; HbA1c, glycated haemoglobin; HDL, high-density lipoprotein; NAFLD, non-alcoholic fatty liver disease.

Comorbidities were generally more prevalent in AoU, consistent with prior findings that diseases are more common in AoU than in UKBB or the general US population.25 Anxiety and depression rates were significantly higher in AoU across both cases and controls (eg, depression: 32.8% vs 17.4%) compared with UKBB (11.8% vs 6.1%). Despite these differences in absolute prevalence, the relative case-control ratios for these conditions were similar between cohorts (eg, depression: 1.88× in AoU vs 1.93× in UKBB; anxiety: 1.92× in AoU vs 1.67× in UKBB). Cardiovascular conditions, such as chronic ischaemic heart disease, were more common in UKBB cases (26.6% vs 15.9% in AoU), reflecting the older age of its cohort.
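The case-control ratios quoted above follow directly from the Table 1 percentages. A short check of the arithmetic (prevalence among cases divided by prevalence among controls):

```python
# Prevalence (%) among cases and controls, taken from Table 1.
prevalence = {
    ("Depression", "AoU"): (32.81, 17.44),
    ("Depression", "UKBB"): (11.82, 6.11),
    ("Anxiety", "AoU"): (31.61, 16.45),
    ("Anxiety", "UKBB"): (7.37, 4.41),
}

# Ratio of case prevalence to control prevalence, rounded to two decimals.
ratios = {
    key: round(case_pct / control_pct, 2)
    for key, (case_pct, control_pct) in prevalence.items()
}

for (condition, cohort), ratio in ratios.items():
    print(f"{condition:<10} {cohort:<4} {ratio:.2f}x")
```

The ratios come out at 1.88 and 1.93 for depression and 1.92 and 1.67 for anxiety (AoU and UKBB respectively), matching the values in the text.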

Metabolic conditions were notably more prevalent in AoU. Obesity affected 33.8% of cases in AoU compared with 21.6% in UKBB, while hyperlipidaemia was also higher in AoU cases (47.6% vs 42.3%). Non-alcoholic fatty liver disease was more common in AoU cases than in UKBB (12.1% vs 5.5%), reflecting the greater metabolic burden in AoU. Elevated alanine and aspartate aminotransferase levels in AoU cases further highlight this metabolic disparity. Additionally, lifestyle factors such as alcohol abuse and tobacco harmful use were more prevalent in AoU, underscoring differences in behavioural risk factors.

These findings illustrate the challenges of risk stratification in AoU, where controls exhibit significant metabolic and behavioural risk factors and are younger, increasing their likelihood of transitioning to higher-risk groups over time. This underscores the importance of regularly updating risk assessment tools to account for evolving phenotypes and ensure timely identification of at-risk individuals.

Comparison of PRS-integrated screening and phenotype-only models

AoU internal evaluation

When tested on the 20% holdout set from the AoU cohort, the PRS-Integrated Screening Model showed comparable performance to the Phenotype-Only Model, with equivalent ROC-AUC (0.81) and PR-AUC (0.33) and a marginally improved F1 score (0.31 vs 0.30). Case identification improvements across PRS quartiles were modest, peaking at 3.6% in Q4 (figure 2b). A global χ2 test confirmed significant differences in rescued-case proportions across quartiles (p=0.003), although the modest improvements in AoU likely reflect the cohort’s higher baseline metabolic risk, which reduces the additional predictive value contributed by PRS.

Figure 2. Contribution of the T2DM polygenic risk score to case identification in the PRS-Integrated Screening Model. Percentage of additional incident T2DM cases (‘rescued cases’) identified by the PRS-Integrated Screening Model relative to the Phenotype-Only Model across quartiles of the T2DM PRS distribution in (a) UKBB and (b) AoU. Higher rescued-case proportions in upper PRS quartiles indicate greater incremental utility of genetic information among individuals with higher inherited risk. AoU, All of Us; PRS, polygenic risk score; T2DM, type 2 diabetes mellitus; UKBB, UK Biobank.

External validation in UKBB

When tested on the UKBB, the PRS-Integrated Screening Model outperformed the Phenotype-Only Model by a wider margin, with higher ROC-AUC (0.82 vs 0.80), PR-AUC (0.29 vs 0.26) and F1 score (0.34 vs 0.33). Case identification gains were more pronounced in high genetic risk groups, ranging from 2% in Q1 to 12.5% in Q4 (figure 2a). Differences across quartiles were highly significant (global χ2 p=2.3×10⁻²³), reflecting clear enrichment of PRS benefit in individuals with greater genetic predisposition. These findings underscore the stronger utility of PRS in UKBB, a cohort with a healthier baseline and lower metabolic disease burden.
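One way to compute a "rescued case" percentage per PRS quartile is sketched below. The exact definition used in the study is not spelled out here, so this assumes a rescued case is an incident case flagged by the PRS-Integrated Screening Model but missed by the Phenotype-Only Model, expressed as a percentage of all incident cases in that quartile. The data and names are invented.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Case(NamedTuple):
    prs_quartile: int         # 1 (lowest genetic risk) to 4 (highest)
    flagged_phenotype: bool   # flagged by the Phenotype-Only Model
    flagged_prs: bool         # flagged by the PRS-Integrated Screening Model


def rescued_percent_by_quartile(cases: List[Case]) -> Dict[int, float]:
    """Percentage of incident cases per quartile found only by the PRS model."""
    totals: Dict[int, int] = defaultdict(int)
    rescued: Dict[int, int] = defaultdict(int)
    for c in cases:
        totals[c.prs_quartile] += 1
        if c.flagged_prs and not c.flagged_phenotype:
            rescued[c.prs_quartile] += 1
    return {q: 100.0 * rescued[q] / totals[q] for q in sorted(totals)}


# Invented incident cases: four in Q1 and four in Q4.
toy_cases = [
    Case(1, True, True), Case(1, False, False), Case(1, True, True), Case(1, False, False),
    Case(4, True, True), Case(4, False, True), Case(4, True, True), Case(4, False, False),
]
rescued_pct = rescued_percent_by_quartile(toy_cases)
```

In this toy example no Q1 case is rescued, while one of the four Q4 cases is, giving 0% and 25%; the pattern, not the magnitude, mirrors the enrichment in upper quartiles reported above.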

Explainability analysis

SHAP analysis (online supplemental figure 2) highlighted key differences in predictor importance between cohorts. In AoU, metabolic, psychological and behavioural risk factors dominated, reflecting the cohort’s higher baseline metabolic risk. PRS played a less prominent role, potentially because the physical expression of genetic risk was captured by the higher metabolic disease burden observed in this cohort. The accelerated expression of genetic risk observed in the AoU cohort leads to greater redundancy in the predictive value of genetic versus clinical contributors to risk, diminishing the utility of genetics. In contrast, in the healthier UKBB cohort, PRS emerged as a key contributor, particularly in Q4, where it helped identify high-risk individuals. The lower metabolic disease burden in UKBB allowed PRS to play a more central role, as the healthier cohort made genetic contributions more distinguishable.

Additional model explainability analyses using impurity-based feature importance, permutation importance and SHAP values demonstrated consistent predictor rankings, supporting the robustness and stability of the XGBoost model (online supplemental figures 2-3).

A Random Forest classifier trained on the same dataset was included solely for explainability sensitivity analysis. Its predictive performance was lower than that of the XGBoost PRS-Integrated Screening Model (AUC=0.795, PR-AUC=0.278, F1=0.029), and the model was therefore not pursued further for primary prediction tasks.

Although both XGBoost and Random Forest identified similar high-impact clinical predictors, Random Forest exhibited substantially noisier SHAP value distributions and attenuated mid-tier feature effects relative to XGBoost. Quantitatively, SHAP-based rankings showed strong agreement for the top predictors (Spearman ρ=0.805, p=1.90×10⁻⁵), but only moderate concordance across all features (ρ=0.402, p=1.32×10⁻⁵). Agreement was similarly moderate for permutation importance (ρ=0.408, p=9.7×10⁻⁶), and impurity-based importance showed minimal correlation (ρ=0.069, p=0.48), reflecting known methodological differences between bagging and boosting. Full results are provided in the Supplementary Results and online supplemental figure 4.

Comparison of PRS-integrated screening model and ADA risk test

The comparison was limited to UKBB due to the lack of physical activity data in AoU, a key predictor in the ADA risk test. The PRS-Integrated Screening Model outperformed the ADA test, achieving higher ROC-AUC (0.82 vs 0.77) and PR-AUC (0.29 vs 0.27). The PRS-Integrated Model also demonstrated a net reclassification improvement (NRI) of 5.2%, underscoring its superior predictive performance.
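For readers unfamiliar with the NRI, the sketch below shows one common formulation, the two-category NRI, in which each model classifies a person as high or low risk at a fixed cut-off. This is an assumption made for illustration: the text above does not state which NRI variant was used, and the data are invented.

```python
def net_reclassification_improvement(y_true, old_high, new_high):
    """Two-category NRI.

    y_true   : 1 for people who developed the outcome, 0 otherwise
    old_high : True where the reference model flags 'high risk'
    new_high : True where the new model flags 'high risk'
    """
    up_events = down_events = up_nonevents = down_nonevents = 0
    n_events = sum(y_true)
    n_nonevents = len(y_true) - n_events

    for y, old, new in zip(y_true, old_high, new_high):
        moved_up = new and not old
        moved_down = old and not new
        if y == 1:
            up_events += moved_up
            down_events += moved_down
        else:
            up_nonevents += moved_up
            down_nonevents += moved_down

    # Events should move up; non-events should move down.
    nri_events = (up_events - down_events) / n_events
    nri_nonevents = (down_nonevents - up_nonevents) / n_nonevents
    return nri_events + nri_nonevents


# Invented example: 4 events followed by 6 non-events.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
old_high = [True, False, False, True, True, True, False, False, False, False]
new_high = [True, True, False, True, True, False, False, False, False, False]

nri = net_reclassification_improvement(y_true, old_high, new_high)
print(f"NRI = {nri:.3f}")
```

Here one event is correctly moved up (1/4) and one non-event is correctly moved down (1/6), so both components contribute positively.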

Performance of PRS-integrated advanced risk model

AoU internal evaluation

The PRS-Integrated Advanced Risk Model was trained on individuals with HbA1c measurements in the AoU cohort (2109 cases and 8493 controls). When tested within AoU, the model achieved a sensitivity of 68%, specificity of 74%, ROC-AUC of 0.78, PR-AUC of 0.54 and F1 score of 0.50, demonstrating its ability to predict T2DM in a diverse, real-world population characterised by significant variability in metabolic risk factors.

External validation in UKBB

The model was validated on individuals identified as high-risk by the PRS-Integrated Screening Model in UKBB, simulating the intended two-stage screening workflow. Within this high-risk subset, the model achieved a sensitivity of 66%, specificity of 74%, ROC-AUC of 0.77, PR-AUC of 0.53 and F1 score of 0.50. These results highlight the model’s generalisability across two distinct cohorts, demonstrating its utility in both diverse, real-world populations and more structured settings. Figure 3 compares the performance of all models, including the ADA risk test, in both AoU and UKBB.
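The two-stage workflow simulated in this validation can be sketched as follows. The scoring functions and cut-offs are hypothetical stand-ins for the trained Screening and Advanced Risk models, whose thresholds are not given in this section; only the control flow is meant to be illustrative.

```python
from typing import Callable, Dict, List

Person = Dict[str, float]


def two_stage_screen(
    people: List[Person],
    stage1_score: Callable[[Person], float],
    stage2_score: Callable[[Person], float],
    stage1_cutoff: float,
    stage2_cutoff: float,
) -> List[dict]:
    """Apply stage 2 only to people flagged as high risk by stage 1."""
    results = []
    for person in people:
        s1 = stage1_score(person)
        if s1 < stage1_cutoff:
            # Low risk at screening: no glycaemic testing is requested.
            results.append({"stage1": s1, "stage2": None, "flagged": False})
            continue
        s2 = stage2_score(person)
        results.append({"stage1": s1, "stage2": s2, "flagged": s2 >= stage2_cutoff})
    return results


# Hypothetical, precomputed risks standing in for model outputs.
people = [
    {"clinical_prs_risk": 0.10, "glycaemic_risk": 0.90},
    {"clinical_prs_risk": 0.60, "glycaemic_risk": 0.20},
    {"clinical_prs_risk": 0.70, "glycaemic_risk": 0.80},
]

results = two_stage_screen(
    people,
    stage1_score=lambda p: p["clinical_prs_risk"],
    stage2_score=lambda p: p["glycaemic_risk"],
    stage1_cutoff=0.5,  # assumed cut-off, for illustration only
    stage2_cutoff=0.5,  # assumed cut-off, for illustration only
)
for r in results:
    print(r)
```

Note that the first person is never evaluated by the second stage, which is why stage 2 metrics in this design describe performance within the stage 1 high-risk subset rather than the whole cohort.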

Figure 3. Performance of risk prediction models across cohorts. ROC and PR curves for (a–b) the PRS-Integrated Screening Model; (c–d) the PRS-Integrated Advanced Risk Model; (e–f) the Phenotype-Only Model; and (g–h) the ADA risk assessment test (UKBB only). Panels show model discrimination based on ROC-AUC and PR-AUC, illustrating differences in performance across the two-stage workflow and comparator models. ADA, American Diabetes Association; AoU, All of Us; AUC, area under the curve; UKBB, UK Biobank; PR, precision–recall; PRS, polygenic risk score; ROC, receiver operating characteristic.


Subgroup performance

Subgroup analyses showed that the PRS-Integrated Screening Model performed consistently across sex and ancestry groups in both AoU and UKBB, with all metrics reported alongside non-parametric 95% CIs (online supplemental table 4). CIs were wider in smaller groups such as NHPI, MENA and Chinese participants, reflecting limited event counts. Occasional values such as 100% sensitivity occurred only in strata with very small numbers of cases and therefore represent sampling variability rather than true perfect performance. Across all major groups, the PRS-Integrated model matched or exceeded the performance of the Phenotype-Only Model and the ADA risk test.
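To illustrate why intervals widen in small strata, the sketch below computes a percentile bootstrap CI for sensitivity in a small and a large synthetic subgroup with the same underlying sensitivity. The percentile bootstrap is an assumption here; the text states only that the CIs were non-parametric. All counts are invented.

```python
import random


def sensitivity(y_true, y_pred):
    """True-positive rate; NaN when the sample contains no cases."""
    positives = sum(y_true)
    if positives == 0:
        return float("nan")
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    return tp / positives


def bootstrap_ci(y_true, y_pred, metric, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value == value:  # drop NaN from resamples that contain no cases
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Synthetic strata: both have sensitivity 2/3, but very different case counts.
small_true = [1] * 6 + [0] * 20
small_pred = [1] * 4 + [0] * 2 + [0] * 20
large_true = [1] * 300 + [0] * 300
large_pred = [1] * 200 + [0] * 100 + [0] * 300

small_ci = bootstrap_ci(small_true, small_pred, sensitivity)
large_ci = bootstrap_ci(large_true, large_pred, sensitivity)
print("small stratum 95% CI:", small_ci)
print("large stratum 95% CI:", large_ci)
```

A caveat of this approach is that a stratum in which every case is detected yields a degenerate interval at 100%, which is another reason to read such values as sampling variability.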

Comparative performance of baseline and nonlinear models

Pairwise DeLong tests were conducted in the UKBB external validation cohort to quantify the incremental contributions of genetic information and nonlinear modelling (online supplemental table 5). Adding a single composite PRS to the Phenotype-Only Logistic Model resulted in a minimal but statistically significant improvement in AUC (0.7939 to 0.7957; ΔAUC=0.0017, p=8.7×10⁻¹²), indicating limited discriminative gain from a simplified genetic summary measure.

In contrast, incorporating the full set of 73 PRS using LASSO logistic regression produced a substantially larger improvement (Composite PRS Model vs LASSO: ΔAUC=0.0144, p=1.36×10⁻¹¹¹; Phenotype-Only vs LASSO: ΔAUC=0.0161, p=1.79×10⁻¹²¹). This demonstrates that joint modelling of multiple PRS captures substantially more genetic signal than a composite PRS alone.

The PRS-Integrated XGBoost model did not significantly outperform the LASSO model in UKBB (AUC 0.8086 vs 0.8100; ΔAUC=−0.0014, p=0.33), suggesting that nonlinear interactions contributed limited additional value beyond the set of sparsely selected genetic and clinical features. However, XGBoost significantly outperformed both the Phenotype-Only and Composite PRS Logistic Models (p<10⁻¹⁶), reflecting its use of a richer genetic feature set.

Consistent with this, the PRS-Integrated XGBoost model also significantly outperformed the XGBoost phenotype-only model (AUC 0.8220 vs 0.8041; ΔAUC=0.0180, p=1.05×10⁻⁷⁶), demonstrating that performance gains were driven primarily by the inclusion of PRS rather than modelling complexity.

Overall, these comparisons indicate that while genetic information meaningfully improves T2DM risk prediction, most of the discriminative gain arises from linear modelling of multiple PRS, with minimal added benefit from nonlinear approaches in the UKBB setting. The performance of all these models is shown in online supplemental figure 5 and online supplemental table 6.
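The pairwise comparisons above rely on the DeLong test for two correlated AUCs evaluated on the same individuals. The following is a minimal pure-Python sketch of that test using the direct O(m×n) structural-component formulation rather than the fast algorithm cited in the references; the function names and the synthetic scores are illustrative assumptions, not the study's code or data.

```python
import math
import random


def _psi(x, y):
    """Mann-Whitney kernel: 1 if case outranks control, 0.5 if tied, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """Structural components V10 (per case) and V01 (per control)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two-sided p) for paired models A and B."""
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    v10a, v01a = _components([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    v10b, v01b = _components([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    auc_a, auc_b = sum(v10a) / len(v10a), sum(v10b) / len(v10b)

    # Variance of (AUC_A - AUC_B) from the covariance of the components.
    var = (
        (_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / len(pos)
        + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / len(neg)
    )
    if var <= 0:  # identical rankings: no detectable difference
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return auc_a, auc_b, z, p


# Synthetic example: model A is less noisy than model B (not study data).
rng = random.Random(1)
y = [1] * 40 + [0] * 60
model_a = [label + rng.gauss(0, 0.8) for label in y]
model_b = [label + rng.gauss(0, 1.5) for label in y]

auc_a, auc_b, z, p = delong_test(y, model_a, model_b)
print(f"AUC A={auc_a:.3f}, AUC B={auc_b:.3f}, z={z:.2f}, p={p:.3g}")
```

Because the variance term shrinks with the number of cases and controls, very small AUC differences can reach significance in cohorts the size of UKBB, which is consistent with the small but significant ΔAUC values reported above.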

Sensitivity analysis for missing data

To evaluate robustness to missing data, we compared model performance on the UKBB dataset with and without multiple imputation. Discrimination was very similar across approaches. For the PRS-Integrated Screening Model, the ROC-AUC increased slightly from 0.809 in the non-imputed dataset to 0.822 in the imputed dataset, with corresponding PR-AUC values of 0.282 and 0.289. Although the ROC-AUC difference reached statistical significance (paired DeLong p<0.001), largely because of the large sample size, its absolute magnitude was small (online supplemental table 7). These findings indicate that model performance was stable across missing-data strategies, supporting the robustness of the analytical framework.

Cost reduction analysis compared with the ADA risk test

Using the ADA’s reported annual medical expenditure of US$12 202 per individual with T2DM [26], we calculated the net savings of implementing the PRS-Integrated Screening Model versus the ADA risk test. Assuming a PRS test cost of US$70 per person [27], the PRS-Integrated Model yielded US$849 321 in net savings per 1000 individuals screened, compared with US$802 339 for the ADA test, corresponding to an incremental savings of US$46 982. Bootstrap resampling (1000 iterations) produced a 95% CI of US$11 837–US$83 389, confirming that the PRS-Integrated Model remains significantly cost-saving under realistic pricing.

Under a more conservative scenario in which the PRS cost was increased to US$100, the PRS-Integrated Model still produced US$819 321 in net savings per 1000 individuals. The mean incremental savings relative to the ADA test remained positive (US$16 982), although the bootstrap CI (–US$21 282 to US$53 692) crossed zero, indicating greater uncertainty when PRS pricing is substantially higher.
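The arithmetic linking the two pricing scenarios can be reproduced with a short sketch. It assumes that only the per-person PRS price changes between scenarios and that it applies to all 1000 people screened; this assumption is consistent with the reported figures but is not stated as a formula in the text. The break-even price is a derived point estimate only and ignores the bootstrap uncertainty.

```python
# Reported figures, per 1000 individuals screened (US$).
ADA_NET_SAVINGS = 802_339
PRS_NET_SAVINGS_AT_BASE = 849_321
BASE_PRS_COST = 70  # US$ per person in the primary scenario
COHORT_SIZE = 1000


def prs_net_savings(prs_cost):
    """Net savings of the PRS-integrated model at a given PRS price."""
    extra_cost = (prs_cost - BASE_PRS_COST) * COHORT_SIZE
    return PRS_NET_SAVINGS_AT_BASE - extra_cost


def incremental_savings(prs_cost):
    """Savings relative to the ADA risk test at a given PRS price."""
    return prs_net_savings(prs_cost) - ADA_NET_SAVINGS


# PRS price at which the point estimate of incremental savings reaches zero.
break_even_cost = BASE_PRS_COST + incremental_savings(BASE_PRS_COST) / COHORT_SIZE

for cost in (70, 100):
    print(cost, prs_net_savings(cost), incremental_savings(cost))
print(f"break-even PRS price (point estimate): US${break_even_cost:.2f}")
```

Under these assumptions the point estimate stays positive up to a PRS price of roughly US$117 per person, although the CI already crosses zero at US$100.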

The number needed to screen to identify one additional true-positive case with the PRS-Integrated Model compared with the ADA test was approximately 19. This highlights the model’s enhanced ability to detect at-risk individuals, offering significant opportunities for early intervention and prevention.

Additional data

Further results data are provided in online supplemental materials.

Discussion

Although prior studies have shown that both ML models and PRS can improve prediction of T2DM, many existing approaches rely on single-stage frameworks, limited predictor sets or lack external validation across diverse populations. Building on this foundation, our study evaluates a clinically aligned two-stage structure that integrates PRS with routine clinical variables and glycaemic measures, trained in a diverse real-world cohort and validated in a distinct research cohort. This design allows us to assess how genetic and clinical factors operate together within a screening workflow that mirrors practical implementation conditions.

This study explores the integration of PRS with clinical risk factors into ML models to create a personalised approach to T2DM screening. Our two-stage framework begins with a PRS-Integrated Screening Model that uses readily available clinical and genetic information, followed by an Advanced Risk Model incorporating glycaemic data (eg, HbA1c) for refined risk stratification. This approach aims to enhance early detection and optimise intervention strategies, addressing the growing health burden of T2DM.

While aligned with current clinical practices, this framework also anticipates a future in which routine genetic testing provides stable, lifelong data early in adulthood. When paired with repeated glycaemic measurements later in life, this structure enables a dynamic, individualised approach to T2DM risk stratification. Integrating stable genetic markers with time-varying biomarkers provides a pathway toward more adaptive screening models that evolve with clinical practice and technological innovation.

Evaluation across AoU and UKBB highlights how population context shapes the contribution of genetic and phenotypic predictors. In UKBB, a relatively healthy cohort with lower metabolic disease burden, PRS provided meaningful discrimination, particularly among individuals with high genetic risk, as reflected by greater improvements in rescued-case proportions and stronger SHAP contributions. In AoU, where metabolic comorbidities were more prevalent even among controls, PRS added less incremental information. This likely reflects partial expression of genetic predisposition through baseline metabolic traits, reducing the additional predictive value of PRS. These findings underscore the importance of validating precision-risk tools across diverse populations and recognising how underlying risk distributions influence model behaviour.

Across sex and major ancestry groups, PRS-Integrated models demonstrated consistent performance in both cohorts. Wider CIs observed in smaller subgroups, such as NHPI, MENA and Chinese individuals, reflect limited event counts rather than model instability. Overall patterns suggest that PRS-Integrated frameworks can perform equitably across populations, while also highlighting the need for continued inclusion of under-represented groups in genomic and longitudinal datasets to refine subgroup estimates.

Explainability analyses supported the stability of predictor influence. Metabolic traits, renal function markers, adiposity measures and hypertension consistently ranked among the strongest predictors across impurity-based, permutation-based and SHAP analyses. PRS contributions were modest in AoU and more pronounced in UKBB, aligning with differences in cohort profiles. Random Forest sensitivity analyses showed strong agreement for the top predictors, suggesting that these attribution patterns represent underlying signal rather than model-specific artefacts and reinforcing confidence in model interpretability.

Beyond clinical performance, our study highlights the potential economic benefits of integrating PRS into T2DM screening. Compared with the ADA risk test, the PRS-Integrated Screening Model identified more cases, leading to greater projected reductions in annual medical costs associated with undiagnosed or late-stage T2DM. Although the cost analysis focuses solely on the screening phase, the value of genetic data increases substantially when a single sequencing event informs risk assessment for multiple conditions. This broader relevance supports the feasibility and long-term value of integrating PRS into preventive healthcare strategies.

Several considerations should be noted. Because the models were harmonised across AoU and UKBB, predictors were limited to variables available in both cohorts. This ensured fair external validation but may have omitted additional risk factors that could enhance prediction in settings where richer data are available, such as lifestyle measures or longitudinal biomarker trajectories. Some ancestry subgroups included relatively few incident cases, resulting in wide CIs and reduced precision of subgroup estimates. Finally, while the cost analysis captures screening-related differences, prospective evaluation is needed to determine the long-term clinical and economic effects of implementing the two-stage workflow.

Overall, this work demonstrates that combining PRS with clinical and glycaemic data in a structured screening workflow can improve individualised T2DM risk assessment. The models performed consistently across cohorts, demonstrated equitable subgroup performance and maintained stable interpretability across multiple explainability methods. These findings support future work on real-world implementation, broader validation across healthcare settings and integration of PRS-informed ML tools into preventive care programmes.

Supplementary material

online supplemental figure 1
bmjopen-16-1-s001.tif (671.9KB, tif)
DOI: 10.1136/bmjopen-2025-103171
online supplemental figure 2
bmjopen-16-1-s002.tif (1.3MB, tif)
DOI: 10.1136/bmjopen-2025-103171
online supplemental figure 3
bmjopen-16-1-s003.tif (527.4KB, tif)
DOI: 10.1136/bmjopen-2025-103171
online supplemental figure 4
bmjopen-16-1-s004.tif (422.2KB, tif)
DOI: 10.1136/bmjopen-2025-103171
online supplemental figure 5
bmjopen-16-1-s005.tif (331.8KB, tif)
DOI: 10.1136/bmjopen-2025-103171
online supplemental file 1
bmjopen-16-1-s006.docx (17.9KB, docx)
DOI: 10.1136/bmjopen-2025-103171
online supplemental file 2
bmjopen-16-1-s007.xlsx (38.2KB, xlsx)
DOI: 10.1136/bmjopen-2025-103171

Acknowledgements

This study used data from the All of Us Research Program and the UK Biobank. We thank the participants and coordinators of both programmes for their invaluable contributions.

Footnotes

Funding: This work was supported by the National Institutes of Health grant (5R01HG010881-03).

Prepublication history and additional supplemental material for this paper are available online. To view these files, please visit the journal online (https://doi.org/10.1136/bmjopen-2025-103171).

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient consent for publication: Not applicable.

Ethics approval: This study involves human participants and was approved by the Scripps Institutional Review Board (IRB-17-7005), which reviewed and confirmed that the research complied with ethical standards for secondary data analysis. Participants were enrolled through the All of Us Research Program (AoU) and the UK Biobank (UKBB), both of which obtained informed consent at the time of enrolment. Our study used de-identified data from these cohorts, ensuring participant confidentiality and data protection.

Data availability free text: The data used in this study are available from the UKBB and the AoU Research Program. Access to UKBB data is subject to approval and can be requested through their application process. The UKBB approved the use of data for this study under application number 41999. Data from the AoU Research Program are available to registered researchers through the Researcher Workbench.

Patient and public involvement: Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

Data availability statement

Data may be obtained from a third party and are not publicly available.

References

  • 1. Zheng Y, Ley SH, Hu FB. Global aetiology and epidemiology of type 2 diabetes mellitus and its complications. Nat Rev Endocrinol. 2018;14:88–98. doi: 10.1038/nrendo.2017.151.
  • 2. IDF Diabetes Atlas 2021. 2024. Available: https://diabetesatlas.org/atlas/tenth-edition/
  • 3. American Diabetes Association. Prevention or Delay of Type 2 Diabetes: Standards of Medical Care in Diabetes—2021. Diabetes Care. 2021;44:S34–9. doi: 10.2337/dc21-S003.
  • 4. Bang H, Edwards AM, Bomback AS, et al. Development and validation of a patient self-assessment score for diabetes risk. Ann Intern Med. 2009;151:775–83. doi: 10.7326/0003-4819-151-11-200912010-00005.
  • 5. Vujkovic M, Keaton JM, Lynch JA, et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat Genet. 2020;52:680–91. doi: 10.1038/s41588-020-0637-y.
  • 6. Läll K, Mägi R, Morris A, et al. Personalized risk prediction for type 2 diabetes: the potential of genetic risk scores. Genet Med. 2017;19:322–9. doi: 10.1038/gim.2016.103.
  • 7. Khera AV, Chaffin M, Aragam KG, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50:1219–24. doi: 10.1038/s41588-018-0183-z.
  • 8. Chen T, Guestrin C. XGBoost: a scalable tree boosting system; 2016. pp. 785–94.
  • 9. Ganie SM, Pramanik PKD, Bashir Malik M, et al. An ensemble learning approach for diabetes prediction using boosting techniques. Front Genet. 2023;14:1252159. doi: 10.3389/fgene.2023.1252159.
  • 10. Deberneh HM, Kim I. Prediction of Type 2 Diabetes Based on Machine Learning Algorithm. Int J Environ Res Public Health. 2021;18:3317. doi: 10.3390/ijerph18063317.
  • 11. Wang S, Chen R, Wang S, et al. Comparative study on risk prediction model of type 2 diabetes based on machine learning theory: a cross-sectional study. BMJ Open. 2023;13:e069018. doi: 10.1136/bmjopen-2022-069018.
  • 12. Rout M, Wander GS, Ralhan S, et al. Assessing the prediction of type 2 diabetes risk using polygenic and clinical risk scores in South Asian study populations. Therapeutic Advances in Endocrinology. 2023;14:20420188231220120. doi: 10.1177/20420188231220120.
  • 13. Liu X, Littlejohns TJ, Bešević J, et al. Incorporating polygenic risk into the Leicester Risk Assessment score for 10-year risk prediction of type 2 diabetes. Diabetes & Metabolic Syndrome: Clinical Research & Reviews. 2024;18:102996. doi: 10.1016/j.dsx.2024.102996.
  • 14. Hahn SJ, Kim S, Choi YS, et al. Prediction of type 2 diabetes using genome-wide polygenic risk score and metabolic profiles: A machine learning analysis of population-based 10-year prospective cohort study. EBioMedicine. 2022;86:104383. doi: 10.1016/j.ebiom.2022.104383.
  • 15. Kiran M, Xie Y, Anjum N, et al. Machine learning and artificial intelligence in type 2 diabetes prediction: a comprehensive 33-year bibliometric and literature analysis. Front Digit Health. 2025;7:1557467. doi: 10.3389/fdgth.2025.1557467.
  • 16. Bianchi DW, Brennan PF, Chiang MF, et al. The All of Us Research Program is an opportunity to enhance the diversity of US biomedical research. Nat Med. 2024;30:330–3. doi: 10.1038/s41591-023-02744-3.
  • 17. Wilson S. miceRanger: multiple imputation by chained equations with random forests. 2021. Available: https://cran.r-project.org/web/packages/miceRanger/index.html
  • 18. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–45.
  • 19. Lambert SA, Gil L, Jupp S, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet. 2021;53:420–5. doi: 10.1038/s41588-021-00783-5.
  • 20. Chen S-F, Dias R, Evans D, et al. Genotype imputation and variability in polygenic risk score estimation. Genome Med. 2020;12:100. doi: 10.1186/s13073-020-00801-x.
  • 21. Khattab A, Chen S-F, Wineinger N, et al. AoUPRS: A cost-effective and versatile PRS calculator for the All of Us Program. BMC Genomics. 2025;26:521. doi: 10.1186/s12864-025-11693-9.
  • 22. Akiba T, Sano S, Yanase T, et al. Optuna: a next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2019. pp. 2623–31.
  • 23. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. Curran Associates, Inc; 2017. Available: https://papers.nips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
  • 24. Sun X, Xu W. Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves. IEEE Signal Process Lett. 2014;21:1389–93. doi: 10.1109/LSP.2014.2337313.
  • 25. Zeng C, Schlueter DJ, Tran TC, et al. Comparison of phenomic profiles in the All of Us Research Program against the US general population and the UK Biobank. J Am Med Inform Assoc. 2024;31:846–54. doi: 10.1093/jamia/ocad260.
  • 26. Parker ED, Lin J, Mahoney T, et al. Economic Costs of Diabetes in the U.S. in 2022. Diabetes Care. 2024;47:26–43. doi: 10.2337/dci23-0085.
  • 27. Kiflen M, Le A, Mao S, et al. Cost-Effectiveness of Polygenic Risk Scores to Guide Statin Therapy for Cardiovascular Disease Prevention. Circ: Genomic and Precision Medicine. 2022;15:e003423. doi: 10.1161/CIRCGEN.121.003423.

    Articles from BMJ Open are provided here courtesy of BMJ Publishing Group
