Abstract
Background
Osteoporosis represents a major health challenge in aging populations, yet its diagnosis largely depends on dual-energy X-ray absorptiometry (DXA), which is both costly and radiation-based. This study aimed to develop a practical, non-radiographic prediction model for osteoporosis using interpretable machine learning techniques and to implement it as an accessible online calculator for rapid clinical and community screening.
Methods
Data were derived from the 2008–2011 waves of the Korean National Health and Nutrition Examination Survey (KNHANES). Individuals with over 30% missing data were excluded, and incomplete variables were imputed via polynomial interpolation (for continuous variables) and mode imputation (for categorical variables). After performing Spearman correlation analysis (p < 0.001) to identify osteoporosis-related features, GradientBoost-RFE and LASSO regression were applied for dimensionality reduction, yielding 15 essential predictors, including age, sex, body mass index (BMI), etc. GradientBoost, CatBoost, and XGBoost algorithms were trained to estimate abnormal DXA results and classify bone status. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), specificity (SPE), and accuracy (ACC), with a temporal validation set (the 2008 wave of KNHANES) for testing.
Results
A total of 18,179 participants were included, with 14,747 in the development cohort and 3,432 in the temporal validation set. Among them, 64.6% exhibited normal DXA results. The optimal model achieved an AUC of 0.845 and SPE of 0.897 for identifying abnormal DXA outcomes, and demonstrated an AUC of 0.876 and SPE of 0.909 in temporal validation. For multiclass classification (normal, osteopenia, osteoporosis), the model reached ACC of 0.724 and 0.744, and SPE of 0.803 and 0.819 in the development and validation datasets, respectively.
Conclusion
We developed and validated an interpretable machine learning model that accurately predicts osteoporosis risk and DXA abnormalities using readily available demographic, biochemical, and lifestyle data. To facilitate clinical translation, the model has been deployed as an interactive online calculator, enabling non-invasive, rapid osteoporosis risk assessment without radiological testing. This tool may support early identification of high-risk individuals, optimize DXA utilization, and enhance preventive care strategies across diverse healthcare settings.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13040-026-00520-w.
Keywords: Machine learning, Prediction model, Osteoporosis, Dual-energy X-ray examinations, Cross-sectional study, SHAP interpretability, Temporal validation
Background
Osteoporosis has become a major public health problem in an aging society [1], affecting approximately 200 million people worldwide [2]. The growing elderly population has increased the worldwide prevalence of osteoporosis, currently estimated at 19.7% [3]. Osteoporosis is characterized by low bone mass, microarchitectural deterioration, fragility, and increased risk of fractures. Its early stages are mild, so diagnosis is often delayed until a fracture occurs [4]. The International Osteoporosis Foundation reports that approximately one-third of women and one-fifth of men aged 50 and over experience an osteoporotic fracture [5, 6]. Osteoporotic fractures, especially hip fractures, are associated with limited walking, chronic pain, disability, loss of independence, and decreased quality of life. Thus, osteoporotic fractures are recognized by the World Health Organization (WHO) as one of the most important public health priorities [7]. Because osteoporosis is often asymptomatic until a fracture occurs, early screening and detection are key strategies for osteoporosis management [8].
Dual-energy X-ray absorptiometry (DXA) is the gold standard for measuring bone mineral density and diagnosing osteoporosis [9]. However, DXA has limitations, including long wait times and limited accessibility [10]. In European countries, the average waiting time for a DXA scan can be up to 180 days [11]. The need for skilled technicians and the associated radiation exposure also limit its widespread use [12]. As a result, the limited availability and accessibility of DXA have hampered its use in population screening and primary care diagnosis [13]. Opportunistic imaging techniques, including MRI-based bone quality assessments, have been increasingly investigated in recent years [14]. While MRI- and CT-based opportunistic bone assessments offer an alternative to DXA, they are often expensive and not widely available in all clinical settings [15]. These limitations have contributed to the growing interest in machine learning-based approaches for osteoporosis prediction, as they allow for non-invasive and cost-effective risk assessment using routinely collected clinical and demographic data.
Developing an osteoporosis risk prediction model without DXA data may be a simple way for public healthcare organizations to facilitate early diagnosis and timely treatment of osteoporosis [16]. Many prediction models for osteoporosis exist, each incorporating several clinical variables such as a history of fractures, advanced age, low body weight, early estrogen deficiency, low calcium intake, and vitamin D deficiency [17]. Several studies have developed clinical decision-making tools such as the osteoporosis risk index and the Osteoporosis Self-Assessment Tool for Asians (OSTA) [18]. However, these tools rely on one or only a few risk factors and have low accuracy, which limits their clinical application [19]. Predicting osteoporosis therefore calls for more diverse models that incorporate more variables, and the methodology can also be improved.
Machine learning (ML) is an important artificial intelligence method that uses complex algorithms to discover patterns in large data sets [20–22]. As acceptance among clinicians gradually increases, ML has been applied in many fields of clinical medicine, including osteoporosis [23, 24]. However, existing studies have shortcomings, such as reliance on a single study population and predictors that are difficult to obtain. This study leverages the advantages of the KNHANES database, such as its large sample size, rapid variable acquisition, and screening nature, to construct a more practical rapid screening model for osteoporosis using machine learning methods (Fig. 1-a). Furthermore, we implemented this model as an online calculator to facilitate clinical and community use (Fig. 1-c, available at http://123.56.120.106:9000/). This tool provides an application foundation for osteoporosis risk prediction in the general population.
Fig. 1.
Central illustration and flow chart of the study design. The figure shows the experimental design and the experimental process. a: The process of the entire study, with a visual illustration of the included variables and a brief summary of the findings and advantages of this study. b: The processing flow of the entire dataset. After exclusions, a total of 18,179 participants were included. The 2009–2011 data were used for five-fold cross-validation to train and validate the model, and the remaining 2008 data were used as an independent temporal validation set.
Methods
Study design and participants
This cross-sectional study used a comprehensive health examination dataset based on the Korean National Health and Nutrition Examination Survey (KNHANES IV and V; available online at https://knhanes.kdca.go.kr/knhanes/eng) conducted from 2008 to 2011. The KNHANES surveyed demographic characteristics, vital sign measurements, laboratory data, and DXA only during this period. The study protocol was approved by the Institutional Review Board of the Korean Center for Disease Control and Prevention (No. 2008–04EXP-01-C, 2009–01CON-03-C, 2010–02CON-21-C, and 2011–02CON-06-C), and data collection was approved by the Institutional Review Board of the Korean National Institute for Bioethics Policy. All participants signed consent forms for the use of their health information for data collection of the KNHANES. The KNHANES is a nationwide, population-based, cross-sectional survey conducted by the Division of Chronic Disease Surveillance of the Korea Centers for Disease Control and Prevention [25]. Each participant completed a questionnaire containing information such as age, household income, alcohol use, smoking status, hypertension, and diabetes [26]. We included participants who met the following criteria: (1) availability of DXA results, (2) complete T-scores for all three regions (femoral neck, total femur, and lumbar spine), and (3) less than 30% missing variables. Participants were excluded if they lacked DXA results, had missing T-scores for any of the three regions, or had more than 30% missing variables.
Data collection and processing
We used demographic characteristics, vital sign measurements, and laboratory data collected through the Health and Nutrition Examination Survey questionnaire to identify features and construct a prediction model. The codes and explanations of the variables included in this study are shown in Supplementary Table 1. All data were obtained from the official KNHANES website. Because some participants in the 2008–2011 survey did not complete the DXA examination or lacked key data, we excluded them before constructing the prediction model. We deleted variables with more than 30% missing values and imputed the remaining missing values using polynomial interpolation.
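The missing-data handling can be sketched in plain Python. This is a minimal illustration, not the authors' code: the table layout (a dict of column lists), the use of `None` for missing entries, and the function names are all assumptions. It shows only the 30% missingness filter and the mode imputation that the abstract describes for categorical variables; the polynomial interpolation used for continuous variables is not reproduced here.

```python
from collections import Counter


def missing_fraction(values):
    """Share of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_columns(table, max_missing=0.30):
    """Keep only columns whose missing share does not exceed max_missing.

    `table` is a hypothetical layout: column name -> list of values.
    """
    return {
        name: column
        for name, column in table.items()
        if missing_fraction(column) <= max_missing
    }


def impute_mode(values):
    """Replace missing categorical entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Toy data, invented for illustration only.
toy_table = {
    "education": ["low", "high", None, "high", "high"],  # 20% missing
    "rare_item": [None, None, 1, None, 2],               # 60% missing
}
kept = drop_sparse_columns(toy_table)
filled_education = impute_mode(kept["education"])
```

The same `missing_fraction` helper could be applied per participant instead of per variable to mirror the participant-level exclusion criterion.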
Definition of osteoporosis
Combining the KNHANES standard proposed in 2013 [27] and the DXA diagnostic standard for osteoporosis proposed by WHO in 2000 [28], we used the bone mineral density (BMD) T-scores of the femoral neck, total femur, and lumbar spine obtained by DXA measurement as the diagnostic standard for osteoporosis. For each subject, the classification was determined by the minimum T-score among the three sites: if the minimum value was ≤ −2.5, the subject was classified as having osteoporosis; if it was greater than −2.5 but less than −1.0, the subject was classified as having osteopenia; and if it was ≥ −1.0, the subject was classified as normal. Accordingly, we assigned normal subjects to category 0, osteopenia to category 1, and osteoporosis to category 2.
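The labelling rule above can be written as a short function. This is a sketch under the thresholds stated in the text; the function names and the binary "abnormal" helper are illustrative assumptions rather than the authors' implementation.

```python
def classify_bone_status(t_scores):
    """Return 0 (normal), 1 (osteopenia), or 2 (osteoporosis).

    `t_scores` holds the T-scores of the femoral neck, total femur,
    and lumbar spine; the lowest of the three decides the category.
    """
    lowest = min(t_scores)
    if lowest <= -2.5:
        return 2  # osteoporosis
    if lowest < -1.0:
        return 1  # osteopenia
    return 0      # normal


def is_abnormal(t_scores):
    """Binary label: osteopenia and osteoporosis both count as abnormal DXA."""
    return classify_bone_status(t_scores) >= 1


example_category = classify_bone_status([-0.4, -1.3, -2.7])
```

In this example the lumbar spine value of -2.7 drives the result, so the subject falls in category 2 even though the other two sites are less affected.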
Correlation analysis
Spearman correlation analysis is a non-parametric statistical method that measures the monotonic relationship between two variables without assuming that the relationship is linear. Using Spearman correlation analysis, we preliminarily filtered the large dataset to identify variables that were strongly correlated with osteoporosis. These strongly correlated variables were then used as inputs for variable selection. A p-value < 0.001 was considered statistically significant.
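To make the rank-based idea concrete, the sketch below computes the Spearman coefficient as the Pearson correlation of average ranks, using only the standard library. It is illustrative and does not compute the p-value used for screening; a statistics package such as SciPy (`scipy.stats.spearmanr`) returns both the coefficient and the p-value. The function names are assumptions.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sd_x * sd_y)


# A monotonic but non-linear relationship still gives a coefficient of 1.
rho_monotonic = spearman_rho([1, 2, 3, 4, 5], [2, 4, 8, 16, 32])
```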
Variable selection
A total of 15 characteristics, including age, gender, body mass index (BMI), waist circumference, age of drinking start, average monthly household income, total household income, education level, daily food intake, daily protein intake, daily phosphorus intake, daily potassium intake, hemoglobin, alkaline phosphatase (ALP), and forced vital capacity (FVC), were used to develop the prediction model. Among them, total household income and average monthly household income were further classified according to previous literature for better description in the baseline table (Table 1) [29, 30]. The classification of education level comes from previous research, and we further divided it into three categories: low, medium, and high [31]. To further investigate the impact of different variables on osteoporosis and to identify the most important subset of features, we employed two feature selection methods: Lasso regression and GradientBoost Recursive Feature Elimination (RFE). In Lasso regression, the regularization term shrinks the regression coefficients during the fitting process, with some coefficients being shrunk to zero, thereby facilitating variable selection. RFE, on the other hand, iteratively constructs a model and removes the least contributing features, gradually reducing the number of variables to identify the subset most relevant to osteoporosis. The variables included in this study are shown in Supplementary Table 1. Combining these two methods allows a more comprehensive assessment of variable importance and provides robust support for subsequent model construction.
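The elimination loop and the combination of the two selections can be sketched as follows. This is a schematic illustration only: `importance_fn` is a hypothetical stand-in for refitting a gradient boosting model and reading its feature importances, and the toy importance values and Lasso selection are invented for the example.

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Repeatedly drop the least important feature until n_keep remain.

    `importance_fn(remaining)` must return a dict mapping each remaining
    feature name to its importance; in practice this would refit a model.
    """
    remaining = list(features)
    while len(remaining) > n_keep:
        importances = importance_fn(remaining)
        weakest = min(remaining, key=lambda name: importances[name])
        remaining.remove(weakest)
    return remaining


def intersect_selections(first, second):
    """Features chosen by both methods, in the order of the first list."""
    second_set = set(second)
    return [name for name in first if name in second_set]


# Toy importances (hypothetical values, fixed across iterations for simplicity).
TOY_IMPORTANCE = {"age": 0.50, "bmi": 0.30, "alp": 0.15, "noise": 0.05}


def toy_importance_fn(remaining):
    return {name: TOY_IMPORTANCE[name] for name in remaining}


rfe_selected = recursive_feature_elimination(TOY_IMPORTANCE, toy_importance_fn, n_keep=3)
lasso_selected = ["age", "bmi", "noise"]  # hypothetical Lasso result
final_features = intersect_selections(lasso_selected, rfe_selected)
```

Taking the intersection is a conservative design choice: a variable is retained only if both methods independently consider it useful.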
Table 1.
Baseline characteristics of male and female participants
| Variables | Total (n = 18179) | Male (n = 7852) | Female (n = 10327) | p-Value |
|---|---|---|---|---|
| Age (years) | 49 (37, 62) | 49 (37, 62) | 49 (37, 63) | 0.503 |
| BMI (kg/m2) | 23 (21, 26) | 24 (22, 26) | 23 (21, 25) | < 0.001* |
| Waistline (cm) | 81 (74, 88) | 84 (78, 90) | 78 (71, 85) | < 0.001* |
| Alcohol (n, %) | 16603 (91.33) | 7606 (96.87) | 8997 (87.12) | < 0.001* |
| Age of initiation of drinking (years) | 20 (18, 26) | 19 (17, 20) | 22 (19, 40) | < 0.001* |
| Household income (n, %) | | | | < 0.001* |
| less than Q1 | 4513 (24.83) | 1755 (22.35) | 2758 (26.71) | |
| Q1-Q2 | 4434 (24.39) | 1926 (24.53) | 2508 (24.29) | |
| Q2-Q3 | 4687 (25.78) | 2105 (26.81) | 2582 (25.00) | |
| more than Q3 | 4545 (25.00) | 2066 (26.31) | 2479 (24.01) | |
| Household incomes per capita (n, %) | | | | < 0.001* |
| less than Q1 | 4532 (24.93) | 1781 (22.68) | 2751 (26.64) | |
| Q1-Q2 | 3911 (21.51) | 1725 (21.97) | 2186 (21.17) | |
| Q2-Q3 | 5170 (28.44) | 2346 (29.88) | 2824 (27.35) | |
| more than Q3 | 4566 (25.12) | 2000 (25.47) | 2566 (24.85) | |
| Educational level (n, %) | | | | < 0.001* |
| low | 2571 (14.14) | 623 (7.93) | 1948 (18.86) | |
| medium | 6321 (34.77) | 2645 (33.69) | 3676 (35.60) | |
| high | 9225 (50.75) | 4556 (58.02) | 4669 (45.21) | |
| unknown | 62 (0.34) | 28 (0.36) | 34 (0.33) | |
| Abnormal bone mass (n, %) | 6437 (35.41) | 1993 (25.38) | 4444 (43.03) | < 0.001* |
| Daily food intake (g) | 1231 (851, 1721) | 1410 (988, 1962) | 1108 (780, 1539) | < 0.001* |
| Daily protein intake (g) | 60 (42, 84) | 71 (50, 98) | 53 (38, 72) | < 0.001* |
| Daily potassium intake (mg) | 2706 (1903, 3729) | 3066 (2203, 4141) | 2452 (1732, 3369) | < 0.001* |
| Daily phosphorus intake (mg) | 1048 (776, 1403) | 1210 (917, 1603) | 940 (705, 1235) | < 0.001* |
| Hb (g/dL) | 14 (13, 15) | 15 (14, 16) | 13 (12, 14) | < 0.001* |
| FVC (L) | 3 (3, 4) | 4 (3, 5) | 3 (3, 4) | < 0.001* |
| ALP (IU/L) | 214 (175, 261) | 225 (190, 266) | 204 (164, 156) | < 0.001* |
Values are presented as n (%) or median (interquartile range [IQR]), as appropriate. ALP, alkaline phosphatase; BMI, body mass index; FVC, forced vital capacity; Hb, hemoglobin
Development of models
Data from a cross-sectional cohort consisting of health questionnaires and nutritional screening data collected by KNHANES over three consecutive years (2009–2011) were randomly divided, with 80% used for training and 20% for internal validation, to guard against overfitting. In addition, data from the 2008 KNHANES cohort were used as the temporal validation set. This approach is feasible because the KNHANES sampling surveys have low repeatability and regional differences between years [32]. Using the data of a single year as a validation set is also straightforward, because its inclusion and exclusion criteria are the same as those of the original cohort [8]. The predictions of the model were then compared with the actual recorded conditions of the subjects.
We employed three ML algorithms (GradientBoost, CatBoost, and XGBoost) to construct binary classification models (distinguishing participants with normal or abnormal DXA examination results) and ternary classification models (distinguishing whether the participant has osteoporosis, has osteopenia, or is normal) for osteoporosis prediction. We conducted experiments using five-fold cross-validation to comprehensively evaluate their performance on the dataset. Various evaluation metrics were used to compare these models, including AUC, accuracy, sensitivity, specificity, F1 score, positive predictive value, and negative predictive value.
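The threshold-based metrics listed above all derive from the four cells of the binary confusion matrix. The following sketch shows those standard definitions on invented labels; it assumes 1 denotes an abnormal DXA result and 0 a normal one, and the function name is illustrative.

```python
def binary_metrics(y_true, y_pred):
    """Standard confusion-matrix metrics for labels coded 1 (abnormal) and 0 (normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy labels, invented for illustration: 3 abnormal and 5 normal subjects.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

Because specificity and NPV depend on how well normal subjects are recognized, a screening model can show high specificity alongside only moderate sensitivity, which is worth keeping in mind when reading Table 2.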
SHAP interpretable analysis for machine learning
To enhance the interpretability of our model, we employ SHAP (SHapley Additive exPlanations) analysis. SHAP is a unified framework for interpreting predictions, which assigns each feature an importance value for a particular prediction [33, 34]. This method leverages concepts from cooperative game theory, specifically Shapley values, to provide a fair allocation of the contribution of each feature to the model’s output. By applying SHAP analysis to our models, we achieve a deeper understanding of how each feature influences the model’s predictions.
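The game-theoretic idea behind SHAP can be illustrated with an exact, brute-force Shapley computation on a tiny toy model. This sketch is not how SHAP is computed for boosted trees in practice (dedicated tree algorithms are used there), and it makes a simplifying assumption: a feature that is "absent" from a coalition is replaced by a single baseline value. The toy model and all names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating all feature coalitions.

    Features outside a coalition take their baseline value (a simplification).
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def coalition_value(present):
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                gain = coalition_value(present | {i}) - coalition_value(present)
                contributions[i] += weight * gain
    return contributions


def toy_model(z):
    """Hypothetical model with two main effects and one interaction term."""
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
```

The interaction term is split equally between the two features involved, and the contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that makes SHAP values readable as a decomposition of a single prediction.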
Statistical analysis
Data analyses were performed with Python, version 3.8.18 (https://www.python.org), and SPSS statistical software, version 25.0 (https://www.ibm.com/spss). Continuous variables with a normal distribution were reported as mean ± standard deviation (SD), skewed data as median (interquartile range), and categorical variables as numbers (percentages). Baseline variables among participants in different groups were compared using analysis of variance (ANOVA), the Mann-Whitney test, the Kruskal-Wallis H test, the Pearson chi-square test, or Fisher’s exact test according to the data types. The area under the curve (AUC) was used to assess prediction performance for the binary classification outcome. Restricted cubic spline (RCS) curves with 4 knots were used to test nonlinear relationships between independent variables and outcomes [35, 36] and to estimate the optimal independent-variable values for the most accurate osteoporosis prediction. For the three-category outcome, accuracy (ACC) was used in place of AUC.
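For readers less familiar with AUC, it equals the probability that a randomly chosen abnormal case receives a higher predicted score than a randomly chosen normal case, with ties counted as one half. The sketch below computes it directly from that definition on invented data; it is a quadratic-time illustration with an assumed function name, not the routine used in the study.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Labels are 1 for positive (abnormal) and 0 for negative (normal);
    tied scores count as half a correctly ranked pair.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correct += 1.0
            elif pos_score == neg_score:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: three of the four positive-negative pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Unlike accuracy or specificity, this quantity does not depend on a classification threshold, which is why it has no direct counterpart for the three-category outcome.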
Implementation details
For hyperparameter tuning, we adopted a two-stage strategy. First, we used the default hyperparameters of each model to establish baseline performance. For example, the default GradientBoost configuration includes a learning rate of 0.1, 100 boosting estimators, a maximum depth of 3, and a subsample ratio of 1.0. Second, we conducted targeted hyperparameter optimization focusing on key parameters, using a grid search combined with cross-validation. This approach allowed us to identify parameter settings that improved predictive performance while minimizing the risk of overfitting. For polynomial interpolation, we conducted comparative experiments and ultimately selected second-order polynomial interpolation because it provided the most stable and consistent results for our dataset.
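The second stage, a grid search scored by cross-validation, can be sketched generically as below. Everything here is an assumption for illustration: the grid values, the `fit_and_score` callback (which in practice would train a model on the training indices and return a validation metric such as AUC), and the unshuffled contiguous folds, which a real analysis would normally randomize.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(param_grid, n_samples, k, fit_and_score):
    """Return the parameter combination with the best mean validation score."""
    names = sorted(param_grid)
    folds = k_fold_indices(n_samples, k)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        scores = []
        for validation_idx in folds:
            held_out = set(validation_idx)
            train_idx = [i for i in range(n_samples) if i not in held_out]
            scores.append(fit_and_score(params, train_idx, validation_idx))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical grid around the defaults quoted in the text.
toy_grid = {"learning_rate": [0.01, 0.05, 0.1], "max_depth": [3, 4, 5]}


def toy_fit_and_score(params, train_idx, validation_idx):
    """Stand-in scorer that peaks at learning_rate=0.05 and max_depth=4."""
    return -abs(params["learning_rate"] - 0.05) - abs(params["max_depth"] - 4)


best_params, best_score = grid_search_cv(toy_grid, n_samples=10, k=5,
                                         fit_and_score=toy_fit_and_score)
```

Scoring each combination on held-out folds rather than on the training data is what limits the risk of selecting settings that merely overfit.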
Results
Baseline characteristics of participants
This retrospective study comprised 18,179 participants, who were divided into two groups based on gender (7852 male; 10327 female). Among all the subjects, 6437 had abnormal DXA examination results, accounting for 35.4%. Of these, 4142 had osteopenia and 2295 had osteoporosis, accounting for 22.8% and 12.6% of all the subjects, respectively. Their baseline characteristics are shown in Table 1. Except for age, all socioeconomic and nutritional health-related variables differed significantly between the two groups (p < 0.001). Compared with the female group, both the BMI and ALP levels of the male group were significantly higher (BMI: male 24 kg/m2 vs. female 23 kg/m2; ALP: male 225 IU/L vs. female 204 IU/L). The descriptive statistics of all socioeconomic and nutritional health-related variables across the categories of BMI and ALP are presented in Supplementary Tables 2 and 3.
Variables selection
After imputing the missing values as described above, we used Spearman correlation analysis (p < 0.001) to screen for variables that were strongly correlated with the osteoporosis outcome label (Supplementary Fig. 1-a). For these variables, we used GradientBoost-RFE and the least absolute shrinkage and selection operator (LASSO) for dimensionality reduction. With LASSO, the best binary model prediction performance, that is, the highest AUC value, was achieved when 20 variables were included (Supplementary Fig. 1-b1). Similarly, with RFE, we retained 17 variables to obtain the best performance (Supplementary Fig. 1-b2). We then took the intersection of these two variable sets, which yielded the 15 variables finally included in the model (Supplementary Fig. 1-c).
Results of binary classification models
After incorporating the 15 variables, we built models using three methods: GradientBoost, CatBoost, and XGBoost. We employed a classification scheme with a fixed threshold. The model outputs whether a subject's DXA examination result is predicted, without radiography, to be abnormal, and this prediction was compared with the final result recorded in the electronic medical record. The model was then validated in an independent temporal validation set. It achieved good predictive performance in the development and temporal validation datasets, with AUCs of 0.845 and 0.876, respectively, supporting its stability and usability (Supplementary Fig. 2 and Table 2). The other evaluation indicators, including accuracy (ACC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), and negative predictive value (NPV), were also stable across the internal test set and the temporal validation set (Table 2). For AUC, GradientBoost had the best result among the three algorithms (0.845, 95% confidence interval (CI): 0.831 to 0.861), followed by CatBoost (0.842, 95% CI: 0.826 to 0.859) and XGBoost (0.821, 95% CI: 0.806 to 0.838). For SPE, GradientBoost was slightly inferior to CatBoost (0.897, 95% CI: 0.893 to 0.902 vs. 0.898, 95% CI: 0.890 to 0.907), and XGBoost reached 0.878 (95% CI: 0.865 to 0.886).
Table 2.
Summary of binary and multi-classification results
| Model | AUC (95% CI, low-high) | Specificity (95% CI, low-high) | Sensitivity (95% CI, low-high) | Accuracy (95% CI, low-high) | PPV (95% CI, low-high) | NPV (95% CI, low-high) |
|---|---|---|---|---|---|---|
| Binary classification model in the internal test set | ||||||
| GradientBoost | 0.845(0.831–0.861) | 0.897(0.893–0.902) | 0.618(0.596–0.631) | 0.798(0.790–0.806) | 0.768(0.761–0.780) | 0.810(0.801–0.816) |
| CatBoost | 0.842(0.826–0.859) | 0.898(0.890–0.907) | 0.613(0.593–0.628) | 0.797(0.787–0.805) | 0.768(0.753–0.785) | 0.808(0.799–0.814) |
| XGBoost | 0.821(0.806–0.838) | 0.878(0.865–0.886) | 0.606(0.576–0.627) | 0.782(0.768–0.794) | 0.733(0.714–0.752) | 0.802(0.789–0.812) |
| Binary classification model in the temporal validation set | ||||||
| GradientBoost | 0.876(0.874–0.877) | 0.909(0.906–0.912) | 0.646(0.642–0.651) | 0.817(0.816–0.819) | 0.793(0.788–0.797) | 0.827(0.825–0.829) |
| CatBoost | 0.872(0.871–0.873) | 0.909(0.908–0.912) | 0.644(0.640–0.648) | 0.817(0.816–0.818) | 0.793(0.790–0.796) | 0.826(0.825–0.828) |
| XGBoost | 0.855(0.853–0.855) | 0.889(0.886–0.892) | 0.641(0.636–0.649) | 0.802(0.800–0.804) | 0.756(0.753–0.761) | 0.822(0.820–0.825) |
| Multi-classification model in the internal test set | ||||||
| GradientBoost | - | 0.803(0.797–0.813) | 0.578(0.561–0.597) | 0.724(0.717–0.736) | 0.618(0.606–0.630) | 0.847(0.841–0.858) |
| CatBoost | - | 0.798(0.792–0.807) | 0.567(0.551–0.589) | 0.715(0.708–0.726) | 0.604(0.588–0.615) | 0.839(0.835–0.847) |
| XGBoost | - | 0.796(0.758–0.808) | 0.559(0.533–0.575) | 0.709(0.694–0.720) | 0.593(0.565–0.607) | 0.833(0.825–0.844) |
| Multi-classification model in the temporal validation set | ||||||
| GradientBoost | - | 0.819(0.818–0.821) | 0.604(0.602–0.606) | 0.744(0.742–0.846) | 0.640(0.636–0.643) | 0.860(0.857–0.861) |
| CatBoost | - | 0.817(0.813–0.822) | 0.598(0.592–0.610) | 0.740(0.735–0.748) | 0.635(0.626–0.649) | 0.855(0.852–0.860) |
| XGBoost | - | 0.812(0.809–0.815) | 0.585(0.576–0.598) | 0.731(0.727–0.735) | 0.622(0.613–0.634) | 0.847(0.845–0.850) |
AUC, area under curve; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value
Explanation of binary classification models
Since it is difficult for clinicians to accept a prediction model that is not directly explainable and interpretable, the SHAP method is used to explain the output of the final model by calculating the contribution of each variable to the prediction. As shown in Fig. 2-a, the mean SHAP values evaluate the contribution of features to the model and display them in descending order. The four variables that most significantly influence the prediction of DXA examination results are age, gender, BMI, and ALP. Figure 2-b provides a more intuitive observation of the correspondence between different variables and the prediction results of DXA examination for participants. It can be observed that the top four variables impacting DXA examination results exhibit certain patterns. For instance, BMI shows a gradient line from red to blue. Especially around the SHAP value of 0, there is a distinct color boundary, indicating an exploitable pattern between BMI values and DXA examination results. When the BMI value is low, the model tends to predict abnormal DXA examination results, whereas high BMI values are associated with normal DXA examination results. As illustrated in Fig. 2-b, higher age, female gender, lower BMI, and higher ALP values are key factors leading the model to predict abnormal DXA examination results. Figure 2-c shows the magnitude of the influence of all variables on the outcome for a random sample in the model.
Fig. 2.
Global model explanation by the SHAP method. a: The mean absolute SHAP values of all variables, sorted from top to bottom from the largest to the smallest effect on the outcome. b: The value of each variable in each sample and its effect on the outcome. Different colors indicate differences in the variable values, with the gradient from red to blue ranking the values from high to low. The horizontal coordinate indicates the magnitude of the effect on the outcome. c: The effect of each variable on the outcome judgment for a single sample. d: Interaction effect of gender distribution and age distribution. Different genders contribute significantly to model outcomes in different directions. Male patients are all located below the zero point of the longitudinal axis, and the vast majority of female patients are located above it. In addition, younger patients are located near the zero point of the longitudinal axis, suggesting that the effect of gender on outcomes is reduced at younger ages. e: The interaction effect of age distribution and sex distribution. The overall trend is upward sloping, suggesting that as age increases, the effect of age increasingly favors identifying patients as having an abnormal DXA examination. For patients of the same age, most male patients are closer to the zero point of the longitudinal axis and most female patients are farther away from it, suggesting that gender modifies the effect of age on the outcome. The effect of age on the outcome is smaller for male patients than for female patients of the same age. f: The interaction effect of ALP distribution and BMI distribution. The overall upward trend suggests that as ALP increases, the effect of ALP increasingly favors identifying patients as having an abnormal DXA examination.
g: The interaction effect of BMI distribution and age distribution. The overall trend is downward sloping, which indicates that as BMI increases, the effect of BMI increasingly favors identifying patients as having a normal DXA examination. h: Force plot (age perspective) for 500 samples. When age is relatively low, the blue region is larger, indicating that younger individuals are more likely to be classified as having normal DXA results. As age increases, the red region becomes more dominant, reflecting a stronger contribution toward abnormal DXA classifications at older ages. i: Force plot (BMI perspective) for 500 samples. When BMI is low, the red region is larger, indicating a higher contribution toward abnormal DXA classifications. When BMI is higher, the blue region becomes more prominent, suggesting that higher BMI contributes more strongly to normal DXA classifications. j: Force plot (ALP perspective) for 500 samples. Similar to the age pattern, when ALP levels are low, the blue region is larger, indicating a stronger contribution to normal DXA results. As ALP increases, the red region becomes dominant, reflecting a greater contribution toward abnormal DXA classifications.
Furthermore, we explored the interaction of different variables within our model, as depicted in Fig. 2-d ~ g. Although potential patterns between variables and DXA examination results can be observed in Fig. 2-b, that plot does not show interactions among variables, which are crucial for multi-factor predictions. As shown in Fig. 2-d, two rectangles appear blue near the zero value on the vertical axis, while the rest are red. This indicates that, although gender directly influences the direction of the predicted DXA examination result, its impact is minimal when age is lower and becomes more pronounced with increasing age. Unlike in Fig. 2-b, Fig. 2-e uses the SHAP value of age as the vertical axis. Near the zero value on the vertical axis, the number of blue points far exceeds that of red points. This suggests that being male can reduce the influence of age on the outcome to some extent. Additionally, Fig. 2-h ~ j present the model’s explanation plots. These plots visualize 500 samples, with the X-axis representing the size of different variables and the Y-axis indicating the magnitude of their influence on the outcome. The red areas indicate a greater tendency towards abnormal DXA examination results, while the blue areas indicate the opposite.
Performance and explanation of binary classification models in subgroups
In the SHAP diagram, we clearly observed the importance of four variables for the prediction model: BMI, ALP, gender, and age. For BMI and ALP in particular, the conclusions reported in previous literature differ [37–39]. Therefore, in further experiments, RCS was used to explore the nonlinear relationship between the three continuous variables (age, BMI, and ALP) and osteoporosis endpoints [40]. According to the RCS results (Fig. 3-c1 ~ c3), age has a nonlinear positive correlation with abnormal DXA results: the older the participant, the more likely osteoporosis or osteopenia is. The relationship between BMI and abnormal DXA examination results is a nonlinear negative correlation, with the key point of nonlinear change at a BMI of about 25 kg/m2. The relationship between ALP and the outcome endpoint is a nonlinear positive correlation, with an ALP cutoff value of 180 IU/L. To verify whether the model remains stable on either side of these key values, where the correlation between the independent variable and the outcome changes significantly, we designed a further subgroup analysis with subgroups of BMI ≥ 25 kg/m2 or < 25 kg/m2 and ALP ≥ 180 IU/L or < 180 IU/L. Previous reports and guidelines have clearly stated that 50 is the common age for women to enter menopause, and clinical studies have also confirmed that menopausal women are at high risk of osteoporosis [41, 42]. Men over 50 years old are also at higher risk of osteoporosis [5, 6, 43]. Therefore, we divided participants into age and gender subgroups: men aged ≥ 50 years, men aged < 50 years, women aged ≥ 50 years, and women aged < 50 years. In the subgroup analysis, the performance of the model remained stable.
In the training set, the AUC of all subgroups exceeded 70% except for the subgroups aged < 50 years (both men and women), and the highest was 86.6% in the BMI ≥ 25 kg/m2 subgroup. In the test sets, the AUC of all subgroups was higher than 70%, with the lowest being 70.4% for men aged < 50 years and the highest being 88.2% for BMI ≥ 25 kg/m2. GradientBoost performed best in almost all subgroups, except for the female subgroup aged ≥ 50 years. In the training set of this subgroup, the best AUC was achieved by the CatBoost algorithm, with GradientBoost very close behind (81.7% vs. 81.6%). In the validation set of this subgroup, GradientBoost still outperformed CatBoost (82.7% vs. 82.2%).
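The subgroup evaluation described above can be sketched as follows. This is a minimal illustration, assuming each participant record carries a predicted score and a binary abnormal-DXA label; the field names (`sex`, `age`, `abnormal_dxa`, `score`) and the toy records are hypothetical and do not come from KNHANES.

```python
from collections import defaultdict


def auc(labels, scores):
    """Rank-based AUC: probability that a positive case scores higher
    than a negative case, counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_key(sex, age, cutoff=50):
    """Assign a record to one of the four sex-by-age subgroups."""
    band = ">= 50" if age >= cutoff else "< 50"
    return sex + ", age " + band


def auc_by_subgroup(records):
    """Compute the AUC separately within each sex-by-age subgroup."""
    groups = defaultdict(lambda: ([], []))
    for r in records:
        labels, scores = groups[subgroup_key(r["sex"], r["age"])]
        labels.append(r["abnormal_dxa"])
        scores.append(r["score"])
    return {key: auc(labels, scores) for key, (labels, scores) in groups.items()}


# Toy records (hypothetical values, for illustration only).
records = [
    {"sex": "male", "age": 62, "abnormal_dxa": 1, "score": 0.8},
    {"sex": "male", "age": 55, "abnormal_dxa": 0, "score": 0.3},
    {"sex": "male", "age": 70, "abnormal_dxa": 1, "score": 0.2},
    {"sex": "male", "age": 58, "abnormal_dxa": 0, "score": 0.1},
]
print(auc_by_subgroup(records))
```

The same pattern applies to the BMI (25 kg/m2) and ALP (180 IU/L) subgroups by swapping the key function.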
Fig. 3.
Model explanation and comparison in subgroups by SHAP method; restricted cubic spline analysis and confusion matrix for multi-classification models. a: This figure describes the ranking of variable importance for each subgroup. We plotted the top four variables in each subgroup in b1 ~ b4. The vertical axis lists the importance of the variables in order from bottom to top, with the bottom being the most important, and the horizontal axis shows the different groupings labeled along the way. b1 ~ b4: The SHAP diagrams of different subgroups. b1: The male subgroup aged ≥ 50 years old. b2: The male subgroup aged < 50 years old. b3: The female subgroup aged ≥ 50 years old. b4: The female subgroup aged < 50 years old. c1 ~ c3: The nonlinear correlation between age, BMI, ALP and DXA abnormalities. c1: Relationship between age and osteoporosis risk. Before approximately age 50, the odds ratio (OR) for osteoporosis, while fluctuating slightly, remains below 1, indicating younger age is generally protective. Around age 50, the OR exceeds 1 and continues to increase with age, showing that aging becomes a persistent and strong risk factor. c2: Relationship between alkaline phosphatase (ALP) and osteoporosis risk. A significant threshold effect is observed. Below the threshold of ~ 180 IU/L, osteoporosis risk slightly increases with ALP but remains low (OR < 1). When ALP exceeds this threshold, risk sharply rises (OR > 1), suggesting this level may serve as a key biomarker for identifying high-risk individuals and guiding early screening. c3: Relationship between body mass index (BMI) and osteoporosis risk. A significant non-linear relationship exists, showing a protective threshold. Below the threshold, higher BMI gradually decreases risk, though overall risk remains above reference (OR > 1). Once BMI surpasses the threshold, higher BMI clearly protects against osteoporosis (OR < 1), with the protective effect increasing as BMI rises.
d ~ f: Results of GradientBoost, XGBoost, and CatBoost averaged over the internal test set for the three-classification confusion matrix in the multi-classification experiment. g ~ i: Results of GradientBoost, XGBoost, and CatBoost averaged over the temporal validation set for the three-classification confusion matrix in the multi-classification experiment
We then conducted a subgroup variable importance analysis using the SHAP method (Fig. 3-b1 ~ b4). In men aged 50 years or older, the variable with the greatest impact on the model results was BMI, followed by age, ALP, and average monthly household income (Fig. 3-b1). Similarly, the most important variable in men under 50 years old was BMI, followed by ALP, age, and total household income (Fig. 3-b2). In women, the order of variables differed considerably between the subgroups aged 50 years or older and under 50 years old (Fig. 3-b3 and Fig. 3-b4): in menopausal women (aged 50 years or older) it was age, BMI, ALP, and education, while in younger women it was ALP, BMI, waist circumference, and age. Figure 3-a summarizes the ranking of variable importance for each subgroup.
Result of multi-classification models
The above experiments show that abnormal DXA results can be predicted without radiation-based testing in people who have completed blood tests and health questionnaires. Because treatment options differ between osteoporosis and osteopenia, we further attempted to distinguish among normal bone status, osteopenia, and osteoporosis in a three-class experiment. The results are shown in Fig. 3-d1 ~ d6 and Supplementary Table 4. Subcategorizing patients was more challenging than distinguishing between normal and abnormal DXA results, so performance in the multi-classification experiments was lower than in the binary classification. Figure 3-d1 ~ d3 shows the averaged three-class confusion matrices of GradientBoost, CatBoost, and XGBoost on the internal test set. Figure 3-d4 ~ d6 shows the averaged three-class confusion matrices of the three models on the temporal validation set. GradientBoost consistently achieved the best results (72.4% accuracy, 57.8% sensitivity, and 80.3% specificity).
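As an illustration of how accuracy, sensitivity, and specificity can be derived from a three-class confusion matrix, consider the minimal sketch below. It assumes one-vs-rest sensitivity and specificity per class followed by macro-averaging; the averaging scheme used in the study is not stated in this section, and the confusion matrix counts are hypothetical.

```python
def multiclass_metrics(cm):
    """Accuracy plus macro-averaged one-vs-rest sensitivity and specificity.

    cm[i][j] is the number of cases with true class i predicted as class j.
    Macro-averaging is an assumption made for this sketch.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    sensitivities, specificities = [], []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(k)) - tp     # predicted c, true class differs
        tn = total - tp - fn - fp
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))
    return {
        "accuracy": accuracy,
        "sensitivity": sum(sensitivities) / k,
        "specificity": sum(specificities) / k,
    }


# Hypothetical counts; rows and columns ordered as normal, osteopenia, osteoporosis.
toy_cm = [
    [50, 8, 2],
    [10, 25, 5],
    [2, 8, 10],
]
print(multiclass_metrics(toy_cm))
```

With three classes, specificity counts every correctly rejected case from the other two classes, which is one reason it can stay high while sensitivity is moderate.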
Calibration performance and decision curve analysis
The calibration analysis demonstrated that our multivariable model achieved better agreement between predicted and observed risks compared with the BMI-only model. As shown in Supplementary Fig. 3-a1, the calibration curve of our model exhibited a wider distribution of predicted probabilities, with an intercept of 0.0118 and a slope of 1.0458, indicating minimal overall bias and good calibration across the risk spectrum. In contrast, the BMI-based univariable model showed a narrower probability range and poorer calibration accuracy, with an intercept of − 0.1626 and a slope of 0.8283 (Supplementary Fig. 3-b1).
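For readers unfamiliar with calibration intercept and slope, the sketch below shows one common way to estimate them: fit a logistic regression of the observed outcome on the logit of the predicted probability. This is an assumption about the general technique, not a description of the authors' code; in particular, the intercept and slope are fitted jointly here, whereas some analyses estimate the intercept with the slope fixed at 1. The function name and the toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def calibration_intercept_slope(y, p, iters=25):
    """Fit y ~ a + b * logit(p) by Newton-Raphson and return (a, b).

    a near 0 and b near 1 suggest good calibration; b < 1 suggests
    predictions that are too extreme (overconfident).
    """
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start from the perfectly calibrated solution
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for yi, xi in zip(y, x):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g_a += yi - mu            # gradient w.r.t. intercept
            g_b += (yi - mu) * xi     # gradient w.r.t. slope
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: solve the 2x2 system H * delta = g.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Toy data: nine risk groups of ten people, where group k has exactly k events.
y, p_good, p_extreme = [], [], []
for k in range(1, 10):
    q = k / 10
    for i in range(10):
        y.append(1 if i < k else 0)
        p_good.append(q)                                       # matches observed risk
        p_extreme.append(1 / (1 + math.exp(-2 * logit(q))))    # logit doubled: too extreme

print(calibration_intercept_slope(y, p_good))
print(calibration_intercept_slope(y, p_extreme))
```

In the second toy case the predictions are deliberately made too extreme, so the fitted slope falls below 1, the same direction of miscalibration as a slope of 0.8283.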
Decision Curve Analysis (DCA) further confirmed the superior clinical usefulness of the multivariable model. As shown in Supplementary Fig. 3-a2, the net benefit of our model remained consistently higher than both the Treat All and Treat None strategies across all clinically relevant threshold probabilities. Conversely, the BMI-only model (Supplementary Fig. 3-b2) exceeded the two reference strategies only within a limited range of threshold probabilities, indicating substantially lower clinical utility.
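The net benefit underlying a decision curve can be computed as in the minimal sketch below, which uses the standard definition: true positives per person minus false positives per person weighted by the odds of the threshold probability. The function names and the toy data are hypothetical; Treat None always has a net benefit of zero.

```python
def net_benefit(y, p, threshold):
    """Net benefit of acting on everyone whose predicted risk is >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y, threshold):
    """Net benefit of the Treat All strategy at the same threshold."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy outcomes and predicted risks (hypothetical values).
y_toy = [1, 1, 0, 0, 1, 0, 0, 0]
p_toy = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1, 0.3, 0.05]
for t in (0.2, 0.5):
    print(t, net_benefit(y_toy, p_toy, t), net_benefit_treat_all(y_toy, t))
```

Plotting these quantities over a range of thresholds yields the decision curve; a model is clinically useful at a given threshold when its net benefit exceeds both Treat All and zero.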
Discussion
Screening for osteopenia and osteoporosis in the general population is important so that timely intervention can prevent fragility fractures, because fracture healing in elderly people with osteoporosis remains challenging due to the adverse effects of the aging immune microenvironment [44]. In addition, the American College of Physicians recommended in its 2023 living clinical guideline, published in Annals of Internal Medicine, that all people with osteopenia and osteoporosis (that is, those with abnormal DXA tests) take adequate vitamin D and calcium supplements [45]. In the current study, we implemented a multi-classification model using ML algorithms and electronic health records from a cross-sectional cohort of 18,179 participants. Compared with the model developed by Li et al. using electronic health records [16], our approach achieved a higher AUC (0.845 vs. 0.815). Our prediction showed high discriminative power and specificity. In particular, it remained stable in subgroups based on nonlinearly correlated independent variables, supporting its potential value for clinical application. To our knowledge, this is one of the largest samples used to develop a non-radiographic prediction model for osteoporosis risk in Asians.
A notable feature of our prediction model is that we included all non-radiological variables in the electronic medical database as potential predictors, regardless of whether they were previously associated with osteoporosis, similar to recent machine learning approaches using routinely collected clinical data [46, 47]. Since the electronic medical records were entered by examiners when conducting the survey, the ready-made data enhanced the feasibility of integrating the prediction model into the workflow of community physical examination centers and daily health questionnaire surveys. In addition, the independent sampling design of the KNHANES enhances the potential generalizability of our prediction model, indicating that it may achieve robust performance when applied to the broader Korean population. Although the model showed strong specificity and NPV, the sensitivity was moderate, indicating that some patients with osteoporosis may be misclassified as non-osteoporotic (false negatives). Such cases could lead to delayed diagnosis and missed opportunities for early intervention, potentially increasing fracture risk. Conversely, false positives may result in unnecessary diagnostic evaluations, although confirmatory DXA testing would typically mitigate the risk of overtreatment. Taken together, our model is intended primarily as a high-specificity, risk-stratification and rule-in tool to help identify individuals at elevated risk and thereby reduce unnecessary DXA examinations, rather than as a population-level screening instrument or a replacement for DXA confirmation. In practical application, patients can input the values of relevant variables into the model by filling out a questionnaire containing the variables included in this model. The model will then predict whether they have osteoporosis, osteopenia, or normal bone status. The specialist will recommend whether the patient needs further DXA screening based on the model conclusion. 
Although the final prediction model demonstrated strong discrimination and calibration performance, the practical availability of some predictors may limit its immediate deployability in all clinical settings. In the current version of the online calculator, we therefore clearly state that the intended users are trained healthcare professionals, who can obtain or interpret these predictors appropriately within clinical workflow. Additionally, we provide possible ways to obtain some predictors in daily practice. This clarification helps ensure that the calculator is applied within settings where input variables can be reliably collected, thereby reducing the likelihood of inappropriate usage.
Gender differences are found in many diseases, especially those whose pathogenesis involves hormone levels that differ by gender [48]. Osteoporosis is one such disease: sexual dimorphism is a prominent feature, with gender-specific differences in epidemiology and pathogenesis. Specifically, women are more likely to develop osteoporosis than men, while men are more likely to be disabled or die from osteoporosis [2]. In this study, statistically significant differences between men and women were found for all independent variables except age, which provided a basis for further subgroup analysis. Because of the interaction between gender and age, we explored the difference in predictive performance between age groups of the same gender in the subsequent subgroup analysis. Within the same age group, the predictive performance for osteoporosis risk was lower in men than in women. This may be related to the contribution of sex hormones, including estrogen and androgens such as testosterone, to gender differences in the risk and pathophysiology of osteoporosis [49]; for example, bone loss after menopause in women is linked to falling estrogen levels [50]. It is also consistent with previous findings that the osteoporosis risk of men is lower than that of women [2].
As a representative indicator of obesity level [51], BMI is associated with a variety of outcomes, such as all-cause mortality, cardiovascular disease mortality, and osteoporosis [39, 52]. However, whether this association is positive or negative remains debated. Since it was first proposed in 1999 [53], the obesity paradox has been supported by a substantial body of new evidence [51, 52, 54]. Although previous studies have reached different conclusions on the predictive effects of BMI on osteoporosis risk, our results suggest that higher BMI is generally associated with a lower likelihood of osteoporosis in this population. This pattern is consistent with prior observational research and may reflect a manifestation of the so-called “osteoporosis obesity paradox.” However, given the cross-sectional design and single-country dataset, these findings should be interpreted as hypothesis-generating rather than confirmatory. The relationship between BMI and bone health warrants further validation in larger, multi-center, and longitudinal studies.
Another clinical advantage is the high-risk warning indicator for osteoporosis. While previous studies have reported a correlation between ALP and osteoporosis [55], our study explored a nonlinear relationship between ALP and osteoporosis and suggested a potential high-risk ALP range. The importance of ALP in our analysis was supported by the SHAP algorithm and further examined using RCS and subgroup analyses. Our RCS analysis suggested that the probability of DXA abnormalities appeared to increase when ALP was above approximately 180 IU/L, which may be related to the involvement of ALP in osteoblast–osteoclast transdifferentiation [56, 57]. These findings may provide hypothesis-generating evidence regarding the potential role of ALP in osteoporosis risk assessment and warrant further validation in larger, multi-center, and longitudinal studies before any clinical application.
It should also be noted that in the whole-population model, education level and age at onset of drinking ranked fifth and sixth in importance among all variables, respectively. The correlation between education level and osteoporosis in Europeans has been confirmed in recent studies: higher educational attainment was associated with higher bone mineral density and a lower risk of fracture [58]. This association has also been used in a machine learning-based osteoporosis prediction model for Asians [59]. Our study evaluated these associations in a relatively large cohort, providing additional evidence in support of observations reported in earlier research. The relationship between the age at onset of drinking and osteoporosis is less explored. Previous studies have concluded that there is no significant correlation between early heavy drinking starting at the age of ≤ 15 and bone density [60]. Our results add to this limited body of evidence by suggesting a potential association between drinking initiation age and abnormal DXA findings, which may offer preliminary support for future clinical or mechanistic investigations.
Several published studies have used the same database to establish osteoporosis risk prediction models and achieved good predictive performance [8, 61]. Although these studies share many similarities with ours, our study specifically focuses on differentiating between osteoporosis and its earlier disease state, osteopenia, whereas the other studies primarily focused on predicting osteoporosis and its later consequence, fracture risk. In addition, the variables we used are more readily available than those used in other studies (questionnaire variables versus genetic variables). Since the publication of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines a decade ago [62], prediction models have had clear methodological guidance. Some studies did not meet reporting standards and performed poorly [63, 64]. Our study achieved good model performance for predicting osteoporosis after rigorous evaluation. We also note that our study used data that are usually collected as part of routine care and can be easily extracted from electronic health records [65]. Compared with other prediction models, the model developed in this study meets a clear clinical need because it covers a wide variety of variable types with a small number of variables and is applicable to the patients commonly seen in clinical practice.
Our study also has several limitations. First, the feature selection process (Spearman correlation filtering, LASSO, and GradientBoost-based RFE) was performed independently prior to model training. Although this approach helped reduce redundancy and improve interpretability, the fact that it was not fully nested within cross-validation may introduce a degree of optimistic bias. Second, although our dataset included a large number of subjects, the KNHANES is a cross-sectional survey that captures population data at a single time point. As a result, our model predicts current osteoporosis status rather than future fracture risk. This also limits comparison with clinical fracture-prediction tools such as FRAX, which incorporate longitudinal risk estimates. Third, the ground truth for osteoporosis in our study was defined solely by DXA-based T-scores. While DXA is the clinical gold standard, it does not fully capture bone quality, microarchitecture, or other determinants of fracture susceptibility, thereby constraining the model’s clinical utility for broader skeletal risk assessment. In addition, although we used all available KNHANES data containing DXA examinations (2008–2011), the reliance on a single national survey may affect the contemporaneous relevance of the results and the applicability of the derived tool in other populations. The model also lacks external validation using independent or real-world datasets, which limits generalizability, particularly for non-Korean or multiethnic populations. Moreover, several clinically important risk factors—such as prior fragility fractures, long-term glucocorticoid use, family history of osteoporosis, and specific comorbidities—were not directly included. This restricts the comprehensiveness and real-world deployment of the model. Finally, some predictors may interact with unmeasured lifestyle, clinical, or genetic factors not captured in KNHANES, which may influence the model’s behavior in unseen populations. 
Acknowledging these constraints is essential for correct interpretation, and future multicenter, longitudinal, and clinically enriched studies are warranted to improve fracture prediction, enable comparison with tools like FRAX, and enhance the translational value of our approach.
Conclusion
Our findings may help in early screening of DXA abnormalities in the general population, especially in community settings. This could not only reduce unnecessary radiation-based examinations but also facilitate timely specialist treatment. Future research should focus on external validation in multiethnic cohorts, longitudinal prediction of osteoporosis risk, and refinement of model sensitivity to further enhance its clinical utility.
Acknowledgements
Not applicable.
Abbreviations
- ACC: Accuracy
- ALP: Alkaline phosphatase
- ANOVA: Analysis of variance
- AUC: Area under the curve
- BMD: Bone mineral density
- BMI: Body mass index
- CI: Confidence interval
- DXA: Dual-energy X-ray absorptiometry
- FVC: Forced vital capacity
- KNHANES: Korean National Health and Nutrition Examination Survey
- LASSO: Least absolute shrinkage and selection operator
- ML: Machine learning
- NPV: Negative predictive value
- OSTA: Osteoporosis Self-Assessment Tool for Asians
- PPV: Positive predictive value
- RCS: Restricted cubic spline
- RFE: Recursive feature elimination
- SHAP: SHapley Additive exPlanations
- SD: Standard deviation
- SEN: Sensitivity
- SPE: Specificity
- TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis
- WHO: World Health Organization
Author contributions
Chao Tong, Kun Zhang and Hui Huang are the guarantors of the study. All authors (Sijin Li, Yuqi Zhang, Peibiao Mai, Qiang Su, Minnan Gao, Kuan Zeng, Chao Tong, Kun Zhang and Hui Huang) were involved in the conceptualization and design of the study. Sijin Li and Yuqi Zhang were responsible for the experiment. Data cleaning was done by Peibiao Mai, Qiang Su, Minnan Gao and Kuan Zeng. Analysis and interpretation were done by Sijin Li and Yuqi Zhang under the supervision and with the support of Chao Tong, Kun Zhang and Hui Huang. Drafting of the article was done by Sijin Li and Yuqi Zhang. All authors (Sijin Li, Yuqi Zhang, Peibiao Mai, Qiang Su, Minnan Gao, Chao Tong, Kuan Zeng, Kun Zhang and Hui Huang) revised and contributed to the intellectual content of the article. All authors (Sijin Li, Yuqi Zhang, Peibiao Mai, Qiang Su, Minnan Gao, Chao Tong, Kuan Zeng, Kun Zhang and Hui Huang) approved the final version of the article, including the authorship list.
Funding
This study is partially supported by National Natural Science Foundation of China (62572033, 62176016, 72274127), Beijing Municipal Science and Technology Program (Project) Task: Research and Validation of Integrated Satellite Internet-Based Communication-Navigation and Intelligent Low-Altitude Airspace Management and Control Technologies (Z251100003625009), Guizhou Province Science and Technology Project: Research on Q&A Interactive Virtual Digital People for Intelligent Medical Treatment in Information Innovation Environment (supported by Qiankehe [2024] General 058), Haidian innovation and translation program from Peking University Third Hospital (HDCXZHKC2023203), Project: Research on the Decision Support System for Urban, Park Carbon Emissions Empowered by Digital Technology - A Special Study on the Monitoring and Identification of Heavy Truck Beidou Carbon Emission Reductions to Chao Tong. National Natural Science Foundation of China (72274127), R&D Program of Beijing Municipal Education Commission (KZ202010025047) to Su Qiang. Project (YXYXCXRC202401, GCCRCYJ065, JCYJ20230807110302005) and National Natural Science Foundation of Guangdong Province (2022A1515011041) to Kuan Zeng. Shenzhen Medical Research Fund (B2302020, C2504001), Noncommunicable Chronic Diseases- National Science and Technology Major Project (2025ZD0547300), National Natural Science Foundation of China (82330021, 82270771), Shenzhen Science and Technology Program (KCXFZ20211020163801002, ZDSYS20220606100801004, SGDX20230116092459009), the Regional Joint Funding Key Project of Guangdong Basic Research and Basic Research for Application (2024B1515120018), and Shenzhen Key Medical Discipline Construction Fund (SZXK002), Futian District Public Health Scientific Research Project of Shenzhen (FTWS2022001), Chinese Association of Integrative Medicine-Shanghai Hutchison Pharmaceuticals Fund (HMPE202202), China Heart House-Chinese Cardiovascular Association HX fund (2022-CCA-HX-090) to Hui Huang.
Data availability
The raw data are publicly available on the official KNHANES website. The code used in the current study is available from https://github.com/YuqiZhang-Buaa/Online-Non-Radiographic-Osteoporosis-Prediction-Calculator.
Declarations
Ethics approval and consent to participate
The study protocol was approved by the Institutional Review Board of the Korean Center for Disease Control and Prevention (No. 2008–04EXP-01-C, 2009–01CON-03-C, 2010–02CON-21-C and 2011–02CON-06-C). The raw data sets are publicly available through the KNHANES website, and data collection from the KNHANES dataset was approved by the Institutional Review Board of the Korean National Institute for Bioethics Policy, which waived the requirement for informed consent for this study. The study adhered to the tenets of the Declaration of Helsinki.
Consent for publication
All authors have read and agreed to the published version of the manuscript.
Competing interests
The authors declare no competing interests.
Footnotes
Yuqi Zhang and Sijin Li contributed equally to this work.
Contributor Information
Chao Tong, Email: tongchao@buaa.edu.cn.
Kun Zhang, Email: zhangk65@mail.sysu.edu.cn.
Hui Huang, Email: huangh8@mail.sysu.edu.cn.
References
- 1.Compston JE, McClung MR, Leslie WD, Osteoporosis. Lancet. 2019;393(10169):364–76. [DOI] [PubMed] [Google Scholar]
- 2.Zhang YY, Xie N, Sun XD, et al. Insights and implications of sexual dimorphism in osteoporosis. Bone Res. 2024;12(1):8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Xiao PL, Cui AY, Hsu CJ, et al. Global, regional prevalence, and risk factors of osteoporosis according to the world health organization diagnostic criteria: a systematic review and meta-analysis. Osteoporos Int. 2022;33(10):2137–53. [DOI] [PubMed] [Google Scholar]
- 4.Papadopoulou SK, Papadimitriou K, Voulgaridou G, et al. Exercise and nutrition impact on osteoporosis and Sarcopenia-the incidence of osteosarcopenia: a narrative review. Nutrients 2021;13(12). [DOI] [PMC free article] [PubMed]
- 5.Curtis EM, van der Velde R, Moon RJ, et al. Epidemiology of fractures in the United Kingdom 1988–2012: variation with age, sex, geography, ethnicity and socioeconomic status. Bone. 2016;87:19–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.International Osteoporosis F. Epidemiology of osteoporosis and fragility fractures. 2022.
- 7.Cummings SR, Cosman F, Lewiecki EM, et al. Goal-Directed treatment for osteoporosis: A progress report from the ASBMR-NOF working group on Goal-Directed treatment for osteoporosis. J Bone Min Res. 2017;32(1):3–10. [DOI] [PubMed] [Google Scholar]
- 8.Suh B, Yu H, Kim H, et al. Interpretable Deep-Learning approaches for osteoporosis risk screening and individualized feature analysis using large Population-Based data: model development and performance evaluation. J Med Internet Res. 2023;25:e40179. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Camacho PM, Petak SM, Binkley N, et al. American association of clinical Endocrinologists/American college of endocrinology clinical practice guidelines for the diagnosis and treatment of postmenopausal Osteoporosis-2020 update. Endocr Pract. 2020;26(Suppl 1):1–46. [DOI] [PubMed] [Google Scholar]
- 10.Li CC, Ou LC, Chang YF, et al. Performance and interventional cutoffs of osteoporosis self-assessment tools in the community: implications for screening and early referral. Osteoporos Int. 2025. [DOI] [PubMed]
- 11.Kanis JA, Norton N, Harvey NC, et al. SCOPE 2021: a new scorecard for osteoporosis in Europe. Arch Osteoporos. 2021;16(1):82. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Looker AC, Melton LJ 3rd, Harris TB, Borrud LG, Shepherd JA. Prevalence and trends in low femur bone density among older US adults: NHANES 2005–2006 compared with NHANES III. J Bone Min Res. 2010;25(1):64–71. [DOI] [PMC free article] [PubMed]
- 13.Choksi P, Jepsen KJ, Clines GA. The challenges of diagnosing osteoporosis and the limitations of currently available tools. Clin Diabetes Endocrinol. 2018;4:12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Emir SN, Guner G. Evaluation of lumbar vertebral bone quality using T1-weighted MRI: can it differentiate normal, osteopenia, and osteoporosis? J Clin Densitom. 2025;28(2):101561. [DOI] [PubMed] [Google Scholar]
- 15.Yang J, Liao M, Wang Y, et al. Opportunistic osteoporosis screening using chest CT with artificial intelligence. Osteoporos Int. 2022;33(12):2547–61. [DOI] [PubMed] [Google Scholar]
- 16.Li GH, Cheung CL, Tan KC, et al. Development and validation of sex-specific hip fracture prediction models using electronic health records: a retrospective, population-based cohort study. EClinicalMedicine. 2023;58:101876. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Khosla S, Melton LJ. 3rd. Clinical practice. Osteopenia. N Engl J Med. 2007;356(22):2293–300. [DOI] [PubMed] [Google Scholar]
- 18.Toh LS, Lai PSM, Wu DB, et al. A comparison of 6 osteoporosis risk assessment tools among postmenopausal women in Kuala Lumpur, Malaysia. Osteoporos Sarcopenia. 2019;5(3):87–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Petersen TG, Abrahamsen B, Hoiberg M, et al. Ten-year follow-up of fracture risk in a systematic population-based screening program: the risk-stratified osteoporosis strategy evaluation (ROSE) randomised trial. EClinicalMedicine. 2024;71:102584. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Goecks J, Jalili V, Heiser LM, Gray JW. How machine learning will transform biomedicine. Cell. 2020;181(1):92–101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Zhang Y, Li S, Wu W, et al. Machine-learning-based models to predict cardiovascular risk using oculomics and clinic variables in KNHANES. BioData Min. 2024;17(1):12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Zhang Y, Li S, Mai P, et al. A machine learning-based model for predicting paroxysmal and persistent atrial fibrillation based on EHR. BMC Med Inf Decis Mak. 2025;25(1):51. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Haug CJ, Drazen JM. Artificial intelligence and machine learning in clinical Medicine, 2023. N Engl J Med. 2023;388(13):1201–8. [DOI] [PubMed] [Google Scholar]
- 24.Zhang Y, Yu M, Tong C, Zhao Y, Han J. CA-UNet segmentation makes a good ischemic stroke risk prediction. Interdisciplinary Sciences: Comput Life Sci. 2024;16(1):58–72. [DOI] [PubMed] [Google Scholar]
- 25.Kweon S, Kim Y, Jang MJ, et al. Data resource profile: the Korea National health and nutrition examination survey (KNHANES). Int J Epidemiol. 2014;43(1):69–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Yoo TK, Oh E. Association between dry eye syndrome and osteoarthritis severity: A nationwide Cross-Sectional study (KNHANES V). Pain Med. 2021;22(11):2525–32. [DOI] [PubMed] [Google Scholar]
- 27.Lee JS, Jang S. A study on reference values and prevalence of osteoporosis in korea: the Korea National health and nutrition examination survey 2008–2011. J Korean Official Stat. 2013;18(2):42–65. [Google Scholar]
- 28.Kanis JA, Gluer CC. An update on the diagnosis and assessment of osteoporosis with densitometry. Committee of scientific Advisors, international osteoporosis foundation. Osteoporos Int. 2000;11(3):192–202. [DOI] [PubMed] [Google Scholar]
- 29.Choi E, Choi KW, Jeong HG, et al. Long working hours and depressive symptoms: moderation by gender, income, and job status. J Affect Disord. 2021;286:99–107. [DOI] [PubMed] [Google Scholar]
- 30.Oh H, Kim J, Huh Y, Kim SH, Jang SI. Association of household income level with vitamin and mineral intake. Nutrients 2021;14(1). [DOI] [PMC free article] [PubMed]
- 31.Oh J, Ye S, Kang DH, Ha E. Association between exposure to fine particulate matter and kidney function: results from the Korea National health and nutrition examination survey. Environ Res. 2022;212(Pt A):113080. [DOI] [PubMed] [Google Scholar]
- 32.Lee HH, Lee H, Townsend RR, Kim DW, Park S, Kim HC. Cardiovascular implications of the 2021 KDIGO blood pressure guideline for adults with chronic kidney disease. J Am Coll Cardiol. 2022;79(17):1675–86. [DOI] [PubMed] [Google Scholar]
- 33.Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30.
- 34.Lundberg SM, Erion G, Chen H, et al. From local explanations to global Understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56–672522. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Gupta S, Glezerman IG, Hirsch JS, et al. Derivation and external validation of a simple risk score for predicting severe acute kidney injury after intravenous cisplatin: cohort study. BMJ. 2024;384:e077169.
- 36. Bhaskaran K, Dos-Santos-Silva I, Leon DA, Douglas IJ, Smeeth L. Association of BMI with overall and cause-specific mortality: a population-based cohort study of 3.6 million adults in the UK. Lancet Diabetes Endocrinol. 2018;6(12):944–53.
- 37. Liu Y, Liu Y, Huang Y, et al. The effect of overweight or obesity on osteoporosis: a systematic review and meta-analysis. Clin Nutr. 2023;42(12):2457–67.
- 38. Lin YJ, Liang WM, Chiou JS, et al. Genetic predisposition to bone mineral density and their health conditions in East Asians. J Bone Miner Res. 2024.
- 39. Gruneisen E, Kremer R, Duque G. Fat as a friend or foe of the bone. Curr Osteoporos Rep. 2024;22(2):245–56.
- 40. Lee DH, Keum N, Hu FB, et al. Predicted lean body mass, fat mass, and all cause and cause specific mortality in men: prospective US cohort study. BMJ. 2018;362:k2575.
- 41. Romandini M, Shin HS, Romandini P, Lafori A, Cordaro M. Hormone-related events and periodontitis in women. J Clin Periodontol. 2020;47(4):429–41.
- 42. Glynne S, Newson L, Reisel D. Hormone therapy for the prevention of chronic conditions in postmenopausal persons. JAMA. 2023;329(11):940–1.
- 43. Gregson CL, Armstrong DJ, Bowden J, et al. UK clinical guideline for the prevention and treatment of osteoporosis. Arch Osteoporos. 2022;17(1):58.
- 44. Ji H, Shen G, Liu H, et al. Biodegradable Zn-2Cu-0.5Zr alloy promotes the bone repair of senile osteoporotic fractures via the immune-modulation of macrophages. Bioact Mater. 2024;38:422–37.
- 45. Qaseem A, Hicks LA, Etxeandia-Ikobaltzeta I, et al. Pharmacologic treatment of primary osteoporosis or low bone mass to prevent fractures in adults: a living clinical guideline from the American College of Physicians. Ann Intern Med. 2023;176(2):224–38.
- 46. Carvalho FR, Gavaia PJ. Enhancing osteoporosis risk prediction using machine learning: a holistic approach integrating biomarkers and clinical data. Comput Biol Med. 2025;192:110289.
- 47. Carvalho FR, Gavaia PJ. Letter to the editor: robustness of osteoporosis risk prediction models with enhanced statistical analyses. Comput Biol Med. 2025;196(Pt A):110711.
- 48. Shi Y, Ma J, Li S, et al. Sex difference in human diseases: mechanistic insights and clinical implications. Signal Transduct Target Ther. 2024;9(1):238.
- 49. Arnold AP, Cassis LA, Eghbali M, Reue K, Sandberg K. Sex hormones and sex chromosomes cause sex differences in the development of cardiovascular diseases. Arterioscler Thromb Vasc Biol. 2017;37(5):746–56.
- 50. Manolagas SC, O’Brien CA, Almeida M. The role of estrogen and androgen receptors in bone health and disease. Nat Rev Endocrinol. 2013;9(12):699–712.
- 51. Tran TXM, Chang Y, Choi HR, et al. Adiposity, body composition measures, and breast cancer risk in Korean premenopausal women. JAMA Netw Open. 2024;7(4):e245423.
- 52. Lv Y, Zhang Y, Li X, et al. Body mass index, waist circumference, and mortality in subjects older than 80 years: a Mendelian randomization study. Eur Heart J. 2024;45(24):2145–54.
- 53. Schmidt DS, Salahudeen AK. Obesity-survival paradox-still a controversy? Semin Dial. 2007;20(6):486–92.
- 54. Dai M, Xia B, Xu J, Zhao W, Chen D, Wang X. Association of waist-calf circumference ratio, waist circumference, calf circumference, and body mass index with all-cause and cause-specific mortality in older adults: a cohort study. BMC Public Health. 2023;23(1):1777.
- 55. Schini M, Vilaca T, Gossiel F, Salam S, Eastell R. Bone turnover markers: basic biology to clinical applications. Endocr Rev. 2023;44(3):417–73.
- 56. Zhang D, Wang X, Sun K, et al. Onion (Allium cepa L.) flavonoid extract ameliorates osteoporosis in rats facilitating osteoblast proliferation and differentiation in MG-63 cells and inhibiting RANKL-induced osteoclastogenesis in RAW 264.7 cells. Int J Mol Sci. 2024;25(12).
- 57. Deng H, Li H, Liu Z, et al. Pro-osteogenic role of interleukin-22 in calcific aortic valve disease. Atherosclerosis. 2024;388:117424.
- 58. Duan JY, You RX, Zhou Y, et al. Assessment of causal association between the socio-economic status and osteoporosis and fractures: a bidirectional Mendelian randomization study in European population. J Bone Miner Res. 2024.
- 59. Yang Q, Cheng H, Qin J, et al. A machine learning-based preclinical osteoporosis screening tool (POST): model development and validation study. JMIR Aging. 2023;6:e46791.
- 60. LaBrie JW, Boyle S, Earle A, Almstedt HC. Heavy episodic drinking is associated with poorer bone health in adolescent and young adult women. J Stud Alcohol Drugs. 2018;79(3):391–8.
- 61. Wu X, Park S. A prediction model for osteoporosis risk using a machine-learning approach and its validation in a large cohort. J Korean Med Sci. 2023;38(21):e162.
- 62. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162(1):55–63.
- 63. Lu JH, Callahan A, Patel BS, et al. Assessment of adherence to reporting guidelines by commonly used clinical prediction models from a single vendor: a systematic review. JAMA Netw Open. 2022;5(8):e2227779.
- 64. Kamran F, Tjandra D, Heiler A, et al. Evaluation of sepsis prediction models before onset of treatment. NEJM AI. 2024:AIoa2300032.
- 65. Fihn SD, Berlin JA, Haneuse S, Rivara FP. Prediction models and clinical outcomes: a call for papers. JAMA Netw Open. 2024;7(4):e249640.
Data Availability Statement
The raw data are publicly available from the official KNHANES website. The code used in the current study is available at https://github.com/YuqiZhang-Buaa/Online-Non-Radiographic-Osteoporosis-Prediction-Calculator.



