Abstract
Background
Renal injury is a severe complication among individuals diagnosed with gout. This research constructed a machine learning predictive model to assess renal injury risk in gout patients.
Methods
In this study, we trained predictive models for renal injury in patients with gout using data from the 2007 to 2018 cycles of the National Health and Nutrition Examination Survey (NHANES). Extreme Gradient Boosting (XGBoost), support vector machine (SVM), and K-Nearest Neighbors (KNN) algorithms were used to train the models. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), along with calibration curves and standard evaluation metrics including sensitivity, specificity, accuracy, and F1 score.
Results
A cohort of 1,203 patients was analyzed using seventeen variables to develop the predictive model. XGBoost was the best-performing model as judged by the AUC. The XGBoost model was interpreted using variable importance. The four most important variables were blood urea nitrogen, uric acid, age, and urinary albumin.
Conclusions
This research developed machine learning (ML) models to predict renal impairment in gout patients, with the XGBoost model demonstrating superior performance among the three models tested. We also developed a web-based tool that uses the XGBoost model to estimate the probability of renal injury in patients with gout.
Keywords: Gout, renal injury, machine learning, predictive modeling
1. Introduction
Gout is a widespread and treatable condition caused by elevated serum urate levels, which result in the deposition of sodium urate crystals in the joints, typically accompanied by intense pain. The global prevalence of gout is on the rise [1], and its incidence has doubled during the past three decades [2]. The 2017 Global Burden of Disease study estimated that 41.2 million adults worldwide have gout, double the number with rheumatoid arthritis [3]. Excessive serum urate in gout patients leads to ongoing kidney damage, and studies have shown that elevated uric acid levels exacerbate kidney injury through tubulointerstitial fibrosis [4]. Studies in the US and UK have demonstrated that gout patients have an increased risk of chronic kidney disease (CKD) [5]. Approximately 3.9% of Americans have gout, and one-fifth of these patients are diagnosed with stage 4 or higher CKD, according to the National Health and Nutrition Examination Survey (NHANES) [6]. Studies in the United Kingdom have shown that within three years of being diagnosed with gout, the risk of developing CKD stage 3 is increased by 78% compared to individuals without gout [7]. A meta-analysis has shown that the overall prevalence of CKD in gout patients is 24% [8]. Gout patients with concurrent kidney injury often face severe consequences, including high medical costs and mortality rates [9]. The risk of cardiovascular disease is significantly increased when gout is combined with kidney impairment, because kidney injury and hyperuricemia work together to exacerbate cardiovascular problems such as hypertension and atherosclerosis [10]. In addition, kidney injury complicates medication management, as it limits the use of certain gout medications, such as nonsteroidal anti-inflammatory drugs (NSAIDs) and colchicine, which may exacerbate renal dysfunction [11]. 
Therefore, treatment options are limited, increasing the complexity of management. Moreover, in the early stages of kidney injury, gout patients often exhibit minimal symptoms. Once kidney damage progresses, treatment is typically less effective, and the condition may become irreversible. This highlights the critical importance of early identification of gout patients at high risk for kidney injury. Recent research has also emphasized the vital role of predictive models in the early detection of kidney damage [12], which offers a way to identify gout patients with concurrent kidney injury.
Recent advances in renal injury research have enhanced both diagnostic accuracy and prognostic capabilities. Physicians have developed numerous clinical prediction models based on linear models, such as logistic regression, to diagnose kidney injury. Additionally, the development of machine learning models has provided a new avenue for predicting kidney injury. Machine learning, a subset of artificial intelligence, is defined as the capability of machines to learn from a set of training data and make predictions on data beyond the initial training dataset [13]. The widespread digitization of medical data and the transformation of electronic health information systems have led to the rapid adoption of machine learning in medicine [14], particularly in disease diagnosis and prognosis [15–17]. Owing to its strong performance in handling large, high-dimensional datasets, machine learning outperforms traditional statistical methods to some extent in the construction of predictive models [18]. Machine learning is now gradually being applied to kidney injury research. Many machine learning risk prediction models have been developed to predict kidney injury, though most focus on patients in intensive care units or those who have undergone cardiac surgery [19–21]. Zheng et al. developed a risk prediction model for kidney injury using clinical data from 1,149 intensive care unit patients. The model optimized the prediction of acute kidney injury by incorporating variables such as serum creatinine, total bilirubin, magnesium, shock index, and lymphocyte count [22]. Ryan et al. [20] used electronic health record data from the MIMIC-IV database and employed machine learning techniques to develop a risk prediction model for kidney injury in post-cardiac surgery patients. The model incorporated hemodynamic data, medications, fluid intake/output, and other indicators to predict kidney injury in these patients. 
The model can predict future kidney injury on an hourly basis within a 48 h window with high accuracy. In the field of gout, however, there is currently a gap: no dedicated prediction tool is available for assessing kidney injury in gout patients.
Renal injury in gout patients is a multifactorial process involving multiple mechanisms, such as abnormal uric acid metabolism and oxidative stress. Traditional statistical methods may have limitations in dealing with this complexity, whereas machine learning models can better capture such complex nonlinear relationships. Therefore, this study used three machine learning methods to construct risk prediction models for renal injury in gout patients.
2. Materials and methods
2.1. Data source
The NHANES, conducted every two years by the National Center for Health Statistics (NCHS), provides a comprehensive evaluation of the health and nutritional status of the civilian population in the United States. This systematic surveillance program aims to provide detailed epidemiological data on contemporary disease patterns and subsequently inform evidence-based public health policy development and implementation [23]. It collects a variety of information including patient demographics, dietary status, physical measurements, and laboratory test results. Since NHANES data has received prior approval from the NCHS’ Institutional Review Board and contains only anonymized information, no additional ethical clearance is needed to use this public dataset. This study strictly adheres to the NHANES data use guidelines and conducts secondary analysis on the relevant variables.
2.2. Data collection and preprocessing
We downloaded data from the NHANES website and screened variables based on available parameters, current clinical guidelines in the disease area, authoritative literature, and expert consensus to ensure clear clinical relevance and solid theoretical justification, which resulted in 18 candidate variables. These 18 variables underwent univariate logistic regression analyses, and those demonstrating statistical significance (p < 0.05) were included in final model development, ultimately yielding 13 variables for analysis. These variables include demographic factors such as age and body mass index. Additionally, we collected data on serum phosphate, serum potassium, serum sodium, serum uric acid (UA), hypertension, diabetes, hemoglobin, blood urea nitrogen (BUN), kidney stones, urine protein, bicarbonate, serum calcium, triglycerides, and cholesterol. Our outcome variable was whether the patient had kidney injury, with the estimated glomerular filtration rate (eGFR) calculated using the 2021 CKD Epidemiology Collaboration (CKD-EPI) formula. Kidney injury was defined as eGFR < 60 mL/min/1.73 m². This study analyzed data from six consecutive survey cycles (2007–2018) of the NHANES database, comprising 59,842 participants. Following rigorous screening based on the exclusion criteria, 58,639 participants were excluded from the study. This group comprised 27,454 individuals outside the designated age range (<18 or >80 years), 30,994 participants without gouty arthritis, 132 individuals with missing serum creatinine data, and 59 with incomplete demographic information. Missing data in eligible samples were imputed using the random forest method. To assess model robustness and determine the impact of the imputation technique on the machine learning model, we also imputed the data using the k-nearest neighbor method and constructed risk prediction models from both imputed datasets. 
The results demonstrated that the choice of imputation method did not significantly influence the model, as evidenced by consistent evaluation metrics and consistent top three features ranked by SHAP values. For further details, please refer to Table S3 in the supplementary materials. Ultimately, 1,203 eligible participants were included in the final analytical cohort.
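The outcome definition above can be illustrated with a short Python sketch of the 2021 CKD-EPI creatinine equation. This is an illustration only, not the authors' code (their analyses were run in R): the function names are hypothetical, and serum creatinine is assumed to be expressed in mg/dL.

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """2021 CKD-EPI creatinine equation (race-free), in mL/min/1.73 m^2.

    Assumes serum creatinine is given in mg/dL.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine scaling
    alpha = -0.241 if female else -0.302    # sex-specific exponent below kappa
    ratio = scr_mg_dl / kappa
    egfr = (142.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.200
            * 0.9938 ** age_years)
    if female:
        egfr *= 1.012
    return egfr


def has_kidney_injury(egfr: float, threshold: float = 60.0) -> bool:
    """Outcome label as defined in the text: eGFR below 60 mL/min/1.73 m^2."""
    return egfr < threshold


# Hypothetical participants (values invented for illustration).
egfr_normal = egfr_ckd_epi_2021(scr_mg_dl=0.9, age_years=60, female=False)
egfr_reduced = egfr_ckd_epi_2021(scr_mg_dl=2.0, age_years=60, female=False)
print(round(egfr_normal, 1), has_kidney_injury(egfr_normal))
print(round(egfr_reduced, 1), has_kidney_injury(egfr_reduced))
```

For a male participant whose creatinine equals the scaling constant (0.9 mg/dL), both creatinine terms equal 1 and the estimate depends on age alone.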
2.3. Machine learning
In this study, all gout patients were randomly assigned to training and validation groups, with 70% of the total participants in the training group and 30% in the validation group. Three machine learning techniques were employed to construct risk prediction models: XGBoost, KNN, and SVM. A ten-fold cross validation method was implemented, using AUC from the training dataset to evaluate performance. Grid search was applied to select the best hyperparameters, as detailed in the Supplemental material. We assessed the predictive performance of various machine learning models by calculating accuracy, sensitivity, specificity, AUC, and F1 score. To validate the model, calibration curves were created to evaluate the agreement between the predicted and actual probabilities. Decision curves were also plotted to evaluate the clinical utility of the model. We calculated the Youden index for the model with the strongest predictive performance, determined the optimal cutoff value, and stratified patients into high-risk and low-risk groups accordingly.
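The cutoff selection described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the labels and predicted probabilities are invented, the function names are hypothetical, and ties in the Youden index are resolved in favor of the lowest threshold.

```python
def youden_cutoff(labels, probs):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its probability is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(probs)):            # every observed probability is a candidate
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:                      # strict '>' keeps the lowest tied threshold
            best_t, best_j = t, j
    return best_t, best_j


def risk_group(prob, cutoff):
    """Stratify a patient by comparing the predicted probability with the cutoff."""
    return "high-risk" if prob >= cutoff else "low-risk"


# Invented example: 1 = renal injury, 0 = no renal injury.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
toy_probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

cutoff, j_stat = youden_cutoff(toy_labels, toy_probs)
print(cutoff, round(j_stat, 3), risk_group(0.50, cutoff), risk_group(0.10, cutoff))
```

Because the Youden index weights sensitivity and specificity equally, a different cutoff may be preferable when missed cases and false alarms carry unequal clinical costs.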
2.4. Statistical analysis
Firstly, univariate analysis was conducted for each candidate variable. For continuous variables (e.g. age, BMI, blood calcium, urinary albumin), the Kolmogorov-Smirnov test was used to assess the normality of distributions. It was found that all continuous variables were skewed, and these skewed variables were described using the median (interquartile range), with group comparisons performed using the Mann-Whitney U test. For categorical variables (e.g. gender, diabetes, hypertension), frequencies and percentages were used for representation, and the chi-square test was employed to evaluate the significance of differences between groups. All statistical analyses were performed in RStudio using R version 4.4.1.
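As a worked example of the chi-square test for categorical variables, the sketch below applies Pearson's test to the hypertension counts reported in Table 2 (non-renal injury: 608 with and 308 without hypertension; renal injury: 255 with and 32 without). It is illustrative Python rather than the authors' R code, the function name is hypothetical, and no continuity correction is applied because the text does not state whether one was used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: non-renal injury, renal injury; columns: hypertension yes, no (Table 2).
chi2_htn, p_htn = chi_square_2x2(608, 308, 255, 32)
print(round(chi2_htn, 2), p_htn < 0.01)
```

The resulting p-value falls below 0.01, in line with the significance level reported for hypertension in Table 2.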
3. Results
3.1. Baseline characteristics
A total of 1,203 eligible participants were enrolled in this study, with a median age of 63 [54, 71] years and a median BMI of 31.10 [27.20, 36.00] kg/m²; 70.41% were male. The prevalence of the common comorbidities hypertension, diabetes, and kidney stones was 71.74%, 33%, and 16.46%, respectively. The blood biochemical parameters comprised uric acid and calcium levels, among other clinically relevant measures.
The study cohort was divided into a training dataset (n = 843) and a validation dataset (n = 360). Statistical analyses indicated no significant differences in most clinical parameters between the two datasets (p > 0.05). The baseline characteristics of both the training and validation cohorts are presented in Table 1. Patients were stratified into renal injury and non-renal injury groups based on the eGFR threshold (<60 vs ≥60 mL/min/1.73 m²). As shown in Table 2, comparative analysis of baseline characteristics revealed significant demographic and clinical differences between the groups. Compared to the non-renal injury group, the renal injury group had a significantly higher median age (69 vs. 61 years, p < 0.01). Furthermore, several clinical parameters, including blood urea nitrogen and BMI, were significantly elevated in the renal injury group (p < 0.01). These marked differences in demographic and clinical indicators suggest their potential role as contributing factors in the development of renal injury.
Table 1.
The clinicopathologic characteristics of patients in the training and validation datasets.
| Characteristics | All patients (n = 1203) | Training dataset (n = 843) | Validation dataset (n = 360) | p |
|---|---|---|---|---|
| Age (years), median (IQR) | 63 [54, 71] | 62.00 [53.50, 71.00] | 63.00 [54.00, 70.25] | 0.89 |
| Sex, n (%) | | | | 0.58 |
| Male | 847 (70.41%) | 598 (70.9%) | 249 (69.2%) | |
| Female | 356 (29.59%) | 245 (29.1%) | 111 (30.8%) | |
| Cancer comorbidity, n (%) | | | | 0.60 |
| Yes | 209 (17.37%) | 151 (17.9%) | 58 (16.1%) | |
| No | 994 (82.63%) | 692 (82.1%) | 302 (83.9%) | |
| BMI (kg/m²), median (IQR) | 31.10 [27.20, 36.00] | 30.90 [27.11, 35.59] | 31.65 [27.90, 36.77] | 0.06 |
| Kidney stone, n (%) | | | | 0.83 |
| Yes | 198 (16.46%) | 137 (16.2%) | 61 (16.9%) | |
| No | 1005 (83.54%) | 706 (83.8%) | 299 (83.1%) | |
| Urinary albumin (μg/mL), median (IQR) | 13.80 [6.10, 42.05] | 14.10 [6.25, 43.90] | 12.45 [5.88, 36.73] | 0.57 |
| Hemoglobin (g/dL), median (IQR) | 14.20 [13.10, 15.20] | 14.20 [13.10, 15.30] | 14.20 [12.90, 15.10] | 0.22 |
| Diabetes, n (%) | | | | 0.53 |
| Yes | 397 (33%) | 273 (32.4%) | 124 (34.4%) | |
| No | 806 (67%) | 570 (67.6%) | 236 (65.6%) | |
| Hypertension, n (%) | | | | 0.65 |
| Yes | 863 (71.74%) | 601 (71.3%) | 262 (72.8%) | |
| No | 340 (28.26%) | 242 (28.7%) | 98 (27.2%) | |
| Blood urea nitrogen (mmol/L), median (IQR) | 5.71 [4.28, 7.14] | 5.36 [4.28, 7.14] | 5.71 [4.28, 7.14] | 0.55 |
| Calcium (mmol/L), median (IQR) | 2.35 [2.27, 2.40] | 2.35 [2.27, 2.40] | 2.35 [2.27, 2.40] | 0.41 |
| Phosphorus (mmol/L), median (IQR) | 1.16 [1.07, 1.29] | 1.16 [1.07, 1.29] | 1.16 [1.07, 1.29] | 0.64 |
| Bicarbonate (mmol/L), median (IQR) | 25.00 [23.00, 27.00] | 25.00 [23.00, 27.00] | 25.00 [23.75, 27.00] | 0.80 |
| Sodium (mmol/L), median (IQR) | 139.50 [138.00, 141.00] | 139.50 [138.00, 141.00] | 139.50 [138.00, 141.00] | 0.62 |
| Cholesterol (mmol/L), mean ± SD | 4.82 ± 1.16 | 4.82 ± 1.18 | 4.83 ± 1.12 | 0.96 |
| Triglycerides (mmol/L), median (IQR) | 1.71 [1.17, 2.61] | 1.68 [1.16, 2.60] | 1.81 [1.22, 2.64] | 0.16 |
| Potassium (mmol/L), median (IQR) | 4.06 [3.80, 4.30] | 4.10 [3.80, 4.30] | 4.00 [3.72, 4.30] | 0.01 |
| Uric acid (μmol/L), mean ± SD | 394.81 ± 110.02 | 395.20 ± 107.70 | 393.71 ± 115.40 | 0.82 |
Table 2.
The clinicopathologic characteristics of patients in the renal injury and non-renal injury groups.
| Characteristics | All patients (n = 1203) | Non-renal injury (n = 916) | Renal injury (n = 287) | p |
|---|---|---|---|---|
| Age (years), median (IQR) | 63 [54, 71] | 61.00 [51.00, 69.00] | 69.00 [63.00, 75.00] | <0.01 |
| Sex, n (%) | | | | 0.08 |
| Male | 847 (70.41%) | 657 (71.7%) | 190 (66.2%) | |
| Female | 356 (29.59%) | 259 (28.3%) | 97 (33.8%) | |
| Cancer comorbidity, n (%) | | | | <0.01 |
| Yes | 209 (17.37%) | 135 (14.7%) | 74 (25.8%) | |
| No | 994 (82.63%) | 781 (85.3%) | 213 (74.2%) | |
| BMI (kg/m²), median (IQR) | 31.10 [27.20, 36.00] | 30.80 [26.84, 35.30] | 32.08 [27.95, 38.60] | <0.01 |
| Kidney stone, n (%) | | | | 0.55 |
| Yes | 198 (16.46%) | 147 (16%) | 51 (17.8%) | |
| No | 1005 (83.54%) | 769 (84%) | 236 (82.2%) | |
| Urinary albumin (μg/mL), median (IQR) | 13.80 [6.10, 42.05] | 11.10 [5.50, 27.52] | 40.60 [10.95, 169.00] | <0.01 |
| Hemoglobin (g/dL), median (IQR) | 14.20 [13.10, 15.20] | 14.60 [13.40, 15.40] | 13.20 [11.90, 14.30] | <0.01 |
| Diabetes, n (%) | | | | <0.01 |
| Yes | 397 (33%) | 251 (27.4%) | 146 (50.9%) | |
| No | 806 (67%) | 665 (72.6%) | 141 (49.1%) | |
| Hypertension, n (%) | | | | <0.01 |
| Yes | 863 (71.74%) | 608 (66.4%) | 255 (88.8%) | |
| No | 340 (28.26%) | 308 (33.6%) | 32 (11.2%) | |
| Blood urea nitrogen (mmol/L), median (IQR) | 5.71 [4.28, 7.14] | 5.00 [3.93, 6.07] | 8.57 [6.43, 11.42] | <0.01 |
| Calcium (mmol/L), median (IQR) | 2.35 [2.27, 2.40] | 2.35 [2.30, 2.40] | 2.35 [2.27, 2.42] | <0.01 |
| Phosphorus (mmol/L), median (IQR) | 1.16 [1.07, 1.29] | 1.16 [1.07, 1.29] | 1.23 [1.07, 1.32] | <0.01 |
| Bicarbonate (mmol/L), median (IQR) | 25.00 [23.00, 27.00] | 25.00 [24.00, 27.00] | 25.00 [23.00, 27.00] | <0.01 |
| Sodium (mmol/L), median (IQR) | 139.50 [138.00, 141.00] | 139.00 [138.00, 141.00] | 140.00 [138.00, 142.00] | <0.01 |
| Cholesterol (mmol/L), mean ± SD | 4.82 ± 1.16 | 4.94 ± 1.17 | 4.45 ± 1.05 | <0.01 |
| Triglycerides (mmol/L), median (IQR) | 1.71 [1.17, 2.61] | 1.72 [1.16, 2.62] | 1.73 [1.24, 2.51] | <0.01 |
| Potassium (mmol/L), median (IQR) | 4.06 [3.80, 4.30] | 4.00 [3.80, 4.23] | 4.20 [3.90, 4.50] | <0.01 |
| Uric acid (μmol/L), mean ± SD | 394.81 ± 110.02 | 386.65 ± 104.36 | 420.88 ± 123.01 | <0.01 |
3.2. Performance comparison of machine learning algorithm models
In this study, we incorporated 13 variables into model development. For each algorithm, we performed a grid search with ten-fold cross-validation on the training dataset, using AUC as the evaluation criterion to determine the best hyperparameters (see supplemental material). The model with the highest predictive accuracy was then evaluated in the validation cohort. The supplemental material presents a comparative analysis of the three machine learning models’ performance.
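The tuning procedure can be sketched in a few lines of standard-library Python. The example below is a simplified illustration, not the authors' pipeline (which was run in R with the grids listed in the supplemental material): it tunes only the number of neighbors of a one-feature KNN classifier on synthetic data, and all names, the candidate grid, and the data-generating values are assumptions made for illustration.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_x, train_y, x, k):
    """Predicted probability: share of positives among the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def cv_auc(xs, ys, k, n_folds=10, seed=0):
    """Mean AUC of KNN with k neighbors over n_folds cross-validation folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    fold_aucs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        labels = [ys[i] for i in held_out]
        if len(set(labels)) < 2:            # AUC is undefined with a single class
            continue
        scores = [knn_score(train_x, train_y, xs[i], k) for i in held_out]
        fold_aucs.append(auc(labels, scores))
    return sum(fold_aucs) / len(fold_aucs)


def make_toy_data(n=200, seed=42):
    """Synthetic one-feature data; roughly 30% positives with higher feature values."""
    rng = random.Random(seed)
    ys = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    xs = [rng.gauss(8.0, 2.0) if y else rng.gauss(5.0, 1.5) for y in ys]
    return xs, ys


xs, ys = make_toy_data()
grid = [1, 5, 15, 31]                       # hypothetical candidate values for k
results = {k: cv_auc(xs, ys, k) for k in grid}
best_k = max(results, key=results.get)      # hyperparameter with the highest mean AUC
print(best_k, {k: round(v, 3) for k, v in results.items()})
```

Selecting hyperparameters on cross-validated AUC within the training data keeps the validation dataset untouched until the final evaluation.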
The XGBoost algorithm achieved the best predictive performance in the validation cohort, with an accuracy of 0.833 (95% CI: 0.791–0.866), sensitivity of 0.639 (95% CI: 0.527–0.735), specificity of 0.894 (95% CI: 0.852–0.925), AUC of 0.866 (95% CI: 0.817–0.908), and an F1 score of 0.647 (95% CI: 0.550–0.721). The performance of the other models is detailed in the supplementary document. The ROC curves for the three machine learning algorithms in the training dataset are shown in Figure 1A, and the ROC curves in the validation dataset are shown in Figure 1B. To demonstrate the predictive models’ comparative accuracy and clinical value, we generated calibration and clinical decision curves for all three models, with detailed information provided in the Supplemental material.
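The relationship between these metrics can be made concrete with a confusion matrix. The counts in the sketch below are hypothetical: they are not reported in the study and were chosen only because, for a validation set of 360 patients, they approximately reproduce the reported point estimates (each to within 0.001). The function name is likewise illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and F1 score from confusion-matrix counts."""
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)            # recall among true renal injury cases
    specificity = tn / (tn + fp)            # correct rejections among non-cases
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts (assumption, not reported by the authors); they sum to 360.
metrics = classification_metrics(tp=55, fn=31, tn=245, fp=29)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With a specificity near 0.89 and a sensitivity near 0.64, such a classifier misses roughly one in three true cases at the chosen cutoff, which is worth keeping in mind when the tool is used for screening.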
Figure 1.
(A) ROC curves of three ML models in the training cohort; (B) ROC curves of three ML models in the validation cohort.
3.3. Interpretation and visualization of XGBoost predictions
To elucidate the impact of each predictor on the XGBoost model output, SHAP (SHapley Additive exPlanations) values were computed in this study. As an additive feature attribution method, SHAP represents the model’s prediction as a linear function of binary variables that indicate whether each feature is present. This approach quantifies each feature’s contribution toward the final prediction for each observation, thus measuring the relative importance of each variable for an individual outcome. It achieves this by comparing the model’s output across all possible feature subsets with and without the feature in question [24].
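The subset-enumeration idea behind SHAP can be illustrated with an exact Shapley computation on a toy model. This is a sketch under simplifying assumptions: the risk function, its coefficients, and the patient and baseline values are invented; absent features are replaced by a single baseline value; and practical SHAP implementations for tree models use faster, specialized algorithms instead of brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a subset are set to their baseline value.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to subset s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score over (BUN, age, uric acid) with one interaction term."""
    bun, age, uric_acid = z
    return 0.30 * bun + 0.01 * age + 0.002 * bun * uric_acid


patient = [9.0, 70.0, 450.0]      # invented feature values
reference = [5.0, 60.0, 390.0]    # invented baseline values
phi = shapley_values(toy_risk, patient, reference)
gap = toy_risk(patient) - toy_risk(reference)
print([round(p, 3) for p in phi], round(sum(phi), 3), round(gap, 3))
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets per-patient SHAP values be read as contributions to an individual prediction.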
Through SHAP value analysis, we examined how the XGBoost model predicts kidney injury. The SHAP plots illustrate the importance ranking of the features in the XGBoost model, with blood urea nitrogen, uric acid, age, and urinary albumin being the main contributors.
Figure 2 demonstrates that blood urea nitrogen (mean SHAP = 1.231) is the most influential driver of the model’s predictions, surpassing the other variables by a wide margin. Next in importance are uric acid (0.681), age (0.628), and urinary albumin (0.492).
Figure 2.
The important features derived from the XGBoost model.
Figure 3 illustrates that blood urea nitrogen exhibits a robust, bidirectional effect: higher levels strongly elevate predicted values (concentrated positive SHAP values), whereas lower levels significantly reduce predictions (concentrated negative SHAP values).
Figure 3.
SHAP summary plot of the features of the XGBoost model.
Increasing age consistently acts as a stable risk factor for elevated predicted values. Advanced age (represented by red points in the figure) correlates with higher SHAP values, indicating that the risk of kidney injury rises progressively with age.
Uric acid shows a moderate, bidirectional influence: elevated uric acid levels tend to increase predicted values (positive SHAP values), while lower levels generally decrease predicted values (negative SHAP values).
Urinary albumin ranks as the fourth most critical predictor within the model, demonstrating a predominantly positive association with outcomes. High urinary albumin levels are primarily linked to positive SHAP values, signifying an increased risk of kidney injury, whereas low levels correlate mainly with negative SHAP values, signifying a reduced risk.
As shown in Figure 3, a higher SHAP value for a feature corresponds to a higher predicted probability of kidney injury. Each horizontal row displays a distinct feature, with each point’s position along the x-axis indicating the corresponding SHAP value. Purple dots indicate higher feature values and yellow dots indicate lower feature values. In addition, we plotted the overall importance of the features, as detailed in Figure 2.
3.4. A web-based tool for predicting kidney injury in gout patients
We ultimately chose the best-performing XGBoost model for deployment and developed an easy-to-access online tool (https://ricardo-shiny-account.shinyapps.io/shinyapp/) for clinicians to predict renal injury in gout patients. In a clinical practice setting, the clinician enters the patient’s values into the web application and clicks the prediction button to obtain the probability of renal injury; the patient is then automatically classified into the high-risk or low-risk group based on the optimal threshold. The tool may therefore support the early identification of gout patients at high risk of renal injury.
4. Discussion
In this retrospective study using six consecutive survey cycles from the NHANES database, a sample of 1,203 participants met the eligibility criteria. Among them, 287 individuals had kidney injury, corresponding to a prevalence of 23.86%, which is consistent with previous studies. We constructed predictive models employing three machine learning techniques to predict kidney injury in gout patients. XGBoost performed best among the three models, achieving an AUC of 0.866 in predicting whether gout patients had kidney injury. Detailed performance metrics for the three models are provided in the Supplemental material. To examine how different variables affect the XGBoost model’s predictions, we interpreted the model using SHAP value plots. According to these plots, blood urea nitrogen, uric acid, age, and urinary albumin were the four most important features contributing to the XGBoost model’s prediction of kidney injury in individuals with gout. This contributes to a more comprehensive understanding of machine learning models used to predict renal injury in gout patients.
Currently, most renal injury risk assessment tools are designed for populations with diabetes and cardiovascular diseases. Dunkler et al. developed a prediction model using relevant variables such as biomarkers, age, and sex to predict kidney injury in patients with type 2 diabetes [25]. Wang et al. developed a model to predict acute kidney injury using urinary biomarkers from 149 patients who had undergone cardiovascular surgery, employing a logistic regression model [26]. Compared to diabetes and cardiovascular disease, there is a lack of tools to assess kidney injury in gout patients, and this study fills a research gap by focusing on a previously overlooked group of gout patients.
In this study, the XGBoost model identified kidney injury in gout patients more accurately than the other models. This is consistent with findings from other studies in the field. Fan et al. demonstrated that the XGBoost algorithm outperforms both the random forest (RF) method and logistic regression in predicting the occurrence of kidney injury in sepsis [27]. Tseng et al. found that XGBoost was superior to SVM and RF in predicting acute kidney injury after cardiac surgery [28]. There are multiple reasons for the high predictive power of the XGBoost model. XGBoost, derived from gradient tree boosting, effectively handles complex interactions, discontinuities, and nonlinear relationships while remaining robust against both predictor variable outliers and multicollinearity [29].
In this research, we used three machine learning models to predict kidney injury in gout patients, with XGBoost achieving the highest performance. The four most predictive variables in this model were blood urea nitrogen, uric acid, age, and urinary albumin, suggesting that these factors are closely related to kidney injury in gout patients. Blood urea nitrogen is a vital indicator of kidney function, and numerous studies have shown that blood urea nitrogen levels are negatively correlated with kidney function [30,31]. Gout patients often consume a high-protein diet, which can increase the glomerular filtration load and trigger hyperfiltration and elevated intraglomerular pressure; over the long term, this accelerates glomerulosclerosis and leads to kidney impairment [32]. An abnormal increase in BUN level is therefore a key signal of ongoing kidney injury. Similarly, in this research, blood urea nitrogen had the greatest weight in the XGBoost model and was a significant predictor of whether kidney injury occurred in gout patients. In addition to assessing the contribution of each variable to the model using SHAP values, we also statistically examined the differences in the key variables between groups using non-parametric tests to explore their distributional characteristics at the group level. The non-parametric test revealed a statistically significant difference in median blood urea nitrogen levels between the kidney injury and non-kidney injury groups (p < 0.01). Notably, patients in the kidney injury group exhibited markedly elevated blood urea nitrogen concentrations compared to those without kidney injury, lending further support to the established association between heightened blood urea nitrogen levels and increased risk of kidney injury. As age increases, kidney function progressively declines [33]. In this research, age emerged as a significant predictor of kidney injury in gout patients. 
The older the patient, the higher the risk of kidney injury. Additionally, the median age of patients with kidney injury in our study was significantly greater than that of gout patients without kidney injury. This further supports the idea that aging is a key factor in the onset of kidney injury among gout patients and is consistent with the findings of Damman et al. [34]. Hemoglobin is a known predictor of kidney injury [35]. Our study found that lower hemoglobin levels were associated with a greater chance of kidney damage in gout patients: the kidney injury group showed significantly lower hemoglobin levels than the group without kidney injury, and hemoglobin contributed significantly to the model. High uric acid concentrations significantly increase the risk of developing gout, and uric acid is excreted primarily through the kidneys [36,37]. Hyperuricemia has been recognized as an independent risk factor for renal impairment [38]. Our research indicates that elevated uric acid levels are associated with an increased risk of kidney damage in gout patients. Blood uric acid levels were significantly higher in the kidney injury group than in the non-kidney injury group (p < 0.01). Urinary albumin, a marker of glomerular injury and a predictor of renal damage [39], has been shown in numerous studies to predict kidney injury in patients undergoing cardiac surgery and those with sepsis [40,41]. In our study, urinary albumin levels had a high impact on the model and were significantly elevated in patients with kidney injury compared to those without. Obesity is one of the risk factors for kidney damage. A meta-analysis showed that there is a link between obesity or high BMI and kidney damage: compared with normal weight, being overweight and/or obese is associated with an increased risk of kidney damage [42]. 
In this study, the BMI of patients in the renal injury group was significantly higher than that in the non-renal injury group, which is consistent with previous studies. Blood biochemical indices, including cholesterol, blood potassium, blood phosphorus, and bicarbonate, all showed statistically significant differences between the two groups (p < 0.01). This suggests that these indicators also contribute to predicting kidney injury in gout patients.
Studies have shown that lipid metabolism disorders are associated with the development of kidney injury in patients with gout [43], and cholesterol plays an important role in lipid metabolism. Dang et al. [44] demonstrated that monitoring cholesterol levels facilitates early detection of renal dysfunction and predicts the risk of kidney injury in patients with gout, which aligns with our findings. The kidneys play a key role in maintaining fluid and electrolyte balance, and changes in serum potassium, phosphorus, and bicarbonate have been associated with kidney injury. A retrospective cohort study showed that serum potassium was a predictor of renal injury, and that even fluctuations in serum potassium within the normal range were associated with the development of renal injury [45]. Serum phosphorus was shown to be a risk factor for renal injury in the study by Burra et al., where it allowed early prediction of acute kidney injury in pediatric cardiac surgery [46]. Bicarbonate is an important substance in the human body for maintaining acid-base balance and physiological functions. An observational study has shown that bicarbonate levels upon admission can independently predict kidney injury in critically ill patients [47].
In addition to biochemical markers, clinical comorbidities such as hypertension are also predictors of kidney injury. Hypertension is a major cause of renal injury and can lead to diseases such as nephrosclerosis and hypertensive nephropathy [48]. In our study, the prevalence of hypertension differed significantly between the kidney injury and non-kidney injury groups, and hypertension contributed to the model, which is consistent with previous studies. Furthermore, diabetes showed statistically significant differences between the kidney injury and non-kidney injury groups and had a notable influence within the predictive model, consistent with previous studies. Cancer is currently recognized as a contributor to kidney injury, primarily due to tumor-related compression and nephrotoxic treatments. In this study, cancer was also identified as a predictive factor for kidney injury in gout patients, with significant differences in cancer comorbidity between the kidney injury and non-kidney injury groups. Similar to diabetes, cancer comorbidity contributed meaningfully to the predictive model.
Machine learning-based risk prediction models demonstrate robust performance; however, their clinical adoption is constrained by the additional technical development and costs required to integrate them into existing electronic health record systems and clinical workflows. This study developed an online calculator based on the best-performing XGBoost model, enabling clinicians to easily access the tool via computers and smartphones and thereby support the effective management of renal injury in gout patients.
However, this study has several limitations. First, several factors that influence kidney injury in gout patients, such as medication use, were not included. Urate-lowering medications affect renal function to some extent in gout patients, and renal impairment is less common in patients on stable long-term urate-lowering treatment than in those who are untreated. We considered this factor but could not include it in the model because the dataset lacks data on drug type and dosage, which may have affected the accuracy of the model to some extent. Second, as a cross-sectional study, its predictive power is inherently limited compared with prospective investigations. Third, our predictive model was developed from NHANES data, which primarily reflect the characteristics of the general population in the United States. It is therefore uncertain whether the model is equally applicable across different countries, ethnicities, genetic backgrounds, comorbidities, or healthcare settings, and its performance in other populations may need to be reassessed and adjusted to account for differences in genetics, lifestyle, and healthcare resources. Fourth, because data from other institutions or external sources were not available in the short term, the model was validated only internally using a split dataset and was not externally validated. This limits the generalizability of the model and risks overestimating its predictive efficacy, and its performance in other healthcare settings or patient populations remains unclear. Prospective studies are needed to further confirm the model's robustness and actual clinical value, and future work should consider comparisons with traditional clinical risk scores (e.g., the Chronic Renal Failure Risk Calculator).
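To make the internal-validation limitation concrete, the following is a minimal sketch of a hold-out evaluation: records are split once, and the AUC is computed on the held-out part only. It is not the pipeline used in this study. The function names `holdout_split` and `roc_auc`, the 30% test fraction, and the synthetic scores are all assumptions introduced for illustration, and no NHANES data are involved.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle records reproducibly and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case and one non-case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic (label, score) pairs, NOT study data: cases tend to score higher.
    records = []
    for _ in range(200):
        label = rng.randint(0, 1)
        score = rng.gauss(1.0 if label == 1 else 0.0, 1.0)
        records.append((label, score))

    # Assumed 70/30 split; only the held-out 30% is used for evaluation.
    train, test = holdout_split(records, test_fraction=0.3, seed=1)
    test_labels = [y for y, _ in test]
    test_scores = [s for _, s in test]
    print(f"hold-out n={len(test)}, AUC={roc_auc(test_labels, test_scores):.3f}")
```

Because both parts of the split come from the same source, an AUC obtained this way says little about performance in a different population, which is why external validation is still needed.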
5. Conclusion
In conclusion, we constructed three models to predict the occurrence of renal injury in gout patients, using predictors such as sex, age, diabetes, kidney stones, and blood biochemical indicators. Among these models, XGBoost demonstrated the highest effectiveness. This study preliminarily demonstrates that a risk prediction model integrating multiple factors has high accuracy in the early screening of kidney injury in gout patients, highlighting its potential clinical application value and academic significance.
Supplementary Material
Acknowledgments
Study design: Suya Sun and Yuankai Li; Data extraction: Lihua Mao and Yuankai Li; Data analysis: Suya Sun and Yuankai Li; Academic and clinical supervision: Donghui Shi and Xiaoli Yang; Manuscript preparation: Yuankai Li; The final version of the manuscript was critically reviewed and formally approved by all contributing authors.
Funding Statement
The authors report that this study was not supported by any funding.
Disclosure statement
No potential conflict of interest was reported by the authors.
Data availability statement
The data used in this study are publicly available from NHANES at https://www.cdc.gov/nchs/nhanes/
References
- 1. Dalbeth N, Gosling AL, Gaffo A, et al. Gout. Lancet. 2021;397(10287):1843–1855. doi: 10.1016/S0140-6736(21)00569-9.
- 2. Danve A, Neogi T. Rising global burden of gout: time to act. Arthritis Rheumatol. 2020;72(11):1786–1788.
- 3. Global, regional, and national disability-adjusted life-years (DALYs) for 359 diseases and injuries and healthy life expectancy (HALE) for 195 countries and territories, 1990-2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1859–1922.
- 4. Stack AG, Johnson ME, Blak B, et al. Gout and the risk of advanced chronic kidney disease in the UK health system: a national cohort study. BMJ Open. 2019;9(8):e031550. doi: 10.1136/bmjopen-2019-031550.
- 5. Singh JA, Gaffo A. Gout epidemiology and comorbidities. Semin Arthritis Rheum. 2020;50(3s):S11–S16. doi: 10.1016/j.semarthrit.2020.04.008.
- 6. Zhu Y, Pandya BJ, Choi HK. Prevalence of gout and hyperuricemia in the US general population: the National Health and Nutrition Examination Survey 2007-2008. Arthritis Rheum. 2011;63(10):3136–3141. doi: 10.1002/art.30520.
- 7. Roughley M, Sultan AA, Clarson L, et al. Risk of chronic kidney disease in patients with gout and the impact of urate lowering therapy: a population-based cohort study. Arthritis Res Ther. 2018;20(1):243. doi: 10.1186/s13075-018-1746-1.
- 8. Curiel RV, Guzman NJ. Challenges associated with the management of gouty arthritis in patients with chronic kidney disease: a systematic review. Semin Arthritis Rheum. 2012;42(2):166–178. doi: 10.1016/j.semarthrit.2012.03.013.
- 9. Jaffe DH, Klein AB, Benis A, et al. Incident gout and chronic kidney disease: healthcare utilization and survival. BMC Rheumatol. 2019;3(1):11. doi: 10.1186/s41927-019-0060-0.
- 10. Feig DI, Kang DH, Johnson RJ. Uric acid and cardiovascular risk. N Engl J Med. 2008;359(17):1811–1821. doi: 10.1056/NEJMra0800885.
- 11. Medani S, Wall C. Colchicine toxicity in renal patients - Are we paying attention? Clin Nephrol. 2016;86(2):100–105. doi: 10.5414/CN108343.
- 12. Tran TT, Yun G, Kim S. Artificial intelligence and predictive models for early detection of acute kidney injury: transforming clinical practice. BMC Nephrol. 2024;25(1):353. doi: 10.1186/s12882-024-03793-7.
- 13. Ting Sim JZ, Fong QW, Huang W, et al. Machine learning in medicine: what clinicians should know. Singapore Med J. 2023;64(2):91–97. doi: 10.11622/smedj.2021054.
- 14. Barrios JP, Tison GH. Advancing cardiovascular medicine with machine learning: progress, potential, and perspective. Cell Rep Med. 2022;3(12):100869. doi: 10.1016/j.xcrm.2022.100869.
- 15. Piccialli F, Calabrò F, Crisci D, et al. Precision medicine and machine learning towards the prediction of the outcome of potential celiac disease. Sci Rep. 2021;11(1):5683. doi: 10.1038/s41598-021-84951-x.
- 16. Marcinkevics R, Reis Wolfertstetter P, Wellmann S, et al. Using machine learning to predict the diagnosis, management and severity of pediatric appendicitis. Front Pediatr. 2021;9:662183. doi: 10.3389/fped.2021.662183.
- 17. Jiang M, Li Y, Jiang C, et al. Machine learning in rheumatic diseases. Clin Rev Allergy Immunol. 2021;60(1):96–110. doi: 10.1007/s12016-020-08805-6.
- 18. Bernard D, Doumard E, Ader I, et al. Explainable machine learning framework to predict personalized physiological aging. Aging Cell. 2023;22(8):e13872. doi: 10.1111/acel.13872.
- 19. Zhang Z, Ho KM, Hong Y. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit Care. 2019;23(1):112. doi: 10.1186/s13054-019-2411-z.
- 20. Ryan CT, Zeng Z, Chatterjee S, et al. Machine learning for dynamic and early prediction of acute kidney injury after cardiac surgery. J Thorac Cardiovasc Surg. 2023;166(6):e551–e564. doi: 10.1016/j.jtcvs.2022.09.045.
- 21. Gottlieb ER, Samuel M, Bonventre JV, et al. Machine learning for acute kidney injury prediction in the intensive care unit. Adv Chronic Kidney Dis. 2022;29(5):431–438. doi: 10.1053/j.ackd.2022.06.005.
- 22. Zheng L, Lin Y, Fang K, et al. Derivation and validation of a risk score to predict acute kidney injury in critically ill cirrhotic patients. Hepatol Res. 2023;53(8):701–712. doi: 10.1111/hepr.13907.
- 23. Mao Y, Weng J, Xie Q, et al. Association between dietary inflammatory index and stroke in the US population: evidence from NHANES 1999–2018. BMC Public Health. 2024;24(1):50. doi: 10.1186/s12889-023-17556-w.
- 24. Lundberg S, Lee S-I. A unified approach to interpreting model predictions. arXiv Preprint. 2017.
- 25. Dunkler D, Gao P, Lee SF, et al. Risk prediction for early CKD in type 2 diabetes. Clin J Am Soc Nephrol. 2015;10(8):1371–1379. doi: 10.2215/CJN.10321014.
- 26. Wang JJ, Chi NH, Huang TM, et al. Urinary biomarkers predict advanced acute kidney injury after cardiovascular surgery. Crit Care. 2018;22(1):108. doi: 10.1186/s13054-018-2035-8.
- 27. Fan Z, Jiang J, Xiao C, et al. Construction and validation of prognostic models in critically ill patients with sepsis-associated acute kidney injury: interpretable machine learning approach. J Transl Med. 2023;21(1):406. doi: 10.1186/s12967-023-04205-4.
- 28. Tseng PY, Chen YT, Wang CH, et al. Prediction of the development of acute kidney injury following cardiac surgery by machine learning. Crit Care. 2020;24(1):478. doi: 10.1186/s13054-020-03179-9.
- 29. Li X, Wu R, Zhao W, et al. Machine learning algorithm to predict mortality in critically ill patients with sepsis-associated acute kidney injury. Sci Rep. 2023;13(1):5223. doi: 10.1038/s41598-023-32160-z.
- 30. Seki M, Nakayama M, Sakoh T, et al. Blood urea nitrogen is independently associated with renal outcomes in Japanese patients with stage 3-5 chronic kidney disease: a prospective observational study. BMC Nephrol. 2019;20(1):115. doi: 10.1186/s12882-019-1306-1.
- 31. Kim HJ, Kim TE, Han M, et al. Effects of blood urea nitrogen independent of the estimated glomerular filtration rate on the development of anemia in non-dialysis chronic kidney disease: the results of the KNOW-CKD study. PLoS One. 2021;16(9):e0257305. doi: 10.1371/journal.pone.0257305.
- 32. Ko G-J, Rhee CM, Kalantar-Zadeh K, et al. The effects of high-protein diets on kidney health and longevity. J Am Soc Nephrol. 2020;31(8):1667–1679. doi: 10.1681/ASN.2020010028.
- 33. Holmes J, Phillips D, Donovan K, et al. Acute kidney injury, age, and socioeconomic deprivation: evaluation of a national data set. Kidney Int Rep. 2019;4(6):824–832. doi: 10.1016/j.ekir.2019.03.009.
- 34. Damman K, Valente MA, Voors AA, et al. Renal impairment, worsening renal function, and outcome in patients with heart failure: an updated meta-analysis. Eur Heart J. 2014;35(7):455–469. doi: 10.1093/eurheartj/eht386.
- 35. Chen YT, Jenq CC, Hsu CK, et al. Acute kidney disease and acute kidney injury biomarkers in coronary care unit patients. BMC Nephrol. 2020;21(1):207. doi: 10.1186/s12882-020-01872-z.
- 36. Dalbeth N, Choi HK, Joosten LAB, et al. Gout. Nat Rev Dis Primers. 2019;5(1):69. doi: 10.1038/s41572-019-0115-y.
- 37. Fathallah-Shaykh SA, Cramer MT. Uric acid and the kidney. Pediatr Nephrol. 2014;29(6):999–1008. doi: 10.1007/s00467-013-2549-x.
- 38. Pascual E, Sivera F, Andrés M. Managing gout in the patient with renal impairment. Drugs Aging. 2018;35(4):263–273. doi: 10.1007/s40266-018-0517-7.
- 39. Duff S, Irwin R, Cote JM, et al. Urinary biomarkers predict progression and adverse outcomes of acute kidney injury in critical illness. Nephrol Dial Transplant. 2022;37(9):1668–1678. doi: 10.1093/ndt/gfab263.
- 40. Sugimoto K, Toda Y, Iwasaki T, et al. Urinary albumin levels predict development of acute kidney injury after pediatric cardiac surgery: a prospective observational study. J Cardiothorac Vasc Anesth. 2016;30(1):64–68. doi: 10.1053/j.jvca.2015.05.194.
- 41. Zhang Z, Lu B, Ni H, et al. Microalbuminuria can predict the development of acute kidney injury in critically ill septic patients. J Nephrol. 2013;26(4):724–730. doi: 10.5301/jn.5000231.
- 42. Lan J, Xu G, Zhu Y, et al. Association of body mass index and acute kidney injury incidence and outcome: a systematic review and meta-analysis. J Ren Nutr. 2023;33(3):397–404. doi: 10.1053/j.jrn.2023.01.005.
- 43. Zhang X, Liu J. Regulating lipid metabolism in gout: a new perspective with therapeutic potential. Int J Gen Med. 2024;17:5203–5217.
- 44. Dang W, Xu X, Luo D, et al. Analysis of risk factors for changes in the renal two-dimensional image in gout patients. Int J Gen Med. 2021;14:6367–6378. doi: 10.2147/IJGM.S336220.
- 45. Lombardi G, Gambaro G, Ferraro PM. Serum potassium disorders predict subsequent kidney injury: a retrospective observational cohort study of hospitalized patients. Kidney Blood Press Res. 2022;47(4):270–276. doi: 10.1159/000521833.
- 46. Burra V, Nagaraja PS, Singh NG, et al. Early prediction of acute kidney injury using serum phosphorus as a biomarker in pediatric cardiac surgical patients. Ann Card Anaesth. 2018;21(4):455–459. doi: 10.4103/aca.ACA_14_18.
- 47. Gujadhur A, Tiruvoipati R, Cole E, et al. Serum bicarbonate may independently predict acute kidney injury in critically ill patients: An observational study. World J Crit Care Med. 2015;4(1):71–76. doi: 10.5492/wjccm.v4.i1.71.
- 48. Arendshorst WJ, Vendrov AE, Kumar N, et al. Oxidative stress in kidney injury and hypertension. Antioxidants. 2024;13(12):1454. doi: 10.3390/antiox13121454.