PLOS One. 2025 Jan 31;20(1):e0317558. doi: 10.1371/journal.pone.0317558

Classification and Regression Trees analysis identifies patients at high risk for kidney function decline following hospitalization

Weihao Wang 1, Wei Zhu 1, Janos Hajagos 2, Laura Fochtmann 2,3, Farrukh M Koraishy 4,*
Editor: Keiko Hosohata 5
PMCID: PMC11785296  PMID: 39888928

Abstract

Estimated glomerular filtration rate (eGFR) decline is associated with negative health outcomes, but the use of decision tree algorithms to predict eGFR decline is underreported. Among patients hospitalized during the first year of the COVID-19 pandemic, it remains unclear which individuals are at the greatest risk of eGFR decline after discharge. We conducted a retrospective cohort study on patients hospitalized at Stony Brook University Hospital in 2020 who were followed for 36 months post discharge. Random Forest (RF) identified the top ten features associated with fast eGFR decline. Logistic regression (LR) and Classification and Regression Trees (CART) were then employed to uncover the relative importance of these top features and identify the highest risk patients. In the cohort of 1,747 hospital survivors, 61.6% experienced fast eGFR decline, which was associated with younger age, higher baseline eGFR, and acute kidney injury (AKI). Multivariate LR analysis showed that older age was associated with lower odds of fast eGFR decline, whereas longer hospitalization and vasopressor use were associated with greater odds. CART analysis identified length of hospitalization as the most important factor and showed that patients with AKI and hospitalization of 27 days or more were at highest risk. After grouping by ICU and COVID-19 status and propensity score matching for demographics, these risk factors of fast eGFR decline remained consistent. CART analysis can help identify patient subgroups with the highest risk of post-discharge eGFR decline. Clinicians should consider the length of hospitalization in post-discharge monitoring of kidney function.

Introduction

Fast decline in kidney function, as measured by estimated glomerular filtration rate (eGFR), is a hallmark of the development and progression of chronic kidney disease (CKD), and is associated with adverse health outcomes [1–4]. Inadequate kidney recovery after acute kidney injury (AKI), a common diagnosis in hospitalized patients, is strongly associated with the risk of CKD [5]. Among hospitalized patients, post-discharge eGFR decline is seen in patients with or without AKI and is associated with increased risk of death [6–8].

Machine learning (ML) models have been used to predict the risk of kidney disease [9–11]. In studies of patients with CKD, logistic regression (LR) and random forest (RF) algorithms have been used to identify patients at risk of fast eGFR decline [12, 13]. Although predictive models of eGFR decline identify important clinical factors, the relative importance of these factors and the combination of factors that identify the highest and lowest risk patient subgroups are less extensively studied. Classification and Regression Trees (CART) is a non-parametric and non-linear supervised ML algorithm that identifies the various factors associated with clinical outcomes in an open-box decision tree approach. The feature importance of each node (clinical variable) is determined by its hierarchy in the decision tree. The resulting cluster and hierarchy of nodes can identify the patient subgroups at lowest and highest risk of the clinical outcome. CART has been used in the study of kidney diseases including AKI [14–17], urinary obstruction [18], and dialysis patients [19]. Although decision tree algorithms have also been used to predict CKD [20–26], their use in the prediction of eGFR decline has rarely been reported [27, 28]. The only study that clearly identified the use of CART to predict eGFR decline was limited to a small, select group of patients and did not investigate post-hospitalization eGFR decline [28].

We recently reported the characteristics of Coronavirus disease 2019 (COVID-19) associated AKI in hospitalized patients in a large national cohort of 336,473 patients with COVID-19, of whom 129,176 (38%) had AKI [29]. In a previous study at our medical center, we reported that among hospitalized patients, the prevalence of AKI was 41.3% in those with COVID-19, but only 24.2% in those without COVID-19 [30]. Although COVID-19 associated AKI was recently reported to be associated with a lower risk of kidney outcomes compared to AKI due to other causes [31], the study of post-discharge kidney function in those without AKI is lacking. There is also a lack of data on the use of ML models to predict factors associated with eGFR decline in patients hospitalized with COVID-19.

We hypothesized that, using CART, we would be able to determine the relative importance of clinical risk factors and identify patients at the highest risk of kidney function decline after hospitalization during the first wave of the COVID-19 pandemic. In this study, we employed RF to identify the most significant factors associated with post-hospitalization eGFR decline in patients admitted at Stony Brook University Hospital (SBUH), and then used LR and CART to determine the relative importance of these factors and the patient sub-groups at the highest and lowest risk.

Materials and methods

Study design and participants

We conducted a retrospective cohort study on patients hospitalized at Stony Brook University Hospital (SBUH) from March 6th, 2020, to December 31st, 2020, during the first year of the COVID-19 pandemic. SBUH is the largest tertiary care center on Long Island, NY, and treated one of the highest numbers of COVID-19 patients in the United States in 2020. All patients who were discharged alive from the hospital were followed until February 16th, 2023, for assessment of kidney function (using outpatient eGFR values). We excluded patients who were ≤18 years of age, pregnant, or who had end stage kidney disease (ESKD), including chronic dialysis or kidney transplant. The study was approved by the SBU Institutional Review Board (IRB # 2020–00239).

Consent was not obtained (waived) since the data were deidentified and analyzed anonymously.

Privacy protection and data security

The data were accessed for research purposes on 01/06/2023 and 01/12/2023. Initially, the authors involved in the formal analysis had access to Protected Health Information (PHI) when it was extracted from the SBUH electronic health records (EHR). These files were stored on a secure, HIPAA-compliant server. PHI was promptly removed, and all subsequent analyses were conducted on de-identified patient data. All other authors had access only to de-identified data and summary results during research meetings.

Outcome (eGFR decline) definition

Hospital survivors were followed for 36 months post discharge from the index hospitalization. For each patient, a baseline eGFR was identified during or before the index hospitalization. The change in eGFR was estimated using this baseline eGFR and the most recent eGFR for all patients with an outpatient eGFR value 90 or more days after discharge from the index hospitalization. The target outcome of “fast eGFR decline” was defined as an eGFR decline of ≥5 mL/min/1.73 m2 per year [32]. The ‘control group’ comprised patients without fast eGFR decline during follow-up.
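The outcome definition above can be illustrated with a minimal Python sketch. It is not the authors' code (the study used R). The function names, the 365.25-day year, and the choice to measure elapsed time from discharge to the most recent outpatient value are assumptions made here for illustration only.

```python
from datetime import date

FAST_DECLINE_THRESHOLD = 5.0  # mL/min/1.73 m2 per year, from the outcome definition
MIN_FOLLOW_UP_DAYS = 90       # outpatient value must be 90 or more days after discharge
DAYS_PER_YEAR = 365.25        # assumption: the year length is not stated in the text


def annualized_egfr_change(baseline, latest, discharge, measured):
    """Return eGFR change per year (negative means decline).

    Returns None when the outpatient value is too early to qualify.
    Assumption: elapsed time runs from discharge to the latest measurement.
    """
    days = (measured - discharge).days
    if days < MIN_FOLLOW_UP_DAYS:
        return None
    return (latest - baseline) / days * DAYS_PER_YEAR


def is_fast_decline(change_per_year):
    """True when the annualized decline meets or exceeds the threshold."""
    return change_per_year is not None and -change_per_year >= FAST_DECLINE_THRESHOLD


# Hypothetical patient: eGFR falls from 90 to 84 over one year of follow-up.
change = annualized_egfr_change(90.0, 84.0, date(2020, 6, 1), date(2021, 6, 1))
print(is_fast_decline(change))
```

Under these assumptions, a patient whose only outpatient value falls within 90 days of discharge has no computable outcome and would be excluded from the follow-up cohort rather than counted as a control.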

Data collection and definition of other variables

Information pertaining to data collection and the definition of variables is detailed in the S1 Methods section.

Univariate analyses

If more than 5% of values for a variable were missing, the variable was removed from the analysis. In each subset and propensity matched dataset, univariate LR was used to identify potential risk factors for the target outcome (fast versus not fast eGFR decline). For continuous variables, we summarized the mean value and standard deviation for each patient group (ICU admitted versus not admitted, COVID-19 positive versus COVID-19 negative, and with fast eGFR decline versus without fast eGFR decline). For binary variables (true or false), we determined the number and corresponding proportion in each patient group. A p-value < 0.05 was considered statistically significant.
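The missing-data rule above (drop any variable with more than 5% missing values) can be sketched as follows. This is an illustrative Python sketch, not the authors' R code. The record layout (a list of dictionaries with `None` marking a missing value) and the function name are hypothetical.

```python
def drop_sparse_variables(rows, max_missing_fraction=0.05):
    """Keep only variables whose share of missing (None) values is within the cutoff.

    rows: list of dicts that all share the same keys (hypothetical layout).
    Returns the filtered rows and the list of variable names that were kept.
    """
    if not rows:
        return [], []
    n = len(rows)
    kept = [
        name
        for name in rows[0].keys()
        if sum(row[name] is None for row in rows) / n <= max_missing_fraction
    ]
    return [{name: row[name] for name in kept} for row in rows], kept


# Toy data: 'bmi' is missing in 3 of 40 rows (7.5%), 'age' in 1 of 40 (2.5%).
toy = [{"age": 60, "bmi": 27.0} for _ in range(40)]
toy[0]["age"] = None
for i in (1, 2, 3):
    toy[i]["bmi"] = None
filtered, kept_names = drop_sparse_variables(toy)
print(kept_names)
```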

Summary of the multivariate and machine learning analyses

We generated multivariate LR models and CART decision trees to predict whether hospital survivors would develop fast eGFR decline during the follow-up period. The potential risk factors were selected based on RF analyses. By default, in each RF model, the number of decision trees was set to 500 and the number of candidate variables considered at each split was 3. We chose the union of the top 10 variables based on the mean decrease accuracy (MDA) and the mean decrease in Gini (MDG) coefficient to fit LR and CART models.
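The feature-selection step above (union of the top-ranked variables under two importance measures) can be sketched in a few lines of Python. The importance values below are made up for illustration and `k` is reduced to 3 to keep the example short. The function names are hypothetical and this is not the authors' R code.

```python
def top_k(importance, k=10):
    """Return the k variable names with the largest importance values."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def union_of_top_features(mda, mdg, k=10):
    """Union of the top-k variables by MDA and by MDG, preserving rank order."""
    selected = top_k(mda, k)
    for name in top_k(mdg, k):
        if name not in selected:
            selected.append(name)
    return selected


# Made-up importance scores, for illustration only.
mda_scores = {"LOHS": 0.9, "age": 0.8, "BMI": 0.7, "COPD": 0.1}
mdg_scores = {"LOHS": 5.0, "vasopressor": 4.0, "age": 3.0, "BMI": 1.0}
print(union_of_top_features(mda_scores, mdg_scores, k=3))
```

Because the two rankings need not agree, the union can contain more than ten variables (up to twenty with `k=10`), which may explain why the LR tables list more than ten predictors.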

In multivariate LR models, variables with significant odds ratio were considered as influential factors. In CART decision trees analyses, the minimum number of observations allowed in a terminal node was set to be 2% of the sample size N. The maximum depth of the decision tree was set to be 3. Variables that appeared as the split condition for the nodes in the decision tree were considered influential factors. We also calculated the overall prediction accuracy, sensitivity and specificity of each LR model and CART decision tree to compare the model performances.
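The three performance measures named above have standard definitions, sketched here in Python for a binary outcome coded 1 (fast eGFR decline) and 0 (not fast). The function name and the toy labels are hypothetical. The sketch does not reproduce any value from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of true fast decliners that the model flags.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Share of true non-decliners that the model clears.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Toy labels for illustration only.
metrics = classification_metrics([1, 1, 1, 0, 0], [1, 0, 1, 0, 1])
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, because a model can reach a reasonable accuracy while performing poorly on one of the two classes.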

The details of LR, RF and CART methods are in the S1 Methods section.

Propensity score matching (PSM) analysis

PSM is a quasi-experimental method used to establish an artificial control group by matching characteristics of each study unit with a control unit to better estimate the impact of a predictor. In our study, PSM was conducted using nearest neighbor matching in a 1:1 ratio without replacement. Propensity scores were estimated through logistic regression based on the important demographic covariates: sex, race, ethnicity and age. This method ensures that each treated unit (ICU admission or COVID-19 positive) is matched to a control unit (non-ICU admission or COVID-19 negative respectively) with the closest propensity score, thereby minimizing group differences and enabling direct comparisons.
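A minimal sketch of 1:1 nearest-neighbor matching without replacement is given below, assuming the propensity scores have already been estimated. The greedy processing order (treated units in input order) and the tie-breaking rule are assumptions made for this sketch. The text does not state which ordering the authors' R software used, and the scores shown are made up.

```python
def match_nearest_neighbor(treated_scores, control_scores):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns (treated_index, control_index) pairs. Each control is used at most
    once. Treated units are processed in input order (an assumption of this
    sketch), and ties are broken by the lower control index.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for t_idx, t_score in enumerate(treated_scores):
        if not available:
            break  # no controls left to match
        c_idx = min(available, key=lambda c: (abs(control_scores[c] - t_score), c))
        pairs.append((t_idx, c_idx))
        available.remove(c_idx)  # without replacement
    return pairs


# Made-up propensity scores for illustration.
treated = [0.30, 0.62]
controls = [0.10, 0.33, 0.60, 0.95]
print(match_nearest_neighbor(treated, controls))
```

Matching without replacement means that a treated unit processed later may receive a more distant control than it would otherwise, which is one reason balance is checked after matching.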

To assess balance, we calculated absolute standardized mean differences (SMDs) for both the unadjusted (pre-matching) and adjusted (post-matching) datasets. The absolute mean difference plots demonstrate that all SMDs post-matching decreased to below 0.10, which is a commonly used threshold for acceptable balance, as previously reported [33]. This substantial reduction in SMDs indicates a significant improvement in balance after matching, ensuring the comparability of the matched groups (S13 and S14 Figs).
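The balance check above can be illustrated with one common form of the absolute standardized mean difference for a continuous covariate. The pooled standard deviation used in the denominator is an assumption of this sketch, since other standardizations exist and the text does not state which one was applied. The ages below are made up.

```python
from math import sqrt
from statistics import mean, variance


def abs_standardized_mean_difference(treated, control):
    """Absolute SMD using the pooled SD of the two groups (an assumed convention)."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0
    return abs(mean(treated) - mean(control)) / pooled_sd


BALANCE_THRESHOLD = 0.10  # threshold cited in the text for acceptable balance

# Made-up ages before and after matching, for illustration only.
smd_before = abs_standardized_mean_difference([60, 70, 80], [50, 60, 70])
smd_after = abs_standardized_mean_difference([60, 70, 80], [61, 69, 80])
print(smd_before, smd_after < BALANCE_THRESHOLD)
```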

Statistical analysis

All statistical analyses and machine learning analyses were performed using R 4.2.3.

Results

1. Comparison of patients with and without fast eGFR decline

Of the cohort of 1,747 hospital survivors, 61.6% had a fast eGFR decline during a mean follow-up of 214.33 (±109.28) days (Table 1). Patients in the fast eGFR decline sub-group had a mean eGFR decline of 35.04 mL/min/1.73 m2 per year (standard deviation [SD] 34.44), whereas those in the control group (without fast eGFR decline) had a mean rise in eGFR of 11.84 mL/min/1.73 m2 per year (SD 21.00). Patients with fast eGFR decline were more likely to be younger, to have a higher baseline eGFR, and to have greater severity of hospital illness, as reflected by a greater length of hospital stay (LOHS) and increased likelihood of requiring intensive care unit (ICU) admission, mechanical ventilation (MV), and vasopressors (Table 1). Patients with fast eGFR decline were 1.4 times more likely to have moderate/severe AKI (stages 2 and 3; AKI-2/3) during hospitalization (11.9% vs 8.4%).

Table 1. Univariate analysis of the followed-up patients with and without fast eGFR decline.

Columns: Variable; Total (N = 1747) as Mean/N and Std/%; Without fast eGFR decline (N = 671, 38.41%) as Mean/N and Std/%; Fast eGFR decline (N = 1076, 61.59%) as Mean/N and Std/%; P-value
Demographics
Sex (N, %)
 Male 962 55.07% 362 53.95% 600 55.76% 0.459
 Female 785 44.93% 309 46.05% 476 44.24% 0.459
Race (N, %)
 White 1270 72.70% 491 73.17% 779 72.40% 0.723
 Non-White 477 27.30% 180 26.83% 297 27.60% 0.723
 Unknown 323 18.49% 125 18.63% 198 18.40% 0.905
Ethnicity (N, %)
 Non-Hispanic 1298 74.30% 489 72.88% 809 75.19% 0.283
 Hispanic 189 10.82% 73 10.88% 116 10.78% 0.949
 Unknown 260 14.88% 109 16.24% 151 14.03% 0.207
Age (Mean, SD) 63.81 17.95 65.23 17.31 62.92 18.29 0.009
Co-morbid conditions (N, %)
DM 548 31.37% 206 30.70% 342 31.78% 0.635
HF 404 23.13% 146 21.76% 258 23.98% 0.285
CKD 396 22.67% 156 23.25% 240 22.30% 0.647
COPD 221 12.65% 88 13.11% 133 12.36% 0.645
HTN 894 51.17% 355 52.91% 539 50.09% 0.253
CAD 543 31.08% 211 31.45% 332 30.86% 0.795
Cancer 362 20.72% 141 21.01% 221 20.54% 0.812
Asthma 139 7.96% 55 8.20% 84 7.81% 0.770
Psychiatric diagnosis 968 55.41% 358 53.35% 610 56.69% 0.172
BMI (Mean, SD) 28.62 8.47 28.53 7.54 28.67 9.01 0.751
Severity of illness
LOHS (Mean, SD) 8.44 11.22 6.62 7.62 9.58 12.84 <0.001
ICU admission (N, %) 306 17.52% 83 12.37% 223 20.72% <0.001
MV (N, %) 89 5.09% 17 2.53% 72 6.69% <0.001
MV days (Mean, SD) 0.61 4.36 0.14 1.01 0.90 5.48 0.002
ARDS (N, %) 20 1.14% 4 0.60% 16 1.49% 0.100
Vasopressor (N, %) 400 22.90% 116 17.29% 284 26.39% <0.001
Sepsis (N, %) 244 13.97% 80 11.92% 164 15.24% 0.052
AKI_23 184 10.53% 56 8.35% 128 11.90% 0.019
COVID-19 260 14.88% 92 13.71% 168 15.61% 0.278
Kidney function measures
Baseline eGFR 86.06 29.90 78.91 29.12 90.51 29.52 <0.001
Baseline eGFR > 120 189 10.82% 34 5.07% 155 14.41% <0.001
Baseline eGFR 90 to 120 693 39.57% 254 37.85% 439 40.80% 0.241
Baseline eGFR 60 to 89 494 28.28% 204 30.40% 290 26.96% 0.133
Baseline eGFR 30 to 59 292 16.71% 134 19.97% 158 14.68% 0.005
Baseline eGFR 15 to 29 62 3.55% 32 4.77% 30 2.79% 0.041
Baseline eGFR < 15 17 0.97% 13 1.94% 4 0.37% 0.003
Final eGFR 77.38 30.11 84.28 27.72 73.07 30.75 <0.001
Change in eGFR -8.68 17.10 5.37 9.26 -17.45 14.87 <0.001
Follow-up days 214.33 109.28 226.06 124.70 207.02 97.80 0.001
eGFR change per year -17.03 37.68 11.84 21.00 -35.04 34.44 0.552
Other lab measures
WBC 8.03 4.19 8.00 5.13 8.05 3.49 0.813
Hb 11.37 2.07 11.49 2.05 11.29 2.08 0.049
Platelets 251.65 122.42 242.97 124.08 257.04 121.13 0.020

Categorical variables presented as a count with associated percentage, continuous variables presented as value with standard deviation (Std). Univariate logistic p-values < 0.05 were considered significant and have been bolded.

Abbreviations: DM = diabetes mellitus, HF = heart failure, CKD = chronic kidney disease, COPD = chronic obstructive pulmonary disease, HTN = hypertension, CAD = coronary artery disease, BMI = body mass index, LOHS = length of hospital stay, ICU admission = intensive care unit admission, MV = mechanical ventilation, ARDS = acute respiratory distress syndrome, AKI = acute kidney injury, COVID-19 = coronavirus disease 2019, eGFR = estimated glomerular filtration rate (mL/min/1.73m2), WBC = white blood cell count, Hb = hemoglobin, Platelets = platelet count.

Patients with fast eGFR decline were more likely to have a higher baseline eGFR (90.51 ± 29.52 mL/min/1.73m2) compared to those without fast eGFR decline (78.91 ± 29.12 mL/min/1.73m2). On further categorization by baseline eGFR, those with baseline eGFR >120 mL/min/1.73m2 were significantly more likely to be fast eGFR decliners, while those with baseline eGFR <60 mL/min/1.73m2 were more likely to be in the group without fast eGFR decline (Table 1).

1a. Machine learning analysis

In RF analysis, LOHS was among the top three variables associated with fast versus not fast eGFR decline in both the MDA and MDG plots (S1 Fig). The other top variables included vasopressor use, age, COPD and BMI. Also, among the top 3 variables were AKI-2/3 in the COVID-19 negative subset (S2 Fig) and Hispanic ethnicity, MV and MV days in the COVID-19 positive subset (S3 Fig). The top ten variables from RF analysis were used for LR and CART decision-tree analysis.

In multivariate LR analysis of the whole cohort, longer LOHS and vasopressor use were significantly associated with greater odds of fast (vs. not fast) eGFR decline, whereas older age was associated with lower odds (Table 2). Greater odds of fast (vs. not fast) eGFR decline were significantly associated with longer LOHS and vasopressor use in the COVID-19 negative subset (S1 Table) and with diabetes mellitus (DM) in the COVID-19 positive subgroup (S2 Table).

Table 2. Logistic regression for fast eGFR decline in the original dataset (N = 1747).
Variable OR (univariable) OR (multivariable)
LOHS Mean (SD) 1.03 (1.02–1.05, ***) 1.02 (1.01–1.04, **)
Vasopressor 1 1.72 (1.35–2.19, ***) 1.45 (1.12–1.89, **)
COPD 1 0.93 (0.70–1.25) 0.95 (0.70–1.29)
Age Mean (SD) 0.99 (0.99–1.00, **) 0.99 (0.99–1.00, *)
MV days Mean (SD) 1.13 (1.06–1.23, **) 1.05 (0.98–1.18)
Asthma 1 0.95 (0.67–1.36) 0.92 (0.64–1.34)
White 1 0.96 (0.77–1.19) 1.03 (0.82–1.30)
AKI_23 1 1.48 (1.07–2.08, *) 0.90 (0.62–1.31)
MV 1 2.76 (1.65–4.87, ***) 1.03 (0.45–2.26)
CKD 1 0.95 (0.75–1.19) 0.97 (0.76–1.26)
BMI Mean (SD) 1.00 (0.99–1.01) 1.00 (0.99–1.01)
Male 1 1.08 (0.89–1.31) 1.01 (0.83–1.24)
Psychiatric diagnosis 1 1.14 (0.94–1.39) 1.09 (0.89–1.33)
HTN 1 0.89 (0.74–1.08) 0.91 (0.74–1.12)
DM 1 1.05 (0.85–1.30) 1.09 (0.87–1.36)
CAD 1 0.97 (0.79–1.20) 1.08 (0.86–1.37)
Cancer 1 0.97 (0.77–1.23) 0.95 (0.74–1.21)

The top variables from Random Forest analysis were selected for Logistic Regression analysis.

P-values < 0.05, < 0.01, and < 0.001 were considered significant and are marked with ‘*’, ‘**’, and ‘***’, respectively.

Abbreviations: LOHS = length of hospital stay, COPD = chronic obstructive pulmonary disease, MV = mechanical ventilation, CKD = chronic kidney disease, HTN = hypertension, DM = diabetes mellitus, CAD = coronary artery disease, eGFR = estimated glomerular filtration rate.
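For a single binary predictor, the univariable logistic regression odds ratio equals the cross-product ratio of the corresponding 2×2 table, so the univariable ORs for binary variables in Table 2 can be checked against the counts in Table 1. The Python sketch below does this for three variables. The function name is hypothetical, and the counts are taken from Table 1.

```python
def odds_ratio(exposed_fast, total_fast, exposed_control, total_control):
    """Cross-product odds ratio from group counts.

    exposed_* is the number of patients with the predictor in each group,
    total_* is the group size.
    """
    a = exposed_fast                    # predictor present, fast decline
    b = total_fast - exposed_fast       # predictor absent, fast decline
    c = exposed_control                 # predictor present, no fast decline
    d = total_control - exposed_control # predictor absent, no fast decline
    return (a * d) / (b * c)


N_FAST, N_CONTROL = 1076, 671  # group sizes from Table 1

# Counts of patients with each predictor: (fast decline, without fast decline).
table1_counts = {
    "AKI_23": (128, 56),
    "Vasopressor": (284, 116),
    "MV": (72, 17),
}

for name, (fast, control) in table1_counts.items():
    print(name, round(odds_ratio(fast, N_FAST, control, N_CONTROL), 2))
```

The results (1.48 for AKI-2/3, 1.72 for vasopressor use, and 2.76 for MV) agree with the univariable column of Table 2. The same check does not apply to the multivariable column, where each OR is adjusted for the other predictors.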

In CART decision tree analysis of the full cohort, LOHS was the most important factor followed by vasopressor use and AKI-2/3 diagnosis (Fig 1). Those with AKI and hospital stay ≥ 27 days had the highest likelihood of fast (vs. not fast) eGFR decline, whereas the lowest risk was found in two subgroups (those without vasopressor use and body mass index [BMI] ≥ 22 and those with vasopressor use, but with a hospital stay < 6 days) (Fig 1). In the COVID-19 negative subgroup, LOHS was again the most important risk factor followed by vasopressor use (Fig 2). Patients with hospital stay ≥ 6 days, with vasopressor use and of female sex had the highest likelihood of fast eGFR decline, whereas those with hospital stay < 2 days and with coronary artery disease (CAD) had the lowest risk (Fig 2). In the COVID-19 positive subgroup, LOHS was also the most important risk factor followed by cancer diagnosis and age (Fig 3). Patients with hospital stay ≥ 15 days, age < 65 and BMI < 27 had the highest likelihood of fast eGFR decline, whereas those with hospital stay < 15 days with cancer and without hypertension (HTN) had the lowest risk (Fig 3).

Fig 1. CART decision tree for fast eGFR decline in the overall cohort.


The minimum number of observations in a terminal node was set to 2% of the sample size. The percentage shown in each terminal node is the percentage of patients in the starting cohort of the analysis. In each terminal node, the risk of fast (vs. not fast) eGFR decline ranges from 0.00 (lowest) to 1.00 (highest). The color of each terminal node represents the associated risk, with more intense green indicating higher risk and more intense blue indicating lower risk. The maximum depth of the decision tree was set to 3.

Fig 2. CART decision tree for fast eGFR decline in the COVID negative subset.


The minimum number of observations in a terminal node was set to 2% of the sample size. The percentage shown in each terminal node is the percentage of patients in the starting cohort of the analysis. In each terminal node, the risk of fast (vs. not fast) eGFR decline ranges from 0.00 (lowest) to 1.00 (highest). The color of each terminal node represents the associated risk, with more intense green indicating higher risk and more intense blue indicating lower risk. The maximum depth of the decision tree was set to 3.

Fig 3. CART decision tree for fast eGFR decline in the COVID positive subset.


The minimum number of observations in a terminal node was set to 2% of the sample size. The percentage shown in each terminal node is the percentage of patients in the starting cohort of the analysis. In each terminal node, the risk of fast (vs. not fast) eGFR decline ranges from 0.00 (lowest) to 1.00 (highest). The color of each terminal node represents the associated risk, with more intense green indicating higher risk and more intense blue indicating lower risk. The maximum depth of the decision tree was set to 3.

2. Comparison of patients with and without ICU admission after PSM for demographics

Patients admitted to the ICU were more likely to require vasopressors, to have a greater LOHS, or to have a diagnosis of sepsis (S3 Table). Patients with ICU admission were 3.7 times more likely to have AKI-2/3 during hospitalization and had a greater proportion of patients with fast eGFR decline (72.9% vs 53.9%) during follow-up compared with those not admitted to the ICU.

2a. Machine learning analysis

In the PSM matched cohort based on ICU status, LOHS was again among the top three variables associated with fast (vs. not fast) eGFR decline in RF analysis in both the MDA and MDG plots (S4 Fig). The other top variables were MV days, vasopressor use, age, and BMI. The top three variables also included ICU admission among the COVID-19 negative subgroup (S5 Fig) and MV among the COVID-19 positive subgroup (S6 Fig).

In multivariate LR analysis, longer LOHS and ICU admission were significantly associated with greater odds of fast (vs. not fast) eGFR decline whereas baseline CKD was associated with lower odds (Table 3). In the COVID-19 negative subset (S4 Table), older age and male sex were significantly associated with lower odds, whereas in the COVID-19 positive subgroup (S5 Table), DM was associated with greater odds of fast eGFR decline.

Table 3. Logistic regression for fast eGFR decline in the PSM matched ICU subset of the whole cohort (N = 612).
Variable OR (univariable) OR (multivariable)
LOHS Mean (SD) 1.04 (1.02–1.06, ***) 1.03 (1.00–1.05, *)
MV days Mean (SD) 1.13 (1.05–1.23, **) 1.04 (0.97–1.19)
Vasopressor 1 1.89 (1.33–2.70, ***) 1.17 (0.75–1.81)
CKD 1 0.60 (0.40–0.88, **) 0.61 (0.39–0.95, *)
ICU admission 1 2.30 (1.64–3.23, ***) 1.71 (1.14–2.59, **)
COVID 1 1.05 (0.67–1.66) 0.76 (0.46–1.27)
Cancer 1 0.91 (0.61–1.38) 0.82 (0.53–1.28)
AKI_23 1 1.82 (1.15–2.94, *) 0.89 (0.51–1.58)
COPD 1 0.88 (0.54–1.44) 0.93 (0.55–1.56)
MV 1 2.77 (1.63–4.99, ***) 0.91 (0.37–2.13)
BMI Mean (SD) 1.00 (0.98–1.02) 0.99 (0.97–1.02)
Age Mean (SD) 0.99 (0.98–1.00, *) 0.99 (0.98–1.00)
Psychiatric diagnosis 1 1.21 (0.87–1.68) 0.96 (0.67–1.37)
HTN 1 0.91 (0.66–1.27) 0.82 (0.57–1.19)
Male 1 0.86 (0.61–1.21) 0.86 (0.60–1.23)
DM 1 0.98 (0.69–1.40) 1.14 (0.78–1.69)

The top variables from Random Forest analysis were selected for Logistic Regression analysis.

P-values < 0.05, < 0.01, and < 0.001 were considered significant and are marked with ‘*’, ‘**’, and ‘***’, respectively.

Abbreviations: LOHS = length of hospital stay, COPD = chronic obstructive pulmonary disease, MV = mechanical ventilation, CKD = chronic kidney disease, HTN = hypertension, DM = diabetes mellitus, CAD = coronary artery disease. eGFR = estimated glomerular filtration rate.

In CART analysis, ICU admission was the most important factor followed by LOHS and baseline CKD (Fig 4). The subgroups with ICU admission and either LOHS ≥ 32 days and White race, or LOHS < 32 days and age < 32 years, had the highest likelihood of fast eGFR decline. The lowest risk occurred in those without an ICU admission who had CKD at baseline and a BMI < 24 (Fig 4). In the COVID-19 negative subgroup, age was the most important factor followed by ICU admission and LOHS (S7 Fig). Patients with age < 70 years and hospital stay ≥ 44 days had the highest likelihood of fast eGFR decline, whereas those with age ≥ 70 years but < 81 years and no ICU admission had the lowest risk (S7 Fig). In the COVID-19 positive subgroup, LOHS was again the most important factor followed by DM diagnosis and BMI (S8 Fig). Patients with hospital stay ≥ 24 days, BMI < 35 and age ≥ 37 years had the highest likelihood of fast eGFR decline, whereas those with DM diagnosis but hospital stay < 3 days had the lowest risk (S8 Fig).

Fig 4. CART decision tree for fast eGFR decline in the PSM matched ICU subset of the whole cohort.


The minimum number of observations in a terminal node was set to 2% of the sample size. The percentage shown in each terminal node is the percentage of patients in the starting cohort of the analysis. In each terminal node, the risk of fast (vs. not fast) eGFR decline ranges from 0.00 (lowest) to 1.00 (highest). The color of each terminal node represents the associated risk, with more intense green indicating higher risk and more intense blue indicating lower risk. The maximum depth of the decision tree was set to 3.

3. Comparison of patients with and without COVID-19 diagnosis after PSM for demographics

In the overall cohort, compared to those without COVID-19, patients admitted with COVID-19 diagnosis were more likely to have a greater mean decline in eGFR during follow-up, but the proportion of patients with fast eGFR decline was not significantly different (S6 Table). There was no significant difference in baseline age, sex or eGFR between the COVID positive and negative groups.

In the sub-group of patients who were propensity matched for demographics, compared to those without COVID-19, patients admitted with COVID-19 diagnosis were less likely to require vasopressors or have baseline HTN, CAD, and cancer; but were more likely to have an ICU admission, sepsis, ARDS, MV, and a greater LOHS (S7 Table). Patients with COVID-19 were 1.9 times more likely to have AKI-2/3 during hospitalization and a greater mean decline in eGFR during follow-up, but the proportion of patients with fast eGFR decline was not significantly different.

3a. Machine learning analysis

In the PSM matched cohort based on COVID status, LOHS was again among the top 3 variables associated with fast (vs. not fast) eGFR decline in RF analysis in both the MDA and MDG plots (S9 Fig). The other top variables were MV days, White race, age and BMI. The top 3 variables also included vasopressor use and Hispanic ethnicity among the COVID-19 negative subgroup (S10 Fig) and MV and Hispanic ethnicity among the COVID-19 positive subset (S11 Fig).

In multivariate LR analysis, DM was significantly associated with greater odds of fast (vs. not fast) eGFR decline whereas older age was associated with lower odds (Table 4). In the COVID-19 negative subset, there were no statistically significant variables in multivariate LR (S8 Table). The findings of the COVID-19 positive subgroup are noted in S2 Table.

Table 4. Logistic regression for fast eGFR decline in the PSM matched COVID-19 subset of the whole cohort (N = 520).
Variable OR (univariable) OR (multivariable)
LOHS Mean (SD) 1.04 (1.02–1.06, ***) 1.02 (1.00–1.05)
White 1 1.17 (0.80–1.69) 1.30 (0.85–2.00)
MV days Mean (SD) 1.11 (1.03–1.24, *) 1.03 (0.96–1.16)
Hispanic 1 0.77 (0.51–1.18) 0.69 (0.42–1.14)
Vasopressor 1 2.15 (1.35–3.51, **) 1.54 (0.91–2.66)
Age Mean (SD) 0.99 (0.98–1.00) 0.99 (0.97–1.00, *)
HTN 1 1.45 (1.01–2.07, *) 1.45 (0.99–2.14)
Sepsis 1 1.78 (1.11–2.93, *) 1.39 (0.83–2.36)
DM 1 1.31 (0.89–1.93) 1.55 (1.02–2.38, *)
COVID 1 1.18 (0.83–1.68) 1.07 (0.71–1.61)
BMI Mean (SD) 0.99 (0.97–1.01) 0.98 (0.95–1.00)
Male 1 0.97 (0.68–1.39) 0.81 (0.55–1.17)
Psychiatric diagnosis 1 1.13 (0.79–1.61) 0.96 (0.65–1.41)

The top variables from Random Forest analysis were selected for Logistic Regression analysis.

P-values < 0.05, < 0.01, and < 0.001 were considered significant and are marked with ‘*’, ‘**’, and ‘***’, respectively.

Abbreviations: LOHS = length of hospital stay, COPD = chronic obstructive pulmonary disease, MV = mechanical ventilation, CKD = chronic kidney disease, HTN = hypertension, DM = diabetes mellitus, CAD = coronary artery disease, eGFR = estimated glomerular filtration rate.

In CART analysis, LOHS was the most important factor followed by BMI and age (Fig 5). Those with hospital stay ≥ 8 days and age < 46 but ≥ 36 years had the highest likelihood of fast (vs. not fast) eGFR decline, whereas those with hospital stay < 8 days, BMI ≥ 37 but no DM diagnosis had the lowest risk (Fig 5). In the COVID-19 negative subgroup, LOHS was again the most important factor followed by BMI (S12 Fig). Patients with hospital stay ≥ 8 but < 12 days had the highest likelihood of fast eGFR decline, whereas those with hospital stay < 2 days and BMI ≥ 31 had the lowest risk. The findings of the COVID-19 positive subgroup are noted in Fig 3.

Fig 5. CART decision tree for fast eGFR decline in the PSM matched COVID-19 subset of the whole cohort.


The minimum number of observations in a terminal node was set to 2% of the sample size. The percentage shown in each terminal node is the percentage of patients in the starting cohort of the analysis. In each terminal node, the risk of fast (vs. not fast) eGFR decline ranges from 0.00 (lowest) to 1.00 (highest). The color of each terminal node represents the associated risk, with more intense green indicating higher risk and more intense blue indicating lower risk. The maximum depth of the decision tree was set to 3.

4. Comparison of LR and CART methods

Both CART and LR methods showed similar accuracy and predictive power (S9 Table) in each dataset (primary and PSM cohorts). Both methods showed high specificity but low sensitivity for outcome prediction. The sensitivity and accuracy were comparatively higher in patients diagnosed with COVID-19 compared to those without.

Discussion

In this study of 1,747 hospital survivors with and without COVID-19, we identified age, baseline eGFR, greater severity of hospital illness, and moderate/severe AKI as the key hospital factors associated with fast kidney function decline after hospitalization. Among the factors associated with severity of hospital illness, length of hospital stay (LOHS) was the most important factor followed by vasopressor use in both LR and CART analysis. Using CART, we were able to identify the patient sub-groups with the highest and lowest risk of post-hospitalization fast eGFR decline. After stratification of the cohort based on ICU and COVID-19 status, other risk factors such as admission to the ICU and baseline CKD, DM, and BMI were identified. In all analyses, LOHS emerged as a highly significant factor. To our knowledge this is the first study to report the use of CART for identifying the patient sub-groups associated with the highest risk of fast post-hospitalization eGFR decline among patients hospitalized during the first year of the pandemic.

Severity of acute illness, patient characteristics, and complications during the hospitalization are associated with longer LOHS [34–37]. Longer LOHS has been associated with severity of kidney disease in the hospital [38, 39], and adverse post-discharge health outcomes [40, 41]. Similar findings have also been reported during the COVID-19 pandemic [42–44]. In a study of patients with COVID-19 who had AKI in the hospital, post-discharge eGFR decline was associated with longer hospitalization [45]. In this study, we report that length of stay during hospitalization is significantly associated with post-discharge eGFR decline in both LR and CART analyses in one of the largest patient cohorts to date from the pandemic era. This finding has significant implications for the outpatient kidney monitoring and risk stratification of patients who were recently hospitalized.

Although the severity of illness in the hospital, including the diagnosis of AKI, is well known to be associated with worse long-term kidney outcomes, in our study we report other less recognized associations with fast post-discharge eGFR decline. For example, we found patients with fast eGFR decline were more likely to have a higher baseline eGFR. Baseline eGFR is known to be associated with eGFR change after AKI [46], and with faster decline in eGFR over time [6, 47, 48]. Pathologic glomerular hyperfiltration resulting in higher eGFR values might be a contributing factor [49]. While we did not find a difference in CKD diagnosis (Table 1) between the eGFR decline groups, it is not clear why patients with low baseline eGFR might have slower eGFR decline, although the number of patients with eGFR < 60 mL/min/1.73 m² in our study was low (only 21% of the cohort). This association of baseline eGFR with eGFR decline needs further exploration in research studies.

Another interesting finding in our study was that younger age was associated with higher risk of fast post-discharge eGFR decline. This association has been previously reported [50] and might be related to inadequacy of current eGFR estimates in older individuals [51] and possibly to different pathophysiological mechanisms of CKD progression [52]. Both findings have important clinical implications in renal monitoring of patients after hospital discharge.

As an advancement over previous models using Cox proportional hazards [53], newer ML-based models including gradient boosting, regression splines and random forest have been developed to predict kidney disease [54–60]. Decision tree analysis can identify the patient subgroups at lowest and highest risk of eGFR decline through the resulting clusters and hierarchy of decision nodes. CART has been used in the study of kidney diseases [14–19]; however, the use of CART to predict eGFR decline has rarely been reported. A recent study used CART to identify factors associated with eGFR decline but was limited by a small sample and restricted to patients with partial nephrectomy [28]. To our knowledge, our study is the first to use CART to analyze factors associated with post-discharge eGFR decline in patients with and without COVID-19. Using CART analysis, we found that LOHS was the most important factor followed by vasopressor use and AKI-2/3 diagnosis. Those with AKI and hospital stay ≥ 27 days had the highest likelihood of fast (vs. not fast) eGFR decline, whereas the lowest risk was found in two subgroups: those with no vasopressor use and BMI ≥ 22 and those with vasopressor use, but with a hospital stay < 6 days. These findings highlight the importance of evaluating recent hospitalization data of patients while monitoring kidney function in outpatient clinics.

Our study also highlights the importance of utilizing a non-parametric supervised learning algorithm like CART to identify high and low risk patient sub-groups, rather than relying solely on a traditional approach like LR which only identifies individual clinical factors. The application of CART models in clinical practice, particularly in ICU settings, offers several advantages [61, 62]. CART’s ability to generate simple and interpretable decision trees makes it a practical tool for risk stratification and decision-making in high-risk environments. For ICU patients, who often experience a higher frequency of AKI due to severity of illness and other nephrotoxic insults and consequently have a high risk of post-discharge eGFR decline and CKD progression, CART can help identify high-risk individuals early, allowing for targeted interventions such as nephroprotective measures or closer monitoring. Furthermore, CART’s flexibility in handling complex interactions between variables, such as baseline renal function, comorbidities, laboratory parameters and treatment modalities, is particularly relevant for ICU populations. Integrating CART models into electronic health record systems could facilitate real-time risk assessments, improving patient outcomes [63, 64]. However, successful implementation requires rigorous validation of the model in diverse ICU settings and the availability of high-quality, real-time data. This should be the focus of future validation and implementation studies.

During the pandemic period, multiple factors including COVID-19 infection were found to be associated with increased odds of rapid kidney function decline [65]. However, there have been only a few studies that report the use of ML to predict eGFR decline in hospital survivors from the first year of the pandemic when the treatment of this disease was evolving, and mass public vaccinations had not yet started [66]. Vaid et al. used several ML models for predicting dialysis requirement and death in patients hospitalized with COVID-19 and found an XGBoost model without imputation to have the highest accuracy [67]. We had previously reported the use of XGBoost to predict recovery after AKI in patients with and without COVID-19 [30]. The use of ML models, especially decision tree analysis, to predict post-hospitalization eGFR decline in COVID-19 survivors has been rarely reported. In a small study of 37 critically ill patients with COVID-19 during the first year of the pandemic, One Rule and decision tree methods were used to classify patients for risk of CKD and mortality [68]. However, this study was limited in design and sample size and focused only on in-hospital outcomes. In our study, among patients with COVID-19, baseline DM was significantly associated with greater odds of fast eGFR decline whereas older age was associated with lower odds in multivariate LR analysis. In CART analysis, LOHS was the most important factor followed by BMI and age. These data provide valuable insights into high-risk patients admitted with COVID-19 who require closer kidney monitoring after discharge.

Our study shows comparable accuracy between LR and CART as previously reported [69, 70]. Besides CART, other decision trees used in the prediction of kidney disease have had variable accuracy compared to other ML techniques [23–27]. CART’s strengths lie in its simplicity and ease of interpretation due to its binary tree structure, unlike methods with multi-way splits. Its tree pruning technique, which grows a large tree and prunes it to optimal size, helps prevent overfitting. As a non-parametric approach that is applicable to both classification and regression tasks, CART does not assume a particular data distribution, making it particularly effective for modeling nonlinear relationships that parametric methods like LR may not handle well.
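The binary splitting that underlies a CART classification tree can be sketched as follows. This is a minimal illustration, assuming Gini impurity as the split criterion (a common default for classification trees; the criterion used in this study is not stated in this section). The data are synthetic, and the names `gini` and `best_threshold` are hypothetical helpers, not part of any library.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # proportion of positive labels
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Find the binary split 'value < threshold' with the lowest weighted Gini impurity.

    Returns (threshold, weighted_impurity). If no split improves on the
    unsplit node, threshold is None and the parent impurity is returned.
    """
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v < thr]
        right = [y for v, y in zip(values, labels) if v >= thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


# Synthetic example only: length of stay in days and a 0/1 outcome flag.
stay_days = [2, 3, 5, 8, 12, 20, 30, 41]
fast_decline = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, impurity = best_threshold(stay_days, fast_decline)
print(threshold, impurity)
```

A full CART fit applies this search to every candidate feature at each node, recurses on the two child nodes, and then prunes the grown tree; only the single-feature split search is shown here.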

In our study, we used RF for selecting the top ten most significant features associated with fast eGFR decline. We then undertook LR and CART analysis to study the relative importance of these features. Previous studies have shown that the employment of ML for feature selection before decision tree analyses increases accuracy [26, 71].

Our study had several limitations. Due to our study requirements of at least two outpatient eGFR values more than 90 days after hospital discharge, only a third of the hospital survivors during the pandemic had data available to evaluate post-discharge eGFR decline. We did not have accurate urine output data and AKI in the hospital was diagnosed by the serum creatinine criteria only. A significant proportion of hospitalized patients did not have pre-hospitalization baseline eGFR available, and in those cases, the lowest serum creatinine during hospitalization was used to estimate baseline eGFR. Since most patients in our cohort had normal baseline kidney function (mean baseline eGFR of 86.06 ± 29.90 mL/min/1.73 m² and mean final eGFR of 77.38 ± 30.11 mL/min/1.73 m²), we were not able to study other outcomes associated with CKD progression such as incident CKD, > 40% eGFR decline or incident ESKD. The lack of data on proteinuria, retrospective design, and use of data from a single center are additional limitations.

In conclusion, we report that use of combined ML techniques can provide a comprehensive understanding of the key factors associated with kidney outcomes. CART analysis can help identify the subgroups of hospitalized patients with the highest and lowest risk of post-discharge eGFR decline. In this study, CART identified length of hospitalization as the most important factor. Based on these findings, we advise clinicians to consider the length of hospitalization in their post-discharge monitoring of kidney function. We anticipate that identification of high-risk hospitalized patients through CART can significantly improve the post-discharge clinical management. Further studies in other healthcare systems are required to validate our findings.

Supporting information

S1 Fig. Random Forest for fast eGFR decline in the original dataset (N = 1747).

(DOCX)

pone.0317558.s001.docx (120.4KB, docx)
S2 Fig. Random Forest for fast eGFR decline in the COVID negative subset (N = 1487).

(DOCX)

pone.0317558.s002.docx (114.7KB, docx)
S3 Fig. Random Forest for fast eGFR decline in the COVID positive subset (N = 260).

(DOCX)

pone.0317558.s003.docx (112.2KB, docx)
S4 Fig. Random Forest for fast eGFR decline in the ICU matched subset (N = 612).

(DOCX)

pone.0317558.s004.docx (114.3KB, docx)
S5 Fig. Random Forest for fast eGFR decline in the ICU matched COVID negative subset (N = 510).

(DOCX)

pone.0317558.s005.docx (111.4KB, docx)
S6 Fig. Random Forest for fast eGFR decline in the ICU matched COVID positive subset (N = 102).

(DOCX)

pone.0317558.s006.docx (111.5KB, docx)
S7 Fig. CART decision tree for fast eGFR decline in the COVID-19 negative subgroup of the PSM matched ICU subset (N = 510).

(DOCX)

pone.0317558.s007.docx (58.1KB, docx)
S8 Fig. CART decision tree for fast eGFR decline in the COVID-19 positive subgroup of the PSM matched ICU subset (N = 102).

(DOCX)

pone.0317558.s008.docx (61.9KB, docx)
S9 Fig. Random Forest for fast eGFR decline in the COVID matched subset (N = 520).

(DOCX)

pone.0317558.s009.docx (113.9KB, docx)
S10 Fig. Random Forest for fast eGFR decline in the COVID matched COVID negative subset (N = 260).

(DOCX)

pone.0317558.s010.docx (112.1KB, docx)
S11 Fig. Random Forest for fast eGFR decline in the COVID matched COVID positive subset (N = 260).

(DOCX)

pone.0317558.s011.docx (108.7KB, docx)
S12 Fig. CART decision tree for fast eGFR decline in the COVID negative subgroup of the PSM matched COVID-19 subset (N = 260).

(DOCX)

pone.0317558.s012.docx (62.1KB, docx)
S13 Fig. Standardized Mean Differences (SMDs) Before and After Matching for ICU Admission Groups.

(DOCX)

pone.0317558.s013.docx (80.7KB, docx)
S14 Fig. Standardized Mean Differences (SMDs) Before and After Matching for COVID-19 Diagnosis Groups.

(DOCX)

pone.0317558.s014.docx (80.7KB, docx)
S1 Table. Logistic regression for fast eGFR decline in the COVID negative subset (N = 1487).

(DOCX)

pone.0317558.s015.docx (21.7KB, docx)
S2 Table. Logistic regression for fast eGFR decline in the COVID positive subset (N = 260).

(DOCX)

pone.0317558.s016.docx (21.4KB, docx)
S3 Table. Univariate analysis of the followed-up patients admitted and not admitted to the ICU after PSM on 4 demographic variables.

(DOCX)

pone.0317558.s017.docx (26.7KB, docx)
S4 Table. Logistic regression for fast eGFR decline in the COVID negative subgroup of the PSM matched ICU subset (N = 510).

(DOCX)

pone.0317558.s018.docx (21.6KB, docx)
S5 Table. Logistic regression for fast eGFR decline in the COVID positive subgroup of the PSM matched ICU subset (N = 102).

(DOCX)

pone.0317558.s019.docx (21.4KB, docx)
S6 Table. Univariate analysis of the followed-up patients with and without COVID-19 (N = 1,747).

(DOCX)

pone.0317558.s020.docx (29.2KB, docx)
S7 Table. Univariate analysis of the followed-up patients with and without COVID-19 after PSM on 4 demographic variables.

(DOCX)

pone.0317558.s021.docx (27.3KB, docx)
S8 Table. Logistic regression for fast eGFR decline in the COVID-negative subgroup of the PSM matched COVID-19 subset (N = 260).

(DOCX)

pone.0317558.s022.docx (21.2KB, docx)
S9 Table. Comparisons between CART decision tree and logistic regression models.

(DOCX)

pone.0317558.s023.docx (20.7KB, docx)
S1 Methods

(DOCX)

pone.0317558.s024.docx (38.5KB, docx)

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1.Kim HJ, Kim DW, Rhee H, Song SH, Park SK, Kim SW, et al. Rapid decline in kidney function is associated with rapid deterioration of health-related quality of life in chronic kidney disease. Sci Rep. 2023;13(1):1786. doi: 10.1038/s41598-023-28150-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Ali I, Chinnadurai R, Ibrahim ST, Kalra PA. Adverse outcomes associated with rapid linear and non-linear patterns of chronic kidney disease progression. BMC Nephrol. 2021;22(1):82. doi: 10.1186/s12882-021-02282-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Rifkin DE, Shlipak MG, Katz R, Fried LF, Siscovick D, Chonchol M, et al. Rapid kidney function decline and mortality risk in older adults. Arch Intern Med. 2008;168(20):2212–8. doi: 10.1001/archinte.168.20.2212 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Hussain J, Grubic N, Akbari A, Canney M, Elliott MJ, Ravani P, et al. Associations between modest reductions in kidney function and adverse outcomes in young adults: retrospective, population based cohort study. BMJ. 2023;381:e075062. doi: 10.1136/bmj-2023-075062 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Kurzhagen JT, Dellepiane S, Cantaluppi V, Rabb H. AKI: an increasingly recognized risk factor for CKD development and progression. J Nephrol. 2020;33(6):1171–87. doi: 10.1007/s40620-020-00793-2 [DOI] [PubMed] [Google Scholar]
  • 6.Sawhney S, Marks A, Fluck N, Levin A, McLernon D, Prescott G, et al. Post-discharge kidney function is associated with subsequent ten-year renal progression risk among survivors of acute kidney injury. Kidney Int. 2017;92(2):440–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Haines RW, Powell-Tuck J, Leonard H, Crichton S, Ostermann M. Long-term kidney function of patients discharged from hospital after an intensive care admission: observational cohort study. Sci Rep. 2021;11(1):9928. doi: 10.1038/s41598-021-89454-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Lai CF, Wu VC, Huang TM, Yeh YC, Wang KC, Han YY, et al. Kidney function decline after a non-dialysis-requiring acute kidney injury is associated with higher long-term mortality in critically ill survivors. Crit Care. 2012;16(4):R123. doi: 10.1186/cc11419 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Saha I, Gourisaria MK, Harshvardhan GM, editors. Classification System for Prediction of Chronic Kidney Disease Using Data Mining Techniques. Advances in Data and Information Sciences; 2022; Singapore: Springer Singapore. [Google Scholar]
  • 10.Poonia RC, Gupta MK, Abunadi I, Albraikan AA, Al-Wesabi FN, Hamza MA, et al. Intelligent Diagnostic Prediction and Classification Models for Detection of Kidney Disease. Healthcare. 2022;10(2):371. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Islam MA, Majumder MZH, Hussein MA. Chronic kidney disease prediction based on machine learning algorithms. J Pathol Inform. 2023;14:100189. doi: 10.1016/j.jpi.2023.100189 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Inaguma D, Hayashi H, Yanagiya R, Koseki A, Iwamori T, Kudo M, et al. Development of a machine learning-based prediction model for extremely rapid decline in estimated glomerular filtration rate in patients with chronic kidney disease: a retrospective cohort study using a large data set from a hospital in Japan. BMJ Open. 2022;12(6):e058833. doi: 10.1136/bmjopen-2021-058833 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Inaguma D, Kitagawa A, Yanagiya R, Koseki A, Iwamori T, Kudo M, et al. Increasing tendency of urine protein is a risk factor for rapid eGFR decline in patients with CKD: A machine learning-based prediction model by using a big database. PLoS One. 2020;15(9):e0239262. doi: 10.1371/journal.pone.0239262 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Wang R, Zhang J, He M, Xu J. Classification and Regression Tree Predictive Model for Acute Kidney Injury in Traumatic Brain Injury Patients. Ther Clin Risk Manag. 2024;20:139–49. doi: 10.2147/TCRM.S435281 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Chi R, Liang M, Zou Q, Li C, Zhou H, Jian Z. [Construction and validation of a decision tree based on biomarkers for predicting severe acute kidney injury in critically ill patients]. Zhonghua Wei Zhong Bing Ji Jiu Yi Xue. 2020;32(6):721–5. doi: 10.3760/cma.j.cn121430-20200509-00371 [DOI] [PubMed] [Google Scholar]
  • 16.Schneider DF, Dobrowolsky A, Shakir IA, Sinacore JM, Mosier MJ, Gamelli RL. Predicting acute kidney injury among burn patients in the 21st century: a classification and regression tree analysis. J Burn Care Res. 2012;33(2):242–51. doi: 10.1097/BCR.0b013e318239cc24 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Gao W, Wang J, Zhou L, Luo Q, Lao Y, Lyu H, et al. Prediction of acute kidney injury in ICU with gradient boosting decision tree algorithms. Computers in Biology and Medicine. 2022;140:105097. doi: 10.1016/j.compbiomed.2021.105097 [DOI] [PubMed] [Google Scholar]
  • 18.Binongo JN, Taylor A, Hill AN, Schmotzer B, Halkar R, Folks R, et al. Use of classification and regression trees in diuresis renography. Acad Radiol. 2007;14(3):306–11. doi: 10.1016/j.acra.2006.12.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Lin S-J, Liu C-C, Tsai DMT, Shih Y-H, Lin C-L, Hsu Y-C. Prediction Models Using Decision Tree and Logistic Regression Method for Predicting Hospital Revisits in Peritoneal Dialysis Patients. Diagnostics. 2024;14(6):620. doi: 10.3390/diagnostics14060620 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Debal DA, Sitote TM. Chronic kidney disease prediction using machine learning techniques. Journal of Big Data. 2022;9(1):109. [Google Scholar]
  • 21.Tekale S, Shingavi P, Wandhekar S, Chatorikar A. Prediction of chronic kidney disease using machine learning algorithm. International Journal of Advanced Research in Computer and Communication Engineering. 2018;7(10):92–6. [Google Scholar]
  • 22.Khan SH. Predictive models for chronic renal disease using decision trees, naïve bayes and case-based methods [Student thesis]. 2010. [Google Scholar]
  • 23.Ilyas H, Ali S, Ponum M, Hasan O, Mahmood MT, Iftikhar M, et al. Chronic kidney disease diagnosis using decision tree algorithms. BMC Nephrol. 2021;22(1):273. doi: 10.1186/s12882-021-02474-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Pasadana IA, Hartama D, Zarlis M, Sianipar AS, Munandar A, Baeha S, et al. Chronic Kidney Disease Prediction by Using Different Decision Tree Techniques. Journal of Physics: Conference Series. 2019;1255(1):012024. [Google Scholar]
  • 25.Chiu Y-L, Jhou M-J, Lee T-S, Lu C-J, Chen M-S. Health Data-Driven Machine Learning Algorithms Applied to Risk Indicators Assessment for Chronic Kidney Disease. Risk Management and Healthcare Policy. 2021;14:4401–12. doi: 10.2147/RMHP.S319405 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Shih CC, Lu CJ, Chen GD, Chang CC. Risk Prediction for Early Chronic Kidney Disease: Results from an Adult Health Examination Program of 19,270 Individuals. Int J Environ Res Public Health. 2020;17(14). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Cao X, Lin Y, Yang B, Li Y, Zhou J. Comparison Between Statistical Model and Machine Learning Methods for Predicting the Risk of Renal Function Decline Using Routine Clinical Data in Health Screening. Risk Manag Healthc Policy. 2022;15:817–26. doi: 10.2147/RMHP.S346856 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Uleri A, Baboudjian M, Gallioli A, Territo A, Gaya JM, Sanz I, et al. A new machine-learning model to predict long-term renal function impairment after minimally invasive partial nephrectomy: the Fundacio Puigvert predictive model. World J Urol. 2023;41(11):2985–90. [DOI] [PubMed] [Google Scholar]
  • 29.Yoo YJ, Wilkins KJ, Alakwaa F, Liu F, Torre-Healy LA, Krichevsky S, et al. Geographic and Temporal Trends in COVID-Associated Acute Kidney Injury in the National COVID Cohort Collaborative. Clin J Am Soc Nephrol. 2023;18(8):1006–18. doi: 10.2215/CJN.0000000000000192 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Sun S, Annadi RR, Chaudhri I, Munir K, Hajagos J, Saltz J, et al. Short- and Long-Term Recovery after Moderate/Severe AKI in Patients with and without COVID-19. Kidney360. 2022;3(2):242–57. doi: 10.34067/KID.0005342021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Aklilu AM, Kumar S, Nugent J, Yamamoto Y, Coronel-Moreno C, Kadhim B, et al. COVID-19-Associated Acute Kidney Injury and Longitudinal Kidney Outcomes. JAMA Intern Med. 2024. doi: 10.1001/jamainternmed.2023.8225 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Koraishy FM, Hooks-Anderson D, Salas J, Rauchman M, Scherrer JF. Fast GFR decline and progression to CKD among primary care patients with preserved GFR. Int Urol Nephrol. 2018;50(3):501–8. doi: 10.1007/s11255-018-1805-1 [DOI] [PubMed] [Google Scholar]
  • 33.Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat Med. 2009;28(25):3083–107. doi: 10.1002/sim.3697 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Roger C, Debuyzer E, Dehl M, Bulaid Y, Lamrani A, Havet E, et al. Factors associated with hospital stay length, discharge destination, and 30-day readmission rate after primary hip or knee arthroplasty: Retrospective Cohort Study. Orthop Traumatol Surg Res. 2019;105(5):949–55. doi: 10.1016/j.otsr.2019.04.012 [DOI] [PubMed] [Google Scholar]
  • 35.Inabnit LS, Blanchette C, Ruban C. Comorbidities and length of stay in chronic obstructive pulmonary disease patients. COPD. 2018;15(4):355–60. doi: 10.1080/15412555.2018.1513470 [DOI] [PubMed] [Google Scholar]
  • 36.Tefera GM, Feyisa BB, Umeta GT, Kebede TM. Predictors of prolonged length of hospital stay and in-hospital mortality among adult patients admitted at the surgical ward of Jimma University medical center, Ethiopia: prospective observational study. J Pharm Policy Pract. 2020;13:24. doi: 10.1186/s40545-020-00230-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Marfil-Garza BA, Belaunzaran-Zamudio PF, Gulias-Herrero A, Zuniga AC, Caro-Vega Y, Kershenobich-Stalnikowitz D, et al. Risk factors associated with prolonged hospital length-of-stay: 18-year retrospective study of hospitalizations in a tertiary healthcare center in Mexico. PLoS One. 2018;13(11):e0207203. doi: 10.1371/journal.pone.0207203 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Su G, Xu H, Marrone G, Lindholm B, Wen Z, Liu X, et al. Chronic kidney disease is associated with poorer in-hospital outcomes in patients hospitalized with infections: Electronic record analysis from China. Sci Rep. 2017;7(1):11530. doi: 10.1038/s41598-017-11861-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Tam-Tham H, Ravani P, Zhang J, Weaver RG, Quinn RR, James MT, et al. Association of Initiation of Dialysis With Hospital Length of Stay and Intensity of Care in Older Adults With Kidney Failure. JAMA Netw Open. 2020;3(2):e200222. doi: 10.1001/jamanetworkopen.2020.0222 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Han TS, Murray P, Robin J, Wilkinson P, Fluck D, Fry CH. Evaluation of the association of length of stay in hospital and outcomes. Int J Qual Health Care. 2022;34(2). doi: 10.1093/intqhc/mzab160 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Sud M, Yu B, Wijeysundera HC, Austin PC, Ko DT, Braga J, et al. Associations Between Short or Long Length of Stay and 30-Day Readmission and Mortality in Hospitalized Patients With Heart Failure. JACC Heart Fail. 2017;5(8):578–88. [DOI] [PubMed] [Google Scholar]
  • 42.Peixoto SG, Wolf JM, Glaeser AB, Maccari JG, Nasi LA. Longer length of stay, days between discharge/first readmission, and pulmonary involvement >/ = 50% increase prevalence of admissions in ICU in unplanned readmissions after COVID-19 hospitalizations. J Med Virol. 2022;94(8):3750–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Mastaneh Z, Mouseli A, Mohseni S, Dadipoor S. Predictors of hospital length of stay and mortality among COVID-19 inpatients during 2020–2021 in Hormozgan Province of Iran: A retrospective cohort study. Health Sci Rep. 2023;6(6):e1329. doi: 10.1002/hsr2.1329 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Lomba GSB, Silva P, Rosario NFD, Medeiros T, Alves LS, Silva AA, et al. Post-discharge all-cause mortality in COVID-19 recovered patients hospitalized in 2020: the impact of chronic kidney disease. Rev Inst Med Trop Sao Paulo. 2024;66:e1. doi: 10.1590/S1678-9946202466001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Bento GAO, Leite VLT, Campos RP, Vaz FB, Daher EF, Duarte DB. Reduction of estimated glomerular filtration rate after COVID-19-associated acute kidney injury. J Bras Nefrol. 2023;45(4):488–94. doi: 10.1590/2175-8239-JBN-2022-0179en [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Jensen SK, Heide-Jorgensen U, Vestergaard SV, Gammelager H, Birn H, Nitsch D, et al. Kidney function before and after acute kidney injury: a nationwide population-based cohort study. Clin Kidney J. 2023;16(3):484–93. doi: 10.1093/ckj/sfac247 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Melsom T, Nair V, Schei J, Mariani L, Stefansson VTN, Harder JL, et al. Correlation Between Baseline GFR and Subsequent Change in GFR in Norwegian Adults Without Diabetes and in Pima Indians. Am J Kidney Dis. 2019;73(6):777–85. doi: 10.1053/j.ajkd.2018.11.011 [DOI] [PubMed] [Google Scholar]
  • 48.Baba M, Shimbo T, Horio M, Ando M, Yasuda Y, Komatsu Y, et al. Longitudinal Study of the Decline in Renal Function in Healthy Subjects. PLoS One. 2015;10(6):e0129036. doi: 10.1371/journal.pone.0129036 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.van der Burgh AC, Rizopoulos D, Ikram MA, Hoorn EJ, Chaker L. Determinants of the Evolution of Kidney Function With Age. Kidney Int Rep. 2021;6(12):3054–63. doi: 10.1016/j.ekir.2021.10.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Buyadaa O, Salim A, Morton JI, Magliano DJ, Shaw JE. Rate of decline in kidney function and known age-of-onset or duration of type 2 diabetes. Sci Rep. 2021;11(1):14705. doi: 10.1038/s41598-021-94099-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Liu P, Quinn RR, Lam NN, Elliott MJ, Xu Y, James MT, et al. Accounting for Age in the Definition of Chronic Kidney Disease. JAMA Intern Med. 2021;181(10):1359–66. doi: 10.1001/jamainternmed.2021.4813 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Douville P, Martel AR, Talbot J, Desmeules S, Langlois S, Agharazii M. Impact of age on glomerular filtration estimates. Nephrol Dial Transplant. 2009;24(1):97–103. doi: 10.1093/ndt/gfn473 [DOI] [PubMed] [Google Scholar]
  • 53.Tangri N, Inker LA, Hiebert B, Wong J, Naimark D, Kent D, et al. A Dynamic Predictive Model for Progression of CKD. Am J Kidney Dis. 2017;69(4):514–20. doi: 10.1053/j.ajkd.2016.07.030 [DOI] [PubMed] [Google Scholar]
  • 54.Ferguson T, Ravani P, Sood MM, Clarke A, Komenda P, Rigatto C, et al. Development and External Validation of a Machine Learning Model for Progression of CKD. Kidney Int Rep. 2022;7(8):1772–81. doi: 10.1016/j.ekir.2022.05.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Aoki J, Kaya C, Khalid O, Kothari T, Silberman MA, Skordis C, et al. CKD Progression Prediction in a Diverse US Population: A Machine-Learning Model. Kidney Med. 2023;5(9):100692. doi: 10.1016/j.xkme.2023.100692 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Chauhan K, Nadkarni GN, Fleming F, McCullough J, He CJ, Quackenbush J, et al. Initial Validation of a Machine Learning-Derived Prognostic Test (KidneyIntelX) Integrating Biomarkers and Electronic Health Record Data To Predict Longitudinal Kidney Outcomes. Kidney360. 2020;1(8):731–9. doi: 10.34067/KID.0002252020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Tsai MH, Jhou MJ, Liu TC, Fang YW, Lu CJ. An integrated machine learning predictive scheme for longitudinal laboratory data to evaluate the factors determining renal function changes in patients with different chronic kidney disease stages. Front Med (Lausanne). 2023;10:1155426. doi: 10.3389/fmed.2023.1155426 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Xiao J, Ding R, Xu X, Guan H, Feng X, Sun T, et al. Comparison and development of machine learning tools in the prediction of chronic kidney disease progression. J Transl Med. 2019;17(1):119. doi: 10.1186/s12967-019-1860-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Hong MS, Lee YH, Kong JM, Kwon OJ, Jung CW, Yang J, et al. Personalized Prediction of Kidney Function Decline and Network Analysis of the Risk Factors after Kidney Transplantation Using Nationwide Cohort Data. J Clin Med. 2022;11(5). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Kaufman HW, Wang C, Wang Y, Han H, Chaudhuri S, Usvyat L, et al. Machine Learning Case Study: Patterns of Kidney Function Decline and Their Association With Clinical Outcomes Within 90 Days After the Initiation of Renal Dialysis. Adv Kidney Dis Health. 2023;30(1):33–9. [DOI] [PubMed] [Google Scholar]
  • 61.Berney SC, Gordon IR, Opdam HI, Denehy L. A classification and regression tree to assist clinical decision making in airway management for patients with cervical spinal cord injury. Spinal Cord. 2011;49(2):244–50. doi: 10.1038/sc.2010.97 [DOI] [PubMed] [Google Scholar]
  • 62.Kilincer Bozgul SM, Kurtulmus IA, Yargucu Zihni F, Akad Soyer N, Yagmur B, Gunes A, et al. Classification and Regression Tree Analysis in Adult Hemophagocytic Syndrome: Identifying Acute Heart Failure Predictors in ICU Patients. Journal of inflammation research. 2024;17:9711–23. doi: 10.2147/JIR.S491627 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Abu-Hanna A, de Keizer N. Integrating classification trees with local logistic regression in Intensive Care prognosis. Artificial Intelligence in Medicine. 2003;29(1):5–23. doi: 10.1016/s0933-3657(03)00047-2 [DOI] [PubMed] [Google Scholar]
  • 64.Trujillano J, Badia M, Serviá L, March J, Rodriguez-Pozo A. Stratification of the severity of critically ill patients with classification trees. BMC Medical Research Methodology. 2009;9(1):83. doi: 10.1186/1471-2288-9-83 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Diamantidis CJ, Cook DJ, Redelosa CK, Vinculado RB, Cabajar AA, Vassalotti JA. CKD and Rapid Kidney Function Decline During the COVID-19 Pandemic. Kidney Med. 2023;5(9):100701. doi: 10.1016/j.xkme.2023.100701 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Mahmood MA, Lata P, editors. A Machine Learning Approach to Predict Renal Diseases with SARS-CoV-2. 2021 Emerging Technology in Computing, Communication and Electronics (ETCCE); 21–23 Dec 2021. [Google Scholar]
  • 67.Vaid A, Chan L, Chaudhary K, Jaladanki SK, Paranjpe I, Russak A, et al. Predictive Approaches for Acute Dialysis Requirement and Death in COVID-19. Clin J Am Soc Nephrol. 2021;16(8):1158–68. doi: 10.2215/CJN.17311120 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Herzog AL, von Jouanne-Diedrich HK, Wanner C, Weismann D, Schlesinger T, Meybohm P, et al. COVID-19 and the kidney: A retrospective analysis of 37 critically ill patients using machine learning. PLoS One. 2021;16(5):e0251932. doi: 10.1371/journal.pone.0251932 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Rudolfer SM, Paliouras G, Peers IS. A comparison of logistic regression to decision tree induction in the diagnosis of carpal tunnel syndrome. Comput Biomed Res. 1999;32(5):391–414. doi: 10.1006/cbmr.1999.1521 [DOI] [PubMed] [Google Scholar]
  • 70.Tsien CL, Fraser HS, Long WJ, Kennedy RL. Using classification tree and logistic regression methods to diagnose myocardial infarction. Stud Health Technol Inform. 1998;52 Pt 1:493–7. [PubMed] [Google Scholar]
  • 71.Chaudhuri AK, Sinha D, Banerjee DK, Das A. A novel enhanced decision tree model for detecting chronic kidney disease. Network Modeling Analysis in Health Informatics and Bioinformatics. 2021;10(1):29. [Google Scholar]

Decision Letter 0

Keiko Hosohata

26 Nov 2024

PONE-D-24-49407

Classification and Regression Trees Analysis Identifies Patients at High Risk for Kidney Function Decline Following Hospitalization

PLOS ONE

Dear Dr. KORAISHY,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 10 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Keiko Hosohata, Ph.D.

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Please include a caption for each of figures 6, 7, 8, 9, 10, 11, and 12.

4. We note you have included a table to which you do not refer in the text of your manuscript. Please ensure that you refer to Table 2 in your text; if accepted, production will need this reference to link the reader to the Table.

5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear Authors

The current manuscript is outstanding but needs improvement in some areas, and I have some questions.

Introduction

The authors should report the strong bidirectional relationship between CKD and AKI, in which each type of kidney injury can predispose to the other.

The authors should briefly report the frequency of AKI in hospitalized patients with COVID-19 and without COVID-19 to better contextualize.

Methods

The researchers should be able to classify kidney injury by the time of onset, such as AKI lasting up to 7 days, AKD between 7 and 30 days, and CKD lasting more than 30 days.

The authors analysed a lot of clinical data to understand the progression of CKD after COVID-19. However, it would be interesting to also analyse clinical laboratory data, such as complete blood count, to understand the relationship between hemoglobin concentration, leukocyte count, and platelet count with the progression to CKD.

Results

Did the machine learning analysis identify other clinical factors associated with faster eGFR decline in severely ill patients, such as a requirement for mechanical ventilation or low urinary output?

Did the researchers perform a comparative analysis of eGFR at hospital admission between patients with and without COVID-19? Moreover, within the sample of patients with COVID-19, was there a comparison of eGFR between patients with and without rapid loss of renal function?

Discussion

The authors should thoroughly discuss their finding that 'patients with fast eGFR decline were more likely to have a higher baseline eGFR.' This intriguing result appears to contradict the existing literature, which suggests that CKD (lower GFR) is associated with a greater predisposition to AKI and that AKI is associated with progression to, or may even cause, CKD.

The authors should also discuss in more detail the possibility of using CART in clinical practice, especially in ICUs, where patients are more severely ill and have a higher frequency of AKI, and how useful this tool could be for physicians when assessing these patients.

Reviewer #2: The authors first used Classification and Regression Trees (CART) to identify risk factors for fast eGFR decline in post-discharge patients during the COVID-19 pandemic. The study shows the strength of machine learning models in predicting disease progression. Here are some minor suggestions for revision:

1. The decision to remove variables with more than 5% missing data might lead to loss of potentially important information. It could be beneficial to explore and justify additional methods to handle missing data, such as multiple imputation.

2. Detail whether the matching was performed one-to-one, one-to-many, or another method, as this impacts the analysis.

3. Describe how balance was assessed post-matching, as this is crucial to ensure that the groups are comparable.

4. The rationale for selecting 500 decision trees and three variables for RF models could be more detailed. Also, discuss the sensitivity of results to different hyperparameter choices.

5. Ensure the selection of influential factors in multivariate LR and CART analyses is clearly rationalized, possibly by exploring whether other thresholds for variable inclusion could impact findings.

6. Ensure consistent use of terms and abbreviations throughout the text. For instance, if "eGFR" is used, avoid switching to "GFR" without explanation.

7. Use consistent formatting for statistical data, such as percentages and p-values, to maintain uniformity.
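
Editorial illustration for comment 3 (not part of the reviewer's text): one common way to assess covariate balance after matching is the standardized mean difference (SMD), which the article's S13 and S14 Figs report before and after matching. The sketch below computes the SMD for a single continuous covariate in plain Python. The function name and the example ages are hypothetical and chosen only for illustration; this is not the authors' code or data.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """SMD for one continuous covariate: mean difference over the pooled SD.

    The pooled SD is the square root of the average of the two sample variances.
    """
    diff = mean(treated) - mean(control)
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # No spread in either group: balanced only if the means coincide.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / pooled_sd


# Hypothetical ages in two matched groups (illustrative values only).
icu_ages = [60, 62, 64, 66, 68]
non_icu_ages = [61, 63, 65, 67, 69]
smd = standardized_mean_difference(icu_ages, non_icu_ages)
print(f"SMD for age: {smd:.3f}")
```

An absolute SMD below 0.1 is a commonly used rule of thumb for adequate balance; the hypothetical values above would exceed it.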

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Miguel Angelo Goes, MD, PhD, FASN

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jan 31;20(1):e0317558. doi: 10.1371/journal.pone.0317558.r002

Author response to Decision Letter 0


12 Dec 2024

Please see the attached Author Response letter file.

Attachment

Submitted filename: FK edits Kidney ML paper - Author Responses Letter 121224.docx

pone.0317558.s025.docx (159.4KB, docx)

Decision Letter 1

Keiko Hosohata

2 Jan 2025

Classification and Regression Trees analysis identifies patients at high risk for kidney function decline following hospitalization

PONE-D-24-49407R1

Dear Dr. KORAISHY,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Keiko Hosohata, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Keiko Hosohata

22 Jan 2025

PONE-D-24-49407R1

PLOS ONE

Dear Dr. Koraishy,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr Keiko Hosohata

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Fig. Random Forest for fast eGFR decline in the original dataset (N = 1747).

    (DOCX)

    pone.0317558.s001.docx (120.4KB, docx)
    S2 Fig. Random Forest for fast eGFR decline in the COVID negative subset (N = 1487).

    (DOCX)

    pone.0317558.s002.docx (114.7KB, docx)
    S3 Fig. Random Forest for fast eGFR decline in the COVID positive subset (N = 260).

    (DOCX)

    pone.0317558.s003.docx (112.2KB, docx)
    S4 Fig. Random Forest for fast eGFR decline in the ICU matched subset (N = 612).

    (DOCX)

    pone.0317558.s004.docx (114.3KB, docx)
    S5 Fig. Random Forest for fast eGFR decline in the ICU matched COVID negative subset (N = 510).

    (DOCX)

    pone.0317558.s005.docx (111.4KB, docx)
    S6 Fig. Random Forest for fast eGFR decline in the ICU matched COVID positive subset (N = 102).

    (DOCX)

    pone.0317558.s006.docx (111.5KB, docx)
    S7 Fig. CART decision tree for fast eGFR decline in the COVID-19 negative subgroup of the PSM matched ICU subset (N = 510).

    (DOCX)

    pone.0317558.s007.docx (58.1KB, docx)
    S8 Fig. CART decision tree for fast eGFR decline in the COVID-19 positive subgroup of the PSM matched ICU subset (N = 102).

    (DOCX)

    pone.0317558.s008.docx (61.9KB, docx)
    S9 Fig. Random Forest for fast eGFR decline in the COVID matched subset (N = 520).

    (DOCX)

    pone.0317558.s009.docx (113.9KB, docx)
    S10 Fig. Random Forest for fast eGFR decline in the COVID matched COVID negative subset (N = 260).

    (DOCX)

    pone.0317558.s010.docx (112.1KB, docx)
    S11 Fig. Random Forest for fast eGFR decline in the COVID matched COVID positive subset (N = 260).

    (DOCX)

    pone.0317558.s011.docx (108.7KB, docx)
    S12 Fig. CART decision tree for fast eGFR decline in the COVID negative subgroup of the PSM matched COVID-19 subset (N = 260).

    (DOCX)

    pone.0317558.s012.docx (62.1KB, docx)
    S13 Fig. Standardized Mean Differences (SMDs) Before and After Matching for ICU Admission Groups.

    (DOCX)

    pone.0317558.s013.docx (80.7KB, docx)
    S14 Fig. Standardized Mean Differences (SMDs) Before and After Matching for COVID-19 Diagnosis Groups.

    (DOCX)

    pone.0317558.s014.docx (80.7KB, docx)
    S1 Table. Logistic regression for fast eGFR decline in the COVID negative subset (N = 1487).

    (DOCX)

    pone.0317558.s015.docx (21.7KB, docx)
    S2 Table. Logistic regression for fast eGFR decline in the COVID positive subset (N = 260).

    (DOCX)

    pone.0317558.s016.docx (21.4KB, docx)
    S3 Table. Univariate analysis of the followed-up patients admitted and not admitted to the ICU after PSM on 4 demographic variables.

    (DOCX)

    pone.0317558.s017.docx (26.7KB, docx)
    S4 Table. Logistic regression for fast eGFR decline in the COVID negative subgroup of the PSM matched ICU subset (N = 510).

    (DOCX)

    pone.0317558.s018.docx (21.6KB, docx)
    S5 Table. Logistic regression for fast eGFR decline in the COVID positive subgroup of the PSM matched ICU subset (N = 102).

    (DOCX)

    pone.0317558.s019.docx (21.4KB, docx)
    S6 Table. Univariate analysis of the followed-up patients with and without COVID-19 (N = 1,747).

    (DOCX)

    pone.0317558.s020.docx (29.2KB, docx)
    S7 Table. Univariate analysis of the followed-up patients with and without COVID-19 after PSM on 4 demographic variables.

    (DOCX)

    pone.0317558.s021.docx (27.3KB, docx)
    S8 Table. Logistic regression for fast eGFR decline in the COVID-negative subgroup of the PSM matched COVID-19 subset (N = 260).

    (DOCX)

    pone.0317558.s022.docx (21.2KB, docx)
    S9 Table. Comparisons between CART decision tree and logistic regression models.

    (DOCX)

    pone.0317558.s023.docx (20.7KB, docx)
    S1 Methods

    (DOCX)

    pone.0317558.s024.docx (38.5KB, docx)
    Attachment

    Submitted filename: FK edits Kidney ML paper - Author Responses Letter 121224.docx

    pone.0317558.s025.docx (159.4KB, docx)

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.


    Articles from PLOS ONE are provided here courtesy of PLOS
