Abstract
The exponential rise in the burden of chronic kidney disease (CKD) worldwide has put enormous pressure on the economy. Predictive modeling of CKD can ease this burden by predicting disease occurrence before its onset. Various regression methods are available for predictive modeling, and the choice among them depends on the distribution of the outcome variable. However, the accuracy of a predictive model depends on how well it is developed, taking into account goodness of fit, choice of covariates, handling of covariates measured on a continuous scale, handling of categorical covariates, and the number of outcome events per predictor parameter or sample size. A predictive model should also perform well in an independent cohort. However, predictive modeling of CKD faces several challenges: disease-specific methodological issues hinder the development of a model that is cost-effective and universally applicable for predicting CKD onset. In this review, we discuss the advantages and challenges of various regression models available for predictive modeling and highlight those best suited to future CKD prediction.
Keywords: Chronic kidney disease, Predictive modelling, Regression, Statistical modelling, Methodology
Core Tip: The burden of chronic kidney disease (CKD) is growing rapidly, and there is an urgent need to curb it by identifying individuals at high risk of developing CKD. A broad spectrum of statistical models exists that can predict the future onset of the disease. This narrative review discusses the practical applicability of various statistical models for CKD prediction.
INTRODUCTION
The growing burden of chronic diseases calls for advanced preventive measures, proper screening, and early diagnosis to limit their economic impact. Preventive strategies based on changes in lifestyle and dietary habits could limit the burden of chronic diseases. However, such changes are difficult to instill, and reaching the Sustainable Development Goal targets for reducing premature mortality is a long-term process. Statistical methods can be applied to predict the onset of these chronic conditions through well-developed and validated predictive models, and different predictive models have been developed for different chronic diseases[1,2]. However, applying existing models in real life with adequate predictive accuracy and translational significance remains a major challenge for practitioners. Regional and sociodemographic differences between populations limit the generalizability of existing models. In addition, choosing appropriate modeling techniques, including methods for model development and validation, poses further challenges for the practical application of existing models.
Chronic kidney disease (CKD), among the broad spectrum of chronic diseases, is rising exponentially. Kidney Disease: Improving Global Outcomes (KDIGO) guidelines define CKD as structural or functional abnormalities of the kidneys present for > 3 mo[3]. Functional abnormalities can be assessed using the glomerular filtration rate (GFR), the rate at which blood is filtered through the glomeruli (networks of blood vessels in the kidneys). GFR is measured by the clearance of exogenous filtration markers[4]. In clinical practice, however, kidney function is approximated by the estimated glomerular filtration rate (eGFR)[5], calculated from serum creatinine or serum cystatin C (endogenous markers). KDIGO classifies kidney function into G1–G5 categories based on eGFR and into A1–A3 categories based on the urine albumin-to-creatinine ratio (ACR). Early diagnosis of CKD in stages 1 to 3 is challenging because CKD is asymptomatic in its early stages, and noninvasive markers become abnormal only when the majority of kidney tissue is already damaged. Predictive modeling can address this issue and help ease the future CKD burden by predicting disease onset.
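To make the eGFR-based and ACR-based categories concrete, the following is a minimal Python sketch, not a clinical tool. It assumes the 2021 CKD-EPI creatinine equation (one widely used eGFR equation; the text above does not prescribe a particular one), serum creatinine in mg/dL, and urine ACR in mg/g. The function names are hypothetical; the cut-offs are the KDIGO G and A thresholds.

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """eGFR (mL/min/1.73 m²) from serum creatinine.

    Assumption: the 2021 CKD-EPI creatinine equation is used for illustration.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = (142.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.200
            * 0.9938 ** age_years)
    return egfr * 1.012 if female else egfr


def kdigo_g_category(egfr: float) -> str:
    """KDIGO GFR category from eGFR in mL/min/1.73 m²."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def kdigo_a_category(acr_mg_g: float) -> str:
    """KDIGO albuminuria category from urine ACR in mg/g."""
    if acr_mg_g < 30:
        return "A1"
    if acr_mg_g <= 300:
        return "A2"
    return "A3"
```

For example, `kdigo_g_category(egfr_ckd_epi_2021(1.3, 60, female=False))` places a 60-year-old man with serum creatinine 1.3 mg/dL in G2. A single value like this does not establish CKD, because KDIGO requires the abnormality to persist for > 3 mo.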
Several regression methods exist for the predictive modeling of disease. The choice of method depends on the distribution of the outcome variable and its relationship with the covariates. Each method is defined under a set of assumptions specific to it, and the extent to which real data deviate from these assumptions poses a real challenge for statisticians. Internal validation, calibration, and discrimination should be adequately considered when developing a predictive model of a disease[6,7]. Based on the distribution of the outcome variable, regression methods can be broadly classified into multiple linear regression, quantile regression, logistic regression, Poisson regression, and negative binomial regression (Figures 1 and 2). This review discusses the challenges associated with applying these regression methods to the predictive modeling of CKD.
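Discrimination and calibration can be summarized in several ways. As a minimal sketch, assuming predicted risks and observed 0/1 outcomes are already available for a validation cohort, the C statistic (discrimination) and the observed/expected event ratio (a crude calibration-in-the-large summary) can be computed as follows. The function names are hypothetical.

```python
from typing import Sequence


def c_statistic(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Concordance (C) statistic for a binary outcome.

    Proportion of all (event, non-event) pairs in which the individual with
    the event has the higher predicted risk; tied predictions count as 1/2.
    """
    events = [r for r, y in zip(risk, outcome) if y == 1]
    non_events = [r for r, y in zip(risk, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))


def observed_expected_ratio(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Calibration-in-the-large: observed events / sum of predicted risks."""
    return sum(outcome) / sum(risk)
```

A full calibration assessment would also examine the calibration slope and a calibration plot; the pairwise loop is O(n²) and is meant only to show the definition.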
Figure 1.
Selection of appropriate regression model.
Figure 2.
Selection of appropriate regression model for chronic kidney disease. ACR: albumin-to-creatinine ratio; KDIGO: Kidney Disease Improving Global Outcomes; eGFR: Estimated glomerular filtration rate.
REGRESSION MODELS BASED ON CONTINUOUS OUTCOME VARIABLES
Simple/multiple linear regression model for CKD
Simple linear regression is the most basic regression method; it was conceptualized and first applied by Sir Francis Galton to the problem of heredity in the 19th century. The simple linear regression model is $E(Y \mid X) = \mu(X) = \beta_0 + \beta_1 X$, a line with intercept $\beta_0$ and slope $\beta_1$, where $Y$ is the outcome variable measured on a continuous scale and $X$ is the covariate.
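The intercept and slope in this model have closed-form least-squares estimates: the slope is the sample covariance of X and Y divided by the sample variance of X, and the fitted line passes through the point of means. A minimal sketch in plain Python (the function name is hypothetical):

```python
def fit_simple_linear(x, y):
    """Least-squares estimates (beta0, beta1) for E(Y|X) = beta0 + beta1 * X."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx               # slope: covariance of X, Y over variance of X
    beta0 = y_bar - beta1 * x_bar   # the fitted line passes through (x_bar, y_bar)
    return beta0, beta1
```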
Simple linear regression can be extended to multiple linear regression, which includes more than one independent variable, to model multifactorial diseases such as CKD. Multiple linear regression uses ordinary least squares estimation to quantify the association between the outcome variable and the covariates[8]. Linear regression relies on the basic assumptions that the outcome variable is measured on a continuous scale and is linearly related to the predictor variables. Because KDIGO classifies kidney disease using eGFR and ACR categories, linear regression is not suitable for directly predicting future CKD. It can be used only to model changes in eGFR or ACR, which are continuous variables and surrogate endpoints for CKD[9]. Even then, longitudinal cohort studies with longer follow-up periods are required to achieve the minimum sample size for a clinically significant decline in eGFR[10]. Such models still rely on the assumption of a linear relationship between the outcome and the predictors, together with the assumptions of homoskedasticity (constant variance of errors), absence of multicollinearity (correlation between independent variables, i.e., covariates in multiple linear regression), and independence of observations[8]. Moreover, decline in kidney function is multifactorial and is likely to have a skewed distribution[11]. For example, Zhang et al[12] reported the serum stem cell factor level as a predictor of decline in kidney function using multiple linear regression. However, they used a single eGFR assessment to assess kidney health, unlike what is recommended by KDIGO guidelines.
Similarly, Cheung et al[13] identified risk factors for incident CKD based on eGFR change, contrary to KDIGO recommendations. Another study[14] applied multiple linear regression to predict urine ACR in diabetes, which could not quantify the risk of kidney disease (categorized as persistent ACR ≥ 30 mg/g) in individuals with diabetes. These studies illustrate the limitations of using multiple linear regression to predict CKD. Another strategy is to relax the stringent assumptions of linear regression; quantile regression could be an alternative for CKD prediction, as discussed in the next section.
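As an illustration of the multiple linear regression setting discussed above, where the outcome would be a continuous surrogate such as change in eGFR rather than KDIGO-defined CKD, the sketch below fits ordinary least squares with NumPy and computes a variance inflation factor (VIF) as one common multicollinearity check. It assumes NumPy is available; the function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """OLS coefficients [b0, b1, ..., bk] for y = b0 + X @ b (intercept added)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2), where R^2 comes
    from regressing X[:, j] on the remaining columns (with intercept)."""
    X = np.asarray(X, dtype=float)
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    coef = fit_ols(others, target)
    fitted = np.column_stack([np.ones(len(X)), others]) @ coef
    ss_res = np.sum((target - fitted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r2)
```

VIF values far above 1 indicate that a covariate is largely explained by the others; commonly quoted thresholds (for example 5 or 10) are rules of thumb rather than formal tests.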
Quantile regression model for CKD
Quantile regression was introduced by Koenker and Bassett in 1978. The model for the qth quantile of the outcome variable Y given the covariates X is $Q_{Y \mid X}(q) = x_i^{\top}\beta_q$, where $\Pr(Y \le x_i^{\top}\beta_q \mid X = x_i) = q$, $\beta_q$ is the vector of regression coefficients for quantile q, and $0 < q < 1$.
Quantile regression models the quantiles of the outcome variable and can therefore handle the skewed distribution of kidney function decline, while the assumptions about the covariates remain the same as in linear regression[8]. Whereas ordinary least squares regression minimizes the sum of squared residuals, quantile regression minimizes a sum of asymmetrically weighted absolute residuals. It is also more robust, makes no assumption about the distribution of the outcome variable other than its continuity, and can be used to model extreme values[15,16]. However, as discussed for linear regression, the categorization of eGFR required for KDIGO-based CKD classification cannot be neglected[12,14]. In addition, quantile regression requires a larger sample size than linear regression[8].
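The asymmetric weighting can be written as the check loss ρ_q(u) = u(q − 1{u < 0}), and minimizing its sum over observations is a linear program. The sketch below solves that program with SciPy's `linprog`; it assumes SciPy (with the HiGHS solver) is available, and the function name is hypothetical. Dedicated implementations, such as `QuantReg` in statsmodels, would normally be used in practice.

```python
import numpy as np
from scipy.optimize import linprog


def fit_quantile_regression(X, y, q=0.5):
    """Linear quantile regression via its linear-programming formulation.

    Minimizes sum_i rho_q(y_i - x_i' beta), rho_q(u) = u * (q - 1{u < 0}),
    by writing each residual as u_i - v_i with u_i, v_i >= 0.
    An intercept column is added to X. Returns [intercept, slopes...].
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.asarray(X, dtype=float).reshape(n, -1)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]
    # Decision variables: [beta (p, free), u (n, >= 0), v (n, >= 0)]
    c = np.concatenate([np.zeros(p), np.full(n, q), np.full(n, 1.0 - q)])
    A_eq = np.hstack([design, np.eye(n), -np.eye(n)])  # design @ beta + u - v = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]
```

With x = 1, ..., 5 and y = 5, 8, 11, 14, 100, the median (q = 0.5) fit still recovers intercept 2 and slope 3 from the four collinear points, whereas an ordinary least squares fit would be pulled toward the outlying value.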
REGRESSION MODELS BASED ON CATEGORICAL OUTCOME VARIABLES FOR CKD
Poisson regression model for CKD
Poisson regression is named after the French mathematician and physicist Siméon Denis Poisson. The Poisson regression model is $\log(\lambda_i) = \beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik}$, where the observed values $Y_i \sim \text{Poisson}(\lambda_i)$, the $X_{ij}$ are covariates, and the $\beta_j$ are regression coefficients.
Poisson regression is used to model a variable that follows the Poisson distribution, under the assumption that its mean and variance are equal[17]. It was initially developed to model discrete (count) outcome variables but has also been widely used to model dichotomous (binary) outcome variables. Thus, Poisson regression could be an option for modeling the occurrence of CKD. However, because the yearly incidence of CKD is low, most individuals remain disease-free and the distribution of the outcome variable is skewed. This violates the assumption of equal mean and variance. The reported incidence of CKD ranges from 0.49%/year to 1.9%/year in different disease groups[13,18-22]; i.e., approximately 1 in 100 individuals followed up for a year develops CKD, and most participants remain disease-free. This skewed distribution of the data, with unequal mean and variance, limits the use of Poisson regression for the predictive modeling of CKD. Various resources suggest zero-inflated Poisson regression in the case of overdispersion, as observed for CKD[23]. However, zero-inflated models assume that the excess zeros are generated by two distinct processes, which remain unexplored for CKD[24,25]. Thus, zero-inflated models may not be applicable to CKD. Negative binomial regression could be a better-suited technique for the predictive modeling of CKD; however, whether negative binomial regression is better than the proportional odds model in such cases is still debatable[26].
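One informal way to check the equal mean and variance assumption is to fit the Poisson model and compare the Pearson chi-square statistic with its residual degrees of freedom; a ratio well above 1 suggests overdispersion. The sketch below fits the model by iteratively reweighted least squares with NumPy. It is an illustration under these assumptions, not a formal dispersion test, and the function names are hypothetical.

```python
import numpy as np


def fit_poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Poisson regression log(lambda_i) = x_i' beta by iteratively reweighted
    least squares. X must already contain a leading column of ones."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit (needs mean(y) > 0)
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu   # working response for the log link
        w = mu                    # working weights for the log link
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def pearson_dispersion(X, y, beta):
    """Pearson chi-square divided by residual degrees of freedom."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = np.exp(X @ beta)
    return np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1])
```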
Logistic regression model for CKD
The logistic regression model was primarily developed by Joseph Berkson. The relationship between the outcome variable Y and the covariates X is given by $\operatorname{logit}(P) = \log\left(\frac{P}{1-P}\right) = X\beta$, where $P = \Pr(Y = 1 \mid X)$ and $\beta$ is the vector of regression coefficients.
Logistic regression models a categorical outcome variable using maximum likelihood estimation[8]. The three types of logistic regression, binary, ordinal (proportional odds model), and multinomial, model dichotomous, ordinal, and nominal outcome variables, respectively. The sample size required for a predictive model must be large enough that the model does not overfit the training data; it depends on the number of events per predictor parameter and the number of predictors[27]. CKD is a multifactorial disease with poor awareness of its risk factors, especially in low-resource settings[28-30]. Thus, larger study cohorts with longer follow-up are required to predict CKD, which is a challenge in low-resource settings. Although they have some limitations, logistic regression models with penalized predictor effects can partially overcome the issue of overfitting[31]. This agrees with the evidence from the existing literature[32]. In small-sample studies, internal validation using bootstrapping may be preferred to obtain robust model estimates[33]. Table 1 shows a hypothetical data layout suitable for logistic regression.
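To illustrate maximum likelihood estimation and a penalized predictor effect, here is a minimal NumPy sketch of binary logistic regression fitted by Newton-Raphson, with an optional ridge (L2) penalty on the slopes. Ridge is used only as one example of the penalization methods mentioned in this review; the function name, the penalty value, and leaving the intercept unpenalized are assumptions. In a layout such as Table 1, `y` would be the binary chronic kidney disease column and `X` the covariates.

```python
import numpy as np


def fit_logistic(X, y, ridge=0.0, max_iter=100, tol=1e-10):
    """Binary logistic regression by (penalized) maximum likelihood.

    Newton-Raphson on log-likelihood - 0.5 * ridge * ||slopes||^2.
    An intercept column is added and is not penalized; ridge=0 gives
    ordinary maximum likelihood. Assumes the data are not perfectly separated
    when ridge=0 (otherwise the MLE does not exist).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), X])
    p = design.shape[1]
    penalty = np.full(p, float(ridge))
    penalty[0] = 0.0                                  # intercept is unpenalized
    beta = np.zeros(p)
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(design @ beta)))
        grad = design.T @ (y - prob) - penalty * beta  # score of penalized log-likelihood
        info = design.T @ (design * (prob * (1.0 - prob))[:, None]) + np.diag(penalty)
        step = np.linalg.solve(info, grad)             # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With `ridge=0` the function returns ordinary maximum likelihood estimates; increasing `ridge` shrinks the slope estimates toward zero. In practice the penalty would be tuned (for example by cross-validation or bootstrapping) rather than fixed.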
Table 1.
Hypothetical data format for the use of logistic regression model for chronic kidney disease
ID | Age (year) | Gender | eGFR1 (mL/min/1.73 m²) | eGFR_grade1¹ | eGFR2 (mL/min/1.73 m²) | eGFR_grade2¹ | Chronic kidney disease² |
1 | 58 | 0 | 88.08 | 2 | 103.68 | 1 | 0 |
2 | 48 | 1 | 107.51 | 1 | 88.90 | 2 | 0 |
3 | 37 | 1 | 94.28 | 1 | 88.12 | 2 | 0 |
4 | 58 | 0 | 93.17 | 1 | 87.06 | 2 | 0 |
5 | 53 | 0 | 58.42 | 3 | 51.35 | 3 | 1 |
6 | 37 | 0 | 95.73 | 1 | 108.90 | 1 | 0 |
7 | 43 | 1 | 84.51 | 2 | 97.25 | 1 | 0 |
8 | 49 | 0 | 100.02 | 1 | 97.84 | 1 | 0 |
9 | 33 | 1 | 105.80 | 1 | 98.74 | 1 | 0 |
10 | 53 | 0 | 108.04 | 1 | 104.39 | 1 | 0 |
11 | 46 | 1 | 106.05 | 1 | 89.04 | 2 | 0 |
12 | 59 | 0 | 114.62 | 1 | 106.81 | 1 | 0 |
13 | 60 | 0 | 121.17 | 1 | 88.75 | 2 | 0 |
14 | 40 | 0 | 101.23 | 1 | 103.60 | 1 | 0 |
15 | 55 | 1 | 114.35 | 1 | 90.59 | 1 | 0 |
16 | 55 | 1 | 90.07 | 1 | 119.00 | 1 | 0 |
17 | 42 | 1 | 86.74 | 2 | 157.50 | 1 | 0 |
28 | 43 | 0 | 47.93 | 3 | 55.74 | 3 | 1 |
29 | 41 | 0 | 97.79 | 1 | 102.74 | 1 | 0 |
30 | 30 | 1 | 117.68 | 1 | 77.65 | 2 | 0 |
¹More than 2 ordered categories: ordinal logistic regression.
²2 categories: binary logistic regression.
Gender: 1 coded for male and 0 coded for female. eGFR: Estimated glomerular filtration rate.
CHALLENGES ASSOCIATED WITH PREDICTIVE MODELING
Overfitting in predictive models
As stated in the previous section, a regression model developed using a small sample is usually overoptimistic and may not perform well in external validation (performance of the developed model in an independent cohort)[8]. Methods for determining an adequate sample size have been suggested to reduce overfitting[10,27,34]. Overfitting also comes into play for rare diseases with low incidence, where the effects of potential risk factors cannot be estimated accurately. The duration of diabetes plays a major role in the prediction of CKD[35]. However, because awareness of diabetes is poor, its duration is often reported incorrectly, and adding such an imprecisely measured predictor may cause overfitting of the model. Furthermore, chronic diseases like CKD are affected in complex ways by demographic, biochemical, environmental, genetic, and lifestyle-associated factors, so predictive models of chronic diseases with multiple confounding factors are prone to overfitting. To overcome this, several penalization methods have been developed that shrink the coefficients of unimportant variables toward zero and thereby reduce overfitting of the developed model. LASSO regression, elastic net, and ridge regression are the available penalization methods that account for overfitting[8,36]. A global shrinkage factor of 0.9 is considered optimal, with bootstrapping considered the best method for calculating shrinkage after estimation[27]. However, shrinkage methods have also been shown to fail when sample sizes are small[31]. Thus, using fewer predictors, meaningful derived variables that combine several variables (such as body mass index, calculated from height and weight), and principal component analysis to reduce the number of covariates have been suggested[27].
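The sample size calculations cited above[27] combine several criteria. As a hedged sketch of two of them, as we read Riley et al.: criterion (i) targets an expected global shrinkage factor S (0.9 here) given an anticipated Cox-Snell R² (R²_CS) and p candidate predictor parameters, and criterion (iii) estimates the overall outcome proportion to within ±0.05. The remaining criterion is omitted, and the function names are hypothetical; the published formulas and software should be used for real studies.

```python
import math


def n_for_shrinkage(n_params: int, r2_cs: float, shrinkage: float = 0.9) -> int:
    """Criterion (i): n = p / ((S - 1) * ln(1 - R2_CS / S)), so that the
    expected uniform shrinkage of predictor effects is at least S."""
    if not 0.0 < r2_cs < shrinkage < 1.0:
        raise ValueError("require 0 < R2_CS < S < 1")
    s = shrinkage
    n = n_params / ((s - 1.0) * math.log(1.0 - r2_cs / s))
    return math.ceil(n)


def n_for_overall_risk(prevalence: float, margin: float = 0.05) -> int:
    """Criterion (iii): estimate the overall outcome proportion to within
    +/- margin with 95% confidence."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1.0 - prevalence))
```

For example, with 10 predictor parameters and an anticipated R²_CS of 0.1, criterion (i) gives 850 participants. The required sample size is the largest value across all criteria, and the expected number of events per predictor parameter then follows from the outcome proportion.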
Like the prediction of CKD, modeling the time to CKD occurrence also suffers from overfitting. Thus, apart from dimension reduction techniques, penalization methods such as penalized maximum likelihood for binary logistic regression and penalized likelihood in Cox regression have been described as better-developed and more general shrinkage methods[6,37].
Loss of information due to categorization of continuous variables
KDIGO classifies CKD using eGFR or ACR, defined as persistent eGFR < 60 mL/min/1.73 m² or ACR ≥ 30 mg/g for ≥ 3 mo[3]. A variable measured on a continuous scale is categorized to diagnose a disease or to classify it into stages. Categorization is useful for descriptive purposes but may result in a loss of information for data analysis[8,38]. Studies can be compared efficiently when an optimal cutoff point is available (as in the case of CKD); however, differences in disease definitions restrict the generalization of findings[39-41]. There is a lack of agreement between definitions of decline in renal function or CKD incidence[42,43]. In most studies, CKD was defined per the KDIGO guidelines, while in others, varying percentage declines in eGFR or increases in ACR were used to describe kidney disease. Declines of ≥ 15%, ≥ 30%, and 40%, as well as various thresholds of annual eGFR decline, have been used to define decreased kidney function[39,44-47]. Similarly, studies disagree on whether a decline in eGFR or an increase in ACR must be persistent, as many studies analyzed results from a single assessment of ACR or eGFR. These differences in disease definitions and categorizations make comparisons between studies difficult and can lead to information loss and biased conclusions.
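The disagreement between definitions can be seen directly in the hypothetical data of Table 1. The sketch below labels five of its rows (IDs 5, 13, 15, 28 and 30) using a KDIGO-style rule (eGFR < 60 mL/min/1.73 m² at both visits, assuming the visits are at least 3 mo apart) and percentage-decline rules of at least 15%, 30% and 40%. Treating every cutoff as "at least" is an assumption, and the function names are hypothetical.

```python
# (ID, eGFR1, eGFR2) in mL/min/1.73 m² for a subset of the hypothetical
# rows in Table 1.
rows = [(5, 58.42, 51.35), (13, 121.17, 88.75), (15, 114.35, 90.59),
        (28, 47.93, 55.74), (30, 117.68, 77.65)]


def kdigo_persistent_low(egfr1, egfr2, threshold=60.0):
    """1 if eGFR < 60 at both visits (assumes visits are >= 3 mo apart)."""
    return int(egfr1 < threshold and egfr2 < threshold)


def percent_decline_flag(egfr1, egfr2, cutoff_pct):
    """1 if eGFR fell by at least cutoff_pct percent between the two visits."""
    return int(100.0 * (egfr1 - egfr2) / egfr1 >= cutoff_pct)


def label_rows(rows):
    """Label each row under the KDIGO-style rule and three decline rules."""
    labels = {}
    for rid, e1, e2 in rows:
        labels[rid] = {
            "kdigo": kdigo_persistent_low(e1, e2),
            "decline15": percent_decline_flag(e1, e2, 15),
            "decline30": percent_decline_flag(e1, e2, 30),
            "decline40": percent_decline_flag(e1, e2, 40),
        }
    return labels


for rid, lab in label_rows(rows).items():
    print(rid, lab)
```

On these rows, IDs 5 and 28 meet the KDIGO-style rule but no percentage-decline rule, while IDs 13, 15 and 30 are flagged by the ≥ 15% rule without meeting the KDIGO-style threshold; only ID 30 reaches a 30% decline.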
For improved CKD prediction and consistent categorization of the parameters used to diagnose CKD, KDIGO guidelines need to be followed. Furthermore, it has been suggested that dichotomizing the decline in renal function (in the case of CKD) at its median causes a loss of power similar to that incurred by discarding a third of the data in small studies[38,48]. Dichotomization at the median can also lead to false-positive results, underestimation of the variability in the variable, and misclassification of individuals with similar characteristics as being different[49]. Where categorization is needed, studies suggest using three or more categories (preferably at percentiles), so that the apparent shape of the relationship between the variables under study can be inferred[38]. Nevertheless, the use of quartiles for categorizing continuous variables remains debatable[50].
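The "loss of about a third of the data" can be motivated with a short calculation. Assuming a normally distributed continuous variable, its correlation with its own median split is √(2/π) ≈ 0.80, so the squared correlation is 2/π ≈ 0.64, roughly the efficiency retained after dichotomization. The NumPy simulation below (sample size and seed are arbitrary choices) checks this numerically; the function name is hypothetical.

```python
import numpy as np


def median_split_correlation(n=200_000, seed=0):
    """Simulate a standard normal variable, dichotomize it at its median, and
    return (r, r^2): the correlation between the continuous and dichotomized
    versions and its square."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x_bin = (x > np.median(x)).astype(float)  # 1 above the median, 0 otherwise
    r = np.corrcoef(x, x_bin)[0, 1]
    return r, r ** 2


r, r2 = median_split_correlation()
print(f"r = {r:.3f}, r^2 = {r2:.3f}, theoretical r = {np.sqrt(2 / np.pi):.3f}")
```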
CONCLUSION
Predicting disease is complex and requires several factors and a rigorous methodology for the predictive model to be efficient, parsimonious, and generalizable to a larger population of interest. To the best of our knowledge, this is the first review discussing the broad categories of predictive modeling methods and the other challenges associated with predicting CKD. The review discusses the methodological challenges associated with several statistical models for predicting CKD. Following KDIGO guidelines, eGFR or ACR needs to be categorized to define CKD in clinical and epidemiological settings. Thus, regression models suited to categorical outcome variables are the suggested methodology for CKD modeling. Because the categorization of eGFR or ACR cannot be neglected for the diagnosis of CKD, linear regression or quantile regression cannot be used for the predictive modeling of CKD. Moreover, with the low yearly incidence of CKD, the equidispersion assumption of Poisson regression cannot be met. Nevertheless, these models can have clinical impact only if we adhere to the clinical practice guidelines formulated by KDIGO. Thus, in light of the existing literature, binary logistic regression fitted by maximum likelihood estimation seems to be the preferred method for the predictive modeling of CKD. Appropriate shrinkage methods, such as penalized maximum likelihood estimation, can be used to account for overfitting. Finally, given the severity of CKD, its consequences for public health, and the challenges associated with its prediction, we anticipate that, to control the kidney disease burden, a predictive model of CKD could be more accurate and specific if it includes important demographic, biochemical, and molecular markers rather than being parsimonious.
ACKNOWLEDGEMENTS
The authors would also like to thank Dr. Ashish Awasthi, Senior Scientist, Central Drug Research Institute, India for the motivation to write the review.
Footnotes
Conflict-of-interest statement: All authors have no conflicts of interest to disclose.
Provenance and peer review: Invited article; Externally peer reviewed.
Peer-review model: Single blind
Specialty type: Methodology
Country of origin: India
Peer-review report’s classification
Scientific Quality: Grade C
Novelty: Grade B
Creativity or Innovation: Grade B
Scientific Significance: Grade B
P-Reviewer: Jamaluddin J S-Editor: Liu JH L-Editor: Kerr C P-Editor: Yu HG
Contributor Information
Sukhanshi Khandpur, Department of Molecular Medicine & Biotechnology, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow 226014, Uttar Pradesh, India.
Prabhaker Mishra, Department of Biostatistics and Health Informatics, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow 226014, Uttar Pradesh, India.
Shambhavi Mishra, Department of Statistics, University of Lucknow, Lucknow 226007, Uttar Pradesh, India.
Swasti Tiwari, Department of Molecular Medicine & Biotechnology, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow 226014, Uttar Pradesh, India. tiwaris@sgpgi.ac.in.
References
- 1.Takura T, Hirano Goto K, Honda A. Development of a predictive model for integrated medical and long-term care resource consumption based on health behaviour: application of healthcare big data of patients with circulatory diseases. BMC Med. 2021;19:15. doi: 10.1186/s12916-020-01874-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Zhang Q, Zhu Y, Yu W, Xu Z, Zhao Z, Liu S, Xin Y, Lv K. Diagnostic accuracy assessment of molecular prediction model for the risk of NAFLD based on MRI-PDFF diagnosed Chinese Han population. BMC Gastroenterol. 2021;21:88. doi: 10.1186/s12876-021-01675-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Kidney Disease: Improving Global Outcomes (KDIGO) Diabetes Work Group. KDIGO 2020 Clinical Practice Guideline for Diabetes Management in Chronic Kidney Disease. Kidney Int. 2020;98:S1–S115. doi: 10.1016/j.kint.2020.06.019. [DOI] [PubMed] [Google Scholar]
- 4.Zager RA. Exogenous creatinine clearance accurately assesses filtration failure in rat experimental nephropathies. Am J Kidney Dis. 1987;10:427–430. doi: 10.1016/s0272-6386(87)80188-9. [DOI] [PubMed] [Google Scholar]
- 5.Thompson LE, Joy MS. Endogenous markers of kidney function and renal drug clearance processes of filtration, secretion, and reabsorption. Curr Opin Toxicol. 2022;31 doi: 10.1016/j.cotox.2022.03.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Collins GS, Dhiman P, Ma J, Schlussel MM, Archer L, Van Calster B, Harrell FE Jr, Martin GP, Moons KGM, van Smeden M, Sperrin M, Bullock GS, Riley RD. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ. 2024;384:e074819. doi: 10.1136/bmj-2023-074819. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Riley RD, Archer L, Snell KIE, Ensor J, Dhiman P, Martin GP, Bonnett LJ, Collins GS. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ. 2024;384:e074820. doi: 10.1136/bmj-2023-074820. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Harrell FE. Regression Modeling Strategies. Springer Series in Statistics, 2001. [Google Scholar]
- 9.Inker LA, Collier W, Greene T, Miao S, Chaudhari J, Appel GB, Badve SV, Caravaca-Fontán F, Del Vecchio L, Floege J, Goicoechea M, Haaland B, Herrington WG, Imai E, Jafar TH, Lewis JB, Li PKT, Maes BD, Neuen BL, Perrone RD, Remuzzi G, Schena FP, Wanner C, Wetzels JFM, Woodward M, Heerspink HJL CKD-EPI Clinical Trials Consortium. A meta-analysis of GFR slope as a surrogate endpoint for kidney failure. Nat Med. 2023;29:1867–1876. doi: 10.1038/s41591-023-02418-0. [DOI] [PubMed] [Google Scholar]
- 10.Riley RD, Snell KIE, Ensor J, Burke DL, Harrell FE Jr, Moons KGM, Collins GS. Minimum sample size for developing a multivariable prediction model: Part I-Continuous outcomes. Stat Med. 2019;38:1262–1275. doi: 10.1002/sim.7993. [DOI] [PubMed] [Google Scholar]
- 11.Rosansky SJ, Glassock RJ. Is a decline in estimated GFR an appropriate surrogate end point for renoprotection drug trials? Kidney Int. 2014;85:723–727. doi: 10.1038/ki.2013.506. [DOI] [PubMed] [Google Scholar]
- 12.Zhang W, Jia L, Liu DLX, Chen L, Wang Q, Song K, Nie S, Ma J, Chen X, Xiu M, Gao M, Zhao D, Zheng Y, Duan S, Dong Z, Li Z, Wang P, Fu B, Cai G, Sun X, Chen X. Serum Stem Cell Factor Level Predicts Decline in Kidney Function in Healthy Aging Adults. J Nutr Health Aging. 2019;23:813–820. doi: 10.1007/s12603-019-1253-3. [DOI] [PubMed] [Google Scholar]
- 13.Cheung KL, Crews DC, Cushman M, Yuan Y, Wilkinson K, Long DL, Judd SE, Shlipak MG, Ix JH, Bullen AL, Warnock DG, Gutiérrez OM. Risk Factors for Incident CKD in Black and White Americans: The REGARDS Study. Am J Kidney Dis. 2023;82:11–21.e1. doi: 10.1053/j.ajkd.2022.11.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Huang LY, Chen FY, Jhou MJ, Kuo CH, Wu CZ, Lu CH, Chen YL, Pei D, Cheng YF, Lu CJ. Comparing Multiple Linear Regression and Machine Learning in Predicting Diabetic Urine Albumin-Creatinine Ratio in a 4-Year Follow-Up Study. J Clin Med. 2022;11 doi: 10.3390/jcm11133661. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Lê Cook B, Manning WG. Thinking beyond the mean: a practical guide for using quantile regression methods for health services research. Shanghai Arch Psychiatry. 2013;25:55–59. doi: 10.3969/j.issn.1002-0829.2013.01.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Marrie RA, Dawson NV, Garland A. Quantile regression and restricted cubic splines are useful for exploring relationships between continuous variables. J Clin Epidemiol. 2009;62:511–7.e1. doi: 10.1016/j.jclinepi.2008.05.015. [DOI] [PubMed] [Google Scholar]
- 17.Coxe S, West SG, Aiken LS. The analysis of count data: a gentle introduction to poisson regression and its alternatives. J Pers Assess. 2009;91:121–136. doi: 10.1080/00223890802634175. [DOI] [PubMed] [Google Scholar]
- 18.Kampmann JD, Heaf JG, Mogensen CB, Mickley H, Wolff DL, Brandt F. Prevalence and incidence of chronic kidney disease stage 3-5 - results from KidDiCo. BMC Nephrol. 2023;24:17. doi: 10.1186/s12882-023-03056-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Barkas F, Elisaf M, Liberopoulos E, Kalaitzidis R, Liamis G. Uric acid and incident chronic kidney disease in dyslipidemic individuals. Curr Med Res Opin. 2018;34:1193–1199. doi: 10.1080/03007995.2017.1372157. [DOI] [PubMed] [Google Scholar]
- 20.Pongpirul W, Pongpirul K, Ananworanich J, Klinbuayaem V, Avihingsanon A, Prasithsirikul W. Chronic kidney disease incidence and survival of Thai HIV-infected patients. AIDS. 2018;32:393–398. doi: 10.1097/QAD.0000000000001698. [DOI] [PubMed] [Google Scholar]
- 21.Agarwal R, Song RJ, Vasan RS, Xanthakis V. Left Ventricular Mass and Incident Chronic Kidney Disease. Hypertension. 2020;75:702–706. doi: 10.1161/HYPERTENSIONAHA.119.14258. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Yu MK, Katon W, Young BA. Associations between sex and incident chronic kidney disease in a prospective diabetic cohort. Nephrology (Carlton) 2015;20:451–458. doi: 10.1111/nep.12468. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Yang Z, Hardin JW, Addy CL. Testing overdispersion in the zero-inflated Poisson model. J Stat Plan Infer. 2009;139:3340–3353. [Google Scholar]
- 24.Paulo Fávero L. Count Data Regression Analysis: Concepts, Overdispersion Detection, Zero-inflation Identification, and Applications with R Detection, Zero-inflation Identification, and Applications with R. 2021. Available from: https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1488&context=pare .
- 25.Moriña D, Puig P, Navarro A. Analysis of zero inflated dichotomous variables from a Bayesian perspective: application to occupational health. BMC Med Res Methodol. 2021;21:277. doi: 10.1186/s12874-021-01427-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Fernandez GA, Vatcheva KP. A comparison of statistical methods for modeling count data with an application to hospital length of stay. BMC Med Res Methodol. 2022;22:211. doi: 10.1186/s12874-022-01685-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Riley RD, Snell KI, Ensor J, Burke DL, Harrell FE Jr, Moons KG, Collins GS. Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes. Stat Med. 2019;38:1276–1296. doi: 10.1002/sim.7992. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Chen TK, Knicely DH, Grams ME. Chronic Kidney Disease Diagnosis and Management: A Review. JAMA. 2019;322:1294–1304. doi: 10.1001/jama.2019.14745. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Khandpur S, Bhardwaj M, Awasthi A, Newtonraj A, Purty AJ, Khanna T, Abraham G, Tiwari S. Association of kidney functions with a cascade of care for diabetes and hypertension in two geographically distinct Indian cohorts. Diabetes Res Clin Pract. 2021;176:108861. doi: 10.1016/j.diabres.2021.108861. [DOI] [PubMed] [Google Scholar]
- 30.Flood D, Seiglie JA, Dunn M, Tschida S, Theilmann M, Marcus ME, Brian G, Norov B, Mayige MT, Singh Gurung M, Aryal KK, Labadarios D, Dorobantu M, Silver BK, Bovet P, Adelin Jorgensen JM, Guwatudde D, Houehanou C, Andall-Brereton G, Quesnel-Crooks S, Sturua L, Farzadfar F, Saeedi Moghaddam S, Atun R, Vollmer S, Bärnighausen TW, Davies JI, Wexler DJ, Geldsetzer P, Rohloff P, Ramírez-Zea M, Heisler M, Manne-Goehler J. The state of diabetes treatment coverage in 55 low-income and middle-income countries: a cross-sectional study of nationally representative, individual-level data in 680 102 adults. Lancet Healthy Longev. 2021;2:e340–e351. doi: 10.1016/s2666-7568(21)00089-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Riley RD, Snell KIE, Martin GP, Whittle R, Archer L, Sperrin M, Collins GS. Penalization and shrinkage methods produced unreliable clinical prediction models especially when sample size was small. J Clin Epidemiol. 2021;132:88–96. doi: 10.1016/j.jclinepi.2020.12.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Nusinovici S, Tham YC, Chak Yan MY, Wei Ting DS, Li J, Sabanayagam C, Wong TY, Cheng CY. Logistic regression was as good as machine learning for predicting major chronic diseases. J Clin Epidemiol. 2020;122:56–69. doi: 10.1016/j.jclinepi.2020.03.002. [DOI] [PubMed] [Google Scholar]
- 33.Steyerberg EW, Bleeker SE, Moll HA, Grobbee DE, Moons KG. Internal and external validation of predictive models: a simulation study of bias and precision in small samples. J Clin Epidemiol. 2003;56:441–447. doi: 10.1016/s0895-4356(03)00047-7. [DOI] [PubMed] [Google Scholar]
- 34.Riley RD, Ensor J, Snell KIE, Harrell FE Jr, Martin GP, Reitsma JB, Moons KGM, Collins G, van Smeden M. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020;368:m441. doi: 10.1136/bmj.m441. [DOI] [PubMed] [Google Scholar]
- 35.Fenta ET, Eshetu HB, Kebede N, Bogale EK, Zewdie A, Kassie TD, Anagaw TF, Mazengia EM, Gelaw SS. Prevalence and predictors of chronic kidney disease among type 2 diabetic patients worldwide, systematic review and meta-analysis. Diabetol Metab Syndr. 2023;15:245. doi: 10.1186/s13098-023-01202-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.le Cessie S, van Houwelingen JC. Ridge Estimators in Logistic Regression. Appl Stat. 1992;41:191–201.
- 37.Verweij PJ, Van Houwelingen HC. Penalized likelihood in Cox regression. Stat Med. 1994;13:2427–2436. doi: 10.1002/sim.4780132307.
- 38.Altman DG. Categorizing Continuous Variables. Wiley StatsRef: Statistics Reference Online, 2014.
- 39.Masrouri S, Alijanzadeh D, Amiri M, Azizi F, Hadaegh F. Predictors of decline in kidney function in the general population: a decade of follow-up from the Tehran Lipid and Glucose Study. Ann Med. 2023;55:2216020. doi: 10.1080/07853890.2023.2216020.
- 40.Baba M, Shimbo T, Horio M, Ando M, Yasuda Y, Komatsu Y, Masuda K, Matsuo S, Maruyama S. Longitudinal Study of the Decline in Renal Function in Healthy Subjects. PLoS One. 2015;10:e0129036. doi: 10.1371/journal.pone.0129036.
- 41.Lin CC, Niu MJ, Li CI, Liu CS, Lin CH, Yang SY, Li TC. Development and validation of a risk prediction model for chronic kidney disease among individuals with type 2 diabetes. Sci Rep. 2022;12:4794. doi: 10.1038/s41598-022-08284-z.
- 42.Levey AS, Eckardt KU, Tsukamoto Y, Levin A, Coresh J, Rossert J, De Zeeuw D, Hostetter TH, Lameire N, Eknoyan G. Definition and classification of chronic kidney disease: a position statement from Kidney Disease: Improving Global Outcomes (KDIGO). Kidney Int. 2005;67:2089–2100. doi: 10.1111/j.1523-1755.2005.00365.x.
- 43.Levey AS, Eckardt KU, Dorman NM, Christiansen SL, Hoorn EJ, Ingelfinger JR, Inker LA, Levin A, Mehrotra R, Palevsky PM, Perazella MA, Tong A, Allison SJ, Bockenhauer D, Briggs JP, Bromberg JS, Davenport A, Feldman HI, Fouque D, Gansevoort RT, Gill JS, Greene EL, Hemmelgarn BR, Kretzler M, Lambie M, Lane PH, Laycock J, Leventhal SE, Mittelman M, Morrissey P, Ostermann M, Rees L, Ronco P, Schaefer F, St Clair Russell J, Vinck C, Walsh SB, Weiner DE, Cheung M, Jadoul M, Winkelmayer WC. Nomenclature for kidney function and disease: report of a Kidney Disease: Improving Global Outcomes (KDIGO) Consensus Conference. Kidney Int. 2020;97:1117–1129. doi: 10.1016/j.kint.2020.02.010.
- 44.Hayashi K, Takayama M, Abe T, Kanda T, Hirose H, Shimizu-Hirota R, Shiomi E, Iwao Y, Itoh H. Investigation of Metabolic Factors Associated with eGFR Decline Over 1 Year in a Japanese Population without CKD. J Atheroscler Thromb. 2017;24:863–875. doi: 10.5551/jat.38612.
- 45.Ataga KI, Zhou Q, Derebail VK, Saraf SL, Hankins JS, Loehr LR, Garrett ME, Ashley-Koch AE, Cai J, Telen MJ. Rapid decline in estimated glomerular filtration rate in sickle cell anemia: results of a multicenter pooled analysis. Haematologica. 2021;106:1749–1753. doi: 10.3324/haematol.2020.267419.
- 46.Zhang Z, He P, Liu M, Zhou C, Liu C, Li H, Zhang Y, Li Q, Ye Z, Wu Q, Wang G, Liang M, Qin X. Association of Depressive Symptoms with Rapid Kidney Function Decline in Adults with Normal Kidney Function. Clin J Am Soc Nephrol. 2021;16:889–897. doi: 10.2215/CJN.18441120.
- 47.Grams ME, Brunskill NJ, Ballew SH, Sang Y, Coresh J, Matsushita K, Surapaneni A, Bell S, Carrero JJ, Chodick G, Evans M, Heerspink HJL, Inker LA, Iseki K, Kalra PA, Kirchner HL, Lee BJ, Levin A, Major RW, Medcalf J, Nadkarni GN, Naimark DMJ, Ricardo AC, Sawhney S, Sood MM, Staplin N, Stempniewicz N, Stengel B, Sumida K, Traynor JP, van den Brand J, Wen CP, Woodward M, Yang JW, Wang AY, Tangri N; CKD Prognosis Consortium. Development and Validation of Prediction Models of Adverse Kidney Outcomes in the Population With and Without Diabetes. Diabetes Care. 2022;45:2055–2063. doi: 10.2337/dc22-0698.
- 48.Cohen J. The Cost of Dichotomization. Appl Psychol Meas. 1983;7:249–253.
- 49.Altman DG, Royston P. The cost of dichotomising continuous variables. BMJ. 2006;332:1080. doi: 10.1136/bmj.332.7549.1080.
- 50.Bennette C, Vickers A. Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents. BMC Med Res Methodol. 2012;12:21. doi: 10.1186/1471-2288-12-21.