PLOS One. 2020 May 29;15(5):e0233491. doi: 10.1371/journal.pone.0233491

Application of explainable ensemble artificial intelligence model to categorization of hemodialysis-patient and treatment using nationwide-real-world data in Japan

Eiichiro Kanda 1,*,#, Bogdan I Epureanu 2,#, Taiji Adachi 3,#, Yuki Tsuruta 4,#, Kan Kikuchi 5,#, Naoki Kashihara 6,#, Masanori Abe 7,#, Ikuto Masakane 8,#, Kosaku Nitta 9,#
Editor: Kojiro Nagai10
PMCID: PMC7259704  PMID: 32469924

Abstract

Background

Although dialysis patients are at a high risk of death, it is difficult for medical practitioners to simultaneously evaluate many inter-related risk factors. In this study, we evaluated the characteristics of hemodialysis patients using a machine learning model, and assessed its usefulness for screening hemodialysis patients at a high risk of one-year death, using the nationwide database of the Japanese Society for Dialysis Therapy.

Materials and methods

The patients were separated into two datasets (n = 39,930, 39,930, respectively). We categorized hemodialysis patients in Japan into new clusters generated by the K-means clustering method using the development dataset. The association between a cluster and the risk of death was evaluated using multivariate Cox proportional hazards models. Then, we developed an ensemble model composed of the clusters and support vector machine models in the model development phase, and compared the accuracy of the prediction of mortality between the machine learning models in the model validation phase.

Results

The average age of the subjects was 65.7±12.2 years; 32.7% had diabetes mellitus. The five clusters clearly distinguished the groups on the basis of their characteristics: Cluster 1, young male, and chronic glomerulonephritis; Cluster 2, female, and chronic glomerulonephritis; Cluster 3, diabetes mellitus; Cluster 4, elderly and nephrosclerosis; Cluster 5, elderly and protein energy wasting. These clusters were associated with the risk of death; for Cluster 5 compared with Cluster 1, the hazard ratio was 8.86 (95% CI 7.68, 10.21). The accuracy of the ensemble model for the prediction of 1-year death was 0.948, higher than those of the logistic regression model (0.938), support vector machine model (0.937), and deep learning model (0.936).

Conclusions

The clusters clearly categorized patients by their characteristics and reflected their prognosis. Our real-world-data-based machine learning system is applicable to identifying high-risk hemodialysis patients in clinical settings, and has a strong potential to guide treatments and improve their prognosis.

Introduction

The mortality rates of dialysis patients are very high and the number of prevalent end-stage-kidney disease (ESKD) patients has been increasing in the USA and Japan [1, 2]. To improve their prognosis, early identification of patients at a high risk of death, and interventional treatments of their conditions are necessary.

Various risk factors for death in dialysis patients have been identified [3–5]. These risk factors are associated with each other, forming a complex network that should be simultaneously taken into account and controlled [6–8]. The Dialysis Outcomes and Practice Patterns Study (DOPPS) has defined a survival index to predict the hemodialysis patients’ risk of death using logistic regression models [6]. We also have developed a nutritional risk index (NRI) for hemodialysis patients using Cox proportional hazards models [8]. However, these indices make some statistical assumptions which limit their application; they also take time to calculate, which is inconvenient when dealing with many patients in clinical settings. The development of a new automatic system is needed to help manage various risk factors simultaneously, and to improve the prognosis of a large number of patients.

Artificial intelligence (AI) methods hold great promise for decision-making in complex systems including those used in medicine for diagnosis and prediction [9, 10]. Although AI is useful to accurately identify patients at a high risk of death, only a few studies on the prediction of ESKD patients’ prognosis have been carried out [11–13]. Difficulties constructing AI algorithms for clinical use have been pointed out, such as the scarce availability of reliable and large data sets for AI algorithm construction, the lack of transparency of conventional AI algorithms, the difficult integration of AI algorithms into complex existing clinical workflows, and the cumbersome compliance with regulatory medical frameworks [14]. Overcoming some or all of these difficulties is required to create a new AI-based system for ESKD patients.

Therefore, in this study, we aim to establish an implementable AI system for screening hemodialysis patients at a high risk of death and for predicting their prognosis on the basis of real-world data from the Japanese Society for Dialysis Therapy (JSDT) Renal Data Registry (JRDR). JRDR is a nationwide data registry and includes 98.8% of ESKD patients in Japan [1]. To provide transparency and accuracy of AI predictions, an ensemble model composed of the K-means method and a support vector machine (SVM) was developed. Then, the performance of the proposed model was compared with that of an SVM-alone model, a deep learning model, and a multivariate logistic regression model. Moreover, considering their usage and applicability to clinical settings, we developed a new total-care system for treating hemodialysis patients at a high risk of death.

Materials and methods

Dataset

This is a prospective cohort study of maintenance hemodialysis patients using JRDR data. JSDT has been conducting annual surveys of dialysis facilities in Japan since 1968. The JRDR data from 2008 to 2013 were used in this study. This study was approved by the ethics committee of JSDT and was exempt from the need to obtain informed consent from participants (JSDT No. 33). The data were analyzed anonymously. The study was performed in accordance with the relevant guidelines and the Declaration of Helsinki of 1975 as revised in 1983.

The source population comprised 275,553 patients (Fig 1). The exclusion criteria were as follows: patients younger than twenty years; patients on hemodiafiltration, hemofiltration, or peritoneal dialysis; patients with missing values or outlier values of laboratory data; patients who had a limb amputated; and patients with a hemodialysis vintage of less than one year. Thus, 79,860 patients were included in the analysis. The included subjects were randomly allocated to two groups: a dataset for the development of the machine learning algorithms (development dataset, n = 39,930) and a dataset for the validation of the algorithms (validation dataset, n = 39,930).

Fig 1. Randomization of study population and datasets.


The endpoints were all-cause death within one and five years. The data of the baseline characteristics were as follows: age; gender; diabetes mellitus (DM), chronic glomerulonephritis (CGN), or nephrosclerosis as a cause of ESKD; history of cardiovascular disease (CVD); body mass index (BMI); serum albumin, sodium, potassium, calcium, phosphorus, creatinine, total cholesterol, and C-reactive protein (CRP) levels; hemoglobin level; normalized protein catabolic rate (nPCR); vintage; Kt/V; and ultrafiltration ratio. The laboratory data were measured before hemodialysis, and BMI was calculated using the weight measured after hemodialysis. Nutritional status was evaluated using NRI for hemodialysis patients [8]. NRI is a nutritional screening index used to predict hemodialysis patients’ prognosis, and was developed by JSDT. It is calculated as follows:

$$\text{Risk score} = \text{low BMI} + \text{low serum albumin level} + \text{abnormal serum total cholesterol level} + \text{low serum creatinine level} \tag{1}$$

where the above parameters are defined as follows: low BMI (<20 kg/m2), yes = 3, no = 0; low serum albumin level (age <65, <3.7 g/dL; age ≥65, <3.5 g/dL), yes = 4, no = 0; abnormal serum total cholesterol level, low (<130 mg/dL) = 1, high (≥220 mg/dL) = 2, no = 0; low serum creatinine level (age <65, male <11.6, female <9.7 mg/dL; age ≥65, male <9.6, female <8.0 mg/dL), yes = 4, no = 0. The risk of 1-year death is categorized as follows: low risk, risk score = 0 to 7; medium risk, score = 8 to 10; high risk, score = 11 or higher.
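The scoring rule of Eq (1) can be written as a short function. The following is a minimal sketch that uses only the thresholds and points listed above; the function names, argument names, and the example patient are hypothetical and are not taken from the study.

```python
def nri_score(age, sex, bmi, albumin, total_cholesterol, creatinine):
    """Return the NRI risk score of Eq (1).

    Units: BMI in kg/m2, albumin in g/dL, total cholesterol and
    creatinine in mg/dL. `sex` is "male" or "female".
    """
    elderly = age >= 65
    score = 0
    # Low BMI: 3 points.
    if bmi < 20:
        score += 3
    # Low serum albumin (age-dependent threshold): 4 points.
    if albumin < (3.5 if elderly else 3.7):
        score += 4
    # Abnormal total cholesterol: low = 1 point, high = 2 points.
    if total_cholesterol < 130:
        score += 1
    elif total_cholesterol >= 220:
        score += 2
    # Low serum creatinine (age- and sex-dependent threshold): 4 points.
    if sex == "male":
        creatinine_cutoff = 9.6 if elderly else 11.6
    else:
        creatinine_cutoff = 8.0 if elderly else 9.7
    if creatinine < creatinine_cutoff:
        score += 4
    return score


def nri_category(score):
    """Map a risk score to the 1-year death risk category."""
    if score <= 7:
        return "low"
    if score <= 10:
        return "medium"
    return "high"


# Hypothetical patient: 70-year-old woman with low BMI, low albumin,
# low total cholesterol, and low creatinine.
example_score = nri_score(70, "female", 19.0, 3.3, 120, 7.5)
example_category = nri_category(example_score)
```

For this hypothetical patient, every component is abnormal, so the score is 3 + 4 + 1 + 4 = 12, which falls in the high-risk category.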

Statistical analyses

Normally distributed variables are presented as mean±standard deviation; otherwise, the median and interquartile ranges are presented. Highly skewed variables were transformed with the natural logarithm function prior to use in models [ln(vintage), ln(CRP)]. Intergroup comparisons of parameters were performed using the chi-square test, t-test, Mann-Whitney U test, one-way analysis of variance, and the Kruskal-Wallis test as appropriate. These analyses were conducted using SAS version 9.4 (SAS, Inc., NC, USA), R version 3.4.1 (R project for Statistical Computing, Vienna, Austria), and Python version 3.7.4 (Python Software Foundation, DE, USA). Statistical significance was defined as p < 0.05.

Development of machine learning models

The variables of the baseline characteristics were Z-score-normalized and used for the following modeling.

K-means method model

Step 1: Using the development dataset, patients were grouped by the K-means method into k clusters (k = 2 to 10) on the basis of their baseline characteristics, so that patients with similar characteristics were grouped in one cluster and patients in different clusters showed dissimilar characteristics. First, k patients were randomly selected as the initial cluster centers. Next, each patient was assigned to the cluster whose center was closest to the patient's characteristics. The mean of the samples in each cluster was then calculated as the new cluster center, μ. The assignment and update steps were repeated until stable clustering results were obtained. The similarity between a patient x and a center μ in a cluster was evaluated using the Euclidean distance in an m-dimensional space, dist(x,μ):

$$\mathrm{dist}(x,\mu)^2 = \sum_{j=1}^{m} (x_j - \mu_j)^2 = \lVert x - \mu \rVert^2 \tag{2}$$

where j indexes the variables of the baseline characteristics and m is the number of these variables; in this study, m = 20.
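The iteration described in Step 1 can be sketched in a few lines of plain Python. This is an illustrative implementation of the standard K-means loop (random initial centers, nearest-center assignment by Eq (2), mean update), not the authors' code; the function names, the seed, and the toy two-dimensional data are assumptions, whereas the study used m = 20 Z-score-normalized variables.

```python
import random


def squared_distance(x, mu):
    """Squared Euclidean distance of Eq (2)."""
    return sum((xj - muj) ** 2 for xj, muj in zip(x, mu))


def k_means(points, k, seed=0, max_iter=100):
    """Return (centers, labels) after iterating until assignments are stable."""
    rng = random.Random(seed)
    # Randomly select k patients as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each patient to the closest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Toy data: two well-separated groups in two dimensions.
points = [[0.0, 0.0], [0.1, 0.3], [0.3, 0.1],
          [5.0, 5.0], [5.2, 4.7], [4.8, 5.4]]
centers, labels = k_means(points, k=2)
```

On this toy input the first three points end up in one cluster and the last three in the other; which cluster receives which index depends on the random initialization.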

Step 2: To evaluate the clustering, the within-cluster sum of squared errors (SSEs), namely, distortion J, was measured:

$$J = \sum_{i=1}^{n} \sum_{j=1}^{k} r_{ij} \lVert x_i - \mu_j \rVert^2 \tag{3}$$

where μj is the center of cluster j; rij = 1 if xi is in cluster j and rij = 0 otherwise; k is the number of clusters; and n is the number of patients.

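To minimize J with respect to the cluster centers, Eq (3) is partially differentiated with respect to μj and the derivative is set to zero:

$$\frac{\partial J}{\partial \mu_j} = -2 \sum_{i=1}^{n} r_{ij} (x_i - \mu_j) = 0 \tag{4}$$
$$\mu_j = \frac{\sum_{i=1}^{n} r_{ij} x_i}{\sum_{i=1}^{n} r_{ij}} \tag{5}$$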

The elbow method was used to identify candidate numbers of clusters, that is, the number of clusters beyond which the within-cluster SSE no longer decreased rapidly.
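As a small numerical illustration of Eqs (3) and (5), the sketch below computes the distortion J for a fixed assignment and checks that the cluster means give a smaller J than centers shifted away from the means. The function names and the four toy points are hypothetical; in an elbow analysis, J would be computed in this way for each candidate k and plotted against k.

```python
def distortion(points, centers, labels):
    """Within-cluster sum of squared errors, J of Eq (3)."""
    return sum(
        sum((xj - muj) ** 2 for xj, muj in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )


def mean_centers(points, labels, k):
    """Cluster centers from Eq (5): the mean of the members of each cluster.

    Assumes every cluster has at least one member.
    """
    centers = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers


# Toy data with a fixed assignment into two clusters.
points = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
labels = [0, 0, 1, 1]

centers = mean_centers(points, labels, k=2)      # [[1.0, 0.0], [11.0, 0.0]]
j_at_means = distortion(points, centers, labels)
j_shifted = distortion(points, [[1.5, 0.0], [11.0, 0.0]], labels)
```

Here J at the means is 4.0, and moving the first center from 1.0 to 1.5 raises it to 4.5, consistent with the mean being the minimizer for a fixed assignment.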

Next, to evaluate whether the clusters could discriminate the patients on the basis of their risks of the endpoints, the survival probabilities of the clusters were evaluated using Kaplan-Meier survival curves. The clusters were numbered in order of increasing risk, and Cox proportional hazards models were used to compare the risk of an endpoint between clusters. The Cox proportional hazards models included only the cluster, as a categorical variable, because the cluster assigned by the K-means method can be regarded as a function of the variables of the baseline characteristics:

$$\mathrm{Cluster}_i = f(x_1, x_2, \ldots, x_m) \tag{6}$$

Hazard ratios (HRs) with 95% confidence intervals (CIs) are presented.
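For readers unfamiliar with the Kaplan-Meier survival probabilities used above, the following is a minimal sketch of the standard product-limit estimator for one group of patients. The study's analyses were run in SAS, R, and Python, and the exact routines are not stated, so the function name and the toy follow-up data here are purely illustrative.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    `times` are follow-up times; `events` are 1 for death and 0 for
    censoring. Returns a list of (time, survival probability) pairs,
    one per distinct time at which at least one death occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort of five patients (hypothetical follow-up in years).
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
```

In this toy cohort the estimated survival drops to about 0.8 after the first death, 0.6 after the second, and 0.3 after the third; computing one such curve per cluster gives the comparison shown in the Kaplan-Meier figures.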

Step 3: The patients in the validation dataset were grouped into clusters using the K-means model trained on the development dataset. Then, the relationship between the clusters and the risk of the endpoints was evaluated using Kaplan-Meier survival curves and Cox proportional hazards models. Considering the results, the optimal number of clusters, k, was determined, and the differences in characteristics between the clusters were statistically evaluated.

Multivariate logistic regression model

To predict the probabilities of the endpoints, multivariate logistic regression models (LRMs) including all variables of the baseline characteristics were developed using the development dataset as follows:

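$$\log\left(\frac{p}{1-p}\right) = \alpha + \sum_{i=1}^{m} \beta_i x_i \tag{7}$$
$$p = \frac{1}{1 + \exp\left(-\left(\alpha + \sum_{i=1}^{m} \beta_i x_i\right)\right)} \tag{8}$$

where p is the predicted probability of the endpoint, α is the intercept, xi is the ith variable of the baseline characteristics, and βi is the parameter estimate for the same variable.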

When p was estimated to be more than 0.5, a patient’s death was predicted. Then, using the validation dataset, we evaluated the accuracy of the prediction using the LRMs.
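The prediction rule of Eqs (7) and (8) with the 0.5 threshold can be sketched as follows. The intercept and coefficients in the example are hypothetical placeholders and are not estimates from the study; only the functional form and the threshold come from the text.

```python
import math


def death_probability(x, alpha, beta):
    """Predicted probability p from the linear predictor alpha + sum(beta_i * x_i)."""
    z = alpha + sum(b * xi for b, xi in zip(beta, x))
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_death(x, alpha, beta, threshold=0.5):
    """Predict death when the estimated probability exceeds the threshold."""
    return death_probability(x, alpha, beta) > threshold


# Hypothetical two-variable model and patient (Z-score-normalized inputs).
alpha_example = -2.0
beta_example = [1.0, 0.5]
p_example = death_probability([3.0, 1.0], alpha_example, beta_example)
predicted = predict_death([3.0, 1.0], alpha_example, beta_example)
```

With these placeholder values the linear predictor is 1.5, so the estimated probability is about 0.82 and death is predicted. A probability of exactly 0.5 does not count as a predicted death, because the rule requires p to be more than 0.5.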

Support vector machine models

SVM models were used to predict the endpoints. SVM models with a Gaussian radial basis function kernel included all of the variables of the baseline characteristics. In the development of each SVM model, classification was examined using three-fold cross-validation, and the accuracy of the prediction was estimated from the results of the three folds. Then, the final SVM models were developed. Using the validation dataset, we evaluated the accuracy of the prediction of the endpoints using the SVM models developed.
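Two ingredients named in this paragraph, the Gaussian radial basis function kernel and three-fold cross-validation, can be illustrated without a machine learning library. The sketch below shows only these two pieces and does not train an SVM; the kernel width gamma, the seed, and the function names are assumptions, since the study does not report its hyperparameters.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Gaussian radial basis function kernel exp(-gamma * ||x - y||^2)."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared)


def three_fold_splits(n, seed=0):
    """Return three (train_indices, test_indices) pairs covering n samples."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::3] for i in range(3)]
    splits = []
    for i in range(3):
        test = folds[i]
        train = [j for f in range(3) if f != i for j in folds[f]]
        splits.append((train, test))
    return splits


splits = three_fold_splits(9)
similarity_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
similarity_far = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.5)
```

Each sample appears in exactly one test fold, so a model trained three times on the complementary training folds yields three accuracy estimates. The kernel equals 1 for identical patients and decays toward 0 as their normalized characteristics diverge.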

Ensemble model

Using the development dataset, we grouped the patients into the k clusters previously determined by the K-means method (Fig 2). For each cluster, an SVM model including all of the variables of the baseline characteristics was trained to predict the risk of the endpoints. Thus, k SVM models, φ(Cluster i), were developed:

$$\Phi(x) = (\varphi(\mathrm{Cluster}\ 1), \varphi(\mathrm{Cluster}\ 2), \ldots, \varphi(\mathrm{Cluster}\ k)) \tag{9}$$

where φ(Cluster i) is the SVM model for Cluster i.

Fig 2. Structure of ensemble model.


The patients were grouped into five clusters using the K-means method. Each cluster was analyzed using each SVM model. The results of analysis using SVM models were unified. Abbreviation: SVM, support vector machine.

Then, the patients in the validation dataset were grouped into k clusters, and the trained SVM models were applied to the corresponding clusters. The results of the prediction of endpoints were unified.
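The routing logic of the ensemble can be sketched as follows: each patient is assigned to the nearest cluster center, and the model trained for that cluster makes the prediction. In this sketch the per-cluster models are simple stand-in functions rather than trained SVMs, and the centers, thresholds, and names are hypothetical.

```python
def nearest_cluster(x, centers):
    """Index of the center closest to x in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])),
    )


def ensemble_predict(patients, centers, cluster_models):
    """Apply to each patient the model that belongs to the patient's cluster.

    `cluster_models[j]` is any callable returning True (death predicted)
    or False for a patient vector; in the study these would be the
    trained SVM models, one per cluster.
    """
    predictions = []
    for x in patients:
        j = nearest_cluster(x, centers)
        predictions.append(cluster_models[j](x))
    return predictions


# Hypothetical setup with two clusters and stand-in decision rules.
centers = [[0.0, 0.0], [5.0, 5.0]]
cluster_models = [
    lambda x: False,          # stand-in for the model of the first cluster
    lambda x: x[0] > 5.5,     # stand-in for the model of the second cluster
]
patients = [[0.2, -0.1], [6.0, 5.0], [5.0, 5.2]]
predictions = ensemble_predict(patients, centers, cluster_models)
```

The first patient is routed to the first cluster's model and the other two to the second cluster's model, so the combined output is one prediction per patient regardless of which model produced it.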

Deep learning models

Deep learning models were developed to predict death within 1 year and within 5 years (1-year and 5-year deaths, respectively). The numbers of layers and hyperparameters were optimized on the basis of the accuracy to predict the endpoints and to prevent overfitting (Figs 3 and 4). In the development of each deep learning model, two-thirds of the development dataset was used as the training dataset and the remaining one-third was used as the test dataset. Then, using the validation dataset, we evaluated the accuracy of the prediction of the endpoints using the deep learning models.

Fig 3. Deep learning model for prediction of 1-year death.


The models for the prediction of 1-year and 5-year deaths were composed of multiple layers: 1 input layer; 2 or 4 hidden layers, respectively; and 1 output layer. Data of a patient were treated as 1 vector of 20 dimensions. Through the hidden layers, the patient’s characteristics were extracted. The dropout rate of each hidden layer was determined appropriately. Adam was used as a learning rate optimization algorithm. ReLUs were used as the activation function of hidden layers, and the logistic activation function was used in the output layer. The performance of a deep learning model was evaluated in terms of accuracy and loss function. The trained model was applied to the validation dataset.

Fig 4. Deep learning model for prediction of 5-year death.


The deep learning model for the prediction of 5-year death was developed similarly to the model for 1-year death. The trained model was applied to the validation dataset.
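The forward pass of the network described in the captions above (a 20-dimensional input, hidden layers with ReLU activation, and a logistic output) can be sketched in plain Python. The hidden-layer widths and the random weights below are assumptions made only for illustration; the study does not report them here, and training with Adam and dropout is omitted.

```python
import math
import random


def dense(v, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def logistic(z):
    """Numerically stable logistic activation."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, layers):
    """ReLU on every hidden layer, logistic activation on the output layer."""
    h = x
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    return logistic(dense(h, weights, bias)[0])


def random_layers(sizes, seed=0):
    """Untrained placeholder weights for consecutive layer sizes."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


# 20 inputs, two hidden layers (widths 16 and 8 are hypothetical), 1 output.
layers = random_layers([20, 16, 8, 1])
patient = [0.1 * i for i in range(20)]
probability = forward(patient, layers)
```

The output is a number between 0 and 1 that can be read as a predicted probability of death; a four-hidden-layer variant for 5-year death would only require a longer `sizes` list.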

Evaluation of model performance

The performance of the models developed for the binary diagnosis decision (death or no death) in terms of accuracy, sensitivity, and specificity was evaluated using the validation dataset. Accuracy is calculated as follows:

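$$\text{accuracy} = \text{sensitivity} \times \text{risk of death} + \text{specificity} \times (1 - \text{risk of death}) \tag{10}$$
$$\text{risk of death} = \frac{\text{number of deaths}}{\text{total number of patients}} \tag{11}$$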

Because accuracy calculated in this way is a weighted average of sensitivity and specificity, its value changes with the proportion of endpoints (the risk of death). Therefore, holding sensitivity and specificity constant, we simulated accuracy at various risks of death from 0.05 to 0.65, and compared the accuracies of the models.
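The simulation described here follows directly from Eq (10). The sketch below evaluates accuracy over risks of death from 0.05 to 0.65 for two hypothetical models; the sensitivity and specificity values are invented for illustration and are not results of the study, and the 0.05 step size is an assumption.

```python
def accuracy(sensitivity, specificity, risk_of_death):
    """Accuracy as the prevalence-weighted average of Eq (10)."""
    return sensitivity * risk_of_death + specificity * (1 - risk_of_death)


def simulate(sensitivity, specificity, risks):
    """Accuracy at each risk of death, holding sensitivity and specificity fixed."""
    return [(r, accuracy(sensitivity, specificity, r)) for r in risks]


# Risks of death from 0.05 to 0.65 in steps of 0.05.
risks = [round(0.05 * i, 2) for i in range(1, 14)]

# Hypothetical models: one highly specific, one highly sensitive.
specific_model = simulate(0.20, 0.99, risks)
sensitive_model = simulate(0.90, 0.60, risks)
```

With these invented values the highly specific model is more accurate when deaths are rare, and the highly sensitive model overtakes it once the risk of death exceeds roughly 0.36, which illustrates why accuracy curves of different models can cross as the risk of death increases.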

Results

Baseline characteristics

The baseline characteristics including biochemical data are shown in Table 1. No statistically significant differences in the baseline characteristics between the development and validation datasets were observed. Machine learning models were constructed (Fig 5).

Table 1. Baseline characteristics.

Variable | All | Development dataset | Validation dataset | p
N | 79,860 | 39,930 | 39,930 |
Age (years) | 65.7 ± 12.2 | 65.7 ± 12.2 | 65.6 ± 12.2 | 0.28
Male (%) | 49084 (61.5) | 24537 (61.5) | 24547 (61.5) | 0.95
DM (%) | 26154 (32.7) | 13200 (33.1) | 12954 (32.4) | 0.065
CGN (%) | 31758 (39.8) | 15829 (39.6) | 15935 (39.9) | 0.45
Nephrosclerosis (%) | 5343 (6.7) | 2673 (6.7) | 2670 (6.7) | 0.98
CVD (%) | 14998 (18.8) | 7530 (18.9) | 7468 (18.7) | 0.58
BMI (kg/m2) | 21.2 ± 3.4 | 21.2 ± 3.4 | 21.2 ± 3.4 | 0.24
Albumin (g/dL) | 3.7 ± 0.4 | 3.7 ± 0.4 | 3.7 ± 0.4 | 0.99
Sodium (mEq/L) | 138.9 ± 3.2 | 138.9 ± 3.2 | 138.9 ± 3.2 | 0.44
Potassium (mEq/L) | 5 ± 0.8 | 5 ± 0.8 | 5 ± 0.8 | 0.78
Calcium (mg/dL) | 9.3 ± 0.8 | 9.3 ± 0.8 | 9.3 ± 0.8 | 0.65
Phosphorus (mg/dL) | 5.3 ± 1.4 | 5.3 ± 1.4 | 5.3 ± 1.4 | 0.88
Creatinine (mg/dL) | 10.7 ± 2.8 | 10.7 ± 2.8 | 10.7 ± 2.8 | 0.79
Total cholesterol (mg/dL) | 153.7 ± 34.5 | 153.4 ± 34.4 | 154 ± 34.6 | 0.052
CRP (mg/dL) | 0.49 ± 1.42; 0.11 (0.05, 0.34) | 0.5 ± 1.41; 0.11 (0.05, 0.34) | 0.49 ± 1.43; 0.11 (0.05, 0.34) | 0.41
Hemoglobin (g/dL) | 10.4 ± 1.2 | 10.4 ± 1.2 | 10.4 ± 1.2 | 0.92
nPCR (g/kg/day) | 0.88 ± 0.17 | 0.88 ± 0.17 | 0.88 ± 0.17 | 0.91
Vintage (years) | 8.3 ± 6.7; 6.2 (3.3, 11.1) | 8.2 ± 6.7; 6.2 (3.3, 11.1) | 8.3 ± 6.7; 6.3 (3.3, 11.1) | 0.13
Kt/V | 1.4 ± 0.3 | 1.4 ± 0.3 | 1.4 ± 0.3 | 0.68
Ultrafiltration (%) | 4.4 ± 1.8 | 4.4 ± 1.8 | 4.4 ± 1.8 | 0.062
1-year death (%) | 5234 (6.6) | 2649 (6.7) | 2585 (6.5) | 0.37
5-year death (%) | 25410 (31.8) | 12709 (31.8) | 12701 (31.8) | 0.96

Variables are expressed as mean±standard deviation. Vintage and CRP are also shown as median and interquartile range. Intergroup comparisons of parameters were performed using the chi-square test, t-test, and the Mann-Whitney U test as appropriate.

Abbreviations: DM, diabetes mellitus as a cause of end-stage renal disease; CGN, chronic glomerulonephritis; CVD, cardiovascular disease; BMI, body mass index; CRP, C-reactive protein; nPCR, normalized protein catabolic rate.

Fig 5. Development of machine learning models and comparison of their performance.


This study consisted of the model development phase and model validation phase. In the model development phase, the K-means method model, SVM model, ensemble model, DL model, and LRM were developed. In the model validation phase, the performances of the models and NRI were compared. Abbreviations: SVM, support vector machine; DL, deep learning; LRM, logistic regression model; NRI, nutritional risk index.

K-means method models

The K-means method was applied, and models with 2 to 10 clusters were developed. The elbow method showed that the within-cluster SSE decreased with increasing numbers of clusters (Fig 6). Five and six clusters were chosen as candidate numbers of clusters.

Fig 6. Elbow method.


Elbow method shows the relationship between within-cluster SSE and number of clusters. Abbreviation: within-cluster SSE, within-cluster sum of squared errors.

The Kaplan-Meier survival curves showed the relationship between the numbers of clusters and the risk of death (Figs 7 and 8). The five-cluster model clearly distinguished the patients on the basis of the risk of 1-year and 5-year deaths both in the development and validation datasets (Figs 7A, 7B, 8A and 8B; Tables 2A and 3A). Cluster 5 showed the highest risks of 1-year and 5-year deaths.

Fig 7. Association between number of clusters and 1-year mortality.


A. Five clusters in the development dataset. B. Five clusters in the validation dataset. C. Six clusters in the development dataset. D. Six clusters in the validation dataset. The Kaplan-Meier survival curves show that the low-risk group had the highest survival probability in both datasets.

Fig 8. Association between number of clusters and 5-year mortality.


A. Five clusters in the development dataset. B. Five clusters in the validation dataset. C. Six clusters in the development dataset. D. Six clusters in the validation dataset. The Kaplan-Meier survival curves show that the low-risk group had the highest survival probability in both datasets.

Table 2. Clusters and risk of 1-year death.

A
Cluster number | Development dataset | Validation dataset
1 | Reference | Reference
2 | 1.36 (1.16, 1.6) | 1.58 (1.33, 1.88)
3 | 1.83 (1.58, 2.13) | 2.22 (1.9, 2.61)
4 | 3.51 (2.95, 4.18) | 4.36 (3.64, 5.24)
5 | 7.14 (6.27, 8.14) | 8.86 (7.68, 10.21)

B
Cluster number | Development dataset | Validation dataset
1 | Reference | Reference
2 | 1.87 (1.53, 2.28) | 1.58 (1.32, 1.87)
3 | 2.55 (2.12, 3.08) | 1.51 (1.08, 2.11)
4 | 2.89 (2.39, 3.5) | 2.22 (1.89, 2.6)
5 | 4.62 (3.75, 5.69) | 7.2 (5.95, 8.71)
6 | 12.76 (10.72, 15.18) | 8.79 (7.62, 10.13)

A. Five clusters.

B. Six clusters.

Values are HRs with 95% CIs of Clusters 2 to 5 (A) or Clusters 2 to 6 (B) compared with Cluster 1.

Abbreviations: HR, hazard ratio; CI, confidence interval.

Table 3. Clusters and risk of 5-year death.

A
Cluster number | Development dataset | Validation dataset
1 | Reference | Reference
2 | 1.36 (1.27, 1.45) | 1.34 (1.25, 1.42)
3 | 1.98 (1.87, 2.1) | 2.04 (1.92, 2.16)
4 | 2.79 (2.59, 3.01) | 2.8 (2.59, 3.01)
5 | 5.05 (4.78, 5.33) | 4.9 (4.64, 5.18)

B
Cluster number | Development dataset | Validation dataset
1 | Reference | Reference
2 | 1.92 (1.77, 2.08) | 1.34 (1.25, 1.42)
3 | 2.91 (2.7, 3.13) | 1.57 (1.4, 1.77)
4 | 3.11 (2.88, 3.35) | 2.03 (1.92, 2.15)
5 | 3.92 (3.59, 4.28) | 4.28 (3.94, 4.66)
6 | 8.82 (8.2, 9.49) | 4.9 (4.63, 5.17)

A. Five clusters.

B. Six clusters.

Values are HRs with 95% CIs of Clusters 2 to 5 (A) or Clusters 2 to 6 (B) compared with Cluster 1.

Abbreviations: HR, hazard ratio; CI, confidence interval.

In contrast, in the six-cluster model, the rank of the clusters based on the risk of 1-year death in the development dataset differed from the rank in the validation dataset (Table 2B). Although the risk of 1-year death of Cluster 2 (HR, 1.87) was lower than that of Cluster 3 (HR, 2.55) in the development dataset, the risk of Cluster 2 (HR, 1.58) was higher than that of Cluster 3 (HR, 1.51) in the validation dataset. Moreover, Cluster 6 showed the highest risk of 5-year death in the development dataset (Table 3B). However, in the validation dataset, the risk of Cluster 5 was very close to that of Cluster 6 (Table 3B; Fig 8C and 8D), which suggests that the reliability of the six-cluster model in reflecting the patients’ prognosis may depend on the patient data. Therefore, considering the stability of the five-cluster model in reflecting the patients’ prognosis, k = 5 was considered appropriate for the model, and the five-cluster model was hereafter adopted.

Difference in the characteristics among five clusters

The five-cluster model could cluster the patients on the basis of their characteristics (Table 4). The mean ages in Clusters 4 and 5 were higher than those in the other clusters. A gender difference was observed; most of the patients in Cluster 1 were males (94.2%), and most of those in Cluster 2 were females (92.3%). There were also significant differences in the causes of ESKD between the clusters as follows: Clusters 1 and 2, CGN (74.6% and 60.3%, respectively); Cluster 3, DM (93.3%); Cluster 4, nephrosclerosis (100%). In Cluster 5, the proportions of patients with DM and CGN were close to those in the overall study population (Tables 1 and 4). Moreover, the proportions of patients who had a history of CVD were larger in Clusters 3 to 5 than in Clusters 1 and 2.

Table 4. Baseline characteristics of clusters in validation dataset.

Variable | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | p
N (%) | 10358 (25.9) | 8935 (22.4) | 10266 (25.7) | 2660 (6.7) | 7711 (19.3) |
Age (years) | 59.4 ± 12.2 | 64.6 ± 11.6 | 64.7 ± 10.5 | 72.8 ± 11.3 | 74 ± 9 | <0.0001
Male (%) | 9755 (94.2) | 692 (7.7) | 7675 (74.8) | 1708 (64.2) | 4717 (61.2) | <0.0001
DM (%) | 29 (0.3) | 696 (7.8) | 9579 (93.3) | 0 (0) | 2680 (34.8) | <0.0001
CGN (%) | 7723 (74.6) | 5391 (60.3) | 28 (0.3) | 0 (0) | 2793 (36.2) | <0.0001
Nephrosclerosis (%) | 1 (0) | 0 (0) | 0 (0) | 2660 (100) | 9 (0.1) | <0.0001
CVD (%) | 1201 (11.6) | 936 (10.5) | 2247 (21.9) | 622 (23.4) | 2462 (31.9) | <0.0001
BMI (kg/m2) | 21.8 ± 3.1 | 19.7 ± 2.8 | 22.8 ± 3.5 | 21.1 ± 3.4 | 20 ± 3.1 | <0.0001
Albumin (g/dL) | 3.9 ± 0.3 | 3.8 ± 0.3 | 3.8 ± 0.3 | 3.7 ± 0.4 | 3.3 ± 0.4 | <0.0001
Sodium (mEq/L) | 139.4 ± 2.9 | 139.4 ± 3.0 | 138.6 ± 3.1 | 139 ± 3.2 | 138.2 ± 3.8 | <0.0001
Potassium (mEq/L) | 5.3 ± 0.7 | 5.2 ± 0.7 | 5.1 ± 0.8 | 4.9 ± 0.8 | 4.4 ± 0.7 | <0.0001
Calcium (mg/dL) | 9.4 ± 0.8 | 9.4 ± 0.8 | 9.1 ± 0.7 | 9.3 ± 0.8 | 9.4 ± 0.8 | <0.0001
Phosphorus (mg/dL) | 5.8 ± 1.3 | 5.3 ± 1.3 | 5.5 ± 1.3 | 5.1 ± 1.3 | 4.3 ± 1.1 | <0.0001
Creatinine (mg/dL) | 13.2 ± 2.3 | 10.2 ± 1.8 | 10.6 ± 2.3 | 10 ± 2.8 | 8.1 ± 2.2 | <0.0001
Total cholesterol (mg/dL) | 146.5 ± 30.5 | 171 ± 34.6 | 152 ± 33.9 | 155.5 ± 34.2 | 146.4 ± 34.1 | <0.0001
CRP (mg/dL) | 0.3 ± 0.9; 0.1 (0.1, 0.3) | 0.3 ± 0.9; 0.1 (0, 0.2) | 0.3 ± 1.0; 0.1 (0.1, 0.3) | 0.6 ± 1.5; 0.1 (0.1, 0.4) | 1.2 ± 2.5; 0.3 (0.1, 1.1) | <0.0001
Hemoglobin (g/dL) | 10.8 ± 1.2 | 10.4 ± 1.1 | 10.6 ± 1.1 | 10.4 ± 1.2 | 9.9 ± 1.3 | <0.0001
nPCR (g/kg/day) | 0.9 ± 0.2 | 1 ± 0.2 | 0.9 ± 0.2 | 0.9 ± 0.2 | 0.7 ± 0.1 | <0.0001
Vintage (years) | 10.8 ± 7.3; 9 (5.1, 14.8) | 11.2 ± 7.6; 9.4 (5.3, 15.6) | 4.9 ± 3.3; 4.1 (2.4, 6.7) | 5.5 ± 4.1; 4.3 (2.6, 7.2) | 7.1 ± 6.2; 5.1 (2.8, 9.3) | <0.0001
Kt/V | 1.4 ± 0.2 | 1.7 ± 0.3 | 1.3 ± 0.2 | 1.4 ± 0.3 | 1.4 ± 0.3 | <0.0001
Ultrafiltration (%) | 4.5 ± 1.5 | 4.9 ± 1.7 | 4.4 ± 1.6 | 4.2 ± 1.8 | 3.6 ± 2.2 | <0.0001
NRI | | | | | | <0.0001
Low risk (%) | 9220 (89.0) | 6531 (73.1) | 8321 (81.1) | 1812 (68.1) | 2820 (36.6) |
Medium risk (%) | 1003 (9.7) | 1927 (21.6) | 1713 (16.7) | 597 (22.4) | 2839 (36.8) |
High risk (%) | 135 (1.3) | 477 (5.3) | 232 (2.3) | 251 (9.4) | 2052 (26.6) |

Variables are expressed as mean±standard deviation. Vintage and CRP are also shown as median and interquartile range. Intergroup comparisons of parameters were performed using the chi-square test, t-test, and the Mann-Whitney U test as appropriate.

Abbreviations: DM, diabetes mellitus as a cause of end-stage renal disease; CGN, chronic glomerulonephritis; CVD, cardiovascular disease; BMI, body mass index; CRP, C-reactive protein; nPCR, normalized protein catabolic rate; NRI, nutritional risk index.

There were significant differences in the laboratory data among the clusters. Serum albumin and potassium levels gradually decreased with increasing cluster number. The serum phosphorus and creatinine levels and nPCR in Cluster 5 were lower than those in the other clusters. The proportions of patients at high and medium risk according to NRI were larger in Clusters 4 and 5 than in the other clusters. The CRP levels in Clusters 4 and 5 were higher than those in the other clusters.

The risk of all-cause death in Cluster 5 was higher than those in the other clusters (Table 5). Trends similar to that for all-cause death were observed in the risks of CVD- and infection-caused deaths. The details of 5-year death were as follows. The proportions of CVD-caused death among 5-year deaths were Cluster 1, 27.3%; Cluster 2, 27.4%; Cluster 3, 26.5%; Cluster 4, 27.4%; and Cluster 5, 29.8% (p<0.0001). Those of infection-caused death were Cluster 1, 5.3%; Cluster 2, 4.6%; Cluster 3, 3.9%; Cluster 4, 6.6%; and Cluster 5, 8.0% (p<0.0001).

Table 5. Number of endpoints in validation dataset.

Endpoint | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | p
N | 10358 | 8935 | 10266 | 2660 | 7711 |
1-year death (%) | 221 (2.1) | 299 (3.3) | 482 (4.7) | 240 (9) | 1343 (17.4) | <0.0001
5-year death (%) | 1784 (17.2) | 1993 (22.3) | 3299 (32.1) | 1084 (40.8) | 4541 (58.9) | <0.0001
Cause of 5-year death
CVD-caused death (%) | 487 (4.7) | 547 (6.1) | 875 (8.5) | 297 (11.2) | 1354 (17.6) | <0.0001
Infection-caused death (%) | 94 (0.9) | 91 (1.0) | 130 (1.3) | 72 (2.7) | 362 (4.7) | <0.0001
Other-cause death (%) | 1203 (11.6) | 1355 (15.2) | 2294 (22.3) | 715 (26.9) | 2825 (36.6) | <0.0001

The values are number of deaths (%).

Abbreviations: CVD, cardiovascular disease.

Performance of models to predict death

The five-cluster model had four cutoff points. The accuracies of predicting death on the basis of these cutoff points were compared with those of the LRM, SVM model, ensemble model, deep learning model, and the high-risk group of NRI (Fig 9). The accuracies of predicting 1-year and 5-year deaths using LRM (0.938, 0.759), SVM model (0.937, 0.758), ensemble model (0.948, 0.755), and deep learning model (0.936, 0.756) were higher than those using the five-cluster model with Cluster 4 as the cutoff point (0.809, 0.716). The ensemble model showed the highest accuracy to predict 1-year death. The accuracies of predicting 5-year death using LRM, SVM model, ensemble model, and deep learning model were almost the same, and were higher than those using the five-cluster models and NRI.
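The values reported for the cluster cutoff points can be checked against the counts in Table 5. The sketch below assumes that "Cluster c as the cutoff point" means that death is predicted for patients in clusters numbered above c; this reading is an inference that reproduces the reported figures and is not stated explicitly in the text. The function and variable names are hypothetical.

```python
# Validation-dataset counts per cluster (Clusters 1 to 5), from Table 5.
N_PER_CLUSTER = [10358, 8935, 10266, 2660, 7711]
DEATHS_1_YEAR = [221, 299, 482, 240, 1343]
DEATHS_5_YEAR = [1784, 1993, 3299, 1084, 4541]


def cutoff_metrics(n, deaths, cutoff):
    """Sensitivity, specificity, and accuracy for a cluster cutoff.

    Assumption: patients in clusters numbered above `cutoff` are
    predicted to die; the remaining patients are predicted to survive.
    """
    true_pos = sum(deaths[cutoff:])
    false_neg = sum(deaths[:cutoff])
    true_neg = sum(n[i] - deaths[i] for i in range(cutoff))
    false_pos = sum(n[i] - deaths[i] for i in range(cutoff, len(n)))
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    acc = (true_pos + true_neg) / sum(n)
    return sensitivity, specificity, acc


sens_1y_c1, _, _ = cutoff_metrics(N_PER_CLUSTER, DEATHS_1_YEAR, cutoff=1)
sens_5y_c1, _, _ = cutoff_metrics(N_PER_CLUSTER, DEATHS_5_YEAR, cutoff=1)
_, spec_1y_c4, acc_1y_c4 = cutoff_metrics(N_PER_CLUSTER, DEATHS_1_YEAR, cutoff=4)
_, spec_5y_c4, acc_5y_c4 = cutoff_metrics(N_PER_CLUSTER, DEATHS_5_YEAR, cutoff=4)
```

Under this reading, the cutoff at Cluster 4 gives accuracies of about 0.809 and 0.716 and specificities of about 0.829 and 0.884 for 1-year and 5-year deaths, and the cutoff at Cluster 1 gives sensitivities of about 0.915 and 0.860, in agreement with the values reported for the five-cluster model.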

Fig 9. Accuracies of predicting 1-year and 5-year deaths.


A. 1-year death. B. 5-year death. The accuracies of the models were evaluated using the validation dataset. Abbreviations: 5 Cls 1, the cutoff point was the first cluster of five clusters; LRM, multivariate logistic regression model; SVM, support vector machine; Ensemble, ensemble model of K-means method and SVM models; DL, deep learning model; NRI, nutritional risk index.

The estimated accuracies of the models decreased with increasing risks of 1-year and 5-year deaths (Fig 10). The lines of the accuracies of predicting 1-year death crossed at a risk of 1-year death of about 0.3 (Fig 10A). The accuracies for 1-year death using the LRM, SVM model, ensemble model, and deep learning model showed similar patterns, and were more than 0.9 at a risk of 1-year death of 0.1, which was higher than those using the five-cluster model. Moreover, for the prediction of 5-year death, the accuracies of the LRM, SVM model, ensemble model, and deep learning model were higher than that of the five-cluster model (Fig 10B). The accuracies of the deep learning model, LRM, SVM, and ensemble model were almost the same, more than 0.7, when the risk of 5-year death was about 0.4.

Fig 10. Accuracies and risk of 1-year and 5-year deaths.


A. 1-year death. B. 5-year death. The accuracies of the models were calculated at various risks of 1-year and 5-year death. Abbreviations: 5 Cls, the cutoff point was the fourth cluster of the five clusters; LRM, multivariate logistic regression model; SVM, support vector machine; ensemble, ensemble model of K-means method and SVM models; DL, deep learning model; NRI, nutritional risk index.

The sensitivities and specificities of the models showed a negative relationship at different cutoff points for clusters (Fig 11). To predict both 1- and 5-year deaths, the five-cluster model with Cluster 1 as the cutoff point showed higher sensitivities (0.915, 0.860) than the other models. The sensitivities of the LRM, SVM model, ensemble model, deep learning model, and NRI were low. On the other hand, the specificities of the five-cluster model with Cluster 4 as the cutoff point were high (1-year death, 0.829; 5-year death, 0.884), but lower than those of other models (LRM, 0.996, 0.890; SVM model, 0.999, 0.919; ensemble model, 0.999, 0.910; deep learning model, 0.998, 0.855; NRI, 0.937, 0.961, for 1- and 5-year deaths, respectively).

Fig 11. Sensitivities and specificities of models to predict risks of 1-year and 5-year deaths.


A. 1-year death. B. 5-year death. Abbreviations: 5 Cls 1, the cutoff point was the first cluster of five clusters; LRM, multivariate logistic regression model; SVM, support vector machine; ensemble, ensemble model of K-means method and SVM models; DL, deep learning model; NRI, nutritional risk index.

Total-care system for hemodialysis patients

Considering the characteristics of the machine learning models, we adopted an ensemble model with the K-means method and SVM for use in clinical settings. Our recommended system is as follows (Fig 12): After clustering, the patients in Clusters 1 to 3 are followed up periodically, because some of them may be classified in Cluster 4 or 5 in the future. The patients in Clusters 4 and 5 are examined using the SVM models. If they are judged to be at a high risk, they undergo detailed medical examinations. If diseases or aggravation of comorbid conditions are diagnosed, intervention and therapy are provided. If not, they are followed up as high-risk patients more frequently and thoroughly than the patients in Clusters 1 to 3.

Fig 12. Diagnostic system using ensemble model.

Our system is as follows: a. Extract high-risk patients (Clusters 4 and 5) on the basis of sensitivity, specificity, and characteristics of the clusters. b. Examine in detail patients in Clusters 4 and 5 using SVM models, and routinely follow up depending on their risk of death. The blue part can be examined using the ensemble model. Abbreviation: SVM, support vector machine models.
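The flow of this system for a single patient can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two standardized features, the cluster centers in `CENTERS`, the linear SVM parameters `SVM_WEIGHTS` and `SVM_BIAS`, and all function names are hypothetical, and the handling of Cluster 4 or 5 patients whom the SVM does not flag is our reading of the text.

```python
import math

# Hypothetical values for illustration only: two standardized features,
# five cluster centers, and the parameters of a linear SVM.
CENTERS = {
    1: (-1.2, 0.8),
    2: (-0.6, 0.4),
    3: (0.0, 0.0),
    4: (0.9, -0.3),
    5: (1.4, -1.5),
}
HIGH_RISK_CLUSTERS = {4, 5}
SVM_WEIGHTS = (0.8, -1.1)
SVM_BIAS = -1.5


def assign_cluster(x):
    """Return the cluster whose center is nearest to x (K-means assignment step)."""
    return min(CENTERS, key=lambda j: math.dist(x, CENTERS[j]))


def svm_high_risk(x):
    """Linear SVM decision rule: the positive side of the hyperplane means high risk."""
    score = sum(w * xi for w, xi in zip(SVM_WEIGHTS, x)) + SVM_BIAS
    return score > 0


def triage(x):
    """Return the cluster number and the recommended next step for one patient."""
    cluster = assign_cluster(x)
    if cluster not in HIGH_RISK_CLUSTERS:
        return cluster, "periodic follow-up"
    if svm_high_risk(x):
        return cluster, "detailed medical examination"
    # Assumed branch: the text does not spell out this case explicitly.
    return cluster, "frequent follow-up as a high-risk patient"


for patient in [(-1.0, 0.9), (1.5, -1.4), (0.8, -0.2)]:
    print(patient, triage(patient))
```

Keeping the clustering step separate from the SVM step means that the cluster assignment, which is the interpretable part of the ensemble, can be inspected for every patient before any SVM output is considered.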

Discussion

Many types of machine learning models have internal mechanisms that cannot be fully understood by humans, and are therefore called black boxes. For this reason, explainable machine learning models have been studied. Among the types of machine learning, K-means is based on the least-squares method, and is more understandable than other models. Moreover, SVM can be used to predict patients’ prognosis. In this work, we developed an explainable ensemble model for the prediction of patients’ prognosis, which was composed of K-means and SVM. Hemodialysis patients were categorized into five clusters by the K-means method on the basis of their baseline characteristics, and these clusters reflected the risk of death. Then, we developed machine learning and statistical models, and compared their performances. The ensemble model of the K-means method and SVM showed the highest accuracy in predicting death. Although some studies showed a high accuracy of the prediction of dialysis patients’ death using machine learning models, the internal structures of the models were difficult to understand [11–13]. There is a tradeoff between the accuracy of prediction and the transparency of algorithms [14]. We attempted to achieve a balance by developing a blended system, which we found useful for identifying patients at a high risk of death, and which was easily applicable to clinical settings.

The International Society of Renal Nutrition and Metabolism proposed an algorithm for the nutritional management and support of chronic kidney disease patients [15]. In the algorithm, multiple nutritional examinations, such as measurement of dietary nutritional intakes, subjective global assessment, and anthropometrics, are recommended [15]. However, it is difficult for all of these nutritional examination results to be digitized and evaluated by machine learning models. Moreover, a systematic review of the studies of the data-driven population segmentation analysis pointed out that a perfect diagnosis is not always guaranteed; and the review suggested the importance of assessing the segmentation outcome with a combination of statistical reasoning, clinical judgement, and policy implication [16]. Therefore, we did not leave the entire diagnosis to be performed by a machine learning system, and instead developed the ensemble model as part of the medical system. The ensemble model and detailed medical examinations can complement each other, which enhances the robustness of this system.

According to the JSDT annual report in 2015, the mean age of Japanese dialysis patients was 67.86 years, 64.3% were male, and the causes of ESKD were DM (38.4%), CGN (29.8%), and nephrosclerosis (9.5%) [1]. Considering these basic statistics, our system could divide the patients into the five clusters reflecting their baseline characteristics (Table 6). These characteristics were risk factors for death [3, 4, 6]. For example, the risks of all-cause death and of CVD- and infection-caused deaths in Cluster 5 were higher than those in the other clusters. Cluster 5 also showed lower serum albumin and creatinine levels and lower nPCR, which are nutritional factors, than the other clusters, and 26.6% and 36.8% of its patients were at high and medium risk according to the NRI, respectively. Moreover, a high serum CRP level, which indicates inflammation, was also observed in Cluster 5. Inflammation is often observed in ESKD patients with malnutrition, and this complex state of malnutrition and inflammation is called protein energy wasting (PEW) [5]. PEW causes CVD, which is a risk factor for death [5, 8]. The classification of an elderly patient with PEW into Cluster 5 indicates that the treatment of PEW should be of the highest priority.

Table 6. Specific characteristics of clusters.

Cluster number Specific characteristics
1 Young, male, CGN
2 Female, CGN
3 DM
4 Elderly, nephrosclerosis
5 Elderly, CVD, malnutrition, inflammation

Abbreviations: CGN, chronic glomerulonephritis as a cause of end-stage renal disease; DM, diabetes mellitus as a cause of end-stage renal disease; CVD, cardiovascular disease.

Our system could clearly distinguish patients with DM (Cluster 3) or nephrosclerosis (Cluster 4) from those with other conditions. The patients in Cluster 4 showed a higher risk of death than those in Cluster 3. What factor made this difference? Both DM and aging are the main causes of CVD in hemodialysis patients [17]. According to a systematic review, they are risk factors for all-cause and CVD-caused deaths [18]. In our study, no clear differences between Clusters 3 and 4 were observed in the other risk factors reported in the systematic review, such as history of CVD, BMI, hemoglobin level, and serum albumin and CRP levels [18]. The only factors that differed between these clusters were the cause of ESKD and age; patients in Cluster 4 were about 8 years older than those in Cluster 3. DOPPS showed no statistically significant difference in mortality rate between patients with DM and those with hypertension [19]. It is therefore possible that age itself caused the survival difference. DM has been the leading cause of ESKD in Japan, and the number of dialysis patients with DM has been stable over the past few years [1]. In contrast, nephrosclerosis is caused by aging and hypertension, and the number of patients with nephrosclerosis has been increasing with the aging of the population in Japan [1]. More attention should be paid to elderly patients with nephrosclerosis, because they are at a high risk of death and will be a majority among dialysis patients in the near future.

Similar to our study, a cohort study of the health care system in Singapore showed a relationship among K-means clusters, healthcare utilization pattern, and mortality [20]. Why do the clusters obtained by the K-means method reflect the patients’ prognosis in the Singapore study and our study? The cluster centers were obtained using Eq (5). μj is a vector equal to the mean of all data of patients in Cluster j. That is, patients in Cluster j are distributed in an m-dimensional sphere with the center at μj. In this study, the number of clusters was determined by the links with the risk of death as an important true endpoint, which showed that μj was strongly associated with the risk of death. On this theoretical basis, each cluster had specific characteristics of risk factors for death, such as gender, causes of ESKD, and PEW (Table 6). In risk prediction models using standard statistics, the variables are often arbitrarily selected, whereas in machine learning, patients’ features are extracted from their numerical data, even when a human does not provide sufficient information. There is a possibility that this feature extraction can clarify new pathophysiological characteristics of diseases. For example, the five clusters in this study, which had different numerical features, may have different courses of change in their body condition after dialysis initiation. Thus, machine learning may uncover new, previously unknown research seeds.
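The statement that each cluster center is the mean of the data of the patients in that cluster can be made concrete with a minimal sketch of the standard K-means (Lloyd) iteration, which alternates between assigning each point to its nearest center and recomputing each center as the mean of its points, thereby reducing the within-cluster sum of squared distances. The function names, the two-dimensional toy points, and the starting centers are illustrative assumptions and are unrelated to the study data or to the software the authors used.

```python
import math
from statistics import fmean


def centroid(points):
    """Component-wise mean of the points, i.e. the cluster center."""
    return tuple(fmean(column) for column in zip(*points))


def kmeans(points, centers, n_iter=10):
    """Plain Lloyd iterations: assign to the nearest center, then recompute means."""
    groups = []
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Invented toy data with two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, groups = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Because each center is simply an average of patient records, it can be read as the profile of a typical patient in that cluster, which is what makes the clustering step comparatively easy to interpret.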

The performance of machine learning is often evaluated by the accuracy of classification. When using the validation data, the ensemble model showed a higher accuracy of the prediction of death than the other models. The internal workings of machine learning models such as SVM and deep learning models are a black box [21]. Because our ensemble model was composed of the K-means method and SVM model, this combined system of classification and prediction made the results interpretable while maintaining high accuracy, and closely matched the clinical decision-making process. Practical applications of this kind of machine learning model have not previously been reported.

Because accuracy depends on the number of incident events, it changes with the composition of the sample population. Thus, we evaluated the changes in the accuracies of the models with the changes in the risk of death. The risks of 1-year death in Japan and USA are 9.6% and 13.4%, and those of 5-year death in Japan, Italy, and USA are 39.5%, 44.4%, and 58%, respectively [1, 2, 22, 23]. In the simulation using these populations, the machine learning models showed high accuracies and effectively predicted the prognosis of ESKD patients.
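The dependence of accuracy on the event rate follows from the identity accuracy = sensitivity × risk + specificity × (1 − risk). The sketch below applies this identity to the 1-year risks of death quoted above; the sensitivity and specificity values are placeholders chosen for illustration and are not results of this study.

```python
def accuracy(sensitivity, specificity, risk):
    """Overall accuracy as a risk-weighted mix of sensitivity and specificity."""
    return sensitivity * risk + specificity * (1.0 - risk)


# 1-year risks of death quoted in the text.
one_year_risk = {"Japan": 0.096, "USA": 0.134}

# Placeholder test characteristics, assumed for illustration only.
SENSITIVITY, SPECIFICITY = 0.40, 0.95

acc = {country: accuracy(SENSITIVITY, SPECIFICITY, risk)
       for country, risk in one_year_risk.items()}
for country, value in acc.items():
    print(f"{country}: accuracy {value:.4f}")
```

With the sensitivity below the specificity, as in this placeholder setting, the same model shows a lower accuracy in the population with the higher risk of death, which is why accuracies need to be compared across populations with different event rates.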

The classification performance of diagnostic tests is commonly evaluated in terms of sensitivity and specificity. The machine learning models in this study showed high specificity for predicting 1-year and 5-year deaths. High specificity means that the models produce a small number of false-positive patients. That is, when the models classify a patient as positive for a risk, the probability that a disease is present is high. Therefore, models with high specificities are useful for confirming a diagnosis. On the other hand, because the sensitivities of the SVM and deep learning models were low, they were not appropriate for screening high-risk patients. The sensitivities of the K-means method using clusters were higher than those of the other models. The clusters might therefore be useful for identifying high-risk patients.

Our system is applicable to clinical settings, provided that its limitations are taken into account. First, in this study, JRDR data were used. These data were obtained from 98.8% of dialysis patients in Japan, and therefore reflect the real-world situation of dialysis patients in Japan. Because our system was developed using these data, its accuracy for Japanese or Asian patients is high, but the results using data from other countries might be biased by the sampling of patients. Second, we did not include patients with missing data in this study, which might have caused a selection bias. Third, the JRDR data did not include sufficient data for assessing malnutrition, blood pressure, comorbid conditions, and medications. As a result, we were unable to evaluate the effects on the clustering of differences in baseline characteristics such as dietary intake, comorbid conditions such as DM and hypertension, and medications such as hypoglycemic and antihypertensive medicines. Further studies are needed to evaluate the relationship between these factors and clustering; such data might also improve the accuracy of the models.

Conclusions

We developed a novel system using machine learning algorithms that analyzes hemodialysis patients’ data, categorizes the patients on the basis of their characteristics, and identifies patients at a high risk of death. The new approach has a strong potential to guide treatments and improve hemodialysis patients’ prognosis.

Acknowledgments

The data reported here have been provided by JSDT. The interpretation and reporting of these data are the responsibility of the authors and should in no way be seen as an official policy or interpretation of the JSDT.

Data Availability

Data cannot be made publicly available by the authors, as they are owned by the Japanese Society for Dialysis Therapy. Interested readers may request the data at the following URL: http://www.jsdt.or.jp/jsdt/1761.html.

Funding Statement

This work was supported by Japan Society for the Promotion of Science (KAKENHI Grant Number JP 19K08740) to EK. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Masakane I, Taniguchi M, Nakai S, Tsuchida K, Goto S, Wada A, et al. Annual Dialysis Data Report 2015, JSDT Renal Data Registry. Renal Replacement Therapy. 2018;4(19):1–99.
  • 2. Saran R, Robinson B, Abbott KC, Agodoa LYC, Bragg-Gresham J, Balkrishnan R, et al. US Renal Data System 2018 Annual Data Report: Epidemiology of Kidney Disease in the United States. Am J Kidney Dis. 2019;73(3S1):A7–A8. Epub 2019/02/21. doi: 10.1053/j.ajkd.2019.01.001
  • 3. Robinson BM, Akizawa T, Jager KJ, Kerr PG, Saran R, Pisoni RL. Factors affecting outcomes in patients reaching end-stage kidney disease worldwide: differences in access to renal replacement therapy, modality use, and haemodialysis practices. Lancet. 2016;388(10041):294–306. Epub 2016/05/22. doi: 10.1016/S0140-6736(16)30448-2
  • 4. Bradbury BD, Fissell RB, Albert JM, Anthony MS, Critchlow CW, Pisoni RL, et al. Predictors of early mortality among incident US hemodialysis patients in the Dialysis Outcomes and Practice Patterns Study (DOPPS). Clin J Am Soc Nephrol. 2007;2(1):89–99. doi: 10.2215/CJN.01170905
  • 5. Fouque D, Kalantar-Zadeh K, Kopple J, Cano N, Chauveau P, Cuppari L, et al. A proposed nomenclature and diagnostic criteria for protein-energy wasting in acute and chronic kidney disease. Kidney Int. 2008;73(4):391–8. doi: 10.1038/sj.ki.5002585
  • 6. Kanda E, Bieber BA, Pisoni RL, Robinson BM, Fuller DS. Importance of simultaneous evaluation of multiple risk factors for hemodialysis patients' mortality and development of a novel index: dialysis outcomes and practice patterns study. PLoS One. 2015;10(6):e0128652. Epub 2015/06/01. doi: 10.1371/journal.pone.0128652
  • 7. Kanda E, Tsuruta Y, Kikuchi K, Masakane I. Use of vasopressor for dialysis-related hypotension is a risk factor for death in hemodialysis patients: Nationwide cohort study. Sci Rep. 2019;9(1):3362. Epub 2019/03/04. doi: 10.1038/s41598-019-39908-6
  • 8. Kanda E, Kato A, Masakane I, Kanno Y. A new nutritional risk index for predicting mortality in hemodialysis patients: Nationwide cohort study. PLoS One. 2019;14(3):e0214524. Epub 2019/03/28. doi: 10.1371/journal.pone.0214524
  • 9. Obermeyer Z, Emanuel EJ. Predicting the Future—Big Data, Machine Learning, and Clinical Medicine. N Engl J Med. 2016;375(13):1216–9. doi: 10.1056/NEJMp1606181
  • 10. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56. Epub 2019/01/07. doi: 10.1038/s41591-018-0300-7
  • 11. Akbilgic O, Obi Y, Potukuchi PK, Karabayir I, Nguyen DV, Soohoo M, et al. Machine Learning to Identify Dialysis Patients at High Death Risk. Kidney Int Rep. 2019;4(9):1219–29. Epub 2019/06/22. doi: 10.1016/j.ekir.2019.06.009
  • 12. Mezzatesta S, Torino C, Meo P, Fiumara G, Vilasi A. A machine learning-based approach for predicting the outbreak of cardiovascular diseases in patients on dialysis. Comput Methods Programs Biomed. 2019;177:9–15. Epub 2019/05/13. doi: 10.1016/j.cmpb.2019.05.005
  • 13. Jacob AN, Khuder S, Malhotra N, Sodeman T, Gold JP, Malhotra D, et al. Neural network analysis to predict mortality in end-stage renal disease: application to United States Renal Data System. Nephron Clin Pract. 2010;116(2):c148–58. Epub 2010/06/01. doi: 10.1159/000315884
  • 14. He J, Baxter SL, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med. 2019;25(1):30–6. Epub 2019/01/07. doi: 10.1038/s41591-018-0307-0
  • 15. Ikizler TA, Cano NJ, Franch H, Fouque D, Himmelfarb J, Kalantar-Zadeh K, et al. Prevention and treatment of protein energy wasting in chronic kidney disease patients: a consensus statement by the International Society of Renal Nutrition and Metabolism. Kidney Int. 2013;84(6):1096–107. doi: 10.1038/ki.2013.147
  • 16. Yan S, Kwan YH, Tan CS, Thumboo J, Low LL. A systematic review of the clinical application of data-driven population segmentation analysis. BMC Med Res Methodol. 2018;18(1):121. Epub 2018/11/03. doi: 10.1186/s12874-018-0584-9
  • 17. Cozzolino M, Mangano M, Stucchi A, Ciceri P, Conte F, Galassi A. Cardiovascular disease in dialysis patients. Nephrol Dial Transplant. 2018;33(suppl_3):iii28–iii34. doi: 10.1093/ndt/gfy174
  • 18. Ma L, Zhao S. Risk factors for mortality in patients undergoing hemodialysis: A systematic review and meta-analysis. Int J Cardiol. 2017;238:151–8. Epub 2017/02/22. doi: 10.1016/j.ijcard.2017.02.095
  • 19. Bradbury B, Fissell R, Albert J, Anthony M, Critchlow C, Pisoni R, et al. Predictors of early mortality among incident US hemodialysis patients in the Dialysis Outcomes and Practice Patterns Study (DOPPS). Clin J Am Soc Nephrol. 2007;2(1):89–99. doi: 10.2215/CJN.01170905
  • 20. Low LL, Yan S, Kwan YH, Tan CS, Thumboo J. Assessing the validity of a data driven segmentation approach: A 4 year longitudinal study of healthcare utilization and mortality. PLoS One. 2018;13(4):e0195243. Epub 2018/04/05. doi: 10.1371/journal.pone.0195243
  • 21. Castelvecchi D. Can we open the black box of AI? Nature. 2016;538(7623):20–3. doi: 10.1038/538020a
  • 22. Masakane I, Nakai S, Ogata S, Kimata N, Hanafusa N, Hamano T, et al. Annual Dialysis Data Report 2014 JSDT Renal Data Registry (JRDR). 2017;3(18):1–43.
  • 23. Nordio M, Limido A, Maggiore U, Nichelatti M, Postorino M, Quintaliani G, et al. Survival in patients treated by long-term dialysis compared with the general population. Am J Kidney Dis. 2012;59(6):819–28. Epub 2012/02/22. doi: 10.1053/j.ajkd.2011.12.023

Decision Letter 0

Kojiro Nagai

22 Apr 2020

PONE-D-20-09660

Application of explainable ensemble artificial intelligence model to categorization of hemodialysis-patient and treatment using nationwide-real-world data in Japan

PLOS ONE

Dear Professor Kanda,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The system the authors proposed is promising, but some questions remain to be answered in order to clarify its importance. Please read the reviewers' comments and let us know your opinion.

In addition, the journal staff provided the comment as follows.

Please clarify whether the manuscript meets the PLOS ONE criteria for papers that describe new methods or software for applications. Specifically, these reports must meet the criteria of utility, validation, and availability, which are described in detail at http://journals.plos.org/plosone/s/submission-guidelines#loc-methods-software-databases-and-tools.

Please include the reply to the comment when you send the revised manuscript.

We would appreciate receiving your revised manuscript by Jun 06 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Kojiro Nagai

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

3. Thank you for stating the following financial disclosure:

'The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.'

At this time, please address the following queries:

  1. Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

  2. State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

  3. If any authors received a salary from any of your funders, please state which authors and which funders.

  4. If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Additional Editor Comments (if provided):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General comments

This manuscript described the construction and evaluation of artificial intelligence model to categorization of hemodialysis-patient for survival rate using nationwide-real-world data in Japan. This study contains some novel factors. However, there are several concerns that should be addressed.

Comments

1. This clustering system might be useful for identifying the high-risk patients. However, does the simplified clustering system (specific characteristics of clusters) reflect the results derived by machine learning?

2. If we want a prognostic prediction model for an individual patient, can we make a prognostic prediction by clustering information from the patient in front of us? I would also like to see a clear distinction between the scoring system and the clustering system.

Reviewer #2: The manuscript by Kanda E et al. investigated the characteristics of hemodialysis patients using machine learning model, and its usefulness for screening hemodialysis patients at a high risk of one-year death using the nation-wide database of the Japanese Society for Dialysis Therapy (JSDT), and found that the five clusters clearly distinguished the groups on the basis of their characteristics and reflected their prognosis. The paper is well written and the topic is important for analyzing the present and future situations of hemodialysis patients in Japan. I have some comments as follows:

1. The mean ages of clusters 4 and 5 were older and the C-reactive protein levels of them higher than those of other groups. Therefore, the authors should discuss the cause of death because it seems to be different in the five clusters.

2. Did the dataset of the JSDT include data on blood pressure and medication use? If not, please describe in the discussion. If included, please explain why the authors did not use them as the baseline characteristics in their model.

3. Minor: page 7, line 13; albuminlevel should be albumin level.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 May 29;15(5):e0233491. doi: 10.1371/journal.pone.0233491.r002

Author response to Decision Letter 0


28 Apr 2020

PLOS ONE

Academic Editor

Kojiro Nagai

Dear Dr. Nagai:

Thank you very much for your letter with the reviewers’ comments and for your helpful remarks on our paper. We have revised the manuscript in accordance with the reviewers’ comments and suggestions. The changes are shown as tracked changes. This cover letter includes our point-by-point responses to the reviewers’ comments, which we hope have been addressed satisfactorily.

Reviewer #1: General comments

This manuscript described the construction and evaluation of artificial intelligence model to categorization of hemodialysis-patient for survival rate using nationwide-real-world data in Japan. This study contains some novel factors. However, there are several concerns that should be addressed.

>> Thank you very much for your comments.

Comments

1. This clustering system might be useful for identifying the high-risk patients. However, does the simplified clustering system (specific characteristics of clusters) reflect the results derived by machine learning?

>> Thank you very much for your question. Many types of machine learning models have internal mechanisms that cannot be fully understood by humans, and are therefore called black boxes. For this reason, explainable machine learning models have been studied. K-means is based on the least-squares method, and is more understandable than other models. Moreover, the support vector machine (SVM) can be used to accurately predict patients’ prognosis. In this work, we developed an explainable ensemble model for the prediction of patients’ prognosis, which is composed of K-means and SVM. Thus, because K-means (the clustering system) is part of the ensemble model, the clusters represent intermediate results of the ensemble model. Considering that the models may be difficult for readers to understand, the above explanation was added to the Discussion: Page 28, paragraph 1.

2-1. If we want a prognostic prediction model for an individual patient, can we make a prognostic prediction by clustering information from the patient in front of us?

>> Considering your question No. 1, because the clustering was part of the ensemble model, not only the clustering but also analyses by SVM are necessary to accurately evaluate patients’ prognosis. The trained ensemble model can be easily applied to other patients in clinical settings.

2-2. I would also like to see a clear distinction between the scoring system and the clustering system.

>> In this study, clustering is part of the analysis performed by the ensemble model. We therefore answer this question by explaining the difference between a risk scoring system and machine learning.

The standard scoring system is usually developed using a logistic regression model or a Cox proportional hazards model. These models are constructed on the basis of statistical assumptions. For example, an ideal population is assumed, and there are restrictions such as the number of variables that can be included in a model, the distribution of the error, proportional hazards, and so forth. Moreover, the variables are often used in a very simple linear model, such as β1x1 + β2x2 + ... + βnxn. Therefore, scoring systems are restricted by many statistical assumptions, which limits their prediction accuracy.
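As a concrete illustration of such a linear scoring system, the sketch below computes the linear predictor β1x1 + β2x2 + ... + βnxn (plus an intercept) and converts it to a probability with the logistic function, as a logistic regression model would. The variable names, coefficients, and intercept are hypothetical values chosen for illustration only; they are not estimates from the study.

```python
import math

# Hypothetical coefficients and intercept; not taken from the paper.
COEFFICIENTS = {"age_per_10y": 0.40, "diabetes": 0.30, "albumin_g_dl": -0.80}
INTERCEPT = -3.0

def linear_predictor(patient, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Intercept plus the weighted sum beta_1*x_1 + ... + beta_n*x_n."""
    return intercept + sum(beta * patient[name] for name, beta in coefficients.items())

def logistic_risk(patient):
    """Map the linear predictor to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(patient)))

# Hypothetical patient: 70 years old, diabetic, serum albumin 3.5 g/dL.
example_patient = {"age_per_10y": 7.0, "diabetes": 1, "albumin_g_dl": 3.5}
print(linear_predictor(example_patient), logistic_risk(example_patient))
```

Each variable contributes to the score independently and in proportion to its coefficient, which is what makes such a score easy to read but also restricts it to linear effects.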

On the other hand, in machine learning, a population is not assumed; there is no assumption about the form of the model, and no limit on the number of variables in the model. Moreover, machine learning can be used to construct nonlinear models. Therefore, machine learning models such as SVM and deep learning can show higher prediction accuracy than scoring systems. In this study, the ensemble model attained high prediction accuracy because of the combination of K-means and SVM. In short, machine learning has fewer restrictions on model development and is expected to attain higher prediction accuracy than a scoring system.
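The advantage of a nonlinear model can be seen in a toy example that is unrelated to the study data. In the hypothetical XOR pattern below, the outcome is 1 when exactly one of two binary factors is present. The sketch searches a grid of linear rules of the form w1x1 + w2x2 + b > 0 and compares the best of them with a rule that adds the product term x1x2. The product term is only a stand-in for the nonlinear feature maps that kernel methods such as SVM can use; it is not the authors' model.

```python
from itertools import product

# Toy XOR pattern: outcome is 1 when exactly one of the two factors is present.
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(predict, data=XOR_DATA):
    """Fraction of examples for which the rule predicts the correct outcome."""
    return sum(predict(x) == y for x, y in data) / len(data)

def best_linear_accuracy(grid):
    """Best accuracy of any rule w1*x1 + w2*x2 + b > 0 with weights from the grid."""
    best = 0.0
    for w1, w2, b in product(grid, repeat=3):
        acc = accuracy(lambda x: int(w1 * x[0] + w2 * x[1] + b > 0))
        best = max(best, acc)
    return best

def nonlinear_rule(x):
    """Rule with an interaction term x1*x2, which a purely linear score lacks."""
    x1, x2 = x
    return int(x1 + x2 - 2 * x1 * x2 - 0.5 > 0)

GRID = [i / 2 for i in range(-6, 7)]  # candidate weights from -3.0 to 3.0 in steps of 0.5

print(best_linear_accuracy(GRID), accuracy(nonlinear_rule))
```

No linear rule can classify all four cases correctly because the XOR pattern is not linearly separable, whereas the rule with the interaction term does. Real clinical data are far noisier, so the size of any gain from nonlinearity has to be checked on validation data, as was done in the study.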

Moreover, in a scoring system, the variables are arbitrarily selected, whereas in machine learning, patients’ features are extracted from their numerical data without being taught by a human. This feature extraction may clarify new pathophysiological characteristics of diseases. For example, the five clusters in this study, which had different numerical features, may follow different courses of change in body condition after dialysis initiation. Thus, new and previously unknown research seeds can be mined by machine learning. This is described in the Discussion: Page 32, line 9.

Reviewer #2: The manuscript by Kanda E et al. investigated the characteristics of hemodialysis patients using machine learning model, and its usefulness for screening hemodialysis patients at a high risk of one-year death using the nation-wide database of the Japanese Society for Dialysis Therapy (JSDT), and found that the five clusters clearly distinguished the groups on the basis of their characteristics and reflected their prognosis. The paper is well written and the topic is important for analyzing the present and future situations of hemodialysis patients in Japan. I have some comments as follows:

>> Thank you very much for your comments.

1. The mean ages of clusters 4 and 5 were older and the C-reactive protein levels of them higher than those of other groups. Therefore, the authors should discuss the cause of death because it seems to be different in the five clusters.

>> Thank you very much for pointing this out. The details of the deaths are summarized in Table 5. The percentages of deaths caused by CVD and by infection were higher in Cluster 5 than in the other clusters. These results are described in the Results and Discussion: Page 23, Table 5; Page 23, Paragraph 2; Page 30, Paragraph 1.

2. Did the dataset of the JSDT include data on blood pressure and medication use? If not, please describe in the discussion. If included, please explain why the authors did not use them as the baseline characteristics in their model.

>> Blood pressure and medication use are important factors for analyzing the characteristics of patients on the basis of the clusters. However, these factors were not included in our dataset. This is described in the Discussion as a limitation: Page 34, Paragraph 2, line 7.

3. Minor: page 7, line 13; albuminlevel should be albumin level.

>> Thank you very much.

Decision Letter 1

Kojiro Nagai

7 May 2020

Application of explainable ensemble artificial intelligence model to categorization of hemodialysis-patient and treatment using nationwide-real-world data in Japan

PONE-D-20-09660R1

Dear Dr. Kanda,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Kojiro Nagai

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this manuscript, the authors revised their manuscript in accordance with our review.

This manuscript fulfilled our suggestion.

Reviewer #2: The revised manuscript by Kanda E., et al. responded well to the points raised. I have no further critique.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Kojiro Nagai

15 May 2020

PONE-D-20-09660R1

Application of explainable ensemble artificial intelligence model to categorization of hemodialysis-patient and treatment using nationwide-real-world data in Japan

Dear Dr. Kanda:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Kojiro Nagai

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Availability Statement

    Data cannot be made publicly available by the authors, as they are owned by the Japanese Society for Dialysis Therapy. Interested readers may request the data at the following URL: http://www.jsdt.or.jp/jsdt/1761.html.


    Articles from PLoS ONE are provided here courtesy of PLOS