PLOS ONE. 2021 Aug 31;16(8):e0256428. doi: 10.1371/journal.pone.0256428

Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning

Aixia Guo 1, Nikhilesh R Mazumder 2,3, Daniela P Ladner 3,4, Randi E Foraker 1,5,*
Editor: Ming-Lung Yu
PMCID: PMC8407576  PMID: 34464403

Abstract

Objective

Liver cirrhosis is a leading cause of death and affects millions of people in the United States. Early mortality prediction among patients with cirrhosis might give healthcare providers more opportunity to effectively treat the condition. We hypothesized that laboratory test results and other related diagnoses would be associated with mortality in this population. We further hypothesized that a deep learning model could outperform the current Model for End-Stage Liver Disease (MELD) score in predicting mortality.

Materials and methods

We utilized electronic health record data from 34,575 patients with a diagnosis of cirrhosis from a large medical center to study associations with mortality. Three time windows for mortality (365 days, 180 days, and 90 days) and two cases with different numbers of variables (all 41 available variables and the 4 variables in MELD-Na) were studied. Missing values were imputed using multiple imputation for continuous variables and the mode for categorical variables. Deep learning and machine learning algorithms, i.e., deep neural networks (DNN), random forest (RF), and logistic regression (LR), were employed to study the associations between baseline features, such as laboratory measurements and diagnoses, and mortality for each time window using 5-fold cross-validation. Metrics such as the area under the receiver operating characteristic curve (AUC), overall accuracy, sensitivity, and specificity were used to evaluate the models.

Results

Models comprising all variables outperformed those with the 4 MELD-Na variables for all prediction cases, and the DNN model outperformed the LR and RF models. For example, the DNN model achieved AUCs of 0.88, 0.86, and 0.85 for 90-, 180-, and 365-day mortality, respectively, compared to the MELD score, which yielded corresponding AUCs of 0.81, 0.79, and 0.76. The DNN and LR models had significantly better F1 scores than MELD at all time points examined.

Conclusion

Besides the 4 MELD-Na variables, other variables such as alkaline phosphatase, alanine aminotransferase, and hemoglobin were among the top informative features. Machine learning and deep learning models outperformed the current standard of risk prediction among patients with cirrhosis. Advanced informatics techniques showed promise for risk prediction in patients with cirrhosis.

Background and significance

Cirrhosis of the liver is a leading cause of morbidity and mortality in the United States, causing 40,000 deaths each year [1]. The vast majority of patients with cirrhosis have subclinical disease; however, once their disease progresses, they often rapidly decompensate and are at high risk of morbidity, mortality, and poor quality of life [2, 3]. The current method for predicting mortality in these patients relies on the Model for End-Stage Liver Disease-Sodium (MELD-Na) score, a modified logistic regression model developed in 2002 that accurately predicts 90-day mortality at high scores and can help triage treatment and monitoring [4, 5].

Unfortunately, while accurate at high scores and short follow-up, the MELD-Na score is less predictive at lower scores and over longer time spans [6-8]. Furthermore, the vast majority of patients with cirrhosis either lack the laboratory values needed to calculate a MELD-Na score or have low MELD-Na scores, with 93% having a MELD-Na of less than 18 [9, 10]. Thus, an alternative method is needed to predict mortality for the cohort of patients with cirrhosis at large.

One possible reason the conventional MELD-Na score fails to predict patient outcomes may be the complex biological relationships among the non-linear and multi-dimensional variables present in medicine [11]. Deep learning algorithms have been successfully used in some healthcare applications due to their ability to effectively capture informative features, patterns, and variable interactions from complex data [12-16]. One 2006 study demonstrated that an artificial neural network performed better than MELD in predicting 3-month mortality among 400 patients with end-stage liver disease [11]. Another study in 2018 investigated 12-24 hour mortality prediction among 500 critically ill patients with cirrhosis using logistic regression (LR) and long short-term memory (LSTM) neural networks [17]. The limitations of these studies were that the cohorts included relatively few patients and that they predicted short-term rather than long-term mortality; predicting long-term mortality allows for interventions that may change outcomes.

Objective

The goal of our study was to assess whether deep learning and machine learning techniques can provide an advantage in predicting 90-day, 180-day, and 365-day mortality in patients with cirrhosis.

Materials and methods

Ethics statement

The need for informed consent for this study, which used existing patient records, was waived by the Institutional Review Board (IRB) of Washington University in Saint Louis (IRB ID # 202006212).

Method

Data source and study design

In this study, we utilized electronic health record (EHR) data from a large academic liver transplant center. Our institution partnered with MDClone [18, 19] (Beer Sheva, Israel) for data storage and retrieval. The MDClone platform is a data engine that stores EHR medical events in chronological order for each patient; queries can be built to pull computationally derived or original EHR data from the platform. Patients in the EHR were included if they had an initial diagnosis code of liver cirrhosis between 1/1/2012 and 12/31/2019 and were 18 years of age or older at first diagnosis. The cohort was identified based on the following International Classification of Diseases codes, similar to previous studies [20, 21]: K76.6 portal hypertension; K76.1 cardiac/congestive cirrhosis of liver; K74.60 unspecified cirrhosis of liver; B19.21 unspecified viral hepatitis C with hepatic coma; K70.30 alcoholic cirrhosis of liver; K74.69 other cirrhosis of the liver; K70.31 alcoholic cirrhosis of liver with ascites; I85.00 esophageal varices; I86.4 gastric varices; K76.7 hepatorenal syndrome; K65.2 spontaneous bacterial peritonitis; K74.3 primary biliary cirrhosis; E83.110 pigmentary cirrhosis (of liver); E83.01 Wilson's disease; I85.01 esophageal varices with bleeding; and K76.81 hepatopulmonary syndrome. Time of inclusion was defined as the first occurrence of any of these codes.
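
The cohort definition above can be sketched as a simple query over a diagnosis table. This is an illustrative sketch only: the column names (`patient_id`, `icd10`, `event_date`, `age_at_event`) are assumptions, not MDClone's actual schema.

```python
import pandas as pd

# ICD-10 codes defining the cirrhosis cohort, as listed in the Methods.
CIRRHOSIS_CODES = {
    "K76.6", "K76.1", "K74.60", "B19.21", "K70.30", "K74.69", "K70.31",
    "I85.00", "I86.4", "K76.7", "K65.2", "K74.3", "E83.110", "E83.01",
    "I85.01", "K76.81",
}

def identify_cohort(dx: pd.DataFrame) -> pd.DataFrame:
    """One row per patient: the earliest qualifying diagnosis (time of
    inclusion), restricted to patients 18 or older at that diagnosis."""
    hits = dx[dx["icd10"].isin(CIRRHOSIS_CODES)]
    first = hits.sort_values("event_date").drop_duplicates("patient_id", keep="first")
    return first[first["age_at_event"] >= 18].reset_index(drop=True)
```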

Primary outcome

The primary outcome was all-cause mortality as ascertained from the medical record. We performed analyses within three specific timeframes: 90 days, 180 days, and 365 days.

Feature extraction

Features were measured at or before the time of the first diagnosis code. They included patient demographics, laboratory data, and information on the last hospitalization. The selected features are predictive of mortality in patients with liver cirrhosis: baseline demographic characteristics such as age, race, and ethnicity, and laboratory features collected from blood such as serum aspartate aminotransferase, alanine aminotransferase, and total bilirubin have all been informative predictors of mortality in patients with liver cirrhosis [22, 23]. We included features that had more than 10% non-missing values and discarded the rest. For included features, missing values were imputed using multiple imputation [24] for continuous variables and the mode for categorical variables. We also compared the multiple imputation strategy with a mean imputation strategy (using the mean for continuous variables); the results for the mean imputation strategy are shown in the supplementary materials. For features with multiple values, only the closest value prior to inclusion was used. Using this method, 41 features were incorporated into the models. The included features and example values for each feature are listed in S1 Table.
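
The two-step preprocessing could look like the following sketch. The paper does not specify its imputation implementation; here scikit-learn's chained-equations imputer stands in for multiple imputation of continuous variables, and a most-frequent imputer supplies the mode for categorical variables. The 10% threshold is from the text; the column names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

def impute(df, continuous_cols, categorical_cols):
    """Drop features with 10% or fewer non-missing values, then impute:
    chained-equations imputation for continuous columns and the column
    mode for categorical columns."""
    keep = [c for c in df.columns if df[c].notna().mean() > 0.10]
    out = df[keep].copy()
    cont = [c for c in continuous_cols if c in keep]
    cat = [c for c in categorical_cols if c in keep]
    if cont:
        out[cont] = IterativeImputer(random_state=0).fit_transform(out[cont])
    if cat:
        out[cat] = SimpleImputer(strategy="most_frequent").fit_transform(out[cat])
    return out
```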

Statistical analysis

We assessed and compared a deep learning model (a DNN [25]) with two machine learning models, random forest (RF) [26] and LR [27], to predict mortality, and compared these to models using only the four variables in the MELD-Na model (serum sodium, international normalized ratio (INR), creatinine, and total bilirubin).
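
For reference, the four variables combine into the MELD-Na score itself as follows. This is a sketch of the Kim et al. formulation cited above [7], written from the published formula rather than from this paper; the study feeds the raw laboratory values to the models rather than the composite score.

```python
import math

def meld(creatinine, bilirubin, inr):
    """MELD (Malinchoc/Kamath formulation): lab values are floored at 1.0
    and creatinine is capped at 4.0 before the natural-log transform."""
    cr = min(max(creatinine, 1.0), 4.0)
    bili = max(bilirubin, 1.0)
    inr = max(inr, 1.0)
    return 9.57 * math.log(cr) + 3.78 * math.log(bili) + 11.2 * math.log(inr) + 6.43

def meld_na(creatinine, bilirubin, inr, sodium):
    """MELD-Na (Kim et al. [7]): sodium is bounded to [125, 137] mmol/L,
    and the sodium correction applies only when MELD > 11."""
    m = meld(creatinine, bilirubin, inr)
    na = min(max(sodium, 125.0), 137.0)
    if m > 11:
        m = m + 1.32 * (137 - na) - 0.033 * m * (137 - na)
    return m
```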

For each model, we utilized 5-fold cross-validation. Due to the low overall rate of mortality, the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTE-NC) [28] was used to address class imbalance by oversampling positive patients in each training fold, while the original class distribution was maintained in the validation fold. We studied 18 total prediction cases: 3 time windows (365 days, 180 days, and 90 days), two groups of features (41 and 4 variables), and three models (DNN, LR, and RF). We used the area under the receiver operating characteristic curve, overall accuracy, sensitivity, specificity, precision, and F1 score to evaluate model performance for each prediction case.

We then investigated feature importance to understand why models with 41 features performed better than those with 4 variables. The importance of each feature was quantified by each of the three trained models; we studied all three to confirm that the methods led to consistent results. The coefficients of the input variables from the trained LR model were used to measure feature importance for the LR model; the mean decrease in impurity from the trained RF model was used for the RF model; and we used the "iNNvestigate" [29] package with the gradient analyzer to calculate feature importance for the DNN model.
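
For the LR and RF models, the quantities described above map directly onto scikit-learn attributes, as in this sketch on synthetic data (the `feat_i` names are placeholders; the DNN gradient analysis via iNNvestigate is not reproduced here).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix; names are placeholders.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
feature_names = [f"feat_{i}" for i in range(X.shape[1])]

# LR: coefficient magnitudes (on standardized inputs) rank the features.
Xs = StandardScaler().fit_transform(X)
lr = LogisticRegression(max_iter=1000).fit(Xs, y)
lr_importance = dict(zip(feature_names, np.abs(lr.coef_[0])))

# RF: feature_importances_ is the mean decrease in impurity across trees.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
rf_importance = dict(zip(feature_names, rf.feature_importances_))
```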

Our DNN comprised an input layer, 4 hidden layers (with 128 nodes each), and a scalar output layer. We used the sigmoid function [30] at the output layer and the ReLU function [31] at each hidden layer. Binary cross-entropy was used as the loss function, and the Adam optimizer was used to optimize the models with a mini-batch size of 512 samples. The hyperparameters were determined by optimal accuracy using a grid search over network depths of 2 to 8 hidden layers, hidden-layer dimensions of 128 and 256, and batch sizes of 64, 128, and 512.
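
The selected architecture can be sketched in Keras with the stated hyperparameters (4 hidden layers of 128 ReLU nodes, a sigmoid scalar output, binary cross-entropy, Adam, mini-batches of 512). The AUC metric and 41-feature input are assumptions consistent with the rest of the Methods, not a transcript of the authors' code.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dnn(n_features: int, hidden_layers: int = 4, hidden_dim: int = 128) -> keras.Model:
    """Build the DNN described above: ReLU hidden layers, a sigmoid
    scalar output, binary cross-entropy loss, and the Adam optimizer."""
    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for _ in range(hidden_layers):
        model.add(layers.Dense(hidden_dim, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC(name="auc")])
    return model

# 41 input features, as in the full-variable models; training would use
# model.fit(X_train, y_train, batch_size=512, ...).
model = build_dnn(41)
```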

We performed a grid search of hyperparameters for the RF and LR models by five-fold cross-validation. For the RF model, we searched the number of trees in the forest over 200, 500, and 700, and the number of features considered for the best split over auto, sqrt, and log2. For the LR model, we searched the penalty norm over L1 and L2, and the inverse of the regularization strength over 10 values spaced evenly on a log scale over [0, 4].
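
A sketch of the two grid searches with scikit-learn's GridSearchCV, on synthetic data for illustration. Note one substitution: the "auto" option for max_features was removed in recent scikit-learn releases, so None (all features) stands in for it here; the scoring metric is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the EHR feature matrix.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# RF: trees in {200, 500, 700}; features per split over sqrt/log2/all.
rf_grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [200, 500, 700],
     "max_features": ["sqrt", "log2", None]},
    cv=5, scoring="roc_auc",
).fit(X, y)

# LR: L1 vs. L2 penalty; inverse regularization strength C over 10
# values evenly spaced on a log scale over [0, 4].
lr_grid = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    {"penalty": ["l1", "l2"],
     "C": np.logspace(0, 4, 10)},
    cv=5, scoring="roc_auc",
).fit(X, y)
```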

The RF model was configured as follows: the number of trees was set to 500; the maximum number of features that could be used in each tree was set to the square root of the total number of features; and the minimum number of samples at a leaf node was set to 1. The LR model was configured as follows: the L2 norm (the squared magnitude of the coefficients) was used as the penalty; the tolerance for the stopping criterion was set to 1.0 × 10^-4; and the inverse of the regularization strength, which limits potential overfitting, was set to 1.0.

Fig 1 shows the flowchart of our work. Analyses were conducted using the Scikit-learn, Keras, SciPy, and Matplotlib libraries with Python, version 3.6.5 (2019).

Fig 1. The flowchart of our work.


Results

Cohort characteristics

During the study period, 34,575 patients met the inclusion criteria; their characteristics are listed in Table 1. Patients were primarily white, with a mean age of 60 years and a mean MELD-Na score of 12.3. Approximately 5% (n = 1,784), 6% (n = 2,217), and 8% (n = 2,775) of patients died within 90, 180, and 365 days of inclusion, respectively.

Table 1. Characteristics [mean (SD) or n (%)] of our study populations.

Patients Mean (SD) or n (%)
Total patients N 34,575
Mortality within 365 days n (%) 2,775 (8.0)
Mortality within 180 days n (%) 2,217 (6.4)
Mortality within 90 days n (%) 1,784 (5.2)
Age 60.5 (14)
Gender
 Female 17,600 (50.9)
 Male 16,973 (49.1)
Race
 White 26,790 (77.5)
 Black 5,438 (15.7)
 Other/unknown 2,347 (6.8)
Ethnicity
 Not Hispanic or Latino 23,156 (67.0)
 Hispanic or Latino 313 (0.9)
 Unknown 11,106 (32.1)
BMI 29.0 (7.1)
INR 1.3 (0.6)
Sodium 138.2 (3.9)
Creatinine 1.13 (1.06)
Total bilirubin 1.3 (3.2)
Hemoglobin 12.3 (2.3)
Potassium 4.1 (0.5)
Bicarbonate 24.5 (4.6)
MELD score 11.5 (6.2)
MELD-Na score 12.3 (6.5)

Results of prediction models

Fig 2 shows the prediction performance for mortality within 365 days, 180 days, and 90 days using all 41 variables and using only the 4 variables in the MELD-Na model. In all 3 cases, all 3 models consistently showed that using all 41 variables outperformed using only the 4 MELD-Na variables. Among the 3 models, the DNN had the best performance (better than the LR and RF models) in all prediction cases with only the MELD-Na variables, and RF had the best performance when using all 41 variables. With 41 variables, the average AUC for 90-day (180-day and 365-day) prediction was 0.88 (0.86 and 0.85) for the DNN model, 0.80 (0.77 and 0.74) for LR, and 0.90 (0.88 and 0.86) for RF. In each chart of Fig 2, a model name plus "MELD-Na" in the legend means only the 4 MELD-Na variables were inputs; a model name alone means all 41 variables were inputs. For example, "DNN Mean AUC" refers to the DNN model using all 41 variables, and "DNN MELD-NA Mean AUC" refers to the average AUC of the DNN model using only the 4 MELD-Na variables in the 5-fold cross-validation. S1 Fig shows the prediction performance with the mean imputation strategy.

Fig 2. Prediction performance by deep neural network (DNN), random forest (RF) and logistic regression (LR).


Panels a, b, and c show the cases of mortality within 365 days, 180 days, and 90 days, respectively.

Table 2 shows other metrics, i.e., overall accuracy, precision, recall, F1 score, and specificity, for the 3 models using all variables and using the 4 MELD-Na variables. In all cases, the DNN and RF models achieved similar, higher F1 scores than the LR model. Although RF and DNN achieved similar F1 scores, the DNN model achieved higher recall/sensitivity than the RF model. All metrics improved when using all 41 variables instead of only the 4 MELD-Na variables. Predictions were more accurate for 90-day mortality than for the 180-day and one-year cases. S2 Table shows the same metrics with the mean imputation strategy. It is clinically useful to consider different tradeoffs between sensitivity and specificity for a specific clinical application, so we also conducted the analysis at 10 different sensitivity cut-offs, i.e., 0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, and 0.95. The results for models with all 41 variables are summarized in S3 Table.

Table 2. Prediction metrics [n (%)] of 3 period cases for 3 machine learning models.

Models Period Accuracy Precision Recall F1-Score Specificity
(days) Mean(std) Mean(std) Mean(std) Mean(std) Mean(std)
DNN 365 0.83(0.01) 0.27(0.0) 0.65(0.04) 0.38(0.01) 0.85(0.01)
(all variables) 180 0.86(0.02) 0.26(0.02) 0.64 (0.03) 0.37(0.02) 0.88(0.02)
90 0.90(0.02) 0.30(0.05) 0.63(0.04) 0.40(0.04) 0.92(0.02)
LR 365 0.77(0.01) 0.21(0.0) 0.72(0.01) 0.33(0.01) 0.77(0.01)
(all variables) 180 0.79(0.0) 0.19(0.0) 0.75(0.0) 0.31(0.0) 0.79(0.0)
90 0.81(0.01) 0.18(0.0) 0.78(0.03) 0.29(0.01) 0.81 (0.01)
RF 365 0.92(0.0) 0.47(0.04) 0.37(0.02) 0.41(0.02) 0.96 (0.0)
(all variables) 180 0.93(0.0) 0.46(0.03) 0.40(0.02) 0.43(0.02) 0.97 (0.0)
90 0.94 (0.0) 0.43(0.01) 0.41(0.02) 0.42(0.01) 0.97(0.0)
DNN 365 0.78(0.02) 0.20(0.01) 0.59(0.04) 0.30(0.01) 0.80(0.03)
(4 MELD-Na variables) 180 0.80(0.03) 0.18(0.02) 0.61(0.05) 0.28(0.02) 0.81(0.04)
90 0.80(0.02) 0.16(0.01) 0.66(0.03) 0.25(0.01) 0.81(0.02)
LR 365 0.78(0.01) 0.20(0.01) 0.58(0.0) 0.30(0.01) 0.80(0.01)
(4 MELD-Na variables) 180 0.80(0.01) 0.18(0.01) 0.61(0.03) 0.28(0.01) 0.81(0.0)
90 0.81(0.01) 0.16(0.01) 0.64(0.02) 0.25(0.01) 0.82(0.01)
RF 365 0.85(0.0) 0.22(0.02) 0.36(0.04) 0.27(0.02) 0.89(0.0)
(4 MELD-Na variables) 180 0.87(0.0) 0.20(0.01) 0.36(0.01) 0.26(0.01) 0.90(0.01)
90 0.89(0.0) 0.20 (0.02) 0.38(0.04) 0.26(0.03) 0.92(0.0)
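
Reading off specificity at fixed sensitivity cut-offs, as reported in S3 Table, can be done directly from the ROC curve. This is a sketch; the paper does not describe its exact procedure.

```python
import numpy as np
from sklearn.metrics import roc_curve

def specificity_at_sensitivity(y_true, y_score, targets):
    """For each target sensitivity, take the first ROC operating point
    reaching at least that sensitivity and report its specificity."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    out = {}
    for t in targets:
        idx = int(np.argmax(tpr >= t))  # first index meeting the target
        out[t] = 1.0 - fpr[idx]
    return out
```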

To investigate why performance decreased when using only the 4 MELD-Na variables rather than all of the variables, we examined the feature importance obtained from the three trained models (Fig 3). Fig 3A was obtained from the DNN model; Fig 3B and 3C were based on the RF and LR models, respectively. The three panels consistently showed that the 4 MELD-Na variables were among the top features, indicating that these 4 variables are important and informative. However, other features such as alkaline phosphatase, alanine aminotransferase, hemoglobin, and hospital admission start date (the difference in days between the diagnosis of liver cirrhosis and the previous hospital admission start date) were also top features, meaning they may also be informative and play an important role in the predictions.

Fig 3. Feature importance by DNN (a), RF (b), and LR (c).

Fig 3 shows the case of 365-day mortality prediction; the other cases (90-day and 180-day mortality) showed the same trends and similar results (not shown).

Discussion

In this study, we utilized 8 years of EHR data, synthesized and de-identified by the MDClone platform, to identify patients with liver cirrhosis and to predict their mortality within 365, 180, and 90 days of the first diagnosis of liver cirrhosis using machine learning and deep learning models. We also investigated the most important features to understand why models with 41 variables performed better than those with the 4 variables in the MELD-Na model. Finally, we investigated model performance across a range of cut-off values to characterize changes in model performance.

Our results indicated that the deep learning model (DNN) can effectively predict mortality within 90, 180, and 365 days compared to the LR and RF models, and that models using all available variables are superior to those using the four MELD-Na variables alone. Although the 4 variables used in the MELD-Na model were among the most informative features, other features such as hemoglobin, alkaline phosphatase, alanine aminotransferase, and time since the most recent hospitalization were also top features and might play an important role in mortality prediction. Therefore, adding additional discrete features might improve the accuracy of the MELD-Na scoring system. In addition, "Reference Event-Facility" and "Age at event" were important features in all three models, which implies that the facility to which patients presented and their age at first diagnosis had strong associations with mortality.

Our study has several strengths and limitations. We were able to utilize a large database of more than 34,000 patients with cirrhosis using a retrospective design. Nevertheless, we expect that our finding that these newer informatics methods predict outcomes better than the MELD-Na score would hold in a prospective patient sample [32, 33]. The outcome of interest for these analyses was all-cause mortality, which we acknowledge may not always represent liver-related causes of death. Another limitation is that cohort selection based on diagnosis codes (e.g., K76.6) may include patients with non-cirrhotic disease, although these conditions frequently occur among patients with cirrhosis. Our inclusion criteria are based on literature that was validated against chart review with good specificity [21, 34]. Furthermore, patients with primary biliary cirrhosis comprised only 216 (0.6%) of the cohort, and Wilson's disease an even smaller 54 (0.15%). Thus, we believe that the effect of non-cirrhotic patients in this small subset would not affect our findings. Lastly, some features had large amounts of missing data, requiring feature selection and imputation. This situation is commonly encountered when using clinical data for research purposes, and including these cases in the pipeline improves the generalizability of the results. Our future work will further investigate patients with GI bleeding, end-stage renal disease, and osteoporosis or recent bone fracture, as these conditions may cause elevated alkaline phosphatase (ALKP).

Conclusions

Deep learning models (DNNs) can be used to predict longer-term mortality among patients with liver cirrhosis more reliably than the MELD-Na variables alone using common EHR data variables. Our findings suggest that newer informatics methods might benefit patients who are inadequately triaged by the MELD-Na score. Future work should validate this methodology in actual patient data and incorporate the competing risk of transplantation.

Supporting information

S1 Fig. Prediction performance by deep neural network (DNN), random forest (RF) and logistic regression (LR) with mean imputation strategy.

Panels a, b, and c show the cases of mortality within 365 days, 180 days, and 90 days, respectively.

(PDF)

S1 Table. The study features and feature description.

(DOCX)

S2 Table. Prediction metrics [n (%)] of 3 period cases for 3 machine learning models with mean strategy for imputation.

(DOCX)

S3 Table. Prediction metrics [n (%)] of 3 machine learning models under 10 different tradeoffs for case of 365 days.

(DOCX)

Data Availability

As the data set contains potentially identifying and sensitive patient information, we cannot share these data publicly without permission. To request the data, please contact the Washington University Human Research Protection Office by mail at MS08089-29-2300, 600 S. Euclid Avenue, Saint Louis, MO 63110 or by phone at 314-747-6800 or by email hrpo@wustl.edu.

Funding Statement

Dr. Mazumder was supported by NIH grant T32DK077662. The other authors received no specific funding for this work.

References

1. Heron M. Deaths: Leading causes for 2017. National Vital Statistics Reports. 2019;68(6).
2. D'Amico G, Garcia-Tsao G, Pagliaro L. Natural history and prognostic indicators of survival in cirrhosis: a systematic review of 118 studies. doi: 10.1016/j.jhep.2005.10.013
3. Chung WJW, Jo C, Chung WJW, Kim DJ. Liver cirrhosis and cancer: comparison of mortality. Hepatol Int. 2018;12:269-276. doi: 10.1007/s12072-018-9850-5
4. Biggins SW, Kim WR, Terrault NA, Saab S, Balan V, Schiano T, et al. Evidence-based incorporation of serum sodium concentration into MELD. doi: 10.1053/j.gastro.2006.02.010
5. Malinchoc M, Kamath PS, Gordon FD, Peine CJ, Rank J, Ter Borg PCJ. A model to predict poor survival in patients undergoing transjugular intrahepatic portosystemic shunts. Hepatology. 2000. doi: 10.1053/he.2000.5852
6. Somsouk M, Kornfield R, Vittinghoff E, Inadomi JM, Biggins SW. Moderate ascites identifies patients with low model for end-stage liver disease scores awaiting liver transplantation who have a high mortality risk. Liver Transpl. 2011;17:129-136. doi: 10.1002/lt.22218
7. Kim WR, Biggins SW, Kremers WK, Wiesner RH, Kamath PS, Benson JT, et al. Hyponatremia and mortality among patients on the liver-transplant waiting list. N Engl J Med. 2008;359:1018-1026. doi: 10.1056/NEJMoa0801209
8. Godfrey EL, Malik TH, Lai JC, Mindikoglu AL, Galván NTN, Cotton RT, et al. The decreasing predictive power of MELD in an era of changing etiology of liver disease. Am J Transplant. 2019. doi: 10.1111/ajt.15559
9. Trotter JF, Osgood MJ. MELD scores of liver transplant recipients according to size of waiting list. JAMA. 2004;291:1871. doi: 10.1001/jama.291.15.1871
10. Yi Z, Mayorga ME, Orman ES, Wheeler SB, Hayashi PH, Barritt AS. Trends in characteristics of patients listed for liver transplantation will lead to higher rates of waitlist removal due to clinical deterioration. Transplantation. 2017;101:2368-2374. doi: 10.1097/TP.0000000000001851
11. Cucchetti A, Vivarelli M, Heaton ND, Phillips S, Piscaglia F, Bolondi L, et al. Artificial neural network is superior to MELD in predicting mortality of patients with end-stage liver disease. Gut. 2007. doi: 10.1136/gut.2005.084434
12. Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016.
13. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, et al. Scalable and accurate deep learning with electronic health records. npj Digit Med. 2018;1:18. doi: 10.1038/s41746-018-0029-1
14. Kwon J, Kim K-H, Jeon K-H, Lee SE, Lee H-Y, Cho H-J, et al. Artificial intelligence algorithm for predicting mortality of patients with acute heart failure. PLoS One. 2019. doi: 10.1371/journal.pone.0219302
15. Park SH, Mazumder N, Mehrotra S, Ho B, Kaplan B, Ladner DP. Artificial intelligence-related literature in transplantation. Transplantation. 2020. doi: 10.1097/TP.0000000000003304
16. Park SH, Mazumder NR, Mehrotra S, Ho B, Kaplan B, Ladner DP. Artificial intelligence-related literature in transplantation: a practical guide. Transplantation. 2020; online first.
17. Harrison E, Chang M, Hao Y, Flower A. Using machine learning to predict near-term mortality in cirrhosis patients hospitalized at the University of Virginia health system. 2018 Systems and Information Engineering Design Symposium (SIEDS); 2018. doi: 10.1109/SIEDS.2018.8374719
18. Foraker RE, Yu SC, Gupta A, Michelson AP, Pineda Soto JA, Colvin R, et al. Spot the difference: comparing results of analyses from real patient data and synthetic derivatives. JAMIA Open. 2020. doi: 10.1093/jamiaopen/ooaa060
19. Guo A, Foraker RE, MacGregor RM, Masood FM, Cupps BP, Pasque MK. The use of synthetic electronic health record data and deep learning to improve timing of high-risk heart failure surgical intervention by predicting proximity to catastrophic decompensation. Front Digit Health. 2020. doi: 10.3389/fdgth.2020.576945
20. Chang EK, Yu CY, Clarke R, Hackbarth A, Sanders T, Esrailian E, et al. Defining a patient population with cirrhosis. J Clin Gastroenterol. 2016;50:889-894. doi: 10.1097/MCG.0000000000000583
21. Lo Re V, Lim JK, Goetz MB, Tate J, Bathulapalli H, Klein MB, et al. Validity of diagnostic codes and liver-related laboratory abnormalities to identify hepatic decompensation events in the Veterans Aging Cohort Study. Pharmacoepidemiol Drug Saf. 2011;20:689-699. doi: 10.1002/pds.2148
22. Radisavljevic MM, Bjelakovic GB, Nagorni AV, Stojanovic MP, Radojkovic MD, Jovic JZ, et al. Predictors of mortality in long-term follow-up of patients with terminal alcoholic cirrhosis: is it time to accept remodeled scores? Med Princ Pract. 2017. doi: 10.1159/000451057
23. Li Y, Chaiteerakij R, Kwon JH, Jang JW, Lee HL, Cha S, et al. A model predicting short-term mortality in patients with advanced liver cirrhosis and concomitant infection. Medicine (Baltimore). 2018. doi: 10.1097/MD.0000000000012758
24. Wayman JC. Multiple imputation for missing data: what is it and how can I use it? Res Methods Psychol. 2003.
25. Bengio Y. Learning deep architectures for AI. Found Trends Mach Learn. 2009. doi: 10.1561/2200000006
26. Ho TK. Random decision forests. Proceedings of the International Conference on Document Analysis and Recognition (ICDAR); 1995. doi: 10.1109/ICDAR.1995.598994
27. Hosmer DW, Lemeshow S, Sturdivant RX. Model-building strategies and methods for logistic regression. In: Applied Logistic Regression. 2013. doi: 10.1002/0471722146.ch4
28. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002.
29. Alber M, Lapuschkin S, Seegerer P, Hägele M, Schütt KT, Montavon G, et al. iNNvestigate neural networks! J Mach Learn Res. 2019.
30. Han J, Moraga C. The influence of the sigmoid function parameters on the speed of backpropagation learning. In: Lecture Notes in Computer Science. 1995. doi: 10.1007/3-540-59497-3_175
31. Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML 2010); 2010.
32. Foraker RE, Mann DL, Payne PRO. Are synthetic data derivatives the future of translational medicine? JACC Basic Transl Sci. 2018;3. doi: 10.1016/j.jacbts.2018.08.007
33. Uzuner Ö, Luo Y, Szolovits P. Evaluating the state-of-the-art in automatic de-identification. J Am Med Inform Assoc. 2007;14:550-563. doi: 10.1197/jamia.M2444
34. Chang EK, Yu CY, Clarke R, Hackbarth A, Sanders T, Esrailian E, et al. Defining a patient population with cirrhosis: an automated algorithm with natural language processing. J Clin Gastroenterol. 2016. doi: 10.1097/MCG.0000000000000583

Decision Letter 0

Ming-Lung Yu

12 Feb 2021

PONE-D-20-37171

Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning

PLOS ONE

Dear Dr. Foraker,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.


Kind regards,

Ming-Lung Yu, MD, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1) Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2) Thank you for stating the following in the Acknowledgments Section of your manuscript:

[Dr. Mazumder was supported by NIH grant T32DK077662.]

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

 [The author(s) received no specific funding for this work.]

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

3)   We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

4) Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript.

5) We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere, as the Abstract appears in https://aasldpubs.onlinelibrary.wiley.com/doi/10.1002/hep.31579. Please clarify whether this conference proceeding/publication was peer-reviewed and formally published. If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear Authors, congratulations on the innovative work to improve the predictive power of the prognosis of cirrhotic patients via machine learning. Here are my comments:

Q1:

Quoted from the "BACKGROUND AND SIGNIFICANCE" paragraph: "One 2006 study demonstrated that an artificial neural network performed better than MELD-NA in predicting 3-month mortality among 400 patients with end-stage liver disease."

--> Actually, MELD-NA was created by W. Ray Kim in 2008, and the referred 2006 study (Reference #10) compared the power of an artificial neural network (ANN) and MELD in predicting liver disease-related mortality.

Q2-0:

Quoted from paragraph of "Data source and study design": "...based on the following International Classification of Disease codes codes", the last word "codes" was repeated.

Q2-1:

The study cohort had an initial diagnosis code of liver cirrhosis in the period 1/1/2012 through 12/31/2019, however, ICD 10 codes were introduced since October 2015. What were the ICD 9 codes used to identify the study cohort?

Q2-2:

Aside from K74.3, K74.6, and K74.69, the ICD 10 codes K74.1 Hepatic sclerosis and K74.2 Hepatic fibrosis with hepatic sclerosis were also frequently used to make a diagnosis of liver cirrhosis. It seemed that K74.1 and K74.2 were not included in the current study; was there a specific reason?

Q2-3:

The ICD 10 code K74.3 Primary Biliary Cirrhosis was a diagnosis used interchangeably with Primary Biliary Cholangitis. The diagnosis could be made when serum anti-mitochondrial antibody (AMA) tested positive and evident cholangitis (elevated serum ALKP/rGT, or histological inflammation/fibrosis of the bile duct) was noted. Therefore, the ICD 10 code K74.3 may include many patients with long-term cholangitis with or without treatment, but no evidently cirrhotic liver. A similar condition occurs when including E83.01 Wilson's disease, which may include those with only steatosis or chronic hepatitis. Including K74.3 and E83.01 without other ICD codes (such as K74.1, K76.6 or K74.60, etc.) may yield an inaccurate cohort for this study.

Q3:

Quoted from paragraph of Feature Extraction, "We included features that had more than 10% non-missing values, otherwise we discarded them."

10% non-missing values seemed significantly insufficient; why include features with more than 10% non-missing values?

It would be more reasonable to include features with more than 90% non-missing values or less than 10% missing values.

Q4:

When performing an evaluation of cirrhotic patients, besides MELD-Na, a physician would probably choose parameters such as "encephalopathy episodes" or "serum ammonia levels" (implemented in the Child Pugh score, indicating liver failure if positive or elevated), "serum albumin" (also included in the Child Pugh score), "rGT" (which relates to cholestasis in cirrhosis and HCC risk in chronic hepatitis C), and "platelet counts" (which relate to liver decompensation and portal hypertension), as those are parameters with an established correlation to liver decompensation. In "Supplemental Table 1", Hemoglobin, Potassium and Bicarbonate were amongst the 41 features chosen for model training; what were the rationales?

Q5.

As the study revealed that ALKP and Hb were among the most informative parameters for mortality prediction, did the authors exclude patients with recent GI bleeding or end stage renal disease (which may contribute to anemia) and osteoporosis or recent bone fracture (which may cause an elevated ALKP) when referencing the EHR?

Q6.

Stated in "Statistical analysis": "The LR and RF models were configured by the default options in package of Scikit-learn in Python 3.". The best hyperparameters for a Random Forest classifier are not likely to be determined ahead of time, and tuning the hyperparameters to determine the optimal settings is usually inevitable. Please specify the final configuration of the LR and RF models in the current study.

Q7. Stated in "Results" of "Abstract": "The DNN model performed the best ... for 90, 180, and 365 day mortality respectively." However, in "Results" of the manuscript, it was stated that: The average AUC was 0.82 (0.79 and 0.76) for DNN model, and 0.83 (0.80 and 0.79) for RF model in the case of 90-day (180-day and 365-day) prediction for the case of 41 variables. And Figure 2 also showed RF, instead of DNN, performed the best?

Q8.

Quoted from "Results of Prediction Models": "besides these 4 variables, other features such as alkaline phosphatase values, Alanine aminotransferase values, hemoglobin values, and hospital admission start date (date difference in days between diagnosis of liver cirrhosis and previous hospital admission start dates) were also top features."

Q8-0: Why was alanine aminotransferase not mentioned in "Discussion" (Quoted: "other features such as hemoglobin, alkaline phosphatase (AP) and time since recent hospitalization were also top features and might play an important role in mortality prediction.") ?

Q8-1: According to Figure 3(c), "hospital admission start dates" ranked as the least important feature in LR. What were the possible explanations?

Q8-2: In Figure 3, some features, such as "Reference Event-Facility" and "Age at event", seemed to be important features in all three models, even more important than hemoglobin. Those features should be mentioned in discussion as well.

Reviewer #2: The authors seek to define an improved prognostic metric for cirrhosis using deep neural networks and machine learning. This is an important goal given limitations of the MELD/MELD-NA score.

Critiques:

- It may be helpful to cite and incorporate newer data on the MELD score. For example PMID: 31394020. This paper supports the authors’ claim that an improved prognostic metric is needed.

- The authors claim that DNN provides the best performance but it appears RF has the best AUC at each of the three time points.

- The subject selection by diagnosis would include subjects with non-cirrhotic portal hypertension. While it may be difficult to exclude such subjects, this issue should at least be addressed and mitigated if possible.

- What were the causes of death in these patients? Were they liver-related? Perhaps MELD is performing poorly because it is inferior at predicting non-liver related mortality.

- The authors describe selecting features from an initial pool. How was the initial pool of features selected? Please justify why the original pool of variables may have a priori utility of prognosis in cirrhosis or discuss why they do not need any expectation of utility.

- “We included features that had more than 10% non-missing values, otherwise we discarded them” This implies that features could have up to 90% missing values. A more typical approach would be to only include features that have less than 5 or 10% missing values.

- What metric was used to assess feature importance?

- Please provide plausible/physiologic explanations for why the selected features should/could be predictive of mortality

- Please justify the use of mean/mode for missing data rather than a more sophisticated method of imputation (e.g. multiple imputation)

- Details are provided for the parameters used for DNN but not for RF and LR. "The LR and RF models were configured by the default options in package of Scikit-learn in Python 3." It would be helpful to provide similar depth of information for these latter models.

- MDClone is mentioned for the first time in the discussion. This should be explained earlier.

- Table 2: How were these combinations of recall/sensitivity and specificity selected? For clinical applications it is often useful to set one of these metrics (sensitivity or specificity) at a level expected to be clinically useful and then compare the other metric amongst the models. I recommend some consideration of the tradeoffs between sensitivity and specificity for this application.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Aug 31;16(8):e0256428. doi: 10.1371/journal.pone.0256428.r002

Author response to Decision Letter 0


8 Jun 2021

Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning

We would like to thank the reviewers for their informed, thoughtful, and helpful comments. Please find our responses to the reviews below in italics. We believe that the manuscript has been significantly improved by our collaboration with the reviewers and hope that they will find it suitable for publication in PLOS ONE.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: No

Thank you for the helpful reviews. We have added more details about the technical description and conducted more rigorous experiments according to the reviewers’ comments and suggestions.

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

We appreciate the opportunity to improve upon our previous statistical analyses, and have conducted our experiments more rigorously according to the reviewers’ comments and suggestions.

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

Thank you for calling our attention to the details of this policy. Please find the data availability statement in our manuscript as follows.

“Data availability statement

The datasets for the current study are available from the corresponding author on reasonable request. Requests to access these data sets should be directed to randi.foraker@wustl.edu.”

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear Authors, congratulations on the innovative work to improve the predictive power of the prognosis of cirrhotic patients via machine learning. Here are my comments:

Q1:

Quoted from the "BACKGROUND AND SIGNIFICANCE" paragraph: "One 2006 study demonstrated that an artificial neural network performed better than MELD-NA in predicting 3-month mortality among 400 patients with end-stage liver disease."

--> Actually, MELD-NA was created by W. Ray Kim in 2008, and the referred 2006 study (Reference #10) compared the power of an artificial neural network (ANN) and MELD in predicting liver disease-related mortality.

We appreciate the reviewer’s insight and feedback. Our apologies for making this typo in the original manuscript. We have corrected the error.

Q2-0:

Quoted from paragraph of "Data source and study design": "...based on the following International Classification of Disease codes codes", the last word "codes" was repeated.

Thank you for pointing out this mistake. We have deleted the last word “codes”.

Q2-1:

The study cohort had an initial diagnosis code of liver cirrhosis in the period 1/1/2012 through 12/31/2019, however, ICD 10 codes were introduced since October 2015. What were the ICD 9 codes used to identify the study cohort?

This is a great question. We appreciate and agree with the reviewer’s comment. We used the MDClone platform to pull the original data as our study cohort and the platform had already implemented a crosswalk to convert ICD 9 codes to ICD 10 codes. For example, the ICD 9 code ‘572.3’ was converted to the ICD 10 code ‘K76.6’.

Q2-2:

Aside from K74.3, K74.6, and K74.69, the ICD 10 codes K74.1 Hepatic sclerosis and K74.2 Hepatic fibrosis with hepatic sclerosis were also frequently used to make a diagnosis of liver cirrhosis. It seemed that K74.1 and K74.2 were not included in the current study; was there a specific reason?

We appreciate the reviewer’s insightful feedback and expert knowledge. Hepatic fibrosis and sclerosis can range from minimal to end stage, termed 'cirrhosis'. While fibrosis and sclerosis are often asymptomatic, cirrhosis is not and is often associated with significant clinical complications. These late-stage patients, i.e., those with cirrhosis, were the intended population for our predictive modeling, thus the other codes (K74.1 and K74.2) were not used for inclusion.

Q2-3:

The ICD 10 code K74.3 Primary Biliary Cirrhosis was a diagnosis used interchangeably with Primary Biliary Cholangitis. The diagnosis could be made when serum anti-mitochondrial antibody (AMA) tested positive and evident cholangitis (elevated serum ALKP/rGT, or histological inflammation/fibrosis of the bile duct) was noted. Therefore, the ICD 10 code K74.3 may include many patients with long-term cholangitis with or without treatment, but no evidently cirrhotic liver. A similar condition occurs when including E83.01 Wilson's disease, which may include those with only steatosis or chronic hepatitis. Including K74.3 and E83.01 without other ICD codes (such as K74.1, K76.6 or K74.60, etc.) may yield an inaccurate cohort for this study.

Thank you for sharing all the helpful information. We appreciate the reviewer’s expert knowledge. The reviewer is correct that primary biliary cirrhosis and Wilson's disease at their early stages do not necessarily imply cirrhosis, and we have added text to the Discussion section to address this valid limitation of our approach, as follows.

“Our inclusion criteria are based on the literature, which was validated against chart review with good specificity.[1-2] Furthermore, patients with primary biliary cirrhosis comprised only 216 (0.6%) of the cohort and Wilson's disease comprised an even smaller 54 (0.15%). Thus, the presence of non-cirrhotic patients in this small subset would not affect our findings.”

[1] Lo Re V, Lim JK, Goetz MB, Tate J, Bathulapalli H, Klein MB, et al. Validity of diagnostic codes and liver-related laboratory abnormalities to identify hepatic decompensation events in the Veterans Aging Cohort Study. Pharmacoepidemiol Drug Saf. 2011;20: 689–99. doi:10.1002/pds.2148

[2] Chang EK, Christine YY, Clarke R, Hackbarth A, Sanders T, Esrailian E, et al. Defining a Patient Population With Cirrhosis: An Automated Algorithm With Natural Language Processing. J Clin Gastroenterol. 2016.

Q3:

Quoted from the Feature Extraction paragraph: "We included features that had more than 10% non-missing values, otherwise we discarded them." 10% non-missing values seemed significantly insufficient; why include features with more than 10% non-missing values? It would be more reasonable to include features with more than 90% non-missing values or less than 10% missing values.

We appreciate and agree with the reviewer’s comment. Missing values are a common issue in electronic health record data. We agree with the reviewer that 10% non-missing values seemed insufficient, but we tried to retain as many features/variables as possible for analysis by using a low threshold (10%). We acknowledge this is a limitation and have added it to the Discussion section as follows.

“Lastly, we did have some features with large amounts of missing data requiring feature selection and imputation. This situation is commonly encountered when using clinical data for research purposes and including these cases in the pipeline improves the generalizability of the results.”
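As a rough sketch of the screening-and-imputation steps described in the Methods (keep features with more than 10% non-missing values, impute continuous variables with multiple imputation and categorical variables with the mode), the following toy example uses hypothetical column names and synthetic data; scikit-learn's IterativeImputer stands in for the multiple-imputation step, since the exact implementation is not specified here.

```python
import numpy as np
import pandas as pd
# enable_iterative_imputer must be imported before IterativeImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical EHR-style frame; column names are illustrative only.
df = pd.DataFrame({
    "sodium":     [137.0, np.nan, 141.0, 139.0, np.nan, 135.0],
    "creatinine": [1.1, 0.9, np.nan, 1.4, 1.0, np.nan],
    "sex":        ["F", "M", np.nan, "M", "M", "F"],
    "sparse_lab": [np.nan] * 6,  # 0% non-missing -> discarded below
})

# Keep only features with more than 10% non-missing values.
keep = [c for c in df.columns if df[c].notna().mean() > 0.10]
df = df[keep].copy()

# Impute continuous columns (IterativeImputer as a stand-in for
# multiple imputation) and categorical columns with the mode.
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.columns.difference(num_cols)
df[num_cols] = IterativeImputer(random_state=0).fit_transform(df[num_cols])
for c in cat_cols:
    df[c] = df[c].fillna(df[c].mode().iloc[0])

print(df.isna().sum().sum())  # no missing values remain
```

The mostly-missing column is dropped before imputation, so low-information features never reach the models.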

Q4:

When performing an evaluation of cirrhotic patients, besides MELD-Na, a physician would probably choose parameters such as "encephalopathy episodes" or "serum ammonia levels" (implemented in the Child Pugh score, indicating liver failure if positive or elevated), "serum albumin" (also included in the Child Pugh score), "rGT" (which relates to cholestasis in cirrhosis and HCC risk in chronic hepatitis C), and "platelet counts" (which relate to liver decompensation and portal hypertension), as those are parameters with an established correlation to liver decompensation. In "Supplemental Table 1", Hemoglobin, Potassium and Bicarbonate were amongst the 41 features chosen for model training; what were the rationales?

We really appreciate the reviewer’s attention to the detail of our manuscript. Our model sought to utilize routinely obtained clinical data to improve outcome prediction for patients with cirrhosis. As such, we included both "classic" liver-related datapoints (creatinine, bilirubin, etc.) as well as other routinely collected datapoints to determine whether additional factors may be important (bicarbonate, hemoglobin, potassium, etc.). Clinically, these are routine measures that reflect the severity of systemic disease processes (acidosis, anemia, the renin-aldosterone axis) and could provide additional accuracy to model predictions. To prevent over-fitting, we used 5-fold cross validation as described in the manuscript.

Q5.

As the study revealed that ALKP and Hb were among the most informative parameters for mortality prediction, did the authors exclude patients with recent GI bleeding or end stage renal disease (which may contribute to anemia) and osteoporosis or recent bone fracture (which may cause an elevated ALKP) when referencing the EHR?

We appreciate the reviewer’s expert comments and in-depth and thoughtful feedback. The reviewer raised a great point. In this study, we did not further investigate if patients had recent GI bleeding episodes, end stage renal disease, or osteoporosis or recent bone fracture, which may cause an elevated ALKP. We have added this as a future direction in the Discussion section as follows.

“Our future work will further investigate patients with GI bleeding, end stage renal disease, and osteoporosis or recent bone fracture as these conditions may cause an elevated ALKP.”

Q6.

Stated in "Statistical analysis": "The LR and RF models were configured by the default options in package of Scikit-learn in Python 3.". The best hyperparameters for a Random Forest classifier are not likely to be determined ahead of time, and tuning the hyperparameters to determine the optimal settings is usually inevitable. Please specify the final configuration of the LR and RF models in the current study.

We appreciate the reviewer’s feedback and our apologies for excluding the configuration of LR and RF models. We have added more details with regard to hyperparameter tuning as follows.

“We performed a grid search of hyperparameters for the RF and LR models by five-fold cross validation. For the RF model, we searched over 200, 500, and 700 for the number of trees in the forest, and over the auto, sqrt, and log2 options for the number of features considered at the best split. For the LR model, we searched over the L1 and L2 norms for the penalization, and over 10 values spaced evenly on a log scale of [0, 4] for the inverse of the regularization strength.

The RF model was configured as follows: the number of trees in the RF was set to 500; the maximum number of features that could be used in each tree was set to the square root of the total number of features; and the minimum number of samples at a leaf node of a tree was set to 1. The LR model was configured as follows: the L2 norm of the model coefficients was used in the penalization; the stopping criterion was set to 1.0 × 10^-4; and the inverse of the regularization strength, which reduces potential overfitting, was set to 1.0.”
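The grid search described above might look roughly as follows in scikit-learn (a sketch on synthetic data, not the study cohort; 'auto' is omitted from the max_features grid because recent scikit-learn releases removed that option for classifiers):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the cohort data.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Random forest grid: number of trees and the max-features rule for the
# best split, scored by AUC under five-fold cross validation.
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [200, 500, 700], "max_features": ["sqrt", "log2"]},
    cv=5, scoring="roc_auc",
)
rf_search.fit(X, y)

# Logistic regression grid: L1/L2 penalty and 10 values of the inverse
# regularization strength C, evenly spaced on a log scale over [0, 4].
lr_search = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    {"penalty": ["l1", "l2"], "C": np.logspace(0, 4, 10)},
    cv=5, scoring="roc_auc",
)
lr_search.fit(X, y)

print(rf_search.best_params_)
print(lr_search.best_params_)
```

The liblinear solver is chosen here because it supports both L1 and L2 penalties; the study's exact solver is not stated.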

Q7. Stated in "Results" of "Abstract": "The DNN model performed the best ... for 90, 180, and 365 day mortality respectively." However, in "Results" of the manuscript, it was stated that: The average AUC was 0.82 (0.79 and 0.76) for DNN model, and 0.83 (0.80 and 0.79) for RF model in the case of 90-day (180-day and 365-day) prediction for the case of 41 variables. And Figure 2 also showed RF, instead of DNN, performed the best?

We really appreciate the reviewer’s attention to the detail of our manuscript. Our apologies for the inconsistency of results between the Abstract and Results section. We have corrected the “Results” of the “Abstract” as follows.

“For example, the DNN model achieved an AUC of 0.88, 0.86, and 0.85 for 90, 180, and 365-day mortality, respectively, as compared to the MELD score, which resulted in corresponding AUCs of 0.81, 0.79, and 0.76 for the same instances.”
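As a toy illustration of the metrics being compared here (hypothetical labels and predicted probabilities, not the study's predictions), AUC is threshold-free while f1 depends on a classification cutoff:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

# Hypothetical ground-truth mortality labels and model probabilities.
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

auc = roc_auc_score(y_true, y_prob)                  # ranking quality
f1 = f1_score(y_true, (y_prob >= 0.5).astype(int))   # at a 0.5 cutoff

print(round(auc, 2), round(f1, 2))  # 0.75 0.67
```

With four examples, three of the four positive-negative pairs are ranked correctly, giving AUC 0.75; at the 0.5 cutoff, precision is 1.0 and recall 0.5, giving f1 ≈ 0.67.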

Q8.

Quoted from "Results of Prediction Models": "besides these 4 variables, other features such as alkaline phosphatase values, Alanine aminotransferase values, hemoglobin values, and hospital admission start date (date difference in days between diagnosis of liver cirrhosis and previous hospital admission start dates) were also top features."

Q8-0: Why was alanine aminotransferase not mentioned in "Discussion" (Quoted: "other features such as hemoglobin, alkaline phosphatase (AP) and time since recent hospitalization were also top features and might play an important role in mortality prediction.")?

Thank you. We appreciate the reviewer’s insight and feedback. Our apologies for excluding alanine aminotransferase from the “Discussion” section. We have added it to the Discussion as follows.

“Although the 4 variables used in MELD-Na model were among the top most informative features, other features such as hemoglobin, alkaline phosphatase, alanine aminotransferase, and time since recent hospitalization were also top features and might play an important role in mortality prediction.”

Q8-1: According to Figure 3(c), "hospital admission start dates" ranked as the least important feature in LR. What were the possible explanations?

Thank you. This is a great question aimed at the possible explanations drawn by different models. One possible reason is that the combination of the other features had already achieved high prediction accuracy in the LR model, leaving this feature little additional importance.

Q8-2: In Figure 3, some features, such as "Reference Event-Facility" and "Age at event", seemed to be important features in all three models, even more important than hemoglobin. Those features should be mentioned in discussion as well.

We appreciate and agree with the reviewer’s comment. We have added text to the manuscript expanding our discussion as follows.

“In addition, the features ‘Reference Event-Facility’ and ‘Age at event’ were also important features indicated by all three models, which implied that the facility to which patients presented and their age at first diagnosis had strong associations with mortality.”

Reviewer #2:

The authors seek to define an improved prognostic metric for cirrhosis using deep neural networks and machine learning. This is an important goal given limitations of the MELD/MELD-NA score.

Critiques:

- It may be helpful to cite and incorporate newer data on the MELD score. For example PMID: 31394020. This paper supports the authors’ claim that an improved prognostic metric is needed.

We appreciate the reviewer’s insight and feedback. Our apologies for not including the newer data on the MELD score in the original manuscript. We have added the reference as follows.

Godfrey EL, Malik TH, Lai JC, Mindikoglu AL, Galván NTN, Cotton RT, et al. The decreasing predictive power of MELD in an era of changing etiology of liver disease. Am J Transplant. 2019. doi:10.1111/ajt.15559

- The authors claim that DNN provides the best performance but it appears RF has the best AUC at each of the three time points.

We appreciate and agree with the reviewer’s comment, and our apologies for the confusion. We have corrected the Results section of the Abstract with the new results as follows.

“For example, the DNN model achieved an AUC of 0.88, 0.86, and 0.85 for 90, 180, and 365-day mortality respectively as compared to the MELD which only had an AUC of 0.81, 0.79, and 0.76 for the same instances.”

- The subject selection by diagnosis would include subjects with non-cirrhotic portal hypertension. While it may be difficult to exclude such subjects, this issue should at least be addressed and mitigated if possible

Thank you. We appreciate the reviewer’s insightful feedback. We have added this limitation to the Discussion section as follows.

“Our study has another limitation. The cohort selection based on diagnosis codes (e.g., K76.6) may include patients with non-cirrhotic disease, although these conditions are frequently seen among patients with cirrhosis. Our inclusion criteria are based on literature that was validated against chart review with good specificity.[1-2]”

[1] Lo Re V, Lim JK, Goetz MB, Tate J, Bathulapalli H, Klein MB, et al. Validity of diagnostic codes and liver-related laboratory abnormalities to identify hepatic decompensation events in the Veterans Aging Cohort Study. Pharmacoepidemiol Drug Saf. 2011;20: 689–99. doi:10.1002/pds.2148

[2] Chang EK, Christine YY, Clarke R, Hackbarth A, Sanders T, Esrailian E, et al. Defining a Patient Population With Cirrhosis: An Automated Algorithm With Natural Language Processing. J Clin Gastroenterol. 2016.

- What were the causes of death in these patients? Were they liver-related? Perhaps MELD is performing poorly because it is inferior at predicting non-liver related mortality.

We appreciate and agree with the reviewer’s insightful feedback. In our study, we stated that “The primary outcome was all-cause mortality ascertained by the medical record” for the cohort of patients with liver cirrhosis. We have added this limitation regarding causes of death to the Discussion section as follows.

“The outcome of interest for these analyses was all-cause mortality, which we acknowledge may not always represent liver-related causes of death.”

- The authors describe selecting features from an initial pool. How was the initial pool of features selected? Please justify why the original pool of variables may have a priori utility of prognosis in cirrhosis or discuss why they do not need any expectation of utility.

Thank you for the reviewer’s insightful feedback. We have added the following explanations for the initial pool of feature selection.

“Baseline demographic characteristics such as age, race and ethnicity, and laboratory features collected from blood such as serum aspartate aminotransferase, alanine aminotransferase, and total bilirubin were all informative predictors for mortality predictions in patients with liver cirrhosis.[3][4]”

[3] Radisavljevic MM, Bjelakovic GB, Nagorni A V., Stojanovic MP, Radojkovicn MD, Jovic JZ, et al. Predictors of Mortality in Long-Term Follow-Up of Patients with Terminal Alcoholic Cirrhosis: Is It Time to Accept Remodeled Scores. Med Princ Pract. 2017. doi:10.1159/000451057

[4] Li Y, Chaiteerakij R, Kwon JH, Jang JW, Lee HL, Cha S, et al. A model predicting short-term mortality in patients with advanced liver cirrhosis and concomitant infection. Med (United States). 2018. doi:10.1097/MD.0000000000012758

- “We included features that had more than 10% non-missing values, otherwise we discarded them” This implies that features could have up to 90% missing values. A more typical approach would be to only include features that have less than 5 or 10% missing values.

- What metric was used to assess feature importance?

We appreciate and agree with the reviewer’s comment. Missing values are a common issue in electronic health record data. We agree with the reviewer that 10% non-missing values may seem insufficient, but we tried to retain as many features as possible for analysis by using a low threshold (10%). We acknowledge this is a limitation and have added it to the Discussion section as follows.

“Lastly, we did have some features with large amounts of missing data requiring feature selection and imputation. This situation is commonly encountered when using clinical data for research purposes and including these cases in the pipeline improves the generalizability of the results.”

The metrics used to assess feature importance for each model are described below, and we have added this description to the Methods section.

“The coefficients for each input variable retrieved from the LR model were used to measure feature importance for the LR model. The mean decrease in impurity of a feature in the trained RF model was used to assess feature importance for the RF model. We used the “iNNvestigate” package with the gradient method to calculate feature importance for the DNN model.”
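As a minimal sketch (on synthetic data, not the study cohort), the first two importance measures map directly onto standard scikit-learn attributes; the gradient-based DNN importance via iNNvestigate is omitted here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the feature matrix (synthetic, illustrative only).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# LR importance: magnitude of the fitted coefficients.
lr = LogisticRegression(max_iter=1000).fit(X, y)
lr_importance = np.abs(lr.coef_[0])

# RF importance: mean decrease in impurity, exposed by scikit-learn
# as feature_importances_ (non-negative, sums to 1 over all features).
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
rf_importance = rf.feature_importances_
```

Both arrays have one entry per input feature and can be ranked to produce importance plots like Figure 3.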

- Please provide plausible/physiologic explanations for why the selected features should/could be predictive of mortality

Thank you for the reviewer’s insightful feedback. We have added the following explanations for why the selected features should be predictive.

“The selected features are predictive for mortality in patients with liver cirrhosis. Baseline demographic characteristics such as age, race and ethnicity, and laboratory features collected from blood such as serum aspartate aminotransferase, alanine aminotransferase, and total bilirubin were all informative predictors for mortality predictions in patients with liver cirrhosis.[3][4]”

[3] Radisavljevic MM, Bjelakovic GB, Nagorni A V., Stojanovic MP, Radojkovicn MD, Jovic JZ, et al. Predictors of Mortality in Long-Term Follow-Up of Patients with Terminal Alcoholic Cirrhosis: Is It Time to Accept Remodeled Scores. Med Princ Pract. 2017. doi:10.1159/000451057

[4] Li Y, Chaiteerakij R, Kwon JH, Jang JW, Lee HL, Cha S, et al. A model predicting short-term mortality in patients with advanced liver cirrhosis and concomitant infection. Med (United States). 2018. doi:10.1097/MD.0000000000012758

- Please justify the use of mean/mode for missing data rather than a more sophisticated method of imputation (e.g. multiple imputation)

We appreciate and agree with the reviewer’s comment. We have rerun all analyses using multiple imputation for the continuous variables and obtained better predictive performance than with the mean strategy. We therefore moved all results obtained with the mean strategy to the supplementary materials and replaced the main results with those using multiple imputation for missing values. The results of Figure 2 and Table 2 are attached here.

Table 2. Prediction metrics [mean (std)] of the 3 time-window cases for the 3 machine learning models.

Model | Period (days) | Accuracy | Precision | Recall | F1-score | Specificity
DNN (all variables) | 365 | 0.83 (0.01) | 0.27 (0.0) | 0.65 (0.04) | 0.38 (0.01) | 0.85 (0.01)
DNN (all variables) | 180 | 0.86 (0.02) | 0.26 (0.02) | 0.64 (0.03) | 0.37 (0.02) | 0.88 (0.02)
DNN (all variables) | 90 | 0.90 (0.02) | 0.30 (0.05) | 0.63 (0.04) | 0.40 (0.04) | 0.92 (0.02)
LR (all variables) | 365 | 0.77 (0.01) | 0.21 (0.0) | 0.72 (0.01) | 0.33 (0.01) | 0.77 (0.01)
LR (all variables) | 180 | 0.79 (0.0) | 0.19 (0.0) | 0.75 (0.0) | 0.31 (0.0) | 0.79 (0.0)
LR (all variables) | 90 | 0.81 (0.01) | 0.18 (0.0) | 0.78 (0.03) | 0.29 (0.01) | 0.81 (0.01)
RF (all variables) | 365 | 0.92 (0.0) | 0.47 (0.04) | 0.37 (0.02) | 0.41 (0.02) | 0.96 (0.0)
RF (all variables) | 180 | 0.93 (0.0) | 0.46 (0.03) | 0.40 (0.02) | 0.43 (0.02) | 0.97 (0.0)
RF (all variables) | 90 | 0.94 (0.0) | 0.43 (0.01) | 0.41 (0.02) | 0.42 (0.01) | 0.97 (0.0)
DNN (4 MELD-Na variables) | 365 | 0.78 (0.02) | 0.20 (0.01) | 0.59 (0.04) | 0.30 (0.01) | 0.80 (0.03)
DNN (4 MELD-Na variables) | 180 | 0.80 (0.03) | 0.18 (0.02) | 0.61 (0.05) | 0.28 (0.02) | 0.81 (0.04)
DNN (4 MELD-Na variables) | 90 | 0.80 (0.02) | 0.16 (0.01) | 0.66 (0.03) | 0.25 (0.01) | 0.81 (0.02)
LR (4 MELD-Na variables) | 365 | 0.78 (0.01) | 0.20 (0.01) | 0.58 (0.0) | 0.30 (0.01) | 0.80 (0.01)
LR (4 MELD-Na variables) | 180 | 0.80 (0.01) | 0.18 (0.01) | 0.61 (0.03) | 0.28 (0.01) | 0.81 (0.0)
LR (4 MELD-Na variables) | 90 | 0.81 (0.01) | 0.16 (0.01) | 0.64 (0.02) | 0.25 (0.01) | 0.82 (0.01)
RF (4 MELD-Na variables) | 365 | 0.85 (0.0) | 0.22 (0.02) | 0.36 (0.04) | 0.27 (0.02) | 0.89 (0.0)
RF (4 MELD-Na variables) | 180 | 0.87 (0.0) | 0.20 (0.01) | 0.36 (0.01) | 0.26 (0.01) | 0.90 (0.01)
RF (4 MELD-Na variables) | 90 | 0.89 (0.0) | 0.20 (0.02) | 0.38 (0.04) | 0.26 (0.03) | 0.92 (0.0)
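The revised imputation scheme (iterative imputation for continuous variables, mode for categorical) can be sketched with scikit-learn; the column names here are hypothetical stand-ins for the EHR features, and IterativeImputer is used as a chained-equations approximation of the multiple-imputation step:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

# Hypothetical toy extract with missing values.
df = pd.DataFrame({
    "sodium": [137.0, np.nan, 140.0, 135.0],
    "creatinine": [1.1, 0.9, np.nan, 2.3],
    "gender": ["F", np.nan, "M", "M"],
})

num_cols = ["sodium", "creatinine"]
cat_cols = ["gender"]

# Iterative (chained-equations) imputation for continuous variables.
df[num_cols] = IterativeImputer(random_state=0).fit_transform(df[num_cols])

# Mode (most frequent value) imputation for categorical variables.
df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])
```

After both steps the frame contains no missing values and can be fed to the models.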

- Details are provided for the parameters used for DNN but not for RF and LR. "The LR and RF models were configured by the default options in package of Scikit-learn in Python 3.” It would be helpful to provide similar details for these models.

We appreciate the reviewer’s feedback and our apologies for excluding the configuration of LR and RF models. We have added more details with regard to hyperparameter tuning as follows.

“We performed a grid search of hyperparameters for the RF and LR models using five-fold cross validation. For the RF model, we searched over 200, 500, and 700 trees in the forest, and considered auto, sqrt, and log2 for the number of features at the best split. For the LR model, we searched over the L1 and L2 penalty norms, and over 10 values of the inverse regularization strength spaced evenly on a log scale over [0, 4].

The RF model was configured as follows: the number of trees was set to 500; the maximum number of features considered in each tree was set to the square root of the total number of features; and the minimum number of samples at a leaf node was set to 1. The LR model was configured as follows: the L2 norm was used in the penalization; the stopping criterion was set to 1.0×10^-4; and the inverse of regularization strength, which reduces potential overfitting, was set to 1.0.”
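This tuning procedure can be sketched with scikit-learn's GridSearchCV on synthetic data. Note one assumption: recent scikit-learn releases dropped the "auto" option for max_features, so None stands in for it here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# RF: number of trees and features per split ("auto" in older
# scikit-learn releases; None plays that role in newer ones).
rf_grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [200, 500, 700],
     "max_features": ["sqrt", "log2", None]},
    cv=5,
).fit(X, y)

# LR: penalty norm and 10 inverse-regularization values on a log scale.
lr_grid = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    {"penalty": ["l1", "l2"],
     "C": np.logspace(0, 4, 10)},
    cv=5,
).fit(X, y)
```

`best_params_` on each fitted grid reports the winning configuration under five-fold cross validation.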

- MDClone is mentioned for the first time in the discussion. This should be explained earlier.

Thank you for this suggestion. We have explained the MDClone platform for storing data earlier in the Methods section as follows.

“Our institution partnered with MDClone[5][6] (Beer Sheva, Israel) for data storage and retrieval. The MDClone platform is a data engine that stores EHR medical events in temporal order for each patient. Queries can be built to pull computationally derived or original EHR data from the platform.”

[5] Foraker R, Yu S, Michelson A, Pineda Soto J, Colvin R, Loh F, et al. Spot the Difference: Comparing Results of Analyses from Real Patient Data and Synthetic Derivatives. JAMIA OPEN.

[6] Guo A, Foraker RE, MacGregor RM, Masood FM, Cupps BP, Pasque MK. The Use of Synthetic Electronic Health Record Data and Deep Learning to Improve Timing of High-Risk Heart Failure Surgical Intervention by Predicting Proximity to Catastrophic Decompensation. Front Digit Heal. 2020. doi:10.3389/fdgth.2020.576945

- Table 2: How were these combinations of recall/sensitivity and specificity selected? For clinical applications it is often useful to set one of these metrics (sensitivity or specificity) at a level expected to be clinically useful and then compare the other metric amongst the models. I recommend some consideration of the tradeoffs of sensitivity and specificity for this application.

We appreciate the reviewer’s feedback and thoughtful comments. In Table 2, we used a threshold of 0.5 to calculate evaluation metrics such as recall and specificity. We agree that it is clinically useful to consider different tradeoffs of sensitivity and specificity for a specific clinical application. We have conducted the analysis considering 10 other thresholds, i.e., 0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, and 0.95. The results of the models with all 41 variables for the 365-day case are summarized in Table S3 as follows.

Table S3. Prediction metrics [mean (std)] of the 3 machine learning models under 10 different thresholds for the 365-day case.

Model | Threshold | Accuracy | Precision | Recall | F1-score | Specificity
DNN | 0.05 | 0.52 (0.06) | 0.14 (0.01) | 0.93 (0.02) | 0.24 (0.02) | 0.48 (0.07)
DNN | 0.1 | 0.63 (0.06) | 0.17 (0.02) | 0.87 (0.04) | 0.28 (0.02) | 0.61 (0.07)
DNN | 0.2 | 0.78 (0.01) | 0.22 (0.01) | 0.71 (0.01) | 0.34 (0.01) | 0.79 (0.01)
DNN | 0.3 | 0.79 (0.02) | 0.22 (0.01) | 0.69 (0.05) | 0.34 (0.02) | 0.79 (0.03)
DNN | 0.4 | 0.82 (0.03) | 0.25 (0.03) | 0.62 (0.03) | 0.35 (0.03) | 0.83 (0.04)
DNN | 0.6 | 0.86 (0.01) | 0.30 (0.02) | 0.51 (0.03) | 0.37 (0.02) | 0.90 (0.01)
DNN | 0.7 | 0.88 (0.01) | 0.31 (0.02) | 0.44 (0.02) | 0.37 (0.02) | 0.92 (0.01)
DNN | 0.8 | 0.91 (0.0) | 0.41 (0.03) | 0.33 (0.04) | 0.37 (0.03) | 0.96 (0.0)
DNN | 0.9 | 0.93 (0.0) | 0.60 (0.06) | 0.21 (0.02) | 0.31 (0.02) | 0.99 (0.0)
DNN | 0.95 | 0.93 (0.0) | 0.86 (0.04) | 0.16 (0.03) | 0.26 (0.04) | 1.0 (0.0)
LR | 0.05 | 0.25 (0.01) | 0.09 (0.0) | 0.96 (0.0) | 0.17 (0.0) | 0.19 (0.01)
LR | 0.1 | 0.35 (0.01) | 0.10 (0.0) | 0.92 (0.0) | 0.19 (0.0) | 0.30 (0.01)
LR | 0.2 | 0.51 (0.01) | 0.12 (0.0) | 0.85 (0.01) | 0.22 (0.0) | 0.48 (0.01)
LR | 0.3 | 0.61 (0.01) | 0.14 (0.0) | 0.78 (0.01) | 0.24 (0.0) | 0.60 (0.01)
LR | 0.4 | 0.70 (0.01) | 0.17 (0.01) | 0.71 (0.02) | 0.27 (0.01) | 0.70 (0.01)
LR | 0.6 | 0.82 (0.01) | 0.23 (0.01) | 0.53 (0.02) | 0.32 (0.01) | 0.85 (0.01)
LR | 0.7 | 0.86 (0.0) | 0.27 (0.01) | 0.44 (0.02) | 0.33 (0.01) | 0.90 (0.0)
LR | 0.8 | 0.89 (0.0) | 0.32 (0.01) | 0.32 (0.01) | 0.32 (0.01) | 0.94 (0.0)
LR | 0.9 | 0.91 (0.0) | 0.37 (0.04) | 0.15 (0.02) | 0.22 (0.03) | 0.98 (0.0)
LR | 0.95 | 0.92 (0.0) | 0.39 (0.07) | 0.07 (0.01) | 0.12 (0.02) | 0.99 (0.0)
RF | 0.05 | 0.51 (0.01) | 0.13 (0.0) | 0.93 (0.01) | 0.23 (0.0) | 0.48 (0.01)
RF | 0.1 | 0.63 (0.01) | 0.16 (0.0) | 0.88 (0.01) | 0.28 (0.0) | 0.61 (0.01)
RF | 0.2 | 0.77 (0.01) | 0.22 (0.01) | 0.75 (0.01) | 0.34 (0.01) | 0.77 (0.01)
RF | 0.3 | 0.85 (0.0) | 0.28 (0.01) | 0.62 (0.01) | 0.39 (0.01) | 0.86 (0.01)
RF | 0.4 | 0.89 (0.0) | 0.36 (0.01) | 0.50 (0.01) | 0.42 (0.01) | 0.92 (0.0)
RF | 0.6 | 0.92 (0.0) | 0.49 (0.02) | 0.29 (0.02) | 0.36 (0.02) | 0.97 (0.0)
RF | 0.7 | 0.92 (0.0) | 0.53 (0.03) | 0.21 (0.01) | 0.30 (0.02) | 0.98 (0.0)
RF | 0.8 | 0.93 (0.0) | 0.62 (0.03) | 0.15 (0.01) | 0.25 (0.01) | 0.99 (0.0)
RF | 0.9 | 0.93 (0.0) | 0.73 (0.04) | 0.11 (0.01) | 0.19 (0.01) | 1.0 (0.0)
RF | 0.95 | 0.93 (0.0) | 0.82 (0.04) | 0.09 (0.01) | 0.16 (0.01) | 1.0 (0.0)
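The threshold sweep behind Table S3 reduces to varying the probability cutoff before computing sensitivity and specificity. A minimal sketch with toy labels and predicted probabilities (not the study data):

```python
import numpy as np

def metrics_at_threshold(y_true, y_prob, thresh):
    """Recall (sensitivity) and specificity at a given probability cutoff."""
    y_pred = (y_prob >= thresh).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall, specificity

y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90])

# Lower thresholds trade specificity for recall, and vice versa.
sweep = {t: metrics_at_threshold(y_true, y_prob, t)
         for t in (0.05, 0.3, 0.5, 0.9)}
```

At a cutoff of 0.05 every case is flagged (perfect recall, zero specificity); raising the cutoff reverses the tradeoff, mirroring the pattern in Table S3.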

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Ming-Lung Yu

22 Jul 2021

PONE-D-20-37171R1

Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning

PLOS ONE

Dear Dr. Foraker,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 05 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ming-Lung Yu, MD, PhD

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The authors have largely addressed my concerns. I still question the use of features with only more than 10% non-missing values. My understanding this is contrary to common practice which would require a much higher proportion of non-missing values.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Ta-Wei Liu, M.D.

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Aug 31;16(8):e0256428. doi: 10.1371/journal.pone.0256428.r004

Author response to Decision Letter 1


25 Jul 2021

Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning

We would like to thank the reviewers for their informed, thoughtful, and helpful comments. Please find our responses to the reviews below in italics. We hope that you will find the manuscript suitable for publication in PLOS ONE.

Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The authors have largely addressed my concerns. I still question the use of features with only more than 10% non-missing values. My understanding this is contrary to common practice which would require a much higher proportion of non-missing values.

We appreciate and agree with the reviewer’s comment, and our apologies for the confusion in our original manuscript and in the response to the first-round revision. Among all 41 features, most had low missing value rates (<20%), and only four features had high missing value rates (shown in Table 1 below). We tried to keep as many features as possible for the study.

Table 1. Missing value rates of all 41 studied features.

Variable | Non-missing value rate (%) | Missing value rate (%)
Gender | 100 | 0
Primary race | 97.36 | 2.64
Ethnicity | 67.84 | 32.16
Age at event | 100 | 0
Associated visit type | 86.43 | 13.57
Condition | 100 | 0
Condition type | 100 | 0
Present on admission | 84.22 | 15.78
Diagnosis type | 84.22 | 15.78
Ascites-Condition | 100 | 0
hosp-Admission start date | 65.42 | 34.58
hosp-Admission end date | 65.42 | 34.58
bmi-Average calculated bmi | 55.49 | 44.51
bmi-Average weight | 61.45 | 38.55
bmi-Average height | 58 | 42
smk-Alcohol use | 48.13 | 51.87
Reference Event-Facility | 99.99 | 0.01
sodium-Result value numeric | 80.49 | 19.51
sodium-Age at event | 80.49 | 19.51
INR-Result value numeric | 62.08 | 37.92
INR-Age at event | 62.15 | 37.85
creatinine-Result value numeric | 79.99 | 20.01
creatinine-Age at event | 80.06 | 19.94
Tbili-Result value numeric | 77.01 | 22.99
Tbili-Age at event | 77.39 | 22.61
Mcv-Estimated result | 79.87 | 20.13
Mcv-Age at event | 79.87 | 20.13
hemoglobin-Result value numeric | 80.9 | 19.1
hemoglobin-Age at event | 80.9 | 19.1
potassium-Result value numeric | 80.56 | 19.44
potassium-Age at event | 80.57 | 19.43
bicarbonate-Result value numeric | 16.19 | 83.81
bicarbonate-Age at event | 16.19 | 83.81
alt-Result value numeric | 76.55 | 23.45
alt-Age at event | 76.71 | 23.29
ast-Result value numeric | 77.42 | 22.58
ast-Age at event | 77.42 | 22.58
alkaline-Result value numeric | 77.37 | 22.63
alkaline-Age at event | 77.39 | 22.61
abo/Rh-Age at event | 22.62 | 77.38
AFP-Age at event | 14.84 | 85.16
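Per-feature missing rates like those in Table 1 are a one-liner in pandas; the toy frame below uses hypothetical columns standing in for the EHR extract:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the EHR feature table.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "F"],
    "INR": [1.1, np.nan, 1.4, np.nan],
    "AFP": [np.nan, np.nan, np.nan, 5.2],
})

# Percentage of missing values per column, rounded as in Table 1.
missing_rate = (df.isna().mean() * 100).round(2)
non_missing_rate = 100 - missing_rate
```

Sorting `missing_rate` descending immediately surfaces the handful of features with very high missingness.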

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Ming-Lung Yu

9 Aug 2021

Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning

PONE-D-20-37171R2

Dear Dr. Foraker,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ming-Lung Yu, MD, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: I Don't Know

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear authors:

There were at least five features with missing data rates larger than 50%, which might raise concerns:

bicarbonate-Result value numeric 16.19 83.81

bicarbonate-Age at event 16.19 83.81

AFP-Age at event 14.84 85.16

abo/Rh-Age at event 22.62 77.38

smk-Alcohol use 48.13 51.87

The author could discuss more about the features with high missing values.

Firstly, what is the rationale for choosing each feature with a large amount of missing values? Secondly, the authors may indicate the pattern of the missing data, i.e., were the data missing completely at random or not? Is there a specific mechanism causing the missing data? Based on the underlying mechanism of the missing data, the authors may address the model of the distribution of each feature, which would validate the results of the multiple imputation.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Ta-Wei Liu

Reviewer #2: No

Acceptance letter

Ming-Lung Yu

23 Aug 2021

PONE-D-20-37171R2

Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning

Dear Dr. Foraker:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Ming-Lung Yu

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Prediction performance by deep neural network (DNN), random forest (RF) and logistic regression (LR) with mean imputation strategy.

    Figure a, b, c is for the case of mortality within 365 days, 180 days, and 90 days, respectively.

    (PDF)

    S1 Table. The study features and feature description.

    (DOCX)

    S2 Table. Prediction metrics [n (%)] of 3 period cases for 3 machine learning models with mean strategy for imputation.

    (DOCX)

    S3 Table. Prediction metrics [n (%)] of 3 machine learning models under 10 different tradeoffs for case of 365 days.

    (DOCX)

    Attachment

    Submitted filename: Response to Reviewers.docx


    Data Availability Statement

    As the data set contains potentially identifying and sensitive patient information, we cannot share these data publicly without permission. To request the data, please contact the Washington University Human Research Protection Office by mail at MS08089-29-2300, 600 S. Euclid Avenue, Saint Louis, MO 63110 or by phone at 314-747-6800 or by email hrpo@wustl.edu.


    Articles from PLoS ONE are provided here courtesy of PLOS
