Training and Validation of Deep Neural Networks for the Prediction of 90-Day Post-Liver Transplant Mortality Using UNOS Registry Data

Brent D Ershoff; Christine K Lee; Christopher L Wray; Vatche G Agopian; Gregor Urban; Pierre Baldi; Maxime Cannesson

doi:10.1016/j.transproceed.2019.10.019

Author manuscript; available in PMC: 2020 Sep 29.

Published in final edited form as: Transplant Proc. 2020 Jan 8;52(1):246–258. doi: 10.1016/j.transproceed.2019.10.019

PMCID: PMC7523496 NIHMSID: NIHMS1628916 PMID: 31926745

Abstract

Prediction models of post-liver transplant mortality are crucial so that donor organs are not allocated to recipients with unreasonably high probabilities of mortality. Machine learning algorithms, particularly deep neural networks (DNNs), can often achieve higher predictive performance than conventional models. In this study, we trained a DNN to predict 90-day post-transplant mortality using preoperative variables and compared its performance to that of the Survival Outcomes Following Liver Transplantation (SOFT) and Balance of Risk (BAR) scores, using United Network for Organ Sharing data on adult patients who received a deceased donor liver transplant between 2005 and 2015 (n = 57,544). The DNN was trained using 202 features, and the best DNN’s architecture consisted of 5 hidden layers with 110 neurons each. The area under the receiver operating characteristics curve (AUC) of the best DNN model was 0.703 (95% CI: 0.682-0.726) as compared to 0.655 (95% CI: 0.633-0.678) and 0.688 (95% CI: 0.667-0.711) for the BAR score and SOFT score, respectively. In conclusion, despite its complexity, the DNN did not achieve a significantly higher discriminative performance than the SOFT score. Future risk models will likely benefit from the inclusion of other data sources, including high-resolution clinical features for which DNNs are particularly apt to outperform conventional statistical methods.

LIVER transplantation is the definitive treatment for irreversible liver failure, with thousands of lives saved each year in the United States through deceased donor organ donation. Unfortunately, with the demand for donor organs far exceeding the supply, thousands of patients die waiting for this lifesaving procedure [1]. As such, the development of predictive models of post-transplant mortality is crucial to avoid transplanting an individual with an unacceptably low probability of post-transplant survival. As the severity of recipient medical comorbidities has grown, there is concern that an increasing number of patients are becoming too sick to transplant [2,3]. While the prediction of preoperative mortality among those waiting for an organ has been quite successful with the adoption of the Model for End-Stage Liver Disease (MELD) score to prioritize organ allocation [3-6], the accurate prediction of post-transplant mortality has been difficult and less successful [7].

Several predictive models have been developed using preoperative recipient and organ donor factors from either registry- or institution-level data. These have been developed with the aim of avoiding futile transplantation, assisting with donor-recipient matching, and comparing outcomes across different institutions. Two of the most commonly cited risk models are the Balance of Risk (BAR) score [8] and the Survival Outcomes Following Liver Transplantation (SOFT) score [9], both of which predict 90-day post-liver transplant mortality using United Network for Organ Sharing (UNOS) registry data. The SOFT score incorporated a combination of 18 recipient and donor variables and achieved a C-statistic of 0.7, and the BAR score achieved a C-statistic of 0.7 using a combination of just 6 recipient and donor variables. Despite the popularity of these models in academic circles, their clinical use has been limited by their modest discriminative performance, with decision-making left to the judgment of the selection committee and transplant clinicians.

Risk models in medicine have traditionally been based on regression models whereby the outcome variable is modeled as a linear combination of predictor variables and thereby have been limited in their ability to model high-order interactions and nonlinear functions of the features. Machine learning algorithms, which allow for more flexible modeling of the data, can often achieve higher predictive performance than more conventional statistical models. One class of machine learning algorithms, deep neural networks (DNNs), also known as deep learning, has become popular in recent years because of its success in solving a variety of problems in computer vision [10-15], high-energy physics [16,17], chemistry [18-20], and biology [21-23]. In clinical medicine, predictive modeling using machine learning has been applied to the prediction of cardiorespiratory instability [24,25], 30-day readmission [26,27], and in-hospital postoperative mortality [28].

The use of DNNs in liver transplantation has been relatively limited. To date, DNNs have been largely unexplored in the prediction of post-liver transplant mortality using UNOS data. In this manuscript, we present the development and validation of a DNN model using preoperative variables from the UNOS registry to predict 90-day post-liver transplant mortality. We compare the discriminative ability of the DNN model to that of the BAR and SOFT score models.

MATERIALS AND METHODS

This manuscript follows the “Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View” [29].

UNOS Data Extraction

All data for this study were extracted from the standard transplant analysis and research (STAR) dataset, which contains patient-level data for all transplants in the United States reported to the Organ Procurement and Transplantation Network (OPTN) since October 1, 1989. The database has been used in numerous important studies of transplantation [30] and contains data on pretransplant variables pertaining to the recipient, donor variables reported from the organ procurement organization, as well as post-transplantation outcome data. The OPTN mortality data are linked by UNOS to the Social Security Death Master file to improve ascertainment of recipient death data [30]. In accordance with the OPTN Final Rule, 42 CFR Part 121, UNOS provided the author (B.E.) with the patient-level, nonidentifiable data extracted from the STAR database maintained by UNOS for the purpose of conducting this research. Access to these data was approved through a data-use agreement with UNOS.

Study Sample

The study sample included adult deceased donor liver transplants performed from 2005 to 2015. Transplants performed from 2016 onward were not included in this analysis to ensure adequate time for ascertainment of outcome data, and transplants performed prior to 2005 were excluded because 1. transplants before 2002 were performed prior to implementation of the MELD score allocation system and 2. data on several predictor variables were either not reported or were inconsistently recorded prior to that time. Exclusion criteria included age less than 18 years, living donor transplantation (n = 2347), multiple-organ transplantation (n = 5267), as well as those lost to follow-up within 90 days post-transplantation (n = 70) as these cases were excluded in the development of the SOFT score and BAR score (Fig 1). For patients who underwent more than 1 liver transplantation (n = 3503), we included each of the transplantations in the analysis, as did other comparable prediction models. The study sample included split liver as well as donation after cardiac death donors. In sum, we analyzed 57,544 recipients.

Fig 1. — Flow chart of study cohort. The flow chart illustrates the inclusion and exclusion criteria of liver transplant recipients included in the study sample. STAR, Standard Transplant Analysis and Research. *Based on OPTN data as of September 9, 2016.

Model Endpoint Definition

The occurrence of death within 90 days from transplantation was extracted as a binary event (0, 1). An event occurred if the value of the variable “pstatus” from the STAR dataset was equal to “1” and the variable “ptime” was less than or equal to 90. The variable “pstatus” indicates whether the recipient had died post-transplant, and the variable “ptime” indicates the time from transplantation to either death or censoring. These variables are based on the combination of mortality data from the OPTN database as well as verified external sources of death (described above) and not based on the variable “PX_STAT,” which only accounts for death as documented by the OPTN alone.
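The endpoint rule above can be written as a small labeling function. This is a minimal sketch, not the authors' code: the function name, the `horizon_days` parameter, and the example records are hypothetical, and only the rule itself (pstatus equal to 1 and ptime at most 90) comes from the text.

```python
def died_within_90_days(pstatus, ptime, horizon_days=90):
    """Return 1 if the recipient died within the horizon, else 0.

    pstatus: 1 if the recipient died post-transplant, 0 otherwise.
    ptime:   days from transplantation to death or censoring.
    """
    return int(pstatus == 1 and ptime <= horizon_days)


# Hypothetical (pstatus, ptime) pairs, not taken from the STAR dataset.
records = [(1, 30), (1, 90), (1, 91), (0, 45), (0, 2000)]
labels = [died_within_90_days(p, t) for p, t in records]
print(labels)  # [1, 1, 0, 0, 0]
```

A record such as (0, 45), censored before day 90, is labeled 0 by this rule; in the study sample such recipients were excluded as lost to follow-up within 90 days.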

Model Input Features

The original STAR dataset contained 395 variables, many of which were not considered for inclusion in the model. Variables that were excluded from model development included those pertaining to post-transplant data, living donor transplants, multiorgan transplants, and identifier code variables. Variables with zero or near zero variances, high levels of missing data (> 98%) or those that were highly correlated to other variables (r > 0.99) were removed. A few variables with > 50% missing data combined with low clinical significance based on domain experts (B.E. and C.W.) were not analyzed. This resulted in 202 features, including 132 recipient variables and 70 donor-related variables (Table 1). To further reduce the feature set, variables with greater than 50% missing data or those containing greater than 95% zero values were removed, and the remaining variables comprised a reduced feature set (RFS).
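The final reduction step, from the 202 features to the RFS, amounts to two threshold filters per column. The sketch below is illustrative only: the column names and values are hypothetical, missing values are represented as `None`, and computing the zero fraction over observed (non-missing) values is an assumption, since the text does not state the denominator. The correlation filter (r > 0.99) is not shown.

```python
def fraction_missing(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def fraction_zero(values):
    """Share of observed entries equal to zero (assumed denominator)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0
    return sum(v == 0 for v in observed) / len(observed)


def reduced_feature_set(columns, max_missing=0.50, max_zero=0.95):
    """Keep columns that pass both the missingness and zero-value filters."""
    kept = []
    for name, values in columns.items():
        if fraction_missing(values) > max_missing:
            continue  # more than 50% missing
        if fraction_zero(values) > max_zero:
            continue  # more than 95% zeros
        kept.append(name)
    return kept


# Hypothetical columns with 40 rows each.
columns = {
    "dense_feature": [float(v) for v in range(30, 70)],
    "rare_flag": [0] * 39 + [1],             # 97.5% zeros
    "sparse_lab": [None] * 30 + [1.0] * 10,  # 75% missing
}
print(reduced_feature_set(columns))  # ['dense_feature']
```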

Table 1.

Description of Deep Neural Network Input Features

Feature	Description
abo_A^†	Recipient blood type A
abo_AB	Recipient blood type AB
abo_B	Recipient blood type B
abo_don_A	Donor blood type A
abo_don_AB^†	Donor blood type AB
abo_don_B	Donor blood type B
abo_don_O	Donor blood type O
abo_mat	Donor-recipient ABO match level
abo_O	Recipient blood type O
age	Recipient’s age
age_don	Donor’s age
albumin_tx	Recipient’s albumin concentration at transplant
antihype_don	Donor received antihypertensives within 24 h of cross clamp
arginine_don	Donor received arginine vasopressin within 24 h of cross clamp
ascites_tx^*	Recipient’s degree of ascites at transplantation
bact_perit_tcr	Recipient had history of SBP at registration
bmi_calc	Recipient’s BMI at transplantation
bmi_don_calc	Donor’s BMI
bmi_tcr	Recipient’s BMI at registration
bun_don	Donor’s terminal blood urea nitrogen concentration
cardarrest_neuro	Donor had a cardiac arrest after brain death
cdc_risk_hiv_don	Donor had risk factors for blood-borne disease transmission
citizenship^†,^*	Recipient was a United States citizen
citizenship_don^†,^*	Donor was a United States citizen
clin_infect_don	Donor had a clinical infection
cmv_don^*	Donor’s CMV seropositivity
cmv_igg^*	Recipient’s CMV IGG test result at transplant
cmv_igm^‡,^*	Recipient’s CMV IGM test result at transplant
cmv_status^*	Recipient’s CMV seropositivity at transplant
cod_cad_don_1	Donor’s cause of death was due to anoxia
cod_cad_don_2	Donor’s cause of death was due to stroke
cod_cad_don_3	Donor’s cause of death was due to head trauma
cod_cad_don_4^†	Donor’s cause of death was due to a CNS tumor
cold_isch	Cold ischemia time
coronary1^*	Donor coronary angiogram was performed, and result was normal
creat_don	Donor’s terminal creatinine concentration
creat_tx	Recipient’s creatinine concentration at transplant
dayswait_chron	Number of days recipient was on transplant waiting list
ddavp_don	Donor received DDAVP
death_circum_don^*	Donor’s circumstance of death was due to natural causes
death_mech_don^†,^*	Donor’s mechanism of death
dgn_tcr_AHN^†,^*	Recipient’s primary diagnosis at listing: acute hepatic necrosis
dgn_tcr_autoimmune^†,^*	Recipient’s primary diagnosis at listing: autoimmune hepatitis
dgn_tcr_cryptogenic^*	Recipient’s primary diagnosis at listing: cryptogenic cirrhosis
dgn_tcr_etoh^*	Recipient’s primary diagnosis at listing: ETOH cirrhosis
dgn_tcr_etoh_hcv^*	Recipient’s primary diagnosis at listing: ETOH or HCV cirrhosis
dgn_tcr_HBV^†,^*	Recipient’s primary diagnosis at listing: HBV cirrhosis
dgn_tcr_HCC^*	Recipient’s primary diagnosis at listing: HCC cirrhosis
dgn_tcr_HCV^*	Recipient primary diagnosis at listing: HCV cirrhosis
dgn_tcr_NASH^*	Recipient’s primary diagnosis at listing: NASH cirrhosis
dgn_tcr_PBC^†,^*	Recipient’s primary diagnosis at listing: PBC cirrhosis
dgn_tcr_PSC^†,^*	Recipient’s primary diagnosis at listing: PSC cirrhosis
dgn2_tcr_AHN^†,^*	Recipient secondary diagnosis at listing: acute hepatic necrosis
dgn2_tcr_autoimmune^†,^*	Recipient’s secondary diagnosis at listing: autoimmune hepatitis
dgn2_tcr_cryptogenic^†,^*	Recipient’s secondary diagnosis at listing: cryptogenic cirrhosis
dgn2_tcr_etoh^†,^*	Recipient’s secondary diagnosis at listing: ETOH cirrhosis
dgn2_tcr_etoh_hcv^†,^*	Recipient’s secondary diagnosis at listing: ETOH or HCV cirrhosis
dgn2_tcr_HBV^†,^*	Recipient’s secondary diagnosis at listing: HBV cirrhosis
dgn2_tcr_HCC^*	Recipient’s secondary diagnosis at listing: HCC cirrhosis
dgn2_tcr_HCV^†,^*	Recipient’s secondary diagnosis at listing: HCV cirrhosis
dgn2_tcr_NASH^†,^*	Recipient’s secondary diagnosis at listing: NASH cirrhosis
dgn2_tcr_PBC^†,^*	Recipient’s secondary diagnosis at listing: PBC cirrhosis
dgn2_tcr_PSC^†,^*	Recipient’s secondary diagnosis at listing: PSC cirrhosis
diab^*	Recipient had diabetes at registration
diabdur_don	Duration of time that donor had diabetes
diabetes_don	Donor had a history of diabetes
diag_AHN^†,^*	Recipient’s diagnosis at transplant: acute hepatic necrosis
diag_autoimmune^†,^*	Recipient’s diagnosis at transplant: autoimmune hepatitis
diag_cryptogenic^†,^*	Recipient’s diagnosis at transplant: cryptogenic cirrhosis
diag_etoh^*	Recipient’s diagnosis at transplant: ETOH cirrhosis
diag_etoh_hcv^†,^*	Recipient’s diagnosis at transplant: ETOH or HCV cirrhosis
diag_HBV^†,^*	Recipient’s diagnosis at transplant: HBV cirrhosis
diag_HCC^*	Recipient’s diagnosis at transplant: HCC cirrhosis
diag_HCV^*	Recipient’s diagnosis at transplant: HCV cirrhosis
diag_NASH^*	Recipient’s diagnosis at transplant: NASH cirrhosis
diag_PBC^†,^*	Recipient’s diagnosis at transplant: PBC cirrhosis
diag_PSC^†,^*	Recipient’s diagnosis at transplant: PSC cirrhosis
dial_tx	Recipient had dialysis in the wk prior to transplant
distance	Distance from donor hospital to transplant hospital
ebv_igg_cad_don^*	Donor’s EBV IGG test result
ebv_igm_cad_don^†,^*	Donor’s EBV IGM test result
ebv_serostatus^*	Recipient’s EBV seropositivity at transplant
ecd_donor	Donor was an ECD donor per kidney allocation definition
education^*	Recipient’s highest education level at registration
enceph_tx^*	Recipient’s degree of encephalopathy at transplant
end_stat^†,^*	Recipient was status 1 at time of transplant
ethcat_1^*	Recipient’s race is Caucasian
ethcat_2^*	Recipient’s race is of African descent
ethcat_4^*	Recipient’s ethnicity is Hispanic
ethcat_5^†,^*	Recipient’s race is Asian
ethcat_don_1^*	Donor’s race is Caucasian
ethcat_don_2^*	Donor’s race is of African descent
ethcat_don_4^*	Donor’s ethnicity is Hispanic
ethcat_don_5^†,^*	Donor’s race is Asian
ethcat_don_other^†,^*	Donor’s race is other
ethcat_other^†,^*	Recipient’s race is other
ever_approved^‡	Recipient ever had a MELD exception application approved
exc_case	Recipient had MELD exception points at the time of transplantation
exc_diag_id_cat1^*	Recipient’s exception points allotted for HCC
exc_diag_id_cat2^†,^*	Recipient’s exception points allotted for familial amyloidosis
exc_diag_id_cat3^†,^*	Recipient’s exception points allotted for hepatopulmonary syndrome
exc_diag_id_cat4^†,^*	Recipient’s exception points allotted for portopulmonary hypertension
exc_diag_id_cat5^†,^*	Recipient’s exception points allotted for metabolic diseases
exc_diag_id_cat6^†,^*	Recipient’s exception points allotted for hepatic artery thrombosis
exc_diag_id_cat7^*	Recipient’s exception points allotted for other causes
exc_ever	Whether an exception was ever submitted for the recipient
exc_hcc	Recipient’s exception was for HCC
final_inr	Recipient’s INR at transplantation
final_serum_sodium	Recipient’s sodium concentration at transplantation
func_stat_tcr^*	Recipient’s functional status at registration
func_stat_trr^*	Recipient’s functional status at transplantation
gender	Recipient’s gender
gender_don	Donor’s gender
hbv_core^*	Recipient’s HBV core seropositivity
hbv_core_don^*	Donor’s HBV core seropositivity
hbv_sur_antigen^†,^*	Recipient’s HBV surface antigen seropositivity
hbv_sur_antigen_don^†,^*	Donor’s HBV surface antigen seropositivity
hcc_ever_appr^‡	Whether recipient ever had an approved HCC exception
hcv_serostatus	Recipient’s HCV seropositivity
hematocrit_don	Donor’s hematocrit
hep_c_anti_don^†	Donor’s HCV seropositivity
heparin_don	Donor received heparin
hgt_cm_calc	Recipient’s height at transplantation
hgt_cm_don_calc	Donor’s height
hgt_cm_tcr	Recipient’s height at registration
hist_cancer_don^†	Donor had a history of cancer
hist_cig_don	Donor had a history > 20 pack-years of smoking
hist_cocaine_don	Donor had a history of cocaine use
hist_insulin_dep_don^‡	Donor had a history of insulin dependent diabetes
hist_oth_drug_don	Donor had a history of other drug use in the past
history_mi_don^†	Donor had a history of myocardial infarction
hypertens_dur_don^*	Donor’s history and duration of hypertension
index2^*	Recipient’s number of previous liver transplants prior to current one
init_age	Recipient’s age at listing
init_albumin^*	Recipient’s albumin concentration at listing
init_ascites	Recipient’s degree of ascites at listing
init_bilirubin	Recipient’s bilirubin concentration at listing
init_bmi_calc	Recipient’s BMI at listing
init_dialysis_prior_week^†	Recipient at listing had received dialysis twice in the prior wk
init_enceph^*	Recipient’s degree of encephalopathy at listing
init_hgt_cm	Recipient’s height at listing
init_inr	Recipient’s INR at listing
init_meld_peld_lab_score	Recipient’s laboratory MELD score at listing
init_serum_creat	Recipient’s creatinine concentration at listing
init_serum_sodium	Recipient’s sodium concentration at listing
init_stat^†,^*	Recipient was status 1 at listing
init_wgt_kg	Recipient’s weight at listing
inotrop_support_don	Donor was on inotropic medications at procurement
inr_tx	Recipient’s INR at transplantation
insulin_dep_don^*	Donor had a history of insulin dependent diabetes
insulin_don	Donor received insulin within 24 h of cross clamp
life_sup_tcr^†	Recipient was on “life support” at registration
life_sup_trr	Recipient was on “life support” at transplant
lityp^*	Donor graft was a split or whole graft
macro_fat_li_don^*	Donor organ was biopsied and macrosteatosis was greater than 30%
Malig	Recipient had a history of malignancy at transplantation
malig_tcr	Recipient had a history of malignancy at registration
malig_type^†,‡	Recipient’s malignancy type was HCC
med_cond_trr	Recipient’s medical condition at transplant (1 = home, 2 = hospital, 3 = ICU)
meld_diff_reason_cd_1^†,^*	MELD score and laboratory MELD score difference is because of status 1
meld_diff_reason_cd_2^*	MELD score and laboratory MELD score difference is because of HCC
meld_peld_lab_score	Recipient’s laboratory MELD score at transplant
micro_fat_li_don^*	Donor organ was biopsied and microsteatosis was greater than 30%
non_hrt_don	Donor is a donation after cardiac death organ
num_prev_tx	Recipient’s number of previous transplants
on_vent_trr	Recipient was on ventilator at time of transplant
oth_life_sup_tcr^†	Recipient was on other type of “life support” at registration
oth_life_sup_trr^†	Recipient was on other type of “life support” at transplantation
ph_don	Donor pH
portal_vein_tcr^†	Recipient had portal vein thrombosis at registration
portal_vein_trr	Recipient had portal vein thrombosis at transplant
prev_ab_surg_tcr	Recipient had previous abdominal surgeries at registration
prev_ab_surg_trr	Recipient had previous abdominal surgeries at transplant
prev_tx	Recipient ever had a previous liver transplant
pri_payment_tcr	Projected payment for transplant at registration is from private insurance
pri_payment_trr^*	Payment source for transplant is from private insurance
protein_urine	Donor had protein in urine
prvtxdif^*	Number of days between current liver transplant and prior liver transplant
pt_diuretics_don	Donor received diuretics within 24 h of procurement
pt_oth_don	Donor received prerecovery medications
pt_steroids_don	Donor received steroids within 24 h of procurement
pt_t3_don^†	Donor received T3 within 24 h of procurement
pt_t4_don	Donor received T4 within 24 h of procurement
recov_out_us^†	Donor organ was recovered outside of the United States
resuscit_dur^*	Time from cardiac arrest to resuscitation for brain dead donors with arrest
sgot_don	Donor’s terminal AST concentration
sgpt_don	Donor’s terminal ALT concentration
share_ty^†,^*	Donor’s allocation type (local/regional/other)
tattoos	Donor had tattoos
tbili_don	Donor’s terminal bilirubin concentration
tbili_tx	Recipient’s bilirubin concentration at transplant
tipss_tcr	Recipient had a TIPS at registration
tipss_trr	Recipient had a TIPS at time of transplant
vasodil_don	Donor received vasodilators within 24 h of cross clamp
vdrl_don^†	Donor’s RPR seropositivity
ventilator_tcr^†	Recipient was on ventilator at registration
warm_isch_tm_don^†,^*	Duration of warm ischemia time for DCD donors
wgt_kg_calc	Recipient’s weight at transplant
wgt_kg_don_calc	Donor’s weight
wgt_kg_tcr	Recipient’s weight at registration
work_income_tcr	Recipient was working for income at registration
work_income_trr	Recipient was working for income at time of transplantation

Abbreviations: ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; CMV, cytomegalovirus; DCD, donation after cardiac death; DDAVP, desmopressin; EBV, Epstein-Barr virus; ECD, expanded criteria donor; ETOH, alcoholic; HBV, hepatitis B virus; HCC, hepatocellular carcinoma; HCV, hepatitis C virus; ICU, intensive care unit; IGM, immunoglobulin M; IGG, immunoglobulin G; INR, international normalized ratio; MELD, Model for End-Stage Liver Disease; NASH, nonalcoholic steatohepatitis; PBC, primary biliary cirrhosis; PSC, primary sclerosing cholangitis; RPR, rapid plasma reagin; SBP, spontaneous bacterial peritonitis; TIPS, transjugular intrahepatic portosystemic shunt.

^* Input feature was engineered; see Supplemental Table 1 for description.

^† Feature excluded from RFS because greater than 95% of values were equal to zero.

^‡ Feature excluded from RFS because greater than 50% of values were missing.

While most of the categorical features had a simple binary encoding (Table 1), categorical features that the domain experts (B.E. and C.W.) identified as requiring more complex encoding were encoded based on clinician judgment. For example, the variable “DIAG,” which indicates a recipient’s primary liver disease diagnosis at transplantation, contains 70 possible unique diagnosis codes. Rather than creating 70 new binary categorical features, groups of diagnosis codes were used to collapse the 70 unique codes into 11 new categorical features.

BAR Score and SOFT Score

The BAR score and SOFT score are 2 models used to predict 90-day post-liver transplant survival using UNOS data. To compare the discriminative ability of the DNN to that of these models, the BAR score and SOFT score were calculated for recipients in this dataset. The formulas for calculating the BAR score and SOFT score are provided in Fig 2 [8,9]. Data on cold ischemia time were missing for 2.8% of recipients; therefore, the BAR score could not be calculated for these subjects. The amount of missing data for other variables was < 0.1%, and these cases were removed from the calculation of the BAR score’s area under the receiver operating characteristics curve (AUC). Missing data for the SOFT score were handled by assigning the missing value to the reference group category, as indicated by the scoring methodology. One of the 18 variables that comprise the original SOFT score is the presence of a portal bleed within 48 hours of transplantation. This variable was not available in the STAR dataset and therefore was not included in the calculated SOFT score. In the original development of the SOFT score model, only 3% of patients had a portal bleed, and data for this variable were missing for 50% of recipients [9]. In our analysis, we calculated the SOFT score using the remaining 17 components.

Fig 2. — Calculation of BAR score and SOFT score. The BAR score and SOFT score are calculated by adding the points assigned to each attribute. BMI, body mass index; CVA, cerebrovascular accident; MELD, Model for end-stage liver disease. *Feature not available in STAR dataset. SOFT score in this manuscript was calculated on the available 17 features.

Data Preprocessing

Prior to model development, missing values were imputed with the mean value for continuous variables and with 0 for categorical variables. The data were then randomly divided into training (80%) and test (20%) data sets. The training data was rescaled to have a mean of 0 and standard deviation of 1 per feature. The test data was rescaled to the training mean and standard deviation.
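The order of operations described above (impute, split, then scale the test data with training statistics) can be sketched for a single continuous feature as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the helper names, the random seed, the example values, and the use of the population standard deviation are all hypothetical choices.

```python
import random
from statistics import mean, pstdev


def impute_mean(values):
    """Replace missing (None) entries with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def split_indices(n, test_fraction=0.2, seed=0):
    """Randomly split row indices into training and test indices."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def standardize(train, test):
    """Scale both sets using the training mean and standard deviation."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # guard against constant features
    return ([(v - mu) / sigma for v in train],
            [(v - mu) / sigma for v in test])


# Hypothetical values for one continuous feature with two missing entries.
values = [1.0, None, 2.0, 3.0, 1.5, 0.8, None, 2.2, 1.1, 4.0]
imputed = impute_mean(values)
train_idx, test_idx = split_indices(len(imputed))
train_z, test_z = standardize([imputed[i] for i in train_idx],
                              [imputed[i] for i in test_idx])
```

Scaling the test set with the training mean and standard deviation keeps information from the held-out data out of the scaling step.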

“Soft” Binning Features

Besides following the standard approach of normalizing individual input features, we also experimented with a novel idea that we will refer to as “soft binning.” As in standard (“hard”) binning, the representation of each feature is replaced by a fixed number of bins, each containing a number between 0 and 1. Ordinary binning discretizes a feature by representing it as a single “1” in 1 bin and zeroes in all other bins, potentially resulting in loss of information and making the classification task harder. “Soft” binning is the most straightforward generalization of binning without loss of information, where 2 bins are assigned values in the range of 0 to 1, which sum to 1. These values encode the fraction to which the feature’s value falls into the given bins. For example, if in standard binning a value would fall exactly on the boundary between 2 bins, then in “soft” binning it would instead be represented as an entry of “0.5” in each of the 2 neighboring bins. Our motivation for creating “soft” binning was that binning relieves the neural network of the burden of learning individual feature thresholds (ie, “high,” “average,” or “low”) and thus improves classification accuracy.
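One way to realize the soft binning described above is linear interpolation between neighboring bin centers. The text does not specify how bins were placed, so the sketch below is an assumption-laden illustration: the function names, the evenly spaced `centers`, and the clipping of out-of-range values to the end bins are all hypothetical.

```python
def soft_bin(x, centers):
    """Encode x as weights over bins; at most 2 neighbors are nonzero."""
    encoded = [0.0] * len(centers)
    if x <= centers[0]:       # clip values below the first center
        encoded[0] = 1.0
        return encoded
    if x >= centers[-1]:      # clip values above the last center
        encoded[-1] = 1.0
        return encoded
    for i in range(len(centers) - 1):
        low, high = centers[i], centers[i + 1]
        if low <= x <= high:
            weight_high = (x - low) / (high - low)
            encoded[i] = 1.0 - weight_high
            encoded[i + 1] = weight_high
            break
    return encoded


def hard_bin(x, centers):
    """One-hot encoding at the nearest center, for comparison."""
    nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
    return [1.0 if i == nearest else 0.0 for i in range(len(centers))]


centers = [0.0, 1.0, 2.0, 3.0]     # hypothetical bin centers
print(soft_bin(1.5, centers))      # [0.0, 0.5, 0.5, 0.0]
print(soft_bin(1.25, centers))     # [0.0, 0.75, 0.25, 0.0]
print(hard_bin(1.25, centers))     # [0.0, 1.0, 0.0, 0.0]
```

With this encoding, 1.25 and 1.0 remain distinguishable, whereas hard binning maps both to the same one-hot vector.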

Development of the Model

The primary aim of the study was to classify recipients with 90-day post-liver transplant mortality using DNNs, also referred to as deep learning. During development of DNNs, there are many unknown model parameters that need to be optimized during training. These model parameters are first initialized and then optimized to decrease the error of the model’s output to correctly classify mortality. The type of DNN used in this study was a feedforward network with fully connected layers and a logistic output. “Fully connected” refers to the fact that all neurons between 2 adjacent layers are fully pairwise connected. A logistic output was chosen so that the output of the model could be interpreted as probability of mortality (0-1). We used stochastic gradient descent with momentum (0.2, 0.5, 0.9) and initial learning rates (0.01, 0.001, 0.1) and a batch size of 500. We also assessed DNN architectures of 1 to 5 hidden layers with (10, 50, 100, 110, 115, 120, 130, 140, 150) neurons per layer and rectified linear unit activation functions. The loss function was cross entropy. To minimize overfitting, we used 3 methods: 1. early stopping with a patience of 10 epochs, 2. L2 weight decay, and 3. dropout [31,32]. We assessed L2 weight penalties of (0.01, 0.001, 0.0001), and dropout was applied to all layers with a probability of (0,0.2, 0.5, 0.9). We used 5-fold cross validation with the training set (80%) to select the best hyperparameters and architecture based on mean cross-validation performance. These best hyperparameters and architecture were then used to train a model on the entire training set (80%) prior to testing final model performance on the separate test set (20%).
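To give a sense of the size of the search space, the sketch below enumerates every combination of the candidate values listed above and picks a configuration by mean cross-validation score. The text does not state whether all combinations were evaluated, so the exhaustive grid is an assumption; the dictionary keys, helper names, and fold scores are hypothetical.

```python
from itertools import product

# Candidate values as listed in the text.
search_space = {
    "momentum": [0.2, 0.5, 0.9],
    "learning_rate": [0.01, 0.001, 0.1],
    "hidden_layers": [1, 2, 3, 4, 5],
    "neurons_per_layer": [10, 50, 100, 110, 115, 120, 130, 140, 150],
    "l2_penalty": [0.01, 0.001, 0.0001],
    "dropout": [0, 0.2, 0.5, 0.9],
}


def enumerate_configs(space):
    """Return every combination of hyperparameter values as a dict."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*space.values())]


def select_best(cv_scores):
    """Pick the config index with the highest mean score across folds."""
    return max(cv_scores, key=lambda i: sum(cv_scores[i]) / len(cv_scores[i]))


configs = enumerate_configs(search_space)
print(len(configs))  # 4860

# Hypothetical 5-fold validation AUCs for two configurations.
cv_scores = {0: [0.68, 0.69, 0.70, 0.67, 0.69],
             1: [0.70, 0.71, 0.69, 0.70, 0.72]}
print(select_best(cv_scores))  # 1
```

A full grid of 4860 configurations with 5-fold cross validation would require 24,300 training runs, which is one reason a subset or staged search is often used in practice.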

Model Performance

All model performances were assessed on 20% of the data held out from training as a test set. Model performance was assessed using AUC and was compared to the BAR score and the SOFT score.
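The AUC can be read as the probability that a randomly chosen recipient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The study computed its metrics with scikit-learn; the pure-Python function below is only an illustrative equivalent for small inputs, and the example labels and scores are hypothetical.

```python
def auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted risks.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, scores))  # 0.75
```

Because the AUC depends only on the ranking of scores, it applies equally to the DNN's predicted probabilities and to the integer-valued BAR and SOFT scores.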

Choosing a Threshold

The F1 score, sensitivity, and specificity were calculated for different thresholds for the DNN, as well as for the BAR score and SOFT score models. The F1 score is the harmonic mean of precision and recall, ranging from 0 to 1. It is calculated as $F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$. Thresholds that optimized the F1 score were then chosen for each model/score. The minimum thresholds to achieve a sensitivity or specificity of 90% for each model/score were also calculated. Ninety-five percent confidence intervals were calculated for all performance metrics using bootstrapping with 1000 samples.
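The threshold search and the bootstrap can be sketched as follows. Several details are assumptions not stated in the text: a case is predicted positive when its score is greater than or equal to the threshold, candidate thresholds are the observed scores, and the interval is the percentile bootstrap. All names and example data are hypothetical.

```python
import random


def confusion(y_true, scores, threshold):
    """Return (tp, fp, fn, tn), predicting positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_f1_threshold(y_true, scores):
    """Return the observed score that maximizes F1 when used as threshold."""
    return max(sorted(set(scores)),
               key=lambda t: f1_score(*confusion(y_true, scores, t)[:3]))


def bootstrap_ci(y_true, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, scores)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stats.append(metric([y_true[i] for i in idx], [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical outcomes and predicted risks.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.30]
threshold = best_f1_threshold(y_true, scores)
print(threshold)  # 0.12
low, high = bootstrap_ci(
    y_true, scores,
    lambda y, s: f1_score(*confusion(y, s, threshold)[:3]))
```

With an outcome prevalence near 5%, F1-optimal thresholds on predicted probabilities tend to sit far below 0.5, which is consistent with the DNN thresholds of roughly 0.09 to 0.11 in Table 4.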

All DNN models were developed and applied using Keras [33]. All performance metrics were calculated using scikit-learn [34]. Code is available upon reasonable request.

RESULTS

Patient Characteristics

The data consisted of 57,544 liver transplant recipients. These data were split into training (n = 46,035) and test (n = 11,509) sets. The 90-day post-liver transplant mortality rates in the training and test sets were 5.4% (n = 2483) and 5.6% (n = 640), respectively.

Development of the Model

The best DNN model used the 202 original feature set (OFS) with “softbin” preprocessing of input features (DNN with OFS + softbin). The model consisted of 5 hidden layers of 110 neurons per layer with rectified linear unit activations and a logistic output and was trained with no dropout, an L2 weight decay of 0.001, a learning rate of 0.01, and a momentum of 0.5 (Table 2).

Table 2.

Best Deep Neural Network Hyperparameters for Each Model

	# of Hidden Layers	# of Neurons per Layer	L2 Lambda	Dropout Probability	Learning Rate	Momentum
DNN w/original 202 features (OFS)	5	100	0.001	0.5	0.01	0.5
DNN w/OFS + softbin	5	110	0.001	0	0.01	0.5
DNN w/reduced 140 features (RFS)	5	100	0.001	0.5	0.01	0.5
DNN w/RFS + softbin	5	110	0.001	0	0.01	0.5

Description of the architecture and selected hyperparameters of the trained neural networks.

Abbreviations: DNN, deep neural network; OFS, original feature set; RFS, reduced feature set.

Model Performance

All performance metrics reported below refer to the test dataset.

Area Under the Receiver Operating Characteristics Curves

Receiver operating characteristics curves and AUC results are shown in Fig 3 and Table 3. On the 11,207 patients with available BAR scores, the best DNN model (DNN with OFS + softbin) had a higher AUC (0.703 [95% CI: 0.682-0.726]) than the BAR score (0.655 [95% CI: 0.633-0.678]) and SOFT score (0.688 [95% CI: 0.667-0.711]) models. In addition, softbin preprocessing of input features improved performance of both the OFS and RFS models. While the best DNN had a significantly higher AUC than the BAR score, the DNN did not achieve a significantly higher AUC than the SOFT score. The DNN with the reduced feature set and softbin preprocessing (DNN with RFS + softbin) performed comparably (AUC 0.702 [95% CI: 0.68-0.725]) to the DNN with OFS + softbin.

Table 3.

Area Under the ROC Curve Results With 95% Confidence Intervals for the Test Set (n = 11,509) and on the Test Set With No Null BAR Scores (n = 11,207).

	AUC (95% CI)
	n = 11,509	n = 11,207^*
BAR score^*	0.655 (0.633-0.678)	0.655 (0.633-0.678)
SOFT score	0.691 (0.671-0.714)	0.688 (0.667-0.711)
DNN w/Original 202 Features Set (OFS)	0.697 (0.678-0.72)	0.695 (0.675-0.717)
DNN w/OFS + softbin	0.708 (0.689-0.73)	0.703 (0.682-0.726)
DNN w/Reduced 140 Features Set (RFS)	0.699 (0.681-0.722)	0.698 (0.679-0.72)
DNN w/RFS + softbin	0.707 (0.688-0.729)	0.702 (0.68-0.725)

^* For the entire test set results, the BAR score was calculated only on the 11,207 test patients with available BAR scores.
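As an illustration of how an AUC with a 95% confidence interval of the kind reported in Table 3 can be computed, the sketch below uses the rank-sum (Mann-Whitney) identity for the AUC and a percentile bootstrap for the interval. This section does not restate how the intervals were obtained, so the bootstrap, the number of resamples, and the function names `auc` and `bootstrap_auc_ci` are assumptions made for the example, and the data are toy values rather than registry data.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum identity, using average ranks for tied scores."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC requires both outcome classes")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method for this sketch)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("AUC requires both outcome classes")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = [labels[i] for i in idx]
        if 0 < sum(resampled) < n:  # skip resamples with a single class
            stats.append(auc(resampled, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example: 1 = death within 90 days, 0 = survival.
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc(y, s))  # 0.75
print(bootstrap_auc_ci(y * 25, s * 25, n_boot=200))
```

Overlapping intervals, as for the DNN and SOFT score in Table 3, are consistent with the finding that the difference in AUC was not statistically significant, although a formal comparison would test the paired difference directly.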

Choosing a Threshold

F1 scores, sensitivity, and specificity at different thresholds were compared between the DNN models and the BAR score and SOFT score models (Table 4). For each threshold, the numbers of correctly and incorrectly classified patients are also reported for all test set patients. Because the BAR score could not be calculated for 302 patients in the test set owing to missing data, Table 4 provides metrics for the test set containing all patients with available data for each model, as well as for the subset of patients for whom BAR scores could be calculated.

Table 4.

F1 Score, Sensitivity, Specificity, and Number of Correctly Identified Patients With 95% Confidence Intervals (CI) for the Test Set (n = 11,509) and on the Test Set With No Null BAR Scores (n = 11,207) for the Thresholds That Maximize F1 Score

ALL Test Patients (n = 11,509)
	Threshold	F1 Score (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	Precision (95% CI)	# TN	# FP	# FN	# TP
BAR Score^*	15	0.179 (0.159-0.2)	0.319 (0.287-0.357)	0.873 (0.867-0.88)	0.124 (0.109-0.141)	9269	1343	405	190
SOFT Score	20	0.223 (0.201-0.247)	0.38 (0.344-0.419)	0.881 (0.874-0.887)	0.158 (0.14-0.177)	9571	1298	397	243
DNN w/OFS	0.092	0.212 (0.19-0.236)	0.348 (0.316-0.385)	0.886 (0.88-0.892)	0.153 (0.135-0.171)	9632	1237	417	223
DNN w/OFS + softbin	0.113	0.22 (0.197-0.246)	0.322 (0.289-0.359)	0.906 (0.9-0.911)	0.167 (0.148-0.188)	9843	1026	434	206
DNN w/RFS	0.095	0.212 (0.19-0.235)	0.358 (0.323-0.397)	0.881 (0.875-0.888)	0.151 (0.133-0.169)	9581	1288	411	229
DNN w/RFS + softbin	0.105	0.221 (0.197-0.245)	0.345 (0.311-0.382)	0.895 (0.889-0.901)	0.162 (0.144-0.182)	9727	1142	419	221
ALL Test Patients w/BAR Score (n = 11,207)
	Threshold	F1 Score (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	Precision (95% CI)	# TN	# FP	# FN	# TP
BAR Score^*	15	0.179 (0.159-0.2)	0.319 (0.287-0.357)	0.873 (0.867-0.88)	0.124 (0.109-0.141)	9269	1343	405	190
SOFT Score	20	0.215 (0.191-0.238)	0.375 (0.336-0.416)	0.881 (0.875-0.888)	0.151 (0.132-0.169)	9354	1258	372	223
DNN w/OFS	0.092	0.206 (0.183-0.231)	0.345 (0.309-0.384)	0.887 (0.882-0.893)	0.147 (0.129-0.165)	9418	1194	390	205
DNN w/OFS + softbin	0.114	0.21 (0.186-0.235)	0.309 (0.274-0.346)	0.908 (0.903-0.913)	0.159 (0.138-0.18)	9638	974	411	184
DNN w/RFS	0.095	0.204 (0.181-0.227)	0.353 (0.315-0.391)	0.882 (0.876-0.888)	0.144 (0.126-0.162)	9361	1251	385	210
DNN w/RFS + softbin	0.105	0.21 (0.187-0.236)	0.334 (0.299-0.372)	0.896 (0.891-0.902)	0.153 (0.135-0.174)	9513	1099	396	199

Performance metrics for all models at the threshold that maximized F1 score. Among the trained DNN models, DNN w/RFS + softbin achieved the highest F1 score.

Abbreviations: DNN, deep neural network; FN, false negative; FP, false positive; OFS, original feature set; RFS, reduced feature set; TN, true negative; TP, true positive.

^* For the full test set results, BAR score metrics were calculated only on the 11,207 recipients with BAR scores available.

When thresholds were chosen to maximize the F1 score, the SOFT score achieved the highest F1 score (0.215 [95% CI: 0.191-0.238]) at a threshold of 20, with sensitivity and specificity of 0.375 (95% CI: 0.336-0.416) and 0.881 (95% CI: 0.875-0.888), respectively, for the 11,207 patients with available BAR scores. This score was not significantly different from the highest F1 score among the DNN models, which was achieved by DNN with RFS + softbin (0.21 [95% CI: 0.187-0.236]) at a threshold of 0.106, with sensitivity and specificity of 0.331 (95% CI: 0.296-0.369) and 0.898 (95% CI: 0.892-0.904), respectively. At these thresholds, the SOFT score had slightly more true positives than the DNN model (223 vs 199) as a result of its higher sensitivity, but also more false positives (1258 vs 1099) as a result of its lower specificity. The best DNN model based on AUC, namely DNN with OFS + softbin, had a comparable F1 score of 0.209 (95% CI: 0.184-0.234) at a threshold of 0.113.
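The metrics in Table 4 follow directly from the confusion-matrix counts. The short sketch below recomputes sensitivity, specificity, precision, and F1 score (the harmonic mean of precision and sensitivity) from the Table 4 counts for the SOFT score and for DNN w/RFS + softbin on the 11,207 patients with BAR scores. The function name `classification_metrics` is hypothetical; only the counts are taken from the table.

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive threshold-dependent metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: deaths correctly flagged
    specificity = tn / (tn + fp)          # survivors correctly not flagged
    precision = tp / (tp + fp)            # flagged patients who died
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Counts from Table 4, test patients with BAR scores (n = 11,207).
soft = classification_metrics(tn=9354, fp=1258, fn=372, tp=223)
dnn_rfs_softbin = classification_metrics(tn=9513, fp=1099, fn=396, tp=199)

for name, metrics in [("SOFT", soft), ("DNN w/RFS + softbin", dnn_rfs_softbin)]:
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Rounded to three decimals, the recomputed values agree with the tabulated point estimates (for example, an F1 score of 0.215 for the SOFT score and 0.21 for DNN w/RFS + softbin).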

Adjusting the threshold of a risk model increases either the sensitivity or the specificity, with a consequent decrease in the complementary measure. When the threshold was chosen to achieve a sensitivity of at least 90%, the BAR score achieved a sensitivity of 0.938 at a threshold of 3, whereas the DNN w/OFS + softbin achieved a sensitivity of 0.91 at a threshold of 0.025. However, the specificity of the BAR score was substantially lower, at 0.15 versus 0.26 for the DNN model. For the SOFT score, a sensitivity of 0.92 was achieved at a threshold of 5, with a corresponding specificity of 0.23, which is lower than that for the DNN. When the threshold was chosen to achieve a specificity of at least 90%, the SOFT score achieved a specificity of 0.91 at a threshold of 22, whereas the DNN w/RFS + softbin achieved a specificity of 0.9 at a threshold of 0.107. At these thresholds, the sensitivity of the SOFT score was 0.30 versus 0.33 for the DNN model.
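The threshold trade-off described above can be sketched as a simple search. The example assumes that a patient is classified as high risk when the score is greater than or equal to the threshold, and it picks the largest observed score that still keeps sensitivity at or above a target, which preserves as much specificity as possible. The function names and the toy data are hypothetical, and the paper's exact selection procedure may differ.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold means 'high risk'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(labels, scores, target=0.90):
    """Largest observed score whose sensitivity is still >= target."""
    best = None
    for threshold in sorted(set(scores)):
        sens, spec = sensitivity_specificity(labels, scores, threshold)
        if sens >= target:
            best = (threshold, sens, spec)
        else:
            break  # sensitivity can only fall as the threshold rises
    return best


# Toy data: 1 = death within 90 days, 0 = survival.
labels = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
scores = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
threshold, sens, spec = threshold_for_sensitivity(labels, scores, target=0.75)
print(threshold, sens, spec)
```

The same search run in the opposite direction (the smallest threshold whose specificity reaches a target) gives the specificity-constrained operating point.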

DISCUSSION

The results demonstrate that a DNN can be used to predict 90-day post-liver transplant mortality using UNOS registry data. While the AUC for the best-performing DNN (DNN with OFS + softbin) was the highest among the tested models, significantly outperforming the BAR score, it was not significantly higher than that of the SOFT score. Similarly, the DNN’s maximal F1 measure, which is the harmonic mean of precision and sensitivity, was not significantly different from that of the SOFT score. At the thresholds that maximized the F1 measures for the DNN with OFS + softbin and the SOFT score, the DNN model had significantly higher specificity with fewer false positives (990 vs 1258). However, the SOFT score had more true positives (223 vs 185), reflecting its higher sensitivity. It is important to note that by adjusting the threshold value, arbitrarily high sensitivities or specificities can be achieved for both models with a consequent decrease in the complementary metric. While the F1 measure weights precision and sensitivity equally, the relative cost of a false positive (i.e., failing to transplant a patient who otherwise would live) versus that of a false negative (transplanting a patient who will die) is a decision that must be made by the transplant community. Rana et al argue that a SOFT score greater than or equal to 40 may indicate futile transplantation [9]. However, in our cohort, a threshold of 40 for the SOFT score carried a sensitivity of only 0.025 (95% CI: 0.014-0.038), raising questions about its clinical utility.

While several predictive models exist, we chose to compare the DNN to the BAR score and SOFT score because both were derived from UNOS registry data and have the highest AUCs in predicting 90-day post-transplant mortality. While both models report an AUC of 0.7, in our study the calculated AUCs were slightly lower at 0.66 and 0.69 for the BAR score and SOFT score, respectively. These differences may be explained by differing exclusion criteria: the dataset used to derive the BAR score excluded split livers and donation after cardiac death donors. The SOFT score in our dataset was based on 17 of the original 18 features, as the variable indicating portal bleed within 48 hours of transplantation was not available in the UNOS dataset.

Given the scarcity of organ donors, when adverse outcomes occur, the logical question is whether the organ would have been better allocated to another recipient. As such, many have questioned whether to transplant a patient based solely on need or based on expected outcomes [2]. The concept of futile transplantation is not new, and defining futility is difficult [35]. An underlying theme, however, points to the need to estimate postoperative mortality rather than focus solely on preoperative survival. Authors have suggested models that account for both waitlist mortality and the probability of post-transplant survival [36], and some have called for novel liver allocation models that achieve collective survival benefits [37]. Given the success that DNNs have had in various classification tasks, we tested the hypothesis that they could achieve superior performance in this classification problem and therefore be an important step toward better allocation models.

Machine learning algorithms can model more complex interactions and nonlinearities among the input features and often achieve higher predictive performance than conventional statistical models. To date, though, few groups have explored these methods to predict post-liver transplant morbidity and mortality. Lau et al recently used a random forest to classify graft failure within 30 days following liver transplantation using a study sample of 180 recipients from institution-level data and achieved an AUC of 0.818, although performance was significantly diminished when the model was applied to the validation set [38]. While some have explored using neural networks to predict liver transplant mortality, most were based on a small number of patients at individual institutions [39-41]. Raji et al applied a neural network using UNOS-level data to predict post-transplantation graft failure, but the authors included only a few hundred patients in the model [42].

While DNNs have achieved improved performance in various classification tasks, there are several possible reasons why the DNN failed to significantly outperform a logistic regression model in this study. There are likely features that are predictive of post-transplant mortality that were not included in this risk model. Multiple cardiac risk factors, for example, have been found to be associated with adverse outcomes, including reduced survival, and several studies have shown that cardiac morbidity is one of the leading causes of post-transplant mortality [43]. Single-center studies have identified cardiovascular risk [37], preoperative troponin levels [44], coronary artery disease [45], and echocardiographic measures [46,47] as predictors of survival. As these data are not included in the UNOS database, we were unable to account for this variability in the outcome. It is also possible that other machine learning algorithms, either alone or in combination with a DNN, may be able to achieve superior performance given the same training data. While a DNN can, in theory, approximate any complex function that maps the predictors to the response variable, this may not be achieved with limited training data, and other machine learning algorithms may achieve better discriminative performance.

As researchers use machine learning more frequently, an emerging theme is that these sophisticated algorithms do not always outperform conventional statistical models such as regression. In a recent study, our group applied deep learning to the prediction of postoperative mortality using institution-level data and found that it did not outperform logistic regression [28]. Similarly, machine learning algorithms failed to outperform logistic regression in the prediction of heart failure readmission [26]. Machine learning algorithms such as DNNs are more likely to excel in the analysis of complex, high-granularity data, which are lacking from the UNOS database. Finally, all machine learning models are limited by whether relevant features can be encoded in a way that allows them to be included as variables in the model. Several tacit knowledge variables, such as the physical appearance of a patient, are difficult to quantify and therefore difficult to include in a DNN model. The future may allow such variables to be represented in models, but for the foreseeable future, the clinician will remain involved in risk assessment.

CONCLUSIONS

To date, there has been a dearth of research using the rich set of complex data within a patient’s electronic health record to develop more accurate patient-specific estimates of outcomes following transplantation. To achieve improved discriminative performance, future studies should incorporate higher-resolution clinical data from a patient’s electronic health record. The development of more patient-specific estimates of transplant risk can help achieve improved organ allocation with improvement of outcomes for the recipient and the transplant community at large.

Supplementary Material

Supplement

NIHMS1628916-supplement-Supplement.docx (20.6 KB, docx)

ACKNOWLEDGMENTS

This work was supported in part by Health Resources and Services Administration contract 234-2005-370011C. The content is the responsibility of the authors alone and does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. The data reported here have been supplied by the United Network for Organ Sharing as the contractor for the Organ Procurement and Transplantation Network. The interpretation and reporting of these data are the responsibility of the authors and in no way should be seen as an official policy of or an interpretation by the OPTN or the US Government.

This work was supported by the National Institutes of Health (NIH) (R01 HL144692). The Department of Anesthesiology and Perioperative Medicine at the University of California, Los Angeles receives funding from the NIH (R01GM117622; R01 NR013012; U54HL119893; 1R01HL144692).

Dr. Cannesson is a consultant for Edwards Lifesciences and Masimo Corporation and has funded research from Edwards Lifesciences and Masimo Corporation. He is also the founder of Sironis and owns patents and receives royalties for closed loop hemodynamic management that is licensed to Edwards Lifesciences. Dr. Cannesson’s department receives funding from the National Institutes of Health (NIH) (R01GM117622; R01 NR013012; U54HL119893; 1R01HL144692). This work was supported by the NIH (R01 HL144692). Dr. Lee is a salaried employee of Edwards Lifesciences, but this research was unrelated to her employment and was a part of her PhD work. Mr. Urban receives funding from the National Science Foundation (NSF 1633631).

Data Availability Statement

All data are from the United Network for Organ Sharing Standard Transplant Analysis and Research File, which is based on the Organ Procurement and Transplantation Network data as of September 9, 2016.

REFERENCES

[1] Wertheim JA, Petrowsky H, Saab S, Kupiec-Weglinski JW, Busuttil RW. Major challenges limiting liver transplantation in the United States. Am J Transplant 2011;11:1773–84.
[2] Weismuller TJ, Fikatas P, Schmidt J, et al. Multicentric evaluation of model for end-stage liver disease-based allocation and survival after liver transplantation in Germany–limitations of the “sickest first”-concept. Transpl Int 2011;24:91–9.
[3] Dutkowski P, Linecker M, DeOliveira ML, Mullhaupt B, Clavien PA. Challenges to liver transplantation and strategies to improve outcomes. J Gastroenterol 2015;148:307–23.
[4] Wiesner R, Edwards E, Freeman R, et al. Model for end-stage liver disease (MELD) and allocation of donor livers. J Gastroenterol 2003;124:91–6.
[5] Kamath PS, Kim WR, Advanced Liver Disease Study G. The model for end-stage liver disease (MELD). J Hepatol 2007;45:797–805.
[6] Kamath PS, Wiesner RH, Malinchoc M, et al. A model to predict survival in patients with end-stage liver disease. J Hepatol 2001;33:464–70.
[7] Desai NM, Mange KC, Crawford MD, et al. Predicting outcome after liver transplantation: utility of the model for end-stage liver disease and a newly derived discrimination function. Transplantation 2004;77:99–106.
[8] Dutkowski P, Oberkofler CE, Slankamenac K, et al. Are there better guidelines for allocation in liver transplantation? A novel score targeting justice and utility in the model for end-stage liver disease era. Ann Surg 2011;254:745–3 [discussion: 753].
[9] Rana A, Hardy MA, Halazun KJ, et al. Survival outcomes following liver transplantation (SOFT) score: a novel method to predict patient survival following liver transplantation. Am J Transplant 2008;8:2537–46.
[10] Le Cun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. Burlington, Mass: Morgan Kaufmann; 1990.
[11] Baldi P, Chauvin Y. Neural networks for fingerprint recognition. Neural Comput 1993;5.
[12] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Adv Neural Inf Process Syst; 2012. p. 1097–105.
[13] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 1–9.
[14] Srivastava RK, Greff K, Schmidhuber J. Training very deep networks. In: Adv Neural Inf Process Syst; 2015. p. 2377–85.
[15] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 770–8.
[16] Baldi P, Sadowski P, Whiteson D. Searching for exotic particles in high-energy physics with deep learning. Nat Commun 2014;5.
[17] Sadowski PJ, Collado J, Whiteson D, Baldi P. Deep learning, dark knowledge, and dark matter. JMLR 2015;42.
[18] Kayala MA, Azencott C-A, Chen JH, Baldi P. Learning to predict chemical reactions. J Chem Inf Model 2011;51:2209–22.
[19] Kayala MA, Baldi P. ReactionPredictor: prediction of complex chemical reactions at the mechanistic level using machine learning. J Chem Inf Model 2012;52:2526–40.
[20] Lusci A, Pollastri G, Baldi P. Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J Chem Inf Model 2013;53:1563–75.
[21] Lena P, Nagata K, Baldi P. Deep architectures for protein contact map prediction. J Bioinform 2012;28:2449–57.
[22] Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures–DAG-RNNs and the protein structure prediction problem. JMLR 2003;4.
[23] Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12:931–4.
[24] Guillame-Bert M, Dubrawski A, Wang D, Hravnak M, Clermont G, Pinsky MR. Learning temporal rules to forecast instability in continuously monitored patients. JAMIA 2016;24:47–53.
[25] Chen L, Dubrawski A, Clermont G, Hravnak M, Pinsky M. Modelling risk of cardio-respiratory instability as a heterogeneous process. AMIA Annual Symposium Proceedings 2015.
[26] Frizzell JD, Liang L, Schulte PJ, Yancy CW, Heidenreich PA, Hernandez AF, et al. Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure: comparison of machine learning and other statistical approaches. JAMA Cardiol 2017;2:204–9.
[27] Shadmi E, Flaks-Manov N, Hoshen M, Goldman O, Bitterman H, Balicer RD. Predicting 30-day readmissions with preadmission electronic health record data. Med Care 2015;53:283.
[28] Lee CK, Hofer I, Gabel E, Baldi P, Cannesson M. Development and validation of a deep neural network model for prediction of postoperative in-hospital mortality. Anesthesiology 2018;129:649–62.
[29] Luo W, Phung D, Tran T, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res 2016;18:e323.
[30] Massie AB, Kucirka LM, Segev DL. Big data in organ transplantation: registries and administrative claims. Am J Transplant 2014;14:1723–30.
[31] Baldi P, Sadowski P. The dropout learning algorithm. Artif Intell 2014;210:78–122.
[32] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. JMLR 2014;15.
[33] Chollet F. Keras. https://github.com/fchollet/keras; 2015. [Accessed 12-16-2018].
[34] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. JMLR 2011;12.
[35] Zimmerman MA, Ghobrial RM. When shouldn’t we retransplant? Liver Transpl 2005:S14–20.
[36] Briceno J, Ciria R, de la Mata M. Donor-recipient matching: myths and realities. J Hepatol 2013;58:811–20.
[37] Dutkowski P, Clavien PA. Scorecard and insights from approaches to liver allocation around the world. Liver Transpl 2016;22:9–13.
[38] Lau L, Kankanige Y, Rubinstein B, et al. Machine-learning algorithms predict graft failure after liver transplantation. Transplantation 2017;101:e125–32.
[39] Cucchetti A, Vivarelli M, Heaton ND, et al. Artificial neural network is superior to MELD in predicting mortality of patients with end-stage liver disease. Gut 2007;56:253–8.
[40] Zhang M, Yin F, Chen B, et al. Pretransplant prediction of posttransplant survival for liver recipients with benign end-stage liver diseases: a nonlinear model. PLoS One 2012;7:e31256. [Retracted]
[41] Cruz-Ramirez M, Hervas-Martinez C, Fernandez JC, Briceno J, de la Mata M. Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks. Artif Intell Med 2013;58:37–49.
[42] Raji CG, Vinod Chandra SS. Artificial neural networks in prediction of patient survival after liver transplantation. J Health Med Inform 2016;7:1.
[43] Fouad TR, Abdel-Razek WM, Burak KW, Bain VG, Lee SS. Prediction of cardiac complications after liver transplantation. Transplantation 2009;87:763–70.
[44] Watt KD, Coss E, Pedersen RA, Dierkhising R, Heimbach JK, Charlton MR. Pretransplant serum troponin levels are highly predictive of patient and graft survival following liver transplantation. Liver Transpl 2010;16:990–8.
[45] Yong CM, Sharma M, Ochoa V, et al. Multivessel coronary artery disease predicts mortality, length of stay, and pressor requirements after liver transplantation. Liver Transpl 2010;16:1242–8.
[46] Dowsley TF, Bayne DB, Langnas AN, et al. Diastolic dysfunction in patients with end-stage liver disease is associated with development of heart failure early after liver transplantation. Transplantation 2012;94:646–51.
[47] Ershoff BD, Gordin JS, Vorobiof G, et al. Improving the prediction of mortality in the high model for end-stage liver disease score liver transplant recipient: a role for the left atrial volume index. Transplant Proc 2018;50:1407–12.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplement

NIHMS1628916-supplement-Supplement.docx^{(20.6KB, docx)}

Data Availability Statement

All data is from the United Network for Organ Sharing Standard Transplant Analysis and Research File, which is based on the Organ Procurement and Transplantation Network data as of September 9, 2016.

[R1] [1].Wertheim JA, Petrowsky H, Saab S, Kupiec-Weglinski JW, Busuttil RW. Major challenges limiting liver transplantation in the United States. Am J Transplant 2011;11:1773–84. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] [2].Weismuller TJ, Fikatas P, Schmidt J, et al. Multicentric evaluation of model for end-stage liver disease-based allocation and survival after liver transplantation in Germany–limitations of the “sickest first”-concept. Transpl Int 2011;24:91–9. [DOI] [PubMed] [Google Scholar]

[R3] [3].Dutkowski P, Linecker M, DeOliveira ML, Mullhaupt B, Clavien PA. Challenges to liver transplantation and strategies to improve outcomes. J Gastroenterol 2015;148:307–23. [DOI] [PubMed] [Google Scholar]

[R4] [4].Wiesner R, Edwards E, Freeman R, et al. Model for end-stage liver disease (MELD) and allocation of donor livers. J Gastroenterol 2003;124:91–6. [DOI] [PubMed] [Google Scholar]

[R5] [5].Kamath PS, Kim WR, Advanced Liver Disease Study G. The model for end-stage liver disease (MELD). J Hepatol 2007;45: 797–805. [DOI] [PubMed] [Google Scholar]

[R6] [6].Kamath PS, Wiesner RH, Malinchoc M, et al. A model to predict survival in patients with end-stage liver disease. J Hepatol 2001;33:464–70. [DOI] [PubMed] [Google Scholar]

[R7] [7].Desai NM, Mange KC, Crawford MD, et al. Predicting outcome after liver transplantation: utility of the model for end-stage liver disease and a newly derived discrimination function. Transplantation 2004;77:99–106. [DOI] [PubMed] [Google Scholar]

[R8] [8].Dutkowski P, Oberkofler CE, Slankamenac K, et al. Are there better guidelines for allocation in liver transplantation? A novel score targeting justice and utility in the model for end-stage liver disease era. Ann Surg 2011;254:745–3 [discussion: 753]. [DOI] [PubMed] [Google Scholar]

[R9] [9].Rana A, Hardy MA, Halazun KJ, et al. Survival outcomes following liver transplantation (SOFT) score: a novel method to predict patient survival following liver transplantation. Am J Transplant 2008;8:2537–46. [DOI] [PubMed] [Google Scholar]

[R10] [10].Le Cun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. Burlington, Mass: Morgan Kaufmann; 1990. [Google Scholar]

[R11] [11].Baldi P, Chauvin Y. Neural networks for fingerprint recognition. Neural Comput 1993;5. [Google Scholar]

[R12] [12].Krizhevsky Sutskever, Hinton E. ImageNet classification with deep convolutional neural networks. In: Adv Neural Inf Process Syst; 2012. p. 1097–105. [Google Scholar]

[R13] [13].Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 1–9. [Google Scholar]

[R14] [14].Srivastava RK, Greff K, Schmidhuber J. Training very deep networks. In: Adv Neural Inf Process Syst; 2015. p. 2377–85. [Google Scholar]

[R15] [15].He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 770–8. [Google Scholar]

[R16] [16].Baldi P, Sadowski P, Whiteson D. Searching for exotic particles in high-energy physics with deep learning. Nat Commun 2014;5. [DOI] [PubMed] [Google Scholar]

[R17] [17].Sadowski PJ, Collado J, Whiteson D, Baldi P. Deep learning, dark knowledge, and dark matter. JMLR 2015;42. [Google Scholar]

[R18] [18].Kayala MA, Azencott C-A, Chen JH, Baldi P. Learning to predict chemical reactions. J Chem Inf Model 2011;51:2209–22. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] [19].Kayala MA, Baldi P. ReactionPredictor: prediction of complex chemical reactions at the mechanistic level using machine learning. J Chem Inf Model 2012;52:2526–40. [DOI] [PubMed] [Google Scholar]

[R20] [20].Lusci A, Pollastri G, Baldi P. Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J Chem Inf Model 2013;53: 1563–75. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] [21].Lena P, Nagata K, Baldi P. Deep architectures for protein contact map prediction. J Bioinform 2012;28:2449–57. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] [22].Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures–dag-rnns and the protein structure prediction problem. JMLR 2003;4. [Google Scholar]

[R23] [23].Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12:931–4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] [24].Guillame-Bert M, Dubrawski A, Wang D, Hravnak M, Clermont G, Pinsky MR. Learning temporal rules to forecast instability in continuously monitored patients. JAMIA 2016;24:47–53. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] [25].Chen L, Dubrawski A, Clermont G, Hravnak M, Pinsky M. Modelling risk of cardio-respiratory instability as a heterogeneous process. AMIA Annual Symposium Proceedings 2015. [PMC free article] [PubMed] [Google Scholar]

[R26] [26].Frizzell JD, Liang L, Schulte PJ, Yancy CW, Heidenreich PA, Hernandez AF, et al. Prediction of 30-day allcause readmissions in patients hospitalized for heart failure: comparison of machine learning and other statistical approaches. JAMA Cardiol 2017;2:204–9. [DOI] [PubMed] [Google Scholar]

[R27] [27].Shadmi E, Flaks-Manov N, Hoshen M, Goldman O, Bitterman H, Balicer RD. Predicting 30-day readmissions with preadmission electronic health record data. Med Care 2015;53:283. [DOI] [PubMed] [Google Scholar]

[R28] [28].Lee CK, Hofer I, Gabel E, Baldi P, Cannesson M. Development and validation of a deep neural network model for prediction of postoperative in-hospital mortality. Anesthesiology 2018;129: 649–62. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] [29].Luo W, Phung D, Tran T, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res 2016;18: e323. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] [30].Massie AB, Kucirka LM, Segev DL. Big data in organ transplantation: registries and administrative claims. Am J Transplent 2014;14:1723–30. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] [31].Baldi P, Sadowski P. The dropout learning algorithm. Artif Intell 2014;210:78–122. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] [32].Srivastava N Dropout: a simple way to prevent neural networks from overfitting. JMLR 2014;15. [Google Scholar]

[R33] [33].Keras Chollet F.. https://github.com/fchollet/keras; 2015. [Accessed 12-16-2018].

[34] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. JMLR 2011;12.

[35] Zimmerman MA, Ghobrial RM. When shouldn’t we retransplant? Liver Transpl 2005:S14–20.

[36] Briceno J, Ciria R, de la Mata M. Donor-recipient matching: myths and realities. J Hepatol 2013;58:811–20.

[37] Dutkowski P, Clavien PA. Scorecard and insights from approaches to liver allocation around the world. Liver Transpl 2016;22:9–13.

[38] Lau L, Kankanige Y, Rubinstein B, et al. Machine-learning algorithms predict graft failure after liver transplantation. Transplantation 2017;101:e125–32.

[39] Cucchetti A, Vivarelli M, Heaton ND, et al. Artificial neural network is superior to MELD in predicting mortality of patients with end-stage liver disease. Gut 2007;56:253–8.

[40] Zhang M, Yin F, Chen B, et al. Pretransplant prediction of posttransplant survival for liver recipients with benign end-stage liver diseases: a nonlinear model. PLoS One 2012;7:e31256. [Retracted]

[41] Cruz-Ramirez M, Hervas-Martinez C, Fernandez JC, Briceno J, de la Mata M. Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks. Artif Intell Med 2013;58:37–49.

[42] Raji CG, Vinod Chandra SS. Artificial neural networks in prediction of patient survival after liver transplantation. J Health Med Inform 2016;7:1.

[43] Fouad TR, Abdel-Razek WM, Burak KW, Bain VG, Lee SS. Prediction of cardiac complications after liver transplantation. Transplantation 2009;87:763–70.

[44] Watt KD, Coss E, Pedersen RA, Dierkhising R, Heimbach JK, Charlton MR. Pretransplant serum troponin levels are highly predictive of patient and graft survival following liver transplantation. Liver Transpl 2010;16:990–8.

[45] Yong CM, Sharma M, Ochoa V, et al. Multivessel coronary artery disease predicts mortality, length of stay, and pressor requirements after liver transplantation. Liver Transpl 2010;16:1242–8.

[46] Dowsley TF, Bayne DB, Langnas AN, et al. Diastolic dysfunction in patients with end-stage liver disease is associated with development of heart failure early after liver transplantation. Transplantation 2012;94:646–51.

[47] Ershoff BD, Gordin JS, Vorobiof G, et al. Improving the prediction of mortality in the high model for end-stage liver disease score liver transplant recipient: a role for the left atrial volume index. Transplant Proc 2018;50:1407–12.
