Abstract
Heart failure (HF) is the final stage in the development of various heart diseases. Prognostic mortality rates of HF patients are highly variable, ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is an important means of avoiding death and positively affecting patients' health. In practice, however, machine learning models struggle to achieve good results on HF data with missing values, high dimensionality, and class imbalance. We therefore propose a deep learning system. In this system, we introduce an indicator vector that marks whether each value is true or padded, which quickly handles missing values and expands the data dimensions. We then use a convolutional neural network with different kernel sizes to extract feature information, and apply a multi-head self-attention mechanism to capture whole-channel information, which is essential for improving the system's performance. In addition, the focal loss function is introduced to better handle the imbalance problem. The experimental data come from the public MIMIC-III database and contain valid records for 10,311 patients. The proposed system effectively and quickly predicts four death types: death within 30 days, within 180 days, within 365 days, and after 365 days. Our study uses Deep SHAP to interpret the deep learning model and obtains the top 15 characteristics. These characteristics further confirm the effectiveness and rationality of the system and can help provide better medical service.
Introduction
Heart failure (HF) is a condition in which a variety of causes produce structural or functional abnormalities of the heart, resulting in impaired ventricular systolic or diastolic function [1]. It is the final development stage of various heart diseases [2]. According to the American College of Cardiology, cardiovascular disease causes one-third of deaths worldwide. More than five million people in the United States suffer from heart failure, and 550,000 new cases are diagnosed each year [3–5]. Meanwhile, in China the prevalence of HF among people over 35 years old is 1.3%, corresponding to approximately 8.9 million HF patients [6]. As a form of cardiovascular disease, HF is an important contributor to rising global mortality and has become a major public health problem worldwide [7]. Because of its high prevalence, unsatisfactory prognosis, and high re-hospitalization rate, the direct and indirect costs of heart failure are estimated at $29 billion per year [8]. Therefore, effective mortality prediction can help doctors make more scientific treatment plans and prevent the disease from worsening, so as to improve quality of life and reduce medical expenses.
In recent years, mortality prediction systems based on machine learning and deep learning algorithms [9, 10] have become widely used and enable better analysis of medical information. For example, Mark Stampehl et al. [11] applied three machine learning methods, namely classification and regression trees (CART), full logistic regression, and stepwise logistic regression, to predict mortality for hospitalized Medicare patients, achieving ROC values of at least 0.74. Similarly, Sho Suzuki et al. [12] used multiple stepwise logistic regression to select factors that influence HF mortality. Many improved tree models have also been used to predict mortality, such as CART [13], Random Forest (RF) [14, 15], and Gradient Boosted Classification Trees (GBM) [16, 17]. In addition, Zhe Wang et al. [18] utilized multiple empirical kernel learning to measure the weight with which each feature influences mortality. Compared with machine learning, deep learning can automatically select important features and achieve better performance. Joon-myoung Kwon et al. [19] established a deep neural network (DNN) to predict mortality in patients with acute heart failure; compared with the Get With The Guidelines-Heart Failure (GWTG-HF) score and other models, the deep learning model had a better AUC (0.880). Because convolutional neural networks (CNNs) have demonstrated excellent performance in handling features, Zhe Wang et al. [20] used a CNN to deal with high-dimensional features.
However, the above models ignore the fact that, in practice, collected datasets suffer from incompleteness and imbalance, which seriously affect model accuracy and performance. Therefore, a Deep Learning System based on a Multi-head Self-attention Mechanism (DLS-MSM) is proposed to better solve these problems. In the data processing stage, we introduce an indicator vector that marks whether a value is a padding value, so as to quickly handle missing values and expand the feature dimension. We then use a convolutional network with different kernel sizes to represent the features. However, after recombining convolution kernels of different sizes, the feature dimension expands, which makes it difficult to determine which features deserve more attention. Hence, we add a multi-head self-attention mechanism on top of the CNN to better handle the characteristics from multiple views. In addition, the focal loss function is introduced to pay more attention to the under-represented and hard-to-distinguish samples, which addresses the data imbalance problem. Experiments based on the MIMIC-III public database and Deep SHAP theory show that the proposed system can effectively predict the four types of HF deaths: death within 30 days, within 180 days, within 365 days, and after 365 days.
Materials and methods
The system framework
The framework adopted in this paper is shown in Fig 1. The system has four parts: (1) data extraction, (2) data preprocessing, (3) the deep learning model, and (4) prediction results. First of all, we extract HF patient information from the MIMIC-III database. Then, we set an indicator vector to flag missing values after they have been filled. Finally, a CNN deep learning model based on a multi-head self-attention mechanism (DLS-MSM) is proposed for mortality prediction.
Fig 1. The system framework, consisting of four parts: data extraction, data preprocessing, deep learning model, and prediction results.
Data extraction
The HF dataset is extracted from MIMIC-III v1.4. MIMIC-III (Medical Information Mart for Intensive Care III) [21, 22] is a large, freely available database comprising health-related data for over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. MIMIC-III v1.4 adopts ICD-9 codes, according to which 25 types of heart failure were extracted, as shown in Table 1.
Table 1. ICD-9 codes corresponding to HF diagnoses.
| ICD-9 codes | Name |
|---|---|
| 39891 | Rheumatic heart failure (congestive) |
| 40201 | Malignant hypertensive heart disease with heart failure |
| 40211 | Benign hypertensive heart disease with heart failure |
| 40291 | Unspecified hypertensive heart disease with heart failure |
| 40401 | Hypertensive heart and chronic kidney disease, malignant, with heart failure and with chronic kidney disease stage I through stage IV, or unspecified |
| 40403 | Hypertensive heart and chronic kidney disease, malignant, with heart failure and with chronic kidney disease stage V or end stage renal disease |
| 40411 | Hypertensive heart and chronic kidney disease, benign, with heart failure and with chronic kidney disease stage I through stage IV, or unspecified |
| 40413 | Hypertensive heart and chronic kidney disease, benign, with heart failure and chronic kidney disease stage V or end stage renal disease |
| 40491 | Hypertensive heart and chronic kidney disease, unspecified, with heart failure and with chronic kidney disease stage I through stage IV, or unspecified |
| 40493 | Hypertensive heart and chronic kidney disease, unspecified, with heart failure and chronic kidney disease stage V or end stage renal disease |
| 4280 | Congestive heart failure, unspecified |
| 4281 | Left heart failure |
| 42820 | Systolic heart failure, unspecified |
| 42821 | Acute systolic heart failure |
| 42822 | Chronic systolic heart failure |
| 42823 | Acute on chronic systolic heart failure |
| 42830 | Diastolic heart failure, unspecified |
| 42831 | Acute diastolic heart failure |
| 42832 | Chronic diastolic heart failure |
| 42833 | Acute on chronic diastolic heart failure |
| 42840 | Combined systolic and diastolic heart failure, unspecified |
| 42841 | Acute combined systolic and diastolic heart failure |
| 42842 | Chronic combined systolic and diastolic heart failure |
| 42843 | Acute on chronic combined systolic and diastolic heart failure |
| 4289 | Heart failure, unspecified |
Under the criterion of age greater than or equal to 18 years, we extracted 10,311 patients in total. The starting point is defined as the time of a patient's first HF hospitalization, and the endpoint is the time when the patient died or was discharged. Each patient therefore has a death time, such as 0 or 364: zero means alive, while 364 means the patient died 364 days after first being admitted to the hospital. We divide HF patients by death time into five categories: survivors who were alive throughout the statistical period, patients who died within 30 days, within 180 days, within 365 days, and after 365 days. Each group of deceased patients is paired with the survivors, giving four binary classification experiments. Each group's label is a binary value {0, 1}: deceased patients are labeled as positive samples, and the others as negative samples, as shown in Table 2.
Table 2. Groups and sample counts for the four experiments.
| Groups, alias | Positive samples{1} | Negative samples{0} | Total | IR |
|---|---|---|---|---|
| 1, W30D | 2472, within 30 days patients | 4239, survivable patients | 6711 | 1.7148 |
| 2, W180D | 1393, within 180 days patients | 4239, survivable patients | 5632 | 3.0431 |
| 3, W365D | 530, within 365 days patients | 4239, survivable patients | 4769 | 7.9981 |
| 4, A365D | 1677, after 365 days patients | 4239, survivable patients | 5916 | 2.5277 |
Besides, we calculate the imbalance rate (IR), defined in formula (1) as the ratio of negative to positive samples; a small sketch of the grouping and IR computation follows the formula. The imbalance rates of the four datasets are 1.7148, 3.0431, 7.9981, and 2.5277, respectively; the W365D dataset is the most imbalanced.
$$IR = \frac{N_{negative}}{N_{positive}} \tag{1}$$
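As an illustration, the following pandas sketch builds the four binary groups and computes the IR of each. The `death_time` column and the disjoint binning of death times are assumptions consistent with the group counts in Table 2, not the authors' extraction code.

```python
import pandas as pd

# Hypothetical cohort: death_time is days from first HF admission to death,
# with 0 meaning the patient survived the statistical period.
df = pd.DataFrame({"death_time": [0, 0, 12, 45, 200, 400, 0, 29]})

survivors = df[df["death_time"] == 0]
# The four death groups, treated as disjoint bins (consistent with Table 2).
groups = {
    "W30D":  df[(df["death_time"] > 0)   & (df["death_time"] <= 30)],
    "W180D": df[(df["death_time"] > 30)  & (df["death_time"] <= 180)],
    "W365D": df[(df["death_time"] > 180) & (df["death_time"] <= 365)],
    "A365D": df[df["death_time"] > 365],
}
for name, dead in groups.items():
    ir = len(survivors) / len(dead)  # imbalance rate, formula (1)
    print(f"{name}: positives={len(dead)}, negatives={len(survivors)}, IR={ir:.2f}")
```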
Data preprocessing
Feature vectorization
From MIMIC-III, we obtain 66 features: 39 discrete features, 3 complete continuous features, and 24 continuous features with missing values. First, we vectorize the 39 discrete features by one-hot coding. These are gender (index 1 in Table 3), medication (indices 3–9), surgery (indices 10–13), related diseases (indices 14–39), and whether the patient stayed in the CCU (index 42).
Table 3. The total features of HF patients.
| Index | Name | Representation | Meaning and scope |
|---|---|---|---|
| 1 | Gender | One-hot coding, {0, 1} | 0 means male, 1 means female |
| 2 | Age | Value | 18–91.4 |
| 3–9 | Medication | One-hot coding, {0, 1} | 1 means used, 0 means not |
| 10–13 | Surgery | One-hot coding, {0, 1} | 1 means has had surgery, 0 means not |
| 14–39 | Related diseases | One-hot coding, {0, 1} | 1 means with this related diseases, 0 means not |
| 40 | ICU turnover times | Value | 0–41 |
| 41 | ICU stay time | Value | 0–268.25 |
| 42 | Stayed in CCU | One-hot coding, {0, 1} | 1 means stayed, 0 means not |
| 43–86 | Laboratory test | Value and one-hot coding, {0, 1} | — |
| 87–88 | Heart rate | Value and one-hot coding, {0, 1} | 30–303 |
| 89–90 | BMI | Value and one-hot coding, {0, 1} | 11.7–78.4 |
As shown in Table 3, for gender the digit 0 denotes male and 1 denotes female. According to their efficacy, the medications are divided into 7 groups: ACEI, ARB, beta-receptor blockers, CCB, digitalis, diuretics, and nitrates. The surgery feature contains 4 classes: left ventricular assist device (LVAD), cardiac resynchronization therapy (CRT), automatic implantable cardioverter/defibrillator (ICD), and heart transplantation. In addition, we summarize 26 diseases as related diseases (see Appendix A in S1 File), such as cardiac arrhythmias and cardiomyopathy. The medication, surgery, related disease, and CCU-stay features are represented in the same way: the digit 1 means the patient has used one of the medications, has had one of the surgeries, and so on, while the digit 0 means not.
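A minimal sketch of this binary coding follows; the column names are illustrative and not the exact MIMIC-III field names.

```python
import pandas as pd

# Illustrative raw records for two patients.
raw = pd.DataFrame({
    "gender": ["M", "F"],
    "diuretic": [True, False],        # one of the 7 medication groups
    "cardiomyopathy": [False, True],  # one of the 26 related diseases
})

coded = pd.DataFrame({
    "gender": (raw["gender"] == "F").astype(int),         # 0 = male, 1 = female
    "diuretic": raw["diuretic"].astype(int),              # 1 = used, 0 = not
    "cardiomyopathy": raw["cardiomyopathy"].astype(int),  # 1 = diagnosed, 0 = not
})
print(coded)
```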
Proposed indicator vector for missing values method
Since the 24 laboratory test dimensions (see Appendix B in S1 File) and the heart rate and BMI features contain missing values, these values must be filled. Mean imputation guided by the variance is among the most widely used missing value imputation techniques [23]. However, many characteristics in the MIMIC-III HF database have a very large variance; for example, the white blood cell count (WBC) feature has a mean of 17.1002 K/uL and a variance of 15.1180, so the mean/variance method is not suitable. Hence, we chose the most straightforward way to deal with missing values: the normal range of each characteristic is used to fill missing values, and the filled value is then marked as a padded value. The digit 1 flags a value that was missing and has been filled, and the digit 0 means it is a true value. In this way, the incompleteness of the database can be resolved quickly.
A sample processed by this filling method is shown in Table 4. The hemoglobin value is missing, so we fill it with a random value in the normal range and set the indicator value to 1, which points out that the hemoglobin value is filled rather than true. The heart rate in Table 4 is a true value, so its indicator value is 0. The other features are handled in the same way.
Table 4. Example of preprocessing variables with missing values.
| WBC | Indicator value | Hemoglobin | Indicator value | … | Heart rate | Indicator value | BMI | Indicator value |
|---|---|---|---|---|---|---|---|---|
| 8.2 | 0 | 15.47 | 1 | … | 86 | 0 | 28.9609 | 0 |
Since an indicator value is added after every feature with missing values, the final feature dimension is 90, as shown in Table 3. To further illustrate the features, Table 5 displays their composition. After imputation, we apply Z-score normalization to the data to reduce the influence of outliers and extreme values; a sketch of these steps follows Table 5.
Table 5. The composition of features.
| Discrete features | Continuous features | Indicator vectors | Total features | |
|---|---|---|---|---|
| Complete features | Missing features | |||
| 39 | 3 | 24 | 24 | 90 |
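A minimal sketch of the imputation-with-indicator and Z-score steps, assuming illustrative normal ranges for two of the incomplete features:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Assumed normal ranges for two of the 24 incomplete continuous features.
normal_range = {"hemoglobin": (12.0, 17.5), "wbc": (4.5, 11.0)}

def fill_with_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Fill each missing value with a random draw from its normal range
    and append an indicator column (1 = filled, 0 = true value)."""
    out = df.copy()
    for col, (lo, hi) in normal_range.items():
        missing = out[col].isna()
        out[col + "_ind"] = missing.astype(int)
        out.loc[missing, col] = rng.uniform(lo, hi, missing.sum())
    return out

df = pd.DataFrame({"hemoglobin": [np.nan, 13.2], "wbc": [8.2, np.nan]})
filled = fill_with_indicator(df)

# Z-score normalization of the continuous columns (indicators stay binary).
cont = list(normal_range)
filled[cont] = (filled[cont] - filled[cont].mean()) / filled[cont].std()
print(filled)
```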
Deep learning model
CNN with different kernel sizes
A convolutional neural network (CNN) is a feedforward neural network. Compared with the multi-layer perceptron (MLP) [24], a CNN has local connectivity, which exploits local information, optimizes the network parameters and structure, reduces model training time, and improves performance [25]. Therefore, we use a CNN to predict mortality. Because each sample in our dataset is a single vector of HF patient characteristics, the input is in 1D format, so we adopt 1D convolution. A 1D convolution slides in only one direction (along the feature sequence); with a single kernel size, it is easy for an indicator vector to be separated from the feature it indicates, making it hard to capture the characteristic information. Therefore, our study uses kernels of multiple sizes to extract key information, better capturing both the indicator vectors and the local correlations.
The deep learning model is shown in Fig 2; a PyTorch sketch under assumed padding choices follows the figure caption. First, the input size is 256×(1×90), where 256 is the batch size and 90 is the feature dimension explained in the section on the proposed indicator vector method. Second, eight convolutional kernels of size (1×3), combined with Batch Normalization (BN) and the ReLU function, form the convolution layer. Third, three different convolution groups, 3×(1×3), 3×(1×4), and 3×(1×5), each combined with BN, ReLU, and MaxPooling (with a sampling window of 2 units), form the different-kernels layer; each group receives the information learned by the convolution layer. Fourth, the stitching layer concatenates the outputs of the different kernel branches, changing the shape to 256×396. The multi-head self-attention structure is then used to focus on the global and local information from the stitching layer; we set the number of heads to two, for reasons discussed in the subsection on the effect of different numbers of heads. Fifth, fully connected layers of size (396, 128) and (128, 2) process the information from the attention mechanism. Finally, the output is 256×(1×2), meaning we obtain 256 mortality predictions at a time.
Fig 2. The CNN based on multi-head self-attention for prediction.
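The following PyTorch sketch assembles the pipeline described above. The exact padding of each convolution branch is not specified in the paper; the values below are assumptions chosen so that the stitched feature vector has the stated 396 dimensions (3 branches × 3 channels × 44 positions).

```python
import torch
import torch.nn as nn

class DLSMSM(nn.Module):
    """Sketch of the DLS-MSM network. Branch paddings are assumptions chosen
    to reproduce the stated 396-dim stitching layer (3 branches x 3 x 44)."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # Convolution layer: 8 kernels of size 3 with BN and ReLU, 1x90 -> 8x88.
        self.conv = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3), nn.BatchNorm1d(8), nn.ReLU())

        def branch(k: int, pad: int) -> nn.Sequential:
            # Different-kernels layer: 3 kernels of size k, BN, ReLU, MaxPool(2).
            return nn.Sequential(
                nn.Conv1d(8, 3, kernel_size=k, padding=pad),
                nn.BatchNorm1d(3), nn.ReLU(), nn.MaxPool1d(2))

        self.branches = nn.ModuleList([branch(3, 1), branch(4, 2), branch(5, 2)])
        self.attn = nn.MultiheadAttention(embed_dim=132, num_heads=2,
                                          batch_first=True)
        self.fc = nn.Sequential(nn.Linear(396, 128), nn.ReLU(),
                                nn.Linear(128, n_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 1, 90)
        h = self.conv(x)
        # Stitching layer: each branch becomes one 132-dim token, (batch, 3, 132).
        tokens = torch.stack([b(h).flatten(1) for b in self.branches], dim=1)
        a, _ = self.attn(tokens, tokens, tokens)  # self-attention: Q = K = V
        return self.fc(a.flatten(1))              # (batch, 2)

model = DLSMSM()
print(model(torch.randn(256, 1, 90)).shape)       # torch.Size([256, 2])
```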
Multi-head self-attention mechanism
Attention simulates the processing mechanism of the human brain: it identifies the target area by focusing on crucial information and neglecting the rest, which can greatly improve a model's efficiency and accuracy [26]. Yingying Zhang et al. [27] proposed that representations in different subspaces tend to focus on different information, and that combining all subspaces can enhance the global information. This idea inspired us. There are three receptive fields of different sizes in our model, each focusing on different information. After splicing, all the information is summarized: each branch's output is a relatively complete whole, but there is no correlation or interaction between the three branches. Hence, we use the multi-head self-attention mechanism illustrated in Fig 3 to obtain the global information and pay more attention to the important features.
Fig 3. The architecture of multi-head self-attention mechanism.
As shown in Fig 3, the input x (3×132 per sample) from the stitching layer passes through three different linear layers to generate the keys (denoted K), queries (denoted Q), and values (denoted V), as described in formula (2). The "self" in self-attention reflects that K, Q, and V are generated from the same input.
$$Q = W_q x + b_q, \quad K = W_k x + b_k, \quad V = W_v x + b_v \tag{2}$$
Here, (W_k, b_k) is the parameter set of a linear layer (also called a fully connected layer), where W_k is the weight and b_k is the bias; (W_q, b_q) and (W_v, b_v) are defined analogously.
The attention then calculates the similarity between Q and K; this similarity reflects the importance of the extracted V, that is, the weight. The weights are scaled according to the input dimension (denoted d_model), and the attention value is obtained as a weighted average using the softmax function. The self-attention property is reflected in Q = K = V. Formula (3) demonstrates this process.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_{model}}}\right)V \tag{3}$$
Afterwards, the multi-head self-attention mechanism uses h different heads to obtain different representations from (Q, K, V), and ultimately concatenates the results through a linear layer:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O} \tag{4}$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}) \tag{5}$$
where head_i is the i-th head; in our research, h equals 2. After all these steps, the output size is 3×132 per sample. A from-scratch sketch of formulas (2) to (5) is given below.
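For concreteness, the following is a from-scratch sketch of formulas (2) to (5). It follows the standard scaled dot-product formulation, scaling by the per-head width d_k = d_model / h; the `nn.MultiheadAttention` module used in the model sketch above behaves equivalently.

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """From-scratch sketch of formulas (2)-(5) with h = 2 heads."""
    def __init__(self, d_model: int = 132, h: int = 2):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_k = h, d_model // h
        # Formula (2): three linear layers map the same input x to Q, K, V.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)  # final linear layer of formula (4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        b, n, _ = x.shape
        # Formula (5): split each projection into h heads of width d_k.
        q, k, v = (w(x).view(b, n, self.h, self.d_k).transpose(1, 2)
                   for w in (self.w_q, self.w_k, self.w_v))
        # Formula (3): scaled dot-product attention per head
        # (scaled here by the per-head width d_k = d_model / h).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        heads = torch.softmax(scores, dim=-1) @ v
        # Formula (4): concatenate the heads and apply the output projection.
        return self.w_o(heads.transpose(1, 2).reshape(b, n, -1))

attn = MultiHeadSelfAttention()
print(attn(torch.randn(256, 3, 132)).shape)  # torch.Size([256, 3, 132])
```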
Focal loss function
In general, imbalance is a common problem in medical data processing and analysis, and the class imbalance in our MIMIC-III groups is shown in Table 2. Therefore, we apply the focal loss function. The focal loss was proposed for one-stage detectors in image object detection [28]. By reducing the weight of the large number of negative training samples, it makes the model focus on the classes with fewer samples during training; meanwhile, by reducing the weight of easily classified samples, it improves accuracy on hard-to-classify samples.
The focal loss function is developed from the cross entropy loss, which is defined as:
$$CE(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise} \end{cases} \tag{6}$$
where y ∈ {0, 1} denotes the true label in the dataset (in this research, label 1 marks a deceased HF patient) and p ∈ [0, 1] is the model's predicted probability. For simplicity, we define the transformation:
$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases} \tag{7}$$
Hence, the cross entropy loss is defined as follows:
$$CE(p, y) = CE(p_t) = -\log(p_t) \tag{8}$$
Based on formula (8), the focal loss is defined as:
$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{9}$$
In formula (9), α_t and γ are two hyper-parameters: α_t adjusts the proportion of positive and negative samples, and γ emphasizes the samples that are difficult to separate. In this study, we set α_t = 0.25 and γ = 2.
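A minimal PyTorch sketch of formulas (7) to (9) for the two-class case follows, using the stated settings α_t = 0.25 and γ = 2; this is our reading of the loss, not the authors' code.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha: float = 0.25, gamma: float = 2.0):
    """Two-class focal loss, a sketch of formula (9).
    logits: (batch, 2) model outputs; targets: (batch,) with 1 = died."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log(p_t), formula (8)
    p_t = torch.exp(-ce)                                     # recover p_t, formula (7)
    t = targets.float()
    alpha_t = alpha * t + (1 - alpha) * (1 - t)              # weight the positive class
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()        # formula (9)

print(focal_loss(torch.randn(8, 2), torch.randint(0, 2, (8,))))
```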
Results
Training strategy
In the first step, we adopt 5-fold cross-validation to train our model, to avoid over-fitting and under-fitting. We divide each dataset into a training set, a validation set, and a test set, with 5% of each dataset randomly held out as the test set. The model is trained for 120 epochs with the Root Mean Square propagation (RMSprop) optimizer and an initial learning rate of 0.001, as sketched below.
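A simplified training sketch under the stated settings (5-fold cross-validation, RMSprop, learning rate 0.001, 120 epochs), reusing the `DLSMSM` model and `focal_loss` from the sketches above; placeholder data stands in for the MIMIC-III features, and mini-batching is omitted for brevity.

```python
import torch
from sklearn.model_selection import StratifiedKFold

# Placeholder data standing in for the preprocessed 90-dimensional features.
X = torch.randn(1000, 1, 90)
y = torch.randint(0, 2, (1000,))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr, va) in enumerate(skf.split(X.numpy().reshape(1000, -1), y.numpy())):
    model = DLSMSM()                                # model sketch defined above
    opt = torch.optim.RMSprop(model.parameters(), lr=1e-3)
    for epoch in range(120):                        # 120 training epochs
        opt.zero_grad()
        loss = focal_loss(model(X[tr]), y[tr])      # focal loss sketch from above
        loss.backward()
        opt.step()
    with torch.no_grad():
        acc = (model(X[va]).argmax(1) == y[va]).float().mean()
    print(f"fold {fold}: validation ACC = {acc:.4f}")
```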
In the model evaluation step, we use seven criteria: accuracy (ACC), Positive Predictive Value (PPV, also called Precision), Negative Predictive Value (NPV), Recall, F1 score (F1), the Area Under the ROC Curve (AUC), and the micro-average AUC; the first five are given in formulas (10) to (14). Considering that the datasets are imbalanced, model stability is crucial; hence, we adopt the micro-average AUC, which is sensitive to the small classes, to reflect stability.
$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$
$$PPV = \frac{TP}{TP + FP} \tag{11}$$
$$NPV = \frac{TN}{TN + FN} \tag{12}$$
$$Recall = \frac{TP}{TP + FN} \tag{13}$$
$$F1 = \frac{2 \times PPV \times Recall}{PPV + Recall} \tag{14}$$
In the above formulas, TP is the number of true positive samples and TN the number of true negative samples; FP is the number of false positive samples and FN the number of false negative samples. A sketch of these computations is given below.
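A sketch of formulas (10) to (14) plus the micro-average AUC, using scikit-learn only for the confusion matrix and ROC computation:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def report(y_true, y_pred, y_score):
    """Compute the evaluation criteria of formulas (10)-(14) plus the AUCs."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)   # formula (10)
    ppv = tp / (tp + fp)                    # formula (11)
    npv = tn / (tn + fn)                    # formula (12)
    recall = tp / (tp + fn)                 # formula (13)
    f1 = 2 * ppv * recall / (ppv + recall)  # formula (14)
    auc = roc_auc_score(y_true, y_score)    # AUC of the positive class
    # Micro-average AUC over the one-hot labels of both classes.
    onehot = np.eye(2)[y_true]
    scores = np.column_stack([1 - y_score, y_score])
    micro = roc_auc_score(onehot, scores, average="micro")
    return dict(ACC=acc, PPV=ppv, NPV=npv, Recall=recall, F1=f1,
                AUC=auc, micro_AUC=micro)

y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.2, 0.8, 0.4, 0.3, 0.9])
print(report(y_true, (y_score > 0.5).astype(int), y_score))
```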
Mortality prediction for HF patients
In this subsection, we use the proposed model to predict HF patients' mortality. First, we apply the model to the W30D dataset. As shown in Fig 4A, the model gradually stabilizes after 80 epochs. The loss decreases from 0.9477 to 0.0417, the ACC settles at 84.56%, and the F1 score rises from 23.01% to 78.69%. These results reveal that the model is capable of distinguishing deceased from living patients. Moreover, the AUC of 91.00% in Fig 5A shows good stability, and the micro-average AUC of 91.00% indicates that the model can handle imbalanced data.
Fig 4. The loss, ACC, and F1 score curves for the four experimental classifications.
Fig 5. The AUC for the four experimental classifications.
Second, we measure the model's performance on the W180D dataset. The loss settles at 0.0423 after 70 epochs (Fig 4B). Compared with W30D, the imbalance rate of W180D is higher; therefore the ACC drops by about 2.5 percentage points, finally staying at 82.08%, and the F1 score falls with it to 56.33%. However, the model remains stable: the AUC is 82.00% for both classes (Fig 5B), and the micro-average AUC reaches 88.00%. The model thus still handles this imbalanced dataset well.
When the imbalance rate increases further in W365D, the loss becomes more volatile, as the Fig 4C loss curve shows, fluctuating between 0.022 and 0.024. Along with this, the AUC drops to 75.00% for both classes (Fig 5C), indicating decreasing stability. Moreover, there is a large gap between the micro-average AUC (91.00%) and the macro-average AUC (75.00%). This gap indicates the model struggles with a dataset whose imbalance rate is 7.9981. Although the ACC is the highest among the four datasets, reaching 88.56%, the F1 score is merely 34.35%. This experiment shows that the prediction tends toward the majority class.
Finally, we discuss the A365D dataset. The loss settles around 0.050 (Fig 4D), slightly higher than the others, leading to an ACC of 76.07% and an F1 of 46.34%. A365D records HF patients who died after one year; however, 66.19% (1110/1677) of these patients survived two years or more. Therefore, compared with the other datasets, the characteristics of patients who died after one year are more similar to those of patients who did not die during the statistical period, which explains the model's lower ACC on A365D. The AUC reaches 72% for both classes, and Fig 5D reports a micro-average AUC of 93%, reflecting the model's stability.
Model performance compared with other methods
To demonstrate validity, we compare DLS-MSM with six models that are representative and widely applied in the medical and biomedical fields: Support Vector Machine (SVM) [29], Multi-layer Perceptron (MLP) [24], Logistic Regression (LR), Random Forest (RF) [30], Light Gradient Boosting Machine (LightGBM, LGB) [31], and K-Nearest Neighbors (KNN). Table 6 shows the comparison; all decimals have been converted to percentages, and all results are from the test set.
Table 6. Model performance compared with other methods.
| Datasets | Measurement | DLS-MSM | SVM | MLP | LR | RF | LGB | KNN |
|---|---|---|---|---|---|---|---|---|
| W30D | ACC | 84.56 | 85.12 | 80.95 | 82.44 | 86.31 | 84.23 | 80.35 |
| PPV | 76.28 | 80.64 | 70.16 | 70.96 | 72.58 | 77.40 | 64.52 | |
| NPV | 83.57 | 87.74 | 83.04 | 89.15 | 94.34 | 88.21 | 89.62 | |
| Recall | 81.26 | 79.37 | 76.32 | 79.28 | 88.24 | 79.34 | 78.43 | |
| F1 | 78.69 | 80.00 | 73.11 | 74.89 | 79.64 | 78.36 | 70.80 | |
| AUC | 91.00 | 83.97 | 79.82 | 81.64 | 86.85 | 83.15 | 79.82 | |
| Micro-average AUC | 91.00 | 90.00 | 89.00 | 90.00 | 92.00 | 91.00 | 89.00 | |
| W180D | ACC | 82.08 | 80.14 | 76.60 | 79.79 | 79.08 | 78.72 | 80.85 |
| PPV | 59.94 | 32.86 | 47.30 | 34.28 | 20.00 | 35.71 | 34.29 | |
| NPV | 85.21 | 95.75 | 83.32 | 94.81 | 98.58 | 92.92 | 96.23 | |
| Recall | 53.13 | 71.86 | 51.33 | 68.57 | 82.35 | 62.50 | 75.00 | |
| F1 | 56.33 | 45.10 | 49.23 | 45.71 | 32.18 | 45.45 | 47.06 | |
| AUC | 82.00 | 76.54 | 68.31 | 74.97 | 80.61 | 71.95 | 78.30 | |
| Micro-average AUC | 88.00 | 85.00 | 85.00 | 88.00 | 86.00 | 88.00 | 86.00 | |
| W365D | ACC | 88.56 | 88.70 | 87.87 | 88.70 | 88.70 | 89.12 | 88.28 |
| PPV | 31.03 | 0.00 | 18.52 | 3.71 | 0.00 | 3.7 | 0.00 | |
| NPV | 94.01 | 100.0 | 95.75 | 99.53 | 100.0 | 100.0 | 99.53 | |
| Recall | 38.46 | 0.00 | 41.67 | 50.00 | 0.00 | 100.0 | 0.00 | |
| F1 | 34.35 | — | 25.64 | 6.90 | — | 7.14 | 0.00 | |
| AUC | 75.00 | — | 65.99 | 69.51 | — | 93.53 | 44.32 | |
| Micro-average AUC | 91.00 | — | 91.00 | 93.00 | — | 92.00 | 90.00 | |
| A365D | ACC | 76.07 | 71.96 | 70.94 | 76.01 | 72.97 | 74.66 | 72.63 |
| PPV | 47.35 | 9.52 | 35.71 | 30.96 | 4.76 | 29.76 | 22.62 | |
| NPV | 78.14 | 96.70 | 83.08 | 93.87 | 100.0 | 92.45 | 92.45 | |
| Recall | 45.37 | 53.33 | 43.42 | 66.67 | 100.0 | 60.98 | 54.29 | |
| F1 | 46.34 | 16.16 | 39.19 | 42.28 | 9.09 | 40.00 | 31.93 | |
| AUC | 72.00 | 63.14 | 63.08 | 72.05 | 86.30 | 68.92 | 64.69 | |
| Micro-average AUC | 93.00 | 74.00 | 76.00 | 82.00 | 82.00 | 82.00 | 77.00 |
Because the W30D dataset is sufficiently large and has a small imbalance ratio, all the models perform well; in particular, RF obtains 86.31% ACC, 1.75% higher than DLS-MSM. The F1 score is the harmonic mean of PPV and Recall; relative to the average F1 of the comparison models (excluding null entries), DLS-MSM improves the F1 score by 12.21%, 21.12%, and 16.565% on W180D, W365D, and A365D, respectively. This indicates that our model deals with these datasets preferably. As shown in Table 6, the F1 scores of the SVM and RF models are null on the W365D dataset, which indicates those models have difficulty handling imbalanced data; in contrast, among a large number of negative samples, DLS-MSM can still accurately identify the deceased HF patients. Moreover, the NPVs of the comparison algorithms on W365D are nearly flat at around 99.88%, which indicates that these algorithms struggle to solve the imbalance problem on W365D; on this point, DLS-MSM, with the highest PPV and F1 score, is much better than the other algorithms.
On the one hand, from the perspective of AUC, our model remains relatively stable. Because of the higher imbalance ratio in W365D, LGB virtually ignores the positive samples, as its high Recall and low F1 show; hence LGB is hard to adopt for mortality prediction. The same explanation applies to RF, which reaches an 86.30% AUC on A365D with a Recall of 100% but an F1 of only 9.09%. On the other hand, the micro-average AUC proves that DLS-MSM is effective in dealing with the imbalance problem, improving the micro-average AUC by around 1.5%.
The effect of indicator vector
To verify the validity of the indicator vector, we carry out a comparative experiment in which the dataset without indicator vectors serves as the control group. The input dimension in this experiment is accordingly changed to 66, and the dimensions of the attention layer and the linear layer change to (2, 96) and (288, 128), respectively.
Table 7 displays the comparative experiment, which establishes the validity of the indicator vector. Almost all metrics with the indicator vector are higher than without it; the exceptions occur on W365D, the most lopsided dataset, where the model trained without the indicator vector ignores the influence of the positive samples. Adding the indicator helps the model identify more true positive samples, raising the Recall from 10.78% to 38.46% and the F1 from 16.30% to 34.35% on W365D. These results indicate that the model with the indicator vector handles positive and negative samples more comprehensively.
Table 7. Effects before and after adding the indicator vector.
| Datasets | Measurement | Before | After | Datasets | Measurement | Before | After |
|---|---|---|---|---|---|---|---|
| W30D | ACC | 81.64 | 84.56 | W365D | ACC | 87.52 | 88.56 |
| PPV | 82.49 | 76.28 | PPV | 33.33 | 31.03 | ||
| NPV | 79.24 | 83.57 | NPV | 94.37 | 94.01 | ||
| Recall | 68.32 | 81.26 | Recall | 10.78 | 38.46 | ||
| F1 | 74.74 | 78.69 | F1 | 16.30 | 34.35 | ||
| AUC | 79.24 | 91.00 | AUC | 61.46 | 75.00 | ||
| Micro-average AUC | 90.00 | 91.00 | Micro-average AUC | 92.00 | 91.00 | ||
| W180D | ACC | 80.47 | 82.08 | A365D | ACC | 70.37 | 76.07 |
| PPV | 56.69 | 59.94 | PPV | 46.92 | 47.35 | ||
| NPV | 84.94 | 85.21 | NPV | 76.62 | 78.14 | ||
| Recall | 51.14 | 53.13 | Recall | 43.49 | 45.37 | ||
| F1 | 53.77 | 56.33 | F1 | 45.14 | 46.34 | ||
| AUC | 71.61 | 82.00 | AUC | 62.76 | 73.00 | ||
| Micro-average AUC | 88.00 | 88.00 | Micro-average AUC | 77.00 | 93.00 |
The effect of using attention
Attention plays an important role in the CNN framework. Therefore, we compare the CNN framework with and without attention, as shown in Table 8. PPV and Recall influence each other: the Recall of DLS-MSM with the attention mechanism is higher than without, while the PPV of the model without attention is higher. This involves the recognition rate; a higher Recall means the model distinguishes more true positive samples. The NPV reflects that DLS-MSM with attention can accurately identify negative samples. In addition, the model with attention is more stable, as displayed by the AUC and the micro-average AUC. All in all, the proposed DLS-MSM with attention is better than without attention, except for the Recall on the W30D dataset.
Table 8. Effect of using attention versus not using it.
| Position | Measurement | W30D | W180D | W365D | A365D |
|---|---|---|---|---|---|
| Without attention | ACC | 82.51 | 80.75 | 81.79 | 74.29 |
| PPV | 75.04 | 66.27 | 31.24 | 54.72 | |
| NPV | 86.78 | 78.10 | 93.53 | 73.73 | |
| Recall | 81.35 | 43.51 | 34.19 | 37.54 | |
| F1 | 78.07 | 52.53 | 32.65 | 44.53 | |
| AUC | 81.42 | 74.90 | 60.68 | 66.78 | |
| Micro-average AUC | 90.00 | 89.00 | 90.00 | 82.00 | |
| DLS-MSM | ACC | 84.56 | 82.08 | 88.56 | 76.07 |
| PPV | 76.28 | 59.94 | 31.03 | 47.35 | |
| NPV | 83.57 | 85.21 | 94.01 | 78.14 | |
| Recall | 81.26 | 53.13 | 38.46 | 45.37 | |
| F1 | 78.69 | 56.33 | 34.35 | 46.34 | |
| AUC | 91.00 | 82.00 | 75.00 | 73.00 | |
| Micro-average AUC | 91.00 | 88.00 | 91.00 | 93.00 |
Discussion
Effect of attention in different locations
The attention mechanism plays different roles at different locations in the system structure. Therefore, we run two additional experiments to study effective locations. According to the principle of the attention mechanism, we choose position 1, between the convolution layer and the different-kernels layer, and position 2, between the different-kernels layer and the stitching layer, as shown in Fig 2. The sizes of the attention mechanism are (2, 90) and (2, 44), respectively.
As shown in Table 9, the Recall of position 1 on the W180D and A365D datasets is 4.58% and 27.18% higher than that of position 2, respectively, although its PPV is not higher; hence, attention at position 1 pays more attention to the positive samples. On W365D, because of the high imbalance rate, the attention mechanism does not work well at either position. The NPV of our model is on average 1.966% higher than at positions 1 and 2, indicating that our model judges negative samples effectively. Most results of our model are better than at the other two positions. These results confirm the validity of the idea in our model of applying attention after the splicing layer.
Table 9. The effects of different positions of multi-head self-attention.
| Position | Measurement | W30D | W180D | W365D | A365D |
|---|---|---|---|---|---|
| Position 1 | ACC | 80.78 | 76.35 | 89.43 | 66.28 |
| PPV | 76.94 | 51.79 | 32.53 | 38.86 | |
| NPV | 83.04 | 81.19 | 92.49 | 76.92 | |
| Recall | 71.11 | 49.62 | 11.97 | 39.48 | |
| F1 | 73.91 | 50.68 | 17.50 | 39.17 | |
| AUC | 79.91 | 67.84 | 60.31 | 57.88 | |
| Micro-average AUC | 89.00 | 84.00 | 91.00 | 73.00 | |
| Position 2 | ACC | 79.37 | 77.66 | 83.55 | 73.93 |
| PPV | 74.18 | 55.39 | 31.83 | 63.35 | |
| NPV | 81.94 | 82.35 | 91.18 | 77.02 | |
| Recall | 70.70 | 45.04 | 23.93 | 12.30 | |
| F1 | 72.40 | 49.68 | 27.32 | 20.60 | |
| AUC | 78.27 | 69.30 | 60.47 | 68.93 | |
| Micro-average AUC | 87.00 | 85.00 | 90.00 | 81.00 | |
| DLS-MSM | ACC | 84.56 | 82.08 | 88.56 | 76.07 |
| PPV | 76.28 | 59.94 | 31.03 | 47.35 | |
| NPV | 83.57 | 85.21 | 94.01 | 78.14 | |
| Recall | 81.26 | 53.13 | 38.46 | 45.37 | |
| F1 | 78.69 | 56.33 | 34.35 | 46.34 | |
| AUC | 91.00 | 82.00 | 75.00 | 73.00 | |
| Micro-average AUC | 91.00 | 88.00 | 91.00 | 93.00 |
Effect of different number of heads
The multi-head self-attention mechanism can have different numbers of heads, which may influence system performance. We therefore set the number of heads to 2, 3, 6, 11, and 12 for analysis; the specific results are shown in Fig 6. Besides, more heads increase the model complexity and affect the training time, so we measure the average training time for the different head counts, shown in Fig 7 (in seconds).
Fig 6. Comparison of different numbers of heads in the multi-head self-attention mechanism.
Fig 7. Average training time (in seconds) for different numbers of heads.
The result in the upper left of Fig 6 is for W30D, where the Recall fluctuates widely. When the number of heads reaches 12, the Recall is 85.29, but there is no significant increase in F1 relative to the other settings; a high Recall with a low F1 indicates that more true negative samples are predicted as positive. Six heads behave similarly, while 3 and 11 heads behave poorly, with Recalls of only 72.48 and 69.06, respectively. Hence, on W30D, 2 heads give a better result and also require less training time. The other datasets in Fig 6 show even clearer differences in model performance. A comprehensive analysis indicates that, in our study, 2 attention heads give the best results with the least time consumption.
Important feature ranking based Deep SHAP
Neural networks are often thought of as black boxes, so how to interpret complex models has become a hot research topic. We adopt Deep SHAP [32] to interpret our deep learning model and rank the features. Deep SHAP is a deep learning interpretation method combining the DeepLIFT theory with SHAP values. DeepLIFT interprets a deep learning model by calculating a weight for each input feature during backpropagation [33], and SHAP (SHapley Additive exPlanations) gives each feature a unique importance score. Scott M. Lundberg and Su-In Lee combined the two theories to explain deep learning models and showed the combination to be easier for humans to understand. Fig 8 displays the top 15 features in our system.
Fig 8. The top 15 features in the system.
In Fig 8, the blue label indicates the degree to which each characteristic contributes to mortality, and the red label indicates the degree to which it favors survival. The features in Fig 8 are shown as numbers; the characteristics corresponding to those numbers are listed in Table 10, and a sketch of the ranking procedure follows Table 10.
Table 10. Feature correspondence.
| Number | Feature |
|---|---|
| Feature 84 | pCO2 (Laboratory test) |
| Feature 17 | Respiratory failure (Diagnoses) |
| Feature 88 | BMI |
| Feature 3 | ARB (Medicine) |
| Feature 16 | Pulmonary circulation disorder (Diagnoses) |
| Feature 42 | WBC (Laboratory test) |
| Feature 7 | Diuretic (Medicine) |
| Feature 32 | Diabetes (Diagnoses) |
| Feature 41 | CCU stays (ICU information) |
| Feature 50 | GLU (Laboratory test) |
| Feature 66 | Potassium (Laboratory test) |
| Feature 5 | CCB (Medicine) |
| Feature 89 | Insert BMI (Indicator vector) |
| Feature 44 | Hemoglobin (Laboratory test) |
| Feature 8 | Nitrates (Medicine) |
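The following sketch shows how such a ranking can be produced with the shap package, assuming the `model` and `X` tensors from the earlier sketches; the list-per-class return format of `DeepExplainer.shap_values` varies across shap versions, so this is indicative only, not the authors' code.

```python
import numpy as np
import shap
import torch

background = X[:100]                   # background reference for DeepExplainer
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X[100:200])

# With the classic shap API, multi-output models return one array per class;
# index 1 is the positive (death) class. Average |SHAP| over samples to rank.
impact = np.abs(shap_values[1]).mean(axis=0).ravel()
top15 = np.argsort(impact)[::-1][:15]
print(top15 + 1)                       # 1-based feature indices as in Table 10
```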
Heart failure with severe hypoxia and ischemia can be complicated by severe arrhythmia, especially ventricular fibrillation; the clinical symptoms include loss of consciousness, convulsions, respiratory arrest, and even death. This is why respiratory failure is an important feature. The RAAS system is activated when heart failure occurs; ARB drugs can inhibit the RAAS system and myocardial remodeling and delay the progression of heart failure. Calcium channel blockers (CCB) can reduce the calcium concentration in myocardial cells to improve active myocardial diastolic function and lower blood pressure.
These clinical conclusions show that the features extracted by our system (Table 10) are scientific and effective. Therefore, our system can support doctors in prognostic treatment.
Conclusion
In our study, a CNN deep learning model based on multi-head self-attention is applied to mortality prediction for prognostic HF patients. The system can distinguish four categories of death: death within 30 days, within 180 days, within 365 days, and after 365 days. First, we proposed an indicator vector that indicates whether a value is true or filled. Then, multi-head self-attention was introduced into the CNN deep learning model. Finally, the focal loss function was applied to overcome the imbalance. The experimental results show that the idea is feasible and that the whole system predicts mortality effectively. In the end, to explain the system, we used the Deep SHAP method to produce a reasonable ranking of the essential features.
Supporting information
S1 Code. Code for extracting the HF data from MIMIC-III. (7Z)
S1 File. Appendix A (related diseases) and Appendix B (laboratory test features). (DOCX)
Data Availability
Our data comes from MIMIC-III (Medical Information Mart for Intensive Care III) and is not ours to share; access is governed by the MIT Laboratory for Computational Physiology Institutional Data Access / Ethics Committee. We obtained permission to use MIMIC-III according to the requirements and downloaded the corresponding data. Anyone who wants to use the data must become a credentialed user and sign the data use agreement for the project on the https://physionet.org/ website. We provide data extraction code at https://github.com/foneone/SQL-code-for-extracting-heart-failure-from-MIMIC, which authorized users can use to extract the related data.
Funding Statement
This article is supported by the following projects: National Major Scientific Research Instrument Development Project (grant number 62027819): High-speed Real-time Analyzer for Laser Chip's Optical Catastrophic Damage Process, awarded to JZ; the General Program of the National Natural Science Foundation (grant number 62076177): Study on the Risk Assessment Model of Heart Failure by Integrating Multi-modal Big Data, awarded to DL; the Shanxi Province Key Technology and Generic Technology R&D Project (grant number 2020XXX007): Energy Internet Integrated Intelligent Data Management and Decision Support Platform, awarded to DL; and the Key Research and Development Program of Shanxi Province (grant number 202102020101006), awarded to JZ.
References
- 1. Tripoliti EE, Papadopoulos TG, Karanasiou GS, Naka KK, Fotiadis DI. Heart Failure: Diagnosis, Severity Estimation and Prediction of Adverse Events Through Machine Learning Techniques. Comput Struct Biotechnol J. 2017;15:26–47. doi: 10.1016/j.csbj.2016.11.001
- 2. Chinese Society of Cardiology of Chinese Medical Association, Editorial Board of Chinese Journal of Cardiology. Chinese guidelines for the diagnosis and treatment of heart failure 2018. Chin J Heart Fail & Cardiomyopathy. 2018;2(4):196–225.
- 3. Qian M, Pathak J, Pereira NL, Zhai C. Temporal Reflected Logistic Regression for Probabilistic Heart Failure Survival Score Prediction. In: Hu XH, Shyu CR, Bromberg Y, Gao J, Gong Y, Korkin D, et al., editors. 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2017. p. 410–6.
- 4. Yancy CW, Jessup M, Bozkurt B, Butler J, Casey DE Jr., Colvin MM, et al. 2017 ACC/AHA/HFSA Focused Update of the 2013 ACCF/AHA Guideline for the Management of Heart Failure: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines and the Heart Failure Society of America. Circulation. 2017;136(6):e137–e161. doi: 10.1161/CIR.0000000000000509
- 5. Yancy CW, Jessup M, Bozkurt B, Butler J, Casey DE Jr., Colvin MM, et al. 2017 ACC/AHA/HFSA Focused Update of the 2013 ACCF/AHA Guideline for the Management of Heart Failure: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines and the Heart Failure Society of America. Journal of Cardiac Failure. 2017;23(8):628–51. doi: 10.1016/j.cardfail.2017.04.014
- 6. Hao G, Wang X, Chen Z, Zhang L, Zhang Y, Wei B, et al. Prevalence of heart failure and left ventricular dysfunction in China: the China Hypertension Survey, 2012–2015 (correction of vol 21, pg 1329, 2019). European Journal of Heart Failure. 2020;22(4):759. doi: 10.1002/ejhf.1808
- 7. Roth GA, Johnson C, Abajobir A, Abd-Allah F, Abera SF, Abyu G, et al. Global, Regional, and National Burden of Cardiovascular Diseases for 10 Causes, 1990 to 2015. Journal of the American College of Cardiology. 2017;70(1):1–25. doi: 10.1016/j.jacc.2017.04.052
- 8. Duarte R, Stainthorpe A, Mahon J, Greenhalgh J, Richardson M, Nevitt S, et al. Lead-I ECG for detecting atrial fibrillation in patients attending primary care with an irregular pulse using single-time point testing: A systematic review and economic evaluation. PLoS One. 2019;14(12):e0226671. doi: 10.1371/journal.pone.0226671
- 9. Samad MD, Ulloa A, Wehner GJ, Jing L, Hartzel D, Good CW, et al. Predicting Survival From Large Echocardiography and Electronic Health Record Datasets: Optimization With Machine Learning. JACC Cardiovasc Imaging. 2019;12(4):681–9. doi: 10.1016/j.jcmg.2018.04.026
- 10. Wang Z, Yao L, Li D, Ruan T, Liu M, Gao J. Mortality prediction system for heart failure with orthogonal relief and dynamic radius means. International Journal of Medical Informatics. 2018;115:10–7. doi: 10.1016/j.ijmedinf.2018.04.003
- 11. Stampehl M, Friedman HS, Navaratnam P, Russo P, Park S, Obi EN. Risk assessment of post-discharge mortality among recently hospitalized Medicare heart failure patients with reduced or preserved ejection fraction. Current Medical Research and Opinion. 2020;36(2):179–88. doi: 10.1080/03007995.2019.1662654
- 12. Suzuki S, Motoki H, Kanzaki Y, Maruyama T, Hashizume N, Kozuka A, et al. A Predictive Model for 6-Month Mortality in Elderly Patients with Heart Failure. International Heart Journal. 2020;61(2):325–31. doi: 10.1536/ihj.19-572
- 13. Avula S, LaFata M, Nabhan M, Allana A, Toprani B, Scheidel C, et al. Heart failure mortality prediction using PRISM score and development of a classification and regression tree model to refer patients for palliative care consultation. Int J Cardiol Heart Vasc. 2020;26:100440. doi: 10.1016/j.ijcha.2019.100440
- 14. Zahavi G, Frogel J, Shlomo N, Klempfner R, Unger R. Machine learning models predict 30-day and 1-year mortality in heart failure. Journal of the American College of Cardiology. 2020;75(11):858.
- 15. Kosztin A, Schwertner WR, Tokodi M, Toser ZS, Kovacs A, Veres B, et al. Machine-learning defined predictors of mortality in ischemic and non-ischemic heart failure patients undergoing CRT-P or CRT-D implantation. European Heart Journal. 2019;40:933.
- 16. Agasthi P, Smith SD, Murphy KM, Buras MR, Golafshar M, Herner M, et al. Artificial Intelligence Helps Predict 5-year Mortality and Graft Failure in Patients Undergoing Orthotopic Heart Transplantation. Journal of Heart and Lung Transplantation. 2020;39(4):S142.
- 17. Adler ED, Voors AA, Klein L, Macheret F, Braun OO, Urey MA, et al. Improving risk prediction in heart failure using machine learning. European Journal of Heart Failure. 2020;22(1):139–47. doi: 10.1002/ejhf.1628
- 18. Wang Z, Wang B, Zhou Y, Li D, Yin Y. Weight-based multiple empirical kernel learning with neighbor discriminant constraint for heart failure mortality prediction. Journal of Biomedical Informatics. 2020;101:103340. doi: 10.1016/j.jbi.2019.103340
- 19. Kwon J-M, Kim K-H, Jeon K-H, Lee SE, Lee H-Y, Cho H-J, et al. Artificial intelligence algorithm for predicting mortality of patients with acute heart failure. PLoS One. 2019;14(7):e0219302. doi: 10.1371/journal.pone.0219302
- 20. Wang Z, Zhu Y, Li D, Yin Y, Zhang J. Feature rearrangement based deep learning system for predicting heart failure mortality. Computer Methods and Programs in Biomedicine. 2020;191:105383. doi: 10.1016/j.cmpb.2020.105383
- 21. Johnson AEW, Pollard TJ, Shen L, Lehman L-wH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Scientific Data. 2016;3:160035. doi: 10.1038/sdata.2016.35
- 22. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):e215–20. doi: 10.1161/01.cir.101.23.e215
- 23. Lin W-C, Tsai C-F. Missing value imputation: a review and analysis of the literature (2006–2017). Artificial Intelligence Review. 2020;53(2):1487–509. doi: 10.1007/s10462-019-09709-4
- 24. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review. 1958;65(6):386–408. doi: 10.1037/h0042519
- 25. Xiao X, Zhang D, Hu G, Jiang Y, Xia S. CNN-MHSA: A Convolutional Neural Network and multi-head self-attention combined approach for detecting phishing websites. Neural Networks. 2020;125:303–12. doi: 10.1016/j.neunet.2020.02.013
- 26. Wang D, Zhang Z, Jiang Y, Mao Z, Wang D, Lin H, et al. DM3Loc: multi-label mRNA subcellular localization prediction and analysis based on multi-head self-attention mechanism. Nucleic Acids Research. 2021. doi: 10.1093/nar/gkab016
- 27. Zhang Y, Gong Y, Zhu H, Bai X, Tang W. Multi-head enhanced self-attention network for novelty detection. Pattern Recognition. 2020;107:107486. doi: 10.1016/j.patcog.2020.107486
- 28. Lin T-Y, Goyal P, Girshick R, He K, Dollar P. Focal Loss for Dense Object Detection. In: 2017 IEEE International Conference on Computer Vision (ICCV). 2017. p. 2999–3007.
- 29. Tan Q, Li W, Chen X. Identification the source of fecal contamination for geographically unassociated samples with a statistical classification model based on support vector machine. Journal of Hazardous Materials. 2021;407:124821. doi: 10.1016/j.jhazmat.2020.124821
- 30. Tang BZ, Wang XL, Yan J, Chen QC. Entity recognition in Chinese clinical text using attention-based CNN-LSTM-CRF. BMC Med Inform Decis Mak. 2019;19:9. doi: 10.1186/s12911-019-0787-y
- 31. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, et al., editors. Advances in Neural Information Processing Systems 30. 2017.
- 32. Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, et al., editors. Advances in Neural Information Processing Systems 30. 2017.
- 33. Shrikumar A, Greenside P, Kundaje A. Learning Important Features Through Propagating Activation Differences. In: 34th International Conference on Machine Learning; 2017 Aug 6–11; Sydney, Australia.