Journal of the American Medical Informatics Association (JAMIA). 2021 Jan 26;28(4):713–726. doi: 10.1093/jamia/ocaa306

Importance-aware personalized learning for early risk prediction using static and dynamic health data

Qingxiong Tan 1,#, Mang Ye 2,#, Andy Jinhua Ma 3, Terry Cheuk-Fung Yip 4, Grace Lai-Hung Wong 4, Pong C Yuen 1
PMCID: PMC7973445  PMID: 33496786

Abstract

Objective

Accurate risk prediction is important for evaluating early medical treatment effects and improving health care quality. Existing methods are usually designed for dynamic medical data, which require long-term observations. Meanwhile, important personalized static information is ignored due to the underlying uncertainty and unquantifiable ambiguity. It is urgent to develop an early risk prediction method that can adaptively integrate both static and dynamic health data.

Materials and Methods

Data were from 6367 patients with peptic ulcer bleeding between 2007 and 2016. This article develops a novel End-to-end Importance-Aware Personalized Deep Learning Approach (eiPDLA) to achieve accurate early clinical risk prediction. Specifically, eiPDLA introduces a long short-term memory with temporal attention to learn sequential dependencies from time-stamped records and simultaneously incorporates a residual network with correlation attention to capture their influencing relationship with static medical data. Furthermore, a new multi-residual multi-scale network with the importance-aware mechanism is designed to adaptively fuse the learned multisource features, automatically assigning larger weights to important features while weakening the influence of less important features.

Results

Extensive experimental results on a real-world dataset illustrate that our method significantly outperforms state-of-the-art methods for early risk prediction under various settings (eg, achieving an AUC score of 0.944 when predicting risk 1 year ahead). Case studies indicate that the achieved prediction results are highly interpretable.

Conclusion

These results reflect the importance of combining static and dynamic health data, mining their influencing relationship, and incorporating the importance-aware mechanism to automatically identify important features. Accurate early risk prediction gives doctors more time to design effective treatments and improve clinical outcomes.

Keywords: early prediction of clinical risk, personalized medicine, importance-aware mechanism, learnable feature fusion, deep learning

INTRODUCTION

Peptic ulcer bleeding (PUB) is a common gastrointestinal condition characterized by hemorrhage caused by a mucosal break reaching the submucosa.1,2 In the general population, the incidence of PUB ranges between 19.4 and 57.0 per 100 000 individuals.3 PUB patients present with a wide range of clinical severities, from minor bleeding to fatality, causing increased health care expenditure and decreased quality of life.1,4 Accurate risk predictions could help identify potential high-risk patients, optimize their management plan, and arrange more aggressive endoscopic and medical treatment, effectively improving final clinical outcomes.

Widely adopted electronic health records (EHRs) produce a large volume of medical sequential data (eg, laboratory parameters) and static background data (eg, birth date, gender), providing unprecedented opportunities for developing advanced machine learning models to integrate information from multiple sources to improve clinical outcomes.5,6 Static data provide basic health information on the personal immune abilities of patients,7 which indicate their different abilities to recover from illness, while dynamic medical data reflect the real-time changes in the health status of patients over time.8–12

Most risk prediction methods analyze a single type of medical data (eg, static data13–15 or dynamic data16–25). The modeling of static medical data has been applied to several tasks (eg, the diagnosis of cancer13,14 and liver inflammation prediction15). However, static medical data only provide the basic background health status of patients. As a result, the prediction accuracy of these methods is often limited. To improve risk prediction results, recurrent neural network (RNN)-based models have been proposed to learn temporal dependencies from timestamped medical records. Choi et al16 introduced Doctor AI to cover dynamic medical records and perform multilabel predictions. To dynamically estimate the health status of patients, Aczon et al18 proposed an RNN model to learn the course that patients encounter. Che et al20 proposed a gated recurrent network with a decay mechanism (GRU-D) to demonstrate the effectiveness of using missing-data information for clinical prediction. Shukla and Marlin23 introduced InterpNet, consisting of an interpolation network and a prediction network, to handle sparsity and irregularity in dynamic medical data. Baytas et al24 designed a time-aware long short-term memory (T-LSTM) network to deal with irregular intervals between successive visits in longitudinal medical records. To tackle the heterogeneity in the longitudinal records of Parkinson patients, Che et al25 introduced a new RNN architecture that directly learns the similarity between Parkinson patients by dynamically matching their temporal patterns. Attention mechanisms have been incorporated to jointly model continuous medical data and discrete events19 and to jointly deal with missing values and irregular intervals in multivariate medical records.22 Much effort has been devoted to improving the structures of predictive models, designing new optimization strategies, and integrating novel attention mechanisms to achieve better results. However, to achieve satisfactory prediction results, these methods usually require long-term observations to learn complete patterns of changes in a patient’s disease. As a result, they have practical limitations because the disease may already have become too severe to treat during the long observation period.

Several pioneering works have studied the combination of heterogeneous health data to improve clinical outcome prediction. Xu et al26 designed a hierarchical LSTM (HieLSTM) model to jointly analyze structured and unstructured EHR data. However, the target of HieLSTM is to divide patients into 3 distinct subphenotypes, so how to use HieLSTM to provide personalized risk predictions and health care services for each patient remains to be explored. Che et al27 proposed a knowledge-distillation approach, which feeds static medical data into a deep neural network (DNN) and dynamic medical data into a GRU network and then concatenates the learned features into a prediction layer. An interpretable mimic learning method is finally built to distill knowledge from the DNN via Gradient Boosting Trees. Based on RNNs, Esteban et al28 introduced an algorithm that combines different types of medical information to improve risk prediction results by feeding each vector in static data and dynamic data into a representation layer. The learned latent representations are then stacked together horizontally into a DNN layer to obtain clinical outcomes. These methods consider different types of medical data, but they treat static data and dynamic data as 2 isolated objects (ie, they simply concatenate features extracted from static data and dynamic data without considering the influencing relationship between them). As a result, their models ignore the relationship between static data and dynamic data, which is essential for a reliable machine learning model.

Our basic idea is motivated by the observation that static data may influence future trajectories of dynamic variables, as illustrated in Figure 1. Since patients usually have different personal immune abilities, determined by static health data,7 their dynamic parameters could finally develop toward totally different trends though they have similar initial status. As shown in Figure 1, medical records of 2 patients show a large degree of similarity in the observation window 1 (ie, the majority of laboratory parameters are in the low-risk interval while very few records fall into the middle-risk interval). Only analyzing dynamic medical records in this window can hardly result in the correct predictions of their different clinical outcomes because of the similarity of observed records. Extending the observation period to window 2 will probably result in correct prediction results because the patient in Figure 1a shows an increasing trend in the middle-risk interval, while the patient in Figure 1b recovers from illness steadily. However, the extension of the observation period is costly for patients and may even lead to missing the optimum treatment time. Therefore, it is urgent to develop a personalized model that is able to provide accurate early prediction of clinical risk by considering the influence of personal immune abilities on future changes in the health conditions of patients. Such a personalized medical model can provide accurate risk prediction results, even when only medical events in a short-term period are observed, and promote the design of personalized treatments in advance to effectively improve clinical outcomes.

Figure 1.

Illustration of dynamic changes in laboratory parameters of 2 patients with different immune abilities.

To tackle the aforementioned challenges, we propose a novel personalized deep learning approach (PDLA) to jointly handle static and dynamic medical data. To accurately extract the correlation between static data and dynamic data, which describes the influence of personal immune ability on dynamic health indicators, we design a residual network (ResNet) module to capture the influencing relationship between different variables. With the shortcut connection scheme,31 it shows stronger capacity than the methods in references 29 and 30 in handling the changing variables in different modalities. Furthermore, to capture the dynamic changes in the status of patients, an LSTM network32 is designed for dynamic medical records. This component captures temporal information while avoiding the vanishing gradient problem by properly controlling different gates in the memory cell. Finally, we design a deep multi-residual multi-scale network (M2N), which has strong nonlinear modeling capacity and fuses features from multiple sources at multiple scales to simultaneously integrate information learned from diverse layers of the ResNet and LSTM described above.

A preliminary conference version of this work33 was published at AMIA 2018. In this journal version, we propose an extended version of PDLA built upon an end-to-end trainable architecture with the importance-aware mechanism, termed eiPDLA. It has the following major improvements. First, we give a detailed analysis of the rationale for combining static and dynamic medical data to achieve early prediction of clinical risk in practical medical applications. Second, we incorporate correlation attention and temporal attention into the ResNet and LSTM modules to adaptively learn and adjust the contribution weights of static and dynamic data, improving risk prediction performance while providing clinical interpretations. Third, we design eiPDLA as an end-to-end trainable architecture, which enables different components to optimize their parameters simultaneously toward a global optimum. Fourth, we introduce a novel importance-aware mechanism to adaptively learn fusion weights for different deep features, assigning larger weights to key features and further improving clinical risk prediction performance. Finally, extensive experimental results with in-depth analysis under various settings are presented to illustrate the superiority of the proposed method. The main contributions of this article are listed below:

  • We introduce a novel personalized deep learning approach (PDLA) to jointly handle static and dynamic medical data, which fully employs useful information from different sources and captures their influencing relationship, providing accurate and personalized medical services.

  • We develop a Residual Network with Correlation Attention (CA-ResNet) to mine the correlations between static data and dynamic data, while temporal cues are simultaneously captured with an LSTM network with Temporal Attention (TA-LSTM), effectively identifying vital inputs while providing meaningful clinical interpretations.

  • We propose a novel eiPDLA network to introduce an importance-aware mechanism to adaptively learn fusion weights for different deep features and assign larger weights to important features. Furthermore, eiPDLA is designed in the end-to-end architecture, optimizing parameters of different components simultaneously.

  • We empirically show that the proposed method outperforms state-of-the-art methods under various settings on real-world medical datasets.

RELATED WORK

Existing approaches usually perform diagnostic predictions by analyzing only static medical data13–15 or only dynamic data.16–23 Zheng et al13 introduced a hybrid K-SVM method consisting of K-means and support vector machine (SVM) for the diagnosis of breast cancer. Gorunescu and Belciug14 designed a fitness-based weighted voting strategy to combine several machine learning algorithms for cancer detection. Nilashi et al15 developed a prediction system consisting of ensembles of neuro-fuzzy techniques for liver inflammation prediction. However, these methods are only able to estimate the basic background health status of patients, which is not suitable for practical medical applications since most diseases are dynamic processes.

Recently, RNN-based models have been proposed to learn temporal dependencies from timestamped medical records. Choi et al16 built a temporal model named Doctor AI that covers recorded medication uses and medical conditions and is used to predict diagnosis categories. They also used an RNN to detect heart failure by learning temporal relations among medical events.17 Aczon et al18 developed an RNN model to dynamically track the health conditions of patients by learning the courses that patients encounter. Xu et al19 proposed a Recurrent Attentive and Intensive Model, which contains an efficient attention mechanism, to forecast physiological decompensation by jointly modeling discrete clinical events and continuous monitoring data. Che et al20 designed a new RNN-based approach that directly incorporates time intervals and masking, which demonstrates the effectiveness of using missing-data information for clinical prediction tasks. To improve risk prediction results, Tan et al21 proposed an uncertainty-aware network to incorporate the uncertainty information caused by the generation of regular data from irregular medical records. They also built a Dual-Attention Time-Aware Gated Recurrent Unit (DATA-GRU) to jointly handle the missing values and varying visit intervals in irregular multivariate time series medical data.22 Shukla and Marlin23 proposed a new deep learning architecture, which contains both an interpolation network and a prediction network to handle sparse and irregular medical data. To achieve accurate prediction results, these models need relatively long observation windows to learn complete disease trends from patients’ dynamic medical data alone. However, by the end of a long observation window the health condition of a patient may have become too severe to treat, which limits the practical value of these models.

Several pioneering works have studied the combination of static and dynamic medical data to improve risk prediction results. Che et al27 built an interpretable deep learning model to predict clinical outcomes, in which static data and dynamic data are fed into a feedforward deep neural network (DNN) and a GRU network, respectively. The extracted features are concatenated into a prediction layer. Finally, an interpretable mimic learning method is formed by distilling knowledge from the DNN. Esteban et al28 fed each vector of static data and dynamic data into a representation layer and then stacked latent representations together horizontally into a DNN layer to predict clinical outcomes. However, these 2 methods simply concatenate features extracted from static and dynamic data without considering their influencing relationship. As a result, their models ignore this relationship, which is essential for deducing the development trend of a disease and building reliable models for early prediction of clinical risk.

MATERIALS AND METHODS

This article proposes a PDLA to achieve early prediction of clinical risk by jointly analyzing static and dynamic medical data, as shown in Figure 2. This framework contains 3 parts, namely correlation analysis, temporal analysis, and deep feature fusion. Firstly, correlation analysis is performed to learn the influencing relationship of static background health data on dynamic medical events. Secondly, a simple but efficient alignment module is designed to handle irregular medical sequential records and a multilayer TA-LSTM is employed to learn dynamic changes in the status of patients. Finally, a deep M2N is introduced to fuse different features.

Figure 2.

An overview of the proposed personalized deep learning approach (PDLA) for early prediction of clinical risk.

Correlation analysis

Due to the heterogeneity of static data and dynamic data, directly analyzing their relationship is challenging. To tackle this, we distill some new static variables to represent the general state of dynamic data. The first variable is the mean of each type of dynamic data because it provides overall information. The second variable is the pattern of missing data. Most methods regard missing data as a random phenomenon and simply use imputation techniques to fill in missing values.18,19,27 In contrast, we argue that missingness may reflect doctors’ judgments of the status of patients. For example, doctors often arrange for patients in severe condition to undergo more examinations so that changes in their health can be caught promptly. Conversely, for patients in good health, doctors often arrange fewer examinations to reduce their pain and expenses. Thus, moving windows are used to detect whether patients have missing data at different stages.
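The two distilled variables described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `summarize_dynamic`, the use of NaN to mark unobserved time steps, and the choice of equal-length windows are all assumptions made for the example.

```python
import numpy as np

def summarize_dynamic(series, n_windows):
    """Return (mean, missing_flags) for one dynamic variable.

    series: 1-D array of regularly spaced values, NaN where nothing was recorded.
    n_windows: number of equal-length windows used to flag stages without data.
    """
    series = np.asarray(series, dtype=float)
    observed = ~np.isnan(series)
    # Mean over observed values only; NaN if the variable was never measured.
    mean = series[observed].mean() if observed.any() else np.nan
    # One flag per window: 1 if the window holds no observation at all.
    windows = np.array_split(observed, n_windows)
    missing_flags = np.array([0 if w.any() else 1 for w in windows])
    return mean, missing_flags

# Made-up values: the last third of the record has no measurements.
mean, flags = summarize_dynamic([1.0, np.nan, 3.0, np.nan, np.nan, np.nan], n_windows=3)
```

Here the mean summarizes the overall level of the variable, while the flags record at which stages examinations were absent.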

As illustrated in Figure 2, there is a certain degree of correlation between dynamic medical records and static data because of the influence of static data on the future development trends of dynamic variables. Specifically, static data contain background health information of a patient, which determines the personal immune ability of the patient. For patients with different personal immune abilities, their disease trajectories often develop toward different directions. As demonstrated in Figure 1, for patients with similar early health conditions, their final clinical risks could be totally different due to the differences in their immune abilities and resistances. Therefore, mining this relationship is very important for issuing early warnings of unfavorable outcomes for doctors to design effective treatments, especially when only dynamic medical records of a short period are observed.

To capture such relationships while identifying key variables, we introduce a CA-ResNet, which contains a large number of convolution units to simultaneously analyze different variables. To detect important features from different inputs, the correlation attention weight matrix αca is first learned from the input X = [x1, x2, ···, xk]T via the following expression:

$$\alpha^{ca} = \delta(W^{ca} X + b^{ca})$$

where Wca ∈ Rk×k is the trainable weight matrix; bca ∈ Rk is the trainable bias vector; and δ(•) is the sigmoid function, which limits the attention weights to the range [0, 1].

The learned attention weight matrix αca is then employed to adjust the contribution weights of elements in X using Xca = αcaX. After this, Xca is reshaped into 2D and fed into a ResNet containing many basic blocks, each consisting of a residual module Fi and a shortcut connection.31 The mathematical expression used to produce the output of the ith basic block is:

$$Y_i^{ca} = F_i(X^{ca}) + X^{ca}$$

where Fi(•) is the connection of batch normalization units,34 convolution units, and rectified linear units.31
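The correlation attention and the shortcut connection can be sketched in NumPy as below. This is only an illustration under stated assumptions: the product αcaX is taken to be element-wise, the input is kept as a 1-D vector rather than reshaped to 2D, and the residual module is replaced by a hypothetical toy function instead of the batch normalization, convolution, and ReLU stack used in the article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def correlation_attention(x, w_ca, b_ca):
    """alpha = sigmoid(W x + b); returns (alpha, alpha * x), scaling element-wise."""
    alpha = sigmoid(w_ca @ x + b_ca)
    return alpha, alpha * x

def residual_block(x, residual_fn):
    """Y = F(X) + X: the shortcut adds the block input back to the residual output."""
    return residual_fn(x) + x

rng = np.random.default_rng(0)
k = 5                                  # number of input variables (arbitrary here)
x = rng.normal(size=k)
w_ca = rng.normal(size=(k, k))         # trainable in practice, random in this sketch
b_ca = np.zeros(k)

alpha, x_ca = correlation_attention(x, w_ca, b_ca)
# Hypothetical stand-in for F_i, chosen only so the example runs.
y = residual_block(x_ca, lambda v: np.maximum(0.0, 0.1 * v))
```

Because the shortcut passes `x_ca` through unchanged, the block only has to learn the residual correction, which is what allows such blocks to be stacked deeply.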

The residual learning mechanism enables the neural network to become deeper than traditional networks without suffering from the vanishing gradient problem and hence to obtain stronger capacity for extracting correlation information between different variables. CA-ResNet contains 4 basic blocks with different numbers of filters, and each block contains a stack of two 3 × 3 convolution units and a shortcut connection. The convolution units perform correlation analysis to capture the relationship between static and dynamic medical data by jointly analyzing different variables. Meanwhile, the shortcut connections allow multiple blocks to be stacked, increasing the depth and thus the modeling capacity of the network. Furthermore, to enable the ResNet to obtain sufficient correlation features, several fully connected layers are connected to the average pooling layer.

Temporal analysis

Influenced by a substantial number of factors (eg, judgments of doctors on the status of patients), dynamic medical data are sampled irregularly with varying intervals. Figure 1 provides 2 examples of longitudinal records which clearly illustrate that intervals between consecutive records vary significantly, ranging from days to years. To obtain regular time-series data, which can serve as inputs of arbitrary machine learning models, we employ a fixed-length moving window to resample irregular dynamic data into equally-spaced data.
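A minimal sketch of the fixed-length moving-window resampling is given below. The function name, the use of the window mean to merge multiple records, and the use of NaN for empty windows are assumptions for illustration; the article does not specify these details at this point.

```python
import numpy as np

def resample_to_windows(times, values, start, window, n_windows):
    """Average irregular observations inside consecutive fixed-length windows.

    Windows that contain no observation are returned as NaN.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    out = np.full(n_windows, np.nan)
    # Index of the window that each observation falls into.
    idx = np.floor((times - start) / window).astype(int)
    for w in range(n_windows):
        in_window = idx == w
        if in_window.any():
            out[w] = values[in_window].mean()
    return out

# Days since the first record and a laboratory value (made-up numbers).
days = [1, 3, 40, 95]
lab = [10.0, 12.0, 9.0, 7.0]
regular = resample_to_windows(days, lab, start=0, window=30, n_windows=4)
```

In this example the two records in the first 30 days are merged into one value, and the third window stays empty, which is the missing data situation discussed next.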

1) Template Guided Alignment Module: Along with the dynamic changes in the health status of patients, the dynamic medical records are examined under different densities at different stages of the disease. The resample process can deal with the problem of multiple records occurring in a window. However, at certain stages, there is no record in the resample window, causing the missing data problem. Temporal discrepancies impact performance since machine learning models are usually built for aligned data. Although the dynamic time warping (DTW) algorithm can align 2 sequences to a certain length L,35,36 for different pairs of sequences, alignment lengths L differ from each other, which is not suitable for real-world medical data sets with thousands of time series. Existing DTW-based methods are only able to deal with a small number of time series.37–40 To tackle this issue, we design a simple but efficient Template Guided Alignment Module based on the following ideas:

  1. For each type of dynamic data, there is a template which represents the essential changing tendency of this feature for patients with a particular disease;

  2. We design a novel Template Guided DTW (TG-DTW), which overcomes the limitations of existing DTW methods that cannot align a large number of time series.

To obtain a suitable template for each type of dynamic data, we filter out samples influenced by the missing data problem and calculate the mean of the remaining samples in the moving window periods mentioned above as the template. Since the obtained template is calculated from different patients with complete records, it can represent the essential trends of the feature for patients with this disease. To align a large number of sequences, we design a novel TG-DTW that aligns each sequence to the length of its template, as defined below.

The template T and the time series O to be aligned are represented as follows:

$$T = t_1, t_2, \cdots, t_i, \cdots, t_n$$

$$O = o_1, o_2, \cdots, o_j, \cdots, o_m$$

where n is the length of the template T; m is the length of the “to be aligned” time series O; n > m > 0.

To align time series O to the same length with template T, we build an n-by-m matrix, where the distance between ti and oj is given in the (ith, jth) element. This distance is measured by the Euclidean distance, defined as follows:

$$d(t_i, o_j) = (t_i - o_j)^2$$

Every element in this matrix indicates the alignment of 1 element of T with 1 element of O. A contiguous set of matrix elements makes up the warping path W, which defines the aligning relationship between O and T. The ith element in W is as follows:

$$w_i = (i, j)_i$$

Accordingly, the warping path W is represented as the following expression:

$$W = w_1, \cdots, w_i, \cdots, w_n$$

The warping path W is subjected to the following conditions:

Boundary: $w_1 = (1, 1)$ and $w_n = (n, m)$. This condition requires that the start cell and end cell of the warping path W are diagonally opposite corners of the n-by-m matrix.

Monotonicity: Given $w_i = (i, j)_i$, then $w_{i-1} = (i', j')_{i-1}$, where $i - i' \geq 0$ and $j - j' \geq 0$. This condition requires that the elements of the warping path W are monotonically ordered.

Continuity: Given $w_i = (i, j)_i$, then $w_{i-1} = (i', j')_{i-1}$, where $i - i' = 1$ and $j - j' \leq 1$. This condition restricts the allowed steps in the warping path W, namely the step size $w_i - w_{i-1} \in \{[1, 0], [1, 1]\}$.

The optimal path W can be efficiently found by using dynamic programming to minimize the dissimilarity distance of the aligned series under the above 3 conditions. We then calculate the transformed (warped) sequence of O, termed Ow. In W, step size [1, 1] represents a 1-step advance along both T and O, while [1, 0] indicates that the advance is made only along T. To obtain the warped sequence Ow, the last observation value is repeated at timestamps where the step size is [1, 0] but not at others.
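The template construction and the constrained alignment can be sketched with a small dynamic program. This is an illustrative reading of the description above, not the authors' code: the names `build_template` and `tg_dtw_align` are hypothetical, the local distance is the squared difference, and ties between the two allowed steps are broken in favor of the [1, 1] step.

```python
import numpy as np

def build_template(samples):
    """Mean of the samples (rows) that contain no missing (NaN) window."""
    samples = np.asarray(samples, dtype=float)
    complete = samples[~np.isnan(samples).any(axis=1)]
    return complete.mean(axis=0)

def tg_dtw_align(template, series):
    """Warp `series` (length m) to the length n of `template`, with m <= n.

    Only the steps [1, 0] (repeat the last observation) and [1, 1] are allowed.
    """
    t = np.asarray(template, dtype=float)
    o = np.asarray(series, dtype=float)
    n, m = len(t), len(o)
    if not 0 < m <= n:
        raise ValueError("need 0 < len(series) <= len(template)")
    dist = (t[:, None] - o[None, :]) ** 2           # n-by-m local distances
    cost = np.full((n, m), np.inf)
    cost[0, 0] = dist[0, 0]                          # boundary: path starts at (1, 1)
    for i in range(1, n):
        for j in range(m):
            stay = cost[i - 1, j]                                # step [1, 0]
            move = cost[i - 1, j - 1] if j > 0 else np.inf       # step [1, 1]
            cost[i, j] = dist[i, j] + min(stay, move)
    # Backtrack from (n, m) to recover which o_j is matched to each t_i.
    j = m - 1
    match = np.empty(n, dtype=int)
    for i in range(n - 1, 0, -1):
        match[i] = j
        if j > 0 and cost[i - 1, j - 1] <= cost[i - 1, j]:
            j -= 1
    match[0] = j
    return o[match]

# Made-up data: two complete samples define the template, the third is dropped.
template = build_template([[0, 1, 3, 4], [2, 1, 3, 4], [np.nan, 5, 5, 5]])
warped = tg_dtw_align(template, [1.0, 3.0, 4.0])
```

Because every sequence is aligned to the same template, all warped sequences end up with the same length n, which is what makes the approach usable on a large number of time series.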

2) Multi-layer TA-LSTM Network: Since the severity of illness is usually a dynamic process, mining temporal dependencies from dynamic medical data is especially important for evaluating the changes in the health conditions of patients along with time and detecting high-risk patients. However, traditional methods often perform statistical analysis or directly summarize dynamic data to get regular features, which ignores the temporal relationships among different events.15,41

To improve risk prediction performance, a multilayer LSTM with Temporal Attention (TA-LSTM) is built to perform temporal analysis on the aligned medical time-series data. Since different dynamic variables may contribute different amounts of information to the evaluation of the health status of PUB patients, a novel temporal attention mechanism is first designed to learn attention weights from the dynamic records Xt via the following expression:

$$\alpha^{ta} = \delta(W^{ta} X^{t} + b^{ta})$$

where Xt ∈ RT×J is the input multivariate time series; T is the length of the time series; J is the number of dynamic variable types; Wta ∈ RT×T is the trainable weight matrix; bta ∈ RJ is the trainable bias vector; and δ(•) is the sigmoid function, which limits the attention weights to the range [0, 1].

The learned temporal attention weight matrix αta is then applied to adjust the contribution scores of different dynamic data via Xta = αtaXt. The adjusted input Xta is then fed into the LSTM, which has memory cells that encode what has been learned and decide how to use the learned information in further calculation. Furthermore, as a modified version of RNNs,17,18,42,43 the LSTM operates its memory units by adjusting the opening and closing of the input gate i, forget gate f, and output gate o, thus overcoming the vanishing gradient problem.32 The update function of the LSTM is defined as follows:

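$$
\begin{aligned}
i_t &= \delta(W_{ix} x_t^{ta} + W_{im} m_{t-1} + b_i) \\
f_t &= \delta(W_{fx} x_t^{ta} + W_{fm} m_{t-1} + b_f) \\
o_t &= \delta(W_{ox} x_t^{ta} + W_{om} m_{t-1} + b_o) \\
g_t &= \varphi(W_{cx} x_t^{ta} + W_{cm} m_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot c_t
\end{aligned}
$$

where $\delta(x) = (1 + e^{-x})^{-1}$ is a sigmoid nonlinearity that maps inputs into the range [0, 1] and $\varphi(x)$ is a hyperbolic tangent nonlinearity; W and b are trainable matrices and vectors; and $\odot$ represents element-wise multiplication.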
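The temporal attention and one LSTM update can be sketched in NumPy as follows. Several points are assumptions made for the example: the attention product is taken to be element-wise, the previous output of the cell is used as the recurrent input (the state written as m at time t − 1 in the gate equations), the output follows the update as written above with no extra nonlinearity on the cell state, and all weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_attention(x_t, w_ta, b_ta):
    """alpha = sigmoid(W X + b) for X of shape (T, J); returns alpha * X."""
    alpha = sigmoid(w_ta @ x_t + b_ta)     # b_ta of shape (J,) broadcasts over rows
    return alpha * x_t

def lstm_step(x, m_prev, c_prev, p):
    """One LSTM update; `p` maps parameter names to arrays."""
    i = sigmoid(p["Wix"] @ x + p["Wim"] @ m_prev + p["bi"])   # input gate
    f = sigmoid(p["Wfx"] @ x + p["Wfm"] @ m_prev + p["bf"])   # forget gate
    o = sigmoid(p["Wox"] @ x + p["Wom"] @ m_prev + p["bo"])   # output gate
    g = np.tanh(p["Wcx"] @ x + p["Wcm"] @ m_prev + p["bc"])   # candidate values
    c = f * c_prev + i * g
    h = o * c
    return h, c

rng = np.random.default_rng(0)
T, J, H = 6, 3, 4                           # time steps, variables, hidden units
x_seq = rng.normal(size=(T, J))
x_ta = temporal_attention(x_seq, rng.normal(size=(T, T)), np.zeros(J))

params = {}
for gate in "ifoc":
    params[f"W{gate}x"] = rng.normal(size=(H, J)) * 0.1
    params[f"W{gate}m"] = rng.normal(size=(H, H)) * 0.1
    params[f"b{gate}"] = np.zeros(H)

h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(x_ta[t], h, c, params)
```

After the loop, `h` summarizes the whole attended sequence and would be passed on to the fully connected layers in a full model.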

LSTM is able to handle long time series while overcoming the vanishing gradient problem. Thus, we employ it to perform temporal analysis for dynamic medical data to learn the changes in the health conditions of patients. Similarly, to obtain enough temporal features, several fully connected layers are connected to TA-LSTM.

Deep feature fusion

With more layers, deep networks can go deeper and extract features that are more relevant to the outputs. However, it is inevitable that some important information may be lost during signal processing in the early layers of the networks. To tackle this, we extract features from multiple layers of the correlation analysis network and the temporal analysis network.

Features extracted from the correlation perspective and the temporal perspective reflect different types of information on the conditions of patients, which may possess diverse properties. To fuse these features effectively, we design a deep M2N. Different from ordinary residual networks, M2N introduces multiple sizes of convolution units (ie, 1 × 1, 3 × 3, and 5 × 5, represented as Fs1, Fs3, and Fs5, respectively). This strategy enables M2N to analyze features from different aspects simultaneously and fuse them effectively. Furthermore, a 1 × 1 convolution unit, rather than the concatenation strategy, is employed to mix information from channels of different scales. In addition, M2N is built upon the multiresidual learning architecture, which enables it to use more residual information to stabilize the training of the network and improve final prediction accuracy. The mathematical expression for the multi-scale residual block (MSRB) is as follows:

$$Y_{ms} = F_{ms}(X_{ms}) + X_{ms} = F_{c1\times1}\big(F_{s1}(X_{ms}) + F_{s3}(X_{ms}) + F_{s5}(X_{ms})\big) + X_{ms}$$

where Xms and Yms are the input and the output of MSRB, respectively; and Fms(•) = Fc1 × 1(Fs1(•) + Fs3(•) + Fs5(•)) represents the overall mathematical expression of the multiscale convolution channel.
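A simplified sketch of the multi-scale residual block is shown below. It is a deliberately reduced stand-in: the article uses 2D convolution units over multiple channels, whereas this example uses single-channel 1-D convolutions with kernel lengths 1, 3, and 5, and fixed averaging kernels in place of trained filters. The function names are hypothetical.

```python
import numpy as np

def conv1d_same(x, kernel):
    """1-D convolution with zero padding so the output keeps the input length."""
    return np.convolve(x, kernel, mode="same")

def msrb(x, k1, k3, k5, k_mix):
    """Y = F_c1x1(F_s1(X) + F_s3(X) + F_s5(X)) + X for a single-channel 1-D input."""
    multi_scale = conv1d_same(x, k1) + conv1d_same(x, k3) + conv1d_same(x, k5)
    return conv1d_same(multi_scale, k_mix) + x

x = np.arange(8.0)
y = msrb(
    x,
    k1=np.array([0.5]),        # scale 1
    k3=np.ones(3) / 3.0,       # scale 3
    k5=np.ones(5) / 5.0,       # scale 5
    k_mix=np.array([1.0]),     # length-1 kernel that mixes the summed scales
)
```

Summing the three branches before the length-1 mixing kernel mirrors the expression above, where the scales are added rather than concatenated.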

The M2N fusion network is defined as follows (for convenience of description, the ReLU activation and average pooling operations are omitted):

$$
\begin{aligned}
Y_{ms1} &= F_{ms}(X_{M2N}) + X_{M2N} \\
Y_{ms2} &= F_{ms}(Y_{ms1}) + Y_{ms1} \\
Y_{M2N} &= Y_{ms2} + X_{M2N}
\end{aligned}
$$

where XM2N and YM2N are the input and the output of M2N, respectively.

Therefore, for input XM2N, we can obtain the output of M2N YM2N as follows:

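$$
\begin{aligned}
Y_{M2N} &= Y_{ms2} + X_{M2N} \\
&= F_{ms}(Y_{ms1}) + Y_{ms1} + X_{M2N} \\
&= F_{ms}\big(F_{ms}(X_{M2N}) + X_{M2N}\big) + \big(F_{ms}(X_{M2N}) + X_{M2N}\big) + X_{M2N}
\end{aligned}
$$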

where Fms(•) = Fc1 × 1(Fs1(•) + Fs3(•) + Fs5(•)); and the parameters in Fc1 × 1(•), Fs1(•), Fs3(•), and Fs5(•) are optimized during the training process.
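The composition above can be checked numerically with any stand-in for the multiscale channel. In this sketch the function `toy_f_ms` is a hypothetical linear map chosen only so the arithmetic is easy to follow, and the same function is used in both blocks to mirror the notation of the equations; a trained network would normally hold separate parameters per block.

```python
import numpy as np

def m2n(x, f_ms):
    """Two stacked multi-scale residual blocks plus an outer shortcut."""
    y_ms1 = f_ms(x) + x
    y_ms2 = f_ms(y_ms1) + y_ms1
    return y_ms2 + x

def toy_f_ms(v):
    """Hypothetical stand-in for F_ms: halves its input."""
    return 0.5 * v

x = np.array([1.0, 2.0, 3.0])
y = m2n(x, toy_f_ms)
# Fully expanded form of the same expression.
expanded = toy_f_ms(toy_f_ms(x) + x) + (toy_f_ms(x) + x) + x
```

With this toy function the first block gives 1.5x, the second gives 2.25x, and the outer shortcut brings the output to 3.25x, matching the expanded form.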

End-to-end Importance-Aware Personalized Deep Learning Approach (eiPDLA)

In this subsection, we introduce the proposed eiPDLA, which jointly optimizes the above-mentioned components. Meanwhile, a novel importance-aware mechanism is designed to dynamically learn fusion weights for deep features extracted from multiple sources of medical data, adaptively assigning larger weights to important features and improving risk prediction results, as illustrated in Figure 3.

Figure 3.

The architecture of end-to-end importance-aware personalized deep learning approach (eiPDLA) and the illustration of the importance-aware mechanism.

1) End-to-end Trainable Mechanism: One limitation of PDLA is that parameters of different components are optimized separately; that is, different feature extraction networks are first trained separately, and the extracted features are then inputted into the M2N fusion network to produce the final risk prediction results. Since these components are trained separately, the final risk prediction errors cannot be back propagated to direct the training of feature extraction networks, which may influence the prediction performance.

To solve this problem, eiPDLA is designed in the more advanced end-to-end trainable architecture, which ensures parameters in all the parts of eiPDLA are jointly optimized for 1 loss function, as illustrated in Figure 3a. During the training process, the prediction errors are calculated via the loss function to compare the difference between the prediction results and actual labels. In this article, the cross-entropy is adopted as the loss function to estimate the prediction results. The definition of cross-entropy is as follows:

$$L = -\frac{1}{N}\sum_{n=1}^{N}\left(y_n \log y_n^{p} + (1 - y_n)\log(1 - y_n^{p})\right)$$

where $y_n$ is the true label; $y_n^{p}$ is the predicted mortality risk; and $N$ is the number of samples in each mini-batch.
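As a concrete reading of this loss, the following Python sketch evaluates the mini-batch cross-entropy for a list of labels and predicted risks. The function name `binary_cross_entropy` and the clipping constant `eps` are assumptions added for numerical safety; they are not part of the paper's definition.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over one mini-batch.

    y_true: labels in {0, 1}; y_pred: predicted risks in [0, 1].
    eps clips predictions away from 0 and 1 so log() stays finite
    (an added safeguard, not part of the formula in the text).
    """
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n


# Confident, correct predictions give a smaller loss than uninformative ones.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```

An uninformative prediction of 0.5 for every sample gives a loss of log 2, which is a convenient reference point when monitoring training.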

The prediction error information is back propagated to M2N to guide the optimization of parameters and improve fusion effects. The prediction error is further back propagated to the importance-aware mechanism to help it adaptively identify important features and readjust their contribution scores. Finally, the error information is back propagated to the correlation analysis and temporal analysis networks, which helps them better extract the influencing relationship and learn temporal dependencies. In the iteration process, the prediction error decreases gradually until convergence.

2) Importance-Aware Learnable Feature Fusion: Since the features are learned from multiple sources of medical data and correspond to raw medical data collected from different examination facilities and at different timestamps, they may possess different degrees of correlation with the outputs and contribute differently to the risk prediction task. For example, temporal medical features approaching the prediction timestamp often contribute more to the estimation of patients’ health conditions than the earlier temporal features. As illustrated in Figure 3b, the extracted correlation features, represented as $F_c = [F_{c1}, F_{c2}, \cdots, F_{cm}]$, may have different degrees of correlation with the outputs, reflected in the transparency of the graphics. The extracted temporal features are represented as $F_t = [F_{t1}, F_{t2}, \cdots, F_{tT}]$, which may also contribute differently to the clinical risk prediction task.

To deal with this problem, a novel importance-aware mechanism is introduced to dynamically learn fusion weights for different features. The importance-aware weight matrix $W_c$ is learned from the correlation features $F_c$ by using the following mathematical expression:

$$W_c = \delta(W_c^{ia} F_c + b_c^{ia})$$

where $W_c^{ia} \in \mathbb{R}^{m \times m}$ is a trainable weight matrix; $b_c^{ia} \in \mathbb{R}^{m}$ is a trainable bias vector; $m$ is the length of the correlation features; and $\delta(\cdot)$ is the sigmoid function.

The learned importance-aware weight matrix $W_c$ is then employed to adjust the contribution scores of different correlation features $F_c$. The mathematical expression is defined as follows:

$$W_{ac} = W_c F_c$$

Furthermore, to stabilize the model training process and facilitate the information propagation, the importance-aware mechanism is built upon the residual learning mechanism. The importance-aware features with the residual learning structure are defined as follows:

$$W_{ac}^{rl} = W_{ac} + F_c$$

To deal with the difference in the importance of features learned from events at different timestamps, the residual learning-based importance-aware mechanism is also adopted for the temporal features $F_t = [F_{t1}, F_{t2}, \cdots, F_{tT}]$. The mathematical expression employed to learn the importance-aware weights for temporal features $F_t$ is as follows:

$$W_t = \delta(W_t^{ia} F_t + b_t^{ia})$$

where $W_t^{ia} \in \mathbb{R}^{T \times T}$ is a trainable weight matrix; $b_t^{ia} \in \mathbb{R}^{T}$ is a trainable bias vector; $T$ is the length of the temporal features; and $\delta(\cdot)$ is the sigmoid function.

The temporal features $F_t$ are then adjusted by the learned weights $W_t$ by using the following expression:

$$W_{at} = W_t F_t$$

Similarly, the residual learning mechanism is adopted to stabilize the training of the network and facilitate the propagation of information. The mathematical expression is defined as follows:

$$W_{at}^{rl} = W_{at} + F_t$$

The correlation features and the temporal features adjusted by the importance-aware mechanism built upon the residual learning structure are then fed into the M2N fusion network to produce the predicted risk scores.
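The gating, re-weighting, and residual steps described above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the product of the learned weights and the features is assumed to be element-wise (the dimensions given in the text suggest this, but the text does not say so explicitly), and the function name `importance_aware` and the example values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def importance_aware(features, weight, bias):
    """Apply the residual importance-aware mechanism to one feature vector.

    features: list of length m (or T for temporal features)
    weight:   m x m list of lists standing in for the trainable W^ia
    bias:     list of length m standing in for the trainable b^ia
    Returns (re-weighted features with residual, gate values).
    """
    # Gate: sigmoid(W^ia F + b^ia), one value in (0, 1) per feature.
    gate = [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weight, bias)
    ]
    # Assumption: the gate scales each feature element-wise.
    adjusted = [g * f for g, f in zip(gate, features)]
    # Residual connection: keep the original features alongside the gated ones.
    return [a + f for a, f in zip(adjusted, features)], gate


features = [2.0, -4.0]
weight = [[0.0, 0.0], [0.0, 0.0]]  # untrained placeholder values
bias = [0.0, 0.0]
output, gate = importance_aware(features, weight, bias)
print(gate, output)
```

Because the gate lies between 0 and 1 and the residual term is always added, each output feature stays between one and two times its input value, so no feature is ever fully suppressed. The same function applies unchanged to the temporal features by passing a vector of length T.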

Data description

Experiments are conducted on real-world medical data collected from 6367 patients who were diagnosed with PUB and under treatment at the Endoscopy Center, Prince of Wales Hospital, Hong Kong, from 2007 to 2016. The inputs are 35 types of static data (eg, birth date, gender, and total doses of concomitant drugs) and 7 types of irregularly recorded dynamic laboratory parameters, that is, Serial Creatinine (CR), Serial Hemoglobin (HB), Serial Hematocrit (HCT), Serial Platelet (PLT), Serial Prothrombin Time (PT), Serial Urea (UREA), and Serial White Cell Count (WCC).

Experimental settings and evaluation metrics

To achieve early prediction of clinical risk, we predict the risk level of patients long in advance. This helps doctors understand the potential clinical risks of patients and adjust treatments to improve clinical outcomes in a timely manner. We perform experiments to examine model performance under 2 settings. Firstly, the prediction window is fixed (6 months or 12 months) while the observation window varies, increasing from 6 years to 9 years. The research goal is to predict the possibility of death for a patient within 10 years after admission ahead of a certain period by using the medical records in the observation window.

The second setting is to evaluate the prediction capacity of the proposed method under different prediction windows. The observation window is fixed as 9 years while the prediction window increases from 3 months to 12 months gradually with a time interval of 3 months. The research goal is to predict the possibility of death for a patient at a certain period after the end of the observation window by analyzing the medical data in the 9-year observation period.

For each setting, 70% of the patients are randomly chosen as the training set and the remaining 30% of the patients are used to test the trained models. The area under the receiver operating characteristic curve (AUC) is employed to evaluate the mortality risk prediction results.
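The evaluation protocol can be illustrated with a short, dependency-free Python sketch: a random 70/30 patient split and an AUC computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The function names, the fixed random seed, and the toy labels and scores are assumptions for illustration; the paper does not describe its splitting code or AUC implementation.

```python
import random


def train_test_split(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient identifiers into training and test lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seed is an assumption, for repeatability
    cut = int(round(train_fraction * len(ids)))
    return ids[:cut], ids[cut:]


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5).

    This O(P * N) form is chosen for clarity, not speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


train_ids, test_ids = train_test_split(range(6367))
print(len(train_ids), len(test_ids))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Splitting by patient rather than by record keeps all of one patient's data on the same side of the split, which matches the patient-level split described in the text.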

Comparing methods

To the best of our knowledge, this paper is the first attempt to jointly analyze static and dynamic medical data by considering the influence of personal immune abilities (determined by background static data of patients) on the future trajectories of dynamic medical data. Existing methods are incapable of learning this influencing relationship. Therefore, the proposed method is compared with the following methods:

  • Random Forests (RF):44 As a baseline, RF is utilized to analyze the dynamic data (flattened to 2D) to predict the mortality risk of patients.

  • Long Short-Term Memory (LSTM):32 To handle the vanishing gradient problem, LSTM controls different gates to encode what has been learned and decide how to handle the learned information for further calculation. LSTM has been successfully applied for many sequential tasks. Therefore, LSTM is used to analyze dynamic data.

  • Residual Network (ResNet):31 ResNet introduces shortcut connections to handle the vanishing gradient problem. Equipped with convolution units, ResNet can jointly analyze different variables and has been successfully applied for many tasks. To explore the influence of mean values of dynamic data and missing data labels on risk prediction, ResNet is employed to analyze 3 kinds of static data: Static-I (includes original static data only), Static-II (includes Static-I and mean values), and Static-III (includes Static-II and missing data labels).

  • Method in27: As a state-of-the-art method, the interpretable deep learning model is employed to conduct the mortality risk prediction by using both static data and dynamic data.

  • Method in28: As a state-of-the-art method for combining static and dynamic information to predict clinical events, this approach is also applied to analyze static data and dynamic data to predict the clinical mortality risk of patients.

  • eiPDLA variants: Besides the above methods, we consider the following variants of eiPDLA to verify the effectiveness of each component: (a) PDLA is an eiPDLA without the end-to-end trainable architecture, the importance-aware mechanism, and interpretable attention; (b) ePDLA is PDLA with the end-to-end trainable architecture; (c) ePDLA+im is ePDLA with the importance-aware mechanism; (d) ePDLA+ia is ePDLA with the interpretable attention mechanisms (ie, correlation attention and temporal attention).
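The three static input sets used for the ResNet baseline above (Static-I, Static-II, and Static-III) can be sketched in Python as follows. This is a minimal illustration, not the authors' preprocessing: the function name `build_static_inputs`, the use of one binary missing-data flag per dynamic variable, and the 0.0 placeholder for the mean of an unrecorded variable are all assumptions made for the example.

```python
def build_static_inputs(static_values, dynamic_series):
    """Return (Static-I, Static-II, Static-III) feature lists for one patient.

    static_values:  list of numeric static features (already encoded)
    dynamic_series: dict mapping a laboratory variable name to the list of
                    values recorded in the observation window (may be empty)
    """
    names = sorted(dynamic_series)  # fixed variable order across patients

    # Static-I: original static data only.
    static_1 = list(static_values)

    # Static-II: Static-I plus the mean value of each dynamic variable.
    # Assumption: a variable with no records gets a 0.0 placeholder.
    means = [
        sum(dynamic_series[n]) / len(dynamic_series[n]) if dynamic_series[n] else 0.0
        for n in names
    ]
    static_2 = static_1 + means

    # Static-III: Static-II plus missing-data labels.
    # Assumption: one flag per variable, 1.0 when nothing was recorded.
    missing = [0.0 if dynamic_series[n] else 1.0 for n in names]
    static_3 = static_2 + missing

    return static_1, static_2, static_3


# Hypothetical patient: two static features, CR recorded twice, HB never.
s1, s2, s3 = build_static_inputs([1.0, 0.0], {"CR": [80.0, 100.0], "HB": []})
print(s1)
print(s2)
print(s3)
```

Keeping the missing-data flags separate from the mean values lets a model distinguish a variable that was never measured from one whose measured mean happens to equal the placeholder.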

RESULTS

Results for different observation windows

In this section, we explore the influence of the observation window on mortality risk prediction results by fixing the prediction window at 6 months and 12 months, respectively, while gradually increasing the observation window from 6 to 9 years. The experimental results for the 6-month ahead and 12-month ahead mortality risk predictions are shown in Tables 1 and 2, respectively. These results show that the proposed method outperforms the compared methods, achieving the largest AUC scores under all lengths of observation window. In addition, longer observation windows usually result in better prediction performance. For example, when the prediction window is fixed at 6 months, the AUC scores of PDLA and Method in28 are 0.925 and 0.902 for the 6-year observation window, and these scores increase to 0.947 and 0.924 when the observation window is extended to 9 years. This is because longer observations can record more complete patterns of dynamic changes in the disease severity of patients, which improves risk prediction results.

Table 1.

AUC scores of different observation windows for 6-month ahead mortality risk predictions (mean and 95% confidence interval)

Model | Observation Window 6 Years | Observation Window 7 Years | Observation Window 8 Years | Observation Window 9 Years
Random forests | 0.838 (0.819–0.856) | 0.848 (0.829–0.865) | 0.851 (0.832–0.867) | 0.860 (0.844–0.875)
LSTM (dynamic) | 0.884 (0.868–0.898) | 0.892 (0.877–0.905) | 0.897 (0.882–0.910) | 0.903 (0.888–0.915)
ResNet (Static-I) | 0.880 (0.863–0.895) | 0.889 (0.873–0.902) | 0.892 (0.877–0.906) | 0.897 (0.881–0.910)
ResNet (Static-II) | 0.909 (0.894–0.922) | 0.912 (0.900–0.925) | 0.914 (0.901–0.927) | 0.916 (0.902–0.928)
ResNet (Static-III) | 0.910 (0.896–0.923) | 0.917 (0.904–0.929) | 0.922 (0.909–0.934) | 0.936 (0.925–0.946)
Method in27 | 0.902 (0.887–0.916) | 0.904 (0.889–0.917) | 0.909 (0.895–0.922) | 0.925 (0.913–0.937)
Method in28 | 0.902 (0.888–0.915) | 0.907 (0.894–0.920) | 0.910 (0.896–0.922) | 0.924 (0.911–0.935)
PDLA | 0.925 (0.913–0.937) | 0.927 (0.915–0.939) | 0.930 (0.918–0.941) | 0.947 (0.937–0.957)
eiPDLA | 0.943 (0.932–0.952) | 0.944 (0.932–0.953) | 0.949 (0.938–0.957) | 0.952 (0.942–0.960)

Abbreviations: eiPDLA, end-to-end importance-aware personalized deep learning approach; LSTM, long short-term memory; PDLA, personalized deep learning approach; ResNet, residual network.

Table 2.

AUC scores of different observation windows for 12-month ahead mortality risk predictions (mean and 95% confidence interval)

Model | Observation Window 6 Years | Observation Window 7 Years | Observation Window 8 Years | Observation Window 9 Years
Random forests | 0.827 (0.809–0.844) | 0.836 (0.817–0.853) | 0.842 (0.823–0.859) | 0.857 (0.839–0.873)
LSTM (dynamic) | 0.872 (0.855–0.887) | 0.884 (0.869–0.899) | 0.893 (0.878–0.908) | 0.897 (0.883–0.911)
ResNet (Static-I) | 0.867 (0.849–0.881) | 0.879 (0.860–0.893) | 0.883 (0.866–0.897) | 0.891 (0.875–0.904)
ResNet (Static-II) | 0.884 (0.868–0.898) | 0.886 (0.871–0.899) | 0.894 (0.879–0.907) | 0.904 (0.890–0.918)
ResNet (Static-III) | 0.886 (0.870–0.901) | 0.897 (0.882–0.911) | 0.902 (0.888–0.916) | 0.914 (0.902–0.926)
Method in27 | 0.883 (0.866–0.898) | 0.891 (0.875–0.906) | 0.903 (0.888–0.918) | 0.921 (0.908–0.934)
Method in28 | 0.886 (0.871–0.900) | 0.891 (0.877–0.905) | 0.899 (0.885–0.912) | 0.918 (0.904–0.929)
PDLA | 0.899 (0.884–0.912) | 0.908 (0.895–0.920) | 0.913 (0.901–0.925) | 0.939 (0.929–0.949)
eiPDLA | 0.930 (0.918–0.942) | 0.932 (0.920–0.943) | 0.938 (0.927–0.948) | 0.944 (0.934–0.954)

Furthermore, the rate at which AUC scores change as the observation window lengthens differs between prediction windows, as illustrated in Figure 4. We calculate ΔAUC by subtracting the AUC score of the 12-month ahead prediction from the AUC score of the 6-month ahead prediction. These ΔAUCs are positive because, for the same observation window, it is easier to predict clinical outcomes in the near future than in the more distant future. More importantly, as the length of the observation window increases, ΔAUCs decrease overall (eg, the ΔAUCs of all these methods for the 6-year observation window are larger than those for the 9-year observation window). This demonstrates that when the observation window is extended by the same amount, the results of 12-month ahead risk prediction often improve more than the results of 6-month ahead prediction. This is probably because, for relatively short-term predictions, short-term patterns of change are enough to produce satisfactory results, and these can be learned from a short observation period. For long-term risk predictions, however, long-term patterns of change are necessary, and these often need to be learned from records over a long observation period. Therefore, when the length of the observation window increases, long-term patterns of change of the disease are recorded and can be learned more comprehensively, which improves long-term prediction results. For short-term risk predictions, although results also improve, the degree of improvement is often smaller since most short-term patterns of change could already have been learned from a short observation window.
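The ΔAUC comparison can be reproduced directly from the point estimates in Tables 1 and 2. The Python sketch below copies those AUC values (confidence intervals omitted) and computes the 6-month minus 12-month difference for each observation window; the variable and function names are chosen for this example only.

```python
# AUC point estimates from Tables 1 and 2, for observation windows of
# 6, 7, 8 and 9 years (in that order).
AUC_6_MONTH = {
    "Random forests":      [0.838, 0.848, 0.851, 0.860],
    "LSTM (dynamic)":      [0.884, 0.892, 0.897, 0.903],
    "ResNet (Static-I)":   [0.880, 0.889, 0.892, 0.897],
    "ResNet (Static-II)":  [0.909, 0.912, 0.914, 0.916],
    "ResNet (Static-III)": [0.910, 0.917, 0.922, 0.936],
    "Method in27":         [0.902, 0.904, 0.909, 0.925],
    "Method in28":         [0.902, 0.907, 0.910, 0.924],
    "PDLA":                [0.925, 0.927, 0.930, 0.947],
    "eiPDLA":              [0.943, 0.944, 0.949, 0.952],
}
AUC_12_MONTH = {
    "Random forests":      [0.827, 0.836, 0.842, 0.857],
    "LSTM (dynamic)":      [0.872, 0.884, 0.893, 0.897],
    "ResNet (Static-I)":   [0.867, 0.879, 0.883, 0.891],
    "ResNet (Static-II)":  [0.884, 0.886, 0.894, 0.904],
    "ResNet (Static-III)": [0.886, 0.897, 0.902, 0.914],
    "Method in27":         [0.883, 0.891, 0.903, 0.921],
    "Method in28":         [0.886, 0.891, 0.899, 0.918],
    "PDLA":                [0.899, 0.908, 0.913, 0.939],
    "eiPDLA":              [0.930, 0.932, 0.938, 0.944],
}


def delta_auc(auc_short, auc_long):
    """Per-model list of (short-horizon AUC - long-horizon AUC), rounded to 3 dp."""
    return {
        model: [round(a - b, 3) for a, b in zip(auc_short[model], auc_long[model])]
        for model in auc_short
    }


deltas = delta_auc(AUC_6_MONTH, AUC_12_MONTH)
for model, values in deltas.items():
    print(f"{model:20s} {values}")
```

For eiPDLA, for instance, the difference falls from 0.013 at the 6-year observation window to 0.008 at the 9-year window, and every listed model shows a smaller ΔAUC at 9 years than at 6 years. The decline is not strictly monotonic for every model across the intermediate windows, which is consistent with the "overall" wording in the text.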

Figure 4.

Changes of area under the curve (AUC) scores along the increase of observation window length for 2 different prediction windows (6 months and 12 months).

Results for different prediction windows

We then explore the influence of the prediction window. We fix the observation window at 9 years and increase the prediction window from 3 months to 12 months in steps of 3 months. Experimental results are provided in Table 3. The proposed PDLA and eiPDLA achieve larger AUC scores than the other methods at every prediction window, with eiPDLA the highest, demonstrating the strongest modeling capacity. In addition, when the same model is used for different prediction windows, better results are usually achieved for shorter prediction windows. For example, with the observation window fixed at 9 years, PDLA achieves AUC scores of 0.973 and 0.939 for prediction windows of 3 months and 12 months, respectively. These results demonstrate that the difficulty of achieving accurate prediction is positively related to how far into the future the mortality risk prediction is made. This is probably because more uncertainties can arise from different sources further into the future, making the disease’s patterns of change at that time different from its current patterns of change.

Table 3.

AUC scores of different levels of mortality risk predictions with 9-year observation window (mean and 95% confidence interval)

Model | Prediction Window 3 Months | Prediction Window 6 Months | Prediction Window 9 Months | Prediction Window 12 Months
Random forests | 0.870 (0.853–0.885) | 0.869 (0.852–0.885) | 0.864 (0.846–0.880) | 0.857 (0.839–0.873)
LSTM (dynamic) | 0.908 (0.895–0.922) | 0.905 (0.891–0.918) | 0.904 (0.891–0.918) | 0.897 (0.883–0.911)
ResNet (Static-I) | 0.916 (0.902–0.929) | 0.911 (0.897–0.923) | 0.906 (0.892–0.919) | 0.891 (0.875–0.904)
ResNet (Static-II) | 0.925 (0.912–0.938) | 0.919 (0.906–0.931) | 0.909 (0.895–0.922) | 0.904 (0.890–0.918)
ResNet (Static-III) | 0.958 (0.948–0.967) | 0.940 (0.928–0.951) | 0.932 (0.920–0.943) | 0.914 (0.902–0.926)
Method in27 | 0.958 (0.948–0.966) | 0.953 (0.944–0.963) | 0.946 (0.935–0.956) | 0.921 (0.908–0.934)
Method in28 | 0.961 (0.952–0.968) | 0.954 (0.945–0.963) | 0.932 (0.921–0.943) | 0.918 (0.904–0.929)
PDLA | 0.973 (0.966–0.980) | 0.966 (0.958–0.974) | 0.960 (0.951–0.968) | 0.939 (0.929–0.949)
eiPDLA | 0.984 (0.978–0.989) | 0.980 (0.974–0.986) | 0.972 (0.965–0.979) | 0.944 (0.934–0.954)

Abbreviations: eiPDLA, end-to-end importance-aware personalized deep learning approach; LSTM, long short-term memory; PDLA, personalized deep learning approach; ResNet, residual network.

Ablation study

To evaluate the effectiveness of each component, the performance of different variants of eiPDLA is compared on the 12-month ahead mortality risk prediction task, as shown in Figure 5. First, we compare ePDLA with PDLA. ePDLA consistently outperforms PDLA under various settings. For example, at the 12-month ahead prediction level, for the observation window of 6 years, the AUC score achieved by ePDLA is 0.914, compared with 0.899 for PDLA. This is because different parts of PDLA are trained separately while the components of ePDLA are optimized jointly. As a result, although each sub-module of PDLA may reach a local optimum, their combination may not be optimal for the overall prediction task. In contrast, ePDLA uses the end-to-end trainable architecture, which trains the different modules together and is therefore more likely to reach a better joint solution. These results indicate the effectiveness of the end-to-end trainable architecture.

Figure 5.

Ablation study on 12-month ahead mortality risk predictions.

Second, we explore the effectiveness of the importance-aware mechanism. As shown in Figure 5, results under various settings indicate that ePDLA+im consistently outperforms ePDLA. For example, at the 12-month prediction level, when the observation window is 6 years, ePDLA+im achieves an AUC score of 0.924, compared with 0.914 for ePDLA. When the observation window increases to 8 years, for the same task, the AUC score achieved by ePDLA+im increases to 0.936, again an improvement over 0.922 for ePDLA. These results demonstrate the effectiveness of the importance-aware mechanism for improving the model’s performance. This is because the mechanism fuses information extracted from multiple sources by adaptively learning the contribution scores of different features. During training, the weights assigned to these features are dynamically adjusted using the back-propagated prediction error. Thus, important information is assigned large weights so that it plays an active role in risk prediction, while smaller weights are given to less important information to avoid distracting the deep neural network.

Third, we evaluate the effectiveness of the interpretable attention mechanism (ie, correlation attention and temporal attention). As shown in Figure 5, ePDLA+ia consistently outperforms ePDLA under various settings. For example, for the 12-month ahead mortality risk prediction task, when the observation window is 8 years, ePDLA+ia achieves an AUC score of 0.933, which is larger than the score of 0.922 for ePDLA. These results demonstrate that, by adaptively learning and adjusting the contribution weights of different static and dynamic input data, the proposed interpretable attention mechanism can identify key elements from different inputs and strengthen their contributions to boost final results. More importantly, by analyzing the contribution weights assigned to static and dynamic inputs by the correlation and temporal attention modules, we can obtain the contribution weight of every input, which helps doctors identify the main risk factors and adjust the treatments of patients accordingly to improve clinical outcomes.

DISCUSSION

The proposed method jointly analyzes static and dynamic medical data from both the correlation perspective and the temporal perspective, outperforming models based on a single perspective (eg, LSTM, ResNet, and RF). Since most methods analyze medical data from a single perspective, they capture either the correlation between different variables, which reflects their influencing relationship, or temporal information, which reflects changes of variables over time, but not both, and their prediction results are therefore often limited. Conversely, the proposed method jointly analyzes static and dynamic medical data from different perspectives. Therefore, it can extract different types of information to comprehensively grasp the status of patients and achieve better prediction results.

Performing correlation analysis to capture the relationship between static data and dynamic data can effectively improve risk prediction performance. Under various experimental settings, the proposed method consistently outperforms 2 state-of-the-art models described in,27,28 which also build models by combining static data and dynamic data. This is probably because the methods in 27,28 simply concatenate static features and dynamic features and then use a prediction layer to map these features to the output, which ignores their correlation relationship. On the contrary, the proposed method utilizes a deep ResNet with many convolution units to jointly analyze static data and dynamic data from the correlation perspective, so that their influencing relationship can be captured and furthermore fused with temporal information via a deep fusion method. Therefore, although all these methods use the same data to build models and adopt similar RNN models to perform temporal analysis, the proposed method performs correlation analysis to capture correlation information and, as a result, achieves better mortality risk prediction results.

Prediction results of ResNet for Static-II and Static-III demonstrate that the utilization of missing data patterns can effectively improve mortality risk prediction results. This is probably because missing data can reflect the judgments of doctors regarding the health status of patients (ie, doctors tend to arrange fewer examinations for patients in good condition to reduce their pain and expense). For patients with severe conditions, however, intensive examinations are important for tracking changes in the severity of the patient’s disease in a timely manner and dynamically adjusting treatment plans to improve health care outcomes. This suggests that patients should follow doctors’ advice when arranging their hospital examinations, so that missing data patterns in EHRs will contain more valuable medical information.

Several studies indicate that sometimes deep learning models cannot significantly outperform traditional methods for analyzing structured information in the EHRs.45–47 There are 2 possible reasons. When the input data do not contain enough useful information for the task, both deep and traditional models make unsatisfactory predictions. Conversely, when the input data contain sufficient and easy-to-extract information, both kinds of model make very accurate predictions. Both situations limit the degree to which deep models outperform traditional methods. In this study, there are 2 main reasons for the significant performance improvement of the proposed eiPDLA method. First, the input data contain a large amount of valuable information (ie, both static and dynamic health data are incorporated into the proposed network), which can reflect the health status of patients from different aspects. However, because of the heterogeneity, the irregularity, the missing data problems, and the influencing relationships between different inputs, this valuable health information is difficult to extract, which limits the prediction performance of traditional methods. Second, to fully utilize the useful information from different sources, our proposed eiPDLA method simultaneously performs correlation analysis for static and dynamic health data to capture their relationships, conducts temporal analysis to learn dynamic changes in the health conditions of patients, and introduces the importance-aware mechanism to dynamically optimize fusion weights of different features.

The interpretability of machine learning models is very important for their real-world applications, especially in the health care area.47–50 To demonstrate the benefit of using eiPDLA for real-world mortality risk predictions, we analyze attention weights learned for both static and dynamic health data. Figure 6 provides a case study for a patient who finally had an unfavorable clinical outcome. The attention weights learned for static health data are demonstrated in Figure 6a. By inspecting the attention values of different data, we can clearly identify the clinical importance of health variables for estimating the conditions of the patient among which gender, birthdate, and the doses of Pantoprazole and Omeprazole are assigned with large contribution scores with values of 0.999, 0.976, 0.991, and 0.982, respectively. As important medications for the treatment of ulcer diseases and the prevention of upper gastrointestinal bleeding, Pantoprazole and Omeprazole have proved to be vital factors for evaluating the health conditions of PUB patients.51,52 Recent medical research shows that males have a significantly higher percentage of peptic ulcer bleeding than females,53 and old age is an important factor related to poor prognosis of ulcer bleeding,54 both of which indicate that these factors are also important for predicting the risk levels of PUB patients. These results prove that the proposed mechanism can effectively identify key factors from different personal basic information and automatically enlarge their contributions to improve the risk prediction accuracies while providing clinical interpretations. Figure 6b provides the attention weight matrix learned for dynamic health data, which indicates that diverse attention weight values are assigned to different sequences and records. By calculating the mean of attention weights at different timestamps, we can get the overall contribution score of each dynamic variable. 
The 2 variables assigned with the 2 largest attention weights are CR and UREA, which is reasonable because CR and UREA have proved to be closely related to the health status of PUB patients.55,56 To further explore the contribution weights of records at different timestamps, we analyze sequences of CR, UREA, and their contribution weights, as shown in Figures 6c and 6d. For CR, we observe that more attention is assigned to records with large values (eg, between 2014 and 2015). Furthermore, sudden increases in the values of CR also draw additional attention compared with records nearby (eg, between 2011 and 2012). Differently, for UREA, the attention weights are mainly allocated according to the values while much less attention is paid to the fluctuations. This is because many other factors can influence the value of UREA (eg, exercise).57 Therefore, sudden slight increases in its value may not contain much useful information. On the contrary, CR is often stable among healthy people but changes for patients. As a result, the fluctuations of CR may reflect changes in health conditions.57 These results prove that eiPDLA can identify important inputs and provide meaningful interpretations, which effectively improve prediction results while helping doctors identify main risk factors and design personalized treatments to improve clinical outcomes.
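The step of averaging attention weights over timestamps to score each dynamic variable can be sketched as follows. The attention values below are invented placeholders, not the weights shown in Figure 6, and the function name `variable_contributions` is hypothetical; the sketch only illustrates the averaging and ranking described in the text.

```python
def variable_contributions(attention):
    """Rank dynamic variables by their mean attention weight.

    attention: dict mapping a variable name to the list of attention weights
               assigned to that variable's records at different timestamps.
    Returns a list of (name, mean_weight) pairs, largest mean first.
    Variables without any record are skipped.
    """
    scores = {
        name: sum(weights) / len(weights)
        for name, weights in attention.items()
        if weights
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical attention weights for three of the laboratory variables.
example_attention = {
    "HB": [0.2, 0.4],
    "CR": [0.9, 0.7],
    "UREA": [0.6, 0.8],
}
ranking = variable_contributions(example_attention)
for name, score in ranking:
    print(name, score)
```

Averaging per variable, rather than summing, keeps variables with many records from dominating the ranking simply because they were measured more often, which matters for irregularly sampled laboratory data.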

Figure 6.

Visualization of attention weights: (a) the attention weights learned for static health data; (b) the attention weights learned for dynamic health data; (c) time-series data of CR and UREA; (d) contribution scores of CR and UREA.

CONCLUSION

In this paper, we propose PDLA to jointly handle static and dynamic medical data to achieve accurate mortality risk prediction. Compared with existing methods, PDLA improves risk prediction performance by simultaneously performing correlation analysis for static data and dynamic data to capture their influencing relationship and performing temporal analysis to capture the dynamic changes in the severity of patients’ disease over time. In addition, we present a variant with the importance-aware mechanism built upon the end-to-end trainable architecture (eiPDLA) to dynamically learn fusion weights for different features, which adaptively assigns larger weights to important features and weakens the influence of less important features. Furthermore, eiPDLA is designed in an end-to-end trainable architecture to simultaneously optimize the parameters of different components, which further improves risk prediction performance. Extensive experimental results under various settings on real-world medical data demonstrate that the proposed method significantly outperforms existing methods under various observation and prediction windows. Future studies may include a natural language processing module to further incorporate clinical text datasets (especially diagnosis records) as inputs to introduce doctors’ expertise into the deep learning architecture. Lastly, we note that the generalizability of a model might be influenced by many factors, including the length of the observation window, how far into the future the predictions are made, and the characteristics of the data.47 Future investigations may explore the generalization ability of the proposed method for EHR data in other medical centers.

FUNDING

This work was supported by the Health and Medical Research Fund Project under Grant 07180216.

AUTHOR CONTRIBUTIONS

QT, MY, GLW, and PCY were responsible for the conception, design of the study, and the development of methodology. QT, QJM, TCY, and GLW were responsible for the data collection and analysis. QT, MY, AJM and PCY were responsible for building the machine learning model. All authors were responsible for the interpretation of data, writing, review, and final approval of the manuscript.

DATA AVAILABILITY

The data underlying this article are owned by a third party. The data were provided by Prince of Wales Hospital, Hong Kong, by permission. Data will be shared on request to the corresponding author with permission of Prince of Wales Hospital, Hong Kong.

CONFLICT OF INTEREST STATEMENT

None declared.

REFERENCES

  • 1. Lanas A, Chan FK. Peptic ulcer disease. Lancet 2017; 390 (10094): 613–24.
  • 2. Taha AS, McCloskey C, Prasad R, Bezlyak V. Famotidine for the prevention of peptic ulcers and oesophagitis in patients taking low-dose aspirin (FAMOUS): a phase III, randomised, double-blind, placebo-controlled trial. Lancet 2009; 374 (9684): 119–25.
  • 3. Ng FH, Wong SY, Lam KF, et al. Famotidine is inferior to pantoprazole in preventing recurrence of aspirin-related peptic ulcers or erosions. Gastroenterology 2010; 138 (1): 82–8.
  • 4. Camus M, Jensen DM, Kovacs TO, Jensen ME, Markovic D, Gornbein J. Independent risk factors of 30‐day outcomes in 1264 patients with peptic ulcer bleeding in the USA: large ulcers do worse. Aliment Pharmacol Ther 2016; 43 (10): 1080–9.
  • 5. Gao J, Xiao C, Glass LM, Sun J. Dr. Agent: clinical predictive model via mimicked second opinions. J Am Med Inform Assoc 2020; 27 (7): 1084–91.
  • 6. Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. J Am Med Inform Assoc 2018; 25 (10): 1419–28.
  • 7. Ter Horst R, Jaeger M, Smeekens SP, et al. Host and environmental factors influencing individual human cytokine responses. Cell 2016; 167 (4): 1111–24.
  • 8. Tang F, Xiao C, Wang F, Zhou J. Predictive modeling in urgent care: a comparative study of machine learning approaches. JAMIA Open 2018; 1 (1): 87–98.
  • 9. Pivovarov R, Elhadad N. Automated methods for the summarization of electronic health records. J Am Med Inform Assoc 2015; 22 (5): 938–47.
  • 10. Goldstein BA, Navar AM, Pencina MJ, Ioannidis J. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc 2017; 24 (1): 198–208.
  • 11. Koleck TA, Dreisbach C, Bourne PE, Bakken S. Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc 2019; 26 (4): 364–79.
  • 12. Li R, Chen Y, Moore JH. Integration of genetic and clinical information to improve imputation of data missing from electronic health records. J Am Med Inform Assoc 2019; 26 (10): 1056–63.
  • 13. Zheng B, Yoon SW, Lam SS. Breast cancer diagnosis based on feature extraction using a hybrid of k-means and support vector machine algorithms. Expert Syst Appl 2014; 41 (4): 1476–82.
  • 14. Gorunescu F, Belciug S. Evolutionary strategy to develop learning-based decision systems. Application to breast cancer and liver fibrosis stadialization. J Biomed Inform 2014; 49: 112–8.
  • 15. Nilashi M, Ahmadi H, Shahmoradi L, Ibrahim O, Akbari E. A predictive method for hepatitis disease diagnosis using ensembles of neuro-fuzzy technique. J Infect Public Health 2019; 12 (1): 13–20.
  • 16. Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J. Doctor AI: predicting clinical events via recurrent neural networks. In: Proceedings of the Machine Learning for Healthcare Conference; August 19–20, 2016; Los Angeles, CA, USA.
  • 17. Choi E, Schuetz A, Stewart WF, Sun J. Using recurrent neural network models for early detection of heart failure onset. J Am Med Inform Assoc 2017; 24 (2): 361–70.
  • 18. Aczon M, Ledbetter D, Ho L, et al. Dynamic mortality risk predictions in pediatric critical care using recurrent neural networks. arXiv preprint arXiv:1701.06675, 2017.
  • 19. Xu Y, Biswal S, Deshpande SR, Maher KO, Sun J. RAIM: recurrent attentive and intensive model of multimodal patient monitoring data. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; August 19–23, 2018; London, United Kingdom.
  • 20. Che Z, Purushotham S, Cho K, Sontag D, Liu Y. Recurrent neural networks for multivariate time series with missing values. Sci Rep 2018; 8 (1): 1–12.
  • 21. Tan Q, Ma AJ, Ye M, et al. UA-CRNN: uncertainty-aware convolutional recurrent neural network for mortality risk prediction. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management; November 3–7, 2019; Beijing, China.
  • 22. Tan Q, Ye M, Yang B, et al. DATA-GRU: dual-attention time-aware gated recurrent unit for irregular multivariate time series. In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence; February 7–12, 2020; New York, NY, USA.
  • 23. Shukla SN, Marlin BM. Interpolation-prediction networks for irregularly sampled time series. arXiv preprint arXiv:1909.07782, 2019.
  • 24. Baytas IM, Xiao C, Zhang X, Wang F, Jain AK, Zhou J. Patient subtyping via time-aware LSTM networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 13–17, 2017; Halifax, NS, Canada.
  • 25. Che C, Xiao C, Liang J, Jin B, Zhou J, Wang F. An RNN architecture with dynamic temporal matching for personalized predictions of Parkinson's disease. In: Proceedings of the 2017 SIAM International Conference on Data Mining; July 10–14, 2017; Pittsburgh, PA, USA.
  • 26. Xu Z, Chou J, Zhang XS, et al. Identifying sub-phenotypes of acute kidney injury using structured and unstructured electronic health record data with memory networks. J Biomed Inform 2020; 102: 103361.
  • 27. Che Z, Purushotham S, Khemani R, Liu Y. Interpretable deep models for ICU outcome prediction. AMIA Annu Symp Proc 2016; 2016: 371–80.
  • 28. Esteban C, Staeck O, Baier S, Yang Y, Tresp V. Predicting clinical events by combining static and dynamic information using recurrent neural networks. In: Proceedings of the 2016 IEEE International Conference on Healthcare Informatics (ICHI); October 4–7, 2016; Chicago, IL, USA.
  • 29. Wang Z, Song W, Liu L, et al. Representation learning with deconvolution for multivariate time series classification and visualization. arXiv preprint arXiv:1610.07258, 2016.
  • 30. Fiterau M, Bhooshan S, Fries J, et al. ShortFuse: biomedical time series representations in the presence of structured information. Proc Mach Learn Res 2017; 68: 59–74.
  • 31. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; June 27–30, 2016; Las Vegas, NV, USA.
  • 32. Lipton ZC, Kale DC, Elkan C, Wetzel R. Learning to diagnose with LSTM recurrent neural networks. arXiv preprint arXiv:1511.03677, 2015.
  • 33. Tan Q, Ma AJ, Deng H, et al. A hybrid residual network and long short-term memory method for peptic ulcer bleeding mortality prediction. AMIA Annu Symp Proc 2018; 2018: 998–1007.
  • 34. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
  • 35. Keogh E, Ratanamahatana CA. Exact indexing of dynamic time warping. Knowl Inf Syst 2005; 7 (3): 358–86.
  • 36. Morel M, Achard C, Kulpa R, Dubuisson S. Time-series averaging using constrained dynamic time warping with tolerance. Pattern Recogn 2018; 74: 77–89.
  • 37. Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana CA. Fast time series classification using numerosity reduction. In: Proceedings of the 23rd International Conference on Machine Learning; June 25–29, 2006; Pittsburgh, PA, USA.
  • 38. Zhou F, De la Torre F. Generalized canonical time warping. IEEE Trans Pattern Anal Mach Intell 2016; 38 (2): 279–94.
  • 39. Dau HA, Silva DF, Petitjean F, et al. Optimizing dynamic time warping's window width for time series data mining applications. Data Min Knowl Disc 2018; 32 (4): 1074–120.
  • 40. Gharghabi S, Imani S, Bagnall A, Darvishzadeh A, Keogh E. An ultra-fast time series distance measure to allow data mining in more complex real-world deployments. In: Proceedings of the IEEE International Conference on Data Mining; November 17–20, 2018; Singapore, Singapore.
  • 41. Bhattacharya S, Rajan V, Shrivastava H. ICU mortality prediction: a classification algorithm for imbalanced datasets. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence; February 4–10, 2017; San Francisco, CA, USA.
  • 42. Li Y, Jin R, Luo Y. Classifying relations in clinical narratives using segment graph convolutional and recurrent neural networks (Seg-GCRNs). J Am Med Inform Assoc 2019; 26 (3): 262–8.
  • 43. Magge A, Sarker A, Nikfarjam A, Gonzalez-Hernandez G. Comment on: deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts. J Am Med Inform Assoc 2019; 26 (6): 577–9.
  • 44. Breiman L. Random forests. Mach Learn 2001; 45 (1): 5–32.
  • 45. Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. NPJ Digital Med 2018; 1 (1): 1–10.
  • 46. Min X, Yu B, Wang F. Predictive modeling of the hospital readmission risk from patients’ claims data using machine learning: a case study on COPD. Sci Rep 2019; 9 (1): 1–10.
  • 47. Wang F, Preininger A. AI in health: state of the art, challenges, and future directions. Yearb Med Inform 2019; 28 (01): 16–26.
  • 48. Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med 2019; 25 (9): 1337–40.
  • 49. Wang F, Casalino LP, Khullar D. Deep learning in medicine—promise, progress, and challenges. JAMA Intern Med 2019; 179 (3): 293–4.
  • 50. Wang F, Kaushal R, Khullar D. Should health care demand interpretable artificial intelligence or accept “black box” medicine? Ann Intern Med 2020; 172 (1): 59–62.
  • 51. Moayyedi P, Eikelboom JW, Bosch J, et al. Pantoprazole to prevent gastroduodenal events in patients receiving rivaroxaban and/or aspirin in a randomized, double-blind, placebo-controlled trial. Gastroenterology 2019; 157 (2): 403–12.
  • 52. Gong EJ, Lee SJ, Jun BG, et al. Optimal timing of feeding after endoscopic hemostasis in patients with peptic ulcer bleeding: a randomized, noninferiority trial (CRIS KCT0001019). Am J Gastroenterol 2020; 115 (4): 548–54.
  • 53. Venerito M, Schneider C, Costanzo R, Breja R, Röhl FW, Malfertheiner P. Contribution of Helicobacter pylori infection to the risk of peptic ulcer bleeding in patients on nonsteroidal anti‐inflammatory drugs, antiplatelet agents, anticoagulants, corticosteroids and selective serotonin reuptake inhibitors. Aliment Pharmacol Ther 2018; 47 (11): 1464–71.
  • 54. Cheng HC, Yang EH, Wu CT, et al. Hypoalbuminemia is a predictor of mortality and rebleeding in peptic ulcer bleeding under proton pump inhibitor use. J Formos Med Assoc 2018; 117 (4): 316–25.
  • 55. Laursen SB, Leontiadis GI, Stanley AJ, Møller MH, Hansen JM, de Muckadell OBS. Relationship between timing of endoscopy and mortality in patients with peptic ulcer bleeding: a nationwide cohort study. Gastrointest Endosc 2017; 85 (5): 936–44.
  • 56. Kumar NL, Claggett BL, Cohen AJ, Nayor J, Saltzman JR. Association between an increase in blood urea nitrogen at 24 hours and worse outcomes in acute nonvariceal upper GI bleeding. Gastrointest Endosc 2017; 86 (6): 1022–7.
  • 57. Sokal P, Jastrzębski Z, Jaskulska E, et al. Differences in blood urea and creatinine concentrations in earthed and unearthed subjects during cycling exercise and recovery. Evid-Based Complement Altern Med 2013; 2013: 1–6.

Associated Data

Data Availability Statement

The data underlying this article are owned by a third party. The data were provided by Prince of Wales Hospital, Hong Kong, by permission. Data will be shared on request to the corresponding author with permission of Prince of Wales Hospital, Hong Kong.


Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
