Abstract
Objective.
While many machine learning and deep learning-based models for clinical event prediction leverage various data elements from electronic health records, such as patient demographics and billing codes, such models face severe challenges when tested outside their institution of training. These challenges are rooted not only in differences in patient population characteristics, but also in the medical practice patterns of different institutions.
Method.
We propose a solution to this problem through the systematically adaptable design of graph-based convolutional neural networks (GCNN) for clinical event prediction. Our solution relies on a unique property of GCNNs: data encoded as graph edges is used only implicitly during the prediction process and can be adapted after model training without requiring model re-training.
Results.
Our adaptable GCNN-based prediction models outperformed all comparative models during external validation on two different clinical problems, while supporting multimodal data integration. For prediction of hospital discharge and mortality, the comparative fusion baseline model achieved 0.58 [0.52-0.59] and 0.81 [0.80-0.82] AUROC on the external dataset, while the GCNN achieved 0.70 [0.68-0.70] and 0.91 [0.90-0.92], respectively. For prediction of future unplanned transfusion, we observed an even larger performance gap due to missing/incomplete data in the external dataset: late fusion achieved 0.44 [0.31-0.56] AUROC while the GCNN model achieved 0.70 [0.62-0.84].
Conclusion.
These results support our hypothesis that carefully designed GCNN-based models can overcome generalization challenges faced by prediction models.
Graphical Abstract

Figure: Graphical representation of the study; (a) Difference in comorbidities of similar patient populations (patients hospitalized with COVID-19 infection) from two different healthcare institutions, highlighting the challenge of model generalization across institutions; (b) Sample graph formation involving several clinical data elements. Edge features (e) are used only for edge formation based on similarity; the features and similarity functions can be updated when the trained graph convolutional neural network is shipped to outside institutions.
INTRODUCTION
During each patient visit, healthcare centers record patients' health data in digital systems referred to as Electronic Health Records (EHR), which consist of heterogeneous elements. The structured format of the EHR represents data that can take a value within a specified range or from a pre-defined dictionary; examples include, but are not limited to, medical codes, medications, administrative data, vital signs, and laboratory test results. In the digital age, secondary use of structured EHR data for developing machine learning (ML) and deep learning (DL) models for clinical event prediction and digital phenotyping [2, 23, 24, 25, 26] is becoming widely popular and is being clinically adopted to improve healthcare delivery. However, models trained on a single institution's data often face severe challenges when applied across multiple institutions and diverse populations [3, 27, 28].
ML/DL models commonly leverage ICD and CPT codes to incorporate the clinical status of patients in addition to their demographic features [4,5,22]. These codes are designed to convert healthcare services into billable revenue, and qualified healthcare coders are responsible for their accuracy and completeness. However, significant differences exist in coding practices between healthcare institutions [6]. ML/DL models often learn the practice patterns of the training institution rather than relevant predictive features and can fail when applied to another institution [7]. Some studies have even shown that the time and frequency of a lab test order can matter more to the model than the actual result of the test [8]. A recent survey concluded that, when tested on external data, more than 20 models trained for COVID-19 prognosis could not outperform univariable predictions made on oxygen saturation level at the time of hospital admission, calling into question the utility of ML modeling for clinical event prediction [9]. The Epic sepsis model likewise achieves subpar performance when validated externally [9]. Models also experience performance decay over time even when deployed in the institution where they were trained, likely attributable to the evolution of population characteristics and practice patterns [10]. These challenges limit the generalizability and scalability of ML/DL models that leverage electronic health records for tasks like clinical event prediction or patient phenotyping.
A popular remedy is the curation of refined clinical features for standardized risk prediction [11], e.g., the pooled cohort equations [12]. Clinical features are refined to eliminate practice pattern-based variations that might arise in the recording of those features. However, feature curation requires manual effort, introduces the possibility of curation errors, and is limited to smaller datasets. This approach also lacks comprehensiveness and can miss other relevant predictive features for the given task contained within the EHR: such models focus only on expert-defined clinical features and cannot make use of the vast amount of information available in the electronic health records in general. Research has shown that comprehensive models using a wide variety of EHR data outperform models using curated features [13]. Even after the valuable curation effort and targeted modeling based on a narrow set of curated features, this approach shows biases among different population groups [12]. Another approach is harmonizing EHR data under standard data models like the Observational Medical Outcomes Partnership (OMOP) and Fast Healthcare Interoperability Resources (FHIR). These data models place well-known limitations on the granularity of EHR data and cannot handle variations in the data patterns themselves [14, 15].
In recent years, significant effort has been put into evaluating ML/DL models for healthcare and medicine, specifically for generalization to external populations. Dexter et al. reported weak generalization of ML-based models for notifiable disease detection across multiple lab systems and pointed to syntactic variations in free-text reports as the reason for the weak generalizability [39]. Yang et al. tested COVID-19 detection models across four different NHS hospital trusts and showed that, after training, these models needed site-specific threshold adjustment or fine-tuning to achieve satisfactory performance on external sites [40]. Rasmy et al. evaluated an RNN-based heart disease risk prediction model on data collected from different healthcare systems and highlighted population differences as the root cause of the drop in performance during external validation [41]. Wang et al. proposed a highly generalizable Alzheimer's disease progression prediction model by combining several convolutional neural networks (CNN) to extract relevant imaging features, which are fed into a support vector machine for final prediction [42]. This scheme is only applicable to imaging models and does not cover multimodal datasets, particularly coded features.
We propose a novel solution to the adaptability challenges of EHR models through the design of graph-based convolutional neural networks (GCNN). GCNN models are convolutional models that can incorporate user-defined neighborhoods encoded in an input graph structure, in contrast to CNNs, which can only incorporate spatial neighborhoods [1, 16, 21, 29, 30-35]. These models have two-fold learning capabilities: i) learning explicitly from node features; ii) learning implicitly from the neighborhood/graph structure through message passing between connected nodes. Hence, a meaningful graph definition is implicitly used by the model for better prediction. For example, a graph based on clinical similarity, i.e., patients connected to each other through edges when they are clinically similar, allows the GCNN to learn not only from node features (patient characteristics) but also from the features of other clinically similar, edge-connected patients through the message passing mechanism. GCNNs have been used to fuse data modalities such as radiological images and demographic information in their two data structures - nodes and edges [1, 16, 21, 32-35].
Our contribution in this work is to formalize the use of the two-fold learning capacity of GCNNs to overcome the challenges of cross-institutional model generalization, especially those arising from differences in EHR data coding. Data elements that tend to be recorded consistently across institutions are used for explicit learning, while data elements facing wider variations across institutions or over time are used for implicit learning through their use in edge structure formation. The edge formation function can be systematically adapted when practice pattern variations, across time or institutions, induce significant differences in the recording of the corresponding data elements. A GCNN trained on the graph formed by the original edge formation function remains applicable to a new graph formed by a new edge formation function, as the GCNN operates only on the similarity patterns indicated by the edge structure and is agnostic to the exact estimation process used to derive them. Our generalizable design framework is widely applicable to both static and temporal designs for similarity pattern estimation and to a wide variety of data elements, including imaging and non-imaging data. To the best of our knowledge, our work is the first attempt to formalize the design of GCNNs for the explicit purpose of generalization and adaptability across a wide variety of clinical data elements. While many previously proposed GCNN models have established the superiority of the two-fold learning capacity of GCNNs for clinical event prediction and diagnosis, our work shows that this performance advantage can be coupled with generalization power through the proposed design guidelines. No manual curation effort over EHR data is necessary for GCNN-based generalizable model design. No effort is needed to harmonize EHR data elements across institutions; instead, a new edge formation function is designed for the new version of the data while the pretrained GCNN model remains applicable.
Even complex temporal ML models for edge formation function can be accommodated through self-supervised training on new cohorts.
This study was carefully designed to include a vast variety of data elements - imaging, non-imaging, and temporal - and various clinically relevant target events to demonstrate the wide applicability of our design framework. We designed two different GCNN models under the proposed design guidelines to solve two clinically relevant problems on completely different populations: 1) prediction of two clinical events for patients hospitalized with a positive COVID-19 test - discharge from hospital and mortality - using chest X-rays and EHR data elements such as billing codes and demographic features; and 2) prediction of blood transfusion in hospitalized patients using a wide range of EHR data elements (demographic features, CPT and ICD codes, medications, lab tests, vital signs). While these models were designed to process homogeneous graphs, such that all nodes in one graph are represented by the same feature set, our design guidelines may be extended to heterogeneous graphs through careful selection of edge features. We used cohorts from a large academic healthcare organization and a public database for experimentation.
RELATED WORK
Graph convolutional neural networks have found vast applications within medicine and healthcare [1, 17-19, 21, 32-35]. GCNNs have been used to model similarities between patients in terms of demographics such as age, gender, and IQ scores [1, 17, 19, 21] as graph edges and to fuse them with brain MRI for detection of Alzheimer's disease or autism spectrum disorder. GCNNs for clinical event prediction were built to process graphs encoding complex similarity patterns between patients in terms of billing and diagnosis codes and laboratory test results [18, 32-35], while node features were obtained from chest X-rays. These models represent several architectural innovations, including kernel size selection [21] and the introduction of recurrence [17]. However, no exploratory study assessing the generalization of these models has been made available.
Some GCNN-based models have been designed to extract both node and edge features from the imaging modality [36], rendering them essentially as adaptable as any other imaging-based model but lacking multimodal data fusion qualities. The severity of the COVID-19 outbreak has been modeled for geographical areas, with spatial location used to build the graph structure [37]; such an application may be independent of the adaptability concerns addressed here. GCNNs have also been used to impute missing clinical data elements by modeling the complex relationships between patients, medications, and laboratory tests as a heterogeneous graph [35]. This line of research, combined with our proposed generalizable GCNN design framework, can enhance the generalizability and adaptability properties of GCNNs.
METHODOLOGY
Graph Convolutional Neural Network
The graph convolutional neural network (GCNN) advanced machine learning by allowing the model designer to choose the definition of 'neighborhood' incorporated by the model through the definition of a graph $G = (V, E)$, where $V$ denotes the set of nodes and $E$ denotes the set of edges. In this scenario, the $i$-th sample from the cohort forms node $v_i$ with two feature vectors, i.e., node features $x_i$ and edge features $e_i$. An edge between the $i$-th and $j$-th sample, denoted as $a_{ij}$, is decided based on the edge-formation function $f(e_i, e_j)$. The GCNN model learns to generate embeddings for node $v_i$ by manipulating the node features $x_i$ of this node and 'messages' received from nodes in its edge-connected neighborhood $\mathcal{N}(i)$. At the $l$-th graph convolutional layer, the following describes the process of generating the embedding $h_i^{(l)}$ of node $v_i$:
$$m_i^{(l)} = \mathrm{AGG}\left(\left\{ h_j^{(l-1)} : v_j \in \mathcal{N}(i) \right\}\right) \qquad \text{(eq.1)}$$
$$h_i^{(l)} = \sigma^{(l)}\left( W^{(l)} \cdot \mathrm{COMBINE}\left( h_i^{(l-1)},\, m_i^{(l)} \right) \right), \qquad h_i^{(0)} = x_i \qquad \text{(eq.2)}$$
Several functions (e.g., mean or summation) can be used as $\mathrm{AGG}$ to combine messages from all neighbors of the node. $\mathrm{COMBINE}$ indicates how the messages from neighboring nodes and the node's own feature vector are combined; several options may be employed, such as summation or concatenation. $W^{(l)}$ denotes the weights of the $l$-th graph convolutional layer, while $\sigma^{(l)}$ denotes the non-linearity applied at this layer.
In a supervised learning scenario where a target label $y_i$ for each node is available, the node embedding $h_i^{(L)}$ generated by the graph convolutional layers is used to predict the target label for node $v_i$ as
$$\hat{y}_i = \sigma_c\left( W_c \cdot h_i^{(L)} \right) \qquad \text{(eq.3)}$$
where $W_c$ denotes the weights of the fully connected classification layer and $\sigma_c$ denotes the non-linearity applied at this layer.
Through backpropagation of a loss, such as binary cross entropy defined on the ground truth labels $y_i$ and predicted labels $\hat{y}_i$, the weight matrices $W^{(1)}, \ldots, W^{(L)}$ and $W_c$ are optimized, where the model includes $L$ graph convolutional layers.
The neighborhood of node $v_i$ can be defined based on its edge-connected nodes, i.e., $\mathcal{N}(i) = \{v_j : a_{ij} \in E\}$. Messages are sent and received between nodes in a neighborhood; in essence, these messages are the features of the nodes in the neighborhood $\mathcal{N}(i)$. The GCNN model learns the function parameters to manipulate the features of the node and the 'messages' received from its edge-connected neighborhood $\mathcal{N}(i)$. However, the model never directly manipulates the edge features $e_i$, leaving the edge formation function $f$ free to be adapted to the characteristics of each institution's data when shipping the trained model from one institution to the other.
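The layer update in eq.1-eq.2 can be sketched in a few lines. The following is a minimal, illustrative NumPy implementation assuming mean aggregation and concatenation as the combine step; the function names, toy graph, and weights are our own choices for illustration, not the authors' released code.

```python
import numpy as np

def graph_conv_layer(H, adj, W):
    """One graph convolutional layer: mean-aggregate neighbor
    embeddings (eq.1), concatenate with the node's own embedding,
    then apply the layer weights and a ReLU non-linearity (eq.2).

    H   : (n_nodes, d) current node embeddings h^(l-1)
    adj : (n_nodes, n_nodes) binary adjacency (a_ij = 1 if edge)
    W   : (2*d, d_out) layer weights acting on [h_i ; m_i]
    """
    deg = adj.sum(axis=1, keepdims=True)       # neighborhood sizes
    deg[deg == 0] = 1                          # isolated nodes keep zeros
    M = (adj @ H) / deg                        # eq.1: mean aggregation
    combined = np.concatenate([H, M], axis=1)  # COMBINE = concatenation
    return np.maximum(0, combined @ W)         # eq.2: ReLU(W . [h ; m])

# toy graph: 3 nodes, node 0 connected to nodes 1 and 2
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
H0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 2.0]])
W = np.ones((4, 1))
out = graph_conv_layer(H0, adj, W)   # shape (3, 1)
```

Because the layer consumes only the embeddings and the adjacency matrix, the same trained weights apply unchanged to a graph whose edges were produced by a different edge-formation function, which is precisely the property the adaptable design exploits.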
Adaptable GCNN Design for external use cases
A trained GCNN model requires node feature formation in the external cohort to be consistent with that in the internal cohort for a model trained on the internal cohort to be applicable to the external cohort. However, the GCNN model does not directly manipulate the edge features $e_i$, and hence the edge formation process can be adapted to suit the external cohort without hindering the application of the trained GCNN model to that cohort. The following two scenarios illustrate where such adaptation is crucial.
Case – 1:
Let us assume that the edge feature set of the internal cohort is denoted as $F_{int}$, with $e_i \in \mathbb{R}^{|F_{int}|}$, and that of the external cohort as $F_{ext}$, with $e_i \in \mathbb{R}^{|F_{ext}|}$, where $F_{int} \neq F_{ext}$. Such distinct feature selection for the two cohorts may result from frequency-based selection of common features such as billing codes or administered medications. No existing ML model trained on the internal feature vectors would be applicable to a separate set of external feature vectors. However, a GCNN can tolerate such a difference by employing these features for edge formation, as GCNN models do not manipulate edge feature vectors directly.
Case – 2:
Let us assume that the edge features for the internal and external cohorts are the same, i.e., $F_{int} = F_{ext}$. However, edge feature formation is more complex than a frequency-based or binary representation. For example, edge features may be collected over time intervals, and an edge is formed based on similarity in the temporal pattern of these features for nodes $v_i$ and $v_j$, i.e., $g(s_i)$ and $g(s_j)$, where $s_i$ denotes the sequence of feature vectors for node $v_i$ and $g$ the temporal pattern forming function. Even with the same set of features, temporal patterns may differ between internal and external cohorts. For large academic healthcare centers, such patterns may involve both inpatient and outpatient data; for databases collected for critical-care patients only, outpatient data may be missing from the temporal patterns. Hence, the temporal pattern forming function should differ between the internal and external cohorts, i.e., $g_{int}$ and $g_{ext}$. Placing limitations on the pattern forming function may enable traditional ML models trained on the output of $g_{int}$ to be applicable to the outputs of $g_{ext}$, but the graph learning paradigm provides more flexibility. To suit the characteristics of each cohort, $g_{int}$ and $g_{ext}$ may produce outputs of different dimensions, operate on sequences of different lengths, or even work on different sets of features (encompassing the scenario described in Case 1).
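Both cases reduce to the same mechanic: only the resulting adjacency structure reaches the trained model, so the edge-formation function can differ per cohort. The sketch below assumes cosine-similarity edge formation; the feature counts, cohort names, and threshold are illustrative, not the paper's actual values.

```python
import numpy as np

def cosine_edges(E, theta=0.5):
    """Build a binary adjacency matrix from edge feature vectors:
    connect nodes i, j when cosine similarity of e_i, e_j >= theta."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    norms[norms == 0] = 1
    sim = (E / norms) @ (E / norms).T
    adj = (sim >= theta).astype(float)
    np.fill_diagonal(adj, 0)        # no self-edges
    return adj

# internal cohort: edge features over 5 billing-code subgroups
E_int = np.random.default_rng(0).random((4, 5))
# external cohort: a *different* feature set (3 medication classes)
E_ext = np.random.default_rng(1).random((6, 3))

adj_int = cosine_edges(E_int)   # graph used for training
adj_ext = cosine_edges(E_ext)   # new graph for the shipped model
# Both adjacencies are (n, n) regardless of |F_int| vs |F_ext|,
# so a pretrained GCNN consumes either without re-training.
```

A pretrained GCNN that takes (node features, adjacency) pairs can therefore run on `adj_ext` unchanged, even though the internal and external edge feature spaces differ in content and dimensionality.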
In terms of a patient cohort, one node may represent a patient at a certain point in time, and an edge may denote that the two connected nodes/patients are similar in terms of some demographic (e.g., age) or clinical features (e.g., comorbidities). This property has been exploited for the detection of Alzheimer's disease and autism spectrum disorder by building graphs with patients as nodes, brain imaging data as node features, and simple demographic similarity used for edge formation [1,16,18,19,20]. We move beyond such modeling by allowing much more comprehensive information to be used as edge features, e.g., all recorded billing codes for a patient, or an auto-encoder-based compressed representation of historical patterns in recorded billing codes and medications.
The primary intuition of our generalizable GCNN design is to represent the measured/recorded health data (e.g., images, lab values), which faces minimal variability due to practice patterns, as node features, and to leverage the adaptability of edge formation to represent the more variable EHR information (e.g., diagnosis and procedure codes) (see Supplementary Table I).
The flowchart in Figure 1 shows all steps involved in graph formation, graph-based model development, and model evaluation on the internal and external cohorts, highlighting the differences in the graph formation process for the two cohorts, which do not hinder application of a GCNN trained on the internal cohort to the external cohort.
Figure 1.

Flowchart for development and evaluation of graph-based models on internal and external cohorts.
Clinical use-cases of the adaptable GCNN design
We validated our model design scheme on two clinically relevant use-cases: 1) prediction of major clinical events (discharge from hospital and mortality) for patients hospitalized with a positive RT-PCR test for COVID-19, and 2) prediction of the need for transfusion in hospitalized patients. Figures 2-a and 2-b show the frameworks for both use-cases. The first use-case employs a branched framework where patients marked as highly probable for discharge are evaluated for mortality risk, and only in-patient data is used. The second use-case employs the historical pattern of recorded procedures, comorbidities, and medications for a patient, as well as data recorded during the first 48 hours of hospitalization, to estimate the risk of blood transfusion. Code repositories for use-cases 1 and 2 are publicly available.
Figure 2.

(a) Branched framework and graph formation for use-case 1, (b) Node and edge features processing and graph formation for use-case 2
Model Design
Figure 3 shows the distributions of subgroups of billing code sets (CPT and ICD) for use-case 1 from the two institutions. The external institute used a much larger number of diagnostic tests, indicated by higher bars for the diagnostic radiology and drug assay subgroups. While blood disease appears more common among patients in the internal institute, the external cohort had a larger fraction of patients with metabolic disorders and heart disease. Such data elements require adaptation when experimenting on the external cohort and hence are suitable for edge feature formation. These variations are handled by our generalizable model design, which relies on the unique adaptable learning paradigm of the GCNN model as described earlier.
Figure 3 -.

Billing codes distribution; a) ICD, b) CPT, in internal and external sets for COVID-19 patients’ cohorts
Chest X-rays and tabular data (demographics, and CPT and ICD codes) were available for use-case 1. Chest X-ray embeddings (1024-dimensional vectors) extracted from a pre-trained DenseNet-121 model were used as node features $x_i$. All tabular data elements (demographics, CPT, and ICD) were employed as edge feature vectors $e_i$ in one-hot encoded form; CPT and ICD codes were mapped to their corresponding subgroups in the CPT and ICD code taxonomies before one-hot encoding. Edges were decided based on cosine similarity between the edge feature vectors of the two nodes, i.e., $a_{ij} = 1$ if $\frac{e_i \cdot e_j}{\|e_i\| \|e_j\|} \geq \theta$, and $a_{ij} = 0$ otherwise.
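The tabular-to-edge pipeline for use-case 1 (map codes to taxonomy subgroups, one-hot encode, threshold cosine similarity) can be illustrated as follows; the subgroup mapping and threshold below are toy stand-ins for the real CPT/ICD taxonomies and the tuned value of the threshold, not the study's actual configuration.

```python
import numpy as np

# toy taxonomy: map raw codes to subgroups (illustrative only)
SUBGROUP = {"I10": "hypertensive", "I11": "hypertensive",
            "E11": "metabolic", "71045": "diag_radiology"}
VOCAB = ["hypertensive", "metabolic", "diag_radiology"]

def one_hot_edge_features(codes):
    """Map a patient's recorded codes to subgroups, then one-hot encode."""
    e = np.zeros(len(VOCAB))
    for c in codes:
        g = SUBGROUP.get(c)
        if g is not None:
            e[VOCAB.index(g)] = 1.0
    return e

e_i = one_hot_edge_features(["I10", "E11"])     # -> [1, 1, 0]
e_j = one_hot_edge_features(["I11", "71045"])   # -> [1, 0, 1]
cos = e_i @ e_j / (np.linalg.norm(e_i) * np.linalg.norm(e_j))
a_ij = 1 if cos >= 0.4 else 0   # threshold is a design choice
```

Since the subgroup mapping lives entirely on the edge side, an external site can substitute its own code vocabulary here without touching the trained network.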
The transfusion prediction model employed a larger variety of EHR features. The latest results of selected labs (recorded as Normal/Abnormal/Unknown), the trend of change (gradient) in five important vital signs (temperature, mean arterial pressure (MAP), SpO2, pain score, pulse rate) recorded during the first 48 hours of hospitalization, demographic features (gender, race, ethnicity, and age binned into 10-year intervals), and the free-text reason-for-visit field vectorized under a tf-idf featurization scheme were concatenated to form the node features $x_i$ for this model. These features either represented the most recent information on the patient's health status (lab tests, vital signs) and were not suitable for long-term sequence generation spanning a year, or were essentially unchanging in nature, such as the demographic features (race, gender, ethnicity, age binned into 10-year intervals) and data specific to the hospital admission (reason for visit). Thus, no temporal modeling was needed for node feature formation. On the other hand, highly variable data elements like billing codes (CPT and ICD) and medications were concatenated for each patient over every 24-hour interval to form sequences of feature vectors $s_i$. These sequences were passed through the LSTM-based temporal embedding model described in the next section to form the edge feature vectors $e_i$. Edges were again decided based on cosine similarity between the edge feature vectors of the two nodes.
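The 'trend of change' vital-sign features can be computed, for example, as the slope of a degree-1 least-squares fit over the readings from the first 48 hours; this sketch is our reading of the description, not the authors' exact featurization.

```python
import numpy as np

def vital_trend(times_h, values):
    """Slope (per hour) of a least-squares line through vital-sign
    readings recorded during the first 48 hours of hospitalization."""
    t = np.asarray(times_h, dtype=float)
    v = np.asarray(values, dtype=float)
    slope, _ = np.polyfit(t, v, deg=1)   # degree-1 fit -> [slope, intercept]
    return slope

# pulse rate drifting upward over the first two days
slope = vital_trend([0, 12, 24, 36, 48], [80, 84, 88, 92, 96])
# slope == 1/3 beats-per-minute per hour
```

One such scalar per vital sign is then concatenated with the lab, demographic, and tf-idf features to form the node feature vector.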
Initial GCNNs were designed for transductive learning, such that they could only process the graph structures used for training, with no ability to process new graph structures. However, the SAmple and aggreGatE graph convolutional network (GraphSAGE) [43] bypassed this limitation by optimizing a sample-and-aggregate function to collect 'messages' from neighboring nodes while generating the vector embedding of a node. During inference for held-out patients, GraphSAGE employs the optimized aggregate function to generate embeddings for unseen nodes in the graph structure. We employed GraphSAGE-inspired GCNNs for both use-cases to allow inference on unseen data points.
Temporal Embedding Model
The embedding model encodes temporal patterns recorded as three-point sequences: Timepoint 1 (T1), data collected between 6 and 12 months before hospitalization; Timepoint 2 (T2), data collected from 6 months before hospitalization to the time of hospitalization; Timepoint 3 (T3), data collected within the first 48 hours of hospitalization. As previously mentioned, only the encoded feature values of billing codes (ICD and CPT) and medication classes were used as edge feature vectors. CPT and ICD codes were mapped to their corresponding subgroups in the CPT and ICD hierarchies, while medications were mapped to their corresponding therapeutic classes. All mapped features were represented as one-hot feature vectors.
The temporal embedding model is essentially an LSTM-based encoder-decoder architecture. The feature vector at each time point is encoded to a latent space such that, when decoded, it generates the feature vector of the next time point. Hence, the model is trained in a self-supervised fashion with no regard to any downstream prediction label. At each timepoint, the model generates an estimate of the feature vector at the next timepoint (future information) and a hidden-state vector in light of the feature vector of the current timepoint (present information) and the hidden-state vector from the previous timepoint (information from past timepoints). Thus, the hidden-state vector at a timepoint represents the essence of the temporal information learnt by the model up to that timepoint. Once the model has been trained, the last hidden-state vector generated in response to an input sequence can be used as the embedded edge feature vector $e_i$, as it represents the temporal information encoded by the model for the whole input sequence. Our transfusion prediction model operates on a graph of patients where edges between patients are decided based on similarity of their embedded vectors. As explained in the Methodology section, this temporal embedding model is trained separately for external data, while the GCNN-based transfusion prediction model trained on the internal cohort remains applicable to the external cohort.
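A minimal NumPy sketch of the encoder's forward pass: run an LSTM cell over the three one-hot timepoint vectors (T1, T2, T3) and keep the last hidden state as the embedded edge feature vector. The weights here are random for illustration only; in the described model they would be learned via the self-supervised next-timepoint reconstruction objective, and the decoder is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,) hold the
    input, forget, output, and candidate gates stacked row-wise."""
    H = h.size
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    o = sigmoid(z[2*H:3*H])      # output gate
    g = np.tanh(z[3*H:4*H])      # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def embed_sequence(seq, W, U, b, hidden_size):
    """Run the encoder over the T1 -> T2 -> T3 sequence and return
    the last hidden state, used as the edge feature vector e_i."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    return h

rng = np.random.default_rng(0)
D, H = 6, 4                      # one-hot input dim, hidden size
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
seq = [rng.integers(0, 2, D).astype(float) for _ in range(3)]  # T1..T3
e_i = embed_sequence(seq, W, U, b, H)   # 4-dimensional edge feature
```

When the model is shipped to an external institution, only this embedding function is retrained on the new cohort; the downstream GCNN that consumes the cosine-similarity edges between such embeddings stays fixed.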
The three-timepoint sequence design was selected experimentally, as finer-grained splitting of historical data resulted in many empty timepoints (time intervals with no available data). This is due to the nature of data collected as inpatient vs. outpatient: most historical data is outpatient data, except for cases where a patient was also hospitalized within the last year, and outpatient data is much sparser than inpatient data. Note that the external data was collected from MIMIC-IV, which differs substantially from the internal cohort: MIMIC records patient hospitalizations with no outpatient data, so historical data is available only if the patient was also hospitalized within the last year. Still, retraining the embedding model provided a chance to adapt to such a different scenario.
RESULTS
Cohort Selection
For use-case 1, the internal cohort included all patients admitted to Emory University Hospital (EUH) between Jan-Dec 2020 with a positive RT-PCR test and for whom chest X-ray examinations (AP view) were acquired at regular intervals during their hospital stay. The external cohort was selected using similar criteria from four geographically disparate sites of Mayo Clinic (MC) for the year 2020. Approval from the institutional review boards of both EUH and MC was obtained, waiving the requirement of informed consent.
For use-case 2, the transfusion prediction model was trained on internal data collected from two sites (Rochester and Arizona) of MC. External validation was performed on two datasets: a) data collected from a geographically distant site of MC, i.e., Florida, and b) the open-source MIMIC-IV dataset. A relatively small number of patients required blood transfusion (approximately 0.5% of hospitalizations at MC in 2019). Such a small positivity rate drove us to curate a training dataset through propensity matching for the control group.
Demographic features and five major comorbidity groups (metabolic disorders, hypertensive disease, heart disease, acute kidney failure, adverse effects of drugs) were used as confounders to perform one-to-one propensity matching with cases (hospitalizations with blood transfusion) to select control groups (hospitalizations without blood transfusion). Salient characteristics of all cohorts are described in Table 1.
Table 1-.
Cohort characteristics - Race X: American Indian/Alaskan Native, Race Y: Native Hawaiian/Pacific Islander, Ethnicity Z: Hispanic or Latino
| | | COVID-19 Clinical Event Prediction Cohort | | | | Transfusion Prediction Cohort | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Split (Patients, Hospitalizations) | | Train (1578, 5741) | Validation (184, 527) | Test (439, 1545) | External test (1082, 3800) | Train (8752, 11290) | Validation (993, 1286) | Test (2472, 3151) | Internal holdout site (3041, 4042) | External test (68, 68) |
| Age | mean+/−std | 58.9 +/− 17.5 | 59.4 +/− 18.0 | 61.0 +/− 17.1 | 65.3 +/− 15.4 | 62.1 +/− 18.2 | 62.6 +/− 19.2 | 62.3 +/− 18.3 | 63.5 +/− 14.6 | 69.0 +/− 15.3 |
| Sex | Female | 785(49.7%) | 90(48.9%) | 215(49.0%) | 400(37.0%) | 3989(45.6%) | 438(44.1%) | 1141(46.2%) | 1459(48.0%) | 26(38.2%) |
| Male | 793(50.3%) | 94(51.1%) | 224(51.0%) | 682(63.0%) | 4763(54.4%) | 555(55.9%) | 1331(53.8%) | 1582(52.0%) | 42(61.8%) | |
| Race | White | 384(24.3%) | 43(23.4%) | 127(28.9%) | 892(82.4%) | 7688(87.8%) | 869(87.5%) | 2175(88.0%) | 2332(76.7%) | 48(70.6%) |
| Black | 1014(64.3%) | 121(65.8%) | 270(61.5%) | 86(7.9%) | 291(3.3%) | 33(3.3%) | 91(3.7%) | 455(15.0%) | 8(11.8%) | |
| Asian | 36(2.3%) | 6(3.3%) | 9(2.1%) | 46(4.3%) | 207(2.4%) | 20(2.0%) | 59(2.4%) | 101(3.3%) | 0(0%) | |
| Race X | 7(0.4%) | 0(0%) | 4(0.9%) | 14(1.3%) | 126(1.4%) | 13(1.3%) | 32(1.3%) | 10(0.3%) | 0(0%) | |
| Race Y | 4(0.3%) | 0(0%) | 2(0.5%) | 2(0.2%) | 23(0.3%) | 0(0%) | 9(0.4%) | 9(0.3%) | 0(0%) | |
| Unknown | 133(8.4%) | 14(7.6%) | 27(6.2%) | 42(3.9%) | 417(4.8%) | 58(5.8%) | 106(4.3%) | 134(4.4%) | 12(17.6%) | |
| Ethnicity | Not Z | 1362(86.3%) | 155(84.2%) | 377(85.9%) | 971(89.7%) | 8036(91.8%) | 896(90.2%) | 2276(92.1%) | 2767(91.0%) | 56(82.4%) |
| Z | 114(7.2%) | 19(10.3%) | 30(6.8%) | 99(9.1%) | 421(4.8%) | 62(6.2%) | 112(4.5%) | 167(5.5%) | 0(0%) | |
| Unknown | 102(6.5%) | 10(5.4%) | 32(7.3%) | 12(1.1%) | 295(3.4%) | 35(3.5%) | 84(3.4%) | 107(3.5%) | 12(17.6%) | |
| Comorbidities | Diabetes | 746(47.3%) | 84(45.7%) | 205(46.7%) | 409(37.8%) | 2533(28.9%) | 268(27.0%) | 713(28.8%) | 1050(34.5%) | 0(0.0%) |
| Hypertension | 1076(68.2%) | 119(64.7%) | 311(70.8%) | 711(65.7%) | 5508(62.9%) | 620(62.4%) | 1575(63.7%) | 2176(71.6%) | 48(70.6%) | |
| Heart Disease | 831(52.7%) | 83(45.1%) | 230(52.4%) | 814(75.2%) | 5705(65.2%) | 659(66.4%) | 1647(66.6%) | 1925(63.3%) | 0(0.0%) | |
| Kidney Disease | 266(16.9%) | 22(12.0%) | 79(18.0%) | 554(51.2%) | 3932(44.9%) | 449(45.2%) | 1164(47.1%) | 1526(50.2%) | 0(0.0%) | |
Quantitative performance
As both use-cases involve multiple data elements, and the proposed approach fuses all data elements through GCNN-based models, an intuitive comparative baseline is formed by single-modality models, each using only one of the data elements. We trained and evaluated such single-modality models for all cases. We also employed a traditional fusion modeling approach, late fusion, which gathers target label probability estimates from the single-modality models and processes them together through a meta-learner for the final target prediction. Tables 2 and 3 show the performance of the clinical event predictors for use-cases 1 and 2, respectively, on both the internal held-out test sets and the external sets.
Table 2 -.
Performance of clinical event prediction models for COVID-19 patients
| Model | Internal (EUH) | | | External (Mayo Clinic) | | |
|---|---|---|---|---|---|---|
| | Sensitivity | Specificity | AUROC | Sensitivity | Specificity | AUROC |
| Hospital Discharge Prediction | ||||||
| Non-imaging (EHR) | 72.9 [71.7-74.1] | 59.9 [58.2-61.5] | 71.5 [70.4-72.5] | 48.9 [48.2-49.7] | 64.1 [62.8-65.4] | 57.8 [56.9-58.7] |
| Images (X-rays) | 71.7 [70.5-72.9] | 65.9 [64.2-67.4] | 74.9 [73.8-76.0] | 59.6 [58.9-60.3] | 54.6 [53.4-56.0] | 60.0 [59.1-60.9] |
| Late fusion | 69.7 [68.4-70.8] | 65.5 [64.1-67.2] | 74.5 [73.4-75.6] | 51.2 [50.4-51.9] | 61.3 [60.0-62.7] | 58.1 [57.2-59.0] |
| GCNN-Demo | 70.2 [69.0-71.4] | 68.9 [67.3-70.5] | 76.0 [75.0-77.0] | 60.7 [60.0-61.4] | 62.2 [60.9-63.5] | 65.2 [64.3-66.1] |
| GCNN-CPT | 71.1 [69.9-72.3] | 69.6 [68.2-71.4] | 77.1 [76.1-78.2] | 64.8 [64.1-65.4] | 57.4 [55.9-58.8] | 64.6 [63.7-65.5] |
| GCNN-ICD | 69.5 [68.2-70.6] | 70.2 [68.7-71.8] | 76.2 [75.1-77.3] | 64.5 [63.7-65.2] | 66.3 [65.1-67.5] | 70.0 [68.7-70.4] |
| Mortality Prediction | ||||||
| Non-imaging (EHR) | 86.4 [84.2-89.4] | 83.8 [82.5-85.3] | 86.7 [84.9-88.5] | 79.3 [77.2-81.5] | 81.4 [80.3-82.6] | 86.7 [85.7-87.7] |
| Images (X-rays) | 86.4 [84.0-89.0] | 77.8 [76.4-79.5] | 88.1 [86.9-89.3] | 64.9 [62.3-67.5] | 35.6 [34.2-37.1] | 48.7 [46.9-50.4] |
| Late fusion | 85.6 [83.0-88.5] | 81.1 [79.8-82.8] | 88.6 [87.2-90.2] | 76.0 [74.0-78.3] | 82.2 [81.1-83.4] | 81.6 [80.3-82.9] |
| GCNN-Demo | 84.7 [82.3-87.6] | 81.1 [79.7-82.6] | 89.4 [88.4-90.7] | 78.1 [76.1-80.2] | 86.1 [85.1-87.2] | 88.8 [87.9-89.8] |
| GCNN-CPT | 74.6 [71.3-78.0] | 84.5 [83.0-85.9] | 86.6 [85.2-88.0] | 83.1 [81.3-84.8] | 86.1 [85.2-87.4] | 91.4 [90.6-92.3] |
| GCNN-ICD | 84.7 [82.3-87.9] | 82.5 [81.1-83.9] | 90.1 [89.0-91.3] | 81.4 [79.6-83.5] | 74.6 [73.2-76.0] | 85.3 [84.2-86.5] |
Table 3 -.
Performance of all models for prediction of transfusion for hospitalized patients; '--' marks entries where performance could not be computed due to missing data.
| Model | Internal (Mayo Clinic – Rochester and Arizona) | | | Held-out site (Mayo Clinic Florida) | | | External (MIMIC-IV) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Sensitivity | Specificity | AUROC | Sensitivity | Specificity | AUROC | Sensitivity | Specificity | AUROC |
| Demographics | 54.0 [52.0-56.0] | 51.5 [50.0-52.9] | 54.2 [52.8-55.6] | 53.8 [52.2-55.7] | 45.7 [44.3-47.0] | 50.5 [49.3-51.9] | 60.0 [50.0-75.0] | 66.7 [63.2-70.2] | 63.3 [54.2-76.3] |
| CPT | 54.8 [52.8-56.8] | 66.1 [64.8-67.4] | 61.4 [59.9-62.9] | 57.0 [55.3-58.7] | 61.6 [60.3-62.7] | 59.7 [58.4-60.9] | -- | -- | -- |
| ICD | 59.0 [56.9-60.9] | 63.0 [61.7-64.4] | 62.3 [60.9-63.7] | 62.0 [60.3-63.7] | 63.3 [62.2-64.6] | 63.7 [62.5-64.9] | 60.0 [50.0-75.0] | 71.4 [68.4-75.5] | 58.6 [52.2-66.5] |
| Lab Test | 59.8 [57.8-61.7] | 62.3 [60.9-63.6] | 64.5 [63.1-65.7] | 50.5 [48.6-52.2] | 59.0 [57.9-60.3] | 56.3 [54.9-57.4] | 60.0 [50.0-75.0] | 49.2 [45.3-52.8] | 45.1 [34.2-54.2] |
| Medications | 58.1 [56.2-59.9] | 62.9 [61.5-64.3] | 63.2 [61.7-64.6] | 57.0 [55.1-58.7] | 61.8 [60.6-63.0] | 61.1 [59.8-62.3] | 60.0 [50.0-75.0] | 69.8 [66.7-73.7] | 61.0 [50.0-76.3] |
| Reason for visit | 35.2 [33.4-37.2] | 79.6 [78.5-80.8] | 58.2 [57.0-59.7] | 31.1 [29.4-32.7] | 77.0 [76.1-78.1] | 55.5 [54.3-56.6] | -- | -- | -- |
| Vitals | 61.6 [59.7-63.4] | 58.0 [56.6-59.3] | 62.8 [61.4-64.1] | 60.9 [59.3-62.7] | 54.5 [53.2-55.7] | 59.4 [58.1-60.6] | 80.0 [75.0-100.0] | 31.7 [28.1-35.2] | 43.2 [32.0-49.2] |
| Late Fusion | 64.0 [62.1-66.0] | 64.2 [62.9-65.5] | 69.9 [68.6-71.2] | 68.0 [66.4-69.4] | 60.9 [59.8-62.2] | 68.3 [67.2-69.4] | 60.0 [50.0-75.0] | 54.0 [50.8-57.9] | 44.4 [31.3-56.0] |
| GNN | 73.8 [72.0-75.6] | 65.4 [64.1-66.7] | 77.4 [76.3-78.5] | 70.2 [68.6-71.7] | 70.0 [68.9-71.2] | 77.1 [76.1-78.1] | 80.0 [66.0-95.0] | 69.8 [66.7-73.6] | 70.8 [62.5-84.7] |
For use-case 1, beyond the obvious difference in coding patterns (Figure 3), the patient populations differ significantly between EUH and MC (the internal and external institutions, respectively): (1) 49% of EUH patients are female versus 37% at MC; (2) 61.5% of EUH patients are African American versus 7.9% at MC; (3) among comorbidities, kidney disease affects 16.9% of patients at EUH versus 51.2% at MC, and cardiovascular disease 52.4% at EUH versus 75.2% at MC (Table 1). For use-case 2, the external cohort was quite small (68 patients); however, similar variations in patient population were observed.
In this challenging scenario created by vast differences between the internal and external datasets, individual-modality classifiers and traditional fusion models struggle when presented with external data, while GCNN models fared better. For use-case 1, the late fusion model achieved 0.58 [0.52-0.59] AUROC on the external dataset for hospital discharge prediction while the GCNN achieved 0.70 [0.68-0.70]. For mortality prediction, the late fusion model achieved 0.81 [0.80-0.82] AUROC on the external dataset while the GCNN model achieved 0.91 [0.90-0.92]. For use-case 2, we observed larger gaps in performance due to missing/incomplete data in the external dataset: late fusion achieved 0.44 [0.31-0.56] while the GCNN model achieved 0.70 [0.62-0.84].
We hypothesize that the superior performance of the GCNN on external data is due to the adaptation of the edge formation function. To test this hypothesis, we applied the GCNN-based models to the external data without adaptation of the edge formation function for all our clinical event prediction tasks and compared the results against the GCNN-based models with edge formation function adaptation. The models suffered significant performance loss in the majority of cases when used without edge formation function adaptation (Table 4), confirming our hypothesis.
Table 4 -.
Performance of GCNN models on external cohorts with and without edge formation function adaptation
| Model | With adaptation | | | Without adaptation | | |
|---|---|---|---|---|---|---|
| | Sensitivity | Specificity | AUROC | Sensitivity | Specificity | AUROC |
| Hospital Discharge Prediction | ||||||
| GCNN-CPT | 64.8 [64.1-65.4] | 57.4 [55.9-58.8] | 64.6 [63.7-65.5] | 63.4 [62.7-64.1] | 58.0 [56.5-59.4] | 64.2 [63.4-65.1] |
| GCNN-ICD | 64.5 [63.7-65.2] | 66.3 [65.1-67.5] | 70.0 [68.7-70.4] | 63.3 [62.6-64.0] | 61.3 [59.9-62.6] | 65.8 [65.0-66.7] |
| Mortality prediction | ||||||
| GCNN-CPT | 83.1 [81.3-84.8] | 86.1 [85.2-87.4] | 91.4 [90.6-92.3] | 69.4 [67.2-71.7] | 74.4 [73.1-75.8] | 79.6 [78.3-80.9] |
| GCNN-ICD | 81.4 [79.6-83.5] | 74.6 [73.2-76.0] | 85.3 [84.2-86.5] | 73.6 [71.4-76.0] | 74.9 [73.4-76.2] | 82.3 [81.0-83.5] |
| Transfusion Prediction | ||||||
| GCNN | 80.0 [66.7-100.0] | 69.8 [65.4-74.6] | 70.8 [57.9-85.6] | 60.0 [33.3-75.0] | 74.6 [70.4-79.2] | 63.2 [49.4-76.8] |
DISCUSSION
As highlighted in the literature [7,8], challenges related to generalization often limit the application scope of ML/DL models that could otherwise leverage rich electronic health records for tasks such as clinical event prediction or patient phenotyping. We propose an adaptable GCNN framework for EHR modeling that generalizes across institutions even when the differences in patient populations and coding practices are significant. The GCNN framework allows the definition of patient/case similarity to be incorporated into the model through the definition of a graph, which not only mimics parts of clinical decision making but can also be leveraged to overcome the generalizability limitations of traditional ML/DL models. While the GCNN model learns directly from node features, it learns only implicitly from the information used for edge formation, through the incorporation of edge-connected nodes. Thus, our generalizable GCNN design represents the measured/recorded health data (e.g., images, lab values), which usually has minimal variability across sites, as node features, and leverages the adaptability of the edge formation function to represent the more variable EHR information (e.g., diagnosis and procedure codes).
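The separation between trained node-feature weights and an adaptable edge formation function can be sketched as below. The cosine similarity, thresholds, and feature dimensions are hypothetical illustrations, not the paper's actual architecture; the point is that the adjacency is rebuilt per institution while the learned weights stay fixed.

```python
import numpy as np

def build_adjacency(edge_feats, similarity, threshold):
    """Connect patients whose edge-feature similarity clears a threshold.
    Both `similarity` and `edge_feats` can be swapped per institution
    without touching the trained GCNN weights."""
    n = len(edge_feats)
    A = np.eye(n)  # self-loops keep each node's own features in the mix
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(edge_feats[i], edge_feats[j]) >= threshold:
                A[i, j] = A[j, i] = 1.0
    return A / A.sum(axis=1, keepdims=True)  # row-normalize (mean aggregation)

def gcn_layer(A_norm, H, W):
    """One graph convolution: aggregate neighbor features, project, ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

cosine = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 2))       # weights, fixed after internal training
H = rng.normal(size=(5, 4))       # node features (e.g. image embeddings)
E = rng.random(size=(5, 3))       # edge features (e.g. encoded billing codes)

# Internal site: edges from cosine similarity at one threshold.
A_internal = build_adjacency(list(E), cosine, 0.8)

# External site: coding practice differs, so only edge formation is
# adapted (here, a stricter threshold); W is reused unchanged.
A_external = build_adjacency(list(E), cosine, 0.9)
out = gcn_layer(A_external, H, W)
```

Because edge information enters the model only through the adjacency matrix, replacing `build_adjacency`'s inputs at deployment time requires no retraining.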
We validated the proposed generalizable GCNN model design framework on two important clinical use-cases: 1) prediction of adverse clinical events for COVID-19 patients, and 2) prediction of blood transfusion in hospitalized patients. We trained the GCNN models using data from one institution (EUH/MC) and validated externally on the MC and publicly available MIMIC-IV datasets, respectively. We attempted to find comparable past studies to benchmark our models against. While a vast amount of research has focused on diagnosis of COVID-19 using chest X-rays [44-46] and prediction of mortality using clinical features, relatively limited research has been done on building comprehensive frameworks for predicting multiple clinical events for these patients. A study by Vaid et al. [47] focused on predicting mortality for COVID-19 patients using ICD-9 features selected by clinical experts. Their approach was similar to the non-imaging (EHR) baseline of our clinical use-case 1. Their reported performance (AUROC 0.84-0.88 across validation settings) was also similar to our baseline (AUROC 0.87) and worse than the proposed framework (AUROC 0.90). No external validation was available for their work. Similarly incomplete comparisons could be made with [48,49]. The clinical features used in these papers often required careful curation, and no code implementation was made available. For clinical use-case 2, blood transfusion risk prediction, our literature survey showed that previous studies focused on risk estimation for very narrow subpopulations of hospitalized patients, such as surgical patients with hip or shoulder replacement [50,51] or postpartum patients only [52,53]. Given their limited cohorts of interest, these models tended to incorporate only a small set of manually selected clinical features that were suitable and informative of transfusion risk for their cohort of interest only.
Since the proposed model in clinical use-case 2 was designed for the full in-patient population, these studies did not provide a direct performance comparison. Considering these limitations, we relied on developing several baseline models for both clinical use-cases using traditional ML models and modern DL models. In our experiments, even though performance on the internal datasets was close to that of the baseline ML/DL models, the GCNN models consistently outperformed the traditional models on the external datasets. We hypothesized that this performance trend is due to the ability of graph-based models to adapt their edge formation functions without requiring any re-training or fine-tuning of the model itself.
The core contribution of our work is the formalization of a design framework exploiting the two-fold learning capacity of GCNNs to induce generalization and adaptability in the face of significant practice pattern-based variation in clinical data elements. The proposed framework can cover model designs with multiple input modalities (imaging and non-imaging data) as well as complex temporal model-based graph structure formation. Our proposed design has several important advantages. First, a model trained on an internal dataset does not need fine-tuning or retraining on external data, even when the EHR data structure and coding frequency differ significantly between institutions. Second, the graph design implicitly models similarity between patients and thus mimics clinical decision making. Third, the adaptable edge formation technique allows exploration of institution-specific variables to define patient/case similarity, providing a flexible design choice. Fourth, the graph design allows integration of multimodal data (images + EHR).
Furthermore, though our design framework was not specifically focused on solutions to missing data, it can handle partially missing data, such as unavailable codes or demographic categories; ideally the effect is minimal compared to traditional models, since such features are only used to define the edges. For example, a missing code may have no significant effect on the model if related codes from the same subgroup are available. This applies to billing codes as well as medications. Vital signs, however, are represented as node features with trend-of-change values computed from a sequence of recorded values within the first 48 hours of hospitalization (use-case 2). In this case, a missing record of a vital sign (temperature, blood pressure, etc.) may be insignificant if enough records are available to compute a meaningful trend of change. In contrast, in use-case 1, a patient without any chest X-ray cannot be made part of the graph and hence cannot be processed by the model.
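The claimed tolerance to a single missing code can be made concrete with a set-overlap similarity such as the Jaccard index; this is an illustrative choice with hypothetical ICD-10 codes, not necessarily the edge formation function the models actually use.

```python
def jaccard(a, b):
    """Set similarity over billing codes; a missing code degrades the
    score gracefully instead of invalidating a fixed feature column."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical ICD-10 code sets for two similar patients.
full    = {"E11.9", "I10", "N18.3", "I25.10"}   # complete record
partial = {"E11.9", "I10", "N18.3"}             # one code missing externally
other   = {"E11.9", "I10", "N18.3", "I25.10", "Z79.4"}

sim_full    = jaccard(full, other)      # 4/5 = 0.80
sim_partial = jaccard(partial, other)   # 3/5 = 0.60
# With an edge threshold of, say, 0.5 the edge survives either way,
# so the prediction path through the graph is unchanged.
```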
Limitations:
This study has several limitations, including those associated with the retrospective design of both use-cases. In addition, in the MIMIC dataset the timestamps associated with CPT codes and the reason for admission were missing, so we were not able to evaluate model performance using those data elements. Given the low prevalence, we used propensity score matching to select the control cases, which provided only a selective sample for validation. A further limitation may arise from missing data, such as missing data modalities for some patients. While feature imputation techniques may be applicable to tabular clinical data elements, they are not applicable to unstructured data types like imaging data.
Implementation of graph-based models in clinical workflows may face additional challenges imposed by the nature of graph convolutional modeling. First, unstructured modalities like images and text may still need pre-trained feature extraction models to generate the vector embeddings used as node or edge features; the final GCNN performance therefore depends on the selection of feature extraction models. Second, GCNN architectures may suffer from a "homogenization" effect if too many graph convolution layers are employed. Chest X-ray feature vectors in use-case 1 already share significant similarity with each other, as they are all extracted from images with a dark background and a human-shaped object in the foreground. During GCNN processing, "messages" passed between nodes (essentially the chest X-ray feature vectors) are aggregated for each node with its directly connected neighbors. Additional convolution layers combine already aggregated "messages", producing high similarity between processed node feature vectors, known as the "homogenization" effect, and a loss of discriminatory information between nodes. This effect may not be as pronounced for tabular features like billing codes. However, tabular features like demographics may vary moderately to vastly across cohorts, making it difficult to identify the optimal number of graph convolutional layers with generalization to the external cohort in mind. Third, as with any deep learning-based model, computational resources may limit the integration of GCNN models into clinical workflows, particularly for large-scale inference tasks, since the full graph needs to be represented in memory.
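The homogenization (over-smoothing) effect can be illustrated with plain neighbor averaging on a toy connected graph: each additional aggregation round shrinks the spread of the node features toward a common vector. The graph and feature dimensions here are arbitrary illustrations.

```python
import numpy as np

def mean_aggregate(A, H):
    """One parameter-free message-passing round: average each node's
    features with those of its direct neighbors (self-loops included)."""
    A_hat = A + np.eye(len(A))
    return (A_hat @ H) / A_hat.sum(axis=1, keepdims=True)

def spread(H):
    """How distinguishable the node features still are from one another."""
    return float(np.linalg.norm(H - H.mean(axis=0)))

rng = np.random.default_rng(0)
n = 8
A = np.zeros((n, n))
for i in range(n):                      # a ring keeps the toy graph connected
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A[0, 4] = A[4, 0] = 1.0                 # plus one chord

H = rng.normal(size=(n, 16))            # e.g. chest X-ray embeddings
spreads = [spread(H)]
for _ in range(10):                     # 10 stacked aggregation rounds
    H = mean_aggregate(A, H)
    spreads.append(spread(H))
# spreads shrinks toward 0: nodes become indistinguishable ("homogenization")
```

Real GCN layers interleave learned projections with this averaging, but the collapse of node-feature spread with depth is the same mechanism.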
Finally, graph-based models are inherently more difficult to explain, as explanations must cover both node feature importance and the important neighborhoods and their characteristics. Frameworks have been proposed in which subgraphs are randomly sampled to identify neighborhoods of importance for the label predicted for each node [38]. Tariq et al. developed a visualization dashboard for a graph-based model estimating the risk of hospital-acquired infection, demonstrating how neighborhood sampling can enhance model interpretability [32,33]. However, the challenge lies in identifying the common clinical characteristics of the neighborhood of interest. When graph formation is simplistic, e.g., edges decided by patients sharing an age group, it is straightforward to identify the common characteristics of the neighborhood of interest. For complex graph formation, such as the similarity between temporally embedded feature vectors used in our use-case 2, common characteristics of the neighborhood of interest are hard to define. Input from clinical experts may be needed to fully explain the model's predictions after automated identification of neighborhoods of interest in graph-based models.
CONCLUSION
We proposed a novel solution to the challenges faced by machine learning and deep learning-based models that rely on electronic health records for predictive modeling of patient populations. Such models generally suffer from poor generalization due to differences in medical practice patterns and patient population characteristics when applied outside of their institution of training. Our systematic design of graph-based convolutional neural networks implicitly learns from such varying data elements of electronic health records through their use in the edge formation process. The edge formation function can be adapted to suit a new population when models are tested externally, without requiring any retraining of the originally trained model. We demonstrated the benefits of our approach through its application to two clinically relevant problems, each tested on two diverse populations. We included a wide variety of electronic health record data elements as well as imaging information, indicating that our approach can handle complex multimodal data while producing highly adaptable models. In the future, we will study the utility of the proposed models in a clinical application setting through prospective evaluation, integrating them with institutional electronic healthcare and imaging databases.
Supplementary Material
STATEMENT OF SIGNIFICANCE.
| Problem: Deep learning (DL) models for EHR data experience performance degradation when applied outside of their training institution because of differences in patient populations and practice patterns, and inconsistent and incomplete data collected from different sources. |
| What is already known: Data imputation and manual curation of important clinical features are often used to improve the generalizability of models. Imputation is inapplicable to modalities like radiology images, and the manual curation process may be expensive and prone to errors. |
| What this paper adds: We propose a systematic solution by using the two-fold (direct and indirect) learning capabilities of graph convolutional neural network-based modeling. |
COMPETING INTERESTS STATEMENT
Authors have no conflicts of interest to report.
AUTHORS CREDIT STATEMENT
A.T. contributed to study design, model development and manuscript writing.
G.K. contributed to data preprocessing and manuscript writing.
L.S. contributed to study design, cohort selection and manuscript writing.
J.G. contributed to study design, external validation and manuscript writing.
B.P. contributed to study design, cohort selection, external validation and manuscript writing.
I.B. contributed to study design, evaluation experiment design, and manuscript writing.
DATA AVAILABILITY STATEMENT
The MIMIC dataset is publicly available. Third-party datasets can be accessed through reasonable requests made to the third parties.
REFERENCES
- [1]. Cao M, Yang M, Qin C, Zhu X, Chen Y, Wang J and Liu T, "Using DeepGCN to identify the autism spectrum disorder from multi-site resting-state data," Biomedical Signal Processing and Control, vol. 70, p. 103015, 2021.
- [2]. Payrovnaziri SN, Chen Z, Rengifo-Moreno P, Miller T, Bian J, Chen JH, Liu X and He Z, "Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review," Journal of the American Medical Informatics Association, vol. 27, p. 1173–1185, 2020.
- [3]. Futoma J, Simons M, Panch T, Doshi-Velez F and Celi LA, "The myth of generalisability in clinical research and machine learning in health care," The Lancet Digital Health, vol. 2, p. e489–e492, 2020.
- [4]. Duan H, Sun Z, Dong W, He K and Huang Z, "On clinical event prediction in patient treatment trajectory using longitudinal electronic health records," IEEE Journal of Biomedical and Health Informatics, vol. 24, p. 2053–2063, 2019.
- [5]. Dong X, Rashidian S, Wang Y, Hajagos J, Zhao X, Rosenthal RN, Kong J, Saltz M, Saltz J and Wang F, "Machine learning based opioid overdose prediction using electronic health records," in AMIA Annual Symposium Proceedings, 2019.
- [6]. Cohen ME, Liu Y, Liu JB, Ko CY and Hall BL, "Use of a single CPT code for risk adjustment in American College of Surgeons NSQIP Database: is there potential bias with practice-pattern differences in multiple procedures under the same anesthetic?," Journal of the American College of Surgeons, vol. 226, p. 309–316, 2018.
- [7]. Fraccaro P, Van Der Veer S, Brown B, Prosperi M, O'Donoghue D, Collins GS, Buchan I and Peek N, "An external validation of models to predict the onset of chronic kidney disease using population-based electronic health records from Salford, UK," BMC Medicine, vol. 14, p. 1–15, 2016.
- [8]. Agniel D, Kohane IS and Weber GM, "Biases in electronic health record data due to processes within the healthcare system: retrospective observational study," BMJ, vol. 361, 2018.
- [9]. Gupta RK, Marks M, Samuels THA, Luintel A, Rampling T, Chowdhury H, Quartagno M, Nair A, Lipman M, Abubakar I and others, "Systematic evaluation and external validation of 22 prognostic models among hospitalised adults with COVID-19: an observational cohort study," European Respiratory Journal, vol. 56, 2020.
- [10]. Wong A, Otles E, Donnelly JP, Krumm A, McCullough J, DeTroyer-Cooley O, Pestrue J, Phillips M, Konye J, Penoza C and others, "External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients," JAMA Internal Medicine, vol. 181, p. 1065–1070, 2021.
- [11]. Nestor B, McDermott MBA, Boag W, Berner G, Naumann T, Hughes MC, Goldenberg A and Ghassemi M, "Feature robustness in non-stationary health records: caveats to deployable model performance in common clinical machine learning tasks," in Machine Learning for Healthcare Conference, 2019.
- [12]. Zhang P-I, Hsu C-C, Kao Y, Chen C-J, Kuo Y-W, Hsu S-L, Liu T-L, Lin H-J, Wang J-J, Liu C-F and others, "Real-time AI prediction for major adverse cardiac events in emergency department patients with chest pain," Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, vol. 28, p. 1–7, 2020.
- [13]. Damen JA, Pajouheshnia R, Heus P, Moons KGM, Reitsma JB, Scholten RJPM, Hooft L and Debray T, "Performance of the Framingham risk models and pooled cohort equations for predicting 10-year risk of cardiovascular disease: a systematic review and meta-analysis," BMC Medicine, vol. 17, p. 1–16, 2019.
- [14]. Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF and Van der Schaar M, "Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants," PLoS ONE, vol. 14, p. e0213653, 2019.
- [15]. Mandair D, Tiwari P, Simon S, Colborn KL and Rosenberg MA, "Prediction of incident myocardial infarction using machine learning applied to harmonized electronic health record data," BMC Medical Informatics and Decision Making, vol. 20, p. 1–10, 2020.
- [16]. Franz L, Shrestha YR and Paudel B, "A deep learning pipeline for patient diagnosis prediction using electronic health records," arXiv preprint arXiv:2006.16926, 2020.
- [17]. Kazi A, Shekarforoush S, Arvind Krishna S, Burwinkel H, Vivar G, Kortüm K, Ahmadi S-A, Albarqouni S and Navab N, "InceptionGCN: receptive field aware graph convolutional network for disease prediction," in International Conference on Information Processing in Medical Imaging, 2019.
- [18]. Tariq A, Tang S, Sakhi H, Celi LAG, Newsome J, Rubin D, Trivedi H, Gicchoya JW, Patel B and Banerjee I, "Graph-based Fusion Modeling and Explanation for Disease Trajectory Prediction," medRxiv, 2022.
- [19]. Parisot S, Ktena SI, Ferrante E, Lee M, Guerrero R, Glocker B and Rueckert D, "Disease prediction using graph convolutional networks: application to autism spectrum disorder and Alzheimer's disease," Medical Image Analysis, vol. 48, p. 117–130, 2018.
- [20]. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
- [21]. Valenchon J and Coates M, "Multiple-graph recurrent graph convolutional neural network architectures for predicting disease outcomes," in ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
- [22]. Xiao C, Choi E and Sun J, "Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review," Journal of the American Medical Informatics Association, vol. 25, p. 1419–1428, 2018.
- [23]. Goldstein BA, Navar AM, Pencina MJ and Ioannidis JP, "Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review," Journal of the American Medical Informatics Association, vol. 24, p. 198, 2017.
- [24]. Payrovnaziri SN, Chen Z, Rengifo-Moreno P, Miller T, Bian J, Chen JH, Liu X and He Z, "Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review," Journal of the American Medical Informatics Association, vol. 27, p. 1173–1185, 2020.
- [25]. Kansagara D, et al., "Risk prediction models for hospital readmission: a systematic review," JAMA, vol. 306, p. 1688–1698, 2011.
- [26]. Pathak J, Kho AN and Denny JC, "Electronic health records-driven phenotyping: challenges, recent advances, and perspectives," Journal of the American Medical Informatics Association, vol. 20, p. e206–e211, 2013.
- [27]. Rahmani K, Thapa R, Tsou P, Chetty SC, Barnes G, Lam C and Tso CF, "Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction," International Journal of Medical Informatics, vol. 173, p. 104930, 2023.
- [28]. Lacson R, Eskian M, Licaros A, Kapoor N and Khorasani R, "Machine learning model drift: predicting diagnostic imaging follow-up as a case example," Journal of the American College of Radiology, vol. 19, p. 1162–1169, 2022.
- [29]. Song TA, Chowdhury SR, Yang F, Jacobs H, El Fakhri G, Li Q, Johnson K and Dutta J, "Graph convolutional neural networks for Alzheimer's disease classification," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), p. 414–417, 2019.
- [30]. Choi E, Xu Z, Li Y, Dusenberry M, Flores G, Xue E and Dai A, "Learning the graphical structure of electronic health records with graph convolutional transformer," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, p. 606–613, 2020.
- [31]. Kumar A, Tripathi AR, Satapathy SC and Zhang YD, "SARS-Net: COVID-19 detection from chest x-rays by combining graph convolutional network and convolutional neural network," Pattern Recognition, vol. 122, p. 108255, 2022.
- [32]. Tang S, Tariq A, Dunnmon JA, Sharma U, Elugunti P, Rubin DL, Patel BN and Banerjee I, "Predicting 30-day all-cause hospital readmission using multimodal spatiotemporal graph neural networks," IEEE Journal of Biomedical and Health Informatics, vol. 27, p. 2071–2082, 2023.
- [33]. Tariq A, Lancaster L, Elugunti P, Siebeneck E, Noe K, Borah B, Moriarty J, Banerjee I and Patel BN, "Graph convolutional network-based fusion model to predict risk of hospital acquired infections," Journal of the American Medical Informatics Association, vol. 30, p. 1056–1067, 2023.
- [34]. Tariq A, Su L, Patel B and Banerjee I, "Prediction of Transfusion among In-patient Population using Temporal Pattern based Clinical Similarity Graphs," in AMIA Annual Symposium Proceedings, vol. 2023, p. 679, 2023.
- [35]. Mao C, Yao L and Luo Y, "MedGCN: Medication recommendation and lab test imputation via graph convolutional networks," Journal of Biomedical Informatics, vol. 127, p. 104000, 2022.
- [36]. Tang C, Hu C, Sun J, Wang SH and Zhang YD, "NSCGCN: A novel deep GCN model to diagnosis COVID-19," Computers in Biology and Medicine, vol. 150, p. 106151, 2022.
- [37]. Sarkar S, Alhamadani A and Lu CT, "Explainable Prediction of the Severity of COVID-19 Outbreak for US Counties," in 2022 IEEE International Conference on Big Data (Big Data), p. 5338–5345, 2022.
- [38]. Ying Z, Bourgeois D, You J, Zitnik M and Leskovec J, "GNNExplainer: Generating explanations for graph neural networks," Advances in Neural Information Processing Systems, vol. 32, 2019.
- [39]. Dexter GP, Grannis SJ, Dixon BE and Kasthurirathne SN, "Generalization of machine learning approaches to identify notifiable conditions from a statewide health information exchange," AMIA Summits on Translational Science Proceedings, p. 152, 2020.
- [40]. Yang J, Soltan AAS and Clifton DA, "Machine learning generalizability across healthcare settings: insights from multi-site COVID-19 screening," npj Digital Medicine, vol. 5, p. 69, 2022.
- [41]. Rasmy L, Wu Y, Wang N, Geng X, Zheng WJ, Wang F, Wu H, Xu H and Zhi D, "A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set," Journal of Biomedical Informatics, vol. 84, p. 11–16, 2018.
- [42]. Wang C, Li Y, Tsuboshita Y, et al., "A high-generalizability machine learning framework for predicting the progression of Alzheimer's disease using limited data," npj Digital Medicine, vol. 5, p. 43, 2022.
- [43]. Hamilton WL, Ying R and Leskovec J, "Inductive representation learning on large graphs," in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.
- [44]. Wang L, Lin ZQ and Wong A, "COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images," Scientific Reports, vol. 10, p. 1–12, 2020.
- [45]. Afshar P, et al., "COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from x-ray images," Pattern Recognition Letters, vol. 138, p. 638–643, 2020.
- [46]. Khan AI, Shah JL and Bhat MM, "CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest x-ray images," Computer Methods and Programs in Biomedicine, vol. 196, p. 105581, 2020.
- [47]. Vaid A, et al., "Machine Learning to Predict Mortality and Critical Events in COVID-19 Positive New York City Patients: A Cohort Study," Journal of Medical Internet Research, 2020.
- [48]. Vaid A, et al., "Federated Learning of Electronic Health Records Improves Mortality Prediction in Patients Hospitalized with COVID-19," medRxiv, 2020.
- [49]. Shashikumar SP, et al., "Development and Prospective Validation of a Transparent Deep Learning Algorithm for Predicting Need for Mechanical Ventilation," medRxiv, 2020.
- [50]. Burns KA, Robbins LM, LeMarr AR, Childress AL, Morton DJ and Wilson ML, "Estimated blood loss and anemia predict transfusion after total shoulder arthroplasty: a retrospective cohort study," JSES Open Access, vol. 3, p. 311–315, 2019.
- [51]. Oglak SC, Obut M, Tahaoglu AE, Demirel NU, Kahveci B and Bagli I, "A prospective cohort study of shock index as a reliable marker to predict the patient's need for blood transfusion due to postpartum hemorrhage," Pakistan Journal of Medical Sciences, vol. 37, p. 863, 2021.
- [52].Ahmadzia HK, Phillips JM, James AH, Rice MM and Amdur RL, "Predicting peripartum blood transfusion in women undergoing cesarean delivery: A risk prediction model," PLoS One, vol. 13, p. e0208417, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [53].Burns KA, Robbins LM, LeMarr AR, Childress AL, Morton DJ and Wilson ML, "Estimated blood loss and anemia predict transfusion after total shoulder arthroplasty: a retrospective cohort study," JSES open access, vol. 3, p. 311–315, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
The MIMIC dataset is publicly available. Third-party datasets can be accessed through reasonable requests made to the respective parties.
