Abstract
Diabetic retinopathy (DR), a microvascular complication of diabetes, is the leading cause of vision loss among working-age adults. However, due to the low compliance rate of DR screening and the expensive medical devices required for ophthalmic exams, many DR patients do not seek proper medical attention until DR has progressed to an irreversible stage (i.e., vision loss). Fortunately, widely available electronic health record (EHR) databases provide an unprecedented opportunity to develop cost-effective machine learning tools for DR detection. This paper proposes a Multi-Branching Temporal Convolutional Network with Tensor Data Completion (MB-TCN-TC) model to analyze the longitudinal EHRs of diabetic patients for DR prediction. Experimental results demonstrate that the proposed MB-TCN-TC model not only effectively copes with the imbalanced data and missing value issues commonly seen in EHR datasets but also captures the temporal correlation and complicated interactions among medical variables in the longitudinal clinical records, yielding superior prediction performance compared to existing methods. Specifically, our MB-TCN-TC model provides AUROC and AUPRC scores of 0.949 and 0.793 respectively, achieving an improvement of 6.27% on AUROC, 11.85% on AUPRC, and 19.3% on F1 score compared with the traditional TCN model.
Index Terms: Diabetic retinopathy, Electronic health records, Tensor data completion, Temporal convolutional network, Multi-branching outputs, Missing value, Imbalanced data
I. Introduction
Diabetic retinopathy (DR), caused by damage to the blood vessels in the tissue of the retina, is a microvascular complication of diabetes. It is a leading cause of vision loss among working-age people around the world [1]. According to the American Diabetes Association (ADA), almost all type 1 patients and more than 60% of type 2 patients have varying degrees of retinopathy after 20 years of diabetes [2]. Vision loss can be prevented if DR is diagnosed at an early stage. Unfortunately, many diabetic patients are unaware of their potential risk of DR because DR is asymptomatic at the early stage. Thus, they miss the most effective time window to halt DR progression, after which vision loss or blindness becomes inevitable [3]. Additionally, although the ADA recommends ophthalmic examinations in its annual DR screening guidelines, the compliance rates are below expectation, which results in 19% of DR patients being undiagnosed [1], [3]. The shortage of expensive medical devices and experts for ophthalmic exams also limits the availability of DR screening, especially in rural areas. As such, reliable and cost-effective DR detection tools are urgently needed to facilitate DR screening.
With the rapid development of medical information technology, we now live in an era of data explosion where a large amount of data is readily available and accessible in the clinical environment [4], [5], [6], [7]. This data-rich environment provides an unprecedented opportunity for developing automated tools for disease diagnosis. For example, many data-driven methods have been developed to support clinical decision-making by leveraging electronic health record (EHR) data [8], [9], [10], which contain rich health information including patient demographics, vital signs, and lab tests. However, EHRs generally have complex structures and a high level of heterogeneity. Fully utilizing EHRs for reliable DR diagnosis poses the following challenges.
(1). Poor data quality.
Real-world EHRs suffer from the issues of imbalanced data and missing values. Disease detection tasks are often formulated as binary classification problems, where each training sample belongs to one of two classes (e.g., positive vs. negative). Class imbalance, i.e., a skewed distribution between the two classes, results from the fact that the number of negative patients (without a specific disease) is generally much larger than that of positive patients. With imbalanced training data, the majority class will dominate the classifier, yielding unsatisfactory performance in predicting the minority class. Missing values occur due to the heterogeneity in clinical needs and lack of documentation [11], [12]. For example, patients often do not take all types of lab tests at every clinic visit. A patient may take a Complete Blood Count test to measure the counts of red and white blood cells at one visit, while he/she may take a Comprehensive Metabolic Panel test to measure the levels of Alanine Aminotransferase (ALT) and Blood Urea Nitrogen at the next visit. Such data quality issues will bias decision-making if left unaddressed, constraining the clinical value of EHRs.
(2). Multivariate longitudinal records.
EHR datasets are commonly associated with multivariate longitudinal records (or sequences), especially for patients with chronic diseases such as DR. The medical variables usually not only have intra-sequence correlation (i.e., autocorrelation within one individual sequence) but also have inter-sequence correlation (i.e., the correlation between different sequences). For example, the observation of ALT of one DR patient at a given time stamp is correlated with observations at nearby time steps. On the other hand, the ALT could also be correlated with the observations of Aspartate Aminotransferase (AST) at the same or different time steps. Note that EHRs present a wealth of information on the dynamic evolution of medical variables for large populations. This leads to a new three-way tensor form of data with patients × variables × sequences, as opposed to the table form of predictor and response variables in traditional predictive modeling. Novel machine learning approaches are urgently needed to effectively handle such a complex data structure for disease prediction.
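For concreteness, the patients × variables × sequences structure can be sketched in a few lines of numpy; the shapes and missing entries below are invented purely for illustration.

```python
import numpy as np

# Hypothetical toy EHR tensor: 4 patients x 3 lab variables x 5 time stamps,
# with NaN marking lab values that were never measured.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3, 5))
X[0, 1, 2] = np.nan          # patient 0 skipped variable 1 at time 2
X[2, 0, 4] = np.nan          # patient 2 has no observation at the last time stamp

# Indicator tensor of observed entries: 0 where missing, 1 otherwise.
W = (~np.isnan(X)).astype(float)

print(X.shape)               # (4, 3, 5): patients x variables x time
print(int(W.sum()))          # 58 observed out of 60 entries
```

The third axis is what distinguishes this tensor form from the flat patient-by-feature table used in traditional predictive modeling.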
In this paper, we propose a Multi-Branching Temporal Convolutional Network with Tensor Data Completion (MB-TCN-TC) to model multivariate longitudinal EHRs for reliable DR prediction by addressing both the imbalanced data and missing value issues. We first utilize the CANDECOMP/PARAFAC (CP) decomposition to complete the tensor data for missing value imputation by simultaneously accounting for the multidimensional correlations between lab test variables, patients, and time sequences. Second, to further account for the imputation uncertainty, we propose to add the missing value masks as another input stream to our predictive model to address the possible discrepancy between the imputed values and true observations. Third, we adapt the multi-branching TCN framework [8] to handle the imbalanced data and model both the intra- and inter-sequence correlations in the longitudinal EHRs for DR prediction. We evaluate the proposed MB-TCN-TC on a large real-world dataset. Experimental results demonstrate that our MB-TCN-TC model achieves better performances for DR detection compared with existing methods.
II. Research Background and Literature Review
A. DR screening methods in current clinical practice
The most common screening method for DR is a dilated comprehensive eye examination (DCEE) performed by an ophthalmologist or optometrist [2]. According to the guidelines of diabetes care by ADA [13], adults with diabetes should have an initial DCEE within 5 years after the onset of type 1 diabetes or at the diagnosis time of type 2 diabetes, and have subsequent exams every 1–2 years if there is no evidence of DR, or at least annually if any level of DR is present at the initial exam. During the exam, doctors apply eye drops to enlarge patients’ pupils so that they can examine patients’ retinas with devices such as a slit lamp. Within 4–6 hours after dilation, the patient’s vision may be blurry and the patient is advised not to drive. This causes a significant inconvenience, especially for patients living in rural areas that are a few hours away from the ophthalmologist or optometrist’s clinic.
An alternative screening method for DR is fundus photography. Trained clinicians take fundus pictures of diabetic patients. The digital color fundus photographs of the retina are then evaluated by doctors or certified technicians either at point-of-care or remotely. This diagnosis process requires experts to manually inspect the fundus pictures and evaluate the presence of lesions associated with the vascular abnormalities, which is time-consuming and depends heavily on the clinicians’ domain knowledge.
B. Machine Learning for Healthcare Data
The fast-growing healthcare data presents remarkable prospects for data-driven scientific knowledge discovery and clinical decision support. The utilization of machine learning in healthcare data analysis has witnessed exponential growth in recent years. Numerous studies have demonstrated the potential transformative impact of machine learning algorithms in deciphering complicated healthcare data including medical images and EHRs to improve the precision of diagnostic and treatment strategies [4], [14].
Medical imaging such as Magnetic Resonance Imaging, Computed Tomography, and ophthalmoscopy enables nonintrusive assessment of the inner structures of the body for disease diagnosis. A substantial body of research has been conducted to develop effective machine learning methods to analyze imaging data [15], [16], [17], [18], [19], [20]. In particular, Convolutional Neural Networks (CNNs) have gained great interest in the field of medical image analysis, largely attributable to their unique ability to discern intricate structural relationships among neighboring pixels in images [21]. For example, Sanghvi et al. [22] developed a CNN model for the detection of COVID and pneumonia using chest X-ray images. Calisto et al. [23] and Dontchos et al. [24] demonstrated the promising potential of CNN models for breast cancer diagnosis using breast images.
The EHR is an electronic version of a patient’s medical information over time including demographics, medications, vital signs, laboratory data, etc. A variety of statistical and machine learning models including logistic regression, support vector machine (SVM), random forest, and deep neural networks (DNNs) have been developed to investigate EHRs, with applications in data-driven prediction of numerous diseases such as sepsis [8], diabetes [25], and myocardial infarction [26]. Comprehensive reviews on machine learning for healthcare data can be found in [10], [27].
C. Existing Work on Data-Driven DR Prediction
In the past decade, a variety of machine learning models have been developed to analyze fundus images for automated DR detection. For example, Sopharak et al. [33] proposed a machine learning model for exudate detection in retinal images of diabetic patients. Specifically, they first leveraged ophthalmologists’ expertise to perform feature extraction and selection. Then, an SVM classifier was built based on the selected features for DR detection. Roychowdhury et al. [34] evaluated AdaBoost, k-nearest neighbor (KNN), SVM, and Gaussian Mixture Model classifiers for classifying retinopathy lesions from fundus images. Reza et al. [35] proposed a machine learning algorithm based on marker-controlled watershed segmentation to detect the optic disc, exudates, and cotton wool spots in fundus images and further identified DR. In general, most traditional machine learning methods still depend heavily on manual feature extraction and selection, which is not only a labor-intensive trial-and-error process but also depends on human expertise [33], [36], [37].
Deep learning models have been proposed to analyze retinal images. The primary advantage of DNNs over traditional machine learning methods is their ability to learn complex representations of the input data without the need for explicit feature engineering [38]. DNN-based features have been proven to be more informative than hand-crafted features for data-driven disease detection [39], [40], [41], [42]. As a result, DNN models, especially CNNs, have been developed to analyze retinal images for DR prediction with better performance than conventional machine learning methods. For example, Ting et al. [43] proposed a deep learning model for screening DR and other eye diseases from retinal images. Wang et al. [44] developed a multi-channel generative adversarial network to investigate retinal images for DR diagnosis. Comprehensive reviews of data-driven DR diagnosis based on retinal images can be found in [36], [37].
Despite the impressive performance of DNNs in fundus photography recognition, expensive imaging devices (e.g., digital fundus cameras) and ophthalmic imaging skills are still needed, limiting the use of this approach to well-funded healthcare providers with trained technicians. More cost-effective screening tools based on EHRs, which are widely available and accessible to all healthcare settings, have been developed, such as Cox’s proportional hazard model [28], decision tree-based methods [29], ensemble models [30], and extreme gradient boosting (XGBoost) [31]. However, many of these methods, e.g., decision tree-based models and gradient boosting-based models, lack the ability to model the dynamic disease trajectory of DR. Our prior work [3] has shown that the longitudinal information of the disease dynamics is conducive to data-driven disease prediction. Note that Cox proportional hazard models generally assume a linear relationship between the medical variables and the proportional hazard [45]. This linearity assumption is often not valid for real-world EHR data, which generally describe nonlinear and nonstationary disease dynamics.
Table I summarizes the limitations in existing machine learning methods for DR prediction using EHRs. Specifically, data-driven DR prediction based on EHR data suffers from the problem of missing values and extremely imbalanced data as we discussed in Section I. In the literature, there are various studies on the imputation of missing medical data, which generally estimate the missing values by investigating the data structure and exploiting the correlation patterns. Common imputation techniques include case deletion, mean imputation, interpolation and extrapolation, and advanced machine-learning-based imputation [46], [47]. However, most of those methods focus on addressing the missing value issue in matrix-form data but are not directly applicable to capture the multi-dimensional correlation among patients, variables, and sequences in longitudinal EHR, and thus are not well-suited for tensor-form data imputation.
TABLE I:
Existing machine learning methods for DR prediction using EHRs and their limitations
| # | Authors | Methodology | Limitations |
|---|---|---|---|
| 1 | Tsao et al. (2011) [28] | Cox Proportional Hazard Model | |
| 2 | Ogunyemi et al. (2015) [29] | Ensemble Learning | |
| 3 | Piri et al. (2017) [30] | Logistic Regression; Random Forest; Neural Network | |
| 4 | Wang et al. (2021) [31] | Random Forest | |
| 5 | Chen et al. (2022) [3] | Temporal Neural Network | Not able to exploit the multi-dimensional correlation for missing value imputation. |
| 6 | Wang et al. (2023) [32] | Bayesian Finite Mixture Model | |
Similarly, various methods have been proposed to handle the imbalanced data issue, such as random under/over-sampling, informed under-sampling, and synthetic minority over-sampling (SMOTE) [48], [49], [50], [51]. However, under/over-sampling methods suffer from substantial information loss or over-fitting problems. Informed under-sampling (e.g., KNN-based sampling) and SMOTE are generally not applicable to longitudinal data analysis because their similarity calculations are based on the Euclidean distance, which may not capture the true closeness between longitudinal sequences [48], [52]. Therefore, novel analytical models are urgently needed to effectively learn from complexly structured longitudinal EHRs for reliable DR prediction.
III. Research Methodology
This section presents the proposed MB-TCN-TC framework for DR prediction from longitudinal EHRs. Suppose there are $N$ patients indexed by $i = 1, \dots, N$, and each patient $i$ is described by the medical records denoted as $\{X_i, y_i\}$: $X_i \in \mathbb{R}^{P \times T}$ denotes the medical variables, where $P$ is the number of variables and $T$ is the number of time points when the medical variables are observed; $y_i$ is the label, with $y_i = 1$ indicating that patient $i$ is diagnosed with DR and $y_i = 0$ otherwise. As shown in Fig. 1(a), the data for all patients can be summarized as a tuple $(\mathcal{X}, \boldsymbol{y})$, where $\mathcal{X}$ is a 3-way tensor of size $N \times P \times T$ with $N$, $P$, and $T$ denoting the number of patients, variables, and time stamps respectively, and $\boldsymbol{y} \in \{0, 1\}^N$ is the label vector. As illustrated in Fig. 1(b), the missing values will first be imputed with tensor completion. Second, missing value masks will be generated to capture the missingness patterns, which will be combined with the imputed tensor to generate two streams of balanced subsets by under-sampling the majority class, as shown in Fig. 1(c). Third, the balanced datasets are fed into the MB-TCN to train the network and further predict the DR probability for new patients, as shown in Fig. 1(d).
Fig. 1:

(a) Multivariate longitudinal sequence with missing data; (b) Tensor completion-based data imputation; (c) Balanced subsets by under-sampling; (d) Missingness-informed TCN with multi-branching outputs.
A. Tensor Completion for Missing Data Imputation
Tensor factorization has been proven to be an efficient method to study the latent structure in tensor-form data for missing value imputation [53]. The critical idea of tensor-factorization-based imputation is to jointly consider the multi-dimensional correlations among medical variables, time, and patients in a compute-efficient way. The CANDECOMP/PARAFAC (CP) decomposition [53] is a widely used approach to study tensor data by projecting it into a linear combination of rank-1 tensors, as shown in Fig. 2. Specifically, suppose the rank of the 3D tensor $\mathcal{X} \in \mathbb{R}^{N \times P \times T}$ (i.e., the medical records) is $R$, and the element of $\mathcal{X}$ is denoted as $x_{npt}$, where $n = 1, \dots, N$, $p = 1, \dots, P$, and $t = 1, \dots, T$. Then, the CP decomposition of a complete tensor is represented as $x_{npt} = \sum_{r=1}^{R} a_{nr} b_{pr} c_{tr}$, where $a_{nr}$, $b_{pr}$, and $c_{tr}$ are elements of the factor matrices $\boldsymbol{A} \in \mathbb{R}^{N \times R}$, $\boldsymbol{B} \in \mathbb{R}^{P \times R}$, and $\boldsymbol{C} \in \mathbb{R}^{T \times R}$. As such, the high-dimensional tensor data is characterized by the low-dimensional latent space defined by the factor matrices $\boldsymbol{A}$, $\boldsymbol{B}$, and $\boldsymbol{C}$. However, such a decomposition procedure is not viable if the tensor $\mathcal{X}$ is incomplete. With missing data, the CP decomposition is achieved by minimizing the squared error between the observed data and the reconstructed tensor as:
$$f(\boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}) = \frac{1}{2} \sum_{n=1}^{N} \sum_{p=1}^{P} \sum_{t=1}^{T} w_{npt} \left( x_{npt} - \sum_{r=1}^{R} a_{nr} b_{pr} c_{tr} \right)^{2} \quad (1)$$
where $\mathcal{W}$ is a 3D indicator tensor with $w_{npt} = 0$ if $x_{npt}$ is missing and $w_{npt} = 1$ otherwise.
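The masked objective in Eq. (1) can be sketched in a few lines of numpy. This is an illustrative implementation under the factor-matrix shapes defined above, not the paper's code; the test case builds an exactly rank-2 tensor so the loss at the true factors is zero.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Reconstruct x_hat[n,p,t] = sum_r A[n,r] * B[p,r] * C[t,r]."""
    return np.einsum('nr,pr,tr->npt', A, B, C)

def masked_loss(X, W, A, B, C):
    """Objective of Eq. (1): squared error over observed entries only."""
    resid = W * (np.nan_to_num(X) - cp_reconstruct(A, B, C))
    return 0.5 * np.sum(resid ** 2)

rng = np.random.default_rng(1)
N, P, T, R = 5, 4, 6, 2
A = rng.normal(size=(N, R))
B = rng.normal(size=(P, R))
C = rng.normal(size=(T, R))
X = cp_reconstruct(A, B, C)           # fully observed rank-2 tensor
W = np.ones_like(X)                   # nothing missing in this toy case
print(masked_loss(X, W, A, B, C))     # 0.0: exact factors give zero loss
```

Setting entries of `W` to zero simply drops the corresponding residuals, which is how the missing entries are excluded from the fit.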
Fig. 2:

Illustration of CP decomposition: $\boldsymbol{a}_r$, $\boldsymbol{b}_r$, and $\boldsymbol{c}_r$ denote the $r$-th column vectors of the matrices $\boldsymbol{A}$, $\boldsymbol{B}$, and $\boldsymbol{C}$, respectively.
We employ the CP-weighted optimization (CP-WOPT) algorithm [54] to solve the minimization problem in Eq. (1) and estimate the factor matrices $\boldsymbol{A}$, $\boldsymbol{B}$, and $\boldsymbol{C}$. Specifically, at each iteration, we compute $\mathcal{Y} = \mathcal{W} * \mathcal{X}$ and $\mathcal{Z} = \mathcal{W} * \hat{\mathcal{X}}$ (where $*$ denotes element-wise multiplication and $\hat{\mathcal{X}}$ is the tensor reconstructed from the current factor matrices) and further derive the gradient of the objective function in Eq. (1) by computing the partial derivatives of $f$ with respect to the factor matrices:
$$\frac{\partial f}{\partial \boldsymbol{A}} = \left( \boldsymbol{Z}_{(1)} - \boldsymbol{Y}_{(1)} \right) (\boldsymbol{C} \odot \boldsymbol{B}), \quad \frac{\partial f}{\partial \boldsymbol{B}} = \left( \boldsymbol{Z}_{(2)} - \boldsymbol{Y}_{(2)} \right) (\boldsymbol{C} \odot \boldsymbol{A}), \quad \frac{\partial f}{\partial \boldsymbol{C}} = \left( \boldsymbol{Z}_{(3)} - \boldsymbol{Y}_{(3)} \right) (\boldsymbol{B} \odot \boldsymbol{A}) \quad (2)$$
where $\boldsymbol{Y}_{(m)}$ and $\boldsymbol{Z}_{(m)}$ are the corresponding matrices resulting from the mode-$m$ matricization (i.e., flattening or unfolding) of the tensors $\mathcal{Y}$ and $\mathcal{Z}$ [55], [56]: $\boldsymbol{Y}_{(1)}, \boldsymbol{Z}_{(1)} \in \mathbb{R}^{N \times PT}$, $\boldsymbol{Y}_{(2)}, \boldsymbol{Z}_{(2)} \in \mathbb{R}^{P \times NT}$, and $\boldsymbol{Y}_{(3)}, \boldsymbol{Z}_{(3)} \in \mathbb{R}^{T \times NP}$. The symbol ⊙ denotes the Khatri-Rao product, which generates a matrix of size $IJ \times R$ for two matrices $\boldsymbol{U} \in \mathbb{R}^{I \times R}$ and $\boldsymbol{V} \in \mathbb{R}^{J \times R}$ according to:
$$\boldsymbol{U} \odot \boldsymbol{V} = \left[ \boldsymbol{u}_1 \otimes \boldsymbol{v}_1, \; \boldsymbol{u}_2 \otimes \boldsymbol{v}_2, \; \dots, \; \boldsymbol{u}_R \otimes \boldsymbol{v}_R \right] \quad (3)$$
where ⊗ denotes the Kronecker product of two vectors: $\boldsymbol{u}_r \otimes \boldsymbol{v}_r = [u_{1r} \boldsymbol{v}_r^\top, u_{2r} \boldsymbol{v}_r^\top, \dots, u_{Ir} \boldsymbol{v}_r^\top]^\top$.
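The column-wise construction in Eq. (3) is short enough to verify directly in numpy; the matrices below are arbitrary toy inputs.

```python
import numpy as np

def khatri_rao(U, V):
    """Khatri-Rao product (Eq. (3)): column-wise Kronecker product.
    Maps (I x R) and (J x R) matrices to an (IJ x R) matrix."""
    I, R = U.shape
    J, R2 = V.shape
    assert R == R2, "both matrices must have the same number of columns"
    return np.column_stack([np.kron(U[:, r], V[:, r]) for r in range(R)])

U = np.arange(6).reshape(3, 2).astype(float)   # 3 x 2
V = np.arange(8).reshape(4, 2).astype(float)   # 4 x 2
KR = khatri_rao(U, V)
print(KR.shape)                                # (12, 2) = (3*4, 2)
```

Each column of the result is the Kronecker product of the corresponding columns of the inputs, exactly as Eq. (3) prescribes.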
Given the computed gradients, the nonlinear conjugate gradient algorithm with Hestenes-Stiefel updates [57] and the Moré-Thuente line search method for the step size [58] are employed to solve the optimization problem and estimate the reconstructed tensor $\hat{\mathcal{X}}$ with elements $\hat{x}_{npt} = \sum_{r=1}^{R} a_{nr} b_{pr} c_{tr}$. As such, the original incomplete tensor is imputed as $\tilde{\mathcal{X}} = \mathcal{W} * \mathcal{X} + (\boldsymbol{1} - \mathcal{W}) * \hat{\mathcal{X}}$, i.e., observed entries are retained and missing entries are filled with the reconstruction. Algorithm 1 summarizes the detailed procedure for tensor-based missing value imputation.
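To make the gradient in Eq. (2) concrete, here is a small numpy sketch (an illustration, not the paper's implementation) that computes the partial derivative with respect to $\boldsymbol{A}$ as a direct tensor contraction; this contraction is algebraically equivalent to the matricized form $(\boldsymbol{Z}_{(1)} - \boldsymbol{Y}_{(1)})(\boldsymbol{C} \odot \boldsymbol{B})$, and a finite-difference check confirms it.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    return np.einsum('nr,pr,tr->npt', A, B, C)

def loss(X, W, A, B, C):
    """Eq. (1): masked squared error with the 1/2 scaling."""
    return 0.5 * np.sum((W * (X - cp_reconstruct(A, B, C))) ** 2)

def grad_A(X, W, A, B, C):
    # D = Z - Y; contracting D against B and C reproduces the
    # matricized gradient (Z_(1) - Y_(1))(C ⊙ B) of Eq. (2).
    D = W * cp_reconstruct(A, B, C) - W * X
    return np.einsum('npt,pr,tr->nr', D, B, C)

rng = np.random.default_rng(2)
N, P, T, R = 4, 3, 5, 2
X = rng.normal(size=(N, P, T))
W = (rng.random((N, P, T)) > 0.3).astype(float)   # ~30% entries "missing"
A = rng.normal(size=(N, R))
B = rng.normal(size=(P, R))
C = rng.normal(size=(T, R))

# Finite-difference check of one gradient entry.
eps = 1e-6
A2 = A.copy()
A2[0, 0] += eps
fd = (loss(X, W, A2, B, C) - loss(X, W, A, B, C)) / eps
print(abs(fd - grad_A(X, W, A, B, C)[0, 0]) < 1e-3)   # True
```

In practice these gradients would be handed to a solver such as the nonlinear conjugate gradient method described above rather than used for plain gradient descent.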

B. Missingness-informed TCN
TCN is an effective tool to capture both temporal correlations in longitudinal data (i.e., intra-sequence correlation) and heterogeneous patterns between different features (i.e., inter-sequence correlation) [59]. Here, we propose a missingness-informed TCN to process the imputed tensor data to capture critical temporal and variable correlations while being aware of missingness patterns for better prediction. As shown in Fig. 3, there are four key features in this model:
Fig. 3:

Architectural detail of the missingness-informed TCN: (a) Missing value mask; (b) The architecture of a residual block; (c) The dilated causal convolution with dilation factor $d$ and kernel size $k$; (d) The multi-branching output layer.
(1). Multi-channel input:
TCNs treat the multi-variable longitudinal data as multi-channel sequences and apply trainable filters to process such multi-channel input at the same time. Fig. 4 shows the fundamental procedure of how TCNs handle multi-variable longitudinal data. Specifically, each variable in the time series is considered as an individual input channel (e.g., 2 channels in the figure means there are 2 variables in the input). TCNs then use convolutional filters to process temporal data where the filter size determines how many time points the network examines in one operation, and the channel number of the filter is the same as the number of input channels. As the filters are applied across the input channels, the network can learn from multiple interrelated sequences simultaneously, effectively capturing the intersequence correlations.
Fig. 4:

Illustration of the multi-channel input in TCNs.
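The multi-channel treatment above maps directly onto a standard 1D convolution layer; the following PyTorch snippet (illustrative shapes, not the paper's configuration) shows a filter whose channel dimension matches the two input variables, so every output sequence mixes information across both variables.

```python
import torch
import torch.nn as nn

# Two medical variables -> 2 input channels; each of the 4 filters spans
# all input channels at once, capturing inter-sequence correlations.
x = torch.randn(1, 2, 10)                  # (batch, channels=variables, time)
conv = nn.Conv1d(in_channels=2, out_channels=4, kernel_size=3)
y = conv(x)
print(y.shape)                             # torch.Size([1, 4, 8])
```

With kernel size 3 and no padding, the output covers 10 − 3 + 1 = 8 time positions; each position is computed from a 3-step window across both variables simultaneously.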
(2). Dilated causal convolution:
Fig. 3(c) shows the procedure of dilated causal convolution. Specifically, TCNs achieve the dilated causal convolution with filters $\boldsymbol{f} = (f_0, f_1, \dots, f_{k-1})$ over the longitudinal input as

$$F(x_{ip})(t) = \sum_{j=0}^{k-1} f_j \, x_{ip}(t - d \cdot j) \quad (4)$$

where $x_{ip}(\cdot)$ denotes the observations of variable $p$ for patient $i$ over time, and $d$, $\boldsymbol{f}$, and $k$ are the dilation factor, filters, and kernel size, respectively. The causal convolution is realized by the index $t - d \cdot j$, which ensures that the direction of the convolutional operation is toward the past and the network output at the current time $t$ is not a function of any future information after time $t$. By tuning the dilation factor $d$ and kernel size $k$, TCNs have the ability to capture both local- and long-range correlations in the temporal domain.
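The dilated causal convolution can be written out naively in a few lines of numpy; this sketch (toy data, a uniform filter) makes the causality and dilation explicit by zero-padding any term that would reach before the start of the sequence.

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """y[t] = sum_j f[j] * x[t - d*j]; out-of-range terms are treated as zero,
    so y[t] never depends on any observation after time t (causality)."""
    k = len(f)
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        for j in range(k):
            if t - d * j >= 0:
                y[t] += f[j] * x[t - d * j]
    return y

x = np.arange(8, dtype=float)          # one variable over 8 time stamps
f = np.array([1.0, 1.0, 1.0])          # kernel size k = 3
y1 = dilated_causal_conv(x, f, d=1)    # window at t: {t, t-1, t-2}
y2 = dilated_causal_conv(x, f, d=2)    # window at t: {t, t-2, t-4}
print(y1[7], y2[7])                    # 18.0 15.0
```

With `d=1` the last output sums 7 + 6 + 5 = 18; with `d=2` it sums 7 + 5 + 3 = 15, showing how a larger dilation widens the receptive field without adding filter weights.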
(3). Residual block:
The residual block (Fig. 3(b)) [60] is another critical attribute for effective longitudinal data modeling, which is leveraged to solve the problem of gradient degradation when the network goes deeper for modeling complexly structured data. The output of the residual block is $\boldsymbol{o} = \sigma(\boldsymbol{x} + \mathcal{F}(\boldsymbol{x}))$, where $\sigma(\cdot)$ is the activation function and $\mathcal{F}(\cdot)$ is a series of operations including the dilated causal convolution, batch normalization, nonlinear activation, and dropout. Such a residual operation enables the network to focus on the residual of the identity mapping to cope with the gradient degradation problem in DNN training [60]. Our proposed model consists of two layers of causal convolution and ReLU activation within a residual block. In addition, we implement batch normalization for each causal convolutional layer to re-center and re-scale the input (or the output of the previous layer), which helps to improve the speed and stability of network training [61]. Moreover, a dropout layer is added for each causal convolution to avoid the possible over-fitting problem [62].
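The structure of one residual block can be sketched in PyTorch as follows. This is a minimal illustration of the $\boldsymbol{o} = \sigma(\boldsymbol{x} + \mathcal{F}(\boldsymbol{x}))$ pattern with the operations listed above; the channel counts and default hyperparameters are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two dilated causal conv layers, each followed by batch norm, ReLU,
    and dropout; the input is added back before the final activation."""
    def __init__(self, channels, kernel_size=2, dilation=1, dropout=0.5):
        super().__init__()
        pad = (kernel_size - 1) * dilation          # left-pad only => causal
        self.pad = nn.ConstantPad1d((pad, 0), 0.0)
        self.net = nn.Sequential(
            self.pad, nn.Conv1d(channels, channels, kernel_size, dilation=dilation),
            nn.BatchNorm1d(channels), nn.ReLU(), nn.Dropout(dropout),
            self.pad, nn.Conv1d(channels, channels, kernel_size, dilation=dilation),
            nn.BatchNorm1d(channels), nn.ReLU(), nn.Dropout(dropout),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.net(x))           # o = sigma(x + F(x))

block = ResidualBlock(channels=8, dilation=2)
x = torch.randn(4, 8, 20)                           # (batch, channels, time)
print(block(x).shape)                               # torch.Size([4, 8, 20])
```

The asymmetric left padding keeps the sequence length unchanged while guaranteeing that no output position sees future time steps, so blocks can be stacked with increasing dilation.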
(4). Missing value masks:
To account for the possible discrepancy between the imputed values and true observations, the missing value masks, i.e., the 3D indicator tensor $\mathcal{W}$, will serve as another input to the TCN, as shown in Fig. 3(a). This enables the network to learn the missingness pattern, i.e., the correlation between non-missing and missing values, mitigating the impact on the prediction performance of any imputation error incurred in the imputation procedure.
C. Multi-Branching Outputs
In this subsection, we adapt the multi-branching (MB) architecture [8] to cope with the imbalanced data issue in the DR dataset. The MB network consists of a core TCN and an MB output layer. The core TCN can be treated as the feature extractor or backbone of the network, and the MB layer is the output classifier. Existing literature has shown that the classifier component of the network suffers more from the imbalanced data issue, while the backbone portion is less susceptible during training [63], [64]. As such, we propose to train the core TCN with the whole dataset and use a balanced sub-dataset to train each branching output in the MB layer to alleviate the impact of imbalanced data on the classifier.
We denote the original dataset as $\mathcal{D}$, which consists of the minority class $\mathcal{D}^{+}$ (i.e., DR patients) and the majority class $\mathcal{D}^{-}$ (i.e., non-DR patients). As shown in Fig. 1(c), $B$ balanced datasets are created from $\mathcal{D}$ by under-sampling the non-DR class as $\mathcal{D}_b = \mathcal{D}^{+} \cup \mathcal{D}_b^{-}$, where $\mathcal{D}_b^{-} \subset \mathcal{D}^{-}$ and $b = 1, \dots, B$. The same operation will also be conducted for the missing value masks. Furthermore, an MB layer with $B$ outputs is added at the end of the core TCN to process the balanced sub-datasets. Fig. 3 shows the details of our MB network architecture. Note that each balanced dataset consists of the imputed EHRs and missing value masks, which are processed by two parallel sequential residual blocks. Then, the results of the two parallel streams are concatenated and further flow to the fully connected layer. In the end, the MB output layer will generate predicted DR probabilities based on the outputs of the fully connected layer. As such, each balanced dataset serves as the training data for the corresponding branch of the output layer, and the core TCN structure will be optimized by all the training subsets. Specifically, the optimization will be achieved by minimizing the binary cross-entropy loss:

$$\mathcal{L}(\theta) = -\sum_{b=1}^{B} \sum_{i=1}^{N} \mathbb{1}[i \in \mathcal{D}_b] \left( y_i \log \hat{y}_i^{(b)} + (1 - y_i) \log\left(1 - \hat{y}_i^{(b)}\right) \right)$$
where $\theta$ represents the model parameters, $\mathbb{1}[i \in \mathcal{D}_b]$ denotes the indicator function indicating whether patient $i$ belongs to subset $\mathcal{D}_b$, and $\hat{y}_i^{(b)}$ is the prediction of DR probability provided by the $b$-th branching output for the input signal and corresponding mask of patient $i$. The final prediction of DR probability for patient $i$ is the mean of the predicted probabilities:

$$\hat{y}_i = \frac{1}{B} \sum_{b=1}^{B} \hat{y}_i^{(b)} \quad (5)$$
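The branch-averaging step of Eq. (5) and the per-branch binary cross-entropy are easy to sketch in numpy; the branch probabilities below are made-up numbers for illustration.

```python
import numpy as np

def mb_predict(branch_probs):
    """Eq. (5): final DR probability is the mean over the B branching outputs.
    branch_probs has shape (B, num_patients)."""
    return np.mean(branch_probs, axis=0)

def bce(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy, clipped for numerical stability."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Hypothetical probabilities from B = 3 branches for 4 patients.
branch_probs = np.array([[0.9, 0.2, 0.7, 0.1],
                         [0.8, 0.3, 0.6, 0.2],
                         [0.7, 0.1, 0.8, 0.3]])
y = np.array([1, 0, 1, 0])
p = mb_predict(branch_probs)
print(p)               # ≈ [0.8, 0.2, 0.7, 0.2]
print(bce(y, p))
```

In the actual model each branch is trained only on its own balanced subset (via the indicator in the loss above), while the averaging in Eq. (5) is applied at inference time.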
IV. Experimental Design and Results
A. Data Source and Data Extraction
The dataset we use in this study is obtained from the 2018 Cerner Health Facts® data warehouse, one of the largest Health Insurance Portability and Accountability Act (HIPAA)–compliant databases in the U.S., storing de-identified clinical data of more than 63 million patients [3]. This database contains comprehensive clinical information including patient demographics, hospital visits, diagnoses, procedures, medication prescriptions, vital signs, lab tests, etc., providing an unprecedented opportunity for data-driven diagnosis of DR. The labels are derived by examining whether one or more DR diagnosis codes exist: “1” represents a DR patient and “0” represents a non-DR diabetic patient.
We select independent variables based on the literature [1], [44] and include 21 routine blood tests for diabetic patients, 5 comorbidity variables (i.e., neuropathy, nephropathy, hypertension, obesity, and coronary artery disease), 3 demographic variables (i.e., gender, age, and race), and the duration of diabetes, which is measured in years from the first diabetic diagnosis to the beginning of the prediction window. Note that all 21 blood tests are subject to missing values, the missing rates of which are shown in Fig. 5. The final dataset consists of 414,199 diabetic patients with a 3% (12,590) DR positive rate. We randomly partition the dataset into a training set (70%) for model training, a validation set (10%) for hyperparameter tuning, and a testing set (20%) for performance evaluation.
Fig. 5:

Missing rates for blood test variables in the dataset from DR patients (top) and Non-DR patients (bottom).
B. Experimental Design
As shown in Fig. 6, the performance of our MB-TCN-TC will be compared with TCN + different imputation techniques and TCN + imbalanced data handling methods. Specifically, our MB-TCN-TC method will be benchmarked against TCN + carry forward imputation (TCN+CF), TCN + mean imputation (TCN+mean), TCN + under-sampling, TCN + over-sampling, and the pure TCN. Additionally, we will evaluate the effect of the number of branching outputs (i.e., 5, 10, and 20 branches) on the performance of MB-TCN-TC. The performance of both our model and the benchmark methods will be evaluated according to 6 metrics: Confusion Matrix, Area Under the Receiver-Operating-Characteristic Curve (AUROC), Area Under the Precision-Recall Curve (AUPRC), Recall, Precision, and F1 score. The ROC curve characterizes the tradeoff between the True Positive Rate (TPR) and the False Positive Rate (FPR), and the PR curve captures the relationship between Precision and Recall. Hence, AUROC and AUPRC serve as overall metrics across all thresholds, quantifying the overall prediction performance.
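As a quick sketch of how AUROC summarizes performance across all thresholds, it can be computed directly from its rank interpretation: the probability that a randomly chosen positive is scored higher than a randomly chosen negative. The toy labels and scores below are invented for illustration; in practice a library routine such as scikit-learn's `roc_auc_score` would be used.

```python
import numpy as np

def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney statistic: the fraction of (positive,
    negative) pairs ranked correctly, counting ties as one half."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
print(auroc(y, s))        # 8 of 9 pairs ranked correctly ≈ 0.889
```

AUPRC is analogously threshold-free but focuses on the positive class, which is why it is the more sensitive metric under the 3% DR positive rate reported above.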
Fig. 6:

Illustration of the experimental design to evaluate the performance of the proposed MB-TCN-TC method.
Our model is built using PyTorch, leveraging the computational power of an NVIDIA RTX A4500 GPU. In the CP decomposition-based imputation, we select the rank of the tensor $R$, the maximum number of iterations, and the error tolerance for the CP-WOPT algorithm. The hyperparameters of our neural network are selected by empirical fine-tuning. Specifically, our final network model consists of 3 residual blocks and 10 branching outputs (i.e., $B = 10$); the number of filters is 64 and the kernel size is $k$ in the causal convolution; the dilation factor is selected as 1 for the first and 2 for the second dilated convolutional layer in each residual block; the dropout rate is 0.5; we select the Adam optimizer with an initial learning rate of 0.0002 and momentum values of $(\beta_1, \beta_2)$ during network training.
C. Experimental Results
Fig. 7 (a) and (b) illustrate the ROC and PR curves of our MB-TCN-TC with 10 branching outputs (MB-TCN-TC-10) and other methods. Note that a good model in the ROC space is located towards the top-left corner, indicating a low FPR and high TPR. According to Fig. 7(a), our method, i.e., MB-TCN-TC-10 with mask, dominates the ROC space with the highest AUROC of 0.949. On the other hand, in the PR space, a good classification model should achieve both higher Recall and Precision, towards the top-right corner. As shown in Fig. 7(b), the MB-TCN-TC-10 with mask outperforms other methods, yielding the highest AUPRC of 0.793. Notably, our MB-TCN-TC-10 method achieves an improvement of 6.27% on AUROC and 11.85% on AUPRC compared with the pure TCN without the tensor-based imputation and MB architecture (with AUROC of 0.893 and AUPRC of 0.709).
Fig. 7:

(a) ROC and (b) PR curves for the proposed MB-TCN-TC with 10 branching outputs and other methods.
Table II further shows the performance comparison in terms of the Confusion Matrix along with the Recall, Precision, and F1 metrics. Specifically, our MB-TCN-TC-10 yields the highest true positive count with 1,880 DR patients correctly identified, indicating a strong Recall in detecting DR. Our model also yields the fewest false negatives (680 patients), which is critical in medical diagnostics as it reduces the risk of missing a potential DR diagnosis. Moreover, our model maintains the highest overall Precision (0.723) with the fewest false positives (i.e., 722 patients). Hence, our model makes correct DR predictions more often than the other benchmark methods. Additionally, the highest Recall of 0.734 suggests that our MB-TCN-TC-10 has the best capability to identify DR patients. Our model also yields the highest F1 score of 0.728, achieving the best balance between Recall and Precision.
TABLE II:
The comparison of performance scores between our MB-TCN-TC with 10 branching outputs and other methods.
| Model Name | True Positive | False Positive | True Negative | False Negative | Recall | Precision | F1 Score |
|---|---|---|---|---|---|---|---|
| MB-TCN-TC-10 | 1,880 | 722 | 79,558 | 680 | 0.734 | 0.723 | 0.728 |
| MB-TCN-TC-10 w/o Mask | 1,852 | 794 | 79,486 | 708 | 0.723 | 0.699 | 0.711 |
| TCN-TC + Over-sample | 1,844 | 822 | 79,458 | 716 | 0.720 | 0.691 | 0.705 |
| TCN-CF + Original | 1,800 | 1,540 | 78,740 | 760 | 0.703 | 0.539 | 0.610 |
Table III shows AUROC and AUPRC scores of our MB-TCN-TC and other imputation methods commonly used in current practice with and without (w/o) missing-value mask. Note that all the network models compared in Table III contain a classifier layer with 10 branching outputs. Our MB-TCN-TC with missing-value mask method provides the highest AUROC and AUPRC scores of 0.949 and 0.793 respectively, as compared with MB-TCN+CF with missing-value mask (with AUROC of 0.925 and AUPRC of 0.734), MB-TCN+CF w/o missing-value mask (with AUROC of 0.915 and AUPRC of 0.724), MB-TCN+mean with masks (with AUROC of 0.916 and AUPRC of 0.726), MB-TCN+mean w/o masks (with AUROC of 0.920 and AUPRC of 0.727), and MB-TCN-TC w/o masks (with AUROC of 0.937 and AUPRC of 0.775).
TABLE III:
The performance comparison in AUROC and AUPRC between our MB-TCN-TC model and other imputation methods with and w/o missing value mask. Note that all the network models contain an MB layer with 10 branching outputs.
| Model: MB-TCN-10 | Tensor Decomposition (Mask) | Tensor Decomposition (w/o Mask) | Carry Forward (Mask) | Carry Forward (w/o Mask) | Mean Imputation (Mask) | Mean Imputation (w/o Mask) |
|---|---|---|---|---|---|---|
| AUROC | 0.949 | 0.937 | 0.925 | 0.915 | 0.916 | 0.920 |
| AUPRC | 0.793 | 0.775 | 0.734 | 0.724 | 0.726 | 0.727 |
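The missing-value mask compared in Table III can be illustrated with a small sketch (hypothetical variable layout, not the authors' code): a binary mask records which entries were actually observed and is fed to the network alongside the imputed values, so the model can discount possible imputation error.

```python
import numpy as np

# Toy longitudinal record: 3 encounters x 2 medical variables, NaN = missing.
x = np.array([[7.1, np.nan],
              [np.nan, 140.0],
              [6.8, 150.0]])

mask = (~np.isnan(x)).astype(float)   # 1 = observed, 0 = missing

# Stand-in imputation (per-variable mean); the paper uses tensor completion.
col_mean = np.nanmean(x, axis=0)
imputed = np.where(np.isnan(x), col_mean, x)

# Stack imputed values and mask as separate input channels for the network.
net_input = np.stack([imputed, mask], axis=0)   # shape (2, 3, 2)
print(net_input.shape)
```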
Table IV further shows the comparison between our MB-TCN-TC-10 model and other methods in addressing the imbalanced data issue. Note that all methods compared here employ tensor decomposition-based imputation and missing-value masks to handle the missing value issue. Our MB-TCN-TC-10 achieves the best performance with AUROC and AUPRC scores of 0.949 and 0.793, respectively. Specifically, our MB-TCN-TC-10 achieves 1.71% and 4.89% improvements on AUROC and AUPRC, respectively, compared with the under-sampling method; increases the AUROC and AUPRC scores by 2.82% and 2.72%, compared with the pure TCN trained on the original imbalanced dataset; and increases the AUROC and AUPRC metrics by 1.61% and 2.85%, compared with the over-sampling method.
TABLE IV:
The comparison between MB-TCN-TC-10 and other methods in handling the imbalanced data issue.
| Model | MB-TCN-TC-10 | TCN-TC + under-sampling | TCN-TC + over-sampling | TCN + original data |
|---|---|---|---|---|
| AUROC | 0.949 | 0.933 | 0.934 | 0.923 |
| AUPRC | 0.793 | 0.756 | 0.771 | 0.772 |
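The relative improvements quoted above follow from the Table IV scores as (ours − baseline) / baseline:

```python
def rel_gain(ours: float, baseline: float) -> float:
    """Relative improvement in percent, rounded to two decimals."""
    return round((ours - baseline) / baseline * 100, 2)

# AUROC gains of MB-TCN-TC-10 (0.949) over the baselines in Table IV.
print(rel_gain(0.949, 0.933))  # 1.71  vs. under-sampling
print(rel_gain(0.949, 0.934))  # 1.61  vs. over-sampling
print(rel_gain(0.949, 0.923))  # 2.82  vs. original imbalanced data
# AUPRC gains of 0.793 over the same baselines.
print(rel_gain(0.793, 0.756))  # 4.89
print(rel_gain(0.793, 0.771))  # 2.85
print(rel_gain(0.793, 0.772))  # 2.72
```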
V. Discussion
A. Overfitting or underfitting issues in model training
Fig. 8 shows the training and validation losses of our MB-TCN-TC-10 model over 50 epochs. The training loss keeps decreasing with some random fluctuations as the epoch number increases, while the validation loss first decreases and then stabilizes around a low value after about 20 epochs. A similar trend is observed in the evolution of AUROC over epochs, as shown in Fig. 8 (b). In our experiments, we select the total epoch number as 26 to increase the computational efficiency and guarantee the network model is well trained according to Fig. 8. Note that the validation loss (AUROC) does not further increase (decrease) as we increase the training epochs, indicating that there is no overfitting issue. This is attributable to the dropout layer in our network. The critical idea of dropout is to randomly freeze a subset of the DNN neurons with a pre-defined probability (i.e., the dropout rate) at each training iteration [62]. This technique introduces stochasticity into network training to avoid possible overfitting and is equivalent to an ℓ2 regularization. In fact, the dropout loss function can be written as the summation of the original loss function and an ℓ2 regularization on the network parameters:
$$\mathcal{L}_{\text{dropout}} = \mathcal{L}_{0} + \lambda \sum_{l=1}^{H} \left( \lVert W_l \rVert_2^2 + \lVert b_l \rVert_2^2 \right) \quad (6)$$
where the regularization parameter $\lambda$ is a function of the dropout rate, $H$ denotes the number of hidden layers, and $W_l$ and $b_l$ stand for the weights and bias of the $l$-th hidden layer. This regularization technique effectively avoids the overfitting issue.
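The mechanism can be illustrated with a minimal NumPy sketch of inverted dropout (not the paper's implementation; the rate of 0.5 here is an arbitrary example value): each activation is zeroed with probability p and the survivors are rescaled so the expected activation is unchanged.

```python
import numpy as np

def dropout(x: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout: zero each activation with probability p and
    rescale the survivors by 1/(1-p) so the expected value is preserved."""
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

rng = np.random.default_rng(0)
acts = np.ones(100_000)
out = dropout(acts, p=0.5, rng=rng)
# Roughly half the activations become 0; survivors are scaled to 2.0,
# so the mean stays close to the original mean of 1.0.
print(abs(out.mean() - 1.0) < 0.02)
```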
Fig. 8:

The variations of training and validation (a) losses and (b) AUROC scores over epochs for our MB-TCN-TC-10 model.
Furthermore, it is worth noting that the training loss is larger (and the training AUROC lower) than the corresponding validation values. This is also due to the use of the dropout layer during model training. The dropout technique is only used in the training phase to avoid overfitting, while all neurons are active and contribute to the prediction during the validation or testing phase. In other words, the regularization (i.e., the second term on the right-hand side) in Eq. (6) is removed during the validation or testing phase, leading to a lower validation loss compared to the training loss. Additionally, because all features extracted by the hidden layers remain active for prediction during the validation phase, a higher AUROC is generated for the validation set compared with the training phase.
B. Performance comparison with existing literature
Table V presents a comparison of the performance scores between our MB-TCN-TC-10 model and traditional machine learning methods in the existing literature. Note that traditional machine learning methods (i.e., random forest, XGBoost, simple neural network, logistic regression) are not directly applicable to longitudinal data. To facilitate a meaningful comparison, the longitudinal dataset is transformed into a non-temporal dataset by keeping only the last encounter of routine blood tests for each patient to train those models. According to Table V, our model outperforms the benchmarks across various metrics, notably the highest-performing benchmark, XGBoost. Specifically, the MB-TCN-TC-10 yields an improvement of 3.7% on AUROC and 8.1% on AUPRC compared to XGBoost.
TABLE V:
Performance comparison between our MB-TCN-TC-10 method and traditional machine learning methods.
| Model Name | True Positive | False Positive | True Negative | False Negative | AUROC | AUPRC | Recall | Precision | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
| MB-TCN-TC-10 | 1,880 | 722 | 79,558 | 680 | 0.949 | 0.793 | 0.734 | 0.723 | 0.728 |
| Original TCN | 1,800 | 1,540 | 78,740 | 760 | 0.893 | 0.709 | 0.703 | 0.539 | 0.610 |
| Random forest | 1,805 | 2,382 | 77,898 | 755 | 0.907 | 0.728 | 0.705 | 0.431 | 0.535 |
| XGBoost | 1,813 | 2,345 | 77,935 | 747 | 0.915 | 0.734 | 0.708 | 0.436 | 0.540 |
| Neural network | 1,840 | 4,730 | 75,550 | 720 | 0.900 | 0.716 | 0.719 | 0.280 | 0.404 |
| Logistic regression | 1,772 | 2,669 | 77,611 | 788 | 0.882 | 0.709 | 0.692 | 0.399 | 0.507 |
Additionally, our model achieves the best Recall of 0.734 and the highest Precision of 0.723. The high Recall and Precision demonstrate that our model not only identifies DR patients more accurately but also achieves a superior reduction in false-positive rates. Moreover, the highest F1 score of 0.728 further demonstrates our model's enhanced capability to maintain an optimal balance between Recall and Precision. Meanwhile, the MB-TCN-TC-10 provides the highest true-positive count, with 1,880 DR patients correctly identified. Our model also yields the lowest false-negative count (680 patients), which is critical in medical diagnostics because it reduces the risk of missing a potential DR diagnosis. Additionally, our model generates the fewest false positives (722), maintaining the highest overall Precision (0.723), which indicates that our model makes correct DR predictions more often than the benchmark methods. The superior performance of our model is due to its advanced architecture design (i.e., tensor decomposition-based imputation, missingness-informed TCN, and multi-branching output) for modeling multivariate longitudinal data. In contrast, traditional machine learning methods cannot model longitudinal datasets to capture the dynamic disease trajectory, especially when dealing with missing information and imbalanced data, resulting in suboptimal prediction performance.
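The non-temporal transformation used for the traditional baselines in Table V, keeping only each patient's last encounter, can be sketched as follows (column names are hypothetical, not the dataset's actual schema):

```python
import pandas as pd

# Toy longitudinal EHR: several encounters per patient (hypothetical columns).
ehr = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "encounter_date": pd.to_datetime(
        ["2020-01-05", "2020-06-10", "2021-02-01", "2020-03-15", "2020-09-20"]),
    "hba1c": [7.2, 7.5, 7.9, 6.8, 6.9],
})

# Keep only the chronologically last encounter for each patient.
last = (ehr.sort_values("encounter_date")
           .groupby("patient_id", as_index=False)
           .tail(1)
           .sort_values("patient_id"))
print(last["hba1c"].tolist())  # [7.9, 6.9]
```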
C. Discussion on the performance improvement
According to Tables III and IV, our MB-TCN-TC model effectively addresses the missing value and imbalanced data issues, which are very common problems in data-driven disease detection. Specifically, given the same classification layer with 10 branching outputs, our MB-TCN-TC with the missing-value mask yields the best performance in DR identification (see Table III) compared with other imputation methods. This improvement is due to two unique features of our framework: (1) the tensor decomposition-based imputation jointly considers the multi-dimensional correlations among medical variables, time, and patients, which is conducive to reliably imputing the missing data; (2) the missing-value mask effectively captures the missingness structure, mitigating the impact of possible imputation error on the prediction performance.
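The intuition behind decomposition-based imputation can be shown in a deliberately simplified form: an alternating-least-squares completion of a rank-1 matrix (an order-2 tensor). The paper's actual method factorizes the full patient × variable × time tensor; this sketch only illustrates how a low-rank factorization fitted to the observed entries recovers the missing ones.

```python
import numpy as np

# Ground-truth rank-1 "patient x variable" matrix and an observation mask.
u_true = np.array([1.0, 2.0, 3.0, 4.0])
v_true = np.array([1.0, 0.5, 2.0, 1.5, 3.0])
M = np.outer(u_true, v_true)
obs = np.ones_like(M, dtype=bool)
obs[0, 0] = obs[1, 2] = obs[3, 4] = False   # pretend these entries are missing

# Alternating least squares fitted to the OBSERVED entries only.
u = np.ones(4)
v = np.ones(5)
for _ in range(200):
    for i in range(4):   # update each row factor from observed entries of row i
        j = obs[i]
        u[i] = M[i, j] @ v[j] / (v[j] @ v[j])
    for k in range(5):   # update each column factor symmetrically
        i = obs[:, k]
        v[k] = M[i, k] @ u[i] / (u[i] @ u[i])

# The reconstructed low-rank matrix fills in the missing entries.
imputed = np.outer(u, v)
print(np.allclose(imputed, M, atol=1e-4))
```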
Similarly, given the same imputation method (i.e., tensor decomposition-based imputation with the missing-value mask), our MB-TCN-TC-10 provides the most accurate predictions compared to other techniques for handling imbalanced data (see Table IV). This improvement is due to the MB-enhanced classifier layer, where each branching output is trained with a balanced dataset to mitigate the potential bias induced by the imbalanced data. Additionally, the MB architecture combines the predictions from multiple independent classifiers according to Eq. (5): the branching outputs learn different mapping functions, and combining them significantly lowers the variance and increases the robustness of the prediction performance.
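The multi-branching strategy can be sketched schematically (a NumPy illustration, not the authors' implementation; the class sizes and random linear scorers standing in for trained branch classifiers are invented for the example): the majority class is partitioned across K branches, each branch is paired with all minority samples, and branch probabilities are averaged at prediction time.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 10
minority = rng.normal(1.0, 1.0, size=(30, 4))     # e.g., DR patients
majority = rng.normal(-1.0, 1.0, size=(300, 4))   # non-DR patients (10:1)

# Partition the majority class into K disjoint subsets; each branch trains
# on one subset plus ALL minority samples -> a balanced 30:30 sub-dataset.
subsets = np.array_split(rng.permutation(majority), K)
branch_data = [np.vstack([s, minority]) for s in subsets]

# At inference, the K branch outputs are averaged (random linear scorers
# here stand in for the per-branch classifier heads).
x = rng.normal(size=(5, 4))
weights = rng.normal(size=(K, 4))
probs = 1.0 / (1.0 + np.exp(-(x @ weights.T)))    # (5 samples, K branches)
final = probs.mean(axis=1)                        # averaged prediction
print(final.shape)
```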
Table VI further evaluates the prediction performance of the MB-TCN-TC with 5, 10, and 20 branching outputs. When increasing the MB outputs from 5 to 10, both the AUROC and AUPRC slightly improve. This is because the imbalance ratio in the training sub-dataset for each branching output improves from 6:1 to 3:1 when using 10 instead of 5 MB outputs. Note that there is a nonnegligible decrease in the performance scores when further increasing the branching outputs to 20. One possible reason is that the sample size of the balanced sub-dataset used to train each branching output decreases as the number of branches increases, which may lead to ineffective or insufficient training of each branching output. Additionally, increasing the branching outputs too much also increases the model size, resulting in a more complex network that is more difficult to converge during training.
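The per-branch imbalance ratios quoted above are consistent with an overall majority-to-minority ratio of roughly 30:1 (an assumption inferred here from the reported 6:1 and 3:1 figures, not a number stated by the paper): splitting the majority class evenly across K branches scales the ratio by 1/K.

```python
# Per-branch majority:minority ratio when the majority class is split
# evenly across K branches (overall ~30:1 ratio assumed for illustration).
def branch_ratio(overall_ratio: float, k: int) -> float:
    return overall_ratio / k

print(branch_ratio(30, 5))   # 6.0 -> the 6:1 ratio reported for 5 branches
print(branch_ratio(30, 10))  # 3.0 -> the 3:1 ratio reported for 10 branches
print(branch_ratio(30, 20))  # 1.5 -> but each branch now sees fewer samples
```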
TABLE VI:
The variation of AUROC and AUPRC generated by MB-TCN-TC with different numbers of MB outputs.
| Model | MB-TCN-TC-5 (Mask) | MB-TCN-TC-5 (w/o Mask) | MB-TCN-TC-10 (Mask) | MB-TCN-TC-10 (w/o Mask) | MB-TCN-TC-20 (Mask) | MB-TCN-TC-20 (w/o Mask) |
|---|---|---|---|---|---|---|
| AUROC | 0.948 | 0.941 | 0.949 | 0.944 | 0.934 | 0.931 |
| AUPRC | 0.791 | 0.780 | 0.793 | 0.787 | 0.771 | 0.760 |
D. Broad applications
Our predictive model is built on EHR data, which is readily available and accessible in all healthcare settings. It does not require fundus images taken by retinal cameras, which are expensive for underserved populations. Our approach provides a cost-effective, non-image-based tool for DR screening, which has promising potential to increase the compliance rate of recommended ophthalmic exams among asymptomatic patients and to overcome barriers to providing ubiquitous diabetic eye care in rural or underserved areas.
Furthermore, our approach provides effective solutions to long-standing challenges in modeling EHRs, including incomplete datasets, imbalanced class distribution, and multivariate longitudinal records, thereby ensuring reliable data-driven disease detection. The versatility of MB-TCN-TC extends beyond DR prediction, making it applicable to automatic prediction of a wide range of diseases from EHRs. Notably, chronic conditions such as Alzheimer’s disease, Myasthenia Gravis, and Hidradenitis Suppurativa, which typically exhibit gradual, long-term progression [65], [66], [67], can significantly benefit from our MB-TCN-TC approach. Our model can capture critical information about heterogeneous variable interactions and disease trajectories, which are difficult to discern through manual inspection of high-dimensional longitudinal EHR data. This capability facilitates accurate disease detection and has the potential to revolutionize chronic disease monitoring. These examples underscore the broad applicability of our research, indicating its potential for further exploration in diverse healthcare contexts.
E. Challenges in real-world clinical implementation
The model implementation in real-world clinical scenarios faces some challenges. One challenge is the computational burden, a common issue in large-scale machine learning. The unique architecture of our MB-TCN-TC model allows it to capture intricate patterns in complex, structured EHR data; however, this also requires substantial computational resources, especially when handling a large volume of EHR data. Specifically, our dataset contains 414,199 diabetic patients with multiple medical variables changing over time, resulting in approximately 217 million data points. In our study, we utilize NVIDIA RTX A4500 Graphics Processing Units (GPUs) to improve computational efficiency through parallel processing. As a result, the total training time for 50 epochs is approximately 10 hours.
Furthermore, in addition to the computational burden, DNN models are widely recognized to be difficult to optimize. The objective function is generally non-convex, meaning there are multiple local optima. Traditional convex optimization algorithms and simple gradient descent methods [68] often struggle with this issue, possibly ending up with a suboptimal solution. The situation worsens with the many flat regions in non-convex functions, where the function value is nearly constant. The gradient in such flat regions is tiny, leaving the algorithms stuck at sub-optimal solutions. In this project, we implement the ADAM algorithm [69], a first-order gradient-based method that employs adaptive estimates of lower-order moments, to optimize our objective function. ADAM is an appealing choice due to its computational efficiency, minimal memory requirements, and suitability for handling large datasets and parameter counts. Its applicability to large-scale non-convex optimization enhances the optimization process of large DNN models.
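A minimal NumPy sketch of the ADAM update [69] on a one-dimensional quadratic illustrates the adaptive-moment mechanism (hyperparameters are common defaults chosen for the example, not the paper's training settings):

```python
import numpy as np

def adam_minimize(grad, x0, lr=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=3000):
    """ADAM: adaptive estimates of the first moment m and second moment v,
    with bias correction, as in Kingma & Ba (2014)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_star = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
print(abs(x_star - 3.0) < 0.1)
```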
VI. Conclusions
This paper presents a novel Multi-branching Temporal Convolutional Network with tensor completion (MB-TCN-TC) to investigate multivariate longitudinal clinical data for reliable DR identification. We first leverage tensor data completion to estimate the missing values in the incomplete EHR data. Second, we adapt the multi-branching temporal neural network framework to cope with the imbalanced data issue and model the longitudinal sequences. Experimental results show that the proposed MB-TCN-TC model effectively captures critical temporal information and complicated variable interactions in longitudinal EHRs by accounting for both the imbalanced data and missing value issues. Our MB-TCN-TC method yields superior performance when identifying DR from longitudinal EHR data, with better AUROC and AUPRC metrics compared to existing methods. More importantly, the MB-TCN-TC framework is generally applicable to modeling complex longitudinal sequences with imbalanced data and missing values for reliable health monitoring and diagnostics.
Acknowledgement
This research work was supported by the National Eye Institute of the National Institutes of Health under Award Number R01EY033861. We also acknowledge the Cerner Corporation and OSU Center for Health Systems Innovation (CHSI) for sharing the Health Facts® EHR database to support this research.
Contributor Information
Zekai Wang, Department of Industrial & Systems Engineering, The University of Tennessee, Knoxville, TN 37996.
Suhao Chen, Department of Industrial Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701.
Tieming Liu, School of Industrial Engineering and Management, Oklahoma State University, Stillwater, OK 74078.
Bing Yao, Department of Industrial & Systems Engineering, The University of Tennessee, Knoxville, TN 37996.
References
- [1].Yau JW, Rogers SL, Kawasaki R, Lamoureux EL, Kowalski JW, Bek T, Chen S-J, Dekker JM, Fletcher A, Grauslund J et al. , “Global prevalence and major risk factors of diabetic retinopathy,” Diabetes care, vol. 35, no. 3, pp. 556–564, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].American Diabetes Association, “Diabetic retinopathy,” Diabetes Care, vol. 25, pp. s90–s93, 2002. [Online]. Available: 10.2337/diacare.25.2007.S90 [DOI] [Google Scholar]
- [3].Chen S, Wang Z, Yao B, and Liu T, “Prediction of diabetic retinopathy using longitudinal electronic health records,” in 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE). IEEE, 2022, pp. 949–954. [Google Scholar]
- [4].Denton BT, “Frontiers of medical decision-making in the modern age of data analytics,” IISE Transactions, vol. 55, no. 1, pp. 94–105, 2023. [Google Scholar]
- [5].Alagoz O, Lowry KP, Kurian AW, Mandelblatt JS, Ergun MA, Huang H, Lee SJ, Schechter CB, Tosteson AN, Miglioretti DL et al. , “Impact of the covid-19 pandemic on breast cancer mortality in the us: estimates from collaborative simulation modeling,” JNCI: Journal of the National Cancer Institute, vol. 113, no. 11, pp. 1484–1494, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Yao B, Zhu R, and Yang H, “Characterizing the location and extent of myocardial infarctions with inverse ecg modeling and spatiotemporal regularization,” IEEE journal of biomedical and health informatics, vol. 22, no. 5, pp. 1445–1455, 2017. [DOI] [PubMed] [Google Scholar]
- [7].Yao B and Yang H, “Physics-driven spatiotemporal regularization for high-dimensional predictive modeling: A novel approach to solve the inverse ecg problem,” Scientific reports, vol. 6, no. 1, pp. 1–13, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Wang Z and Yao B, “Multi-branching temporal convolutional network for sepsis prediction,” IEEE Journal of Biomedical and Health Informatics, 2021. [DOI] [PubMed] [Google Scholar]
- [9].Liu Z, Khojandi A, Li X, Mohammed A, Davis RL, and Kamaleswaran R, “A machine learning–enabled partially observable markov decision process framework for early sepsis prediction,” INFORMS Journal on Computing, vol. 34, no. 4, pp. 2039–2057, 2022. [Google Scholar]
- [10].Shickel B, Tighe PJ, Bihorac A, and Rashidi P, “Deep ehr: a survey of recent advances in deep learning techniques for electronic health record (ehr) analysis,” IEEE journal of biomedical and health informatics, vol. 22, no. 5, pp. 1589–1604, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Janssen KJ, Donders ART, Harrell FE Jr, Vergouwe Y, Chen Q, Grobbee DE, and Moons KG, “Missing covariate data in medical research: to impute is better than to ignore,” Journal of clinical epidemiology, vol. 63, no. 7, pp. 721–727, 2010. [DOI] [PubMed] [Google Scholar]
- [12].Groenwold RH, “Informative missingness in electronic health record systems: the curse of knowing,” Diagnostic and prognostic research, vol. 4, no. 1, pp. 1–6, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].ElSayed NA, Aleppo G, Aroda VR, Bannuru RR, Brown FM, Bruemmer D, Collins BS, Gibbons CH, Giurini JM, Hilliard ME et al. , “12. retinopathy, neuropathy, and foot care: Standards of care in diabetes—2023,” Diabetes Care, vol. 46, no. Supplement_1, pp. S203–S215, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Holzinger A, Keiblinger K, Holub P, Zatloukal K, and Müller H, “Ai for life: Trends in artificial intelligence for biotechnology,” New Biotechnology, vol. 74, pp. 16–24, 2023. [DOI] [PubMed] [Google Scholar]
- [15].Calisto FM, Nunes N, and Nascimento JC, “Modeling adoption of intelligent agents in medical imaging,” International Journal of Human-Computer Studies, vol. 168, p. 102922, 2022. [Google Scholar]
- [16].Sivaraman V, Bukowski LA, Levin J, Kahn JM, and Perer A, “Ignore, trust, or negotiate: understanding clinician acceptance of ai-based treatment recommendations in health care,” in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–18. [Google Scholar]
- [17].Gleeson F, Revel M-P, Biederer J, Larici AR, Martini K, Frauenfelder T, Screaton N, Prosch H, Snoeckx A, Sverzellati N et al. , “Implementation of artificial intelligence in thoracic imaging—a what, how, and why guide from the european society of thoracic imaging (esti),” European Radiology, pp. 1–10, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Calisto FM, Fernandes J, Morais M, Santiago C, Abrantes JM, Nunes N, and Nascimento JC, “Assertiveness-based agent communication for a personalized medicine on medical imaging diagnosis,” in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–20. [Google Scholar]
- [19].Xie J and Yao B, “Hierarchical active learning for defect localization in 3d systems,” IISE Transactions on Healthcare Systems Engineering, no. just-accepted, pp. 1–45, 2023. [Google Scholar]
- [20].Setzer FC, Shi KJ, Zhang Z, Yan H, Yoon H, Mupparapu M, and Li J, “Artificial intelligence for the computer-aided detection of periapical lesions in cone-beam computed tomographic images,” Journal of endodontics, vol. 46, no. 7, pp. 987–993, 2020. [DOI] [PubMed] [Google Scholar]
- [21].Tajbakhsh N, Shin JY, Gurudu SR, Hurst RT, Kendall CB, Gotway MB, and Liang J, “Convolutional neural networks for medical image analysis: Full training or fine tuning?” IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1299–1312, 2016. [DOI] [PubMed] [Google Scholar]
- [22].Sanghvi HA, Patel RH, Agarwal A, Gupta S, Sawhney V, and Pandya AS, “A deep learning approach for classification of covid and pneumonia using densenet-201,” International Journal of Imaging Systems and Technology, vol. 33, no. 1, pp. 18–38, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [23].Calisto FM, Santiago C, Nunes N, and Nascimento JC, “Breastscreening-ai: Evaluating medical intelligent agents for human-ai interactions,” Artificial Intelligence in Medicine, vol. 127, p. 102285, 2022. [DOI] [PubMed] [Google Scholar]
- [24].Dontchos BN, Yala A, Barzilay R, Xiang J, and Lehman CD, “External validation of a deep learning model for predicting mammographic breast density in routine clinical practice,” Academic Radiology, vol. 28, no. 4, pp. 475–480, 2021. [DOI] [PubMed] [Google Scholar]
- [25].Zhu T, Li K, Herrero P, and Georgiou P, “Deep learning for diabetes: a systematic review,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 7, pp. 2744–2757, 2020. [DOI] [PubMed] [Google Scholar]
- [26].Wang Z, Liu C, and Yao B, “Multi-branching neural network for myocardial infarction prediction,” in 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE). IEEE, 2022, pp. 2118–2123. [Google Scholar]
- [27].Xiao C, Choi E, and Sun J, “Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review,” Journal of the American Medical Informatics Association, vol. 25, no. 10, pp. 1419–1428, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [28].Semeraro F, Parrinello G, Cancarini A, Pasquini L, Zarra E, Cimino A, Cancarini G, Valentini U, and Costagliola C, “Predicting the risk of diabetic retinopathy in type 2 diabetic patients,” Journal of Diabetes and its Complications, vol. 25, no. 5, pp. 292–297, 2011. [DOI] [PubMed] [Google Scholar]
- [29].Ogunyemi O and Kermah D, “Machine learning approaches for detecting diabetic retinopathy from clinical and public health records,” in AMIA Annual Symposium Proceedings, vol. 2015. American Medical Informatics Association, 2015, p. 983. [PMC free article] [PubMed] [Google Scholar]
- [30].Piri S, Delen D, Liu T, and Zolbanin HM, “A data analytics approach to building a clinical decision support system for diabetic retinopathy: developing and deploying a model ensemble,” Decision Support Systems, vol. 101, pp. 12–27, 2017. [Google Scholar]
- [31].Wang R, Miao Z, Liu T, Liu M, Grdinovac K, Song X, Liang Y, Delen D, and Paiva W, “Derivation and validation of essential predictors and risk index for early detection of diabetic retinopathy using electronic health records,” Journal of Clinical Medicine, vol. 10, no. 7, p. 1473, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [32].Wang R, Liang Y, Miao Z, and Liu T, “Bayesian analysis for imbalanced positive-unlabelled diagnosis codes in electronic health records,” The annals of applied statistics, vol. 17, no. 2, p. 1220, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [33].Sopharak A, Dailey MN, Uyyanonvara B, Barman S, Williamson T, Nwe KT, and Moe YA, “Machine learning approach to automatic exudate detection in retinal images from diabetic patients,” Journal of Modern optics, vol. 57, no. 2, pp. 124–135, 2010. [Google Scholar]
- [34].Roychowdhury S, Koozekanani DD, and Parhi KK, “Dream: diabetic retinopathy analysis using machine learning,” IEEE journal of biomedical and health informatics, vol. 18, no. 5, pp. 1717–1728, 2013. [DOI] [PubMed] [Google Scholar]
- [35].Reza AW, Eswaran C, and Dimyati K, “Diagnosis of diabetic retinopathy: automatic extraction of optic disc and exudates from retinal images using marker-controlled watershed transformation,” Journal of medical systems, vol. 35, pp. 1491–1501, 2011. [DOI] [PubMed] [Google Scholar]
- [36].Hasan DA, Zeebaree SR, Sadeeq MA, Shukur HM, Zebari RR, and Alkhayyat AH, “Machine learning-based diabetic retinopathy early detection and classification systems-a survey,” in 2021 1st Babylon International Conference on Information Technology and Science (BICITS). IEEE, 2021, pp. 16–21. [Google Scholar]
- [37].Das D, Biswas SK, and Bandyopadhyay S, “A critical review on diagnosis of diabetic retinopathy using machine learning and deep learning,” Multimedia Tools and Applications, vol. 81, no. 18, pp. 25 613–25 655, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [38].LeCun Y, Bengio Y, and Hinton G, “Deep learning,” nature, vol. 521, no. 7553, pp. 436–444, 2015. [DOI] [PubMed] [Google Scholar]
- [39].Xie J and Yao B, “Physics-constrained deep learning for robust inverse ecg modeling,” IEEE Transactions on Automation Science and Engineering, 2022. [Google Scholar]
- [40].Wang Z, Stavrakis S, and Yao B, “Hierarchical deep learning with generative adversarial network for automatic cardiac diagnosis from ecg signals,” Computers in Biology and Medicine, p. 106641, 2023. [DOI] [PubMed] [Google Scholar]
- [41].Xie J and Yao B, “Physics-constrained deep active learning for spatiotemporal modeling of cardiac electrodynamics,” Computers in Biology and Medicine, vol. 146, p. 105586, 2022. [DOI] [PubMed] [Google Scholar]
- [42].Zheng Z, Yan H, Setzer FC, Shi KJ, Mupparapu M, and Li J, “Anatomically constrained deep learning for automating dental cbct segmentation and lesion detection,” IEEE Transactions on Automation Science and Engineering, vol. 18, no. 2, pp. 603–614, 2020. [Google Scholar]
- [43].Ting DSW, Cheung CY-L, Lim G, Tan GSW, Quang ND, Gan A, Hamzah H, Garcia-Franco R, San Yeo IY, Lee SY et al. , “Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes,” Jama, vol. 318, no. 22, pp. 2211–2223, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [44].Wang S, Wang X, Hu Y, Shen Y, Yang Z, Gan M, and Lei B, “Diabetic retinopathy diagnosis using multichannel generative adversarial network with semisupervision,” IEEE Transactions on Automation Science and Engineering, vol. 18, no. 2, pp. 574–585, 2020. [Google Scholar]
- [45].Emmert-Streib F and Dehmer M, “Introduction to survival analysis in practice,” Machine Learning and Knowledge Extraction, vol. 1, no. 3, pp. 1013–1038, 2019. [Google Scholar]
- [46].Cismondi F, Fialho AS, Vieira SM, Reti SR, Sousa JM, and Finkelstein SN, “Missing data in medical databases: Impute, delete or classify?” Artificial intelligence in medicine, vol. 58, no. 1, pp. 63–72, 2013. [DOI] [PubMed] [Google Scholar]
- [47].Imani F, Cheng C, Chen R, and Yang H, “Nested gaussian process modeling and imputation of high-dimensional incomplete data under uncertainty,” IISE Transactions on Healthcare Systems Engineering, vol. 9, no. 4, pp. 315–326, 2019. [Google Scholar]
- [48].He H and Garcia EA, “Learning from imbalanced data,” IEEE Transactions on knowledge and data engineering, vol. 21, no. 9, pp. 1263–1284, 2009. [Google Scholar]
- [49].Krawczyk B, “Learning from imbalanced data: open challenges and future directions,” Progress in Artificial Intelligence, vol. 5, no. 4, pp. 221–232, 2016. [Google Scholar]
- [50].Mani I and Zhang I, “knn approach to unbalanced data distributions: a case study involving information extraction,” in Proceedings of workshop on learning from imbalanced datasets, vol. 126. ICML, 2003, pp. 1–7. [Google Scholar]
- [51].Chawla NV, Bowyer KW, Hall LO, and Kegelmeyer WP, “Smote: synthetic minority over-sampling technique,” Journal of artificial intelligence research, vol. 16, pp. 321–357, 2002. [Google Scholar]
- [52].Yao B, “Spatiotemporal modeling and optimization for personalized cardiac simulation,” IISE Transactions on Healthcare Systems Engineering, vol. 11, no. 2, pp. 145–160, 2021. [Google Scholar]
- [53].Cichocki A, Zdunek R, and Amari S.-i., “Nonnegative matrix and tensor factorization [lecture notes],” IEEE signal processing magazine, vol. 25, no. 1, pp. 142–145, 2007. [Google Scholar]
- [54].Acar E, Dunlavy DM, Kolda TG, and Mørup M, “Scalable tensor factorizations for incomplete data,” Chemometrics and Intelligent Laboratory Systems, vol. 106, no. 1, pp. 41–56, 2011. [Google Scholar]
- [55].Kolda TG and Bader BW, “Tensor decompositions and applications,” SIAM review, vol. 51, no. 3, pp. 455–500, 2009. [Google Scholar]
- [56].Sidiropoulos ND, De Lathauwer L, Fu X, Huang K, Papalexakis EE, and Faloutsos C, “Tensor decomposition for signal processing and machine learning,” IEEE Transactions on Signal Processing, vol. 65, no. 13, pp. 3551–3582, 2017. [Google Scholar]
- [57].Nocedal J and Wright SJ, Numerical optimization. Springer, 1999. [Google Scholar]
- [58].Moré JJ and Thuente DJ, “Line search algorithms with guaranteed sufficient decrease,” ACM Transactions on Mathematical Software (TOMS), vol. 20, no. 3, pp. 286–307, 1994. [Google Scholar]
- [59].Bai S, Kolter JZ, and Koltun V, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv preprint arXiv:1803.01271, 2018. [Google Scholar]
- [60].He K, Zhang X, Ren S, and Sun J, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778. [Google Scholar]
- [61].Ioffe S and Szegedy C, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International conference on machine learning. pmlr, 2015, pp. 448–456. [Google Scholar]
- [62].Srivastava N, Hinton G, Krizhevsky A, Sutskever I, and Salakhutdinov R, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014. [Google Scholar]
- [63].Zhou B, Cui Q, Wei X-S, and Chen Z-M, “Bbn: Bilateral-branch network with cumulative learning for long-tailed visual recognition,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9719–9728. [Google Scholar]
- [64].Huo Z, Qian X, Huang S, Wang Z, and Mortazavi BJ, “Density-aware personalized training for risk prediction in imbalanced medical data,” in Machine Learning for Healthcare Conference. PMLR, 2022, pp. 101–122. [Google Scholar]
- [65].Liu S, Liu S, Cai W, Pujol S, Kikinis R, and Feng D, “Early diagnosis of alzheimer’s disease with deep learning,” in 2014 IEEE 11th international symposium on biomedical imaging (ISBI). IEEE, 2014, pp. 1015–1018. [Google Scholar]
- [66].Garg A, Kirby JS, Lavian J, Lin G, and Strunk A, “Sex-and age-adjusted population analysis of prevalence estimates for hidradenitis suppurativa in the united states,” JAMA dermatology, vol. 153, no. 8, pp. 760–764, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [67].Chang C-C, Yeh J-H, Chiu H-C, Chen Y-M, Jhou M-J, Liu T-C, and Lu C-J, “Utilization of decision tree algorithms for supporting the prediction of intensive care unit admission of myasthenia gravis: A machine learning-based approach,” Journal of Personalized Medicine, vol. 12, no. 1, p. 32, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [68].Bottou L, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010: 19th International Conference on Computational Statistics, Paris, France, August 22–27, 2010, Keynote, Invited and Contributed Papers. Springer, 2010, pp. 177–186. [Google Scholar]
- [69].Kingma DP and Ba J, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014. [Google Scholar]
