NPJ Digital Medicine. 2025 Jan 2;8:1. doi: 10.1038/s41746-024-01410-3

A multitask deep learning model utilizing electrocardiograms for major cardiovascular adverse events prediction

Ching-Heng Lin 1,2, Zhi-Yong Liu 1, Pao-Hsien Chu 3,4, Jung-Sheng Chen 1, Hsin-Hsu Wu 5,6, Ming-Shien Wen 3,4, Chang-Fu Kuo 1,4, Ting-Yu Chang 4,7
PMCID: PMC11696183  PMID: 39747648

Abstract

Deep learning analysis of electrocardiography (ECG) may predict cardiovascular outcomes. We present a novel multi-task deep learning model, the ECG-MACE, which predicts the one-year first-ever major adverse cardiovascular events (MACE) using 2,821,889 standard 12-lead ECGs, including training (n = 984,895), validation (n = 422,061), and test (n = 1,414,933) sets, from Chang Gung Memorial Hospital database in Taiwan. Data from another independent medical center (n = 113,224) was retrieved for external validation. The model’s performance achieves AUROCs of 0.90 for heart failure (HF), 0.85 for myocardial infarction (MI), 0.76 for ischemic stroke (IS), and 0.89 for mortality. Furthermore, it outperforms the Framingham risk score at 5-year MACEs and 10-year mortality prediction. Over 10-year follow-ups, the model-predicted-positive group exhibits significantly higher MACE incidences than the model-predicted-negative group (relative incidence ratio: HF: 15.28; MI: 7.87; IS: 4.74; mortality: 13.18). Using solely ECGs, ECG-MACE effectively predicts one-year events and exhibits long-term anticipation. It provides potential applications in preventive medicine.

Subject terms: Outcomes research, Prognosis

Introduction

Cardiovascular disease (CVD) remains the leading cause of global mortality and contributes to a substantial burden on public health worldwide. In 2019, CVD accounted for 32% of all global deaths, and up to 85% of these deaths were caused by ischemic heart disease and stroke1. Aside from advanced management strategies, there is a growing emphasis on prevention, early detection, and prognostic prediction to address the multifaceted challenges posed by CVD.

The electrocardiogram (ECG) stands as a readily accessible tool widely used for screening and diagnosing cardiac diseases in routine clinical practice. It translates the electrical signal of the cardiac rhythm into interpretable information. With the advancement of technology, various machine learning or deep learning (DL) models have been created to analyze ECG for assessing arrhythmias2, cardiomyopathies3, and heart failure (HF)4,5. While DL-based ECG analyses have demonstrated comparable accuracy to expert readings and shown promise in predicting disease prognosis, their applications predominantly focused on populations with known CVDs. A few recent studies have created ECG-based DL models among hospital-based populations to predict future events, including 30-day to 5-year mortality6 or long-term CVDs7. Despite these models making a significant leap forward in integrating DL into ECG applications for preventive medicine, there remain knowledge gaps. Specifically, models predicting multiple major cardiovascular events in the shorter term (1-year duration) are very limited, and data related to the Asian population is lacking. Consequently, the development of DL-based ECG predictive models for application to the general population is still in its nascent stages6.

This study aims to build an ECG DL model to predict four categories of first-ever major adverse cardiovascular events (MACE)8 within a one-year timeframe: non-fatal myocardial infarction (MI), non-fatal ischemic stroke (IS), hospitalization-requiring HF, and all-cause mortality. Leveraging a large multi-center dataset covering over 2 million ECG readings and 1.7 million subjects, we adopted a multi-task DL approach, rather than a single-task approach, to exploit valuable information from diverse learning tasks, thereby enhancing the generalization performance of each task9. The model’s performance was validated for each category across different sex and age groups, and was externally validated for mortality. Additionally, the model’s assessment of long-term (3-, 5-, and 10-year) outcomes was verified and compared to risk stratification by the Framingham risk score (FRS)10.

Results

Figure 1 presents a schematic overview illustration of this study. The deep neural network model, ECG-MACE, was built to predict four major events. The model performance was extensively evaluated, including internal and external validation, comparison with FRS, and long-term outcome projections.

Fig. 1. Schematic overview of current study.

The ECG-MACE, a deep neural network model, is built to predict four events. The model’s performance was extensively evaluated, including internal and external validation, comparison with the Framingham risk score, and long-term outcome projections.

Patient characteristics

The study analyzed 2,821,889 ECGs from 1,078,629 individuals, divided into three independent sets: 984,895 ECGs in the training set (mean age = 60.1 ± 14.5 years), 422,061 ECGs in the validation set (mean age = 59.9 ± 14.5 years), and 1,414,933 ECGs in the test set (mean age = 60.2 ± 14.5 years). The demographic characteristics of each set are presented in Table 1. Age, sex, and major cardiovascular risk factors showed no important imbalance among the sets (absolute standardized mean difference (ASMD) all < 0.2, indicating minimal variability). The demographic characteristics of the Tri-Service General Hospital (TSGH) dataset used for external validation are presented in Supplementary Table 1.

Table 1.

Baseline characteristics and comorbidities of training, validation, and test sets

| Characteristics | Training (n = 984,895) | Validation (n = 422,061) | Test (n = 1,414,933) | ASMD a,b |
|---|---|---|---|---|
| Age (years), mean ± SD | 60.1 ± 14.5 | 59.9 ± 14.5 | 60.2 ± 14.5 | 0.014 |
| Age groups, n (%) | | | | |
| 30–39 | 93,894 (9.5) | 40,364 (9.6) | 134,634 (9.5) | |
| 40–49 | 163,379 (16.6) | 71,512 (16.9) | 233,800 (16.5) | |
| 50–59 | 232,622 (23.6) | 100,596 (23.8) | 334,944 (23.7) | |
| 60–69 | 229,868 (23.3) | 98,417 (23.3) | 330,446 (23.4) | |
| 70–79 | 171,597 (17.4) | 72,105 (17.1) | 245,855 (17.4) | |
| 80+ | 93,535 (9.5) | 39,067 (9.3) | 135,254 (9.6) | |
| Sex, n (%) | | | | 0.014 |
| Female | 491,671 (49.9) | 209,651 (49.7) | 705,153 (49.8) | |
| Male | 493,224 (50.1) | 212,410 (50.3) | 709,780 (50.2) | |
| Medical history, n (%) | | | | |
| Diabetes mellitus | 241,385 (24.5) | 102,133 (24.2) | 348,555 (24.6) | 0.010 |
| Hyperlipidaemia | 44,943 (4.6) | 17,900 (4.2) | 61,879 (4.4) | 0.016 |
| ESRD and CKD | 255,147 (25.9) | 107,930 (25.6) | 366,761 (25.9) | 0.009 |
| Hypertension | 471,237 (47.9) | 199,600 (47.3) | 678,196 (47.9) | 0.013 |
| Coronary artery disease | 150,989 (15.3) | 64,926 (15.4) | 218,625 (15.5) | 0.013 |
| Without any | 389,258 (39.5) | 168,610 (40.0) | 556,994 (39.4) | 0.012 |
| Myocardial infarction, n (%) | | | | |
| Within 3 months | 2943 (0.3) | 1237 (0.3) | 3986 (0.3) | 0.003 |
| Within 6 months | 4051 (0.4) | 1750 (0.4) | 5564 (0.4) | 0.003 |
| Within 9 months | 4935 (0.5) | 2071 (0.5) | 6678 (0.5) | 0.004 |
| Within 1 year | 5742 (0.6) | 2422 (0.6) | 7703 (0.5) | 0.005 |
| Heart failure, n (%) | | | | |
| Within 3 months | 5504 (0.6) | 2576 (0.6) | 8016 (0.6) | 0.007 |
| Within 6 months | 8209 (0.8) | 3777 (0.9) | 11,824 (0.8) | 0.007 |
| Within 9 months | 10,292 (1.0) | 4603 (1.1) | 14,810 (1.1) | 0.004 |
| Within 1 year | 12,079 (1.2) | 5364 (1.3) | 17,377 (1.2) | 0.004 |
| Ischemic stroke, n (%) | | | | |
| Within 3 months | 2740 (0.3) | 1194 (0.3) | 3776 (0.3) | 0.003 |
| Within 6 months | 4287 (0.4) | 1890 (0.5) | 5992 (0.4) | 0.004 |
| Within 9 months | 5591 (0.6) | 2466 (0.6) | 7925 (0.6) | 0.003 |
| Within 1 year | 6807 (0.7) | 2984 (0.7) | 9717 (0.7) | 0.002 |
| Death, n (%) | | | | |
| Within 3 months | 62,342 (6.3) | 26,534 (6.3) | 89,720 (6.3) | 0.002 |
| Within 6 months | 83,441 (8.5) | 35,160 (8.3) | 119,843 (8.5) | 0.005 |
| Within 9 months | 98,954 (10.1) | 41,810 (9.9) | 141,661 (10.0) | 0.005 |
| Within 1 year | 112,469 (11.4) | 47,361 (11.2) | 160,637 (11.4) | 0.006 |

n number of ECG, ASMD Absolute standardized mean difference, CKD chronic kidney disease, ESRD End-stage renal disease.

a An ASMD larger than 0.2 is considered a sign of important imbalance.

b The balance of covariates between the training and validation, training and test, and validation and test datasets for CGMH was measured using the ASMD. The largest ASMD value is reported.
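The source does not state the formula used for the ASMD. As an illustration only, the following Python sketch assumes the commonly used definition: the absolute difference between two groups divided by the pooled standard deviation. The function names are hypothetical, and values computed from the rounded figures in Table 1 need not match the reported ASMD exactly.

```python
import math
from itertools import combinations


def asmd_continuous(mean_a, sd_a, mean_b, sd_b):
    """ASMD for a continuous variable (assumed pooled-SD definition)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


def asmd_binary(p_a, p_b):
    """ASMD for a binary variable given as two proportions in [0, 1]."""
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return abs(p_a - p_b) / pooled_sd


def largest_pairwise_asmd(proportions):
    """Largest ASMD over all pairs of sets, as footnote b describes."""
    return max(asmd_binary(a, b) for a, b in combinations(proportions, 2))


# Rounded Table 1 values: age in training vs. validation sets.
age_asmd = asmd_continuous(60.1, 14.5, 59.9, 14.5)

# Rounded Table 1 values: hypertension in training, validation, test sets.
hypertension_asmd = largest_pairwise_asmd([0.479, 0.473, 0.479])
```

Both values fall far below the 0.2 threshold that footnote a treats as an important imbalance.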

ECG-MACE performance in predicting one-year events

Using the multi-task approach, ECG-MACE predicted 1-year HF, MI, IS, and all-cause mortality with area under the receiver operating characteristic curve (AUROC) values of 0.90, 0.85, 0.76, and 0.89, respectively. The model’s AUROC, sensitivity, and specificity are detailed in Table 2. Compared with the single-task learning models, which yielded AUROCs of 0.83 for HF, 0.76 for MI, 0.66 for IS, and 0.88 for mortality, the ECG-MACE multi-task model showed superior discriminative performance (Fig. 2). Notably, predicting IS was particularly challenging: the AUROC reached only 0.66 under the single-task learning model. The multi-task learning approach raised the AUROC for IS by 0.10 to a final value of 0.76, a relative improvement of about 15%. An age- and sex-stratified comparison of ECG-MACE’s performance was conducted on the test set (Fig. 3). Overall, ECG-MACE performed better in females, particularly for MI in those younger than 70. Predictive ability declined with age except for mortality, and prediction was notably more difficult in the group aged 80-89. While the AUROC for predicting mortality remained above 0.83, the AUROC for predicting IS reached only 0.59 for elderly men and 0.62 for elderly women.

Table 2.

The area under the receiver operating characteristic (AUROC), sensitivity and specificity of the ECG-MACE model on predicting one-year heart failure, myocardial infarction, ischemic stroke, and all-cause mortality

| Metric | Heart failure | Myocardial infarction | Ischemic stroke | Mortality |
|---|---|---|---|---|
| AUROC | 0.90 | 0.85 | 0.76 | 0.89 |
| Sensitivity | 0.83 | 0.76 | 0.75 | 0.81 |
| Specificity | 0.81 | 0.79 | 0.63 | 0.83 |

Fig. 2. Receiver operating characteristic (ROC) curves of ECG-MACE model and single prediction models.

a ROC curves of the one-year heart failure prediction. b ROC curves of the one-year ischemic stroke prediction. c ROC curves of the one-year myocardial infarction prediction. d ROC curves of the one-year all-cause mortality prediction.

Fig. 3. Stratified comparison of ECG-MACE model.

The ECG-MACE was tested on either sex and different age groups for one-year heart failure, myocardial infarction, ischemic stroke, and all-cause mortality prediction.

External validation

TSGH, a medical center located in Taipei, Taiwan, provided 113,224 ECGs from 54,036 adult patients for external validation. Given the restricted accessibility of comprehensive clinical information, our model underwent validation solely for predicting mortality using this external dataset. In this validation, ECG-MACE achieved an AUROC value of 0.83 for predicting 1-year mortality in the TSGH cohort (Supplementary Fig. 1).

ECG-MACE model’s ability to predict long-term events

While this study focused on predicting 1-year MACE, we also examined the model’s long-term predictive ability. Subjects in the test set were categorized and followed according to their model-predicted results (predicted-positive versus predicted-negative). We calculated the relative incidence ratio by dividing the incidence of the index event in the predicted-positive group by that in the predicted-negative group. Figure 4 illustrates the relative incidence ratio of new events over 1, 3, 5, and 10 years. Compared to the predicted-negative group, the predicted-positive group showed a remarkably higher incidence of HF (37-fold), MI (17-fold), IS (7-fold), and mortality (23-fold) within one year. This trend persisted at 3, 5, and 10 years, albeit with a slightly reduced relative risk in the predicted-positive group during the extended follow-up period; the 10-year relative incidence ratio remained as high as 15.28 for HF, 7.87 for MI, and 13.18 for mortality (Fig. 4).
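The relative incidence ratio described above can be sketched in a few lines of Python. This is a minimal illustration: the function name and the event counts are hypothetical and are not taken from the study data.

```python
def relative_incidence_ratio(events_pos, n_pos, events_neg, n_neg):
    """Incidence in the predicted-positive group divided by the
    incidence in the predicted-negative group."""
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("both groups must contain at least one subject")
    incidence_pos = events_pos / n_pos
    incidence_neg = events_neg / n_neg
    if incidence_neg == 0:
        raise ValueError("ratio is undefined without events in the negative group")
    return incidence_pos / incidence_neg


# Hypothetical counts: 370 events among 10,000 predicted-positive subjects
# versus 100 events among 100,000 predicted-negative subjects.
ratio = relative_incidence_ratio(370, 10_000, 100, 100_000)
```

With these made-up counts the incidences are 3.7% and 0.1%, so the predicted-positive group has roughly 37 times the incidence of the predicted-negative group.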

Fig. 4. Line graph of relative incidence ratio of new events over a follow-up period of 1, 3, 5, and 10 years.

The relative incidence ratio for the predicted-positive group was calculated based on dividing the index event incidence in the predicted-positive group by that in the predicted-negative group. The predicted-positive group shows a remarkably higher incidence of heart failure, myocardial infarction, ischemic stroke, and mortality within one year. This trend persists for 3, 5, and 10 years.

ECG-MACE performance compared to FRS

Subsequently, FRS was applied to a subgroup with documented smoking status to validate our model’s long-term predictive yield, involving 19,097 ECGs in the training set and 27,471 ECGs in the test set. We employed a logistic regression model incorporating FRS factors. The high-risk FRS group was designated “predicted-positive”. Our model exhibited a higher AUROC than high-risk FRS for predicting future MACEs. It consistently outperformed FRS in predicting major cardiovascular events in the short-term (1-year) and mid-term (5-year) follow-ups, as well as in predicting long-term (10-year) HF and mortality (Supplementary Table 2).

ECG-MACE performance on controls

In the test set, 583,473 ECGs were identified as controls, with their characteristics listed in Supplementary Table 3. ECG-MACE showed higher discriminative performance in the control subgroup than in the overall test set, as illustrated in Fig. 5. The model’s AUROC increased by 0.03 for HF to 0.93, by 0.05 for MI to 0.90, by 0.02 for IS to 0.78, and by 0.04 for all-cause mortality to 0.93, indicating the effectiveness of ECG-MACE in predicting adverse outcomes in the control group.

Fig. 5. The performance of ECG-MACE model tested on the healthy controls.

Improved predictive capabilities of the model exhibit an increase of 0.03 for heart failure, 0.05 for myocardial infarction, 0.02 for ischemic stroke, and 0.04 for all-cause mortality. AUROC: area under the receiver operating characteristic curve.

The explainability of the model’s behaviors

The interpretability of the model’s performance was facilitated using saliency maps (Supplementary Figs. 2–5), which helped to identify the main contributing features of ECGs to the prediction process. The saliency maps predominantly highlighted the ST-T segment of precordial leads, indicating its paramount importance in outcome prediction. Additionally, the P waves of limb leads, particularly lead I and lead II, also contributed as salient features.

Discussion

In this study, we developed the “ECG-MACE” model, a multi-task DL model utilizing resting ECG signals, to predict multiple first-ever cardiovascular events within one year, including MI, IS, HF, and all-cause mortality. The model demonstrated excellent performance, achieving AUROCs of 0.90 for HF, 0.85 for MI, and 0.89 for all-cause mortality, although with a more modest performance in predicting stroke (AUROC of 0.76). When applied to control subjects without major cardiovascular risk factors, the model exhibited even higher discriminative performance. Furthermore, external validation demonstrated robust predictive capability for mortality, achieving an AUROC of 0.83. Notably, this external validation set was collected using a different ECG system, supporting the model’s applicability across ECG devices.

In comparison to FRS, our model outperformed in predicting 1-year and 5-year events while achieving comparable performance for 10-year prediction, with the AUROCs for predicting HF and all-cause mortality surpassing FRS. ECG-MACE not only identified individuals at risk for near-term MACE events within one year but effectively demonstrated long-term risk prediction, providing valuable insights for primary prevention.

In recent decades, guidelines have steadily emphasized the significance of risk assessment in primary prevention strategies for atherosclerotic cardiovascular disease (ASCVD)11. Widely used risk estimation systems, such as the FRS, the pooled cohort equations12, and the ASCVD risk calculator, typically focus on assessing the 10-year risk by relying on detailed demographics, personal history (smoking), comorbidities, and laboratory tests. However, a noteworthy gap exists in tools for estimating shorter-term ASCVD risk in primary prevention. Moreover, most 10-year ASCVD assessment tools were developed based on Western populations11, potentially yielding inconsistent or heterogeneous results when applied to the Asian population13. Our ECG-MACE model provides precise predictions for several cardiovascular outcomes in 1 year, relying solely on resting ECGs and eliminating the need for extensive personal information or laboratory tests. This tool is feasible and efficient for daily clinical practice, especially suggesting applicability in Asian populations.

Machine learning and DL models now play significant roles in individual-level ASCVD risk prediction. For instance, a convolutional neural network-based model was built to distinguish normal and MI ECG beats, achieving average accuracies of 93.53% and 95.22% with and without ECG signal noise removal, respectively14. In a registry-based study, a DL model using 51 variables (demographic information, clinical presentations, and others) from the Korea Acute Myocardial Infarction Registry predicted MACE occurrence up to twelve months after hospital admission, showing AUROCs of 0.97, 0.94, and 0.96 at one-, six-, and twelve-month follow-ups, respectively15. Additionally, Betancur et al. trained a LogitBoost model by combining clinical data and structured information from myocardial perfusion imaging to predict 3-year MACE, surpassing a model with only imaging structured data (AUROC of 0.81 vs. 0.78)16. While these studies and others17–19 showed good performance with single-modality machine learning/DL models, they were often limited by structured data inputs or complex variables. In contrast, our proposed model can handle unstructured data (raw ECG signals) and jointly learn representative features through the multimodal multi-task learning approach, achieving remarkable performance in predicting MI, HF, and mortality.

The fundamental assumption of multi-task DL is that the learning tasks are interrelated, so that jointly learning tasks via shared information can enhance overall performance. Many studies have shown that multi-task DL models may perform better than single-task models20–23. One example in medicine is DeepBeat, a multi-task DL model for real-time atrial fibrillation detection in wearable photoplethysmography devices. Compared to a single-task model, the multi-task learning approach significantly improved the model’s performance, increasing the F1 score from 0.54 to 0.96 (ref. 24). Similarly, our multi-task model exhibited superior performance compared to the single-task approach during model development. Because major cardiovascular events are strongly correlated, multi-task learning, which trains on several associated outcomes jointly, is a practical approach to predicting cardiovascular outcomes. The robust performance of our model in predicting MI, HF, and mortality may be attributed to the well-established links between MI, subsequent HF, and high mortality within one year25,26.

The ECG-MACE demonstrated superior performance among females in predicting MI, particularly in individuals aged 40 to 70. Many studies over the decades have shown that MI incidence is higher in men than in women27,28. Compared to women, men with MI tend to have more cardiovascular risk factors, which may explain the younger onset of MI in men27. With age, the gap in the number of risk factors between sexes diminishes28. The better MI prediction performance in women by ECG-MACE could be attributed to the higher prevalence of comorbidities and risk factors in men within this age group, leading to more diverse disease profiles in men than in women. Additionally, earlier studies noted gender-related ECG differences after MI, with women showing a higher percentage of ST-T changes in lateral precordial leads29. As seen in the model’s saliency maps (discussed below), these ECG differences may also contribute to better performance in predicting female MI.

In contrast, our model exhibited only acceptable performance in predicting IS (AUROC = 0.76). In the test cohort, subjects predicted as positive for IS had a one-year incidence 7.32 times that of subjects predicted as negative, a more modest ratio than for the other outcomes. This might be attributed to the heterogeneous nature of stroke etiology30, with only specific subtypes of IS (such as atherosclerotic and cardioembolic) showing a stronger association with other CVDs31. Additionally, our model demonstrated a low F1 score (Supplementary Table 2), which could be influenced by class imbalance with low event incidence in our cohort32, a common challenge in other studies as well7. The F1 score alone may not comprehensively reflect our model’s performance.

The interpretation of our model is enhanced through the saliency maps. Prominent features contributing to the model include the ST-T segment of precordial leads and the P waves of limb leads. Precordial ST-T changes are not only characteristic findings during acute MI33,34 but are also associated with left ventricular aneurysm, which can lead to subsequent HF35. ST-T depression is also commonly observed in patients with IS36,37. Meanwhile, P wave parameters are increasingly recognized as indicators of poor health outcomes, including IS, sudden cardiac death, and heart failure38. These salient features provide perspectives for physicians in evaluating cardiovascular prognosis.

While our model shows promising results in predicting multiple cardiovascular events, it has certain limitations that warrant consideration. Firstly, this study is ethnically confined to the Han-Chinese population, which constitutes 95% of Taiwan’s population, limiting the generalizability of our findings to other ethnic groups. Although our data collection spanned urban, suburban, and rural areas from north to south in Taiwan, caution should be exercised when extrapolating these results to diverse populations. Secondly, being a hospital-based study, the dataset predominantly comprised patients with pre-existing health conditions, potentially introducing bias when applied to healthier subjects. Nonetheless, our cohort included subjects from health check centers (4.4%) and outpatient clinics (30.8%). We also tested the model on a subgroup (the controls), defined as subjects with no major cardiovascular risk factors. This diverse inclusion enriches the real-world relevance of our findings, offering insights into the model’s adaptability across various health contexts. Thirdly, owing to data limitations, we cannot ascertain whether patients who had no subsequent medical activities in our medical system experienced MACE. Consequently, these right-censored time-to-event data were excluded from the study. The inability to handle right-censored time-to-event data represents a limitation of our model. Finally, the inherent opacity of DL models presents a challenge, as we are unable to fully comprehend the decision-making process of artificial intelligence, highlighting the importance of interpretability in model outcomes. The saliency maps (Supplementary Figs. 2–5) may supply some clues to help understand the model’s decision-making process and potentially identify ECG features with predictive value.

In conclusion, our ECG-MACE model provides accurate prediction of multiple cardiovascular events within 1 year, encompassing MI, HF, and all-cause mortality. The model’s predictive capability extends up to 10 years, suggesting potential value in facilitating early intervention. The multi-task learning approach strengthens the capacity of DL-based ECG analysis in preventive medicine. The modest performance in predicting 1-year IS reflects the complexity of stroke. Future research should incorporate subjects from diverse ethnicities and with varied health conditions in a prospective study design, to broaden the clinical applicability of this economical and convenient ECG tool.

Methods

Study populations and data sources

This study encompassed all individuals who underwent a standardized 12-lead ECG at Chang Gung Memorial Hospital (CGMH) between October 2007 and December 2019. CGMH, the largest hospital system in Taiwan, comprises seven branches (Keelung, Taipei, Linkou, Taoyuan, Yunlin, Chiayi, and Kaohsiung) across diverse settings, including urban, suburban, and rural areas, with two medical centers (Linkou and Kaohsiung branches). All ECG records acquired within the CGMH system were linked to the Chang Gung Research Database, an electronic health record system for patients visiting any CGMH branch. The initial cohort included 4,932,544 ECGs from 1,684,294 individuals, with exclusions for those under 30 years old (given the age distribution of severe congenital heart disease mainly below this age)39, subjects with pacemakers, those without medical records in our system in the year following the ECG, and those with a history of HF, MI, or IS. ECGs with acquisition dates matching the date of diagnosis were also excluded. Ultimately, the analysis included 2,821,889 ECGs from 1,078,629 individuals (Fig. 1). Patients’ demographic information and medical history were extracted from Chang Gung Research Database electronic medical records at the time of ECG recording. The distribution of patient sources, including outpatient clinics of cardiology, emergency room, health check center, intensive care units, and others, is detailed in Supplementary Table 4.

Using simple random sampling, patients were randomly selected to compose a training set, a validation set, and a test set. This approach ensured equal-probability selection without replacement for the three sets. Importantly, the selection procedure assigned each patient exclusively to one of the three sets, i.e., all ECGs of the same patient were assigned to the same set. For external validation, we collected an additional 113,224 ECGs from a total of 54,036 adult patients at Tri-Service General Hospital (TSGH) between April 2010 and October 2020. This dataset was used to independently assess and validate our model’s ability to predict mortality. Figure 6 illustrates the data selection process.
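A patient-level split of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the authors’ procedure: the function name, the record layout, the seed, and the 35/15/50 patient fractions are hypothetical (the fractions only approximate the ECG counts reported for the three sets).

```python
import random


def split_by_patient(ecg_records, fractions=(0.35, 0.15, 0.50), seed=0):
    """Assign every patient, and hence all of that patient's ECGs,
    to exactly one of the training, validation, and test sets.

    ecg_records: list of (patient_id, ecg_id) tuples.
    fractions: assumed share of patients for training and validation;
               the test set receives all remaining patients.
    """
    patient_ids = sorted({pid for pid, _ in ecg_records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)  # equal-probability order, no replacement

    n_train = int(len(patient_ids) * fractions[0])
    n_val = int(len(patient_ids) * fractions[1])

    assignment = {}
    for index, pid in enumerate(patient_ids):
        if index < n_train:
            assignment[pid] = "train"
        elif index < n_train + n_val:
            assignment[pid] = "validation"
        else:
            assignment[pid] = "test"

    splits = {"train": [], "validation": [], "test": []}
    for pid, ecg_id in ecg_records:
        splits[assignment[pid]].append((pid, ecg_id))
    return splits


# Hypothetical data: 20 patients with 3 ECGs each.
records = [(f"P{p:02d}", f"P{p:02d}-E{e}") for p in range(20) for e in range(3)]
splits = split_by_patient(records)
```

Splitting on patient identifiers rather than on individual ECGs prevents recordings from one patient from appearing in both the training and test sets.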

Fig. 6. Summary of the data used in the study.

The data from the Chang Gung Memorial Hospital (CGMH) was split into training, validation, and test sets. Data for external validation were collected from the Tri-Service General Hospital (TSGH).

This study was approved by the Institutional Review Board of Chang Gung Medical Foundation (IRB No.:202101339B0). Informed consent was waived as no identifiable information was used in this retrospective analysis. All procedures conducted in this study adhered to the relevant clinical research guidelines and regulations.

Definition of controls

From the test set, we identified a control subgroup that had no history of major cardiovascular comorbidities, including hypertension, diabetes mellitus, chronic kidney disease, end-stage renal disease, and dyslipidemia. Diagnostic codes associated with these conditions, using the International Classification of Diseases, Ninth and Tenth Revision, Clinical Modification (ICD-9-CM and ICD-10-CM), are listed in Supplementary Table 5. The model’s performance was evaluated both on the overall study population and the controls.

Definitions of outcomes

The MACEs investigated in this study included non-fatal MI, non-fatal IS, hospitalization-requiring HF, and all-cause mortality. The diagnosis of MI, IS, and HF was identified using the ICD-9-CM and ICD-10-CM codes (Supplementary Table 6) and determined based on the first two diagnoses at discharge. Survival status was cross-linked to the National Registry of Deaths provided by the Ministry of Health and Welfare in Taiwan.

ECG acquisition

Resting standard 12-lead ECGs with 10-second voltage-time traces were captured using a MAC5000 ECG machine (GE Healthcare, Chicago, IL, USA) at a sampling rate of 500 Hz. These ECGs were digitally archived in the MUSE™ Cardiology Information system (General Electric, Ltd., 2012). The ECGs from the TSGH dataset were acquired using a Philips system, with the same sampling rate and recording time settings as those employed for the CGMH dataset. Each standard 12-lead ECG was subsequently converted into a 12 × 5000 matrix, serving as the input for the model. No additional padding was applied to either dataset.

Multi-task deep learning model development and training procedure

A multi-task learning model, named ECG-MACE, was developed to predict MACEs by learning novel features from 12-lead ECG data. Many studies have demonstrated the ability of ResNet and its variants in ECG feature extraction40–43. Using a hard parameter sharing strategy, we used a ResNet-18 neural network to extract features from patients’ 12-lead ECGs. The extracted features were then fed into four subnetworks for predicting MACEs.

The proposed overall model architecture is illustrated in Fig. 7. This multi-task model architecture contained a convolution layer followed by a residual block of static channel and three residual blocks of incremental channel. The output of each convolutional layer was followed by batch normalization for distribution normalization and fed into a rectified linear unit activation function layer and a dropout layer for model regularization. The output of the last block was fed into a hybrid (max and average) pooling layer44. Finally, the output of the hybrid pooling layer was directed to four fully connected subnetworks. Each subnetwork was equipped with a single softmax layer, serving as the final output layer to predict MI, IS, HF, and all-cause mortality, respectively.

Fig. 7. The architecture of ECG-MACE model with multitask learning.

The output of the hybrid pooling layer is directed to four fully connected subnetworks, each equipped with a single softmax layer, serving as the final output layer to predict death, ischemic stroke (IS), myocardial infarction (MI), and heart failure (HF). Conv, convolutional layer; FC layer, fully connected layer.

In multi-task learning, subtask losses are commonly combined by summing them or by using a manually tuned weighted sum. However, determining the best weights experimentally is time-consuming and computationally expensive. In this study, we addressed this issue by adopting the uncertainty weighted loss45: $\mathrm{Loss}_{\mathrm{total}} = \sum_{i} w_i L_i$, which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. The weights $w_i$ are trainable parameters based on the uncertainty of each task. In our binary classification task, the uncertainty weighted loss is $\mathrm{Loss}_{\mathrm{total}} = \sum_{i} w_i \left( -\sum_{j}^{2} y_j \log p_j \right)_i$, where $y_j$ and $p_j$ denote the true label and predicted probability of the true label, respectively. Given the imbalanced data with a large proportion belonging to “no event”, we further modified the loss function by incorporating the Focal loss46. The Focal loss function applies a modulating term to the cross-entropy loss to focus learning on hard misclassified samples. The final loss function used in our proposed model is $\mathrm{Loss}_{\mathrm{total}} = \sum_{i} w_i \left( -\sum_{j}^{2} y_j (1 - p_j)^{\gamma} \log p_j \right)_i$, where $\gamma$ is the focusing parameter to be tuned.
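To make the combination of Focal loss and uncertainty weighting concrete, the following pure-Python sketch computes a total loss over four tasks. It is not the authors’ implementation. It assumes the standard Focal loss modulating factor $(1 - p)^{\gamma}$ and a common parameterization of uncertainty weighting in which each task has a trainable log-variance $s_i$, the weight is $w_i = e^{-s_i}$, and a regularizing term $0.5\,s_i$ is added; the regularizing term, the default $\gamma = 2$, and all names and numbers are assumptions not stated in the source.

```python
import math


def focal_loss(p_true, gamma=2.0, eps=1e-12):
    """Focal loss for one sample, given the predicted probability of
    its true class. With gamma = 0 this reduces to cross-entropy."""
    p = min(max(p_true, eps), 1.0)  # guard against log(0)
    return -((1.0 - p) ** gamma) * math.log(p)


def task_loss(p_true_batch, gamma=2.0):
    """Mean Focal loss over a batch for one task."""
    return sum(focal_loss(p, gamma) for p in p_true_batch) / len(p_true_batch)


def uncertainty_weighted_total(task_losses, log_vars):
    """Combine task losses with weights w_i = exp(-s_i), where s_i is
    an assumed trainable log-variance, plus the 0.5 * s_i term that
    keeps the weights from collapsing to zero."""
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + 0.5 * s
    return total


# Hypothetical predicted probabilities of the true class for four tasks.
batches = {
    "HF": [0.95, 0.80, 0.30],
    "MI": [0.90, 0.60, 0.20],
    "IS": [0.70, 0.55, 0.40],
    "mortality": [0.97, 0.85, 0.50],
}
losses = [task_loss(p) for p in batches.values()]
total = uncertainty_weighted_total(losses, log_vars=[0.0, 0.0, 0.0, 0.0])
```

With all log-variances at zero the total is simply the sum of the four task losses; during training, an optimizer would update the log-variances together with the network weights.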

ECG-MACE was optimized using Adam stochastic gradient descent or other suitable optimizers. Model hyperparameters were selected via grid search hyperparameter optimization in the validation set. All training was performed on an NVIDIA DGX-1 platform with two V100 GPUs and 32 GB of RAM per GPU (Santa Clara, CA) provided by the Center for Artificial Intelligence in Medicine at CGMH.

Statistical analysis and model evaluation

We used the ASMD to compare the demographics among the three sets for statistical significance, reporting the largest ASMD value. An ASMD larger than 0.2 indicated an important imbalance (statistical difference). Performance measures and model behaviors were exclusively evaluated in the test set. For classification models, the discriminative ability was assessed using the AUROC. The Youden index was used to assess optimal cut-off values. All statistical analyses employed SAS statistical software Ver.9.4. ECG-MACE’s behavior was evaluated using saliency maps47 to objectively identify salient regions in a particular lead contributing to MACEs. Four baseline predictive models for MI, HF, IS, and all-cause mortality were constructed following the same manner as the ECG-MACE model. The FRS Cox proportional hazards model was fit with sex, age at the time of ECG recording, total cholesterol, high-density lipoprotein cholesterol, diastolic and systolic blood pressure, diabetes, and smoking status. We selected patients who had documented information regarding their smoking status for FRS-Cox training and testing. The training set included 19,097 ECGs, while the test set comprised 27,471 ECGs.
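The AUROC and the Youden index cut-off mentioned above can be illustrated with a small pure-Python sketch. The analyses in the study were run in SAS; the function names and the toy labels and scores below are hypothetical, and the quadratic-time AUROC is intended only for small examples.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(labels, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1,
    classifying a sample as positive when score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy example: three events among eight ECGs.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.35, 0.60, 0.70, 0.90]
area = auroc(labels, scores)
cutoff, j_stat = youden_cutoff(labels, scores)
```

In this toy example the cut-off that maximizes the Youden index is 0.70, which trades one missed event for perfect specificity.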

Supplementary information

Supplemental Material (2.5MB, pdf)

Acknowledgements

The authors thank the Center for Artificial Intelligence in Medicine, Chang Gung Memorial Hospital for supporting computational resources, as well as Tri-Service General Hospital for providing external validation data. This work was supported by funding from the Chang Gung Memorial Hospital Research Project (CORPG3L0461, CORPG3N0381 and CLRPG3H0015) and the Ministry of Science and Technology, Taiwan (MOST111–2321-B-182A-004).

Author contributions

C.H.L. and T.Y.C. conceived the idea of the study. C.H.L. and Z.Y.L. implemented the machine learning approaches. C.H.L. and T.Y.C. drafted the manuscript. C.H.L. and Z.Y.L. conducted experiments. J.S.C. conducted the statistical analysis. T.Y.C. and C.F.K. interpreted the results and provided practical suggestions for this study. P.H.C. and M.S.W. collected and provided datasets. H.H.W. provided advice and input on the manuscript. All authors contributed to the review of the manuscript and approved the final version.

Data availability

The datasets generated and/or analyzed in the study are not publicly available due to institutional policy and the privacy of study individuals but are available from the corresponding author upon request under the data-sharing agreements between institutions.

Code availability

The underlying code for this study is not publicly available for proprietary reasons but may be made available to specialists and researchers upon reasonable request to the corresponding author.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41746-024-01410-3.

References

  • 1. Global Health Estimates: Life expectancy and leading causes of death and disability. (2020).
  • 2. Attia, Z. I. et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 394, 861–867 (2019).
  • 3. Ko, W. Y. et al. Detection of Hypertrophic Cardiomyopathy Using a Convolutional Neural Network-Enabled Electrocardiogram. J. Am. Coll. Cardiol. 75, 722–733 (2020).
  • 4. Huang, Y.-C. et al. Artificial intelligence-enabled electrocardiographic screening for left ventricular systolic dysfunction and mortality risk prediction. Front. Cardiovasc. Med. 10, 10.3389/fcvm.2023.1070641 (2023).
  • 5. Li, X. M. et al. Electrocardiogram-based artificial intelligence for the diagnosis of heart failure: a systematic review and meta-analysis. J. Geriatr. Cardiol. 19, 970–980 (2022).
  • 6. Sun, W. et al. Towards artificial intelligence-based learning health system for population-level mortality prediction using electrocardiograms. NPJ Digit Med. 6, 21 (2023).
  • 7. Hughes, J. W. et al. A deep learning-based electrocardiogram risk score for long term cardiovascular death and disease. npj Digital Med. 6, 169 (2023).
  • 8. Bosco, E., Hsueh, L., McConeghy, K. W., Gravenstein, S. & Saade, E. Major adverse cardiovascular event definitions used in observational analysis of administrative databases: a systematic review. BMC Med. Res. Methodol. 21, 241 (2021).
  • 9. Caruana, R. Multitask learning. Mach. Learn. 28, 41–75 (1997).
  • 10. D’Agostino, R. B. Sr. et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 117, 743–753 (2008).
  • 11. Lloyd-Jones, D. M. et al. Use of Risk Assessment Tools to Guide Decision-Making in the Primary Prevention of Atherosclerotic Cardiovascular Disease: A Special Report From the American Heart Association and American College of Cardiology. J. Am. Coll. Cardiol. 73, 3153–3167 (2019).
  • 12. Goff, D. C. et al. 2013 ACC/AHA Guideline on the Assessment of Cardiovascular Risk. Circulation 129, S49–S73 (2014).
  • 13. Zhang, Y. et al. Cardiovascular risk assessment tools in Asia. J. Clin. Hypertens. (Greenwich) 24, 369–377 (2022).
  • 14. Acharya, U. R. et al. Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals. Inf. Sci. 415, 190–198 (2017).
  • 15. Kim, Y. J., Saqlian, M. & Lee, J. Y. Deep learning–based prediction model of occurrences of major adverse cardiac events during 1-year follow-up after hospital discharge in patients with AMI using knowledge mining. Personal. Ubiquitous Comput. 26, 259–267 (2022).
  • 16. Betancur, J. et al. Prognostic value of combined clinical and myocardial perfusion imaging data using machine learning. JACC: Cardiovasc. Imaging 11, 1000–1009 (2018).
  • 17. Alaa, A. M., Bolton, T., Di Angelantonio, E., Rudd, J. H. & Van der Schaar, M. Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants. PLoS One 14, e0213653 (2019).
  • 18. Weng, S. F., Reps, J., Kai, J., Garibaldi, J. M. & Qureshi, N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS One 12, e0174944 (2017).
  • 19. Wang, J. et al. Risk Prediction of Major Adverse Cardiovascular Events Occurrence Within 6 Months After Coronary Revascularization: Machine Learning Study. JMIR Med. Inform. 10, e33395 (2022).
  • 20. Amyar, A., Modzelewski, R., Li, H. & Ruan, S. Multi-task deep learning based CT imaging analysis for COVID-19 pneumonia: Classification and segmentation. Computers Biol. Med. 126, 104037 (2020).
  • 21. Geng, Q. et al. In Healthcare. 1000 (MDPI).
  • 22. Ji, J., Chen, X., Luo, C. & Li, P. A deep multi-task learning approach for ECG data analysis. In 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI). Las Vegas, NV, USA, pp. 124–127 (IEEE, 2018).
  • 23. Wu, S. et al. Using Multi-Task Learning-Based Framework to Detect ST-Segment and J-Point Deviation From Holter. Front. Physiol. 13, 912739 (2022).
  • 24. Torres-Soto, J. & Ashley, E. A. Multi-task deep learning for cardiac rhythm detection in wearable devices. NPJ Digital Med. 3, 1–8 (2020).
  • 25. Jenča, D. et al. Heart failure after myocardial infarction: incidence and predictors. ESC Heart Fail. 8, 222–237 (2021).
  • 26. Law, M. R., Watt, H. C. & Wald, N. J. The Underlying Risk of Death After Myocardial Infarction in the Absence of Treatment. Arch. Intern. Med. 162, 2405–2410 (2002).
  • 27. Anand, S. S. et al. Risk factors for myocardial infarction in women and men: insights from the INTERHEART study. Eur. Heart J. 29, 932–940 (2008).
  • 28. Remfry, E. et al. Sex-based differences in risk factors for incident myocardial infarction and stroke in the UK Biobank. Eur. Heart J. Qual. Care Clin. Outcomes 10, 132–142 (2024).
  • 29. Mieszczanska, H. et al. Gender-related differences in electrocardiographic parameters and their association with cardiac events in patients after myocardial infarction. Am. J. Cardiol. 101, 20–24 (2008).
  • 30. Amarenco, P. et al. The ASCOD Phenotyping of Ischemic Stroke (Updated ASCO Phenotyping). Cerebrovasc. Dis. 36, 1–5 (2013).
  • 31. Sun, W. et al. Stroke and Myocardial Infarction: A Bidirectional Mendelian Randomization Study. Int. J. Gen. Med. 14, 9537–9545 (2021).
  • 32. Japkowicz, N. & Stephen, S. The class imbalance problem: A systematic study. Intell. data Anal. 6, 429–449 (2002).
  • 33. Kléber, A. G. ST-segment elevation in the electrocardiogram: a sign of myocardial ischemia. Cardiovasc. Res. 45, 111–118 (2000).
  • 34. Thygesen, K. et al. Fourth Universal Definition of Myocardial Infarction. J. Am. Coll. Cardiol. 72, 2231–2264 (2018).
  • 35. Lindsay, J., Dewey, R. C., Talesnick, B. S. & Nolan, N. G. Relation of ST-segment elevation after healing of acute myocardial infarction to the presence of left ventricular aneurysm. Am. J. Cardiol. 54, 84–86 (1984).
  • 36. Khechinashvili, G. & Asplund, K. Electrocardiographic Changes in Patients with Acute Stroke: A Systematic Review. Cerebrovasc. Dis. 14, 67–76 (2002).
  • 37. Purushothaman, S., Salmani, D., Prarthana, K. G., Bandelkar, S. M. & Varghese, S. Study of ECG changes and its relation to mortality in cases of cerebrovascular accidents. J. Nat. Sci. Biol. Med. 5, 434–436 (2014).
  • 38. Chen, L. Y. et al. P Wave Parameters and Indices: A Critical Appraisal of Clinical Utility, Challenges, and Future Research—A Consensus Document Endorsed by the International Society of Electrocardiology and the International Society for Holter and Noninvasive Electrocardiology. Circulation: Arrhythmia Electrophysiol. 15, e010435 (2022).
  • 39. Marelli, A. J., Mackie, A. S., Ionescu-Ittu, R., Rahme, E. & Pilote, L. Congenital heart disease in the general population: changing prevalence and age distribution. Circulation 115, 163–172 (2007).
  • 40. Han, C. & Shi, L. ML–ResNet: A novel network to detect and locate myocardial infarction using 12 leads ECG. Computer methods Prog. biomedicine 185, 105138 (2020).
  • 41. Zhu, Z. et al. Classification of Cardiac Abnormalities From ECG Signals Using SE-ResNet. In 2020 Computing in Cardiology, Rimini, Italy, pp. 1–4 (IEEE, 2020).
  • 42. Sakli, N. et al. ResNet-50 for 12-Lead Electrocardiogram Automated Diagnosis. Comput. Intell. Neurosci. IEEE, 10.1155/2022/7617551 (2022).
  • 43. Weimann, K. & Conrad, T. O. Transfer learning for ECG classification. Sci. Rep. 11, 1–12 (2021).
  • 44. Tong, Z. & Tanaka, G. Hybrid pooling for enhancement of generalization ability in deep convolutional neural networks. Neurocomputing 333, 76–85 (2019).
  • 45. Kendall, A., Gal, Y. & Cipolla, R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 7482–7491 (2018).
  • 46. Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. In Proc. of the IEEE International Conference on Computer Vision. 2980–2988 (2017).
  • 47. Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013).

Articles from NPJ Digital Medicine are provided here courtesy of Nature Publishing Group
