Author manuscript; available in PMC: 2022 Mar 13.
Published in final edited form as: JACC Cardiovasc Imaging. 2021 Oct 13;15(3):395–410. doi: 10.1016/j.jcmg.2021.08.004

Using deep learning algorithms to simultaneously identify right and left ventricular dysfunction from the electrocardiogram

Akhil Vaid 1,2,3, Kipp W Johnson 1,2,3, Marcus A Badgeley 4, Sulaiman S Somani 2, Mesude Bicak 2,3, Isotta Landi 2, Adam Russak 5, Shan Zhao 2,6, Matthew Levin 1,6, Robert S Freeman 1,7,8, Alexander W Charney 3,9,10, Atul Kukar 11, Bette Kim 12, Tatyana Danilov 13,14, Stamatios Lerakis 14,15, Edgar Argulian 14,15, Jagat Narula 14,15,16, Girish N Nadkarni 1,2,17, Benjamin S Glicksberg 1,2,3
PMCID: PMC8917975  NIHMSID: NIHMS1747211  PMID: 34656465

Abstract

Background.

Rapid evaluation of left and right ventricular function using deep learning (DL) on electrocardiograms (ECG) can assist diagnostic workflow. However, DL tools to estimate right ventricular (RV) function do not exist, while ones to estimate left ventricular (LV) function are restricted to quantification of very low LV function only.

Objectives.

This study sought to develop deep learning models capable of comprehensively quantifying left and right ventricular dysfunction from ECG data in a large, diverse population.

Methods.

A multi-center study was conducted with data from five New York City hospitals; four for internal testing and one serving as external validation. We created novel DL models to classify Left Ventricular Ejection Fraction (LVEF) into categories derived from the latest universal definition of heart failure, estimate LVEF through regression, and predict a composite outcome of either RV systolic dysfunction or RV dilation.

Results.

We obtained echocardiogram LVEF estimates for 147,636 patients paired to 715,890 ECGs. We used Natural Language Processing (NLP) to extract RV size and systolic function information from 404,502 echocardiogram reports paired to 761,510 ECGs for 148,227 patients. For LVEF classification in internal testing, Area Under Curve (AUC) at detection of LVEF<=40%, 40%<LVEF<=50%, and LVEF>50% was 0.94 (95% CI:0.94–0.94), 0.82 (0.81–0.83), and 0.89 (0.89–0.89) respectively. For external validation, these results were 0.94 (0.94–0.95), 0.73 (0.72–0.74) and 0.87 (0.87–0.88). For regression, the mean absolute error was 5.84% (5.82–5.85) for internal testing, and 6.14% (6.13–6.16) in external validation. For prediction of the composite RV outcome, AUC was 0.84 (0.84–0.84) in both internal testing and external validation.

Conclusions.

DL on ECG data can be utilized to create inexpensive screening, diagnostic, and predictive tools for both LV and RV dysfunction. Such tools may bridge the applicability of ECGs and echocardiography, and enable prioritization of patients for further interventions for failure of either side progressing to biventricular disease.

Keywords: Artificial Intelligence, Deep Learning, Machine Learning, HFrEF, Right Ventricular Dilation, Right Ventricular Systolic Dysfunction, echocardiography, electrocardiogram, ECG, EKG, LVEF, Left Ventricular Ejection Fraction, Left Heart Failure, Right Heart Failure

Introduction

Heart failure represents a significant health burden, with an estimated 6.2 million people affected in the United States1, and at least 64 million people worldwide2. Considerable attention has been paid to the pathophysiology of left ventricular (LV) failure. However, owing to the anatomical and functional proximity of the ventricles, either LV or right ventricular (RV) failure can precipitate biventricular involvement, with even subclinical RV dysfunction having been found to be associated with the risk of LV failure3. Patients with biventricular failure have significantly worse outcomes, with 2-year survival of 23%, as opposed to 71% in patients with isolated LV failure3,4. Studies also show that RV dysfunction is indicative of prognosis independent of LV dysfunction for a plethora of cardiac diseases5–16. In recognition of the inextricable nature of LV disease, RV disease, and biventricular disease, the recent first Universal Definition and Classification of Heart Failure did not attempt to specify distinct clinical entities for left and right heart failure17.

Early detection of heart failure creates the possibility of more efficient implementation of guideline-directed medical therapy and lifestyle modifications, which have been shown to improve overall outcomes for all forms of heart failure, in addition to slowing progression to advanced disease.18

Left ventricular ejection fraction (LVEF) is one of the most widespread hemodynamic parameters currently available in cardiovascular medicine. Among many other possible uses, LVEF, as a measure of ventricular function, can be used to quantify disease progression and response to treatment19–21, and to independently predict mortality22. LVEF measurements are most readily obtained by transthoracic echocardiography, and thus echocardiography is one of the procedures most commonly billed to Medicare and Medicaid in the United States23.

However, significant barriers remain to obtaining LVEF measurements in outpatient or resource-limited settings without sufficient trained echocardiographers and logistical support24, and there remains significant inter-observer and intra-observer variability in measurement17. Furthermore, trajectories of LVEF over time may be more useful than isolated measurements17, requiring repeated echocardiograms and trips to the clinic for patients.

In contrast, RV failure has classically been within the realm of clinical diagnosis, with no specific biomarkers or agreed-upon guidelines for ECG interpretation. Numerical measurements of RV function such as RV ejection fraction are not as readily available due to difficulties in measurement from conventional transthoracic echocardiography25. Alternate methods to assess RV function such as tricuspid annular plane systolic excursion (TAPSE) have demonstrated promise in certain clinical settings26, but there remain challenges in common scenarios such as measuring disease progression27 or assessment of RV function following cardiac surgery28. 3D echocardiography, strain imaging, and cardiac MRI are promising replacements29, but are impractical for use as screening modalities due to concerns of cost and availability. Thus, the important role of right ventricular function in the pathophysiology of cardiovascular disease has to date been underappreciated30.

Taken together, there exists a pressing need for a readily available and inexpensive tool to simultaneously measure, screen, or predict both right and left heart function. The ECG is a cardinal investigation in the practice of Cardiology. It is ubiquitous, inexpensive, and often the first investigation performed in emergency situations. However, it has an upper bound of usefulness secondary to its skill requirement and subjectivity. Additionally, clinicians cannot identify subtle patterns that might indicate sub-clinical pathology, especially for conditions for which there are no interpretation guidelines. Recent breakthroughs in artificial intelligence have demonstrated that much more information may be available from the ECG to diagnose such conditions than currently leveraged31,32. Deep learning (DL), a class of machine learning that uses hierarchical networks to extract lower-dimensional features from a higher-dimensional data input, has demonstrated significant potential for enabling ECG-based predictions and diagnoses33. For example, DL has been used to identify patients with atrial fibrillation while in normal sinus rhythm34, predict incident atrial fibrillation35, identify patients amenable to cardiac resynchronization therapy36, evaluate LV diastolic function37, evaluate patients with echocardiographically concealed long QT syndrome38, predict risk of sudden cardiac death39, and predict low LVEF40,41.

For the first time, we present novel DL algorithms using ECG waveform data to simultaneously predict the presence of both left ventricular and right ventricular disease in a large, ethnically and socioeconomically diverse population. We further establish a regression framework to predict numerical values of LVEF. Finally, we evaluate these algorithms with respect to mitral valve regurgitation and in clinically meaningful ranges of LVEF.

Methods

Data Source and Patient Population

We utilized patient data from five hospitals within the Mount Sinai Health System from 2003–2020. These hospitals, specifically Mount Sinai Hospital, Mount Sinai Morningside, Mount Sinai Brooklyn, Mount Sinai West, and Mount Sinai Beth Israel, are located in the Manhattan and Brooklyn boroughs of New York City and serve a large and diverse patient population. Table 1 reflects the patient population utilized in this work. Figure 1 displays the workflow of obtaining and processing relevant data. We extracted Left Ventricular Ejection Fraction (LVEF) from the EPIC electronic healthcare record (EHR) system. LVEF values are abstracted from transthoracic echocardiography (echo) result reports written by physicians. Extracted records contained a unique identifier for the patient, the date and time of the echo, and the value of the LVEF as an integer. Utilizing this method, we acquired 444,786 reports for 219,437 patients. This study has been approved by the institutional review board at the Icahn School of Medicine at Mount Sinai. Details of either Right Ventricular Systolic Dysfunction (RVSD) or Right Ventricular Dilation (RVD) were not present as discrete parameters within the EHR database. We acquired .pdf files containing unstructured text corresponding to 404,502 transthoracic echo reports for 225,826 patients. As before, collected records also contained a unique identifier for the patient, and the date of the echo.

Table 1:

Population metrics of the study cohort by Left Ventricular Ejection Fraction classification

| | LVEF <= 40% | 40% < LVEF <= 50% | LVEF > 50% | p |
|---|---|---|---|---|
| Internal Testing Cohort | | | | |
| Patients: n | 17,840 | 19,586 | 109,615 | |
| Echo-ECG pairs: n | 171,017 | 112,300 | 431,134 | |
| Age | 67.0 (66.9 – 67.1) | 66.6 (66.5 – 66.6) | 65.6 (65.6 – 65.7) | <0.0001 |
| Gender n (%) | | | | <0.0001 |
|   Male | 12,230 (68.55%) | 13,128 (67.03%) | 55,831 (50.93%) | |
|   Female | 5,610 (31.45%) | 6,458 (32.97%) | 53,784 (49.07%) | |
| Race n (%) | | | | <0.0001 |
|   American Indian | 419 (2.35%) | 552 (2.82%) | 2,755 (2.51%) | |
|   Asian | 505 (2.83%) | 556 (2.84%) | 3,962 (3.61%) | |
|   Black | 1,865 (10.45%) | 1,883 (9.61%) | 9,025 (8.23%) | |
|   Hispanic | 1,975 (11.07%) | 2,196 (11.21%) | 11,300 (10.31%) | |
|   Other | 1,302 (7.3%) | 1,365 (6.97%) | 8,041 (7.34%) | |
|   Pacific Islander | 41 (0.23%) | 24 (0.12%) | 126 (0.11%) | |
|   Unknown | 7,092 (39.75%) | 7,669 (39.16%) | 48,591 (44.33%) | |
|   White | 4,641 (26.01%) | 5,341 (27.27%) | 25,815 (23.55%) | |
| LVEF | 28.7 (28.6 – 28.7) | 46.8 (46.7 – 46.8) | 62.0 (62.0 – 62.0) | <0.0001 |
| Ventricular Rate | 80.8 (80.5 – 81.0) | 76.3 (76.1 – 76.6) | 74.2 (74.1 – 74.3) | <0.0001 |
| Atrial Rate | 92.0 (91.2 – 92.9) | 85.5 (84.8 – 86.2) | 79.3 (79.1 – 79.5) | <0.0001 |
| PR Interval | 170.6 (167.7 – 173.5) | 168.6 (168.0 – 169.1) | 163.7 (163.5 – 163.9) | <0.0001 |
| QTc Interval | 471.0 (470.3 – 471.6) | 453.8 (453.2 – 454.3) | 439.4 (439.2 – 439.6) | <0.0001 |
| External Validation Cohort | | | | |
| Patients: n | 123 | 84 | 390 | |
| Echo-ECG pairs: n | 372 | 214 | 853 | |
| Age | 64.0 (62.4 – 65.6) | 63.8 (61.8 – 65.8) | 62.3 (61.3 – 63.4) | 0.15 |
| Gender n (%) | | | | <0.0001 |
|   Male | 81 (65.85%) | 58 (69.05%) | 200 (51.28%) | |
|   Female | 42 (34.15%) | 26 (30.95%) | 190 (48.72%) | |
| Race n (%) | | | | <0.0001 |
|   American Indian | - | - | - | |
|   Asian | - | - | - | |
|   Black | 14 (11.38%) | 12 (14.29%) | 81 (23.21%) | |
|   Hispanic | - | - | 25 (7.16%) | |
|   Other | 32 (26.02%) | 20 (23.81%) | 90 (25.79%) | |
|   Pacific Islander | - | - | - | |
|   Unknown | 38 (30.89%) | 21 (25.0%) | 97 (24.87%) | |
|   White | 28 (22.76%) | 24 (28.57%) | 95 (27.22%) | |
| LVEF | 25.5 (24.6 – 26.3) | 47.3 (46.9 – 47.7) | 61.5 (61.1 – 61.8) | <0.0001 |
| Ventricular Rate | 84.4 (80.9 – 87.9) | 78.5 (75.4 – 81.5) | 79.8 (78.1 – 81.6) | <0.0001 |
| Atrial Rate | 90.9 (83.1 – 98.8) | 93.7 (79.0 – 108.5) | 86.1 (82.1 – 90.1) | <0.0001 |
| PR Interval | 167.8 (160.4 – 175.1) | 175.6 (165.7 – 185.5) | 161.4 (158.2 – 164.5) | <0.0001 |
| QTc Interval | 480.9 (473.0 – 488.8) | 465.0 (456.9 – 473.2) | 448.6 (444.8 – 452.3) | <0.0001 |

Figure 1: CONSORT Diagram.


Sources of data are highlighted in yellow. Data processing steps are highlighted in purple. Numbers in each step indicate datapoints retained for analysis following conditional filtering.

LVEF: Left Ventricular Ejection Fraction, RVSD: Right Ventricular Systolic Dysfunction, RVD: Right Ventricular Dilation, SD: Standard Deviation.

ECG data were obtained as XML files (eXtensible Markup Language) exported from the GE MUSE ECG system. These files contain demographic details for the patient, details about the testing site, per-lead cart-generated parameters for ECGs, cart-generated ECG diagnosis, and raw waveform data (see Electrocardiography Data subsection for more details). For each outcome defined by an echo as described below, we paired the echo report to any ECG performed within 7 days before or after the date of the echo. Overall, we extracted 715,890 paired ECGs for 147,636 patients for LVEF prediction, and 761,510 paired ECGs for 148,227 patients for Right Ventricular status prediction. Finally, there was an overlap of 390,921 paired ECGs for 87,514 patients over the two datasets (Figure 1).
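The pairing step can be illustrated with a minimal sketch. The record layout (tuples of a patient identifier and a timestamp), the function name `pair_echo_ecg`, and the choice to treat the 7-day boundary as inclusive are assumptions made for illustration; they are not taken from the study's code.

```python
from datetime import datetime, timedelta

# Hypothetical record layout: (patient_id, timestamp). The actual EHR and
# MUSE XML schemas are not described at this level of detail in the text.
WINDOW = timedelta(days=7)

def pair_echo_ecg(echos, ecgs, window=WINDOW):
    """Return (patient_id, echo_time, ecg_time) for every ECG recorded
    within +/- `window` of an echo for the same patient."""
    ecgs_by_patient = {}
    for patient_id, ecg_time in ecgs:
        ecgs_by_patient.setdefault(patient_id, []).append(ecg_time)

    pairs = []
    for patient_id, echo_time in echos:
        for ecg_time in ecgs_by_patient.get(patient_id, []):
            # Inclusive boundary is an assumption of this sketch.
            if abs(ecg_time - echo_time) <= window:
                pairs.append((patient_id, echo_time, ecg_time))
    return pairs

echos = [("p1", datetime(2019, 3, 10)), ("p2", datetime(2019, 5, 1))]
ecgs = [
    ("p1", datetime(2019, 3, 4)),   # 6 days before the echo: paired
    ("p1", datetime(2019, 3, 17)),  # 7 days after the echo: paired
    ("p1", datetime(2019, 3, 20)),  # 10 days after the echo: dropped
    ("p2", datetime(2019, 4, 1)),   # 30 days before the echo: dropped
]
pairs = pair_echo_ecg(echos, ecgs)
```

Because one echo can match several ECGs, a pairing of this kind naturally yields more Echo-ECG pairs than patients, as in the counts reported above.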

Definition of Primary Outcomes

First, we elected to model LVEF in a classification framework. LVEF was stratified into three clinically relevant ranges of LVEF <= 40%, LVEF > 40% and <= 50%, and LVEF > 50%.42 Since none of these intervals overlap, the overall task may be considered a multi-class classification problem. To compare to prior work, we also assessed performance at classification of LVEF <= 35%. Second, we attempted to model LVEF using a regression framework (i.e., directly predicting integer values of LVEF). For this problem, the target label was the LVEF value associated with each Echo-ECG pair and required no additional processing.
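As a minimal sketch of the label definition, the three LVEF ranges can be mapped to class indices as follows. The indices 0, 1, and 2 and the function names are illustrative assumptions; only the cut-offs come from the text.

```python
def lvef_class(lvef):
    """Map an LVEF percentage to one of three non-overlapping classes:
    0 for LVEF <= 40, 1 for 40 < LVEF <= 50, 2 for LVEF > 50."""
    if lvef <= 40:
        return 0
    if lvef <= 50:
        return 1
    return 2

def is_lvef_le_35(lvef):
    """Separate binary label used for comparison with prior work."""
    return int(lvef <= 35)

# Boundary values fall into the lower class because the cut-offs are inclusive.
labels = [lvef_class(v) for v in (29, 40, 41, 50, 51, 71)]
```

For the regression task no such mapping is needed, since the integer LVEF value itself is the target.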

Right heart status was considered as a composite phenotype positive for either Right Ventricular Systolic Dysfunction (RVSD) or Right Ventricular Dilation (RVD) as elicited from an echo report. The process for defining right heart status relied on use of natural language processing (NLP) of the text from the report (see Natural Language Processing of Echocardiography Reports for Outcome Extraction). Phrases used to define either of RVSD and RVD are listed in Supplementary Table 1. An Echo-ECG pair was labelled positive for the outcome and assigned a value of 1 in the presence of either RVD or RVSD of any severity, or a value of 0 if both were absent. Since there were only two possible values for the outcome, the task may be considered a binary classification problem.

Data Processing, Quality Control, and Filtering

Left Ventricular Ejection Fraction

We discarded outliers above 90% LVEF (99.77th percentile) and below 10% LVEF (0.18th percentile) within our patient population (Figure 1: CONSORT diagram). Additionally, the value of LVEF generated from echo is subject to inter-rater and inter-test variability. Since we considered data collected over a period of ±7 days, if the difference in reported LVEF for a patient between any two consecutive reports within 7 days was greater than 10%43, both of these reports were discarded.
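The two filters can be sketched as below. This is an illustration under stated assumptions: reports are hypothetical `(patient_id, echo_date, lvef)` tuples, the outlier filter is applied before the consistency check, and "greater than 10%" is read as an absolute difference of more than 10 LVEF percentage points. The text does not specify the order of the filters or the exact reading of the threshold.

```python
from datetime import date

def filter_lvef_reports(reports, low=10, high=90, max_gap_days=7, max_diff=10):
    """reports: list of (patient_id, echo_date, lvef) tuples.
    Returns the reports that survive both filters, in input order."""
    # Filter 1: drop outliers below `low` or above `high`.
    in_range = [r for r in reports if low <= r[2] <= high]

    by_patient = {}
    for r in in_range:
        by_patient.setdefault(r[0], []).append(r)

    # Filter 2: drop both members of any consecutive pair of reports that
    # are close in time but disagree by more than `max_diff` points.
    discard = set()
    for patient_reports in by_patient.values():
        patient_reports.sort(key=lambda r: r[1])
        for prev, curr in zip(patient_reports, patient_reports[1:]):
            close_in_time = (curr[1] - prev[1]).days <= max_gap_days
            if close_in_time and abs(curr[2] - prev[2]) > max_diff:
                discard.update((prev, curr))
    return [r for r in in_range if r not in discard]

reports = [
    ("p1", date(2019, 1, 1), 55),  # conflicts with the next report
    ("p1", date(2019, 1, 4), 30),  # 25 points lower within 3 days
    ("p1", date(2019, 3, 1), 50),  # far apart in time: kept
    ("p2", date(2019, 1, 1), 95),  # outlier above 90%
    ("p2", date(2019, 1, 2), 60),  # kept
]
kept = filter_lvef_reports(reports)
```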

Natural Language Processing of Echocardiography Reports for Outcome Extraction

We chose a rule-based approach for extracting outcomes of interest from the text contained within echo reports - specifically Right Ventricular Systolic Dysfunction (RVSD), Right Ventricular Dilation (RVD), and Mitral Regurgitation. We created and iteratively expanded an overall list of rules designed to ensure capture of the variability surrounding phrases detailing the same semantic concept. We provide the final list of rules in Supplementary Table 1, and anonymized sample annotated echo reports in Supplementary Figures 1, 2, and 3.

While RVD and RVSD were only considered in terms of their presence or absence, we created additional rules to extract qualifiers of valvular disease severity. A total of 8 rules were created to be able to classify Mitral Regurgitation into Normal, Borderline (Trace/Minimal/Mild), Moderate, or Severe disease.
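A rule-based extractor of this kind can be sketched with regular expressions. Every pattern below is a hypothetical example written for illustration; the study's actual rules are those in its Supplementary Table 1 and are not reproduced here. The handling of reports with no mention (returning `None`) is likewise an assumption of the sketch.

```python
import re

# Hypothetical rules for illustration only (not the study's rule list).
SEVERITY = r"(mildly |moderately |severely )?"
RV_RULES = {
    "rvd": [
        (re.compile(r"right ventric\w* (is |appears )?" + SEVERITY + r"dilated", re.I), 1),
        (re.compile(r"right ventric\w* (size )?(is )?normal", re.I), 0),
    ],
    "rvsd": [
        (re.compile(r"right ventric\w* systolic function (is )?" + SEVERITY + r"reduced", re.I), 1),
        (re.compile(r"right ventric\w* systolic function (is )?normal", re.I), 0),
    ],
}

# Ordered from most to least severe so the strongest qualifier wins.
MR_RULES = [
    (re.compile(r"severe mitral regurgitation", re.I), "Severe"),
    (re.compile(r"moderate mitral regurgitation", re.I), "Moderate"),
    (re.compile(r"(trace|minimal|mild) mitral regurgitation", re.I), "Borderline"),
    (re.compile(r"no (significant )?mitral regurgitation", re.I), "Normal"),
]

def extract_rv_labels(text):
    """Return {'rvd': 0/1/None, 'rvsd': 0/1/None}; None means no mention."""
    labels = {}
    for finding, rules in RV_RULES.items():
        labels[finding] = None
        for pattern, value in rules:
            if pattern.search(text):
                labels[finding] = value
                break
    return labels

def composite_rv_label(labels):
    """1 if either RVD or RVSD is present, 0 if both are explicitly absent,
    None otherwise (an assumption for reports lacking a mention)."""
    if labels["rvd"] == 1 or labels["rvsd"] == 1:
        return 1
    if labels["rvd"] == 0 and labels["rvsd"] == 0:
        return 0
    return None

def mr_severity(text):
    """Return the first matching mitral regurgitation category, or None."""
    for pattern, category in MR_RULES:
        if pattern.search(text):
            return category
    return None

report = ("The right ventricle is mildly dilated. Right ventricular systolic "
          "function is normal. There is trace mitral regurgitation.")
rv = extract_rv_labels(report)
outcome = composite_rv_label(rv)
mr = mr_severity(report)
```

Real reports vary far more than these patterns allow, which is why an iteratively expanded rule list and manual review of sampled reports are needed.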

Overall NLP performance was measured through two single-blind faculty reviewers. Each review contained 210 reports randomly sampled and equally distributed based on detected normal, detected diseased for either of RVD, RVSD or MR, and no mention detected for RV or Mitral Valve status.

Electrocardiography Data

Waveform data within XML files is formatted as one-dimensional collections (vectors) of integers sampled at a rate of 500Hz. Each such vector corresponds to a lead, with each XML file containing data for leads I, II, and V1 - V6. These vectors extend to either 5 seconds (2500 samples), or 10 seconds (5000 samples) of recorded information for each lead, in addition to longer rhythm strip recordings. To avoid potential artifacts caused by extending 2500 samples to 5000, we restricted each sample to only the first 5 seconds of recording. Furthermore, our dataset does not contain data regarding leads III, aVF, aVL, or aVR. For our purposes, these leads were considered to have no additional information since they can be derived from linear transformations of the vectors representing the other leads44, and were therefore not included in our models. ECGs are analog recordings prone to error45. We describe the process of ECG preprocessing in Supplementary Materials.
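The redundancy of the four omitted limb leads follows from the standard Einthoven and Goldberger relations: III = II − I, aVR = −(I + II)/2, aVL = I − II/2, and aVF = II − I/2. The sketch below applies these relations sample by sample and also shows truncation to the first 5 seconds (2500 samples at 500 Hz). The function names and the use of plain Python lists are illustrative assumptions, not the study's preprocessing code.

```python
SAMPLE_RATE_HZ = 500
KEEP_SECONDS = 5

def truncate(lead, seconds=KEEP_SECONDS, rate=SAMPLE_RATE_HZ):
    """Keep only the first `seconds` of a lead (2500 samples at 500 Hz)."""
    return lead[: seconds * rate]

def derive_limb_leads(lead_i, lead_ii):
    """Derive leads III, aVR, aVL and aVF sample by sample from leads I
    and II using the standard Einthoven and Goldberger relations."""
    pairs = list(zip(lead_i, lead_ii))
    return {
        "III": [ii - i for i, ii in pairs],
        "aVR": [-(i + ii) / 2 for i, ii in pairs],
        "aVL": [i - ii / 2 for i, ii in pairs],
        "aVF": [ii - i / 2 for i, ii in pairs],
    }

# Toy constant-valued 10-second recordings, for illustration only.
lead_i = truncate([2] * 5000)
lead_ii = truncate([6] * 5000)
derived = derive_limb_leads(lead_i, lead_ii)
```

Since each derived lead is a fixed linear combination of leads I and II, omitting them removes no independent information from the model input.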

Patient age and ECG cart extracted parameters (Corrected QT interval, PR interval, Atrial Rate, and Ventricular Rate) were also acquired from XML files and utilized for input to our model. Overall distribution of input variables with respect to each outcome is shown in the pairplots of Supplementary Figures 4 and 5. We found that input variables were not correlated either with respect to each other, or the outcome.

Finally, no ECGs were excluded based upon associated diagnoses, in the hope of generalizability across pathologies.

Model selection and Architecture

ECG waveform data consisting of arrays of numbers (vectors) can be processed using either a 1D Convolutional Neural Network (CNN) or a 2D CNN. Typically, 2D CNNs are more computationally intensive, and are frequently used in image processing or genomics studies46. We elected to use a 2D CNN because not all institutions may store ECG data as vectors, and to be able to leverage pre-trained, robust 2D CNN architectures via transfer learning. We assessed different pre-built 2D CNN architectures at or about the same level of complexity as the backbone of our modeling (see Supplementary Materials) and identified EfficientNet47 as the best performing CNN for our modelling task (Supplementary Figures 6, 7).

Experimental design

We collated data from four Mount Sinai facilities (Mount Sinai Hospital, Mount Sinai Brooklyn, Mount Sinai West, and Mount Sinai Beth Israel) to form an internal testing dataset. All data collected from Mount Sinai Morningside formed the external validation dataset. We confirmed that no patients within the external validation were present in the internal testing dataset. The relative distributions of the dataset across internal testing and external validation for each outcome are shown in Tables 1 and 2. For a detailed overview of our repeated measures design and model training, please see Supplementary Materials.

Table 2:

Population metrics of the study population by Right Ventricular Systolic Dysfunction or Right Ventricular Dilation category

| | RVSD or RVD | Normal | p |
|---|---|---|---|
| Internal Testing Cohort | | | |
| Patients | 30,780 | 104,436 | |
| Echo-ECG pairs | 278,877 | 425,815 | |
| Age | 65.8 (65.8 – 65.9) | 66.7 (66.7 – 66.8) | <0.0001 |
| Gender n (%) | | | <0.0001 |
|   Male | 18,533 (60.21%) | 51,122 (48.95%) | |
|   Female | 12,247 (39.79%) | 53,314 (51.05%) | |
| Race n (%) | | | <0.0001 |
|   American Indian | 384 (1.25%) | 962 (0.92%) | |
|   Asian | 708 (2.3%) | 2,414 (2.31%) | |
|   Black | 3,258 (10.58%) | 8,621 (8.25%) | |
|   Hispanic | 2,478 (8.05%) | 6,569 (6.29%) | |
|   Other | 2,513 (8.16%) | 12,715 (12.17%) | |
|   Pacific Islander | 43 (0.14%) | 117 (0.11%) | |
|   Unknown | 12,300 (39.96%) | 44,079 (42.21%) | |
|   White | 9,096 (29.55%) | 28,959 (27.73%) | |
| Ventricular Rate | 82.4 (82.2 – 82.6) | 77.8 (77.7 – 77.9) | <0.0001 |
| Atrial Rate | 99.1 (98.4 – 99.8) | 84.5 (84.2 – 84.8) | <0.0001 |
| PR Interval | 170.1 (169.7 – 170.5) | 163.8 (163.6 – 164.0) | <0.0001 |
| QTc Interval | 464.1 (463.6 – 464.7) | 446.2 (445.9 – 446.4) | <0.0001 |
| External Validation Cohort | | | |
| Patients | 1,783 | 11,649 | |
| Echo-ECG pairs | 8,828 | 47,990 | |
| Age | 67.2 (66.9 – 67.5) | 67.3 (67.1 – 67.4) | 0.06 |
| Gender n (%) | | | <0.0001 |
|   Male | 1,067 (59.84%) | 5,881 (50.49%) | |
|   Female | 716 (40.16%) | 5,768 (49.51%) | |
| Race n (%) | | | <0.0001 |
|   American Indian | - | - | |
|   Asian | - | 35 (0.3%) | |
|   Black | 289 (16.21%) | 1,629 (13.98%) | |
|   Hispanic | 33 (1.85%) | 159 (1.36%) | |
|   Other | 502 (28.15%) | 3,775 (32.41%) | |
|   Pacific Islander | - | - | |
|   Unknown | 587 (32.92%) | 3,483 (29.9%) | |
|   White | 368 (20.64%) | 2,564 (22.01%) | |
| Ventricular Rate | 86.0 (85.0 – 86.9) | 79.9 (79.6 – 80.3) | <0.0001 |
| Atrial Rate | 104.5 (101.5 – 107.5) | 86.4 (85.6 – 87.2) | <0.0001 |
| PR Interval | 171.8 (170.0 – 173.6) | 166.9 (164.7 – 169.2) | <0.0001 |
| QTc Interval | 473.6 (471.5 – 475.8) | 451.9 (451.2 – 452.6) | <0.0001 |

Performance Evaluation

Model performance for classification tasks was primarily evaluated by utilizing Area Under Receiver Operating Characteristic curve (AUROC), and Area Under Precision Recall Curve (AUPRC) metrics. Additionally, we also considered the Youden J index48 for calculation of threshold dependent metrics. For the regression task, we utilized Mean Absolute Error (MAE) as the evaluation metric (See Supplementary Materials for a more detailed discussion of these metrics).
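For readers unfamiliar with these metrics, the following dependency-free sketch shows how AUROC, the Youden J threshold (J = sensitivity + specificity − 1), and MAE can be computed. It is an illustration rather than the study's evaluation code, which relied on the libraries listed under Software and hardware; the function names and toy data are assumptions.

```python
def auroc(y_true, y_score):
    """Probability that a random positive scores above a random negative
    (ties count as 0.5); equal to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(y_true, y_score):
    """Return (threshold, sensitivity, specificity) maximising
    J = sensitivity + specificity - 1, where a case is predicted
    positive when its score is >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or sens + spec > best[1] + best[2]:
            best = (t, sens, spec)
    return best

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Toy data for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = auroc(y_true, y_score)
threshold, sensitivity, specificity = youden_threshold(y_true, y_score)
mae = mean_absolute_error([60, 35, 50], [55, 40, 50])
```

The pairwise AUROC above is quadratic in the number of samples and is meant only to make the definition concrete; library implementations use sorting instead.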

To evaluate cumulative incidence by model prediction, we fit a Kaplan-Meier estimator to the time difference between the first model derived False Positive / True Negative diagnosis of low LVEF, and the first echocardiographically derived low LVEF value.
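The Kaplan-Meier product-limit estimate can be sketched in a few lines; cumulative incidence at time t is then 1 − S(t). The code below is a generic illustration with toy follow-up data, not the study's analysis, and the function name and input layout are assumptions.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.
    durations: follow-up time per patient; events: 1 if the event was
    observed at that time, 0 if the patient was censored."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for d, e in data if d == t]
        n_events = sum(same_time)
        if n_events:
            # Product-limit update: multiply by the fraction surviving at t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= len(same_time)
        i += len(same_time)
    return curve

# Toy follow-up times in years; event = first echo-derived low LVEF.
durations = [1, 2, 2, 3, 5, 5]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(durations, events)
cumulative_incidence = 1 - curve[-1][1]
```

Fitting such a curve separately for False Positive and True Negative patients allows the two groups to be compared over the follow-up period.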

As part of a baseline comparison, we also implemented the data processing and modelling pipelines for traditional statistical approaches geared towards prediction of low LVEF49–51. Labelled values within XML files were extracted to create a dataset corresponding to each paper’s methodology and choice of features.

Software and hardware

All analysis was performed using the pandas52, numpy53, scipy54, scikit-learn55, PIL56, torchvision57 and PyTorch58 libraries. Model explainability was derived utilizing captum59. Plotting was performed using the matplotlib60 and seaborn61 libraries. All program code was written for the Python programming language (3.9.x)62. All software was run within custom Docker63 containers created from official PyTorch docker images. Models were trained on a HIPAA-compliant Azure Cloud virtual machine containing 4x NVIDIA V100 GPUs with 16GB VRAM each.

Results

Performance of NLP algorithm for labeling RV abnormalities

We built a rule-based NLP algorithm in order to identify RVSD and RVD outcomes from echo reports. To assess validity of this procedure, human generated labels for these reports were compared to algorithm generated labels and quantified in terms of correctly classified labels, incorrectly classified labels, as well as missed labels.

From 420 outcomes included in review for RV function, we correctly classified 404, did not predict a label for 13, and incorrectly classified 3. For RV size, from the 420 outcomes included in review, we correctly classified 402 outcomes, did not predict a label for 17, and incorrectly classified 1. Within detected outcomes, we achieved an overall accuracy of 99.7% for extraction of either RV Size or RV Function (Supplementary Table 2).

Performance of LVEF classification

We built a machine learning model to classify LVEF in terms of the following clinically relevant categories: <=40%, >40% and <=50%, as well as >50% from an ECG (Table 3). We provide the outcome distribution for the LVEF dataset and experiments in Table 1 and in the pairplot in Supplementary Figure 4.

Table 3:

Performance at LVEF classification.

Sensitivity and Specificity have been derived using the Youden J index.

| Outcome | Cohort | % eval prevalence | AUROC | AUPRC | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| LVEF <= 35% | Internal Testing | 9.22% | 0.95 (0.95 – 0.95) | 0.68 (0.67 – 0.69) | 0.94 (0.92 – 0.96) | 0.83 (0.81 – 0.84) |
| LVEF <= 35% | External Validation | 23.07% | 0.95 (0.95 – 0.96) | 0.88 (0.87 – 0.89) | 0.88 (0.83 – 0.92) | 0.87 (0.84 – 0.90) |
| LVEF <= 40% | Internal Testing | 12.52% | 0.94 (0.94 – 0.94) | 0.72 (0.71 – 0.73) | 0.89 (0.88 – 0.9) | 0.83 (0.82 – 0.85) |
| LVEF <= 40% | External Validation | 25.85% | 0.94 (0.94 – 0.95) | 0.88 (0.88 – 0.89) | 0.87 (0.85 – 0.9) | 0.85 (0.83 – 0.87) |
| LVEF 40 – 50% | Internal Testing | 10.73% | 0.82 (0.81 – 0.83) | 0.33 (0.3 – 0.36) | 0.84 (0.82 – 0.86) | 0.65 (0.62 – 0.67) |
| LVEF 40 – 50% | External Validation | 14.87% | 0.73 (0.72 – 0.74) | 0.29 (0.28 – 0.31) | 0.78 (0.73 – 0.83) | 0.57 (0.52 – 0.63) |
| LVEF >50% | Internal Testing | 76.74% | 0.89 (0.89 – 0.89) | 0.96 (0.96 – 0.96) | 0.81 (0.79 – 0.82) | 0.80 (0.79 – 0.82) |
| LVEF >50% | External Validation | 59.28% | 0.87 (0.87 – 0.88) | 0.90 (0.90 – 0.91) | 0.84 (0.82 – 0.86) | 0.81 (0.79 – 0.82) |

Our model performed extremely well at detecting patients with an LVEF of <= 40% both for internal testing (12.52% prevalence) and external validation (25.85% prevalence) with AUROC values of 0.94 (95% CI: 0.94 – 0.95) in each case. This trend was maintained for the Precision Recall Curves as well, with AUPRC values of 0.72 (95% CI: 0.71 – 0.73) for internal testing, going up to 0.88 (95% CI: 0.88 – 0.89) in external validation.

Similar results were seen for detection of LVEF > 50%. For internal testing (76.7% prevalence), the model achieved an AUROC of 0.89 (95% CI: 0.89 – 0.89), which was maintained for external validation (59.3% prevalence) at 0.87 (95% CI: 0.87 – 0.88). AUPRC values were also exceptional at 0.96 (95% CI: 0.96 – 0.96) for internal testing and 0.90 (95% CI: 0.90 – 0.91) for external validation.

Performance was lower for LVEF values >40% and <=50%. In the internal testing dataset (10.73% prevalence), we achieved an AUROC of 0.82 (95% CI: 0.81 – 0.83), and for external validation, this value was 0.73 (95% CI: 0.72 – 0.74). AUPRC values were 0.33 (95% CI: 0.3 – 0.36) for the testing dataset, and 0.29 (95% CI: 0.28 – 0.31) for the external validation dataset. We show ROC curves in Figure 2, and PR curves in Supplementary Figure 8. The interpretability plots for LVEF prediction (Figure 3) using our explainability framework highlighted QRS complexes for prediction of each LVEF related outcome. The relative importance of the extracted features was found to be variable across tested patients.

Figure 2: Receiver Operating Characteristic Curves: LVEF Classification.


Upper row shows performance for each outcome in the Internal Testing cohort, while the lower row shows performance in the external validation cohort.

Figure 3: LVEF Explainability.


Panel A: Pixels of input image which were most responsible for driving the prediction towards an LVEF of <40% are highlighted. Panel B: Relative contributions of imaging data and tabular data to the overall prediction. Panel C: Effect of the tabular features on model’s prediction.

Predicted LVEF <40% probability: 0.89. Actual LVEF value: 29%

Model performance was maintained when tested against varying severity of MR, with better performance seen when tested against Normal - Mild MR. (Supplementary Table 3, Supplementary Figures 9, 10)

In an additional analysis, we applied the Youden J index to model predictions to derive False Positives / True Negatives. We found that cumulative incidence of low LVEF in a 5-year follow-up period after the first prediction was higher in patients labelled False Positives over those labelled True Negatives. We also found survival was higher in True Negatives over other classes of patients (Figure 4).

Figure 4: 5-year follow up after initial prediction.


Cumulative incidence of low LVEF (left) and survival (right) over a 5-year follow-up period in low LVEF (<=40%) patient groups per model prediction.

In a separate experiment for detection of patients with an LVEF <= 35%, our model performed exceedingly well in internal testing (9.22% prevalence), with an AUROC of 0.95 (95% CI: 0.95 – 0.95), and an AUPRC of 0.68 (95% CI: 0.67 – 0.69). In external validation (23.07% prevalence), these results were maintained with an AUROC of 0.95 (95% CI: 0.95 – 0.95) and an AUPRC of 0.88 (95% CI: 0.87 – 0.89) (Table 3, Supplementary Figures 11, 12).

Performance of LVEF Regression

We constructed a deep learning model to predict the exact value of LVEF from an Echo-ECG pair within a regression framework. Within the internal testing dataset, the MAE was 5.84% (95% CI: 5.82 – 5.85%). For the external validation dataset, the MAE was 6.14% (95% CI: 6.13 – 6.16%). A scatterplot showing the relationship between predicted and actual values for the overall dataset is shown in Figure 5.

Figure 5: Scatterplot showing LVEF regression performance.


Contour lines show density of predicted LVEF values vs ground truth LVEF values around the line of perfect concordance. Error bars around each datapoint indicate the mean absolute error. 10,000 samples shown for each panel to prevent overplotting. Contour maps were generated using the entire dataset.

We evaluated the performance of this regression model within clinically relevant LVEF subgroups. In the first subgroup of echo derived LVEF <=40%, the MAE for the model was 6.69% in internal testing, and 6.46% in external validation. Within the second subgroup of LVEF between 40 and 50%, the MAE was greater at 8.08% in internal testing, and 8.55% in external validation. Within the final subgroup of patients of LVEF > 50%, our model achieved a MAE of 5.41% in internal testing, and 5.44% in external validation (Supplementary Figure 13).
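A subgroup analysis of this kind amounts to grouping absolute errors by the echo-derived LVEF range before averaging. The sketch below uses toy values and hypothetical names; it illustrates the calculation only and does not reproduce the reported figures.

```python
def mae_by_lvef_group(y_true, y_pred):
    """Mean absolute error within each echo-derived LVEF range.
    Groups are defined by the actual (not the predicted) LVEF."""
    groups = {"<=40": [], "40-50": [], ">50": []}
    for actual, predicted in zip(y_true, y_pred):
        if actual <= 40:
            key = "<=40"
        elif actual <= 50:
            key = "40-50"
        else:
            key = ">50"
        groups[key].append(abs(actual - predicted))
    # Skip empty groups to avoid division by zero.
    return {k: sum(v) / len(v) for k, v in groups.items() if v}

# Toy values for illustration only.
y_true = [30, 38, 45, 48, 60, 65]
y_pred = [36, 34, 52, 40, 58, 62]
subgroup_mae = mae_by_lvef_group(y_true, y_pred)
```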

Performance of Right Ventricular Systolic Dysfunction and Right Ventricular Dilation classification

We built a deep learning model to predict either RVSD or RVD from a patient’s ECG in both internal testing (32.44% prevalence) as well as external validation (15.53% prevalence) (Table 2). Our model achieved robust performance in this task with an AUROC of 0.84 (95% CI: 0.84 – 0.84) in internal testing, maintained in external validation at 0.84 (95% CI: 0.84 – 0.84) (Figure 6, Table 4). Our model achieved similar success with respect to AUPRC, with values of 0.67 (95% CI: 0.66 – 0.67) in testing, and 0.55 (95% CI: 0.54 – 0.55) in external validation. (Supplementary Figure 14, Table 4). Plots created using the explainability framework again highlighted QRS complexes for prediction of the composite outcome. (Figure 7).

Figure 6: Receiver Operating Characteristic Curves: Right Ventricular Dilation or Right Ventricular Systolic Dysfunction classification.

Left panel shows performance for the Internal Testing cohort, while the right panel shows performance in the external validation cohort.

Table 4:

Performance at RVSD + RVD composite outcome classification.

Sensitivity and Specificity have been derived using the Youden J index.

| Cohort | % eval prevalence | AUROC | AUPRC | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Internal Testing | 32.44% | 0.84 (0.84 – 0.84) | 0.66 (0.66 – 0.67) | 0.75 (0.73 – 0.76) | 0.77 (0.75 – 0.78) |
| External Validation | 15.54% | 0.84 (0.84 – 0.84) | 0.55 (0.54 – 0.55) | 0.77 (0.76 – 0.78) | 0.75 (0.74 – 0.76) |

Figure 7: RV Composite Outcome (RVSD or RVD) Explainability.


Panel A: Pixels of input image which were most responsible for driving the prediction towards the composite outcome are highlighted. Panel B: Relative contributions of imaging data and tabular data to the overall prediction. Panel C: Effect of the tabular features on model’s prediction.

Predicted probability of composite outcome: 0.948, RVSD: Present, RVD: Present

Model AUPRC values when evaluated in the presence of low LVEF were seen to be substantially increased over those in the presence of normal LVEF (Supplementary Materials, Supplementary Figures 15, 16, Supplementary Table 4).

Discussion

Utilizing 700,000 ECGs for around 150,000 unique patients from a large and socioeconomically diverse cohort of patients in the Mount Sinai Health System in New York City, we developed, evaluated, and externally validated multimodal deep learning models capable of discerning the contractile state of both the left and right ventricles. We created an accurate Natural Language Processing pipeline for extraction of outcomes from free text echo reports. Finally, we developed a multimodal explainability framework to highlight which parts of the ECG are more salient for each outcome and to derive interactions between demographic data and imaging data.

Existing work on LVEF extraction from ECGs is limited to classification of LVEF values of <= 35%. We extend our classification framework to clinically pertinent ranges (42) of <= 40%, 40 – 50%, and >50% to capture additional downstream management and prognostic implications. For example, the difference between an LVEF of 41% and one of 71% is hemodynamically and clinically significant. In additional testing, performance at detection of LVEF <= 35% was an AUROC of 0.95 in a demographically diverse cohort of patients. Our deep-learning models also outperformed traditional statistical approaches which utilize extracted ECG features for detection of low LVEF (49–51) (Supplementary Table 5). An additional benefit deep-learning CNNs have over such models is that they do not require manual feature selection. While manual feature annotation may outperform deep learning at certain tasks due to stronger inductive bias, it poses significant limitations due to a hard requirement on expert domain knowledge. Further, patterns which represent an outcome of interest may not be apparent to humans at all.

Finally, the higher cumulative incidence of LVEF <= 40% over a 5-year follow-up period among False Positives than among True Negatives signals the model’s ability to gauge patient severity. Using our model, such patients may be diagnosed earlier in their clinical course with appropriate threshold selection.

Threshold selection also has a role in the utilization and deployment of such models. By setting the classification threshold to an appropriately low value, such models can be used as screening tools for low LVEF in asymptomatic patients, at the cost of some false positives. For LVEF <= 40%, the model achieved a sensitivity of 90% at a specificity of 82.5%, with an AUROC of 0.94.
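The screening use case described above amounts to fixing a target sensitivity and reading off the specificity that results. A minimal sketch of that procedure follows, assuming a hypothetical helper `threshold_for_sensitivity` and synthetic scores. It is not the study's code and does not reproduce the reported 90% / 82.5% operating point.

```python
def threshold_for_sensitivity(y_true, y_score, target_sensitivity):
    """Return the highest threshold whose sensitivity meets the target.

    A case is called positive when its score is >= threshold.
    Returns (threshold, sensitivity, specificity). Assumes both classes are present.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Walk thresholds from strict to lenient; sensitivity can only rise.
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        sens = tp / positives
        if sens >= target_sensitivity:
            tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
            return t, sens, tn / negatives
    raise ValueError("no threshold reaches the target sensitivity")


# Synthetic predicted probabilities for "LVEF <= 40%" (illustrative values only).
y_score = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
           0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.02]
y_true = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1,
          0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

screen_threshold, screen_sens, screen_spec = threshold_for_sensitivity(y_true, y_score, 0.90)
print(screen_threshold, screen_sens, screen_spec)
```

Lowering the threshold further would raise sensitivity at the cost of more false positives, which is the trade-off the paragraph above describes.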

To the best of our knowledge, there has also been no work to date on estimating LVEF as a continuous (percentage) value using ECGs. Clinical guidelines which segment patients based on LVEF assume that one set of classification boundaries is applicable to the entire population. However, normal variation in echo-derived baseline values is expected secondary to patient demographics. A regression approach reduces the risk of such misclassification (37). We posit that LVEF regression may dramatically enhance the value of a screening ECG even in otherwise low-risk groups, and that it is more useful for evaluation of LVEF in the longitudinal setting, where LVEF changes over time are as clinically important (64). Additionally, the regression framework is independent of changes in management guidelines.
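One way to see why a continuous estimate is independent of guideline boundaries: any set of cutoffs can be applied to the regression output after the fact, without retraining. The sketch below assumes a hypothetical helper `lvef_category`. The default cutoffs follow the ranges used in this paper (<= 40%, 40 – 50%, > 50%), while the alternative cutoffs are invented purely for illustration.

```python
from bisect import bisect_left


def lvef_category(lvef_percent, cutoffs=(40, 50)):
    """Map a continuous LVEF estimate (in %) to one of three ordered categories.

    The two cutoffs are inclusive upper bounds for the lower two categories.
    """
    low, high = cutoffs
    labels = [f"LVEF<={low}%", f"{low}%<LVEF<={high}%", f"LVEF>{high}%"]
    # bisect_left returns 0 for values <= low, 1 for low < value <= high, else 2.
    return labels[bisect_left(cutoffs, lvef_percent)]


# Hypothetical regression outputs for four ECGs (illustrative values only).
predicted_lvef = [38.0, 40.0, 47.5, 62.0]

current = [lvef_category(v) for v in predicted_lvef]
# Hypothetical revised boundaries: the same predictions are simply re-binned.
revised = [lvef_category(v, cutoffs=(35, 50)) for v in predicted_lvef]
print(current)
print(revised)
```

A classifier trained on fixed categories would need relabeled data and retraining to accommodate the second set of cutoffs.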

Internal validation alone cannot guarantee model quality (58). Biases within training data which help performance may not translate to external cohorts. It follows that external validation is extremely important to assess how generalizable a model is. We were encouraged to see that for evaluation of LVEF, there was little change in performance in going from internal to external validation.

Diagnosis of RV dysfunction using deep learning on ECGs is a novel approach. The left and right ventricles are inextricably linked, but we found that using LVEF as the predictor in a univariate Logistic Regression model for the composite RVSD and RVD outcome only achieves an AUROC of 0.71 (95% CI: 0.70 – 0.72). Our models perform robustly for the detection of compromised RV state at an AUROC of 0.84 (95% CI: 0.83 – 0.84). Additionally, the exceptionally high AUPRC values in the presence of LVEF <= 40% indicate such models are suited for tracking RV involvement secondary to HFrEF. Once again, the models translated extremely well to external validation. Our decision to not stratify RV disease according to severity was made to allow for early disease detection. Depending on clinical context, the outcome definition may be adjusted to more severe disease. We posit that performance in this context will also increase, because the difference between normal and diseased cases is greater.
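The LVEF-only baseline can be reasoned about without fitting anything. A univariate logistic regression with a nonzero coefficient is a monotone transform of its single predictor, so its AUROC equals the AUROC of the predictor itself, sign-flipped when lower values indicate higher risk. The sketch below computes that quantity on synthetic data. The LVEF values and labels are invented for illustration, and the result is unrelated to the reported 0.71.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic echo-derived LVEF (%) and composite RV outcome labels (illustrative only).
lvef = [25, 30, 35, 45, 55, 40, 50, 60, 65, 70]
rv_outcome = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Lower LVEF is assumed to indicate higher RV risk, so negate it to obtain a risk score.
risk = [-v for v in lvef]
baseline_auroc = auroc(rv_outcome, risk)
print(baseline_auroc)
```

Any model that adds information beyond LVEF, such as the ECG waveform, can exceed this ceiling, which is consistent with the comparison drawn above.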

We also evaluated the classification performance of all our models across the diverse demographic and socioeconomic subgroups that comprise our dataset (Supplementary Figures 17–20). We were encouraged to find that model performance was consistent across all subgroups and in line with overall performance.

Deep learning represents a powerful set of tools that can find patterns that are too subtle for human perception. It has been applied with great success in natural image classification, and we have extended those capabilities into successfully reading ECGs for outcomes that do not have an established set of guidelines.

Limitations

LVEF extracted from an echocardiogram is subject to inter-rater and intra-rater variability. Further, echo operators find it easier to tell when an LVEF value is very abnormal (<40%) or closer to normal (>50%), with estimates in the 40–50% range being less reliable. Supervised machine learning algorithms such as neural networks rely heavily on the quality of ground truth labels, and performance is especially susceptible to systematic biases. We believe that the comparatively large change in performance in going from internal testing to external validation in the 40–50% group is a direct consequence of this. Even if the model were to make correct predictions by excluding either the <40% or the >50% group, performance would suffer due to the lower quality labels in the 40–50% range. Overall, these issues are amenable to methods which yield more accurate ground truth about the LVEF, such as cardiac MRI or universal use of echogenic contrast medium. Additionally, we paired ECGs to echo reports over a ±7-day time period. Changes in LVEF secondary to either acute pathology or treatment may lead to discordance between ground truth and recorded values. This may be offset by the resilience of neural network architectures to random error, as evidenced by the robust performance of our models in both evaluation cohorts. Additional random error may have been introduced by the NLP pipeline creating erroneous labels in some cases, although our accuracy upon manual review was very high (99.7%). External validity, though demonstrated at another hospital site, is still limited because the external site is part of the Mount Sinai health system and lies in a similar geographical location. Future work should therefore focus on prospective validation and on validation at additional sites.
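The ±7-day pairing mentioned above can be sketched as a simple window join. The record layout, the function name, and the rule of keeping the nearest echo when several fall inside the window are all assumptions made for illustration. The text does not specify how multiple candidate echo reports were handled.

```python
from datetime import date


def pair_ecgs_to_echos(ecgs, echos, window_days=7):
    """Pair each ECG with the nearest same-patient echo within +/- window_days.

    ecgs and echos are lists of (record_id, patient_id, date) tuples.
    ECGs with no echo inside the window are dropped.
    """
    pairs = []
    for ecg_id, patient, ecg_date in ecgs:
        candidates = [
            (abs((echo_date - ecg_date).days), echo_id)
            for echo_id, echo_patient, echo_date in echos
            if echo_patient == patient
            and abs((echo_date - ecg_date).days) <= window_days
        ]
        if candidates:
            # min() picks the smallest day gap (ties broken by echo id).
            pairs.append((ecg_id, min(candidates)[1]))
    return pairs


# Synthetic records (illustrative identifiers and dates only).
ecgs = [
    ("ecg1", "p1", date(2020, 1, 10)),
    ("ecg2", "p1", date(2020, 3, 1)),
    ("ecg3", "p2", date(2020, 1, 5)),
]
echos = [
    ("echoA", "p1", date(2020, 1, 14)),
    ("echoB", "p1", date(2020, 1, 8)),
    ("echoC", "p2", date(2020, 1, 20)),
]

paired = pair_ecgs_to_echos(ecgs, echos)
print(paired)
```

A narrower window would reduce label discordance from interval changes in LVEF, at the cost of fewer paired ECGs.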

Conclusions

Deep learning models can extract information about biventricular function from the ECG that would ordinarily require an echocardiogram. Such models may enhance the usefulness of the ECG in the screening and management of either-sided heart failure progressing to biventricular disease.

Central Illustration: Deep Learning Based Identification of Left and Right Ventricular Dysfunction from the Electrocardiogram workflow. Demonstration of machine learning workflow. ECGs are preprocessed and analyzed using a neural network, followed by calculation of results and saliency maps. Red: Left heart, Blue: Right heart.

Clinical Perspectives.

Competency in Systems-based Practice

Deep learning on electrocardiograms can simultaneously detect right ventricular systolic dysfunction or dilation and quantitatively estimate left ventricular ejection fraction.

Translational Outlook

A prospective trial is needed to determine how automated deep learning tools for the electrocardiogram may be implemented to support clinical decision making in the management of either-sided heart failure progressing to biventricular disease.

Acknowledgements:

The authors would like to thank Manbir Singh, Mark Shervey, Adeyinka Fadehan, Sparshdeep Kaur, Ejaz Siddiqui, Percy LaRosa, and Thareesh Dondapati for their invaluable support.

Funding

This study was supported by the National Center for Advancing Translational Sciences, National Institutes of Health (U54 TR001433-05). This study has been approved by the institutional review board at the Icahn School of Medicine at Mount Sinai.

Abbreviations:

LVEF: Left Ventricular Ejection Fraction

RVSD: Right Ventricular Systolic Dysfunction

RVD: Right Ventricular Dilation

MR: Mitral Regurgitation

CNN: Convolutional Neural Network

CI: Confidence Interval

NLP: Natural Language Processing

AUROC: Area Under Receiver Operating Characteristic Curve

AUPRC: Area Under Precision Recall Curve

Footnotes

Disclosures:

The authors have no relationships relevant to the content of this paper to disclose.

References

  • 1. Virani SS et al. Heart Disease and Stroke Statistics—2020 Update: A Report From the American Heart Association. Circulation 141, e139–e596, doi: 10.1161/CIR.0000000000000757 (2020).
  • 2. Lippi G & Sanchis-Gomar F Global epidemiology and future trends of heart failure. AME Medical Journal 5 (2020).
  • 3. Nochioka K et al. Right Ventricular Function, Right Ventricular-Pulmonary Artery Coupling, and Heart Failure Risk in 4 US Communities: The Atherosclerosis Risk in Communities (ARIC) Study. JAMA Cardiol 3, 939–948, doi: 10.1001/jamacardio.2018.2454 (2018).
  • 4. Uduman J Epidemiology of Cardiorenal Syndrome. Adv Chronic Kidney Dis 25, 391–399, doi: 10.1053/j.ackd.2018.08.009 (2018).
  • 5. Mikami Y et al. Right Ventricular Ejection Fraction Is Incremental to Left Ventricular Ejection Fraction for the Prediction of Future Arrhythmic Events in Patients With Systolic Dysfunction. Circ Arrhythm Electrophysiol 10, doi: 10.1161/CIRCEP.116.004067 (2017).
  • 6. Meyer P et al. Effects of right ventricular ejection fraction on outcomes in chronic systolic heart failure. Circulation 121, 252–258, doi: 10.1161/CIRCULATIONAHA.109.887570 (2010).
  • 7. Meyer P et al. Right ventricular ejection fraction <20% is an independent predictor of mortality but not of hospitalization in older systolic heart failure patients. Int J Cardiol 155, 120–125, doi: 10.1016/j.ijcard.2011.05.046 (2012).
  • 8. Larose E et al. Right ventricular dysfunction assessed by cardiovascular magnetic resonance imaging predicts poor prognosis late after myocardial infarction. J Am Coll Cardiol 49, 855–862, doi: 10.1016/j.jacc.2006.10.056 (2007).
  • 9. Courand PY et al. Prognostic value of right ventricular ejection fraction in pulmonary arterial hypertension. Eur Respir J 45, 139–149, doi: 10.1183/09031936.00158014 (2015).
  • 10. Juilliere Y et al. Additional predictive value of both left and right ventricular ejection fractions on long-term survival in idiopathic dilated cardiomyopathy. Eur Heart J 18, 276–280, doi: 10.1093/oxfordjournals.eurheartj.a015231 (1997).
  • 11. de Groote P et al. Right ventricular ejection fraction is an independent predictor of survival in patients with moderate heart failure. J Am Coll Cardiol 32, 948–954, doi: 10.1016/s0735-1097(98)00337-4 (1998).
  • 12. Melenovsky V, Hwang SJ, Lin G, Redfield MM & Borlaug BA Right heart dysfunction in heart failure with preserved ejection fraction. Eur Heart J 35, 3452–3462, doi: 10.1093/eurheartj/ehu193 (2014).
  • 13. Mohammed SF et al. Right ventricular function in heart failure with preserved ejection fraction: a community-based study. Circulation 130, 2310–2320, doi: 10.1161/CIRCULATIONAHA.113.008461 (2014).
  • 14. McLaughlin VV et al. ACCF/AHA 2009 expert consensus document on pulmonary hypertension: a report of the American College of Cardiology Foundation Task Force on Expert Consensus Documents and the American Heart Association: developed in collaboration with the American College of Chest Physicians, American Thoracic Society, Inc., and the Pulmonary Hypertension Association. Circulation 119, 2250–2294, doi: 10.1161/CIRCULATIONAHA.109.192230 (2009).
  • 15. Eysmann SB, Palevsky HI, Reichek N, Hackney K & Douglas PS Two-dimensional and Doppler-echocardiographic and cardiac catheterization correlates of survival in primary pulmonary hypertension. Circulation 80, 353–360, doi: 10.1161/01.cir.80.2.353 (1989).
  • 16. Ghio S et al. Prognostic relevance of the echocardiographic assessment of right ventricular function in patients with idiopathic pulmonary arterial hypertension. Int J Cardiol 140, 272–278, doi: 10.1016/j.ijcard.2008.11.051 (2010).
  • 17. Bozkurt B, Coats A & Tsutsui H Universal Definition and Classification of Heart Failure. J Card Fail, doi: 10.1016/j.cardfail.2021.01.022 (2021).
  • 18. de Couto G, Ouzounian M & Liu PP Early detection of myocardial dysfunction and heart failure. Nature Reviews Cardiology 7, 334–344, doi: 10.1038/nrcardio.2010.51 (2010).
  • 19. Ito S et al. Reduced Left Ventricular Ejection Fraction in Patients With Aortic Stenosis. J Am Coll Cardiol 71, 1313–1321, doi: 10.1016/j.jacc.2018.01.045 (2018).
  • 20. Yancy CW et al. 2017 ACC/AHA/HFSA Focused Update of the 2013 ACCF/AHA Guideline for the Management of Heart Failure: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines and the Heart Failure Society of America. J Card Fail 23, 628–651, doi: 10.1016/j.cardfail.2017.04.014 (2017).
  • 21. Goel S et al. Decline in Left Ventricular Ejection Fraction Following Anthracyclines Predicts Trastuzumab Cardiotoxicity. JACC Heart Fail 7, 795–804, doi: 10.1016/j.jchf.2019.04.014 (2019).
  • 22. Wehner GJ et al. Routinely reported ejection fraction and mortality in clinical practice: where does the nadir of risk lie? Eur Heart J 41, 1249–1257, doi: 10.1093/eurheartj/ehz550 (2020).
  • 23. Papolos A, Narula J, Bavishi C, Chaudhry FA & Sengupta PP US Hospital Use of Echocardiography: Insights From the Nationwide Inpatient Sample. J Am Coll Cardiol 67, 502–511, doi: 10.1016/j.jacc.2015.10.090 (2016).
  • 24. Jellis CL & Griffin BP Are We Doing Too Many Inpatient Echocardiograms?: The Answer From Big Data May Surprise You! J Am Coll Cardiol 67, 512–514, doi: 10.1016/j.jacc.2015.10.091 (2016).
  • 25. Ostenfeld E & Flachskampf FA Assessment of right ventricular volumes and ejection fraction by echocardiography: from geometric approximations to realistic shapes. Echo Res Pract 2, R1–R11, doi: 10.1530/ERP-14-0077 (2015).
  • 26. Schmid E et al. Tricuspid annular plane systolic excursion (TAPSE) predicts poor outcome in patients undergoing acute pulmonary embolectomy. Heart Lung Vessel 7, 151–158 (2015).
  • 27. Raina A, Vaidya A, Gertz ZM, Susan C & Forfia PR Marked changes in right ventricular contractile pattern after cardiothoracic surgery: implications for post-surgical assessment of right ventricular function. J Heart Lung Transplant 32, 777–783, doi: 10.1016/j.healun.2013.05.004 (2013).
  • 28. Tamborini G et al. Is right ventricular systolic function reduced after cardiac surgery? A two- and three-dimensional echocardiographic study. Eur J Echocardiogr 10, 630–634, doi: 10.1093/ejechocard/jep015 (2009).
  • 29. Arrigo M et al. Right Ventricular Failure: Pathophysiology, Diagnosis and Treatment. Card Fail Rev 5, 140–146, doi: 10.15420/cfr.2019.15.2 (2019).
  • 30. Ryan JJ & Tedford RJ Diagnosing and treating the failing right heart. Curr Opin Cardiol 30, 292–300, doi: 10.1097/HCO.0000000000000164 (2015).
  • 31. Johnson KW et al. Artificial Intelligence in Cardiology. J Am Coll Cardiol 71, 2668–2679, doi: 10.1016/j.jacc.2018.03.521 (2018).
  • 32. Johnson KW et al. Enabling Precision Cardiology Through Multiscale Biology and Systems Medicine. JACC Basic Transl Sci 2, 311–327, doi: 10.1016/j.jacbts.2016.11.010 (2017).
  • 33. Somani S et al. Deep learning and the electrocardiogram: review of the current state-of-the-art. EP Europace, doi: 10.1093/europace/euaa377 (2021).
  • 34. Attia ZI et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 394, 861–867, doi: 10.1016/S0140-6736(19)31721-0 (2019).
  • 35. Christopoulos G et al. Artificial Intelligence–Electrocardiography to Predict Incident Atrial Fibrillation. Circulation: Arrhythmia and Electrophysiology 13, e009355, doi: 10.1161/CIRCEP.120.009355 (2020).
  • 36. Feeny AK et al. Machine Learning of 12-Lead QRS Waveforms to Identify Cardiac Resynchronization Therapy Patients With Differential Outcomes. Circulation: Arrhythmia and Electrophysiology 13, e008210, doi: 10.1161/CIRCEP.119.008210 (2020).
  • 37. Kagiyama N et al. Machine Learning Assessment of Left Ventricular Diastolic Function Based on Electrocardiographic Features. Journal of the American College of Cardiology 76, 930–941, doi: 10.1016/j.jacc.2020.06.061 (2020).
  • 38. Bos JM et al. Use of Artificial Intelligence and Deep Neural Networks in Evaluation of Patients With Electrocardiographically Concealed Long QT Syndrome From the Surface 12-Lead Electrocardiogram. JAMA Cardiology, doi: 10.1001/jamacardio.2020.7422 (2021).
  • 39. Lai D, Zhang Y, Zhang X, Su Y & Heyat MB B. An Automated Strategy for Early Risk Identification of Sudden Cardiac Death by Using Machine Learning Approach on Measurable Arrhythmic Risk Markers. IEEE Access 7, 94701–94716, doi: 10.1109/ACCESS.2019.2925847 (2019).
  • 40. Attia ZI et al. Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram. Nat Med 25, 70–74, doi: 10.1038/s41591-018-0240-2 (2019).
  • 41. Jentzer JC et al. Left ventricular systolic dysfunction identification using artificial intelligence-augmented electrocardiogram in cardiac intensive care unit patients. International Journal of Cardiology 326, 114–123, doi: 10.1016/j.ijcard.2020.10.074 (2021).
  • 42. Bozkurt B et al. Universal Definition and Classification of Heart Failure: A Report of the Heart Failure Society of America, Heart Failure Association of the European Society of Cardiology, Japanese Heart Failure Society and Writing Committee of the Universal Definition of Heart Failure. Journal of Cardiac Failure 27, 387–413, doi: 10.1016/j.cardfail.2021.01.022 (2021).
  • 43. De Geer L, Oscarsson A & Engvall J Variability in echocardiographic measurements of left ventricular function in septic shock patients. Cardiovasc Ultrasound 13, 19–19, doi: 10.1186/s12947-015-0015-6 (2015).
  • 44. Malmivuo J & Plonsey R 277–289 (1975).
  • 45. Sörnmo L & Laguna P in Bioelectrical Signal Processing in Cardiac and Neurological Applications (eds Sörnmo Leif & Laguna Pablo) 453–566 (Academic Press, 2005).
  • 46. Torada L et al. ImaGene: a convolutional neural network to quantify natural selection from genomic data. BMC Bioinformatics 20, 337, doi: 10.1186/s12859-019-2927-x (2019).
  • 47. Tan M & Le QV (2019).
  • 48. Ruopp MD, Perkins NJ, Whitcomb BW & Schisterman EF Youden Index and optimal cut-point estimated from observations affected by a lower limit of detection. Biom J 50, 419–430, doi: 10.1002/bimj.200710415 (2008).
  • 49. O’Neal WT et al. Electrocardiographic Predictors of Heart Failure With Reduced Versus Preserved Ejection Fraction: The Multi-Ethnic Study of Atherosclerosis. J Am Heart Assoc 6, doi: 10.1161/jaha.117.006023 (2017).
  • 50. Hendry PB, Krisdinarti L & Erika M Scoring System Based on Electrocardiogram Features to Predict the Type of Heart Failure in Patients With Chronic Heart Failure. Cardiology Research; Vol. 7, No. 3, Jun 2016 (2016).
  • 51. Alhamaydeh M et al. Identifying the most important ECG predictors of reduced ejection fraction in patients with suspected acute coronary syndrome. Journal of Electrocardiology 61, 81–85, doi: 10.1016/j.jelectrocard.2020.06.003 (2020).
  • 52. McKinney W pandas: a foundational Python library for data analysis and statistics. Python for High Performance and Scientific Computing 14, 1–9 (2011).
  • 53. Harris CR et al. Array programming with NumPy. Nature 585, 357–362, doi: 10.1038/s41586-020-2649-2 (2020).
  • 54. Virtanen P et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 261–272, doi: 10.1038/s41592-019-0686-2 (2020).
  • 55. Pedregosa F et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011).
  • 56. Clark A (readthedocs, 2015).
  • 57. Marcel S & Rodriguez Y in Proceedings of the 18th ACM international conference on Multimedia 1485–1488 (Association for Computing Machinery, Firenze, Italy, 2010).
  • 58. Paszke A et al. in Advances in Neural Information Processing Systems 32 (eds Wallach H et al.) 8024–8035 (Curran Associates, Inc., 2019).
  • 59. Kokhlikyan N et al. Captum: A unified and generic model interpretability library for pytorch. arXiv preprint arXiv:2009.07896 (2020).
  • 60. Hunter JD Matplotlib: A 2D graphics environment. Computing in Science & Engineering 9, 90–95, doi: 10.1109/mcse.2007.55 (2007).
  • 61. Waskom M & team, t. s. d. mwaskom/seaborn doi: 10.5281/zenodo.592845 (2020).
  • 62. van Rossum G The Python Programming Language, <https://www.python.org/> (
  • 63. Merkel D Docker: lightweight linux containers for consistent development and deployment. Linux journal 2014, 2 (2014).
  • 64. Patel JM & Heidenreich PA Abstract 225: LVEF Over Time: Trends and Analysis of a National VA Database. Circulation: Cardiovascular Quality and Outcomes 11, A225–A225, doi: 10.1161/circoutcomes.11.suppl_1.225 (2018).
