Key words: artificial intelligence, deep learning, electrocardiography, machine learning
Machine learning, and deep learning in particular, has seen brisk growth in biomedical research; in 2015, PubMed indexed fewer than 500 citations under “deep learning,” while in 2021 alone, more than 14,000 citations were added. Although deep learning has existed for decades, recent advances in computational capacity and the cataloging of large data sets have led to deep learning models producing useful results in multiple spheres of daily life, for example, the grouping of related images in smartphone galleries, automated language translation, and digital transcription services. The utility of deep learning in these settings has accelerated interest in its possible applications in biomedicine. In this issue of JACC: Advances, Schlesinger et al1 demonstrate that deep learning can predict an elevated mean pulmonary capillary wedge pressure (mPCWP) based on the standard 12-lead electrocardiogram (ECG). The study highlights a specific potential role for deep learning in health care and exemplifies the application of this technology to biomedical data.
What is deep learning?
Deep learning is a subset of machine learning, which itself is a subset of artificial intelligence (AI). AI involves training machines to complete human tasks. Machine learning accomplishes AI by providing a computer algorithm with input data and candidate solutions. On the simplest level, logistic and linear regression may be considered versions of machine learning. Deep learning accomplishes machine learning by creating artificial neural networks (ANNs), which are so named as they were intended to resemble the neural structure underlying human cognition. An ANN is a series of simple computations which feed into one another; in so doing, they can capture patterns in high-dimensional data and map them to outputs. ANNs are composed of layers of neurons, computational units that take multiple numerical inputs. The connections between neurons are assigned adjustable weights. Information is passed from neuron to neuron between layers until it converges on the output layer. By stacking and connecting neurons in different configurations, ANNs can be constructed to approximate numerous mappings between the input and output layers. “Deep” ANNs are formed by stacking layers of neurons in series (with intermediate layers referred to as hidden layers). As data flow into deeper layers, the information is iteratively distilled, resulting in downstream representations containing data more relevant to the prediction task. In contrast, traditional machine learning, such as logistic regression, relies on any input data transformation being pre-defined based on expert knowledge or experimentation. Schlesinger et al1 highlight this contrast in their study; their logistic regression model relies on pre-specified interval data, while their deep learning models incorporate the raw ECG signal and perform feature engineering independently.
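The stacking of layers described above can be made concrete with a minimal sketch. The network size, the weights, the bias terms, and the function names (`dense`, `forward`) are illustrative assumptions, not taken from the study by Schlesinger et al; real networks contain far more neurons and learn their weights from data rather than having them written by hand.

```python
import math

def relu(x):
    # Pass positive values through unchanged; set everything else to zero.
    return max(0.0, x)

def sigmoid(x):
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    # One layer: every neuron forms a weighted sum of all inputs,
    # adds its bias, and applies the activation function.
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    # Feed the output of each layer into the next one, in series.
    values = inputs
    for weights, biases, activation in layers:
        values = dense(values, weights, biases, activation)
    return values

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1], relu),  # hidden layer
    ([[1.0, -1.0]], [0.0], sigmoid),                            # output layer
]

output = forward([1.0, 2.0, 3.0], layers)
print(output)  # a single value between 0 and 1
```

Adding further entries to `layers` makes the network deeper without changing `forward`, which is the sense in which depth comes from stacking layers in series.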
For example, consider an image provided to a deep learning algorithm which is 100 pixels wide by 100 pixels high (Figure 1). Each of the pixels maps to multiple input neurons, which en masse represent the input layer. Hidden layers composed of additional neurons are created between this input layer and the output layer. Each neuron has a value constructed by multiplying each of its inputs by the weight assigned to that connection and summing the results. The value output by each neuron is then fed into an activation function which can be seen as analogous to the concept of threshold potentials in biological neurons; outputs are set to zero if they do not meet a threshold value. Then, through gradient descent, the model's weights are iteratively modified until the output of the ANN is close to the targets.
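As a hedged illustration of these steps, the sketch below trains a single neuron by gradient descent. The choice of a rectified linear activation with a threshold of zero, the squared-error loss, the learning rate, and all numeric values are assumptions made for the example; the paragraph above does not specify them.

```python
def relu(z):
    # Threshold-like activation: values at or below zero are set to zero.
    return z if z > 0 else 0.0

def neuron(weights, inputs):
    # Weighted sum of the inputs, followed by the activation function.
    z = sum(w * x for w, x in zip(weights, inputs))
    return relu(z), z

def train(weights, inputs, target, learning_rate=0.05, steps=20):
    # Gradient descent on the squared error (output - target) ** 2.
    weights = list(weights)
    for _ in range(steps):
        output, z = neuron(weights, inputs)
        error = output - target
        slope = 1.0 if z > 0 else 0.0  # derivative of the activation at z
        weights = [
            w - learning_rate * 2 * error * slope * x
            for w, x in zip(weights, inputs)
        ]
    return weights

inputs = [1.0, 2.0]   # hypothetical input values
target = 3.0          # hypothetical target output
trained = train([0.5, 0.5], inputs, target)
prediction, _ = neuron(trained, inputs)
print(trained, prediction)
```

Each pass nudges every weight in the direction that reduces the error, so the neuron's output moves toward the target. In a full network the same update is applied to every weight in every layer.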
Figure 1.
Feedforward Networks Are the Simplest Form of Artificial Neural Networks; Although No Longer Typically Used for Computer Vision Tasks, They Illustrate Core Concepts in Neural Network Architecture
Here, the input to a neural network could be the pixels of an ECG image. As data flow through deeper layers in a neural network, it is thought that each successive layer is capable of capturing a “higher level” concept; for example, early layers could capture the presence of a downward or upward deflection, and subsequent layers could then use these outputs to determine whether a QRS complex was present, based on lower level neurons detecting a rapid succession of downward and upward signals. Later layers could capture even “higher level” features such as the presence of a myocardial infarction. The final layer could then be fed into a simple model such as a logistic regression that uses the ANN's high-level features to classify the output. Deep learning models are not restricted to learning classical ECG features identified by clinicians; this is an advantage in many prediction tasks as it can be challenging to pre-suppose which features are valuable. For example, recent deep learning analyses of ECGs have revealed that ANNs can predict sex3 and paroxysmal atrial fibrillation from ECGs showing normal sinus rhythm2; these are tasks for which many expert clinicians would have no reasonable guess as to which ECG features would be helpful for prediction.
How deep is an ECG?
In their recent study, Schlesinger et al1 assess whether standard ECGs could be used to predict elevated mPCWP. Logistic regression and deep learning were used to create 3 models. First, logistic regression was used based on ECG interval data. Second, deep learning used data from 6,739 patients with paired same-day ECGs and right heart catheterization (RHC) hemodynamic data to predict several hemodynamic measures, including mPCWP. Third, deep learning models were first trained to predict standard ECG interval data using 242,216 ECGs; then, the ANN weights learned through this process were used as a starting point to predict RHC data from ECGs using the same database as the other deep learning model.
The patient population of the paired ECG-RHC data consists primarily of patients with heart failure, including some with corrected hemodynamics following heart transplantation. Their logistic regression model had no discriminatory ability. In contrast, the deep learning models successfully identified elevated mPCWP, and the results improved significantly when the model was initially trained on the interval prediction task (for which data were plentiful) to inform the primary task data set.
The use of transfer learning to improve model performance is innovative in this domain and has been used successfully in other areas such as computer vision, where image recognition models first trained on large data sets for a different task have outperformed image recognition models trained only on the focused data set. Transfer learning herein is exemplified by first training a model on stand-alone ECGs to predict interval data. It is thought that by having a model predict a different task on similar input data, the model will learn some meaningful representation of the raw input and that this pre-trained representation will then allow a deep learning model to efficiently detect features that are meaningful. This is similar to teaching a student to identify ST-segment elevation on an ECG; they will likely improve in identifying pericarditis and ST-segment elevation myocardial infarction, as both rely on the detection of a similar high-level feature.
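The idea of reusing learned weights as a starting point can be sketched with a deliberately simple toy. Everything here is an assumption for illustration: the model is a two-weight linear predictor rather than an ANN, the two tasks are synthetic and noise-free, and the data sizes, learning rate, and step counts are arbitrary. It shows only the general pattern of pre-training on a plentiful related task and then fine-tuning on a small one, not the procedure used by Schlesinger et al.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    # Mean squared error of weights w over (input, target) pairs.
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def train(w, data, steps, lr):
    # Plain batch gradient descent on the mean squared error.
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Plentiful "pre-training" task (synthetic): y = 2.0*x1 + 1.0*x2.
grid = [(i / 5.0, j / 5.0) for i in range(6) for j in range(6)]
pretrain_data = [(x, 2.0 * x[0] + 1.0 * x[1]) for x in grid]

# Scarce, related "primary" task (synthetic): y = 2.2*x1 + 0.9*x2.
def primary(x):
    return 2.2 * x[0] + 0.9 * x[1]

small_data = [(x, primary(x)) for x in [(0.2, 0.8), (0.9, 0.1), (0.5, 0.5)]]
holdout = [(x, primary(x)) for x in grid]

pretrained = train([0.0, 0.0], pretrain_data, steps=500, lr=0.1)
fine_tuned = train(pretrained, small_data, steps=5, lr=0.1)     # warm start
from_scratch = train([0.0, 0.0], small_data, steps=5, lr=0.1)   # cold start

print("fine-tuned error:", mse(fine_tuned, holdout))
print("from-scratch error:", mse(from_scratch, holdout))
```

With the same small data set and the same number of training steps, the warm-started model begins close to a useful solution, whereas the model trained from scratch has much further to travel. The benefit depends on the two tasks being related, as they are by construction in this toy.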
Deep learning in medicine is sometimes criticized for a lack of model interpretability. Clinicians are comfortable interpreting the coefficients of linear models, but deep learning models provide no such equivalent. Fears also exist regarding bias propagation and reliance on spurious findings within training data sets. Schlesinger et al1 have attempted to improve clinician trust in their model by including a “trustworthiness” metric. They use their model's output to predict the likelihood of both pulmonary hypertension and elevated mPCWP; if there is discordance, the model flags its predictions as potentially unreliable. Stratifying model outputs by this unreliability score shows a significant difference in discriminatory ability.
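One simple way such a discordance flag could work is sketched below. The exact rule used by Schlesinger et al is not described here, so the decision threshold of 0.5, the definition of discordance as the two predictions falling on opposite sides of that threshold, and the function names are all assumptions made for illustration.

```python
def is_discordant(p_elevated_mpcwp, p_pulmonary_hypertension, threshold=0.5):
    # Hypothetical rule: the two predictions disagree when one is at or
    # above the threshold and the other is below it.
    mpcwp_positive = p_elevated_mpcwp >= threshold
    ph_positive = p_pulmonary_hypertension >= threshold
    return mpcwp_positive != ph_positive

def stratify(predictions, threshold=0.5):
    # Split (p_mpcwp, p_ph) pairs into concordant and flagged groups so
    # that model performance can be assessed separately in each.
    concordant, flagged = [], []
    for p_mpcwp, p_ph in predictions:
        if is_discordant(p_mpcwp, p_ph, threshold):
            flagged.append((p_mpcwp, p_ph))
        else:
            concordant.append((p_mpcwp, p_ph))
    return concordant, flagged

# Made-up prediction pairs, not study data.
example = [(0.9, 0.8), (0.7, 0.2), (0.1, 0.3), (0.4, 0.6)]
concordant, flagged = stratify(example)
print(len(concordant), "concordant;", len(flagged), "flagged as unreliable")
```

Discriminatory ability could then be computed within each group, which is the kind of stratified comparison described above.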
The model and analysis put forward by Schlesinger et al1 are innovative and compelling, but the population on which their model was created may be unique, and this raises questions regarding the ability of the model to generalize to other patient settings. Traditional classification metrics such as positive predictive value and negative predictive value can be adjusted for different baseline prevalence, which Schlesinger et al1 do; however, inherent in this adjustment is the assumption that the model's ability to discriminate hemodynamic data from ECGs is maintained. In the absence of a causal model, the transportability of this model's performance to different settings remains uncertain. (The term causal model refers to the idea that a prediction model may generalize better if it relies on the detection of features that are caused by what the model is trying to predict, not just those noncausally associated with it.) It is possible that the deep learning model relies on structural changes seen after chronically deranged hemodynamics and is not reflective of real-time mPCWP. More data may be a remedy for this issue; however, other forms of transfer learning may also provide further opportunities to improve generalization.
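The prevalence adjustment mentioned above follows from Bayes' theorem and can be sketched in a few lines. The sensitivity, specificity, and prevalence values are made up for illustration and are not results from the study. The calculation holds sensitivity and specificity fixed across settings, which is the very assumption questioned above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    # Expected proportions of each outcome in a population with the
    # given prevalence, assuming fixed sensitivity and specificity.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Hypothetical test characteristics applied to two different settings.
for prevalence in (0.5, 0.1):
    ppv, npv = predictive_values(0.8, 0.9, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With identical test characteristics, the positive predictive value falls as prevalence falls while the negative predictive value rises, so a model developed in a high-prevalence referral population may perform quite differently in practice elsewhere even if its discrimination is unchanged.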
The study by Schlesinger et al1 exemplifies the potentially transformative impact deep learning may have on drawing deeper clinical inference from existing data, data that we, as clinicians, look at every day. However, despite the growing interest in deep learning in health care, few models have yet translated into the clinical setting. Issues regarding model generalizability, integration, interpretability, and utility have been challenges. Innovative solutions, such as the trustworthiness metric and the integration of transfer learning to leverage related data sets, may help overcome these barriers. In our view, the potential to favorably impact health care is very promising, and these results represent an exciting step forward for this growing field.
Funding support and author disclosures
Dr Lawler is supported by a Heart and Stroke Foundation of Canada National New Investigator award; and has received unrelated research funding from the Canadian Institutes of Health Research, the National Institutes of Health (National Heart, Lung, and Blood Institute), the Peter Munk Cardiac Centre, the LifeArc Foundation, the Thistledown Foundation, the Ted Rogers Centre for Heart Research, the Medicine by Design Fund, the University of Toronto, and the Government of Ontario. Dr Lawler has received unrelated consulting honoraria from Novartis, CorEvitas, and Brigham and Women's Hospital; and unrelated royalties from McGraw-Hill Publishing. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.
Acknowledgments
The authors thank Gail Rudakevich (freelance medical illustrator) for generating the figure and Bo Wang, PhD (Peter Munk Cardiac Centre, Vector Institute, and University of Toronto) for technical content review.
Footnotes
The authors attest they are in compliance with human studies committees and animal welfare regulations of the authors' institutions and Food and Drug Administration guidelines, including patient consent where appropriate. For more information, visit the Author Center.
References
- 1. Schlesinger D.E., Diamant N., Raghu A., et al. A deep learning model for inferring elevated pulmonary capillary wedge pressures from the 12-lead electrocardiogram. JACC Adv. 2022;1(1):100003.
- 2. Raghunath S., Pfeifer J.M., Ulloa-Cerna A.E., et al. Deep neural networks can predict new-onset atrial fibrillation from the 12-lead ECG and help identify those at risk of atrial fibrillation-related stroke. Circulation. 2021;143:1287–1298. doi: 10.1161/CIRCULATIONAHA.120.047829.
- 3. Attia Z.I., Friedman P.A., Noseworthy P.A., et al. Age and sex estimation using artificial intelligence from standard 12-lead ECGs. Circ Arrhythmia Electrophysiol. 2019;12(9):1–11. doi: 10.1161/CIRCEP.119.007284.