Author manuscript; available in PMC: 2024 Mar 29.
Published in final edited form as: JACC Cardiovasc Imaging. 2022 Oct 19;16(2):209–220. doi: 10.1016/j.jcmg.2022.07.017

Direct risk assessment from myocardial perfusion imaging using explainable deep learning

Ananya Singh a,#, Robert JH Miller a,b,#, Yuka Otaki a, Paul Kavanagh a, Michael T Hauser c, Evangelos Tzolos a,d, Jacek Kwiecinski a,e, Serge Van Kriekinge a, Chih-Chun Wei a, Tali Sharir f, Andrew J Einstein g, Mathews B Fish h, Terrence D Ruddy i, Philipp A Kaufmann j, Albert J Sinusas k, Edward J Miller k, Timothy M Bateman l, Sharmila Dorbala m, Marcelo Di Carli m, Joanna X Liang a, Cathleen Huang a, Donghee Han a, Damini Dey a, Daniel S Berman a, Piotr J Slomka a,*
PMCID: PMC10980287  NIHMSID: NIHMS1838068  PMID: 36274041

Abstract

Background:

Myocardial perfusion imaging (MPI) is frequently used to provide risk stratification but methods to improve the accuracy of these predictions are needed.

Objectives:

We developed an explainable deep-learning model (HARD MACE-DL) for the prediction of death or non-fatal myocardial infarction (MI) and validated its performance in large internal and external testing populations.

Methods:

Patients undergoing single photon emission computed tomography MPI were included, with 20,401 patients in the derivation population (5 sites) and 9,019 in the external testing population (2 different sites). HARD MACE-DL uses myocardial perfusion, motion, thickening, and phase polar maps combined with age, sex, and cardiac volumes. The primary outcome was all-cause mortality or non-fatal MI. Prognostic accuracy was evaluated using area under receiver-operating characteristic curve (AUC).

Results:

Patients with normal perfusion and elevated HARD MACE-DL risk were at higher risk than patients with abnormal perfusion and low HARD MACE-DL risk (annualized event rate 3.6% vs. 2.2%, p<0.001). Patients in the highest quartile of HARD MACE-DL score had an annual rate of death or MI (4.7%) which was more than 10-fold higher than that of patients in the lowest quartile (0.4% per year). In external testing, the AUC for HARD MACE-DL (0.73, 95% CI 0.71–0.75) was higher than that of a logistic regression model (AUC 0.70), stress TPD (AUC 0.65), and ischemic TPD (AUC 0.63, all p<0.01). Calibration, a measure of how well predicted risk matches actual risk, was excellent in both populations (Brier score 0.079 for internal and 0.070 for external).

Conclusion:

The DL model predicts death or MI directly from MPI, estimating patient-level risk with good calibration and improved accuracy compared to traditional quantitative approaches. The model incorporates mechanisms to explain to the physician which image regions contribute to the adverse event prediction.

Keywords: Deep learning, artificial intelligence, risk prediction, prognosis, myocardial perfusion imaging

Condensed Abstract

We developed a deep-learning model (HARD MACE-DL) to predict death or non-fatal myocardial infarction (MI) and tested it in large internal (20,401) and external testing (9,019) populations. HARD MACE-DL uses myocardial perfusion imaging (MPI) polar maps combined with age, sex, and cardiac volumes to derive predictions. Patients in the highest quartile of HARD MACE-DL score had a 10-fold higher rate of death or MI (4.7%) than patients in the lowest quartile (0.4% per year). The DL model predicts death or MI directly from MPI, estimating patient-level risk with good calibration and improved accuracy compared to traditional quantitative approaches.

INTRODUCTION

Accurate prediction of cardiovascular risk is central to management of patients with known or suspected coronary artery disease (CAD)(1,2). Given the extensive evidence base supporting its use (3), myocardial perfusion imaging (MPI) is used to provide risk stratification with an estimated 15–20 million MPI scans performed annually worldwide (4). While assessment is frequently performed using subjective visual interpretation, quantitative assessment of perfusion may help refine cardiovascular risk assessment(5) or guide treatment decisions(6). However, quantitative assessment of perfusion alone does not consider other MPI findings associated with increased risk (7–9).

Artificial intelligence (AI) is able to incorporate multiple parameters to objectively predict a patient’s likelihood of having obstructive CAD or their cardiovascular risk(10). Machine learning models, which use predetermined clinical, stress, and imaging variables, have been applied to MPI with high diagnostic(11) and prognostic accuracy(12–14). Deep learning (DL) represents an alternative approach, with predictions made directly from image data (15–17). Since DL uses images as the model input, it removes the need for software-dependent quantification, which may reduce the generalizability of machine learning models, and allows the model to identify latent image features which are not currently quantified. Additionally, direct image analysis does not require manual data collection (for example, of past medical history) and avoids issues with missing values. Our previous machine learning model for estimating major adverse cardiovascular event (MACE) risk used a total of 70 variables, of which more than half potentially require manual collection. We recently developed a general explainable DL model (CAD-DL), which utilizes perfusion, motion, and thickening polar maps for prediction of obstructive disease and highlights image regions associated with the presence of obstructive CAD directly on the polar maps. However, the accuracy of MACE prediction with an explainable DL model has not been studied to date.

Accordingly, we performed this study to assess the prognostic accuracy of an explainable DL model (HARD MACE-DL). The model was trained in a large multicenter population (Registry of Fast Myocardial Perfusion Imaging with Next generation SPECT [REFINE SPECT]) (18) with external testing performed in a large population from two separate sites to evaluate generalizability to new populations.

METHODS

Study Populations

Two separate patient populations were included, with model derivation performed using patients from the REFINE SPECT and external testing performed in a separate population from the University of Calgary and Oklahoma Heart Hospital. To the extent allowed by data sharing agreements and IRB protocols, the data from this manuscript will be shared upon written request.

Training and internal testing population

The population from REFINE SPECT included consecutive patients from one of 5 sites undergoing SPECT MPI between 2009 and 2014 (n=20,418) as previously described(18). Patients without gated imaging or follow-up for MACE (n=17) were excluded. In total, 20,401 patients were used for training.

External testing population

The external testing population included 9,019 consecutive patients from the University of Calgary (n=2,985) and Oklahoma Heart Hospital (n=6,034) with follow-up for MACE. In a secondary analysis, we excluded patients who underwent early revascularization (n=475), since this may influence the association between imaging findings and outcomes(3,19,20). The study protocol complied with the Declaration of Helsinki and was approved by the institutional review boards at each participating institution. The overall study was approved by the institutional review board at Cedars-Sinai Medical Center.

Imaging protocols and traditional nuclear cardiology analysis

Imaging protocols for the REFINE SPECT population have been described previously (18), with additional details in the Supplement.

Visual interpretation

Experienced cardiologists interpreted perfusion with access to clinical history, stress test results, and all available imaging at the time of clinical reporting. Visual subjective interpretation of stress perfusion was assessed with summed stress scores (SSS) using the 17-segment American Heart Association model, or overall reader interpretation. All studies in the external population were assessed using SSS, with SSS>3 defined as abnormal(5).

Outcomes

In both internal and external testing populations, patients were followed for development of hard MACE, which included all-cause mortality and non-fatal myocardial infarction (MI). Non-fatal MI was defined as hospitalization for cardiac chest pain or anginal equivalent with positive cardiac biomarkers(1,2). All outcomes were adjudicated by experienced cardiologists after considering all available investigations. At sites in the United States, all-cause mortality was retrieved from the National Death Index. In Ottawa, all-cause mortality was determined through the OACIS Clinical Information System. In Calgary, all-cause mortality was obtained from Alberta vital statistics. In Israel, all-cause mortality was determined from the Ministry of Health Database.

Model Architecture

The DL model (HARD MACE-DL) was developed in the internal population with inputs of raw perfusion and gated derived maps of motion, thickening, phase angle, and amplitude combined with age, sex, end-systolic, and end-diastolic volumes. The proposed network, shown in Figure 1, consists of 2 convolution blocks, each with 3×3 convolution kernels, batch normalization, dropout, and Leaky Rectified Linear Unit (ReLU) layers, which were added to prevent overfitting. Cardiac volumes, age, and sex variables were introduced to the model in the first fully connected layer. End-diastolic and end-systolic volumes were used as inputs; left ventricular ejection fraction (LVEF) was not, since the information would be redundant. The output of HARD MACE-DL was the likelihood of a patient experiencing death or MI during follow-up. In order to evaluate the importance of age in the DL model, we also developed and tested a model without age as an input (HARD MACE-DL-no age). Lastly, we evaluated models developed specifically for prediction of death alone (ACM-DL) or non-fatal MI alone (MI-DL).
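The fusion step described above, in which age, sex, and cardiac volumes join the image-derived features at the first fully connected layer, can be illustrated with a minimal standard-library sketch. It is not the authors' PyTorch implementation: the feature vector, layer width, weights, and the function names `fuse_inputs` and `dense` are all hypothetical, and the convolution blocks are assumed to have already produced the flattened `image_features`.

```python
import math

def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: keep positive values, scale negative values by a small slope."""
    return x if x > 0 else negative_slope * x

def fuse_inputs(image_features, age, sex, esv, edv):
    """Concatenate flattened image features with the four tabular inputs."""
    return list(image_features) + [age, sex, esv, edv]

def dense(inputs, weights, biases):
    """One fully connected layer followed by Leaky ReLU."""
    return [leaky_relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def sigmoid(z):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, pre-scaled values for illustration only.
image_features = [0.2, -0.5, 1.0]   # stand-in for the convolution block output
fused = fuse_inputs(image_features, age=0.69, sex=1.0, esv=0.3, edv=0.6)
hidden = dense(fused, weights=[[0.1] * 7, [-0.2] * 7], biases=[0.0, 0.1])
risk = sigmoid(0.5 * hidden[0] + 0.5 * hidden[1])  # likelihood of death or MI
```

Introducing the tabular variables after the convolutional layers keeps the image pathway purely spatial while still letting the final prediction depend on patient-level covariates.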

Figure 1: Model Architecture.

The explainable deep-learning model (HARD MACE-DL) utilizes inputs of perfusion, motion, thickening, phase amplitude, and phase angle polar maps. Age, sex, and cardiac volumes are integrated in the first fully connected layer. The model outputs are probability of death or myocardial infarction and attention maps. Gradient-weighted Class Activation Mapping (GradCAM) is used to highlight image regions contributing to the prediction and SHapley Additive exPlanations (SHAP) is used to rank the importance of each polar map to the risk prediction. DL: Deep Learning, MACE: Major Adverse Cardiac Events, ReLU – Rectified Linear Unit.

Individual explanation of outcome prediction

HARD MACE-DL incorporates attention maps, generated with Gradient-weighted Class Activation Mapping (GradCAM) (23), that show the physician which left ventricular regions contributed to the prediction. To determine the importance of each input polar map, SHapley Additive exPlanations (SHAP)(24) values were obtained per image input; SHAP assigns each feature a per-prediction importance value. These values, combined with attention, were converted to percentages and ranked to provide the individual contribution of each image input to the prediction. This allows physicians to ensure the AI findings are clinically relevant. HARD MACE-DL was implemented using Python 3.7.6 and PyTorch 1.5.1. Training was performed using a Titan RTX graphics card (Nvidia, Santa Clara, CA).
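The conversion of per-map importance values to ranked percentages can be sketched as follows. The text does not specify how SHAP values and attention were combined, so this example simply assumes one already-combined importance value per polar map and normalizes the absolute values; the function name `rank_contributions` and the numbers are hypothetical.

```python
def rank_contributions(importance):
    """Convert per-map importance values to percentages, ranked high to low.

    `importance` maps a polar-map name to a signed importance value.
    Absolute values are used (an assumption) so that percentages sum to 100.
    """
    magnitudes = {name: abs(value) for name, value in importance.items()}
    total = sum(magnitudes.values())
    if total == 0:
        raise ValueError("all importance values are zero")
    percents = {name: 100.0 * m / total for name, m in magnitudes.items()}
    return sorted(percents.items(), key=lambda item: item[1], reverse=True)

# Hypothetical importance values for a single patient.
example = {
    "stress perfusion": 0.40,
    "motion": 0.25,
    "thickening": 0.10,
    "phase amplitude": 0.20,
    "phase angle": 0.05,
}
ranked = rank_contributions(example)
for name, percent in ranked:
    print(f"{name}: {percent:.1f}%")
```

A ranked table of this kind is what lets a reader see at a glance which polar map deserves closer inspection for a given prediction.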

Logistic regression model

We also performed a traditional multivariable logistic regression analysis for comparison to HARD MACE-DL. The model included age, sex, stress TPD, rest TPD, stress LVEF, and stress left ventricular end-systolic volume (LVESV). Collinearity was assessed with variance-covariance and correlation matrices, with no significant collinearity identified. The logistic regression score was generated by linearly combining the variable values weighted by their β coefficients, with model details available in Supplemental Table 1. The model was developed and tested in the internal population. The model developed in the derivation population was also tested in the external population.
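For readers less familiar with regression scores, a logistic regression score is conventionally the intercept plus the sum of each variable multiplied by its β coefficient, passed through the logistic function. The sketch below uses the six variables named above, but the coefficients and intercept are invented placeholders, not the values in Supplemental Table 1.

```python
import math

def logistic_score(features, coefficients, intercept):
    """Predicted probability from a logistic regression model.

    linear predictor = intercept + sum(beta_i * x_i); probability = sigmoid(linear predictor)
    """
    linear_predictor = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Placeholder coefficients for illustration only (NOT the published model).
coefficients = {
    "age": 0.04,
    "male": 0.3,
    "stress_tpd": 0.03,
    "rest_tpd": 0.01,
    "stress_lvef": -0.02,
    "stress_lvesv": 0.005,
}
intercept = -4.0

patient = {"age": 69, "male": 1, "stress_tpd": 0.6, "rest_tpd": 0.0,
           "stress_lvef": 60, "stress_lvesv": 40}
probability = logistic_score(patient, coefficients, intercept)
```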

Internal 10-fold repeated testing

The prognostic accuracy of HARD MACE-DL was evaluated using 10-fold repeated testing and external testing (Supplemental Fig. 1), with additional details in the supplement.

External testing

External testing for HARD MACE-DL and logistic regression model was performed in patients from the University of Calgary (n=2,985) and Oklahoma Heart Hospital (n=6,034). The model with the lowest validation loss in 10-fold repeated testing was tested in the external population, minimizing overfitting. Patients from the external testing site were not used in any way during model development.

Statistical analysis

The prognostic accuracy for death or MI of the logistic regression model, stress TPD, ischemic TPD, and HARD MACE-DL was evaluated using area under the receiver operating characteristic curve (AUC) and compared using DeLong’s method(26). Calibration was assessed with calibration plots and Brier scores. Patients who died prior to experiencing MI were excluded from the analysis of prediction performance for MI alone. We also evaluated categorical net reclassification index (NRI) using two risk categories for HARD MACE-DL (low-risk vs. high-risk) when compared to expert visual interpretation (normal vs. abnormal). We also evaluated continuous NRI for the combination of summed stress score and HARD MACE-DL. All statistical tests were two-tailed and a p-value <0.05 was considered statistically significant. Analyses were performed using RStudio version 1.3.959 (RStudio, Boston, MA) and Stata version 14 (Stata Corp, College Station, TX).
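The two headline metrics can be illustrated with a small standard-library sketch. The AUC is computed here through its rank interpretation (the probability that a randomly chosen patient with an event receives a higher score than one without, counting ties as one half), and the Brier score as the mean squared difference between predicted probability and observed outcome. This is not the authors' R or Stata code, it does not implement DeLong's comparison, and the example data are made up.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of events and non-events."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

def brier_score(probabilities, labels):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)

# Made-up predictions for four patients (1 = death or MI, 0 = no event).
predicted = [0.1, 0.4, 0.35, 0.8]
observed = [0, 0, 1, 1]
print(auc(predicted, observed))          # 0.75
print(brier_score(predicted, observed))  # approximately 0.158
```

A lower Brier score indicates better agreement between predicted and observed risk, while a higher AUC indicates better discrimination; the two measure different properties, so both are reported.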

RESULTS

Clinical Characteristics

In total, 20,401 patients were included in the training population with median age 64 (IQR 56 – 73) and 11,630 (57.0%) male patients. There were 9,019 patients included in the external testing population with median age 68 (IQR 60 – 75) and 4,871 (54.0%) male patients. Population characteristics in the internal and external testing populations are shown in Table 1.

Table 1.

Comparison of Training Population and External Testing Population.

| Characteristic | REFINE Cohort (n=20,401) | External Cohort (n=9,019) | p-value |
|---|---|---|---|
| Age, years | 64 (56 – 73) | 68 (60 – 75) | <0.001 |
| Men | 11,630 (57.0) | 4,871 (54.0) | <0.001 |
| BMI, kg/m² | 27 (25 – 31) | 29 (26 – 34) | <0.001 |
| CAD risk factors | | | |
| Diabetes | 5,204 (25.5) | 2,731 (30.3) | <0.001 |
| Dyslipidemia | 12,893 (63.2) | 1,979 (21.9) | <0.001 |
| Hypertension | 12,907 (63.3) | 5,736 (63.6) | |
| Current smoker | 3,871 (19.0) | 2,841 (31.5) | <0.001 |
| Past MI | 2,765 (13.6) | 677 (7.5) | <0.001 |
| Past PCI | 3,965 (19.4) | 839 (9.3) | <0.001 |
| Past CABG | 1,695 (8.3) | 570 (6.3) | <0.001 |
| Exercise stress | 9,722 (47.7) | 5,318 (59.0) | <0.001 |
| Death or MI, n (%) | 1,913 (9.4) | 719 (8.0) | |
| Death | 1,615 (7.9) | 545 (6.0) | <0.001 |
| MI | 379 (1.9) | 212 (2.4) | <0.001 |
| Follow-up, years | 4.6 (3.6 – 5.8) | 3.5 (3.1 – 3.8) | <0.001 |

Continuous variables summarized as median (interquartile range) and categorical variables as number (proportion). Abbreviations: body mass index (BMI), coronary artery bypass grafting (CABG), coronary artery disease (CAD), confidence interval (CI), heart rate (HR), major adverse cardiovascular event (MACE), myocardial infarction (MI), percutaneous coronary intervention (PCI).

Internal testing

In the internal population, death or MI occurred in 1,913 patients during median follow-up of 4.6 years (IQR 3.6 – 5.8). The prediction performance for death or MI of HARD MACE-DL (AUC 0.76, 95% CI 0.75–0.77) was significantly higher than that of stress TPD (AUC 0.63, 95% CI 0.62–0.65) or ischemic TPD (AUC 0.61, 95% CI 0.59–0.62), p<0.001 for both (Central Illustration). HARD MACE-DL also had higher prediction performance compared to logistic regression (AUC 0.72, 95% CI 0.71 – 0.73, p<0.001).

Central Illustration:

Prediction performance for death or myocardial infarction (MI) in internal and external testing populations. The area under the receiver operating characteristic curve (AUC) for deep learning was significantly higher than stress or ischemic total perfusion deficit (TPD). CI – confidence interval.

Figure 2a shows survival free of death or MI in patients stratified by visually normal (n=12,320) or abnormal (n=8,081) perfusion compared to similarly sized populations identified as low (<0.103) or high-risk (≥0.103) by HARD MACE-DL. Annualized event rates were higher in patients identified as high-risk by HARD MACE-DL compared to visual interpretation (3.7% vs 3.0%) and lower in patients identified as low-risk by HARD MACE-DL (0.9% vs 1.4%). Patients who had visually normal perfusion with elevated HARD-MACE-DL risk (n=3,905, 19.1%) were at higher risk compared to patients with visually abnormal perfusion and low HARD-MACE-DL risk (n=3,902, 19.1%, annualized event rate 2.9% vs. 1.2%, p<0.001).
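The four-way stratification used in this comparison, and an annualized event rate for each group, can be sketched as below. The 0.103 threshold comes from the text; the definition of the annualized rate as events per 100 person-years of follow-up is an assumption, since the manuscript does not spell out the formula, and the function names and example data are hypothetical.

```python
DL_THRESHOLD = 0.103  # high-risk cut-off for the HARD MACE-DL score (from the text)

def risk_group(visual_abnormal, dl_score, threshold=DL_THRESHOLD):
    """Label a patient by visual read (normal/abnormal) and DL risk (low/high)."""
    visual = "abnormal visual" if visual_abnormal else "normal visual"
    dl = "high DL" if dl_score >= threshold else "low DL"
    return f"{visual} / {dl}"

def annualized_event_rate(events, followup_years):
    """Events per 100 person-years (assumed definition of annualized event rate)."""
    person_years = sum(followup_years)
    if person_years <= 0:
        raise ValueError("total follow-up must be positive")
    return 100.0 * sum(events) / person_years

# Hypothetical patients: (visually abnormal?, DL score, event 0/1, follow-up in years)
patients = [
    (False, 0.25, 1, 2.0),
    (False, 0.15, 0, 5.0),
    (True, 0.05, 0, 5.0),
    (True, 0.04, 0, 8.0),
]

groups = {}
for visual_abnormal, dl_score, event, years in patients:
    groups.setdefault(risk_group(visual_abnormal, dl_score), []).append((event, years))

for label, rows in groups.items():
    rate = annualized_event_rate([e for e, _ in rows], [y for _, y in rows])
    print(f"{label}: {rate:.1f}% per year")
```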

Figure 2: Kaplan-Meier survival curves comparing patients stratified by HARD MACE-DL and expert visual interpretation.

Kaplan-Meier survival curves comparing survival free of death or myocardial infarction based on classification by deep-learning (DL) and expert visual interpretation, in internal (a) and external populations (b). Hazard ratios (HR) are for patients with abnormal visual perfusion and high DL risk compared to patients with normal visual perfusion and low DL risk (red font).

Patients in the highest quartile of HARD MACE-DL score had an annual rate of death or MI of 4.8% (95% CI:4.5–5.1), with a 10-fold increased risk compared to patients in the lowest quartile of HARD MACE-DL score who had an annual rate of 0.48% (95% CI:0.40–0.58) (Supplemental Figure 2a).

Figure 3a shows the calibration plot for HARD MACE-DL, during internal testing, demonstrating excellent agreement between the predicted risk and actual rate of death or MI.

Figure 3: Calibration Plots.

Calibration plots for HARD MACE-DL score in the internal (a) and external testing populations (b). In both cohorts, increasing percentile of deep-learning (HARD MACE-DL) score is associated with increasing risk of death or myocardial infarction.

External testing

In the external testing population, 719 patients experienced MI or death during median follow-up of 3.5 years (IQR 3.1 – 3.8). The prediction performance for death or MI of HARD MACE-DL (AUC 0.73, 95% CI 0.71–0.75) was higher than that of stress TPD (AUC 0.65, 95% CI 0.63–0.67) or ischemic TPD (AUC 0.64, 95% CI 0.61–0.66, both p<0.01) (Central Illustration). HARD MACE-DL also had higher prediction performance compared to the logistic regression model (AUC 0.70, 95% CI 0.68 – 0.72, p<0.01).

Figure 2b shows survival free of death or MI in the external testing population for patients stratified by visual assessment of perfusion and HARD MACE-DL risk. Elevated HARD MACE-DL risk was defined as ≥0.103 (threshold defined in the internal testing population). Patients with visually normal perfusion and elevated HARD MACE-DL risk (n=1,682, 19.7%) had a higher annualized event rate compared to patients with visually abnormal perfusion and low HARD MACE-DL risk (n=656, 7.7%, annualized event rate 3.6% vs. 2.1%, p<0.001). Meanwhile, patients with abnormal visual perfusion and elevated HARD MACE-DL risk had the highest event rate (annualized event rate 6.9%) and patients with both normal visual perfusion and low HARD MACE-DL risk were at the lowest risk (annualized event rate 1.1%). Patients in the highest quartile of HARD MACE-DL score (thresholds defined in the internal testing population) had an annualized rate of death or MI of 5.4% (95% CI 4.9 – 6.0%) compared to 0.8% (95% CI 0.6 – 1.0%) in the lowest quartile (p<0.001) (Supplemental Figure 2b).

Calibration for HARD MACE-DL in the external testing population, adjusted for differences in follow-up time, is shown in Figure 3b. In the external population, there was good agreement between predicted risk and actual event rate. The Brier score for HARD MACE-DL was 0.066, compared to 0.078 for logistic regression. When considered in addition to physician visual interpretation, HARD MACE-DL had significantly improved categorical NRI for patients with events (21.2%, 95% CI 16.8% to 25.6%) and overall (9.5%, 95% CI 4.7% to 15.3%), with full results in Supplemental Table 2. Continuous NRI analysis is presented in Supplemental Table 3, with positive overall NRI for HARD MACE-DL and summed stress scores.
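The categorical NRI reported here follows a standard two-category definition, which the sketch below illustrates: among patients with events, the net proportion moved up to the high-risk category; among patients without events, the net proportion moved down; and the overall NRI is their sum. The exact procedure the authors used is described in their supplement, so this is a generic illustration with invented data and a hypothetical function name.

```python
def categorical_nri(old_high, new_high, events):
    """Two-category net reclassification index.

    old_high / new_high: booleans, whether each patient is high-risk under the
    reference classification (e.g. visual read) and the new one (e.g. DL score).
    events: 1 if the patient had an event, else 0.
    Returns (NRI for events, NRI for non-events, overall NRI).
    """
    def net_upward(indices):
        up = sum(1 for i in indices if new_high[i] and not old_high[i])
        down = sum(1 for i in indices if old_high[i] and not new_high[i])
        return (up - down) / len(indices)

    event_idx = [i for i, e in enumerate(events) if e]
    nonevent_idx = [i for i, e in enumerate(events) if not e]
    if not event_idx or not nonevent_idx:
        raise ValueError("need at least one event and one non-event")

    nri_events = net_upward(event_idx)            # upward moves are correct
    nri_nonevents = -net_upward(nonevent_idx)     # downward moves are correct
    return nri_events, nri_nonevents, nri_events + nri_nonevents

# Invented example: eight patients, the first four with events.
visual_abnormal = [False, False, True, True, False, True, False, False]
dl_high_risk = [True, True, True, False, False, False, False, True]
had_event = [1, 1, 1, 1, 0, 0, 0, 0]

nri_e, nri_ne, nri_overall = categorical_nri(visual_abnormal, dl_high_risk, had_event)
```

In this toy example two event patients move up and one moves down (net 0.25), while the non-event moves cancel out, giving an overall NRI of 0.25.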

In the external population, prediction performance for death or non-fatal MI was similar for HARD MACE-DL when patients who underwent early revascularization were excluded. The AUC for HARD MACE-DL (AUC 0.73, 95% CI 0.71–0.75) was higher than that of stress TPD (AUC 0.65, 95% CI 0.63–0.67, p<0.01), ischemic TPD (AUC 0.63, 95% CI 0.61–0.66, p<0.01), and the logistic regression model (AUC 0.71, 95% CI 0.69 – 0.73, p=0.024). Age significantly improved the prognostic accuracy of the DL model (p<0.01), but HARD MACE-DL-no age (AUC 0.70, 95% CI 0.68–0.72) continued to have higher prediction performance compared to stress TPD or ischemic TPD (both p<0.01). Prediction performance for death alone was higher for ACM-DL (AUC 0.77, 95% CI 0.75 – 0.79) compared to stress TPD (AUC 0.66, 95% CI 0.63 – 0.68) or ischemic TPD (AUC 0.64, 95% CI 0.61 – 0.66, p<0.01 for both). However, prediction performance for MI was similar for MI-DL (AUC 0.65, 95% CI 0.61–0.68) compared to stress TPD (AUC 0.64, 95% CI 0.60 – 0.68) or ischemic TPD (AUC 0.63, 95% CI 0.59 – 0.67, p>0.05).

Case Examples:

Figure 4 shows stress and rest perfusion imaging and HARD MACE-DL results for a 69-year-old man presenting with dyspnea. Visual interpretation (SSS=0) and TPD (0.6%) were consistent with low risk, while the HARD MACE-DL score suggested high risk. The patient experienced an MI at 485 days of follow-up. The information from the HARD MACE-DL prediction could potentially have been used to identify abnormalities in the motion and phase amplitude polar maps, ultimately guiding physicians to consider more aggressive medical therapy. Additional cases outlining the importance of motion and thickening polar maps are shown in Supplemental Figures 3 and 4.

Figure 4: Case example.

Stress and rest myocardial perfusion images (left), image polar maps (middle two columns), and explainable DL results for a 69-year-old man presenting with dyspnea. Visual interpretation (SSS 0) was normal and quantitative analysis shows minimal abnormalities (white arrows and black dashed circle on stress perfusion images), suggesting low risk. The HARD MACE-DL prediction (green box) was high-risk. The attention map (right) highlights (gold color) the apical inferior and inferoseptal segments on the stress perfusion polar map as abnormal. This corresponds to the location of the small perfusion defect as well as delayed count amplitude (white dashed circle), a hallmark of dyssynchrony, and abnormal motion (black dotted circle). Image importance ranking by Shapley additive explanations (blue table, right) identified stress perfusion as the most important polar map contributing to the high-risk prediction. The patient experienced a myocardial infarction at 485 days of follow-up. DL: deep learning, SSS: summed stress score.

DISCUSSION

We assessed the prognostic accuracy of an explainable DL model compared to quantitative measures of perfusion and a standard regression model. The DL model had significantly higher prediction performance for death or MI compared to traditional nuclear cardiology perfusion variables and a logistic regression model in both internal and external testing. Importantly, the DL model also re-classified a significant proportion of patients with visually normal perfusion as high-risk and patients with visually abnormal perfusion as low-risk. The HARD MACE-DL score was well calibrated in both internal and external testing, suggesting good agreement between predicted and actual risk. These results suggest that HARD MACE-DL could be implemented clinically to provide a fully automated prediction regarding cardiovascular risk, directly from MPI images with a high degree of accuracy, while also explaining predictions to physicians.

There is growing literature supporting a role for AI in providing patient-specific risk estimations following MPI. Most previous efforts directed at providing risk prediction have been performed using classical machine learning, which relies on previously quantified variables and the specific software tools which derive such variables. Betancur et al. developed a model using clinical, imaging, and stress test variables from 2,689 patients at a single center, demonstrating improved MACE prediction compared to physician interpretation or quantitative analysis(14). Subsequent studies have shown that machine learning can improve prediction of early revascularization(12), or automatically select patients with a low risk of MACE for stress-only imaging(13). However, a major limitation to clinical implementation of machine learning has been the feasibility of collecting the required variables, with as many as 45 variables potentially requiring manual collection by clinicians or technical staff (14). DL may be more practical for clinical use because predictions can be made directly from the images. This removes the need for variable collection prior to predictions being made, increases generalizability by eliminating the need for software-specific image quantification, and avoids issues with missing values. Several previous studies have shown that DL models can predict likelihood of obstructive CAD directly (11,15–17), or classify studies as having normal or abnormal perfusion (27,28). However, this is the first study to assess the prognostic accuracy of a DL model. While providing additional information to the DL or logistic regression model may improve prediction performance, it would potentially make the model more difficult to implement clinically. Lastly, this approach facilitates methods for explainability which identify important areas of images for physicians to evaluate more fully.

For any AI method, integrating methods to explain predictions enables physicians to better evaluate their accuracy. For MPI applications, this also allows physicians to correlate predictions with potential coronary anatomy. The current model incorporates an attention map, which highlights the regions of the polar maps that contribute most to the DL model predictions. Additionally, we utilized the SHAP approach to identify which polar map (perfusion, function, phase) was most important, further guiding the physician. Directed by these explanations, physicians could assess the images more closely to identify abnormalities, or potentially imaging artifacts inappropriately identified by the DL model. While the AUC is lower than in our previous work with machine learning utilizing all clinical information in addition to imaging data (AUC 0.81) (14), the current results were obtained during external testing, for a potentially more relevant clinical outcome of hard MACE (excluding revascularization), and from imaging data alone, which makes the model easier to implement. The DL model is able to predict risk fully automatically, directly from images (independent of quantification software), and to provide visual attention maps indicating the imaging source driving the prediction. In fact, the AUC for the present model in internal 10-fold cross-validation testing (0.76) was comparable to the performance of our previous machine learning model using imaging data alone (AUC 0.78). Importantly, the current model incorporates methods to explain predictions to physicians directly on polar map images, allowing them to identify the potential clinical relevance of the highlighted image segments. We recently demonstrated that explainable DL had higher diagnostic accuracy for obstructive CAD compared to quantitative or visual analysis (17) and may improve physician interpretation of nuclear cardiology scans (29). However, studies are needed to determine if this approach influences patient management, particularly if both diagnostic and prognostic predictions are presented to the physician.

We completed additional analyses to further evaluate the potential clinical utility of HARD MACE-DL for risk prediction. Our analyses demonstrated consistently high prognostic accuracy for the DL model during external testing, including for the DL model without age and for prediction of all-cause mortality alone. While prediction performance was similar for prediction of MI alone, this analysis was limited by a low number of events. Interestingly, patients with visually normal perfusion who were classified as high-risk by the DL model had substantially higher risk compared to patients with visually abnormal perfusion but low DL risk. More than a quarter of patients would have their risk reclassified by considering DL risk predictions in addition to visual interpretation. Therefore, the model may play a clinical role in identifying patients at high cardiovascular risk in spite of visually normal perfusion (such as the case in Figure 4). These patients may benefit from more aggressive management of cardiovascular risk factors or consideration of aspirin and statin therapies. While myocardial perfusion alone may not identify patients who benefit from early invasive management(31), the combination of abnormal perfusion and function may (32). This highlights the importance of considering multiple imaging parameters when guiding patient management. The HARD MACE-DL model incorporates perfusion, motion, thickening, and phase polar maps, simplifying the task for physicians and identifying relevant abnormalities. We also demonstrated good calibration between predicted risk and actual event rate in both the internal and external populations. Calibration is critical to ensuring clinical utility since significant over- or under-estimation of risk could lead to drastic changes in patient management (32). Overall, these results suggest that HARD MACE-DL would be able to provide reliable decision support in clinical practice.

Our study has a few important limitations. Formal prospective validation in an external population was not performed; however, the model was tested using both repeated internal testing and external testing on unseen data. Both approaches conservatively reflect the performance estimate when applied to a new patient population. Age significantly contributed to the prediction performance of HARD MACE-DL but is not a modifiable risk factor. However, the model without age also outperformed quantitative analysis of perfusion, and age can be extracted from the image header. We did not consider coronary artery calcium information since it is not obtained in the majority of SPECT MPI studies. However, given the clear prognostic utility of this information(33,34), dedicated studies to determine whether it can be used to improve DL model performance are warranted. In this large, multicenter study, we assessed all-cause mortality, but not cardiovascular-specific mortality. However, determination of cardiovascular-specific mortality from administrative databases has limited accuracy(35). Lastly, we included patients who underwent early revascularization in the primary analysis. Revascularization may alter the association between perfusion and outcomes(3,19,20), but results were similar when these patients were excluded.

CONCLUSIONS

In our study, an explainable DL model had better prognostic accuracy for hard cardiovascular events compared to traditional assessment of myocardial perfusion, or standard multivariable statistical modelling utilizing quantitative variables. The DL model was well calibrated, provided robust risk stratification, and incorporates methods to explain individual predictions. These results suggest that HARD-MACE-DL could be implemented clinically to provide a fully automated prediction for risk of hard cardiovascular events.

CLINICAL PERSPECTIVES

Clinical Competencies (Medical Knowledge): An explainable DL model had better prognostic accuracy for hard cardiovascular events compared to traditional assessment of myocardial perfusion.

Clinical Competencies (Medical Knowledge): The DL model was well calibrated, suggesting good agreement between predicted and actual risk.

Translation Outlook: Studies are needed to determine if explainable DL predictions influence patient management.

Supplementary Material

ACKNOWLEDGMENTS

We would like to thank all the individuals involved in the collection, processing, and analysis of data in this multicenter registry.

SOURCE OF FUNDING

This research was supported in part by grant R01HL089765 from the National Heart, Lung, and Blood Institute/National Institutes of Health (NHLBI/NIH) (PI: Piotr Slomka). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Abbreviations

  • AI: artificial intelligence
  • AUC: area under the ROC curve
  • CAD: coronary artery disease
  • DL: deep learning
  • IQR: interquartile range
  • MACE: major adverse cardiac events
  • MI: myocardial infarction
  • MPI: myocardial perfusion imaging
  • ROC: receiver operating characteristics
  • SPECT: single photon emission computed tomography
  • SSS: summed stress score
  • TPD: total perfusion deficit

Footnotes

COMPETING INTERESTS

Drs. Berman and Slomka and Mr. Kavanagh participate in software royalties for QPS software at Cedars-Sinai Medical Center. Dr. Slomka has received research grant support from Siemens Medical Systems. Drs. Berman, Dorbala, Einstein, and Edward Miller have served as consultants for GE Healthcare. Dr. Einstein has served as a consultant to W. L. Gore & Associates. Dr. Dorbala has served as a consultant to Bracco Diagnostics; her institution has received grant support from Astellas. Dr. Di Carli has received research grant support from Spectrum Dynamics and consulting honoraria from Sanofi and GE Healthcare. Dr. Ruddy has received research grant support from GE Healthcare and Advanced Accelerator Applications. Dr. Einstein’s institution has received research support from GE Healthcare, Philips Healthcare, Toshiba America Medical Systems, Roche Medical Systems, and W. L. Gore & Associates. The remaining authors declare no competing interests.

REFERENCES

  • 1. Fihn SD, Gardin JM, Abrams J et al. 2012 ACCF/AHA/ACP/AATS/PCNA/SCAI/STS guideline for the diagnosis and management of patients with stable ischemic heart disease. Circulation 2012;126:e354–471.
  • 2. Knuuti J, Wijns W, Saraste A et al. 2019 ESC Guidelines for the diagnosis and management of chronic coronary syndromes. Eur Heart J 2020;41:407–477.
  • 3. Hachamovitch R, Hayes SW, Friedman JD, Cohen I, Berman DS. Comparison of the short-term survival benefit associated with revascularization compared with medical therapy in patients with no prior coronary artery disease. Circulation 2003;107:2900–7.
  • 4. Einstein AJ. Multiple opportunities to reduce radiation dose from myocardial perfusion imaging. Eur J Nucl Med Mol Imaging 2013;40:649–651.
  • 5. Otaki Y, Betancur J, Sharir T et al. 5-Year Prognostic Value of Quantitative Versus Visual MPI in Subtle Perfusion Defects. JACC Cardiovasc Imaging 2020;13:774–785.
  • 6. Azadani PN, Miller RJH, Sharir T et al. Impact of Early Revascularization on Major Adverse Cardiovascular Events in Relation to Automatically Quantified Ischemia. JACC Cardiovasc Imaging 2021;14:644–653.
  • 7. Miller RJH, Sharir T, Otaki Y et al. Quantitation of Poststress Change in Ventricular Morphology Improves Risk Stratification. J Nucl Med 2021;62:1582–1590.
  • 8. Miller RJH, Hu LH, Gransar H et al. Transient ischaemic dilation and post-stress wall motion abnormality increase risk in patients with less than moderate ischaemia. Eur Heart J Cardiovasc Imaging 2020;21:567–575.
  • 9. Kuronuma K, Miller RJH, Otaki Y et al. Prognostic Value of Phase Analysis for Predicting Adverse Cardiac Events Beyond Conventional Single-Photon Emission Computed Tomography Variables. Circ Cardiovasc Imaging 2021;14:e012386.
  • 10. Dey D, Slomka PJ, Leeson P et al. Artificial Intelligence in Cardiovascular Imaging: JACC State-of-the-Art Review. J Am Coll Cardiol 2019;73:1317–1335.
  • 11. Eisenberg E, Miller RJH, Hu LH et al. Diagnostic safety of a machine learning-based automatic patient selection algorithm for stress-only myocardial perfusion SPECT. J Nucl Cardiol 2021.
  • 12. Hu LH, Betancur J, Sharir T et al. Machine learning predicts per-vessel early coronary revascularization after fast myocardial perfusion SPECT. Eur Heart J Cardiovasc Imaging 2020;21:549–559.
  • 13. Hu LH, Miller RJH, Sharir T et al. Prognostically safe stress-only single-photon emission computed tomography myocardial perfusion imaging guided by machine learning. Eur Heart J Cardiovasc Imaging 2021;22:705–714.
  • 14. Betancur J, Otaki Y, Motwani M et al. Prognostic Value of Combined Clinical and Myocardial Perfusion Imaging Data Using Machine Learning. JACC Cardiovasc Imaging 2018;11:1000–1009.
  • 15. Betancur J, Commandeur F, Motlagh M et al. Deep Learning for Prediction of Obstructive Disease From Fast Myocardial Perfusion SPECT: A Multicenter Study. JACC Cardiovasc Imaging 2018;11:1654–1663.
  • 16. Betancur J, Hu LH, Commandeur F et al. Deep Learning Analysis of Upright-Supine High-Efficiency SPECT Myocardial Perfusion Imaging for Prediction of Obstructive Coronary Artery Disease. J Nucl Med 2019;60:664–670.
  • 17. Otaki Y, Singh A, Kavanagh P et al. Clinical Deployment of Explainable Artificial Intelligence of SPECT for Diagnosis of Coronary Artery Disease. JACC Cardiovasc Imaging 2021;In press.
  • 18. Slomka PJ, Betancur J, Liang JX et al. Rationale and design of the REgistry of Fast Myocardial Perfusion Imaging with NExt generation SPECT (REFINE SPECT). J Nucl Cardiol 2020;27:1010–1021.
  • 19. Miller RJH, Bonow RO, Gransar H et al. Percutaneous or surgical revascularization is associated with survival benefit in stable coronary artery disease. Eur Heart J Cardiovasc Imaging 2020;21:961–970.
  • 20. Patel KK, Spertus JA, Chan PS et al. Extent of Myocardial Ischemia on Positron Emission Tomography and Survival Benefit With Early Revascularization. J Am Coll Cardiol 2019;74:1645–1654.
  • 21. Slomka PJ, Nishina H, Berman DS et al. Automated quantification of myocardial perfusion SPECT using simplified normal limits. J Nucl Cardiol 2005;12:66–77.
  • 22. Cerqueira MD, Weissman NJ, Dilsizian V et al. Standardized myocardial segmentation and nomenclature for tomographic imaging of the heart. Circulation 2002;105:539–42.
  • 23. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. IEEE International Conf Comput Vis 2017:618–626.
  • 24. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Int Conf Neural Inform Process Systems 2017:4768–4777.
  • 25. Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics 2005;21:3301–7.
  • 26. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–45.
  • 27. Papandrianos N, Papageorgiou E. Automatic Diagnosis of Coronary Artery Disease in SPECT Myocardial Perfusion Imaging Employing Deep Learning. Applied Sciences 2021;11:6362.
  • 28. Spier N, Nekolla S, Rupprecht C, Mustafa M, Navab N, Baust M. Classification of Polar Maps from Cardiac Perfusion Imaging with Graph-Convolutional Neural Networks. Sci Rep 2019;9:7569.
  • 29. Miller R, Kuronuma K, Singh A et al. Explainable Deep Learning Improves Physician Interpretation of Myocardial Perfusion Imaging. J Nucl Med 2022;Epub ahead of print.
  • 30. Maron DJ, Hochman JS, Reynolds HR et al. Initial Invasive or Conservative Strategy for Stable Coronary Disease. N Engl J Med 2020;382:1395–1407.
  • 31. Lopes RD, Alexander KP, Stevens SR et al. Initial Invasive Versus Conservative Management of Stable Ischemic Heart Disease in Patients With a History of Heart Failure or Left Ventricular Dysfunction. Circulation 2020;142:1725–1735.
  • 32. Van Calster B, Vickers AJ. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Making 2015;35:162–9.
  • 33. Aljizeeri A, Ahmed Ahmed I, Alfaris Mousa A et al. Myocardial Flow Reserve and Coronary Calcification in Prognosis of Patients With Suspected Coronary Artery Disease. JACC Cardiovasc Imaging 2021;14:2443–2452.
  • 34. Engbers EM, Timmer JR, Ottervanger JP, Mouden M, Knollema S, Jager PL. Prognostic Value of Coronary Artery Calcium Scoring in Addition to Single-Photon Emission Computed Tomographic Myocardial Perfusion Imaging in Symptomatic Patients. Circulation: Cardiovasc Imaging 2016;9:e003966.
  • 35. Lix LM, Sobhan S, St-Jean A et al. Validity of an algorithm to identify cardiovascular deaths from administrative health records. BMC Health Serv Res 2021;21:758.
