Author manuscript; available in PMC: 2021 Oct 1.
Published in final edited form as: J Nucl Cardiol. 2018 Sep 12;27(5):1652–1664. doi: 10.1007/s12350-018-1432-3

Diagnostic Performance of an Artificial Intelligence Driven Cardiac Structured Reporting System for Myocardial Perfusion SPECT Imaging

Ernest V Garcia 1, J Larry Klein 2, Valeria Moncayo 1, C David Cooke 1,3, Christian Del’Aune 3, Russell Folks 1, Liudmila Verdes Moreiras 1, Fabio Esteves 1
PMCID: PMC6414293  NIHMSID: NIHMS1506612  PMID: 30209754

Abstract

Objectives:

To describe and validate an AI driven structured reporting system by direct comparison of automatically generated reports to results from actual clinical reports generated by nuclear cardiology experts.

Background:

Quantitative parameters extracted from MPI studies are used by our AI reporting system to generate automatically a guideline compliant structured report (sR).

Method:

A new non-parametric approach generates distribution functions of rest and stress perfusion and thickening for each of 17 LV segments, which are then transformed into certainty factors (CFs) that a segment is hypoperfused or ischemic. These CFs are then input to our set of heuristic rules used to reach diagnostic findings and impressions, which are propagated into a structured report referred to as an AI driven structured report (AIsR).

The diagnostic accuracy of the AIsR for detecting CAD and ischemia was tested in 1,000 patients who had undergone rest /stress SPECT MPI.

Results:

At the high-specificity level, in a subset of 100 patients, there were no statistical differences in the agreements between the AIsR and nine experts' impressions of CAD (p = .33) or ischemia (p = .37). This high-specificity level also yielded the highest accuracy across global and regional results in the 1000 patients. These accuracies were statistically significantly better than the other two levels (SN/SP tradeoff, high sensitivity) across all comparisons.

Conclusions:

This AI reporting system automatically generates a structured natural language report with a diagnostic performance comparable to those of experts.

Keywords: expert systems, artificial intelligence, myocardial perfusion SPECT, quantitative analysis, structured reporting

INTRODUCTION

AI methods to aid diagnosticians in making clinical image interpretation of SPECT myocardial perfusion studies have been reported. Examples include neural networks (1-4), case-based reasoning (5), support vector machines (6), machine-learning (7) and knowledge-based expert systems (8, 9). In expert systems, a knowledge base of heuristic rules is obtained from human experts capturing how they make their interpretations. Yet, to date, no one has developed automatically generated and/or validated natural language structured reports that follow society guidelines. The convergence of the high prevalence of heart disease, increased complexity of cardiac imaging techniques, the increasing amount of patient-specific clinical information and the reduced time the diagnostician has to dedicate to each patient inevitably lead to misdiagnosis and potential patient mismanagement. Hence, AI tools could assist physicians in interpreting and reporting studies at a faster rate and at the highest level of up-to-date expertise.

Here we report on the development and validation in 1000 patients of an expert system that applies its knowledge to LV perfusion and function information extracted from patients' MPI imagery and propagates its conclusions into this AI driven structured report (AIsR) (9) following society guidelines (10). Although physicians can easily modify any aspect of the AIsR, here we only evaluate the automatically generated results.

METHODS

Study Design

This is a single-center retrospective study designed to compare the diagnostic agreement between an automatically generated AIsR and the clinical rest/stress MPI report dictated by human experts. One of nine nuclear cardiology experts dictated each of these clinical reports. The primary hypothesis was that the per-patient and per-vessel diagnostic performance of the AIsR in reporting hypoperfusion (CAD) and reversibility (ischemia) is comparable (i.e., not inferior) to that of human experts' clinical reports. In a 100-patient cohort, agreement between the AIsR and the clinical report was compared with the agreement between the clinical report and a second interpretation of the same MPI studies by an independent tenth human expert (VM), who joined Emory after the last MPI study in the trial was acquired (2010) and thus was never privy to the clinical reports. The second goal was to apply the same methodology to the entire 1000-study group to determine agreement rates between the AIsR and the experts.

Study Population

One thousand consecutive MPI conventional studies used for this evaluation were obtained from our cardiac database of patients (589 men) referred to Emory University Hospital for clinically indicated attenuation-corrected rest/stress myocardial perfusion SPECT imaging between May 2008 and March 2010. Note that none of these 1,000 patients was used for the development of the method. Patients imaged with a CZT SPECT camera and/or lower doses during this period were excluded due to differences in technology and changing protocols. Emory’s institutional review board approved this research.

Clinical Data

Age, gender, body mass index and risk factors data were extracted from the patients' medical records in Emory's data warehouse (Table 1). Risk factors mined were hypertension, hyperlipidemia, diabetes mellitus, smoking history, prior myocardial infarction, and prior revascularization. Representative quantitative MPI parameters were also extracted (Table 1) to characterize the population.

Table 1.

Characteristics of the study population.

| Characteristic | Value |
|---|---|
| Sample size | 1000 |
| Age (years) | 61 ± 13 |
| Male gender | 59% (586) |
| Body mass index (kg/m²) | 29.2 ± 6.0 |
| Hypertension | 74% (741) |
| Hyperlipidemia | 87% (867) |
| Diabetes mellitus | 42% (415) |
| Smoking history | 8.7% (87) |
| Prior myocardial infarction | 11% (105) |
| Prior revascularization | 30% (304) |
| Prevalence of CAD* | 34.7% |
| Prevalence of Ischemia* | 12.0% |
| SSS^ | 2.24 ± 4.57 |
| SDS^ | 1.11 ± 2.64 |
| TID^ | 1.01 ± .13 |
| Stress LVEF^ | 64 ± 13% |
| Rest LVEF^ | 63 ± 13% |

* From clinical MPI reports
^ From ECTb4

Standard Dual-Detector SPECT

All patients underwent 8-frame ECG-gated one-day attenuation corrected (AC) low-dose rest, high-dose stress Tc-99m tetrofosmin myocardial perfusion dual-detector SPECT according to the ASNC guidelines (12). Rest-stress doses were determined based on patient’s body weight starting at <200 lbs (370 MBq rest (10 mCi), 1110 MBq stress (30 mCi)). Acquisition times were 14 minutes for rest imaging and 12 minutes for stress imaging. Conventional SPECT projections were obtained utilizing the simultaneous emission/transmission acquisition method that uses a scanning gadolinium-153 line source as the transmission source. The emission transaxial images were reconstructed with an OSEM algorithm with 4 subsets and 10 iterations and a uniform initial estimate. The scatter distribution obtained from the scatter window was used to correct both the scatter from the patient onto the photopeak window and the scatter from the patient onto the transmission energy window. Attenuation maps were reconstructed by use of a Bayesian algorithm with Butterworth filter preprocessing at 0.43 critical frequency and an order of 5.0. The attenuation map reconstruction used 30 iterations with a uniform initial estimate.

MPI reporting as reference standard

In each patient, the detection of hypoperfusion at stress and the presence of reversibility at rest for each major vascular territory reported by the AIsR were compared to those from clinical reports generated by one of nine possible nuclear cardiology experts, each with at least 5 years of experience. The clinical interpretations reported were used as the reference standard. The image interpretation for the clinical reports was performed in the routine conventional way. The diagnosticians had full use of ECTb V3.0 images and quantitative results (13) as well as all the usual clinical information requested by the interpreter. None of the nine interpreters had access to the AIsR results from ECTb V4, which was developed after 2010, nor did any of these nine participate in developing any of the heuristic rules in the program's knowledge base.

Thus, because of the differences in the approaches, the SSS, SDS global and regional values between V3 and V4 could be quite different. Disease was assigned to one or more vascular territory combinations: left anterior descending artery (LAD), left circumflex artery (LCX), and the right coronary artery (RCA).

Inter-observer variability subgroup

A subgroup consisting of the last 100 consecutive patients was extracted from the 1000 patients to determine the interobserver variability between experts. A tenth nuclear cardiology expert (VM), recruited to our institution after the last patient in the study was acquired, served as an independent reader to determine how the diagnostic variability between human experts' reports compared to the variability between the experts and the AIsR.

Image Analysis and AIsR Interpretation and Reporting

All MPI studies were reconstructed and reoriented into oblique-axis tomograms using conventional techniques according to ASNC guidelines (12). The studies were then submitted by a technologist to a well-established automatic method of extracting 3D rest and stress distributions of myocardial perfusion and function (13). The technologist reviewed the processing and manually modified the automatically determined parameters if deemed incorrect, which was done in fewer than 10% of cases and usually at the LV base.

These 3D distributions were then submitted to our iterative method of database quantification implemented in ECTb V4.0. This iterative approach determines the 0-4 score for each of the conventional 17 segments using three iterations through the rest and stress AC and non-AC perfusion and non-AC function distributions. The iterative steps were as follows: 1) determination of the certainty that a segment is abnormal, 2) assigning the score to each of the 17 segments and 3) using our expert system to modify the score consistent with all the information available for that segment which we call a smart score.

Step 1. Determining certainty of segment abnormality.

A certainty factor (CF) ranging from −1 to +1 is determined for each of the 17 LV segments (−1 = definitely no count reduction (normal), +1 = definitely count reduction, and the range from −0.2 to +0.2 indicates a finding that is equivocal or indeterminate). This CF determination of segment abnormality first calculates, for each segment, the percent probability of abnormality ($p_s$) (14), i.e., the probability that the patient's normalized perfusion distribution (relative blood flow) is lower than the normal distribution redeveloped from a previously reported group of normal low-likelihood patients (15,16). Since the relative blood flow is extracted in terms of number of counts, and these counts vary depending on the injected dose, patient size, LV size, and instrument sensitivity, the counts for each voxel $v$ in segment $s$ ($c_{vs}$) have to be normalized both by the maximal voxel count uptake ($C_{max}$) over the entire LV and by the total number of LV voxels in each segment ($V_s$). The normalized count density ($n_{vs}$) for each voxel in segment $s$ is given by:

$$n_{vs} = \frac{100\, c_{vs}}{V_s\, C_{max}}$$

The value of the cumulative distribution function over all voxels in segment $s$ for patient $pt$, denoted $n_s^{pt}$, is given by the sum of all normalized count densities:

$$n_s^{pt} = \sum n_{vs}\ \ \forall v:\quad n_s^{pt} = 0\ \ \text{for all}\ \ n_{vs} > n_{vs}\left(n_s^{pt} = 100\right)$$

Thus, for example, the value of $n_s^{pt}$ at 50% in segment 2 in Figure 1 is found by locating 50 on the x-axis, moving up to the patient's red distribution, and reading the corresponding value on the y-axis (55%); this value is the percentage of the total number of voxels in segment 2 with an $n_{vs}$ ≤ 50%. In Figure 1, the red distributions are the normalized cumulative stress count distributions for each of the 17 segments of the patient shown in the polar map. Note that the patient's distribution (red) is set to zero after it reaches 100%. This was done to increase the [$n_s^{pt} - n_s^{nl}$] difference and thus the discriminatory power of $p_s$.
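As a rough illustration of the normalization and the cumulative reading described above, the following Python sketch expresses hypothetical voxel counts as a percentage of the maximal LV count and then reports the percentage of a segment's voxels at or below a chosen normalized count. The function names and sample counts are illustrative assumptions and are not taken from ECTb. The sketch also assumes that the division by the number of voxels in the segment corresponds to expressing the voxel frequency as a fraction of the segment.

```python
def normalize_counts(segment_counts, c_max):
    """Express each voxel count as a percentage of the maximal LV voxel count."""
    return [100.0 * c / c_max for c in segment_counts]


def cumulative_percent(normalized_counts, threshold):
    """Percentage of the segment's voxels whose normalized count is <= threshold."""
    v_s = len(normalized_counts)  # number of voxels in the segment (V_s)
    at_or_below = sum(1 for n in normalized_counts if n <= threshold)
    return 100.0 * at_or_below / v_s


# Hypothetical counts for one segment; c_max is the hottest voxel in the whole LV.
counts = [120, 150, 180, 90, 210, 60, 240, 135]
c_max = 300
normalized = normalize_counts(counts, c_max)

# Read the cumulative curve at a normalized count of 50%.
print(cumulative_percent(normalized, 50.0))
```

Evaluating `cumulative_percent` over a grid of thresholds from 0 to 100 would trace out one curve of the kind plotted per segment in Figure 1.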

Figure 1. 17-Segment Results from a Patient with LCx vessel disease.

Color polar map insert (A) shows the myocardial perfusion distribution for a female patient with LCX vessel disease, with the 17-segment model and scores superimposed. The 17 plots correspond to the 17-segment model (B), with the LAD segments in the top rows, LCX in the middle, and RCA in the bottom rows. The x-axes are the normalized count values and the y-axes are the normalized voxel frequencies with those count values. The white distributions are the averaged normalized cumulative distributions from 20 female patients with low likelihood of CAD. The red distributions are the normalized cumulative count value distributions for the patient shown in the polar map. Note that red distributions to the left of the white normal ones represent increasing certainty of abnormality. Also note how well behaved the shape of each of the patient's segmental distributions is, even though each represents a small portion of the LV from just one patient.

The white distributions $n_s^{nl}$ are the cumulative distribution functions from all normal patients used to create this specific non-parametric normal database. The probability $p_s$ that a patient's tracer distribution is lower than the normal distribution is then determined for each of the 17 LV segments as:

$$p_s = 100 \sum_n \frac{\left[\, n_s^{pt} - n_s^{nl} \,\right]}{n_s^{pt}}$$

Note that $p_s$ is a function of $n_{vs}$. Also note that to determine the probability $p_s$ we sum over all available n's (i.e., all available samples of normalized count values), which is equivalent to summing over all n from 0 to 100%. These $p_s$ values are converted to CFs by a transformation from [0,100] → [−1, 1] using Shannon's information theory (17). In this approach, the CF is obtained through a transformation function between the percent probability ($p_s$) of a segment being abnormal and the uncertainty U = (1 − CF):

$$U = -\sum_i p_{s_i} \log_2 p_{s_i}$$

where i indexes the possible states, in this case two (normal and abnormal). For example, in Figure 1, for segment 6, p6 = .89 (or 89%); hence U = −(.89 log2 .89 + .11 log2 .11) = .50, and therefore CF = 1 − .50 = .50 (abnormal), consistent with this hypoperfused segment. For segment 8, on the other hand, the patient's distribution (red) is inside the normal distribution (white) and thus the CF obtained is negative, which indicates that the segment is normally perfused. This allows CFs to range from −1 to +1. CFs are calculated for each segment and for each quantitative parameter used as input to the AIsR. This is a non-parametric approach, as no assumptions are made as to the properties of the normalized count distribution (usually incorrectly approximated as Gaussian).
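The entropy-based transformation can be sketched in a few lines of Python. This is a minimal illustration, not the ECTb implementation: the function names are hypothetical, and the sign convention (positive CF at or above 50% probability of abnormality, negative below) is an assumption made here so that the output spans −1 to +1, since the text does not spell out how the sign is assigned.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a two-state system with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def certainty_factor(p_s_percent):
    """Map a percent probability of abnormality in [0, 100] to a CF in [-1, 1]."""
    p = p_s_percent / 100.0
    magnitude = 1.0 - binary_entropy(p)  # CF magnitude = 1 - U
    # ASSUMPTION: positive (abnormal) at or above 50%, negative (normal) below.
    return magnitude if p >= 0.5 else -magnitude


# Segment 6 example from the text: p = 89% gives U of about .50 and CF of about .50.
print(round(certainty_factor(89.0), 2))
```

Under this assumed convention a probability of 50% yields a CF of 0 (maximal uncertainty), which falls in the equivocal range of −0.2 to +0.2.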

Step 2. Assigning a score to each of the segments.

This step converts the CF value for each segment into a score (0-4) (Figure 2). All segments with a normal CF (CF < −.2) are given a score of 0. The score for each abnormal (CF > .2) or equivocal (−.2 < CF < .2) segment depends on two parameters: 1) the type of distribution (stress or rest perfusion; perfusion reversibility; AC vs. non-AC; supine vs. prone; stress or rest thickening; thickening reversibility) and 2) the magnitude of the parameter (% uptake for perfusion, % thickening for thickening). These CF settings were applied at three different levels (modes) of sensitivity/specificity: 1) high specificity, where an equivocal CF in the AIsR was set to normal; 2) high sensitivity, where an equivocal CF in the AIsR was set to abnormal; and 3) tradeoff sensitivity/specificity, where the lower half of the equivocal CF range (−.2 to 0) was set to normal and the upper half (0 to .2) to abnormal.
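The three modes amount to three ways of resolving an equivocal CF, which the following Python sketch makes explicit. The function and mode names are hypothetical, and the handling of values exactly at −.2, 0, and .2 is an assumption, since the text does not specify which side the boundaries fall on.

```python
EQUIVOCAL_LIMIT = 0.2  # |CF| at or below this value is treated as equivocal (assumed inclusive)


def is_abnormal(cf, mode):
    """Resolve a certainty factor to abnormal (True) or normal (False) for a given mode."""
    if cf > EQUIVOCAL_LIMIT:
        return True   # clearly abnormal in every mode
    if cf < -EQUIVOCAL_LIMIT:
        return False  # clearly normal in every mode
    # Equivocal range: the outcome depends on the selected mode.
    if mode == "high_specificity":
        return False      # equivocal is read as normal
    if mode == "high_sensitivity":
        return True       # equivocal is read as abnormal
    if mode == "tradeoff":
        return cf > 0.0   # upper half abnormal, lower half normal (0 assumed normal)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("high_specificity", "tradeoff", "high_sensitivity"):
    print(mode, is_abnormal(0.1, mode), is_abnormal(-0.1, mode))
```

Only segments whose CF lies in the equivocal band change classification between modes; clearly normal and clearly abnormal segments are unaffected.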

Figure 2. Combined slices/polar map displays of the patient with reversible lateral wall perfusion defect from figure 1.

Stress (top)/Rest (bottom) SPECT attenuation-corrected slices, rotating projections, transmission slices and 17-segment smart-scores. Note three contiguous segments in the lateral wall of the stress polar maps, each with a score of 2 (SSS = 6), corresponding to 9% of the LV hypoperfused. Also note that circles around the stress perfusion scores (insert A) signify that the original scores in Figure 1.A were modified by the expert system.

A set of scores is determined for each segment in each distribution, and these sets are then merged into one set of results for each of stress perfusion, rest perfusion, perfusion reversibility, stress thickening and rest thickening. The merger takes place such that the most normal score for each segment in each distribution is retained. For example, if the score for segment 16 in the stress perfusion distribution is a 2 for non-AC and a 0 for AC (or prone), the combined score retained is a 0.
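Because a lower score means more normal, retaining the most normal score is a per-segment minimum across the available distributions. The Python sketch below illustrates this with the segment 16 example; the function name, dictionary keys, and score values are hypothetical.

```python
def merge_most_normal(score_sets):
    """Merge several 17-segment score lists by keeping the lowest (most normal) score per segment."""
    sets = list(score_sets.values())
    n_segments = len(sets[0])
    if any(len(s) != n_segments for s in sets):
        raise ValueError("all score sets must cover the same segments")
    return [min(s[i] for s in sets) for i in range(n_segments)]


# Hypothetical stress perfusion scores (0-4) for the 17 segments.
non_ac = [0] * 17
ac = [0] * 17
non_ac[15] = 2  # segment 16 (index 15) scores 2 without attenuation correction
ac[15] = 0      # and 0 with attenuation correction

merged = merge_most_normal({"non_ac": non_ac, "ac": ac})
print(merged[15])  # the retained score for segment 16
```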

Step 3. Determining smart scores and AIsR generation.

Here all sets of scores from step 2 are used as input to our expert system. This is a Bayesian inference engine that forward chains through our MPI knowledge base of interpretation and reporting heuristic rules, similar to our previous reports (8,9) and following well-established expert system methods (18). The expert system uses these input scores to determine the certainty of the location, size, shape, and reversibility of both the perfusion defects and thickening abnormalities in order to infer the certainty of the presence and vascular location of CAD. This information is then transmitted to the AIsR in natural language text. One main difference between our present expert system and our previous one (9) is that now all information for each segment is weighted to modify each segmental score during this iteration, and the AIsR follows ASNC guidelines for reporting (11). Thus, for example, a segment that exhibits a fixed perfusion defect in the non-AC distributions is more certain to be fixed if it is also fixed in the AC distributions, and even more certain if the segment is thickening abnormally. Once all perfusion and function smart scores (Figure 2.A insert) and pertinent pre-specified data elements (e.g., LVEF, TID), along with their CF values, are determined, they are exported as a highly structured object, which is then imported by the AIsR. These exported data elements are mapped to the existing data entry fields within the AIsR. When the user begins generating the report, all of the mapped input entry fields are automatically pre-populated, including the smart-score data generated by our expert system.

All the natural language text is conditionally generated by the reporting module of the system. Briefly, take as an example the results in Figure 3 and the AIsR report in Figure 4.A. Specifically, consider the conclusion in both figures: "the apical lateral segment is completely reversible". Before reaching the report, the non-parametric statistics combined with the expert system portion of the AIsR have determined CFs for each possible state (category). In this case of apical lateral segmental reversibility, the system has determined a CF that the segment is completely reversible, another CF that it is partially reversible, another CF that it is minimally reversible and another CF that it is fixed. The natural language generator reads these states and chooses the one with the highest CF as the condition to report, in this case completely reversible.
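The selection step described above is an arg-max over the candidate states. A minimal Python sketch follows; the CF values, function name, and sentence template are invented for illustration and do not come from the AIsR.

```python
def report_reversibility(segment_name, state_cfs):
    """Return a sentence reporting the state with the highest certainty factor."""
    # max() keeps the first state encountered in case of a tie (an arbitrary choice here).
    best_state = max(state_cfs, key=state_cfs.get)
    return f"The {segment_name} segment is {best_state}."


# Hypothetical CFs for the four reversibility states of one segment.
apical_lateral = {
    "completely reversible": 0.7,
    "partially reversible": 0.2,
    "minimally reversible": -0.4,
    "fixed": -0.8,
}
print(report_reversibility("apical lateral", apical_lateral))
```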

Figure 3.

Automatically generated AIsR perfusion sub-report of patient from Figure 2. Note concordance with the oblique slices and smart-scores. All drop down arrows indicate a parameter that can be modified by the nuclear cardiology expert before it reaches the final report (not used for this validation).

Figure 4.

Findings and impressions extracted from AI Structured Report (A) and actual excerpts of the clinical report (B) for the MPI study shown in Figures 1-3. Note concordance in presence and location of hypoperfusion associated with ischemia.

Statistical analysis

All studies were classified as normal (definitely normal or probably normal) or abnormal (definitely abnormal or probably abnormal) based on whether the report described one or more stress perfusion defects. To test the primary hypothesis, we used our previously reported methodology for testing non-inferiority (15). The test for the difference between two population proportions from a single sample (19) was used to determine whether reporting agreement differed between the AIsR-expert and independent-expert comparisons. If AIsR findings are equivalent to expert findings, the expected difference between AIsR-expert agreement and independent-expert agreement is zero. The primary analysis tested the null hypothesis of equivalence of AIsR-expert agreement to independent-expert agreement (no agreement rate reduction) versus inferiority (a reduction of >0%). A 95% confidence interval (CI) for the difference between the AIsR-expert agreement rate and the independent-expert agreement rate was calculated, and the null hypothesis was rejected if the upper limit was below 0% with a corresponding one-tailed p value less than .05. Interobserver agreement between AIsR findings and expert findings for all 1000 MPI studies was measured using percent agreement (accuracy) and Cohen's kappa. McNemar's test was used to test for statistical differences in accuracy in the 1000 MPI studies between each of the three sensitivity/specificity modes. To test whether the 1000 patients and the 100-patient cohort differed in the prevalence of CAD, the prevalence of ischemia, or the AIsR agreement rate, the MedCalc chi-squared comparison of proportions was used. A p < .05 was considered significant for all comparisons.
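For readers who want to reproduce the agreement measures named above, the following Python sketch computes percent agreement and Cohen's kappa for two binary readers, plus a McNemar p value for comparing the accuracy of two modes on the same studies. It is a generic illustration with hypothetical function names and toy data. The McNemar version shown is the chi-squared form without continuity correction, which is an assumption, since the text does not state which variant was used.

```python
import math


def percent_agreement(reader_a, reader_b):
    """Percentage of studies on which two binary readers (0 = normal, 1 = abnormal) agree."""
    matches = sum(a == b for a, b in zip(reader_a, reader_b))
    return 100.0 * matches / len(reader_a)


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two binary readers: agreement corrected for chance."""
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    rate_a = sum(reader_a) / n  # fraction called abnormal by reader A
    rate_b = sum(reader_b) / n  # fraction called abnormal by reader B
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    return (observed - expected) / (1 - expected)


def mcnemar_p_value(correct_mode_1, correct_mode_2):
    """Two-sided McNemar p value (chi-squared, 1 df, no continuity correction)."""
    b = sum(1 for x, y in zip(correct_mode_1, correct_mode_2) if x and not y)
    c = sum(1 for x, y in zip(correct_mode_1, correct_mode_2) if y and not x)
    if b + c == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2.0))


# Toy data: automated report vs. expert report for ten studies.
automated = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
expert = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
print(percent_agreement(automated, expert), round(cohen_kappa(automated, expert), 3))
```

Kappa is returned on the usual −1 to 1 scale; multiplying by 100 gives values on the scale used in Table 3.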

Results

Interobserver Analysis

The human experts' reporting of the 100-patient subgroup resulted in 17 patients with CAD and 83 without. Of the 17 patients diagnosed with CAD, 9 were reported to be ischemic. The breakdown of stress hypoperfusion by vascular territory in the 17 CAD patients was as follows: 8 LAD, 10 LCX, and 5 RCA. The breakdown of reversible ischemia by vascular territory in the 9 ischemic patients was: 6 LAD, 5 LCX, and 1 RCA. The overall agreement rates, p values, agreement differences and 95% CIs for each of the validated reported categories are shown in Table 2. At the high-specificity level, there were no statistical differences between the agreement of the AIsR findings/impressions with the experts' findings/impressions and the agreement of the independent (10th) reader's findings/impressions with those of the experts reporting the same studies. The finding of no statistical difference held for the reporting of both CAD (p = .33) and ischemia (p = .37). There were statistical differences at the tradeoff sensitivity/specificity level (CAD p = .01; ischemia p = .03) and larger differences at the high-sensitivity level (CAD p < .001; ischemia p < .001). At the high-specificity level, the upper limit of the 95% CI is above 0% for all categories (i.e., the AIsR findings are not inferior to the human expert reports), whereas it is below zero in four of eight categories at the tradeoff level and in all eight categories at the high-sensitivity level.

Table 2.

Agreement between automated smart-report results and human experts at three different sensitivity/specificity modes (n=100).

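| Mode | Finding | Measure | Global | LAD | LCX | RCA |
|---|---|---|---|---|---|---|
| Hi-Specificity | CAD | %Agree: AIsR:expert | 85 | 95 | 92 | 93 |
| Hi-Specificity | CAD | %Agree: Ind:expert | 83 | 90 | 94 | 89 |
| Hi-Specificity | CAD | p value | .33 | .07 | .24 | .14 |
| Hi-Specificity | CAD | Δ agreement | .02 | .05 | −.02 | .04 |
| Hi-Specificity | CAD | 95% CI | −.07 to .11 | −.01 to .11 | −.08 to .04 | −.03 to .11 |
| Hi-Specificity | Ischemia | %Agree: AIsR:expert | 89 | 95 | 96 | 98 |
| Hi-Specificity | Ischemia | %Agree: Ind:expert | 90 | 94 | 94 | 99 |
| Hi-Specificity | Ischemia | p value | .37 | .33 | .21 | .28 |
| Hi-Specificity | Ischemia | Δ agreement | −.01 | .01 | .02 | −.01 |
| Hi-Specificity | Ischemia | 95% CI | −.07 to .05 | −.03 to .05 | −.03 to .07 | −.04 to .02 |
| Tradeoff Sn/Sp | CAD | %Agree: AIsR:expert | 74 | 82 | 77 | 89 |
| Tradeoff Sn/Sp | CAD | %Agree: Ind:expert | 83 | 90 | 94 | 89 |
| Tradeoff Sn/Sp | CAD | p value | .01 | .02 | <.01 | .5 |
| Tradeoff Sn/Sp | CAD | Δ agreement | −.09 | −.08 | −.17 | .00 |
| Tradeoff Sn/Sp | CAD | 95% CI | −.17 to −.01 | −.16 to −.003 | −.25 to −.09 | −.07 to .07 |
| Tradeoff Sn/Sp | Ischemia | %Agree: AIsR:expert | 83 | 91 | 89 | 97 |
| Tradeoff Sn/Sp | Ischemia | %Agree: Ind:expert | 90 | 94 | 94 | 99 |
| Tradeoff Sn/Sp | Ischemia | p value | .03 | .12 | .06 | .08 |
| Tradeoff Sn/Sp | Ischemia | Δ agreement | −.07 | −.03 | −.05 | −.02 |
| Tradeoff Sn/Sp | Ischemia | 95% CI | −.14 to −.0007 | −.08 to .02 | −.11 to .01 | −.05 to .007 |
| Hi-Sensitivity | CAD | %Agree: AIsR:expert | 61 | 65 | 63 | 73 |
| Hi-Sensitivity | CAD | %Agree: Ind:expert | 83 | 90 | 94 | 89 |
| Hi-Sensitivity | CAD | p value | <.001 | .03 | <.001 | .001 |
| Hi-Sensitivity | CAD | Δ agreement | −.22 | −.25 | −.31 | −.16 |
| Hi-Sensitivity | CAD | 95% CI | −.31 to −.13 | −.35 to −.15 | −.41 to −.21 | −.26 to −.06 |
| Hi-Sensitivity | Ischemia | %Agree: AIsR:expert | 64 | 76 | 72 | 88 |
| Hi-Sensitivity | Ischemia | %Agree: Ind:expert | 90 | 94 | 94 | 99 |
| Hi-Sensitivity | Ischemia | p value | <.001 | <.001 | <.001 | <.001 |
| Hi-Sensitivity | Ischemia | Δ agreement | −.26 | −.18 | −.22 | −.11 |
| Hi-Sensitivity | Ischemia | 95% CI | −.35 to −.17 | −.25 to −.10 | −.31 to −.13 | −.17 to −.05 |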

AIsR agreement with experts.

The nine human experts' reporting of the 1000-patient population resulted in 247 patients with CAD and 753 without. Of the 247 patients diagnosed with CAD, 120 were deemed ischemic. The breakdown of stress hypoperfusion by vascular territory in the 247 CAD patients was 135 LAD, 103 LCX, and 85 RCA. These included 194 patients with single-vessel disease, 169 with double-vessel disease, and 117 with triple-vessel disease. The breakdown of reversible ischemia by vascular territory in the 120 ischemic patients was 61 LAD, 63 LCX, and 28 RCA. There were no significant differences between the 100-patient cohort used to test the non-inferiority of AIsR vs. expert and the 1000-patient study group used to determine agreement rates between the AIsR and the experts. The categories tested were prevalence of CAD (347/1000 vs 27/100; p = .11), prevalence of ischemia (120/1000 vs 9/100; p = .37), agreement rate for CAD (820/1000 vs 85/100; p = .45) and agreement rate for ischemia (880/1000 vs 89/100; p = .77). All statistical comparisons were done using the AIsR's high-specificity mode.

Figure 2 depicts images and smart-scores in a female patient with reversible defects in the LCX coronary territory, with the corresponding smart-report shown in Figures 3 and 4.A. Figure 4.B shows the findings and impressions of the actual clinical report.

Figure 5 shows the AIsR-expert agreement results for the entire 1000-patient group, using the reported expert clinical read as the reference, for the three levels of sensitivity/specificity. These agreements are shown for the detection of stress-induced hypoperfusion and stress-induced ischemia. Note that for both the CAD and ischemia categories, the high-specificity level yielded the highest accuracy and specificity across global and regional results. These accuracies were statistically significantly better than those of the other two levels across all comparisons for global and regional hypoperfusion and reversibility. Table 3 shows the percent agreement and kappa values (expressed on a 0 to 100 scale) between the AIsR and the experts' impressions of CAD and ischemia in the 1000 MPI studies. These kappa values ranged from 32.3 to 51.9, corresponding to fair to moderate agreement, as might be expected given the variation in clinical reports among nine different experts.

Figure 5.

Diagnostic performance of the AI Structured Report in reporting stress-induced hypoperfusion as indicative of CAD (top row) and reversibility at rest as indicative of ischemia (bottom row). Results for the three modes, high specificity (green bars), sensitivity (SN)-specificity (SP) tradeoff (red bars) and high sensitivity (blue bars), are shown for agreement (i.e., accuracy; left column), specificity (middle column) and sensitivity (right column) (* p < .001). The labels CAD and Ischemia on the abscissa of each graph refer to global findings regardless of vascular territory.

Table 3.

Agreement, kappa and 95% CI results for the automated AIsR using High-specificity mode and the human experts reports as reference standard (n=1000).

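| Finding (Hi-Specificity mode) | Measure | Global | LAD | LCX | RCA |
|---|---|---|---|---|---|
| CAD | %Agree: AIsR:expert | 82 | 89 | 89 | 92 |
| CAD | Kappa | 48.7 | 47.7 | 51.4 | 40.3 |
| CAD | 95% CI | 42.0 to 55.4 | 38.7 to 56.7 | 42.8 to 59.9 | 27.6 to 53.0 |
| Ischemia | %Agree: AIsR:expert | 88 | 93 | 93 | 97 |
| Ischemia | Kappa | 43.6 | 32.3 | 51.9 | 36.9 |
| Ischemia | 95% CI | 34.0 to 53.2 | 16.8 to 47.9 | 40.7 to 63.1 | 14.2 to 59.3 |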

Discussion

We developed and validated the diagnostic performance of an MPI natural language reporting system that utilizes non-parametric relative perfusion and function quantification as input to our expert system to interpret the study and generate the report. This is the first study that compares automatically generated MPI natural-language reports to actual clinical reports.

Our results show that the reporting of CAD (hypoperfusion at stress) and ischemia (reversibility at rest) from our automatically generated AIsR is not statistically inferior to that of experts when a high-specificity mode is used (i.e., equivocal = normal) and the reporting of other experts is used as the reference standard. Importantly, this high-specificity mode yielded the highest accuracy in our extensive population. It should not be surprising that the AIsR agreed best with the experts in the high-specificity mode, since this reflects the trend of human image interpretation adjusting to the drop in the prevalence of abnormal studies to 25% at our institution (also in this population), similar to trends reported by others (21) and reported to be as low as 9% at other major institutions (22). These findings are also consistent with those reported from a meta-analysis of 49,000 patients demonstrating diagnostic performance for referral-bias-corrected MPI (similar to echocardiography) of 99% specificity and 38% sensitivity (from 69%, 85% uncorrected, respectively) (23).

Strength of the approach.

This is the first report showing full integration between an image analysis system and structured reporting, a critical need in modern imaging practice. Although the best agreement was obtained when the high-specificity mode was selected, this choice is easily changed to the high-sensitivity level (or tradeoff level) when the AIsR is used to report on patients from a high-risk population, such as patients with diabetes. Newly reported here is the determination and use of our 17-segment smart-scores. This novel scoring uses a non-parametric normalized count distribution combined with information theory to generate a certainty of abnormality. This certainty for each segment is modified according to all available perfusion and function information for that segment, including rest, stress, changes between stress and rest, AC and non-AC images, and prone images. Although not validated here, the diagnostician may manually change any of the scores, which in turn modifies the report if needed. Importantly, as previously reported (20), the expert system tracks all steps in generating the report as a justification, which diagnosticians may use to decide whether they agree with the findings or impressions in the report. This is an important benefit of expert systems over conventional neural net or machine learning approaches. Another benefit of the expert system approach used here is that, compared to other AI approaches, only the 40 normal patients used for database generation were needed to train the system, as most of the training comes from the cumulative experience of the experts.

Comparison of AIsR to PERFEX.

As described in the methods section, we had previously developed and validated a decision support expert system to assist nuclear cardiology physicians with the image interpretation process (8, 9). There are several differences between that system (PERFEX) and the one reported here. PERFEX divided the LV into 32 segments; AIsR uses the standard 17-segment system. PERFEX depended on Gaussian distributions and statistics to determine normality and abnormality criteria; AIsR uses non-parametric statistics. PERFEX did not use the global or regional functional information to reach its conclusions; AIsR integrates the functional information into all its conclusions. PERFEX did not use its conclusions to modify the ECTb results; AIsR uses its knowledge base and the available quantitative information to modify the original segmental scores into smart-scores. If AC was performed, PERFEX would provide one interpretation for the AC study and a separate one for the non-AC study; AIsR integrates both into one set of scores and one conclusion. If a prone study were also performed, AIsR would integrate it as well. This integration takes place by trying to mimic in the code how human experts use the information. Before the integration is done, AIsR determines segmental scores separately for each of the diagnostic categories considered: stress perfusion, rest perfusion, reversibility, and thickening. After these individual scores are determined, AIsR integrates the information in a meta-analysis module. Therefore, if an MPI study had AC, non-AC and prone studies performed, AIsR would use the most normal score for each segment. If the same segment exhibited reversibility, AIsR would then modify the score using Bayesian statistics and the strength of the information (i.e., how much reversibility was present). Similarly, if the same segment exhibited abnormal thickening, AIsR would again modify the score using the same approach as with reversibility.
Perhaps the most obvious difference between PERFEX and AIsR is that AIsR propagates its conclusions into a structured report.
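The integration order described above (most normal score first, then adjustment for reversibility and thickening) can be sketched as follows. This is a minimal sketch under stated assumptions, not the AIsR knowledge base: scores are treated as certainties of abnormality between 0 and 1, the Bayesian adjustment is written in its odds form, and the likelihood ratios standing in for the "strength of the information" are invented for illustration. All function names are hypothetical.

```python
def most_normal(certainties):
    """Return the lowest certainty of abnormality across acquisitions."""
    return min(certainties.values())


def bayes_update(prior, likelihood_ratio):
    """Update a probability with a likelihood ratio (odds form of Bayes' rule)."""
    if not 0.0 < prior < 1.0:
        return prior  # certainties of exactly 0 or 1 are left unchanged
    posterior_odds = prior / (1.0 - prior) * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def integrate_segment(perfusion, reversibility_lr=1.0, thickening_lr=1.0):
    """Combine one segment's evidence in the order described in the text."""
    score = most_normal(perfusion)                 # AC, non-AC, prone
    score = bayes_update(score, reversibility_lr)  # stronger reversibility -> larger ratio
    score = bayes_update(score, thickening_lr)     # abnormal thickening -> ratio above 1
    return score


# Hypothetical stress-perfusion certainties for one segment.
stress = {"AC": 0.2, "non-AC": 0.6, "prone": 0.4}

print(round(integrate_segment(stress), 3))
print(round(integrate_segment(stress, reversibility_lr=4.0, thickening_lr=3.0), 3))
```

A likelihood ratio of 1 leaves the score unchanged, so a segment with no reversibility and normal thickening keeps its most normal perfusion score. Ratios above 1 raise the certainty of abnormality.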

Reference standard.

Since AI systems have to be “trained” and validated with both input images and accepted output interpretations, the question of what to use as the reference standard often arises. Invasive coronary angiography or clinical outcome is often mentioned as an attractive gold standard for training and validating an MPI AI system, but this misses the point of these systems, which is to interpret studies with the same level of expertise as experts. Moreover, using invasive catheterization as a gold standard is biased by the referral pattern of abnormal MPI studies to catheterization as well as by the discrepancies in comparing physiologic results to anatomic ones. Outcome is certainly an important measure, but in MPI both coronary angiography and outcomes are confounded as gold standards by the fact that the scan interpretation (e.g. ischemia or no ischemia) has a major impact on the referral to the catheterization lab or on the clinical outcome (intervention versus observation); consequently, these gold standards are biased. Simply stated, the interpretation of the study affects the treatment, and the treatment affects the outcomes, thus biasing the outcomes as a reference standard. Using the interpretation of MPI studies by experts as the reference standard is therefore an acceptable approach that we and others have used (9, 24).

Limitations.

First, all the data used for this evaluation were obtained retrospectively from one center. Second, we had to manually extract the needed diagnostic information from the clinically dictated reports to use as the reference standard. Third, all the clinical reporting was performed by Emory experts. Although these experts were trained at different institutions, it could be argued that over time they tended to read similarly to one another and perhaps differently from readers at other institutions. Fourth, although the AIsR uses standardized reporting guidelines, we did not compare the size and severity of the hypoperfused or reversible areas between the experts and the AIsR, only whether these were present and in which vascular territory. This is in part because, when the clinical reports were generated, reporting guidelines were not being strictly applied by the experts. Fifth, we chose not to report here the clinical reporting agreements for functional variables. Although these functional parameters were used in the generation of the smart-scores, they are quantitative and straightforward in how they are usually reported, so they were not compared, for simplicity. Sixth, although we have previously integrated patients’ clinical information with their imaging results in order to improve diagnostic accuracy (25), this was not attempted here, as it would require manual input and/or EMR interfaces with hospital systems, which would currently limit the applicability of this AIsR. Seventh, the agreement in reporting between the AIsR in the high-specificity mode and our clinicians reflects the current reduced prevalence of disease (25%) in our patient referral pattern. In other settings (such as other countries) where the prevalence of disease is much higher than 25%, different results could have been obtained. This is the rationale for allowing the AIsR to switch easily between modes such as high sensitivity and sensitivity/specificity tradeoff.
Finally, although the use of AC is not a limitation but an attribute that reduces the complexity of image interpretation, results of applying our approach to a large study population without AC (or prone imaging) cannot be predicted by the present study.
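The dependence of the preferred operating mode on disease prevalence, raised in the seventh limitation, follows from simple arithmetic: overall accuracy is the prevalence-weighted average of sensitivity and specificity. The sketch below works this through. The sensitivity and specificity pairs assigned to each mode are hypothetical round numbers chosen for illustration, not values measured in this study, and the names `expected_accuracy`, `MODES`, and `best_mode` are likewise illustrative.

```python
def expected_accuracy(prevalence, sensitivity, specificity):
    """Overall accuracy as the prevalence-weighted mean of SN and SP."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Hypothetical (sensitivity, specificity) pairs for two operating modes.
MODES = {
    "high_specificity": (0.70, 0.95),
    "high_sensitivity": (0.95, 0.70),
}


def best_mode(prevalence):
    """Return the mode with the highest expected accuracy at this prevalence."""
    return max(MODES, key=lambda m: expected_accuracy(prevalence, *MODES[m]))


for prevalence in (0.25, 0.75):
    for mode, (sn, sp) in MODES.items():
        acc = expected_accuracy(prevalence, sn, sp)
        print(f"prevalence={prevalence:.2f} {mode}: accuracy={acc:.4f}")
    print("preferred:", best_mode(prevalence))
```

Under these assumed values, the high-specificity mode gives the higher accuracy at a prevalence of 25%, and the ranking reverses at 75%, which is the situation in which switching modes would be expected to help.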

New Knowledge Gained

Non-parametric statistics can be used to determine the certainty that a regional parameter of LV perfusion and/or function is abnormal. Because of the apparently reduced prevalence of CAD in populations of patients undergoing MPI, the agreement of automated diagnostic systems with experts improves when the systems are set to analyze images at high-specificity settings.

Conclusions

Automatic structured reports generated by an AI expert system from computer-assisted interpretation of rest/stress myocardial perfusion SPECT studies, when the system operates at a high-specificity level, show no statistically significant difference from the interpretations of nuclear cardiology experts and exhibit diagnostic accuracy consistent with that of experts when their clinical reports are used as the reference standard.

Supplementary Material

12350_2018_1432_MOESM1_ESM

Acknowledgments

This work was supported by the NHLBI grant number R42HL106818. We acknowledge Emory University Hospital nuclear cardiology diagnosticians for use of their clinical MPI reports as well as Archana Kudrimoti for data mining the data warehouse for the clinical data reported.

Abbreviations:

AI: artificial intelligence
AIsR: AI driven structured report
CAD: coronary artery disease
CDSS: clinical decision support system
CF: certainty factor
CI: confidence interval
ECTb: Emory Cardiac Toolbox
LAD: left anterior descending coronary artery
LCX: left circumflex coronary artery
LLK: low likelihood
LV: left ventricle
MPI: myocardial perfusion imaging
NC: nuclear cardiology
RCA: right coronary artery
TID: transient ischemic dilation
SN: sensitivity
SP: specificity
sR: structured report
SSS: sum stress score

Footnotes

Disclosures

EVG, CDC, RF and JLK receive royalties from the sale of the Emory Cardiac Toolbox and/or Smart Report described in this article. The terms of this arrangement have been reviewed and approved by Emory University in accordance with its COI practice. CDA and CDC are employees of or consultants to Syntermed.

References

1. Fujita H, Katafuchi T, Uehara T, Nishimura T. Application of neural network to computer-aided diagnosis of coronary artery disease in myocardial SPECT Bull’s-eye images. J Nucl Med. 1992; 33:272–276.
2. Porenta G, Dorffner G, Kundrat S, Petta P, Duit-Schedlmayer J, Sochor H. Automated interpretation of planar thallium-201-dipyridamole stress-redistribution scintigrams using artificial neural networks. J Nucl Med. 1994; 35:2041–2047.
3. Hamilton D, Riley PJ, Miola UJ, Amro AA. A feed forward neural network for classification of bull’s-eye myocardial perfusion images. Eur J Nucl Med. 1995; 22:108–115.
4. Lindahl D, Lanke J, Lundin A, Palmer J, Edenbrandt L. Improved classifications of myocardial bull’s-eye scintigrams with computer-based decision support system. J Nucl Med. 1999; 40:96–101.
5. Haddad M, Adlassnig KP, Porenta G. Feasibility analysis of a case-based reasoning system for automated detection of coronary heart disease from myocardial scintigrams. Artificial Intelligence in Medicine. 1997; 9:61–78.
6. Arsanjani RA, Xu Y, Dey D, Fish M, Dorbala S, Hayes S, et al. Improved Accuracy of Myocardial Perfusion SPECT for the Detection of Coronary Artery Disease Using a Support Vector Machine Algorithm. J Nucl Med. 2013; 54:549–555.
7. Arsanjani RA, Xu Y, Dey D, Vahistha V, Nakanishi R, Hayes S, et al. Improved Accuracy of Myocardial Perfusion SPECT for Detection of Coronary Artery Disease by machine learning in a large population. J Nucl Cardiol. 2013; 20:553–62.
8. Ezquerra N, Mullick R, Cooke D, Krawczynska E, Garcia E. PERFEX: An expert system for interpreting 3d myocardial perfusion. Expert Systems with Applications. 1993; 6:459–468.
9. Garcia EV, Cooke CD, Folks RD, Santana CA, Krawczynska EG, De Braal L, et al. Diagnostic Performance Of An Expert System For The Interpretation Of Myocardial Perfusion SPECT Studies. J Nucl Med. 2001; 42:1185–1191.
10. Douglas PS, Hendel RC, Cummings JE, Dent JM, Hodgson JM, Hoffmann U, et al. ACCF/ACR/AHA/ASE/ASNC/HRS/NASCI/RSNA/SAIP/SCAI/SCCT/SCMR 2008 Health Policy Statement on Structured Reporting in Cardiovascular Imaging. JACC. 2009; 53(1):76–90.
11. Tilkemeier PL, Cooke CD, Grossman GB, McCallister BD, Ward RP. Standardized reporting of myocardial perfusion and function. J Nucl Cardiol. 2009. doi: 10.1007/s12350-009-9095-8.
12. Hansen CL, Goldstein RA (Co-chairs). Myocardial perfusion and function: Single photon emission computed tomography. ASNC Guidelines for Nuclear Cardiology Procedures. J Nucl Cardiol. 2007; 14:e39–60.
13. Garcia EV, Faber TL, Cooke CD, Folks RD, Chen J, Santana C. The increasing role of quantification in nuclear cardiology: The Emory approach. J Nucl Cardiol. 2007; 14:420–32.
14. Cerqueira MD, Weissman NJ, Dilsizian V, Jacobs AK, Kaul S, Laskey WK, et al. Standardized Myocardial Segmentation and Nomenclature for Tomographic Imaging of the Heart. Circulation. 2002; 105:539–542.
15. Esteves FP, Raggi P, Folks RD, Keidar Z, Askew JW, Rispler S, et al. Novel solid-state-detector dedicated cardiac camera for fast myocardial perfusion imaging: multicenter comparison with standard dual detector cameras. J Nucl Cardiol. 2009; 16:927–34.
16. Esteves FP, Galt JR, Folks RD, Verdes L, Garcia EV. Diagnostic Performance of low-dose rest/stress Tc-99m tetrofosmin myocardial perfusion SPECT using the 530c CZT camera: Quantitative vs. visual analysis. J Nucl Cardiol. 2014; 21:158–65.
17. Shannon CE, Weaver W. The Mathematical Theory of Communication. Urbana: University of Illinois Press; 1949.
18. Shortliffe EH. Computer-Based Medical Consultations: MYCIN. Amsterdam, Netherlands: Elsevier Scientific Publishing Company; 1976. p. 264.
19. Dunn OJ. Basic statistics: a primer for the biomedical sciences. New York: John Wiley & Sons; 1977. p. 116–119.
20. Garcia EV, Taylor A, Manatunga D, Folks R. A Software Engine to Justify the Conclusions of an Expert System for Detecting Renal Obstruction on 99mTc-MAG3 Scans. J Nucl Med. 2007; 48:463–470.
21. Chhabra L, Ahlberg AW, Henzlova MJ, Duvall WL. Temporal trends of stress myocardial perfusion imaging: Influence of diabetes, gender and coronary artery disease status. Int J of Cardiology. 2016: 922–929.
22. Rozanski A, Gransar H, Hayes SW, Min J, Friedman JD, Thomson LEJ, et al. Temporal Trends in the Frequency of Inducible Myocardial Ischemia During Cardiac Stress Testing: 1991 to 2009. JACC. 2013; 10:1054–1065.
23. Ladapo JA, Blecker S, Elashoff MR, Federspiel JJ, Vieira DL, Sharma G, et al. Clinical Implications of Referral Bias in the Diagnostic Performance of Exercise Testing for Coronary Artery Disease. J Am Heart Assoc. 2013; 2(6):e000505. doi: 10.1161/JAHA.113.000505.
24. Taylor A, Hill A, Binongo J, Manatunga A, Halkar R, Dubovsky EV, Garcia EV. Evaluation of two diuresis renography decision support systems designed to determine the need for furosemide in patients with suspected obstruction. AJR. 2007; 188:1395–1402.
25. Garcia EV, Taylor A, Folks R, Manatunga D, Halkar R, Savir-Baruch B, et al. iRENEX: a clinically informed decision support system for the interpretation of 99mTc-MAG3 scans to detect renal obstruction. Eur J Nucl Med Mol Imaging. 2012; 39:1483–1491. doi: 10.1007/s00259-012-2151-7.
