JACC: Advances. 2025 Dec 24;4(12):102360. doi: 10.1016/j.jacadv.2025.102360

Deep Learning–Based Segmentation of Coronary Arteries and Stenosis Detection in X-Ray Coronary Angiography

Mitchel A Molenaar a,∗∗, Elsa Hebbo b, Jasper L Selder a, Nikoloz Shekiladze b, Pratik B Sandesara b, William J Nicholson b, Folkert W Asselbergs a,c, Syed Ahmad b, Daniel A Gold b, Shaimaa M Sakr b, Javier Oliván Bescós d, Vincent Auvray d, Martijn S van Mourik d, Alexander Haak d, Yida Zhao d, Jelle D Nieuwendijk a, Mark J Schuuring a,e,f, Berto J Bouma a, Steven AJ Chamuleau a, Niels J Verouden a
PMCID: PMC12834072  PMID: 41447280

Abstract

Background

Deep learning applications may assist in automatically detecting coronary arteries on invasive coronary angiography (ICA).

Objectives

The authors aimed to train deep learning models for the segmentation of coronary arteries and the detection of significant stenoses on ICA, conduct external validation, and compare the performance with expert variabilities.

Methods

ICA studies from Amsterdam University Medical Centers (center 1) and Emory University Hospital (center 2) were retrospectively collected. Contours of the main coronary arteries and their ≥50% stenoses were manually segmented using dedicated software. Deep learning–based models were created using data from center 1, center 2, and both centers. The performance of the models was assessed on unseen data and compared to expert variability.

Results

A total of 10,573 ICA images were used to train the models: 9,065 images from center 1 (n = 2,624 patients) and 1,508 images from center 2 (n = 456 patients). Validation was done on 186 center 1 images and 123 center 2 images. The segmentation model trained on data sets from both centers had the highest median Dice coefficient (0.86; IQR: 0.81-0.88). The stenosis detection algorithm trained on both centers achieved a detection rate of 0.67 (95% CI: 0.63-0.71), similar to expert agreement (0.65; 95% CI: 0.63-0.68). The model trained on the data with the most stenoses yielded the highest stenosis detection rate (0.67; 95% CI: 0.64-0.71). When matched for data set size and proportion of stenoses, the models from the two centers performed similarly.

Conclusions

The models achieved performance levels on par with experts in coronary artery segmentation and detection of significant stenoses in the main arteries.

Key words: coronary angiogram, coronary artery, coronary artery disease, coronary stenosis, deep learning model

Central Illustration



Coronary artery disease, the narrowing of the coronary arteries, is the most common cardiovascular disease, causing more than 9 million deaths annually.1 In patients with suspected coronary artery disease, invasive coronary angiography (ICA) is the standard diagnostic evaluation performed in intermediate- to high-risk patients.2,3 ICA accurately depicts coronary anatomy and stenosis severity, informing treatment decisions including medical therapy or revascularization.3

ICA has several limitations. The assessment of the two-dimensional ICA images is impeded by artery foreshortening, artery overlap, and low image quality.4 Stenosis severity is often assessed visually, even though numerous studies have shown that visual assessment is limited by inaccuracy and by interobserver and intraobserver variability.5, 6, 7 Visual estimation may overestimate stenosis by up to 21% compared with quantitative coronary angiography (QCA),5 which may affect decisions on additional invasive intravascular imaging and/or physiologic assessment3 and on revascularization.8 Although QCA software provides quantitative anatomical assessment of stenosis severity, it is rarely used because of tedious user interactions, including frame selection and calibration.9 A real-time, user-friendly alternative is needed to bring QCA-like analysis into the catheterization lab.10

Deep learning, a subtype of machine learning, can extract complex image features traditionally interpreted by experts.11,12 Recent studies have demonstrated that deep learning applications may assist physicians in the interpretation of ICA images. These applications mainly focused on automated segmentation (detection) of coronary arteries,10,13, 14, 15, 16, 17, 18, 19, 20 which is a prerequisite for automated stenosis assessment.9,21 However, most studies did not validate their segmentation models on data from a different institution, thus possibly introducing intrinsic selection biases of operators, x-ray systems, data storage, and annotators.10 This study aimed to develop and externally validate deep learning models for automated coronary artery segmentation and stenosis detection on ICA.

Methods

Study population and image acquisition

Studies of patients who underwent ICA or percutaneous coronary intervention were retrospectively retrieved from 2 tertiary centers (center 1: Amsterdam UMC, the Netherlands [2015-2017]; center 2: Emory University School of Medicine, Atlanta, United States [2005-2021]) (Central Illustration). Patients with prior coronary artery bypass surgery were excluded. All studies were acquired using Philips systems and extracted from the picture archiving and communication system, stored in the 512 × 512 Digital Imaging and Communications in Medicine format, anonymized, and normalized between 0 and 1.
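The normalization step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the source states only that frames were stored as 512 × 512 DICOM and normalized between 0 and 1; the min-max scheme and the function name are our assumptions, and `pixels` stands in for a raw DICOM pixel array (e.g. as loaded with pydicom).

```python
import numpy as np

def normalize_frame(pixels: np.ndarray) -> np.ndarray:
    """Min-max normalize a grayscale angiography frame to [0, 1].

    The min-max choice is an assumption; the paper only states that
    images were normalized between 0 and 1.
    """
    pixels = pixels.astype(np.float32)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:  # constant frame: avoid division by zero
        return np.zeros_like(pixels)
    return (pixels - lo) / (hi - lo)
```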

Central Illustration.


Deep Learning–Based Segmentation of Coronary Arteries and Stenosis Detection in X-ray Coronary Angiography

A total of 2,624 patients with 9,065 images from center 1 and 456 patients with 1,508 images from center 2 were selected. Each study contained multiple ICA cine runs from different views for comprehensive evaluation. Patient characteristics (age, sex, risk factors) were collected from electronic health records. The study adhered to the Declaration of Helsinki and was approved by local ethics boards with consent waived.

Data annotation

To train a deep learning–based segmentation model, a set of segmentations was created in both centers. For each ICA study, end-diastolic frames showing maximal coronary dilation were selected from multiple views. The optimal frames to assess the extent and severity of significant stenoses (stenosis degree ≥50%) were then selected. If the patient did not have any significant stenoses, the optimal frame to label the coronary anatomy was selected. Typically, 2 left and 1 right coronary runs were selected.

The left circumflex artery (LCX), left anterior descending artery (LAD), and right coronary artery (RCA) were manually annotated according to the segment model of the globally accepted SYNTAX (Synergy Between Percutaneous Coronary Intervention With Taxus and Cardiac Surgery22) score using dedicated software. Visible stenoses (≥50%), which were not impaired by low contrast or significant overlap, were marked by lines at the minimal lumen diameter, with reference lines proximal and distal. Annotation counts, segment distributions, and stenosis prevalence were recorded.

In center 1, trained medical students performed annotations under supervision of an interventional cardiologist. The annotations were randomly reviewed and corrected by 3 experienced annotators (an interventional cardiologist, a cardiologist, and a technical physician). In center 2, annotation was performed by 2 medical students who were trained and supervised by 2 interventional cardiologists.

A total of 186 images from 49 patients were randomly selected as a test set from center 1, and 123 images from 32 patients were selected from center 2. These unseen data were used for the final quantitative and qualitative analysis.

Segmentation and stenosis detection model

A model was developed that automatically outputs stenotic segments in visible main coronary arteries (Figure 1). The model first applies a deep learning segmentation network (U-Net with a ResNet-101 encoder23) that outputs the segmentation masks of the 3 main arteries (if present).10 Training details are provided in the Supplemental Appendix. Each branch’s border was extracted to compute a diameter profile, from which stenoses were detected when the diameter locally fell below the theoretical value. Stenosis width, length, and severity (QCA) were quantified similarly to the approach of Liu et al.24
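The diameter-profile step can be sketched as follows. This is an illustration under stated assumptions: the paper does not specify how the "theoretical" reference diameter is modeled, so a linear interpolation between the proximal and distal vessel diameters is used here, with the 50% cutoff mirroring the ≥50% stenosis definition.

```python
import numpy as np

def detect_stenoses(diameters: np.ndarray, threshold: float = 0.5):
    """Flag stenotic segments along a vessel diameter profile.

    `diameters` holds the lumen diameter at each centerline position.
    The reference ("theoretical") diameter is approximated by a straight
    line between the vessel's proximal and distal diameters; this
    reference model is an assumption, not the authors' implementation.
    Returns a list of (start, end) index pairs of stenotic segments.
    """
    n = len(diameters)
    reference = np.linspace(diameters[0], diameters[-1], n)
    stenotic = diameters < threshold * reference
    # group contiguous stenotic positions into (start, end) segments
    segments, start = [], None
    for i, flag in enumerate(stenotic):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, n - 1))
    return segments
```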

Figure 1.


Model Input and Output

1. An artificial intelligence–based segmentation of the main arteries is performed. 2. The artery borders are extracted and refined. 3. The corresponding diameter profile is analyzed. 4. Detected stenosed segments are reported and automatically measured.

Model training and testing

We trained several segmentation models: 1) on center 1 data only (Model 1); 2) on center 2 data only (Model 2); and 3) on data from both centers (Model 3). To ensure a more balanced comparison, a fourth model (Model 1B) was introduced, which was trained on a subset of data from center 1 matched in size to the center 2 data set. A fifth model (Model 1C) was trained on a subset of data from center 1, matched in size and proportion of images with stenosis to the center 2 data set. An overview of the training and testing processes is depicted in Figure 2. Each model was trained using a 5-fold patient-wise split, with 4 parts for training and 1 for validation in each iteration.
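A patient-wise split (all images of a patient in the same fold, so that no patient leaks between training and validation) can be sketched as follows. The round-robin assignment over shuffled patients is an illustrative choice; the authors' exact fold-assignment procedure is not described.

```python
import random
from collections import defaultdict

def patient_wise_folds(image_patient_ids, n_folds=5, seed=0):
    """Split image indices into folds so that all images of a patient
    land in the same fold (patient-wise split).

    `image_patient_ids[i]` is the patient ID of image i. The shuffled
    round-robin assignment is an illustrative strategy, not taken from
    the paper.
    """
    by_patient = defaultdict(list)
    for idx, pid in enumerate(image_patient_ids):
        by_patient[pid].append(idx)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    folds = [[] for _ in range(n_folds)]
    for k, pid in enumerate(patients):
        folds[k % n_folds].extend(by_patient[pid])
    return folds
```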

Figure 2.


Acquisition of Data Sets, Training, and Testing

Training and testing of deep learning models for coronary artery segmentation and significant stenosis detection on invasive coronary angiography using data sets from center 1 (Models 1, 1B, and 1C), center 2 (Model 2), and a combined data set from both centers (Model 3).

The models were implemented in Python using TensorFlow (v2.17) and trained for 200 epochs on an NVIDIA Titan RTX GPU.

Quantitative model evaluation

The segmentation quality was quantified using the Dice coefficient, reported for the RCA, LAD, LCX, and combined vessels. The Dice coefficient is calculated as 2TPartery/(2TPartery + FNartery + FPartery), where TPartery, FNartery, and FPartery represent true positives (pixels present in both the predicted and expert-labeled segmentations), false negatives (pixels in the expert-labeled segmentation but not in the prediction), and false positives (pixels in the prediction but not in the expert-labeled segmentation), respectively.
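The Dice formula above translates directly into a short Python function on binary masks (a minimal sketch; function and variable names are ours):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gold: np.ndarray) -> float:
    """Dice = 2*TP / (2*TP + FN + FP) on binary segmentation masks,
    following the formula in the text."""
    pred, gold = pred.astype(bool), gold.astype(bool)
    tp = np.logical_and(pred, gold).sum()   # pixels in both masks
    fp = np.logical_and(pred, ~gold).sum()  # predicted only
    fn = np.logical_and(~pred, gold).sum()  # expert-labeled only
    denom = 2 * tp + fn + fp
    return 1.0 if denom == 0 else float(2 * tp / denom)
```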

A second indicator was the algorithm’s ability to detect and quantify stenotic segments in the main coronary branches. We designed a stenosis detection accuracy metric by selecting a single focal point at the center of the narrowest segment of each stenosis identified by the expert. We then checked whether this focal point fell within the predicted stenosis segmentation. If it did, the stenosis was classified as correctly detected, and TPstenosis was incremented. If the model missed a stenosis, FNstenosis was incremented. The stenosis detection rate (sensitivity) was reported as mean values with 95% CIs and defined as the ratio of expert-labeled stenoses in the main arteries that were correctly detected by the model: TPstenosis/(TPstenosis + FNstenosis). In addition, false positives (FPstenosis) generated by the model were measured and defined as regions predicted as stenotic that did not overlap any expert-labeled stenosis focal point.
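The focal-point matching described above can be sketched as follows. This is a simplified illustration, not the authors' implementation; the data structures (per-image predicted stenosis masks and (row, col) focal points) are assumptions.

```python
import numpy as np

def stenosis_detection_counts(predicted_masks, expert_focal_points):
    """Count TP/FN per the text: each expert stenosis is represented by
    one focal point at the center of its narrowest segment, and counts
    as detected if that point lies inside the predicted stenosis mask.

    `predicted_masks` and `expert_focal_points` are parallel lists,
    one entry per image (an assumption for illustration).
    """
    tp = fn = 0
    for mask, points in zip(predicted_masks, expert_focal_points):
        for r, c in points:
            if mask[r, c]:
                tp += 1  # focal point inside predicted stenosis
            else:
                fn += 1  # stenosis missed by the model
    rate = tp / (tp + fn) if tp + fn else float("nan")
    return tp, fn, rate
```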

Model performance was also evaluated at the image level, by calculating sensitivity, specificity, false positive rate, positive predictive value, negative predictive value, accuracy, balanced accuracy, F1 score, and FPstenosis per image. A true positive was defined as the detection of at least one stenosis in a main artery present in the image, regardless of whether it corresponded to the exact same stenosis. These metrics are described in the Supplemental Appendix.
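The per-image metrics listed above follow the conventional confusion-matrix definitions (the paper's exact formulas are in its Supplemental Appendix); a compact sketch:

```python
def image_level_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics for the per-image evaluation;
    these are the conventional definitions, assumed to match the
    Supplemental Appendix."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    total = tp + fp + fn + tn
    return {
        "sensitivity": sens,
        "specificity": spec,
        "false_positive_rate": 1 - spec,
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
        "npv": tn / (tn + fn) if tn + fn else 0.0,
        "accuracy": (tp + tn) / total,
        "balanced_accuracy": (sens + spec) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }
```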

After training each model on different splits, the models were applied to the test sets (49 center 1 patients, 32 center 2 patients) to evaluate the segmentation performance and the stenosis detection rate. Bootstrap samples with replacement were created, and the models were evaluated on these samples. The segmentation performance and stenosis detection rates were averaged to obtain robust mean estimates. The 95% CIs for the stenosis detection rate were estimated by applying one of the 5 split-wise trained models to the bootstrap data set. To avoid potential bias from shared training data, the 95% CIs were not calculated for the combined set of all split-wise trained models.
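Bootstrap resampling with replacement, as used above, can be sketched for the stenosis detection rate. The percentile method, the number of resamples, and resampling at the per-stenosis level are our assumptions for illustration.

```python
import random

def bootstrap_ci(detections, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for the stenosis detection rate.

    `detections` is a list of 0/1 flags, one per expert-labeled
    stenosis (1 = detected by the model). Resampling with replacement
    follows the paper; n_boot and the percentile method are assumptions.
    """
    rng = random.Random(seed)
    n = len(detections)
    rates = []
    for _ in range(n_boot):
        sample = [detections[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(sample) / n)
    rates.sort()
    lo = rates[int(0.025 * n_boot)]
    hi = rates[int(0.975 * n_boot) - 1]
    return lo, hi
```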

To further evaluate the models' performance on data of an external center, the models were applied to the test data set of the ARCADE (Automatic Region-based Coronary Artery Disease Diagnostics using X-ray angiography images) data set.24 This publicly available data set consists of 300 annotated ICA images, including coronary artery segmentations and stenoses. The process for applying our models to the ARCADE data set is outlined in the Supplemental Appendix.

To assess performance with varying data sizes, models were retrained on smaller subsets.

Quantitative interexpert evaluation

To assess interexpert variability, one expert from each center reannotated the stenoses in all test images. As a result, each test image had 3 sets of expert annotations: 1) the original annotation from the acquisition center (comprising both gold standard arteries and gold standard stenotic regions); 2) an additional annotation from center 1 (stenoses only); and 3) an additional annotation from center 2 (stenoses only).

Stenosis detection metrics were computed for the expert annotations in the same manner as for the model predictions. Each annotation served as a reference, and the number of stenoses located within the stenotic area marked by another expert was counted, updating TPstenosis, FNstenosis, and FPstenosis counts accordingly. These metrics were then used to estimate the variability among expert observers (interexpert), both within the same center (intracenter) and between centers (intercenter). Agreement rates were calculated as the percentage of stenoses on which 2 experts agreed. Interexpert agreement was also assessed on a per-image basis, defined as whether both experts identified at least one stenosis in the main arteries of an image. Interexpert agreement on an image basis was quantified using Fleiss’ kappa and interpreted according to standard thresholds.25
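Fleiss' kappa for the per-image agreement among the 3 experts follows the standard formula; a minimal sketch (the tabulation into per-item category counts is our assumption):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of raters over N items.

    `ratings[i][j]` is the number of raters assigning item i to
    category j (here: stenosis present/absent per image, 3 experts).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # overall category proportions
    p = [sum(row[j] for row in ratings) / (n_items * n_raters)
         for j in range(n_cats)]
    # per-item observed agreement
    P_i = [(sum(c * c for c in row) - n_raters) /
           (n_raters * (n_raters - 1)) for row in ratings]
    P_bar = sum(P_i) / n_items          # mean observed agreement
    P_e = sum(x * x for x in p)         # chance agreement
    return (P_bar - P_e) / (1 - P_e)
```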

Qualitative model evaluation

Qualitative assessment of Model 3 segmentations was performed on 55 angiograms from 16 patients by 2 interventional cardiologists (one per center). Segmentation contours were reviewed alongside ICA images on the same monitor. Quality assessment focused on the correctness and accuracy of stenosis segmentation and was rated using a predefined scoring system (Supplemental Table 1).

Statistical analysis

Patient characteristics and data from both centers were described. Continuous variables were presented as mean ± SD or mean with 95% CI if normally distributed, and as median with IQR if not. Continuous variables were compared using the independent-samples t-test. Categorical variables and annotation statistics were expressed as frequencies and percentages and compared using the chi-square test.

Normality of the segmentation quality data was assessed using the Shapiro-Wilk test. Based on this test, model comparisons used the paired Student’s t-test or Wilcoxon signed rank test for segmentation quality. McNemar’s test on paired binary stenosis detection results was used to assess differences in stenosis detection rates between models. A P value <0.05 was considered statistically significant.
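McNemar's test on paired binary detection results can be sketched with the exact binomial form; the paper does not state whether the exact or chi-square version was used, so this is an assumption.

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar test on discordant pair counts:
    b = stenoses detected by model A but missed by model B,
    c = the reverse. The exact (binomial) variant is an assumption;
    the paper only names McNemar's test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # two-sided p: double the one-sided binomial tail, capped at 1
    p = 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(p, 1.0)
```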

The study followed the CLAIM (Checklist for Artificial intelligence in Medical Imaging)26 checklist reporting (Supplemental Appendix).

Results

Study population

Center 1 patients (n = 2,624) had a mean age of 65 ± 12 years (66% male), and center 2 patients (n = 456) had a mean age of 56 ± 14 years (61% male). Cardiovascular risk factors were more frequent in patients of center 2 (Table 1). Significant stenoses were present in 5,572 (61%) of the center 1 images, compared with 102 (7%) of the center 2 images.

Table 1.

Baseline Characteristics of Patients in the Training Data Set

Center 1 (n = 2,624) Center 2 (n = 456) P Value
Patient characteristics
 Age (y), mean (SD) 64.6 ± 11.8 56.1 ± 13.7 <0.001
 Male, n (%) 1,719 (66.1) 307 (61.4) 0.010
 Smoking, n (%) 809 (31.7) 180 (37.1) 0.022
 Hypertension, n (%) 1,443 (56.5) 362 (72.5) <0.001
 Diabetes mellitus, n (%) 534 (20.7) 149 (29.9) <0.001
 Dyslipidemia, n (%) 690 (27.3) 252 (50.5) <0.001
Annotation data
 Annotated images, n 9,065 1,508
 Left anterior descending, n 5,618 983
 Left circumflex artery, n 5,618 969
 Right coronary artery, n 2,738 482
 Significant stenosis (≥50%), n 5,572 102

Testing data

Annotated test sets included 186 images from 49 patients in center 1 and 123 images from 32 patients in center 2 (Table 2). The total number of expert stenosis annotations was largely consistent across centers, particularly for the main arteries in quantifiable regions (Supplemental Figure 1).

Table 2.

Number of Images and Patients Annotated by the Two Centers

Number of Patients Number of Images
Center 1: training and validation Model 1 2,624 9,065
Center 1: training and validation Model 1B 538 1,504
Center 1: training and validation Model 1C 546 1,508
Center 2: training and validation Model 2 456 1,508
Center 1: test data 49 186
Center 2: test data 32 123

Model 1B was trained on a subset of data from center 1 of the same size as the training set from center 2. Model 1C was trained on a subset of data from center 1, matched in size and number of images containing stenoses to the center 2 training set.

Segmentation model

Illustrative examples of the segmentation models are shown in Supplemental Figure 2. The performance of the segmentation models on the test sets is outlined in Figure 3 and Supplemental Table 2. Across both centers, Model 3 achieved the highest median Dice coefficient on the main arteries (0.86; IQR: 0.81-0.88), closely followed by Model 1 (0.85; IQR: 0.81-0.88). On center 1 data, Models 1 and 3 performed best (both median Dice 0.86), while on center 2 data, Models 1, 2, and 3 performed best (median Dice 0.84 each). Testing a model on data from another center reduced the mean Dice by 0.02 (Model 1), 0.04 (Models 1B and 1C), and 0.01 (Model 2).

Figure 3.


Evaluation of Segmentation Models on Unseen Test Data

The segmentation models were evaluated on all test data (top), center 1 test data (middle), and center 2 test data (bottom) using the Dice coefficient. LAD = left anterior descending artery; LCX = left circumflex artery; RCA = right coronary artery.

The segmentation accuracy varied by artery: RCA was highest (median Dice Model 3 0.91; IQR: 0.87-0.94), followed by LAD (median Dice Model 3 0.88; IQR: 0.82-0.91) and LCX artery (median Dice Model 3 0.76; IQR: 0.62-0.87), as shown in Supplemental Table 2. Similar results were observed on the ARCADE data set (Supplemental Table 3), with Model 3 achieving the highest median Dice for RCA (0.90; IQR: 0.87-0.92), followed by LAD (0.87; IQR: 0.83-0.90) and LCX (0.83; IQR: 0.76-0.87). On the ARCADE data set, Models 1 and 3 achieved the highest segmentation performance, whereas Models 1B, 1C, and 2 showed reduced performance, particularly for the LAD and LCX arteries.

Segmentation performance declined from Model 1 to Model 1B (data size matched to Model 2) and 1C (data size and stenoses matched to Model 2) across all 3 testing data sets (Figure 3). Model 1 had higher mean Dice than Model 2 (+0.04 on main arteries) on all test data, while differences were minimal after equalizing training sizes (1B: +0.01, 1C: 0) (Supplemental Tables 2 and 4).

Segmentation performance improved with training data set size, reaching a plateau at approximately 4,000 samples for the LAD/LCX and 2,500 for RCA (Supplemental Figure 3). Beyond these points, further performance gains were minimal.

Stenosis detection

On a per-lesion basis, the stenosis detection rate of Model 3 was 0.67 (95% CI: 0.63-0.71), comparable to Model 1 (0.67; P = 0.76; Figure 4, Table 3, Supplemental Table 5) and interexpert agreement (0.65; 95% CI: 0.63-0.68). Intercenter and intracenter agreement rates were 0.65 (95% CI: 0.62-0.68) and 0.66 (95% CI: 0.62-0.71), respectively. Smaller training sets reduced detection rates, with Model 2 achieving a lower detection rate (0.41; 95% CI: 0.37-0.46) compared with Model 1 (0.67; 95% CI: 0.64-0.71; P < 0.001) and Model 1B (0.61; 95% CI: 0.57-0.65; P < 0.003). Model 1C, trained with the same data set size and stenosis rate as Model 2, showed a comparable detection rate (Model 1C: 0.48; 95% CI: 0.44-0.52; P = 0.10) to Model 2.

Figure 4.


Evaluation of Stenosis Detection Rate on Unseen Test Data

The stenosis detection rate (true positive rate/sensitivity) of the different models was computed for quantifiable stenoses in the main arteries. Interexpert, intracenter, and intercenter agreement were determined by treating each annotator in turn as the reference standard.

Table 3.

Stenosis Detection Rates for Quantifiable Stenoses in the Main Arteries on the Test Set, With 95% CIs

Model Number of Stenoses TP FN Stenosis Detection Rate
All test data
 Model 1 657 443 214 0.67 (0.64-0.71)
 Model 1B 657 389 268 0.61 (0.57-0.65)
 Model 1C 657 304 353 0.48 (0.44-0.52)
 Model 2 657 260 397 0.41 (0.37-0.46)
 Model 3 657 435 222 0.67 (0.63-0.71)
Test set of center 1
 Model 1 507 327 180 0.64 (0.59-0.69)
 Model 1B 507 282 225 0.58 (0.53-0.63)
 Model 1C 507 212 295 0.44 (0.39-0.49)
 Model 2 507 175 332 0.36 (0.32-0.41)
 Model 3 507 326 181 0.66 (0.62-0.71)
Test set of center 2
 Model 1 150 116 34 0.77 (0.70-0.84)
 Model 1B 150 107 43 0.70 (0.62-0.78)
 Model 1C 150 92 58 0.60 (0.51-0.68)
 Model 2 150 85 65 0.58 (0.50-0.66)
 Model 3 150 109 41 0.70 (0.63-0.78)

Each stenosis annotated by a reviewer was evaluated to determine whether the model detected it. Stenosis detection rate (sensitivity) is reported as mean (95% CI).

FN = false negative; TP = true positive.

Model performance varied by training and testing center. The mean detection rate difference between Model 1B (0.61) and Model 2 (0.41) was 0.20 on the complete test set, 0.22 on the center 1 test set (0.58 vs 0.36), and 0.12 on the center 2 test set (0.70 vs 0.58; Table 3). Models trained on center 1 data (Models 1, 1B, 1C) achieved higher stenosis detection rates on the center 2 test set, whereas Model 2 showed a lower performance on the center 1 test set. Detection rates increased for stenoses with QCA ≥0.5 (Model 3: 0.74; 95% CI: 0.70-0.78) and QCA >0.7 (Model 3: 0.76; 95% CI: 0.70-0.82), as shown in Supplemental Figure 4.

In the coronary segments, mean stenosis detection rates were highest for the RCA (0.49-0.88), followed by the LAD (0.46-0.75) and LCX (0.18-0.61; Table 4). LCX detection rates were particularly low for Model 1C (0.27; 95% CI: 0.19-0.34) and Model 2 (0.18; 95% CI: 0.12-0.25), whereas Model 1B (0.54; 95% CI: 0.46-0.63) achieved nearly double that of Model 1C. Detection rates were lower in distal segments than in mid or proximal segments (Supplemental Table 6).

Table 4.

Stenosis Detection Rates on the Test Set for Quantifiable Significant Stenoses (≥50% Narrowed) in the Main Coronary Artery Segments (RCA, LAD, LCX)

Number of RCA Stenoses Stenosis Detection Rate RCA Number of LAD Stenoses Stenosis Detection Rate LAD Number of LCX Stenoses Stenosis Detection Rate LCX
Model 1 161 0.88 (0.82-0.93) 223 0.75 (0.69-0.81) 140 0.61 (0.53-0.69)
Model 1B 161 0.77 (0.71-0.84) 223 0.63 (0.57-0.69) 140 0.54 (0.46-0.63)
Model 1C 161 0.49 (0.42-0.57) 223 0.58 (0.52-0.65) 140 0.27 (0.19-0.34)
Model 2 161 0.60 (0.52-0.67) 223 0.46 (0.40-0.52) 140 0.18 (0.12-0.25)
Model 3 161 0.81 (0.75-0.88) 223 0.73 (0.67-0.78) 140 0.57 (0.48-0.65)

The Table shows the number of stenoses per segment (≥50% narrowed) and the corresponding detection rates (mean; 95% CI) for each model on the total test set.

LAD = left anterior descending artery; LCX = left circumflex artery; RCA = right coronary artery.

As shown in Figure 5, false positives per image were slightly higher for Models 1 (0.44; 95% CI: 0.39-0.50), 1B (0.49; 95% CI: 0.43-0.54), 1C (0.38; 95% CI: 0.32-0.43), and 3 (0.43; 95% CI: 0.38-0.48) compared with Model 2 (0.34; 95% CI: 0.30-0.39). The false positive rate of Model 2 was similar to expert performance (0.33; 95% CI: 0.25-0.39), whereas the other models produced slightly higher rates.

Figure 5.


Evaluation of Stenosis Prediction Performance on Unseen Test Data

Stenosis detection rate (true positive rate/sensitivity) and false positives per image were calculated for the 5 tested models and compared with expert variability. Expert false positives were estimated as the number of stenoses without agreement between reviewers divided by the total number of images. FP = false positive.

Application of the models to the ARCADE data set yielded results comparable to those on the full test set, with the highest stenosis detection rates observed for Models 1, 1B, and 3 (all 0.64; 95% CI: 0.59-0.70), followed by Model 2 (0.49; 95% CI: 0.45-0.54) and Model 1C (0.45; 95% CI: 0.39-0.50) (Supplemental Table 7). False positive rates per image were slightly lower overall, with Model 3 at 0.31 (95% CI: 0.25-0.37). Model 2 had the lowest rate (0.26; 95% CI: 0.21-0.32), followed by Model 1C (0.28; 95% CI: 0.22-0.34).

Per-image sensitivity was highest for Model 3 (0.75; 95% CI: 0.71-0.79) and Model 1 (0.74; 95% CI: 0.70-0.78) (Supplemental Table 8), while specificity was modest (0.45-0.55). Interexpert agreement per image was 0.81, with full agreement among all 3 experts in 0.71 of cases (Fleiss’ kappa was 0.61, indicating substantial agreement).

Qualitative analysis

The stenosis segmentation contours generated by Model 3 were qualitatively evaluated using 55 angiograms from 16 patients. Experts from center 1 and center 2 assessed that 33 (72%) and 40 (74%) of 48 stenoses in the main arteries were detected by the model, respectively (Table 5, Supplemental Table 9). The 2 experts agreed on stenosis significance in 73% of the detected cases. Stenosis length measurements were classified as accurate or with minor error in 76% of cases (25/33) by expert 1 and 95% (38/40) by expert 2. Minimal stenosis diameters were classified as accurate or with minor error in 82% of cases (27/33) by expert 1 and 90% (36/40) by expert 2.

Table 5.

Qualitative Analysis of 55 Coronary Angiograms by Two Interventional Cardiologists

Expert 1 (Center 1) Expert 2 (Center 2)
Stenosis Detection
 True positive stenosis 33 40
 False positive stenosis 15 8
 False negative stenosis 13 14
 Stenosis detection rate/True positive rate (TP/[TP + FN]) 0.72 0.74
 Positive predictive value (TP/[TP + FP]) 0.69 0.83
 F1 score (2 × TP/[2 × TP + FP + FN]) 0.70 0.78
Stenosis characteristics
 Percentage of stenosis with precise length or minimal error 76% 95%
 Percentage of stenosis with precise diameter or minimal error 82% 90%

FP = false positive; other abbreviations as in Table 3.

Discussion

This study evaluated the performance of deep learning models for coronary artery segmentation and the detection of significant stenoses on ICA. We demonstrated that 1) segmentation performance decreases when models are tested on data from other centers; 2) model performance is influenced not only by the quantity of training data but also by its characteristics, such as the proportion of cases with significant stenoses; and 3) when trained on a large and representative data set, the stenosis detection models perform on par with experts.

Comparison with other studies

A reduction in performance when testing a model at another center is not uncommon, but it has not been explored as extensively as in this study by reciprocal testing across centers. Most previous studies on automated coronary angiography analysis performed validation on a subset of the data that was not used for training, a data set often annotated within the same center by the same annotators. Du et al21 trained a deep learning model, called DeepDiscern, to automatically segment the coronary arteries and detect lesions on a large data set (segmentation model: 12,323 images; lesion detection: 6,239 images). On an unseen test set, the segmentation algorithm achieved an accuracy of 0.98, a sensitivity of 0.85, and an average lesion detection rate of 0.90. Although DeepDiscern showed good performance, the model was not evaluated on data from another center.21 When studies do perform external validation, they typically validate their models on one data set, which is often small compared to the data set the model was trained on. Yang et al10 trained a deep learning model on 3,302 angiographic images (annotated by 2 experts) and demonstrated a slight reduction in segmentation performance when validated in an external center (181 images), with the mean Dice coefficient decreasing from 0.92 to 0.90. In our study, the decrease in segmentation performance for Model 2 was minimal (−0.01), while it was larger for Model 1B (−0.04) and Model 1C (−0.04). The ICA images of center 1 were annotated by 10 experts, which may have led to inconsistencies in the annotations (eg, varying delineations of arteries and distal artery segmentation) and thereby affected the model's performance. For example, in our results, there was considerable variability in the performance of the models trained for segmenting the LCX. This might be explained by variability in how the annotators drew the distal LCX, so the model may not have learned a consistent representation of the distal LCX.
These findings are in line with the study of Yang et al,10 which also reported a noticeable decrease in LCX segmentation performance when models were tested on data from an external center. Consistent with our results, the RCA segmentation models in their study remained accurate (Dice = 0.88-0.93), which may be explained by the relatively simple appearance of the RCA and the consistency with which it is segmented.

Deep learning for coronary angiography analysis

Several other studies have focused on various components of the pipeline required for the automated analysis of coronary angiography. The following components have been described: video and frame selection,9,27,28 anatomy localization or segmentation,9,10,21 stenosis detection,9,21,29 and stenosis assessment.9 In recent years, deep learning models have played an increasingly important role in automating these components of coronary angiography analysis. The deep learning applications for segmentation used in this and previous studies offer advantages over other methods, such as tracking-based,30 model-based,31 or filter-based techniques, in terms of processing steps and accuracy.31

Recently, Avram et al9 developed a pipeline, called CathAI, which addresses all components of automated coronary angiography interpretation using a series of algorithms. However, the segmentation algorithm showed suboptimal performance for identifying coronary artery segments and was only able to detect the coronary artery segments as bounding boxes. Furthermore, the stenosis localization was poor, with an average precision of 14% for the model trained on the left and right coronary arteries, and 26% for the model trained on the RCA. A follow-up to CathAI, DeepCoro,32 further improved the identification of coronary artery segments by refining the arterial delineation and employing video-based deep learning algorithms that process sequences of frames. DeepCoro was evaluated on 11 specific coronary segments of 200 images and achieved an average Dice coefficient of 0.74 for coronary artery segmentation, which was lower than the values in our study. Larger training sets in our study (center 1: 9,065 images, center 2: 1,508) and a segmentation method with focus on main arteries (not on segments) may account for the difference in performance. The DeepCoro stenosis detection algorithm, trained on 1,335 videos including 300 severe stenoses, demonstrated a detection rate (sensitivity) for ≥70% stenosis of 0.67 for the image-based model and 0.73 for the video-based model when applied to 333 videos containing 71 severe stenoses. For severe stenosis (>70% narrowing) in this study (Supplemental Figure 4), the detection rates were comparable to those in Model 1 (0.81), Model 1B (0.72), and Model 3 (0.76), but not to Model 1C (0.55) and Model 2 (0.42), which were trained on data sets with a low proportion of stenoses (7%).

Consistent with our test set, application of the models to the ARCADE data set showed that Model 1C and Model 2 had lower stenosis detection rates than models trained on data sets with more stenoses. These findings suggest that training a model on a data set of substantial size and with a substantial proportion of stenoses is important for achieving good performance at other centers. In addition, combining data from multiple centers (Model 3) appeared to further improve performance compared with using data from a single center. These factors are important to consider when training and testing such models, especially as further work progresses toward automated QCA, noninvasive fractional flow reserve estimation,33 and potentially automated SYNTAX scoring.11,21

Previous studies have excluded patients for a wide range of reasons, including chronic total occlusion,9,34 percutaneous coronary intervention,9 severe artery overlap,34 diffuse stenosis,18 or the absence of a stenosis,10,34 or have focused on training models exclusively for the right coronary artery.35 These factors complicate comparisons between studies and hinder the application of these models to real-world patients. Another important factor is that studies often lack validation in an external center,21,29,36,37 which is critical because model performance can vary across centers, as demonstrated in this study. Models trained on center 2 (Model 2) showed lower performance on the center 1 test set, which may reflect differences in patient characteristics or the lower number of stenoses in the center 2 training data set. The higher performance of models trained on center 1 data on the center 2 test set may be due to the larger and more diverse range of stenosis patterns captured in the center 1 training data. If models are evaluated in another center, it is important to use data sets that have been annotated in a similar manner. For example, out-of-plane mid and distal segments of the LAD were not all annotated in the ARCADE data set,24 which required modifications to the evaluation process, as detailed in the Supplemental Appendix.

Qualitative and quantitative analysis

When validation is performed, it is often done against a single expert, which does not account for the interobserver variability inherent in interpreting coronary angiography images.6,8 A strength of this study is that the stenosis detection models were evaluated against expert agreement. We observed that the number of false positive stenosis predictions per image was slightly higher for these models than for the experts, except for Model 2. The stenosis detection rates of Models 1 and 3 were comparable to expert agreement. These results suggest that, despite minor increases in false positives, the models (in particular Models 1 and 3) achieve detection performance on par with expert agreement. These findings highlight the potential utility of these algorithms in clinical settings.
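
The detection rate and false positives per image discussed above can be summarized from lesion-level matching results. The sketch below is illustrative only; the function name and the per-image (true positive, false positive, false negative) tuple layout are our own assumptions, not the study's actual evaluation code:

```python
def detection_metrics(results):
    """Summarize lesion-level stenosis detection against expert annotations.

    `results` holds one (tp, fp, fn) tuple per image: predicted stenoses
    matched to expert-annotated stenoses (tp), unmatched predictions (fp),
    and expert-annotated stenoses the model missed (fn).
    """
    tp = sum(r[0] for r in results)
    fp = sum(r[1] for r in results)
    fn = sum(r[2] for r in results)
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0  # i.e., sensitivity
    false_positives_per_image = fp / len(results)
    return detection_rate, false_positives_per_image
```

Under this convention, a detection rate of 0.67 means that 67% of expert-annotated stenoses were matched by a model prediction, while the false positives per image capture the over-calling the models showed relative to the experts.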

Differences between annotators in the threshold at which a stenosis is considered significant may explain the relatively low agreement between observers. Only preselected images were interpreted, and the full runs were not provided to the experts. This approach has the advantage of presenting the experts with the precise set of information that the algorithm utilizes. However, it makes it difficult to robustly assess stenosis significance, whether there is a single stenosis or multiple separate stenoses, and whether a stenosis is quantifiable. Nevertheless, the agreements observed in the quantitative (65%) and qualitative (73%) analyses of this study align with previous studies, which found expert agreement on stenosis significance in 65%6 and 81%8 of cases. These numbers emphasize the importance of developing models that reduce variability in clinical interpretation. The application of these models as independent observers, assisting physicians in the interpretation of ICA images, could improve the reliability and accuracy of stenosis assessments. This may contribute to better-informed clinical decisions and potentially enhance patient outcomes.32
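
Chance-corrected statistics such as Fleiss' kappa25 complement the raw percent agreement reported here when more than 2 raters judge a categorical outcome such as stenosis significance. A minimal illustrative sketch, assuming the ratings are arranged as a subjects × categories matrix of rater counts (an assumed data layout, not this study's analysis code):

```python
import numpy as np

def fleiss_kappa(ratings) -> float:
    """Fleiss' kappa for a subjects x categories count matrix.

    ratings[i, j] = number of raters assigning subject i to category j;
    every row must sum to the same number of raters.
    """
    ratings = np.asarray(ratings, dtype=float)
    n_subjects = ratings.shape[0]
    n_raters = ratings[0].sum()
    p_j = ratings.sum(axis=0) / (n_subjects * n_raters)   # category proportions
    P_i = (np.square(ratings).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar = P_i.mean()                                    # observed agreement
    P_e = np.square(p_j).sum()                            # chance agreement
    return float((P_bar - P_e) / (1.0 - P_e))
```

Unlike percent agreement, kappa returns 0 when observed agreement equals what chance alone would produce, which is why it is the customary companion statistic for multi-rater reliability analyses.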

Study limitations

This study has several limitations. First, the models were restricted to the main coronary arteries; smaller branches were not considered. Second, qualitative assessment provided additional information about the stenosis detection algorithm trained on data of both centers. The lengths and diameters of the stenoses were assessed as accurate or as having only minor errors in most cases, which suggests that the algorithm can help provide physicians with QCA information. Third, the speed of the algorithms and the time required from physicians are increasingly important when implementing novel digital solutions,38 especially when algorithms need to operate in real time. In this study, the processing of single images was fast, typically ranging from 2 to 7 seconds. These values were in line with the DeepDiscern study, which recorded 1.3 seconds for segment recognition and 0.7 seconds for lesion detection.21 However, analyzing multiple frames, as required for processing ICA videos, will take more time, as demonstrated by the DeepCoro pipeline, which required an average of 63 seconds to generate predictions for a video.32 Fourth, training and testing were performed on data from 2 centers, and the models were further evaluated on the ARCADE data set.24 Including data from additional centers in model training is expected to further improve performance. These models may enable a more standardized and reliable assessment of stenoses by serving as an independent observer during ICA interpretation and could enhance diagnostic accuracy. Fifth, all images were acquired using Philips equipment, which may limit the generalizability of the findings to centers using imaging systems from other vendors. However, we evaluated the models on the ARCADE data set, which includes images acquired using both Philips and Siemens imaging systems, and found similar stenosis detection rates. Sixth, video-based models have the potential to improve performance over the image-based models used in this study. The temporal information that video-based models use is necessary to understand the dynamics of the cardiovascular system, mimicking how physicians analyze ICA videos.32

Conclusions

Among the multiple deep learning models trained, 2 demonstrated performance on par with experts in coronary artery segmentation and in detecting significant stenoses in the main arteries. The findings of this study demonstrate that data set size, stenosis prevalence, and the choice of training and testing centers are important factors to consider when developing models for automated coronary angiography analysis.

Perspectives.

COMPETENCY IN MEDICAL KNOWLEDGE: This study focuses on automating coronary artery segmentation and stenosis detection by developing a deep learning model, externally validating it, and comparing the performance to expert variability. These findings enhance understanding of how AI-based tools can support accurate and consistent interpretation of coronary angiography.

TRANSLATIONAL OUTLOOK: Developing such models is essential, as they can assist physicians and serve as a foundation for other technologies, including automated QCA, noninvasive fractional flow reserve estimation, and potentially automated SYNTAX scoring.

Funding support and author disclosures

Dr Asselbergs is supported by EU Horizon (AI4HF 101080430 and DataTools4Heart 101057849) and Dutch Research Council (MyDigiTwin 628011213). Dr Schuuring has received an independent research grant from AstraZeneca to the research institute and is supported by Stichting Hartcentrum Twente. Philips provided funding for the compensation for annotations of the images used in this research. The annotations were conducted independently, and Philips had no influence on their interpretation. Philips provided support for model training and offered technical expertise during result interpretation. The final article reflects the independent analysis and conclusions of the academic authors. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.

Footnotes

The authors attest they are in compliance with human studies committees and animal welfare regulations of the authors’ institutions and Food and Drug Administration guidelines, including patient consent where appropriate. For more information, visit the Author Center.

Appendix

For an expanded Methods section and supplemental tables and figures, please see the online version of this paper.

Contributor Information

Mitchel A. Molenaar, Email: mitchmolenaar@gmail.com.

Niels J. Verouden, Email: c.verouden@amsterdamumc.nl.

Supplemental material

Supplemental Material
mmc1.docx (1.9MB, docx)

References

1. Vaduganathan M., Mensah G.A., Turco J.V., Fuster V., Roth G.A. The global burden of cardiovascular diseases and risk. J Am Coll Cardiol. 2022;80:2361–2371. doi: 10.1016/j.jacc.2022.11.005.
2. Timmis A., Vardas P., Townsend N., et al. European Society of Cardiology: cardiovascular disease statistics 2021. Eur Heart J. 2022;43:716–799. doi: 10.1093/eurheartj/ehab892.
3. Lawton J.S., Tamis-Holland J.E., Bangalore S., et al. 2021 ACC/AHA/SCAI guideline for coronary artery revascularization: a report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation. 2022;145:e18–e114. doi: 10.1161/CIR.0000000000001038.
4. Kobayashi T., Hirshfeld J.W. Radiation exposure in cardiac catheterization. Circ Cardiovasc Interv. 2017;10. doi: 10.1161/CIRCINTERVENTIONS.117.005689.
5. Zhang H., Mu L., Hu S., et al. Comparison of physician visual assessment with quantitative coronary angiography in assessment of stenosis severity in China. JAMA Intern Med. 2018;178:239–247. doi: 10.1001/jamainternmed.2017.7821.
6. Zir L.M., Miller S.W., Dinsmore R.E., Gilbert J.P., Harthorne J.W. Interobserver variability in coronary angiography. Circulation. 1976;53:627–632. doi: 10.1161/01.cir.53.4.627.
7. Nallamothu B.K., Spertus J.A., Lansky A.J., et al. Comparison of clinical interpretation with visual assessment and quantitative coronary angiography in patients undergoing percutaneous coronary intervention in contemporary practice. Circulation. 2013;127:1793–1800. doi: 10.1161/CIRCULATIONAHA.113.001952.
8. Leape L.L., Park R.E., Bashore T.M., Harrison J.K., Davidson C.J., Brook R.H. Effect of variability in the interpretation of coronary angiograms on the appropriateness of use of coronary revascularization procedures. Am Heart J. 2000;139:106–113. doi: 10.1016/s0002-8703(00)90316-8.
9. Avram R., Olgin J.E., Ahmed Z., et al. CathAI: fully automated coronary angiography interpretation and stenosis estimation. NPJ Digit Med. 2023;6:1–12. doi: 10.1038/s41746-023-00880-1.
10. Yang S., Kweon J., Roh J.-H., et al. Deep learning segmentation of major vessels in X-ray coronary angiography. Sci Rep. 2019;9:1–11. doi: 10.1038/s41598-019-53254-7.
11. Molenaar M.A., Selder J.L., Nicolas J., et al. Current state and future perspectives of artificial intelligence for automated coronary angiography imaging analysis in patients with ischemic heart disease. Curr Cardiol Rep. 2022;24:365–376. doi: 10.1007/s11886-022-01655-y.
12. Schuuring M.J., Išgum I., Cosyns B., Chamuleau S.A.J., Bouma B.J. Routine echocardiography and artificial intelligence solutions. Front Cardiovasc Med. 2021;8. doi: 10.3389/fcvm.2021.648877.
13. Nasr-Esfahani E., Karimi N., Jafari M.H., et al. Segmentation of vessels in angiograms using convolutional neural networks. Biomed Signal Process Control. 2018;40:240–251.
14. Cervantes-Sanchez F., Cruz-Aceves I., Hernandez-Aguirre A., Hernandez-Gonzalez M.A., Solorio-Meza S.E. Automatic segmentation of coronary arteries in X-ray angiograms using multiscale analysis and artificial neural networks. Appl Sci. 2019;9:5507.
15. Nobre M.M., Silva J.L., Silva B., et al. Coronary X-ray angiography segmentation using artificial intelligence: a multicentric validation study of a deep learning model. Int J Cardiovasc Imaging. 2023;39:1385–1396. doi: 10.1007/s10554-023-02839-5.
16. Nobre Menezes M., Lourenço-Silva J., Silva B., et al. Development of deep learning segmentation models for coronary X-ray angiography: quality assessment by a new global segmentation score and comparison with human performance. Rev Port Cardiol. 2022;41:1011–1021. doi: 10.1016/j.repc.2022.04.001.
17. Liang D., Qiu J., Wang L., et al. Coronary angiography video segmentation method for assisting cardiovascular disease interventional treatment. BMC Med Imaging. 2020;20:65. doi: 10.1186/s12880-020-00460-9.
18. Iyer K., Najarian C.P., Fattah A.A., et al. AngioNet: a convolutional neural network for vessel segmentation in X-ray angiography. Sci Rep. 2021;11:18066. doi: 10.1038/s41598-021-97355-8.
19. Gao Z., Wang L., Soroushmehr R., et al. Vessel segmentation for X-ray coronary angiography using ensemble methods with deep learning and filter-based features. BMC Med Imaging. 2022;22:10. doi: 10.1186/s12880-022-00734-4.
20. Wang L., Liang D., Yin X., et al. Coronary artery segmentation in angiographic videos utilizing spatial-temporal information. BMC Med Imaging. 2020;20:110. doi: 10.1186/s12880-020-00509-9.
21. Du T., Xie L., Zhang H., et al. Training and validation of a deep learning architecture for the automatic analysis of coronary angiography. EuroIntervention. 2021;17:32–40. doi: 10.4244/EIJ-D-20-00570.
22. Serruys P.W., Morice M.-C., Kappetein A.P., et al. Percutaneous coronary intervention versus coronary-artery bypass grafting for severe coronary artery disease. N Engl J Med. 2009;360:961–972. doi: 10.1056/NEJMoa0804626.
23. He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); June 2016; Las Vegas, NV, USA:770–778.
24. Popov M., Amanturdieva A., Zhaksylyk N., et al. Dataset for automatic region-based coronary artery disease diagnostics using X-ray angiography images. Sci Data. 2024;11:20. doi: 10.1038/s41597-023-02871-z.
25. Fleiss J.L. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76:378–382.
26. Mongan J., Moy L., Kahn C.E. Checklist for artificial intelligence in medical imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell. 2020;2. doi: 10.1148/ryai.2020200029.
27. Ciusdel C., Turcea A., Puiu A., et al. Deep neural networks for ECG-free cardiac phase and end-diastolic frame detection on coronary angiographies. Comput Med Imag Graph. 2020;84. doi: 10.1016/j.compmedimag.2020.101749.
28. Wu W., Zhang J., Xie H., Zhao Y., Zhang S., Gu L. Automatic detection of coronary artery stenosis by convolutional neural network with temporal constraint. Comput Biol Med. 2020;118. doi: 10.1016/j.compbiomed.2020.103657.
29. Danilov V.V., Klyshnikov K.Y., Gerget O.M., et al. Real-time coronary artery stenosis detection based on modern neural networks. Sci Rep. 2021;11:7582. doi: 10.1038/s41598-021-87174-2.
30. Shoujun Z., Jian Y., Yongtian W., Wufan C. Automatic segmentation of coronary angiograms based on fuzzy inferring and probabilistic tracking. BioMed Eng Online. 2010;9:40. doi: 10.1186/1475-925X-9-40.
31. Kerkeni A., Benabdallah A., Manzanera A., Bedoui M.H. A coronary artery segmentation method based on multiscale analysis and region growing. Comput Med Imag Graph. 2016;48:49–61. doi: 10.1016/j.compmedimag.2015.12.004.
32. Labrecque Langlais É., Corbin D., Tastet O., et al. Evaluation of stenoses using AI video models applied to coronary angiography. NPJ Digit Med. 2024;7:1–13. doi: 10.1038/s41746-024-01134-4.
33. Arefinia F., Aria M., Rabiei R., Hosseini A., Ghaemian A., Roshanpoor A. Non-invasive fractional flow reserve estimation using deep learning on intermediate left anterior descending coronary artery lesion angiography images. Sci Rep. 2024;14:1818. doi: 10.1038/s41598-024-52360-5.
34. Kim Y., Roh J.-H., Kweon J., et al. Artificial intelligence-based quantitative coronary angiography of major vessels using deep learning. Int J Cardiol. 2024;405. doi: 10.1016/j.ijcard.2024.131945.
35. Moon J.H., Lee D.Y., Cha W.C., et al. Automatic stenosis recognition from coronary angiography using convolutional neural networks. Comput Methods Programs Biomed. 2021;198. doi: 10.1016/j.cmpb.2020.105819.
36. Molenaar M.A., Selder J.L., Schmidt A.F., et al. Validation of machine learning-based risk stratification scores for patients with acute coronary syndrome treated with percutaneous coronary intervention. Eur Heart J Digit Health. 2024;5:702–711. doi: 10.1093/ehjdh/ztae071.
37. Molenaar M.A., Bouma B.J., Asselbergs F.W., et al. Explainable machine learning using echocardiography to improve risk prediction in patients with chronic coronary syndrome. Eur Heart J Digit Health. 2024;5:170–182. doi: 10.1093/ehjdh/ztae001.
38. Man J.P., Koole M.A.C., Meregalli P.G., et al. Digital consults in heart failure care: a randomized controlled trial. Nat Med. 2024;30:2907–2913. doi: 10.1038/s41591-024-03238-6.


Articles from JACC: Advances are provided here courtesy of Elsevier
