Auto-segmentation, radiomic reproducibility, and comparison of radiomics between manual and AI-derived segmentations for coronary arteries in cardiac [¹⁸F]NaF PET/CT images

Suning Li; Jake Kendrick; Martin A Ebert; Ghulam Mubashar Hassan; Nathaniel Barry; Keaton Wright; Sing Ching Lee; Jamie W Bellinge; Carl Schultz

doi:10.1186/s40658-025-00751-6

2025 Apr 27;12:42.

PMCID: PMC12034606 PMID: 40287890

Abstract

Background

[¹⁸F]NaF is a potential biomarker for assessing cardiac risk. Automated analysis of [¹⁸F]NaF positron emission tomography (PET) images, specifically through quantitative image analysis (“radiomics”), can potentially enhance diagnostic accuracy and personalised patient management. However, the reproducibility and reliability of radiomic features must be evaluated to establish their clinical applicability. This study aimed to (i) develop and evaluate an automated model for coronary artery segmentation using [¹⁸F]NaF PET and calcium scoring computed tomography (CSCT) images, (ii) assess inter- and intra-observer radiomic reproducibility from manual segmentations, and (iii) evaluate the reliability of radiomic features from AI-derived segmentations by comparison with those from manual segmentations.

Results

A total of 141 patients from the “effects of Vitamin K and Colchicine on vascular calcification activity” (VikCoVac, ACTRN12616000024448) trial were included, of whom 113 were used to train an nnUNet auto-segmentation model on [¹⁸F]NaF PET and CSCT images. Inter- and intra-observer radiomic reproducibility, and the reliability of radiomics from AI-derived segmentations, were assessed using the lower bound of the 95% confidence interval of the intraclass correlation coefficient (ICC). The auto-segmentation model achieved an average Dice Similarity Coefficient of 0.61 ± 0.05, which did not differ significantly from intra-observer variability (p = 0.922). For the unfiltered images, 47 (12.6%) CT and 25 (7.5%) PET radiomic features were inter-observer reproducible, while 133 (35.8%) CT and 57 (15.3%) PET features were intra-observer reproducible. Seven (9.7%) CT and 18 (25.0%) PET first-order features, as well as 17 (17.7%) CT GLCM features, were reproducible in both inter- and intra-observer analyses. Of the radiomic features from AI-derived segmentations, 9.8% showed excellent and 16.8% good reliability. First-order features were the most reliable (lower-bound ICC > 0.75; 78/144 [54.2%]) and shape features the least (2/112 [1.8%]). CT features demonstrated greater reliability (147/428 [34.3%]) than PET features (81/428 [18.9%]). Features from the left anterior descending (76/214 [35.5%]) and right coronary (75/214 [35.0%]) arteries were more reliable than those from the circumflex (49/214 [22.9%]) and left main (28/214 [13.1%]) arteries.

Conclusions

An effective segmentation model for coronary arteries was developed, and reproducible [¹⁸F]NaF PET/CSCT radiomic features were identified through inter- and intra-observer assessments, supporting their clinical applicability. The reliability of radiomic features from AI-derived segmentations, relative to manual segmentations, was also characterised. The novelty of [¹⁸F]NaF as a biomarker underscores its potential to provide unique insights into vascular calcification activity and cardiac risk assessment.

Clinical trial registration

VIKCOVAC trial (“effects of Vitamin K and Colchicine on vascular calcification activity”). Unique identifier: ACTRN12616000024448. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=368825.

Supplementary Information

The online version contains supplementary material available at 10.1186/s40658-025-00751-6.

Keywords: [¹⁸F]NaF PET, Radiomics, Reproducibility, Coronary artery disease, Auto-segmentation

Background

Cardiovascular disease (CVD) is the leading global cause of death, with an estimated 19.91 million fatalities in 2021 [1]. Coronary artery disease (CAD), which constitutes up to half of all CVD cases, often progresses silently, highlighting the need for early detection and prevention. Coronary calcification burden, indicating atherosclerotic plaque extent, can be identified in asymptomatic patients via X-ray imaging [2–4]. The progression rate of coronary calcification is a stronger predictor of CVD events than a single baseline calcium score [5]. However, predicting coronary calcification progression remains challenging. [¹⁸F]NaF Positron Emission Tomography (PET) has proven highly effective in identifying microcalcifications (< 50 μm) that are often undetected by Computed Tomography (CT) scans [6]. As the sole tool capable of predicting calcification progression, it is invaluable for the early detection and monitoring of coronary calcification. While existing works have primarily explored basic [¹⁸F]NaF PET features such as maximum standardised uptake values (SUV_max) [7] and maximum tissue background ratio (TBR_max) [8–12], advanced quantitative imaging biomarkers may offer better predictions of coronary calcification progression.

Radiomics involves the high-throughput extraction of quantifiable metrics from regions of interest (ROIs) in medical images. Radiomics from [¹⁸F]NaF PET images have demonstrated prognostic potential across different diseases and anatomical sites, particularly in oncology settings [13–15]. However, their application in CVD has not been extensively explored. A persistent issue in the radiomics field is feature reproducibility and reliability, which has thus far hindered the clinical translation of previously developed radiomics models. Reproducibility refers to feature consistency across different imaging sessions, segmentation methods, scanning equipment, or imaging protocols [16], while reliability refers to the consistency of radiomic features produced by AI-driven automated pipelines [17, 18]. With AI-based segmentation becoming widespread, assessing its agreement with manual segmentation in radiomics is essential. Identifying features that lack reproducibility or reliability enables their exclusion, improving the robustness of prognostic analysis and clinical risk stratification.

Segmentation of coronary arteries is crucial for extracting quantitative image features from [¹⁸F]NaF PET, yet it remains a manual, time-consuming task for clinicians. Advances in artificial intelligence have made it possible to automate this segmentation and to extract quantitative metrics from images reproducibly, but model development is often hindered by time-consuming experimentation with image processing pipelines and architectures to find optimal configurations. The nnUNet framework introduced by Isensee et al. addresses these problems by automating key components of the model development pipeline, providing a robust, out-of-the-box segmentation solution [19]. Its effectiveness has been well documented in the literature [20, 21], highlighting its potential for biomedical image segmentation.

The primary aim of this study was to develop an auto-segmentation model for coronary arteries using nnUNet applied to Calcium Scoring CT (CSCT) and [¹⁸F]NaF PET images and quantify its performance relative to intra-observer variability. The secondary aim was to assess inter- and intra-observer radiomic reproducibility and the reliability of radiomics features from AI-derived segmentations.

Methods

Participants

Data for this study were retrospectively collected from the “effects of Vitamin K and Colchicine on vascular calcification activity” (VikCoVac) trial [22], registered under ACTRN12616000024448 in the Australian New Zealand Clinical Trials Registry and approved by the Royal Perth Hospital Human Research Ethics Committee (REG14-095). The original study cohort has previously been described, with detailed inclusion and exclusion criteria [22]. Briefly, participants aged 50 to 80 years with a self-reported history of type-1 or type-2 diabetes mellitus were recruited between August 2015 and May 2018 and underwent baseline screening with a CT Coronary Calcium Score (CCS). Those with a CCS over ten, and an additional ten patients with a CCS of zero, underwent baseline clinical assessments and proceeded to [¹⁸F]NaF PET imaging [12, 22]. Out of the initial cohort, 141 participants were selected for this study based on data availability. All participants provided written informed consent.

Imaging

The electrocardiogram (ECG)-gated CSCT was performed using either a Philips iCT (128 detector rows, 256 slices, 120 kV, 40 mAs) or a Siemens Definition AS+ (64 detector rows, 128 slices, 120 kV, 60 mAs). All participants underwent baseline cardiac gated [¹⁸F]NaF PET/CT imaging of the thorax on a Siemens mCT64 PET/CT scanner at the Western Australia PET centre after intravenous administration of 250 MBq of [¹⁸F]NaF and a 60-minute rest period. To maintain a target heart rate of ≤ 60 bpm, some participants were administered oral beta-blocker therapy or ivabradine therapy prior to imaging. Detailed information on patient preparation, image acquisition, and reconstruction protocols can be found in Supplementary Tables 1, 2, and 3.

Coronary artery segmentation

The manual segmentation of the [¹⁸F]NaF PET coronary images followed a tailored workflow in MIM (Version 6.8.3; MIM Software, Inc). ECG-gated PET data were reconstructed in the diastolic phase and co-registered with the CSCT image by manual alignment: the cardiac silhouette of PET activity was first matched to the CSCT scan, and the alignment was then fine-tuned on specific anatomical landmarks such as the aortic root, atrioventricular grooves, and ventricular blood pools. Two-dimensional ROIs were drawn by an expert reader on consecutive 3 mm axial slices of the co-registered images. All four coronary arteries were segmented: the circumflex (LCx), left main (LM), left anterior descending (LAD), and right coronary artery (RCA). All segmented structures were converted to binary masks in NIfTI format using Plastimatch (Version: 0.4.3).

To evaluate intra-observer variability in manual segmentations, the same reader repeated the segmentation on a subset of 10 patients two months after the initial segmentations, blinded to the initial segmentations to ensure an unbiased assessment. Intra-observer variability was quantified by comparing the two sets of segmentations using precision, recall, Dice similarity coefficient (DSC), Hausdorff distance (HD) and HD95 (95th percentile of HD), and was compared with the auto-segmentation results.
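The overlap and distance metrics used for this comparison can be computed from a pair of binary masks as in the minimal sketch below, which assumes NumPy/SciPy and masks already on the same voxel grid. The function names are hypothetical and the study does not state which implementation it used; HD95 conventions also differ between toolkits, and this sketch pools the surface distances from both directions before taking the percentile.

```python
import numpy as np
from scipy import ndimage


def overlap_metrics(pred, ref):
    """Voxel-level precision, recall and DSC of a predicted mask against a reference mask."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = np.count_nonzero(pred & ref)   # voxels labelled in both masks
    fp = np.count_nonzero(pred & ~ref)  # voxels only in the prediction
    fn = np.count_nonzero(~pred & ref)  # voxels only in the reference
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Both masks empty is treated as perfect agreement (a convention, not a standard).
    dsc = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return precision, recall, dsc


def _surface(mask):
    # Boundary voxels: in the mask but removed by one step of 6-connected erosion.
    return mask & ~ndimage.binary_erosion(mask)


def hausdorff(pred, ref, spacing=(1.0, 1.0, 1.0), percentile=100.0):
    """Symmetric Hausdorff distance (percentile=100) or HD95 (percentile=95).

    Distances are in the units of `spacing` (e.g. mm). Returns NaN if either
    mask is empty, since no surface distance is defined.
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if not pred.any() or not ref.any():
        return float("nan")
    sp, sr = _surface(pred), _surface(ref)
    # Distance from every voxel to the nearest surface voxel of each mask.
    dist_to_ref = ndimage.distance_transform_edt(~sr, sampling=spacing)
    dist_to_pred = ndimage.distance_transform_edt(~sp, sampling=spacing)
    # Pool pred->ref and ref->pred surface distances, then take the percentile.
    distances = np.concatenate([dist_to_ref[sp], dist_to_pred[sr]])
    return float(np.percentile(distances, percentile))
```

Passing the voxel spacing in millimetres gives HD and HD95 in mm. An empty predicted mask, such as the missed LM prediction reported in the cross-validation results, yields NaN rather than a distance.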

For inter-observer radiomic reproducibility, a second reader repeated the segmentation process on a separate set of 10 randomly selected patients. Radiomic reproducibility was assessed for both the intra-observer and the inter-observer segmentations, the latter reflecting variability due to differing levels of expertise.

Model training

Prior to model training, PET images were converted to body-weight SUV, i.e. normalised to the injected activity per unit patient body weight. Of the 141 patients, 113 were randomly assigned to the training set, while the remaining 28, including all 10 patients with two sets of manual contours, were allocated to the testing set. Baseline CSCT and PET scans were used for training, with each of the four coronary arteries given a separate label. Following nnUNet’s default protocol, 5-fold cross-validation was conducted, with 22–23 patients in the validation set of each fold and 1000 training epochs per fold. Based on recommendations in the literature [23], the DiceTopK10 loss function was used with the 3D full-resolution configuration. Precision, recall, DSC, HD, and HD95 were computed at the voxel level to evaluate the model’s performance against the manual segmentations for each artery.
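Body-weight SUV divides the measured activity concentration by the injected activity per gram of body weight. A minimal sketch, assuming images in Bq/mL, a tissue density of 1 g/mL, and an optional decay correction of the injected activity to the image reference time (the study does not state how decay was handled); the function name and the half-life constant are illustrative assumptions:

```python
import numpy as np

F18_HALF_LIFE_S = 109.77 * 60.0  # approximate physical half-life of fluorine-18, in seconds


def suv_bw(activity_bq_per_ml, injected_bq, weight_kg, delay_s=0.0,
           half_life_s=F18_HALF_LIFE_S):
    """Body-weight SUV: activity concentration / (injected activity / body weight).

    Assumes 1 g of tissue occupies 1 mL, so body weight in grams cancels mL.
    `delay_s` decay-corrects the injected activity from injection time to the
    image reference time; use 0 if the image is already decay-corrected to injection.
    """
    dose_at_ref = injected_bq * 2.0 ** (-delay_s / half_life_s)
    weight_g = weight_kg * 1000.0
    return np.asarray(activity_bq_per_ml, dtype=float) * weight_g / dose_at_ref
```

For example, with the nominal 250 MBq injection, `suv_bw(pet_bq_ml, 250e6, weight_kg)` (where `pet_bq_ml` is a hypothetical image array) returns an SUV image, and the SUV_max of an artery is the maximum of that image over the artery's mask voxels.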

Radiomics feature extraction

Radiomic features were extracted for 10 patients with inter-observer contours, 10 patients with intra-observer contours, and 28 participants with both manual and automated segmentations of the four coronary arteries. Feature extraction was carried out using the PyRadiomics library (Version: 3.0.1) in Python (Version: 3.10.6), adhering to the Image Biomarker Standardisation Initiative (IBSI) guidelines, with any deviations documented [24]. In total, 107 features were extracted per coronary artery per image: shape (n = 14), first-order (n = 18), grey level co-occurrence matrix (GLCM, n = 24), grey level dependence matrix (GLDM, n = 14), grey level run length matrix (GLRLM, n = 16), grey level size zone matrix (GLSZM, n = 16) and neighbourhood grey tone difference matrix (NGTDM, n = 5) features. To evaluate the effect of image filtering on feature reproducibility and reliability, a range of filters was applied: Laplacian of Gaussian (LoG, σ = 2, 3, 4, 5 mm), square, square root, logarithmic, and exponential transforms, and a 3D wavelet transformation (eight decompositions). Because shape features are unaffected by image filtering, only the 93 non-shape features were extracted from each of the 16 filtered images (four LoG, four intensity transforms, and eight wavelet decompositions), giving an additional 1488 (16 × 93) features per coronary artery per patient.
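The extraction settings above can be expressed as a PyRadiomics parameter dictionary. The sketch below is a hedged reconstruction, not the study's actual parameter file (which is not given): it mirrors PyRadiomics's `imageType`/`featureClass`/`setting` layout, relies on an empty feature list enabling every feature in a class, and checks the 107 and 1488 feature counts. `PARAMS`, `expected_feature_counts` and `build_extractor` are hypothetical names.

```python
# Per-class feature counts for the PyRadiomics 3.0.1 defaults, as listed in the text.
FEATURE_COUNTS = {"shape": 14, "firstorder": 18, "glcm": 24, "gldm": 14,
                  "glrlm": 16, "glszm": 16, "ngtdm": 5}

PARAMS = {
    "imageType": {
        "Original": {},
        "LoG": {"sigma": [2.0, 3.0, 4.0, 5.0]},  # sigma in mm
        "Square": {},
        "SquareRoot": {},
        "Logarithm": {},
        "Exponential": {},
        "Wavelet": {},  # 3D: 8 decompositions (LLL ... HHH)
    },
    # An empty list enables every feature of that class.
    "featureClass": {name: [] for name in FEATURE_COUNTS},
    "setting": {"binWidth": 0.1},  # SUV for PET; the text uses 25 (HU) for CSCT
}


def expected_feature_counts(params=PARAMS, counts=FEATURE_COUNTS):
    """Return (features from the original image, features from all filtered images)."""
    n_filtered_images = 0
    for image_type, cfg in params["imageType"].items():
        if image_type == "Original":
            continue
        if image_type == "LoG":
            n_filtered_images += len(cfg["sigma"])
        elif image_type == "Wavelet":
            n_filtered_images += 8
        else:
            n_filtered_images += 1
    per_image = sum(counts.values())
    # Shape features depend only on the mask, so they are not repeated per filter.
    return per_image, n_filtered_images * (per_image - counts["shape"])


def build_extractor(params=PARAMS):
    """Build a PyRadiomics extractor from the parameter dictionary (needs pyradiomics)."""
    from radiomics import featureextractor  # imported lazily so the sketch runs without it
    return featureextractor.RadiomicsFeatureExtractor(params)
```

With PyRadiomics installed, `build_extractor().execute(image_path, mask_path)` would return an ordered dictionary of features (plus diagnostic entries) for one artery mask; the paths are placeholders.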

For texture feature extraction, grey levels were discretised with a fixed bin width chosen to yield between 16 and 128 bins, in accordance with recommended PET image analysis practices [25]. PET bin widths of 0.05, 0.1, 0.15 and 0.2 SUV were considered; outside the bin width analyses, a bin width of 0.1 SUV was adopted for PET images. For CSCT images, the bin width was fixed at 25 HU. Additionally, the masks from both manual and automated segmentation were uniformly dilated by 1, 2, and 3 CT voxels (SimpleITK, Version: 2.2.0) to assess the effect of dilation on feature reliability.
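Two steps in this paragraph are easy to sketch: checking which candidate bin widths keep the number of grey levels within 16–128 for a given ROI, and uniformly dilating a mask by a fixed number of voxels. The sketch assumes NumPy/SciPy and uses hypothetical function names. It swaps in `scipy.ndimage.binary_dilation` with a 6-connected structuring element for the SimpleITK dilation used in the study, so dilated shapes and voxel counts may differ from a ball-shaped kernel. The bin count aligns bins to multiples of the width; PyRadiomics's own bin-edge computation may differ slightly at boundaries.

```python
import numpy as np
from scipy import ndimage


def fixed_width_bin_count(values, bin_width, eps=1e-9):
    """Number of grey levels when bins are aligned to integer multiples of bin_width.

    `eps` guards against floating-point results such as 3.0 / 0.1 = 29.999999999999996.
    """
    v = np.asarray(values, dtype=float)
    low = np.floor(v.min() / bin_width + eps)
    high = np.floor(v.max() / bin_width + eps)
    return int(high - low) + 1


def admissible_bin_widths(values, widths=(0.05, 0.1, 0.15, 0.2),
                          min_bins=16, max_bins=128):
    """Candidate widths whose bin count falls within [min_bins, max_bins] for this ROI."""
    return [w for w in widths
            if min_bins <= fixed_width_bin_count(values, w) <= max_bins]


def dilate_mask(mask, n_voxels):
    """Uniformly dilate a 3D binary mask by n_voxels using a 6-connected element."""
    mask = np.asarray(mask, dtype=bool)
    if n_voxels == 0:
        # SciPy treats iterations < 1 as "repeat until the mask stops changing",
        # so no-dilation is handled explicitly.
        return mask.copy()
    structure = ndimage.generate_binary_structure(3, 1)
    return ndimage.binary_dilation(mask, structure=structure, iterations=n_voxels)
```

For SUV values spanning 0.5–3.0 in an ROI, the 0.2 SUV width falls below 16 bins while the three narrower widths are admissible.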

Statistical analysis

The Wilcoxon signed-rank test was used to compare voxel-level segmentation performance metrics between the automated model and the secondary contours of the intra-observer analysis for the 10 patients in the testing set. Inter- and intra-observer radiomic reproducibility and the reliability of radiomics from AI-derived segmentations were evaluated using the intraclass correlation coefficient (ICC; psych package Version 2.4.1, R Version 4.2.2). ICC values were calculated using a two-way mixed effects model with the absolute agreement definition based on single rater measurements [26]:

$$\mathrm{ICC} = \frac{MS_B - MS_W}{MS_B + (k-1)\,MS_W + \frac{k}{n}\left(MS_R - MS_W\right)}$$

where MS_B, MS_W and MS_R are the between-subjects, within-subjects (residual error) and between-raters mean squares, respectively, k is the number of raters, and n is the number of subjects. ICC was interpreted using Koo and Li’s guidelines [26]: ICC ≥ 0.9 indicates excellent reliability; 0.75 ≤ ICC < 0.9 good reliability; 0.5 ≤ ICC < 0.75 moderate reliability; and ICC < 0.5 poor reliability. For a robust assessment, the lower bound (LB) of the ICC’s 95% confidence interval (CI) was used for classification, and features with an LB above 0.75 were considered robust.
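The single-rater, absolute-agreement ICC and the lower bound of its 95% CI can be sketched directly from a subjects × raters table. The version below is a minimal NumPy/SciPy sketch following McGraw and Wong's ICC(A,1) point estimate and confidence-interval formulas; for absolute agreement, the two-way mixed and two-way random models give the same point estimate, which psych labels ICC2 (single random raters). It is not psych's code, the function names are hypothetical, and it assumes 0 < ICC < 1. On Shrout and Fleiss's classic example of six subjects rated by four judges, it gives an ICC of about 0.29.

```python
import numpy as np
from scipy.stats import f as f_dist


def icc_a1(ratings, alpha=0.05):
    """ICC(A,1) with its (1 - alpha) confidence interval (McGraw & Wong, 1996).

    ratings: array of shape (n_subjects, k_raters), e.g. one radiomic feature
    measured on n patients from k segmentations. Returns (icc, lower, upper).
    Assumes 0 < icc < 1, because the interval formulas divide by (1 - icc).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_b = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_r = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_e = np.sum((x - grand) ** 2) - ss_b - ss_r     # residual error
    ms_b, ms_r = ss_b / (n - 1), ss_r / (k - 1)
    ms_e = ss_e / ((n - 1) * (k - 1))

    icc = (ms_b - ms_e) / (ms_b + (k - 1) * ms_e + k * (ms_r - ms_e) / n)

    # Satterthwaite-approximated degrees of freedom for the interval.
    a = k * icc / (n * (1 - icc))
    b = 1 + k * icc * (n - 1) / (n * (1 - icc))
    v = (a * ms_r + b * ms_e) ** 2 / (
        (a * ms_r) ** 2 / (k - 1) + (b * ms_e) ** 2 / ((n - 1) * (k - 1)))
    f_lo = f_dist.ppf(1 - alpha / 2, n - 1, v)
    f_hi = f_dist.ppf(1 - alpha / 2, v, n - 1)
    c = k * ms_r + (k * n - k - n) * ms_e
    lower = n * (ms_b - f_lo * ms_e) / (f_lo * c + n * ms_b)
    upper = n * (f_hi * ms_b - ms_e) / (c + n * f_hi * ms_b)
    return icc, lower, upper


def koo_li_category(value):
    """Koo & Li (2016) reliability category for an ICC value (here, its lower bound)."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.75:
        return "good"
    if value >= 0.5:
        return "moderate"
    return "poor"
```

A feature would then be flagged as robust when the returned `lower` exceeds 0.75, with `koo_li_category(lower)` giving its Koo and Li category.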

Results

Patient characteristics

The characteristics of the participants in the training and testing sets are summarised in Table 1. The majority of the study cohort was male: 68.1% (77/113) of the training set and 64.3% (18/28) of the testing set. In both sets, the median patient age was 66 years (training range: 50–80, testing range: 56–78). The training set had a median CCS of 154.0 (range: 1.0–4359.0), and the testing set had a median CCS of 325.5 (range: 13.0–1878.0).

Table 1.

Patient characteristic summary

Auto-Segmentation Study
Characteristic	Training Set (n = 113)	Testing Set (n = 28)
Gender, n (%)
Female	36 (31.9)	10 (35.7)
Male	77 (68.1)	18 (64.3)
Age (years), median (range)	66 (50–80)	66 (56–78)
Antihypertensives, n (%)
Yes	83 (73.5)	24 (85.7)
No	30 (26.5)	4 (14.3)
Statins, n (%)
Yes	80 (70.8)	23 (82.1)
No	33 (29.2)	5 (17.9)
Insulin, n (%)
Yes	29 (25.7)	7 (25.0)
No	84 (74.3)	21 (75.0)
Body Mass Index, median (range)	29.8 (19.9–43.9)	31.6 (22.7–47.2)
eGFR, median (range)	84.0 (42.0–90.0)	87.0 (52.0–90.0)
LDL cholesterol (mmol/L), median (range)	2.1 (0.8–4.4)	1.8 (0.9–4.2)
HbA1c (%), median (range)	7.2 (5.2–10.4)	7.4 (5.7–13.0)
CCS Baseline, median (range)	154.0 (1.0–4359.0)	325.5 (13.0–1878.0)
Reproducibility Study
Characteristic	Inter-observer (n = 10)	Intra-observer (n = 10)
Gender, n (%)
Female	3 (30.0)	3 (30.0)
Male	7 (70.0)	7 (70.0)
Age (years), median (range)	62 (53–72)	62 (53–72)
Antihypertensives, n (%)
Yes	8 (80.0)	10 (100.0)
No	2 (20.0)	0 (0.0)
Statins, n (%)
Yes	6 (60.0)	7 (70.0)
No	4 (40.0)	3 (30.0)
Insulin, n (%)
Yes	2 (20.0)	3 (30.0)
No	8 (80.0)	7 (70.0)
Body Mass Index, median (range)	30.8 (23.8–36.6)	32.2 (22.9–47.2)
eGFR, median (range)	90.0 (64.0–90.0)	87.0 (52.0–90.0)
LDL cholesterol (mmol/L), median (range)	2.0 (1.6–4.0)	1.7 (0.9–4.2)
HbA1c (%), median (range)	7.5 (6.5–9.0)	7.5 (5.8–13.0)
CCS Baseline, median (range)	253.5 (4.8–2162.0)	184.5 (23.3–1373.5)


Automated segmentation model performance

The voxel-level performance metrics from the cross-validation of the nnUNet auto-segmentation model are summarised in Table 2. The nnUNet model failed to predict the mask for LM in one instance during the second fold of training; consequently, distance-based metrics for LM were not reported in the cross-validation results. The nnUNet model achieved a 5-fold cross-validation mean DSC/HD95 of 0.56 ± 0.01/6.32 ± 1.74 mm for LCx, 0.66 ± 0.03/4.63 ± 1.11 mm for LAD, 0.57 ± 0.04 for LM, and 0.64 ± 0.02/6.51 ± 1.27 mm for RCA.

Table 2.

Cross-validation results for the auto-segmentation model

Metrics (Avg ± SD)	LCx	LAD	LM	RCA	All
Precision	0.63 ± 0.03	0.69 ± 0.04	0.65 ± 0.04	0.71 ± 0.02	0.67 ± 0.04
Recall	0.53 ± 0.01	0.64 ± 0.01	0.56 ± 0.04	0.61 ± 0.03	0.59 ± 0.05
DSC	0.56 ± 0.01	0.66 ± 0.03	0.57 ± 0.04	0.64 ± 0.02	0.61 ± 0.05
HD (mm)	12.70 ± 3.39	11.63 ± 2.14	N/A	19.29 ± 5.32	N/A
HD95 (mm)	6.32 ± 1.74	4.63 ± 1.11	N/A	6.51 ± 1.27	N/A


Avg = average, SD = standard deviation, LCx = circumflex artery, LAD = left anterior descending artery, LM = left main artery, RCA = right coronary artery, DSC = Dice similarity coefficient, HD = Hausdorff distance

Results show voxel-level metrics averaged within each fold across all patients and then averaged across the five folds

The nnUNet model failed to predict the mask for LM in one instance during the second fold of training; consequently, distance-based metrics for LM were not reported in the cross-validation results

For the ‘All’ artery category, results are first averaged across the four coronary arteries for each patient. Standard deviations are calculated from the fold-level averages
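The two-level averaging described in these notes can be sketched as follows, assuming each fold's metric is stored as a patients × arteries array. Whether the per-artery standard deviations were also taken across fold averages, and whether the sample (ddof = 1) estimator was used, are assumptions; the function name is hypothetical. Table 3, by contrast, reports standard deviations across individual patients.

```python
import numpy as np


def summarise_cv(folds):
    """Two-level cross-validation summary of a per-patient, per-artery metric.

    folds: list of arrays, one per fold, each of shape (n_patients_in_fold, n_arteries).
    Returns (artery_mean, artery_sd, all_mean, all_sd); SDs are sample SDs (ddof=1)
    taken over the fold-level averages.
    """
    # Per artery: average over the patients within each fold.
    fold_artery = np.array([np.mean(f, axis=0) for f in folds])
    # 'All': average the arteries per patient first, then the patients within each fold.
    fold_all = np.array([np.mean(np.mean(f, axis=1)) for f in folds])
    return (fold_artery.mean(axis=0), fold_artery.std(axis=0, ddof=1),
            fold_all.mean(), fold_all.std(ddof=1))
```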

Voxel-level performance metrics for both the testing set of the nnUNet auto-segmentation model and the intra-observer variability analysis are summarised in Table 3. On the testing set, the model achieved mean DSC/HD95 of 0.59 ± 0.13/5.18 ± 3.52 mm for LCx, 0.66 ± 0.11/3.57 ± 2.35 mm for LAD, 0.60 ± 0.15/2.78 ± 1.09 mm for LM, and 0.68 ± 0.09/5.42 ± 4.70 mm for RCA. In the intra-observer variability study, the mean DSC/HD95 was 0.56 ± 0.10/5.15 ± 3.30 mm for LCx, 0.70 ± 0.08/3.21 ± 1.86 mm for LAD, 0.66 ± 0.14/2.38 ± 1.15 mm for LM, and 0.61 ± 0.15/5.58 ± 5.21 mm for RCA. Figure 1 compares voxel-level metrics for the 10 patients with both automated contours and secondary contours by the same clinician. No statistically significant differences were observed except in recall for the LM, where the secondary contour had significantly higher recall than the automated model (p = 0.010, median: 0.71 vs. 0.54). The automated model also showed higher precision for the LCx than the secondary contour, although this difference was not statistically significant (p = 0.084, median: 0.67 vs. 0.56). Figure 2 illustrates an example of the auto-segmentation results, showing axial CSCT slices with both manual and nnUNet segmentations for one patient.

Table 3.

Voxel-level result metrics for the intra-observer and the testing set

Metrics (Avg ± SD)	LCx	LAD	LM	RCA	All
Intra-observer (n = 10)
Precision	0.60 ± 0.11	0.72 ± 0.09	0.64 ± 0.14	0.64 ± 0.17	0.65 ± 0.13
Recall	0.54 ± 0.14	0.69 ± 0.12	0.71 ± 0.19	0.62 ± 0.20	0.64 ± 0.17
DSC	0.56 ± 0.10	0.70 ± 0.08	0.66 ± 0.14	0.61 ± 0.15	0.63 ± 0.13
Hausdorff distance (mm)	8.91 ± 6.06	6.98 ± 4.72	4.56 ± 1.81	12.53 ± 10.23	8.24 ± 6.88
HD95 (mm)	5.15 ± 3.30	3.21 ± 1.86	2.38 ± 1.15	5.58 ± 5.21	4.08 ± 3.42
Testing Set (n = 28)
Precision	0.69 ± 0.11	0.68 ± 0.12	0.69 ± 0.19	0.68 ± 0.12	0.68 ± 0.14
Recall	0.54 ± 0.17	0.67 ± 0.14	0.55 ± 0.15	0.72 ± 0.13	0.62 ± 0.17
DSC	0.59 ± 0.13	0.66 ± 0.11	0.60 ± 0.15	0.68 ± 0.09	0.63 ± 0.13
Hausdorff distance (mm)	10.44 ± 6.84	8.58 ± 4.76	4.79 ± 1.79	15.00 ± 11.06	9.70 ± 7.81
HD95 (mm)	5.18 ± 3.52	3.57 ± 2.35	2.78 ± 1.09	5.42 ± 4.70	4.24 ± 3.35


For the ‘All’ artery category, metrics are first averaged across the four coronary arteries for each patient. Standard deviations are derived from the results of individual patients

Results are calculated by averaging across all patients

Fig. 1 — Comparison of auto-segmentation results with intra-observer variability. Voxel-level distribution of DSC, precision, and recall for individual coronary arteries, and averaged across all arteries, for each of the 10 patients in the testing set with both secondary manual segmentations by the same observer and automated segmentations. P-values from Wilcoxon signed-rank tests are indicated for each metric comparison. Boxes show the interquartile range (IQR), and whiskers extend to data within 1.5 × IQR

Fig. 2 — Axial CSCT slices featuring manual and nnUNet segmentations for a patient. nnUNet segmentation metrics detailed as follows: Precision for LCx, LAD, LM, and RCA are 0.85, 0.83, 0.77, and 0.74, respectively; Recall for LCx, LAD, LM, and RCA are 0.68, 0.68, 0.43, and 0.68, respectively; DSC for LCx, LAD, LM, and RCA are 0.76, 0.75, 0.55, and 0.71, respectively. GT = ground truth manual segmentation, PRED = predicted automated segmentation

Inter- and Intra-observer radiomic reproducibility

Figure 3 presents the inter- and intra-observer radiomic reproducibility results, showing the proportion of features in each ICC category [26] for the original unfiltered images across imaging modalities, feature categories, and arteries. Among the 372 features analysed across the four arteries per modality, intra-observer reproducibility analysis revealed that 62 (16.7%) CSCT features and 13 (3.5%) PET features achieved excellent ICC, while 71 (19.1%) CSCT and 46 (12.4%) PET features showed good ICC. Moderate ICC was observed in 87 (23.4%) CSCT and 118 (31.7%) PET features, and poor ICC in 152 (40.9%) CSCT and 195 (52.3%) PET features. Inter-observer reproducibility was lower overall, with only 13 (3.5%) CSCT and 10 (2.7%) PET features exhibiting excellent ICC. Good ICC was observed in 35 (9.4%) CSCT and 25 (6.7%) PET features, moderate ICC in 89 (23.9%) CSCT and 24 (6.5%) PET features, and poor ICC in 235 (63.2%) CSCT and 313 (84.1%) PET features.

Fig. 3 — Inter- and intra-observer radiomic reproducibility. Proportion of features in each lower-bound ICC category for the original unfiltered images across arteries and imaging modalities (baseline CT, baseline [¹⁸F]NaF PET): (a) intra-observer radiomic reproducibility; (b) inter-observer radiomic reproducibility

First-order and GLCM features demonstrated the highest reproducibility. Specifically, 7 (9.7%) CT and 18 (25.0%) PET first-order features, as well as 17 (17.7%) CT GLCM features, were reproducible in both inter- and intra-observer analyses. In contrast, features from other categories showed minimal reproducibility across both inter- and intra-observer assessments.

For the clinically recognised SUV_max feature, intra-observer ICC values (95% CI) were 0.99 (0.98–1.00), 0.96 (0.84–0.99), 0.92 (0.71–0.98), and 0.91 (0.71–0.98) for the LM, RCA, LAD, and LCx arteries, respectively. Inter-observer ICC values were generally lower: 0.98 (0.91–0.99), 0.97 (0.81–0.99), 0.89 (0.64–0.97), and 0.80 (0.38–0.95) for the same arteries.

Comprehensive results illustrating the impact of image filtering on inter- and intra-observer radiomic reproducibility are provided in Supplementary Figs. 1 and 2.

Comparison of radiomics from manual and AI-derived segmentations

ICC values, with 95% CIs, for the reliability of radiomic features from AI-derived segmentations relative to manual segmentations, computed from the original unfiltered PET images with a bin width of 0.1 SUV, are illustrated in Supplementary Fig. 3 for all four coronary arteries. Of the 428 features per modality (107 features from each of the four arteries), 65 (15.2%) CSCT and 19 (4.4%) PET features demonstrated excellent ICC, 82 (19.2%) CSCT and 62 (14.5%) PET features good ICC, 103 (24.1%) CSCT and 145 (33.9%) PET features moderate ICC, and 178 (41.6%) CSCT and 202 (47.2%) PET features poor ICC. Four PET features (first-order: root mean squared, mean, 90th percentile, median) and six CSCT features (first-order: entropy, maximum, range; GLCM: difference entropy, joint entropy; GLDM: small dependence emphasis) remained reliable across all coronary arteries, with their entire 95% CI above 0.75.

Among the coronary arteries, more PET features from the LAD and RCA exhibited at least good reliability than features from the LCx and LM (ICC > 0.75: LAD, 34 [31.8%]; RCA, 24 [22.4%]; LCx, 6 [5.6%]; LM, 17 [15.9%]). Consistent with the inter- and intra-observer reproducibility findings, first-order PET features displayed the highest reliability (ICC > 0.75: 38 [52.8%]), followed by GLCM features (ICC > 0.75: 24 [25.0%]). Fig. 4 presents the percentages of CSCT and PET features with excellent and good ICC from the original unfiltered images across all feature categories and coronary arteries, ordered by descending average percentage.

Fig. 4 — Reliability of AI-segmented features from original images. Bar chart illustrating the percentage of features from the original unfiltered CSCT images and [¹⁸F]NaF PET images using a 0.1 SUV bin width without mask dilation that have a lower bound of the ICC 95% CI above 0.75. The chart highlights the robustness of CT and PET features across different feature categories and coronary arteries, with the sequence of feature categories, coronary arteries, and image modalities arranged from left to right in descending order based on average percentage

The reliability of the SUV_max feature in the 10 testing set patients with both intra-observer contours and automated segmentations is illustrated in Fig. 5. Wilcoxon signed-rank tests revealed no statistically significant differences in SUV_max values between the automated segmentation and the first contour, between the automated segmentation and the secondary contour, or between the two contours (all p > 0.05). ICC values for SUV_max were higher when comparing the automated segmentation with either set of manual contours than between the two manual contours, except for a minor reduction in the LM artery. These results indicate that SUV_max values extracted from the AI-derived segmentations were reliable, with agreement with the manual contours comparable to intra-observer agreement.

Fig. 5 — Figures illustrating the reliability of the AI-segmented SUV_max feature from the original unfiltered [¹⁸F]NaF PET images. (a) Boxplot of SUV_max values for individual coronary arteries for each of the 10 patients in the testing set with both secondary manual segmentations by the same observer and automated segmentations. P-values from Wilcoxon signed-rank tests were calculated to compare SUV_max values between the automated segmentation and the first contour, between the automated segmentation and the secondary contour, and between the two contours. Boxes show the IQR, and whiskers extend to data within 1.5 × IQR. (b) ICC values and their 95% CIs for SUV_max extracted from the original unfiltered [¹⁸F]NaF PET image, calculated between the automated segmentation and the first contour, between the automated segmentation and the secondary contour, and between the two contours

A comprehensive heatmap displaying reliability of radiomics from AI-derived segmentations across various modalities, bin widths, coronary arteries, image filtering methods, and feature categories is presented in Supplementary Fig. 4. Details on the effects of image filtering and bin width settings on the reliability of radiomics from AI-derived segmentations, along with comparisons to existing literature, are also provided in the supplementary materials.

Effect of mask dilation on reliability of radiomics from AI-derived segmentations

Dilating both the predicted and manual masks increased the number of reliable PET features from AI-derived segmentations across all arteries and feature categories. Changes in the number of features across ICC categories with uniform mask dilation of 1, 2, and 3 voxels are illustrated in Fig. 6a. Notably, for PET images, the number of features in the excellent ICC category increased substantially, from 19 (4.4%) with no dilation to 50 (11.7%), 75 (17.5%), and 90 (21.0%) with progressive dilation, while the number in the poor ICC category decreased from 202 (47.2%) to 161 (37.6%) with 1 voxel of dilation and to 150 (35.0%) with 2 and 3 voxels of dilation. In contrast, CSCT features did not show a consistent improvement in reliability with mask dilation; more features reached excellent ICC without dilation (65 [15.2%]) than with 1 voxel (22 [5.1%]), 2 voxels (22 [5.1%]), or 3 voxels of dilation (29 [6.8%]). Figure 6b provides a heatmap of the impact of mask dilation on the number of reliable features across feature categories and coronary arteries, revealing a general increase in reliable PET features from AI-derived segmentations. For CSCT images, however, feature reliability increased only for LM features across all feature categories, RCA features remained largely unaffected, and reliability decreased for LCx and LAD features with increasing mask dilation across all feature categories.

Fig. 6 — Bar chart and heatmap illustrating the impact of mask dilation on AI-segmented feature reliability. (a) Lower bound of ICC value classification for radiomics features extracted from original, unfiltered [¹⁸F]NaF PET and CSCT images with no mask dilation and uniform 1 (DIL_1), 2 (DIL_2), 3 (DIL_3) voxel dilation. Features are categorised based on the lower end of the ICC 95% confidence interval using the Koo and Li classification scheme [26]; (b) Heatmap of ICC results depicting the proportion of features with excellent and good lower bound of ICC after applying uniform mask dilation of 1, 2, 3 voxels across imaging modalities (baseline CT, baseline PET), four coronary arteries, and feature categories, utilising a bin width of 0.1 SUV for PET images without any image filtering

Discussion

Although [¹⁸F]NaF PET has previously been used to assess atherosclerotic disease risk by predicting calcifications in the valve and aorta [11, 27], it is increasingly recognised as a valuable tool for detecting active microcalcifications and predicting disease progression in the coronary arteries [9]. Irkle et al. found that [¹⁸F]NaF selectively adsorbs to microcalcifications within atherosclerotic plaques with high affinity, making it a sensitive method for non-invasive detection of microcalcifications in active unstable atherosclerosis [7]. Ishiwata et al. reported a significant correlation between the intensity of [¹⁸F]NaF uptake and changes in calcification volumetric score and CCS over one-year follow-up [11]. Doris et al. concluded that coronary [¹⁸F]NaF uptake identifies both patients and individual coronary segments with more rapid progression of coronary calcification [10]. Additionally, Kwiecinski et al. found that the incidence of myocardial infarction was higher in patients with increased coronary [¹⁸F]NaF activity, and [¹⁸F]NaF outperformed CCS in predicting these adverse events [8]. Given the extensive evidence in the literature, thoroughly exploring the full array of radiomic features from [¹⁸F]NaF PET imaging holds significant promise for enhancing the prediction of atherosclerotic disease progression and personalising patient risk management.

Accurate and consistent segmentation of coronary arteries is crucial for reliable radiomic studies. Automated segmentation of coronary arteries from [¹⁸F]NaF PET and CSCT scans can streamline clinical workflows by reducing segmentation time and minimising the inter-observer variability inherent in manual methods. Because few studies have segmented coronary arteries on cardiac PET images [28, 29], a separate model using only CSCT images was also trained on the same patient cohort to allow comparison with CT-based literature. It achieved an overall average DSC of 0.60, comparable to the multimodality model (DSC: 0.61), suggesting that CSCT alone is sufficient for effective auto-segmentation of the coronary arteries. Both models outperform previously reported results on similar non-contrast CT scans. For instance, Bujny et al. reported a DSC of 0.57 ± 0.10 using the nnUNet framework on 98 ECG-gated CT scans [20]. Shieh et al. achieved DSCs of 0.36, 0.52, 0.42, and 0.63 for the LM, LAD, LCx, and RCA, respectively, using the transformer-based SwinUNETR on 250 CT scans [30]. Jin et al. developed a model combining ResNet [31] and U-Net [32] and achieved a mean DSC of 0.39 ± 0.10 for smaller cardiac substructures (valves and LAD) [33]. Morris et al. used magnetic resonance imaging (MRI) alongside CT with a 3D U-Net, achieving a mean DSC of 0.50 ± 0.14 for the coronary arteries [34]. Lastly, Finnegan et al. developed a hybrid segmentation model using 30 radiotherapy CT scans and reported mean DSCs of 0.07–0.17 and Hausdorff distances (HD) of 16.31–20 across the coronary arteries [35]. Effective auto-segmentation on non-contrast CT is clinically valuable because such scans are easy to obtain. Adding an ECG-gated non-contrast CT to the standard PET workflow for attenuation correction is straightforward. Contrast-enhanced CT, by contrast, requires trained staff, carries allergy risks, and requires pre-medication and consideration of renal function. Although non-contrast CT provides less anatomical detail than contrast-enhanced techniques such as coronary computed tomography angiography (CCTA), it delivers a lower radiation dose and is more practical in clinical settings.
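The DSC values quoted above follow the standard definition DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch for 3D binary masks follows. The integer label convention in `ARTERY_LABELS` and the toy volumes are hypothetical. Scoring two empty masks as 1.0 is a convention, not a standard.

```python
import numpy as np

# Hypothetical label convention: 0 = background, 1-4 = the four arteries.
ARTERY_LABELS = {1: "LM", 2: "LAD", 3: "LCx", 4: "RCA"}


def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    denom = int(pred.sum()) + int(ref.sum())
    if denom == 0:
        # Both masks empty; scoring this as 1.0 is a convention, not a standard.
        return 1.0
    return 2.0 * int(np.logical_and(pred, ref).sum()) / denom


def per_artery_dice(pred_labels: np.ndarray, ref_labels: np.ndarray) -> dict[str, float]:
    """DSC computed separately for each labelled artery."""
    return {
        name: dice(pred_labels == label, ref_labels == label)
        for label, name in ARTERY_LABELS.items()
    }


# Toy 3D label volumes (z, y, x), one straight "vessel" per artery.
ref = np.zeros((8, 8, 8), dtype=np.uint8)
ref[:, 1, 1] = 1  # LM
ref[:, 2, 2] = 2  # LAD: 8 voxels
ref[:, 3, 6] = 3  # LCx
ref[:, 5, 5] = 4  # RCA
pred = ref.copy()
pred[4:, 2, 2] = 0  # the prediction finds only 4 of the 8 LAD voxels
scores = per_artery_dice(pred, ref)
print(round(scores["LAD"], 3))  # 0.667
```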

Nonetheless, segmenting coronary arteries on non-contrast CT is challenging because of the arteries' delicate structure and blurred boundaries [36]. Without contrast, the arteries are difficult to visualise, so contouring relies heavily on anatomical knowledge of their position relative to visible cardiac structures [35]. Additionally, the model in this study consistently showed higher precision than recall, indicating that it tended to under-segment the arteries, a finding consistent with other studies [37, 38]. This under-segmentation often appeared as missed contours at the edges of the manual segmentations, particularly on the first and last slices. The model is also limited by the use of a single reader for ground-truth segmentations; future work should incorporate multiple readers and consensus volumes. Despite these challenges, the model's performance was comparable to intra-observer variability, underscoring its potential for clinical use.
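Voxel-wise counts show why higher precision than recall signals under-segmentation. Precision = TP/(TP + FP) penalises voxels added outside the reference. Recall = TP/(TP + FN) penalises reference voxels that are missed. In the toy sketch below (hypothetical volumes, not study data), a prediction that drops the first and last slices of a reference artery keeps perfect precision but loses recall.

```python
import numpy as np


def precision_recall(pred: np.ndarray, ref: np.ndarray) -> tuple[float, float]:
    """Voxel-wise precision TP/(TP+FP) and recall TP/(TP+FN) of a predicted mask."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    tp = int(np.logical_and(pred, ref).sum())
    precision = tp / int(pred.sum()) if pred.any() else 0.0
    recall = tp / int(ref.sum()) if ref.any() else 0.0
    return precision, recall


# Toy artery running through 10 axial slices (z, y, x).
ref = np.zeros((10, 5, 5), dtype=bool)
ref[:, 2, 2] = True
# Under-segmented prediction: the first and last slices of the reference are missed.
pred = ref.copy()
pred[0] = False
pred[-1] = False
print(precision_recall(pred, ref))  # (1.0, 0.8)
```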

Quantitative radiomic features from segmented images have significant prognostic potential, reflecting underlying anatomical and physiological processes. However, reproducibility analysis is essential to ensure their clinical applicability, because predictive models built on features that are sensitive to variations in manual segmentation are prone to inconsistent results and overfitting. Several studies have investigated radiomic stability in cardiac imaging. Yunus et al. found that 23.5% of features from manual segmentations and 24.1% of features from semi-automatic segmentations of 30 CCTA images were reproducible (ICC > 0.75) [39]. In this study, using the stricter criterion of LB ICC > 0.75, 35.75% of CT features and 15.32% of PET features were intra-observer reproducible, while 12.63% of CT features and 7.53% of PET features were inter-observer reproducible. Other studies have focused on different cardiac structures. Jang et al. reported 32–47% inter-observer and 61–73% intra-observer reproducibility (ICC > 0.8) for myocardial MRI features across various image filters [40]. These findings are comparable to those of this study (ICC > 0.8: intra-observer, CT 37.63–79.84% and PET 25.81–71.24%; inter-observer, CT 21.51–66.13% and PET 11.29–41.67%). The myocardium's larger size and less variable shape likely contribute to the higher reproducibility observed for myocardial features than for coronary arteries.
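The lower-bound ICC criterion can be sketched as below. This section does not state which form of ICC was used, so the sketch assumes ICC(2,1): two-way random effects, absolute agreement, single measurement. For the confidence interval it substitutes a percentile bootstrap over patients in place of the analytic F-distribution interval that statistical packages typically report. Its lower bound will therefore not match such packages exactly. `icc_2_1` and `icc_lower_bound` are hypothetical helper names, and the synthetic feature values are not study data.

```python
import numpy as np


def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has shape (n_subjects, k_raters), e.g. one radiomic feature
    extracted from two segmentations of each patient.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def icc_lower_bound(ratings: np.ndarray, n_boot: int = 2000, alpha: float = 0.05,
                    seed: int = 0) -> float:
    """Lower end of a percentile-bootstrap (1 - alpha) CI for ICC(2,1)."""
    x = np.asarray(ratings, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        boot[b] = icc_2_1(x[idx])
    return float(np.nanpercentile(boot, 100 * alpha / 2))


# Classic 6-subject x 4-rater example from Shrout and Fleiss (1979).
sf = np.array([[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
               [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]])
print(round(icc_2_1(sf), 2))  # 0.29

# Synthetic feature read twice for 40 patients with small measurement noise.
rng = np.random.default_rng(1)
truth = rng.normal(0.0, 10.0, size=40)
two_reads = np.column_stack([truth + rng.normal(0.0, 0.5, 40),
                             truth + rng.normal(0.0, 0.5, 40)])
print(icc_lower_bound(two_reads) > 0.75)  # reproducible under the LB ICC > 0.75 rule
```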

With a growing number of studies and software solutions offering auto-segmentation tools for various biological structures and imaging modalities, many clinics have begun integrating these tools into practice. This raises a critical question: are radiomics derived from AI-segmented structures reliable and consistent with those extracted from manual segmentations? In this study, the reliability of radiomics from AI-derived segmentations was evaluated thoroughly, with particular emphasis on the widely used SUV_max feature. SUV_max from AI-derived segmentations was compared with SUV_max from each of the two sets of manual contours provided by the expert reader, and neither comparison showed a statistically significant difference (p ≫ 0.05). The reliability of SUV_max from AI-derived segmentations was also comparable to the intra-observer reproducibility of SUV_max for the expert reader, supporting its use in future studies. Across the broader range of radiomic features, the reliability of radiomics from AI-derived segmentations was similarly comparable to intra-observer reproducibility (LB ICC > 0.75: AI-derived segmentations, 34.35% of CT and 18.93% of PET features; intra-observer, 35.75% of CT and 15.32% of PET features).
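SUV_max is simply the highest SUV inside the mask. This points to one plausible reason for its robustness: a smaller AI-derived mask returns the same value as a larger manual mask as long as it still covers the hottest voxel. The sketch below illustrates this under stated assumptions. The PET volume is taken to be already converted to SUV and resampled onto the mask grid, and all values are hypothetical.

```python
import numpy as np


def suv_max(pet_suv: np.ndarray, mask: np.ndarray) -> float:
    """Highest SUV among voxels inside a binary mask on the same voxel grid."""
    mask = mask.astype(bool)
    if not mask.any():
        raise ValueError("empty mask: SUVmax is undefined")
    return float(pet_suv[mask].max())


rng = np.random.default_rng(0)
pet = rng.uniform(0.5, 1.5, size=(10, 10, 10))  # hypothetical background SUVs
pet[5, 5, 5] = 4.0                                # focal uptake in the artery
manual = np.zeros(pet.shape, dtype=bool)
manual[3:8, 4:7, 4:7] = True                      # larger manual contour
ai = np.zeros(pet.shape, dtype=bool)
ai[4:7, 5, 5] = True                              # smaller AI contour, still covering the hotspot
print(suv_max(pet, manual), suv_max(pet, ai))     # 4.0 4.0
```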

The effect of mask dilation on the reliability of coronary radiomics from AI-derived segmentations, relative to manual segmentations, was investigated for three reasons: the auto-segmentation model tended to underestimate the mask, the arteries are small (comparable to the resolution of non-contrast CT), and local [¹⁸F]NaF PET signal is typically contained within the arteries. Signal just outside the segmented ROI may still belong to the artery and capture early pathological changes. Including it may therefore aid disease detection and enhance the clinical utility of radiomic analyses. Mask dilation consistently improved the reliability of PET features from AI-derived segmentations but not that of CT features. This difference can be attributed to the intrinsic characteristics of the two modalities. PET images measure tracer uptake, which is highly variable and sensitive to slight changes in the ROI. Dilating the PET mask captures more of this signal and smooths noise, improving stability at a potential cost in specificity. CT, by contrast, captures stable anatomical detail, so its radiomic features are less affected by ROI changes and benefit less from mask dilation. Although studies on how mask dilation affects radiomic feature reliability are lacking, the literature has explored the robustness of radiomic features to morphological operations such as erosion, dilation, and contour randomisation [41–43].
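Uniform mask dilation by 1, 2 or 3 voxels (DIL_1 to DIL_3 in Fig. 6) can be sketched with plain NumPy as repeated dilation by a 3×3×3 cube. This section does not state the structuring element or whether dilation was applied on the CT or the PET voxel grid, so both are assumptions here. With anisotropic voxels, one voxel also does not correspond to the same physical distance along every axis. In practice, `scipy.ndimage.binary_dilation` with its `structure` and `iterations` arguments performs the same operation.

```python
import numpy as np


def dilate(mask: np.ndarray, voxels: int = 1) -> np.ndarray:
    """Uniform binary dilation of a 3D mask by `voxels` steps of a 3x3x3 cube."""
    out = mask.astype(bool)
    nz, ny, nx = out.shape
    for _ in range(voxels):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # OR together the 27 shifted copies of the mask (offsets -1, 0, +1 per axis).
        for dz in range(3):
            for dy in range(3):
                for dx in range(3):
                    grown |= padded[dz:dz + nz, dy:dy + ny, dx:dx + nx]
        out = grown
    return out


seed = np.zeros((7, 7, 7), dtype=bool)
seed[3, 3, 3] = True
print([int(dilate(seed, v).sum()) for v in (0, 1, 2, 3)])  # [1, 27, 125, 343]
```

Each step grows a single voxel into a (2v + 1)³ cube. A PET hotspot lying one voxel outside an under-segmented mask is therefore included after DIL_1.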

Radiomic reproducibility from manual segmentations and radiomic reliability from AI-derived segmentations are particularly important when radiomic models are applied across diverse clinical settings. In this study, the inter- and intra-observer reproducibility of radiomics and the reliability of radiomics from AI-derived segmentations were assessed for features extracted from [¹⁸F]NaF PET and CSCT images of the coronary arteries. Literature on the robustness of coronary radiomics is sparse, and no existing study has explored [¹⁸F]NaF PET imaging; this study aims to fill that gap. Raw reliability results for radiomics from AI-derived segmentations are provided in Supplementary Table 4 for the wider research community.

To our knowledge, this is the first study to evaluate the reproducibility of [¹⁸F]NaF PET radiomics for coronary arteries and the reliability of such radiomics from AI-derived segmentations. However, it has limitations. The cohort was small and predominantly diabetic, with early-stage coronary calcification, which may have skewed the results. The retrospective selection from a clinical trial and the post-hoc analysis require confirmation in larger studies. Despite prospective image acquisition using coronary-specific protocols, clinical trial participants may not reflect the broader population, limiting generalisability. Another limitation is the batch effect from using two different CSCT scanners, as scanner-specific differences in resolution and noise may affect radiomic feature reproducibility and reliability. Although preprocessing steps were standardised, residual variability from these device differences could influence the results. Future studies should consider harmonisation techniques, such as ComBat [44], to adjust for scanner variability and improve feature robustness in multi-centre settings. While this study assessed inter- and intra-observer reproducibility and the reliability of radiomics from AI-derived segmentations, it did not investigate other aspects of feature robustness, such as test-retest repeatability and reproducibility across scanning equipment and reconstruction settings. Future research should address these aspects to identify radiomic instabilities and assess clinical utility. Addressing variability requires more than excluding sensitive features, because exclusion may also remove predictive ones; standardising image processing and feature extraction is essential for reliability and clinical utility.
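ComBat [44] models scanner effects as a per-batch shift in location and scale and then shrinks those estimates with empirical Bayes. The sketch below shows only the location-scale core for a single feature, as an illustration. It omits the empirical Bayes step and the preservation of biological covariates, so it is not ComBat and should not replace a full implementation. The scanner labels and feature values are hypothetical.

```python
import numpy as np


def location_scale_harmonise(values: np.ndarray, batches: np.ndarray) -> np.ndarray:
    """Remove per-scanner mean and variance shifts from one radiomic feature.

    Simplified ComBat-style adjustment WITHOUT empirical Bayes shrinkage or
    covariate preservation. Each batch needs at least two samples and a
    non-zero standard deviation.
    """
    y = np.asarray(values, dtype=float)
    b = np.asarray(batches)
    labels = np.unique(b)
    grand_mean = y.mean()
    # Pooled within-scanner SD: residuals after removing each scanner's own mean.
    resid = np.concatenate([y[b == s] - y[b == s].mean() for s in labels])
    pooled_sd = np.sqrt((resid ** 2).sum() / (y.size - labels.size))
    out = np.empty_like(y)
    for s in labels:
        idx = b == s
        batch_sd = y[idx].std(ddof=1)
        # Standardise within the scanner, then map onto the pooled location and scale.
        out[idx] = grand_mean + pooled_sd * (y[idx] - y[idx].mean()) / batch_sd
    return out


rng = np.random.default_rng(0)
scanner = np.array(["A"] * 30 + ["B"] * 30)  # hypothetical scanner labels
feature = np.concatenate([rng.normal(10, 1, 30), rng.normal(14, 2, 30)])
adjusted = location_scale_harmonise(feature, scanner)
for s in ("A", "B"):
    sel = adjusted[scanner == s]
    print(s, round(float(sel.mean()), 3), round(float(sel.std(ddof=1)), 3))
```

Without empirical Bayes shrinkage, each scanner's mean and SD are estimated from that scanner's patients alone, which is noisy for small batches. Reducing that instability is what ComBat's shrinkage step is designed to do.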

Conclusions

In this retrospective study, an automated segmentation model for the coronary arteries was trained using the nnUNet framework on combined CSCT and [¹⁸F]NaF PET images. The model's performance was statistically comparable to intra-observer variability, indicating its potential for clinical implementation. Inter- and intra-observer reproducibility and AI-derived segmentation radiomic reliability assessments revealed that CSCT features were more robust than [¹⁸F]NaF PET features. First-order and GLCM features were the most robust and shape features the least. The feature robustness metrics established in this study provide a foundation for selecting robust features in future coronary studies. Further studies with larger sample sizes and varying degrees of coronary calcification are recommended to validate these findings.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (7.3 MB, docx)

Supplementary Material 2 (60 KB, docx)

Supplementary Material 3 (15.4 MB, png)

Supplementary Material 4 (12.6 MB, png)

Supplementary Material 5 (6.2 MB, png)

Supplementary Material 6 (24.6 MB, png)

Supplementary Material 7 (17.5 MB, png)

Supplementary Material 8 (2 MB, png)

Supplementary Material 9 (33.7 KB, png)

Supplementary Material 10 (20.6 KB, docx)

Acknowledgements

This research was supported by the Research and Training Program at the University of Western Australia and funded by the Charlies Foundation for Research. Significant guidance was provided by the University of Western Australia Medical Physics Machine Learning Cluster.

Author contributions

SL (Corresponding Author) developed the model, conducted data analysis and evaluation, and prepared the manuscript. KW, JB, CS, and SCL were responsible for acquiring, contextualising, and organising the clinical trial data. JB and SCL performed the manual segmentation of the coronary arteries. JB and CS provided interpretation of the data from a cardiac perspective. JK, GMH, and NB offered valuable advice on statistical methodologies. MAE contributed significantly to the study design and methodologies. JK, JB, and CS made substantial revisions to the manuscript. All authors contributed to the study's conception and design, and read and approved the final manuscript. All authors also agree to be personally accountable for their own contributions and to ensure that any questions related to the accuracy or integrity of any part of the work, even parts in which they were not personally involved, are properly investigated and resolved, with the resolution documented in the literature.

Funding

This work was supported by the Charlies Foundation for Research with a grant to present at the RANZCR conference. Suning Li received research support from the Research and Training Program at the University of Western Australia. JWB is supported by a Fulbright Scholarship and a Research Establishment Fellowship from the Royal Australasian College of Physicians.

Data availability

Raw reproducibility data generated or analysed during this study are included in this published article and its supplementary information files. Original patient data are not publicly available for ethical reasons but are available from the corresponding author on reasonable request.

Declarations

Ethics approval and consent to participate

Ethics approval for undertaking this study was acquired from the Royal Perth Hospital Human Research and Ethics Committee (REG14-095) and the study was performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. All participants provided written informed consent.

Consent for publication

All participants provided written informed consent.

Competing interests

The authors declare that they have no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Martin SS, et al. 2024 heart disease and stroke statistics: a report of US and global data from the American Heart Association. Circulation. 2024;149(8):e347–913.
2. Budoff MJ, et al. Ten-year association of coronary artery calcium with atherosclerotic cardiovascular disease (ASCVD) events: the multi-ethnic study of atherosclerosis (MESA). Eur Heart J. 2018;39(25):2401–.
3. Budoff MJ, et al. Progression of coronary artery calcium predicts all-cause mortality. JACC Cardiovasc Imaging. 2010;3(12):1229–36.
4. Budoff MJ, et al. Assessment of coronary artery disease by cardiac computed tomography: a scientific statement from the American Heart Association Committee on Cardiovascular Imaging and Intervention, Council on Cardiovascular Radiology and Intervention, and Committee on Cardiac Imaging, Council on Clinical Cardiology. Circulation. 2006;114(16):1761–91.
5. Lehmann N, et al. Value of progression of coronary artery calcification for risk prediction of coronary and cardiovascular events: result of the HNR study (Heinz Nixdorf Recall). Circulation. 2018;137(7):665–79.
6. Tzolos E, Dweck MR. ¹⁸F-sodium fluoride (¹⁸F-NaF) for imaging microcalcification activity in the cardiovascular system. Arterioscler Thromb Vasc Biol. 2020;40(7):1620–6.
7. Irkle A, et al. Identifying active vascular microcalcification by ¹⁸F-sodium fluoride positron emission tomography. Nat Commun. 2015;6:7495.
8. Kwiecinski J, et al. Coronary ¹⁸F-sodium fluoride uptake predicts outcomes in patients with coronary artery disease. J Am Coll Cardiol. 2020;75(24):3061–74.
9. Dweck MR, et al. Coronary arterial ¹⁸F-sodium fluoride uptake: a novel marker of plaque biology. J Am Coll Cardiol. 2012;59(17):1539–48.
10. Doris MK, et al. Coronary ¹⁸F-fluoride uptake and progression of coronary artery calcification. Circ Cardiovasc Imaging. 2020;13(12):e011438.
11. Ishiwata Y, et al. Quantification of temporal changes in calcium score in active atherosclerotic plaque in major vessels by ¹⁸F-sodium fluoride PET/CT. Eur J Nucl Med Mol Imaging. 2017;44(9):1529–37.
12. Bellinge JW, et al. ¹⁸F-sodium fluoride positron emission tomography activity predicts the development of new coronary artery calcifications. Arterioscler Thromb Vasc Biol. 2021;41(1):534–41.
13. Kairemo K, et al. A retrospective comparative study of sodium fluoride (NaF-18)-PET/CT and fluorocholine (F-18-CH) PET/CT in the evaluation of skeletal metastases in metastatic prostate cancer using a volumetric 3-D radiomics analysis. Diagnostics. 2020;11(1):17.
14. Sheppard A, et al. Utilizing 18F-NaF PET/CT to inform pathologic and radiographic progression of an ultra-rare case of familial tumoral calcinosis (FTC): plus a case for synergistic information. Soc Nuclear Med; 2023.
15. Kairemo K, et al. 18F-sodium fluoride positron emission tomography (NaF-18-PET/CT) radiomic signatures to evaluate responses to alpha-particle radium-223 dichloride therapy in osteosarcoma metastases. Curr Probl Cancer. 2021;45(5):100797.
16. Traverso A, et al. Repeatability and reproducibility of radiomic features: a systematic review. Int J Radiat Oncol Biol Phys. 2018;102(4):1143–58.
17. Jin J, et al. The accuracy and radiomics feature effects of multiple U-net-based automatic segmentation models for transvaginal ultrasound images of cervical cancer. J Digit Imaging. 2022;35(4):983–92.
18. Sousa-Nunes F, et al. Reproducibility of epicardial adipose tissue in non-contrast CT images: manual vs semi-automatic segmentation. J Cardiovasc Comput Tomogr. 2023;17(1):S4.
19. Isensee F, et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18(2):203–11.
20. Bujny M, et al. Coronary artery segmentation in non-contrast calcium scoring CT images using deep learning. arXiv preprint arXiv:2403.02544. 2024.
21. Kendrick J, et al. Fully automatic prognostic biomarker extraction from metastatic prostate lesion segmentations in whole-body [68Ga]Ga-PSMA-11 PET/CT images. Eur J Nucl Med Mol Imaging. 2022;50(1):67–79.
22. Bellinge JW, et al. The effect of vitamin-K1 and colchicine on vascular calcification activity in subjects with diabetes mellitus (ViKCoVaC): a double-blind 2x2 factorial randomized controlled trial. J Nucl Cardiol. 2022;29(4):1855–66.
23. Sudre CH, et al. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. 2017;10553:240–8.
24. Zwanenburg A, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. 2020;295(2):328–38.
25. Leijenaar RT, et al. The effect of SUV discretization in quantitative FDG-PET radiomics: the need for standardized methodology in tumor texture analysis. Sci Rep. 2015;5:11075.
26. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–63.
27. Dweck MR, et al. 18F-sodium fluoride uptake is a marker of active calcification and disease progression in patients with aortic stenosis. Circ Cardiovasc Imaging. 2014;7(2):371–8.
28. Kim SJW, et al. Multi-atlas cardiac PET segmentation. Phys Med. 2019;58:32–9.
29. Piri R, et al. Aortic wall segmentation in ¹⁸F-sodium fluoride PET/CT scans: head-to-head comparison of artificial intelligence-based versus manual segmentation. J Nucl Cardiol. 2022;29(4):2001–10.
30. Shieh A, et al. Towards comprehensive cardiovascular analysis for non-contrast cardiac and chest CT: automated heart, aorta, coronary artery and coronary artery calcium segmentation. J Cardiovasc Comput Tomogr. 2023;17(4):S67.
31. He K, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
32. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III. Springer; 2015.
33. Jin X, et al. Robustness of deep learning segmentation of cardiac substructures in noncontrast computed tomography for breast cancer radiotherapy. Med Phys. 2021;48(11):7172–88.
34. Morris ED, et al. Cardiac substructure segmentation with deep learning for improved cardiac sparing. Med Phys. 2020;47(2):576–86.
35. Finnegan RN, et al. Open-source, fully-automated hybrid cardiac substructure segmentation: development and optimisation. Phys Eng Sci Med. 2023;46(1):377–93.
36. Cui J, et al. Fully-automatic segmentation of coronary artery using growing algorithm. J Xray Sci Technol. 2020;28(6):1171–86.
37. Gharleghi R, et al. Automated segmentation of normal and diseased coronary arteries: the ASOCA challenge. Comput Med Imaging Graph. 2022;97:102049.
38. Nannini G, et al. A fully automated deep learning approach for coronary artery segmentation and comprehensive characterization. APL Bioeng. 2024;8(1).
39. Yunus MM, et al. Reproducibility and repeatability of coronary computed tomography angiography (CCTA) image segmentation in detecting atherosclerosis: a radiomics study. Diagnostics (Basel). 2022;12(8).
40. Jang J, et al. Reproducibility of segmentation-based myocardial radiomic features with cardiac MRI. Radiol Cardiothorac Imaging. 2020;2(3):e190216.
41. Le E, et al. 146 CT radiomics in carotid artery atherosclerosis: a systematic evaluation of robustness, reproducibility and predictive performance for culprit lesions. BMJ Publishing Group Ltd and British Cardiovascular Society; 2022.
42. Wright DE, et al. Reproducibility in medical image radiomic studies: contribution of dynamic histogram binning. arXiv preprint arXiv:2211.05241. 2022.
43. Lo Iacono F, Pontone G, Corino VD. Assessing left ventricle radiomic features robustness by segmentation perturbations. In: Mediterranean Conference on Medical and Biological Engineering and Computing. Springer; 2023.
44. Orlhac F, et al. A guide to ComBat harmonization of imaging biomarkers in multicenter studies. J Nucl Med. 2022;63(2):172–9.



