Author manuscript; available in PMC: 2026 Jan 1.
Published in final edited form as: Magn Reson Med. 2024 Sep 2;93(1):384–396. doi: 10.1002/mrm.30283

Automated MRI-based segmentation of intracranial arterial calcification by restricting feature complexity

Xin Wang 1, Gador Canton 2, Yin Guo 3, Kaiyu Zhang 3, Halit Akcicek 4, Ebru Yaman Akcicek 4, Thomas S Hatsukami 5, Jin Zhang 6, Beibei Sun 6, Huilin Zhao 6, Yan Zhou 6, Linda Shapiro 7, Mahmud Mossa-Basha 2, Chun Yuan 2,4, Niranjan Balu 2
PMCID: PMC11518638  NIHMSID: NIHMS2022547  PMID: 39221515

Abstract

Purpose:

To develop an automated deep learning model for MRI-based segmentation and detection of intracranial arterial calcification.

Methods:

A novel deep learning model under the variational autoencoder framework was developed. A theoretically-grounded dissimilarity loss was proposed to refine network features extracted from MRI and restrict their complexity, enabling the model to learn more generalizable MR features that enhance segmentation accuracy and robustness for detecting calcification on MRI.

Results:

The proposed method was compared with nine baseline methods on a dataset of 113 subjects and showed superior performance (for segmentation, Dice similarity coefficient: 0.620, area under precision-recall curve (PR-AUC): 0.660, 95% Hausdorff Distance: 0.848 mm, Average Symmetric Surface Distance: 0.692 mm; for slice-wise detection, F1 score: 0.823, recall: 0.764, precision: 0.892, PR-AUC: 0.853). For clinical needs, statistical tests confirmed agreement between the true calcification volumes and predicted values using the proposed approach. Various MR sequences, namely T1, time-of-flight, and SNAP, were assessed as inputs to the model, and SNAP provided unique and essential information pertaining to calcification structures.

Conclusion:

The proposed deep learning model with a dissimilarity loss to reduce feature complexity effectively improves MRI-based identification of intracranial arterial calcification. It could help establish a more comprehensive and powerful pipeline for vascular image analysis on MRI.

Keywords: calcification segmentation, intracranial arteries, deep learning, variational autoencoders, information bottleneck

1 |. INTRODUCTION

Intracranial atherosclerosis is a leading cause of ischemic stroke2. Atherosclerotic plaque may contain various plaque components that determine the risk of stroke. Among these plaque components, arterial calcification is known to be of pathological significance in stroke, and also associated with other diseases such as dementia and cognitive decline3,4,5. Therefore, the identification and segmentation of calcification are important in vascular image analysis for diagnosis and risk assessment.

Automated calcification segmentation on non-contrast computed tomography (CT) or contrast-enhanced CT angiography (CTA) has been extensively explored in the literature, where calcification has a high, easy-to-detect signal. For example, Lessmann et al.6 connected two networks to label and refine calcification on chest CT. Graffy et al.7 utilized Mask R-CNN to identify aortic calcification on abdominal CT. Weng et al.8 and Bortsova et al.9 used U-Nets to segment calcification around superficial femoral arteries and internal carotid arteries, respectively. In general, these studies commonly relied on existing deep networks established for general medical image segmentation tasks.

Despite its widespread use, CT exhibits inherent limitations which diminish its utility in intracranial arterial calcification studies.

  • Safety. CT exposes patients to ionizing radiation, and CTA additionally requires the administration of contrast agents; these concerns are amplified when longitudinal scans are needed to monitor disease progression10.

  • Comprehensive analysis with other plaque features. Intracranial arterial calcification often coexists and interacts with other atherosclerotic plaque components, such as lipid core and fibrous tissue11. However, arteries cannot be visualized with non-contrast CT. Even on CTA that uses contrast, distinguishing the type of calcification (e.g., intimal or medial) and identifying other plaque components are challenging12.

In contrast, MRI can overcome these disadvantages. Vessel wall MRI (VWI) allows for safe, non-invasive, and radiation-free imaging of various types of atherosclerotic plaques13,14, preferable for serial monitoring of atherosclerosis15. Detection of calcification on VWI enables a clinically significant joint analysis of different plaque components. Therefore, there has been a growing demand for MRI-based assessment of arterial calcification16, which our method can address.

However, it is difficult to detect calcification on MRI. As shown in Fig. 1, calcification appears dark in most MR sequences and is difficult to distinguish from noise. For example, on T1-weighted VWI, when the signal of flowing blood in the vessel lumen is suppressed, calcification adjacent to the lumen can be hard to identify; on time-of-flight MR angiography (TOF MRA), all tissue signals except flowing blood are generally suppressed. Therefore, a single MRI contrast may not suffice to segment calcification. Nonetheless, the various imaging patterns shown on different MR sequences still provide rich tissue information that may have value in calcium identification. Similar to the aforementioned CT-based segmentation, it is straightforward to train an existing network using multi-sequence MRI, by stacking sequences as a multi-channel input image. However, such a naive solution could be suboptimal, as shown in the result section.

FIGURE 1.


Original and preprocessed images. Left: Axial slices of intracranial scans from multi-sequence MRI and CT angiography (CTA). (a) T1-weighted, (b) Simultaneous Non-contrast Angiography and intraPlaque hemorrhage imaging (SNAP)1, (c) Time-of-flight (TOF) MR angiography (MRA), (d) CTA. Calcification is delineated with orange contours. Right: Extraction of 2D cross-sectional slices perpendicular to the vessel centerlines of 3D scans. An example slice from a T1 image and the corresponding longitudinal view of extracted slices from the same subject are shown on the right side.

To improve the segmentation beyond such black-box networks, we propose to refine the network features extracted from the MR images. Following the idea of the information bottleneck (IB)17, we hypothesize that it is crucial for the MRI features to exhibit: 1) rich information about the target tissue (calcification); and 2) constrained complexity, essentially excluding irrelevant information. Such features are potentially more generalizable and better for segmentation18. In contrast, a naive network satisfies only the first requirement through a regular segmentation loss, but its features could be too complex, presenting a challenge in prioritizing the learning of calcification patterns and consequently impairing performance.

To reduce the complexity of MR features, we use regularization to concentrate the information in MR features on calcification. Assume we have an auxiliary image (spatially aligned to the MRI) that displays calcification but few other tissue structures. Considering features extracted separately from the MR and the auxiliary images, even though both contain information of calcification, the auxiliary feature may exhibit lower complexity due to its image consisting of simpler structures. By aligning the MR feature with the auxiliary one, we could refine the MR feature, reducing its complexity.

Such alignment of features can be achieved using variational autoencoders (VAEs)19, which extract stochastic features (probability distributions) from input data to facilitate downstream tasks such as reconstruction, detection and segmentation. VAEs are capable of refining and learning features with compressed information20. In this work, we propose a novel VAE for MRI-based calcification segmentation. Particularly, during network training, we incorporate the ground-truth calcification mask as the auxiliary image input to the network. We explicitly align the MR and auxiliary features by minimizing their divergence, such that the model learns to extract MR features with a lower complexity. Once training is done, auxiliary images are no longer needed, and the network can segment calcification using only MR images during testing.

The contributions of this article are summarized as follows:

  1. To the best of our knowledge, this is the first automated method for MRI-based intracranial arterial calcification detection. Notably, our model is theoretically grounded in the VAE framework; our strategy of feature complexity reduction provides an interpretable way to improve performance.

  2. We demonstrate the superiority of our model compared to multiple widely-used state-of-the-art approaches for segmentation, and highlight the clinical significance of our work in predicting high-level measurements, including calcium volume and slice-wise calcification occurrence.

  3. We quantitatively explore the effect of different MR sequences on enhancing calcification identification.

2 |. METHODS

2.1 |. Data acquisition

We used a dataset of 113 subjects scanned at Renji Hospital, China. Use of the data was approved by the local institutional review board. The subjects underwent multi-sequence intracranial MRI and CTA scans from 2019 to 2020 during their hospitalizations due to different clinical indications. The MRI was performed using a 3T scanner (Philips Ingenia, the Netherlands) with a dedicated 16-channel phased-array carotid artery coil (Beijing TSImaging Healthcare Technology Co., China); the CTA was performed using a 320-detector row scanner (Aquilion ONE VISION, Canon Medical Systems Corporation, Otawara, Japan). The demographics of the subjects are summarized in Table 1. The dataset consists of subjects likely to have intracranial vascular diseases, leading to a large diversity of calcification distribution and allowing for a comprehensive assessment of model performance.

TABLE 1.

Baseline characteristics of the studied cohort. The calcification-related values are for internal carotid arteries and middle cerebral arteries, and the calcification (cal.) volume is reported as the median with interquartile range.

Characteristic Value Characteristic Value Characteristic Value
No. of subjects 113 Smoking 23 Ischemic stroke 85
No. of women 15 Hypertension 82 Transient ischemic attack 8
Age (y) 64.8 ± 8.5 Diabetes mellitus 46 Stenosis 72
Cal. volume (mm3) 18.9 (3.2-50.5) Hyperlipidemia 5 Aneurysm 7
Presence of cal. 99 Coronary heart disease 8

Three MR sequences were chosen for this study, i.e., T1-weighted VWI, TOF MRA, and SNAP, based on their potential to visualize calcification. SNAP stands for Simultaneous Non-contrast Angiography and intraPlaque hemorrhage imaging, an MRI technique capable of acquiring a proton-density-weighted image, in which vessel wall and luminal blood are both bright1; T1 is typically used to image the vessel wall by suppressing blood flow within the lumen; TOF displays the lumen with high intensity while suppressing other tissues. Calcification produces no MR signal and appears dark on all three sequences. These sequences provide complementary structural information that helps distinguish calcification from other tissues or the lumen.

2.2 |. Preprocessing

Preprocessing of the scans was performed using a previously-developed tool, MOCHA, for multi-contrast vascular image analysis21. Specifically, we first used 3D rigid registration to spatially align T1, TOF, SNAP and CTA, with T1 as the reference image. CTA was used only because it can help human readers better visualize calcification when they reviewed the images; it was not used in model training and testing. All images were resampled to be isotropic (spacing of 0.58 mm), and normalized to a fixed intensity window (0-500 for T1, 0-2000 for TOF, and 0-220 for SNAP) for appropriate contrast.
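The fixed-window intensity normalization described above can be sketched as follows (a minimal numpy illustration; the function and variable names are ours, and the window bounds are those stated in the text):

```python
import numpy as np

# Fixed intensity windows from the preprocessing step (per MR sequence).
WINDOWS = {"T1": (0, 500), "TOF": (0, 2000), "SNAP": (0, 220)}

def normalize_to_window(img, sequence):
    """Clip an image to its fixed intensity window and rescale to [0, 1]."""
    lo, hi = WINDOWS[sequence]
    img = np.clip(np.asarray(img, dtype=np.float32), lo, hi)
    return (img - lo) / (hi - lo)
```

For example, `normalize_to_window(t1_volume, "T1")` maps the 0-500 T1 window onto [0, 1], clamping values outside the window.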

A reviewer (G.C.) with more than ten years of experience then used the TOF sequence to track the centerlines of left and right intracranial internal carotid arteries (ICAs) and the horizontal (M1) and sylvian (M2) segments of the middle cerebral arteries (MCAs), using iCafe, a previously-developed software package for intracranial artery extraction22.

Then, 2D cross-sectional slices through the tracked centerlines were generated, as shown in Fig. 1. A region-of-interest (ROI) with size 80 × 80 was cropped from the center of each 2D slice to obtain an appropriate field-of-view (containing the vessel but not too large). Thus, for each location on each vessel of each patient, we generated 2D slices of CTA and multiple MR sequences, referred to as a slice group. For each group, the radiologist meticulously labeled calcification through a systematic process, encompassing the following steps: 1) careful examination of the CTA to identify the locations and shapes of calcification, 2) comprehensive review of all MR sequences to identify on MRI the calcification locations corresponding to the CTA, 3) precise delineation of calcification on MRI according to the boundaries formed by signal contrasts. Note that CTA was used to guide manual labeling, because even experienced radiologists find it challenging to minimize false positives when labeling calcification using MRI alone without concurrent reference to CTA. Besides, due to the blooming effect of CTA and imperfect registration between CTA and MRI in some instances, the calcification labels may not precisely align with the bright shapes on CTA. However, this pragmatic labeling procedure acknowledged the intricacies of the imaging modalities, and prioritized optimal precision of manual labeling on MRI. The manual labels were converted to binary masks as the ground-truth calcification segmentation for MRI.
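The 80 × 80 ROI extraction from each 2D cross-sectional slice amounts to a center crop; a hypothetical helper (assuming the slice is a numpy array at least as large as the ROI) might look like:

```python
import numpy as np

def center_crop(slice2d, roi=80):
    """Crop a roi x roi region-of-interest from the center of a 2D slice."""
    h, w = slice2d.shape
    top, left = (h - roi) // 2, (w - roi) // 2
    return slice2d[top:top + roi, left:left + roi]
```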

Subjects were randomly divided into training, validation and test sets, with a ratio of 10:1:1. Vessel lengths varied among subjects, leading to diverse numbers of slice groups per individual. Consequently, the training, validation, and test sets comprised 39,858, 5,296, and 2,782 slice groups, respectively.

2.3 |. Proposed model

2.3.1 |. Overall framework

We propose a novel model to improve calcification segmentation on MRI. The overall architecture of our model is illustrated in Fig. 2. The inputs to the network are the 2D cross-sectional slices of multi-sequence MRI (x1) and an auxiliary image (x2, i.e., the ground-truth segmentation mask, used for training only). The MR feature (denoted by q1(z|x1)) is extracted via the MR branch based on all MR sequences, and utilized by a decoder to segment calcification. For training, a segmentation loss (cross-entropy) is used to measure the difference between the predicted segmentation probability map and the ground-truth calcification label.

FIGURE 2.


The proposed framework (with two MR sequences as an example). Each encoder or decoder is the same as in a U-Net. The cubes represent feature maps. The probability distributions (e.g., q1, q2) indicate the correspondence between the features and the terms in the derived objective function (the ELBO). The red arrows and boxes indicate the calculation of the two losses. A fuller version is provided in the Supplementary Information.

In addition to the cross-entropy loss, during training, the auxiliary branch is utilized to extract the feature (denoted by q2(z|x2)) from the auxiliary image. We introduce a dissimilarity loss to measure the divergence between the MR and auxiliary features, which is detailed in the next section. Trained with this loss, the network learns to extract an MR feature similar to the auxiliary one. As a binary mask, the auxiliary image has a lower complexity, resulting in simpler features. The dissimilarity loss can therefore restrict the complexity (compress the information) of the MR feature. As described in Section 1, this could improve feature generalizability and lead to better segmentation. Since the auxiliary branch is used only for loss calculation, after training is complete, the model does not require auxiliary images for inference and can segment calcification using only MRI scans.

Note that the model is a variational autoencoder (VAE). Therefore, the MR and auxiliary features essentially represent two probability distributions, q1 and q2, respectively. Random sampling from q1 is involved before the decoder, which will be detailed in the following sections.

2.3.2 |. Theoretical explanation

In this section, we describe the theoretical details of our framework. Building on the work of Alemi et al.20, which demonstrated that VAEs can compress information inside the features in an unsupervised scenario, we extend these concepts to develop a new VAE model for calcification segmentation.

In general, let the input multi-sequence MRI be x1 ∈ ℝ^(H0×W0×C0), where H0 and W0 are the height and width of each MR image, respectively, and C0 is the number of MR sequences; let the corresponding auxiliary image be x2 ∈ ℝ^(H0×W0×1). Then the input images can be written as x = (x1, x2). Following the variational inference framework19, we further assume the ground-truth segmentation label s ∈ {0,1}^(H0×W0) is generated from a latent variable z by a conditional distribution p(s|z). Since the true posterior distribution p(z|s) is intractable, a variational posterior q(z|s) is introduced to approximate it.

The VAE and its multimodal variants23,24,25,26 are well-established techniques to learn features in an unsupervised manner. In this work, we aim to extend this model family to calcification segmentation. Notably, we propose to reduce the information in the feature z by forcing the feature to focus on the segmentation s rather than the entire input images. Thus, we assume q(z|s) = q(z|x). Intuitively, this means s and x contain the same information about z. Then a lower bound of the log-likelihood log p(s) can be derived as

\[ \log p(s) \;\geq\; \mathbb{E}_{q(z|x)} \log p(s|z) \;-\; D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z)\big) \;:=\; \mathrm{ELBO}, \tag{1} \]

where E denotes the mathematical expectation with respect to the specified distribution, p(z) is the prior distribution of z, and D_KL is the Kullback-Leibler (KL) divergence between two distributions. This is called the evidence lower bound (ELBO)19. Our goal is to maximize the log-likelihood log p(s), which is, however, generally intractable. Therefore, the VAE framework instead maximizes the ELBO, which is achieved by our network and will be detailed in the next section.

We have denoted the MRI and auxiliary images together as x = (x1, x2), and q(z|x) in the ELBO indicates that the extraction of the feature z relies on both. However, we aim to build a model capable of segmenting calcification without the auxiliary image. To this end, for each m (m = 1, 2 for the MRI or auxiliary image, respectively), we propose to model an individual posterior distribution q_m := q_m(z|x_m) (a.k.a. an "expert"). To obtain the final posterior q(z|x) and the prior p(z), we process the experts in two different ways (namely the geometric27 and arithmetic23 means), i.e.,

\[ q(z|x) \;\propto\; \Big[\prod_{m=1}^{M} q_m\Big]^{1/M}, \qquad p(z) := \frac{1}{M}\sum_{m=1}^{M} q_m. \tag{2} \]

The intuition is that each expert q_m(z|x_m) corresponds to extracting the feature from either the MRI or the auxiliary image. To obtain an overall estimate of the feature that integrates information from both inputs, we merge the experts by calculating their geometric mean. Our previous work has validated this strategy for multimodal image registration28,29, and here we extend it to image segmentation.
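For diagonal Gaussian experts, the normalized geometric mean in Eq. (2) has a closed form: the precisions average across experts, and the resulting mean is the precision-weighted mean of the expert means. A minimal numpy sketch (function names are ours; the paper's implementation may differ):

```python
import numpy as np

def geometric_mean_gaussians(mus, logvars):
    """Normalized geometric mean of M diagonal Gaussians (product of experts).

    mus, logvars: arrays of shape (M, D). Returns (mu, logvar) of the
    resulting Gaussian: precisions are averaged, and the mean is the
    precision-weighted combination of the expert means.
    """
    precisions = np.exp(-np.asarray(logvars))      # 1 / sigma^2 per expert
    prec = precisions.mean(axis=0)                 # averaged precision
    mu = (precisions * np.asarray(mus)).sum(axis=0) / precisions.sum(axis=0)
    return mu, -np.log(prec)
```

With two unit-variance experts, the result is simply the average of their means at unit variance, as expected for a symmetric product of experts.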

Based on the assumptions and derivations above, the objective to be maximized during network training, i.e., the ELBO, can be expressed as

\[ \mathrm{ELBO} = \mathbb{E}_{q(z|x)} \log p(s|z) \;-\; D_{\mathrm{KL}}\Big(C\Big[\prod_{m=1}^{M} q_m\Big]^{1/M} \,\Big\|\, \frac{1}{M}\sum_{m=1}^{M} q_m\Big), \tag{3} \]

where q_m := q_m(z|x_m), m = 1, 2, is the individual posterior of the MRI or auxiliary image, and C is a constant that normalizes the geometric mean, making it a probability distribution.

2.3.3 |. Network structure

As mentioned in Section 2.3.2, the goal of a VAE is to maximize the ELBO, and from Eq. (3) we see that to calculate the ELBO, we only need to estimate the probability distributions p(s|z) and q_m(z|x_m), m = 1, 2 (note that q(z|x) in Eq. (3) is the geometric mean of the q_m's). Therefore, we build a network to infer these distributions from the input images. Then, by deploying the negative of the ELBO as the loss function, we can train the network to produce optimal distributions given any input data. Such a network is called a variational autoencoder (VAE)19. In our case, q_m(z|x_m), m = 1, 2, requires two encoding branches to infer the distribution of z separately from the MRI (x1) and the auxiliary image (x2), based on which we analytically calculate q(z|x) as the geometric mean. In addition, p(s|z) requires a decoder to infer the distribution of s (i.e., the segmentation probability map) given any z.

Considering these requirements, the design of our VAE is shown in Fig. 2. In summary, the auxiliary branch extracts from the auxiliary image an H×W×C feature map, where H and W are the height and width, and C is the number of channels. Following the convention of VAEs30,31, we model the distribution q2(z|x2) as a diagonal Gaussian distribution, with mean and covariance represented by the first and second halves of the channels of this feature map. For multi-sequence MRI, we first use multiple encoders to extract per-sequence features, which are then concatenated and processed by a convolutional layer to obtain the MR feature (the overall feature map for MRI). This feature map parameterizes the distribution q1(z|x1) in the same way as the auxiliary feature map. Once q1(z|x1) and q2(z|x2) are obtained through the feature maps, the final posterior and prior distributions are calculated through Eq. (2), and the KL divergence in the ELBO can be calculated analytically.

More importantly, the KL term in Eq. (3) measures the dissimilarity between the geometric and arithmetic means of the q_m's. As part of maximizing the ELBO, we minimize this KL divergence during training, encouraging the network to extract similar q_m's (features) from the MRI and the auxiliary image. In other words, the MR feature q1 will be aligned to the auxiliary feature q2. As discussed in Section 1, this restricts the complexity of q1 and thus leads to better model performance. In contrast, a vanilla model may learn an MR feature intertwined with redundant information, making it difficult for the decoder to segment specific substructures. It has been demonstrated that individual distributions q_m are helpful for learning desired features23,28.

To calculate the expectation term in the ELBO, we perform Monte Carlo estimation, as shown in Fig. 2, similar to previous works19. Specifically, for each training iteration, we sample a value z from q(z|x), based on which the decoder produces the segmentation probability map p(s|z). The log-probability log p(s|z) can then be calculated by measuring the probability of the ground-truth segmentation s given the probability map, which is equivalent to the negative of the cross-entropy31. This forms the estimate of E_{q(z|x)} log p(s|z). For model evaluation, instead of a sampled z, the decoder directly uses the mode of q1(z|x1) as the input MR feature to produce segmentation probability maps.
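The Monte Carlo estimate described above relies on the standard reparameterization trick. A simplified numpy sketch (names are ours, and the real model operates on feature maps rather than flat vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_z(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * np.asarray(logvar)) * eps

def log_likelihood(prob_map, gt_mask, eps=1e-7):
    """log p(s|z): log-probability of the binary ground truth under the
    predicted per-pixel probabilities (the negative of the cross-entropy)."""
    p = np.clip(prob_map, eps, 1 - eps)
    return np.sum(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p))
```

At evaluation time, `sample_z` would be bypassed and the decoder fed `mu` (the mode of q1) directly, as the text describes.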

The structures of the encoders and decoder are the same as in a U-Net. Therefore, the feature maps from the encoders are exactly the outputs of the convolutional layers after downsampling (pooling) operations. Since these U-Net features have different resolutions, we also calculate the dissimilarity loss (the KL term in Eq. (3)) in a multi-scale manner. In other words, for each downsampling (scale) of the encoders, we obtain one q(z|x) and calculate one KL divergence, and the final KL is the sum of the multi-scale KL divergences. We provide a proof in the Supplementary Information that decomposes the KL term in Eq. (3) into multi-scale ones. In addition, the encoders share all parameters except for batch normalizations. This significantly reduces computational complexity and the number of learnable parameters, thus mitigating overfitting and helping better extract features32.
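The multi-scale accumulation can be illustrated with the closed-form KL divergence between diagonal Gaussians, summed over scales. Note this is a simplification: in the actual model the KL is taken between the geometric and arithmetic means of the experts, as in Eq. (3); the sketch below (numpy, our own names) shows only the per-scale summation with a simpler pairwise KL.

```python
import numpy as np

def kl_diag_gaussians(mu0, lv0, mu1, lv1):
    """Closed-form KL( N(mu0, e^lv0) || N(mu1, e^lv1) ), diagonal Gaussians."""
    return 0.5 * np.sum(
        np.exp(lv0 - lv1) + (mu1 - mu0) ** 2 * np.exp(-lv1) - 1.0 + lv1 - lv0
    )

def multiscale_kl(scales):
    """Sum one KL term per encoder scale; `scales` is a list of
    (mu_mr, lv_mr, mu_aux, lv_aux) tuples, one per resolution."""
    return sum(kl_diag_gaussians(*s) for s in scales)
```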

2.3.4 |. Implementation details

We set the weights of the KL divergence and the segmentation (cross-entropy) loss to 1.5 and 500, respectively. The results of an ablation study examining the impact of loss weights are provided in the supporting information. The positive weight for the cross-entropy loss was set to 0.95. The model was implemented using PyTorch33 and trained on an NVIDIA TITAN V GPU for 100 epochs, via the Adam optimizer34 with a learning rate of 10−3 and a batch size of 400. We selected the model with the best validation performance (evaluated after each epoch) and reported the following results by applying this model to the test set.
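The stated loss weighting can be sketched as follows. How exactly the positive weight of 0.95 enters the cross-entropy is not specified in the text, so weighting the negative class by 1 − 0.95 is our assumption (numpy, illustrative names):

```python
import numpy as np

# Loss weights stated in the text.
KL_WEIGHT, CE_WEIGHT, POS_WEIGHT = 1.5, 500.0, 0.95

def weighted_cross_entropy(prob_map, gt_mask, eps=1e-7):
    """Binary cross-entropy with a positive-class weight (negative class
    weighted by 1 - POS_WEIGHT; this split is our assumption)."""
    p = np.clip(prob_map, eps, 1 - eps)
    return -np.mean(POS_WEIGHT * gt_mask * np.log(p)
                    + (1 - POS_WEIGHT) * (1 - gt_mask) * np.log(1 - p))

def total_loss(prob_map, gt_mask, kl_value):
    """Weighted sum of the segmentation loss and the dissimilarity (KL) loss."""
    return CE_WEIGHT * weighted_cross_entropy(prob_map, gt_mask) + KL_WEIGHT * kl_value
```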

2.4 |. Evaluation metrics

We used the Dice Similarity Coefficient to evaluate the segmentation performance. We also measured the commonly-used 95% Hausdorff Distance (HD95) and the Average Symmetric Surface Distance (ASSD), where HD95 evaluates the maximum surface distance between the prediction and ground-truth, and ASSD considers the distance in an average sense. Besides, we report the area under the precision-recall curve (PR-AUC) to evaluate the overall performance by taking into account different thresholds.
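As one concrete example, the Dice Similarity Coefficient over binary masks follows the standard formula 2|A∩B| / (|A| + |B|); a numpy sketch:

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom > 0 else 1.0
```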

2.5 |. Compared methods

We compared our model with two types of state-of-the-art deep learning methods: U-Net-based and transformer-based. The first type exhibits U-Net-like encoder/decoder structures similar to ours, yet each adopts a distinctive approach to feature processing: U-Net35, Residual U-Net (ResU-Net)36, Attention U-Net37, Attention ResU-Net, U-Net++38, and nnU-Net39, all widely recognized for their performance in medical image segmentation. The second type involves more advanced transformer-based architectures, including the recently proposed UNETR40, Swin UNETR41, and MedNeXt42.

Since we aimed to segment calcification on MRI, all methods were trained and evaluated with the combination of the three MR sequences. For fair comparisons, the common settings (e.g., the number of scales) for all methods were also the same.

3 |. RESULTS

3.1 |. Quantitative comparisons of different methods for segmentation

We first compared the proposed model and baseline methods for calcification segmentation. Particularly, our model involves two variants:

  • Ours (w/o mask): For training, the auxiliary branch and dissimilarity loss are disabled.

  • Ours (w/mask): For training, both branches are utilized, with the ground-truth segmentation mask as the auxiliary.

The comparison between Ours (w/o mask) and Ours (w/mask) served as an ablation study to investigate the effect of the auxiliary training input.

Comparisons with state-of-the-art segmentation models:

The quantitative metrics evaluated on the test set are summarized in Table 2. Our method outperforms all baselines on Dice, PR-AUC and HD95, and our ASSD is very close to the best value, achieved by nnU-Net, which, however, exhibits a very low Dice score. These results highlight the effectiveness of our approach. While UNETR achieves the best Dice among the baseline methods, its distance-based metrics, especially HD95, are considerably worse than ours. This indicates that our model produces better spatial agreement in boundary localization, effectively mitigating false positive outliers. Moreover, the comparisons with the six U-Net variants highlight our model's superiority, attributable to the novel design of the theoretically grounded loss function, because our model shares similar encoder/decoder structures with those variants. Finally, our model also outperforms the three networks with much more advanced transformer-based structures, reaffirming its superiority.

TABLE 2.

Segmentation performance of different methods on the test MR images. HD95 and ASSD were measured in millimeter (mm).

Methods Dice PR-AUC HD95 (mm) ASSD (mm)
U-Net 0.562 0.575 1.829 1.294
ResU-Net 0.539 0.545 2.431 1.080
Att U-Net 0.545 0.552 1.998 1.117
Att ResU-Net 0.539 0.564 5.087 1.347
U-Net++ 0.596 0.623 1.648 0.853
nnU-Net 0.485 0.654 1.165 0.672

UNETR 0.606 0.652 8.114 1.535
Swin UNETR 0.587 0.633 4.201 1.260
MedNeXt 0.601 0.626 1.165 0.761

Ours (w/o mask) 0.592 0.630 6.649 1.133
Ours (w/mask) 0.620 0.660 0.848 0.692

Comparison with inter-reader variations:

To further validate that our method's results align closely with the consensus of human experts, we followed the same procedure as in43 to investigate inter-reader variation. Specifically, we invited two additional radiologists (H.A. and E.Y.A.) to independently label calcification on 14 randomly selected subjects, resulting in an inter-reader Dice score of 0.626, very close to the 0.620 achieved by our model.

Ablation study for the auxiliary encoder and dissimilarity loss:

Ours (w/mask) exhibits superior performance compared to Ours (w/o mask) trained without the auxiliary encoder. Notably, Ours (w/mask) achieves a significantly lower HD95, demonstrating that the auxiliary encoder, coupled with the dissimilarity loss, effectively eliminated false positive outlier pixels in the predicted segmentation.

3.2 |. Qualitative results

The qualitative results from our model are visualized in Fig. 3, where we show 12 examples with various calcification shapes and a range of segmentation performances. Note that the CTA images are shown for reference purposes only, i.e., they were not used as input for training or testing. It is evident that our method can produce relatively accurate calcification boundaries in a variety of situations. Although the dark area on SNAP is much larger than the calcification region, the model still managed to combine multiple sequences to delineate the real boundaries of calcification. One can observe that even with relatively low Dice scores in some cases, the model can still localize the correct calcification region; e.g., the worst Dice of 0.444 is due simply to a very small calcification area. In the bottom case, the model failed to detect one of the two calcification regions, probably because a portion of the vessel wall on T1 is dark and unclear. Still, our model is powerful enough to identify calcification locations that could be hard to detect on MRI.

FIGURE 3.


Examples of segmentation from Ours (w/mask) compared to the ground-truth. 2D slices with various calcification shapes (ring-like, bulk and spotty) are displayed, with contours (green for ground-truth and orange for ours) overlaid on MR images, and the Dice scores for each slice group shown on the TOF images. Corresponding CTA images are not used for training or test, but are shown here only for reference.

Feature complexity:

We visualized the MR features from the two variants of our model in Fig. 4. One can observe that Ours (w/mask) produced visually simpler feature maps with a much smaller intensity range, as indicated by the color bars. Similar to17, we also estimated an upper bound of the mutual information I(X1; Z) between the input MRI X1 and its feature Z, which provides a measurement of feature complexity. Ours (w/mask) achieved a value of 6.9×10−8, much smaller than the value 9.5×107 from Ours (w/o mask). These results indicate that the proposed model with the dissimilarity loss can effectively restrict feature complexity.

FIGURE 4.


Visualization of MR features extracted by Ours (w/mask) and Ours (w/o mask) from two example groups of multi-sequence MRI (with predicted and ground-truth segmentation delineated by orange and green contours, respectively). The features are eight channels of the mean value μ(z) of q1(z|x1). Ours (w/mask) extracted features with reduced complexity, in contrast to Ours (w/o mask), which extracted more complex features and predicted more false positives, as seen from the segmentation contours, indicating poor generalizability of the features on the test set.

3.3 |. Performance in predicting clinical measurements

In clinical reviews of calcification, radiologists commonly rely on high-level measurements derived from segmentations for diagnosis and for analyzing disease progression. Frequently used calcification metrics include the Agatston score and calcium volume44,45. Since the former is specific to CT, we investigated our model's performance in predicting calcium volume. The results are shown in Fig. 5.

FIGURE 5.

Comparisons between ground-truth and predicted calcium volumes for test subjects.

Compared to the baselines, our approach demonstrates a calcium volume distribution closer to the ground truth: the Kolmogorov-Smirnov test for calcium volumes yields a p-value of 0.997 > 0.05, indicating no significant evidence of a difference between the predicted volume distribution and the ground truth. The Bland-Altman plot reveals that nearly all predictions fall within the limits of agreement with the ground truth; the only slight outlier has a small volume (typically more challenging to predict). The regression line between our predictions and the ground truth further exhibits a substantial R value with p-value < 0.001. Collectively, these findings provide robust evidence of the efficacy of our method in accurately predicting calcium volume.
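The analysis above can be reproduced in outline: calcium volume is the count of positive voxels times the per-voxel volume, distributional agreement is checked with a two-sample Kolmogorov-Smirnov test, and the Bland-Altman limits of agreement are the mean difference ± 1.96 SD. A minimal sketch; the volume arrays below are hypothetical placeholders, not the study's data:

```python
import numpy as np
from scipy import stats

def calcium_volume_mm3(mask: np.ndarray, voxel_size_mm: tuple) -> float:
    """Calcium volume: number of positive voxels times one voxel's volume."""
    return float(mask.astype(bool).sum()) * float(np.prod(voxel_size_mm))

# Hypothetical per-subject volumes (mm^3), for illustration only
gt_volumes = np.array([12.1, 30.5, 7.8, 55.0, 21.3])
pred_volumes = np.array([11.4, 31.2, 8.5, 52.9, 22.0])

# Two-sample Kolmogorov-Smirnov test: a large p-value means no evidence
# that the predicted and ground-truth volume distributions differ
ks_stat, p_value = stats.ks_2samp(gt_volumes, pred_volumes)

# Bland-Altman limits of agreement: mean difference +/- 1.96 SD
diff = pred_volumes - gt_volumes
loa = (diff.mean() - 1.96 * diff.std(ddof=1),
       diff.mean() + 1.96 * diff.std(ddof=1))
```

Predictions outside `loa` on the Bland-Altman plot are the outliers discussed above.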

Another potentially valuable evaluation is the frequency of calcium occurrence along the arteries on a slice-by-slice basis. We therefore evaluated the performance of the compared methods in slice-wise calcification detection. In this task, a location on the vessel centerline is classified as calcified if the calcification mask of the corresponding 2D slice group contains any positive pixels. The results are shown in Table 3, demonstrating the superiority of our method in detecting calcified locations along the arteries, with an F1 score around 0.82 and a PR-AUC around 0.85. Considering that calcification is dark and nearly invisible on MRI, our approach achieves accurate automated calcification localization. This could greatly alleviate the burden of manual analysis, which requires an experienced radiologist to carefully examine suspicious dark signals on multiple MR sequences simultaneously.
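The slice-wise labeling rule described above (a centerline location is calcified if its 2D slice-group mask has any positive pixel) and the reported detection metrics can be sketched as follows; the function names are ours, not from the paper's code:

```python
import numpy as np

def slicewise_labels(masks: np.ndarray) -> np.ndarray:
    """A centerline location counts as 'calcified' if its 2D slice-group
    mask contains any positive pixel. masks: (num_slices, H, W) binary."""
    return masks.reshape(masks.shape[0], -1).any(axis=1)

def detection_scores(pred_masks: np.ndarray, gt_masks: np.ndarray):
    """Precision, recall, and F1 over slice-wise calcified/non-calcified labels."""
    pred = slicewise_labels(pred_masks)
    gt = slicewise_labels(gt_masks)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: 3 slices; ground truth marks slices 0 and 1, prediction only 0
gt_masks = np.zeros((3, 4, 4), dtype=int)
gt_masks[0, 1, 1] = 1
gt_masks[1, 2, 2] = 1
pred_masks = np.zeros((3, 4, 4), dtype=int)
pred_masks[0, 1, 2] = 1
precision, recall, f1 = detection_scores(pred_masks, gt_masks)
```

Note that any overlap of positive pixels in a slice, however small, counts the slice as detected, so this metric measures localization along the artery rather than boundary accuracy.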

TABLE 3.

Slice-wise detection performance of different methods on the test MR images, with top-2 values bolded.

Methods F1 Recall Precision PR-AUC
U-Net 0.761 0.696 0.839 0.800
ResU-Net 0.776 0.740 0.814 0.805
Att U-Net 0.751 0.680 0.840 0.794
Att ResU-Net 0.764 0.720 0.813 0.797
U-Net++ 0.821 0.772 0.876 0.848

nnU-Net 0.788 0.658 0.982 0.857
UNETR 0.804 0.899 0.727 0.824
Swin UNETR 0.810 0.907 0.731 0.829
MedNeXt 0.794 0.720 0.886 0.833

Ours (w/o mask) 0.756 0.745 0.766 0.783
Ours (w/mask) 0.823 0.764 0.892 0.853

3.4 |. Effect of different MR sequences

We also investigated the effect of each MR sequence on model performance. To this end, we trained and evaluated Ours (w/mask) with different combinations of MR sequences, as shown in Table 4.

TABLE 4.

Performance of Ours (w/mask) with different combinations of MR sequences. A check mark indicates that the sequence was used for both training and testing. Bold indicates the best value among all combinations with the same number of sequences.

Combinations | Segmentation | Slice-wise detection

T1  TOF  SNAP    Dice   PR-AUC  HD95 (mm)  ASSD (mm)    F1     Recall  Precision  PR-AUC
✓    –    –      0.215  0.140   22.950     5.588        0.533  0.632   0.460      0.585
–    ✓    –      0.246  0.133   9.712      3.212        0.579  0.605   0.554      0.622
–    –    ✓      0.444  0.418   8.229      1.869        0.690  0.717   0.666      0.722
✓    ✓    –      0.406  0.378   15.990     2.938        0.638  0.723   0.571      0.677
✓    –    ✓      0.554  0.561   2.959      1.184        0.762  0.735   0.791      0.791
–    ✓    ✓      0.502  0.488   2.171      1.178        0.737  0.691   0.790      0.774
✓    ✓    ✓      0.620  0.660   0.848      0.692        0.823  0.764   0.892      0.853

We observed that the results improved as more MR sequences were included, and that SNAP is much more informative about calcification than TOF or T1: for a fixed number of sequences, the combinations containing SNAP outperform those without it. Moreover, T1 is generally better than TOF when combined with SNAP to detect calcification, but neither TOF+SNAP nor T1+SNAP alone is sufficient to achieve good performance. The best performance was achieved when all three sequences were included.

The results are consistent with our intuition that: 1) SNAP can provide a relatively accurate segmentation boundary, since the bright lumen and the dark calcification usually form an easily detectable contrast, even though other non-calcified dark regions in SNAP may increase the false positive rate; 2) TOF can help localize the lumen and thus mitigate the interference of dark artifacts inside the lumen on SNAP, although it cannot help exclude surrounding dark structures outside the vessel region; 3) T1 can also help localize calcification by providing an outer bound of the region of interest (including lumen, calcification and outer wall), since an abnormal boundary can indicate the potential presence of calcification, although calcification that does not distort the vessel wall contour could be missed. Thus, the combination of the three sequences produces the best result, with each sequence playing an important role.

4 |. DISCUSSION

We have developed a novel model for intracranial arterial calcification segmentation and detection on multi-sequence MRI by restricting feature complexity. The literature on automated deep learning models for MRI-based calcification assessment is very limited; we therefore conducted this study to validate the feasibility of the task.

While CT/CTA is the first-line imaging modality in stroke and is often considered the reference standard for calcification detection, with the increasing use of VWI14 and a healthcare push to limit imaging overutilization, patients with cerebrovascular disease may undergo only MRI and VWI for disease assessment. By providing comprehensive feature characterization without the need for additional imaging, VWI improves healthcare efficiency and spares patients the burden of additional radiation and iodinated contrast injection. In addition, dementia evaluation may not include CT/CTA. Considering the associations between atherosclerosis, calcification and cognitive impairment, and pending further evaluation and validation, comprehensive assessment of plaque features and calcification could benefit outcome assessment in the future.

The proposed method was compared with multiple cutting-edge medical image segmentation methods on an intracranial VWI dataset. With the combination of three MR sequences (i.e., T1, TOF, and SNAP) and an auxiliary image for training, our method achieved superior segmentation and slice-wise detection on the test MRI data, generally outperforming all baseline methods and demonstrating its efficacy. Moreover, the proposed method trained with the auxiliary image outperformed the same model trained without it, demonstrating that restricting MR feature complexity via an auxiliary feature is beneficial for calcium segmentation.

We also investigated model performance with different combinations of MR sequences. We found that T1, TOF and SNAP each contain structural information about calcification, and the model generally performs better as more MR sequences are included. In particular, SNAP appeared more informative than T1 or TOF, as it presents calcified areas with darker signal intensities that contrast more sharply with the lumen and other surrounding tissue. Nonetheless, all three sequences support calcification assessment with complementary and unique information. Note that “SNAP” in this work refers to the reference image of the SNAP VWI sequence rather than the corrected real image, which is generally referred to as SNAP in the literature. The reference image of the SNAP sequence has been shown to benefit the identification of calcium46.

Although this work utilized three specific MR sequences, our model can flexibly include more: a new sequence needs to be registered, 2D cross-sectional slices generated, and additional MR encoders added to the network for the new sequences; training and testing then proceed as usual. The number of model parameters will not increase excessively, since the convolutional layers are shared across all encoders. In addition, we targeted only the intracranial ICA and MCA; further study is needed to determine whether our model generalizes robustly to calcification segmentation in other intracranial artery segments. We plan to include the full intracranial arterial tree for comprehensive assessment in the future.
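The weight-sharing idea described above can be illustrated with a toy PyTorch module; this is a minimal sketch, not the paper's actual architecture (channel counts and layer depths are placeholders we chose for illustration):

```python
import torch
import torch.nn as nn

class MultiSequenceEncoder(nn.Module):
    """Toy multi-sequence encoder: one convolutional backbone is shared by
    every MR sequence, so adding a new sequence adds an input slot rather
    than a full new set of weights."""
    def __init__(self, channels: int = 8):
        super().__init__()
        # Shared convolutional layers, reused across all sequence encoders
        self.shared = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_sequences, H, W); each sequence is encoded by the
        # same shared weights, and per-sequence features are concatenated
        feats = [self.shared(x[:, i:i + 1]) for i in range(x.shape[1])]
        return torch.cat(feats, dim=1)

enc = MultiSequenceEncoder()
out = enc(torch.zeros(2, 3, 64, 64))  # three sequences, e.g., T1/TOF/SNAP
print(out.shape)  # → torch.Size([2, 24, 64, 64])
```

Because `self.shared` is a single module, its parameter count is independent of the number of input sequences, matching the scaling argument above.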

In summary, calcification is generally considered extremely challenging to segment on MRI, and this work establishes the first automated deep learning approach to the task. It can enable comprehensive MRI review including calcification, vessel wall, plaque and hemodynamic evaluation, which may improve imaging efficiency and reduce patient burden and risk.

Supplementary Material

Supinfo

Figure S1. Fuller version of Figure 2 in the main text. The proposed framework (with two MR sequences as an example). Each encoder or decoder is the same as in a U-Net. The cubes represent feature maps, with H, W and C being the height, width, and channel number, respectively. The probability distributions (e.g., q1 and q2) indicate the correspondence between the features and the terms in the derived objective function (the ELBO). The red arrows and boxes indicate the calculation of the two losses, and purple arrows explain their effects.

Table S1. Ablation study for the effect of loss weights on calcification segmentation performance by fixing the segmentation loss weight as 400 and varying the KL divergence weight. The bolded row, representing the optimal performance, was reported in the main text of the paper as the model Ours (w/mask). The result with the KL weight being 0 is equivalent to disabling the auxiliary branch during training, corresponding to Ours (w/o mask) in the main text.

Table S2. Ablation study for the effect of loss weights on slice-wise calcification detection performance.

Table S3. Segmentation and slice-wise detection performance of Ours (w/mask) (trained on the Renji dataset) on the external test set.

Figure S2. Comparisons between ground-truth and predicted calcium volumes by Ours (w/mask) (trained on the Renji dataset) for the external test set. Since there are cases whose calcium volumes are zero, the presented values are in mm3 rather than log mm3.

ACKNOWLEDGMENTS

This work was partially funded by National Institutes of Health (NIH) grants R01NS092207 and R01NS127317.

Footnotes

Conflict of interest

The authors declare no potential conflicts of interest.

REFERENCES

  • 1.Shu Hongge, Sun Jie, Hatsukami Thomas S., et al. Simultaneous noncontrast angiography and intraplaque hemorrhage (SNAP) imaging: Comparison with contrast-enhanced MR angiography for measuring carotid stenosis. Journal of Magnetic Resonance Imaging. 2017;46(4):1045–1052. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Arenillas Juan F.. Intracranial Atherosclerosis: Current Concepts. Stroke. 2011;42(1_suppl_1):S20–S23. [DOI] [PubMed] [Google Scholar]
  • 3.Bos Daniel, Portegies Marileen L. P., Lugt Aad, et al. Intracranial Carotid Artery Atherosclerosis and the Risk of Stroke in Whites: The Rotterdam Study. JAMA Neurology. 2014;71(4):405–411. [DOI] [PubMed] [Google Scholar]
  • 4.Bugnicourt Jean-Marc, Leclercq Claire, Chillon Jean-Marc, et al. Presence of Intracranial Artery Calcification Is Associated With Mortality and Vascular Events in Patients With Ischemic Stroke After Hospital Discharge. Stroke. 2011;42(12):3447–3453. [DOI] [PubMed] [Google Scholar]
  • 5.Bos Daniel, Vernooij Meike W., Elias-Smale Suzette E., et al. Atherosclerotic calcification relates to cognitive function and to brain changes on magnetic resonance imaging. Alzheimer’s & Dementia. 2012;8(5S):S104–S111. [DOI] [PubMed] [Google Scholar]
  • 6.Lessmann Nikolas, Ginneken Bram, Zreik Majd, et al. Automatic Calcium Scoring in Low-Dose Chest CT Using Deep Neural Networks With Dilated Convolutions. IEEE Transactions on Medical Imaging. 2018;37(2):615–625. [DOI] [PubMed] [Google Scholar]
  • 7.Graffy Peter M., Liu Jiamin, O’Connor Stacy D., Summers Ronald M., Pickhardt Perry J.. Automated segmentation and quantification of aortic calcification at abdominal CT: application of a deep learning-based algorithm to a longitudinal screening cohort. Abdominal Radiology. 2019;44:2921–2928. [DOI] [PubMed] [Google Scholar]
  • 8.Weng Wenhai, Ku Yijie, Chen Zhong, et al. Superficial femoral artery calcification segmentation and detection in CT angiography using convolutional neural network. Computers in Biology and Medicine. 2022;148:105951. [DOI] [PubMed] [Google Scholar]
  • 9.Bortsova Gerda, Bos Daniel, Dubost Florian, et al. Automated Segmentation and Volume Measurement of Intracranial Internal Carotid Artery Calcification at Noncontrast CT. Radiology: Artificial Intelligence. 2021;3(5):e200226. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Brenner David J., Hall Eric J.. Computed Tomography An Increasing Source of Radiation Exposure. New England Journal of Medicine. 2007;357(22):2277–2284. [DOI] [PubMed] [Google Scholar]
  • 11.Yang Wen-jie, Chen Xiang-yan. Intracranial Atherosclerosis: From Microscopy to High-Resolution Magnetic Resonance Imaging. J Stroke. 2017;19(3):249–260. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Amann Kerstin. Media Calcification and Intima Calcification Are Distinct Entities in Chronic Kidney Disease. Clinical Journal of the American Society of Nephrology. 2008;3(6). [DOI] [PubMed] [Google Scholar]
  • 13.Mandell DM, Mossa-Basha M, Qiao Y, et al. Intracranial Vessel Wall MRI: Principles and Expert Consensus Recommendations of the American Society of Neuroradiology. American Journal of Neuroradiology. 2017;38(2):218–229. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Mossa-Basha Mahmud, Yuan C, Wasserman Bruce A., et al. Survey of the American Society of Neuroradiology Membership on the Use and Value of Extracranial Carotid Vessel Wall MRI. American Journal of Neuroradiology. 2022;43:1756–1761. [DOI] [PubMed] [Google Scholar]
  • 15.Guo Yin, Canton Gador, Baylam Geleri Duygu, et al. Plaque Evolution and Vessel Wall Remodeling of Intracranial Arteries: A Prospective, Longitudinal Vessel Wall MRI Study. Journal of Magnetic Resonance Imaging. ;n/a(n/a). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Saba L, Yuan C, Hatsukami TS, et al. Carotid Artery Wall Imaging: Perspective and Guidelines from the ASNR Vessel Wall Imaging Study Group and Expert Consensus Recommendations of the American Society of Neuroradiology. American Journal of Neuroradiology. 2018;39(2):E9–E31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Tishby Naftali, Pereira Fernando C., Bialek William. The information bottleneck method. In: Proc. of the 37-th Annual Allerton Conference on Communication, Control and Computing:368–377; 1999. [Google Scholar]
  • 18.Gao Shangqi, Zhou Hangqi, Gao Yibo, Zhuang Xiahai. BayeSeg: Bayesian modeling for medical image segmentation with interpretable generalizability. Medical Image Analysis. 2023;89:102889. [DOI] [PubMed] [Google Scholar]
  • 19.Kingma Diederik P., Welling Max. Auto-Encoding Variational Bayes. In: Bengio Yoshua, LeCun Yann, eds. 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, ; 2014. [Google Scholar]
  • 20.Alemi Alexander A., Fischer Ian, Dillon Joshua V., Murphy Kevin. Deep Variational Information Bottleneck. In: International Conference on Learning Representations; 2017. [Google Scholar]
  • 21.Guo Yin, Canton Gador, Chen Li, et al. Multi-Planar, Multi-Contrast and Multi-Time Point Analysis Tool (MOCHA) for Intracranial Vessel Wall Characterization. Journal of Magnetic Resonance Imaging. 2022;56(3):944–955. [DOI] [PubMed] [Google Scholar]
  • 22.Chen Li, Mossa-Basha Mahmud, Balu Niranjan, et al. Development of a quantitative intracranial vascular features extraction tool on 3D MRA using semiautomated open-curve active contour vessel tracing. Magnetic Resonance in Medicine. 2018;79(6):3229–3238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Shi Yuge, Siddharth N, Brooks Paige, Torr Philip. Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models. 2019. [Google Scholar]
  • 24.Wu Mike, Goodman Noah. Multimodal Generative Models for Scalable Weakly-Supervised Learning. 2018. [Google Scholar]
  • 25.Sutter Thomas M., Daunhawer Imant, Vogt Julia E. Generalized Multimodal ELBO. 2021. [Google Scholar]
  • 26.Tu Xinming, Cao Zhi-Jie, chenrui xia, Mostafavi Sara, Gao Ge. Cross-Linked Unified Embedding for cross-modality representation learning. 2022. [Google Scholar]
  • 27.Hinton Geoffrey E.. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation. 2002;14(8):1771–1800. [DOI] [PubMed] [Google Scholar]
  • 28.Wang Xin, Luo Xinzhe, Zhuang Xiahai. BInGo: Bayesian Intrinsic Groupwise Registration via Explicit Hierarchical Disentanglement. In: Frangi Alejandro, Bruijne Marleen, Wassermann Demian, Navab Nassir, eds. Information Processing in Medical Imaging, :319–331Springer Nature Switzerland; 2023; Cham. [Google Scholar]
  • 29.Luo Xinzhe, Wang Xin, Shapiro Linda, Yuan Chun, Feng Jianfeng, Zhuang Xiahai. Bayesian Intrinsic Groupwise Image Registration: Unsupervised Disentanglement of Anatomy and Geometry. 2024. [Google Scholar]
  • 30.Vahdat Arash, Kautz Jan. NVAE: A Deep Hierarchical Variational Autoencoder. In: Neural Information Processing Systems (NeurIPS); 2020. [Google Scholar]
  • 31.Wu Fuping, Zhuang Xiahai. Unsupervised Domain Adaptation With Variational Approximation for Cardiac Segmentation. IEEE Transactions on Medical Imaging. 2021;40(12):3555–3567. [DOI] [PubMed] [Google Scholar]
  • 32.Chang Woong-Gi, You Tackgeun, Seo Seonguk, Kwak Suha, Han Bohyung. Domain-Specific Batch Normalization for Unsupervised Domain Adaptation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019;. [Google Scholar]
  • 33.Paszke Adam, Gross Sam, Massa Francisco, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. 2019. [Google Scholar]
  • 34.Kingma Diederik P, Ba Jimmy. Adam: A method for stochastic optimization. In: International Conference on Learning Representations; 2015. [Google Scholar]
  • 35.Ronneberger Olaf, Fischer Philipp, Brox Thomas. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab Nassir, Hornegger Joachim, Wells William M., Frangi Alejandro F., eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, :234–241Springer International Publishing; 2015; Cham. [Google Scholar]
  • 36.Khanna Anita, Londhe Narendra D., Gupta S, Semwal Ashish. A deep Residual U-Net convolutional neural network for automated lung segmentation in computed tomography images. Biocybernetics and Biomedical Engineering. 2020;40(3):1314–1327. [Google Scholar]
  • 37.Oktay Ozan, Schlemper Jo, Folgoc Loic Le, et al. Attention U-Net: Learning Where to Look for the Pancreas. In: Medical Imaging with Deep Learning; 2018. [Google Scholar]
  • 38.Zhou Zongwei, Rahman Siddiquee Md Mahfuzur, Tajbakhsh Nima, Liang Jianming. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In: Stoyanov Danail, Taylor Zeike, Carneiro Gustavo, et al. , eds. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, :3–11Springer International Publishing; 2018; Cham. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Isensee Fabian, Jaeger Paul F, Kohl Simon A A, Petersen Jens, Maier-Hein Klaus H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods. 2021;18(2):203–211. [DOI] [PubMed] [Google Scholar]
  • 40.Hatamizadeh Ali, Tang Yucheng, Nath Vishwesh, et al. UNETR: Transformers for 3D Medical Image Segmentation. In: WACV:1748–1758. IEEE; 2022. [Google Scholar]
  • 41.Hatamizadeh Ali, Nath Vishwesh, Tang Yucheng, Yang Dong, Roth Holger R., Xu Daguang. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images. In: Crimi Alessandro, Bakas Spyridon, eds. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, :272–284. Springer International Publishing; 2022; Cham. [Google Scholar]
  • 42.Roy Saikat, Koehler Gregor, Ulrich Constantin, et al. MedNeXt: Transformer-Driven Scaling of ConvNets for Medical Image Segmentation. In: Greenspan Hayit, Madabhushi Anant, Mousavi Parvin, et al., eds. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023, :405–415. Springer Nature Switzerland; 2023; Cham. [Google Scholar]
  • 43.Qiu Junyi, Li Lei, Wang Sihan, et al. MyoPS-Net: Myocardial pathology segmentation with flexible combination of multi-sequence CMR images. Medical Image Analysis. 2023;84:102694. [DOI] [PubMed] [Google Scholar]
  • 44.Subedi Deepak, Zishan Umme Sara, Chappell Francesca, et al. Intracranial Carotid Calcification on Cranial Computed Tomography. Stroke. 2015;46(9):2504–2509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Yoon William J., Crisostomo Paul, Halandras Pegge, Bechara Carlos F., Bernadette Aulivola. The Use of the Agatston Calcium Score in Predicting Carotid Plaque Vulnerability. Annals of Vascular Surgery. 2019;54:22–26. [DOI] [PubMed] [Google Scholar]
  • 46.Chen Shuo, Zhao Huilin, Li Jifan, et al. Evaluation of carotid atherosclerotic plaque surface characteristics utilizing simultaneous noncontrast angiography and intraplaque hemorrhage (SNAP) technique. Journal of Magnetic Resonance Imaging. 2018;47(3):634–639. [DOI] [PMC free article] [PubMed] [Google Scholar]
