Double-Uncertainty Guided Spatial and Temporal Consistency Regularization Weighting for Learning-based Abdominal Registration

Zhe Xu; Jie Luo; Donghuan Lu; Jiangpeng Yan; Sarah Frisken; Jayender Jagadeesan; William M Wells, III; Xiu Li; Yefeng Zheng; Raymond Kai-yu Tong

. Author manuscript; available in PMC: 2023 Sep 17.

Published in final edited form as: Med Image Comput Comput Assist Interv. 2022 Sep 17;2022:14–24.

Double-Uncertainty Guided Spatial and Temporal Consistency Regularization Weighting for Learning-based Abdominal Registration

Zhe Xu ¹, Jie Luo ³, Donghuan Lu ², Jiangpeng Yan ⁴, Sarah Frisken ³, Jayender Jagadeesan ³, William M Wells III ³, Xiu Li ⁵, Yefeng Zheng ², Raymond Kai-yu Tong ¹

PMCID: PMC10211410 NIHMSID: NIHMS1850123 PMID: 37250854

Abstract

In order to tackle the difficulty associated with the ill-posed nature of the image registration problem, regularization is often used to constrain the solution space. For most learning-based registration approaches, the regularization usually has a fixed weight and only constrains the spatial transformation. Such convention has two limitations: (i) Besides the laborious grid search for the optimal fixed weight, the regularization strength of a specific image pair should be associated with the content of the images, thus the “one value fits all” training scheme is not ideal; (ii) Only spatially regularizing the transformation may neglect some informative clues related to the ill-posedness. In this study, we propose a mean-teacher based registration framework, which incorporates an additional temporal consistency regularization term by encouraging the teacher model’s prediction to be consistent with that of the student model. More importantly, instead of searching for a fixed weight, the teacher enables automatically adjusting the weights of the spatial regularization and the temporal consistency regularization by taking advantage of the transformation uncertainty and appearance uncertainty. Extensive experiments on the challenging abdominal CT-MRI registration show that our training strategy can promisingly advance the original learning-based method in terms of efficient hyperparameter tuning and a better tradeoff between accuracy and smoothness.

Keywords: Abdominal Registration, Regularization, Uncertainty

1. Introduction

Recently, learning-based multimodal abdominal registration has greatly advanced percutaneous nephrolithotomy due to their substantial improvement in computational efficiency and accuracy [2, 24, 10, 21, 25, 23]. In the training stage, given a set of paired image data, the neural network optimizes the cost function

ℒ = ℒ_{sim} (I_{f}, I_{m} \circ ϕ) + λ ℒ_{reg} (ϕ),

(1)

to learn a mapping function that can rapidly estimate the deformation field ϕ for a new pair of images. In the cost function, the first term ℒ_sim quantifies the appearance dissimilarity between the fixed image I_f and the warped moving image I_m ◦ ϕ. Since image registration is an ill-posed problem, the second regularization term ℒ_reg is used to constrain its solution space. The tradeoff between registration accuracy and transformation smoothness is controlled by a weighting coefficient λ.

In classical iterative registration methods, the weight λ is manually tuned for each image pair. Yet, in most learning-based approaches [2], the weight λ is commonly set to a fixed value for all image pairs throughout the training stage, assuming that they require the same regularization strength. Besides the notoriously time-consuming grid search for the so-called “optimal” fixed weight, this “one value fits all” scheme (Fig. 1(c)) may be suboptimal because the regularization strength of a specific image pair should be associated with their content and misalignment degree, especially for the challenging abdominal registration with various large deformation patterns. In these regards, HyperMorph [11] estimated the effect of hyperparameter values on deformations with the additionally trained hypernetworks [7]. Being more parameter-efficient, recently, Mok et al. [16] introduced conditional instance normalization [6] into the backbone network and used an extra distributed mapping network to implicitly control the regularization by normalizing and shifting feature statistics with their affine parameters. As such, it enables optimizing the model with adaptive regularization weights during training and reducing the human efforts in hyperparameter tuning. We also focus on this underexploited topic, yet, propose an explicit and substantially different alternative without changing any components of the backbone.

Fig. 1. — Illustration of the proposed framework with (a) training process, (b) inference process and (c) comparison of hyperparameter optimization strategies.

On the other hand, besides the traditional regularization terms, e.g., smoothness and bending energy [4], task-specific regularizations have been further proposed, e.g., population-level statistics [3] and biomechanical models [12, 17]. Inherently, all these methods still belong to the category of spatial regularization. Experimentally, however, we notice that the estimated solutions in unsupervised registration vary greatly at different training steps due to the ill-posedness. Only spatially regularizing the transformation at each training step may neglect some informative clues related to the ill-posedness. It might be advantageous to further exploit such temporal information across different training steps.

In this paper, we present a novel double-uncertainty guided spatial and temporal consistency regularization weighting strategy, in conjunction with the Mean-Teacher (MT) [20] based registration framework. Specifically, inspired by the consistency regularization [20, 22] in semi-supervised learning, this framework further incorporates an additional temporal consistency regularization term by encouraging the consistency between the predictions of the student model and the temporal ensembling predictions of the teacher model. More importantly, instead of laboriously grid searching for the optimal fixed regularization weight, the self-ensembling teacher model takes advantage of the transformation uncertainty and the appearance uncertainty [15] derived by Monte Carlo dropout [14] to heuristically adjust the regularization weights for each image pair during training (Fig. 1 (c)). Extensive experiments on a challenging intra-patient abdominal CT-MRI dataset show that our training strategy can promisingly advance the original learning-based method in terms of efficient hyperparameter tuning and a better tradeoff between accuracy and smoothness.

2. Methods

The proposed framework, as depicted in Fig. 1 (a), is designed on top of the Mean-Teacher (MT) architecture [20] which is constructed by a student registration network (Student RegNet) and a weight-averaged teacher registration network (Teacher RegNet). This architecture was originally proposed for semi-supervised learning, showing superior performance in further exploiting unlabeled data. We appreciate the MT-like design for registration because the temporal ensembling strategy can (i) help us efficiently exploit the temporal information across different training steps in such a ill-posed problem (Sec. 2.1), and (ii) enable smoother uncertainty estimation [27] to heuristically adjust the regularization weights during training (Sec. 2.2).

2.1. Mean-Teacher based Temporal Consistency Regularization

Specifically, the student model is a typical registration network updated by back-propagation, while the teacher model uses the same network architecture as the student model but its weights are updated from that of the student model via Exponential Moving Average (EMA) strategy, allowing to exploit the temporal information across the adjacent training steps. Formally, denoting the weights of the teacher model and the student model at training step k as $θ_{k}^{'}$ and θ_k, respectively, we update $θ_{k}^{'}$ as:

θ_{k}^{'} = α θ_{k - 1}^{'} + (1 - α) θ_{k},

(2)

where α is the EMA decay and empirically set to 0.99 [20]. To exemplify our paradigm, we adopt the same U-Net architecture used in VoxelMorph (VM) [2] as the backbone network. Concisely, the moving image I_m and the fixed image I_f are concatenated as a single 2-channel 3D image input, and downsampled by four 3 × 3 × 3 convolutions with stride of 2 as the encoder. Then, corresponding 32-filter convolutions and four upsampling layers are applied to form a decoder, followed by four convolutions to refine the 3-channel deformation field. Skip connections between encoder and decoder are also applied. Given the predicted ϕ_s and ϕ_t from the student and the teacher, respectively, the Spatial Transformation Network (STN) [13] is utilized to warp the moving image I_m into I_ws and I_wt, respectively. Following the common practice, the dissimilarity between I_ws and I_f can be measured as the similarity loss ℒ_sim(I_ws, I_f) with a spatial regularization term ℒ_ϕ for ϕ_s. To exploit the informative temporal information related to the ill-posedness, we encourage the temporal ensemble prediction of the teacher model to be consistent with that of the student model by adding an appearance consistency constraint ℒ_c(I_ws, I_wt) to the training loss. In contrast to ℒ_ϕ which spatially constrains the deformation field at the current training step, ℒ_c penalizes the difference between predictions across adjacent training steps. Thus, we call ℒ_c temporal consistency regularization.

As such, the cost function Eqn. 1 can be reformulated as the following Eqn. 3, which is a combination of similarity loss ℒ_sim, spatial regularization loss ℒ_ϕ and temporal consistency regularization loss ℒ_c:

ℒ = ℒ_{sim} + λ_{ϕ} ℒ_{ϕ} + λ_{c} ℒ_{c},

(3)

where λ_ϕ and λ_c are the uncertainty guided adaptive tradeoff weights, as elab-orated in the following Sec. 2.2. In order to handle multimodal abdominal registration, we use the Modality Independent Neighborhood Descriptor (MIND) [8] based dissimilarity metric for ℒ_sim. ℒ_c is measured by mean squared error (MSE). Following the benchmark method [2], the choice of ℒ_ϕ is the generic L2-norm of the deformation field gradients for smooth transformation.

2.2. Double-Uncertainty Guided Adaptive Weighting

Distinct from the previous “one value fits all” strategy, we propose a double-uncertainty guided adaptive weighting scheme to locate the uncertain samples and then adaptively adjust λ_ϕ and λ_c for each training step.

Double-Uncertainty Estimation.

Besides predicting the deformation ϕ_t, the teacher model can also serve as our uncertainty estimation branch since the model weight-averaged strategy can improve the stability of the predictions [20] that enables smoother model uncertainty estimation [27]. Particularly, we adopt the well-known Monte Carlo dropout [14] for Bayesian approximation due to its superior robustness [18]. We repetitively perform N stochastic forward passes on the teacher model with random dropout. After this step, we can obtain a set of voxel-wise predicted deformation fields ${ϕ_{t_{i}}}_{i = 1}^{N}$ with a set of warped images ${I_{w t_{i}}}_{i = 1}^{N}$ . We propose to use the absolute value of the ratio of standard deviation to the mean, which can characterize the normalized volatility of the predictions, to represent the uncertainty [19]. More specifically, the proposed registration uncertainties can be categorized into the transformation uncertainty and appearance uncertainty [15]. Formulating $μ_{ϕ_{t}}^{c} = \frac{1}{N} \sum_{i = 1}^{N} ϕ_{t_{i}}^{c}$ and $μ_{I_{w t}} = \frac{1}{N} \sum_{i = 1}^{N} I_{w t_{i}}$ as the mean of the deformation fields and the warped images, respectively, where c represents the c^th channel of the deformation field (i.e., x, y, z displacements) and i denotes the i^th forward pass, the transformation uncertainty map $U_{ϕ_{t}} \in ℝ^{H \times W \times D \times 3}$ and the appearance uncertainty map $U_{I_{w t}} \in ℝ^{H \times W \times D}$ can be calculated as:

{\begin{array}{l} σ_{ϕ_{t}}^{c} = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^{N} {(ϕ_{t_{i}}^{c} - μ_{ϕ_{t}}^{c})}^{2}} and U_{ϕ_{t}}^{c} = | \frac{σ_{ϕ_{t}}^{c}}{μ_{ϕ_{t}}^{c}} |, \\ σ_{I_{w t}} = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^{N} {(I_{w t_{i}} - μ_{I_{w t}})}^{2}} and U_{I_{w t}} = | \frac{σ_{I_{w t}}}{μ_{I_{w t}}} | . \end{array}

(4)

With the guidance of $U_{ϕ_{t}}$ and $U_{I_{w t}}$ , we then propose to heuristically assign the weights λ_ϕ and λ_c of the spatial regularization ℒ_ϕ and the temporal consistency regularization ℒ_c for each image pair during training.

Adaptive Weighting.

Firstly, for the typical spatial regularization ℒ_ϕ, considering that unreliable predictions often correlate with biologically-implausible deformations [26], we assume that stronger spatial regularization can be given when the network tends to produce more uncertain predictions. As for the temporal consistency regularization ℒ_c, we notice that more uncertain predictions can be characterized as that this image pair is harder-to-align. Particularly, the most recent work [22] in semi-supervised segmentation combats with the assumption in [27] and experimentally reveals an interesting finding that emphasizing the unsupervised teacher-student consistency on those unreliable (often challenging) regions can provide more informative and productive clues for network training. Herein, we follow this intuition, i.e., more uncertain (difficult) samples should receive more attention (higher λ_c) for the consistency regularization ℒ_c. Formally, for each training step s, we update λ_ϕ and λ_c as follows:

λ_{ϕ} (s) = k_{1} \cdot \frac{\sum_{v} I (U_{ϕ_{t}} (s) > τ_{1})}{V_{U_{ϕ_{t}}}} and λ_{c} (s) = k_{2} \cdot \frac{\sum_{v} I (U_{I_{w t}} (s) > τ_{2})}{V_{U_{I_{w t}}}},

(5)

where $I (\cdot)$ is the indicator function; v denotes the v-th voxel; $V_{U_{ϕ_{t}}}$ and $V_{I_{w t}}$ represent the volume sizes of $U_{ϕ_{t}}$ and $U_{I_{w t}}$ , respectively; k₁ and k₂ are the empirical scalar values; and τ₁ and τ₂ are the thresholds to select the most uncertain predictions. Noteworthily, the proposed strategy can work with any learning-based image registration architectures without increasing the number of trainable parameters. Besides, as shown in Fig. 1 (b), only the student model is utilized at the inference stage, which can ensure the computational efficiency.

3. Experiments and Results

Datasets.

We focus on the challenging application of abdominal CT-MRI multimodal registration for improving the accuracy of percutaneous nephrolithotomy. Under institutional review board approval, a 50-pair intra-patient abdominal CT-MRI dataset was collected from our partnership hospital with radiologist-examined segmentation masks for the region-of-interests (ROIs), including liver, kidney and spleen. We randomly divided the dataset into three groups for training (35 cases), validation (5 cases) and testing (10 cases), respectively. After sequential preprocessing steps including resampling, affine pre-alignment, intensity normalization and cropping, the images were processed into sub-volumes of 176× 176 × 128 voxels at the 1 mm isotropic resolution.

Implementation and Evaluation Criteria.

The proposed framework is implemented on PyTorch and trained on an NVIDIA Titan X (Pascal) GPU. We employ the Adam optimizer with a learning rate of 0.0001 with a decay factor of 0.9. The batch size is set to 1 so that each step contains an image pair. We set N = 6 for the uncertainty estimation. We empirically set k₁ to 5 since the deformation fields with maximum λ_ϕ = 5 are diffeomorphic in most cases. The scalar value k₂ is set to 1. Thresholds τ₁ and τ₂ are set to 10% and 1%, respectively. We adopt a series of evaluation metrics, including Average Surface Distance (ASD) and the average Dice score between the segmentation masks of warped images and fixed images. In addition, the average percentage of voxels with non-positive Jacobian determinant (|J_ϕ| ≤ 0) in the deformation fields and the standard deviation of the Jacobian determinant (σ(|J_ϕ|)) are obtained to quantify the diffeomorphism and smoothness, respectively.

Comparison Study.

Table 1 and Fig. 2 present the quantitative and qualitative comparisons, respectively. We compare with several baselines, including two traditional methods SyN [1] and Deeds [9] with five levels of discrete optimization, as well as the benchmark learning-based method VoxelMorph (VM) [2] and its probabilistic diffeomorphic version DIF-VM [5]. As an alternative for adaptive weighting, we also include the recent conditional registration network [16] with the same scalar value 5 (denoted as VM (CIR-DM)). Fairly, we use the MIND-based dissimilarity metric in all learning-based methods. Although DIF-VM preserves better diffeomorphism properties, we find that its results are often suboptimal. Thus, we adopt VM as our backbone. For simplicity, we denote our adaptive spatial and temporal consistency regularization weighting strategy as VM (AS+ATC). Instead of training multiple models for finding the optimal fixed weight, only one model needs to be trained in both VM (CIR-DM) [16] and our VM (AS+ATC). Experimentally, training each VM model from scratch requires an average of 9.2h for this task. For the typical grid search scheme, five individual VM models are trained with varying fixed spatial regularization weights from 1 to 5, resulting in around 46h training time in total, wherein we observe that the overall best-performing VM model appears at λ = 3. As a comparison, the implicit control method, i.e., VM (CIR-DM), achieves comparable results with ~4.7x shorter total training time. Compared with VM (CIR-DM), our VM (AS+ATC) requires slightly longer training time due to the uncertainty estimation, yet, still resulting in ~4.2x faster than the grid search scheme. Distinct from VM (CIR-DM), we do not need to change any components in the network. Encouragingly, we find that VM (AS+ATC) further improves the registration accuracy in terms of Dice and ASD along with better properties of diffeomorphism and smoothness, implying that our strategy helps produce more desirable (smoother) solutions in this ill-posed problem. Visually, the structure boundaries registered by SyN and Deeds still have considerable disagreements, while the learning-based methods achieve more appealing boundary alignment. Besides, all trained models can infer an alignment in a second with a GPU.

Table 1.

Quantitative results for abdominal CT-MRI registration (mean±std).

Methods	Dice [%] ↑			ASD [voxel] ↓			\|J_ϕ\| ≤ 0	σ(\|J_ϕ\|)
Methods	Liver	Spleen	Kidney	Liver	Spleen	Kidney	\|J_ϕ\| ≤ 0	σ(\|J_ϕ\|)
Initial	76.23±4.12	77.94±3.31	80.18±3.06	4.98±0.83	2.02±0.51	1.95±0.35	-	-
SyN	79.42±4.35	80.33±3.42	82.68±3.03	4.83±0.82	1.62±0.61	1.91±0.44	0.07%	0.42
Deeds	82.16±3.15	81.48±2.64	83.82±3.02	3.97±0.55	1.44±0.62	1.59±0.40	0.01%	0.28
DIF-VM	83.14±3.24	82.45±2.59	83.24±2.98	3.88±0.62	1.52±0.59	1.63±0.41	<0.001%	0.12
VM (λ = 1)	85.21±3.06	84.04±2.52	83.12±2.81	3.17±0.59	1.34±0.56	1.55±0.37	0.03%	0.19
VM (λ = 3)^†	85.36±3.12	84.24±2.61	83.40±3.11	3.19±0.53	1.31±0.52	1.56±0.42	0.001%	0.14
VM (λ = 5)	84.28±2.93	83.71±2.40	82.96±2.77	3.83±0.68	1.48±0.59	1.65±0.44	<0.0001%	0.08
VM (CIR-DM)	85.29±3.39	84.17±2.59	83.01±3.06	3.32±0.41	1.33±0.47	1.49±0.46	0.002%	0.17
VM (AS+ATC) (ours)	87.01±3.22	84.96±2.55	84.47±2.86	2.57±0.48	1.24±0.49	1.26±0.43	<0.0005%	0.13
Ablation Study
VM (AS)	86.02±3.02	84.75±2.59	83.44±2.96	3.21±0.49	1.33±0.50	1.45±0.49	0.0007%	0.12
VM (S+TC)	86.32±3.07	84.37±2.53	83.95±3.02	3.05±0.50	1.26±0.52	1.47±0.47	0.001%	0.15
VM (AS+TC)	86.49±3.13	84.87±2.68	84.03±2.89	2.88±0.53	1.20±0.44	1.32±0.40	0.0005%	0.14

Open in a new tab

Higher average Dice score (%) and lower ASD (mm) are better.

^†

indicates the best model via grid search.

Best results are shown in bold. Average percentage of foldings (|J_ϕ| ≤ 0) and the standard deviation of the Jacobian determinant (σ(|J_ϕ|)) are also given.

Fig. 2. — Exemplar axial slice of an abdominal CT-MRI registration case. The segmentation contours of the liver (green), kidney (yellow) and spleen (blue) extracted from the fixed abdominal MRI I_f are overlaid on all images. Better alignment drives structures closer to the fixed contours of I_f. The red arrows indicate the registration of interest around the organ boundary.

Ablation Study.

To better understand our training strategy, we perform an ablation study with three variants: (i) VM (AS): removing the adaptive temporal consistency term; (ii) VM (S+TC): using empirical fixed weights λ_ϕ = 3 and λ_c = 0.5; (iii) VM (AS+TC): adaptively adjusting λ_ϕ while λ_c is fixed as 0.5. The quantitative results are also presented in Table 1. Especially, similar to VM (CIR-DM), VM (AS) only adaptively adjusts λ_ϕ. We find that VM (AS) achieves even better performance compared with VM (λ = 3) and VM (CIR-DM), highlighting that our uncertainty-guided weighting scheme enables more precise control of the spatial regularization strength during training. It can be also observed that both VM (S+TC) and VM (AS+TC) achieve better performance than VM (AS), demonstrating that additionally exploiting the temporal information can be rewarding. When we integrate these components into our synergistic training scheme, their better efficacy can be brought into play.

Visualized Uncertainty Map and Weighting Process.

Examples of two uncertainty maps $U_{ϕ_{t}}$ and $U_{I_{w t}}$ are visualized in Fig. 3 (a) and (c), respectively. We observe that high uncertainty often occurs in hard-to-align ambiguous areas. Note that at the early stage, the weights are relatively small since limited transformation has been captured. As the training goes, the weights of ℒ_ϕ and ℒ_c are adaptively modulated after each step (Fig. 3 (b) and (d)) assisted by the two uncertainties. Such scheme helps the model pursue a better tradeoff between accurate alignment and desirable diffeomorphism properties.

4. Conclusion

In this paper, we proposed a double-uncertainty guided spatial and temporal consistency regularization weighting strategy, assisted by a mean-teacher based registration framework. Besides temporal consistency regularization for further exploiting the temporal clues related to the ill-posedness, more importantly, the self-ensembling teacher model takes advantage of two estimated uncertainties to heuristically adjust the regularization weights for each image pair during training. Extensive experiments on abdominal CT-MRI registration showed that our strategy could promisingly advance the original learning-based method in terms of efficient hyperparameter tuning and a better tradeoff between accuracy and smoothness.

Acknowledgement

This research was done with Tencent Healthcare (Shenzhen) Co., LTD and Tencent Jarvis Lab and supported by General Research Fund from Research Grant Council of Hong Kong (No. 14205419) and the Scientific and Technical Innovation 2030-“New Generation Artificial Intelligence” Project (No. 2020AAA0104100).

References

1.Avants BB, Epstein CL, Grossman M, Gee JC: Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis 12 1, 26–41 (2008) [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Balakrishnan G, Zhao A, Sabuncu MR, Guttag J, Dalca AV: An unsupervised learning model for deformable medical image registration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9252–9260 (2018) [Google Scholar]
3.Bhalodia R, Elhabian SY, Kavan L, Whitaker RT: A cooperative autoen-coder for population-based regularization of CNN image registration. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 391–400. Springer; (2019) [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Bookstein F: Landmark methods for forms without landmarks: morphometrics of group differences in outline shape. Medical Image Analysis pp. 225–243 (1997) [DOI] [PubMed] [Google Scholar]
5.Dalca AV, Balakrishnan G, Guttag J, Sabuncu MR: Unsupervised learning for fast probabilistic diffeomorphic registration. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 729–738. Springer; (2018) [Google Scholar]
6.Dumoulin V, Shlens J, Kudlur M: A learned representation for artistic style. International Conference on Learning Representations (2016) [Google Scholar]
7.Ha D, Dai A, Le QV: Hypernetworks. arXiv preprint arXiv:1609.09106 (2016) [Google Scholar]
8.Heinrich MP, Jenkinson M, Bhushan M, Matin T, Gleeson FV, Brady M, Schnabel JA: MIND: Modality independent neighbourhood descriptor for multimodal deformable registration. Medical Image Analysis 16(7), 1423–1435 (2012) [DOI] [PubMed] [Google Scholar]
9.Heinrich MP, Jenkinson M, Brady M, Schnabel JA: MRF-based deformable registration and ventilation estimation of lung CT. IEEE Transactions on Medical Imaging 32(7), 1239–1248 (2013) [DOI] [PubMed] [Google Scholar]
10.Hering A, Hansen L, Mok TC, Chung A, Siebert H, Häger S, Lange A, Kuckertz S, Heldmann S, Shao W, et al. : Learn2Reg: comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning. arXiv preprint arXiv:2112.04489 (2021) [DOI] [PubMed] [Google Scholar]
11.Hoopes A, Hoffmann M, Fischl B, Guttag J, Dalca AV: Hypermorph: Amortized hyperparameter learning for image registration. In: International Conference on Information Processing in Medical Imaging. pp. 3–17. Springer; (2021) [Google Scholar]
12.Hu Y, Gibson E, Ghavami N, Bonmati E, Moore CM, Emberton M, Vercauteren T, Noble JA, Barratt DC: Adversarial deformation regularization for training image registration neural networks. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 774–782. Springer; (2018) [Google Scholar]
13.Jaderberg M, Simonyan K, Zisserman A, et al. : Spatial transformer networks. In: Advances in Neural Information Processing Systems. pp. 2017–2025 (2015) [Google Scholar]
14.Kendall A, Gal Y: What uncertainties do we need in Bayesian deep learning for computer vision? arXiv preprint arXiv:1703.04977 (2017) [Google Scholar]
15.Luo J, Sedghi A, Popuri K, Cobzas D, Zhang M, Preiswerk F, Toews M, Golby A, Sugiyama M, Wells W, Frisken S: On the applicability of registration uncertainty. In: Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 107–115. Springer; (2019) [Google Scholar]
16.Mok TC, Chung A: Conditional deformable image registration with convolutional neural network. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 35–45. Springer; (2021) [Google Scholar]
17.Qin C, Wang S, Chen C, Qiu H, Bai W, Rueckert D: Biomechanics-informed neural networks for myocardial motion tracking in mri. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 296–306. Springer; (2020) [Google Scholar]
18.Qu Y, Mo S, Niu J: DAT: Training deep networks robust to label-noise by matching the feature distributions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6821–6829 (2021) [Google Scholar]
19.Smith L, Gal Y: Understanding measures of uncertainty for adversarial example detection. arXiv preprint arXiv:1803.08533 (2018) [Google Scholar]
20.Tarvainen A, Valpola H: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: Advances in Neural Information Processing Systems. pp. 1195–1204 (2017) [Google Scholar]
21.de Vos BD, Berendsen FF, Viergever MA, Sokooti H, Staring M, Išgum I: A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis 52, 128–143 (2019) [DOI] [PubMed] [Google Scholar]
22.Wu Y, Xu M, Ge Z, Cai J, Zhang L: Semi-supervised left atrium segmentation with mutual consistency training. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 297–306. Springer; (2021) [Google Scholar]
23.Xu Z, Luo J, Yan J, Li X, Jayender J: F3RNet: Full-resolution residual registration network for deformable image registration. International Journal of Computer Assisted Radiology and Surgery 16(6), 923–932 (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Xu Z, Luo J, Yan J, Pulya R, Li X, Wells W, Jagadeesan J: Adversarial uni-and multi-modal stream networks for multimodal image registration. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 222–232. Springer; (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Xu Z, Yan J, Luo J, Li X, Jagadeesan J: Unsupervised multimodal image registration with adaptative gradient Guidance. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1225–1229. IEEE; (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Xu Z, Yan J, Luo J, Wells W, Li X, Jagadeesan J: Unimodal cyclic regularization for training multimodal image registration networks. In: IEEE 18th International Symposium on Biomedical Imaging (ISBI). pp. 1660–1664. IEEE; (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Yu L, Wang S, Li X, Fu CW, Heng PA: Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 605–613. Springer; (2019) [Google Scholar]

[R1] 1.Avants BB, Epstein CL, Grossman M, Gee JC: Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis 12 1, 26–41 (2008) [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Balakrishnan G, Zhao A, Sabuncu MR, Guttag J, Dalca AV: An unsupervised learning model for deformable medical image registration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9252–9260 (2018) [Google Scholar]

[R3] 3.Bhalodia R, Elhabian SY, Kavan L, Whitaker RT: A cooperative autoen-coder for population-based regularization of CNN image registration. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 391–400. Springer; (2019) [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Bookstein F: Landmark methods for forms without landmarks: morphometrics of group differences in outline shape. Medical Image Analysis pp. 225–243 (1997) [DOI] [PubMed] [Google Scholar]

[R5] 5.Dalca AV, Balakrishnan G, Guttag J, Sabuncu MR: Unsupervised learning for fast probabilistic diffeomorphic registration. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 729–738. Springer; (2018) [Google Scholar]

[R6] 6.Dumoulin V, Shlens J, Kudlur M: A learned representation for artistic style. International Conference on Learning Representations (2016) [Google Scholar]

[R7] 7.Ha D, Dai A, Le QV: Hypernetworks. arXiv preprint arXiv:1609.09106 (2016) [Google Scholar]

[R8] 8.Heinrich MP, Jenkinson M, Bhushan M, Matin T, Gleeson FV, Brady M, Schnabel JA: MIND: Modality independent neighbourhood descriptor for multimodal deformable registration. Medical Image Analysis 16(7), 1423–1435 (2012) [DOI] [PubMed] [Google Scholar]

[R9] 9.Heinrich MP, Jenkinson M, Brady M, Schnabel JA: MRF-based deformable registration and ventilation estimation of lung CT. IEEE Transactions on Medical Imaging 32(7), 1239–1248 (2013) [DOI] [PubMed] [Google Scholar]

[R10] 10.Hering A, Hansen L, Mok TC, Chung A, Siebert H, Häger S, Lange A, Kuckertz S, Heldmann S, Shao W, et al. : Learn2Reg: comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning. arXiv preprint arXiv:2112.04489 (2021) [DOI] [PubMed] [Google Scholar]

[R11] 11.Hoopes A, Hoffmann M, Fischl B, Guttag J, Dalca AV: Hypermorph: Amortized hyperparameter learning for image registration. In: International Conference on Information Processing in Medical Imaging. pp. 3–17. Springer; (2021) [Google Scholar]

[R12] 12.Hu Y, Gibson E, Ghavami N, Bonmati E, Moore CM, Emberton M, Vercauteren T, Noble JA, Barratt DC: Adversarial deformation regularization for training image registration neural networks. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 774–782. Springer; (2018) [Google Scholar]

[R13] 13.Jaderberg M, Simonyan K, Zisserman A, et al. : Spatial transformer networks. In: Advances in Neural Information Processing Systems. pp. 2017–2025 (2015) [Google Scholar]

[R14] 14.Kendall A, Gal Y: What uncertainties do we need in Bayesian deep learning for computer vision? arXiv preprint arXiv:1703.04977 (2017) [Google Scholar]

[R15] 15.Luo J, Sedghi A, Popuri K, Cobzas D, Zhang M, Preiswerk F, Toews M, Golby A, Sugiyama M, Wells W, Frisken S: On the applicability of registration uncertainty. In: Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 107–115. Springer; (2019) [Google Scholar]

[R16] 16.Mok TC, Chung A: Conditional deformable image registration with convolutional neural network. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 35–45. Springer; (2021) [Google Scholar]

[R17] 17.Qin C, Wang S, Chen C, Qiu H, Bai W, Rueckert D: Biomechanics-informed neural networks for myocardial motion tracking in mri. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 296–306. Springer; (2020) [Google Scholar]

[R18] 18.Qu Y, Mo S, Niu J: DAT: Training deep networks robust to label-noise by matching the feature distributions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6821–6829 (2021) [Google Scholar]

[R19] 19.Smith L, Gal Y: Understanding measures of uncertainty for adversarial example detection. arXiv preprint arXiv:1803.08533 (2018) [Google Scholar]

[R20] 20.Tarvainen A, Valpola H: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: Advances in Neural Information Processing Systems. pp. 1195–1204 (2017) [Google Scholar]

[R21] 21.de Vos BD, Berendsen FF, Viergever MA, Sokooti H, Staring M, Išgum I: A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis 52, 128–143 (2019) [DOI] [PubMed] [Google Scholar]

[R22] 22.Wu Y, Xu M, Ge Z, Cai J, Zhang L: Semi-supervised left atrium segmentation with mutual consistency training. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 297–306. Springer; (2021) [Google Scholar]

[R23] 23.Xu Z, Luo J, Yan J, Li X, Jayender J: F3RNet: Full-resolution residual registration network for deformable image registration. International Journal of Computer Assisted Radiology and Surgery 16(6), 923–932 (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] 24.Xu Z, Luo J, Yan J, Pulya R, Li X, Wells W, Jagadeesan J: Adversarial uni-and multi-modal stream networks for multimodal image registration. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 222–232. Springer; (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] 25.Xu Z, Yan J, Luo J, Li X, Jagadeesan J: Unsupervised multimodal image registration with adaptative gradient Guidance. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1225–1229. IEEE; (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] 26.Xu Z, Yan J, Luo J, Wells W, Li X, Jagadeesan J: Unimodal cyclic regularization for training multimodal image registration networks. In: IEEE 18th International Symposium on Biomedical Imaging (ISBI). pp. 1660–1664. IEEE; (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] 27.Yu L, Wang S, Li X, Fu CW, Heng PA: Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In: International Conference on Medical Image Computing and Computer Assisted Intervention. pp. 605–613. Springer; (2019) [Google Scholar]

PERMALINK

Double-Uncertainty Guided Spatial and Temporal Consistency Regularization Weighting for Learning-based Abdominal Registration

Zhe Xu

Jie Luo

Donghuan Lu

Jiangpeng Yan

Sarah Frisken

Jayender Jagadeesan

William M Wells III

Xiu Li

Yefeng Zheng

Raymond Kai-yu Tong

Abstract

1. Introduction

Fig. 1.

2. Methods

2.1. Mean-Teacher based Temporal Consistency Regularization

2.2. Double-Uncertainty Guided Adaptive Weighting

Double-Uncertainty Estimation.

Adaptive Weighting.

3. Experiments and Results

Datasets.

Implementation and Evaluation Criteria.

Comparison Study.

Table 1.

Fig. 2.

Ablation Study.

Visualized Uncertainty Map and Weighting Process.

Fig. 3.

4. Conclusion

Acknowledgement

References

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Double-Uncertainty Guided Spatial and Temporal Consistency Regularization Weighting for Learning-based Abdominal Registration

Zhe Xu

Jie Luo

Donghuan Lu

Jiangpeng Yan

Sarah Frisken

Jayender Jagadeesan

William M Wells III

Xiu Li

Yefeng Zheng

Raymond Kai-yu Tong

Abstract

1. Introduction

Fig. 1.

2. Methods

2.1. Mean-Teacher based Temporal Consistency Regularization

2.2. Double-Uncertainty Guided Adaptive Weighting

Double-Uncertainty Estimation.

Adaptive Weighting.

3. Experiments and Results

Datasets.

Implementation and Evaluation Criteria.

Comparison Study.

Table 1.

Fig. 2.

Ablation Study.

Visualized Uncertainty Map and Weighting Process.

Fig. 3.

4. Conclusion

Acknowledgement

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases