Author manuscript; available in PMC: 2025 Sep 1.
Published in final edited form as: Med Phys. 2024 May 31;51(9):6176–6184. doi: 10.1002/mp.17235

Diffeomorphic Transformer-based Abdomen MRI-CT Deformable Image Registration

Yang Lei 1, Luke A Matkovic 1, Justin Roper 1, Tonghe Wang 2, Jun Zhou 1, Beth Ghavidel 1, Mark McDonald 1, Pretesh Patel 1, Xiaofeng Yang 1

Abstract

Background:

Stereotactic body radiotherapy (SBRT) is a well-established treatment modality for liver metastases in patients unsuitable for surgery. Both CT and MRI are useful during treatment planning for accurate target delineation and to reduce potential organs-at-risk (OAR) toxicity from radiation. MRI-CT deformable image registration (DIR) is required to propagate the contours defined on high-contrast MRI to CT images. An accurate DIR method could lead to more precisely defined treatment volumes and superior OAR sparing on the treatment plan. Therefore, it is beneficial to develop an accurate MRI-CT DIR for liver SBRT.

Purpose:

To create a new deep learning model that can estimate the deformation vector field (DVF) for directly registering abdominal MRI-CT images.

Methods:

The proposed method assumed a diffeomorphic deformation. By using topology-preserved deformation features extracted from the probabilistic diffeomorphic registration model, abdominal motion can be accurately obtained and utilized for DVF estimation. The model integrated Swin transformers, which have demonstrated superior performance in motion tracking, into the convolutional neural network (CNN) for deformation feature extraction. The model was optimized using a cross-modality image similarity loss and a surface matching loss. To compute the image loss, a modality-independent neighborhood descriptor (MIND) was used between the deformed MRI and CT images. The surface matching loss was determined by measuring the distance between the warped coordinates of the surfaces of contoured structures on the MRI and CT images. To evaluate the performance of the model, a retrospective study was carried out on a group of 50 liver cases that underwent rigid registration of MRI and CT scans. The deformed MRI image was assessed against the CT image using the target registration error (TRE), Dice similarity coefficient (DSC), and mean surface distance (MSD) between the deformed contours of the MRI image and manual contours of the CT image.

Results:

When compared to only rigid registration, DIR with the proposed method resulted in an increase of the mean DSC values of the liver and portal vein from 0.850±0.102 and 0.628±0.129 to 0.903±0.044 and 0.763±0.073, a decrease of the mean MSD of the liver from 7.216±4.513 mm to 3.232±1.483 mm, and a decrease of the TRE from 26.238±2.769 mm to 8.492±1.058 mm.

Conclusion:

The proposed DIR method based on a diffeomorphic transformer provides an effective and efficient way to generate an accurate DVF from an MRI-CT image pair of the abdomen. It could be utilized in the current treatment planning workflow for liver SBRT.

Keywords: Abdomen SBRT, deformable image registration, deep learning

1. Introduction

Stereotactic body radiation therapy (SBRT) is a high-dose ablative technique administered in one to five fractions, with a biologically effective dose that is either equal to or greater than that of conventional radiotherapy.1 Recent studies have demonstrated significant local control of primary and metastatic hepatic malignancies through the use of SBRT.2,3 As a result, SBRT has emerged as a viable alternative for liver-directed therapy in the treatment of both primary liver cancers and liver metastases, particularly for patients ineligible for surgical resection for various reasons.4,5 However, the liver is a radiation-sensitive organ at risk for radiation-induced liver dysfunction and hepatic toxicities, including exacerbation of any underlying cirrhosis, which carry both a mortality risk and a potential detriment to patients' quality of life. Given the high dose per fraction and steep dose gradients of SBRT, potential hepatic toxicities remain a limiting factor for normal liver dosimetry and optimal delivery of the target dose.6,7

Simulation CT images are routinely obtained for treatment planning because they provide the electron density information required for dose calculation. By administering tumoricidal radiation doses with spatial precision to the target(s) while respecting the OARs defined on the planning CT images, SBRT can achieve high rates of tumor control while minimizing exposure of nearby healthy tissue, which helps reduce the risk of radiation-induced liver disease.8 Although the planning CT image can be used to delineate the target(s) and OARs, CT alone may not offer sufficient soft-tissue contrast for accurate contouring of the target(s) and nearby critical structures. This level of accuracy is vital for both tumor control and dose delivery precision.9

The integration of MRI into the liver SBRT workflow under free breathing conditions is being studied as a means of enhancing SBRT, thanks to MRI’s superior soft tissue contrast.10 By employing MRI-CT deformable image registration (DIR), contours that were defined on high-contrast MRI images can be transferred to CT images, potentially resulting in smaller treatment volumes, greater OAR sparing accuracy, and reduction in toxicity. Consequently, the development of an accurate MRI-CT DIR for liver SBRT is highly beneficial.

Recent methods have been proposed to train deep learning-based models that map a pair of moving and fixed images to a deformation estimate, such as a deformation vector field (DVF).11 Supervised deep learning-based methods have shown feasibility for DIR.12 However, such methods require ground truth DVFs for training, which are usually generated through simulation, and simulated training data may not accurately represent the distribution of real patient deformation. Consequently, recent deep learning-based methods have utilized unsupervised techniques. These methods utilize a network that performs spatial transformations to deform one image into another.13-15 Unsupervised DIR methods consider volumetric changes between scans; however, they may generate unrealistic results where voxels move in a non-physiological manner. This challenge is particularly pronounced for abdominal motion when the DIR is near the diaphragm. In addition to quantitative calculations, qualitative assessment is recommended for DIR.16 Moreover, the different intensity distributions of distinct image modalities can make liver MRI-CT DIR challenging. Additionally, aligning a specific organ to nearby structures may not provide the best solution for the entire image volume in cases where the organ is deformed relative to adjacent structures. In this study, we aim to develop a novel DIR method with direct approximation of a dense DVF via deep learning, called diffeomorphic transformer-based DIR, to match MRI images to planning CT images and thus improve the accuracy of OAR sparing for liver SBRT. The concepts that inspired our methodology are introduced in the following sections.

1.A. Probabilistic Diffeomorphic Registration

Several recent studies have assumed that deformations and their inverses are diffeomorphic, i.e., differentiable with a differentiable inverse.17-22 In one study, deep learning-based probabilistic models were used to apply diffeomorphic theory to DIR.18 Diffeomorphic deformations are differentiable and invertible, and thus preserve topology.19 An approach developed in 2018 used a variational strategy to learn a deep learning-based model that predicts a stationary velocity field.20,21 VoxelMorph,22 an improved approach developed in 2019, was applied to a multitude of deformable representations and assumed the deformations were diffeomorphic, particularly through a stationary velocity field. In this study, we also assumed that the deformations are diffeomorphic. We utilized topology-preserved deformation features extracted from the probabilistic diffeomorphic registration model to accurately capture abdominal motion, which can be used for estimating the DVF.

1.B. Surface-guided registration

Recent registration techniques have suggested using contour-based losses, such as the Dice similarity coefficient (DSC), in place of image-based similarity terms during optimization when contours are available for intra-subject multi-modality registration.23 Instead of relying on volume-based similarity metrics, surface matching methods employ surface coordinates or geometric features extracted from anatomical structures to evaluate similarity.24 One solution is to use iterative closest point-based optimization to find the shape correspondences.25 The method in this work combined surface- and image-based losses to train a deep learning-based model, which utilized a 3D point representation together with volumetric images to achieve fast registration. The registration was enabled by a differentiable surface distance function.

1.C. Transformer

Recently, vision transformer architectures have been proposed to overcome the limitations of convolutional neural networks (CNNs) and have produced state-of-the-art performances in many medical imaging applications.26 Transformers can be strong candidates for image registration because their substantially larger receptive field enables more precise comprehension of the spatial correspondence between moving and fixed images. In this work, we aimed to integrate the transformer architecture into a CNN for deformation estimation. The Swin Transformer,27 which demonstrated superior performance in motion tracking, was integrated into the model for deformation feature extraction.

2. Methods and materials

2.A. Mathematics

Let $I_{MR}$ and $I_{CT}$ be the 3D MRI and CT images, where the MRI is taken as the moving image and the CT as the fixed image, and let $\varphi: \mathbb{R}^3 \rightarrow \mathbb{R}^3$ denote the DVF that deformably registers the MRI to match the CT. The deformed image $I_{def}$ is $I_{MR}$ warped via $\varphi * I_{MR}$ to match $I_{CT}$. Inspired by previous works,20,22 the estimated $\varphi$ yields a diffeomorphic registration given $I_{MR}$ and $I_{CT}$ in a probabilistic manner. Let $y_\varphi$ denote the displacement field, which is estimated from the proposed CNN-based model and registers $I_{MR}$ via $\varphi * I_{MR}$. The prior probability of $y_\varphi$ is assumed to be multivariate Gaussian, $P(y_\varphi) = \mathcal{N}(y_\varphi; \mu_{y_\varphi}, \Sigma_{y_\varphi})$, where $\mu_{y_\varphi}$ is the mean and $\Sigma_{y_\varphi}$ is the covariance. Inspired by a recent study,17 $y_\varphi$ is assumed to be a stationary velocity field that defines a diffeomorphism through an ordinary differential equation.28 The spatial smoothness of $y_\varphi$ is encouraged by the precision matrix $\Sigma_{y_\varphi}^{-1} = \lambda L_G$, where $L_G = D_G - A$ is the Laplacian of a neighborhood graph defined on the voxel grid, $D_G$ is the graph degree matrix, and $A$ is the voxel neighborhood adjacency matrix; $\lambda$ is a parameter that governs the magnitude of the velocity field $y_\varphi$. The posterior distribution of $I_{def}$ can be represented as:

$P(I_{def} \mid y_\varphi; I_{MR}) = \mathcal{N}\left(I_{def};\, \varphi * I_{MR},\, \sigma_{def}^2\right)$ (1)

where $\sigma_{def}^2$ captures the variance of additive image noise.
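For a nearest-neighbor voxel graph, the graph-Laplacian precision above reduces to penalizing squared finite differences of the velocity field between adjacent voxels, since $y_\varphi^\top L_G\, y_\varphi$ equals the sum of squared differences over all neighboring voxel pairs. Below is a minimal PyTorch sketch of that smoothness term; the function name and the use of a mean rather than a sum are choices of this sketch, not from the paper.

```python
import torch

def velocity_smoothness(v, lam=1.0):
    """Smoothness term implied by the prior precision lambda * L_G:
    for a nearest-neighbor graph, v^T L_G v equals the sum of squared
    differences ||v(p) - v(q)||^2 over neighboring voxels p, q
    (averaged here for scale stability). v: (B, 3, D, H, W)."""
    dz = v[:, :, 1:, :, :] - v[:, :, :-1, :, :]   # neighbors along z
    dy = v[:, :, :, 1:, :] - v[:, :, :, :-1, :]   # neighbors along y
    dx = v[:, :, :, :, 1:] - v[:, :, :, :, :-1]   # neighbors along x
    return lam * (dz.pow(2).mean() + dy.pow(2).mean() + dx.pow(2).mean())
```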

Based on the above assumptions, the goal of the proposed CNN-based model was to estimate the most likely displacement field $y_\varphi$ for an image pair $(I_{MR}, I_{CT})$, i.e., a maximum a posteriori estimate under the posterior $P(y_\varphi \mid I_{CT}; I_{MR})$. Concretely, the CNN estimates the voxel-wise velocity field mean $\mu_{y_\varphi}$ and variance $\sigma_{y_\varphi}$. Computing the posterior $P(y_\varphi \mid I_{CT}; I_{MR})$ directly is intractable, but it can be approximated by minimizing the Kullback–Leibler (KL) divergence between a network-parameterized approximate posterior $Q(y_\varphi \mid I_{CT}; I_{MR})$ and the true posterior:

$\min_{\varphi} \mathrm{KL}\left[\, Q(y_\varphi \mid I_{CT}; I_{MR}) \,\middle\|\, P(y_\varphi \mid I_{CT}; I_{MR}) \,\right]$. (2)

As a result, minimizing this KL divergence is equivalent to minimizing the negative variational lower bound on the model evidence. Then, $y_\varphi$ can be learned by optimizing Eq. (2) via stochastic gradient methods. Namely, for a training image pair $(I_{MR}, I_{CT})$, we compute $\varphi * I_{MR}$ with the resulting loss:

$\mathcal{L}(y_\varphi; I_{MR}, I_{CT}) = \mathrm{KL}\left[\, Q(y_\varphi \mid I_{CT}; I_{MR}) \,\middle\|\, P(y_\varphi \mid I_{CT}; I_{MR}) \,\right]$ (3)

The minimization of Eq. (3) requires an image similarity loss calculated between $I_{def}$ and $I_{CT}$. Since $I_{def}$ is the deformed MRI image and thus a different modality than $I_{CT}$, we used our previously developed modality-independent neighborhood descriptor (MIND) to compute this cross-modality image similarity loss.29 Further benefits and a technical description of the MIND loss are discussed in our previous work.30 Generally, MIND relies on the similarity of small image patches within a single image and seeks to extract distinctive structures in local neighborhoods that are preserved across modalities. It can differentiate between various features, including corners, edges, and homogeneously textured regions. The multi-dimensional descriptor can be computed efficiently and densely across the entire image, offering point-wise local similarity across modalities via the absolute or squared difference between descriptors. The sum of squared differences between the MIND representations of the two images served as the similarity metric for the MIND loss.
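To make the descriptor concrete, the sketch below computes a simplified MIND (self-similarity to the six face-adjacent neighbors, with a box filter standing in for the Gaussian patch weighting of the original descriptor29) and the resulting sum-of-squared-differences loss. The six-neighbor layout, the box filter, and the wrap-around behavior of `torch.roll` at volume borders are simplifying assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def mind_descriptor(img, eps=1e-6):
    """Simplified MIND: per-voxel patch self-similarity to the six
    face-adjacent neighbors. img: (B, 1, D, H, W) -> (B, 6, D, H, W)."""
    shifts = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
              (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    patch = torch.ones(1, 1, 3, 3, 3, device=img.device) / 27.0  # box filter
    dists = []
    for dz, dy, dx in shifts:
        shifted = torch.roll(img, shifts=(dz, dy, dx), dims=(2, 3, 4))
        dists.append(F.conv3d((img - shifted) ** 2, patch, padding=1))
    dist = torch.cat(dists, dim=1)                  # patch SSD per neighbor
    var = dist.mean(dim=1, keepdim=True) + eps      # local variance estimate
    mind = torch.exp(-dist / var)
    return mind / (mind.max(dim=1, keepdim=True).values + eps)

def mind_loss(warped_mr, ct):
    """Cross-modality similarity: SSD between MIND representations."""
    return F.mse_loss(mind_descriptor(warped_mr), mind_descriptor(ct))
```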

During the training phase, the proposed model used delineations of structures of interest on both the CT and MRI scans; these contours were incorporated into the loss functions. Notably, once training was complete, the model no longer required the CT- and MRI-delineated contours as input and relied solely on the CT and MRI scans. As illustrated in Fig. 1, only the portions indicated by the black arrows are needed for the feedforward path of the model; for deformable registration of a new MRI-CT pair using the trained model, only these portions are necessary.

Figure 1.

The workflow of the proposed diffeomorphic transformer model. Blue arrows denote supervision used only during the training stage and not required at inference; both black and blue arrows are used during training.

2.B. Surface-based semi-supervision

Certain training images may provide additional anatomical outlines of specific structures of interest, which can improve the registration in a semi-supervised manner. Given the anatomical structure (liver) delineated on $I_{MR}$ and $I_{CT}$, the anatomical surface was extracted. Let $s_{MR}$ and $s_{CT}$ denote the spatial coordinates of the anatomical structure's surface in $I_{MR}$ and $I_{CT}$, respectively. Given the velocity field $y_\varphi$ from the previous section, we modeled the surface location $s_{CT}$ as formed by displacing the matching surface location $s_{MR}$ according to $y_\varphi$:

$P(s_{def} \mid y_\varphi; s_{MR}) = \mathcal{N}\left(s_{CT};\, \varphi * s_{MR},\, \sigma_s^2\right)$ (4)

where the composition $\varphi * s_{MR}$ warps the surface coordinates, $s_{def}$ denotes the warped coordinates of $s_{MR}$, and $\sigma_s^2$ denotes the spatial variance. The structure (liver) delineation on CT is required only during the training phase; after training, the model no longer requires a delineated contour for a new pair of CT and MRI images.

By leveraging both images and contour maps during training, our objective was to feed the surfaces of the contours of $I_{CT}$, denoted $s_{CT}$, into the model to approximate the conditional posterior probability $P(y_\varphi \mid I_{CT}, s_{CT}; I_{MR}, s_{MR})$. As in the previous section, our goal was to minimize the KL divergence between the approximate posterior and the true posterior:

$\min_{\varphi} \mathrm{KL}\left[\, Q(y_\varphi \mid I_{CT}; I_{MR}) \,\middle\|\, P(y_\varphi \mid I_{CT}, s_{CT}; I_{MR}, s_{MR}) \,\right]$ (5)

More details of the KL divergence are given in the Supplement (KL divergence).
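The surface term requires a differentiable distance between the warped MR surface and the CT surface. As a concrete stand-in, the sketch below samples the dense DVF at the MR surface points (trilinear interpolation via `grid_sample`) and scores the match with a symmetric Chamfer distance; the Chamfer choice, the coordinate conventions, and the helper names are assumptions of this sketch rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def warp_surface(points, dvf):
    """Warp surface coordinates with a dense displacement field by
    sampling the DVF at each point (trilinear interpolation).
    points: (N, 3) voxel coordinates in (z, y, x) order;
    dvf: (1, 3, D, H, W), channels assumed in (z, y, x) order."""
    n = points.shape[0]
    D, H, W = dvf.shape[2:]
    size = torch.tensor([W - 1, H - 1, D - 1],
                        dtype=points.dtype, device=points.device)
    # grid_sample expects (x, y, z)-ordered coordinates in [-1, 1]
    grid = (2.0 * points.flip(-1) / size - 1.0).view(1, n, 1, 1, 3)
    disp = F.grid_sample(dvf, grid, align_corners=True)  # (1, 3, N, 1, 1)
    return points + disp.reshape(3, n).t()

def surface_loss(s_mr, s_ct, dvf):
    """Symmetric Chamfer distance between the warped MR surface and the
    CT surface: a differentiable surface-matching stand-in."""
    s_def = warp_surface(s_mr, dvf)
    d = torch.cdist(s_def, s_ct)                  # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```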

2.C. Workflow

The framework, called the diffeomorphic transformer model, is summarized in Fig. 1. The first part of the network, called the morphological transformer, takes the images and surfaces as input and generates the estimated posterior probability, represented by the estimated mean $\mu_{y_\varphi}$ and variance $\sigma_{y_\varphi}$. To generate a diffeomorphic deformation field, a velocity field $y_\varphi$ is sampled and transformed using integration layers that support differentiability. Finally, a spatial transformer warps $I_{MR}$ and $s_{MR}$ to derive the deformed image and surface, $\varphi * I_{MR}$ and $\varphi * s_{MR}$, which should match $I_{CT}$ and $s_{CT}$.

To enable optimization of the parameters of $y_\varphi$ using Eq. (3) and Eq. (5), $I_{def} = \varphi * I_{MR}$ needs to be formed. Given $y_\varphi = \mu_{y_\varphi} + \sigma_{y_\varphi} n_v$, where $n_v \sim \mathcal{N}(0, 1)$,31 the deformation can be computed as $\varphi = \exp(y_\varphi)$. The vector integration layer is composed of scaling and squaring operations, i.e., compositions within the neural network architecture using a differentiable spatial transformation operation. Given two 3D vector fields $a$ and $b$, for each voxel $p$ this operation computes $(a \circ b)(p) = a(b(p))$, evaluating $a$ at the (generally non-integer) voxel location $b(p)$ using linear interpolation. Starting with $\varphi^{(1/2^T)} = p + y_\varphi / 2^T$, we computed $\varphi^{(1/2^{t-1})} = \varphi^{(1/2^t)} \circ \varphi^{(1/2^t)}$ recursively $T$ times, resulting in $\varphi = \exp(y_\varphi)$.32 In the final step, we employed a spatial transform layer to deform the moving image $I_{MR}$ based on the estimated deformation vector field.
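A minimal PyTorch sketch of this scaling-and-squaring integration follows; `resample` and `identity_grid` are hypothetical helpers of this sketch, and the default of $T = 7$ squaring steps is an assumption, not a value reported in the paper.

```python
import torch
import torch.nn.functional as F

def resample(vol, coords):
    """Trilinearly sample `vol` (1, C, D, H, W) at absolute voxel
    coordinates `coords` (1, 3, D, H, W), (z, y, x) channel order."""
    _, _, D, H, W = vol.shape
    size = torch.tensor([D - 1, H - 1, W - 1], device=vol.device,
                        dtype=coords.dtype).view(1, 3, 1, 1, 1)
    # grid_sample wants (x, y, z)-ordered coordinates normalized to [-1, 1]
    grid = (2.0 * coords / size - 1.0).permute(0, 2, 3, 4, 1).flip(-1)
    return F.grid_sample(vol, grid, align_corners=True)

def identity_grid(shape, device):
    """Voxel-coordinate identity grid of shape (1, 3, D, H, W)."""
    axes = [torch.arange(s, device=device, dtype=torch.float32) for s in shape]
    return torch.stack(torch.meshgrid(*axes, indexing="ij")).unsqueeze(0)

def integrate_velocity(v, steps=7):
    """Scaling and squaring: start from phi^(1/2^T) = p + v / 2^T and
    self-compose T times. Returns the displacement of phi = exp(v)."""
    grid = identity_grid(v.shape[2:], v.device)
    disp = v / (2 ** steps)
    for _ in range(steps):
        # composition phi o phi as a displacement: u <- u(p) + u(p + u(p))
        disp = disp + resample(disp, grid + disp)
    return disp
```

Each loop iteration implements the self-composition $u \leftarrow u(p) + u(p + u(p))$, so after $T$ iterations the scaled field has been doubled back up to $\exp(y_\varphi)$.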

In summary, the network takes the images $I_{MR}$ and $I_{CT}$ as input, estimates the parameters $\mu_{y_\varphi \mid I_{MR}, I_{CT}}$ and $\sigma_{y_\varphi \mid I_{MR}, I_{CT}}$ used for obtaining the DVF, samples a velocity field $y_\varphi \sim \mathcal{N}(\mu_{y_\varphi \mid I_{MR}, I_{CT}}, \sigma_{y_\varphi \mid I_{MR}, I_{CT}})$, generates a diffeomorphic $\varphi$, and deforms $I_{MR}$. As all of the steps' optimization objectives are differentiable, the network parameters can be optimized via stochastic gradient descent. The network-based model yields three outputs, namely $\mu_{y_\varphi \mid I_{MR}, I_{CT}}$, $\sigma_{y_\varphi \mid I_{MR}, I_{CT}}$, and $\varphi * I_{MR}$, which are used in the model loss in Eq. (5).
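Putting the pieces together, one training iteration under this formulation might look like the sketch below, reusing the hypothetical helpers from the earlier sketches (`velocity_smoothness`, `mind_loss`, `surface_loss`, `integrate_velocity`, `resample`, `identity_grid`); the loss weights and the log-variance parameterization of the network output are assumptions of this sketch.

```python
import torch

def training_step(model, i_mr, i_ct, s_mr, s_ct, optimizer,
                  lam=1.0, w_surf=1.0):
    """One training iteration assembled from the earlier sketches.
    i_mr, i_ct: (1, 1, D, H, W) images; s_mr, s_ct: (N, 3) surface points."""
    mu, log_sigma = model(i_mr, i_ct)            # voxel-wise mean / log-std
    eps = torch.randn_like(mu)
    y = mu + torch.exp(log_sigma) * eps          # reparameterized sampling
    phi = integrate_velocity(y)                  # diffeomorphic DVF
    grid = identity_grid(i_mr.shape[2:], i_mr.device)
    i_def = resample(i_mr, grid + phi)           # spatial transformer layer
    loss = (mind_loss(i_def, i_ct)               # cross-modality similarity
            + velocity_smoothness(y, lam)        # smoothness (prior) term
            + w_surf * surface_loss(s_mr, s_ct, phi))  # surface matching
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```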

With the optimized network, we can perform deformable registration of a new pair of scans $(I_{MR}, I_{CT})$ using $\varphi$. The first step is to obtain the most probable velocity field $\hat{y}_\varphi$ from the following equation:

$\hat{y}_\varphi = \arg\max_{y_\varphi} P(y_\varphi \mid I_{CT}, s_{CT}; I_{MR}, s_{MR}) = \mu_{y_\varphi \mid I_{MR}, I_{CT}} + \sigma_{y_\varphi \mid I_{MR}, I_{CT}}\, n_v$ (6)

by evaluating the proposed CNN. Then, we computed φ by utilizing the integration process based on scaling and squaring.

2.D. Morphological Transformer

In this work, we developed a morphological transformer, a hybrid transformer-CNN model that takes the abdominal MRI-CT pair $(I_{MR}, I_{CT})$ and estimates $\mu_{y_\varphi \mid I_{MR}, I_{CT}}$ and $\sigma_{y_\varphi \mid I_{MR}, I_{CT}}$. The morphological transformer is an end-to-end encoder-decoder network in which Swin transformers serve as the encoder to capture the spatial correspondence between the input MRI and CT images. The Swin transformer used in this work is a shifted-window transformer, inserted into each of the corresponding blocks in Fig. 2. The source code is based on "Swin3D": https://github.com/microsoft/Swin3D.

Figure 2.

Network architecture of the morphological transformer. The blue dashed arrow indicates the skip connection between operators, while the black solid arrows signify the flow to the next set of operators.

Then, a CNN-based decoder processed the information provided by the encoder into a deformation feature map. Each Conv3D block included two 3D convolutional layers, each followed by batch normalization and ReLU activation. Long skip connections were implemented to preserve the flow of localization information between the encoder and decoder stages. Finally, the morphological transformer estimated the dense DVF, which was applied to deform the MRI image to match the CT image.
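Following the description above, a decoder Conv3D block might be sketched as below; the channel counts and kernel size are assumptions, since the paper does not specify them here.

```python
import torch.nn as nn

class Conv3DBlock(nn.Module):
    """Decoder block per the text: two 3D convolutional layers, each
    followed by batch normalization and ReLU (a sketch; the 3x3x3 kernel
    and channel sizes are assumptions)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```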

2.E. Dataset, implementation and evaluation

To test the proposed method, a retrospective study was conducted on a cohort of 50 patients with liver cancer who underwent radiotherapy. The MRI images were acquired with an axial T1-weighted fast spoiled gradient-echo (FSPGR) sequence. The MRI and CT images were resampled to a spatial resolution of 0.97×0.97×3.00 mm³ prior to DIR. The liver and portal vein were manually contoured on the MRI and CT images separately.

The implementation details are explained in Supplement, Implementation.

The performance of the proposed method was evaluated using a five-fold cross-validation approach. In detail, the 50 datasets were initially divided randomly and evenly into five groups. Four of these groups were utilized for training purposes, while the remaining group was reserved for testing. This process was repeated five times, with each group serving as the testing set in a rotation.
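The described protocol amounts to a standard randomized five-fold partition; a brief sketch follows (the seed and function name are ours):

```python
import numpy as np

def five_fold_splits(n=50, seed=0):
    """Randomly partition n datasets into five equal folds; each fold
    serves once as the test set while the other four form the training
    set, per the described protocol (the seed is an assumption)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), 5)
    for k in range(5):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train, folds[k]
```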

Qualitative evaluations of the proposed method were performed by visually assessing the alignment between the planning CT and deformed MRI images. A fusion image between the planning CT and deformed MRI images was generated for visual assessment. Quantitatively, the deformed MRI image was evaluated against the CT image using the TRE, DSC, and MSD calculated between the deformed contours of the MRI image and the manual contours of the CT image.
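For reference, the volume- and surface-overlap metrics can be computed from binary masks as sketched below, using the resampled voxel spacing from this section; the helper names and the SciPy-based surface extraction are assumptions of this sketch (TRE, being landmark-based, is omitted).

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(a, b):
    """Dice similarity coefficient between two boolean masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def mean_surface_distance(a, b, spacing=(3.0, 0.97, 0.97)):
    """Symmetric mean surface distance (mm) between boolean masks a, b;
    spacing is the (z, y, x) voxel size after resampling (Section 2.E)."""
    sa = a ^ binary_erosion(a)                     # surface voxels of a
    sb = b ^ binary_erosion(b)                     # surface voxels of b
    d_to_b = distance_transform_edt(~sb, sampling=spacing)  # mm to b surface
    d_to_a = distance_transform_edt(~sa, sampling=spacing)  # mm to a surface
    return (d_to_b[sa].mean() + d_to_a[sb].mean()) / 2.0
```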

3. Results

3.A. Comparison with state-of-the-art

Two state-of-the-art deep learning-based methods, VoxelMorph13 and MIND,30 were compared against in this work. VoxelMorph formulates DIR as a function that maps an input image pair to a deformation field that aligns the images; a CNN parameterizes this mapping, and given a new pair of scans, VoxelMorph generates a DVF by directly evaluating the function. In MIND, a deep learning-based network directly predicts the DVF for MRI-CT liver DIR; to overcome the challenge of multimodal registration, a modality-independent descriptor is incorporated into the network to exploit the correlations between the MRI and CT images. The differences between the proposed method and these two recent deep learning-based methods can be summarized as follows: (1) rather than using purely unsupervised learning as in the other methods, the proposed method is semi-supervised, with liver contour surface matching used as a contour-based loss to supervise the model; and (2) the proposed method incorporates the Swin transformer into a CNN-based model, the utility of which is demonstrated in the ablation study (Section 3.B).

The deformation results are shown in Fig. 3. Row (a) shows axial slices of the planning CT images of three different patients, with the corresponding original MRI images shown in row (b). Row (c) shows the fusion images of the planning CT (a) and the original MRI images (b). Row (d) shows the deformed MRI images via VoxelMorph,13 and row (e) the fusion images of the planning CT (a) and these deformed MRI images (d). Row (f) shows the deformed MRI images via MIND,30 and row (g) the fusion images of the planning CT (a) and these deformed MRI images (f). Row (h) shows the deformed MRI images using the proposed method with the Swin transformer, and row (i) the fusion images of the planning CT (a) and these deformed MRI images (h).

Figure 3.

Row (a): planning CT images, which are regarded as fixed images. Row (b): MRI images, which are regarded as moving images. Row (c): deformed MRI images using VoxelMorph.13 Row (d): deformed MRI images using MIND.30 Row (e): deformed MRI images using the proposed method incorporating the Swin transformer.

Several observations can be drawn from Fig. 3. MIND produced mismatches for some higher-contrast OARs, whereas the other two methods performed well; for example, comparing (c1) to (d1) and (e1), MIND yielded an enlarged liver and a reduced spleen. VoxelMorph did not perform well for cases with abdominal compression when compared to the other methods: comparing (c2) to (d2) and (e2), the deformed images of the proposed method and MIND agree with the CT (fixed image) in the compressed abdominal region, whereas VoxelMorph's does not. Finally, the proposed method shows better bowel matching, as seen by comparing (c3) to (d3) and (e3).

Table 1 shows the numerical results. Overall, after DIR with the proposed method, the mean DSC values of the liver and portal vein increased from 0.850±0.102 and 0.628±0.129 to 0.903±0.044 and 0.763±0.073, the mean MSD of the liver decreased from 7.216±4.513 mm to 3.232±1.483 mm, and the TRE decreased from 26.238±2.769 mm to 8.492±1.058 mm. In terms of TRE, the proposed method outperformed the other two methods by about 10%.

Table 1.

Overall quantitative results achieved by the proposed method and state-of-the-art methods. P-values for each method evaluated against the proposed method are shown in parentheses after the mean±std.

Method       DSC of liver            DSC of portal vein      MSD of liver (mm)       TRE (mm)
Before DIR   0.850±0.102 (p=0.001)   0.628±0.129 (p<0.001)   7.216±4.513 (p<0.001)   26.238±2.769 (p<0.001)
VoxelMorph   0.894±0.057 (p=0.346)   0.742±0.116 (p=0.194)   3.730±1.911 (p=0.122)   9.364±1.043 (p<0.001)
MIND         0.889±0.068 (p=0.198)   0.752±0.131 (p=0.550)   3.519±1.833 (p=0.368)   10.775±1.356 (p<0.001)
Proposed     0.903±0.044             0.763±0.073             3.232±1.483             8.492±1.058

3.B. Ablation study

An ablation study was conducted to demonstrate the utility of the Swin transformer and the MIND loss; details are given in the Supplement (Ablation study).

4. Discussion

In this study, we presented a new approach for liver SBRT MRI-CT cross-modality DIR using a diffeomorphic transformer-based method. Our method assumed diffeomorphic deformations and leveraged topology-preserved deformation features extracted from a probabilistic diffeomorphic registration model to accurately capture abdominal motion and estimate the DVF. To enhance deformable feature extraction, we integrated Swin transformers, which have shown excellent performance in motion tracking, into a CNN-based model; this integration allowed us to extract high-quality features that capture the deformations accurately. We optimized our model using a combination of volume-based similarity for unsupervised training and surface matching for semi-supervised training. This dual optimization ensures that the generated DVF not only aligns the volumes but also matches the surfaces, with a particular focus on OARs; by increasing the physical plausibility of the generated DVF, we aimed to improve the overall quality of the registration process. Paired two-sample t-tests (two-tailed) were used to evaluate significance. As shown in Table 1, the proposed method achieved better metrics than VoxelMorph and MIND for the DSCs of the liver and portal vein, the MSD of the liver, and the TRE, with the associated p-values listed. Compared to VoxelMorph and MIND, TRE is the only metric for which the improvement of the proposed method is statistically significant (p<0.001); the differences in DSC and MSD were not statistically significant. However, the mean and standard deviation values of the proposed method improved over both VoxelMorph and MIND in all cases.

Compared to deterministic registration methods, probabilistic DIR accounts for spatial variability in deformations: instead of providing a single deterministic transformation, it offers a distribution of likely transformations, a more comprehensive representation of the possible spatial variations. To achieve this in our work, as shown in Eq. (5) and Eq. (6), the proposed method optimized the model using the KL divergence between the approximate posterior and the true posterior. The benefit of this framework is its capability to handle inherent ambiguities between the MRI and CT images, especially for complex anatomical structures such as those in the abdominal region.

Three potential issues exist with the current proposed method. First, the performance of DIR can be affected by MRI image quality. As can be seen in Fig. 3, the patients' MRI image quality is affected by intensity inhomogeneity. Our optimization of the model during training was based on MIND, a cross-modality image similarity metric, and a surface matching loss. Although the surface matching loss may not be affected by this bias, MIND, which compares texture information between the deformed MRI and planning CT images, can be. To mitigate this issue, incorporating MRI bias correction as a first step in our deep learning-based model will be a future focus. Recently, generative adversarial networks (GANs) have been used for MRI intensity non-uniformity correction.33 However, the computational complexity of a GAN can increase dramatically due to the additional discriminator. A more efficient way to integrate MRI image quality improvement into our deep learning-based model is another future focus.

Another potential limitation is that a model trained with the MIND image similarity loss may not cover the full range of MRI sequences. In this work, we trained and tested our model only on T1-weighted FSPGR MRI. Clinically, a patient may not have been scanned with T1-weighted MRI. In future work, including a variety of MRI sequences to train our model will be another focus.

The landmarks used for evaluating the proposed method and comparing it to the other methods are the same across all methods and were selected from a region of large abdominal motion and from the surface of the liver. With the proposed method, the TRE decreased from 26.238±2.769 mm before DIR to 8.492±1.058 mm after DIR. Compared with a recent study of multi-modality liver deformable registration,34 our TRE is larger (versus mean landmark errors of 3.20–5.36 mm in that work). One major reason may be the different datasets: in our work the slice thickness was 3 mm, whereas the mean slice thicknesses in that study were 1.64 mm for T2 MRI and 0.8 mm for T1 MRI. Another potential reason is that the multi-modality MRI sequences used there, such as contrast-enhanced T1, T2, and diffusion-weighted MRI, provide more comprehensive structural information than the single T1 sequence used in our work. Testing the model's performance on multi-modality MRI sequences for liver DIR will be another future focus.

5. Conclusion

We developed a deformable image registration method for abdomen MRI-CT images using a diffeomorphic transformer. The method estimates a DVF from an MRI-CT image pair and applies it to deform the MRI image to match the CT image. This provides an effective solution for registering abdominal MRI-CT images, which can be useful in delineating target volumes and OARs for liver SBRT.

Acknowledgments

This research is partially supported by the National Institutes of Health under Award Numbers R01CA272991, R56EB033332, R01EB032680, and P30CA008748.

References

1. Sahgal A, Roberge D, Schellenberg D, et al. The Canadian Association of Radiation Oncology scope of practice guidelines for lung, liver and spine stereotactic body radiotherapy. Clin Oncol (R Coll Radiol). 2012;24(9):629–639.
2. Huertas A, Baumann AS, Saunier-Kubs F, et al. Stereotactic body radiation therapy as an ablative treatment for inoperable hepatocellular carcinoma. Radiother Oncol. 2015;115(2):211–216.
3. Doi H, Uemoto K, Suzuki O, et al. Effect of primary tumor location and tumor size on the response to radiotherapy for liver metastases from colorectal cancer. Oncol Lett. 2017;14(1):453–460.
4. Hellman S, Weichselbaum RR. Importance of local control in an era of systemic therapy. Nat Clin Pract Oncol. 2005;2(2):60–61.
5. Mahadevan A, Blanck O, Lanciano R, et al. Stereotactic Body Radiotherapy (SBRT) for liver metastasis – clinical outcomes from the international multi-institutional RSSearch® Patient Registry. Radiat Oncol. 2018;13(1):26.
6. Doi H, Beppu N, Kitajima K, Kuribayashi K. Stereotactic body radiation therapy for liver tumors: current status and perspectives. Anticancer Res. 2018;38(2):591–599.
7. Toesca DAS, Ibragimov B, Koong AJ, Xing L, Koong AC, Chang DT. Strategies for prediction and mitigation of radiation-induced liver toxicity. J Radiat Res. 2018;59(suppl_1):i40–i49.
8. Tong VJW, Shelat VG, Chao YK. Clinical application of advances and innovation in radiation treatment of hepatocellular carcinoma. J Clin Transl Res. 2021;7(6):811–833.
9. Lei Y, Harms J, Wang T, et al. MRI-only based synthetic CT generation using dense cycle consistent generative adversarial networks. Med Phys. 2019;46(8):3565–3581.
10. Liu Y, Lei Y, Wang T, et al. MRI-based treatment planning for liver stereotactic body radiotherapy: validation of a deep learning-based synthetic CT generation method. Br J Radiol. 2019;92(1100):20190067.
11. Fu Y, Lei Y, Wang T, Curran WJ, Liu T, Yang X. Deep learning in medical image registration: a review. Phys Med Biol. 2020;65(20):20TR01.
12. Cao X, Yang J, Zhang J, et al. Deformable image registration based on similarity-steered CNN regression. In: Medical Image Computing and Computer Assisted Intervention (MICCAI 2017). Cham; 2017.
13. Balakrishnan G, Zhao A, Sabuncu MR, Guttag J, Dalca AV. VoxelMorph: a learning framework for deformable medical image registration. IEEE Trans Med Imaging. 2019;38(8):1788–1800.
14. Li H, Fan Y. Non-rigid image registration using self-supervised fully convolutional networks without training data. Proc IEEE Int Symp Biomed Imaging. 2018;2018:1075–1078.
15. Lei Y, Fu Y, Wang T, et al. 4D-CT deformable image registration using multiscale unsupervised deep learning. Phys Med Biol. 2020;65(8):085003.
16. Brock KK, Mutic S, McNutt TR, Li H, Kessler ML. Use of image registration and fusion algorithms and techniques in radiotherapy: report of the AAPM Radiation Therapy Committee Task Group No. 132. Med Phys. 2017;44(7):e43–e76.
17. Yang X, Yang J. Efficient diffeomorphic metric image registration via stationary velocity. J Comput Sci. 2019;30:90–97.
18. Rebsamen M, McKinley R, Radojewski P, et al. Reliable brain morphometry from contrast-enhanced T1w-MRI in patients with multiple sclerosis. Hum Brain Mapp. 2023;44(3):970–979.
19. Avants BB, Epstein CL, Grossman M, Gee JC. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med Image Anal. 2008;12(1):26–41.
20. Krebs J, Mansi T, Mailhé B, Ayache N, Delingette H. Unsupervised probabilistic deformation modeling for robust diffeomorphic registration. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Cham; 2018.
21. Ashburner J. A fast diffeomorphic image registration algorithm. NeuroImage. 2007;38(1):95–113.
22. Dalca AV, Balakrishnan G, Guttag J, Sabuncu MR. Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces. Med Image Anal. 2019;57:226–236.
23. Hu Y, Modat M, Gibson E, et al. Weakly-supervised convolutional neural networks for multimodal image registration. Med Image Anal. 2018;49:1–13.
24. Fu Y, Lei Y, Wang T, et al. Biomechanically constrained non-rigid MR-TRUS prostate registration using deep learning based 3D point cloud matching. Med Image Anal. 2021;67:101845.
25. Shi X, Peng J, Li J, Yan P, Gong H. The iterative closest point registration algorithm based on the normal distribution transformation. Procedia Comput Sci. 2019;147:181–190.
26. Henry EU, Emebob O, Omonhinmin CA. Vision transformers in medical imaging: a review. arXiv. 2022;abs/2211.10043.
27. Liu Z, Lin Y, Cao Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV); 2021.
28. Golovina NY. The nonlinear stress-strain curve model as a solution of the fourth order differential equation. Int J Press Vessels Pip. 2021;189:104258.
29. Heinrich MP, Jenkinson M, Bhushan M, et al. MIND: modality independent neighbourhood descriptor for multi-modal deformable registration. Med Image Anal. 2012;16(7):1423–1435.
30. Fu Y, Lei Y, Wang T, et al. Deformable MRI-CT liver image registration using convolutional neural network with modality independent neighborhood descriptors. Proc SPIE. 2021;11597.
31. Lopez R, Boyeau P, Yosef N, Jordan MI, Regier J. Decision-making with auto-encoding variational Bayes. arXiv. 2020;abs/2002.07217.
32. Arsigny V, Commowick O, Pennec X, Ayache N. A Log-Euclidean framework for statistics on diffeomorphisms. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI 2006). Berlin, Heidelberg; 2006.
33. Dai X, Lei Y, Liu Y, et al. Intensity non-uniformity correction in MR imaging using residual cycle generative adversarial network. Phys Med Biol. 2020;65(21):215025.
34. Spahr N, Thoduka S, Abolmaali N, Kikinis R, Schenk A. Multimodal image registration for liver radioembolization planning and patient assessment. Int J Comput Assist Radiol Surg. 2019;14(2):215–225.
