Abstract
Objective.
The accuracy of navigation in minimally invasive neurosurgery is often challenged by deep brain deformations (up to 10 mm, due to egress of cerebrospinal fluid during the neuroendoscopic approach). We propose a deep learning-based deformable registration method to address such deformations between preoperative MR and intraoperative CBCT images.
Approach.
The registration method uses a joint image synthesis and registration network (denoted JSR) to simultaneously synthesize MR and CBCT images to the CT domain and perform CT domain registration using a multi-resolution pyramid. JSR was first trained on a simulated dataset (simulated CBCT images and simulated deformations) and then refined on real clinical images via transfer learning. The performance of the multi-resolution JSR was compared to a single-resolution architecture as well as a series of alternative registration methods (symmetric normalization (SyN), VoxelMorph, and image synthesis-based approaches).
Main results.
JSR achieved a median Dice coefficient (DSC) of 0.69 in deep brain structures and a median target registration error (TRE) of 1.94 mm in the simulation dataset, improving on the single-resolution architecture (median DSC = 0.68, median TRE = 2.14 mm). Additionally, JSR achieved superior registration compared to alternative methods—e.g. SyN (median DSC = 0.54, median TRE = 2.77 mm) and VoxelMorph (median DSC = 0.52, median TRE = 2.66 mm)—and provided a registration runtime of less than 3 s. Similarly, in the clinical dataset, JSR achieved median DSC = 0.72 and median TRE = 2.05 mm.
Significance.
The multi-resolution JSR network resolved deep brain deformations between MR and CBCT images with performance superior to other state-of-the-art methods. The accuracy and runtime support translation of the method to further clinical studies in high-precision neurosurgery.
Keywords: deformable registration, inter-modality registration, deep learning, image synthesis, neurosurgery
1. Introduction
Neuro-endoscopy is prevalent in the treatment of a wide spectrum of neurological conditions, including tumor biopsy (Oppido et al 2011), hydrocephalus (Spennato et al 2007), and deep brain stimulation (DBS) (Groiss et al 2009). The current navigation workflow uses preoperative magnetic resonance (MR) imaging for planning (e.g. organ/tumor segmentation and electrode target definition) and intraoperative cone-beam computed tomography (CBCT) for rigid localization. Compared to MR, CBCT exhibits limited soft-tissue contrast and image artifacts (shading and streaks), but it is less expensive, mobile, and faster, while providing higher spatial resolution and good visualization of bone and surgical instrumentation.
The accuracy of navigation is often challenged by deep brain deformations induced by egress of cerebrospinal fluid (CSF) and introduction of instrumentation during surgery. Such deformations in deep brain parenchyma can be up to 10 mm (Nowell et al 2014). Conventional neuro-navigation using rigid registration of stereotactic frames between preoperative MR and intraoperative CBCT fails to address such deformations and can lead to inaccurate targeting and device placement (Nabavi et al 2001). To address this challenge, an MR-CBCT deformable registration is needed to establish correspondence between preoperative and intraoperative coordinates. However, MR-CBCT registration is very challenging due to (i) the distinct image appearances with non-linear, non-monotonic, and non-reproducible correspondence between MR and CBCT image intensities and (ii) suboptimal CBCT image quality (poor soft-tissue contrast, high image noise, and strong image artifacts). Largely due to the loss of soft-tissue contrast resolution in CBCT, previous work on MR-CBCT registration is limited to rigid registration (Dean et al 2012, Rivest-Hénault et al 2015), and few deformable registration methods have been proposed.
While little prior work addresses MR-CBCT registration directly, a number of registration algorithms have been proposed for MR-CT and CBCT-CT deformable registration based on iterative optimization of a similarity metric. Most MR-CT registration algorithms optimize multi-modality image similarity metrics between fixed and moving images, including mutual information (MI) (Modat et al 2010, Denis de Senneville et al 2016, Han et al 2018) and the modality-independent neighborhood descriptor (MIND) (Heinrich et al 2012, Reaungamornrat et al 2016). Algorithms for CBCT-CT registration have employed local intensity corrections (for correction of shading artifacts) to map CBCT to a better match of CT appearance, followed by intra-modality registration (Nithiananthan et al 2011, Zhen et al 2012, Park et al 2017). However, such iterative optimization-based approaches often carry high computational load and long runtimes, limiting their application in intraoperative workflow.
Recent advances in deep learning-based registration demonstrate superior accuracy and runtime compared to such methods. Several deep learning-based registration algorithms have focused on unsupervised approaches involving a convolutional neural network (CNN) with encoder-decoder architecture to predict the deformation field between fixed and moving images without ground truth definition of deformation fields (Balakrishnan et al 2018, de Vos et al 2019). A loss function consisting of an image similarity metric (e.g. normalized cross-correlation (NCC) for intra-modality registration) and deformation regularization is minimized during network learning. Inter-modality registration algorithms often rely on either some degree of weak supervision (e.g. labeled segmentations) or optimize an inter-modality similarity metric. Common inter-modality similarity metrics, however, are often subject to reduced registration accuracy compared to intra-modality metrics. Momin et al (2021) sought to address such a challenge using a deep learning self-correlation descriptor for registration of MR and CBCT pelvis images. In Fu et al (2021), biomechanical models of segmented anatomy were constructed to constrain deformations between MR and CBCT; however, reliable anatomical segmentation may not be feasible for other clinical applications.
A popular approach to mitigate the challenges in multi-modal image registration is to convert the images into a common (intermediate) modality domain using image synthesis, permitting subsequent intra-modality registration. For MR-CT registration, Han et al (2022a), Wei et al (2019), Huan Yang et al (2020a, 2020b) used generative adversarial networks (GANs) to generate synthetic images from the input and a registration network to learn the deformation between intra-modality images in either the CT or MR domain. MR-CBCT registration, however, exhibits additional challenges compared to MR-CT registration, due to the strong disparity in image appearance and limitations in CBCT soft-tissue image quality.
Preliminary studies (Han et al 2022b) proposed a joint synthesis and registration network that synthesizes MR and CBCT to an intermediate CT domain for registration. The network used encoders to extract latent representations from the input images and decoders to generate synthetic CT images and estimate the deformation field between the latent representations. In this work, we present a substantial extension of the method, incorporating a multi-resolution decoder in the registration network, an improved network architecture, and investigation of novel loss functions. The multi-resolution decoder, inspired by multi-resolution decoders for intra-modality registration (Hu et al 2019, Cao et al 2020), is employed here within an inter-modality synthesis and registration network. In the work reported below, a method for deformable registration of MR and CBCT brain images is proposed for application in minimally invasive neurosurgery, with major points of novel contribution including: (i) jointly performing MR-to-CT and CBCT-to-CT image synthesis and CT-domain registration using shared encoders and separate decoders for the image synthesis and registration tasks, learned jointly; (ii) a novel multi-resolution pyramid registration decoder designed to estimate a diffeomorphic deformation field between the moving and fixed images (in the synthetic CT domain); and (iii) initial training of the proposed network with simulated brain deformations, followed by transfer learning with real clinical images. To our knowledge, this is the first work to test a deep learning-based deformable brain registration algorithm for CBCT-guided procedures, and it includes comparison to a series of state-of-the-art iterative optimization-based and deep learning-based registration methods in real clinical images from neurosurgery. The work focuses on diffeomorphic deformations induced by minimally invasive functional neurosurgery (e.g. placement of DBS electrodes via a burr hole) and does not consider large morphological/structural/topological changes associated with gross pathological lesions (e.g. cysts or tumors) or approach via craniotomy.
2. Algorithmic methods
A deformable registration framework is proposed for registering preoperative 3D MR images to intraoperative 3D CBCT images using a CNN-based 3D joint synthesis and registration network (denoted JSR). The JSR network jointly estimates synthetic 3D CT images and the 3D deformation field between the input images. As illustrated in figure 1, let IMR be the moving preoperative MR image and ICBCT be the fixed intraoperative CBCT image. JSR predicts the deformation field ϕ that maps the moving IMR to the fixed ICBCT coordinate frame.
Figure 1. Schematic illustration of the joint synthesis and registration (JSR) network. Two encoders extract information from the moving MR (IMR) and the fixed CBCT (ICBCT) into latent representations, zMR and zCBCT. Two synthesis decoders separately decode the latent representations into synthetic CT images. Finally, a registration decoder estimates the deformation field between the moving and fixed images at four resolution levels.
An MR encoder and a CBCT encoder are first used to encode IMR and ICBCT into latent representations (zMR and zCBCT), which are then decoded via synthesis decoders into synthetic CT images (ĨCT,MR and ĨCT,CBCT), transforming the inter-modality input into a common intermediate domain for improved registration learning. For both MR-to-CT and CBCT-to-CT synthesis, the encoder and synthesis decoder with skip connections form a U-Net architecture, which serves as the synthesis generator. Additional CT discriminators are employed to differentiate whether the synthetic CTs (ĨCT,MR and ĨCT,CBCT) are real or fake.
Simultaneously, zMR and zCBCT are concatenated and decoded via a multi-resolution registration decoder. At each resolution level i of the decoding path, the registration decoder estimates an intermediate deformation field (ϕi), progressively refining the deformation field in a coarse-to-fine pyramid. The final deformation (ϕ = ϕ0), accumulated from predictions across the four resolution levels, is used to deform IMR to produce a registered MR image (IMR∘ϕ) aligned with the fixed image (ICBCT).
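To make the overall data flow concrete, the following is a minimal PyTorch sketch of the JSR forward pass. Module and variable names are our own; the encoder and decoder modules stand in for the blocks detailed in section 2.1.

```python
import torch
import torch.nn as nn

class JSR(nn.Module):
    # Skeleton of the joint synthesis and registration network (a sketch,
    # not the authors' code): two encoders, two synthesis decoders, and a
    # multi-resolution registration decoder.
    def __init__(self, enc_mr, enc_cbct, dec_mr2ct, dec_cbct2ct, dec_reg):
        super().__init__()
        self.enc_mr, self.enc_cbct = enc_mr, enc_cbct
        self.dec_mr2ct, self.dec_cbct2ct = dec_mr2ct, dec_cbct2ct
        self.dec_reg = dec_reg

    def forward(self, i_mr, i_cbct):
        z_mr = self.enc_mr(i_mr)          # latent representation of moving MR
        z_cbct = self.enc_cbct(i_cbct)    # latent representation of fixed CBCT
        # Synthesis decoders return a synthetic CT plus the intermediate
        # feature maps (M3..M0, F3..F0) reused by the registration decoder.
        sct_mr, feats_m = self.dec_mr2ct(z_mr)
        sct_cbct, feats_f = self.dec_cbct2ct(z_cbct)
        # Registration decoder predicts phi_3..phi_0 in a coarse-to-fine pyramid.
        phis = self.dec_reg(torch.cat([z_mr, z_cbct], dim=1), feats_m, feats_f)
        return sct_mr, sct_cbct, phis
```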
2.1. Network architecture
2.1.1. Encoders
The input images IMR and ICBCT are first encoded into latent representations (zMR and zCBCT) via the MR encoder and the CBCT encoder depicted in figure 1, respectively. The architecture of the encoder is shown in figure 2(a), consisting of four downsampling 'ResBlocks' (He et al 2015). Details of the ResBlocks are further depicted in figures 2(b) and (c) for downsampling by 2 and without downsampling, respectively. Convolutions are followed by LeakyReLU activation and instance normalization. The encoders extract low-resolution latent representations of the input MR and CBCT images that are used for downstream synthesis and registration, promoting latent representations that encode information pertinent to both tasks.
Figure 2. Encoder and synthesis decoder design. (a) Given an input image (I), an encoder with a series of convolutions (blue) is employed to extract a low-dimensional latent representation (z). The synthesis decoder is then employed with a series of convolutions (green) to decode z into a synthetic image (Ĩ). (b) Residual block with stride (s) of 2 to downsample the input by 2. (c) Residual block that maintains the spatial size of the input.
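A minimal PyTorch sketch of the encoder ResBlocks follows. The channel widths and the 1 × 1 × 1 shortcut projection are assumptions of this sketch rather than the exact configuration.

```python
import torch.nn as nn

def conv_in_act(c_in, c_out, stride=1):
    # 3D convolution followed by LeakyReLU and instance normalization,
    # as described for the encoder ResBlocks.
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.InstanceNorm3d(c_out),
    )

class ResBlock(nn.Module):
    # stride=2 halves the spatial size (figure 2(b)); stride=1 keeps it
    # (figure 2(c)). The 1x1x1 shortcut projection (for matching channel
    # and spatial dimensions) is an assumption of this sketch.
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.main = nn.Sequential(conv_in_act(c_in, c_out, stride),
                                  conv_in_act(c_out, c_out))
        self.skip = (nn.Identity() if (stride == 1 and c_in == c_out)
                     else nn.Conv3d(c_in, c_out, kernel_size=1, stride=stride))

    def forward(self, x):
        return self.main(x) + self.skip(x)

class Encoder(nn.Module):
    # Four downsampling ResBlocks extract a low-resolution latent z.
    def __init__(self, c_in=1, width=16):
        super().__init__()
        chans = [width * 2 ** i for i in range(4)]   # e.g. 16, 32, 64, 128
        blocks, prev = [], c_in
        for c in chans:
            blocks.append(ResBlock(prev, c, stride=2))
            prev = c
        self.blocks = nn.Sequential(*blocks)

    def forward(self, x):
        return self.blocks(x)
```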
2.1.2. Synthesis decoders
Two CT synthesis decoders are employed to decode zMR and zCBCT into synthetic CT images. As shown in figure 2(a), each decoder consists of a series of trilinear upsampling operations (each increasing the spatial size by a factor of 2) and ResBlocks. A final 3D convolution without activation or normalization is appended at the end to estimate the single-channel synthetic CT image. Additionally, skip connections connect the encoders to the synthesis decoders at corresponding resolutions to preserve fine-grained details through the encoding-decoding process.
The encoder and synthesis decoder with skip connections form a U-Net architecture, which serves as the generator for conditional GAN CT synthesis (for MR-to-CT and CBCT-to-CT synthesis, respectively). Through the generators, a synthetic moving CT (ĨCT,MR) is generated from the moving IMR and a synthetic fixed CT (ĨCT,CBCT) is generated from the fixed ICBCT. For each of the MR-to-CT and CBCT-to-CT synthesis paths, a CT discriminator (DMR and DCBCT, respectively) conditioned on the input MR/CBCT image is also applied. The discriminator follows a multi-scale Patch-GAN design that distinguishes between real and synthetic CT images at three resolution levels. Both the generator and discriminator models are modified 3D versions of the Pix2Pix conditional GAN network (Isola et al 2017).
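The multi-scale Patch-GAN discriminator can be sketched as follows; the channel widths, depth, and the average pooling used to form the three scales are illustrative assumptions of this sketch.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePatchDiscriminator(nn.Module):
    # Sketch of the multi-scale Patch-GAN discriminator: the same patch
    # classifier is applied to the (CT, conditioning image) pair at three
    # resolutions. Input has 2 channels (CT concatenated with MR or CBCT).
    def __init__(self, c_in=2, width=32):
        super().__init__()
        def patch_d():
            return nn.Sequential(
                nn.Conv3d(c_in, width, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv3d(width, width * 2, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv3d(width * 2, 1, 3, padding=1),  # patch-wise real/fake map
            )
        self.heads = nn.ModuleList([patch_d() for _ in range(3)])

    def forward(self, x):
        outs = []
        for i, head in enumerate(self.heads):
            xi = F.avg_pool3d(x, 2 ** i) if i > 0 else x   # three scales
            outs.append(head(xi))
        return outs
```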
2.1.3. Registration decoders
For deformable registration, a multi-resolution registration decoder is employed to progressively predict the deformation field in a multi-resolution pyramid. The registration decoder uses four 'registration blocks' (shown as pink boxes in figure 1 and detailed in figure 3(a)) to predict the deformation fields at four resolution levels (ϕi, i = 0, 1, 2, 3) using intermediate feature maps from the CT synthesis decoders (paired feature maps from the fixed CBCT synthesis (F3 ~ F0) and the moving MR synthesis (M3 ~ M0)). Each 'registration block' takes three inputs: the intermediate fixed/moving feature maps (Fi and Mi) and the predicted deformation from the previous resolution level (ϕi+1). ϕi+1 is first upsampled by 2 to match the spatial dimension of the current level and is used to warp the moving feature map Mi via a spatial transformer (STN) module. The upsampled previous deformation field (up2(ϕi+1)), the warped moving feature map (Mi∘up2(ϕi+1)), and the fixed feature map (Fi) are concatenated and fed into two convolutions with residual connection (followed by LeakyReLU activation and instance normalization), one 1 × 1 × 1 convolution, and a vector integration module ('VecInt') that exponentiates the output into a diffeomorphic field (Dalca et al 2018). The output of this series of operations is an update (ui) corresponding to the residual deformation between the fixed image and the moving image warped by the previous deformation field. Instead of directly adding the update to the previous deformation field, a deformation composition is computed via the STN to preserve diffeomorphism:
$$\phi_i = \mathrm{up}_2(\phi_{i+1}) \circ u_i \qquad (1)$$
Figure 3. Registration decoder design. (a) The proposed 'registration block' used in the registration decoder takes feature maps from the synthesis decoders (Mi and Fi) and the deformation field (ϕi+1) from the previous resolution level as input to predict the deformation field (ϕi) at the current resolution level. (b)–(d) Deformation fields predicted at three resolution levels, from coarse to fine. (e) 'Registration block' used in the single-resolution JSR ablation study in section 3.3.
Figures 3(b)–(d) illustrate one example of the progressive deformation field estimation from the registration decoder. While the lower-resolution deformation fields capture coarse, more global deformations, the higher-resolution deformation fields refine the coarse fields and contain more fine-grained local deformations. The multi-resolution pyramid of the registration decoder is thus able to capture deformations over a wide range of magnitudes.
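The following sketch illustrates the mechanics of one registration block: STN-based warping, 'VecInt' scaling-and-squaring, and the composition of equation (1). The helper `conv` (bundling the two residual convolutions and the 1 × 1 × 1 convolution that output a 3-channel velocity) is passed in as a module; all names are our own.

```python
import torch
import torch.nn.functional as F

def warp(x, flow):
    # Spatial transformer (STN): trilinearly resample x at locations displaced
    # by `flow` (B x 3 x D x H x W, voxel units, (z, y, x) channel order).
    B, _, D, H, W = flow.shape
    base = torch.stack(torch.meshgrid(
        torch.arange(D, device=flow.device),
        torch.arange(H, device=flow.device),
        torch.arange(W, device=flow.device), indexing='ij'), dim=0).float()
    coords = base.unsqueeze(0) + flow                  # absolute voxel coords
    # normalize each axis to [-1, 1] and reorder to (x, y, z) for grid_sample
    norm = [2.0 * coords[:, i] / (s - 1) - 1.0 for i, s in enumerate((D, H, W))]
    grid = torch.stack([norm[2], norm[1], norm[0]], dim=-1)
    return F.grid_sample(x, grid, mode='bilinear', align_corners=True)

def vecint(v, steps=7):
    # 'VecInt' (Dalca et al 2018): integrate a stationary velocity field by
    # scaling and squaring to obtain a diffeomorphic displacement field.
    u = v / (2 ** steps)
    for _ in range(steps):
        u = u + warp(u, u)
    return u

def registration_block(conv, m_i, f_i, phi_prev):
    # One pyramid level (figure 3(a)): upsample the previous field (scaling
    # its values by 2), warp the moving features, predict a residual update
    # u_i, and compose per equation (1).
    phi_up = 2.0 * F.interpolate(phi_prev, scale_factor=2,
                                 mode='trilinear', align_corners=True)
    m_warp = warp(m_i, phi_up)
    u = vecint(conv(torch.cat([f_i, m_warp, phi_up], dim=1)))
    return u + warp(phi_up, u)     # phi_i = up2(phi_{i+1}) o u_i
```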
2.2. End-to-end learning of JSR
The JSR network parameters are optimized using a multi-task learning strategy that jointly learns CT image synthesis and deformable registration. The image synthesis is learned using either a conditional GAN or a Cycle-GAN, depending on the availability of ground truth CT images with the same spatial alignment as the input images. The deformable registration is learned in an unsupervised manner by optimizing an image similarity loss function.
2.2.1. Image synthesis loss
The loss functions for conditional GAN image synthesis are described first, assuming the availability of reference CT images in both the preoperative coordinates (ICT,pre) and the intraoperative coordinates (ICT,intra). Three loss functions are computed: the adversarial loss, the L1 loss, and the structural-consistency loss. The adversarial loss drives the generators to produce synthetic CT images that can fool the discriminators, while training the discriminators to better distinguish between real and synthetic images. A least-squares formulation is used in this work:
$$\mathcal{L}_{\mathrm{adv}}^{\mathrm{MR}} = \mathbb{E}\big[\big(D_{\mathrm{MR}}(I_{\mathrm{CT,pre}},\, I_{\mathrm{MR}}) - 1\big)^{2}\big] + \mathbb{E}\big[D_{\mathrm{MR}}(\tilde{I}_{\mathrm{CT,MR}},\, I_{\mathrm{MR}})^{2}\big] \qquad (2a)$$

$$\mathcal{L}_{\mathrm{adv}}^{\mathrm{CBCT}} = \mathbb{E}\big[\big(D_{\mathrm{CBCT}}(I_{\mathrm{CT,intra}},\, I_{\mathrm{CBCT}}) - 1\big)^{2}\big] + \mathbb{E}\big[D_{\mathrm{CBCT}}(\tilde{I}_{\mathrm{CT,CBCT}},\, I_{\mathrm{CBCT}})^{2}\big] \qquad (2b)$$
where the discriminators take the concatenation of real/synthetic CT image and input image (MR or CBCT) as input.
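A minimal sketch of the least-squares adversarial terms in equations (2a) and (2b), with the discriminator conditioned by channel-wise concatenation as described above:

```python
import torch

def lsgan_d_loss(d, real_ct, synth_ct, cond):
    # Least-squares discriminator loss: real -> 1, fake -> 0. The
    # discriminator sees the CT concatenated with its conditioning image.
    real = d(torch.cat([real_ct, cond], dim=1))
    fake = d(torch.cat([synth_ct.detach(), cond], dim=1))
    return ((real - 1) ** 2).mean() + (fake ** 2).mean()

def lsgan_g_loss(d, synth_ct, cond):
    # Generator adversarial term: drive the discriminator output toward 1.
    fake = d(torch.cat([synth_ct, cond], dim=1))
    return ((fake - 1) ** 2).mean()
```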
The L1 loss is applied to reduce the difference between synthetic CT and reference CT:
$$\mathcal{L}_{L1}^{\mathrm{MR}} = \big\| \tilde{I}_{\mathrm{CT,MR}} - I_{\mathrm{CT,pre}} \big\|_{1} \qquad (3a)$$

$$\mathcal{L}_{L1}^{\mathrm{CBCT}} = \big\| \tilde{I}_{\mathrm{CT,CBCT}} - I_{\mathrm{CT,intra}} \big\|_{1} \qquad (3b)$$
Additionally, a structural-consistency loss is added to improve the anatomical alignment between input images and synthetic images, which is important for deformable registration. Following Heran Yang et al (2020a, 2020b), MIND is used to calculate local structural feature vectors around each voxel, and the L1 loss between the MIND features of the input and synthetic images is minimized:
$$\mathcal{L}_{\mathrm{structure}} = \big\| \mathrm{MIND}(\tilde{I}_{\mathrm{CT,MR}}) - \mathrm{MIND}(I_{\mathrm{MR}}) \big\|_{1} + \big\| \mathrm{MIND}(\tilde{I}_{\mathrm{CT,CBCT}}) - \mathrm{MIND}(I_{\mathrm{CBCT}}) \big\|_{1} \qquad (4)$$
In this work, MIND was implemented with six nearest neighbors and computed using a local patch of size 7.
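A simplified MIND sketch (6-connected neighborhood, patch size 7) is shown below. It uses cyclic shifts at the volume borders and a simplified variance normalization relative to the full descriptor of Heinrich et al (2012).

```python
import torch
import torch.nn.functional as F

def mind(img, patch=7, eps=1e-6):
    # Simplified MIND: patch-summed squared differences to the 6 nearest
    # neighbors, normalized by a local variance estimate.
    pad = patch // 2
    shifts = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    kernel = torch.ones(1, 1, patch, patch, patch, device=img.device) / patch ** 3
    dists = []
    for dz, dy, dx in shifts:
        shifted = torch.roll(img, shifts=(dz, dy, dx), dims=(2, 3, 4))
        dists.append(F.conv3d((img - shifted) ** 2, kernel, padding=pad))
    dist = torch.cat(dists, dim=1)                 # B x 6 x D x H x W
    var = dist.mean(dim=1, keepdim=True) + eps     # local variance estimate
    feat = torch.exp(-dist / var)
    return feat / (feat.amax(dim=1, keepdim=True) + eps)  # per-voxel normalization

def mind_loss(a, b):
    # L1 distance between MIND features, as in equation (4).
    return (mind(a) - mind(b)).abs().mean()
```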
The synthesis training loss is thus the combination of the adversarial, L1, and structural-consistency losses:
$$\mathcal{L}_{\mathrm{synth}} = \mathcal{L}_{\mathrm{adv}}^{\mathrm{MR}} + \mathcal{L}_{\mathrm{adv}}^{\mathrm{CBCT}} + \lambda_{L1}\big(\mathcal{L}_{L1}^{\mathrm{MR}} + \mathcal{L}_{L1}^{\mathrm{CBCT}}\big) + \lambda_{\mathrm{structure}}\, \mathcal{L}_{\mathrm{structure}} \qquad (5)$$
where λL1 and λstructure are hyperparameters that control the relative importance of the L1 loss and structural-consistency loss. The network is trained by alternating between updating DMR and DCBCT (with the JSR generators fixed) and updating JSR (with DMR and DCBCT fixed).
Alternatively, if no paired reference CT spatially aligned with the MR or CBCT is available, an unsupervised Cycle-GAN can be used for image synthesis. In an initial synthesis pre-training, MR-CT cycled synthesis and CBCT-CT cycled synthesis, each consisting of forward and backward generators and discriminators, are trained separately. Once the Cycle-GAN networks are trained, the MR-to-CT and CBCT-to-CT generator weights are copied to the JSR MR encoder-synthesis decoder and CBCT encoder-synthesis decoder, respectively. Further training of the JSR network image synthesis then minimizes equation (5) with λL1 set to 0. For details of the Cycle-GAN synthesis implementation, please refer to Han et al (2022a). For simplicity, the supervised conditional GAN synthesis is used for the remainder of the paper, and a comparison of supervised conditional GAN and unsupervised Cycle-GAN is discussed in section 5.1.
2.2.2. Image registration loss
The unsupervised deformable registration loss consists of three terms: an intra-modality image similarity loss, an inter-modality image similarity loss, and a deformation smoothness regularization. The intra-modality image similarity loss measures the patch-based NCC between the fixed and moving synthetic CT images. As the deformation fields are estimated progressively in a coarse-to-fine pyramid, the loss is computed at all four resolution levels:
$$\mathcal{L}_{\mathrm{intra}} = -\sum_{i=0}^{3} \mathrm{NCC}\big(\mathrm{down}_{2^{i}}(\tilde{I}_{\mathrm{CT,CBCT}}),\ \mathrm{down}_{2^{i}}(\tilde{I}_{\mathrm{CT,MR}}) \circ \phi_{i}\big) \qquad (6)$$
where down2i denotes trilinear downsampling by a factor of 2i. Image synthesis is inevitably associated with inconsistency and error; thus, information from the original input may be lost during the mapping process. An additional inter-modality image similarity loss is therefore computed between the fixed synthetic CT and the warped moving MR:
$$\mathcal{L}_{\mathrm{inter}} = \big\| \mathrm{MIND}(\tilde{I}_{\mathrm{CT,CBCT}}) - \mathrm{MIND}(I_{\mathrm{MR}} \circ \phi_{0}) \big\|_{1} \qquad (7)$$
The synthetic CT is used as the fixed image instead of the original CBCT because the CBCT exhibits lower contrast, higher noise, and various artifacts that would potentially deteriorate registration performance.
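A sketch of the patch-based NCC and the multi-resolution similarity of equation (6), reusing the `warp` helper from the registration-block sketch above (the window size is an illustrative choice):

```python
import torch
import torch.nn.functional as F

def local_ncc(a, b, win=9, eps=1e-5):
    # Patch-based (local) NCC over sliding windows, VoxelMorph-style.
    kernel = torch.ones(1, 1, win, win, win, device=a.device)
    pad, n = win // 2, win ** 3
    sums = lambda x: F.conv3d(x, kernel, padding=pad)
    sa, sb = sums(a), sums(b)
    cross = sums(a * b) - sa * sb / n
    var_a = sums(a * a) - sa * sa / n
    var_b = sums(b * b) - sb * sb / n
    return (cross * cross / (var_a * var_b + eps)).mean()

def intra_similarity_loss(sct_fixed, sct_moving, phis):
    # Equation (6): negative local NCC summed over the four pyramid levels;
    # phis[i] is the deformation predicted at 1/2^i of full resolution.
    loss = 0.0
    for i, phi in enumerate(phis):
        f, m = sct_fixed, sct_moving
        if i > 0:
            f = F.interpolate(f, scale_factor=1 / 2 ** i,
                              mode='trilinear', align_corners=True)
            m = F.interpolate(m, scale_factor=1 / 2 ** i,
                              mode='trilinear', align_corners=True)
        loss = loss - local_ncc(f, warp(m, phi))
    return loss
```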
To encourage smooth deformations, an additional regularization is applied to the L2 norm of the spatial gradients of the deformation fields ϕi, i = 0, 1, 2, 3 from each resolution level:
$$\mathcal{L}_{\mathrm{smooth}} = \sum_{i=0}^{3} \big\| \nabla \phi_{i} \big\|_{2}^{2} \qquad (8)$$
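Equation (8) can be approximated with forward finite differences, as in the following sketch:

```python
def smoothness_loss(phis):
    # L2 penalty on spatial gradients of each level's displacement field
    # (B x 3 x D x H x W), approximated by forward finite differences.
    loss = 0.0
    for phi in phis:
        dz = phi[:, :, 1:] - phi[:, :, :-1]
        dy = phi[:, :, :, 1:] - phi[:, :, :, :-1]
        dx = phi[:, :, :, :, 1:] - phi[:, :, :, :, :-1]
        loss = loss + (dz ** 2).mean() + (dy ** 2).mean() + (dx ** 2).mean()
    return loss
```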
The combined loss function for the registration task is therefore:
$$\mathcal{L}_{\mathrm{reg}} = \mathcal{L}_{\mathrm{intra}} + \lambda_{\mathrm{intermodal}}\, \mathcal{L}_{\mathrm{inter}} + \lambda_{\mathrm{smooth}}\, \mathcal{L}_{\mathrm{smooth}} \qquad (9)$$
where λintermodal and λsmooth are hyperparameters on the inter-modality similarity loss and the smoothness regularization. At inference, the final predicted deformation ϕ0 warps the moving IMR to produce the registered MR image (IMR∘ϕ).
To optimize the joint parameter space of image synthesis and registration, JSR is trained with a combined loss function:
$$\mathcal{L}_{\mathrm{joint}} = \mathcal{L}_{\mathrm{synth}} + \lambda_{\mathrm{reg}}\, \mathcal{L}_{\mathrm{reg}} \qquad (10)$$
where λreg is a hyperparameter balancing the two loss terms. In this work, ℒjoint is first optimized with λreg = 0 for a small number of iterations (e.g. 5 epochs) to provide a reasonable initialization of the synthetic images. Then λreg is computed using adaptive loss balancing for multi-task learning (Chen et al 2018) to dynamically balance the two tasks.
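The two-phase weighting can be sketched as follows; the `balancer` object stands in for the adaptive multi-task weighting of Chen et al (2018), and the fixed fallback weight is an assumption of this sketch.

```python
def joint_loss(l_synth, l_reg, epoch, warmup=5, balancer=None):
    # Equation (10): synthesis-only warm-up for the first few epochs
    # (lambda_reg = 0), then a dynamically balanced registration weight.
    if epoch < warmup:
        lam_reg = 0.0
    else:
        lam_reg = balancer.weight() if balancer is not None else 1.0
    return l_synth + lam_reg * l_reg
```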
3. Experimental methods
3.1. Image datasets
Two datasets were used in training, validation, and testing of the proposed method. The first dataset consisted of 50 paired T1-weighted MR and multi-detector CT (MDCT) images without evidence of deformation. The MDCT images were used to simulate CBCT images, and simulated deep brain deformations were applied as described in section 3.1.1 to create a large training dataset. The network trained on the first dataset was then refined via transfer learning on a second dataset containing 14 pairs of MR and corresponding CBCT images exhibiting large deformations resulting from neurosurgical interventions. The second dataset was used to further evaluate the proposed JSR method under clinical scenarios of large, realistic deformations.
3.1.1. Simulation dataset
Fifty pairs of T1-weighted MR and MDCT images obtained in a retrospective imaging study approved by the institutional review board (IRB) were used. The MR and MDCT scans were acquired on the same day with no evidence of deformation. Preprocessing steps included rigid registration using the ITK library (Fedorov et al 2012), resampling to 1.5 × 1.5 × 1.5 mm3 isotropic spacing, and cropping to 128 × 160 × 128 voxels to capture the entire brain at the center of the field of view. CBCT images were simulated from the MDCT images via a high-fidelity forward simulator (Wu et al 2021), which utilized physics-based models of the full imaging chain and image formation process. Incident spectrum, scatter, quantum noise, electronic noise, glare, and lag were included in the simulation under Medtronic O-arm (Littleton, MA) system geometry. The CBCT images were clipped to the range −100 to 100 HU and normalized to the range 0–1 to focus the dynamic range on soft tissue. The MR images were similarly normalized to the range 0–1 between the 1st and 99th percentiles of MR image intensity.
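The intensity normalization can be expressed compactly (a minimal sketch under the stated clipping and percentile ranges):

```python
import numpy as np

def normalize_cbct(vol_hu):
    # Clip to [-100, 100] HU and rescale to [0, 1] to focus on soft tissue.
    return (np.clip(vol_hu, -100.0, 100.0) + 100.0) / 200.0

def normalize_mr(vol):
    # Rescale to [0, 1] between the 1st and 99th intensity percentiles.
    lo, hi = np.percentile(vol, [1, 99])
    return np.clip((vol - lo) / (hi - lo), 0.0, 1.0)
```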
Random deep brain diffeomorphic deformations were simulated following the method proposed in Han et al (2022a) to generate additional images for training, by placing multiple attractive/repulsive points within the ventricles to deform the surrounding brain tissue and CSF. The method mimics deep brain deformations induced by CSF egress, the primary source of deformation in neuro-endoscopic surgery. The deformation field generated by each point was modeled according to an inverse power-law with distance, and the combination of multiple random points yielded the overall simulated deformation:
$$\phi = \exp\Big(\sum_{n=1}^{N} \alpha_{n}\, \frac{x - s_{n}}{\| x - s_{n} \|^{\beta_{n} + 1}}\Big) \qquad (11)$$
where s is the source location, α is the maximum deformation magnitude, β is the power-law decay rate, and N is the number of attractive/repulsive seed points. The operation exp( ) denotes vector field exponentiation for diffeomorphic mapping using the 'VecInt' module.
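A sketch of the random deformation simulation of equation (11) is given below, reusing the `vecint` helper above. The parameter ranges and uniform seed placement are illustrative assumptions; the method described above places seed points within the ventricles.

```python
import torch

def simulate_deformation(shape, n_points=4, alpha=(2.0, 10.0), beta=(1.0, 2.0)):
    # Random attractive/repulsive seed points generate velocity fields that
    # decay as an inverse power of distance; the sum is exponentiated with
    # `vecint` (scaling and squaring) to yield a diffeomorphic field.
    D, H, W = shape
    grid = torch.stack(torch.meshgrid(
        torch.arange(D), torch.arange(H), torch.arange(W),
        indexing='ij'), dim=0).float()                 # 3 x D x H x W
    v = torch.zeros(1, 3, D, H, W)
    for _ in range(n_points):
        s = torch.rand(3) * torch.tensor([D, H, W], dtype=torch.float)
        a = torch.empty(1).uniform_(*alpha)            # magnitude alpha_n
        a = a if torch.rand(1) < 0.5 else -a           # repulsive vs attractive
        b = torch.empty(1).uniform_(*beta)             # decay rate beta_n
        diff = grid - s.view(3, 1, 1, 1)
        r = diff.norm(dim=0, keepdim=True).clamp(min=1.0)
        v = v + (a * diff / r ** (b + 1.0)).unsqueeze(0)
    return vecint(v)
```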
Figure 4 shows the simulation of one example MR/CBCT image pair. The original MR image in figure 4(a) was warped by a random deformation to yield the moving MR image (IMR) shown in figure 4(b), whereas the fixed CBCT image (ICBCT) in figure 4(d) was simulated from the MDCT image in figure 4(c). Additionally, random rigid transformations were applied, with translations of 0–10 mm and rotations of 0°–10°. In total, images from 45 subjects were used to simulate 400 training pairs and 10 validation pairs, and the remaining 5 subjects were used to simulate 10 testing pairs. Because 3D volumetric images contain a large amount of information (128 slices each) for 3D network training, the resulting dataset size is comparable to that used in other work on 3D image registration with neural networks—e.g. a total of 32, 50, 16, and 30 image pairs used in four respective registration tasks in the MICCAI 2021 Learn2Reg Challenge (Hering et al 2021).
Figure 4. Paired MR/CBCT dataset from simulations. (a) The original MR image. (b) The moving MR image (IMR), deformed from the original MR with a random simulated deformation. (c) The original MDCT image (with or without simulated metal instrumentation). (d) The fixed CBCT image (ICBCT) simulated from the MDCT using a high-fidelity forward simulator (Wu et al 2021)—shown without instrumentation. (e) The fixed CBCT image (ICBCT) with a simulated metal object, illustrating realistic metal artifacts associated with scatter, beam-hardening, and photon starvation.
An additional study was conducted to investigate the influence of surgical instrumentation (viz., metal) on algorithm performance. The simulation study was expanded to include a variety of such instruments and realistic metal artifacts in the image data. Metal objects were modeled in a variety of shapes and material compositions, including 5 mm radius spheres and 5 mm radius × 70 mm length rods, randomly placed in deep brain regions proximal to the thalamus and caudate nucleus in the MDCT image. The attenuation coefficient of these simulated instruments was varied randomly between 0.06 and 0.16 mm−1 (for reference, aluminum is ~0.08 mm−1). Random rigid transformations were also applied to the metal objects to vary their position and orientation. Simulated CBCT images were formed from the MDCT (with simulated metal) including effects of scatter, beam-hardening, and photon starvation to yield realistic metal artifacts as illustrated in figure 4(e). Simulation of metal instruments in the CBCT projection data (and corresponding metal artifacts in the CBCT reconstruction) was performed using the high-fidelity forward projection simulation pipeline previously reported and validated in Wu et al (2021), which includes physics-based models of the full image formation process. A total of 100 additional image pairs with metal were generated (from the same 45 subjects) for training in MR (no metal) and CBCT (with metal), with an additional 10 pairs (from the remaining 5 subjects) used for testing.
Quantitative image synthesis performance was evaluated in terms of mean absolute error (MAE) and structural similarity index measure (SSIM) between synthetic CT and reference CT. Quantitative registration performance was evaluated in terms of Dice coefficient (DSC), mean surface distance (SD), and Hausdorff distance (HD) on segmentations of the lateral/third/fourth ventricles, amygdala, hippocampus, caudate nucleus, and thalamus. The segmentations were obtained automatically using MALPEM (Ledig et al 2015) on the original MR images before simulated deformation. The segmentations were propagated to the fixed CBCT coordinates, as no deformation existed between the CBCT and the original MR. The segmentations were further warped by the simulated deformation with nearest-neighbor interpolation to the moving preoperative MR coordinates. Additionally, target registration error (TRE) was evaluated as the Euclidean distance between target points on the registered and fixed images. Thirty target points were defined by the centroids of small (<1500 mm3) anatomical segmentations throughout the brain (e.g. the amygdala, ventral diencephalon, and medial orbital gyrus). The diffeomorphism of the deformation field was further quantified by the Jacobian determinant (|Jϕ|), where voxels with |Jϕ| ⩽ 0 indicate non-realistic folding or tearing.
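Sketches of the evaluation metrics (DSC, TRE, and the Jacobian determinant of the deformation) are given below, assuming PyTorch tensors and isotropic 1.5 mm voxels for TRE:

```python
import torch

def dice(seg_a, seg_b, label):
    # DSC for one labeled structure in two integer label maps.
    a, b = (seg_a == label), (seg_b == label)
    return 2.0 * (a & b).sum() / (a.sum() + b.sum()).clamp(min=1)

def tre(points_reg, points_fixed, spacing_mm=1.5):
    # TRE: Euclidean distance (mm) between corresponding target points
    # (N x 3, voxel coordinates), assuming isotropic spacing.
    return spacing_mm * (points_reg - points_fixed).float().norm(dim=1)

def jacobian_determinant(phi):
    # |J_phi| of a displacement field (B x 3 x D x H x W) via finite
    # differences; voxels with |J_phi| <= 0 indicate folding or tearing.
    grads = torch.gradient(phi, dim=(2, 3, 4))       # d(phi)/dz, dy, dx
    J = torch.stack(grads, dim=2)                    # B x 3 x 3 x D x H x W
    J = J + torch.eye(3).view(1, 3, 3, 1, 1, 1)      # Jacobian of x + phi(x)
    return torch.linalg.det(J.permute(0, 3, 4, 5, 1, 2))
```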
3.1.2. Clinical dataset
A second dataset of paired MR and CBCT images with real deformations induced by neurosurgical intervention was constructed to further evaluate the performance of the registration network in real clinical settings. A total of 14 cases with preoperative T1-weighted MR and intraoperative CBCT were collected in an IRB-approved retrospective clinical study. Each case exhibited substantial deformation associated with burr hole incision, endoscope and/or shunt placement, and disease progression between the scans. The MR images were selected from the neurosurgery dataset described in Han et al (2022a), acquired on different scanners with a wide range of imaging protocols (i.e. spin echo/MP-RAGE pulse sequences, 1.5 T/3 T, 2D axial/2D sagittal, and voxel sizes ranging from 0.45 × 0.45 × 0.9 mm3 to 0.94 × 0.94 × 5 mm3). The CBCT images were acquired using a prototype mobile U-arm (Xu et al 2016, Wu et al 2020) at 100 kV and reconstructed at 0.44 × 0.44 × 0.44 mm3 voxel size with filtered back-projection. Similar to the preprocessing in section 3.1.1, the MR and CBCT image pairs were rigidly registered, resampled to 1.5 × 1.5 × 1.5 mm3 isotropic spacing, cropped to 128 × 160 × 128 voxels, and normalized to 0–1. Intraoperative reference CT images as described in Han et al (2022a) were also available and were processed in the same manner as the CBCT images. To improve the generalizability of the network and avoid overfitting, data augmentation was applied during training via random rigid transformations (translations of 0–10 mm and rotations of 0°–10°) and scaling (0.8–1.2).
Registration performance was evaluated in terms of DSC, SD, HD, and TRE. Ventricle segmentations were obtained via MALPEM on the MR images and via manual segmentation on the CBCT images. For analysis of TRE, six target points were defined in both MR and CBCT at unambiguous landmarks (e.g. the anterior commissure, interventricular foramen, and the most posterior point on the lateral ventricle posterior horn).
3.2. Implementation details
The proposed JSR network was implemented in PyTorch and trained on an NVIDIA Quadro RTX 6000 GPU with 24 GB of memory. The network predicted a 3D deformation field over the entire image domain (128 × 160 × 128 voxels), and the batch size was set to 1 due to the high memory usage of training the 3D network. The Adam optimizer with default parameters was used. In the simulation study, the learning rate was first set to 2 × 10−4 for the first 5 epochs of image-synthesis-only pre-training, optimizing equation (5). The learning rate was then reduced to 1 × 10−4 to optimize the joint synthesis and registration loss function in equation (10) for 100 epochs. The sensitivity of registration performance to image synthesis pre-training is discussed in section 5.1.
Two hyperparameters are associated with the image synthesis loss: the L1 loss hyperparameter (λL1) and the structural-consistency loss hyperparameter (λstructure). λL1 was searched in the range 30–300, and λstructure was searched in the range 0–20. Nominal values of λL1 = 200 and λstructure = 5 were identified to minimize the L1 error of the synthetic CT images in the validation dataset. For the registration losses, the inter-modality similarity loss hyperparameter λintermodal was searched over the range 0–1, and the smoothness regularization hyperparameter λsmooth was searched over the range 0.5–3. Nominal values of λintermodal = 0.1 and λsmooth = 1 were selected to maximize DSC over all segmented structures after registration in the validation dataset. The JSR network was further trained via transfer learning on the simulated data with metal, using the same augmentation and hyperparameters as the preceding metal-free simulation, except for the learning rate. Using the previously trained network as initialization, a smaller learning rate of 1 × 10−5 was used in transfer learning to refine the network weights.
In the clinical study, JSR was not trained from scratch due to the limited training data and significant changes in scan protocols. Transfer learning is widely used to address the challenges of limited datasets (Weiss et al 2016) by training a network in one domain and transferring it to a closely related target domain. In this work, the network trained on the larger simulation dataset was used as initialization, and transfer learning was applied to refine the network using images from the clinical dataset with a smaller learning rate. The transfer learning included two steps—synthesis pre-training and joint synthesis and registration training. In synthesis pre-training, only the synthesis decoders were refined (minimizing equation (5)) while the other parts of the JSR network were frozen. Due to the lack of reference CT spatially aligned with the MR or CBCT images, λL1 was set to 0. The pre-training used a learning rate of 1 × 10−4 for 30 epochs. Then, in joint synthesis and registration training, the entire network was refined with a learning rate of 1 × 10−5 on the synthesis and registration decoders and 1 × 10−6 on the MR and CBCT encoders, as sketched below. A three-fold cross-validation was performed by splitting the 14 image pairs into 10 training cases, 2 validation cases, and 2 test cases.
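A sketch of the two-step transfer learning follows; the module attribute names are assumptions matching the forward-pass sketch in section 2.

```python
import torch

def synthesis_pretrain_optimizer(jsr):
    # Step 1: refine only the synthesis decoders; freeze everything else.
    for p in jsr.parameters():
        p.requires_grad = False
    for dec in (jsr.dec_mr2ct, jsr.dec_cbct2ct):
        for p in dec.parameters():
            p.requires_grad = True
    return torch.optim.Adam(
        [p for p in jsr.parameters() if p.requires_grad], lr=1e-4)

def joint_finetune_optimizer(jsr):
    # Step 2: unfreeze all modules with smaller, per-module learning rates.
    for p in jsr.parameters():
        p.requires_grad = True
    return torch.optim.Adam([
        {'params': jsr.enc_mr.parameters(),      'lr': 1e-6},
        {'params': jsr.enc_cbct.parameters(),    'lr': 1e-6},
        {'params': jsr.dec_mr2ct.parameters(),   'lr': 1e-5},
        {'params': jsr.dec_cbct2ct.parameters(), 'lr': 1e-5},
        {'params': jsr.dec_reg.parameters(),     'lr': 1e-5},
    ])
```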
3.3. Comparison of registration methods
To evaluate the performance of the proposed JSR network, a series of registration methods was implemented for comparison. JSR was first compared to a single-resolution JSR (JSR-Single) ablation to investigate the effect of the multi-resolution registration decoder (described in section 2.1.3); the findings of this comparison are discussed in section 5.2. A second question investigated below was whether transforming inter-modality registration into intra-modality registration via image synthesis improves registration performance. To this end, two state-of-the-art direct inter-modality registration methods were implemented—an iterative optimization-based registration (SyN-MI) and a deep learning-based registration (VM-MI). Additionally, two deep learning-based registration methods using image synthesis to the MR or CT domain (VM-Synth-NCC and VM-DualSynth-NCC) were implemented to test which domain (MR or CT) is more suitable for the task of MR-CBCT registration.
3.3.1. Single-resolution JSR (JSR-single)
An additional JSR network with a single-resolution registration decoder, denoted JSR-Single, was implemented to investigate the effect of the multi-resolution registration decoder. The single-resolution registration decoder does not explicitly estimate a deformation field at each resolution level. Its 'registration block' is illustrated in figure 3(e), where the fixed feature map (Fi) and the moving feature map without warping (Mi) are concatenated with the fused feature map from the previous level (Ci+1) and fed into two convolutions with residual connection and one 1 × 1 × 1 convolution to yield a fused feature map at the current level (Ci). To maintain a fair comparison between the single-resolution and multi-resolution registration decoders, the same number of convolution parameters was used. Since only one deformation field is predicted at the end of decoding, the registration loss functions (equations (6)–(8)) were computed only at the full resolution.
3.3.2. SyN-MI
Symmetric normalization (SyN), an iterative optimization-based deformable registration algorithm popular in brain registration (Avants et al 2008, Murphy et al 2011), was first applied to directly register MR to CBCT. MI was selected as the image similarity metric. A parameter search was performed to maximize DSC after registration in the simulation study: step size 0.5, update field Gaussian smoothing 2 voxels, total field Gaussian smoothing 1 voxel, 4 × 2 × 1 multi-resolution pyramid, convergence criterion of 1 × 10−5, and 32 histogram bins for MI calculation. Parameters were optimized per image pair in the clinical study, with step size 0.25–1 and update and total field Gaussian smoothing 0–5 voxels.
3.3.3. VM-MI
A multi-modality VoxelMorph (Guo 2019) using an MI similarity metric was also evaluated as representative of unsupervised deep learning registration algorithms. MR and CBCT images were input directly to the network without image synthesis. An MI patch size of 8 (searched over 4–10), λsmooth = 1 (searched over 0–2), and a learning rate of 1 × 10−4 were selected to maximize DSC in the validation dataset.
3.3.4. VM-Synth-NCC
An image synthesis-based registration algorithm, VM-Synth-NCC, was implemented utilizing CBCT-to-MR synthesis. A CBCT-to-MR image synthesis network was first trained using a conditional GAN, with the generator and discriminator design identical to the image synthesis part of the JSR network (encoder, synthesis decoder, and discriminator). The moving MR image was then registered to the fixed synthetic MR (from CBCT) via an intra-modality VoxelMorph network minimizing an NCC similarity loss. This registration pipeline, named VM-Synth-NCC, investigated the feasibility of using CBCT-to-MR synthesis for registration in the MR domain.
3.3.5. VM-DualSynth-NCC
An additional image synthesis-based registration algorithm, VM-DualSynth-NCC, used two image synthesis networks: an MR-to-CT synthesis and a CBCT-to-CT synthesis. A VoxelMorph network with an NCC image similarity loss was then trained to register the two synthetic CT images. VM-DualSynth-NCC shares a similar idea with the proposed JSR network in performing registration in an intermediate CT domain but learned the image synthesis and registration networks sequentially rather than jointly as in JSR. Both VM-Synth-NCC and VM-DualSynth-NCC used λL1 = 200 and λstructure = 5 for image synthesis (the same hyperparameters used in JSR) and λsmooth = 1 for VoxelMorph registration.
4. Results
4.1. Simulation studies
4.1.1. Accuracy of image synthesis
The performance of MR-to-CT and CBCT-to-CT synthesis from JSR was first examined. As shown for a simulation study test case in figure 5, both synthesis paths generated synthetic CT images that resembled the reference CT images. In the following sections, the presented images and reported evaluation metrics correspond to the downsampled images (128 × 160 × 128 voxels) used for training (due to GPU memory constraints). Given an input preoperative MR image (figure 5(a)), the synthetic CT (figure 5(b)) shows anatomical structures consistent with the reference CT (figure 5(c)), which is important for the registration task. The MAE of the synthetic CT with respect to the reference CT is depicted in figure 5(d), showing errors of less than 10 HU throughout the brain parenchyma and higher errors only outside the brain, which are not relevant to brain registration performance. Similarly, the CBCT-to-CT synthesis result is shown in figures 5(e)–(h). The MAE between the synthetic CT from CBCT and the reference CT shows a low level of error (<20 HU) throughout the brain. The errors in the skull are zero because the CT images were clipped at 100 HU.
Figure 5. MR-to-CT (top row) and CBCT-to-CT (bottom row) image synthesis for a test case in the simulation study. For MR-to-CT synthesis: (a) input preoperative MR image, (b) synthetic CT image, (c) reference ground truth CT image in the preoperative coordinates, and (d) MAE between the synthetic CT from MR and the reference preoperative CT image. For CBCT-to-CT synthesis: (e) input intraoperative CBCT image, (f) synthetic CT image, (g) reference ground truth CT image in the intraoperative coordinates, and (h) MAE between the synthetic CT from CBCT and the reference intraoperative CT image. The yellow arrows in (e)–(h) highlight a location with higher synthesis error due to structures occluded by CBCT artifacts.
Quantitatively, MR-to-CT synthesis exhibited MAE of 18.4 ± 1.5 HU and SSIM of 0.86 ± 0.02 within the brain soft-tissue region. The CBCT-to-CT synthesis yielded MAE of 16.0 ± 0.7 HU and SSIM of 0.89 ± 0.01. While the CBCT-to-CT synthesis yielded better overall MAE and SSIM than MR-to-CT synthesis, some fine structures in CT may not be accurately restored in the synthetic CT from CBCT due to artifacts present in the CBCT. The yellow arrows in figures 5(e)–(h) indicate an example of structures lost in the synthesis process, potentially due to a strong shading artifact that obscures the contrast in CBCT. The accuracy of synthesis (in terms of MAE) of soft tissues in the brain is on par with other state-of-the-art approaches—for example: 13–30 HU in CBCT-to-CT synthesis (Harms et al 2019, Liang et al 2019, Chen et al 2020, Spadea et al 2021) and 14–50 HU in MR-to-CT synthesis (Boulanger et al 2021, Bourbonne et al 2021). Note that MAE in this paper refers only to measurement within the brain region (excluding the skull and air), since the goal is accurate soft-tissue brain registration, and the absolute accuracy of the image synthesis in the skull and air is less important. For this reason, the reported MAE values are lower than would be expected if computed over the entire head (including brain, skull, and air).
4.1.2. Accuracy of deformable MR-CBCT registration
The performance of the proposed JSR network was first evaluated in comparison to the alternative and ablation methods. Table 1 summarizes registration results for the test cases in the simulation dataset, where the average DSC, SD, and HD metrics were computed over all seven brain anatomical structures (lateral ventricles, third ventricle, fourth ventricle, amygdala, hippocampus, caudate nucleus, and thalamus). SyN-MI and VM-MI showed minimal overall improvement compared to the initial rigid registration, suggesting that the MI metric alone was not able to accurately register the given MR and CBCT images, either in conventional iterative optimization (SyN-MI) or in deep learning-based unsupervised registration (VM-MI).
Table 1.
Registration performance of registration methods evaluated on the simulation study.
| Method | DSC | SD (mm) | HD (mm) | TRE (mm) | |Jϕ| ⩽ 0 | Runtime (s) |
|---|---|---|---|---|---|---|
| Rigid | 0.49 ± 0.20 | 1.15 ± 0.47 | 4.24 ± 1.22 | 3.25 ± 1.83 | — | — |
| SyN-MI | 0.49 ± 0.20 | 1.01 ± 0.43 | 4.05 ± 1.35 | 3.07 ± 1.75 | 0% | 982 ± 155 |
| VM-MI | 0.48 ± 0.20 | 0.95 ± 0.40 | 4.00 ± 1.02 | 2.95 ± 1.69 | 0.03% | 2.59 ± 0.02 |
| VM-Synth-NCC | 0.56 ± 0.19 | 0.58 ± 0.36 | 3.76 ± 0.92 | 2.65 ± 1.12 | 0.01% | 3.07 ± 0.02 |
| VM-DualSynth-NCC | 0.68 ± 0.09 | 0.45 ± 0.25 | 2.89 ± 0.75 | 2.20 ± 0.71 | 0.01% | 3.07 ± 0.02 |
| JSR-Single | 0.67 ± 0.11 | 0.47 ± 0.26 | 2.91 ± 0.86 | 2.23 ± 0.80 | 0.05% | 2.55 ± 0.03 |
| JSR | 0.69 ± 0.11 | 0.43 ± 0.23 | 2.66 ± 0.69 * | 2.05 ± 0.96 * | 0.01% | 2.66 ± 0.03 |
The proposed method (JSR) is marked in bold, and asterisks (*) denote statistical significance (p < 0.05) in paired t-tests between JSR and the best performing of the other methods. Values for DSC, SD, HD, and TRE are the mean (and standard deviation) computed over the segmented anatomical structures.
The remaining four image synthesis-based registration methods, on the other hand, all demonstrated improved registration compared to rigid initialization. The VM-Synth-NCC method, computing registration in the MR domain, yielded DSC (0.56 ± 0.19), SD (0.58 ± 0.36 mm), HD (3.76 ± 0.92 mm), and TRE (2.65 ± 1.12 mm). The VM-DualSynth-NCC method, which computed the registration in the CT domain, further improved registration performance. The ablation method, JSR-Single, which estimated the deformation field only at the original resolution, demonstrated comparable accuracy to VM-DualSynth-NCC. Finally, the proposed JSR method with multi-resolution registration decoder achieved the highest DSC (0.69 ± 0.11) and the lowest SD (0.43 ± 0.23 mm), HD (2.66 ± 0.69 mm), and TRE (2.05 ± 0.96 mm). A boxplot summarizing TRE for all compared methods is shown in figure 6(b). All registration methods yielded diffeomorphic deformations, with few voxels with |Jϕ| ⩽ 0. The runtimes of VM-MI, JSR-Single, and JSR were ~2.5–2.7 s, compared to ~3 s for the sequential synthesis and registration methods (VM-Synth-NCC and VM-DualSynth-NCC) and ~16 min for SyN-MI.
Figure 6. Quantitative registration results from the simulation study. (a) DSC for individual anatomical structures for the various registration methods. (b) TRE of the various registration methods.
Figure 6(a) further quantifies the registration DSC of each method for individual anatomical structures. SyN-MI and VM-MI were particularly challenged in MR-CBCT registration, likely due to the large discrepancy in image appearance, which is beyond the descriptive power of the MI similarity metric. The VM-Synth-NCC method yielded substantial improvement in many structures, including the third ventricle, fourth ventricle, and amygdala. Due to the challenges of CBCT-to-MR synthesis, however, many structures (e.g. the lateral ventricles) were not accurately synthesized, resulting in suboptimal registration performance. The remaining methods, which all computed registration in the synthetic CT domain, demonstrated superior registration performance. Among VM-DualSynth-NCC, JSR-Single, and JSR, the proposed JSR method showed overall higher DSC: JSR achieved better alignment of the third ventricle, amygdala, and thalamus than VM-DualSynth-NCC, and better alignment of the fourth ventricle, thalamus, and caudate nucleus than JSR-Single.
Figure 7 shows a qualitative comparison of Rigid, SyN-MI, JSR-Single, and JSR in an example test case. The boundaries of segmentations (defined in the intraoperative fixed CBCT images) are overlaid on the registered MR images. Compared to Rigid, SyN-MI improved alignment in some regions while failing to align the moving MR to the fixed image in others, as shown in figure 7(b), resulting in no significant change in DSC. Additionally, the skull appeared drastically deformed, even though no deformation existed between the MR and CBCT in the skull region, suggesting that MI was suboptimal in modeling the deformation between the input images. As shown in figure 7(c), JSR-Single improved anatomical alignment compared to SyN-MI while maintaining the rigidity of the skull. JSR further improved registration over JSR-Single, as evident in the regions marked by arrows in figures 7(c)–(d). With the multi-resolution registration decoder, JSR was able to gradually refine the estimated deformation in the decoding process and hence model more complex deformation patterns.
Figure 7. Example registration results among the proposed and comparison methods in a test case of the simulation study. Registered MR images from (a) Rigid, (b) SyN-MI, (c) JSR-Single, and (d) JSR are overlaid with segmentation contours of anatomical structures (defined in the fixed CBCT coordinates). Arrows mark example regions of residual registration error that improved from JSR-Single to JSR.
4.1.3. Accuracy of deformable registration in the presence of metal
The performance of JSR in the presence of metal is depicted in figure 8. Figures 8(a)–(c) show a metal-free simulation case: the CBCT, the synthetic CT (from CBCT), and the registered MR. Figures 8(d)–(f) show the corresponding case with a simulated metal instrument adjacent to the anterior horn of the right lateral ventricle. The synthetic CT in figure 8(e) successfully reduced metal artifacts (streaks) compared to the CBCT while maintaining visibility of the ventricle boundaries. Both registrations (with and without metal) demonstrated comparable improvement in anatomical alignment compared to the moving MR image before deformable registration (figure 8(g)). Figure 8(h) further depicts the norm of the difference between the deformation fields of the two cases, showing only a minor difference throughout the brain, except in a tightly localized region adjacent to the metal. The presence of metal therefore had only limited impact on JSR performance, restricted to the immediate region adjacent to the metal. In addition, JSR achieved diffeomorphic deformations even in the presence of metal, with only a negligible fraction of voxels (0.01%) with |Jϕ| ⩽ 0.
Figure 8. JSR registration with and without (simulated) metal instruments in the fixed CBCT image. An example case without metal is illustrated in (a) CBCT, (b) synthetic CT from CBCT, and (c) registered MR. The corresponding case with metal is shown in (d) CBCT, (e) synthetic CT from CBCT, and (f) registered MR. (g) The moving MR image and (h) the norm of the difference between the deformation fields from the two cases, with the pink arrow marking the location of the metal and showing a tightly localized difference in predicted deformation. (i) DSC for individual anatomical structures following various registration methods. Segmentation contours of anatomical structures (defined in the fixed CBCT coordinates) are overlaid in (c), (f), and (g).
A similar finding is observed in the DSC plot of figure 8(i), where JSR performance without metal (JSR) and with metal (JSR-Metal) achieved comparable DSC for most anatomical structures. The accuracy of alignment of the thalamus and caudate nucleus, however, was reduced because the simulated metal was placed adjacent to those structures (as noted in section 3.1.1), occluding soft-tissue structures needed for registration. That the registration performed well overall is attributable to the CBCT-to-CT synthesis integral to JSR, which reduces metal artifacts and alleviates some of the challenges that metal poses to registration. The performance of JSR compared favorably to direct inter-modality registration (VM-MI), as shown in the VM-MI-Metal result of figure 8(i), which suffered a strong reduction in accuracy due to metal artifacts.
4.2. Clinical studies
The JSR network trained in the simulation study was further trained and evaluated using clinical images with real deformations. As discussed in section 3.1.2, the clinical dataset contains a diverse range of MR images with different acquisition protocols, resulting in large variations of intensity, resolution, and texture. Additionally, a few patients exhibited extreme pathological conditions that deformed the brain well beyond the typical deep brain deformation of endoscopic surgery. Both complications posed challenges to the registration methods. The registration performance is summarized in table 2, where SyN-MI and VM-MI did not improve on rigid registration. JSR, on the other hand, demonstrated good generalizability on the clinical images despite these challenges, aided by transfer learning, which leveraged the network previously trained on a large number of simulated images (section 4.1) and adapted it to the real clinical images. JSR yielded statistically significant improvement over Rigid, with DSC 0.68 ± 0.14, SD 0.63 ± 0.31 mm, HD 3.14 ± 1.32 mm, and TRE 2.45 ± 1.08 mm. Figure 9(a) further illustrates the DSC of individual ventricles after registration, showing a similar trend as the overall metrics. All methods yielded a similar level of diffeomorphism, with an arguably negligible percentage of voxels with non-positive Jacobian determinant. JSR achieved a runtime of 2.66 ± 0.03 s, compared to ~17 min for SyN-MI.
Table 2.
Registration performance of the methods investigated on the clinical study.
| Method | DSC | SD (mm) | HD (mm) | TRE (mm) | |Jϕ| ⩽ 0 | Runtime (s) |
|---|---|---|---|---|---|---|
| Rigid | 0.61 ± 0.19 | 1.24 ± 0.55 | 5.04 ± 2.02 | 5.40 ± 2.53 | — | — |
| SyN-MI | 0.56 ± 0.20 | 1.32 ± 0.57 | 4.85 ± 2.35 | 5.11 ± 2.78 | 0.10% | 1020 ± 136 |
| VM-MI | 0.61 ± 0.19 | 1.20 ± 0.52 | 4.92 ± 2.02 | 5.07 ± 2.09 | 0.08% | 2.59 ± 0.02 |
| JSR | 0.68 ± 0.14 * | 0.63 ± 0.31 * | 3.14 ± 1.32 * | 2.45 ± 1.08 * | 0.02% | 2.66 ± 0.03 |
Asterisks (*) denote statistical significance (p < 0.05) assessed with paired t-tests between JSR and the other methods. DSC, SD, HD, and TRE values are the mean (and standard deviation) computed over the pertinent structures.
Figure 9. Registration results in clinical data. (a) Boxplots of DSC of the ventricles in the clinical study. Example registration results for the proposed JSR method on a test case in the clinical study: (b) the moving MR image, (c) the synthetic CT image from MR, (d) the registered MR image, (e) the fixed CBCT image, (f) the synthetic CT image from CBCT, and (g) the reference CT (not used in training or testing). Contours of the lateral ventricles (defined in the fixed CBCT coordinates) are overlaid in yellow in (b)–(g).
The JSR result for an example test case is shown in figures 9(b)–(g). Figure 9(b) shows the moving MR image (acquired in the sagittal plane using a spin echo pulse sequence and reconstructed at 0.94 × 0.94 × 5 mm3 voxel size). The synthetic CT from MR is shown in figure 9(c), which roughly matches CT image appearance, albeit with a slight loss of CSF contrast due to the lack of training data. The fixed CBCT image and the synthetic CT from CBCT are shown in figures 9(e) and (f), respectively. The presence of a ventricular shunt and brain shift at the cortical surface (separating the brain from the cranium) were not seen in the simulation or training data. Interestingly, the artifacts (streaks) surrounding the shunt in CBCT were successfully removed in the synthetic CT image, suggesting that the CBCT-to-CT synthesis was robust to unseen CBCT artifacts. The brain shift was not correctly handled, because such deformations likely violate diffeomorphism and the network did not see such a scenario during training. The network performance in the presence of such conditions is further discussed in section 5.4. Despite these challenges, registration about the ventricles achieved accuracy comparable to the simulation study, noting that deformations within the deep brain are more pertinent than deformations at the cortex for the clinical use case of neuroendoscopic deep brain surgery. As evident in figure 9(d), the registered MR image showed close alignment with the fixed CBCT ventricle segmentation contours.
5. Discussion
The results reported above summarize JSR performance trained with nominal hyperparameters, which depends on the image synthesis pre-training described in section 3.2. To better understand the utility of the synthesis pre-training, the sensitivity of registration accuracy to image synthesis performance is discussed in section 5.1 below. Furthermore, the comparison among alternative architectures warrants further discussion, as in section 5.2 for the single-resolution and multi-resolution registration decoders in JSR-Single and JSR, respectively. We then discuss the choice of intermediate (MR-like or CT-like) domain for synthesis-based registration methods in section 5.3. Finally, JSR network performance in the presence of challenging clinical conditions (metal and missing tissue) is discussed in section 5.4.
5.1. Sensitivity of registration accuracy to image synthesis performance
The sensitivity of registration accuracy to image synthesis pre-training, used as a means of initializing the fixed and moving synthetic CT images, was evaluated first. JSR was trained with initializations from different numbers of epochs of synthesis pre-training. The validation DSC curve (summarized over all associated anatomy) for JSR trainings with 0 (no pre-training), 1, 3, 5, and 10 epochs of synthesis pre-training is shown in figure 10(a), and the final DSC of the ventricles is shown in figure 10(b). Synthesis pre-training of 1–10 epochs yielded comparable overall registration performance in terms of DSC, whereas the case without synthesis pre-training (epoch 0) yielded the lowest DSC. The study suggests that while synthesis pre-training is required, JSR registration performance is relatively insensitive to the amount of synthesis pre-training, because the joint training appears to compensate for the remaining synthesis learning, and as little as one epoch is sufficient for convergence. Another interesting finding was that registration performance slightly decreased from 5 to 10 epochs of pre-training. One explanation is that if the synthesis pre-training proceeds for too many epochs, the encoders can become trapped in a local minimum that encodes only synthesis-related information, thus reducing the final registration performance.
Figure 10. Sensitivity of registration accuracy to image synthesis pre-training. (a) Validation DSC with respect to the number of JSR training epochs, initialized with different amounts of synthesis pre-training. (b) DSC of the test cases after JSR training convergence with different amounts of synthesis pre-training.
As discussed in section 2.2.1, the image synthesis part of the JSR network can be trained either in a supervised manner using a conditional GAN (with reference CT in the preoperative and intraoperative coordinates) or in an unsupervised manner using a Cycle-GAN. The Cycle-GAN training differs from the conditional GAN training in two respects: (1) in the pre-training phase, the Cycle-GAN required two additional synthesis generators (CT-to-MR and CT-to-CBCT), which were later removed in JSR training; and (2) λL1 in equation (5) was set to 0 for the Cycle-GAN. The accuracy of synthesis and JSR using the two synthesis trainings is summarized in table 3. As expected, the MAE for image synthesis from the unsupervised Cycle-GAN training was slightly higher than from the supervised conditional GAN. The registration accuracy, however, was relatively insensitive to the two synthesis methods, and the DSC and SD metrics obtained from the Cycle-GAN JSR were only slightly lower than from the conditional GAN JSR (without statistical significance, p > 0.1, Student's t-test). It is worth noting that, in comparison to most unpaired Cycle-GAN synthesis approaches in the literature (with input and reference images from different subjects (Boulanger et al 2021, Spadea et al 2021)), the current work utilized paired images from the same subjects and was unpaired only in terms of the deformations between the input and reference images. As a result, a lower MAE could be achieved from the Cycle-GAN synthesis (in comparison to ~50 HU for entirely unpaired training in the literature).
Table 3.
JSR registration accuracy with respect to supervised conditional GAN image synthesis and unsupervised Cycle-GAN image synthesis. The MAE columns report image synthesis accuracy; the DSC and SD columns report JSR registration accuracy.

| Synthesis method | MAE (MR-to-CT) (HU) | MAE (CBCT-to-CT) (HU) | DSC | SD (mm) |
|---|---|---|---|---|
| Conditional GAN | 18.4 ± 1.5 | 16.0 ± 0.7 | 0.69 ± 0.11 | 0.43 ± 0.23 |
| Cycle-GAN | 25.8 ± 7.3 | 22.3 ± 5.5 | 0.68 ± 0.10 | 0.45 ± 0.25 |
With a large training dataset and the incorporation of structural-consistency loss in image synthesis (equation (4)), both the conditional GAN and the Cycle-GAN were able to generate synthetic CT images with image appearance and structural information consistent with real CT images. Additionally, NCC was used as the similarity metric in JSR, which can tolerate some level of intensity error in the synthetic images.
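For concreteness, the windowed (local) NCC similarity can be sketched as follows, in the common formulation popularized by VoxelMorph (Balakrishnan et al 2018); the window size and stabilizing epsilon here are illustrative choices rather than the settings used in JSR:

```python
# Illustrative windowed NCC (not the exact JSR implementation): local sums
# are computed with a box filter, and the loss is the negative mean of the
# squared local correlation coefficient.
import torch
import torch.nn.functional as F

def ncc_loss(a, b, win=9, eps=1e-5):
    """Negative local NCC between volumes of shape (B, 1, D, H, W)."""
    filt = torch.ones(1, 1, win, win, win, device=a.device)
    n = win ** 3
    pad = win // 2

    def sums(x):
        return F.conv3d(x, filt, padding=pad)   # windowed sums

    sa, sb = sums(a), sums(b)
    cross = sums(a * b) - sa * sb / n           # n * windowed covariance
    var_a = sums(a * a) - sa * sa / n
    var_b = sums(b * b) - sb * sb / n
    cc = cross * cross / (var_a * var_b + eps)  # squared local correlation
    return -cc.mean()

# Example: similarity between two (random stand-in) synthetic CT volumes.
x, y = torch.randn(1, 1, 32, 32, 32), torch.randn(1, 1, 32, 32, 32)
print(ncc_loss(x, y))
```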
5.2. JSR architecture design
The innovation of the proposed JSR network is twofold: (1) a joint synthesis and registration network trained with shared encoders and a multi-task learning strategy, and (2) a multi-resolution registration decoder that estimates deformation fields from coarse to fine resolutions. To explore the potential benefit of joint synthesis and registration learning, JSR was compared to VM-DualSynth-NCC, which employed a sequential training strategy: MR-to-CT and CBCT-to-CT synthesis were trained first, and a VoxelMorph registration network taking the synthetic CTs as input was trained afterward. JSR, on the other hand, trained the entire network jointly, driving the encoders to encode information important for both synthesis and registration. Such a multi-task learning strategy exploits the synergy between image synthesis and registration to improve registration performance (as shown in table 1 and figure 6) while reducing the total number of learnable parameters by 20%.
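To make the shared-encoder structure concrete, the following toy sketch (drastically simplified relative to the actual JSR network, with hypothetical layer sizes and a single convolutional stage per component) shows how one encoder per modality can feed both a synthesis head and a registration head:

```python
# Minimal sketch of the shared-encoder, multi-task idea (not the authors'
# code): each modality encoder feeds both the synthesis and registration
# decoders, so encoder parameters are shared across the two tasks.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv3d(cin, cout, 3, padding=1), nn.LeakyReLU(0.2))

class TinyJSR(nn.Module):
    def __init__(self, ch=8):
        super().__init__()
        self.enc_mr = conv_block(1, ch)     # encoder for preoperative MR
        self.enc_cbct = conv_block(1, ch)   # encoder for intraoperative CBCT
        self.synth_mr = nn.Conv3d(ch, 1, 3, padding=1)    # MR -> synthetic CT head
        self.synth_cbct = nn.Conv3d(ch, 1, 3, padding=1)  # CBCT -> synthetic CT head
        self.reg = nn.Conv3d(2 * ch, 3, 3, padding=1)     # joint features -> 3D flow

    def forward(self, mr, cbct):
        f_mr, f_cbct = self.enc_mr(mr), self.enc_cbct(cbct)
        sct_mr = self.synth_mr(f_mr)         # synthesis task reuses shared features
        sct_cbct = self.synth_cbct(f_cbct)
        flow = self.reg(torch.cat([f_mr, f_cbct], dim=1))  # registration task, same features
        return sct_mr, sct_cbct, flow

mr = torch.randn(1, 1, 32, 32, 32)
cbct = torch.randn(1, 1, 32, 32, 32)
sct_mr, sct_cbct, flow = TinyJSR()(mr, cbct)
print(sct_mr.shape, flow.shape)   # [1,1,32,32,32] and [1,3,32,32,32]
```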
The efficacy of the multi-resolution registration decoder can be observed by comparing JSR to the ablation study JSR-Single. As shown in table 1 and figure 6, JSR achieved better registration performance than JSR-Single, especially in the 4th ventricle, thalamus, and caudate nucleus. With the multi-resolution registration decoder, the lower levels of the network only need to learn the coarse, more global deformations, leaving the higher levels to refine the estimate, enabling the network to model more complex deformation patterns. Additionally, the multi-resolution registration decoder allows loss computation at each resolution, providing 'deep supervision' that is known to benefit deep neural network learning (Zeng et al 2017).
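The coarse-to-fine scheme can be sketched as below, where the learned per-level flow estimator is replaced by a zero placeholder and a simple additive flow update stands in for the composition used in practice; losses attached at every level provide the deep supervision noted above:

```python
# Sketch of coarse-to-fine flow estimation with deep supervision (an
# illustration, not the authors' implementation): a coarse flow is upsampled
# (displacements scaled with resolution), used to pre-warp the moving image,
# and refined at the next level, with a loss attached at every level.
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Warp img by a dense displacement field given in voxel units."""
    B, _, D, H, W = img.shape
    zz, yy, xx = torch.meshgrid(
        torch.arange(D), torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xx, yy, zz], dim=-1).float()   # (D,H,W,3), x-y-z order
    coords = base + flow.permute(0, 2, 3, 4, 1)        # add per-voxel displacement
    size = torch.tensor([W - 1, H - 1, D - 1]).float()
    grid = 2.0 * coords / size - 1.0                   # normalize to [-1,1]
    return F.grid_sample(img, grid, align_corners=True)

def upsample_flow(flow):
    """Double the resolution and scale displacements to the finer voxel grid."""
    return 2.0 * F.interpolate(flow, scale_factor=2, mode="trilinear",
                               align_corners=True)

moving = torch.randn(1, 1, 32, 32, 32)
fixed = torch.randn(1, 1, 32, 32, 32)
total_flow, level_losses = None, []
for factor in (4, 2, 1):                               # pyramid: coarse -> fine
    mov_l = F.avg_pool3d(moving, factor) if factor > 1 else moving
    fix_l = F.avg_pool3d(fixed, factor) if factor > 1 else fixed
    if total_flow is not None:
        total_flow = upsample_flow(total_flow)         # carry coarse estimate upward
        mov_l = warp(mov_l, total_flow)                # pre-warp before refinement
    residual = torch.zeros(1, 3, *mov_l.shape[2:])     # stand-in for a learned estimator
    total_flow = residual if total_flow is None else total_flow + residual
    level_losses.append(F.mse_loss(warp(mov_l, residual), fix_l))  # loss per level
```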
5.3. Which domain is best for MR-CBCT registration: inter-modality, synthetic MR domain, or synthetic CT domain?
The registration results of JSR and the several comparison methods offered some insight into the question of the domain in which MR-CBCT registration should be computed. Direct inter-modality registration methods (SyN-MI and VM-MI) were not able to resolve the deformations between MR and CBCT due to the large difference in image appearance and the presence of image artifacts. To compute registration directly between MR and CBCT, some level of supervision (e.g. labeled segmentations (Hu et al 2018, Fu et al 2021)) or deep learning-based similarity metrics (Haskins et al 2019, Niethammer et al 2019) may be needed.
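For context, the global mutual information on which such direct baselines (e.g. SyN-MI) rely can be computed from a joint intensity histogram, as in the short sketch below (the bin count and the simulated 'modalities' are arbitrary choices for illustration):

```python
# Illustrative global mutual information from a joint intensity histogram.
import numpy as np

def mutual_information(a, b, bins=32):
    """MI (in nats) between two equally shaped volumes."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                       # joint intensity distribution
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)       # marginals
    nz = pxy > 0                                    # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))

rng = np.random.default_rng(1)
mr = rng.normal(size=(32, 32, 32))
cbct = 0.5 * mr + 0.5 * rng.normal(size=mr.shape)   # partially correlated 'modalities'
print(mutual_information(mr, cbct))
```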
Intuitively, the MR domain, which offers good soft-tissue contrast, would be a natural choice for brain registration, and Han et al (2022a) showed that for MR-CT registration, synthesizing CT to MR and performing MR-domain registration yielded better performance than doing so in the CT domain. To perform MR-domain registration, the CBCT images must first be synthesized to MR. The quality of CBCT-to-MR synthesis, however, was diminished by the large difference in image appearance between MR and CBCT. Figure 11 shows a test case of CBCT-to-MR synthesis using the supervised conditional GAN (used in VM-Synth-NCC): the synthetic MR image in figure 11(b) is very noisy with blurry anatomical boundaries, in comparison to synthetic CT images with clearly defined boundaries as shown in figures 5(b), (f). Such blurry boundaries are undesirable for accurate registration, resulting in diminished registration performance of VM-Synth-NCC compared to registration in the synthetic CT domain (VM-DualSynth-NCC, JSR-Single, and JSR), as evident in table 1.
Figure 11. CBCT-to-MR image synthesis for a test case in the simulation study. (a) Input intraoperative CBCT image. (b) Synthetic MR image. (c) Reference ground truth MR image in the intraoperative coordinates. (d) MAE between the synthetic and reference MR.
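The MAE reported for synthesis quality (e.g. the error map in figure 11(d) and the values in table 3) is simply the mean absolute HU difference within a mask, as in the toy example below (with simulated volumes):

```python
# Toy computation of synthesis MAE in HU within a mask (a scalar summary of
# the voxel-wise error map illustrated in figure 11(d)); data are synthetic.
import numpy as np

def mae_hu(synthetic, reference, mask):
    """Mean absolute HU error between two volumes inside a boolean mask."""
    return float(np.mean(np.abs(synthetic[mask] - reference[mask])))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 100.0, (32, 32, 32))             # stand-in 'CT' in HU
synthetic = reference + rng.normal(0.0, 20.0, (32, 32, 32))  # ~16 HU expected MAE
mask = np.ones(reference.shape, dtype=bool)                  # head mask (all-true here)
print(mae_hu(synthetic, reference, mask))
```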
The CBCT images, on the other hand, can be regarded as lower-quality, artifact-contaminated versions of the CT images, so performing registration in the CBCT domain is clearly suboptimal. The CBCT-to-CT synthesis can, to some extent, be considered an artifact correction and denoising process that improves the CBCT image quality toward that of CT. Registration methods computed in the CT domain (VM-DualSynth-NCC, JSR-Single, and JSR) therefore achieved better performance than methods computed in other domains. Future work will also investigate CBCT reconstruction and artifact correction as pre-processing steps before registration, which could improve the input image quality and hence the registration performance.
5.4. Network performance in clinical situations
The behavior of the network and the corresponding registration performance in complex, realistic surgical situations (such as brain shift, the presence of abnormal anatomy, and metal instruments) is also an important consideration for clinical application. In the simulation study in section 4.1.3, the JSR method demonstrated a comparable level of registration accuracy (DSC) in simulations with and without metal, with the difference in deformation field only localized in the metal region. The results suggest that the influence of metal was highly localized to regions adjacent to the instrument, and JSR performed well overall in part due to the integral synthesis step reducing the magnitude of metal artifacts.
While metal and topological changes were not explicitly modeled in the JSR network, the network was able to maintain proper synthesis and registration after transfer learning to accommodate the topological changes (e.g. metal and missing tissue) present in the training data. Figure 12 shows two examples from the clinical study, zoomed in to regions of metal artifacts. Figures 12(a)–(f) show a clinical case in which a ventricular shunt is present at the anterior horn of the lateral ventricles, and cortical brain shift is evident in the right frontal lobe. Compared to the CBCT (figure 12(a)), the synthetic CT (figure 12(b)) demonstrated reduced metal artifacts (streaks) around the shunt and maintained the anatomical structures from the CBCT in the region of cortical shift in comparison to the CT. In terms of registration, the network was able to faithfully align the moving image in regions not adjacent to the metal artifacts. In regions immediately adjacent to metal instrumentation (within ~8 mm, as evident in figure 8(h)), metal artifacts challenged the registration, which was prone to aligning the ventricle boundary from the moving MR to the metal boundary in the CBCT.
Figure 12. JSR synthesis and registration performance in the presence of topological changes and metal artifacts in the clinical study. Two example cases are shown in (a)–(f) and (h)–(m), respectively. Illustrations for each case include: (a), (h) fixed CBCT; (b), (i) synthetic CT from CBCT; (c), (j) reference CT; (d), (k) moving MR; (e), (l) registered MR; and (f), (m) deformation field overlaid on the CBCT image. The boundary of the lateral ventricles (defined in the fixed CBCT coordinates) and masks of hyper-dense regions of metal and skull (yellow) and missing tissue (red) are overlaid in (d), (e) and (k), (l).
On the other hand, realistic, diffeomorphic deformations were obtained overall despite the topological changes, as shown in figure 12(f). A possible explanation for this behavior is the multi-resolution registration decoder, which progressively estimates diffeomorphic deformations in a coarse-to-fine manner and heavily regularizes the output deformations to be smooth and diffeomorphic. Similar observations were evident in another case presenting a ventricular shunt and tissue excision, shown in figures 12(h)–(m), where the JSR network yielded diffeomorphic deformations even in regions with such strong topological changes. The clinical dataset also contained numerous instances of metal staples or cranial pins on the skull, which were found not to affect the brain soft-tissue registration (and were therefore not evaluated in this work).
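One standard mechanism for producing (approximately) diffeomorphic network outputs is scaling-and-squaring integration of a stationary velocity field, as in Dalca et al (2018); the sketch below illustrates that scheme (without asserting it matches the exact JSR implementation) and reuses the warp() helper from the coarse-to-fine sketch in section 5.2:

```python
# Illustrative scaling-and-squaring integration of a stationary velocity
# field (Dalca et al 2018); warp() is the helper sketched in section 5.2.
# This shows a common route to diffeomorphic outputs, not the exact JSR scheme.
import torch

def integrate_velocity(vel, steps=7):
    """phi = exp(vel): start from vel / 2**steps and self-compose `steps` times."""
    flow = vel / (2 ** steps)
    for _ in range(steps):
        flow = flow + warp(flow, flow)   # displacement composition: u <- u + u(x + u)
    return flow

vel = 0.5 * torch.randn(1, 3, 32, 32, 32)   # small random stationary velocity field
phi = integrate_velocity(vel)                # (approximately) diffeomorphic displacement
print(phi.shape)                             # torch.Size([1, 3, 32, 32, 32])
```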
Acknowledging that accurate delineation of anatomical boundaries in the presence of such artifacts is challenging even for a neuroradiology/neurosurgery expert, additional work is warranted to more explicitly model the presence of metal and other clinical scenarios involving gross anatomical change. For example, extra-dimensional Demons registration (Nithiananthan et al 2012) was shown to handle such changes in topology by invoking a fourth pseudo-dimension that allows tissue to be excised (or instrumentation to be introduced) within the 3D image volume, thereby modeling deformations even in the presence of dramatic changes/mismatch between preoperative and intraoperative images. Such an approach could potentially be incorporated into the current framework to better model registration for intraoperative scenarios.
5.5. Limitations, generalizability, and future work
One limitation of the current work is that the JSR network relies on an accurate rigid pre-registration. Hu et al (2019) and Cao et al (2020) proposed a multi-resolution decoder that includes a rigid alignment module at the coarsest level to estimate rigid and deformable registration simultaneously in a single pass for intra-modality registration. Such a rigid alignment module could be incorporated into the JSR network to provide a more integrated registration pipeline.
The JSR network was first trained in the simulation study and then refined in the clinical study via transfer learning. JSR demonstrated robustness and generalizability to unseen real data and outperformed the state-of-the-art SyN-MI and VoxelMorph methods, yielding DSC and TRE comparable to the simulation study. Among the limitations of the clinical study was the relatively small number of cases in the clinical dataset (14 volumetric image pairs). Currently, there are no publicly available datasets of co-registered MR-CBCT brain images suitable for this study. An IRB-approved clinical study is ongoing to expand the clinical image dataset and expose the network to a broader variety of images and pathologies, improve the training performance, and improve the generalizability of the network for new clinical applications.
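A typical transfer-learning recipe of this kind is sketched below, reusing the toy helpers from the earlier sketches (TinyJSR, warp, ncc_loss); the checkpoint name, learning rate, epoch count, and clinical_loader are hypothetical placeholders, not the settings used in this work:

```python
# Hypothetical transfer-learning sketch: initialize from simulation-trained
# weights and fine-tune on the small clinical set with a reduced learning
# rate. "jsr_simulation.pt", lr, epochs, and clinical_loader are assumptions.
import torch

net = TinyJSR()
net.load_state_dict(torch.load("jsr_simulation.pt"))    # simulation-trained weights
opt = torch.optim.Adam(net.parameters(), lr=2e-5)       # reduced LR for fine-tuning

for epoch in range(20):
    for mr, cbct in clinical_loader:                    # loader over the 14 clinical pairs
        sct_mr, sct_cbct, flow = net(mr, cbct)
        # Similarity of the warped moving synthetic CT to the fixed synthetic CT.
        loss = ncc_loss(warp(sct_mr, flow), sct_cbct)
        opt.zero_grad(); loss.backward(); opt.step()
```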
Several additional challenges were present in the clinical study that were not included in the simulation study, including: (1) diverse MR acquisition protocols, resulting in variations in MR intensity distribution, texture, and spatial resolution; (2) metal present in the intraoperative CBCT but not in the preoperative MR; and (3) macroscopic brain shift separating the brain from the inner surface of the cranium (not commonly seen in neuroendoscopic surgery). Future work should acquire preoperative MR images with more consistent acquisition protocols and increase the total number of training images. MR harmonization techniques will be explored to normalize MR scans from different sequences and scanners to a common image appearance (Dewey et al 2019, Zuo et al 2021), which would reduce the variation between training and test data and improve registration performance. Metal artifact reduction or metal inpainting techniques may also be desirable to handle shunt/endoscope/electrode instrumentation in the intraoperative CBCT images.
To our knowledge, this is the first work on MR-CBCT deformable registration in the brain within a neurosurgical context. As a result, there were no baseline studies of alternative, state-of-the-art methods in the literature for direct comparison. To better understand the performance of the proposed JSR network, a series of alternative registration techniques was implemented and tested on the same MR-CBCT dataset. Furthermore, to compare registration-based anatomical definition with direct segmentation, the lateral ventricles were segmented in CBCT using the continuous max-flow/min-cut method (Yuan et al 2014), which achieved an overall DSC of 0.53 ± 0.19; owing to the challenges of CBCT image quality, simple threshold-based segmentation is not an appropriate choice and was not performed. Comparing the DSC after JSR in table 1 (simulation study) and table 2 (clinical study), the anatomical alignment following JSR was significantly better than direct segmentation via continuous max-flow/min-cut, supporting the notion that JSR achieves reasonably accurate anatomical definition for surgical guidance. Additionally, JSR achieved a TRE of ~2 mm, which is comparable to the TRE of surgical navigation systems (1.2–3.0 mm deviation of electrode placements (Bjartmarz and Rehncrona 2007, Fick et al 2021)).
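For clarity, TRE here is the post-registration Euclidean distance between corresponding landmarks; a small worked example (with fabricated coordinates and displacements, purely for illustration) follows:

```python
# Worked example of TRE: distance between fixed-image landmarks mapped
# through the estimated deformation and the corresponding moving-image
# landmarks. All coordinates/displacements below are made up.
import numpy as np

def tre_mm(fixed_pts, moving_pts, disp_at_fixed):
    """Per-landmark TRE (mm) for landmark arrays of shape (N, 3)."""
    mapped = fixed_pts + disp_at_fixed          # deformation sampled at each landmark
    return np.linalg.norm(mapped - moving_pts, axis=1)

fixed_pts = np.array([[10.0, 22.0, 31.0], [40.0, 18.0, 27.0]])
moving_pts = np.array([[11.5, 23.0, 30.2], [41.0, 19.2, 26.5]])
disp = np.array([[1.4, 0.9, -0.7], [0.8, 1.0, -0.4]])
print(tre_mm(fixed_pts, moving_pts, disp))      # approx. [0.17, 0.30] mm
```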
6. Conclusions
A joint image synthesis and registration network for MR-CBCT deformable registration was reported, which converts a very challenging inter-modality registration into a simpler intra-modality registration in the intermediate CT domain. The method uses encoders to extract latent representations from MR and CBCT, which are then jointly decoded into synthetic CT images and the deformation field between the two. A novel multi-resolution registration decoder was also implemented to estimate the deformation in a coarse-to-fine resolution pyramid, which was shown to outperform a single-resolution registration decoder (JSR-Single).
The proposed JSR network was first trained in a simulation study with simulated CBCT from CT and simulated brain deformations. JSR achieved mean DSC of 0.69 and mean TRE of 2.05 mm, superior to alternative conventional and deep learning-based methods (SyN-MI, VM-MI, VM-Synth-NCC, and VM-DualSynth-NCC). Additionally, JSR with the proposed multi-resolution registration decoder achieved improved registration performance compared to single-resolution JSR (JSR-Single). JSR was further trained with real clinical images via transfer learning, achieving mean DSC of 0.68 and mean TRE of 2.45 mm. In all cases, JSR yielded diffeomorphic transformations and fast runtime (<3 s). The proposed registration network is potentially compatible with the demands of high-precision neurosurgery and warrants investigation in further clinical studies of CBCT-guided procedures.
Acknowledgments
This research was supported by NIH grant U01-NS-107133 and academic-industry partnership with Medtronic Inc. (Littleton, MA).
References
- Avants BB, Epstein CL, Grossman M and Gee JC 2008 Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain Med. Image Anal. 12 26–41
- Balakrishnan G, Zhao A, Sabuncu MR, Guttag J and Dalca AV 2018 VoxelMorph: a learning framework for deformable medical image registration IEEE Trans. Med. Imaging 38 1788–800
- Bjartmarz H and Rehncrona S 2007 Comparison of accuracy and precision between frame-based and frameless stereotactic navigation for deep brain stimulation electrode implantation Stereotact. Funct. Neurosurg. 85 235–42
- Boulanger M, Nunes JC, Chourak H, Largent A, Tahri S, Acosta O, De Crevoisier R, Lafond C and Barateau A 2021 Deep learning methods to generate synthetic CT from MRI in radiotherapy: a literature review Phys. Med. 89 265–81
- Bourbonne V, Jaouen V, Hognon C, Boussion N, Lucia F, Pradier O, Bert J, Visvikis D and Schick U 2021 Dosimetric validation of a GAN-based pseudo-CT generation for MRI-only stereotactic brain radiotherapy Cancers 13 1–13
- Cao Y, Zhu Z, Rao Y, Qin C, Lin D, Dou Q, Ni D and Wang Y 2020 Edge-aware pyramidal deformable network for unsupervised registration of brain MR images Front. Neurosci. 14 620235
- Chen L, Liang X, Shen C, Jiang S and Wang J 2020 Synthetic CT generation from CBCT images via deep learning Med. Phys. 47 1115–25
- Chen Z, Badrinarayanan V, Lee C-Y and Rabinovich A 2018 GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks arXiv:1711.02257
- Dalca AV, Balakrishnan G, Guttag J and Sabuncu MR 2018 Unsupervised learning for fast probabilistic diffeomorphic registration Med. Image Comput. Comput. Assist. Interv. (MICCAI 2018) Lect. Notes Comput. Sci. 11070 pp 729–38
- Dean CJ, Sykes JR, Cooper RA, Hatfield P, Carey B, Swift S, Bacon SE, Thwaites D, Sebag-Montefiore D and Morgan AM 2012 An evaluation of four CT-MRI co-registration techniques for radiotherapy treatment planning of prone rectal cancer patients Br. J. Radiol. 85 61–8
- de Vos BD, Berendsen FF, Viergever MA, Sokooti H, Staring M and Išgum I 2019 A deep learning framework for unsupervised affine and deformable image registration Med. Image Anal. 52 128–43
- Denis de Senneville B, Zachiu C, Ries M and Moonen C 2016 EVolution: an edge-based variational method for non-rigid multi-modal image registration Phys. Med. Biol. 61 7377–96
- Fick T, van Doormaal JAM, Hoving EW, Willems PWA and van Doormaal TPC 2021 Current accuracy of augmented reality neuronavigation systems: systematic review and meta-analysis World Neurosurg. 146 179–88
- Fu Y, Wang T, Lei Y, Patel P, Jani AB, Curran WJ, Liu T and Yang X 2021 Deformable MR-CBCT prostate registration using biomechanically constrained deep learning networks Med. Phys. 48 253–63
- Groiss SJ, Wojtecki L, Sudmeyer M and Schnitzler A 2009 Deep brain stimulation in Parkinson's disease Ther. Adv. Neurol. Disord. 2 379–91
- Guo K 2019 Multi-modal image registration with unsupervised deep learning Massachusetts Institute of Technology (https://hdl.handle.net/1721.1/123142)
- Han R, Jones CK, Lee J, Wu P, Vagdargi P, Uneri A, Helm PA, Luciano M, Anderson WS and Siewerdsen JH 2022a Deformable MR-CT image registration using an unsupervised, dual-channel network for neurosurgical guidance Med. Image Anal. 75 102292
- Han R et al 2022b Deformable registration of MRI to intraoperative cone-beam CT of the brain using a joint synthesis and registration network Medical Imaging 2022: Image-Guided Procedures, Robotic Interventions, and Modeling 12034 30–6
- Han R, De Silva T, Ketcha M, Uneri A and Siewerdsen JH 2018 A momentum-based diffeomorphic demons framework for deformable MR-CT image registration Phys. Med. Biol. 63 215006
- Harms J, Lei Y, Wang T, Zhang R, Zhou J, Tang X, Curran WJ, Liu T and Yang X 2019 Paired cycle-GAN-based image correction for quantitative cone-beam computed tomography Med. Phys. 46 3998–4009
- Haskins G, Kruecker J, Kruger U, Xu S, Pinto PA, Wood BJ and Yan P 2019 Learning deep similarity metric for 3D MR–TRUS image registration Int. J. Comput. Assist. Radiol. Surg. 14 417–25
- He K, Zhang X, Ren S and Sun J 2015 Deep residual learning for image recognition Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) pp 770–8
- Heinrich MP, Jenkinson M, Bhushan M, Matin T, Gleeson FV, Brady SM and Schnabel JA 2012 MIND: modality independent neighbourhood descriptor for multi-modal deformable registration Med. Image Anal. 16 1423–35
- Hering A et al 2021 Learn2Reg: comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning arXiv:2112.04489
- Hu Y et al 2018 Weakly-supervised convolutional neural networks for multimodal image registration Med. Image Anal. 49 1–13
- Isola P, Zhu J-Y, Zhou T and Efros AA 2017 Image-to-image translation with conditional adversarial networks Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) (Honolulu, HI, 21–26 July 2017) (Piscataway, NJ: IEEE)
- Kang M, Hu X, Huang W, Scott MR and Reyes M 2019 Dual-stream pyramid registration network Med. Image Comput. Comput. Assist. Interv. (MICCAI 2019) pp 382–90
- Ledig C, Heckemann RA, Hammers A, Lopez JC, Newcombe VFJ, Makropoulos A, Lötjönen J, Menon DK and Rueckert D 2015 Robust whole-brain segmentation: application to traumatic brain injury Med. Image Anal. 21 40–58
- Liang X, Chen L, Nguyen D, Zhou Z, Gu X, Yang M, Wang J and Jiang S 2019 Generating synthesized computed tomography (CT) from cone-beam computed tomography (CBCT) using CycleGAN for adaptive radiation therapy Phys. Med. Biol. 64 125002
- Modat M, Vercauteren T, Ridgway GR, Hawkes DJ, Fox NC and Ourselin S 2010 Diffeomorphic demons using normalized mutual information, evaluation on multimodal brain MR images Medical Imaging 2010: Image Processing (Int. Society for Optics and Photonics) p 76232K
- Momin S, Lei Y, Wang T, Fu Y, Patel P, Jani AB, Curran WJ, Liu T and Yang X 2021 Deep learning-based deformable MRI-CBCT registration of male pelvic region Medical Imaging 2021: Computer-Aided Diagnosis 11597 108–13
- Murphy K et al 2011 Evaluation of registration methods on thoracic CT: the EMPIRE10 challenge IEEE Trans. Med. Imaging 30 1901–20
- Nabavi A et al 2001 Serial intraoperative magnetic resonance imaging of brain shift Neurosurgery 48 787–98
- Niethammer M, Kwitt R and Vialard FX 2019 Metric learning for image registration Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) pp 8455–64
- Nithiananthan S et al 2011 Demons deformable registration of CT and cone-beam CT using an iterative intensity matching approach Med. Phys. 38 1785–98
- Nithiananthan S, Schafer S, Mirota DJ, Stayman JW, Zbijewski W, Reh DD, Gallia GL and Siewerdsen JH 2012 Extra-dimensional Demons: a method for incorporating missing tissue in deformable image registration Med. Phys. 39 5718–31
- Nowell M, Rodionov R, Diehl B, Wehner T, Zombori G, Kinghorn J, Ourselin S, Duncan J, Miserocchi A and McEvoy A 2014 A novel method for implementation of frameless StereoEEG in epilepsy surgery Neurosurgery 10 525–34
- Oppido PA et al 2011 Neuroendoscopic biopsy of ventricular tumors: a multicentric experience Neurosurg. Focus 30 E2
- Park S, Plishker W, Quon H, Wong J, Shekhar R and Lee J 2017 Deformable registration of CT and cone-beam CT with local intensity matching Phys. Med. Biol. 62 927–47
- Reaungamornrat S, De Silva T, Uneri A, Vogt S, Kleinszig G, Khanna AJ, Wolinsky J-P, Prince JL and Siewerdsen JH 2016 MIND demons: symmetric diffeomorphic deformable registration of MR and CT for image-guided spine surgery IEEE Trans. Med. Imaging 35 2413–24
- Rivest-Hénault D, Dowson N, Greer PB, Fripp J and Dowling JA 2015 Robust inverse-consistent affine CT-MR registration in MRI-assisted and MRI-alone prostate radiation therapy Med. Image Anal. 23 56–69
- Spadea MF, Maspero M, Zaffino P and Seco J 2021 Deep learning based synthetic-CT generation in radiotherapy and PET: a review Med. Phys. 48 6537–66
- Spennato P, Cinalli G, Ruggiero C, Aliberti F, Trischitta V, Cianciulli E and Maggi G 2007 Neuroendoscopic treatment of multiloculated hydrocephalus in children J. Neurosurg. 106 29–35
- Wei D, Ahmad S, Huo J, Peng W, Ge Y, Xue Z, Yap P-T, Li W, Shen D and Wang Q 2019 Synthesis and inpainting-based MR-CT registration for image-guided thermal ablation of liver tumors Med. Image Comput. Comput. Assist. Interv. (MICCAI 2019) Lect. Notes Comput. Sci. 11768 512–20
- Weiss K, Khoshgoftaar TM and Wang DD 2016 A survey of transfer learning J. Big Data 3 1–40
- Wu P, Sisniega A, Stayman JW, Zbijewski W, Foos D, Wang X, Khanna N, Aygun N, Stevens RD and Siewerdsen JH 2020 Cone-beam CT for imaging of the head/brain: development and assessment of scanner prototype and reconstruction algorithms Med. Phys. 47 2392–407
- Wu P, Sisniega A, Uneri A, Han R, Jones C, Vagdargi P, Zhang X, Luciano M, Anderson W and Siewerdsen J 2021 Using uncertainty in deep learning reconstruction for cone-beam CT of the brain arXiv:2108.09229
- Xu J et al 2016 Technical assessment of a prototype cone-beam CT system for imaging of acute intracranial hemorrhage Med. Phys. 43 5745–57
- Yang H, Sun J, Carass A, Zhao C, Lee J, Prince JL and Xu Z 2020a Unsupervised MR-to-CT synthesis using structure-constrained CycleGAN IEEE Trans. Med. Imaging 39 4249–61
- Yang H, Qian P and Fan C 2020b An indirect multimodal image registration and completion method guided by image synthesis Comput. Math. Methods Med. 2020 2684851
- Yuan J, Bae E, Tai XC and Boykov Y 2014 A spatially continuous max-flow and min-cut framework for binary labeling problems Numer. Math. 126 559–87
- Zeng G, Yang X, Li J, Yu L, Heng PA and Zheng G 2017 3D U-net with multi-level deep supervision: fully automatic segmentation of proximal femur in 3D MR images Machine Learning in Medical Imaging (MLMI 2017) Lect. Notes Comput. Sci. 10541 274–82
- Zhen X, Gu X, Yan H, Zhou L, Jia X and Jiang SB 2012 CT to cone-beam CT deformable registration with simultaneous intensity correction Phys. Med. Biol. 57 6807–26
- Zuo L, Dewey BE, Liu Y, He Y, Newsome SD, Mowry EM, Resnick SM, Prince JL and Carass A 2021 Unsupervised MR harmonization by learning disentangled representations using information bottleneck theory Neuroimage 243
