Abstract
Retrograde intrarenal surgery (RIRS) is a widely utilized diagnostic and therapeutic tool for multiple upper urinary tract pathologies. An image‐guided navigation system can assist the surgeon in performing precise surgery by providing the relative position between the lesion and the instrument once the intraoperative image is registered with the preoperative model. However, due to the structural complexity and diversity of multi‐branched organs such as the kidneys and bronchi, the consistency of the intensity distributions of virtual and real images is difficult to maintain, which makes classical purely intensity‐based registration methods prone to biased and random results over a wide search domain. In this paper, we propose a structural feature similarity‐based method combined with a semantic style transfer network, which significantly improves registration accuracy when the initial pose deviation is large. Furthermore, multi‐view constraints are introduced to compensate for the collapse of spatial depth information and improve the robustness of the algorithm. Experimental studies were conducted on two models generated from patient data to evaluate the performance of the method and competing algorithms. The proposed method achieves mean target registration errors (mTRE) of 0.971 ± 0.585 mm and 1.266 ± 0.416 mm, respectively, with better overall accuracy and robustness. The experimental results demonstrate that the proposed method has the potential to be applied to RIRS and extended to other organs with similar structures.
Keywords: multimodal image registration, multi‐view constraint, retrograde intrarenal surgery, structural feature extraction, style transfer
1. INTRODUCTION
Nephrolithiasis is one of the most common urological diseases. Its prevalence has increased over the last two decades, and it is highly recurrent, with recurrence risks as high as 50% within 10 years. As a diagnostic and therapeutic tool in the management of upper urinary tract pathologies including nephrolithiasis, retrograde intrarenal surgery (RIRS) has been widely utilized. 1 , 2 , 3
In RIRS, the surgeon usually needs to use the ureteroscope to search the calyces successively until the target is located. 4 Novice surgeons are often disoriented and less effective because of the tortuous anatomy of the renal cavity and the poor visibility caused by microhematuria. Surgeons generally use fluoroscopic guidance to regain their spatial orientation, but this exposes the patient to the risk of excessive radiation. In addition, intraoperative two‐dimensional (2D) images only provide a limited viewing angle, with drawbacks such as a lack of depth information, occlusion, and dependence on experience. These factors inevitably increase operative time and tissue injury, which are crucial risk factors for severe complications such as septic shock, cardiovascular events, and blood loss. 5 , 6 , 7 The development of a surgical navigation system offers hope for a solution to the abovementioned problems.
A surgical navigation system reconstructs a three‐dimensional (3D) model from preoperative medical images (CT, MRI, etc.) and combines it with a real‐time tracking system to guide the surgeon during the operation. As a crucial part of a navigation system for RIRS, endoscopic 2D/3D registration realizes the coordinate‐system alignment between the 3D model and the intraoperative camera.
The problem of 2D/3D registration has been widely studied, and a large number of methods have been proposed. 8 , 9 , 10 , 11 , 12 They can be broadly classified into three categories: landmark‐based registration, computer vision and deep learning‐based registration, and intensity‐based registration.
Landmark‐based registration is realized by establishing and aligning correspondences (landmarks/features) between objects. The principle is simple and efficient, but its effectiveness depends on the matching accuracy of the correspondences. For the acquisition of landmarks, previous work mainly relied on manual identification and extraction. Recently, automatic landmark detection methods based on deep learning networks have also emerged. In general, extracting feature landmarks from complex anatomical shapes is an extremely challenging problem, especially for flexible internal organs. This approach has therefore mainly focused on rigid structures such as the brain, bones, and teeth.
With the rapid development of computer vision and deep learning, these technologies have also been applied to registration. Early research mainly used traditional depth reconstruction algorithms (SFS, SFM, SLAM, etc.) to construct surface point clouds and convert the 2D/3D problem into 3D/3D registration. 13 , 14 In recent years, learning‐based depth prediction networks 15 and end‐to‐end registration networks 16 have emerged for solving the registration problem. However, Unberath et al. argued that deep learning is still in an exploratory stage and that the generalization ability of network models across different patients remains to be verified. Therefore, a hybrid registration guided by traditional methods and combined with deep learning may be a good choice. 17 Koo et al. 18 and Labrunie et al. 19 used segmentation networks for contour feature extraction followed by the RANSAC‐PnP algorithm for direct 2D/3D registration.
Intensity‐based registration is another important branch of 2D/3D endoscopic registration. It usually requires a robust similarity function that accurately characterizes the intensity difference to guide image matching. Helferty et al. proposed a normalized mutual information (NMI)‐based global similarity measurement to match bronchoscopic images with virtual images. 20 Subsequent work 21 , 22 , 23 explored intensity‐based constraints for registration in the bronchi, digestive tract, and other organs. More recently, an HWD‐driven similarity measure 24 and a multiscale structural similarity measure 25 were proposed for bronchoscopic registration. However, the performance of these methods depends on the intensity consistency between real and virtual images and is easily affected by differences in multimodal data.
Although 2D/3D registration has been extensively studied, the work mainly focuses on the brain, orthopedics, bronchus, ear, throat, and similar sites. For registration in the intrarenal cavity, two problems remain: (1) methods suitable for intrarenal registration, including the determination of structural features, the establishment of similarity measures, and deep learning‐based approaches, remain to be studied; (2) the complexity and diversity of intrarenal structures, the lack of salient and robust landmarks, and the complicated intraoperative environment bring challenges to registration. For example, in complex intrarenal structures it is difficult to maintain intensity consistency between virtual and real images, so classical intensity‐based registration methods tend to show biased results and become trapped in local minima.
Therefore, in this article, we define a projected structural contour feature and propose a robust, high‐precision 2D/3D registration framework to address the initial registration in RIRS. The framework exploits the structural features of the renal anatomy and the endoscopic images to automatically extract 3D feature points and corresponding distinctive 2D regions for matching. In addition, an image translation network is introduced to mitigate the impact of multimodal data, and multi‐view constraints are added to compensate for the collapse of spatial depth information, improving robustness and accuracy. Moreover, an experimental study was performed on two models with significant shape differences to evaluate the performance of the method and compare it with other competing algorithms.
2. MATERIALS AND METHODS
2.1. Algorithm overview
The developed algorithm includes preoperative 3D structural feature extraction, 2D image salient region extraction, iterative projection of structural features, and optimal transformation search based on a similarity measure. The 3D structural features are sparse representations of salient regions in the CT volume, and their 2D projections exhibit strong gradients in the image. The 3D feature projections and the 2D distinctive points are defined as generated contours and actual contours, respectively. Intuitively, registration is achieved by minimizing the visual misalignment between the generated contours and the actual contours. The generated contours are the projection from an iteratively estimated pose. A similarity cost function is established, and a coarse‐to‐fine search strategy is employed to estimate the optimal transform. Furthermore, a style transfer network is integrated to mitigate the impact of differences between multimodal data. Because the collapse of spatial depth information makes it difficult to obtain precise 2D/3D registration through 2D‐2D contour matching alone, a multi‐view constraint is added to improve accuracy and robustness. To sum up, the main modules and the processing pipeline are shown in Figure 1.
FIGURE 1. The working pipeline of the proposed registration framework.
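As a concrete anchor for the pose search described above, the snippet below shows one way the 6‐DOF transformation could be parameterized during optimization. The Euler‐angle convention and the helper name are our own assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch: map a 6-DOF parameter vector to a 4x4 homogeneous camera pose.
# The Euler "xyz" convention and millimeter/degree units are assumptions.
import numpy as np
from scipy.spatial.transform import Rotation

def pose_to_matrix(params):
    """params = (rx, ry, rz, tx, ty, tz) with rotations in degrees, translations in mm."""
    rx, ry, rz, tx, ty, tz = params
    T = np.eye(4)
    T[:3, :3] = Rotation.from_euler("xyz", [rx, ry, rz], degrees=True).as_matrix()
    T[:3, 3] = [tx, ty, tz]
    return T

# Example: a pose rotated 5 deg about x and translated 3 mm along z.
T = pose_to_matrix((5.0, 0.0, 0.0, 0.0, 0.0, 3.0))
```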
2.2. Assumptions and initiation
A key assumption of our algorithm is that a common set of feature points can be robustly extracted from both the 3D volume and the 2D images. Feature points in 2D images arise from the interaction of anatomical structure, illumination, and line of sight. A typical renal collecting system and the state of the endoscope observing the calyces are shown in Figure 2. For 3D surface points whose normals are perpendicular to the line of sight, the projected contours show significant intensity differences compared to their surroundings because of light occlusion, especially at the edges of the minor calyces and the junctions between the minor and major calyces. This phenomenon can be observed in both real and virtual rendered images, as shown in Figure 3.
FIGURE 2. A typical working scenario in surgery. Light occlusion from an endoscopic perspective.
FIGURE 3. The structural region in the real and the virtual rendered image.
The task of 2D/3D registration is to find a high‐precision transformation between the volume and the camera space. Searching the entire volume space would be computationally expensive, so an initial pose estimate is obtained to reduce the search space; this estimate does not need to be an accurate transformation. Therefore, in our work, a landmark‐based registration is performed manually to obtain the initial transformation T0.
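For illustration, the following sketch shows how T0 could be computed from a handful of manually matched landmark pairs using a least‐squares (Kabsch) fit; the paper does not specify the exact fitting procedure, so this is an assumed implementation.

```python
import numpy as np

def rigid_from_landmarks(P, Q):
    """Least-squares rigid transform (Kabsch) mapping 3D points P onto Q.
    P, Q: (N, 3) arrays of manually matched landmarks, N >= 3."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)           # center both point sets
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)              # cross-covariance SVD
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = Q.mean(0) - R @ P.mean(0)
    T0 = np.eye(4)
    T0[:3, :3], T0[:3, 3] = R, t
    return T0
```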
2.3. Salient feature points extraction
As shown in Figure 4, to observe more obvious structural features, a search space is defined in the renal pelvis for the initial registration. For each camera pose, a related set of salient points is selected from the CT model surface. This set has the following characteristics: (1) at that camera pose, a 2D contour can be generated on the image through the camera projection model; (2) each point satisfies the perpendicularity and visibility conditions. The salient points of the entire model consist of the union of these per‐pose sets. They are usually distributed continuously along the prominences of the kidney, such as the junction of the pelvis and calyces and the junction of the major and minor calyces, as shown in Figure 5.
FIGURE 4. Schematic diagram of salient feature point extraction in a three‐dimensional (3D) model. The point ps is a feature point in this figure.
FIGURE 5. Structural feature points in front view, rear view, and camera view.
(1) Intersection: A salient structure point has the characteristic that its normal is perpendicular to the viewing vector. In this case, the optical center is constrained to lie on the tangent plane πp, fulfilling the point‐on‐plane condition. An intersection test is performed between the tangent plane πp and the renal pelvis space to determine their common spatial region πc. If the region πc exists, the collision test is performed. An intuitive explanation is shown in Figure 4.
(2) Collision: The prerequisite for successfully observing a candidate point from a viewpoint is that there is no obstacle on the line segment between them. The purpose of this test is to determine whether such a viewpoint exists on the plane πp. The triangles of the volume mesh are treated as obstacles. Specifically, we create a number of seed points on the plane πp at equal intervals. For each seed point, we form the line segment to the candidate point and perform collision detection between this ray and the mesh triangles (a minimal ray–triangle sketch is given below). When no collision occurs, the seed point is marked as a valid viewpoint. When the number of valid viewpoints is greater than a threshold, the candidate is accepted as a salient feature point.
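The collision test above amounts to repeated ray–triangle intersection queries. A minimal brute‐force sketch is given below; a practical implementation would add an acceleration structure (e.g., a BVH), and the function names and tolerances are assumptions rather than the authors' code.

```python
import numpy as np

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray-triangle intersection; returns hit distance t or None."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1 @ p
    if abs(det) < eps:                      # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = origin - v0
    u = (s @ p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = (direction @ q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = (e2 @ q) * inv
    return t if t > eps else None

def segment_occluded(seed, target, triangles):
    """True if any mesh triangle blocks the segment from a seed viewpoint to the candidate point."""
    d = target - seed
    length = np.linalg.norm(d)
    d = d / length
    for v0, v1, v2 in triangles:            # brute force; a BVH would speed this up
        t = ray_hits_triangle(seed, d, v0, v1, v2)
        if t is not None and t < length - 1e-6:   # hit strictly before the target point
            return True
    return False
```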
2.4. Salient point selection with the iterated pose
During the registration process, the generated contours are continuously updated according to the iterated camera pose. The perpendicularity and visibility tests are performed to select a subset of salient points from the full set, which is then projected to obtain the up‐to‐date generated contours.
(1) Perpendicularity: This requirement is the first necessary condition. For each point in the feature point subset F, we select the points that satisfy the following angle constraint
(1)
where the angle in Equation (1) is measured between the normal of the salient point and the direction of the viewing ray. The perpendicular threshold controls the contour thickness and is empirically set close to 90°.
(2) Visibility: This test checks whether a salient feature point can be observed in the 2D projection image. A collision detection algorithm similar to that in the previous section is performed: we detect whether any obstacle lies on the path between the salient point and the viewing position (see the sketch below).
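Combining the two tests, a possible per‐iteration selection routine is sketched below. The threshold value, the folding of the angle into [0°, 90°], and the `occluded` callback (which stands in for the collision test of Section 2.3) are assumptions made for illustration.

```python
import numpy as np

def select_visible_salient_points(points, normals, cam_center, theta_th_deg=80.0,
                                  occluded=lambda cam, pt: False):
    """Keep salient points whose normal is nearly perpendicular to the viewing ray
    and that pass a visibility (occlusion) test. theta_th_deg is an assumed value."""
    keep = []
    for p, n in zip(points, normals):
        v = p - cam_center
        v = v / np.linalg.norm(v)
        n = n / np.linalg.norm(n)
        # Angle between normal and viewing ray, folded into [0, 90] degrees.
        theta = np.degrees(np.arccos(np.clip(abs(n @ v), 0.0, 1.0)))
        if theta >= theta_th_deg and not occluded(cam_center, p):
            keep.append(p)
    return np.asarray(keep)
```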
2.5. Semantic GAN‐based translation
In this paper, the images used for similarity matching are derived from intraoperative endoscopic images and virtual rendered images, respectively. Following standard virtual endoscopy techniques, the virtual image is generated by surface rendering under the OpenGL framework. Two aspects need to be matched with the real endoscope for the registration application: (1) the camera model parameters related to image projection, including focal length, field of view, radial distortion, and principal point; and (2) the lighting model for scene rendering, including the light source and scene lighting. The Phong lighting model is used to simulate complicated scenarios, with its parameters selected by visual inspection to match the real image. However, due to inherent differences in color, light, material, texture, and the realistic operating environment between the two domains, the style of the rendered image still differs considerably from that of the endoscopic image.
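To make the camera‐matching step concrete, the snippet below projects CT‐space salient points with a pinhole‐plus‐radial‐distortion model via OpenCV. The intrinsic and distortion values are placeholders; in practice they would come from calibrating the real ureteroscope.

```python
import numpy as np
import cv2

# Illustrative placeholders only: real values come from camera calibration.
K = np.array([[420.0,   0.0, 320.0],
              [  0.0, 420.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.35, 0.12, 0.0, 0.0, 0.0])   # k1, k2, p1, p2, k3

def project_salient_points(points_3d, T_cam):
    """Project CT-space salient points into the virtual image with the calibrated model."""
    R, t = T_cam[:3, :3], T_cam[:3, 3].reshape(3, 1)
    rvec, _ = cv2.Rodrigues(R)
    uv, _ = cv2.projectPoints(points_3d.astype(np.float64), rvec, t, K, dist)
    return uv.reshape(-1, 2)                     # (N, 2) pixel coordinates
```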
To alleviate the effect of domain shift between multimodal images, we trained a translation network to bridge the style gap between the endoscopic and rendered images. The cycle‐consistent generative adversarial network (CycleGAN) 26 is adopted because paired data sets are not available. The CycleGAN framework includes two generator networks and two discriminator networks. The generators aim to produce fake images in the opposite domain that are accurate enough to fool the discriminators, while the discriminators distinguish generated images from real ones and thereby drive the generator updates. This adversarial behavior is formalized through the following loss functions:
(2)
(3)
(4)
(5)
However, CycleGAN may change the intensity and the anatomical shape simultaneously, so that the semantic mask of the generated image no longer corresponds to that of the original image. 27 In our application, the network produced false holes due to image over‐rendering, as shown in Figure 6. We therefore introduced a semantic consistency loss into CycleGAN to avoid geometric changes between the real and synthesized images. A semantic mask with two categories, holes and wall, is obtained using a trained U‐Net. 28 A K‐way classification with a cross‐entropy loss is then used to establish the semantic consistency loss as follows:
(6)
(7)
FIGURE 6. False hole generated by the cycle‐consistent generative adversarial network (CycleGAN).
Taken together, the image translation network is shown in Figure 7, and its full loss function can be summarized as
(8)
FIGURE 7. A CycleGAN‐based image translation network with semantic loss.
This ultimately corresponds to solving for the target generator networks according to the following optimization problem:
(9)
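As an illustration of the semantic term, the sketch below adds a cross‐entropy consistency loss between the segmentation of a source image and that of its translated counterpart, in the spirit of CyCADA. 27 The frozen segmenter, the class count, and the loss weights are assumptions rather than the authors' actual training configuration.

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(seg_net, source, translated):
    """Cross-entropy between the segmentation of the source image (used as a
    pseudo-label) and the segmentation of its translated counterpart.
    seg_net: frozen U-Net returning per-pixel class logits of shape (B, K, H, W)."""
    with torch.no_grad():
        pseudo = seg_net(source).argmax(dim=1)      # (B, H, W) hole/wall labels
    logits = seg_net(translated)                    # segmentation of translated image
    return F.cross_entropy(logits, pseudo)

# Sketch of the full objective (the weights 10.0 and 1.0 are assumed, not the paper's):
# loss = loss_gan_A + loss_gan_B + 10.0 * loss_cycle + 1.0 * (
#            semantic_consistency_loss(seg, real_A, fake_B)
#          + semantic_consistency_loss(seg, real_B, fake_A))
```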
2.6. Actual contour extraction
The extraction of the actual contours from the 2D image is an important step in 2D/3D alignment. The regions at contours tend to have contrasting intensities compared to their surroundings, which is exploited to extract contrasting regions and actual contours. It is worth noting that the extraction is performed on the 2D image generated by the trained translation network. To improve the robustness of detection, we employ the SLIC superpixel segmentation approach 29 as a preprocessing step to group pixels with similar characteristics and reduce the influence of abnormal single pixels.
Given a superpixel, we decide whether it is a contrasting region in three steps: (1) count the number of pixels in the superpixel and calculate its average intensity; (2) search its neighboring superpixels and calculate the average intensity differences; (3) take the maximum mean intensity difference and compare it with a threshold. If the superpixel satisfies both the pixel‐count and intensity‐difference conditions, it is selected. Subsequently, we calculate the barycentric coordinates of each selected superpixel and record them as the actual contour. It is worth noting that the extracted coordinates may deviate visually from the true contour, but this deviation is acceptable because they only support the subsequent fine search.
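A minimal sketch of this three‐step superpixel selection is given below. The number of superpixels, the intensity‐difference threshold, and the darker‐than‐neighbor criterion are assumed values used only for illustration.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2gray

def extract_actual_contours(image, n_segments=300, diff_thresh=0.15):
    """Return barycenters of superpixels that contrast strongly with at least one
    neighbouring superpixel (a proxy for the contrasting contour regions above).
    Expects an RGB image; n_segments and diff_thresh are assumed values."""
    gray = rgb2gray(image) if image.ndim == 3 else image
    labels = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    n = labels.max() + 1
    means = np.array([gray[labels == k].mean() for k in range(n)])

    # Adjacency from 4-connected pixel pairs that straddle a superpixel boundary.
    adj = [set() for _ in range(n)]
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        boundary = a != b
        for u, v in zip(a[boundary].ravel(), b[boundary].ravel()):
            adj[u].add(v)
            adj[v].add(u)

    centers = []
    for k in range(n):
        if not adj[k]:
            continue
        max_diff = max(means[j] - means[k] for j in adj[k])
        if max_diff > diff_thresh:                 # darker than some neighbour
            ys, xs = np.nonzero(labels == k)
            centers.append((xs.mean(), ys.mean()))  # barycentre (u, v)
    return np.array(centers)
```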
2.7. Coarse‐to‐fine estimation
The optimal registration can be represented by the optimal alignment between the generated contours and the actual contours. A cost function based on a similarity measure is built to guide the search for the optimal transformation. We adopt a coarse‐to‐fine search strategy and use the Differential Evolution (DE) 30 algorithm for optimization. The process is divided into two stages: (1) iterative search with a location‐based cost function to reduce the search range; (2) within the reduced search space, optimization of a cost function that contains position, gradient, and multi‐view constraints.
Mutual information (MI) is an information‐theoretic function that measures the mutual dependence between two random variables. In registration, it characterizes the statistical distributions within the entire image area and is robust to noise, which makes it well suited to the multimodal registration scenario. In the coarse search, we evaluate the geometric location similarity between contours, that is, we attempt to align the generated contour point set with the actual contour point set in 2D image coordinates. A search function based on the joint probability 31 is then established by
(10)
(11)
Here T represents the iterated transformation matrix, and the distance term is the Euclidean distance between a projected generated contour point and an actual contour point. The extracted contours usually lie on edges with strong gradients, so in the follow‐up search a local gradient term is added to the cost to achieve higher‐precision contour matching; it includes both gradient magnitude and orientation. To reduce the ambiguity in 2D‐2D contour matching, an additional multi‐view constraint is integrated into the framework, namely the contour location similarity cost calculated from another camera pose. Finally, the cost function in the fine stage is established by
(12)
(13)
(14)
(15)
Here the gradient terms are, respectively, the gradient magnitude difference and the gradient orientation (angle) difference between a generated contour point pi and an actual contour point sj; the weights balance the individual cost terms; and the remaining transformation relates the two camera frames used in the multi‐view constraint.
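To make the two‐stage search concrete, the sketch below runs Differential Evolution twice with SciPy: first with a nearest‐neighbor location cost standing in for the joint‐probability measure of Equations (10)–(11), then with a caller‐supplied full cost (location, gradient, and multi‐view terms) over a shrunken range. The cost form, scale, iteration counts, and shrink factor are assumptions, not the authors' settings.

```python
import numpy as np
from scipy.optimize import differential_evolution
from scipy.spatial import cKDTree

def location_cost(params, project, actual_xy, sigma=5.0):
    """Coarse term: reward proximity of each generated contour point to the nearest
    actual contour point. `project(params)` must return the projected (N, 2) contour;
    sigma (pixels) is an assumed scale. A real implementation would cache the KD-tree."""
    gen_xy = project(params)
    d, _ = cKDTree(actual_xy).query(gen_xy)
    return -np.mean(np.exp(-(d / sigma) ** 2))       # minimized by DE

def coarse_to_fine_search(project, actual_xy, bounds, fine_cost):
    """Stage 1: DE over the full 6-DOF bounds with the location cost only.
    Stage 2: DE in a box shrunk around the coarse optimum with the full cost,
    supplied by the caller as `fine_cost(params)`."""
    coarse = differential_evolution(location_cost, bounds,
                                    args=(project, actual_xy), maxiter=50, seed=0)
    shrunk = [(c - 0.2 * (hi - lo), c + 0.2 * (hi - lo))
              for c, (lo, hi) in zip(coarse.x, bounds)]
    fine = differential_evolution(fine_cost, shrunk, maxiter=100, seed=0)
    return fine.x                                    # optimal 6-DOF pose parameters
```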
3. EXPERIMENTS AND EVALUATION
To validate the effectiveness of the proposed framework, we carried out experimental verification on two phantoms based on patient CT data. We built a data acquisition platform, shown in Figure 8(a, b), and used a PC (Intel® Core™ i7‐6799K) equipped with an RTX 2080 GPU (NVIDIA) to train the network and run the algorithm. Endoscopic videos were recorded using a Hawk electronic ureteroscope (DPG II) and a Hawk SD300A image processing system. An Aurora electromagnetic tracking system (NDI, Canada) and a 6 DOF electromagnetic sensor mounted on the tip of the ureteroscope were used to provide location data. The transformation between the electromagnetic tracking coordinate system and the CT volume coordinate system was calibrated, as shown in Figure 8(c).
FIGURE 8. Experimental platform. (a) Data generation and collection platform. (b) A 3D‐printed phantom with ten marks. (c) The calibration between the electromagnetic global and CT model coordinate systems.
3.1. Phantom and training datasets
Two 3D‐printed phantoms were created to collect the test datasets. Each phantom contains a recessed slot for holding the model and ten markers used as target points for evaluating algorithm accuracy. The markers are circular holes 1 mm in diameter with different depths, distributed around the kidney model, as shown in Figure 8(b). To explore the impact of structural differences on algorithm performance, two kidney models (K1, K2) with different shapes were selected. K1 contains a complex branch structure, while the other is relatively simple, as shown in Figures 9(a) and 10(a). The CT data were acquired using a SIEMENS scanner with 0.6‐mm slices. To evaluate the performance of the translation network under both live and phantom conditions, we trained and tested on two datasets: live/CT and phantom/CT. The live images were collected from ten patients. The CT dataset was generated by virtual endoscopy rendering based on the 3D‐printed phantoms.
FIGURE 9. The positions and rotations of the 25 sample points in K1.
FIGURE 10. The positions and rotations of the 25 sample points in K2.
3.2. Evaluation methodology and ground‐truth
The performance of the algorithm was evaluated mainly using the mean target registration error (mTRE), 32 defined as the mean distance between the target points mapped by the ground‐truth (GT) transformation T_gt and by the estimated transformation T_est. The mTRE is calculated by
mTRE = (1/N) Σ_{i=1}^{N} ‖ T_gt p_i − T_est p_i ‖   (16)
Here p_i represents the 3D position of the i‐th marker in the CT coordinate system and N is the number of markers (ten in our phantoms). The GT was obtained by manually adjusting the transformation matrix so that the projected contours were visually aligned with the contours in the real 2D image. Given the ambiguity of 2D matching caused by the collapse of depth information, subtle deviations are hard to resolve by visual inspection alone, so an additional selection criterion based on electromagnetic tracking was used to filter the optimal ground truth. The reference transformation is calculated from the electromagnetic positioning data as follows:
(17)
The terms in Equation (17) are, in order, the calibrated transformation from the CT to the NDI coordinate system, the magnetic positioning measurement of the current frame, and the rigid‐body transformation between the camera and the magnetic sensor coordinate systems. Considering the calibration error, we set the acceptance threshold to 2.5 mm; the smaller the deviation from this reference, the better, provided there is no distinguishable contour misalignment.
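Assuming Equation (16) is the standard mTRE, its computation over the ten phantom markers can be written as a short function:

```python
import numpy as np

def mTRE(T_gt, T_est, markers):
    """Mean target registration error over the phantom markers (Eq. 16).
    markers: (N, 3) marker positions in the CT coordinate system (mm)."""
    P = np.hstack([markers, np.ones((len(markers), 1))])   # homogeneous coordinates
    diff = (T_gt @ P.T - T_est @ P.T)[:3]                   # (3, N) displacement
    return np.linalg.norm(diff, axis=0).mean()
```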
3.3. Search range and sampling pose
With reference to the optimization problem, we set up a 6 DOF search range around a start pose μ0, which was calculated by manually matching 4−5 anatomical marks between the CT model and the phantom. The rotation search range was set to ±15°, and the position search range was set to ±10 mm, referring to the size of the renal pelvis. Evaluation using randomly sampled start positions is part of the standardized evaluation methodology for 2D/3D registration. In our randomized study, the sampled poses varied mainly in depth and roll rotation, considering the surgical scene and the shape of the passage from the ureter to the renal pelvis. We collected 25 test positions in phantoms K1 and K2, respectively; their positions and orientations are shown in Figures 9 and 10.
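For illustration, randomized start poses within this range could be drawn as follows; the uniform sampling below is an assumption, whereas the study emphasizes variation mainly in depth and roll.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_start_poses(mu0, n=25, rot_range_deg=15.0, trans_range_mm=10.0):
    """Draw randomized start offsets around the manually initialized pose mu0,
    given as (rx, ry, rz, tx, ty, tz); the uniform scheme is an assumption."""
    offsets = np.hstack([
        rng.uniform(-rot_range_deg, rot_range_deg, (n, 3)),
        rng.uniform(-trans_range_mm, trans_range_mm, (n, 3)),
    ])
    return np.asarray(mu0) + offsets
```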
3.4. Experiment results
The semantic‐based translation network was tested on real endoscopic images and on images captured from the models; some results are shown in Figure 11. The results indicate that the semantic constraint effectively inhibits the generation of false holes. To evaluate the quality of the synthesized images, the mean absolute error (MAE), peak signal‐to‐noise ratio (PSNR), structural similarity (SSIM), multi‐scale structural similarity (MS‐SSIM), and sharpness difference (SD) were adopted. 33 The measurements are reported in Table 1.
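The sketch below computes MAE, PSNR, and SSIM with scikit‐image as an example of how such measurements can be obtained; MS‐SSIM and sharpness difference would require additional code, and the 8‐bit data range is an assumption.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(reference, generated):
    """MAE, PSNR, and SSIM between a reference endoscopic image and the GAN output,
    both assumed to be uint8 RGB (channel_axis requires recent scikit-image)."""
    mae = np.abs(reference.astype(np.float64) - generated.astype(np.float64)).mean()
    psnr = peak_signal_noise_ratio(reference, generated, data_range=255)
    ssim = structural_similarity(reference, generated, channel_axis=-1, data_range=255)
    return mae, psnr, ssim
```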
The quality of contour extraction in the 2D image directly affects the registration accuracy of the algorithm. Therefore, following a previous study, 34 a distance‐based indicator was selected to evaluate the accuracy and robustness of our auto‐contouring method; a value close to 0 indicates good segmentation, whereas a value close to 1 indicates poor segmentation. The contour GT was obtained by manual labeling. For the phantom experiments, we compared projected contours calculated from the GT registration matrix with contours extracted from the CycleGAN‐translated 2D images. For contour quality evaluation on real images, we compared the results obtained with and without CycleGAN. The results, shown in Table 2, indicate that the contour extraction method achieves acceptable values (all below 0.5) and performs well on phantoms. The contour extraction error on real images is higher than on phantoms but still below 0.5, and extraction from the CycleGAN‐translated images is better than extraction directly from the original images.
We evaluated the performance of the proposed method and compared it with five competing methods used in endoscopic 2D/3D registration: NMI, 20 , 23 RANSAC‐PnP, 19 MS‐DSSIM, 25 and CycleGAN‐Depth. 15 The Powell algorithm was used to optimize the competing methods, with the initial value set to μ0. We calculated the mean and standard deviation of the mTRE, the orientation error (OE), and the position error (PE). The initial registration errors (mTRE) on the K1 and K2 phantoms were also recorded for reference. The comparison results with the other methods are shown in Tables 3 and 4. In the K1 cases, the mean and standard deviation of the mTRE, OE, and PE are 0.97 ± 0.59 mm, 0.54 ± 0.19°, and 0.42 ± 0.24 mm, respectively. In the K2 cases, they are 1.27 ± 0.42 mm, 0.73 ± 0.25°, and 0.58 ± 0.21 mm, respectively. The results indicate that the proposed method significantly improves accuracy and robustness. Figure 12 shows the trend of the mTRE during the coarse‐to‐fine iterations, which is in line with our expectations: after the coarse registration stage reduces the search range, higher‐precision registration is achieved by introducing the local gradient terms.
Ablation study: CycleGAN and contour matching are two important modules of the algorithm, so ablation experiments were performed in two groups to verify their effectiveness: without CycleGAN and without contour matching. The results, shown in Table 5, indicate that contour matching contributes more to the accuracy of the algorithm, but the style transfer network also improves performance.
FIGURE 11. The results of the translation network. Images in (a) and (b) were from two patients. Images in (c) and (d) were from the K1 and K2 models.
TABLE 1.
Quantitative evaluations on image style transfer network.
| Category | MAE | PSNR | SSIM | MS‐SSIM | SD |
|---|---|---|---|---|---|
| K1 | 24.16 ± 4.32 | 19.18 ± 1.41 | 0.91 ± 0.01 | 0.69 ± 0.05 | 32.04 ± 0.28 |
| K2 | 19.25 ± 4.73 | 21.18 ± 1.84 | 0.92 ± 0.02 | 0.69 ± 0.07 | 32.17 ± 0.37 |
| real | 34.60 ± 13.25 | 16.99 ± 3.10 | 0.84 ± 0.04 | 0.70 ± 0.08 | 30.42 ± 0.64 |
TABLE 2.
Quantitative evaluations on contour quality.
| Indicator | K1 (detect) | K1 (project) | K2 (detect) | K2 (project) | real (original) | real (CycleGAN) |
|---|---|---|---|---|---|---|
| KPIΨ | 0.12 ± 0.04 | 0.02 ± 0.02 | 0.09 ± 0.03 | 0.02 ± 0.02 | 0.37 ± 0.22 | 0.27 ± 0.18 |
TABLE 3.
The registration results of the comparison experiments on the K1 model.
| Indicators | Our | NMI 20 | NMI 23 | PnP | MS-DSSIM | Depth-GAN |
|---|---|---|---|---|---|---|
| mTRE (mm) | 0.97 ± 0.59 | 8.48 ± 4.15 | 4.70 ± 1.57 | 52.18 ± 25.65 | 13.79 ± 6.96 | 11.82 ± 6.08 |
| OE (°) | 0.54 ± 0.19 | 5.35 ± 1.82 | 3.07 ± 1.09 | 37.96 ± 33.92 | 8.64 ± 4.16 | 6.86 ± 3.37 |
| PE (mm) | 0.42 ± 0.24 | 2.63 ± 0.74 | 2.11 ± 0.72 | 17.28 ± 12.07 | 3.95 ± 2.36 | 4.90 ± 2.33 |
TABLE 4.
The registration results of the comparison experiments on the K2 model.
| Indicators | Our | NMI 20 | NMI 23 | PnP | MS-DSSIM | Depth-GAN |
|---|---|---|---|---|---|---|
| mTRE (mm) | 1.27 ± 0.42 | 16.24 ± 14.12 | 8.69 ± 3.59 | 81.14 ± 72.65 | 14.56 ± 6.43 | 14.43 ± 5.56 |
| OE (°) | 0.54 ± 0.19 | 5.35 ± 1.82 | 6.14 ± 2.34 | 46.82 ± 48.40 | 9.27 ± 3.70 | 8.20 ± 2.94 |
| PE (mm) | 0.42 ± 0.24 | 2.63 ± 0.74 | 3.93 ± 2.23 | 41.28 ± 50.92 | 4.58 ± 2.44 | 5.12 ± 1.92 |
FIGURE 12. The trend of the mean target registration error (mTRE) during the coarse‐to‐fine iterations.
TABLE 5.
Ablation study.
| Indicators | K1 (A+B+C+D) | K1 (A+C) | K1 (B+C+D) | K2 (A+B+C+D) | K2 (A+C) | K2 (B+C+D) |
|---|---|---|---|---|---|---|
| mTRE (mm) | 0.97 ± 0.59 | 27.14 ± 23.46 | 3.49 ± 5.85 | 1.27 ± 0.42 | 17.62 ± 6.17 | 2.38 ± 1.11 |
| OE (°) | 0.54 ± 0.19 | 15.99 ± 15.18 | 2.08 ± 3.65 | 0.73 ± 0.25 | 10.76 ± 4.22 | 1.46 ± 0.66 |
| PE (mm) | 0.42 ± 0.24 | 6.99 ± 3.40 | 1.31 ± 1.84 | 0.58 ± 0.21 | 3.70 ± 1.64 | 0.74 ± 0.44 |
4. DISCUSSION AND CONCLUSION
We developed and investigated a 2D/3D registration method for RIRS. Structures of interest are extracted and projected into the generated contours, which are matched with the actual contours. A translation network and a multi‐view constraint were integrated to achieve higher‐precision registration. The performance of the method was evaluated on phantoms and compared with other endoscopic registration techniques. The results show that our approach has better accuracy and lower variability overall. The indicators on model K1 are better than those on model K2; one possible reason is that the more complex lumen structure of K1 provides more structural constraints.
Intensity‐based registration is usually strongly affected by intensity differences between images. A translation network was therefore trained and used to alleviate the impact of these differences, and the results show that it improved the evaluation indicators. Due to the lack of structural constraints and the influence of initial values, the competing algorithms tend to show considerable bias and randomness; as shown in Figure 13, when the initial position lies outside the model, their optimized results remain outside. The introduced contour position similarity constrains the solution space, so that our algorithm can operate over a rough search space while suppressing such poor results.
FIGURE 13. Comparison of results when the initial positions lie outside of the model. (a) is the GAN‐generated image translated from the real image.
Although the proposed method performs well on phantoms, it still has some limitations. The quality of the actual contour extraction has a significant impact on registration accuracy. Fortunately, the translation network can be trained to generate images from which salient regions can be extracted more easily. Another disadvantage is the running time of about 30 min. The time cost mainly comes from the image processing in the fine search stage, such as gradient calculation and rendering; parallel computing could be used to optimize the algorithm and reduce computation time, which is the focus of our follow‐up work. Finally, we found that the binary‐classification semantic loss suppresses the appearance of false holes but does not eliminate them completely; multi‐label classification and image recognition may be one solution. Moreover, the initial TRE obtained on phantoms is expected to be better than that in actual patients. This may affect the actual registration process in two ways when the search range is centered on T0: (1) if the search space is too small and does not contain the true value, the algorithm may produce a poor result; (2) if the search space is expanded, a local optimum may appear. To verify the effectiveness, we therefore expanded the search range and retested the algorithm. The comparison results are shown in Table 6, in which ‘[15, 10]’ denotes ±15° in rotation and ±10 mm in translation, and ‘[90, 30]’ denotes ±90° in rotation and ±30 mm in translation. Our algorithm still maintains good registration accuracy compared with the small search range (the mTRE increases by less than 1 mm). However, the average number of iterations required increases (from 550 to 950) with the enlarged search range. Therefore, more in vivo experiments are needed to verify the accuracy and robustness of the method.
TABLE 6.
The results of the comparison experiments on an expanded search range.
| Indicators | K1 [15, 10] | K1 [90, 30] | K2 [15, 10] | K2 [90, 30] |
|---|---|---|---|---|
| mTRE (mm) | 0.97 ± 0.59 | 1.62 ± 0.97 | 1.27 ± 0.42 | 1.97 ± 0.85 |
| OE (°) | 0.54 ± 0.19 | 0.96 ± 0.61 | 0.73 ± 0.25 | 1.67 ± 2.62 |
| PE (mm) | 0.42 ± 0.24 | 0.66 ± 0.39 | 0.58 ± 0.21 | 1.19 ± 2.01 |
Overall, the proposed method has the potential to support registration in RIRS and to be extended to other organs with similar structures. To further enhance robustness and effectiveness, the quality of contour generation needs to be improved to cope with more complex intraoperative environments. Required work includes collecting more realistic surgical images, classifying image content more precisely, and training style transfer networks so that contours can be extracted more efficiently.
AUTHOR CONTRIBUTIONS
Zuoming Fu: Drafting the manuscript, Methodology, Software, Validation. Ziyi Jin: Methodology, Review & Editing. Chongan Zhang: Methodology, Review & Editing, Validation, Investigation. Peng Wang: Supervision, Data Curation, Acquisition of data, Review & Editing. Hong Zhang: Funding acquisition, Review & Editing. Xuesong Ye: Review & Editing, Funding acquisition, Project administration, Revising the manuscript critically for important intellectual content.
CONFLICT OF INTEREST STATEMENT
The authors declare that they have no conflict of interest.
ETHICAL APPROVAL
This article does not contain any studies with human participants or animals performed by any of the authors.
ACKNOWLEDGEMENTS
This research was supported by the National Major Scientific Research Instrument Development Project (Grant No. 81827804), the Key Research and Development Plan of Zhejiang Province (Grant No. 2022C03086), and the National Key Research and Development Project (Grant No. 2022YFB4700803).
Fu Z, Jin Z, Zhang C, Wang P, Zhang H, Ye X. A novel intrarenal multimodal 2D/3D registration algorithm and preliminary phantom study. J Appl Clin Med Phys. 2023;24:e14084. 10.1002/acm2.14084
REFERENCES
- 1. Gauhar V, Somani BK, Heng CT, et al. Technique, feasibility, utility, limitations, and future perspectives of a new technique of applying direct in‐scope suction to improve outcomes of retrograde intrarenal surgery for stones. J Clin Med. 2022;11(19):5710.
- 2. Van Cleynenbreugel B, Kılıç Ö, Akand M. Retrograde intrarenal surgery for renal stones - Part 1. Turk J Urol. 2017;43(2):112.
- 3. Solano C, Chicaud M, Kutchukian S, et al. Optimizing outcomes in flexible ureteroscopy: a narrative review of suction techniques. J Clin Med. 2023;12(8):2815.
- 4. Guohua Zeng XG. Flexible Ureteroscopy. People's Medical Publishing House; 2014.
- 5. Yoshida K, Naito S, Matsuda T. Navigation in endourology, ureteroscopy. In: Endourology Progress. Springer; 2019:289-295.
- 6. Kuroda S, Ito H, Sakamaki K, et al. A new prediction model for operative time of flexible ureteroscopy with lithotripsy for the treatment of renal stones. PLoS One. 2018;13(2):e0192597.
- 7. van Oosterom MN, van der Poel HG, Navab N, van de Velde CJH, van Leeuwen FWB. Computer‐assisted surgery: virtual‐ and augmented‐reality displays for navigation during urological interventions. Curr Opin Urol. 2018;28(2):205-213.
- 8. Markelj P, Tomaževič D, Likar B, Pernuš F. A review of 3D/2D registration methods for image‐guided interventions. Med Image Anal. 2012;16(3):642-661.
- 9. Alam F, Rahman SU, Ullah S, Gulati K. Medical image registration in image guided surgery: issues, challenges and research opportunities. Biocybern Biomed Eng. 2018;38(1):71-89.
- 10. Boveiri HR, Khayami R, Javidan R, Mehdizadeh A. Medical image registration using deep neural networks: a comprehensive review. Comput Electr Eng. 2020;87:106767.
- 11. Luo X, Mori K, Peters TM. Advanced endoscopic navigation: surgical big data, methodology, and applications. Annu Rev Biomed Eng. 2018;20:221-251.
- 12. Jiang X, Ma J, Xiao G, Shao Z, Guo X. A review of multimodal image matching: methods and applications. Inf Fusion. 2021;73:22-71.
- 13. Deligianni F, Chung A, Yang G‐Z. pq‐space based 2D/3D registration for endoscope tracking. Paper presented at: International Conference on Medical Image Computing and Computer‐Assisted Intervention; 2003.
- 14. Visentini‐Scarzanella M, Stoyanov D, Yang GZ. Metric depth recovery from monocular images using shape‐from‐shading and specularities. Paper presented at: IEEE International Conference on Image Processing; 2013.
- 15. Shen M, Gu Y, Liu N, Yang G‐Z. Context‐aware depth and pose estimation for bronchoscopic navigation. IEEE Robot Autom Lett. 2019;4(2):732-739.
- 16. Zhao C, Shen M, Sun L, Yang G‐Z. Generative localization with uncertainty estimation through video‐CT data for bronchoscopic biopsy. IEEE Robot Autom Lett. 2019;5(1):258-265.
- 17. Unberath M, Gao C, Hu Y, et al. The impact of machine learning on 2D/3D registration for image‐guided interventions: a systematic review and perspective. Front Robot AI. 2021;8:716007.
- 18. Koo B, Robu MR, Allam M, et al. Automatic, global registration in laparoscopic liver surgery. Int J Comput Assist Radiol Surg. 2022:1-10.
- 19. Labrunie M, Ribeiro M, Mourthadhoi F, et al. Automatic preoperative 3D model registration in laparoscopic liver resection. Int J Comput Assist Radiol Surg. 2022;17(8):1429-1436.
- 20. Helferty JP, Sherbondy AJ, Kiraly AP, Higgins WE. Computer‐based system for the virtual‐endoscopic guidance of bronchoscopy. Comput Vis Image Underst. 2007;108(1-2):171-187.
- 21. Deguchi D, Mori K, Feuerstein M, et al. Selective image similarity measure for bronchoscope tracking based on image registration. Med Image Anal. 2009;13(4):621-633.
- 22. Luo X, Mori K. A discriminative structural similarity measure and its application to video‐volume registration for endoscope three‐dimensional motion tracking. IEEE Trans Med Imaging. 2014;33(6):1248-1261.
- 23. Ingram WS, Yang J, Beadle BM, et al. The feasibility of endoscopy‐CT image registration in the head and neck without prospective endoscope tracking. PLoS One. 2017;12(5):e0177886.
- 24. Luo X, Zeng H‐Q, Du Y‐P, Cheng X. Towards multiple instance learning and Hermann Weyl's discrepancy for robust image‐guided bronchoscopic intervention. Paper presented at: Medical Image Computing and Computer Assisted Intervention – MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part V; 2019.
- 25. Luo X. Accurate multiscale selective fusion of CT and video images for real‐time endoscopic camera 3D tracking in robotic surgery. Paper presented at: ICASSP 2022 – IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2022.
- 26. Zhu J‐Y, Park T, Isola P, Efros AA. Unpaired image‐to‐image translation using cycle‐consistent adversarial networks. Paper presented at: Proceedings of the IEEE International Conference on Computer Vision; 2017.
- 27. Hoffman J, Tzeng E, Park T, et al. CyCADA: cycle‐consistent adversarial domain adaptation. Paper presented at: International Conference on Machine Learning; 2018.
- 28. Ronneberger O, Fischer P, Brox T. U‐Net: convolutional networks for biomedical image segmentation. Paper presented at: International Conference on Medical Image Computing and Computer‐Assisted Intervention; 2015.
- 29. Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Süsstrunk S. SLIC superpixels. 2010.
- 30. Storn R, Price K. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim. 1997;11(4):341-359.
- 31. Álvarez NA, Sanchiz JM, Badenas J, Pla F, Casañ G. Contour‐based image registration using mutual information. Paper presented at: Iberian Conference on Pattern Recognition and Image Analysis; 2005.
- 32. Schaffert R, Wang J, Fischer P, Maier A, Borsdorf A. Robust multi‐view 2‐D/3‐D registration using point‐to‐plane correspondence model. IEEE Trans Med Imaging. 2019;39(1):161-174.
- 33. Borji A. Pros and cons of GAN evaluation measures. Comput Vis Image Underst. 2019;179:41-65.
- 34. Magnier B. Edge detection: a review of dissimilarity evaluations and a proposed normalized measure. Multimed Tools Appl. 2018;77:9489-9533.
