Abstract
Conventional neuro-navigation can be challenged in targeting deep brain structures via transventricular neuroendoscopy due to unresolved geometric error following soft-tissue deformation. Current robot-assisted endoscopy techniques are fairly limited, primarily serving to position the endoscope along planned trajectories and to provide a stable scope holder. We report the implementation of a robot-assisted ventriculoscopy (RAV) system for 3D reconstruction, registration, and augmentation of the neuroendoscopic scene with intraoperative imaging, enabling guidance even in the presence of tissue deformation and providing visualization of structures beyond the endoscopic field-of-view. Phantom studies were performed to quantitatively evaluate image sampling requirements, registration accuracy, and computational runtime for two reconstruction methods and a variety of clinically relevant ventriculoscope trajectories. A median target registration error of 1.2 mm was achieved with an update rate of 2.34 frames per second, validating the RAV concept and motivating translation to future clinical studies.
Keywords: Image-guided surgery, intraoperative imaging, computer vision, augmented reality, neurosurgery, ventriculoscopy
I. Introduction
TRANSVENTRICULAR endoscopic neurosurgery offers a minimally invasive approach to deep brain structures that can otherwise be difficult to access with a high degree of accuracy and precision. Such approaches are commonly used for endoscopic third ventriculostomy (ETV) and increasingly for cystectomy or biopsy in proximity to the lateral or third ventricles. The approach also facilitates emerging deep brain stimulation (DBS) techniques that target functional nuclei (e.g., the subthalamic nucleus [1], [2], amygdala [3], and globus pallidus [4]) for treatment of a spectrum of neurological or neurodegenerative diseases, including certain forms of autism [3], depression [5], Alzheimer’s disease [6], Tourette’s syndrome [7], and even obesity [8]. Such novel approaches would benefit from neurosurgical guidance with a higher degree of accuracy than current standard workflows without image guidance or navigation [9], which can be subject to unresolved geometric errors [10] from deep brain tissue deformation [11], [12]. During such an approach, a typical, modest egress of ~3-4 ml of cerebrospinal fluid (CSF) corresponds to deformation of ~4-10 mm in regions pertinent to such targets [13]. Confident localization of anatomical soft-tissue landmarks is essential to avoiding severe neurological complications, but such landmarks are not always conspicuous at the interior ventricular walls and in some patients are visually occult. Moreover, some targets are beyond the visible surface in the periventricular parenchyma. In practice, the geometric errors associated with deep brain deformations are not resolved by current navigation systems and confound accurate overlay of anatomical structures in neuroendoscopy.
Conventional neuronavigation relying on preoperative imaging uses frame-based or frameless stereotaxy with infrared or electromagnetic trackers to guide instruments, with reported errors ~2 mm [14], [15]. T1/T2-weighted MR images and/or CT are acquired routinely as part of the standard of care for neurosurgery. With the tracker registered to the image using fiducial or surface correspondences captured via a tracked pointer, rigid registration based on bone fiducials can help align instruments with respect to the skull, but such methods are still susceptible to soft-tissue deformation, limiting their utility in targeting structures in the deep brain in the presence of CSF egress and deep brain shift. Intraoperative magnetic resonance (MR), computed tomography (CT), cone-beam CT (CBCT), or ultrasound imaging offers a means to resolve geometric error associated with deep brain tissue deformation by acquiring 3D images in the course of intervention. Intraoperative MR imaging [16] offers high contrast in soft tissues without ionizing radiation; however, the high cost of equipment, lack of mobility, and requirement of MR-compatible instruments limit broad utilization. CBCT [17] has emerged as a valuable means of intraoperative guidance with relatively compact, lower cost, and portable systems providing sub-millimeter spatial resolution for clear visualization of bone structures. While CBCT offers broader utilization than MR imaging, image quality in the current state of the art limits utility primarily to visualization of high-contrast structures. Intraoperative ultrasound imaging presents a possible alternative, although many approaches are limited to visualization and targeting of structures close to the cortical surface due to compounding errors of registration and poor image quality in deep brain structures [18]. Emerging methods, including photoacoustic imaging [19], could help advance ultrasound imaging in this context.
Robotic assistance offers the potential for further improvements in precision and safety. Emerging robotic systems such as ROSA® (Zimmer Biomet, Warsaw IN), neuromate® (Renishaw, Wotton-under-Edge, UK), ExcelsiusGPS® (Globus Medical, Audubon PA), and the Galen Surgical Robot [20] host an advanced set of capabilities to assist neurosurgery. Features such as automated positioning on planned trajectories, tool stabilization, force limitation, and cooperative control of the end effector have proven useful for some surgical techniques [21], [22]. However, robotic guidance still commonly relies on surgical tracking and registration with preoperative imaging using fiducial point markers [23] or surface registration [24] and is therefore subject to the same sources of geometric error mentioned above – e.g., deep brain tissue deformation due to CSF egress in neuroendoscopy.
The integration of neuroendoscopy with preoperative imaging (e.g., MR images) and robotic assistance has the potential to combine the strengths of each technology, providing up-to-date guidance by reconstruction of soft-tissue structures even after deep brain tissue deformation has occurred. In skull base neurosurgery, for example, previous work [25]–[27] showed that the nasal endoscope could be used to reconstruct a 3D surface using a structure-from-motion (SfM) approach to register the endoscopic scene with preoperative CT or intraoperative CBCT. However, such approaches rely on an optical tracker-based navigation system as a reference, leading to confounding sources of error when evaluating accuracy. Typically, adaptive scale kernel consensus or hierarchical multi-affine registration is employed for matching, with the goal of tracking endoscopic motion for visualization. Other work [27] applied SfM methods for reconstruction of sinus anatomy using video only, and similarly relied on electromagnetic tracking as ground truth. Clinical studies aim to integrate neuroendoscopy with robotic assistance, but current approaches primarily employ the robot for endoscope holding and stabilization [28].
In the work reported below, robot-controlled motion of a neuroendoscope is used to reconstruct 3D surfaces of anatomical structures – viz., the inner surface of the cerebral ventricles. The approach is novel in its use of precise encoder feedback of robot pose to improve both the accuracy and runtime of 3D reconstruction from monocular endoscopic video. The reconstructed anatomical structures present an up-to-date view of the 3D anatomy and can be registered to pre- or intra-operative 3D imaging to help resolve geometric errors due to soft-tissue deformation. The solution thereby combines real-time neuroendoscopic video with preoperative or intraoperative 3D imaging. The approach is certainly not intended to replace preoperative imaging; rather, it combines information from preoperative CT or MRI (and planning therein) with neuroendoscopy to provide quantitative feedback and guidance within the context of the video scene – cf. conventional orthogonal slice-based navigation. The work reported below builds on preliminary studies of camera and hand-eye calibration for robot-assisted endoscopy [29]. The approach also builds on the prior literature reviewed above [25]–[27] and adapts recent technical developments to address unique challenges of the ventricular anatomy and deep brain surgery. The specific contributions of the current work include: (1) development of a prototype platform for robot-assisted neuroendoscopy evaluated in both simple geometric phantoms and a semirealistic anthropomorphic ventricle phantom; (2) implementation of requisite SfM reconstruction algorithms; (3) assessment of reconstruction accuracy for various reconstruction methods, demonstrating improvements in accuracy and runtime achieved with a robot-assisted approach; and (4) evaluation of reconstruction accuracy for a selection of robot-controlled endoscope trajectories.
The potential contributions of this work toward eventual clinical application include: (1) extending the utility of robotic assistance from serving simply as an endoscope holder to actively enabling 3D endoscopy; (2) providing accurate intraoperative navigation following deformation of the cerebral ventricles via 3D reconstruction and registration; and (3) visualizing ventricular anatomical structures, visible and/or occluded, overlaid accurately in endoscopic video imaging for improved targeting. The work reported below constitutes the first detailed reporting of the robot-assisted ventriculoscopy (RAV) prototype, including evaluation of the RAV approach in terms of the accuracy and runtime in 3D reconstruction in a series of phantom studies as part of the systematic development and translation toward clinical studies.
II. Methods
A. System for Robot-Assisted Ventriculoscopy
1). Overview of System Components:
The RAV system illustrated in Fig. 1a is a platform for development and evaluation of neuroendoscopic imaging, algorithms for reconstruction and registration, and integration with other surgical navigation systems in preclinical studies. The robot (UR3e, Universal Robots, Denmark) rigidly holds the ventriculoscope via a custom end effector attachment. The ventriculoscope (Lotta 28164LA 6°, Karl Storz, Tuttlingen, Germany) is coupled to an endoscopy camera console (AIM 1588, Stryker, Kalamazoo MI) and a light source (L10 AIM, Stryker, Kalamazoo MI). The system acquires 1920 × 1080 pixel images at 10 frames-per-second (fps). The analog endoscopic video output is converted to digital image frames and recorded by a computer workstation via a frame grabber (USB2DVI3.0, Epiphan Systems, Ottawa Canada). The camera zoom and focus were held fixed in experiments detailed below.
Fig. 1.

System for Robot Assisted Ventriculoscopy (RAV). (a) RAV system components and (b) a zoomed inset view of the ventriculoscope attached to the robot end effector. (c) Test bench setup illustrating the pertinent coordinate frames. Coordinate transforms between E, W, and C frames (marked in blue) are obtained preoperatively, and transforms between D, R, P, and S frames (marked in yellow) are obtained intraoperatively.
Intraoperative imaging and surgical navigation components of the system are also shown in Fig. 1a. The O-arm™ imaging system (Medtronic, Littleton MA) was used to acquire CBCT images with nominal scan protocols involving 745 x-ray projections acquired at 100 kV (150 mAs) in a 360° orbit about the subject and reconstructed via 3D filtered backprojection with voxel dimensions 0.3 × 0.3 × 0.3 mm3. The StealthStation™ navigation system (Medtronic, Boulder CO) provides frameless surgical tracking using stereoscopic infrared cameras to track a dynamic reference frame (SureTrak™) rigidly affixed to the endoscope and a reference marker attached to the cranium. For calibration and phantom studies, the reference marker was attached to a base plate as shown in Fig. 1c.
2). Coordinate Frames and Transforms:
The RAV system coordinate frames are illustrated in Fig. 1c. The robot base is considered the world frame, W, and the robot end effector frame is denoted E. Frame D is the dynamic reference frame attached to the ventriculoscope, and frame R is the fixed reference frame, conventionally attached to the cranium or a Mayfield clamp. The camera frame is denoted C and corresponds approximately to the tip of the ventriculoscope. Frame P denotes the patient / phantom frame with its corresponding intraoperative CBCT image, S. Throughout the manuscript, the notation T_B^A denotes the homogeneous rigid transform of frame B with respect to reference frame A.
Several of the transforms between coordinate frames are calculated offline via a one-time calibration process. Camera calibration determines the camera intrinsics (K) and lens distortion coefficients as detailed and evaluated in previous work [29]. The robot-to-camera hand-eye calibration estimates T_C^E from the camera intrinsics and end effector transforms, T_E^W, obtained from the robot. In a similar process, the tracker-to-camera hand-eye calibration estimates T_C^D from T_D^R (obtained from the tracker). The O-arm™ and StealthStation™ provide automatic tracker-to-patient registration, T_P^R, since the tracker observes the imager and reference markers during the CBCT scan. For other scenarios, such as using preoperative CT images, tracker-to-patient registration can instead be calculated by point fiducial or surface registration.
The remaining coordinate transforms in Fig. 1c are calculated intraoperatively and updated as new data are acquired. The relative coordinate transform between the phantom and camera frame, T_P^C, was obtained by reconstructing a 3D point-cloud representation of the phantom. Registration between the CBCT image and the reconstructed point-cloud structure yields T_S^P. Traversing this chain of transformations enables referencing of the intraoperative CBCT image in the camera frame, thereby permitting augmentation of endoscopic video with planning structures defined in (or deformably registered to) CBCT.
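As a simple illustration of this transformation chain, the sketch below composes hypothetical 4 × 4 homogeneous transforms (under the T_B^A convention defined above, with placeholder identity matrices) to express a target annotated in the CBCT frame S in the camera frame C. The function and variable names are illustrative only and are not part of the RAV software.

```python
import numpy as np

def compose(*transforms):
    """Compose a chain of 4x4 homogeneous transforms (applied right-to-left to points)."""
    T = np.eye(4)
    for t in transforms:
        T = T @ t
    return T

def map_point(T, p):
    """Apply a 4x4 homogeneous transform T to a 3D point p."""
    ph = np.append(np.asarray(p, dtype=float), 1.0)
    return (T @ ph)[:3]

# Placeholder transforms following the T_B^A convention (points in frame B -> frame A):
T_E_W = np.eye(4)   # end effector w.r.t. robot base (from robot encoders)
T_C_E = np.eye(4)   # camera w.r.t. end effector (hand-eye calibration)
T_P_C = np.eye(4)   # phantom/patient w.r.t. camera (from SfM reconstruction)
T_S_P = np.eye(4)   # CBCT scan w.r.t. phantom/patient (from ICP registration)

# A target annotated in the CBCT image frame S, expressed in the camera frame C:
T_S_C = compose(T_P_C, T_S_P)                 # scan w.r.t. camera
target_in_C = map_point(T_S_C, [10.0, 5.0, 30.0])
```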
B. Point-Cloud Reconstruction and Registration
A SfM-based approach was used to reconstruct a point-cloud representation of the interior surface of the cerebral ventricles. Two implementations were evaluated in experiments detailed below: (1) a conventional arrangement in which endoscope motion is controlled manually (positioned via a static arm), and the extrinsics and endoscope trajectory are solved as part of the 3D point-cloud reconstruction; and (2) the RAV system, in which endoscope motion is computer-controlled, and the end effector transform, T_E^W, is an input to the reconstruction pipeline.
Figure 2 outlines the algorithmic components for CBCT-to-video registration for the “Static-Arm” and “Robot-Assisted” implementations. The system was calibrated with one-time, preoperative robot-to-camera and tracker-to-camera calibrations. Ventriculoscope images (I) are taken as input to SfM reconstruction for both the Static-Arm and Robot-Assisted implementations, with the latter also ingesting the robot poses, T_E^W. Image domain features (F) are extracted and matched across multiple image frames to obtain a sparse 3D point-cloud representation of the ventricles. The reconstructed point cloud is aligned to a surface mesh segmented from the CBCT image, S, via iterative closest point (ICP) registration, yielding T_S^P. The resulting registration allows the ventriculoscope trajectory to be visualized within the volumetric image, providing a basis for navigating the endoscope to target structures. Conversely, the annotated target structures in S can be mapped to the camera frame, C, and rendered in the endoscopic video as augmented overlays. Additional preoperative or intraoperative information – such as ventriculoscope orientation or depth cues – can be used to further augment the video display.
Fig. 2.

Algorithm flowchart illustrating the steps for 3D point cloud reconstruction, registration, and endoscopic guidance using structure-from-motion. The conventional arrangement (denoted “Static-Arm”) solves extrinsics and endoscope trajectory as part of the SfM optimization. The proposed “Robot-Assisted” process takes the camera-to-world transform, T_C^W, as known input to the solution, presenting potential benefits to accuracy and runtime in addition to the physical / ergonomic benefits of a robot-controlled platform for endoscope manipulation.
1). Data Acquisition:
Neuroendoscopic video images are denoted I = {Ii : i = 1, …, NI}, where each image, Ii, is corrected for lens distortion as detailed in [29]. For the Robot-Assisted implementation, the corresponding end effector transforms, T_E^W, yield the camera pose (extrinsics) for each video image as T_C^W = T_E^W · T_C^E. The reconstruction is then performed over a batch of NI images.
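The sketch below illustrates this data-acquisition step: undistorting each frame and composing the camera pose from the encoder-derived end effector pose and the hand-eye calibration. The intrinsics, distortion coefficients, and transforms shown are placeholders (identity/zero values), not the calibrated values of the actual system.

```python
import cv2
import numpy as np

# Placeholder calibration results: camera intrinsics K, lens distortion
# coefficients, and the hand-eye transform T_C_E (camera w.r.t. end effector).
K = np.array([[800.0, 0.0, 960.0],
              [0.0, 800.0, 540.0],
              [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)          # (k1, k2, p1, p2, k3)
T_C_E = np.eye(4)

def prepare_frame(frame_bgr, T_E_W):
    """Correct a video frame for lens distortion and compute its camera pose.

    frame_bgr : raw 1920x1080 frame from the frame grabber
    T_E_W     : end effector pose w.r.t. robot base (from encoder feedback)
    Returns the undistorted image and the camera extrinsics T_C_W.
    """
    undistorted = cv2.undistort(frame_bgr, K, dist_coeffs)
    T_C_W = T_E_W @ T_C_E          # camera pose in the world (robot base) frame
    return undistorted, T_C_W
```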
2). Point-Cloud Reconstruction:
The COLMAP toolkit [30], [31], an open-source, general-purpose SfM library, was used for the reconstruction pipeline. SfM uses image domain features to establish correspondence between consecutive image frames and estimate homography. For each video image, Ii, features Fi = {fij, xij : j = 1, …, NiF} are extracted, where fij is the feature descriptor, and xij is the feature location.
The scale-invariant feature transform (SIFT) [32] was selected as the feature detector, and a histogram of oriented gradients (HoG) [33] was used to describe features at each detected feature location. Previous work [34] reported on the robustness and performance of various feature descriptors, including SIFT, in terms of specificity and sensitivity evaluated in an in vivo endoscopic procedure, identifying SIFT as the most discriminative descriptor for the imaging sequence. Prior literature [35]–[37] reported the use of SIFT and several variants as generalizable feature detectors for in vivo reconstruction of various weakly textured tissues. Other variants of SIFT such as AffineSIFT [38] and DSP-SIFT [39] were found to offer robust feature detection and matching between images with large affine distortions and tilt angles; however, such tilts are less pertinent in constrained robot-controlled ventriculoscope movements. Similarly, [40] reported a thorough analysis of SIFT and variants for feature detection and reconstruction in endoscopy validated using in vivo and ex vivo datasets, along with typical runtimes. A GPU-based implementation of SIFT was chosen over potential CPU-based SIFT variants to support a clinically feasible reconstruction runtime. Detected features are exhaustively matched to establish a match matrix, Q, for every image pair, Ia and Ib, where each entry of Q gives the correspondence strength between image features.
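As an illustration of the feature extraction and exhaustive matching step, the sketch below uses OpenCV's CPU SIFT and a brute-force matcher with Lowe's ratio test as a stand-in for the GPU SIFT and matching performed inside COLMAP; the ratio threshold and the dictionary representation of the match matrix Q are illustrative choices, not those of the RAV pipeline.

```python
import itertools
import cv2

sift = cv2.SIFT_create()                      # CPU SIFT stand-in for the GPU implementation
matcher = cv2.BFMatcher(cv2.NORM_L2)

def extract_features(images):
    """Detect SIFT keypoints and descriptors F_i = {(f_ij, x_ij)} for each image."""
    features = []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        keypoints, descriptors = sift.detectAndCompute(gray, None)
        features.append((keypoints, descriptors))
    return features

def exhaustive_match(features, ratio=0.8):
    """Exhaustively match all image pairs (a, b) with Lowe's ratio test,
    returning a dictionary acting as the match matrix Q."""
    Q = {}
    for a, b in itertools.combinations(range(len(features)), 2):
        knn = matcher.knnMatch(features[a][1], features[b][1], k=2)
        Q[(a, b)] = [pair[0] for pair in knn
                     if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    return Q
```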
In the Static-Arm arrangement, the homography between images is estimated to obtain the relative camera extrinsics, thus solving the camera motion. For the Robot-Assisted arrangement, the camera motion is known by design, since recording the absolute transforms, T_E^W, allows solution of the relative transforms between frames and, more importantly, a scale factor mapping the transformations to real dimensions. The lack of an absolute reference in the Static-Arm arrangement results in an extrinsics estimate – and hence, a point-cloud reconstruction – absent of scale. Without loss of generality, an arbitrary frame is selected as the reference for the phantom in the Robot-Assisted arrangement, whereas the first camera pose is set to the identity matrix in the Static-Arm arrangement. To reconstruct the 3D point cloud from either an estimated or known motion, the features matched across multiple images are identified in Q and triangulated in 3D space, yielding a sparse 3D point cloud, X̃.
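A minimal sketch of triangulation from two known camera poses is shown below, using OpenCV's linear triangulation; in the RAV pipeline this step is performed by COLMAP over many views, so the helper names and two-view simplification are assumptions for illustration. Because the poses carry metric units from the robot encoders, the triangulated points inherit real-world scale.

```python
import numpy as np
import cv2

def projection_matrix(K, T_C_W):
    """Build a 3x4 projection matrix from intrinsics K and camera pose T_C_W
    (camera w.r.t. world); points are mapped world -> camera -> image."""
    T_W_C = np.linalg.inv(T_C_W)               # world w.r.t. camera
    return K @ T_W_C[:3, :]

def triangulate_pair(K, T_C_W_a, T_C_W_b, pts_a, pts_b):
    """Triangulate matched feature locations (Nx2 arrays) observed from two
    known poses, returning Nx3 points in the world frame with metric scale."""
    P_a = projection_matrix(K, T_C_W_a)
    P_b = projection_matrix(K, T_C_W_b)
    X_h = cv2.triangulatePoints(P_a, P_b,
                                np.asarray(pts_a, dtype=np.float64).T,
                                np.asarray(pts_b, dtype=np.float64).T)
    return (X_h[:3] / X_h[3]).T                # dehomogenize to Nx3
```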
Image artifacts such as noise or specularity contribute to poor distinction between features, causing spurious feature matches and falsely triangulated points. As shown in Fig. 2, bundle adjustment helps to refine the noisy point cloud, X̃, by posing the nonlinear problem of simultaneously estimating a refined 3D scene structure, X, and motion, P, via minimization of the reprojection error (RPE):

{X, P} = argmin_{X,P} Σi Σj RPE(xij, x̂ij)   (1)

where for a given image, Ii, with a feature located at xij and corresponding backprojected feature, x̂ij, the RPE is defined as:

RPE(xij, x̂ij) = ‖xij − x̂ij‖²   (2)
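A sketch of the per-feature reprojection error (Eq. 2) is shown below for points and poses expressed in the world frame; in the RAV pipeline the full bundle adjustment over structure and motion (Eq. 1) is performed by COLMAP, so this function is illustrative only.

```python
import numpy as np

def reprojection_error(K, T_C_W, X_world, x_observed):
    """Squared reprojection error (Eq. 2) for 3D points X_world (Nx3) observed
    at pixel locations x_observed (Nx2) in an image with camera pose T_C_W."""
    T_W_C = np.linalg.inv(T_C_W)
    X_cam = T_W_C[:3, :3] @ X_world.T + T_W_C[:3, 3:4]   # world -> camera
    x_proj = K @ X_cam                                    # camera -> image (homogeneous)
    x_proj = (x_proj[:2] / x_proj[2]).T                   # dehomogenize to Nx2
    return np.sum((x_observed - x_proj) ** 2, axis=1)     # squared pixel error per point
```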
Further outlier filtering [41] is performed as shown in Fig. 2 to remove points according to their triangulation angle and a realistic range of focal lengths (10 – 40 mm) required for backprojection. The point-cloud reconstruction is further processed using a statistical filtering scheme to remove outliers. A k-d tree is constructed based on point locations, and each point is queried for its k-nearest neighbors (k = 5). The standard deviation (σ) of the relative distances from each queried point to its neighbors is calculated, following which all points beyond 1σ are trimmed. The process is repeated for 3 iterations using CloudCompare [42] to obtain a reconstructed point cloud, X. The reconstructed point cloud, X, registered to the preoperative or intraoperative image, S, together with the known (Robot-Assisted) or estimated (Static-Arm) motion, P, completes the transformation chain used to augment the surgical display.
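A minimal sketch of this statistical outlier removal is given below using a SciPy k-d tree; the exact trimming criterion and neighbor handling of the CloudCompare filter may differ slightly, so treat the threshold logic as an approximation.

```python
import numpy as np
from scipy.spatial import cKDTree

def statistical_outlier_filter(points, k=5, n_sigma=1.0, iterations=3):
    """Remove outliers by mean distance to the k nearest neighbors, trimming
    points whose mean neighbor distance exceeds n_sigma standard deviations
    (approximating the CloudCompare-style filter); repeated for a fixed number
    of iterations."""
    pts = np.asarray(points, dtype=float)
    for _ in range(iterations):
        tree = cKDTree(pts)
        # query k+1 neighbors because the closest neighbor of each point is itself
        dists, _ = tree.query(pts, k=k + 1)
        mean_dist = dists[:, 1:].mean(axis=1)
        keep = mean_dist <= mean_dist.mean() + n_sigma * mean_dist.std()
        pts = pts[keep]
    return pts
```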
3). Point-Cloud Registration:
The reconstructed point cloud, X, within the patient reference frame, P, was first initialized to the surface mesh segmented from the CBCT scan, S. The initial transform, (T_S^P)m=1, is obtained either by manual registration or via the surgical tracker using the dynamic reference frame and the tracker-to-patient registration. The point cloud, X, is then registered to the mesh using trimmed iterative closest point (ICP) [43] with fraction of overlap γ = 0.95. A Levenberg-Marquardt optimizer [44] with a point-to-plane distance metric [45] is run for m = 1, …, NM iterations, yielding the transform T_S^P. The ICP optimization is defined for a given point, Xk, its corresponding closest point, Sk, in S, and the triangulated face normal, nk, to minimize the normal distance dk as given in Eq. 3:

dk = nk · (Xk − Sk)   (3)

Next, (dk)² is calculated for each point and sorted in ascending order to select a subset of Nin = γ · (NX)m inlier points with the smallest (dk)² values. The residual error of the inlier points is minimized to obtain the incremental update transform shown in Eq. 4:

(T_S^P)m+1 = argmin_T Σ_{k ∈ inliers} ( nk · (T·Xk − Sk) )²   (4)

The process is repeated iteratively until a tolerance criterion between consecutive iterations is met, defined as an angular difference of Δr < 0.05° or a translational difference of Δt < 0.01 mm. The registration enables the scan, S, and other targets of interest defined in S to be mapped to the world frame, W, using the transformation chain.
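The sketch below shows one trimmed point-to-plane ICP iteration (Eqs. 3–4) using a small-angle linear least-squares update rather than the Levenberg-Marquardt optimizer used in the actual implementation; the closest-point and normal lookups are assumed to be provided by a separate mesh query routine.

```python
import numpy as np

def trimmed_point_to_plane_icp_step(X, closest_pts, normals, gamma=0.95):
    """One iteration of trimmed point-to-plane ICP: keep the gamma fraction of
    points with the smallest squared normal distance (Eq. 3) and solve a
    linearized least-squares problem for the incremental update (Eq. 4).

    X           : Nx3 reconstructed points (current estimate)
    closest_pts : Nx3 closest surface points S_k on the CBCT mesh
    normals     : Nx3 unit face normals n_k at the closest points
    Returns the updated points and the 4x4 incremental transform.
    """
    d = np.einsum('ij,ij->i', X - closest_pts, normals)      # signed normal distances d_k
    inliers = np.argsort(d ** 2)[: int(gamma * len(X))]       # trimmed inlier subset
    Xi, ni, di = X[inliers], normals[inliers], d[inliers]

    # Linearized residual: d_k + omega.(X_k x n_k) + t.n_k ~ 0
    A = np.hstack([np.cross(Xi, ni), ni])
    b = -di
    w_t, *_ = np.linalg.lstsq(A, b, rcond=None)               # 6-vector [omega, t]
    omega, t = w_t[:3], w_t[3:]

    T = np.eye(4)
    T[:3, :3] = np.eye(3) + np.array([[0, -omega[2], omega[1]],   # small-angle rotation
                                      [omega[2], 0, -omega[0]],
                                      [-omega[1], omega[0], 0]])
    T[:3, 3] = t
    X_new = (T[:3, :3] @ X.T).T + T[:3, 3]
    return X_new, T
```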
4). Augmented Overlay:
With the transformation chain known, any target defined in S can be mapped to the camera frame, C, using the transform T_S^C. Similarly, the planned surface or volumetric target structures can be mapped to C, assigned visual properties such as texture and opacity, and alpha-blended over the video images, I, providing an information-rich, augmented overlay. Conversely, the ventriculoscope trajectory can be mapped to the scan, S, to provide conventional three-view surgical guidance. Blender was used to render the augmented scene, with TREK [46] and CloudCompare used for display.
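As a simplified stand-in for the rendered overlay, the sketch below projects point targets defined in S into the current video frame and alpha-blends them as filled circles; the actual system renders textured surface and volumetric structures in Blender, so the drawing and blending choices here are illustrative assumptions.

```python
import numpy as np
import cv2

def overlay_targets(frame, targets_S, T_S_C, K, alpha=0.5):
    """Project targets defined in the CBCT frame S into the video frame and
    alpha-blend them as filled circles.

    targets_S : Nx3 target coordinates in the scan frame S
    T_S_C     : scan w.r.t. camera transform from the registration chain
    K         : camera intrinsics (frame assumed already distortion-corrected)
    """
    rvec, _ = cv2.Rodrigues(T_S_C[:3, :3])
    tvec = T_S_C[:3, 3]
    pts_2d, _ = cv2.projectPoints(np.asarray(targets_S, dtype=np.float64),
                                  rvec, tvec, K, None)
    overlay = frame.copy()
    for (u, v) in pts_2d.reshape(-1, 2):
        if 0 <= u < frame.shape[1] and 0 <= v < frame.shape[0]:
            cv2.circle(overlay, (int(u), int(v)), 8, (0, 255, 255), -1)  # yellow marker
    return cv2.addWeighted(overlay, alpha, frame, 1 - alpha, 0)
```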
C. Experiments
Development and quantitative evaluation of the RAV system and methods proceeded according to two main experiments: (1) investigation of the effect of image sampling, the Static-Arm implementation, and the RAV implementation on the accuracy and runtime of point-cloud reconstruction in a simple hemispherical geometry phantom; and (2) translation to an anatomically realistic anthropomorphic phantom emulating the cerebral ventricles, with evaluation of reconstruction performance for the RAV system with a variety of endoscope motion trajectories.
1). Experiment #1: Reconstruction Accuracy in a Simple Geometric Phantom:
The first experiment investigated the effect of image sampling density for point-cloud reconstruction using a relatively simple geometric phantom (Fig. 3a–b) designed as a 70 × 70 mm2 square with a hemispherical recess of 50 mm diameter and parallel contrasting (red and blue) lines at intervals of 10 mm. The phantom was 3D printed on a Connex-3 Objet 260 (Stratasys, Eden Prairie, Minnesota) in Vero PureWhite material, and the parallel lines were manually colored. An example ventriculoscope image of the phantom (prior to distortion correction) is shown in Fig. 3b. Teflon spheres (BBs) of 1.5 mm diameter and contrasting colors were embedded at each intersection as target points (conspicuous in both the video and CBCT images). The centers of the BBs were manually annotated on thresholded CBCT images using an in-house software platform (TREK) [46] for 3D image analysis, registration, and guidance, providing ground truth for target registration error (TRE) evaluation. To eliminate possible contribution of the spheres as features in the point-cloud reconstruction, the spheres were segmented in HSV space, and any features detected within the resulting masked region were discarded. The reconstructed point cloud was similarly thresholded using the hue axis of the HSV color space to obtain points contrastingly colored in green (as seen in Fig. 3f), followed by manual annotation of the central points of the BBs using CloudCompare [42]. The fast marching algorithm [47] was used to extract a triangulated surface mesh from CBCT, which was further used for registration.
Fig. 3.

Phantoms used in development and evaluation of the RAV system. (a) Hemispherical calibration phantom for reconstruction validation and (b) example image from the ventriculoscope. (c-e) Various views of the anthropomorphic ventricle phantom and (f) example image from the ventriculoscope.
The ventriculoscope was translated along a flat circular plane trajectory over the hemispherical phantom at a focal length of 20 mm (a typical depth of field in ventriculoscopy). The number of video images (NI) used for point-cloud reconstruction was varied from 10 to 200, sampled uniformly from the video series collected along the motion trajectory. The effect of image sampling rate on point-cloud reconstruction was assessed in terms of: (1) the number of 3D points (NX) in the resulting sparse reconstruction; and (2) the computational runtime for all steps within the reconstruction pipeline.
The geometrical phantom was further used to evaluate the Static-Arm and Robot-Assisted arrangements in terms of reconstruction accuracy and runtime. Reconstruction accuracy was evaluated in terms of the signed residual error (after registration) between a reconstructed point, Xk, and the corresponding closest point, Sk, on the surface mesh of the CBCT image, as projected on the CBCT image (S) – i.e., the projection error:

PE(Xk) = projS(Xk − Sk)   (5)

where projS is the projection operator over the image, S, followed by k-nearest neighbor interpolation (5 nearest neighbors). To restrict extrapolation within the bounds of reconstruction, the point cloud was first triangulated using Delaunay triangulation to create a convex hull, after which PE was evaluated only over the surface within the enclosed convex hull. Target registration error (TRE) was evaluated as the RMS distance between each BB, {Br : r = 1, …, Nb}, observed in the reconstructed point cloud and the corresponding annotated BB position, B̂r, in the CBCT image, S:

TRE = sqrt( (1/Nb) Σr ‖Br − B̂r‖² )   (6)
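A minimal sketch of the TRE computation (Eq. 6) is shown below, assuming the registration transform (the inverse of T_S^P) is applied to the BB centers annotated in the reconstruction; the argument names and the direction of the transform are illustrative assumptions.

```python
import numpy as np

def target_registration_error(bb_recon, bb_cbct, T_P_S):
    """TRE (Eq. 6): RMS distance between BB centers annotated in the registered
    reconstruction and the corresponding BB centers annotated in the CBCT image.

    bb_recon : Nx3 BB centers annotated in the reconstructed point cloud (frame P)
    bb_cbct  : Nx3 BB centers annotated in the CBCT image (frame S)
    T_P_S    : transform mapping points from the phantom frame P to the scan frame S
    """
    bb_recon = np.asarray(bb_recon, dtype=float)
    bb_cbct = np.asarray(bb_cbct, dtype=float)
    bb_mapped = (T_P_S[:3, :3] @ bb_recon.T).T + T_P_S[:3, 3]
    return float(np.sqrt(np.mean(np.sum((bb_mapped - bb_cbct) ** 2, axis=1))))
```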
Statistical significance of the observed differences in TRE was evaluated using a Mann-Whitney U test under the null hypothesis that the two independent samples are drawn from identical distributions with equal medians (without prior assumptions on the shape of the distribution), taking p < 0.05 as indicative of a statistically significant difference. Finally, the computational runtime was evaluated as the time required for all steps in the reconstruction pipeline.
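For reference, such a test can be carried out with SciPy as sketched below; the TRE samples shown are placeholder values, not measured data.

```python
from scipy.stats import mannwhitneyu

# Placeholder per-target TRE samples (mm) for two arrangements.
tre_static_arm = [0.61, 0.92, 0.78, 1.05, 0.70]
tre_robot_assisted = [0.66, 0.95, 0.81, 1.10, 0.74]

stat, p_value = mannwhitneyu(tre_static_arm, tre_robot_assisted, alternative='two-sided')
significant = p_value < 0.05   # p < 0.05 taken as statistically significant
```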
2). Experiment #2: Translation to an Anthropomorphic Phantom:
The RAV solution was translated to a semirealistic context using an anthropomorphic ventricle phantom (Fig. 3c–f) emulating the cerebral ventricles – a model that more closely reflects the natural anatomical scene and allows quantitative, reproducible evaluation of reconstruction and registration performance. The phantom depicts the anatomical structures pertinent to the ventriculoscopic approach – viz., the lateral ventricles (LV), Foramen of Monro (FoM), and third ventricle (3V) – as shown in Fig. 3c–e, segmented from MR images and scaled by 2.5 × for initial testing without safeguards on robot motion. The phantom was 3D printed using Vero PureWhite on a Connex-3 Objet 260 printer (Stratasys, Eden Prairie MN) and post-processed to depict simulated vasculature, including the superior thalamostriate vein (an important anatomical landmark relative to the FoM), the anterior septal vein, and smaller vessels approximating a typical neuroendoscopic scene. The materials used in the phantom allowed high resolution CBCT in these studies as a substitute for MR imaging as the basis for defining the location of the ventricular surface and target structures. The ventricle walls were embedded with Teflon BBs (1.5 mm diameter) labeled green (as seen in Fig. 3f) to permit segmentation and masking (in point-cloud reconstruction and registration) and use as target points in calculation of TRE as described in §II.C.1.
Three clinically relevant endoscope trajectories were tested using the Robot-Assisted arrangement: Linear, Arc, and Remote Center of Motion (RCM), as detailed below. Each trajectory follows a clinically feasible orbit for ventriculoscopy in that the burr hole (drilled in the cranium, typically at Kocher’s point) constrains the possible range of surgical movements to a point at some location along the shaft of the endoscope. For all three trajectories, performance was evaluated in terms of PE of the ventricular surface and TRE of target points. The Linear trajectory is typical of a common ventriculoscopic approach to the 3V, in which the ventriculoscope is inserted via a sheath through the anterior volume of the LV, through the FoM, and into the 3V to access deep brain target structures. The Linear trajectory was realized in the phantom simply by advancing the RAV system along a 1D path from the superior aspect of the LV to the floor of the 3V (50 mm total length). While this trajectory may be typical of clinical approach, it is perhaps the least robust for 3D point-cloud reconstruction, since motion of the camera along the principal axis is degenerate from an information theoretic perspective.
The Arc trajectory refers to the arc traced by the tip of the ventriculoscope along the entire length of the anterior-posterior (AP) axis of the LV, with the ventriculoscope shaft constrained at Kocher’s point. Since the Arc surveys the entire length of the ventricles, major vasculature along the ventricular walls (including the superior thalamostriate vein and anterior septal vein) is well sampled, while the FoM (which is comparatively feature rich in the Linear trajectory) is only faintly visible and contributes fewer feature points for this trajectory. In these studies, the Arc subtended a motion spanning a radius of 50 mm.
The RCM trajectory refers to a circular motion of the ventriculoscope tip traced in the axial plane within the LV, with the center of rotation at Kocher’s point, as sketched in the example below. Similar to the Arc, the RCM trajectory was anticipated to produce feature-rich sampling of the walls of the LV (including major and minor vasculature) and sparser sampling about the FoM.
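As an illustration of how such a pivot-constrained trajectory might be parameterized, the sketch below generates candidate tip poses whose shaft axis passes through the burr hole; all dimensions and the pose convention are illustrative assumptions, and the actual robot motion planning and safety handling are not shown.

```python
import numpy as np

def rcm_trajectory(pivot, depth=50.0, radius=10.0, n_poses=100):
    """Generate illustrative ventriculoscope tip poses for an RCM trajectory:
    the shaft is constrained to pass through the burr hole (pivot) while the
    tip traces a circle of the given radius at the given depth below the pivot.
    Returns a list of 4x4 tip poses with the z-axis along the scope shaft."""
    pivot = np.asarray(pivot, dtype=float)
    poses = []
    for theta in np.linspace(0.0, 2.0 * np.pi, n_poses, endpoint=False):
        tip = pivot + np.array([radius * np.cos(theta), radius * np.sin(theta), -depth])
        z = (tip - pivot) / np.linalg.norm(tip - pivot)     # shaft direction (pivot -> tip)
        x = np.cross([0.0, 0.0, 1.0], z)
        x = x / np.linalg.norm(x)
        y = np.cross(z, x)
        T = np.eye(4)
        T[:3, :3] = np.column_stack([x, y, z])
        T[:3, 3] = tip
        poses.append(T)
    return poses
```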
III. Results
A. Experiment #1: Sampling Density and Static-Arm vs. RAV
Factors governing the sampling density for accurate point-cloud reconstruction are presented in Fig. 4. The number of reconstructed 3D points was observed to increase sharply with increasing number of video images acquired, reaching a saturation limit at NI ≈ 100, beyond which the maximal number of image features had been extracted from the video scene. The result can be expected to be somewhat scene-dependent, and a similar trend and saturation value was confirmed in the ventricle phantom, below. The runtime increased sharply with the number of images.
Fig. 4.

Effect of the number of video images used for point-cloud reconstruction on the algorithm runtime (left y-axis) and the number of 3D points in the resulting sparse reconstruction (right y-axis). Trend lines are quadratic (left) and exponential (right) fits.
To maintain a reasonably low runtime for clinical feasibility while ensuring a sufficient degree of sampling for point-cloud reconstruction, NI = 100 images was selected for Experiment #2. As shown in Fig. 4, this corresponded to a runtime of ~3.6 min for 100 images, which in turn represents ~2.1 seconds for each incremental image captured (update rate of ~0.5 fps). Further improvement in runtime towards clinical feasibility was achieved using the Robot-Assisted arrangement, as described below.
Figure 5 summarizes the performance of the Static-Arm and Robot-Assisted arrangements in terms of PE, TRE, and algorithm runtime. The Static-Arm arrangement demonstrated PE of 0.29 mm (0.63 mm), and the Robot-Assisted arrangement achieved PE of 0.22 mm (0.43 mm), p < 0.001, both reported as RMSE (95th percentile). That both methods achieved end-to-end errors < 1 mm suggests that each reached the same global minimum in point-cloud reconstruction. The reconstructed point cloud (for the RAV) is shown in Fig. 5b overlaid on the CBCT image of the phantom (S), displaying the backprojected 3D feature points in blue or red. The reconstruction is seen to be denser in the central region than the periphery, owing to a higher degree of image overlap among the NI images sampled from the planar circular trajectory. The colorwash overlay of PE on the phantom surface shows a uniformly small (< 0.2 mm) error within the hemispherical region and a bias (ranging from 0 to −2 mm) in the flat region peripheral to the densely sampled region, primarily due to extrapolation at the edges, where the density of reconstructed points was sparse or null.
Fig. 5.

Geometric accuracy of point-cloud reconstruction using the Static-Arm and Robot-Assisted arrangements for ventriculoscopy. (a) Projected error (PE) for each showing that each can achieve an unbiased estimate of the 3D surface. Box-plots shown within the distributions show the median (circle), 1st and 3rd quartile (black), and 5th and 95th percentiles (whiskers). (b) Colorwash overlay of PE for the Robot-Assisted arrangement projected on the phantom surface along with the reconstructed point cloud (blue and red points), showing no particular directional bias within the central region of interest (and bias outside the hemisphere due to extrapolation in regions of low sampling density). (c) TRE for the Static-Arm and Robot-Assisted arrangements, showing each to perform with median accuracy of ~0.7 mm. (d) The endoscopic scene augmented with registered targets (yellow spheres) and a depth cue given by the brightness of a green-scale overlay. (e) Runtime measured for the Static-Arm and Robot-Assisted implementations, showing a strong reduction in runtime for the latter.
As shown in Fig. 5c, the TRE was measured to be 0.79 mm (1.21 mm) for the Static-Arm and 0.83 mm (1.57 mm) for the RAV arrangement, with no statistically significant difference observed (p > 0.1). Visual overlay of target points (for the RAV, augmented as yellow spheres in Fig. 5d) confirms a high degree of reconstruction accuracy throughout the visual field. Also illustrated in Fig. 5d is a depth cue computed from the 3D point-cloud reconstruction indicated by the intensity (brightness) of the green overlay, with regions farther from the endoscope tip overlaid with darker intensity. Figure 5e shows the runtimes for the Static-Arm and RAV arrangements. For the circular plane trajectory, the point-cloud reconstruction runtime was 4.56 min for the Static-Arm and 2.62 min for the RAV. In Fig. 5e, a distribution of runtimes is shown for a variety of trajectories, including the Linear, Arc, and RCM trajectories detailed below. Overall, the Static-Arm required a median runtime of 1.60 ± 1.03 min, compared to 0.71 ± 0.16 min for the RAV (reported as median ± IQR). This corresponds to a 2X speedup for the RAV gained by knowledge of the extrinsics, T_C^W, and an update rate of 2.34 fps per incremental image captured (NI = 100 images). For both arrangements, the runtime was dominated by the feature matching and subsequent bundle adjustment steps, consuming 56% and 41% of the total runtime, respectively. Feature extraction occupied < 3% of the runtime, owing to GPU implementation of SIFT feature detection. The RAV arrangement shows promise in providing similar end-to-end reconstruction accuracy with a significantly faster runtime and was therefore implemented for Experiment #2 below.
B. Experiment #2: Translation to Anthropomorphic Phantom
As summarized in Fig. 6, reconstruction accuracy was evaluated for three endoscope trajectories in the anthropomorphic ventricle phantom in terms of PE and TRE. The absolute PE was < 1 mm for all trajectories tested, noting that the residual ICP registration error was 0.51 mm. The PE was 0.38 mm (0.78 mm) for the Linear trajectory, 0.52 mm (1.15 mm) for the Arc, and 0.57 mm (1.11 mm) for the RCM trajectory. The PE for the RCM was significantly different from that of the Linear or Arc (p < 0.001), owing to the larger spatial coverage and sparser point-cloud reconstruction of the RCM in the lateral ventricles. As illustrated in Fig. 6b (for the Linear trajectory), the PE is unbiased throughout the main region of interest about the FoM and base of the 3V. Features overlaid on the surface are seen to correspond to the thalamostriate vein along with its branches.
Fig. 6.

Robot-assisted ventriculoscopy in an anthropomorphic ventricle phantom. (a) Projected error for the Linear, Arc, and RCM trajectories. Box-plots within the distributions show the median (circle), 1st and 3rd quartile (black), and 5th and 95th percentiles (whiskers). (b) Overlay of PE for the Linear trajectory on the ventricle surface, showing no particular bias within the region of interest of the FoM and 3V. (c) TRE for the Linear, Arc, and RCM trajectories. (d) Augmented overlay of the neuroendoscopic scene with target BBs (yellow), green-scale depth cue, and simulated surgical targets (Target1 and Target2 in blue and pink). A video version of (d) is available as supplemental material.
Figure 6c shows the TRE for the three RAV trajectories: 1.63 mm (3.15 mm) for Linear, 1.36 mm (2.3 mm) for Arc, and 0.8 mm (1.4 mm) for RCM. The RCM performed significantly better (p < 0.005) than the Linear and Arc, which exhibited a broader distribution of TRE due to extrapolation in the periphery for these more narrowly focused motion paths. Fig. 6d shows an example endoscope view on approach to the FoM with a number of noteworthy augmentations. A video depicting the endoscopic scene of Fig. 6d is available as supplemental material. The target positions are overlaid in yellow, showing accurate correspondence with the BBs embedded in the ventricle wall (dark green). The distance between the camera and ventricle surface is displayed as a depth cue given by the brightness of the green-scale overlay. Finally, two simulated targets (denoted Target1 and Target2) exterior to the ventricles are augmented in the video scene in translucent cyan (Target1 exterior to the LV) and opaque magenta (Target2 outside the 3V, partially occluded in this image by the anterior aspect of the FoM).
IV. Discussion
Experiment #1 evaluated the image sampling density required for accurate point-cloud reconstruction as a function of the number of video images ingested. Reconstructions were successful (low PE) even for a low number of images (NI < 50), but did not provide a sufficient number of points for reliable registration to the CBCT image and lacked sufficient structure for annotation of BBs for TRE evaluation. The number of 3D points (NX) in the resulting sparse point-cloud reconstruction increased up to a saturation of ≈ 1.2 × 104 points, followed by minimal further improvement due to the maximum number of SIFT features extracted from the phantom. The increased number of images and feature points corresponds to a rapid increase in reconstruction runtime, however – for example, a runtime of ~15 min (≈ 4.5 s per image) for NI = 200 images, due to exhaustive matching and reconstruction across all captured images. Considering the tradeoff between runtime and reconstruction quality, a nominal value of NI = 100 images was taken as a reasonable operating point.
The experiment further evaluated the Static-Arm and Robot-Assisted arrangements in terms of reconstruction quality and registration accuracy. For both arrangements, the solution was able to reconstruct a 3D point cloud that closely matched the true CBCT surface, showing that either scenario can provide sufficient sampling to reach the same global minimum in reconstruction. The Robot-Assisted arrangement performed as well as the Static-Arm in terms of reconstruction (PE and TRE < 1 mm), while providing a stable, convenient platform for the ventriculoscope and a significant reduction in runtime, warranting its use in subsequent studies.
Experiment #2 evaluated system performance in translation to a semi-realistic anthropomorphic ventricle phantom imaged according to three clinically relevant trajectories. Overall, the RAV system was relatively insensitive to the choice of trajectory. The RCM trajectory yielded the most feature-rich reconstruction of the LV, resulting in a more uniform degree of reconstruction accuracy throughout the 3D point cloud. The Linear and Arc trajectories performed as well as the RCM within a particular region of interest (e.g., the FoM), but exhibited a broader distribution in registration error, since these trajectories were more narrowly focused and yielded a sparser collection of features at the periphery. Although the RCM trajectory presented a lower TRE, the Linear trajectory performed well within the region of interest (and better than might be expected, given information theoretic degeneracies of sampling along a single line). The Linear trajectory also presents a simple, convenient workflow for RAV, since it allows reconstruction from images obtained during the usual insertion of the ventriculoscope through LV and into the FoM (without additional maneuvers required for the Arc or RCM trajectories). This could facilitate a more practical clinical workflow in guiding neuroendoscopic access to deep-brain targets.
This study is not without its limitations. First is the limitation to phantoms, and even the anthropomorphic ventricle phantom (while qualitatively similar to the natural scene in neuroendoscopy) may not reflect realistic texture and specular characteristics of a true ventricular wall. Moreover, the phantoms did not include CSF, and while CSF is relatively clear and is not in itself anticipated to affect the accuracy of point-cloud reconstruction, floating debris within the CSF could present spurious feature detections. Finally, the phantoms were not deformable; rather, the current studies focused on aspects of 3D point-cloud reconstruction (with rigid ICP registration to the CBCT image). In practice, the 3D point-cloud reconstruction may require deformable registration to the CBCT image – a point to be investigated in future work that aims to translate the method to clinical neuroendoscopic images.
V. Conclusion
The first implementation of the RAV system as a platform for development of requisite 3D reconstruction and registration algorithms was presented, along with the design of geometric and anthropomorphic phantoms for calibration and system validation respectively. Phantom studies were performed to identify the image sampling requirements for accurate point-cloud reconstruction, assessed in terms of registration accuracy and algorithm runtime. Further studies conducted in an anthropomorphic ventricle phantom with clinically relevant trajectories demonstrated < 1.5 mm end-to-end system accuracy – even with a simple, practical Linear trajectory of the endoscope traversing the LV to the FoM. Runtime was sufficient to provide 3D updates at a rate of 2.34 fps. These early results warrant further development and translation of the system to cadaver and clinical studies.
The work demonstrates the first steps toward extending the utility of robotic assistance in ventriculoscopy from that of a scope holder to that of an active guidance system, exploiting the controllable motion and known hand-eye relationship of the robot and endoscope to reconstruct a 3D representation of intraoperative anatomy. The solution could help to address geometric errors associated with soft-tissue deformation occurring in the course of surgical access by providing an up-to-date 3D reconstruction of the ventricles at the time of intervention, thereby enabling more precise and accurate guidance (<1 mm PE, compared to previous methods with up to 10 mm residual error due to brain shift and confounding stereotactic error) to deep brain targets. The system enables augmentation of the endoscopic video scene not only with planning structures (defined in or registered to intraoperative CBCT) that are visible within the ventriculoscopic scene but also depth cues drawn from the 3D reconstruction and even target structures beyond the FOV, deeper within the brain and exterior to the ventricles. This could facilitate interventions such as cystectomy, biopsy, or DBS targeting extra-ventricular features in the deep brain.
Future work will include design and testing on deformable anthropomorphic ventricle phantoms and implementation of deformable registration methods. An active RAV approach will be considered to plan the optimal robot-controlled trajectory for 3D reconstruction and navigation with respect to tissue surface coverage for reconstruction and registration. Future work will also extend the methods to include a deep learning approach to depth estimation and reconstruction methods to further improve navigation accuracy and runtime.
Supplementary Material
Acknowledgments
This work is supported by NIH grant U01-NS-107133 and Biomedical Research Partnership (BRP) with Medtronic, Littleton, MA.
Contributor Information
Prasad Vagdargi, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA.
Ali Uneri, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
Craig K. Jones, Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD USA
Pengwei Wu, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
Runze Han, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
Mark G. Luciano, Department of Neurosurgery, Johns Hopkins Medicine, Baltimore, MD, USA
William S. Anderson, Department of Neurosurgery, Johns Hopkins Medicine, Baltimore, MD, USA
Patrick A. Helm, Medtronic, Littleton MA, USA
Gregory D. Hager, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA.
Jeffrey H. Siewerdsen, Department of Biomedical Engineering and Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
References
- [1].Apetauerova D, Ryan RK, Ro SI, Arle J, Shils J, Papavassiliou E, and Tarsy D, “End of day dyskinesia in advanced Parkinson’s disease can be eliminated by bilateral subthalamic nucleus or globus pallidus deep brain stimulation,” Mov. Disord, vol. 21, no. 8, pp. 1277–1279, 2006. [DOI] [PubMed] [Google Scholar]
- [2].Starr PA, Martin AJ, Ostrem JL, Talke P, Levesque N, and Larson PS, “Subthalamic nucleus deep brain stimulator placement using high-field interventional magnetic resonance imaging and a skull-mounted aiming device: Technique and application accuracy - Clinical article,” J. Neurosurg, vol. 112, no. 3, pp. 479–490, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Sinha S, McGovern RA, and Sheth SA, “Deep brain stimulation for severe autism: from pathophysiology to procedure,” Neurosurg. Focus, vol. 38, no. 6, p. E3, 2015. [DOI] [PubMed] [Google Scholar]
- [4].Park HR, Lee JM, Ehm G, Yang HJ, Song IH, Lim YH, Kim MR, Kim KR, Lee WW, Kim YE, Hwang JH, Shin CW, Park H, Kim JW, Kim HJ, Kim C, Kim DG, Jeon BS, and Paek SH, “Long-term clinical outcome of internal globus pallidus deep brain stimulation for dystonia,” PLoS One, vol. 11, no. 1, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Temel Y and Lim LW, “Neurosurgical Treatments of Depression,” Current Topics in Behavioral Neurosciences, vol. 14. pp. 327–339, 2013. [DOI] [PubMed] [Google Scholar]
- [6].Laxton AW, Tang-Wai DF, McAndrews MP, Zumsteg D, Wennberg R, Keren R, Wherrett J, Naglie G, Hamani C, Smith GS, and Lozano AM, “A Phase I trial of deep brain stimulation of memory circuits in Alzheimer disease,” Annals of Neurology, vol. 68, no. 4, pp. 521–534, 2010. [DOI] [PubMed] [Google Scholar]
- [7].Fraint A and Pal G, “Deep brain stimulation in Tourette’s syndrome,” Frontiers in Neurology, vol. 6, Aug. 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Halpern CH, Wolf JA, Bale TL, Stunkard AJ, Danish SF, Grossman M, Jaggi JL, Grady MS, and Baltuch GH, “Deep brain stimulation in the treatment of obesity,” J. Neurosurg, vol. 109, no. 4, pp. 625–634, 2008. [DOI] [PubMed] [Google Scholar]
- [9].Wasi MSI, Sharif YS, and Gulzar F, “Implication of image guidance in endoscopic third ventriculostomy: Technical note,” Surg. Neurol. Int, vol. 11, no. 87, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Richardson RM, Ostrem JL, and Starr PA, “Surgical repositioning of misplaced subthalamic electrodes in Parkinson’s disease: Location of effective and ineffective leads,” Stereotact. Funct. Neurosurg, 2009. [DOI] [PubMed] [Google Scholar]
- [11].Ivan ME, Yarlagadda J, Saxena AP, Martin AJ, Starr PA, Sootsman WK, and Larson PS, “Brain shift during bur hole-based procedures using interventional MRI,” J. Neurosurg, vol. 121, no. 1, pp. 149–160, 2014. [DOI] [PubMed] [Google Scholar]
- [12].Khan MF, Mewes K, Gross RE, and Skrinjar O, “Assessment of brain shift related to deep brain stimulation surgery.,” Stereotact. Funct. Neurosurg, vol. 86, no. 1, pp. 44–53, 2008. [DOI] [PubMed] [Google Scholar]
- [13].Hodel J, Besson P, Rahmouni A, Petit E, Lebret A, Grandjacques B, Outteryck O, Benadjaoud MA, Maraval A, Luciani A, Pruvo JP, Decq P, and Leclerc X, “3D mapping of cerebrospinal fluid local volume changes in patients with hydrocephalus treated by surgery: Preliminary study,” Eur. Radiol, vol. 24, no. 1, pp. 136–142, 2014. [DOI] [PubMed] [Google Scholar]
- [14].Palys V and Holloway KL, “Frameless Functional Stereotactic Approaches,” Prog. Neurol. Surg, vol. 33, pp. 168–186, 2018. [DOI] [PubMed] [Google Scholar]
- [15].Bjartmarz H and Rehncrona S, “Comparison of accuracy and precision between frame-based and frameless stereotactic navigation for deep brain stimulation electrode implantation,” Stereotact. Funct. Neurosurg, vol. 85, no. 5, pp. 235–242, 2007. [DOI] [PubMed] [Google Scholar]
- [16].Jolesz FA, “Intraoperative imaging in neurosurgery: where will the future take us?,” Acta Neurochir. Suppl, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].King E, Daly MJ, Chan H, Bachar G, Dixon BJ, Siewerdsen JH, and Irish JC, “Intraoperative cone-beam CT for head and neck surgery: Feasibility of clinical implementation using a prototype mobile C-arm,” Head Neck, 2013. [DOI] [PubMed] [Google Scholar]
- [18].Ganau M, Ligarotti GK, and Apostolopoulos V, “Real-time intraoperative ultrasound in brain surgery: Neuronavigation and use of contrastenhanced image fusion,” Quantitative Imaging in Medicine and Surgery, vol. 9, no. 3. pp. 350–358, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [19].Kim S, Kang HJ, Cheng A, Lediju Bell MA, Boctor E, and Kazanzides P, “Photoacoustic image guidance for robot-assisted skull base surgery,” in Proceedings - IEEE International Conference on Robotics and Automation, 2015, vol. 2015–June, no. June, pp. 592–597. [Google Scholar]
- [20].Olds KC, Chalasani P, Pacheco-Lopez P, Iordachita I, Akst LM, and Taylor RH, “Preliminary evaluation of a new microsurgical robotic system for head and neck surgery,” in IEEE International Conference on Intelligent Robots and Systems, 2014, pp. 1276–1281. [Google Scholar]
- [21].Faria C, Erlhagen W, Rito M, De Momi E, Ferrigno G, and Bicho E, “Review of Robotic Technology for Stereotactic Neurosurgery.,” IEEE Rev. Biomed. Eng, vol. 8, pp. 125–37, 2015. [DOI] [PubMed] [Google Scholar]
- [22].Zimmermann M, Krishnan R, Raabe A, and Seifert V, “Robot-assisted navigated endoscopic ventriculostomy: implementation of a new technology and first clinical results.,” Acta Neurochir. (Wien), vol. 146, no. 7, pp. 697–704, Jul. 2004. [DOI] [PubMed] [Google Scholar]
- [23].Liu L, Mariani SG, De Schlichting E, Grand S, Lefranc M, Seigneuret E, and Chabardés S, “Frameless ROSA®Robot-Assisted Lead Implantation for Deep Brain Stimulation: Technique and Accuracy,” Oper. Neurosurg, vol. 19, no. 1, pp. 57–64, 2020. [DOI] [PubMed] [Google Scholar]
- [24].Maier-Hein L, Speidel S, Stenau E, Chen ECS, and Ma B, “Mixed and Augmented Reality in Medicine,” in Mixed and Augmented Reality in Medicine, 2018. [Google Scholar]
- [25].Mirota DJ, Wang H, Taylor RH, Ishii M, Gallia GL, and Hager GD, “A System for Video-Based Navigation for Endoscopic Endonasal Skull Base Surgery,” IEEE Trans. Med. Imaging, vol. 31, no. 4, pp. 963–976, Apr. 2012. [DOI] [PubMed] [Google Scholar]
- [26].Mirota DJ, Uneri A, Schafer S, Nithiananthan S, Reh DD, Ishii M, Gallia GL, Taylor RH, Hager GD, and Siewerdsen JH, “Evaluation of a System for High-Accuracy 3D Image-Based Registration of Endoscopic Video to C-Arm Cone-Beam CT for Image-Guided Skull Base Surgery,” IEEE Trans. Med. Imaging, vol. 32, no. 7, pp. 1215–1226, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [27].Leonard S, Sinha A, Reiter A, Ishii M, Gallia GL, Taylor RH, and Hager GD, “Evaluation and Stability Analysis of Video-Based Navigation System for Functional Endoscopic Sinus Surgery on In Vivo Clinical Data,” IEEE Trans. Med. Imaging, vol. 37, no. 10, pp. 2185–2195, Oct. 2018. [DOI] [PubMed] [Google Scholar]
- [28].Hoshide R, Calayag M, Meltzer H, Levy ML, and Gonda D, “Robot-assisted endoscopic third ventriculostomy: Institutional experience in 9 patients,” J. Neurosurg. Pediatr, vol. 20, no. 2, 2017. [DOI] [PubMed] [Google Scholar]
- [29].Vagdargi P, Uneri A, Jones C, Wu P, Han R, Luciano M, Anderson W, Hager G, and Siewerdsen JH, “Robot-assisted ventriculoscopic 3D reconstruction for guidance of deep-brain stimulation surgery,” in Proc. SPIE Medical Imaging 2021: Image-Guided Procedures, Robotic Interventions, and Modeling, 2021, vol. 11598, no. 23, p. 6. [Google Scholar]
- [30].Schönberger JL and Frahm JM, “Structure-from-Motion Revisited,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, vol. 2016-Decem, pp. 4104–4113. [Google Scholar]
- [31].Schönberger JL, Zheng E, Frahm JM, and Pollefeys M, “Pixelwise view selection for unstructured multi-view stereo,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016, vol. 9907 LNCS, pp. 501–518. [Google Scholar]
- [32].Lowe DG, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis, vol. 60, no. 2, pp. 91–110, 2004. [Google Scholar]
- [33].Dalal N and Triggs B, “Histograms of oriented gradients for human detection,” in Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, 2005, vol. I, pp. 886–893. [Google Scholar]
- [34].Mountney P, Lo B, Thiemjarus S, Stoyanov D, and Zhong-Yang G, “A probabilistic framework for tracking deformable soft tissue in minimally invasive surgery,” in Lecture Notes in Computer Science, 2007, vol. 4792 LNCS, no. PART 2, pp. 34–41. [DOI] [PubMed] [Google Scholar]
- [35].Widya AR, Monno Y, Okutomi M, Suzuki S, Gotoda T, and Miki K, “Whole Stomach 3D reconstruction and frame localization from monocular endoscope video,” IEEE J. Transl. Eng. Heal. Med, vol. 7, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [36].Shen Y, Guturu PP, and Buckles BP, “Wireless capsule endoscopy video segmentation using an unsupervised learning approach based on probabilistic latent semantic analysis with scale invariant features,” IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 1. pp. 98–105, 2012. [DOI] [PubMed] [Google Scholar]
- [37].Yuan Y, Li B, and Meng MQH, “Improved Bag of Feature for Automatic Polyp Detection in Wireless Capsule Endoscopy Images,” IEEE Trans. Autom. Sci. Eng, vol. 13, no. 2, pp. 529–535, 2016. [Google Scholar]
- [38].Yu G and Morel J-M, “ASIFT: An Algorithm for Fully Affine Invariant Comparison,” Image Process. Line, vol. 1, pp. 11–38, 2011. [Google Scholar]
- [39].Dong J and Soatto S, “Domain-size pooling in local descriptors: DSP-SIFT,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2015, vol. 07–12–June, pp. 5097–5106. [Google Scholar]
- [40].Lourenço AM, “Keypoint Detection, Matching, and Tracking in Images with Non-Linear Distortion: Applications in Medical Endoscopy and Panoramic Vision,” Ph.D. dissertation, Universidade de Coimbra, 2014. [Google Scholar]
- [41].Schonberger J, “Robust Methods for Accurate and Efficient 3D Modeling from Unstructured Imagery,” Ph.D. dissertation, Computer Vision and Geometry Group, ETH Zurich, 2018. [Google Scholar]
- [42].Girardeau-Montaut D, “CloudCompare: 3D point cloud and mesh processing software,” 2015. [Online]. Available: https://www.cloudcompare.org/.
- [43].Chetverikov D, Svirko D, Stepanov D, and Krsek P, “The trimmed iterative closest point algorithm,” in Proceedings - International Conference on Pattern Recognition, 2002, vol. 16, no. 3, pp. 545–548. [Google Scholar]
- [44].Moré JJ, “The Levenberg-Marquardt algorithm: Implementation and theory,” Springer, Berlin, Heidelberg, 1978, pp. 105–116. [Google Scholar]
- [45].Chen Y and Medioni G, “Object modeling by registration of multiple range images,” in Proceedings -IEEE International Conference on Robotics and Automation, 1991, vol. 3, pp. 2724–2729. [Google Scholar]
- [46].Uneri A, Schafer S, Mirota DJJ, Nithiananthan S, Otake Y, Taylor RHH, Siewerdsen JHH, Gallia GL, Khanna AJ, Lee S, Reh DD, and Siewerdsen JHH, “TREK: An integrated system architecture for intraoperative cone-beam CT-guided surgery,” Int. J. Comput. Assist. Radiol. Surg, vol. 7, no. 1, pp. 159–173, Jan. 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [47].Sethian JA, “A fast marching level set method for monotonically advancing fronts,” Proc. Natl. Acad. Sci. U. S. A, vol. 93, no. 4, pp. 1591–1595, 1996. [DOI] [PMC free article] [PubMed] [Google Scholar]