Abstract
Augmentation of endoscopic video with preoperative or intraoperative image data [e.g., planning data and/or anatomical segmentations defined in computed tomography (CT) and magnetic resonance (MR)] can improve navigation, spatial orientation, confidence, and tissue resection in skull base surgery, especially with respect to critical neurovascular structures that may be difficult to visualize in the video scene. This paper presents the engineering and evaluation of a video augmentation system for endoscopic skull base surgery translated to use in a clinical study. Extension of previous research yielded a practical system with a modular design that can be applied to other endoscopic surgeries, including orthopedic, abdominal, and thoracic procedures. A clinical pilot study is underway to assess feasibility and benefit to surgical performance by overlaying CT or MR planning data in real-time, high-definition endoscopic video. Preoperative planning included segmentation of the carotid arteries, optic nerves, and surgical target volume (e.g., tumor). An automated camera calibration process was developed that demonstrates a mean re-projection accuracy of (0.7±0.3) pixels and a mean target registration error of (2.3±1.5) mm. An IRB-approved clinical study involving fifteen patients undergoing skull base tumor surgery is underway in which each surgery includes the experimental video-CT system deployed in parallel to the standard-of-care (un-augmented) video display. Questionnaires distributed to one neurosurgeon and two otolaryngologists are used to assess primary outcome measures regarding the benefit to surgical confidence in localizing critical structures and targets by means of video overlay during surgical approach, resection, and reconstruction.
Keywords: video-CT augmentation, image-guided surgery, skull-base surgery, surgical navigation, video endoscopy
1. INTRODUCTION
Endoscopic skull base surgery is an emerging minimally invasive approach used to address a broad spectrum of skull base lesions and necessitates precise visualization to ensure complete resection within the complex anatomy of the endonasal space1. Such skull base pathologies are in close proximity to critical neurovascular structures, and encroachment can have significant consequences (e.g., neurological injury and death). Ongoing research to improve skull base surgery guidance includes virtual endoscopy,2 image-overlay,3,4 and intraoperative cone-beam CT5,6. Improved visualization using 3D endoscopes7 also offers a novel technique to improve patient safety and reduce clinical learning curves. The system described below extends such work in a novel modular architecture for video augmentation that automates the camera calibration process and can be adapted to other endoscopic or laparoscopic procedures. The system includes a streamlined calibration process consistent with clinical workflow and provides registration of the video scene with preoperative (or intraoperative) image data - e.g., structures defined in CT or MR. A clinical feasibility study is underway with the video augmentation system deployed in parallel to the existing clinical standard-of-care. This work translates video-CT beyond the laboratory and into a form suitable for clinical studies, demonstrates the utility of the system in clinical surgical practice, and assesses the benefit to surgical performance.
2. SYSTEM ARCHITECTURE
The video-CT system extends the TREK8 software architecture for image-guided surgery to a clinically practical form. As described previously, TREK binds open-source libraries for image visualization and analysis from 3D Slicer9 (na-mic kit, Brigham & Women’s Hospital, Cambridge MA) and real-time tracking and registration from cisst libraries10 (ERC, Johns Hopkins University, Baltimore MD). These modular components of the framework are illustrated in Figure 1. The front-end graphical user interface loads preoperative CT (or MR) and corresponding planning/segmentation data after processing with ITK-Snap11. We extended the cisst package (specifically, the computer vision, device interface, and tracking functionalities) to include an automatic camera calibration and hand-eye calibration described below. Interface to a clinical tracking system (Stealthstation, Medtronic Inc., Minneapolis MN) provided infrared tracking of a rigid body marker attached to the endoscope and a reference marker attached to the stereotactic head frame. All studies involved a high-definition (HD) video endoscope (H3-Z Camera, Karl Storz Inc., Tuttlingen Germany).
Figure 1.

UML component diagram detailing the classes supporting video augmentation and emphasizing the modular design of the system architecture. The system is an extension of the TREK architecture for image-guided surgery, binding cisst/saw libraries for real-time tracking and registration with 3D Slicer libraries for front-end visualization. The specific embodiment described in this paper was intended to streamline calibration processes in a manner suitable to clinical use by a trained OR technologist without disruption of OR workflow.
3. SURGICAL PLANNING
Excision of skull base tumors is a challenge even for experienced surgeons for numerous reasons, including the proximity of surgical targets to critical anatomy, such as the carotid arteries and cranial nerves. Preoperative diagnostic imaging, including CT, CT angiography (CTA), and MRI, provides a wealth of 3D anatomical information on these areas of interest. However, a conventional intraoperative guidance system displays such 3D images separately from, and unregistered to, the endoscopic video. The surgical approach (trajectories), as well as segmentations of pertinent anatomical structures, the surgical target, and margins, can be defined within such preoperative image data and are referred to simply as “planning data.” The video augmentation workflow described below included an offline, preoperative process to define such planning data, specifically, to segment critical structures (i.e., the carotid arteries and optic nerves) and the surgical target (i.e., the tumor volume). Using ITK-Snap (NLM Insight Toolkit, University of Pennsylvania, Philadelphia PA), these structures were defined in preoperative CT/CTA and/or MR using semi-automatic region growing and thresholding complemented by manual refinement and final review by the operating surgeon.
4. VIDEO-CT REGISTRATION
4.1. CAMERA CALIBRATION
Translating the system from a research platform to a clinically useful form necessitated a fast process for calibration and registration of the endoscopic camera. Previous work employed the DLR toolbox12 (DLR CalLab and CalDe, German Aerospace Center, Wessling, Germany), requiring ~40 min for camera calibration with standard-definition video. The DLR process is lengthy due to manual identification of the origin and orientation for each image, comparable to that of the MATLAB (v2011b, The Mathworks, Natick MA) camera calibration toolbox (Camera Calibration Toolbox for Matlab, Caltech, Pasadena, California), in which the user identifies, in order, the four corners of the calibration grid. Although faster (~minutes for an experienced user), MATLAB was found to be less accurate in computing barrel distortion correction where the calibration grid is not always entirely in the field of view. We developed an automated camera calibration process by extending functionalities provided in OpenCV13 (v2.1, Intel research/Willow Garage Inc., Menlo Park CA). The manual identification of the origin and orientation was eliminated using chrominance thresholds to locate red, green, and blue markers in a custom calibration grid. Eigenvalue-based features were correlated with iterative homography and used to solve for the intrinsic and extrinsic camera parameters.
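The marker-localization step can be illustrated with a minimal sketch in plain Python. It is not the OpenCV-based implementation used in TREK: the normalized-chromaticity test, the threshold of 0.6, the toy image, and the choice of the red marker as origin with the green marker defining the first grid axis are all assumptions made for illustration.

```python
def chromaticity(pixel):
    """Normalized (r, g, b) fractions, so overall brightness cancels out."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (r / total, g / total, b / total)


def locate_marker(image, channel, threshold=0.6):
    """Centroid (row, col) of pixels whose `channel` fraction exceeds `threshold`.

    Returns None when no pixel passes the threshold.
    """
    row_sum, col_sum, count = 0.0, 0.0, 0
    for i, row in enumerate(image):
        for j, pixel in enumerate(row):
            if chromaticity(pixel)[channel] > threshold:
                row_sum += i
                col_sum += j
                count += 1
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)


# Toy 4x6 "checkerboard" with colored markers (hypothetical pixel values).
W, K = (200, 200, 200), (20, 20, 20)                   # white and black squares
R, G, B = (220, 30, 30), (30, 200, 30), (30, 30, 210)  # colored markers
image = [
    [R, R, W, K, W, G],
    [R, R, K, W, K, G],
    [W, K, W, K, W, K],
    [B, W, K, W, K, W],
]

red_marker = locate_marker(image, 0)
green_marker = locate_marker(image, 1)
blue_marker = locate_marker(image, 2)

# Assumed convention: red marks the grid origin, red-to-green gives the first axis.
x_axis = (green_marker[0] - red_marker[0], green_marker[1] - red_marker[1])
print(red_marker, green_marker, blue_marker, x_axis)
```

Because the test uses color fractions rather than absolute intensities, the white and black squares (each with equal fractions of one third) are rejected regardless of illumination level, which is the property that makes a chrominance threshold attractive for this step.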
4.2. HAND-EYE CALIBRATION
Registration of endoscopic video with preoperative 3D image data (referred to as “video-CT registration” throughout) was derived by the following transformations:
| (1) | 
Markers on the stereotactic frame or cranium in CT provide the ${}^{\mathrm{CT}}T_{\mathrm{patient}}$ transformation. As part of the standard-of-care registration step, either the same fiducial markers or surface point sampling was used to localize points with a tracked pointer using a commercially available navigation system (Stealthstation) to determine the transformation from the system coordinates, in Right, Anterior, Superior (RAS), to the reference, i.e., ${}^{\mathrm{reference}}T_{\mathrm{RAS}}$. An additional transformation from RAS to patient, as recorded by the image acquisition system, resolves the transformation from Stealthstation to TREK, allowing the video-CT pipeline to utilize the exact transformation from the standard-of-care patient registration.
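Chaining such transformations amounts to multiplying 4×4 homogeneous matrices. The sketch below shows the mechanics in plain Python under stated assumptions: the frame ordering, the two tracker poses (`reference_T_tracker`, `endoscope_T_tracker`), and all numeric values are hypothetical and chosen only so that the result is easy to check by hand. A name `a_T_b` maps coordinates in frame `b` to frame `a`.

```python
import math


def rigid(angle_z_deg=0.0, t=(0.0, 0.0, 0.0)):
    """4x4 rigid transform: rotation about z by `angle_z_deg`, then translation `t`."""
    c = math.cos(math.radians(angle_z_deg))
    s = math.sin(math.radians(angle_z_deg))
    return [[c, -s, 0.0, t[0]],
            [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(T):
    """Inverse of a rigid transform: [R t]^-1 = [R^T, -R^T t]."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def compose(*transforms):
    """Multiply transforms left to right; the rightmost is applied to a point first."""
    result = rigid()
    for T in transforms:
        result = matmul(result, T)
    return result


def apply(T, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    return tuple(sum(T[i][k] * p[k] for k in range(3)) + T[i][3] for i in range(3))


# Hypothetical values for each link of the chain (not measured data).
CT_T_patient = rigid(t=(1.0, 0.0, 0.0))
patient_T_RAS = rigid(angle_z_deg=90.0)
reference_T_RAS = rigid(t=(0.0, 2.0, 0.0))
reference_T_tracker = rigid(t=(0.0, 0.0, 3.0))
endoscope_T_tracker = rigid(t=(0.0, 0.0, 5.0))
endoscope_T_camera = rigid(t=(0.5, 0.0, 0.0))

# camera -> endoscope -> tracker -> reference -> RAS -> patient -> CT
CT_T_camera = compose(
    CT_T_patient,
    patient_T_RAS,
    invert_rigid(reference_T_RAS),
    reference_T_tracker,
    invert_rigid(endoscope_T_tracker),
    endoscope_T_camera,
)

camera_center_in_CT = apply(CT_T_camera, (0.0, 0.0, 0.0))
print(camera_center_in_CT)
```

Writing each matrix as `a_T_b` makes the chain self-checking: adjacent frame names must match, and any link recorded in the opposite direction is passed through `invert_rigid` first.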
The final transformation, ${}^{\mathrm{endoscope}}T_{\mathrm{camera}}$, is the unknown relationship of the tracked rigid body to the optical center of the endoscope. The metal washer anchoring the HD camera optical cable allowed us to move the attachment location of the rigid body away from the endoscope, as shown in Fig. 4, to the top of the camera to better accommodate the surgeon’s hand during surgery. Figure 4 also illustrates the relationship of multiple camera poses required to solve for $X$, the transformation from camera to endoscope during registration. In a single camera pose, $A_i$ gives the transformation matrix from the tracker to the rigid body, while $B_i$ is the transformation matrix from the optical center to the calibration grid, derived from camera calibration. One motion (i.e., two poses) yields the conventional hand-eye equation:
$$AX = XB \tag{2}$$
where $A$ and $B$ are the relative motions of the tracked rigid body and the camera, respectively, between the two poses, and the homogeneous transformation matrix has the form:
$$T = \begin{bmatrix} R & \vec{t} \\ \vec{0}^{T} & 1 \end{bmatrix}$$
with $R$ a 3×3 rotation matrix and $\vec{t}$ a translation vector.
The hand-eye calibration for the video-CT system applied compact dual quaternions, the algebraic counterpart of screws, as proposed by Daniilidis.14 A quaternion, $q = (s, \vec{q})$, is a 4-tuple representation of a rotation extending complex numbers to $\mathbb{R}^4$. Dual quaternions, $\check{q} = (\check{s}, \check{\vec{q}})$, extend the quaternion representation with $\check{s}$ as a dual number and $\check{\vec{q}}$ as a dual vector. A line in space with direction $\vec{l}$ through a point $\vec{p}$ can be represented with the six-tuple $(\vec{l}, \vec{m})$, where the line moment, $\vec{m}$, is equal to $\vec{p} \times \vec{l}$. The line can be denoted with four parameters, which together with the rotation angle $\theta$ and the pitch $d$ (the translation along the screw axis) constitute the six degrees of freedom of a rigid transformation. A rigid transformation can therefore be modeled as a rotation with the same angle about a line in space (i.e., the screw axis) not through the origin and a translation along this axis. The direction $\vec{l}$ is parallel to the rotation axis, and the pitch $d$ is the projection of the translation on the rotation axis. Using these relations, Daniilidis proved that the hand-eye transformation is independent of the angle and the pitch of the camera and hand motions, and depends only on the line parameters of the screw axes. With dual quaternions characterizing our transformations, we have:
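$$\check{a} = \check{q}\,\check{b}\,\bar{\check{q}} \tag{3}$$
where $\check{b}$ denotes the extrinsic camera parameters, $\check{a}$ the tracked endoscope, and $\check{q}$ the ${}^{\mathrm{endoscope}}T_{\mathrm{camera}}$ transformation.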
Figure 4.

Optical (infrared) tracking of a rigid body mounted on a sinus endoscope. (a,b) Previous position of infrared markers on the endoscope. (c,d) Placement of the infrared markers at the base of the endoscope allowed better line-of-sight to the tracker and reduced interference with the hand. (e) Photograph showing the tracked endoscope in the surgeon’s hand. The camera-to-endoscope transform is also labeled. (f) Illustration of different camera poses in an automated process for hand-eye calibration.
Hand-eye calibration with dual quaternions provided a fast and efficient simultaneous solution of rotation and translation using singular value decomposition. This method obviated the need for an optimization weight factor, which is typically required by linear/nonlinear optimizations when minimizing a sum of covariance-weighted squared prediction errors.
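The property underlying this formulation, that two motions related by the hand-eye equation share the same rotation angle and pitch, can be checked numerically. The following is a sketch in plain Python rather than the dual-quaternion SVD solver itself; the transform `X`, the camera motion `B`, and all numeric values are hypothetical. Given `X` and `B`, it builds the motion `A = X B X^{-1}` (which satisfies `AX = XB`) and compares the screw parameters of `A` and `B`.

```python
import math


def rotation(axis, angle):
    """Rotation matrix about `axis` by `angle` radians (Rodrigues formula)."""
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)
    c, s, C = math.cos(angle), math.sin(angle), 1.0 - math.cos(angle)
    return [[c + x * x * C, x * y * C - z * s, x * z * C + y * s],
            [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
            [z * x * C - y * s, z * y * C + x * s, c + z * z * C]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def conjugate(Rx, tx, Rb, tb):
    """(Ra, ta) of A = X B X^-1, i.e. the motion A that satisfies AX = XB."""
    Ra = mat_mul(mat_mul(Rx, Rb), transpose(Rx))
    Ra_tx = mat_vec(Ra, tx)
    Rx_tb = mat_vec(Rx, tb)
    ta = [Rx_tb[i] + tx[i] - Ra_tx[i] for i in range(3)]
    return Ra, ta


def screw_angle_and_pitch(R, t):
    """Rotation angle and pitch (translation along the axis); needs 0 < angle < pi."""
    cos_theta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    s = 2.0 * math.sin(theta)
    axis = ((R[2][1] - R[1][2]) / s, (R[0][2] - R[2][0]) / s, (R[1][0] - R[0][1]) / s)
    pitch = sum(axis[i] * t[i] for i in range(3))
    return theta, pitch


# Hypothetical camera motion B and hand-eye transform X.
Rb, tb = rotation((0.0, 0.0, 1.0), 0.7), [1.0, 2.0, 3.0]
Rx, tx = rotation((1.0, 1.0, 0.0), 1.1), [4.0, -2.0, 0.5]

Ra, ta = conjugate(Rx, tx, Rb, tb)
theta_a, pitch_a = screw_angle_and_pitch(Ra, ta)
theta_b, pitch_b = screw_angle_and_pitch(Rb, tb)
print(theta_a, theta_b, pitch_a, pitch_b)
```

Since the angle and pitch agree for any `X`, they carry no information about the hand-eye transform, which is why only the screw-axis line parameters enter the solution.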
Video-CT registration accuracy was measured using a rigid anthropomorphic skull phantom derived from 3D rapid prototyping of segmentations from a cadaver CT scan15. First, the camera calibration process was evaluated with five different sets of calibration images (each set comprising ten 1920×1080 HD images) using the proposed, automated program as well as the DLR and MATLAB toolboxes. A successful calibration was defined as one with a mean L1-norm re-projection error of less than 1 pixel. In the case of a square grid, the MATLAB calibration toolkit failed on all five sets due to the required visibility of the same four corners in all images. The re-projection errors achieved with DLR and TREK were comparable, as shown in Fig. 3(c). Further tuning of the DLR parameters may improve calibration, but the process was extremely time consuming, as each DLR calibration in HD images required ~1 hour. By comparison, the automated TREK camera calibration averaged ~12 seconds to process each set.
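The success criterion can be written out as a short sketch. It assumes that the L1 norm of a re-projection residual is the sum of the absolute horizontal and vertical pixel differences, and it uses an ideal pinhole model without distortion; the intrinsic parameters, grid points, and detected corners are hypothetical values rather than measurements from the study.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def mean_l1_reprojection_error(detected, reprojected):
    """Mean over all corners of |du| + |dv|, in pixels."""
    errors = [abs(u - up) + abs(v - vp)
              for (u, v), (up, vp) in zip(detected, reprojected)]
    return sum(errors) / len(errors)


# Hypothetical intrinsics for a 1920x1080 image.
fx, fy, cx, cy = 1000.0, 1000.0, 960.0, 540.0

# Three grid corners (mm) in the camera frame, spaced 2 mm apart at 50 mm depth.
grid_points = [(0.0, 0.0, 50.0), (2.0, 0.0, 50.0), (0.0, 2.0, 50.0)]
reprojected = [project(p, fx, fy, cx, cy) for p in grid_points]

# Hypothetical corner detections in the image.
detected = [(960.4, 539.8), (1000.3, 540.5), (959.5, 580.2)]

error = mean_l1_reprojection_error(detected, reprojected)
success = error < 1.0  # criterion stated in the text: below 1 pixel
print(round(error, 3), success)
```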
Figure 3.

Calibration grid for automated camera calibration. The checkerboard consists of 18×16 squares (each 2×2 mm²) with red, green, and blue markers for automatic localization of origin and orientation. (a) Original (distorted) view and (b) undistorted view after calibration. (c) Re-projection error measured using the DLR camera calibration and automatic calibration in TREK.
Target registration error (TRE) was evaluated by locating manually segmented target points in the left ethmoid, right ethmoid, and sphenoid sinus of the phantom described above. The distance between a ray computed from the camera center and the CT points in 3D was computed as the TRE. As shown in Fig. 5, the TRE achieved with the TREK-calibrated video-CT system was ~2 mm, consistent with previous results using real-time optical tracker-based measurement of the endoscope pose. This also confirmed that relocating the rigid body farther from the endoscope preserved comparable geometric accuracy.
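The point-to-ray distance used as the TRE can be sketched as follows. The camera center, ray direction, and target coordinates are hypothetical, and clamping the closest point to the ray origin for targets behind the camera is an assumption of this sketch rather than a detail given in the text.

```python
import math


def point_to_ray_distance(origin, direction, point):
    """Shortest distance from `point` to the ray starting at `origin` along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    offset = [p - o for p, o in zip(point, origin)]
    # Parameter of the closest point along the ray; clamped so it stays on the ray.
    s = max(0.0, sum(a * b for a, b in zip(offset, unit)))
    closest = [o + s * u for o, u in zip(origin, unit)]
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(point, closest)))


# Ray from the camera center through a selected pixel, expressed in CT coordinates (mm).
camera_center = (0.0, 0.0, 0.0)
ray_direction = (0.0, 0.0, 1.0)

# Hypothetical CT target 40 mm in front of the camera, 2 mm off the ray.
tre_mm = point_to_ray_distance(camera_center, ray_direction, (1.2, 1.6, 40.0))

# A target behind the camera is measured to the camera center itself.
behind_mm = point_to_ray_distance(camera_center, ray_direction, (3.0, 4.0, -5.0))
print(tre_mm, behind_mm)
```

Measuring to a ray rather than to a reconstructed 3D point means that error along the viewing direction does not contribute, which suits a monocular endoscope that provides no depth for the selected pixel.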
Figure 5.

Target Registration Error comparing the geometric accuracy of video-CT registration with DLR and TREK.
5. PRECLINICAL TESTS: CADAVER STUDIES
Having completed the technical development required to translate the research system to a clinically deployable form, the system workflow and usability were tested in a series of preclinical cadaver studies. These sessions not only focused on feasibility and quality assessment of the system but also provided opportunities to familiarize the surgeon with the video-CT user interface. Care was taken to follow clinical steps and setup as closely as possible to accurately assess the workflow (Fig. 6).
Figure 6.

Video-CT augmentation workflow incorporating steps for the experimental system into the conventional pipeline.
Cadaver head specimens were imaged with standard CT, and critical structures were segmented with ITK-Snap. Similar to the target and critical anatomy anticipated in skull base surgery, mesh segmentations were created for the carotid arteries, optic nerves, and pituitary gland as shown in Fig. 8 in red, blue, and magenta, respectively. Such segmentation was performed offline as a preoperative step by a trained technologist, and the resulting “Surgical Plan” was reviewed and refined if necessary by the operating surgeon. The datasets were loaded onto a Stealthstation and further processed for the RAS to image space vector (IJK) that aligns the CT coordinate system in TREK video-CT to coordinates from the Stealthstation patient registration. Following calibration and registration of the endoscope and other tracked tools, patient registration was conducted as in the standard-of-care with the Stealthstation, completing the setup process for the operation.
Figure 8.

(a)-(c) Three perspectives of video-CT overlay as the endoscope is panned left-to-right and anterior-to-posterior in a preclinical evaluation. The images show overlay of the carotid arteries (red), optic nerves (blue), and pituitary gland (purple) in a trans-sphenoid approach to the skull base.
Figure 7 illustrates three of the views available in the guidance system - namely triplanar views (CT or cone-beam CT) overlaid with planning data along with real-time endoscopic video (without or with planning data overlay). Figure 7(a) shows a sagittal slice of the CT image with planning data superimposed, which is optionally displayed in the video-CT user interface along with other triplanar views. Figure 7(b) shows an endoscopic view in the region of the sphenoid sinus as in the standard-of-care (un-augmented) video, whereas Fig. 7(c) shows the same HD video scene augmented in real-time with overlay of the carotid arteries. Certain parameters of the video-CT system were exposed to the user for customizing the visualization - for example, adjustment of the color palette, lighting, opacity of critical structures, and placement of the virtual camera and focal lengths.
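The effect of the opacity setting can be illustrated with standard alpha compositing. This is a generic sketch, not the rendering path of the video-CT system: the function names, the binary overlay mask, and the pixel values are hypothetical.

```python
def blend_pixel(video_rgb, overlay_rgb, opacity):
    """Alpha-composite one overlay pixel onto one video pixel (opacity in [0, 1])."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must lie in [0, 1]")
    return tuple(round(opacity * o + (1.0 - opacity) * v)
                 for v, o in zip(video_rgb, overlay_rgb))


def blend_frame(video, mask, overlay_rgb, opacity):
    """Blend `overlay_rgb` into `video` wherever `mask` is True; other pixels are kept."""
    return [[blend_pixel(px, overlay_rgb, opacity) if m else px
             for px, m in zip(video_row, mask_row)]
            for video_row, mask_row in zip(video, mask)]


# One-row toy frame: the second pixel is covered by a projected structure shown in red.
video = [[(100, 80, 60), (100, 80, 60)]]
mask = [[False, True]]
augmented = blend_frame(video, mask, overlay_rgb=(255, 0, 0), opacity=0.4)
print(augmented)
```

At an opacity of 0 the frame is identical to the un-augmented video, and at 1 the structure fully occludes the tissue behind it, so intermediate values trade conspicuity of the overlay against visibility of the underlying scene.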
Figure 7.

(a) Example planning data shown in a sagittal CT slice of a cadaver employed in preclinical evaluation. Target structures included the anterior skull base (pink), the inferior clivus (green), the superior clivus (blue), and the surgical target (pituitary, in red). The clinical pilot study involves side-by-side display of (b) conventional (non-augmented) endoscopic video and (c) the experimental system for augmentation and real-time overlay of registered planning data within the video scene. The carotid arteries, which in this case have a narrow inter-carotid distance, are overlaid in red in a trans-sphenoid clival drillout procedure.
6. APPLICATION IN A CLINICAL PILOT STUDY
The video-CT augmentation system is deployed in an IRB-approved clinical pilot study in parallel to a conventional, standard-of-care (un-augmented) endoscopic video display as shown in Fig. 9. The pilot study involves ~15 neurosurgical patients and assesses primary outcome measures focusing on expert assessment of the utility of the video-CT overlay, the potential benefit to surgical confidence, and the visualization of critical structures on specifically delineated phases of approach/exposure, resection, and reconstruction by three surgeons (1 neurosurgeon and 2 otolaryngologists). The pilot study is underway, with patient volunteers enrolled under informed consent from the population undergoing surgical treatment of skull base pathologies, including benign/malignant tumors, skull base defects (e.g., cerebrospinal fluid leak and encephaloceles), and infectious/inflammatory diseases. Following each case, both the neurosurgeon and otolaryngologist are asked to respond to the questionnaire summarized in Table 1. Questions #1–3 provide ordinal ratings (score 1–5) by the following utility scale: 1 = Significant hindrance / Negative effect; 2 = Minor hindrance / Slightly negative effect; 3 = Not helpful / No benefit or hindrance; 4 = Somewhat helpful / Slight benefit; 5 = Very helpful / Major benefit. Questions #4–5 allow free response in relation to anatomical and disease variations outside those present within the particular case that would potentially benefit from video-CT overlay.
Figure 9.

Clinical study operating room setup, showing components from the standard-of-care and video augmentation system.
Table 1.
Summary questionnaire for expert assessment of video-CT in skull base surgery.
| # | Postoperative Questionnaire |
|---|---|
| 1. | Rate the effect of video-CT overlay on overall: |
| | Surgical confidence (1–5) |
| | Certainty in structure localization (1–5) |
| | Efficiency (1–5) |
| 2. | Rate the utility of video-CT overlay in visualizing: |
| | Carotid arteries (1–5) |
| | Optic nerves (1–5) |
| | Surgical target (specify) (1–5) |
| 3. | Rate the utility of video-CT overlay during: |
| | Surgical approach (1–5) |
| | Target resection (1–5) |
| | Reconstruction (1–5) |
| 4. | What variations in normal anatomy pertinent to this case would benefit most from video-CT overlay? |
| 5. | What variations in pathology pertinent to this case would benefit the most from video-CT overlay? |
The potential benefit to surgical confidence and visualization of critical structures is clear: although the carotids are evident to varying degrees to a trained surgeon in protuberances based on bony landmarks and color variations on the posterior aspect of the sphenoid sinus, the augmented video-CT display can provide a visually significant improvement in the conspicuity of such subtleties, particularly in the context of a bloody surgical field, in cases exhibiting anatomical variations, and in situations where the artery is encased by tumor.16 The primary outcome of the clinical study is to record subjective assessment via the questionnaire summarized in Table 1, with analysis of the operator responses pending completion of the pilot study. Additional examples of video-CT augmentation in contexts believed to improve surgical confidence, localization accuracy, and efficiency (Question #1 in the questionnaire) are illustrated in Fig. 8. The utility of the video-CT system in improving visualization of these critical structures during the approach, surgical resection, and reconstruction is assessed in Questions #2 and #3 of the questionnaire.
Conventional endoscopy incorporates pointer tracking to provide registration with preoperative image data. However, this information presented in a triplanar view of the radiological axes does not emphasize the target or critical structures. These views are rendered separately from the endoscopic video and only remain registered if the pointer is within the field of view. Moreover, they require a significant, and potentially error-prone, mental registration on the part of the surgeon to relate the information rendered in the triplanar views with the video scene -- a challenge especially for inexperienced residents. Although such visualization may be adequate for certain surgical treatments, endoscopy, as well as laparoscopy, thoracoscopy, bronchoscopy, etc., is utilized increasingly in minimally invasive procedures. Especially in the context of increasingly complex anatomy in proximity to critical structures that may be difficult to delineate visually in the video scene, presenting additional information to the surgeon, such as planning data derived from multi-modality imaging, provides a more natural “window” on complex, multi-modality data directly within the primary means of visualization, the endoscopic video.
7. DISCUSSION
This work describes the development and translation of a video-CT augmentation system for a clinical study in endoscopic skull base surgery. The modular design of the system allows augmentation of video with planning data to be extended to a broad spectrum of video-based surgeries, including orthopedic, abdominal, and thoracic procedures. The registration accuracy of the current system is limited by the accuracy of the optical tracking system, but could be improved in the future through the incorporation of intraoperative C-arm cone-beam CT and 3D image-based registration.17 Automation of the camera calibration process streamlined the video-CT registration system to a form suitable for use by a trained OR technologist in a manner that is consistent with surgical workflow. The clinical pilot study described herein provides a valuable basis for use of video-CT registration in routine clinical care and critically evaluates the utility of such capability and the cases for which it will have most benefit.
Figure 2.

Screenshot of the video-CT system’s interface including the augmented display and triplanar view of the CT and surgical plan.
ACKNOWLEDGMENTS
This research was supported by the National Institutes of Health, grant number R01-CA127144. Interface to the clinical Stealthstation was accomplished with a software license provided by Medtronic Inc. (Minneapolis, MN). The authors extend sincere thanks to Dr. Peter Kazanzides, Mr. Anton Deguet, and Mr. Balazs Vagvolgyi (Department of Computer Science, Johns Hopkins University) for assistance with the cisst/saw package. Pre-clinical experiments were conducted in research facilities in the Departments of Biomedical Engineering, Radiology, and Neurosurgery and Oncology at Johns Hopkins University - with thanks to Dr. E. McVeigh, Dr. J. Lewin, and Dr. H. Brem, respectively. The clinical pilot study is carried out under an IRB-approved protocol at Johns Hopkins Medical Institute (Baltimore MD).
REFERENCES
- [1] Snyderman CH, Pant H, Carrau RL, Prevedello D, Gardner P, and Kassam AB, “What are the limits of endoscopic sinus surgery?: the expanded endonasal approach to the skull base,” Keio Journal of Medicine, 58(3), 152:60 September (2009).
- [2] Schulze F, Bühler K, Neubauer A, Kanitsar A, Holton L, and Wolfsberger S, “Intra-operative virtual endoscopy for image guided endonasal transsphenoidal pituitary surgery,” International Journal of Computer Assisted Radiology and Surgery, 5, 143–154 (2010).
- [3] Lapeer R, Chen MS, Gonzalez G, Linney A, and Alusi G, “Image-enhanced surgical navigation for endoscopic sinus surgery: evaluating calibration, registration and tracking,” The International Journal of Medical Robotics and Computer Assisted Surgery, 4, 32–45 (2008).
- [4] Kawamata T, Iseki H, Shibasaki T, and Hori T, “Endoscopic Augmented Reality Navigation System for Endonasal Transsphenoidal Surgery to Treat Pituitary Tumors,” Neurosurgery, 50(6), 1393–7 June (2002).
- [5] Daly M, Siewerdsen J, Moseley D, Jaffray D, and Irish J, “Intraoperative cone-beam CT for guidance of head and neck surgery: Assessment of dose and image quality using a C-arm prototype,” Medical Physics, 33(10), 3767–3780 October (2006).
- [6] Siewerdsen JH, Daly MJ, Chan H, Nithiananthan S, Hamming N, Brock KK, and Irish JC, “High performance intraoperative cone-beam CT on a mobile C-arm: an integrated system for guidance of head and neck surgery,” in SPIE Medical Imaging, Orlando, 72610 J-10 (2009).
- [7] Fraser JF, Allen B, Anand VK, and Schwartz TH, “Three-dimensional Neurostereoendoscopy? Subjective and Objective Comparison to 2,” Minimally Invasive Neurosurgery, 5(1), 1–7 (2008).
- [8] Uneri A, Schafer S, Mirota D, Nithiananthan S, Otake Y, Russell T, and Siewerdsen J, “TREK: An Integrated System Architecture for Intraoperative Cone-Beam CT Guided Surgery,” International Journal of Computer Assisted Radiology and Surgery, 7(1), 159–73 January (2011).
- [9] Pieper S, Lorensen B, Schroeder W, and Kikinis R, “The na-mic kit: Itk, vtk, pipelines, grids and 3d slicer as an open platform for the medical image computing community,” in Proc IEEE Intl Symp on Biomedical Imaging ISBI, 698–701 (2006).
- [10] Deguet A, Kumar R, Taylor RH, and Kazanzides P, “The cisst libraries for computer assisted intervention systems,” in The MIDAS Journal - Systems and Architectures for Computer Assisted Interventions. MICCAI Workshop (2008).
- [11] Yushkevich PA, Piven J, Hazlett C, Smith RG, Ho S, Gee JC, and Gerig G, “User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability,” Neuroimage, 1116–28 (2006).
- [12] Strobl and Hirzinger, “Optimal Hand-Eye Calibration,” in International Robotics Conference and Automation, 4647–4653 (2006).
- [13] Bradski G, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools (2000).
- [14] Daniilidis K, “Hand-Eye Calibration Using Dual Quaternions,” International Journal of Robotics Research, 18(3), 286–29 March (1999).
- [15] Vescan AD, Chan H, Daly MJ, Witterick I, Irish JC, and Siewerdsen J, “C-arm cone beam CT guidance of sinus and skull base surgery: quantitative surgical performance evaluation and development of a novel high-fidelity phantom,” in SPIE Medical Imaging, Orlando, 72610L (2009).
- [16] Kassam AB, Prevedello DM, Carrau RL, Snyderman CH, Thomas A, Gardner P, Zanation A, Duz B, Stefko ST, Byers K, and Horowitz MB, “Endoscopic endonasal skull base surgery: analysis of complications in the authors’ initial 800 patients,” Journal of Neurosurgery, 114(6), 1544–68 June (2011).
- [17] Mirota DJ, Uneri A, Schafer S, Nithiananthan S, Reh DD, Gallia GL, Taylor RH, Hager GD, and Siewerdsen JH, “High-accuracy 3D image-based registration of endoscopic video to C-arm cone-beam CT for image-guided skull base surgery,” in SPIE Medical Imaging, Orlando, 79640J-79640J-10 (2011).
