Abstract
The dominant visual servoing approaches in Minimally Invasive Surgery (MIS) follow single points or adapt the endoscope’s field of view based on the surgical tools’ distance. These methods rely on point positions with respect to the camera frame to infer a control policy. Deviating from the dominant methods, we formulate a robotic controller that allows for image-based visual servoing that requires neither explicit tool and camera positions nor any explicit image depth information. The proposed method relies on homography-based image registration, which shifts the automation paradigm from a point-centric towards a surgical-scene-centric approach. It simultaneously respects a programmable Remote Center of Motion (RCM). Our approach allows a surgeon to build a graph of desired views; once the graph is built, views can be manually selected and automatically servoed to, irrespective of changes in the robot-patient frame transformation. We evaluate our method on an abdominal phantom and provide an open-source ROS MoveIt integration for use with any serial manipulator.3 A video is provided.4
I. Introduction
When compared to open surgery, MIS takes place under endoscopic guidance and offers improved cosmetics, less blood loss, shorter recovery times and reduced cost [1]. In a traditional MIS setup, the surgeon is supported by an assistant who guides the endoscope. Although this task is conceptually simple, it requires trained personnel, which introduces cost [2]. The assistant surgeon exhibits tremor, suffers fatigue, and can be prone to communication failures [2]–[4].
Several robotic endoscope holders, such as AESOP [5], ViKY [6], and EndoAssist [7], have been developed to address these shortcomings. Research in [8] and [9] showed a reduction in intervention time. While robotic endoscope holders can facilitate improvements, they introduce additional workload for the surgeon. With the advance of automated surgical systems, this additional workload can be reduced [10]. Therefore, different methods to automate endoscopic camera motion have been explored.
Alongside automation via kinematic data, visual servoing, i.e. control through images, is considered a promising alternative, as it provides intra-operative feedback [11] and is less prone to errors from model mismatch [12]. In semi-autonomous setups, such as gaze or voice control [13], visual servoing can robustly reflect a surgeon’s intent and respect anatomical constraints or facilitate full autonomy.
Visual servoing approaches that satisfy a RCM constraint can be split into methods that rely on a mechanical RCM and methods that rely on a programmable RCM. There has been less research on visual servoing with a programmable RCM because of robot singularities and constraints on robot positioning. However, in contrast to a mechanical RCM, a programmable RCM can be adapted in real time, and the robot with which it is achieved can be used for multiple purposes [14], for example in open surgery. Existing methods with a mechanical RCM and with a programmable RCM are detailed in Sec. I-A and Sec. I-B, respectively.
A. Visual Servoing with Mechanical RCM
Approaches that use a mechanical RCM include [15], where a visual servo controls the position of a marked forceps in image space. In [16], [17], the tool entry point is exploited to find the tool tip in image space and to center it via visual servoing. Another common scheme is to alter the camera’s zoom based on the surgical tools’ distance, which was first presented in [18], where the tools are tracked with markers. Research in [19]–[21], based on [22], [23], adjusts the camera’s distance in this manner and aligns the camera’s optical axis with the line that spans from the RCM to the tools’ center point. Such an approach requires a complicated registration procedure. In [24], Abdelaal et al. also adjust the camera’s distance to the surgical scene based on the tool distance, but they align the camera’s optical axis with the scene’s surface normal, which is made possible by their 6 DOF endoscope. Yu et al. [25] adjust the field of view’s width based on tool distance. In [26], Ma et al. deploy a visual servo to center a marked tool by incorporating depth information, which they extract from camera and tool motion. In [27], they extend this work into a quadratic program in which they constrain the camera’s distance with respect to the tools and the tool position in the image plane, whilst minimizing the joint velocities. They rely on stereoscopic images for depth information.
B. Visual Servoing with Programmable RCM
Multi-purpose serial manipulators can achieve a RCM programmatically. In [28], Osa et al. adapt the interaction matrix to account for the RCM constraint, which they then use to control a point in image space. The authors in [29] design a composite Jacobian method that integrates a RCM objective with a task function that defines an error on points in image space. Yang et al. [30] also design a Jacobian gain controller that constrains the tip of a tool to remain within a defined region; they additionally require the endoscope to extend the surgeon’s natural line of sight. In [4], Li et al. introduce the RCM and a visual error via the image Jacobian as constraints of a quadratic problem that aims at satisfying these constraints whilst minimizing the joint velocities.
C. Limitations of Current Approaches and Contributions
The majority of existing methods rely on the tool distance to infer a control law. Only in [4], [26]–[30] is the position of arbitrary points w.r.t. the camera frame fed back to the robot. All of the existing methods rely on relative positions, which requires either tool and camera positions or depth images. Position data might only be accessible in a fully robotic setup, and image depth is difficult to estimate from a monocular camera in a dynamic surgical environment. Stereoscopic images are usually not available in robotically assisted surgery.
Our paper addresses the above limitations with the following contributions:
- We introduce a visual servo that navigates towards desired images rather than towards points.
- We formulate a visual servo control law that depends neither on explicit tool and camera positions nor on depth information.
These contributions are achieved with a programmable RCM which, in contrast to a mechanical RCM, offers greater flexibility.
This paper is structured as follows. In Sec. II, we introduce the necessary theoretical background and the derivation of the proposed visual servoing task. In Sec. III, we explain implementation details and the robotic setup. Results are provided in Sec. IV, and conclusions in Sec. V.
II. Methods
Here, we first introduce the composite Jacobian for control in Sec. II-A. Then, we extend it by a novel homography-based task function in Sec. II-B, and describe the processing pipeline in Sec. II-C. In the following, scalars are denoted by lower-case letters, vectors by bold lower-case letters, and matrices by bold upper-case letters. A point x is described with respect to frame F as Fx.
A. Task Control with Remote Center of Motion Objective
For the task control with RCM objective, we follow the derivation of Aghakhani et al. [29]. Therefore, as schematically shown in Fig. 2, an open kinematic chain is attached to reference frame W. An endoscope is attached to the chain. It originates at position WXi and has its camera frame at position WXi+1. The endoscope enters the patient through the trocar at position WXtrocar. The RCM position WXRCM is required to lie along the line connecting WXi to WXi+1, hence
$${}^{W}\mathbf{x}_{\mathrm{RCM}} = {}^{W}\mathbf{x}_{i} + \lambda \left( {}^{W}\mathbf{x}_{i+1} - {}^{W}\mathbf{x}_{i} \right) \quad (1)$$
Fig. 2.
Schematic illustration of the setup: The axes’ RGB coloring corresponds to XYZ, respectively. A serial manipulator is connected to the world frame W. The endoscope spans from WXi to WXi+1, and it enters the trocar, which lies at WXtrocar. The camera rotates around the RCM and its entry depth is proportional to λ ≥ 0. The camera observes the surgical scene (pink) from different frames C and C*.
where the scalar λ ≥ 0 is proportional to the entry depth. λ = 0 corresponds to maximal insertion. The endoscope’s translational velocity at position WXRCM has to remain zero for the endoscope to reside at the trocar WXtrocar. It was derived in [29] as
$${}^{W}\dot{\mathbf{x}}_{\mathrm{RCM}} = \left( \mathbf{J}_{i}^{t} + \lambda \left( \mathbf{J}_{i+1}^{t} - \mathbf{J}_{i}^{t} \right) \right) \dot{\mathbf{q}} + \dot{\lambda} \left( {}^{W}\mathbf{x}_{i+1} - {}^{W}\mathbf{x}_{i} \right) = \mathbf{0} \quad (2)$$
where Jti and Jti+1 are the Jacobians’ top three rows, i.e. the translational parts, corresponding to the points WXi and WXi+1 w.r.t. the world frame, q̇ denotes the instantaneous joint velocities, and λ̇ is the rate of change of the entry depth. Eq. (2) can be rewritten as
$${}^{W}\dot{\mathbf{x}}_{\mathrm{RCM}} = \underbrace{\left[ \mathbf{J}_{i}^{t} + \lambda \left( \mathbf{J}_{i+1}^{t} - \mathbf{J}_{i}^{t} \right) \;\middle|\; {}^{W}\mathbf{x}_{i+1} - {}^{W}\mathbf{x}_{i} \right]}_{\mathbf{J}_{\mathrm{RCM}}} \begin{pmatrix} \dot{\mathbf{q}} \\ \dot{\lambda} \end{pmatrix} = \mathbf{0} \quad (3)$$
Expanding on [29], we introduce a feedback to λ by projecting the trocar position WXtrocar onto the endoscope via
$$\lambda = \frac{\left( {}^{W}\mathbf{x}_{\mathrm{trocar}} - {}^{W}\mathbf{x}_{i} \right)^{\top} \left( {}^{W}\mathbf{x}_{i+1} - {}^{W}\mathbf{x}_{i} \right)}{\left\lVert {}^{W}\mathbf{x}_{i+1} - {}^{W}\mathbf{x}_{i} \right\rVert^{2}} \quad (4)$$
Eq. (3) can be further extended by a task as follows
$$\begin{pmatrix} \dot{\mathbf{x}}_{t} \\ {}^{W}\dot{\mathbf{x}}_{\mathrm{RCM}} \end{pmatrix} = \underbrace{\begin{bmatrix} \left[ \mathbf{J}_{t} \;\; \mathbf{0}_{n_{t} \times 1} \right] \\ \mathbf{J}_{\mathrm{RCM}} \end{bmatrix}}_{\mathbf{J}} \begin{pmatrix} \dot{\mathbf{q}} \\ \dot{\lambda} \end{pmatrix} \quad (5)$$
where ẋt is the task velocity with task dimension nt and Jt is the task Jacobian. Eq. (5) can be turned into a PID controller
$$\begin{pmatrix} \dot{\mathbf{q}} \\ \dot{\lambda} \end{pmatrix} = \mathbf{J}^{+} \left( \mathbf{K}_{p} \begin{pmatrix} \mathbf{e}_{t}^{p} \\ \mathbf{e}_{\mathrm{RCM}}^{p} \end{pmatrix} + \mathbf{K}_{i} \begin{pmatrix} \mathbf{e}_{t}^{i} \\ \mathbf{e}_{\mathrm{RCM}}^{i} \end{pmatrix} + \mathbf{K}_{d} \begin{pmatrix} \mathbf{e}_{t}^{d} \\ \mathbf{e}_{\mathrm{RCM}}^{d} \end{pmatrix} \right) \quad (6)$$
where J+ is the pseudo-inverse of the composite Jacobian from (5), ep, ei, and ed are the stacked proportional, integral, and differential errors for the task and the RCM, and Kp/i/d are the diagonal gain matrices. Therein, ei and ed are computed as the integral and the differential of the proportional error ep, respectively.
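To make the control update concrete, the following is a minimal numerical sketch of (3), (5), and (6), assuming the per-frame Jacobians and stacked error vectors are already available. All function and variable names are our own illustrative choices, and the damped least-squares pseudo-inverse reflects the implementation choice reported later in Sec. IV-A; it is not the paper's exact code.

```python
import numpy as np

def damped_pinv(J, damping=5e-4):
    """Damped least-squares pseudo-inverse via SVD (damping value from Sec. IV-A)."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    return Vt.T @ np.diag(s / (s**2 + damping**2)) @ U.T

def rcm_jacobian(J_i_t, J_ip1_t, x_i, x_ip1, lam):
    """J_RCM from (3): maps (q_dot, lambda_dot) to the RCM's translational velocity."""
    J_q = J_i_t + lam * (J_ip1_t - J_i_t)                     # 3 x n
    J_lam = (x_ip1 - x_i).reshape(3, 1)                       # 3 x 1
    return np.hstack((J_q, J_lam))                            # 3 x (n + 1)

def pid_step(J_t, J_rcm, e_p, e_i, e_d, Kp, Ki, Kd, dt):
    """Joint and insertion-depth position update per (5)-(6), integrated over dt."""
    J_t_ext = np.hstack((J_t, np.zeros((J_t.shape[0], 1))))   # task does not act on lambda_dot
    J = np.vstack((J_t_ext, J_rcm))                           # composite Jacobian of (5)
    rate = damped_pinv(J) @ (Kp @ e_p + Ki @ e_i + Kd @ e_d)  # (6)
    return rate[:-1] * dt, rate[-1] * dt                      # (delta_q, delta_lambda)
```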
In the following section, we introduce a homography-based visual servoing task.
B. Homography-based Visual Servoing Task
Suppose point WX is projected from a plane, i.e. the surgical scene, onto normalized coordinates m* in camera frame C*, see Fig. 2, via
$$\mathbf{m}^{*} = \left( \frac{X^{*}}{Z^{*}}, \; \frac{Y^{*}}{Z^{*}}, \; 1 \right)^{\top}, \qquad \left( X^{*}, Y^{*}, Z^{*} \right)^{\top} = {}^{C^{*}}\mathbf{x} \quad (7)$$
which means it is observed by the camera as
$$\mathbf{p}^{*} = \mathbf{K} \, \mathbf{m}^{*} \quad (8)$$
in pixel coordinates p*, with the camera’s intrinsic parameters K. Should the camera move under rotation R and translation t, the points in normalized coordinates will change according to a homography H such that [31]
$$\mathbf{m} \propto \mathbf{H} \, \mathbf{m}^{*} \quad (9)$$
In pixel coordinates this can be written as
$$\mathbf{p} \propto \mathbf{G} \, \mathbf{p}^{*} \quad (10)$$
with the projective homography G, for which the following relation holds
$$\mathbf{G} = \mathbf{K} \, \mathbf{H} \, \mathbf{K}^{-1} \quad (11)$$
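As a sanity check of (7)–(11), the short numerical sketch below verifies that mapping a pixel with G gives the same result as re-projecting the 3D point after the camera motion. All numbers (intrinsics, plane, pose) are made up for illustration; the relation H = R + t n*ᵀ/d* for a planar scene follows [31].

```python
import numpy as np

# Illustrative values only: intrinsics, scene plane, and relative camera pose are made up.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
n_star, d_star = np.array([0., 0., 1.]), 0.10            # plane normal and distance in C*
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.],
              [np.sin(theta),  np.cos(theta), 0.],
              [0., 0., 1.]])                             # rotation of C* expressed in C
t = np.array([0.01, -0.005, 0.002])                      # translation of C* expressed in C

H = R + np.outer(t, n_star) / d_star                     # Euclidean homography for a planar scene [31]
G = K @ H @ np.linalg.inv(K)                             # projective homography, (11)

x_star = np.array([0.02, 0.01, 0.10])                    # a point on the plane, in C* coordinates
m_star = x_star / x_star[2]                              # normalized coordinates, (7)
p_star = K @ m_star                                      # pixel coordinates, (8)

x_c = R @ x_star + t                                     # the same point expressed in C
p_direct = K @ (x_c / x_c[2])                            # direct projection into C
p_homog = G @ p_star                                     # mapping the C* pixel with G, (10)
print(np.allclose(p_direct, p_homog / p_homog[2]))       # True: (9)-(11) are consistent
```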
As shown in [31], the task error that drives the current projection Cm towards the desired projection C*m* of Wx can be obtained purely from the homography that relates these points in normalized coordinates via
$${}^{C}\mathbf{e} = \begin{pmatrix} {}^{C}\mathbf{e}_{v} \\ {}^{C}\mathbf{e}_{\omega} \end{pmatrix}, \qquad {}^{C}\mathbf{e}_{v} = \left( \mathbf{H} - \mathbf{I} \right) \mathbf{m}^{*}, \qquad \left[ {}^{C}\mathbf{e}_{\omega} \right]_{\times} = \mathbf{H} - \mathbf{H}^{\top} \quad (12)$$
where [Ceω]× is the skew-symmetric matrix of Ceω. The task error is described in body coordinates. Its rate of change, transferred to the world frame W through rotation, is proportional to the camera frame’s instantaneous velocity
$$\begin{bmatrix} {}^{W}\mathbf{R}_{C} & \mathbf{0} \\ \mathbf{0} & {}^{W}\mathbf{R}_{C} \end{bmatrix} {}^{C}\dot{\mathbf{e}} \propto \mathbf{J}_{i+1} \, \dot{\mathbf{q}} \quad (13)$$
where WRC is the rotation of the camera frame with respect to the world frame, and Ji+1 is the camera frame’s Jacobian, including its rotational contributions. After imposing the RCM, which constrains 2 DOF, only 4 DOF can be controlled at a time. To capture this, we introduce an operator P that projects the camera frame’s body velocity onto the remaining DOF. Together with (13), this yields
$$\mathbf{P}_{a/b} \, {}^{C}\dot{\mathbf{e}} \propto \mathbf{P}_{a/b} \begin{bmatrix} {}^{W}\mathbf{R}_{C} & \mathbf{0} \\ \mathbf{0} & {}^{W}\mathbf{R}_{C} \end{bmatrix}^{\top} \mathbf{J}_{i+1} \, \dot{\mathbf{q}} \quad (14)$$
The projection operator Pa/b can take different forms, such that the task error is mapped onto either of the decoupled sets of remaining DOF via
$$\mathbf{P}_{a} = \begin{bmatrix} \mathbf{I}_{3} & \mathbf{0}_{3 \times 3} \\ \mathbf{0}_{1 \times 3} & \left( 0 \;\; 0 \;\; 1 \right) \end{bmatrix}, \qquad \mathbf{P}_{b} = \begin{bmatrix} \left( 0 \;\; 0 \;\; 1 \right) & \mathbf{0}_{1 \times 3} \\ \mathbf{0}_{3 \times 3} & \mathbf{I}_{3} \end{bmatrix} \quad (15)$$
Therefore, Pa maps the task error to its translational parts and the rotation about the optical axis, whereas Pb maps it to its rotational parts and the error along the optical axis. We identify the case-specific contributions of (14) as the task Jacobian from (5) and the task error from (6), which yields
$$\mathbf{J}_{t} = \mathbf{P}_{a/b} \begin{bmatrix} {}^{W}\mathbf{R}_{C} & \mathbf{0} \\ \mathbf{0} & {}^{W}\mathbf{R}_{C} \end{bmatrix}^{\top} \mathbf{J}_{i+1}, \qquad \mathbf{e}_{t} = \mathbf{P}_{a/b} \, {}^{C}\mathbf{e} \quad (16)$$
This results in a task dimension nt = 4, which means that together with the RCM objective that introduces 3 constraints and adds the additional DOF λ, the robot has to have at least 6 DOF.
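For illustration, a minimal sketch of how the task error of (12) and projectors in the spirit of (15) could be computed from a projective homography G. The reference point m*, the scale normalization of H (using the fact that a Euclidean homography has middle singular value 1), and all names are our own assumptions, not the paper's exact implementation.

```python
import numpy as np

def task_error_from_homography(G, K, m_star=np.array([0., 0., 1.])):
    """Task error (e_v, e_omega) per (12), computed from a projective homography G."""
    H = np.linalg.inv(K) @ G @ K                       # back to normalized coordinates, (11)
    H = H / np.linalg.svd(H, compute_uv=False)[1]      # fix the scale: middle singular value = 1
    e_v = (H - np.eye(3)) @ m_star                     # translational error part
    S = H - H.T                                        # [e_omega]_x
    e_w = np.array([S[2, 1], S[0, 2], S[1, 0]])        # rotational error part
    return np.concatenate((e_v, e_w))                  # 6-vector in camera (body) coordinates

# Selection matrices in the spirit of (15): P_a keeps translation plus rotation about the
# optical axis, P_b keeps the optical-axis component plus all rotations.
P_a = np.block([[np.eye(3), np.zeros((3, 3))],
                [np.zeros((1, 5)), np.ones((1, 1))]])
P_b = np.block([[np.zeros((1, 2)), np.ones((1, 1)), np.zeros((1, 3))],
                [np.zeros((3, 3)), np.eye(3)]])
```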
C. Processing Pipeline
An overview of the processing pipeline is depicted in Fig. 3. A surgeon first controls the endoscope from within the camera’s reference frame via the keyboard. Images of desired views are manually taken along the way and are used to construct a graph, wherein each vertex is an image. This is done within the homography generation node.
Fig. 3.
Processing pipeline. A surgeon manually controls the robot through a GUI, collecting desired views along the way. The images are pre-processed, and a graph of desired views is built in the background by the homography generation node. Once built, the surgeon selects desired views through the GUI, which triggers a shortest path search from the current vertex (yellow) to the desired one (pink), and the execution of subsequent homography estimations that lead to the target.
Initially, camera calibration considering an underlying radial/tangential distortion model is carried out to obtain the distortion coefficients and the camera intrinsics. Following that, an eye-in-hand calibration is performed to locate the camera frame position WXi+1, and WXi is set to lie along the negative optical axis at the endoscope’s length, see Fig. 2.
Each image ℐ that is processed within the homography generation node undergoes distortion removal, followed by an intensity-based automatic detection of the endoscopic boundary circle. Therein, the image is smoothed with a bilateral filter and thresholded in HSV image space to obtain a binary mask. The circle’s center is computed as the center of mass, and its radius is obtained from the steepest gradient of the marginalized binary mask. If the illumination in the endoscopic view is below a certain value, then the last known center and radius are considered instead. The maximum rectangle of a given aspect ratio that fits into the extracted circle is then cropped from the image ℐ. The crop is further rescaled. The camera intrinsics are updated accordingly from K to K′ by offsetting and scaling the principal point.
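A sketch of the boundary-circle detection and crop described above, assuming distortion has already been removed. Thresholds, the output size, and all names are illustrative; the low-illumination fallback to the last known circle is omitted.

```python
import cv2
import numpy as np

def crop_endoscopic_view(img, K, v_thresh=30, aspect=(16, 9), out_width=640):
    """Detect the endoscopic boundary circle, crop the largest inscribed rectangle of a
    given aspect ratio, and update the float 3x3 intrinsics K -> K'."""
    smoothed = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
    hsv = cv2.cvtColor(smoothed, cv2.COLOR_BGR2HSV)
    mask = (hsv[..., 2] > v_thresh).astype(np.uint8)       # binary mask of the illuminated circle

    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()                          # circle center as center of mass
    grad = np.gradient(mask.sum(axis=0).astype(float))     # marginalized mask along columns
    radius = 0.5 * (grad.argmin() - grad.argmax())         # half distance between steepest edges

    aw, ah = aspect
    s_rect = 2.0 * radius / np.hypot(aw, ah)               # largest inscribed rectangle
    w, h = aw * s_rect, ah * s_rect
    x0, y0 = int(cx - w / 2), int(cy - h / 2)
    crop = img[y0:y0 + int(h), x0:x0 + int(w)]

    s = out_width / crop.shape[1]                          # rescale the crop
    crop = cv2.resize(crop, (out_width, int(crop.shape[0] * s)))

    K_prime = K.copy()
    K_prime[0, 2] -= x0                                    # shift principal point into the crop
    K_prime[1, 2] -= y0
    K_prime[:2, :] *= s                                    # scale focal lengths and principal point
    return crop, K_prime
```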
Once the graph is built, the surgeon can browse through the image gallery, as shown in Fig. 3, where each image corresponds to a vertex within the graph. The surgeon may then select a desired view and execute the visual servo. This will trigger a Dijkstra search for the shortest path from the current vertex to the desired view/vertex at constant cost per edge. This path is executed sequentially. Therefore, the homography G from the next vertex to the current view is computed for the visual servo. To compute the homography, we extract image features and their descriptors with a SURF feature detector [32]. For each feature in the target view, the two nearest neighbors are found in the current view, and, via Lowe’s ratio test [33], only features with distinctive descriptors are kept. The homography that maps features from the target view to the current view is then determined under RANSAC outlier rejection.
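The following sketch illustrates the two operations triggered by a view selection: the path search in the view graph (with constant edge cost, Dijkstra reduces to breadth-first search) and the SURF/ratio-test/RANSAC homography estimation. The Hessian threshold, ratio, and RANSAC threshold are typical values rather than the paper's; SURF requires the opencv-contrib package.

```python
import cv2
import numpy as np
from collections import deque

def estimate_homography(img_target, img_current, ratio=0.75, ransac_thresh=5.0):
    """Projective homography G mapping pixels in the target view to the current view."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp_t, des_t = surf.detectAndCompute(img_target, None)
    kp_c, des_c = surf.detectAndCompute(img_current, None)

    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_t, des_c, k=2)    # 2 nearest neighbours
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]  # Lowe's ratio test

    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_c[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    G, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)  # RANSAC outlier rejection
    return G

def shortest_view_path(edges, start, goal):
    """Shortest path in the view graph; with constant edge cost, Dijkstra reduces to BFS."""
    prev, frontier = {start: None}, deque([start])
    while frontier:
        v = frontier.popleft()
        if v == goal:
            break
        for n in edges.get(v, []):
            if n not in prev:
                prev[n] = v
                frontier.append(n)
    if goal not in prev:
        return []
    path, v = [], goal
    while v is not None:
        path.append(v)
        v = prev[v]
    return path[::-1]
```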
The updated camera intrinsics K′, together with the desired homography G, are then sent down the pipeline to first transform the homography from pixel to normalized coordinates via (11) and then to compute the desired task error from (12). The update rate of these operations is restricted by the camera frame rate, which is why the desired trocar position WXtrocar is sent separately to the synchronizer node, see Fig. 3. The synchronizer node feeds a homography RCM visual servo action client, HRCMVSActionClient, which requests the HRCMVSActionServer to execute the desired task while maintaining the desired trocar position WXtrocar.
The HRCMVSActionServer implements a state machine that rejects infeasible requests. It computes the forward kinematics as well as the Jacobians and obtains a joint position update by integrating (6) over the control interval Δt in the RCM implementation RCMImpl. The desired joint positions are then sent to the robot.
III. Experimental Setup
This section gives an overview of the robotic system and its components in Sec. III-A. Following that, clinically relevant questions and the evaluation protocol are addressed in Sec. III-B.
A. Robotic System
Our experimental setup, see Fig. 1, uses a KUKA LBR Med 7 R800 robot. To control it, we created a bridge to ROS by wrapping the Fast Robot Interface (FRI) [34] with ROS’ Hardware Interface functionality. We use a Storz Endocameleon Hopkins Telescope, from which we capture images using a Storz TH 102 H3-Z FI camera head. The endoscope is mounted to the LBR Med 7 R800 robot with a custom-designed 3D-printed clamp. For illumination, we connect a Storz TL 300 Power LED 300 light source to the endoscope. The image feed is output to SDI, which we convert to HDMI with a Monoprice 3G SDI to HDMI converter. We then grab the HDMI signal with a DeckLink 4K Extreme 12G and stream it onto the ROS network.
Fig. 1.
Robotic setup. A Storz Endocameleon Hopkins Telescope, which provides a light source port and a camera attachment point, is mounted to a KUKA LBR Med 7 R800 robot via a 3D printed clamp. The robotic system undergoes image-based control to reach desired views of the surgical scene and simultaneously pivots around a programmable RCM.
B. Clinical Scenario Evaluation Protocol
The proposed method is evaluated in the laparoscopic setup shown in Fig. 1. We utilize a Szabo Pelvic Trainer to simulate a trocar. A Kyoto Kagaku colon rectum tube is inserted into the Szabo Pelvic Trainer to model a laparoscopic view of the abdomen. The clinical procedure is then modeled as follows. The robot initially drives the endoscope to the trocar and λ in (1) is set to 1. Following that, the user mounts the camera and the light source to the laparoscope. The user then drives the laparoscope through the trocar into the phantom.
In the phantom, we identify four clinically relevant views of the scene. These views include an overview of the scene, a view of the tool insertion area towards the abdominal wall, and two close-ups, one of which is used for further examination. For visual servoing between these views in a clinical scenario, three objectives are of importance:
Servoing from any current to any target view.
Servoing to target views under tool motion.
Servoing to target views after phantom repositioning.
To address these scenarios, we design three experiments. For all experiments, after the laparoscope insertion, the user moves to the overview of the surgical scene, where the first image is taken through the GUI; this image corresponds to the graph’s root view/vertex, see Fig. 3. We measure the deviation of the RCM from the trocar position and record the Mean Pairwise Distance (MPD) of SURF features between the current and the desired view, the task error, the execution time, the joint angles, and the camera tip position.
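Our reading of the MPD metric is the mean Euclidean distance between matched SURF feature locations in the current and desired views; a minimal sketch under that assumption:

```python
import numpy as np

def mean_pairwise_distance(pts_current, pts_desired):
    """MPD (pixels): mean Euclidean distance over matched feature pairs from the
    current to the desired view (our interpretation of the metric above)."""
    pts_current = np.asarray(pts_current, dtype=float)
    pts_desired = np.asarray(pts_desired, dtype=float)
    return float(np.mean(np.linalg.norm(pts_current - pts_desired, axis=1)))
```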
1). Servoing from any current view to any target view
In this scenario we investigate the system’s capability to autonomously execute extreme view changes. The user moves from the overview to a close-up, from where the scene is further examined. The laparoscope is then moved manually to grant view of the tool insertion area. At this stage, tools would be inserted into the patient and the user would begin to operate. Therefore, the user selects the close-up view through the GUI and executes the autonomous visual servo towards it.
2). Servoing to target views under tool motion
In this scenario we investigate autonomous visual servoing towards desired views under tool motion. Therefore, the user moves the laparoscope from the overview to the tool insertion area. Tools are then inserted and the user is asked to perform a sample task, which involves moving small LASTT Training Package rings. The visual servo simultaneously navigates back towards the overview.
3). Servoing to target views after phantom repositioning
In this scenario we investigate the system’s invariance under patient motion. Therefore, we reposition the phantom and execute the visual servo to autonomously readjust the overview. We include both phantom rotation and tilting.
IV. Results
In this section, we first present generic findings in Sec. IV-A, followed by quantitative measurements for the evaluation protocol from Sec. III-B, in Sec. IV-B.
A. Generic Results
In practice we found that controlling the camera frame’s rotational DOF, using Pb in (15), leads to more stable solutions. We tried to invert the task part of the composite Jacobian from (6) within the nullspace of the RCM Jacobian, but obtained more flexible solutions by computing the pseudo-inverse as a damped least-squares solution from the SVD with a damping factor of 5e-4. Empirically, we obtained good results with hand-tuned diagonal gain matrices.
The integral term therein helped remove a steady-state error in the homography-based image alignments. The desired homography extraction proved noisy but correct on average, so we introduced a moving-average filter on the task error Cet with a buffer length of 10 at a frame rate of 30 fps. The sequential execution of desired views was greatly sped up by declaring early convergence for intermediate vertices/views at an MPD of 5 pixels and final convergence at an MPD of 1.5 pixels.
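A minimal sketch of the moving-average filtering of the task error with a buffer length of 10, as reported above; the class name and interface are ours.

```python
from collections import deque
import numpy as np

class MovingAverage:
    """Moving-average filter for the task error (buffer of 10 frames at 30 fps, Sec. IV-A)."""
    def __init__(self, length=10):
        self.buffer = deque(maxlen=length)

    def __call__(self, e_t):
        self.buffer.append(np.asarray(e_t, dtype=float))
        return np.mean(self.buffer, axis=0)   # element-wise mean over the buffered errors
```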
B. Clinical Scenario Results
1). Servoing from any current view to any target view
In this section we investigate the trajectory from tool insertion view to close-up, see Sec. III-B.1. The task error and the RCM deviation from the trocar position are depicted in Fig. 4. It can be seen that the deviation from the trocar position stays below 4.6 mm, at an average deviation of 0.8±0.8 mm. The task error converges for all vertices/views. The final task error corresponds to a camera tip deviation of 0.4 mm, when compared to the desired position. The joint angles deviate on average by 8.2 ± 6.0° from the initial configuration.
Fig. 4.
RCM deviation (top) and task error evolution (bottom) over time for the protocol in Sec. III-B.1. The visual servo autonomously servos from the tool insertion area to the close-up. Target views/vertices are updated along the way, as indicated by the black dotted lines.
2). Servoing to target views under tool motion
For this measurement, the visual servo navigates from the tool insertion area to the overview under tool motion, see Sec. III-B.2. The trajectory with all intermediate and the final vertex/view is shown in Fig. 5. It can be seen that, despite tool motion, the visual servo converges at pixel accuracy towards the desired views. The final camera position deviates by 1.4 mm from the desired one. The robot joint angles deviate on average by 1.1 ± 1.1° from the initial configuration. A video of this experiment is provided under footnote 4.
Fig. 5.
Servoing under tool motion, see Sec. III-B.2. Initially, the graph is built in manual control mode (top row), yellow indicates the current vertex. The visual servo is then executed to navigate back from the tool insertion to the overview (bottom row). Pink indicates the target vertex.
3). Servoing to target views after phantom repositioning
In this section we investigate the convergence of the visual error after phantom repositioning, see Sec. III-B.3. We perform clockwise and counterclockwise repositioning as well as phantom tilting. We keep the trocar at the initial position. The camera frame then rotates and translates towards a position that minimizes the visual error. The translation Δxi+1 and the angle-axis rotation angle α are listed in Tab. I. It can be seen that the robotic laparoscope performs significant motion to readjust the view. The MPD is minimized to pixel range and the final deviation from the trocar remains in the submillimeter scale for all cases.
TABLE I.
CLOCKWISE (CW), AND COUNTERCLOCKWISE (CCW) REPOSITIONING, AND PHANTOM TILTING, CORRESPONDING TO THE PROTOCOL IN SEC. III-B.3. Δxi+1 INDICATES THE CAMERA MOTION, α THE ANGLE AXIS ROTATION ANGLE FROM INITIAL TO FINAL CAMERA ROTATION, Δq THE JOINT ANGLE POSITION CHANGE, eRCM THE FINAL DEVIATION OF THE RCM FROM THE TROCAR, AND MPD THE FINAL VISUAL ERROR.
| Metric | CW | CCW | Tilt |
| --- | --- | --- | --- |
| Δxi+1 / mm | 10.4 | 6.7 | 4.7 |
| α / ° | 16.6 | 10.2 | 4.8 |
| Δq / ° | 20.5 ± 12.0 | 17.4 ± 13.4 | 2.6 ± 2.3 |
| eRCM / mm | 0.1 | 0.2 | 0.07 |
| MPD / pixel | 3.2 ± 2.5 | 2.0 ± 1.0 | 1.4 ± 1.2 |
V. Conclusion
In this work we introduced a visual servo that is independent of depth information and explicit tool and camera positions. The introduced method simultaneously respects a programmable RCM. Our method was successfully integrated into a robotic setup and clinically relevant scenarios were investigated on an abdominal phantom.
It was shown in Sec. IV-B.1 that the proposed composite Jacobian PID controller with homography-based task simultaneously minimizes the RCM and the visual servo objective. The integral term proved helpful to remove a steady-state error in the image alignment. The homography estimation was noisy due to feature sparseness and required moving-average filtering. The graph representation allowed for visual servoing between images that were not relatable by a single homography. In Sec. IV-B.2, tools were successfully introduced into the scene. It is to be noted that the tools were initially not present in the target views, which avoided potential image misalignment. In Sec. IV-B.3, the phantom was repositioned significantly with a constant trocar position, and image readjustment was successfully demonstrated. The MPD came close to perfect alignment; however, the trocar possibly moved slightly during repositioning, which prevented perfect convergence. The robot’s joint angles did not always return to their initial configuration. The camera position converged to within a submillimeter range of its target.
We successfully demonstrated that our visual servo navigates the camera to within submillimeter range without depth information or explicit tool and camera positions. This demonstrates future potential for safe patient application and circumvents time-consuming registration procedures. As our setup has one redundant DOF, the robot did not always return to its initial configuration. This might be handled by introducing joint state objectives into the Jacobian’s nullspace. While our visual servo is independent of registration procedures, the RCM requires initialization and tracking. In future work, the controller might be updated so as to incorporate force-torque sensing to update the RCM. Although the environment was mostly static, the homography estimation was noisy. In future research, one might therefore incorporate homography estimation that is invariant under object motion and robust under feature sparseness, using deep learning approaches, as shown in [35].
Acknowledgment
This work was supported by core and project funding from the Wellcome/EPSRC [WT203148/Z/16/Z; NS/A000049/1; WT101957; NS/A000027/1]. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016985 (FAROS project). The authors gratefully acknowledge the support of Dr Carlo Seneci, Maleeha Al-Hamadani, Dr Chayanin Tangwiriyasakul, Dr Hongbing Liu, and Julius Bernth in the research that led to this manuscript.
Footnotes
References
- [1]. Vitiello V, Lee S-L, Cundy TP, Yang G-Z. Emerging robotic platforms for minimally invasive surgery. IEEE Reviews in Biomedical Engineering. 2012;6:111–126. doi: 10.1109/RBME.2012.2236311.
- [2]. Horgan S, Vanuno D. Robots in laparoscopic surgery. Journal of Laparoendoscopic & Advanced Surgical Techniques. 2001;11(6):415–419. doi: 10.1089/10926420152761950.
- [3]. Palep JH. Robotic assisted minimally invasive surgery. Journal of Minimal Access Surgery. 2009;5(1):1. doi: 10.4103/0972-9941.51313.
- [4]. Li W, Chiu PWY, Li Z. An accelerated finite-time convergent neural network for visual servoing of a flexible surgical endoscope with physical and RCM constraints. IEEE Transactions on Neural Networks and Learning Systems. 2020;31(12):5272–5284. doi: 10.1109/TNNLS.2020.2965553.
- [5]. Unger S, Unger H, Bass R. AESOP robotic arm. Surgical Endoscopy. 1994;8(9):1131. doi: 10.1007/BF00705739.
- [6]. Long J-A, Cinquin P, Troccaz J, Voros S, Berkelman P, Descotes J-L, Letoublon C, Rambeaud J-J. Development of miniaturized light endoscope-holder robot for laparoscopic surgery. Journal of Endourology. 2007;21(8):911–914. doi: 10.1089/end.2006.0328.
- [7]. Gilbert J. The EndoAssist robotic camera holder as an aid to the introduction of laparoscopic colorectal surgery. The Annals of The Royal College of Surgeons of England. 2009;91(5):389–393. doi: 10.1308/003588409X392162.
- [8]. Aiono S, Gilbert J, Soin B, Finlay P, Gordan A. Controlled trial of the introduction of a robotic camera assistant (EndoAssist) for laparoscopic cholecystectomy. Surgical Endoscopy and Other Interventional Techniques. 2002;16(9):1267–1270. doi: 10.1007/s00464-001-9174-7.
- [9]. Voros S, Haber G-P, Menudet J-F, Long J-A, Cinquin P. ViKY robotic scope holder: Initial clinical experience and preliminary results using instrument tracking. IEEE/ASME Transactions on Mechatronics. 2010;15(6):879–886.
- [10]. Moustris GP, Hiridis SC, Deliparaschos KM, Konstantinidis KM. Evolution of autonomous and semi-autonomous robotic surgical systems: a review of the literature. The International Journal of Medical Robotics and Computer Assisted Surgery. 2011;7(4):375–392. doi: 10.1002/rcs.408.
- [11]. Pandya A, Reisner LA, King B, Lucas N, Composto A, Klein M, Ellis RD. A review of camera viewpoint automation in robotic and laparoscopic surgery. Robotics. 2014;3(3):310–329.
- [12]. Azizian M, Khoshnam M, Najmaei N, Patel RV. Visual servoing in medical robotics: a survey. Part I: endoscopic and direct vision imaging – techniques and applications. The International Journal of Medical Robotics and Computer Assisted Surgery. 2014;10(3):263–274. doi: 10.1002/rcs.1531.
- [13]. Taniguchi K, Nishikawa A, Sekimoto M, Kobayashi T, Kazuhara K, Ichihara T, Kurashita N, Takiguchi S, Doki Y, Mori M, et al. Classification, design and evaluation of endoscope robots. Robot Surgery. 2010;1:172.
- [14]. Kuo C-H, Dai JS, Dasgupta P. Kinematic design considerations for minimally invasive surgical robots: an overview. The International Journal of Medical Robotics and Computer Assisted Surgery. 2012;8(2):127–145. doi: 10.1002/rcs.453.
- [15]. Omote K, Feussner H, Ungeheuer A, Arbter K, Wei G-Q, Siewert JR, Hirzinger G. Self-guided robotic camera control for laparoscopic surgery compared with human camera control. The American Journal of Surgery. 1999;177(4):321–324. doi: 10.1016/s0002-9610(99)00055-0.
- [16]. Agustinos A, Wolf R, Long J-A, Cinquin P, Voros S. Visual servoing of a robotic endoscope holder based on surgical instrument tracking. 5th IEEE RAS/EMBS International Conference on Biomedical Robotics and Biomechatronics; 2014. pp. 13–18.
- [17]. Voros S, Long J-A, Cinquin P. Automatic detection of instruments in laparoscopic images: A first step towards high-level command of robotic endoscopic holders. The International Journal of Robotics Research. 2007;26(11-12):1173–1190.
- [18]. King BW, Reisner LA, Pandya AK, Composto AM, Ellis RD, Klein MD. Towards an autonomous robot for camera control during laparoscopic surgery. Journal of Laparoendoscopic & Advanced Surgical Techniques. 2013;23(12):1027–1030. doi: 10.1089/lap.2013.0304.
- [19]. Eslamian S, Reisner LA, Pandya AK. Development and evaluation of an autonomous camera control algorithm on the da Vinci surgical system. The International Journal of Medical Robotics and Computer Assisted Surgery. 2020;16(2):e2036. doi: 10.1002/rcs.2036.
- [20]. Mariani A, Colaci G, Da Col T, Sanna N, Vendrame E, Menciassi A, De Momi E. An experimental comparison towards autonomous camera navigation to optimize training in robot assisted surgery. IEEE Robotics and Automation Letters. 2020;5(2):1461–1467.
- [21]. Da Col T, Mariani A, Deguet A, Menciassi A, Kazanzides P, De Momi E. SCAN: System for camera autonomous navigation in robotic-assisted surgery.
- [22]. Eslamian S, Reisner L, King B, Pandya A. Towards the implementation of an autonomous camera algorithm on the da Vinci platform. Studies in Health Technology and Informatics. 2016;220:118–23.
- [23]. Eslamian S, Reisner LA, King BW, Pandya AK. An autonomous camera system using the da Vinci research kit. 2017.
- [24]. Abdelaal AE, Hong N, Avinash A, Budihal D, Sakr M, Hager GD, Salcudean SE. Orientation matters: 6-DoF autonomous camera movement for minimally invasive surgery. arXiv preprint arXiv:2012.02836. 2020.
- [25]. Yu L, Li H, Zhao L, Ren S, Gu Q. Automatic guidance of laparoscope based on the region of interest for robot assisted laparoscopic surgery. Computer Assisted Surgery. 2016;21(sup1):17–21.
- [26]. Ma X, Song C, Chiu PW, Li Z. Autonomous flexible endoscope for minimally invasive surgery with enhanced safety. IEEE Robotics and Automation Letters. 2019;4(3):2607–2613.
- [27]. Ma X, Song C, Chiu PW, Li Z. Visual servo of a 6-DoF robotic stereo flexible endoscope based on the da Vinci Research Kit (dVRK) system. IEEE Robotics and Automation Letters. 2020;5(2):820–827.
- [28]. Osa T, Staub C, Knoll A. Framework of automatic robot surgery system using visual servoing. 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2010. pp. 1837–1842.
- [29]. Aghakhani N, Geravand M, Shahriari N, Vendittelli M, Oriolo G. Task control with remote center of motion constraint for minimally invasive robotic surgery. 2013 IEEE International Conference on Robotics and Automation; 2013. pp. 5807–5812.
- [30]. Yang B, Chen W, Wang Z, Lu Y, Mao J, Wang H, Liu Y-H. Adaptive FoV control of laparoscopes with programmable composed constraints. IEEE Transactions on Medical Robotics and Bionics. 2019;1(4):206–217.
- [31]. Benhimane S, Malis E. Homography-based 2D visual servoing. Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA 2006); 2006. pp. 2397–2402.
- [32]. Bay H, Tuytelaars T, Van Gool L. SURF: Speeded Up Robust Features. European Conference on Computer Vision; Springer; 2006. pp. 404–417.
- [33]. Lowe DG. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision. 2004;60(2):91–110.
- [34]. Schreiber G, Stemmer A, Bischoff R. The fast research interface for the KUKA lightweight robot. IEEE Workshop on Innovative Robot Control Architectures for Demanding (Research) Applications – How to Modify and Enhance Commercial Controllers (ICRA 2010); 2010. pp. 15–21.
- [35]. Huber M, Ourselin S, Bergeles C, Vercauteren T. Deep homography estimation in dynamic surgical scenes for laparoscopic camera motion extraction. arXiv preprint arXiv:2109.15098. 2021. doi: 10.1080/21681163.2021.2002195.