Skip to main content
Springer logoLink to Springer
. 2024 Jun 17;19(8):1569–1578. doi: 10.1007/s11548-024-03208-w

Autonomous navigation of catheters and guidewires in mechanical thrombectomy using inverse reinforcement learning

Harry Robertshaw 1, Lennart Karstensen 2, Benjamin Jackson 1, Alejandro Granados 1, Thomas C Booth 1,3,
PMCID: PMC7616368  EMSID: EMS197960  PMID: 38884893

Abstract

Purpose

Autonomous navigation of catheters and guidewires can enhance endovascular surgery safety and efficacy, reducing procedure times and operator radiation exposure. Integrating tele-operated robotics could widen access to time-sensitive emergency procedures like mechanical thrombectomy (MT). Reinforcement learning (RL) shows potential in endovascular navigation, yet its application encounters challenges without a reward signal. This study explores the viability of autonomous guidewire navigation in MT vasculature using inverse reinforcement learning (IRL) to leverage expert demonstrations.

Methods

Employing the Simulation Open Framework Architecture (SOFA), this study established a simulation-based training and evaluation environment for MT navigation. We used IRL to infer reward functions from expert behaviour when navigating a guidewire and catheter. We utilized the soft actor-critic algorithm to train models with various reward functions and compared their performance in silico.

Results

We demonstrated feasibility of navigation using IRL. When evaluating single- versus dual-device (i.e. guidewire versus catheter and guidewire) tracking, both methods achieved high success rates of 95% and 96%, respectively. Dual tracking, however, utilized both devices mimicking an expert. A success rate of 100% and procedure time of 22.6 s were obtained when training with a reward function obtained through ‘reward shaping’. This outperformed a dense reward function (96%, 24.9 s) and an IRL-derived reward function (48%, 59.2 s).

Conclusions

We have contributed to the advancement of autonomous endovascular intervention navigation, particularly MT, by effectively employing IRL based on demonstrator expertise. The results underscore the potential of using reward shaping to efficiently train models, offering a promising avenue for enhancing the accessibility and precision of MT procedures. We envisage that future research can extend our methodology to diverse anatomical structures to enhance generalizability.

Keywords: Inverse reinforcement learning, Mechanical thrombectomy, Machine learning, Artificial intelligence, Autonomous navigation, Endovascular intervention

Introduction

Cardiovascular (CV) diseases are the most common cause of death across Europe, accounting for more than 4 million deaths each year, with cerebrovascular disease accounting for 25.4% of CV-related mortality across all ages and genders [1]. Mechanical thrombectomy (MT) has become an established treatment for patients with acute ischaemic stroke due to large vessel occlusion, increasing the likelihood that a patient will be functionally independent after a stroke [25]. During such a procedure, an operator navigates a guidewire and guide catheter from an insertion point (typically the common femoral artery) through the common iliac artery and descending aorta to the aortic arch. Subsequently, a guide catheter is typically advanced over a guidewire through the aortic arch to the distal aspect of the cervical segment of the internal carotid artery (ICA) (with or without the help of a slip catheter). Once the guide catheter is positioned in the distal aspect of the cervical segment of the ICA, it serves as an anchor for a subsequent step. The subsequent step is often the navigation of a microguidewire and microcatheter through the distal aspect of the cervical segment of the ICA to the site of the thrombus which is typically the M1 segment of the middle cerebral artery (MCA). Once the target site has been reached, a mesh-like device called a stent retriever is pushed through the thrombus within a microcatheter, expanded to engage the thrombus, and then pulled backwards, which removes the thrombus from the blood vessel. Alternatively, an aspiration catheter—where a vacuum sucks the thrombus from the artery—is used instead of a stent retriever.

In acute ischaemic stroke, time from symptom onset to treatment is crucial, as the benefits of MT become more marked the sooner a thrombus is removed [6]. As a result, in the UK for example, only 3.1% of stroke admissions benefit from MT despite at least 10% of patients eligible for treatment [7, 8]. Other challenges for MT relate to occasional complications, including perforation, thrombosis and dissection in the parent artery, and distal embolization of thrombus [9]. Moreover, angiography requires intravascular contrast agent administration, which can occasionally lead to nephrotoxicity [10]. For operators and their teams, the high cumulative dose of x-ray radiation from angiography is a risk factor for cancer and cataracts [11]. Although exposure can be minimized with current radiation protection practice, some measures involve operators wearing heavy protective equipment, a risk factor for orthopaedic complications, so alternative exposure reduction methods are beneficial [12, 13].

It is hoped that robotic surgical systems can mitigate or eliminate some of the challenges that MT presents. For example, robotic systems could be set up in hospitals nationwide and tele-operated remotely from a central location, increasing the speed of access to treatments such as MT beyond what is currently possible [14]. Additionally, robotic systems might eliminate any operator physiological tremors or fatigue and allow MT to be performed in an optimum ergonomic position while potentially increasing procedural precision (for example, procedure time), thereby improving overall performance and reducing complication rates [15]. Furthermore, as operators would not be required to stand next to the patient, their radiation exposure would be reduced, and the need to wear heavy protective equipment would be eliminated.

Robotic systems such as the MagellanTM system (Auris Health, Redwood City, USA) and the Corpath GRX® (Corindus Vascular Robotics, USA) have been used to help alleviate some of the challenges of MT; however, they have limitations. The controller operator structure requires a reasonably high cognitive workload and can still result in human error which means that the procedure is limited to an individual operator’s skill set in terms of both MT and robot handling [16]. These robotic systems consist of user interfaces such as buttons and joysticks, requiring skills different from those used in current clinical practice [17]. Additionally, a lack of haptic feedback from robotic systems results in difficulty receiving tactile feedback from the catheters and guidewires as they interact with vessel walls [14].

One emerging method of mitigating some of these challenges is by applying artificial intelligence (AI) techniques to robotic systems. AI, and in particular, machine learning (ML), has accelerated in recent years in its applications for data analysis and learning [18], with many areas of healthcare already making use of this technology for disease prediction and diagnosis [19, 20]. By using ML in the autonomous navigation of guidewires and catheters in MT, it is plausible that in endovascular specialities facing a shortage of highly trained operators, tele-operated MT may be performed safely and effectively by a few highly trained operators based in centralized neuroscience centres. Alternatively, less experienced operators—for example, endovascular operators (e.g. interventional radiologists who are not used to performing neurointerventional procedures)—may use AI-assisted robots in hospitals that are not neuroscience centres (the majority of hospitals are not neuroscience centres). Potentially, such developments would lead to greater accessibility of MT globally [21].

Several papers have investigated the potential use of ML in the automation of catheters and guidewires for endovascular interventions, with a recent systematic review finding 80% (8/10) of studies (all published after 2018) implemented some form of reinforcement learning (RL) [21]. RL is a subset of ML where an agent learns by interacting with the environment, receiving feedback as rewards, and aims to minimize cumulative rewards over time by optimizing its actions based on the current state [22]. However, this trial-and-error approach means that RL often requires many interactions with the environment to learn the optimal policy [23]. ‘Demonstrator data’ has been utilized previously in a variety of ways: it has been used to establish control policies for RL to optimize [24], enhance training speed [25], and also function as high-priority samples in the reinforcement learning replay memory [26]. While a ‘dense reward’, characterized by frequent feedback, has been shown to provide quicker training times than a ‘sparse reward’, which offers feedback less frequently [25], there has been no previous work to use demonstrator data to determine a reward function specifically for autonomous endovascular navigation tasks. Inverse reinforcement learning (IRL) is able to do this by using an agent to infer the underlying reward function from the observed behaviour of an expert [27]. By leveraging the knowledge of experts, the number of trial and error cycles needed can potentially be reduced, leading to faster and more effective learning of complex tasks.

This study aimed to leverage expert demonstrators through IRL to train autonomous guidewire and guide catheter navigation in simulated MT environments (i.e. in silico), by navigating from the common iliac artery to the distal aspect of the cervical segment of the ICA (representing the early stages of navigation which is completed by obtaining this stable guide catheter position). A primary objective was to train and test three models, each with a different reward function, using the soft actor-critic (SAC) algorithm [28]. A secondary objective was to conduct an analysis of one- and two-device tracking. This study contributes to the advancement of autonomous MT navigation and demonstrates a novel application of IRL in the simulated MT vasculature environment, filling a significant research gap in the field [21].

Methods

In the following sections, we present the navigation task, the environment required to simulate MT, and the simulation data collection process of expert demonstrator data. Subsequently, we describe the IRL approach for devising a reward function.

Navigation task

In MT, a guidewire is typically used to navigate a guide catheter to the ICA. An ‘access catheter’—one with a distal shape to allow access into vessel branches—is placed within the guide catheter and taken ahead of the guide catheter tip during navigation. Once the access catheter is within the ICA, the guide catheter is advanced to make a stable platform. At this point, the guidewire and access catheter are retracted, and a microguidewire within a microcatheter is together passed through the stable guide catheter and navigated to the target site. The navigation task presented in this paper simulates this complete navigation of guidewire and access catheter. A target location, randomly sampled from 30 centreline points within the right internal carotid artery (RICA) or left internal carotid artery (LICA), is chosen to be navigated from an insertion point in the common iliac artery; 20 targets in each branch were selected for training, and the remaining 10 hold-out target locations were used for evaluation. For each navigation attempt, the train or test target location is changed. Figure 1a shows the simulation environment, with anatomy labelled, and the possible targets. Figure 1b shows the insertion point (standard across all runs) and an example navigation path to a target within each carotid artery.

Fig. 1.

Fig. 1

MT environment a used in simulations, with anatomy labelled and all possible targets, b with insertion point and example navigation path to target in each common carotid artery

Simulation environment

The in silico environment for the navigation task builds upon prior contributions from [17, 29], both of which use the Simulation Open Framework Architecture (SOFA), along with the modified BeamAdapter plugin, to create a comprehensive simulation-based training and evaluation environment [30, 31]. SOFA allowed replication of vascular structures derived from two computed tomography angiography (CTA) scans. The first scan encompassed the abdominal and thoracic regions, including the femoral arteries, descending aorta, and the aortic arch. The second CTA scan encompassed the aortic arch and extended to the cerebral vessels, including the carotid and vertebral arteries in the neck. These scans were subsequently processed into surface meshes, each consisting of approximately 850 vertices, which were then manually merged to create a cohesive and continuous model using Blender [32].

Integrating catheters (characterized by 160 vertices with Young’s modulus of 47 MPa) and guidewires (comprising 120 vertices with Young’s modulus of 43 MPa) within the in silico environment were facilitated by the BeamAdapter plugin developed for SOFA. This plugin was used to model the Penumbra Neuron MAX 088 guide catheter (Penumbra, California, USA) and the Terumo 0.035 guidewire (Terumo, Tokyo, Japan) to generate the initial topology and mechanical properties needed for the simulation.

The simulation assumed rigid vessel walls with an empty lumen. The simulation’s fidelity to real-world guidewire behaviour was achieved through iterative fine-tuning of parameters such as the friction between the guidewire and vessel wall and the guidewire’s stiffness, as demonstrated by [29] and [33], the latter of which has shown promising translation from in silico to an ex vivo porcine liver. A qualitative survey shows that six experienced (greater than five years clinical experience) interventional neuroradiologists (UK consultant grade; US attending equivalent), two novice (less than three years clinical experience) interventional neuroradiologists, and one interventional radiologist (UK specialist trainee grade; US fellow equivalent) agreed that the vessel anatomy was accurate, agreed the guide catheter performed realistically, and were indifferent to the realism of the guidewire [17].

Input parameters for the simulation, including guidewire rotation and translation speed, were applied at the proximal end of the catheter and guidewire. The output of the simulation was the catheter and guidewire position as a series of coordinate points directly extracted from the finite element model. The rotational speed was constrained to a maximum of 180 s-1, while the translational speed was capped at 40 mms-1. Feedback during the navigation was given as two-dimensional (x,z) tracking coordinates of the device, and no feedback about the vessel geometry was given. Experiments were performed using solely guidewire-tracking (single tracking) and combined guidewire and cathetertracking (dual tracking) methods.

Demonstrator data collection

The navigation task was performed in silico on ten random targets in each carotid artery for demonstrator data collection. Navigation of the targets from the same insertion point in Fig. 1b was performed using inputs from a keyboard. The action (device rotation and translation), device tracking data, and target location were collected at each simulation step. Data were split based on if the target branch was on the left or right side. An example navigation path of guidewire and catheter is shown in Fig. 3a.

Fig. 3.

Fig. 3

Trajectories of catheter and guidewire tip for a demonstrator data, b single device tracking (with final catheter position highlighted), c dual device tracking (with final catheter position highlighted), and d IRL

IRL algorithm

IRL was used to determine a suitable reward function from the collected demonstrator data. The maximum entropy IRL (MaxEnt IRL) algorithm was employed to learn the reward functions based on expert demonstrations, specifically for navigating through different branches of vasculature [34]. The algorithm’s objective is twofold: it seeks to replicate the observed behaviour of experts and ensures that the learned reward functions capture a distribution of behaviour with maximum entropy. To achieve this, MaxEnt IRL involves an iterative process of updating reward functions to maximize the entropy of the expected policy while maintaining consistency with observed expert behaviour. The reward functions are initialized randomly, with the same random seed across experiments, and updated using gradient ascent to minimize the entropy of the policy.

For each branch, the reward function is updated using the gradient ascent step in Eq. (1), where Ri is the reward function for branch i, Ri is the gradient vector, and η is the learning rate.

RiRi+ηRi 1

The MaxEnt IRL objective is expressed as in Eq. (2), for a policy πas, which is the probability distribution over actions given a particular state.

MaximizeEπ-aπaslogπas 2

The Boltzmann policy calculation for action a given state s is defined as Eq. (3).

πas=eRis,aaieRis,a 3

A feedforward neural network with four fully connected layers trained the model using expert trajectories, optimizing the policy through 1×106 iterations. A separate model was trained for each set of targets in each branch. At the start of a new navigation task, the correct model based on the target branch was obtained, and at each step, a reward value was returned based on the current observation given to the model.

Controller architecture, reward functions, and training

In this study, all RL models were trained using a SAC controller, adapted to facilitate two-device tracking, and accommodate various reward functions from the architecture developed in previous work by Karstensen et al. [29]. The architecture consists of a Long Short-Term Memory (LSTM) layer, which functions as an observation embedder, enabling the learning of a trajectory-dependent state representation. The subsequent feedforward layers are responsible for learning the control of the guidewire. The LSTM-based observation embedder is updated exclusively using the q1-network. In this architecture, the controller receives an observation as input, and the Gaussian policy network generates parameters, specifically mean (μ) and standard deviation (σ), of a normal distribution for the following action. During training, actions are sampled from this normal distribution, whereas for evaluation, μ is directly employed as the action, rendering the behaviour deterministic. Training was completed on a NVIDIA DGX A100 node with 8 GPUs (Santa Clara, California, USA).

The action was defined as the output of the Gaussian policy network representing the catheters’ and the guidewires’ rotation and translation speed. Three points on the instrument tip described the instrument’s position, denoted as (x,z)i=1,2,3 and (x,z)1 coinciding with the instrument tip. Additionally, the target position was specified by the current target’s (x,z)-coordinates. The observation comprised the current and previous guidewire and catheter positions, the target position, and the previous action taken from the last to the current position.

The controller training procedure performed the navigation task for 1×107 exploration steps, defined as a single control loop cycle during the exploration phase. Each navigation task was considered an episode and was deemed complete once the target was reached within a 5 mm threshold. To ensure computational efficiency, we introduced a timeout after 400 exploration steps (equivalent to approximately 53 s) without reaching the target. The control frequency was set to 7.5 Hz.

Three types of reward functions were used in this study to determine whether deriving a reward function from demonstrator data provides any benefit for autonomous endovascular navigation. Each reward function is calculated in every simulation step from the environment state based on the agent’s actions. Therefore, actions will lead to different reward quantities across reward functions and, hence, a variation in the final model.

The current state-of-the-art algorithm for autonomous navigation in endovascular interventions is devised by [29]. This algorithm uses a dense reward function, R1, which does not use IRL and is shown in Eq. (4). Here, pathlength is defined as the distance between the guidewire tip and the target along the centrelines of the arteries, with Δpathlength representing the change in pathlength at time t from the previous step at time t=-1.

R1=-0.005-0.001·Δpathlength+1.0if target reached0else 4

The second reward function R2 was derived using IRL, as seen in Eq. (5), where RRICA and RLICA are equal to Ri in Eq. (1) calculated for the right and left carotid arteries, respectively.

R2=RRICAif target in RICARLICAif target in LICA 5

The third reward function R3 was obtained through reward shaping using a combination of the dense reward function R1 and the IRL-derived reward function R2, as shown in Eq. (6). α is a scaling factor used to provide the correct ratio of R1 and R2. A scaling factor of 0.001 was used in this study.

R3=R1+αR2 6

Evaluations were conducted every 2.5×105 exploration steps for 100 episodes. Ten targets in each branch unused during training were utilized for evaluation. Each evaluation step records the success rate, procedural times, and path ratio. Comparison is performed between the highest success rate of each model, and the corresponding path ratio and procedure times at this evaluation step. The success rate is the percentage of evaluation episodes in which the controller successfully reaches the target; path ratio is a measure of the remaining distance to the target point in unsuccessful episodes, calculated by dividing the remaining distance by the initial distance; and procedure time is the time taken from the start of navigation to the target location for successful episodes. Exploration steps is the number of training steps taken to reach the point at which the results are provided. Comparative statistical analyses were conducted using two-tailed paired Student’s t-tests, with a predetermined significance threshold set at p = 0.05.

Results

Single- versus dual-device tracking

Results of an investigation into the difference in success rate, procedure times, and path ratio during training between tracking solely the guidewire and tracking both the guidewire and catheter are shown in Table 1. Figure 2 shows the success rate and path ratio during training. Single tracking and dual tracking reach similar maximum success rates of 95% and 96%, respectively (p=0.741), with single tracking having a lower procedure time of 22.5 s compared to 24.9 s for dual tracking (p=0.047). Figure 3 compares the guidewire and catheter trajectories from insertion to a target in the LICA of single and dual-device tracking. The final tip position of catheters for these two models is highlighted for comparison. For single tracking, the catheter is utilized sparingly and is not navigated out of the common iliac artery. In contrast, during dual tracking navigation, the catheter is used throughout the navigation task and is taken up the aortic arch to provide stability. Dual tracking navigation, therefore, mimics neurointerventional radiology experts, whereas single tracking navigation does not.

Table 1.

Results of device tracking training

Tracking method Success rate (%) Procedure time (s) Path ratio (%) Exploration steps
Single tracking 95 22.5 98.7 4.26×106
Dual tracking 96 24.9 98.9 4.75×106

Fig. 2.

Fig. 2

a Success rate (%), b path ratio (%) during training for single- versus dual-device tracking

IRL and reward shaping

Table 2 shows the final success rate, procedure time, and path ratio for a dense reward function alone, an IRL-derived reward function alone, and a reward function obtained through reward shaping (which combines a dense reward function and an IRL-derived reward function). All experiments are performed using dual tracking. Figure 4 shows the success rate and path ratio during training. Reward shaping provides the highest success rate of 100%, followed by 96% for the dense reward function (p=0.045). Reward shaping also has the lowest procedure time out of the examined reward functions at 22.6 s, compared to 24.9 s for the dense reward function (p=0.0001). Figure 3d shows an example navigation path for guidewire and catheter of the IRL-derived reward function at 0.76×106 exploration steps, where it can be seen that the low success rate of 48% can be attributed to catheterization of the left vertebral artery, rather than the right carotid artery.

Table 2.

Results of reward function training

Tracking method Success rate (%) Procedure time (s) Path ratio (%) Exploration steps
Dense 96 24.9 98.9 4.75×106
IRL 48 59.2 67.3 0.76×106
Reward shaping 100 22.6 100 5.51×106

Fig. 4.

Fig. 4

a Success rate, b path ratio during training for a dense reward function, IRL-derived reward function, and reward shaping function

Discussion

This study was the first to perform a navigation task that simulated the MT intervention by autonomously navigating both guidewire and catheter from an insertion point in the femoral artery to a target in the carotid artery. Furthermore, our study introduced for the first time dual device for autonomous endovascular navigation, providing information about catheters and guidewire during RL training. This study was also the first to investigate reward functions in autonomous endovascular navigation using IRL to identify a specific reward function from demonstrator data for RL training.

There have been no studies applying RL in any form to any task related to MT [21]. Our study demonstrated proof of concept of in silico autonomous navigation of instruments in MT. The implication is that if further validation studies demonstrate benefit, our approach can plausibly increase patient accessibility to MT while increasing intervention safety and speed.

Single versus dual tracking

Both models showed an increase in path ratio and success rate over training. Results showed that single and dual tracking models reach a similarly high success rate after training, but the single tracking method records quicker procedure times. However, investigation into the dual tracking models shows that the catheter is utilized from an early stage during navigation, with the movements performed similar to those of an experienced neurointerventional radiologist. The catheter has a higher stiffness than the guidewire, and hence, in the dual tracking model, the catheter is taken up the aortic arch to provide stability as the guidewire is navigated into the branching point of the common carotid arteries. For this reason, dual tracking was used in all reward function experiments, as it better replicates the actions of an experienced operator, with no trade-off for success rate.

It is noteworthy that in this experiment, a single static mesh and a relatively simple navigation task were used. It is plausible that in a more challenging navigation task, for example, where the anatomy is more tortuous, as seen in old and hypertensive patients, the dynamic use of two instruments (dual tracking with guidewire and catheter) would enable a higher success rate.

Reward function

Results showed that reward shaping using a combined learned IRL and a dense reward function gives a higher overall success rate and a lower procedure time than the state of the art [29]. The model trained using reward shaping can use demonstrator data through IRL to navigate towards the target, while the dense reward function provides an incentive for reaching and moving towards the target while promoting this in the lowest number of steps possible.

Training using only an IRL-derived reward function did not reach the same success rates as the other models, although it can be seen that the model was able to navigate towards the target. However, the low success rate comes from wrong branch catheterization, which could be caused by the IRL-derived reward function not taking into account the vessel walls and, therefore, receiving a high reward signal for a small Euclidean distance in the coordinate system, while the pathlength is still significant. For anatomical clarity, the distal vertebral artery and the ipsilateral internal carotid artery are approximately 10 mm from each other yet both arteries are in continuity with the aortic arch allowing the guidewire to move seamlessly to either location. These issues were overcome through reward shaping, as by providing a negative reward for moving away from the target via pathlength, the success rate can be increased dramatically.

A model with a high path ratio is still valid for all experiments performed within this study. Failures where the path ratio was above 80% usually meant that the correct branch was catheterized; however, the guidewire did not reach the target point. Despite this, the guidewire would be in the correct region of the vessel and be helpful for the operator and subsequent steps in MT. After all, in MT the initial aim simulated by our experiments is to navigate to this approximate area within the ICA with a standard guidewire facilitated by an access catheter. Thereafter, a guide catheter is advanced to this approximate area, and then, a microguidewire and microcatheter are used. The exact location of the initial navigation within the ICA is often less important than simply having a guidewire in the ICA that is stable enough to allow the subsequent endovascular steps.

Limitations

While the results of this investigation towards an autonomous MT navigation system fall under a technology readiness level (TRL) of 3 [35], this proof-of-concept work showed that IRL can be leveraged effectively as part of reward shaping to deduce a suitable reward function for RL training; there are limitations to the methodology employed in terms of utility. In vitro (e.g. phantom) and clinical validation steps would be required to progress the TRLs.

First, demonstrator data were obtained using a simple keypad controller; therefore, to improve the reward function’s suitability, data could be collected to accurately reflect the actions performed by an experienced neurointerventional radiologist. Furthermore, 20 demonstration trajectories were used for IRL, and a higher sample size may provide a more generalized reward function. Additionally, the reward function could be altered to give consideration to safety during the intervention. By providing a negative reward for vessel wall contact forces over a threshold, the puncturing of a vessel wall causing a perforation can be prevented.

Second, while different targets were used for training and testing, the same mesh was used throughout the experiments. Future in silico validation work should apply these algorithms to a range of patient anatomy (a dataset with multiple training and hold-out testing unique patient anatomy) to increase generalizability, improving the effectiveness during future in vitro experiments.

Third, simulations in this study were performed using tracking coordinates of the guidewire and catheter tip. An alternative to this would be to use an image-based tracking system, which may have more clinical relevance, as the autonomous system could use fluoroscopic imaging that already takes place during MT interventions. For future in vitro experiments, further work will investigate the effectiveness of the techniques demonstrated in this study to imaged-based tracking inputs and those captured by device tracking.

Fourth, recent clinical trials of robotic diagnostic cerebral angiography recommend that procedural times, success rates, fluoroscopy times, and radiation doses be compared with the traditional manual approach [36]. The measurement of fluoroscopy times and radiation doses is outside the scope of this in silico study. However, success rate, procedural times, and path ratio were recorded at each evaluation step. Future in vitro experiments could measure fluoroscopy times and radiation doses.

Our study successfully addresses the task of simulating the complete navigation of guidewire and guide catheter in MT, which takes place from the common iliac artery to either the LICA or RICA where the guide catheter is in a stable position. However, a complete MT procedure typically involves a subsequent navigation step from the LICA or RICA to the MCA. Such a subsequent navigation to the MCA requires different instrumentation (often microcatheters and microwires) beyond the scope of our investigation. Therefore, while this study provides valuable insights into a specific aspect of the MT procedure involving several steps where there is challenging anatomy, it does not represent a comprehensive examination of the entire MT navigation procedure.

Conclusion

We showed the feasibility of autonomous guidewire navigation throughout the MT vasculature using IRL with demonstrator data for the first time. We established a comprehensive simulation-based training environment and, for the first time, compared models with different reward functions for autonomous endovascular navigation by utilizing SOFA and SAC. Reward shaping emerged superior, with higher success rates and lower procedure times than the state of the art. Single- versus dual-device tracking revealed that both methods achieved high success rates, but dual tracking utilized both devices, effectively mimicking experienced neurointerventional radiologists. We have highlighted several opportunities for future research beyond our proof-of-concept experiments, in particular relating to improving performance. Another opportunity is to investigate the final stage of navigation which is typically from the distal aspect of the cervical ICA to the M1 segment of the MCA using microcatheter and microwire. Additionally, researchers may wish to apply our approach to other non-MT endovascular systems. In summary, this work offers promising avenues for improved procedural accessibility and precision of in silico autonomous endovascular navigation tasks. For MT, the work plausibly lays the foundation for potentially transformative patient care.

Author contributions

All authors contributed to the study conception and design. Material preparation was performed by Harry Robertshaw, Lennart Karstensen, and Benjamin Jackson. Data collection and analysis were performed by Harry Robertshaw. The first draft of the manuscript was written by Harry Robertshaw, and all authors commented on previous versions of the manuscript. All authors read and approved the manuscript.

Funding

Partial financial support was received from the Wellcome/EPSRC Centre for Medical Engineering [WT203148/Z/16/Z] and from the Engineering and Physical Sciences Research Council Doctoral Training Partnership (EPSRC DTP) under Grant Agreement No EP/R513064/1. For the purpose of Open Access, the Author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Declarations

Conflict of interest

The authors have no conflict of interest to declare that is relevant to the content of this article.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Townsend N, Wilson L, Bhatnagar P, Wickramasinghe K, Rayner M, Nichols M (2016) Cardiovascular disease in europe: epidemiological update 2016. Eur Heart J 37(3232–3245):11 [DOI] [PubMed] [Google Scholar]
  • 2.Goyal M, Menon BK, Zwam WHV, Dippel DW, Mitchell PJ, Demchuk AM, Dávalos A, Majoie CB, Lugt AVD, Miquel MAD, Donnan GA, Roos YB, Bonafe A, Jahan R, Diener HC, Berg LAVD, Levy EI, Berkhemer OA, Pereira VM, Rempel J, Millán M, Davis SM, Roy D, Thornton J, Román LS, Ribó M, Beumer D, Stouch B, Brown S, Campbell BC, Oostenbrugge RJV, Saver JL, Hill MD, Jovin TG (2016) Endovascular thrombectomy after large-vessel ischaemic stroke: a meta-analysis of individual patient data from five randomised trials. The Lancet 387:1723–1731 [DOI] [PubMed] [Google Scholar]
  • 3.Vidale S, Agostoni E (2017) Endovascular treatment of ischemic stroke: an updated meta-analysis of efficacy and safety. Vasc Endovasc Surg 51:215–219 [DOI] [PubMed] [Google Scholar]
  • 4.Rha JH, Saver JL (2007) The impact of recanalization on ischemic stroke outcome: a meta-analysis. Stroke 38:967–973 [DOI] [PubMed] [Google Scholar]
  • 5.Berkhemer OA, Fransen PS, Beumer D, van den Berg LA, Lingsma HF, Yoo AJ, Schonewille WJ, Vos JA, Nederkoorn PJ, Wermer MJ, van Walderveen MA, Staals J, Hofmeijer J, van Oostayen JA, Nijeholt GJL, Boiten J, Brouwer PA, Emmer BJ, de Bruijn SF, van Dijk LC, Kappelle LJ, Lo RH, van Dijk EJ, de Vries J, de Kort PL, van Rooij WJJ, van den Berg JS, van Hasselt BA, Aerden LA, Dallinga RJ, Visser MC, Bot JC, Vroomen PC, Eshghi O, Schreuder TH, Heijboer RJ, Keizer K, Tielbeek AV, den Hertog HM, Gerrits DG, van den Berg-Vos RM, Karas GB, Steyerberg EW, Flach HZ, Marquering HA, Sprengers ME, Jenniskens SF, Beenen LF, van den Berg R, Koudstaal PJ, van Zwam WH, Roos YB, van der Lugt A, van Oostenbrugge RJ, Majoie CB, Dippel DW (2015) A randomized trial of intraarterial treatment for acute ischemic stroke. N Engl J Med 372(11–20):1 [DOI] [PubMed] [Google Scholar]
  • 6.Saver JL, Goyal Lugt AVD, Menon BK, Majoie CB, Dippel DW, Campbell BC, Nogueira RG, Demchuk AM, Tomasello A, Cardona P, Devlin TG, Frei DF, Rochemont RDMD, Berkhemer OA, Jovin TG, Siddiqui AH, Zwam WHV, Davis SM, Castaño C, Sapkota BL, Fransen PS, Molina C, Oostenbrugge RJV, Ángel Chamorro, Lingsma H, Silver FL, Donnan GA, Shuaib A, Brown S, Stouch B, Mitchell PJ, Davalos A, Roos YB, Hill MD (2016) Time to treatment with endovascular thrombectomy and outcomes from ischemic stroke: a meta-analysis. JAMA J Am Med Assoc 316:1279–1288 [DOI] [PubMed] [Google Scholar]
  • 7.McMeekin P, White P, James MA, Price CI, Flynn D, Ford GA (2017) Estimating the number of UK stroke patients eligible for endovascular thrombectomy. Eur Stroke J 2:319–326 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Programme SSNA (2023) Ssnap annual report 2023 [Online]. Available: www.hqip.org.uk/national-programmes
  • 9.Hausegger KA, Schedlbauer P, Deutschmann HA, Tiesenhausen K (2001) Complications in endoluminal repair of abdominal aortic aneurysms. Eur J Radiol 39:22–33 [DOI] [PubMed] [Google Scholar]
  • 10.Rudnick MR, Goldfarb S, Wexler L, Ludbrook PA, Murphy MJ, Halpern EF, Hill JA, Winniford M, Cohen MB, VanFossen DB (1995) Nephrotoxicity of ionic and nonionic contrast media in 1196 patients: a randomized trial. Kidney Int 47:254–261 [DOI] [PubMed] [Google Scholar]
  • 11.Klein LW, Miller DL, Balter S, Laskey W, Haines D, Norbash A, Mauro MA, Goldstein JA (2009) Occupational health hazards in the interventional laboratory: time for a safer environment. Soc Interv Radiol 250:538–544 [DOI] [PubMed] [Google Scholar]
  • 12.Ho P, Cheng SW, Wu PM, Ting AC, Poon JT, Cheng CK, Mok JH, Tsang MS (2007) Ionizing radiation absorption of vascular surgeons during endovascular procedures. J Vasc Surg 46:455–459 [DOI] [PubMed] [Google Scholar]
  • 13.Madder RD, VanOosterhout S, Mulder A, Elmore M, Campbell J, Borgman A, Parker J, Wohns D (2017) Impact of robotics and a suspended lead suit on physician radiation exposure during percutaneous coronary intervention. Cardiovasc Revascularization Med 18:190–196 [DOI] [PubMed] [Google Scholar]
  • 14.Crinnion W, Jackson B, Sood A, Lynch J, Bergeles C, Liu H, Rhode K, Pereira VM, Booth TC (2022) Robotics in neurointerventional surgery: a systematic review of the literature. J Neurointerventional Surg 14:539–545 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Riga CV, Cheshire NJ, Hamady MS, Bicknell CD (2010) The role of robotic endovascular catheters in fenestrated stent grafting. J Vasc Surg 51:810–820 [DOI] [PubMed] [Google Scholar]
  • 16.Mofatteh M (2021) Neurosurgery and artificial intelligence. AIMS Neurosci 8:477–495 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Jackson B, Crinnion W, Reyzabal MDI, Robertshaw H, Bergeles C, Rhode K, Booth T (2023) Comparative verification of control methodology for robotic interventional neuroradiology procedures, International Journal of Computer Assisted Radiology and Surgery, 7. [Online]. Available: https://link.springer.com/10.1007/s11548-023-02991-2 [DOI] [PMC free article] [PubMed]
  • 18.Sarker IH (2021) Machine learning: algorithms, real-world applications and research directions. SN Comput Sci 2:5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Fatima M, Pasha M (2017) Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl 09:1–16 [Google Scholar]
  • 20.Silahtaroğlu G, Yılmaztürk N (2021) Data analysis in health and big data: a machine learning medical diagnosis model based on patients’ complaints. Commun Stat Theory Methods 50:1547–1556 [Google Scholar]
  • 21.Robertshaw H, Karstensen L, Jackson B, Sadati H, Rhode K, Ourselin S, Granados A, Booth TC (2023) Artificial intelligence in the autonomous navigation of endovascular interventions: a systematic review. Front Hum Neurosci 17:1239374 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Sutton RS, Barto AG (1998) Introduction to reinforcement learning, 1st ed. MIT Press
  • 23.Adams S, Cody T, Beling PA (2022) A survey of inverse reinforcement learning. Artif Intelli Rev 55:4307–4346 [Google Scholar]
  • 24.Chi W, Liu J, Abdelaziz MEMK, Dagnino G, Riga C, Bicknell C, Yang GZ (2018) Trajectory optimization of robot-assisted endovascularcatheterization with reinforcement learning. In: 2018 IEEE/RSJ International conference on intelligent robots and systems (IROS). IEEE, vol 8, pp 3875–3881
  • 25.Behr T, Pusch TP, Siegfarth M, Hüsener D, Mörschel T, Karstensen L (2019) Deep reinforcement learning for the navigation of neurovascular catheters. Curr Dir Biomed Eng 5:5–8 [Google Scholar]
  • 26.Kweon J, Kim K, Lee C, Kwon H, Park J, Song K, Kim YI, Park J, Back I, Roh JH, Moon Y, Choi J, Kim YH (2021) Deep reinforcement learning for guidewire navigation in coronary artery phantom. IEEE Access 9:166409–166422 [Google Scholar]
  • 27.Ng AY, Russel S (2000) Algorithms for inverse reinforcement learning. Morgan Kaufmann Publishers Inc., pp 663–670
  • 28.Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy J, Krause A (Eds.), Proceedings of the 35th international conference on machine learning, ser. Proceedings of machine learning research, vol. 80. PMLR, pp 1861–1870. [Online]. Available: https://proceedings.mlr.press/v80/haarnoja18b.html
  • 29.Karstensen L, Ritter J, Hatzl J, Ernst F, Langejürgen J, Uhl C, Mathis-Ullrich F (2023) Recurrent neural networks for generalization towards the vessel geometry in autonomous endovascular guidewire navigation in the aortic arch. Int J Comput Assist Radiol Surg 18:1735–1744 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Faure F, Duriez C, Delingette H, Allard J, Gilles B, Marchesseau S, Talbot H, Courtecuisse H, Bousquet G, Peterlik I, Cotin S (2012) SOFA, a multi-model framework for interactive physical simulation. Springer Berlin Heidelberg, 2012. [Online]. Available: 10.1007/8415_2012_125
  • 31.Duriez C, Cotin S, Lenoir J, Neumann P (2006) New approaches to catheter navigation for interventional radiology simulation. Comput Aided Surg 11:300–308 [DOI] [PubMed] [Google Scholar]
  • 32.Community BO (2018) Blender—a 3D modelling and rendering package, Blender Foundation, Stichting Blender Foundation, Amsterdam. [Online]. Available: http://www.blender.org
  • 33.Karstensen L, Ritter J, Hatzl J, Pätz T, Langejürgen J, Uhl C, Mathis-Ullrich F (2022) Learning-based autonomous vascular guidewire navigation without human demonstration in the venous system of a porcine liver. Int J Comput Assist Radiol Surg 17:2033–2040 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Ziebart BD, Maas A, Bagnell JA, Dey AK (2008) Maximum entropy inverse reinforcement learning. AAAI Press, pp 1433–1438. [Online]. Available: www.aaai.org
  • 35.Mankins J (1995) Technology readiness level—a white paper, 4. [Online]. Available: https://www.researchgate.net/publication/247705707
  • 36.Beaman A Gautam, Peterson C, Kaneko N, Ponce L, Saber H, Khatibi K, Morales J, Kimball D, Lipovac JR, Narsinh KH, Baker A, Caton MT, Smith ER, Nour M, Szeder V, Jahan R, Colby GP, Cord BJ, Cooke DL, Tateshima S, Duckwiler G, Waldau B (2023) Robotic diagnostic cerebral angiography: A multicenter experience of 113 patients. J NeuroInterventional Surg [DOI] [PubMed]

Articles from International Journal of Computer Assisted Radiology and Surgery are provided here courtesy of Springer

RESOURCES