Augmented Reality (AR) for Surgical Robotic and Autonomous Systems: State of the Art, Challenges, and Solutions

2023 Jul 6;23(13):6202. doi: 10.3390/s23136202

Authors	Model	Performance Metrics	Purpose	Accuracy	Optimization Algorithm	Equipment
Von Atzigen et al. [80]	Stereo neural networks (adapted from YOLO)	Bending parameters such as axial displacement, reorientation, bending time, and frame rate.	Markerless navigation and localization of pedicle screw heads.	67.26% to 76.51%	Perspective-n-point algorithm and random sample consensus (RANSAC), SLAM.	Head-mounted AR device (HoloLens) with C++
Doughty et al. [182]	SurgeonAssistNet, composed of EfficientNet-Lite-B0 for feature extraction and a gated recurrent unit (GRU) RNN	Parameters of the GRU cell and dense layer, model size, inference time, accuracy, precision, and recall.	Evaluating the online performance of the HoloLens during virtual augmentation of anatomical landmarks.	5.2× decrease in CPU inference time.	7.4× fewer model parameters, 10.2× fewer FLOPS, and 3× less inference time than SV-RCNet.	Optical see-through head-mounted displays
Tanzi et al. [118]	CNN-based architectures such as UNet, ResNet, and MobileNet for semantic segmentation of data	Intersection over union (IoU), Euclidean distance between points of interest, geodesic distance, number of iterations per second (it/s).	Semantic segmentation of intraoperative prostatectomy images, for 3D reconstruction of virtual models to preserve nerves of the prostate.	IoU = 0.894 (σ = 0.076) compared to 0.339 (σ = 0.195).	CNN with encoder–decoder structure for real-time image segmentation, trained on a dataset in Keras and TensorFlow.	In vivo robot-assisted radical prostatectomy using da Vinci surgical console
Brunet et al. [183]	Adapted UNet architecture for simulation of preoperative organs	Image registration frequency, latency between data acquisition, input displacements, stochastic gradients, target registration error (TRE).	Use of an artificial neural network to learn and predict mesh deformation in human anatomical boundaries.	Mean target registration error = 2.9 mm, 100× faster.	Immersed boundary methods (FEM; MJED, multiplicative Jacobian energy decomposition) for discretization of non-linear material on mesh.	RGB-D cameras
Marahrens et al. [184]	Visual deep learning algorithms such as UNet, DC-Net	For autonomous robotic ultrasound using deep-learning-based control, for better kinematic sensing and orientation of the US probe with respect to the organ surface.	Semantic segmentation of vessel scans for organ deformation analysis using a dVRK and Philips L15-7io probe.	Final model Dice score of 0.887 as compared to 0.982 in [179].	DC-Net with images in the propagation direction fed through, binary classification task, IMU-fused kinematics for trajectory comparison.	Philips L15-7io probe driven by US machine, dVRK software
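
The table reports three recurring metrics: intersection over union (IoU), Dice score, and target registration error (TRE). As a reading aid, the sketch below shows how these quantities are commonly defined, using plain Python on toy data. The function names (`iou`, `dice`, `mean_tre`) and the sample masks and points are illustrative assumptions; they are not taken from any of the cited studies, whose evaluation pipelines may differ in detail.

```python
import math

def iou(pred, truth):
    """Intersection over union of two flattened binary masks of equal length."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0

def dice(pred, truth):
    """Dice score: twice the overlap divided by the total foreground count."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 2 * inter / total if total else 1.0

def mean_tre(registered, reference):
    """Mean Euclidean distance between corresponding target points (same units as input)."""
    dists = [math.dist(a, b) for a, b in zip(registered, reference)]
    return sum(dists) / len(dists)

# Toy data (hypothetical): 6-pixel masks and two 3D target points in mm.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 1, 1, 0]
registered_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference_pts = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]

print("IoU :", iou(pred_mask, true_mask))
print("Dice:", dice(pred_mask, true_mask))
print("TRE :", mean_tre(registered_pts, reference_pts), "mm")
```

For a single pair of masks, the two overlap metrics are related by Dice = 2·IoU / (1 + IoU), so a Dice score is always at least as large as the IoU for the same segmentation. This is worth keeping in mind when comparing the IoU and Dice values reported in different rows.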