Hum Factors. Author manuscript; available in PMC: 2024 Nov 13.
Published in final edited form as: Hum Factors. 2022 Nov 11;66(4):1081–1102. doi: 10.1177/00187208221129940

An Adaptive Human-Robotic Interaction Architecture for Augmenting Surgery Performance Using Real-Time Workload Sensing—Demonstration of a Semi-autonomous Suction Tool

Jing Yang 1, Juan Antonio Barragan 2, Jason Michael Farrow 3, Chandru P Sundaram 4, Juan P Wachs 5, Denny Yu 6
PMCID: PMC11558698  NIHMSID: NIHMS2033676  PMID: 36367971

Abstract

Objective:

This study developed and evaluated a mental workload-based adaptive automation (MWL-AA) system that monitors surgeons’ cognitive load and assists them during cognitively demanding tasks in robotic-assisted surgery (RAS).

Background:

The introduction of RAS can overwhelm operators. Precise, continuous assessment of operators’ mental workload (MWL) is needed to identify when interventions should be delivered to moderate their MWL.

Method:

The MWL-AA presented in this study was a semi-autonomous suction tool. The first experiment recruited ten participants to perform surgical tasks under different MWL levels. Their physiological responses were captured and used to develop a real-time multi-sensing model for MWL detection. The second experiment evaluated the effectiveness of the MWL-AA: nine surgical trainees performed the surgical task with and without the MWL-AA (i.e., with MWL-triggered versus periodic suction). Mixed effect models were used to compare task performance and objectively and subjectively measured MWL.

Results:

The proposed system predicted high MWL hemorrhage conditions with an accuracy of 77.9%. For the MWL-AA evaluation, the surgeons’ gaze behaviors and brain activities suggested lower perceived MWL with MWL-AA than without. This was further supported by lower self-reported MWL and better task performance in the task condition with MWL-AA.

Conclusion:

An MWL-AA system can reduce surgeons’ workload and improve performance in a high-stress hemorrhaging scenario. These findings highlight the potential of MWL-AA to enhance collaboration between autonomous systems and surgeons. Developing a robust and personalized MWL-AA is a first step toward additional use cases in future studies.

Application:

The proposed framework can be expanded and applied to more complex environments to improve human-robot collaboration.

Keywords: physiological measurement, mental workload, robotic and telesurgery, artificial intelligence, adaptive automation

INTRODUCTION

Human-robot collaboration is a promising paradigm in many fields because it can exploit both human flexibility and robot precision (Reason, 2000). Even with highly sophisticated technologies, robotic systems are still primarily operated by humans with varying degrees of intervention and control (Power et al., 2015). However, teleoperated control, which requires surgeons to manipulate the robotic arms remotely, may introduce problems such as ambiguity and a lack of motion feedback (Chen et al., 2007), resulting in excessive mental workload (MWL) that can compromise surgeon performance. Because extreme MWL degrades performance and increases error probability (Yurko et al., 2010), operator workload has become a central concern for successful human-robot collaboration. Consequently, there has been growing interest in robots that provide operators with varying levels of assistance based on their MWL during task execution, i.e., mental workload-based adaptive automation (MWL-AA).

In MWL-AA, robot behavior is adapted to the current state of the operator to mitigate performance degradation. Previous studies have documented three triggering strategies for MWL-AA (Aricò et al., 2016; de Visser et al., 2008): critical task event triggers, performance triggers, and physiological triggers. Critical task event triggers are tied to the occurrence of specific events during the task: the automation activates to accommodate the additional demands when such events occur, regardless of whether the user needs assistance at that time (Aricò et al., 2016). Such triggers are therefore inherently insensitive to differences in task environment and user experience. For example, novices may struggle with tasks that are not considered critical events simply because they lack skill; in this scenario, critical task event triggers may fail to capture the demanding situations that novices face. Performance triggers use performance on a secondary task to determine whether MWL is excessive. However, performance measures are insensitive to momentary changes in user state, and performance decrements in complex tasks may not become apparent immediately (Hasselberg & Söffker, 2013).

By contrast, physiological triggers, which use operators’ neurophysiological signals to assess their MWL, can be obtained continuously and with higher temporal resolution, capturing momentary changes in user state (Wilson & Russell, 2003). Furthermore, physiological measurements can assess MWL directly without interfering with ongoing tasks (Hancock & Matthews, 2019). Multiple studies have shown a relationship between physiological measures and mental workload (Debie et al., 2021). For instance, increases in frontal theta and beta band power of the electroencephalogram (EEG) may reflect the increased control required to carry out a task (Fernandez Rojas et al., 2020; Rieger et al., 2015), while alpha band activity is suppressed as task difficulty increases (Sterman & Mann, 1995). Previous studies have also shown that eye movement patterns measured by eye-trackers, such as pupil diameter, scan path duration, and fixation duration, are strongly correlated with changes in workload (Imants & de Greef, 2014).

Physiological-based MWL-AA has been developed for several human-robot collaboration tasks (Aricò et al., 2016; Kohlmorgen et al., 2007; Prinzel et al., 2000; Teo et al., 2020). For instance, Teo et al. (2018) demonstrated that an EEG-based MWL-AA that took over one subtask during soldier-robot collaboration training could improve task performance. In healthcare, MWL-AA has been used in rehabilitation (Knaepen et al., 2015) and service robots (Vänn et al., 2019). However, robotic-assisted surgery (RAS), one of the fastest-growing fields in the medical sector (Duchene et al., 2011), has not explored MWL-AA despite the high MWL reported in surgery (Kranzfelder et al., 2013). An MWL-AA that can identify when cognitive overload occurs and initiate different levels of assistance could potentially reduce surgeons’ MWL and improve task performance.

However, two critical gaps remain in developing a deployable MWL-AA for RAS. First, although the relationship between MWL and physiological metrics in robotic surgical tasks has been explored in several studies (Wu et al., 2020; T. Zhou et al., 2020), the use of physiological sensors for MWL assessment has so far been limited to offline analysis, which is not viable for real-time MWL-AA. Second, effective intervention strategies for addressing high MWL in RAS are lacking. While existing studies have proposed several intervention strategies, these are not always practical in RAS. For example, pausing certain subtasks or adjusting the current task’s difficulty (Yuksel et al., 2016) is not feasible and may endanger patient safety. In addition, alerts and feedback may disrupt surgeons’ workflow and increase their MWL (Weber et al., 2018).

To this end, we developed an MWL-AA that monitors surgeons’ MWL and initiates assistance whenever high MWL is detected by our physiological sensors (Experiment 1). The effectiveness of the MWL-AA was explored through a comparative user study (Experiment 2).

EXPERIMENT

In the present study, an MWL-AA blood suction tool was used to demonstrate the usefulness of the proposed MWL-AA for RAS. We chose a semi-autonomous suction tool for two reasons. First, surgeon MWL and performance can be adversely affected during hemorrhage events: as the operating field fills with blood, tissues become obscured (Kirchner et al., 2016), so blood suction tools are essential for giving surgeons clear visibility. Second, in the emerging field of RAS, the autonomy of surgical robots has been classified into six levels, from no autonomy (level 0) to full autonomy (level 5) (Haidegger, 2019). Task-level autonomy (i.e., level 2 autonomy), in which a particular task or subtask is automated without human intervention, has recently been proposed for tasks such as suturing, grasping, blunt dissection (Nagy & Haidegger, 2019), and cutting (Attanasio et al., 2020; Zhou et al., 2021). In contrast to these tasks, bleeding during surgery can be unpredictable, and bedside assistants are usually needed to clear the field (Giedelman et al., 2013). Hence, a semi-autonomous suction tool was used in this study to illustrate the concept of MWL-AA.

To this end, two experiments were conducted. The first experiment proposed a multimodal model for detecting MWL in real time during RAS. The second experiment evaluated the potential impact of the proposed MWL-AA (a semi-autonomous suction tool triggered by user MWL) on task performance and perceived MWL.

EXPERIMENT 1

In this experiment, participants performed a task at two difficulty levels (i.e., a high MWL hemorrhaging condition and a low MWL non-hemorrhaging condition), and physiological signals were recorded throughout the experiment. We hypothesized that the designed tasks would successfully elicit different levels of MWL. Additionally, we compared how well individual and multimodal physiological sensors predicted high MWL.

Method

Participant and apparatus.

This study was approved by the university’s Institutional Review Board (IRB). We recruited ten participants (two females and eight males) from the university population, with a mean age of 23 ± 3 years. All were novices. The da Vinci Research Kit (dVRK; Intuitive Surgical, Inc., Sunnyvale, CA) (Kazanzides et al., 2014) was used to simulate surgical training tasks (Figure 1). Participants’ brain activity was recorded at 250 Hz using a g.Nautilus 32-channel EEG system (g.tec, Austria) (Figure 2, blue box), and their eye movements were recorded at 60 Hz using Tobii Pro Glasses 2.0 (Tobii Technology, Danderyd, Sweden) (Figure 2, red box).

Figure 1.

The dVRK, consisting of a patient-side console (left) and the surgeon’s console (right).

Figure 2.

g.Nautilus EEG (blue box) and Tobii Pro Glasses 2.0 (red box).

Experimental tasks.

A novel bleeding task phantom was created. Two surgical task conditions were designed to simulate the conditions of low MWL and high MWL.

In the low MWL condition (i.e., non-hemorrhage condition), participants completed a needle passing task, a fundamental surgical skill adapted from the RAS training programs (Malpani et al., 2020). Participants passed a suture needle repeatedly through three openings (from the leftmost to the rightmost) on a simulated vessel (0.5 cm plastic tube) for 3 minutes (Figure 3, left).

Figure 3.

Low MWL condition (left): passing the surgical needle through pre-marked openings. High MWL condition (right): simulated blood (water with red dye) added to the cavity.

In the high MWL condition (i.e., hemorrhage condition), participants completed the needle passing task with blood filling the cavity (Figure 3, right). They also simultaneously performed an auditory oddball task, which simulated the real-world situation of a surgeon handling multiple tasks at once. Specifically, the secondary task consisted of high-pitched (2000 Hz) and low-pitched (220 Hz) tones presented in random order at one-second intervals. Participants were asked to respond as quickly as possible to the low-pitched tone by pressing the dVRK foot pedal with their left foot.
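The secondary task can be sketched as a tone-sequence generator plus a scorer for foot-pedal presses. This is an illustrative sketch, not the study’s implementation: the target probability `p_target`, the one-press-per-target rule, and the function names are hypothetical; only the tone frequencies and the one-second interval come from the text.

```python
import random
from statistics import mean

HIGH_TONE_HZ = 2000  # frequent, non-target tone
LOW_TONE_HZ = 220    # target tone: respond with the left foot pedal


def make_oddball_sequence(n_tones, p_target=0.2, seed=None):
    """Return one tone frequency per 1-s interval, in random order.

    p_target (probability that a tone is the low-pitched target) is a
    hypothetical value; the study does not report the target ratio.
    """
    rng = random.Random(seed)
    return [LOW_TONE_HZ if rng.random() < p_target else HIGH_TONE_HZ
            for _ in range(n_tones)]


def score_responses(sequence, pedal_times, interval_s=1.0):
    """Score foot-pedal presses against the tone sequence.

    Tone k starts at k * interval_s seconds. A press inside a target's
    interval is a hit (only the first press per target counts); a press
    inside a non-target interval is a false alarm.
    Returns (hit_rate, mean_response_time_s, false_alarms).
    """
    responded, rts, false_alarms = set(), [], 0
    for t in sorted(pedal_times):
        k = int(t // interval_s)
        if not 0 <= k < len(sequence):
            continue  # press outside the recorded sequence
        if sequence[k] == LOW_TONE_HZ:
            if k not in responded:
                responded.add(k)
                rts.append(t - k * interval_s)
        else:
            false_alarms += 1
    n_targets = sequence.count(LOW_TONE_HZ)
    hit_rate = len(responded) / n_targets if n_targets else float("nan")
    mean_rt = mean(rts) if rts else float("nan")
    return hit_rate, mean_rt, false_alarms
```

For example, with the sequence `[2000, 220, 2000, 220]` and presses at 1.4 s, 2.5 s, and 3.3 s, both targets are hit (response times of about 0.4 s and 0.3 s) and the press at 2.5 s counts as one false alarm.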

Experiment procedure.

Instructions and a video demonstration of the experimental task were given after participants signed the consent form. EEG electrodes were placed according to the 10–20 system, and the eye-tracker was calibrated following Tobii’s manual (Tobii Manual, 2020). Participants then practiced the tasks on the dVRK. There were ten trials, five in each condition, each lasting 3 minutes, with a one-minute break between trials. Task order was counterbalanced across participants. The experiment lasted approximately 90 minutes.

Data processing.

We split the recorded physiological signals using a sliding window of 15 seconds.
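The windowing step can be sketched as follows. The 15-second window length comes from the text; the 1-second step is an assumption (the paper does not state the step for Experiment 1, but Experiment 2 produces one prediction per second), and `sliding_windows` is a hypothetical helper name.

```python
def sliding_windows(samples, fs, window_s=15.0, step_s=1.0):
    """Yield (start_time_s, window) pairs over a 1-D signal.

    window_s = 15 s follows the paper; step_s = 1 s is an assumption
    (chosen to match the once-per-second predictions in Experiment 2).
    Incomplete trailing windows are dropped.
    """
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must span at least one sample")
    for start in range(0, len(samples) - win + 1, step):
        yield start / fs, samples[start:start + win]
```

At the EEG rate of 250 Hz each window holds 3,750 samples; at the eye-tracker rate of 60 Hz it holds 900 samples, so the same function can segment both streams.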

EEG data were preprocessed according to a published preprocessing pipeline (Bigdely-Shamlo et al., 2015). The raw EEG data were imported into EEGLAB in MATLAB and band-pass filtered between 0.5 and 30 Hz to remove noise, including noise caused by large movements. Signals were then re-referenced to the average of all electrodes (Paszkiel, 2020). Line noise was removed with the CleanLine algorithm (Bigdely-Shamlo et al., 2015). Artifacts caused by eye blinks, head movements, and generic discontinuities were separated from the signals using independent component analysis and removed automatically with the ADJUST plugin (Delorme & Makeig, 2004). The cleaned data were used to compute spectral band powers used in previous studies: theta (4–8 Hz), alpha (8–13 Hz), and beta (13–30 Hz). Because of equipment limitations, the PO7 and PO8 channels were removed, and band power was averaged across the remaining channels.
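The band-power features can be sketched with a plain periodogram: sum the spectral power in each band for every channel, drop PO7 and PO8, and average across the remaining channels. This is a dependency-free illustration under assumptions: the naive DFT, the uncalibrated power scaling, and the function names are ours; in practice a Welch estimate (e.g., `scipy.signal.welch`) on EEGLAB-cleaned data would be the more typical choice.

```python
import cmath
import math

# Band edges (Hz) from the text; each band is treated as half-open [lo, hi).
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_power(x, fs, lo, hi):
    """Periodogram power summed over DFT bins with lo <= f < hi (one channel).

    Uses a naive DFT for illustration only; absolute scaling is not
    calibrated, which is acceptable when comparing conditions.
    """
    n = len(x)
    m = sum(x) / n
    xs = [v - m for v in x]  # remove the DC offset
    k_lo = math.ceil(lo * n / fs)
    k_hi = math.ceil(hi * n / fs)  # exclusive upper bin
    total = 0.0
    for k in range(k_lo, k_hi):
        s = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(xs))
        total += abs(s) ** 2 / (fs * n)
    return total


def averaged_band_powers(channels, fs, exclude=("PO7", "PO8")):
    """channels: dict of channel name -> samples for one window.

    Returns {band: power averaged over the channels that were kept}.
    """
    kept = {name: x for name, x in channels.items() if name not in exclude}
    if not kept:
        raise ValueError("no channels left after exclusion")
    return {
        band: sum(band_power(x, fs, lo, hi) for x in kept.values()) / len(kept)
        for band, (lo, hi) in BANDS.items()
    }
```

Averaging across channels, rather than keeping per-electrode values, yields three EEG features per window, which is also what keeps the real-time version of the pipeline cheap.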

For the eye-tracker data, fixations were identified as sequences of gaze points that remained relatively close together for longer than 85 milliseconds (Hafed & Krauzlis, 2012). Fixation locations were expressed in normalized scene-video coordinates, where (0,0) and (1,1) corresponded to the top-left and bottom-right corners of the scene camera video, respectively. Fixation duration and count were then calculated. The scan path length was computed using Equation (1), where d(·,·) denotes Euclidean distance, fixation_i is the ith fixation, and n is the total number of identified fixations. Scan path length is expressed in normalized video units.

$$\text{Scan path length} = \sum_{i=2}^{n} d\left(\text{fixation}_{i}, \text{fixation}_{i-1}\right) \qquad (1)$$

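The nearest neighbor index (NNI) (Fidopiastis, 2009) was calculated using Equation (2), where d(NN) is the observed mean nearest-neighbor distance, d(ran) is the mean nearest-neighbor distance expected for randomly distributed fixations, min(d_ij) is the distance between fixation i and the fixation j nearest to it, N is the number of fixations, and A is the area of the region.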

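$$\mathrm{NNI} = \frac{d(\mathrm{NN})}{d(\mathrm{ran})} = \frac{\sum_{i=1}^{N} \min_{j \neq i}\left(d_{ij}\right) / N}{0.5\sqrt{A/N}} \qquad (2)$$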
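Equations (1) and (2) translate directly into code once fixations are available as (x, y) points. A minimal sketch, assuming fixations are already detected and that the region area A is the whole normalized video frame (A = 1); the function names are hypothetical.

```python
import math


def scan_path_length(fixations):
    """Eq. (1): summed Euclidean distance between consecutive fixations.

    fixations: list of (x, y) in normalized scene-video coordinates.
    """
    return sum(math.dist(cur, prev) for prev, cur in zip(fixations, fixations[1:]))


def nearest_neighbor_index(fixations, area=1.0):
    """Eq. (2): observed mean nearest-neighbor distance divided by the
    mean distance expected for randomly scattered points, 0.5 * sqrt(A / N).

    area=1.0 assumes the region is the whole normalized video frame.
    """
    n = len(fixations)
    if n < 2:
        raise ValueError("NNI needs at least two fixations")
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(fixations) if j != i)
        for i, p in enumerate(fixations)
    ]
    observed = sum(nearest) / n
    expected = 0.5 * math.sqrt(area / n)
    return observed / expected
```

For four fixations at the corners of a 0.5 × 0.5 square visited in order, the scan path length is 1.5 and every nearest-neighbor distance is 0.5, giving NNI = 0.5 / (0.5 × √(1/4)) = 2.0.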

Statistical analysis.

Repeated measures ANOVA was used to compare physiological metrics between the two task conditions. Shapiro-Wilk and Levene’s tests were used to verify normality and equal variance, respectively. Theta and beta power were log-transformed to meet the normality assumption. The repeated measures ANOVA was performed using the car package (Hayden, 2012) in R 4.2.1. In addition, the effect size η², which measures the proportion of variance associated with each main and interaction effect in the model, was computed for each physiological metric. A larger effect size reflects a stronger association between the physiological metric and MWL.
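The core computation behind a one-way repeated measures ANOVA with η² can be sketched as below. This is an illustrative stand-in for the R car-package analysis, assuming a complete subjects × conditions table; it reports both classic η² (effect sum of squares over total) and partial η², because the text does not specify which variant was used. The helper names are hypothetical.

```python
import math
from statistics import mean


def log_transform(data):
    """Natural-log transform, as applied to theta and beta power."""
    return [[math.log(v) for v in row] for row in data]


def rm_anova_one_way(data):
    """One-way repeated measures ANOVA.

    data[s][c] is the value for subject s in condition c.
    Returns F, degrees of freedom, eta squared, and partial eta squared.
    """
    n, k = len(data), len(data[0])  # subjects, conditions
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete subjects x conditions table")
    grand = mean(v for row in data for v in row)
    cond_means = [mean(row[c] for row in data) for c in range(k)]
    subj_means = [mean(row) for row in data]
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # condition x subject residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    return {
        "F": (ss_cond / df_cond) / (ss_error / df_error),
        "df": (df_cond, df_error),
        "eta_sq": ss_cond / ss_total,
        "partial_eta_sq": ss_cond / (ss_cond + ss_error),
    }
```

Removing the between-subject sum of squares from the error term is what distinguishes this design from a between-groups ANOVA: each participant serves as their own control across the two conditions.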

Machine learning model development.

The extracted physiological features were labeled by task condition and then used to build machine learning models with scikit-learn (Garreta, 2013) and Keras (Ketkar, 2017) in Python 3.9. Models included a neural network (NN) and four machine learning classifiers: logistic regression, support vector machine, linear discriminant analysis, and XGBoost. The optimal hyperparameters of each classifier were determined by grid search (Pontes et al., 2016).

The NN combined fully connected layers with ReLU activation functions and dropout layers. It was trained as follows: 1) the Adam optimizer was used with a binary cross-entropy loss; 2) training ran for 100 epochs with a batch size of 15; 3) early stopping was used to minimize overfitting; and 4) the initial learning rate and decay rate were set to 0.04 and 10-e4, respectively.
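Two of the training settings above, learning-rate decay and early stopping, can be made concrete with small helpers. This is a sketch under assumptions: we assume the decay rate was applied with the time-based rule of legacy Keras optimizers (lr = lr0 / (1 + decay × iteration)), and the early-stopping `patience` is a hypothetical value because the text does not report it.

```python
def time_based_decay(initial_lr, decay, iteration):
    """Learning rate after `iteration` optimizer updates:
    lr = initial_lr / (1 + decay * iteration).

    This is the rule used by the `decay` argument of legacy Keras
    optimizers; that the study applied decay this way is an assumption.
    """
    return initial_lr / (1.0 + decay * iteration)


def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for patience-based early stopping.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. patience=5 is a
    hypothetical value; the study does not report it.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

Under this rule, with an initial rate of 0.04 and a decay of 1e-4, the learning rate halves to 0.02 after 10,000 updates.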

To test model robustness, leave-one-subject-out cross-validation was used: the classifiers were trained on all subjects except one and validated on the held-out subject, and this process was repeated so that each subject served once as the validation set. Final model performance was computed as the average classification performance across folds.
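Leave-one-subject-out cross-validation can be sketched independently of any particular classifier: split by subject, train on the rest, score the held-out subject, and report the mean ± SD across folds (the format used in Tables 1 and 2). The `fit`/`predict` callables are hypothetical placeholders; scikit-learn’s `LeaveOneGroupOut` provides the same split.

```python
from statistics import mean, stdev


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices)."""
    for s in sorted(set(subject_ids)):
        train = [i for i, sid in enumerate(subject_ids) if sid != s]
        test = [i for i, sid in enumerate(subject_ids) if sid == s]
        yield s, train, test


def loso_accuracy(features, labels, subject_ids, fit, predict):
    """fit(X, y) -> model; predict(model, X) -> predicted labels.

    Returns (mean, sd) of per-subject accuracy; needs at least two subjects.
    """
    accs = []
    for _, train, test in leave_one_subject_out(subject_ids):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        preds = predict(model, [features[i] for i in test])
        accs.append(sum(p == labels[i] for p, i in zip(preds, test)) / len(test))
    return mean(accs), stdev(accs)
```

Splitting by subject rather than by window matters here: overlapping windows from the same participant are highly correlated, so a random window-level split would leak subject-specific patterns into the validation set and overstate accuracy.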

RESULT

MWL comparison between the high and low MWL conditions.

For the eye-tracker metrics, scan path length (F (1, 22) = 7.04; p = 0.03) and NNI (F (1, 22) = 18.82; p < 0.001) were 0.58 and 0.04 higher, respectively, in the hemorrhage condition than in the non-hemorrhage condition (Figure 4). NNI had the largest effect size, suggesting that it was more strongly associated with task condition than the other physiological metrics. EEG theta (F (1, 22) = 10.25; p = 0.04) and beta (F (1, 22) = 16.45; p = 0.03) band powers were 4.69 and 4.24 higher, respectively, in the hemorrhage condition than in the non-hemorrhage condition (Figure 4). No significant differences were observed in alpha power, fixation count, or fixation duration between the two task conditions. Detailed statistical results are summarized in Appendix A.

Figure 4.

Comparison of repeated measures ANOVAs between the two tasks. “*”: significant; “n.s.”: not significant; NNI: nearest neighbor index; η2: 0.01 small effect size, 0.06 medium effect size, 0.14 or higher large effect size.

MWL model accuracy.

Compared with traditional machine learning models (Table 1), the NN achieved the best performance across all evaluation metrics. The EEG and eye-tracking fusion produced the best classification results compared to the single sensor model (Table 2). The confusion matrix of the NN is shown in Appendix B.

TABLE 1:

Model Performance for Each Classifier

Classifiers Accuracy Sensitivity Precision
Linear discriminant analysis classifier 57.3% ± 3.1% 61.2% ± 8.7% 57.4% ± 6.3%
K nearest neighbors classifier 61.3% ± 12.5% 59.4% ± 14.3% 61.7% ± 16.7%
Decision tree classifier 65.4% ± 15.5% 63.7% ± 21.3% 59.1% ± 16.4%
Support vector machine 54.2% ± 19.9% 56.5% ± 13.8% 55.3% ± 18.2%
Neural network model 77.9% ± 5.9% 82.4% ± 8.9% 75.7% ± 6.5%

TABLE 2:

Neural Network Model Performance Comparison by Modality

Modality Accuracy Sensitivity Precision
Multi-sensor 77.9% ± 5.9% 82.4% ± 8.9% 75.7% ± 6.5%
EEG only 63.4% ± 6.2% 64.5% ± 10.5% 62.3% ± 13.4%
Eye-tracker only 70.1% ± 11.5% 79.4% ± 7.8% 62.8% ± 12.6%

DISCUSSION

The physiological differences between the two task conditions suggest that the hemorrhage condition evoked high MWL as designed. In the hemorrhage condition, participants showed longer scan paths (i.e., dispersed fixations) and higher NNI (i.e., random fixations), both of which have been associated with higher MWL during visual exploration (Maggi & di Nocera, 2021). The significant changes in these metrics may be caused by differences in surgical field visibility between the two conditions. Participants performed a more extensive visual search in the hemorrhage condition than in the non-hemorrhage condition to locate the openings on the vessel occluded by blood. In other words, participants did not follow an organized scanning strategy when hemorrhage occurred but instead searched for the marked openings randomly. This extensive searching required participants to encode more information during the task, resulting in higher MWL.

The significant increases in theta and beta activity also provide evidence of the high MWL evoked in the hemorrhage condition. Theta activity has been shown to play an essential role in working memory functions, including encoding and retaining information (Gevins et al., 2016). In addition, beta activity has been associated with task complexity manipulations, visual attention (Mapelli & Özkurt, 2019), and increased working memory demands (Spitzer & Haegens, 2017). The increased theta and beta power in the hemorrhage condition suggests that more cognitive resources were needed to accomplish the task.

Previous studies have shown that alpha power, fixation duration, and fixation count are sensitive to changes in MWL (Matthews et al., 2014; Puma et al., 2018), but no differences were detected in this study. Individual differences in visual behavior may explain the nonsignificant results for fixation count and duration. For example, some participants may have focused their gaze on the instrument tip, while others may have concentrated on the target holes. Needle drops may also have interrupted the task, diverting participants’ eyes from the intended area and causing them to change their visual search strategies.

The observed physiological differences may also be partially explained by the secondary task in the high MWL condition. The secondary task served two purposes in this study. First, surgery is a multitasking environment: when hemorrhage events occur, the lead surgeon must attend to warnings from the surgical monitor and case-related communication from the surgical team, and the secondary task was introduced to replicate such scenarios. Second, according to self-regulation theory (Hancock & Matthews, 2019), humans can monitor task demands and, to some degree, modify their actions to meet demanding task goals. For example, experts may manage hemorrhaging events by altering their strategies through self-regulation without excessively increasing MWL. Studies have pointed out that aids should be provided when self-regulation fails; however, additional external aids may interfere with the self-regulation process and adversely affect performance. Therefore, the secondary task was designed to evoke a sufficiently challenging situation to avoid conflict between external aids and self-regulatory mechanisms.

Machine learning models were constructed to discover the patterns associated with different levels of MWL. The NN performed better than the more conventional models in this study, consistent with other studies that assess MWL with physiological sensors. NNs can implicitly detect complex nonlinear relationships between input and target variables and can cope with noise in input data (Hussein et al., 2019). This is desirable in MWL-AA because physiological data are subject to noise and to variable interactions arising from individual differences.

Finally, this study compared the sensors’ ability to recognize MWL. As expected, combining sensors improved MWL prediction, which can be attributed to the unique information each sensor provides. Specifically, participants must use visual cues to locate target areas and physically manipulate controls to reach the targets. Under such circumstances, an EEG sensor alone may not provide sufficient information about participants’ visual behavior, as indicated by the relatively poor classification results of the EEG-only model. In contrast, eye-trackers provide direct information about operators’ gaze patterns and attentional control and therefore add information for identifying different levels of MWL. Combining the two methods can thus provide a more comprehensive assessment of participants’ MWL during RAS.

EXPERIMENT 2

In Experiment 2, a semi-autonomous suction tool that activates when high MWL is detected was developed using the multi-sensor model from Experiment 1. The effectiveness of the MWL-AA on surgical task performance was assessed in a comparative study with surgical trainees.

Method

Real-time MWL-AA System Algorithms.

EEG and eye-tracking data were synchronized using Lab Streaming Layer (Figure 5). A 15-second sliding window was used to extract features from the streaming data. For each window, instead of using the band power of each electrode, the band power averaged across all available EEG channels (alpha, theta, and beta) was computed to reduce computation time. Eye-tracker metrics were also computed for each window. The model took the combined feature vector as input and produced an MWL index every second; the higher the value, the greater the probability that the physiological input represented high MWL. The model output drove the activation of the MWL-AA: when the surgeon’s workload exceeded a threshold (described below), the suction tool activated and kept running until the surgical field, as observed through the console video, was clear of blood.
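The per-second decision logic can be sketched as feature fusion followed by a simple latching controller: activate once the MWL index exceeds the threshold, then keep suction on until the field is reported clear. Everything here is illustrative: the feature order, the `predict` callable standing in for the trained NN, the `field_clear` flag standing in for the console-video check, and all names are assumptions rather than the authors’ code.

```python
def fuse_features(eeg_bands, eye_metrics):
    """Concatenate channel-averaged EEG band powers and eye-tracking metrics
    into one vector. The order is an assumption; it only has to match
    between training and inference."""
    eeg = [eeg_bands[b] for b in ("theta", "alpha", "beta")]
    eye = [eye_metrics[m] for m in ("scan_path_length", "nni",
                                    "fixation_count", "fixation_duration")]
    return eeg + eye


class SuctionController:
    """Hypothetical controller: turn suction on when the MWL index exceeds
    the threshold; keep it on until the surgical field is clear."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.active = False

    def step(self, mwl_index, field_clear):
        """Call once per second with the model output (0-1) and a flag
        saying whether the field is currently clear of blood."""
        if self.active:
            if field_clear:
                self.active = False
        elif mwl_index > self.threshold:
            self.active = True
        return self.active


def run_session(feature_vectors, field_clear_flags, predict, threshold):
    """Apply the controller to one feature vector per second."""
    ctrl = SuctionController(threshold)
    return [ctrl.step(predict(x), clear)
            for x, clear in zip(feature_vectors, field_clear_flags)]
```

Latching the activation, instead of re-evaluating the MWL index every second, mirrors the rule in the text that suction keeps running until the field is clear, so a momentary drop in the index does not stop suction halfway through clearing blood.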

Figure 5.

Diagram showing the synchronization of EEG and eye-tracking signals for adaptive automation systems. Physiological metrics for each epoch were extracted and combined into a single vector. Features were then input into a neural network model for MWL assessment. Predictions were made every second; if the workload was high, the semi-autonomous suction tool would be activated.

Participant and apparatus.

After IRB approval, ten residents (all male; mean age 34 years) enrolled in the Urology robotic skills training program were recruited from the medical school. Participants’ RAS experience ranged from one to three years. The experiment was performed on the da Vinci Si Surgical System (dVSS Si; Intuitive Surgical, Sunnyvale, CA), which is similar to the system used in Experiment 1, except that the dVSS is used for clinical procedures while the dVRK is an older system intended for non-medical use. Physiological signals were measured with the sensors described in Experiment 1.

Experimental task.

As in Experiment 1, participants passed needles while completing an auditory oddball task. Simulated blood was slowly pumped into the cavity every 10 seconds. Hemorrhages started at a random time within 1 minute after each suction so that participants could not anticipate them.

Experimental design.

Each participant performed the task under two conditions: 1) an adaptive automation condition, in which the MWL-AA was used, and 2) a periodic automation condition, in which suction was initiated periodically without regard to the user’s MWL. Specifically, suction was activated every 150 seconds in the periodic automation condition. This interval was estimated from the average number of suctions made by the suction tool when participants in Experiment 1 performed the same task with the MWL-AA.

Experimental procedure.

After participants completed the consent form and training session, they were fitted with the sensors described in Experiment 1. A calibration session was then conducted to determine the MWL threshold for activating the semi-autonomous suction tool. Specifically, each participant completed the needle passing task under the hemorrhage and non-hemorrhage conditions (2 minutes each). Personalized cutoff points were determined using Equation (3), where WI is the set of continuous MWL indices produced during each calibration task. Because a sigmoid activation function was used to classify MWL level, the model output (i.e., WI) ranged from zero to one and represented the probability that the current physiological responses belonged to the high MWL class. Hence, a higher WI implied that the present physiological responses were closer to high MWL.

$$\text{Personalized threshold} = \frac{\max\left(WI_{\text{hemorrhage}}\right) + \min\left(WI_{\text{non-hemorrhage}}\right)}{2} \qquad (3)$$
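Equation (3) reduces to a one-line computation over the two calibration runs. The sketch below follows the equation as written (maximum of the hemorrhage indices, minimum of the non-hemorrhage indices) and adds input checks; the function name is hypothetical.

```python
def personalized_threshold(wi_hemorrhage, wi_non_hemorrhage):
    """Eq. (3) as written: the midpoint between the maximum MWL index from
    the hemorrhage calibration run and the minimum MWL index from the
    non-hemorrhage run. WI values are sigmoid outputs in [0, 1].
    """
    hem = list(wi_hemorrhage)
    non = list(wi_non_hemorrhage)
    if not hem or not non:
        raise ValueError("both calibration runs must produce MWL indices")
    if any(not 0.0 <= v <= 1.0 for v in hem + non):
        raise ValueError("MWL indices must lie in [0, 1]")
    return (max(hem) + min(non)) / 2
```

For example, if the hemorrhage run produced indices peaking at 0.9 and the non-hemorrhage run bottomed out at 0.1, the personalized threshold would be 0.5.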

All participants then completed an eight-minute trial under each condition, with task order counterbalanced across participants. All participants wore the sensors throughout the experiment. Participants were not told which trial corresponded to which condition, although they may have inferred it from the timing of suction activation. Task performance was video-recorded, and the NASA-TLX questionnaire was completed after each task.

Data processing and system evaluation.

Our study evaluated the success of the MWL-AA from two perspectives, accuracy and effectiveness.

  1. Accuracy of the MWL-AA. Sensitivity and specificity were used to measure the accuracy of MWL-AA activation during the adaptive condition. Sensitivity is the proportion of hemorrhage events that resulted in suction activation, and specificity is the proportion of non-hemorrhage periods in which the suction remained off. Both were obtained through post hoc video analysis: the video was sampled every 15 seconds, and the occurrence of hemorrhage (yes/no) and activation of the semi-autonomous suction tool (yes/no) were recorded for each sample.

  2. Effectiveness of the MWL-AA. To evaluate the MWL-AA, performance and physiological metrics in the adaptive automation condition were compared with those in the periodic automation condition. However, suction frequency in the adaptive automation condition may vary with the workload experienced by each individual, which in some situations can make task difficulty unequal between the two conditions. To minimize this confound, a more granular analysis compared metrics at the same task difficulty, assuming that task difficulty is comparable at the same blood level. To this end, the recorded video was segmented into 15-second clips, and each clip was annotated by the amount of blood in the cavity: no blood (i.e., the surgical field was clear of blood), low blood (i.e., the target tube was visible), or high blood (i.e., the target tube was not visible).
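The accuracy check in item 1 amounts to a 2 × 2 confusion table over the 15-second video samples. A minimal sketch, using the standard definitions that Table 3 reports (sensitivity over hemorrhage samples, specificity over non-hemorrhage samples); the function name and input format are hypothetical.

```python
def activation_accuracy(samples):
    """samples: iterable of (hemorrhage, suction_active) booleans, one pair
    per 15-s video sample. Returns (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for hemorrhage, active in samples:
        if hemorrhage and active:
            tp += 1          # bleeding, suction on
        elif hemorrhage:
            fn += 1          # bleeding, suction off
        elif active:
            fp += 1          # no bleeding, suction on
        else:
            tn += 1          # no bleeding, suction off
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need samples with and without hemorrhage")
    return tp / (tp + fn), tn / (tn + fp)
```

Feeding in counts proportional to the percentages in Table 3 (71.56, 3.79, 6.81, and 17.84) gives a sensitivity of about 0.724 and a specificity of about 0.950, which matches the reported 72.3% and 94.9% up to rounding of the tabulated percentages.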

Because segment counts differed across blood-level categories, mixed effect models were used to handle the unbalanced data. Each response metric was modeled separately, resulting in nine models (three EEG metric models, four eye metric models, and two performance metric models). In each model, the interaction between task condition and blood level was treated as a fixed effect and subject as a random effect. Surgeons’ performance and physiological responses were compared between the two task conditions using post hoc Tukey pairwise contrasts (α = 0.05). The mixed effect models and post hoc tests were performed with the lme4 (Bates, 2015) and emmeans (Russell, 2022) packages in R 4.2.1, respectively. NASA-TLX scores were compared using a paired t-test with the car package (Hayden, 2012) in R 4.2.1.
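Because each participant rated both conditions, the NASA-TLX comparison is a paired design: the test statistic is the mean within-participant difference divided by its standard error, with n − 1 degrees of freedom (nine participants give the reported t(8)). A minimal sketch of the statistic only; a p-value would additionally need the t distribution, e.g., via `scipy.stats.ttest_rel`. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched scores,
    e.g., x[i] and y[i] = one participant's NASA-TLX in the two conditions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two matched pairs")
    d = [a - b for a, b in zip(x, y)]
    t = mean(d) / (stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1
```

Working on within-participant differences removes between-participant variation in how people use the rating scale, which is why a paired test suits repeated NASA-TLX ratings.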

Result

Data from nine surgeons were analyzed; one participant was excluded because the eye-tracker malfunctioned during the experiment.

Accuracy of the MWL-AA.

In Table 3, the accuracy of MWL-AA activation during the adaptive condition is summarized. The results indicated that the sensitivity and specificity of the system were 72.3% and 94.9%, respectively.

TABLE 3:

Mental Workload-Based Adaptive Automation Activation Frequency Summary

Occurrence of hemorrhage events | Suction tool not activated, % | Suction tool activated, % | Accuracy
No | 71.56 | 3.79 | Specificity = 94.9%
Yes | 6.81 | 17.84 | Sensitivity = 72.3%

Note: The data are imbalanced; samples without hemorrhage events greatly outnumber samples with hemorrhage events.

Effectiveness of the MWL-AA.

The frequency of each blood level in each condition is summarized in Appendix C. Extracted metrics were compared between the adaptive and periodic automation conditions at each blood level. Scan path length, fixation count, fixation duration, and NNI were significantly higher in the periodic automation condition than in the MWL-AA condition, except for NNI at the low blood level and average fixation duration at the high blood level (Figure 6(a)). Compared with the adaptive automation condition, participants in the periodic automation condition had higher theta band power when blood levels were high (t (133) = 4.57; p < 0.001) and higher beta band power when blood levels were low (t (206) = 2.98; p = 0.03) (Figure 6(b)).

Figure 6.

Tukey post hoc contrast analysis under three task difficulties (i.e., no blood, low blood level, and high blood level) between two task conditions (i.e., adaptive and periodic automation conditions). (a) Eye-tracker metrics. (b) EEG metrics. (c) Performance metrics and NASA-TLX.

For the performance metrics, participants in the adaptive condition had significantly faster response times to the secondary task and shorter completion times for the primary task when no blood or a low blood level was present. NASA-TLX scores were also significantly higher (t(8) = 2.83; p = 0.02) in the periodic automation condition (54.6 ± 16.2) than in the adaptive automation condition (44.3 ± 10.5) (Figure 6(c)). Detailed statistical results are given in Appendix D and Appendix E.

DISCUSSION

In experiment two, we developed a semi-autonomous suction device activated by the user’s MWL to demonstrate the effectiveness of the MWL-AA. Participants’ physiological responses, task performance, and self-reported MWL were collected and compared between the adaptive automation condition and the periodic automation condition. We hypothesized that users would experience lower MWL and achieve better task performance with the MWL-AA, because it could remove blood more efficiently, reduce the number of high-blood situations, and thereby lower overall MWL during the task.

Findings indicated that the MWL-AA reduced surgeons’ MWL during the procedure, based on physiological sensing measures. In particular, participants had a shorter scan path length and a lower NNI with the MWL-AA, suggesting a more efficient search strategy and lower perceived MWL (Imants & de Greef, 2014; Jacob, 2003). Further, fixation frequency and duration, which have been positively correlated with MWL (Reimer et al., 2010), were lower in the adaptive condition, providing further evidence for the benefits of the MWL-AA.

EEG metrics also indicated reduced MWL in the adaptive condition. As discussed in experiment one, reductions in beta and theta power have been associated with decreased MWL. In experiment two, such decrements were observed in the adaptive condition (Figure 6(b)), suggesting lower surgeon MWL. One explanation is that suction was initiated in a more timely manner in the adaptive condition than in the periodic automation condition. Delayed suction may force surgeons to spend longer in a stressful state while blood covers the field, which may increase MWL.
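Theta and beta band power, in dB, are the EEG quantities behind this argument. The sketch below shows one way to compute them, assuming conventional band limits (theta 4–8 Hz, beta 13–30 Hz), an assumed 128 Hz sampling rate, and a plain periodogram; it does not reproduce whatever spectral estimator the authors’ EEG pipeline used. The function `band_power_db` is hypothetical, and because absolute dB values depend on scaling conventions, only within-study comparisons (e.g., adaptive vs. periodic) are meaningful.

```python
import cmath
import math


def band_power_db(signal, fs, f_lo, f_hi):
    """Power of `signal` in the band [f_lo, f_hi) Hz, in dB (naive DFT periodogram)."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]                 # remove the DC offset
    power = 0.0
    for k in range(1, n // 2 + 1):                 # positive-frequency bins
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            power += abs(coeff) ** 2 / n           # unnormalized periodogram bin
    return 10 * math.log10(power) if power > 0 else float("-inf")


FS = 128   # assumed sampling rate (Hz)
N = 256    # 2 s of data
# Synthetic 6 Hz sine: its energy should land in the theta band, not beta.
eeg = [math.sin(2 * math.pi * 6 * t / FS) for t in range(N)]
theta_db = band_power_db(eeg, FS, 4, 8)    # assumed theta band
beta_db = band_power_db(eeg, FS, 13, 30)   # assumed beta band
print(f"theta: {theta_db:.1f} dB, beta: {beta_db:.1f} dB")
```

A naive DFT keeps the example dependency-free; for real recordings, a Welch estimate (for example `scipy.signal.welch`) averaged over segments would be more typical.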

The MWL-AA condition resulted in faster needle passing in the primary task and faster reaction times in the secondary task when no blood or a low blood level was present. However, performance was similar under the two conditions at the high blood level, even though physiological metrics indicated that users experienced different levels of workload. This could be explained by a dissociation between performance and workload at high blood levels (Hancock & Matthews, 2019), in which performance measures do not necessarily track physiological measures. In this study, the high blood level increased task difficulty dramatically in both conditions because the blood completely covered the tubes. Consequently, participants had to search for openings beneath the pooled blood regardless of task condition, resulting in no difference in performance. Physiological metrics, however, respond more slowly to increases in task difficulty than performance measures do (Escalona, 2019), which may indicate that users’ physiological responses required time to react to task load increments.

In addition, participants using the MWL-AA reported significantly lower MWL on the NASA-TLX. This difference in self-reported MWL further supports the conclusion that surgeons perceived lower MWL with the MWL-AA.

We also found that participants experienced lower MWL in the adaptive condition even when no blood was in the cavity, as indicated by physiological and performance metrics. This could be explained by carryover effects on MWL throughout the experiment: better management of MWL in the adaptive condition could reduce disruptions during workload transitions and make the overall task (including no-blood periods) less stressful for participants (Helton et al., 2008).

The results indicated that the suction tool remained inactive with a probability of 94.9% when the surgical field was clean and activated 72.3% of the time when hemorrhage occurred. The relatively good generalization of the MWL-AA could be attributed to the calibration session preceding the experiment. Depending on their skill level and how they cope with workload stress, individuals may respond differently to the same task. For instance, novices may experience high MWL when bleeding occurs, whereas experts may adjust their strategies to avoid high MWL, resulting in different physiological responses and predicted MWL. A personalized threshold would allow the system to adjust its sensitivity to each user’s needs, especially if the model is generalized to new users without retraining or collecting new baseline data.
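One way to realize such a personalized threshold is to calibrate a cut-off on each user’s calibration-session data. The sketch below assumes that the MWL model emits a score per segment (higher meaning more likely high MWL) and that calibration segments are labeled; it selects the cut-off that maximizes Youden’s J (sensitivity + specificity − 1). Both the criterion and the function name `calibrate_threshold` are illustrative assumptions rather than the authors’ method; a designer could instead fix specificity at a target level to limit spurious suction.

```python
def calibrate_threshold(scores, labels):
    """Pick the cut-off on one user's calibration scores that maximizes Youden's J.

    scores: model outputs per segment (higher = more likely high MWL).
    labels: 1 for segments known to be high MWL (e.g., hemorrhage), else 0.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("calibration data must contain both classes")
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):          # candidate cut-offs
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        j = tp / positives + tn / negatives - 1    # Youden's J statistic
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


scores = [0.10, 0.20, 0.30, 0.60, 0.70, 0.90]  # toy calibration scores
labels = [0, 0, 0, 1, 1, 1]                    # 1 = high-MWL segment
print(calibrate_threshold(scores, labels))     # (0.6, 1.0)
```

Because the cut-off is chosen per user, two surgeons whose physiological responses to bleeding differ in magnitude could still trigger assistance at comparable points.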

GENERAL DISCUSSION AND FUTURE WORK

Our primary objective in this study was to develop an MWL-AA capable of triggering occupationally relevant automation based on physiological sensors. A semi-autonomous suction device controlled by the user’s MWL was developed to demonstrate the MWL-AA concept. Experiment one developed a real-time multi-sensing model for MWL detection, which experiment two then used as the basis for the MWL-AA system. We evaluated the proposed MWL-AA with surgeons by comparing task performance, physiological responses, and self-rated MWL with and without the MWL-AA. The results indicated that adaptive automation triggered by surgeons’ MWL is feasible in RAS and that implementing such technology did not hamper surgical task performance.

In this study, we hypothesized that the MWL-AA would remove blood at more appropriate times, reducing overall MWL. To test this, we designed the experiment so that the timing of the intervention was the primary independent variable. Specifically, periodic assistance was used to approximate the frequency of assistance in the adaptive condition. Despite its limitations, this control enabled a relatively fair comparison. Among other possible control conditions, a no-suction condition would leave the tubes submerged in blood, preventing task completion. At the other end of the spectrum, constant suction is not feasible in a clinical setting because it can damage tissue (Czarnik et al., 1991). This initial demonstration may guide the design of better controls (e.g., completely random conditions) for future studies of how suction timing affects user performance.
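The periodic control can be sketched as evenly spaced suction onsets whose count matches the adaptive condition. The sketch assumes a known trial duration and a target activation count (for example, the mean number of adaptive activations per trial); the half-period offset and the function name `periodic_schedule` are illustrative choices, since the exact schedule used in the study is not described here.

```python
def periodic_schedule(trial_duration_s, n_activations):
    """Evenly spaced suction onset times (s), each centered in an equal time slot."""
    if n_activations <= 0:
        return []
    period = trial_duration_s / n_activations       # one activation per slot
    return [period * (k + 0.5) for k in range(n_activations)]


print(periodic_schedule(120, 4))  # [15.0, 45.0, 75.0, 105.0]
```

Matching only the count keeps the comparison focused on timing, the independent variable described above; centering each onset in its slot avoids clustering activations at the start or end of a trial.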

It is important to note that the MWL-triggered suction tool was used in a very specific training environment to demonstrate the MWL-AA concept; its application in real-world conditions may yield very different outcomes. Notably, the 72.3% activation rate during hemorrhage was not ideal even under laboratory conditions and is likely to be considerably lower in the field. This rate was likely limited by the relatively small sample size used in this study, since the size and variability of the training population strongly affect model fidelity. Future research should examine larger samples and consider a variety of demographic characteristics and surgical skill levels when training the MWL model. Individual differences may also affect model performance in real-world conditions: intra- and inter-individual differences may alter physiological patterns and thus MWL assessment accuracy (Kondacs & Szabó, 1999). Modeling individual variation in MWL responses is essential to improving accuracy. In the future, transfer learning (Weiss et al., 2016) could be applied to reuse a trained model as the basis for new models for new tasks or users.

In addition, several factors could contribute to high MWL in RAS in the field, including instrument exchange, conversations among surgical team members, and poor visibility (Weigl et al., 2016), which could lead to inaccurate MWL assessment. More sophisticated training procedures will therefore be necessary. For instance, future studies could investigate how physiological responses differ across types of workload stress. An intelligent system that not only detects cognitive overload but also identifies its underlying causes and provides appropriate assistance would greatly increase the value of the system. Furthermore, extensive testing is needed to determine the reliability of the MWL-AA before it can be used in real-life scenarios; in particular, studies should demonstrate that the MWL-AA correctly identifies cognitive overload and provides appropriate assistance across a variety of scenarios. The usability of the MWL-AA, user trust, ethical challenges, and organizational policy adaptations must also be examined thoroughly.

Moreover, care must be taken when collecting data for MWL assessment in complex environments where other mental states, such as stress, motivation, and situation awareness, may also influence physiological responses (Natarajan et al., 2004). Because mental states are intertwined, the mapping between target mental states and physiological responses should be clearly defined and investigated with careful consideration of other confounding mental states that might affect physiological variables. Disentangling these mental states will require further effort. Furthermore, studies in surgery typically manipulate workload by increasing task complexity through more sophisticated motor control (Wu et al., 2020). However, it is unclear whether the observed physiological differences between more and less demanding conditions are attributable to differences in MWL alone or to differences in motor performance between tasks. This motor-related confound should also be considered when assessing MWL in RAS.

Taking into account the full sociotechnical work system, the proposed semi-autonomous suction tool in its current form is unlikely to improve patient outcomes in the field. Because the design of automated technologies was beyond the scope of this study, we compared the MWL-triggered technology only with a highly unfavorable alternative. A more comprehensive comparison is indispensable for studies aiming to improve automation design. Specifically, studies must demonstrate that an automated suction tool is at least as good as bedside assistants in suction timing, efficiency, and reliability before automatic suction can be deployed in real surgical procedures.

A variety of alternatives are also available for activating automated suction tools. For example, computer vision approaches have been proposed for detecting bleeding sites and initiating suction. However, such systems must be trained on extensive video data before use. Furthermore, blood splashes can obscure the camera or prevent it from detecting blood when visibility is poor, delaying suction. MWL triggers are relatively easy to train, but their robustness is a concern. Specifically, gaze behavior outside the laboratory varies substantially across operational conditions (e.g., scanning for appropriate tools or examining the patient’s signs). These different eye exploration behaviors are reflected in the corresponding eye metrics and may lead to inaccurate MWL measurement. Moreover, poor visibility may frustrate and stress surgeons, which may further affect EEG metrics and trigger inappropriate suction. We acknowledge that the MWL-AA presented in this study may not be the most suitable mechanism for activating an automatic suction device; the design of a robust automated suction tool was beyond the scope of this study. Future research on automation design should weigh the pros and cons of each possible activation mechanism to optimize the functionality of automated suction systems.

The MWL-AA also requires surgeons to wear sensors and requires appropriate hardware and software. The installation time of physiological sensors could hinder MWL-AA use in surgical procedures. In particular, wet EEG typically requires a gel or saline solution to capture the signal from the scalp, which makes setup tedious. Dry EEG and ear-EEG (Kappel et al., 2019) have been proposed recently and do not require gel or saline. Ear-EEG (X. Zhou et al., 2016), which acquires signals through electrodes inserted into the ear, can considerably reduce installation time. However, the feasibility and accuracy of using easy-to-install sensors in MWL-AA systems require further investigation.

Despite these limitations, the present study lays the groundwork for future research on improving human-machine interaction. In particular, the MWL-AA framework presented here could be refined and incorporated into future RAS automation designs. Current RAS automation is largely static, operating at a fixed level of automation. Static automation, however, has several potential disadvantages, such as monitoring inefficiency, loss of situation awareness, and complacency (Byrne & Parasuraman, 1996). In contrast, an MWL-AA, which provides automated assistance in response to changing demands on operators, may be less vulnerable to such issues. A RAS equipped with an MWL-AA could, for example, step in when surgeons are overwhelmed and return control to them when their MWL falls to a moderate level, keeping surgeons in the loop throughout. The proposed MWL-AA framework could also serve as a tool for RAS training. For instance, it could track trainees’ cognitive workload, identify challenging tasks, and automatically schedule training sessions to address gaps in training. The approach could also be adapted to develop adaptive systems triggered by other mental states that are critical to successful operations.
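The hand-off described above, stepping in when surgeons are overwhelmed and returning control once MWL falls to a moderate level, can be sketched as a two-threshold (hysteresis) trigger. The thresholds, the class name `HysteresisTrigger`, and the assumption that the MWL model outputs a continuous score are all hypothetical. Using separate engage and release levels is one common way to keep automation from toggling rapidly when an estimate hovers near a single cut-off.

```python
from dataclasses import dataclass


@dataclass
class HysteresisTrigger:
    """Engage assistance at or above `on_level`; release at or below `off_level`."""
    on_level: float
    off_level: float
    engaged: bool = False

    def __post_init__(self):
        if self.off_level >= self.on_level:
            raise ValueError("off_level must be below on_level")

    def update(self, mwl_score: float) -> bool:
        """Feed one MWL estimate; return whether assistance is active."""
        if not self.engaged and mwl_score >= self.on_level:
            self.engaged = True    # overload detected: automation steps in
        elif self.engaged and mwl_score <= self.off_level:
            self.engaged = False   # MWL back to moderate: return control
        return self.engaged


trigger = HysteresisTrigger(on_level=0.7, off_level=0.4)
states = [trigger.update(s) for s in [0.3, 0.75, 0.6, 0.5, 0.35, 0.65]]
print(states)  # [False, True, True, True, False, False]
```

Note that the score of 0.65 at the end does not re-engage assistance: once released, the trigger waits until the estimate again reaches the higher engage level.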

KEY POINTS.

  • A novel adaptive automation system has been proposed that monitors surgeons’ cognitive workload with 77.9% accuracy using multiple sensors and initiates assistance when high workload is detected.

  • A semi-autonomous suction tool triggered by surgeons’ cognitive workload was designed to demonstrate the effectiveness of the proposed adaptive automation system.

  • The results indicated that using the proposed adaptive automation system improved surgeons’ task performance and reduced their perceived workload.

ACKNOWLEDGMENTS

Research reported in this study was supported by the National Institutes of Health under award #R21EB026177, and in part by a research grant provided by Intuitive.

Biographies

Jing Yang is a Ph.D. candidate in the School of Industrial Engineering at Purdue University. She holds an MS in Industrial and Operations Engineering from the University of Michigan.

Juan Antonio Barragan is a master’s student in the School of Industrial Engineering at Purdue University.

Jason Michael Farrow is a Urology MIS Fellow at Indiana University School of Medicine.

Chandru P Sundaram is the Dr. Norbert M. Welch, Sr. and Louise A. Welch Professor of Urology at Indiana University.

Juan P. Wachs is a professor in the School of Industrial Engineering at Purdue University.

Denny Yu is an associate professor in the School of Industrial Engineering at Purdue University. He received his Ph.D. in Industrial and Operations Engineering from the University of Michigan.

APPENDIX

APPENDIX A. Repeated Measures ANOVA Results: Physiological Metrics Comparison Between Hemorrhage and Non-hemorrhage Condition

Sensor | Metric | Hemorrhage condition, mean (SD) | Non-hemorrhage condition, mean (SD) | F-ratio; p-value | Effect size (η²)
EEG | Theta (dB)* | 13.01 (6.71) | 17.70 (11.9) | F(1,22) = 10.25; p = 0.01 | 0.53
EEG | Beta (dB)* | 12.04 (3.09) | 16.28 (5.39) | F(1,22) = 16.45; p < 0.001 | 0.64
EEG | Alpha (dB) | 17.32 (6.47) | 19.51 (8.23) | F(1,22) = 1.90; p = 0.2 | 0.07
Eye-tracker | Averaged fixation count | 33.7 (11.93) | 31.7 (16.79) | F(1,22) = 0.22; p = 0.64 | 0.02
Eye-tracker | Averaged fixation duration (ms) | 291 (43.88) | 281 (55.55) | F(1,22) = 0.7; p = 0.42 | 0.07
Eye-tracker | Scan path length (normalized pixel)* | 1.25 (1.09) | 1.83 (2.02) | F(1,22) = 7.04; p = 0.03 | 0.43
Eye-tracker | NNI* | 0.15 (0.07) | 0.19 (0.08) | F(1,22) = 18.82; p < 0.001 | 0.67

* indicates significant; SD: standard deviation

APPENDIX B. Confusion Matrix of Workload Classification Accuracy using Proposed Neural Network Model

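Actual condition | Predicted LMW (P) | Predicted HMW (N) | Class-wise rate
LMW (P) | TP = 41.3% ± 4.6% | FN = 8.8% ± 4.5% | $\text{Sensitivity} = \frac{TP}{P} = 82.4\% \pm 8.9\%$
HMW (N) | FP = 13.2% ± 4.5% | TN = 36.7% ± 4.2% | $\text{Specificity} = \frac{TN}{N} = 73.5\% \pm 21.7\%$

$F_1 = \frac{2TP}{2TP + FP + FN} = 78.9\% \pm 9\%$; $\text{Accuracy} = \frac{TP + TN}{P + N} = 77.9\% \pm 5.9\%$

LMW: low mental workload; HMW: high mental workload; P: positive (P = TP + FN); N: negative (N = FP + TN); TP: true positive; FN: false negative; FP: false positive; TN: true negative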

APPENDIX C. Summary of the Average Frequency of Each Blood Level under Each Task Condition

Blood level | Adaptive automation condition, frequency mean (SD) | Periodic automation condition, frequency mean (SD)
No blood | 13.1 (0.5) | 12.8 (1.8)
Low blood | 11.6 (1.5) | 10.4 (2.8)
High blood | 7.3 (1.1) | 8.8 (1.3)

SD: standard deviation

APPENDIX D. Mixed Effect Models Summary for Effects of Interaction Between Task Condition and Blood Level (No Blood in Periodic Automation Condition Is the Reference Group) on Considered Metrics: Considered Metric ~ Blood Level : Task Condition + (1 | Subject)

Cells show coefficient (p value) relative to the reference group.

Metric | No blood: adaptive | Low blood: adaptive | Low blood: periodic | High blood: adaptive | High blood: periodic | R² | AIC
Average needle pass time (s) | −2.89 (<0.001*) | −2.29 (<0.001*) | 0.04 (0.97) | 3.89 (<0.001*) | 3.29 (<0.001*) | 0.56 | 1397.6
Average reaction time (s) | −0.18 (<0.001*) | −0.11 (0.03*) | 0.13 (0.01*) | 0.23 (<0.001*) | 0.28 (<0.001*) | 0.57 | 1154.5
Theta (dB) | −1.52 (0.49) | −2.27 (0.11) | 0.09 (0.96) | 8.83 (<0.001*) | 14.92 (<0.001*) | 0.59 | 2026.36
Beta (dB) | 0.98 (0.45) | 3.45 (0.03*) | 6.04 (0.01*) | 4.7 (<0.001*) | 3.69 (<0.001*) | 0.54 | 2054.7
Alpha (dB) | 2.58 (0.10) | −2.02 (0.21) | −1.13 (0.51) | 9.6 (<0.001*) | 8.7 (<0.001*) | 0.39 | 3195.9
Averaged fixation count | −11.78 (<0.001*) | −9.45 (0.001*) | −5.05 (0.003*) | −4.08 (0.01*) | −0.32 (0.03*) | 0.77 | 1181.4
Averaged fixation duration (ms) | −39 (<0.001*) | −41 (<0.001*) | 5 (0.98) | −12 (<0.001*) | 18 (0.03*) | 0.81 | 1872.1
Scan path length (normalized pixel) | −1.92 (0.01*) | −2.09 (0.02*) | −1.13 (0.01*) | −0.58 (0.05*) | 2.09 (0.02*) | 0.76 | 1175.7
NNI | −0.13 (<0.001*) | −0.07 (0.04*) | −0.05 (0.12) | −0.08 (0.02*) | 0.07 (0.04*) | 0.79 | 1203.5

* significant. The interaction between task condition and blood level is the fixed effect; subject is the random effect.
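The Appendix D coefficients behave like contrasts between each cell mean and the reference cell (no blood, periodic automation). In a balanced design the fixed-effect estimates of such a cell-means model coincide exactly with these differences; with unequal cell sizes, as here, the subject random intercepts can shift them slightly. The check below, using the needle pass time means from Appendix E, reproduces the corresponding Appendix D coefficients. The same holds, to within rounding, for nearly all other metrics; averaged fixation count at the high blood level in the adaptive condition is an exception (−3.90 from the means vs. −4.08 reported). The dictionary and function names are illustrative.

```python
# Mean needle pass time (s) per cell, copied from Appendix E.
CELL_MEANS = {
    ("no blood", "periodic"): 9.22,    # reference cell in Appendix D
    ("no blood", "adaptive"): 6.33,
    ("low blood", "adaptive"): 6.93,
    ("low blood", "periodic"): 9.26,
    ("high blood", "adaptive"): 13.11,
    ("high blood", "periodic"): 12.51,
}
REFERENCE = ("no blood", "periodic")


def contrasts_vs_reference(means, reference):
    """Difference between each non-reference cell mean and the reference cell mean."""
    ref = means[reference]
    return {cell: round(m - ref, 2) for cell, m in means.items() if cell != reference}


print(list(contrasts_vs_reference(CELL_MEANS, REFERENCE).values()))
# [-2.89, -2.29, 0.04, 3.89, 3.29], matching the Appendix D coefficients
```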

APPENDIX E. Post hoc Contrast Result Summary: Comparison of Considered Metrics Between Conditions Under Each Blood Level

Mean (SD) by task condition and blood level:

Metric | No blood: adaptive (n = 118) | No blood: periodic (n = 115) | Low blood: adaptive (n = 114) | Low blood: periodic (n = 94) | High blood: adaptive (n = 56) | High blood: periodic (n = 79)
Averaged needle pass time (s) | 6.33 (2.3) | 9.22 (2.6) | 6.93 (2.9) | 9.26 (3.3) | 13.11 (3.5) | 12.51 (6.3)
Averaged reaction time (s) | 0.43 (0.2) | 0.61 (0.3) | 0.50 (0.2) | 0.74 (0.3) | 0.84 (0.5) | 0.89 (0.3)
Theta (dB) | 9.73 (5.3) | 11.25 (5.3) | 8.98 (5.8) | 11.34 (6.9) | 20.08 (6.9) | 26.17 (9.7)
Beta (dB) | 15.50 (1.8) | 14.52 (1.7) | 17.97 (2.8) | 20.56 (1.6) | 19.22 (1.63) | 18.21 (1.9)
Alpha (dB) | 11.73 (1.3) | 9.15 (1.5) | 7.13 (1.3) | 8.02 (1.5) | 18.75 (1.8) | 17.85 (1.3)
Averaged fixation count | 26.43 (12.5) | 38.21 (10.7) | 28.76 (10.2) | 33.16 (11.2) | 34.31 (13.2) | 37.89 (9.4)
Averaged fixation duration (ms) | 204 (44.6) | 243 (44.2) | 202 (42.6) | 248 (44.1) | 231 (44.2) | 261 (45.3)
Scan path length | 2.11 (1.8) | 4.03 (2.5) | 1.94 (1.69) | 2.90 (2.5) | 3.45 (2.46) | 6.12 (4.1)
NNI | 0.18 (0.08) | 0.31 (0.10) | 0.24 (0.12) | 0.26 (0.1) | 0.23 (0.11) | 0.38 (0.2)

Post hoc contrasts, adaptive vs. periodic automation:

Metric | No blood | Low blood level | High blood level
Averaged needle pass time (s) | t(231) = 3.86; p < 0.001* | t(206) = 3.13; p = 0.02* | t(133) = 0.96; p = 0.92
Averaged reaction time (s) | t(231) = 4.835; p < 0.001* | t(206) = 3.63; p < 0.001* | t(133) = 1.08; p = 0.88
Theta (dB) | t(231) = 0.402; p = 0.97 | t(206) = 1.21; p = 0.835 | t(133) = 4.57; p < 0.001*
Beta (dB) | t(231) = 2.13; p = 0.27 | t(206) = 2.98; p = 0.03* | t(133) = 0.27; p = 0.8
Alpha (dB) | t(231) = 1.62; p = 0.58 | t(206) = 0.57; p = 0.92 | t(133) = 2.16; p = 0.64
Averaged fixation count | t(231) = 7.28; p < 0.001* | t(206) = 3.35; p < 0.001* | t(133) = 6.632; p < 0.001*
Averaged fixation duration (ms) | t(231) = 3.91; p < 0.001* | t(206) = 4.62; p < 0.001* | t(133) = 1.98; p = 0.35
Scan path length | t(231) = 6.15; p < 0.001* | t(206) = 3.21; p < 0.001* | t(133) = 6.632; p < 0.001*
NNI | t(231) = 5.65; p < 0.001* | t(206) = 2.31; p = 0.19 | t(133) = 4.77; p < 0.001*

* significant; SD: standard deviation; n: number of observations

Contributor Information

Jing Yang, School of Industrial Engineering, Purdue University, West Lafayette, Indiana, USA.

Juan Antonio Barragan, School of Industrial Engineering, Purdue University, West Lafayette, Indiana, USA.

Jason Michael Farrow, Department of Urology, Indiana University School of Medicine, Indianapolis, Indiana, USA.

Chandru P. Sundaram, Department of Urology, Indiana University School of Medicine, Indianapolis, Indiana, USA.

Juan P. Wachs, School of Industrial Engineering, Purdue University, West Lafayette, Indiana, USA.

Denny Yu, School of Industrial Engineering, Purdue University, West Lafayette, Indiana, USA.

REFERENCES

  1. Aricò P, Borghini G, di Flumeri G, Colosimo A, Bonelli S, Golfetti A, Pozzi S, Imbert JP, Granger G, Benhacene R, & Babiloni F (2016). Adaptive automation triggered by EEG-based mental workload index: A passive brain-computer interface application in realistic air traffic control environment. Frontiers in Human Neuroscience, 10(OCT2016), 1–13. 10.3389/fnhum.2016.00539 [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bates D et al. (2015). Fitting linear mixed-effects models using lme4. In Journal of Statistical Software (1st ed., 67, pp. 27–48). Foundation for Open Access Statistics. [Google Scholar]
  3. Bigdely-Shamlo N, Mullen T, Kothe C, Su KM, & Robbins KA (2015). The PREP pipeline: Standardized preprocessing for large-scale EEG analysis. Frontiers in Neuroinformatics, 9(JUNE), 1–19. 10.3389/fninf.2015.00016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Byrne EA, & Parasuraman R (1996). Psychophysiology and adaptive automation. Biological Psychology, 42(3), 249–268. 10.1016/0301-0511(95)05161-9 [DOI] [PubMed] [Google Scholar]
  5. Czarnik RE, Stone KS, Everhart CC, & Preusser BA (1991). Differential effects of continuous versus intermittent suction on tracheal tissue. Heart & Lung: The Journal of Critical Care, 20(2), 144–151, http://europepmc.org/abstract/MED/2004925 [PubMed] [Google Scholar]
  6. de Visser EJ, Legoullon M, Freedy A, Freedy E, Weltman G, & Parasuraman R (2008). Designing an adaptive automation system for human supervision of unmanned vehicles: A bridge from theory to practice. Proceedings of the Human Factors and Ergonomics Society, 52(4), 221–225. 10.1177/154193120805200405 [DOI] [Google Scholar]
  7. Debie E, Fernandez Rojas R, Fidock J, Barlow M, Kasmarik K, Anavatti S, Garratt M, & Abbass HA (2021). Multimodal fusion for objective assessment of cognitive workload: A review. IEEE Transactions on Cybernetics, 51(3), 1542–1555. 10.1109/TCYB.2019.2939399 [DOI] [PubMed] [Google Scholar]
  8. Delorme A, & Makeig S (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics. Journal of Neuroscience Methods, 134(1), 9–21. [DOI] [PubMed] [Google Scholar]
  9. Escalona EM et al. (2019). Latency Differences Between Mental Workload Measures in Detecting Workload Changes. Communications in Computer and Information Science, 1012(1), 131–146. DOI: 10.1007/978-3-030-14273-5_8 [DOI] [Google Scholar]
  10. Fernandez Rojas R, Debie E, Fidock J, Barlow M, Kasmarik K, Anavatti S, Garratt M, & Abbass H (2020). Electroencephalographic workload indicators during teleoperation of an unmanned aerial vehicle shepherding a swarm of unmanned ground vehicles in contested environments. Frontiers in Neuroscience, 0, 40. 10.3389/FNINS.2020.00040 [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Fidopiastis CM (2009). Impact of Automation and Task Load on Unmanned System Operator’s Eye Movement Patterns. In Foundations of Augmented Cognition. Neuroergonomics and Operational Neuroscience (5638, pp. 229–238). Springer, Berlin, Heidelberg. [Google Scholar]
  12. Garreta R et al. (2013). Supervised Learning. In Learning scikit-learn: Machine Learning in Python Experience (12). Packt Publishing Ltd. [Google Scholar]
  13. Gevins A, Smith ME, Leong H, McEvoy L, Whitfield S, Du R, & Rush G (2016). Monitoring Working Memory Load during Computer-Based Tasks with EEG Pattern Recognition Methods: Human Factors: The Journal of the Human Factors and Ergonomics Society, 40(1), 79–91. 10.1518/001872098779480578 [DOI] [PubMed] [Google Scholar]
  14. Giedelman CA, Abdul-Muhsin H, Schatloff O, Palmer K, Lee L, Sanchez-Salas R, Cathelineau X, Davilá H, Cavelier L, Rueda M, & Patel V (2013). The impact of robotic surgery in urology. Actas Urologicas Espanolas, 37(10), 652–657. 10.1016/j.acuro.2012.11.015 [DOI] [PubMed] [Google Scholar]
  15. Hafed ZM, & Krauzlis RJ (2012). Similarity of superior colliculus involvement in microsaccade and saccade generation. Journal of Neurophysiology, 107(7), 1904–1916. 10.1152/jn.01125.2011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Haidegger T (2019). Autonomy for surgical robots: Concepts and paradigms. IEEE Transactions on Medical Robotics and Bionics, 1(2), 65–76. 10.1109/tmrb.2019.2913282 [DOI] [Google Scholar]
  17. Hancock PA, & Matthews G (2019). Workload and performance: Associations, insensitivities, and dissociations. Human Factors, 61(3), 374–392. 10.1177/0018720818809590 [DOI] [PubMed] [Google Scholar]
  18. Hasselberg A, & Söffker D (2013). A human cognitive performance measure based on available options for adaptive aiding. IFAC Proceedings Volumes, 46(15), 442–449. 10.3182/20130811-5-US-2037.00012 [DOI] [Google Scholar]
  19. Hayden RW et al. (2012). A Review of: “An R Companion to Applied Regression, Second Edition, by Fox J and Weisberg S”. Journal of Biopharmaceutical Statistics, 22(2), 418–419. DOI: 10.1080/10543406.2012.635980 [DOI] [Google Scholar]
  20. Helton WS, Shaw T, Warm JS, Matthews G, & Hancock P (2008). Effects of warned and unwarned demand transitions on vigilance performance and stress. Anxiety, Stress and Coping, 21(2), 173–184. 10.1080/10615800801911305 [DOI] [PubMed] [Google Scholar]
  21. Hussein R, Palangi H, Ward RK, & Wang ZJ (2019). Optimized deep neural network architecture for robust detection of epileptic seizures using EEG signals. Clinical Neurophysiology, 130(1), 25–37. 10.1016/j.clinph.2018.10.010 [DOI] [PubMed] [Google Scholar]
  22. Imants P, & de Greef T (2014). Eye metrics for task-dependent automation. ACM International Conference Proceeding Series. 10.1145/2637248.2637274 [DOI] [Google Scholar]
  23. Jacob RJK et al. (2003). Eye tracking in human-computer interaction and usability research. In The Mind’s Eye: Cognitive and Applied Aspects of Eye Movement Research (pp. 573–605). Elsevier Science BV. [Google Scholar]
  24. Kappel SL, Rank ML, Toft HO, Andersen M, & Kidmose P (2019). Dry-contact electrode ear-EEG. IEEE Transactions on Biomedical Engineering, 66(1), 150–158. 10.1109/TBME.2018.2835778 [DOI] [PubMed] [Google Scholar]
  25. Kazanzidesf P, Chen Z, Deguet A, Fischer GS, Taylor RH, & Dimaio SP (2014). An open-source research kit for the da Vinci® Surgical System. Proceedings - IEEE International Conference on Robotics and Automation, 6434–6439. 10.1109/ICRA.2014.6907809 [DOI] [Google Scholar]
  26. Ketkar N et al. (2017). Introduction to Keras. In Deep Learning with Python (pp. 97–111). Springer. [Google Scholar]
  27. Kirchner EA, Kim SK, Tabie M, Wöhrle H, Maurus M, & Kirchner F (2016). An intelligent man-machine interface—Multi-robot control adapted for task engagement based on single-trial detectability of P300. Frontiers in Human Neuroscience, (2016 Jun 2), 291. 10.3389/FNHUM.2016.00291 [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Knaepen K, Marusic U, Crea S, Rodríguez Guerrero CD, Vitiello N, Pattyn N, Mairesse O, Lefeber D, & Meeusen R (2015). Psychophysiological response to cognitive workload during symmetrical, asymmetrical and dual-task walking. Human Movement Science, 40, 248–263. 10.1016/j.humov.2015.01.001 [DOI] [PubMed] [Google Scholar]
  29. Kohlmorgen J, Dornhege G, Braun M, Blankertz B, Müller K-R, Curio G, Hagemann K, Bruns A, Schrauf M, & Kincses W (2007). Improving human performance in a real operating environment through real-time mental workload detection.
  30. Kondacs A, & Szabó M (1999). Long-term intra-individual variability of the background EEG in normals. Clinical Neurophysiology, 110(10), 1708–1716. 10.1016/S1388-2457(99)00122-4 [DOI] [PubMed] [Google Scholar]
  31. Kranzfelder M, Staub C, Fiolka A, Schneider A, Gillen S, Wilhelm D, Friess H, Knoll A, & Feussner H (2013). Toward increased autonomy in the surgical OR: Needs, requests, and expectations. Surgical Endoscopy, 27(5), 1681–1688. 10.1007/s00464-012-2656-y [DOI] [PubMed] [Google Scholar]
  32. Maggi P, & di Nocera F (2021). Sensitivity of the spatial distribution of fixations to variations in the type of task demand and its relation to visual entropy. Frontiers in Human Neuroscience, 15, 257. 10.3389/FNHUM.2021.642535 [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Malpani A, Vedula SS, Lin HC, Hager GD, & Taylor RH (2020). Effect of real-time virtual reality-based teaching cues on learning needle passing for robot-assisted minimally invasive surgery: a randomized controlled trial. International Journal of Computer Assisted Radiology and Surgery, 15(7), 1187–1194. 10.1007/s11548-020-02156-5 [DOI] [PubMed] [Google Scholar]
  34. Mapelli I, & Özkurt TE (2019). Brain Oscillatory Correlates of Visual Short-Term Memory Errors. Frontiers in Human Neuroscience, 13(33), 1–15. 10.3389/FNHUM.2019.00033 [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Matthews G, Reinerman-Jones LE, Barber DJ, & Julian Abich I (2014). The psychometrics of mental workload: multiple measures are sensitive but divergent. Human Factors: The Journal of the Human Factors and Ergonomics Society, 57(1), 125–143. 10.1177/0018720814539505 [DOI] [PubMed] [Google Scholar]
  36. Nagy TD, & Haidegger T (2019). A DVRK-based framework for surgical subtask automation. Acta Polytechnica Hungarica, 16(8), 61–78. 10.12700/APH.16.8.2019.8.5
  37. Natarajan K, Acharya UR, Alias F, Tiboleng T, & Puthusserypady SK (2004). Nonlinear analysis of EEG signals at different mental states. BioMedical Engineering Online, 3(1), 1–11. 10.1186/1475-925X-3-7
  38. Paszkiel S (2020). Data Analysis of Human Brain Activity Using MATLAB Environment with EEGLAB. In Analysis and Classification of EEG Signals for Brain–Computer Interfaces (Vol. 852, pp. 33–39). Springer, Cham.
  39. Pontes FJ, Amorim GF, Balestrassi PP, Paiva AP, & Ferreira JR (2016). Design of experiments and focused grid search for neural network parameter optimization. Neurocomputing, 186, 22–34. 10.1016/j.neucom.2015.12.061
  40. Power M, Rafii-Tari H, Bergeles C, Vitiello V, & Yang GZ (2015). A cooperative control framework for haptic guidance of bimanual surgical tasks based on Learning from Demonstration. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2015, 5330–5337. 10.1109/ICRA.2015.7139943
  41. Prinzel LJ, Freeman FG, Scerbo MW, Mikulka PJ, & Pope AT (2000). A closed-loop system for examining psychophysiological measures for adaptive task allocation. International Journal of Aviation Psychology, 10(4), 393–410. 10.1207/S15327108IJAP1004_6
  42. Puma S, Matton N, Paubel PV, Raufaste É, & El-Yagoubi R (2018). Using theta and alpha band power to assess cognitive workload in multitasking environments. International Journal of Psychophysiology, 123, 111–120. 10.1016/J.IJPSYCHO.2017.10.004
  43. Reason J (2000). Human error: Models and management. BMJ, 320(7237), 768–770. 10.1136/BMJ.320.7237.768
  44. Reimer B, Mehler B, Wang Y, & Coughlin JF (2010). The impact of systematic variation of cognitive demand on drivers’ visual attention across multiple age groups. Proceedings of the Human Factors and Ergonomics Society (Vol. 3 of 4, pp. 2052–2056). 10.1518/107118110X12829370264321
  45. Rieger A, Fenger S, Neubert S, Weippert M, Kreuzfeld S, & Stoll R (2015). Psychophysical workload in the operating room: primary surgeon versus assistant. Surgical Endoscopy, 29(7), 1990–1998. 10.1007/s00464-014-3899-6
  46. Lenth RV et al. (2022). Package ‘emmeans’ (R package reference manual, pp. 216–221). CRAN.
  47. Spitzer B, & Haegens S (2017). Beyond the status quo: A role for beta oscillations in endogenous content (re)activation. eNeuro, 4(4). 10.1523/ENEURO.0170-17.2017
  48. Sterman MB, & Mann CA (1995). Concepts and applications of EEG analysis in aviation performance evaluation. Biological Psychology, 40(1–2), 115–130. 10.1016/0301-0511(95)05101-5
  49. Teo G, Matthews G, Reinerman-Jones L, & Barber D (2020). Adaptive aiding with an individualized workload model based on psychophysiological measures. Human-Intelligent Systems Integration, 2(1–4), 1–15. 10.1007/s42454-019-00005-8
  50. Teo G, Reinerman-Jones L, Matthews G, Szalma J, Jentsch F, & Hancock P (2018). Enhancing the effectiveness of human-robot teaming with a closed-loop system. Applied Ergonomics, 67, 91–103. 10.1016/J.APERGO.2017.07.007
  51. Tobii AB (2020). Tobii Pro Glasses 3 – groundbreaking wearable eye tracker to analyze human behavior (pp. 3–5) [Press release]. https://www.tobii.com/group/news-media/press-releases/2020/6/tobii-pro-glasses-3-groundbreaking-wearable-eye-tracker-to-analyze-human-behavior/
  52. Weber J, Catchpole K, Becker AJ, Schlenker B, & Weigl M (2018). Effects of flow disruptions on mental workload and surgical performance in robotic-assisted surgery. World Journal of Surgery, 42(11), 3599–3607. 10.1007/s00268-018-4689-4
  53. Weigl M, Stefan P, Abhari K, Wucherer P, Fallavollita P, Lazarovici M, Weidert S, Euler E, & Catchpole K (2016). Intra-operative disruptions, surgeon’s mental workload, and technical performance in a full-scale simulated procedure. Surgical Endoscopy, 30(2), 559–566. 10.1007/s00464-015-4239-1
  54. Weiss K, Khoshgoftaar TM, & Wang DD (2016). A survey of transfer learning. Journal of Big Data, 3. Springer International Publishing. 10.1186/s40537-016-0043-6
  55. Wilson GF, & Russell CA (2003). Real-time assessment of mental workload using psychophysiological measures and artificial neural networks. Human Factors, 45(4), 635–643. 10.1518/HFES.45.4.635.27088
  56. Wu C, Cha J, Sulek J, Zhou T, Sundaram CP, Wachs J, & Yu D (2020). Eye-tracking metrics predict perceived workload in robotic surgical skills training. Human Factors, 62(8), 1365–1386. 10.1177/0018720819874544
  57. Yuksel BF, Oleson KB, Harrison L, Peck EM, Afergan D, Chang R, & Jacob RJK (2016). Learn piano with BACh: An adaptive learning interface that adjusts task difficulty based on brain state. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 10.1145/2858036.2858388
  58. Yurko YY, Scerbo MW, Prabhu AS, Acker CE, & Stefanidis D (2010). Higher mental workload is associated with poorer laparoscopic performance as measured by the NASA-TLX tool. Simulation in Healthcare, 5(5), 267–271. 10.1097/SIH.0b013e3181e3f329
  59. Zhou T, Cha JS, Gonzalez G, Wachs JP, Sundaram CP, & Yu D (2020). Multimodal physiological signals for workload prediction in robot-assisted surgery. ACM Transactions on Human-Robot Interaction, 9(2), 1–26. 10.1145/3368589
  60. Zhou X, Li Q, Kilsgaard S, Moradi F, Kappel SL, & Kidmose P (2016). A wearable ear-EEG recording system based on dry-contact active electrodes. IEEE Symposium on VLSI Circuits, Digest of Technical Papers, 7–8 September 2016. 10.1109/VLSIC.2016.7573559
