PLOS ONE. 2021 Feb 18;16(2):e0247117. doi: 10.1371/journal.pone.0247117

High density optical neuroimaging predicts surgeons’s subjective experience and skill levels

Hasan Onur Keles 1,*, Canberk Cengiz 2, Irem Demiral 3, Mehmet Mahir Ozmen 4, Ahmet Omurtag 5
Editor: Manabu Sakakibara 6
PMCID: PMC7891714  PMID: 33600502

Abstract

Measuring cognitive load is important for surgical education and patient safety. Traditional approaches to measuring surgeons' cognitive load use behavioural metrics to measure performance, and surveys and questionnaires to collect reports of subjective experience. These have disadvantages such as sporadic data, occasionally intrusive methodologies, and subjective or misleading self-reporting. In addition, traditional approaches use subjective metrics that cannot distinguish between skill levels. Functional neuroimaging data were collected using a high density, wireless NIRS device from sixteen surgeons (11 attending surgeons and 5 surgery residents) and 17 students while they performed two laparoscopic tasks (Peg transfer and String pass). Participants’ subjective mental load was assessed using the NASA-TLX survey. Machine learning approaches were used to predict subjective experience and skill level. Prefrontal cortex (PFC) activations were greater in students who reported higher-than-median task load, as measured by the NASA-TLX survey. However, in attending surgeons the opposite tendency was observed, namely higher activations in subjects with lower rather than higher task load. The response in students was greater in the left PFC, particularly near the dorso- and ventrolateral areas. We quantified the ability of PFC activation to predict differences in skill and task load using machine learning, focussing on the effects of NIRS channel separation distance on the results. Skill level and subjective task load could be classified from PFC activation with an accuracy of nearly 90%. Our findings show that the optical signals contain sufficient information to make accurate predictions about surgeons’ subjective experiences and skill levels.
The high accuracy of these results is encouraging and suggests that the strategy developed in this study is a promising approach for designing automated, more accurate and objective evaluation methods.

Introduction

Excessive workload or acute stress may affect surgeons’ ability to process all the information available during surgery in the operating room (OR), and may result in low situational and safety awareness and in impaired decision-making and performance. Under excessive workload, a surgeon may be more easily distracted, entertain fewer alternatives, or persist with ineffective strategies [1–3]. The extent to which mental workload degrades performance depends on the surgeon's experience. Expertise is characterized by a combination of high performance and low cognitive load, allowing expert surgeons to process larger amounts of information and respond appropriately to unexpected events [4]. Mastery of complex tasks correlates with a progressive decrease in mental workload [5, 6], and minimally invasive operations are more workload-intensive than open operations [7].

Several methods have been used to measure workload in surgery, including subjective rating scales and physiological measurements (EEG, EKG, etc.) [8]. One of the most widely used is the subjective rating scale known as the NASA-TLX [9–11], a multidimensional scale initially developed for use in the aviation industry. The NASA-TLX provides an overall index of mental workload as well as the relative contributions of six subscales: mental, physical, and temporal task demands; and effort, frustration, and perceived performance. Each subscale is rated on an interval scale ranging from low (1) to high (20). The widespread use of the NASA-TLX is due to its simplicity of application and interpretation. However, the NASA-TLX has been criticized for not measuring mental workload in real time [12, 13]: it is filled out after task completion and relies on participants’ recall of their cognitive effort during surgery. Moreover, administering the NASA-TLX can be intrusive to primary task performance. Thus, there is a need for more automated, more accurate and objective evaluation methods.

Psychophysiological measures allow a more objective workload assessment and can provide uninterrupted evaluation. They are gaining in popularity as progress in wearable sensor technology makes this approach less intrusive and capable of delivering continuous, multi-modal information. Electroencephalography (EEG), heart rate (HR) and heart rate variability (HRV) have been correlated with NASA-TLX scores [14, 15], as well as with expertise, task complexity and poor performance in surgery. Similarly, optical imaging with near-infrared spectroscopy (NIRS) has been used to assess the cognitive load of surgeons, capturing activation patterns in specific brain areas during surgical tasks that correlate with surgical expertise and technical performance [16–18].

NIRS is a developing technology for assessing the hemodynamic activity of the human cortex. NIRS devices can be portable and wireless. As a wearable and lightweight method, NIRS provides a safe and practical approach for monitoring surgeons’ brain activity, and may even be adopted for applications in the OR. Technical progress has also made it possible to use NIRS with high density and multiple source-detector separations [19]. NIRS is very sensitive to the superficial layers of the head, i.e. the skin and the skull, where systemic interference occurs, so the NIRS signal is contaminated with systemic interference of superficial origin. Our approach to this problem has been to use additional short source-detector separation channels as regressors. In this study, a high density NIRS device allowed us to investigate hemodynamic changes at different depths using different source-detector separations: 52 channels with a separation distance of 1.5 cm, 36 with 2.12 cm, 68 with 3 cm, and 48 with 3.35 cm.

In this paper, we investigate the cerebral hemodynamic correlates of the NASA-TLX by segregating the ratings into sets of high and low scores. We also use machine learning techniques to investigate the ability of prefrontal activations to discriminate between the subjects’ subjective experiences (high v low NASA-TLX score) and their skill levels (Student v Attending). Our aim is to show that the optical signals contain sufficient information to make accurate predictions about surgeons’ subjective experiences and skill levels. We focussed on how both the topographic location and the sampling depth of a channel (which depends on source-detector separation) affected the information available regarding the cognitive load and expertise of the subjects. We also examined the effects of superficial signal regression, using the signal from the shortest (1.5 cm) separation channels to reduce the non-cerebral component in the normal (3 cm) and long (3.35 cm) separation channels [20, 21]. Our automated approach could be used in place of the metrics currently established for certification in surgery. These results demonstrate that combining advanced fNIRS imaging with machine learning offers a practical and quantitative method for predicting subjective experience and skill level. The reported optical neuroimaging methodology is well suited to providing quantitative and standardized metrics for professional certification and surgical education.

Methods

Subjects

Sixteen surgeons (11 attending surgeons and 5 surgery residents) and 17 medical students participated in this study. The five surgery residents were excluded from further analysis because of the small sample size. Subject demographics are listed in S1 Table. To avoid issues related to hemisphere-specific activation, only right-handed participants were selected. All participants provided written informed consent before the study began. Participants had normal or corrected-to-normal vision. The Ethical Committee of the College of Medicine at Medipol University (10840098–604.01.01-E.33230) approved the study.

Experimental design

The study was conducted in a laboratory equipped with a laparoscopic trainer box. Each session took about 40 minutes, including the time needed to set up the system and devices and the time participants spent performing the tasks. At the beginning of the session, two 2-minute videos demonstrating the tasks were shown to the subjects on a computer screen. The rules and possible errors were explained and demonstrated. The subjects were informed that they were free to stop and leave the experiment at any time without any consequences. The 20 of 33 subjects who had no laparoscopic surgery training or experience were introduced to the surgical equipment and its use, and were given 10 minutes of free practice on the tasks to understand how the devices and equipment work. The 13 of 33 subjects who had previous laparoscopic surgery experience or training skipped this step.

The experiment comprised two Fundamentals of Laparoscopic Surgery (FLS) tasks, Peg Transfer and Threading, referred to as “Task 1” and “Task 2”, respectively. Each task was preceded by a 1-minute resting-state fNIRS recording and followed by the NASA-TLX questionnaire, which subjects completed to evaluate their own performance and task load in that task. In addition, subjects with no previous laparoscopic surgery experience or training completed a NASA-TLX Pre-Test after the training session.

The Peg Transfer task involved grasping, lifting and relocating rings from one rod to another using both laparoscopic graspers, and was performed on a ring stack base. Four rods, labelled 1, 2, 3 and 4, were selected at the bottom-left, top-left, top-right and bottom-right corners of the ring stack base, respectively. Four rings were placed over rod 1 at the beginning of the trial, and participants were instructed to move each ring individually from rod 1 to rod 4. First, the rings were picked up from rod 1 and placed on rod 2 one by one using the left hand only. Once all rings were on rod 2, each was grasped and lifted with the left hand, passed over to the right hand, and placed onto rod 3 using the right hand only. The procedure was completed by moving the rings individually from rod 3 to rod 4 using the right hand only. Two error types were defined for this task: dropping a ring during a transfer step, unless it fell onto the starting or target rod; and re-grasping a ring with the right-hand grasper after dropping it while passing it from the left hand to the right hand.
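The rod-to-rod sequence of the Peg Transfer task can be written as an explicit move schedule, which makes the three stages and the hand used in each easy to check. This is a hypothetical encoding for illustration only, not software used in the study; the ring numbering and the `is_drop_error` helper are assumptions.

```python
# Hypothetical encoding of the Peg Transfer protocol. Rods: 1 = bottom left,
# 2 = top left, 3 = top right, 4 = bottom right.
STAGES = [
    (1, 2, "left hand only"),
    (2, 3, "left hand grasps, hands over to right, right hand places"),
    (3, 4, "right hand only"),
]

def peg_transfer_schedule(n_rings=4):
    """Return the ordered (ring, from_rod, to_rod, hands) moves.

    All rings finish one stage before any ring starts the next stage."""
    moves = []
    for src, dst, hands in STAGES:
        for ring in range(1, n_rings + 1):
            moves.append((ring, src, dst, hands))
    return moves

def is_drop_error(landed_on_rod, src, dst):
    """Error type 1: a dropped ring counts as an error unless it lands on
    the starting or target rod (None means it did not land on a rod)."""
    return landed_on_rod not in (src, dst)

for move in peg_transfer_schedule():
    print(move)
```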

Threading involved grasping a piece of string and passing it through holes using both graspers, and was performed on a Threading Base. The start of the string was initially placed on the left-hand side of the base, and participants had to pass it through holes labelled 1–7 in a zigzag pattern, using both hands as they wished. The defined error type in this task was passing the string beside or behind a hole instead of through it and then continuing on to the following holes, often because of the visual distortion caused by 2-dimensional viewing on the screen.

Our experimental procedure is summarized in Fig 1. Following the introduction and training session, the subjects were given time to relax and prepare for the tasks. The NIRS device was then placed on the subjects’ heads. To obtain the best signal, the inferior border of the device was placed over the upper border of the nasion and eyebrows, any hair that might obstruct the optodes was gently moved aside, and the device's strap was adjusted and fitted according to the subject's verbal feedback. This was followed by calibration of the optodes for recording.

Fig 1. Experiment description.

(A) Schematic of the laparoscopic box simulator in which surgeons and students performed the FLS tasks. A high density, wireless NIRS device was used to measure functional brain activation. (B) Optode positions covering the frontal cortex (24 sources and 32 detectors). Red circles are sources; blue circles are detectors. (C) Channel numbers for the different source-detector separations. There are 204 channels in total: 52 with a separation distance of 1.5 cm, 36 with 2.12 cm, 68 with 3 cm, and 48 with 3.35 cm. (D) Schematic of the experimental protocol. All attending surgeons, residents and students performed the FLS tasks. After each NIRS recording, participants filled out the NASA-TLX.

After the optimal calibration was determined, sources of sound, extraneous light and any other distracting stimuli were removed from the experimental environment. At this point, the subjects who had completed the 10-minute training session were asked to fill in the NASA-TLX Pre-Test. All subjects were then reminded not to talk during the NIRS recordings. Next, the subjects were asked to stand still with their eyes closed for the resting-state recording until they were asked to start the task, and NIRS recording started. After 60 seconds, the resting-state recording was stopped, the task recording was started, and the subjects began Task 1. If the subject completed the task in less than 6 minutes, the task was finished, NIRS recording was paused and the time was recorded. Otherwise, the task was ended, recorded as unfinished, and NIRS recording was again paused. The NASA-TLX form was then given to the subjects.

After completing the NASA-TLX, the subject was asked to complete Task 2 following the same procedure (resting-state and task NIRS recordings, time recording, and completion of the NASA-TLX form), after which the experiment ended. The task completion time was recorded for each subject.

Optical imaging

Functional neuroimaging data were collected using a high density NIRS device (NIRSIT, OBELAB, Korea). This system has 24 laser sources (780/850 nm) and 32 photodetectors and samples at 8.138 Hz. The device provides 204 channels in total over the forehead. Each channel type covers areas overlying the lower parts of the dorsolateral prefrontal cortex, the upper part of the orbitofrontal and medial prefrontal cortex, and part of the ventrolateral prefrontal cortex.

The NIRS system measured signals at four source-detector (SD) separations, 15, 21.2, 30, and 33.5 mm, allowing changes to be measured at various depths. Movements were tracked in real time using a gyroscope and an accelerometer. Measurements were obtained from the prefrontal cortex, with the center of the lowermost optical probes aligned to the frontal pole zero (FPz) location of the 10–20 EEG system to reduce positional uncertainty between subjects.
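The four separations are consistent with a square optode lattice with a 1.5 cm pitch: 1.5 cm for neighbouring optodes, 1.5·√2 ≈ 2.12 cm along a diagonal, 2 × 1.5 = 3 cm two steps apart, and 1.5·√5 ≈ 3.35 cm for a knight's-move offset. The lattice layout is our assumption; the text gives only the separations and the channel counts, which the sketch also sums.

```python
import math

# Assumption: optodes lie on a square lattice with a 1.5 cm pitch, so a
# source-detector pair offset by (dx, dy) grid steps is pitch * sqrt(dx^2 + dy^2) apart.
PITCH_CM = 1.5

def separation_cm(dx, dy, pitch=PITCH_CM):
    return pitch * math.hypot(dx, dy)

OFFSETS = {"short": (1, 0), "diagonal": (1, 1), "normal": (2, 0), "long": (2, 1)}
for name, (dx, dy) in OFFSETS.items():
    print(f"{name:8s} ({dx}, {dy}) -> {separation_cm(dx, dy):.2f} cm")

# Channel counts per separation, as reported in the text
CHANNELS = {1.5: 52, 2.12: 36, 3.0: 68, 3.35: 48}
print("total channels:", sum(CHANNELS.values()))  # 204
```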

Data analysis

Preprocessing

The detector readings at the two wavelengths for each channel were converted into concentration changes of oxy- and deoxy-hemoglobin using the modified Beer-Lambert law [22]. The differential path length factor (DPF) values for 780 nm and 850 nm were 5.075 and 4.64, respectively [23, 24].
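The conversion can be sketched as follows. This is a minimal illustration of the modified Beer-Lambert law, assuming the log10 convention ΔOD = −log10(I/I0) and a baseline intensity I0 taken, for example, from the resting-state recording. The function name `mbll` is hypothetical, and only the DPF values come from the text. The extinction coefficients in the usage lines are placeholders chosen for a well-conditioned example, not tabulated values.

```python
import numpy as np

# DPF values from the text: 780 nm -> 5.075, 850 nm -> 4.64
DPF = np.array([5.075, 4.64])

def mbll(intensity, baseline, eps, sep_cm, dpf=DPF):
    """Modified Beer-Lambert law for a two-wavelength channel.

    intensity: (n_samples, 2) detector readings at 780 and 850 nm
    baseline:  (2,) reference intensities I0 (e.g. resting-state mean)
    eps:       (2, 2) extinction coefficients, rows = wavelength,
               columns = (HbO, HbR), in the same log10 convention
    Returns (n_samples, 2) concentration changes (dHbO, dHbR).
    """
    d_od = -np.log10(intensity / baseline)   # optical density change
    path = sep_cm * dpf                      # effective path length per wavelength
    # For every sample solve eps @ [dHbO, dHbR] = dOD / path
    return np.linalg.solve(eps, (d_od / path).T).T

# Round trip on synthetic data. PLACEHOLDER coefficients, not real values.
eps_placeholder = np.array([[0.7, 1.1],    # 780 nm: (HbO, HbR)
                            [1.1, 0.8]])   # 850 nm: (HbO, HbR)
true_dc = np.array([[0.010, -0.005],
                    [0.020, -0.010]])
baseline = np.array([1.0, 1.0])
d_od = (true_dc @ eps_placeholder.T) * (3.0 * DPF)   # forward model, 3 cm channel
intensity = baseline * 10.0 ** (-d_od)
recovered = mbll(intensity, baseline, eps_placeholder, sep_cm=3.0)
print(np.round(recovered, 6))
```

Coefficient tables differ in units and in natural-log versus log10 convention, so the `eps` matrix has to match the logarithm used for ΔOD.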

The haemoglobin time series thus obtained contain information about the local brain activity as well as extraneous components, considered as artifacts, arising from non-cerebral tissue and muscle activations, respiration and heartbeat, as well as from systemic physiological effects such as Mayer waves. Other common artefacts are signal transients due to head movement. These may create temporary variations in optical coupling leading to excursions in the detector readings. A comprehensive review of techniques designed to detect and minimise such artifacts is provided in [25].

The hemoglobin concentration changes were band-pass filtered between 0.01 and 0.5 Hz to attenuate components with characteristic durations shorter than ~2 s or longer than ~100 s. This removed the heart beat (~1 Hz), helped reduce some motion artefacts with sharp transients, and reduced slow baseline drift [26]; respiration (~0.3 Hz), however, lies inside this passband and was not removed by the filter. The characteristic scale of Mayer waves, with a period of ~10 s, partly overlaps with task-evoked hemodynamic responses, so we did not attempt to filter them out; however, they were not expected to influence our results, as Mayer waves are unlikely to correlate with cognitive load [27].
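A minimal sketch of a 0.01–0.5 Hz band-pass at the device's 8.138 Hz sampling rate is shown below. The text does not name the filter type, so an FFT "brick-wall" mask is swapped in here for brevity; the function name and the synthetic test signal are assumptions. Test components are placed exactly on FFT bins so the result can be checked without spectral leakage.

```python
import numpy as np

FS = 8.138  # Hz, device sampling rate from the text

def fft_bandpass(x, fs=FS, low=0.01, high=0.5):
    """Zero-phase brick-wall band-pass: drop FFT bins outside [low, high] Hz."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=x.size)

# Synthetic signal with components placed exactly on FFT bins
N = 4096
t = np.arange(N) / FS
f_task = 40 * FS / N    # ~0.08 Hz, inside the passband
f_resp = 151 * FS / N   # ~0.30 Hz, respiration range, also inside the passband
f_card = 500 * FS / N   # ~0.99 Hz, cardiac range, outside the passband
x = (5.0 + np.sin(2 * np.pi * f_task * t) + np.sin(2 * np.pi * f_resp * t)
     + np.sin(2 * np.pi * f_card * t))
y = fft_bandpass(x)     # offset and cardiac component removed
```

The check makes visible that a ~0.3 Hz component survives a 0.01–0.5 Hz passband while the ~1 Hz component and the constant offset do not. A brick-wall mask can ring around sharp transients; a Butterworth design from `scipy.signal.butter(..., btype="bandpass", output="sos", fs=FS)` applied with `scipy.signal.sosfiltfilt` is a common zero-phase alternative.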

Next, we used the windowed standard deviation to quantify the presence of motion artefacts [28]. This approach targeted motion artefacts whose excursions exceeded those of the concurrent physiological fluctuations; a motion artefact with a smaller standard deviation was, by this criterion, not considered severe. We calculated the standard deviation in non-overlapping 10 s windows, and then the median absolute deviation (MAD) of the set of window standard deviations for each channel. Any window whose standard deviation lay more than 4.5 MAD values from the median was considered an outlier and excluded from subsequent analysis.
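The window-based outlier rule can be sketched as follows, assuming the unscaled MAD (no 1.4826 consistency factor) and a two-sided reading of "4.5 MAD values from the median". The function name is hypothetical. At 8.138 Hz a 10 s window holds int(81.38) = 81 samples, and dropping any trailing partial window is also an assumption.

```python
import numpy as np

def flag_outlier_windows(x, fs=8.138, win_s=10.0, k=4.5):
    """Return (window SDs, boolean outlier mask) for one channel's signal."""
    n = int(win_s * fs)                        # samples per non-overlapping window
    n_win = len(x) // n                        # trailing partial window is dropped
    windows = np.asarray(x[: n_win * n], dtype=float).reshape(n_win, n)
    sds = windows.std(axis=1)
    med = np.median(sds)
    mad = np.median(np.abs(sds - med))         # unscaled median absolute deviation
    outliers = np.abs(sds - med) > k * mad     # two-sided criterion (assumption)
    return sds, outliers

# Synthetic channel: 30 windows of slowly growing amplitude, one large excursion
base = np.where(np.arange(81) % 2 == 0, 1.0, -1.0)
amps = 1 + 0.01 * np.arange(30)
amps[5] = 10.0
sds, outliers = flag_outlier_windows(np.concatenate([a * base for a in amps]))
print(np.flatnonzero(outliers))
```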

To confirm the plausibility of this approach, we visually inspected randomly selected segments of the recordings from multiple subjects and experimental conditions. The outliers tended to coincide with increases in acceleration as measured by the headset. Virtually all severe deflections were captured, while many outliers contained only mild fluctuations, suggesting that our criterion was conservative.

Feature extraction and selection

As an indicator of local activation in the prefrontal cortex, we computed the standard deviation of the oxyhemoglobin changes in each channel over 10 s windows. Because a greater evoked hemodynamic response tends to increase the standard deviation of the signal within a window, we took these values as the measure of prefrontal activation. The window mean of the signal may be a poorer indicator of activation when the evoked response is brief and followed by a dip. By separately exploring other variables such as the window mean, skewness and kurtosis, we determined that the standard deviation was the best indicator for the types of analysis reported in this paper. The standard deviation or variance has frequently been used in machine learning studies with fNIRS [29–31]; other feature extraction techniques are described in [32]. The activations were used to analyse the information available from channels with different separation distances. For some calculations we used the signals from the 1.5 cm channels to perform superficial signal regression (SSR) on the 3 and 3.35 cm channels, following the methodology in [33], in order to explore the ability of short-separation signals to eliminate the signal component originating from layers above the cortex. The features were prioritised by the Pearson correlation between the observations and the labels, a standard feature-selection technique [23 and references therein], and this prioritisation was used to select a small subset of the full set of features for classification. Other feature selection techniques were explored but not adopted, since they did not appear to improve classification accuracy. These were the Minimum Redundancy Maximum Relevance (MRMR) algorithm and chi-square tests, as implemented by Matlab’s functions fscmrmr and fscchi2, respectively (Matlab v.8.2.0.701; The MathWorks, Inc., Natick, Massachusetts, United States).
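The feature pipeline described above can be sketched as three small functions. Assumptions: SSR is done as ordinary least squares with an intercept against one short-separation channel (the text defers the exact method to [33]); the per-episode feature is the mean of the 10 s window standard deviations; and features are ranked by the absolute value of Pearson's r, since the text does not say whether the sign was used. All function names are hypothetical.

```python
import numpy as np

def ssr(long_sig, short_sig):
    """Remove the part of a long-separation signal explained by a nearby
    short-separation (1.5 cm) signal, using ordinary least squares."""
    X = np.column_stack([np.ones_like(short_sig), short_sig])
    beta, *_ = np.linalg.lstsq(X, long_sig, rcond=None)
    return long_sig - X @ beta

def episode_feature(hbo, fs=8.138, win_s=10.0, keep=None):
    """Mean over an episode of the per-window SD of HbO for one channel.
    `keep` is an optional boolean mask of windows that survived artefact rejection."""
    n = int(win_s * fs)
    sds = hbo[: (len(hbo) // n) * n].reshape(-1, n).std(axis=1)
    return sds[keep].mean() if keep is not None else sds.mean()

def rank_features(F, y):
    """Order the columns of F (observations x channels) by |Pearson r| with labels y."""
    Fc = F - F.mean(axis=0)
    yc = y - y.mean()
    r = (Fc * yc[:, None]).sum(axis=0) / np.sqrt((Fc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(-np.abs(r)), r

# Synthetic check of SSR: a superficial sinusoid mixed into a "cerebral" one
n = np.arange(800)
short = np.sin(2 * np.pi * 3 * n / 800)
cerebral = np.sin(2 * np.pi * 7 * n / 800)
cleaned = ssr(0.8 * short + cerebral, short)
```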

Classification

We examined the cerebral hemodynamic correlates of the NASA-TLX by segregating the ratings into sets of high and low scores. For each subject and episode, the mean score (the average over the six dimensions of the NASA-TLX) was used, and scores were split into high and low sets using the median score as the cut-off. Separate cut-off values were used for the Student and Attending groups of subjects. The statistical significance of the difference between two sets of values (e.g. high v low scores, or Student v Attending subjects) was determined with the non-parametric Kolmogorov-Smirnov test.
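The labelling step can be sketched as follows, assuming that a score exactly equal to its group median is labelled "low" (the text does not say how ties were handled). The block also computes the two-sample Kolmogorov-Smirnov statistic D; a p-value would normally come from a library routine such as `scipy.stats.ks_2samp`. Function names are hypothetical.

```python
import numpy as np

def median_split(tlx, groups):
    """tlx: (n, 6) NASA-TLX subscale ratings, one row per subject-episode.
    groups: length-n group names, e.g. "Student" or "Attending".
    Returns the mean score per row and a boolean 'high load' label,
    using a separate median cut-off within each group."""
    tlx = np.asarray(tlx, dtype=float)
    groups = np.asarray(groups)
    score = tlx.mean(axis=1)                   # average over the six dimensions
    high = np.zeros(len(score), dtype=bool)
    for g in np.unique(groups):
        idx = groups == g
        high[idx] = score[idx] > np.median(score[idx])   # ties count as low (assumption)
    return score, high

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = np.sort(a), np.sort(b)
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / len(a)
    cdf_b = np.searchsorted(b, pooled, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

tlx = np.repeat(np.array([2.0, 4.0, 6.0, 10.0, 12.0])[:, None], 6, axis=1)
score, high = median_split(tlx, ["Student"] * 3 + ["Attending"] * 2)
print(score, high)
```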

We also examined the ability of prefrontal activations to discriminate between the subjects’ subjective experiences (high v low NASA-TLX score) and their skill levels (Student v Attending) using machine learning techniques. The scores or skill levels were used as the binary labels to be classified. To this end, feature matrices were built in which each row (an observation) contained the standard deviations averaged over an episode (e.g. Task 1) for one subject, and each column corresponded to a channel; the columns therefore represented the available feature types. We then chose a small group of features from the prioritised list and used it to train Support Vector Machines (SVMs) with linear kernels. The accuracy of the SVMs was determined by 5-fold cross validation. Because each row represents an entire experimental episode for one subject, data from a subject in the training set could not also appear in the test set. This process was repeated 2000 times to obtain a distribution of accuracy values, revealing the variability due to different training-test partitions. The small set of features was then enlarged by progressively including more feature types and repeating the classification. This procedure allowed us to examine accuracy as a function of the number of features used, and to focus on particular subsets of the local activations.
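The repeated cross-validation with a growing feature set can be sketched in NumPy alone. This is not the authors' Matlab pipeline: a Pegasos-style subgradient solver is swapped in for a library linear SVM, features are standardised with training-fold statistics, folds are not stratified, and far fewer than 2000 repetitions are used to keep the example fast. All names and the synthetic data are assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient solver for a linear SVM
    (L2-regularised hinge loss). Labels y must be -1 or +1."""
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([X, np.ones(len(X))])   # bias folded in as a constant feature
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Xb[i] @ w) < 1    # margin checked before the update
            w *= 1.0 - eta * lam
            if violates:
                w += eta * y[i] * Xb[i]
    return w

def predict(w, X):
    return np.sign(np.column_stack([X, np.ones(len(X))]) @ w)

def repeated_cv_accuracy(X, y, order, n_features, n_repeats=20, k=5, seed=0):
    """Mean and SD of k-fold CV accuracy over repeated random partitions,
    using the first n_features columns of the prioritised list `order`."""
    rng = np.random.default_rng(seed)
    Xs = X[:, order[:n_features]]
    accs = []
    for _ in range(n_repeats):
        correct = 0
        for test in np.array_split(rng.permutation(len(y)), k):
            train = np.setdiff1d(np.arange(len(y)), test)
            mu = Xs[train].mean(axis=0)
            sd = Xs[train].std(axis=0) + 1e-12   # standardise with training statistics
            w = train_linear_svm((Xs[train] - mu) / sd, y[train])
            correct += np.sum(predict(w, (Xs[test] - mu) / sd) == y[test])
        accs.append(correct / len(y))
    return float(np.mean(accs)), float(np.std(accs))

# Synthetic example: 30 observations x 8 "channels", one informative channel
rng = np.random.default_rng(1)
y = np.repeat([-1, 1], 15)
X = rng.normal(size=(30, 8))
X[:, 0] += 2.0 * y
r = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
order = np.argsort(r)[::-1]                      # prioritised by |Pearson r|
for m in (1, 2, 4, 8):
    print(m, repeated_cv_accuracy(X, y, order, m, n_repeats=5))
```

Ranking features on the full data set before cross validation, as in the usage lines, lets label information leak into the test folds; repeating the ranking inside each training fold avoids this at extra cost.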

Statistical analysis

To assess the statistical significance of the difference between two groups of paired results, we used the non-parametric Wilcoxon signed-rank test. The descriptive results (Figs 2–4) comparing two groups, such as low v high cognitive load or student v attending subjects, contained paired data for each channel. The null hypothesis was that the results from both groups were drawn from the same population. In testing this hypothesis we used the Bonferroni procedure, which ensures that if each of the k tests is declared significant only when p < α/k, the null hypothesis will be falsely rejected with a probability no greater than α. We set α = 0.05 and k = 16, the product of the four channel separations, the two prefrontal sides (right v left) and the two tasks. In assessing the statistical significance of differences in the topographic representation (Fig 5), we consulted the corresponding box plots in Fig 4, as explained in the Results. The statistical significance of the prediction accuracy was determined using the permutation technique [34]. In this technique we randomly shuffled the labels and reassigned them to the observations, creating a surrogate data set. This was then classified following the same steps of feature selection and cross validation, generating a null distribution of accuracy values. To avoid crowding the plots with indications of statistical significance, in presenting the classification results (Figs 6–8) we show the null and actual distributions as shaded areas, so that the ranges where they were sufficiently distinct (and likely statistically significantly different) are visually evident.

Fig 2. Prefrontal activation associated with high and low NASA-TLX scores.

The channel separation distance for each column is labelled at the top of the figure. Only Student subjects are shown. The high/low cut-off was the median score. On each box, the central mark indicates the median, and the bottom and top edges indicate the 25th and 75th percentiles. Circles represent individual channels. A statistically significant difference between high and low scores is indicated by an asterisk above the boxes. The right and left prefrontal activations are shown separately, as labelled at the bottom of the plot. The results for tasks 1 and 2 are shown in the top and bottom rows, respectively. For the values shown in this figure, please see S2 Table.

Fig 4. Prefrontal activations associated with student and attending subjects for task 1 and task 2.

For the values shown in this figure, please see S4 Table.

Fig 5. Topographic projection of the prefrontal activations in task 1, for the student (top row) and attending (bottom) subjects.

Different channel separation distances are shown as different columns as labelled at the top of the figure.

Fig 6. Accuracy of classification of the NASA-TLX score of student subjects in task 1.

The scores were segregated into sets of high and low values using the median score as the cut-off, and were predicted from prefrontal activations using a support vector machine. A-D show the accuracy as a function of the number of channels used. The thick curves are the mean of 2000 repetitions of 5-fold cross-validation with different partitions. The shaded region indicates the standard deviation of the accuracy across repetitions. The results (blue curve) are compared with those from a surrogate set that contained the same data but with the scores randomly reassigned prior to classification (black). E-H show frontal-view locations of the selected channels that resulted in high accuracy.

Fig 8. Accuracy of classifying subjects into student v attending in task 1 for channel separations of 3 and 3.35 cm, using superficial signal regression (SSR) from the 1.5 cm channels.

This figure is the counterpart to subplots C-D of Fig 6 using SSR. The channel locations in C are not shown since the accuracy was low overall.

Results

In this section we present the results of our analysis of the fNIRS data recorded from Student and Attending subjects. We hypothesized that task-related PFC activations (measured from changes in the optical signal as described in Methods) would reflect the subjects' subjective experience and skill level, and that the activations would differ across the sampling depths of channels with different separation distances. Subjective experience was assessed with the NASA-TLX questionnaire. We considered subjects in two skill-level groups: Students, who had no laparoscopic experience or training, and Attending surgeons, who had previously performed a median of 75 laparoscopic operations (Table 1). We first compared the activations of high v low NASA-TLX scoring subjects (Figs 2 & 3), then the effects of skill level (Fig 4), and the topographic distributions of the differences due to skill level (Fig 5). We next used automated classification to quantify the ability of the fNIRS signals to predict subjective experience during the training tasks, as well as the skill level of the subjects (Figs 6 and 7).

Table 1. Subject demographics and descriptive data.

| Group | N | SE, years: median (range) | SE, years: mean ± SD | Age: median (range) | Age: mean ± SD | LSE, number: median (range) | LSE, number: mean ± SD |
|---|---|---|---|---|---|---|---|
| Undergraduate Student | 17 | 0 (0–0) | 0 ± 0 | 19 (18–27) | 19.8 ± 2.2 | 0 (0–0) | 0 ± 0 |
| Surgery Resident | 5 | 1.5 (0.2–3) | 1.8 ± 1 | 28 (26–29) | 27.6 ± 1 | 0 (0–40) | 9 ± 15.6 |
| Attending Surgeon | 11 | 12 (5–30) | 12 ± 6.8 | 37 (28–55) | 37.4 ± 7 | 75 (8–350) | 133 ± 120.8 |

SE: surgery experience; LSE: laparoscopic surgery experience.

Fig 3. Prefrontal activations associated with high and low NASA-TLX scores.

This figure shows the same kind of results as Fig 2, but for the Attending subjects. For the values shown in this figure, please see S3 Table.

Fig 7. Accuracy of classifying subjects into student v attending in task 1.

The subplots represent the same kind of information as in Fig 6.

Group differences

Fig 2 shows that Student subjects who experienced higher task load also had higher PFC activations. Each channel is shown as a circle; the box indicates the range from the 25th to the 75th percentile, and the line inside the box shows the median. The y-axis is the standard deviation of the oxyhemoglobin changes averaged over an episode (Task 1 or 2) and over subjects. The variability of the response appeared lowest in the shortest-separation channels and greatest in the longest-separation (3.35 cm; D and H) channels. The difference between the right and left PFC in the 3 cm channels was significant in the low-load students.

Fig 3 presents the corresponding results for the Attending subjects. While the variability in the response showed a similar tendency to increase with channel separation, the difference between the high and low task loaded subjects was reversed relative to that of the Student subjects. Attending subjects who reported higher task load generally had lower PFC activations, the difference being statistically significant only for the shortest separation in the right PFC in Task 1.

Having examined the fNIRS correlates of task load, we turned to the effects of differences in skill level. Fig 4 shows that lack of prior laparoscopic experience correlated with higher PFC activation: Students had significantly higher activations than Attending surgeons at most sampling depths. In addition, for the Student subjects there was a pronounced asymmetry in the deepest sampling channels (D and H), with the activation on the left higher than on the right, although this asymmetry did not reach statistical significance.

The box plots presented so far show the values obtained from each channel; however, it is instructive to see the locations of the individual channels together with their activations. To that end, Fig 5 presents colour-coded frontal views of activation projected onto a drawing of the brain to illustrate the approximate locations. Values were interpolated between channels to show spatial changes over a continuous field of activation. This figure presents only the results for Task 1. In the short channels (A, E, B, F), the difference between Student and Attending subjects appears as a small difference in tone distributed over the entire region; for example, the blue in A and B is somewhat lighter than in E and F. This difference is the counterpart of the relatively shifted positions of the boxes in Fig 4A & 4B.

In the normal-separation channels (C and G) there is a hint of high activation localized near the top left, within the dorsolateral PFC. The localization of higher activation in Student subjects is most apparent in the deep sampling channels (D and H). In Student subjects (D), higher activation is visible by inspection in parts of the left lateral PFC and the orbitofrontal cortex. Some activation is localized near the lower part of the right dorsolateral PFC, although it is lower than that on the left.

To evaluate the statistical significance of the differences between the Student (first row) and Attending (second row) subjects in Fig 5, we used the corresponding box plots in Fig 4. Consider, for example, the higher PFC activation in Students relative to the Attending population, namely the yellow regions in Fig 5D contrasted with the blue of the same regions in Fig 5H. This difference was significant because, as the box plots in Fig 4D show, the activation in the left PFC of the Student population was significantly different from that of the Attending population.

Machine learning results

The results so far showed the differences in PFC activations of subjects who reported different cognitive load and had different skill levels. In order to delve deeper into these differences, and quantify the amount of information that PFC activation may contain with regard to subjective experience, we used machine learning techniques. Using progressively greater numbers of features (prioritized as described in Methods) to predict subjects’ NASA-TLX score led to the accuracies shown in Fig 6. As before, each column shows the results for a different channel separation distance. The top row of the figure (A-D) indicates that the accuracy and its range of variability under repeated cross validation (blue curve and shaded) are greater than expected by chance alone (black curve and gray shaded). In the short separation channels (A-B) the accuracy depends relatively little on the number of features included in classification. In B, the accuracy rises slightly with increasing number of features.

By contrast, in the normal separation channels (Fig 6C) the accuracy is high with a small number of features and decreases quickly as more features are added. To visualize the locations of the channels responsible for the highest accuracy, we show them in the topographic plots directly underneath. For example, G shows the four 3 cm separated channels that form the small feature set whose accuracy is shown to be nearly 90% in C. The accuracy of the deep sampling channels (D) peaks at around 12 channels, whose locations are shown in H. For the short separation channels the accuracy plots (A–B) are relatively flat, so we show a larger number of channels (the first 20) from the prioritised list (E–F).

Fig 7 gives the corresponding results for the prediction of skill levels. As in the previous figure, the classifying ability of the short channels is spatially distributed, producing a relatively flat curve of accuracy as a function of the number of features (A–B). This is reflected in the wide coverage of the locations of the first 20 channels (E–F). The normal separation channels give the highest classification accuracy for small feature sets (C), and the locations of the first 4 channels are shown (G). Unlike Fig 6, which concerned task load, Fig 7 shows that when classifying skill levels only a small number of deep sampling channels (H) participate in generating the highest accuracy (D).

Some of the results presented above were from Task 1 only (Figs 4–7). Their counterparts from Task 2 were on the whole similar, with only minor differences, and are not shown.

Discussion

In this paper we used high-density continuous-wave fNIRS data recorded from students and attending surgeons to show the extent of group differences in PFC activation between subjects with different levels of skill and subjective task load. We quantified the ability of PFC activation to predict the differences in skill and task load while focussing on the effects of fNIRS channel separation distance on the results. Our recordings were carried out using 52 channels with a separation distance of 1.5 cm, 36 with 2.12 cm, 68 with 3 cm, and 48 with 3.35 cm, each type of channel covering areas coinciding with the lower parts of the dorsolateral prefrontal cortex, the upper part of the orbitofrontal and medial prefrontal cortex, and part of the ventrolateral prefrontal cortex (Fig 1). Our measure of PFC activation was based on the optically detected variability in the local changes in oxyhemoglobin concentration. We also examined the effects of superficial signal regression (SSR), using the signal from the shortest (1.5 cm) separated channels to minimize the non-cerebral component of the normal (3 cm) and long (3.35 cm) separated channels.
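
The activation measure and channel layout described above can be sketched as follows. The text says only that activation is based on the variability of local oxyhemoglobin changes; taking the temporal standard deviation of ΔHbO in each channel, the `(n_samples, n_channels)` array layout, and the function name are assumptions for illustration. The channel counts per separation are those reported in the paragraph above.

```python
import numpy as np

# Channel counts per source-detector separation (cm), as reported in the text.
CHANNELS_PER_SEPARATION_CM = {1.5: 52, 2.12: 36, 3.0: 68, 3.35: 48}

def activation_by_separation(hbo, separations_cm):
    """Per-channel activation as the temporal SD of delta-HbO (an assumption),
    grouped by channel separation.

    hbo            : (n_samples, n_channels) oxyhemoglobin changes for one task
    separations_cm : (n_channels,) separation of each channel, in cm
    Returns {separation: activations of the channels at that separation}.
    """
    hbo = np.asarray(hbo, dtype=float)
    seps = np.asarray(separations_cm, dtype=float)
    act = hbo.std(axis=0, ddof=1)  # variability over time in each channel
    return {s: act[np.isclose(seps, s)] for s in sorted(set(seps.tolist()))}
```

Grouping by separation in this way mirrors the column-per-separation layout of the figures, so each group can be analysed or classified separately.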

The PFC activation was greater in students who reported higher-than-median task load, as measured by the NASA-TLX survey. This difference, visible in most channel separations, was statistically significant in all the 3 cm separated channels (Fig 2). Higher engagement of the PFC with greater task load is well known from previous studies [e.g. 25]. In attending subjects, however, the opposite tendency was observed, namely higher activation in the subjects reporting lower task load than in those reporting higher task load. This reversal was statistically significant in some of the 1.5 cm separated channels (Fig 3). This change in the skilled subjects’ PFC response relative to that of unskilled ones may have arisen because parts of our experimental procedure and the stylized tasks unavoidably differed from the actual laparoscopic operations to which the skilled participants were accustomed. In such non-optimal situations the experts’ established schemas are sometimes not applicable, leading to higher cognitive load than in novices, a phenomenon known as the expertise reversal effect [35].
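
The high and low task-load groups above come from a median split of NASA-TLX scores. A minimal sketch follows; the rule that ties at the median go to the low group is an assumption, and the two-sided permutation test on the difference of group means is a generic illustration, not necessarily the statistical test used for Figs 2 and 3.

```python
import numpy as np

def median_split(scores):
    """Label a subject 1 if their NASA-TLX score is above the group median, else 0.
    Scores equal to the median go to the low group (assumption)."""
    scores = np.asarray(scores, dtype=float)
    return (scores > np.median(scores)).astype(int)

def permutation_pvalue(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # reassign group membership at random
        diff = perm[:a.size].mean() - perm[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # The +1 terms count the observed split itself and keep p strictly positive.
    return (count + 1) / (n_perm + 1)
```

For a single channel, `permutation_pvalue(act[labels == 1], act[labels == 0])` would compare activation between the high and low groups; testing many channels this way would also require a multiple-comparison correction.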

We found that the response was greater in the left PFC of students (Fig 4D and 4H), particularly near the dorso- and ventrolateral areas (Fig 5D), in accord with the known dominance of the left hemisphere in motor action. Previous studies have linked the left hemisphere to behavioural efficiency regardless of the subjects’ handedness [36], as well as to interference processing [37], the neural representation of grasping [38], bimanual coordination [39], and overall movement organisation and selection [40]. Interestingly, localisation and lateralization arose in our data only in the deeper sampling channels (with a separation distance of 3.35 cm), suggesting a cerebral origin. This asymmetry did not arise in attending subjects, presumably because their skilled activity had become relatively automated and its control had shifted to non-frontal cortical areas [41, 42]. Note that the greater access to cerebral activity came at the cost of a lower signal-to-noise ratio, reflected in the increasing response variability with channel separation distance (Fig 4A–4D). Several studies have used source-detector separation distances in the range 2.5–3.5 cm for interrogating brain activity, and our longest separation is within this range [43–45].

Our results showed that skill level and subjective task load could be predicted from PFC activation. In the machine learning analysis, the activations were first prioritised by their Pearson correlation with the prediction targets (student v attending, or high v low subjective score); a small subset of the highest-priority features (channels) was then selected for cross-validated prediction of the targets using a support vector machine (SVM), and the cross-validated prediction was repeated while progressively increasing the number of features. The resulting prediction accuracy (y-axis) as a function of the number of channels (x-axis) is shown for subjective task load in Fig 6 and for skill level in Fig 7. Repeating the calculation with different training/test partitions of the data introduced variability into the results. We used this variability to generate a distribution of results from 2000 repetitions, whose standard deviation is indicated by the blue shaded region in the figures.
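
The prioritise, select, and classify loop described above can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: a linear SVM trained by full-batch subgradient descent on the hinge loss stands in for a library SVM, features are z-scored with training-fold statistics, cross-validation is stratified 5-fold, and far fewer than the 2000 reported repetitions are used so the example runs quickly. Following the text, channels are ranked once on all the data before cross-validation; the label-scrambled null described next is what accounts for the optimism this introduces.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss.
    y must be coded -1/+1. Returns weights w and bias b."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, epochs + 1):
        active = y * (X @ w + b) < 1  # samples violating the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        step = lr / np.sqrt(t)        # decaying step size
        w = w - step * grad_w
        b = b - step * grad_b
    return w, b

def stratified_folds(y, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        for i, part in enumerate(np.array_split(idx, k)):
            folds[i].extend(part.tolist())
    return [np.array(f, dtype=int) for f in folds]

def cv_accuracy(X, y, k, rng):
    """Stratified k-fold cross-validated accuracy of the linear SVM."""
    correct = 0
    for test in stratified_folds(y, k, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-12
        w, b = train_linear_svm((X[train] - mu) / sd, y[train])
        pred = np.where(((X[test] - mu) / sd) @ w + b >= 0, 1, -1)
        correct += int((pred == y[test]).sum())
    return correct / len(y)

def accuracy_curve(X, labels, max_features, n_reps=20, k=5, seed=0):
    """Mean and SD over repeated partitions of CV accuracy vs number of top channels."""
    X = np.asarray(X, dtype=float)
    y = np.where(np.asarray(labels) > 0, 1, -1)
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = (Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    order = np.argsort(-np.abs(r), kind="stable")  # channel priority by |Pearson r|
    rng = np.random.default_rng(seed)
    acc = np.empty((n_reps, max_features))
    for rep in range(n_reps):
        for m in range(1, max_features + 1):
            acc[rep, m - 1] = cv_accuracy(X[:, order[:m]], y, k, rng)
    return acc.mean(axis=0), acc.std(axis=0)
```

The returned mean and SD correspond to the blue curve and shading in Figs 6 and 7; a library SVM could be substituted for `train_linear_svm` without changing the rest of the loop.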

To examine the statistical validity of the machine learning analysis, we produced surrogate data by randomly scrambling the target labels of the original data and repeated the procedure of feature selection and classification by cross-validation. The mean of this null population of accuracies is shown by the black curves and its standard deviation by the gray shaded regions in Figs 6 and 7. With a sufficiently large data set and balanced classes, the null accuracy would be expected to approach 50%; however, the null accuracy in our results was clearly biased above 50%. This was because we repeated the entire analysis, including feature selection, with the surrogate data. Consequently, when features were selected according to their correlation with the scrambled labels, spuriously correlated features were assigned high priority, and they generated null accuracies above 50%. As expected, the null accuracy declined toward 50% as larger numbers of features were included, as the figures show. This procedure allowed us to identify the part of the accuracy that was truly above chance level.
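
The surrogate-data procedure above can be sketched as follows: the labels are permuted, and the whole pipeline, including the correlation-based channel ranking, is rerun on each surrogate so that spuriously correlated channels are selected exactly as they would be for real labels. To keep the block short, a leave-one-out nearest-centroid classifier stands in for the cross-validated SVM; this substitution, the function names, and the number of surrogates are assumptions.

```python
import numpy as np

def rank_by_abs_pearson(X, y):
    """Channel indices ordered by |Pearson r| with the labels, highest first."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = (Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(r), kind="stable")

def loo_nearest_centroid(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    n, correct = len(y), 0
    for i in range(n):
        train = np.arange(n) != i
        c0 = X[train & (y == 0)].mean(axis=0)
        c1 = X[train & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += int(pred == y[i])
    return correct / n

def null_accuracy_curve(X, y, max_features, n_surrogates=100, seed=0):
    """Mean and SD of accuracy vs number of channels under label scrambling.
    Feature ranking is recomputed for every surrogate label vector."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    rng = np.random.default_rng(seed)
    acc = np.empty((n_surrogates, max_features))
    for s in range(n_surrogates):
        y_s = rng.permutation(y)                           # scrambled labels
        order = rank_by_abs_pearson(X, y_s.astype(float))  # redo feature selection
        for m in range(1, max_features + 1):
            acc[s, m - 1] = loo_nearest_centroid(X[:, order[:m]], y_s)
    return acc.mean(axis=0), acc.std(axis=0)
```

Because selection is repeated on every surrogate, the resulting null curve inherits the same optimism as the real-label curve, which is why it can sit above 50% for small channel counts.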

Figs 6 and 7 show that the shorter separation channels (1.5 and 2.12 cm) have predictive power distributed over the frontal regions, which may indicate correlations between the prediction targets and systemic effects picked up from the layers above the cortex. By contrast, the longer separation channels reached their peak accuracy with only a few channels (Figs 6C, 6G, 7C, 7D, 7G and 7H), suggesting that the discrimination between levels of task load and skill may be associated with specific PFC areas acting in concert. This finding is consistent with previous results [46] and may serve as a basis for a practical, lightweight brain-computer interface.

To further investigate the relative contributions of the superficial and cerebral layers we performed SSR, which was designed to minimize the superficial contribution in each longer separated channel [33]. Fig 8 indicates that SSR drastically reduced the accuracy of the longer separated channels, with a greater reduction in the 3 cm separated channels. In the longest separated channels, which probe deeper, post-SSR accuracy was still reasonably high and peaked with a small number of channels (Fig 8B, to be contrasted with Fig 7D). This supports the conclusion that the subjects’ level of expertise had a significant effect on the task-related PFC activation, as shown by the response from the longest separated channels. The interpretation of the overall reduction in accuracy in Fig 8 is less clear, however. One possibility is that the superficial signal was responsible for most of the accuracy in Fig 7C and was mostly eliminated by SSR, resulting in Fig 8A. Alternatively, the cerebral signal may have contributed substantially to the accuracy in Fig 7C, but SSR removed much of it because the regressor was taken from channels whose separation (1.5 cm) was not sufficiently short.
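
Superficial signal regression, as used above, removes from each long-separation channel the part of its signal that is linearly predictable from short-separation channel(s). The sketch below uses ordinary least squares with an intercept; which 1.5 cm channel(s) serve as the regressor for each long channel is not specified in this section, so that pairing, the array shapes, and the function name are assumptions.

```python
import numpy as np

def short_separation_regression(long_sig, short_sig):
    """Return long_sig with its least-squares fit on short_sig removed.

    long_sig  : (n_samples,) delta-HbO from one long-separation channel
    short_sig : (n_samples,) or (n_samples, m) delta-HbO from the short
                channel(s) chosen as regressors (pairing is an assumption)
    """
    y = np.asarray(long_sig, dtype=float)
    S = np.asarray(short_sig, dtype=float)
    if S.ndim == 1:
        S = S[:, None]
    design = np.column_stack([S, np.ones(len(y))])  # regressors plus intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Residual: the part of the long channel not explained by the short regressor(s).
    return y - design @ beta
```

By construction the residual is uncorrelated with the regressor. If the "short" channel also samples some cortex, as a 1.5 cm channel may, part of the cerebral signal is removed along with the superficial one, which is one of the two interpretations offered above for the drop in accuracy.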

We repeated some of the above calculations with the signals from the motion sensors of the fNIRS headset included as additional features. These comprised three time series from the accelerometer and three from the gyroscope. Including the motion signals as additional predictors had only minor and unsystematic effects on the accuracy. In additional calculations, we used the motion signals as the only features in classification, and this yielded accuracies no greater than chance. These results further suggest that motion artifacts had been sufficiently minimized by our preprocessing steps and did not drive our results.
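
The two motion control analyses above can be set up as sketched below. Each of the six inertial time series (three accelerometer axes and three gyroscope axes) is summarised by a single feature and then either appended to the fNIRS activations or used on its own. The per-axis standard deviation as the summary statistic, the array layouts, and the function names are assumptions for illustration.

```python
import numpy as np

def motion_features(accel, gyro):
    """Six motion features for one subject: temporal SD of each accelerometer
    and gyroscope axis (the summary statistic is an assumption).
    accel, gyro : (n_samples, 3) arrays for one subject and task."""
    accel = np.asarray(accel, dtype=float)
    gyro = np.asarray(gyro, dtype=float)
    return np.concatenate([accel.std(axis=0, ddof=1), gyro.std(axis=0, ddof=1)])

def build_design_matrices(fnirs_act, accel_list, gyro_list):
    """Return (fNIRS + motion) and (motion only) feature matrices.
    fnirs_act : (n_subjects, n_channels) activation matrix."""
    motion = np.vstack([motion_features(a, g) for a, g in zip(accel_list, gyro_list)])
    combined = np.hstack([np.asarray(fnirs_act, dtype=float), motion])
    return combined, motion
```

Running the same classification pipeline on `combined` and on `motion` gives the two comparisons described: whether motion adds predictive information, and whether motion alone exceeds chance.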

Our study had some limitations. We used standard optical topography, which has several disadvantages. First, two-dimensional imaging uses sparse arrangements of source and detector optodes, so the positions of the measurement channels do not always overlap the real activation foci; consequently, the spatial resolution of fNIRS imaging is low compared to fMRI. Second, the positions of the measurement channels relative to brain anatomy vary among subjects, reducing the reliability of comparisons among subjects. We plan to use diffuse optical tomography to address these resolution problems in future studies.

Short separation channels are essential for accurate fNIRS measurements because they enable the extra-cerebral signal contribution to be regressed out of standard separation channels. This reduces the chance that extra-cerebral hemodynamics will be falsely interpreted as functional brain activation. Our results indicate that SSR using the 15 mm channels drastically reduced the accuracy. This may have been because a 15 mm separation is not short enough for the channel to serve as a purely superficial regressor. According to the literature [20, 21], the optimum short-separation distance in the typical adult is 8.4 mm, although it varies across the scalp. The effects of channel separation distance and non-cerebral blood flow have been reported in numerous previous studies [20, 47–51].

The number of surgery residents was too small for statistical analysis; recruitment was limited by timeline, scheduling, and logistical problems. We therefore did not investigate the small group of surgery residents separately or compare them with students and attending surgeons.

In addition, we did not investigate the longitudinal learning of surgical skills over time. There is limited research on longitudinal surgical skill learning, and this could be considered in future studies.

Conclusion

In this study, we have evaluated prefrontal haemodynamic responses to a set of surgical tasks in two groups: students and attending surgeons. We quantified the ability of PFC activation to predict the differences in skill and task load between the two groups while focussing on the effects of fNIRS channel separation distance on the results. The high accuracy of the results is encouraging and suggests that the strategy developed in this study is a promising basis for automated, more accurate, and objective evaluation methods. Optical imaging may therefore complement, and potentially improve upon, currently established subjective methods by bringing objectivity and accuracy to the measurement of mental workload and the prediction of cognitive load. In particular, this approach may be extended to identify surgical candidates who are likely to achieve faster learning curves for complex surgical skills and, by extension, to attain technical and non-technical mastery at a faster rate than other surgical trainees. In summary, we hope that this neuroimaging approach to objective quantification will contribute toward a paradigm change in broad applications, such as surgical certification and assessment, aviation training, and motor skill rehabilitation and therapy.

Supporting information

S1 Fig. Accuracy of classification of the NASA-TLX score in task 1.

This figure shows the same results as Fig 5, but for attending subjects. No channel locations are indicated in H because the accuracy remained at chance level.

(TIFF)

S1 Table. Subject demographics, task completion times and task NASA-TLX scores.

(DOCX)

S2 Table. Prefrontal activation associated with high and low NASA-TLX scores for student subjects.

(See Fig 2).

(XLSX)

S3 Table. Prefrontal activations associated with high and low NASA-TLX scores.

(See Fig 3).

(XLSX)

S4 Table. Prefrontal activations associated with student and attending subjects for task 1 and task 2.

(See Fig 4).

(XLSX)

Data Availability

All relevant data are within the manuscript and its Supporting information files.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Chrouser KL, Xu J, Hallbeck S, Weinger MB, Partin MR. The influence of stress responses on surgical performance and outcomes: Literature review and the development of the surgical stress effects (SSE) framework. Am J Surg. 2018;216: 573–584. 10.1016/j.amjsurg.2018.02.017
2. Carswell CM, Clarke D, Seales WB. Assessing mental workload during laparoscopic surgery. Surg Innov. 2005;12: 80–90. 10.1177/155335060501200112
3. Patil P V, Hanna GB, Cuschieri A. Effect of the angle between the optical axis of the endoscope and the instruments’ plane on monitor image and surgical performance. Surg Endosc. 2004;18: 111–114. 10.1007/s00464-002-8769-y
4. Dias RD, Ngo-Howard MC, Boskovski MT, Zenati MA, Yule SJ. Systematic review of measurement tools to assess surgeons’ intraoperative cognitive workload. Br J Surg. 2018;105: 491–501. 10.1002/bjs.10795
5. Schneider W, Shiffrin RM. Controlled and automatic human information processing: I. Detection, search, and attention. Psychol Rev. 1977;84: 1–66. 10.1037/0033-295X.84.1.1
6. Keehner MM, Tendick F, Meng M V, Anwar HP, Hegarty M, Stoller ML, et al. Spatial ability, experience, and skill in laparoscopic surgery. Am J Surg. 2004;188: 71–75. 10.1016/j.amjsurg.2003.12.059
7. Prabhu A, Smith W, Yurko Y, Acker C, Stefanidis D. Increased stress levels may explain the incomplete transfer of simulator-acquired skill to the operating room. Surgery. 2010;147: 640–645. 10.1016/j.surg.2010.01.007
8. Charles RL, Nixon J. Measuring mental workload using physiological measures: A systematic review. Appl Ergon. 2019;74: 221–232. 10.1016/j.apergo.2018.08.028
9. Guru KA, Shafiei SB, Khan A, Hussein AA, Sharif M, Esfahani ET. Understanding Cognitive Performance During Robot-Assisted Surgery. Urology. 2015;86: 751–757. 10.1016/j.urology.2015.07.028
10. Hardy DJ, Wright MJ. Assessing workload in neuropsychology: An illustration with the Tower of Hanoi test. J Clin Exp Neuropsychol. 2018;40: 1022–1029. 10.1080/13803395.2018.1473343
11. Yurko YY, Scerbo MW, Prabhu AS, Acker CE, Stefanidis D. Higher mental workload is associated with poorer laparoscopic performance as measured by the NASA-TLX tool. Simul Healthc. 2010;5: 267–271. 10.1097/SIH.0b013e3181e3f329
12. de Winter JCF. Controversy in human factors constructs and the explosive use of the NASA-TLX: A measurement perspective. Cogn Technol Work. 2014;16: 289–297. 10.1007/s10111-014-0275-1
13. McKendrick RD, Cherry E. A Deeper Look at the NASA TLX and Where It Falls Short. Proc Hum Factors Ergon Soc Annu Meet. 2018;62: 44–48. 10.1177/1541931218621010
14. Zakeri Z, Mansfield N, Sunderland C, Omurtag A. Physiological correlates of cognitive load in laparoscopic surgery. Sci Rep. 2020;10: 12927. 10.1038/s41598-020-69553-3
15. Rieger A, Stoll R, Kreuzfeld S, Behrens K, Weippert M. Heart rate and heart rate variability as indirect markers of surgeons’ intraoperative stress. Int Arch Occup Environ Health. 2014;87: 165–174. 10.1007/s00420-013-0847-z
16. Leff DR, Elwell CE, Orihuela-Espina F, Atallah L, Delpy DT, Darzi AW, et al. Changes in prefrontal cortical behaviour depend upon familiarity on a bimanual co-ordination task: an fNIRS study. Neuroimage. 2008;39: 805–813. 10.1016/j.neuroimage.2007.09.032
17. Modi HN, Singh H, Orihuela-Espina F, Athanasiou T, Fiorentino F, Yang G-Z, et al. Temporal Stress in the Operating Room: Brain Engagement Promotes “Coping” and Disengagement Prompts “Choking”. Ann Surg. 2018;267: 683–691. 10.1097/SLA.0000000000002289
18. Shetty K, Leff DR, Orihuela-Espina F, Yang GZ, Darzi A. Persistent prefrontal engagement despite improvements in laparoscopic technical skill. JAMA Surg. 2016;151: 682–684. 10.1001/jamasurg.2016.0050
19. Khoe HCH, Low JW, Wijerathne S, Ann LS, Salgaonkar H, Lomanto D, et al. Use of prefrontal cortex activity as a measure of learning curve in surgical novices: results of a single blind randomised controlled trial. Surg Endosc. 2020. 10.1007/s00464-019-07331-7
20. Gagnon L, Cooper RJ, Yücel MA, Perdue KL, Greve DN, Boas DA. Short separation channel location impacts the performance of short channel regression in NIRS. Neuroimage. 2012;59: 2518–2528. 10.1016/j.neuroimage.2011.08.095
21. Brigadoi S, Cooper RJ. How short is short? Optimum source-detector distance for short-separation channels in functional near-infrared spectroscopy. Neurophotonics. 2015;2: 25005. 10.1117/1.NPh.2.2.025005
22. Delpy DT, Cope M, van der Zee P, Arridge S, Wray S, Wyatt J. Estimation of optical pathlength through tissue from direct time of flight measurement. Phys Med Biol. 1988;33: 1433–1442. 10.1088/0031-9155/33/12/008
23. Choi J-K, Choi M-G, Kim J-M, Bae H-M. Efficient data extraction method for near-infrared spectroscopy (NIRS) systems with high spatial and temporal resolution. IEEE Trans Biomed Circuits Syst. 2013;7: 169–177. 10.1109/TBCAS.2013.2255052
24. Boas DA, Gaudette T, Strangman G, Cheng X, Marota JJ, Mandeville JB. The accuracy of near infrared spectroscopy and imaging during focal changes in cerebral hemodynamics. Neuroimage. 2001;13: 76–90. 10.1006/nimg.2000.0674
25. Tak S, Ye JC. Statistical analysis of fNIRS data: a comprehensive review. Neuroimage. 2014;85 Pt 1: 72–91. 10.1016/j.neuroimage.2013.06.016
26. Naseer N, Hong K-S. fNIRS-based brain-computer interfaces: a review. Front Hum Neurosci. 2015;9: 3. 10.3389/fnhum.2015.00003
27. Vermeij A, van Beek AHEA, Reijs BLR, Claassen JAHR, Kessels RPC. An exploratory study of the effects of spatial working-memory load on prefrontal activation in low- and high-performing elderly. Front Aging Neurosci. 2014;6: 303. 10.3389/fnagi.2014.00303
28. Scholkmann F, Spichtig S, Muehlemann T, Wolf M. How to detect and reduce movement artifacts in near-infrared imaging using moving standard deviation and spline interpolation. Physiol Meas. 2010;31: 649–662. 10.1088/0967-3334/31/5/004
29. Tai K, Chau T. Single-trial classification of NIRS signals during emotional induction tasks: towards a corporeal machine interface. J Neuroeng Rehabil. 2009;6: 39. 10.1186/1743-0003-6-39
30. Holper L, Wolf M. Single-trial classification of motor imagery differing in task complexity: a functional near-infrared spectroscopy study. J Neuroeng Rehabil. 2011;8: 34. 10.1186/1743-0003-8-34
31. Aghajani H, Garbey M, Omurtag A. Measuring Mental Workload with EEG+fNIRS. Front Hum Neurosci. 2017;11: 359. 10.3389/fnhum.2017.00359
32. Keshmiri S, Sumioka H, Yamazaki R, Ishiguro H. Differential Entropy Preserves Variational Information of Near-Infrared Spectroscopy Time Series Associated With Working Memory. Front Neuroinform. 2018;12: 33. 10.3389/fninf.2018.00033
33. Gregg NM, White BR, Zeff BW, Berger AJ, Culver JP. Brain specificity of diffuse optical imaging: improvements from superficial signal regression and tomography. Front Neuroenergetics. 2010;2. 10.3389/fnene.2010.00014
34. Orrù G, Pettersson-Yeo W, Marquand AF, Sartori G, Mechelli A. Using Support Vector Machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review. Neurosci Biobehav Rev. 2012;36: 1140–1152. 10.1016/j.neubiorev.2012.01.004
35. Armougum A, Gaston-Bellegarde A, La Marle CJ, Piolino P. Expertise reversal effect: Cost of generating new schemas. Comput Human Behav. 2020;111: 106406. 10.1016/j.chb.2020.106406
36. Serrien DJ, Sovijärvi-Spapé MM. Manual dexterity: Functional lateralisation patterns and motor efficiency. Brain Cogn. 2016;108: 42–46. 10.1016/j.bandc.2016.07.005
37. Zhang L, Sun J, Sun B, Luo Q, Gong H. Studying hemispheric lateralization during a Stroop task by near-infrared spectroscopy. Opt Tech Neurosurgery, Neurophotonics, Optogenetics. 2014;8928: 892814. 10.1117/1.JBO.19.5.057012
38. Proverbio AM, Azzari R, Adorni R. Is there a left hemispheric asymmetry for tool affordance processing? Neuropsychologia. 2013;51: 2690–2701. 10.1016/j.neuropsychologia.2013.09.023
39. Jäncke L, Peters M, Himmelbach M, Nösselt T, Shah J, Steinmetz H. fMRI study of bimanual coordination. Neuropsychologia. 2000;38: 164–174. 10.1016/s0028-3932(99)00062-7
40. Rushworth MF, Krams M, Passingham RE. The attentional role of the left parietal cortex: the distinct lateralization and localization of motor attention in the human brain. J Cogn Neurosci. 2001;13: 698–710. 10.1162/089892901750363244
41. Modi HN, Singh H, Yang G-Z, Darzi A, Leff DR. A decade of imaging surgeons’ brain function (part II): A systematic review of applications for technical and nontechnical skills assessment. Surgery. 2017;162: 1130–1139. 10.1016/j.surg.2017.09.002
42. Nemani A, Yücel MA, Kruger U, Gee DW, Cooper C, Schwaitzberg SD, et al. Assessing bimanual motor skills with optical neuroimaging. Sci Adv. 2018;4: eaat3807. 10.1126/sciadv.aat3807
43. de Roever I, Bale G, Cooper RJ, Tachtsidis I. Functional NIRS Measurement of Cytochrome-C-Oxidase Demonstrates a More Brain-Specific Marker of Frontal Lobe Activation Compared to the Haemoglobins. Adv Exp Med Biol. 2017;977: 141–147. 10.1007/978-3-319-55231-6_19
44. Strangman GE, Li Z, Zhang Q. Depth sensitivity and source-detector separations for near infrared spectroscopy based on the Colin27 brain template. PLoS One. 2013;8: e66319. 10.1371/journal.pone.0066319
45. Pan Y, Borragán G, Peigneux P. Applications of Functional Near-Infrared Spectroscopy in Fatigue, Sleep Deprivation, and Social Cognition. Brain Topogr. 2019;32: 998–1012. 10.1007/s10548-019-00740-w
46. Omurtag A, Aghajani H, Keles HO. Decoding human mental states by whole-head EEG+fNIRS during category fluency task performance. J Neural Eng. 2017;14: 66003. 10.1088/1741-2552/aa814b
47. Fekete T, Rubin D, Carlson JM, Mujica-Parodi LR. The NIRS Analysis Package: noise reduction and statistical inference. PLoS One. 2011;6: e24322. 10.1371/journal.pone.0024322
48. Zhang Y, Brooks DH, Franceschini MA, Boas DA. Eigenvector-based spatial filtering for reduction of physiological interference in diffuse optical imaging. J Biomed Opt. 2005;10: 11014. 10.1117/1.1852552
49. Kohno S, Miyai I, Seiyama A, Oda I, Ishikawa A, Tsuneishi S, et al. Removal of the skin blood flow artifact in functional near-infrared spectroscopic imaging data through independent component analysis. J Biomed Opt. 2007;12: 62111. 10.1117/1.2814249
50. Haeussinger FB, Dresler T, Heinzel S, Schecklmann M, Fallgatter AJ, Ehlis A-C. Reconstructing functional near-infrared spectroscopy (fNIRS) signals impaired by extra-cranial confounds: an easy-to-use filter method. Neuroimage. 2014;95: 69–79. 10.1016/j.neuroimage.2014.02.035
51. Keshmiri S, Sumioka H, Okubo M, Ishiguro H. An Information-Theoretic Approach to Quantitative Analysis of the Correspondence Between Skin Blood Flow and Functional Near-Infrared Spectroscopy Measurement in Prefrontal Cortex Activity. Front Neurosci. 2019;13: 79. 10.3389/fnins.2019.00079

Decision Letter 0

Manabu Sakakibara

3 Dec 2020

PONE-D-20-34607

High Density Optical Neuroimaging predicts surgeons’s subjective experience and skill levels

PLOS ONE

Dear Dr. Keles,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Two experts in the field have carefully evaluated the manuscript entitled, "High Density Optical Neuroimaging predicts surgeons’s subjective experience and skill levels". Their comments are appended below.

The first reviewer acknowledged that the manuscript is well written and raised only some minor methodological concerns regarding machine learning details.

The second reviewer, however, raised several major concerns spanning all aspects of the manuscript, including ‘Data Analysis’, ‘Machine Learning’, ‘Results’, and ‘Discussion’. Addressing these concerns will improve and strengthen the manuscript.

I look forward to receiving your revision addressing these critiques and will then make a decision.

Please submit your revised manuscript by Jan 17 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Manabu Sakakibara, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Methods section, please provide additional information about the participant recruitment method and the demographic details of your participants.

Please ensure you have provided sufficient details to replicate the analyses such as:

- the recruitment date range (month and year)

- a statement as to whether your sample can be considered representative of a larger population

- a description of how participants were recruited

- descriptions of where participants were recruited and where the research took place.

3. Please list the name and version of any software package used for statistical analysis, alongside any relevant references.

For more information on PLOS ONE's expectations for statistical reporting, please see https://journals.plos.org/plosone/s/submission-guidelines.#loc-statistical-reporting


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper addresses an interesting topic with clinical application. The writing flows well and is easy to follow. The statistical and machine learning approaches are rigorous.

Recommendations: in the machine learning section, it is recommended that more detail be added on feature selection. It is also recommended that the resolution of the figures be increased.

Reviewer #2: PLOS ONE Review Comments

Summary

In this study, the authors used fNIRS to measure the induced cognitive load on the prefrontal cortex of expert and novice surgeons. The authors examined the potential predictive power of these fNIRS measurements for determining the cognitive load as well as the expertise of their participants in performing two laparoscopic surgery tasks: peg transfer and threading.

Major Comments:

Data Analysis:

How did the authors compute the hemoglobin concentrations (e.g., Beer-Lambert, etc.)? Also, the preprocessing of the NIRS time series appears to include only bandpass filtering, without any baseline normalization or detrending (the latter to prevent potential non-stationarity in the time series). Another issue concerns the use of the standard deviation (3 in their case) for artefact attenuation. This step is quite unconventional and is not (to the best of the reviewer’s knowledge) practiced in the literature. The authors are encouraged to consult [1] for a comprehensive review of NIRS preprocessing.

[1] Tak, S. and Ye, J.C., 2014. Statistical analysis of fNIRS data: a comprehensive review. Neuroimage, 85, pp.72-91.

In addition, the authors also appear to use the standard deviation of their measured NIRS signal (i.e., after preprocessing) to quantify brain activation. This is also very unconventional. Specifically, what the authors refer to as “PFC activation” throughout the manuscript is in fact the deviation of the channels’ activation from the average (observed/induced) PFC activity. Such deviation could be present even without sufficient/significant PFC activation induced by the task. The authors are strongly encouraged to consult [2,3] for measures used for NIRS quantification.

[2] Naseer, N. and Hong, K.S., 2015. fNIRS-based brain-computer interfaces: a review. Frontiers in human neuroscience, 9, p.3.

[3] Keshmiri, S., Sumioka, H., Yamazaki, R. and Ishiguro, H., 2018. Differential entropy preserves variational information of near-infrared spectroscopy time series associated with working memory. Frontiers in neuroinformatics, 12, p.33.

With regard to the use of short-distance channels “to perform superficial signal regression (SSR)” on long-distance channels, it is not clear what methodology/approach the authors adopted in their study. Some of the available approaches are:

[4] Fekete, T., Rubin, D., Carlson, J.M. and Mujica-Parodi, L.R., 2011. The NIRS analysis package: noise reduction and statistical inference. PloS one, 6(9), p.e24322.

[5] Zhang, Y., Brooks, D.H., Franceschini, M.A. and Boas, D.A., 2005. Eigenvector-based spatial filtering for reduction of physiological interference in diffuse optical imaging. Journal of biomedical optics, 10(1), p.011014.

[6] Kohno, S., Miyai, I., Seiyama, A., Oda, I., Ishikawa, A., Tsuneishi, S., Amita, T. and Shimizu, K., 2007. Removal of the skin blood flow artifact in functional near-infrared spectroscopic imaging data through independent component analysis. Journal of biomedical optics, 12(6), p.062111.

[7] Haeussinger, F.B., Dresler, T., Heinzel, S., Schecklmann, M., Fallgatter, A.J. and Ehlis, A.C., 2014. Reconstructing functional near-infrared spectroscopy (fNIRS) signals impaired by extra-cranial confounds: an easy-to-use filter method. NeuroImage, 95, pp.69-79.

[8] Gagnon, L., Perdue, K., Greve, D.N., Goldenholz, D., Kaskhedikar, G. and Boas, D.A., 2011. Improved recovery of the hemodynamic response in diffuse optical imaging using short optode separations and state-space modeling. Neuroimage, 56(3), pp.1362-1371.

[9] Keshmiri, S., Sumioka, H., Okubo, M. and Ishiguro, H., 2019. An Information-Theoretic Approach to Quantitative Analysis of the Correspondence Between Skin Blood Flow and Functional Near-Infrared Spectroscopy Measurement in Prefrontal Cortex Activity. Frontiers in neuroscience, 13, p.79.

Analysis Steps with Machine Learning:

Although the authors mentioned the use of the signal’s standard deviation for quantifying brain activation, they then switched to its mean for ML-based classification (page 8: “For this purpose feature matrices were built with each row (an observation) representing the mean of the prefrontal activation over an episode”). Such inconsistencies and mixing of measures/metrics make it quite difficult to identify the underlying property of the signal from which the results have been derived.

The authors also mentioned that (page 9) “The feature types were prioritised by using the Pearson correlation between the observations and the labels, a standard feature-selection technique” – Do the authors refer to the channels as features? If so, this step actually determined which subset of channels was used as input to their ML model.

The authors continued by explaining that (page 9) “We then chose a small group of features from the prioritised list and used it to train Support Vector Machines (SVM) with linear kernels.” – How did the authors decide on this “small group of features”? Did they apply such utilities as “feature importance” that are available through ML libraries? If so, what criterion/criteria was/were used to determine the level of significance of feature scores? (While using the term “feature,” I am assuming “selected channels” as per the authors’ earlier explanation.)

The authors also used 5-fold cross-validation to test the accuracy of their model. As the authors explained, every participant in their study performed two different tasks (i.e., Peg Transfer and Threading). Did the authors ensure that data from the same individual were not present in both the training and test sets when applying 5-fold cross-validation? This is an important issue in such analyses, since data from the same individual should not be expected to differ greatly between the two tasks, which could in turn result in overestimation of the model’s accuracy.

Result:

As one of their hypotheses, the authors stated (page 9) “that there would be differences in the activations due to the different sampling depths of the channels with different separation distances.” – The effect of channel separation and of skin (rather than cortical) blood flow is a well-studied subject. In fact, short-distance channels are not expected to represent cortical activity. Similarly, channels with separations larger than 3.5 cm are generally accepted not to produce reliable results, owing to the absorption of the optical signal as it penetrates deeper into tissue. Please consult [5-9] above for more in-depth results and discussion on this matter.

The authors’ statement (page 10) “Figure 2 shows that in the Student subjects who had experienced higher task load also had higher PFC activations.”, which is repeated frequently throughout the manuscript, is not really valid since the authors used the standard deviation of the signal. In other words, these quantities reflect how individuals’ PFC activity deviated from the average observed/induced PFC activation in their study.

With regard to the presentation of results, the authors confined themselves to statements such as (e.g., page 10) “The difference between high and low load subjects were statistically significant in both Tasks in the case of the shortest (1.5 cm, A and E) and the normal separation (3 cm, C and G) channels.” while referring to Figures 2 through 7, without providing any descriptive statistics for their results. Specifically, it is not clear how the authors determined that these results “were statistically significant.” What type of tests did they apply? What were the p-values, test statistics, means, standard deviations, confidence intervals, and effect sizes associated with these tests? Did the authors correct their p-values when determining significance (e.g., Bonferroni, FDR, etc.)?

The authors stated that (page 11) “In addition, for the student subjects there was a pronounced asymmetry in the case of the deepest sampling channel (D and H), the activation on the left being significantly higher than on the right.” – The reviewer encourages the authors to consult the studies related to the effect of short/long-distance channels on NIRS measurements that are listed above. In particular, the “deepest sampling channel” in the present study could fall within the range that is considered not suitable for studying the cortical activation.

With regard to the source localization presented in Figure 5, the authors used statements such as (page 11) “there is a hint of high activation localized near the top left …” – Such assertions are not justified unless the authors provide statistical evidence that these activations differ from those in other regions.

The above shortcomings with regard to the presentation of results also apply to the ML-based results.

Discussion:

The authors stated that (page 13) “This difference, visible in most channel separations, was statistically significant in the 1.5 cm and most of the 3 cm separated channels (Figure 2).” – The authors did not present sufficient statistics for this claim (only presenting figures).

They also stated that (page 13) “in skilled subjects in the correlation of PFC activation with subjective task”; however, the reviewer could not find these correlation analyses.

Another issue concerns the hemispheric differences that the authors referred to (page 14) “We found that response was greater in the left PFC of students (Figure 4D and H), ...” – Unless the authors perform statistical tests, such claims are not well founded.

Other Comments:

The language of the manuscript requires thorough auditing and proofreading, as the study is not easy to follow and comprehend.

The quality of the figures is very low and must be improved.

Please also break your Results section into subsections (e.g., one for tests of significant differences, another for ML-based results, etc.) to help the reader better follow and understand the results.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Soheil Keshmiri

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PLOS ONE Review Comments.pdf

PLoS One. 2021 Feb 18;16(2):e0247117. doi: 10.1371/journal.pone.0247117.r002

Author response to Decision Letter 0


18 Dec 2020

Reviewer #1: The paper addresses an interesting topic with clinical application. The writing flows well and is easy to follow. The statistical and machine learning approaches are rigorous.

Recommendations: in the machine learning section, it is recommended that more detail be added on feature selection. It is also recommended that the resolution of the figures be increased.

OUR RESPONSE: We have added new details on feature selection in the revised version. The new figures are high resolution.

Reviewer #2: PLOS ONE Review Comments

Summary

In this study, the authors used fNIRS to measure the induced cognitive load on the prefrontal cortex of expert and novice surgeons. The authors examined the potential predictive power of these fNIRS measurements for determining the cognitive load as well as the expertise of their participants in performing two laparoscopic surgery tasks: peg transfer and threading.

Major Comments:

Data Analysis:

-- How did the authors compute the hemoglobin concentrations (e.g., Beer-Lambert, etc.)?

OUR RESPONSE: The modified Beer-Lambert Law (MBLL) was applied to compute the concentration changes of HbO and HbR. The differential pathlength factor (DPF) values at 780 nm and 850 nm are 5.075 and 4.64, respectively. We explain how we compute the haemoglobin concentrations in the updated preprocessing section. We have also cited the papers by Delpy 1988, Choi 2013 and Boas 2000.
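A minimal sketch of the MBLL step described in this response, treating each sample as a 2×2 linear system in the two wavelengths. Only the DPF values (5.075 at 780 nm, 4.64 at 850 nm) come from the response; the helper name `mbll_concentrations`, the 3 cm separation in the example, and the extinction coefficients (approximate tabulated values, included for illustration and worth checking against a published table) are assumptions, not the authors' pipeline.

```python
import numpy as np

# DPF values quoted in the response, ordered (780 nm, 850 nm).
DPF = np.array([5.075, 4.64])

# Molar extinction coefficients in 1/(cm*M); rows = (780, 850) nm,
# columns = [HbO, HbR]. Approximate, illustrative values (assumption).
EPS = np.array([[710.0, 1075.44],
                [1058.0, 691.32]])


def mbll_concentrations(intensity, baseline, sep_cm, eps=EPS, dpf=DPF):
    """Convert raw intensities (n_samples, 2) to HbO/HbR changes in mol/L."""
    intensity = np.asarray(intensity, dtype=float)
    # Optical density change per wavelength (decadic log, matching eps).
    d_od = -np.log10(intensity / np.asarray(baseline, dtype=float))
    # Effective optical path per wavelength: separation times DPF.
    path = sep_cm * np.asarray(dpf, dtype=float)
    # d_od[:, i] = path[i] * (eps[i, 0] * dHbO + eps[i, 1] * dHbR)
    system = np.asarray(eps, dtype=float) * path[:, None]
    conc = np.linalg.solve(system, d_od.T).T
    return conc[:, 0], conc[:, 1]


# Round trip on synthetic data: build intensities from known changes.
true_conc = np.array([[1e-6, -0.3e-6],
                      [2e-6, -0.5e-6]])          # [dHbO, dHbR] in mol/L
baseline = np.array([1.0, 1.0])
system = EPS * (3.0 * DPF)[:, None]
intensity = baseline * 10.0 ** (-(true_conc @ system.T))
d_hbo, d_hbr = mbll_concentrations(intensity, baseline, sep_cm=3.0)
```

The round trip simply inverts the forward model, so recovering `true_conc` checks the algebra rather than the physiological accuracy of the coefficients.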

-- Also, the preprocessing of the NIRS time series appears to include only bandpass filtering, without any baseline normalization or detrending (the latter to prevent potential non-stationarity in the time series).

OUR RESPONSE: In our case the high-pass filter (0.01 Hz) eliminated all baseline drift and any slow components (periods longer than 100 s). This was sufficient for the purposes of this study, as our features were standard deviations computed within 10 s windows. The 10 s standard deviation is a mean-subtracted measure and hence, to our knowledge, is not affected by any slow components remaining in the signal. The preprocessing stages and their justification are explained in greater detail in the new, expanded Data Analysis section.
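The filtering-plus-windowing argument in this response can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the 0.01 Hz cut-off and the 10 s window come from the response, while the Butterworth design, filter order, zero-phase filtering, 10 Hz sampling rate, and the helper names `highpass` and `window_sd` are assumptions. The upper cut-off of the authors' band-pass filter is not given in this letter, so only the high-pass stage is shown.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def highpass(x, fs, cutoff_hz=0.01, order=3):
    """Zero-phase Butterworth high-pass; suppresses drifts slower than ~1/cutoff_hz s."""
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def window_sd(x, fs, win_s=10.0):
    """Standard deviation in consecutive, non-overlapping windows of win_s seconds."""
    n = int(round(win_s * fs))
    n_win = len(x) // n                       # a trailing partial window is dropped
    frames = np.asarray(x[: n_win * n], dtype=float).reshape(n_win, n)
    return frames.std(axis=1)                 # each window's own mean is subtracted


fs = 10.0                                     # hypothetical sampling rate (Hz)
rng = np.random.default_rng(0)
t = np.arange(int(600 * fs)) / fs             # 10 minutes of synthetic data
signal = np.sin(2 * np.pi * 0.1 * t) + 0.1 * rng.standard_normal(t.size)
sds = window_sd(highpass(signal, fs), fs)     # 60 windows of 10 s each
```

Because `np.std` subtracts each window's own mean, adding a constant offset to the signal leaves the windowed values unchanged, which is the property the response relies on.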

-- Another issue concerns the use of the standard deviation (3 in their case) for artefact attenuation. This step is quite unconventional and is not (to the best of the reviewer's knowledge) practiced in the literature. The authors are encouraged to consult [1] for a comprehensive review of NIRS preprocessing.

[1] Tak, S. and Ye, J.C., 2014. Statistical analysis of fNIRS data: a comprehensive review. Neuroimage, 85, pp.72-91.

OUR RESPONSE: The standard deviation has in fact been used for artefact attenuation, for example in Scholkmann 2010. In the revised manuscript we provide an extensive description and justification of our method (together with a new figure). We have also cited the paper by Tak and Ye 2014.
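The 3 SD rule is not spelled out in this letter, so the following shows only one simple reading of it, written as a hypothetical helper `clip_outliers`: samples further than k standard deviations from the channel mean are pulled back to the ±k SD bound. It is not the movement-artefact method of Scholkmann 2010 and may differ from the authors' procedure.

```python
import numpy as np


def clip_outliers(x, k=3.0):
    """Hypothetical artefact attenuation: clip samples beyond mean +/- k*SD."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    return np.clip(x, mu - k * sd, mu + k * sd)


rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
x[500] = 50.0                       # simulated motion spike
y = clip_outliers(x)                # spike is limited; ordinary samples are untouched
```

The spike itself inflates the SD that sets the bound; a robust scale estimate such as the median absolute deviation is a common alternative when spikes are large or frequent.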

-- In addition, the authors also appear to use the standard deviation of their measured NIRS signal (i.e., after preprocessing) to quantify brain activation. This is also very unconventional. Specifically, what the authors refer to as "PFC activation" throughout the manuscript is in fact the deviation of the channels' activation from the average (observed/induced) PFC activity. Such deviation could be present even without sufficient/significant PFC activation induced by the task. The authors are strongly encouraged to consult [2,3] for measures used for NIRS quantification.

[2] Naseer, N. and Hong, K.S., 2015. fNIRS-based brain-computer interfaces: a review. Frontiers in human neuroscience, 9, p.3.

[3] Keshmiri, S., Sumioka, H., Yamazaki, R. and Ishiguro, H., 2018. Differential entropy preserves variational information of near-infrared spectroscopy time series associated with working memory. Frontiers in neuroinformatics, 12, p.33.

OUR RESPONSE: The variance (closely related to the standard deviation used by our study) was described as one of the features available for use in fNIRS analysis by Naseer and Hong 2015. We have cited this paper in the new version of the manuscript.

Within a 10 s window, the evoked hemodynamic response may take the form of a rise in oxyhemoglobin concentration followed by a dip. The within-window standard deviation is a sensitive measure for capturing this response. We explain why, in our study, the standard deviation can legitimately be considered a measure of PFC activation in the new subsection Feature extraction and selection.
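The claim that a rise-then-dip response raises the within-window standard deviation can be checked on synthetic data. In this sketch the response shape (a positive Gaussian bump followed by a smaller negative one), its amplitude, the noise level, and the 10 Hz sampling rate are illustrative assumptions; only the 10 s window comes from the response.

```python
import numpy as np

fs = 10.0                                    # hypothetical sampling rate (Hz)
t = np.arange(int(10 * fs)) / fs             # one 10 s window
rng = np.random.default_rng(2)
noise = 0.1 * rng.standard_normal(t.size)    # background fluctuations, arbitrary units

# Illustrative evoked HbO response: a rise peaking near 4 s, then a dip near 8 s.
response = np.exp(-((t - 4.0) ** 2) / 2.0) - 0.3 * np.exp(-((t - 8.0) ** 2) / 2.0)

sd_rest = np.std(noise)                      # window without an evoked response
sd_task = np.std(noise + response)           # same window with the response added
```

A larger evoked response yields a larger `sd_task`, which is the monotone relationship the authors use to justify calling the windowed SD "activation"; note that any other large fluctuation in the window would raise it as well.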

-- With regard to the use of short-distance channels "to perform superficial signal regression (SSR)" on long-distance channels, it is not clear what methodology/approach the authors adopted in their study. Some of the available approaches are:

[4] Fekete, T., Rubin, D., Carlson, J.M. and Mujica-Parodi, L.R., 2011. The NIRS analysis package: noise reduction and statistical inference. PloS one, 6(9), p.e24322.

[5] Zhang, Y., Brooks, D.H., Franceschini, M.A. and Boas, D.A., 2005. Eigenvector-based spatial filtering for reduction of physiological interference in diffuse optical imaging. Journal of biomedical optics, 10(1), p.011014.

[6] Kohno, S., Miyai, I., Seiyama, A., Oda, I., Ishikawa, A., Tsuneishi, S., Amita, T. and Shimizu, K., 2007. Removal of the skin blood flow artifact in functional near-infrared spectroscopic imaging data through independent component analysis. Journal of biomedical optics, 12(6), p.062111.

[7] Haeussinger, F.B., Dresler, T., Heinzel, S., Schecklmann, M., Fallgatter, A.J. and Ehlis, A.C., 2014. Reconstructing functional near-infrared spectroscopy (fNIRS) signals impaired by extra-cranial confounds: an easy-to-use filter method. NeuroImage, 95, pp.69-79.

[8] Gagnon, L., Perdue, K., Greve, D.N., Goldenholz, D., Kaskhedikar, G. and Boas, D.A., 2011. Improved recovery of the hemodynamic response in diffuse optical imaging using short optode separations and state-space modeling. Neuroimage, 56(3), pp.1362-1371.

[9] Keshmiri, S., Sumioka, H., Okubo, M. and Ishiguro, H., 2019. An Information-Theoretic Approach to Quantitative Analysis of the Correspondence Between Skin Blood Flow and Functional Near-Infrared Spectroscopy Measurement in Prefrontal Cortex Activity. Frontiers in neuroscience, 13, p.79.

OUR RESPONSE: In fact, we used the method of Gagnon et al 2011. However, due to an error in the manuscript, the reference number pointed to another paper. This has now been fixed, and we cite the Gagnon et al paper in the new subsection Feature extraction and selection.
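The method of Gagnon et al 2011 uses state-space (Kalman filter) modelling. As a simpler stand-in, and explicitly not that method, the sketch below shows ordinary least-squares short-separation regression: the short channel, assumed to carry mostly superficial signal, is scaled and subtracted from the long channel. The helper name `regress_short_channel` and the synthetic signals are hypothetical.

```python
import numpy as np


def regress_short_channel(long_sig, short_sig):
    """Subtract the least-squares projection of short_sig from long_sig (both demeaned)."""
    long_c = np.asarray(long_sig, dtype=float) - np.mean(long_sig)
    short_c = np.asarray(short_sig, dtype=float) - np.mean(short_sig)
    beta = np.dot(short_c, long_c) / np.dot(short_c, short_c)   # OLS coefficient
    return long_c - beta * short_c, beta


# Synthetic check: long channel = cortical signal + scaled superficial signal.
rng = np.random.default_rng(3)
t = np.arange(2000) / 10.0                       # 200 s at a hypothetical 10 Hz
cortical = np.sin(2 * np.pi * 0.05 * t)
superficial = rng.standard_normal(t.size)
long_sig = cortical + 0.8 * superficial
cleaned, beta = regress_short_channel(long_sig, superficial)
```

A single static coefficient cannot follow superficial contributions that change over time, which is one motivation for the state-space approach the authors cite.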

Analysis Steps with Machine Learning:

-- Although the authors mentioned the use of the signal's standard deviation for quantifying brain activation, they then switched to its mean for ML-based classification (page 8: "For this purpose feature matrices were built with each row (an observation) representing the mean of the prefrontal activation over an episode"). Such inconsistencies and mixing of measures/metrics make it quite difficult to identify the underlying property of the signal from which the results have been derived.

OUR RESPONSE: This is in fact a misunderstanding due to an unclear statement in the manuscript. The features we used were the standard deviations averaged over an experimental episode. Thus we did not switch to a new metric; we only averaged our metric over an episode. (The reason for doing so is related to another probing question raised by the reviewer, namely whether we included data from the same subject in both the training and test sets. By averaging over an episode we ensured that there was only one row per subject in the feature matrix.) We have changed the unclear statement in the subsection Classification by inserting the phrase “representing the standard deviations averaged over an episode”.
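Averaging the windowed standard deviations over an episode, so that each subject contributes exactly one row and each channel one column, can be sketched as follows. The dictionary layout, the subject IDs, and the helper name `episode_features` are assumptions made for illustration.

```python
import numpy as np


def episode_features(window_sds):
    """Build a subjects x channels feature matrix.

    window_sds maps a subject ID to an array of shape (n_windows, n_channels)
    holding the 10 s standard deviations for one episode. Each subject's
    windows are averaged, so the matrix has exactly one row per subject.
    """
    subject_ids = sorted(window_sds)
    X = np.vstack([np.asarray(window_sds[s], dtype=float).mean(axis=0)
                   for s in subject_ids])
    return subject_ids, X


ids, X = episode_features({
    "S01": [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],                   # 2 windows, 3 channels
    "S02": [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5], [1.0, 1.0, 1.0]],  # 3 windows, 3 channels
})
```

Subjects may have different numbers of windows; averaging removes that difference and keeps the one-row-per-subject property that the cross-validation relies on.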

-- The authors also mentioned that (page 9) "The feature types were prioritised by using the Pearson correlation between the observations and the labels, a standard feature-selection technique" - Do the authors refer to the channels as features? If so, this step actually determined which subset of channels was used as input to their ML model.

OUR RESPONSE: Yes, each channel contributes a separate column in the feature matrix.

-- The authors continued by explaining that (page 9) "We then chose a small group of features from the prioritised list and used it to train Support Vector Machines (SVM) with linear kernels." - How did the authors decide on this "small group of features"? Did they apply such utilities as "feature importance" that are available through ML libraries? If so, what criterion/criteria was/were used to determine the level of significance of feature scores? (While using the term "feature," I am assuming "selected channels" as per the authors' earlier explanation.)

OUR RESPONSE: The Pearson correlation between a feature and the labels estimates the extent to which that feature will be helpful in classifying the labels. We have used this technique in previous studies, e.g. Omurtag et al 2017, which provides references for it. In the new version of the manuscript we provide additional details and reference two other methods, Minimum Redundancy Maximum Relevance (mRMR) and Chi-square tests, both available in Matlab. We explored these methods, but they are more involved than the Pearson correlation without any clear benefit, and hence were not adopted in our study.
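Ranking channels by the Pearson correlation between each feature column and the class labels, as described in this response, reduces to a few lines of NumPy. The helper name `rank_by_pearson` and the choice to rank by absolute value (so that strongly negative correlations also count as informative) are assumptions; the response does not say how the sign of the correlation was handled.

```python
import numpy as np


def rank_by_pearson(X, y):
    """Return feature indices sorted by |Pearson r| with the labels, plus the r values.

    A constant feature column gives a zero denominator (r = nan); drop such
    columns beforehand.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r)), r


rng = np.random.default_rng(4)
y = np.repeat([0, 1], 20)                 # e.g. low vs high task load
X = rng.standard_normal((40, 6))          # 6 hypothetical channel features
X[:, 2] += 3.0 * y                        # make channel 2 informative
order, r = rank_by_pearson(X, y)
```

With binary labels this correlation is the point-biserial coefficient, so it scores each channel independently and ignores redundancy between channels, which is what mRMR-style methods add.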

-- The authors also used 5-fold cross-validation to test the accuracy of their model. As the authors explained, every participant in their study performed two different tasks (i.e., Peg Transfer and Threading). Did the authors ensure that data from the same individual were not present in both the training and test sets when applying 5-fold cross-validation? This is an important issue in such analyses, since data from the same individual should not be expected to differ greatly between the two tasks, which could in turn result in overestimation of the model's accuracy.

OUR RESPONSE: Thanks for pointing out this important issue. We ensured that the feature matrix contained only one row per subject for each classification problem. Therefore the training and test sets did not share data from the same subject. This is clarified in the new subsection Classification.
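With one row per subject, ordinary stratified 5-fold splitting already keeps every test subject out of training, which is the point made in this response. The sketch below assumes scikit-learn and re-ranks features by Pearson correlation inside each training fold, so that held-out subjects do not influence feature selection either; the response does not state whether selection was done inside or outside the folds. The number of selected features `k`, the synthetic data, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def top_k_pearson(X, y, k):
    """Indices of the k columns with the largest |Pearson r| against y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:k]


def cv_accuracy(X, y, k=3, n_splits=5, seed=0):
    """Mean test accuracy of a linear SVM under stratified k-fold cross-validation.

    X must hold one row per subject, so train and test folds never share a subject.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train, test in skf.split(X, y):
        cols = top_k_pearson(X[train], y[train], k)     # selection uses training rows only
        clf = SVC(kernel="linear").fit(X[train][:, cols], y[train])
        scores.append(clf.score(X[test][:, cols], y[test]))
    return float(np.mean(scores))


rng = np.random.default_rng(5)
y = np.array([0] * 16 + [1] * 17)          # e.g. surgeons vs students
X = rng.standard_normal((33, 20))          # 20 hypothetical channel features
X[:, :3] += 3.0 * y[:, None]               # three informative channels
acc = cv_accuracy(X, y)
```

If a subject contributed several rows (for example one per task), `GroupKFold` with subject IDs as groups would be the usual way to keep those rows together.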

Result:

-- As one of their hypotheses, the authors stated (page 9) "that there would be differences in the activations due to the different sampling depths of the channels with different separation distances." - The effect of channel separation and of skin (rather than cortical) blood flow is a well-studied subject. In fact, short-distance channels are not expected to represent cortical activity. Similarly, channels with separations larger than 3.5 cm are generally accepted not to produce reliable results, owing to the absorption of the optical signal as it penetrates deeper into tissue. Please consult [5-9] above for more in-depth results and discussion on this matter.

OUR RESPONSE: We have added a new statement and new references in the third paragraph of the Discussion section, reminding the reader of the degraded signal-to-noise ratio associated with longer source-detector separations. The long separation we used is 3.35 cm, which is well within the 25-35 mm range considered typical in several fNIRS studies, for which we provide references. We recognize, as the reviewer indicates, that one should try to avoid the longer end of this range in order to limit noise, as we have done.

-- The authors' statement (page 10) "Figure 2 shows that in the Student subjects who had experienced higher task load also had higher PFC activations.", which is repeated frequently throughout the manuscript, is not really valid since the authors used the standard deviation of the signal. In other words, these quantities reflect how individuals' PFC activity deviated from the average observed/induced PFC activation in their study.

OUR RESPONSE: The reviewer correctly points out that the observed PFC activation is typically based on haemoglobin concentration changes. However, we believe that for the purposes of this study the standard deviation of these changes is an equally valid measure of activation, because (with proper preprocessing) greater evoked hemodynamic responses resulted in greater standard deviations. In this version of the manuscript we have clarified this definition of “activation” at the beginning of the new subsection “Feature extraction and selection” in Methods with the sentence: “Because greater evoked hemodynamic response tends to increase the standard deviation of the signal in a window, we took these values as the measure of the prefrontal activation.” Because of this close relationship, introducing a different term (other than activation) might confuse readers. We would respectfully like to keep this term, as it is now clearly defined in the new version of the manuscript; it does not differ dramatically from the typical interpretation, and it provides a concise and coherent way of describing our results.

-- With regard to the presentation of results, the authors confined themselves to statements such as (e.g., page 10) "The difference between high and low load subjects were statistically significant in both Tasks in the case of the shortest (1.5 cm, A and E) and the normal separation (3 cm, C and G) channels." while referring to Figures 2 through 7, without providing any descriptive statistics for their results. Specifically, it is not clear how the authors determined that these results "were statistically significant." What type of tests did they apply? What were the p-values, test statistics, means, standard deviations, confidence intervals, and effect sizes associated with these tests? Did the authors correct their p-values when determining significance (e.g., Bonferroni, FDR, etc.)?

OUR RESPONSE: Thanks for pointing out this gap in our manuscript. We have added new material in a new subsection, Statistical analysis, in the Methods section, where we describe at length the approach we used to determine the statistical significance of our results. In the new version we have also introduced a Bonferroni correction to adjust the significance cut-off for the p-values. This new cut-off slightly affected the figures, so we have included new versions of the box plot figures, in which a few previously significant results appear without an asterisk. However, the majority of the significant results remain.
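The Bonferroni adjustment mentioned here divides the significance level by the number of tests. The sketch below assumes a family-wise α of 0.05 and uses made-up p-values; the number of comparisons and the underlying test used by the authors are not reported in this letter, and the helper name `bonferroni` is hypothetical.

```python
import numpy as np


def bonferroni(p_values, alpha=0.05):
    """Flag p-values below alpha / m, where m is the number of tests."""
    p = np.asarray(p_values, dtype=float)
    threshold = alpha / p.size
    return p < threshold, threshold


significant, cutoff = bonferroni([0.001, 0.01, 0.02, 0.2])
# cutoff is 0.05 / 4 = 0.0125, so only the first two p-values remain significant
```

An equivalent formulation multiplies each p-value by m (capped at 1) and compares it with α; results between α/m and α are the ones that lose significance under the correction.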

-- The authors stated that (page 11) "In addition, for the student subjects there was a pronounced asymmetry in the case of the deepest sampling channel (D and H), the activation on the left being significantly higher than on the right." - The reviewer encourages the authors to consult the studies related to the effect of short/long-distance channels on NIRS measurements that are listed above. In particular, the "deepest sampling channel" in the present study could fall within the range that is considered not suitable for studying the cortical activation.

OUR RESPONSE: We address this issue and provide new references in the third paragraph of the Discussion section.

-- With regard to the source localization presented in Figure 5, the authors used statements such as (page 11) "there is a hint of high activation localized near the top left …" - Such assertions are not justified unless the authors provide statistical evidence that these activations differ from those in other regions.

OUR RESPONSE: We thank the reviewer for pointing out this oversight in the manuscript. We have inserted in the new version a new paragraph immediately after Figure 5 which explains how to assess the statistical significance of Figure 5 based on the corresponding subplots of Figure 4. The greater activation in student subjects in the left PFC is significant by this analysis, as explained in the new paragraph.

The above shortcomings with regard to the presentation of results also apply to the ML-based results.

Discussion:

-- The authors stated that (page 13) "This difference, visible in most channel separations, was statistically significant in the 1.5 cm and most of the 3 cm separated channels (Figure 2)." - The authors did not present sufficient statistics for this claim (only presenting figures).

OUR RESPONSE: This gap in the manuscript has now been addressed by fully describing the statistical analysis in the new subsection Statistical analysis in the Methods section, and by the new version of Figure 2, which clearly indicates which group differences are significant.

-- They also stated that (page 13) "in skilled subjects in the correlation of PFC activation with subjective task"; however, the reviewer could not find these correlation analyses.

OUR RESPONSE: To improve clarity this sentence has now been modified so that it reads: “This change in skilled subjects’ PFC response relative to that of unskilled ones may have been due to…”.

-- Another issue concerns the hemispheric differences that the authors referred to (page 14) "We found that response was greater in the left PFC of students (Figure 4D and H), ..." - Unless the authors perform statistical tests, such claims are not well founded.

OUR RESPONSE: We believe this valid critique has now been answered in the new version of the manuscript since we have a new section about the statistical analysis and we show the significant differences clearly on the figures. The difference mentioned in this comment is indeed statistically significant according to our analysis.

Other Comments:

-- The language of the manuscript requires thorough auditing and proofreading, as the study is not easy to follow and comprehend.

OUR RESPONSE: We have introduced numerous clarifications and some stylistic edits (in addition to the corrections listed above).

-- The quality of the figures is very low and must be improved.

OUR RESPONSE: The quality of the figures has been improved. We used the PACE diagnostic tool to ensure that the figures meet PLOS ONE requirements.


-- Please also break your Results section into subsections (e.g., one for tests of significant differences, another for ML-based results, etc.) to help the reader better follow and understand the results.

OUR RESPONSE: We have introduced the new subsection titles “Group differences” and “Machine learning results” in the Results section.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Manabu Sakakibara

19 Jan 2021

PONE-D-20-34607R1

High Density Optical Neuroimaging predicts surgeons’s subjective experience and skill levels

PLOS ONE

Dear Dr. Keles,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The two original reviewers have carefully reviewed the revised manuscript. Their comments are appended below. Both acknowledged the improvement of the manuscript, but several concerns remain that should be addressed before publication.

I will make the decision after receipt of replies to each critique and the necessary revision.

Please submit your revised manuscript by Mar 05 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Manabu Sakakibara, Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper is on an interesting topic, well written, and the method is rigorous. It would be better to add other ML metrics to the analysis to allow a more complete comparison.

Needs to be fixed: image resolution needs to be higher, and the figures are not captioned.

Reviewer #2: Thank you for taking the time to address my comments. The quality of the manuscript is substantially improved. However, there is still one point that the reviewer would like the authors to address.

Although the authors now present the statistical test that they used (i.e., the non-parametric Wilcoxon signed-rank test), they still do not present the results of these tests. The reviewer realizes that the authors provided figures for their tests. However, these figures on their own are not sufficient. Please provide the numerical results of the tests associated with these figures (e.g., in tabular format), including test statistics, p-values, M/SD, and confidence intervals.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Soheil Keshmiri

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Feb 18;16(2):e0247117. doi: 10.1371/journal.pone.0247117.r004

Author response to Decision Letter 1


23 Jan 2021

Dear Prof. Manabu Sakakibara,

Thank you for sending the reviewer comments for our revised manuscript. We are very happy to see that they find the revised version greatly improved.

Regarding Reviewer 1:

We used the PACE diagnostic tool to ensure that the figures meet PLOS ONE requirements. If any problems with the figures remain, we will be happy to address them promptly. In the revised submission, figure captions were inserted in the text of the manuscript. In accordance with PLOS ONE requirements, we did not include captions as part of the figure files.

Regarding Reviewer 2:

We appreciate this reviewer’s close scrutiny of our revised manuscript. The only remaining criticism from this reviewer is that the numerical values underlying the results should be reported in addition to the figures. We now provide these values as tables in the Supporting Information.

We believe the new version fully meets all reviewers’ requests; however, we will be happy to address any remaining issues with the manuscript.

Sincerely,

Hasan Onur Keles, PhD

Attachment

Submitted filename: Response to Reviewers_Final.docx

Decision Letter 2

Manabu Sakakibara

2 Feb 2021

High Density Optical Neuroimaging predicts surgeons’s subjective experience and skill levels

PONE-D-20-34607R2

Dear Dr. Keles,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Manabu Sakakibara, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Because Reviewer #1 was not available this time, Reviewer #2 and I, as Academic Editor, carefully evaluated the second revision. Reviewer #2 and I are fully satisfied with the revised manuscript. The revision has satisfactorily improved the manuscript, which is therefore accepted for publication in PLOS ONE.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Soheil Keshmiri

Acceptance letter

Manabu Sakakibara

5 Feb 2021

PONE-D-20-34607R2

High Density Optical Neuroimaging predicts surgeons’s subjective experience and skill levels.

Dear Dr. Keles:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Manabu Sakakibara

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Accuracy of classification of the NASA-TLX score in task 1.

    This figure shows the same analysis as Fig 5, but for attending subjects. No channel locations are indicated in panel H because the accuracy remained at chance level.

    (TIFF)

    S1 Table. Subject demographics, task completion times and task NASA-TLX scores.

    (DOCX)

    S2 Table. Prefrontal activation associated with high and low NASA-TLX scores for student subjects.

    (See Fig 2).

    (XLSX)

    S3 Table. Prefrontal activations associated with high and low NASA-TLX scores.

    (See Fig 3).

    (XLSX)

    S4 Table. Prefrontal activations associated with student and attending subjects for task 1 and task 2.

    (See Fig 4).

    (XLSX)

    Attachment

    Submitted filename: PLOS ONE Review Comments.pdf

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Response to Reviewers_Final.docx

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting information files.

