Abstract
Background
The aim of this study was to evaluate validity evidence for using idle time as a performance measure in open surgical skills assessment.
Methods
This pilot study tested psychomotor planning skills of surgical attendings (N=6), residents (N=4) and medical students (N=5) during suturing tasks of varying difficulty. Performance data were collected with a motion tracking system. Participants’ hand movements were analyzed for idle time, total operative time and path length. We hypothesized that idle times would be shorter for more experienced individuals and on the easier tasks.
Results
A total of 365 idle periods were identified across all participants. Attendings had fewer idle periods during three specific procedure steps (p < .001). All participants had longer idle time on friable tissue (p < .005).
Conclusion
Using an experimental model, idle time was found to correlate with experience and motor planning when operating on increasingly difficult tissue types. Further work exploring idle time as a valid psychomotor measure is warranted.
Keywords: Motion tracking, Surgical skills, Assessment, Idle time, Path length, Simulation
Introduction
Objective measures of technical surgical skills are needed for accurate feedback and competency evaluations.1 Current assessments of technical skills include observer-generated, task-specific checklists and global rating scales2–5 as well as technology-based performance measures.6–9 Observer-based scoring metrics are readily accessible and inexpensive. Although commonly used to assess trainees, they remain subject to bias10 and can be time-consuming. In contrast, technology-based performance measures are more expensive but provide an opportunity for automated and objective assessment. Further integration of technology into surgical skills assessment is critical for developing objective performance measures and providing an explicit path to surgical mastery.
The movement of motor skills acquisition outside of the operating room and into the simulation environment permits the incorporation of technology-based metrics. Specifically, motion tracking technology allows for the objective measurement of motor behavior. Prior work in psychomotor skills assessment in open surgery has been conducted using electromagnetic motion tracking of participants’ hands with the Imperial College Surgical Assessment Device (ICSAD).11–15 This work demonstrated evidence of construct validity, with relationships between number of hand movements and experience level,13 final product scores14 and observer-based global rating scores.12,13
The majority of studies evaluating motor behavior in surgical skills focus on metrics regarding hand movements including path length and the amount of time to complete the procedure.11–15 These metrics represent a subset of surgical skill performance and technical skill metrics available with motion tracking technology. While the prior work on motor movement provides a strong foundation for understanding surgical performance, there is a paucity of research on those instances where there is no movement - idle time. Idle time is characterized by a lack of movement of both hands and may represent periods of motor planning or decision making that can be used to differentiate performance.16 Additionally, as individuals progress through stages of motor learning, there is a rapid reduction in the cognitive processes associated with difficult motor planning and a convergence on more rapid automaticity.17 This is expected to translate into less idle time during a task performed by more experienced individuals who have had more practice.
Our work involves development and implementation of decision-based simulators with a variety of anatomical presentations that provide a range of task complexity.18–21 The variable tissue simulator was developed to present suturing tasks of varying levels of difficulty based on the materials presented. The first aim of this pilot study was to evaluate idle time as a potential surgical performance metric. We hypothesized that idle times would be shorter for more experienced individuals and on the easier tasks. The second aim of this pilot study was to assess the ability of the variable tissue simulator to differentiate psychomotor performance based on experience and task difficulty. We hypothesized that the time and path length to complete the suturing task would be shorter for more experienced individuals and on less difficult tasks.
Methods
Setting and participants
The study participants (n=15) were medical students (n=5), surgery residents (n=4) and attending surgeons (n=6) at a Midwestern academic hospital. Participants were recruited through departmental listservs; participation was voluntary and based on availability. Participants performed a simulated suturing task on the variable tissue simulator in a single session. Performance data were collected with video recordings and an optical motion tracking system. There were no time restrictions on the suturing task and feedback was not provided.
Study approval was granted by the Institutional Review Board and written informed consent was obtained from all participants.
Surveys
Prior to performing the suturing task, participants completed a survey that collected demographic information including level of training, surgical specialty and handedness. Following the suturing task, participants responded to a question asking if the motion tracking system interfered with their ability to perform the task (5-point Likert scale: 1=strongly disagree, 5=strongly agree).
Variable tissue simulator
The variable tissue simulator was designed to present a simulated suturing task that induced decision making by varying the materials and difficulty of the suturing task. The simulator was composed of a board with simulated materials held in place by clips. In this study, three different simulated tissue types were presented: foam (dense connective tissue), rubber balloons (artery) and tissue paper (friable tissue) (Figure 1). The suturing task was to place three interrupted instrument-tied sutures on two opposing pieces of material. 3-0 Prolene suture was provided along with a needle driver and surgical forceps. The development of the variable tissue simulator was guided by prior cognitive task analysis with two expert surgeons.22
Motion tracking system
Hand movements were measured with an optical motion tracking system (Visualeyez 3000, Phoenix Technologies, Inc.). Four infrared light-emitting diode (IRED) markers were affixed with medical tape (Transpore, 3M Company) to each of the participant’s gloved hands. Marker locations on both hands were: 1) distal dorsal surface of the second digit phalanx; 2) distal dorsal surface of the first digit metacarpal; 3) mid-dorsal surface of the hand; and 4) dorsal surface of the forearm just proximal to the wrist joint. Movement data were collected from both hands. Traces of marker positions in millimeters were sampled at 180 Hz. To remove high-frequency noise artifacts, data were filtered with a dual-pass second-order Butterworth filter with a low-pass cutoff frequency of 7 Hz.
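The filtering step described above can be sketched in code. This is a minimal illustration only, not the authors' implementation: it assumes a standard bilinear-transform design for the second-order Butterworth section, runs it forward and then backward over one coordinate trace for zero phase lag, and applies no correction to the cutoff for the second pass. The function names and the choice to start each pass in steady state at the first sample are assumptions made for this example.

```python
import math


def butterworth_lowpass_coeffs(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass coefficients (bilinear transform)."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    q = 1.0 / math.sqrt(2.0)  # quality factor of a Butterworth section
    norm = 1.0 / (1.0 + k / q + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)                       # feed-forward terms
    a = (2.0 * (k * k - 1.0) * norm,             # feedback terms a1, a2
         (1.0 - k / q + k * k) * norm)
    return b, a


def _single_pass(x, b, a):
    # Assumption: start in steady state at the first sample (the filter
    # has unit gain at 0 Hz) to limit the start-up transient.
    x1 = x2 = y1 = y2 = x[0]
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return out


def dual_pass_filter(x, cutoff_hz=7.0, fs_hz=180.0):
    """Filter forward, then backward, so the output has no phase lag."""
    if not x:
        return []
    b, a = butterworth_lowpass_coeffs(cutoff_hz, fs_hz)
    forward = _single_pass(x, b, a)
    backward = _single_pass(forward[::-1], b, a)
    return backward[::-1]
```

In use, each coordinate (x, y, z) of each marker trace would be passed through `dual_pass_filter` separately before velocities are computed.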
Motion tracking measures
Motion tracking data of both hands were analyzed for differences in idle time, total operative time, and path length. An idle threshold analysis was performed by calculating the proportions of time the hands spent below velocity values ranging from 1 to 50 mm/sec. The threshold yielding the greatest differences in proportion among participants, 20 mm/sec, was selected as the velocity defining idle periods. In addition, a minimum duration of 0.5 seconds was required for a deliberate idle period, to avoid capturing very brief pauses. Total operative time was calculated from the time participants’ hands left the starting position until participants cut the tail of the third suture. Path length is the three-dimensional distance the hands traveled from the starting position.
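The measures defined above can be illustrated with a short sketch. It is not the authors' analysis code: it assumes one filtered three-dimensional trace per hand (which marker was used is not stated), approximates speed by the distance between consecutive samples multiplied by the sampling rate, and treats a period as idle only when both hands are below the threshold for at least the minimum duration. All function names are hypothetical.

```python
import math

FS_HZ = 180.0           # sampling rate reported in the text
IDLE_THRESHOLD = 20.0   # mm/sec, threshold reported in the text
MIN_IDLE_SEC = 0.5      # minimum duration reported in the text


def path_length(points):
    """Sum of 3D distances between consecutive (x, y, z) samples, in mm."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def speeds(points, fs_hz=FS_HZ):
    """Sample-to-sample speed in mm/sec (simple finite difference)."""
    return [math.dist(p, q) * fs_hz for p, q in zip(points, points[1:])]


def proportion_below(speed, threshold):
    """Fraction of samples below a velocity value (for a 1 to 50 mm/sec sweep)."""
    if not speed:
        return 0.0
    return sum(1 for s in speed if s < threshold) / len(speed)


def idle_periods(left_speed, right_speed, fs_hz=FS_HZ,
                 threshold=IDLE_THRESHOLD, min_sec=MIN_IDLE_SEC):
    """Return (start_sec, end_sec) spans where both hands stay below threshold."""
    min_samples = math.ceil(min_sec * fs_hz)
    n = min(len(left_speed), len(right_speed))
    periods, start = [], None
    for i in range(n + 1):  # one extra step closes a run that reaches the end
        idle = (i < n and left_speed[i] < threshold
                and right_speed[i] < threshold)
        if idle and start is None:
            start = i
        elif not idle and start is not None:
            if i - start >= min_samples:
                periods.append((start / fs_hz, i / fs_hz))
            start = None
    return periods


def total_idle_time(periods):
    """Total idle time in seconds across all detected periods."""
    return sum(end - start for start, end in periods)
```

Calling `proportion_below` for each integer threshold from 1 to 50 gives the kind of sweep described above; the start and end times returned by `idle_periods` can then be matched to time points in a video recording.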
Video analysis
Idle periods were identified from the motion tracking data and correlated to specific time points in the video-recordings of the participants. The video recordings were then used to code when in the procedure the idle periods were occurring. The procedure steps were broken down into seven main categories: 1) Entering tissue with the needle; 2) Driving the needle through tissue; 3) Pulling the needle out of tissue; 4) Tightening a knot; 5) Cutting a suture tail; 6) Grasping suture with an instrument and 7) Managing the suture. Idle periods occurring when the participant was interacting with the experimenter were also noted. For analysis, videos were reviewed by a single blinded rater. To assess inter-rater agreement, a 20% random sample was reviewed and classified by a second blinded rater.
Data Analysis
Our hypothesis was that validity evidence exists for the use of three different psychomotor measures (idle time, total operative time and path length) to differentiate performance by experience level and task difficulty. To test this hypothesis, repeated measures ANOVA with experience level (medical students, surgery residents and attending surgeons) as a between-subjects variable and tissue type (foam, rubber and tissue paper) as a repeated within-subjects variable was used to evaluate main effects of experience and tissue. Chi-squared analysis was performed to assess for an association between the procedure steps in which idle periods occurred and participant experience level. Inter-rater reliability for video analysis of idle periods was assessed using Cohen’s kappa (κ). Pearson’s correlation coefficients were calculated to evaluate correlations between idle time, total operative time and path length. All analyses were performed using IBM SPSS Statistics Version 22.
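Two of the statistics named above have compact textbook definitions, sketched here for readers who want to see the arithmetic. The analyses in this study were run in SPSS; the functions below are illustrative stand-ins with hypothetical names, and the integer step codes in the usage note are made up for the example.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single, identical category
    return (observed - expected) / (1.0 - expected)


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, if two raters code four idle periods as steps `[1, 1, 2, 2]` and `[1, 1, 2, 3]`, they agree on three of four items and `cohens_kappa` discounts that by the agreement expected from chance alone.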
Results
Surveys
The study participants (n=15) were medical students (n=5), surgery residents (n=4) and attending surgeons (n=6). The medical students were third- or fourth-year students who had completed at least one surgical rotation. Surgical residents ranged from post graduate year (PGY) one to three. Attending surgeons had an average length of time in practice of 14.2 years (SD=12.0). All participants completed all three of the suturing tasks. Overall, participants felt that the motion tracking system did not interfere with their ability to perform the task (M=1.9/5.0, SD=1.1).
Idle Time
i. Motion tracking measures
The repeated measures ANOVA of idle time revealed a main effect for tissue type (Figure 2). All groups paused longer when working on the tissue paper (M=17.22 seconds, SD=15.08) compared to the balloon (M=4.76 seconds, SD=3.66, t(9)=3.72, p=0.003) or foam (M=4.47 seconds, SD=3.85, t(7)=3.45, p=0.005). Cohen’s effect size value (tissue vs. balloon d = 1.1; tissue vs. foam d = 1.2) suggests a high practical significance.
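The text does not state which formula was used for Cohen’s effect size. As a worked check under one common assumption, namely the mean difference divided by the root mean square of the two standard deviations, the reported idle time means and standard deviations give values that round to the reported d of 1.1 and 1.2. The function name is hypothetical.

```python
import math


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d using the root mean square of the two SDs as the pooled SD.

    Assumption: equal weighting of the two SDs; the source does not say
    which pooling formula was actually used.
    """
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2.0)
    return (mean_1 - mean_2) / pooled_sd


# Idle time values reported above (seconds).
d_tissue_vs_balloon = cohens_d(17.22, 15.08, 4.76, 3.66)
d_tissue_vs_foam = cohens_d(17.22, 15.08, 4.47, 3.85)
print(round(d_tissue_vs_balloon, 1), round(d_tissue_vs_foam, 1))
```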
Repeated measures ANOVA of idle time was not significant for experience level (F(2.00, 12.00)=27.13, p=0.742).
ii. Video analysis
A total of 365 idle periods were identified by the motion tracking system across all participants and were further evaluated with video analysis. Video analysis of procedure steps when idle periods occurred revealed high inter-rater reliability (κ = .887). Most of the idle periods (99.2%) occurred while participants were performing the task. There was a significant difference in the number of idle periods during specific procedure steps (χ2(6) = 238.65, p < 0.001). Idle periods by procedure step and experience level are shown in Figure 3. Results show attending surgeons had fewer idle periods while entering tissue with the needle (p = .001); driving the needle through tissue (p < .001); and pulling the needle out of tissue (p = .007). Attending surgeons had more idle periods when tightening a knot (p < .001).
Total Operative Time
The repeated measures ANOVA of total operative time also revealed a main effect for tissue type (Figure 4A). It took significantly longer for all groups to complete the suturing task on tissue paper (M=97.43 seconds, SD=21.77) compared to balloon (M=55.69 seconds, SD=6.50, t(9)=8.65, p<0.001) or foam (M=56.38 seconds, SD=7.06, t(7)=7.85, p<0.001). Cohen’s effect size value (tissue vs. balloon d = 2.6; tissue vs. foam d = 2.6) suggests a high practical significance.
A repeated measures ANOVA of total operative time showed a main effect of experience level (Figure 4B). Attending surgeons (M=52.4 seconds, SD=23.0) performed the procedure in less time than surgical residents (M=78.5 seconds, SD=43.0, t(11)=2.51, p<0.029) and medical students (M=88.7 seconds, SD=58.0, t(7)=2.51, p<.029). Cohen’s effect size value (attending vs. resident d = 0.76; attending vs. medical student d = 0.82) suggests a moderate to high practical significance.
Path Length
The repeated measures ANOVA of path length also revealed a main effect of tissue type (F(2, 26)=19.57, p<0.0001) (Figure 5A). Participants’ hands moved farther when suturing tissue paper (M=6.47 meters, SD=2.06) compared to balloon (M=4.61 meters, SD=1.02, t(14)=5.42, p<0.001) or foam (M=4.90 meters, SD=1.10, t(14)=4.27, p=0.001). Cohen’s effect size value (tissue vs. balloon d = 1.1; tissue vs. foam d = 0.95) suggests a high practical significance.
There was a significant main effect of experience level on path length (Figure 5B). Attending surgeons (M=4.03 meters, SD=1.03) had shorter path lengths than surgical residents (M=5.8 meters, SD=1.24, t(9)=2.27, p=0.045) and medical students (M=6.10 meters, SD=1.24, t(7)=2.58, p=0.025). Cohen’s effect size value (attending vs. resident d = 1.6; attending vs. medical student d = 1.8) suggests a high practical significance.
Correlation of motion tracking measures
Idle time was significantly correlated with total operative time for the tissue paper task (r = .56, p = .03). Idle time was not significantly correlated with total operative time for the foam (r = .44, p = .097) and balloon (r = .48, p = .07) tasks. In addition, idle time did not correlate with path length (p > .05). Lastly, there was a significant correlation between total operative time and path length for all three tasks (Foam r = .84, Balloon r = .80, Tissue r = .76; p < .001 for all correlations).
Discussion
This pilot study sought validity evidence for the use of idle time as a performance metric in an open surgical skills task. Validity evidence was evaluated using known groups and response process with video analysis.25, 26 Additionally, we sought validity evidence for the psychomotor metrics generated when using motion tracking technology with the newly developed variable tissue simulator. Fifteen participants with different levels of experience (medical students, surgical residents and attending surgeons) completed three suturing tasks on different materials (foam, balloon, and tissue paper) that were purposefully selected to provide varying degrees of complexity.
This study is the first to experimentally investigate idle time as a performance metric in open surgical skills assessment. All participants had greater amounts of idle time while performing the more difficult suturing task. This is consistent with our hypothesis that idle times would be shorter for easier tasks. These findings provide validity evidence for the use of idle time as a psychomotor performance metric. Idle time may represent periods of motor planning or decision making that can be used to differentiate performance.16 This suggests that evaluating what participants are doing or thinking while not moving their hands may be just as important as evaluating what they do while moving their hands.
Prior studies assessing laparoscopic skills noted that novices spend a greater amount of time in the idle state compared to experts.16, 23–24 Our study did not find a significant relationship between overall idle time and experience level. However, differences in idle time were noted when evaluating specific procedure steps by experience. Entering tissue with the needle, driving the needle through tissue and tightening a knot had the greatest number of idle periods. Additionally, the idle periods during these steps were not evenly distributed amongst participants by experience level. Medical students and residents had a greater number of idle periods while entering tissue with the needle and driving the needle through tissue, whereas attending surgeons had a greater number of idle periods while tightening a knot. Medical students and residents may exhibit more idle periods when performing motor planning for contacting the needle with the tissue and moving it through the tissue. In contrast, attendings exhibit idle periods while placing and evaluating tension on the knot. Our findings warrant additional research evaluating the relationship between idle time, experience level and procedure steps.
Study results also demonstrate that more experienced practitioners took less time and moved their hands a shorter distance to complete the suturing tasks compared to less experienced trainees. This finding is consistent with motor behavior research demonstrating improvements in movement components of motor skills with increased practice.27 Additionally, participants took more time and moved their hands a longer distance to complete the suturing task on the material (tissue paper) that was designed to be the most friable and complex. Both of these findings are consistent with prior work evaluating surgical performance with motion tracking technology11–15 thus providing construct validity evidence for the use of psychomotor metrics to differentiate performance on the newly developed variable tissue simulator.
There was a significant positive correlation between idle time and total operative time on the more difficult suturing task (tissue paper). Participants who took longer to complete the task had more idle time. This may result from those participants requiring larger amounts of psychomotor and procedural planning. The lack of correlation between idle time and path length may result from other factors confounding the relationship or from our small sample size and limited power.
Limitations of this study include the small sample size, which decreases statistical power and generalizability of the results. This pilot study was undertaken to evaluate validity evidence of this newly developed performance metric. The preliminary results show promise, with large effect sizes despite low power. Thus we plan to collect data from a larger sample to address the limitations of statistical power and allow for broader inferences regarding our results. Generalizability is also limited by the lack of randomization. Participants were included in the study based on availability and willingness to participate, and therefore may be inherently different from those who chose not to participate. Lastly, another limitation may be the relative simplicity of using a suturing task rather than a full procedure. The study was specifically designed with a simple task, and complexity was added through the use of different materials. This allowed for a more detailed understanding of the motor behavior components than may have been possible with a longer task involving multiple motor components. We are interested in quantifying periods of motor planning or decision making, and our future work is aimed at evaluating these same psychomotor metrics of idle time, total operative time, and path length in more complex tasks and full procedures.
The implications of this work relate to ongoing development and evaluation of performance metrics in the research setting, as well as the need for a paradigm shift in how motion tracking technology is used for surgical skills assessment. The surgical profession is still in its infancy regarding best practice for technology-based performance assessments. Several research groups are using commercial technologies and discovering new metrics. As additional metrics continue to be developed, future directions may call for a meta-analysis to better understand the gaps, strengths and weaknesses of these new approaches. Idle time represents an evolution in traditional motion analysis research. Our findings imply that this performance metric is meaningful and may add to our current understanding of technical skills research.
Summary.
Idle time is an underdeveloped psychomotor performance measure. Using an experimental model, idle time was found to correlate with motor planning when operating on increasingly difficult tissue types. Further work exploring this performance metric is warranted.
Acknowledgments
Funding for this study came from the National Institutes of Health grant #1F32EB017084-01 entitled “Automated Performance Assessment System: A New Era in Surgical Skills Assessment” and the Department of Defense grant #W81XWH-13-1-0080 entitled “Psycho-Motor and Error Enabled Simulations Modeling Vulnerable Skills in the Pre-Mastery Phase – Medical Practice Initiative Procedural Skill Decay and Maintenance (MPI-PSD)”.
References
1. Schmitz CC, Darosa D, Sullivan ME, et al. Development and Verification of a Taxonomy of Assessment Metrics for Surgical Technical Skills. Acad Med. 2014;89(1):153–161. doi: 10.1097/ACM.0000000000000056.
2. Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84(2):273–278. doi: 10.1046/j.1365-2168.1997.02502.x.
3. Fried GM, Feldman LS. Objective assessment of technical performance. World J Surg. 2008;32:156–160. doi: 10.1007/s00268-007-9143-y.
4. Regehr G, MacRae H, Reznick RK, Szalay D. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med. 1998;73:993–997. doi: 10.1097/00001888-199809000-00020.
5. Yule S, Paterson-Brown S. Surgeons’ non-technical skills. Surg Clin North Am. 2012;92(1):37–50. doi: 10.1016/j.suc.2011.11.004.
6. Cano A, Gayá F, Lamata P, et al. Laparoscopic tool tracking method for augmented reality surgical applications. In: Bello F, Edwards PJ, editors. Biomedical simulation. Berlin: Springer; 2008. p. 191.
7. Dosis A, Aggarwal R, Bello F, et al. Synchronized video and motion analysis for the assessment of procedures in the operating theater. Arch Surg. 2005;140:293–299. doi: 10.1001/archsurg.140.3.293.
8. Chmarra MK, Grimbergen CA, Dankelman J. Systems for tracking minimally invasive surgical instruments. Minim Invasive Ther Allied Technol. 2007;16(6):328–340. doi: 10.1080/13645700701702135.
9. Datta V, Mackay S, Mandalia M, et al. The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model. J Am Coll Surg. 2001;193:479–485. doi: 10.1016/s1072-7515(01)01041-9.
10. Angrosino MV. Observer Bias. In: Lewis-Beck MS, Bryman A, Liao TF, editors. Encyclopedia of Social Science Research Methods. Thousand Oaks: SAGE Publications, Inc; 2004. pp. 758–759.
11. Bann S, Davis IM, Moorthy K, et al. The reliability of multiple objective measures of surgery and the role of human performance. Am J Surg. 2005;189(6):747–752. doi: 10.1016/j.amjsurg.2005.03.020.
12. Datta V, Bann S, Mandalia M, Darzi A. The surgical efficiency score: a feasible, reliable, and valid method of skills assessment. Am J Surg. 2006;192(3):372–378. doi: 10.1016/j.amjsurg.2006.06.001.
13. Datta V, Chang A, Mackay S, Darzi A. The relationship between motion analysis and surgical technical assessments. Am J Surg. 2002;184(1):70–73. doi: 10.1016/s0002-9610(02)00891-7.
14. Datta V, Mandalia M, Mackay S, et al. Relationship between skill and outcome in the laboratory-based model. Surgery. 2002;131(3):318–23. doi: 10.1067/msy.2002.120235.
15. Mackay S, Datta V, Mandalia M, et al. Electromagnetic motion analysis in the assessment of surgical skill: Relationship between time and movement. ANZ J Surg. 2002;72(9):632–4. doi: 10.1046/j.1445-2197.2002.02511.x.
16. Oropesa I, Sanchez-Gonzalez P, Lamata P, et al. Methods and tools for objective assessment of psychomotor skills in laparoscopic surgery. J Surg Res. 2011;171(1):e81–95. doi: 10.1016/j.jss.2011.06.034.
17. Fitts PM, Posner MI. Human performance. Belmont: Brooks/Cole; 1967.
18. Balkissoon R, Blossfield K, Salud L, et al. Lost in translation: unfolding medical students’ misconceptions of how to perform a clinical digital rectal examination. Am J Surg. 2009;197(4):525–532. doi: 10.1016/j.amjsurg.2008.11.025.
19. Pugh CM, DaRosa DA, Santacaterina S, Clark RE. Faculty evaluation of simulation-based modules for assessment of intraoperative decision making. Surgery. 2011;149(4):534–542. doi: 10.1016/j.surg.2010.10.010.
20. Pugh CM, Domont ZB, Salud LH, Blossfield A simulation-based assessment of clinical breast examination technique: do patient and clinician factors affect clinician approach? Am J Surg. 2008;195(6):874–880. doi: 10.1016/j.amjsurg.2007.10.018.
21. Pugh CM, Rosen J. Qualitative and quantitative analysis of pressure sensor data acquired by the E-Pelvis simulator during simulated pelvic examinations. Stud Health Technol Inform. 2002;85:376–379.
22. Pugh CM, Santacaterina S, DaRosa DA, Clark RE. Intra-operative decision making: more than meets the eye. J Biomed Inform. 2011;44(3):486–96. doi: 10.1016/j.jbi.2010.01.001.
23. Rosen J, Brown JD, Chang L, et al. Generalized approach for modeling minimally invasive surgery as a stochastic process using a discrete Markov model. IEEE Trans Biomed Eng. 2006;53(3):399–413. doi: 10.1109/TBME.2005.869771.
24. Rosen J, Solazzo M, Hannaford B, Sinanan M. Task decomposition of laparoscopic surgery for objective evaluation of surgical residents’ learning curve using hidden Markov model. Comput Aided Surg. 2002;7(1):49–61. doi: 10.1002/igs.10026.
25. Cook DA, Brydges R, Zendejas B, Hamstra SJ, Hatala R. Technology-enhanced simulation to assess health professionals: A systematic review of validity evidence, research methods, and reporting quality. Acad Med. 2013;88(6):872–83. doi: 10.1097/ACM.0b013e31828ffdcf.
26. Downing SM. Validity: on meaningful interpretation of assessment data. Med Educ. 2003;37(9):830–7. doi: 10.1046/j.1365-2923.2003.01594.x.
27. Magill RA. Motor Learning and Control: Concepts and Applications. 9th ed. New York: McGraw-Hill; 2011.