BMJ Simulation & Technology Enhanced Learning
2020 Apr 16;7(1):3–10. doi: 10.1136/bmjstel-2019-000506

Validity of the Medi-StuNTS behavioural marker system: assessing the non-technical skills of medical students during immersive simulation

Emma Claire Phillips 1,2,3, Samantha Eve Smith 3, Benjamin Clarke 2, Ailsa Lauren Hamilton 2, Joanne Kerins 2, Johanna Hofer 2, Victoria Ruth Tallentire 1,2,3
PMCID: PMC8936660  PMID: 35521075

Abstract

Background

The Medical Students’ Non-Technical Skills (Medi-StuNTS) behavioural marker system (BMS) is the first BMS to be developed specifically for medical students to facilitate training in non-technical skills (NTS) within immersive simulated acute care scenarios. In order to begin implementing the tool in practice, validity evidence must be sought. We aimed to assess the validity of the Medi-StuNTS system with reference to Messick’s contemporary validity framework.

Methods

Two raters marked video-recorded performances of acute care simulation scenarios using the Medi-StuNTS system. Three groups were marked: third-year and fourth-year medical students (novices), final-year medical students (intermediates) and core medical trainees (experts). The scores were used to make assessments of relationships to the variable of clinical experience through expert–novice comparisons, inter-rater reliability, observability, exploratory factor analysis, inter-rater disagreements and differential item functioning.

Results

A significant difference was found between the three groups (p<0.005), with experts scoring significantly better than intermediates (p<0.005) and intermediates scoring significantly better than novices (p=0.001). There was a strong positive correlation between the two raters’ scores (r=0.79), and an inter-rater disagreement of more than one point in less than one-fifth of cases. Across all scenarios, 99.7% of skill categories and 84% of skill elements were observable. Factor analysis demonstrated appropriate grouping of skill elements. Inconsistencies in test performance across learner groups were shown specifically in the skill categories of situation awareness and decision making and prioritisation.

Conclusion

We have demonstrated evidence for several aspects of validity of the Medi-StuNTS system when assessing medical students’ NTS during immersive simulation. We can now begin to introduce this system into simulation-based education to maximise NTS training in this group.

Keywords: simulation, medical students, non-technical skills, behavioural marker systems, validation


Key messages

What is already known on this subject

  • There are presently no validated tools to assess the non-technical skills (NTS) of medical students during immersive simulation.

  • The Medical Students’ Non-Technical Skills (Medi-StuNTS) behavioural marker system (BMS) is the first BMS to be developed specifically for medical students and is designed to facilitate training in NTS within immersive simulation scenarios.

What this study adds

  • This study demonstrates validity of the Medi-StuNTS system with reference to Messick’s contemporary validity framework. Notably, performance in NTS using the tool improved with clinical experience.

  • We have highlighted that situation awareness and decision making and prioritisation are not performed as well as other NTS by medical students.

Introduction

High-level non-technical skills (NTS), “the cognitive, social and personal resource skills that complement technical skills, and contribute to safe and effective task performance”,1 are now widely acknowledged as critical to the delivery of safe and effective medical management and to patient safety.2–4 While the term NTS is widely used, some issues with the terminology have been highlighted,5 6 including that it may be an inaccurate and oversimplified descriptor and that it devalues the skills involved by implying they contribute less to performance than technical skills. However, since this terminology is used and understood by those working in the field, the term NTS will be used for the purposes of this paper.

Behavioural marker systems (BMS) have been developed in order to delineate, identify and train NTS within healthcare.7 This follows progress made in the field of NTS training in aviation using crew resource management, a system of observable behaviours of flight deck crew relating to NTS.8

Although other tools for assessing NTS exist (such as self-reporting tools, objective structured clinical examinations and written assessments),9 there are several key benefits to using a BMS. These include the ability to objectively recognise behaviours and link them to quality of performance, with some systems including numerical ratings, providing an opportunity for both informal feedback and formal assessment to facilitate training in NTS. To be an effective training and assessment tool, the user of the BMS must be adequately trained, and the tool itself must be subjected to rigorous tests of validity. In addition, assessments made using a BMS apply only to the period in which the participant is being observed, generally during a specific clinical interaction (real or simulated).10 BMS have been developed for a wide range of specialties (see table 1) and are being rapidly adopted into many training curricula.11–20

Table 1. Examples of BMS and target groups

BMS | Target group
Anaesthetists’ Non-Technical Skills11 | Anaesthetists
Anaesthetic Non-Technical Skills for Anaesthetic Practitioners12 | Anaesthetic assistants
Emergency Physicians’ Non-Technical Skills13 | Emergency physicians
Foundation Non-Technical Skills14 | Foundation doctors
Medical Students’ Non-Technical Skills15 | Medical students
Non-Technical Skills for Surgeons16 | Surgeons
Observational Teamwork Assessment for Surgery17 | Surgical teams
Oxford Non-Technical Skills18 | Theatre teams
Resuscitation teams19 | Cardiac arrest teams
Scrub Practitioners’ Non-Technical Skills20 | Scrub practitioners

BMS, behavioural marker system.

The Medical Students’ Non-Technical Skills (Medi-StuNTS) BMS is the first BMS to be developed specifically for medical students.15 It was developed through a number of stages, including identification of NTS through literature review, interviews with medical students and expert panels.15 Like most BMS, the Medi-StuNTS system categorises NTS using a trihierarchical structure.1 21 This comprises broad skill categories (situation awareness; decision making and prioritisation; teamwork and communication; self-awareness; and escalating care); subcategories within each broad skill category (skill elements); and a description of potentially observable behaviours for each skill element. The Medi-StuNTS structure is outlined in table 2, and a minimal illustrative sketch of this structure follows the table.15 A numerical score of 1–5 (1=good performance, 5=poor performance) is assigned to each skill element and category when observing a student using the system. The purpose of the Medi-StuNTS system is to facilitate training in NTS within immersive simulated acute care scenarios. Simulation is recognised as an appropriate and effective method for the teaching of NTS, with this teaching being optimal when underpinned by sound NTS frameworks.22 Performance during simulation scenarios can be observed and assessed using frameworks such as a BMS in order to form the basis of a learning conversation designed to provide feedback and enhance future performance.

Table 2. Outline of the skill categories and elements included in the Medical Students’ Non-Technical Skills system

Situation awareness
 Gathering information: Seeking information about patient’s condition, background and wishes from any available sources
 Recognising and understanding information: Collating information to develop an overall picture of the patient’s current situation
 Planning, preparing and anticipating: Anticipating consequences of actions and predicting potential outcomes
Decision making and prioritisation
 Prioritising: Deciding priorities of assessment, investigation and management
 Recognising and dealing with uncertainty: Considering differential diagnoses and acknowledging diagnostic uncertainty
 Reviewing decisions: Reassessment of the patient in light of decisions made and actions taken
Teamwork and communication
 Establishing a shared mental model: Communicating to facilitate a shared understanding between team members
 Demonstrating active followership: Proactive support of the leader and participation in team activities
 Patient involvement: Inclusion of the patient and/or their relatives within the team
Self-awareness
 Role awareness: Understanding of role and responsibilities
 Coping with stress: Minimising the impact of stress on the ability to perform effectively
 Speaking up: Communicating concerns regarding patient safety
Escalating care
 Situation awareness: Escalating care using skills related to situation awareness
 Decision making and prioritisation: Escalating care using skills related to decision making and prioritisation
 Teamwork and communication: Escalating care using skills related to teamwork and communication
 Self-awareness: Escalating care using skills related to self-awareness
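
To make the hierarchy above concrete, the sketch below shows one hypothetical way a Medi-StuNTS marking sheet could be represented in Python. The structure mirrors table 2, but the code, names and scoring helper are illustrative assumptions rather than part of the published tool.

```python
# Hypothetical sketch (not part of the published tool): the Medi-StuNTS
# hierarchy as a mapping from skill category to skill elements. Each
# element is rated from 1 (good) to 5 (poor), or None if not observed.
MEDI_STUNTS = {
    "Situation awareness": [
        "Gathering information",
        "Recognising and understanding information",
        "Planning, preparing and anticipating",
    ],
    "Decision making and prioritisation": [
        "Prioritising",
        "Recognising and dealing with uncertainty",
        "Reviewing decisions",
    ],
    "Teamwork and communication": [
        "Establishing a shared mental model",
        "Demonstrating active followership",
        "Patient involvement",
    ],
    "Self-awareness": [
        "Role awareness",
        "Coping with stress",
        "Speaking up",
    ],
    "Escalating care": [
        "Situation awareness",
        "Decision making and prioritisation",
        "Teamwork and communication",
        "Self-awareness",
    ],
}

def category_means(element_scores):
    """Mean rating per skill category, ignoring 'not observed' (None).

    `element_scores` maps element name to a 1-5 rating or None; a real
    sheet would key on (category, element) to avoid any name clashes.
    """
    means = {}
    for category, elements in MEDI_STUNTS.items():
        observed = [element_scores[e] for e in elements
                    if element_scores.get(e) is not None]
        means[category] = sum(observed) / len(observed) if observed else None
    return means
```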

As described, in order to confidently implement the use of a BMS, its validity must be assessed.1 23 Construct validity is described in the classical validity framework as “the degree to which a test measures what it claims to be measuring”,24 and has been delineated by Messick’s contemporary framework into more specific forms of validity evidence, comprising five categories: test content (ensuring the assessment reflects what is intended to be measured); response process (how the examiner/examinee responses align with the assessment design, including quality control); internal structure (relationships between assessment items and how they relate to the construct); relationship to other variables (association between assessment scores and some other measurable variable); and consequences (impact of the assessment and actions taken based on its results, including differences in scores among subgroups where performance ought to be similar).25–27

There is varying and often limited validity evidence for BMS, with systematic reviews of these tools highlighting that evidence was often sought using an out-of-date validity framework, or with no reference to a framework at all.27 28 This creates difficulty in comparing different BMS. The most common forms of validity evidence presented are content (reported for between 50% and 100% of tools), internal structure (35%–94%) and relationship with other variables (30%–94%), the last most frequently determined by comparing skills with training level. Response process and consequences evidence have been assessed less consistently, in some reviews being reported for <10% of reviewed tools.10 27–31 Additional forms of validity, such as face validity (how relevant a test appears to the participants), have also frequently been demonstrated for BMS,10 despite not featuring in the contemporary validity framework and being considered by some as too subjective.32

A recent Best Evidence Medical Education systematic review highlighted that there are currently no validated tools for the assessment of NTS during simulation for medical students9 and called for further vital work in this field. Content validity of the Medi-StuNTS system was sought during the development process by means of an expert panel.15 However, to date no further assessments of validity have been made on this BMS.

The aim of this study was to seek validity evidence for the Medi-StuNTS system with reference to Messick’s contemporary validity framework.26

Methods

Study design

As described previously, content validity of the Medi-StuNTS system was assessed as part of its development.15 To encompass all other aspects of the validity framework, this study sought evidence of the relationship to other variables, internal structure, response process and consequences. We recruited volunteer participants to three groups (novices, intermediates and experts) and scored them on the NTS displayed in an acute care simulated scenario using the Medi-StuNTS system. These scores were used for the assessments of validity. Methods for these were selected from Cook et al’s systematic review of validity evidence for simulation-based assessments.27

Context

We carried out this study at the Scottish Centre for Simulation and Clinical Human Factors. This is a state-of-the-art training facility used to deliver education to multiprofessional groups through simulation. The simulations for this study used a SimMan Essential manikin and took place in an immersive environment resembling a medical ward. The simulated ward had a working hospital bed; a patient observations monitor; emergency equipment, including defibrillator; drugs, including oxygen; and paperwork for each patient. Participants were given a 15 min orientation to the manikin, environment and equipment prior to undertaking a simulated scenario.

Participants

We recruited three groups of participants: third-year and fourth-year medical students with up to 18 months of clinical experience (novices), final-year medical students with 2–3 years of clinical experience (intermediates) and core medical trainees at the start of the medical training scheme, equivalent to postgraduate years (PGY) 3 and 4, with 6–7 years of clinical experience (experts). These groups were selected based on the evidence that NTS improve with level of experience in a field,1 and we therefore assumed for this study that NTS improve with level of clinical experience, similar to the assumptions made in the validation of other BMS.33–35 Medical students were recruited through their university course organisers from three medical schools in Scotland (The University of Edinburgh, The University of Dundee and The University of Glasgow). Core medical trainees were recruited through an online booking system for courses in the simulation centre and came from across the whole of Scotland. All participants were volunteers, were given written information prior to entering the study and were free to withdraw from the study at any time without giving a reason.

Sample size

Based on the measurement of the relationship with other variables (detailed below), we conducted a sample size calculation with the following assumptions: an alpha error of 1%, power of 90%, estimated effect size of 0.615 (estimated by conducting a small pilot study of two novices, two intermediates and two experts, who were marked by a panel of expert markers and judged to be typical) and estimated SD of 0.59 (estimated using previous scores of 68 medical students). This indicated the need for 22 participants per group (66 participants in total).
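
As an illustration of how such a calculation can be run, the sketch below uses statsmodels for a two-group power calculation. The authors' exact effect size metric and test are not specified beyond the figures above, so the standardisation of the effect (dividing 0.615 by the SD of 0.59) is an assumption, and the output may not reproduce the quoted 22 participants per group exactly.

```python
# Hedged sketch of a sample size calculation with statsmodels, assuming
# a two-sided, two-sample t-test with alpha = 0.01 and power = 0.90.
from statsmodels.stats.power import TTestIndPower

d = 0.615 / 0.59  # assumed Cohen's d: estimated effect / estimated SD
n_per_group = TTestIndPower().solve_power(effect_size=d, alpha=0.01, power=0.90)
print(round(n_per_group))  # participants per group under these assumptions
```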

Data collection

Each participant took part in a 10–15 min acute care simulated scenario. The medical conditions encountered were sepsis, upper gastrointestinal bleed, asthma, cardiac arrest, anaphylaxis, bradycardia, tachyarrhythmia, hypoglycaemia and acute coronary syndrome, each appearing with equal frequency. The participants were asked to behave as they would in normal clinical practice. The manikin’s voice and physiology were manipulated by ECP, VRT and JK from the control room, which was separated from the simulated ward by a one-way mirror. A confederate playing the role of a nurse was present in the simulated ward with the participant, and contact between the control room and the confederate was maintained via an earpiece. Performances were unscripted, and progress through the scenario was driven by the actions of the participant.

The simulation scenarios were video-recorded using Scottish Medical Observation and Training System software and stored on a secure computer. Video editing software (iMovie V.10.1.10) was used to mute any verbal exchange in which participants stated their name or role, ensuring their anonymity and blinding the raters to the participants’ level of experience. Any indication of participants’ names, grade or date of recording was removed from the file name, and each video was relabelled numerically.

The videos were marked by two raters (BC and SES) using the Medi-StuNTS system. Video rater demographics are shown in table 3. Both raters had received 3 hours of training on using the tool and had experience of teaching and working with medical students. Neither had been present at the simulation sessions where the videos were recorded or knew any of the participants.

Table 3. Demographics of video raters

 | Rater 1 | Rater 2
Clinical background | General practice | Emergency medicine
Years of experience as qualified doctor | 10 | 4
Gender | Female | Male

Both raters, blinded to the grade of the participant, independently viewed and marked each video. Using the Medi-StuNTS system, they were asked to assign a score to each skill element and category. They were able to write free text comments, but these were excluded from the analysis. They were instructed to write ‘not observed’ if they did not feel they had sufficient evidence to score the participant on a certain skill element. All videos were marked based on the expected performance of a final-year medical student. The two raters did not discuss their scores with each other or with any unblinded researcher during the video-marking process. Data from the marking sheets were collated by ECP on a Microsoft Excel spreadsheet.

Data analysis

Relationship with other variables

Relationship with other variables was assessed by comparing the scores with participants’ level of clinical experience, with the hypothesis that intermediates should perform better than novices, and experts should perform better than intermediates. We used the mean of all scores given by both raters to each participant to create a single score for each participant. We compared the scores between novices, intermediates and experts using analysis of variance. We then compared differences between novices and intermediates, and intermediates and experts, using Student’s unpaired t-test. We considered results to be statistically significant if p<0.025 (5% significance level with a Bonferroni correction for multiple comparisons). Our study was insufficiently powered to test for statistically significant differences in individual categories between groups.
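
A minimal sketch of this analysis pipeline using scipy is shown below; the arrays are placeholder data generated around the group means reported in the Results, not the study's raw scores.

```python
# Sketch of the between-group comparisons: one-way ANOVA, then planned
# pairwise t-tests against a Bonferroni-corrected threshold of 0.025.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
novice = rng.normal(3.45, 0.6, 22)        # placeholder: one mean score per participant
intermediate = rng.normal(2.67, 0.6, 22)
expert = rng.normal(1.59, 0.6, 22)

f_stat, p_anova = stats.f_oneway(novice, intermediate, expert)

_, p_nov_int = stats.ttest_ind(novice, intermediate)   # Student's unpaired t-test
_, p_int_exp = stats.ttest_ind(intermediate, expert)
print(p_anova, p_nov_int < 0.025, p_int_exp < 0.025)
```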

Internal structure

Three assessments of internal structure were made. First, inter-rater reliability for the mean scores of all participants was calculated using Pearson’s correlation coefficient. Second, observability levels were calculated by comparing the number of observed versus not-observed behaviours from the video ratings, at both a skill element and skill category level. Acceptable levels are indicated by >50% observability.36
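
These first two checks translate directly into a few lines of Python; the sketch below uses invented placeholder scores, with 'not observed' ratings represented as NaN.

```python
# Sketch of inter-rater reliability (Pearson's r between the raters'
# mean scores) and observability (% of ratings scored rather than
# marked 'not observed').
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
rater1 = rng.uniform(1, 5, 66)              # placeholder mean score per participant
rater2 = rater1 + rng.normal(0, 0.5, 66)    # a correlated second rater

r, _ = stats.pearsonr(rater1, rater2)       # inter-rater reliability

ratings = pd.DataFrame({                    # element-level scores; NaN = not observed
    "Speaking up": [3, np.nan, 2, np.nan],
    "Prioritising": [2, 3, 1, 4],
})
observability = ratings.notna().mean() * 100  # % observed per skill element
print(round(r, 2), observability.to_dict())
```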

It is common to report internal consistency when using multiple Likert-type scales, using statistics such as Cronbach’s alpha. Internal consistency measures “the extent to which all items in a test measure the same construct or concept”.37 In the Medi-StuNTS system, the items are not inter-related, as each item measures a different construct; even within a single category, a student may be expected to do well in one area but not in another. It was therefore inappropriate to conduct a Cronbach’s alpha analysis of internal consistency. Instead, an exploratory factor analysis was undertaken as a third test of internal structure, assessing whether the skill elements in each category are related to one another and therefore belong together within that skill category.38 Factor loadings (ie, the relationship of each skill element to the underlying factor, here the skill category) were calculated using the average scores of both raters. A positive value indicates that the skill element is related to the overall category, and a value closer to one indicates a stronger relationship.
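
A hedged sketch of such a one-factor analysis for a single three-element skill category is given below, using scikit-learn on synthetic standardised scores; with standardised inputs the fitted components approximate the loadings, although their sign is arbitrary.

```python
# Sketch of a one-factor exploratory factor analysis for one skill
# category: three element scores assumed to share a latent factor.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
latent = rng.normal(size=(66, 1))                           # shared underlying factor
X = latent @ np.ones((1, 3)) + rng.normal(0, 0.4, (66, 3))  # three noisy elements

X_std = (X - X.mean(axis=0)) / X.std(axis=0)                # standardise the scores
fa = FactorAnalysis(n_components=1).fit(X_std)
print(np.round(fa.components_.ravel(), 2))                  # one loading per element
```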

Response process

Disagreements between the two raters were evaluated by identifying all scenarios in which the mean scores given differed by more than one point. The nature of these disagreements was then analysed by reviewing which skill categories accounted for the difference in score between the raters.
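
In code, this screen amounts to pivoting the scores by rater and flagging large gaps; the sketch below uses assumed column names and toy values for illustration.

```python
# Sketch of the disagreement screen: flag videos where the two raters'
# mean scores differ by more than one point.
import pandas as pd

scores = pd.DataFrame({
    "video": [1, 1, 2, 2],
    "rater": ["r1", "r2", "r1", "r2"],
    "mean_score": [2.0, 3.4, 1.5, 1.8],
})
wide = scores.pivot(index="video", columns="rater", values="mean_score")
disagreements = wide[(wide["r1"] - wide["r2"]).abs() > 1]
print(disagreements)  # video 1 (gap of 1.4 points) is flagged
```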

Consequences

As the Medi-StuNTS system is not a summative assessment and does not have a set pass mark, we examined differential item functioning as a source of consequences evidence, defined as the “actual consistencies or inconsistencies in test performance across learner groups”.27 To do this, we calculated a single mean score for each skill category for each of the three groups of participants in order to explore the trends present (differences between novice, intermediate and expert scores in each category). We expected that within each group, participants would demonstrate a consistent pattern of performance across the NTS categories.
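
The summary described here is a simple group-by-category aggregation; a sketch with assumed column names and toy values follows.

```python
# Sketch of the differential item functioning table: mean score per
# skill category per experience group from long-format data.
import pandas as pd

df = pd.DataFrame({
    "group": ["novice", "novice", "expert", "expert"],
    "category": ["Situation awareness", "Self-awareness"] * 2,
    "score": [3.6, 3.6, 1.5, 1.8],
})
dif = df.groupby(["group", "category"])["score"].mean().unstack()
print(dif)  # rows: groups; columns: skill categories
```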

Statistical analysis for all of the above was performed by SES and ECP using Minitab V.18 and Microsoft Excel V.16.

Results

We recruited 22 participants per group (novices, intermediates and experts) as per the sample size calculation, totalling 66 videos, each of which was marked by the two independent video raters. All those recruited agreed to participate, and none withdrew their consent.

Relationship with the variable of clinical experience

A significant difference was found between the three groups (p<0.005). Experts scored significantly better than intermediates (mean scores 1.59, 95% CI 1.32 to 1.85 vs 2.67, 95% CI 2.41 to 2.93; p<0.005), and intermediates scored significantly better than novices (mean scores 2.67 vs 3.45, 95% CI 3.12 to 3.72; p=0.001). Figure 1 shows the overall mean scores and 95% CIs for the three groups. Table 4 shows the mean scores and SDs in each skill category for each group.

Figure 1. Mean scores and 95% confidence intervals for novices, intermediates and experts (1=good performance; 5=poor performance).

Table 4. Mean scores and SDs in each skill category for each group (1=good performance, 5=poor performance)

Skill category | Novice | Intermediate | Expert
Situation awareness | 3.6±0.9 | 2.9±0.9 | 1.5±0.6
Decision making and prioritisation | 3.5±0.9 | 2.8±0.9 | 1.5±0.6
Teamwork and communication | 3.1±1.0 | 2.4±0.8 | 1.6±0.6
Self-awareness | 3.6±0.9 | 2.6±0.8 | 1.8±0.9
Escalating care | 3.3±1.0 | 2.5±0.9 | 1.5±0.8

Inter-rater reliability

There was a strong positive correlation between the two raters’ independent scores (r=0.79).

Observability

Across all scenarios marked by both video raters, 99.7% (range 98%–100%) of skill categories and 84% (range 22%–100%) of skill elements were observable. Table 5 displays the observability for each item in the BMS.

Table 5. Observability of skill categories and elements from the videos of simulation scenarios, and factor loadings of each skill element to its skill category

Skill category and skill elements | Observability (mean % of observed vs not-observed ratings) | Factor loading of skill element to skill category
Situation awareness | 100 |
 Gathering information | 98 | 0.93
 Recognising and understanding information | 99 | 0.96
 Planning, preparing and anticipating | 100 | 0.93
Decision making and prioritisation | 100 |
 Prioritising | 100 | 0.94
 Recognising and dealing with uncertainty | 99 | 0.90
 Reviewing decisions | 98 | 0.91
Teamwork and communication | 100 |
 Establishing a shared mental model | 98 | 0.73
 Demonstrating active followership | 22 | 0.85
 Patient involvement | 86 | 0.33
Self-awareness | 100 |
 Role awareness | 99 | 0.84
 Coping with stress | 100 | 0.95
 Speaking up | 51 | 0.90
Escalating care | 98 |
 Situation awareness | 90 | 0.92
 Decision making and prioritisation | 77 | 0.88
 Teamwork and communication | 28 | 0.77
 Self-awareness | 98 | 0.86

Exploratory factor analysis

Table 5 displays the factor loadings for each skill element in relation to the skill category within which it lies. As shown, all factor loadings were positive and most were high. An exception was patient involvement (in the teamwork and communication skill category), which had a factor loading of 0.33.

Rater disagreements

The mean difference in score between the two raters was 0.53 (range 0–2.05). Twelve videos (18.2%) had a score difference of greater than 1 point: four from the novice group, four from the intermediate group and four from the expert group. Rater 1 gave the higher score (indicating a poorer performance) in nine of these cases, and rater 2 in three. Further analysis of these 12 cases demonstrated a mean difference between the two raters of 1.7 in the situation awareness skill category, 1.78 in decision making and prioritisation, 1.03 in teamwork and communication, 0.95 in self-awareness and 1.84 in escalating care.

Differential item functioning

Across all three groups, the mean scores for each skill category in ascending order (ie, best to worst performance) were 2.34 for teamwork and communication, 2.49 for escalating care, 2.6 for decision making and prioritisation, 2.67 for situation awareness and 2.69 for self-awareness. Figure 2 shows the mean scores for the skill categories in each group. Experts outperformed intermediates and intermediates outperformed novices across every skill category.

Figure 2. Mean scores of the three groups in each skill category (1=good performance; 5=poor performance). DMP, decision making and prioritisation; EC, escalating care; SA, situation awareness; Self, self-awareness; T&C, teamwork and communication.

An overall larger difference in performance was found between the intermediate and expert groups (mean of 1.05 across all skill categories, range 0.8–1.45) versus that between the novice and intermediate groups (mean 0.8, range 0.66–1.08). Figure 3 shows these differences for each skill category.

Figure 3. Differences in mean scores between groups for each skill category. DMP, decision making and prioritisation; EC, escalating care; SA, situation awareness; Self, self-awareness; T&C, teamwork and communication.

Discussion

Our objective was to assess the validity of the Medi-StuNTS system with reference to Messick’s contemporary validity framework.26 Regarding relationship with other variables, the results support the hypothesis that the level of performance of NTS during immersive simulation, evaluated using the Medi-StuNTS system, improves with increasing clinical experience. The BMS was therefore able to differentiate significantly between novices and experts with regard to overall NTS performance in an immersive simulation scenario, in the context of caring for an acutely unwell patient. These findings are in keeping with what is known about NTS: that they improve with clinical experience and correlate with knowledge and technical skills.1 39 A strong positive correlation was demonstrated between the two raters marking the videos of the simulation scenarios. This is reassuring, as reliability is a critical companion to validity; however, we acknowledge that more detailed work regarding inter-rater reliability of the Medi-StuNTS system is necessary.

Each skill category was found to be highly observable (range 98%–100%), as were the majority of the skill elements. At the skill category level, these results compare favourably with the observability of other BMS such as Anaesthetists’ Non-Technical Skills (ANTS)11 (all skill categories >95%) and Scrub Practitioners’ List of Intraoperative Non-Technical Skills40 (all skill categories 99%–100%). However, these BMS demonstrated slightly higher observability at the skill element level (range 66%–100% across both tools). Only two skill elements in the Medi-StuNTS system had observability below the acceptable level of 50% (demonstrating active followership, and teamwork and communication within the context of escalating care), and one further skill element had borderline observability of 51% (speaking up). These are three clearly linked skill elements concerned with exhibiting behaviours related to supporting, prompting and challenging the leader.15 It may be that the participants were sometimes performing in a leadership role as opposed to a followership role, rendering these skill elements unobservable for some scenarios. We suggest that the content of the BMS may need further review in order to encompass a skill element reflecting leadership.

The factor analysis indicated that the majority of skill elements related closely to other elements in the same skill category. The exception was patient involvement in the teamwork and communication category, which had a much lower factor loading of 0.33. This indicates that it may belong to a different category from the one in which it currently sits alongside the skill elements of establishing a shared mental model and demonstrating active followership. Students may be good at communicating with the patient, but this does not necessarily translate to good communication with the healthcare team; these may in fact be two different skills. This concept has been recognised previously in a study of the preparedness of medical graduates for working as a doctor, which highlighted that communication with patients or relatives presents subtly different challenges compared with communicating with colleagues.41

A disagreement of greater than 1 point (on a five-point scale) existed in less than one-fifth of cases. Of these, the skill categories of escalating care, decision making and prioritisation, and situation awareness accounted for a larger proportion of the disagreements than teamwork and communication. We did not analyse the free text comments written by the raters, nor did we track their thought processes while marking the videos; it is therefore difficult to comment definitively on why these differences existed. It could be that the raters were more familiar with some NTS than others through their clinical or educational endeavours and therefore showed more congruence on those categories; for example, teamwork and communication are heavily emphasised from an early stage of clinical training, so both raters were likely to be very familiar with this skill. Their level of clinical experience may also have had an impact: a doctor with 10 years of clinical experience is likely to have a different outlook on appropriate care escalation compared with a more junior doctor, and this category demonstrated the largest difference between the two raters. These results raise interesting questions and opportunities for further research and could be combined with the further inter-rater reliability studies mentioned previously. Despite being an important element of the contemporary validity framework, response process evidence is rarely reported, with only 6% of simulation-based assessments doing so in Cook et al’s systematic review.27 This means there are limited accepted methods of collecting evidence for response process and few other findings with which to compare our results. Our results therefore present a useful starting point with which future studies can be compared.

In this paper, we explored differential item functioning as a method of gathering consequences evidence. We expected that within each group, participants would perform equally well in each skill category, but this was not the case, as shown in figures 2 and 3. In another study that used this method to examine communication skills using a performance instrument,42 doctors outperformed medical students on only one subscale of the tool. The authors suggested that this could be due to issues with the scale (such as it not being sensitive enough to detect differences in some areas) or with the raters (such as inadequate training or inability to discriminate specific skills). Similar explanations may underlie the inconsistencies in the skill category scores across our three groups. However, these inconsistencies could be a true reflection of differing abilities between the groups; these data therefore also provide an opportunity to explore how clinical experience influences NTS acquisition. Situation awareness and decision making and prioritisation are inherently linked cognitive skills.1 These were the two categories in which the expert group performed best when assessed using the Medi-StuNTS system; however, as shown in figure 2, they were among the weakest areas for both medical student groups. Interestingly, this trend has been highlighted in other expert–novice comparisons. Jirativanont et al 33 found that situation awareness was the NTS category performed best using the ANTS system for PGY 3 doctors, but not for PGY 1 or 2 doctors. Similarly, correlation of situation awareness and decision-making skills with years of clinical experience has been noted using Non-Technical Skills for Surgeons, where other categories, including communication and leadership skills, did not appear to be as linked to length of clinical training.43 In our data, scores in situation awareness and decision making and prioritisation demonstrated the biggest difference of all categories between the intermediate and expert groups (mean differences of 1.45 and 1.33, respectively). It therefore appears that these skills may be acquired to a greater extent with clinical experience compared with other NTS categories.

It is likely that this pattern of acquisition of cognitive skills is related to the interaction between these skills and working memory. Situation awareness has been described as comprising the perception and comprehension of information, and the use of this information to predict future states.44 It is heavily dependent on working memory, a limited resource.45 Endsley46 suggested that because novices have not yet seen many medical conditions, they need to analyse each piece of information to inform their understanding of the situation, and they may not understand the significance of certain pieces of information. The impact of this is that working memory becomes overloaded, situation awareness is reduced, and inefficient or frankly incorrect decision making follows. Clinical experience allows the building of a ‘bank’ of patterns and presentations which can be instantly recognised, reducing working memory expenditure and allowing ‘primed decision making’47 and improved situation awareness. Our data demonstrated a larger difference in performance in both situation awareness and decision making and prioritisation between the intermediate and expert groups compared with that between the two medical student groups (mean differences for situation awareness of 1.45 vs 0.68; mean differences for decision making and prioritisation of 1.33 vs 0.66). This may be explained by the fact that these ‘banks’ only really start to be built when working full time with patients and being exposed to the numerous and varied clinical presentations that are not seen during medical school. In addition, doctors are more likely than medical students to have participated in training focussing on situation awareness.1 They may also have attended more simulation-based education, which lends itself to practical training in situation awareness and decision making and prioritisation through exposure to scenarios that build the ‘bank’, incorporating reflective practice.22 46

It is therefore understandable that situation awareness and decision making and prioritisation improve with real clinical experience, but this improvement appeared to be disproportionately larger than the improvement in other NTS. Teamwork and communication was the highest-performing skill category overall (mean score 2.34) and was the category in which both groups of medical students scored highest (novice mean score 3.08, intermediate mean score 2.4). A similar pattern was seen using the Ottawa Crisis Resource Management Global Rating Scale,34 where communication was the highest-scoring NTS in two simulation scenarios for both PGY 1 and PGY 3 doctors. The explanation for such good performance even at an early stage may lie in medical student training. Medical school curricula appear to focus specifically on communication skills, and the General Medical Council (the public body that maintains the medical register, regulates and sets standards for medical education, and is responsible for revalidation of doctors in the UK) Outcomes for Graduates document outlines multiple criteria for qualifying medical students, including communication, interpersonal skills and teamworking.48 Although these skills are clearly important, it may be that their emphasis comes at the cost of development of other NTS during medical school. Self-awareness was the overall lowest-scoring NTS in our data (mean score 2.69), which may reflect that this is an underemphasised skill set in both medical students and qualified doctors.

The Medi-StuNTS system is the first BMS designed specifically for assessing the NTS of medical students15 and is now, to the best of our knowledge, the first tool for the assessment of NTS during simulation for medical students with validity evidence.9 It is essential that a tool for teaching and assessing NTS in this group exists: NTS are critical for patient safety,1 49 and this group will soon be qualified doctors responsible for patient care. By reporting validity evidence on each aspect of the contemporary validity framework,26 these results allow us to confidently begin using the Medi-StuNTS system within medical student education. However, Cheng et al 30 highlighted that “validation is a journey, not a destination”. We must bear this in mind when interpreting the results of this study; in addition to providing evidence of validity for the Medi-StuNTS system, we have also highlighted opportunities for further research.

The patterns of NTS acquisition with clinical experience shown by using the Medi-StuNTS system add to the current literature by focussing on the performance of medical students, a group for whom little data are currently available. This should embolden those teaching NTS to medical students to consider the emphasis given to each skill. If we are to improve patient safety through training in NTS, there should be a focus on a wider range of NTS as part of primary medical training, to prevent certain topics becoming sidelined and undervalued.49 Simulation-based education is an effective training modality for this purpose, as it has been shown to improve performance in several NTS categories in medical students.50

Strengths and limitations

There were several strengths to this study. A power calculation was undertaken to provide an adequate sample size. In addition, the participants were recruited from multiple institutions across Scotland and were therefore likely to represent a range of experience and competence, increasing the generalisability of the results. We took several steps to ensure blinding of the video raters to the group of each participant; however, limits in clinical knowledge shown by participants in the videos may have given some indication of their stage of training. It should be recognised that this study was designed and conducted within the context of simulation and not within real clinical practice. Although some interesting concepts arose from the data regarding acquisition of different NTS with clinical experience, these were based on observed trends as opposed to statistical analysis (as the study was not powered for this); therefore, no statistically significant conclusions can be drawn in this particular regard. This study addressed several facets of validity; however, since some forms of validity evidence (eg, response process and consequences) have not often been sought for other BMS, it was difficult to compare the results with the existing literature and to draw a firm conclusion on whether they attained sufficient levels of acceptability.

Future work

Future work will involve embedding the Medi-StuNTS system in medical student simulation curricula and seeking further evidence to support its validity and wider utility.23 A separate qualitative study has investigated the educational impact of the Medi-StuNTS system,51 and further studies regarding consequences evidence, such as pass marks, would be valuable. In addition, a larger study to statistically analyse the differences between acquisition of different NTS would be of interest.

Conclusion

We have demonstrated evidence for several aspects of validity of the Medi-StuNTS system when assessing medical students’ NTS during immersive simulation. We can now begin to introduce this system into simulation-based education to maximise NTS training in this group. Trends in NTS categories indicate that we should consider increasing emphasis on NTS that are currently under-represented in medical student curricula.

Acknowledgments

We thank Sarah Lavery and Jamie Dickson for their assistance in running the simulation scenarios, and Michael Moneypenny and Tanya Somerville for their support and for allowing us to use the Scottish Centre for Simulation and Clinical Human Factors for this research. Thank you to all participants without whom the study would not have been possible.

Footnotes

Twitter: @emma_c_phillips

Contributors: ECP, SES and VRT: conception and design of the work; data collection, analysis and interpretation; drafting of the paper; revisions to the paper; and approval of the final version of the manuscript for submission. BC, ALH and JK: conception and design of the work, data collection, drafting of the paper and approval of the final version of the manuscript for submission. JH: data collection, drafting of the paper and approval of the final version of the manuscript for submission. All authors agreed to be accountable for all aspects of the work.

Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests: None declared.

Ethics approval: This study obtained ethical approval from the University of Edinburgh College of Medicine and Veterinary Medicine Student Ethics Committee (approval number 2017/11), with NHS Research Ethics Committee review waived by NHS Forth Valley Research and Development department.

Provenance and peer review: Not commissioned; externally peer reviewed.

Data availability statement: Data are available upon reasonable request.

References

  • 1. Flin R, O’Connor P, Crichton M. Safety at the sharp end: a guide to non-technical skills. Ashgate Publishing Ltd, 2013.
  • 2. McCulloch P, Mishra A, Handa A, et al. The effects of aviation-style non-technical skills training on technical performance and outcome in the operating theatre. Qual Saf Health Care 2009;18:109–15. doi:10.1136/qshc.2008.032045
  • 3. Marshall DA, Manus DA. A team training program using human factors to enhance patient safety. AORN J 2007;86:994–1011. doi:10.1016/j.aorn.2007.11.026
  • 4. Mishra A, Catchpole K, Dale T, et al. The influence of non-technical performance on technical outcome in laparoscopic cholecystectomy. Surg Endosc 2008;22:68–73. doi:10.1007/s00464-007-9346-1
  • 5. Murphy P, Nestel D, Gormley GJ. Words matter: towards a new lexicon for ‘nontechnical skills’ training. Adv Simul 2019;4:8. doi:10.1186/s41077-019-0098-5
  • 6. Nestel D, Walker K, Simon R, et al. Nontechnical skills: an inaccurate and unhelpful descriptor? Simul Healthc 2011;6:2–3. doi:10.1097/SIH.0b013e3182069587
  • 7. Rosen MA, Weaver SJ, Lazzara EH, et al. Tools for evaluating team performance in simulation-based training. J Emerg Trauma Shock 2010;3:353–9. doi:10.4103/0974-2700.70746
  • 8. Flin R, Martin L. Behavioral markers for crew resource management: a review of current practice. Int J Aviat Psychol 2001;11:95–118. doi:10.1207/S15327108IJAP1101_6
  • 9. Gordon M, Farnan J, Grafton-Clarke C, et al. Non-technical skills assessments in undergraduate medical education: a focused BEME systematic review: BEME Guide No. 54. Med Teach 2019;41:732–45. doi:10.1080/0142159X.2018.1562166
  • 10. Dietz AS, Pronovost PJ, Benson KN, et al. A systematic review of behavioural marker systems in healthcare: what do we know about their attributes, validity and application? BMJ Qual Saf 2014;23:1031–9. doi:10.1136/bmjqs-2013-002457
  • 11. Fletcher G, Flin R, McGeorge P, et al. Anaesthetists’ non-technical skills (ANTS): evaluation of a behavioural marker system. Br J Anaesth 2003;90:580–8. doi:10.1093/bja/aeg112
  • 12. Rutherford JS, Flin R, Irwin A, et al. Evaluation of the prototype Anaesthetic Non-technical Skills for Anaesthetic Practitioners (ANTS-AP) system: a behavioural rating system to assess the non-technical skills used by staff assisting the anaesthetist. Anaesthesia 2015;70:907–14. doi:10.1111/anae.13127
  • 13. Flowerdew L, Brown R, Vincent C, et al. Development and validation of a tool to assess emergency physicians’ nontechnical skills. Ann Emerg Med 2012;59:376–85. doi:10.1016/j.annemergmed.2011.11.022
  • 14. Mellanby EA, Hume M, Glavin R. Development of a behavioural marker system for the non-technical skills of junior doctors in acute care, 2013. Available: http://www.docs.hss.ed.ac.uk/iad/Learning_teaching/Academic_teaching/PTAS/Outputs/Mellanby_Jan2012award_PTAS_Final_Report.pdf
  • 15. Hamilton AL, Kerins J, Maccrossan MA, et al. Medical Students’ Non-Technical Skills (Medi-StuNTS): preliminary work developing a behavioural marker system for the non-technical skills of medical students in acute care. BMJ Simul Technol Enhanc Learn 2018;5. doi:10.1136/bmjstel-2018-000310
  • 16. Yule S, Flin R, Paterson-Brown S, et al. Development of a rating system for surgeons’ non-technical skills. Med Educ 2006;40:1098–104. doi:10.1111/j.1365-2929.2006.02610.x
  • 17. Undre S, Healey A, Sevdalis N, et al. The Observational Teamwork Assessment for Surgery (OTAS): development, feasibility and reliability. Proc Hum Factors Ergon Soc Annu Meet 2007;51:673–7. doi:10.1177/154193120705101115
  • 18. Mishra A, Catchpole K, McCulloch P. The Oxford NOTECHS system: reliability and validity of a tool for measuring teamwork behaviour in the operating theatre. Qual Saf Health Care 2009;18:104–8. doi:10.1136/qshc.2007.024760
  • 19. Andersen PO, Jensen MK, Lippert A, et al. Development of a formative assessment tool for measurement of performance in multi-professional resuscitation teams. Resuscitation 2010;81:703–11. doi:10.1016/j.resuscitation.2010.01.034
  • 20. Mitchell L, Flin R, Yule S, et al. Development of a behavioural marker system for scrub practitioners’ non-technical skills (SPLINTS system). J Eval Clin Pract 2013;19:317–23. doi:10.1111/j.1365-2753.2012.01825.x
  • 21. Flin R, Martin L, Goeters K-M, et al. Development of the NOTECHS (non-technical skills) system for assessing pilots’ CRM skills. Available: https://www.abdn.ac.uk/iprc/documents/NOTECHSHFASproofcopy.pdf
  • 22. Maran N, Edgar S, May A. The non-technical skills. In: Essential simulation in clinical education. John Wiley & Sons, Ltd, 2013: 131–45.
  • 23. Van Der Vleuten CPM. The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ Theory Pract 1996;1:41–67. doi:10.1007/BF00596229
  • 24. Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychol Bull 1955;52:281–302. doi:10.1037/h0040957
  • 25. Messick S. Validity. In: Linn R, ed. Educational measurement. 3rd ed. New York: American Council on Education and Macmillan, 1989: 13–103.
  • 26. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for educational and psychological testing. Washington, DC: American Educational Research Association, 2014.
  • 27. Cook DA, Zendejas B, Hamstra SJ, et al. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Adv Health Sci Educ Theory Pract 2014;19:233–50. doi:10.1007/s10459-013-9458-4
  • 28. Higham H, Greig PR, Rutherford J, et al. Observer-based tools for non-technical skills assessment in simulated and real clinical environments in healthcare: a systematic review. BMJ Qual Saf 2019;28:672–86. doi:10.1136/bmjqs-2018-008565
  • 29. Cooper S, Endacott R, Cant R. Measuring non-technical skills in medical emergency care: a review of assessment measures. Open Access Emerg Med 2010;2:7–16. doi:10.2147/OAEM.S6693
  • 30. Cheng A, Nadkarni VM, Mancini MB, et al. Resuscitation education science: educational strategies to improve outcomes from cardiac arrest: a scientific statement from the American Heart Association. Circulation 2018;138:e82–122. doi:10.1161/CIR.0000000000000583
  • 31. Wood TC, Raison N, Haldar S, et al. Training tools for nontechnical skills for surgeons: a systematic review. J Surg Educ 2017;74:548–78. doi:10.1016/j.jsurg.2016.11.017
  • 32. Royal K. “Face validity” is not a legitimate type of validity evidence! Am J Surg 2016;212:1026–7. doi:10.1016/j.amjsurg.2016.02.018
  • 33. Jirativanont T, Raksamani K, Aroonpruksakul N, et al. Validity evidence of non-technical skills assessment instruments in simulated anaesthesia crisis management. Anaesth Intensive Care 2017;45:469–75. doi:10.1177/0310057X1704500410
  • 34. Kim J, Neilipovitz D, Cardinal P, et al. A pilot study using high-fidelity simulation to formally evaluate performance in the resuscitation of critically ill patients: the University of Ottawa critical care medicine, high-fidelity simulation, and crisis resource management I study. Crit Care Med 2006;34:2167–74. doi:10.1097/01.CCM.0000229877.45125.CC
  • 35. Reid J, Stone K, Brown J, et al. The Simulation Team Assessment Tool (STAT): development, reliability and validation. Resuscitation 2012;83:879–86. doi:10.1016/j.resuscitation.2011.12.012
  • 36. Abell N, Springer DW, et al. Developing and validating rapid assessment instruments. Oxford University Press, 2009.
  • 37. Tavakol M, Dennick R. Making sense of Cronbach’s alpha. Int J Med Educ 2011;2:53–5. doi:10.5116/ijme.4dfb.8dfd
  • 38. Yong AG, Pearce S. A beginner’s guide to factor analysis: focusing on exploratory factor analysis. Tutor Quant Methods Psychol 2013;9:79–94. doi:10.20982/tqmp.09.2.p079
  • 39. Riem N, Boet S, Bould MD, et al. Do technical skills correlate with non-technical skills in crisis resource management: a simulation study. Br J Anaesth 2012;109:723–8. doi:10.1093/bja/aes256
  • 40. Mitchell L, Flin R, Yule S, et al. Evaluation of the Scrub Practitioners’ List of Intraoperative Non-Technical Skills (SPLINTS) system. Int J Nurs Stud 2012;49:201–11. doi:10.1016/j.ijnurstu.2011.08.012
  • 41. Tallentire VR, Smith SE, Wylde K, et al. Are medical graduates ready to face the challenges of Foundation training? Postgrad Med J 2011;87:590–5. doi:10.1136/pgmj.2010.115659
  • 42. LeBlanc VR, Tabak D, Kneebone R, et al. Psychometric properties of an integrated assessment of technical and communication skills. Am J Surg 2009;197:96–101. doi:10.1016/j.amjsurg.2008.08.011
  • 43. Beard JD, Marriott J, Purdie H, et al. Assessing the surgical skills of trainees in the operating theatre: a prospective observational study of the methodology. Health Technol Assess 2011;15:i–xxi, 1–162. doi:10.3310/hta15010
  • 44. Endsley MR. Toward a theory of situation awareness in dynamic systems. Hum Factors 1995;37:32–64. doi:10.1518/001872095779049543
  • 45. Miller GA. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 1956;63:81–97. doi:10.1037/h0043158
  • 46. Endsley MR. Situation awareness. In: The Oxford handbook of cognitive engineering, 2013. Available: http://oxfordhandbooks.com/view/
  • 47. Klein GA, Calderwood R. Rapid decision making on the fire ground. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 1988.
  • 48. General Medical Council. Outcomes for graduates, 2018.
  • 49. Flin R, Patey R. Improving patient safety through training in non-technical skills. BMJ 2009;339:b3595. doi:10.1136/bmj.b3595
  • 50. Hagemann V, Herbstreit F, Kehren C, et al. Does teaching non-technical skills to medical students improve those skills and simulated patient outcome? Int J Med Educ 2017;8:101–13. doi:10.5116/ijme.58c1.9f0d
  • 51. Kerins J, Smith SE, Phillips EC, et al. Exploring transformative learning when developing medical students’ non-technical skills. Med Educ 2020;54:264–74. doi:10.1111/medu.14062
