BJA Education. 2021 Jan 14;21(4):148–153. doi: 10.1016/j.bjae.2020.12.002

The role of simulation in high-stakes assessment

J Dupre, VN Naik
PMCID: PMC7984967  PMID: 33777413

Learning objectives.

By reading this article, you should be able to:

  • Describe the breadth of simulation modalities available for teaching and assessment.

  • Discuss evaluative approaches and frameworks for different assessment strategies.

  • Explain the role of simulation in a programme of high-stakes assessment, including certification examinations and workplace-based assessment.

Key points.

  • Simulation encompasses a wide range of different modalities from standardised patients to computer-enhanced mannequins.

  • Assessing competencies with simulation is complex, as no modality can completely duplicate clinical performance.

  • Validity, reliability, acceptability, educational impact and cost-effectiveness are important measures of an assessment strategy.

  • Simulation may not be a valid or cost-effective assessment strategy for a certification examination.

  • Simulation may have a role in higher-stakes assessment within a programme of assessment, for core competencies that are rarely demonstrable in the clinical environment or where management cannot be delegated to a trainee for assessment.

Breadth of simulation

High-stakes assessments guide selection, progression and certification decisions. Before considering the role of simulation in high-stakes assessment, we must explore what is meant by the term, as it encompasses a wide range of approaches, from ‘standardised patients’ (SPs; actors), which may use no technology at all, to high-technology computer-enhanced mannequins. This spectrum is continually evolving, with the high-technology end limited only by engineering advances. For the purposes of this brief review, simulation modalities can be grouped into four broad categories: SPs, part-task trainers (PTTs), virtual reality (VR) or computer-based simulations, and computer-enhanced mannequins. Hybrid simulations combine offerings from two or more of these categories to enhance the objectives of the simulation.

Standardised patients are actors trained to play the role of a patient; they were first introduced into medical education in the 1960s.1 Standardised patients can play roles as colleagues seeking assistance with a task, or as patients in preoperative consultation or postoperative care. As a ‘body’ SP, the actor's role is to allow assessment of a candidate's physical examination skills. Standardised patients enhance the validity and reliability of an assessment compared with real patients by reproducing the same history, emotional tone, physical examination findings and communication style, without the potential fatigue and stress experienced by a real patient.2 They can also be used in training to broaden the spectrum of cases or presentations that trainees experience, and can be trained to provide valuable feedback to learners.

Part-task trainers and VR trainers encompass a wide range of technology and fidelity. They can simulate either an anatomical region or a specific procedure, and can be characterised by the following: anatomical and non-anatomical models; simulation of basic psychomotor tasks, individual surgical tasks or whole procedures; presence or absence of haptic (force) feedback; type of material used; and VR or augmented reality.3 Virtual simulations can be either static or dynamic, based on the feedback and responses they provide to the learner. Historically, computer simulation in medical education was static and used mainly to teach students to ask relevant questions and order the correct diagnostic tests.4 However, the virtual learning environment has evolved to include dynamic simulations that can reproduce physiological events, such as respiration, bleeding and patients' discomfort, forms of feedback that are absent from mechanical PTTs. More recently, virtual patients (VPs) have evolved to provide an interactive learning environment. Virtual patients can use computer technology that simulates reality via a VR helmet and VR glove, with interaction through voice recognition.5,6 Virtual patients have two advantages over SPs: (i) they are more reproducible; and (ii) for training purposes, they allow unscheduled, repetitive practice.7

Computer-enhanced mannequins are full-body mannequins driven by computers. Like PTTs, these simulators replicate human anatomy with a high degree of likeness. They differ from PTTs in that, through the computer interface, they can mimic a variety of medical conditions and respond to medical interventions, including drug administration and a variety of procedures. These responses can be preprogrammed or driven by an operator using the computer interface. They can be used to assess both individuals and healthcare teams.

Key elements in the evaluation of assessment strategies

Although the simulation modalities described are valuable educational tools designed to provide learning opportunities, the benefits and weaknesses of integrating any form of simulation into assessment must be carefully considered. A framework can help evaluate the aspects of any assessment modality and determine an optimal blend of testing modalities that relates back to the overall purpose of the assessment.

Using simulation to assess clinical competence is complex and inferential, given that any simulation only approximates, rather than identically replicates, the clinical environment. Validity describes the confidence with which evidence from the modality supports or refutes inferences about performance in the real world. Reliability indicates the consistency with which similar performances under consistent conditions produce similar results, although modern assessment theories treat reliability as a facet of validity (a simple quantitative illustration follows Fig. 1). Tavares and colleagues and Cook and Hatala applied modern validity theory to simulation-based assessments across four key inferences: (i) the comprehensiveness and reliability of the scoring tools; (ii) the generalisability of the findings, which is enhanced by robust sampling of different clinical scenarios; (iii) the extrapolation to, or correlation with, performance in the real world, if available; and (iv) the implications for overall safety and quality in the healthcare system (Fig. 1).8,9

Fig 1. Key inferences in validation frameworks.
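As a brief quantitative illustration of reliability, classical test theory offers a standard psychometric sketch (included here as background; it is not part of the validation frameworks cited above). An observed score X is modelled as a true score T plus random measurement error E:

X = T + E,  with reliability = σ²(T) / [σ²(T) + σ²(E)]

Reliability is thus the proportion of observed-score variance attributable to true differences between candidates rather than to measurement noise; a reliability of 0.8 implies that 80% of the variation in scores reflects genuine differences in ability.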

Other important issues for consideration include the financial cost and ease of implementation of simulation-based assessments, and their potential effect on learning.10 Van Der Vleuten describes the educational impact of an assessment as its power to drive learning, because people strategically ‘study to the test’.11 He also noted the importance of the acceptability of an assessment to both faculty and learner.11 According to Van Der Vleuten, every form of assessment has inherent strengths and weaknesses, trade-offs are inevitable, and there is no perfect assessment approach. For example, although a multiple-choice examination may be a highly reliable and cost-effective way to assess competencies in managing comorbidities when planning anaesthesia, and a driver for learning the knowledge necessary to be an anaesthetist, it is a poor predictor of performance in crisis resource management. Ideally, the purpose of the assessment should be identified first and then matched to an assessment approach that maximises the elements most important for that purpose.

High-stakes assessments and certification examinations

Assessment has historically been described as a dichotomy: formative or summative. Formative assessments provide a learner with feedback on a performance. This allows learners to review and reflect on their existing knowledge and understanding, and to enhance their performance towards a new learning outcome that is then assessed again for the purposes of feedback.12 Summative assessments are a final evaluation at the end of an instructional unit, often compared against a benchmark or criterion. This dichotomous approach may be outdated in a culture of lifelong learning and continuous professional development. More practically, assessments can be viewed on a continuum between low stakes and high stakes. Lower-stakes assessments typically represent a single data point focused on feedback for a particular task, not a progression decision. Higher-stakes assessments are oriented towards selection, progression or certification, and require multiple data points sampling as much of the breadth of the discipline as possible, for generalisability and enhanced validity of the decision. The primary concerns for a high-stakes certification assessment are the validity of the examination, the reproducibility of its results, and the feasibility and cost-effectiveness of the assessment process. Ideally, a high-quality programme of assessment for lifelong learning includes an appropriate blend of lower- and higher-stakes assessments, with enough sampling to be confident in extrapolating the assessed competencies to performance in real life.

Arguably, a national certification examination (such as the Fellowship of the Royal College of Anaesthetists in the UK, the Fellowship of the Royal College of Physicians of Canada, or board certification in the USA) is one of the highest-stakes assessments; its role is to confirm that a candidate has achieved the competencies for independent practice against a national standard. The content of the examination is blueprinted against the desired competencies, but, as with any test, it must be limited in duration. As such, any examination or other form of test can only sample the breadth of all possible competencies. Great consideration and effort go into ensuring that performance on a particular test form can be generalised to any test form, and that test results can be extrapolated to performance in the real world. The culture of high-stakes certification assessment is integrated and normalised at all levels of education, and is therefore important in maintaining public trust, particularly in healthcare.

Standardised patients for high-stakes certification examinations

Roles such as a colleague behaving inappropriately in the workplace, or a patient or family member refusing treatment, are amongst the more complex interactions that SPs are frequently asked to perform for candidates at the end of their training. Applying the evaluative criteria noted earlier, these typical SP certification cases can display a high degree of validity, as they reasonably represent what a resident may encounter in a real-world setting. When used appropriately, SPs can improve the validity of an examination by allowing communication and collaboration competencies to be assessed in a realistic manner.

The reproducibility of assessments using SPs is generally good; ‘body’ SP physical examination interactions, such as examination of an airway or an approach to cardiac auscultation, are easiest to control, provided that the SP has no underlying pathology.13 This is not always the case when SPs must emulate more complex, multifaceted patients or colleagues. As SPs are required to embody a more complex set of characteristics and behaviours, such as a patient with complex chronic pain or one receiving bad news, their portrayals become more susceptible to variation. Additional factors, such as fatigue, can also affect the performance delivered during each interaction, particularly in contexts where it is essential to convey emotions such as anger or grief. Steps can be taken to limit these detrimental effects, but only to an extent.14 Furthermore, when multiple SPs must perform the same role simultaneously, as in a large, multicircuit objective structured clinical examination (OSCE), there is variability between SPs to consider in addition to the variability within individual SPs. As such, it is prudent to limit the use of SPs to situations where they are superior to other modalities in assessing a specific competency, such as physical examination or communication skills.15,16
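These sources of variability can be framed in generalisability-theory terms (a standard psychometric decomposition, offered here as an illustration rather than drawn from the cited studies). For a multicircuit OSCE station, the total variance in observed scores can be partitioned as

σ²(total) = σ²(candidate) + σ²(between SPs) + σ²(candidate × SP) + σ²(residual error)

Only the first component reflects true differences between candidates; the between-SP component, the candidate-by-SP interaction and residual error (including within-SP drift from fatigue or emotional demand) all erode reliability, which is why restricting SPs to roles they can portray consistently improves the defensibility of scores.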

Although SPs allow a robust assessment of learners' competencies in the areas noted, the context of their use is also important to consider. Including the assessment of these skills in a high-stakes summative certification examination may not be the most effective use of the available testing time. A candidate's true ability to execute a physical examination or communicate with a patient or colleague is almost certainly better assessed in the workplace during the training programme.

Virtual reality, PTTs and computer-enhanced mannequins for high-stakes certification examinations

As noted, these advanced forms of simulation allow learners to build their skills when clinical opportunities are limited by the patient population available to them. By presenting a safe and controllable learning environment, these technologies allow learners to develop skills in areas to which they may not be adequately exposed were they to rely on traditional patient–physician interactions.17 Specifically, they provide an opportunity to assess teamwork, leadership, situation awareness and decision-making through the management of a crisis.18 In addition, the lack of scheduling requirements, such as those that come with SPs, and the fact that procedures can be repeated time and again without harm to real patients, are highly desirable characteristics of these technologies.

Despite the benefits they offer learners in acquiring the skills required for certification, variability in the adoption of these technologies from centre to centre, and differences between the simulators on the market, present significant barriers to their use in high-stakes assessment. Including specialised simulators, VR or PTTs in high-stakes certification assessment would require that all test takers be familiar with these tools before the examination, including potential variation between manufacturers, and have had the opportunity to practice. Although these devices are designed to replicate a real patient, there is a learning curve to overcome in their use.19 Including these types of simulation without accounting for test takers' training in their use compromises the validity of the examination: the results may reflect previous exposure to a specific simulator or simulation experience rather than a material difference in the candidate's ability to perform the task on a real patient.

Shift to competency-based medical education

There has been a global shift to competency-based medical education. The foundation of this approach is the well-articulated knowledge, skills and attitudes required for independent practice in a discipline.20 These competencies are defined for faculty and learner, but, more importantly, their achievement is observed and measured within training programmes through workplace-based assessments. Coaching builds on these measured observations, providing learners with timely, meaningful feedback to guide their learning plans and development.21 The reliability of a judgement that a competency has been achieved correlates directly with the number of observations across as many different contexts as possible (illustrated quantitatively below), and further validates a programme's decision to graduate a trainee to practice. In an ideal circumstance, with opportunities for multiple observations of all trainees equally in many different contexts, and good faculty development for calibrated assessment, coaching and feedback, the role of confirmatory high-stakes certification examinations may be diminished. This represents a future goal; it reflects a fundamental shift in the culture of medical education and will take years or decades to penetrate all disciplines globally. Put another way, if a high-quality accredited training programme has made multiple high-quality observations and assessments against the knowledge, skills and attitude standards required to practice in the discipline longitudinally, should a single point-in-time high-stakes assessment be necessary to support, or worse, negate, its judgement of the trainee? The obvious chasm to bridge, if the medical education system is to be trusted without a high-stakes certification examination as a backstop, is to define what ‘high-quality’ measurement looks like in a workplace-based setting.
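The link between the number of observations and reliability can be made concrete with the Spearman–Brown prophecy formula, a standard psychometric result offered here as an illustration (it is not derived in the article itself). If a single workplace-based observation has reliability r_1, the reliability of a judgement based on the average of k comparable observations is

r_k = k r_1 / [1 + (k − 1) r_1]

For example, if one observed encounter has a modest reliability of 0.3, averaging nine independent observations raises the reliability of the overall judgement to (9 × 0.3) / (1 + 8 × 0.3) ≈ 0.79, illustrating why programmatic assessment relies on many data points rather than any single observation.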

Competency-based education depends on opportunities for workplace-based assessment. In the clinical context, these depend on the patients, pathologies and services present in the environment. The challenge for anaesthesia is that some clinical scenarios are rare yet represent critical competencies for practice in the discipline. Given their rarity, a trainee may never encounter these scenarios in the course of a finite training programme; and if one did present, intervention and management would likely pass from the trainee to more experienced senior faculty. In anaesthesia, examples include the management of malignant hyperthermia, intraoperative cardiac arrest and ‘cannot intubate/cannot ventilate’ situations. Importantly, achieving these competencies requires demonstrable technical skills (i.e. knowledge and procedural) and non-technical skills (e.g. situation awareness, teamwork, leadership and communication), and thus cannot be adequately assessed through traditional written and oral examinations.22 As such, there is a gap in the valid and reliable assessment of these key competencies for some disciplines.

Role of simulation in a programme of assessment

Simulation may provide an opportunity to address this gap. As mentioned previously, concerns about the reliability, acceptability, feasibility and risk associated with advanced forms of simulation make them unpalatable for a high-stakes certification examination. However, embedding technology-enhanced simulation alongside a workplace-based assessment strategy within a programme may be more feasible. Rare clinical scenarios containing key competencies that need to be assessed can be managed in simulation with the trainee as the most responsible physician.23 As a workplace-based assessment, the stakes would be lower. Trainees could either be required to demonstrate adequate management of the scenarios towards certification, or the data from management in the simulator could be added to other observations from the clinical environment for a more informed entrustment decision.24 If programmes preferred higher stakes for these simulation assessments, performances could be rated against an established rubric by unbiased faculty blinded to the trainee's performance in the clinical context. By embedding a moderate- to high-stakes simulation-based assessment at the programme level, we can improve the reliability of decisions within a programme of assessment by repeating performances as necessary, mitigate the logistic and feasibility risks of technological or equipment failure, and improve the acceptability of these technologies from the learner's perspective, as the stakes are lower than in a final certification examination. The investment would still be a barrier, but it is well balanced against the value of closing the gap in the assessment of key competencies.

Conclusions

Simulation may have an important role to play in a programme of assessment. It has traditionally served as a teaching tool to accelerate the learning curve and prepare the learner for experiences in the clinical environment. Its role in assessment has been reserved for a milestone capacity: a threshold at which learners ‘show’ that they have competencies in a simulated setting, indicating that they are ready to consolidate what they have learned into what they can ‘do’, and thereby demonstrate the professional authenticity of an expert with patients in the clinical environment (Fig. 2).25

Fig 2. Miller's prism of clinical competence (also known as Miller's pyramid).25 Adapted by Drs R. Mehay and R. Burns, UK (January 2009). MCQ, multiple-choice question; OSCE, objective structured clinical examination.

Any assessment in a programme should be scrutinised for its intended and unintended impact. Valid and reliable assessments are the goal of any programme, but we also need to understand the impact on education, the acceptance by faculty and learner, and the feasibility. Integrating simulation assessments requires resources, and that investment can grow exponentially when the stakes of the assessment are heightened and standardised for all graduates on a national scale. Choosing when to use simulation for higher-stakes assessment needs to be thoughtful and purposeful, to maximise its value in certification decisions, with good recognition of the risks and limitations. We suggest that these assessments be reserved for critical competencies that require demonstration of both technical and non-technical skills, and so cannot be assessed by traditional written and oral examinations; and that cannot be reliably assessed in the clinical environment, either because opportunities are rare or because management must be deferred to a more senior clinician. We also suggest that these simulation-based assessments may be best delivered within training programmes, to mitigate the risk of a process issue (technological or equipment) in administering the examination and to allow proper orientation, so that the assessment measures the underlying constructs rather than a learner's familiarity with the simulator. Programmes can consider whether to use such assessments as very-high-stakes evaluations for certification decisions, or as another data point contributing to those decisions.

Declaration of interests

The authors declare that they have no conflicts of interest.

MCQs

The associated MCQs (to support CME/CPD activity) will be accessible at www.bjaed.org/cme/home by subscribers to BJA Education.

Biographies

Viren Naik is the director of assessment for the Royal College of Physicians and Surgeons of Canada, a professor of anaesthesiology and the R.S. McLaughlin Professor of Medical Education at the University of Ottawa. He has more than 20 yrs of experience as an educator and oversees the credentialling of specialist training and assessment, including examinations, for all medical specialties in Canada.

Jonathan Dupre is the team lead for computer-based examinations in the Examination Unit at the Royal College of Physicians and Surgeons of Canada. He holds a Bachelor's degree in Biopharmaceutical Science from the University of Ottawa and a Master of Education degree in measurement from the University of Illinois at Chicago. His primary interests are in evaluation for high-stakes certification examinations.

Matrix codes: 1H02, 2H02, 3J02

References

  1. Barrows H.S. An overview of the uses of standardized patients for teaching and evaluating clinical skills. Acad Med. 1993;68:443–453. doi: 10.1097/00001888-199306000-00002.
  2. Howley L.D. Performance assessment in medical education: where we’ve been and where we’re going. Eval Health Prof. 2004;27:285–303. doi: 10.1177/0163278704267044.
  3. Botden S.M.B., Buzink S.N., Schijven M.P., Jakimowicz J.J. Augmented versus virtual reality laparoscopic simulation: what is the difference? World J Surg. 2007;31:764–772. doi: 10.1007/s00268-006-0724-y.
  4. Sijstermans R., Jaspers M.W.M., Bloemendaal P.M., Schoonderwaldt E.M. Training inter-physician communication using the dynamic patient simulator. Int J Med Inform. 2007;76:336–343. doi: 10.1016/j.ijmedinf.2007.01.007.
  5. Issenberg S.B., Scalese R.J. Simulation in health care education. Perspect Biol Med. 2008;51:31–46. doi: 10.1353/pbm.2008.0004.
  6. Stevens A., Hernandez J., Johnsen K. The use of virtual patients to teach medical students history taking and communication skills. Am J Surg. 2006;191:806–811. doi: 10.1016/j.amjsurg.2006.03.002.
  7. Deladisma A.M., Cohen M., Stevens A. Do medical students respond empathetically to a virtual patient? Am J Surg. 2007;193:756–760. doi: 10.1016/j.amjsurg.2007.01.021.
  8. Tavares W., Brydges R., Myre P. Applying Kane’s validity framework to a simulation based assessment of clinical competence. Adv Health Sci Educ Theory Pract. 2018;23:323–338. doi: 10.1007/s10459-017-9800-3.
  9. Cook D.A., Hatala R. Validation of educational assessments: a primer for simulation and beyond. Adv Simul. 2016;1:31. doi: 10.1186/s41077-016-0033-y.
  10. Norman G.R., Muzzin L.J., Williams R.G., Swanson D.B. Simulation in health sciences education. J Instr Dev. 1985;8:11–17.
  11. Van Der Vleuten C.P.M. The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ. 1996;1:41–67. doi: 10.1007/BF00596229.
  12. Kolb A., Kolb D. Learning styles and learning spaces: enhancing experiential learning in higher education. Acad Manag Learn Educ. 2005;4:193–212.
  13. Adamo G. Simulated and standardized patients in OSCEs: achievements and challenges 1992–2003. Med Teach. 2003;25:262–270. doi: 10.1080/0142159031000100300.
  14. Rosebraugh C.J., Speer A.J., Solomon D.J. Setting standards and defining quality of performance in the validation of a standardized-patient examination format. Acad Med. 1997;72:1012–1014. doi: 10.1097/00001888-199711000-00022.
  15. Baig L.A., Beran T.N., Vallevand A., Baig Z.A., Monroy-Coadros M. Accuracy of portrayal by standardized patients: results from four OSCE stations conducted for high-stakes examinations. BMC Med Educ. 2014;14:97. doi: 10.1186/1472-6920-14-97.
  16. Erby L.A., Roter D.L., Biesecker B.B. Examination of standardized patient performance: accuracy and consistency of six standardized patients over time. Patient Educ Couns. 2011;85:194–200. doi: 10.1016/j.pec.2010.10.005.
  17. Seymour N.E., Gallagher A.G., Roman S.A. Virtual reality training improves operating room performance: results of a randomized, double-blinded study. Ann Surg. 2002;236:458–463. doi: 10.1097/00000658-200210000-00008.
  18. Yee B., Naik V., Joo H. Nontechnical skills in anesthesia crisis management with repeated exposure to simulation-based education. Anesthesiology. 2005;103:241–248. doi: 10.1097/00000542-200508000-00006.
  19. Jiang G., Chen H., Wang S. Learning curves and long-term outcome of simulation-based thoracentesis training for medical students. BMC Med Educ. 2011;11:39. doi: 10.1186/1472-6920-11-39.
  20. Frank J.R., Snell L.S., Ten Cate O. Competency-based medical education: theory to practice. Med Teach. 2010;32:638–645. doi: 10.3109/0142159X.2010.501190.
  21. Bok H.G.J., Jaarsma D.A.D.C., Spruijt A. Feedback-giving behaviour in performance evaluations during clinical clerkships. Med Teach. 2015;38:88–95. doi: 10.3109/0142159X.2015.1017448.
  22. Fletcher G., Flin R., McGeorge P., Glavin R., Maran N., Patey R. Anaesthetists’ non-technical skills (ANTS): evaluation of a behavioural marker system. Br J Anaesth. 2003;90:580–588. doi: 10.1093/bja/aeg112.
  23. Ahlberg G., Enochsson L., Gallagher A.G. Proficiency-based virtual reality training significantly reduces the error rate for residents during their first 10 laparoscopic cholecystectomies. Am J Surg. 2007;193:797–804. doi: 10.1016/j.amjsurg.2006.06.050.
  24. Chiu M., Tarshis J., Antoniou A. Simulation-based assessment of anesthesiology residents’ competence: development and implementation of the Canadian National Anesthesiology Simulation Curriculum (CanNASC). Surv Anesthesiology. 2017;61:32. doi: 10.1007/s12630-016-0733-8.
  25. Miller G.E. The assessment of clinical skills/competence/performance. Acad Med. 1990;65:S63–S67. doi: 10.1097/00001888-199009000-00045.
