BMJ Simulation & Technology Enhanced Learning
2018 Mar 23;4(2):59–64. doi: 10.1136/bmjstel-2016-000192

Assessment tool for the instructional design of simulation-based team training courses: the ID-SIM

Annemarie F Fransen 1,2, M Beatrijs van der Hout-van der Jagt 1,3, Roxane Gardner 4, Manuela Capelle 5, Sebastiaan P Oei 6, Pieter J van Runnard Heimel 1, S Guid Oei 1,7
PMCID: PMC8990196  PMID: 35515891

Abstract

Introduction

To achieve expert performance of care teams, adequate simulation-based team training courses with an effective instructional design are essential. As the importance of the instructional design becomes ever clearer, an objective assessment tool would be valuable for educators and researchers. We therefore aimed to develop an evidence-based and objective assessment tool for the evaluation of the instructional design of simulation-based team training courses.

Methods

We conducted a validation study in which we developed an assessment tool comprising an evidence-based questionnaire with Visual Analogue Scales (VAS) and a visual chart that directly translates the questionnaire results. Psychometric properties of the assessment tool were tested using five descriptions of simulation-based team training courses. An expert-opinion-based ranking from poor to excellent was obtained. Ten independent raters assessed the five training courses twice with the developed questionnaire, with an interval of 2 weeks. Validity and reliability analyses were performed using the scores from the raters and comparing them with the expert’s ranking. Usability was assessed by an 11-item survey.

Results

A 42-item questionnaire, using VAS, and a propeller chart were developed. The correlation between the expert-opinion-based ranking and the evaluators’ scores (Spearman correlation) was 0.95, and the variance due to subjectivity of raters was 3.5% (VTraining*Rater). The G-coefficient was 0.96. The inter-rater reliability (intraclass correlation coefficient (ICC)) was 0.91 (95% CI 0.77 to 0.99), and the intra-rater reliability for the overall score (ICC) ranged from 0.91 to 0.99.

Conclusions

We developed an evidence-based and reliable assessment tool for the evaluation of the instructional design of a simulation-based team training: the ID-SIM. The ID-SIM is available as a free mobile application.

Keywords: simulation, team training, instructional design

Introduction

Simulation-based team training courses are recommended to improve teamwork skills of healthcare teams. Since a lack of these skills results in preventable harm to patients, the effectiveness of these interventions should be guaranteed. A properly conducted simulation-based team training course should ensure that learning activities are ‘predictable, consistent, standardised, safe and reproducible’.1 To achieve an effective training course, a robust instructional design of the training course is required.2 Instructional design is generally referred to as the ‘set of prescriptions for teaching methods to improve the quality of instruction with a goal of optimizing learning outcomes.’3 It is an integrated set of elements that interact with each other, composing a system of procedures for developing training curricula in a consistent and reliable manner.4

The instructional design of simulation-based training courses is of great importance, as the effectiveness of these courses relies on whether the course is adequately designed and whether it provides opportunities for deliberate practice.2 To gain more insight into the value of different instructional design features, comparisons between simulation-based team training courses are recommended.5 6 In order to perform these comparisons, Eppich et al emphasised that standardised reporting of the applied instructional design is required.6 However, an evidence-based and objective assessment tool that enables such standardised reporting has not yet been developed.

Although assessment tools are still lacking, essential instructional design features of simulation-based education have been described by Issenberg et al and McGaghie et al.2 7 Issenberg et al conducted the first and, thus far, only systematic review in which the literature was translated into 10 important design features: feedback, repetitive practice, a range of difficulty levels, defined outcomes, individualised learning, curriculum integration, multiple learning strategies, clinical variation, controlled environment and simulator validity.7 Cook et al confirmed the effectiveness of several of Issenberg’s instructional design features.5 Interestingly, the first five components selected by Issenberg et al correspond to Ericsson’s educational theory of deliberate practice, in which he demonstrated that achievement of expert performance is largely determined by the instructional design quality of a training course.8 9 In addition, incorporating deliberate practice in simulation-based education leads to more effective educational interventions than traditional medical education methods.2 The features defined in the reviews by Issenberg et al and McGaghie et al are also incorporated in two guidelines for designing an effective simulation-based training by the Association for Medical Education in Europe (AMEE).10 11 For these reasons, it is impossible to imagine simulation-based team training without the knowledge derived from these two reviews.

The next step towards designing effective training courses is putting the contemporary literature into practice by developing an objective assessment tool. Such a tool increases the consistency and standardisation of training courses, which enables reliable comparisons between training designs. These comparisons will advance our knowledge about the importance of each instructional design feature. This will support the design of effective training courses, which in turn will make a real difference for patient care. Hence, the primary goal of this study was to develop an evidence-based, objective and easy-to-use assessment tool to evaluate the quality of the instructional design of simulation-based team training courses: the ID-SIM, available as a mobile application. ID-SIM stands for ‘Instructional Design of a Simulation Improved by Monitoring’.

Methods

Tool development

The 10 instructional design features defined by Issenberg et al in a BEME (Best Evidence in Medical Education) systematic review were used to evaluate the instructional design quality of team training courses.7 Five of the 10 features correspond to Ericsson et al’s theory of deliberate practice: feedback, repetitive practice, range of difficulty levels, defined outcomes and individualised learning.8 9 12 Issenberg’s features were considered an adequate starting point for the development of the ID-SIM, as they are clearly defined and unambiguously interpretable.

To yield a reliable and objective evaluation of the instructional design quality, we decided that the ID-SIM should consist of two parts: a questionnaire and a visual chart. On completion of all items of the questionnaire, the responses are converted into the visual chart. Both the questionnaire and the related visual chart are incorporated in a mobile application to guarantee an easy-to-access assessment tool.

Visual chart

We envisioned that the visual part of the assessment tool had to be easy to understand. To achieve this goal, users of the ID-SIM should be able to interpret the visual part without additional instructions. In addition, the visual chart had to clearly identify the strengths and weaknesses of an applied instructional design. Therefore, all instructional design features had to be presented in the chart. Moreover, the score assigned to each feature (expressed as a percentage) had to be depicted. Finally, a rating for the overall quality of a training had to be presented in the chart to allow quick differentiation between training courses.

Questionnaire

The following key publications were used for the development of the questionnaire: Issenberg et al’s BEME systematic review (2005),7 the qualitative review by McGaghie et al2 and the two guidelines on simulation-based medical education (no. 50 and no. 82) of an international organisation of medical education (AMEE).10 11 These publications discuss important educational principles that lead to effective learning in the context of simulation-based training courses.

Two researchers formulated questions for each design feature, based on the literature mentioned above. Furthermore, a native English speaker and an independent group of six researchers with different occupational backgrounds reviewed the questions for clarity of formulation and comprehensiveness. The group comprised engineers, clinical doctors and a psychologist, all familiar with (medical) simulation-based education. The content validity of the questionnaire was thus obtained through consensus. Considering the experience and different backgrounds of the two researchers, it is highly likely that all relevant items of the properties have been included.13

The aim of this questionnaire is to translate systematic recording of observations into a visual chart, hence providing a tool for self-assessment. It could thus arguably be interpreted as a checklist rather than a questionnaire per se. We aimed to use a question type that yields continuous data, as this also captures slight differences and allows for better statistical analysis.14 For the validation process, we used an online version of the questionnaire (Qualtrics, Provo, Utah, USA) for the raters who were unable to use the mobile application.

Tool validation

An evaluation of the psychometric properties was performed by an assessment of validity (construct validity), reliability (inter-rater and intra-rater reliability) and usability. To determine these properties, one of the authors (AF) used the selected literature to draft five realistic team training courses with a predefined poor, fair, average, good and excellent quality of the instructional design. We chose to use these standardised training courses rather than published descriptions, as there might be a publication bias with respect to the quality of simulation-based team training courses. The descriptions ranged from 227 to 411 words.

To determine reliability, we asked 10 raters with varying degrees of experience in the field of simulation-based training courses to fill in the questionnaire for each of the five team training courses. These raters had a background as educators in simulation-based team training courses and/or clinical doctors who were trainers in simulation-based team training courses or conducted research in medical simulation. All were independent from our research team. We asked them to fill in the questionnaire independently of each other, and they were all blinded to the predefined quality levels of the team training courses. As our aim was to create a straightforward and easy-to-understand assessment tool, we did not provide any prior explanation for the use or interpretation of the questionnaire. The evaluators rated the training courses in a random order, without communicating with each other. A fully crossed design was used: all evaluators assessed all training courses. To test the intra-rater reliability, the evaluators assessed the same training courses two weeks later, without access to their previous ratings.

Evidence of construct validity was obtained by examining the correlation between the scores obtained by the 10 raters and an expert’s ranking of the team training courses from poor to excellent. The expert had more than 10 years of experience in medical simulation; he had been involved in designing many simulation-based team training courses, simulation centres, scientific conferences and publications, workshops and train-the-trainer courses worldwide. He performed the ranking without using the ID-SIM questionnaire or the selected literature. We opted to use an expert ranking as expert opinion is the current gold standard for the evaluation and design of simulation-based training courses. Usability of the questionnaire was assessed after the first round by a short survey with 11 questions, including an overall rating for the usability of the ID-SIM questionnaire (on a scale from 0 to 10).

Statistical analysis

Statistical analyses were performed using SPSS (V.24, IBM) and Excel (Microsoft Office Professional Plus Excel 2013, Microsoft).

Construct validity was defined as the degree to which the tool measures what it claims to be measuring.15 Applied to the current study, this means the correlation (Spearman coefficient) between the overall scores obtained by the different raters and the expert’s ranking. Reliability refers to the reproducibility of assessment data or scores over time or occasions. In our context, it is defined by the extent to which the scores of a rater are reproducible by the same rater (intra-rater reliability) and by others (inter-rater reliability).16 The inter-rater reliability was determined by conducting a generalisability theory (GT) analysis, which is identified by Downing as ‘the most elegant estimate of inter-rater reliability’.16
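As an illustration of this construct-validity check, the minimal Python sketch below correlates the mean overall ID-SIM score per training course with the expert’s poor-to-excellent ranking. The scores are purely hypothetical (not the study data), and whether the published coefficient was computed on pooled or per-rater scores is not stated, so this is only one plausible reading.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical overall scores (%) from 10 raters for the five standardised
# training courses, ordered poor -> excellent (illustrative values only).
overall_scores = np.array([
    [22, 25, 18, 30, 27, 24, 20, 26, 23, 28],   # poor
    [41, 38, 44, 40, 37, 42, 39, 45, 36, 43],   # fair
    [55, 58, 52, 60, 57, 54, 56, 59, 53, 61],   # average
    [72, 70, 75, 68, 74, 71, 73, 69, 76, 70],   # good
    [90, 88, 92, 87, 91, 89, 93, 86, 90, 94],   # excellent
])

expert_ranking = np.array([1, 2, 3, 4, 5])       # expert's poor-to-excellent order

# Correlate the mean overall score per course (across raters) with the ranking.
rho, p_value = spearmanr(overall_scores.mean(axis=1), expert_ranking)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```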

The GT study enables us to estimate the value of the different sources of error (variance components) that contribute to the overall variance in scores.17 The variance components comprised the training courses, the raters and the items of the questionnaire. To estimate the variance components, we used the minimum norm quadratic unbiased estimation in SPSS (V.22, IBM), since this estimation does not rely on distributional assumptions. The variance components were used to calculate the generalisability coefficient (G-coefficient), which indicates how consistent the scores of the different raters are.
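The paper does not spell out the exact G-coefficient formula it used; a common formulation for a fully crossed Training x Rater x Item design (relative decisions) is sketched below in Python. The function name and the choice of facets averaged over are illustrative assumptions rather than the study’s documented model.

```python
def g_coefficient(var_t, var_tr, var_ti, var_tri, n_raters, n_items):
    """Generalisability coefficient for a fully crossed Training x Rater x Item
    design, relative decisions (a sketch; the study's exact model may differ).

    var_t    -- variance due to true differences between training courses
    var_tr   -- Training x Rater interaction variance
    var_ti   -- Training x Item interaction variance
    var_tri  -- Training x Rater x Item interaction (plus residual) variance
    n_raters -- number of raters averaged over
    n_items  -- number of items (or design features) averaged over
    """
    relative_error = (var_tr / n_raters
                      + var_ti / n_items
                      + var_tri / (n_raters * n_items))
    return var_t / (var_t + relative_error)
```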

Additionally, an intraclass correlation coefficient (ICC; with a 95% CI) was calculated as an estimate of inter-rater reliability.18 The intra-rater reliability was also tested by calculating an ICC for the individual raters.19 As we were interested in the agreement between the scores and intended to generalise the questionnaire to other situations, we used a two-way random ICC (2,1) with absolute agreement.18 19 This ICC considers both random and systematic error. The ICC for the inter-rater and intra-rater reliability was calculated using the assigned overall score of the training course. The overall score was obtained by summing the relative scores for each item and dividing the total by 10. The relative score for each item was the average score of the corresponding questions for that specific design feature.
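For readers who want to reproduce this kind of analysis outside SPSS, the sketch below implements the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single rater) from a matrix of overall scores. The function name and the example matrix shape are assumptions for illustration, not part of the published analysis.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is an (n_targets x k_raters) array, e.g. 5 training courses x 10 raters."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)   # per training course
    col_means = ratings.mean(axis=0)   # per rater

    # Two-way ANOVA sums of squares and mean squares.
    ss_rows = k * np.sum((row_means - grand_mean) ** 2)
    ss_cols = n * np.sum((col_means - grand_mean) ** 2)
    ss_total = np.sum((ratings - grand_mean) ** 2)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    # Shrout & Fleiss (1979) formula for ICC(2,1).
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

# Usage: icc_2_1(overall_scores) with a 5 x 10 matrix of overall scores.
```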

Results

We developed the ID-SIM (mobile application) containing a questionnaire and a visual chart.i As an example of the use of the visual chart, a poor and a good training design are depicted in figure 1. Each pie of the chart represents one of the 10 instructional design items, and all items carry the same weight. The better the item is implemented in the training course, the larger the related pie grows. The area of each pie represents the relative score for that specific design feature (see figure 2). The overall score, displayed at the top of the chart, corresponds to the total area of the chart and thus averages all design features. The research panel reached consensus on 42 questions, ranging from two to six questions per design feature (see online supplementary appendix 1 for the included questions). For all questions, a Visual Analogue Scale was adopted, yielding continuous data that were rounded to one decimal. The final questionnaire was incorporated in the mobile application.
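The app’s rendering code is not published, but the scoring logic described above can be sketched as follows (Python/matplotlib). The assumption here is that each of the 10 features occupies a 36-degree sector whose radius scales with the square root of its relative score, so that the sector’s area (and hence the total chart area) is proportional to the score; the feature scores used are purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Wedge

# Issenberg et al's 10 design features; the scores below are illustrative only.
features = ["Feedback", "Repetitive practice", "Range of difficulty",
            "Defined outcomes", "Individualised learning",
            "Curriculum integration", "Multiple learning strategies",
            "Clinical variation", "Controlled environment", "Simulator validity"]
scores = np.array([80, 65, 40, 90, 55, 70, 60, 45, 85, 75])  # relative scores (%)

overall = scores.mean()  # overall score = average of the 10 feature scores

fig, ax = plt.subplots(figsize=(6, 6))
for i, score in enumerate(scores):
    theta1, theta2 = i * 36, (i + 1) * 36   # one 36-degree sector per feature
    radius = np.sqrt(score / 100)           # sector area proportional to score
    ax.add_patch(Wedge((0, 0), radius, theta1, theta2, edgecolor="white"))
ax.set_xlim(-1.1, 1.1)
ax.set_ylim(-1.1, 1.1)
ax.set_aspect("equal")
ax.axis("off")
ax.set_title(f"Overall score: {overall:.0f}%")
plt.show()
```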

Figure 1. Example of two simulation-based team training courses evaluated with the ID-SIM (poor design, left; good design, right).

Figure 2. The score of one of the 10 instructional design features displayed in the mobile application.

Supplementary Appendix 1: bmjstel-2016-000192.supp1.pdf (311.4 KB)

Validity and usability

To test validity and usability, we used an expert-based ranking and a usability questionnaire for the raters who used the ID-SIM. The ranking of the team training courses by the simulation expert, who was blinded to the assigned levels of quality, was identical to the predefined order. The Spearman’s correlation coefficient between the overall scores obtained by the raters and the ranking by the expert was 0.95.

All raters (strongly) agreed that the ID-SIM questionnaire was valuable for both the design and the evaluation of simulation-based team training courses. The mean overall rating assigned to the ID-SIM was 7.7 (on a scale ranging from 0 to 10). Nine out of 10 raters would recommend the tool to colleagues (one neither agreed nor disagreed), and all (strongly) agreed that the ID-SIM provides more insight into the strengths and weaknesses of a team training course. Four raters agreed, three disagreed and three neither agreed nor disagreed on whether the ID-SIM questionnaire was easy to use. Five raters considered the time required acceptable and two raters disagreed. After three training course assessments, the raters felt familiar with the questionnaire. The average time needed to complete the questionnaire for the first training course was 32 min (95% CI 16 to 48), and the fastest average time was 14 min (95% CI 9 to 19).

Reliability

The 10 raters completed the ID-SIM questionnaire twice for the five training descriptions, with an interval of at least two weeks. Table 1 shows the relative contribution of each variance component to the variance in the observed scores. The variance due to subjectivity of raters was 3.5% (VTraining*Evaluator). The main contributing component of variance in the observed scores was due to differences in the quality of the five standardised training courses (VTraining 53.4%). The tendency for a training course to perform differently on a specific item was 17.7% (VTraining*Item). The G-coefficient calculated from table 1 was 0.96.17 Among the 50 obtained overall scores (five standardised training courses by 10 raters), the single-rater intraclass correlation coefficient was 0.91 (95% CI 0.77 to 0.99). The intra-rater reliability coefficient (ICC) for each individual rater (concerning the overall score) ranged from 0.91 to 0.99.

Table 1. Variance components and relative contributions to the total variance

Factor                       Variance component    Percentage of variance (%)
VTraining                    743.8                 53.4
VEvaluator*                  6.0                   0.4
VItem*                       160.5                 11.5
VTraining*Evaluator          48.3                  3.5
VTraining*Item               246.5                 17.7
VEvaluator*Item*             15.9                  1.1
VTraining*Evaluator*Item     176.7                 12.7
VError                       0                     0

*Item difficulty (VItem), assessor stringency (VEvaluator) and item-specific stringency (VEvaluator*Item) do not contribute to error as all training courses were assessed by all raters using the same items.20
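As a rough cross-check, plugging the Table 1 variance components into a standard relative G-coefficient formula for a fully crossed design reproduces a value close to the reported 0.96, under the assumption (not stated explicitly in the paper) that the score is averaged over the 10 raters and the 10 design features.

```python
# Variance components from table 1 (cross-check sketch, not the original SPSS output).
var_t, var_tr, var_ti, var_tri = 743.8, 48.3, 246.5, 176.7
n_raters, n_features = 10, 10   # assumption: averaging over 10 raters and 10 design features

relative_error = var_tr / n_raters + var_ti / n_features + var_tri / (n_raters * n_features)
g = var_t / (var_t + relative_error)
print(round(g, 2))              # -> 0.96
```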

Discussion

To evaluate the instructional design quality of simulation-based team training courses, we developed the ID-SIM. The ID-SIM is a mobile application consisting of an evidence-based 42-item questionnaire whose results are directly visualised in a propeller chart. The current validation demonstrates that it is a valid, useful and reliable assessment tool. Raters agreed that the ID-SIM questionnaire increased the transparency of the strengths and weaknesses of the 10 most important instructional design features. Furthermore, they all agreed that the ID-SIM was a valuable assessment tool to design or evaluate a simulation-based team training course. Only 3.5% of the variance in scores was due to subjectivity of raters (VTraining*Evaluator). This minor contribution of rater subjectivity implies that the ID-SIM yields objective scores. A G-coefficient of 0.96 indicates that a reliable judgement based on the ID-SIM is possible, as the reliability threshold is 0.80.17 One could interpret this coefficient as indicating that 96% of the variance in obtained scores is caused by real and stable differences between the training courses. Based on the overall score of the training courses, there was an excellent inter-rater reliability (ICC 0.91; 95% CI 0.77 to 0.99) and intra-rater reliability (ICC varying from 0.91 to 0.99 for the individual evaluators). Moreover, there was a high correlation (Spearman’s ρ 0.95) with the current gold standard (expert opinion).

An important advantage of the ID-SIM is that it provides the research community with standardisation in the assessment of team training courses, based on the current key educational theories and literature. An objective and standardised evaluation of these courses enables reliable comparisons between different training designs. The ID-SIM can therefore play a fundamental role in examining the importance of the individual instructional design features. Packaging the assessment tool as a mobile app keeps it easy to access, and the propeller chart provides a quick overview.

The ID-SIM also creates opportunities for less-experienced individuals who are challenged with the development of a simulation-based team training course. Although the design quality of a simulation-based team training course is nowadays most often determined by the opinion of simulation experts, an expert might not always be available. This implies that less-experienced professionals may also be challenged to design and/or evaluate a simulation-based team training course, without being aware of the effective instructional design features. The ID-SIM provides them with an easy-to-access mobile application, offering a standardised and objective framework of the key features that should be included. Moreover, we found a high correlation between the ID-SIM overall score and expert opinion, which is of great importance, as expert opinion combines knowledge from the literature with the experience of putting that knowledge into practice.

Limitations of our research design should be considered. First of all, we used standardised training course descriptions with an instructional design varying from poor to excellent. The variability in design quality might have influenced the magnitude of the ICC. However, the use of these standardised courses rules out possible publication bias and tests the use of the ID-SIM across all quality levels. Future research goals include the use of the ID-SIM across ‘real’ team training courses. Second, the obtained ratings might be influenced by common method bias. Although the raters were not involved with the development of the ID-SIM, common rater effects (eg, the tendency to remain consistent in their responses, social desirability) cannot be ruled out. Both limitations are important to keep in mind when interpreting our results. However, the performed GT analysis provides more information about the different variance components that contribute to the total measurement error.

As the validation results of the ID-SIM are promising, it should next be tested across a diversity of simulation experts and training courses. Applying the ID-SIM to simulation-based education more broadly might be possible in the future. Subsequently, the predictive value of the ‘ID-SIM overall rating’ for the effectiveness of a team training course needs to be established. To this end, it would be useful to evaluate the instructional design of training courses whose effectiveness is assessed in a randomised controlled study. To enable evaluation with the ID-SIM, we therefore encourage authors to provide detailed information on the instructional design of team training courses in scientific publications. The use of the ID-SIM may help guide the inevitable trade-offs between an optimal training course design and the available financial and other resources.

Acknowledgments

We would like to acknowledge the contribution of Jenny W Rudolph (Center for Medical Simulation, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA) and Professor K Anders Ericsson (Department of Psychology, Florida State University, Tallahassee, Florida, USA) for critically discussing the idea behind the ID-SIM. We would also like to thank Kaspar Draaisma for the development of the mobile application.

Footnotes

i. The ID-SIM is available for free in the App Store of Apple (ID SIM, Denovo, Amsterdam, the Netherlands).

Contributors: AF and SGO were responsible for the conception of the work. All authors provided substantial contributions to the design of the work. AF was responsible for the data collection. AF and BH were responsible for the data analysis. AF drafted the manuscript. All authors revised the drafted manuscript critically and provided a final approval. All authors agreed to be accountable for all aspects of the work.

Competing interests: The research team received no financial support from any third party to complete the work. AF, MC, PRH, SGO and BH have no financial disclosures or conflicts of interest. RG is affiliated with the Center for Medical Simulation, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA.

Provenance and peer review: Not commissioned; externally peer reviewed.

References

1. Okuda Y, Bryson EO, DeMaria S, et al. The utility of simulation in medical education: what is the evidence? Mt Sinai J Med 2009;76:330–43. doi:10.1002/msj.20127
2. McGaghie WC, Issenberg SB, Petrusa ER, et al. A critical review of simulation-based medical education research: 2003–2009. Med Educ 2010;44:50–63. doi:10.1111/j.1365-2923.2009.03547.x
3. Fraser KL, Ayres P, Sweller J. Cognitive load theory for the design of medical simulations. Simul Healthc 2015;10:295–307. doi:10.1097/SIH.0000000000000097
4. Reiser RA, Dempsey JV. Characteristics of instructional design models. In: Reiser RA, Dempsey JV, eds. Trends and Issues in Instructional Design and Technology. 3rd ed. Pearson, 2011.
5. Cook DA, Hamstra SJ, Brydges R, et al. Comparative effectiveness of instructional design features in simulation-based education: systematic review and meta-analysis. Med Teach 2013;35:e867–98. doi:10.3109/0142159X.2012.714886
6. Eppich W, Howard V, Vozenilek J, et al. Simulation-based team training in healthcare. Simul Healthc 2011;6:S14–19. doi:10.1097/SIH.0b013e318229f550
7. Issenberg SB, McGaghie WC, Petrusa ER, et al. Features and uses of high-fidelity medical simulations that lead to effective learning: a BEME systematic review. Med Teach 2005;27:10–28. doi:10.1080/01421590500046924
8. Ericsson KA, Krampe RT, Tesch-Römer C. The role of deliberate practice in the acquisition of expert performance. Psychol Rev 1993;100:363–406. doi:10.1037/0033-295X.100.3.363
9. Ericsson KA. Deliberate practice and acquisition of expert performance: a general overview. Acad Emerg Med 2008;15:988–94. doi:10.1111/j.1553-2712.2008.00227.x
10. Khan K, Tolhurst-Cleaver S, White S, et al. Simulation in healthcare education: building a simulation programme: a practical guide. AMEE Guide No. 50. AMEE, 2011:1–44.
11. Motola I, Devine LA, Chung HS, et al. Simulation in healthcare education: a best evidence practical guide. AMEE Guide No. 82. Med Teach 2013;35:e1511–30. doi:10.3109/0142159X.2013.818632
12. Ericsson KA. Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Acad Med 2004;79:S70–81. doi:10.1097/00001888-200410001-00022
13. Brinkman WP. Design of a questionnaire instrument. In: Handbook of Mobile Technology Research Methods. Hauppauge, NY: Nova Science Publishers, 2009:31–57.
14. Celenza A, Rogers IR. Comparison of visual analogue and Likert scales in evaluation of an emergency department bedside teaching programme. Emerg Med Australas 2011;23:68–75. doi:10.1111/j.1742-6723.2010.01352.x
15. Downing SM. Validity: on meaningful interpretation of assessment data. Med Educ 2003;37:830–7. doi:10.1046/j.1365-2923.2003.01594.x
16. Downing SM. Reliability: on the reproducibility of assessment data. Med Educ 2004;38:1006–12. doi:10.1111/j.1365-2929.2004.01932.x
17. Crossley J, Davies H, Humphris G, et al. Generalisability: a key to unlock professional assessment. Med Educ 2002;36:972–8. doi:10.1046/j.1365-2923.2002.01320.x
18. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8. doi:10.1037/0033-2909.86.2.420
19. Weir JP. Quantifying test–retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res 2005;19:231–40. doi:10.1519/15184.1
20. Crossley J, Russell J, Jolly B, et al. ‘I’m pickin’ up good regressions’: the governance of generalisability analyses. Med Educ 2007;41:926–34. doi:10.1111/j.1365-2923.2007.02843.x


