Medical Science Educator. 2024 Jan 13;34(2):315–318. doi: 10.1007/s40670-024-01974-5

Nurturing Untapped Integration Expertise of MS4 Assessment Writers

Esther Dale 1, Bethany Schowengerdt 2, Claudio Violato 1
PMCID: PMC11055828  PMID: 38686140

Abstract

Creating original, integrated multiple-choice questions (MCQs) is time-consuming and onerous for basic science and clinical faculty. We demonstrate that medical students can serve as co-experts to help faculty overcome these assessment challenges. We recruited, trained, and motivated medical students to write 10,000 high-quality MCQs for use in the foundational courses of medical education. These students were ideal because they possessed integrated knowledge (basic sciences and clinical experience). We taught them how to write high-quality MCQs using a writing template, with continuous monitoring and support from an item bank curator. The students themselves also benefitted personally and pedagogically from the experience.

Keywords: Student co-production, MCQs, Assessment, Item bank

Background

Multiple choice questions (MCQs) are widely used in medical education at many levels including during medical school in foundational basic science courses; clinical clerkships; National Board of Medical Examiners (NBME) subject matter exams (“Shelf Exams”); Step 1, Step 2, and Step 3 licensure exams; assessments during residency training; assessments during fellowships; board certification in many specialty areas; and many other uses [1]. There is a consistent need for a large volume of high-quality, integrated preclinical MCQs (integration of knowledge from different disciplines within the preclinical phase of medical school) to assess students’ preparation for clerkships, but professors need assistance in meeting this requirement [2]. It is imperative to develop high-quality MCQs as poor-quality ones lead to construct-irrelevant variance (i.e., error of measurement), affecting student pass-fail outcomes and failing to assess what examiners purport to test [3–5].
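
One way to picture construct-irrelevant variance, framed here in classical test theory terms as our own illustration rather than a formula from the cited sources, is as an extra systematic component in observed score variance:

\[ \sigma^2_{X} \;=\; \sigma^2_{\text{construct}} \;+\; \sigma^2_{\text{CIV}} \;+\; \sigma^2_{\text{error}} \]

Item-writing flaws (e.g., cues that reward testwiseness) inflate the \(\sigma^2_{\text{CIV}}\) term, so scores partly reflect skills other than the intended knowledge and pass-fail decisions become less defensible.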

The time, cost, and effort to generate quality MCQs are substantial. Testing organizations such as the NBME hire committees of faculty to serve as content experts, writing items asynchronously and then meeting for synchronous group review, incurring honoraria, travel expenses, and administrative support costs [6]. Most medical school faculty write MCQs without any formal training in test construction or any administrative support for this important work. Accordingly, these items are typically flawed in important ways [4, 5]. The models used by the NBME are not feasible for locally generated, medical school faculty-written items because they are costly and labor intensive; the process is therefore unsustainable.

A readily available and underutilized resource for item writing is senior, fourth-year medical students. These students are at the peak of their biomedical knowledge and have undergone most of their required clinical immersion experiences during their third year. After graduation from medical school, there is a steady decline in physicians’ biomedical knowledge. Custers and ten Cate [7], in a well-designed study, found that performance on tests of biomedical knowledge decreased from approximately 40% correct answers at the end of medical school to 25–30% correct answers for doctors after many years of practice. After graduation from medical school, knowledge retention is best described by a negatively accelerated (logarithmic) forgetting curve with age. After 25 years, retention levels were in the range of 15–20%. Other studies have similar findings [8, 9]. While similarly rigorous and specific studies have not been done with PhD basic science faculty, it can be supposed that they are subject to a similar forgetting curve with age as are physicians [10]. Senior medical students, who have recently been immersed in both basic biomedical science and clinical experience, are therefore well suited to be trained as MCQ writers in a process of assessment co-production employing best practices [1, 11].
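
To make the shape of this curve concrete, a rough illustrative model is sketched below; the logarithmic form follows the description above, but the parameter values are our own approximation to the reported percentages, not figures taken from Custers and ten Cate [7]:

\[ R(t) \;\approx\; R_0 - k\,\ln(1 + t), \qquad R_0 \approx 40\%,\; k \approx 6.9 \;\Rightarrow\; R(25) \approx 17\% \]

where \(t\) is years since graduation and \(R(t)\) is the percentage of basic science test items answered correctly.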

Activity

We employed best-practice methods to develop a large item bank of 10,000 MCQs over 3 years. The development and quality assurance of the MCQ items is depicted schematically in the accompanying figure.

This standardized approach allows for the development of high-quality items.

Student Recruitment

We recruited medical students by email, listserv postings, and personal contact, describing the opportunity to be trained in workshops to write MCQs for remuneration. From the applicant pool, we interviewed and ultimately selected 68 students over the course of 3 years who had completed all of their academic requirements (basic science foundational courses, required clinical rotations) in good standing and had adequate time to engage in the task.

Training Workshops

Students were trained (didactic sessions, guided item writing, small-group discussion) in 5-h workshops on MCQ item-writing skills and were paid $21/h for their participation. The general instructional objectives of the workshop were that attendees would be able to:

  1. Summarize the theory of MCQ testing

  2. Identify the parts of the MCQ item

  3. Write MCQs based on best practices

  4. Write clinical vignette type MCQ items

  5. Critique clinical vignette MCQs for validity

  6. Set minimum performance levels (MPLs) for each MCQ using a modified Nedelsky method [12–14] (a computational sketch follows this list)
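
The Nedelsky approach referenced in objective 6 assigns each item an expected score for a minimally competent examinee based on how many options that examinee could rule out. The sketch below implements the classic Nedelsky calculation, not the authors’ specific modification, and the function names are ours:

```python
def nedelsky_mpl(options_per_item, eliminated_per_item):
    """Classic Nedelsky minimum performance level (MPL) for a test.

    For each MCQ, a judge decides how many distractors a minimally
    competent examinee could rule out; the item's expected score is the
    chance of guessing correctly among the remaining options. The test
    MPL is the sum of these per-item expectations.
    """
    mpl = 0.0
    for n_options, n_eliminated in zip(options_per_item, eliminated_per_item):
        remaining = n_options - n_eliminated  # correct answer plus surviving distractors
        mpl += 1.0 / remaining                # probability of a correct guess
    return mpl


# Example: three 5-option items; the judge rules out 3, 2, and 4 distractors.
print(nedelsky_mpl([5, 5, 5], [3, 2, 4]))  # 0.5 + 0.33 + 1.0 = about 1.83 items
```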

Table of Specifications (Test Blueprint)

We created a table of specifications, or test blueprint. This blueprint was based on the United States Medical Licensing Examination (USMLE) Content Outline; the USMLE Step 1 test specifications; medical school course syllabi; a public, interdisciplinary database (www.aquifersciences.com) developed by more than 100 basic scientists and clinicians; and a systematic literature review of the core content and Bloom’s-taxonomy cognitive levels of medical school foundational courses.
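
As an illustration only, such a blueprint can be represented as a cross-tabulation of content areas against cognitive levels, with a target item count in each cell. The disciplines, levels, and counts below are hypothetical placeholders, not the actual blueprint:

```python
# Hypothetical blueprint fragment: target item counts per (discipline, Bloom level).
blueprint = {
    ("Physiology",   "Knowledge"):   40,
    ("Physiology",   "Application"): 60,
    ("Biochemistry", "Knowledge"):   30,
    ("Biochemistry", "Application"): 45,
}

def items_needed(blueprint, discipline):
    """Total items the bank needs for one discipline across all cognitive levels."""
    return sum(count for (disc, _level), count in blueprint.items() if disc == discipline)

print(items_needed(blueprint, "Physiology"))  # 100
```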

Item-Writing Template

We developed a digital template in Qualtrics: an algorithm for creating a multiple-choice item and its associated meta-data. The template defines a process, a set of rules and options, for item entry, tagging, and item validity (drawing on Bloom’s taxonomy, best practices from psychometric data, core concepts, and body systems).

Writers selected from predetermined, forced choices of core concepts, learning objectives, cognitive level, discipline, system, presenting problems, and MPL. The template also required each writer to add rationales for the distractors and the correct option. Finally, the template required writers to use and cite all reference materials and scholarly sources that informed the creation of items, from internationally recognized medical textbooks such as Harrison’s Principles of Internal Medicine to refereed scientific journals. All items were reviewed and curated prior to inclusion in the bank and returned to the writer for revisions if necessary.
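
A minimal sketch of the kind of item record such a template could produce follows; the field names are our assumptions based on the description above, not the actual Qualtrics schema:

```python
from dataclasses import dataclass, field

@dataclass
class MCQItem:
    """One multiple-choice item plus its meta-data (illustrative fields only)."""
    stem: str                        # clinical vignette / question text
    options: dict                    # option letter -> option text
    correct: str                     # letter of the correct option
    rationales: dict                 # option letter -> why it is correct or incorrect
    core_concept: str                # e.g., "flow gradients"
    learning_objective: str
    bloom_level: str                 # e.g., "application"
    discipline: str                  # e.g., "physiology"
    system: str                      # e.g., "cardiovascular"
    presenting_problem: str
    mpl: float                       # minimum performance level for the item
    references: list = field(default_factory=list)  # cited textbooks or journal articles

    def curator_check(self) -> bool:
        """Simple completeness check of the kind a curator might apply:
        every option needs a rationale and at least one source must be cited."""
        return set(self.rationales) == set(self.options) and bool(self.references)
```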

Students signed a non-disclosure statement, and the multiple-choice items were stored in a secure test delivery system to preserve the confidentiality of the questions; the system also contained each item’s extensive meta-data (e.g., level of measurement, rationale for each option, MPL) and references. We developed 10,000 MCQ items over a 3-year period.

Vetting of the Items

Six PhD basic science faculty members reviewed more than 600 student-written items in their respective specialty areas (e.g., physiology, anatomy, pathology, histology, microbiology, and immunology) and generally judged the items “excellent” and highly relevant at the appropriate assessment level of Bloom’s taxonomy. All other faculty were invited to access the item bank and review items. Initial faculty “skepticism” about students’ ability to write MCQs disappeared after this review. More than one faculty member informally told us that these MCQs were much better than those previously written by faculty, which contained writing flaws and mostly lacked the meta-data the students provided (e.g., rationales for the options, explicit links to instructional objectives, Bloom’s taxonomy, MPLs).

Funding

The project was funded by the Dean’s Office, University of Minnesota Medical School, for $275,000 over a 3-year period. This funding covered all costs, including materials, the curator for the bank, and item writing. Item writers were paid $25 per item accepted for the bank. Most of the funds (~ 85%) were therefore paid directly to students for item writing.

Results and Discussion

There were three main results of the project: (1) a table of specifications for the content and cognitive levels based on Bloom’s taxonomy, disciplines (e.g., anatomy, biochemistry), systems (e.g., musculoskeletal, endocrine), and core concepts (e.g., flow gradients, cell potential) relevant to the foundational curriculum; (2) an MCQ writing template for creating an MCQ and its associated meta-data; and (3) 10,000 curated MCQs authored by 68 trained student item writers.

Table 1 contains selected and lightly edited student writer perceptions of the benefits of item writing. These include personal benefits for test performance and learning, a question about where the motivation for writing would come from if it were unpaid, and an enhanced understanding of the complexity and nuances of learning objectives.

Table 1.

Student writer comments

Personal Benefits for Test Performance
  “So for me, a huge unexpected benefit of my role as an item writer was that when I was studying for Step 2, this really gave my score a big bump …”
Personal Benefits for Learning
  “When you’re writing questions, you’re thinking about conditions, treatment plans just from all the different angles you know? What’s the pathophysiology? What’s the treatment? What’s the best initial treatment?”
  “I think writing questions for me really helped me to flesh out a lot of my understanding [of the content]”
Motivation for Writing Questions
  “I’d be interested to see how beyond this project, how writing questions can be something that students can use to study for shelf exams and board exams and where the motivation for that could come from when it’s not, you know, a paid role.”
Expanding and Understanding Complexity of Learning Objectives
  “One of the learning objectives is: Choose an appropriate diagnostic test for male presenting with sudden onset scrotal pain. So this is talking about testicular torsion but included in this learning objective is knowing what testicular torsion is. Knowing how it presents? … Physical exam signs? … How would you treat it? What diagnostic tests … other differential diagnosis are you considering and what would point you towards and away from that? And that within each of those topics, there are going to be knowledge, comprehension and application level questions that you can ask and so it’s very easy to expand a set of questions from a single learning objective” [emphasis added]

In their scoping review of barriers to medical school faculty writing high-quality MCQs, Karthikeyan et al. [2] identified several institutional and personal factors, such as faculty development, quality assurance processes, and individual and institutional constraints, as barriers and facilitators to quality item writing. The barriers individual faculty item writers face include lack of knowledge of MCQ construction, poor motivation, and the high workload required to write MCQs.

In the present work, we employed highly motivated, knowledgeable, and trainable medical students to bypass the institutional and personal barriers that exist for faculty. Our student writers were also skilled at cognitive integration, connecting basic science critically to the signs and symptoms of clinical cases. PhD faculty generally lack the clinical knowledge and experience for this integration, while experienced MD clinicians may lack it because they use encapsulated rather than explicit basic science knowledge in clinical reasoning [15].

Our senior medical student MCQ writers occupy a unique position of proximity to both basic science content and clinical immersion: they can utilize their integrated knowledge, select high-yield essential science content, draw on their recent clinical experience, prioritize the content that should be tested, and create integrated questions. Because cognitive integration is a desired outcome of medical education [15], advanced medical students already possess this expertise. Several of the student comments in Table 1 reflect the item writers’ recognition of this integration.

In conclusion, we recruited, trained, and motivated medical students to write 10,000 high-quality MCQs for use in the foundational courses of medical education. These students were ideal because they possessed integrated knowledge (basic sciences and clinical experience), though they lacked experience in item writing. We taught them how to write high-quality MCQs; using the writing template, and with continuous monitoring and support from the item bank curator, our students cost-effectively ($25/item) developed 10,000 high-quality MCQs. The students themselves also benefitted personally and pedagogically from the experience. This was a “win–win” project for all involved.

Data Availability

Student comments and other data are available from the authors.


References

  1. Violato C. Assessing competence in medicine and other health professions. Boca Raton, FL: Taylor & Francis; 2019.
  2. Karthikeyan S, O’Connor E, Hu W. Motivations of assessment item writers in medical programs: a qualitative study. BMC Med Educ. 2020;20:334. doi: 10.1186/s12909-020-02229-8.
  3. Royal KD, Hedgpeth MW. The prevalence of item construction flaws in medical school examinations and innovative recommendations for improvement. EMJ Innovations. 2017;1(1):61–66. doi: 10.33590/emjinnov/10312489.
  4. Rush BR, Rankin DC, White BJ. The impact of item-writing flaws and item complexity on examination item difficulty and discrimination value. BMC Med Educ. 2016;16:250. doi: 10.1186/s12909-016-0773-3.
  5. Downing S. The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract. 2005;10(2):133–143. doi: 10.1007/s10459-004-4019-5.
  6. National Board of Medical Examiners. United States Medical Licensing Exam. https://www.nbme.org/about-nbme/our-collaborations/item-co-creation. Accessed 1 Dec 2023.
  7. Custers JF, ten Cate OT. Very long-term retention of basic science knowledge in doctors after graduation. Med Educ. 2011;45:422–430. doi: 10.1111/j.1365-2923.2010.03889.x.
  8. D’Eon MF. Knowledge loss of medical students on first year basic science courses at the University of Saskatchewan. BMC Med Educ. 2006;6:5. doi: 10.1186/1472-6920-6-5.
  9. Hoffman S. Physician cognitive decline: a challenge for state medical boards. J Med Regul. 2022;108:19–28.
  10. Caddick ZA, Fraundorf SH, Rottman BM, Nokes-Malach TJ. Cognitive perspectives on maintaining physicians’ medical expertise: II. Acquiring, maintaining, and updating cognitive skills. Cogn Res Princ Implic. 2023;8:47. doi: 10.1186/s41235-023-00497-8.
  11. Violato EM, Violato C. Multiple choice questions in a nutshell: theory, practice, and postexam item analysis. Acad Med. 2020;95(4):659. doi: 10.1097/ACM.0000000000003096.
  12. Nedelsky L. Absolute grading standards for objective tests. Educ Psychol Meas. 1954;14:3–19. doi: 10.1177/001316445401400101.
  13. Cizek GJ, Bunch MB. The Nedelsky method. In: Standard setting. Thousand Oaks, CA: SAGE; 2007. p. 68–74.
  14. Yousuf N, Violato C, Zuberi RW. Standard setting methods for pass/fail decisions on high stakes objective structured clinical examinations: a validity study. Teach Learn Med. 2015;27(3):280–291. doi: 10.1080/10401334.2015.1044749.
  15. Violato C, Gao H, O'Brien MC, Grier D, Shen E. How do physicians become medical experts? A test of three competing theories: distinct domains, independent influence and encapsulation models. Adv Health Sci Educ. 2018;23:249–263. doi: 10.1007/s10459-017-9784-z.


