Medical Science Educator
. 2024 Jan 15;34(2):357–361. doi: 10.1007/s40670-023-01972-z

Transitioning from Faculty-Written Examinations to National Board of Medical Examiners Custom Examinations in Medical Education

Christine M Prater 1, Thomas E Tenner Jr 1,2, Michael P Blanton 1,2, David Trotter 1,3
PMCID: PMC11055832  PMID: 38686141

Abstract

Purpose

A challenge for medical educators is choosing a method that best evaluates preclinical students’ performance in preparation for Step 1. In previous years, block directors (BDs) of the 2nd year (MS2) neuroscience course at Texas Tech University Health Sciences Center School of Medicine issued faculty-written (FW) examinations during the course. In 2022, BDs replaced FW examinations with National Board of Medical Examiners (NBME) custom examinations. The rationale was that the customized NBME exams would better reflect the national neuroscience curriculum and enhance student preparedness for standardized exams.

Methods

FW examinations (2021) were created by the faculty in the neuroscience course and reviewed by BDs. In contrast, for the 2022 course, BDs used the MyNBME℠ Services Portal to select questions that best aligned with the material covered. Each custom question selected is assigned a “difficulty” score by the NBME, generating a predicted national average score. At the end of the course, undergraduate medical students in the School of Medicine at Texas Tech University Health Sciences Center completed an online Qualtrics questionnaire to evaluate the transition between assessment types.

Results

Participants reported greater satisfaction with their neuroscience education and block organization with NBME examinations. For example, there was a nearly twofold (1.83-fold) increase in the number of students who strongly agreed with the statement “Overall, I am satisfied with the quality of my neuroscience education in this block.” They were also less likely to report the workload as being “much too heavy.” Overall, students expressed a preference for the customized NBME exams over faculty-generated exams (88.1%).

Conclusions

From the student perspective, customized assessments built through the MyNBME℠ Services Portal were useful and preferable for evaluating student performance. From the block directors’ perspective, the transition saved time otherwise spent helping faculty write valid questions, defending and justifying FW questions, and generating exams. The only perceived drawback of the NBME exams is their cost.

Keywords: Survey, Curriculum, Satisfaction, Workload, Assessment

Introduction

The United States Medical Licensing Examination (USMLE) Step 1 is a pass/fail examination created for licensure and to secure a common standard of basic science knowledge among physicians, according to the USMLE and the Federation of State Medical Boards [1]. A medical student must pass the USMLE to become a licensed physician, but medical school programs differ in their methods for delivering basic science content and assessing preparedness for the examination. Preclinical curriculum grades have been shown to be one of the strongest predictors of Step 1 success, as these scores reflect mastery of concepts [2].

By 2016, nearly one-third of medical schools utilized the National Board of Medical Examiners Customized Assessment Services (NBME CAS) platform, which allows faculty to select multiple-choice questions for examinations [3]. According to a study of an organ systems-based preclinical curriculum, NBME CAS assessments have been shown to “most strongly correlate” with Step 1 score, while passing faculty-written (FW) examinations “correlates” with passing Step 1 [4]. The advantage of FW assessments is that students can receive individualized item analyses and rationales after the examination; the advantage of NBME CAS is the ability to compare performance with that of Step 1 takers nationally on the same set of questions.

The types of preclinical assessments that should be administered at medical schools, and when, have been debated for years with no consensus. Moreover, with the transition of Step 1 to a pass/fail format, preclinical curricula have undergone innovative changes [5–7], and the need to evaluate available preclinical assessment strategies is even more critical.

In previous years, block directors (BDs) of the Integrated Neurosciences course, a 2nd year (MS2) course in the School of Medicine at Texas Tech University Health Sciences Center, issued FW examinations during the course, with an NBME subject examination at the end of each unit. Faculty also provided rationales for students at the end of each FW examination. In 2022, BDs replaced the FW examinations with NBME CAS examinations; because the NBME does not provide rationales at the end of its examinations, none were given. In this study, we compared survey data from past (2021) and present (2022) students on satisfaction with neuroscience and behavioral science education, block organization, perceived workload, and exam-type preference.

Methods

Participants included 347 undergraduate medical students (n = 180 in 2021; n = 167 in 2022) at Texas Tech University Health Sciences Center School of Medicine (TTUHSCSOM). At the end of the block, students completed an online Qualtrics questionnaire, delivered via a personalized URL sent to each student by email; the responses were later used for analysis. Completion of the survey was required for participation in the block. Students reported on satisfaction and block organization (e.g., “Overall, I am satisfied with the quality of my neuroscience education in this block”) through three items on a 5-point scale (strongly disagree–strongly agree). Workload was assessed with one item on a 5-point scale (much too light–much too heavy). All procedures were approved by the Texas Tech University Health Sciences Center Institutional Review Board.
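As an illustration of how the questionnaire responses feed into the analyses below, the following sketch (not the authors’ code; the file name and column names are hypothetical) cross-tabulates the Likert responses by exam-type cohort to produce the r × 2 contingency tables used in the statistical tests.

```python
# Minimal sketch, assuming a hypothetical export of the Qualtrics survey with
# one row per student and columns "cohort" and "q_satisfaction".
import pandas as pd

LIKERT = ["Strongly disagree", "Disagree", "Neither agree nor disagree",
          "Agree", "Strongly agree"]

responses = pd.read_csv("neuro_block_survey.csv")  # hypothetical file name

# Cross-tabulate response category (rows) by exam type (columns):
# "Faculty written" (2021 cohort) versus "NBME custom" (2022 cohort).
table = pd.crosstab(responses["q_satisfaction"], responses["cohort"])

# Fix the Likert ordering and keep empty categories as zero counts.
table = table.reindex(LIKERT, fill_value=0)
print(table)
```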

Statistical Analysis

For analysis with adequate sample size, a chi-square test of homogeneity was conducted between the independent variable (faculty written versus NBME custom) and the dependent variable. Adequate sample size was established according to Cochran [8]. Post hoc analysis involved pairwise comparisons using multiple z-tests of two proportions with a Bonferroni correction. Statistical significance was therefore accepted at p < 0.01 due to the correction.
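A minimal sketch of this workflow follows (not the authors’ code); it uses the Table 1 counts for illustration, reproduces the expected-count check described in the Results (any expected cell count below five triggers the exact-test path), and runs Bonferroni-corrected pairwise z-tests of two proportions with statsmodels.

```python
# Sketch of the omnibus test plus Bonferroni-corrected post hoc z-tests,
# assuming SciPy and statsmodels are available. Counts are from Table 1.
import numpy as np
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

# Rows = response categories, columns = exam type (faculty written, NBME custom)
table = np.array([
    [15, 2],    # Strongly disagree
    [39, 9],    # Disagree
    [37, 26],   # Neither agree nor disagree
    [66, 91],   # Agree
    [23, 39],   # Strongly agree
])

chi2, p_omnibus, dof, expected = chi2_contingency(table)

# The authors report switching to Fisher's exact test whenever any expected
# cell count fell below five; this check reproduces that decision rule.
adequate = (expected >= 5).all()
print(f"omnibus chi-square p = {p_omnibus:.4f}; adequate sample size: {adequate}")

# Post hoc: for each response category, compare the two column proportions
# with a z-test of two proportions; Bonferroni correction gives alpha = 0.01.
col_totals = table.sum(axis=0)   # students per cohort
alpha = 0.05 / table.shape[0]    # 0.05 / 5 categories = 0.01
labels = ["Strongly disagree", "Disagree", "Neither", "Agree", "Strongly agree"]
for label, row in zip(labels, table):
    _, p_pair = proportions_ztest(count=row, nobs=col_totals)
    flag = "significant" if p_pair < alpha else "n.s."
    print(f"{label:>18}: p = {p_pair:.4f} ({flag})")
```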

For analysis with inadequate sample size, Fisher’s exact test (r × 2) was conducted between the independent variable (faculty written versus NBME custom) and the dependent variable. Post hoc analysis involved pairwise comparisons using multiple Fisher’s exact tests (2 × 2) with a Bonferroni correction. Statistical significance was therefore accepted at p < 0.01 due to the correction.
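Below is a comparable sketch for the small-sample path (again, not the authors’ code). SciPy’s fisher_exact handles only 2 × 2 tables, so the omnibus r × 2 exact test would need other software (for example, R’s fisher.test); the sketch shows only the Bonferroni-corrected pairwise 2 × 2 post hoc comparisons, using the Table 4 (workload) counts for illustration.

```python
# Post hoc pairwise Fisher's exact tests (2 x 2) with a Bonferroni correction.
import numpy as np
from scipy.stats import fisher_exact

# Rows = response categories, columns = exam type (faculty written, NBME custom)
table = np.array([
    [1, 2],     # Much too light
    [1, 5],     # Light
    [37, 82],   # Just right
    [105, 73],  # Heavy
    [36, 5],    # Much too heavy
])

col_totals = table.sum(axis=0)
alpha = 0.05 / table.shape[0]   # Bonferroni: 0.05 / 5 = 0.01

labels = ["Much too light", "Light", "Just right", "Heavy", "Much too heavy"]
for label, row in zip(labels, table):
    # 2 x 2 table: this category versus all other categories, by exam type
    sub = np.vstack([row, col_totals - row])
    _, p = fisher_exact(sub)
    flag = "significant" if p < alpha else "n.s."
    print(f"{label:>15}: p = {p:.4f} ({flag})")
```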

Results

Medical Student Satisfaction

Because of inadequate sample size for the chi-square test of homogeneity (one expected cell count was less than five, per Cochran [8]), Fisher’s exact test (r × 2) was conducted to compare students’ satisfaction with their neuroscience education between exam types. The two multinomial probability distributions were not equal in the population, p < 0.001. Observed frequencies and percentages are presented in Table 1. Post hoc analysis involved pairwise comparisons using multiple z-tests of two proportions with a Bonferroni correction; statistical significance was therefore accepted at p < 0.01. More students “Strongly disagreed” with the statement, “Overall, I am satisfied with the quality of my neuroscience education in this block,” when they had faculty-written exams as opposed to NBME custom exams (8.3% versus 1.2%, p < 0.01). The same trend was observed for those who “Disagreed” with the statement (21.7% vs 5.4%, p < 0.001). More students “Agreed” when they had NBME custom exams as opposed to faculty-written exams (54.5% versus 36.7%, p < 0.001). There was no significant difference for those who reported “Neither agree nor disagree” (p = 0.229) or “Strongly agree” (p = 0.01).

Table 1.

Cross-tabulation of exam type and agreement with the statement, “Overall, I am satisfied with the quality of my neuroscience education in this block”

Response                      Faculty written, n (%)   NBME custom, n (%)   p
Strongly disagree             15 (8.33)                2 (1.20)             < 0.01
Disagree                      39 (21.67)               9 (5.39)             < 0.001
Neither agree nor disagree    37 (20.56)               26 (15.57)           0.229
Agree                         66 (36.67)               91 (54.49)           < 0.001
Strongly agree                23 (12.78)               39 (23.35)           0.01

Data are displayed as frequency (percentage in parentheses)

In comparing students’ satisfaction with their behavioral science education, Fisher’s exact test (r × 2) was conducted due to inadequate sample size (four expected cell counts were less than five) for the chi-square test of homogeneity. The two multinomial probability distributions did not differ significantly (p = 0.064). Observed frequencies and percentages are presented in Table 2.

Table 2.

Cross-tabulation of exam type and agreement with the statement, “Overall, I am satisfied with the quality of my behavioral science education in this block”

Response                      Faculty written, n (%)   NBME custom, n (%)
Strongly disagree             3 (1.67)                 0 (0)
Disagree                      6 (3.33)                 3 (1.80)
Neither agree nor disagree    18 (10.00)               13 (7.78)
Agree                         79 (43.89)               96 (57.49)
Strongly agree                74 (41.11)               55 (32.93)

Data are displayed as frequency (percentage in parentheses)

In comparing students’ satisfaction with the block’s organization, Fisher’s exact test (r × 2) was conducted due to inadequate sample size (one expected cell count was less than five) for the chi-square test of homogeneity. There was a statistically significant difference in the multinomial probability distributions between the two groups (p < 0.001). Observed frequencies and percentages are presented in Table 3. Post hoc analysis involved pairwise comparisons using multiple Fisher’s exact tests (2 × 2) with a Bonferroni correction; statistical significance was accepted at p < 0.01. There was a statistically significant decrease in the proportion of students rating the block as “Very poorly organized” with the NBME custom examinations (p < 0.001). Students were just as likely to rate the block as “Poorly organized” (p = 0.026) or to respond “Neither agree nor disagree” (p = 0.333). Students were more likely to “Agree” that they were satisfied with the organization of the block (p < 0.001), but they were not more likely to “Strongly agree” (p = 0.468).

Table 3.

Cross-tabulation of exam type and response to, “How well organized was this block?”

Response                      Faculty written, n (%)   NBME custom, n (%)   p
Very poorly organized         11 (6.11)                0 (0)                < 0.001
Poorly organized              27 (15.00)               12 (7.19)            0.026
Neither agree nor disagree    85 (47.22)               70 (41.92)           0.333
Agree                         42 (23.33)               67 (40.12)           < 0.001
Strongly agree                15 (8.33)                18 (10.78)           0.468

Data are displayed as frequency (percentage in parentheses)

Workload Preference

Fisher’s exact test (r × 2) was conducted due to inadequate sample size (three expected cell counts were less than five) for the chi-square test of homogeneity. There was a statistically significant difference in the multinomial probability distributions between the two groups (p < 0.001). Observed frequencies and percentages are presented in Table 4. Post hoc analysis involved pairwise comparisons using multiple Fisher’s exact tests (2 × 2) with a Bonferroni correction; statistical significance was accepted at p < 0.01. The difference in the proportion of students rating the workload as “much too light” (p = 0.61) or “light” (p = 0.110) was not significant between the two groups. More students rated the workload as “just right” when taking the NBME custom exams as opposed to the faculty-written exams (p < 0.001). Students who took the NBME custom examinations were also less likely to rate the workload as “heavy” (p < 0.01) or “much too heavy” (p < 0.001).

Table 4.

Cross-tabulation of exam type and response to, “How would you rate the overall workload for this block?”

Response                      Faculty written, n (%)   NBME custom, n (%)   p
Much too light                1 (0.56)                 2 (1.20)             0.61
Light                         1 (0.56)                 5 (2.99)             0.110
Just right                    37 (20.67)               82 (49.10)           < 0.001
Heavy                         105 (58.10)              73 (43.71)           < 0.01
Much too heavy                36 (20.11)               5 (2.99)             < 0.001

Data are displayed as frequency (percentage in parentheses)

Student Preference for Exam Type

Overall, students strongly agreed (n = 116 or 69.5%) or agreed (n = 31 or 18.6%) with the statement, “I preferred having NBME custom exams as opposed to faculty-written exams for this block” (Fig. 1).

Fig. 1

Student agreement with the statement, “I preferred having NBME custom exams as opposed to faculty-written exams for this block.” Data are displayed as frequency

Discussion

Medical students evaluated with NBME exams were, in general, more satisfied with the quality of their neuroscience education and felt the block was better organized than students who had been evaluated using FW exams. It should be noted that students evaluated in the Neuroscience block with the NBME-generated exams had also experienced FW exams in other parts of their medical education and still expressed a preference for taking the NBME exams over the FW exams.

Since the Flexner report, variations of faculty-written exams have been the mainstay of student evaluation in American medical schools [9]. Over the years, standardized exams such as the FLEX, NBME, and Step series have been developed and used as summative evaluations of medical student preparedness at a national level [10, 11]. During that time, the effectiveness of FW exams in preparing students for the standardized national exams has been a persistent concern [12, 13].

Faculty-written exams have been remodeled over the years to better test knowledge: vignette-style questions evolved, first-order questions were eschewed, and second- and third-order questions were encouraged. Considerable effort was invested in helping medical school faculty improve the quality of their written questions [12]. Advantages of faculty-written questions included targeting the material taught by individual faculty members and the ability to provide rationales for summative exams, giving those exams a formative role. It could also be argued that beginning first-year medical students are not accustomed to Step-style questions and would be more comfortable with FW exams. However, numerous disadvantages of FW exams can be identified, including faculty inexperience with writing questions; inappropriate answer choices paired with inadequate rationales, leading to student mistrust in the quality of the exam; the potential for targeting concepts of faculty interest, referred to as “testing minutiae” by students; students’ inability to gauge their progress on a national scale; and the need for very busy faculty contributors to create new question banks (see [14] for review).

Advantages of the NBME CAS system include professionally written, proofed, and vetted questions; the ability to compare student progress with national curricula; and the ability of students to predict their potential for passing Step 1 [12, 15]. Indeed, for the four NBME unit exams administered, the class average consistently exceeded the predicted national average for the questions used.

Additionally, faculty appreciated how much faster the NBME examinations could be assembled, giving them more time to spend face-to-face with students. The primary disadvantage of NBME CAS exams would seem to be their cost, with the secondary disadvantage being the lack of rationales.

While overall satisfaction differed significantly between the NBME and FW groups for the neuroscience portion of the block, no statistically significant differences were found in the evaluated parameters for the behavioral science section of the course. While there could be several reasons for this observation, one possibility is that the neuroscience section has historically been perceived as extremely challenging relative to the behavioral section, with historically low satisfaction ratings. For instance, in the FW group, 54 students were dissatisfied with their neuroscience education compared to 11 in the NBME group (statistically significant), whereas nine students in the FW group were dissatisfied with their behavioral science education compared to three in the NBME group (not statistically significant). As such, there was more room for improvement in the neuroscience section. It should also be noted that other changes occurred in the course between the two groups; for instance, the course was compressed from 10 to 8 weeks for the NBME group, and yet students felt better prepared and that the block was better organized. An interesting follow-up study would be to compare how these groups of students perform on the Step 1 exam in the relevant subject areas. Overall, we conclude that the NBME CAS exams served as a “center of gravity,” giving students a better sense of orientation and confidence that they were learning what was needed to master the material and ultimately pass the Step 1 exam.

Acknowledgements

We thank Lauren Findley (Texas Tech University Health Sciences Center) for the compilation of data from the Qualtrics survey. The authors also wish to acknowledge the contribution of the Texas Tech University Health Sciences Center Clinical Research Institute for their assistance with this research.

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations

Conflict of Interest

The authors declare no competing interests.

Footnotes

Practice Points

• Transition from faculty-written to National Board of Medical Examiners custom examinations was associated with increased medical students’ satisfaction with the quality of their neuroscience education.

• Transition from faculty-written to National Board of Medical Examiners custom examinations was associated with increased medical students’ satisfaction with the block’s organization.

• Transition from faculty-written to National Board of Medical Examiners custom examinations was associated with decreased medical students’ perception of a “heavy” or “much too heavy” workload.

• Students expressed a preference for the customized NBME exams.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. USMLE Step 1: Content description and general information. A joint program of the Federation of State Medical Boards of the United States, Inc., and the National Board of Medical Examiners. United States Medical Licensing Examination; 2016. pp. 1–7. Accessed February 14, 2023. Available from: https://www.usmle.org/step-1.
2. Guilbault RWR, Lee SW, Lian B, Choi J. Predictors of USMLE Step 1 outcomes: charting successful study habits. Med Sci Educ. 2020;30:103–106. doi: 10.1007/s40670-019-00907-x.
3. Wright WS, Baston K. A national survey: use of the National Board of Medical Examiners® basic science subject exams and Customized Assessment Services exams in US medical schools. Adv Med Educ Pract. 2018;9:599–604. doi: 10.2147/AMEP.S169076.
4. Keltner C, Haedinger L, Carney PA, Bonura EM. Preclinical assessment performance as a predictor of USMLE Step 1 scores or passing status. Med Sci Educ. 2021;31:1453–1462. doi: 10.1007/s40670-021-01334-7.
5. Kruczek C, Kaur G. Curriculum restructuring at Texas Tech Health Science Center Medical School: evaluation and lessons learned. FASEB J. 2022;36.
6. McDaniel CM, Forlenza EM, Kessler MW. Effect of shortened preclinical curriculum on medical student musculoskeletal knowledge and confidence: an institutional survey. J Surg Educ. 2020;77:1414–1421. doi: 10.1016/j.jsurg.2020.04.011.
7. Shin M, Prasad A, Sabo G, et al. Anatomy education in US medical schools: before, during, and beyond COVID-19. BMC Med Educ. 2022;22:103. doi: 10.1186/s12909-022-03177-1.
8. Cochran WG. Some methods for strengthening the common χ2 tests. Biometrics. 1954;10:417–451. doi: 10.2307/3001616.
9. Flexner A. Medical education in the United States and Canada. Washington, DC: Science and Health Publications, Inc; 1910.
10. Gullo CA, McCarthy MJ, Shapiro JI, Miller BL. Predicting medical student success on licensure exams. Med Sci Educ. 2015;25:447–453. doi: 10.1007/s40670-015-0179-6.
11. Haist SA, Butler AP, Paniagua MA. Testing and evaluation: the present and future of the assessment of medical professionals. Adv Physiol Educ. 2017;41:149–153. doi: 10.1152/advan.00001.2017.
12. Case S, Swanson D. Constructing written test questions for the basic and clinical sciences. National Board of Medical Examiners; 2002.
13. Goldstein SD, Lindeman B, Colbert-Getz J, Arbella T, Dudas R, Lidor A, Sacks B. Faculty and resident evaluations of medical students on a surgery clerkship correlate poorly with standardized exam scores. Am J Surg. 2014;207:231–235. doi: 10.1016/j.amjsurg.2013.10.008.
14. Karthikeyan S, O’Connor E, Hu W. Barriers and facilitators to writing quality items for medical school assessments – a scoping review. BMC Med Educ. 2019;19:123. doi: 10.1186/s12909-019-1544-8.
15. Jozefowicz RF, Koeppen BM, Case S, Galbraith R, Swanson D, Glew RH. The quality of in-house medical school examinations. Acad Med. 2002;77:156–161. doi: 10.1097/00001888-200202000-00016.


