AEM Education and Training. 2019 Sep 19;3(4):317–322. doi: 10.1002/aet2.10381

Does the Emergency Medicine In‐training Examination Accurately Reflect Residents’ Clinical Experiences?

Jason J Bischof 1, Geremiha Emerson 1, Jennifer Mitzman 1, Sorabh Khandelwal 1, David P Way 1, Lauren T Southerland 1
Editor: Margaret Wolff
PMCID: PMC6795359  PMID: 31637348

Abstract

Objective

The American Board of Emergency Medicine Model of the Clinical Practice of Emergency Medicine (ABEM Model) serves as a guide for resident education and as the basis for the resident In‐training Examination (ITE) and the Emergency Medicine Board Qualification Examinations. The purpose of this study was to determine how closely resident–patient encounters in our emergency departments (EDs) matched the ABEM Model as presented in the ITE content specifications.

Methods

This single‐site study of an academic residency program analyzed all documented resident–patient encounters in the ED during a 2.5‐year period recorded in the electronic medical record. The chief complaints from these encounters were matched to the 20 categories of the ABEM Model. Chi‐square goodness‐of‐fit tests were performed to compare the proportions of categorized encounters and proportions of patient acuity levels to the proportions of categories as outlined in the content blueprint of the ITE.

Results

After the exclusion of encounters with missing data and those not involving EM residents, 125,405 encounters were analyzed. We found a significant difference between the clinical experience of EM residents and the ABEM Model as reflected in the ITE for both case categories (p < 0.01) and patient acuity (p < 0.01). The following categories were the most overrepresented in clinical care: signs, symptoms, and presentations; psychobehavioral disorders; and abdominal and gastrointestinal disorders. The most underrepresented were procedures and skills, systemic infectious disorders, and thoracic–respiratory disorders.

Conclusion

The clinical experience of EM residents differs significantly from the ITE Content Blueprint, which reflects the ABEM Model. This type of inquiry may help to provide custom education reports to residents about their clinical encounters to help identify clinical knowledge gaps that may require supplemental nonclinical training.


The American Board of Emergency Medicine (ABEM) provides an annual residency In‐training Examination (ITE) to gauge a resident's educational progress and medical knowledge.1 Both the ITE and the Board Qualification Examination are constructed from the comprehensive list of emergency medicine (EM) core content contained in ABEM's Model of the Clinical Practice of Emergency Medicine (ABEM Model) and its associated table of examination specifications.2 The ABEM Model is a consensus document, updated periodically to reflect the current practice of EM, that defines the standard knowledge base against which resident medical knowledge is assessed.3

Clinical patient encounters are the key component of residency training and in theory should be reflected in the ABEM Model. However, the case mix and content of patient care encounters are difficult to characterize.4, 5 To our knowledge, although the ABEM Model framework was derived from clinical practice experience by expert panels, it has not been formally compared with the clinical experience of EM residents. Consequently, it is unknown whether current EM resident bedside clinical experiences reflect the broad‐based educational requirements outlined by the ABEM Model. Because the ABEM Model is also the basis for the questions on the ITE, the clinical environment is critical to the success of trainees in EM. This was a preliminary, proof‐of‐concept effort to demonstrate the value of profiling residents' clinical experiences in an academic EM residency program and comparing them to the current ABEM Model, to determine whether there are gaps in clinical training that need to be addressed through supplemental, alternative education methods.

Methods

Study Setting

The residency program is situated at an academic, tertiary care emergency department (ED) with 86 beds and 80,000 patient visits per year and a secondary community ED with 26 beds and 52,000 patient visits per year. Only 2% of visits to these EDs are pediatric (age < 18 years), because our residents get their pediatric experience at a freestanding pediatric hospital; those pediatric visits were not included in this study. At the tertiary care hospital, 90% of patients are insured (56% Medicare and/or Medicaid) and 61% have primary care physicians. At the community ED, 82% are insured (70% Medicare and/or Medicaid) and 39% have primary care physicians.

The residency is a 3‐year Accreditation Council for Graduate Medical Education–accredited program with 16 EM residents and two EM/internal medicine (EM/IM) residents per year. Residents work 8‐ to 10‐hour shifts without protected time for sign‐out or note writing. All patients with psychiatric complaints receive medical screenings by a resident, attending, or advanced practice provider. Both EDs have low‐acuity or fast‐track sections that are open 10 to 18 hours a day, Monday through Friday, and are primarily run by advanced practice providers.

Data Acquisition

This study used deidentified patient data from the electronic health record system (EPIC) and was approved by the local institutional review board. We queried all patient encounters that involved an EM resident at the study sites over the 2.5‐year period between July 1, 2015, and December 31, 2017. Data included the Emergency Severity Index (ESI) acuity level, the chief complaint (the free‐text reason for the visit documented by the triage ED nurse), and a unique resident ID code. The initial resident assigned to the patient was considered the resident of record for each encounter. Residents with fewer than 25 encounters were removed from the data set due to concern for incomplete or miscoded data. Records with no recorded chief complaint were also removed because they could not be mapped to the ABEM Model. ESI and chief complaint information were then mapped to the ABEM ITE Content Blueprint.1 ED diagnosis was obtained but was unavailable for more than half of the encounters because the electronic health record system does not require an admission diagnosis.
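For illustration only, the exclusion logic above can be expressed as a short script. This is a hypothetical sketch, not the actual extraction pipeline (data management was performed in SPSS, as described below); the file name and column names are invented stand-ins for the EPIC export fields.

```python
import pandas as pd

# Hypothetical EPIC export; file and column names are illustrative assumptions.
encounters = pd.read_csv("ed_encounters_2015_2017.csv")

# Keep only encounters with an EM or EM/IM resident of record.
encounters = encounters[encounters["resident_role"].isin(["EM", "EM/IM"])]

# Drop encounters with no documented chief complaint (they cannot be
# mapped to an ABEM Model category).
encounters = encounters.dropna(subset=["chief_complaint"])

# Remove residents with fewer than 25 encounters (possible miscoding).
per_resident = encounters["resident_id"].value_counts()
keep = per_resident[per_resident >= 25].index
encounters = encounters[encounters["resident_id"].isin(keep)]

print(f"{len(encounters)} analyzable encounters from "
      f"{encounters['resident_id'].nunique()} residents")
```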

Data Analysis

All chief complaints were tabulated and then independently categorized according to the ABEM Model by two board‐certified EM physicians. Only one category was assigned to each chief complaint. Disagreements between the two reviewers were adjudicated by a third independent board‐certified EM physician. For complaints on which all three disagreed, categorization was discussed until consensus was reached. All reviewers are also involved in resident education. The ABEM Model categories of chief complaints and the acuity levels residents experienced in the ED were then tallied, and each category's share of total encounters was calculated to create the observed proportions. These observed proportions were compared to the expected proportions derived from the weights provided by the ITE Content Blueprint.
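To make the tallying step concrete, the sketch below shows how categorized chief complaints can be converted into observed and expected counts. It is illustrative only: the complaint-to-category dictionary shows three invented examples of the 730 adjudicated mappings, and the blueprint weights shown are the published ITE percentages for those three categories.

```python
import pandas as pd

# Three invented examples of the 730 adjudicated complaint-to-category
# mappings (one ABEM Model category per unique chief complaint).
complaint_to_category = {
    "abdominal pain": "Abdominal and gastrointestinal disorders",
    "suicidal ideation": "Psychobehavioral disorders",
    "fever": "Signs, symptoms, and presentations",
}

encounters = pd.DataFrame({
    "chief_complaint": ["Abdominal Pain", "Fever",
                        "Suicidal Ideation", "abdominal pain"],
})
encounters["category"] = (
    encounters["chief_complaint"].str.lower().map(complaint_to_category)
)

# Observed counts per category, and expected counts derived from the
# ITE Content Blueprint weights (9%, 8%, and 4% for these categories).
observed = encounters["category"].value_counts()
n_total = int(observed.sum())
blueprint_weights = {
    "Signs, symptoms, and presentations": 0.09,
    "Abdominal and gastrointestinal disorders": 0.08,
    "Psychobehavioral disorders": 0.04,
}
expected = {cat: w * n_total for cat, w in blueprint_weights.items()}
print(observed.to_dict())
print(expected)
```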

Chi‐square goodness‐of‐fit tests6, 7 (α = 0.05) were performed to compare the observed categorized chief complaints to the weights provided by the ITE Content Blueprint. Patient acuity scores were also compared. Effect sizes for the chi‐square goodness‐of‐fit tests were calculated using a formula proposed by Lomax and Hahs‐Vaughn, ES = χ²/[N(J − 1)], where N is the total sample size and J is the number of categories.6 These can be interpreted as 0.1 = small effect, 0.3 = medium effect, and 0.5 = large effect. The chi‐square statistics for the content categories were calculated using Motulsky's GraphPad goodness‐of‐fit calculator;7 those for acuity levels were calculated using VassarStats.8 Data management was performed through IBM SPSS Statistics for Windows, Version 25.0. Interpretation of significant results (post hoc analysis) involved inspection of the standardized residuals, (O − E)/√E; residuals exceeding an absolute value of 1.96 were considered contributors to the significant differences between observed and expected values.6 To assess the effect of removing encounters without chief complaints, we repeated all analyses with these encounters included in the "other" category.
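As a concrete illustration, the sketch below reproduces the acuity-level test in Python using the observed counts and blueprint weights reported later in Table 3. The published analysis used the GraphPad and VassarStats calculators, so this reimplementation is for exposition only; it returns χ² ≈ 109,182.8, ES ≈ 0.2902, and the standardized residuals shown in Table 3.

```python
import numpy as np
from scipy.stats import chisquare

# Observed encounter counts by acuity (critical, emergent, lower, none)
# and the corresponding ITE Content Blueprint weights.
observed = np.array([2_658, 53_461, 68_340, 946])
weights = np.array([0.30, 0.40, 0.21, 0.09])
n = observed.sum()
expected = weights * n

chi2, p = chisquare(observed, f_exp=expected)

# Effect size per Lomax and Hahs-Vaughn: ES = chi2 / (N * (J - 1)).
effect_size = chi2 / (n * (len(observed) - 1))

# Standardized residuals (O - E) / sqrt(E); |residual| > 1.96 flags a
# category as contributing to the overall significant difference.
residuals = (observed - expected) / np.sqrt(expected)

print(f"chi2 = {chi2:,.2f}, p = {p:.3g}, ES = {effect_size:.4f}")
print("standardized residuals:", np.round(residuals, 2))
```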

Results

The 30‐month study period included 160,208 ED encounters. After removing encounters associated only with faculty, fellows, or non‐EM residents; encounters of residents with fewer than 25 total encounters; and encounters without a chief complaint, a total of 125,405 encounters treated by 86 EM or EM/IM residents remained. The initial two reviewers agreed on 66.8% of chief complaints (n = 488); the remaining 242 chief complaints were reviewed by a third arbitrator. There was three‐way disagreement on 6.7% (n = 49) of the chief complaints, which were resolved by group consensus.

Tabulation revealed 730 unique chief complaints, which were categorized into the 20 categories of the ABEM Model. The chi‐square goodness‐of‐fit comparison of the observed clinical experience of EM residents with the expected ABEM Model categories covered by the ITE is reported in Table 1. The test revealed a statistically significant difference (p ≤ 0.001), suggesting that the proportions of chief complaint categories differ from the proportional representation of those categories on the ITE. The effect size was very small; however, post hoc analysis suggested that all categories except head, ear, eye, nose, and throat disorders and cardiovascular disorders contributed to this significant difference.

Table 1.

Number and Percentage of Resident–Patient Encounters for EM or EM/IM Residents

| Category | Observed Frequency | Observed % | Expected (ABEM) Frequency | Expected % | Standardized Residual |
|---|---|---|---|---|---|
| 1. Signs, symptoms, and presentations (+) | 27,683 | 22.1 | 11,286 | 9.0 | 154.3 |
| 14. Psychobehavioral disorders (+) | 11,078 | 8.8 | 5,016 | 4.0 | 85.6 |
| 2. Abdominal and gastrointestinal disorders (+) | 18,478 | 14.7 | 10,032 | 8.0 | 84.3 |
| 4. Cutaneous disorders (+) | 3,117 | 2.5 | 1,254 | 1.0 | 52.6 |
| 12. Nervous system disorders (+) | 10,302 | 8.2 | 6,270 | 5.0 | 50.9 |
| 20. Other components (+) | 6,066 | 4.8 | 3,762 | 3.0 | 37.6 |
| 11. Musculoskeletal disorders (nontraumatic) (+) | 5,746 | 4.6 | 3,762 | 3.0 | 32.3 |
| 3. Cardiovascular disorders (o) | 12,113 | 9.7 | 12,541 | 10.0 | –3.8 |
| 7. Head, ear, eye, nose, and throat disorders (o) | 5,938 | 4.7 | 6,270 | 5.0 | –4.2 |
| 18. Traumatic disorders (–) | 10,937 | 8.7 | 12,541 | 10.0 | –14.3 |
| 5. Endocrine, metabolic, and nutritional disorders (–) | 1,011 | 0.8 | 2,508 | 2.0 | –29.9 |
| 8. Hematologic disorders (–) | 974 | 0.8 | 2,508 | 2.0 | –30.6 |
| 15. Renal and urogenital disorders (–) | 1,739 | 1.4 | 3,762 | 3.0 | –33.0 |
| 13. Obstetrics and gynecology (–) | 2,639 | 2.1 | 5,016 | 4.0 | –33.6 |
| 9. Immune system disorders (–) | 718 | 0.6 | 2,508 | 2.0 | –35.7 |
| 6. Environmental disorders (–) | 352 | 0.3 | 3,762 | 3.0 | –55.6 |
| 17. Toxicologic disorders (–) | 1,798 | 1.4 | 6,270 | 5.0 | –56.5 |
| 16. Thoracic–respiratory disorders (–) | 2,937 | 2.3 | 10,032 | 8.0 | –70.8 |
| 10. Systemic infectious disorders (–) | 268 | 0.2 | 6,270 | 5.0 | –75.8 |
| 19. Procedures and skills (–) | 1,521 | 1.2 | 10,032 | 8.0 | –85.0 |
| Total | 125,405 | 100.0 | 125,405 | 100.0 | |

Encounters are grouped by category of chief complaint and compared to the expected frequencies, which are derived from the 2016 EM Model of Practice content outline for the In‐training Examination. These data were used in a single chi‐square goodness‐of‐fit test across all 20 categories. Results are presented in rank order by standardized residual, interpreted against the absolute value of 1.96. χ² = 75,870.444; df = 19; p < 0.0001; ES = 0.0326.

(+) = significantly more encounters than expected for this category under the EM Model of Practice weights.

(–) = significantly fewer encounters than expected for this category under the EM Model of Practice weights.

(o) = about the same number of encounters as expected for this category under the EM Model of Practice weights.
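As a worked example of the post hoc interpretation (an illustration using the values reported in Table 1), the standardized residual for the most overrepresented category, signs, symptoms, and presentations, is

$$ R = \frac{O - E}{\sqrt{E}} = \frac{27{,}683 - 11{,}286}{\sqrt{11{,}286}} \approx \frac{16{,}397}{106.2} \approx 154.3, $$

which far exceeds the 1.96 threshold and therefore flags this category as a major contributor to the overall difference.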

Categories that were overrepresented are labeled in Table 1 with a plus sign. Those with large positive standardized residuals (>50) included signs, symptoms, and presentations; psychobehavioral disorders; abdominal and gastrointestinal disorders; cutaneous disorders; and nervous system disorders. Categories that were underrepresented (negative standardized residuals) are labeled with a minus sign. The most underrepresented categories (residuals ≤ –50) were procedures and skills; systemic infectious disorders; thoracic–respiratory disorders; toxicologic disorders; and environmental disorders. Additionally, 4.8% of clinical encounters were categorized as "other" components, suggesting significant clinical exposure to topics not central to the ABEM Model, such as the need for preoperative examination, social work consultation, or body‐fluid exposure testing (Table 2).

Table 2.

List of Chief Complaints Within the “Other” Category

Abnormal CT; Abnormal images; Abnormal X‐ray; Admission notification; Baclofen pump refill; Body fluid exposure; Cancer; Chemotherapy; Chronic care needs; Chronic pain; Consult; Critical laboratory values; Drain; Drug screen; ED follow‐up; Exposure to STI; Feeding tube problem; Leaking fluid; Letter for school/work; Medication follow‐up; Medication problem; Medication reaction; Medication refill; MRI results; ODRC doctor sick call; Other; Physician contact; Postoperative problem; Preoperative examination; Pump disconnect; Referral; Routine prenatal visit; Second opinion; Social work consult; Surgical follow‐up; Transportation issues; Well child.

The observed clinical acuity of patient encounters was also significantly different from the ITE Content Blueprint specifications (p ≤ 0.001), with a moderate effect size of 0.29 (Table 3). Analysis of the standardized residuals suggests that all levels of acuity contributed to this significant difference: the observed number of encounters was far lower than expected for the critical (ESI 1) and nonacute (ESI 5) levels and much greater than expected for the lower acuity levels (ESI 3 and 4). Results were similar across analyses with and without the data involving missing chief complaints.

Table 3.

Number and Percentage of Resident–Patient Encounters Grouped by Level of Patient Acuity for EM/EM‐IM Residents With Chi‐square Goodness‐of‐fit Tests Comparing Observed Frequencies to Expected Frequencies Based on the EM In‐training Examination Weights

| Category | Observed Frequency | Observed % | Expected (ABEM) Frequency | Expected % | Standardized Residual |
|---|---|---|---|---|---|
| Critical | 2,658 | 2.0 | 37,621.5 | 30.0 | –180.26 |
| Emergent | 53,461 | 43.0 | 50,162.0 | 40.0 | +14.73 |
| Lower | 68,340 | 54.0 | 26,335.1 | 21.0 | +258.84 |
| None | 946 | 1.0 | 11,286.5 | 9.0 | –97.33 |
| Total | 125,405 | 100.0 | 125,405 | 100.0 | |

χ² = 109,182.76; df = 3; p < 0.0001; ES = 0.2902 (effect size calculated per Lomax and Hahs‐Vaughn6).

Discussion

The clinical experience of our residents spans the breadth of the ABEM Model categories. The ABEM Model provides the structure and guidance for current EM training, so it is reassuring that all categories were represented, albeit some in lower volumes. Several prior studies have identified challenges with tracking the EM resident clinical experience and assuring adequate clinical exposure.4, 5 New approaches are required to assess the training environment and the current training standards. Chief complaint categorization may be helpful in this regard, because it reflects the diversity of clinical encounters in the ED. Further comparisons of chief complaint, ED visit clinical impression, and eventual diagnosis may be helpful to determine the optimal way to profile the resident clinical experience.

This study capitalizes on the availability of clinical information associated with the increased reliance on electronic medical records (EMRs). As demonstrated by Douglass et al.,5 this type of approach can inform current and future curricular interventions based on the diversity that exists in the current training environment. However, retrospective data may oversimplify complex clinical situations and not fully reflect the resident clinical experience. For example, a patient with an initial chief complaint of "chest pain" and a final diagnosis of "chest pain" could be a 25‐year‐old with costochondritis requiring minimal evaluation or an 85‐year‐old with multiple medical comorbidities requiring significant testing and treatment to exclude life‐threatening causes of chest pain.

Despite these limitations, these data demonstrate the value of this type of program evaluation and reveal several key points. First, the clinical encounters experienced by EM residents at this academic center differ significantly from the ABEM Model in several key areas (i.e., procedures and skills, environmental disorders, toxicology). Given the rarity of certain disease processes, this is not entirely unexpected. However, these underrepresented content areas may need heavier emphasis through nonclinical education methods such as simulation, conference presentations, or other creative methods. Conversely, areas that are overrepresented clinically can inform changes in ED flow to reduce redundant clinical experiences. For example, 8.8% of ED encounters are for psychobehavioral disorders. If by the third year of training these encounters are deemed unlikely to add significantly to residents' education, a different clinical pathway for these residents might be considered.

Analysis of resident activity can also inform general ED operations. While we expected the signs, symptoms, and presentations category to be overrepresented as we were categorizing by chief complaint, these data suggest that 4.8% of our ED encounters are for “other” problems not categorized in the model. Table 2 demonstrates the breadth of these problems, including the need to expedite the work‐up and care of abnormal outpatient testing results, the need for MRI, and care coordination such as medication refills. The ED is the front porch of the entire health system, and care coordination is a necessary part of our practice. Future ABEM models may want to emphasize knowledge of patient flow through health systems, care coordination, and transitions of care.

The observed acuity of EM resident encounters differed from the ITE distribution, most notably in the number of critical patients. This is expected, given the ITE's appropriate overemphasis on high‐acuity patients. A secondary analysis of acuity that included the encounters without a chief complaint (8.7%) did not change the comparison, despite our hypothesis that higher‐acuity patients (trauma, stroke, intubated) bypass the traditional triage method. This lends some validity to using chief complaints to profile resident clinical education, although missing data may still have affected the results.

Overall, we found that our resident case mix and acuity vary significantly from the ABEM Model, suggesting a need to adjust our clinical workflow or provide alternative education methods. This study uncovered several potential local problems: a high volume of patients in the ED for mental health reasons and low volumes of toxicology and critical care encounters. This does not necessarily mean that the ABEM Model fails to reflect clinical practice, nor do we suggest that the ITE should be a perfect reflection of clinical care. Residents should be prepared to handle high‐acuity, low‐frequency illnesses and injuries, and the ITE acuity levels reflect this. However, this analysis could inform updates to the ABEM Model, such as ensuring that residents understand transitions of care and uses of the ED beyond acute care. Further research into the breadth of clinical practice experienced by residents at other academic institutions could help other residency programs identify surpluses and deficits in their training. Combining resident training information with a fresh national practice analysis would provide comprehensive data to inform future updates to the ABEM Model.

Limitations

Our data were limited to the residents' adult patient experience, since our residents obtain their pediatric ED experience at a separate children's hospital. Furthermore, this initial database was not broad enough to characterize the complete experience of each resident in our program. For example, 2017 PGY‐1 residents would have only 6 months of data in the system, while 2015 PGY‐1 residents would have 2.5 years. Therefore, these data cannot be used to characterize patient encounters per resident by PGY level per year. We are also unable to account for the effects of different resident shifts and of advanced practice providers staffing the low‐acuity area Monday through Friday. Additional limitations secondary to the retrospective nature of the study include missing or miscoded data omitted from the data capture. This, however, would result in an underestimation of the total number of encounters and is likely random across all categories and acuity levels, given the consistent method of data entry into the EMR that was queried.

Finally, our categorization is limited by the use of the triage nurse's chief complaint designation. The study team attempted to use natural language processing to assess the medical decision‐making section of the notes; however, the notes often list a broad differential diagnosis, which complicated identification of the actual diagnosis. Manual chart review would be needed, which was not feasible with 125,000 encounters. A future study option would be to perform manual chart review of a random subset of encounters, blinded to the chief complaint and its categorization, to assess how often the chief complaint categorization matches the actual diagnosis. That could inform confidence intervals for future analyses. We suspect that some conditions, such as psychiatric illnesses, are adequately classified by chief complaint, suggesting that our residents see far more of these patients than their training requires. Other chief complaints are unlikely to map as well; for example, a chief complaint of nausea could represent a gastrointestinal illness, an acute coronary syndrome, or a medication overdose.
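One way the proposed audit could be operationalized, sketched below with invented numbers: draw a random sample of encounters, have a blinded reviewer assign an ABEM Model category from the chart, and place a Wilson score interval around the observed agreement rate. The sample size (400) and agreement count (370) are hypothetical.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n
                                   + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical audit: 370 of 400 randomly sampled charts agree with the
# triage chief-complaint categorization.
low, high = wilson_ci(370, 400)
print(f"agreement 92.5%, 95% CI {low:.1%} to {high:.1%}")
```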

Conclusion

In this single‐residency study, the clinical experiences of emergency medicine residents differed significantly from the 2016 American Board of Emergency Medicine Model. Electronic medical record data on resident clinical experience can be analyzed to identify areas of clinical knowledge requiring additional intervention by residency programs to ensure adequate training of emergency medicine residents.

AEM Education and Training 2019;3:317–322.

Presented at the annual meeting of the Society for Academic Emergency Medicine, Las Vegas, NV, May 2019.

The authors have no relevant financial information or potential conflicts to disclose.

Author contributions: LTS and JM conceived the study; LTS, GE, JM, SK, and DPW designed the study; LTS, JM, and DPW defined the data fields needed from the EMR and acquired the data through the institution's information warehouse; JJB, DPW, GE, JM, and LTS analyzed and interpreted the data; JJB, GE, LTS, and DPW drafted the original manuscript; and all authors contributed to critical revisions of the manuscript.

References

1. American Board of Emergency Medicine. In‐Training Examination. East Lansing, MI. Available at: https://www.abem.org/public/for-program-directors/in-training-examination. Accessed January 22, 2019.

2. Counselman FL, Babu K, Edens MA, et al. The 2016 Model of the Clinical Practice of Emergency Medicine. J Emerg Med 2017;52:846–9.

3. Hockberger RS, La Duca A, Orr NA, Sklar DP. Creating the model of a clinical practice: the case of emergency medicine. Acad Emerg Med 2003;10:161–8.

4. Langdorf MI, Strange G, Macneil P. Computerized tracking of emergency medicine resident clinical experience. Ann Emerg Med 1990;19:764–73.

5. Douglass A, Yip K, Lumanauw D, Fleischman RJ, Jordan J, Tanen DA. Resident clinical experience in the emergency department: patient encounters by postgraduate year. AEM Educ Train 2019;3:243–50.

6. Lomax R, Hahs‐Vaughn DL. An Introduction to Statistical Concepts. 3rd ed. New York, NY: Routledge, Taylor‐Francis Group, 2012.

7. Motulsky H. GraphPad Calculator for Chi‐Square Goodness of Fit Test. Available at: https://www.graphpad.com/quickcalcs/chisquared1.cfm. Accessed April 15, 2019.

8. Lowry R. VassarStats: Website for Statistical Computation. Chi‐Square Goodness of Fit Test. Avon, CT. Available at: http://vassarstats.net/. Accessed April 15, 2019.
