AEM Education and Training. 2022 Mar 31;6(2):e10736. doi: 10.1002/aet2.10736

Implementation of a pilot novel objective peer comparison evaluation system in an emergency medicine residency program

Kraftin E Schreyer 1, Megan E Healy 1, Zachary Repanshek 1, Wayne A Satz 1, Jacob W Ufberg 1
PMCID: PMC9005166  PMID: 35434444

Abstract

Objectives

Emergency medicine (EM) residents are currently evaluated via The Milestones, which have been shown to be imperfect and subjective. There is also a need for residents to achieve competency in patient safety and quality improvement processes, which can be accomplished through provision of peer comparison metrics. This pilot study aimed to evaluate the implementation of an objective peer comparison system for metrics that quantified aspects of quality and safety, efficiency and throughput, and utilization.

Methods

This pilot study took place at an academic, tertiary care center with a 3‐year residency and 14 residents per postgraduate year (PGY) class. Metrics were compared within each PGY class using Wilcoxon signed‐rank and rank‐order analyses.

Results

Significant changes were seen in the majority of the metrics for all PGY classes. PGY3s accounted for the significant change in EKG and X‐ray reads, while PGY1s and PGY2s accounted for the significant change in disposition to final note share. Physician evaluation to disposition decision was the only metric that did not reach significance in any class.

Conclusions

These preliminary data suggest that providing objective metrics is possible. Peer comparison metrics could provide an effective objective addition to the milestone evaluation system currently in use.

INTRODUCTION

Emergency medicine (EM) residents are currently evaluated via The Milestones, developed through the Accreditation Council for Graduate Medical Education (ACGME). 1 , 2  While the system was designed to measure specific outcomes of specialty‐specific knowledge, skills, attitudes, and behaviors, it has been shown to be imperfect and subjective. 3 Examples include a gender gap in milestone attainment as assessed via direct observation, poor agreement between milestone‐based end‐of‐shift evaluation scores and clinical competency committee ratings, and EM residents rating themselves higher in all milestone subcompetencies than faculty rated them. 4 , 5 , 6 Inconsistent results across assessment modalities highlight the need for varied and broad objective data to accurately track resident milestone attainment and supplement more subjective assessments. 7

At the same time, the ACGME has increasingly focused on the need for residents to achieve competency in patient safety and quality improvement processes. With regard to quality metrics (VI.A.1.b), “access to data is essential to prioritizing activities for care improvement and evaluating success of improvement efforts.” 8 Individualized clinical data that provide peer comparison in a deidentified manner are one way to address this.

Research on the use of clinical metrics in assessment of resident progress is nascent. A prior qualitative assessment found that data from the electronic medical record (EMR) have the potential to support and guide resident assessment and feedback. 9 Another randomized pilot study of EM residents demonstrated that feedback with performance metric scorecards led to higher resident satisfaction with the feedback process. 10 In another study, personalized peer‐comparison feedback provided to EM residents led to an increased number of ultrasound scans, suggesting it could be a useful motivational tool for other metrics. 11 Schumacher et al. have previously highlighted the need for resident‐sensitive quality metrics, but the definitions and measurements of those metrics are not yet well established. 12

In this pilot study, we evaluate the implementation of an objective peer comparison system for metrics that quantified aspects of quality and safety, efficiency and throughput, and utilization. We hypothesized that peer comparison metrics could be successfully incorporated into the existing evaluation system.

METHODS

This pilot study took place at an academic, tertiary care center with a 3‐year residency and 14 residents per postgraduate year (PGY) class. In total, there were 17 male residents and 25 female residents, all of whom were doctors of medicine who graduated from U.S. medical schools. Peer comparison metrics, which were chosen by residency and operational leadership, were generated from the EMR (Table 1). Metrics were calculated for academic year 2020–2021.

TABLE 1.

Definitions of resident metrics by category

Quality and safety
  EKG read (%): number of EKG interpretations / total number of EKGs ordered
  X‐ray read (%): number of X‐ray interpretations / total number of X‐rays ordered on discharged patients
  Disposition to final note share (min): time from when the patient is dispositioned until the final time a note is shared by the resident
Throughput and efficiency
  Patients per EM block (#): total number of patients seen / number of EM blocks during the study time frame
  Room to physician evaluation (min): time from when the patient was placed in a room until the physician evaluation occurred
  Physician evaluation to disposition decision (min): time from physician evaluation to disposition entered in the EMR
Utilization
  Procedure notes (#): number of procedures documented per patient
  Diagnostic images (#): number of diagnostic images ordered per patient

Abbreviations: #, number; %, percent; EKG, electrocardiogram; EM, emergency medicine; EMR, electronic medical record.

Quality and safety metrics included the interpretation of electrocardiograms (EKGs) and X‐rays. X‐rays are not read in real time by radiology at the study institution, so EM providers are encouraged to interpret the studies. The time from patient disposition to note completion by the resident was also identified as a quality and safety metric, since a completed note is an important source of information for the team assuming care of the patient. Throughput metrics included traditional metrics such as room to physician evaluation and physician evaluation to disposition decision. Door to provider time, which is more easily impacted by arrival and volume curves, patient acuity, and inpatient capacity, was not used. The average number of patients seen by each resident per EM block was also included as a marker of efficiency. Utilization metrics included the number of procedure notes written, which indicates the number of procedures performed by the resident, and the number of diagnostic images ordered by the resident. Larger numbers of procedure notes and diagnostic images were used as markers of overutilization.
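As an illustration of how such metrics can be derived from encounter-level data, the sketch below computes two of the Table 1 metrics for one resident. The record shape and field names (`ekgs_ordered`, `final_note_time`, etc.) are invented for this example and are not the actual EMR schema.

```python
# Hypothetical per-encounter records for a single resident; field names
# are assumptions for illustration, not the study's real EMR schema.
from datetime import datetime

encounters = [
    {"ekgs_ordered": 3, "ekgs_interpreted": 1,
     "disposition_time": datetime(2021, 1, 5, 14, 0),
     "final_note_time": datetime(2021, 1, 5, 15, 10)},
    {"ekgs_ordered": 2, "ekgs_interpreted": 2,
     "disposition_time": datetime(2021, 1, 6, 9, 30),
     "final_note_time": datetime(2021, 1, 6, 10, 0)},
]

# EKG read (%): EKG interpretations / total EKGs ordered
ordered = sum(e["ekgs_ordered"] for e in encounters)
interpreted = sum(e["ekgs_interpreted"] for e in encounters)
ekg_read_pct = 100 * interpreted / ordered

# Disposition to final note share (min), averaged over encounters
note_delay_min = sum(
    (e["final_note_time"] - e["disposition_time"]).total_seconds() / 60
    for e in encounters
) / len(encounters)

print(f"EKG read: {ekg_read_pct:.1f}%, note delay: {note_delay_min:.1f} min")
```

Once computed per resident, values like these can be ranked within a PGY class to produce the deidentified peer comparisons described above.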

Metrics were provided during mid‐year and end‐of‐year resident feedback meetings with program leadership. PGY1s received metrics on quality and safety, PGY2s additionally received metrics on throughput and efficiency, and PGY3s received all metrics. Residents received feedback on their current rank per metric throughout the academic year.

A Wilcoxon signed‐rank test, the nonparametric analog of the paired‐samples t‐test, was used to compare time periods for all classes. Subgroup analyses were done for each individual class. Significance was set at p < 0.05. 13 Statistical software R version 4.1.0 was used for analysis. 14  The rank‐order analysis was applied to each PGY class with a bottom cohort (ranked 10–14), a middle cohort (ranked 5–9), and a top cohort (ranked 1–4). Cohort rankings for each metric were quantified and compared throughout the year.
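The paired comparison and cohort assignment described above can be sketched as follows. The per-resident metric values are invented for illustration (the study's raw data are not published), and the analysis here uses Python's scipy rather than the R software used in the study.

```python
# Sketch of the analysis described in Methods, on invented data for a
# hypothetical 14-resident PGY class (values paired by resident).
from scipy.stats import wilcoxon

mid_year = [12.1, 9.8, 15.3, 8.7, 11.0, 14.2, 10.5,
            9.9, 13.4, 7.6, 12.8, 10.1, 16.0, 11.7]
end_year = [18.4, 14.2, 21.7, 13.1, 16.5, 19.8, 15.2,
            14.0, 20.3, 12.4, 18.9, 15.6, 23.1, 17.2]

# Wilcoxon signed-rank test on the paired mid-year vs. end-of-year values
stat, p = wilcoxon(mid_year, end_year)
print(f"W = {stat:.1f}, p = {p:.4f}")

# Rank-order cohorts as in the study: top (ranks 1-4), middle (5-9),
# bottom (10-14), ranked from highest to lowest end-of-year value.
ranked = sorted(range(len(end_year)), key=lambda i: -end_year[i])
cohorts = {i: ("top" if r < 4 else "middle" if r < 9 else "bottom")
           for r, i in enumerate(ranked)}
```

Repeating the cohort assignment at mid-year and end-of-year gives the within-class rank changes reported in the Results.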

As this study focused on performance improvement around resident education, it was not required to undergo IRB approval at the study institution.

RESULTS

All eligible residents participated in the study. Results are summarized in Table 2. Combined PGY1–3 differences in EKG reads (p = 0.017), X‐ray reads (p = 0.002), and disposition to final note share (p = 0.001) were significant. The average percentage of EKG reads increased from 9.44% in the PGY1 class to 21.50% in the PGY3 class, and the average percentage of X‐ray reads increased from 19.74% to 34.94%. Disposition to final note share time decreased by an average of 32.82 min. PGY subgroup analysis demonstrated that the PGY3 group was responsible for the differences in EKG reads (p = 0.002) and X‐ray reads (p = 0.001), while the PGY1 (p = 0.000) and PGY2 (p = 0.033) classes contributed to the disposition to final note share difference, which improved from 124 to 79.2 min for the PGY1s and from 90.6 to 76.0 min for the PGY2s. Combined PGY2 and PGY3 differences in patients per EM block (p = 0.000) and room to physician evaluation (p = 0.026) reached significance. Both the PGY2 (p = 0.000) and PGY3 (p = 0.000) classes contributed to the difference in patients per EM block, but the PGY2 class was responsible for the significant difference in room to physician evaluation (PGY2, p = 0.002; PGY3, p = 0.903), which decreased from an average of 20.3 to 14.2 min for that class during the year. The combined difference in physician evaluation to disposition decision between the PGY2s and PGY3s did not reach significance (p = 0.374). Subgroup analysis of the PGY3‐specific metrics showed that differences in both procedure notes (p = 0.001) and diagnostic images (p = 0.000) were significant.

TABLE 2.

Statistical analysis of peer comparison metrics

PGY1: EKG read 9.44 (3.4), p = 0.268; X‐ray read 19.74 (7.0), p = 0.715; disposition to final note share 101.39 (1.6), p = 0.000*
PGY2: EKG read 16.08 (7.1), p = 0.119; X‐ray read 25.19 (11.1), p = 0.268; disposition to final note share 83.34 (62.4), p = 0.033*; patients per ED block 112.65 (54.0), p = 0.000*; room to physician evaluation 17.23 (6.1), p = 0.002*; physician evaluation to disposition decision 206.11 (62.2), p = 0.952
PGY3: EKG read 21.50 (8.5), p = 0.002*; X‐ray read 34.94 (9.9), p = 0.001*; disposition to final note share 68.57 (65.4), p = 0.217; patients per ED block 111.65 (55.4), p = 0.000*; room to physician evaluation 17.31 (7.0), p = 0.903; physician evaluation to disposition decision 194.96 (48.2), p = 0.119; procedure notes 44.33 (9.8), p = 0.001*; diagnostic images 51.12 (38.2), p = 0.000*
PGY1–3 combined: EKG read p = 0.017*; X‐ray read p = 0.002*; disposition to final note share p = 0.001*; patients per ED block p = 0.000*; room to physician evaluation p = 0.026*; physician evaluation to disposition decision p = 0.374; procedure notes p = 0.001*; diagnostic images p = 0.000*

Statistical analysis of peer comparison metrics. Results are reported as mean (standard deviation), p‐value. p‐values are results of Wilcoxon signed‐rank tests.

Abbreviations: ED, emergency department; EKG, electrocardiogram.

*Indicates statistical significance (p < 0.05).

Within all three PGY classes, the greatest improvements throughout the year were seen in the lowest cohort, with an average improvement in rank of 1.23 (PGY3), 1.17 (PGY2), and 1 (PGY1). Those in the middle cohort improved by 0.46 (PGY3), 0.17 (PGY2), and 0.6 (PGY1). Those in the top cohort, in all three PGY classes, saw an average decrease in rank of 2.3 (PGY3), 1.25 (PGY2), and 0.25 (PGY1). In both the PGY1 and PGY2 classes, only four residents did not improve between or within their existing cohorts for any of the measured metrics. In the PGY3 class, only three residents showed no overall improvement for any metrics.

DISCUSSION

While normative data may be considered the antithesis of competency‐based evaluation, they have been shown to enhance the sensitivity of assessments and have proven useful in a variety of clinical arenas. 15 , 16 In this pilot study, significant changes were seen in the majority of the metrics for all PGY classes. PGY3s accounted for the significant change in EKG and X‐ray reads, which could indicate that appropriate attention was refocused on these quality metrics that are traditionally emphasized earlier in residency. PGY1s and PGY2s accounted for the significant change in disposition to final note share, which could be a marker of improved efficiency and time management as it relates to patient safety. There was a significant difference in procedure notes for the PGY3s, which was used as a marker of overutilization. It is possible, however, that the change reflected more accurate documentation of procedures that were being done. Physician evaluation to disposition decision was the only metric that did not reach significance in any class, which is most likely because it is most influenced by the supervising attending.

The majority of residents did improve in at least one metric over the course of the year, and there were consistent changes in the cohort rankings of all PGY classes. The greatest improvement was seen in the bottom cohort, which suggests that those specific residents may benefit the most from receiving the objective data. However, given a near proportional reduction in the top cohort, this may also reflect regression to the mean.

Ad hoc discussions with program leadership demonstrated that multisource data improved the program's ability to provide actionable feedback to residents in a holistic manner. For instance, some of the senior residents in the top cohort of efficiency metrics were lower in their patient safety metrics, indicating that they may be sacrificing one skill in service to another. This may also explain why some of the senior residents with the highest efficiency decreased in rank over time. Small downward trends in a single metric, when viewed holistically, may in fact demonstrate growth for these learners. These types of data support a more nuanced discussion with both residents and faculty about the balance of skills required for successful EM practice.

Although objective peer comparison data are, by nature, in contrast to competency‐based medical education, they could be a useful adjunct in helping set standards for trainees. By accumulating group‐level performance data for each PGY class over time, expectations could be set, either as a minimum passing standard or as a “well‐prepared” mastery standard for future trainees. 17

This preliminary study suggests that providing objective metrics as an adjunct to subjective feedback and milestone evaluations was valuable to program leadership and led to significant improvements. It could further be used to establish criterion references for future iterations.

LIMITATIONS

This was a single‐center study that used one EMR, which limits generalizability. While the data were generated from timestamps in the EMR, they depended on residents accurately assigning themselves to patients and on the absence of duplicate orders. For example, at times, EKGs are ordered by both physicians and nursing, but only one EKG is completed and charted. This could have led to some inaccuracies in the EKG metric. Some metrics can also be influenced by the supervising attending, particularly physician evaluation to disposition decision, procedure notes, and diagnostic images. Furthermore, these data are purely descriptive, and there are no available reference data to suggest what might be considered a meaningful change. Hence, no a priori power analysis was performed.

Additionally, there was no comparator group in this study. The authors felt that it was more important for all residents to receive the objective metrics than, for the sake of study design, to provide only half with potentially valuable information. It is therefore difficult to ascertain if the reported changes are due to the new objective feedback mechanism or expected growth over time. Further studies may be done to more clearly evaluate the impact of objective peer metrics on performance.

CONCLUSIONS

This pilot study demonstrated it is possible to incorporate objective peer comparison metrics into the existing evaluation system for EM residents. The objective peer comparison metrics in this pilot study may have contributed to improvements seen over time in each PGY class. This study suggests that these metrics could provide an effective objective addition to the subjective milestone evaluation system currently in use.

CONFLICT OF INTEREST

KES, MH, ZR, WAS, and JU report no conflict of interest.

AUTHOR CONTRIBUTIONS

KES was responsible for study concept and design. KES and WAS were responsible for acquisition of data. KES, WAS, and JU were responsible for analysis and interpretation of the data. KES, MH, ZR, and JU participated in drafting of the manuscript. KES and JU were responsible for critical revision of the manuscript for important intellectual content. WAS provided statistical expertise. No authors contributed to acquisition of funding as no funding was required.

Schreyer KE, Healy ME, Repanshek Z, Satz WA, Ufberg JW. Implementation of a pilot novel objective peer comparison evaluation system in an emergency medicine residency program. AEM Educ Train. 2022;6:e10736. doi: 10.1002/aet2.10736

PRESENTATIONS: Poster Presentation, American Academy of Emergency Medicine Scientific Assembly, St. Louis, MO, June 2021.

Funding information

None.

REFERENCES

  • 1. Accreditation Council for Graduate Medical Education. Emergency Medicine Milestones. Accessed January 28, 2022. https://www.acgme.org/globalassets/pdfs/milestones/emergencymedicinemilestones.pdf
  • 2. Beeson MS, Carter WA, Christopher TA, et al. The development of the emergency medicine milestones. Acad Emerg Med. 2013;20(7):724‐729. 10.1111/acem.12157 [DOI] [PubMed] [Google Scholar]
  • 3. Natesan P, Batley NJ, Bakhti R, El‐Doueihi PZ. Challenges in measuring ACGME competencies: considerations for milestones. Intl J Emerg Med. 2018;11(1):39. 10.1186/s12245-018-0198-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Goldflam K, Bod J, Della‐Giustina D, Tsyrulnik A. Emergency medicine residents consistently rate themselves higher than attending assessments on ACGME milestones. West J Emerg Med. 2015;16(6):931‐935. 10.5811/westjem.2015.8.27247 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Dehon E, Jones J, Puskarich M, Sandifer JP, Sikes K. Use of Emergency medicine milestones as items on end‐of‐shift evaluations results in overestimates of residents’ proficiency level. J Grad Med Educ. 2015;7(2):192‐196. 10.4300/JGME-D-14-00438.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Dayal A, O’Connor DM, Qadri U, Arora VM. Comparison of male vs female resident milestone evaluations by faculty during emergency medicine residency training. JAMA Intern Med. 2017;177(5):651‐657. 10.1001/jamainternmed.2016.9616 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Hauff SR, Hopson LR, Losman E, et al. Programmatic assessment of level 1 milestones in incoming interns. Acad Emerg Med. 2014;21(6):694‐698. 10.1111/acem.12393 [DOI] [PubMed] [Google Scholar]
  • 8. Accreditation Council for Graduate Medical Education . Common Program Requirements (Residency). Published online February 3, 2020.
  • 9. Sebok‐Syer SS, Goldszmidt M, Watling CJ, Chahine S, Venance SL, Lingard L. Using electronic health record data to assess residents’ clinical performance in the workplace: The good, the bad, and the unthinkable. Acad Med. 2019;94(6):853‐860. 10.1097/ACM.0000000000002672 [DOI] [PubMed] [Google Scholar]
  • 10. Mamtani M, Shofer FS, Sackeim A, Conlon L, Scott K, Mills AM. Feedback with performance metric scorecards improves resident satisfaction but does not impact clinical performance. AEM Educ Train. 2019;3(4):323‐330. 10.1002/aet2.10348 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Hempel D, Pivetta E, Kimberly HH. Personalized peer‐comparison feedback and its effect on emergency medicine resident ultrasound scan numbers. Crit Ultrasound J. 2014;6(1):1. 10.1186/2036-7902-6-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Schumacher DJ, Martini A, Holmboe E, et al. Developing resident‐sensitive quality measures: Engaging stakeholders to inform next steps. Acad Pediat. 2019;19(2):177‐185. 10.1016/j.acap.2018.09.013 [DOI] [PubMed] [Google Scholar]
  • 13. Wilcoxon F. Individual comparisons by ranking methods. Biomet Bullet. 1945;1(6):80‐83. 10.2307/3001968 [DOI] [Google Scholar]
  • 14. R: The R Foundation . The R Foundation for Statistical Computing; 2021. Accessed December 6, 2021. https://www.r‐project.org/foundation/ [Google Scholar]
  • 15. van Rentergem JAA, Murre JMJ, Huizenga HM. Multivariate normative comparisons using an aggregated database. PLoS One. 2017;12(3):e0173218. 10.1371/journal.pone.0173218 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. O’Connor PJ. Normative data: Their definition, interpretation, and importance for primary care physicians. Fam Med. 1990;22(4):307‐311. [PubMed] [Google Scholar]
  • 17. Barsuk JH, Cohen ER, Wayne DB, McGaghie WC, Yudkowsky R. A comparison of approaches for mastery learning standard setting. Acad Med. 2018;93(7):1079‐1084. 10.1097/ACM.0000000000002182 [DOI] [PubMed] [Google Scholar]

Articles from AEM Education and Training are provided here courtesy of Wiley
