Acad Med. 2022 Nov 8;98(3):367–375. doi: 10.1097/ACM.0000000000005084

Using Resident-Sensitive Quality Measures Derived From Electronic Health Record Data to Assess Residents’ Performance in Pediatric Emergency Medicine

Alina Smirnova 1, Saad Chahine 2, Christina Milani 3, Abigail Schuh 4, Stefanie S Sebok-Syer 5, Jordan L Swartz 6, Jeffrey A Wilhite 7, Adina Kalet 8, Steven J Durning 9, Kiki MJMH Lombarts 10, Cees PM van der Vleuten 11, Daniel J Schumacher 12
PMCID: PMC9944759  PMID: 36351056

Purpose

Traditional quality metrics do not adequately represent the clinical work done by residents and, thus, cannot be used to link residency training to health care quality. This study aimed to determine whether electronic health record (EHR) data can be used to meaningfully assess residents’ clinical performance in pediatric emergency medicine using resident-sensitive quality measures (RSQMs).

Method

EHR data for asthma and bronchiolitis RSQMs from Cincinnati Children’s Hospital Medical Center, a quaternary children’s hospital, collected between July 1, 2017, and June 30, 2019, were analyzed. Residents were ranked on composite scores calculated using raw score, unadjusted latent score, and case-mix adjusted latent score models, with lower percentiles indicating lower quality of care and performance. Reliability and associations among the scores produced by the 3 scoring models were compared. Resident and patient characteristics associated with performance in the highest and lowest tertiles, as well as changes in residents’ rank after case-mix adjustment, were also identified.

Results

In total, 274 residents with 1,891 individual encounters for patients with bronchiolitis aged 0–1 year and 270 residents with 1,752 individual encounters for patients with asthma aged 2–21 years were included in the analysis. The minimum reliability requirement for creating a composite score was met for the asthma data (α = 0.77) but not the bronchiolitis data (α = 0.17). The raw, unadjusted latent, and adjusted latent asthma composite scores were highly correlated (r = 0.90–0.99). After case-mix adjustment, residents’ absolute percentile rank shifted by 10 percentiles on average. Residents who dropped 10 or more percentiles tended to be more junior, to have seen fewer patients, to have cared for less acute and younger patients, or to have had patients with longer emergency department stays.

Conclusions

For some clinical areas, it is possible to use EHR data, adjusted for patient complexity, to meaningfully assess residents’ clinical performance and identify opportunities for quality improvement.


Aligning learner and patient outcomes is a fundamental component of competency-based medical education (CBME). Without understanding how residency training is linked to quality of care, CBME cannot achieve its mandate of ensuring that training prepares graduates to provide the best possible care to populations of patients. 1 While several large studies have demonstrated that where a physician completes residency training predicts future clinical performance 2–8 and others have highlighted the critical role that residents play in ensuring patient care quality during training, 9,10 the use of clinical performance metrics in graduate medical education (GME) remains limited. Some have proposed a “big data” approach, which uses electronic health record (EHR) data for quality metrics, to elucidate the relationship between education and health care processes and outcomes. 11,12 EHR data have previously been used to determine resident experience (e.g., conditions seen), 13 and obtaining resident performance quality metrics would be a logical next step. The success of this approach depends on the availability of metrics with reliability and validity evidence. 14

Traditional quality metrics do not adequately represent the clinical work performed by residents and, therefore, cannot be used to link residency training to health care quality. 14–20 For instance, mortality rates are often multifactorial and usually cannot be attributed to a single clinician. Resident-sensitive quality measures (RSQMs) attempt to address this gap. RSQMs are clinical care measures that are both important to providing care for an illness of interest and likely completed by a resident rather than by another member of the team or by the team collectively. 19,21 RSQMs in pediatric emergency medicine, developed in consultation with supervisors and residents, 19,21 have demonstrated a wide range of resident performance on both individual and composite measures for asthma, bronchiolitis, and closed head injury, 22 in relation to other variables such as patient complexity and acuity, 23 and for potential use in summative assessments. 24 For RSQMs to fulfill their potential for widespread use in GME, they must not only be relatively easily extractable from the EHR but must also demonstrate reliability and provide validity evidence.

The aim of this study was to examine whether RSQMs that are easily extractable from the EHR can meaningfully assess residents’ performance. In this proof-of-concept study, we assessed the feasibility of systematically collecting resident-specific performance measures from the EHR and evaluated the validity of these measures using the first 3 steps in Kane and Messick’s validity frameworks: scoring (supported by Messick’s response process), generalization (supported by Messick’s internal structure and content), and extrapolation (supported by Messick’s relationship to other variables). 25 We did not examine the use of these measures for decision support; rather, we aimed to highlight the ability to develop a performance metric from automatically extractable clinical performance measures, without the need for chart review, which is essential for applying these measures on a larger scale. We tested whether EHR data could be used to create RSQM composite scores that reliably discriminate between high- and low-performing residents in the management of 2 common respiratory diseases, bronchiolitis and asthma, in the pediatric emergency department (PED). Additionally, we aimed to understand which resident, faculty, and patient characteristics were associated with residents’ performance on these RSQMs. Providing evidence of validity and reliability for RSQMs that are easily extractable from the EHR can provide a baseline for future replication studies and comparisons between sites and, ultimately, support their wider application in CBME.

Method

Setting and participants

Bronchiolitis and asthma are 2 of the most common reasons for visits to PEDs, where residents have a high degree of autonomy in ordering treatments before the attending sees the patient. 26 Their frequency and evidence-based standardized care protocols make these chief complaints ideal for examining what residents do, and how, when caring for patients with these illnesses. We did not include closed head injury RSQMs because they were less readily extractable from the EHR automatically. All PED encounters with billing diagnoses of bronchiolitis or asthma in which a resident was assigned to the patient and completed the clinical note were extracted from the EHR (Epic Systems, Verona, Wisconsin) at Cincinnati Children’s Hospital Medical Center (CCHMC) between July 1, 2017, and June 30, 2019. The deidentified datasets contained information about patient and resident characteristics, as well as the supervising faculty of record for the encounter. The full data dictionary is available in Supplemental Digital Appendix 1, at http://links.lww.com/ACADMED/B356. CCHMC is a quaternary children’s hospital with approximately 60,000 PED visits annually. Between 2 and 7 residents usually staff the department at any given time. The majority are categorical pediatric residents, who spend 4 months in the PED during their 3-year residency, and emergency medicine residents, who complete 6 months during a 4-year residency. Residents from combined training programs such as medicine–pediatrics (4–5 years) and family medicine (3 years) also rotate through the PED. We obtained CCHMC institutional review board approval before data extraction and analysis. The study findings were reported using the CONSORT extension to pilot and feasibility trials. 27

Data and measures

The analysis focused on easily extractable RSQMs for each illness of interest. Using composite scores based on several individual measures compensates for the potential lack of variability in single performance measures. 28–31 While RSQM composite scores based on 19–23 individual measures have previously shown good variability, 22 only 5 RSQMs per condition were readily extractable from the EHR; these were ultimately included in the analysis. We focused on RSQMs available directly from the EHR because the aim of the study was to develop a set of measures that could realistically be replicated at other institutions (scalability).

Patients aged 0–1 year were included in the bronchiolitis dataset. Patients aged 2–21 years were included in the asthma dataset. The patient age ranges were chosen because bronchiolitis becomes less common over the age of 1 year and a formal asthma diagnosis is less common under 2 years of age. Additionally, we included several relevant patient, resident, and supervising faculty characteristics, described below.

Methodological approach

To provide evidence of scoring, we used a psychometric approach to determine the appropriateness of individual items for the composite score for each condition. First, we calculated the proportion of patients who received appropriate care at the item level, with ideal values being ~0.50 and a range of 0.30–0.70 deemed generally acceptable. 32 We also calculated the internal consistency of each composite score, with a range of 0.7–1 indicating that items may be grouped together to provide an overall evaluation of competence in the care of bronchiolitis or asthma. 32
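The screening step above can be sketched in a few lines of code. The data, array shapes, and function names below are illustrative only, not the study’s actual pipeline; the 0.30–0.70 and 0.7–1 thresholds come from the text.

```python
import numpy as np

def item_proportions(items: np.ndarray) -> np.ndarray:
    """Proportion of encounters receiving appropriate care for each item.

    items: (n_encounters, n_items) array of 0/1 indicators.
    """
    return items.mean(axis=0)

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal consistency of the composite formed by summing the items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 200 encounters x 5 RSQM indicators (1 = appropriate care given).
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(200, 5))

props = item_proportions(x)   # screen: ideally each falls in 0.30-0.70
alpha = cronbach_alpha(x)     # screen: 0.7-1 justifies a composite score
```

With real data, items passing both screens would then be combined into a composite, as described next.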

Once individual items passed the initial screening, we used 3 different models to create the composite scores: a raw score model (model 1), an unadjusted latent score model (model 2), and an adjusted latent score model (model 3). In model 1, each indicator was given equal weight, and each contributed to the overall score uniformly. In models 2 and 3, we used hierarchical generalized linear models to provide differential weights for each indicator. 33 In addition to valuing more difficult items compared with easier items, this approach can take into account various patient and resident characteristics. Model 2 was unadjusted, while model 3 tested several resident and patient characteristics as covariates. For model 3, the following statistically significant (P < .05) characteristics were included: patient age, PED length of stay, initial placement in the medical resuscitation bay, and year presented in the PED (see Supplemental Digital Appendix 2, at http://links.lww.com/ACADMED/B356). In all models, indicators were nested within patients and patients within residents. We standardized the scores to have a mean of 50 and standard deviation of 10 to facilitate interpretation and comparability.
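The raw score model (model 1) and the standardization to mean 50 and standard deviation 10 can be sketched as follows. The data frame, column names, and values are hypothetical; the latent models (2 and 3) would require hierarchical modeling not shown here.

```python
import numpy as np
import pandas as pd

# Toy encounter-level data: each row is one 0/1 RSQM indicator result,
# grouped by the resident responsible for the encounter.
df = pd.DataFrame({
    "resident": ["a", "a", "b", "b", "c", "c"],
    "score":    [1, 0, 1, 1, 0, 0],  # indicator met (1) or not (0)
})

# Model 1 (raw score): every indicator weighted equally, averaged per resident.
raw = df.groupby("resident")["score"].mean()

# Standardize to mean 50, SD 10 for interpretability and comparability.
z = (raw - raw.mean()) / raw.std(ddof=1)
standardized = 50 + 10 * z
```

In the latent models, per-indicator weights estimated from a hierarchical generalized linear model would replace the equal weighting in the `groupby` mean.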

In models 2 and 3, we also assessed the difficulty of each item comprising the composite score as a final check to ensure each item significantly contributed to the latent composite scores. Item difficulty was judged based on the assumption that we were indirectly measuring a “latent” ability estimate (theta) of a person who would be considered average (50% of residents) in providing the correct treatment. The estimates range from −3 to +3. An item that would be equally difficult or easy for an average ability person would have a value of 0. Negative values indicate more difficult items.

To provide evidence of generalization within each model, we used composite scores to rank residents, where lower percentiles indicate a lower quality of care and, therefore, lower level of performance. Shifts in rankings were then evaluated using correlations and comparative statistical testing. Additionally, we examined which residents experienced the greatest shift in ranking by investigating which patient encounter characteristics were associated with at least a 10-point increase or decrease in RSQM percentile scores.
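A minimal sketch of this ranking step, assuming composite scores have already been computed for each resident under two models (all names and values here are invented for illustration):

```python
import pandas as pd

# Illustrative composite scores for the same 5 residents under the raw
# (model 1) and case-mix adjusted (model 3) scoring models.
scores = pd.DataFrame({
    "raw":      [0.40, 0.55, 0.70, 0.65, 0.30],
    "adjusted": [0.60, 0.50, 0.66, 0.45, 0.35],
}, index=["r1", "r2", "r3", "r4", "r5"])

# Percentile rank within the cohort for each model (0-100 scale).
pct = scores.rank(pct=True) * 100

# Flag residents whose rank shifted by at least 10 percentiles
# after case-mix adjustment.
shift = pct["adjusted"] - pct["raw"]
movers = shift[shift.abs() >= 10]
```

The encounter characteristics of the flagged residents (`movers`) would then be compared against those of the rest of the cohort.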

To provide evidence of extrapolation, we explored the relationships of residents’ performance in the top and bottom tertiles with several patient, resident, and faculty-of-record characteristics using t tests and chi-square tests. Patient characteristics were mean age, PED length of stay, whether the patient initially presented to a medical resuscitation bay rather than a regular PED room, and time of presentation to the PED. Resident characteristics were postgraduate training year, sex, program type, and mean patient panel size. Supervising faculty-of-record characteristics were the faculty member’s own performance on the relevant RSQMs (calculated from patient encounters seen by the faculty member without a resident during the same study period) and mean patient panel size.
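These tertile comparisons might be sketched as below, using SciPy rather than the SPSS/SAS software the study actually used; the groups, counts, and variable choices are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Continuous characteristic (e.g., mean patient age in years) compared
# between hypothetical top- and bottom-tertile residents: t test.
top = rng.normal(6.0, 2.0, 90)
bottom = rng.normal(7.0, 2.0, 90)
t_stat, t_p = stats.ttest_ind(top, bottom)

# Categorical characteristic (e.g., postgraduate year) compared between
# tertiles: chi-square test on a contingency table of invented counts.
table = np.array([[40, 30, 20],   # top tertile: PGY-1, PGY-2, PGY-3+
                  [20, 45, 25]])  # bottom tertile
chi2, chi_p, dof, _ = stats.chi2_contingency(table)
```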

We used SPSS statistical software, version 26 (SPSS, Inc., Chicago, Illinois), for descriptive and comparative analyses and SAS statistical software, version 9.4 (SAS Institute, Cary, North Carolina), for multilevel modeling. Models were then confirmed, and latent scores (models 2 and 3) generated, in HLM software, version 8.0 (Scientific Software International, Inc., Chapel Hill, North Carolina). There were no missing data.

Results

Asthma RSQMs

Of the 349 total residents, 270 (77%) treated at least one patient with an asthma exacerbation in the PED during the study period, amounting to 1,752 encounters (Table 1). Overall resident performance on asthma RSQMs is reported in Table 2. Internal consistency of treatment of asthma was good (α = 0.77). Table 2 reports the proportion of patients receiving the correct treatment for each item in addition to item difficulty for models 2 and 3. Based on these results, it was psychometrically justifiable to combine the individual asthma indicators into a composite score.

Table 1.

Demographic Characteristics of Residents and Patients, From a Study of Bronchiolitis and Asthma RSQMs for Pediatric Emergency Department Care, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, July 2017–June 2019


Table 2.

Item Scoring of RSQMs and Characteristics of Individual RSQMs Included in Composite Scores, From a Study of Bronchiolitis and Asthma RSQMs for Pediatric Emergency Department Care, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, July 2017–June 2019


The unstandardized raw score model (model 1) produced a normal distribution (n = 270; mean = 0.47; 95% confidence interval [CI] = 0.45, 0.49). Table 3 shows characteristics associated with ranking in the highest or lowest tertile, with the last column indicating significant differences. For example, residents ranking in the top tertile were significantly more likely to be in their first year of training or to be supervised by faculty who themselves scored higher on asthma RSQMs. Conversely, residents in the bottom tertile were significantly more likely to be in their second year of training or to have had more encounters with patients who initially presented to the medical resuscitation bay. No ranking differences were found for the third and fourth postgraduate years, resident sex, type of residency program, patient panel size, or other patient characteristics. The unadjusted latent score model (model 2) produced almost identical results to model 1, with a correlation of r = 0.99 between the standardized scores. The tertile comparison of resident rankings also showed very little difference between the 2 models.

Table 3.

Resident, Faculty, and Patient Characteristics Associated With Residents Being Ranked in the Top or Bottom Tertile Based on Asthma Raw Composite Score, From a Study of Bronchiolitis and Asthma RSQMs for Pediatric Emergency Department Care, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, July 2017–June 2019


Given the almost negligible difference and the ease of calculating RSQM composite scores in model 1, we opted to compare model 1 with the adjusted latent model (model 3). Although the correlation between models 1 and 3 was very high (r = 0.92), there was a noticeable difference in the tertile groupings between the 2 models (Table 4). Compared with model 1, model 3 produced higher RSQM percentile scores for residents treating more acute patients, characterized by initial presentation to the medical resuscitation bay, and penalized those treating less acute patients (Table 5). On average, residents shifted 10.08 percentiles after patient characteristics were taken into account. As highlighted in Table 5, residents whose RSQM percentile scores increased by at least 10 points were in their second, third, or fourth postgraduate year in July 2017, while residents in their first training year decreased in rank.

Table 4.

Number of Residents Grouped in Tertiles Using Adjusted Asthma Composite Score Compared With Raw Composite Score, From a Study of Bronchiolitis and Asthma RSQMs for Pediatric Emergency Department Care, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, July 2017–June 2019


Table 5.

Resident, Faculty, and Patient Characteristics Associated With an Absolute Rank Difference of at Least 10 Percentiles After Adjustment, From a Study of Bronchiolitis and Asthma RSQMs for Pediatric Emergency Department Care, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, July 2017–June 2019


Bronchiolitis RSQMs

Of the 349 total residents, 274 residents (79%) treated patients with bronchiolitis, amounting to 1,891 encounters over 2 years (Table 1). Overall resident performance on bronchiolitis RSQMs is reported in Table 2. The internal consistency of bronchiolitis treatment was very poor (α = 0.17). From a psychometric perspective, the items are considered too easy (i.e., the care was uniformly good), with an average performance of the individual bronchiolitis RSQMs of 82.4%. Only half of patients had “nasal bulb suction teaching ordered,” making this the only indicator able to distinguish among residents. Based on these results, a composite bronchiolitis RSQM score could not reliably differentiate higher-skilled residents from lower-skilled ones.

Discussion

To our knowledge, this is the first study to provide scoring, generalization, and extrapolation evidence for RSQMs available from the EHR for 2 common pediatric conditions. Scoring and generalization evidence favors the asthma RSQM composite scores but not the bronchiolitis scores. Overall, the 5-item asthma composite RSQM scores could discriminate between high and low performers, suggesting that a proportion of the variation in patient care is attributable to the actions of residents making decisions about care. This discrimination improves when data are adjusted for patient acuity, age, and academic year, favoring the adjusted model. Extrapolation evidence for the asthma composite RSQM scores indicates associations with resident postgraduate year, faculty performance, and patient acuity: first-year residents were overrepresented in the top tertile and second-year residents in the bottom tertile; better faculty performance was associated with better resident performance; and poorer resident performance was associated with higher patient acuity. On the other hand, the 5-item bronchiolitis composite RSQM scores showed low variability, with consistently high resident performance, and thus, from a psychometric perspective, discriminate poorly among residents. By providing baseline scoring, generalization, and extrapolation evidence for future replication studies, this study facilitates the wider application of EHR data to provide individualized, performance-based information for residents and programs. 16

In contrast to asthma care, residents consistently provided high-quality care to patients with bronchiolitis on the chosen measures. Although there was no quality improvement initiative around bronchiolitis, it is possible that residents performed better on bronchiolitis measures because the standard of care for bronchiolitis is far simpler and less dependent on clinical context than that for asthma exacerbations. Although this finding may not be desirable from a psychometric perspective, it is important from a mastery perspective. While consistently high performance on bronchiolitis RSQMs does not allow for the creation of composite scores or performance rankings, such measures can be used to identify outliers. It would be meaningful for training programs to know whether a resident underperforms compared with their peers since, from a patient care perspective, any substandard performance by a resident is, by definition, a compromise of patient care. Such measures can be considered a baseline clinical benchmark and a meaningful signal, because all residents should consistently provide high-quality patient care on the selected measures. On the other hand, asthma RSQM composite scores showed more variability in resident performance and, therefore, may be more useful for differentiating between the performance of individual residents. This finding raises an important question: whether (or which) RSQMs could be used as both criterion-based and performance-differentiating measures in the context of a CBME program of assessment.

The finding that higher-performing residents were more likely to be supervised by higher-performing faculty may represent the interdependence between residents and faculty, especially earlier in residency training. 34 Interestingly, this relationship did not hold once patient characteristics were taken into account. Thus, patient complexity may explain the performance of both faculty and residents. While this study cannot prove causation, this finding may be a further indication that individual practice style may be shaped during residency training in a manner that imprints future practice patterns for decades to come. 2,5

We found that residents at the beginning of their training were more likely to score higher on asthma RSQMs, which goes against the logic that performance improves with experience. 25 This finding could be explained by more junior residents being more likely to follow the standards of care provided to them because they do not yet have their own care style and preferences. It could also be due to greater interdependence between faculty and first-year residents, in which less experienced residents are more likely to review orders with their faculty before placing them in the EHR. 35 As residents advance in their training and gain more independence, they are also given more responsibility and harder tasks, leading to lower scores because the task (i.e., the requisite patient care) is harder. 36 In this study, the apparent overrepresentation of first-year residents in the top tertile seems to be accounted for by their seeing fewer patients overall, seeing fewer acute patients, or potentially spending more time on patient care. In addition, residents in the second year of their training performed worse on asthma RSQMs, but their ranking improved on average when higher patient acuity and larger patient panel sizes were taken into account. This may have implications for the number and acuity of patients whom residents should be allowed to manage to ensure they are positioned to provide high-quality care. In an earlier study, Schumacher and colleagues found higher patient acuity and complexity to be associated with higher RSQM composite scores for both asthma and bronchiolitis after controlling for postgraduate year. 23 Our current study builds on these findings using a more objective measure of acuity (i.e., presentation to the medical resuscitation bay on arrival rather than faculty report). This also resonates with a previous study of primary care physicians, which found that adjustment for patient panel characteristics changed physician ranking on quality measures by an average of 7.6 percentiles, with more than a third of physicians studied being reclassified into a different tertile. 37

While adjustment for case-mix characteristics avoids unfairly penalizing residents who tend to see sicker patients, 38 from a patient perspective, these findings suggest that sicker patients may not consistently receive high-quality care in this setting. Alternatively, sicker patients may prompt faculty and residents to engage with the system in a different way that is not captured by this EHR data. 39 Hence, case-mix adjustment may be reasonable from an assessment point of view, but one should not overlook the opportunities for improvement in patient care from a care quality perspective.

Implications for practice

RSQM composite measures, easily extractable from the EHR, can provide meaningful information about residents’ clinical performance and point out opportunities for improving health care delivery. Although our application in this study is just one example, RSQMs are generalizable to any training program that uses an EHR. In the future, RSQM use can help set a standard among programs for resident feedback, in addition to setting educational goals when residents are missing clinical experiences. A similar procedure for defining and testing RSQMs can be applied to other clinical settings, although future replication studies are needed to examine the feasibility of extracting RSQMs from EHRs in different contexts, such as those using different EHR systems. When calculating RSQM composite scores, some programs might prefer raw scores; if so, residents should be compared within their own cohort or level of experience.

EHR data can also provide direct information for program directors about the levels of exposure of individual residents to various clinical scenarios. In this study, roughly 20% of residents did not get the experience of caring for patients with asthma exacerbation or bronchiolitis in the PED. While residents may have cared for patients with these illnesses in other settings, this information about the PED may be used by the program to target residents who did not get this experience to ensure competency is achieved. Regarding patient care, low use of the asthma order set could prompt the program to emphasize the use of clinical tools for residents before the start of the rotation. Residents may also benefit from closer clinical supervision when caring for patients with acute exacerbation of asthma, rather than bronchiolitis, especially in their first 2 years of training and with higher acuity patients.

Strengths and limitations

Using EHR data to assess residents has several strengths. We were able to obtain data for all patients with bronchiolitis and asthma exacerbation seen by a majority of residents rotating through the PED within the study period. Using RSQMs strengthens the assessment process by providing resident-level as well as patient-level data. It is possible to objectively identify underperforming residents, while providing feedback to residents that is performance-specific, allowing them to track and work to improve their performance. This method provides unique opportunities for identifying needed quality improvement initiatives, thereby responding to the goal of GME to improve quality of care while supporting the goals of CBME.

This study has several limitations. First, this was a single-institution study. While this controls for environmental factors when comparing residents at the same institution, it remains to be determined whether these findings generalize to other institutions. More work is needed to provide residents and programs with a full dashboard of RSQMs, as performing well on one measure does not translate to good performance on all measures. 28,31 Second, EHR data may not provide all the information necessary for risk adjustment. 29 Variations in electronic charting systems may also limit the ability of RSQMs to be replicated at other sites; this work is currently underway. Finally, the weights of individual items in the composite score may not reflect their true proportional contribution to patient outcomes. 29 Health care is ultimately provided by teams in a collaborative way, and a single RSQM, or even a composite of them, cannot fully encompass the quality of care delivered by a single resident.

Future research

Future studies are needed to replicate these findings at other sites as well as in programs located in smaller and rural locations. This research should focus on understanding the relationships between faculty and resident performance as well as the effects of patient case-mix and other resident characteristics on clinical performance. 17 Further research should examine the extent of interdependence between residents and faculty in decision making around orders at different stages of training to better interpret the extrapolation evidence. 36 Studying the relationship between RSQMs and other traditional workplace-based assessment approaches would provide additional extrapolation evidence for this novel method of assessment. Further research could also focus on studying cross-classification effects stemming from the interdependent nature of team-based patient care 39 while developing the methodological ingenuity needed to capture both nesting and cross-effects when measuring such performance. 35,40

Conclusions

This study shows that EHR data for 2 specific clinical conditions can be used to meaningfully assess residents’ clinical performance in the context of a CBME program of assessment, and to identify opportunities for improving health care delivery. Our findings suggest that, in the context of a CBME program of assessment, meaningful RSQMs should include performance differentiating measures that are criterion based, valid, and reliable to capture a wider range of resident performance in practice.

Supplementary Material

acm-98-367-s001.pdf (227KB, pdf)

Footnotes

Supplemental digital content for this article is available at http://links.lww.com/ACADMED/B356.

Funding/Support: The authors would like to thank the Edward J. Stemmler, MD Medical Education Research Fund of the National Board of Medical Examiners for funding the work of this collaborative group.

Other disclosures: None reported.

Ethical approval: Cincinnati Children’s Hospital Medical Center (CCHMC) institutional review board approval was obtained before data extraction and analysis.

Disclaimers: The views expressed herein are those of the authors and not necessarily those of the U.S. Department of Defense or other federal agencies.

Previous presentations: Preliminary results of this study were presented at the Office of Health and Medical Education Scholarship Symposium (OHMES), February 8, 2021, in Calgary, Canada (virtual conference). This study was presented for oral presentation as a research paper at the AMEE conference in Lyon, France, August 29–31, 2022.

Data: Only data from CCHMC were used for this study. CCHMC institutional review board approval was obtained before data extraction and analysis.

Contributor Information

Alina Smirnova, Email: alina.smirnova1@ucalgary.ca.

Saad Chahine, Email: saad.chahine@queensu.ca.

Christina Milani, Email: cmilani@ohri.ca.

Abigail Schuh, Email: aschuh@mcw.edu.

Stefanie S. Sebok-Syer, Email: ssyer@stanford.edu.

Jordan L. Swartz, Email: Jordan.Swartz@nyulangone.org.

Jeffrey A. Wilhite, Email: Jeffrey.Wilhite@nyulangone.org.

Adina Kalet, Email: akalet@mcw.edu.

Steven J. Durning, Email: steven.durning@usuhs.edu.

Cees P.M. van der Vleuten, Email: c.vandervleuten@maastrichtuniversity.nl.

Daniel J. Schumacher, Email: Daniel.Schumacher@cchmc.org.


Articles from Academic Medicine are provided here courtesy of Wolters Kluwer Health