Abstract
Purpose
Postgraduate medical training in the United States requires formative assessments of learners using the Accreditation Council for Graduate Medical Education (ACGME) milestones system. With Milestones 2.0, Harmonized Milestones (HMs) were developed across specialties for 4 competency domains (professionalism, interpersonal and communication skills, systems-based practice, and practice-based learning and improvement). The HMs make it possible to explore the performance of postgraduate trainees across specialties and at the transition to residency. This study examined the factors that contribute to variability in the assessments of postgraduate year 1 (PGY-1) learners as measured using Milestones 2.0.
Method
This retrospective study assessed national ACGME HM data from PGY-1 residents entering U.S. residency programs in July 2021 and 2022 in the 6 largest specialties: emergency medicine, family medicine, internal medicine, general surgery, psychiatry, and pediatrics. Variance component analyses were conducted using cross-classified random-effects models that accounted for clustering; the estimated variance components were used to draw inferences about the contributions of residency program, medical school, and specialty to learner variability and about HM rating practices, including straight-lining.
Results
The sample included 57,132 PGY-1 residents (2,430 programs). Specialty accounted for the largest variance (22%) across HM competency domains. Within specialty, variance components for trainees, residency programs, and medical schools accounted for 22%, 35%, and 2% of total variance, respectively. Straight-lining was found at 6 months for 6,827 of 56,804 PGY-1 residents (12%), with the greatest amount in surgery (2,105 of 5,559 [38%]).
Conclusions
This study found variability in HM performance across 6 specialties attributable to medical schools, specialty, residency programs, and trainees, with limited variability attributed to medical school and learner. Substantial differences across specialties underscore the need for clinical educators, researchers, and accreditors to create a shared mental model to bolster the evaluative strength of milestones and prepare residents for the needs of health care.
A shared goal among medical schools is to prepare graduates for residency training and future practice through summative and formative assessment of competence. However, schools and residency programs do not know how effectively this goal is met. First, uniform training outcomes for medical schools in the United States are lacking. Medical schools are largely responsible for determining their own educational objectives and assessment strategies. Second, although institutions are accredited in part based on outcomes such as graduation rates, attrition, performance on national exams such as Step 1 and Step 2 of the United States Medical Licensing Examination, and match success, these measures do not demonstrate how effectively medical schools prepare graduates for training and practice. In contrast, in postgraduate training the expected educational outcomes are standardized through the Accreditation Council for Graduate Medical Education (ACGME) milestones.1,2 Milestones are specialty-based assessments of the 6 general competencies: medical knowledge, patient care and procedural skills, interpersonal and communication skills, practice-based learning and improvement, professionalism, and systems-based practice.3
At the 2016 ACGME Milestones Summit, participants noted marked inconsistencies in the milestones and subcompetencies across specialties. In reviewing milestone implementation, Edgar et al4 found wide differences among specialties in milestone framing for the domains other than patient care and procedural skills and medical knowledge. Thematic analysis identified almost 700 different milestone descriptions across 26 specialties. To address these differences, and noting that the interpersonal and communication skills, practice-based learning and improvement, professionalism, and systems-based practice competencies were similar across specialties, the ACGME convened 4 groups to develop cross-specialty Harmonized Milestones (HMs). Each specialty reviewed, edited, and adopted the HMs and implemented them as part of Milestones 2.0.4,5
Transition to residency and learner performance during postgraduate year 1 (PGY-1) are marked by complex interactions that may be attributed to individual trainee performance, the medical school and residency clinical learning environments, and residency program–level factors. Examining the factors that contribute to variability in residency performance is central to understanding medical education, fairness in residency assessment, advancement decisions, allocation of training resources, and evaluation of medical school outcomes. Until the publication of the HMs, performance during residency could not easily be associated with medical school performance across specialties; however, the HMs offer a new opportunity to consider performance across the medical education continuum.
In an ideal assessment paradigm, variance in ratings should be attributable to the trainees, meaning that differences in scores would reflect genuine differences in trainee abilities rather than factors such as assessor, rater bias, program-specific practices, or specialty. In other words, differences in scores should be based largely on the performance of the individual (assessment signal) rather than other factors (error or noise). A prior study of Milestones 1.0 by the Medical School Outcomes Milestones Study Group6 found that medical school accounted for only 5% of variability in assessment in emergency medicine and 6% in family medicine for PGY-1 trainees. Program-level variance accounted for the preponderance of variability (70% in emergency medicine and 53% in family medicine), and learner variance was moderate (23% in emergency medicine and 34% in family medicine).6 During the full residency period, learner variance increased, program variance decreased, and medical school variance decreased slightly.
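One way to formalize this signal-versus-noise framing (our illustration, in generalizability-theory style, not the study's notation) is as the proportion of total rating variance attributable to the trainee:

$$\text{signal proportion} = \frac{\sigma^2_{\text{trainee}}}{\sigma^2_{\text{trainee}} + \sigma^2_{\text{school}} + \sigma^2_{\text{program}} + \sigma^2_{\text{specialty}} + \sigma^2_{\text{error}}}$$

In an ideal assessment system this ratio approaches 1; the full cross-classified decomposition used in this study is sketched in the Analysis subsection below.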
Although these results clarify variance within specialties, the HMs offer the potential to examine variance across specialties using the same competencies. By eliminating the confounders present with different milestones, we may better determine the contribution of key factors to the variance in assessment. The HMs can provide feedback to medical schools about the performance of their graduates. Furthermore, by examining the variability of HMs, we can determine whether the goal to decrease inconsistencies in the HMs across specialties was achieved.4
The primary objective of this study was to examine factors that contribute to variability in the assessments of PGY-1 learners as measured using Milestones 2.0. We investigated the relative influence of trainee, specialty, residency program, medical school, and subcompetency on trainee assessments using the HMs at the 6- and 12-month assessments of PGY-1. Our secondary objectives were to examine how the HM ratings were operationalized across specialties by studying the prevalence of straight-lining (a resident assigned the same score across all subcompetencies) and the variation in HM ratings by specialty.
Method
Study design
This retrospective study assessed data from PGY-1 residents entering ACGME-accredited residency training in July 2021 and 2022 in 6 specialties: emergency medicine (3-year and 4-year programs), family medicine, internal medicine, general surgery, pediatrics, and psychiatry. The study included transitional year trainees (internal medicine, general surgery, and pediatrics) and graduates of osteopathic medical schools (20% of the sample) and international medical schools (27% of the sample). Only PGY-1 residents were included to provide proximity to medical school graduation and because a longitudinal cohort is not yet available given the recent implementation of Milestones 2.0. We used Milestones 2.0 assessments, which were reported to the ACGME every December (6 months) and June (12 months).7 The ACGME provided anonymized milestones data to the Medical School Outcomes Milestones Study Group.
ACGME milestones
Between 2019 and 2022, each specialty implemented Milestones 2.0, which included the HMs. The milestones consist of subcompetencies with associated levels of performance, articulated through developmentally based anchors corresponding to a 9-point scale with 0.5 intervals between the levels; level 1 indicates a trainee who demonstrates milestones expected of an incoming resident, and level 5 indicates a trainee whose performance demonstrates an "aspirational" level beyond the graduation target. Level 4 indicates proficiency and is the target for graduation. Programs have the option to indicate that the resident "has not achieved level 1," potentially signifying deficiencies in the subcompetency. Programs can also indicate that residents were "not yet assessed" or "not yet rotated." Programs assign milestone levels for all subcompetencies for each reporting cycle. Decisions and processes associated with milestone ratings are within the purview of the residency program, based on the judgment of the Clinical Competency Committee (CCC) and the program director.8
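As a quick check of the scale arithmetic (an illustration, not part of the study's methods), levels 1 through 5 in 0.5-point increments yield the 9 reportable rating points:

```python
# Milestone rating scale: levels 1 through 5 in 0.5 increments -> 9 points.
levels = [1 + 0.5 * k for k in range(9)]
print(levels)  # [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
```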
The HMs include the following competency domains, with minor variations for each specialty: (1) systems-based practice: 3 subcompetencies; (2) practice-based learning and improvement: 2 subcompetencies; (3) professionalism: 4 subcompetencies; and (4) interpersonal and communication skills: 3 subcompetencies. A complete list of subcompetencies and associated milestone levels is available through the ACGME.7
Analysis
We used descriptive statistics and visual graphs to examine trends and review longitudinal patterns. Variance component analyses were conducted using cross-classified random-effects models, accounting for clustering at the residency program, medical school, and specialty levels. Given the cross-classified nature of the data, we controlled for crossover effects at the medical school level and nesting at the residency program level by fitting cross-classified random-effects models by competency and reporting period. Ratings of "has not achieved level 1" were analyzed using descriptive statistics only.
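To make the decomposition concrete, the model can be written schematically as follows; this is an illustrative form consistent with the description above, not necessarily the exact specification estimated in the study:

$$Y_{tpmsc} = \mu + u_s^{(\text{specialty})} + u_m^{(\text{school})} + u_p^{(\text{program})} + u_t^{(\text{trainee})} + u_c^{(\text{subcompetency})} + \varepsilon_{tpmsc},$$

where each random effect $u$ has mean 0 and its own variance $\sigma^2$. The percentage of variance attributed to a component $k$ (as reported in Table 1) is then

$$100 \times \frac{\sigma_k^2}{\sigma_{\text{specialty}}^2 + \sigma_{\text{school}}^2 + \sigma_{\text{program}}^2 + \sigma_{\text{trainee}}^2 + \sigma_{\text{subcompetency}}^2 + \sigma_{\varepsilon}^2}.$$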
We performed an additional analysis of the frequency of straight-lining. We defined straight-lining as a resident receiving the same rating in all subcompetencies (i.e., ≥20 identical ratings during a reporting cycle). From a variance component perspective, straight-lining is demonstrated by a lack of variance within a trainee across the HM subcompetencies; for example, a trainee rated level 2 on every subcompetency would have 0 variation across subcompetencies, indicating straight-lining. Data compilation and analyses were conducted using Stata/MP 18 software (StataCorp, College Station, Texas). The institutional review board of the University of Illinois at Chicago approved this study.
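A minimal sketch of this flagging logic in Python, assuming a hypothetical long-format ratings table with illustrative column names (not the actual ACGME export schema):

```python
import pandas as pd

# Hypothetical long-format milestone data: one row per (resident,
# subcompetency) rating within a single reporting cycle.
ratings = pd.DataFrame({
    "resident_id": [1, 1, 1, 2, 2, 2],
    "subcompetency": ["SBP1", "SBP2", "SBP3", "SBP1", "SBP2", "SBP3"],
    "rating": [2.0, 2.0, 2.0, 1.5, 2.0, 2.5],
})

def flag_straight_lining(df: pd.DataFrame, min_ratings: int = 20) -> pd.Series:
    """Flag residents whose ratings show zero variance across all
    subcompetencies in a reporting cycle, requiring at least
    `min_ratings` ratings, per the study's definition."""
    grouped = df.groupby("resident_id")["rating"]
    # Zero within-resident variance <=> every rating is identical.
    identical = grouped.nunique() == 1
    return identical & (grouped.size() >= min_ratings)

# min_ratings lowered to 3 only so the toy data can illustrate a flag.
print(flag_straight_lining(ratings, min_ratings=3))
# resident 1 -> True (straight-lined); resident 2 -> False
```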
Results
The sample included 57,132 PGY-1 residents (28,889 [51%] female and 27,936 [49%] male; 307 [0.5%] did not report their gender) from 2,430 programs, with 56,804 PGY-1 residents (2,430 programs) at 6 months and 56,604 PGY-1 residents (2,428 programs) at 12 months. See Figures 1 and 2 for descriptive statistics on the HMs.
Figure 1.

Harmonized Milestones for postgraduate year 1 residents entering U.S. residency programs in July 2021 and 2022 in emergency medicine, family medicine, internal medicine, pediatrics, psychiatry, and general surgery (N = 57,132 residents and 2,430 programs). This figure shows the mean differences and ranges in scores across specialties for the Harmonized Milestones and the growth from midyear of internship to end of year, with 0 indicating "not ready for supervised practice" (level 1). Abbreviations: ICS, interpersonal and communication skills; PBLI, practice-based learning and improvement; PROF, professionalism; SBP, systems-based practice.
Figure 2.

Descriptive statistics of Harmonized Milestones by specialty for postgraduate year 1 residents entering U.S. residency programs in July 2021 and 2022. This figure shows the mean differences and ranges in scores for the Harmonized Milestones by specialty; both the mean and the variation differ by specialty. Values represent the milestone ratings, with 0 indicating "not ready for supervised practice" (level 1). (A) Midyear reporting period. (B) End-year reporting period. Abbreviations: EM, emergency medicine; FM, family medicine; ICS, interpersonal and communication skills; IM, internal medicine; PBLI, practice-based learning and improvement; PE, pediatrics; PROF, professionalism; PS, psychiatry; SBP, systems-based practice; SU, surgery.
Variance component analysis
When comparing across specialties, we found that specialty was the largest factor, accounting for approximately 22% of the variance in subcompetency performance ratings (Table 1), with medical school contributing less than 5% and trainee contributing between 0% and 10%. Within specialties, individual trainees contributed 22% of the variance, whereas residency programs accounted for 35%. The contribution of medical school to HM variance was minimal (2%), with a range of 0% to 7% (see Supplemental Digital Appendix 1 at http://links.lww.com/ACADMED/B697 for full estimates of variance components by specialty and reporting period).
Table 1.
Harmonized Milestone Variance by Competency for Postgraduate Year 1 Residents Entering U.S. Residency Programs in July 2021 and 2022a
| Competency | Midyear variance, % | End-of-year variance, % | Overall variance, % |
|---|---|---|---|
| Interpersonal and communication skills | | | |
| Specialty | 22.8 | 21.5 | 22.2 |
| Medical school | 4.7 | 4.6 | 4.7 |
| Program | 25.3 | 19.7 | 22.5 |
| Trainee | 5.8 | 0.0 | 2.9 |
| Subcompetency | 7.0 | 22.6 | 14.8 |
| Residual | 34.4 | 31.5 | 33.0 |
| Practice-based learning and improvement | | | |
| Specialty | 23.2 | 21.8 | 22.5 |
| Medical school | 4.2 | 4.2 | 4.2 |
| Program | 20.2 | 11.8 | 16.0 |
| Trainee | 0.0 | 0.0 | 0.0 |
| Subcompetency | 14.4 | 29.6 | 22.0 |
| Residual | 38.0 | 32.5 | 35.2 |
| Professionalism | | | |
| Specialty | 21.6 | 21.1 | 21.4 |
| Medical school | 3.1 | 3.6 | 3.4 |
| Program | 25.0 | 22.1 | 23.6 |
| Trainee | 9.5 | 0.0 | 4.7 |
| Subcompetency | 6.5 | 22.2 | 14.3 |
| Residual | 34.3 | 31.0 | 32.6 |
| Systems-based practice | | | |
| Specialty | 20.1 | 20.9 | 20.5 |
| Medical school | 4.0 | 3.7 | 3.8 |
| Program | 27.7 | 22.9 | 25.3 |
| Trainee | 10.3 | 0.0 | 5.1 |
| Subcompetency | 0.0 | 15.3 | 7.7 |
| Residual | 37.9 | 37.2 | 37.6 |
aEstimation conducted using a cross-classified random-effects model with specialty, medical school, program, trainee, and subcompetency as random effects.
There was no clear pattern of increasing or decreasing contribution by specialty or medical school over the 2 time points. For example, the medical school contribution to the variance in practice-based learning and improvement increased from 4.4% at 6 months to 7.4% at 12 months for emergency medicine and decreased from 5.0% to 2.0% for family medicine. For each analysis, unexplained variance ranged from 11% to 50%, depending on the specialty and competency domain (see Supplemental Digital Appendix 1 at http://links.lww.com/ACADMED/B697).
Straight-lining
We found evidence of straight-lining at 6 months for 6,827 of 56,804 PGY-1 residents (12%). The greatest amount of straight-lining was found in surgery (2,105 of 5,559 [38%]) followed by psychiatry (1,084 of 4,233 [26%]), emergency medicine (1,399 of 5,911 [24%]), family medicine (1,438 of 10,295 [14%]), pediatrics (339 of 6,573 [5%]), and internal medicine (462 of 24,233 [2%]).
Specialty-specific differences
Figures 3 and 4 show overall high program variance across most specialties (30%–46%) at 6 and 12 months, with minimal decrease between reporting periods for emergency medicine, family medicine, pediatrics, and surgery. In contrast, the contribution in internal medicine was lower and decreased from 6 to 12 months (15% to 12%). Although the HMs were similar across specialties, specialties rated interns differently. For example, at 6 months, 8,193 of 56,804 interns (14%) were assessed as not having achieved level 1 (432 of 5,911 [7%] in emergency medicine, 1,930 of 10,295 [19%] in family medicine, 2,550 of 24,233 [11%] in internal medicine, 1,267 of 6,573 [19%] in pediatrics, 851 of 4,233 [20%] in psychiatry, and 1,163 of 5,559 [21%] in surgery). Surgery interns were rated lower, with a mean rating of 1.0, and pediatrics interns were rated higher, with a mean rating of 2.5. There was also more variability in scoring for internal medicine compared with other specialties.
Figure 3.

Learner variance by specialty for postgraduate year 1 (PGY-1) residents entering U.S. residency programs in July 2021 and 2022. This figure shows the variance attributed to the learner at midyear and end of year for each specialty. Learner variance is slightly higher at the end of the year for most specialties, with a marked increase at the end of the year for internal medicine (IM). Abbreviations: EM, emergency medicine; FM, family medicine; PEDS, pediatrics; PSYCH, psychiatry; SURG, surgery.
Figure 4.

Program variance by specialty for postgraduate year 1 (PGY-1) residents entering U.S. residency programs in July 2021 and 2022. This figure shows the variance attributed to the program at midyear and end of year for each specialty. Program variance is slightly lower at the end of the year for most specialties, with a marked decrease at the end of the year for internal medicine (IM). These findings mirror those in Figure 3: as the variance attributable to the learner increases, the program variance decreases. Abbreviations: EM, emergency medicine; FM, family medicine; PEDS, pediatrics; PSYCH, psychiatry; SURG, surgery.
At 12 months, there was substantial trainee variability within specialties. For example, trainee variance for interpersonal and communication skills ranged from 26% in pediatrics to 51% in internal medicine. For most specialties, the trainee contribution stayed largely stable between reporting periods; however, it decreased for psychiatry (32% to 26%) and increased for internal medicine (25% to 48%).
Discussion
We found variability in HM performance across 6 specialties due to 4 contributing domains: (1) medical schools, (2) specialty, (3) residency programs, and (4) trainees. In this section, we further explore the domains that contributed to variance, examine the prevalence of straight-lining, and explore specialty-specific differences identified in our results.
Domains contributing to HM variance
Medical school. Overall, the results of this study are similar to those of a previous study,6 which found that medical schools contributed little to the variance in milestone scoring. Despite differences in school-specific educational objectives, graduates appear similarly prepared for residency regardless of the institution (including U.S. medical schools, U.S. osteopathic medical schools, and international programs, for which there may be selection bias).9–11
Trainees and program. In contrast, the variance attributed to trainees and residency programs was much larger than that attributed to medical schools, although there were differences over time. The PGY-1 trainee variance contribution was minimal at midyear and increased somewhat by year end. This finding is consistent with the previous study6 and likely reflects raters' increased knowledge of trainee skill sets. Although this is reassuring, the overall contribution of the trainee to variance was relatively modest for most specialties. This finding is concerning because the assessment system should discriminate among trainees, yet this may not be happening when considering HMs across specialties and programs.
By comparison, the residency program variance was the highest measurable domain across specialties, a finding that was relatively stable over time. Previous studies12,13 have reported similar findings, with low variance attributable to the trainee and higher variance attributable to the program. Although it is possible that some programs have cohorts of similarly high- or low-performing residents, leading to a higher contribution of the variance to the program, it is more likely that program scoring practices are "idiosyncratic and construct irrelevant."13 Furthermore, in a study of transitional year CCC members, participants endorsed highly variable interpretations of milestones, challenging the response process validity associated with scores from these assessments.12 These variable interpretations of milestones become embedded in each CCC's scoring and residency program. These findings of residency program variance underscore the importance of nesting residents within programs in future analyses.
Straight-lining
There was substantial evidence of straight-lining across specialties, with rates as high as 38% in surgery. Beeson et al14 suggested that straight-lining is an early assessment practice that occurs when a committee is less familiar with a trainee or may not have adequate assessments in that domain. Alternatively, programs may select a single score for all subcompetencies within a competency domain because they treat milestones more holistically than they were designed to be used. In a recent qualitative study by Maranich et al,12 CCC members reported that unless data suggest otherwise, they presume residents are performing at an average level and assign a milestone value that represents some perceived norm-referenced value. This construct-irrelevant threat has been demonstrated to varying degrees in studies of milestone assessments.15,16
Specialty differences
There were several marked differences in milestone ratings by specialty. Pediatrics residents were rated higher on essentially every milestone. There are several potential explanations for these findings. First, placement into specialties in the United States is not random. Perhaps interns in some specialties truly perform better in some areas. Medical students select their specialty for a wide variety of reasons,17–19 and we may anticipate differential levels of milestone-based outcomes across individuals from different specialties. Alternatively, differences in shared mental models among specialties may result in stringency or leniency20–22 by specialty, reflecting the scoring of the competency committee based on faculty assessments.8
Second, overall ratings may vary by demographics,23 trainee characteristics,22,24 potential bias,25 and the diversity of academic leadership.26 However, more recent literature has implicated the role of context. For example, a study of pediatric residents and fellows found that ratings on the same subcompetencies generally decreased from residency into fellowship, suggesting that milestones are context specific.27 Our findings suggest that, although the language of the HMs is similar across specialties, context may play a critical role in the assessment of trainees.
Third, the differences may simply represent an artifact of the milestones themselves. Although a major purpose of the HMs was to create a more uniform set of milestones across specialties,4 there are differences in language. For example, pediatrics lists 2 subcompetencies related to patient safety and quality improvement (systems-based practice 1 and 2), whereas internal medicine incorporates these into 1 subcompetency (systems-based practice 1). Likewise, even when subcompetencies align, the descriptive anchors differ somewhat. Although the general sentiment of these anchors may be similar, raters may view performance differently.
In internal medicine, trainee variance was higher at year end (38.5%–53.2%) than at midyear (15.8%–31.4%) and in comparison with all other specialties. In parallel, internal medicine had the lowest rate of straight-lining (2%). The reasons for these findings are unclear. Specialties differ in numbers of residents, culture and context, and CCC intentionality in scoring. Furthermore, the structure of internal medicine training, in which much of the teaching and assessment occurs through brief, intense contact with inpatient teams, may provide a more detailed understanding of performance.
Implications
Although the primary purpose of the HMs was to reduce variation in similar competencies, thus promoting shared assessment instruments and faculty development across specialties, a secondary benefit involves the potential to map medical student performance into residency. Our findings suggest the need for further refinement before the interpersonal and communication skills, professionalism, systems-based practice, and practice-based learning and improvement competencies can uniformly provide constructive feedback to medical schools from various residency programs.
If the differences in milestone performance are truly specialty dependent, residency programs may benefit from clarification of expectations. Alternatively, if these disparities stem from the variability of rater judgments or from the way milestones are drafted across specialties, the comparability we can expect is constrained. Such findings underscore the need for further research into rater leniency, stringency, and bias and for refinement of the response processes governing milestone assessments. Although in this study we concentrated on 4 principal factors that contribute to resident performance on HMs, a broader spectrum of elements likely informs learner readiness and success in residency.
Limitations
Although this study offers valuable insights into the developmental trajectories of medical school graduates as assessed by HMs, several limitations should be considered. Notably absent from our dataset are medical school graduates who did not match into a residency (an estimated 5%) and those who did not complete their medical training (approximately 3% of matriculants).24,28 This exclusion may be meaningful because it primarily pertains to individuals who struggle academically or who may face socioeconomic, mental health, or minority status–related challenges, which could limit the generalizability of our findings. Although it is not known what proportion of these graduates opted not to enter postgraduate training, racial and ethnic minoritized graduates are disproportionately affected by nonmatching into graduate training in some specialties.29
Additionally, our study does not account for the demographic characteristics of trainees, such as race, ethnicity, gender, disability, or economic background, which may be associated with disparate performance ratings and experiences of discrimination and burnout.30,31 Moreover, a portion of our data coincides with the adaptation period to Milestones 2.0 for some specialties and may reflect unfamiliarity with the assessment tool rather than actual trainee performance. Finally, we assessed only interns because their milestone assessments were proximate to medical school training. Further work will need to reexamine the HMs and track trajectories during training.
Future considerations
In refining our understanding of competency development through the use of HMs, our study identified several important considerations for the implementation and interpretation of milestone evaluations. First, the observation of straight-lining in assessment raises questions about the accuracy and meaningfulness of the milestones as tools for evaluating resident progress. If milestone assessments default to a uniform score across subcompetencies without truly reflecting individual performance, the validity of the milestones may be undermined and the identification of areas needing developmental support may be impeded. Training programs should critically examine their assessment methods, ensuring that they leverage the robustness and sensitivity of milestones to capture the authentic breadth of resident abilities. Second, the differing ratings among specialties underscore the need for a more nuanced exploration of the potential causes. Third, the consistently minimal contribution of medical schooling to milestone evaluations reaffirms the notion that graduates, irrespective of the medical school they attended, demonstrate comparable preparedness for residency. This finding may have implications for resident selection processes, suggesting that program directors might place less emphasis on the reputation or ranking of an applicant's medical school and focus more on the individual attributes and competencies that enable success within their specialty.
Conclusions
This study invites a critical appraisal of the current state of milestone assessments and calls for informed discussions about how best to harness them for accurate and equitable assessment of resident development. We need collaborative efforts among educational researchers, clinical educators, and accreditation boards to address these findings. By doing so, we will bolster the evaluative strength of milestones and pave the way for more responsive and personalized postgraduate medical education that prepares residents for the demands of modern health care.
Acknowledgments
The Medical School Outcomes Milestones Study Group includes University of Illinois College of Medicine, Jefferson College of Medicine, University of California Davis School of Medicine, University of Cincinnati College of Medicine, University of Virginia School of Medicine, Virginia Commonwealth University School of Medicine, and the Accreditation Council for Graduate Medical Education (ACGME).
Funding/Support
The Medical School Outcomes Milestone Study Group receives funding from the National Board of Medical Examiners (NBME) Edward J. Stemmler, MD Medical Education Research Fund grant.
Other disclosures
S.O. Hogan is employed by the ACGME. E.S. Holmboe receives royalties from Elsevier Publishing for a textbook on assessment and was employed by the ACGME. Y.S. Park is a consultant for the ACGME. L. Turner is the cofounder of 2-Sigma Smart Educational Technologies.
Ethical approval
This study was approved by the institutional review board of the University of Illinois at Chicago, March 27, 2023, STUDY2023–0361.
Disclaimers
The project and the views expressed in this publication do not necessarily reflect the position or policy of NBME, and NBME support provides no official endorsement.
Footnotes
First published online April 1, 2025
Supplemental digital content for this article is available at http://links.lww.com/ACADMED/B697.
Contributor Information
Michael S. Ryan, Email: DGQ7XK@uvahealth.org.
Tonya L. Fancher, Email: tlfancher@ucdavis.edu.
Tyler Carcamo, Email: tcarcamo@ucdavis.edu.
Sean O. Hogan, Email: shogan@acgme.org.
Laurah Turner, Email: laurah.turner@uc.edu.
Jeffrey J.H. Cheung, Email: jcheung@uic.edu.
Katherine Berg, Email: katherine.berg@jefferson.edu.
Moshe Feldman, Email: moshe.feldman@vcuhealth.org.
Eric S. Holmboe, Email: eholmboe@intealth.org.
Yoon Soo Park, Email: yspark2@uic.edu.
References
- 1.Nasca TJ, Philibert I, Brigham T, Flynn TC. The next GME accreditation system: rationale and benefits. N Engl J Med. 2012;366(11):1051–1056.
- 2.Accreditation Council for Graduate Medical Education. Accessed February 27, 2025. https://www.acgme.org/
- 3.Edgar L, McLean S, Hogan SO, Hamstra S, Holmboe ES. The Milestones Guidebook. Accreditation Council for Graduate Medical Education; 2020.
- 4.Edgar L, Roberts S, Yaghmour NA, et al. Competency crosswalk: a multispecialty review of the Accreditation Council for Graduate Medical Education milestones across four competency domains. Acad Med. 2018;93(7):1035–1041.
- 5.Holmboe ES, Iobst WF. ACGME Assessment Guidebook. Accreditation Council for Graduate Medical Education; 2020.
- 6.Park YS, Ryan MS, Hogan SO, et al. Transition to residency: national study of factors contributing to variability in learner milestones ratings in emergency medicine and family medicine. Acad Med. 2023;98(11S):S123–S132.
- 7.Accreditation Council for Graduate Medical Education. ACGME Milestones by Specialty. Accessed February 27, 2025. https://www.acgme.org/milestones/milestones-by-specialty/
- 8.Andolsek K, Padmore J, Hauer KE, Ekpenyong A, Edgar L, Holmboe E. ACGME Clinical Competency Committees: A Guidebook for Programs. 3rd ed. Accreditation Council for Graduate Medical Education; 2020.
- 9.Chen PG, Curry LA, Bernheim SM, Berg D, Gozu A, Nunez-Smith M. Professional challenges of non–U.S.-born international medical graduates and recommendations for support during residency training. Acad Med. 2011;86(11):1383–1388.
- 10.Murillo Zepeda C, Alcala Aguirre FO, Luna Landa EM, Reyes Guereque EN, Rodriguez Garcia GP, Diaz Montoya LS. Challenges for international medical graduates in the US graduate medical education and health care system environment: a narrative review. Cureus. 2022;14(7):e27351. doi:10.7759/cureus.27351
- 11.McGaghie WC. America's best medical schools: a renewed critique of the U.S. News & World Report rankings. Acad Med. 2019;94(9):1264–1266.
- 12.Maranich AM, Hemmer PA, Uijtdehaage S, Battista A. ACGME milestones in the real world: a qualitative study exploring response process evidence. J Grad Med Educ. 2022;14(2):201–209.
- 13.Hu M, Carraccio C, Osta A, Winward ML, Schwartz A. Reported pediatrics milestones (mostly) measure program, not learner performance. Acad Med. 2020;95(11S):S89–S94.
- 14.Beeson MS, Hamstra SJ, Barton MA, et al. Straight line scoring by clinical competency committees using emergency medicine milestones. J Grad Med Educ. 2017;9(6):716–720.
- 15.Downing SM, Haladyna TM. Validity and its threats. Assess Health Professions Educ. 2009;1:21–56.
- 16.Tanaka P, Park YS, Roby J, et al. Milestone learning trajectories of residents at five anesthesiology residency programs. Teach Learn Med. 2020;33(3):304–313.
- 17.Reed VA, Jernstedt GC, Reber ES. Understanding and improving medical student specialty choice: a synthesis of the literature using decision theory as a referent. Teach Learn Med. 2001;13(2):117–129.
- 18.Weissman C, Schroeder J, Elchalal U, Weiss Y, Tandeter H, Zisk-Rony RY. Using marketing research concepts to investigate specialty selection by medical students. Med Educ. 2012;46(10):974–982.
- 19.Xu R. A differentiation diagnosis: specialization and the medical student. N Engl J Med. 2011;365(5):391–393.
- 20.Gauthier G, St-Onge C, Tavares W. Rater cognition: review and integration of research findings. Med Educ. 2016;50(5):511–522.
- 21.Gingerich A, Kogan J, Yeates P, Govaerts M, Holmboe E. Seeing the 'black box' differently: assessor cognition from three research perspectives. Med Educ. 2014;48(11):1055–1068.
- 22.Harasym PH, Woloschuk W, Cunning L. Undesired variance due to examiner stringency/leniency effect in communication skill scores assessed in OSCEs. Adv Health Sci Educ Theory Pract. 2008;13(5):617–632.
- 23.Boatright D, Anderson N, Kim JG, et al. Racial and ethnic differences in internal medicine residency assessments. JAMA Netw Open. 2022;5(12):e2247649. doi:10.1001/jamanetworkopen.2022.47649
- 24.Lett E, Tran NK, Nweke N, et al. Intersectional disparities in emergency medicine residents' performance assessments by race, ethnicity, and sex. JAMA Netw Open. 2023;6(9):e2330847. doi:10.1001/jamanetworkopen.2023.30847
- 25.Hennein R, Tiako MJN, Bonumwezi J, et al. Vicarious racism, direct racism, and mental health among racialized minority healthcare workers. J Racial Ethn Health Disparities. 2025;12(1):8–21.
- 26.Meadows AM, Skinner MM, Hazime AA, Day RG, Fore JA, Day CS. Racial, ethnic, and sex diversity in academic medical leadership. JAMA Netw Open. 2023;6(9):e2335529. doi:10.1001/jamanetworkopen.2023.35529
- 27.Reed S, Mink R, Stanek J, Tyrrell L, Li ST. Are final residency milestones correlated with early fellowship performance in pediatrics? Acad Med. 2023;98(9):1069–1075.
- 28.Nguyen M, Chaudhry SI, Desai MM, et al. Rates of medical student placement into graduate medical education by sex, race and ethnicity, and socioeconomic status, 2018–2021. JAMA Netw Open. 2022;5(8):e2229243. doi:10.1001/jamanetworkopen.2022.29243
- 29.Curtin LS, Goldstein RB, Lamb DL. Applicant Demographics and the Transition to Residency: It's Time to Leverage Data on Preferred Specialty and Match Outcomes to Inform the National Conversation about Diversity and Equity in Medical Education. Accessed March 10, 2025. https://www.nrmp.org/wp-content/uploads/2023/02/Demographic-data-perspectives-paper_FINAL.pdf
- 30.Boatright D, Edje L, Gruppen LD, Hauer KE, Humphrey HJ, Marcotte K. Ensuring fairness in medical education assessment. Acad Med. 2023;98(8S):S1–S2.
- 31.Boatright D, London M, Soriano AJ, et al. Strategies and best practices to improve diversity, equity, and inclusion among US graduate medical education programs. JAMA Netw Open. 2023;6(2):e2255110. doi:10.1001/jamanetworkopen.2022.55110
