AEM Educ Train. 2023 Sep 15;7(5):e10904. doi: 10.1002/aet2.10904

Educator's blueprint: Holistic applicant file review in undergraduate and postgraduate medical education

Eric Shappell 1, Keme Carter 2, Yoon Soo Park 1, Michael Gottlieb 3

Abstract

Medical schools and graduate medical education programs are tasked each year with selecting the next class of trainees, often from large applicant pools with enormous quantities of data to be processed. Review of applicant files must therefore be efficient, equitable, and effective in maximizing the likelihood of trainee success and alignment with institutional missions and values. In this article, we discuss 10 strategies to optimize the file review process for trainee selection. Using these strategies, educators can ensure rigorous and accountable file review processes for their training programs.

INTRODUCTION

Each year, medical schools and residency programs are tasked with selecting their next class of trainees. Applicant pools for these positions are large, with individual medical schools receiving over 6000 applications on average and some residency programs receiving over 2500 applications. 1, 2 With substantial information provided for each applicant, limited resources available to selection committees, and high stakes for selecting trainees who will succeed and align with institutional missions, selection processes must be efficient without sacrificing rigor and equity.

Many decisions must be made when designing a system for efficient and effective review of applicant files. These include deciding who will review the files, which domains are assessed, how performance is measured, and how decisions are made once reviews are complete. In addition, these decisions may need to be contextualized to specific characteristics of an application year, such as variations in application volume, performance metrics, or demographics. Recent trends toward pass/fail grading and the identification of bias in several metrics used in these processes underscore the importance of a holistic approach to selection, with careful attention to how ratings are generated and used, to minimize bias. 3, 4, 5 In this article, we discuss 10 strategies for applicant file review, focusing on efficiency, equity, and accountability.

TEN STRATEGIES

1. Set the program mission, values, and goals for holistic review

To attract and select trainees who will thrive within and contribute meaningfully to a training environment, the mission, values, and goals of a program need to be well developed, easily accessible to applicants, and embraced by selection committees. 6, 7, 8, 9 Every medical school and residency program should work to identify and graduate professionals who are prepared to meet the health needs of our communities. 10 However, health needs throughout the country are as varied as the patients are diverse. Training programs should consider the nation's existing gaps in care as well as their institutional strengths and local geographic setting when developing a mission and training goals for their graduates. Some institutions may have a robust research infrastructure that prepares graduates to advance biomedical or clinical research and innovation. Based on the local or regional environment, programs may consider physician service to distinct patient populations (e.g., urban vs. rural) or practice settings (e.g., community based vs. academic) as central to their mission. Before an applicant can be holistically evaluated on their potential for success within a program, the program must develop an explicit, well‐publicized mission statement that includes significant institutional qualities and values that influence the training experience and describes how the institution hopes to contribute to health equity and the overall health care landscape of the country. 6, 10

2. Define assessment rubric domains that align with program mission, values, and goals

Holistic review is an applicant assessment strategy, endorsed by the Association of American Medical Colleges, that allows for the recruitment of diverse, mission‐aligned trainees through a flexible consideration of their experiences, attributes, and metrics (Figure 1). 11 The holistic review process depends on the development of an assessment rubric consisting of specific domains; these domains should guide applicant evaluation at the screening, interview, and final assessment phases and should be closely tied to the institution's mission and predetermined qualities of a successful trainee. Medical students and residents must be prepared to manage the rigor of an academic curriculum. To that end, academic performance in the form of traditional metrics (e.g., test scores, grades) should be one of many domains that selection committees consider, and careful consideration should be given to the range of metrics compatible with success. 12, 13 Clinical exposure and motivation to pursue medicine or a specific specialty may be important domains for evaluating prospective medical students and residents, respectively. Selection committees should develop domains to evaluate personal attributes and professional experiences that signal future success in their trainees. Personal attributes may include resilience, teamwork, cultural competence, commitment to the underserved, or languages spoken. 14 Valued professional experiences may include substantive demonstrations of service, leadership, and productivity in research. 15 Once domains are drafted, they should be compared against the institutional mission, vision, and values to ensure that prioritized areas are represented.

FIGURE 1 The Association of American Medical Colleges Experiences–Attributes–Metrics Model for Holistic Review. Used with permission from the Association of American Medical Colleges.

3. Develop a rating system for assessment of applicants’ alignment with desired domains

Each domain that is a focus of recruitment should be assessed. While universally achievable pass/fail thresholds have gained popularity in other areas of health professions education, in scenarios where scarce resources must be distributed, a system for ordinal ranking facilitates the matching of resources (e.g., interview offers) to the population (e.g., applicant pool). To facilitate ordinal ranking, quantitative assessments of performance in target domains are helpful, whether directly measured or scored by reviewers (e.g., "two manuscripts," "4 out of 5 rating for demonstrated scholarship"). Qualitative data can also be recorded to complement numerical ratings but are difficult to rank ordinally on their own. The size of a rating scale should be large enough to distinguish levels of performance but small enough that differences have meaning and are reproducible across raters. For example, a scale for leadership experience from 1 to 100 is likely too large (i.e., the difference between an 86 and an 87 is of unclear significance), while a binary scale of 0 and 1 (i.e., no leadership experience vs. leadership experience) will likely fail to stratify applicant performance to the desired degree. Rating scales should be developed iteratively based on feedback from users to optimize this balance. Finally, scales should have objective anchors (e.g., demonstrated behaviors) rather than subjective anchors (e.g., perceived potential). For example, "elected leader of local organization" is a demonstrated behavior and preferable to a subjective anchor like "leadership potential."
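To make the idea of behaviorally anchored scale points concrete, the minimal Python sketch below encodes a five‐point leadership scale. The domain, scale size, and anchor wording are illustrative assumptions, not recommendations from this article.

```python
# A minimal sketch of an anchored ordinal rating scale. The domain,
# scale size, and anchor wording are illustrative assumptions only.
LEADERSHIP_ANCHORS = {
    1: "No leadership roles described",
    2: "Member of organizations without a defined leadership role",
    3: "Appointed leader of a committee or project",
    4: "Elected leader of a local organization",
    5: "Elected leader of a regional or national organization",
}

def describe_rating(score: int) -> str:
    """Return the demonstrated-behavior anchor for a 1-5 leadership rating."""
    if score not in LEADERSHIP_ANCHORS:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    return LEADERSHIP_ANCHORS[score]

print(describe_rating(4))  # "Elected leader of a local organization"
```

Note that every anchor describes a demonstrated behavior, which keeps raters judging the record rather than perceived potential.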

4. Consider using composite and gestalt scores

Composite scores provide a summary measure that can be helpful in efficiently stratifying performance, particularly for outlier identification. For example, in a system with three domains rated 1–10, recording a summative composite score (range 3–30) may prompt re‐review of applicants with scores in the high 20s who did not receive an interview, or of those with scores below 10 who did receive an interview, to ensure resources are distributed as intended.
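A minimal sketch of this re‐review trigger, following the three‐domain example above; the specific thresholds and field names are assumptions for illustration.

```python
# Sketch of composite-score outlier flagging for three domains rated 1-10
# (composite range 3-30). Thresholds mirror the example in the text
# ("high 20s" and "below 10"); the parameter names are hypothetical.
def flag_for_rereview(domain_scores: list[int], interviewed: bool,
                      high: int = 27, low: int = 10) -> bool:
    """Flag applicants whose composite score conflicts with the
    interview decision, prompting a second look at the file."""
    composite = sum(domain_scores)  # 3-30 for three 1-10 domains
    missed_strong = composite >= high and not interviewed
    invited_weak = composite < low and interviewed
    return missed_strong or invited_weak

# Example: a 9/9/10 applicant who was not offered an interview is flagged.
assert flag_for_rereview([9, 9, 10], interviewed=False) is True
```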

Gestalt scores can also be used to overcome some limitations of standard scores, including the ability to credit truly exceptional performances that extend far beyond standard measures. For example, at an institution that prioritizes research, an applicant with extensive research experience far beyond the top of a "research experience" rating scale may receive a high gestalt score to acknowledge this performance in a way that the typical rating scale or composite score may fail to detect. Because of their inherent subjectivity and potential for bias, gestalt scores, including assessments of "fit," should be used with caution. 16

5. Limit the use of noncompensatory scores to facilitate holistic review

Noncompensatory scores are those that are assessed in isolation; a decision is made based on the result regardless of how high or low other performance metrics are. Classic examples of scores used in a noncompensatory fashion are Medical College Admission Test scores for medical school admissions and United States Medical Licensing Examination scores for residency recruitment. While it can be tempting to use noncompensatory scores to efficiently screen out candidates less likely to succeed without a full file review, this approach runs counter to holistic review and to the wide range of scores that are compatible with success in medical training. If using noncompensatory scores, consider extreme limits (e.g., "failed multiple courses" as opposed to "any course failure"). The files that pass through to full review because of this change are unlikely to score highly unless there are redeeming factors elsewhere in the application that make the applicant significantly more desirable to recruit, in which case the change will help detect additional desirable candidates.
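The sketch below contrasts the extreme‐limit screen suggested above with a broader one; the threshold value and function names are illustrative assumptions.

```python
# Sketch of a noncompensatory screen using an extreme limit ("failed
# multiple courses") rather than a broad one ("any course failure").
# The max_failures threshold is an illustrative assumption.
def passes_screen(course_failures: int, max_failures: int = 1) -> bool:
    """Advance the file to full holistic review unless failures exceed
    the extreme limit; a single failure alone does not screen out."""
    return course_failures <= max_failures

print(passes_screen(0))  # True: no failures
print(passes_screen(1))  # True: a single failure still gets full review
print(passes_screen(3))  # False: multiple failures trigger the screen
```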

6. Select file reviewers

When assembling a team of file reviewers, one must strive to balance having sufficient numbers so as not to overburden reviewers with ensuring that reviewers have sufficient training and experience. File review requires a substantial time commitment for initial training, review, and subsequent application discussions. As such, it is valuable to have a larger number of reviewers to divide this effort. However, if the number of reviewers is too large, reviewers may not have sufficient applications to review, leading to skill decay and inconsistency in application evaluation. The ideal number of files per reviewer should strike this balance and may vary by role (e.g., program director or dean of admissions vs. trainees or alumni).
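The expected workload can be estimated with simple arithmetic before committing to a team size; a back‐of‐the‐envelope sketch, where the application volume, review counts, and team size are hypothetical.

```python
# Back-of-the-envelope reviewer workload estimate. All numbers in the
# example are hypothetical.
def files_per_reviewer(n_applications: int, reviews_per_file: int,
                       n_reviewers: int) -> float:
    """Average number of files each reviewer reads for full coverage."""
    return n_applications * reviews_per_file / n_reviewers

# 2500 applications, each read independently by 2 reviewers,
# shared across a team of 25:
print(files_per_reviewer(2500, 2, 25))  # 200.0 files per reviewer
```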

It is important to ensure diversity and representation in the reviewer team throughout the process to broaden the lenses through which applicants are viewed and to reduce bias. 17 Trainees can also provide a valuable viewpoint as those who best understand the program from a learner perspective. However, access to files can expose residents to otherwise private information about future colleagues. Therefore, we suggest including trainees in the review process but recommend providing partially redacted files or engaging graduating trainees who would not work with the applicants if they are recruited to the institution.

7. Train file reviewers

Prior to reviewing, raters must be trained on the specific tool used (see Strategies 3 and 4). Begin by discussing the tool, anchors, and criteria used for each item. This should ideally include exemplars for each end of the scale to facilitate initial rater calibration. Training should include a discussion of common types of rating errors and biases. 18 Raters should undergo implicit bias assessment followed by dedicated training on bias mitigation. 19 Single‐session training is often insufficient; therefore, this training should be repeated at regular intervals. Raters should then review a sample of applications and compare their scores with the group's, with feedback provided regarding areas of discordance. This process further calibrates rater scoring and identifies raters' tendencies toward higher or lower average scores (i.e., severity or leniency). After initial training, ongoing reassessment with repeat training and feedback will help ensure consistent scoring throughout the process. 20

8. Pilot test the review process with a purposeful sample of applications

Prior to full deployment, it is helpful to pilot the tool with a meaningful sample of files to assess response process validity (i.e., whether the tool is being interpreted as intended) and ease of use across a range of circumstances. 21 Begin by reviewing the tool as a small group to ensure all questions are clearly written with appropriate rating options and anchors. 22 Then, pilot test the tool on a meaningful sample of applications (e.g., 30 applications) with all reviewers. Consider using strategies such as "think‐aloud" or cognitive interviewing to elicit how reviewers interpret the questions. 21 Use this process to develop and refine your initial system and then solicit continued feedback throughout the process.

9. Assess inter‐rater agreement measures

Once files are reviewed, inter‐rater agreement should be assessed. While measurement precision is not the goal in itself, agreement among file reviewers reflects the efficacy of rater training and the clarity of the rating tool. Exact agreement is the percentage of ratings on which reviewers assign the same score, whether a subscore (e.g., "scholarship") or a summative decision (e.g., "invite for interview"). Cohen's kappa can also be used to assess agreement; it adjusts for the probability of agreement by chance. Some variability is expected, as tools will not perfectly cover all circumstances and reviewers bring diverse perspectives, but substantial variability should prompt reevaluation of the tool and the training process.
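For two raters, Cohen's kappa is computed as kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed exact agreement and p_e is the agreement expected by chance from each rater's marginal frequencies. A minimal sketch of both measures follows; the example interview decisions are hypothetical.

```python
from collections import Counter

# Sketch of exact agreement and Cohen's kappa for two raters.
# kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
# and p_e is chance agreement from the raters' marginal frequencies.
def cohens_kappa(rater_a: list, rater_b: list) -> float:
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical "invite for interview" decisions from two reviewers:
a = ["invite", "invite", "reject", "invite", "reject", "reject"]
b = ["invite", "reject", "reject", "invite", "reject", "invite"]
print(cohens_kappa(a, b))  # ~0.33: fair agreement despite 4/6 exact matches
```

The example illustrates why kappa is worth reporting alongside exact agreement: the raters match on four of six decisions, yet much of that overlap is expected by chance.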

10. Analyze outcomes and refine the system for the next year

Two outcomes to assess after recruitment are: (1) Was the process successful in recruiting candidates aligned with institutional goals? and (2) Were these candidates recruited in a defensible manner? These are questions about the validity of the system, and a useful framework for formally exploring them is Messick's unified theory of validity. 23 Key steps in this process include analyzing data on the number of files reviewed, interview invitations sent and accepted, and matriculation/match results. Assess the degree to which file scores correlated with outcomes of interest, including successful recruitment to the institution as well as performance during and after training. These outcome analyses provide evidence of the defensibility and consequential validity of the overall review process. Did the system successfully identify the applicants you sought? If not, consider how the system can be updated. Finally, gather feedback from reviewers (e.g., What components of the system worked well? What was difficult or unclear?) and provide reviewers with feedback on their performance (e.g., accept/decline rates and overall scoring compared to the group, time to review, agreement measures from Strategy 9). Use this feedback to refine your tool and processes for the next recruitment season.
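A minimal sketch of one such score‐outcome check, correlating composite file scores with a binary recruitment outcome using Python's standard library (statistics.correlation requires Python 3.10 or later); all data below are hypothetical.

```python
from statistics import correlation, mean

# Sketch of a score-outcome check: do composite file scores track the
# recruitment outcome of interest? All data below are hypothetical.
scores       = [28, 25, 24, 21, 19, 17, 14, 12]
matriculated = [1,  1,  0,  1,  0,  0,  0,  0]  # 1 = joined the program

# Point-biserial correlation (Pearson's r with a binary outcome):
print(round(correlation(scores, matriculated), 2))  # ~0.70

# Mean score by outcome group as a simpler, more interpretable check:
joined = [s for s, m in zip(scores, matriculated) if m]
others = [s for s, m in zip(scores, matriculated) if not m]
print(round(mean(joined), 1), round(mean(others), 1))  # 24.7 vs 17.2
```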

CONCLUSIONS

File review is an important step in selecting trainees who are likely to succeed at the next level and align with institutional missions. The 10 strategies discussed in this article can be used to design file review processes that are efficient while maintaining rigor and accountability.

AUTHOR CONTRIBUTIONS

Study concept and design: Eric Shappell, Michael Gottlieb. Acquisition of the data: Eric Shappell, Keme Carter, Yoon Soo Park, Michael Gottlieb. Analysis and interpretation of the data: Eric Shappell, Keme Carter, Yoon Soo Park, Michael Gottlieb. Drafting of the manuscript: Eric Shappell, Keme Carter, Yoon Soo Park, Michael Gottlieb. Critical revision of the manuscript for important intellectual content: Eric Shappell, Keme Carter, Yoon Soo Park, Michael Gottlieb. Statistical expertise: Eric Shappell, Yoon Soo Park. Acquisition of funding: N/A.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

Shappell E, Carter K, Park YS, Gottlieb M. Educator's blueprint: Holistic applicant file review in undergraduate and postgraduate medical education. AEM Educ Train. 2023;7:e10904. doi: 10.1002/aet2.10904

Supervising Editor: Sam Clarke

REFERENCES

1. Association of American Medical Colleges. U.S. MD‐Granting Medical School Applications and Matriculants by School, State of Legal Residence, and Gender, 2022–2023. 2022.
2. Electronic Residency Application Service. ERAS Statistics Preliminary Data—Residency. February 23, 2023.
3. Boatright D, Ross D, O'Connor P, Moore E, Nunez‐Smith M. Racial disparities in medical student membership in the Alpha Omega Alpha honor society. JAMA Intern Med. 2017;177:659‐665.
4. O'Sullivan L, Kagabo W, Prasad N, Laporte D, Aiyer A. Racial and ethnic bias in medical school clinical grading: a review. J Surg Educ. 2023;80:806‐816.
5. Alvarez A, Mannix A, Davenport D, et al. Ethnic and racial differences in ratings in the medical student standardized letters of evaluation (SLOE). J Grad Med Educ. 2022;14:549‐553.
6. Nakae S, Porfeli EJ, Davis D, et al. Enrollment management in undergraduate medical school admissions: a complementary framework to holistic review for increasing diversity in medicine. Acad Med. 2021;96:501‐506.
7. How to write a mission statement & mission statement examples. 2017. Accessed November 16, 2017. https://www.thebalance.com/how-to-write-a-mission-statement-2948001
8. How to write a vision statement. 2016. Accessed November 16, 2017. https://www.thebalance.com/how-to-write-a-vision-statement-2947992
9. Shappell E, Shakeri N, Fant A, et al. Branding and recruitment: a primer for residency program leadership. J Grad Med Educ. 2018;10:249‐252.
10. Addams A, Bletzinger R, Sondheimer HM, White SE, Johnson LM. Roadmap to Diversity: Integrating Holistic Review Practices into Medical School Admission Processes. Association of American Medical Colleges; 2010.
11. Association of American Medical Colleges. Holistic review. 2023. Accessed March 30, 2023. https://www.aamc.org/services/member-capacity-building/holistic-review
12. Aibana O, Swails JL, Flores RJ, Love L. Bridging the gap: holistic review to increase diversity in graduate medical education. Acad Med. 2019;94:1137‐1141.
13. Dunleavy DM, Kroopnick MH, Dowd KW, Searcy CA, Zhao X. The predictive validity of the MCAT exam in relation to academic performance through medical school: a national cohort study of 2001‐2004 matriculants. Acad Med. 2013;88:666‐671.
14. Koenig TW, Parrish SK, Terregino CA, Williams JP, Dunleavy DM, Volsch JM. Core personal competencies important to entering students' success in medical school: what are they and how could they be assessed early in the admission process? Acad Med. 2013;88:603‐613.
15. Barcelo NE, Shadravan S, Wells CR, et al. Reimagining merit and representation: promoting equity and reducing bias in GME through holistic review. Acad Psychiatry. 2021;45:34‐42.
16. Shappell E, Schnapp B. The F word: how "fit" threatens the validity of resident recruitment. J Grad Med Educ. 2019;11:635‐636.
17. Gallegos M, Landry A, Alvarez A, et al. Holistic review, mitigating bias, and other strategies in residency recruitment for diversity, equity, and inclusion: an evidence‐based guide to best practices from the Council of Residency Directors in Emergency Medicine. West J Emerg Med. 2022;23:345‐352.
18. Royal K. Forty‐five common rater errors in medical and health professions education. Educ Health Prof. 2018;1:33‐35.
19. Capers Q IV. How clinicians and educators can mitigate implicit bias in patient care and candidate selection in medical education. ATS Sch. 2020;1:211‐217.
20. Feldman M, Lazzara EH, Vanderbilt AA, DiazGranados D. Rater training to support high‐stakes simulation‐based assessments. J Contin Educ Health Prof. 2012;32:279‐286.
21. Hill J, Ogle K, Gottlieb M, Santen SA, Artino AR Jr. Educator's blueprint: a how‐to guide for collecting validity evidence in survey‐based research. AEM Educ Train. 2022;6:e10835.
22. Hill J, Ogle K, Santen SA, Gottlieb M, Artino AR Jr. Educator's blueprint: a how‐to guide for survey design. AEM Educ Train. 2022;6:e10796.
23. Yudkowsky R, Park YS, Downing SM. Assessment in Health Professions Education. 2nd ed. Routledge; 2020.
