We, along with our colleagues on the Emergency Medicine Standardized Video Interview Working Group (EMSVI working group members are listed in Appendix A), are writing to respond to Drs. Buckley, Hoch, and Huang's commentary on the Association of American Medical Colleges (AAMC) standardized video interview (SVI). As those who have been involved in the SVI project from its beginning, we intend to provide a more balanced perspective on the project and to address the concerns the authors raise.
SVI Background
In summer 2016, the leading societies in academic emergency medicine partnered with the AAMC to develop the SVI. We formed a working group to design and evaluate the SVI. The working group included representatives from the Society for Academic Emergency Medicine (SAEM), the Association of Academic Chairs of Emergency Medicine (AACEM), Clerkship Directors in Emergency Medicine (CDEM), the Council of Emergency Medicine Residency Directors (CORD), the Emergency Medicine Residents' Association (EMRA), the American Academy of Emergency Medicine Resident and Student Association (AAEM‐RSA), and the AAMC Group on Student Affairs (GSA).
In 2016, the AAMC surveyed program directors about the residency selection process. They reported being least satisfied with the information available about applicants' interpersonal and communication skills and professionalism when deciding whom to invite to an in‐person interview. In addition, 60% reported that a lack of reliable information in these competency areas was one of their top three pain points in the current selection process. After reviewing the employment and higher education selection literature, the AAMC concluded that an SVI was the most viable tool for residency selection.
An institutional review board (IRB)‐approved research study of the SVI was conducted from June to December 2016. Applicants in the Electronic Residency Application Service (ERAS) 2017 cycle who expressed interest in applying to emergency medicine or who applied in general surgery, internal medicine, or pediatrics were invited to participate. In total, 1,760 applicants volunteered. Based on the results of this study, we recommended that the AAMC move forward with an operational launch of the SVI for the ERAS 2018 cycle.
SVI Purpose, Content, Format, and Scoring
The purpose of the SVI is to provide program directors with objective, standardized information about applicants' proficiency levels on two of the six Accreditation Council for Graduate Medical Education (ACGME) core competencies, Interpersonal and Communication Skills and Professionalism, that is reliable, valid, and easy to use in a high‐volume context. (For the SVI, the ACGME core competency Professionalism was renamed Knowledge of Professional Behavior to acknowledge that the video interview is not a direct observation of behavior but rather an inference of an applicant's proficiency based on his or her description of past experiences or of what he or she would do in a hypothetical situation.) The video interview is designed for use, along with other selection data, to (1) identify applicants to invite to interview, (2) balance the emphasis on Step scores in the selection process and help broaden the pool of applicants invited to in‐person interviews, and (3) contribute to the ranking process.
The SVI is an online, asynchronous interview with six questions. Questions are presented in text and applicants provide an audio/video response. Applicants have up to 30 seconds to read a question and up to 3 minutes to respond.
The SVI is not a direct observation of behavior. A diverse cohort of human resources professionals rated applicants' responses. They received 16 hours of training before they were permitted to evaluate applicants' responses. Rater training covered the emergency medicine trainee job, the ACGME competencies, a standardized rating process, and unconscious bias. Raters also participated in a calibration activity in which they practiced making ratings and received feedback about their performance. Six raters were assigned to each interview form to minimize the influence that any individual rater had on an applicant's total SVI score. The order in which raters evaluated participants' responses was also randomized, to keep potential rater biases (e.g., order effects) from affecting participants' total SVI scores.
Trained raters make inferences about an applicant's proficiency based on his or her description of past experiences or of what he or she reported should be done in a hypothetical situation. Responses to each of the six questions are rated on a 5‐point scale anchored with detailed descriptions of each proficiency level. The six ratings are summed to create a total score ranging from 6 to 30.
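As a purely arithmetic illustration (our notation, not the AAMC's): if \( r_i \) denotes the rating assigned to the response on question \( i \), the total score is

\[ T = \sum_{i=1}^{6} r_i, \qquad r_i \in \{1, 2, 3, 4, 5\}, \qquad 6 \le T \le 30. \]

An applicant rated 4, 4, 3, 5, 4, and 4 across the six questions, for example, would receive \( T = 24 \).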
Validity
Validity refers to the extent to which evidence and theory support the inferences drawn from scores based on their intended uses. The process of providing evidence of validity is ongoing and involves accumulating multiple sources of evidence. Some sources of evidence become available quickly and others take time to accumulate.1
We respectfully disagree with Drs. Buckley, Hoch, and Huang's conclusion that inferences drawn from SVI scores are not valid. We established the validity of SVI scores using evidence based on content, response processes, and relations with other variables. This approach aligns with best practices in the professional testing literature and legal guidelines.2, 3, 4
The competencies assessed on the SVI are drawn from the ACGME competencies, which have been identified as core competencies required for success in residency. Subject matter experts in graduate medical education reviewed all interview questions and linked each question to the target competencies. Only questions that survived this review were retained. The scoring rubrics were developed from a review of the performance literature and the Milestones from several specialties. A different set of subject matter experts in emergency medicine, including program directors and faculty, reviewed the definitions of each proficiency level and the behavioral examples used to define them. They linked each behavioral example to the target competency and verified its placement on the proficiency scale. This process established job‐relatedness and validity evidence based on content.
We evaluated evidence based on response processes. We trained the raters to use a standardized process and scoring rubrics and evaluated the extent to which they followed that process with a calibration activity: raters rated sample videos using the standardized process and scoring rubrics, and staff then reviewed the quality of the ratings and provided feedback to raters. Raters adhered to the standardized process, focused on content, used the full range of the score scale, and provided ratings consistent with expectations for the sample videos. Rater agreement was strong, with an ICC(2,k) of 0.78.
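For readers less familiar with this statistic, and as a gloss on the figure above (the exact estimator used is not described here), the ICC(2,k) in Shrout and Fleiss's nomenclature is the two‐way random‐effects, average‐measures intraclass correlation,

\[ \mathrm{ICC}(2,k) = \frac{MS_R - MS_E}{MS_R + (MS_C - MS_E)/n}, \]

where \( MS_R \), \( MS_C \), and \( MS_E \) are the two‐way ANOVA mean squares for ratees, raters, and residual error, respectively, and \( n \) is the number of ratees. The statistic approaches 1 when raters order and scale applicants consistently; a value of 0.78 for a score averaged over multiple raters is conventionally regarded as good agreement.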
We also evaluated evidence based on relations to other variables. Using data from the ERAS 2017 cycle, we examined the correlations between SVI scores and scores on conceptually unrelated measures: USMLE Step 1, Step 2 Clinical Knowledge, and Step 2 Clinical Skills. Results showed no or low correlations with these variables.5 This is important, as a new evaluation tool such as the SVI should show little correlation with conceptually unrelated variables. In fact, assessing competencies unrelated to USMLE scores was a guiding purpose of the SVI. Going forward, we will work with a subset of programs to collect data on trainees' PGY‐1 performance, as well as other locally held data such as in‐person interview scores. This will allow us to continue evaluating validity based on relations with other variables and the real‐world experience of educators.
Necessity
Currently, the ERAS application overwhelmingly focuses on academic metrics.6, 7, 8 The standardized letter of evaluation (SLOE) is designed to provide a global perspective on an applicant, with writers evaluating the applicant's commitment, differential diagnosis skills, and overall suitability for emergency medicine, among other competencies and skills. While the SLOE is an important advance over traditional letters of recommendation, several issues have been identified that may limit its effectiveness: (1) “inflated evaluations”; (2) “inconsistency between comments and grades”; and (3) “inadequate perspective on candidate attributes in the written comments.”9 Additional research is planned to evaluate the correlation between SVI and SLOE ratings and the perceived usefulness of the SVI compared to the SLOE.
We respectfully disagree with the position that tools that assess similar competencies should be eliminated from the process. First, we do not think these tools are significantly redundant, as shown by the differences in their stated purposes. Second, none of these tools is perfectly precise. We propose that a potentially more effective approach would be to use these tools in a complementary fashion to obtain a more complete picture of the whole applicant.
Costs
The AAMC shares concerns about the cost of the residency selection process and has undertaken a multipronged approach to provide applicants with information and tools to apply more strategically, thereby reducing the number of applications submitted and the cost of applying.
The AAMC will explore all avenues available to minimize SVI costs without jeopardizing its quality, security, and value. If the SVI is judged to be a useful tool by the community, an honest discussion of cost versus benefit will occur, including an assessment of the total costs of the resident application process for all stakeholders. Drs. Buckley, Hoch, and Huang asserted that the time expended by programs to view SVI videos does not justify their use. We expect that once program directors understand the meaning of SVI scores, they will rely on the scores rather than on the videos.
Another question to consider is whether programs can afford not to use the tool. It is estimated that emergency medicine residency programs spend almost $47 million during the interview season.10 The cost for applicants varies by specialty, with recent estimates ranging from about $8,000 to $9,000.11, 12 Much of this cost is associated with the in‐person interview. What if the SVI could reduce the number of applicants who are invited for interviews but are not a good fit for the program? Could the SVI, along with other application information, help program directors fill in‐person interview slots with applicants who are a better fit for their programs? If so, the SVI could introduce cost and resource efficiencies into the system.
As for other “for‐profit” enterprises entering the residency selection arena, these entities do not possess the intimate knowledge the AAMC has of all the stakeholders in the residency selection process, nor is improving the selection process for the future of medicine their primary concern.
Computer Scoring
We agree that computer scoring of video interviews can appear intimidating, and the AAMC is studying it carefully. It is evaluating whether a computer algorithm can predict the human raters' scores, which are based on the content of responses, and whether such scoring is equally reliable and valid. It is also examining whether computer scoring results in smaller, similar, or larger group differences.
Consequences for Applicants
We share the concern that applicants who have more experience with technology or greater access to resources may have a competitive advantage. In response, the AAMC developed a free resource guide to help applicants prepare and provided applicants unlimited practice attempts within the SVI. Research should be conducted to study how the frequency and type of practice affect SVI scores.
The SVI is a summative assessment. Its purpose is not self‐reflection but to provide information about an applicant's proficiency level on the target competencies at a given moment in time. Like all assessments, the SVI is not perfectly precise. For example, an interviewee's score could be dampened by factors like fatigue or anxiety. This risk is no different from that posed by the Step exams. We addressed it by providing training to program directors that emphasized the importance of interpreting SVI scores in the context of the complete application and of not overinterpreting small differences.
IRB Concerns
According to the Office for Human Research Protections (Code of Federal Regulations Title 45, Part 46), research is a “systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge.”13 The purpose of the SVI administration was to provide program directors with reliable and valid information about applicants' Interpersonal and Communication Skills and Knowledge of Professional Behavior, not to contribute to generalizable knowledge about selection or video interviews.
As is typical with tools used in selection, we recommended that all applicants participate in the SVI to ensure a standardized selection process. Discouraging full participation could have had unintended negative consequences for applicants and programs. To evaluate the psychometric properties of a selection tool, results should be based on either the entire population of interviewees or a representative sample of that population. Without such representation, our ability to evaluate the psychometric properties of the SVI and to provide data that help program directors interpret results would be compromised.1
Several activities planned to evaluate the SVI will require IRB oversight, including program director and applicant surveys and a local validity study with a subset of emergency medicine programs. As is typical, interviewees consented to the AAMC's use of their data when they completed the SVI and the ERAS application.
SVI Fairness
The AAMC took steps to ensure fairness and minimize bias at several points in the SVI development and rating process. First, all interview questions and scoring rubrics were reviewed by two different sets of subject matter experts for potential bias; only the questions and rubric elements that survived these reviews were retained. Second, applicants were permitted to give either hypothetical or past‐experience responses, so all applicants had an opportunity to respond to every question. Third, raters were trained to recognize unconscious bias, to use a standardized process, and to rely on the scoring rubric. The AAMC also provided free preparation materials and unlimited practice attempts to all applicants. Designated institutional officials (DIOs) had the option of limiting access to SVI videos in the same manner as photographs in the ERAS application.
When comparing performance differences between groups, it is important to consider the standardized rather than the observed difference in scores because of possible differences in the groups' score distributions. Results from the ERAS 2017 cycle showed no or small differences between White and underrepresented‐in‐medicine (URM) applicants' performance on the SVI. These findings compare favorably to standardized tests, which typically show large standardized differences between White and Black examinees and moderate standardized differences between White and Hispanic examinees.14
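As an illustration of the distinction (our gloss; the underlying report does not specify the exact estimator), a standardized difference of this kind is typically computed as a Cohen's d effect size,

\[ d = \frac{\bar{X}_1 - \bar{X}_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \]

which expresses the gap between two group means in pooled standard deviation units and therefore allows comparisons across instruments with different score scales. By common convention, \( |d| \) values near 0.2, 0.5, and 0.8 are considered small, moderate, and large, respectively.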
We share the opinion that strong residencies benefit from diversity. Indeed, the ability to meet the future needs of a diverse nation depends upon such diversity. We reject the notion that providing the SVI as an additional tool, one that helps applicants accentuate their nonacademic credentials and enables programs to assess these competencies in a standardized manner, jeopardizes the diversity of future training programs. On the contrary, we believe that the SVI presents an opportunity to enhance the selection process by deemphasizing academic metrics and adding competencies that broaden the diversity of applicants invited to interview and ultimately ranked to match.
Summary
Since the introduction of ERAS in 1995, there has been little change in the residency selection process. The current process is comfortable and arguably effective in delivering residents for training. Nevertheless, the willingness to reflect on, reconsider, and possibly improve this process is a necessary and critical part of our profession. Applicants, medical schools, and members of the graduate medical education community have all expressed concern about overreliance on academic metrics, yet the current residency application remains replete with these credentials. The standardized video interview may provide an opportunity to augment academic credentials with information that refocuses attention on what is truly important in selecting the next generation of physicians who will meet the medical demands of an increasingly diverse population.
The standardized video interview research study provided promising data on the reliability of the tool, and the operational pilot allows us to explore these and other questions further, including the tool's utility, value, and user satisfaction. We humbly ask that people remain open‐minded about the standardized video interview as we evaluate its usefulness. We will not know the full potential of this tool unless we explore the questions posed by the community, share results, and facilitate discourse. We believe this discourse is critically important, and we welcome opportunities to explore these questions together.
Appendix A.
EMSVI working group members
Ashely Alker, MD (University of California San Diego); Andra Blomkalns, MD (University of Texas Southwestern Medical Center); Steve Bird, MD (University of Massachusetts Medical School); Mary Calderone Hass, MD (University of Michigan Health System); Nicole Deiorio, MD (Oregon Health & Science University School of Medicine); Ramnick Dhaliwal, MD (Hennepin County Medical Center); Fiona Gallahue, MD (The University of Washington); Gene Hern, MD (Highland Medical Center); Yolanda Haywood, MD (George Washington University School of Medicine); Kathy Hiller (University of Arizona College of Medicine Tucson); Zach Jarou, MD (Denver Health/University of Colorado); Rahul Patwari, MD (Rush University Medical Center); Christopher Woleben, MD (Virginia Commonwealth University School of Medicine); and Richard Wolfe, MD (Harvard Medical School Beth Israel Deaconess Medical Center).
The authors contributed equally to this manuscript and are listed in alphabetical order.
The authors have no relevant financial information or potential conflicts to disclose.
References
- 1. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. 3rd ed. Washington, DC: American Educational Research Association; 2014.
- 2. Society for Industrial and Organizational Psychology. Principles for the Validation and Use of Personnel Selection Procedures. 4th ed. Bowling Green, OH: Society for Industrial and Organizational Psychology, Inc.; 2003. Available at: http://www.siop.org/_Principles/principles.pdf. Accessed Sep 13, 2017.
- 3. Downing SM. Validity: on the meaningful interpretation of assessment data. Med Educ 2003;37:830–7.
- 4. Guardians v. CSC (1980).
- 5. AAMC Standardized Video Interview Update. May 2017. Available at: https://aamc-orange.global.ssl.fastly.net/production/media/filer_public/c7/6f/c76f2e9f-ccd4-428e-9710-e9bdcea2a9d0/standardized_video_interview_summary_2017_gsa.pdf. Accessed Sep 13, 2017.
- 6. Dunleavy D, Geiger T, Overton R, Prescott J. Results of the 2016 Program Directors Survey: Current Practices in Residency Selection. 2016. Available at: https://members.aamc.org/eweb/upload/Program%20Directors%20Survey%20Report.pdf. Accessed Sep 13, 2017.
- 7. Prober CG, Kolars JC, First LR, Melnick DE. A plea to reassess the role of United States Medical Licensing Examination Step 1 scores in residency selection. Acad Med 2016;91:12–5.
- 8. Roberts C, Khanna P, Rigby L, et al. Utility of selection methods for specialist medical training: a BEME (best evidence medical education) systematic review: BEME guide no. 45. Med Teach 2017 Aug 28:1–17. doi: 10.1080/0142159X.2017.1367375. [Epub ahead of print].
- 9. Love JN, Smith J, Weizberg M, et al. Council of Emergency Medicine Residency Directors' standardized letter of recommendation: the program director's perspective. Acad Emerg Med 2014;21:680–7.
- 10. Van Dermark JT, Wald DA, Corker JR, Reid DG. Financial implications of the emergency medicine interview process. AEM Educ Train 2017;1:60–9.
- 11. Blackshaw AM, Watson SC, Bush JS. The cost and burden of the residency match in emergency medicine. West J Emerg Med 2017;18:169–73.
- 12. Polacco MA, Lally J, Walls A, Harrold LR, Malekzadeh S, Chen EY. Digging into debt: the financial burden associated with the otolaryngology match. Otolaryngol Head Neck Surg 2017;156:1091–6.
- 13. Code of Federal Regulations, Title 45 (Public Welfare), Department of Health and Human Services, Part 46: Protection of Human Subjects. Available at: https://www.hhs.gov/ohrp/regulations-and-policy/regulations/45-cfr-46/index.html. Accessed Sep 13, 2017.
- 14. Sackett PR, Shen W. Subgroup differences on cognitively loaded tests in contexts other than personnel selection. In: Outtz JL, ed. Adverse Impact: Implications for Organizational Staffing and High Stakes Selection. New York, NY: Taylor and Francis Group; 2010:323–46.
