MedEdPublish
2024 Feb 13;13:205. Originally published 2023 Oct 4. [Version 2] doi: 10.12688/mep.19735.2

The impact of interviewer characteristics on residency candidate scores in Emergency Medicine: a brief report

Ryan F Coughlin 1,a, Jessica Bod 1, D Brian Wood 2, Katja Goldflam 1, David Della-Giustina 1, Melissa Joseph 1, Dylan Devlin 1, Ambrose H Wong 1, Alina Tsyrulnik 1
PMCID: PMC10933563  PMID: 38481470

Version Changes

Revised. Amendments from Version 1

The authors would like to thank all reviewers for taking the time to review our work. In response to thoughtful peer review comments, a more robust description was added to the Participants subsection within the Methods to more accurately describe inclusion and exclusion criteria for the data set. The "in-person" interview qualifier was added because interviews have since transitioned to virtual-only for Accreditation Council for Graduate Medical Education (ACGME)-accredited emergency medicine residency applicants. The Discussion was expanded to consider repeating the study to investigate the cumulative influence of score changes by non-program-director interviewers compared with program director scoring. Additional attention was paid to behavioral and holistic interview assessments, adding one reference (Hopson LR et al.). Minor grammatical clarifications were added.

Abstract

Background

At the conclusion of residency candidate interview days, faculty interviewers commonly meet as a group to reach conclusions about candidate evaluations based on shared information. These conclusions ultimately translate into rank list position for The Residency Match. The primary objective was to determine whether the post-interview discussion influences the final scores assigned by each interviewer, and whether interviewer characteristics are significantly associated with the likelihood of changing a score. Based on Foucault’s ‘theory of discourse’ and Bourdieu’s ‘social capital theory,’ we hypothesized that interviewer characteristics, and the discourse itself, would contribute to score changes after a post-interview discussion regarding emergency medicine residency candidates.

Methods

We conducted a cross-sectional observational study of candidate scores for all candidates to a four-year emergency medicine residency program affiliated with Yale University School of Medicine during a single application cycle. The magnitude and direction of score changes, if any, after group discussion were plotted and grouped by interviewer academic rank. We created a logistic regression model to determine the odds that candidate scores changed from pre- and post-discussion ratings related to specific interviewer factors.

Results

A total of 24 interviewers and 211 candidates created 471 unique interviewer-candidate scoring interactions, with 216 (45.8%) changing post-discussion. All interviewers ranked junior to professor were significantly more likely to change their score compared to professors. Interviewers who were women had significantly lower odds of changing their individual scores following group discussion (p=0.020; OR 0.49, 95% CI 0.26-0.89).

Conclusions

Interviewers with lower academic rank had higher odds of changing their post-discussion scores of residency candidates compared to professors. Future work is needed to further characterize the influencing factors and could help create more equitable decision processes during the residency candidate ranking process.

Keywords: Residency selection, Emergency medicine, Interviewer characteristics, Academic rank, Gender, Post-interview discussion, Recruitment, Unconscious bias

Introduction

Given the binding nature of ‘The Match’ in determining residency candidate-program pairings, all parties are incentivized to ensure optimal compatibility during the residency recruitment season. Despite this, a validated scoring system to assess residency candidate interview performances does not exist. Interviewers should therefore consider any factors potentially influencing their scores 1–6.

A literature search of all Ovid MEDLINE(R) database entries from 1946 through November 5, 2020 (the date the search was performed) did not identify previous investigations focused on emergency medicine (EM) candidates or on the impact of a post-interview group discussion. Search terms included: bias, medical school, decision, debrief, interview. A limited number of studies have investigated interviewer characteristics and their possible impact on residency match scores. A recent study reported no significant effect of interviewer sex, faculty academic rank, or title on internal medicine candidate scoring 7. Another found that internal medicine residents involved in interviewing consistently gave candidates more favorable scores than faculty interviewers, but that including resident interviewers did not significantly affect candidates’ initial or final rank list positions 8.

The primary objective was to determine whether the post-interview discussion influences the final scores assigned by each interviewer, and to investigate whether interviewer characteristics are significantly associated with the likelihood of changing a score. Our hypothesis was based on Foucault’s ‘theory of discourse’ and Bourdieu’s ‘social capital theory’ 9. According to the theory of discourse, what a society [in this case, the interviewer group] holds true changes based on the exchange of ideas among those belonging to the society. Social capital theory describes the concept that one’s social position [in this case, interviewer characteristics including academic rank] is a form of resource or commodity that can be used in times of discourse or conflict [in this case, post-interview candidate ranking] 10. Therefore, we hypothesized that interviewer characteristics, and the discourse itself, would contribute to score changes.

Methods

Ethical considerations

The Yale University School of Medicine Institutional Review Board deemed this deidentified study “Not human research” and exempt from consent requirements (Protocol ID #2000025029, determined 2/21/2019). Specifically, no consent for publication was required because the data have been anonymized, and the data alterations have not distorted scientific meaning.

Participants

The study was conducted at a four-year Accreditation Council for Graduate Medical Education (ACGME)-accredited EM residency program affiliated with Yale University School of Medicine, a quaternary referral center in the United States. Subjects were all faculty and resident interviewers during the 2017-18 application cycle. Twenty-four interviewers and 211 candidates were included, yielding 471 unique interviewer-candidate scoring pairings. Residency applicant interviewers included eight senior residents, four chief residents, three clinical instructors, six assistant professors, one associate professor, and two professors of EM. Interviewers were directed to score each candidate on a scale from 0 to 8 and were provided examples of historical scores and the corresponding likelihood to match. Each candidate had a maximum of one resident interviewer. Interviews conducted by the program director (PD) were not included in the data set at the recommendation of our statistician, as post-discussion changes were exceedingly rare and PD scores mirrored the final rank list very closely. Interviews by the faculty interviewer collecting the data for the study were also excluded to avoid bias. In rare instances, a faculty member was unable to attend the debriefing and thus could not provide revised scores; these interactions were excluded from the data set. Ultimately, a total of 454 candidate interviews were included in the analysis, all of which were performed in person.

Data collection

We conducted a cross-sectional, consecutive observational study to determine any change in score resulting from the discussion session at the end of interview days. The interview structure at the study site included four one-on-one in-person interviews. After each interview, but prior to group discussion, interviewers independently scored each candidate numerically on a scale from 0 to 10. The day concluded with a closed discussion session attended by all interviewers. During this discussion, each candidate’s application and interview performance were reviewed by the entire group, allowing an opportunity for shared perspectives and optional revision of initial candidate scores. These scores were ultimately used to create the first iteration of the rank list, which was reviewed again at the final end-of-season discussion. Two scores were thus obtained for each candidate from each individual interviewer: the first immediately following the one-on-one interview, and the second following review of the candidate in the closed group discussion. A validated scoring system to assess residency candidate interview performances does not exist; this two-score method differs from the program’s historical practice of recording a single score after the closed group session, which was anecdotally edited with some frequency after discussion but before submission. Closed group discussions lasted approximately 10 minutes per candidate.

Data were collected by one physician author (a man), and the dataset was de-identified and coded by the residency program coordinator (a non-physician woman), who participated in neither the interviews nor the composition of any portion of this manuscript; data were entered into a Microsoft Excel (RRID:SCR_016137) worksheet. Data from interviews conducted by the physician author who collected the data were not included in the analysis. Interviewers included peers, educators, advisees, and supervisors of the physician data collector. All participants were aware of data collection and its purpose. Gender identification was by self-report. No interviews occurred more than once. No prompts were provided by authors, and no audiovisual recording was used in data collection. No notes other than scores were taken, and corrections of scores after final submission were not allowed. Statistical calculation indicated that our dataset was adequately powered for the planned analyses.
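For concreteness, the scoring data can be thought of as one row per interviewer-candidate pairing, with a pre- and post-discussion score for each. The following minimal Python sketch illustrates one plausible structure and the derivation of the change outcome; the column names and example rows are illustrative assumptions, not the fields of the restricted-access dataset.

import pandas as pd

# Illustrative rows; the restricted-access dataset's fields may differ.
df = pd.DataFrame({
    "interviewer_id":     ["I01", "I01", "I02"],
    "interviewer_rank":   ["assistant_professor", "assistant_professor", "resident"],
    "interviewer_gender": ["woman", "woman", "man"],
    "candidate_id":       ["C001", "C002", "C001"],
    "score_pre":          [6, 4, 7],  # recorded immediately after the interview
    "score_post":         [6, 5, 7],  # recorded after the closed group discussion
})

# Binary outcome modeled in the regression: did the score change at all?
df["score_changed"] = (df["score_pre"] != df["score_post"]).astype(int)

# Signed change, as plotted by interviewer academic rank in Figure 1.
df["score_delta"] = df["score_post"] - df["score_pre"]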

Statistical analysis

We determined the odds of candidate scores changing from before to after discussion, as related to specific interviewer factors, using logistic regression modeling. The following variables were included in the model: interviewer academic rank, interviewer sex, score prior to the discussion, and candidate final rank group. A p-value of <0.05 was chosen as statistically significant, and 95% confidence intervals (CI) were reported. We used IBM SPSS Statistics (RRID:SCR_016479) software (v. 22.0, IBM Corp) to perform statistical analyses. No funding was obtained during this undertaking.
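The authors fit the model in SPSS; as a hedged illustration only, an equivalent specification in Python with statsmodels might look roughly as follows, reusing the hypothetical column names from the earlier sketch plus an assumed rank_list_third column, with reference categories chosen to match Table 1 (professor, men, bottom third). This is not the authors' code.

import statsmodels.formula.api as smf

# Illustrative re-specification of the reported logistic regression.
# Outcome: score_changed; predictors: interviewer rank, interviewer gender,
# pre-discussion score, and the candidate's final rank-list third.
model = smf.logit(
    "score_changed ~ C(interviewer_rank, Treatment('professor'))"
    " + C(interviewer_gender, Treatment('man'))"
    " + score_pre"
    " + C(rank_list_third, Treatment('bottom'))",
    data=df,  # a dataframe shaped like the earlier sketch, at full sample size
).fit()

print(model.summary())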

Results

In total, 216 (45.8%) scores changed from pre- to post-discussion. Logistic regression results are summarized in Table 1 11.

Table 1. OR of interviewers changing candidate scores following group discussion.

(Abbreviations: n = number; OR = odds ratio; CI = confidence interval).

Variable                      n     Adjusted OR (95% CI)    p-value
Interviewer Gender
    Men                       12    Reference value         ──
    Women                     12    0.49 (0.26, 0.89)       0.020
Interviewer Rank
    Professor                 2     Reference value         ──
    Associate Professor       1     9.56 (2.60, 25.40)      <0.001
    Assistant Professor       6     12.78 (5.47, 29.87)     <0.001
    Instructor                3     4.31 (2.02, 9.18)       <0.001
    Chief Resident            4     9.55 (3.92, 23.24)      <0.001
    Resident                  8     4.94 (2.16, 11.33)      <0.001
Candidate Rank List Group
    Bottom Third                    Reference value         ──
    Middle Third                    0.34 (0.20, 0.59)       <0.001
    Top Third                       0.26 (0.14, 0.48)       <0.001
Score Prior to Discussion           1.15 (1.02, 1.31)       0.029

All interviewers ranked below professor were significantly more likely to change their score compared with professors. Candidates in the top two thirds of the ultimate rank list were less likely to have their scores changed post-discussion compared with the bottom third (top third: OR 0.26, 95% CI 0.14, 0.48; middle third: OR 0.34, 95% CI 0.20, 0.59). Interviewers who were women had significantly lower odds of changing their individual scores following group discussion (OR 0.49, 95% CI 0.26, 0.89) compared with men. For a graphical representation of the degree and direction of candidate score change after discussion, see Figure 1.
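As a point of interpretation: each adjusted OR is the exponentiated logistic regression coefficient, and an OR is significant at p < 0.05 exactly when its 95% CI excludes 1 (as with the OR of 0.49, 95% CI 0.26, 0.89 for women). A brief sketch, continuing the illustrative statsmodels model from the Methods:

import numpy as np

# Exponentiate coefficients and confidence bounds to obtain adjusted ORs
# with 95% CIs, as presented in Table 1 (illustrative, using `model` from
# the Methods sketch).
ors = np.exp(model.params)
ci = np.exp(model.conf_int())          # columns 0 and 1: lower, upper bound
table = ci.rename(columns={0: "CI 2.5%", 1: "CI 97.5%"})
table["OR"] = ors
print(table)

# A CI excluding 1 indicates a significant association; e.g., an OR of 0.49
# with CI (0.26, 0.89) means significantly lower odds of a score change.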

Figure 1. Change in interview candidate scores following group interviewer discussion.


Discussion

Post-interview discussion resulted in an increased likelihood of score changes by all interviewers except full professors. One of the full professors in the study was the residency program director, who has a supervisory role over every interviewer within the department except the other full professor. Evaluated through Foucault’s ‘theory of discourse,’ the findings could be explained by the idea that the power structures of the post-interview discussion influenced the outcomes: the junior faculty and resident interviewers could have a conscious or unconscious desire to ‘agree’ with their supervisor 9. Alternatively, they may simply have adjusted their scores upon reconsidering the candidate in light of other interviewers’ perspectives. Another consideration, rooted in ‘social capital theory,’ is that residents and junior faculty are closer to candidates in career progression and so may find it easier to relate to applicants, whereas senior-ranking faculty have more life, clinical, and interviewing experience and may also be less likely to be swayed by the impressions of other senior faculty (i.e., the one full professor did not change their rank) 10. Both interviewer groups use their social context to rate the candidate; thus, scores change after the discussion occurs.

In future investigations it would be interesting to include candidate demographics and non-academic-rank groupings of faculty such as years of practice. Further work could also examine whether score changes may be more likely to affect final rank list positions for particularly strong or particularly weak candidates more than those in the middle.

Given the stakes for all parties involved in the residency matching process, it is particularly important that interviewers consider all variables, including the influence of group discussion observed here, that may affect a candidate’s position on the rank list. This is especially true of senior faculty who must consider their potential influence on junior interviewers, while junior interviewers should recognize their possible vulnerability to this influence.

A larger conversation by national organizations may be warranted regarding the positive and negative aspects of subjective numerical candidate ranking systems and the complexities of associated biases. This study was conducted when all interviews occurred in person, so further work is needed to investigate whether these findings apply to virtual interview procedures. As mentioned in recent work, further consideration of holistic and behaviorally based interview questions and scoring systems may allow programs to better design interview assessments to match their priorities 12. In addition to the small sample size, very few professors and associate professors were interviewing during the investigation window. Further exploration of any cumulative effect that non-PD interviewers have on final rank list position seems a worthwhile next step, since the PD scores were nearly identical to final rank list positions. As previously mentioned, in this case the PD was also a professor with over twenty years of interviewing experience, which is not universally the case among training programs. A replication of this study could also consider using years of interviewing experience as a seniority variable instead of academic rank.

Interviewer reasoning for score adjustment was not evaluated in this study. Further investigation of the rationale for score changes may clarify the influences on interviewers’ decisions, though such reports could themselves be subject to reporting bias. It would also be worthwhile to study any difference in reasoning between gender groups, as our study found a significant gender difference in score change.

The roles of candidate socioeconomic status, race, ethnicity, and gender identity, which have been investigated in medical school interviews as well as several other industries, were not addressed in this study but could be an area for future investigation regarding candidate characteristics and any association with interview scores 7, 8, 13.

Conclusions

Interviewers with lower academic rank had higher odds of changing their post-discussion scores of residency candidates compared to those at the professor level. Future work is needed to further characterize the influencing factors and could help create more equitable decision processes during the residency candidate ranking process.

Acknowledgements

The authors would like to acknowledge Jessica Ray, PhD for her expertise in data analysis and methodology. Dr. Ray has given permission to be named in this manuscript.

Funding Statement

The author(s) declared that no grants were involved in supporting this work.

[version 2; peer review: 1 approved, 2 approved with reservations]

Data availability

Underlying data

Zenodo: Deidentified Data Set For The Impact of Interviewer Characteristics on Residency Candidate Scores. https://doi.org/10.5281/zenodo.8172929 11.

This project contains the following underlying data:

  • Deidentified Data.xlsx (restricted access)

These data may make individuals identifiable even without a data dictionary because of the recorded characteristics of the involved parties, and the dataset pertains to a confidential interview process within the confidential residency match process, so it should not be shared. Though references to individuals are indirect, publishing the data could make individuals identifiable and would also violate the signed agreement referenced below. See section ‘6.4 Confidentiality’ of the Match Participation Agreement for the 2023 Main Residency Match and Supplemental Offer and Acceptance Program (SOAP) on the NRMP website ( https://www.nrmp.org/wp-content/uploads/2022/09/2023-MPA-Main-Match-Program-FINAL-3.pdf).

Data access may be obtained by submitting an electronic request to the corresponding author of The Impact of Interviewer Characteristics on Residency Candidate Scores in Emergency Medicine: A Brief Report. All requests will be reviewed by the authors before access is granted.

References

  • 1. Bandiera G, Regehr G: Reliability of a structured interview scoring instrument for a Canadian postgraduate emergency medicine training program. Acad Emerg Med. 2004;11(1):27–32. 10.1197/j.aem.2003.06.011
  • 2. Bass A, Wu C, Schaefer JP, et al.: In-group bias in residency selection. Med Teach. 2013;35(9):747–751. 10.3109/0142159X.2013.801937
  • 3. Crane JT, Ferraro CM: Selection criteria for emergency medicine residency applicants. Acad Emerg Med. 2000;7(1):54–60. 10.1111/j.1553-2712.2000.tb01892.x
  • 4. Elam CL, Stratton TD, Scott KL, et al.: Review, deliberation, and voting: a study of selection decisions in a medical school admission committee. Teach Learn Med. 2002;14(2):98–103. 10.1207/S15328015TLM1402_06
  • 5. Gennissen LM, Stegers-Jager KM, de Graaf J, et al.: Unraveling the medical residency selection game. Adv Health Sci Educ Theory Pract. 2021;26(1):237–252. 10.1007/s10459-020-09982-x
  • 6. Ginsburg S, Schreiber M, Regehr G: The lore of admissions policies: contrasting formal and informal understandings of the residency selection process. Adv Health Sci Educ Theory Pract. 2004;9(2):137–145. 10.1023/B:AHSE.0000027438.59184.2b
  • 7. Oyler J, Thompson K, Arora VM, et al.: Faculty characteristics affect interview scores during residency recruitment. Am J Med. 2015;128(5):545–550. 10.1016/j.amjmed.2015.01.025
  • 8. Milne CK, Bellini LM, Ravenell KL, et al.: Residents as members of intern selection committees: can they partially replace faculty? Teach Learn Med. 2003;15(4):242–246. 10.1207/S15328015TLM1504_05
  • 9. Dreyfus HL, Rabinow P: Michel Foucault: Beyond Structuralism and Hermeneutics. With an afterword by Michel Foucault. Chicago: University of Chicago Press; 1982.
  • 10. Zackoff MW, Real FJ, Abramson EL, et al.: Enhancing Educational Scholarship Through Conceptual Frameworks: A Challenge and Roadmap for Medical Educators. Acad Pediatr. 2019;19(2):135–141. 10.1016/j.acap.2018.08.003
  • 11. Coughlin RF: Deidentified Data Set For The Impact of Interviewer Characteristics on Residency Candidate Scores. Zenodo [Dataset]. 2023. 10.5281/zenodo.8172930
  • 12. Hopson LR, Dorfsman ML, Branzetti J, et al.: Comparison of the standardized video interview and interview assessments of professionalism and interpersonal communication skills in emergency medicine. Messman A, ed. AEM Educ Train. 2019;3(3):259–268. 10.1002/aet2.10346
  • 13. Ryan T: Addressing bias and lack of objectivity in the Canadian resident matching process. CMAJ. 2018;190(40):E1211–E1212. 10.1503/cmaj.70008
MedEdPublish (2016). 2024 Mar 12. doi: 10.21956/mep.21645.r36015

Reviewer response for version 2

Christie Lech 1

No new comments to add

Have any limitations of the research been acknowledged?

Partly

Is the study design appropriate and does the work have academic merit?

Partly

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Medical education

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

MedEdPublish (2016). 2024 Mar 6. doi: 10.21956/mep.21645.r36016

Reviewer response for version 2

Milad Memari 1

This is an important study attempting to understand the potential influence that social capital factors such as academic rank and gender may have on changes made to interviewee scoring during a post-interview group review. The study question and background are appropriate, supported by theory, and well articulated. In particular, the authors' point in the discussion that the interviewing and ranking process is complex and bears further conversation is an important one, and I applaud the authors for attempting to investigate this further with a rigorous approach.

The authors have evidence supporting the assertion that scores by junior faculty are more likely to change. The question remaining for me is whether the senior evaluators (including the PD, who is one of the professors) have a post-hoc opportunity to influence the rank list, thereby making them less likely to change the actual score in the first iteration of the rank list upon which the analysis is based. The authors mention a final "end-of-season discussion" which may allow further modification. As such, if the discussion session is seen as an information-gathering exercise for the PD and other senior faculty, who may have a later opportunity to state their views compared with the more junior interviewers, the measured difference may be less meaningful. Additional limitations of this study are the small number of faculty involved and its single-institution nature, which limits generalizability. These types of complexities are difficult to account for in a limited setting with a small number of faculty in general.

I fully agree with the authors' suggestion for further studies to evaluate the impact of bias and discussion on rankings for all candidates. This study, despite limitations in generalizability, is helpful as an exploration into this important area of study. Given that these discussions and ranking-adjustment processes are likely universal in residency training programs, more studies like this, and more attempts at understanding the impact of social dynamics on interviewee evaluations, are needed.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

Is the study design appropriate and does the work have academic merit?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Defer to statistician

Have any limitations of the research been acknowledged?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Reviewer Expertise:

Medical Education, Learner Development, Growth Mindset, Coaching

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

MedEdPublish (2016). 2024 Jan 30. doi: 10.21956/mep.21143.r35732

Reviewer response for version 1

Laura R Hopson 1

This is an interesting study, and I will admit that I have contemplated (but never executed) a similar approach to understanding the role of group dynamics in the assignment of interview scores. I appreciate the anchoring of the study in theory and the attempt to explain behaviors.

There are several areas that I believe would benefit from additional attention from the authors in order to present a well-rounded perspective on their data and its analysis.  

First, given their dates, these interactions were almost certainly in person rather than virtual, for both the interview and the scoring discussion. This has the potential to be a significant limitation, given that the analysis relies on social interactions.

I agree that a single "validated scoring system does not exist" for residency interviews, as noted in the introduction. However, one such tool was published for EM by Hopson et al. (2019) 1 (a self-citation). There are also strategies that can increase reliability, such as behaviorally based interview questions and behaviorally anchored scoring systems. In addition, holistic review allows programs to design interview assessments that map to program priorities. A discussion of these would add richness to the discussion at a minimum. I would also appreciate seeing the scoring tool shared, along with a brief discussion of its development and the validity evidence behind it, to strengthen this work.

Data collection processes are clearly reviewed and appear appropriate. The statistical analysis generally appears appropriate. However, I am worried there may be significant missing data and would like to see how these data were treated. Four interviews per candidate with 211 candidates should yield almost 850 data points rather than the 471 included. I would like to see either clarification of this or an accounting for the almost 50% loss of data.

While it may be beyond the scope of this manuscript, there are some interesting nuances in the data which may merit further investigation: namely, the marked variability in group members' propensity to change, which appears particularly prominent among the residents and chief residents in Figure 1. In addition, the degree of influence of the discussion is not linearly related to academic rank. Both of these make me wonder whether another confounder, such as years of interview experience, could be involved.

The authors also have the potential to propose interventions to mitigate the influence of senior members during discussion and I would like to see them add that to this report in the discussion.

Have any limitations of the research been acknowledged?

Partly

Is the study design appropriate and does the work have academic merit?

Partly

Is the work clearly and accurately presented and does it cite the current literature?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

No source data required

Are the conclusions drawn adequately supported by the results?

Partly

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Medical Education including residency selection processes, the match environment, trainee assessment, and development of expertise.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

References

  • 1. Hopson LR, Dorfsman ML, Branzetti J, et al.: Comparison of the Standardized Video Interview and Interview Assessments of Professionalism and Interpersonal Communication Skills in Emergency Medicine. AEM Educ Train. 2019;3(3):259–268. 10.1002/aet2.10346
MedEdPublish (2016). 2024 Feb 5.
Ryan Coughlin 1

Thank you for taking the time to review our work. A new version has been submitted. You have many valid points. We have added the in-person qualifier in the background and discussion. We have added the behavioral discussion items and your citation to the Discussion section. We have added an explanation of the missing data points in the Methods section. The data from the program director and from the faculty member running the study and collecting data were excluded. Not including the PD data was suggested by the statistician, as it was virtually identical to the ultimate final rank list. We agree that years of experience interviewing is almost certainly important and attempt to address this in our Discussion section.

MedEdPublish (2016). 2024 Jan 30. doi: 10.21956/mep.21143.r35735

Reviewer response for version 1

Milad Memari 1

This is an important study attempting to understand the potential influence that social capital factors such as academic rank and gender may have on changes made to interviewee scoring during a post-interview group review. The study question and background are appropriate, supported by theory, and well articulated.

There is a question as to whether the methods allow the study question to be adequately addressed. Given that the more senior evaluators (including the PD, who is one of the professors) may have more insight into each candidate than the more junior interviewers, there are alternative explanations for the differences in rating changes. There may be other reasons why the more senior faculty do not change ratings. It is also difficult to draw conclusions when comparing small numbers (in this case, an "n" of 2 full professors).

Additionally, the statistical description is not fully clear and requires further explanation. The methods state that a logistic regression was used, without further explanation. The adjusted ORs reported are all within the 95% confidence intervals for all of the analyses cited and therefore would presumably not reject the null hypothesis in this instance. Unless I am missing something, it is quite unclear how these are statistically significant outcomes. This is altogether not surprising given that the comparison is potentially underpowered.

This is an important study and would be interesting if expanded to include more individuals such that statistically significant outcomes could be determined. As it stands, given the above limitations, it is unclear what to take away from this manuscript and whether the authors' conclusions can be accepted.

Have any limitations of the research been acknowledged?

Yes

Is the study design appropriate and does the work have academic merit?

Partly

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

No

Are all the source data underlying the results available to ensure full reproducibility?

Partly

Are the conclusions drawn adequately supported by the results?

No

Are sufficient details of methods and analysis provided to allow replication by others?

Partly

Reviewer Expertise:

Medical Education, Learner Development, Growth Mindset, Coaching

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

References

  • 1. Statistical inference by confidence intervals: issues of interpretation and utilization. Phys Ther. 1999;79(2):186–195.
MedEdPublish (2016). 2024 Feb 5.
Ryan Coughlin 1

Thank you for taking the time to review our work. A new version has been submitted. We agree that our small sample size and the experience of senior evaluators are important limitations. We attempted to include these items in the third-to-last paragraph of the Discussion section. We have edited the Methods section to better describe our data set. Regarding the results listed in Table 1, there is indeed a statistically significant difference between the listed groups, allowing us to reject the null hypothesis.

MedEdPublish (2016). 2023 Oct 16. doi: 10.21956/mep.21143.r34993

Reviewer response for version 1

Christie Lech 1

It is important to also comment on how the interviewers were trained to interview candidates and how they were trained to score them, as this is another area of potential bias. In addition, years of experience may come into play. The authors could also touch more on the significance of this work for future outcomes, such as the matching of highly ranked residents.

Have any limitations of the research been acknowledged?

Partly

Is the study design appropriate and does the work have academic merit?

Partly

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Medical education

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

MedEdPublish (2016). 2024 Feb 5.
Ryan Coughlin 1

Thank you for taking the time to review our work. A new version has been submitted. We have added a sentence regarding scoring training in the first paragraph of the Methods section. We agree that years of experience would be interesting and important and have attempted to include it in the third-to-last paragraph of the Discussion section.


