The Journal of Education in Perioperative Medicine: JEPM. 2021 Jul 1;23(3):E671. doi: 10.46374/volxxiii_issue3_woo

Gender Differences in the Language of LORs Written for Anesthesiology Medical Student Applicants: Analysis of One Program’s Recruitment Cycle

Jacqueline Y H Woo, Apolonia E Abramowicz, Mario A Inchiosa Jr, Sherin Abraham, Garret Weber
PMCID: PMC8491635  PMID: 34631969

Abstract

Background:

Prior studies have demonstrated gender differences in language used in letters of recommendation (LOR) for residency applicants. No previous studies have investigated linguistic gender differences in LOR specifically in the field of anesthesiology. The objective of this study is to determine whether there are potential gender biases in the language of LOR written for anesthesiology residency applicants.

Methods:

Letters sent through the Electronic Residency Application Service to a single training program in the Northeast during the 2019–2020 application cycle were divided into 2 groups according to the applicant’s self-identified gender (male or female). The letters were deidentified, converted to machine-readable text, and input into text analysis software, and language use and word count were compared between the 2 groups.

Results:

Included in this analysis were 316 applicants (113 female and 203 male) who submitted a total of 1132 letters: 409 written for female applicants and 723 written for male applicants. Analysis of 4 document characteristics and 19 psychological construct word categories showed that letters for male applicants had a higher frequency of tentative words (P = .0110), whereas letters for female applicants had a higher frequency of ability words (P = .0449). No other statistically significant differences were found.

Conclusions:

Although our results demonstrated 2 differences in the language of LORs written for male and female anesthesiology residency applicants, it is reassuring that the LORs were otherwise relatively free of linguistic bias. Future research should focus on other areas of the specialty’s recruitment process in order to recognize and mitigate gender differences in anesthesiology.

Keywords: Anesthesiology, internship and residency, medical education, females, males, gender bias

Introduction

Gender discrepancies have long been of interest in the medical field. According to the Association of American Medical Colleges (AAMC), 2017 was the first year in which female medical school matriculants were the majority; in the most recent published data, women represented 50.5% of all medical students.1 Yet many medical specialties still show disparities in gender representation in both their resident and physician populations, which raises questions about the factors that may contribute to gender differences in the medical profession.2

One possible factor is the residency application process, an often competitive and time-intensive undertaking entailing review of a medical student’s grades, standardized test scores, and a set of selected letters of recommendation (LORs). LORs serve as a major subjective component of residency selection, in contrast to objective applicant statistics and scores, and have been found to contain gendered differences in descriptions of applicants.2–7 Linguistic differences have been identified in both narrative LORs and standardized letters of evaluation (SLOEs) describing male and female residency applicants in specialties such as ophthalmology, urology, radiology, and surgery.2–7 These studies suggest that residency recruitment may contribute to the observed gender disparities in various medical specialties.

Disparities in gender representation in different specialties are well documented.2–7 In a study reviewing recommendation letters for 440 ophthalmology applicants, significant differences were found between letters describing males and females. Despite comparable applicant statistics, including US Medical Licensing Exam (USMLE) Step 1 scores and demographics, descriptions of male applicants contained more authentic and leisure language than those of female applicants, whereas females were more often described with feel words and biological processes words.2 Differences in language were also found in a review of 460 letters for urology residency applicants, in which letters for male applicants had a more authentic tone.3 In addition, males in that study were described with more references to personal drive, power, and work than female applicants, and letters referencing power were correlated with a successful urology match.3 Radiology residency LORs contained more agentic descriptions for female than for male applicants, but no differences were found in the use of communal language.4 In surgical residency LORs, letters for male applicants contained more total words than those for female applicants. Additionally, standout adjectives such as exceptional were used more frequently to describe males, whereas grindstone adjectives (such as hardworking) and work ethic descriptors were used more often to describe females.5 In contrast, orthopedic surgery LORs describing females were longer and contained slightly more achieve words than letters describing male applicants.6 Gender differences have also been found in SLOEs written for emergency medicine residency applicants; in that study, female applicants were more often described with communal language, such as teamwork and helpfulness.7

Gender bias may be of particular importance for anesthesiology because of gender discrepancies within the field. Prior studies have shown that anesthesiology has been a male-dominated field.8 The AAMC’s most recent available data report that females represented only 34% of the anesthesiology resident population in 2018 and 33% in 2019.9,10 In the AAMC’s Active Physicians by Sex and Specialty, 2019 report, 74.1% of active anesthesiologists were male and 25.9% were female.11 Meanwhile, the Electronic Residency Application Service (ERAS) reported that in 2018, 2019, and 2020, respectively, 32%, 32%, and 33% of applicants who reported their gender were female.12 Although the ratio of women to men among anesthesiology residency applicants is relatively congruent with that among anesthesiology residents, implicit biases may still be present and quantifiable at various points in the process.

To the investigators’ knowledge, no studies have examined differences in word categories and word choices between male and female applicants’ LORs for anesthesiology residency. We sought to test the null hypothesis that there are no gender differences in the language of LORs written for anesthesiology residency applicants. We investigated the word choices and word counts that LOR authors used to describe male and female applicants during the 2019–2020 anesthesiology residency application cycle to a single anesthesiology training program in the northeastern United States accredited by the Accreditation Council for Graduate Medical Education.

Materials and Methods

LORs from anesthesiology residency applicants were submitted through ERAS. The letters used in this study were from the 2019–2020 application cycle and included those written for applicants from schools accredited by the Liaison Committee on Medical Education and the Commission on Osteopathic College Accreditation. The letters available in ERAS for analysis were those of prescreened medical student applicants; they were chosen because they represent all of the equally qualified, interview-eligible candidates for a single institution during 2019–2020.

All letters were deidentified and sorted according to the applicant’s self-identified gender on the ERAS application (male or female only). The body text of each letter (ie, with salutation and signature removed) was converted to machine-readable text using Adobe Acrobat Pro DC (version 20.0; San Jose, California). The resulting text was input into the Linguistic Inquiry and Word Count software (LIWC2015; Austin, Texas), a validated text analysis application that quantifies language metrics.11 LIWC has been used in previous studies of gender differences in LORs for other specialties.2–7 It has been used to analyze not only medical documents but also social and physical science documents.13,14 For example, LIWC was used to determine differences in LORs describing applicants for chemistry and biochemistry faculty positions.11,12

Word categories relevant to LORs were selected for analysis. They include the 4 categories that quantify document characteristics (ie, analytic thinking, clout, authenticity, and emotional tone) and 19 psychological construct word categories (Table 1). We selected these 19 categories on the basis of prior studies, choosing the categories most commonly compared in that literature and those most relevant to LOR language.4–7,15 The 4 document characteristic categories and 12 of the 19 psychological construct word categories are created and defined by LIWC and are included automatically in its text analysis; we refer to these as predefined categories. The 12 predefined psychological construct word categories are positive emotion, negative emotion, social, tentative, drives, achievement, power, insight, leisure, and certainty, as well as male and female, which were included to confirm applicants’ genders. The remaining 7 categories are user-defined, ie, created and defined on the basis of prior studies and then added manually to the LIWC program’s dictionary. These user-defined categories are grindstone, ability, standout, teaching, research, communal, and agency; all were developed and used in previously published studies of gender differences.4–7,15 LIWC expresses each of the 4 document characteristic categories as a composite score on a scale of 0 to 100. For the remaining 19 predefined and user-defined categories, LIWC counts each occurrence of a word belonging to the category and reports the count as a percentage of the letter’s total words; for example, in the communal category, words such as caring, feeling, kind, and friend are each counted as an occurrence. Table 1 lists example words for each of the 19 word categories.
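To illustrate the counting rule described above, the following Python sketch scores a text against small category dictionaries and reports each category as a percentage of total words, the unit used in Table 3. It is not LIWC, which is commercial software with its own tokenizer and much larger dictionaries. The 3 mini-dictionaries here are hypothetical, assembled from example words in this section and Table 1, and the trailing `*` stem wildcard follows the convention of LIWC dictionary files.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries built from example words in the Methods and Table 1.
# A trailing "*" matches any word that begins with that stem, as in LIWC dictionary files.
CATEGORIES = {
    "communal": ["care", "caring", "kind", "friend*", "feeling"],
    "ability": ["talent*", "innate", "competen*"],
    "tentative": ["maybe", "perhaps"],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def matches(word, entry):
    """True if `word` equals `entry`, or starts with its stem when `entry` ends in '*'."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, categories):
    """Percentage of all words in `text` that fall into each category.

    A word may count toward several categories, mirroring LIWC's overlapping categories.
    """
    words = tokenize(text)
    counts = Counter()
    for word in words:
        for name, entries in categories.items():
            if any(matches(word, entry) for entry in entries):
                counts[name] += 1
    total = len(words)
    return {name: (100 * counts[name] / total if total else 0.0) for name in categories}


letter = ("She is a caring and talented resident; perhaps the most "
          "competent student I have taught.")
print(category_percentages(letter, CATEGORIES))
```

For the sample sentence, 1 of 15 words is communal (caring), 2 are ability words (talented, competent), and 1 is tentative (perhaps), giving about 6.7%, 13.3%, and 6.7%, respectively.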

Table 1.

Categorical Variable Information

LIWC Document Characteristics13
 Analytic Thinking
 Clout
 Authenticity
 Emotional Tone
LIWC Predefined Word Categories13 | Examples of Words Included
 Positive Emotion | Love, Nice, Sweet
 Negative Emotion | Hurt, Ugly, Nasty
 Social | Mate, Talk, They
 Tentative | Maybe, Perhaps
 Drives | Success, Superior, Benefit
 Achievement | Win, Success, Better
 Power | Superior, Bully
 Insight | Think, Know
 Leisure | Cook, Chat, Movie
 Certainty | Always, Never
 Male | Boy, His, Dad
 Female | Girl, Her, Mom
User-Defined Word Categories4–7,15 | Examples of Words Included
 Grindstone | Meticulous, Assiduous, Persist
 Ability | Talent, Innate, Competent
 Standout | Superb, Outstanding, Unique
 Teaching | Teach, Mentor, Supervise
 Research | Data, Study, Manuscript
 Communal | Care, Expressive, Understand
 Agency | Assertive, Attention, Industrious

Abbreviation: LIWC, linguistic inquiry and word count.

This study was deemed exempt from review by the Westchester Medical Center, New York Medical College Institutional Review Board (IRB ID # 14344).

Analysis

Means for the 4 document characteristic categories, 12 LIWC predefined word categories, and 7 user-defined word categories were determined for both male and female applicants in order to identify significant differences in language used to describe the 2 genders.

t Tests were used for comparisons in which the samples were normally distributed, and Mann-Whitney U tests were used when normality was not satisfied. The Shapiro-Wilk test was applied to each variable to determine whether the male and female data were normally distributed and a t test was therefore appropriate. Normality was satisfied for 8 variables: positive emotion, insight, drives, achievement, power, agency, standout, and teaching. The Mann-Whitney test was used for the other 15 variable comparisons.
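The text does not name the statistics software used. As a hedged illustration of the Mann-Whitney comparison applied to the 15 non-normal variables, the Python sketch below computes the U statistic from pooled ranks and a two-sided P value from the large-sample normal approximation with a correction for ties. In practice a library routine such as SciPy’s `scipy.stats.mannwhitneyu` would normally be used; it applies a continuity correction by default, which this sketch omits, so P values can differ slightly.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the tie-corrected normal approximation.

    Returns (U for sample x, P value). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: each group of t tied values contributes t^3 - t, shrinking the variance.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, min(p, 1.0)
```

Called with the per-letter percentages for one category (for example, tentative), with letters for male applicants as `x` and letters for female applicants as `y`, the function returns U and the two-sided P value; with several hundred letters per group, the normal approximation is appropriate.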

As secondary outcomes, and as reported in prior LOR studies, the study also compared mean USMLE Step 1 scores and ages between genders.2–7 Age was not normally distributed for males or females, so the Mann-Whitney test was used; USMLE Step 1 scores were normally distributed, so a t test was used. Statistical significance was defined as P < .05 for all comparisons.

Results

Of the 316 medical student anesthesiology applicants who were prescreened and deemed qualified for an interview invitation, 113 were female and 203 were male (Figure 1). A total of 1150 letters were collected from ERAS: 415 for the female group and 735 for the male group. SLOEs were excluded from this study because they consist mostly of standardized language and contain only short narratives; in addition, the linguistic software could not separate their narrative and nonnarrative elements. Removing 6 SLOEs from the female group and 12 from the male group left 1132 narrative LORs (409 written for female and 723 for male applicants) for analysis (Figure 2).
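The exclusion arithmetic can be reconciled in a few lines of Python; the applicant counts are those given in the Abstract and Table 2, and the letter and SLOE counts are those reported in this paragraph.

```python
# Applicant counts (Abstract, Table 2) and letter counts (Results), by applicant gender.
applicants = {"female": 113, "male": 203}
letters_collected = {"female": 415, "male": 735}
sloes_excluded = {"female": 6, "male": 12}

# Narrative LORs analyzed = letters collected minus excluded SLOEs, per group.
narrative_lors = {g: letters_collected[g] - sloes_excluded[g] for g in letters_collected}

print(sum(applicants.values()))                      # 316
print(sum(letters_collected.values()))               # 1150
print(narrative_lors, sum(narrative_lors.values()))  # {'female': 409, 'male': 723} 1132
```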

Figure 1.

Total number of applicants whose letters were included in analysis.

Figure 2.

Total number of letters included in analysis. Abbreviations: ERAS, Electronic Residency Application Service; SLOE, standardized letters of evaluation.

Analysis of applicant characteristics revealed no difference in USMLE Step 1 scores but a statistically significant difference in age: female applicants (mean age, 27.36 years) were about 4 months younger than male applicants (mean age, 27.71 years) (Table 2).

Table 2.

Applicant Characteristics

Characteristic | Male Applicants (n = 203), Mean (IQR) | Female Applicants (n = 113), Mean (IQR) | P Value
Age, y | 27.71 (3.00) | 27.36 (2.00) | .0391a
USMLE Step 1 Score | 231.66 (17.00) | 229.42 (19.00) | .1496

Abbreviations: IQR, interquartile range; USMLE, US Medical Licensing Exam.

a Statistically significant (P < .05).

LIWC analysis revealed no significant difference in mean word count between letters written for male and female applicants. Among the word categories, there were statistically significant differences in 2 variables: letters for male applicants had a higher frequency of tentative words, and letters for female applicants had a higher frequency of ability words. The other 17 variables showed no statistically significant differences in the language used to describe male and female applicants (Table 3). Given the large sample of LORs (723 for male and 409 for female applicants), the study had at least 80% power to detect a difference of 0.13 between the means of the 2 groups, and therefore at least 80% power for the majority of comparisons presented in Table 3.
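A minimal sketch of the kind of calculation behind a power statement like this one, assuming a two-sided, two-sample comparison of means at α = .05 with 80% power, the large-sample normal approximation, and a standard deviation shared by both groups. The text does not report the method or the standard deviation underlying the 0.13 figure, so the sketch expresses the minimum detectable difference in standard-deviation units (`sd=1.0`); multiplying by a category’s standard deviation converts it to that category’s units.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_difference(n1, n2, sd, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable by a two-sided, two-sample z test.

    Assumes the large-sample normal approximation and a common standard deviation `sd`.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return (z_alpha + z_beta) * sd * sqrt(1 / n1 + 1 / n2)


# Letter counts from this study; sd=1.0 gives the result in standard-deviation units.
print(round(min_detectable_difference(723, 409, sd=1.0), 3))  # 0.173
```

For these group sizes, the sketch gives a minimum detectable difference of about 0.17 standard deviations.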

Table 3.

Mean Output Results

Variable | Male Applicant Letters (n = 723), Mean (IQR) | Female Applicant Letters (n = 409), Mean (IQR) | Difference in Means, Female − Male (95% Confidence Interval) | P Value
Word Count per Letter | 357.76 (142.16) | 353.57 (140.75) | −4.18 (−29.99 to 21.62) | .9667
Document Characteristics (0–100 score)
 Analytic Thinking | 84.82 (6.70) | 84.76 (5.79) | −0.06 (−1.25 to 1.13) | .8042
 Clout | 80.19 (6.24) | 80.97 (6.30) | 0.78 (−0.42 to 1.99) | .1967
 Authenticity | 7.18 (5.50) | 6.49 (5.80) | −0.69 (−1.61 to 0.22) | .1118
 Emotional Tone | 94.59 (6.22) | 94.76 (5.91) | 0.17 (−1.09 to 1.43) | .8107
LIWC-Defined Categories (Percent Frequency of Occurrence)
 Positive Emotion | 5.55 (1.46) | 5.53 (1.38) | −0.02 (−0.26 to 0.22) | .8676
 Negative Emotion | 0.584 (0.42) | 0.560 (0.370) | −0.02 (−0.09 to 0.04) | .6168
 Social | 11.64 (1.90) | 11.82 (2.02) | 0.18 (−0.14 to 0.50) | .3891
 Tentative | 1.583 (0.620) | 1.443 (0.670) | −0.14 (−0.24 to −0.03) | .0110a
 Drives | 9.35 (1.42) | 9.53 (1.62) | 0.17 (−0.11 to 0.46) | .2320
 Achievement | 3.587 (1.09) | 3.716 (1.07) | 0.13 (−0.05 to 0.30) | .1530
 Power | 2.979 (0.920) | 2.980 (0.910) | 0.001 (−0.15 to 0.15) | .9892
 Insight | 2.787 (0.670) | 2.749 (0.760) | −0.04 (−0.17 to 0.09) | .5755
 Leisure | 0.555 (0.380) | 0.568 (0.460) | 0.01 (−0.05 to 0.08) | .5173
 Certainty | 1.360 (0.560) | 1.349 (0.550) | −0.01 (−0.11 to 0.09) | .9105
 Male | 5.686 (1.33) | 0.149 (0.230) | −5.53 (−5.71 to −5.35) | N/A
 Female | 0.0749 (0.110) | 5.755 (1.20) | 5.68 (5.55 to 5.81) | N/A
User-Defined Categories (Percent Frequency of Occurrence)
 Grindstone | 1.088 (0.560) | 1.122 (0.410) | 0.03 (−0.05 to 0.12) | .4680
 Ability | 0.791 (0.430) | 0.884 (0.490) | 0.09 (0.01 to 0.17) | .0449a
 Standout | 0.598 (0.370) | 0.610 (0.380) | 0.01 (−0.05 to 0.07) | .7170
 Teaching | 1.375 (0.580) | 1.448 (0.630) | 0.07 (−0.03 to 0.18) | .1796
 Research | 0.647 (0.590) | 0.690 (0.590) | 0.04 (−0.07 to 0.16) | .4795
 Communal | 0.898 (0.510) | 0.928 (0.450) | 0.03 (−0.05 to 0.11) | .3261
 Agency | 1.145 (0.520) | 1.161 (0.500) | 0.01 (−0.07 to 0.10) | .7415

Abbreviations: IQR, interquartile range; LIWC, linguistic inquiry and word count; N/A, not applicable.

a Statistically significant (P < .05).
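The text does not state how the 95% confidence intervals in Table 3 were computed. One common large-sample approach, shown here as a hedged Python sketch, takes the difference in group means (female minus male, the direction of the differences in Table 3) and forms an interval of ±1.96 unpooled standard errors; with small samples, a t quantile would replace the normal quantile.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(reference, comparison, level=0.95):
    """Difference in means (comparison - reference) with a normal-approximation CI.

    The standard error combines each group's own sample variance (unpooled, Welch-style),
    so the 2 groups need not have equal variances or sizes.
    """
    diff = mean(comparison) - mean(reference)
    se = sqrt(variance(reference) / len(reference) + variance(comparison) / len(comparison))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return diff, (diff - z * se, diff + z * se)
```

Passing the per-letter values for letters for male applicants as `reference` and for female applicants as `comparison` follows the sign convention of Table 3; because this interval is symmetric, its midpoint equals the difference in means.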

Discussion

In this study of LORs submitted to 1 anesthesiology residency program, we found 3 statistically significant differences between male and female applicants: 2 in the frequency of word categories in their narrative LORs (tentative and ability) and 1 in applicant age. Letters described male applicants with tentative words, such as maybe and perhaps, more often than female applicants; this finding has not been reported in prior studies.2–7 Conversely, letters described female applicants with ability words, such as talent, innate, and competent, more often than male applicants; a higher frequency of ability words for females was also found in a prior study of emergency medicine residency application letters.7 The third difference, found in the secondary analysis, was that female applicants were on average slightly but statistically significantly younger than male applicants. In 2015–2017, females entering medical school were on average a year younger than males; in 2017–2018, the average age of males and females entering medical school was the same.16 Although the age difference in this study was statistically significant, females were only about 4 months younger, which is unlikely to reflect a meaningful difference in experience.

Although our data largely support the null hypothesis and show limited gender bias in LORs, the 2 significant differences should still be acknowledged as potential contributors to gender bias in the anesthesiology application process. Not only letter writers but also physicians and other healthcare providers should recognize the existence of implicit biases, which may undermine efforts toward equity in resident selection. One possible reason for differences in the language used to describe male and female applicants is a generational gap between letter writers (who likely graduated before 2017, when more than 50% of medical school graduates were male) and applicants, which may give rise to implicit biases among letter writers. The difference in ability word use for female applicants stands out in our study and mirrors the findings of a prior study of emergency medicine letters.7 It is notable that females were described with more ability-denoting words than males in letters that already heavily highlight positive attributes. Letter writers may unconsciously feel a need to emphasize the standing of female candidates for anesthesiology training. The implication of the greater use of tentative language for male applicants is not known; it may be consistent with the ability finding, in that if letter writers feel less need for strong wording when describing male applicants, more timid and hesitant language may appear in their letters. The authors acknowledge that these explanations are speculative; the impact of the linguistic differences observed in our dataset is not known, and the findings are hypothesis-generating and deserve further study.

Although 2 word categories (tentative and ability) showed gender differences, the remaining 4 document characteristics and 17 word categories showed no statistically significant differences in the language used to describe male and female applicants. It is encouraging that our study found few differences for anesthesiology applicants, suggesting minimal implicit bias among letter writers. Other elements of the application process, such as the Medical Student Performance Evaluation and the interview, should be investigated for bias, as should the reasons more men than women choose to apply to anesthesiology.

As more medical schools move away from class ranking and toward Pass/Fail grading, and with the announced change of USMLE Step 1 to Pass/Fail scoring in 2022 and the elimination of the Step 2 Clinical Skills requirement, LORs may become an increasingly important element of the residency application. Although it is encouraging that this study found few differences in LORs, it remains important to examine the other elements of the application that form the basis of residency applicant evaluation. Further studies should expand gender bias analyses to a broader range of anesthesiology programs and to the entire application, and should include comparisons with specialties not yet represented in the literature.

Limitations

Several limitations may restrict the generalizability of our findings. The analysis pertains to a single year of residency candidates with completed applications to 1 anesthesiology program. Applicants from schools not accredited by the Liaison Committee on Medical Education or the Commission on Osteopathic College Accreditation were excluded. In addition, access to letters in ERAS was limited to applicants who had been preselected as eligible for interviews, rather than all who applied. However, we believe the sample was representative of equally qualified candidates, because eligibility was determined during the prescreen through a holistic review of the entire application, and an applicant’s USMLE score was not used as a sole cutoff. We also did not analyze the geographic diversity of applicants’ medical schools, so geographic and cultural bias may persist. Nevertheless, the program’s matched cohort for 2019–2020 has a regionally diverse background and includes approximately equal numbers of male and female residents. Across the whole program, 42.8% of residents were female by the end of 2020 and 41% by the end of 2021. In 2020, the program’s residents came from 12 states and included 3 international medical graduates and 2 foreign medical graduates; in 2021, they came from 13 states and included 3 international medical graduates and 2 foreign medical graduates. Despite these limitations, the number of letters analyzed is comparable to that in studies of other specialties.2–4

Another methodological limitation was the conversion of PDF letters to machine-readable text. Scanned letter text may not have been converted verbatim despite our efforts to correct text errors manually, which could have reduced the word count recognized by LIWC and affected the text differences it detected.

Additionally, only 23 word domains (4 document characteristics and 19 word categories) were analyzed. Several prior studies compared additional LIWC word categories not investigated in this study.2,3,5–7 However, the document characteristics and categories selected for this investigation were those most commonly used in the published literature.2–7,13

Conclusion

In an investigation of 1132 LORs for candidates whose applications were deemed equally qualified and selected for interview at 1 residency program in the northeastern United States during 2019–2020, no significant differences were found in 17 of the 19 word categories or in any of the 4 document characteristics investigated, suggesting minimal linguistic differences between the LORs of male and female applicants. Two significant linguistic differences were found, in the ability and tentative word categories. Further research is needed to understand the implications and weight of these differences with respect to gender disparity and bias.

Footnotes

Disclosures: None

References


Articles from The Journal of Education in Perioperative Medicine : JEPM are provided here courtesy of Society for Education in Anesthesia
