PLOS Biology. 2023 Nov 21;21(11):e3002385. doi: 10.1371/journal.pbio.3002385

Gender imbalances among top-cited scientists across scientific disciplines over time through the analysis of nearly 5.8 million authors

John P A Ioannidis 1,2,3,4,5,*, Kevin W Boyack 6, Thomas A Collins 7, Jeroen Baas 8
Editor: Anita Bandrowski 9
PMCID: PMC10662734  PMID: 37988334

Abstract

We evaluated how the gender composition of top-cited authors within different subfields of research has evolved over time. We considered 9,071,122 authors with at least 5 full papers in Scopus as of September 1, 2022. Using a previously validated composite citation indicator, we identified the 2% top-cited authors for each of 174 science subfields (Science-Metrix classification) in 4 separate publication age cohorts (first publication pre-1992, 1992 to 2001, 2002 to 2011, and post-2011). Using NamSor, we assigned 3,784,507 authors as men and 2,011,616 as women (for the remaining 36.1%, gender assignment was uncertain). Men outnumbered women 1.88-fold among all authors, decreasing from 3.93-fold to 1.36-fold over time. Men outnumbered women 3.21-fold among top-cited authors, decreasing from 6.41-fold to 2.28-fold over time. In the youngest (post-2011) cohort, 32/174 (18%) subfields had ≥50% women, 97/174 (56%) subfields had ≥30% women, and 3 subfields had ≤10% women among the top-cited authors. Gender imbalances in author numbers decreased sharply over time in both high-income countries (including the United States of America) and other countries, but the latter had little improvement in gender imbalances for top-cited authors. In random samples of 100 women and 100 men from the youngest (post-2011) cohort, in-depth assessment showed that most were currently (April 2023) working in academic environments. In total, 32 women and 44 men had some faculty appointment, but only 2 women and 2 men were full professors. Our analysis shows large heterogeneity across scientific disciplines in the amelioration of gender imbalances, with more prominent imbalances persisting among top-cited authors and slow promotion pathways even for the most-cited young scientists.


An evaluation of the top-cited scientists across science shows a wide range in the amelioration of gender imbalances across scientific disciplines; many scientific disciplines still have fewer women among the top-cited scientists, even among the youngest age cohort.

Introduction

Gender disparities have been very prominent in science across multiple dimensions including recruitment, tenure, funding, authorship, and citation impact [1–5]. Some of these disparities may be diminishing over time, but the pace of change varies across scientific fields, settings, and countries. For example, one analysis [6] has documented the decreasing gap between the numbers of female and male authors across science over the years, but while the gap has practically disappeared in Argentina, it continues to be very large in Japan. Moreover, when medical subfields are examined in the same analysis [6], the inequality continues to be very strong against women in the fields of surgery and radiology and imaging, while women authors currently outnumber men in fields such as infectious diseases, fertility, and public health.

Citation impact in particular is a key coinage in the scientific academic enterprise and there is evidence that citations are misused and gamed [7]. The importance of citations both as a means of academic power and as a marker and promoter of inequalities may be most prominent among the most-cited scientists and may also affect academic career trajectories. Gender imbalance in scientific careers may be driven by multiple complex forces [8,9], but publications and citations may be key common mediators. Differences in citation counts may reflect differences in the number of publications and/or in the citations received per publication for authors of different genders [6]. Previous analysis has shown [6] that, overall, women tend to publish fewer papers than men and that the field-adjusted number of average citations received is modestly larger when the first author is a man rather than a woman.

Here, we aim to use comprehensive publication and citation data that cover all science through the Scopus database [10] in order to evaluate how gender disparities have changed over time in the cohorts of the most-cited scientists, across each of the 174 subfields of science [11]. Top-cited scientists are a select group that is the most influential across science, and any gender biases in this group are likely to have major repercussions for science at large. The available data that we have compiled allow us to investigate cohorts of scientists according to their publication age (i.e., how many years they have been active publishing scientific work). For scientist cohorts of different publication age, we identify the 2% top-cited scientists based on citation metrics that incorporate not only the number of citations received, but also information on and adjustment for co-authorship and for authorship positions among published papers.

Methods

We used the approach we have previously applied [12–14] for generating a composite citation index and for constructing comprehensive databases including the 2% top-cited scientists in each of 174 scientific subfields, as defined by the Science-Metrix (RRID:SCR_024471) classification [11]. These subfields cover all types of science, technology, and (bio)medicine as well as scholarly disciplines in the humanities and social sciences. We use summarily the terms “science,” “scientific fields,” and “scientists” in the paper to cover all these diverse types of scholarship, even though, strictly speaking, some of these authors may not see themselves as “scientists.” Each scientist may have published papers in more than 1 subfield, but eventually he/she is classified into a single dominant subfield, the one that has the highest percentage among his/her papers. The Science-Metrix classification allocates each journal to a single subfield, except for multidisciplinary journals, where the articles may be split into multiple subfields. The primary aim of the analysis was to assess, across each of the 174 scientific subfields, to what extent the gender imbalance in representation of women among the top-cited scientists has been ameliorating over time. We can track the percentage representation of women among the top-cited scientists with different publication ages.
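The dominant-subfield rule described above can be illustrated with a minimal sketch. It assumes each paper already carries one subfield label; the function name, the sample data, and the tie-breaking behavior (first-seen subfield wins) are illustrative assumptions, not details given in the text.

```python
from collections import Counter

def dominant_subfield(paper_subfields):
    """Return (subfield, share) for the subfield covering the most papers.

    Assumption: one subfield label per paper; ties resolve to the
    subfield encountered first, which the source does not specify.
    """
    counts = Counter(paper_subfields)
    subfield, n = counts.most_common(1)[0]
    return subfield, n / len(paper_subfields)

# Hypothetical author with 5 papers spread over 3 subfields
papers = ["Epidemiology", "Public health", "Epidemiology", "Nursing", "Epidemiology"]
print(dominant_subfield(papers))
```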

We used NamSor (RRID:SCR_023935) [15], a gender-assignment software tool, to assign gender to the Scopus (RRID:SCR_022559) author IDs. We have previously used Scopus data to assign gender to each author in projects done through the ICSR Lab, such as the European Commission She Figures report: https://ec.europa.eu/assets/rtd/shefigures2021/index.html [16] and the Elsevier Gender report https://www.elsevier.com/__data/assets/pdf_file/0011/1083971/Elsevier-gender-report-2020.pdf [17]. The NamSor application programming interface takes first/last name and country into consideration. For country, we took the first country an author published from as the best estimate (the country of his/her oldest published paper). Some authors may move to different countries during their career, and then there is no perfect solution for which country should represent them. However, gender bias has a formative influence from a very early age; therefore, assigning these authors to the country of their oldest publication is probably the most appropriate choice. Information on country of birth (which may also differ from the country of oldest publication) is not available in Scopus and is extremely difficult to find for most authors. We only kept gender assignments with a confidence score >85%.
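The two rules in this paragraph (country of the oldest publication, and a confidence cutoff of >85%) can be sketched as follows. This does not call the NamSor API; the record layout, field names, and sample values are hypothetical and serve only to show the filtering logic.

```python
CONFIDENCE_THRESHOLD = 0.85  # from the text: keep assignments with confidence >85%

def first_country(publications):
    """Country of the author's oldest paper (hypothetical 'year'/'country' fields)."""
    return min(publications, key=lambda p: p["year"])["country"]

def resolve_gender(predicted, confidence, threshold=CONFIDENCE_THRESHOLD):
    """Keep the predicted gender only if confidence strictly exceeds the threshold."""
    return predicted if confidence > threshold else "uncertain"

# Hypothetical publication history and classifier outputs
pubs = [{"year": 2015, "country": "DE"}, {"year": 2012, "country": "IT"}]
print(first_country(pubs))
print(resolve_gender("woman", 0.97), resolve_gender("man", 0.85))
```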

We created databases of 2% top-cited authors similar to the ones that we have created and updated on an annual basis (last updated based on September 1, 2022 data using information on over 9 million authors with at least 5 published full papers across all science) [12–14]. In brief, the scientists are ranked based on a composite citation indicator that considers 6 citation metrics (total citations, h-index, co-authorship adjusted hm-index, number of citations to single-authored papers, number of citations to single- and first-authored papers, and number of citations to single-, first-, or last-authored papers). The composite indicator thus takes into account not only the overall citation impact, but also co-authorship, and specifically the citation impact from papers where the author has had authorship positions that in most fields suggest greater contribution to the work.

In the current project, instead of considering all authors together, we considered in separate runs:

A. Those with first publication before 1992 (30+ years of publication age),

B. 1992 to 2001 (20 to 30 years of publication age),

C. 2002 to 2011 (10 to 20 years of publication age),

D. 2012 or later (10 or fewer years of publication age).
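The four cohorts listed above map directly onto the year of first publication. A minimal sketch of that mapping (the function name is an illustrative assumption):

```python
def publication_age_cohort(first_pub_year):
    """Map the year of an author's first publication to cohort A-D."""
    if first_pub_year < 1992:
        return "A"  # first publication before 1992
    if first_pub_year <= 2001:
        return "B"  # 1992 to 2001
    if first_pub_year <= 2011:
        return "C"  # 2002 to 2011
    return "D"      # 2012 or later

for year in (1985, 1992, 2011, 2012):
    print(year, publication_age_cohort(year))
```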

For each of the 4 sets, we generated the list of the 2% top-cited scientists in each of the 174 subfields with the ranking based on the composite index. We generated data both for career-long impact (all citations received at any time for all papers published at any time) and for the citation impact in the most recent calendar year, in this case 2021 (citations received in 2021 to papers published at any time). This was done twice: with and without including self-citations (as done in our previous work). Results are largely similar for career-long and single recent year impact; we report in detail here the latter (single recent year impact) and also show the main results according to the former (career-long impact) approach. Moreover, the presented analyses consider as top-cited all scientists who are in the top 2% according to the composite index score in the calculations excluding self-citations, in the calculations including self-citations, or in both. The vast majority of included scientists are in the top 2% using both calculations. We then estimated the percentage of women and men in the 4 publication age cohorts for each of the scientific subfields.
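The selection rule in this paragraph (top 2% under either self-citation treatment) amounts to a union of two ranked lists within one subfield and cohort. The sketch below assumes the composite scores are already computed; the rounding rule (ceiling, at least one author) and all names and values are assumptions for illustration.

```python
import math

def top_percent(scores, percent=2):
    """IDs of the top `percent` of authors by score within one subfield/cohort.

    Assumption: the cutoff count is rounded up and is at least 1.
    """
    k = max(1, math.ceil(len(scores) * percent / 100))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def top_cited(scores_with_self, scores_without_self, percent=2):
    """Authors in the top `percent` under either calculation (set union)."""
    return top_percent(scores_with_self, percent) | top_percent(scores_without_self, percent)

# Hypothetical subfield of 100 authors; author a98 drops once self-citations are excluded
with_self = {f"a{i}": float(i) for i in range(100)}
without_self = dict(with_self, a98=50.0)
print(sorted(top_cited(with_self, without_self)))
```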

In calculating whether the proportion of women among the top-cited scientists changes over time, we excluded unclear names that could not be assigned to a gender with >85% certainty. Gender assignment is more difficult for names from some countries than for others, and it also depends on whether full first names are available rather than just the first-name initial.

We focused more on subfields that have reached a percentage of women of at least 50% (matching or outnumbering men) or of at least 30%, and on the age cohorts in which these milestones were achieved. For comparison, we also focused on the other end of the spectrum: subfields where the percentage of women remained at or below 10%.
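These milestones can be expressed as simple threshold checks on the share of women among top-cited authors with an assigned gender. The sketch below uses the post-2011 top-cited counts reported for Nursing (121 women, 52 men) and Social sciences methods (6 women, 6 men) in Table 2; the function names are illustrative assumptions.

```python
def share_of_women(top_women, top_men):
    """Percentage of women among top-cited authors with an assigned gender."""
    return 100 * top_women / (top_women + top_men)

def milestones(share):
    """Flag the three thresholds used in the analysis."""
    return {"ge_50": share >= 50, "ge_30": share >= 30, "le_10": share <= 10}

nursing = share_of_women(121, 52)
methods = share_of_women(6, 6)
print(round(nursing, 1), milestones(nursing))
print(round(methods, 1), milestones(methods))
```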

We also calculated the relative propensity R of women versus men to be among the 2% top-cited in each subfield and publication age cohort. If there are n(w) women and n(m) men in a given subfield and given publication age cohort, and N(w) and N(m) among them are in the top-2% most-cited, then

$$R = \frac{N(w)/N(m)}{n(w)/n(m)} = \frac{N(w)\,n(m)}{N(m)\,n(w)}.$$

We focus on subfields and publication age cohorts with R ≥ 1, i.e., where women have a larger relative representation among the top-cited scientists than their representation in the overall count of authors. Reciprocally, we also focus on subfields and publication age cohorts where R < 1/3, i.e., where men have more than 3-fold relative overrepresentation among top-cited scientists than in the overall count of authors.
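As a worked check, R (the ratio of women to men among top-cited authors divided by the ratio of women to men among all authors) can be computed from the counts reported in Table 1 for single recent year impact. The function and variable names below are illustrative assumptions.

```python
def relative_propensity(top_women, top_men, all_women, all_men):
    """R = (N(w) / N(m)) / (n(w) / n(m))."""
    return (top_women / top_men) / (all_women / all_men)

# Counts from Table 1, top-cited for single recent year impact
r_pre_1992 = relative_propensity(4_403, 28_242, 253_122, 994_506)
r_post_2011 = relative_propensity(10_971, 25_067, 665_886, 903_627)

for label, r in (("pre-1992", r_pre_1992), ("post-2011", r_post_2011)):
    # R >= 1: women relatively overrepresented; R < 1/3: >3-fold advantage for men
    print(label, round(r, 2), "R>=1" if r >= 1 else "R<1/3" if r < 1 / 3 else "between")
```

Both values round to the figures reported in Table 1 (0.61 and 0.59), and neither cohort as a whole crosses the R ≥ 1 or R < 1/3 thresholds.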

All these analyses were primarily performed using the global data of all scientists regardless of country. We then reran these analyses limited to the scientists who are from high-income countries, and separately for non-high-income countries and for scientists who are from the United States of America. High-income countries are those classified as such by the World Bank in 2023 (https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups).

We also present data for the ratio of men over women among authors and, comparatively, among top-cited authors separately for each country, focusing primarily on the youngest (post-2011) cohort to see if some specific countries have ameliorated imbalances more than others. Data are presented for each country for all scientific subfields combined (country-level data per subfield have mostly very small numbers).

Finally, we assessed whether women are disadvantaged in academic recruitments and promotions even if their early work has high impact. Focusing on the youngest top-cited scientists (post-2011 cohort) who are still fairly early in their career, we selected a random sample of 100 men and 100 women. We manually checked their information online to identify how many of them were, as of April 2023, in academia, industry, government, or another occupation. All retrieved sources were eligible for perusal, including, but not limited to, LinkedIn, ResearchGate, Google Scholar, Frontiers Biosketch pages, and personal and CV-related pages in university and other institution websites. Among those who were located in academia, we recorded how many of them were full professors, associate professors, or assistant professors (or similar early-level faculty title). We coded different academic titles into these 3 ranks based on perceived equivalence; e.g., in the United Kingdom, readers may be considered the equivalent of associate professors. The manual inspection of the random sample of 200 scientists (100 + 100) was also used to examine concurrently whether there was a substantial accuracy problem with Scopus ID assignments, i.e., whether top-cited scientists with seemingly short publication age were artifacts of older researchers with fragments of their recent research output separated from their earlier work. This artifact arises because, in a few cases, the publications of an author are split into 2 or more different author ID files in Scopus that may contain papers covering different year spans; e.g., if an author who has published from 2000 until now has her publications split into a Scopus ID file that covers the papers published from 2000 until 2020 and a different Scopus ID file that covers the papers published after 2020, that second Scopus ID file will appear as if it belongs to a young author. It also allowed additional validation of the gender assignment generated by the NamSor algorithm.

This is a descriptive, exploratory analysis of a science-wide large bibliometric dataset (not pre-registered). We performed exploratory statistical testing using analysis of 2 × 4 tables adjusting for trend with exact test and comparison of proportions. P-values are two-tailed.

Results

Proportion of women among all authors and top-cited authors

Across a total of 9,071,122 authors with at least 5 full papers, the algorithm assigned 3,784,507 as men, 2,011,616 as women, and for 3,274,999 (36.1%), the assignment to gender was uncertain. Among the top-cited authors, 101,918 were assigned as men, 31,725 as women, and 61,672 were uncertain. Uncertain gender authors are not considered in any further calculations. Overall, men outnumbered women 1.88-fold among all authors and 3.21-fold among top-cited authors.

As shown in Table 1, there was increasing representation of women in cohorts of authors with more recent first publication year both for all authors and for top-cited authors. The ratio of men to women for all authors was 3.93 for authors who first published before 1992 and gradually decreased to 2.06 for authors first publishing in 1992 to 2001, 1.57 for authors first publishing in 2002 to 2011, and 1.36 for authors first publishing after 2011. There was larger gender inequality for top-cited authors in all age groups, with the respective ratios being 6.41, 3.48, 2.74, and 2.28. R (the ratio of ratios for top-cited and all authors) remained roughly constant across age cohorts (0.57 to 0.61 for single recent year impact).

Table 1. Proportion of top-cited women across 4 different publication age cohorts of top-cited scientists.

| | First publication year pre-1992 | First publication year 1992–2001 | First publication year 2002–2011 | First publication year post-2011 |
|---|---|---|---|---|
| Total authors | 2,216,557 | 1,530,465 | 2,816,702 | 2,507,398 |
| Total men authors | 994,506 | 732,149 | 1,154,225 | 903,627 |
| Total women authors | 253,122 | 355,935 | 736,673 | 665,886 |
| Total uncertain authors | 968,929 | 442,381 | 925,804 | 937,885 |
| Top-cited for single recent year impact | | | | |
| Top-cited men authors | 28,242 | 18,082 | 30,527 | 25,067 |
| Top-cited women authors | 4,403 | 5,192 | 11,159 | 10,971 |
| Top-cited uncertain authors | 13,522 | 9,063 | 19,085 | 20,002 |
| Ratio R of top-cited women to men versus all authors women to men | 0.61 | 0.57 | 0.59 | 0.59 |
| Subfields with women ≥50% among top-cited | 5 | 18 | 21 | 32 |
| Subfields with women ≥30% among top-cited | 22 | 56 | 84 | 97 |
| Subfields with women ≤10% among top-cited | 69 | 18 | 6 | 3 |
| Top-cited for career-long impact | | | | |
| Top-cited men authors | 29,317 | 19,211 | 31,689 | 25,046 |
| Top-cited women authors | 3,614 | 4,808 | 10,951 | 11,279 |
| Top-cited uncertain authors | 13,233 | 8,315 | 17,932 | 19,561 |
| Ratio R of top-cited women to men versus all authors women to men | 0.48 | 0.51 | 0.54 | 0.61 |
| Subfields with women ≥50% among top-cited | 4 | 15 | 15 | 29 |
| Subfields with women ≥30% among top-cited | 16 | 45 | 75 | 98 |
| Subfields with women ≤10% among top-cited | 80 | 26 | 8 | 7 |

These data are derived from a breakdown of authors and their citations by gender, country, and career cohort (as well as across all cohorts) done across all subfields.

The changes in the proportion of women among all authors, and among top-cited authors across the 4 age cohorts (adjusting for trend) are statistically significant at p < 0.001. The changes in the proportion of subfields with women being ≥50%, ≥30%, and ≤10% among the top-cited between the oldest and youngest cohort are also statistically significant at p < 0.001.

In younger age cohorts, the proportion of women among top-cited authors improved across disciplines (Fig 1). Among top-cited authors who published their first paper before 1992, in 69 of the 174 scientific subfields (40%) women represented only 0% to 10% of the highly cited authors (i.e., the ratio of men to women was ≥9). The number of scientific subfields with such major underrepresentation of women among the top-cited authors decreased sharply over time, and in the youngest age cohort (authors with first publication after 2011) only 3 subfields had such a pattern (Table 1). In the pre-1992 age cohort, only 5 subfields had ≥50% representation of women among the top-cited authors, and the number of subfields with ≥50% representation of women increased gradually to reach 32 (18%) in the post-2011 age cohort. There was a much larger, gradual increase in the number of subfields where women represented ≥30% of the top-cited authors, from 22 (13%) in the pre-1992 cohort to 97 (56%) in the post-2011 cohort (Table 1).

Fig 1. Boxplots of the proportion of women among top-cited authors across the 174 scientific subfields in the 4 age cohorts.

The 2 panels show results when top-cited authors are determined according to career-long impact and when they are determined according to single recent year impact. The data underlying this figure can be found in https://doi.org/10.17632/wwykk8d48g.3.

Table 2 shows the 32 subfields where top-cited women matched or outnumbered top-cited men in the youngest (post-2011) age cohort. As shown, in all of these 32 subfields with 1 exception (Social Science Methods), there was a larger pool of women than men authors starting their publications post-2011. The 32 subfields cover a wide variety of disciplines, with heavier concentrations in medicine, health sciences, social sciences, and cultural domains, and distinct absence of mathematical, engineering, and economics subfields. Detailed data on all 174 subfields are placed in Elsevier Data Repository, doi: 10.17632/wwykk8d48g.3.

Table 2. Subfields with women representing ≥50% of the top-cited scientists among those with first publication post-2011.

| Subfields | Total scientists post-2011 | Total men post-2011 | Total women post-2011 | Top-cited men post-2011 | Top-cited women post-2011 | % women among top-cited | R | Other age cohorts with top-cited women ≥50%: Pre-92/92-01/02-11* |
|---|---|---|---|---|---|---|---|---|
| Folklore | 86 | 22 | 42 | 0 | 1 | 100 | ND | YNY |
| Gender studies | 503 | 89 | 309 | 2 | 8 | 80 | 1.15 | YYY |
| Social work | 1,945 | 461 | 1,102 | 9 | 30 | 76.9 | 1.39 | NYY |
| Drama and theater | 181 | 42 | 90 | 1 | 3 | 75 | 1.4 | NYY |
| Nursing | 11,685 | 1,932 | 6,804 | 52 | 121 | 69.9 | 0.66 | YYY |
| Epidemiology | 2,347 | 680 | 1,114 | 14 | 31 | 68.9 | 1.35 | NNY |
| Developmental and child psychology | 5,325 | 872 | 3,687 | 33 | 70 | 68.0 | 0.50 | NYY |
| Family studies | 814 | 169 | 518 | 5 | 10 | 66.7 | 0.65 | NYY |
| Criminology | 2,890 | 937 | 1,618 | 24 | 36 | 60 | 0.87 | NNN |
| Behavioral science and comparative psychology | 2,646 | 832 | 1,400 | 19 | 28 | 59.6 | 0.91 | NNN |
| Rehabilitation | 6,779 | 2,097 | 3,283 | 47 | 69 | 59.5 | 0.94 | NYY |
| Public health | 18,050 | 4,677 | 9,830 | 135 | 179 | 57.0 | 0.63 | NNY |
| Nutrition and dietetics | 14,367 | 3,166 | 7,212 | 110 | 142 | 56.3 | 0.57 | NYY |
| Allergy | 3,749 | 1,092 | 1,784 | 27 | 33 | 55 | 0.75 | NNN |
| Pediatrics | 12,221 | 3,441 | 5,693 | 107 | 124 | 53.7 | 0.70 | NNN |
| Obstetrics and reproductive medicine | 18,304 | 4,310 | 8,666 | 137 | 156 | 53.2 | 0.57 | NNY |
| Geriatrics | 3,936 | 941 | 1,780 | 29 | 33 | 53.2 | 0.60 | NNY |
| Gerontology | 3,184 | 814 | 1,661 | 29 | 33 | 53.2 | 0.56 | NNY |
| Arthritis and rheumatology | 11,692 | 3,056 | 4,155 | 91 | 103 | 53.1 | 0.83 | NNN |
| General clinical medicine | 4,240 | 1,181 | 1,417 | 32 | 36 | 52.9 | 0.94 | NNY |
| Legal and forensic medicine | 3,071 | 1,015 | 1,148 | 26 | 29 | 52.7 | 0.99 | NNN |
| Clinical psychology | 3,846 | 953 | 2,453 | 36 | 40 | 52.6 | 0.43 | NNN |
| Psychiatry | 18,626 | 5,481 | 8,764 | 156 | 168 | 51.9 | 0.67 | NNN |
| Genetics and heredity | 8,032 | 1,981 | 3,968 | 70 | 74 | 51.4 | 0.53 | NNN |
| Veterinary sciences | 11,378 | 3,568 | 4,975 | 103 | 108 | 51.2 | 0.75 | NNN |
| Communication and media studies | 3,574 | 1,266 | 1,454 | 31 | 32 | 50.8 | 0.90 | NNN |
| Education | 23,905 | 7,507 | 11,201 | 210 | 214 | 50.5 | 0.68 | NNY |
| Endocrinology and metabolism | 19,076 | 5,285 | 8,276 | 160 | 161 | 50.2 | 0.64 | NNN |
| Social sciences methods | 642 | 253 | 238 | 6 | 6 | 50 | 1.06 | NNY |
| Art practice, history and theory | 257 | 82 | 90 | 3 | 3 | 50 | 0.91 | YYN |
| Complementary and alternative medicine | 5,059 | 1,010 | 1,242 | 26 | 26 | 50 | 0.81 | NNY |
| General psychology and cognitive sciences | 762 | 227 | 389 | 7 | 7 | 50 | 0.58 | NNY |

R is the ratio of women to men among top-cited authors divided by the ratio of women to men among all authors. ND, not defined.

*N: No, Y: Yes (for cohorts with authors who had their first publication before 1992, in 1992–2001, in 2002–2011, and after 2011). Percentages of ≥50% for women among the top-cited had also been achieved in the pre-1992 cohort for Architecture; in the 1992–2001 cohort for Demography, Speech-Language Pathology, Industrial Relations, Music, Psychoanalysis, Development Studies, Language and Linguistics, Classics, and Anthropology; and in the 2002–2011 cohort for Industrial Relations and Substance Abuse. All of these subfields (with the exception of Architecture) had 25%–50% representation of women among their top-cited authors for the post-2011 age cohort. Top-cited scientists are determined based on single recent year impact.

The propensity of women to find themselves among the top-cited (after accounting for number of total available authors) exceeded that of men in 29/174 subfields for the pre-1992 cohort, in 33/174 subfields in the 1992 to 2001 cohort, in 17/174 subfields in the 2002 to 2011 cohort, and in only 12/174 subfields in the youngest (post-2011) cohort. At the other end of the spectrum, subfields with more than 3-fold propensity advantage for men (R < 1/3) decreased from 24/174 in the pre-1992 cohort, to 17/174 in the 1992 to 2001 cohort, to 14/174 in the 2002 to 2011 cohort, and 10/174 in the post-2011 cohort. The 10 subfields with R < 1/3 in the youngest (post-2011) cohort were Economic Theory, Econometrics, Architecture, Microscopy, Music, General Physics, Paleontology, Biophysics, Speech-Language Pathology, and Mechanical Engineering.

Analyses on high and non-high-income countries

In the overall database, there were almost twice as many authors from high-income countries (5,899,402) as from non-high-income countries (3,171,720), and uncertain gender assignment was less common in the former than in the latter (25.3% versus 56.2%). Men outnumbered women more prominently in high-income countries (2,925,898/1,481,038 = 1.98) than in other countries (858,609/530,578 = 1.62) in the number of authors. The 2 groups of countries had a similar preponderance of men over women among the top-cited authors (85,776/26,963 = 3.18 and 16,142/4,762 = 3.39, respectively).

However, when focusing on the youngest cohort (first publishing after 2011), there was a nearly equal number of authors from high-income and other countries (1,259,314 versus 1,248,084), and both groups of countries had a similar ratio of male to female authors (576,212/415,307 = 1.36 versus 336,415/250,579 = 1.34). The preponderance of men over women among the top-cited authors had decreased in the high-income countries, but not substantially in the other countries (17,742/8,472 = 2.09 versus 7,325/2,499 = 2.93).

As shown in Table 3, there was a gradual increase over time in the number of subfields where women represented ≥50% of the top-cited authors in high-income countries (from 3% in the pre-1992 cohort to 21% in the post-2011 cohort), but this was not seen in other countries (still 6% in the post-2011 cohort). The number of subfields where women represented ≤10% of the top-cited authors decreased for both high-income and other countries, but 15% of subfields in non-high-income countries still showed ≤10% women authors even in the youngest (post-2011) cohort. Detailed data on all 174 subfields appear in Elsevier Data Repository, doi: 10.17632/wwykk8d48g.3.

Table 3. Number of scientific subfields with ≥50% or ≤10% representation of women among the top-cited scientists, in high-income and non-high-income countries.

| First publication year | Subfields with women ≥50% among top-cited (high-income countries) | Subfields with women ≥50% among top-cited (non-high-income countries)* | Subfields with women ≤10% among top-cited (high-income countries) | Subfields with women ≤10% among top-cited (non-high-income countries) |
|---|---|---|---|---|
| Pre-1992 | 5/174 (3%) | 10/128 (8%) | 72/174 (41%) | 76/128 (59%) |
| 1992–2001 | 17/174 (10%) | 27/142 (19%) | 20/174 (12%) | 47/142 (33%) |
| 2002–2011 | 22/174 (13%) | 22/154 (14%) | 10/174 (6%) | 33/154 (21%) |
| Post-2011 | 37/174 (21%) | 10/158 (6%) | 9/174 (5%) | 24/158 (15%) |

*Almost half of these occurrences (34/69) represent situations where, in the specific age cohort and subfield, there was 1 top-cited woman and 0 or 1 top-cited man. Excluding these occurrences, the numbers of subfields with ≥50% women among the top-cited authors in non-high-income countries are 3, 15, 11, and 6 in the pre-1992, 1992–2001, 2002–2011, and post-2011 cohorts, respectively.

The changes in the proportion of subfields with women being ≥50% (high-income countries), ≤10% (high-income countries), and ≤10% (non-high-income countries) among the top-cited between the oldest and youngest cohort are statistically significant at p < 0.001. The change in the proportion of subfields with women being ≥50% (non-high-income countries) among the top-cited between the oldest and youngest cohort is not statistically significant (p > 0.25). Top-cited scientists are determined based on single recent year impact.

USA authors

Authors from the USA represented about 30% to 40% of the authors from high-income countries in various analyses and age cohorts and their representation of women was very similar to the overall data from all high-income countries. For example, there were overall 1,910,526 authors assigned to the USA and gender could not be assigned for 423,580 (22.2%). The ratio of male to female authors was 985,813/501,133 = 1.97 overall and 34,664/11,480 = 3.02 for top-cited authors. Among the youngest (post-2011) cohort (372,725 authors), the respective ratios had decreased to 164,151/129,831 = 1.26 and 6,350/3,240 = 1.96. Frequency of very high or very low female representation in different subfields was similar to that found in high-income countries. Detailed data on all 174 subfields appear in Elsevier Data Repository, doi: 10.17632/wwykk8d48g.3.

Country-level analyses

Different countries varied in the extent of imbalance for the ratio of representation of men versus women among all authors and among top-cited authors. Fig 2 presents these ratios for the youngest (post-2011) cohort for the 52 countries with more than 5,000 authors (numbers are small and make these ratios more unreliable for other countries). Results were similar when top-cited authors were determined based on career-long impact (Fig 2A) or single recent year impact (Fig 2B) and we discuss here in detail the latter. In the youngest cohort, 11 countries had fewer men than women authors (the lowest ratios were 0.63 in Thailand and 0.87 in Italy), 1 country had an equal number for both genders and 41 had more men than women (the highest ratios were 6.85 in Iraq and 4.06 in Saudi Arabia). In the youngest cohort, no countries had fewer men than women top-cited authors, but the ratio was closest to 1.00 for Italy (1.03) and Romania (1.04). Conversely, the highest ratios of men versus women top-cited authors were seen in Iraq (14.2) and Japan (9.92). India, Colombia, Pakistan, Argentina, Finland, and Japan had the worst deterioration of the gender imbalance when top-cited authors were considered rather than all authors (ratio of ratios, 4.95, 4.38, 3.70, 3.36, 2.98, and 2.92, respectively). Of these 6 countries, Argentina and Finland had more women than men authors overall, but large imbalances favoring men among top-cited authors. Detailed data on all countries and age cohorts are in Elsevier Data Repository, doi: 10.17632/wwykk8d48g.3.

Fig 2. Ratio of men over women for all authors (horizontal axis) and ratio of men over women for top-cited authors (vertical axis) for the youngest age cohort (authors who started publishing after 2011) for the 52 countries that had more than 5,000 authors in that cohort.

Countries with fewer authors have too few top-cited authors and the gender ratio for top-cited authors would be driven by very small numbers. The 2 panels show results when top-cited authors are determined according to career-long impact and when they are determined according to single recent year impact. The data underlying this figure can be found in https://doi.org/10.17632/wwykk8d48g.3.

In-depth manual assessment of random samples from the youngest cohort

In the random sample of 100 authors assigned as women by the algorithm and selected from the post-2011 cohort, on close manual verification, 3 were men and 10 were authors who had actually started publishing before 2011 but whose earlier publications had not been assigned by Scopus to their selected main author profile. Among the respective random sample of 100 authors assigned as men, 9 had similarly started publishing in earlier years. Thus, 87 verified eligible women and 91 verified eligible men had their work histories evaluated in depth for their current occupations (Table 4). There were some possible trends for more women than men being engaged in government positions and more men than women working in industry, but the differences could have been due to chance. The majority in both gender samples were in an academic environment; among them, slightly more than half already had some academic title and this tended to be more common for men than for women (44/69 (64%) versus 32/59 (54%)), but the difference was not statistically significant (p = 0.29). Only 4 had full professor appointments (2 women, 2 men; all 4 in non-high-income countries).
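The comparison of faculty titles (44/69 men versus 32/59 women) can be checked with an exact test on a 2 × 2 table. The sketch below is a standard-library implementation of a two-sided Fisher exact test; it is an assumption that this is the test behind the reported p-value, since the text mentions only "exact test and comparison of proportions."

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Rows: men, women; columns: with faculty title, without (from Table 4)
p_value = fisher_exact_two_sided(44, 69 - 44, 32, 59 - 32)
print(f"p = {p_value:.2f}")
```

With only 69 and 59 academics per group, a 10-percentage-point difference is well within what chance alone can produce, which is consistent with the non-significant result reported above.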

Table 4. In depth analysis of current (April 2023) occupation of random samples of men and women from the youngest (post-2011) cohort.

Current occupation (as of April 2023) Women (n = 87)* Men (n = 91)*
Academia 59 69
 Full professor 2 2
 Associate professor 11 13
 Assistant professor or similar** 19 29
 Other (training, staff) or unclear 27 25
Government 9 3
Other research institute (non-industry) 5 2
Industry 7 13
Clinical*** 7 4

Data on current occupation and time of onset of publication records were compiled by perusing online searches with the name of the scientist and examining LinkedIn, Google Scholar, ResearchGate, Frontiers, Scopus, and any other relevant data that appeared in these searches. The most recent position was recorded, but it cannot be certain that the information online was entirely up to date.

*Of 100 randomly selected women, 13 were artifacts and the same applied to 9 men (see text for details).

**Lecturer, senior lecturer, instructor, or other titles that are presumably at the same level.

***Clinicians with academic titles were assigned to the academic group.

Discussion

Our evaluation of a comprehensive science bibliometric database with over 9 million authors who have published at least 5 full papers shows that there have been substantial corrections of the gender imbalance in the scientific workforce over time. However, these corrections are still lagging behind in many scientific subfields and vary extensively across countries. Moreover, while the difference between the number of male and female authors has overall become modest (about 1.3-fold across all scientific authors), the difference in the number of top-cited authors between the 2 genders remains much higher. The overall imbalance in this regard is about 2-fold in high-income countries (and also in the USA specifically) and 3-fold in other countries. There is currently very large heterogeneity across scientific subfields and countries in the presence and prominence of gender imbalances. In the youngest cohort of scientists (those who started publishing after 2011) in almost 1 in 5 subfields, women match or outnumber men among the ranks of its top-cited scientists. However, in almost all of these subfields this largely reflects the fact that more women than men work and publish in them. Conversely, even in the youngest cohort of scientists, in 44% of the scientific subfields women represent less than 30% of the top-cited authors. Finally, most of the youngest top-cited scientists are still working in academic environments and there was a trend for more men than women to have positions in the typical academic ladder (assistant, associate, or full professor). Nevertheless, very few have reached full professor appointments.

We also examined relative propensity metrics that correct the ratio of top-cited authors by considering also the ratio of all authors who are women versus men. We noted that over time, there were fewer subfields where women had a competitive advantage against men to find themselves among the top-cited authors once they started publishing in a given subfield. Concurrently, the number of subfields where men had a large competitive advantage to find themselves among the top-cited authors once they started publishing in a given subfield also diminished over time.
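As a rough illustration of this correction, the sketch below divides the men-to-women ratio among top-cited authors by the men-to-women ratio among all authors. The exact definition of the relative propensity metric is given in the Methods and is not reproduced here, so this ratio-of-ratios form, the function name, and the cohort labels are assumptions; the inputs are the fold-differences reported in the abstract.

```python
def relative_propensity(top_cited_ratio, all_authors_ratio):
    """Ratio of ratios (hypothetical form of the metric).

    Values above 1 mean men are over-represented among top-cited authors
    relative to their share of all authors; values below 1 mean the reverse.
    """
    if all_authors_ratio <= 0:
        raise ValueError("all_authors_ratio must be positive")
    return top_cited_ratio / all_authors_ratio

# (men/women among top-cited authors, men/women among all authors),
# taken from the abstract; the labels are illustrative.
fold_ratios = {
    "all cohorts": (3.21, 1.88),
    "earliest cohort": (6.41, 3.93),
    "latest cohort": (2.28, 1.36),
}

propensities = {
    label: relative_propensity(top, everyone)
    for label, (top, everyone) in fold_ratios.items()
}

for label, value in propensities.items():
    print(f"{label}: {value:.2f}")
```

Under this assumed definition, a value that stays well above 1 across cohorts would indicate that the imbalance among top-cited authors is not explained by author counts alone.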

The scientific workforce globally is changing. Research output and scientific publications have grown rapidly and massively in some non-high-income countries like China [18–20], often with financial incentives that have attracted criticism [18]. Therefore, the share of authors from non-high-income countries has increased sharply. Among the youngest authors, those with a decade or less of Scopus-indexed publication history, half come from non-high-income countries. In these countries, gender imbalances in the ability to reach the top-cited group remain much stronger than in high-income countries. This poses an extra challenge to achieving equity in these countries, where research is often performed under suboptimal circumstances. Previous work has shown that the gender gap in science, technology, and medicine fields is smaller in countries where women are more likely to major in those fields [21]. Moreover, there are local factors and barriers in each low- and middle-income country that shape the characteristics of its workforce and the extent of distortion from gender bias [22].

Even high-income countries, including the USA, have much room to optimize equity. In the USA, it has been documented that women are less likely to be included as authors especially in highly cited papers [23]. Japan has 10-fold more top-cited men than women even in the youngest cohort, probably a reflection of long-lasting traditions [24], despite efforts to improve perceived norms [25]. Other countries such as Argentina and Finland have strong gender imbalances in top-cited authors, even though they have managed to extinguish imbalances in the overall number of authors.

Much attention needs to be given to the younger generations of scientists to promote equity and optimize their path in research. Our in-depth analysis of a random sample of scientists who have been publishing Scopus-indexed papers for a decade or less shows a trend for more men than women to have entered the academic ladder, although the samples were quite small and the difference could be due to chance. We should caution that information available online on academic ranks may not always be up to date or complete, but any missingness is probably not affected by gender. A worrisome observation is that very few of these early overachievers have reached professor-level appointments. None of the 4 full professors in this sample came from a high-income country, and we cannot exclude the possibility that the 4 full professors had also started publishing earlier than 2012 in journals that are not indexed in Scopus (e.g., local journals) and thus may have longer publication ages. In most academic environments and in most countries, progress through the academic ranks takes a painfully long time and funding independence is typically reached in the mid-40s [26]. Funding disparities also continue to exist and they may fuel career choices and advancement [27]. The current situation may be eroding independent creativity and needs to be challenged [28]. In the past, the time to get a doctoral degree was shorter and scientists could become faculty and even full professors very soon after obtaining their doctoral degree. Lengthy graduate studies and multiple postdoctoral experiences are currently far more common before reaching independence [29–31]. Very talented individuals who show clear early promise may need to be promoted much faster. Perhaps overall research originality and creativity would be promoted if young talented scientists were given more support and confidence in tenure.

We considered all authorship positions in counting the number of scientists of different genders. However, the calculation of the composite citation indicator that is used to identify the top-cited scientists gives a lot of weight to single, first, and last authorships, as opposed to middle authorship. It is possible that in some cases, women may be more likely to be listed as secondary or supporting authors rather than first or senior authors (or not be listed at all) [23], and this can impact their visibility and recognition in the academic community. Thus, multiple forces may converge towards diminishing the chances of women becoming top-cited.

Our work has some limitations. First, for over a third of the authors, gender assignment was uncertain and these people had to be excluded from further analyses. This level of uncertainty is unavoidable with any automated gender assignment tool. Uncertain gender was modestly less common among the top-cited scientists and in those from high-income countries, but it was very high in authors from non-high-income countries. While there is no reason to believe that the representation of men versus women would be different in the uncertain gender group, we cannot exclude the possibility of some imbalance, e.g., if women are more likely to use only initials rather than their full first name. One should therefore be cautious about the uncertainty that these excluded authors induce in the main calculations of gender ratios. Moreover, even with a >85% certainty selection threshold, some people will be assigned to the wrong gender by NamSor. Nevertheless, wrong assignment occurred in only 6 of 200 randomly selected authors examined in depth. It is possible that the risk of mistaken gender assignment varies across fields and that it more often reflects women being assigned to male gender than the opposite. If so, this may cause some underestimation of the percentage of women in some countries. Second, Scopus is a comprehensive database, but some types of publications, e.g., books and some specific journals, may not be represented properly [10]. This may affect the validity of the ranking in some scientific subfields (which specific individuals are included in the top-cited), but it is less likely to affect gender ratios. Third, we used a previously extensively validated methodology for identifying the top-cited authors. However, as we have described before in detail [12–14], all citation metrics and calculations, including ours, have deficiencies and inaccuracies [32]. Moreover, citation impact (in whatever form it is calculated) should not be construed as a perfect surrogate of research quality or real-world impact. Nevertheless, our approach offers a reproducible way to identify authors with the highest citation metrics. Imbalances between genders may vary in degree across different metrics or aspects of work achievement. Finally, we could only look at the distinction between men and women, a binary classification that does not consider self-perceptions of gender that go beyond binary options. This is a known, unavoidable limitation of any algorithm that tries to assign gender based on name and country information.
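To give a sense of the sampling uncertainty around the 6 misassignments found among 200 manually checked authors, the following sketch computes a 95% Wilson score interval for that proportion. This interval is not reported in the paper; the choice of the Wilson method and the function name are assumptions made purely for illustration.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

# 6 wrong gender assignments among 200 randomly selected authors (from the text).
lower, upper = wilson_interval(6, 200)
print(f"Observed error rate: {6 / 200:.1%}")
print(f"Approximate 95% interval: {lower:.1%} to {upper:.1%}")
```

The interval describes only this manually verified sample from the youngest cohort; it says nothing about whether error rates differ by field or country, which the text flags as a separate concern.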

Allowing for these caveats, this large-scale analysis offers insights for past and current status of gender imbalances in scientific productivity and top citation impact and may be used for future planning and evaluation. Similar data may also be used for benchmarking within single countries and institutions. The persisting large imbalances in several scientific disciplines need more study to understand their causes. One may also learn a lot from disciplines where women have matched or even outperformed men in productivity and citation impact. The counterfactual of ideal equity may not represent a situation where men and women have equal representation among the top-cited scientists in each and every subfield. Nevertheless, the big composite picture suggests that there is still substantial room for further correction of imbalances.

Data Availability

The underlying data for the top 2% by age and field are provided openly in Mendeley (Collins, Thomas; Ioannidis, John; Boyack, Kevin; Baas, Jeroen (2023), “Supplementary Data for ‘Differential correction of gender imbalance for top-cited scientists across scientific subfields over time’”, Elsevier Data Repository, doi: 10.17632/wwykk8d48g). For the designation of “high income countries”, we have used public data from the World Bank [https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups]. All other data materials, including applied gender attributions based on NamSor, are available for scientific research purposes on ICSR Lab [https://www.icsr.net/].

Funding Statement

The work of JPAI is supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Kozlowski D, Larivière V, Sugimoto CR, Monroe-White T. Intersectional inequalities in science. Proc Natl Acad Sci U S A. 2022 Jan 11;119(2):e2113067119. doi: 10.1073/pnas.2113067119
  • 2. Ni C, Smith E, Yuan H, Larivière V, Sugimoto CR. The gendered nature of authorship. Sci Adv. 2021 Sep 3;7(36):eabe4639. doi: 10.1126/sciadv.abe4639
  • 3. Larivière V, Ni C, Gingras Y, Cronin B, Sugimoto CR. Bibliometrics: global gender disparities in science. Nature. 2013 Dec 12;504(7479):211–3. doi: 10.1038/504211a
  • 4. Eloy JA, Svider PF, Cherla DV, Diaz L, Kovalerchik O, Mauro KM, et al. Gender disparities in research productivity among 9952 academic physicians. Laryngoscope. 2013 Aug;123(8):1865–75. doi: 10.1002/lary.24039
  • 5. Carr PL, Gunn C, Raj A, Kaplan S, Freund KM. Recruitment, Promotion, and Retention of Women in Academic Medicine: How Institutions Are Addressing Gender Disparities. Womens Health Issues. 2017 May-Jun;27(3):374–381. doi: 10.1016/j.whi.2016.11.003
  • 6. De Kleijn M, Jayabalasingham B, Falk-Krzesinski HJ, Collins T, Kuiper-Hoyng L, Cingolani I, et al. The Researcher Journey Through a Gender Lens: An Examination of Research Participation, Career Progression and Perceptions Across the Globe. (Elsevier, March 2020) Retrieved from www.elsevier.com/gender-report.
  • 7. Ioannidis JPA, Thombs BD. A user’s guide to inflated and manipulated impact factors. Eur J Clin Investig. 2019 Sep;49(9):e13151. doi: 10.1111/eci.13151
  • 8. Card D, DellaVigna S, Funk P, Iriberri N. Gender gaps at the academies. Proc Natl Acad Sci U S A. 2023 Jan 24;120(4):e2212421120. doi: 10.1073/pnas.2212421120
  • 9. Huang J, Gates AJ, Sinatra R, Barabási AL. Historical comparison of gender inequality in scientific careers across countries and disciplines. Proc Natl Acad Sci U S A. 2020 Mar 3;117(9):4609–4616. doi: 10.1073/pnas.1914221117
  • 10. Baas J, Schotten M, Plume M, Côté G, Karimi R. Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies. Quant Sci Stud. 2020;1:377–386.
  • 11. Archambault E, Beauchesne OH, Caruso J. “Towards a multilingual, comprehensive and open scientific journal ontology” in Proceedings of the 13th International Conference of the International Society for Scientometrics and Informetrics (ISSI), Durban, South Africa. Noyons B, Ngulube P, Leta J, editors. 2011, p. 66–77.
  • 12. Ioannidis JP, Klavans R, Boyack KW. Multiple citation indicators and their composite across scientific disciplines. PLoS Biol. 2016;14:e1002501. doi: 10.1371/journal.pbio.1002501
  • 13. Ioannidis JPA, Baas J, Klavans R, Boyack KW. A standardized citation metrics author database annotated for scientific field. PLoS Biol. 2019;17:e3000384. doi: 10.1371/journal.pbio.3000384
  • 14. Ioannidis JPA, Boyack KW, Baas J. Updated science-wide author databases of standardized citation indicators. PLoS Biol. 2020;18:e3000918. doi: 10.1371/journal.pbio.3000918
  • 15. NamSor. Available from: https://NamSor.app.
  • 16. European Commission She Figures report. Available from: https://ec.europa.eu/assets/rtd/shefigures2021/index.html, last accessed April 15, 2023.
  • 17. Elsevier Gender report. Available from: https://www.elsevier.com/__data/assets/pdf_file/0011/1083971/Elsevier-gender-report-2020.pdf, last accessed April 15, 2023.
  • 18. Hvistendahl M. China’s publication bazaar. Science. 2013 Nov 29;342(6162):1035–9. doi: 10.1126/science.342.6162.1035
  • 19. Jones TS, Plume AM. Tracking China’s publication boom. Nature. 2011 May 12;473(7346):154. doi: 10.1038/473154d
  • 20. Li J, Zhu X, Wu D. China’s publications: fewer but better. Nature. 2021 Apr;592(7855):507. doi: 10.1038/d41586-021-01026-7
  • 21. Aldén and Neuman. Culture and the gender gap in choice of major: An analysis using sibling comparisons. J Econ Behav Organ. 2022.
  • 22. Rose and Hardi. “With Education You Can Face Every Struggle”: Gendered Higher Education in Iraq and Iraqi Kurdistan—Part Three: The Gender Problem.
  • 23. Ross MB, Glennon BM, Murciano-Goroff R, Berkes EG, Weinberg BA, Lane JI. Women are credited less in science than men. Nature. 2022;608:135–145.
  • 24. Okoshi K, Nomura K, Fukami Y, Tomizawa Y, Kobayashi K, Kinoshita K, et al. Gender inequality in career advancement for females in Japanese Academic Surgery. Tohoku J Exp Med. 2014;234:221–227. doi: 10.1620/tjem.234.221
  • 25. Nagano N, Watari T, Tamaki Y, Onigata K. Japan’s academic barriers to gender equality as seen in a comparison of public and private medical schools: a cross-sectional study. Women’s Health Rep (New Rochelle). 2022 Jan 31;3(1):115–123.
  • 26. Levitt M, Levitt JM. Future of fundamental discovery in US biomedical research. Proc Natl Acad Sci U S A. 2017 Jun 20;114(25):6498–6503. doi: 10.1073/pnas.1609996114
  • 27. Lauer MS, Roychowdhury D. Inequalities in the distribution of National Institutes of Health research project grant funding. elife. 2021 Sep 3;10:e71712. doi: 10.7554/eLife.71712
  • 28. Ioannidis JP. More time for research: fund people not projects. Nature. 2011 Sep 28;477(7366):529–31. doi: 10.1038/477529a
  • 29. Institute of Medicine. The Postdoctoral Experience Revisited. Washington, DC: The National Academies Press; 2014.
  • 30. Andalib MA, Ghaffarzadegan N, Larson RC. The postdoc queue: A labour force in waiting. Syst Res Behav Sci. 2018;35(6):675–686.
  • 31. Denton M, Borrego M, Knight DB. U.S. postdoctoral careers in life sciences, physical sciences and engineering: Government, industry, and academia. PLoS ONE. 2022;17(2):e0263185. doi: 10.1371/journal.pone.0263185
  • 32. Hicks D, Wouters P, Waltman L, de Rijcke S, Rafols I. Bibliometrics: The Leiden Manifesto for research metrics. Nature. 2015 Apr 23;520(7548):429–31. doi: 10.1038/520429a

Decision Letter 0

Roland G Roberts

3 Jul 2023

Dear John,

Thank you for submitting your manuscript entitled "Differential correction of gender imbalance for top-cited scientists across scientific subfields over time" for consideration as a Meta-Research Article by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I'm writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. After your manuscript has passed the checks it will be sent out for review. To provide the metadata for your submission, please Login to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e. by Jul 05 2023 11:59PM.

If your manuscript has been previously peer-reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like us to consider previous reviewer reports, please edit your cover letter to let us know and include the name of the journal where the work was previously considered and the manuscript ID it was given. In addition, please upload a response to the reviews as a 'Prior Peer Review' file type, which should include the reports in full and a point-by-point reply detailing how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Best wishes,

Roli

Roland Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Decision Letter 1

Roland G Roberts

15 Aug 2023

Dear John,

Thank you for your patience while your manuscript "Differential correction of gender imbalance for top-cited scientists across scientific subfields over time" was peer-reviewed at PLOS Biology. It has now been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by three independent reviewers.

In light of the reviews, which you will find at the end of this email, we would like to invite you to revise the work to thoroughly address the reviewers' reports.

You'll see that all three reviewers are broadly positive about your study, but each raises a number of concerns that will need to be addressed before further consideration. Reviewer #1 asks if the study was preregistered, notes some short-cut citations, wants you to report some additional data, asks about the universality of some academic posts, and wants more detail on how authorship position was considered. S/he also makes some recommendations for improved dataviz and suggests some additional analyses. Reviewer #2 wants more rigorous formal stats to support your assertions of differences between fields and through time; s/he has some further requests for clarifications and extra analyses. Reviewer #3 is somewhat more guarded; s/he wants you to rectify a mismatch between the claims and the choice of academic subfields, recommends pooling small subfields to make numbers of researchers in each more comparable, questions the robustness of your gender classification tools (especially if it has subfield-specific problems), and challenges the relevance of your “country” analysis.

IMPORTANT: I discussed the reviews with the Academic Editor, and they sent me the following comments, somewhat edited, which you may find helpful (and which you should address):

"I agree with most of the reviewers comments, each brings up important points and none of them are unreasonable.

"I see where the authors are coming from, no, not everything is going to need to be hypothesis tested. However, as reviewer #2 points out, descriptive stats are important for making some of the points that authors would like to make. Post hoc tests are typically used for these sorts of "additional analysis" ensuring an overall lower significance rate. Of course the n's here are quite large so I would assume that no matter which analysis is done, it will be significant so I can certainly see why the authors chose this route.

"I also agree with the reviewers about the https://namsor.app/. Much like many of our analyses, the presence of a black box component (either a commercial app or an antibody kit) needs to be treated carefully. The authors simply need to address this issue, it is not disqualifying, but should be treated more carefully.

"I would also ensure that all RRIDs were added to the manuscript. There are RRIDs for software tools and databases that are appropriate here and are not included."

Given the extent of revision needed, we cannot make a decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is likely to be sent for further evaluation by all or a subset of the reviewers.

We expect to receive your revised manuscript within 3 months. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may withdraw it.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roli

Roland Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

The article "Differential correction of gender imbalance for top-cited scientists across scientific subfields over time" is very well written and an important contribution in the field of gender imbalances in the scientific community. However, there is a notable omission in the presentation of results, which could enhance the overall impact of this study. There are a few concerns that need to be addressed before publishing in PLOS Biology.

Methods:

* Has the study been preregistered? If so please provide the link and/or DOI.

* Particularly the methods section contains quite some short cut citations. Linking to the previous content is fine, however, few more details would be helpful for the reader to better assess the limitations of the study (Taking shortcuts: Great for travel, but not for reproducible methods sections, bioRxiv 2022.08.08.503174; doi: https://doi.org/10.1101/2022.08.08.503174)

* The authors choose not to report the results from the career-long impact; even though they are largely similar, it would be nice to either add this information as a supplement (and refer to it in the main text) or even add this data to the main text as an infographic or similar. Incorporating this additional data could enrich the study and provide readers with a more comprehensive perspective on the topic.

* Full professor versus associate or assistant professor, are those universal positions one can hold? E.g., in Germany they have the concept of habilitation and junior Prof, but the requirements are different from other countries. Thus, do you consider someone early in their career into one of these categories or rather categorize under "other" if they are in the process of habilitation -I can imagine that this information is not easily accessible via LinkedIn or else..

* Did you consider all authorships irrespective of the position (first, last, middle)? The order of authors listed on a scientific publication can carry significant weight and influence an author's reputation and career prospects. Women, in some cases, may be more likely to be listed as secondary or supporting authors rather than first or senior authors, which can impact their visibility and recognition in the academic community. This is not clear in the manuscript and results might be different when considering the position.

Visualization(s):

* The manuscript would highly benefit from visualizations e.g., the overall number of male versus female authors and the proportion of highly cited researcher in each group. Some of the information that is currently presented in the tables 1- 4 could easily be visualized for a better understanding.

* A Flow chart including the number of excluded data based on unclear gender assignments or else would be helpful, especially for the inclusion or exclusion of countries as only countries with > 5000 authors are considered in your descriptive analysis. Visualizing the random sample and the respective exclusion in the flow chart would be helpful.

* Figure 1: It would be of interest to visually include the different countries in the figure, e.g., by including a color-code for high-income vs non-high-income; the 53 countries could also be provided as a list in the supplement (including whether they were considered high- or non-high-income).

Supplementary Information:

* The authors do link to the Elsevier Data Repository; however, it would be helpful for readers if they provided a list of its contents and referred to any supplementary information within the main text. I think the repository contains valuable information and should be more accessible (consider curation and FAIR data principles).

* In line with the previous comment: there are a few parts where data are not shown; this could be done via the supplement and linking to it. As an example: "Frequency of very high or very low female representation in different subfields was similar to that found in high income countries." Why are those data not shown, and why make the separation and focus on the USA (apart from the fact that the authors are based in the States)? I would rather recommend showing the overall data for high-income countries and then including a subsection with a focus on the US, rather than doing it the other way around. If there are any restrictions though, please mention this in the limitation section.

Results/ Discussion:

* How was the country-level analysis performed if an author changed their affiliation over time e.g., from non-high income to high income? In the method section the authors state that they rerun the analysis for scientists from country X or Y. Could you please clarify how the assignment was done, maybe I missed this.

* In the random sample used for the manual analysis, the authors stated in the methods section that they would describe whether top-cited scientists in the youngest cohort (publication age) were artifacts of older researchers. This is a very important point and I feel it is lacking in the results and discussion section. In general, the question relates to the position of an author within a paper, e.g., if a highly cited female first author is accompanied by a male senior author. I realize that this might be beyond the scope of this publication; however, a critical discussion regarding the diversity in the author list of a given publication would be highly appreciated.

* Considering low-income or middle-income countries and the current publishing system, how large is the proportion of (highly-cited) authors in general considering high APCs? I think a brief overview of additional barriers (even if they cannot be explored in detail here) should be included in the discussion/ limitation section.

* In that context, funding disparities in high versus non-high-income countries as well as geo-political and social structures, traditions and cultural differences might play a major role and should be discussed in more detail. Along those lines, the gender gap (in STEM) is smaller in countries where women are more likely to major in those fields (Aldén and Neuman, Culture and the gender gap in choice of major: An analysis using sibling comparisons, Journal of Economic Behavior & Organization, 2022.). The manuscript would highly benefit from a discussion that highlights common issues of gender inequality in countries that do not belong to the global north (Rose and Hardi; "With Education You Can Face Every Struggle": Gendered Higher Education in Iraq and Iraqi Kurdistan - Part Three: The Gender Problem). Highlighting these potential (additional) causes for the existing imbalance can help to raise awareness and should not be disentangled from the inequity in authorship and career prospects.

Reviewer #2:

This study performs a large-scale quantitative analysis of citation rates for authors of different gender across different scientific subfields and at different stages of their independent research career. The analysis was applied to a dataset previously generated by this group, using a previously developed citation metric. New to this study, the authors used an automated gender identification algorithm to label authors in their dataset and compare citation rates based on these labels. They report an overall increase in representation of women researchers over time and an increase in the relative representation of women in the group of top-2% cited researchers. However, the degree of representation by women in these groups varies substantially across fields.

This study addresses an important, challenging question with a new and valuable dataset. However, it is confusing that the authors use no statistical tests to substantiate their results. In a broad sense, the conclusion that women are generally underrepresented in research but the degree of their representation has been increasing over time is not surprising. If the authors had made claims that did not align with a confirmation bias, these results would be rejected without statistical tests supporting the claims. In particular, statements about differences between fields and changes over time should be qualified or backed up with statistical tests.

MAJOR CONCERNS

1. L. 173-74. "We avoided formal statistical testing…" The logic of this statement is not clear. Statistical tests can be used to assess results in exploratory studies. If claims are to be made about numerical differences, it seems quite reasonable to back up those claims with a significance test. If the goal here is simply to publish the expanded dataset with gender labels, that may be acceptable on its own. In this case, however, claims about numerical differences between groups should be qualified as anecdotal and not statistically supported. One might argue that the results are so obvious that no statistics are needed. But if that's the case, why bother with a numerical analysis in the first place? Alternatively, it seems like all the tables (1-3, at least) could benefit from fairly straightforward statistical tests. More specific points about these tables are below.

2. L. 269-281. "close manual verification" It is unclear what to take away from this section. At face value, the small subset shows no difference between gender groups. There is the observation that only a small fraction of authors sampled have independent faculty jobs, but is this surprising, given that most authors would complete some years of graduate school and often a postdoc before obtaining a faculty position? Can the authors make a comparison that shows time-to-independence is longer than for earlier generations of researchers? Some additional analysis is required to substantiate the claims made here.
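
As an illustration of major concern 1, a significance test on a 2x2 table of counts (gender by top-cited status) could look like the sketch below. It is a minimal standard-library sketch, not the authors' or the reviewer's analysis: the function name `chi_square_2x2` is an assumption introduced here, and the example counts are hypothetical rather than taken from the manuscript's tables.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    table [[a, b], [c, d]], with its p-value for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, NOT from the manuscript.
# Rows: men, women. Columns: top-cited, not top-cited.
chi2, p = chi_square_2x2(300, 9700, 200, 9800)
print(f"chi-square = {chi2:.2f}, p = {p:.2g}")
```

With samples as large as those in this study, almost any difference will reach nominal significance, so effect sizes and confidence intervals would likely be more informative than the p-value alone.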

LESSER CONCERNS

L. 179 "top-cited" Please state how this group is defined. The information is provided in the Methods, but it takes a certain amount of digging. It appears that top-cited means the subset of each group with the highest composite citation index (c-index) for papers published in the last years? Or citations in the last year? And ranked both with self-citations and without?

L. 183. Table 1. It would be helpful if the authors could provide percentages and R scores for the different groups. It is true that they can be computed from the raw numbers, but a reader would appreciate it if they were reported in the table. A figure would be even better, if the authors are able to do it.

L. 201. A similar request applies to Table 2.

L. 232. Table 3 seems to contain the key results of the study. It's clear that gender ratios are on average approaching 50%. But is the process accelerating? Slowing down? Are there significant differences or not between wealthy and less wealthy countries? The data as presented aren't convincing.

L. 252-267. Figure 1. This figure is hard to interpret. Please add a line of unity slope to show where values on the y axis are greater than the x axis. Consider also plotting on a log-log scale, as the small values are difficult to discriminate.

L. 257 "9.87 in Italy" Typo? 9.87 seems very high.
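
The request above for percentages and ratios alongside the raw counts in Tables 1 and 2 amounts to simple arithmetic. The sketch below shows it for the overall author counts quoted in the article abstract (3,784,507 men, 2,011,616 women, 9,071,122 authors in total). The function name `gender_summary` and the dictionary keys are assumptions made for illustration; the same calculation would apply to any row of the tables.

```python
def gender_summary(men, women, total_authors):
    """Derive ratio and percentage summaries from raw author counts."""
    assigned = men + women
    return {
        "men_to_women_ratio": men / women,
        "pct_women_among_assigned": 100 * women / assigned,
        "pct_unassigned": 100 * (total_authors - assigned) / total_authors,
    }

# Overall counts as reported in the article abstract.
summary = gender_summary(men=3_784_507, women=2_011_616, total_authors=9_071_122)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The ratio and the unassigned share computed this way round to the 1.88-fold and 36.1% figures stated in the abstract.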

Reviewer #3:

Summary

This study evaluates how the share of female authors among the top-cited 2% in Scopus in each of 174 research subfields has changed over time. Gender of authors is determined using the NamSor database. The authors find substantial heterogeneity across disciplines and subfields in women researchers' shares and the extent to which gender disparities in the top-cited 2% has closed over four publication-age cohorts: pre-1992, 1992-2001, 2002-2011, and 2012 or later. Additional analyses consider within-field heterogeneity in these disparities across countries. Finally, the authors selected a random sample of 200 researchers in the post-2011 cohort for additional manual web research (e.g., LinkedIn, Google Scholar, university websites, etc.) to evaluate differences by gender in academic rank and employment sector.

This paper asks an interesting question and presents plausible descriptive results. However, there are several aspects that need further clarification before publication.

Major points:

1. The description of the 174 fields as "scientific" (and the authors as "scientists") throughout the paper is confusing, as readers might reasonably presume the subjects of analysis are all in the natural sciences or at least in STEM fields. As Table 2 shows, this is not the case: subfields also include humanities, family and gender studies, and so on. I would suggest either limiting the subfields discussed to natural sciences (including health sciences) for better comparability, or changing the presentation and discussion of the paper to better reflect this.

2. As Table 2 also shows, the total number of published researchers varies tremendously across fields. Weighting subfields like Folklore (86 researchers post-2011), Drama & Theater (181), Art History (257), and Gender Studies (503) equally with subfields Public Health (18,050), Psychiatry (18,626), and Education (23,905) can yield misleading conclusions about the extent to which the broader research enterprise is approaching gender balance. Moreover, I also downloaded the full classification subfield list for Science-Metrix, and while e.g. medical topics appear to be reasonably disaggregated (allergy, pediatrics, ob/gyn, endocrinology, etc.), others (e.g., economics) are not. Different citation norms, publication frequencies and peer-review timelines across true subfields of a discipline can lead to differences in citations. At a minimum, I would recommend combining similar small fields so that the number of unique researchers is more similar across "subfields".

3. For fields with very detailed subfields, is it possible that a given highly cited researcher is showing up in multiple subfields' top 2% lists? I don't know about Scopus and Science-Metrix article classifications, but Scimago's similar system classifies many journals in more than one subfield. I also don't see, though perhaps I missed it, whether the subfield classification of authors is based on the journals in which they published or article-level subfield tags. If the former, I would wonder how high-impact scientific journals like Science and Nature are categorized.

4. My own small-N test of NamSor suggests it may systematically misclassify contemporary female names as male, and performs very poorly with Vietnamese names. I am in a male-dominated field, and entered the names of each of my 32 students this fall as a test case. Of these, all male students were correctly classified, with probabilities 95% or higher. Among the 14 women, fully half would either be excluded from this study or misclassified as male. Four of the women are Vietnamese, and of those, 3 would be excluded due to "unclear" gender using the 85% cutoff. Half of the 8 native-US women (all with Anglo / Western European names) were misclassified as men, with probabilities 86%, 93%, 95%, and 99%. Now consider the following thought experiment: what if parents who give female children gender-neutral or historically-male-typed names raise children who are more likely to disregard or actively challenge gender norms about degree fields? This paper's method would then systematically undercount the share of women researchers in male-dominated fields.

5. It's not clear what the country analysis is meant to portray, if authors' country is set based on the first observed publication. For many natural sciences fields, this likely represents the location of the author's graduate program or postdoc, which (in some fields more than others) might not correspond to their national origin or where they subsequently work. At minimum, more care is needed in discussing the meaning of these results. It could also be interesting to evaluate whether authors who publish in different countries over their careers have higher citations (e.g. due to networking) vs those who remain in the same country.
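
The tally in major point 4 can be reproduced with a short sketch of a probability cutoff rule. This is an illustration under stated assumptions, not NamSor's API or the authors' pipeline: `assign_gender` is a hypothetical helper, and treating the 85% cutoff as inclusive (`>=`) is an assumption. The probabilities and counts are those the reviewer reports.

```python
CUTOFF = 0.85  # threshold quoted by the reviewer; inclusive ">=" is an assumption

def assign_gender(predicted_label, probability, cutoff=CUTOFF):
    """Keep the predicted label only when its probability reaches the cutoff."""
    return predicted_label if probability >= cutoff else "unclear"

# Probabilities the reviewer reports for the four US women labelled as men.
misclassified = [assign_gender("male", p) for p in (0.86, 0.93, 0.95, 0.99)]

women_total = 14       # women in the reviewer's class of 32 students
excluded_unclear = 3   # Vietnamese names falling below the cutoff
counted_as_men = misclassified.count("male")
affected_share = (excluded_unclear + counted_as_men) / women_total
print(f"women excluded or counted as men: {affected_share:.0%}")
```

All four misclassified names clear the cutoff, so a confidence threshold alone would not catch them; only names below the threshold are set aside as uncertain.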

Minor points:

* The pre-1992 cohort is likely to have more citations for men because the window for possible citations can be arbitrarily long, and men dominated most fields before 1980. I suggest excluding this cohort, or using a similar decade cutoff as for other cohorts, for better comparability.

* On p. 4, "there is evidence that citations are misused and gamed". I'm sympathetic to this point, but the authors need to clarify what "gamed" means in this context, and if it's important, how/whether we can detect differences by gender in "gaming" citations.

Decision Letter 2

Roland G Roberts

29 Sep 2023

Dear John,

Thank you for your patience while we considered your revised manuscript "Differential correction of gender imbalance for top-cited scientists across scientific subfields over time" for publication as a Meta-Research Article at PLOS Biology. This revised version of your manuscript has been evaluated by the PLOS Biology editors and the Academic Editor.

Based on our Academic Editor's assessment of your revision, we are likely to accept this manuscript for publication, provided you satisfactorily address the following data and other policy-related requests:

IMPORTANT - please attend to the following:

a) Please change your title to make it more accessible and to include some idea of the impressive scale of your study. We suggest the following selection: "Analysis of nearly 5.8 million authors reveals differential correction of gender imbalance for top-cited scientists across scientific subfields over time" or "Analysis of nearly 5.8 million authors reveals large heterogeneity across scientific disciplines in the amelioration of gender imbalances among top-cited scientists over time" or (possibly the snappiest) "Gender imbalances among top-cited scientists across scientific disciplines over time through the analysis of nearly 5.8 million authors"

b) The Academic Editor asked me to re-iterate his/her request from the previous round about RRIDs: "The only thing they did not address that was simple and practical was the addition of RRIDs, which I am assuming they simply did not understand because they did not read the instructions to authors. https://journals.plos.org/plosbiology/s/materials-software-and-code-sharing The authors used 3 major tools. These three are commercial tools therefore there are no good scholarly papers and there will certainly not be any public github repository that will point to the code. Therefore the addition of a persistent unique identifier, the RRID, is the only way to mark which tools were used when the tools are discontinued at some future point (statistically speaking 75% of tools would likely be around for two years after publication). NamSor Version or date of access (RRID:SCR_023935) Science-Metrix Version or date of access (RRID:SCR_024471) Scopus Version or date of access (RRID:SCR_022559) Please ask the authors to include these RRIDs to the methods section with any additional version or date of access information."

c) We note that you have deposited the underlying data in the Elsevier Data Repository, V1, doi: 10.17632/wwykk8d48g.1 - many thanks for doing so. However, I note that the associated licence is CC BY NC. I consulted our data team, and they tell me that the data underlying a paper cannot have a more restrictive licence than our CC BY one. Please could you therefore (preferably) switch the Elsevier licence to CC BY; if this is not possible, please lodge a copy of the data in (e.g.) Zenodo and provide the Zenodo URL/DOI in the paper's Data Availability Statement and Figure legends (see next point).

d) Please cite the location of the data clearly in both Figure legends, e.g. “The data underlying this Figure can be found in https://doi.org/10.17632/wwykk8d48g.1” or “The data underlying this Figure can be found in https://doi.org/10.5281/zenodo.XXXXX”

e) Please make any custom code available, either as a supplementary file or as part of your data deposition.

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable; if not applicable, please do not delete your existing 'Response to Reviewers' file.)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Roli

Roland Roberts, PhD

Senior Editor,

rroberts@plos.org,

PLOS Biology

------------------------------------------------------------------------

CODE POLICY

Per journal policy, as the code that you have generated is important to support the conclusions of your manuscript, we require that you make it available without restrictions upon publication. Please ensure that the code is sufficiently well documented and reusable, and that your Data Statement in the Editorial Manager submission system accurately describes where your code can be found.

------------------------------------------------------------------------

DATA NOT SHOWN?

- Please note that per journal policy, we do not allow the mention of "data not shown", "personal communication", "manuscript in preparation" or other references to data that is not publicly available or contained within this manuscript. Please either remove mention of these data or provide figures presenting the results and the data underlying the figure(s).

------------------------------------------------------------------------

Decision Letter 3

Roland G Roberts

16 Oct 2023

Dear John,

Thank you for the submission of your revised Meta-Research Article "Gender imbalances among top-cited scientists across scientific disciplines over time through the analysis of nearly 5.8 million authors" for publication in PLOS Biology. On behalf of my colleagues and the Academic Editor, Anita Bandrowski, I'm pleased to say that we can in principle accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS: We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Sincerely, 

Roli

Roland G Roberts, PhD, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: pointbypointplosbiology.docx

    Attachment

    Submitted filename: responsesoct5.docx

    Data Availability Statement

    The underlying data for the top 2% by age and field are provided openly in Mendeley (Collins, Thomas; Ioannidis, John; Boyack, Kevin; Baas, Jeroen (2023), “Supplementary Data for "Differential correction of gender imbalance for top-cited scientists across scientific subfields over time" ”, Elsevier Data Repository, doi: 10.17632/wwykk8d48g). For the designation of “high income countries”, we have used public data from the World Bank [https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups]. All other data materials, including applied gender attributions based on NamSor, are available for scientific research purposes on ICSR Lab [https://www.icsr.net/].


    Articles from PLOS Biology are provided here courtesy of PLOS
