2023 Jan 11;17(1):101380. doi: 10.1016/j.joi.2023.101380

The effect of the COVID-19 pandemic on gendered research productivity and its correlates

Eunrang Kwon (a), Jinhyuk Yun (b), Jeong-han Kang (a)
PMCID: PMC9832056  PMID: 36643578

Abstract

Female researchers may have experienced more difficulties than their male counterparts since the COVID-19 outbreak because of gendered housework and childcare. To test this, we constructed a unique dataset that links 15,280,382 scholarly publications and their 11,828,866 authors, retrieved from Microsoft Academic Graph data between 2016 and 2020, to various national characteristics from LinkedIn, the Johns Hopkins Coronavirus Resource Center, and the COVID-19 Community Mobility Reports from Google. Using this dataset, we estimated how much the proportion of female authors in academic journals on a global scale changed in 2020 (net of recent yearly trends). We observed a decrease in research productivity for female researchers in 2020, mostly in the first-author position, followed by the last-author position. We also identified various factors that amplified the gender gap by dividing the authors' backgrounds into individual, organizational, and national characteristics. Female researchers were more vulnerable when they were in mid-career, affiliated with the least influential organizations, and, more importantly, from less gender-equal countries with higher COVID-19 mortality and more restricted mobility. Our findings suggest that female researchers were not necessarily excluded from research but were marginalized in it after the COVID-19 outbreak, and we discuss the policy implications.

Keywords: COVID-19, Gender inequality, Research productivity, Career, Childcare

1. Introduction

The COVID-19 pandemic has changed the way we live everywhere, and academia has not been immune to these changes. For instance, research productivity declined during the pandemic, particularly for female researchers (Bell & Fong, 2021; Kibbe, 2020; Krukowski et al., 2021; Lerchenmüller et al., 2021; Myers et al., 2020). Everyone has been affected by the pandemic, but the degree of disruption may differ. To illustrate, as of 11th October 2021, four of the five countries with the highest COVID-19 death tolls (WHO, 2021) were classified as developing or emerging economies by the International Monetary Fund (IMF) (Long & Ascent, 2020). Moreover, the pandemic's influence diverges according to scholars' environments and characteristics (such as their field of study (FOS), social class, and country of domicile); thus, the decline in scientific productivity may not have been equal across researchers. To understand this disparity, scholars have explored the impact of social background on research activity during the pandemic (Andersen et al., 2020; Deryugina et al., 2021; Krukowski et al., 2021; Myers et al., 2020). By examining selected subsets of academia, they found that academic minorities, particularly women, suffered more. Because the pandemic affected the entire world, however, the significant implications of these studies call for complementary research with more quantitative means and a broader scope to understand its unequal effect. Here, on the basis of large-scale bibliographic data, we attempt a nearly pan-academic assessment of the unequal impact of the pandemic on scientific activity with respect to researchers' social and academic status.

We hypothesize that researchers' vulnerability should be reflected in their research productivity, and we suggest that an extensive collection of scientific publications at a pan-disciplinary scale may be suitable for gauging the pandemic's impact. Microsoft Academic Graph (MAG) provides metadata on millions of scientific publications, including journal papers, conference proceedings, and online preprints. Here, we focus on gender as a personal factor that makes research activity vulnerable to the effects of the pandemic. In particular, the research productivity of female researchers may have decreased more than that of male researchers because of the unequal division of housework and childcare between husbands and wives, which has intensified since the outbreak of COVID-19 (Cui et al., 2020). Because bibliographic datasets do not record authors' gender, we inferred it with a statistics-based gender classifier (see Methods).

Female academics have established their position in all subject areas but have not yet achieved full gender equity. In most cultures, unpaid housework and the more subtle emotion work (Hochschild, 1979) at home are disproportionately imposed on women because of conventional gender roles; these additional burdens may therefore result in unequal sacrifice (Fuwa, 2004; Greenstein, 1996). To test this hypothesis, we examined the gendered impact on research activity according to social and academic background at three levels: individual (reflecting career stage and academic prestige); organizational (reflecting institutional prestige); and national (reflecting the living environment, such as gender equality and the severity of COVID-19 at the national level).

The study used all publications in MAG from 2016 to 2020, regardless of journal or discipline. We combined this large-scale bibliographic metadata with scholars' social and academic backgrounds. Academic age and the author and affiliation h-indices were selected as representative characteristics of individuals and their academic organizations, while gender equality and total COVID-19 cases and deaths per million were chosen as national statistics. We grouped papers into four levels for each variable and explored the gendered impact by investigating the change in difference (CID) of the proportion of female authors between adjacent years (see Methods). CID can be conceived as a variant of the difference-in-differences (DID) approach between two groups (Abadie, 2005; Card & Krueger, 1993) that cancels any effect common to the two differences. It can therefore capture the net change in 2020 beyond the steady increase or decrease in female authors' participation in recent years. Our analysis demonstrates that the pandemic had a greater negative impact on females than on males, as reflected in the negative CID in 2020 compared with previous years. The analysis, which takes authors' surrounding factors into account, further reveals the conditions under which women in academia were more seriously affected in terms of their career stage, affiliated institution, and country of work.

2. Relevant studies

In this section, we provide a concise review of studies and debates related to the gendered effect of the COVID-19 pandemic in academia. The outbreak of COVID-19 is an unprecedented disaster on a global scale and has led to an explosion of “COVID science” (Else, 2020). However, few studies have examined changes in academia globally as a result of the pandemic. Meanwhile, considerable concern has arisen about the decline in the research productivity of female researchers in various academic fields. Many countries continue to practice social distancing, and some have even enforced lockdowns to prevent the spread of the virus, resulting in restricted mobility (Google, 2021). Research facilities have been closed, and remote work has been widely implemented to avoid face-to-face contact. At the same time, children have been kept at home instead of attending school or childcare. This has created an additional burden of housework and childcare that is not distributed equally between males and females in most households. Females have had to undertake more unpaid family work (Facebook, 2021), which likely reduces their research productivity compared with that of their male counterparts. This may reasonably be attributed to conventional gender roles in many cultures (Acker, 1990; Fuwa, 2004).

While research time has been insufficient throughout the academic community, female researchers have reportedly experienced career disruption because their burden of childcare and housework increased more than that of their male counterparts. This suggests a possible long-term negative effect of COVID-19 on female researchers (Cui et al., 2020; Minello, 2020; Yildirim & Eslen-Ziya, 2020; Gao et al., 2021). Related work suggests that increased childcare responsibilities at home contribute to decreased research productivity in STEM fields as well as in others such as linguistics and sociology (Hunter & Leahey, 2010; Krukowski et al., 2021). Another study demonstrated that lockdown has resulted in a loss of female economic productivity, which is also reflected in lower participation rates of females in new research related to COVID-19 (Amano-Patiño et al., 2020).

In summary, the impact of COVID-19 on research productivity may vary with female researchers' life-course stage and may be especially harmful for those who are parents (Hunter & Leahey, 2010; Squazzoni et al., 2021). Such gendered effects are also observed in online environments (Viglione, 2020). To illustrate, women are less successful than men in disseminating their research and its impact (Vásárhelyi et al., 2021), and gendered tie formation in co-authorship networks is associated with online success for males but not for females. Preprints in the online repository Social Science Research Network (SSRN) show that female research productivity dropped more than that of males, even though total research productivity increased after 10 weeks of lockdown in the United States (Cui et al., 2020). In short, although some studies focus on the gendered impact of the pandemic, the main evidence is based on rather small subsets of academia and commonly does not estimate the impact of authors' environments, which may either promote or disrupt scientific productivity.

Some studies report how authors' roles in research were influenced during the pandemic, although they test only rather small subsets (such as a single journal or FOS). These papers also used limited information on female researchers' academic status and the gender-related conditions in their countries. That said, one significant observation was that fewer females appeared as first authors and that females had lower submission rates than males (Bell & Fong, 2021; Kibbe, 2020; Lerchenmüller et al., 2021). One of the most recent and notable studies (Liu et al., 2022) covered papers published in the COVID-19 open research dataset up to September 2020 in 50 countries and found that female researchers were negatively affected as first authors and showed lower contribution and less collaboration with male colleagues during the pandemic. This study provides particularly strong evidence because it made extensive efforts to control for confounding factors related to changes in female productivity. It employed a difference-in-differences (DID) approach for a fair comparison with a control group, and it included two-way fixed effects to control for confounders specific to country and month. Our study attempts to extend its findings to all academic fields and to at least partly clarify those confounders in terms of researchers' career stage, organizational status, and the combination of gender inequality and the COVID-19-related context in their country. Other studies point out that the underestimated contribution of female authors leads to an unfair distribution of authorship (Ni et al., 2021; Pinho-Gomes et al., 2020; West et al., 2013). Meanwhile, findings on women as last authors have been inconsistent across studies (Andersen et al., 2020; Bell & Fong, 2021; Kibbe, 2020; Lerchenmüller et al., 2021).

Taken together, these findings indicate that differentiating author roles is indispensable for a precise analysis of gendered research productivity. For instance, in the physical sciences and biology, the first author is usually an early-career researcher who primarily carries out the research, whereas the last author mainly takes a mentoring role that steers it (Sekara et al., 2018). Therefore, the first author of a paper can be considered the person who devotes the most effort among its authors; from this, we hypothesize that leading authors were influenced more negatively than other authors by heavier household duties and mental distractions during the COVID-19 pandemic (Vincent-Lamarre, Sugimoto & Larivière, 2020; Squazzoni et al., 2021).

In addition to author role, we suggest several factors that may interact with gender in affecting research productivity: academic career stage, prestige, and national contexts concerning gender and the epidemiological situation. First, regarding the interaction with career stage, we hypothesize that female researchers' productivity was more affected by the pandemic when their career stage was more likely to coincide with parenthood. During the pandemic, surveys in multiple countries found that women, particularly mothers of young children, spent more time on childcare and household tasks and reduced their work hours more than fathers did (Collins et al., 2021; Giurge et al., 2021). A similar scenario is plausible for researchers. For instance, early-career female authors experienced a decrease in research productivity during the pandemic, although the influence on female authorship overall was inconclusive in the field of medicine (Andersen et al., 2020). In the United States, highly educated women usually begin motherhood in their 30s (Pew Research Center, 2015). Because the earliest age at first publication is around the mid-20s, we can assume that mid-career researchers are likely to have minor children who need parental care (Jensen et al., 2009).

Another possible protective factor is academic prestige, that is, how an author's standing in academia affects their vulnerability. Two competing hypotheses are possible. First, less prestigious female authors may be more vulnerable to the pandemic because of insufficient resources and support. Second, more prestigious female authors may be more vulnerable because of harsher competition with comparably prestigious male counterparts.

Females are known to be underrepresented in the academic world, especially in top positions. For example, females' performance tends to be evaluated less favorably than males', which contributes to the small number of leading positions held by females (De Paola et al., 2018). Similarly, in the United States, female academics do not obtain tenure and promotion at the same rate as men in similar fields with similar academic credentials (Bonawitz & Andel, 2009). Moreover, females still report stress from a lack of administrative support, current economic conditions, bias of tenured colleagues, and family responsibilities (Williams, 2005). The glass ceiling and the maternal wall, formed by gender stereotypes, demote their positions and shorten their tenure (Williams, 2005). In addition, females are deprived of research time because they tend to be assigned more ‘feminine’ tasks, such as student counseling and the local arrangement of conferences.

Although there is no absolute standard for the scientific prestige of academics or their affiliations, citation-based bibliometric indicators, such as the number of citations, the papers with the highest number of citations, and scientific proximity, are usually employed (Wildgaard et al., 2014). These measures generally focus on one of two aspects: i) the productivity of authors and ii) the citation impact of their publications. The h-index, a non-decreasing function of time, is a well-known measure that embraces both aspects to quantify the research output of individual scientists (Hirsch, 2005); it can also be applied as an affiliation-level metric (Grant et al., 2007), although it is biased towards mature scientists (Egghe, 2007). Therefore, it can serve as an indicator of both career longevity and academic impact. In other words, we take a high h-index to indicate a stable research environment that protects authors from external threats.

In addition to different life stages and statuses, different disciplines can yield significant differences between genders (Jemielniak et al., 2021). Gendered culture in laboratory-based and academic occupations can be an intervening factor. A survey demonstrated that physical research time in disciplines requiring laboratory work (such as biology, chemistry, and engineering) decreased sharply after COVID-19, whereas dry-laboratory disciplines were little affected (Myers et al., 2020). One exception is time spent on medical research related to COVID-19 (Myers et al., 2020). Given the higher care burden on female researchers discussed above (Cui et al., 2020; Yildirim & Eslen-Ziya, 2020), the research productivity of female researchers in STEM decreased after the outbreak of the pandemic (Andersen et al., 2020; Krukowski et al., 2021). Females who are expected to conform to traditional gender roles in male-dominated organizations consider themselves excluded and marginalized from their majors (Hatchell & Aveling, 2008). Therefore, female researchers in STEM fields faced more critical circumstances than males during the pandemic. Although it may be extremely difficult to detect the gender-related cultures surrounding researchers, they could be approximated by a national gender-equality index for professional occupations (Kashyap & Verkroost, 2021).

The last interacting factor is epidemiological risk at the national level. Lockdown and social distancing policies are mostly implemented at the national level. Female researchers in higher-risk countries may have been more likely to work from home, be absent from the laboratory, and take care of home-schooled children. These epidemiological risks can be measured by the cumulative numbers of COVID-19 infections and deaths (Dong et al., 2020).

Here, based on large-scale empirical publication records from before and during the first year of the pandemic, we seek to quantify the gendered effect of the pandemic on researcher productivity.

3. Materials and methods

3.1. Data description

We used the February 15, 2021 dump of the Microsoft Academic Graph (MAG) dataset in this study (Sinha et al., 2015). This dataset includes bibliographic information on scientific items, such as journal papers, patents, and conference proceedings, together with their metadata, in tab-separated values (TSV) format. Additional socioeconomic data were collected from various sources, including a gender equality index and total infected cases and deaths (Dong et al., 2020; Kashyap & Verkroost, 2021). Specifically, we used the LinkedIn gender gap indices (GGI), which quantify the difference between genders in their occupations (Kashyap & Verkroost, 2021). A higher value indicates a country that is more balanced with respect to gendered occupations; the average was 0.73 across 170 countries. The total numbers of infected cases and deaths from COVID-19 were retrieved from the COVID-19 Map provided by the Johns Hopkins Coronavirus Resource Center (Dong et al., 2020). We used cumulative statistics up to December 31, 2020. Total cases per million averaged 16,051.62 across 174 countries, and total deaths per million averaged 311.94 across 163 countries. We also used the COVID-19 Community Mobility Reports from Google to retrieve changes in mobility during the pandemic (Google, 2021). These data cover six place categories, of which we selected the two relevant ones (Workplaces and Residential). Each mobility score was averaged from the first reported date to December 31, 2020. A lower mobility score means less movement at the given place than during the pre-pandemic period. The average workplace mobility was −20.45 (133 countries), and the average residential mobility was 9.17 (130 countries).
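The mobility averaging described above can be sketched with pandas as follows. This is a minimal sketch, not the authors' code: it assumes the column layout of Google's Global Mobility Report CSV (`country_region_code`, `sub_region_1`, `sub_region_2`, `metro_area`, `date`, and the `*_percent_change_from_baseline` columns) and assumes that national rows are those with no sub-regional entry. The function name `national_mobility` and the toy data are illustrative.

```python
import pandas as pd

# Assumed sub-national columns of the Google mobility CSV; rows where all of
# the columns present are empty are treated as national-level rows.
REGION_COLS = ["sub_region_1", "sub_region_2", "metro_area"]
MOBILITY_COLS = {
    "workplaces_percent_change_from_baseline": "workplace_mobility",
    "residential_percent_change_from_baseline": "residential_mobility",
}


def national_mobility(report: pd.DataFrame, end: str = "2020-12-31") -> pd.DataFrame:
    """Average each country's mobility from its first reported date through `end`."""
    present = [c for c in REGION_COLS if c in report.columns]
    national = report[report[present].isna().all(axis=1)].copy()
    national["date"] = pd.to_datetime(national["date"])
    national = national[national["date"] <= pd.Timestamp(end)]
    # Mean over all remaining days; missing daily values are skipped by mean().
    return (
        national.groupby("country_region_code")[list(MOBILITY_COLS)]
        .mean()
        .rename(columns=MOBILITY_COLS)
    )


# Toy input: two national rows in 2020, one national row in 2021, one regional row.
report = pd.DataFrame({
    "country_region_code": ["KR", "KR", "KR", "KR"],
    "sub_region_1": [None, None, None, "Seoul"],
    "date": ["2020-02-15", "2020-12-31", "2021-01-01", "2020-03-01"],
    "workplaces_percent_change_from_baseline": [-10.0, -30.0, -90.0, -50.0],
    "residential_percent_change_from_baseline": [4.0, 8.0, 20.0, 12.0],
})
result = national_mobility(report)
print(result)
```

Only the two national 2020 rows contribute here, so the toy country's workplace score is the mean of −10 and −30.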

To construct the test dataset, we selected papers published between 2016 and 2020 (see Table 1 for descriptive statistics). We first selected all items published in the five years from 2016 to 2020 according to the Date field. We then retained only journal papers, which carry the date of offline publication. We additionally analyzed online preprints, for which no peer review had been completed (Figs. S10–S14, Fig. S16). We filtered out papers with 50 or more authors to focus on the general practice of science and technology, because the ecosystem of large-scale team science differs from that of typical science. After filtering, the average number of co-authors decreased from 4.8 to 4.6.
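The filtering steps above could look roughly like the following pandas sketch. The column names (`PaperId`, `Date`, `DocType`, `AuthorId`) follow MAG's Papers and Paper Author Affiliations tables, but the `DocType` value `"Journal"` for journal papers and the helper `select_journal_papers` are assumptions made for illustration.

```python
import pandas as pd


def select_journal_papers(papers: pd.DataFrame, paper_authors: pd.DataFrame,
                          start: int = 2016, end: int = 2020,
                          max_authors: int = 50) -> pd.DataFrame:
    """Keep journal papers dated start..end with fewer than `max_authors` authors."""
    # 1) Publication window, read from the Date column (inclusive on both ends).
    years = pd.to_datetime(papers["Date"]).dt.year
    selected = papers[years.between(start, end)]
    # 2) Journal papers only (DocType value assumed to be "Journal").
    selected = selected[selected["DocType"] == "Journal"]
    # 3) Distinct authors per paper; papers without author rows map to NaN and are dropped.
    n_authors = paper_authors.groupby("PaperId")["AuthorId"].nunique()
    selected = selected.assign(n_authors=selected["PaperId"].map(n_authors))
    return selected[selected["n_authors"] < max_authors]


papers = pd.DataFrame({
    "PaperId": [1, 2, 3, 4],
    "Date": ["2015-06-01", "2018-01-01", "2020-05-05", "2019-03-03"],
    "DocType": ["Journal", "Journal", "Journal", "Repository"],
})
paper_authors = pd.DataFrame({
    "PaperId": [2] * 3 + [3] * 60,
    "AuthorId": list(range(3)) + list(range(100, 160)),
})
kept = select_journal_papers(papers, paper_authors)
print(kept[["PaperId", "n_authors"]])
```

In the toy data, paper 1 falls outside the window, paper 4 is not a journal paper, and paper 3 has 60 authors, so only paper 2 survives.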

Table 1.

Descriptive statistics of the bibliographic data set.

| Variable | N (%) | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Papers | | | | | |
| Number of journals | 36,005 (100) | | | | |
| Number of issues | 940,892 (100) | | | | |
| Number of papers | 15,280,382 (100) | | | | |
| Number of coauthors | | 4.55 | 3.61 | 1 | 50 |
| Number of gender-identified papers | 15,092,866 (100) | | | | |
| Number of papers with any female author | 9,772,377 (64.75) | | | | |
| Number of papers with a female first author | 5,070,922 (33.60) | | | | |
| Individual characteristics by the first author | | | | | |
| Academic age | 11,796,675 (78.16) | 9.75 | 9.88 | 0 | 60 |
| Author h-index | 11,821,233 (78.32) | 5.89 | 7.40 | 0 | 146 |
| Affiliation h-index | 11,828,866 (78.37) | 167.56 | 118.27 | 0 | 674 |
| National characteristics by country | | | | | |
| Gender equality | 170 (100) | 0.73 | 0.32 | 0.13 | 2.12 |
| Total cases per million | 174 (100) | 16,051.62 | 18,720.89 | 5.63 | 76,818.85 |
| Total deaths per million | 163 (100) | 311.94 | 384.66 | 0.17 | 1684.96 |
| Workplace mobility | 133 (100) | −20.45 | 9.47 | −46.98 | 10.92 |
| Residential mobility | 130 (100) | 9.17 | 4.68 | 0.02 | 25.64 |

Note: We analyzed the metadata at the paper level, and thus we present paper-level statistics. Researchers' characteristics were categorized into individual and country-level characteristics. For individual-level characteristics, we present the number of papers. For country-level characteristics, we display the number of countries with available statistics, because these were gathered incompletely from the original sources.

3.2. Identification of authors and affiliations for a paper

To analyze the scientific activities of authors and affiliations, it is necessary to establish their identities. Affiliation refers to the institution to which an author belongs, such as a university, company, laboratory, or hospital, without specifying a particular department. This is a very coarse-grained identification of affiliation, but it is sufficient to capture affiliation-related prestige via the h-index. We used the Paper Author Affiliations table in Microsoft Academic Graph. This table contains the columns AuthorId and AffiliationId, which are unique identifiers of authors and affiliations, respectively, along with the unique identifier of the paper (PaperId). We used these identifiers in subsequent analyses. Additional information on each paper was extracted from other tables in Microsoft Academic Graph (for example, country information from the Iso3166Code column in the Affiliations table, which records two-letter country codes defined in the ISO 3166 standard). For analyses using country information, we disregarded papers in which no affiliations were assigned to the authors. In addition, we defined academic age as the time elapsed between the author's first journal paper and the present. Author order was retrieved from the AuthorSequenceNumber column in the Paper Author Affiliations table to identify the first and last authors of each paper. It could be argued that the importance of the first author varies by discipline; for example, authors are customarily listed in alphabetical order in mathematics. However, there is no significant difference in the fraction of alphabetically ordered papers across disciplines (Figs. S15 and S16); thus, we consider first authors to be leading authors regardless of FOS.
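The identification steps above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the column names `PaperId`, `AuthorId`, `AffiliationId`, `AuthorSequenceNumber`, and `Iso3166Code` come from the text, `Year` is assumed to be the publication-year column of the MAG Papers table, and the helper names, the `reference_year` stand-in for "the present", and the alphabetical-order heuristic in `is_alphabetical` are hypothetical choices.

```python
import pandas as pd


def first_last_authors(paa: pd.DataFrame, affiliations: pd.DataFrame) -> pd.DataFrame:
    """One row per PaperId: first/last AuthorId and the first author's country code."""
    # Attach the two-letter ISO 3166 code of each affiliation (NaN if unassigned).
    paa = paa.merge(affiliations[["AffiliationId", "Iso3166Code"]],
                    on="AffiliationId", how="left")
    # An author with several affiliations appears on several rows with the same
    # sequence number; keep one row per (paper, position).
    paa = paa.sort_values(["PaperId", "AuthorSequenceNumber"], kind="stable")
    paa = paa.drop_duplicates(["PaperId", "AuthorSequenceNumber"])
    by_paper = paa.groupby("PaperId")
    first = by_paper.head(1).set_index("PaperId")
    last = by_paper.tail(1).set_index("PaperId")
    return pd.DataFrame({
        "FirstAuthorId": first["AuthorId"],
        "FirstAuthorCountry": first["Iso3166Code"],
        "LastAuthorId": last["AuthorId"],
    })


def academic_age(paa: pd.DataFrame, papers: pd.DataFrame, reference_year: int) -> pd.Series:
    """Years since each author's first paper, relative to a hypothetical reference_year."""
    years = paa.merge(papers[["PaperId", "Year"]], on="PaperId")
    return reference_year - years.groupby("AuthorId")["Year"].min()


def is_alphabetical(last_names: list) -> bool:
    """True if a multi-author byline is in alphabetical order of last name."""
    keys = [name.casefold() for name in last_names]
    return len(keys) > 1 and keys == sorted(keys)


paa = pd.DataFrame({
    "PaperId": [10, 10, 10, 11],
    "AuthorId": [1, 2, 3, 4],
    "AffiliationId": [100, 101, 101, 100],
    "AuthorSequenceNumber": [1, 2, 3, 1],
})
affiliations = pd.DataFrame({"AffiliationId": [100, 101], "Iso3166Code": ["KR", "US"]})
papers = pd.DataFrame({"PaperId": [10, 11], "Year": [2018, 2020]})
roles = first_last_authors(paa, affiliations)
ages = academic_age(paa, papers, reference_year=2020)
print(roles)
print(ages)
```

The alphabetical check is only a heuristic: a two-author byline is in alphabetical order about half the time by chance, so a stricter rule (for example, requiring three or more authors) may be preferable.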

3.3. Estimation of the h-index for authors and affiliations

To capture the prestige of authors and affiliations, we used the h-index, a well-known citation-based metric (Hirsch, 2005) defined as the largest number h such that the given author (or affiliation) has published h papers that have each been cited at least h times. Computing it therefore requires the number of citations of every paper. For this purpose, we extracted all forward citations from the reference lists in the Paper References table. We then calculated the citation count of each paper and estimated the h-index by assigning papers to authors and affiliations. For the h-index, we used the entire MAG regardless of publication year.
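Assuming citation pairs have already been read from the Paper References table (each row links a citing `PaperId` to a cited `PaperReferenceId`), the h-index computation described above reduces to the plain-Python sketch below. The function names are hypothetical; using a set per owner ensures that a paper assigned to the same author or affiliation through several rows is counted once.

```python
from collections import defaultdict


def h_index(citation_counts) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def citation_counts_from_references(references) -> dict:
    """Count incoming citations; `references` yields (citing PaperId, cited PaperId) pairs."""
    counts = defaultdict(int)
    for _citing, cited in references:
        counts[cited] += 1
    return counts


def h_index_by_owner(paper_owners, counts) -> dict:
    """h-index per owner; `paper_owners` yields (AuthorId or AffiliationId, PaperId) pairs."""
    papers_of = defaultdict(set)
    for owner, paper in paper_owners:
        papers_of[owner].add(paper)
    # Uncited papers are absent from `counts` and contribute 0 citations.
    return {owner: h_index(counts.get(p, 0) for p in papers)
            for owner, papers in papers_of.items()}


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([1000]))            # 1: capped by the number of papers
refs = [(1, 10), (2, 10), (3, 10), (1, 11)]  # paper 10 cited 3 times, paper 11 once
counts = citation_counts_from_references(refs)
print(h_index_by_owner([("A", 10), ("A", 11), ("B", 11)], counts))  # {'A': 1, 'B': 1}
```

The second call illustrates the limitation discussed in the following paragraph: a single paper with 1,000 citations still yields an h-index of 1.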

It has been noted that the h-index is a limited tool for evaluating researchers, and librarians have been urged to exercise caution when using it (Barnes, 2017). According to some studies, the construction of the h-index is essentially arbitrary (Bornmann, 2014; Gingras, 2014). The main criticism is that the h-index cannot exceed the total number of papers an author has produced. Therefore, the h-index is likely to be low for early-career scholars even when their papers are highly cited (Kelly & Jennions, 2006). This limitation also reduces its ability to distinguish innovative scientific papers from more conventional research (Gaster & Gaster, 2012), and it undervalues the accomplishments of moderately productive authors (Costas & Bordons, 2007). It is difficult to compare researchers from different domains directly using the h-index, mostly because citation practices and traditions are not standardized across scientific disciplines, a constraint that potentially affects all citation-based measures. Diverse normalizations have been proposed, but there is no gold standard; consequently, it is essential to avoid directly comparing authors or institutions from different disciplines with a single value (Alonso et al., 2009).

Despite this criticism, the h-index is still one of the most extensively used citation-based measures in indexing services such as Scopus, Web of Science, and Google Scholar. The h-index can be applied at almost any level of aggregation, from individual scholars to departments, universities, and even countries (Norris & Oppenheim, 2010). A study also suggested that the h-index can be used to measure academic achievement at the university level (Huang, 2012). In some sense, the h-index serves as the de facto standard for measuring scientific status.

Our analytic strategy for using the h-index is to maximize its benefits and minimize its disadvantages. We measured the h-index at both the institutional and individual levels and used both as surrogate indicators of which researchers could conduct stable research during the pandemic, rather than as measures of their excellence. Because the h-index alone cannot capture every aspect of an individual researcher's research environment, we attempted to characterize researchers' situations more precisely by comparing the two results simultaneously. Because the overall level of the h-index can vary by discipline and other external factors, we used only coarse-grained groups of researchers and institutions (four and ten levels) rather than the original h-index values. Researchers and institutions with lower h-index values tend to be less well-known and established. Therefore, we presume that they are in a less secure position against the damage caused by the pandemic.
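The coarse-grained grouping can be sketched with `pandas.qcut`. The paper does not say how ties are handled; ranking the values first, as below, is our assumption, and `coarse_levels` is a hypothetical helper.

```python
import pandas as pd


def coarse_levels(values: pd.Series, n_levels: int = 4) -> pd.Series:
    """Assign levels 1..n_levels by quantile of `values` (1 = lowest)."""
    # Ranking first breaks ties (e.g. the many h-index values of 0) by order of
    # appearance, so qcut always finds n_levels distinct bin edges.
    ranks = values.rank(method="first")
    return pd.qcut(ranks, q=n_levels, labels=list(range(1, n_levels + 1)))


h = pd.Series([0, 0, 0, 0, 1, 2, 3, 5])
print(coarse_levels(h).tolist())  # [1, 1, 2, 2, 3, 3, 4, 4]
```

In the toy example the four zero values are split across levels 1 and 2, which is the price of forcing equal-sized groups. Calling `pd.qcut(values, q, duplicates="drop")` on the raw values instead keeps ties together but may return fewer than the requested number of levels.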

3.4. Gender disambiguation

We used Genderize (https://genderize.io/) to assign authors' gender. Genderize is a paid web application that uses statistics collected from social network user profiles and census data. Specifically, we used a Python package named genderize (https://github.com/SteelPangolin/genderize) to collect gender information from Genderize. We determined authors' gender from their first name, taken to be the first space-separated token of the full name. We classified each author as male or female based on the classification result; when the gender was uncertain, the name was classified as None. When the data contain too little information about a given name, Genderize cannot predict its gender. Because it is based on statistics, name-based identification cannot be perfectly accurate; low accuracy has been reported for Asian countries such as China, Japan, and South Korea (Karimi et al., 2016). Because there are 21,804,885 unique authors in the dataset, identifying their gender manually is practically impossible. Genderize is also known to be more accurate than other identifiers (Karimi et al., 2016), which is why we selected it for our analysis.
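The name-handling rules above can be sketched as follows, assuming records shaped like the JSON that genderize.io returns (`name`, `gender`, `probability`, `count`). The network call itself is omitted; `min_probability` is a hypothetical threshold (the paper does not report one), and deduplicating first names before querying is our suggestion for limiting the number of lookups across millions of authors.

```python
from typing import Iterable, Optional


def first_name(full_name: str) -> Optional[str]:
    """First space-separated token of the full name, or None if the name is empty."""
    tokens = full_name.split()
    return tokens[0] if tokens else None


def unique_first_names(full_names: Iterable[str]) -> list:
    """Distinct first names, so that each name is looked up only once."""
    names = {first_name(n) for n in full_names}
    names.discard(None)
    return sorted(names)


def classify(response: dict, min_probability: float = 0.0) -> Optional[str]:
    """Map a genderize.io-style record to 'male', 'female', or None."""
    gender = response.get("gender")
    if gender not in ("male", "female"):
        return None  # no prediction available for this name
    # Hypothetical confidence cut-off; the default of 0.0 keeps every prediction.
    if response.get("probability", 0.0) < min_probability:
        return None
    return gender


authors = ["Eunrang Kwon", "Jinhyuk Yun", "Jeong-han Kang", "Jinhyuk Lee", ""]
print(unique_first_names(authors))  # ['Eunrang', 'Jeong-han', 'Jinhyuk']
print(classify({"name": "Eva", "gender": "female", "probability": 0.98, "count": 100}))
```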

3.5. Change in difference analysis

We grouped the papers into four levels (Table S1). For each group, we used CID analysis to test whether female authors' productivity increased or decreased relative to that of males in each year. The CIDs were estimated as follows (Eq. (1)). Let A(X) be the total number of papers in year X and B(X) the number of papers by female authors. Then, the share of female authors can be estimated as F(X) = B(X)/A(X), the proportion of papers with a female author. The CID is then defined as follows:

$$\mathrm{CID}(X) = \left[F(X) - F(X-1)\right] - \left[F(X-1) - F(X-2)\right] \tag{1}$$
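As a worked example of Eq. (1), the sketch below computes F(X) from hypothetical counts and then the CID for 2020; the numbers are illustrative, not taken from the data.

```python
def female_share(total_papers: int, female_papers: int) -> float:
    """F(X) = B(X) / A(X): proportion of papers with a female author in year X."""
    return female_papers / total_papers


def cid(share: dict, year: int) -> float:
    """Eq. (1): CID(X) = [F(X) - F(X-1)] - [F(X-1) - F(X-2)]."""
    return (share[year] - share[year - 1]) - (share[year - 1] - share[year - 2])


# Hypothetical counts (A(X), B(X)) for one group of papers.
counts = {2018: (1000, 300), 2019: (1000, 320), 2020: (1000, 330)}
share = {year: female_share(a, b) for year, (a, b) in counts.items()}
print(round(cid(share, 2020), 4))  # -0.01
```

Here the female share still rises in 2020 (0.32 to 0.33), but by less than the previous year's gain (0.30 to 0.32), so the CID is negative: the 2020 change falls short of the recent trend, which is exactly the net effect the CID is meant to isolate.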

3.6. Bayesian Gaussian mixture model and t-SNE

To determine whether the 19 academic disciplines can be grouped into distinct clusters with similar publishing trends, we performed a cluster analysis by building a simple feature vector for each discipline. The vector consists of the discipline's CIDs: for each of the 19 disciplines, 96 CIDs can be computed for 2020 from three author roles (female first author, female last author, and any female author) and the quartile groups of each of the eight author attributes in Figs. 3 and 4 (academic age, author h-index, affiliation h-index, gender equality, total cases per million, total deaths per million, workplace mobility, and residential mobility), that is, 3 × 8 × 4 = 96. Each CID is treated as one component, yielding a 96-dimensional feature vector for each field.
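The 3 × 8 × 4 layout of each discipline's feature vector can be made explicit with a sketch like the following. The role, attribute, and level identifiers are hypothetical labels, and the fixed iteration order of `itertools.product` keeps components aligned across disciplines.

```python
from itertools import product

# Hypothetical identifiers for the 3 author roles, 8 attributes and 4 quartile levels.
ROLES = ("first", "last", "any")
ATTRIBUTES = ("academic_age", "author_h_index", "affiliation_h_index",
              "gender_equality", "cases_per_million", "deaths_per_million",
              "workplace_mobility", "residential_mobility")
LEVELS = (1, 2, 3, 4)


def feature_vector(cids: dict) -> list:
    """Flatten one discipline's 2020 CIDs, keyed by (role, attribute, level), in a fixed order."""
    return [cids[key] for key in product(ROLES, ATTRIBUTES, LEVELS)]


toy = {key: 0.0 for key in product(ROLES, ATTRIBUTES, LEVELS)}
print(len(feature_vector(toy)))  # 96
```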

Fig. 3.

CID for % female first authors by first author's characteristics. We grouped the papers into four levels for each of the first author's six characteristics and compared CIDs for % female first authors across the four groups. We considered three individual- and institutional-level characteristics: A, academic age since the first publication; B, author h-index, reflecting the author's prestige; and C, affiliation h-index, reflecting the status of the affiliation (see Methods); and three country-level statistics: D, the level of gender equality (see Methods); E, total infected cases; and F, mortality, per million for each country (see Fig. S5 for the averaged CID between 2018 and 2019, which compensates for yearly fluctuation). Some disciplines tend to arrange authors alphabetically rather than in order of contribution (Joanis & Patil, 2021), which may alter the results. We therefore also present the same analysis excluding publications whose authors' last names were arranged alphabetically, which yields almost identical findings to those with all papers (Figs. S15–S19).

Fig. 4.

Relationship between COVID-19 severity and mobility, and CID (%) for female first authors by mobility. The mobility score indicates how visits to and lengths of stay at certain locations change over time compared with a pre-pandemic period (see Methods; Google, 2021). Panels A and B show the correlation between the severity of COVID-19 and mobility by country. The x-axis is the logarithm of total cases per million (green) and total deaths per million (yellow). Panels C and D show the change in mobility by the severity groups used in Fig. 3. Panels E and F display CIDs for female first authors by their change in mobility. We grouped authors into four levels to estimate how lockdown affected research activity during the pandemic (see Fig. S7 for the CID between 2018 and 2019, which compensates for yearly fluctuation).

We applied two machine learning techniques to these vectors. First, we used a Bayesian Gaussian mixture model to group the disciplines, as it effectively partitions vectors in high-dimensional space (Roberts et al., 1998). Compared with other models, it exhibits superior performance on high-dimensional data, for both simulated and real datasets (Constantinopoulos, Titsias & Likas, 2006). Then, we used t-SNE, a manifold-learning technique, to project the high-dimensional feature vectors into two dimensions while preserving their local neighborhood structure (Van der Maaten & Hinton, 2008).
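
A minimal sketch of this two-step pipeline with scikit-learn, assuming the 19 disciplines' feature vectors are stacked into a 19 × 96 array `X`. The diagonal covariance type, the t-SNE perplexity of 5 (t-SNE requires a perplexity smaller than the number of samples), and the random seeds are our assumptions rather than settings reported in the paper, and the random input is placeholder data. Note that `n_components` in `BayesianGaussianMixture` acts as an upper bound: the variational prior can leave some components empty, so fewer distinct labels may appear than requested.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.mixture import BayesianGaussianMixture


def cluster_fields(X: np.ndarray, n_clusters: int, seed: int = 0) -> np.ndarray:
    """Assign each field (row of X) to a cluster with a Bayesian Gaussian mixture."""
    bgm = BayesianGaussianMixture(
        n_components=n_clusters,  # upper bound on the number of clusters used
        covariance_type="diag",   # assumption: few samples (19) in 96 dimensions
        max_iter=1000,
        random_state=seed,
    )
    return bgm.fit_predict(X)


def embed_fields(X: np.ndarray, seed: int = 0) -> np.ndarray:
    """Project the feature vectors to 2-D with t-SNE for plotting."""
    tsne = TSNE(n_components=2, perplexity=5.0, init="pca", random_state=seed)
    return tsne.fit_transform(X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(19, 96))  # placeholder values, not the paper's CIDs
    coords = embed_fields(X)       # one embedding, reused for every clustering
    for k in range(2, 11):         # 2 to 10 clusters
        labels = cluster_fields(X, k)
        print(k, np.unique(labels).size)
```

Computing the embedding once and coloring it by each clustering keeps the circle positions fixed across panels, so only the cluster assignments change as the number of clusters grows.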

3.7. Two-proportion z-test and combining p-values

The presence of a female author on a paper can be treated as a Bernoulli random variable; thus, the number of papers with a female author follows a binomial distribution, and for a large number of samples the binomial distribution can be approximated by a Gaussian distribution. To obtain p-values for the CIDs, we therefore performed a one-tailed two-proportion z-test of whether the CIDs in 2020 are less than 0, with the null hypothesis (H0) given in Eq. (2):

$$H_0:\ \mathrm{CID}_{2020} = Z_{2020} - Z_{2019} > 0 \tag{2}$$

The test statistic $Z_t$ for this one-tailed z-test can be written as follows, using the summation property of Gaussian variables (Eqs. (3) and (4)):

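$$\hat{p}_0 = \frac{\hat{p}_t\, m + \hat{p}_{t-1}\, n}{m + n}, \qquad Z_t = \frac{\hat{p}_t - \hat{p}_{t-1}}{SE_t} = \frac{\hat{p}_t - \hat{p}_{t-1}}{\sqrt{\hat{p}_0\,(1 - \hat{p}_0)\left(\frac{1}{m} + \frac{1}{n}\right)}} \tag{3}$$
$$\mathrm{CID}_{2020} = \frac{Z_{2020} - Z_{2019}}{\sqrt{SE_{2020}^{2} + SE_{2019}^{2}}} \tag{4}$$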

Here, $\hat{p}_0$ is the overall (pooled) sample proportion, and $m$ and $n$ are the sample sizes at times $t$ and $t-1$. $\hat{p}_t$ ($\hat{p}_{t-1}$) is the sample proportion at $t$ ($t-1$), and $SE_t$ denotes the standard error at time $t$.
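
Eqs. (3) and (4) can be sketched in Python using only the standard library, with the normal CDF computed from `math.erfc`. This is an illustration of the formulas as stated in this section, not the authors' code; the function names are hypothetical, and the lower-tail p-value corresponds to the one-tailed alternative that the CID is below zero.

```python
import math


def two_proportion_z(p_t: float, m: int, p_prev: float, n: int) -> tuple[float, float]:
    """Pooled two-proportion z statistic of Eq. (3); returns (Z_t, SE_t).

    p_t, p_prev: sample proportions at t and t-1; m, n: the matching sample sizes.
    """
    p0 = (p_t * m + p_prev * n) / (m + n)            # pooled proportion p_0
    se = math.sqrt(p0 * (1 - p0) * (1 / m + 1 / n))  # standard error SE_t
    return (p_t - p_prev) / se, se


def cid_statistic(z_2020: float, se_2020: float, z_2019: float, se_2019: float) -> float:
    """Eq. (4) as stated in the text."""
    return (z_2020 - z_2019) / math.sqrt(se_2020 ** 2 + se_2019 ** 2)


def lower_tail_p(z: float) -> float:
    """One-tailed p-value P(Z <= z) under the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Hypothetical shares of female first authors (not the paper's data).
z_t, se_t = two_proportion_z(0.300, 1000, 0.350, 1000)
print(round(z_t, 3), round(lower_tail_p(z_t), 4))
```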

Because there are four groups for each characteristic, we need a single statistic to validate our results. Fisher's method combines multiple p-values into a single p-value that tests whether the groups, taken together, show a statistically significant effect, provided that the groups are statistically independent (Brown, 1975). We computed the chi-square statistic of Fisher's method as follows (Eq. (5)):

$$\chi^2_{2k} \sim -2\sum_{i=1}^{k} \ln(p_i) \tag{5}$$

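where $p_i$ is the p-value for group $i$ and $k$ is the number of groups (here, $k = 4$); under the null hypothesis, this statistic follows a chi-square distribution with $2k$ degrees of freedom.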
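
Because the combined statistic has 2k degrees of freedom, an even number, its chi-square tail probability has a closed form, so Fisher's method can be sketched with the standard library alone. This is a minimal illustration, not the authors' implementation; it assumes every p-value is strictly positive (p-values that underflow to exactly 0 would need to be floored first) and, like Fisher's method itself, that the groups are independent.

```python
import math


def fisher_combined_p(p_values: list[float]) -> tuple[float, float]:
    """Fisher's method: return (statistic, combined p-value).

    statistic = -2 * sum(ln p_i), chi-square with 2k degrees of freedom under H0.
    """
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # Survival function of chi-square with even df = 2k:
    # P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return stat, math.exp(-half) * total


# Four hypothetical group-level p-values.
print(fisher_combined_p([0.01, 0.03, 0.02, 0.04]))
```

With a single p-value the combined p-value equals the input, which is a quick sanity check of the closed form.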

3.8. Estimation of feature significance using XGBoost

XGBoost is a scalable tree boosting system widely used in machine learning tasks (Chen & Guestrin, 2016). We estimated the importance of each feature for predicting the presence of female authors, separately by publication year and author role. Because 2019 was pre-pandemic, we excluded the total infection rate and mortality rate for 2019, whereas we used six features for 2020. To model whether the first author is female, we used the first author's characteristics; likewise, we used the last author's characteristics to model the last author. For the other authors and for all authors, we used the average of the authors' characteristics. We calculated feature importance as the average reduction in training loss from splits that use the feature, using XGBoost's gain option. For a fair comparison, we normalized each f-score by dividing it by the sum of all scores. We used a standard 9:1 train/test split and trained the model with the following parameters: n_estimators = 1000, learning_rate = 0.1, max_depth = 7.
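
The normalization and the 9:1 split can be sketched without XGBoost itself. In the XGBoost Python package, per-feature gain scores can be read from a trained booster with `Booster.get_score(importance_type="gain")`; here a plain dictionary stands in for that output, and the feature names, scores, seed, and shuffling approach are hypothetical.

```python
import random


def normalize_importance(gain: dict[str, float]) -> dict[str, float]:
    """Divide each feature's gain score by the sum of all scores (they then sum to 1)."""
    total = sum(gain.values())
    return {name: score / total for name, score in gain.items()}


def split_9_1(rows: list, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of the rows and hold out the last 10% as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.9)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical gain scores for the four pre-pandemic features.
scores = {"academic_age": 2.0, "author_h_index": 1.0,
          "affiliation_h_index": 1.0, "gender_equality": 4.0}
print(normalize_importance(scores))
# {'academic_age': 0.25, 'author_h_index': 0.125, 'affiliation_h_index': 0.125, 'gender_equality': 0.5}
```

Normalizing by the total makes models with different numbers of features (four in 2019, six in 2020) comparable on the same 0 to 1 scale.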

4. Results

4.1. Changes in productivity and collaboration

Before examining gendered productivity, we present how overall productivity and the degree of collaboration have changed since the COVID-19 outbreak (Fig. 1). We measure the degree of collaboration in each field by the average number of authors per paper, and productivity by the total number of papers in the field. Most fields show an increased number of papers from 2019 to 2020 (net of common trends since 2018), except for three fields (art, history, and philosophy) whose CIDs for the number of papers are not positive in 2020. Notably, in no discipline did the number of co-authors and the number of papers decline simultaneously in 2020 (see Fig. S1A for a comparison of CIDs between the number of papers and the average number of authors). Therefore, our analysis of the entire year (2020) suggests that collaboration and research productivity in most academic fields increased, arguably as a result of the COVID-19 pandemic (see Fig. S1B for further comparisons with previous years).
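
The two field-level measures can be sketched as a simple aggregation. This assumes each paper is represented as a hypothetical (field, number of authors) pair and is counted once in each field it belongs to; how MAG assigns papers to Level 0 fields is not reproduced here.

```python
from collections import defaultdict


def field_stats(papers):
    """Return {field: (number of papers, average number of authors per paper)}.

    `papers` is an iterable of (field, n_authors) pairs (assumed layout).
    """
    counts = defaultdict(int)
    author_totals = defaultdict(int)
    for field, n_authors in papers:
        counts[field] += 1
        author_totals[field] += n_authors
    return {f: (counts[f], author_totals[f] / counts[f]) for f in counts}


print(field_stats([("art", 1), ("art", 3), ("physics", 5)]))
# {'art': (2, 2.0), 'physics': (1, 5.0)}
```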

Fig. 1.


CID in Number of Papers and Average Number of Authors per Paper by Field. For all 19 fields of study, we present CIDs (see Methods) in the number of papers (A) and the average number of authors per paper (B) for 2020 (orange bars) and 2019 (blue bars), in ascending order of the 2020 values. The distributions show no clear difference between the two years, and CIDs for the number of papers even increased. Neither productivity (number of papers) nor collaboration (number of authors) has decreased since the COVID-19 outbreak; productivity in most fields improved during the pandemic.

Given the increased productivity and collaboration during the first year of the pandemic, we sought to determine how productive female academics were, depending on their collaborative role. The contribution of female authors to research is often devalued, which leads to cumulative disadvantages in authorship (Ni et al., 2021). At the same time, conventional gender stereotypes characterize housework and childcare as female duties. We may therefore assume that female researchers faced greater challenges than male researchers during the pandemic (Deryugina et al., 2021; Myers et al., 2020). In many fields, the first author is considered the primary member who conducts the research. First authors are usually early-career scientists mentored by senior researchers, who typically appear as last authors (Sekara et al., 2018). In other words, the first author typically spends more time on the research than other authors; the last author may invest substantial time in finalizing the study, but less than the first author. COVID-19 led to an abrupt change in society. In the economically and socially uncertain environment resulting from the pandemic, first authors in charge of the hands-on analysis may be more vulnerable than other authors because they must invest the most time of all the authors. Combining the gendered effect and the role of authors, we focus on how much more female first authors have been disadvantaged than those in other authorial positions during the pandemic.

4.2. Female productivity by authorship

To estimate gendered productivity, we first need to identify authors' gender. Because the metadata do not include authors' gender, we infer it using a web-based application that incorporates various census data (see Methods). Note also that the overall participation of female authors has increased in recent years (Fig. S3) (Bernstein, 2017). To compensate for this increasing trend, we employed the CID measure, which enables comparison of yearly variations net of the common effect across recent years (see Methods); the results are presented in Fig. 2.
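
The CID idea, a difference between consecutive year-over-year changes (consistent with Eq. (2), where $\mathrm{CID}_{2020} = Z_{2020} - Z_{2019}$), can be illustrated with raw percentages. This is a minimal sketch, assuming $\mathrm{CID}_t = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})$, where $x_t$ is the percentage of female authors in year $t$; the exact definition is given in the Methods and is not reproduced in this section, and the percentages below are hypothetical.

```python
def pct_female(n_female: int, n_total: int) -> float:
    """Percentage of papers (in a given author role) with a female author."""
    return 100.0 * n_female / n_total


def cid(pct_by_year: dict[int, float], year: int) -> float:
    """Change in difference: this year's change minus the previous year's change."""
    change_now = pct_by_year[year] - pct_by_year[year - 1]
    change_before = pct_by_year[year - 1] - pct_by_year[year - 2]
    return change_now - change_before


# Hypothetical percentages of female first authors.
pct = {2017: 29.6, 2018: 30.0, 2019: 30.5, 2020: 30.6}
print(round(cid(pct, 2019), 2), round(cid(pct, 2020), 2))  # 0.1 -0.4
```

In this toy series the share of female first authors still rises in 2020, yet the CID is negative because the rise is smaller than the previous year's; this is how the measure separates a slowdown from the underlying upward trend.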

Fig. 2.


CID (% female) by author's role and field of study. The CID for a year estimates how much larger or smaller the change from the previous year is compared with the previous year's change. Panel A: We measured the CID for the percentage of female authors broken down by author role in a paper: first, last, and the rest (others). The % change of female first authors drops most sharply in 2020, by 0.33% (orange bars), reversing the increasing trend up to 2019 (blue bars). Although less dramatic, we also observe decreases in the % change of female last authors, which are not observed for other author roles. Despite the significant decrease of female authors in first and last authorship, the CID for any authorship decreased by only 0.02%, which suggests that the majority of female authors play roles other than first or last author. Panels B and C: Bar graphs of CIDs for (B) first and (C) last authors by the 19 fields of study, post-pandemic in 2020 (orange bars) and pre-pandemic in 2019 (blue bars). Except for four fields, CIDs for 2020 tend to decrease relative to CIDs for 2019; in other words, the percentage of female first authors began to decrease in 15 of 19 academic categories in 2020. Although less apparent than for first authors, more than half of the CIDs for last authors in 2020 are negative, indicating a decline in the percentage of female last authors in most fields. In short, the pattern in panel A is not attributable to a few fields but is common across various fields (see Fig. S2 for a comparison with CIDs between 2018 and 2019).

First, we observe that, when the role of authors is ignored, the change in research productivity of female authors after the onset of the pandemic is insignificant (Fig. 2A): compared with the pre-pandemic year 2019, the CID (%) decreased from 0.34 to −0.02. This switch to a decreasing trend also holds for the non-leading authors, who are neither first nor last authors.

Second, when we account for roles, we observe a sharply decreasing tendency of females as first authors during the pandemic (Fig. 2A). The difference is clear when compared with the pre-pandemic year, which showed an increase in CID (%) of 0.28. This tendency remains even when we break the productivity of female first authors down into monthly CIDs: in 2020, female first authors' productivity is lower than in the same month of 2019 for most months (Fig. S4). The decrease in female productivity also holds for last authors, although it is less salient: the CID for last authors was positive (0.07) before the pandemic and became negative (−0.09) in 2020 (Fig. 2A).

The trend may also vary by FOS. Indeed, we find variations in CIDs across the 19 Level 0 FOSs defined by MAG. For instance, the 2020 CIDs for first authors are positive in physics, political science, biology, and geology (Fig. 2B). However, the CIDs are negative for the majority of FOSs (15 of 19), consistent with the aggregate pattern in Fig. 2A. The negative CIDs are less apparent for last authors but hold for over half of the FOSs (12 of 19) (Fig. 2C). Note that about half of the FOSs show decreased female participation, with negative CIDs, for both first and last authors. In summary, the share of leading female authors decreases in many fields for both first and last authorship, which is particularly serious for first authors, who execute experiments and write papers themselves. In other words, female authors have tended to take less prominent roles during the pandemic.

Overall, Fig. 2 shows that although the overall research productivity of female scholars does not seem to have decreased significantly, their roles changed during 2020: female authors were less likely to take leading positions, such as first and last author. Considering reports of inequality in housework and childcare responsibilities (Facebook, 2021), we may infer that the pandemic limited the time and energy female researchers could devote to research; thus, they may have opted for less active roles.

4.3. Female productivity by individual, organizational, and national characteristics

The transition of female authors from active to less active roles raises the question of how much the impact depends on female authors' life stage and academic status. For instance, the presence and age of children can be a critical factor in the productivity of researchers who are parents. In the United States, highly educated women tend not to become mothers until their 30s (Pew Research Center, 2015). Thus, mid-aged researchers are more likely to have minor children who need care, so these women are likely to spend more time on childcare. The data do not record authors' ages, so to test this hypothesis we instead used academic age, the number of years elapsed since an author's first paper. We divided papers into four groups by 10-year intervals of the first author's academic age (see Methods). Calculating the percentage of female first authors and their CIDs, we find clearly decreasing percentage changes for female first authors of all academic ages, with the largest decreases for mid-career researchers (Groups 2 and 3, with CIDs of −0.39 and −0.55, respectively; Fig. 3A). On average, researchers publish their first paper in their mid-20s (Jensen et al., 2009); thus, a researcher's age can be roughly estimated as their academic age plus 25 years. Groups 2 and 3 therefore correspond to researchers from their mid-30s to their mid-50s. Because highly educated women usually become mothers in their 30s, these groups are more likely than other groups to have children to care for. In addition, those in group 3 (the most significantly affected) may have teenage children. In summary, this finding supports our hypothesis that childcare affects the productivity of female researchers.
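
The grouping and the age approximation above can be sketched as follows. The 10-year bins and the 25-year offset follow the text; placing everyone with 30 or more years of academic age in group 4 is our assumption, and the function names are hypothetical.

```python
def academic_age_group(academic_age: int) -> int:
    """Map years since first publication to groups 1-4 in 10-year bins.

    Assumption: group 4 collects everyone with 30 or more years.
    """
    if academic_age < 0:
        raise ValueError("academic age cannot be negative")
    return min(academic_age // 10 + 1, 4)


def approx_age(academic_age: int, first_paper_age: int = 25) -> int:
    """Rough biological age, assuming a first paper in the mid-20s."""
    return academic_age + first_paper_age


print([academic_age_group(a) for a in (5, 15, 25, 35)])  # [1, 2, 3, 4]
print(approx_age(10), approx_age(29))  # 35 54
```

The second line reproduces the arithmetic in the text: groups 2 and 3 (academic ages 10 to 29) map to roughly ages 35 to 54.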

Besides academic age, the status of authors and their affiliations can be a critical determinant of productivity during the pandemic (Long, 1978). We used the h-index, which is widely accepted as an author- and affiliation-level performance metric, to gauge prestige. The h-index is high for prominent authors (and affiliations) with well-established academic positions (see Methods). We found that publications whose first authors were least established (group 1) had the largest fall in the percentage of female first authors, implying that the least established women were more severely affected than women in other groups, for both authors and their affiliations (Figs. 3B and C). This effect diminishes as authors become more established, except for the most prestigious authors and affiliations in group 4. In sum, the negative CIDs for the proportion of female first authors show an inverted U shape across prestige levels (Fig. S6), at both the individual and organizational levels, reflecting a combination of low-prestige vulnerability and high-prestige competition. Previous studies suggest that female faculty experience stresses (such as lack of support and poor economic conditions) and are underrepresented in top positions in academia (De Paola et al., 2018). The glass ceiling is a barrier to promotion and tenure for female researchers, and the assignment of tasks based on gender stereotypes reduces the time female academics have available for research (Williams, 2005).
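
For reference, the h-index of a citation list is the largest h such that h items have at least h citations each. The sketch below computes it for one author's papers; applying the same function to all papers of an affiliation is our assumption about the affiliation-level version, which is defined in the Methods rather than here.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h items have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # later items have even fewer citations
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```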

For a more detailed understanding of the decline in female first authorship, we further examined the relationship between living conditions in authors' countries and the productivity of female authors. Because we hypothesize that an increased burden of housework and childcare leads female authors to spend less time on research, overall gender inequality in a country may influence the degree of disadvantage experienced by female researchers (see Methods). Indeed, we observed a negative correlation between gender equality and the decline in female first authorship (Fig. 3D). The decline in CID in 2020 is greatest for group 1, the most unequal countries (−0.56), and smallest for group 4, the most equal countries (−0.16). Nevertheless, female first-author productivity decreased during the pandemic regardless of the level of gender equality.
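
Splitting a country-level indicator such as gender equality into four groups can be sketched with NumPy quantiles. This assumes the four groups are quartiles of the supplied values (the feature-vector description refers to quartile groups); whether the cut points are computed over countries or over papers is not specified here, so the input is hypothetical. Values equal to a cut point fall into the lower group.

```python
import numpy as np


def quartile_groups(values) -> np.ndarray:
    """Assign each value to quartile group 1 (lowest) through 4 (highest)."""
    values = np.asarray(values, dtype=float)
    cuts = np.quantile(values, [0.25, 0.5, 0.75])  # three interior cut points
    # right=True: a value equal to a cut point goes to the lower group
    return np.digitize(values, cuts, right=True) + 1


print(quartile_groups([1, 2, 3, 4, 5, 6, 7, 8]))  # [1 1 2 2 3 3 4 4]
```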

The severity of COVID-19 in the country of domicile may also be expected to influence female productivity, because epidemiological policies tend to be applied at the national level and female researchers in a country are subject to the same policies. However, we observed little consistent pattern with total infected cases per million (Fig. 3E): the CID in 2020 is lowest for group 2, the countries with moderate infection rates. The mortality rate shows more consistent results (Fig. 3F): the declines in CID in 2020 are large for groups 2–4, the countries with moderate to high mortality rates. In other words, the countries that kept mortality low also show the smallest decrease in female first authorship (Group 1 in Fig. 3F). COVID-19 mortality rates, rather than infection rates, therefore seem to be a better indicator of the living conditions that may influence female disadvantage during the pandemic.

In summary, we found that female first authors' research productivity decreased during the pandemic in the following cases: i) if they are mid-career and likely to have minor children, ii) if they are less academically established, iii) if they live in gender-unequal countries, and iv) if they do not live in a country with a very low COVID-19 mortality rate. Note that the infection rate itself does not show a significant correlation with the change in female first authors. One possible explanation is that successfully suppressed infections can be both a favorable condition for female productivity and an outcome of intensive social distancing, which increases women's household burdens; these opposing effects may offset each other.

4.4. The impact of mobility on research productivity

Many countries have tried to prevent the spread of COVID-19 by limiting in-person contact. Physical and social distancing, including strict lockdowns, has been practiced in many countries. These practices essentially limit personal mobility; thus, mobility can also be considered a COVID-19-related variable, similar to total cases/deaths per million. Lockdowns and distancing reduce the impact of COVID-19 (Kharroubi & Saleh, 2020), although changes in mobility vary from country to country because each country has applied different distancing policies. Indeed, total cases/deaths per million are correlated with the change in mobility, although the relationship is not deterministic (Figs. 4A and B). A social distancing policy typically reduces mobility to workplaces and increases mobility in residential areas, both of which tend to increase women's burden at home because schools and childcare facilities were closed during the pandemic (Facebook, 2021). Their home-based work overlaps with their children's home-schooling, so they cannot dedicate themselves fully to research. Therefore, a change in mobility may lead to a decrease in research productivity.

This hypothesis suggests that COVID-19 severity may have affected female research productivity indirectly, through national-level distancing or lockdown policies. We can test this mediated effect by examining the change in female productivity according to changes in national mobility. As mentioned in the previous sections, overall research productivity increased, although the proportion of female leading authors did not keep pace with the increased opportunity in 2020. We hypothesize that the correlation between COVID-19 severity and the research productivity of female first authors is largely determined by the correlation between severity and change in mobility. In other words, we expect the research productivity of female first authors to decrease when they travel less to their workplace and spend more time at home.

We observe that workplace mobility decreases as total cases/deaths per million increase (Figs. 4A and C), whereas residential mobility increases (Figs. 4B and D). It appears that, in countries that implemented strict policies as COVID-19 became more severe, workplaces closed and time spent at home increased. As we hypothesized, research productivity decreased less when more time was spent at work (Fig. 4E): the CID in 2020 decreased the most for group 1 (−0.47) and the least for group 4, who spend the most time in the workplace (−0.09). By contrast, research productivity decreases as residential mobility increases, except for group 2 (Fig. 4F): the CID in 2020 for group 1 (0.03) is higher than that for group 4 (−0.76). For groups 1, 3, and 4, the CID in 2020 decreases as residential mobility increases.

4.5. Differences in CIDs of 2020 between fields

The research productivity of female researchers may vary by academic field. When women are expected to play traditional gender roles at home and work in male-dominated organizations such as STEM, female researchers may become less productive than their male counterparts during the pandemic (Hatchell & Aveling, 2008; Andersen et al., 2020; Krukowski et al., 2021). Therefore, it is necessary to investigate the influence of discipline on the research output of female authors to probe the issue in depth. We conduct a clustering analysis of fields to examine whether CIDs in 2020, that is, gendered research productivity, differ by field, building on the way we examined CIDs by eight characteristics of the first author, each divided into four levels, in Figs. 3 and 4. We cluster the 19 disciplines using, for each discipline, a simple feature vector of the 96 CIDs in 2020 obtained from the combinations of three author roles and the quartile groups of eight author attributes (Fig. 5; see Methods for details).

Fig. 5.


Clustering fields by CIDs in female author's role and characteristics with a Bayesian Gaussian mixture model. We group the feature vectors of fields into two to ten clusters using the Bayesian Gaussian mixture model and visualize them using t-SNE (see Methods). The size of each circle represents the number of papers in each field, and the colors show the assigned cluster.

Since the number of clusters may affect the overall landscape, we conduct the clustering analysis with 2 to 10 clusters. Up to five clusters, the largest group does not split; instead, engineering, art, philosophy, and history are separated one by one from the largest cluster. Apart from engineering, these fields are classified as humanities (Fig. 2). Notably, these fields are mainly in the bottom 25% in terms of the number of papers and the ratio of female authors, and their CIDs of female first authors in 2020 are also in the lowest ranges (Fig. S8 and Table S2). Thus, the results for up to five clusters suggest that the gendered impact for these disciplines, in which female researchers are more marginalized, diverges from that of the largest cluster. When the number of clusters increases to six or seven, the social science disciplines separate from the largest cluster, which consists of natural science, medicine, and applied science (except engineering). This suggests that the gendered impact of the pandemic differs between the hard and soft sciences. Given that the CIDs of the social sciences and humanities are typically more negative than those of the hard sciences, we may conclude that female authors working in the soft sciences were more vulnerable to the pandemic (Fig. 2). The single exception was engineering. Considering that engineering is a separate category from computer and materials science and has the smallest number of papers of all disciplines (Table S2), we may assume that MAG avoids classifying papers as engineering when a more appropriate field of study exists, and this may affect the clustering result. In summary, our clustering results largely suggest that differences between fields are relatively minor, as the largest cluster remains intact with up to five clusters.

4.6. Relative significance of factors for change in productivity

So far, we have explored factors related to the decline in female first authorship. Finding multiple factors prompts two key questions: is the decline genuinely related to the pandemic, and is any single factor more important than the others? The first question is answered by the high statistical significance of our observations: the p-values are almost zero for all negative CIDs (Table S3 and Methods). However, a p-value does not indicate the degree of influence of each factor. To address the second question, we performed a binary classification of the presence of a female author for each author role by constructing an ensemble of tree models using XGBoost (see Methods). We compared feature importance between 2019 and 2020 by the role of female authors (Fig. 6). We used four COVID-unrelated features for 2019 and added two COVID-related features for 2020: workplace and residential mobility, which were shown to have a significant impact on the research productivity of female authors (Figs. 4E and F).

Fig. 6.


Feature importance by female author's characteristics measured with XGBoost. We present the f-score, denoting the relative feature importance for the presence of female authors by role in A, 2019 and B, 2020. Feature importance is calculated as the average reduction in training loss from splits that use the feature, using the gain option of XGBoost (see Methods). We normalize each f-score by dividing it by the total sum for a fair comparison. All six models achieved acceptable accuracy: 0.694 (any female author), 0.655 (female first author), and 0.739 (female last author) for 2019, and 0.704, 0.648, and 0.733, respectively, for 2020. Note that only four features are used for 2019 because it was before the onset of the pandemic, whereas we additionally used mobility features for 2020. To check the robustness of our model, we also tested models that include all features and models with the four author characteristics plus residential mobility, with similar results (Figs. S9A and S9B). Compared with the other features, gender equality and COVID-related mobility at the national level are more important for predicting whether the leading authors (that is, first and last authors) are female.

Both before and during the pandemic, having a female author in any role on a paper is mainly influenced by the average academic age of the authors. Over time, more female researchers have entered academia (Bernstein, 2017); thus, female authors are more likely to be included when team members are young. For leading authors, however, the most important feature in 2019 was gender equality (Fig. 6A), which became the second most important feature in 2020, when residential mobility became the most important (Fig. 6B). Therefore, our classification results support the importance of COVID-related constraints at the national level that prevent female researchers from leading research during the pandemic. These constraints thus appear to be more important than a researcher's career stage and status at the individual and organizational levels.

5. Discussion

Gender equality has gradually improved, but a divide persists in which gender roles assign housework and childcare predominantly to women and disrupt the research productivity of female researchers (Deryugina et al., 2021; Fuwa, 2004; Greenstein, 1996). Although previous studies have reported the negative effect of COVID-19 on women's research output, they were typically based on limited observations of particular journals, disciplines, and countries. Given the pandemic's global impact, a study encompassing the entire academic community and various gender-role cultures was needed. In this study, we examined the change in female researchers' productivity using a massive collection of bibliographic metadata that spans all academic fields (Microsoft Academic Graph), analyzing 15,280,382 scholarly publications and their 11,828,866 authors over the last five years.

We did not observe a notable decline in the overall proportion of female authors, but we found a sharp decline in leading female authors, both first and last authors, on a global scale during the pandemic. The percentage of leading female authors fell in 2020, primarily among first authors and secondarily among last authors. In other words, the effect of the pandemic may have been subtle: female researchers were not necessarily excluded from publications but were marginalized in the publication process. The global decline of leading female researchers calls for attention from the academic community, particularly given that in the pre-pandemic year the CIDs of female leading authors had been positive in many fields. In short, the outbreak of COVID-19 at best nullified and at worst reversed the recent growth in the overall influence of female authors (Bernstein, 2017; Elsevier, 2020).

Our analysis also revealed the conditions under which female researchers became more vulnerable to marginalization, in terms of their career stage, the status of their affiliations, and the levels of gender equality and COVID-19 severity in their countries. Researchers in weaker and minority positions in academia were more severely affected than established researchers.

First, we observed that mid-career female scholars experienced the greatest decrease in productivity as first authors, arguably because they are likely to have minor children. We also found that the decline in productivity is steeper when authors are in a relatively unstable academic position (measured by their own h-indices and those of their institutional affiliations).

At the same time, we found strikingly sharp effects of gender inequality and pandemic-related risks at the country level. First, the greater the gender inequality in professional occupations, the greater the decline in female first authorship during the pandemic. Second, female first authorship was discouraged by higher COVID-19 mortality and by mobility restricted by social distancing. Taken together, these findings suggest that pandemic conditions unfavorable to female researchers may be amplified by their disadvantages in the national job market and by quarantine policies in their country of domicile.

Our machine-learning-based analysis also suggests that these country-level factors play a more important role than individual and organizational factors, such as academic age and h-indices, in determining whether female researchers hold leading authorships. This finding has an interesting policy implication for the global academic community. According to our findings, policies to prevent female researchers from being marginalized during the pandemic may target those in mid-career who do not yet have many citations and who work at less influential universities and research organizations. At the same time, our findings suggest that global efforts to improve conditions in countries with severe gender inequality in the professional labor market, and in those subject to intense social distancing due to high epidemic mortality, could be more effective than such individually targeted support. Targeting these countries does not directly support female researchers in need but may eventually prove beneficial.

The implications of our study are not limited to the state of academia during the pandemic but extend to the direction of academia as it recovers in the post-pandemic period. Our findings also help identify various structural constraints that can intensify female researchers' disadvantages associated with gender-unequal burdens of housework and caregiving. For instance, our study shows that the negative effect of the pandemic on female research leadership is not limited to a few specific journals or disciplines; furthermore, the decrease in female first authorship was present in most countries, except those that were able to control epidemic mortality. Note that our analysis is based on a collection of socio-academic data that enables a pan-academic-scale analysis, rather than on a survey. Such data cannot fully capture each individual's environment, such as the division of housework in a household and the status of children. Despite this incomplete information, we observed significant collective tendencies supporting our conclusions; that is, female researchers in weaker and minority positions were more adversely affected during the pandemic.

Future research could address several limitations of this study. First, we evaluated articles published between 2016 and 2020 because MAG's coverage of 2021 is insufficient following its discontinuation. However, social activity has gradually recovered since 2021, so further studies are required to verify whether research productivity has also rebounded. Second, we employed CID, which is based on difference-in-differences (DID), to measure research productivity. DID was originally proposed to compare the influence of a policy on two similar groups; ideally, therefore, the two groups being compared would have identical members (Dimick & Ryan, 2014). In practice, however, finding such groups may be almost impossible, given the massive demographic shifts caused by the epidemic. We attempted to control for differences between the two groups of researchers by comparing the change from 2019 to 2020 with the change from 2018 to 2019 while accounting for field-level differences. Nevertheless, the study remains susceptible to bias because the set of authors analyzed cannot be identical across years.
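The year-over-year comparison described above can be illustrated with a generic difference-in-differences calculation. The outcome here is the share of female first authors. The sketch subtracts the pre-pandemic change (2018 to 2019) from the pandemic-year change (2019 to 2020). This is a minimal sketch under stated assumptions, not the CID definition used in this study. The names `YearCounts`, `did_female_share`, and `field_weighted_did` are hypothetical. Weighting fields by their 2020 author counts is also an assumption made only for illustration, and the counts are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class YearCounts:
    """Hypothetical counts for one year: female first authors and all gendered first authors."""

    female: int
    total: int

    @property
    def female_share(self) -> float:
        if self.total <= 0:
            raise ValueError("total must be positive")
        return self.female / self.total


def did_female_share(y2018: YearCounts, y2019: YearCounts, y2020: YearCounts) -> float:
    """Difference-in-differences of the female share.

    The pandemic-year change (2019 -> 2020) is compared against the
    pre-pandemic change (2018 -> 2019), which serves as the baseline trend.
    A negative value means the female share fell relative to that trend.
    """
    pandemic_change = y2020.female_share - y2019.female_share
    baseline_change = y2019.female_share - y2018.female_share
    return pandemic_change - baseline_change


def field_weighted_did(fields: dict[str, tuple[YearCounts, YearCounts, YearCounts]]) -> float:
    """Compute the DID within each field, then take a weighted average across fields.

    Weighting each field by its 2020 total is an assumption of this sketch.
    """
    weighted_sum = 0.0
    total_weight = 0
    for y2018, y2019, y2020 in fields.values():
        weighted_sum += y2020.total * did_female_share(y2018, y2019, y2020)
        total_weight += y2020.total
    if total_weight == 0:
        raise ValueError("no authors in 2020")
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up counts for two hypothetical fields (not data from this study).
    example = {
        "field_a": (YearCounts(300, 1000), YearCounts(310, 1000), YearCounts(305, 1000)),
        "field_b": (YearCounts(100, 500), YearCounts(100, 500), YearCounts(110, 500)),
    }
    for name, years in example.items():
        print(name, round(did_female_share(*years), 4))
    print("weighted", round(field_weighted_did(example), 4))
```

In `field_a`, the female share rose by 0.01 from 2018 to 2019 and then fell by 0.005 in 2020. The DID is therefore -0.015: the decline relative to the earlier trend is larger than the raw year-over-year drop. Computing the difference within each field before aggregating is one simple way to keep field composition from driving the result.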

Despite the increasing research participation of female authors over time, they remain disadvantaged in taking leading roles, as we demonstrated with first authorship during the pandemic. This is consistent with previous findings that female authors are more likely to have their contributions undervalued in authorship (Ni et al., 2021; Pinho-Gomes et al., 2020; West et al., 2013). We also demonstrated that the authorship of female researchers is affected by broader societal conditions, such as gender equality in the labor market and work-from-home arrangements. Diversity is key to academic progress (Reagans & Zuckerman, 2001). Our study calls on academia to pay attention to biased social structures that can unintentionally hinder diversity. It also urges concrete action to protect vulnerable researchers against such bias, specifically female researchers who are losing opportunities for academic leadership.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2021S1A3A2A02090597) (E.K., J.K.). This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the Innovative Human Resource Development for Local Intellectualization support program (IITP-2022-RS-2022-00156360) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) (J.Y.). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Footnotes

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.joi.2023.101380.

Appendix. Supplementary materials

mmc1.pdf (2MB, pdf)
mmc2.docx (35.4KB, docx)

References

  1. Abadie A. Semiparametric difference-in-differences estimators. The Review of Economic Studies. 2005;72(1):1–19.
  2. Acker J. Hierarchies, jobs, bodies: A theory of gendered organizations. Gender & Society. 1990;4(2):139–158.
  3. Alonso S., Cabrerizo F.J., Herrera-Viedma E., Herrera F. h-Index: A review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics. 2009;3(4):273–289.
  4. Amano-Patiño N., Faraglia E., Giannitsarou C., Hasna Z. The unequal effects of Covid-19 on economists' research productivity. 2020. https://www.repository.cam.ac.uk/handle/1810/310888
  5. Andersen J.P., Nielsen M.W., Simone N.L., Lewiss R.E., Jagsi R. COVID-19 medical papers have fewer women first authors than expected. eLife. 2020;9. doi: 10.7554/eLife.58807.
  6. Barnes C. The h-index debate: An introduction for librarians. The Journal of Academic Librarianship. 2017;43(6):487–494.
  7. Bell M.L., Fong K.C. Gender differences in first and corresponding authorship in public health research submissions during the COVID-19 pandemic. American Journal of Public Health. 2021;111(1):159–163. doi: 10.2105/AJPH.2020.305975.
  8. Bernstein R. More female researchers globally, but challenges remain. Science. 2017. doi: 10.1126/science.caredit.a1700022.
  9. Bonawitz M., Andel N. The glass ceiling is made of concrete: The barriers to promotion and tenure of women in American academia. Forum on Public Policy Online. 2009. https://eric.ed.gov/?id=EJ870462
  10. Bornmann L. h-index research in scientometrics: A summary. Journal of Informetrics. 2014;8(3):749–750.
  11. Brown M.B. 400: A method for combining non-independent, one-sided tests of significance. Biometrics. 1975;31(4):987–992.
  12. Card D., Krueger A.B. Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania. National Bureau of Economic Research Working Paper No. 4509. 1993. doi: 10.3386/w4509.
  13. Chen T., Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016:785–794.
  14. Collins C., Landivar L.C., Ruppanner L., Scarborough W.J. COVID-19 and the gender gap in work hours. Gender, Work & Organization. 2021;28(S1):101–112. doi: 10.1111/gwao.12506.
  15. Constantinopoulos C., Titsias M.K., Likas A. Bayesian feature and model selection for Gaussian mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006;28(6):1013–1018. doi: 10.1109/TPAMI.2006.111.
  16. Costas R., Bordons M. The h-index: Advantages, limitations and its relation with other bibliometric indicators at the micro level. Journal of Informetrics. 2007;1(3):193–203.
  17. Cui R., Ding H., Zhu F. Gender inequality in research productivity during the COVID-19 pandemic. Harvard Business School; 2020.
  18. De Paola M., Ponzo M., Scoppa V. Are men given priority for top jobs? Investigating the glass ceiling in Italian academia. Journal of Human Capital. 2018;12(3):475–503.
  19. Deryugina T., Shurchkov O., Stearns J. COVID-19 disruptions disproportionately affect female academics. AEA Papers and Proceedings. 2021;111:164–168.
  20. Dimick J.B., Ryan A.M. Methods for evaluating changes in health care policy: The difference-in-differences approach. JAMA. 2014;312(22):2401–2402. doi: 10.1001/jama.2014.16153.
  21. Dong E., Du H., Gardner L. An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases. 2020;20(5):533–534. doi: 10.1016/S1473-3099(20)30120-1.
  22. Egghe L. Dynamic h-index: The Hirsch index in function of time. Journal of the American Society for Information Science and Technology. 2007;58(3):452–454. doi: 10.1002/asi.20473.
  23. Else H. How a torrent of COVID science changed research publishing - in seven charts. Nature. 2020;588(7839):553. doi: 10.1038/d41586-020-03564-y.
  24. Elsevier. Elsevier's reports on gender in research. Elsevier; 2020 (March 4). http://www.elsevier.com/gender-report
  25. Facebook. Survey on gender equality at home report. 2021. https://dataforgood.facebook.com/dfg/docs/survey-on-gender-equality-at-home (accessed May 4, 2022).
  26. Fuwa M. Macro-level gender inequality and the division of household labor in 22 countries. American Sociological Review. 2004;69(6):751–767.
  27. Gao J., Yin Y., Myers K.R., Lakhani K.R., Wang D. Potentially long-lasting effects of the pandemic on scientists. Nature Communications. 2021;12(1):1–6. doi: 10.1038/s41467-021-26428-z.
  28. Gaster N., Gaster M. A critical assessment of the h-index. BioEssays. 2012;34(10):830–832. doi: 10.1002/bies.201200036.
  29. Gingras Y. Criteria for evaluating indicators. In: Beyond bibliometrics: Harnessing multidimensional indicators of scholarly impact. 2014:109–125.
  30. Giurge L.M., Whillans A.V., Yemiscigil A. A multicountry perspective on gender differences in time use during COVID-19. Proceedings of the National Academy of Sciences. 2021;118(12). doi: 10.1073/pnas.2018494118.
  31. Google. COVID-19 community mobility reports. 2021. https://www.google.com/covid19/mobility (accessed May 4, 2022).
  32. Grant J.B., Olden J.D., Lawler J.J., Nelson C.R., Silliman B.R. Academic institutions in the United States and Canada ranked according to research productivity in the field of conservation biology. Conservation Biology. 2007;21(5):1139–1144. doi: 10.1111/j.1523-1739.2007.00762.x.
  33. Greenstein T.N. Husbands' participation in domestic labor: Interactive effects of wives' and husbands' gender ideologies. Journal of Marriage and the Family. 1996;58(3):585–595.
  34. Hatchell H., Aveling N. Those same old prejudices? Gendered experiences in the science workplace. Journal of Workplace Rights. 2008;13(4):355–375.
  35. Hirsch J.E. An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences. 2005;102(46):16569–16572. doi: 10.1073/pnas.0507655102.
  36. Hochschild A.R. Emotion work, feeling rules, and social structure. American Journal of Sociology. 1979;85(3):551–575. doi: 10.1086/227049.
  37. Huang M.H. Exploring the h-index at the institutional level: A practical application in world university rankings. Online Information Review. 2012;36(4):534–547.
  38. Hunter L.A., Leahey E. Parenting and research productivity: New evidence and methods. Social Studies of Science. 2010;40(3):433–451.
  39. Jemielniak D., Sławska A., Wilamowski M. COVID-19 effect on the gender gap in academic publishing. Journal of Information Science. 2021.
  40. Jensen P., Rouquier J.-B., Croissant Y. Testing bibliometric indicators by their prediction of scientists promotions. Scientometrics. 2009;78(3):467–479.
  41. Joanis S.T., Patil V.H. Alphabetical ordering of author surnames in academic publishing: A detriment to teamwork. PloS One. 2021;16(5). doi: 10.1371/journal.pone.0251176.
  42. Karimi F., Wagner C., Lemmerich F., Jadidi M., Strohmaier M. Inferring gender from names on the web: A comparative evaluation of gender detection methods. Proceedings of the 25th International Conference Companion on World Wide Web. 2016:53–54.
  43. Kashyap R., Verkroost F.C.J. Analysing global professional gender gaps using LinkedIn advertising data. EPJ Data Science. 2021;10(1):39.
  44. Kelly C.D., Jennions M.D. The h index and career assessment by numbers. Trends in Ecology & Evolution. 2006;21(4):167–170. doi: 10.1016/j.tree.2006.01.005.
  45. Kharroubi S., Saleh F. Are lockdown measures effective against COVID-19? Frontiers in Public Health. 2020;8. doi: 10.3389/fpubh.2020.549692.
  46. Kibbe M.R. Consequences of the COVID-19 pandemic on manuscript submissions by women. JAMA Surgery. 2020;155(9):803–804. doi: 10.1001/jamasurg.2020.3917.
  47. Krukowski R.A., Jagsi R., Cardel M.I. Academic productivity differences by gender and child age in science, technology, engineering, mathematics, and medicine faculty during the COVID-19 pandemic. Journal of Women's Health. 2021;30(3):341–347. doi: 10.1089/jwh.2020.8710.
  48. Lerchenmüller C., Schmallenbach L., Jena A.B., Lerchenmueller M.J. Longitudinal analyses of gender differences in first authorship publications related to COVID-19. BMJ Open. 2021;11(4). doi: 10.1136/bmjopen-2020-045176.
  49. Liu M., Zhang N., Hu X., Jaiswal A., Xu J., Chen H., et al. Further divided gender gaps in research productivity and collaboration during the COVID-19 pandemic: Evidence from coronavirus-related literature. Journal of Informetrics. 2022;16(2). doi: 10.1016/j.joi.2022.101295.
  50. Long A., Ascent D. World economic outlook. International Monetary Fund. 2020. https://www.imf.org/en/Publications/WEO/Issues/2021/03/23/world-economic-outlook-april-2021
  51. Long J.S. Productivity and academic position in the scientific career. American Sociological Review. 1978;43(6):889–908.
  52. Minello A. The pandemic and the female academic. Nature. 2020. doi: 10.1038/d41586-020-01135-9.
  53. Myers K.R., Tham W.Y., Yin Y., Cohodes N., Thursby J.G., Thursby M.C., et al. Unequal effects of the COVID-19 pandemic on scientists. Nature Human Behaviour. 2020;4(9):880–883. doi: 10.1038/s41562-020-0921-y.
  54. Ni C., Smith E., Yuan H., Larivière V., Sugimoto C.R. The gendered nature of authorship. Science Advances. 2021;7(36):eabe4639. doi: 10.1126/sciadv.abe4639.
  55. Norris M., Oppenheim C. The h-index: A broad review of a new bibliometric indicator. Journal of Documentation. 2010;66(5):681–705.
  56. Pew Research Center. For most highly educated women, motherhood doesn't start until the 30s. 2015. http://pewrsr.ch/1DIpIhX (accessed May 4, 2022).
  57. Pinho-Gomes A.-C., Peters S., Thompson K., Hockham C., Ripullone K., Woodward M., et al. Where are the women? Gender inequalities in COVID-19 research authorship. BMJ Global Health. 2020;5(7). doi: 10.1136/bmjgh-2020-002922.
  58. Reagans R., Zuckerman E.W. Networks, diversity, and productivity: The social capital of corporate R&D teams. Organization Science. 2001;12(4):502–517.
  59. Roberts S.J., Husmeier D., Rezek I., Penny W. Bayesian approaches to Gaussian mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1998;20(11):1133–1142.
  60. Sekara V., Deville P., Ahnert S.E., Barabási A.-L., Sinatra R., Lehmann S. The chaperone effect in scientific publishing. Proceedings of the National Academy of Sciences of the United States of America. 2018;115(50):12603–12607. doi: 10.1073/pnas.1800471115.
  61. Sinha A., Shen Z., Song Y., Ma H., Eide D., Hsu B.-J. (Paul), et al. An overview of Microsoft Academic Service (MAS) and applications. Proceedings of the 24th International Conference on World Wide Web. 2015:243–246.
  62. Squazzoni F., Bravo G., Grimaldo F., García-Costa D., Farjam M., Mehmani B. Gender gap in journal submissions and peer review during the first wave of the COVID-19 pandemic. A study on 2329 Elsevier journals. PloS One. 2021;16(10). doi: 10.1371/journal.pone.0257919.
  63. Van der Maaten L., Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research. 2008;9(11):2579–2605.
  64. Vásárhelyi O., Zakhlebin I., Milojević S., Horvát E.-Á. Gender inequities in the online dissemination of scholars’ work. Proceedings of the National Academy of Sciences of the United States of America. 2021;118(39). doi: 10.1073/pnas.2102945118.
  65. Viglione G. Are women publishing less during the pandemic? Here's what the data say. Nature. 2020;581(7809):365–367. doi: 10.1038/d41586-020-01294-9.
  66. Vincent-Lamarre P., Sugimoto C.R., Larivière V. The decline of women's research production during the coronavirus pandemic. Nature Index. 2020. https://www.natureindex.com/news-blog/decline-women-scientist-research-publishing-production-coronavirus-pandemic (accessed Nov 20, 2022).
  67. West J.D., Jacquet J., King M.M., Correll S.J., Bergstrom C.T. The role of gender in scholarly authorship. PloS One. 2013;8(7):e66212. doi: 10.1371/journal.pone.0066212.
  68. WHO. WHO coronavirus (COVID-19) dashboard. 2021. https://covid19.who.int/table (accessed May 4, 2022).
  69. Wildgaard L., Schneider J.W., Larsen B. A review of the characteristics of 108 author-level bibliometric indicators. Scientometrics. 2014;101(1):125–158.
  70. Williams J.C. The glass ceiling and the maternal wall in academia. New Directions for Higher Education. 2005;2005(130):91–105.
  71. Yildirim T.M., Eslen-Ziya H. The differential impact of COVID-19 on the work conditions of women and men academics during the lockdown. Gender, Work, and Organization. 2020. doi: 10.1111/gwao.12529.

Articles from Journal of Informetrics are provided here courtesy of Elsevier