eLife. 2019 Jul 15;8:e45374. doi: 10.7554/eLife.45374

Figure 7. Flowchart of data collection, inclusion and exclusion.

Figure 7—source data 1. Countries excluded due to unreliable gender assignment from first names.
DOI: 10.7554/eLife.45374.019
Figure 7—source data 2. List of specialties and main-specialty designations, and number of papers per specialty for the full sample.
DOI: 10.7554/eLife.45374.020
Figure 7—source data 3. Groupings of countries by geographical region.
DOI: 10.7554/eLife.45374.021


Figure 7—figure supplement 1. Percentage of papers per journal included in the analysis.


Papers were excluded either because their document type is missing from Web of Science or because author name information is missing. For many journals, document types that are indexed in PubMed Medline but not in Web of Science (e.g. comments, notes) can account for a large share of the excluded papers. For other journals, first-name information is consistently missing for some or all years. Only journals with more than 50 papers are shown.
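The per-journal percentages and the display filter can be illustrated with a minimal sketch. It assumes a hypothetical list of `(journal, included)` records, one per sampled paper; the record layout and the function name `inclusion_by_journal` are illustrative and not taken from the original analysis.

```python
from collections import defaultdict


def inclusion_by_journal(records, min_papers=50):
    """Return {journal: percent of papers included} for journals with more than min_papers papers.

    records: iterable of (journal, included) pairs, one per sampled paper (hypothetical layout).
    """
    totals = defaultdict(int)    # all sampled papers per journal
    included = defaultdict(int)  # papers that survived the exclusion steps
    for journal, is_included in records:
        totals[journal] += 1
        if is_included:
            included[journal] += 1
    return {
        journal: 100.0 * included[journal] / n
        for journal, n in totals.items()
        if n > min_papers  # strictly more than 50 papers, matching the figure's filter
    }


records = [("J1", True)] * 60 + [("J1", False)] * 40 + [("J2", True)] * 10
print(inclusion_by_journal(records))  # J2 is dropped: only 10 papers
```

Here journal `J1` keeps 60 of its 100 papers, while `J2` falls below the 50-paper threshold and is not reported at all.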

Figure 7—figure supplement 2. Reliability of gender assignment per country, with countries ranked by reliability.


Gender determination: The online tool Gender-API was used to estimate the gender associated with each pairing of first name and country. The pairing matters because the gender connotation of some first names varies by language and culture: the name Kim, for example, is typically male in Danish, female in English-speaking countries, and unisex in Korean. Gender-API uses co-occurrences of names and countries on social media to provide a precision score for each assignment, which we use to calculate f, the probability that an author is female. We exclude from this analysis all authors who have only initials registered in Web of Science or who are from a country with unreliable gender prediction.

Country sampling and bias: We calculated a reliability score for each country from the precision scores of the Gender-API name assignments for all authors from that country. Names with precision scores >= 0.8 are considered reliable, and a country's reliability is the average of this per-name reliability over its authors, i.e. the share of its authors whose names are reliably assigned. Based on the reliability distribution in Figure 7—figure supplement 2, we heuristically set a cut-off of 0.9 reliability for inclusion in the analysis. The excluded countries are listed in Figure 7—source data 1. For some East Asian countries, the low reliability is explained by unisex naming conventions. For other countries, the probable explanation is a lack of comprehensive social media data.
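The gender-probability and country-reliability steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each Gender-API result has already been reduced to a predicted gender (`"female"` or `"male"`) plus a precision score in [0, 1], that f equals the precision for a female prediction and one minus the precision for a male prediction, and that the 0.9 country cut-off is inclusive. Gender-API's actual response format is not reproduced here, and all function names and record layouts are hypothetical.

```python
from collections import defaultdict

# Thresholds taken from the text; treating the 0.9 cut-off as inclusive is an assumption.
RELIABLE_PRECISION = 0.8  # a name assignment counts as reliable at or above this precision
COUNTRY_CUTOFF = 0.9      # minimum country reliability for inclusion


def prob_female(predicted_gender: str, precision: float) -> float:
    """Probability f that an author is female (assumed mapping, not the paper's exact formula)."""
    if predicted_gender == "female":
        return precision
    if predicted_gender == "male":
        return 1.0 - precision
    raise ValueError(f"unexpected gender label: {predicted_gender!r}")


def has_full_first_name(first_name: str) -> bool:
    """Hypothetical check: False when the registered first name consists of initials only."""
    parts = first_name.replace(".", " ").split()
    return any(len(part) > 1 for part in parts)


def country_reliability(authors):
    """Map each country to the share of its authors with a reliable name assignment.

    authors: iterable of (country, precision) pairs (hypothetical layout).
    """
    total = defaultdict(int)
    reliable = defaultdict(int)
    for country, precision in authors:
        total[country] += 1
        if precision >= RELIABLE_PRECISION:
            reliable[country] += 1
    return {country: reliable[country] / total[country] for country in total}


def included_countries(authors):
    """Countries whose reliability reaches the cut-off."""
    return {
        country
        for country, reliability in country_reliability(authors).items()
        if reliability >= COUNTRY_CUTOFF
    }


authors = [("DK", 0.95)] * 9 + [("DK", 0.5)] + [("XX", 0.6)] * 5 + [("XX", 0.99)] * 5
print(country_reliability(authors))  # DK: 9 of 10 names reliable; XX: 5 of 10
print(included_countries(authors))
```

In this toy input, the hypothetical country `DK` has 9 of 10 names at or above the 0.8 precision threshold and just reaches the 0.9 cut-off, whereas `XX` (0.5) would be excluded. Averaging a binary per-name indicator is one reading of "the average reliability hereof"; a different aggregation would change which countries pass.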

Figure 7—figure supplement 3. Proportion of papers with gender assignment for all authors.


Reported as a proportion of all sampled papers (p_pubmed) and as a proportion of all papers matched to Web of Science (p_wos).
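The relationship between the two proportions can be illustrated with a short sketch. It assumes hypothetical per-paper flags for whether a sampled PubMed paper was matched to Web of Science and whether every author on it received a gender assignment. It also assumes that only matched papers can be fully assigned, since author names are taken from Web of Science. The function name and input layout are illustrative.

```python
def assignment_proportions(papers):
    """Return (p_pubmed, p_wos) for papers with a gender assignment for all authors.

    papers: list of (matched_to_wos, all_authors_assigned) boolean pairs,
    one per sampled PubMed paper (hypothetical layout).
    """
    n_sampled = len(papers)
    n_matched = sum(1 for matched, _ in papers if matched)
    # Assumption: a paper can only be fully assigned if it was matched to Web of Science.
    n_assigned = sum(1 for matched, assigned in papers if matched and assigned)
    p_pubmed = n_assigned / n_sampled if n_sampled else 0.0
    p_wos = n_assigned / n_matched if n_matched else 0.0
    return p_pubmed, p_wos


papers = [(True, True)] * 3 + [(True, False)] + [(False, False)] * 4
print(assignment_proportions(papers))
```

Under this assumption the numerator is shared and only the denominator differs, so p_wos is never smaller than p_pubmed.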