Author manuscript; available in PMC: 2017 May 1.
Published in final edited form as: Am Econ Rev. 2016 May;106(5):333–338. doi: 10.1257/aer.p20161124

STEM Training and Early Career Outcomes of Female and Male Graduate Students: Evidence from UMETRICS Data linked to the 2010 Census

Catherine Buffington, Benjamin Cerf Harris, Christina Jones, Bruce A Weinberg
PMCID: PMC4876811  NIHMSID: NIHMS780357  PMID: 27231399

1. Introduction

Women are underrepresented in a number of science and engineering fields, and the extent of underrepresentation generally increases with career stage (National Science Board, 2014). This article uses new transaction data linked to Census Bureau data to examine gender differences at two critical junctures in the STEM pathway: graduate training and the early career. We find gender “separation” among students: women work on teams with larger shares of women (especially among faculty) than men do. However, we find no clear disadvantages in the aspects of training environments that we can measure. We do find dramatic differences in career outcomes. Women earn 31% less than men overall and 11% less after controlling, most notably, for field of study and funding source. The gap disappears once we include gender interacted with marital status and children.

We use unique new administrative data that allow us to identify personnel employed on federally funded research grants at 4 participating universities over 10 years. These data allow us to characterize the projects on which graduate students train, which is particularly relevant for STEM careers, where most training occurs on funded research teams. We augment these data by matching to demographics, household composition, and presence of children from the 2010 Decennial Census; earnings from W-2 records; and other Census information on sector of employment. The resulting linked data, which will be available through the Federal Statistical Census Research Data Centers, provide a window into a critical and understudied stage of research careers at a time when participation in STEM fields is increasingly important.

Gender gaps exist across several dimensions, from compensation and response to outside offers to space allocations, grant funding, and awards (Chisholm et al. [1999] and Ginther [2001]). We contribute by analyzing new aspects of graduate training, a point in careers when disparities are likely to have long-lasting consequences. One area that has received attention is whether women benefit from being mentored by other women, although the literature has found mixed results using a range of qualitative and quantitative methods (Pezzoni, Mairesse, Stephan, and Lane [2015]). We make further contributions by studying post-graduation outcomes.

2. Data

Four central datasets are used in the analysis: UMETRICS personnel files on all individuals employed under federal (and some non-federal) research awards matched to the 2010 Decennial Census; the ProQuest Dissertation and Thesis Database; and a composite of earnings and placement information based on extracts from IRS W-2 forms, the Longitudinal Employer Household Dynamics microdata (LEHD), the Business Register (BR), Longitudinal Business Database (LBD), and the Integrated Longitudinal Business Database (ILBD) (Zolas et al. [2015]). The UMETRICS file contains university payroll records on the proportion of earnings allocated to all federal awards (and some non-federal awards) for all pay periods and for all individuals. The data also include the federal funding agency and job titles, which we mapped into 6 occupation categories that include faculty and graduate student. These data identify all people employed on research projects and their positions. The focal award is identified based on the time on the award and the award’s share of the student’s earnings.

PhD recipients (“graduate students” include Masters students) are identified by matching UMETRICS data to ProQuest’s Dissertation and Thesis Database. These data contain the name of the dissertation’s author; the subject (which we aggregated manually); institution awarding the degree; and the degree awarded.

We link the UMETRICS data to the 2010 Census via a person identifier, used internally by the Census Bureau, called the Protected Identification Key (PIK). PIKs are assigned through the Person Identification Validation System (PVS), which uses probability record linkage techniques and personal information such as name, date of birth, and residential location (Wagner and Layne [2014]). Once a PIK is assigned to a record, the Personally Identifiable Information is removed so analysts can anonymously link individuals across files for statistical and research purposes. Person records in the Census contain date of birth, gender, race, ethnicity, and relationship to the head of household (HH), the last of which permits inference about certain relationships within a household. Marital status is modeled for graduate students who are either the HH or listed as a spouse or unmarried partner of the HH. Individuals are characterized as having children when they are, or are married to, the HH and there are (step) children of the HH present.1
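The household-based inference described above can be sketched in a few lines. This is a minimal illustration only: the relationship codes, the `Person` record, and the function name are hypothetical and do not reflect the actual Census coding or the authors' code. Following footnote 1, anyone who is neither the HH nor the HH's spouse or partner falls through to single and childless.

```python
from dataclasses import dataclass

# Hypothetical relationship-to-householder codes (the real Census coding differs).
HOUSEHOLDER = "householder"
SPOUSE = "spouse"
UNMARRIED_PARTNER = "unmarried_partner"
CHILD = "child"
STEPCHILD = "stepchild"
OTHER = "other"


@dataclass
class Person:
    pik: str           # anonymized person identifier
    household_id: str
    relationship: str  # relationship to the head of household (HH)


def infer_family_status(person, household):
    """Return (partnered, has_children) for one person in a household."""
    roles = [p.relationship for p in household]
    partner_present = SPOUSE in roles or UNMARRIED_PARTNER in roles
    children_present = CHILD in roles or STEPCHILD in roles

    if person.relationship == HOUSEHOLDER:
        return partner_present, children_present
    if person.relationship == SPOUSE:
        # Married to the HH: children of the HH count as this person's children.
        return True, children_present
    if person.relationship == UNMARRIED_PARTNER:
        # The children rule in the text covers only the HH and the HH's spouse.
        return True, False
    # Everyone else cannot be linked and is classified as single and childless.
    return False, False


household = [
    Person("A", "h1", HOUSEHOLDER),
    Person("B", "h1", SPOUSE),
    Person("C", "h1", STEPCHILD),
]
status_of_spouse = infer_family_status(household[1], household)
```

The fall-through case makes the first limitation in footnote 1 concrete: a student living in a multi-generational household with a partner and child would still be recorded as single and childless.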

The PIK is used to link to W-2 earnings, which cover total annual wages, tips, and other compensation from the job with the highest earnings in each year from 2005 to 2012. Linking to the LEHD provides establishment identifiers, and linking on those establishment identifiers to the BR, LBD, and ILBD provides sector of employment.
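As a rough illustration of the earnings measure, the sketch below keeps, for each person and year, the wages from the single highest-paying job. The record layout `(pik, year, employer_id, wages)` and the function name are assumptions made for this example, not the structure of the actual W-2 extracts.

```python
def top_job_earnings(w2_records):
    """Map (pik, year) to wages from the highest-paying job in that year.

    Each record is a hypothetical tuple: (pik, year, employer_id, wages).
    """
    best = {}
    for pik, year, _employer_id, wages in w2_records:
        key = (pik, year)
        if key not in best or wages > best[key]:
            best[key] = wages
    return best


# Toy records: person "A" holds two jobs in 2010.
records = [
    ("A", 2010, "emp1", 18000.0),
    ("A", 2010, "emp2", 52000.0),
    ("A", 2011, "emp2", 55000.0),
    ("B", 2010, "emp3", 40000.0),
]
earnings = top_job_earnings(records)
```

Note that this measure records earnings from one job only, so a person with several part-time jobs has lower measured earnings than total W-2 income.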

As shown in the Appendix, the UMETRICS data include 3,551,730 payment records, representing 127,822 employees (at all levels) from 4 universities. Of these, 11,773 earned a degree from one of the 4 universities, and 3,837 were in the 2007–2010 graduating cohort. We keep those in STEM fields, between the ages of 24 and 40, who were assigned a PIK and matched to earnings data. The final sample includes 1,237 students (867 male and 370 female). There are no gender differences in terms of demographics. For each, all but 1% are White alone (57%), Black alone (2.3%), or Asian alone (40%); 3% are Hispanic; the average age is just over 30; and just under two-thirds are married or partnered. Nineteen percent of females and 24% of males had children at the time of the 2010 Census. There are clear differences in field of study: 59% of the females in our sample completed dissertations in Biology, Chemistry, or Health, but only 27% of males wrote dissertations in those fields. Males were more than twice as likely to complete dissertations in Engineering (45% versus 21%) and were 1.5 times as likely to study Computer Science, Math, or Physics (28% versus 19%). Given these differences, it is crucial to account for field of study when estimating training and labor outcomes.
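The sample restrictions above amount to a simple record filter, sketched below. The field names are hypothetical, and two details are assumptions not stated in the text: that the age bounds are inclusive, and that age is taken as a single value per student.

```python
def in_analysis_sample(rec):
    """Apply the stated sample restrictions to one student record (a dict)."""
    return bool(
        2007 <= rec["degree_year"] <= 2010  # 2007-2010 graduating cohort
        and rec["stem_field"]               # dissertation topic in a STEM field
        and 24 <= rec["age"] <= 40          # inclusive bounds are an assumption
        and rec["pik"] is not None          # a PIK was assigned
        and rec["has_w2_earnings"]          # matched to earnings data
    )


# Toy records; the second fails the PIK requirement, the third the cohort window.
students = [
    {"degree_year": 2008, "stem_field": True, "age": 29, "pik": "A", "has_w2_earnings": True},
    {"degree_year": 2009, "stem_field": True, "age": 31, "pik": None, "has_w2_earnings": True},
    {"degree_year": 2012, "stem_field": True, "age": 30, "pik": "C", "has_w2_earnings": True},
]
sample = [s for s in students if in_analysis_sample(s)]
```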

3. Analysis

We use OLS to compare the training environments and labor market outcomes of female and male doctoral recipients along a wide range of dimensions. The main variable of interest is an indicator equal to 1 if the student is a woman. The simplest specification includes university indicators, a linear trend for the first year the student appears as a graduate student in the UMETRICS data,2 and an indicator for being left-censored in the UMETRICS data.3 Labor market regressions also include indicators for graduation year. Additional controls are progressively introduced for dissertation topic, funding agency, race, Hispanic origin, age and its square, marital status, and presence of children. Finally, interactions are included between gender and both marital status and presence of children. Of course, there are likely to be unmeasured differences between women and men and those with and without children.
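For concreteness, the richest specification described above can be written roughly as follows. The notation is ours and is only a sketch: $y_i$ is a training or labor market outcome for student $i$, $X_i$ collects the controls listed in the text (university indicators, first-year trend, left-censoring indicator, demographics, dissertation topic, funding agency, marital status, presence of children, and, for labor market outcomes, graduation-year indicators), and the last two terms are the gender interactions.

```latex
\begin{equation}
y_i = \alpha + \beta\,\mathrm{Female}_i + X_i'\gamma
      + \delta_1\,(\mathrm{Female}_i \times \mathrm{Partnered}_i)
      + \delta_2\,(\mathrm{Female}_i \times \mathrm{Children}_i)
      + \varepsilon_i
\end{equation}
```

The simpler specifications drop the interaction terms and use subsets of $X_i$. In each case the reported gender gap is the estimate of $\beta$; once the interactions are included, $\beta$ measures the gap for students who are neither partnered nor have children.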

Table 1 reports differences in training environments. Here and in Table 2, columns (1a)–(1c) report raw means for women and men, and the difference between the two. The remaining columns report the gender gaps conditional on the controls discussed above. Dependent variables enter as rows. There is substantial gender separation in teams. For the average female graduate student in the data, over 2 out of 10 faculty members on the research teams are female, while fewer than 1 out of 10 faculty members are female for the average male graduate student. This finding is robust; even using the richest set of controls in column (6), there is a precise 5 percentage point difference in the proportion of faculty members on the research team who are female. There is some evidence female graduate students work on teams with a higher percentage of other female students, but this result disappears controlling for dissertation topic.

Table 1.

Training Environments of Male and Female Graduate Students Participating in STEM Research

| Dependent Variable | (1a) Females | (1b) Males | (1c) Diff | (2) | (3) | (4) | (5) | (6) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Share of Faculty that are Female | 0.21 (0.02) | 0.08 (0.01) | 0.13*** (0.02) | 0.12*** (0.02) | 0.12*** (0.02) | 0.09*** (0.02) | 0.09*** (0.02) | 0.05* (0.03) |
| Share of Graduate Students that are Female | 0.14 (0.01) | 0.09 (0.00) | 0.05*** (0.01) | 0.04*** (0.01) | 0.04*** (0.01) | 0.01 (0.01) | 0.01 (0.01) | −0.00 (0.02) |
| Ln Team Size | 1.73 (0.04) | 1.93 (0.03) | −0.20*** (0.05) | −0.18*** (0.05) | −0.18*** (0.05) | −0.10* (0.06) | −0.10* (0.06) | −0.06 (0.09) |
| Faculty to Student Ratio | 0.93 (0.06) | 0.64 (0.03) | 0.29*** (0.07) | 0.22*** (0.07) | 0.22*** (0.07) | 0.15** (0.07) | 0.14* (0.07) | 0.29** (0.13) |
| Total Number of Awards | 2.24 (0.07) | 2.69 (0.06) | −0.45*** (0.09) | −0.34*** (0.09) | −0.32*** (0.09) | −0.24*** (0.09) | −0.23*** (0.09) | −0.10 (0.15) |
| Number of Months Participating on the Award | 20.98 (0.69) | 21.59 (0.45) | −0.62 (0.82) | −1.10 (0.79) | −1.00 (0.79) | −1.38* (0.82) | −1.42* (0.82) | −0.91 (1.18) |
| Years from First Observation to Degree | 3.20 (0.08) | 3.23 (0.06) | −0.03 (0.10) | −0.12** (0.06) | −0.11* (0.06) | −0.12** (0.06) | −0.12** (0.06) | 0.00 (0.10) |
| Observations | 370 | 867 | 1,237 | 1,237 | 1,237 | 1,237 | 1,237 | 1,237 |

Control rows:
University, First Year Trend, Left-Censored
Race, Hispanic Origin, Age, Age-squared
Dissertation Topic
Funding Agency
Married or Partnered, Children
Female × (Married or Partnered + Children)

Notes: Sample includes 2007–2010 graduates with dissertation topics in a STEM field. Each cell in columns (2)–(8) displays the estimated coefficient on the FEMALE indicator from a separate regression. Robust standard errors.

Source: Author calculations. UMETRICS linked to 2010 Census, ProQuest, LEHD, W2, LBD, BR, and iLBD

*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.

Table 2.

Labor Market Outcomes of Male and Female Graduate Students Participating in STEM Research

| Dependent Variable | (1a) Females | (1b) Males | (1c) Diff | (2) | (3) | (4) | (5) | (6) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Employed in Industry | 0.40 (0.022) | 0.47 (0.02) | −0.13*** (0.03) | −0.11*** (0.03) | −0.11*** (0.03) | −0.05 (0.03) | −0.05 (0.03) | −0.03 (0.05) |
| Ln Wage | 10.50 (0.063) | 10.93 (0.03) | −0.37*** (0.07) | −0.35*** (0.07) | −0.35*** (0.07) | −0.11* (0.07) | −0.11* (0.07) | 0.01 (0.10) |
| Ln Wage (with Industry Controls) | 10.40 (0.057) | 10.71 (0.04) | −0.31*** (0.07) | −0.29*** (0.07) | −0.30*** (0.06) | −0.9 (0.06) | −0.10 (0.07) | 0.02 (0.10) |
| Observations | 318 | 731 | 1,049 | 1,049 | 1,049 | 1,049 | 1,049 | 1,049 |

Control rows:
University, First Year Trend, Left-Censored
Degree Year
Race, Hispanic Origin, Age, Age-squared
Dissertation Topic
Funding Agency
Married or Partnered, Presence of Children
Female × (Married or Partnered + Children)

Notes: Labor outcomes are taken from one year following graduation or separation from the university payroll, whichever is greater. Wages are in 2012 dollars. Sample includes observations with dissertation topics in a STEM field. Each cell in columns (2)–(8) displays the estimated coefficient on the FEMALE indicator from a separate regression. Robust standard errors.

Source: Author calculations. UMETRICS linked to 2010 Census, ProQuest, LEHD, W2, LBD, BR, and iLBD

*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.

These differences could be due to choice (“sorting”), external forces (“segregation”), or a combination of both. Furthermore, since students are observed at a relatively late stage in their education, sorting by choice at this stage may derive from experiences (including segregation) at earlier stages. Distinguishing between these mechanisms is beyond the scope of this paper and is an area for future research.

The remainder of Table 1 shows gender differences in several characteristics of STEM students’ training environments. These differences may be advantageous or disadvantageous. For example, female graduate students tend to be on awards with smaller teams and a greater share of faculty per graduate student. This may imply more opportunities for direct mentorship, but it also may reflect differences in the size and prestige of grants (i.e., smaller awards may not be able to employ as many graduate students). Female graduate students are employed on fewer federal research awards overall than their male counterparts, spend slightly less time participating in their primary award, and have shorter spans between first appearing in the UMETRICS data and graduating, although many of these differences are sensitive to the specification. These differences could be interpreted two ways. In one view, the tendency of females to participate in fewer awards and appear in the data for shorter durations could reflect specialized training and faster degree completion. In another view, female students may be isolated from other researchers and may begin participating in federally funded research later. More research is needed to determine which view is correct.

The early labor market outcomes of the males and females in the graduating cohort can also be studied. We examine outcomes one year from the time the student graduates (according to ProQuest) or leaves the payroll of the university that granted the degree (according to the W-2 data), whichever is later. Specifically, we consider earnings in 2012 dollars and placement within Academia and Government versus all other industries.4
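The timing rule for outcomes is simple enough to state as a one-line function. The sketch below assumes that both events are measured in calendar years, which is our simplification; the function and argument names are hypothetical.

```python
def outcome_year(graduation_year, separation_year):
    """Year in which outcomes are measured: one year after the later of
    graduation (ProQuest) and separation from the university payroll (W-2)."""
    return max(graduation_year, separation_year) + 1


# A student who graduates in 2009 but stays on the university payroll through 2010
# has outcomes measured in 2011.
example_year = outcome_year(2009, 2010)
```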

Figure 1 shows unconditional kernel density plots of earnings for males and females in all sectors, the Academic and Government Sector, and all other sectors. Panel (a) shows women are more concentrated than men at the low-to-middle portion of the earnings distribution and less represented at the higher end of the earnings distribution, with the male earnings distribution being bi-modal. Panel (b) shows relatively smaller gaps among those going into academia and government, with many earning typical postdoctoral researcher incomes just under $50,000. Women and men earn the most in industry, but the gap is also larger.

Figure 1. Wage Distributions by Sex and Sector

Table 2 further analyzes differences in early labor market outcomes. Column (1c) of the top row shows that the female students in our graduating cohort are 13 percentage points less likely than male graduate students to work in the lucrative sectors outside academia and government. This holds controlling for university, degree year, and demographic characteristics, but column (4) shows there are no detectable differences once we control for broad dissertation topic and funding source. We find unconditional wage differences between males and females of 0.37 log points (31%). Controlling for university characteristics, degree date, and demographics has little impact on the point estimate. However, we see the magnitude of the estimated wage gap drop by about two thirds to 11% when we include controls for dissertation topic and funding source, underscoring the important role of field of study.5 Adding controls for family and household structure (column (5)) does not change the point estimate, which is significant at the 10% level. Allowing the impact of partnership status and children to vary by gender, however, makes the point estimate of the male-female wage gap statistically indistinguishable from zero. This suggests the presence of children contributes meaningfully to the gender wage gap. However, the point estimates on the interactions themselves are imprecise, possibly due to noise in measurement of children and partnered status (see footnote 1). Finally, the gender gap is larger for industry employees and robust to controlling for sector.
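The conversion between log-point gaps and percentage gaps used above can be checked directly. In the sketch below (the function name is ours), a gap of g log points corresponds to a difference of exp(g) − 1 in wage levels. The raw gap of −0.37 log points translates to roughly 31% lower wages; for the conditional gap of −0.11 log points the exact conversion is closer to 10%, so the 11% figure reads the log-point gap directly as a percentage, which is a close approximation when gaps are small.

```python
import math


def log_gap_to_percent(log_gap):
    """Convert a log-wage gap (women minus men) into a percent difference in wage levels."""
    return (math.exp(log_gap) - 1.0) * 100.0


raw_gap_pct = log_gap_to_percent(-0.37)          # unconditional gap, Table 2 column (1c)
conditional_gap_pct = log_gap_to_percent(-0.11)  # gap with topic and funding controls
print(f"raw: {raw_gap_pct:.1f}%, conditional: {conditional_gap_pct:.1f}%")
```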

4. Conclusion

This paper explores differences in STEM training environments and labor market placement outcomes using unique transaction data from universities combined with administrative and survey data from the Census Bureau. The results show gender separation in training, but no clear gender disadvantages in training environments. There are, however, differences in placement outcomes: women are much less likely to enter industry and more likely to enter academia or government. Women have substantially lower wages, with a larger gap for those entering industry. This difference is due largely to field of study and disappears controlling for gender interacted with marital status and the presence of children. These results should be interpreted with caution. The data represent a limited number of schools and only some aspects of the training environment. Also, labor outcomes likely reflect some unobserved heterogeneity, including in hours worked, and potentially household decisions on housework and child care.

Supplementary Material

Appendix

Acknowledgments

Weinberg acknowledges support from NSF EHR DGE 1535399 and 1348691; NSF SciSIP 1064220; NIA P01 AG039347 and R24 AG048059; and the Sloan and Kauffman Foundations. We thank Nathan Goldschlag, Ron Jarmin, John King, Julia Lane, Sharon Levine, Catherine Massey, Brett McBride, Heather Metcalf, Nikolas Zolas, and participants of the course Big Data for Federal Agencies for their valuable support in this project. Any mistakes are the responsibility of the authors.

Footnotes

Disclaimer: This paper is released to inform interested parties of research and to encourage discussion. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau, the American Institutes for Research, or the Ohio State University.

1. This approach provides valuable information but has two main limitations. First, we can only infer marital status or presence of children for those who are either listed as the HH or are married or partnered to the HH. Individuals in our sample who are in multi-family or multi-generational households may be incorrectly classified as single or childless. Second, the 2010 Census provides a point-in-time measure of marital status and presence of children, while our UMETRICS and earnings data are longitudinal. Therefore, these measures become increasingly noisy as our educational and labor market outcomes deviate further from 2010.

2. Some students enter the UMETRICS data first as undergraduates and later transition to graduate students.

3. A student is defined as left-censored if the date she first appears as a graduate student in the UMETRICS data is equal to the first date for which her university’s data are available.

4. Far more people are in academia than government, but these sectors have similar earnings.

5. We estimated regressions that introduced funding agency before dissertation topic to check that the change moving from columns (3) to (4) was not an artifact of model saturation, and we found that the large influence of dissertation topic was invariant to the stage at which we introduce it. Given its central role in our results, we also tried estimating regressions that interacted gender and topic. However, we found no evidence of within-topic gender gaps in wages.

Contributor Information

Catherine Buffington, U.S. Census Bureau, Associate Directorate of Research and Methodology, 4600 Silver Hill Rd., Washington, DC 20233.

Benjamin Cerf Harris, U.S. Census Bureau, Center for Administrative Records Research and Applications, 4600 Silver Hill Rd., Washington DC 20233.

Christina Jones, American Institutes for Research, 1000 Thomas Jefferson Street NW, Washington DC 20007.

Bruce A. Weinberg, The Ohio State University, Department of Economics, Columbus, Ohio 43210; IZA; and NBER.

References

  1. Chisholm Sallie. A Study on the Status of Women Faculty in Science at MIT. Massachusetts Institute of Technology; 1999.
  2. Ginther Donna K. Does Science Discriminate Against Women? Evidence from Academia: 1973–1997. 2001. (Federal Reserve Bank of Atlanta Working Paper 2001–02).
  3. National Science Board. Science and Engineering Indicators 2014. Arlington VA: National Science Foundation (NSB 14-01); 2014.
  4. Pezzoni Michele, Mairesse Jacques, Stephan Paula, Lane Julia. Gender and Performance of Graduate Students. Working Paper. 2015.
  5. Wagner Deborah, Layne Mary. The Person Identification Validation System (PVS): Applying the Center for Administrative Records Research and Application’s (CARRA) Record Linkage Software. 2014. (CARRA Working Paper Series #2014-01).
  6. Zolas Nikolas, Goldschlag Nathan, Jarmin Ron, Stephan Paula, Owen-Smith Jason, Rosen Rebecca, Allen Barbara McFadden, Weinberg Bruce A, Lane Julia L. Wrapping it up in a person: Examining Employment and Earnings Outcomes for Ph.D. Recipients. Science. 2015 Dec 11;350(6266):1367–1371. doi: 10.1126/science.aac5949.
