Extended Data Table 2.
Validation of SES Predictions and Group Assignments using Publicly Available Data
This table evaluates the accuracy of the SES measures constructed in the Facebook data, as well as the methods used to assign individuals to specific groups (i.e., specific neighbourhoods, high schools and colleges). Concretely, we compare the fraction of above-median-SES individuals in each ZIP code, high school and college to estimates of the fraction of high-income members of these groups from publicly available administrative data sources. In the first row, we correlate the fraction of households with above-median household income within each ZIP code, as calculated by ref. 23 using data from the 2014-2018 American Community Survey (ACS), with the estimated proportion of Facebook users in our primary sample with above-median SES in that ZIP code. This correlation is weighted by ZIP code population based on the 2018 ACS. In the second row, for each public high school, we calculate the 5-year median of the fraction of students who are not eligible for free or reduced-price lunch (based on the 2014-2018 NCES Common Core of Data). We correlate this measure with the fraction of students with above-median parental SES in the Facebook data, using individuals born between 2000 and 2004 to match the cohorts observed in the NCES data. This correlation is weighted by the number of students in grades 9 to 12, as reported in the NCES data. In the third row, we correlate the fraction of students with parental income in the top two quintiles of the national distribution in each college, as calculated by ref. 46 using tax records, with our corresponding estimates of the proportion of students with above-median parental SES from the Facebook data. This correlation is weighted by the number of students in the relevant college cohort. See Supplementary Information A for further details on the publicly available data sources.
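Each row of the table reports a weighted correlation between a group-level benchmark share and the corresponding Facebook-based share. The caption does not state the exact estimator, so the sketch below assumes a standard weighted Pearson correlation. The function name `weighted_corr` and the example values are hypothetical and illustrative only; they are not data from the paper.

```python
import math
from typing import Sequence


def weighted_corr(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted Pearson correlation of x and y with non-negative weights w.

    Assumption: this is the standard weighted Pearson estimator; the source
    does not specify the exact formula used for the table.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted means of each series
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances; the common 1/total factor cancels in the ratio
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when a series has zero weighted variance")
    return cov / math.sqrt(vx * vy)


# Hypothetical ZIP-code-level example (illustrative values, not from the paper)
benchmark_share = [0.30, 0.45, 0.60, 0.75]  # share of households above median income
facebook_share = [0.28, 0.50, 0.58, 0.80]   # share of users with above-median SES
population = [12000, 8000, 20000, 5000]     # ZIP code population used as weights

r = weighted_corr(benchmark_share, facebook_share, population)
print(f"weighted correlation: {r:.3f}")
```

With integer weights, this estimator equals the ordinary Pearson correlation computed on data in which each group is repeated as many times as its weight, so larger ZIP codes, schools or colleges count proportionally more.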
