Abstract
Recent studies have begun to uncover the genetic architecture of educational attainment. We build on this work using genome-wide data from siblings in the National Longitudinal Study of Adolescent to Adult Health (Add Health). We measure the genetic predisposition of siblings to educational attainment using polygenic scores. We then test how polygenic scores are related to social environments and educational outcomes. In Add Health, genetic predisposition to educational attainment is patterned across the social environment. Participants with higher polygenic scores were more likely to grow up in socially advantaged families. Even so, the previously published genetic associations appear to be causal. Among pairs of siblings, the sibling with the higher polygenic score typically went on to complete more years of schooling as compared to their lower-scored co-sibling. We found subtle differences between sibling fixed effect estimates of the genetic effect versus those based on unrelated individuals.
Keywords: Polygenic score, Educational Attainment, Add Health, Genetic Risk Score
Introduction
Genetics and the social sciences have endured a long and troubled partnership. At the beginning of the 20th Century, eugenicists, including the father of modern quantitative genetics R.A. Fisher, used their science to promote politics of racism, classism, and xenophobia (Tabery 2008). By the end of the 20th Century, things were not much better. Publication in 1994 of The Bell Curve was followed by contentious debate over the existence of and biological basis for a racial gradient in intelligence (Neisser et al. 1996; Devlin 1997). The 21st Century is off to a better start in the form of international collaboration among academic social scientists and geneticists best embodied by the Social Science Genetic Association Consortium. The first large-scale endeavor of this group was to apply state-of-the-art methods typically used to hunt for genetic causes of common diseases to investigate the genetics of educational attainment (Rietveld et al. 2013). They pooled data on more than 100,000 individuals from 42 different studies. To the surprise of many in the scientific community, they actually found something. Not only were they able to identify genetic variants that exhibited robust and replicable associations with educational attainment, they were able to construct a genome-wide “polygenic score” for educational attainment that predicted, albeit very weakly, how far an individual was likely to progress in their educational career (i.e. total years of schooling and/or whether they completed college).
This breakthrough finding raises an important question for social scientists who study educational attainment: What does a measure of genetic proclivity towards higher levels of educational attainment actually capture? Can we say with confidence that the genetics of educational attainment uncovered in Rietveld et al. (2013) operate independently of the social circumstances into which a child is born? And, if so, what are the mechanisms? That is, what are the personal attributes (e.g., endophenotypes) that develop from a “high education” genotype that in turn enable their holders to go farther in their educational careers?
To help address these questions, we conducted a sibling fixed effects analysis among respondents in the National Longitudinal Study of Adolescent to Adult Health Sibling Pairs Study. Differences in siblings’ genotypes arise from a random process similar to a lottery (variation in recombination and segregation of alleles during the meiosis that produces gametes). Our analysis tested whether the “winners” of within-family genetic lotteries completed more years of schooling as compared to their siblings. The use of an independent sample of sibling pairs for this type of inquiry provides three important contributions to the existing work in this area. First, we find strong evidence that recent discoveries made in genetic studies of educational attainment are non-spurious (i.e. not the result of environmental confounding) and represent more than the genetic signature of a privileged social group or groups. Second, features of children’s environments that promote educational attainment are correlated with their genetic endowments; such correlations may bias between-family estimates of genetic effects. Third, estimates of genetic influence on educational attainment from comparisons of siblings may differ in important respects from estimates based on individuals who do not share the same household. We also examined the potential bias that could arise if socioeconomic correlates of a person’s genetic inheritance are ignored, a question critical to any future translation of genetic discoveries into education research. Finally, we examined a putative mechanism or pathway by which this genotype-education relationship may hold: verbal intelligence as measured by a receptive vocabulary test.
The remainder of this introduction is split into four sections. We begin by introducing genome-wide data analysis and its application to the study of educational attainment. We then discuss polygenic scoring as an approach to translating results from genome-wide analysis into a tool for social science. In particular, we highlight vulnerabilities in polygenic scoring methods and ways of addressing them. Finally, we discuss population and social stratification that may confound inference and how the sibling-difference design may be used to bypass these confounding dynamics.
Genome-wide data analysis and its application to the study of educational attainment
Completion of the Human Genome Project and the International HapMap Project has given scientists the necessary tools to directly investigate human DNA and its relation to various traits and diseases. The current approach favored by geneticists for identifying DNA sequence variation associated with complex human traits is the genome-wide association study (GWAS). GWAS is an inductive data mining approach in which an outcome of interest (known as a phenotype) is analyzed for association with each of a large number of genetic variants selected to survey variation throughout the entire genome, most commonly single-nucleotide polymorphisms (SNPs).1 To date, thousands of genome-wide analyses have been conducted on hundreds of traits and diseases, and many discoveries have been made (Welter et al. 2014). Most GWAS research falls within the biomedical domain but the Social Science Genetic Association Consortium (SSGAC) was formed to apply the methods of GWAS to the study of social phenomena. Their first large-scale project was a genome-wide association study of educational attainment (Rietveld et al. 2013). That GWAS, which analyzed data from more than 100,000 individuals, identified several SNPs that were associated with educational attainment even after strict adjustments for multiple hypothesis testing. Subsequent analysis has replicated these discoveries (Rietveld et al. 2014). The individual genetic variants discovered exhibited only very small effects on educational attainment, consistent with findings from GWAS of other complex traits ranging from body mass index to schizophrenia. But the results of the GWAS are not limited to the handful of SNPs identified. It is possible to combine information from all of the SNPs analyzed in the GWAS to calculate a “polygenic score” that summarizes genome-wide genetic predisposition to educational attainment.
Polygenic scores as a tool to integrate GWAS results into social science research
Polygenic scores (also known as genetic risk scores) summarize an individual’s cumulative genetic predisposition to a particular disease or trait. Scores aggregate information across a panel of SNPs according to associations identified in independent GWAS studies. Each SNP is scored by counting the number of disease/trait-associated alleles and then multiplying that sum by a weight. The same weight may be used for all SNPs or some other value may be used, such as the coefficient estimated for the association between the SNP and the disease/trait in a GWAS. Then, the weighted allele counts are summed across the SNP panel. Polygenic scores can include all SNPs measured in GWAS or some subset, typically defined by a p-value threshold for the GWAS results (for a detailed discussion of polygenic scoring methods see Wray et al. 2007, Purcell et al. 2009, Dudbridge 2013). As the number of SNPs included in a polygenic score increases, the score’s distribution rapidly approaches normality (Plomin et al. 2009). The capacity to integrate information from across the genome into a single index and the statistical properties of that index (i.e. continuous and normally distributed) have made polygenic scores an appealing tool for the integration of genetics in both biomedical and social sciences. For example, previous work has used polygenic scores to study the development of obesity, smoking, and asthma (Belsky et al. 2012; Domingue et al. 2014; Belsky et al. 2013a; Belsky et al. 2013b). The majority of polygenic scores can predict only a few percent of the variance in traits of interest. However, it is thought that as GWAS samples increase in size along with the density of SNPs genotyped, so too will the predictive power of polygenic scores based on GWAS results (Conley, 2015). 
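The scoring procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in any particular study: the function name, the SNP identifiers, the weights, and the p-values are all hypothetical, and real applications must also handle strand alignment, missing genotypes, and linkage disequilibrium.

```python
from typing import Dict, Optional


def polygenic_score(
    allele_counts: Dict[str, int],
    gwas_weights: Dict[str, float],
    gwas_pvalues: Optional[Dict[str, float]] = None,
    p_threshold: float = 1.0,
) -> float:
    """Sum of (trait-associated allele count x GWAS weight) over matched SNPs."""
    score = 0.0
    for snp, count in allele_counts.items():
        if snp not in gwas_weights:
            continue  # SNP has no reported GWAS result, so it cannot be scored
        if gwas_pvalues is not None and gwas_pvalues.get(snp, 1.0) > p_threshold:
            continue  # SNP fails the p-value inclusion threshold
        score += count * gwas_weights[snp]
    return score


# Hypothetical toy data: identifiers, weights, and p-values are invented.
counts = {"rs1": 2, "rs2": 0, "rs3": 1, "rs4": 1}
weights = {"rs1": 0.02, "rs2": -0.01, "rs3": 0.03}
pvals = {"rs1": 1e-8, "rs2": 0.20, "rs3": 0.04}

score_all = polygenic_score(counts, weights)  # every matched SNP
score_strict = polygenic_score(counts, weights, pvals, p_threshold=0.01)
```

Setting every weight to the same value reproduces the unweighted allele-count variant mentioned above, and lowering `p_threshold` restricts the score to the most strongly associated SNPs.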
In the case of human height, a trait measured with high precision, GWAS of nearly one quarter million individuals recently generated a polygenic score predicting nearly 30% of population variance (Wood et al. 2014). Even with the small level of predictive power they do offer, polygenic scores provide a tool for beginning to pose and answer questions about the complex relationships that exist between genetics, environments, and the traits and behaviors of interest to the social sciences (Belsky & Israel 2014; Conley et al. 2015).
Population stratification and ethnic confounding of genome-wide analysis
Substantively, GWAS test for covariance between allele frequencies and a trait of interest. When an association is detected, the inference is that the SNP (or, more likely, some other DNA sequence variant that is highly spatially correlated with the SNP) causes a biological effect that in turn causes variation in the trait of interest. But there are other sources of covariance between allele frequencies and traits that can confound associations detected in GWAS. A particularly pervasive source of confounding in GWAS is “population stratification.” Population stratification is the non-random patterning of allele frequencies across global populations (Cardon & Palmer, 2003). These patterns may arise for any number of reasons including major events, such as the departure of a select group from the African continent, and minor events of social construction, such as the erection of national boundaries that restrict contact between groups. The main consequence of population stratification for our purposes is that alleles whose frequencies differ between populations will be associated with any trait that varies systematically between those populations, even though the genetic variation may have nothing to do with the underlying reasons (which may be environmental) why the trait varies between the groups. To guard against confounding due to population stratification, GWAS typically use samples in which the respondents all report the same self-identified racial background (Cardon & Palmer, 2003).
The challenge of population stratification raises two important considerations for the integration of genome-wide data into social science research. First, it highlights the potential racial specificity of GWAS findings because the particular SNPs identified in GWAS may be differently associated with the true causal loci due to differences in “linkage disequilibria” (e.g., Reich et al. 2001). This implies that a particular SNP measured in GWAS may be highly correlated with an unmeasured causal variant in one population, but not in another. An important first step for social scientists wishing to incorporate GWAS-derived genetic measurements into their own research designs is to evaluate cross-race replication of associations (Belsky et al. 2013c; Belsky et al. 2013d; Domingue et al. 2014). This is an especially important point because the SSGAC GWAS of educational attainment was conducted only in a European-descent sample.
The second consideration raised by the challenge of population stratification is that residual confounding may be present even within samples designed to be racially homogenous. Subtle, genome-wide allele frequency differences exist within even relatively narrowly defined European-descent populations (Nelis et al. 2008). Thus, at a minimum, statistical controls for population stratification are needed. The usual approach in the context of a GWAS is to estimate principal components from genome-wide SNP data and then use these as control variables in regression analysis (Price et al. 2006). Such principal components are only estimates, though. Therefore, an ideal control for population stratification is to conduct analyses that compare individuals who share the same ancestry, i.e. family-based genetic analysis (Laird et al. 2006).
Social stratification and environmental confounding of genome-wide analysis
To the extent that GWAS are able to uncover molecular roots of behavioral phenomena, there are important challenges to address in establishing the magnitudes of the effects of genetic influences. A primary challenge is that polygenic influences will be correlated among family members; any genetic predisposition to social attainment will be shared between parents and children. Thus a child’s genetic and social inheritances will be correlated (e.g. Boardman et al. 2012). Attempts to quantify genetic effects must therefore account for social differences between children. One method is to measure and control for features of children’s environments, such as characteristics of their families and neighborhoods. But in parallel to the limitations with using principal components to control population stratification, such methods depend on the quality and completeness of the measurements of children’s environments. An alternative is to conduct within-family analysis via sibling fixed effects. Full siblings in a family share—to a large degree—parents, housing, neighborhoods, schools, and so on. And as discussed above, their genetic differences are essentially randomly assigned. Siblings thus provide ideal controls for establishing magnitudes of genetic effects on social attainments.
Here, we test the effects of a polygenic score related to educational attainment derived from GWAS in a nationally representative sample of siblings. We then evaluate correlations between genetic and social determinants of educational attainment. We next estimate genetic effects after controlling for select measured features of children’s social environments. Finally, we submit genetic effect estimates to the acid test of a sibling comparison. We evaluate whether genetic effects on educational attainment operate in a similar manner within families and across children in the population. We also test whether genetic effects are accounted for by a common measure of academic aptitude, verbal intelligence.
Materials
Sample
Add Health is a nationally representative cohort drawn from a probability sample of 80 US high schools and 52 US middle schools, representative of US schools in 1994–95 with respect to region, urban setting, school size, school type and race or ethnic background (n=20,745, aged 12–20 years at Wave 1 in 1994–95). The Wave 3 (2001–2002) and Wave 4 (2008–2009) data collections included n=15,197 individuals (then aged 18–26 years, mean age 22.3 years) and n=15,701 individuals (then aged 24–32 years, mean age 28.9 years) respectively. The Add Health study includes an oversample of siblings (Harris et al. 2013). The sibling pairs sample was genotyped (via Oragene saliva collection) with the Illumina Human Omni Quad chip at Wave 4 of the study (see McQueen et al. 2014 for details). We use these genome-wide data to construct polygenic scores for study participants.
Patterns of linkage disequilibrium (LD) vary considerably across socially defined racial and ethnic groups and this is particularly evident when comparing the correlated genotype structures of Europeans to those of African ancestry (Price et al. 2010). Specifically, there is more genetic variation among those of African ancestry (Rosenberg et al. 2002; Li et al. 2008) that reduces LD (e.g., the correlation between neighboring SNPs) and thus creates problems for comparing the effects of SNPs across groups, a problem compounded when creating genome-wide polygenic scores. We therefore analyzed genetic associations separately for European and African Americans.
The 917 European Americans (EA) in our analytic sample are in 386 sibling pairs, twelve sibling trios, with an additional 109 singletons. The 677 African Americans (AA) are in 100 sibling pairs, four trios, with an additional 465 singletons. Table 1 shows characteristics of the European- and African-American Sibling Pairs Study participants who provided genetic data and constitute our analytic sample. The table also shows characteristics of the full Add Health European- and African-American samples for comparison. The European Americans in our analytic sample are largely comparable to the full population of EA respondents in the Add Health study. The African Americans in our sample are less educated, have less educated parents, and score lower on the verbal intelligence measure as compared to all AA Add Health participants. The bulk of our analysis is focused on the EA sample because the original Rietveld et al. (2013) GWAS was conducted on European-descent individuals. Replication of polygenic scores discovered in EA samples among AA samples may be compromised because LD differences between the groups lead to less precision among AA samples. Accordingly, large-scale GWAS of educational attainment in African Americans will be needed to better quantify genetic influences on attainment in this population. Nevertheless, in the interest of testing the extent to which findings made in European-descent individuals replicate in a different population, we conduct several analyses of the African American sample. Due to the small number of African American sibling pairs in the data, sibling analyses are conducted only in European Americans.
Table 1.
| | Add Health Cohort | EA subsample | Genotyped EA sibs | p-value (EA subsample vs. genotyped sibs) | AA subsample | Genotyped AA sibs | p-value (AA subsample vs. genotyped sibs) |
|---|---|---|---|---|---|---|---|
| Years of Education at W4 | 14.2 (2.2) | 14.3 (2.2) | 14.2 (2.2) | 0.11 | 14.0 (2.2) | 13.5 (2.2) | 0.00 |
| Verbal Intelligence | 100.6 (14.5) | 105.1 (12.0) | 103.9 (11.1) | 0.00 | 94.3 (14.2) | 91.6 (13.8) | 0.00 |
| Years of Maternal Education | 13.2 (2.6) | 13.5 (2.3) | 13.5 (2.1) | 0.66 | 13.3 (2.5) | 12.6 (2.2) | 0.00 |
| Neighborhood Disadvantage | 0.00 (2.2) | -0.65 (1.8) | -0.54 (1.7) | 0.06 | 1.5 (2.5) | 1.8 (2.5) | 0.01 |
| N | 15697 | 8630 | 917 | | 3456 | 677 | |

Note: cells report mean (SD).
Measures
Educational Attainment
We measured educational attainment as the highest degree completed by the time of interview at Wave 4 when respondents were asked “What is the highest level of education that you have achieved to date?”. Response options and their numeric values (in parentheses) were: 8th grade or less (8), some high school (10), high school graduate (12), some vocational/technical training (13), completed vocational/technical training (14), some college (14), completed college (16), some graduate school (17), completed a master’s degree (18), some graduate training beyond a master’s degree (19), completed a doctoral degree (20), some post baccalaureate professional education (18), and completed post baccalaureate professional education (19). EA respondents in our genetic sample completed 14.2 years of schooling on average (SD=2.2) by Wave 4. Of the sibling pairs, 64% varied in their educational attainment (mean difference=1.7 years). AA respondents in our genetic sample completed 13.5 years of schooling on average (SD=2.2).
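The recoding of response options into years of schooling can be written as a simple lookup. The sketch below uses the numeric values listed above; the lower-case string labels and the helper names are hypothetical conveniences, since the way responses are stored in the Add Health files is not described here.

```python
# Years-of-schooling values for the Wave 4 response options, as listed in the text.
# The string labels are illustrative keys, not the study's own variable codes.
EDUCATION_YEARS = {
    "8th grade or less": 8,
    "some high school": 10,
    "high school graduate": 12,
    "some vocational/technical training": 13,
    "completed vocational/technical training": 14,
    "some college": 14,
    "completed college": 16,
    "some graduate school": 17,
    "completed a master's degree": 18,
    "some graduate training beyond a master's degree": 19,
    "completed a doctoral degree": 20,
    "some post baccalaureate professional education": 18,
    "completed post baccalaureate professional education": 19,
}


def years_of_schooling(response: str) -> int:
    """Map a response label to its years-of-schooling value."""
    return EDUCATION_YEARS[response.strip().lower()]


def sibling_gap(response_a: str, response_b: str) -> int:
    """Absolute difference in years of schooling between two siblings."""
    return abs(years_of_schooling(response_a) - years_of_schooling(response_b))


gap = sibling_gap("Some college", "Completed a doctoral degree")
```

Because several categories share a value (for example, completed vocational training and some college are both coded 14), siblings with different responses can still show a gap of zero years.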
Parental Education
At the first wave of data collection, parents of respondents (over 90% of whom were female) responded to a question asking “How far did you go in school?” Potential responses and their numeric codes (in parentheses) included: 8th grade or less (8), more than 8th grade but did not graduate from high school (10), went to vocational school in place of high school (10), high school graduate (12), GED (12), vocational school after high school (13), attended college (14), graduated college (16), and training beyond college (18). EA parents of participants in our genetic sample reported completing 13.5 years of schooling on average (SD=2.1). AA parents completed 12.6 years of schooling on average (SD=2.2). Participants with more educated parents went on to complete more years of schooling (r=0.42 in the EA sample; r=0.32 in the AA sample; see Table 2).
Table 2.
| | EA respondents: correlation with W4 education | EA respondents: correlation with polygenic score | AA respondents: correlation with W4 education | AA respondents: correlation with polygenic score |
|---|---|---|---|---|
| Years of Education at W4 | | 0.18*** | | 0.11** |
| Verbal Intelligence | 0.36*** | 0.14*** | 0.36*** | 0.15*** |
| Years of Maternal Education | 0.42*** | 0.05 | 0.32*** | 0.12** |
| Neighborhood Disadvantage | -0.35*** | -0.13*** | -0.14*** | 0.00 |
| Polygenic Score | 0.18*** | | 0.11** | |
Note: *p<0.05; **p<0.01; ***p<0.001; the marker for p<0.1 does not appear in the table.
Neighborhood Disadvantage
The Add Health Study used respondents’ residential addresses at the time of Wave I data collection to link individuals with data describing the US Census block group where they lived. We used contextual variables from this dataset to measure the socioeconomic and sociodemographic characteristics of the neighborhoods in which Add Health respondents were living at the time of the baseline interview in adolescence (see Online Supplement). By design, measured neighborhood disadvantage was associated with educational attainment (r=-0.35, for EA respondents), although this association was weaker for AA respondents (r=-0.14).
Verbal Intelligence
Verbal intelligence was measured at Wave 1 (when Add Health participants were 12-20 years old) via a modified version of the Peabody Picture Vocabulary Test (Dunn & Dunn, 1981, 1997), a test of receptive vocabulary (M=103.9, SD=11.1 for EA; M=91.6, SD=13.8 for AA). Respondents who scored higher on the vocabulary test went on to complete more years of schooling (r=0.36 in both EA and AA samples).
Educational Attainment Polygenic Score
After quality controls (see Online Supplement), the genetic database included 1,886 individuals with valid data on 940,862 SNPs. Polygenic scores for educational attainment were calculated for each Sibling Pairs participant using the results of the SSGAC meta-analysis of GWAS of educational attainment (Rietveld et al. 2013). Briefly, SNPs in the Add Health Sibling Pairs genetic database were matched to SNPs with reported results in the GWAS. For each of these SNPs, a loading was calculated as the number of educational attainment-associated alleles multiplied by the effect size estimated in the original GWAS. Loadings were then summed across the SNP set to calculate the polygenic score. Additional details on the construction of this variable, as well as a sensitivity analysis, are included in the Online Supplement. We standardized the polygenic score to have M=0, SD=1 separately within the EA and AA samples. Scores were normally distributed (Figure S1). The mean sibling difference in polygenic scores in the EA sample was 0.8.
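Standardizing separately within each ancestry group means that a score of 0 refers to the group's own mean rather than a pooled mean. A minimal sketch of that step follows; the function name and toy values are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_groups(scores, groups):
    """Return z-scores computed separately within each group label."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    # Per-group mean and (population) standard deviation.
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    return [(s - stats[g][0]) / stats[g][1] for s, g in zip(scores, groups)]


# Hypothetical raw scores on very different scales in the two groups.
raw = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
ancestry = ["EA", "EA", "EA", "AA", "AA", "AA"]

z = standardize_within_groups(raw, ancestry)
z_ea, z_aa = z[:3], z[3:]
```

After this step, raw-scale differences between groups are removed by construction, so the standardized score is only interpretable for comparisons within a group.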
Analysis
Our analysis used three models to test associations between Add Health participants’ polygenic scores and their educational attainments. The youngest participants were aged 24 at the time of the most recent data collection and some may not have completed their education (Figure S1 contains a comparison of birth year and educational attainment). All models were adjusted for year of birth to account for any differences in educational attainment due to age at the time of follow-up. Models 1 and 2 are also adjusted for the first 10 principal components estimated from the genome-wide SNP data to account for any population stratification in our analytic sample (McQueen et al. 2014).
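The first model estimated the association between polygenic score and educational attainment in the pooled sample of sibling pairs. Model 1 takes the form

$$\text{Education}_i = \alpha + \beta_U\,\text{PGS}_i + \gamma\,\text{BirthYear}_i + \sum_{j=1}^{10} \delta_j\,\text{PC}_{ij} + \varepsilon_i \qquad \text{(Model 1)}$$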
The estimate of the genetic effect is denoted “βU,” where the subscript emphasizes the fact that the estimate comes from an approach in which the respondents are treated as unrelated individuals. The sibling structure of the data was accounted for by clustering standard errors within families (Zeileis, 2004), but this does not affect point estimates. Model 1 approximates the approach being used by many social scientists seeking to integrate genetic information into analyses of educational attainment (e.g., Ward et al. 2014; de Zeeuw et al. 2014).
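The point that clustering changes standard errors but not point estimates can be illustrated with a one-regressor sketch. This is not the implementation cited above; it is a simplified sandwich estimator without finite-sample corrections or covariates, and the function name and toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def slope_with_ses(x, y, cluster):
    """OLS slope of y on x (with intercept), plus conventional and cluster-robust SEs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in xd)
    beta = sum(v * (yi - y_bar) for v, yi in zip(xd, y)) / sxx
    alpha = y_bar - beta * x_bar
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    # Conventional SE: assumes independent, equal-variance errors.
    se_iid = sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # Cluster-robust SE: sum score contributions within each family before squaring.
    family_sums = defaultdict(float)
    for v, e, g in zip(xd, resid, cluster):
        family_sums[g] += v * e
    se_cluster = sqrt(sum(s * s for s in family_sums.values())) / sxx
    return beta, se_iid, se_cluster


# Hypothetical toy data: three families of two siblings each.
pgs = [-1.0, 0.5, 0.2, 1.3, -0.7, 0.1]
edu = [12.0, 14.0, 13.0, 16.0, 12.0, 14.0]
family = ["a", "a", "b", "b", "c", "c"]

beta_clustered, se_iid, se_cluster = slope_with_ses(pgs, edu, family)
# Treating every person as their own cluster leaves the slope unchanged.
beta_unclustered, _, se_singletons = slope_with_ses(pgs, edu, list(range(len(pgs))))
```

The slope is computed before the cluster labels are ever used, which is why only the uncertainty estimate depends on how siblings are grouped.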
A limitation of Model 1 is that βU may be biased away from zero due to confounders that covary with the genetic score across families (environmental stratification, as discussed in the introduction). For example, children share half of their DNA with each parent. Thus, a child’s polygenic score will be positively correlated with their parents’ scores. If the polygenic score is causally related to educational attainment, then children with high scores will tend to have better educated parents as compared to children with low scores. As a consequence, they are likely to grow up in quite different environments. βU may therefore capture not just a genetic effect, but also the effects of environmental advantages that are associated with the child’s genotype (i.e. parents with more education and the economic and social resources that come with it). The geocoded Add Health contextual data allow us to test this hypothesis by fitting a second model that statistically controls for differences in adolescents’ environments that may be correlated with their polygenic scores. Model 2 takes the form:

$$\text{Education}_i = \alpha + \beta_U'\,\text{PGS}_i + \nu\,\text{ParentEd}_i + \omega\,\text{NbhdDis}_i + \gamma\,\text{BirthYear}_i + \sum_{j=1}^{10} \delta_j\,\text{PC}_{ij} + \varepsilon_i \qquad \text{(Model 2)}$$
where ν and ω adjust for differences between adolescents’ parental and neighborhood characteristics. We also consider models where ν and ω are independently constrained to be 0 (Models 2A and 2B respectively).
A limitation of Model 2 is that it cannot account for unmeasured features of families and neighborhoods that are correlated with children’s genotypes. Therefore, we fit a third model that utilized the family-structure of the data to generate a sibling fixed effect estimate that fully controls for parental genotype and attainments and also for any neighborhood or environmental characteristics that may vary across families. Model 3 takes the form:
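$$\text{Education}_i = \alpha + \beta_W\,\text{PGS}_i + \gamma\,\text{BirthYear}_i + \sum_{k=2}^{K} \lambda_k\,I_k(i) + \varepsilon_i \qquad \text{(Model 3)}$$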
where I_k(i) is 1 if individual i is in family k and 0 otherwise (and one family, k=1, is excluded as the reference). This sibling comparison model leverages the genetic lotteries that occur within families. Estimates of βW represent the educational advantage enjoyed by the sibling who “wins” a hypothetical family’s genetic lottery. Because the estimate is based on comparing siblings, any parental, neighborhood, or school factors that are shared by siblings in a family are controlled by the design of the model.
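With a single regressor, including one indicator per family is numerically equivalent to subtracting each family's mean from both the score and the outcome and regressing the deviations. The sketch below shows that within-family calculation; it omits the birth-year adjustment for brevity, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def within_family_slope(score, education, family):
    """Sibling fixed-effects slope from family-demeaned score and outcome."""
    members_by_family = defaultdict(list)
    for i, fam in enumerate(family):
        members_by_family[fam].append(i)
    num = 0.0
    den = 0.0
    for members in members_by_family.values():
        if len(members) < 2:
            continue  # singletons contribute no within-family variation
        x_bar = sum(score[i] for i in members) / len(members)
        y_bar = sum(education[i] for i in members) / len(members)
        for i in members:
            num += (score[i] - x_bar) * (education[i] - y_bar)
            den += (score[i] - x_bar) ** 2
    return num / den


# Hypothetical data: each family has its own baseline (12 vs. 15 years),
# and within every family one extra SD of score adds 0.35 years.
score = [0.0, 1.0, -0.5, 0.5, 2.0]
education = [12.0, 12.35, 15.0, 15.35, 10.0]
family = ["A", "A", "B", "B", "C"]

beta_w = within_family_slope(score, education, family)
```

In this toy example the family baselines and the singleton in family C have no influence on the estimate, which is the sense in which shared family factors are controlled by design.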
Results
Did adolescents with higher polygenic scores complete more years of schooling?
Adolescents with higher polygenic scores went on to complete more years of schooling as of the most recent follow-up, when they were in their 20s and 30s. The genetic effect in our US sample of EA respondents was small in magnitude (r=0.18, see Table 2), consistent with published estimates from samples in the UK and the Netherlands (Ward et al. 2014; de Zeeuw et al. 2014). In years of educational attainment, this correlation is equivalent to a predicted increase of 0.41 years for a one SD increase in the polygenic score. In our European-descent sample, we detected little evidence that population stratification confounds genetic effects: estimated effect sizes for the polygenic score were similar when models were fitted without adjustment for population structure. Our base Model 1 estimated that each standard-deviation increase in an adolescent’s polygenic score forecast completion of over one-third of one year of additional schooling (βU=0.37, SE=0.08, p<0.001, see Table 3). In comparison, having a mother who graduated college was associated with an additional 1.7 years of schooling.
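The conversion from a correlation to years of schooling follows from the identity linking a bivariate correlation to a simple regression slope. As a rough check using the rounded values reported in this paper (r=0.18, SD of education 2.2 years, polygenic score standardized to SD=1):

```latex
\[
\hat{\beta} \;=\; r \cdot \frac{SD_{\text{education}}}{SD_{\text{score}}}
\;\approx\; 0.18 \times \frac{2.2}{1}
\;\approx\; 0.40 \ \text{years per SD of polygenic score}
\]
```

This is close to the 0.41 years quoted above; the small gap is consistent with rounding of r and the SD, although the exact inputs behind the published figure are an assumption here.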
Table 3.
| Model | EA: Estimate | EA: SE | EA: p-value | EA: N | EA: Model r2 | AA: Estimate | AA: SE | AA: p-value | AA: N | AA: Model r2 |
|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 0.37 | 0.08 | 7.5E-07 | 917 | 0.06 | 0.20 | 0.09 | 2.1E-02 | 677 | 0.02 |
| M2A | 0.30 | 0.07 | 3.1E-05 | 901 | 0.16 | 0.22 | 0.09 | 1.0E-02 | 671 | 0.04 |
| M2B | 0.29 | 0.08 | 1.4E-04 | 762 | 0.23 | 0.14 | 0.09 | 1.2E-01 | 556 | 0.12 |
| M2 | 0.26 | 0.07 | 5.4E-04 | 752 | 0.26 | 0.14 | 0.09 | 1.2E-01 | 555 | 0.12 |
| M3 | 0.35 | 0.11 | 2.3E-03 | 808 | 0.74 | | | | | |
We repeated this analysis in the AA Sibling Pairs. The genetic effect was smaller in African Americans, but remained statistically significant (r=0.11, p<0.01). In real terms, after controlling for population structure, Model 1 suggests that each standard-deviation increase in polygenic score forecast completion of about one-fifth of one year of additional schooling (βU=0.20, SE=0.09, p=0.02).
Were adolescents’ social environments related to their genetic inheritance?
We next tested the potential for environmental confounding of genetic associations. In the EA sample, we did not detect a statistically significant relationship between participants’ polygenic scores and their mothers’ educational attainments (r=0.05). In contrast, in the AA sample, participants with higher polygenic scores tended to have better educated mothers (r=0.12, p<0.01). This pattern of findings was reversed when we analyzed genetic associations with neighborhood disadvantage. EA participants with higher polygenic scores tended to live in more socially advantaged neighborhoods (r=-0.13, p<0.001) whereas AA participants’ polygenic scores were not related to the social circumstances of their neighborhoods. These findings show that genetic predisposition to educational attainment was socially stratified in both the EA and AA samples, although they suggest differences in the nature of that social stratification.
We next tested whether genetic associations with educational attainment could be accounted for by measured social environmental differences. We repeated our genetic analysis of educational attainment, this time adding statistical adjustments to account for maternal education and neighborhood disadvantage. For the EA respondents, adding controls for parental education and neighborhood disadvantage one at a time attenuated genetic effect estimates by roughly twenty percent (for a model controlling neighborhood disadvantage, Model 2A, βU′=0.30, SE=0.07, p<0.001; for a model controlling maternal education, Model 2B, βU′=0.29, SE=0.08, p<0.001). When both maternal education and neighborhood disadvantage were included in the model together, the genetic effect was reduced roughly by 30% (βU′=0.26, SE=0.07, p<0.001). We repeated this analysis in the AA sample. Because neighborhood disadvantage showed no distinguishable association with the polygenic score, we focus on Model 2B which adjusts the effect of the polygenic score for parental education. After including controls for maternal education in Model 2B, the estimated coefficient for the polygenic score was not statistically significant (βU′=0.14, SE=0.09, p=0.12).
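The attenuation percentages quoted above can be reproduced from the EA estimates in Table 3. The sketch below is simple arithmetic on the published coefficients; the helper name is hypothetical, and because the models were fitted on somewhat different numbers of respondents (N=917 for Model 1 versus 752 to 901 for the adjusted models), the comparison is approximate.

```python
def attenuation(base: float, adjusted: float) -> float:
    """Proportional reduction in a coefficient after adding controls."""
    return (base - adjusted) / base


# EA polygenic score coefficients from Table 3.
m1 = 0.37
adjusted_models = {"M2A": 0.30, "M2B": 0.29, "M2": 0.26}

# Percentage attenuation relative to Model 1, rounded to whole percents.
pct = {name: round(100 * attenuation(m1, est)) for name, est in adjusted_models.items()}
```

With these inputs, `pct` holds 19 for Model 2A, 22 for Model 2B, and 30 for Model 2, in line with the "roughly twenty percent" and "roughly 30%" figures in the text.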
Differences between adolescents’ polygenic scores also reflect genetic differences between their families. Correlations of polygenic scores between parents and children have been estimated as high as r=0.60 (Conley et al. 2015). In our sample, the correlation between EA siblings’ polygenic scores is r=0.53. Families with higher polygenic scores could achieve higher degrees and acquire the resources to move into more advantaged neighborhoods on the strength of their genetic endowments. As a result, interpretation of the attenuation of genetic effects from Model 1 to Model 2 is not straightforward. We therefore moved to the sibling comparison model, in which adolescents’ social environments are equal by design and genetic differences between individuals are randomly assigned by the “lottery” of meiosis.
Within a family, did the sibling with the higher polygenic score achieve higher educational attainment?
We expected that our Model 3 sibling fixed effect estimate would be similar to our Model 2 estimates. Surprisingly, the sibling-difference genetic effect was of nearly the same magnitude as the base model estimate ( , SE=0.11, p<0.01). This result suggests two things. First, genetic associations with educational attainment are non-spurious, i.e. not confounded by social environmental differences that correlate with adolescents’ polygenic scores. Second, sibling-based analyses may be subtly different from analysis of unrelated samples. We discuss the substance and implications of these differences below.
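The logic of the sibling comparison can be illustrated with a small simulation. The sketch below is not the estimation code used for Model 3; the data-generating values (a true effect of 0.3 and a family environment correlated with the family's shared genetic component) and all function names are assumptions chosen only for illustration. It contrasts a pooled slope across all individuals with a family fixed effect slope computed from within-family deviations.

```python
import random


def slope(x, y):
    """OLS slope of y on x (model with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_pairs(n_pairs, beta, seed=0):
    """Simulate sibling pairs; each sibling is a (score, schooling) tuple."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        fam_g = rng.gauss(0, 1)                  # genetic component shared by siblings
        fam_env = 0.5 * fam_g + rng.gauss(0, 1)  # family environment, correlated with fam_g
        sibs = []
        for _ in range(2):
            # Unit-variance score with an expected sibling correlation of 0.5.
            score = (fam_g + rng.gauss(0, 1)) / 2 ** 0.5
            schooling = beta * score + fam_env + rng.gauss(0, 1)
            sibs.append((score, schooling))
        pairs.append(sibs)
    return pairs


def pooled_estimate(pairs):
    """Slope across all individuals, ignoring family membership."""
    xs = [s for sibs in pairs for s, _ in sibs]
    ys = [e for sibs in pairs for _, e in sibs]
    return slope(xs, ys)


def fixed_effect_estimate(pairs):
    """Slope after subtracting each family's mean from both variables."""
    xs, ys = [], []
    for sibs in pairs:
        mx = sum(s for s, _ in sibs) / len(sibs)
        my = sum(e for _, e in sibs) / len(sibs)
        xs.extend(s - mx for s, _ in sibs)
        ys.extend(e - my for _, e in sibs)
    return slope(xs, ys)


pairs = simulate_pairs(n_pairs=5000, beta=0.3)
pooled = pooled_estimate(pairs)
within = fixed_effect_estimate(pairs)
print(f"pooled: {pooled:.2f}, sibling fixed effect: {within:.2f}")
```

In this toy setup the pooled slope is inflated by construction, because the simulated family environment is correlated with the score, while the within-family slope stays close to the assumed true effect. The simulation does not attempt to reproduce the pattern reported above, in which the sibling estimate was close to the unadjusted estimate.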
Do genetic effects operate via influence on verbal intelligence?
Published analyses suggest that genetic influence on educational attainment may be mediated by higher intellectual functioning, i.e. children with higher polygenic scores complete more schooling because they are cognitively more able (e.g., Rietveld et al. 2013). We found evidence to support this hypothesis in our models analyzing unrelated adolescents. Our analysis here focused on the subset of 877 EA respondents with data on the modified Peabody Picture Vocabulary Test of verbal intelligence in Add Health Wave 1. Adolescents with higher polygenic scores did better on the verbal intelligence test (r=0.14, p<0.001, see Table 2). In turn, adolescents with higher verbal scores went on to complete more schooling (r=0.36, p<0.001). When we repeated the Model 1 analysis of the association between an adolescent’s polygenic score and their educational attainment, this time adding the verbal intelligence score as a covariate, the genetic effect was attenuated ( , p<0.001 compared to , p<0.001). This result suggests that about 1/3 of the genetic association with educational attainment is attributable to genetic influence on the development of verbal intelligence. However, the statistical test for the difference in coefficients fails to reach conventional significance levels.
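The "share mediated" reasoning above compares the polygenic score coefficient before and after adding the verbal score as a covariate. The following sketch shows that difference-in-coefficients calculation on simulated data. The path values (0.5, 0.2, 0.2) are assumptions picked so that the mediated share is about one third; they are not estimates from Add Health, and the helper names are hypothetical.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n


def slope_simple(x, y):
    """Slope of y on x alone (total effect)."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(x, m, y):
    """Slope of y on x holding the covariate m constant (two-predictor OLS)."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    return (smm * sxy - sxm * smy) / (sxx * smm - sxm ** 2)


rng = random.Random(1)
n = 20000
score = [rng.gauss(0, 1) for _ in range(n)]
# Assumed path from score to verbal ability.
verbal = [0.5 * s + rng.gauss(0, 1) for s in score]
# Assumed direct path (0.2) plus path through verbal ability (0.2).
schooling = [0.2 * s + 0.2 * v + rng.gauss(0, 1) for s, v in zip(score, verbal)]

total = slope_simple(score, schooling)              # analogue of the unadjusted model
direct = slope_adjusted(score, verbal, schooling)   # analogue of adding the covariate
share_mediated = 1 - direct / total
print(f"total={total:.2f}, direct={direct:.2f}, share mediated={share_mediated:.2f}")
```

The same two regressions run on within-family deviations would give the sibling-comparison version of the calculation.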
We next subjected the mediation hypothesis to the rigorous test of the sibling comparison model. There was a relatively weak association between the difference in sibling polygenic scores and the difference in sibling verbal intelligence (r=0.07, p=0.18). However, the difference in sibling verbal intelligence was correlated with differences in attainment (r=0.22, p<0.001). When we repeated our analysis of the within-sibling association between polygenic score and educational attainment, this time adding Peabody score as a covariate, the coefficient was only modestly (and insignificantly) attenuated ( , SE=0.12, p<0.01 compared to , SE=0.12, p<0.01). This result suggests that very little of the genetic effect on sibling differences in educational attainment is attributable to sibling differences in verbal intelligence.
We discuss several plausible explanations for these divergent results based on between- and within-family analyses. First, it could be that intelligence score differences between siblings contain relatively less information than score differences between unrelated individuals. This could occur if there was less true score variance within sibships. If this were true and variance due to random measurement error remained constant across the two types of comparisons, then the reliability of the sibling difference score would be reduced; that is, the ratio of signal to noise would be lower for the sibling analysis. It could also be the case that sibling analysis captures non-random measurement error, i.e. mean-regressive error which may occur if siblings deemphasized their verbal differences (consciously or unconsciously) when tested. This would not change the reliability of the family average (thus the point estimate for the between-family analysis would be unaffected); however, it would lead to attenuation bias in the within-family analysis. A final potential explanation is that the mechanisms linking genes to educational attainment could be different for unrelated individuals compared to siblings. Twin studies suggest that traits other than intelligence (e.g., personality) may mediate genetic influences on educational attainment (Krapohl et al. 2014) and these traits may play a larger role in producing differences between siblings. We return to this divergence in results in the discussion.
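The first explanation can be made concrete with the classical test theory formula for the reliability of a difference score. The sketch below assumes that both scores have the same variance and the same reliability and that their measurement errors are uncorrelated. The input values (a test reliability of 0.8 and a sibling correlation of 0.5) are hypothetical and are not taken from Add Health.

```python
def difference_score_reliability(reliability, correlation):
    """Reliability of X1 - X2 for two scores with equal variance and reliability.

    reliability: reliability of each individual score (0 to 1).
    correlation: observed correlation between the two scores.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie in [0, 1]")
    if not -1 <= correlation < 1:
        raise ValueError("correlation must lie in [-1, 1)")
    return (reliability - correlation) / (1 - correlation)


# Unrelated individuals: scores are uncorrelated, so nothing is lost.
unrelated = difference_score_reliability(0.8, 0.0)
# Siblings: shared true-score variance cancels in the difference, noise does not.
siblings = difference_score_reliability(0.8, 0.5)
print(unrelated, siblings)
```

Under these assumptions, the more strongly siblings' scores are correlated, the smaller the share of the difference score that reflects true differences rather than measurement error.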
Sensitivity Analyses
The strength of the sibling analyses is that factors which do not vary across siblings are eliminated as potential confounders. One clear difference between siblings, which previous studies have related to attainment, is their birth order (Kantarevic & Mechoulan, 2006; Conley & Glauber, 2006; Black et al. 2005; cf. Hauser & Sewell, 1985). If birth order were also related to a person’s polygenic score, it would represent a plausible confounder. We therefore tested this association. A sibling’s birth order was not related to his/her polygenic score (r=0.01). When we include a dummy variable for birth order in Model 3, we estimate the genetic effect at Wave 4 to be 0.30 (p=0.01), essentially unchanged from the original estimate of 0.29.
Previous research suggests the possibility that genetic influences on a child’s educational attainment may be modified by features of the child’s environment, such as their family’s SES (Turkheimer et al. 2003). A previous test of this hypothesis in older cohorts using a similar polygenic score found no evidence that genetic effects varied by family SES (Conley et al. 2015). As an exploratory analysis, we evaluated the hypothesis in our data by testing for an interaction between the polygenic score and maternal educational attainment in a modified version of Model 2B. The main effect of the polygenic score was similar to what was reported in Table 3, ( , p<0.001). We estimated an interaction between maternal education and the polygenic score of -0.06 (SE=0.03, p=0.04). The negative coefficient suggests that a child’s polygenic score is less predictive of his/her own educational attainment when his/her mother holds a higher degree. Notably, this finding is opposite the prediction that would be made based on the original Turkheimer observation, in which genetic factors explained more variance in higher SES children. We view this as a preliminary result which will need to be verified in the full Add Health cohort once it has been genotyped. A comparable model estimated in the AA sample yielded a main effect nearly identical to Model 2B ( , p=0.12) and an interaction of 0.01 (SE=0.04, p=0.79).
Given the limited sample size, statistical power is a concern. Based on published associations between the polygenic score and educational attainment, we expected an effect size of at least r=0.1. We have better than 80% power to detect such an effect in the EA sample. Power for the sibling-comparison analyses is somewhat lower (about 60%, additional details available in the Online Supplement). Therefore, our results should be interpreted as contributing to the evidence base on the nature of genetic associations with educational attainment, but needing replication in additional samples.
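A rough version of this kind of power calculation can be sketched with the Fisher z approximation for a correlation. This is a generic textbook approximation, not the procedure described in the Online Supplement. The sample size of 877 used below is the verbal-test subset mentioned earlier and is plugged in purely for illustration; the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def power_for_correlation(r, n, alpha=0.05):
    """Approximate two-sided power to detect a correlation r with n observations."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = atanh(r) * sqrt(n - 3)        # Fisher z effect in standard-error units
    # Probability of landing beyond either critical value under the alternative.
    return z.cdf(shift - critical) + z.cdf(-shift - critical)


power_877 = power_for_correlation(0.1, 877)
print(round(power_877, 2))
```

With these inputs the approximation gives a value somewhat above 0.8. The lower power of the sibling-comparison analyses depends on details given in the supplement and is not reproduced here.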
Discussion
We investigated a recently published genetic algorithm to predict educational attainment using genome-wide genetic data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) sibling pair files (McQueen et al. 2014). We found that a polygenic score produced with this algorithm was predictive of educational outcomes in our sample of US adolescents born during the 1970s and 1980s and followed up through the first decade of the 21st Century. Add Health respondents with higher polygenic scores completed more years of schooling as compared to peers with lower scores. Each standard deviation difference in polygenic score predicted roughly one-third of one year’s difference in completed schooling by the end of follow-up (i.e., a moderate effect size). This estimate may be a lower bound of how much variation in educational attainment can be predicted with a polygenic score. Twin studies estimate that approximately 40% of the variation in educational attainment is attributable to genetic factors (e.g., Branigan et al. 2013). The SSGAC estimates that the variance in educational attainment explained by their polygenic score will grow as their GWAS sample size increases; Rietveld et al. (2013) estimate that 15% of the variance in attainment might be predicted with a polygenic score derived from a GWAS on one million respondents.
Our sibling comparison analysis extends prior work (Rietveld et al. 2014; Conley et al. 2015) to a contemporary, nationally representative US sample. We further show, for the first time, clear evidence for socio-geographic patterning of polygenic scores in the contemporary United States. It is not entirely surprising that the genetic similarities of parents and children are reflected in their respective educational attainments (Krapohl & Plomin, 2015; Conley et al. 2015). But our data also show that patterning of polygenic scores extends to the neighborhoods in which children live. Neighborhoods can be important facilitators of or impediments to children’s social attainments (e.g. Chetty et al. 2015a, b). Future research should investigate neighborhoods and other macro-social factors as potential pathways through which familial genetic endowments influence children’s outcomes. Ultimately, the substance of genetic differences between neighborhoods implied by our analysis remains uncertain. Our observations here represent only a first illustration of how novel genome-science methods can begin to integrate biological science with research on social attainment and mobility.
A further contribution of our study is to identify an important difference in estimates of genetic effects obtained from between-family analysis and within-family analysis. In our between-family analysis, genetic effects were substantially attenuated when we included controls for family and neighborhood social advantage. This result suggests that, for educational attainment, social advantages are correlated with genetic advantages. This complicates the causal models social scientists use when they study socioeconomic gradients in education, particularly in light of evidence that childhood social advantage and educational attainment share genetic roots (Krapohl & Plomin, 2015). In any event, the within-family analysis does not have this problem due to the shared sibling environment. In the within-family analysis that also controlled for socioeconomic differences between individuals, genetic effects were nearly identical to unadjusted estimates from between-family analysis. We also see discrepancies in the mediation analyses: Verbal intelligence appears to mediate about 1/3 of the genetic association with educational attainment in analyses of unrelated individuals, but is a weaker mediator of genetic effects identified in the within-family analysis. So why do the two approaches yield such different results?
The explanation we favor is that families constitute heavily controlled laboratories for testing genetic effects. Out in the “wild” of between-family analyses, variance in educational attainments is mostly accounted for by structural features of the social environments children grow up in: their parents’ education, the kinds of neighborhoods in which they live, and the schools they attend. These powerful social forces are silenced within families. This is generally regarded as a strength of within-family analysis. But in our case it may require a subtle reinterpretation of results. Because so much is similar for siblings, small differences in their genetic makeup have the opportunity to stand out. We know that medical treatments sometimes show large effects in carefully controlled trials, but prove less effective when implemented in field settings where there is more variation in treatment context (Rothwell, 2005). In the same way, a genetic difference measured by polygenic score could have larger consequences for a pair of siblings, who share most other determinants of educational outcomes, than for a pair of unrelated individuals. Some have referred to this pattern as a “social distinction” process in which particular social environments, specifically those in which background social noise is minimized, enable us to distinguish the signals from small genetic associations (Boardman, Daw, and Freese, 2013). It may also be the case that family environments function to magnify differences between siblings. Parents respond to observed differences in their children by making different investments in them (Conley, 2004), potentially magnifying a genetic difference of modest consequence. Siblings, seeking to differentiate themselves from one another, may form identities that track them toward more or less educationally enriching activities and associations, again, with the consequence of magnifying a genetic difference of initially modest consequence.
We acknowledge limitations. First, our data are right-censored. Some Add Health participants may not have completed their educational careers by the time of the most recent Wave 4 interview. Continued follow-up of the cohort is needed. Second, our data are left-censored. Add Health began when participants were well along in their adolescent educational careers. We were therefore unable to observe pre-schooling characteristics and, given both left and right censoring, unable to observe all possible educational transitions. Third, cognitive assessment in Add Health at baseline was limited to the modified Peabody Picture Vocabulary test. It is possible that the genetic influence measured in the polygenic score affects other facets of general cognitive ability not measured in this test of verbal intelligence. Finally, the Add Health Study used school-based cluster sampling, providing a highly attractive setting for investigating the role of schools in modifying/contextualizing genetic influence on educational outcomes (e.g., through use of school-level fixed effects). The sibling pairs sample is not large enough to take advantage of this design, and therefore schools are omitted from our analysis. We do analyze characteristics of children’s families and neighborhoods. Analysis of schools will be a priority when the genetic data on the full Add Health sample become available.
Conclusion
Twin studies have been the traditional approach for understanding the connection between genes and outcomes such as education, but they do not tell us about the biological underpinnings of these connections. Although we must emphasize that this age of integrative genetic research is only just entering its second decade, study of molecular genetic data has begun to offer evidence providing information about why certain types of genetic variation lead to variation in mental ability. At this point we attempt to answer a key question: What is the relevance of such genetics research to education research? At the present time, the predictive power of the polygenic score is clearly too weak to have “clinical” value and we are skeptical that even increased predictive power would make the score useful as the basis for intervention. But we do think this line of inquiry offers opportunities for study of (1) how the genetic predisposition towards attainment comes to fruition and (2) how environments, often in the role of policies, combine with biology to influence outcomes. We discuss these two opportunities in turn.
There are numerous reasons that certain individuals experience educational success. Some individuals have more raw ability in the various cognitive domains required to continue in education. Some individuals have psychological characteristics that contribute while others have social skills that lead to increased educational attainment. Genes are linked to all of these personal attributes. Here, we have tested one natural pathway (verbal intelligence) through which the genetic predisposition towards educational attainment may act, but we are limited in our ability to test other pathways. The full Add Health sample is currently being genotyped. When this process is complete, we hope to test additional pathways. Alongside the study of these mediating pathways, incorporating genetics into education research also provides an additional point of leverage for studying the translational pathways through which increased educational attainment may translate into more distal life course outcomes such as improved health and labor force participation.
One important pathway through which a genotype may translate into increased attainment involves the possibility that one’s genotype evokes a particular environment (i.e., evocative gene-environment correlation). This perspective suggests that genotype is associated with observable traits that may, for example, affect a counselor’s decision about class scheduling, a teacher’s perception of student ability or effort, or even the likelihood that a particular student will befriend certain people (Boardman et al. 2012). All of these factors may then have influences on the years of educational attainment. If this is the case, it does not change the fact that genotype is related to educational outcomes but it suggests that the cause has more to do with the environment in which one resides than the production of specific proteins that directly enhance one’s ability to succeed in school.
A second area in which genetic inquiry is relevant to educational research is an increased understanding of how environments shape outcomes. As an example of how this might work, consider smoking. There is evidence to suggest that genes became a more important determinant of smoking behavior after the 1964 publication of the Surgeon General’s warning (Boardman 2010, Boardman et al. 2011). If those who still smoke have a different biological relationship with tobacco (as indicated by genetics) than smokers from previous generations, then this suggests that modern cessation efforts might need a new focus as compared to previous efforts aimed at those with a weaker genetic inclination towards smoking.
One might similarly consider the composition of those who enter college in 2015 compared to the composition of those who entered college in the mid-1960s. It is increasingly normative for nearly all students in the US to consider college attendance, with 68% of high school graduates attending college in 2013 compared to only 45% in 1965 (US Bureau of Labor Statistics 2015). As such, it is possible that the relative contribution of genetics to educational attainment may have changed. This increased access to education may increase or decrease the relative contribution of genes to educational differences in the population. For example, 100 years ago, a remarkably select group of adults were able to attend and graduate from college. Thus, social factors related to family resources and institutional connections placed great limits on who was able to obtain higher education. As such, small genetic associations may not have differentiated between individuals in this context. As social controls were removed, it is possible that the selection into college was not random but initiated primarily among those with higher polygenic scores, which would enable genetic variation in the population to contribute to phenotypic variation (e.g., education).
Of course, there are almost assuredly scenarios that would decrease the relevance of the polygenic score. The introduction of compulsory schooling, universal preschool, and the GI bill are all interventions that have possibly changed the association between the polygenic score and attainment. Whether the genetic association with attainment is increasing or decreasing, the larger point is that a consideration of genetics can help us understand the role of environment, including policy interventions. In particular, a consideration of genetics may allow for understanding of response heterogeneity and, more broadly, could help us to understand why policies may (or may not) be generating the desired policy objective. Although such research is just beginning, Fletcher (2012) provides a useful example in which he demonstrates that the smoking behavior of certain individuals may be less sensitive to changes in the tax rate as a function of genotype.
In closing, this paper adds to the ample evidence to suggest that children’s educational attainments are influenced by their genes (e.g., Branigan et al. 2013). However, it is becoming increasingly clear that just as biology plays a role in shaping social outcomes such as education, the social environments in which humans are placed play a role in shaping their biology. For example, recent research suggests that chronic poverty plays a role in shaping brain structure (Noble et al. 2015). Children’s educational environments are among the most important social exposures that modern humans experience. Thus, we believe that just as genetics can offer new tools to education researchers, education researchers have important expertise to bring to genetic studies. Specifically, there is a need to identify which aspects of the educational environment matter, when in development they matter most, and whether there are specific children who may be more or less sensitive to these environments.
Supplementary Material
Acknowledgments
This research uses data from Add Health, a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. Opinions reflect those of the authors and do not necessarily reflect those of the granting agencies. We also received support from NIH/NICHD R01 HD060726.
Footnotes
SNPs are single-letter changes in the human DNA sequence that are present in >1% of the population. An individual’s genotype for a SNP includes two “alleles,” one inherited from each parent. Most SNPs involve the substitution of one letter of the A-C-T-G alphabet of human DNA for another. So a SNP might be described as A/G if some individuals in the population carried a ‘G’ where most others carried an ‘A.’ An individual could carry one A and one G, or two As or two Gs. In some cases, a change in allele results in a functional change in the genome. For example, in the case of the SNP rs6265 in the BDNF gene, the substitution of an ‘A’ allele for the more common ‘G’ allele results in an amino acid substitution from valine to methionine, in turn resulting in altered production of the BDNF peptide (Egan et al., 2003). However, most SNPs do not have a known biological function and the biological significance of associations detected in GWAS is usually uncertain.
References
- Belsky DW, Moffitt TE, Houts R, Bennett GG, Biddle AK, Blumenthal JA, Caspi A, et al. Polygenic risk, rapid childhood growth, and the development of obesity: evidence from a 4-decade longitudinal study. Archives of pediatrics & adolescent medicine. 2012;166(6):515–521. doi: 10.1001/archpediatrics.2012.131.
- Belsky DW, Moffitt TE, Baker TB, Biddle AK, Evans JP, Harrington H, Caspi A, et al. Polygenic risk and the developmental progression to heavy, persistent smoking and nicotine dependence: evidence from a 4-decade longitudinal study. JAMA psychiatry. 2013a;70(5):534–542. doi: 10.1001/jamapsychiatry.2013.736.
- Belsky DW, Sears MR, Hancox RJ, Harrington H, Houts R, Moffitt TE, Caspi A, et al. Polygenic risk and the development and course of asthma: an analysis of data from a four-decade longitudinal study. The Lancet Respiratory Medicine. 2013b;1(6):453–461. doi: 10.1016/S2213-2600(13)70101-2.
- Belsky DW, Moffitt TE, Sugden K, Williams B, Houts R, McCarthy J, Caspi A. Development and evaluation of a genetic risk score for obesity. Biodemography and social biology. 2013c;59(1):85–100. doi: 10.1080/19485565.2013.774628.
- Belsky DW, Moffitt TE, Caspi A. Genetics in population health science: strategies and opportunities. American journal of public health. 2013d;103(S1):S73–S83. doi: 10.2105/AJPH.2012.301139.
- Belsky DW, Israel S. Integrating Genetics and Social Science: Genetic Risk Scores. Biodemography and social biology. 2014;60(2):137–155. doi: 10.1080/19485565.2014.946591.
- Black SE, Devereux PJ, Salvanes KG. The more the merrier? The effect of family size and birth order on children’s education. The Quarterly Journal of Economics. 2005:669–700.
- Boardman JD, Blalock CL, Pampel FC. Trends in the genetic influences on smoking. Journal of health and social behavior. 2010;51(1):108–123. doi: 10.1177/0022146509361195.
- Boardman JD, Blalock CL, Pampel FC, Hatemi PK, Heath AC, Eaves LJ. Population composition, public policy, and the genetics of smoking. Demography. 2011;48(4):1517–1533. doi: 10.1007/s13524-011-0057-9.
- Boardman JD, Domingue BW, Fletcher JM. How social and genetic factors predict friendship networks. Proceedings of the National Academy of Sciences. 2012;109(S1):17377–17381. doi: 10.1073/pnas.1208975109.
- Boardman JD, Daw J, Freese J. Defining the environment in gene–environment research: lessons from social epidemiology. American journal of public health. 2013;103(S1):S64–S72. doi: 10.2105/AJPH.2013.301355.
- Branigan AR, McCallum KJ, Freese J. Variation in the heritability of educational attainment: An international meta-analysis. Social forces. 2013;92(1):109–140.
- Cardon LR, Palmer LJ. Population stratification and spurious allelic association. The Lancet. 2003;361(9357):598–604. doi: 10.1016/S0140-6736(03)12520-2.
- Chetty R, Hendren N. The Effects of Neighborhoods on Intergenerational Mobility: Childhood Exposure Effects and County Level Estimates. 2015a. Available from http://www.equality-of-opportunity.org/index.php/papers.
- Chetty R, Hendren N, Katz L. The Effects of Exposure to Better Neighborhoods on Children: New Evidence from the Moving to Opportunity Experiment. 2015b. doi: 10.1257/aer.20150572. Available from http://www.equality-of-opportunity.org/index.php/papers.
- Conley D. The pecking order: Which siblings succeed and why. Pantheon; 2004.
- Conley D. Genotyping a new, national household panel study. Journal of Economic and Social Measurement. 2015. doi: 10.3233/JEM-150408.
- Conley D, Glauber R. Parental educational investment and children’s academic risk estimates of the impact of sibship size and birth order from exogenous variation in fertility. Journal of human resources. 2006;41(4):722–737.
- Conley D, Domingue B, Cesarini D, Dawes C, Boardman J. Is the effect of parental education on offspring biased or moderated by genotype? Sociological Science. 2015. doi: 10.15195/v2.a6.
- Devlin B, editor. Intelligence, genes, and success: Scientists respond to The Bell Curve. Springer Science & Business Media; 1997.
- de Zeeuw EL, van Beijsterveldt CE, Glasner TJ, Bartels M, Ehli EA, Davies GE, Boomsma DI, et al. Polygenic scores associated with educational attainment in adults predict educational achievement and ADHD symptoms in children. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2014;165(6):510–520. doi: 10.1002/ajmg.b.32254.
- Domingue BW, Belsky DW, Harris KM, Smolen A, McQueen MB, Boardman JD. Polygenic risk predicts obesity in both white and black young adults. PloS one. 2014;9(7):e101596. doi: 10.1371/journal.pone.0101596.
- Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS genetics. 2013;9(3):e1003348. doi: 10.1371/journal.pgen.1003348.
- Dunn LM, Dunn LM. Manual for the Peabody picture vocabulary test-revised. Circle Pines, MN: American Guidance Service; 1981.
- Dunn LM, Dunn LM. PPVT-III: Peabody picture vocabulary test. Circle Pines, MN: American Guidance Service; 1997.
- Egan MF, Kojima M, Callicott JH, Goldberg TE, Kolachana BS, Bertolino A, Weinberger DR, et al. The BDNF val66met polymorphism affects activity-dependent secretion of BDNF and human memory and hippocampal function. Cell. 2003;112(2):257–269. doi: 10.1016/s0092-8674(03)00035-7.
- Fletcher JM. Why have tobacco control policies stalled? Using genetic moderation to examine policy impacts. PLoS ONE. 2012;7(12):e50576. doi: 10.1371/journal.pone.0050576.
- Harris KM, Halpern CT, Haberstick BC, Smolen A. The National Longitudinal Study of Adolescent Health (Add Health) Sibling Pairs Data. Twin Research and Human Genetics. 2013;16(01):391–398. doi: 10.1017/thg.2012.137.
- Hauser RM, Sewell WH. Birth order and educational attainment in full sibships. American Educational Research Journal. 1985;22(1):1–23.
- Kantarevic J, Mechoulan S. Birth order, educational attainment, and earnings an investigation using the PSID. Journal of human resources. 2006;41(4):755–777.
- Krapohl E, Rimfeld K, Shakeshaft NG, Trzaskowski M, McMillan A, Pingault JB, Plomin R, et al. The high heritability of educational achievement reflects many genetically influenced traits, not just intelligence. Proceedings of the National Academy of Sciences. 2014;111(42):15273–15278. doi: 10.1073/pnas.1408777111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Krapohl E, Plomin R. Genetic link between family socioeconomic status and children’s educational achievement estimated from genome-wide SNPs. Molecular Psychiatry. 2015 doi: 10.1038/mp.2015.2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Laird NM, Lange C. Family-based designs in the age of large-scale gene-association studies. Nature Reviews Genetics. 2006;7(5):385–394. doi: 10.1038/nrg1839. [DOI] [PubMed] [Google Scholar]
- Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Myers RM, et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008;319(5866):1100–1104. doi: 10.1126/science.1153717. [DOI] [PubMed] [Google Scholar]
- McQueen MB, Boardman JD, Domingue BW, Smolen A, Tabor J, Killeya-Jones L, Harris KM, et al. The National Longitudinal Study of Adolescent to Adult Health (Add Health) Sibling Pairs Genome-Wide Data. Behavior genetics. 2014:1–12. doi: 10.1007/s10519-014-9692-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Neisser U, Boodoo G, Bouchard TJ, Jr, Boykin AW, Brody N, Ceci SJ, Urbina S, et al. Intelligence: knowns and unknowns. American psychologist. 1996;51(2):77. [Google Scholar]
- Nelis M, Esko T, Mägi R, Zimprich F, Zimprich A, Toncheva D, Metspalu A, et al. Genetic structure of Europeans: a view from the North–East. PloS one. 2009;4(5):e5472. doi: 10.1371/journal.pone.0005472. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Noble KG, Houston SM, Brito NH, Bartsch H, Kan E, Kuperman JM, Sowell ER, et al. Family income, parental education and brain structure in children and adolescents. Nature Neuroscience. 2015 doi: 10.1038/nn.3983. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Plomin R, Haworth CM, Davis OS. Common disorders are quantitative traits. Nature Reviews Genetics. 2009;10(12):872–878. doi: 10.1038/nrg2670.
- Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006;38(8):904–909. doi: 10.1038/ng1847.
- Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nature Reviews Genetics. 2010;11(7):459–463. doi: 10.1038/nrg2813.
- Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, Fraser G, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460(7256):748–752. doi: 10.1038/nature08185.
- Reich DE, Cargill M, Bolk S, Ireland J, Sabeti PC, Richter DJ, Lander ES, et al. Linkage disequilibrium in the human genome. Nature. 2001;411(6834):199–204. doi: 10.1038/35075590.
- Rietveld CA, Medland SE, Derringer J, Yang J, Esko T, Martin NW, McMahon G, et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science. 2013;340(6139):1467–1471. doi: 10.1126/science.1235488.
- Rietveld CA, Conley D, Eriksson N, Esko T, Medland SE, Vinkhuyzen AA, Teumer A, et al. Replicability and Robustness of Genome-Wide-Association Studies for Behavioral Traits. Psychological Science. 2014;25(11):1975–1986. doi: 10.1177/0956797614545132.
- Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW. Genetic structure of human populations. Science. 2002;298(5602):2381–2385. doi: 10.1126/science.1078311.
- Rothwell PM. External validity of randomised controlled trials: “To whom do the results of this trial apply?”. The Lancet. 2005;365(9453):82–93. doi: 10.1016/S0140-6736(04)17670-8.
- Tabery J. RA Fisher, Lancelot Hogben, and the origin(s) of genotype–environment interaction. Journal of the History of Biology. 2008;41(4):717–761. doi: 10.1007/s10739-008-9155-y.
- Turkheimer E, Haley A, Waldron M, D’Onofrio B, Gottesman II. Socioeconomic status modifies heritability of IQ in young children. Psychological Science. 2003;14(6):623–628. doi: 10.1046/j.0956-7976.2003.psci_1475.x.
- US Bureau of Labor Statistics. College enrollment and work activity of 2014 high school graduates. 2015. Retrieved from http://www.bls.gov/news.release/hsgec.nr0.htm
- Ward ME, McMahon G, St Pourcain B, Evans DM, Rietveld CA, Benjamin DJ, et al. Social Science Genetic Association Consortium. Genetic variation associated with differential educational attainment in adults has anticipated associations with school performance in children. PLoS ONE. 2014;9(7):e100248. doi: 10.1371/journal.pone.0100248.
- Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, Parkinson H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research. 2014;42(D1):D1001–D1006. doi: 10.1093/nar/gkt1229.
- Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, Lim U, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics. 2014;46(11):1173–1186. doi: 10.1038/ng.3097.
- Wray NR, Goddard ME, Visscher PM. Prediction of individual genetic risk to disease from genome-wide association studies. Genome Research. 2007;17(10):1520–1528. doi: 10.1101/gr.6665407.
- Zeileis A. Econometric Computing with HC and HAC Covariance Matrix Estimators. Journal of Statistical Software. 2004;11(10):1–17. URL http://www.jstatsoft.org/v11/i10/