ABSTRACT
We conduct a large-scale genetic association analysis of educational attainment in a sample of ~1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of ~0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
INTRODUCTION
Educational attainment (EA) is moderately heritable1 and an important correlate of many social, economic, and health outcomes2,3. Because of its relationship with many health outcomes, measures of EA are available in most medical data sets. Partly for this reason, EA was the focus of the first large-scale genome-wide association study (GWAS) of a social-science phenotype4 and has continued to serve as a “model phenotype” for behavioral traits (analogous to height for medical traits). Genetic associations with EA identified via GWAS have been used in follow-up work examining biological5 and behavioral mechanisms6,7 and genetic overlap with health outcomes8,9.
The largest (N = 293,723) GWAS of EA to date identified 74 approximately independent SNPs at genome-wide significance (hereafter, lead SNPs) and reported that a 10-million-SNP linear predictor (hereafter, polygenic score) had an out-of-sample predictive power of 3.2%10. Here, we expand the sample size to over a million individuals (N = 1,131,881). We identify 1,271 lead SNPs. In a subsample (N = 694,894), we also conduct genome-wide association analyses of variants on the X chromosome, identifying ten lead SNPs.
The dramatic increase in our GWAS sample size enables us to conduct a number of informative additional analyses. For example, we show that the lead SNPs have heterogeneous effects, and we perform within-family association analyses that probe the robustness of our results. Our biological annotation analyses, which focus on the results from the autosomal GWAS, reinforce the main findings from earlier GWAS in smaller samples, such as the role of many of the prioritized genes in brain development. However, the newly identified SNPs also lead to several new findings. For example, they strongly implicate genes involved in almost all aspects of neuron-to-neuron communication.
We found that a polygenic score derived from our results explains around 11% of EA variance. We also report additional GWAS of three phenotypes that are highly genetically correlated with EA: cognitive (test) performance (N = 257,841), self-reported math ability (N = 564,698), and hardest math class completed (N = 430,445). We identify 225, 618, and 365 lead SNPs, respectively. When we jointly analyze all four phenotypes using a recently developed method11, we find that the explanatory power of polygenic scores based on the resulting summary statistics increases to 12% for EA and 7–10% for cognitive performance.
RESULTS
Primary GWAS of EduYears
In our primary GWAS, we study EA, which is measured as number of years of schooling completed (EduYears). All association analyses were performed at the cohort level in samples restricted to European-descent individuals. We applied a uniform set of quality-control procedures to all cohort-level results. Our final sample-size-weighted meta-analysis produced association statistics for ~10 million SNPs from phase 3 of the 1000 Genomes Project12.
The quantile-quantile plot of the meta-analysis (Supplementary Figure 1) exhibits substantial inflation (λGC = 2.04). According to our LD Score regression13 estimates, only a small share (~5%) of this inflation is attributable to bias (Supplementary Figure 2, Supplementary Table 1). We used the estimated LD Score intercept (1.11) to generate inflation-adjusted test statistics.
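The adjustment described above can be illustrated with a minimal sketch. It assumes the conventional form of the correction, in which each χ2 statistic is divided by the LD Score intercept; the function names and the toy statistics are hypothetical, and only the intercept value of 1.11 is taken from the text.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def lambda_gc(chi2_stats):
    """Genomic-control inflation factor: observed median over expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

def adjust_for_intercept(chi2_stats, intercept):
    """Deflate chi-square statistics by the LD Score regression intercept
    (assumed form of the adjustment; equivalent to inflating standard
    errors by sqrt(intercept))."""
    return [x / intercept for x in chi2_stats]

# Hypothetical toy statistics, not data from the study.
toy_chi2 = [0.2, 0.9, 1.5, 3.1, 7.4]
adjusted = adjust_for_intercept(toy_chi2, intercept=1.11)
print(lambda_gc(toy_chi2), lambda_gc(adjusted))
```

Because the intercept captures only the share of inflation attributed to bias, the adjusted statistics remain inflated relative to the null whenever the phenotype is polygenic.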
Fig. 1 shows the Manhattan plot of the resulting P values. We identified 1,271 approximately independent (pairwise r2 < 0.1) SNPs at genome-wide significance (P < 5×10−8), 995 of which remain if we adopt the stricter significance threshold (P < 1×10−8) proposed in a recent study (Supplementary Table 2; see Online Methods for a description of the clumping algorithm). The Supplementary Note and Supplementary Table 3 report the results from a conditional-joint analysis14.
We used a Bayesian statistical framework to calculate winner’s-curse-adjusted posterior distributions of the effect sizes of the lead SNPs (Online Methods). We found that the median effect size of the lead SNPs corresponds to 1.7 weeks of schooling per allele; at the 5th and 95th percentiles, 1.1 and 2.6 weeks, respectively. We also examined the replicability of the 162 single-SNP associations (P < 5×10−8) reported from the combined discovery and replication sample (N = 405,073) of the largest previous study10. In the subsample of our data (N = 726,808) that did not contribute to the earlier study’s analyses, the SNPs replicate at a rate that closely matches theoretical projections derived from our Bayesian framework (Supplementary Figure 3).
Within-Family Association Analyses
We conducted within-family association analyses in four sibling cohorts (22,135 sibling pairs) and compared the resulting estimates to those from a meta-analysis that excluded the siblings (N = 1,070,751). The latter association statistics were adjusted for stratification bias using the LD Score intercept. Fig. 2 shows the observed sign concordance for three sets of approximately independent SNPs, selected using P value cutoffs of 5×10−3, 5×10−5, and 5×10−8. The concordance is substantially greater than expected by chance but weaker than predicted by our Bayesian framework, even after we extend the framework to account for inflation in GWAS coefficients due to assortative mating. In a second analysis based on all SNPs, we estimate that within-family effect sizes are roughly 40% smaller than GWAS effect sizes and that our assortative-mating adjustment explains at most one third of this deflation. (For comparison, when we apply the same method to height, we find that the assortative-mating adjustment fully explains the deflation of the within-family effects.)
The Supplementary Note contains analyses and discussion of the possible causes of the remaining deflation we observe for EduYears. While the evidence is not conclusive, it suggests that the GWAS effect-size estimates may be biased upward by correlation between EA and a rearing environment conducive to EA. Consistent with this hypothesis, a recent paper15 reports that a polygenic score for EduYears based entirely on parents’ non-transmitted alleles is approximately 30% as predictive as a polygenic score based on transmitted alleles. (For height, the analogous estimate is only 6%.) The non-transmitted alleles affect parents’ EA but can only influence the child’s EA indirectly. If greater parental EA positively influences the rearing environment, then GWAS that control imperfectly for rearing environment will yield inflated estimates. The LD Score regression intercept does not capture this bias because the bias scales with the LD Score in the same way as a direct genetic effect.
Heterogeneous Effect Sizes
Because educational institutions vary across places and time, the effects of specific SNPs may vary across environments. Consistent with such heterogeneity, for the lead SNPs, we reject the joint null hypothesis of homogeneous cohort-level effects (P value = 9.7×10−12; Supplementary Figure 4). Moreover, we found that the inverse-variance-weighted mean genetic correlation of EduYears across pairs of cohorts in our sample is 0.72 (SE = 0.14), which is statistically distinguishable from one (P value = 0.03).
Our finding of an imperfect genetic correlation replicates earlier results from smaller samples16,17. This imperfect genetic correlation is an important factor to consider in power calculations and study design. In the Supplementary Note, we report exploratory analyses that aim to identify specific sources of measurement heterogeneity or gene-environment interaction that may explain the imperfect genetic correlation. Unfortunately, the estimates are noisy, and the only strong finding was that SNP heritability was smaller in cohorts whose measure of EduYears is derived from questions with fewer response categories.
X-Chromosome GWAS Results
We supplemented our autosomal analyses with association analyses of SNPs on the X chromosome. We first conducted separate association analyses of males (N = 152,608) and females (N = 176,750) in the UK Biobank. We found a male-female genetic correlation close to unity. We also found nearly identical SNP heritability estimates for men and women, which is consistent with partial dosage compensation (i.e., on average the per-allele effect sizes are smaller in women) and implies that any contribution of common variants on the X chromosome to sex differences in the normal-range variance of cognitive phenotypes18 is quantitatively negligible.
Next, we conducted a large (N = 694,894) meta-analysis of summary statistics from mixed-sex analyses (Supplementary Figure 5). We identified 10 lead SNPs and estimated a SNP heritability due to the X chromosome of ~0.3% (Supplementary Table 4). This heritability is lower than that expected for an autosome of similar length (Supplementary Figure 6, Supplementary Table 5). We cannot distinguish whether the lower heritability is due to smaller per-allele effect sizes for SNPs on the X chromosome or to the combination of haploidy in males and (partial) X-inactivation in females.
Biological Annotation
For biological annotation, we focus on the results from the autosomal meta-analysis of EduYears. Across an extensive set of analyses (see Supplementary Figure 7 for a flowchart), all major conclusions from the largest previous GWAS of EduYears10 continue to hold but are statistically stronger. For example, we applied the bioinformatics tool DEPICT19 and found that, relative to other genes, genes near our lead SNPs are overwhelmingly enriched for expression in the central nervous system (Fig. 3A, Supplementary Table 6).
There are also many novel findings associated with the large number of genes newly implicated by our analyses: At the standard false discovery rate (FDR) threshold of 5%, the bioinformatics tool DEPICT19 prioritizes 1,838 genes (Supplementary Table 7), a tenfold increase relative to the DEPICT results from an earlier GWAS of EduYears10. In what follows, we distinguish between the 1,703 “newly prioritized” genes and the 135 “previously prioritized” genes.
The Supplementary Note contains an extensive analysis of many of the newly prioritized genes and their brain-related functions. Here we highlight two especially noteworthy regularities. First, whereas previously prioritized genes exhibited especially high expression in the brain prenatally, newly prioritized genes show elevated levels of expression both pre- and postnatally (Fig. 3B). Many of the newly prioritized genes encode proteins that carry out online brain functions such as neurotransmitter secretion, the activation of ion channels and metabotropic pathways, and synaptic plasticity (Supplementary Figure 8).
Second, even though glial cells are at least as numerous as neurons in the human brain20, gene sets related to glial cells (astrocytes, myelination, and positive regulation of gliogenesis) are absent from those identified as positively enriched (Supplementary Table 8). Furthermore, using stratified LD Score regression21, we estimated relatively weak enrichment of genes highly expressed in glial cells (Supplementary Table 9): 1.08-fold for astrocytes (P = 0.07) and 1.09-fold for oligodendrocytes (P = 0.06) versus 1.33-fold for neurons (P = 2.89×10−11). Because myelination increases the speed with which signals are transmitted along axons22, the absence of enrichment of genes related to glial cells may weigh against the hypothesis that differences across people in cognition are driven by differences in transmission speed.
The results also raise a number of possible targets for functional studies. Among SNPs within 50 kb of lead SNPs, 127 of them are identified by the fine-mapping tool CAVIARBF23 as likely causal SNPs (posterior probability > 0.9) (Supplementary Table 10). Eight of these are non-synonymous, and one of these (rs61734410) is located in CACNA1H (Supplementary Figure 9), which encodes the pore-forming subunit of a voltage-gated calcium channel that has been implicated in the trafficking of NMDA-type glutamate receptors24.
Polygenic Prediction
Polygenic predictors derived from earlier GWAS of EduYears have proven to be a valuable tool for researchers, especially in the social sciences6,7. We constructed polygenic scores for European-ancestry individuals in two prediction cohorts: the National Longitudinal Study of Adolescent to Adult Health (Add Health, N = 4,775), a representative sample of American adolescents; and the Health and Retirement Study (HRS, N = 8,609), a representative sample of Americans over age 50. We measure prediction accuracy by the “incremental R2”: the gain in coefficient of determination (R2) when the score is added as a covariate to a regression of the phenotype on a set of baseline controls (sex, birth year, their interaction, and 10 principal components of the genetic relatedness matrix).
All scores are based on results from a meta-analysis that excluded the prediction cohorts. Our first four scores were constructed from sets of LD-pruned SNPs associated with EduYears at various P-value thresholds: 5×10−8, 5×10−5, 5×10−3, and 1 (i.e., all SNPs). In both cohorts, the predictive power is greater for scores constructed with less stringent thresholds (Supplementary Figure 10). The sample-size-weighted mean incremental R2 increases from 3.2% at P < 5×10−8 to 9.4% at P ≤ 1. Our fifth score was generated from HapMap3 SNPs using the software LDpred25. Rather than dropping SNPs in LD with each other, LDpred is a Bayesian method that weights each SNP by (an approximation to) the posterior mean of its conditional effect, given other SNPs. This score was the most predictive in both cohorts, with an incremental R2 of 12.7% in Add Health and 10.6% in HRS (and a sample-size-weighted average of 11.4%).
To put the predictive power of this score in perspective, Fig. 4A shows the mean college completion rate by polygenic-score quintile. The difference between the bottom and top quintiles in Add Health and HRS is, respectively, 45 and 36 percentage points (see Supplementary Figure 11 for analogous analyses of high school completion and grade retention). Fig. 4B compares the incremental R2 of the score to that of standard demographic variables. The score is a better predictor of EduYears than household income and a worse predictor than mother’s or father’s education. Controlling for all the demographic variables jointly, the score’s incremental R2 is 4.6% (Supplementary Figure 12).
We also found that the score has substantial predictive power for a variety of other cognitive phenotypes measured in the prediction cohorts (Supplementary Figure 13). For example, it explains 9.2% of the variance in overall grade point average in Add Health.
Because the discovery sample used to construct the score consisted of individuals of European ancestry, we would not expect the predictive power of our score to be as high in other ancestry groups7,26,27. Indeed, when our score is used to predict EduYears in a sample of African-Americans from the HRS (N = 1,519), the score only has an incremental R2 of 1.6%, implying an attenuation of 85%. The Supplementary Note shows that this amount of attenuation is typical of what has been reported in previous studies.
Related Cognitive Phenotypes and MTAG
We performed genome-wide association analyses of three complementary phenotypes: cognitive performance (N = 257,841), self-reported math ability (Math Ability, N = 564,698), and highest math class taken (Highest Math, N = 430,445). For cognitive performance, we meta-analyzed published results from the COGENT Consortium28 with results based on new analyses of the UKB, as did Davies et al.29. For the two math phenotypes, we conducted new genome-wide association analyses in samples of research participants from 23andMe. We identified 225, 618, and 365 genome-wide significant SNPs for Cognitive Performance, Math Ability, and Highest Math, respectively (Supplementary Figures 14–16, Supplementary Tables 11–13).
We conducted a multi-trait analysis of EduYears and our supplementary phenotypes to improve polygenic prediction accuracy. These phenotypes are well suited to joint analysis because their pairwise genetic correlations are high, in all cases exceeding 0.5 (Supplementary Table 14). We applied a recently developed method, Multi-Trait Analysis of GWAS, or MTAG11, to summary statistics for the four phenotypes from meta-analyses that exclude the prediction cohorts. For all four phenotypes, MTAG increases the number of lead SNPs identified at genome-wide significance (Supplementary Figures 17–20, Supplementary Table 15). Fig. 4C shows the incremental R2 for the polygenic scores based on GWAS and MTAG association statistics (but otherwise constructed using identical methods) when the target phenotype is either EduYears (left panel) or Cognitive Performance (right panel).
In Add Health, where our measure of cognitive performance is the respondent’s score on a test of verbal cognition, the incremental R2s of the GWAS and MTAG scores are 5.1% and 6.9%, respectively. To obtain a better measure of prediction accuracy for cognitive performance, we used an additional validation cohort, the Wisconsin Longitudinal Study (WLS), which administered a cognitive test with excellent retest reliability and psychometric properties similar to those used in our discovery GWAS of cognitive performance. In the WLS, the MTAG score predicts 9.7% of the variance in Cognitive Performance, a substantial improvement over the 7.0% predicted by the GWAS score and approximately double the prediction accuracy reported in three recent GWASs of cognitive performance29–31.
DISCUSSION
The results of this study illustrate what the advocates of GWAS anticipated: as sample sizes get large, thousands of lead SNPs will be identified, and polygenic predictors will attain non-trivial levels of predictive power. However, theoretical projections that failed to consider heterogeneity of effect sizes were optimistic4. Our and others’ findings16,17 suggest that imperfect genetic correlation across cohorts will be the norm for phenotypes that, like EA, are environmentally contingent.
For research at the intersection of genetics and neuroscience, the set of 1,271 lead SNPs we identify is a treasure trove for future analyses. For research in social science and epidemiology, the polygenic scores we construct—which explain 11–13% and 7–10% of the variance of EA and cognitive performance, respectively—will prove useful across at least three types of applications.
First, by examining associations between the scores and high-quality measures of endophenotypes, researchers may be able to disentangle the mechanisms by which genetic factors affect EA and cognitive phenotypes. Such studies are already being conducted with polygenic scores from earlier GWAS of EA6,7, but they can now be well powered in samples as small as those from laboratory experiments. For example, if our polygenic score explains 10% of the variance in an endophenotype, then its effect can be detected at a 5% significance threshold with 80% power in a sample of only 75 individuals. Second, the polygenic scores can be used as control variables in randomized controlled trials (RCTs) of interventions that aim to improve academic and cognitive outcomes. Given the scores’ current levels of predictive power, such use can now generate non-trivial gains in statistical power for the RCT. For example, if adding the polygenic score to the set of control variables in an RCT increases their joint explanatory power from 10% to 20%, then the gain in power from including the polygenic score is equivalent to increasing the RCT’s sample size by 11% (for such calculations, see the SOM of Rietveld et al.4). Third, the polygenic scores can be used as a tool for exploring gene-environment interactions32, which are known to be important for genetic effects on educational attainment and cognitive performance1,33.
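The sample-size claim in the first application can be checked approximately. The sketch below uses the Fisher z approximation for the power of a test of a correlation, which is an assumption of this illustration and not necessarily the calculation behind the figure of 75; the function name `n_for_r2` is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_r2(r2, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a predictor that explains a
    share r2 of the variance, using the Fisher z transformation of the
    correlation sqrt(r2) and a two-sided test at level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    fisher_z = math.atanh(math.sqrt(r2))
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)

print(n_for_r2(0.10))
```

Under this approximation the required sample size for R2 = 10% lands in the mid-to-high 70s, in line with the figure quoted above; exact calculations based on the noncentral t or F distribution give slightly different values.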
Our results also highlight two caveats to the use of the polygenic scores in research. First, our within-family analyses suggest that GWAS estimates may overstate the causal effect sizes: if EA-increasing genotypes are associated with parental EA-increasing genotypes, which are in turn associated with rearing environments that promote EA, then failure to control for rearing environment will bias GWAS estimates. If this hypothesis is correct, some of the predictive power of the polygenic score reflects environmental amplification of the genetic effects. Without controls for this bias, it is therefore inappropriate to interpret the polygenic score for EA as a measure of genetic endowment.
Second, we found that our score for EA has much lower predictive power in an African-American sample than in a European-ancestry sample, and we anticipate that the score would also have reduced predictive power in other non-European-ancestry samples. Therefore, until polygenic scores are available that have as much predictive power in other ancestry groups, the score will be most useful in research that is focused on European-ancestry samples.
ONLINE METHODS
This article is accompanied by a Supplementary Note with further details.
Genome-wide association study meta-analyses.
Our primary analysis extends the (combined discovery and replication) sample of a previous genome-wide association study (GWAS) of educational attainment10 from N = 405,073 to N = 1,131,881 individuals. We performed a sample-size-weighted meta-analysis of 71 quality-controlled cohort-level results files using the METAL software35. The meta-analysis combines 59 cohort-level results files from the previous study with 12 new results files: 8 from cohorts that were not included in the previous study10 and 4 from cohorts that updated their results in larger samples.
All cohort-level analyses were restricted to European-ancestry individuals who passed the cohort’s quality control and whose EduYears was measured at an age of at least 30. The EduYears phenotype was constructed by mapping each major educational qualification that can be identified from the cohort’s survey measure to an International Standard Classification of Education (ISCED) category and imputing a years-of-education equivalent for each ISCED category. Details on cohort-level phenotype measures, genotyping, imputation, association analyses, and quality-control filters are described in Supplementary Tables 16–19.
We used the estimated intercept from LD Score regression13 to inflation-adjust the test statistics. We then used the clumping algorithm described below to determine the number of approximately independent SNPs identified at any given P value threshold.
Clumping algorithm.
Our clumping algorithm is iterative and has been used previously10. We describe it here for the case of identifying lead SNPs among the set of SNPs reaching P < 5×10−8; the algorithm is the same when determining sets of approximately independent SNPs for other P value thresholds.
First, the SNP with the smallest P value in the pooled meta-analysis results is identified as the lead SNP of the first clump. Next, all SNPs in LD with the lead SNP are also assigned to this clump. SNPs are defined to be in LD with each other if they are on the same chromosome and the squared correlation of their genotypes is r2 > 0.1. To determine the second lead SNP and second clump, the first clump is removed, and the same steps are applied to the remaining SNPs. The process is repeated until no SNPs with P value below 5×10−8 remain. Each locus is defined by a lead SNP and the SNPs assigned to its clump. Hence, each lead SNP maps to exactly one locus, and each locus maps to exactly one lead SNP.
We perform the clumping in Plink36. Note that we measure the LD between every pair of SNPs on each chromosome without regard to the physical distance between them. Therefore, if two SNPs on the same chromosome have pairwise r2 above 0.1, then they cannot both be lead SNPs. On the other hand, it is possible for two SNPs in close physical proximity both to be lead SNPs, provided their pairwise r2 is below 0.1. The Supplementary Note reports analyses of the sensitivity of the number of lead SNPs and loci to alternative definitions and to the choice of the reference file used to estimate LD.
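The iterative procedure can be sketched in a few lines. This is a simplified illustration, not the Plink implementation used in the study: the function `clump`, the SNP identifiers, and the toy P values and r2 values are all hypothetical, and LD is supplied as a lookup table of pairwise r2 rather than computed from a reference panel.

```python
def clump(pvals, chrom, r2, p_thresh=5e-8, r2_thresh=0.1):
    """Greedy clumping. Returns {lead SNP: set of SNPs assigned to its clump}.

    pvals: SNP -> P value; chrom: SNP -> chromosome;
    r2: frozenset({snp1, snp2}) -> squared genotype correlation
        (pairs missing from the table are treated as r2 = 0).
    """
    remaining = set(pvals)
    loci = {}
    while True:
        # Candidate leads: remaining SNPs below the P value threshold.
        hits = sorted(s for s in remaining if pvals[s] < p_thresh)
        if not hits:
            break
        lead = min(hits, key=lambda s: pvals[s])
        # Same-chromosome SNPs in LD with the lead, regardless of distance.
        members = {
            s for s in remaining
            if s != lead
            and chrom[s] == chrom[lead]
            and r2.get(frozenset((lead, s)), 0.0) > r2_thresh
        }
        loci[lead] = members
        remaining -= members | {lead}
    return loci

# Hypothetical toy input.
pvals = {"rsA": 1e-20, "rsB": 1e-10, "rsC": 1e-9, "rsD": 1e-3, "rsE": 1e-12}
chrom = {"rsA": 1, "rsB": 1, "rsC": 1, "rsD": 1, "rsE": 2}
r2 = {
    frozenset(("rsA", "rsB")): 0.8,
    frozenset(("rsA", "rsC")): 0.05,
    frozenset(("rsC", "rsD")): 0.5,
}
loci = clump(pvals, chrom, r2)
print(loci)
```

In the toy input, rsB is absorbed into the clump of rsA, whereas rsC is on the same chromosome as rsA but remains a separate lead SNP because their pairwise r2 is below 0.1.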
Conditional and joint multiple-SNP analysis (COJO).
Given a P value threshold specified by the user, COJO14 is a method that identifies a set of SNPs such that, in a multivariate regression of the phenotype on all the SNPs in the set, every SNP has a P value below the threshold. COJO uses the meta-analysis summary statistics together with LD estimates from a reference sample. Our COJO analysis was conducted using a reference sample of approximately unrelated individuals of European ancestry from UK Biobank. We specified the P value threshold 5 × 10−8. The analyses were restricted to SNPs satisfying recommended quality-control filters. The Supplementary Note contains additional details.
Bayesian framework for calculating winner’s-curse-adjusted posterior effect-size distributions.
We assume that the marginal effect size of each SNP is drawn from the following mixture distribution:
$$\beta_j \sim \begin{cases} \mathcal{N}(0, \tau^2) & \text{with probability } \pi, \\ 0 & \text{otherwise,} \end{cases}$$
where τ2 is the effect-size variance for non-null SNPs and π is the fraction of non-null SNPs in our data. We estimate the parameters τ2 and π by maximum likelihood. Given their values, the posterior distribution of the effect size of SNP j can be calculated from Bayes’ Rule. Relative to the GWAS effect estimate, the mean of the posterior distribution is shrunken toward zero (because zero is the mean of the prior distribution) and is not biased by the winner’s curse. Further details and a derivation of the likelihood function used in the maximum-likelihood estimation are provided on p. 59 in the Supplementary Note of a previous SSGAC study37.
To calculate the 5th, 50th, and 95th percentile of the effect-size distribution of our lead SNPs, we simulated effect sizes from each lead SNP’s posterior distribution and identified the 5th, 50th, and 95th percentiles of the complete set of simulated effect sizes.
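A minimal sketch of the posterior calculation follows. It assumes a point-normal prior (the effect is zero with probability 1 − π and drawn from N(0, τ2) otherwise) and a normal sampling distribution for the GWAS estimate with known standard error. The function names and all numerical inputs are hypothetical, and the study's own likelihood derivation is given in the cited Supplementary Note.

```python
import math
import random

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(beta_hat, se, pi, tau2):
    """Posterior of a SNP effect under the point-normal prior.
    Returns (P(non-null | data), mean and variance of the non-null component)."""
    s2 = se ** 2
    like_nonnull = normal_pdf(beta_hat, tau2 + s2)  # marginal if effect is non-null
    like_null = normal_pdf(beta_hat, s2)            # marginal if effect is zero
    p_nonnull = pi * like_nonnull / (pi * like_nonnull + (1 - pi) * like_null)
    shrink = tau2 / (tau2 + s2)
    return p_nonnull, shrink * beta_hat, shrink * s2

def posterior_mean(beta_hat, se, pi, tau2):
    p, mean, _ = posterior(beta_hat, se, pi, tau2)
    return p * mean

def simulate(beta_hat, se, pi, tau2, draws, rng):
    """Draw effect sizes from the posterior mixture."""
    p, mean, var = posterior(beta_hat, se, pi, tau2)
    return [rng.gauss(mean, math.sqrt(var)) if rng.random() < p else 0.0
            for _ in range(draws)]

# Hypothetical inputs, not estimates from the study.
rng = random.Random(1)
pm = posterior_mean(beta_hat=0.02, se=0.003, pi=0.3, tau2=1e-4)
sims = sorted(simulate(0.02, 0.003, 0.3, 1e-4, draws=10000, rng=rng))
pct5, pct50, pct95 = sims[500], sims[5000], sims[9500]
print(pm, pct5, pct50, pct95)
```

Pooling such draws across all lead SNPs and reading off the percentiles of the pooled set corresponds to the percentile calculation described above.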
As described below, we also use this Bayesian framework in our GWAS and MTAG replication analyses and in our within-family analyses.
Replication of lead SNPs from Okbay et al.’s combined-stage analysis.
We conducted a replication analysis of the 162 lead SNPs identified at genome-wide significance in Okbay et al.’s10 pooled (discovery and replication) meta-analysis (N = 405,073). Of the 162 SNPs, 158 pass quality-control filters in our updated meta-analysis. To examine their out-of-sample replicability, we calculated Z-statistics from the subsample of our data (N = 726,808) that was not included in Okbay et al. Let the Z-statistics of association from, respectively, Okbay et al., the new data, and our final EA3 meta-analysis, be denoted by Z1, Z2 and Z. Since our meta-analysis used sample-size weighting35, Z2 is implicitly defined by:
$$Z = \sqrt{\frac{N_1}{N_1 + N_2}}\, Z_1 + \sqrt{\frac{N_2}{N_1 + N_2}}\, Z_2,$$
where SNP subscripts have been dropped and the N’s are the corresponding sample sizes. Because this formula holds when Z1 and Z2 are independent, the implicitly-defined Z2 is interpreted as the additional information contained in the new data.
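For illustration, the implied Z-statistic can be backed out numerically. The sketch assumes the standard sample-size-weighted combination, in which the pooled Z equals the sum of each sample's Z weighted by the square root of its sample-size share; the function names and the Z values are hypothetical, whereas the two sample sizes are those reported in the text.

```python
import math

def pooled_z(z1, z2, n1, n2):
    """Sample-size-weighted combination of two independent Z-statistics."""
    n = n1 + n2
    return math.sqrt(n1 / n) * z1 + math.sqrt(n2 / n) * z2

def implied_z2(z, z1, n1, n2):
    """Solve the weighting formula for the Z-statistic of the new data."""
    n = n1 + n2
    return (math.sqrt(n) * z - math.sqrt(n1) * z1) / math.sqrt(n2)

N1, N2 = 405_073, 726_808        # previous study and new data
z1, z = 5.6, 9.1                 # hypothetical Z-statistics for one SNP
z2 = implied_z2(z, z1, N1, N2)
print(z2)
```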
Of the 158 SNPs, we found that 154 have matching signs in the new data (for the remaining four SNPs, the estimated effect is never statistically distinguishable from zero at P < 0.10). Of the 154 SNPs with matching signs, 143 are significant at P < 0.01, 119 are significant at P < 10−5, and 97 are significant at P < 5×10−8. The replication results are shown graphically in Supplementary Figure 3. To help interpret these results, we used the Bayesian framework described above to calculate the expected replication record under the hypothesis that all 158 SNPs are true associations. The posterior distributions of the SNPs’ effect sizes are calculated using the parameters (π and τ2) estimated from Okbay et al.’s summary statistics.
Within-family analyses.
We conducted within-family association analyses on a sample of 22,135 sibling pairs from STR-Twingene, STR-SALTY, UKB, and WLS. For each cohort, we standardized EduYears within the cohort and then residualized this variable using the same controls as in the GWAS. We then regressed the sibling difference in the residuals on the sibling difference in genotype. We restricted analyses to SNPs with minor allele frequency above 5% in each of the sibling cohorts and meta-analyzed the cohort-level results using inverse-variance weighting.
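The two steps described above (a sibling-difference regression within each cohort, then inverse-variance-weighted pooling) can be sketched as follows. The example assumes a regression through the origin, which is a simplification chosen for this illustration because sibling order is arbitrary; the function names and toy values are hypothetical.

```python
import math

def sibling_difference_slope(pheno_pairs, geno_pairs):
    """Regress the sibling difference in (residualized) phenotype on the
    sibling difference in genotype; returns (slope, standard error)."""
    dy = [a - b for a, b in pheno_pairs]
    dg = [a - b for a, b in geno_pairs]
    sgg = sum(g * g for g in dg)
    beta = sum(g * y for g, y in zip(dg, dy)) / sgg
    resid = [y - beta * g for g, y in zip(dg, dy)]
    sigma2 = sum(e * e for e in resid) / (len(dy) - 1)
    return beta, math.sqrt(sigma2 / sgg)

def inverse_variance_meta(estimates):
    """Pool (beta, se) pairs from several cohorts with inverse-variance weights."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))

# Hypothetical toy data: four sibling pairs for one SNP.
geno_pairs = [(2, 1), (0, 1), (1, 1), (2, 0)]
pheno_pairs = [(0.5, 0.0), (-0.5, 0.0), (0.1, 0.0), (1.0, 0.0)]
beta, se = sibling_difference_slope(pheno_pairs, geno_pairs)
pooled_beta, pooled_se = inverse_variance_meta([(0.5, 0.1), (0.3, 0.1)])
print(beta, se, pooled_beta, pooled_se)
```

Differencing within sibling pairs removes any factor shared by the siblings, which is why these estimates are robust to between-family stratification.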
We followed Okbay et al.37 to compare the signs of the within-family estimates to the signs of the estimates from a GWAS meta-analysis that we re-ran after removing the sibling samples (N = 1,070,751). We benchmarked our observed fraction of concordant signs against the three theoretical benchmarks shown in Fig. 2. The theoretical benchmarks are calculated using posterior distributions for the GWAS effect sizes obtained from our Bayesian statistical framework. Treating each benchmark as a null hypothesis, we conducted one-sided binomial tests where the alternative hypothesis is that the observed sign concordance falls short of the benchmark. We conducted this test for sets of approximately independent SNPs selected at the P value thresholds 5×10−8, 5×10−5, and 5×10−3 (Supplementary Table 20 and Fig. 2).
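The one-sided binomial test against a benchmark concordance rate can be illustrated as below. The counts and the benchmark are hypothetical, and the probability mass is computed in log space so that the sum remains stable for sets of several thousand SNPs.

```python
from math import exp, lgamma, log

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the one-sided P value when the
    alternative is that concordance falls short of the benchmark p."""
    return min(1.0, sum(exp(log_binom_pmf(i, n, p)) for i in range(k + 1)))

# Hypothetical example: 600 of 1,000 SNPs concordant, benchmark of 65%.
p_value = binom_lower_tail(600, 1000, 0.65)
print(p_value)
```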
We also performed regression-based comparisons of the within-family estimates and the GWAS estimates (Supplementary Table 21 and Supplementary Figure 21). Further details, including a derivation of our assortative-mating adjustment, can be found in the Supplementary Note.
Joint F-test of heterogeneity.
When the SNPs are considered individually, for all but one of the 1,271 lead SNPs, we fail to reject a null hypothesis of homogenous effects across cohorts at the Bonferroni-adjusted P value threshold of 0.05/1,271. We generated an omnibus test statistic for heterogeneity by summing the Cochran Q-statistics for heterogeneity across all 1,271 lead SNPs38. Because the software used for meta-analysis does not report Q-statistics, we inferred these values based on the reported heterogeneity P values. To do so, we treated each lead SNP as if it were available for each of the 71 cohorts in the meta-analysis, which implies that the Q-statistic for each lead SNP has a χ2 distribution with 70 degrees of freedom. The sum of these Q-statistics is therefore (approximately) χ2-distributed with 70 × 1,271 = 88,970 degrees of freedom. This gave us an omnibus Q-statistic of 91,830, with corresponding P value equal to 9.68 × 10−12.
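The order of magnitude of the omnibus P value can be reproduced from the reported Q-statistic and degrees of freedom. The sketch below uses the Wilson-Hilferty normal approximation to the χ2 distribution, which is an assumption of this illustration (the study's exact computation is not described); the function name is hypothetical.

```python
import math

def chi2_upper_tail_wh(q, df):
    """Upper-tail probability of a chi-square statistic using the
    Wilson-Hilferty cube-root normal approximation (accurate for large df)."""
    c = 2.0 / (9.0 * df)
    z = ((q / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

df = 70 * 1271                  # 70 degrees of freedom per lead SNP
p_omnibus = chi2_upper_tail_wh(91_830, df)
print(df, p_omnibus)
```

The approximation yields a P value on the order of 10−11, in line with the value reported above; small differences can arise because the Q-statistic is quoted to the nearest ten.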
Cross-cohort genetic correlation.
We estimated the genetic correlation of EduYears across all pairs of cohorts with non-negative heritability estimates (Supplementary Table 22). We used bivariate LD Score regression39 implemented by the LDSC software with a European reference population, filtered to HapMap3 SNPs. The estimated genetic correlations of EduYears for each of our 933 pairs of cohorts are shown in Supplementary Table 23.
We calculated the inverse-variance-weighted mean of the genetic-correlation estimates. The estimates will be correlated across all pairs that share a cohort. Therefore, to obtain correct standard errors, we used the node-jackknife variance estimator described by Cameron and Miller40. As detailed in the Supplementary Note, we also estimated the variance of SNP heritability of EduYears across cohorts, and we conducted analyses to assess the extent to which we can predict variation in SNP heritability and genetic correlation of EduYears based on several observable cohort characteristics (Supplementary Tables 24 and 25).
X chromosome.
We performed association analyses of SNPs on the X chromosome in our two largest cohorts, UKB (N = 329,358) and 23andMe (N = 365,536). The UKB analyses were conducted in a sample of conventionally unrelated European-ancestry individuals, yielding a smaller sample size than the autosomal UKB analyses (Supplementary Table 26). Imputed genotypes for the X chromosome were not included in the data officially released by UKB. We therefore imputed the data ourselves using the 1000 Genomes Project41 as our reference panel.
In both cohorts, the association analyses were performed on a pooled male-female sample with male genotypes coded 0/2. Except for this allele coding in males, all major aspects of the 23andMe analysis were identical to those described for the autosomal analyses; see Supplementary Tables 17–19 for details.
Both sets of association results underwent the same set of quality-control filters as the autosomal analyses prior to meta-analysis. Additionally, we dropped a small number of SNPs with male-female allele frequency differences above 0.005 in UKB. The meta-analysis was conducted in METAL35, using sample-size weighting. Only SNPs that were present in both cohorts’ results files were used. To adjust the test statistics for bias, we inflated the standard errors using the LD Score regression intercept estimated from our main autosomal analysis.
Heritability of the X chromosome and dosage compensation.
To estimate SNP heritability for males and females, we use the equation
$$E[\chi^2_i] = 1 + \frac{N_i h^2_i}{M_{\mathrm{eff}}},$$
where $i \in \{m, f\}$ indicates males or females, $E[\chi^2_i]$ is the expected χ2 statistic, $h^2_i$ is the SNP heritability for the X chromosome, $N_i$ is the GWAS sample size, and $M_{\mathrm{eff}}$ is the effective number of SNPs (which is assumed to be the same in males and females). We replaced $E[\chi^2_i]$ with its sample analog and $M_{\mathrm{eff}}$ with its estimated value, and then we solved for $h^2_i$.
Let $\gamma = h^2_m / h^2_f$ denote the dosage compensation ratio. The ratio takes on a value between 0.5 (zero dosage compensation) and 2 (full dosage compensation). Based on the above equation, we estimated it as
$$\hat{\gamma} = \frac{(\bar{\chi}^2_m - 1)/N_m}{(\bar{\chi}^2_f - 1)/N_f},$$
where $\bar{\chi}^2_i$ is the mean χ2 statistic. (Equivalently, our γ estimate is equal to the ratio of our SNP heritability estimates.)
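The two estimators can be illustrated with a short sketch. It assumes the standard relation between the expected χ2 statistic and SNP heritability, E[χ2] = 1 + N h2 / Meff, which is consistent with the definitions given in this section. All numerical inputs, including the value of Meff, are hypothetical.

```python
def snp_heritability(mean_chi2, n, m_eff):
    """Solve mean_chi2 = 1 + n * h2 / m_eff for h2."""
    return (mean_chi2 - 1.0) * m_eff / n

def dosage_compensation_ratio(mean_chi2_m, n_m, mean_chi2_f, n_f):
    """Male-to-female ratio of (mean chi2 - 1) / N; m_eff cancels out."""
    return ((mean_chi2_m - 1.0) / n_m) / ((mean_chi2_f - 1.0) / n_f)

# Hypothetical inputs (not the study's estimates)
M_EFF = 1500.0
h2_m = snp_heritability(1.60, 300_000, M_EFF)
h2_f = snp_heritability(1.45, 390_000, M_EFF)
gamma = dosage_compensation_ratio(1.60, 300_000, 1.45, 390_000)
```

Because Meff is assumed to be equal in males and females, `gamma` equals `h2_m / h2_f` regardless of the value chosen for `M_EFF`.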
Biological annotation.
We used DEPICT19 (downloaded February 2016 from https://github.com/perslab/depict) to identify the tissues/cell types where the causal genes are strongly expressed, detect enrichment of gene sets, and prioritize likely causal genes. We ran DEPICT as described previously10 with the following exceptions: we used 37,427 human Affymetrix HGU133a2.0 platform microarrays19, discarded gene sets that were not well reconstituted42, and relaxed the significance threshold for defining a matching SNP in the simulated null GWAS from 5×10−4 to 5×10−3. “Previously prioritized” genes were prioritized by DEPICT (in the sense of achieving FDR < 0.05) both in Okbay et al.10 and in the current work; “newly prioritized genes,” on the other hand, were not prioritized in Okbay et al.10. We used expression data from the BrainSpan Developmental Transcriptome34 and calculated the average expression in the brain of all DEPICT-prioritized EduYears genes (Supplementary Table 7) as a function of developmental stage (Supplementary Table 8, Supplementary Figure 22).
In addition to the analyses presented in the main text, we determined which functional systems are least implicated by DEPICT (Supplementary Table 27) and how enrichment of gene sets differs across phenotypes (Supplementary Table 28).
We tested the robustness of our DEPICT results using the bioinformatics tools MAGMA43 and PANTHER44,45. For MAGMA, we used the “multi=snp-wise” option, mapping a SNP to a gene if it resides within the gene boundaries or within 5 kb of either endpoint. We estimated LD using a reference panel of Europeans in 1000 Genomes phase 3, and we defined a gene as significant if its joint P value fell below the threshold corresponding to FDR < 0.05 (Supplementary Table 29). For PANTHER, we used the binomial overrepresentation test with the DEPICT-prioritized genes as input (Supplementary Table 30).
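One common way to obtain a P value threshold corresponding to a given FDR is the Benjamini-Hochberg step-up procedure. The sketch below assumes that procedure; the text does not state which FDR method was applied to the gene-level P values, and the function name and inputs are hypothetical.

```python
def bh_p_value_threshold(p_values, fdr=0.05):
    """Largest P value p_(k) satisfying p_(k) <= k * fdr / m
    (Benjamini-Hochberg); returns 0.0 if no P value qualifies."""
    m = len(p_values)
    threshold = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank * fdr / m:
            threshold = p
    return threshold

# Hypothetical gene-level P values
gene_p = [0.001, 0.008, 0.039, 0.041, 0.9]
cutoff = bh_p_value_threshold(gene_p)
significant = [p for p in gene_p if p <= cutoff]
```

Genes with P values at or below `cutoff` are declared significant; with these inputs, only the two smallest P values pass.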
We also used stratified LD Score regression21 to partition the heritability of the trait between SNPs of different types. In addition to the baseline SNP-level annotations (Supplementary Table 31), we tested a number of novel annotation types, described more fully in the Supplementary Note. We tested the heritability enrichment of neural cell types (Supplementary Table 9), various SNP-level annotations assembled by Pickrell46 (Supplementary Figure 23, Supplementary Table 32), developmental stages (Supplementary Table 33), and genes that are broadly expressed or specifically expressed in a particular tissue (Supplementary Figure 24, Supplementary Table 34). We also applied LD Score regression to DEPICT-reconstituted gene sets (Supplementary Table 35) and binary gene sets (Supplementary Table 36 and Supplementary Figure 25).
We used the tool CAVIARBF23,47 in a fine-mapping exercise to identify candidate causal SNPs. We used the 74 baseline annotations employed by stratified LD Score regression as well as 451 annotations from Pickrell46. We applied a MAF filter of 0.01 and a sample-size filter of 400,000 and only considered SNPs within a 50-kb radius of a lead SNP. We computed exact Bayes factors by averaging over prior variances of 0.01, 0.1, and 0.5; we set the sample size to the mean sample size of our considered SNPs; and we added 0.2 to the main diagonal of the LD matrix because we used a reference panel for LD estimation. To incorporate annotations, we used the elastic net setting with parameters selected via 5-fold cross-validation. The resulting annotation effect sizes and list of candidate causal SNPs are given in Supplementary Tables 37 and 10. Regional association plots of four noteworthy candidates are shown in Supplementary Figure 9.
Polygenic prediction.
Prediction analyses were performed using the National Longitudinal Study of Adolescent to Adult Health (Add Health), the Health and Retirement Study (HRS), and the Wisconsin Longitudinal Study (WLS). Polygenic scores were constructed using HapMap3 SNPs that meet the following conditions: (i) the variant has a call rate greater than 98% in the prediction cohort; (ii) the variant has a minor allele frequency (MAF) greater than 1% in the prediction cohort; and (iii) the allele frequency discrepancy between the meta-analysis and the prediction cohort does not exceed 0.15. To calculate the SNP weights we use the software package LDpred25, assuming a fraction of causal variants equal to 1, and then we construct the scores in PLINK.
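The three inclusion conditions and the final weighted sum can be sketched as follows. This is a simplified illustration: the `Variant` fields, the function names, and the example values are hypothetical, and the weights stand in for LDpred output rather than reproducing it.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    rsid: str
    call_rate: float    # call rate in the prediction cohort
    maf: float          # minor allele frequency in the prediction cohort
    freq_cohort: float  # effect-allele frequency in the prediction cohort
    freq_meta: float    # effect-allele frequency in the meta-analysis
    weight: float       # per-allele SNP weight (placeholder value)

def passes_filters(v):
    """Conditions (i)-(iii): call rate > 98%, MAF > 1%,
    allele-frequency discrepancy no greater than 0.15."""
    return (v.call_rate > 0.98
            and v.maf > 0.01
            and abs(v.freq_cohort - v.freq_meta) <= 0.15)

def polygenic_score(dosages, variants):
    """Sum of weight * effect-allele dosage over variants that pass the filters."""
    return sum(v.weight * dosages[v.rsid] for v in variants if passes_filters(v))

variants = [
    Variant("rs_a", 0.990, 0.200, 0.20, 0.22, 0.02),   # kept
    Variant("rs_b", 0.970, 0.300, 0.30, 0.31, 0.05),   # dropped: call rate
    Variant("rs_c", 0.995, 0.005, 0.005, 0.01, 0.04),  # dropped: MAF
    Variant("rs_d", 0.990, 0.400, 0.40, 0.60, 0.03),   # dropped: discrepancy
    Variant("rs_e", 0.990, 0.350, 0.35, 0.30, -0.03),  # kept
]
dosages = {"rs_a": 1, "rs_b": 2, "rs_c": 0, "rs_d": 1, "rs_e": 1}
score = polygenic_score(dosages, variants)
```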
All prediction exercises were performed with an OLS or probit regression of a phenotype on our score and a set of controls consisting of a full set of dummy variables for year of birth, an indicator variable for sex, a full set of interactions between sex and year of birth, and the first 10 principal components of the variance-covariance matrix of the genetic relatedness matrix.
Our measure of prediction accuracy is the incremental R2. To calculate this value, we first regress a phenotype on our set of controls without the polygenic score. Next, we re-run the same regression but with the score included as a regressor. For quantitative phenotypes, our measure of predictive power is the change in R2. For binary outcomes, we calculated the incremental pseudo-R2 from a probit regression. To obtain 95% confidence intervals, we bootstrapped the incremental R2 values with 1000 repetitions (Supplementary Table 38 and Supplementary Figures 13, 26, 27 and 28).
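For a quantitative phenotype, the incremental R2 and a bootstrap confidence interval can be sketched in plain Python. The sketch assumes a percentile bootstrap, which the text does not specify, and uses simulated data with a single control variable; all names are hypothetical. The small normal-equations solver stands in for a statistics package and will fail on perfectly collinear regressors.

```python
import random

def ols_r2(y, columns):
    """R^2 from an OLS regression of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def incremental_r2(y, controls, score):
    """Change in R^2 when the score is added to the controls-only regression."""
    return ols_r2(y, controls + [score]) - ols_r2(y, controls)

def bootstrap_ci(y, controls, score, reps=1000, seed=0):
    """95% percentile interval from resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(incremental_r2([y[i] for i in idx],
                                    [[c[i] for i in idx] for c in controls],
                                    [score[i] for i in idx]))
    draws.sort()
    return draws[int(0.025 * reps)], draws[int(0.975 * reps) - 1]

# Simulated data: one control and one score (illustrative only)
rng = random.Random(1)
n_obs = 200
control = [rng.gauss(0, 1) for _ in range(n_obs)]
pgs = [rng.gauss(0, 1) for _ in range(n_obs)]
pheno = [0.5 * c + 0.4 * s + rng.gauss(0, 1) for c, s in zip(control, pgs)]
delta_r2 = incremental_r2(pheno, [control], pgs)
ci_low, ci_high = bootstrap_ci(pheno, [control], pgs, reps=200)
```

The example uses 200 repetitions to keep the pure-Python run short; the default of 1000 matches the number of repetitions described above.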
Prediction of other phenotypes.
In addition to EduYears, we also used our polygenic score to predict a number of other phenotypes. In the HRS and Add Health, we analyzed three binary variables related to educational attainment: (i) High School Completion, (ii) College Completion, and (iii) Grade Retention (i.e., retaking a grade).
In additional analyses in Add Health, we predicted an augmented version of the Peabody Picture Vocabulary test, measured when participants were 12–20 years old. Peabody scores were age-standardized. We also predicted a number of Grade Point Average variables (range: 0.0 to 4.0) from the third wave of Add Health, when transcripts were collected from respondents’ high schools. We analyzed Overall GPA, Math GPA, Science GPA, and Verbal GPA, controlling for high school fixed effects.
In additional analyses in the HRS, we predicted several cognitive phenotypes. Total Cognition is the sum of four cognitive measures measured in waves 3 through 10: an immediate word recall task, a delayed word recall task, a naming task, and a counting task. Verbal Cognition measures the subject’s ability to define five words. To evaluate changes over time, we also studied wave-to-wave changes in Total Cognition and Verbal Cognition. Our next cognitive outcome, Alzheimer’s, is an indicator variable equal to 1 for subjects who report having been diagnosed with Alzheimer’s disease, and 0 otherwise. Since the HRS data are longitudinal, the unit of analysis for our 4 cognitive outcomes is a person-year. For these analyses, because an individual took the cognitive tests at different ages, in our set of controls we replaced our person-specific age variable with age at assessment (which differs for an individual across the cognitive outcomes); we also clustered all standard errors at the person level.
In the WLS, we measured cognitive performance using a respondent’s raw score on a Henmon-Nelson test of mental ability48.
For all of these additional prediction exercises, results are shown in Supplementary Table 38 and depicted in Figure 4A and Supplementary Figures 13 and 11.
Benchmarking the Predictive Power of the EduYears Polygenic Score.
To benchmark our score’s predictive power, we compared it to that of other common variables: mother’s education, father’s education, both mother’s and father’s education, verbal cognition, household income, and a binary indicator for marital status. For each variable, we calculated the variable’s incremental R2 using the same procedures as those described above, with the same set of control variables. (For “mother’s and father’s education,” we calculated the incremental R2 from adding both variables as regressors.) The results of this analysis are shown in Supplementary Table 39A and depicted in Figure 4B and Supplementary Figure 12.
We also evaluated the attenuation in the incremental R2 of the polygenic score in predicting EduYears when we control for available demographic variables one at a time: marital status, household income, mother’s education, and father’s education. We next controlled for both mother’s and father’s education, and finally, we controlled for the full set of demographic controls. The results of this analysis are shown in Supplementary Table 39B and Supplementary Figure 12.
GWAS of Cognitive Performance, Math Ability and Highest Math.
The GWAS of Math Ability (N = 564,698) and Highest Math (N = 430,445) phenotypes were conducted exclusively among research participants of the personal genomics company 23andMe who answered survey questions about their mathematical background. In our analyses of Cognitive Performance, we combined a published study of general cognitive ability (N = 35,298) conducted by the COGENT consortium28 with new genome-wide association analyses of cognitive performance in the UK Biobank (N = 222,543). The phenotype measures are described in detail in Supplementary Table 40. Our new genome-wide analyses of Cognitive Performance in UKB, and Math Ability and Highest Math in 23andMe, were conducted using methods identical to those for EduYears in UKB and 23andMe, respectively (Supplementary Table 19).
For Cognitive Performance, we conducted a sample-size-weighted meta-analysis (N = 257,841), imposing a minimum-sample-size filter of 100,000. We similarly applied minimum-sample-size filters to the Math Ability (N > 500,000) and Highest Math (N > 350,000) results. We adjusted the test statistics using the estimated intercepts from LD Score regressions (1.073 for Math Ability, 1.105 for Highest Math, and 1.046 for Cognitive Performance). The summary statistics underwent quality control using the same procedures applied to the EduYears results files.
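A sample-size-weighted meta-analysis combines per-cohort Z-statistics with weights proportional to the square root of each sample size. The sketch below uses that standard formula and, as an assumption about how the intercept adjustment was applied, divides the Z-statistic by the square root of the LD Score intercept (equivalent to inflating the standard error by that factor). The per-cohort Z-statistics are hypothetical; the sample sizes and intercept are those reported for Cognitive Performance.

```python
import math

def sample_size_weighted_z(z_scores, sample_sizes):
    """Meta-analysis Z = sum(sqrt(N_i) * Z_i) / sqrt(sum(N_i))."""
    numerator = sum(math.sqrt(n) * z for z, n in zip(z_scores, sample_sizes))
    return numerator / math.sqrt(sum(sample_sizes))

def intercept_adjusted_z(z, ldsc_intercept):
    """Deflate Z so that the chi-square statistic is divided by the intercept."""
    return z / math.sqrt(ldsc_intercept)

# Hypothetical Z-statistics for one SNP in UKB and COGENT
cohort_n = [222_543, 35_298]
z_meta = sample_size_weighted_z([3.0, 1.0], cohort_n)
z_adjusted = intercept_adjusted_z(z_meta, 1.046)
```

The two sample sizes sum to 257,841, the meta-analysis sample size given above.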
The lists of lead SNPs were obtained by applying the same clumping algorithm used in the EduYears analyses (Supplementary Tables 11–13). Manhattan plots from the analyses are shown in Supplementary Figures 14–16.
MTAG of Cognitive Performance, Math Ability and Highest Math.
We performed a joint analysis of our GWAS results on EduYears, Cognitive Performance, Math Ability, and Highest Math using MTAG11. Supplementary Table 14 shows moderately high pairwise genetic correlations, ranging from 0.51 to 0.85, which motivate the multivariate analysis. The MTAG analyses were restricted to SNPs that passed MTAG-recommended filters in all files with summary statistics. We dropped (i) SNPs with minor allele frequency below 1% or (ii) SNPs with sample sizes below a cutoff (66.6% of the 90th percentile), leaving approximately 7.1 million SNPs found in all four results files. Supplementary Table 41 reports the increases in effective sample size from using MTAG for each set of GWAS results.
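The two filters and the restriction to SNPs shared by all files can be sketched as follows. This is an illustration, not MTAG's own code: the function names and record layout are hypothetical, and the 90th percentile is computed with linear interpolation, which is an assumption about the percentile convention.

```python
from statistics import quantiles

def filter_results_file(snps, maf_min=0.01, n_fraction=0.666):
    """Keep SNPs with MAF >= maf_min and N >= n_fraction * (90th percentile of N).
    `snps` is a list of dicts with keys 'rsid', 'maf', and 'n'."""
    p90 = quantiles([s["n"] for s in snps], n=10, method="inclusive")[-1]
    n_cutoff = n_fraction * p90
    return [s for s in snps if s["maf"] >= maf_min and s["n"] >= n_cutoff]

def shared_rsids(filtered_files):
    """SNP identifiers that survive the filters in every results file."""
    return set.intersection(*(set(s["rsid"] for s in f) for f in filtered_files))

# Hypothetical results file: ten SNPs with N = 100, 200, ..., 1000
example = [{"rsid": f"rs{i}", "maf": 0.20, "n": 100 * i} for i in range(1, 11)]
example[-1]["maf"] = 0.005  # the largest-N SNP fails the MAF filter
kept = filter_results_file(example)
```

Here the 90th percentile of N is 910, so the cutoff is about 606; SNPs with N of 700, 800, and 900 are retained.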
Supplementary Table 15 lists all the lead SNPs in the MTAG analysis. Supplementary Figures 17–20 show inverted Manhattan plots that compare the MTAG and GWAS results, restricted to the set of SNPs that pass MTAG filters.
Polygenic scores were constructed from MTAG results using the same procedures as for the GWAS results. Supplementary Figure 29 and Supplementary Tables 42 and 43 compare the predictive power of scores constructed from MTAG results in the Add Health and WLS cohorts (see Supplementary Note for details).
To examine the credibility of the MTAG-identified lead SNPs of our lowest-powered GWAS, Cognitive Performance, we conducted a replication analysis. We re-ran MTAG with GWAS results that exclude COGENT cohorts, and we used the COGENT meta-analysis as our replication sample. In addition to applying the MTAG filters above, we limited the analysis to SNPs for which the COGENT results file contains summary statistics based on analyses of at least 25,000 individuals. The MTAG-identified lead SNPs for Cognitive Performance from our restricted sample are reported in Supplementary Table 44. We used our Bayesian framework to calculate the expected replication record of the MTAG results under the hypothesis that the MTAG-identified lead SNPs are true positives, given sampling variation and adjusted for winner’s curse and differences in SNP heritability across the samples.
Supplementary Material
ACKNOWLEDGMENTS:
This research was carried out under the auspices of the Social Science Genetic Association Consortium (SSGAC). The research has also been conducted using the UK Biobank Resource under application numbers 11425 and 12512. We acknowledge the Swedish Twin Registry for access to data. The Swedish Twin Registry is managed by Karolinska Institutet and receives funding through the Swedish Research Council under the grant no 2017-00641. This study was supported by funding from the Ragnar Söderberg Foundation (E9/11, E24/15), the Swedish Research Council (421-2013-1061), The Jan Wallander and Tom Hedelius Foundation, an ERC Consolidator Grant (647648 EdGe), the Pershing Square Fund of the Foundations of Human Behavior, and the NIA/NIH through grants P01-AG005842, P01-AG005842-20S2, P30-AG012810, and T32-AG000186-23 to NBER, and R01-AG042568 to USC. A full list of acknowledgments is provided in the Supplementary Note.
Footnotes
COMPETING FINANCIAL INTERESTS: Anil Malhotra is a consultant to Genomind Inc., Informed DNA, Concert Pharmaceuticals, and Biogen. Nicholas A. Furlotte, Aaron Kleinman, and Joyce Tung are employees of 23andMe, Inc.
CONTRIBUTOR LIST FOR THE 23andMe RESEARCH TEAM: Michelle Agee23, Babak Alipanahi23, Adam Auton23, Robert K. Bell23, Katarzyna Bryc23, Sarah L. Elson23, Pierre Fontanillas23, Nicholas A. Furlotte23, David A. Hinds23, Bethann S. Hromatka23, Karen E. Huber23, Aaron Kleinman23, Nadia K. Litterman23, Matthew H. McIntyre23, Joanna L. Mountain23, Carrie A.M. Northover23, J. Fah Sathirapongsasuti23, Olga V. Sazonova23, Janie F. Shelton23, Suyash Shringarpure23, Chao Tian23, Joyce Y. Tung23, Vladimir Vacic23, Catherine H. Wilson23, and Steven J. Pitts23.
CONTRIBUTOR LIST FOR THE SOCIAL SCIENCE GENETIC ASSOCIATION CONSORTIUM RESEARCH TEAM: Aysu Okbay5,6, Jonathan P. Beauchamp41, Mark Alan Fontana9,13, James J. Lee1, Tune H. Pers14,15, Cornelius A. Rietveld12,56,57, Patrick Turley16,17, Guo-Bo Chen52, Valur Emilsson58,59, S. Fleur W. Meddens5,12,60, Sven Oskarsson47, Joseph K. Pickrell61, Kevin Thom49, Pascal Timshel14,15, Ronald de Vlaming12,56,57, Abdel Abdellaoui62, Tarunveer S. Ahluwalia14,63,64, Jonas Bacelis65, Clemens Baumbach66,67, Gyda Bjornsdottir68, Johannes H. Brandsma69, Maria Pina Concas70, Jaime Derringer71, Nicholas A. Furlotte23, Tessel E. Galesloot72, Giorgia Girotto73, Richa Gupta74, Leanne M. Hall75,77, Sarah E. Harris39,77, Edith Hofer78,79, Momoko Horikoshi80,81, Jennifer E. Huffman34, Kadri Kaasik82, Ioanna P. Kalafati83, Robert Karlsson46, Augustine Kong68, Jari Lahti82,84, Sven J. van der Lee57, Christiaan de Leeuw5,85, Penelope A. Lind86, Karl-Oskar Lindgren47, Tian Liu87, Massimo Mangino88,89, Jonathan Marten34, Evelin Mihailov11, Michael B. Miller1, Peter J. van der Most90, Christopher Oldmeadow91,92, Antony Payton93,94, Natalia Pervjakova11,95, Wouter J. Peyrot96, Yong Qian97, Olli Raitakari98, Rico Rueedi99,100, Erika Salvi101, Börge Schmidt102, Katharina E. Schraut21, Jianxin Shi103, Albert V. Smith58,104, Raymond A. Poot69, Beate St Pourcain105,106, Alexander Teumer107, Gudmar Thorleifsson68, Niek Verweij108, Dragana Vuckovic73, Juergen Wellmann109, Harm-Jan Westra110,111,112, Jingyun Yang113,114, Wei Zhao115, Zhihong Zhu52, Behrooz Z. Alizadeh90,116, Najaf Amin57, Andrew Bakshi52, Sebastian E. Baumeister107,117, Ginevra Biino118, Klaus Bønnelykke63, Patricia A. Boyle113,119, Harry Campbell21, Francesco P. Cappuccio120, Gail Davies77,121, Jan-Emmanuel De Neve122, Panos Deloukas123,124, Ilja Demuth125,126, Jun Ding97, Peter Eibich127,128, Lewin Eisele102, Niina Eklund95, David M. Evans105,129, Jessica D. Faul130, Mary F. Feitosa131, Andreas J. 
Forstner132,133, Ilaria Gandin73, Bjarni Gunnarsson68, Bjarni V. Halldórsson68,134, Tamara B. Harris135, Andrew C. Heath136, Lynne J. Hocking137, Elizabeth G. Holliday91,92, Georg Homuth138, Michael A. Horan139, Jouke-Jan Hottenga62, Philip L. de Jager112,140,141, Peter K. Joshi21,24, Astanand Jugessur142, Marika A. Kaakinen143, Mika Kähönen144,145, Stavroula Kanoni123, Liisa Keltikangas-Järvinen82, Lambertus A. L. M. Kiemeney72, Ivana Kolcic146, Seppo Koskinen95, Aldi T. Kraja131, Martin Kroh127, Zoltan Kutalik99,100,147, Antti Latvala74, Lenore J. Launer148, Maël P. Lebreton60,149, Douglas F. Levinson150, Paul Lichtenstein46, Peter Lichtner151, David C. M. Liewald77,121, LifeLines Cohort Study152, Anu Loukola74, Pamela A. Madden136, Reedik Mägi11, Tomi Mäki-Opas95, Riccardo E. Marioni41,77,153, Pedro Marques-Vidal154, Gerardus A. Meddens155, George McMahon105, Christa Meisinger67, Thomas Meitinger151, Yuri Milaneschi96, Lili Milani11, Grant W. Montgomery156, Ronny Myhre142, Christopher P. Nelson75,76, Dale R. Nyholt156,157, William E. R. Ollier93, Aarno Palotie16,17,112,158,159,160, Lavinia Paternoster105, Nancy L. Pedersen46, Katja E. Petrovic78, David J. Porteous39, Katri Räikkönen82,84, Susan M. Ring105, Antonietta Robino161, Olga Rostapshova7,162, Igor Rudan21, Aldo Rustichini163, Veikko Salomaa95, Alan R. Sanders164,165, Antti-Pekka Sarin159,166, Helena Schmidt78,167, Rodney J. Scott92,168, Blair H. Smith169, Jennifer A. Smith90, Jan A. Staessen170,171, Elisabeth Steinhagen-Thiessen125, Konstantin Strauch172,173, Antonio Terracciano174, Martin D. Tobin175, Sheila Ulivi161, Simona Vaccargiu70, Lydia Quaye88, Frank J. A. van Rooij57,176, Cristina Venturini88,89, Anna A. E. Vinkhuyzen52, Uwe Völker138, Henry Völzke107, Judith M. Vonk90, Diego Vozzi161, Johannes Waage63,64, Erin B. Ware115,177, Gonneke Willemsen62, John R. Attia91,92, David A. Bennett113,114, Klaus Berger108, Lars Bertram178,179, Hans Bisgaard63, Dorret I. Boomsma62, Ingrid B. 
Borecki131, Ute Bültmann180, Christopher F. Chabris50, Francesco Cucca181, Daniele Cusi101,182, Ian J. Deary77,121, George V. Dedoussis83, Cornelia M. van Duijn57, Johan G. Eriksson84,183, Barbara Franke184, Lude Franke185, Paolo Gasparini73,161,186, Pablo V. Gejman164,165, Christian Gieger66, Hans-Jörgen Grabe187,188, Jacob Gratten52, Patrick J. F. Groenen189, Vilmundur Gudnason58,104, Pim van der Harst108,185,190, Caroline Hayward34, David A. Hinds23, Wolfgang Hoffmann107, Elina Hyppönen191,192,193, William G. Iacono1, Bo Jacobsson65,142, Marjo-Riitta Järvelin194,195,196,197, Karl-Heinz Jöckel102, Jaakko Kaprio74,95,159, Sharon L. R. Kardia115, Terho Lehtimäki198,199, Steven F. Lehrer43,44,45, Patrik K. E. Magnusson46, Nicholas G. Martin200, Matt McGue1, Andres Metspalu11,201, Neil Pendleton202,203, Brenda W. J. H. Penninx96, Markus Perola11,95, Nicola Pirastu73, Mario Pirastu70, Ozren Polasek21,204, Danielle Posthuma5,205, Christine Power192, Michael A. Province131, Nilesh J. Samani75,76, David Schlessinger97, Reinhold Schmidt78, Thorkild I. A. Sørensen14,105,206, Tim D. Spector88, Kari Stefansson68,104, Unnur Thorsteinsdottir68,104, A. Roy Thurik12,56,207,208, Nicholas J. Timpson105, Henning Tiemeier57,209,210, Joyce Y. Tung23, André G. Uitterlinden57,211, Veronique Vitart34, Peter Vollenweider154, David R. Weir130, James F. Wilson21,34, Alan F. Wright34, Dalton C. Conley42, Robert F. Krueger1, George Davey Smith105, Albert Hofman57, David I. Laibson7, Sarah E. Medland86, Michelle N. Meyer51, Jian Yang10,52, Magnus Johannesson53, Peter M. Visscher10,52, Tõnu Esko11, Philipp D. Koellinger5,6,12, David Cesarini45,49,55 & Daniel J. Benjamin9,45,54.
56 Department of Applied Economics, Erasmus School of Economics, Erasmus University Rotterdam, 3062 PA, Rotterdam, The Netherlands
57 Department of Epidemiology, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands
58 Icelandic Heart Association, Kopavogur, 201, Iceland
59 Faculty of Pharmaceutical Sciences, University of Iceland, 107 Reykjavík, Iceland
60 Amsterdam Business School, University of Amsterdam, Amsterdam, 1018 TV, The Netherlands
61 New York Genome Center, New York, NY 10013, USA
62 Department of Biological Psychology, VU University Amsterdam, Amsterdam, 1081 BT, The Netherlands
63 COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, 2820, Denmark
64 Steno Diabetes Center, Gentofte, 2820, Denmark
65 Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, Gothenburg, SE 416 85, Sweden
66 Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, 85764, Germany
67 Institute of Epidemiology II, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, 85764, Germany
68 deCODE Genetics/Amgen Inc., Reykjavik, IS-101, Iceland
69 Department of Cell Biology, Erasmus Medical Center Rotterdam, 3015 CN, The Netherlands
70 Istituto di Ricerca Genetica e Biomedica U.O.S. di Sassari, National Research Council of Italy, Sassari, 07100, Italy
71 Psychology, University of Illinois, IL 61820, Champaign, USA
72 Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, 6525 EC, The Netherlands
73 Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, 34100, Italy
74 Department of Public Health, University of Helsinki, Helsinki, FI-00014, Finland
75 Department of Cardiovascular Sciences, University of Leicester, Leicester, LE3 9QP, UK
76 NIHR Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester, LE3 9QP, UK
77 Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, EH8 9JZ, UK
78 Department of Neurology, General Hospital and Medical University Graz, Graz, 8036, Austria
79 Institute for Medical Informatics, Statistics and Documentation, General Hospital and Medical University Graz, Graz, 8036, Austria
80 Oxford Centre for Diabetes, Endocrinology & Metabolism, University of Oxford, Oxford, OX3 7LE, UK
81 Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
82 Institute of Behavioural Sciences, University of Helsinki, Helsinki, FI-00014, Finland
83 Nutrition and Dietetics, Health Science and Education, Harokopio University, Athens, 17671, Greece
84 Folkhälsan Research Centre, Helsingfors, FI-00014, Finland
85 Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, 6525 EC, The Netherlands
86 Quantitative Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4029, Australia
87 Lifespan Psychology, Max Planck Institute for Human Development, Berlin, 14195, Germany
88 Department of Twin Research and Genetic Epidemiology, King’s College London, London, SE1 7EH, UK
89 NIHR Biomedical Research Centre, Guy’s and St. Thomas’ Foundation Trust, London, SE1 7EH, UK
90 Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, 9700 RB, The Netherlands
91 Public Health Stream, Hunter Medical Research Institute, New Lambton, NSW 2305, Australia
92 Faculty of Health and Medicine, University of Newcastle, Newcastle, NSW 2300, Australia
93 Centre for Integrated Genomic Medical Research, Institute of Population Health, The University of Manchester, Manchester, M13 9PT, UK
94 School of Psychological Sciences, The University of Manchester, Manchester, M13 9PL, UK
95 Department of Health, THL-National Institute for Health and Welfare, Helsinki, FI-00271, Finland
96 Psychiatry, VU University Medical Center & GGZ inGeest, Amsterdam, 1081 HL, The Netherlands
97 Laboratory of Genetics, National Institute on Aging, Baltimore, MD 21224, USA
98 Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, 20521, Finland
99 Department of Medical Genetics, University of Lausanne, Lausanne, 1005, Switzerland
100 Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
101 Department of Health Sciences, University of Milan, Milano, 20142, Italy
102 Institute for Medical Informatics, Biometry and Epidemiology, University Hospital of Essen, Essen, 45147, Germany
103 Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892–9780, USA
104 Faculty of Medicine, University of Iceland, Reykjavik, 101, Iceland
105 MRC Integrative Epidemiology Unit, University of Bristol, Bristol, BS8 2BN, UK
106 School of Oral and Dental Sciences, University of Bristol, Bristol, BS1 2LY, UK
107 Institute for Community Medicine, University Medicine Greifswald, Greifswald, 17475, Germany
108 Department of Cardiology, University Medical Center Groningen, University of Groningen, Groningen, 9700 RB, The Netherlands
109 Institute of Epidemiology and Social Medicine, University of Muenster, Muenster, 48149, Germany
110 Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, MA 02115, Boston, USA
111 Partners Center for Personalized Genetic Medicine, Boston, MA 02115, USA
112 Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
113 Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
114 Department of Neurological Sciences, Rush University Medical Center, Chicago, IL 60612, USA
115 Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109, USA
116 Department of Gastroenterology and Hepatology, University of Groningen, University Medical Center Groningen, Groningen, 9713 GZ, The Netherlands
117 Institute of Epidemiology and Preventive Medicine, University of Regensburg, Regensburg, D-93053, Germany
118 Institute of Molecular Genetics, National Research Council of Italy, Pavia, 27100, Italy
119 Department of Behavioral Sciences, Rush University Medical Center, Chicago, IL 60612, USA
120 Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK
121 Department of Psychology, University of Edinburgh, Edinburgh, EH8 9JZ, UK
122 Saïd Business School, University of Oxford, Oxford, OX1 1HP, UK
123 William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
124 Princess Al-Jawhara Al-Brahim Centre of Excellence in Research of Hereditary Disorders (PACER-HD), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
125 The Berlin Aging Study II; Research Group on Geriatrics, Charité – Universitätsmedizin Berlin, Germany, Berlin, 13347, Germany
126 Institute of Medical and Human Genetics, Charité-Universitätsmedizin, Berlin, Berlin, 13353, Germany
127 German Socio-Economic Panel Study, DIW Berlin, Berlin, 10117, Germany
128 Health Economics Research Centre, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK
129 The University of Queensland Diamantina Institute, The Translational Research Institute, Brisbane, QLD 4102, Australia
130 Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI 48109, USA
131 Department of Genetics, Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO 63018, USA
132 Institute of Human Genetics, University of Bonn, Bonn, 53127, Germany
133 Department of Genomics, Life and Brain Center, University of Bonn, Bonn, 53127, Germany
134 Institute of Biomedical and Neural Engineering, School of Science and Engineering, Reykjavik University, Reykjavik 101, Iceland
135 Laboratory of Epidemiology, Demography, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892–9205, United States
136 Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
137 Division of Applied Health Sciences, University of Aberdeen, Aberdeen, AB25 2ZD, UK
138 Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Greifswald, 17475, Germany
139 Manchester Medical School, The University of Manchester, Manchester, 9PT, UK
140 Program in Translational NeuroPsychiatric Genomics, Departments of Neurology & Psychiatry, Brigham and Women’s Hospital, Boston, MA 02115, USA
141 Harvard Medical School, Boston, MA 02115, USA
142 Department of Genes and Environment, Norwegian Institute of Public Health, Oslo, N-0403, Norway
143 Department of Genomics of Common Disease, Imperial College London, London, W12 0NN, UK
144 Department of Clinical Physiology, Tampere University Hospital, Tampere, 33521, Finland
145 Department of Clinical Physiology, University of Tampere, School of Medicine, Tampere, 33014, Finland
146 Public Health, Medical School, University of Split, 21000 Split, Croatia
147 Institute of Social and Preventive Medicine, Lausanne University Hospital (CHUV), Lausanne, 1010, Switzerland
148 Neuroepidemiology Section, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892–9205, USA
149 Amsterdam Brain and Cognition Center, University of Amsterdam, 1018 XA, Amsterdam, The Netherlands
150 Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305–5797, USA
151 Institute of Human Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, 85764, Germany
152 LifeLines Cohort Study, University of Groningen, University Medical Center Groningen, Groningen, 9713 BZ, The Netherlands
153 Medical Genetics Section, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
154 Department of Internal Medicine, Internal Medicine, Lausanne University Hospital (CHUV), Lausanne, 1011, Switzerland
155 Tema BV, 2131 HE Hoofddorp, The Netherlands
156 Molecular Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4029, Australia
157 Institute of Health and Biomedical Innovation, Queensland Institute of Technology, Brisbane, QLD 4059, Australia
158 Psychiatric & Neurodevelopmental Genetics Unit, Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114, USA
159 Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, 00014, Finland
160 Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
161 Medical Genetics, Institute for Maternal and Child Health IRCCS “Burlo Garofolo”, Trieste, 34100, Italy
162 Social Impact, Arlington, VA 22201, USA
163 Department of Economics, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
164 Department of Psychiatry and Behavioral Sciences, NorthShore University HealthSystem, Evanston, IL 60201–3137, USA
165 Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, IL 60637, USA
166 Public Health Genomics Unit, National Institute for Health and Welfare, Helsinki 00300, Finland
167 Research Unit for Genetic Epidemiology, Institute of Molecular Biology and Biochemistry, Center of Molecular Medicine, General Hospital and Medical University, Graz, Graz, 8010, Austria
168 Information Based Medicine Stream, Hunter Medical Research Institute, New Lambton, NSW 2305, Australia
169 Medical Research Institute, University of Dundee, Dundee, DD1 9SY, UK
170 Research Unit Hypertension and Cardiovascular Epidemiology, Department of Cardiovascular Science, University of Leuven, Leuven, 3000, Belgium
171 R&D VitaK Group, Maastricht University, Maastricht, 6229 EV, The Netherlands
172 Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, 85764, Germany
173 Institute of Medical Informatics, Biometry and Epidemiology, Chair of Genetic Epidemiology, Ludwig Maximilians-Universität, Munich, 81377, Germany
174 Department of Geriatrics, Florida State University College of Medicine, Tallahassee, FL 32306, USA
175 Department of Health Sciences and Genetics, University of Leicester, Leicester, LE1 7RH, UK
176 Department of Internal Medicine, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands
177 Research Center for Group Dynamics, Institute for Social Research, University of Michigan, Ann Arbor, MI 48104, USA
178 Platform for Genome Analytics, Institutes of Neurogenetics & Integrative and Experimental Genomics, University of Lübeck, Lübeck, 23562, Germany
179 Neuroepidemiology and Ageing Research Unit, School of Public Health, Faculty of Medicine, The Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
180 Department of Health Sciences, Community & Occupational Medicine, University of Groningen, University Medical Center Groningen, Groningen, 9713 AV, The Netherlands
181 Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche, c/o Cittadella Universitaria di Monserrato, Monserrato, Cagliari, 9042, Italy
182 Institute of Biomedical Technologies, Italian National Research Council, Segrate (Milano), 20090, Italy
183 Department of General Practice and Primary Health Care, University of Helsinki, Helsinki, 00014, Finland
184 Departments of Human Genetics and Psychiatry, Donders Centre for Neuroscience, Nijmegen, 6500 HB, The Netherlands
185 Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, 9700 RB, The Netherlands
186 Sidra, Experimental Genetics Division, Sidra, Doha 26999, Qatar
187 Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, 17475, Germany
188 Department of Psychiatry and Psychotherapy, HELIOS-Hospital Stralsund, Stralsund, 18437, Germany
189 Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, 3062 PA, The Netherlands
190 Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute, Utrecht, 1105 AZ, The Netherlands
191 Centre for Population Health Research, School of Health Sciences and Sansom Institute, University of South Australia, SA5000, Adelaide, Australia
192 South Australian Health and Medical Research Institute, Adelaide, SA5000, Australia
193 Population, Policy and Practice, UCL Institute of Child Health, London, WC1N 1EH, UK
194 Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment & Health, School of Public Health, Imperial College London, London, W2 1PG, UK
195 Center for Life Course Epidemiology, Faculty of Medicine, University of Oulu, Oulu, FI-90014, Finland
196 Unit of Primary Care, Oulu University Hospital, Oulu, 90029 OYS, Finland
197 Biocenter Oulu, University of Oulu, FI-90014 Oulu, Finland
198 Fimlab Laboratories, Tampere, 33520, Finland
199 Department of Clinical Chemistry, University of Tampere, School of Medicine, Tampere, 33014, Finland
200 Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4029, Australia
201 Institute of Molecular and Cell Biology, University of Tartu, Tartu, 51010, Estonia
202 Centre for Clinical and Cognitive Neuroscience, Institute Brain Behaviour and Mental Health, Salford Royal Hospital, Manchester, M6 8HD, UK
203 Manchester Institute Collaborative Research in Ageing, University of Manchester, Manchester, M13 9PL, UK
204 Faculty of Medicine, University of Split, Croatia, Split 21000, Croatia
205 Department of Clinical Genetics, VU Medical Centre, Amsterdam, 1081 HV, The Netherlands
206 Institute of Preventive Medicine, Bispebjerg and Frederiksberg Hospitals, The Capital Region, Frederiksberg, 2000, Denmark
207 Montpellier Business School, Montpellier, 34080, France
208 Panteia, Zoetermeer, 2715 CA, The Netherlands
209 Department of Psychiatry, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands
210 Department of Child and Adolescent Psychiatry, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands
211 Department of Internal Medicine, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands
DATA AVAILABILITY AND ACCESSION CODES
Summary statistics can be downloaded from www.thessgac.org/data. We provide association results for all SNPs that passed quality-control filters in a GWAS meta-analysis of EduYears that excludes the research participants from 23andMe. SNP-level summary statistics from analyses based entirely or in part on 23andMe data can only be reported for up to 10,000 SNPs. We provide summary statistics for all lead SNPs identified in our GWAS analyses of Cognitive Performance, Math Ability, and Highest Math and the MTAG analyses of our four phenotypes. For the complete EduYears GWAS, which includes 23andMe, clumped results for the 3,575 SNPs with P < 10⁻⁵ are provided; this P-value threshold was chosen such that the total number of SNPs across the analyses that include data from 23andMe does not exceed 10,000. Contact information for each of the cohorts included in this paper can be found in the Supplementary Note.
CODE AVAILABILITY
All software used to perform these analyses is available online.
URLs:
Social Science Genetic Association Consortium (SSGAC) website: http://www.thessgac.org/#!data/kuzq8.
Minimac2: https://genome.sph.umich.edu/wiki/Minimac2
BEAGLE v2.1.2: http://faculty.washington.edu/browning/beagle/b3.html
IMPUTE2 v2.3.1: http://mathgen.stats.ox.ac.uk/impute/impute_v2.html
PBWT: https://github.com/richarddurbin/pbwt
IMPUTE4: https://jmarchini.org/impute-4/
ShapeIT v2.r790: http://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html
BOLT-LMM: https://data.broadinstitute.org/alkesgroup/BOLT-LMM/
SNPTEST v2.4.1: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html
REGSCAN v0.2.0: https://www.geenivaramu.ee/en/tools/regscan
METAL, release 2011-03-25: http://csg.sph.umich.edu/abecasis/metal/
EasyQC v9.0: http://www.uni-regensburg.de/medizin/epidemiologie-praeventivmedizin/genetische-epidemiologie/software/
ldsc v1.0.0: https://github.com/bulik/ldsc
PLINK v1.90b3p: http://zzz.bwh.harvard.edu/plink/plink2.shtml
LDpred v0.9.09: https://bitbucket.org/bjarni_vilhjalmsson/ldpred
Stata v14.2: https://www.stata.com/install-guide/windows/download/
DEPICT (downloaded Feb 2015): https://data.broadinstitute.org/mpg/depict/
MAGMA v1.06b: https://ctg.cncr.nl/software/magma
PANTHER release 20170403: http://www.geneontology.org
CAVIARBF v0.2.1: https://bitbucket.org/Wenan/caviarbf
MTAG software v1.0.1: https://github.com/omeed-maghzian/mtag
REFERENCES
- 1. Branigan AR et al. Variation in the Heritability of Educational Attainment: An International Meta-Analysis. Soc. Forces 92, 109–140 (2013).
- 2. Conti G, Heckman J & Urzua S The Education-Health Gradient. Am. Econ. Rev 100, 234–238 (2010).
- 3. Cutler DM & Lleras-Muney A Education and Health: Evaluating Theories and Evidence in Making Americans Healthier: Social and Economic Policy as Health Policy (eds. House J, Schoeni R, Kaplan G & Pollack H) (Russell Sage Foundation, 2008).
- 4. Rietveld CA et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013).
- 5. Pickrell JK et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet 48, 709–717 (2016).
- 6. Belsky DW et al. The Genetics of Success. Psychol. Sci 27, 957–972 (2016).
- 7. Domingue BW, Belsky DW, Conley D, Harris KM & Boardman JD Polygenic Influence on Educational Attainment: New evidence from The National Longitudinal Study of Adolescent to Adult Health. AERA Open 1, 1–13 (2015).
- 8. Marioni RE et al. Genetic variants linked to education predict longevity. Proc. Natl. Acad. Sci 113, 13366–13371 (2016).
- 9. Anttila AV et al. Analysis of shared heritability in common disorders of the brain. bioRxiv 48991 (2016). doi: 10.1101/048991
- 10. Okbay A et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
- 11. Turley P et al. MTAG: Multi-Trait Analysis of GWAS. Nat. Genet in press, (2017).
- 12. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
- 13. Bulik-Sullivan BK et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet 47, 291–295 (2015).
- 14. Yang J et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet 44, 369–375 (2012).
- 15. Kong A et al. The nature of nurture: effects of parental genotypes. bioRxiv 219261 (2017). doi: 10.1101/219261
- 16. de Vlaming R et al. Meta-GWAS Accuracy and Power (MetaGAP) Calculator Shows that Hiding Heritability Is Partially Due to Imperfect Genetic Correlations across Studies. PLOS Genet. 13, e1006495 (2017).
- 17. Tropf FC et al. Hidden heritability due to heterogeneity across seven populations. Nat. Hum. Behav (2017). doi: 10.1038/s41562-017-0195-1
- 18. Johnson W, Carothers A & Deary IJ Sex Differences in Variability in General Intelligence: A New Look at the Old Question. Perspect. Psychol. Sci 3, 518–531 (2008).
- 19. Pers TH et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun 6, 5890 (2015).
- 20. Azevedo FAC et al. Equal numbers of neuronal and nonneuronal cells make the human brain an isometrically scaled-up primate brain. J. Comp. Neurol 513, 532–541 (2009).
- 21. Finucane HK et al. Partitioning heritability by functional category using GWAS summary statistics. Nat. Genet 47, 1228–1235 (2015).
- 22. Reed TE & Jensen AR Arm nerve conduction velocity (NCV), brain NCV, reaction time, and intelligence. Intelligence 15, 33–47 (1991).
- 23. Chen W, McDonnell SK, Thibodeau SN, Tillmans LS & Schaid DJ Incorporating functional annotations for fine-mapping causal variants in a Bayesian framework using summary statistics. Genetics 204, 933–958 (2016).
- 24. Wang G et al. CaV3.2 calcium channels control NMDA receptor-mediated transmission: a new mechanism for absence epilepsy. Genes Dev. 29, 1535–1551 (2015).
- 25. Vilhjálmsson BJ et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet 97, 576–592 (2015).
- 26. Martin AR et al. Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am. J. Hum. Genet 100, 635–649 (2017).
- 27. Scutari M, Mackay I & Balding D Using Genetic Distance to Infer the Accuracy of Genomic Prediction. PLoS Genet. 12, e1006288 (2016).
- 28. Trampush JW et al. GWAS meta-analysis reveals novel loci and genetic correlates for general cognitive function: a report from the COGENT consortium. Mol. Psychiatry 22, 336–345 (2017).
- 29. Davies G et al. Ninety-nine independent genetic loci influencing general cognitive function include genes associated with brain health and structure (N = 280,360). bioRxiv (2017).
- 30. Sniekers S et al. Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence. Nat. Genet 49, 1107–1112 (2017).
- 31. Savage JE et al. GWAS meta-analysis (N=279,930) identifies new genes and functional links to intelligence. bioRxiv (2017).
- 32. Schmitz LL & Conley D The Effect of Vietnam-Era Conscription and Genetic Potential for Educational Attainment on Schooling Outcomes. Econ. Educ. Rev (2017). doi: 10.1016/j.econedurev.2017.10.001
- 33. Heath AC et al. Education policy and the heritability of educational attainment. Nature 314, 734–736 (1985).
- 34. Kang HJ et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
METHODS-ONLY REFERENCES
- 35. Willer CJ, Li Y & Abecasis GR METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
- 36. Chang CC et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 1–16 (2015).
- 37. Okbay A et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet 48, 624–633 (2016).
- 38. Cochran WG The Combination of Estimates from Different Experiments. Biometrics 10, 101 (1954).
- 39. Bulik-Sullivan B et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet 47, 1236–1241 (2015).
- 40. Cameron AC & Miller D Robust inference with dyadic data. mimeo (2014). doi: 10.1201/b10440
- 41. The 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
- 42. Fehrmann RSN et al. Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat. Genet 47, 115–125 (2015).
- 43. de Leeuw CA et al. MAGMA: Generalized Gene-Set Analysis of GWAS Data. PLoS Comput. Biol 11, e1004219 (2015).
- 44. Liu JZ et al. A versatile gene-based test for genome-wide association studies. Am. J. Hum. Genet 87, 139–145 (2010).
- 45. Mi H, Muruganujan A, Casagrande JT & Thomas PD Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc 8, 1551–1566 (2013).
- 46. Pickrell JK Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet 94, 559–573 (2014).
- 47. Chen W et al. Fine mapping causal variants with an approximate Bayesian method using marginal test statistics. Genetics 200, 719–736 (2015).
- 48. Henmon VAC Henmon-Nelson Tests of Mental Ability, High School Examination-Grades 7 to 12-Forms A, B, and C. Teacher’s Manual. (1946).
Associated Data