Skip to main content
Proceedings of the National Academy of Sciences of the United States of America logoLink to Proceedings of the National Academy of Sciences of the United States of America
. 2022 Aug 23;119(35):e2202764119. doi: 10.1073/pnas.2202764119

Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people

Else Eising a, Nazanin Mirza-Schreiber b, Eveline L de Zeeuw c, Carol A Wang d,e, Dongnhu T Truong f, Andrea G Allegrini g, Chin Yang Shapland h,i, Gu Zhu j, Karen G Wigg k, Margot L Gerritse a, Barbara Molz a, Gökberk Alagöz a, Alessandro Gialluisi l,m,n, Filippo Abbondanza o, Kaili Rimfeld g,p, Marjolein van Donkelaar a, Zhijie Liao q, Philip R Jansen r,s,t, Till F M Andlauer l,u, Timothy C Bates v, Manon Bernard w, Kirsten Blokland x, Milene Bonte y, Anders D Børglum z,aa,bb, Thomas Bourgeron cc, Daniel Brandeis dd,ee,ff,gg, Fabiola Ceroni hh,ii, Valéria Csépe jj,kk, Philip S Dale ll, Peter F de Jong mm, John C DeFries nn,oo, Jean-François Démonet pp, Ditte Demontis z,aa, Yu Feng k, Scott D Gordon j, Sharon L Guger qq, Marianna E Hayiou-Thomas rr, Juan A Hernández-Cabrera ss, Jouke-Jan Hottenga c, Charles Hulme tt, Juha Kere uu,vv, Elizabeth N Kerr qq,ww,xx, Tanner Koomar yy, Karin Landerl zz,aaa, Gabriel T Leonard bbb, Maureen W Lovett x,xx, Heikki Lyytinen ccc, Nicholas G Martin j, Angela Martinelli o, Urs Maurer ddd, Jacob J Michaelson yy, Kristina Moll eee, Anthony P Monaco fff, Angela T Morgan ggg,hhh,iii, Markus M Nöthen jjj, Zdenka Pausova w,kkk, Craig E Pennell d,e,lll, Bruce F Pennington mmm, Kaitlyn M Price k,x,nnn, Veera M Rajagopal z,aa, Franck Ramus ooo, Louis Richer ppp, Nuala H Simpson qqq, Shelley D Smith rrr, Margaret J Snowling qqq,sss, John Stein ttt, Lisa J Strug uuu,vvv, Joel B Talcott www, Henning Tiemeier r,xxx, Marc P van der Schroeff yyy,zzz, Ellen Verhoef a, Kate E Watkins qqq, Margaret Wilkinson x, Margaret J Wright aaaa, Cathy L Barr k,x,nnn, Dorret I Boomsma c,bbbb,cccc, Manuel Carreiras dddd,eeee,ffff, Marie-Christine J Franken yyy, Jeffrey R Gruen f, Michelle Luciano v, Bertram Müller-Myhsok l,gggg, Dianne F Newbury ii, Richard K Olson nn, Silvia Paracchini o, Tomáš Paus hhhh, Robert Plomin g, Sheena Reilly ggg,iiii, Gerd Schulte-Körne eee, J Bruce Tomblin jjjj, Elsje van Bergen c,bbbb,kkkk, Andrew J O Whitehouse llll, Erik G Willcutt oo, Beate St Pourcain a,h,mmmm,1, Clyde Francks a,mmmm,nnnn,1, Simon E Fisher a,mmmm,2
PMCID: PMC9436320  PMID: 35998220

Significance

Our unique capacities for spoken and written language are fundamental features of what makes us human, yet the biological bases remain largely mysterious. We present a large-scale well-powered genome-wide association study meta-analysis of individual differences in reading- and language-related skills (word reading, nonword reading, spelling, phoneme awareness, and nonword repetition) in tens of thousands of participants. The findings prompt a major reevaluation of prior literature claiming candidate gene associations in much smaller samples. Moreover, we use the novel genetic data as windows into multiple aspects of the biology of these important abilities, revealing molecular links to individual differences in neuroanatomy of language-related brain areas and enriched heritability in archaic deserts of the human genome as well as in fetal brain enhancer regions.

Keywords: reading, language, genome-wide association study, meta-analysis

Abstract

The use of spoken and written language is a fundamental human capacity. Individual differences in reading- and language-related skills are influenced by genetic variation, with twin-based heritability estimates of 30 to 80% depending on the trait. The genetic architecture is complex, heterogeneous, and multifactorial, but investigations of contributions of single-nucleotide polymorphisms (SNPs) were thus far underpowered. We present a multicohort genome-wide association study (GWAS) of five traits assessed individually using psychometric measures (word reading, nonword reading, spelling, phoneme awareness, and nonword repetition) in samples of 13,633 to 33,959 participants aged 5 to 26 y. We identified genome-wide significant association with word reading (rs11208009, P = 1.098 × 10−8) at a locus that has not been associated with intelligence or educational attainment. All five reading-/language-related traits showed robust SNP heritability, accounting for 13 to 26% of trait variability. Genomic structural equation modeling revealed a shared genetic factor explaining most of the variation in word/nonword reading, spelling, and phoneme awareness, which only partially overlapped with genetic variation contributing to nonword repetition, intelligence, and educational attainment. A multivariate GWAS of word/nonword reading, spelling, and phoneme awareness maximized power for follow-up investigation. Genetic correlation analysis with neuroimaging traits identified an association with the surface area of the banks of the left superior temporal sulcus, a brain region linked to the processing of spoken and written language. Heritability was enriched for genomic elements regulating gene expression in the fetal brain and in chromosomal regions that are depleted of Neanderthal variants. Together, these results provide avenues for deciphering the biological underpinnings of uniquely human traits.


The processing and production of complex spoken and written language are capacities that appear to be distinct to our species (1). Such skills have become fundamental for day-to-day life in modern society. Decades of family and twin studies have revealed substantial genetic components contributing to individual variation in reading- and language-related traits as well as to susceptibility to associated disorders (2). A recent meta-analysis integrated available data on these skills from 49 twin studies, with a total sample size of 38,000 children and adolescents aged 4 to 18 y. The meta-analysis yielded heritability estimates of 66% for word reading (meta-analysis of 48 studies), 80% for spelling (15 studies), and 52% for phoneme awareness (the ability to identify and manipulate individual sounds of spoken words; 13 studies) and suggested greater genetic influences on reading-related abilities than language-related measures (twin heritability of 34%; meta-analysis of 10 studies with measures on receptive and expressive vocabulary, oral language, and naming abilities) (3).

Linkage mapping and targeted candidate gene studies have reported associations of SNPs and/or genetic loci with reading- and language-related traits as well as with disorders such as dyslexia and developmental language disorder (DLD), which encompasses the older definition of specific language impairment (SLI) (4). However, replication efforts have been met with limited success (4). Moreover, for the language sciences, in contrast to other areas of human genetics, there have so far been few genome-wide association studies (GWASs), in which SNPs at millions of points across the genome are systematically screened for association with the trait of interest in large datasets comprising thousands of individuals (2). GWAS efforts are beginning to identify SNPs that show genome-wide significant associations with reading- and language-related traits: rs7642482 near ROBO2 associated with expressive vocabulary in infancy (5); rs17663182 within MIR924HG with rapid automatized naming of letters (6); and rs1555839 near RPL7P34 with rapid automatized naming and rapid alternating stimulus, deficits of which are often implicated in dyslexia (7). Nonetheless, insights into the genomic underpinnings of these types of skills from GWAS approaches have thus far been limited, which may reflect low power due to the relatively small sample sizes of the cohorts, such that the majority of genetic variance remains unexplained. Sample sizes have remained limited because of the labor-intensive assessment methods required for phenotypic characterization of reading- and language-related traits, which are difficult or even impossible to replace with simple questionnaires. Yet, well-powered GWAS efforts that characterize the molecular genetic variation involved in reading- and language-related traits have the potential to provide novel perspectives on the biological bases and origins of human cognitive specializations (8).

Here, we present large-scale GWAS meta-analyses of a set of reading- and language-related traits, measured with psychometric tools. We captured variation across the phenotypic spectrum, extending beyond disorder. Our study focused on traits assessed using continuous measures in multiple cohorts from the international GenLang network (https://www.genlang.org/) together with several public datasets that have data available for the relevant phenotypes matched to genome-wide genotype information. Five quantitative traits were identified for which phenotype data could be aligned across different cohorts to yield sufficiently large sample sizes for GWAS: word reading, nonword reading, spelling, phoneme awareness, and nonword repetition. Univariate GWAS meta-analyses were performed for each of the phenotypes to identify genetic variation influencing these traits and to model genetic overlaps between them. For comparative purposes, a GWAS meta-analysis for performance intelligence quotient (IQ) was also performed in the same dataset. Together with publicly available GWAS summary statistics from prior studies of cognitive performance and educational attainment, these data were used to study genetic relationships between reading- and language-related traits, IQ, and educational attainment. A multivariate approach allowed us to optimize the power of GWAS meta-analysis for functional follow-ups, giving insights into the tissues, cell types, brain regions, and evolutionary signatures involved.

Results

Meta-Analyses of Quantitative Reading- and Language-Related Traits in 22 Cohorts.

Our study focused on five quantitative reading- and language-related traits: word reading accuracy, nonword reading accuracy, spelling accuracy, phoneme awareness, and nonword repetition accuracy (Table 1). These traits are thought to tap into a number of underlying processes involved in written and spoken language. For example, nonword reading relies heavily on basic decoding skills [translating graphemes one by one into phonemes (9)], while spelling utilizes lexical and orthographic knowledge [understanding of permissible letter patterns and how they are arranged in specific words (10)]. Phoneme awareness measures the ability to distinguish and manipulate the separate phonemes in spoken words (11). Nonword repetition tasks tap into speech perception, phonological short-term memory, and articulation (12). A total of 22 cohorts aggregated by the GenLang Consortium combined with several publicly available datasets provided data for one or several of these traits (Datasets S1–S3 and SI Appendix, Figs. S1 and S2). The cohorts connected in the GenLang network either were originally ascertained through a proband with a language/reading disorder (DLD/SLI or dyslexia) or were sampled from the general population; all cohorts include quantitative phenotypic data gathered via validated psychometric tests as well as genome-wide genotype data from the tested individuals. Some of the samples are birth cohorts, and some involve family or twin designs. The phenotype data were collected across an array of different ages, test instruments, and languages (primarily English but also, Dutch, Spanish, German, French, Finnish, and Hungarian). We reduced heterogeneity of assessment age by excluding individuals over 18 y of age (except for three cohorts) (SI Appendix, Extended Methods) and where phenotype data were available from the same participant at multiple ages, by choosing the age that matched best with the assessment ages of the largest cohort(s). We limited the heterogeneity introduced by different test instruments by only including those that measured the phenotypes described in our analysis plan (SI Appendix, Supplemental Notes) and in cases where data from more than one test instrument were available, by selecting the test instrument that was used by the largest number of cohorts.

Table 1.

Phenotypes and sample sizes of the GWAS meta-analyses

Trait Phenotype description Meta-analysis total sample Meta-analysis European ancestry only
No. of cohorts No. of individuals No. of cohorts No. of individuals
Word reading Number of correct words read aloud from a list in a time-restricted or unrestricted fashion 19 33,959 18 27,180
Nonword* reading Number of nonwords read aloud correctly from a list in a time-restricted or unrestricted fashion 13 17,984 12 16,746
Spelling Number of words correctly spelled orally or in writing after being dictated as single words or in a sentence 15 18,514 14 17,278
Phoneme awareness Number of words correctly altered in phoneme deletion/elision and spoonerism tasks 12 13,633 11 12,411
Nonword* repetition Number of nonwords or phonemes repeated aloud correctly 10 14,046 10 12,828

*A nonword is a group of phonemes that looks or sounds like a word, obeys the phonotactic rules of the language, but has no meaning.

We evaluated whether, despite efforts to minimize this, heterogeneity related to age and/or use of different test instruments remained in our data. To do so, we generated GWAS meta-analysis results for word and nonword reading stratified by age or test instrument (Dataset S4). We tested those data for genetic correlations using linkage disequilibrium score regression (LDSC) (13), determining to what extent the same common genetic variation is accounting for phenotypic variability in the different stratified GWAS datasets (detailed in SI Appendix, Supplemental Notes). There was limited heterogeneity of GWAS meta-analysis results, as also evident in the Cochran Q statistics (SI Appendix, Fig. S3) and LDSC ratios (Dataset S4) for all traits except nonword repetition.

We also assessed effects of sex, observing high genetic correlations for female- and male-only subsets (SI Appendix Supplemental Notes, Dataset S4). Given the lack of genetic heterogeneity, the remainder of the study involved analyses of the full non-stratified dataset, to ensure largest available sample size and maximal power.

A Genome-Wide Significant Locus Associated with Word Reading.

We performed univariate GWAS meta-analyses for the reading-/language-related traits in the full GenLang dataset as follows. For each phenotype, associations between SNPs and the quantitative trait were calculated in every cohort separately, then combined into a meta-analysis for that phenotype. This yielded five separate univariate GWAS datasets, one for each trait. For evaluating statistical significance of SNP associations, we determined an appropriate threshold that was adjusted not only for genome-wide screening (P = 5 × 10−8), but also for multiple testing based on the correlation structure of our five reading-/language-related traits, as estimated using phenotypic Spectral Decomposition (phenoSpD; Materials and Methods). The significance threshold for assessing the GWAS meta-analysis results was thus set to P = 5 × 10−8/2.15 independent traits = 2.33 × 10−8. We identified a genome-wide significant locus associated with word reading (rs11208009 C/T on chromosome 1, P = 1.10 × 10−8, beta = 0.048, SE = 0.008) (SI Appendix, Fig. S5). Notably, rs11208009 has not shown association with general cognitive performance or educational attainment, while other SNPs in linkage disequilibrium (LD) with rs11208009 (r2 > 0.6) have been associated with triglyceride and total cholesterol levels in blood in previous GWAS (Dataset S5). Three genes are located in the vicinity of rs11208009 and SNPs in LD (r2 > 0.6): DOCK7, encoding a guanine nucleotide exchange factor important for neurogenesis (14); ANGPTL3, which encodes a growth factor specific for the vascular endothelium that is expressed specifically in the liver (15); and USP1, encoding a deubiquitinating enzyme specific for the Fanconi anemia pathway (16). The associated locus harbors an expression quantitative trait locus regulating DOCK7 and ATG4C [another nearby gene that encodes an autophagy regulator (17)] in the cerebellum and DOCK7, ATG4C, and USP1 in blood (Dataset S6). Genome-wide significant loci were not identified for the other traits. Dataset S7 lists all results with P < 1 × 10−6.

Traits Related to Written and Spoken Language Are Highly Correlated at the Genetic Level.

The individual effect size of our genome-wide significant hit is small, as is typical for genetically complex traits. We went on to make use of the complete GWAS signal considered in aggregate across the genome to gain insights into the genetic architecture of the reading-/language-related traits as well as relationships with other aspects of human biology. First, for each phenotype, we estimated SNP-based heritability: the proportion of trait variability explained by the SNPs included in the GWAS. All five traits showed significant SNP-based heritability, with LDSC-based estimates ranging from 0.13 for nonword repetition to 0.26 for nonword reading (Dataset S4), indicating that the captured common genetic variation accounts for a substantive proportion of the phenotypic variance. These observations allowed for follow-up analyses that are dependent on significant SNP heritability, including estimates of genetic correlations: a measure that quantifies the overall genetic similarity between two complex traits. Pairwise genetic correlation analyses showed significant overlap among the reading- and language-related traits (Fig. 1A and Dataset S4). Genetic correlation estimates were especially high for word reading, nonword reading, spelling, and phoneme awareness, ranging from 0.96 (SE = 0.07) to 1.06 (SE = 0.07).

Fig. 1.

Fig. 1.

Reading- and language-related traits have a shared genetic architecture that is largely independent of performance IQ. (A) Genetic correlations (rg) among the reading- and language-related traits estimated with LDSC. Estimates are capped at one. Full LDSC results are reported in Dataset S4. In addition, genetic correlations are given between the GenLang traits and 1) performance IQ (using GenLang cohorts only); 2) educational attainment (EA; n = 766,345) and full-scale IQ (n = 257,828) (18); 3) noncognitive abilities involved in EA, resulting from a recent GWAS by subtraction study (n = 510,795) (19); and 4) components associated with distinct performance domains identified used a decomposition analysis of Danish school grades (n = 30,982) (20). Full results can be found in Dataset S8. *Significant genetic correlation after correction for 18.28 independent comparisons (P < 2.74 × 10−3); **P < 2.74 × 10−4; ***P < 2.74 × 10−5. (B) Three-factor model fitted to the GenLang summary statistics for word reading, nonword reading, spelling, phoneme awareness, nonword repetition, and performance IQ and to published GWAS summary statistics for full-scale IQ and EA (18) using GenomicSEM (21). Black and gray paths represent factor loadings with P < 0.05 and P > 0.05, respectively. Standardized factor loadings are shown, with SE in parentheses. The subscript g represents the genetic variables; the u variables represent the residual genetic variance not explained by the models. Unstandardized results and model fit indices are reported in Dataset S9.

Prior literature has shown phenotypic correlations of reading- and language-related traits with general cognitive performance and educational attainment. Most cognitive assessments depend on a combination of verbal and nonverbal tests. To enable the investigation of genetic overlaps between nonverbal cognitive performance and reading- and language-related traits while closely matching the sample characteristics of our study, we carried out a GWAS meta-analysis of performance IQ in the GenLang network (n = 18,722) (SI Appendix, Figs. S1–S3). Only nonverbal subtests of general intelligence tests were used in this analysis (Dataset S1). Summary statistics were also obtained from another three sources: 1) genome-wide studies of full-scale IQ (based on both verbal and nonverbal tasks; n = 257,828) and educational attainment (n = 766,345) by the Social Science Genetic Association Consortium (18); 2) a GWAS by subtraction study that investigated the noncognitive abilities involved in educational attainment (n = 510,795) (19); and 3) a recent GWAS analysis of school grades in the Danish Integrative Psychiatric Research (iPSYCH) cohort (n = 30,982) that used a decomposition analysis to identify genetic associations with distinct domains of performance (20).

The five GenLang reading-/language-related traits showed moderate to strong positive genetic correlations with full-scale IQ (range = 0.52 to 0.77), educational attainment (range = 0.54 to 0.68), and school performance (range = 0.54 to 0.81) (Fig. 1A and Dataset S8). Interestingly, genetic correlations with full-scale IQ were substantially higher for word reading (95% CI = 0.70 to 0.85) than nonword reading (95% CI = 0.50 to 0.68), likely reflecting the importance of reading skills for verbal tests of cognition. Genetic correlations of reading-/language-related traits with performance IQ (range = 0.20 to 0.35) were much lower than those for full-scale IQ, and the 95% CIs did not overlap. Indeed, only word reading showed a significant genetic correlation with performance IQ. These results indicate that reading-/language-related traits and IQ are at least partly based on distinct genetic factors. Significant trait-specific genetic correlations were observed for components 2 to 4 of the Danish school grade decomposition analysis (20). Component 2, reflecting relatively better school grade performance in language than mathematics (as compared with peers), was positively correlated with both GenLang reading traits. Component 3, reflecting relatively better school grade performance in oral than in written examinations, showed significant negative correlations with phoneme awareness and spelling. Lastly, component 4, reflecting relatively better school grade performance in Danish than in English, showed significant negative correlation with nonword repetition. The noncognitive abilities involved in educational attainment identified in the GWAS by subtraction study (19) showed small but significant positive genetic correlations with word reading, nonword reading, and nonword repetition.

Genomic structural equation modeling (GenomicSEM) (21) is a method that can use GWAS results to fit and compare models describing genetic overlaps between multiple phenotypes to further understand how traits are related. We used this method to model the shared genetic architecture of the five reading- and language-related traits together with performance IQ, full-scale IQ, and educational attainment (Dataset S9). Exploratory factor models with one to four factors were fitted to the data, of which the three-factor model explained the majority of the variance. The three-factor model was followed up using confirmatory factor analysis in GenomicSEM. In the final model (Fig. 1B), the first factor explains variation in nonword reading, spelling, phoneme awareness, word reading, and full-scale IQ. For the first three traits, there is no evidence for additional genetic influences, suggesting high genetic similarity. The second factor explains additional variation in full-scale IQ and is also related to performance IQ and educational attainment. The third factor explains variation in nonword repetition, word reading, and educational attainment. Factors 1 and 3 are highly correlated, indicating that the genetic architecture underlying word reading does not differ much from nonword reading, spelling, and phoneme awareness. Nonword repetition, on the other hand, is genetically more distinct, as indicated by evidence for specific genetic influences not captured by the model. Specific genetic influences were also evident for full-scale IQ and educational attainment. Thus, although reading- and language-related traits show genetic overlaps with full-scale IQ and educational attainment, the model indicates that these traits also have unique unshared components, in line with genetic correlation estimates that are lower than one.

Limited Evidence for Genes Previously Reported in Reading-/Language-Related Traits and Disorders.

The number of previous GWAS on reading-/language-related traits and disorders is small (Dataset S10), and these have identified very few associations exceeding genome-wide significance. In those prior studies, a total of 48 independent SNPs met a less stringent threshold of P < 1 × 10−6 in the respective GWAS. We ran lookups of each of those SNPs in our GenLang GWAS meta-analysis results (Dataset S10). Where SNP associations passed a threshold adjusted for multiple testing of 48 SNPs and 2.15 independent GenLang traits (P < 4.84 × 10−4), we then reran the association analyses after exclusion of the original cohort(s) in which the association was first identified to evaluate independent effects beyond those of the respective earlier study. According to these criteria, only one SNP, rs1555839, previously associated with rapid automatized naming in the Genes, Reading, and Dyslexia (GRaD) cohort (7) yielded a significant signal in the remainder of the GenLang cohorts, showing association with spelling (P = 3.33 × 10−4). This SNP is one of five SNPs that reached the threshold for genome-wide significance in the original GWAS of the GRaD cohort.

Some 20 genes have been described in the literature as candidate genes for reading-/language-related traits and disorders based on a range of mapping approaches and have been the focus of much of the prior published research in this area (4). Dataset S11 gives gene-based P values from our GenLang GWAS meta-analysis, calculated by Multi-marker Analysis of GenoMic Annotation (MAGMA) (22), for each of these genes. Variation in one, namely DCDC2, showed association with nonword reading that passed the significance threshold for multiple testing of 20 genes and 2.15 independent GenLang traits (P < 0.0012). DCDC2 was originally identified in a linkage region for dyslexia susceptibility, and SNPs in and near this gene were subsequently associated with dyslexia in candidate gene studies (23), although some investigations, including a meta-analysis of seven studies, failed to support this (24). No single candidate SNP highlighted in prior studies of DCDC2, nor in any other candidate gene, was significantly associated with any traits in the GWAS meta-analysis results after correction for testing of 54 SNPs and 2.15 independent GenLang traits (P < 4.31 × 10−4) (Dataset S11). These null findings highlight the importance of large-scale studies for robust identification of common DNA variation associated with reading-/language-related traits and disorders and warrant a reevaluation of the contributions of candidate genes that are prominent from prior literature.

A Multivariate GWAS Analysis of GenLang Traits Maximizes SNP Heritability.

To improve the power of our GWAS meta-analysis for follow-up analyses, we took advantage of the high genetic correlations between the traits by performing a multivariate GWAS analysis of word reading, nonword reading, spelling, and phoneme awareness with the Multi Trait Analysis of GWAS (MTAG) method (25). This approach improves the effect estimates of univariate GWAS results per SNP by incorporating information from the other genetically correlated traits. Although MTAG generates output for each primary input trait, these were extremely similar as a consequence of the particularly high genetic correlations between the traits. Hence, the multivariate results for word reading were used for all follow-up analyses because the univariate word reading GWAS meta-analysis has the largest sample size (27,180 for the European ancestry analysis compared with 12,411 to 17,278 for the other three traits). Although no individual SNP reached genome-wide significance in this multivariate analysis (SI Appendix, Fig. S4), the approach improved power for follow-up analyses in three ways: 1) by incorporating all available data from the different traits without increasing multiple testing burden; 2) by reducing error variance and thereby, increasing the proportion of phenotypic variability captured by genetic variations, with SNP-based heritability that was higher than any of the univariate estimates with a smaller SE (0.29, SE = 0.02); and 3) because joint analysis of the different traits maximized the effective sample size of the dataset. Indeed, MTAG estimated the GWAS equivalent sample size of the multivariate results as 41,783 compared with 27,180 for the corresponding univariate results. These multivariate GenLang GWAS results were used for follow-up analyses utilizing not only genetic correlation approaches, but also other genome-based methods, such as gene property analysis and heritability partitioning, further explained below.

Assessing Links to Genetics of Neuroanatomical Variation in Reading-/Language-Related Circuitry.

The neurobiological circuitry involved in spoken and written language has been extensively investigated in prior literature, from pioneering postmortem studies of brain lesions through to the noninvasive structural and functional neuroimaging research that is now standard for the field (26, 27). Notably, multiple studies have identified relationships between measures of neuroanatomical features and performance on reading-/language-related tasks through analyses of developmental changes, of individual differences, and of relevant disorders (28, 29). At the same time, a growing number of large-scale GWAS efforts have reported robust relationships between common DNA variation and individual differences in multiple aspects of human neuroanatomy, including brain volumes, surface area and thickness of different cortical regions, and white matter microstructure, among others (30, 31). The availability of GWAS data from behavioral/cognitive phenotypes and from magnetic resonance imaging (MRI)-based measures of neuroanatomy makes it possible to determine genetic overlaps between brains and behavior, even when the study cohorts are independent (31, 32). Here, we applied this strategy to assess genetic relationships between reading-/language-related measures and neuroanatomical variation, as follows. First, we performed a literature review to select structural neuroimaging traits that 1) encompass brain regions and white matter tracts with known links to aspects of reading and language and 2) have been investigated with GWAS in the large UK Biobank resource (Materials and Methods). This yielded 58 structural neuroimaging traits, including surface-based morphometry (surface area and thickness) (SI Appendix, Fig. S6) phenotypes and diffusion tensor imaging (mean and weighted mean fractional anisotropy) (SI Appendix, Fig. S7) results. Second, as many of these brain-based phenotypes were significantly correlated with each other, genetic correlations among their summary statistics were used to calculate the number of independent traits for multiple testing correction. Third, we performed genetic correlation analyses between the multivariate GenLang GWAS results and the 58 neuroimaging traits to identify genetic overlaps. One neuroimaging trait showed significant genetic correlation with the multivariate GenLang results (P < 0.05/24.85 independent traits = 2.01 × 10−3): the surface area of the banks of the superior temporal sulcus (STS) of the left hemisphere (rg = 0.21, SE = 0.06) (Fig. 2 and Dataset S12). This finding suggests the existence of shared genetic factors that contribute both to left STS surface area and to reading-/language-related skills, albeit without identifying which SNPs underlie the relationship. Functional MRI studies have linked this region to different aspects of written and spoken language processing (3336). To investigate further whether the five different reading- and language-related traits show similar or diverse genetic correlations for banks of the left STS, we went on to specifically assess the results from the original univariate GenLang GWAS meta-analyses, finding consistent genetic correlations (range = 0.18 to 0.23) (Dataset S12).

Fig. 2.

Fig. 2.

The multivariate GenLang GWAS results show significant genetic correlation with the cortical surface area around the left STS. Genetic correlations (rg) were estimated with LDSC. Included traits are 58 structural brain imaging traits from the UK Biobank selected based on known links of regions and circuits with language processing. The results of the 22 cortical surface areas are shown; gray areas were not included in the analysis. Full results can be found in Dataset S12 and SI Appendix, Figs. S6 and S7. *Significant genetic correlation after correcting for 24.85 independent brain imaging traits (P < 2.01 × 10−3).

Genetic Correlation with Traits from the UK Biobank and Brain-Related Traits from the LD Hub.

Next, we assessed genetic correlations of our multivariate GWAS results with 20 cognitive, education, neurological, psychiatric, and sleeping-related traits and 515 additional UK Biobank traits using LD Hub. To further investigate overlaps with and differences from IQ, genetic correlations between these 535 traits and the published GWAS summary statistics for full-scale IQ (18) (n = 257,828) were obtained as well. A total of 143 traits showed significant genetic correlations with the multivariate GenLang GWAS results after correction for multiple testing [P < 0.05/(535 × 2) = 4.67 × 10−5] (Dataset S13), while 245 traits were genetically correlated with full-scale IQ; 135 traits showed significant correlations with both our multivariate GWAS and full-scale IQ. Traits with strong genetic correlations were related to education, eyesight, chronotype, well-being, lifestyle, physical health and exercise, and socioeconomic status. Representative traits are plotted in Fig. 3. Cognitive and education-related traits showed higher genetic correlations with full-scale IQ than with the multivariate GenLang GWAS data. Several psychiatric and well-being traits showed significant (negative) genetic correlations with full-scale IQ but not with our multivariate GWAS: for example, depressive symptoms; cross-disorder susceptibility (from the Psychiatric Genomics Consortium GWAS); and tense, hurt, and nervous feelings. In contrast, several traits related to physical health and lifestyle showed larger genetic correlations with the multivariate GenLang GWAS results than with full-scale IQ, including body mass index, reduced alcohol intake as a health precaution, and usual walking pace.

Fig. 3.

Fig. 3.

Genetic correlation results of the multivariate GenLang GWAS analysis with comparisons with those for the largest published GWAS of full-scale IQ in LD Hub. Summary statistics for full-scale IQ (n = 257,828) were obtained from the Social Science Genetic Association Consortium (18). Genetic correlations between the multivariate GenLang results (blue–green), full-scale IQ (purple), and traits in LD Hub reveal an overlap with cognitive traits, education, eyesight, chronotype, lifestyle, well-being, psychiatric disorders, pain, physical health and exercise, and socioeconomic status. A subset of representative traits is shown; 143 traits showed significant associations with the multivariate GenLang results, and 245 traits showed significant correlations with full-scale IQ, of which 135 traits overlap after correction for multiple testing for 535 × 2 traits (P < 4.67 × 10−5). Significant correlations are shown in dark colors; nonsignificant correlations are in light colors. Full results can be found in Dataset S13. UKBB: UK Biobank, GCSE: General Certificate of Secondary Education, NVQ: National Vocational Qualification, HND: Higher National Diploma, HNC: Higher National Certificate, CSE: Certificate of Secondary Education, PGC: Psychiatric Genomics Consortium, BMI: body mass index, SES: socioeconomic status. Genetic correlation (rg) is presented as a dot, and error bars indicate the SE.

Analysis of Genomic Annotations Related to Human Evolution.

Writing and reading are relatively recent cultural innovations, but multiple lines of evidence indicate that the relevant skills, especially those involved in decoding as assessed here, are based on our capacities for spoken language, which emerged through biological evolution along the lineage that led to humans (37, 38). As noted above, our multivariate GWAS captured common DNA variation accounting for a substantial proportion of interindividual variability in reading-/language-related traits within our study populations. We went on to test in aggregate whether there were overlaps between the genomic regions driving this association signal and regions involved in aspects of human evolution over a range of timescales. To do so, we used LDSC heritability partitioning (39), a method that uses GWAS results to investigate whether common DNA variants in a certain set of genomic regions, named an annotation, explain a larger proportion of the SNP-based heritability of the trait than is expected based on the size of that annotation. Building on prior work on evolution of human brain structure (40), we studied five annotations reflecting different aspects of human evolution spanning periods from 30 Mya to 50,000 y ago (SI Appendix, Fig. S8). The tested annotations included human gained enhancers active in fetal and adult brain tissue, ancient selective sweep regions, Neanderthal introgressed variants, and archaic introgression deserts (details are in SI Appendix, Extended Methods). The latter two annotations relate to admixture events between Homo sapiens and Neanderthal populations that took place when the different hominins encountered each other outside Africa some 50,000 to 60,000 y ago, with the consequent gene flow leaving remnants (introgressed fragments) that can be detected in the genomes of living humans. Archaic deserts are long stretches in the human genome that, despite these admixture events, are significantly depleted for Neanderthal alleles in living humans, possibly due to critical functions of the genetic loci in H. sapiens and intolerance to gene flow (41). We observed significantly enriched heritability for archaic deserts, which was robust to multiple testing correction for analysis of five annotations (P < 0.01) (Dataset S14). Thus, our results suggest that common DNA variation in archaic deserts makes a larger contribution to individual differences in reading-/language-related traits within present-day populations than expected by chance.

Functional Enrichment Using Heritability Partitioning and MAGMA Gene Property Analysis.

We next investigated whether regions of the genome with tissue-specific functions are involved in reading-/language-related traits, again using LDSC heritability partitioning. We used annotations reflecting chromatin signatures from a broad range of tissues, as most variants identified in GWAS are located outside coding regions and are often found enriched in tissue-specific functional regions of the genome, such as promoters, enhancers, and regions with open chromatin. After correction for testing of 489 annotations (P < 1.02 × 10−4), three annotations showed significant heritability enrichment: histone-3 lysine-4 monomethylation (H3K4me1) in two fetal brain samples and the adult brain germinal matrix (SI Appendix, Fig. S9 and Dataset S14). H3K4me1 is considered a marker for enhancer regions. These results indicate that SNPs associated with reading-/language-related traits are overrepresented in fetal brain enhancers.

Next, we used MAGMA gene property analysis (22) to study whether the multivariate GenLang GWAS results were enriched in a specific tissue or brain cell type using tissue-specific and cell type–specific gene expression data in Functional Mapping and Annotation (FUMA) (42, 43). As MAGMA corrects for average expression, each comparison can only answer the question of whether the tissue or cell type is more related to the multivariate GWAS results than the average of the tissues or cell types in the dataset. After correction for 83 tissues (P < 6.02 × 10−4), no relation was found with tissue-specific gene expression patterns of adult tissues from the Genotype-Tissue Expression (GTEx) project and brain tissues of a specific (developmental) time from Brainspan (Dataset S15 and SI Appendix, Fig. S10). In the cell type–specific gene expression analysis, three single-cell RNA-sequencing datasets of embryonic, fetal, and adult brain tissue were used. After correction for 142 cell types (P < 3.52 × 10−4), the multivariate GWAS results were significantly associated with one of the mature neurons from the fetal dataset: red nucleus neurons (beta = 0.24, SE = 0.07) (Dataset S15 and SI Appendix, Fig. S11). This observation could reflect an association with this specific nucleus or with the higher maturity of the neurons compared with the other cell types in the fetal dataset. The red nucleus is a large structure in the ventral midbrain that is part of the olivocerebellar and cerebello–thalamo–cortical systems. It plays roles in locomotion and nonmotor behavior in various animals, and in humans, it might also play a role in higher cortical functions (44).

Discussion

We performed GWAS meta-analysis of five quantitative reading- and language-related traits (word reading, nonword reading, spelling, phoneme awareness, and nonword repetition) in sample sizes (up to ∼34,000 participants) that are substantially larger than previous genetic analyses of reading and/or language skills assessed with neuropsychological tools (prior GWAS efforts with maximal sizes of n = 1,331 to 10,819) (57, 4547). We identified genome-wide significant association for word reading (rs11208009 at chromosome 1, P = 1.10 × 10−8), highlighting DOCK7, ATG4C, ANGPTL3, and USP1 as potential candidates for involvement in this trait. Other SNPs from the locus have been associated with triglyceride and cholesterol levels (48) but may represent an independent association signal. Robust SNP-based heritabilities were observed, ranging from 0.13 for nonword repetition to 0.26 for nonword reading. These SNP-based heritabilities are similar in magnitude to those of the related trait dyslexia (estimates range from 0.15 to 0.25 on a liability scale) (49, 50); psychiatric traits, such as Attention Deficit Hyperactivity Disorder symptoms, in adults (0.22) (51); and brain imaging traits, such as cortical surface area (range from 0.12 to 0.33 for different regions) (52) and cortical thickness (range from 0.08 to 0.26) (52). They are larger than that of psychiatric traits, such as major depression (0.08) (53) and alcohol dependence (0.09) (54). So, despite highlighting the need for larger sample sizes to identify loci that individually exceed genome-wide significance, our GenLang GWAS already allowed for multiple informative follow-up analyses based on the full dataset of common variants across the genome, yielding findings that connect to neuroanatomical variation, human evolutionary history, and other aspects of the biology of spoken and written language.

Our work shows overlapping contributions of common genetic variants to individual differences in reading-/language-related and cognitive traits. Such overlaps are evident both from genetic correlation analyses and from the three-factor structural equation model, in which one factor explains most of the variation in word reading, nonword reading, spelling, and phoneme awareness. This is in line with the widespread pleiotropy found between many aspects of cognitive functioning, including language, reading, mathematics, and general cognitive ability (6, 55). We note that summary statistics from the current investigation have also been used to investigate genetic overlaps with self-report of dyslexia diagnosis in an independent GWAS by 23andMe (∼52,000 cases), yielding substantial negative genetic correlations between GenLang quantitative traits and dyslexia status (e.g., −0.71 for word reading, −0.75 for spelling) as reported by Doust et al. (50). Yet, nonword repetition, IQ, and educational attainment have, at least in part, different genetic foundations, as reflected in the residual genetic variation contributing to these traits that is not captured by the model. These findings are consistent with multiple behavioral studies showing a distinction between nonword repetition and other reading- and language-related traits (56). They are also in line with recent structural equation modeling of genetic trait interrelatedness for different reading- and language-related measures in the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort, which demonstrated a shared genetic factor accounting almost fully for the genetic variance in literacy-related phenotypes but for only 53% of that in nonword repetition (57). The 23andMe GWAS on self-reported dyslexia further shows the existence of effects on reading ability that are independent of IQ; of 42 genome-wide significant loci in that study, less than half had previously been associated with cognitive ability or educational attainment in prior high-powered investigations (50). Overall, our work thus enhances understanding of not only the overlaps but also, the distinctions between different reading-/language-related traits and more general cognitive abilities.

Further evidence of differences in trait etiology between the five reading-/language-related traits was observed in our genetic correlation analysis with GWAS results of components (identified via decomposition analysis) from school grades in the Danish iPSYCH cohort (20) and with results of a GWAS by subtraction analysis of educational attainment and cognitive performance (19). Higher scores on phoneme awareness and spelling appear to be genetically correlated with better performance in written than in oral examinations. This may reflect the greater importance of phoneme awareness and spelling for proficient writing than for oral language. Interestingly, word and nonword reading skills were associated with better performance in language than mathematics but not with better performance in written than oral language. Such findings may offer further proof that key reading skills, especially those involved in decoding as assessed in our GWAS meta-analyses, originate from oral language skills (58). “Component 4,” corresponding to relatively better performance in Danish (the native language of the participants in that study) as compared with performance in English, showed negative genetic correlations with our GenLang nonword repetition measure, possibly reflecting the particular importance of verbal short-term memory in second-language learning (59, 60). The results of the GWAS by subtraction, previously proposed to represent so-called “noncognitive” abilities related to educational attainment, such as motivation, curiosity, and persistence, were genetically correlated with word and nonword reading as well as nonword repetition in GenLang. Of note, genetic correlation analyses may be influenced by genetic nurture, the process whereby DNA variants of the parents affect phenotypic outcomes in their children (61). Genetic variants relating to the socioeconomic status of the family may, for example, be involved, as was recently found for cognitive traits (62). Future investigations that include information about nontransmitted alleles (63) and/or data from siblings (62) may help to disentangle pleiotropy from genetic nurture.

Human abilities to process spoken and written language depend on an array of distributed brain circuits (28, 29, 6466). We performed genetic correlation analyses of our multivariate GenLang GWAS with summary statistics from 58 MRI-based neuroanatomical phenotypes chosen because they concerned brain areas and/or tracts with known links to language processing (28, 29, 65, 66). We identified a significant genetic correlation with cortical surface area of the banks of the STS on the left hemisphere. The STS is a region where the processing of spoken and written language converges, in between modality-specific preprocessing and language comprehension (3336). A broad range of language-related functions has been previously linked with the left STS through (meta-analyses of) functional MRI and positron emission tomography studies, including those essential for the reading- and language-related traits included in the GWAS meta-analyses: sublexical processing of speech (67, 68) and representation of phonological word forms (69). The importance of this brain area for reading-related traits is also evident from a meta-analysis of structural MRI studies that found lower gray matter volume in the left STS related to reading disability and poor reading comprehension (70). Thus, findings from the genetic correlation analysis are consistent with the role of the STS as a hub where the processing of different language modalities gets integrated as well as the lateralization of such functions. Note, however, that while our work supports the existence of shared genetic factors influencing both left STS surface area and psychometric measures of reading-/language-related skills, it does not inform about potential causal relationships or direction of effects, nor does it identify which particular molecular mechanisms may be involved.

Capacities for acquiring spoken and written language appear unique to our species, building on underlying skills that emerged on the lineage leading to modern humans, but evolutionary accounts remain subject to considerable debate (37, 38). To explore whether GWAS data could give insights in this area, we used heritability partitioning to analyze five annotations representing different time frames and aspects of human evolution. Archaic introgression deserts, defined as genomic regions that are significantly depleted of Neanderthal ancestry, were enriched for genetic variants showing associations in our multivariate GenLang GWAS. Such regions are thought to correspond to genomic loci that were intolerant to the gene flow from Neanderthal populations into H. sapiens, which took place through admixture events around 50,000 to 60,000 y ago (41). These loci are enriched for conserved and functional genomic elements: promoters and regions conserved in primates (71) and enhancers active in many tissues as well as those specific for expression in fetal brain and muscle (72). Of all regulatory regions in archaic deserts, brain enhancers show signs of the most stringent purifying selection against introgressed Neanderthal variation. This evidence of conservation and selection indicates that archaic deserts mark parts of the genome where variation has a high probability of deleterious consequence (72). Enriched heritability for reading-/language-related traits in archaic deserts seems broadly consistent with differences between Neanderthals and H. sapiens in evolutionary trajectories for language emergence. However, analyses of this kind are only indirectly informative and cannot pinpoint any specific time frame of human evolution during which genetic variants associated with reading- and language-related traits were introduced. Investigations that further integrate data on reading-/language-related genetic signals and evolutionary annotations of hominin genomes would be warranted to shed further light on these complex questions.

Regarding functional implications of our findings, heritability partitioning of the multivariate GenLang GWAS results identified enrichment in enhancer regions present in fetal brain tissue and the adult germinal matrix. The enhancer regions of the germinal matrix are highly similar to those of fetal brain tissues and not the other adult brain tissues we studied, likely reflecting the neural stem cell population present in that tissue (73). In these analyses, there was no specific association with one particular brain cell type. However, follow-up work with MAGMA using single-cell RNA-sequencing data from fetal, embryonic, and adult brain uncovered a significant association with fetal neurons from the red nucleus, which may relate to the more adult state of these neurons compared with the other cell types in the fetal dataset (74). The MAGMA analysis could not be used to test for replication of the association with (fetal) brain tissue of the LDSC heritability partitioning, as results in this case are corrected for the association with the average expression of the dataset (22, 42), and the datasets with fetal data only included brain samples.

Our findings prompt a major reevaluation of prior literature on genetic associations with reading and language traits, especially with regard to candidate gene studies. The large-scale GWAS meta-analysis results made it possible to robustly and systematically investigate evidence for association of previously reported candidate SNPs/genes and suggestive genome-wide screening results from prior studies of reading-/language-related traits and disorders. Of the 54 candidate SNPs and 20 candidate genes that we assessed (none of which met genome-wide significance), only DCDC2 yielded an association that survived correction for multiple testing in the context of targeted replications. This locus showed association only at the gene-based level and with one trait: nonword reading. Some previously reported associations in the literature could reflect the specific language, phenotype, or recruitment procedure of the cohort in which the gene or variant was investigated and/or differences between contributions of common and rare variation at a locus of interest. Yet, the lack of support here also suggests that false-positive results have made an impact on the field, most likely related to limited sample size in prior reports, which is known to elevate the risk of type 1 error (75). Regarding validation of findings from the previous more limited GWAS of quantitative reading-/language-related traits, our large-scale meta-analysis identified association with spelling for the SNP rs1555839, previously found to be associated with rapid automatized naming and rapid alternating stimulus (7). Overall, these results highlight the need for a genome-wide perspective and the importance of large well-powered samples if we are to obtain reliable insights into the role of common genetic variants in language- and reading-related traits.

Reading- and language-related phenotypes pose special challenges for scaling up genetic analysis since psychometric assessments can be labor intensive to administer and score and because of the heterogeneity introduced by differences in assessment tools, ages, populations, and languages, among other factors (2, 76). One-item questions have enabled increases in sample size for GWAS of a wide range of traits and disorders, especially when available through large resources, such as the UK Biobank. As discussed above, a single self-report item about dyslexia diagnosis enabled an informative GWAS in the 23andMe research cohort (50). However, no validated questions have yet been described that adequately capture interindividual variability in reading and language skills in the normal range, which still requires administration of psychometric tests. The GenLang Consortium was established as an international effort by multiple research teams with the aims of overcoming such difficulties through a range of strategies and enabling large-scale well-powered investigations of genomic underpinnings of these important traits.

This first wave of analysis from GenLang represents the largest GWAS meta-analyses for direct quantitative assessments of reading- and language-related abilities to date, including 22 cohorts with data available for at least one of the phenotypes. Nonetheless, although substantially increased over prior work in this area, sample sizes may still be considered relatively modest compared with the state of the art for genetic association analyses of other complex traits. While they captured a significant proportion of the genetic variation underlying each phenotype, yielding several insights into the associated biology, detection of individual genome-wide significant loci was still limited. In addition, when sufficiently large datasets become available, it would be valuable to validate the genetic correlation and heritability partitioning results in additional independent datasets. A number of phenotypes of interest, including those that tap into syntactic skills or that involve different modalities such as sign languages, could not (yet) be pursued due to inadequate sample sizes, even when combining data available from multiple cohorts. We note that despite our best efforts at harmonizing the included datasets and limited evidence of heterogeneity in the results based on Cochran Q statistics, LDSC intercepts, and genetic correlations between subsets of the data, we cannot fully exclude that heterogeneity is introduced by the inclusion of data from different assessment tools, languages, and ages. The choice of assessment tools for future collection of reading- and language-related phenotypes for genomic studies, to increase the sample sizes of these GWAS meta-analyses and also, to collect additional language-related phenotypes, should, therefore, be based at least partially on optimal matching with existing data. At the same time, we should invest in facilitating and simplifying the collection of language-related phenotypes, in part by developing and optimizing test batteries that could be reliably administered online in web/app-based settings. These are major areas of focus for the GenLang Consortium moving forward.

In summary, our GWAS meta-analyses of five reading- and language-related phenotypes in sample sizes of up to ∼34,000 participants demonstrated significant SNP heritability for all traits and identified genome-wide significant associations with word reading on chromosome 1. Structural equation models revealed a single factor accounting for much of the genetic architecture underlying word reading, nonword reading, spelling, and phoneme awareness, prompting a multivariate GWAS analysis. The multivariate results were genetically correlated with cortical surface area of the banks of the left STS, a brain region where the processing of spoken and written language comes together. Partitioned heritability analyses showed enrichments in fetal brain enhancers, highlighting links to early brain development, and in archaic deserts depleted of Neanderthal ancestry, suggesting that genomic regions associated with emerging language-related skills in H. sapiens may have been intolerant to gene flow from other archaic hominins. These efforts by GenLang open up avenues for deciphering the biological underpinnings of spoken and written language.

Materials and Methods

A more detailed version of the methods is provided in SI Appendix, Extended Methods.

Study Cohorts.

The meta-analyses included GWAS summary statistics from 22 independent cohorts (Dataset S1). Different measures for the reading- and language-related traits had been assessed in each cohort and could be included in the GWAS meta-analyses (SI Appendix, Supplemental Notes has details of each measure, and Dataset S1 has an overview of the included measures and sample sizes for each cohort). Data from children, adolescents, and young adults aged 5 to 26 y were included. The phenotype data were standardized and then, adjusted for covariates (age, age2, sex, and ancestry principal components [PCs]); the genotype data were subjected to stringent quality control according to a detailed analysis plan following standard procedures for GWAS and then imputed using the Haplotype Reference Consortium version 1.1 panel or the 1000 Genomes Project Phase 3 reference panel (Dataset S2). Univariate GWAS analyses were performed using linear regression methods with the imputed additive genotype dosages for the full dataset and for males and females separately.

Meta-Analyses.

The summary statistics for each GWAS cohort for each trait were subjected to stringent quality control measures (SNPs were excluded if imputation quality scores were <0.7, minor allele frequency was <0.01, and/or minor allele count was ≤10). Meta-analyses of the univariate GWAS results were performed with METAL (version March 2011), with effect size estimates weighted using the inverse of the corresponding SEs. For the follow-up analyses with LDSC, separate meta-analyses without genomic control correction (77) were performed using only individuals of European ancestry as determined by ancestry PCs. To accommodate the multiple testing burden present in performing separate meta-analyses for the five reading- and language-related traits while taking into account the high phenotypic correlations between them, we calculated the effective number of independent variables (VeffLi) from the meta-analysis results using PhenoSpD (78) (v1.0.0). The Bonferroni-corrected genome-wide significant P value threshold was determined at 2.33 × 10−8 (5 × 10−8/2.15 independent traits).

GenomicSEM.

To investigate the high genetic correlations between the summary statistics of the five reading- and language-related traits and with cognitive performance (18) and educational attainment (18), we used GenomicSEM (version 0.03) (21) to model the joint genetic architecture. We used a two-step approach. First, exploratory factor analysis for models with one to four factors was run; second, the model that explained the largest part of the variance in data was chosen for follow-up with confirmatory factor analysis in GenomicSEM. Different models were tested based on the exploratory three-factor model, with different factor loadings based on strength thresholds between 0.1 and 0.5, and the optimal model was chosen based on several indices of model fit.

Multivariate GWAS Analysis.

A multivariate GWAS was performed on the univariate GWAS summary statistics of the four traits with the highest genetic correlations (word reading, nonword reading, spelling, and phoneme awareness) using MTAG (v1.0.8) (25).

Heritability and Genetic Correlation.

LDSC (77) (v1.0.0) was used to estimate genomic inflation and SNP-based heritability of the meta-analysis results and to investigate genetic correlations (13). All analyses were based on HapMap 3 SNPs only, and precalculated LD scores from the European 1000 Genomes reference cohort were used. GWAS summary statistics for genetic correlation analyses with cognitive traits were obtained from the Social Science Genetic Association Consortium (18) and the GWAS catalog (79) and through collaboration with the iPSYCH Consortium (20). Publicly available GWAS summary statistics of neuroimaging traits were obtained via the Oxford Brain Imaging Genetics Server (30); a total of 58 neuroanatomical phenotypes were selected based on their relevance to language processing. PhenoSpD (78) was used to calculate the effective number of independent variables (VeffLi) to inform the multiple testing correction of the genetic correlation analyses with the cognitive and brain imaging phenotypes. Following our targeted analysis of brain imaging traits, genetic correlations were estimated between the MTAG results and summary statistics of 20 cognitive, education, neurological, psychiatric, and sleeping-related traits and all 515 UK Biobank traits available in LD Hub (80) (v1.9.3).

Functional Mapping and Annotation of GWAS Meta-Analysis Results.

The FUMA platform (version 1.3.6a) (43) was used to annotate the genome-wide significant variants.

Gene and Gene-Set Analysis.

MAGMA (22) (version 1.08) gene analysis in FUMA was used to calculate gene-based P values and for gene-property analyses, to study relationships with tissue-specific and cell type–specific gene expression patterns. Several bulk RNA-sequencing and single-cell RNA-sequencing datasets of brain samples were assessed through FUMA (43).

Partitioning Heritability of Chromatin and Evolutionary Signatures.

LDSC heritability partitioning (39) was used to estimate the enrichment of heritability of the MTAG results in annotations reflecting evolutionary signatures from different periods along the lineage leading to modern humans, ranging from around 50,000 y ago back to 30 Mya [adapting a pipeline recently published by Tilot et al. (40)]. In addition, LDSC heritability partitioning was used to study the association with several annotations reflecting tissue-specific chromatin modification patterns. Annotations were based on data from the Roadmap Epigenomics project and Enhancing GTEx project (ENTEx) processed by Finucane et al. (81).

Supplementary Material

Supplementary File
pnas.2202764119.sd01.xlsx (20.2KB, xlsx)
Supplementary File
pnas.2202764119.sd02.xlsx (15.2KB, xlsx)
Supplementary File
pnas.2202764119.sd03.xlsx (11.8KB, xlsx)
Supplementary File
Supplementary File
pnas.2202764119.sd04.xlsx (14.3KB, xlsx)
Supplementary File
pnas.2202764119.sd05.xlsx (12.2KB, xlsx)
Supplementary File
pnas.2202764119.sd06.xlsx (26.3KB, xlsx)
Supplementary File
Supplementary File
pnas.2202764119.sd08.xlsx (15.1KB, xlsx)
Supplementary File
pnas.2202764119.sd09.xlsx (15.1KB, xlsx)
Supplementary File
Supplementary File
pnas.2202764119.sd11.xlsx (32.7KB, xlsx)
Supplementary File
pnas.2202764119.sd12.xlsx (17.8KB, xlsx)
Supplementary File
pnas.2202764119.sd13.xlsx (76.4KB, xlsx)
Supplementary File
Supplementary File
pnas.2202764119.sd15.xlsx (25.1KB, xlsx)

Acknowledgments

We thank all the children, twins, families, and participants who took part and are taking part in the 22 cohorts whose data contributed to these GWAS meta-analyses; the staff working on the different cohorts, including volunteers, study coordinators, interviewers, teachers, nurses, research scientists, general practitioners, midwives, psychologists, psychometrists, computer and laboratory technicians, and colleagues who assisted in the quality control and preparation of the imputed GWAS data; and the pharmacies and hospitals that were involved. B.M., B.M.-M., B.S.P., C.F., E.E., E.V., G.A., M.v.D., and S.E.F. are supported by the Max Planck Society. A.G. and T.F.M.A. were supported by the Munich Cluster for Systems Neurology (SyNergy), and A.G. was supported by Fondazione Umberto Veronesi. A.T.M. is supported by National Health and Medical Research Council of Australia (NHMRC) Grants 1105008 and 1195955 and Centre of Research Excellence Grant 1116976. A.J.O.W. is supported by NHMRC Grant 1173896. B.S.P. is supported by Simons Foundation Autism Research Initiative Grant 514787. C.Y.S. works in the Medical Research Council Integrative Epidemiology Unit at the University of Bristol (MC_UU_00011/3). D.I.B. acknowledges Royal Netherlands Academy of Science Professor Award PAH/6635. E.E. is supported by NIH Grant R01DC016977. E.G.W. and J.R.G. are supported by National Institute of Child Health and Human Development (NICHD) Grant P50 HD 27802. F.R. is supported by Agence Nationale de la Recherche Grants ANR-06-NEURO-019-01, ANR-17-EURE-0017 IEC, ANR-10-IDEX-0001-02 PSL, and ANR-11-BSV4-014-01 and European Commission Grant LSHM-CT-2005-018696. H.T. is supported by the Netherlands Organization for Scientific Research (NWO) and Netherlands Organisation for Health Research and Development (ZonMW) Grant VICI 016.VICI.170.200. J.C.D. was supported by NICHD Grant P50 HD 27802. J.J.M., J.B.To., and T.K. were supported by NIH Grant R01 DC014489. K.M.P. was supported by the Hospital for Sick Children Research Training Program (Restracomp). K.R. is supported by a Sir Henry Wellcome Postdoctoral Fellowship (213514/Z/18/Z). M.J.S. is supported by Wellcome Trust Grant WT082032MA. S.P. and F.A. are supported by Royal Society Grants UF150663 and RGF\EA\180141. T.B. is supported by Institut Pasteur, the Bettencourt-Schueller Foundation, and Université de Paris. The Adolescent Brain Cognitive Development Study is supported by the NIH and additional federal partners (NIH Grants U01DA041048, U01DA050989, U01DA051016, U01DA041022, U01DA051018, U01DA051037, U01DA050987, U01DA041174, U01DA041106, U01DA041117, U01DA041028, U01DA041134, U01DA050988, U01DA051039, U01DA041156, U01DA041025, U01DA041120, U01DA051038, U01DA041148, U01DA041093, U01DA041089, U24DA041123, and U24DA041147). The Aston Cohort was supported by funding from European Union (EU) Horizon 2020 Programme 641652 and Waterloo Foundation Grant 797/17290. The St. Andrews Bioinformatics Unit is funded by Wellcome Trust Grants 105621/Z/14/Z and 204821/Z/16/Z. ALSPAC is supported by UK Medical Research Council and Wellcome Grant 217065/Z/19/Z and the University of Bristol. A comprehensive list of grant funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). The Basque Center on Cognition, Brain and Language (BCBL) cohort was supported by the Basque Government through the Basic Excellence Research Centre program and the Agencia Estatal de Investigación through BCBL Severo Ochoa excellence accreditation. The Brisbane Adolescent Twin Sample was supported by Australian Research Council Grants A7960034, A79906588, A79801419, DP0212016, and DP0343921, with genotyping funded by the NHMRC Grant 389891. The Colorado Learning Disabilities Research Center cohort was supported by NICHD Grant P50 HD 27802. The Early Language in Victoria Study was supported by NHMRC Grant 436958. The Familial Influences on Literacy Abilities cohort is supported by the University of Amsterdam, the Max Planck Institue Nijmegen, and NWO Grants Rubicon 446-12-005 and VENI 451-15-017. The GRaD study was funded by the Manton Foundation, NIH Grants P50-HD027802 and K99-HD094902, and the Lambert Family. NeuroDys was funded by an EU Sixth Framework Program grant to the NeuroDys Consortium, Swiss National Science Foundation Grant 32-108130, and Austrian Science Fund Grant 18351-B02. The Netherlands Twin Register is funded by NWO Grants 480-04-004, 481-08-011, 056-32-010, 024.001.003, 480-15-001/674, 184.021.007, 184.033.111, and 56-464-14192; ZonMW Grants 911-09-032 and 912-10-020; the Amsterdam Public Health and Amsterdam Reproduction and Development Research Institutes; European Science Council Grant ERC Advanced 230374; EU Seventh Framework Program (FP7) Grant FP7/2007-2013: 602768; National Institute of Mental Health (NIMH) Grants U24 MH068457-06, R01 MH58799-03, and 1RC2 MH089995; and the Avera Institute for Human Genetics. The Pediatric Imaging, Neurocognition, and Genetics cohort is funded by NIH Grant RC2DA029475, the National Institute on Drug Abuse, and the Eunice Kennedy Shriver NICHD. The Philadelphia Neurodevelopmental Cohort is funded by NIH Grants RC2MH089983 and RC2MH089924, an institutional development award to the Center for Applied Genomics from The Children’s Hospital of Philadelphia, and a donation from Adele and Daniel Kubert and thanks the NIH data repository. The Raine study was supported by long-term funding from NHMRC Grants 572613, 403981, and 1059711 and Canadian Institutes of Health Research (CIHR) Grant MOP-82893. Funding was also provided by the University of Western Australia, Curtin University, the Women and Infants Research Foundation, the Telethon Kids Institute, Edith Cowan University, Murdoch University, the University of Notre Dame Australia, and the Raine Medical Research Foundation. The Raine study analyses were supported by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia. The Saguenay Youth Study is supported by the CIHR, the Heart and Stroke Foundation of Quebec, and the Canadian Foundation for Innovation. The SLI Consortium was funded by Wellcome Trust Grant 076566 and UK Medical Research Council Grant G1000569. The Twins Early Development Study is supported by UK Medical Research Council Grants MR/V012878/1 and MR/M021475/1, NIH Grant AG046938, and the EU FP7 grant FP7/2007-2013/: 602768. Toronto was supported by CIHR Grant MOP-133440. UK Dyslexia was supported by Wellcome Trust Grants 076566/Z/05/Z and 075491/Z/04, Waterloo Foundation Grant 797–1720, EU Grant 018696, and Royal Society Grant UF100463. The York cohort was funded by Wellcome Trust Grant 082036/B/07/Z. We acknowledge iPSYCH for sharing their summary statistics. The iPSYCH team was supported by Lundbeck Foundation Grants R102-A9118, R155-2014-1724, and R248-2017-2003; NIMH Grant 1U01MH109514-01; and the Universities and University Hospitals of Aarhus and Copenhagen. The Danish National Biobank resource was supported by the Novo Nordisk Foundation. High-performance computer capacity was provided by the Center for Genomics and Personalized Medicine and the Centre for Integrative Sequencing, Aarhus University, Denmark.

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2202764119/-/DCSupplemental.

Data Availability

The full GWAS summary statistics (for word reading, nonword reading, spelling, phoneme awareness, nonword repetition, performance IQ, each available as "full" and "European subset" datasets) are available through the GWAS Catalog (accession nos. GCST90104462GCST90104472) and the website of the GenLang network (http://www.genlang.org/) (82, 83). Code used to perform the meta-analysis and follow-up analyses is available in GitLab (https://gitlab.gwdg.de/else.eising/genlang_quantitative_trait_gwasma) (84).

References

  • 1.Fisher S. E., Marcus G. F., The eloquent ape: Genes, brains and the evolution of language. Nat. Rev. Genet. 7, 9–20 (2006). [DOI] [PubMed] [Google Scholar]
  • 2.Deriziotis P., Fisher S. E., Speech and language: Translating the genome. Trends Genet. 33, 642–656 (2017). [DOI] [PubMed] [Google Scholar]
  • 3.Andreola C., et al. , The heritability of reading and reading-related neurocognitive components: A multi-level meta-analysis. Neurosci. Biobehav. Rev. 121, 175–200 (2021). [DOI] [PubMed] [Google Scholar]
  • 4.Graham S. A., Fisher S. E., Understanding language from a genomic perspective. Annu. Rev. Genet. 49, 131–160 (2015). [DOI] [PubMed] [Google Scholar]
  • 5.St Pourcain B., et al. , Common variation near ROBO2 is associated with expressive vocabulary in infancy. Nat. Commun. 5, 4831 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Gialluisi A., et al. , Genome-wide association scan identifies new variants associated with a cognitive predictor of dyslexia. Transl. Psychiatry 9, 77 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Truong D. T., et al. ; Pediatric, Imaging, Neurocognition, and Genetics Consortium, Multivariate genome-wide association study of rapid automatised naming and rapid alternating stimulus in Hispanic American and African-American youth. J. Med. Genet. 56, 557–566 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Fisher S. E., Vernes S. C., Genetics and the language sciences. Annu. Rev. Linguist. 1, 289–310 (2015). [Google Scholar]
  • 9.Ehri L. C., Learning to read words: Theory, findings, and issues. Sci. Stud. Read. 9, 167–188 (2005). [Google Scholar]
  • 10.Ehri L. C., The development of spelling knowledge and its role in reading acquisition and reading disability. J. Learn. Disabil. 22, 356–365 (1989). [DOI] [PubMed] [Google Scholar]
  • 11.Duncan L. G., Language and reading: The role of morpheme and phoneme awareness. Curr. Dev. Disord. Rep. 5, 226–234 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Coady J. A., Evans J. L., Uses and interpretations of non-word repetition tasks in children with and without specific language impairments (SLI). Int. J. Lang. Commun. Disord. 43, 1–40 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Bulik-Sullivan B., et al. ; ReproGen Consortium; Psychiatric Genomics Consortium; Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3, An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Watabe-Uchida M., John K. A., Janas J. A., Newey S. E., Van Aelst L., The Rac activator DOCK7 regulates neuronal polarity through local phosphorylation of stathmin/Op18. Neuron 51, 727–739 (2006). [DOI] [PubMed] [Google Scholar]
  • 15.Conklin D., et al. , Identification of a mammalian angiopoietin-related protein expressed specifically in liver. Genomics 62, 477–482 (1999). [DOI] [PubMed] [Google Scholar]
  • 16.Nijman S. M., et al. , The deubiquitinating enzyme USP1 regulates the Fanconi anemia pathway. Mol. Cell 17, 331–339 (2005). [DOI] [PubMed] [Google Scholar]
  • 17.Zhang L., Li J., Ouyang L., Liu B., Cheng Y., Unraveling the roles of Atg4 proteases from autophagy modulation to targeted cancer therapy. Cancer Lett. 373, 19–26 (2016). [DOI] [PubMed] [Google Scholar]
  • 18.Lee J. J., et al. ; 23andMe Research Team; COGENT (Cognitive Genomics Consortium); Social Science Genetic Association Consortium, Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Demange P. A., et al. , Investigating the genetic architecture of noncognitive skills using GWAS-by-subtraction. Nat. Genet. 53, 35–44 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Rajagopal V. M., et al. , Genome-wide association study of school grades identifies a genetic overlap between language ability, psychopathology and creativity. bioRxiv [Preprint] (2020). https://www.biorxiv.org/content/10.1101/2020.05.09.075226v2 (Accessed 22 May 2020). [DOI] [PMC free article] [PubMed]
  • 21.Grotzinger A. D., et al. , Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.de Leeuw C. A., Mooij J. M., Heskes T., Posthuma D., MAGMA: Generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Meng H., et al. , DCDC2 is associated with reading disability and modulates neuronal development in the brain. Proc. Natl. Acad. Sci. U.S.A. 102, 17053–17058 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Zhong R., et al. , Meta-analysis of the association between DCDC2 polymorphisms and risk of dyslexia. Mol. Neurobiol. 47, 435–442 (2013). [DOI] [PubMed] [Google Scholar]
  • 25.Turley P., et al. ; 23andMe Research Team; Social Science Genetic Association Consortium, Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet. 50, 229–237 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Brown C. M., Hagoort P., The Neurocognition of Language (OUP, Oxford, United Kingdom, 1999). [Google Scholar]
  • 27.Price C. J., A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. Neuroimage 62, 816–847 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Richardson F. M., Price C. J., Structural MRI studies of language function in the undamaged brain. Brain Struct. Funct. 213, 511–523 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Price C. J., The anatomy of language: A review of 100 fMRI studies published in 2009. Ann. N. Y. Acad. Sci. 1191, 62–88 (2010). [DOI] [PubMed] [Google Scholar]
  • 30.Smith S. M., et al. , An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nat. Neurosci. 24, 737–745 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Medland S. E., et al. ; ENIGMA Genetics Working Group, Ten years of enhancing neuro-imaging genetics through meta-analysis: An overview from the ENIGMA Genetics Working Group. Hum. Brain Mapp. 43, 292–299 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Le B. D., Stein J. L., Mapping causal pathways from genetics to neuropsychiatric disorders using genome-wide imaging genetics: Current status and future directions. Psychiatry Clin. Neurosci. 73, 357–369 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Wilson S. M., Bautista A., McCarron A., Convergence of spoken and written language processing in the superior temporal sulcus. Neuroimage 171, 62–74 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Venezia J. H., et al. , Auditory, visual and audiovisual speech processing streams in superior temporal sulcus. Front. Hum. Neurosci. 11, 174 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Friederici A. D., Kotz S. A., Scott S. K., Obleser J., Disentangling syntax and intelligibility in auditory language comprehension. Hum. Brain Mapp. 31, 448–457 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Vigneau M., Jobard G., Mazoyer B., Tzourio-Mazoyer N., Word and non-word reading: What role for the Visual Word Form Area? Neuroimage 27, 694–705 (2005). [DOI] [PubMed] [Google Scholar]
  • 37.Fitch W. T., Empirical approaches to the study of language evolution. Psychon. Bull. Rev. 24, 3–33 (2017). [DOI] [PubMed] [Google Scholar]
  • 38.Corballis M. C., Language evolution: A changing perspective. Trends Cogn. Sci. 21, 229–236 (2017). [DOI] [PubMed] [Google Scholar]
  • 39.Finucane H. K., et al. ; ReproGen Consortium; Schizophrenia Working Group of the Psychiatric Genomics Consortium; RACI Consortium, Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Tilot A. K., et al. , The evolutionary history of common genetic variants influencing human cortical surface area. Cereb. Cortex 31, 1873–1887 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Vernot B., et al. , Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science 352, 235–239 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Watanabe K., Umićević Mirkov M., de Leeuw C. A., van den Heuvel M. P., Posthuma D., Genetic mapping of cell type specificity for complex traits. Nat. Commun. 10, 3222 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Watanabe K., Taskesen E., van Bochoven A., Posthuma D., Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Basile G. A., et al. , Red nucleus structure and function: From anatomy to clinical neurosciences. Brain Struct. Funct. 226, 69–91 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Luciano M., et al. , A genome-wide association study for reading and language abilities in two population cohorts. Genes Brain Behav. 12, 645–652 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Harlaar N., Hayiou-Thomas M. E., Dale P. S., Plomin R., Why do preschool language abilities correlate with later reading? A twin study. J. Speech Lang. Hear. Res. 51, 688–705 (2008). [DOI] [PubMed] [Google Scholar]
  • 47.Price K. M., et al. , Genome-wide association study of word reading: Overlap with risk genes for neurodevelopmental disorders. Genes Brain Behav. 19, e12648 (2020). [DOI] [PubMed] [Google Scholar]
  • 48.Hoffmann T. J., et al. , A large electronic-health-record-based genome-wide study of serum lipids. Nat. Genet. 50, 401–413 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Gialluisi A., et al. , Genome-wide association study reveals new insights into the heritability and genetic correlates of developmental dyslexia. Mol. Psychiatry 26, 3004–3017 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Doust C., et al. , Discovery of 42 genome-wide significant loci associated with dyslexia. Nat. Genet. In press. [DOI] [PMC free article] [PubMed]
  • 51.Demontis D., et al. ; ADHD Working Group of the Psychiatric Genomics Consortium (PGC); Early Lifecourse & Genetic Epidemiology (EAGLE) Consortium; 23andMe Research Team, Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Grasby K. L., et al. ; Alzheimer’s Disease Neuroimaging Initiative; CHARGE Consortium; EPIGEN Consortium; IMAGEN Consortium; SYS Consortium; Parkinson’s Progression Markers Initiative; Enhancing NeuroImaging Genetics through Meta-Analysis Consortium (ENIGMA)—Genetics working group, The genetic architecture of the human cerebral cortex. Science 367, eaay6690 (2020).32193296 [Google Scholar]
  • 53.Wray N. R., et al. ; eQTLGen; 23andMe; Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Walters R. K., et al. ; 23andMe Research Team, Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. Nat. Neurosci. 21, 1656–1669 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Kovas Y., Plomin R., Generalist genes: Implications for the cognitive sciences. Trends Cogn. Sci. 10, 198–203 (2006). [DOI] [PubMed] [Google Scholar]
  • 56.Bishop D. V., Developmental cognitive genetics: How psychology can inform genetics and vice versa. Q J Exp Psychol (Hove) 59, 1153–1168 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Shapland C. Y., et al. , Multivariate genome-wide covariance analyses of literacy, language and working memory skills reveal distinct etiologies. NPJ Sci. Learn. 6, 23 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Hulme C., Snowling M. J., The interface between spoken and written language: Developmental disorders. Philos. Trans. R. Soc. Lond. B Biol. Sci. 369, 20120395 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Adlof S. M., Patten H., Nonword repetition and vocabulary knowledge as predictors of children’s phonological and semantic word learning. J. Speech Lang. Hear. Res. 60, 682–693 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Soroli E., Szenkovits G., Ramus F., Exploring dyslexics’ phonological deficit III: Foreign speech perception and production. Dyslexia 16, 318–340 (2010). [DOI] [PubMed] [Google Scholar]
  • 61.Hart S. A., Little C., van Bergen E., Nurture might be nature: Cautionary tales and proposed solutions. NPJ Sci. Learn. 6, 2 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Selzam S., et al. , Comparing within- and between-family polygenic score prediction. Am. J. Hum. Genet. 105, 351–363 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Kong A., et al. , The nature of nurture: Effects of parental genotypes. Science 359, 424–428 (2018). [DOI] [PubMed] [Google Scholar]
  • 64.Catani M., Forkel S. J., “Diffusion imaging methods in language sciences” in The Oxford Handbook of Neurolinguistics, de Zubicaray G. I., Schiller N. O., Eds. (Oxford University Press, Oxford, United Kingdom, 2019), 212–230. [Google Scholar]
  • 65.Roehrich-Gascon D., Small S. L., Tremblay P., Structural correlates of spoken language abilities: A surface-based region-of interest morphometry study. Brain Lang. 149, 46–54 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Perdue M. V., Mednick J., Pugh K. R., Landi N., Gray matter structure is associated with reading skill in typically developing young readers. Cereb. Cortex 30, 5449–5459 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Liebenthal E., Desai R. H., Humphries C., Sabri M., Desai A., The functional organization of the left STS: A large scale meta-analysis of PET and fMRI studies of healthy adults. Front. Neurosci. 8, 289 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Turkeltaub P. E., Coslett H. B., Localization of sublexical speech perception components. Brain Lang. 114, 1–15 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Okada K., Hickok G., Identification of lexical-phonological networks in the superior temporal sulcus using functional magnetic resonance imaging. Neuroreport 17, 1293–1296 (2006). [DOI] [PubMed] [Google Scholar]
  • 70.Eckert M. A., Berninger V. W., Vaden K. I. Jr., Gebregziabher M., Tsu L., Gray matter features of reading disability: A combined meta-analytic and direct analysis approach(1,2,3,4). eNeuro 3, ENEURO.0103-15.2015 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Petr M., Pääbo S., Kelso J., Vernot B., Limits of long-term selection against Neandertal introgression. Proc. Natl. Acad. Sci. U.S.A. 116, 1639–1644 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Telis N., Aguilar R., Harris K., Selection against archaic hominin genetic variation in regulatory regions. Nat. Ecol. Evol. 4, 1558–1566 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Kundaje A., et al. ; Roadmap Epigenomics Consortium, Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Zhong S., et al. , A single-cell RNA-seq survey of the developmental landscape of the human prefrontal cortex. Nature 555, 524–528 (2018). [DOI] [PubMed] [Google Scholar]
  • 75.Button K. S., et al. , Power failure: Why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376 (2013). [DOI] [PubMed] [Google Scholar]
  • 76.Newbury D. F., Monaco A. P., Paracchini S., Reading and language disorders: The importance of both quantity and quality. Genes (Basel) 5, 285–309 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Bulik-Sullivan B. K., et al. ; Schizophrenia Working Group of the Psychiatric Genomics Consortium, LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Zheng J., et al. , PhenoSpD: An integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics. Gigascience 7, (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Buniello A.et al., . The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Zheng J., et al. ; Early Genetics and Lifecourse Epidemiology (EAGLE) Eczema Consortium, LD Hub: A centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Finucane H. K., et al. ; Brainstorm Consortium, Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.E. Eising et al., Data for “Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people.” Accession GCST90104462. GWAS Catalog. https://www.ebi.ac.uk/gwas/search?query=GCST90104462. Deposited 13 June 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.E. Eising et al., Data for “Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people.” Accession GCST90104472. GWAS Catalog. https://www.ebi.ac.uk/gwas/search?query=GCST90104472. Deposited 13 June 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.E. Eising et al., Code for “Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people.” GitLab. https://gitlab.gwdg.de/else.eising/genlang_quantitative_trait_gwasma. Deposited 9 November 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary File
pnas.2202764119.sd01.xlsx (20.2KB, xlsx)
Supplementary File
pnas.2202764119.sd02.xlsx (15.2KB, xlsx)
Supplementary File
pnas.2202764119.sd03.xlsx (11.8KB, xlsx)
Supplementary File
Supplementary File
pnas.2202764119.sd04.xlsx (14.3KB, xlsx)
Supplementary File
pnas.2202764119.sd05.xlsx (12.2KB, xlsx)
Supplementary File
pnas.2202764119.sd06.xlsx (26.3KB, xlsx)
Supplementary File
Supplementary File
pnas.2202764119.sd08.xlsx (15.1KB, xlsx)
Supplementary File
pnas.2202764119.sd09.xlsx (15.1KB, xlsx)
Supplementary File
Supplementary File
pnas.2202764119.sd11.xlsx (32.7KB, xlsx)
Supplementary File
pnas.2202764119.sd12.xlsx (17.8KB, xlsx)
Supplementary File
pnas.2202764119.sd13.xlsx (76.4KB, xlsx)
Supplementary File
Supplementary File
pnas.2202764119.sd15.xlsx (25.1KB, xlsx)

Data Availability Statement

The full GWAS summary statistics (for word reading, nonword reading, spelling, phoneme awareness, nonword repetition, performance IQ, each available as "full" and "European subset" datasets) are available through the GWAS Catalog (accession nos. GCST90104462GCST90104472) and the website of the GenLang network (http://www.genlang.org/) (82, 83). Code used to perform the meta-analysis and follow-up analyses is available in GitLab (https://gitlab.gwdg.de/else.eising/genlang_quantitative_trait_gwasma) (84).


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

RESOURCES