This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2025 Mar 10:2025.03.07.641231. [Version 1] doi: 10.1101/2025.03.07.641231

Rapidly evolved genomic regions shape individual language abilities in present-day humans

Lucas G Casten 1,*, Tanner Koomar 1,*, Taylor R Thomas 2, Jin-Young Koh 3, Dabney Hofammann 1, Savantha Thenuwara 4, Allison Momany 5, Marlea O’Brien 6, Jeffrey C Murray 5, J Bruce Tomblin 6, Jacob J Michaelson 1,6
PMCID: PMC11952349  PMID: 40161630

1. Summary

Minor genetic changes have produced profound differences in cognitive abilities between humans and our closest relatives, particularly in language. Despite decades of research, ranging from single-gene studies to broader evolutionary analyses[1, 2, 3, 4, 5], key questions about the genomic foundations of human language have persisted, including which sequences are involved, how they evolved, and whether similar changes occur in other vocal learning species. Here we provide the first evidence directly linking rapidly evolved genomic regions to language abilities in contemporary humans. Through extensive analysis of 65 million years of evolutionary events in over 30,000 individuals, we demonstrate that Human Ancestor Quickly Evolved Regions (HAQERs)[5] - sequences that rapidly accumulated mutations after the human-chimpanzee split - specifically influence language but not general cognition. These regions evolved to shape language development by altering binding of Forkhead domain transcription factors, including FOXP2. Strikingly, language-associated HAQER variants show higher prevalence in Neanderthals than modern humans, have been stable throughout recent human history, and show evidence of convergent evolution across other mammalian vocal learners. An unexpected pattern of balancing selection acting on these apparently beneficial alleles is explained by their pleiotropic effects on prenatal brain development contributing to birth complications, reflecting an evolutionary trade-off between language capability and reproductive fitness. By developing the Evolution Stratified-Polygenic Score analysis, we show that language capabilities likely emerged before the human-Neanderthal split - far earlier than previously thought[3, 6, 7]. Our findings establish the first direct link between ancient genomic divergence and present-day variation in language abilities, while revealing how evolutionary constraints continue to shape human cognitive development.

2. Main

The human genome differs from that of our closest living relatives by only 1–5%[8, 9, 10], yet this modest genetic divergence underlies profound differences in cognitive functions, particularly language. Understanding how such minor genetic changes gave rise to complex human abilities has been a central challenge in evolutionary genetics. Despite decades of research, we still lack a clear understanding of the genomic basis of human language ability – a trait that fundamentally shapes human cognition and culture.

The search for genetic foundations of human language accelerated with the discovery that mutations in FOXP2 can cause speech and language disorders[1, 2]. While FOXP2 was initially heralded as “the language gene,” its role in more typical variation in language ability proved elusive[11, 12]. Subsequent research shifted focus to numerous non-coding elements distributed throughout the genome[13, 14, 15, 16, 17]. This distributed regulatory model better captured the complexity of language but left open questions about evolutionary origins and how these elements influence language ability when perturbed.

Here, we provide evidence bridging these perspectives through systematic analysis of 65 million years’ worth of primate and human evolutionary events. We find one class of human-specific genetic regions to be particularly important to human language: Human Ancestor Quickly Evolved Regions (HAQERs)[5], sequences that began accumulating mutations at an unusually high rate after the human-chimpanzee ancestral split. Despite comprising a small fraction of the genome, HAQERs show robust and specific associations with core language ability, but not general cognition, across multiple large cohorts. Strikingly, we find these regions are under selection for increased affinity to Forkhead box transcription factor binding motifs (including FOXP2), with this motif integrity directly linked to language ability in a sample of modern humans. This finding provides a theoretical framework that explains both the impact of rare coding variants in genes like FOXP2 and the importance of distributed regulatory variation in shaping individual differences in language ability.

Our discovery emerged from a multi-modal study combining evolutionary genomics with language phenotyping and whole genome sequencing in present-day humans. We analyzed 350 children followed longitudinally with extensive language testing, validating our findings in larger cohorts including the SPARK study (N > 30,000) and the ABCD study (N > 5,000)[18, 19]. We further examined selective pressures acting on language and general cognition using ancient DNA from early humans, Neanderthals, and Denisovans (Allen Ancient DNA Resource, N > 3,000)[20]. This revealed that language-promoting variation in HAQERs has remained stable across humans throughout the past 20,000 years. A possible explanation for this apparent lack of positive selection is that these variants have pleiotropic effects, increasing both fetal brain growth and birth complications. Finally, we investigated convergent evolution of HAQER-like sequences through analysis of 170 non-primate species[21], finding additional evidence for the key role of these regulatory sequences in vocal learning.

Our results demonstrate for the first time how ancient genomic changes directly influence individual variation in modern language abilities, reveal the central role of Forkhead box transcription factors in language evolution, and provide empirical support for a complex, ancient, and polygenic model of language evolution. By connecting evolutionary genomics with functional impacts on modern human cognition, our study provides triangulated support for the role of HAQERs in language development and opens new avenues for investigating the genetic foundations of human-specific traits.

3. Dimensions of language ability

To quantify dimensions of developmental language abilities, we analyzed 17 longitudinal cognitive and language assessments administered from kindergarten through 4th grade for 350 children sampled from a community-based cohort[22], which we refer to as the “EpiSLI” cohort. This analysis revealed seven factors representing distinct aspects of language ability (Figure 2a). The first factor (F1), primarily driven by sentence repetition scores, represents “core language” ability. Sentence repetition strongly indicates overall language capacity, making F1 a key measure of general language competence[23, 24]. The second factor (F2) relates to receptive vocabulary and listening comprehension, covering broad receptive language skills. The third factor (F3) specifically reflects nonverbal IQ, aligning with performance IQ at both kindergarten and 2nd grade. Factor F4 captures pre-literacy language skills, incorporating all kindergarten scores except performance IQ. Its slight correlation to F1 and F2 (r = 0.13 and 0.12), but not F3, suggests specificity to language (Figure 2b). Factor F5, which we call “talkativeness,” mainly reflects the number of clauses produced in a narrative task. Factor F6, based on a comprehension of concepts and directions assessment, indexes mastery of directive language (i.e., task-based instructions). Factor F7 spans a variety of assessments, with specific loading on vocabulary and grammar-related tasks, suggesting a broad, crystallized knowledge of language.

Figure 2: Factor loadings and genetic associations.

a Loadings of cognitive and language assessments onto the seven language factors. g0 = Kindergarten (age 5–6), g2 = 2nd grade (age 7–8), g4 = 4th grade (age 9–10). b Pearson correlations for language factors (upper triangle) and distribution of each factor (diagonal). c Interpretations of the language factors based on their loadings. d Pearson correlations for each factor with genome-wide PGS. ** indicates FDR adjusted p-value < 0.05 and * indicates unadjusted p-value < 0.05.

Our preliminary investigation of these factors suggested that Factors 1, 2, and 3 carried the most genetic association signal (Figure 2d, Supplementary Table 3), so these were the focus of the work we present here. We also find pervasive associations between F1-F3 and measures of mental health in our sample (N = 241, Supplementary Table 2).

4. Linking evolution to individual differences in language ability with ES-PGS analysis

To investigate the genetic basis of language ability, we developed a novel analytical approach: Evolution Stratified Polygenic Score (ES-PGS) analysis. This method allows for systematic examination of how genetic variants from different evolutionary periods contribute to a trait by partitioning polygenic scores based on the evolutionary origin of the DNA sequence. The ES-PGS method extends traditional polygenic score analysis by comparing reduced and full statistical models, introducing an additional term that captures the contribution of specific evolutionary annotations to language ability. ES-PGS shares similarities with “partitioned heritability” and “pathway-based polygenic scores”[25, 26, 27, 28], which attribute trait heritability to specific genomic regions. The key difference from partitioned heritability methods is that ES-PGS uses individual-level data for predictions, enabling direct association testing in smaller but more deeply phenotyped cohorts. More information about the ES-PGS method and implementation can be found in the Methods section and our code repository (https://github.com/lucasgcasten/language_evolution).
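The reduced-versus-full model comparison described above can be sketched in a few lines. This is a minimal illustration on simulated data, not the authors' implementation (which is in their repository): the function names `ols_r2` and `es_pgs`, the effect sizes, and the absence of covariates are all assumptions made here for clarity.

```python
import random

def ols_r2(y, columns):
    """Fit y ~ intercept + columns by least squares; return (coefficients, R^2)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y]
    M = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(p):
            if r != k:
                f = M[r][k] / M[k][k]
                M[r] = [a - f * b for a, b in zip(M[r], M[k])]
    beta = [M[r][p] / M[r][r] for r in range(p)]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
                 for i in range(n))
    return beta, 1.0 - ss_res / ss_tot

def es_pgs(trait, background_pgs, annotation_pgs):
    """Compare trait ~ background (reduced) with trait ~ background + annotation (full)."""
    _, r2_reduced = ols_r2(trait, [background_pgs])
    beta, r2_full = ols_r2(trait, [background_pgs, annotation_pgs])
    n = len(trait)
    # F statistic for the single added term (1 and n - 3 degrees of freedom)
    f_stat = (r2_full - r2_reduced) / ((1.0 - r2_full) / (n - 3))
    return {"beta_annotation": beta[2], "r2_reduced": r2_reduced,
            "r2_full": r2_full, "f_stat": f_stat}

# Simulated example: 350 individuals, hypothetical effect sizes
random.seed(0)
n = 350
bg = [random.gauss(0, 1) for _ in range(n)]
ann = [random.gauss(0, 1) for _ in range(n)]
trait = [0.2 * b + 0.15 * a + random.gauss(0, 1) for b, a in zip(bg, ann)]
result = es_pgs(trait, bg, ann)
print(result)
```

The quantity of interest is the coefficient on the annotation-specific score and the gain in variance explained once it is added to the background score. A real analysis would also adjust for covariates such as ancestry principal components.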

Our genome-wide analysis identified significant associations of the polygenic score (PGS) for cognitive performance[29] (CP-PGS) with both core language (r = 0.22, FDR adjusted p-value = 0.001, F1) and receptive language ability (r = 0.19, FDR adjusted p-value = 0.01, F2), but not with nonverbal IQ (r = 0.1, FDR adjusted p-value = 0.33, F3, Figure 2d), in our EpiSLI sample. We then applied our ES-PGS method to the CP-PGS to determine whether the genomic regions influencing core language (F1) and receptive language (F2) originate from deeply conserved sequences (primate conserved sequence regions) or from more recent additions to the human genome (human-Neanderthal divergent regions). By comparing contributions from 11 established evolutionary annotations spanning approximately 65 million years of evolution, from ancient primate-conserved regions to sequences under selection in the past few thousand years, we traced the evolutionary origins of language-related genomic elements[21, 30, 31, 5, 32, 33, 34, 35].

As expected, CP-PGS in primate ultra-conserved elements had no association with language scores (ES-PGS model β = −0.074, p-value = 0.1). Also in line with expectations, CP-PGS in genes that are differentially regulated between humans and our closest living primate relative (human-chimp divergent genes) was significantly positively correlated with core language ability (ES-PGS model β = 0.107, p-value = 0.023, Figure 3a). This finding confirms that human-specific genetic sequence influences language abilities in contemporary humans, and that this sequence evolved after the human-chimpanzee split (approximately 6–8 million years ago).

Figure 3: HAQERs are associated with language ability and not nonverbal IQ.

a Comparison of evolutionary events on core language ability in EpiSLI and SPARK. Points represent the β provided from the ES-PGS models for each evolutionary annotation, while the ranges represent the 95% confidence interval. Solid points indicate p-value < 0.05. b Scatterplot of core language scores (F1) with HAQER CP-PGS after adjusting for the background PGS in the EpiSLI sample. c Scatterplot of nonverbal IQ scores (F3) with HAQER CP-PGS after adjusting for the background PGS in the EpiSLI sample. d-f Distributions of rare reversion counts from the SPARK whole genome sequencing data within 10Kb of the following regions: HAQERs (d), HARs (e), and random sequence (RAND, f). g Effects of rare reversions across HAQERs, HARs, and RAND on language-related phenotypes in SPARK autism cases. Line ranges represent 95% confidence intervals from logistic and linear regression models. A positive β indicates delayed developmental age or higher likelihood of the diagnosis as reversions increase.

The most compelling finding emerged from the analysis of Human Ancestor Quickly Evolved Regions (HAQERs). Although they represent a small proportion of the human genome, HAQER-associated polygenic scores revealed robust and specific associations with language capabilities. Specifically, CP-PGS in HAQERs showed significant correlations with core language ability (ES-PGS model β = 0.117, p-value = 0.012, Figures 3a-b) and receptive language skills (ES-PGS model β = 0.098, p-value = 0.022), but no association with nonverbal intelligence (ES-PGS model β = 0.0007, p-value = 0.99, Figure 3c). HAQERs, which began evolving approximately 6–8 million years ago – after the human-chimpanzee split but before the human-Neanderthal divergence – represent largely non-coding sequences that have rapidly acquired regulatory functions in humans. The genomic specificity of these regions is particularly striking. While the background CP-PGS utilized nearly 300,000 independent SNPs (single nucleotide polymorphisms) and explained 3.47% of variance in core language ability, the addition of HAQER CP-PGS (comprising only 1,350 independent SNPs) increased the explained variance to 5.22%, suggesting that HAQER SNPs carry on average 112 times more explanatory power for language than SNPs elsewhere in the genome.
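The 112-fold figure follows from dividing the variance explained by the number of SNPs in each partition. The short calculation below reproduces it from the numbers quoted in the text, under the assumption that "nearly 300,000" is taken as exactly 300,000 and that the HAQER contribution is the increment from 3.47% to 5.22%.

```python
# Values quoted in the text; the background SNP count is rounded ("nearly 300,000")
background_snps = 300_000
haqer_snps = 1_350
r2_background = 3.47   # percent variance explained by background CP-PGS
r2_full = 5.22         # percent variance explained after adding HAQER CP-PGS

# Variance attributable to the HAQER partition (percentage points)
haqer_increment = r2_full - r2_background

# Average variance explained per SNP in each partition
per_snp_haqer = haqer_increment / haqer_snps
per_snp_background = r2_background / background_snps

fold = per_snp_haqer / per_snp_background
print(f"HAQER increment: {haqer_increment:.2f} percentage points")
print(f"Per-SNP ratio: {fold:.0f}x")
```

Because the background SNP count is rounded, the ratio is approximate, but it agrees with the reported value to the nearest integer.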

Notably, we found no comparable signal from Human Accelerated Regions (HARs, β = −0.048, p-value = 0.29), which are deeply conserved regulatory elements that acquired human-specific changes[4, 32]. This distinction suggests that human language ability emerged disproportionately through novel regulatory innovations (HAQERs), rather than modifications to existing functional elements (HARs). The specific association between HAQERs and language factors (F1 and F2), but not nonverbal IQ (F3), reveals a distinct evolutionary trajectory for verbal abilities compared to general cognition.

5. Confirming HAQERs’ impact on language ability across the lifespan

Next, we validated our finding that HAQERs influence human language by examining the effects of both common and rare variants in these regions with the SPARK autism dataset[18]. The considerable heterogeneity in language abilities observed in autism populations makes SPARK an especially informative dataset for validating language-related genetic discoveries, as this variation enhances our ability to detect associations across the full spectrum of language abilities[36, 37]. Although SPARK is not as deeply phenotyped for language traits as our discovery sample, we have several meaningful language indices. First, we conducted a recontact study in more than 1,000 adults with and without autism (N = 917 with genetic data), in which we asked participants to complete an online battery of language tasks[38]. This sample has a “core language” factor that is comparable to the core language factor we find in EpiSLI (F1): it loads primarily onto sentence repetition, comes from a similar exploratory factor analysis, and is significantly correlated with CP-PGS (r = 0.2, p-value = 1.1 × 10^−8). Additionally, in a much larger sample of autistic children in SPARK we have parent-reported developmental language disorder (N = 30,203) and verbal IQ from clinical records (N = 1,462). For all of these traits, we found that the HAQER partition of CP-PGS was significantly associated with language outcomes (Figure 3a, Supplementary Table 6). Further, to investigate specificity of HAQER-linked CP-PGS to language, we tested for association with measures of nonverbal IQ (N = 1,547) and intellectual disability (N = 30,203). There was no association between the HAQER partition of CP-PGS and the non-language-related measures of cognitive function, indicating that genetic variation in HAQERs specifically influences language and not other aspects of cognition (all statistics can be found in Supplementary Table 6).

Additionally, we analyzed the effects of rare genetic variation in HAQERs on cognitive traits with whole genome sequencing data from SPARK. We found that the vast majority of individuals carry rare “reversions” in and around HAQERs (variants with an allele frequency < 1% that “revert” to the human-chimp ancestral allele), indicating these regions remain highly polymorphic in modern human populations when compared to HARs and random sequences (Figure 3d-f). Whole genome sequencing data, developmental milestone information, and diagnosis information were available for more than 2,000 individuals with autism. Individuals with more rare reversions in HAQERs have delayed language development and are more likely to have developmental language disorders (Figure 3g, Supplementary Table 7).

Further supporting the specificity of HAQERs to language ability, we found that rare reversions in these regions showed no association with the age at which individuals started walking or with intellectual disability (Figure 3g). To show that the effect is due to HAQERs and not to reversions anywhere in the genome, we also analyzed the effects of rare reversions in HARs[4] and random (RAND) non-coding sequence matched to HAQERs[5]. There was no association between rare reversions in random sequence and any of our phenotypes. We did find that rare reversions in HARs were associated with two of the four language phenotypes (age first combined words and age first combined phrases), but not with age of first word or developmental language disorder.

Taken together, our discovery and replication samples provide consistent and compelling evidence that genetic variation in HAQERs is associated with language traits, and not nonverbal IQ, in contemporary humans. In contrast, we do not find consistent evidence that genetic variation in HARs is associated with language, though rare reversions in HARs appear to impact some language development milestones independently of HAQERs.

6. Forkhead motifs in HAQERs are under selection and improve language

To investigate the molecular mechanisms underlying HAQERs’ role in language development, we analyzed how two classes of rare variants affect transcription factor binding sites (TFBS) in the EpiSLI cohort: (1) rare (minor allele frequency < 1%) human-chimp ancestral allele reversions and (2) other rare variants. Using position weight matrices (PWMs), we quantified how these variants alter TFBS motif scores (i.e., predicted TF binding affinity). By comparing the effects of reversions with other rare variants, we could detect signatures of human-chimp divergent selection on TF motif scores.
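To make the PWM scoring step concrete, the sketch below scores a short sequence against a toy motif and reports how a single-base variant changes the best motif score. The counts, the consensus `TGTTTAC` (chosen only for illustration), the uniform background frequency, the pseudocount, and the forward-strand-only scan are all assumptions made here and do not describe the authors' pipeline.

```python
import math

BACKGROUND = 0.25  # assumed uniform background base frequency

def pwm_from_counts(counts, pseudocount=1.0):
    """Convert per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for position in counts:
        total = sum(position.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((position.get(base, 0) + pseudocount) / total) / BACKGROUND)
            for base in "ACGT"
        })
    return pwm

def best_motif_score(sequence, pwm):
    """Highest log-odds score over all forward-strand windows of the sequence."""
    width = len(pwm)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )

def variant_effect(ref_seq, position, alt_base, pwm):
    """Change in best motif score (alt minus ref) caused by one substitution."""
    alt_seq = ref_seq[:position] + alt_base + ref_seq[position + 1:]
    return best_motif_score(alt_seq, pwm) - best_motif_score(ref_seq, pwm)

# Toy motif: made-up counts with a strong preference for the consensus base
CONSENSUS = "TGTTTAC"
toy_counts = [{base: (17 if base == c else 1) for base in "ACGT"} for c in CONSENSUS]
toy_pwm = pwm_from_counts(toy_counts)

ref = "ACGTGTTTACGA"                            # contains the consensus at offset 3
disrupting = variant_effect(ref, 6, "C", toy_pwm)  # T>C inside the motif
print(f"motif score change for T>C at position 6: {disrupting:.2f}")
```

A negative change means the variant weakens the predicted binding site and a positive change means it strengthens it. Comparing the distribution of such changes for ancestral-allele reversions against other rare variants is the contrast the text describes.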

In HAQERs, our estimates of transcription factor motif selection showed a significant correlation with those motifs’ estimated impact on individual core language ability (β = 0.25, p-value = 1.9 × 10^−11, Figure 4a), suggesting humans evolved to increase TF binding in these regions and this binding is associated with better core language ability (F1). In contrast, regions under sequence conservation (HARs, β = 0.04, p-value = 1, Figure 4b) or neutral evolution (RAND sequences, β = 0.01, p-value = 1, Figure 4c) showed no relationship between motif integrity selection and language ability. This highlights HAQERs’ unique and systematic selection for regulatory function during hominin evolution.

Figure 4: HAQERs show coupled evolutionary and functional effects on language-relevant transcription factor binding sites.

a-c Relationship between selection for transcription factor motif integrity (x-axis) and motif association with language ability (y-axis) in (a) HAQERs, (b) HARs, and (c) random genomic regions. Each point represents one transcription factor motif. Error bars indicate ±1 standard error. Purple line (or gray for non-significant fits) shows York regression fit with 95% confidence interval (shaded); regression coefficient (β), chi-squared statistic (χ²), and p-values are shown. d Detailed view of motif effects in HAQERs colored by transcription factor family. Solid points indicate motifs with p < 0.05 for both positive selection and positive language association. Colored polygons show convex hulls for each transcription factor family. Several key Forkhead family members are labeled. Dashed lines at x = 0 and y = 0 define quadrants. e Enrichment analysis of transcription factor families for concordant positive selection and language effects, shown as log2 odds ratios. Error bars indicate 95% confidence intervals. Solid points indicate p < 0.05. f HAQER sequence similarity scores in vocal learning (blue) versus non-vocal learning (orange) mammals. Violin plots show score distributions, with individual species indicated by points. Phylogenetic logistic regression coefficient (β) and p-value are shown.

Analysis of transcription factor binding in HAQERs revealed a striking enrichment of Forkhead box TFs in the upper right quadrant of the motif selection-language association space (Figure 4d), with every member of the Forkhead box TF family in this quadrant. This quadrant represents TFs showing both positive selection for binding site integrity and positive effects on core language ability (i.e., increased binding in HAQERs is associated with better language). The Forkhead box family displayed the strongest enrichment among all TF families (odds ratio = 19.53, p-value = 1.3 × 10^−5; Figure 4e, Supplementary Table 16). FOXC2 demonstrated particularly strong signals of human-gained binding affinity (β = 0.31, p-value = 4.2 × 10^−12) and association with language ability (β = 0.11, p-value = 0.035). FOXP2 showed a similar directional trend but did not meet significance criteria (human-gained binding affinity β = 0.24, p-value = 7.7 × 10^−8, but a p-value > 0.05 in the language association test). Our data reveal that binding of the Forkhead box family within HAQERs played a crucial role in the evolution of human language.

7. Selective pressures acting on language and general cognition

Having established HAQERs’ role in human language evolution, we next examined how language-related genetic variants have been selected throughout the past 20,000 years of human history using the Allen Ancient DNA Resource (AADR)[20]. The AADR is the largest genotyped collection of ancient humans, providing harmonized genotypes and metadata for each sample (such as sample ages based on radiocarbon dating). We identified ancient west Eurasians, then correlated their HAQER CP-PGS and background CP-PGS with sample age (N = 3,244 individuals passing quality control, with remains dated between 18,775 and 150 years ago). We see that general cognition (background CP-PGS) has been subject to positive selection and has increased substantially over time (selection coefficient = 0.089, p-value = 2.8 × 10^−12, Figure 5a). Unexpectedly, we found that HAQER CP-PGS has been stable throughout human history, meaning ancient and modern humans carry similar numbers of language-related alleles in HAQERs (selection coefficient = −0.002, p-value = 0.87).

Figure 5: Selective pressures acting on language alleles in HAQERs.

a Polygenic selection of HAQER and background CP-PGS, correlating sample age with CP-PGS in ancient west Eurasians from the AADR. b-c Distribution of HAQER CP-PGS (b) or background CP-PGS (c) in modern Europeans (N = 503 individuals from the 1000 Genomes dataset), with black dotted lines indicating the PGS in the four Neanderthal and Denisovan genomes. d Comparison of F-statistics across HAQERs, HARs, and random sequences (RAND). F-statistics measure heterozygosity enrichment, with lower values indicating more heterozygosity than expected. “***” is used to indicate statistical significance (p-value < 0.001) based on t-test comparisons between each pair of regions. e Site Frequency Spectrum (SFS) comparison between HAQERs, HARs, and random sequences (RAND). The x-axis represents minor allele frequency bins; the y-axis is the log2(ratio) comparing HAQERs to the other sequence types. Positive log2(ratios) indicate that HAQERs have proportionally more variants in that allele frequency bin compared to the other sequence type. f Correlation between core language ability (F1, x-axis) and F-statistics in HAQERs (y-axis).

The presence of HAQERs in archaic humans provided a unique opportunity to examine genetic potentiators of cognitive traits across human species. To do this, we computed HAQER CP-PGS and background CP-PGS in archaic humans and compared them to modern humans. Remarkably, all four high-coverage archaic human genomes (three Neanderthals and one Denisovan) showed elevated HAQER CP-PGS (mean z-score = 0.91), while having reduced background CP-PGS (mean z-score = −1.45, Figure 5b-c, Supplementary Table 12). The elevated HAQER CP-PGS observed in archaic humans remains consistent even after accounting for population structure and limiting our analysis to HAQER SNPs directly genotyped across all archaic samples, indicating this result is unlikely to be a technical artifact. This unexpected pattern suggests that our ancient relatives had an elevated genetic predisposition for language abilities despite lower scores for general cognition, challenging traditional views of archaic human capabilities.

The striking stability of the HAQER CP-PGS throughout human evolutionary history, potentially predating the human-Neanderthal split, led us to hypothesize that HAQERs have been maintained through balancing selection. To test this hypothesis, we conducted multiple population genetic analyses in the EpiSLI sample comparing HAQERs to both HARs and matched random genomic sequences (RAND). First, we observed an enrichment of intermediate frequency variants (MAF 10–50%) in HAQERs, a characteristic signature of balancing selection (Figure 5e). This enrichment pattern indicates a “heterozygote advantage”, where selective pressures are actively maintaining genetic diversity in these regions rather than driving language-beneficial alleles to fixation. To directly evaluate a heterozygote advantage in HAQERs, we calculated F-statistics across these regions, which quantify the deviation between observed and expected heterozygosity. HAQERs exhibited significantly lower F-statistics compared to both HARs (t-statistic = −55.3, p-value = 8 × 10^−254) and random sequence (t-statistic = −57.9, p-value = 1.8 × 10^−265), indicating excess heterozygosity in HAQERs (Figure 5d). Intriguingly, we discovered that individuals with higher F-statistics in HAQERs (those possessing more homozygous genotypes across these regions) displayed significantly better language abilities (rho = 0.11, p-value = 0.036, Figure 5f). This apparent paradox can be resolved by a model of antagonistic pleiotropy, where homozygosity for certain HAQER variants enhances language ability but potentially decreases reproductive fitness. These findings collectively support a balancing selection model in which a heterozygote advantage for reproductive outcomes constrains the fixation of language-enhancing alleles in HAQERs, maintaining these variants at intermediate frequencies throughout human history.
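As an illustration of how an F-statistic relates observed to expected heterozygosity, the sketch below computes a per-individual method-of-moments inbreeding coefficient across a set of sites. The specific estimator, F = (observed homozygotes − expected homozygotes) / (sites − expected homozygotes) with expectations from Hardy-Weinberg proportions, is an assumption made for this example; the text does not state which estimator was used, and the function name and toy frequencies are hypothetical.

```python
def inbreeding_f(genotypes, alt_freqs):
    """Per-individual F across sites.

    genotypes: alternate-allele counts (0, 1, or 2) for one individual
    alt_freqs: population alternate-allele frequency at each site
    """
    n_sites = len(genotypes)
    observed_hom = sum(1 for g in genotypes if g != 1)
    # Under Hardy-Weinberg, P(homozygous) = 1 - 2p(1 - p) at each site
    expected_hom = sum(1.0 - 2.0 * p * (1.0 - p) for p in alt_freqs)
    return (observed_hom - expected_hom) / (n_sites - expected_hom)

# Toy example: four sites, each with allele frequency 0.5
freqs = [0.5, 0.5, 0.5, 0.5]
print(inbreeding_f([1, 1, 1, 1], freqs))  # all heterozygous: excess heterozygosity
print(inbreeding_f([0, 1, 2, 1], freqs))  # matches Hardy-Weinberg expectation
print(inbreeding_f([0, 2, 0, 2], freqs))  # all homozygous: excess homozygosity
```

Under this convention, negative values indicate more heterozygous genotypes than expected and positive values indicate more homozygous genotypes, which matches the direction described for Figure 5d and 5f.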

8. HAQERs influence birth complications and prenatal neurodevelopment

The well-established neurodevelopmental functions of HAQERs guided our hypothesis about balancing selection in these regions[5]. We hypothesized that while variants in HAQERs can benefit language ability, these same variants likely created reproductive challenges through adverse effects on fetal development, possibly through birth complications and increased energy requirements, thus preventing their fixation despite their cognitive advantages. To test this hypothesis, we examined links between HAQERs and prenatal development in modern humans with the ABCD sample[19]. We conducted a within-family analysis, comparing differences in ES-PGS between siblings with their differences in birth outcomes, allowing us to account for environmental effects more rigorously than typical population-scale analyses. We found that HAQER CP-PGS influences birth weight and birth complications within families (N > 500 sibling pairs from ABCD). The difference between sibling birth weights was positively associated with their difference in HAQER CP-PGS (ES-PGS β = 1.62, p-value = 0.03), meaning the sibling born heavier carries more HAQER CP-PGS (Figure S1f). We did not observe an association between difference in birth weight and background CP-PGS (ES-PGS β = −0.17, p-value = 0.82, Figure S1e). We also examined 47 of these sibling pairs where only one was born via c-section (Figure S1h). The siblings born by c-section had higher HAQER CP-PGS (t-statistic = 2.06, p-value = 0.045), and no difference in their background CP-PGS (t-statistic = 0.02, p-value = 0.88, Figure S1c-d).
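The logic of the within-family design can be shown with a small sketch: taking sibling differences removes anything the siblings share, so the slope of outcome difference on score difference is unaffected by family-level effects. The function name, the toy numbers, and the simple unadjusted regression are assumptions for illustration only; the actual models may include covariates not shown here.

```python
def sibling_difference_slope(pairs):
    """Least-squares slope of (outcome difference) on (score difference).

    pairs: list of ((score_a, outcome_a), (score_b, outcome_b)), one per sibling pair
    """
    d_score = [a[0] - b[0] for a, b in pairs]
    d_outcome = [a[1] - b[1] for a, b in pairs]
    n = len(pairs)
    mean_x = sum(d_score) / n
    mean_y = sum(d_outcome) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(d_score, d_outcome))
    sxx = sum((x - mean_x) ** 2 for x in d_score)
    return sxy / sxx

def toy_birth_weight(family_effect, score):
    # Hypothetical generative model: shared family effect plus 100 g per score unit
    return 3000.0 + family_effect + 100.0 * score

# (family_effect, score of sibling A, score of sibling B): made-up values
families = [(150.0, 0.5, -0.5), (-200.0, 1.0, 0.2), (80.0, -0.3, 0.4)]
pairs = [((sa, toy_birth_weight(fe, sa)), (sb, toy_birth_weight(fe, sb)))
         for fe, sa, sb in families]

slope = sibling_difference_slope(pairs)
print(f"within-family slope: {slope:.1f} g per score unit")
```

The large family-level offsets in the toy data cancel in the differences, so the recovered slope reflects only the within-family relationship between the score and the outcome.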

We sought to further expand our understanding of HAQERs’ neurodevelopmental influences by examining their regulatory effects during prenatal and postnatal periods. By intersecting HAQER regions with single-cell quantitative trait loci (scQTLs) from developing midbrain neurons, we found HAQERs are significantly enriched for variants influencing prenatal brain gene expression. Our analysis revealed HAQERs are enriched for regulatory elements across all observed prenatal timepoints and almost all cell types, suggesting a broad neurodevelopmental impact (Figure S1a). Critically, these regions showed no enrichment in adult brain scQTLs, indicating their influence occurs primarily during prenatal development (Figure S1b).

To further test whether HAQERs influence brain structure prenatally, we tested HAQER CP-PGS for association with intracranial volume and intracranial growth phenotypes in a large independent sample of adolescents (ABCD). HAQER CP-PGS was predictive of total intracranial volume, even after accounting for body size (ES-PGS β = 0.03, p-value = 0.02, N = 5,274 unrelated individuals, Figure S1g). HAQER CP-PGS was not predictive of intracranial growth in adolescence, supporting the view that HAQERs influence brain development prenatally (ES-PGS β = 0, p-value = 0.98, N = 3,156 unrelated individuals, Figure S1h). Background CP-PGS was related to both total intracranial volume and growth through adolescence, indicating an influence on postnatal brain development (Figure S1g-h).

Together, these analyses suggest that language-related variants in HAQERs may also increase the size of newborns and their brains, and in turn the risk of birth complications. The relationship between HAQERs and birth complications provides a possible explanation as to why they have been under balancing selection, in contrast to the background CP-PGS, which has undergone positive selection and appears to have more of an influence on postnatal neurodevelopment.

9. Evidence of convergent evolution in HAQERs for vocal learning

Next, we investigated whether homologous sequences to HAQERs in other species might influence their communication abilities. Previous work has classified > 200 mammalian species as either vocal learners or non-vocal learners[39]. Vocal learning species can acquire and modify their vocalizations through experience and auditory feedback, in contrast to species restricted to innate vocalizations. We focused our analysis on 170 non-primate species that were not used to derive HAQERs (121 non-vocal learning and 49 vocal learning species). For each species, we computed genome-wide “HAQER-like” and “HAR-like” sequence similarity scores by comparing regions aligning to the human reference genome using multiple sequence alignment data[40]. Strikingly, vocal learning species showed significantly higher “HAQER-like” sequence similarity scores compared to non-vocal learners, even after controlling for phylogenetic relatedness and “HAR-like” sequence similarity (phylogenetic regression beta = 3.92, p-value = 5.9 × 10^−6, Figure 4f). Given that these vocal learning capabilities evolved independently from the human lineage, this finding reveals a fundamental property of HAQER-like sequences: they can reproducibly promote vocal learning across diverse evolutionary contexts, reaching their most sophisticated expression in humans. This supports an interpretation of the F1 core language factor as the human instantiation of a broadly reproducible “vocal learner” phenotype that has evolved independently in multiple mammalian lineages.

10. Discussion

In this study, we combine evolutionary genomics with observations in deeply phenotyped modern humans to understand how rapidly evolving genomic regions have contributed to and continue to shape the development of human language. The first major contribution of this work is in reconciling two major theories of the development of human language: the single-gene theory, popularized by early findings implicating FOXP2 [1, 2, 41], and more recent work that espouses a highly distributed, polygenic model[13, 14, 15, 16, 17, 42]. In brief, we find and replicate evidence that HAQERs[5] – ~1,500 previously neutral regions that have gained regulatory potential in the human lineage – harbor alleles that have a strikingly disproportionate effect on language: a SNP in a HAQER has on average 112 times the impact on language that a SNP elsewhere in the genome has. While this result implicates regions of highly concentrated signal, these loci are distributed throughout the genome, thus supporting the polygenic view of human language evolution[13, 43, 44, 42]. Interestingly, we also found evidence that selective forces in the human lineage have guided HAQER sequences toward a significant preference for Forkhead box binding motifs (e.g., FOXP2), and that the integrity of these motifs underlies individual differences in language ability in modern humans. This finding supports a key role for Forkhead box transcription factors in human language at both the species and individual level. Early reports focusing on rare coding variation in FOXP2 struggled to find later generalization in more commonly observed individual differences in language[12, 11]. Our findings suggest that FOXP2 and other Forkhead family members likely contribute to individual differences in language more consistently through polygenic variation in their downstream binding targets (e.g., in HAQERs in the current study) than through their own protein-coding variation.

Having established HAQERs as a key mechanism in language evolution, we next sought to understand how they manifest in measurable language abilities. This led us to examine which aspects of language most directly reflect the ancient genomic foundations we identified, versus more recent elaborations. The sentence repetition task presented an interesting case study in this regard. While human language researchers have debated the precise cognitive processes it measures - with evidence suggesting roles for verbal working memory, grammatical knowledge, phonological processing, and other linguistic systems[45, 46, 23, 24] - our cross-species analysis suggests it may tap into something more fundamental. Through our factor analysis of deeply and longitudinally phenotyped children (kindergarten, second, and fourth grades), we found that sentence repetition loads primarily on a core factor of language ability that remains stable across development even as other language skills change. While this task does engage complex linguistic knowledge in humans, our finding that HAQERs influence sentence repetition ability and show convergent evolution in vocal learning species suggests that the task’s core demand - the ability to perceive and accurately reproduce novel acoustic sequences through auditory feedback - reflects a fundamental capacity that emerged through regulatory changes in HAQERs during hominin evolution, and has independently evolved in other vocal learner species through similar genetic mechanisms.

This interpretation is strengthened by several converging lines of evidence. First, HAQERs show a striking specificity in their effects – they influence sentence repetition ability (F1) but not nonverbal IQ, mirroring how vocal learning capacity is dissociable from other cognitive abilities across species. Second, HAQERs’ influence on prenatal brain development, evidenced by their enrichment for variants affecting prenatal gene expression and their association with intracranial volume, suggests they help establish early neural substrates that support vocal learning ability. This is further supported by the pleiotropic effects we observe, where HAQERs that promote language ability also influence birth weight and are associated with birth complications, suggesting a fundamental role in prenatal development. Third, the enrichment of Forkhead box transcription factor binding sites in HAQERs provides a plausible mechanistic framework for known mechanisms of vocal learning, including FOXP2. Perhaps most compellingly, we find that homologous HAQER-like regions show convergent evolution specifically in vocal learning species across diverse mammalian lineages, suggesting they represent a common genetic substrate for the recurrent emergence of vocal learning ability.

The evolutionary trajectory of HAQER variants offers surprising insights into the emergence of human language. While cognitive variants outside HAQERs show clear positive selection over the past 20,000 years, HAQER variants have remained remarkably stable. Even more striking, we find that Neanderthals and Denisovans carried a higher proportion of language-promoting HAQER variants than modern humans, despite having lower polygenic scores for general cognition. This pattern suggests that the genetic foundations for complex language capabilities were established before the human-Neanderthal split (predating many previous language emergence estimates[47, 6, 7]) and have been maintained by balancing selection ever since. This apparent evolutionary paradox—the lack of positive selection on language-promoting HAQER variants—can be explained by our discovery of their pleiotropic effects on prenatal development. HAQERs that enhance language ability also influence birth weight and are associated with increased risk of birth complications, creating an evolutionary trade-off between core language ability and reproductive fitness, potentially including mechanisms linked to cephalopelvic disproportion. Importantly, our results align with observations linking prenatal development with language outcomes[48, 49]. These results suggest that the genetic architecture supporting language has been shaped not only by its adaptive benefits but also by developmental constraints. Modern medical interventions may now be relaxing these evolutionary constraints, potentially allowing for novel trajectories in the ongoing evolution of language-related genomic variation.

Our study has several important limitations. While we demonstrate clear associations between HAQER variation, Forkhead box binding motifs, and language ability, direct experimental validation of binding and its developmental consequences would strengthen our understanding of the causal mechanisms. In particular, the specific pathways through which HAQER variants influence prenatal development and birth outcomes warrant further investigation through functional studies. The evolutionary history of HAQERs presents another challenge. Although we have good coverage of recent human history through ancient DNA, our understanding of selection on HAQERs during earlier periods of hominin evolution remains limited by available samples. The oldest available genomic data from our genus is ~400,000 years old[50], leaving much of our early evolutionary history unresolved. Additionally, the relationship between HAQERs and reproductive fitness may have varied across different human populations and time periods in ways that are difficult to fully reconstruct.

These limitations point to several promising avenues for future research. Understanding how HAQER variants interact with other genomic regions could reveal additional insights into language evolution, while their association with birth complications suggests important clinical implications that warrant further investigation. Recent research has demonstrated how variants inherited from ancient human subpopulations can influence complex traits[51, 52, 53] - applying similar analyses to HAQERs could reveal how different human groups contributed to the mosaic of genetic variation that shapes modern language abilities. Finally, examining HAQER effects across diverse linguistic contexts could illuminate how these ancient adaptations interface with different language systems, though our findings linking HAQERs to vocal learning across species suggest their effects operate at a fundamental level that transcends specific languages.

This work bridges millions of years of human evolution with present-day individual differences, revealing how ancient genomic innovations established and continue to refine human language ability. The discovery that these same regions show convergent evolution across vocal learner species suggests we have identified a fundamental genetic mechanism for the emergence of complex communication - one that has been repeatedly utilized throughout evolutionary history and continues to shape human cognitive development today.

11. Methods

11.1. Human Subjects and IRB

All EpiSLI subjects in this study were minors who assented to participation under the University of Iowa IRB #200511767. Secondary genetic analyses for EpiSLI subjects were approved and carried out under the University of Iowa IRB #201406727. SPARK is approved under WIRB #201703201. Data collection and analysis for the SPARK Research Match study described here were approved under the University of Iowa IRB #201705739.

11.2. EpiSLI Cohort

The discovery cohort of this study - referred to as EpiSLI - was originally recruited as an epidemiological study of language impairments in kindergarteners in the state of Iowa. A more detailed description of the recruitment scheme can be found in the initial [54] and subsequent [22, 55] publications. In brief, 7,218 kindergarteners, sampled to be representative of Iowa’s population, were screened for language impairment with a rapid, 40-item subset [54] of the TOLD-2P [56]. Children who failed the TOLD-2P screener (i.e., have poor language ability) were over-sampled to capture a broad range of language ability in the primary cohort (n = 1,929 children). Children with autism spectrum disorder and/or intellectual disability were excluded. The entirety of the primary cohort received a more complete battery of language and cognitive assessments, while teachers and parents completed scholastic and behavioral questionnaires about the children in 2nd, 4th, 8th, and 10th grades. Language and cognitive assessments included: TOLD-2P[56], WPPSI[57], a narrative story task[58], Woodcock Reading Mastery Tests-Revised[59], Word-Sound Deletion task[60], and Random Animals-Colors task [60]. We selected a total of 390 children from the primary cohort for whole genome sequencing. This sample represents a broad spectrum of language abilities: the 390 children were chosen by sampling from the tails of the distribution of a composite language score inward until we reached our final sample size [54, 11].

Behavioral assessment scores came from T-scores for the summary scales of the Child Behavior Checklist (CBCL) developed by ASEBA[61]. CBCLs were completed at the 2nd grade timepoint of the study (241 of the 350 children used in this study had complete CBCL data for the analysis in Supplementary Table 2).

11.2.1. Factor Analysis

An exploratory maximum-likelihood factor analysis was carried out on the language and cognitive assessment scores using the factanal function from the stats package in R [62], with the default “varimax” rotation. A number of factors from 1–10 were evaluated, with 7 - the largest number of factors with a nominal χ2 p-value less than 0.05 - chosen for subsequent analyses (Figure S1A). These seven factors accounted for an estimated 62.3% of the total variance in the original data.

11.2.2. EpiSLI Cognitive measures

Core language assessment scores and IQ were normalized via the general scheme described in [54] to account for the over-sampling of individuals with low language ability. Core language scores were corrected for age as the residual of a linear regression model using lm() in R [62]. The age-corrected core language scores were then normalized with respect to the population using the wtd.mean() and wtd.var() functions from the Hmisc package [63] in R.
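The two-step normalization described above (age residualization followed by population-weighted standardization) can be illustrated with a minimal sketch. It is written in Python rather than the R functions named in the text, and the scores, ages, and sampling weights are hypothetical. The variance uses a (sum of weights - 1) denominator, which is an assumption about the weighting convention rather than a statement of the exact settings used in the study.

```python
from math import sqrt

def residualize(y, x):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def weighted_standardize(values, weights):
    """Z-scores relative to the weighted (population-level) mean and variance."""
    wsum = sum(weights)
    wmean = sum(w * v for w, v in zip(weights, values)) / wsum
    # Assumed convention: frequency weights with a (sum of weights - 1) denominator.
    wvar = sum(w * (v - wmean) ** 2 for w, v in zip(weights, values)) / (wsum - 1)
    return [(v - wmean) / sqrt(wvar) for v in values]

# Hypothetical toy data: raw scores, ages in months, and sampling weights.
scores = [85.0, 92.0, 100.0, 108.0, 115.0, 97.0]
ages = [70.0, 72.0, 71.0, 74.0, 73.0, 75.0]
weights = [1.0, 1.0, 4.0, 4.0, 4.0, 4.0]  # over-sampled low scorers get small weights

age_corrected = residualize(scores, ages)
normalized = weighted_standardize(age_corrected, weights)
```

Down-weighting the over-sampled children means the resulting z-scores are expressed relative to the screened population rather than to the enriched study sample.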

11.2.3. Cognitive data Imputation

Most individuals for whom whole genome sequencing was generated had scores from assessments given in 2nd grade, while a minority had only 4th grade (41/390) or kindergarten (39/390) scores. Missing scores were imputed using chained random forests and predictive mean matching as implemented in the missRanger R package [64] with 5,000 trees and pmm.k = 20.

11.3. EpiSLI Whole Genome Sequencing

11.3.1. Sample Collection and whole genome sequencing

DNA was collected and extracted from either blood or saliva samples for 390 individuals (350 of whom were used in the final analysis after quality control). DNA concentration of all sequenced samples was quantified with a Qubit 2.0 Fluorometer (Life Technologies Corporation). All 390 DNA samples were sheared on an E220 Focused-ultrasonicator (Covaris) to an average size of 400 bp. Sequencing libraries were generated with a Kapa Hyper Prep kit (Kapa Biosystems) according to the manufacturer protocol. All samples had an average genome-wide coverage of at least 20X and were sequenced on a HiSeq4000 (Illumina) with 150-bp Paired End chemistry.

11.3.2. Post-sequencing QC

All sequencing data was analysed with Fastqc (v0.11.8)[65], where no samples failed the run modules. After genome alignment (see below), some samples were found to have significantly better coverage than others, in excess of 40x. To help control for ascertainment bias in these samples, all samples with an average coverage > 35x (based on initially mapped BAM) were randomly down-sampled, leading to the distribution of genome-wide coverage and average insert-size outlined in Supplementary Table 17.

11.3.3. Genome Alignment and Variant Calling

Reads were processed with bcbio (v1.1.6)[66] and mapped to hg19 via BWA-mem (v0.7.17)[67]. SNVs and InDels were then called with all samples in a pool with three variant callers: GATK (v4.1.2.0)[68], FreeBayes (v1.1.0.46)[69], and Platypus (v0.8.1.2)[70].

Variants from each caller were filtered according to caller-specific quality metrics. For GATK, variants needed to pass all VQSR tranche thresholds. For Platypus, variants were filtered based on the goodness of fit of genotype calls, excessive region-based haplotype scores, root-mean-square mapping quality, variant quality and its ratio with read depth, low complexity sequence context, allele bias, region-based read quality, neighboring homopolymers, and strand bias. For FreeBayes, variants were filtered based on a combination of allele frequency, read depth, and overall quality. The thresholds used for filtering Platypus and FreeBayes calls were the defaults set by bcbio. An ensemble of all three callers was generated and used in all subsequent analyses in order to achieve improved specificity in the detection of rare variants. All variants in the ensemble callset were called by GATK as well as by either Platypus or FreeBayes. The majority of variants (~87%; 25,775,508) were called by all three callers, while a minority were called by GATK and only one other caller (~13%; 3,937,146). All variants were filtered to have a QUAL and individual quality score ≥ 20. Qualified researchers can access the individual level genotypes from dbGaP at (study accession = phs002255.v1.p1): https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002255.v1.p1
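The ensemble rule described above (a variant is kept if it is called by GATK and by at least one of the other two callers) reduces to a set operation. The sketch below is illustrative only: the variant keys are hypothetical, and in practice the ensemble was produced within bcbio rather than with code like this.

```python
def ensemble_callset(gatk, platypus, freebayes):
    """Keep variants called by GATK and by at least one of the other two callers."""
    return gatk & (platypus | freebayes)

# Hypothetical variant keys: (chromosome, position, ref, alt).
gatk = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
platypus = {("1", 100, "A", "G"), ("2", 50, "G", "A"), ("3", 10, "T", "C")}
freebayes = {("1", 100, "A", "G"), ("3", 10, "T", "C")}

kept = ensemble_callset(gatk, platypus, freebayes)
all_three = gatk & platypus & freebayes   # analogous to the majority class
two_only = kept - all_three               # GATK plus exactly one other caller
```

In this toy example the variant on chromosome 3 is excluded despite being called by two callers, because GATK support is required.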

11.3.4. Final sample QC

Starting from the initial sample size of 390, a total of 14 samples were flagged for removal from all association analyses because of population stratification. Specifically, these samples did not cluster with 1000 Genomes Europeans[71], or were more than 3 standard deviations away from the rest of the EpiSLI cohort based on the top 10 multidimensional scaling components calculated from SNPs found at or above a 0.05 minor allele frequency. An additional 26 samples were dropped due to relatedness or limited phenotypic data, leaving a final sample size of 350 unrelated European individuals with complete data for all genomic analyses.
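The standard-deviation criterion above can be sketched as follows. This is a simplified illustration with hypothetical coordinates and two components instead of ten. It computes the mean and standard deviation over all samples, including the candidate outlier, which is an assumption, and it does not cover the separate check against 1000 Genomes Europeans.

```python
from statistics import mean, stdev

def flag_outliers(components, n_sd=3.0):
    """Return indices of samples more than n_sd SDs from the mean on any component."""
    flagged = set()
    n_components = len(components[0])
    for k in range(n_components):
        column = [row[k] for row in components]
        mu, sd = mean(column), stdev(column)
        flagged |= {i for i, v in enumerate(column) if abs(v - mu) > n_sd * sd}
    return sorted(flagged)

# Hypothetical MDS coordinates: 20 tightly clustered samples plus one distant sample.
mds = [[0.01 * ((i % 5) - 2), 0.02 * ((i % 3) - 1)] for i in range(20)]
mds.append([1.0, 0.0])  # far from the cluster on the first component
outliers = flag_outliers(mds)
```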

11.3.5. Variant Annotations

Variants were annotated with the Ensembl Variant Effect Predictor tool (VEP v109) [72]. Reference population allele frequencies came from 1000 Genomes Phase 3 samples[71] and GnomAD[73], and rsIDs of variants came from dbSNP [74] (v151); these additional annotations were added using VCFanno [75] (v20190119).

11.4. Polygenic Scores (PGS)

11.4.1. Genotypes for PGS

To derive a set of SNPs suitable for PGS analysis, we merged our dataset with 1000 Genomes Europeans to use as a reference sample. Based on widely used recommendations[76], we extracted SNPs with a minor allele frequency ≥ 1%, Hardy-Weinberg equilibrium p-value > 1 × 10−6, and a missingness rate < 2% in both samples (leaving 7,719,665 SNPs total for PGS calculation).
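The three SNP-level filters above can be sketched for a single SNP as follows. The genotype counts are hypothetical, and the Hardy-Weinberg p-value uses a 1-degree-of-freedom chi-square approximation for brevity; this is an assumption of the sketch, and dedicated genetics software typically uses an exact test instead.

```python
from math import erfc, sqrt

def snp_qc(n_aa, n_ab, n_bb, n_missing):
    """Return (maf, hwe_p, missing_rate) from genotype counts at one SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = erfc(sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return min(p, q), hwe_p, n_missing / (n + n_missing)

def passes(maf, hwe_p, missing_rate):
    """Thresholds from the text: MAF >= 1%, HWE p > 1e-6, missingness < 2%."""
    return maf >= 0.01 and hwe_p > 1e-6 and missing_rate < 0.02

good = snp_qc(360, 480, 160, 5)            # in equilibrium, common, little missingness
hwe_violation = snp_qc(500, 0, 500, 0)     # no heterozygotes at all
rare = snp_qc(0, 10, 990, 0)               # MAF of 0.5%
high_missing = snp_qc(360, 480, 160, 30)   # about 2.9% missing
```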

11.4.2. PGS Calculation

LDpred2 was used to calculate a genome-wide PGS for all traits with the infinitesimal model using the provided UK Biobank LD reference panel and HapMap3+ variant set [77]. PGS were calculated using GWAS summary statistics for psychiatric traits: ADHD [78], addiction[79], alcohol dependency [80], Autism [81], Anorexia [82], bipolar disorder [83], depression [84], insomnia [85], neurodevelopmental conditions [86], PTSD [87], and schizophrenia [88]. GWAS summary statistics for cognitive traits included: cognitive performance[89], educational attainment [90], executive functioning [91], and the “g Factor” [92]. Additional PGS were calculated for the following behavioral and socioeconomic status related traits: childhood aggression[93], antisocial behavior [94], empathy [95], the BIG5 personality traits [96], income [97], and the Townsend Deprivation Index (a measure of material deprivation) [98]. PGS were also calculated for brain structural[99, 100] and functional connectivity phenotypes [101]. Finally, we computed PGS for miscellaneous traits: left-handedness [98], height [102], childhood trauma [103], and vocal pitch [104]. It is important to note we did not compute PGS in EpiSLI for reading based traits, because our sample was part of the discovery cohort of the largest reading related GWAS to date [13]. Associations for all of these PGS with our language factors can be found in Supplementary Table 3.

To account for population stratification, we corrected PGS for the first 5 genetic principal components. PGS were normed to the 1000 Genomes Europeans reference sample.

11.4.3. ES-PGS Calculation

Here we describe ES-PGS, a novel method to estimate the point in evolutionary history at which the genetic code key to a phenotype developed. This technique leverages two key pieces of information: (1) evolutionary genomic annotations and (2) stratified PGS, providing a PGS for a trait of interest across each provided annotation in the genome. The annotation-restricted PGS (ES-PGS) is used to represent key evolutionary events and a “background PGS” is calculated to control for the rest of the genome. The ES-PGS and the background PGS are jointly modeled to determine whether the evolutionary annotation is predictive of the phenotype and independent of the rest of the genome (as measured by the background PGS). These results were used to answer the question: “how much does this specific part of the genome contribute to language ability, over and above what is expected from the rest of the genome?”.

Briefly, we use the ANOVA test of a reduced and full model to determine whether the addition of a new model term significantly improves the model fit. In the case of ES-PGS, the reduced model is:

$$y = \beta_0 + \beta_1 \mathrm{PGS}_{\mathrm{background}} + \epsilon \tag{1}$$

where $y$ is the phenotype of interest, in our case the language factor score (e.g., F1, F2, or F3), $\mathrm{PGS}_{\mathrm{background}}$ is the polygenic score calculated using all independent SNPs except those in the evolutionary annotation of interest, and $\epsilon$ is the error term. The full model includes an additional term for the ES-PGS:

$$y = \beta_0 + \beta_1 \mathrm{PGS}_{\mathrm{background}} + \beta_2 \mathrm{PGS}_{\mathrm{annotation}} + \epsilon \tag{2}$$

where $\mathrm{PGS}_{\mathrm{annotation}}$ is the polygenic score calculated using only independent SNPs within the specific evolutionary annotation. By comparing these models, we can determine whether the addition of the annotation-specific PGS significantly improves the model’s fit, indicating a unique contribution of that evolutionary annotation to the trait beyond what is explained by the background PGS. This approach allows us to systematically test the contributions of a range of evolutionary periods to our language ability phenotypes, providing insights into the evolutionary history of genetic variants associated with language development.
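The comparison of the reduced and full models can be illustrated with a minimal sketch on simulated data. It is not the authors' code: the sample size, effect sizes, and variable names are arbitrary assumptions, and ordinary least squares is solved directly so that the example needs only the standard library. The F statistic compares residual sums of squares with and without the annotation-specific score.

```python
import random

def ols_rss(columns, y):
    """Residual sum of squares from least squares of y on an intercept plus columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination with pivoting.
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))

def nested_f(rss_reduced, rss_full, n, p_full, df_added=1):
    """F statistic for adding df_added terms; p_full counts parameters incl. intercept."""
    return ((rss_reduced - rss_full) / df_added) / (rss_full / (n - p_full))

random.seed(1)
n = 350
pgs_background = [random.gauss(0, 1) for _ in range(n)]
pgs_annotation = [random.gauss(0, 1) for _ in range(n)]
# Simulated phenotype in which both scores contribute (effect sizes are arbitrary).
y = [0.3 * b + 0.5 * a + random.gauss(0, 1) for b, a in zip(pgs_background, pgs_annotation)]

rss_reduced = ols_rss([pgs_background], y)                 # model (1)
rss_full = ols_rss([pgs_background, pgs_annotation], y)    # model (2)
f_stat = nested_f(rss_reduced, rss_full, n, p_full=3)
```

A p-value would be obtained by referring `f_stat` to an F distribution with 1 and n - 3 degrees of freedom, for example with a statistics library; that step is omitted here to keep the sketch dependency-free.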

PRSet [28] was used to calculate stratified PGS using human genome annotations in BED formatted files, computing a clumping and thresholding based ES-PGS for each annotation and a background PGS for the rest of the genome. Background regions for each annotation were identified using the “complement” function from bedtools (v2.26.0)[105]. As recommended by the authors of PRSet, we used a p-value threshold of 1 and the 1000 Genomes Europeans as the LD reference [71]. We corrected for population stratification using the same steps as described in the genome-wide PGS analysis.
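The background regions for an annotation are simply the parts of each chromosome that the annotation does not cover. A minimal sketch of that complement operation on BED-style half-open intervals is shown below. The chromosome sizes and intervals are hypothetical, and this is an illustration of the idea rather than a replacement for bedtools.

```python
def complement(intervals, chrom_sizes):
    """Return BED-style (chrom, start, end) intervals not covered by the input."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    out = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first position not yet known to be covered
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                out.append((chrom, cursor, start))
            cursor = max(cursor, end)  # handles overlapping input intervals
        if cursor < size:
            out.append((chrom, cursor, size))
    return out

# Hypothetical annotation with two overlapping intervals on chr1.
annotation = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
sizes = {"chr1": 1000, "chr2": 500}
background = complement(annotation, sizes)
```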

We focused our analysis on genome annotations related to primate and human evolution. Annotations included: primate ultra conserved regions (primate UCEs) [21], primate lineage accelerated regions [30], human-chimp divergent genes (differentially methylated genes between humans and chimp)[31], Human Ancestor Quickly Evolved Regions (HAQERs)[5], Human Accelerated Regions (HARs)[32], Neanderthal Selective Sweep loci (the 5% of the human genome most depleted for Neanderthal derived variants)[33], and recent human selection pressures (highest scoring 5% of SNPs based on absolute value of Singleton Density Scores)[35]. All annotations were converted to BED format and lifted over to match genome builds when necessary, using the UCSC liftOver tool[106].

11.4.4. ES-PGS replication in SPARK

To replicate our ES-PGS results in a separate sample we used the SPARK cohort, a large genetic study of individuals with autism and their family members [18]. For the replication we utilized imputed SNP array data that we have previously described[107] to compute an ES-PGS using the same workflow as in our EpiSLI sample. One phenotype, the “core language ability” factor, came from a previously described research match where we had > 1,000 adults with autism or parents of children with autism complete an online language battery[38]. All other phenotypes came from the SPARK v13 phenotype release. All analyses included age and sex as covariates.

11.5. SPARK rare ancestral reversion analysis

To explore the effects of rare genetic variation in evolutionarily significant regions on language ability, we used the whole genome sequencing data from the previously described SPARK cohort (max N = 11,545)[18]. Briefly, we merged the whole genome sequencing data provided by SPARK (WGS batches 1–4) using bcftools, then used the same processing pipeline as for the EpiSLI cohort: we filtered to variants with a QUAL and individual quality score ≥ 20 and annotated the variants with VEP (v109)[72]. To identify rare variants, we then filtered variants in both datasets to have a maximal reference population allele frequency < 1% and an allele frequency < 1% in SPARK. Ancestral alleles were identified using those provided in the original HAQER manuscript[5]. We identified all reversion variants within 10Kb of HAQERs, HARs, or random non-coding (RAND) sequence and counted the number of rare ancestral reversions each sample had in each of these elements. We removed outlier samples whose counts were > 2.5 median absolute deviations away from the median value for HAQER, HAR, or random reversions (N = 1,781 outliers); phenotypic associations were consistent even when these outliers were included in the analysis. We then used these reversion counts for association with speech and language phenotypes in SPARK.
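The outlier rule above can be sketched for a single class of reversion counts as follows. The counts are hypothetical, and the median absolute deviation is used unscaled, which is an assumption (some implementations multiply it by a consistency constant). In the analysis described above, a sample flagged for any of the three classes would be removed.

```python
from statistics import median

def mad_outliers(counts, n_mads=2.5):
    """Indices of values more than n_mads median absolute deviations from the median."""
    med = median(counts)
    mad = median(abs(c - med) for c in counts)  # unscaled MAD (assumed)
    return [i for i, c in enumerate(counts) if abs(c - med) > n_mads * mad]

# Hypothetical per-sample counts of rare reversions in one sequence class.
reversion_counts = [10, 12, 11, 9, 13, 10, 11, 40]
flagged = mad_outliers(reversion_counts)
```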

11.6. Transcription Factor Analysis

11.6.1. Variant selection and annotation

We analyzed transcription factor binding sites in three distinct genomic contexts: Human Accelerated Regions (HARs), Human Ancestor Quickly Evolved Regions (HAQERs), and matched random genomic regions (RAND), all taken from [5]. Position weight matrices (PWMs) for 633 human transcription factors were obtained from the JASPAR2020 database[108], with pseudocounts adjusted according to base frequency distributions.

Variants were filtered using strict quality control criteria as described above. For this analysis, we retained only biallelic single nucleotide variants (SNVs) located within feature boundaries that exhibited minor allele frequencies below 1%. Complete great ape allele information was required for each variant, including data from Neanderthals and Denisovans, Chimpanzee, Bonobo, Gorilla, and Orangutan genomes [109, 110, 111, 112, 5]. The final dataset comprised genotype information from 15,746 rare variant sites across 350 individuals, with corresponding variant annotations.

11.6.2. Reversion Status Determination

To characterize the evolutionary trajectory of variants, we developed a machine learning approach to impute reversion status where direct determination was not available. We implemented an elastic net regression model (α = 0.9) using the glmnet package in R[113], incorporating great ape allele states as predictors of reversion status as given in[5]. To address class imbalance, we applied weights to ensure equal representation across sequence context types (HAQER, HAR, random). Reversion status was assigned using a probability threshold of 0.8, with known states preserved for training data.

11.6.3. Sequence and Motif Analysis

For each variant, we extracted 51-base pair genomic windows centered on the variant position from the human reference genome (hg19). Alternative sequences were generated by substituting variant alleles into the reference background. We then calculated maximal motif scores for both reference and alternative sequences across all JASPAR2020 human transcription factor motifs. Scores were computed on both forward and reverse complement strands, with the maximum score retained for reference and alternate alleles of each variant-motif pair.
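The scoring procedure above can be sketched as follows: a window centred on the variant is scored with a position weight matrix at every offset on both strands, once with the reference allele and once with the alternate allele substituted in. The 4-position matrix below is a made-up log-odds matrix that favours TGTT; it is not a JASPAR entry, and the window sequence is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def max_motif_score(seq, pwm):
    """Best log-odds score of the PWM over all positions on both strands."""
    width = len(pwm)
    best = float("-inf")
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            score = sum(pwm[i][strand[start + i]] for i in range(width))
            best = max(best, score)
    return best

def ref_alt_scores(window, alt, pwm):
    """Score a variant-centred window before and after substituting the alt allele."""
    centre = len(window) // 2
    alt_window = window[:centre] + alt + window[centre + 1:]
    return max_motif_score(window, pwm), max_motif_score(alt_window, pwm)

# Hypothetical PWM: +2 for the preferred base at each position, -1 otherwise.
pwm = [{b: (2.0 if b == pref else -1.0) for b in "ACGT"} for pref in "TGTT"]

# Hypothetical 51-bp window whose centre base (index 25) is the third motif position.
window = "C" * 23 + "TGTT" + "C" * 24
ref_score, alt_score = ref_alt_scores(window, "C", pwm)
```

Here the alternate allele disrupts the best match, so the alternate score is lower than the reference score; the difference between the two is the per-variant quantity that downstream analyses would aggregate.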

11.6.4. Language Association Analysis

Core language ability was assessed using the F1 measure as described above in the factor analysis. We computed burden scores for each transcription factor motif by combining variant effects weighted by genotype status of reversion sites. Linear regression models were employed to estimate the associations between individual context-specific (aggregate) motif scores and language ability.

11.6.5. Transcription Factor Motif Score Selection Analysis

For each transcription factor, the (Z-scaled) difference in reference and alternate allele motif scores was modeled as a linear function of variant sequence context (HAQER, HAR, or random) and a binary reversion status indicator. Separate reversion effects were estimated for HAR, HAQER, and random region variants as a sequence context by reversion interaction term. Estimated beta coefficients from these terms, as well as their standard errors, were extracted for use in downstream analyses.

11.6.6. Joint Selection-Language Enrichment Analysis

We used York regression analysis[114] to examine relationships between selective pressure for motif integrity and language-related effects. This approach accounts for uncertainty in both variables (i.e., language association betas and selection effect betas). Prior to the analysis, the sign of the selection betas was flipped such that positive values indicate human-divergent selection for increased motif scores (i.e., reversions to the human-chimp ancestral allele tend to decrease motif scores). York regression betas, their standard errors, a Chi-squared goodness of fit statistic and its p-value were extracted to interpret the significance of the overall relationship between TF motif integrity (i.e., motif score) and motif score effect on individual differences in language ability, for each sequence context. Individual TF motifs of interest are those with nominal significance (p < 0.05) for both selection for motif integrity and for positive association of aggregate motif integrity with higher F1 core language scores. Transcription factors were classified into families using InterPro annotations[115] of representative families found to be significantly associated with a reversion effect in either direction (using www.string-db.org)[116]. To identify transcription factor families showing convergent patterns of selection and language association, we performed 2×2 Fisher’s exact tests comparing the proportion of motifs with concordant effects (positive selection AND positive language association vs. all other combinations) across families. Odds ratios with 95% confidence intervals and p-values were computed for each TF family.
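The family-level enrichment test can be sketched as a 2×2 Fisher's exact test, with rows for motifs inside versus outside a family and columns for concordant versus non-concordant effects. The counts below are hypothetical. The sketch reports the simple sample odds ratio and omits the confidence interval, both of which are simplifications; statistical packages often report a conditional maximum-likelihood odds ratio instead.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value and sample odds ratio for [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)

# Hypothetical counts: [in family: concordant, other], [not in family: concordant, other].
odds_ratio, p_value = fisher_exact_2x2(3, 1, 1, 3)
```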

11.7. Ancient DNA

11.7.1. Neanderthal and Denisovan DNA data and ES-PGS

VCFs with genotypes for the three high coverage Neanderthals (Altai[109], Chagyrskaya[110], and Vindija[111]) and one Denisovan[112] produced from high-coverage whole genome sequencing were downloaded from the Max Planck Institute for Evolutionary Anthropology website. The individual genomes were merged using bcftools. We then subset the VCFs to SNPs used in our EpiSLI dataset merged with 1000 Genomes Europeans for analysis (N = 7,593,137 overlapping loci), limiting spurious genotypes and allowing us to more directly compare the samples. We then computed ES-PGS in the merged archaic human, 1000 Genomes European, and EpiSLI dataset using the same SNPs as we did in the primary EpiSLI HAQER ES-PGS analysis. This left us with 1,335 SNPs called in both the archaic samples and modern samples out of the 1,350 total HAQER independent SNPs used in the EpiSLI discovery sample. We imputed missing genotypes in all samples using the mean genotype value for the missing SNP so they would have minimal impact on the final ES-PGS. Given the difficulty of accounting for population stratification with the archaic humans, we used the raw polygenic score presented in Figures 5b-c to provide more conservative estimates of archaic human cognitive abilities. Notably, when we do account for the genetic PCs more traditionally, based on the first 5 PCs computed using the 1000 Genomes Europeans, the archaic human polygenic scores become even more extreme (higher HAQER CP-PGS and lower background CP-PGS, Supplementary Table 12). Additionally, to ensure this result was not simply an artifact of some systematic missingness in these ancient samples, we computed HAQER CP-PGS using only SNPs called in all 4 of the archaic humans and came to the same conclusion that archaic human species appear to carry more HAQER CP-PGS than modern Europeans (N = 558 SNPs, Supplementary Table 12).
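The mean-imputation step can be sketched as follows: a missing genotype is replaced by the mean dosage of that SNP across samples, so that the imputed SNP contributes an average amount to the score. The dosages and SNP weights below are hypothetical, and the score is a plain weighted sum that stands in for the clumping and thresholding workflow used in the study.

```python
def mean_impute(genotypes):
    """Replace missing dosages (None) at each SNP with that SNP's mean dosage."""
    n_snps = len(genotypes[0])
    imputed = [list(row) for row in genotypes]
    for j in range(n_snps):
        observed = [row[j] for row in genotypes if row[j] is not None]
        snp_mean = sum(observed) / len(observed)
        for row in imputed:
            if row[j] is None:
                row[j] = snp_mean
    return imputed

def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages."""
    return sum(d * w for d, w in zip(dosages, weights))

# Hypothetical dosages (rows are samples, columns are SNPs) and SNP weights.
genotypes = [[0, 1, 2], [2, None, 1], [1, 1, None]]
weights = [0.1, -0.2, 0.3]

imputed = mean_impute(genotypes)
scores = [polygenic_score(row, weights) for row in imputed]
```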

11.7.2. Ancient Homo sapiens data

DNA data and sample age information for ancient Homo sapiens came from the Allen Ancient DNA Resource (AADR) version 54. We downloaded the publicly available EIGENSTRAT formatted files and converted them to PLINK format using the EIGENSOFT tool. We then merged the ancient genomes with our EpiSLI and 1000 Genomes Europeans dataset to ensure we were using comparable SNPs for our ES-PGS selection analysis as we did in our discovery sample. We identified ancient west Eurasians using the same criteria as a recent large-scale selection analysis[117]. Briefly, we filtered to samples found between longitude 25W and 60E and latitude 35N to 80N, samples passing quality control with an assessment labeled as “PASS”, and sample ages > 0 but < 20,000 years old.
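The sample selection criteria above amount to a simple filter on sample metadata. In the sketch below the field names and the two example records are hypothetical (they are not the AADR column names), western longitudes are encoded as negative values, and the geographic bounds are treated as inclusive, which is an assumption.

```python
def is_ancient_west_eurasian(sample):
    """Apply the geographic, quality, and age criteria described in the text."""
    return (-25.0 <= sample["longitude"] <= 60.0      # 25W to 60E
            and 35.0 <= sample["latitude"] <= 80.0    # 35N to 80N
            and sample["assessment"] == "PASS"
            and 0 < sample["age_years"] < 20000)

# Hypothetical metadata records.
samples = [
    {"id": "s1", "longitude": 10.5, "latitude": 48.0, "assessment": "PASS", "age_years": 4500},
    {"id": "s2", "longitude": 120.0, "latitude": 40.0, "assessment": "PASS", "age_years": 3000},
    {"id": "s3", "longitude": 5.0, "latitude": 50.0, "assessment": "QUESTIONABLE", "age_years": 800},
    {"id": "s4", "longitude": 30.0, "latitude": 55.0, "assessment": "PASS", "age_years": 0},
]
selected = [s["id"] for s in samples if is_ancient_west_eurasian(s)]
```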

We then computed ES-PGS for CP in HAQERs using the same methodology as we did in the EpiSLI, SPARK, and ABCD samples for use in our polygenic selection analysis. Given the challenges of accounting for population structure in ancient DNA, we opted to use an LMM-based approach instead of a traditional PC-based approach, as recommended by Akbari et al., 2024[117]. With the ancient west Eurasian subsample, we then identified independent SNPs using 1000 Genomes as the LD reference with PLINK’s “--indep-pairwise” function (window size = 1000bp, step size = 1bp, r2 threshold = 0.05, MAF ≥ 5%)[118]. Next, we identified samples and SNPs with low missingness for GRM calculation (samples missing < 50% of independent SNPs, and SNPs missing in < 10% of those samples). Finally, we computed the genetic relatedness matrix (GRM) with GCTA[119] with the QC passing samples and SNPs and removed duplicate/twin samples for subsequent analysis (GCTA “--grm-cutoff” 0.9).

We then used the 3,244 QC passing samples and the GRM based on the 12,146 QC passing SNPs for the ES-PGS analysis. We implemented the LMM-based polygenic selection analysis with the gaston[120] R package (lmm.aireml function), allowing us to account for the GRM, which reflects population structure and relatedness of the sample. We used log10(sample age) as the outcome variable and HAQER CP-PGS and background CP-PGS as the independent variables (similar to the ES-PGS analysis we used in our other samples).
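One compact way to write the model described above is given here, assuming the standard single variance-component form of a linear mixed model. The symbols are illustrative notation and are not taken from the source: K denotes the GRM, τ the genetic variance component, and σ² the residual variance.

```latex
\log_{10}(\mathrm{age}_i)
  = \beta_0
  + \beta_1\,\mathrm{PGS}^{\mathrm{HAQER}}_i
  + \beta_2\,\mathrm{PGS}^{\mathrm{background}}_i
  + g_i + \varepsilon_i,
\qquad
\mathbf{g} \sim \mathcal{N}(\mathbf{0},\, \tau K),
\quad
\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0},\, \sigma^2 I)
```

Under this formulation, a nonzero β₁ would indicate that the HAQER score changes systematically with sample age after accounting for relatedness and population structure through K.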

11.8. Detecting balancing selection

To detect signatures of balancing selection, we analyzed the WGS data we generated in EpiSLI. First, we subset to regions of interest (HAQERs, HARs, and RAND sequences). Then we computed a site frequency spectrum (SFS) for each sequence class, calculating the proportion of variants in each minor allele frequency bin. We compared the HAQER SFS to the SFS of both HARs and RAND sequences to identify whether HAQERs had a relative enrichment of intermediate frequency variants, which can indicate balancing selection (or ongoing selection).
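A binned SFS of the kind described above can be sketched as follows. The number of bins, the function name, and the toy allele frequencies are assumptions made for illustration, since the bin widths used in the analysis are not stated here.

```python
def site_frequency_spectrum(allele_freqs, n_bins=5):
    """Proportion of variants falling in each minor allele frequency bin.

    Frequencies are folded to the minor allele, so bins span (0, 0.5].
    Monomorphic sites (frequency 0 or 1) are skipped.
    """
    counts = [0] * n_bins
    for f in allele_freqs:
        maf = min(f, 1.0 - f)
        if maf <= 0.0:
            continue
        # A MAF of exactly 0.5 is placed in the last bin.
        idx = min(int(maf / 0.5 * n_bins), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy allele frequencies for one hypothetical sequence class.
toy_freqs = [0.02, 0.05, 0.95, 0.25, 0.45, 0.55, 0.0]
toy_sfs = site_frequency_spectrum(toy_freqs)
```

Comparing the proportions in the higher-frequency bins between sequence classes is what reveals a relative excess of intermediate-frequency variants.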

Next, to more formally test our balancing selection hypothesis, we identified common independent SNPs in these regions using PLINK’s “--indep-pairwise” function (window size = 200bp, step size = 50bp, r2 threshold = 0.5, MAF ≥ 5%)[118]. This identified common independent SNPs in HAQERs (2,641 SNPs), HARs (1,411 SNPs), and RAND sequences (1,670 SNPs). We then computed individual level F-statistics for each class of variation using the “--het” function in PLINK[118], which derives F-statistics based on the expected number of homozygotes and observed homozygotes (with more negative values indicating there are more heterozygotes than expected). We compared F-statistics between classes using t-tests to determine if there was excess heterozygosity in HAQERs, a signature of balancing selection. Additionally, we correlated the sample level F-statistics with core language (F1) and nonverbal IQ (F3) scores to determine whether the excess heterozygosity is beneficial to language.
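The logic of the individual-level F-statistic can be illustrated with a simplified sketch: F = (O − E) / (N − E), where O is the observed number of homozygous genotypes, E is the number expected under Hardy-Weinberg equilibrium, and N is the number of called genotypes. The version below uses the plain expectation 1 − 2p(1 − p) per SNP and omits any finite-sample correction that PLINK may apply, so it is an approximation and not a reimplementation. The function name and toy data are hypothetical.

```python
def inbreeding_f(dosages, allele_freqs):
    """Simplified per-individual F from observed vs. expected homozygosity.

    dosages: genotype dosages (0, 1, 2, or None if missing) for one sample.
    allele_freqs: frequency of the counted allele at each SNP.
    Assumes at least one polymorphic SNP so the denominator is nonzero.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_called = 0
    for g, p in zip(dosages, allele_freqs):
        if g is None:
            continue
        n_called += 1
        if g in (0, 2):
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    return (observed_hom - expected_hom) / (n_called - expected_hom)


# Toy example: 4 SNPs, each with allele frequency 0.5.
toy_af = [0.5, 0.5, 0.5, 0.5]
f_all_het = inbreeding_f([1, 1, 1, 1], toy_af)   # excess heterozygosity
f_all_hom = inbreeding_f([0, 2, 0, 2], toy_af)   # excess homozygosity
f_mixed = inbreeding_f([0, 1, 2, 1], toy_af)     # matches expectation
```

In this toy case the fully heterozygous sample yields a negative F and the fully homozygous sample a positive F, matching the interpretation that more negative values reflect more heterozygotes than expected.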

11.9. ES-PGS analysis in ABCD

To explore the effects of HAQER CP-PGS in prenatal development, we analyzed the ABCD cohort, a large longitudinal study of adolescent development with genetic, brain imaging, and developmental phenotypes[121]. All phenotypes used in ABCD came from the v4.0 data release. Similar to the SPARK replication analysis, we utilized imputed SNP array data that we have previously described[107] to compute ES-PGS using the same workflow as in our EpiSLI and SPARK samples. We computed genetic relatedness using GCTA[119] in the merged ABCD and SPARK dataset. Using the relatedness matrix, we identified unrelated individuals for ES-PGS analysis with brain imaging phenotypes (genetic relatedness < 0.05). We used the most recent intracranial volume value provided by ABCD for association with our CP ES-PGS. Covariates for the intracranial volume analysis included age, sex, height, and weight. For the postnatal brain growth phenotype, we identified each individual’s initial structural brain MRI as well as their most recent one and computed difference scores between the intracranial volumes, limiting our analysis to unrelated individuals who had multiple scans and a brain growth score > 0. We adjusted the brain growth score for total volume, age, sex, and difference in age between MRI scans.

For our within-family ES-PGS analysis, we used the relatedness matrix to identify full siblings in the dataset (0.375 < relatedness < 0.675) who were not dizygotic twins (different ages). Using the dataset of full siblings (N > 500 sibling pairs), we computed difference scores for birth related phenotypes (birth weight and birth via c-section) as well as for CP ES-PGS (HAQERs and background). In the sibling birth weight difference ES-PGS analysis, we included difference in age, sex, and gestation duration as covariates.
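The sibling-pair selection and difference-score construction can be sketched as follows. The sketch assumes full siblings are pairs with estimated relatedness strictly between 0.375 and 0.675, and that "different ages" is checked by direct comparison of recorded ages. The data structures, identifiers, and values are hypothetical.

```python
def sibling_difference_scores(relatedness, ages, birth_weight, pgs,
                              low=0.375, high=0.675):
    """Build within-pair difference scores for non-twin full siblings.

    relatedness: dict mapping (id_a, id_b) -> estimated genetic relatedness.
    ages, birth_weight, pgs: dicts mapping individual id -> value.
    Returns one record per retained pair, with differences taken as a minus b.
    """
    records = []
    for (a, b), r in relatedness.items():
        if not (low < r < high):
            continue              # not in the full-sibling range
        if ages[a] == ages[b]:
            continue              # same age: treated as twins and excluded
        records.append({
            "pair": (a, b),
            "age_diff": ages[a] - ages[b],
            "birth_weight_diff": birth_weight[a] - birth_weight[b],
            "pgs_diff": pgs[a] - pgs[b],
        })
    return records


# Toy data (birth weight in grams; scores are arbitrary).
toy_relatedness = {("A", "B"): 0.50, ("C", "D"): 0.51, ("E", "F"): 0.02, ("G", "H"): 0.98}
toy_ages = {"A": 10, "B": 12, "C": 11, "D": 11, "E": 9, "F": 10, "G": 12, "H": 12}
toy_bw = {"A": 3400, "B": 3100, "C": 2900, "D": 3000, "E": 3300, "F": 3500, "G": 3200, "H": 3250}
toy_pgs = {"A": 1.5, "B": 0.5, "C": 0.0, "D": 0.25, "E": -1.0, "F": 1.0, "G": 0.75, "H": 0.75}

toy_pairs = sibling_difference_scores(toy_relatedness, toy_ages, toy_bw, toy_pgs)
```

Working with within-pair differences removes factors shared by siblings, such as family environment and ancestry, from the association between the score and the birth phenotype.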

11.10. HAQER enrichment of scQTLs

To determine if HAQERs, HARs, and RAND sequences significantly influence gene expression in prenatal and postnatal brains, we leveraged scQTLs from two studies: (1) a study of stem cell derived neurons, meant to mimic early prenatal cells in the midbrain[122], and (2) a study of nearly 400 postmortem adult brains from psychENCODE2[123]. For scQTL study (1), we defined scQTLs as all variants with a p-value < 5 × 10⁻⁴; for scQTL study (2), we utilized all scQTLs defined as significant by the authors. To determine enrichment for scQTLs in our regions of interest, we used the “intervalOverlap” function from gonomics[124]. This allowed us to compute expected versus actual overlaps (based on size of the human genome and the provided annotations), providing an enrichment p-value for each region and scQTL cell-type combination.
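The idea of comparing expected and observed overlaps can be illustrated with a generic binomial approximation. This is an assumption made for illustration and is not necessarily the test that gonomics implements: each variant is treated as falling inside the regions of interest with probability equal to the fraction of the genome those regions cover. All names and numbers below are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space.

    Requires 0 < p < 1.
    """
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def overlap_enrichment(n_variants, n_overlapping, region_bp, genome_bp):
    """Return (expected overlaps, fold enrichment, one-sided p-value)."""
    p = region_bp / genome_bp
    expected = n_variants * p
    fold = n_overlapping / expected
    return expected, fold, binomial_upper_tail(n_overlapping, n_variants, p)


# Toy numbers: 1,000 variants, 8 of which fall in regions covering 0.1% of the genome.
toy_expected, toy_fold, toy_pvalue = overlap_enrichment(
    n_variants=1000, n_overlapping=8, region_bp=3_000_000, genome_bp=3_000_000_000
)
```

The direct summation over the tail is adequate for a small example but would be slow for millions of variants, where a dedicated statistics library would be preferable.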

11.11. Convergent evolution of vocal learning analysis

To test for evidence of convergent evolution of “HAQER-like” sequences in vocal learning species, we utilized a large dataset of > 400 species with whole genome data aligned to the human reference genome (hg38)[40]. We parsed the alignments with Biopython[125] to subset to regions of interest. For each species and sequence type (HAQERs or HARs), we computed a sequence similarity (using the number of bases matching the human reference genome in these regions). We then used the “HAQER-like” sequence similarity to predict vocal learning status in 170 non-primate species that have been previously described[39]. To determine statistical significance and account for species relatedness we used phylogenetic logistic regression[126] with the phylolm package[127] in R. Additionally, we included “HAR-like” sequence similarity as a covariate to demonstrate the effect was specific to “HAQER-like” sequences.
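The per-species similarity measure can be sketched on a pair of aligned sequences. The sketch assumes gap characters are written as "-", that only alignment columns containing a human base are counted, and that the match count is normalized to a fraction. The normalization is an assumption, since the text above describes the measure only as the number of matching bases. The function name and toy sequences are hypothetical.

```python
def sequence_similarity(human_seq, other_seq):
    """Fraction of human reference bases matched by another species.

    Both inputs are rows of the same alignment and must have equal length.
    Columns where the human row is a gap are ignored. A gap in the other
    species at a human base counts as a mismatch.
    """
    if len(human_seq) != len(other_seq):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for h, o in zip(human_seq.upper(), other_seq.upper()):
        if h == "-":
            continue
        compared += 1
        if h == o:
            matches += 1
    return matches / compared if compared else 0.0


# Toy alignment rows for one hypothetical region.
toy_human = "ACGT-ACGTA"
toy_other = "ACGA-AC-TA"
toy_similarity = sequence_similarity(toy_human, toy_other)
```

Computed per species and per sequence class, values of this kind would then serve as the predictors in the phylogenetic logistic regression.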

11.12. Statistical analysis

All statistical analysis was done in R (version 4.3.1)[62].

11.13. Approach to multiple testing correction and triangulating evidence

To address multiple testing concerns, we employed False Discovery Rate (FDR) correction for analyses testing more than 20 hypotheses, allowing us to limit type 1 errors (false positives). This was applied to the EpiSLI CBCL mental health score analysis and genome-wide PGS analysis where we conducted many statistical tests (Supplementary Tables 2–3).
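For illustration, one widely used FDR procedure is the Benjamini-Hochberg step-up adjustment, sketched below. The text above does not name the specific FDR method, so the choice of Benjamini-Hochberg here is an assumption, and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values from four hypothetical tests.
toy_p = [0.01, 0.04, 0.03, 0.20]
toy_adjusted = benjamini_hochberg(toy_p)
```

Tests whose adjusted value falls below the chosen FDR level (for example 0.05) would be reported as significant.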

Rather than relying solely on stringent p-value thresholds, we adopted a comprehensive approach to validate our key findings through multiple lines of evidence. This strategy was particularly important given that some of our individual statistical tests showed modest statistical significance (0.01 < p-values < 0.05). We established confidence in our results through: (1) replication across independent cohorts (e.g., EpiSLI and SPARK); (2) identifying convergent evidence across different analytical approaches (e.g., common variant, rare variant, and transcription factor binding analyses); (3) demonstration of specificity, showing HAQERs’ associations with language but not nonverbal IQ; (4) evolutionary support from ancient DNA and cross-species analyses; and (5) mechanistic support through analysis of transcription factor binding and enrichment of variants influencing prenatal gene regulation. This multi-faceted approach allowed us to distinguish robust biological signals from statistical noise, even when some individual analyses showed moderate statistical significance. Our findings’ consistency across diverse data types, species, and analytical methods provides stronger evidence than would be achieved through any single statistical test, regardless of its p-value.

Supplementary Material

Supplement 1

Figure 1: Overview of this study

12. Acknowledgments

We are grateful for the contributions of the EpiSLI cohort, their families, and the research team. We also appreciate the effort of participants, families, and research team members in ABCD, SPARK, SPARK Research Match, and Tempus. We appreciate obtaining access to genetic and phenotypic data for SPARK on SFARI Base. We are incredibly appreciative of the open access datasets used in this study, including the researchers who put together the AADR dataset and the Zoonomia consortium. We also greatly appreciate Dr. Kristi Hendrickson and Dr. Susan Shen for valuable and encouraging discussions about this project. This work was funded by NIH grant DC014489. Additional funding came from the National Institutes of Health through a Predoctoral Training Grant (T32GM008629) to LGC.

Footnotes

14. Code Availability

We used publicly available tools for data processing and analysis:

bcftools: https://samtools.github.io/bcftools/bcftools.html

PLINK: https://www.cog-genomics.org/plink

GCTA: https://yanglab.westlake.edu.cn/software/gcta

LDpred2: https://privefl.github.io/bigsnpr/articles/LDpred2.html

PRSet: https://choishingwan.github.io/PRSice/quick_start_prset/

bedtools: https://bedtools.readthedocs.io

Biopython: https://biopython.org/

gonomics: https://github.com/vertgenlab/gonomics

R: https://www.r-project.org/

Custom code for ES-PGS and all analyses in this paper is available here, including example code for ES-PGS that should be adaptable to other research problems: https://github.com/lucasgcasten/language_evolution

13. Data Availability

The EpiSLI whole genome sequencing data described here are available to qualified researchers via dbGaP (study accession = phs002255.v1.p1): https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002255.v1.p1

SPARK genetic and phenotype data is available to qualified researchers at SFARI base: https://base.sfari.org/

ABCD is available to qualified researchers at: https://nda.nih.gov/abcd/request-access

1000 Genomes Phase 3 data is available at: https://www.internationalgenome.org/data/

Allen Ancient DNA Resource: https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadab

Neanderthal and Denisovan genomes: https://www.eva.mpg.de/genetics/genome-projects/

Cross-species sequence alignment data: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/

References

  • [1].Lai C. S. L., Fisher S. E., Hurst J. A., Vargha-Khadem F., and Monaco A. P., “A forkhead-domain gene is mutated in a severe speech and language disorder,” Nature, vol. 413, pp. 519–523, 10 2001. [DOI] [PubMed] [Google Scholar]
  • [2].Vernes S. C., Newbury D. F., Abrahams B. S., Winchester L., Nicod J., Groszer M., Alarcón M., Oliver P. L., Davies K. E., Geschwind D. H., Monaco A. P., and Fisher S. E., “A functional genetic link between distinct developmental language disorders,” New England Journal of Medicine, vol. 359, pp. 2337–2345, 11 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Tajima Y., Vargas C. D. M., Ito K., Wang W., Luo J.-D., Xing J., Kuru N., Machado L. C., Siepel A., Carroll T. S., Jarvis E. D., and Darnell R. B., “A humanized nova1 splicing factor alters mouse vocal communications,” Nat. Commun., vol. 16, Feb. 2025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [4].Pollard K. S., Salama S. R., King B., Kern A. D., Dreszer T., Katzman S., Siepel A., Pedersen J. S., Bejerano G., Baertsch R., Rosenbloom K. R., Kent J., and Haussler D., “Forces shaping the fastest evolving regions in the human genome.,” PLoS genetics, vol. 2, p. e168, 10 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Mangan R. J., Alsina F. C., Mosti F., Sotelo-Fonseca J. E., Snellings D. A., Au E. H., Carvalho J., Sathyan L., Johnson G. D., Reddy T. E., Silver D. L., and Lowe C. B., “Adaptive sequence divergence forged new neurodevelopmental enhancers in humans.,” Cell, vol. 185, pp. 4587–4603.e23, 11 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Pagel M., “Q&A: What is human language, when did it evolve and why should we care?,” BMC Biol., vol. 15, Dec. 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Crow T. J., “Is schizophrenia the price that homo sapiens pays for language?,” Schizophr. Res., vol. 28, pp. 127–141, Dec. 1997. [DOI] [PubMed] [Google Scholar]
  • [8].Suntsova M. V. and Buzdin A. A., “Differences between human and chimpanzee genomes and their implications in gene expression, protein functions and biochemical properties of the two species.,” BMC genomics, vol. 21, p. 535, 9 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Britten R. J., “Divergence between samples of chimpanzee and human dna sequences is 5%, counting indels.,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, pp. 13633–5, 10 2002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Varki A. and Altheide T. K., “Comparing the human and chimpanzee genomes: searching for needles in a haystack.,” Genome research, vol. 15, pp. 1746–58, 12 2005. [DOI] [PubMed] [Google Scholar]
  • [11].Mueller K. L., Murray J. C., Michaelson J. J., Christiansen M. H., Reilly S., and Tomblin J. B., “Common genetic variants in foxp2 are not associated with individual differences in language development,” PLOS ONE, vol. 11, p. e0152576, 4 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Atkinson E. G., Audesse A. J., Palacios J. A., Bobo D. M., Webb A. E., Ramachandran S., and Henn B. M., “No evidence for recent selection at foxp2 among diverse human populations,” Cell, vol. 174, pp. 1424–1435.e15, Sept. 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Eising E., Mirza-Schreiber N., de Zeeuw E. L., Wang C. A., Truong D. T., Allegrini A. G., Shapland C. Y., Zhu G., Wigg K. G., Gerritse M. L., Molz B., Alagöz G., Gialluisi A., Abbondanza F., Rimfeld K., van Donkelaar M., Liao Z., Jansen P. R., Andlauer T. F. M., Bates T. C., Bernard M., Blokland K., Bonte M., Børglum A. D., Bourgeron T., Brandeis D., Ceroni F., Csépe V., Dale P. S., de Jong P. F., DeFries J. C., Démonet J.-F., Demontis D., Feng Y., Gordon S. D., Guger S. L., Hayiou-Thomas M. E., Hernández-Cabrera J. A., Hottenga J.-J., Hulme C., Kere J., Kerr E. N., Koomar T., Landerl K., Leonard G. T., Lovett M. W., Lyytinen H., Martin N. G., Martinelli A., Maurer U., Michaelson J. J., Moll K., Monaco A. P., Morgan A. T., Nöthen M. M., Pausova Z., Pennell C. E., Pennington B. F., Price K. M., Rajagopal V. M., Ramus F., Richer L., Simpson N. H., Smith S. D., Snowling M. J., Stein J., Strug L. J., Talcott J. B., Tiemeier H., van der Schroeff M. P., Verhoef E., Watkins K. E., Wilkinson M., Wright M. J., Barr C. L., Boomsma D. I., Carreiras M., Franken M.-C. J., Gruen J. R., Luciano M., Müller-Myhsok B., Newbury D. F., Olson R. K., Paracchini S., Paus T., Plomin R., Reilly S., Schulte-Körne G., Tomblin J. B., van Bergen E., Whitehouse A. J. O., Willcutt E. G., Pourcain B. S., Francks C., and Fisher S. E., “Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people,” Proceedings of the National Academy of Sciences, vol. 119, 8 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Verhoef E., Allegrini A. G., Jansen P. R., Lange K., Wang C. A., Morgan A. T., Ahluwalia T. S., Symeonides C., Andreassen O. A., Bartels M., Boomsma D., Dale P. S., Ehli E., Fernandez-Orth D., Guxens M., Hakulinen C., Harris K. M., Haworth S., de Hoyos L., Jaddoe V., Keltikangas-Järvinen L., Lehtimäki T., Middeldorp C., Min J. L., Mishra P. P., Njølstad P. R., Sunyer J., Tate A. E., Timpson N., van der Laan C., Vrijheid M., Vuoksimaa E., Whipp A., Ystrom E., A. Consortium, B. I. S. investigator group, Eising E., Franken M.-C., Hypponen E., Mansell T., Olislagers M., Omerovic E., Rimfeld K., Schlag F., Selzam S., Shapland C. Y., Tiemeier H., Whitehouse A. J., Saffery R., Bønnelykke K., Reilly S., Pennell C. E., Wake M., Cecil C. A., Plomin R., Fisher S. E., and Pourcain B. S., “Genome-wide analyses of vocabulary size in infancy and toddlerhood: Associations with attention-deficit/hyperactivity disorder, literacy, and cognition-related traits,” Biological Psychiatry, vol. 95, pp. 859–869, 5 2024. [DOI] [PubMed] [Google Scholar]
  • [15].Rajagopal V. M., Ganna A., Coleman J. R. I., Allegrini A., Voloudakis G., Grove J., Als T. D., Horsdal H. T., Petersen L., Appadurai V., Schork A., Buil A., Bulik C. M., Bybjerg-Grauholm J., Bækvad-Hansen M., Hougaard D. M., Mors O., Nordentoft M., Werge T., Belliveau R., Carey C. E., Cerrato F., Chambert K., Churchhouse C., Daly M. J., Dumont A., Goldstein J., Hansen C. S., Howrigan D. P., Huang H., Maller J., Martin A. R., Martin J., Mattheisen M., Moran J., Neale B. M., Pallesen J., Palmer D. S., Pedersen C. B., Pedersen M. G., Poterba T., Ripke S., Satterstrom F. K., Thompson W. K., Turley P., Walters R. K., Mortensen P. B., Breen G., Roussos P., Plomin R., Agerbo E., Børglum A. D., and Demontis D., “Genome-wide association study of school grades identifies genetic overlap between language ability, psychopathology and creativity,” Scientific Reports, vol. 13, p. 429, 1 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Niarchou M., Gustavson D. E., Sathirapongsasuti J. F., Anglada-Tort M., Eising E., Bell E., McArthur E., Straub P., Aslibekyan S., Auton A., Bell R. K., Bryc K., Clark S. K., Elson S. L., Fletez-Brant K., Fontanillas P., Furlotte N. A., Gandhi P. M., Heilbron K., Hicks B., Huber K. E., Jewett E. M., Jiang Y., Kleinman A., Lin K.-H., Litterman N. K., McCreight J. C., McIntyre M. H., McManus K. F., Mountain J. L., Mozaffari S. V., Nandakumar P., Noblin E. S., Northover C. A. M., O’Connell J., Pitts S. J., Poznik G. D., Shastri A. J., Shelton J. F., Shringarpure S., Tian C., Tung J. Y., Tunney R. J., Vacic V., Wang X., McAuley J. D., Capra J. A., Ullén F., Creanza N., Mosing M. A., Hinds D. A., Davis L. K., Jacoby N., and Gordon R. L., “Genome-wide association study of musical beat synchronization demonstrates high polygenicity,” Nature Human Behaviour, vol. 6, pp. 1292–1309, 6 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Doust C., Fontanillas P., Eising E., Gordon S. D., Wang Z., Alagöz G., Molz B., 23andMe Research Team, Q. T. W. G. of the GenLang Consortium, Pourcain B. S., Francks C., Marioni R. E., Zhao J., Paracchini S., Talcott J. B., Monaco A. P., Stein J. F., Gruen J. R., Olson R. K., Willcutt E. G., DeFries J. C., Pennington B. F., Smith S. D., Wright M. J., Martin N. G., Auton A., Bates T. C., Fisher S. E., and Luciano M., “Discovery of 42 genome-wide significant loci associated with dyslexia.,” Nature genetics, vol. 54, pp. 1621–1629, 11 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [18].Feliciano P., Daniels A. M., Snyder L. G., Beaumont A., Camba A., Esler A., Gulsrud A. G., Mason A., Gutierrez A., Nicholson A., Paolicelli A. M., McKenzie A. P., Rachubinski A. L., Stephens A. N., Simon A. R., Stedman A., Shocklee A. D., Swanson A., Finucane B., Hilscher B. A., Hauf B., O’Roak B. J., McKenna B., Robertson B. E., Rodriguez B., Vernoia B. M., Metre B. V., Bradley C., Cohen C., Erickson C. A., Harkins C., Hayes C., Lord C., Martin C. L., Ortiz C., Ochoa-Lubinoff C., Peura C., Rice C. E., Rosenberg C. R., Smith C. J., Thomas C., Taylor C. M., White L. C., Walston C. H., Amaral D. G., Coury D. L., Sarver D. E., Istephanous D., Li D., Nugyen D. C., Fox E. A., Butter E. M., Berry-Kravis E., Courchesne E., Fombonne E. J., Hofammann E., Lamarche E., Wodka E. L., Matthews E. T., O’Connor E., Palen E., Miller F., Dichter G. S., Marzano G., Stein G., Hutter H., Kaplan H. E., Li H., Lechniak H., Schneider H. L., Zaydens H., Arriaga I., Gerdts J. A., Cubells J. F., Cordova J. M., Gunderson J., Lillard J., Manoharan J., McCracken J. T., Michaelson J. J., Neely J., Orobio J., Pandey J., Piven J., Scherr J., Sutcliffe J. S., Tjernagel J., Wallace J., Callahan K., Dent K., Schweers K. A., Hamer K. E., Law J. K., Lowe K., O’Brien K., Smith K., Pawlowski K. G., Pierce K. L., Roeder K., Abbeduto L. J., Berry L. N., Cartner L. A., Coppola L. A., Carpenter L., Cordeiro L., DeMarco L., Grosvenor L. P., Higgins L., Huang-Storms L. Y., Hosmer-Quint L., Herbert L. M., Kasparson L., Prock L. M., Pacheco L. D., Raymond L., Simon L., Soorya L. V., Wasserburg L., Lazar M., Alessandri M., Brown M., Currin M. H., Gwynette M. F., Heyman M., Hale M. N., Jones M., Jordy M., Morrier M. J., Sahin M., Siegel M. S., Verdi M., Parlade M. V., Yinger M., Bardett N., Hanna N., Harris N., Pottschmidt N., Russo-Ponsaran N., Takahashi N., Ousley O. Y., Juarez A. P., Manning P., Annett R. D., Bernier R. A., Clark R. D., Landa R. J., Goin-Kochel R. P., Remington R., Schultz R. 
T., Brewster S. J., Booker S., Carpenter S., Eldred S., Francis S., Friedman S. L., Horner S., Hepburn S., Jacob S., Kanne S., Lee S. J., Mastel S. A., Plate S., Qiu S., Sandhu S., Thompson S., White S., Myers V. J., Singh V., Yang W. S., Warren Z., Amatya A., Ace A. J., Chatha A. S., Lash A. E., Negron B., Rigby C., Ridenour C., Stock C. M., Schmidt D., Fisk I., Acampado J., Nestle J. L., Nestle J. A., Layman K., Butler M. E., Kent M., Mallardi M. D., Carriero N., Lawson N., Volfovsky N., Edgar R., Marini R., Rana R., Ganesan S., Shah S., Ramsey T., Chin W., Jensen W., Krentz A. D., Gruber A. J., Sabo A., Salomatov A., Eng C., Muzny D., Astrovskaya I., Gibbs R. A., Han X., Shen Y., Reichardt L. F., and Chung W. K., “Spark: A us cohort of 50,000 families to accelerate autism research,” Neuron, vol. 97, pp. 488–493, 2 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Lisdahl K. M., Sher K. J., Conway K. P., Gonzalez R., Ewing S. W. F., Nixon S. J., Tapert S., Bartsch H., Goldstein R. Z., and Heitzeg M., “Adolescent brain cognitive development (abcd) study: Overview of substance use assessment methods,” Developmental cognitive neuroscience, vol. 32, pp. 80–96, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Mallick S., Micco A., Mah M., Ringbauer H., Lazaridis I., Olalde I., Patterson N., and Reich D., “The allen ancient DNA resource (AADR) a curated compendium of ancient human genomes,” Sci. Data, vol. 11, p. 182, Feb. 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Kuderna L. F. K., Ulirsch J. C., Rashid S., Ameen M., Sundaram L., Hickey G., Cox A. J., Gao H., Kumar A., Aguet F., Christmas M. J., Clawson H., Haeussler M., Janiak M. C., Kuhlwilm M., Orkin J. D., Bataillon T., Manu S., Valenzuela A., Bergman J., Rouselle M., Silva F. E., Agueda L., Blanc J., Gut M., de Vries D., Goodhead I., Harris R. A., Raveendran M., Jensen A., Chuma I. S., Horvath J. E., Hvilsom C., Juan D., Frandsen P., Schraiber J. G., de Melo F. R., Bertuol F., Byrne H., Sampaio I., Farias I., Valsecchi J., Messias M., da Silva M. N. F., Trivedi M., Rossi R., Hrbek T., Andriaholinirina N., Rabarivola C. J., Zaramody A., Jolly C. J., Phillips-Conroy J., Wilkerson G., Abee C., Simmons J. H., Fernandez-Duque E., Kanthaswamy S., Shiferaw F., Wu D., Zhou L., Shao Y., Zhang G., Keyyu J. D., Knauf S., Le M. D.,Lizano E., Merker S., Navarro A., Nadler T., Khor C. C., Lee J., Tan P., Lim W. K., Kitchener A. C., Zinner D., Gut I., Melin A. D., Guschanski K., Schierup M. H., Beck R. M. D., Karakikes I., Wang K. C., Umapathy G., Roos C., Boubli J. P., Siepel A., Kundaje A., Paten B., Lindblad-Toh K., Rogers J., Bonet T. M., and Farh K. K.-H., “Identification of constrained sequence elements across 239 primate genomes.,” Nature, vol. 625, pp. 735–742, 1 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Tomblin J. B., “The episli database: a publicly available database on speech and language,” Lang. Speech Hear. Serv. Sch., vol. 41, pp. 108–117, Jan. 2010. [DOI] [PubMed] [Google Scholar]
  • [23].Klem M., Melby-Lervåg M., Hagtvet B., Lyster S. A., Gustafsson J. E., and Hulme C., “Sentence repetition is a measure of children’s language skills rather than working memory limitations,” Dev Sci, vol. 18, pp. 146–154, Jan 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Rujas I., Mariscal S., Murillo E., and Lázaro M., “Sentence repetition tasks to detect and prevent language difficulties: A scoping review,” Children (Basel), vol. 8, p. 578, July 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Lee S. H., DeCandia T. R., Ripke S., Yang J., Sullivan P. F., Goddard M. E., Keller M. C., Visscher P. M., and Wray N. R., “Estimating the proportion of variation in susceptibility to schizophrenia captured by common snps,” Nature Genetics, vol. 44, pp. 247–250, 3 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [26].Finucane H. K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y., Loh P.-R., Anttila V., Xu H., Zang C., Farh K., Ripke S., Day F. R., Purcell S., Stahl E., Lindstrom S., Perry J. R. B., Okada Y., Raychaudhuri S., Daly M. J., Patterson N., Neale B. M., and Price A. L., “Partitioning heritability by functional annotation using genome-wide association summary statistics,” Nature Genetics, vol. 47, pp. 1228–1235, 11 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Wei X., Robles C. R., Pazokitoroudi A., Ganna A., Gusev A., Durvasula A., Gazal S., Loh P.-R., Reich D., and Sankararaman S., “The lingering effects of neanderthal introgression on human complex traits,” Elife, vol. 12, Mar. 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].Choi S. W., García-González J., Ruan Y., Wu H. M., Porras C., Johnson J., B. D. W. group of the Psychiatric Genomics Consortium, Hoggart C. J., and O’Reilly P. F., “Prset: Pathway-based polygenic risk score analyses and software.,” PLoS genetics, vol. 19, p. e1010624, 2 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].Lee J. J., Wedow R., Okbay A., Kong E., Maghzian O., Zacher M., Nguyen-Viet T. A., Bowers P., Sidorenko J., Linnér R. K., et al. , “Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals,” Nature genetics, vol. 50, pp. 1112–1121, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Bi X., Zhou L., Zhang J.-J., Feng S., Hu M., Cooper D. N., Lin J., Li J., Wu D.-D., and Zhang G., “Lineage-specific accelerated sequences underlying primate evolution.,” Science advances, vol. 9, p. eadc9507, 6 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [31].Gokhman D., Nissim-Rafinia M., Agranat-Tamir L., Housman G., García-Pérez R., Lizano E., Cheronet O., Mallick S., Nieves-Colón M. A., Li H., Alpaslan-Roodenberg S., Novak M., Gu H., Osinski J. M., Ferrando-Bernal M., Gelabert P., Lipende I., Mjungu D., Kondova I., Bontrop R., Kullmer O., Weber G., Shahar T., Dvir-Ginzberg M., Faerman M., Quillen E. E., Meissner A., Lahav Y., Kandel L., Liebergall M., Prada M. E., Vidal J. M., Gronostajski R. M., Stone A. C., Yakir B., Lalueza-Fox C., Pinhasi R., Reich D., Marques-Bonet T., Meshorer E., and Carmel L., “Differential DNA methylation of vocal and facial anatomy genes in modern humans,” Nat. Commun., vol. 11, p. 1189, Mar. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [32].Capra J. A., Erwin G. D., McKinsey G., Rubenstein J. L. R., and Pollard K. S., “Many human accelerated regions are developmental enhancers.,” Philosophical transactions of the Royal Society of London. Series B, Biological sciences, vol. 368, p. 20130025, 12 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [33].Green R. E., Krause J., Briggs A. W., Maricic T., Stenzel U., Kircher M., Patterson N., Li H., Zhai W., Fritz M. H.-Y., Hansen N. F., Durand E. Y., Malaspinas A.-S., Jensen J. D., Marques-Bonet T., Alkan C., Prüfer K., Meyer M., Burbano H. A., Good J. M., Schultz R., Aximu-Petri A., Butthof A., Höber B., Höffner B., Siegemund M., Weihmann A., Nusbaum C., Lander E. S., Russ C., Novod N., Affourtit J., Egholm M., Verna C., Rudan P., Brajkovic D., Kucan Željko, Gušic I., Doronichev V. B., Golovanova L. V., Lalueza-Fox C., de la Rasilla M., Fortea J., Rosas A., Schmitz R. W., Johnson P. L. F., Eichler E. E., Falush D., Birney E., Mullikin J. C., Slatkin M., Nielsen R., Kelso J., Lachmann M., Reich D., and Pääbo S., “A draft sequence of the neandertal genome.,” Science (New York, N.Y.), vol. 328, pp. 710–722, 5 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [34].Rinker D. C., Simonti C. N., McArthur E., Shaw D., Hodges E., and Capra J. A., “Neanderthal introgression reintroduced functional ancestral alleles lost in eurasian populations,” Nat. Ecol. Evol., vol. 4, pp. 1332–1341, Oct. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [35].Field Y., Boyle E. A., Telis N., Gao Z., Gaulton K. J., Golan D., Yengo L., Rocheleau G., Froguel P., McCarthy M. I., and Pritchard J. K., “Detection of human adaptation during the past 2000 years.,” Science (New York, N.Y.), vol. 354, pp. 760–764, 11 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [36].Tager-Flusberg H. and Kasari C., “Minimally verbal school-aged children with autism spectrum disorder: the neglected end of the spectrum,” Autism Res., vol. 6, pp. 468–478, Dec. 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [37].Vogindroukas I., Stankova M., Chelas E.-N., and Proedrou A., “Language and speech characteristics in autism,” Neuropsychiatr. Dis. Treat., vol. 18, pp. 2367–2377, Oct. 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [38].Casten L. G., Koomar T., Elsadany M., McKone C., Tysseling B., Sasidharan M., Tomblin J. B., and Michaelson J. J., “Lingo: an automated, web-based deep phenotyping platform for language ability.,” medRxiv : the preprint server for health sciences, 3 2024. [Google Scholar]
  • [39].Wirthlin M. E., Schmid T. A., Elie J. E., Zhang X., Kowalczyk A., Redlich R., Shvareva V. A., Rakuljic A., Ji M. B., Bhat N. S., Kaplow I. M., Schäffer D. E., Lawler A. J., Wang A. Z., Phan B. N., Annaldasula S., Brown A. R., Lu T., Lim B. K., Azim E., Zoonomia Consortium, Clark N. L., Meyer W. K., Pond S. L. K., Chikina M., Yartsev M. M., Pfenning A. R., Andrews G., Armstrong J. C., Bianchi M., Birren B. W., Bredemeyer K. R., Breit A. M., Christmas M. J., Clawson H., Damas J., Di Palma F., Diekhans M., Dong M. X., Eizirik E., Fan K., Fanter C., Foley N. M., Forsberg-Nilsson K., Garcia C. J., Gatesy J., Gazal S., Genereux D. P., Goodman L., Grimshaw J., Halsey M. K., Harris A. J., Hickey G., Hiller M., Hindle A. G., Hubley R. M., Hughes G. M., Johnson J., Juan D., Kaplow I. M., Karlsson E. K., Keough K. C., Kirilenko B., Koepfli K.-P., Korstian J. M., Kowalczyk A., Kozyrev S. V., Lawler A. J., Lawless C., Lehmann T., Levesque D. L., Lewin H. A., Li X., Lind A., Lindblad-Toh K., Mackay-Smith A., Marinescu V. D., Marques-Bonet T., Mason V. C., Meadows J. R. S., Meyer W. K., Moore J. E., Moreira L. R., Moreno-Santillan D. D., Morrill K. M., Muntané G., Murphy W. J., Navarro A., Nweeia M., Ortmann S., Osmanski A., Paten B., Paulat N. S., Pfenning A. R., Phan B. N., Pollard K. S., Pratt H. E., Ray D. A., Reilly S. K., Rosen J. R., Ruf I., Ryan L., Ryder O. A., Sabeti P. C., Schäffer D. E., Serres A., Shapiro B., Smit A. F. A., Springer M., Srinivasan C., Steiner C., Storer J. M., Sullivan K. A. M., Sullivan P. F., Sundström E., Supple M. A., Swofford R., Talbot J.-E., Teeling E., Turner-Maier J., Valenzuela A., Wagner F., Wallerman O., Wang C., Wang J., Weng Z., Wilder A. P., Wirthlin M. E., Xue J. R., and Zhang X., “Vocal learning-associated convergent evolution in mammalian proteins and regulatory elements,” Science, vol. 383, p. eabn3263, Mar. 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [40].Kuderna L. F. K., Gao H., Janiak M. C., Kuhlwilm M., Orkin J. D., Bataillon T., Manu S., Valenzuela A., Bergman J., Rousselle M., Silva F. E., Agueda L., Blanc J., Gut M., de Vries D., Goodhead I., Harris R. A., Raveendran M., Jensen A., Chuma I. S., Horvath J. E., Hvilsom C., Juan D., Frandsen P., Schraiber J. G., de Melo F. R., Bertuol F., Byrne H., Sampaio I., Farias I., Valsecchi J., Messias M., da Silva M. N. F., Trivedi M., Rossi R., Hrbek T., Andriaholinirina N., Rabarivola C. J., Zaramody A., Jolly C. J., Phillips-Conroy J., Wilkerson G., Abee C., Simmons J. H., Fernandez-Duque E., Kanthaswamy S., Shiferaw F., Wu D., Zhou L., Shao Y., Zhang G., Keyyu J. D., Knauf S., Le M. D., Lizano E., Merker S., Navarro A., Nadler T., Khor C. C., Lee J., Tan P., Lim W. K., Kitchener A. C., Zinner D., Gut I., Melin A. D., Guschanski K., Schierup M. H., Beck R. M. D., Umapathy G., Roos C., Boubli J. P., Rogers J., Farh K. K.-H., and Marques Bonet T., “A global catalog of whole-genome diversity from 233 primate species,” Science, vol. 380, pp. 906–913, June 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [41].Fisher S. E. and Scharff C., “FOXP2 as a molecular window into speech and language,” Trends in Genetics, vol. 25, pp. 166–177, Apr. 2009. [DOI] [PubMed] [Google Scholar]
  • [42].Alagöz G., Eising E., Mekki Y., Bignardi G., Fontanillas P., 23andMe Research Team, Nivard M. G., Luciano M., Cox N. J., Fisher S. E., and Gordon R. L., “The shared genetic architecture and evolution of human language and musical rhythm,” Nat. Hum. Behav., Nov. 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [43].Fisher S. E., “Evolution of language: Lessons from the genome,” Psychon. Bull. Rev., vol. 24, pp. 34–40, Feb. 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [44].Fitch W. T., “Evolutionary developmental biology and human language evolution: Constraints on adaptation,” Evol. Biol., vol. 39, pp. 613–637, Dec. 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [45].Polišenská K., Chiat S., and Roy P., “Sentence repetition: what does the task measure?,” Int. J. Lang. Commun. Disord., vol. 50, pp. 106–118, Jan. 2015. [DOI] [PubMed] [Google Scholar]
  • [46].Devescovi A. and Caselli M. C., “Sentence repetition as a measure of early grammatical development in Italian,” Int. J. Lang. Commun. Disord., vol. 42, pp. 187–208, Mar. 2007. [DOI] [PubMed] [Google Scholar]
  • [47].Bolhuis J. J., Tattersall I., Chomsky N., and Berwick R. C., “How could language have evolved?,” PLoS Biol., vol. 12, p. e1001934, Aug. 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [48].Madigan S., Wade M., Plamondon A., Browne D., and Jenkins J. M., “Birth weight variability and language development: Risk, resilience, and responsive parenting,” J. Pediatr. Psychol., vol. 40, pp. 869–877, Oct. 2015. [DOI] [PubMed] [Google Scholar]
  • [49].Mariani B., Nicoletti G., Barzon G., Ortiz Barajas M. C., Shukla M., Guevara R., Suweis S. S., and Gervain J., “Prenatal experience with language shapes the brain,” Sci. Adv., vol. 9, p. eadj3524, Nov. 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [50].Meyer M., Arsuaga J.-L., de Filippo C., Nagel S., Aximu-Petri A., Nickel B., Martínez I., Gracia A., Bermúdez de Castro J. M., Carbonell E., Viola B., Kelso J., Prüfer K., and Pääbo S., “Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins,” Nature, vol. 531, pp. 504–507, Mar. 2016. [DOI] [PubMed] [Google Scholar]
  • [51].Pankratov V., Mezzavilla M., Aneli S., Kuznetsov I. A., Fusco D., Wilson J. F., Metspalu M., Provero P., Pagani L., and Marnetto D., “Ancestral genetic components are consistently associated with the complex trait landscape in European biobanks,” Eur. J. Hum. Genet., vol. 32, pp. 1492–1499, Nov. 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [52].Yamamoto K., Namba S., Sonehara K., Suzuki K., Sakaue S., Cooke N. P., Higashiue S., Kobayashi S., Afuso H., Matsuura K., Mitsumoto Y., Fujita Y., Tokuda T., Biobank Japan Project, Matsuda K., Gakuhari T., Yamauchi T., Kadowaki T., Nakagome S., and Okada Y., “Genetic legacy of ancient hunter-gatherer Jomon in Japanese populations,” Nat. Commun., vol. 15, p. 9780, Nov. 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [53].Bolognini D., Halgren A., Lou R. N., Raveane A., Rocha J. L., Guarracino A., Soranzo N., Chin C.-S., Garrison E., and Sudmant P. H., “Recurrent evolution and selection shape structural diversity at the amylase locus,” Nature, vol. 634, pp. 617–625, Oct. 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [54].Tomblin J. B., Records N. L., and Zhang X., “A system for the diagnosis of specific language impairment in kindergarten children,” Journal of Speech, Language, and Hearing Research, vol. 39, pp. 1284–1294, Dec. 1996. [DOI] [PubMed] [Google Scholar]
  • [55].Tomblin J. B. and Nippold M. A., eds., Understanding Individual Differences in Language Development Across the School Years. Psychology Press, Mar. 2014. [Google Scholar]
  • [56].Newcomer P. L. and Hammill D. D., Test of Language Development – 2 Primary. Pro-Ed Austin, TX, 1988. [Google Scholar]
  • [57].Wechsler D., “Wechsler preschool and primary scale of intelligence–revised,” Dec. 1989. Title of the publication associated with this dataset: PsycTESTS Dataset. [Google Scholar]
  • [58].Culatta B., Page J. L., and Ellis J., “Story retelling as a communicative performance screening tool,” Lang. Speech Hear. Serv. Sch., vol. 14, pp. 66–74, Apr. 1983. [Google Scholar]
  • [59].Woodcock R. W., “Woodcock reading mastery tests-revised,” Nov. 1987. Title of the publication associated with this dataset: PsycTESTS Dataset. [Google Scholar]
  • [60].Catts H. W., “The relationship between speech-language impairments and reading disabilities,” J. Speech Lang. Hear. Res., vol. 36, pp. 948–958, Oct. 1993. [DOI] [PubMed] [Google Scholar]
  • [61].Achenbach T. M., Child Behavior Checklist, pp. 546–552. Springer New York, 2011. [Google Scholar]
  • [62].R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2020. [Google Scholar]
  • [63].Harrell F. E. Jr, with contributions from Charles Dupont, and many others., Hmisc: Harrell Miscellaneous, 2020. R package version 4.4–0. [Google Scholar]
  • [64].Mayer M., missRanger: Fast Imputation of Missing Values, 2019. R package version 2.1.0. [Google Scholar]
  • [65].Andrews S., Krueger F., Segonds-Pichon A., Biggins L., Krueger C., and Wingett S., “FastQC.” Babraham Institute, Jan. 2012. [Google Scholar]
  • [66].Chapman B., Kirchner R., Pantano L., Khotiainsteva T., Smet M. D., Beltrame L., Saveliev V., Guimera R. V., Naumenko S., Kern J., Brueffer C., Carrasco G., Giovacchini M., Sytchev I., Tang P., Ahdesmaki M., Kanwal S., Porter J. J., Möller S., Le V., Coman A., Svensson V., bogdang989, Mistry M., Edwards M., Hammerbacher J., Pedersen B., Cock P., apastore, and Turner S., “bcbio/bcbio-nextgen: v1.1.6,” Dec. 2019. [Google Scholar]
  • [67].Li H., “Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM,” arXiv, Mar. 2013. [Google Scholar]
  • [68].McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., and DePristo M. A., “The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data,” Genome Research, vol. 20, pp. 1297–1303, Sept. 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [69].Garrison E. and Marth G., “Haplotype-based variant detection from short-read sequencing,” arXiv, July 2012. [Google Scholar]
  • [70].Rimmer A., Phan H., Mathieson I., Iqbal Z., Twigg S. R. F., Wilkie A. O. M., McVean G., and Lunter G., “Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications,” Nature Genetics, vol. 46, pp. 912–918, Aug. 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [71].“A global reference for human genetic variation,” Nature, vol. 526, pp. 68–74, Sept. 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [72].McLaren W., Gil L., Hunt S. E., Riat H. S., Ritchie G. R. S., Thormann A., Flicek P., and Cunningham F., “The Ensembl Variant Effect Predictor,” Genome Biology, vol. 17, June 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [73].Karczewski K. J., Francioli L. C., Tiao G., Tolonen C., Wade G., Talkowski M. E., Neale B. M., Daly M. J., MacArthur D. G., et al., “The mutational constraint spectrum quantified from variation in 141,456 humans,” Jan. 2019. [Google Scholar]
  • [74].Smigielski E. M., “dbSNP: a database of single nucleotide polymorphisms,” Nucleic Acids Research, vol. 28, pp. 352–355, Jan. 2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [75].Pedersen B. S., Layer R. M., and Quinlan A. R., “Vcfanno: fast, flexible annotation of genetic variants,” Genome Biology, vol. 17, June 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [76].Choi S. W., Mak T. S.-H., and O’Reilly P. F., “Tutorial: a guide to performing polygenic risk score analyses,” Nature Protocols, vol. 15, pp. 2759–2772, Sept. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [77].Privé F., Albiñana C., Arbel J., Pasaniuc B., and Vilhjálmsson B. J., “Inferring disease architecture and predictive ability with LDpred2-auto,” The American Journal of Human Genetics, vol. 110, pp. 2042–2055, Dec. 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [78].Demontis D., Walters R. K., Martin J., Mattheisen M., Als T. D., Agerbo E., Baldursson G., Belliveau R., Bybjerg-Grauholm J., Bækvad-Hansen M., et al., “Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder,” Nature Genetics, vol. 51, no. 1, pp. 63–75, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [79].Hatoum A. S., Colbert S. M. C., Johnson E. C., Huggett S. B., Deak J. D., Pathak G., Jennings M. V., Paul S. E., Karcher N. R., Hansen I., Baranger D. A. A., Edwards A., Grotzinger A., Substance Use Disorder Working Group of the Psychiatric Genomics Consortium, Tucker-Drob E. M., Kranzler H. R., Davis L. K., Sanchez-Roige S., Polimanti R., Gelernter J., Edenberg H. J., Bogdan R., and Agrawal A., “Multivariate genome-wide association meta-analysis of over 1 million subjects identifies loci underlying multiple substance use disorders,” Nat. Ment. Health, vol. 1, pp. 210–223, Mar. 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [80].Walters R. K., Polimanti R., Johnson E. C., McClintick J. N., Adams M. J., Adkins A. E., Aliev F., Bacanu S.-A., Batzler A., Bertelsen S., Biernacka J. M., Bigdeli T. B., Chen L.-S., Clarke T.-K., Chou Y.-L., Degenhardt F., Docherty A. R., Edwards A. C., Fontanillas P., Foo J. C., Fox L., Frank J., Giegling I., Gordon S., Hack L. M., Hartmann A. M., Hartz S. M., Heilmann-Heimbach S., Herms S., Hodgkinson C., Hoffmann P., Hottenga J. J., Kennedy M. A., Alanne-Kinnunen M., Konte B., Lahti J., Lahti-Pulkkinen M., Lai D., Ligthart L., Loukola A., Maher B. S., Mbarek H., McIntosh A. M., McQueen M. B., Meyers J. L., Milaneschi Y., Palviainen T., Pearson J. F., Peterson R. E., Ripatti S., Ryu E., Saccone N. L., Salvatore J. E., Sanchez-Roige S., Schwandt M., Sherva R., Streit F., Strohmaier J., Thomas N., Wang J.-C., Webb B. T., Wedow R., Wetherill L., Wills A. G., Boardman J. D., Chen D., Choi D.-S., Copeland W. E., Culverhouse R. C., Dahmen N., Degenhardt L., Domingue B. W., Elson S. L., Frye M. A., Gäbel W., Hayward C., Ising M., Keyes M., Kiefer F., Kramer J., Kuperman S., Lucae S., Lynskey M. T., Maier W., Mann K., Männistö S., Müller-Myhsok B., Murray A. D., Nurnberger J. I., Palotie A., Preuss U., Räikkönen K., Reynolds M. D., Ridinger M., Scherbaum N., Schuckit M. A., Soyka M., Treutlein J., Witt S., Wodarz N., Zill P., Adkins D. E., Boden J. M., Boomsma D. I., Bierut L. J., Brown S. A., Bucholz K. K., Cichon S., Costello E. J., de Wit H., Diazgranados N., Dick D. M., Eriksson J. G., Farrer L. A., Foroud T. M., Gillespie N. A., Goate A. M., Goldman D., Grucza R. A., Hancock D. B., Harris K. M., Heath A. C., Hesselbrock V., Hewitt J. K., Hopfer C. J., Horwood J., Iacono W., Johnson E. O., Kaprio J. A., Karpyak V. M., Kendler K. S., Kranzler H. R., Krauter K., Lichtenstein P., Lind P. A., McGue M., MacKillop J., Madden P. A. F., Maes H. H., Magnusson P., Martin N. G., Medland S. E., Montgomery G. W., Nelson E. C., Nöthen M. M., Palmer A. A., Pedersen N. 
L., Penninx B. W. J. H., Porjesz B., Rice J. P., Rietschel M., Riley B. P., Rose R., Rujescu D., Shen P.-H., Silberg J., Stallings M. C., Tarter R. E., Vanyukov M. M., Vrieze S., Wall T. L., Whitfield J. B., Zhao H., Neale B. M., Gelernter J., Edenberg H. J., and Agrawal A., “Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders,” Nature Neuroscience, vol. 21, pp. 1656–1669, Dec. 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [81].Grove J., Ripke S., Als T. D., Mattheisen M., Walters R. K., Won H., Pallesen J., Agerbo E., Andreassen O. A., Anney R., et al., “Identification of common genetic risk variants for autism spectrum disorder,” Nature Genetics, vol. 51, no. 3, pp. 431–444, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [82].Watson H. J., Yilmaz Z., Thornton L. M., Hübel C., Coleman J. R. I., Gaspar H. A., Bryois J., Hinney A., Leppä V. M., Mattheisen M., et al., “Genome-wide association study identifies eight risk loci and implicates metabo-psychiatric origins for anorexia nervosa,” Nature Genetics, vol. 51, pp. 1207–1214, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [83].Mullins N., Forstner A. J., O’Connell K. S., Coombes B., Coleman J. R. I., Qiao Z., Als T. D., Bigdeli T. B., Børte S., Bryois J., Charney A. W., Drange O. K., Gandal M. J., Hagenaars S. P., Ikeda M., Kamitaki N., Kim M., Krebs K., Panagiotaropoulou G., Schilder B. M., Sloofman L. G., Steinberg S., Trubetskoy V., Winsvold B. S., Won H.-H., Abramova L., Adorjan K., Agerbo E., Eissa M. A., Albani D., Alliey-Rodriguez N., Anjorin A., Antilla V., Antoniou A., Awasthi S., Baek J. H., Bækvad-Hansen M., Bass N., Bauer M., Beins E. C., Bergen S. E., Birner A., Pedersen C. B., Bøen E., Boks M. P., Bosch R., Brum M., Brumpton B. M., Brunkhorst-Kanaan N., Budde M., Bybjerg-Grauholm J., Byerley W., Cairns M., Casas M., Cervantes P., Clarke T.-K., Cruceanu C., Cuellar-Barboza A., Cunningham J., Curtis D., Czerski P. M., Dale A. M., Dalkner N., David F. S., Degenhardt F., Djurovic S., Dobbyn A. L., Douzenis A., Elvsåshagen T., Escott-Price V., Ferrier I. N., Fiorentino A., Foroud T. M., Forty L., Frank J., Frei O., Freimer N. B., Frisén L., Gade K., Garnham J., Gelernter J., Pedersen M. G., Gizer I. R., Gordon S. D., Gordon-Smith K., Greenwood T. A., Grove J., Guzman-Parra J., Ha K., Haraldsson M., Hautzinger M., Heilbronner U., Hellgren D., Herms S., Hoffmann P., Holmans P. A., Huckins L., Jamain S., Johnson J. S., Kalman J. L., Kamatani Y., Kennedy J. L., Kittel-Schneider S., Knowles J. A., Kogevinas M., Koromina M., Kranz T. M., Kranzler H. R., Kubo M., Kupka R., Kushner S. A., Lavebratt C., Lawrence J., Leber M., Lee H.-J., Lee P. H., Levy S. E., Lewis C., Liao C., Lucae S., Lundberg M., MacIntyre D. J., Magnusson S. H., Maier W., Maihofer A., Malaspina D., Maratou E., Martinsson L., Mattheisen M., McCarroll S. A., McGregor N. W., McGuffin P., McKay J. D., Medeiros H., Medland S. E., Millischer V., Montgomery G. W., Moran J. L., Morris D. W., Mühleisen T. W., O’Brien N., O’Donovan C., Loohuis L. M. O., Oruc L., Papiol S., Pardiñas A. 
F., Perry A., Pfennig A., Porichi E., Potash J. B., Quested D., Raj T., Rapaport M. H., DePaulo J. R., Regeer E. J., Rice J. P., Rivas F., Rivera M., Roth J., Roussos P., Ruderfer D. M., Sánchez-Mora C., Schulte E. C., Senner F., Sharp S., Shilling P. D., Sigurdsson E., Sirignano L., Slaney C., Smeland O. B., Smith D. J., Sobell J. L., Hansen C. S., Artigas M. S., Spijker A. T., Stein D. J., Strauss J. S., Świątkowska B., Terao C., Thorgeirsson T. E., Toma C., Tooney P., Tsermpini E.-E., Vawter M. P., Vedder H., Walters J. T. R., Witt S. H., Xi S., Xu W., Yang J. M. K., Young A. H., Young H., Zandi P. P., Zhou H., Zillich L., Adolfsson R., Agartz I., Alda M., Alfredsson L., Babadjanova G., Backlund L., Baune B. T., Bellivier F., Bengesser S., Berrettini W. H., Blackwood D. H. R., Boehnke M., Børglum A. D., Breen G., Carr V. J., Catts S., Corvin A., Craddock N., Dannlowski U., Dikeos D., Esko T., Etain B., Ferentinos P., Frye M., Fullerton J. M., Gawlik M., Gershon E. S., Goes F. S., Green M. J., Grigoroiu-Serbanescu M., Hauser J., Henskens F., Hillert J., Hong K. S., Hougaard D. M., Hultman C. M., Hveem K., Iwata N., Jablensky A. V., Jones I., Jones L. A., Kahn R. S., Kelsoe J. R., Kirov G., Landén M., Leboyer M., Lewis C. M., Li Q. S., Lissowska J., Lochner C., Loughland C., Martin N. G., Mathews C. A., Mayoral F., McElroy S. L., McIntosh A. M., McMahon F. J., Melle I., Michie P., Milani L., Mitchell P. B., Morken G., Mors O., Mortensen P. B., Mowry B., Müller-Myhsok B., Myers R. M., Neale B. M., Nievergelt C. M., Nordentoft M., Nöthen M. M., O’Donovan M. C., Oedegaard K. J., Olsson T., Owen M. J., Paciga S. A., Pantelis C., Pato C., Pato M. T., Patrinos G. P., Perlis R. H., Posthuma D., Ramos-Quiroga J. A., Reif A., Reininghaus E. Z., Ribasés M., Rietschel M., Ripke S., Rouleau G. A., Saito T., Schall U., Schalling M., Schofield P. R., Schulze T. G., Scott L. J., Scott R. J., Serretti A., Weickert C. S., Smoller J. 
W., Stefansson H., Stefansson K., Stordal E., Streit F., Sullivan P. F., Turecki G., Vaaler A. E., Vieta E., Vincent J. B., Waldman I. D., Weickert T. W., Werge T., Wray N. R., Zwart J.-A., Biernacka J. M., Nurnberger J. I., Cichon S., Edenberg H. J., Stahl E. A., McQuillin A., Florio A. D., Ophoff R. A., and Andreassen O. A., “Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology,” Nature Genetics, vol. 53, pp. 817–829, June 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [84].Howard D. M., Adams M. J., Clarke T.-K., Hafferty J. D., Gibson J., Shirali M., Coleman J. R. I., Hagenaars S. P., Ward J., Wigmore E. M., et al., “Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions,” Nature Neuroscience, vol. 22, pp. 343–352, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [85].Watanabe K., Jansen P. R., Savage J. E., Nandakumar P., Wang X., Agee M., Aslibekyan S., Auton A., Bell R. K., Bryc K., Clark S. K., Elson S. L., Fletez-Brant K., Fontanillas P., Furlotte N. A., Gandhi P. M., Heilbron K., Hicks B., Huber K. E., Jewett E. M., Jiang Y., Kleinman A., Lin K.-H., Litterman N. K., McCreight J. C., McIntyre M. H., McManus K. F., Mountain J. L., Mozaffari S. V., Noblin E. S., Northover C. A. M., O’Connell J., Pitts S. J., Poznik G. D., Sathirapongsasuti J. F., Shelton J. F., Shi J., Shringarpure S., Tian C., Tung J. Y., Tunney R. J., Vacic V., Wang W., Hinds D. A., Gelernter J., Levey D. F., Polimanti R., Stein M. B., Someren E. J. W. V., Smit A. B., and Posthuma D., “Genome-wide meta-analysis of insomnia prioritizes genes associated with metabolic and psychiatric pathways,” Nature Genetics, vol. 54, pp. 1125–1132, 8 2022. [DOI] [PubMed] [Google Scholar]
  • [86].Huang Q. Q., Wigdor E. M., Malawsky D. S., Campbell P., Samocha K. E., Chundru V. K., Danecek P., Lindsay S., Marchant T., Koko M., Amanat S., Bonfanti D., Sheridan E., Radford E. J., Barrett J. C., Wright C. F., Firth H. V., Warrier V., Strudwick Young A., Hurles M. E., and Martin H. C., “Examining the role of common variants in rare neurodevelopmental conditions,” Nature, vol. 636, pp. 404–411, Dec. 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [87].Nievergelt C. M., Maihofer A. X., Klengel T., Atkinson E. G., Chen C.-Y., Choi K. W., Coleman J. R. I., Dalvie S., Duncan L. E., Gelernter J., Levey D. F., Logue M. W., Polimanti R., Provost A. C., Ratanatharathorn A., Stein M. B., Torres K., Aiello A. E., Almli L. M., Amstadter A. B., Andersen S. B., Andreassen O. A., Arbisi P. A., Ashley-Koch A. E., Austin S. B., Avdibegovic E., Babić D., Bækvad-Hansen M., Baker D. G., Beckham J. C., Bierut L. J., Bisson J. I., Boks M. P., Bolger E. A., Børglum A. D., Bradley B., Brashear M., Breen G., Bryant R. A., Bustamante A. C., Bybjerg-Grauholm J., Calabrese J. R., de Almeida J. M. C., Dale A. M., Daly M. J., Daskalakis N. P., Deckert J., Delahanty D. L., Dennis M. F., Disner S. G., Domschke K., Dzubur-Kulenovic A., Erbes C. R., Evans A., Farrer L. A., Feeny N. C., Flory J. D., Forbes D., Franz C. E., Galea S., Garrett M. E., Gelaye B., Geuze E., Gillespie C., Uka A. G., Gordon S. D., Guffanti G., Hammamieh R., Harnal S., Hauser M. A., Heath A. C., Hemmings S. M. J., Hougaard D. M., Jakovljevic M., Jett M., Johnson E. O., Jones I., Jovanovic T., Qin X.-J., Junglen A. G., Karstoft K.-I., Kaufman M. L., Kessler R. C., Khan A., Kimbrel N. A., King A. P., Koen N., Kranzler H. R., Kremen W. S., Lawford B. R., Lebois L. A. M., Lewis C. E., Linnstaedt S. D., Lori A., Lugonja B., Luykx J. J., Lyons M. J., Maples-Keller J., Marmar C., Martin A. R., Martin N. G., Maurer D., Mavissakalian M. R., McFarlane A., McGlinchey R. E., McLaughlin K. A., McLean S. A., McLeay S., Mehta D., Milberg W. P., Miller M. W., Morey R. A., Morris C. P., Mors O., Mortensen P. B., Neale B. M., Nelson E. C., Nordentoft M., Norman S. B., O’Donnell M., Orcutt H. K., Panizzon M. S., Peters E. S., Peterson A. L., Peverill M., Pietrzak R. H., Polusny M. A., Rice J. P., Ripke S., Risbrough V. B., Roberts A. L., Rothbaum A. O., Rothbaum B. O., Roy-Byrne P., Ruggiero K., Rung A., Rutten B. P. F., Saccone N. L., Sanchez S. 
E., Schijven D., Seedat S., Seligowski A. V., Seng J. S., Sheerin C. M., Silove D., Smith A. K., Smoller J. W., Sponheim S. R., Stein D. J., Stevens J. S., Sumner J. A., Teicher M. H., Thompson W. K., Trapido E., Uddin M., Ursano R. J., van den Heuvel L. L., Hooff M. V., Vermetten E., Vinkers C. H., Voisey J., Wang Y., Wang Z., Werge T., Williams M. A., Williamson D. E., Winternitz S., Wolf C., Wolf E. J., Wolff J. D., Yehuda R., Young R. M., Young K. A., Zhao H., Zoellner L. A., Liberzon I., Ressler K. J., Haas M., and Koenen K. C., “International meta-analysis of PTSD genome-wide association studies identifies sex- and ancestry-specific genetic risk loci,” Nature Communications, vol. 10, p. 4558, Oct. 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [88].Trubetskoy V., Pardiñas A. F., Qi T., Panagiotaropoulou G., Awasthi S., Bigdeli T. B., Bryois J., Chen C.-Y., Dennison C. A., Hall L. S., Lam M., Watanabe K., Frei O., Ge T., Harwood J. C., Koopmans F., Magnusson S., Richards A. L., Sidorenko J., Wu Y., Zeng J., Grove J., Kim M., Li Z., Voloudakis G., Zhang W., Adams M., Agartz I., Atkinson E. G., Agerbo E., Eissa M. A., Albus M., Alexander M., Alizadeh B. Z., Alptekin K., Als T. D., Amin F., Arolt V., Arrojo M., Athanasiu L., Azevedo M. H., Bacanu S. A., Bass N. J., Begemann M., Belliveau R. A., Bene J., Benyamin B., Bergen S. E., Blasi G., Bobes J., Bonassi S., Braun A., Bressan R. A., Bromet E. J., Bruggeman R., Buckley P. F., Buckner R. L., Bybjerg-Grauholm J., Cahn W., Cairns M. J., Calkins M. E., Carr V. J., Castle D., Catts S. V., Chambert K. D., Chan R. C. K., Chaumette B., Cheng W., Cheung E. F. C., Chong S. A., Cohen D., Consoli A., Cordeiro Q., Costas J., Curtis C., Davidson M., Davis K. L., de Haan L., Degenhardt F., DeLisi L. E., Demontis D., Dickerson F., Dikeos D., Dinan T., Djurovic S., Duan J., Ducci G., Dudbridge F., Eriksson J. G., Fañanás L., Faraone S. V., Fiorentino A., Forstner A., Frank J., Freimer N. B., Fromer M., Frustaci A., Gadelha A., Genovese G., Gershon E. S., Giannitelli M., Giegling I., Giusti-Rodríguez P., Godard S., Goldstein J. I., Peñas J. G., González-Pinto A., Gopal S., Gratten J., Green M. F., Greenwood T. A., Guillin O., Gülöksüz S., Gur R. E., Gur R. C., Gutiérrez B., Hahn E., Hakonarson H., Haroutunian V., Hartmann A. M., Harvey C., Hayward C., Henskens F. A., Herms S., Hoffmann P., Howrigan D. P., Ikeda M., Iyegbe C., Joa I., Julià A., Kähler A. K., Kam-Thong T., Kamatani Y., Karachanak-Yankova S., Kebir O., Keller M. C., Kelly B. J., Khrunin A., Kim S.-W., Klovins J., Kondratiev N., Konte B., Kraft J., Kubo M., Kučinskas V., Kučinskiene Z. A., Kusumawardhani A., Kuzelova-Ptackova H., Landi S., Lazzeroni L. C., Lee P. H., Legge S. E., Lehrer D. 
S., Lencer R., Lerer B., Li M., Lieberman J., Light G. A., Limborska S., Liu C.-M., Lönnqvist J., Loughland C. M., Lubinski J., Luykx J. J., Lynham A., Macek M., Mackinnon A., Magnusson P. K. E., Maher B. S., Maier W., Malaspina D., Mallet J., Marder S. R., Marsal S., Martin A. R., Martorell L., Mattheisen M., McCarley R. W., McDonald C., McGrath J. J., Medeiros H., Meier S., Melegh B., Melle I., Mesholam-Gately R. I., Metspalu A., Michie P. T., Milani L., Milanova V., Mitjans M., Molden E., Molina E., Molto M. D., Mondelli V., Moreno C., Morley C. P., Muntané G., Murphy K. C., Myin-Germeys I., Nenadić I., Nestadt G., Nikitina-Zake L., Noto C., Nuechterlein K. H., O’Brien N. L., O’Neill F. A., Oh S.-Y., Olincy A., Ota V. K., Pantelis C., Papadimitriou G. N., Parellada M., Paunio T., Pellegrino R., Periyasamy S., Perkins D. O., Pfuhlmann B., Pietiläinen O., Pimm J., Porteous D., Powell J., Quattrone D., Quested D., Radant A. D., Rampino A., Rapaport M. H., Rautanen A., Reichenberg A., Roe C., Roffman J. L., Roth J., Rothermundt M., Rutten B. P. F., Saker-Delye S., Salomaa V., Sanjuan J., Santoro M. L., Savitz A., Schall U., Scott R. J., Seidman L. J., Sharp S. I., Shi J., Siever L. J., Sigurdsson E., Sim K., Skarabis N., Slominsky P., So H.-C., Sobell J. L., Söderman E., Stain H. J., Steen N. E., Steixner-Kumar A. A., Stögmann E., Stone W. S., Straub R. E., Streit F., Strengman E., Stroup T. S., Subramaniam M., Sugar C. A., Suvisaari J., Svrakic D. M., Swerdlow N. R., Szatkiewicz J. P., Ta T. M. T., Takahashi A., Terao C., Thibaut F., Toncheva D., Tooney P. A., Torretta S., Tosato S., Tura G. B., Turetsky B. I., Üçok A., Vaaler A., van Amelsvoort T., van Winkel R., Veijola J., Waddington J., Walter H., Waterreus A., Webb B. T., Weiser M., Williams N. M., Witt S. H., Wormley B. K., Wu J. Q., Xu Z., Yolken R., Zai C. C., Zhou W., Zhu F., Zimprich F., Atbaşoğlu E. C., Ayub M., Benner C., Bertolino A., Black D. W., Bray N. J., Breen G., Buccola N. G., Byerley W. 
F., Chen W. J., Cloninger C. R., Crespo-Facorro B., Donohoe G., Freedman R., Galletly C., Gandal M. J., Gennarelli M., Hougaard D. M., Hwu H.-G., Jablensky A. V., McCarroll S. A., Moran J. L., Mors O., Mortensen P. B., Müller-Myhsok B., Neil A. L., Nordentoft M., Pato M. T., Petryshen T. L., Pirinen M., Pulver A. E., Schulze T. G., Silverman J. M., Smoller J. W., Stahl E. A., Tsuang D. W., Vilella E., Wang S.-H., Xu S., I. S. Consortium, PsychENCODE, P. E. I. Consortium, S. Consortium, Adolfsson R., Arango C., Baune B. T., Belangero S. I., Børglum A. D., Braff D., Bramon E., Buxbaum J. D., Campion D., Cervilla J. A., Cichon S., Collier D. A., Corvin A., Curtis D., Forti M. D., Domenici E., Ehrenreich H., Escott-Price V., Esko T., Fanous A. H., Gareeva A., Gawlik M., Gejman P. V., Gill M., Glatt S. J., Golimbet V., Hong K. S., Hultman C. M., Hyman S. E., Iwata N., Jönsson E. G., Kahn R. S., Kennedy J. L., Khusnutdinova E., Kirov G., Knowles J. A., Krebs M.-O., Laurent-Levinson C., Lee J., Lencz T., Levinson D. F., Li Q. S., Liu J., Malhotra A. K., Malhotra D., McIntosh A., McQuillin A., Menezes P. R., Morgan V. A., Morris D. W., Mowry B. J., Murray R. M., Nimgaonkar V., Nöthen M. M., Ophoff R. A., Paciga S. A., Palotie A., Pato C. N., Qin S., Rietschel M., Riley B. P., Rivera M., Rujescu D., Saka M. C., Sanders A. R., Schwab S. G., Serretti A., Sham P. C., Shi Y., Clair D. S., Stefánsson H., Stefansson K., Tsuang M. T., van Os J., Vawter M. P., Weinberger D. R., Werge T., Wildenauer D. B., Yu X., Yue W., Holmans P. A., Pocklington A. J., Roussos P., Vassos E., Verhage M., Visscher P. M., Yang J., Posthuma D., Andreassen O. A., Kendler K. S., Owen M. J., Wray N. R., Daly M. J., Huang H., Neale B. M., Sullivan P. F., Ripke S., Walters J. T. R., O’Donovan M. C., and of S. W. G. the Psychiatric Genomics Consortium, “Mapping genomic loci implicates genes and synaptic biology in schizophrenia.,” Nature, vol. 604, pp. 502–508, 4 2022. 
[DOI] [PMC free article] [PubMed] [Google Scholar]
  • [89].Rietveld C. A., Esko T., Davies G., Pers T. H., Plomin R., Visscher P. M., Benjamin D. J., Cesarini D., Koellinger P. D., et al., “Common genetic variants associated with cognitive performance identified using the proxy-phenotype method,” Proceedings of the National Academy of Sciences, vol. 111, pp. 13790–13794, Sept. 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [90].Okbay A., Wu Y., Wang N., Jayashankar H., Bennett M., Nehzati S. M., Sidorenko J., Kweon H., Goldman G., Gjorgjieva T., Jiang Y., Hicks B., Tian C., Hinds D. A., Ahlskog R., Magnusson P. K. E., Oskarsson S., Hayward C., Campbell A., Porteous D. J., Freese J., Herd P., Agee M., Alipanahi B., Auton A., Bell R. K., Bryc K., Elson S. L., Fontanillas P., Furlotte N. A., Hinds D. A., Huber K. E., Kleinman A., Litterman N. K., McCreight J. C., McIntyre M. H., Mountain J. L., Northover C. A. M., Pitts S. J., Sathirapongsasuti J. F., Sazonova O. V., Shelton J. F., Shringarpure S., Tung J. Y., Vacic V., Wilson C. H., Fontana M. A., Pers T. H., Rietveld C. A., Chen G.-B., Emilsson V., Meddens S. F. W., Pickrell J. K., Thom K., Timshel P., de Vlaming R., Abdellaoui A., Ahluwalia T. S., Bacelis J., Baumbach C., Bjornsdottir G., Brandsma J. H., Concas M. P., Derringer J., Galesloot T. E., Girotto G., Gupta R., Hall L. M., Harris S. E., Hofer E., Horikoshi M., Huffman J. E., Kaasik K., Kalafati I. P., Karlsson R., Lahti J., van der Lee S. J., de Leeuw C., Lind P. A., Lindgren K.-O., Liu T., Mangino M., Marten J., Mihailov E., Miller M. B., van der Most P. J., Oldmeadow C., Payton A., Pervjakova N., Peyrot W. J., Qian Y., Raitakari O., Rueedi R., Salvi E., Schmidt B., Schraut K. E., Shi J., Smith A. V., Poot R. A., Pourcain B. S., Teumer A., Thorleifsson G., Verweij N., Vuckovic D., Wellmann J., Westra H.-J., Yang J., Zhao W., Zhu Z., Alizadeh B. Z., Amin N., Bakshi A., Baumeister S. E., Biino G., Bønnelykke K., Boyle P. A., Campbell H., Cappuccio F. P., Davies G., Neve J.-E. D., Deloukas P., Demuth I., Ding J., Eibich P., Eisele L., Eklund N., Evans D. M., Faul J. D., Feitosa M. F., Forstner A. J., Gandin I., Gunnarsson B., Halldórsson B. V., Harris T. B., Heath A. C., Hocking L. J., Holliday E. G., Homuth G., Horan M. A., Hottenga J.-J., de Jager P. L., Joshi P. K., Jugessur A., Kaakinen M. A., Kähönen M., Kanoni S., Keltigangas-Järvinen L., Kiemeney L. A. L. 
M., Kolcic I., Koskinen S., Kraja A. T., Kroh M., Kutalik Z., Latvala A., Launer L. J., Lebreton M. P., Levinson D. F., Lichtenstein P., Lichtner P., Liewald D. C. M., Loukola A., Madden P. A., Mägi R., Mäki-Opas T., Marioni R. E., Marques-Vidal P., Meddens G. A., McMahon G., Meisinger C., Meitinger T., Milaneschi Y., Milani L., Montgomery G. W., Myhre R., Nelson C. P., Nyholt D. R., Ollier W. E. R., Palotie A., Paternoster L., Pedersen N. L., Petrovic K. E., Räikkönen K., Ring S. M., Robino A., Rostapshova O., Rudan I., Rustichini A., Salomaa V., Sanders A. R., Sarin A.-P., Schmidt H., Scott R. J., Smith B. H., Smith J. A., Staessen J. A., Steinhagen-Thiessen E., Strauch K., Terracciano A., Tobin M. D., Ulivi S., Vaccargiu S., Quaye L., van Rooij F. J. A., Venturini C., Vinkhuyzen A. A. E., Völker U., Völzke H., Vonk J. M., Vozzi D., Waage J., Ware E. B., Willemsen G., Attia J. R., Bennett D. A., Berger K., Bertram L., Bisgaard H., Boomsma D. I., Borecki I. B., Bültmann U., Chabris C. F., Cucca F., Cusi D., Deary I. J., Dedoussis G. V., van Duijn C. M., Eriksson J. G., Franke B., Franke L., Gasparini P., Gejman P. V., Gieger C., Grabe H.-J., Gratten J., Groenen P. J. F., Gudnason V., van der Harst P., Hoffmann W., Hyppönen E., Iacono W. G., Jacobsson B., Järvelin M.-R., Jöckel K.-H., Kaprio J., Kardia S. L. R., Lehtimäki T., Lehrer S. F., Martin N. G., McGue M., Metspalu A., Pendleton N., Penninx B. W. J. H., Perola M., Pirastu N., Pirastu M., Polasek O., Posthuma D., Power C., Province M. A., Samani N. J., Schlessinger D., Schmidt R., Sørensen T. I. A., Spector T. D., Stefansson K., Thorsteinsdottir U., Thurik A. R., Timpson N. J., Tiemeier H., Uitterlinden A. G., Vitart V., Vollenweider P., Weir D. R., Wilson J. F., Wright A. F., Conley D. C., Krueger R. F., Smith G. D., Hofman A., Laibson D. I., Medland S. E., Yang J., Esko T., Watson C., Jala J., Conley D., Koellinger P. D., Johannesson M., Laibson D., Meyer M. N., Lee J. 
J., Kong A., Yengo L., Cesarini D., Turley P., Visscher P. M., Beauchamp J. P., Benjamin D. J., and Young A. I., “Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals,” Nature Genetics, vol. 54, pp. 437–449, Apr. 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [91].Hatoum A. S., Morrison C. L., Mitchell E. C., Lam M., Benca-Bachman C. E., Reineberg A. E., Palmer R. H. C., Evans L. M., Keller M. C., and Friedman N. P., “Genome-wide association study shows that executive functioning is influenced by GABAergic processes and is a neurocognitive genetic correlate of psychiatric disorders,” Biological psychiatry, vol. 93, pp. 59–70, 1 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [92].de la Fuente J., Davies G., Grotzinger A. D., Tucker-Drob E. M., and Deary I. J., “A general dimension of genetic sharing across diverse cognitive traits inferred from molecular data.,” Nature human behaviour, vol. 5, pp. 49–58, 1 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [93].Ip H. F., van der Laan C. M., Krapohl E. M. L., Brikell I., Sánchez-Mora C., Nolte I. M., St Pourcain B., Bolhuis K., Palviainen T., Zafarmand H., Colodro-Conde L., Gordon S., Zayats T., Aliev F., Jiang C., Wang C. A., Saunders G., Karhunen V., Hammerschlag A. R, Adkins D. E., Border R., Peterson R. E., Prinz J. A., Thiering E., Seppälä I., Vilor-Tejedor N., Ahluwalia T. S., Day F. R., Hottenga J.-J., Allegrini A. G., Rimfeld K., Chen Q., Lu Y., Martin J., Soler Artigas M., Rovira P., Bosch R., Español G., Ramos Quiroga J. A., Neumann A., Ensink J., Grasby K., Morosoli J. J., Tong X., Marrington S., Middeldorp C., Scott J. G., Vinkhuyzen A., Shabalin A. A., Corley R., Evans L. M., Sugden K., Alemany S., Sass L., Vinding R., Ruth K., Tyrrell J., Davies G. E., Ehli E. A., Hagenbeek F. A., De Zeeuw E., Van Beijsterveldt T. C. E. M., Larsson H., Snieder H., Verhulst F. C., Amin N., Whipp A. M., Korhonen T., Vuoksimaa E., Rose R. J., Uitterlinden A. G., Heath A. C., Madden P., Haavik J., Harris J. R., Helgeland Ø., Johansson S., Knudsen G. P. S., Njolstad P. R., Lu Q., Rodriguez A., Henders A. K., Mamun A., Najman J. M., Brown S., Hopfer C., Krauter K., Reynolds C., Smolen A., Stallings M., Wadsworth S., Wall T. L., Silberg J. L., Miller A., Keltikangas-Järvinen L., Hakulinen C., Pulkki-Råback L., Havdahl A., Magnus P., Raitakari O. T., Perry J. R. B., Llop S., Lopez-Espinosa M.-J., Bønnelykke K., Bisgaard H., Sunyer J., Lehtimäki T., Arseneault L., Standl M., Heinrich J., Boden J., Pearson J., Horwood L. J., Kennedy M., Poulton R., Eaves L. J., Maes H. H., Hewitt J., Copeland W. E., Costello E. J., Williams G. M., Wray N., Järvelin M.-R., McGue M., Iacono W., Caspi A., Moffitt T. E., Whitehouse A., Pennell C. E., Klump K. L., Burt S. A., Dick D. M., Reichborn-Kjennerud T., Martin N. G., Medland S. E., Vrijkotte T., Kaprio J., Tiemeier H., Davey Smith G., Hartman C. A., Oldehinkel A. 
J., Casas M., Ribasés M., Lichtenstein P., Lundström S., Plomin R., Bartels M., Nivard M. G., and Boomsma D. I., “Genetic association study of childhood aggression across raters, instruments, and age,” Transl. Psychiatry, vol. 11, p. 413, July 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [94].Tielbeek J. J., Uffelmann E., Williams B. S., Colodro-Conde L., Gagnon Éloi, Mallard T. T., Levitt B. E., Jansen P. R., Johansson A., Sallis H. M., Pistis G., Saunders G. R. B., Allegrini A. G., Rimfeld K., Konte B., Klein M., Hartmann A. M., Salvatore J. E., Nolte I. M., Demontis D., Malmberg A. L. K., Burt S. A., Savage J. E., Sugden K., Poulton R., Harris K. M., Vrieze S., McGue M., Iacono W. G., Mota N. R., Mill J., Viana J. F., Mitchell B. L., Morosoli J. J., Andlauer T. F. M., Ouellet-Morin I., Tremblay R. E., Côté S. M., Gouin J.-P., Brendgen M. R., Dionne G., Vitaro F., Lupton M. K., Martin N. G., C. Consortium, S. for Science Working Group, Castelao E., Räikkönen K., Eriksson J. G., Lahti J., Hartman C. A., Oldehinkel A. J., Snieder H., Liu H., Preisig M., Whipp A., Vuoksimaa E., Lu Y., Jern P., Rujescu D., Giegling I., Palviainen T., Kaprio J., Harden K. P., Munafò M. R., Morneau-Vaillancourt G., Plomin R., Viding E., Boutwell B. B., Aliev F., Dick D. M., Popma A., Faraone S. V., Børglum A. D., Medland S. E., Franke B., Boivin M., Pingault J.-B., Glennon J. C., Barnes J. C., Fisher S. E., Moffitt T. E., Caspi A., Polderman T. J. C., and Posthuma D., “Uncovering the genetic architecture of broad antisocial behavior through a genome-wide association study meta-analysis.,” Molecular psychiatry, vol. 27, pp. 4453–4463, 11 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [95].Warrier V., Toro R., Chakrabarti B., iPSYCH-Broad autism group, Børglum A. D., Grove J., 23andMe Research Team, Hinds D. A., Bourgeron T., and Baron-Cohen S., “Genome-wide analyses of self-reported empathy: correlations with autism, schizophrenia, and anorexia nervosa.,” Translational psychiatry, vol. 8, p. 35, 3 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [96].Gupta P., Galimberti M., Liu Y., Beck S., Wingo A., Wingo T., Adhikari K., Kranzler H. R., Stein M. B., Gelernter J., and Levey D. F., “A genome-wide investigation into the underlying genetic architecture of personality traits and overlap with psychopathology,” Nature Human Behaviour, 8 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [97].Hill W. D., Davies N. M., Ritchie S. J., Skene N. G., Bryois J., Bell S., Angelantonio E. D., Roberts D. J., Xueyi S., Davies G., Liewald D. C. M., Porteous D. J., Hayward C., Butterworth A. S., McIntosh A. M., Gale C. R., and Deary I. J., “Genome-wide analysis identifies molecular systems and 149 genetic loci associated with income.,” Nature communications, vol. 10, p. 5741, 12 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [98].Neale B. M., “UK Biobank GWAS round 2 results,” 2018. [Google Scholar]
  • [99].Grasby K. L., Jahanshad N., Painter J. N., Colodro-Conde L., Bralten J., Hibar D. P., Lind P. A., Pizzagalli F., Ching C. R. K., McMahon M. A. B., Shatokhina N., Zsembik L. C. P., Thomopoulos S. I., Zhu A. H., Strike L. T., Agartz I., Alhusaini S., Almeida M. A. A., Alnæs D., Amlien I. K., Andersson M., Ard T., Armstrong N. J., Ashley-Koch A., Atkins J. R., Bernard M., Brouwer R. M., Buimer E. E. L., Bülow R., Bürger C., Cannon D. M., Chakravarty M., Chen Q., Cheung J. W., Couvy-Duchesne B., Dale A. M., Dalvie S., de Araujo T. K., de Zubicaray G. I., de Zwarte S. M. C., den Braber A., Doan N. T., Dohm K., Ehrlich S., Engelbrecht H.-R., Erk S., Fan C. C., Fedko I. O., Foley S. F., Ford J. M., Fukunaga M., Garrett M. E., Ge T., Giddaluru S., Goldman A. L., Green M. J., Groenewold N. A., Grotegerd D., Gurholt T. P., Gutman B. A., Hansell N. K., Harris M. A., Harrison M. B., Haswell C. C., Hauser M., Herms S., Heslenfeld D. J., Ho N. F., Hoehn D., Hoffmann P., Holleran L., Hoogman M., Hottenga J.-J., Ikeda M., Janowitz D., Jansen I. E., Jia T., Jockwitz C., Kanai R., Karama S., Kasperaviciute D., Kaufmann T., Kelly S., Kikuchi M., Klein M., Knapp M., Knodt A. R., Krämer B., Lam M., Lancaster T. M., Lee P. H., Lett T. A., Lewis L. B., Lopes-Cendes I., Luciano M., Macciardi F., Marquand A. F., Mathias S. R., Melzer T. R., Milaneschi Y., Mirza-Schreiber N., Moreira J. C. V., Mühleisen T. W., Müller-Myhsok B., Najt P., Nakahara S., Nho K., Loohuis L. M. O., Orfanos D. P., Pearson J. F., Pitcher T. L., Pütz B., Quidé Y., Ragothaman A., Rashid F. M., Reay W. R., Redlich R., Reinbold C. S., Repple J., Richard G., Riedel B. C., Risacher S. L., Rocha C. S., Mota N. R., Salminen L., Saremi A., Saykin A. J., Schlag F., Schmaal L., Schofield P. R., Secolin R., Shapland C. Y., Shen L., Shin J., Shumskaya E., Sønderby I. E., Sprooten E., Tansey K. E., Teumer A., Thalamuthu A., Tordesillas-Gutiérrez D., Turner J. A., Uhlmann A., Vallerga C. 
L., van der Meer D., van Donkelaar M. M. J., van Eijk L., van Erp T. G. M., van Haren N. E. M., van Rooij D., van Tol M.-J., Veldink J. H., Verhoef E., Walton E., Wang M., Wang Y., Wardlaw J. M., Wen W., Westlye L. T., Whelan C. D., Witt S. H., Wittfeld K., Wolf C., Wolfers T., Wu J. Q., Yasuda C. L., Zaremba D., Zhang Z., Zwiers M. P., Artiges E., Assareh A. A., Ayesa-Arriola R., Belger A., Brandt C. L., Brown G. G., Cichon S., Curran J. E., Davies G. E., Degenhardt F., Dennis M. F., Dietsche B., Djurovic S., Doherty C. P., Espiritu R., Garijo D., Gil Y., Gowland P. A., Green R. C., Häusler A. N., Heindel W., Ho B.-C., Hoffmann W. U., Holsboer F., Homuth G., Hosten N., Jack C. R., Jang M., Jansen A., Kimbrel N. A., Kolskår K., Koops S., Krug A., Lim K. O., Luykx J. J., Mathalon D. H., Mather K. A., Mattay V. S., Matthews S., Son J. M. V., McEwen S. C., Melle I., Morris D. W., Mueller B. A., Nauck M., Nordvik J. E., Nöthen M. M., O’Leary D. S., Opel N., Martinot M.-L. P., Pike G. B., Preda A., Quinlan E. B., Rasser P. E., Ratnakar V., Reppermund S., Steen V. M., Tooney P. A., Torres F. R., Veltman D. J., Voyvodic J. T., Whelan R., White T.,Yamamori H., Adams H. H. H., Bis J. C., Debette S., Decarli C., Fornage M., Gudnason V., Hofer E., Ikram M. A., Launer L., Longstreth W. T., Lopez O. L., Mazoyer B., Mosley T. H., Roshchupkin G. V., Satizabal C. L., Schmidt R., Seshadri S., Yang Q., A. D. N. Initiative, C. Consortium, E. Consortium, I. Consortium, S. Consortium, P. P. M. Initiative, Alvim M. K. M., Ames D., Anderson T. J., Andreassen O. A., Arias-Vasquez A., Bastin M. E., Baune B. T., Beckham J. C., Blangero J., Boomsma D. I., Brodaty H., Brunner H. G., Buckner R. L., Buitelaar J. K., Bustillo J. R., Cahn W., Cairns M. J., Calhoun V., Carr V. J., Caseras X., Caspers S., Cavalleri G. L., Cendes F., Corvin A., Crespo-Facorro B., Dalrymple-Alford J. C., Dannlowski U., de Geus E. J. C., Deary I. 
J., Delanty N., Depondt C., Desrivières S., Donohoe G., Espeseth T., Fernández G., Fisher S. E., Flor H., Forstner A. J., Francks C., Franke B., Glahn D. C., Gollub R. L., Grabe H. J., Gruber O., Håberg A. K., Hariri A. R., Hartman C. A., Hashimoto R., Heinz A., Henskens F. A., Hillegers M. H. J., Hoekstra P. J., Holmes A. J., Hong L. E., Hopkins W. D., Pol H. E. H., Jernigan T. L., Jönsson E. G., Kahn R. S., Kennedy M. A., Kircher T. T. J., Kochunov P., Kwok J. B. J., Hellard S. L., Loughland C. M., Martin N. G., Martinot J.-L., McDonald C., McMahon K. L., Meyer-Lindenberg A., Michie P. T., Morey R. A., Mowry B., Nyberg L., Oosterlaan J., Ophoff R. A., Pantelis C., Paus T., Pausova Z., Penninx B. W. J. H., Polderman T. J. C., Posthuma D., Rietschel M., Roffman J. L., Rowland L. M., Sachdev P. S., Sämann P. G., Schall U., Schumann G., Scott R. J., Sim K., Sisodiya S. M., Smoller J. W., Sommer I. E., Pourcain B. S., Stein D. J., Toga A. W., Trollor J. N., der Wee N. J. A. V., van ‘t Ent D., Völzke H., Walter H., Weber B., Weinberger D. R., Wright M. J., Zhou J., Stein J. L., Thompson P. M., Medland S. E., and E. N. G. through Meta-Analysis Consortium (ENIGMA)—Genetics working group, “The genetic architecture of the human cerebral cortex.,” Science (New York, N.Y.), vol. 367, 3 2020. [Google Scholar]
  • [100].Tissink E., de Lange S. C., Savage J. E., Wightman D. P., de Leeuw C. A., Kelly K. M., Nagel M., van den Heuvel M. P., and Posthuma D., “Genome-wide association study of cerebellar volume provides insights into heritable mechanisms underlying brain development and mental health,” Communications Biology, vol. 5, p. 710, 7 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [101].Tissink E., Werme J., de Lange S. C., Savage J. E., Wei Y., de Leeuw C. A., Nagel M., Posthuma D., and van den Heuvel M. P., “The genetic architectures of functional and structural connectivity properties within cerebral resting-state networks.,” eNeuro, vol. 10, April 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [102].Yengo L., Vedantam S., Marouli E., Sidorenko J., Bartell E., Sakaue S., Graff M., Eliasen A. U., Jiang Y., Raghavan S., Miao J., Arias J. D., Graham S. E., Mukamel R. E., Spracklen C. N., Yin X., Chen S.-H., Ferreira T., Highland H. H., Ji Y., Karaderi T., Lin K., Lüll K., Malden D. E., Medina-Gomez C., Machado M., Moore A., Rüeger S., Sim X., Vrieze S., Ahluwalia T. S., Akiyama M., Allison M. A., Alvarez M., Andersen M. K., Ani A., Appadurai V., Arbeeva L., Bhaskar S., Bielak L. F., Bollepalli S., Bonnycastle L. L., Bork-Jensen J., Bradfield J. P., Bradford Y., Braund P. S., Brody J. A., Burgdorf K. S., Cade B. E., Cai H., Cai Q., Campbell A., Cañadas-Garre M., Catamo E., Chai J.-F., Chai X., Chang L.-C., Chang Y.-C., Chen C.-H., Chesi A., Choi S. H., Chung R.-H., Cocca M., Concas M. P., Couture C., Cuellar-Partida G., Danning R., Daw E. W., Degenhard F., Delgado G. E., Delitala A., Demirkan A., Deng X., Devineni P., Dietl A., Dimitriou M., Dimitrov L., Dorajoo R., Ekici A. B., Engmann J. E., Fairhurst-Hunter Z., Farmaki A.-E., Faul J. D., Fernandez-Lopez J.-C., Forer L., Francescatto M., Freitag-Wolf S., Fuchsberger C., Galesloot T. E., Gao Y., Gao Z., Geller F., Giannakopoulou O., Giulianini F., Gjesing A. P., Goel A., Gordon S. D., Gorski M., Grove J., Guo X., Gustafsson S., Haessler J., Hansen T. F., Havulinna A. S., Haworth S. J., He J., Heard-Costa N., Hebbar P., Hindy G., Ho Y.-L. A., Hofer E., Holliday E., Horn K., Hornsby W. E., Hottenga J.-J., Huang H., Huang J., Huerta-Chagoya A., Huffman J. E., Hung Y.-J., Huo S., Hwang M. Y., Iha H., Ikeda D. D., Isono M., Jackson A. U., Jäger S., Jansen I. E., Johansson I., Jonas J. B., Jonsson A., Jørgensen T., Kalafati I.-P., Kanai M., Kanoni S., Kårhus L. L., Kasturiratne A., Katsuya T., Kawaguchi T., Kember R. L., Kentistou K. A., Kim H.-N., Kim Y. J., Kleber M. E., Knol M. J., Kurbasic A., Lauzon M., Le P., Lea R., Lee J.-Y., Leonard H. L., Li S. 
A., Li X., Li X., Liang J., Lin H., Lin S.-Y., Liu J., Liu X., Lo K. S., Long J., Lores-Motta L., Luan J., Lyssenko V., Lyytikäinen L.-P., Mahajan A., Mamakou V., Mangino M., Manichaikul A., Marten J., Mattheisen M., Mavarani L., McDaid A. F., Meidtner K., Melendez T. L., Mercader J. M., Milaneschi Y., Miller J. E., Millwood I. Y., Mishra P. P., Mitchell R. E., Møllehave L. T., Morgan A., Mucha S., Munz M., Nakatochi M., Nelson C. P., Nethander M., Nho C. W., Nielsen A. A., Nolte I. M., Nongmaithem S. S., Noordam R., Ntalla I., Nutile T., Pandit A., Christofidou P., Pärna K., Pauper M., Petersen E. R. B., Petersen L. V., Pitkänen N., Polašek O., Poveda A., Preuss M. H., Pyarajan S., Raffield L. M., Rakugi H., Ramirez J., Rasheed A., Raven D., Rayner N. W., Riveros C., Rohde R., Ruggiero D., Ruotsalainen S. E., Ryan K. A., Sabater-Lleal M., Saxena R., Scholz M., Sendamarai A., Shen B., Shi J., Shin J. H., Sidore C., Sitlani C. M., Slieker R. C., Smit R. A. J., Smith A. V., Smith J. A., Smyth L. J., Southam L., Steinthorsdottir V., Sun L., Takeuchi F., Tallapragada D. S. P., Taylor K. D., Tayo B. O., Tcheandjieu C., Terzikhan N., Tesolin P., Teumer A., Theusch E., Thompson D. J., Thorleifsson G., Timmers P. R. H. J., Trompet S., Turman C., Vaccargiu S., van der Laan S. W., van der Most P. J., van Klinken J. B., van Setten J., Verma S. S., Verweij N., Veturi Y., Wang C. A., Wang C., Wang L., Wang Z., Warren H. R., Wei W. B., Wickremasinghe A. R., Wielscher M., Wiggins K. L., Winsvold B. S., Wong A., Wu Y., Wuttke M., Xia R., Xie T., Yamamoto K., Yang J., Yao J., Young H., Yousri N. A., Yu L., Zeng L., Zhang W., Zhang X., Zhao J.-H., Zhao W., Zhou W., Zimmermann M. E., Zoledziewska M., Adair L. S., Adams H. H. H., Aguilar-Salinas C. A., Al-Mulla F., Arnett D. K., Asselbergs F. W., Åsvold B. O., Attia J., Banas B., Bandinelli S., Bennett D. A., Bergler T., Bharadwaj D., Biino G., Bisgaard H., Boerwinkle E., Böger C. A., Bønnelykke K., Boomsma D. I., Børglum A. 
D., Borja J. B., Bouchard C., Bowden D. W., Brandslund I., Brumpton B., Buring J. E., Caulfield M. J., Chambers J. C., Chandak G. R., Chanock S. J., Chaturvedi N., Chen Y.-D. I., Chen Z., Cheng C.-Y., Christophersen I. E., Ciullo M., Cole J. W., Collins F. S., Cooper R. S., Cruz M., Cucca F., Cupples L. A., Cutler M. J., Damrauer S. M., Dantoft T. M., de Borst G. J., de Groot L. C. P. G. M., Jager P. L. D., de Kleijn D. P. V., de Silva H. J., Dedoussis G. V., den Hollander A. I., Du S., Easton D. F., Elders P. J. M., Eliassen A. H., Ellinor P. T., Elmståhl S., Erdmann J., Evans M. K., Fatkin D., Feenstra B., Feitosa M. F., Ferrucci L., Ford I., Fornage M., Franke A., Franks P. W., Freedman B. I., Gasparini P., Gieger C., Girotto G., Goddard M. E., Golightly Y. M., Gonzalez-Villalpando C., Gordon-Larsen P., Grallert H., Grant S. F. A., Grarup N., Griffiths L., Gudnason V., Haiman C., Hakonarson H., Hansen T., Hartman C. A., Hattersley A. T., Hayward C., Heckbert S. R., Heng C.-K., Hengstenberg C., Hewitt A. W., Hishigaki H., Hoyng C. B., Huang P. L., Huang W., Hunt S. C., Hveem K., Hyppönen E., Iacono W. G., Ichihara S., Ikram M. A., Isasi C. R., Jackson R. D., Jarvelin M.-R., Jin Z.-B., Jöckel K.-H., Joshi P. K., Jousilahti P., Jukema J. W., Kähönen M., Kamatani Y., Kang K. D., Kaprio J., Kardia S. L. R., Karpe F., Kato N., Kee F., Kessler T., Khera A. V., Khor C. C., Kiemeney L. A. L. M., Kim B.-J., Kim E. K., Kim H.-L., Kirchhof P., Kivimaki M., Koh W.-P., Koistinen H. A., Kolovou G. D., Kooner J. S., Kooperberg C., Köttgen A., Kovacs P., Kraaijeveld A., Kraft P., Krauss R. M., Kumari M., Kutalik Z., Laakso M., Lange L. A., Langenberg C., Launer L. J., Marchand L. L., Lee H., Lee N. R., Lehtimäki T., Li H., Li L., Lieb W., Lin X., Lind L., Linneberg A., Liu C.-T., Liu J., Loeffler M., London B., Lubitz S. A., Lye S. J., Mackey D. A., Mägi R., Magnusson P. K. E., Marcus G. M., Vidal P. M., Martin N. G., März W., Matsuda F., McGarrah R. W., McGue M., McKnight A. 
J., Medland S. E., Mellström D., Metspalu A., Mitchell B. D., Mitchell P., Mook-Kanamori D. O., Morris A. D., Mucci L. A., Munroe P. B., Nalls M. A., Nazarian S., Nelson A. E., Neville M. J., Newton-Cheh C., Nielsen C. S., Nöthen M. M., Ohlsson C., Oldehinkel A. J., Orozco L., Pahkala K., Pajukanta P., Palmer C. N. A., Parra E. J., Pattaro C., Pedersen O., Pennell C. E., Penninx B. W. J. H., Perusse L., Peters A., Peyser P. A., Porteous D. J., Posthuma D., Power C., Pramstaller P. P., Province M. A., Qi Q., Qu J., Rader D. J., Raitakari O. T., Ralhan S., Rallidis L. S., Rao D. C., Redline S., Reilly D. F., Reiner A. P., Rhee S. Y., Ridker P. M., Rienstra M., Ripatti S., Ritchie M. D., Roden D. M., Rosendaal F. R., Rotter J. I., Rudan I., Rutters F., Sabanayagam C., Saleheen D., Salomaa V., Samani N. J., Sanghera D. K., Sattar N., Schmidt B., Schmidt H., Schmidt R., Schulze M. B., Schunkert H., Scott L. J., Scott R. J., Sever P., Shiroma E. J., Shoemaker M. B., Shu X.-O., Simonsick E. M., Sims M., Singh J. R., Singleton A. B., Sinner M. F., Smith J. G., Snieder H., Spector T. D., Stampfer M. J., Stark K. J., Strachan D. P., ‘t Hart L. M., Tabara Y., Tang H., Tardif J.-C., Thanaraj T. A., Timpson N. J., Tönjes A., Tremblay A., Tuomi T., Tuomilehto J., Tusié-Luna M.-T., Uitterlinden A. G., van Dam R. M., van der Harst P., der Velde N. V., van Duijn C. M., van Schoor N. M., Vitart V., Völker U., Vollenweider P., Völzke H., Wacher-Rodarte N. H., Walker M., Wang Y. X., Wareham N. J., Watanabe R. M., Watkins H., Weir D. R., Werge T. M., Widen E., Wilkens L. R., Willemsen G., Willett W. C., Wilson J. F., Wong T.-Y., Woo J.-T., Wright A. F., Wu J.-Y., Xu H., Yajnik C. S., Yokota M., Yuan J.-M., Zeggini E., Zemel B. S., Zheng W., Zhu X., Zmuda J. M., Zonderman A. B., Zwart J.-A., Partida G. C., Sun Y., Croteau-Chonka D., Vonk J. M., Chanock S., Marchand L. L., Chasman D. I., Cho Y. S., Heid I. M., McCarthy M. I., Ng M. C. Y., O’Donnell C. 
J., Rivadeneira F., Thorsteinsdottir U., Sun Y. V., Tai E. S., Boehnke M., Deloukas P., Justice A. E., Lindgren C. M., Loos R. J. F., Mohlke K. L., North K. E., Stefansson K., Walters R. G., Winkler T. W., Young K. L., Loh P.-R., Yang J., Esko T., Assimes T. L., Auton A., Abecasis G. R., Willer C. J., Locke A. E., Berndt S. I., Lettre G., Frayling T. M., Okada Y., Wood A. R., Visscher P. M., and Hirschhorn J. N., “A saturated map of common genetic variants associated with human height,” Nature, vol. 610, pp. 704–712, 10 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [103].Dalvie S., Maihofer A. X., Coleman J. R. I., Bradley B., Breen G., Brick L. A., Chen C.-Y., Choi K. W., Duncan L. E., Guffanti G., Haas M., Harnal S., Liberzon I., Nugent N. R., Provost A. C., Ressler K. J., Torres K., Amstadter A. B., Bryn Austin S., Baker D. G., Bolger E. A., Bryant R. A., Calabrese J. R., Delahanty D. L., Farrer L. A., Feeny N. C., Flory J. D., Forbes D., Galea S., Gautam A., Gelernter J., Hammamieh R., Jett M., Junglen A. G., Kaufman M. L., Kessler R. C., Khan A., Kranzler H. R., Lebois L. A. M., Marmar C., Mavissakalian M. R., McFarlane A., Donnell M. O., Orcutt H. K., Pietrzak R. H., Risbrough V. B., Roberts A. L., Rothbaum A. O., Roy-Byrne P., Ruggiero K., Seligowski A. V., Sheerin C. M., Silove D., Smoller J. W., Stein M. B., Teicher M. H., Ursano R. J., Van Hooff M., Winternitz S., Wolff J. D., Yehuda R., Zhao H., Zoellner L. A., Stein D. J., Koenen K. C., and Nievergelt C. M., “Genomic influences on self-reported childhood maltreatment,” Transl. Psychiatry, vol. 10, p. 38, Jan. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [104].Gisladottir R. S., Helgason A., Halldorsson B. V., Helgason H., Borsky M., Chien Y.-R., Gudnason J., Gudjonsson S. A., Moisik S., Dediu D., Thorleifsson G., Tragante V., Bustamante M., Jonsdottir G. A., Stefansdottir L., Rutsdottir G., Magnusson S. H., Hardarson M., Ferkingstad E., Halldorsson G. H., Rognvaldsson S., Skuladottir A., Ivarsdottir E. V., Norddahl G., Thorgeirsson G., Jonsdottir I., Ulfarsson M. O., Holm H., Stefansson H., Thorsteinsdottir U., Gudbjartsson D. F., Sulem P., and Stefansson K., “Sequence variants affecting voice pitch in humans.,” Science advances, vol. 9, p. eabq2969, 6 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [105].Quinlan A. R. and Hall I. M., “BEDTools: a flexible suite of utilities for comparing genomic features,” Bioinformatics (Oxford, England), vol. 26, pp. 841–2, 3 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [106].Kuhn R. M., Haussler D., and Kent W. J., “The UCSC genome browser and associated tools,” Briefings in bioinformatics, vol. 14, pp. 144–61, 3 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [107].Thomas T. R., Tener A. J., Pearlman A. M., Imborek K. L., Yang J. S., Strang J. F., and Michaelson J. J., “Polygenic scores clarify the relationship between mental health and gender diversity,” Biological Psychiatry Global Open Science, vol. 4, p. 100291, 3 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [108].Fornes O., Castro-Mondragon J. A., Khan A., van der Lee R., Zhang X., Richmond P. A., Modi B. P., Correard S., Gheorghe M., Baranašić D., Santana-Garcia W., Tan G., Chèneby J., Ballester B., Parcy F., Sandelin A., Lenhard B., Wasserman W. W., and Mathelier A., “JASPAR 2020: update of the open-access database of transcription factor binding profiles,” Nucleic Acids Res., vol. 48, pp. D87–D92, Jan. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [109].Prüfer K., Racimo F., Patterson N., Jay F., Sankararaman S., Sawyer S., Heinze A., Renaud G., Sudmant P. H., de Filippo C., Li H., Mallick S., Dannemann M., Fu Q., Kircher M., Kuhlwilm M., Lachmann M., Meyer M., Ongyerth M., Siebauer M., Theunert C., Tandon A., Moorjani P., Pickrell J., Mullikin J. C., Vohr S. H., Green R. E., Hellmann I., Johnson P. L. F., Blanche H., Cann H., Kitzman J. O., Shendure J., Eichler E. E., Lein E. S., Bakken T. E., Golovanova L. V., Doronichev V. B., Shunkov M. V., Derevianko A. P., Viola B., Slatkin M., Reich D., Kelso J., and Pääbo S., “The complete genome sequence of a Neanderthal from the Altai Mountains,” Nature, vol. 505, pp. 43–49, Jan. 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [110].Mafessoni F., Grote S., de Filippo C., Slon V., Kolobova K. A., Viola B., Markin S. V., Chintalapati M., Peyrégne S., Skov L., Skoglund P., Krivoshapkin A. I., Derevianko A. P., Meyer M., Kelso J., Peter B., Prüfer K., and Pääbo S., “A high-coverage Neandertal genome from Chagyrskaya Cave,” Proc. Natl. Acad. Sci. U. S. A., vol. 117, pp. 15132–15136, June 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [111].Prüfer K., de Filippo C., Grote S., Mafessoni F., Korlević P., Hajdinjak M., Vernot B., Skov L., Hsieh P., Peyrégne S., Reher D., Hopfe C., Nagel S., Maricic T., Fu Q., Theunert C., Rogers R., Skoglund P., Chintalapati M., Dannemann M., Nelson B. J., Key F. M., Rudan P., Kućan Ž., Gušić I., Golovanova L. V., Doronichev V. B., Patterson N., Reich D., Eichler E. E., Slatkin M., Schierup M. H., Andrés A. M., Kelso J., Meyer M., and Pääbo S., “A high-coverage Neandertal genome from Vindija Cave in Croatia,” Science, vol. 358, pp. 655–658, Nov. 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [112].Meyer M., Kircher M., Gansauge M.-T., Li H., Racimo F., Mallick S., Schraiber J. G., Jay F., Prüfer K., de Filippo C., Sudmant P. H., Alkan C., Fu Q., Do R., Rohland N., Tandon A., Siebauer M., Green R. E., Bryc K., Briggs A. W., Stenzel U., Dabney J., Shendure J., Kitzman J., Hammer M. F., Shunkov M. V., Derevianko A. P., Patterson N., Andrés A. M., Eichler E. E., Slatkin M., Reich D., Kelso J., and Pääbo S., “A high-coverage genome sequence from an archaic Denisovan individual,” Science, vol. 338, pp. 222–226, Oct. 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [113].Friedman J., Hastie T., and Tibshirani R., “Regularization paths for generalized linear models via coordinate descent,” J. Stat. Softw., vol. 33, no. 1, pp. 1–22, 2010. [PMC free article] [PubMed] [Google Scholar]
  • [114].York D., Evensen N. M., Martínez M. L., and De Basabe Delgado J., “Unified equations for the slope, intercept, and standard errors of the best straight line,” Am. J. Phys., vol. 72, pp. 367–375, Mar. 2004. [Google Scholar]
  • [115].Paysan-Lafosse T., Blum M., Chuguransky S., Grego T., Pinto B. L., Salazar G. A., Bileschi M. L., Bork P., Bridge A., Colwell L., Gough J., Haft D. H., Letunić I., Marchler-Bauer A., Mi H., Natale D. A., Orengo C. A., Pandurangan A. P., Rivoire C., Sigrist C. J. A., Sillitoe I., Thanki N., Thomas P. D., Tosatto S. C. E., Wu C. H., and Bateman A., “InterPro in 2022,” Nucleic Acids Res., vol. 51, pp. D418–D427, Jan. 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [116].Szklarczyk D., Nastou K., Koutrouli M., Kirsch R., Mehryary F., Hachilif R., Hu D., Peluso M. E., Huang Q., Fang T., Doncheva N. T., Pyysalo S., Bork P., Jensen L. J., and von Mering C., “The STRING database in 2025: protein networks with directionality of regulation,” Nucleic Acids Res., vol. 53, pp. D730–D737, Jan. 2025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [117].Akbari A., Barton A. R., Gazal S., Li Z., Kariminejad M., Perry A., Zeng Y., Mittnik A., Patterson N., Mah M., Zhou X., Price A. L., Lander E. S., Pinhasi R., Rohland N., Mallick S., and Reich D., “Pervasive findings of directional selection realize the promise of ancient DNA to elucidate human adaptation,” bioRxiv, Sept. 2024. [Google Scholar]
  • [118].Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M. A., Bender D., Maller J., Sklar P., de Bakker P. I., Daly M. J., and Sham P. C., “PLINK: A tool set for whole-genome association and population-based linkage analyses,” The American Journal of Human Genetics, vol. 81, pp. 559–575, Sept. 2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [119].Yang J., Lee S. H., Goddard M. E., and Visscher P. M., “GCTA: A tool for genome-wide complex trait analysis,” The American Journal of Human Genetics, vol. 88, pp. 76–82, 1 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [120].Perdry H. and Dandine-Roulland C., gaston: Genetic Data Handling (QC, GRM, LD, PCA) Linear Mixed Models, 2023. R package version 1.6. [Google Scholar]
  • [121].Karcher N. R. and Barch D. M., “The ABCD study: understanding the development of risk for mental and physical health outcomes,” Neuropsychopharmacology, vol. 46, pp. 131–142, Jan. 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [122].Jerber J., Seaton D. D., Cuomo A. S. E., Kumasaka N., Haldane J., Steer J., Patel M., Pearce D., Andersson M., Bonder M. J., Mountjoy E., Ghoussaini M., Lancaster M. A., HipSci Consortium, Marioni J. C., Merkle F. T., Gaffney D. J., and Stegle O., “Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation,” Nat. Genet., vol. 53, pp. 304–312, Mar. 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [123].Emani P. S., Liu J. J., Clarke D., Jensen M., Warrell J., Gupta C., Meng R., Lee C. Y., Xu S., Dursun C., Lou S., Chen Y., Chu Z., Galeev T., Hwang A., Li Y., Ni P., Zhou X., PsychENCODE Consortium‡, Bakken T. E., Bendl J., Bicks L., Chatterjee T., Cheng L., Cheng Y., Dai Y., Duan Z., Flaherty M., Fullard J. F., Gancz M., Garrido-Martín D., Gaynor-Gillett S., Grundman J., Hawken N., Henry E., Hoffman G. E., Huang A., Jiang Y., Jin T., Jorstad N. L., Kawaguchi R., Khullar S., Liu J., Liu J., Liu S., Ma S., Margolis M., Mazariegos S., Moore J., Moran J. R., Nguyen E., Phalke N., Pjanic M., Pratt H., Quintero D., Rajagopalan A. S., Riesenmy T. R., Shedd N., Shi M., Spector M., Terwilliger R., Travaglini K. J., Wamsley B., Wang G., Xia Y., Xiao S., Yang A. C., Zheng S., Gandal M. J., Lee D., Lein E. S., Roussos P., Sestan N., Weng Z., White K. P., Won H., Girgenti M. J., Zhang J., Wang D., Geschwind D., Gerstein M., and PsychENCODE Consortium, “Single-cell genomics and regulatory networks for 388 human brains,” Science, vol. 384, p. eadi5199, May 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [124].Au E. H., Fauci C., Luo Y., Mangan R. J., Snellings D. A., Shoben C. R., Weaver S., Simpson S. K., and Lowe C. B., “Gonomics: uniting high performance and readability for genomics with Go,” Bioinformatics, vol. 39, Aug. 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [125].Cock P. J. A., Antao T., Chang J. T., Chapman B. A., Cox C. J., Dalke A., Friedberg I., Hamelryck T., Kauff F., Wilczynski B., and de Hoon M. J. L., “Biopython: freely available Python tools for computational molecular biology and bioinformatics,” Bioinformatics, vol. 25, pp. 1422–1423, June 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [126].Ives A. R. and Garland T. Jr, “Phylogenetic logistic regression for binary dependent variables,” Syst. Biol., vol. 59, pp. 9–26, Jan. 2010. [DOI] [PubMed] [Google Scholar]
  • [127].Ho L. S. T. and Ané C., “A linear-time algorithm for Gaussian and non-Gaussian trait evolution models,” Syst. Biol., vol. 63, pp. 397–408, May 2014. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplement 1

Data Availability Statement

The EpiSLI whole genome sequencing data described here are available to qualified researchers via dbGaP (study accession = phs002255.v1.p1): https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002255.v1.p1

SPARK genetic and phenotype data are available to qualified researchers at SFARI Base: https://base.sfari.org/

ABCD is available to qualified researchers at: https://nda.nih.gov/abcd/request-access

1000 Genomes Phase 3 data is available at: https://www.internationalgenome.org/data/

Allen Ancient DNA Resource: https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadab

Neanderthal and Denisovan genomes: https://www.eva.mpg.de/genetics/genome-projects/

Cross-species sequence alignment data: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints