Ancient regulatory evolution shapes individual language abilities in present-day humans

Lucas G Casten; Tanner Koomar; Taylor R Thomas; Jin-Young Koh; Dabney Hofammann; Savantha Thenuwara; Allison Momany; Marlea O’Brien; Jeff C Murray; J Bruce Tomblin; Jacob J Michaelson

doi:10.1101/2025.03.07.641231

This is a preprint.

It has not yet been peer reviewed by a journal.


[Preprint]. 2026 Feb 13:2025.03.07.641231. Originally published 2025 Mar 10. [Version 4] doi: 10.1101/2025.03.07.641231


Lucas G Casten ¹,†, Tanner Koomar ¹,†, Taylor R Thomas ², Jin-Young Koh ³, Dabney Hofammann ¹, Savantha Thenuwara ⁴, Allison Momany ⁵, Marlea O’Brien ⁶, Jeff C Murray ⁵, J Bruce Tomblin ⁶, Jacob J Michaelson ¹,⁶,*

PMCID: PMC11952349 PMID: 40161630

Abstract

Language is a defining feature of our species, yet the genomic changes enabling it remain poorly understood. Despite decades of work since FOXP2’s discovery, we still lack a clear picture of which regions shaped language evolution and how variation contributes to present-day phenotypic differences. Using a novel evolutionary stratified polygenic score approach in nearly 40,000 individuals, we find that Human Ancestor Quickly Evolved Regions (HAQERs) are specifically associated with language but not general cognition. HAQERs evolved before the human–Neanderthal split, giving hominins increased binding of Forkhead and Homeobox transcription factors, and show balancing selection across the past 20,000 years. Remarkably, language variants in HAQERs appear more prevalent in Neanderthals and have convergently evolved across vocal-learning mammals. Our results reveal how ancient innovations continue shaping human language.

Human language is one of our species’ most remarkable cognitive innovations, yet the genetic mechanisms underlying this ability remain elusive. While the human genome differs by only 1–5% from our closest primate relatives (1–3), these modest genetic changes enabled the evolution of our species’ unique capacity for complex language. Understanding how these relatively small genomic differences produced profound cognitive differences represents a central challenge in evolutionary genetics, with implications for language disorders, human cognitive diversity, and the origins of human-specific traits (4).

The discovery that mutations in FOXP2 cause speech and language disorders provided the first clear example of a single gene with significant effects on language, reinforcing early expectations for simple genetic architectures (5, 6). However, FOXP2’s contribution to typical variation in language ability proved limited, with subsequent studies failing to find associations between common FOXP2 variants and individual differences in language skills (7, 8). This limitation shifted research toward polygenic models emphasizing a large number of regulatory elements scattered throughout the genome that collectively influence language development. Genome-wide association studies have since identified numerous loci contributing to reading abilities, stuttering, rhythm, and vocabulary development, supporting a highly polygenic architecture (9–14). Cross-species studies have revealed that language-related traits (vocal learning and rhythm) show convergent evolution across mammalian lineages, with distributed regulatory networks rather than single genes controlling complex vocal behaviors (15–20). However, this polygenic model has left critical evolutionary questions unanswered: how did language-relevant regulatory elements change during human evolution, when did humans acquire language-promoting functions, and how do ancient evolutionary changes translate into present-day individual differences in language abilities?

To address these questions, we analyzed 65 million years of primate evolutionary history to trace the origins of language-relevant genetic variation. As part of our approach, we developed an evolutionary stratified polygenic score (ES-PGS) method that partitions genetic effects based on the evolutionary origins of their sequence context. We applied this approach across nearly 40,000 individuals with detailed language phenotyping, combined with molecular analysis, ancient DNA analysis, and cross-species genomic comparisons. This multi-modal framework enabled us to directly connect ancient genomic innovations with modern individual differences in language ability, identify neurobiological mechanisms supporting language evolution, and reveal how evolutionary trade-offs have shaped human cognitive variation.

Results

Dimensions of language ability

To quantify dimensions of developmental language abilities, we analyzed 17 longitudinal cognitive and language assessments administered from kindergarten through 4th grade for 350 children sampled from a community-based cohort (21), which we refer to as the “EpiSLI” cohort. This analysis revealed seven factors representing distinct aspects of language ability (Figure 2A). The first factor (F1), primarily driven by sentence repetition scores, represents “core language” ability. Sentence repetition strongly indicates overall language capacity, making F1 a key measure of general language competence (22, 23). The second factor (F2) relates to receptive vocabulary and listening comprehension, covering broad receptive language skills. The third factor (F3) specifically reflects nonverbal IQ, aligning with performance IQ at both kindergarten and 2nd grade. Factor F4 captures pre-literacy language skills, incorporating all kindergarten scores except performance IQ. Its slight correlation with F1 and F2 (r = 0.13 and 0.12), but not F3, suggests specificity to language (Figure 2B). Factor F5, which we call “talkativeness,” mainly reflects the number of clauses produced in a narrative task. Factor F6, based on a comprehension of concepts and directions assessment, indexes mastery of directive language (i.e., task-based instructions). Factor F7 spans a variety of assessments, with specific loading on vocabulary and grammar-related tasks, suggesting a broad, crystallized knowledge of language.

Most of our preliminary analyses of these factors suggested that Factors 1, 2, and 3 carried the most genetic association signal (Figure 2D, Supplementary Table 2). We also find pervasive associations between F1–F3 and measures of mental health in our sample (N = 241, Figure S1, Supplementary Table 1).

Evolutionary Stratified Polygenic Score Analysis

To investigate the genetic origins of language ability, we developed an evolutionary stratified polygenic score (ES-PGS) approach that systematically examines how genetic variants from different evolutionary periods contribute to traits. ES-PGS builds on the conceptual framework of partitioned heritability and pathway-based polygenic score methods, which have successfully partitioned genetic effects across functional genomic regions (24–27). However, evolutionary questions about when and how language-relevant functions emerged during human evolution require partitioning based on phylogenetic age rather than functional annotations. ES-PGS addresses this need by partitioning polygenic scores based on evolutionary origin, testing whether incorporating specific evolutionary periods significantly improves phenotypic predictions beyond what is explained by the rest of the genome and biologically matched random control regions (matched for chromosome, size, GC content, repeat content, distance to nearest gene, number of genes nearby, and overlap with promoter or coding regions). By leveraging individual-level data rather than summary statistics, ES-PGS enables direct association testing in deeply phenotyped cohorts, which is particularly valuable for specialized studies like ours with extensive language assessments.
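The partitioning step described above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names, the toy SNP weights, and the single annotation interval are all hypothetical, and the matched-control partition and the model-comparison test are omitted.

```python
from bisect import bisect_right


def in_intervals(pos, starts, ends):
    """Return True if pos lies in any [start, end) interval.

    Assumes the intervals are sorted by start and do not overlap.
    """
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def partitioned_pgs(dosages, snps, intervals):
    """Split one individual's polygenic score into annotation and background parts.

    dosages   : dict snp_id -> effect-allele count (0, 1, or 2)
    snps      : dict snp_id -> (chrom, pos, weight)
    intervals : dict chrom -> sorted list of (start, end) annotation regions
    """
    annotation_score, background_score = 0.0, 0.0
    for snp_id, (chrom, pos, weight) in snps.items():
        regions = intervals.get(chrom, [])
        starts = [start for start, _ in regions]
        ends = [end for _, end in regions]
        contribution = weight * dosages.get(snp_id, 0)
        if in_intervals(pos, starts, ends):
            annotation_score += contribution
        else:
            background_score += contribution
    return annotation_score, background_score


# Hypothetical toy data: three SNPs, one annotated region on chr1.
snps = {
    "snp_a": ("chr1", 150, 0.02),
    "snp_b": ("chr1", 900, -0.01),
    "snp_c": ("chr2", 50, 0.03),
}
intervals = {"chr1": [(100, 200)]}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 1}

haqer_pgs, background_pgs = partitioned_pgs(dosages, snps, intervals)
print(haqer_pgs, background_pgs)
```

In a full analysis the two scores would enter a regression together with the matched-control score, and the annotation is judged by whether it improves the fit beyond the other terms.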

Human-specific genomic regions predict individual differences in language ability

We applied ES-PGS using the cognitive performance polygenic score (CP-PGS) (28) to trace the evolutionary origins of language abilities. This polygenic score captures broad dimensions of cognitive function, enabling valid comparisons across multiple cognitive domains (including language and nonverbal IQ). We first confirmed that the CP-PGS showed the expected associations in our EpiSLI sample: the genome-wide CP-PGS was significantly associated with both core language (F1, r = 0.22, FDR adjusted p-value = 0.001) and receptive language ability (F2, r = 0.19, FDR adjusted p-value = 0.01; Figure 2D). We then partitioned CP-PGS across five evolutionary annotations spanning approximately 65 million years of primate and human evolution, ranging from ancient primate-conserved regions to sequences differentiating modern humans from Neanderthals, to systematically trace which genomic regions and evolutionary periods contributed to different aspects of human cognition (29–33).

Human Ancestor Quickly Evolved Regions (HAQERs) emerged as the most compelling finding from this analysis. Despite comprising less than 0.1% of the human genome, HAQERs showed associations with four of the seven factors (F1, F2, F4, and F6, Figure 3B, Supplementary Table 3). HAQER CP-PGS demonstrated the strongest association with core language ability (r = 0.23, ES-PGS model β = 0.18, p-value = 1.2 × 10⁻⁴, FDR adjusted p-value = 0.004, Figures 3A–C) while showing no association with nonverbal IQ (F3, ES-PGS model β = 0.06, p-value = 0.19, FDR adjusted p-value = 0.61). HAQERs are largely non-coding sequences that began rapidly evolving after the human-chimpanzee split (approximately 6 million years ago) but before human-Neanderthal divergence (approximately 600,000 years ago), acquiring novel regulatory functions in the human lineage (31, 34).

Figure 3: A Comparison of evolutionary events on core language ability in EpiSLI (N = 350). Points represent the β provided from the ES-PGS models for each evolutionary annotation, while the ranges represent the 95% confidence interval. Solid points indicate p-value < 0.05. B Comparison of 3 evolutionary (oldest = Primate UCEs, middle = HAQERs, and youngest = archaic deserts) events on the 7 factor scores in EpiSLI (N = 350). Points represent the β provided from the ES-PGS models for each evolutionary annotation, while the ranges represent the 95% confidence interval. Solid points indicate p-value < 0.05. C Scatterplot of HAQER CP-PGS with core language scores (F1) in the EpiSLI sample. D Scatterplot of background CP-PGS with core language scores (F1) in the EpiSLI sample. E Scatterplot of biologically matched control regions CP-PGS with core language scores (F1) in the EpiSLI sample (matched to HAQERs).

The predictive power of HAQERs is striking. While the background and matched CP-PGS together utilized approximately 300,000 independent SNPs and explained 3.7% of variance in core language ability, adding HAQER CP-PGS (comprising only 1,763 independent SNPs) increased explained variance to 7.7%. This indicates that an average HAQER SNP carries 188 times more predictive power for language than SNPs elsewhere in the genome, with HAQERs alone explaining slightly more variance of core language scores in the EpiSLI cohort (r² gain of 4%) than the remaining >99.9% of the human genome (r² of 3.7%).
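The per-SNP comparison can be reproduced approximately from the rounded figures quoted above. The sketch below is only a back-of-the-envelope check: it assumes the comparison is variance explained per independent SNP, and it uses the rounded inputs from the text, which give a ratio of roughly 184. The reported value of 188 presumably reflects unrounded inputs; that is an assumption, not something stated in the text.

```python
# Values as quoted in the text (rounded as reported).
r2_background = 0.037        # background + matched CP-PGS
r2_with_haqer = 0.077        # after adding the HAQER CP-PGS
n_background_snps = 300_000  # "approximately 300,000" independent SNPs
n_haqer_snps = 1_763         # independent SNPs in the HAQER CP-PGS

# Variance explained per SNP in each partition.
r2_gain = r2_with_haqer - r2_background
per_snp_haqer = r2_gain / n_haqer_snps
per_snp_background = r2_background / n_background_snps

ratio = per_snp_haqer / per_snp_background
print(f"r2 gain from HAQERs: {r2_gain:.3f}")
print(f"per-SNP ratio (HAQER / background): {ratio:.0f}")
```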

In contrast, Human Accelerated Regions (HARs), which are deeply conserved regulatory elements that acquired human-specific changes (32, 35), showed no comparable signal using ES-PGS (F1 β = −0.05, p-value = 0.27, FDR adjusted p-value = 0.7, Figure S2). This distinction suggests that human language ability emerged through novel regulatory innovations (HAQERs) rather than modifications to existing functional elements (HARs). The specific association between HAQERs and language factors but not nonverbal IQ reveals a distinct evolutionary trajectory for verbal abilities compared to general cognition. Supporting this distinction, nonverbal IQ (F3) was most strongly associated with genomic regions that underwent rapid changes across all great apes (30) (ES-PGS β = 0.13, p-value = 0.004, FDR adjusted p-value = 0.07, Figure S2).

HAQERs influence language ability across multiple cohorts and variant types

We validated HAQER effects on language across multiple independent cohorts using both common and rare genetic variants. In the SPARK autism dataset (N > 30,000) (36), HAQER CP-PGS predicted verbal language capability (“Able to talk using short phrases or sentences”, ES-PGS β = 0.05, p-value = 0.008, N = 29,266) and language disorder diagnoses in parents without autism (model improvement p-value = 7.9 × 10⁻⁵, N = 713) but not psychiatric conditions (model improvement p-value = 0.58, N = 713), confirming language-specific effects (Figure 4A–B, Supplementary Table 5). Clinical records showed HAQER CP-PGS is associated with verbal IQ (ES-PGS β = 2.08, p-value = 0.022, N = 620) but not nonverbal IQ (ES-PGS β = 0.59, p-value = 0.49, N = 620, Figure S3), further supporting specificity.

Figure 4: A Points represent the β provided from the ES-PGS models for the HAQER CP-PGS on Social Communication Questionnaire (SCQ) items in SPARK, while the ranges represent the 95% confidence interval. Solid points indicate p-value < 0.05. B Points represent the β provided from the ES-PGS models for the HAQER CP-PGS on self-reported language and psychiatric diagnosis in SPARK, while the ranges represent the 95% confidence interval. Solid points indicate p-value < 0.05. C Points represent the β provided from the regression models for the rare reversions within 10Kb of HAQERs, HARs, or RAND (random matched) sequences, while the ranges represent the 95% confidence interval. Solid points indicate p-value < 0.05. D–F Distributions of rare reversions counts from the SPARK whole genome sequencing data within 10Kb of the following regions: HAQERs (D), HARs (E), and random sequence (RAND, F).

To test HAQER effects through an orthogonal approach independent of our ES-PGS method, we analyzed rare genetic variation in SPARK whole genome sequencing data (N > 2,000). We examined rare “reversions”, variants that revert from the human-specific version to their human-chimp ancestral state, reasoning that if HAQERs evolved to support human language, reversions should impair language function, which would support the observed polygenic score associations. Individuals carrying more reversions in HAQERs showed increased likelihood of developmental language disorder (β = 0.16, p-value = 6.5 × 10⁻⁴) and delayed language developmental milestones, but showed no association with the age at which walking began or with intellectual disability (Figure 4C, Supplementary Table 6). Notably, HAQERs showed higher rates of reversions compared to HARs and matched random sequences (Figure 4D–F).
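The reversion burden described above can be illustrated with a short counting routine. This is a sketch under stated assumptions: the tuple layout, function names, and toy variants are hypothetical; the 10 kb flank follows the Figure 4 caption; and counting each carried variant once (rather than once per allele copy) is a choice made here, not taken from the text.

```python
def is_reversion(ref, alt, ancestral):
    """A variant is a reversion when its alternate allele restores the
    human-chimp ancestral base at a site where the reference is derived."""
    return alt == ancestral and ref != ancestral


def count_reversions(variants, regions, flank=10_000):
    """Count reversions carried by one individual within `flank` bp of any region.

    variants : iterable of (chrom, pos, ref, alt, ancestral, carried_copies)
    regions  : iterable of (chrom, start, end)
    """
    total = 0
    for chrom, pos, ref, alt, ancestral, copies in variants:
        if copies == 0 or not is_reversion(ref, alt, ancestral):
            continue
        near_region = any(
            chrom == r_chrom and start - flank <= pos <= end + flank
            for r_chrom, start, end in regions
        )
        if near_region:
            total += 1
    return total


# Hypothetical toy data for a single individual.
regions = [("chr1", 15_000, 15_400)]
variants = [
    ("chr1", 10_500, "A", "G", "G", 1),   # reversion inside the 10 kb flank
    ("chr1", 500_000, "C", "T", "T", 1),  # reversion, but far from any region
    ("chr1", 15_100, "G", "A", "C", 2),   # alt allele is not the ancestral base
    ("chr1", 15_200, "T", "C", "C", 0),   # reversion site, not carried
]
burden = count_reversions(variants, regions)
print(burden)
```

The per-individual burden would then serve as the predictor in a regression on each language phenotype.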

Further independent validation in the ABCD developmental cohort (37) showed the HAQER CP-PGS is associated with the Rey Auditory Verbal Learning Test performance, a measure of spoken word recall (ES-PGS β = 0.24, p-value = 0.048, N = 5,625), but not reading or vocabulary tasks from the NIH Toolbox (Supplementary Table 7). This suggests HAQERs have preferential effects on vocal communication over written language. These converging results across cohorts and variant types strongly support that genetic variation within HAQERs has a significant and specific effect on spoken language abilities in contemporary humans.

HAQERs evolved stronger binding affinity for language-relevant transcription factors

To investigate the molecular mechanisms underlying HAQERs’ association with language development, we analyzed how rare genetic variants affect transcription factor binding sites in these regions. We compared two classes of rare variants in the EpiSLI cohort: hominin-chimpanzee ancestral allele reversions versus other rare variants, using position weight matrices to quantify how these variants alter predicted transcription factor binding affinity. By comparing the effects of reversions with other rare variants, we could detect systematic evolutionary changes in hominin-specific transcription factor binding associated with language phenotypes.
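To make the position weight matrix step concrete, the sketch below scores a reference and an alternate sequence against a small motif and reports the change in the best match score. Everything here is illustrative: the motif counts, pseudocount, uniform background, and sequences are hypothetical, and the scoring rule (best log2-odds window) is one common convention, not necessarily the one used in this study.

```python
import math


def pwm_log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into a log2-odds weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def window_score(pwm, window):
    """Sum the log-odds of each base in a window the same length as the motif."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_score(pwm, seq):
    """Highest motif score over all windows of the sequence."""
    width = len(pwm)
    return max(window_score(pwm, seq[i:i + width]) for i in range(len(seq) - width + 1))


def delta_binding(pwm, ref_seq, alt_seq):
    """Change in predicted binding (alt minus ref); negative means weaker binding."""
    return best_score(pwm, alt_seq) - best_score(pwm, ref_seq)


# Hypothetical 3-bp motif that prefers "TAA".
motif_counts = [
    {"T": 18, "A": 1, "C": 1},
    {"A": 19, "G": 1},
    {"A": 17, "T": 3},
]
pwm = pwm_log_odds(motif_counts)

human_seq = "GTAAC"      # carries the preferred motif
reverted_seq = "GTCAC"   # single-base change disrupting the motif
change = delta_binding(pwm, human_seq, reverted_seq)
print(f"delta binding score: {change:.2f}")
```

A negative value for a reversion means the human-derived allele binds more strongly, which is the direction interpreted in the text as hominin-gained binding.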

In HAQERs, hominin-gained transcription factor motif binding showed significant correlation with individual core language ability (N = 350 individuals, β = 0.14, p-value = 5.6×10⁻⁴, Figure 5A), indicating that hominins evolved increased transcription factor binding in these regions and this binding enhances language performance. In contrast, regions under sequence conservation with human-specific changes (HARs, β = 0.01, p-value = 1, Figure 5B) or neutral evolution (RAND sequences, β = 0, p-value = 1, Figure 5C) showed no relationship between motif integrity and language ability, highlighting HAQERs’ unique and systematic selection for regulatory function during hominin evolution.

Figure 5: A–C Relationship between selection for transcription factor motif integrity (x-axis) and motif association with language ability (y-axis) in (A) HAQERs, (B) HARs, and (C) random genomic regions. Each point represents one transcription factor motif. Error bars indicate ±1 standard error. Purple line (or gray for non-significant fits) shows York regression fit with 95% confidence interval (shaded); regression coefficient (β), chi-squared statistic (χ²), and p-values are shown. D Detailed view of motif effects in HAQERs colored by transcription factor family. Solid points indicate motifs with p < 0.05 for both positive selection and positive language association. Colored polygons show convex hulls for each transcription factor family. E Enrichment analysis of transcription factor families for concordant positive selection and language effects, shown as log2 odds ratios. Error bars indicate 95% confidence intervals. Solid points indicate p < 0.05.

Analysis of specific transcription factor families revealed striking enrichment of Homeobox and Forkhead box transcription factors associated with both enhanced binding affinity in HAQERs and improved language performance (Figure 5D). The Homeobox family displayed the strongest enrichment among all transcription factor families (odds ratio = 11.58, p-value = 2.6 × 10⁻³⁴), followed by the Forkhead box family (which includes FOXP2, odds ratio = 7.28, p-value = 5.3 × 10⁻⁶, Figure 5E, Supplementary Table 16). These results suggest that hominin-gained binding of Homeobox and Forkhead box families within HAQERs may have played a crucial role in the evolution of human language capability.
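A family-level enrichment of this kind reduces to a 2×2 table: motifs in the family versus all others, cross-classified by whether they show concordant positive selection and language effects. The sketch below computes an odds ratio and a one-sided Fisher's exact p-value for such a table. The counts are invented for illustration, and the use of Fisher's exact test is an assumption; the text does not state which test produced the reported p-values.

```python
from math import comb, log2


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the table [[a, b], [c, d]].

    a : family motifs that are concordant      b : family motifs that are not
    c : other motifs that are concordant       d : other motifs that are not
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Hypothetical counts, not taken from the study.
a, b, c, d = 12, 8, 30, 250
family_or = odds_ratio(a, b, c, d)
family_p = fisher_one_sided(a, b, c, d)
print(f"odds ratio = {family_or:.2f}, log2 OR = {log2(family_or):.2f}, p = {family_p:.2e}")
```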

HAQERs regulate language-relevant brain circuits through human-specific chromatin accessibility

To determine which brain cell types are regulated by HAQERs, we analyzed their overlap with candidate cis-regulatory elements (cCREs) identified through single-nucleus chromatin accessibility profiling (snATAC-seq) of human and mouse brain cell types (38). We tested whether HAQERs preferentially associate with human-specific versus evolutionarily conserved chromatin accessible regions, reasoning that HAQERs should show increased overlap with human-specific regulatory elements if they provide novel functions in the human lineage.

HAQERs demonstrated significant enrichment for human-specific cCREs across brain cell types (Figure S5A), with the strongest enrichment in medium spiny neurons (MSNs, p-value = 9.8 × 10⁻⁸). MSNs comprise over 90% of neurons in the human striatum, a circuit that plays essential roles in vocal learning across species (17, 39) and was the only part of the brain robustly associated with developmental language disorders in a recent meta-analysis (40). HAQERs also showed enrichment around human-specific cCREs in FOXP2-expressing neurons (p-value = 5.3 × 10⁻⁴), providing independent evidence linking HAQERs to FOXP2 regulatory networks beyond our transcription factor binding findings (Figure 5E). In contrast, HARs showed minimal enrichment for human-specific chromatin accessibility but strong enrichment for evolutionarily conserved regions (strongest in VIP neurons, p-value = 2.7 × 10⁻¹⁰, Figure S5B). These results are consistent with HAQERs providing novel regulatory functions specific to human brain development that may support language capabilities.

Selective pressures acting on language and general cognition

Having multiple lines of evidence associating HAQERs with human language evolution, we next examined how selective pressures may have influenced language-related genetic variation over the past 20,000 years of human history using the Allen Ancient DNA Resource (AADR) (41). The AADR is the largest genotyped collection of ancient humans, providing harmonized genotypes and metadata for each sample (such as sample ages based on radiocarbon dating). We identified ancient west Eurasians, then correlated their HAQER CP-PGS and the background CP-PGS with sample age (N = 3,244 individuals passing quality control, with remains dated between 18,775 and 150 years ago). We see that the polygenic score for general cognition (background CP-PGS) has been subject to positive selection and has increased substantially over time (selection coefficient = 0.088, p-value = 2.1 × 10⁻¹², Figure 6A). Unexpectedly, we found that HAQER CP-PGS has been stable throughout human history, indicating that ancient and modern humans carry similar numbers of language-related alleles in HAQERs (selection coefficient = −0.004, p-value = 0.71, Figure 6A).
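The core of this analysis is a correlation between polygenic score and sample age. The sketch below shows only that step, on invented data. How the reported selection coefficient is defined is not given in this section, so the code stops at a Pearson correlation with time and makes no attempt to reproduce that statistic; population structure, coverage, and other covariates are also ignored.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical samples: age in years before present and a standardized PGS.
sample_age = [18_000, 12_000, 8_000, 4_000, 1_000, 150]
rising_pgs = [-1.2, -0.6, -0.1, 0.2, 0.7, 1.0]   # drifts upward toward the present
stable_pgs = [0.1, -0.2, 0.15, -0.1, 0.05, 0.0]  # no clear trend

# Negate age so that a positive correlation means "increasing toward the present".
time_axis = [-age for age in sample_age]
r_rising = pearson_r(time_axis, rising_pgs)
r_stable = pearson_r(time_axis, stable_pgs)
print(f"rising score r = {r_rising:.2f}, stable score r = {r_stable:.2f}")
```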

Figure 6: A CP ES-PGS across hominin evolution. HAQER PGS (purple) and background PGS (green) plotted against sample age for Neanderthals/Denisovans, Ancient Europeans, and Present-day Europeans. LOESS fits with 95% CIs shown. B Site frequency spectrum comparing HAQERs to matched controls and HARs. Log₂ ratio of proportion of variants across allele frequency bins; positive values indicate HAQER enrichment. Error bars: 95% bootstrap CIs. C Inbreeding coefficient (F-statistic) across sequence types. Lower F-statistics indicate excess heterozygosity. HAQERs show significantly lower F-statistics than HARs and control regions (*** indicates p-value < 0.001). D Composite phenotype analysis linking HAQER CP-PGS to birth complications and cognitive scores. Scatterplot: canonical correlation components colored by HAQER CP-PGS. Heatmap: variable loadings showing birth complications on component 1, cognitive variables on component 2. E HAQER-like and HAR-like sequence similarity scores in non-vocal learning (N = 121 species) and vocal learning (N = 49 species) mammals. Phylogenetic logistic regression statistics are shown. F–G HAQER-like sequence similarity correlates with brain mass (N = 116 species, F) and birth:adult weight ratio (N = 115 species, G) across mammals. LOESS fits with 95% CIs.

The presence of HAQERs in archaic humans provided a unique opportunity to describe genetically predicted cognitive traits across human species. To do this, we computed HAQER CP-PGS and background CP-PGS in archaic humans (N = 10) and compared them to ancient (N = 3,244) and modern humans (N = 503). Remarkably, the ten archaic human genomes (eight Neanderthals and two Denisovans) showed elevated HAQER CP-PGS (mean z-score = 0.91, median z-score = 1.23), while having reduced background CP-PGS (mean z-score = −3.02, median z-score = −3.01, Figure 6A). In contrast, a set of random matched control regions showed no differences in CP-PGS across groups (Supplementary Figure S4A–C, Supplementary Table 9). While these data should be interpreted with caution due to challenges applying polygenic scores across populations (42), the elevated HAQER CP-PGS we observed aligns with arguments that archaic humans were capable of complex language (43–45).

Evidence of balancing selection from modern genomes

The stability of HAQER CP-PGS throughout human evolutionary history led us to hypothesize that HAQERs have been maintained through balancing selection. Multiple population genetic analyses in the EpiSLI sample provided support for this hypothesis. HAQERs exhibited significantly more heterozygosity compared to both HARs (t-statistic = 110, p-value = 5.3×10⁻²⁷³) and matched control sequences (t-statistic = 147, p-value = 1.6 × 10⁻³¹⁵), suggesting that heterozygosity at HAQER loci provided a selective advantage (Figure 6C). Additionally, we observed an enrichment of intermediate frequency variants (MAF 30–50%) in HAQERs (Figure 6B). Both excess heterozygosity and enrichment of intermediate allele frequencies are characteristic signatures of balancing selection. Together with our ancient DNA results, these findings support a model where selective pressures prevented fixation of language-enhancing alleles in HAQERs, maintaining genetic variation at intermediate frequencies throughout human history.
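Excess heterozygosity is commonly summarized with an inbreeding coefficient, F = 1 − H_obs / H_exp, where H_exp = 2p(1 − p) under Hardy-Weinberg proportions; negative F indicates more heterozygotes than expected. The sketch below applies this textbook per-site estimator to toy genotypes. It is an assumption that this matches the estimator behind Figure 6C, which may instead have been computed per individual or with a different tool.

```python
def inbreeding_f(genotypes):
    """Per-site inbreeding coefficient from alternate-allele counts (0, 1, or 2).

    Returns None for monomorphic sites, where expected heterozygosity is zero.
    """
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)          # alternate-allele frequency
    h_exp = 2 * p * (1 - p)               # Hardy-Weinberg expectation
    if h_exp == 0:
        return None
    h_obs = sum(1 for g in genotypes if g == 1) / n
    return 1 - h_obs / h_exp


# Toy sites (hypothetical genotypes).
hwe_site = [0] * 25 + [1] * 50 + [2] * 25        # matches Hardy-Weinberg proportions
excess_het_site = [0] * 10 + [1] * 80 + [2] * 10  # more heterozygotes than expected
deficit_site = [0] * 45 + [1] * 10 + [2] * 45     # fewer heterozygotes than expected

for name, site in [("HWE", hwe_site), ("excess het", excess_het_site), ("het deficit", deficit_site)]:
    print(name, inbreeding_f(site))
```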

HAQERs influence prenatal brain development

The evolutionary analysis revealed a puzzling pattern: while other cognitive variants show recent positive selection, language-promoting HAQER variants have remained stable for at least the past 20,000 years of human history despite their clear cognitive benefits. This stability suggests ongoing fitness costs that counterbalance the advantages of enhanced language ability. Given HAQERs’ established role in neurodevelopment (34), we investigated whether these variants create pleiotropic effects on prenatal brain development that could explain their evolutionary constraints. First, we investigated temporal and cell-type enrichment for these regions to identify if they were likely to influence birth related neurodevelopmental traits. HAQERs showed broad enrichment for variants affecting prenatal brain gene expression when intersected with single-cell quantitative trait loci (scQTLs) from developing midbrain neurons (46). The strongest enrichment observed was at the late prenatal time point, which corresponds to when the human brain most rapidly expands (47) (Figure S5C). Critically, HAQERs showed no enrichment when we examined adult brain regulatory elements (48), confirming prenatal neurodevelopmental effects (Figure S5D). HAQERs were also significantly enriched around genomic loci associated with head circumference at birth, a proxy for brain size (49) (p-value = 4.4 × 10⁻⁴, Figure S5E).

HAQERs link language evolution to the obstetric dilemma

The evidence for HAQER effects on prenatal brain development and head circumference at birth suggests a potential mechanism for their evolutionary stability: the obstetric dilemma. Enhanced fetal brain development may create reproductive costs through the obstetric dilemma, where increased brain size complicates birth in bipedal humans with narrow pelvises (50, 51). To test whether HAQERs contribute to this trade-off, we analyzed brain imaging, cognitive, and birth outcome data in the ABCD cohort (37).

A canonical correlation analysis revealed two distinct composite phenotypic axes associated with HAQER CP-PGS, providing evidence for the obstetric dilemma that could plausibly drive the observed balancing selection (Figure 6D). The first canonical component captured variance primarily from birth complication-related variables, while the second component loaded predominantly on cognitive performance measures (including a measure of verbal language learning). Critically, both composite scores showed positive correlations with HAQER CP-PGS (r = 0.69 and r = 0.66 respectively; p-value < 2.2 × 10⁻¹⁶ for both), and the two composite scores were themselves positively correlated (r = 0.32, p-value < 2.2 × 10⁻¹⁶). This pattern indicates that genetic variants contributing to higher HAQER CP-PGS simultaneously increase both cognition (a trait under positive selection) and birth complication risk (a trait under negative selection). The consistent positive relationship between HAQER CP-PGS and both phenotypic domains provides a plausible mechanistic explanation for the balancing selection signatures specific to HAQERs. While the correlation magnitudes should be interpreted cautiously given our optimization procedure (see Materials and Methods), the qualitative finding of antagonistic pleiotropy is robust and aligns with the evolutionary hypothesis that the cognitive benefits conferred by HAQER variants are counterbalanced by obstetric costs, maintaining genetic diversity at these loci through balancing selection. This trade-off between cognitive ability and increased birth complications is a fundamental constraint that may have shaped the evolution of human language.

HAQER-like sequences show convergent evolution in vocal learning mammals

To test whether HAQER functions extend beyond humans, we analyzed homologous sequences across 170 non-primate mammalian species, including 49 vocal learners and 121 non-vocal learner species (15). Vocal learner species can acquire and modify vocalizations through experience, contrasting with species restricted to innate vocalizations. We computed genome-wide “HAQER-like” and “HAR-like” sequence similarity scores using multiple sequence alignments (52) and tested for associations with vocal learning ability while controlling for phylogenetic relatedness (53, 54).

Vocal learner species showed significantly higher HAQER-like sequence similarity compared to non-vocal learners (phylogenetic regression β = 1.41, p-value = 1 × 10⁻⁴, Figure 6E). HAQERs were also enriched around previously identified mammalian vocal learner enhancer regions (p-value = 0.028, Figure S5F) (15). While HARs showed a similar enrichment around vocal learner enhancer regions (p-value = 0.04), HAR sequence similarity was not associated with vocal learner classification (phylogenetic regression β = −0.15, p-value = 0.67, Figure 6E). Given the independent evolution of vocal learning across mammalian lineages, these results suggest HAQER-like sequences may be a fundamental genetic mechanism for complex vocal communication that has been repeatedly utilized across evolutionary history.

Remarkably, HAQER-like sequences were also associated with brain size across species (phylogenetic regression β = 0.42, p-value = 0.006, N = 116 species) and with larger relative birth weights (phylogenetic regression β = 0.44, p-value = 0.001, N = 115 species), mirroring the human obstetric dilemma pattern (Figure 6F–G). This convergent evidence across independent evolutionary lineages supports a link between the genetic architecture of vocal learning, brain development, and reproductive constraints. Together, these results suggest the trade-offs we observe in human language evolution may represent a broader biological phenomenon that supports complex vocal communication.

Discussion

Our evolutionary stratified polygenic score analysis identifies genomic regions that disproportionately contributed to human language evolution and continue to influence individual differences in language abilities observed in present-day humans. While previous research demonstrated that rare mutations in FOXP2 can cause language disorders (5, 55), common variants in FOXP2 show minimal association with typical language variation (7, 8), prompting genome-wide association studies and polygenic models of language-related traits (9, 11–14, 56). However, these models left critical questions unanswered: when did language-associated variation evolve, and how do these molecular changes influence language development? Our analysis reveals that HAQERs (31, 34), mostly non-coding regions that rapidly evolved before the human-Neanderthal split and represent less than 0.1% of the human genome, harbor variants with disproportionate effects on language. Individual SNPs in HAQERs carry 188 times more impact on language than variants elsewhere in the genome, while showing no association with nonverbal cognition. These potent regulatory effects distributed across thousands of small genomic elements support the polygenic architecture of human language while pointing to its specific evolutionary origins.

We find that HAQERs evolved across hominins to increase binding for Forkhead (including FOXP2) and Homeobox transcription factors (TFs), with motif integrity correlating with individual language scores. Supporting this regulatory mechanism, HAQERs show significant enrichment for human-specific chromatin accessible regions across brain cell types, with strong enrichment in medium spiny neurons and FOXP2-expressing neurons, while HARs primarily overlap with evolutionarily conserved regulatory elements. This suggests FOXP2 influences language primarily through its regulatory networks (57) rather than protein-coding changes, explaining how rare mutations in a single TF can produce profound effects on language development while common variants in the same TF show minimal associations (7, 8). The observed 11.58-fold and 7.28-fold enrichment of hominin-gained Homeobox and Forkhead binding sites that positively correlate with language scores in HAQERs indicates that these developmentally essential TF families (58–60) likely became central to human language evolution.

Cross-species triangulation provides independent support that HAQERs are functional regulatory elements for language-related traits. Consistent with our human evidence, vocal learner mammals show significantly higher HAQER-like sequence similarity than non-vocal learners after controlling for phylogenetic relationships, with parallel associations for brain size and birth weight. Importantly, HAQERs show enrichment around established mammalian vocal learning enhancer regions (15), consistent with previous reports of convergent evolution for complex vocal communication (15–20). This convergent evolution of HAQER-like sequences across vocal learning lineages provides strong independent support that these regulatory elements are fundamental genetic mechanisms for complex vocal communication.

Ancient DNA analysis reveals that while general cognitive variants show positive selection over 20,000 years, language-related HAQER variants have remained stable, suggesting balancing selection maintains language-related genetic variation. Analysis of birth outcomes, brain imaging, and cognitive data provided a potential mechanism for this unexpected evolutionary constraint: individuals with more language-related variants in HAQERs were more likely to have larger heads and birth complications, indicating trade-offs between language capability and reproductive costs. This pattern connects HAQERs to the obstetric dilemma, the evolutionary trade-off between narrower pelvises supporting upright walking and larger fetal brains enabling complex cognition (50, 51), potentially explaining why language-promoting variants persist at intermediate frequencies rather than reaching fixation. Further supporting this neurodevelopmental mechanism, we find that HAQERs are enriched for genetic variants associated with prenatal gene expression and head circumference at birth (46, 49). The prenatal brain regulatory activity of HAQERs aligns with established evidence that early developmental processes critically influence later language outcomes (61, 62). Intriguingly, despite substantial methodological limitations in cross-population polygenic score applications (42), available Neanderthal and Denisovan genomes suggest higher frequencies of language-promoting HAQER variants than in modern humans, though this requires cautious interpretation. This finding needs further validation, and future research should also investigate morphological and obstetric differences between archaic and modern humans that may have enabled archaic populations to carry higher polygenic scores of language-associated alleles in HAQERs. These results point to the obstetric dilemma as an ongoing evolutionary constraint that specifically limits language-related genetic variants from reaching fixation, creating a fundamentally different selection landscape for vocal communication compared to general cognitive abilities.

These findings demonstrate how ancient regulatory innovations continue to shape human language variation through evolutionary constraints that balance cognitive benefits against reproductive costs. The independent evolution of HAQER-like sequences in vocal learning mammals reveals fundamental genetic mechanisms for complex communication that have emerged repeatedly across lineages. While general cognitive variants show positive selection over time, language variants remain stable at intermediate frequencies, suggesting that individual differences in spoken language abilities reflect ongoing evolutionary trade-offs. These evolutionary constraints reveal why individual differences in language abilities persist despite the importance of communication skills. Further investigation of HAQER regulatory networks will be essential to translate these evolutionary insights into approaches for supporting those with language disorders.

Materials and methods summary

We developed an evolutionary stratified polygenic score (ES-PGS) approach to trace the origins of language-relevant genetic variation across 65 million years of primate evolutionary events. Language abilities were assessed through factor analysis of 17 longitudinal cognitive and language assessments from kindergarten through 4th grade in 350 children from a community-based cohort (EpiSLI) (21). We applied ES-PGS using cognitive performance (28) polygenic scores partitioned across five evolutionary annotations (29–33), testing whether genomic regions from specific evolutionary periods contribute disproportionately to language versus general cognitive functions.
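The partitioning step at the core of an annotation-stratified polygenic score can be illustrated with a minimal sketch. It is not the authors' ES-PGS implementation (their code is in the repository listed under Data and materials availability); it only shows how one individual's score might be split into an annotation component (for example, SNPs inside HAQERs) and a background component. The function names, the data layout, and the toy coordinates are all hypothetical, and the downstream comparison of models with and without the annotation score is not shown.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base pairs


def in_annotation(pos: int, intervals: List[Interval]) -> bool:
    """Return True if pos lies in one of the sorted, non-overlapping intervals."""
    # Rebuilding the start list on every call is slow but keeps the sketch short.
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def partitioned_pgs(
    dosages: Dict[str, float],
    weights: Dict[str, Tuple[str, int, float]],
    annotation: Dict[str, List[Interval]],
) -> Tuple[float, float]:
    """Split one individual's polygenic score into (annotation, background) parts.

    dosages:    SNP id -> effect-allele dosage (0 to 2)
    weights:    SNP id -> (chromosome, position, GWAS effect size)
    annotation: chromosome -> sorted interval list (hypothetical coordinates)
    """
    inside = outside = 0.0
    for snp, (chrom, pos, beta) in weights.items():
        contribution = beta * dosages.get(snp, 0.0)  # missing genotype counts as 0
        if in_annotation(pos, annotation.get(chrom, [])):
            inside += contribution
        else:
            outside += contribution
    return inside, outside


# Toy example with made-up SNPs and intervals (not real HAQER coordinates).
toy_weights = {
    "rs1": ("chr1", 150, 0.5),
    "rs2": ("chr1", 500, -0.25),
    "rs3": ("chr2", 75, 1.0),
}
toy_annotation = {"chr1": [(100, 200)], "chr2": [(50, 100)]}
toy_dosages = {"rs1": 2.0, "rs2": 1.0, "rs3": 0.0}
annotation_score, background_score = partitioned_pgs(
    toy_dosages, toy_weights, toy_annotation
)
```

Keeping the background score as a separate quantity is what lets a later regression ask whether the annotation score explains trait variance beyond the rest of the genome.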

Validation was performed across multiple independent cohorts using both common and rare genetic variants. In the SPARK autism dataset (N > 30,000) (36), we tested HAQER effects on verbal language ability and language disorder diagnoses. We analyzed rare “reversions” in SPARK whole genome sequencing data (N > 2,000 individuals), that is, variants reverting from human-specific to human-chimpanzee ancestral states (34), reasoning that if HAQERs evolved to enhance language, reversions should impair language function. Additional validation used unrelated individuals from the ABCD cohort (N = 5,625) for spoken word recall performance, cognitive test scores, brain size, and birth traits (37).
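The reversion definition above can be made concrete with a small sketch. It assumes each variant record already carries an inferred human-chimpanzee ancestral allele; the `Variant` type and `is_reversion` function are hypothetical names introduced here for illustration, and the study's actual filtering criteria (for example, allele-frequency thresholds) are not reproduced.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str                  # human reference allele
    alt: str                  # observed alternate allele
    ancestral: Optional[str]  # inferred human-chimpanzee ancestral allele, if known


def is_reversion(v: Variant) -> bool:
    """True when the alternate allele restores the ancestral state.

    The reference allele must differ from the ancestral allele (the site is
    human-derived) and the alternate allele must match it.
    """
    if v.ancestral is None:
        return False
    anc = v.ancestral.upper()
    return v.alt.upper() == anc and v.ref.upper() != anc
```

Under this definition a variant at a site where the reference already matches the ancestor is never counted, so only changes that undo a human-specific substitution qualify.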

To investigate molecular mechanisms supporting language evolution, we analyzed how rare genetic variants in evolutionarily significant regions affect transcription factor binding sites using position weight matrices for 633 human transcription factors from JASPAR2020 (63). We compared hominin-chimpanzee ancestral allele reversions versus other rare variants to detect systematic evolutionary changes in transcription factor binding associated with language phenotypes. Additionally, we looked for enrichment of evolutionarily significant regions across human-specific or conserved chromatin accessible regions (38), birth head circumference GWAS loci (49), neurodevelopment-related scQTL datasets (46, 48), and mammalian vocal learning genomic regions (15).
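As an illustration of how a position weight matrix can quantify a variant's effect on a binding site, the sketch below scores a reference and an alternate sequence and reports the difference. It is a simplified stand-in, not the pipeline used in this study: the count matrix is a made-up four-position motif rather than a JASPAR profile, only the forward strand is scanned, and a uniform background of 0.25 per base is assumed.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def pfm_to_pwm(
    pfm: Dict[str, List[float]],
    pseudocount: float = 0.5,
    background: float = 0.25,
) -> List[Dict[str, float]]:
    """Convert per-base count lists into log2-odds weights, one dict per position."""
    pwm = []
    for i in range(len(pfm["A"])):
        total = sum(pfm[b][i] for b in BASES) + 4 * pseudocount
        pwm.append(
            {
                b: math.log2(((pfm[b][i] + pseudocount) / total) / background)
                for b in BASES
            }
        )
    return pwm


def best_score(seq: str, pwm: List[Dict[str, float]]) -> float:
    """Highest log-odds score over all motif-length windows (forward strand only).

    seq must be at least as long as the motif and contain only A, C, G, T.
    """
    k = len(pwm)
    return max(
        sum(pwm[j][seq[i + j]] for j in range(k))
        for i in range(len(seq) - k + 1)
    )


def motif_delta(ref_seq: str, alt_seq: str, pwm: List[Dict[str, float]]) -> float:
    """Alternate minus reference best score; negative means a weaker best match."""
    return best_score(alt_seq, pwm) - best_score(ref_seq, pwm)


# Hypothetical count matrix with consensus TGTT (illustrative only).
toy_pfm = {
    "A": [1, 0, 1, 0],
    "C": [0, 0, 0, 1],
    "G": [0, 10, 0, 0],
    "T": [9, 0, 9, 9],
}
toy_pwm = pfm_to_pwm(toy_pfm)
# A single T>A change inside the TGTT match lowers the best score.
delta = motif_delta("ACTGTTGA", "ACTGATGA", toy_pwm)
```

Summing such differences over variants in a region is one simple way to summarize motif integrity per individual, although the weighting used in the actual analysis may differ.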

Ancient DNA analysis examined selective pressures using the Allen Ancient DNA Resource (41), correlating HAQER polygenic scores with sample age in 3,244 ancient west Eurasians (18,775–150 years ago). We computed polygenic scores in archaic humans (8 Neanderthals, 2 Denisovans) and compared them to ancient and modern humans. Cross-species validation analyzed HAQER-like sequence similarity across 170 non-primate mammalian species (49 vocal learning, 121 non-vocal learning) (15) using phylogenetic regression to control for evolutionary relatedness (53, 54) in analyses of vocal learning, brain size, and a birth:adult weight ratio (used as a rough proxy for the obstetric dilemma) (64).
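The idea of relating polygenic scores to sample age can be sketched with a plain Pearson correlation. This is a deliberately minimal illustration: the ages and scores below are invented, and the covariates a real ancient DNA analysis would need (for example, ancestry or data quality) are omitted. With age measured in years before present, a score that rises toward the present gives a negative correlation, while a score held stable over time gives a correlation near zero.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Invented example: sample ages in years before present and standardized scores.
toy_ages = [18000, 12000, 8000, 4000, 500]
toy_scores = [-1.0, -0.5, 0.0, 0.4, 0.9]
r_age_score = pearson_r(toy_ages, toy_scores)  # negative: score rises toward present
```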

Supplementary Material

Supplement 1

media-1.xlsx (95.9 KB, xlsx)

NIHPP2025.03.07.641231V4-supplement-1.pdf (1.5 MB, pdf)

Materials and Methods

Figs. S1 to S5

References (65–141)

Figure 1: Overview of this study and key findings

Acknowledgments

General:

We are grateful for the contributions of the EpiSLI cohort, their families, and the research team. We also appreciate the effort of participants, families, and research team members in ABCD, SPARK, SPARK Research Match, and Tempus. We appreciate obtaining access to SPARK genetic and phenotypic data on SFARI Base. We are incredibly appreciative of the open access datasets used in this study, including the researchers who put together the AADR dataset, the Zoonomia consortium datasets, and PanTHERIA. We also greatly appreciate Dr. Kristi Hendrickson and Dr. Susan Shen for valuable and encouraging discussions about this project.

Funding:

This work was funded by NIH grant DC014489. Additional funding came from the National Institutes of Health through a Predoctoral Training Grant (T32GM008629) to LGC.

Footnotes

Competing interests: There are no competing interests to declare.

Data and materials availability:

Custom code for ES-PGS and all analyses in this paper is available here, including example code for ES-PGS that can be applied to other research problems:

https://github.com/lucasgcasten/language_evolution

The EpiSLI whole genome sequencing data described here is available to qualified researchers via dbGaP (study accession = phs002255.v1.p1):

https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002255.v1.p1

SPARK genetic and phenotype data is available to qualified researchers at SFARI Base:

https://base.sfari.org/

ABCD is available to qualified researchers at:

https://nda.nih.gov/abcd/request-access

1000 Genomes Phase 3 data is available at:

https://www.internationalgenome.org/data/

Allen Ancient DNA Resource:

https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-Neanderthal

Neanderthal and Denisovan genomes:

https://www.eva.mpg.de/genetics/genome-projects/

Cross-species sequence alignment data:

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/

PanTHERIA:

https://figshare.com/articles/dataset/Full_Archive/3531875

We used publicly available tools for data processing and analysis:

bcftools:

https://samtools.github.io/bcftools/bcftools.html

PLINK:

https://www.cog-genomics.org/plink

GCTA:

https://yanglab.westlake.edu.cn/software/gcta

LDpred2:

https://privefl.github.io/bigsnpr/articles/LDpred2.html

PRSet:

https://choishingwan.github.io/PRSice/quick_start_prset/

bedtools:

https://bedtools.readthedocs.io

Biopython:

https://biopython.org/

gonomics:

https://github.com/vertgenlab/gonomics

R:

https://www.r-project.org/

References and Notes

1.Suntsova M. V., Buzdin A. A., Differences between human and chimpanzee genomes and their implications in gene expression, protein functions and biochemical properties of the two species. BMC genomics 21, 535 (2020), doi: 10.1186/s12864-020-06962-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Britten R. J., Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels. Proceedings of the National Academy of Sciences of the United States of America 99, 13633–5 (2002), doi: 10.1073/pnas.172510699. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Varki A., Altheide T. K., Comparing the human and chimpanzee genomes: searching for needles in a haystack. Genome research 15, 1746–58 (2005), doi: 10.1101/gr.3737405. [DOI] [PubMed] [Google Scholar]
4.Fisher S. E., Evolution of language: Lessons from the genome. Psychon. Bull. Rev. 24 (1), 34–40 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Lai C. S. L., Fisher S. E., Hurst J. A., Vargha-Khadem F., Monaco A. P., A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413, 519–523 (2001), doi: 10.1038/35097076. [DOI] [PubMed] [Google Scholar]
6.Fisher S. E., Scharff C., FOXP2 as a molecular window into speech and language. Trends in Genetics 25, 166–177 (2009), doi: 10.1016/j.tig.2009.03.002. [DOI] [PubMed] [Google Scholar]
7.Atkinson E. G., et al. , No evidence for recent selection at FOXP2 among diverse human populations. Cell 174 (6), 1424–1435.e15 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Mueller K. L., et al. , Common Genetic Variants in FOXP2 Are Not Associated with Individual Differences in Language Development. PLOS ONE 11, e0152576 (2016), doi: 10.1371/journal.pone.0152576. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Eising E., et al. , Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people. Proceedings of the National Academy of Sciences 119 (2022), doi: 10.1073/pnas.2202764119. [DOI] [Google Scholar]
10.Polikowsky H. G., et al. , Large-scale genome-wide analyses of stuttering. Nat. Genet. 57 (8), 1835–1847 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Verhoef E., et al. , Genome-Wide Analyses of Vocabulary Size in Infancy and Toddlerhood: Associations With Attention-Deficit/Hyperactivity Disorder, Literacy, and Cognition-Related Traits. Biological Psychiatry 95, 859–869 (2024), doi: 10.1016/j.biopsych.2023.11.025. [DOI] [PubMed] [Google Scholar]
12.Alagöz G., et al. , The shared genetic architecture and evolution of human language and musical rhythm. Nat. Hum. Behav. (2024). [Google Scholar]
13.Doust C., et al. , Discovery of 42 genome-wide significant loci associated with dyslexia. Nature genetics 54, 1621–1629 (2022), doi: 10.1038/s41588-022-01192-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Niarchou M., et al. , Genome-wide association study of musical beat synchronization demonstrates high polygenicity. Nature Human Behaviour 6, 1292–1309 (2022), doi: 10.1038/s41562-022-01359-x. [DOI] [Google Scholar]
15.Wirthlin M. E., et al. , Vocal learning-associated convergent evolution in mammalian proteins and regulatory elements. Science 383 (6690), eabn3263 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Kaplow I. M., et al. , Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science 380 (6643), eabm7993 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Jarvis E. D., Evolution of vocal learning and spoken language. Science 366 (6461), 50–54 (2019). [DOI] [PubMed] [Google Scholar]
18.Gordon R. L., et al. , Linking the genomic signatures of human beat synchronization and learned song in birds. Philos. Trans. R. Soc. Lond. B Biol. Sci. 376 (1835), 20200329 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Cahill J. A., et al. , Positive selection in noncoding genomic regions of vocal learning birds is associated with genes implicated in vocal learning and speech functions in humans. Genome Res. 31 (11), 2035–2049 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Sebastianelli M., et al. , A genomic basis of vocal rhythm in birds. Nat. Commun. 15 (1), 3095 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Tomblin J. B., The EpiSLI database: a publicly available database on speech and language. Lang. Speech Hear. Serv. Sch. 41 (1), 108–117 (2010). [DOI] [PubMed] [Google Scholar]
22.Klem M., et al. , Sentence repetition is a measure of children’s language skills rather than working memory limitations. Dev Sci 18 (1), 146–154 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Rujas I., Mariscal S., Murillo E., Lázaro M., Sentence repetition tasks to detect and prevent language difficulties: A scoping review. Children (Basel) 8 (7), 578 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Lee S. H., et al. , Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nature Genetics 44, 247–250 (2012), doi: 10.1038/ng.1108. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Finucane H. K., et al. , Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature Genetics 47, 1228–1235 (2015), doi: 10.1038/ng.3404. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Wei X., et al. , The lingering effects of Neanderthal introgression on human complex traits. Elife 12 (2023). [Google Scholar]
27.Choi S. W., et al. , PRSet: Pathway-based polygenic risk score analyses and software. PLoS genetics 19, e1010624 (2023), doi: 10.1371/journal.pgen.1010624. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Lee J. J., et al. , Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nature genetics 50, 1112–1121 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Kuderna L. F. K., et al. , Identification of constrained sequence elements across 239 primate genomes. Nature 625, 735–742 (2024), doi: 10.1038/s41586-023-06798-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Bi X., et al. , Lineage-specific accelerated sequences underlying primate evolution. Science advances 9, eadc9507 (2023), doi: 10.1126/sciadv.adc9507. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Yoo D., et al. , Complete sequencing of ape genomes. Nature 641 (8062), 401–418 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Capra J. A., Erwin G. D., McKinsey G., Rubenstein J. L. R., Pollard K. S., Many human accelerated regions are developmental enhancers. Philosophical transactions of the Royal Society of London. Series B, Biological sciences 368, 20130025 (2013), doi: 10.1098/rstb.2013.0025. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Peyrégne S., Boyle M. J., Dannemann M., Prüfer K., Detecting ancient positive selection in humans using extended lineage sorting. Genome Res. 27 (9), 1563–1572 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Mangan R. J., et al. , Adaptive sequence divergence forged new neurodevelopmental enhancers in humans. Cell 185, 4587–4603.e23 (2022), doi: 10.1016/j.cell.2022.10.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Pollard K. S., et al. , Forces shaping the fastest evolving regions in the human genome. PLoS genetics 2, e168 (2006), doi: 10.1371/journal.pgen.0020168. [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Feliciano P., et al. , SPARK: A US Cohort of 50,000 Families to Accelerate Autism Research. Neuron 97, 488–493 (2018), doi: 10.1016/j.neuron.2018.01.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Lisdahl K. M., et al. , Adolescent brain cognitive development (ABCD) study: Overview of substance use assessment methods. Developmental cognitive neuroscience 32, 80–96 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Li Y. E., et al. , A comparative atlas of single-cell chromatin accessibility in the human brain. Science 382 (6667), eadf7044 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Xiao L., et al. , Expression of FoxP2 in the basal ganglia regulates vocal motor sequences in the adult songbird. Nat. Commun. 12 (1), 2617 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Ullman M. T., et al. , The neuroanatomy of developmental language disorder: a systematic review and meta-analysis. Nature Human Behaviour 8, 962–975 (2024), doi: 10.1038/s41562-024-01843-6. [DOI] [Google Scholar]
41.Mallick S., et al. , The Allen Ancient DNA Resource (AADR) a curated compendium of ancient human genomes. Sci. Data 11 (1), 182 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Ding Y., et al. , Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 618 (7966), 774–781 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Dediu D., Levinson S. C., Neanderthal language revisited: not only us. Curr. Opin. Behav. Sci. 21, 49–55 (2018). [Google Scholar]
44.Conde-Valverde M., et al. , Neanderthals and Homo sapiens had similar auditory and speech capacities. Nature Ecology & Evolution 5, 609–615 (2021), doi: 10.1038/s41559-021-01391-6. [DOI] [PubMed] [Google Scholar]
45.Krause J., et al. , The derived FOXP2 variant of modern humans was shared with Neandertals. Curr. Biol. 17 (21), 1908–1912 (2007). [DOI] [PubMed] [Google Scholar]
46.Jerber J., et al. , Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nat. Genet. 53 (3), 304–312 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
47.Bethlehem R. A. I., et al. , Brain charts for the human lifespan. Nature 604 (7906), 525–533 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Emani P. S., et al. , Single-cell genomics and regulatory networks for 388 human brains. Science 384 (6698), eadi5199 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
49.Vogelezang S., et al. , Genetics of early-life head circumference and genetic correlations with neurological, psychiatric and cognitive outcomes. BMC Med. Genomics 15 (1), 124 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Grunstra N. D. S., et al. , There is an obstetrical dilemma: Misconceptions about the evolution of human childbirth and pelvic form. Am. J. Biol. Anthropol. 181 (4), 535–544 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Xu L., et al. , The genetic architecture of and evolutionary constraints on the human pelvic form. Science 388 (6743), eadq1521 (2025). [DOI] [PubMed] [Google Scholar]
52.Kuderna L. F. K., et al. , A global catalog of whole-genome diversity from 233 primate species. Science 380 (6648), 906–913 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Ives A. R., Garland T., Jr, Phylogenetic logistic regression for binary dependent variables. Syst. Biol. 59 (1), 9–26 (2010). [DOI] [PubMed] [Google Scholar]
54.Ho L. S. T., Ané C., A linear-time algorithm for Gaussian and non-Gaussian trait evolution models. Syst. Biol. 63 (3), 397–408 (2014). [DOI] [PubMed] [Google Scholar]
55.Vernes S. C., et al. , A Functional Genetic Link between Distinct Developmental Language Disorders. New England Journal of Medicine 359, 2337–2345 (2008), doi: 10.1056/NEJMoa0802828. [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Rajagopal V. M., et al. , Genome-wide association study of school grades identifies genetic overlap between language ability, psychopathology and creativity. Scientific Reports 13, 429 (2023), doi: 10.1038/s41598-022-26845-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
57.Hickey S. L., Berto S., Konopka G., Chromatin decondensation by FOXP2 promotes human neuron maturation and expression of neurodevelopmental disease genes. Cell Rep. 27 (6), 1699–1711.e9 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
58.Lambert S. A., et al. , The human transcription factors. Cell 172 (4), 650–665 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Hobert O., Homeobox genes and the specification of neuronal identity. Nat. Rev. Neurosci. 22 (10), 627–636 (2021). [DOI] [PubMed] [Google Scholar]
60.Co M., Anderson A. G., Konopka G., FOXP transcription factors in vertebrate brain development, function, and disorders. Wiley Interdiscip. Rev. Dev. Biol. 9 (5), e375 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
61.Mariani B., et al. , Prenatal experience with language shapes the brain. Sci. Adv. 9 (47), eadj3524 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
62.Madigan S., Wade M., Plamondon A., Browne D., Jenkins J. M., Birth weight variability and language development: Risk, resilience, and responsive parenting. J. Pediatr. Psychol. 40 (9), 869–877 (2015). [DOI] [PubMed] [Google Scholar]
63.Fornes O., et al. , JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 48 (D1), D87–D92 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
64.Jones K. E., et al. , PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology 90 (9), 2648–2648 (2009). [Google Scholar]
65.Tomblin J. B., Records N. L., Zhang X., A System for the Diagnosis of Specific Language Impairment in Kindergarten Children. Journal of Speech, Language, and Hearing Research 39 (6), 1284–1294 (1996), doi: 10.1044/jshr.3906.1284. [DOI] [Google Scholar]
66.Tomblin J. B., Nippold M. A., eds., Understanding Individual Differences in Language Development Across the School Years (Psychology Press) (2014), doi: 10.4324/9781315796987. [DOI] [Google Scholar]
67.Newcomer P. L., Hammill D. D., Test of Language Development – 2 Primary (Pro-Ed, Austin, TX) (1988). [Google Scholar]
68.Wechsler D., Wechsler preschool and primary scale of intelligence–revised (1989), PsycTESTS Dataset.
69.Culatta B., Page J. L., Ellis J., Story retelling as a communicative performance screening tool. Lang. Speech Hear. Serv. Sch. 14 (2), 66–74 (1983). [Google Scholar]
70.Woodcock R. W., Woodcock reading mastery tests-revised (1987), PsycTESTS Dataset.
71.Catts H. W., The relationship between speech-language impairments and reading disabilities. J. Speech Lang. Hear. Res. 36 (5), 948–958 (1993). [Google Scholar]
72.Achenbach T. M., Child Behavior Checklist (Springer, New York), pp. 546–552 (2011), doi: 10.1007/978-0-387-79948-3_1529. [DOI] [Google Scholar]
73.R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2020), https://www.R-project.org/. [Google Scholar]
74.Harrell F. E. Jr, with contributions from Charles Dupont and many others, Hmisc: Harrell Miscellaneous (2020), https://CRAN.R-project.org/package=Hmisc, R package version 4.4-0.
75.Mayer M., missRanger: Fast Imputation of Missing Values (2019), https://CRAN.R-project.org/package=missRanger, R package version 2.1.0.
76.Andrews S., et al. , FastQC, Babraham Institute (2012).
77.Chapman B., et al. , bcbio/bcbio-nextgen: v1.1.6 (2019), doi: 10.5281/zenodo.3564939. [DOI]
78.Li H., Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv (2013). [Google Scholar]
79.McKenna A., et al. , The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20, 1297–1303 (2010), doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
80.Garrison E., Marth G., Haplotype-based variant detection from short-read sequencing. arXiv (2012). [Google Scholar]
81.Rimmer A., et al. , Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nature Genetics 46, 912–918 (2014), doi: 10.1038/ng.3036. [DOI] [PMC free article] [PubMed] [Google Scholar]
82.The 1000 Genomes Project Consortium, A global reference for human genetic variation. Nature 526 (7571), 68–74 (2015), doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
83.McLaren W., et al. , The Ensembl Variant Effect Predictor. Genome Biology 17 (1) (2016), doi: 10.1186/s13059-016-0974-4. [DOI] [Google Scholar]
84.Karczewski K. J., et al. , The mutational constraint spectrum quantified from variation in 141,456 humans (2019), doi: 10.1101/531210. [DOI] [Google Scholar]
85.Smigielski E. M., dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Research 28 (1), 352–355 (2000), doi: 10.1093/nar/28.1.352. [DOI] [PMC free article] [PubMed] [Google Scholar]
86.Pedersen B. S., Layer R. M., Quinlan A. R., Vcfanno: fast, flexible annotation of genetic variants. Genome Biology 17 (1) (2016), doi: 10.1186/s13059-016-0973-5. [DOI] [Google Scholar]
87.Choi S. W., Mak T. S.-H., O’Reilly P. F., Tutorial: a guide to performing polygenic risk score analyses. Nature protocols 15, 2759–2772 (2020), doi: 10.1038/s41596-020-0353-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
88.Privé F., Albiñana C., Arbel J., Pasaniuc B., Vilhjálmsson B. J., Inferring disease architecture and predictive ability with LDpred2-auto. The American Journal of Human Genetics 110, 2042–2055 (2023), doi: 10.1016/j.ajhg.2023.10.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
89.Demontis D., et al. , Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nature genetics 51 (1), 63–75 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
90.Hatoum A. S., et al. , Multivariate genome-wide association meta-analysis of over 1 million subjects identifies loci underlying multiple substance use disorders. Nat. Ment. Health 1 (3), 210–223 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
91.Walters R. K., et al. , Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. Nature Neuroscience 21, 1656–1669 (2018), doi: 10.1038/s41593-018-0275-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
92.Wightman D. P., et al. , A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer’s disease. Nature Genetics 53, 1276–1282 (2021), doi: 10.1038/s41588-021-00921-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
93.Grove J., et al. , Identification of common genetic risk variants for autism spectrum disorder. Nature genetics 51 (3), 431–444 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
94.Watson H. J., et al. , Genome-wide association study identifies eight risk loci and implicates metabo-psychiatric origins for anorexia nervosa. Nature genetics 51, 1207–1214 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
95.Mullins N., et al. , Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nature Genetics 53, 817–829 (2021), doi: 10.1038/s41588-021-00857-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
96.Johnson E. C., et al. , A large-scale genome-wide association study meta-analysis of cannabis use disorder. Lancet Psychiatry 7 (12), 1032–1045 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
97.Howard D. M., et al. , Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nature neuroscience 22, 343–352 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
98.International League Against Epilepsy Consortium on Complex Epilepsies, GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture. Nat. Genet. 55 (9), 1471–1482 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
99.Watanabe K., et al. , Genome-wide meta-analysis of insomnia prioritizes genes associated with metabolic and psychiatric pathways. Nature Genetics 54, 1125–1132 (2022), doi: 10.1038/s41588-022-01124-w. [DOI] [PubMed] [Google Scholar]
100.Huang Q. Q., et al. , Examining the role of common variants in rare neurodevelopmental conditions. Nature 636 (8042), 404–411 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
101.Nievergelt C. M., et al. , International meta-analysis of PTSD genome-wide association studies identifies sex- and ancestry-specific genetic risk loci. Nature Communications 10, 4558 (2019), doi: 10.1038/s41467-019-12576-w. [DOI] [Google Scholar]
102.Trubetskoy V., et al. , Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022), doi: 10.1038/s41586-022-04434-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
103.Yu D., et al. , Interrogating the genetic determinants of Tourette’s syndrome and other tic disorders through genome-wide association studies. Am. J. Psychiatry 176 (3), 217–227 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
104.Rietveld C. A., et al. , Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proceedings of the National Academy of Sciences 111 (38), 13790–13794 (2014), doi: 10.1073/pnas.1404623111. [DOI] [Google Scholar]
105.Okbay A., et al. , Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics 54, 437–449 (2022), doi: 10.1038/s41588-022-01016-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
106.Hatoum A. S., et al. , Genome-wide Association Study Shows That Executive Functioning Is Influenced by GABAergic Processes and Is a Neurocognitive Genetic Correlate of Psychiatric Disorders. Biological psychiatry 93, 59–70 (2023), doi: 10.1016/j.biopsych.2022.06.034. [DOI] [PMC free article] [PubMed] [Google Scholar]
107.de la Fuente J., Davies G., Grotzinger A. D., Tucker-Drob E. M., Deary I. J., A general dimension of genetic sharing across diverse cognitive traits inferred from molecular data. Nature human behaviour 5, 49–58 (2021), doi: 10.1038/s41562-020-00936-2. [DOI] [Google Scholar]
108.Ip H. F., et al. , Genetic association study of childhood aggression across raters, instruments, and age. Transl. Psychiatry 11 (1), 413 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
109.Tielbeek J. J., et al. , Uncovering the genetic architecture of broad antisocial behavior through a genome-wide association study meta-analysis. Molecular psychiatry 27, 4453–4463 (2022), doi: 10.1038/s41380-022-01793-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
110.Warrier V., et al. , Genome-wide analyses of self-reported empathy: correlations with autism, schizophrenia, and anorexia nervosa. Translational psychiatry 8, 35 (2018), doi: 10.1038/s41398-017-0082-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
111.Gupta P., et al. , A genome-wide investigation into the underlying genetic architecture of personality traits and overlap with psychopathology. Nature Human Behaviour (2024), doi: 10.1038/s41562-024-01951-3. [DOI] [Google Scholar]
112.Hill W. D., et al. , Genome-wide analysis identifies molecular systems and 149 genetic loci associated with income. Nature communications 10, 5741 (2019), doi: 10.1038/s41467-019-13585-5. [DOI] [Google Scholar]
113.Neale B. M., UK Biobank GWAS Round 2 results (2018), http://www.nealelab.is/uk-biobank/.
114.Grasby K. L., et al. , The genetic architecture of the human cerebral cortex. Science (New York, N.Y.) 367 (2020), doi: 10.1126/science.aay6690. [DOI] [Google Scholar]
115.Tissink E., et al. , Genome-wide association study of cerebellar volume provides insights into heritable mechanisms underlying brain development and mental health. Communications Biology 5, 710 (2022), doi: 10.1038/s42003-022-03672-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
116.Tissink E., et al. , The Genetic Architectures of Functional and Structural Connectivity Properties within Cerebral Resting-State Networks. eNeuro 10 (2023), doi: 10.1523/ENEURO.0242-22.2023. [DOI] [Google Scholar]
117.Yengo L., et al. , A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022), doi: 10.1038/s41586-022-05275-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
118.Dalvie S., et al. , Genomic influences on self-reported childhood maltreatment. Transl. Psychiatry 10 (1), 38 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
119.Gisladottir R. S., et al. , Sequence variants affecting voice pitch in humans. Science advances 9, eabq2969 (2023), doi: 10.1126/sciadv.abq2969. [DOI] [PMC free article] [PubMed] [Google Scholar]
120.Quinlan A. R., Hall I. M., BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics (Oxford, England) 26, 841–2 (2010), doi: 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
121.Davis E. S., et al. , matchRanges: generating null hypothesis genomic ranges via covariate-matched sampling. Bioinformatics 39 (5) (2023). [Google Scholar]
122.Kuhn R. M., Haussler D., Kent W. J., The UCSC genome browser and associated tools. Briefings in bioinformatics 14, 144–61 (2013), doi: 10.1093/bib/bbs038. [DOI] [PMC free article] [PubMed] [Google Scholar]
123.Thomas T. R., et al. , Polygenic Scores Clarify the Relationship Between Mental Health and Gender Diversity. Biological Psychiatry Global Open Science 4, 100291 (2024), doi: 10.1016/j.bpsgos.2024.100291. [DOI] [PMC free article] [PubMed] [Google Scholar]
124.Casten L. G., et al. , Lingo: an automated, web-based deep phenotyping platform for language ability. medRxiv : the preprint server for health sciences (2024), doi: 10.1101/2024.03.29.24305034. [DOI] [Google Scholar]
125.Prüfer K., et al. , The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505 (7481), 43–49 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
126.Mafessoni F., et al. , A high-coverage Neandertal genome from Chagyrskaya Cave. Proc. Natl. Acad. Sci. U. S. A. 117 (26), 15132–15136 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
127.Prüfer K., et al. , A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 358 (6363), 655–658 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
128.Meyer M., et al. , A high-coverage genome sequence from an archaic Denisovan individual. Science 338 (6104), 222–226 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
129.Friedman J., Hastie T., Tibshirani R., Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 (1), 1–22 (2010). [PMC free article] [PubMed] [Google Scholar]
130.York D., Evensen N. M., Martınez M. L., De Basabe Delgado J., Unified equations for the slope, intercept, and standard errors of the best straight line. Am. J. Phys. 72 (3), 367–375 (2004). [Google Scholar]
131.Paysan-Lafosse T., et al. , InterPro in 2022. Nucleic Acids Res. 51 (D1), D418–D427 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
132.Szklarczyk D., et al. , The STRING database in 2025: protein networks with directionality of regulation. Nucleic Acids Res. 53 (D1), D730–D737 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
133.Akbari A., et al. , Pervasive findings of directional selection realize the promise of ancient DNA to elucidate human adaptation. bioRxiv (2024). [Google Scholar]
134.Purcell S., et al. , PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics 81 (3), 559–575 (2007), doi: 10.1086/519795, https://doi.org/10.1086\%2F519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
135.Yang J., Lee S. H., Goddard M. E., Visscher P. M., GCTA: A Tool for Genome-wide Complex Trait Analysis. The American Journal of Human Genetics 88, 76–82 (2011), doi: 10.1016/j.ajhg.2010.11.011, https://doi.org/10.1016/j.ajhg.2010.11.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
136.Perdry H., Dandine-Roulland C., gaston: Genetic Data Handling (QC, GRM, LD, PCA) Linear Mixed Models (2023), https://CRAN.R-project.org/package=gaston, r package version 1.6.
137.Karcher N. R., Barch D. M., The ABCD study: understanding the development of risk for mental and physical health outcomes. Neuropsychopharmacology 46 (1), 131–142 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
138.Risso D., Ngai J., Speed T. P., Dudoit S., Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32 (9), 896–902 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
139.Tenenhaus A., et al. , Variable selection for generalized canonical correlation analysis. Biostatistics 15 (3), 569–583 (2014). [DOI] [PubMed] [Google Scholar]
140.Au E. H., et al. , Gonomics: uniting high performance and readability for genomics with Go. Bioinformatics 39 (8) (2023). [Google Scholar]
141.Cock P. J. A., et al. , Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25 (11), 1422–1423 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplement 1

media-1.xlsx (95.9 KB, xlsx)

NIHPP2025.03.07.641231V4-supplement-1.pdf (1.5 MB, pdf)

Data Availability Statement

Custom code for ES-PGS and all analyses in this paper is available here, including example code for ES-PGS that can be applied to other research problems:

https://github.com/lucasgcasten/language_evolution

The EpiSLI whole-genome sequencing data described here is available to qualified researchers via dbGaP (study accession = phs002255.v1.p1):

https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002255.v1.p1

SPARK genetic and phenotype data is available to qualified researchers at SFARI Base:

https://base.sfari.org/

ABCD data is available to qualified researchers at:

https://nda.nih.gov/abcd/request-access

1000 Genomes Phase 3 data is available at:

https://www.internationalgenome.org/data/

Allen Ancient DNA Resource:

https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-Neanderthal

Neanderthal and Denisovan genomes:

https://www.eva.mpg.de/genetics/genome-projects/

Cross-species sequence alignment data:

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/

PanTHERIA:

https://figshare.com/articles/dataset/Full_Archive/3531875

We used publicly available tools for data processing and analysis:

bcftools:

https://samtools.github.io/bcftools/bcftools.html

PLINK:

https://www.cog-genomics.org/plink

GCTA:

https://yanglab.westlake.edu.cn/software/gcta

LDpred2:

https://privefl.github.io/bigsnpr/articles/LDpred2.html

PRSet:

https://choishingwan.github.io/PRSice/quick_start_prset/

bedtools:

https://bedtools.readthedocs.io

Biopython:

https://biopython.org/

gonomics:

https://github.com/vertgenlab/gonomics

R:

https://www.r-project.org/

[R1] 1.Suntsova M. V., Buzdin A. A., Differences between human and chimpanzee genomes and their implications in gene expression, protein functions and biochemical properties of the two species. BMC genomics 21, 535 (2020), doi: 10.1186/s12864-020-06962-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Britten R. J., Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels. Proceedings of the National Academy of Sciences of the United States of America 99, 13633–5 (2002), doi: 10.1073/pnas.172510699. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] 3.Varki A., Altheide T. K., Comparing the human and chimpanzee genomes: searching for needles in a haystack. Genome research 15, 1746–58 (2005), doi: 10.1101/gr.3737405. [DOI] [PubMed] [Google Scholar]

[R4] 4.Fisher S. E., Evolution of language: Lessons from the genome. Psychon. Bull. Rev. 24 (1), 34–40 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.Lai C. S. L., Fisher S. E., Hurst J. A., Vargha-Khadem F., Monaco A. P., A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413, 519–523 (2001), doi: 10.1038/35097076. [DOI] [PubMed] [Google Scholar]

[R6] 6.Fisher S. E., Scharff C., FOXP2 as a molecular window into speech and language. Trends in Genetics 25, 166–177 (2009), doi: 10.1016/j.tig.2009.03.002. [DOI] [PubMed] [Google Scholar]

[R7] 7.Atkinson E. G., et al. , No evidence for recent selection at FOXP2 among diverse human populations. Cell 174 (6), 1424–1435.e15 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] 8.Mueller K. L., et al. , Common Genetic Variants in FOXP2 Are Not Associated with Individual Differences in Language Development. PLOS ONE 11, e0152576 (2016), doi: 10.1371/journal.pone.0152576. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Eising E., et al. , Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people. Proceedings of the National Academy of Sciences 119 (2022), doi: 10.1073/pnas.2202764119. [DOI] [Google Scholar]

[R10] 10.Polikowsky H. G., et al. , Large-scale genome-wide analyses of stuttering. Nat. Genet. 57 (8), 1835–1847 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Verhoef E., et al. , Genome-Wide Analyses of Vocabulary Size in Infancy and Toddlerhood: Associations With Attention-Deficit/Hyperactivity Disorder, Literacy, and Cognition-Related Traits. Biological Psychiatry 95, 859–869 (2024), doi: 10.1016/j.biopsych.2023.11.025. [DOI] [PubMed] [Google Scholar]

[R12] 12.Alagöz G., et al. , The shared genetic architecture and evolution of human language and musical rhythm. Nat. Hum. Behav. (2024). [Google Scholar]

[R13] 13.Doust C., et al. , Discovery of 42 genome-wide significant loci associated with dyslexia. Nature genetics 54, 1621–1629 (2022), doi: 10.1038/s41588-022-01192-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Niarchou M., et al. , Genome-wide association study of musical beat synchronization demonstrates high polygenicity. Nature Human Behaviour 6, 1292–1309 (2022), doi: 10.1038/s41562-022-01359-x. [DOI] [Google Scholar]

[R15] 15.Wirthlin M. E., et al. , Vocal learning-associated convergent evolution in mammalian proteins and regulatory elements. Science 383 (6690), eabn3263 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] 16.Kaplow I. M., et al. , Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science 380 (6643), eabm7993 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] 17.Jarvis E. D., Evolution of vocal learning and spoken language. Science 366 (6461), 50–54 (2019). [DOI] [PubMed] [Google Scholar]

[R18] 18.Gordon R. L., et al. , Linking the genomic signatures of human beat synchronization and learned song in birds. Philos. Trans. R. Soc. Lond. B Biol. Sci. 376 (1835), 20200329 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] 19.Cahill J. A., et al. , Positive selection in noncoding genomic regions of vocal learning birds is associated with genes implicated in vocal learning and speech functions in humans. Genome Res. 31 (11), 2035–2049 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] 20.Sebastianelli M., et al. , A genomic basis of vocal rhythm in birds. Nat. Commun. 15 (1), 3095 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] 21.Tomblin J. B., The EpiSLI database: a publicly available database on speech and language. Lang. Speech Hear. Serv. Sch. 41 (1), 108–117 (2010). [DOI] [PubMed] [Google Scholar]

[R22] 22.Klem M., et al. , Sentence repetition is a measure of children’s language skills rather than working memory limitations. Dev Sci 18 (1), 146–154 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] 23.Rujas I., Mariscal S., Murillo E., Lázaro M., Sentence repetition tasks to detect and prevent language difficulties: A scoping review. Children (Basel) 8 (7), 578 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] 24.Lee S. H., et al. , Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nature Genetics 44, 247–250 (2012), doi: 10.1038/ng.1108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] 25.Finucane H. K., et al. , Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature Genetics 47, 1228–1235 (2015), doi: 10.1038/ng.3404. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] 26.Wei X., et al. , The lingering effects of Neanderthal introgression on human complex traits. Elife 12 (2023). [Google Scholar]

[R27] 27.Choi S. W., et al. , PRSet: Pathway-based polygenic risk score analyses and software. PLoS genetics 19, e1010624 (2023), doi: 10.1371/journal.pgen.1010624. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] 28.Lee J. J., et al. , Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nature genetics 50, 1112–1121 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] 29.Kuderna L. F. K., et al. , Identification of constrained sequence elements across 239 primate genomes. Nature 625, 735–742 (2024), doi: 10.1038/s41586-023-06798-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] 30.Bi X., et al. , Lineage-specific accelerated sequences underlying primate evolution. Science advances 9, eadc9507 (2023), doi: 10.1126/sciadv.adc9507. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] 31.Yoo D., et al. , Complete sequencing of ape genomes. Nature 641 (8062), 401–418 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] 32.Capra J. A., Erwin G. D., McKinsey G., Rubenstein J. L. R., Pollard K. S., Many human accelerated regions are developmental enhancers. Philosophical transactions of the Royal Society of London. Series B, Biological sciences 368, 20130025 (2013), doi: 10.1098/rstb.2013.0025. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] 33.Peyrégne S., Boyle M. J., Dannemann M., Prüfer K., Detecting ancient positive selection in humans using extended lineage sorting. Genome Res. 27 (9), 1563–1572 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] 34.Mangan R. J., et al. , Adaptive sequence divergence forged new neurodevelopmental enhancers in humans. Cell 185, 4587–4603.e23 (2022), doi: 10.1016/j.cell.2022.10.016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R35] 35.Pollard K. S., et al. , Forces shaping the fastest evolving regions in the human genome. PLoS genetics 2, e168 (2006), doi: 10.1371/journal.pgen.0020168. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R36] 36.Feliciano P., et al. , SPARK: A US Cohort of 50,000 Families to Accelerate Autism Research. Neuron 97, 488–493 (2018), doi: 10.1016/j.neuron.2018.01.015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] 37.Lisdahl K. M., et al. , Adolescent brain cognitive development (ABCD) study: Overview of substance use assessment methods. Developmental cognitive neuroscience 32, 80–96 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] 38.Li Y. E., et al. , A comparative atlas of single-cell chromatin accessibility in the human brain. Science 382 (6667), eadf7044 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R39] 39.Xiao L., et al. , Expression of FoxP2 in the basal ganglia regulates vocal motor sequences in the adult songbird. Nat. Commun. 12 (1), 2617 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R40] 40.Ullman M. T., et al. , The neuroanatomy of developmental language disorder: a systematic review and meta-analysis. Nature Human Behaviour 8, 962–975 (2024), doi: 10.1038/s41562-024-01843-6. [DOI] [Google Scholar]

[R41] 41.Mallick S., et al. , The Allen Ancient DNA Resource (AADR) a curated compendium of ancient human genomes. Sci. Data 11 (1), 182 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] 42.Ding Y., et al. , Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 618 (7966), 774–781 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R43] 43.Dediu D., Levinson S. C., Neanderthal language revisited: not only us. Curr. Opin. Behav. Sci. 21, 49–55 (2018). [Google Scholar]

[R44] 44.Conde-Valverde M., et al. , Neanderthals and Homo sapiens had similar auditory and speech capacities. Nature ecology evolution 5, 609–615 (2021), doi: 10.1038/s41559-021-01391-6. [DOI] [PubMed] [Google Scholar]

[R45] 45.Krause J., et al. , The derived FOXP2 variant of modern humans was shared with Neandertals. Curr. Biol. 17 (21), 1908–1912 (2007). [DOI] [PubMed] [Google Scholar]

[R46] 46.Jerber J., et al. , Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nat. Genet. 53 (3), 304–312 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R47] 47.Bethlehem R. A. I., et al. , Brain charts for the human lifespan. Nature 604 (7906), 525–533 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R48] 48.Emani P. S., et al. , Single-cell genomics and regulatory networks for 388 human brains. Science 384 (6698), eadi5199 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R49] 49.Vogelezang S., et al. , Genetics of early-life head circumference and genetic correlations with neurological, psychiatric and cognitive outcomes. BMC Med. Genomics 15 (1), 124 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R50] 50.Grunstra N. D. S., et al. , There is an obstetrical dilemma: Misconceptions about the evolution of human childbirth and pelvic form. Am. J. Biol. Anthropol. 181 (4), 535–544 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R51] 51.Xu L., et al. , The genetic architecture of and evolutionary constraints on the human pelvic form. Science 388 (6743), eadq1521 (2025). [DOI] [PubMed] [Google Scholar]

[R52] 52.Kuderna L. F. K., et al. , A global catalog of whole-genome diversity from 233 primate species. Science 380 (6648), 906–913 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R53] 53.Ives A. R., Garland T., Jr, Phylogenetic logistic regression for binary dependent variables. Syst. Biol. 59 (1), 9–26 (2010). [DOI] [PubMed] [Google Scholar]

[R54] 54.Ho L. s. T., Ané C., A linear-time algorithm for Gaussian and non-Gaussian trait evolution models. Syst. Biol. 63 (3), 397–408 (2014). [DOI] [PubMed] [Google Scholar]

[R55] 55.Vernes S. C., et al. , A Functional Genetic Link between Distinct Developmental Language Disorders. New England Journal of Medicine 359, 2337–2345 (2008), doi: 10.1056/NEJMoa0802828. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R56] 56.Rajagopal V. M., et al. , Genome-wide association study of school grades identifies genetic overlap between language ability, psychopathology and creativity. Scientific Reports 13, 429 (2023), doi: 10.1038/s41598-022-26845-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R57] 57.Hickey S. L., Berto S., Konopka G., Chromatin decondensation by FOXP2 promotes human neuron maturation and expression of neurodevelopmental disease genes. Cell Rep. 27 (6), 1699–1711.e9 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R58] 58.Lambert S. A., et al. , The human transcription factors. Cell 172 (4), 650–665 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R59] 59.Hobert O., Homeobox genes and the specification of neuronal identity. Nat. Rev. Neurosci. 22 (10), 627–636 (2021). [DOI] [PubMed] [Google Scholar]

[R60] 60.Co M., Anderson A. G., Konopka G., FOXP transcription factors in vertebrate brain development, function, and disorders. Wiley Interdiscip. Rev. Dev. Biol. 9 (5), e375 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R61] 61.Mariani B., et al. , Prenatal experience with language shapes the brain. Sci. Adv. 9 (47), eadj3524 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R62] 62.Madigan S., Wade M., Plamondon A., Browne D., Jenkins J. M., Birth weight variability and language development: Risk, resilience, and responsive parenting. J. Pediatr. Psychol. 40 (9), 869–877 (2015). [DOI] [PubMed] [Google Scholar]

[R63] 63.Fornes O., et al. , JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 48 (D1), D87–D92 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R64] 64.Jones K. E., et al. , PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology 90 (9), 2648–2648 (2009). [Google Scholar]

[R65] 65.Tomblin J. B., Records N. L., Zhang X., A System for the Diagnosis of Specific Language Impairment in Kindergarten Children. Journal of Speech, Language, and Hearing Research 39 (6), 1284–1294 (1996), doi: 10.1044/jshr.3906.1284, https://doi.org/10.1044\%2Fjshr.3906.1284. [DOI] [Google Scholar]

[R66] 66.Tomblin J. B., Nippold M. A., eds., Understanding Individual Differences in Language Development Across the School Years (Psychology Press; ) (2014), doi: 10.4324/9781315796987, https://doi.org/10.4324\%2F9781315796987. [DOI] [Google Scholar]

[R67] 67.Newcomer P. L., Hammill D. D., Test of Language Development – 2 Primary (Pro-Ed; Austin, TX: ) (1988). [Google Scholar]

[R68] 68.Wechsler D., Wechsler preschool and primary scale of intelligence–revised (1989), title of the publication associated with this dataset: PsycTESTS Dataset.

[R69] 69.Culatta B., Page J. L., Ellis J., Story retelling as a communicative performance screening tool. Lang. Speech Hear. Serv. Sch. 14 (2), 66–74 (1983). [Google Scholar]

[R70] 70.Woodcock R. W., Woodcock reading mastery tests-revised (1987), title of the publication associated with this dataset: PsycTESTS Dataset.

[R71] 71.Catts H. W., The relationship between speech-language impairments and reading disabilities. J. Speech Lang. Hear. Res. 36 (5), 948–958 (1993). [Google Scholar]

[R72] 72.Achenbach T. M., Child Behavior Checklist (Springer; New York: ), pp. 546–552 (2011), doi: 10.1007/978-0-387-79948-3_1529. [DOI] [Google Scholar]

[R73] 73.R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria: (2020), https://www.R-project.org/. [Google Scholar]

[R74] 74.Harrell F. E. Jr, with contributions from Charles Dupont, many others., Hmisc: Harrell Miscellaneous (2020), https://CRAN.R-project.org/package=Hmisc, r package version 4.4–0.

[R75] 75.Mayer M., missRanger: Fast Imputation of Missing Values (2019), https://CRAN.R-project.org/package=missRanger, r package version 2.1.0.

[R76] 76.Andrews S., et al. , FastQC, Babraham Institute (2012).

[R77] 77.Chapman B., et al. , bcbio/bcbio-nextgen: v1.1.6 (2019), doi: 10.5281/zenodo.3564939, https://doi.org/10.5281/zenodo.3564939. [DOI]

[R78] 78.Li H., Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv (2013). [Google Scholar]

[R79] 79.McKenna A., et al. , The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20, 1297–1303 (2010), doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R80] 80.Garrison E., Marth G., Haplotype-based variant detection from short-read sequencing. arXiv (2012). [Google Scholar]

[R81] 81.Rimmer A., et al. , Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nature Genetics 46, 912–918 (2014), doi: 10.1038/ng.3036. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R82] 82.A global reference for human genetic variation. Nature 526 (7571), 68–74 (2015), doi: 10.1038/nature15393, https://doi.org/10.1038\%2Fnature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R83] 83.McLaren W., et al. , The Ensembl Variant Effect Predictor. Genome Biology 17 (1) (2016), doi: 10.1186/s13059-016-0974-4, https://doi.org/10.1186\%2Fs13059-016-0974-4. [DOI] [Google Scholar]

[R84] 84.Karczewski K. J., et al. , The mutational constraint spectrum quantified from variation in 141,456 humans (2019), doi: 10.1101/531210, https://doi.org/10.1101\%2F531210. [DOI] [Google Scholar]

[R85] 85.Smigielski E. M., dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Research 28 (1), 352–355 (2000), doi: 10.1093/nar/28.1.352, https://doi.org/10.1093\%2Fnar\%2F28.1.352. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R86] 86.Pedersen B. S., Layer R. M., Quinlan A. R., Vcfanno: fast, flexible annotation of genetic variants. Genome Biology 17 (1) (2016), doi: 10.1186/s13059-016-0973-5, https://doi.org/10.1186\%2Fs13059-016-0973-5. [DOI] [Google Scholar]

[R87] 87.Choi S. W., Mak T. S.-H., O’Reilly P. F., Tutorial: a guide to performing polygenic risk score analyses. Nature protocols 15, 2759–2772 (2020), doi: 10.1038/s41596-020-0353-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R88] 88.Privé F., Albiñana C., Arbel J., Pasaniuc B., Vilhjálmsson B. J., Inferring disease architecture and predictive ability with LDpred2-auto. The American Journal of Human Genetics 110, 2042–2055 (2023), doi: 10.1016/j.ajhg.2023.10.010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R89] 89.Demontis D., et al. , Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nature genetics 51 (1), 63–75 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R90] 90.Hatoum A. S., et al. , Multivariate genome-wide association meta-analysis of over 1 million subjects identifies loci underlying multiple substance use disorders. Nat. Ment. Health 1 (3), 210–223 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R91] 91.Walters R. K., et al. , Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. Nature Neuroscience 21, 1656–1669 (2018), doi: 10.1038/s41593-018-0275-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R92] 92.Wightman D. P., et al. , A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer’s disease. Nature Genetics 53, 1276–1282 (2021), doi: 10.1038/s41588-021-00921-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R93] 93.Grove J., et al. , Identification of common genetic risk variants for autism spectrum disorder. Nature genetics 51 (3), 431–444 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R94] 94.Watson H. J., et al. , Genome-wide association study identifies eight risk loci and implicates metabo-psychiatric origins for anorexia nervosa. Nature genetics 51, 1207–1214 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R95] 95.Mullins N., et al. , Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nature Genetics 53, 817–829 (2021), doi: 10.1038/s41588-021-00857-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R96] 96.Johnson E. C., et al. , A large-scale genome-wide association study meta-analysis of cannabis use disorder. Lancet Psychiatry 7 (12), 1032–1045 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R97] 97.Howard D. M., et al. , Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nature neuroscience 22, 343–352 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R98] 98.International League Against Epilepsy Consortium on Complex Epilepsies, GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture. Nat. Genet. 55 (9), 1471–1482 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R99] 99.Watanabe K., et al. , Genome-wide meta-analysis of insomnia prioritizes genes associated with metabolic and psychiatric pathways. Nature Genetics 54, 1125–1132 (2022), doi: 10.1038/s41588-022-01124-w. [DOI] [PubMed] [Google Scholar]

[R100] 100.Huang Q. Q., et al. , Examining the role of common variants in rare neurodevelopmental conditions. Nature 636 (8042), 404–411 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R101] 101.Nievergelt C. M., et al. , International meta-analysis of PTSD genome-wide association studies identifies sex- and ancestry-specific genetic risk loci. Nature Communications 10, 4558 (2019), doi: 10.1038/s41467-019-12576-w. [DOI] [Google Scholar]

[R102] 102.Trubetskoy V., et al. , Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022), doi: 10.1038/s41586-022-04434-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R103] 103.Yu D., et al. , Interrogating the genetic determinants of Tourette’s syndrome and other tic disorders through genome-wide association studies. Am. J. Psychiatry 176 (3), 217–227 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R104] 104.Rietveld C. A., et al. , Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proceedings of the National Academy of Sciences 111 (38), 13790–13794 (2014), doi: 10.1073/pnas.1404623111, https://doi.org/10.1073\%2Fpnas.1404623111. [DOI] [Google Scholar]

[R105] 105.Okbay A., et al. , Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics 54, 437–449 (2022), doi: 10.1038/s41588-022-01016-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R106] 106.Hatoum A. S., et al. , Genome-wide Association Study Shows That Executive Functioning Is Influenced by GABAergic Processes and Is a Neurocognitive Genetic Correlate of Psychiatric Disorders. Biological psychiatry 93, 59–70 (2023), doi: 10.1016/j.biopsych.2022.06.034. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R107] 107.de la Fuente J., Davies G., Grotzinger A. D., Tucker-Drob E. M., Deary I. J., A general dimension of genetic sharing across diverse cognitive traits inferred from molecular data. Nature human behaviour 5, 49–58 (2021), doi: 10.1038/s41562-020-00936-2. [DOI] [Google Scholar]

[R108] 108. Ip H. F., et al., Genetic association study of childhood aggression across raters, instruments, and age. Transl. Psychiatry 11 (1), 413 (2021).

[R109] 109. Tielbeek J. J., et al., Uncovering the genetic architecture of broad antisocial behavior through a genome-wide association study meta-analysis. Molecular Psychiatry 27, 4453–4463 (2022), doi: 10.1038/s41380-022-01793-3.

[R110] 110. Warrier V., et al., Genome-wide analyses of self-reported empathy: correlations with autism, schizophrenia, and anorexia nervosa. Translational Psychiatry 8, 35 (2018), doi: 10.1038/s41398-017-0082-6.

[R111] 111. Gupta P., et al., A genome-wide investigation into the underlying genetic architecture of personality traits and overlap with psychopathology. Nature Human Behaviour (2024), doi: 10.1038/s41562-024-01951-3.

[R112] 112. Hill W. D., et al., Genome-wide analysis identifies molecular systems and 149 genetic loci associated with income. Nature Communications 10, 5741 (2019), doi: 10.1038/s41467-019-13585-5.

[R113] 113. Neale B. M., UK Biobank GWAS Round 2 results (2018), http://www.nealelab.is/uk-biobank/.

[R114] 114. Grasby K. L., et al., The genetic architecture of the human cerebral cortex. Science (New York, N.Y.) 367 (2020), doi: 10.1126/science.aay6690.

[R115] 115. Tissink E., et al., Genome-wide association study of cerebellar volume provides insights into heritable mechanisms underlying brain development and mental health. Communications Biology 5, 710 (2022), doi: 10.1038/s42003-022-03672-7.

[R116] 116. Tissink E., et al., The Genetic Architectures of Functional and Structural Connectivity Properties within Cerebral Resting-State Networks. eNeuro 10 (2023), doi: 10.1523/ENEURO.0242-22.2023.

[R117] 117. Yengo L., et al., A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022), doi: 10.1038/s41586-022-05275-y.

[R118] 118. Dalvie S., et al., Genomic influences on self-reported childhood maltreatment. Transl. Psychiatry 10 (1), 38 (2020).

[R119] 119. Gisladottir R. S., et al., Sequence variants affecting voice pitch in humans. Science Advances 9, eabq2969 (2023), doi: 10.1126/sciadv.abq2969.

[R120] 120. Quinlan A. R., Hall I. M., BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics (Oxford, England) 26, 841–2 (2010), doi: 10.1093/bioinformatics/btq033.

[R121] 121. Davis E. S., et al., matchRanges: generating null hypothesis genomic ranges via covariate-matched sampling. Bioinformatics 39 (5) (2023).

[R122] 122. Kuhn R. M., Haussler D., Kent W. J., The UCSC genome browser and associated tools. Briefings in Bioinformatics 14, 144–61 (2013), doi: 10.1093/bib/bbs038.

[R123] 123. Thomas T. R., et al., Polygenic Scores Clarify the Relationship Between Mental Health and Gender Diversity. Biological Psychiatry Global Open Science 4, 100291 (2024), doi: 10.1016/j.bpsgos.2024.100291.

[R124] 124. Casten L. G., et al., Lingo: an automated, web-based deep phenotyping platform for language ability. medRxiv: the preprint server for health sciences (2024), doi: 10.1101/2024.03.29.24305034.

[R125] 125. Prüfer K., et al., The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505 (7481), 43–49 (2014).

[R126] 126. Mafessoni F., et al., A high-coverage Neandertal genome from Chagyrskaya Cave. Proc. Natl. Acad. Sci. U. S. A. 117 (26), 15132–15136 (2020).

[R127] 127. Prüfer K., et al., A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 358 (6363), 655–658 (2017).

[R128] 128. Meyer M., et al., A high-coverage genome sequence from an archaic Denisovan individual. Science 338 (6104), 222–226 (2012).

[R129] 129. Friedman J., Hastie T., Tibshirani R., Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 (1), 1–22 (2010).

[R130] 130. York D., Evensen N. M., Martínez M. L., De Basabe Delgado J., Unified equations for the slope, intercept, and standard errors of the best straight line. Am. J. Phys. 72 (3), 367–375 (2004).

[R131] 131. Paysan-Lafosse T., et al., InterPro in 2022. Nucleic Acids Res. 51 (D1), D418–D427 (2023).

[R132] 132. Szklarczyk D., et al., The STRING database in 2025: protein networks with directionality of regulation. Nucleic Acids Res. 53 (D1), D730–D737 (2025).

[R133] 133. Akbari A., et al., Pervasive findings of directional selection realize the promise of ancient DNA to elucidate human adaptation. bioRxiv (2024).

[R134] 134. Purcell S., et al., PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics 81 (3), 559–575 (2007), doi: 10.1086/519795, https://doi.org/10.1086/519795.

[R135] 135. Yang J., Lee S. H., Goddard M. E., Visscher P. M., GCTA: A Tool for Genome-wide Complex Trait Analysis. The American Journal of Human Genetics 88, 76–82 (2011), doi: 10.1016/j.ajhg.2010.11.011, https://doi.org/10.1016/j.ajhg.2010.11.011.

[R136] 136. Perdry H., Dandine-Roulland C., gaston: Genetic Data Handling (QC, GRM, LD, PCA) Linear Mixed Models (2023), https://CRAN.R-project.org/package=gaston, R package version 1.6.

[R137] 137. Karcher N. R., Barch D. M., The ABCD study: understanding the development of risk for mental and physical health outcomes. Neuropsychopharmacology 46 (1), 131–142 (2021).

[R138] 138. Risso D., Ngai J., Speed T. P., Dudoit S., Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32 (9), 896–902 (2014).

[R139] 139. Tenenhaus A., et al., Variable selection for generalized canonical correlation analysis. Biostatistics 15 (3), 569–583 (2014).

[R140] 140. Au E. H., et al., Gonomics: uniting high performance and readability for genomics with Go. Bioinformatics 39 (8) (2023).

[R141] 141. Cock P. J. A., et al., Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25 (11), 1422–1423 (2009).
