Nature Communications. 2025 Jul 1;16:5625. doi: 10.1038/s41467-025-60665-w

Decoding DNA sequence-driven evolution of the human brain epigenome at cellular resolution

Emre Caglayan 1,2,3, Genevieve Konopka 1,2
PMCID: PMC12216504  PMID: 40595532

Abstract

DNA-based evolutionary comparisons of regulatory genomic elements enable insight into functional changes driven in cis, partially overcoming tissue inaccessibility. Here, we harnessed adult and fetal cortex single-cell ATAC-seq datasets to uncover DNA substitutions specific to the human and human-ancestral lineages within apes. We found that fetal microglia identity is evolutionarily divergent in all lineages, whereas other cell types are conserved. Using multiomic datasets, we further identified genes linked to multiple lineage-divergent gene regulatory elements and implicated biological pathways associated with these divergent features. We also uncovered patterns of transcription factor binding site evolution across lineages and identified expansion of bHLH-PAS transcription factor targets in human-hominin lineages, and MEF2 transcription factor targets in the ape lineage. Finally, conserved features were more enriched in brain disease variants, whereas there was no distinct enrichment of brain disease variants on the human lineage compared to its ancestral lineages. Our study identifies ancestral evolutionary patterns of the human brain epigenome at cellular resolution.

Subject terms: Evolutionary genetics, Gene regulation, Genetics of the nervous system


How genetic changes prior to the human-chimpanzee split contributed to human brain evolution remains unknown. Here, the authors address this by identifying open-chromatin regions divergent in various lineages and uncover evolutionary patterns in the last ~25 million years of human brain evolution.

Introduction

Comparative genomics is widely used to investigate evolutionary patterns at the molecular level1,2. Genetic novelties in the human lineage are exceptionally well studied due to their importance in understanding human evolution, with notable examples including human accelerated regions (HARs) that possess a significantly greater number of DNA substitutions in the human lineage compared to chimpanzees and other non-human species3, human-specific deletions in otherwise conserved regions (hCONDELs)4,5, and newly identified human-specific regulatory elements (HAQERs)6. The functional impact of these elements is investigated both per region7,8 and across all regions using functional omics strategies9–11.

In addition to identifying the newly evolved features of the human genome and following up by in-depth but low-throughput functional characterization, certain assays allow direct functional characterization of the coding and non-coding genome in high throughput. New approaches for assaying gene expression and chromatin accessibility at single-cell resolution have been particularly powerful, as they can be scaled across thousands of cells in a single experiment. These assays have been applied to the transcriptome and epigenome across the immense cellular heterogeneity of the human and non-human primate brains to uncover species-specific molecular changes at cell type resolution12–15. However, the availability of high-quality brain tissue from non-human primates is a limiting factor, especially for endangered species such as all apes and particularly for early developmental periods. Hence, studies to date have focused mainly on postnatal brain tissues and only included humans and chimpanzees among the apes to understand human brain evolution at cellular resolution12–15. Despite the limitation of tissue availability, the ancestral history of the human brain’s cellularly resolved epigenomic landscape can be interrogated through its sequence divergence across ape species.

Here, we implemented this approach through an integrative analysis of single-nucleus ATAC-seq and multi-omics datasets from adult and fetal human cortex. Focusing on the chromatin accessible regions within the human brain, we identified unique substitutions in both the human lineage and ancestral lineages within the ape clade by comparing DNA sequences across apes, old world monkeys and new world monkeys. We found unexpected cell type evolution patterns in fetal and adult brains, notably in microglia. We also used these multiomic datasets to identify lineage-specific links between genes and open chromatin regions as well as transcription factor binding site (TFBS) expansion patterns. Finally, we investigated the disease susceptibility of divergent and conserved open chromatin regions. Thus, our results provide insights into the cellular and functional evolutionary changes in the human lineage and its ancestral lineages within the ape clade through systematic analysis of DNA substitutions.

Results

Identification of lineage specific substitutions

We utilized single-nucleus ATAC-seq datasets from fetal brain cortex (post-conceptional weeks 16–24, only from humans, four individuals, ~31,000 cells, 380,455 open chromatin regions) and adult brain cortex (30–70 years old from humans and age-matched chimpanzees and rhesus macaques, four individuals from each species, ~25,000 human cells, 364,657 open chromatin regions)13,16. Adult open chromatin regions are identified from posterior cingulate cortex that overlaps >90% with other cortical regions13, indicating that it is largely representative of the open chromatin regions in the entire neocortex. Due to limited regional datasets, the fetal open chromatin regions are identified from the entire cerebral cortex. Together, these datasets provide a comprehensive list of cellularly resolved chromatin accessibility profiles of open chromatin regions, hereafter referred to as gene-regulatory elements (GREs) (Supplementary Fig. 1a–c). We found that ~59% of all GREs were specific to either the fetal or the adult cortex, underscoring the importance of a comprehensive investigation (Supplementary Fig. 1c).

To uncover how DNA sequences may have evolved within these GREs, we identified the substitution events in the human lineage and human-ancestral lineages within these regions. Based on the availability of extant species genomes, we focused on 4 human-ancestral lineages in addition to the human lineage: Hominin, African great ape (A. G. Ape), Great ape (G. Ape) and Ape. We reconstructed ancestral sequences using the HKY substitution model and only considered substitutions that occurred once among these 5 lineages (Fig. 1a, Supplementary Fig. 1d–h, Supplementary Data 1–2, Methods). Ape-specific substitutions were identified by using old world monkeys as a sister group and new world monkeys as outgroup. As expected, we identified more substitutions on lineages with a longer branch length (i.e., longer evolutionary time) (Human, A. G. Ape, Ape) than lineages with a shorter branch length (Hominin and G. Ape) (Fig. 1b, d). Substitution rates were similar across the lineages after branch length normalization (Fig. 1c, e). To reveal cell type specific evolutionary patterns, we then determined major cell types in fetal and adult datasets. Specifically, we combined original annotations of neuronal subtypes and intermediate progenitor cells into broader categories (e.g. excitatory, inhibitory) to reduce loss of power for cell types with higher heterogeneity (Supplementary Fig. 1a, b). For each cell type, we identified GREs with significantly greater chromatin accessibility in the given cell type compared to all other cell types for adult and fetal datasets (fold change > 2 and FDR < 0.05) (Supplementary Data 3). We named these cell type specific GREs. We then counted the normalized substitutions within the cell type specific GREs, and divided the value for each cell type by the mean value across all cell types to highlight differences in substitution loads across cell types (Methods).
While there was a significant excess of substitutions in certain cell types compared to others for various lineages in the adult dataset, the fetal dataset revealed significantly higher substitutions only in microglia specific GREs (Fig. 1f, g). We note that this relative increase in substitution load is substantially high in all lineages except for the human lineage.
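
The two normalization steps described above can be sketched as follows. This is a minimal illustration, not the Methods implementation: the function names, the counts, the total GRE lengths and the branch length of 6 million years are all hypothetical placeholders. It assumes the rate is expressed as substitutions per kb per million years (as in Fig. 1k) and that the relative load of a cell type is its rate divided by the mean rate across cell types.

```python
from statistics import mean


def substitution_rate(n_subs, total_bp, branch_myr):
    """Substitutions per kb of GRE sequence per million years of branch length."""
    return n_subs / (total_bp / 1000) / branch_myr


def relative_load(rates):
    """Divide each cell type's rate by the mean rate across all cell types."""
    m = mean(rates.values())
    return {cell_type: rate / m for cell_type, rate in rates.items()}


# Hypothetical inputs for one lineage: (substitution count, total bp of
# cell type specific GREs). These numbers are illustrative only.
counts = {
    "excitatory": (1200, 2_000_000),
    "inhibitory": (900, 1_500_000),
    "microglia": (800, 1_000_000),
}
branch_myr = 6.0  # assumed branch length in million years

rates = {ct: substitution_rate(n, bp, branch_myr) for ct, (n, bp) in counts.items()}
loads = relative_load(rates)
```

With these placeholder inputs, a value above 1 in `loads` marks a cell type whose GREs carry more substitutions than the cross-cell-type average on that lineage.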

Fig. 1. Identification of lineage-specific substitutions and divergent GREs.

Fig. 1

a Outline of the methodology to identify lineage-specific substitutions. b–e Substitution ratios within GREs (Gene Regulatory Elements) per lineage (b, d: GRE length normalized; c, e: GRE length and branch length normalized; b, c: adult; d, e: fetal). f, g Substitution ratios normalized to the mean value per lineage. Asterisks indicate FDR < 1e-5 (Chi-square test). h, i Number of GREs within each group in adult (h) and fetal (i) datasets. j Fisher’s exact test of overlaps between accessibility and substitution groups. Asterisk indicates FDR < 0.05. k Number of substitutions per million years per kb per lineage in adult, fetal and HAR datasets. l, m Fisher’s exact test of overlaps between cell type marker GREs and substitution groups. Asterisk indicates FDR < 0.05. Blue colors indicate depletions, red colors indicate enrichments. Source data are provided as a Source Data file. Boxplots represent the median (center line), interquartile range (box), and the minimum and maximum values within 1.5 times the interquartile range (whiskers) in (b–e) and (k). A.G. Apes African Great Apes. G. Apes Great Apes.

Identification of lineage divergent gene regulatory elements

To identify divergent and conserved GREs, we utilized relative substitution levels across lineages and set up a list of criteria to call each GRE as lineage-divergent, conserved or unclassified. We refer to lineage-divergent GREs by the lineage they are divergent in (human-divergent GREs, hominin-divergent GREs etc.) (Fig. 1h, i, Supplementary Fig. 1i, j, Supplementary Data 4, Methods). To test whether substitution-driven GRE classification is concordant with the chromatin accessibility measurements, we performed enrichments with species-specific accessibility changes in the adult dataset13. We found significant enrichments only between human-specific changes of accessibility and human-specific changes of substitution or conserved accessibility and conserved substitution (Fig. 1j, Source Data). We next compared the level of lineage-specific substitutions within human-divergent GREs to lineage-specific substitutions within HARs that are similarly defined by high substitution rate in humans but also require high conservation in non-human mammals3. We found that both the human-divergent GREs and HARs display high substitution rates in the human lineage and low substitution rates in ancestral lineages, although this contrast is greater in HARs due to the stricter requirement for conservation outside the human lineage (Fig. 1k). As expected, we found 10-fold greater overlap between HARs or recently identified cortical HARs13 and human-divergent GREs compared to between HARs and ancestrally divergent GREs (Supplementary Fig. 1k). Another hallmark of accelerated regions is the high GC conversion ratio17. We found that GC conversion ratio is also higher within HARs and cortical HARs compared to all lineage-divergent GREs (Supplementary Fig. 1l). These results show that lineage-divergent GREs have less divergence than accelerated regions such as HARs; however, lineage-divergent GREs are more abundant and thus provide greater power for robust downstream functional analyses.

To complement our analysis of substitutions in cell type specific GREs, we asked whether cell type specific GREs are enriched in lineage-divergent GREs. We note that in our previous analysis, we compared substitution rates across cell type specific GREs and did not test whether they were enriched compared to the background of all GREs (Fig. 1f, g). We found that cell type specific GREs are depleted in most lineage-divergent GREs and enriched in conserved GREs (Fig. 1l, m, Source Data). Surprisingly, only fetal microglia specific GREs are uniquely enriched in lineage-divergent GREs, while depleted in conserved GREs, suggesting accelerated fetal microglia evolution (Fig. 1l, m, Source Data). Interestingly, we noticed that the level of enrichment is the greatest in ape lineage (fold change: 1.54) and lowest in hominin lineage (fold change: 1.24) with no significant enrichment in the human lineage (fold change: 0.98), indicating that the rate of fetal microglial GRE evolutionary divergence was reduced over time. We wondered whether this trend was similar in the chimpanzee lineage. To determine this, we identified chimpanzee-specific substitutions and chimpanzee-divergent GREs. Both chimpanzee-specific substitutions and chimpanzee-divergent GREs revealed similar enrichment levels in cell type specific GREs compared to human-divergent GREs (Supplementary Fig. 2). We also assessed the specificity of this finding for fetal tissue by dividing fetal and adult microglia specific GREs into either shared or fetal/adult-specific categories based on their overlap with each other and performing enrichment with both categories. We observed that the fetal-specific category showed the greatest enrichment, further underscoring the evolutionary divergence of fetal microglia (Supplementary Fig. 3a, b). 
Additionally, we considered the possibility that microglial enrichment may reflect a relative evolutionary divergence of non-neural lineages since all other cell types are derived from a neural lineage. To answer this, we performed the enrichment analysis with vasculature cell type specific GREs (endothelia and pericytes) but did not find a similar level of enrichment with the divergent GREs (Supplementary Fig. 3c). Taken together, our results suggest accelerated evolution of fetal microglia in human-ancestral lineages that may have substantially decelerated before the human-chimpanzee split.
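
The overlap tests in this section (cell type specific GREs against lineage-divergent or conserved GREs, with all GREs as background) can be illustrated with a small, self-contained Fisher's exact test. This is a sketch under stated assumptions: the function names are hypothetical, the fold change is assumed to be the observed overlap divided by the overlap expected under independence, only the enrichment tail is computed, and the counts are invented for illustration. The exact definition used for Fig. 1l, m is given in Methods.

```python
from math import comb


def hypergeom_pmf(k, n_total, n_marker, n_divergent):
    """Probability of exactly k marker GREs among the divergent GREs."""
    return (
        comb(n_marker, k)
        * comb(n_total - n_marker, n_divergent - k)
        / comb(n_total, n_divergent)
    )


def fisher_enrichment(overlap, n_marker, n_divergent, n_total):
    """Return (fold enrichment, one-sided Fisher's exact p-value for enrichment)."""
    k_max = min(n_marker, n_divergent)
    p_value = sum(
        hypergeom_pmf(k, n_total, n_marker, n_divergent)
        for k in range(overlap, k_max + 1)
    )
    expected = n_marker * n_divergent / n_total
    return overlap / expected, p_value


# Hypothetical counts: 1000 GREs in total, 100 specific to one cell type,
# 200 divergent in one lineage, 30 in both sets.
fold, p = fisher_enrichment(overlap=30, n_marker=100, n_divergent=200, n_total=1000)
```

A depletion test would sum the lower tail instead, and p-values across cell types and lineages would still need a multiple-testing correction such as the FDR threshold quoted in the figure legend.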

Identifying regulatorily divergent genes using multiomic datasets

We next incorporated gene expression measures into our analyses by utilizing adult cortical brain (prefrontal cortex15) and fetal cortical brain (cerebral cortex16) single-nucleus multiomic (ATAC-seq and RNA-seq) datasets from humans. The use of these multiomic datasets allowed us to uncover genes linked to lineage-divergent GREs by correlating the gene expression and chromatin accessibility (Supplementary Data 5). We generated GRE-gene linkage scores and found that ~40% of all GREs were significantly linked to a gene and the GRE-gene linkage scores were consistent across biological replicates in both datasets, indicating high reproducibility (Supplementary Fig. 4a, b). Next, we looked to identify lineage-divergent GRE-gene linkages. We created a randomized background of GRE-gene linkages and applied multiple filters, to which we compared the GRE-gene linkages in each lineage. We called the genes in linkages that were significantly different from the background regulatorily divergent genes (RDGs, Methods). We identified ~20–80 RDGs per lineage in the adult cortical brain, with limited overlap between lineages (Fig. 2a, Supplementary Fig. 4c, Supplementary Data 6, Methods). Notable RDGs in both the human and hominin lineages included MSRA, which encodes a methionine sulfoxide reductase implicated in both schizophrenia and autism18, and CDH13, a gene implicated in autism and attention deficit disorder19 (Fig. 2b). Moreover, human-divergent and hominin-divergent GREs linked to the same gene were distinct, indicating significant regulatory differences in both lineages (Fig. 2c, d). We also found a significant overlap with human-specific gene expression changes only for human RDGs but not for RDGs from ancestral lineages, further supporting the specificity and functional relevance of RDGs (Fig. 2e). We next identified RDGs in the fetal cortical brain with low overlap to adult RDGs in all lineages (Fig. 2f, g, Supplementary Fig. 4d–g, Supplementary Data 6).
An example fetal-specific human RDG was IGF1R, with 7 out of its 12 linked GREs divergent in humans only in fetal tissue (Fig. 2h, i).
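
The idea of calling a gene regulatorily divergent when it is linked to more divergent GREs than a randomized background would produce can be sketched with an empirical p-value. This is an illustrative simplification and not the filtered background described in Methods: the function name, the GRE identifiers, and the choice to randomize the divergent set (rather than the linkages) are assumptions made for the example, which loosely echoes a gene with 7 of 12 linked GREs divergent.

```python
import random


def empirical_rdg_pvalue(gene_gres, divergent, all_gres, n_perm=1000, seed=0):
    """One-sided empirical p-value for the number of divergent GREs linked to a gene.

    The null draws random GRE sets of the same size as the divergent set from
    all gene-linked GREs and counts how many land on the gene of interest.
    """
    rng = random.Random(seed)
    gene_set = set(gene_gres)
    observed = len(gene_set & set(divergent))
    pool = sorted(all_gres)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, len(divergent))
        if len(gene_set.intersection(sample)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: 1000 linked GREs overall, 12 linked to the gene,
# 100 divergent in one lineage, 7 of which are linked to the gene.
all_gres = range(1000)
gene_gres = range(12)
divergent = list(range(7)) + list(range(500, 593))
observed, p_emp = empirical_rdg_pvalue(gene_gres, divergent, all_gres)
```

With 1000 randomizations the smallest attainable p-value is 1/1001, so genome-wide use would need more draws or an analytic test before FDR adjustment.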

Fig. 2. Identification of regulation divergent genes (RDGs).

Fig. 2

a Number of genes linked to divergent GREs significantly more compared to the background for each lineage. b Number of divergent GREs linked to CDH13 and MSRA per lineage. Red rectangles indicate significant associations. c Coverage plot showing all GREs (labeled as Peaks) linked to CDH13 expression. Blue colored peak regions indicate human-divergent GREs and light blue colored peak regions indicate hominin-divergent GREs. d Number of substitutions per million year per kb for the selected human-divergent and hominin-divergent GREs. e FDR adjusted empirical p-values (one-sided) of HS-DEG (Human-Specific Differentially Expressed Gene) and RDG association per lineage (y-axis: ratio of HS-DEGs that are also RDGs among all HS-DEGs.). Boxplots indicate randomized overlap (randomly selected ‘HS-DEGs’) repeated 1000 times. Red dot indicates the observed overlap. f Same as (a) but for fetal dataset. g Overlap between fetal and adult RDGs. h Number of divergent GREs linked to IGF1R per lineage in fetal (left) and adult (right) datasets. i Same as (c) but for IGF1R in fetal (top) and adult (bottom) datasets. Boxplots represent the median (center line), interquartile range (box), and the minimum and maximum values within 1.5 times the interquartile range (whiskers) in (e).

Functional enrichments of divergent-GREs

To gain biological insight on the evolutionary divergence of fetal microglia, we leveraged GRE-gene linkages to find genes associated with fetal microglia specific GREs for each group (lineage-divergent GREs and conserved GREs). We then found the proportion of these genes that are significantly upregulated in each fetal microglia subtype that were identified in a previous study20 (Fig. 3a). Comparing between divergent and conserved groups, we found that cytokine-associated microglial cells were more associated with the genes linked to divergent GREs than genes linked to conserved GREs (Fig. 3b). We noticed a trend in other microglial cells, but either the p-value or the effect size did not survive a typical threshold (odds ratio > 1.5 and p-value < 0.05, Chi-square test). We highlight some of these genes that are also identified as RDGs in at least one of the lineages (Fig. 3c).
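
The divergent versus conserved comparison of subtype marker proportions reduces to a 2×2 table, for which the odds ratio and a chi-square p-value can be computed directly. The sketch below is illustrative only: the function name and counts are hypothetical, no continuity correction is applied, and the p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Odds ratio, chi-square statistic and p-value for the table [[a, b], [c, d]].

    a, b: marker / non-marker genes linked to divergent GREs
    c, d: marker / non-marker genes linked to conserved GREs
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # survival function, 1 degree of freedom
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, chi2, p_value


# Hypothetical counts: 30 of 100 divergent-linked genes are subtype markers,
# against 20 of 100 conserved-linked genes.
odds_ratio, chi2, p_value = chi2_2x2(30, 70, 20, 80)
passes = odds_ratio > 1.5 and p_value < 0.05
```

In this toy table the odds ratio exceeds 1.5 but the p-value does not fall below 0.05, which is the kind of trend that fails a joint effect-size and significance threshold.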

Fig. 3. Functional enrichment analyses of divergent features.

Fig. 3

a UMAP of fetal microglia subtypes. b Comparison of subtype marker enrichment between divergent and non-divergent GREs. For each group, GREs were linked to genes and the ratio of subtype markers (significant upregulation in subtype) was calculated (y-axis). P-value and odds ratio were obtained from a two-sided chi-square test between all divergent groups and the non-divergent group. Red colors indicate p-value < 0.05 and odds ratio > 1.3. c Genes that are RDGs in at least one group, upregulated in cytokine-associated microglia and linked to at least one divergent fetal microglia GRE. d Gene ontology enrichment results of GREAT for adult dataset. Each term is scaled separately with legends shown on the right side. Asterisk indicates FDR < 0.05 and fold enrichment > 1.3. e Number of divergent GREs for genes associated with the gene ontology term. RDGs are shown in blue, other genes are shown in gray.

To uncover potential biological processes associated with each lineage, we then performed gene ontology enrichments on GREs divergent in one lineage compared to the background of GREs divergent in other lineages using GREAT21,22. This analysis revealed significant gene ontology enrichments for most lineages in adult and fetal datasets, with the greatest number of terms enriched in the human lineage in the adult dataset (Supplementary Data 7, Supplementary Fig. 5a, b). Within the adult human lineage, we identified adenosine deaminase activity, lipoprotein particle binding and methionine metabolic process among the top enrichments, highlighting metabolic pathways and RNA editing activity (Fig. 3d, Supplementary Fig. 5c, d). The GREs linked to RDGs contributed strongly to these terms; however, more than one gene (including non-RDGs) contributed to the overall enrichment (Fig. 3e). Together, these analyses implicate biological pathways that may have been altered in a lineage-specific manner during human brain evolution.

Evolution of transcription factor binding sites

DNA substitutions can alter transcription factor (TF) binding sites (TFBS)23. To elucidate the ancestral TFBS evolution patterns, we determined the gains and losses of all TFBS per lineage in all GREs (Supplementary Data 8, Methods). We created a binary matrix of motif presence per GRE across lineages and only considered stable changes as a TFBS gain or loss (Fig. 4a). We then reported the gain/loss ratio per TFBS as a readout to identify TFBS expansions (gain/loss > 1) and depletions (gain/loss < 1) across lineages (Fig. 4b). Total gain/loss ratios favored more gains than losses and were similar across lineages and datasets (Supplementary Fig. 6a, b). Gain/loss ratios across TFBSs also showed high correlations between adult and fetal datasets for all lineages (Supplementary Fig. 6c). However, gain/loss ratio correlations across TFBSs revealed higher correlations between closer lineages, and lower correlations between more distant lineages (Fig. 4c, d).
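
The gain/loss readout can be sketched from a binary motif presence matrix. The example below is a simplified stand-in for the Methods procedure: it assumes each row lists motif presence (0/1) in one GRE across nodes ordered from the oldest ancestral node to human, and it treats a change as stable only when the state flips exactly once along that path. That stability rule, the function names and the toy matrix are assumptions made for illustration.

```python
from collections import Counter


def stable_change(states):
    """Return (node index, 'gain' or 'loss') if the state flips exactly once, else None."""
    flips = [i for i in range(1, len(states)) if states[i] != states[i - 1]]
    if len(flips) != 1:
        return None  # unchanged, or changed more than once (treated as unstable)
    i = flips[0]
    return i, ("gain" if states[i] == 1 else "loss")


def gain_loss_ratio(matrix):
    """Gain/loss ratio per node index for one motif across all GREs."""
    gains, losses = Counter(), Counter()
    for states in matrix:
        change = stable_change(states)
        if change is None:
            continue
        i, kind = change
        if kind == "gain":
            gains[i] += 1
        else:
            losses[i] += 1
    # Ratios are only defined where at least one loss was observed.
    return {i: gains[i] / losses[i] for i in sorted(losses)}


# Toy matrix for one motif: columns are six nodes from the pre-ape ancestor
# (index 0) to human (index 5); index i marks a change on the branch into node i.
matrix = [
    [0, 0, 0, 0, 0, 1],  # gain on the human branch
    [0, 0, 0, 0, 0, 1],  # gain on the human branch
    [1, 1, 1, 1, 1, 0],  # loss on the human branch
    [0, 1, 1, 1, 1, 1],  # gain on the ape branch
    [1, 0, 0, 0, 0, 0],  # loss on the ape branch
    [0, 1, 0, 1, 1, 1],  # flips more than once: ignored
    [1, 1, 1, 1, 1, 1],  # no change: ignored
]
ratios = gain_loss_ratio(matrix)
```

A ratio above 1 at a node corresponds to an expansion of that motif's sites on the branch leading to it, and a ratio below 1 to a depletion.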

Fig. 4. Transcription factor binding site evolution in human and ancestral lineages.

Fig. 4

a, b Identification of motif occurrences per lineage per GRE and total TFBS gains and losses per lineage. c, d Spearman rank correlations of gain / loss ratios across all motifs between lineages. e–g Gain / loss ratios of bHLH-PAS TFs (e), ETS-related TFs (f) and MEF2 TFs (g). Dashed rectangles indicate significant expansions (red) and significant depletions (black). Horizontal dashed line indicates the global gain / loss ratio. h–k Fisher’s exact test (one-tailed) of gained or lost TFBSs in cell type marker GREs. Asterisk indicates FDR < 0.05 (h, i: Adult, j, k: Fetal).

We then identified the significant TFBS expansions/depletions per lineage by statistically assessing deviations from the background and other lineages (Supplementary Data 9, Methods). Most expansions and depletions are detected in human, hominin and ape lineages (Supplementary Fig. 6d–g). Human-expanded TFBSs were similarly expanded in hominins and vice versa, although ape-expanded TFBSs displayed distinctly greater gain/loss ratio than all other lineages (Supplementary Fig. 6h, k). We also detected a pattern of inverse correlation between human-hominin and ape lineages as expanded TFBSs in one group are relatively depleted in the other and vice versa (Supplementary Fig. 6h–k). Among the TF families with expanded TFBSs in human and / or hominin lineages, top enrichments were from bHLH-PAS factors, and ETS-related factors (Fig. 4e, Supplementary Fig. 7). Notable TFs included CLOCK, a circadian clock gene that is also implicated in human brain evolution24,25, MAX, a regulator of cell proliferation with unknown functions in the brain26, ARNT2, a regulator of activity dependent gene expression27 and TFEB, a regulator of oligodendrogenesis28. We additionally found that MEF2 factor TFBSs have expanded in ape lineage and depleted in human-hominin lineages (Fig. 4f, g). To reveal the cell type specificity of TFBS expansions, we then performed enrichments with cell type specific GREs and found that gains and losses are often enriched in the same cell type (Fig. 4h–k). However, there were also enrichments specific to expansions or depletions. For example, bHLH-PAS binding sites expanded in adult glutamatergic cells (Fig. 4h, i). Among the ape expanded TFBSs, MEF2 binding sites were expanded in GABAergic cells in the fetal dataset (Fig. 4j, k). 
Taken together, these results reveal the ancestral history of TFBS patterns within the open chromatin regions of the human brain (Supplementary Data 8–9) and identify TFs that significantly altered their putative target space in the human lineage and its ancestral lineages.

Brain disease susceptibility in conserved and divergent GREs

Certain brain diseases such as schizophrenia, autism and Alzheimer’s disease are associated with alterations in cognitive abilities, a hallmark of human brain evolution, leading to the suggestion that human brain evolution contributed to the genetic susceptibility for these conditions29–31. While some studies statistically linked human-evolved genetic changes and disease susceptibility32–37, very few contrasted this with conserved or non-human divergent features33,35. Interestingly, these studies favored greater enrichments of brain disease susceptibility in conserved genomic features compared to human-specific genomic features33,35. To provide a systematic comparison of disease susceptibility in human brain evolution, we performed LD score regression (LDSC) analysis38 for the conserved and divergent GREs (Fig. 1, Supplementary Data 10). We performed regressions for the top 20,000 GREs to equalize the sample sizes of the background of all GREs detected in a given tissue (fetal or adult) (Methods). Strikingly, we found significant enrichments mainly among the conserved GREs in both adult and fetal brain (Fig. 5a). We reproduced this pattern with the top 10,000 or 5,000 GREs, indicating a robust trend (Supplementary Fig. 8a–d). Among the lineage-divergent enrichments, we mostly detected weaker enrichments (0.01 < FDR < 0.1) between the ancestral lineages and brain diseases / traits. Notably, we did not detect stronger enrichments for disease variants in the human lineage compared to ancestral lineages (Fig. 5a, b). However, schizophrenia variants were strongly enriched specifically in human and hominin lineage divergent GREs within the fetal dataset (Fig. 5b, Supplementary Fig. 8b). The total SNP numbers were similar across the groups, excluding the possibility of an excess number of SNPs driving the statistical differences (Fig. 5c, d).
A sliding window analysis further showed that the enrichment comes from the GREs and not from the flanking regions included to obtain robust coefficient estimates33,39 (Methods, Supplementary Fig. 8e).

Fig. 5. Association of disease variants and evolutionary divergence of GREs.

Fig. 5

a, b LDSC regression results between each disease category (x-axis) and lineage (y-axis) for adult (a) and fetal (b) datasets. Top 20,000 most divergent GREs were used to run the regressions in each lineage. c, d Total number of SNPs mapping to the GREs per lineage per dataset (adult: c, fetal: d). e, f LDSC regression results for each GRE group (ordered based on conservation score and divided into 20,000 GREs per group) in adult (e) and fetal (f) datasets. In all panels, single asterisk indicates FDR < 0.1 and double asterisk indicates FDR < 0.01.

To test the association between conservation and disease susceptibility more directly, we also divided all GREs based on their average conservation score across 5 lineages, and similarly observed stronger enrichments among more conserved groups despite an equivalent total number of SNPs per group (Fig. 5e, f, Supplementary Fig. 8f, g). Interestingly, fetal GREs often displayed more significant enrichments with greater effect size than conserved adult GREs (Fig. 5e, f). These results indicate greater disease susceptibility for more conserved GREs in the human brain epigenome and motivate further discussion on the interplay between human brain disease susceptibility and human brain evolution.
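
The grouping step used for this analysis (ranking GREs by their average conservation score across lineages and cutting the ranking into equally sized groups) can be sketched as below. The function name, the per-lineage scores and the group size of 2 are hypothetical; the paper uses groups of 20,000 GREs and defines the conservation score in Methods.

```python
from statistics import mean


def conservation_groups(per_lineage_scores, group_size):
    """Rank GREs by mean score across lineages and split into fixed-size groups.

    Groups are ordered from most to least conserved; ties are broken by GRE name
    so that the output is deterministic. The last group may be smaller.
    """
    avg = {gre: mean(scores) for gre, scores in per_lineage_scores.items()}
    ranked = sorted(avg, key=lambda gre: (-avg[gre], gre))
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]


# Hypothetical scores for the five lineages (Human, Hominin, A. G. Ape, G. Ape, Ape).
scores = {
    "gre_a": [0.9, 0.9, 0.9, 0.9, 0.9],
    "gre_b": [0.1, 0.2, 0.1, 0.2, 0.1],
    "gre_c": [0.5, 0.5, 0.5, 0.5, 0.5],
    "gre_d": [0.7, 0.8, 0.7, 0.8, 0.7],
}
groups = conservation_groups(scores, group_size=2)
```

Each resulting group would then be passed to the regression separately, which keeps the number of GREs per group constant across the conservation range.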

Discussion

Every region of the human genome has an ancestral history. While it is possible to reveal ancestral patterns of genomic regions by comparing the genomes of hundreds of species, it has only recently been possible to tie their ancestral history to functional relevance in a tissue- and cell type-specific manner. Here, we used DNA sequence substitutions within cellularly resolved GREs to gain insight into human brain evolution. We found that most cell type markers were enriched in conserved GREs except for fetal microglia markers (Fig. 1). Using multiomic datasets, we then identified genes with a significantly greater number of lineage-divergent GRE links (RDGs) (Fig. 2) and used both divergent GREs and RDGs to elucidate further functional enrichments (Fig. 3). We further identified TFBS evolution patterns across human and human-ancestral lineages, revealing lineage-specific expansions and depletions of all known TFBSs (Fig. 4). Finally, we found greater enrichments of brain disease susceptibility for conserved GREs compared to divergent GREs (Fig. 5). Taken together, our study aimed to reveal ancestral evolutionary patterns of the human brain at cell type resolution through comparative genomic analysis of open-chromatin regions.

Fetal microglia markers were surprisingly more divergent than all other cell types in the ancestral lineages (Fig. 1m). While the human and chimpanzee lineages also have relatively more substitutions in microglia marker GREs compared to other cell types, the divergence was no longer significant when all GREs were considered in the background (Fig. 1m). This indicates that fetal microglia may have undergone accelerated evolution until the human-chimpanzee split. Since human and chimpanzee lineages display similar levels of fetal microglia divergence, we hypothesize that evolutionary divergence of fetal microglia decelerated shortly before the human-chimpanzee split.

Interestingly, the cytokine-associated microglia population was significantly more associated with evolutionarily divergent GREs than conserved GREs in the fetal brain (Fig. 3b). Other studies have established that neuroinflammatory genes are typically upregulated in aging, neurodegeneration, and development20,40,41, making their presence in the unstimulated human fetal brain more intriguing20,41. Recent studies that focused on sequence divergence across numerous mammalian species also identified accelerated evolution in environmental response genes, especially immune responses42,43. We therefore hypothesize that core functions of fetal microglia may also have undergone accelerated evolution to be more responsive to environmental stimuli throughout ape evolution. Since our study identified changes in DNA sequence for all GREs in all lineages, future studies can test the functional output of the divergent GREs that are also fetal microglia markers by comparing the activity of the human sequence, ancestral sequences and the sequences in old world monkeys through reporter assays9–11. For the GREs linked to cytokine activity, cytokine secretion can be used as a readout to link genetic changes to potential phenotypic activity. Thus, our results provide the grounds for experimental exploration of potential evolutionary novelties in microglia.

Recent studies have identified species-specific gene expression changes at cell type resolution12–14. In this study, we identified regulatorily divergent genes, RDGs, that are linked to significantly more divergent GREs than would be expected by chance. Our approach complements transcriptomic assays, which offer snapshots of gene expression and may miss differential regulation by GREs. Single-nucleus transcriptomics can also mask alternative start site usage due to being 3’ biased. In contrast, an epigenome-driven approach is permissive to these mechanisms and augments transcriptional data. Thus, RDGs offer an in-silico alternative to transcriptomic comparisons, expanding our ability to detect species-specific gene regulation.

TFBS evolution harbors unique insights about regulatory evolution42,44, yet it is poorly understood in the human brain. In this study, we systematically identified gain and loss events (Supplementary Data 8–9) and observed that correlations of TFBS gain/loss ratios roughly recapitulate the phylogeny of lineages ancestral to humans (Fig. 4c, d). Analysis of TFs with expanded or depleted TFBS pools primarily revealed significant TFBS alterations in human-hominin and ape lineages, suggesting widespread changes of certain TF targets. Since TFs regulate critical developmental decisions, TFBS expansions might be tied to the evolution of cell type identity. Indeed, many of the newly gained and lost TFBSs were significantly associated with cell type markers (Fig. 4h–k).

Evolutionarily conserved molecular features are more associated with disease risk than evolutionarily divergent molecular features45,46. However, diseases can be species-specific, indicating that the adaptive benefit of newly evolved features must exceed the deleterious effects they might have entailed47,48. Some human brain diseases are also thought to be linked to human brain evolution, although disease association in human-divergent features is rarely contrasted with conserved features or non-human divergent features33,35. To our knowledge, no previous study systematically compared the human lineage to its ancestral lineages for disease susceptibility and contrasted this with the conserved features. We investigated this specifically for the adult and fetal cortical brain GREs and uncovered brain disease susceptibility enrichments mostly among conserved features, which increased with greater conservation score (Fig. 5). Strikingly, human-divergent GREs were not exceptional in their disease susceptibility, and we detected a similar level of enrichment for human-ancestral divergent GREs (Fig. 5). Therefore, we could not find a human lineage-specific susceptibility to the brain diseases we examined. We note, however, that the disease variants in our study mainly comprise common variants identified through genome-wide association studies, which do not have enough statistical power to capture the rare variants that can target distinct genomic regions49. It is therefore possible that rare disease variants might have greater association with more divergent GREs. Alternatively, human evolution may have rendered the human brain more susceptible to certain diseases not by disrupting the same biological pathways but by secondarily increasing the vulnerability of human brain physiology to certain disruptions.

While our approach revealed insights into the cellular evolution of species and tissues that are not easily accessible, it has certain limitations. Our analyses show a significant overlap between sequence divergence and chromatin accessibility (Fig. 1l); however, there are many chromatin accessibility and gene expression changes that are not fully driven by a cis-acting DNA-sequence change50, and our study cannot comprehensively predict the downstream consequences of DNA-sequence changes. In contrast, direct epigenomic and/or transcriptomic comparisons can uncover the combined effect of cis-acting and trans-acting DNA-sequence changes, provided that samples are obtained with comparable quality across species. However, it is difficult, if not impossible, to predict the ancestral state of any functional output (e.g., chromatin accessibility), whereas using DNA sequence for ancestral sequence reconstruction provides a reliable readout to classify the human epigenome into its evolutionary divergence patterns.

Methods

Identification of lineage-specific substitutions

Since we are interested in the substitutions specific to the ape lineage and its derived lineages extending to the human lineage, we focused on ape, old world monkey and new world monkey species for our comparisons. We therefore extracted the following genomes from the UCSC 30-way alignment dataset51: Ape species: Homo sapiens (hg38), Pan troglodytes (panTro5), Gorilla gorilla (gorGor5), Pongo abelii (ponAbe2), Nomascus leucogenys (nomLeu3). Old world monkey species: Macaca mulatta (rheMac8), Macaca fascicularis (macFas5), Macaca nemestrina (macNem1), Cercocebus atys (cerAty1), Papio anubis (papAnu3), Chlorocebus sabaeus (chlSab2), Mandrillus leucophaeus (manLeu1), Nasalis larvatus (nasLar1), Colobus angolensis (colAng1), Rhinopithecus roxellana (rhiRox1), Rhinopithecus bieti (rhiBie1). New world monkey species: Callithrix jacchus (calJac3), Saimiri boliviensis (saiBol1), Cebus capucinus (cebCap1), Aotus nancymaae (aotNan1). We also retained only these species within the associated phylogenetic tree. We then extracted each GRE from this multi-species alignment by removing all sequences absent in the human genome, essentially removing all potential human-specific deletions (strip.gaps.msa from rphast (v1.6.11)52). Per GRE, we then computed the maximum likelihood for the tree with the pml function and optimized it under the F81 substitution model with the optim.pml function from phangorn (v2.10.0)53. Then we estimated the ancestral sequence probabilities using a maximum likelihood approach with the ancestral.pml function. We only considered ancestral states estimated with a probability > 0.75 for one of the nucleotides. If no nucleotide exceeded a probability of 0.75, we assigned the ancestral state as N (instead of A, C, T or G). These positions were discarded during the identification of the lineage-specific substitutions (see below).
With this approach, we could reconstruct ancestral sequences within apes, as well as the ancestral state of the entire ape lineage since we have old world monkeys as its sister group and new world monkeys as the outgroup. Since we are only focused on the ancestral sequences within apes and how they compare with the human sequence, we proceeded with the following sequences: human, hominin-ancestral, African great ape-ancestral, great ape-ancestral and ape-ancestral.
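The 0.75 probability rule for calling an ancestral state can be illustrated with a minimal sketch. The original analysis was performed in R with phangorn; the Python below is only an illustration, and the function name `call_ancestral_base` and the dictionary input format are assumptions introduced here.

```python
def call_ancestral_base(probs, threshold=0.75):
    """Return the most probable nucleotide if its probability is strictly
    greater than `threshold`; otherwise return 'N' (ambiguous state).

    `probs` is a hypothetical input format: a dict mapping each nucleotide
    to its estimated ancestral probability at one alignment position.
    """
    base, p = max(probs.items(), key=lambda kv: kv[1])
    return base if p > threshold else "N"


# A confidently reconstructed position and an ambiguous one.
confident = call_ancestral_base({"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02})
ambiguous = call_ancestral_base({"A": 0.60, "C": 0.30, "G": 0.05, "T": 0.05})
print(confident, ambiguous)
```

Positions that come back as N would then be excluded from the substitution calls, as stated in the text.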

We then identified substitutions that occurred in each lineage with the following criteria:

  1. Ancestral sequence should be confidently reconstructed in all lineages for consideration for a substitution (i.e., any sequence tagged with ‘N’ was not considered).

  2. Substitutions that occurred more than once across the 5 lineages were discarded (e.g., Human: A, Hominin: C, African Great Ape: C, Great Ape: G, Ape: G).

  3. Similarly, substitutions that were reversed in a daughter lineage were also discarded (e.g., Human: C, Hominin: C, African Great Ape: A, Great Ape: C, Ape: C).

  4. We then only retained the substitutions that putatively occurred once across the 5 lineages (e.g., Human: T, Hominin: T, African Great Ape: T, Great Ape: A, Ape: A).
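The four criteria above can be sketched as a single check on the reconstructed states at one position. This is an illustrative Python sketch rather than the authors' R code; the function name `classify_site`, the oldest-to-youngest input order, and the convention of attributing a change to the younger of the two nodes are assumptions made for the example.

```python
VALID = set("ACGT")


def classify_site(states):
    """Classify one alignment position.

    `states` is a list of (node_name, base) pairs ordered from the oldest
    reconstructed node to human. Returns (lineage, ancestral, derived) when
    exactly one change occurs along the path, otherwise None.
    """
    bases = [b for _, b in states]
    # Criterion 1: every node must be confidently reconstructed (no 'N').
    if any(b not in VALID for b in bases):
        return None
    # Positions where the state differs from the preceding (older) node.
    changes = [i for i in range(1, len(bases)) if bases[i] != bases[i - 1]]
    # Criteria 2-4: repeated changes and reversals both produce more than
    # one change along the path, so only single-change sites are retained.
    if len(changes) != 1:
        return None
    i = changes[0]
    return states[i][0], bases[i - 1], bases[i]


nodes = ["Ape", "Great Ape", "African Great Ape", "Hominin", "Human"]
single = classify_site(list(zip(nodes, "AATTT")))    # example from criterion 4
repeated = classify_site(list(zip(nodes, "GGCCA")))  # example from criterion 2
reversed_ = classify_site(list(zip(nodes, "CCACC"))) # example from criterion 3
print(single, repeated, reversed_)
```

Under the stated attribution convention, the criterion 4 example is assigned to the African great ape lineage, while the criterion 2 and 3 examples are discarded.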

We created a list of all substitutions across the 5 lineages, including the ancestral node at which the substitution took place, the ancestral nucleotide, the derived nucleotide, the position of the nucleotide in the human genome (hg38), the GRE that contains the substitution and the position of the substitution within the GRE (Supplementary Data 1–2).

To calculate the total number of substitutions, we summed all substitutions per lineage per GRE. To normalize for the branch length of each lineage (i.e., evolutionary time), we divided these sums by their corresponding branch length in million years (lower bounds of a previous phylogenetic estimate54; Human: 6, Hominin: 2, A.G. Ape: 8, G. Ape: 4, Ape: 13). To further adjust for the length of the GRE and calculate the substitution rate per kilobase, we then divided these values by the length of the GRE (in base pairs) and multiplied by 1000. This yielded a normalized substitution rate per million years per kilobase (per MY per kb) for each GRE for the human lineage and human-ancestral lineages.
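As a worked illustration of this normalization, the following Python sketch computes the rate per MY per kb from a raw substitution count. The branch lengths are those quoted above; the function and dictionary names are hypothetical.

```python
# Branch lengths in million years, as listed in the text.
BRANCH_LENGTH_MY = {
    "Human": 6,
    "Hominin": 2,
    "African Great Ape": 8,
    "Great Ape": 4,
    "Ape": 13,
}


def normalized_rate(n_substitutions, lineage, gre_length_bp):
    """Substitutions per million years per kilobase for one GRE and lineage."""
    return n_substitutions * 1000 / (BRANCH_LENGTH_MY[lineage] * gre_length_bp)


# Three human-lineage substitutions in a 500 bp GRE:
# 3 / 6 MY = 0.5 per MY; 0.5 / 500 bp * 1000 = 1 per MY per kb.
rate = normalized_rate(3, "Human", 500)
print(rate)
```

The same raw count on a shorter branch gives a higher rate: three substitutions in the same GRE on the hominin lineage (2 MY) correspond to 3 per MY per kb.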

We note that we performed our analysis on an adult and a fetal cortical brain dataset. While these datasets do not have resolution of all cortical regions (adult: posterior cingulate cortex, fetal: cerebral cortex), our previous analyses showed that, at least in the adult brain, there is a very high (~90%) overlap of GREs between datasets from different cortical regions13, indicating high reproducibility of DNA-based substitution results across cortical brain datasets by definition. We have therefore prioritized our dataset selection based on cellular resolution, multi-species comparison, and multiomic dataset availability with similar age and from a similar tissue13,15,16.

To identify chimpanzee-specific substitutions and chimpanzee-divergent GREs, we substituted human with chimpanzee and performed the same analysis as described above.

Cell type marker GREs

We grouped nuclei into broad cell type categories in both fetal and adult datasets. We then aggregated counts per sample for each broad cell type category. To identify GREs that are significantly more accessible (i.e., cell type marker GREs) per cell type, we performed differential analysis using edgeR (v3.36)55 on the aggregated matrices. GREs that were more accessible for the given cell type compared to other cell types were identified with FDR < 0.05 and logFC > 1 cutoffs (Supplementary Data 3). This analysis was performed separately for the fetal and adult datasets for the cell type categories described in Supplementary Fig. 1a, b.

Cell type marker analysis of the substitutions

To understand the substitution patterns within the cell type markers, we summed evolutionary time-normalized substitution values for all marker GREs per lineage per cell type. These values were then normalized for length by dividing them by the total length of all marker GREs per cell type. To highlight deviations of normalized substitutions across cell types, we then divided each value by the mean value across all cell types for each lineage. This yielded values around 1, with values > 1 indicating enrichment in substitutions compared to other cell types for each lineage. We tested the statistical significance of enrichments with a one-sided Chi-square test. P-values were adjusted for multiple testing using FDR. Results with FDR < 1e-5 were considered a significant deviation.
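The length normalization and mean-scaling steps can be sketched for a single lineage as follows. This is a hedged Python illustration that omits the Chi-square test; the function name, input dictionaries and toy numbers are hypothetical.

```python
def relative_substitution_load(rate_sums, marker_lengths):
    """Scale per-cell-type substitution load by the mean across cell types.

    rate_sums: {cell_type: sum of time-normalized substitutions over its marker GREs}
    marker_lengths: {cell_type: total length (bp) of its marker GREs}
    Both inputs describe one lineage.
    """
    per_bp = {ct: rate_sums[ct] / marker_lengths[ct] for ct in rate_sums}
    mean_per_bp = sum(per_bp.values()) / len(per_bp)
    return {ct: value / mean_per_bp for ct, value in per_bp.items()}


# Toy input: two cell types with equal marker length but different load.
load = relative_substitution_load(
    {"Microglia": 6.0, "Excitatory": 2.0},
    {"Microglia": 1000, "Excitatory": 1000},
)
print(load)
```

Values above 1 mark cell types whose marker GREs carry more substitutions than the cross-cell-type average for that lineage.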

Evolutionary classification of the GREs

To classify GREs based on their substitution differences across lineages, we implemented three different cutoffs. First, we randomly sampled 10,000 GREs 1000 times and calculated a background proportion of substitutions across lineages for each set of 10,000 randomly selected GREs. This yielded 1000 proportions across lineages. We then calculated the proportion for each GRE separately and empirically calculated the p-value as the number of times the background proportion exceeded the observation for each lineage divided by 1000 (subsequently adjusted for multiple testing using FDR). We identified the fold change by dividing the observed proportion by the median value of the randomly sampled proportions. We identified GREs divergent in a given lineage with cutoffs of FDR < 0.05 and fold change > 1.5. Second, we z-transformed the substitution proportions of all GREs per lineage and only retained significant GREs if their z-score was > 1 (i.e., more than one standard deviation above the mean) for the lineage in which they are significantly more divergent. This further highlighted GREs that contain a substantial number of substitutions for the given lineage compared to other GREs. Third, we further filtered all divergent GREs to contain at least 2 substitutions for the lineage in which they are divergent. We did not require a GRE to be divergent in only one lineage with these filters. However, we still detected a very low Jaccard similarity index (<0.05) of lineage-divergent GREs between any two lineages, indicating very low to no overlap of divergent GREs between lineages.
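The first cutoff (empirical p-value, fold change and FDR) can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the function names and the per-GRE dictionary format are hypothetical, the background proportion is assumed to be pooled over each random sample, and FDR is assumed to mean the Benjamini-Hochberg procedure.

```python
import random
from statistics import median


def pooled_proportion(gres, lineage):
    """Share of all substitutions in `gres` that fall on `lineage`."""
    on_lineage = sum(g[lineage] for g in gres)
    total = sum(sum(g.values()) for g in gres)
    return on_lineage / total if total else 0.0


def background_proportions(all_gres, lineage, n_draws=1000, sample_size=10_000, seed=0):
    """Proportions from repeated random samples of GREs (list of dicts)."""
    rng = random.Random(seed)
    k = min(sample_size, len(all_gres))
    return [pooled_proportion(rng.sample(all_gres, k), lineage) for _ in range(n_draws)]


def empirical_p_and_fold(observed, background):
    """p = fraction of background values exceeding the observation;
    fold = observation divided by the background median."""
    p = sum(b > observed for b in background) / len(background)
    med = median(background)
    fold = observed / med if med > 0 else float("inf")
    return p, fold


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p, fold = empirical_p_and_fold(0.35, [0.1, 0.2, 0.3, 0.4])
print(p, fold, bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

A GRE would then pass this first filter when its adjusted p-value is below 0.05 and its fold change exceeds 1.5, before the z-score and minimum-substitution filters are applied.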

To identify conserved GREs, we found all GREs with normalized substitution value lower than at least 50% of the GREs for each lineage. We then retained the intersection of these GREs among all lineages. We additionally required this list to not contain any previously identified lineage-divergent GREs. We named the resulting list ‘conserved GREs’.
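The conserved set can be expressed as an intersection of below-median GREs across lineages. The Python sketch below assumes that "lower than at least 50% of the GREs" corresponds to a strict comparison against the per-lineage median; the function name and input format are hypothetical.

```python
from statistics import median


def conserved_gres(rates, divergent):
    """Return GREs below the median rate in every lineage and not divergent.

    rates: {gre_id: {lineage: normalized substitution rate}}
    divergent: set of GRE ids already called lineage-divergent
    """
    lineages = list(next(iter(rates.values())).keys())
    keep = set(rates)
    for lineage in lineages:
        cutoff = median(r[lineage] for r in rates.values())
        keep &= {gre for gre, r in rates.items() if r[lineage] < cutoff}
    return keep - set(divergent)


toy_rates = {
    "gre1": {"Human": 0.0, "Ape": 0.0},
    "gre2": {"Human": 1.0, "Ape": 0.1},
    "gre3": {"Human": 2.0, "Ape": 3.0},
    "gre4": {"Human": 3.0, "Ape": 2.0},
}
conserved = conserved_gres(toy_rates, divergent={"gre2"})
print(conserved)
```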

Enrichment with species-specific accessibility changes

We combined the human-specific chromatin accessibility changes (compared to chimpanzee and rhesus macaque) across all cell types from a recent comparative single-nucleus ATAC-seq study (the human dataset of this reference is also the adult dataset we used in this study)13. We defined conserved chromatin accessibilities as GREs that did not display a species-specific (across human, chimpanzee, rhesus macaque) accessibility in any cell type. To perform enrichment while accounting for GRE length, we performed a logistic regression with accessibility classification as the response variable (Human-specific or others; Conserved or others), and GRE length and substitution classification as the predictor variables (Human-divergent or others, Hominin-divergent or others, etc.). The P-value was computed with Wald’s test and FDR was calculated for multiple testing correction. Results with FDR < 0.05 were considered significant enrichments. We note that this enrichment was only done with the adult dataset since comparative single-nucleus accessibility results are from adult cortical brain tissues.

Comparisons with HARs

Previously published HARs34,56–59 and cortical HARs13 were overlapped with lineage-divergent GREs using bedtools (v2.29)60. We identified lineage-specific substitutions within these regions as described above. To compute the GC conversion ratio, we divided the total number of conversions from A/T to G/C by all detected substitutions per HAR and lineage-divergent GRE.
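The GC conversion ratio described here is a simple fraction, sketched below in Python. The function name and the (ancestral, derived) pair format are assumptions for illustration.

```python
WEAK = {"A", "T"}    # bases forming weaker (two hydrogen bond) pairs
STRONG = {"G", "C"}  # bases forming stronger (three hydrogen bond) pairs


def gc_conversion_ratio(substitutions):
    """Fraction of substitutions converting A/T to G/C within one region.

    substitutions: iterable of (ancestral_base, derived_base) pairs.
    Returns None when the region has no substitutions.
    """
    subs = list(substitutions)
    if not subs:
        return None
    weak_to_strong = sum(1 for anc, der in subs if anc in WEAK and der in STRONG)
    return weak_to_strong / len(subs)


ratio = gc_conversion_ratio([("A", "G"), ("T", "C"), ("G", "A"), ("C", "G")])
print(ratio)
```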

Cell type marker enrichment of lineage-divergent GREs

We performed two-tailed Fisher’s exact tests to determine whether overlaps between cell type marker GREs and lineage-divergent / conserved GREs are significantly enriched / depleted. We used all GREs for each dataset (adult or fetal) as the background. Overlaps with FDR < 0.05 and odds ratio > 1 were labeled as enriched and overlaps with FDR < 0.05 and odds ratio < 1 were labeled as depleted.
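For readers unfamiliar with the test, the following standard-library Python sketch computes a two-tailed Fisher's exact test on a 2x2 overlap table. It is illustrative only: the function name is hypothetical, the two-sided p-value uses the common convention of summing all tables no more probable than the observed one, and a statistics library would be preferable for tables built from hundreds of thousands of GREs.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    For the overlap question: a = marker and divergent, b = marker only,
    c = divergent only, d = neither (all counted within the background).
    Returns (odds_ratio, p_value).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Sum tables at most as probable as the observed one (small tolerance
    # guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, min(p, 1.0)


odds, pval = fisher_exact_two_tailed(3, 1, 1, 3)
print(odds, pval)
```

An odds ratio above 1 with a significant adjusted p-value corresponds to the "enriched" label, and an odds ratio below 1 to "depleted".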

Identification of regulation divergent genes

To identify GRE-gene expression linkage, we utilized multiome datasets obtained from fetal and adult brains15,16. Since the adult brain multiome dataset is from a different cortical brain region (prefrontal cortex instead of posterior cingulate cortex in the cross-species dataset), we re-generated the multiome GRE-cell count matrix on our set of GREs using FeatureMatrix from Signac (v1.10)61. Unsurprisingly, total read counts were tightly correlated between the original and the newly generated matrices (Spearman’s rho ≈ 0.99) since there is a high degree of overlap between GREs from different adult cortical regions. We then calculated potential GRE-gene expression links using the LinkPeaks function from Signac, which utilizes the correlation between accessibility and gene expression of nearby genes and compares it with randomly selected associations61,62. Significant links were identified with FDR < 0.05 and score > 0.01 cutoffs. We performed this analysis separately for fetal and adult datasets.

We then sought to identify genes that are linked to significantly more lineage-divergent GREs than expected (which we refer to as regulatorily divergent genes or RDGs). To achieve this, we first retained the genes that had sufficient power to perform this analysis by requiring a gene to be linked to at least 5 GREs and at least 2 lineage-divergent GREs. To create a background distribution, we then randomly selected the same number of GREs for each lineage-divergent GRE group (e.g., Human-divergent GREs) and found the number of linked GREs for each gene. This was repeated 1000 times separately for each lineage-divergent GRE group. We then obtained an empirical p-value by counting the number of randomized events that resulted in a greater number of linked GREs than the given lineage-divergent GRE group. After false discovery rate (FDR) calculation, we then sought to evaluate the biological relevance of RDG calls using different FDR cutoffs. To do this, we ran an enrichment of our RDG calls on the human lineage with human-specific differentially expressed genes (HS-DEGs) that were previously identified13. We reasoned that greater enrichment is more indicative of a functional consequence for the given selection of RDG calls. We found that FDR < 0.05 yielded a better enrichment than both higher and lower FDR cutoffs (Supplementary Fig. 9). We also wondered if requiring RDG calls to be linked to a greater number of lineage-divergent GREs (on the same lineage that is being tested) can increase their enrichment with HS-DEGs. After testing 3, 4 and 5 GREs, we found that 5 GREs yielded a noticeably greater enrichment compared to the randomized background distribution (Supplementary Fig. 9). Therefore, we required the final RDG calls to have FDR < 0.05 and to be linked to at least 5 lineage-divergent GREs. We performed this analysis separately for fetal and adult datasets.
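The permutation step for a single gene and a single lineage-divergent group can be sketched as follows. This Python illustration makes several assumptions not stated in the text: random GRE sets are drawn from all GREs without replacement, the count of exceeding permutations is divided by the number of permutations, and all names are hypothetical.

```python
import random


def is_testable(gene_links, divergent, min_links=5, min_divergent=2):
    """Power filter: at least 5 linked GREs, at least 2 of them divergent."""
    return len(gene_links) >= min_links and len(gene_links & divergent) >= min_divergent


def rdg_empirical_p(gene_links, divergent, all_gres, n_perm=1000, seed=0):
    """Empirical p-value for one gene and one lineage-divergent GRE group.

    gene_links: set of GRE ids linked to the gene
    divergent:  set of GRE ids divergent in the lineage being tested
    all_gres:   list of all GRE ids (sampling universe, an assumption here)
    """
    rng = random.Random(seed)
    observed = len(gene_links & divergent)
    exceed = 0
    for _ in range(n_perm):
        # Random GRE set of the same size as the divergent group.
        random_set = set(rng.sample(all_gres, len(divergent)))
        if len(gene_links & random_set) > observed:
            exceed += 1
    return observed, exceed / n_perm


universe = list(range(100))
links = set(range(10))
observed, p = rdg_empirical_p(links, divergent=set(range(10)), all_gres=universe)
print(observed, p)
```

In this toy case every linked GRE is divergent, so no random set can exceed the observed count and the empirical p-value is 0.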

Gene ontology enrichments

To perform gene ontology enrichments on the genomic regions, we aimed to capture enrichments that are associated with both divergent GREs and RDGs. To achieve this, we only retained GREs that are linked to at least one gene. Additionally, to highlight enrichments that differ across the lineages, we set the background as all lineage-divergent GREs (plus conserved GREs to provide an additional contrast) and the foreground as GREs divergent in one lineage. We then performed gene ontology enrichment with rGREAT (v1.26)21,22 and considered terms with adjusted p-value < 0.05 and fold enrichment > 1.3 significant. We additionally required each enriched term to be associated with at least 10 divergent GREs and at least 1 RDG for the given lineage for increased robustness of the results. Both requirements filtered an additional >50% of the enrichment terms in each lineage. We then reduced the redundancy of the final list of enriched terms using rrvgo (v.1.6.0) (a Revigo-style package in the R language)63,64.

Identification of TFBS evolution patterns across lineages

To identify motif occurrences in the human and human-ancestral sequences, we downloaded the JASPAR 2022 non-redundant H. sapiens motif dataset and found motif occurrences in all GREs for each lineage using the matchMotifs and motifMatches functions from the motifmatchr (v.1.16) package65. This yielded a binary matrix of motif presence / absence per lineage and per motif. We performed this analysis for all GREs since TFBS gain or loss can occur and potentially lead to functional changes in any GRE. For each motif, we scanned all GREs to find the lineages that gained / lost the motif. Similar to substitutions, we only considered the gain / loss of motifs if they were gained / lost exactly once. Gain / loss was assigned to the lineage in which the change occurred. To understand the putative TFBS expansion / depletion across the entire epigenome, we counted gains and losses per motif across all GREs and divided total gains by total losses per lineage.
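The single-event rule for motif gains and losses mirrors the substitution criteria and can be sketched in Python. The presence / absence calls would come from the motif scan; the function names, the oldest-to-youngest ordering, and the attribution of an event to the younger node are assumptions for this illustration.

```python
def classify_motif_event(presence):
    """Classify one motif in one GRE.

    presence: list of (node_name, has_motif) ordered from the oldest
    reconstructed node to human. Returns (lineage, 'gain' | 'loss') when the
    motif state changes exactly once, otherwise None.
    """
    changes = [i for i in range(1, len(presence)) if presence[i][1] != presence[i - 1][1]]
    if len(changes) != 1:
        return None
    i = changes[0]
    return presence[i][0], "gain" if presence[i][1] else "loss"


def gain_loss_ratio(events, lineage):
    """Total gains divided by total losses for one motif and one lineage."""
    gains = sum(1 for e in events if e == (lineage, "gain"))
    losses = sum(1 for e in events if e == (lineage, "loss"))
    return gains / losses if losses else float("inf")


nodes = ["Ape", "Great Ape", "African Great Ape", "Hominin", "Human"]
gain = classify_motif_event(list(zip(nodes, [False, False, False, False, True])))
loss = classify_motif_event(list(zip(nodes, [True, True, False, False, False])))
flip = classify_motif_event(list(zip(nodes, [True, False, True, True, True])))
print(gain, loss, flip)
```

Events collected across all GREs for a motif can then be passed to `gain_loss_ratio` to obtain the per-lineage ratio used in the expansion / depletion tests.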

Significant expansion / depletion of a TFBS across lineages was identified by Chi-square comparison of gain/loss ratio to (i) overall gain / loss ratio of all TFBS in all lineages and (ii) gain / loss ratio in other lineages for the given TFBS. To consider deviations significant, FDR < 0.01 was required for both comparisons in the same direction (>1 for expansions, <1 for depletions). We further filtered the TFBS for the accessibility/expression of their TF in at least 25% of the cells in at least one cell type category using the original annotations from the reference datasets13,16.

Enrichment with accessibility changes was performed as described above for lineage-divergent GREs, except with putatively gained + lost targets of human-expanded TFs, ape-expanded TFs or other TFs.

Cell type enrichment of TFBS evolution patterns

GREs that gained / lost a given TFBS were tested for enrichment in cell type marker GREs using a two-tailed Fisher’s exact test. Enrichments were determined with FDR < 0.05 and odds ratio > 1.3 cutoffs. Since many TFBSs are from the same family and have similar motifs, we also combined all TFBS gains / losses for some of these families (MEF2, bHLH-PAS, ETS-related) (Supplementary Fig. 5) and performed the same analysis to elucidate more robust cell type evolution patterns (Fig. 4h–k).

LDSC regression

To perform LD (linkage-disequilibrium) score (LDSC) regression analysis, we expanded each GRE 25 kb both upstream and downstream. We ranked GREs based on divergence using fold change of divergence across lineages. Conservation score was calculated as 1 / mean fold change across all lineages per GRE. Top 20000, 10000 or 5000 GREs were selected to perform LDSC regressions.
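The conservation score and the 25 kb expansion can be illustrated with a short Python sketch. The function names, the clamping of window starts at zero, and the toy fold changes are assumptions; the text does not specify how chromosome ends were handled.

```python
def conservation_score(fold_changes):
    """1 / mean fold change of divergence across all lineages for one GRE."""
    mean_fc = sum(fold_changes) / len(fold_changes)
    return 1 / mean_fc if mean_fc > 0 else float("inf")


def expand_gre(start, end, flank=25_000):
    """Extend a GRE interval by `flank` bp on both sides (start clamped at 0)."""
    return max(0, start - flank), end + flank


def top_n(scores, n):
    """Ids of the n highest-scoring GREs (scores: {gre_id: score})."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


score = conservation_score([0.5, 0.5, 0.5, 0.5, 0.5])
window = expand_gre(30_000, 30_500)
print(score, window)
```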

Genome-wide association study (GWAS) summary statistics from brain disorders (ADHD66: attention-deficit/hyperactivity disorder, ASD67: autism spectrum disorders, BP68: bipolar disorder, SCZ69: schizophrenia, MDD70: major depressive disorder, AD71: Alzheimer’s disease), brain traits (INT72: intelligence, COG73: cognitive function) and non-brain disorders (OST74: osteoporosis, CAD75: coronary artery disease) were downloaded and arranged for use in LDSC regression using munge_sumstats (v1.0.1)38. LDSC regressions were then performed with the recommended parameters38. All GREs (separately for fetal and adult datasets) were used as a background in all regressions. Results with FDR < 0.1 were considered weak enrichments and results with FDR < 0.01 were considered strong enrichments.

To assess the contribution of 25 kb flanking regions in LDSC regression statistics, we performed the same analysis by shifting the 50 kb window in 5 kb increments from −100 kb to +100 kb relative to the center of the GRE, similar to a previous study33.
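One way to read this shifting scheme is sketched below in Python. It assumes that the offsets refer to the center of a fixed 50 kb window and that both endpoints (−100 kb and +100 kb) are included; the function name is hypothetical.

```python
def shifted_windows(gre_center, window=50_000, step=5_000, max_shift=100_000):
    """Return (offset, start, end) for each shifted window around a GRE center.

    Offsets run from -max_shift to +max_shift in `step` increments, and each
    window of width `window` is centered at gre_center + offset.
    Starts are clamped at 0.
    """
    half = window // 2
    windows = []
    for offset in range(-max_shift, max_shift + 1, step):
        center = gre_center + offset
        windows.append((offset, max(0, center - half), center + half))
    return windows


windows = shifted_windows(1_000_000)
print(len(windows), windows[0], windows[20], windows[-1])
```

Under these assumptions each GRE yields 41 windows, with the unshifted window (offset 0) spanning 25 kb on either side of the GRE center.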

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Acknowledgements

The authors thank Dr. Soojin V. Yi and Elliot Outland for their critical comments on the manuscript. G.K. is a Jon Heighten Scholar in Autism Research and Townsend Distinguished Chair in Research on Autism Spectrum Disorders at UT Southwestern. E.C. is a Neural Scientist Training Program Fellow in the Peter O’Donnell Brain Institute at UT Southwestern. This work was partially supported by the James S. McDonnell Foundation 21st Century Science Initiative in Understanding Human Cognition Scholar Award (220020467), the Simons Foundation (947591), NHGRI (HG011641), NINDS (NS115821, NS126143) and NIMH (MH126481, MH103517) to G.K.

Author contributions

E.C. conceptualized the study, performed all analyses and wrote the manuscript. G.K. provided guidance and supervision. E.C. and G.K. edited the manuscript.

Peer review

Peer review information

Nature Communications thanks Gabriel Santpere and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Data availability

The processed data generated in this study are provided in the Source Data files. The adult single-nucleus ATAC-seq data used in this study are available in the GEO database under accession code GSE192774. The adult single nucleus multiome [ATAC + RNA] data used in this study are available in the GEO database under accession code GSE207334. The fetal single nucleus multiome [ATAC + RNA] data used in this study are available in the GEO database under accession code GSE162170. The UCSC multi-species alignment data [multiz30way] used in this study are available in the UCSC database [https://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz30way/].

Code availability

All analysis code has been deposited in the GitHub repository at https://github.com/konopkalab/HumanEpigenome_Evo (ref. 76).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Emre Caglayan, Email: emre.caglayan@utsouthwestern.edu, Email: emre.caglayan@childrens.harvard.edu.

Genevieve Konopka, Email: genevieve.konopka@utsouthwestern.edu, Email: gena@alum.mit.edu.

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-025-60665-w.

References

  • 1. Alfoldi, J. & Lindblad-Toh, K. Comparative genomics as a tool to understand evolution and disease. Genome Res. 23, 1063–1068 (2013).
  • 2. Zoonomia Consortium. A comparative genomics multitool for scientific discovery and conservation. Nature 587, 240–245 (2020).
  • 3. Franchini, L. F. & Pollard, K. S. Human evolution: the non-coding revolution. BMC Biol. 15, 89 (2017).
  • 4. Xue, J. R. et al. The functional and evolutionary impacts of human-specific deletions in conserved elements. Science 380, eabn2253 (2023).
  • 5. McLean, C. Y. et al. Human-specific loss of regulatory DNA and the evolution of human-specific traits. Nature 471, 216–219 (2011).
  • 6. Mangan, R. J. et al. Adaptive sequence divergence forged new neurodevelopmental enhancers in humans. Cell 185, 4587–4603.e4523 (2022).
  • 7. Dutrow, E. V. et al. Modeling uniquely human gene regulatory function via targeted humanization of the mouse genome. Nat. Commun. 13, 304 (2022).
  • 8. Boyd, J. L. et al. Human-chimpanzee differences in a FZD8 enhancer alter cell-cycle dynamics in the developing neocortex. Curr. Biol. 25, 772–779 (2015).
  • 9. Girskis, K. M. et al. Rewiring of human neurodevelopmental gene regulatory programs by human accelerated regions. Neuron 109, 3239–3251.e3237 (2021).
  • 10. Uebbing, S. et al. Massively parallel discovery of human-specific substitutions that alter enhancer activity. Proc. Natl. Acad. Sci. USA 118, e2007049118 (2021).
  • 11. Whalen, S. et al. Machine learning dissection of human accelerated regions in primate neurodevelopment. Neuron 111, 857–873.e858 (2023).
  • 12. Bakken, T. E. et al. Comparative cellular analysis of motor cortex in human, marmoset and mouse. Nature 598, 111–119 (2021).
  • 13. Caglayan, E. et al. Molecular features driving cellular complexity of human brain evolution. Nature 620, 145–153 (2023).
  • 14. Khrameeva, E. et al. Single-cell-resolution transcriptome map of human, chimpanzee, bonobo, and macaque brains. Genome Res. 30, 776–789 (2020).
  • 15. Ma, S. et al. Molecular and cellular evolution of the primate dorsolateral prefrontal cortex. Science 377, eabo7257 (2022).
  • 16. Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069.e5023 (2021).
  • 17. Kostka, D., Hubisz, M. J., Siepel, A. & Pollard, K. S. The role of GC-biased gene conversion in shaping the fastest evolving regions of the human genome. Mol. Biol. Evol. 29, 1047–1057 (2012).
  • 18. Reiterer, M., Schmidt-Kastner, R. & Milton, S. L. Methionine sulfoxide reductase (Msr) dysfunction in human brain disease. Free Radic. Res. 53, 1144–1154 (2019).
  • 19. Ziegler, G. C. et al. A common CDH13 variant is associated with low agreeableness and neural responses to working memory tasks in ADHD. Genes (Basel) 12, 1356 (2021).
  • 20. Popova, G. et al. Human microglia states are conserved across experimental models and regulate neural stem cell responses in chimeric organoids. Cell Stem Cell 28, 2153–2166.e2156 (2021).
  • 21. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
  • 22. Gu, Z. & Hubschmann, D. rGREAT: an R/bioconductor package for functional enrichment on genomic regions. Bioinformatics 39, btac745 (2023).
  • 23. Ataman, B. et al. Evolution of Osteocrin as an activity-regulated factor in the primate brain. Nature 539, 242–247 (2016).
  • 24. Konopka, G. et al. Human-specific transcriptional networks in the brain. Neuron 75, 601–617 (2012).
  • 25. Liu, Y. et al. An extra-circadian function for human CLOCK in the neocortex. bioRxiv 10.1101/2023.03.13.531623 (2023).
  • 26. Amati, B. & Land, H. Myc-Max-Mad: a transcription factor network controlling cell cycle progression, differentiation and death. Curr. Opin. Genet. Dev. 4, 102–108 (1994).
  • 27. Sharma, N. et al. ARNT2 tunes activity-dependent gene expression through NCoR2-mediated repression and NPAS4-mediated activation. Neuron 102, 390–406.e399 (2019).
  • 28. Sun, L. O. et al. Spatiotemporal control of CNS myelination by oligodendrocyte programmed cell death through the TFEB-PUMA Axis. Cell 175, 1811–1826.e1821 (2018).
  • 29. Pattabiraman, K., Muchnik, S. K. & Sestan, N. The evolution of the human brain and disease susceptibility. Curr. Opin. Genet. Dev. 65, 91–97 (2020).
  • 30. Gluckman, P. D., Low, F. M., Buklijas, T., Hanson, M. A. & Beedle, A. S. How evolutionary principles improve the understanding of human health and disease. Evol. Appl. 4, 249–263 (2011).
  • 31. Usui, N., Co, M. & Konopka, G. Decoding the molecular evolution of human cognition using comparative genomics. Brain Behav. Evol. 84, 103–116 (2014).
  • 32. Won, H., Huang, J., Opland, C. K., Hartl, C. L. & Geschwind, D. H. Human evolved regulatory elements modulate genes involved in cortical expansion and neurodevelopmental disease susceptibility. Nat. Commun. 10, 2396 (2019).
  • 33. Jeong, H. et al. Evolution of DNA methylation in the human brain. Nat. Commun. 12, 2021 (2021).
  • 34. Doan, R. N. et al. Mutations in human accelerated regions disrupt cognition and social behavior. Cell 167, 341–354.e312 (2016).
  • 35. Castelijns, B. et al. Hominin-specific regulatory elements selectively emerged in oligodendrocytes and are disrupted in autism patients. Nat. Commun. 11, 301 (2020).
  • 36. Berto, S. et al. Accelerated evolution of oligodendrocytes in the human brain. Proc. Natl. Acad. Sci. USA 116, 24334–24342 (2019).
  • 37. Banerjee, N. et al. Recently evolved human-specific methylated regions are enriched in schizophrenia signals. BMC Evol. Biol. 18, 63 (2018).
  • 38. Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
  • 39. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
  • 40.Gosselin, D. et al. An environment-dependent transcriptional network specifies human microglia identity. Science356, eaal3222 (2017). [DOI] [PMC free article] [PubMed]
  • 41.Kracht, L. et al. Human fetal microglia acquire homeostatic immune-sensing properties early in development. Science369, 530–537 (2020). [DOI] [PubMed] [Google Scholar]
  • 42. Andrews, G. et al. Mammalian evolution of human cis-regulatory elements and transcription factor binding sites. Science 380, eabn7930 (2023).
  • 43. Christmas, M. J. et al. Evolutionary constraint and innovation across hundreds of placental mammals. Science 380, eabn3943 (2023).
  • 44. Zhang, X., Fang, B. & Huang, Y. F. Transcription factor binding sites are frequently under accelerated evolution in primates. Nat. Commun. 14, 783 (2023).
  • 45. Hujoel, M. L. A., Gazal, S., Hormozdiari, F., van de Geijn, B. & Price, A. L. Disease heritability enrichment of regulatory elements is concentrated in elements with ancient sequence age and conserved function across species. Am. J. Hum. Genet. 104, 611–624 (2019).
  • 46. Sullivan, P. F. et al. Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Science 380, eabn2937 (2023).
  • 47. O’Bleness, M., Searles, V. B., Varki, A., Gagneux, P. & Sikela, J. M. Evolution of genetic and genomic features unique to the human lineage. Nat. Rev. Genet. 13, 853–866 (2012).
  • 48. Benton, M. L. et al. The influence of evolutionary history on human health and disease. Nat. Rev. Genet. 22, 269–283 (2021).
  • 49. Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of Autism. Cell 180, 568–584.e523 (2020).
  • 50. Hill, M. S., Vande Zande, P. & Wittkopp, P. J. Molecular and evolutionary processes generating variation in gene expression. Nat. Rev. Genet. 22, 203–215 (2021).
  • 51. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).
  • 52. Hubisz, M. J., Pollard, K. S. & Siepel, A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief. Bioinform. 12, 41–51 (2011).
  • 53. Schliep, K. P. phangorn: phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011).
  • 54. Locke, D. P. et al. Comparative and demographic analysis of orangutan genomes. Nature 469, 529–533 (2011).
  • 55. Chen, Y., Lun, A. T. & Smyth, G. K. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Res. 5, 1438 (2016).
  • 56. Pollard, K. S. et al. An RNA gene expressed during cortical development evolved rapidly in humans. Nature 443, 167–172 (2006).
  • 57. Prabhakar, S., Noonan, J. P., Paabo, S. & Rubin, E. M. Accelerated evolution of conserved noncoding sequences in humans. Science 314, 786 (2006).
  • 58. Bird, C. P. et al. Fast-evolving noncoding sequences in the human genome. Genome Biol. 8, R118 (2007).
  • 59. Gittelman, R. M. et al. Comprehensive identification and analysis of human accelerated regulatory DNA. Genome Res. 25, 1245–1255 (2015).
  • 60. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
  • 61. Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).
  • 62. Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116.e1120 (2020).
  • 63. Sayols, S. rrvgo: a Bioconductor package for interpreting lists of gene ontology terms. MicroPubl. Biol. 2023, 10.17912/micropub.biology.000811 (2023).
  • 64. Supek, F., Bosnjak, M., Skunca, N. & Smuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE 6, e21800 (2011).
  • 65. Schep, A. motifmatchr: Fast Motif Matching in R. R Package Version 1.4.0. https://greenleaflab.github.io/motifmatchr/ (2018).
  • 66. Martin, J. et al. A genetic investigation of sex bias in the prevalence of attention-deficit/hyperactivity disorder. Biol. Psychiatry 83, 1044–1053 (2018).
  • 67. Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
  • 68. Mullins, N. et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat. Genet. 53, 817–829 (2021).
  • 69. Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).
  • 70. Howard, D. M. et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat. Neurosci. 22, 343–352 (2019).
  • 71. Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
  • 72. Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).
  • 73. Davies, G. et al. Study of 300,486 individuals identifies 148 independent genetic loci influencing general cognitive function. Nat. Commun. 9, 2098 (2018).
  • 74. Estrada, K. et al. Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci associated with risk of fracture. Nat. Genet. 44, 491–501 (2012).
  • 75. Schunkert, H. et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338 (2011).
  • 76. Caglayan, E. & Konopka, G. Decoding DNA sequence-driven evolution of the human brain epigenome at cellular resolution. Zenodo 10.5281/zenodo.15376944 (2025).

Associated Data


Supplementary Materials

Supplementary Information (102.5MB, zip)
Supplementary Information (30.1MB, xlsx)
Supplementary Information (27.9KB, xlsx)
Reporting Summary (118.6KB, pdf)
Peer Review file (2MB, pdf)
Supplementary Information (27.1KB, xlsx)

Data Availability Statement

The processed data generated in this study are provided in the Source Data files. The adult single-nucleus ATAC-seq data used in this study are available in the GEO database under accession code GSE192774. The adult single-nucleus multiome [ATAC + RNA] data used in this study are available in the GEO database under accession code GSE207334. The fetal single-nucleus multiome [ATAC + RNA] data used in this study are available in the GEO database under accession code GSE162170. The UCSC multi-species alignment data [multiz30way] used in this study are available in the UCSC database [https://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz30way/].

All analysis code has been deposited in the GitHub repository at https://github.com/konopkalab/HumanEpigenome_Evo (ref. 76).


Articles from Nature Communications are provided here courtesy of Nature Publishing Group
