Author manuscript; available in PMC: 2025 Oct 1.
Published in final edited form as: Curr Opin Genet Dev. 2024 Aug 31;88:102256. doi: 10.1016/j.gde.2024.102256

Massively parallel approaches for characterizing non-coding functional variation in human evolution

Stephen Rong 1,#, Elise Root 1, Steven K Reilly 1,2,#
PMCID: PMC11648527  NIHMSID: NIHMS2021838  PMID: 39217658

Abstract

The genetic differences underlying unique phenotypes in humans compared to our closest primate relatives have long remained a mystery. Similarly, the genetic basis of adaptations between human groups during our expansion across the globe is poorly characterized. Uncovering the downstream phenotypic consequences of these genetic variants has been difficult, as a substantial portion lies in non-coding regions such as cis-regulatory elements (CREs). Here, we review recent high-throughput approaches to measure the functions of CREs and the impact of variation within them. CRISPR screens can directly perturb CREs in the genome to understand downstream impacts on gene expression and phenotypes, while massively parallel reporter assays (MPRAs) can decipher the regulatory impact of sequence variants. Machine learning has begun to predict regulatory function from sequence alone, further scaling our ability to characterize genome function. Applying these tools across diverse phenotypes, model systems, and ancestries is beginning to revolutionize our understanding of non-coding variation underlying human evolution.

Keywords: human evolution, non-coding variation, CRISPR, MPRA, machine learning

Introduction

The rapid expansion of modern and ancient human genomes from around the globe [1,2] as well as the increasing scale of mammalian genome databases [3] is greatly advancing our understanding of human evolutionary genetics. The resulting catalog of genetic changes in our evolutionary history may underlie the unique phenotypes that characterize our species [4,5] or the distinctive local adaptations of different ancestry groups [6]. However, identifying which of these many genetic changes are functional, or how they mechanistically impart phenotypic change, remains a major challenge for human evolutionary studies.

This difficulty is due in part to the genome’s scale and the field’s limited ability to interpret the functional impacts of genetic changes. First, there are an estimated 35 million single nucleotide differences between humans and chimpanzees, our nearest extant relatives [5]. Even within modern humans, any two unrelated individuals are 99.9% identical, yet differ by ~2–4 million single nucleotide variants [7]. Second, causal trait-associated functional variants disproportionately occur in non-coding regions [8], modifying cis-regulatory elements (CREs), such as enhancers, promoters, and silencers, that regulate cell type-specific gene expression [9]. Predicting the phenotypic impact of non-coding variation has historically been challenging, but recent advances in high-throughput functional characterization tools have begun to reveal new insights into its consequences.

Unlike the genetic code for codons, we do not yet understand the context-specific, regulatory grammar of CREs well enough to predict non-coding variant effects from sequence. Mapping of epigenetic marks and transcription factor (TF) binding correlated with CRE activity can help nominate functional variants, but these marks’ widespread genomic prevalence means they cannot usually pinpoint causal non-coding variants [10]. Moreover, CREs can target distant genes through 3D chromatin interactions, making it hard to predict the gene expression effects of non-coding variation. Genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies can correlate variants with complex traits, diseases, or gene expression, but are only applicable to studying within-species variation, and current studies lack sufficient diversity to interpret global variation across human ancestries [11]. Linkage disequilibrium (LD), which is often especially high in recently adaptive loci, creates additional difficulties in distinguishing the few causal variants from the many linked non-causal variants.
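To make the LD problem concrete, the following is a minimal Python sketch of the standard r² statistic between two biallelic variants, computed from phased haplotypes. The function name and the toy haplotype vectors are illustrative assumptions, not data from any study cited here. When r² approaches 1, as can happen across a recent sweep, association signals at the two variants are statistically indistinguishable, which is why functional assays are needed to separate causal from linked variants.

```python
def r_squared(hap_a, hap_b):
    """Return the LD statistic r^2 for two biallelic variants.

    hap_a and hap_b are equal-length lists of 0/1 alleles observed on the
    same set of phased haplotypes (hypothetical toy input).
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                      # allele frequency at variant A
    p_b = sum(hap_b) / n                                      # allele frequency at variant B
    p_ab = sum(a and b for a, b in zip(hap_a, hap_b)) / n     # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                      # coefficient of disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a variant is monomorphic")
    return d * d / denom


# Toy haplotypes: the first pair is perfectly linked, the second is unlinked.
linked = r_squared([1, 1, 0, 0], [1, 1, 0, 0])
unlinked = r_squared([1, 1, 0, 0], [1, 0, 1, 0])
```

In this toy case the perfectly linked pair gives r² = 1 and the unlinked pair gives r² = 0; real sweep regions typically fall in between, with many variants in high LD with the causal one.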

This review highlights recent advances in massively parallel experiments and computational prediction tools that are beginning to transform human evolution research by linking causal evolutionary genetic changes to downstream impacts. Such tools must be capable of reading out non-coding functions, have genome-scale throughput, and be applicable across phenotypically relevant cell types and developmental time points. We focus on three major areas. Firstly, advances in CRISPR technologies allow researchers to directly perturb CREs and link them to target gene expression and phenotypes. Secondly, reporter assays enable testing of tens of thousands of non-coding regions and variants simultaneously. These functional tools can be further integrated with insights from transgenic and humanized mouse models [12,13] or be applied to an increasing variety of stem cell and organoid models [4], providing a powerful toolkit for comprehensively characterizing CREs. Finally, machine learning (ML) models are increasingly capable of predicting regulatory activity and genome function from sequence alone.

For this review, we will focus on how these tools have been used to study genetic changes spanning several major periods of human evolutionary history (Figure 1A), though they can be applied to additional evolutionary questions and species. These periods include: 1) the origins of humans following our divergence from the chimpanzee and bonobo lineage, during which we evolved the derived anatomical, physiological, and cognitive traits that differentiate us from other great apes (Figure 1B-D) [4,5]; 2) the evolutionary divergence and subsequent introgression events between anatomically modern humans and our now-extinct archaic human relatives such as Neanderthals and Denisovans (Figure 1E,F) [14,15]; and 3) recent local adaptation within modern human ancestry groups to different environments, pathogens, and diets (Figure 1G) [6].

Figure 1. Non-coding variation in human evolution.

(A) Simplified human evolutionary history showing human-specific variants underlying divergence from chimpanzees/bonobos (blue), modern human divergence from archaic humans and archaic introgression into modern human ancestry groups (oranges), and variation between modern human ancestry groups (green): AFR: African, EUR: European, SAS: South Asian, EAS: East Asian, AMR: Admixed American, MEL: Melanesian. (B-G) Evolutionarily relevant non-coding variation types, including: human-specific variation, such as rapidly evolving (B) human accelerated regions (HARs) and (C) human ancestor quickly evolved regions (HAQERs), and (D) human-specific deletions in highly conserved regions (hCONDELs); modern v. archaic human genetic differences, including (E) modern-specific variants and (F) archaic introgressed variants; and (G) recent selective sweeps underlying local adaptation in modern human ancestry groups. Causal variants for each non-coding variation type highlighted in labeled color, while non-causal variants are in gray.

Mapping non-coding regions to target genes and phenotypes with CRISPR

For the vast majority of loci of interest in human evolution, the target gene(s) and phenotypes remain unknown (Figure 2A). A prime example of this challenge is seen in human accelerated regions (HARs), which are non-coding regions that are conserved across species but show accelerated substitution rates in the human lineage (Figure 1B) [16]. HARs are enriched in CREs and have been explored across multiple studies for their role in human-specific traits, especially during embryonic, skeletal, and neurodevelopment [17–19]. Many studies have used 3D chromatin maps, such as 4C, Hi-C, and PLAC-seq, to link HARs to potential target genes [13,19–23]. However, 3D chromatin methods do not establish direct causal links between CREs and gene expression.

Figure 2. High-throughput approaches for characterizing non-coding variation in human evolution.

(A) Linking non-coding variation in human evolutionary history to its downstream functional impacts is a major challenge for the field. New experimental tools are helping overcome challenges at each stage in this process: identifying relevant variation, prioritizing variants with effects on cis-regulatory elements (CREs), relating those effects to cell type-specific molecular changes, and ultimately characterizing adaptive phenotypes. (B) CRISPR perturbation modalities, including cutting/knockouts (KO), gene expression interference (i), and activation (a), enable endogenous characterization of CREs, and can be paired with measurement of downstream gene targets and/or cellular and organismal phenotypes. (C) Massively parallel reporter assays (MPRAs) enable high-throughput screens of non-coding variants for differential regulatory activity. (D) Machine learning models trained to predict many functional assays across many cell types from sequence can enable genome-scale analyses. TF: transcription factor binding, 3D: 3D chromatin folding, GE: gene expression.

CRISPR genome manipulation has emerged as a powerful tool for investigating the function of CREs in the context of human evolution. It enables the direct, endogenous measurement of CRE function, often identifying target genes or phenotypes. CRISPR technologies encompass a range of approaches, including enzymatically active CRISPR-Cas used for gene/CRE knockouts (CRISPR KO) or targeted allelic replacement, and catalytically deactivated Cas protein fused to a repressor/activator domain used to modulate genomic elements in methods called CRISPR interference/activation (CRISPRi/a) (Figure 2B) [24]. However, it is important to recognize that the majority of these studies have been conducted in cell lines, with limited high-throughput in vivo applications to date. They have also generally been used to interrogate individual candidate loci prioritized by other approaches, such as conservation, epigenomics, chromatin contacts, association studies, reporter assays, and machine learning (ML) predictions [23,25–31]. For example, human thermoregulatory adaptation is distinguished from that of other great apes and primates by our nearly hairless bodies and high densities of eccrine sweat glands. Aldea et al. [20] identified candidate enhancers of the EN1 TF gene, screened them in skin cells using transgenic mouse assays, and validated a skin-specific enhancer of EN1 that increased eccrine gland density in a CRISPR-Cas9 humanized enhancer knock-in mouse model. CRISPR genome editing has also been used to identify downstream trans-regulatory effects of extinct archaic coding variants in a transcription factor affecting skeletal morphology (GLI3 variant R1537C) [32] and a splicing factor affecting neurodevelopment (NOVA1 variant I200V) [33], thus enabling studies of the relatively few protein-coding variants underlying regulatory evolution in humans.

Pooled CRISPR screens now enable high-throughput in vitro screening of CRISPR perturbations, by combining multiplexed guide libraries with phenotypic readouts such as cell proliferation and single-cell RNA-seq (Perturb-seq) [34]. Researchers have used CRISPR screens to probe the genetic basis of evolving human phenotypes at genome scales. A pooled CRISPR-KO screen in human neural stem cells (hNSCs), which are important for determining cortical size, identified thousands of genes and enhancers that impact hNSC proliferation [35]. These included many HARs, supporting their importance in the evolution of human neurodevelopment, and an enrichment for regions linked to neurodevelopmental disorder genes, such as microcephaly and autism spectrum disorder (ASD). Similarly, a pooled CRISPRi knockdown screen found a small number of human or chimpanzee-specific essential genes, including cell-cycle progression gene regulatory networks that may contribute to increased neural proliferation in humans [36]. Finally, a pooled CRISPRi screen with cell proliferation and single-cell RNA sequencing readouts has been used to mimic large human-specific deletions (hDels) by knockdown of orthologous regions in chimpanzee pluripotent stem cells, allowing the authors to discover both cis and trans-regulatory gene targets of deleted CREs [37].
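The proliferation readout of a pooled screen can be illustrated with a minimal Python sketch. It assumes guide counts sequenced before and after a growth period, normalizes each sample to counts per million, computes a per-guide log2 fold change, and summarizes guides per targeted element by the median. All guide names, element names, and counts below are hypothetical, and the scoring scheme is a generic simplification rather than the analysis used in any study cited here; published screens typically add replicates, non-targeting controls, and dedicated statistical models.

```python
import math
from statistics import median


def counts_per_million(counts):
    """Scale raw guide counts so that samples of different depth are comparable."""
    total = sum(counts.values())
    return {guide: c / total * 1e6 for guide, c in counts.items()}


def guide_log2_fold_change(before, after, pseudocount=1.0):
    """Per-guide log2(after / before) on depth-normalized counts."""
    b, a = counts_per_million(before), counts_per_million(after)
    return {g: math.log2((a[g] + pseudocount) / (b[g] + pseudocount)) for g in before}


def element_scores(lfc, guide_to_element):
    """Summarize each targeted element by the median of its guides."""
    grouped = {}
    for guide, element in guide_to_element.items():
        grouped.setdefault(element, []).append(lfc[guide])
    return {element: median(values) for element, values in grouped.items()}


# Hypothetical counts: guides against "enhancer_A" drop out, control guides do not.
before = {"g1": 250, "g2": 250, "g3": 250, "g4": 250}
after = {"g1": 50, "g2": 50, "g3": 450, "g4": 450}
guide_to_element = {"g1": "enhancer_A", "g2": "enhancer_A",
                    "g3": "control", "g4": "control"}

scores = element_scores(guide_log2_fold_change(before, after), guide_to_element)
```

A negative score for an element suggests that perturbing it reduced proliferation relative to the rest of the library, which is the kind of signal a proliferation-based screen reads out.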

So far, pooled CRISPR screens for human evolutionary questions have focused on perturbing entire genes or non-coding regions in individual cell lines. However, emerging CRISPR technologies such as base and prime editors allow for highly precise and versatile edits, including insertions, deletions, and point mutations. Pooled base/prime editing screens have so far been used to study disease-associated enhancer variants at scale [38–40], but could also be applied to variants in evolutionary biology. For example, they could be used to interrogate the downstream effects of individual substitutions in HARs or identify target genes of small human-specific insertions and deletions. In parallel, the development of pooled CRISPR screens in organoids and animal models will enable high-throughput in vitro and in vivo CRISPR screens across multiple cell types and tissues at once [4,41]. Thus, these tools present new opportunities to directly interrogate the downstream gene regulatory and phenotypic impacts of evolutionarily important regions and variants across multiple cell types and tissue systems.

Identifying functional non-coding evolutionary variation with reporter assays

Traditional low-throughput reporter assays, such as luciferase reporters, have been used to understand the evolutionary importance of genetic changes [42], but do not scale well. In contrast, high-throughput, sequencing-based reporter assays, such as MPRAs and STARR-seq, can functionally screen thousands of non-coding variants in parallel for their effects on gene expression in a cell line of interest [43–45]. Generally, a library of putative cis-regulatory sequences is cloned upstream of a minimal promoter (MPRA) or in the 3’ UTR (STARR-seq), where they can act to drive transcription of a reporter gene. These reporters are transfected into particular cell lines [46], or integrated into the genome for hard-to-transfect primary cells [47], and cis-regulatory output is quantified by comparing the number of reporter RNA transcripts to DNA molecules (Figure 2C). By comparing alleles, MPRAs can identify variants with differential cis-regulatory effects, albeit outside their endogenous epigenomic context; and combining MPRAs with endogenous maps of CREs provides a powerful method to prioritize functional variants [48]. MPRAs typically test synthetic oligos, and thus are particularly useful for studying non-coding variants in evolutionary biology [49], such as variants that are ancestral, fixed, extinct, rare, from underrepresented ancestry groups, or completely novel. High-throughput reporter assays can also be used to study other non-coding processes, with examples for 3’ UTRs [50] and splicing [51], though they have rarely been applied to evolutionary questions.
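The RNA-to-DNA comparison and the allelic comparison described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: counts are taken as already depth-normalized, a pseudocount of 1 is an arbitrary choice, and the function names and counts are hypothetical. Real MPRA analyses use replicates, barcode-level aggregation, and formal statistical tests.

```python
import math


def log2_activity(rna_count, dna_count, pseudocount=1.0):
    """Regulatory activity of one oligo as log2(RNA / DNA), with a pseudocount."""
    return math.log2((rna_count + pseudocount) / (dna_count + pseudocount))


def allelic_skew(ref_counts, alt_counts):
    """Difference in log2 activity between the alternate and reference alleles.

    Each argument is a (rna_count, dna_count) tuple for one allele.
    """
    return log2_activity(*alt_counts) - log2_activity(*ref_counts)


# Hypothetical (RNA, DNA) counts for the two alleles of one variant.
ref_allele = (199, 99)   # (199 + 1) / (99 + 1) = 2, so log2 activity = 1
alt_allele = (799, 99)   # (799 + 1) / (99 + 1) = 8, so log2 activity = 3
skew = allelic_skew(ref_allele, alt_allele)   # 3 - 1 = 2
```

A positive skew here would mean the alternate allele drives more reporter expression than the reference allele in the tested cell type; it says nothing by itself about the endogenous target gene.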

MPRAs have been frequently used to characterize the regulatory functions of non-coding regions identified through computational analysis of evolving sequences. For example, MPRAs have compared the cis-regulatory output of human sequences to their chimpanzee or ancestral orthologs, both at HARs (Figure 1B) and at CREs with species-specific epigenetic features [22,23,30,52]. Many of these studies also used the base-pair resolution of MPRAs to identify which variants functionally drive species differences, finding a variety of additive, synergistic, and compensatory interactions [22,23]. To circumvent the oligo synthesis length limit (200 bp), which is shorter than most HARs, Girskis et al. [21] used capture-based MPRA (caMPRA) to target longer sequences from human and chimpanzee genomic DNA (~500 nt). By comprehensively testing all HARs and their chimpanzee orthologs, they identified a PPP1R17-regulating HAR functional during neurodevelopment. Overall, MPRA studies of HARs to date have primarily focused on the human brain (see Norman et al. [53] for an MPRA of HARs in testis).

Reporter assays have also interrogated other rapidly evolving human sequences beyond HARs. Mangan et al. [54] studied human ancestor quickly evolved regions (HAQERs), previously neutral regions that have evolved even faster than HARs (Figure 1C). Using in vivo single-cell STARR-seq (scSTARR-seq) in embryonic mouse cerebral cortex, the authors compared modern, archaic, chimpanzee, and ancestral HAQER sequences, identifying those that introduce de novo, cell-type specific CRE activity and were enriched in neurodevelopmental disease variants. These included HAQERs involved in rapid divergence following human-specific segmental duplications of FOXD4, involved in neuronal differentiation, and NBPF, associated with brain size. Another important class of putatively adaptive human-chimpanzee non-coding differences are human-specific deletions, rather than substitutions, in highly conserved regions (hCONDELs) (Figure 1D). hCONDELs may disrupt deeply conserved CRE function and thus could underlie distinctly human traits. While large hCONDELs have been studied previously [55], Xue et al. [29] used the scale and nucleotide-resolution of MPRAs to characterize thousands of small hCONDELs more amenable to reporter assays that compare the activity of two alleles. Following these screens with CRISPR genome editing and single-cell RNA-seq, the authors were able to validate two regulatory hCONDELs’ effects on the neurodevelopmental genes PPP2CA and LOXL2.

MPRAs have been used to screen non-coding variants in more recent human evolution, such as those identified by comparing modern human versus Neanderthal and Denisovan genomes (Figure 1E) [14,15]. Most modern-archaic phenotypic differences, such as gene expression and soft tissue differences, are not preserved in the archaeological record, but can be investigated by functional genomics. Weiss et al. [56] found hundreds of modern human-specific single-nucleotide changes with regulatory differences using MPRAs in neural progenitor cells (NPCs), embryonic stem cells, and bone osteoblasts, chosen to reflect the evolutionary importance of the brain, development, and skeletal anatomy. They found that modern-specific cis-regulatory changes were enriched in genes of the vocal tract and brain, possibly reflecting divergence in speech production and cognition despite similar overall brain size. They also found that many modern-specific cis-regulatory changes in the brain were predicted to disrupt binding sites for ZNF281, which inhibits neuronal differentiation, and highlighted a modern-specific promoter variant of SATB2, a TF associated with brain and craniofacial phenotypes, that down-regulated expression in all three cell types. Archaic introgression also contributed to phenotypic variation in modern non-Africans, particularly phenotypes related to sun exposure, high altitudes, cold environments, and endemic pathogens of Eurasia (Figure 1F) [6,57,58]. Jagoda et al. [28] used MPRA to screen Neanderthal introgressed variants for regulatory effects in immune cells, and validated two of these variants, using CRISPR-Cas9 enhancer deletions to show they target the influenza infection responsive genes ELMSAN1 (rs11624425), and PAN2 and STAT2 (rs80317430). They also used MPRA to functionally fine-map causal non-coding variants underlying an adaptive Neanderthal-like haplotype at the COVID-19 risk-associated CCR1/5 locus [59]. Smaller-scale reporter assays have also validated regulatory variants on the adaptive OAS1/2/3 Denisovan-like haplotype involved in innate immunity in Papuans [60]. These MPRAs complement efforts to catalog immune-related eQTLs based on infection-stimulation of immune cells across human ancestry groups, which currently lack substantial Denisovan-introgressed variation [61,62].

Within modern humans, selection scans have nominated hundreds of loci with signatures of recent natural selection underlying locally adaptive traits (Figure 1G) [6,63]. Yet for only a few exemplary selective sweep loci do we currently have a good understanding of the underlying causal variants and their functions [9,64]. Moreover, high LD in recent sweep regions, potentially spanning megabases, complicates efforts to fine-map causal variants [63]. These problems mirror those present in GWAS and eQTL causal variant fine-mapping, which have been successfully tackled by reporter assays and CRISPR genome editing [46,65], suggesting these tools may also be useful for dissecting selective sweeps. Three recent studies used reporter assay fine-mapping and CRISPR validation of evolutionary loci: skin barrier adaptation at the involucrin locus in Northern/Western Europeans [25], pulmonary fibrosis and testosterone at the positively selected IVD locus in Japanese ancestries [26], and hypoxia adaptation at the Denisovan introgressed EPAS1 locus in high altitude Tibetans [27]. Alternatively, starting with a phenotype already known to underlie local adaptation, Feng et al. [31] identified candidate GWAS variants and variants with extreme allele frequency differences underlying skin pigmentation differences in Africa, then applied MPRA and CRISPR to identify regulatory single nucleotide variants in melanocytes and validate their downstream impacts on gene expression and melanin levels. This revealed repeated mutations (rs6497271-G, nearly fixed in Africans, and rs12913832-A, nearly fixed in Europeans) in the same OCA2 enhancer contributing to global diversity in hair, skin, and eye color. The AA haplotype, which had the highest enhancer activity, is most commonly found in Africans and South Asians, GA in East Asians, and GG, which had the lowest, in Europeans. The study also revealed mutations in MITF (rs111969762), LEF1 (rs11939273 and rs17038630), TRPS1 (rs11985280), and BLOC1S6 (rs72713175) enhancers that contribute to light skin color in the San people of southern Africa. Combining selection scans with massively parallel functional approaches provides a powerful lens to pinpoint phenotypically impactful variation even in underrepresented ancestry groups [11].

Advancing beyond experimental characterization to prediction

While the experimental techniques discussed thus far provide crucial insights into the functional impacts of genomic elements in human evolution, they are often constrained by scale, time, and resources. Here, we transition to exploring how machine learning (ML) approaches are complementing and extending these experimental methods. ML models offer the ability to make predictions across vast genomic landscapes and diverse cellular contexts, potentially identifying targets for more focused experimental validation. Importantly, ML predictions can guide experimental design, while experimental results inform and improve ML models. We examine how ML is enhancing our predictive capabilities, while recognizing that these computational insights will ultimately require experimental validation.

Machine learning (ML) models of genome function from DNA sequence alone are beginning to accurately predict context-specific epigenomes [66–68], 3D chromatin [69], and gene expression [70]. These sequence-to-function models aim to learn underlying cis-regulatory grammar [71], allowing them to predict variant effects in silico at scales that cannot be achieved with CRISPR or MPRA (Figure 2D). However, they require large, high-quality training datasets in relevant contexts [71], can be challenging to adapt to biological data [72], and often struggle with distal interactions, especially in predicting gene expression [73,74]. Nonetheless, ML capabilities are rapidly improving, and current models are already achieving success in interpreting human evolution.
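In silico variant effect prediction with a sequence-to-function model usually amounts to scoring the reference sequence, scoring each mutated sequence, and taking the difference. The Python sketch below illustrates that loop. The `toy_predict` function is a deliberately trivial stand-in (it scores the best match to a made-up five-base motif) and is not any of the models cited here; in practice it would be replaced by a trained model's prediction function, whose interface is assumed to take a DNA string and return a number.

```python
def toy_predict(seq, motif="GATAA"):
    """Stand-in for a trained model: best fraction of motif bases matched in any window.

    The motif is a hypothetical placeholder, not a claim about a real binding site.
    """
    best = 0
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        best = max(best, sum(a == b for a, b in zip(window, motif)))
    return best / len(motif)


def in_silico_mutagenesis(seq, predict):
    """Score every single-nucleotide substitution as predict(mutant) - predict(reference)."""
    ref_score = predict(seq)
    effects = {}
    for pos, ref_base in enumerate(seq):
        for alt_base in "ACGT":
            if alt_base == ref_base:
                continue
            mutant = seq[:pos] + alt_base + seq[pos + 1:]
            effects[(pos, ref_base, alt_base)] = predict(mutant) - ref_score
    return effects


# Hypothetical 9-base sequence containing one perfect motif match at positions 2-6.
effects = in_silico_mutagenesis("CCGATAACC", toy_predict)
disruptive = {key: delta for key, delta in effects.items() if delta < 0}
```

With this toy scorer, substitutions inside the motif lower the score while substitutions in the flanks leave it unchanged, which mirrors how predicted variant effects are used to nominate candidate functional substitutions for experimental follow-up.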

One area where ML models have been applied in evolution is to predict 3D chromatin contact maps. Keough et al. [75] investigated whether HARs emerged from conserved CREs where 3D chromatin folding had been rewired by human-specific structural variants (hsSVs) by using comparisons of Hi-C contacts between human and chimp NPCs, MPRAs in human midgestation telencephalon primary cells, and 3D chromatin contact map prediction of 1 Mb human and chimpanzee sequences from the ML model Akita [69]. These implicated enhancers that target neurodevelopmental genes, such as NECTIN3 and MAF, associated with hsSVs that rewire 3D chromatin folding, and HARs with MPRA activity sharing topologically associating domains (TADs) with developmental genes EFNA5, EN1, PBX3, and GBX2. The Akita model has also been applied to predict 3D chromatin differences between modern and archaic humans and within modern human populations, identifying 392 genomic regions with divergent 3D chromatin folding between modern and archaic genomes, including one containing a Neanderthal introgressed CTCF mutation that creates two TADs and is an eQTL for the insulin-like growth factor-binding protein 3 gene, IGFBP3 [76,77]. These analyses also identified many loci expected to display 3D chromatin divergence even when lacking substantial sequence divergence, including modern and archaic differences in overlap with genes associated with eye, hair, skeletal, lung function, brain, and immune phenotypes. As in many of these ML approaches, functional experiments are needed to validate these predictions.

ML models have also achieved highly accurate predictions of non-coding CRE activity. Whalen et al. [23] employed MPRAs and the ML model Sei [68] to systematically interrogate the cis-regulatory effects of individual substitutions within HARs. They found that these substitutions have larger effect sizes than human polymorphisms in the same HARs. They also observed that HARs harbor compensatory mutations, which may reflect a history of adaptive trade-offs between novel human cognitive abilities and neuropsychiatric disorders, such as ASD and schizophrenia [19,78]. Li et al. [79] employed an ML model to predict thousands of putative de novo neurodevelopmental enhancers, and found that they could often arise from just a single substitution in a previously neutral sequence, in contrast to HAQERs which have many. Finally, Kaplow et al. [80] developed the ML model TACIT to predict cell type-specific enhancer conservation across 222 mammals using open chromatin data from just a few species. This was used to link non-coding regions to phenotypes that vary across mammals, such as brain size, vocal learning, and social behavior.

Looking forward, we anticipate a wealth of opportunities for ML predictions in human evolution studies. Expanding functional genomic annotations across variants [81], cellular contexts [10], and, importantly, diverse ancestry groups [11] will result in critical community resources for developing highly accurate non-coding variant effect predictors. Incorporating information about distal interactions will be key to improving gene expression prediction from sequence [73,74]. Finally, massively parallel functional experiments, including CRISPR screens and MPRAs on synthetic sequences, will be crucial for generating training and benchmarking data for sequences beyond those seen in the human genome [71].

Conclusions

Rapid progress is being made to link non-coding regulatory variants to downstream functions and phenotypes, a long-standing goal of genomics. This revolution has been enabled by high-throughput experimental systems for functional characterization, such as MPRA and CRISPR screens, as well as computational prediction tools. The field of human evolution is increasingly using this powerful toolkit to distinguish causal non-coding variants and to understand the regulatory genetic basis of adaptive traits in human evolution.

Novel CRISPR-based tools, such as pooled base/prime editing, combinatorial editing, and Perturb-seq, offer new capabilities for large-scale characterization of non-coding variants driving human evolution. These have been primarily applied in individual cell lines, such as NPCs relevant to human and chimp neurodevelopment, but combining these with emerging stem cell-derived models, such as organoids and embryoids [4], will allow us to comprehensively interrogate variant effects across multiple tissue systems and developmental time points. However, the transition from high-throughput variant discovery to organismal phenotypes remains challenging due to the low-throughput nature of in vivo studies.

Moving forward, we anticipate a multi-pronged strategy to bridge this gap. First, refining variant prioritization methods, such as MPRAs and machine learning predictions, will be crucial for identifying the most promising candidate variants for in-depth study. Second, prioritized variants will need to be linked to downstream genes, cell types, tissues, and pathways, using techniques such as 3D chromatin maps, pooled CRISPR screens, and single-cell multi-omics [82,83]. Third, the development of high-throughput in vivo methods, such as advanced CRISPR techniques that can make multiple edits in transgenic or complex humanized mouse models, will be essential for scalable characterization of organismal phenotypes, such as those involved in skeletal development [12,13]. Combining these approaches will allow researchers to balance detailed characterization of candidate variants with screening approaches, thus linking cellular mechanisms to complex organismal phenotypes.

These genome editing and perturbation tools will be crucial for tackling the wide range of adaptive human traits, as well as answering fundamental questions about adaptive and antagonistic pleiotropy in human evolution. Adaptive pleiotropy occurs when a genetic variant positively affects multiple traits, while antagonistic pleiotropy refers to variants that benefit one trait while negatively impacting another. For example, variants that increased brain size in human evolution may have also led to increased difficulty in childbirth [84,85]. Understanding these trade-offs is crucial for a complete picture of human adaptive evolution. Moreover, these approaches could allow us to identify non-coding variants underlying the many physiological differences between humans and other great apes, such as bipedalism, digestion, metabolism, and life history [4,5]. In the future, new CRISPR engineering approaches may also be used to characterize understudied variation types, such as human-specific tandem repeats, transposable elements or complex genomic rearrangements, which have been linked to evolutionary innovation.

MPRAs have enabled unprecedented precision in understanding the regulatory effects of individual mutations across evolution. However, they have mostly been applied to individual cell lines, with the exception of the relatively low-throughput in vivo scSTARR-seq assay used to characterize HAQERs [54]. They also test sequences outside their endogenous context, and thus should be interpreted in the context of endogenous maps of CRE activity and gene targets, and CRISPR experimental validations. Increasing the throughput of single cell and in vivo MPRAs will enable large-scale comparative regulatory insights across many tissues and cell types. For recent adaptations that vary across modern humans, beneficial variants may be functional only in the context of particular environmental stimuli, such as immune stimulation [59], hypoxia [27], or sun exposure [31]. MPRAs can be used to uncover these gene-by-environment interactions and identify differential regulatory effects under environmental perturbations, such as pathogens, micronutrients, and environmental toxins [86]. Integrating MPRA and CRISPR screens across cellular and environmental contexts will greatly expand the number of selective sweep loci with known causal variants and function.

Finally, ML models are allowing researchers to move beyond experiments to in silico predictions, thus greatly expanding the scope of variants and cell types that can be examined in a single study. Future ML models are expected to advance rapidly in capabilities such as predicting distal interactions and individual gene expression from sequence [65,68,69]. These models will benefit from the expansion of bulk and single-cell functional genomic resources, including CRISPR screens and MPRAs, across variants [75], cell types [10], diverse ancestries [11], and closely related species [3]. Moreover, understanding complex adaptive phenotypes will continue to require biobank-scale association studies, and may be aided by ML models for image-based phenotyping, such as for skeletal form [18]. Combining these ML tools for variant prioritization with experimental characterization and validation in organisms provides a roadmap to transform our understanding of human origins and adaptive evolution.

Highlights.

  • Human evolution/adaptation is substantially driven by non-coding, regulatory changes

  • Novel high-throughput methods can characterize the function of evolutionarily relevant loci and the variation within them

  • CRISPR genomic perturbation screens endogenously measure molecular/cellular phenotypes

  • Massively parallel reporter assays (MPRAs) measure the regulatory impact of non-coding variants across cell types

  • Machine learning predicts regulatory function from sequence at genome-scale

Acknowledgments

We regret the inability to highlight additional relevant work in this area of research due to space restriction. This work was supported by a Yale University and Boehringer Ingelheim Biomedical Data Science Fellowship to SR, and R00HG010669 and R01HG012872 to SKR. We thank Jared F. Akers, Thanh Thanh L. Nguyen, Catherine McGuinness, and two anonymous reviewers for helpful comments and suggestions that greatly improved the manuscript. Organism silhouettes are public domain and from PhyloPic (www.phylopic.org).

Footnotes

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Declaration of Competing Interests

The authors declare no conflict of interest.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References and recommended reading

  • 1. Ávila-Arcos MC, Raghavan M, Schlebusch C: Going local with ancient DNA: A review of human histories from regional perspectives. Science 2023, doi: 10.1126/science.adh8140.
  • 2. Koenig Z, Yohannes MT, Nkambule LL, Zhao X, Goodrich JK, Kim HA, Wilson MW, Tiao G, Hao SP, Sahakian N, et al.: A harmonized public resource of deeply sequenced diverse human genomes. bioRxiv 2024, doi: 10.1101/2023.01.23.525248.
  • 3. Christmas MJ, Kaplow IM, Genereux DP, Dong MX, Hughes GM, Li X, Sullivan PF, Hindle AG, Andrews G, Armstrong JC, et al.: Evolutionary constraint and innovation across hundreds of placental mammals. Science 2023, 380:eabn3943.
  • 4. Pollen AA, Kilik U, Lowe CB, Camp JG: Human-specific genetics: new tools to explore the molecular and cellular basis of human evolution. Nat Rev Genet 2023, 24:687–711.
  • 5. Varki A, Altheide TK: Comparing the human and chimpanzee genomes: Searching for needles in a haystack. Genome Res 2005, 15:1746–1758.
  • 6. Rees JS, Castellano S, Andrés AM: The Genomics of Human Local Adaptation. Trends Genet 2020, 36.
  • 7. Yu N, Chen F-C, Ota S, Jorde LB, Pamilo P, Patthy L, Ramsay M, Jenkins T, Shyue S-K, Li W-H: Larger Genetic Differences Within Africans Than Between Africans and Eurasians. Genetics 2002, 161:269–274.
  • 8. King M-C, Wilson AC: Evolution at Two Levels in Humans and Chimpanzees. Science 1975, doi: 10.1126/science.1090005.
  • 9. Wray GA: The evolutionary significance of cis-regulatory mutations. Nat Rev Genet 2007, 8:206–216.
  • 10. Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T, Davis CA, Dobin A, Kaul R, et al.: Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 2020, 583:699–710.
  • 11. Popejoy AB, Fullerton SM: Genomics is failing on diversity. Nature Publishing Group; UK: 2016, doi: 10.1038/538161a.
  • 12. Kvon EZ, Zhu Y, Kelman G, Novak CS, Plajzer-Frick I, Kato M, Garvin TH, Pham Q, Harrington AN, Hunter RD, et al.: Comprehensive In Vivo Interrogation Reveals Phenotypic Impact of Human Enhancer Variants. Cell 2020, 180.
  • 13. Dutrow EV, Emera D, Yim K, Uebbing S, Kocher AA, Krenzer M, Nottoli T, Burkhardt DB, Krishnaswamy S, Louvi A, et al.: Modeling uniquely human gene regulatory function via targeted humanization of the mouse genome. Nat Commun 2022, 13:1–15.
  • 14. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al.: A Draft Sequence of the Neandertal Genome. Science 2010, 328:710.
  • 15. Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, Viola B, Briggs AW, Stenzel U, Johnson PLF, et al.: Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 2010, 468:1053–1060.
  • 16. Pollard KS, Salama SR, King B, Kern AD, Dreszer T, Katzman S, Siepel A, Pedersen JS, Bejerano G, Baertsch R, et al.: Forces shaping the fastest evolving regions in the human genome. PLoS Genet 2006, 2:e168.
  • 17. Capra JA, Erwin GD, McKinsey G, Rubenstein JLR, Pollard KS: Many human accelerated regions are developmental enhancers. Philos Trans R Soc Lond B Biol Sci 2013, 368:20130025.
  • 18. Kun E, Javan EM, Smith O, Gulamali F, de la Fuente J, Flynn BI, Vajrala K, Trutner Z, Jayakumar P, Tucker-Drob EM, et al.: The genetic architecture and evolution of the human skeletal form. Science 2023, 381:eadf8009.
  • 19. Doan RN, Bae B-I, Cubelos B, Chang C, Hossain AA, Al-Saad S, Mukaddes NM, Oner O, Al-Saffar M, Balkhy S, et al.: Mutations in Human Accelerated Regions Disrupt Cognition and Social Behavior. Cell 2016, 167:341–354.e12.
  • 20. Aldea D, Atsuta Y, Kokalari B, Schaffner SF, Prasasya RD, Aharoni A, Dingwall HL, Warder B, Kamberov YG: Repeated mutation of a developmental enhancer contributed to human thermoregulatory evolution. Proc Natl Acad Sci U S A 2021, 118.
  • 21. Girskis KM, Stergachis AB, DeGennaro EM, Doan RN, Qian X, Johnson MB, Wang PP, Sejourne GM, Nagy MA, Pollina EA, et al.: Rewiring of human neurodevelopmental gene regulatory programs by human accelerated regions. Neuron 2021, 109:3239–3251.e7.
  • 22. Uebbing S, Gockley J, Reilly SK, Kocher AA, Geller E, Gandotra N, Scharfe C, Cotney J, Noonan JP: Massively parallel discovery of human-specific substitutions that alter enhancer activity. Proc Natl Acad Sci U S A 2021, 118.
  • 23. Whalen S, Inoue F, Ryu H, Fair T, Markenscoff-Papadimitriou E, Keough K, Kircher M, Martin B, Alvarado B, Elor O, et al.: Machine learning dissection of human accelerated regions in primate neurodevelopment. Neuron 2023, 111:857–873.e8. This study used MPRA and ML predictions to systematically interrogate the regulatory activity of substitutions in HARs. They found that HARs harbor compensatory mutations which may reflect adaptive trade-offs between human cognitive abilities and neuropsychiatric disorders.
  • 24. Anzalone AV, Koblan LW, Liu DR: Genome editing with CRISPR-Cas nucleases, base editors, transposases and prime editors. Nat Biotechnol 2020, 38:824–844.
  • 25. Mathyer ME, Brettmann EA, Schmidt AD, Goodwin ZA, Oh IY, Quiggle AM, Tycksen E, Ramakrishnan N, Matkovich SJ, Guttman-Yassky E, et al.: Selective sweep for an enhancer involucrin allele identifies skin barrier adaptation out of Africa. Nat Commun 2021, 12:2557.
  • 26. Brown EA, Kales S, Boyle MJ, Vitti J, Kotliar D, Schaffner SF, Tewhey RS, Sabeti PC: Three linked opposing regulatory variants under selection associate with IVD. bioRxiv 2022, doi: 10.1101/2022.12.22.521605.
  • 27. Gray OA, Yoo J, Sobreira DR, Jousma J, Witonsky D, Sakabe NJ, Peng Y-J, Prabhakar NR, Fang Y, Nobréga MA, et al.: A pleiotropic hypoxia-sensitive EPAS1 enhancer is disrupted by adaptive alleles in Tibetans. Sci Adv 2022, 8:eade1942. This study focused on the Denisovan introgressed haplotype at EPAS1 driving hypoxia adaptation in high-altitude Tibetans. The authors used reporter assays to identify four CREs regulating EPAS1 and CRISPR deletions to characterize one of them, ENH6, in mice.
  • 28. Jagoda E, Xue JR, Reilly SK, Dannemann M, Racimo F, Huerta-Sanchez E, Sankararaman S, Kelso J, Pagani L, Sabeti PC, et al.: Detection of Neanderthal Adaptively Introgressed Genetic Variants That Modulate Reporter Gene Expression in Human Immune Cells. Mol Biol Evol 2022, 39. Using MPRA, this study identified high-frequency introgressed variants from Neanderthals with differences in regulatory activity related to immune response in modern humans.
  • 29. Xue JR, Mackay-Smith A, Mouri K, Garcia MF, Dong MX, Akers JF, Noble M, Li X, Zoonomia Consortium†, Lindblad-Toh K, et al.: The functional and evolutionary impacts of human-specific deletions in conserved elements. Science 2023, 380:eabn2253. The authors identified small human-specific deletions in conserved regions (hCONDELs) and showed via MPRAs across six cell types that many of them resulted in a difference in regulatory activity, ultimately validating two affecting neurodevelopmental genes.
  • 30. Shin T, Song JHT, Kosicki M, Kenny C, Beck SG, Kelley L, Qian X, Bonacina J, Papandile F, Antony I, et al.: Rare variation in non-coding regions with evolutionary signatures contributes to autism spectrum disorder risk. medRxiv 2023, doi: 10.1101/2023.09.19.23295780.
  • 31. Feng Y, Xie N, Inoue F, Fan S, Saskin J, Zhang C, Zhang F, Hansen MEB, Nyambo T, Mpoloka SW, et al.: Integrative functional genomic analyses identify genetic variants influencing skin pigmentation in Africans. Nat Genet 2024, 56:258–272. This study integrates MPRA, Hi-C, and CRISPR to identify regulatory variants affecting melanin levels in melanocytes underlying adaptive skin pigmentation differences within Africa.
  • 32. Agata A, Ohtsuka S, Noji R, Gotoh H, Ono K, Nomura T: A Neanderthal/Denisovan GLI3 variant contributes to anatomical variations in mice. Frontiers in Cell and Developmental Biology 2023, 11.
  • 33. Trujillo CA, Rice ES, Schaefer NK, Chaim IA, Wheeler EC, Madrigal AA, Buchanan J, Preissl S, Wang A, Negraes PD, et al.: Reintroduction of the archaic variant of NOVA1 in cortical organoids alters neurodevelopment. Science 2021, 371:eaax2537.
  • 34. Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, Marjanovic ND, Dionne D, Burks T, Raychowdhury R, et al.: Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 2016, 167:1853–1866.e17.
  • 35. Geller E, Noble MA, Morales M, Gockley J, Emera D, Uebbing S, Cotney JL, Noonan JP: Massively parallel disruption of enhancers active in human neural stem cells. Cell Rep 2024, 43:113693. This study used a pooled CRISPR KO screen in human neural stem cells to disrupt genes, conserved regions, and CREs involved in brain development. CRE disruptions have weaker overall effects than gene disruptions on cell proliferation. CRE target genes were enriched in neurodevelopmental disorder genes.
  • 36. She R, Fair T, Schaefer NK, Saunders RA, Pavlovic BJ, Weissman JS, Pollen AA: Comparative landscape of genetic dependencies in human and chimpanzee stem cells. Cell 2023, 186:2977–2994.e23.
  • 37. Fair T, Pavlovic BJ, Schaefer NK, Pollen AA: Mapping cis- and trans-regulatory target genes of human-specific deletions. bioRxiv 2023, doi: 10.1101/2023.12.27.573461.
  • 38. Hanna RE, Hegde M, Fagre CR, DeWeirdt PC, Sangree AK, Szegletes Z, Griffith A, Feeley MN, Sanson KR, Baidi Y, et al.: Massively parallel assessment of human variants with base editor screens. Cell 2021, 184:1064–1080.e20.
  • 39. Ren X, Yang H, Nierenberg JL, Sun Y, Chen J, Beaman C, Pham T, Nobuhara M, Takagi MA, Narayan V, et al.: High-throughput PRIME-editing screens identify functional DNA variants in the human genome. Mol Cell 2023, 83:4633–4645.e9.
  • 40. Martyn GE, Montgomery MT, Jones H, Guo K, Doughty BR, Linder J, Chen Z, Cochran K, Lawrence KA, Munson G, et al.: Rewriting regulatory DNA to dissect and reprogram gene expression. bioRxiv 2023, doi: 10.1101/2023.12.20.572268.
  • 41. Liu B, Jing Z, Zhang X, Chen Y, Mao S, Kaundal R, Zou Y, Wei G, Zang Y, Wang X, et al.: Large-scale multiplexed mosaic CRISPR perturbation in the whole organism. Cell 2022, 185:3008–3024.e16.
  • 42. Chabot A, Shrit RA, Blekhman R, Gilad Y: Using reporter gene assays to identify cis regulatory differences between humans and chimpanzees. Genetics 2007, 176:2069–2076.
  • 43. Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L, Rogov P, Feizi S, Gnirke A, Callan CG Jr, Kinney JB, et al.: Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol 2012, 30:271–277.
  • 44. Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, Lee C, Andrie JM, Lee S-I, Cooper GM, et al.: Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol 2012, 30:265–270.
  • 45. Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A: Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 2013, 339:1074–1077.
  • 46. Tewhey R, Kotliar D, Park DS, Liu B, Winnicki S, Reilly SK, Andersen KG, Mikkelsen TS, Lander ES, Schaffner SF, et al.: Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell 2016, 165:1519–1529.
  • 47. Inoue F, Kircher M, Martin B, Cooper GM, Witten DM, McManus MT, Ahituv N, Shendure J: A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity. Genome Res 2017, 27:38–52.
  • 48. Ray JP, de Boer CG, Fulco CP, Lareau CA, Kanai M, Ulirsch JC, Tewhey R, Ludwig LS, Reilly SK, Bergman DT, et al.: Prioritizing disease and trait causal variants at the TNFAIP3 locus using functional and genomic features. Nat Commun 2020, 11:1237.
  • 49. Gallego Romero I, Lea AJ: Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023, 24:26.
  • 50. Griesemer D, Xue JR, Reilly SK, Ulirsch JC, Kukreja K, Davis JR, Kanai M, Yang DK, Butts JC, Guney MH, et al.: Genome-wide functional screen of 3’UTR variants uncovers causal variants for human disease and evolution. Cell 2021, 184:5247–5260.e19.
  • 51. Rong S, Neil CR, Welch A, Duan C, Maguire S, Meremikwu IC, Meyerson M, Evans BJ, Fairbrother WG: Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc Natl Acad Sci U S A 2023, 120:e2218308120.
  • 52. Pizzollo J, Zintel TM, Babbitt CC: Differentially Active and Conserved Neural Enhancers Define Two Forms of Adaptive Noncoding Evolution in Humans. Genome Biol Evol 2022, 14.
  • 53. Norman AR, Ryu AH, Jamieson K, Thomas S, Shen Y, Ahituv N, Pollard KS, Reiter JF: A Human Accelerated Region is a Leydig cell GLI2 Enhancer that Affects Male-Typical Behavior. bioRxiv 2021, doi: 10.1101/2021.01.27.428524.
  • 54. Mangan RJ, Alsina FC, Mosti F, Sotelo-Fonseca JE, Snellings DA, Au EH, Carvalho J, Sathyan L, Johnson GD, Reddy TE, et al.: Adaptive sequence divergence forged new neurodevelopmental enhancers in humans. Cell 2022, 185:4587–4603.e23. The authors identified the fastest-evolved regions of the human genome (HAQERs). Using a novel in vivo single-cell STARR-seq in embryonic mouse cerebral cortex, the authors find that some HAQERs create hominin-specific neurodevelopmental regulatory elements and are enriched in neurodevelopmental disease variants.
  • 55. McLean CY, Reno PL, Pollen AA, Bassan AI, Capellini TD, Guenther C, Indjeian VB, Lim X, Menke DB, Schaar BT, et al.: Human-specific loss of regulatory DNA and the evolution of human-specific traits. Nature 2011, 471:216–219.
  • 56. Weiss CV, Harshman L, Inoue F, Fraser HB, Petrov DA, Ahituv N, Gokhman D: The cis-regulatory effects of modern human-specific variants. Elife 2021, 10.
  • 57. Reilly PF, Tjahjadi A, Miller SL, Akey JM, Tucci S: The contribution of Neanderthal introgression to modern human traits. Curr Biol 2022, 32:R970–R983.
  • 58. Ahlquist KD, Bañuelos MM, Funk A, Lai J, Rong S, Villanea FA, Witt KE: Our Tangled Family Tree: New Genomic Methods Offer Insight into the Legacy of Archaic Admixture. Genome Biol Evol 2021, 13.
  • 59. Jagoda E, Marnetto D, Senevirathne G, Gonzalez V, Baid K, Montinaro F, Richard D, Falzarano D, LeBlanc EV, Colpitts CC, et al.: Regulatory dissection of the severe COVID-19 risk locus introgressed by Neanderthals. Elife 2023, 12.
  • 60. Vespasiani DM, Jacobs GS, Cook LE, Brucato N, Leavesley M, Kinipi C, Ricaut F-X, Cox MP, Gallego Romero I: Denisovan introgression has shaped the immune system of present-day Papuans. PLoS Genet 2022, 18:e1010470.
  • 61. Quach H, Rotival M, Pothlichet J, Loh Y-HE, Dannemann M, Zidane N, Laval G, Patin E, Harmant C, Lopez M, et al.: Genetic Adaptation and Neandertal Admixture Shaped the Immune System of Human Populations. Cell 2016, 167:643–656.e17.
  • 62. Aquino Y, Bisiaux A, Li Z, O’Neill M, Mendoza-Revilla J, Merkling SH, Kerner G, Hasan M, Libri V, Bondet V, et al.: Dissecting human population variation in single-cell responses to SARS-CoV-2. Nature 2023, 621:120–128.
  • 63. Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al.: Identifying recent adaptations in large-scale genomic data. Cell 2013, 152:703–713.
  • 64. Szpak M, Xue Y, Ayub Q, Tyler-Smith C: How well do we understand the basis of classic selective sweeps in humans? FEBS Lett 2019, 593:1431–1448.
  • 65. Abell NS, DeGorter MK, Gloudemans MJ, Greenwald E, Smith KS, He Z, Montgomery SB: Multiple causal variants underlie genetic associations in humans. Science 2022, 375:1247–1254.
  • 66. Zhou J, Troyanskaya OG: Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015, 12:931–934.
  • 67. Avsec Ž, Weilert M, Shrikumar A, Krueger S, Alexandari A, Dalal K, Fropf R, McAnany C, Gagneur J, Kundaje A, et al.: Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 2021, 53:354–366.
  • 68. Chen KM, Wong AK, Troyanskaya OG, Zhou J: A sequence-based global map of regulatory activity for deciphering human genetics. Nat Genet 2022, 54:940–949.
  • 69. Fudenberg G, Kelley DR, Pollard KS: Predicting 3D genome folding from DNA sequence with Akita. Nat Methods 2020, 17:1111–1117.
  • 70. Avsec Ž, Agarwal V, Visentin D, Ledsam JR, Grabska-Barwinska A, Taylor KR, Assael Y, Jumper J, Kohli P, Kelley DR: Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods 2021, 18:1196–1203.
  • 71. de Boer CG, Taipale J: Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 2024, 625:41–50.
  • 72. Whalen S, Schreiber J, Noble WS, Pollard KS: Navigating the pitfalls of applying machine learning in genomics. Nat Rev Genet 2022, 23:169–181.
  • 73. Karollus A, Mauermeier T, Gagneur J: Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol 2023, 24:56.
  • 74. Huang C, Shuai RW, Baokar P, Chung R, Rastogi R, Kathail P, Ioannidis NM: Personal transcriptome variation is poorly explained by current genomic deep learning models. Nat Genet 2023, 55:2056–2059.
  • 75. Keough KC, Whalen S, Inoue F, Przytycki PF, Fair T, Deng C, Steyert M, Ryu H, Lindblad-Toh K, Karlsson E, et al.: Three-dimensional genome rewiring in loci with human accelerated regions. Science 2023, 380:eabm1696.
  • 76. McArthur E, Rinker DC, Gilbertson EN, Fudenberg G, Pittman M, Keough K, Pollard KS, Capra JA: Reconstructing the 3D genome organization of Neanderthals reveals that chromatin folding shaped phenotypic and sequence divergence. bioRxiv 2022, doi: 10.1101/2022.02.07.479462.
  • 77. Gilbertson EN, Brand CM, McArthur E, Rinker DC, Kuang S, Pollard KS, Capra JA: Machine learning reveals the diversity of human 3D chromatin contact patterns. bioRxiv 2023, doi: 10.1101/2023.12.22.573104.
  • 78. Xu K, Schadt EE, Pollard KS, Roussos P, Dudley JT: Genomic and network patterns of schizophrenia genetic variation in human evolutionary accelerated regions. Mol Biol Evol 2015, 32:1148–1160.
  • 79. Li S, Hannenhalli S, Ovcharenko I: De novo human brain enhancers created by single-nucleotide mutations. Sci Adv 2023, 9:eadd2911.
  • 80. Kaplow IM, Lawler AJ, Schäffer DE, Srinivasan C, Sestili HH, Wirthlin ME, Phan BN, Prasad K, Brown AR, Zhang X, et al.: Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science 2023, 380:eabm7993. This study developed TACIT, a ML model that connects potential regulatory elements with associated phenotypes across mammalian species. They used TACIT to associate CREs with phenotypes underlying convergently evolved phenotypes, including brain size, vocal learning, and social behavior.
  • 81. IGVF Consortium: The Impact of Genomic Variation on Function (IGVF) Consortium. ArXiv 2023, doi: 10.48550/arXiv.2307.13708.
  • 82. Pal A, Noble MA, Morales M, Pal R, Baumgartner M, Yang JW, Yim KM, Uebbing S, Noonan JP: Resolving the three-dimensional interactome of Human Accelerated Regions during human and chimpanzee neurodevelopment. bioRxiv 2024, doi: 10.1101/2024.06.25.600691.
  • 83. Schnitzler GR, Kang H, Fang S, Angom RS, Lee-Kim VS, Ma XR, Zhou R, Zeng T, Guo K, Taylor MS, et al.: Convergence of coronary artery disease genes onto endothelial cell programs. Nature 2024, 626:799–807.
  • 84. Xu L, Kun E, Pandey D, Wang JY, Brasil MF, Singh T, Narasimhan VM: The genetic architecture and evolutionary consequences of the human pelvic form. bioRxiv 2024, doi: 10.1101/2024.05.02.592256.
  • 85. Washburn SL: Tools and Human Evolution. Pp. 10–23 in Scientific Technology and Social Change: Readings from Scientific American 1974. 1960.
  • 86. Findley AS, Zhang X, Boye C, Lin YL, Kalita CA, Barreiro L, Lohmueller KE, Pique-Regi R, Luca F: A signature of Neanderthal introgression on molecular mechanisms of environmental responses. PLoS Genet 2021, 17:e1009493.
