Abstract
Genome-wide association studies (GWASs) provide a critical foundation for elucidating the genetic underpinnings of common polygenic diseases. However, these studies have limitations in their ability to assign causality to particular genetic variants, especially those residing in the noncoding genome. Over the last decade, technological and methodological advances in both the analytical and empirical prioritization of noncoding variants have made possible the identification of causative variants by leveraging orthogonal functional evidence at increasing scale. In this review, we present an overview of these approaches and describe how this workflow provides the groundwork necessary to move beyond associations, towards genetically informed studies about the molecular and cellular mechanisms of polygenic disease.
Keywords: Noncoding variant, GWAS, fine-mapping, variant effect prediction, variant prioritization, functional genomics
In search of causality
Elucidating the genetic mechanisms of common diseases with heritable components but no clear Mendelian inheritance patterns is one of the most pressing biomedical challenges of our time. Common diseases can be influenced by many common genetic variants, each with typically small effects on disease phenotypes1,2. Genome-wide association studies (GWASs) delineate the polygenic nature of complex heritable traits by identifying differences in allele frequencies among individuals with similar ancestral backgrounds but distinct phenotypes3,4. GWASs have identified hundreds of thousands of genomic loci associated with human traits3,4. These studies are well-suited for this purpose, but are limited in their ability to assign causality to specific variants, genes, and cell types, which is critical for outlining disease mechanisms. These limitations have spurred the development of post-GWAS analyses to identify mechanisms by which the nominated genomic loci contribute to disease.
We refer to the genetic polymorphisms that directly impact molecular phenotypes with consequent effects on disease risk as “causative variants”. Identifying such variants from GWAS data and experimentally validating their effects on molecular, cellular and organismal disease phenotypes are open challenges5–8. As evidence of this difficulty, a text-mining analysis of the GWAS literature from 2022 found that only 309 noncoding variants among the hundreds of thousands nominated by GWAS were experimentally assessed and found to affect molecular phenotypes, and most were tested in exogenous reporter contexts9. Pinpointing specific causative variants provides critical information to identify the affected genes, biochemical pathways, and cell types predisposing an individual to disease. For example, applying analytical and functional methods described below to the FTO risk locus for obesity identified a single-nucleotide polymorphism (SNP) that disrupts a transcription factor binding motif, causing dysregulation of key genes in adipocyte thermogenesis10. Moreover, identification of the precise mechanisms underlying genetic predisposition can help nominate novel targets for therapeutic intervention, prioritize patients for specific clinical trials, and improve disease prediction and prognosis prior to disease onset.
In this review, we outline a workflow to prioritize potentially causative noncoding variants and then predict and experimentally validate their effects on molecular and cellular phenotypes (Figure 1). This approach involves (1) analytical prioritization methods such as statistical and functionally informed fine-mapping (see Glossary) of disease association data and machine learning (ML) approaches to nominate candidate causative variants, (2) empirical prioritization methods via high-throughput assays such as CRISPRi/a and massively parallel reporter assays, and finally (3) functional validation methods such as allelic imbalance analysis in primary human tissue samples and genome editing in cellular and animal models to assess the effects of the variants in their endogenous genomic loci. We provide brief overviews of the rationale behind these approaches, survey recent technological advances, and give practical advice about how to implement these methodologies.
Figure 1: A systematic workflow to prioritize and functionally validate noncoding causative disease-associated variants.

Thousands of mostly-nonfunctional disease-associated variants can be narrowed down to a manageable high-priority set for targeted functional validation experiments using complementary analytical and empirical prioritization approaches. In silico methods such as statistical fine-mapping, colocalization with functional genomics data, and computational and machine-learning based prediction methods, yield insights that are supported and complemented by the results of in vitro and in vivo biochemical assays. Functional validation of variant effects can be achieved by directly measuring the allelic gene regulatory effects of variants in disease-relevant human patient tissues (“allelic imbalance”), or conducting controlled genome editing experiments to test the necessity and sufficiency of the individual variant to affect cellular phenotypes in its endogenous locus. The ultimate goal of identifying functional noncoding variants is to produce translational impacts, including aiding in prognostication and risk group stratification, elucidating cell biological disease mechanisms, and identifying genetics-based therapeutic targets.
The challenges of interpreting GWAS data
Although GWASs are powerful tools for identifying genomic loci associated with disease, they have several limitations that post-GWAS analysis methods aim to resolve.
Linkage disequilibrium obscures causative variants
For a given trait of interest, GWAS summary statistics provide the set of variants most strongly associated with the trait. However, linkage disequilibrium (LD), the nonrandom association of alleles, can obscure causative variants among a co-inherited set at a given locus5,11. For any set of variants, LD depends on factors acting at the population level, such as natural selection and genetic bottlenecks, and at the cellular level, such as meiotic recombination frequency between the variants, which tends to increase as a function of their linear distance on the chromosome5. High LD in a locus can render non-causative variants statistically indistinguishable from the true causative variant(s)5,7,8,11. This issue, brought on by natural LD structures, is further exacerbated by the use of SNP arrays rather than whole-genome sequencing (WGS) for genotyping in most GWASs to date. These SNP arrays directly genotype common “tag” variants, a relatively small subset of all variants genome-wide, selected in the first place for having high LD with neighboring SNPs5. Genotype imputation methods infer association statistics for untyped common variants by using reference LD patterns, increasing SNP density and improving the resolution of GWASs without inflating sequencing costs5,12. However, these methods are limited by their use of reference panels. In particular, care must be taken to ensure that the reference panels are relevant to the genetic ancestries of the population being studied, which can be challenging given a precedent of deriving these panels from European population genetic data13. Additionally, rare variants in GWAS loci, irrespective of genetic ancestry, will be under-annotated through imputation14. As a consequence of these limitations, the tag variant identified by a GWAS may not be a causative variant in the locus but may instead be in strong LD with the true causative variant(s), which may or may not be identifiable through imputation5,14,15.
Variants in the noncoding genome can have myriad molecular and cellular consequences
Over 90% of variants identified through GWASs fall in noncoding regions of the genome instead of directly affecting the peptide sequence of a protein-coding gene product7,16. Furthermore, GWAS variants are enriched in sites of chromatin accessibility, which mark active regulatory DNA16, suggesting that many disease-associated noncoding variants may affect gene expression. Indeed, variants associated with complex traits are more likely to be expression quantitative trait loci (eQTLs; see Glossary) than other variants with the same allele frequencies5,17. More recently, it has been shown that GWAS variants are especially enriched in regulatory elements for disease-relevant cell types, as identified through the ABC model which integrates data from H3K27ac ChIP-seq, 3D chromatin conformation and functional CRISPRi experiments18. The enrichment of GWAS variants in the noncoding genome presents a few key challenges for characterizing the effects of potentially causative variants.
First, a noncoding variant can affect cellular functions and gene expression in several ways. A common mechanism is via cis-regulatory elements (CREs), including enhancers, promoters, silencers, and insulators. These CREs interact with each other via transcription factors (TFs), which bind to DNA in a sequence-specific manner; therefore, variants in these sequences can alter TF binding dynamics, three-dimensional chromatin conformation, and ultimately gene expression19,20. A review of GWAS literature found that the majority (287/309, 92.9%) of GWAS-identified, experimentally validated noncoding variants exert their effects through perturbation of CREs, including promoters9, with the caveat that this finding may be biased by current methods of functional validation, which mainly assess variants’ effects on gene regulatory activity. Although noncoding variants may exert their effects through many other pre- and post-transcriptional mechanisms5,20 (Figure 2), this review primarily focuses on the class of noncoding variants affecting gene expression by altering CREs.
Figure 2: Noncoding variants can exert their effects via multiple molecular mechanisms.

The downstream consequences of a noncoding variant depend on the type of genomic element targeted. Most commonly, noncoding variants alter the sequence-specific binding sites of TFs within regulatory elements. Similarly, noncoding variants could alter DNA methylation by perturbing an existing CpG dinucleotide or creating a new one. Both of these effects could result in loss or gain of TF binding. In the context of CTCF, loss or gain of binding could affect larger-scale chromatin interactions. In contrast, noncoding variants can also directly affect RNA through alteration of lncRNA secondary structure, splicing, or RNA stability.
Second, it can be challenging to predict which genes are affected by the variants. GWAS-nominated noncoding variants are often assigned the nearest gene as targets. However, enhancers sometimes skip nearby genes, and can regulate genes more than 1 Mb away in linear distance21. Additionally, the genes directly affected by CRE variants in cis can encode protein products that affect the expression of other genes in trans22,23. Moreover, enhancers can interact with multiple promoters and thus regulate multiple genes, and enhancer-promoter interactions may change between cell types and cell states8.
Finally, even after predicting the molecular and transcriptomic effects of a noncoding variant, it remains challenging to identify which cell types are affected, because complex diseases often involve multiple organ systems, implicating multiple tissues and cell types7,8. Noncoding regulatory elements have exquisitely cell type-specific usage8,16,24–28, suggesting that a disease-associated noncoding variant in a given regulatory element may exert its effects only in the specific cell types that use this regulatory element. Identifying the cell types in which a given genetic variant affects an active regulatory element is critical for understanding pathomechanisms, and will ultimately inform the development of genetics-based targeted therapeutic interventions.
Analytical approaches to prioritize variants
Post-GWAS analyses aiming to predict causative variant(s) in disease-associated GWAS loci are collectively referred to as fine-mapping5,19. Developments in this field have integrated statistical approaches with functional genomics datasets to prioritize variants from GWAS (Figure 3). Refinements to these functionally informed fine-mapping approaches allow for the possibility that a genomic locus harbors multiple causal variants, and incorporate functional genomic data into their priors to improve resolution.
Figure 3: Analytical methods of variant prioritization.

Identification of common disease-associated variants will often begin with genome-wide association studies (GWASs). (A) The results of a GWAS can be represented in a Manhattan plot, in which each dot represents a genetic variant that has been tested for its association with disease. Genomic positions are plotted along the X-axis, and the significance of the association between each variant and disease risk is plotted on the Y axis. Variants that surpass a genome-wide significance threshold (dotted horizontal line) can be considered associated with disease. (B) GWASs nominate genomic loci, which often contain many individual disease-associated variants. A zoomed-in plot of the disease-associated locus on Chromosome 7 illustrates how multiple individual variants in the locus surpass genome-wide significance. Each dot represents a variant, and the color of the dot illustrates the degree of linkage disequilibrium (LD) between that variant and the most significantly associated lead variant (red dot) in the locus. Population-level patterns of LD can be used to conduct statistical fine-mapping and identify a credible set of variants containing the functional variant(s) driving association of the locus with disease (“statistically fine-mapped region”, blue overlay). (C) The credible set of variants identified through statistical fine-mapping can be further refined, and their molecular effects potentially mapped to specific tissues and cell types, by intersecting the variants with external tissue and/or cell type-specific biochemical-based functional genomics data. In this example, chromatin accessibility in the locus is shown for two cell types (Cell Type A and Cell Type B). The statistically fine-mapped region contains two cell type-specific regions of accessible chromatin that are only present in Cell Type A; these “functionally fine-mapped regions” are potential cis-regulatory elements (CREs) which tend to be enriched for causative noncoding variants. 
Three-dimensional (3D) genomic interaction data can associate CREs to the genes they may regulate. In this example, both cell type-specific CREs contact the promoter of Gene X. Colocalization of the functionally fine-mapped regions and the results of expression quantitative trait locus (eQTL) analysis for Gene X in Cell Type A can further refine the set of prioritized variants to those that not only reside in putative CREs, but also are associated with changes to gene expression in the relevant cell type. In the eQTL plot, each dot is an individual variant with genomic coordinates plotted on the X-axis and the association of the variant with the expression of Gene X in Cell Type A plotted on the Y-axis. Other types of molecular QTLs (MolQTL) can be utilized in a similar fashion. (D) Machine learning (ML)-based computational methods of variant effect prediction are an exciting direction for the field of analytical prioritization of noncoding variants. ML models can be trained using a variety of genomic sequence-based data, including gene expression, chromatin accessibility, DNA methylation and 3D chromatin interactions. Sequence-to-sequence architectures can enable predictions about the effects of variants in their endogenous loci, compared to reference sequences.
Statistical fine-mapping to resolve LD-linked variants
In silico statistical fine-mapping efforts have become more rigorous and nuanced over the years, and we refer readers to Schaid et al.5 for more details about the methods to prioritize genomic loci by statistical fine-mapping that are summarized briefly here (Figure 3B). One approach to begin filtering for potentially causative variants is to consider only those variants that surpass an arbitrary LD threshold (pairwise correlation, r²) with the lead variant. This strategy can be limited, as it fails to account for potential joint effects of multiple causative variants in the locus (“allelic heterogeneity”, a trait estimated to characterize up to 20% of the loci identified through GWAS29,30). Furthermore, it relies upon semi-arbitrary thresholds of linkage disequilibrium and does not provide a measure of confidence that a given variant is causative.
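As a rough illustration of the LD-threshold filter described above, the following Python sketch computes r² between a lead variant and its neighbors from genotype dosages (0, 1, or 2 copies of the alternate allele) and keeps the variants above a cutoff. The dosage vectors, the variant names, and the 0.8 cutoff are illustrative assumptions rather than values from any study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic variant: correlation is undefined, treat as no LD
    return cov * cov / (var_x * var_y)


def ld_filter(genotypes, lead, threshold=0.8):
    """Return the sorted names of variants whose r^2 with the lead meets the threshold."""
    lead_dosages = genotypes[lead]
    return sorted(
        name for name, dosages in genotypes.items()
        if r_squared(lead_dosages, dosages) >= threshold
    )


# Hypothetical dosages for eight individuals at four variants in one locus.
genotypes = {
    "lead":  [0, 1, 2, 1, 0, 2, 1, 0],
    "snp_a": [0, 1, 2, 1, 0, 2, 1, 0],  # identical to the lead (perfect LD)
    "snp_b": [0, 1, 2, 1, 0, 2, 0, 0],  # differs in one individual (high LD)
    "snp_c": [1, 0, 1, 2, 0, 0, 1, 2],  # weakly correlated with the lead
}

candidates = ld_filter(genotypes, "lead")
print(candidates)
```

In this toy locus the filter retains the lead variant together with snp_a and snp_b, but it cannot rank them against one another, which is the limitation that motivates the joint and Bayesian models discussed next.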
To address some of these issues, fine-mapping methods based on penalized regression models were developed to jointly analyze all variants in a locus. These models simultaneously estimate the effect sizes of each variant and shrink the contribution of variants with small effect sizes toward zero, enabling identification of likely causative variants. Penalized regression models used for this purpose include lasso31 and elastic net32. Penalized regression methods allow for allelic heterogeneity and can yield estimates for the effect size of each nominated variant. However, these approaches have limitations: in particular, penalized models tend to be sparse, producing prediction models that can include non-causative variants and exclude true causative variants when they are highly correlated5.
More recently, Bayesian fine-mapping approaches have been developed to circumvent the limitations of P-value based filtering methods by instead determining posterior inclusion probabilities (PIPs) for each variant in a locus5,33. This approach ranks the variants by these values, and identifies the smallest credible set containing all causative variants in the locus with a given probability5,33. Bayesian fine-mapping approaches, such as CAVIAR34, PAINTOR35, CAVIARBF33, FINEMAP36 and SuSiE37, are well-suited to jointly model multiple causative variants and identify manageable sets of potentially causative variants, each with a full posterior probability distribution indicating their estimated effect sizes and the degree of uncertainty in this estimation. As is true for all Bayesian approaches, assumptions made in the designation of a prior can have large effects on the resultant posterior probabilities. For Bayesian fine-mapping of GWAS data, care must be taken to ensure the data from which the priors are determined are relevant to the population studied in the GWAS38.
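To make the notions of a PIP and a credible set concrete, the sketch below works through the simplest case, in which exactly one causative variant is assumed per locus and all variants are equally likely a priori. It uses Wakefield's approximate Bayes factor computed from GWAS effect sizes and standard errors. This is a deliberate simplification and is not how the multi-variant tools named above operate; the variant names, summary statistics, and prior effect-size variance (0.04) are hypothetical.

```python
from math import exp, log


def log_abf(beta, se, prior_var=0.04):
    """Log of Wakefield's approximate Bayes factor in favor of association."""
    v = se * se                       # sampling variance of the effect estimate
    z = beta / se
    shrink = prior_var / (v + prior_var)
    return 0.5 * log(1.0 - shrink) + 0.5 * z * z * shrink


def posterior_inclusion_probabilities(stats, prior_var=0.04):
    """PIPs assuming a single causative variant and equal prior weights."""
    log_abfs = {name: log_abf(b, s, prior_var) for name, (b, s) in stats.items()}
    peak = max(log_abfs.values())     # subtract the maximum for numerical stability
    weights = {name: exp(value - peak) for name, value in log_abfs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def credible_set(pips, coverage=0.95):
    """Smallest set of top-ranked variants whose PIPs sum to at least `coverage`."""
    chosen, cumulative = [], 0.0
    for name, pip in sorted(pips.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(name)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical (beta, standard error) pairs for four variants in one locus.
stats = {
    "rs_a": (0.30, 0.04),
    "rs_b": (0.29, 0.04),
    "rs_c": (0.10, 0.04),
    "rs_d": (0.02, 0.04),
}

pips = posterior_inclusion_probabilities(stats)
print(credible_set(pips))
```

Here the two variants with nearly identical association statistics share most of the posterior mass, so both enter the 95% credible set, whereas the weakly associated variants are excluded.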
Incorporation of functional ‘omics data through colocalization and functionally informed fine-mapping to refine statistical predictions of variant effects
To supplement statistical fine-mapping methods based solely on the association data at hand, orthogonal functional genomics data can be incorporated into analyses to enable functionally informed fine-mapping7 (Figure 3C). Methods implementing this approach include eCAVIAR39, fastPAINTOR40, PolyFun+SuSiE41 and CARMA42. Data used for fine-mapping can include chromatin accessibility, TF binding sites and histone modifications, three-dimensional chromatin conformation, or DNA methylation. The premise underlying these approaches is that causative genetic variants exert their effects on disease by first directly affecting one of these so-called “intermediate molecular phenotypes”8. Colocalization analyses (see Glossary) identify the overlap between loci associated with changes to intermediate molecular traits (molecular quantitative trait loci, or molQTLs) and loci associated with disease risk. MolQTLs can be defined by any quantitative molecular trait, including gene expression (eQTLs), chromatin accessibility (caQTLs), methylation levels (meQTLs), splicing (sQTLs), protein levels (pQTLs), and others. Early colocalization analyses simply asked whether significant GWAS variants were also significant eQTLs. However, this method introduces a high likelihood of false-positive overlaps7,8, as it has been estimated that about 48% of common genetic variants act as eQTLs for at least one gene22, and the vast majority of protein-coding genes in the human genome have at least one eQTL in at least one tissue tested in the GTEx project dataset43. Additional methods, such as coloc44, enloc45, and coloc in combination with SuSiE to allow for multiple causal variants46, were thus developed to address this problem. Other derivatives of these methods, such as moloc47 and HyPrColoc48, allow for simultaneous analysis of multiple QTLs and other complex traits.
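The logic of Bayesian colocalization can be sketched in a few lines. The example below follows the five-hypothesis framing used by coloc-style methods under a single-causal-variant assumption: no association with either trait (H0), association with only the disease trait (H1), only the molecular trait (H2), both traits through distinct variants (H3), or both through a shared variant (H4). It takes per-variant approximate Bayes factors as given. The Bayes factor values and the prior probabilities p1, p2 and p12 are illustrative assumptions, and the function is a teaching sketch, not a reimplementation of any published package.

```python
def coloc_posteriors(abf_trait, abf_molqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0-H4 from per-variant Bayes factors.

    p1, p2: assumed prior probability that a variant affects only the disease
    trait or only the molecular trait; p12: that it affects both.
    """
    if len(abf_trait) != len(abf_molqtl):
        raise ValueError("Both traits need one Bayes factor per shared variant")
    sum1 = sum(abf_trait)
    sum2 = sum(abf_molqtl)
    sum12 = sum(a * b for a, b in zip(abf_trait, abf_molqtl))
    weights = {
        "H0": 1.0,
        "H1": p1 * sum1,
        "H2": p2 * sum2,
        "H3": p1 * p2 * (sum1 * sum2 - sum12),  # pairs of distinct variants
        "H4": p12 * sum12,                      # the same variant for both traits
    }
    total = sum(weights.values())
    return {hypothesis: w / total for hypothesis, w in weights.items()}


# Hypothetical Bayes factors for four variants shared between two datasets.
abf_gwas = [1.0, 50.0, 2.0e6, 3.0]
abf_eqtl_same_peak = [0.5, 20.0, 5.0e6, 1.0]   # strongest at the same variant
abf_eqtl_other_peak = [5.0e6, 20.0, 0.5, 1.0]  # strongest at a different variant

shared = coloc_posteriors(abf_gwas, abf_eqtl_same_peak)
distinct = coloc_posteriors(abf_gwas, abf_eqtl_other_peak)
print(round(shared["H4"], 3), round(distinct["H3"], 3))
```

When the two signals peak at the same variant, nearly all posterior mass falls on H4; when they peak at different variants, H3 dominates, even though a naive overlap of "significant" variants would flag both loci.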
MolQTLs are often highly tissue- and cell type-specific, highlighting the utility of datasets from purified cell types and single-cell studies. Demonstrating the exquisite cell type- and state-specificity of molQTLs and the resolution gained by single-cell eQTL analysis, a study profiling 500,000 unstimulated memory T cells found that one-third of cis-eQTL effects were identified in the context of continuous cell states, and could not have been identified in samples pooled using conventional discrete categories such as CD4+ versus CD8+49. Colocalizing candidate causative variants with tissue or cell type-specific molQTLs can help identify the tissues and cell types altered by the molecular effects of the variants.
Variant effect prediction by machine learning
Noncoding variants residing in CREs can alter cellular function by altering TF binding motifs and the affinity of the TF for the CRE6. Thus, the ability to predict which sequence changes might disrupt active TF binding sites can help prioritize noncoding variants for functional validation. Although there are databases of TF binding sequence preferences50, the cell type-specific grammar of TF binding and CRE activity extends far beyond the motif sequence itself, including myriad epigenetic features that provide the proper context for TF binding. This complexity has necessitated the use of ML methods, which learn to predict a molecular phenotype based on input genetic or epigenomic sequence (Figure 3D). These methods can be used to predict the molecular consequences of noncoding variants by comparing predicted activity between the reference and variant sequences.
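The reference-versus-variant comparison can be illustrated with the simplest possible sequence model, a position weight matrix (PWM) for a single TF motif. The sketch below scores the best motif match in the reference and alternate sequences and reports the difference. It is a stand-in for, and far simpler than, the ML models discussed in this section, which learn much richer sequence context; the 4-bp motif frequencies, the uniform background, and the example sequence are hypothetical.

```python
from math import log2

# Hypothetical position frequency matrix for a 4-bp motif (consensus AGTC).
PFM = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def window_score(window):
    """Log-odds score of one motif-length window against the background."""
    return sum(log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_score(seq):
    """Highest-scoring motif match anywhere in the sequence."""
    width = len(PFM)
    return max(window_score(seq[i:i + width]) for i in range(len(seq) - width + 1))


def variant_effect(ref_seq, pos, alt):
    """Predicted change in best motif score (alternate minus reference)."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return best_score(alt_seq) - best_score(ref_seq)


ref_seq = "TTAGTCGG"                           # contains the consensus at positions 2-5
disruptive = variant_effect(ref_seq, 3, "C")   # G>C inside the motif (0-based index 3)
neutral = variant_effect(ref_seq, 0, "A")      # T>A outside the motif
print(round(disruptive, 2), neutral)
```

A negative score difference flags a variant predicted to weaken the motif match, whereas a substitution outside the motif leaves the score unchanged. The same reference-versus-alternate subtraction applies when the scoring function is a trained model of chromatin accessibility or expression rather than a PWM.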
To date, such ML models have been successfully employed on a variety of features, including chromatin accessibility and TF binding51,52, gene expression53,54, DNA methylation55, and 3D chromatin interactions56–58. Using these frameworks, potentially causative variants can be prioritized based on the predicted likelihood that they disrupt molecular phenotypes. While most of these methods have yet to be systematically validated for prediction of noncoding variant effects, they hold great promise as high-throughput and cell type-specific in silico platforms for variant prioritization. Our previous review of this topic provides additional details and context about applications of ML approaches in functional genomics59.
Empirical approaches to prioritizing variants
High-throughput experimental assays provide a complementary approach to prioritizing variants. When performed in relevant model systems (Box 1), such assays reveal the variants’ potential to alter gene expression. However, they do not provide a true functional validation of the variants, for reasons we highlight below.
[BOX 1] Allelic MPRA library design.
When designing an allelic MPRA, there are several important factors to consider, including (1) whether the MPRA constructs are integrated in the model cells’ genome or remain episomal, (2) the extent of the sequence context used, (3) the positive and negative controls included in the library, and (4) the depth of sequencing coverage per candidate regulatory sequence (CRS). Below, we discuss each of these factors in turn, providing recommendations and references for further reading.
Episomal vs. integrated approaches:
MPRAs can be applied to in vitro cell models using transfection or viral infection, as well as to animals in vivo, using vectors such as AAVs108. AAV-based MPRAs have been applied in live mice to test regulatory element activity in complex and difficult-to-access tissues such as the brain109. Directed evolution and other viral engineering approaches have enabled the development of AAV vectors with specific tropism for different tissues and cell types throughout the body110. These new vectors improve the range of in vivo tissues that can be probed, integrating physiological contexts that are missing from in vitro or ex vivo models. However, because AAV cargoes remain episomal upon delivery, they have the same limitations as other non-integrating approaches.
MPRA designs using non-integrating, episomal plasmids may have a greater dynamic range in their outputs than integrated approaches111. However, it has been demonstrated that libraries integrated into the genome yield greater fidelity to predicted endogenous activity; additionally, they display greater reproducibility between replicates than episomal designs61. LentiMPRA61,112 uses a lentiviral vector to integrate the CRS and reporter sequences into the genome. One concern about lentiviral-mediated genome integration approaches is the random nature of integration and the potential for site-of-integration effects. To reduce the effects of the surrounding sequence into which the construct integrates, the expression construct can be flanked by antirepressors61,112. Targeted-integration MPRA designs such as PatchMPRA have been developed to use known landing sites in the genome, each labeled with a unique barcode, enabling precise localization of where a construct has integrated and analyses based on that information113. However, this approach requires the use of specific cell lines engineered to contain genomic landing sites, and therefore cannot be applied to many disease-relevant models of interest; moreover, the throughput of the assay is limited to libraries of several hundreds, rather than multiple thousands, of CRSs. Finally, although integrated constructs are more persistent in the cellular genome compared to episomal constructs, they may become silenced over time by endogenous host cell machinery114.
Sequence context:
The sequence context provided in a CRS, i.e., the number of surrounding base-pairs of DNA, can have substantial effects on the regulatory activity of the tested element, with longer sequences adding biologically relevant signal and potentially improving the relevance of the assay results to the activity of the endogenous locus111. This may be because many regulatory sequences act through the concerted binding of multiple TFs, and including additional genomic context results in a more faithful representation of such TF binding dynamics in the MPRA construct111. Moreover, including a larger segment of the surrounding haplotype may reveal disease-relevant variant interactions, as multiple nearby variants can exhibit cooperative or dependent effects29.
Controls:
Given the large numbers of sequences that an MPRA is designed to test, it is critical to include proper positive and negative controls. The ideal controls for an allelic MPRA are allelic pairs of sequences that have demonstrated differential activity in previous MPRAs in the model system of interest; databases of previous MPRA studies are being developed that will help with selection of controls71. However, given the relative paucity of such allelic MPRA data in most models at present, a second-best alternative is to include sequences expected to exhibit strong gene regulatory effects, irrespective of any allelic comparisons. For this class of controls, positive controls may be CRSs that have been previously validated to be active in the model of interest by any other reporter assay. If the model system lacks a set of previously validated elements, an alternative source of positive controls includes databases such as the VISTA Enhancer Browser115, or previously validated MPRA hits from other closely related cellular models. Negative controls are those not expected to be active in the model system of interest, and can come from previous MPRAs carried out in the same model. For the design of both negative and positive controls, tools including machine-learning based models trained on previous MPRA data are being developed to predict sequences expected to have high or low activity in MPRAs116. A null distribution, useful for ascertaining the expected activity of random sequences, can be produced by using scrambled sequences. Such scrambled controls preserve the same overall nucleotide prevalence in the experimental and control libraries.
Depth of sequencing:
Lastly, a critical consideration in implementing an MPRA is the sequencing coverage per-CRS required to obtain meaningful results. Each CRS and barcode should ideally be observed hundreds of times per experiment. For example, the LentiMPRA protocol calls for the use of > 50 barcodes per CRS, to mitigate variability in expression that the barcodes themselves may introduce, and hundreds of integrations per barcode, to mitigate potential site-of-integration effects in the case of random-integration MPRAs61,112. This consideration can limit the models that can feasibly be used for high-complexity MPRA libraries: with models that are not highly amenable to viral infection, assessing all variants with sufficient coverage could require a prohibitively large number of cells per replicate.
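The scale implied by these coverage targets can be estimated with simple arithmetic. The sketch below multiplies library size by barcodes per CRS and integrations per barcode, then divides by the fraction of cells that take up a construct. The library size, the figure of 100 integrations per barcode (a lower bound on "hundreds"), the 50% transduction efficiency, and the assumption of one integration per transduced cell are all illustrative and should be replaced with values appropriate to the model system.

```python
from math import ceil


def cells_required(n_crs, barcodes_per_crs=50, integrations_per_barcode=100,
                   transduction_efficiency=0.5):
    """Rough number of cells to plate per replicate for a random-integration MPRA.

    Assumes one integration per successfully transduced cell.
    """
    if not 0 < transduction_efficiency <= 1:
        raise ValueError("transduction_efficiency must be in (0, 1]")
    integrations = n_crs * barcodes_per_crs * integrations_per_barcode
    return ceil(integrations / transduction_efficiency)


# Hypothetical library: 2,000 variants x 2 alleles, plus 200 control sequences.
n_crs = 2 * 2000 + 200
cells = cells_required(n_crs)
print(f"{cells:,} cells per replicate")
```

Under these assumptions the library calls for tens of millions of cells per replicate, which illustrates why models that are difficult to expand or transduce may only support smaller libraries.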
Massively Parallel Reporter Assays (MPRAs) enable high-throughput assessment of gene regulatory potential in exogenous contexts
Reporter assays involve cloning a candidate regulatory sequence (CRS; for instance, a CRE) in proximity to a minimal promoter controlling a marker gene such as GFP or luciferase in a plasmid, introducing the construct into cells of interest, and measuring the expression of the marker gene as a proxy for the regulatory activity of the CRS60,61.
In Massively Parallel Reporter Assays (MPRAs; see Glossary), candidate regulatory sequences (CRSs) are paired with barcoded marker genes and multiplexed so many CRSs can be tested simultaneously in a pooled screen15,60,62,63. After the library has been introduced into cells, DNA and RNA are collected and sequenced, and the frequency of the marker genes’ barcodes is compared between the RNA sequence reads and the DNA sequence reads to determine the regulatory activity of each CRS in the library15 (Figure 4). MPRAs have been traditionally used to determine the gene-regulatory potential of thousands of noncoding sequences. The advent of allelic MPRAs, designed to compare pairs of CRSs differing only by a single variant, has extended these assays to assess the effects of noncoding variants29,63–65. Differential gene-regulatory activity between alleles in the MPRA suggests they may also have differential activity in their endogenous genomic environment. Evidence suggests that MPRAs can capture cell type-specific features of gene regulation, as enhancer sequences exhibit more specificity in MPRAs than do the more cell type-universal promoter elements66.
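The RNA-to-DNA comparison described above can be sketched as follows. For each CRS, the example takes the median across barcodes of log2(RNA count / DNA count) as its activity, and defines the allelic effect as the difference in activity between the alternate and reference alleles. The barcode counts are hypothetical and are assumed to be already normalized for sequencing depth; the pseudocount of 1 is likewise an assumption. Real analyses would also use replicates and a formal statistical test, which this sketch omits.

```python
from math import log2
from statistics import median


def crs_activity(barcodes, pseudocount=1.0):
    """Median across barcodes of log2(RNA / DNA) for one CRS.

    `barcodes` is a list of (dna_count, rna_count) pairs, one per barcode.
    """
    return median(
        log2((rna + pseudocount) / (dna + pseudocount)) for dna, rna in barcodes
    )


def allelic_effect(ref_barcodes, alt_barcodes):
    """Log2 fold change in regulatory activity, alternate relative to reference."""
    return crs_activity(alt_barcodes) - crs_activity(ref_barcodes)


# Hypothetical depth-normalized (DNA, RNA) counts for three barcodes per allele.
ref_allele = [(99, 399), (199, 799), (149, 599)]
alt_allele = [(99, 199), (199, 399), (149, 299)]

effect = allelic_effect(ref_allele, alt_allele)
print(effect)
```

An effect of -1 on the log2 scale corresponds to a two-fold reduction in reporter expression for the alternate allele. Taking the median across many barcodes limits the influence of any single barcode or integration site on the estimate.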
Figure 4: High-throughput empirical prioritization of noncoding variants with functional genomics.

(A) Allelic lentiviral-based Massively Parallel Reporter Assays (lentiMPRAs) can simultaneously prioritize thousands of noncoding variants with cell type-specific allelic effects on gene regulation in a single experiment. First, pairs of oligonucleotides representing the complement of allelic Candidate Regulatory Sequences (CRSs) to test are synthesized from commercial sources: each oligo pair differs only by the variant of interest, exemplified by the reference allele T and alternate allele C in the illustration. Next, these oligonucleotides are randomly barcoded with DNA sequences that, with up-front barcode-CRS association sequencing, allow for deconvolution of bulk signal in the final analyses. The oligonucleotide library is cloned into an eGFP reporter construct, and the resulting plasmid pool is integrated into the genome of a cell type of interest via lentivirus. Bulk gDNA and RNA are then collected from these cultures, and next-generation sequencing quantifies the number of insertions per barcoded CRS and the number of eGFP mRNA transcripts produced per barcode, respectively. The regulatory effect of a variant is defined by comparing the allelic barcoded CRSs. In the box highlighting a “Prioritized functional variant,” the alternate allele C reduces reporter expression two-fold; if statistically significant, this exemplifies a prioritized variant warranting further investigation. (B) Functional identification of cis-regulatory elements (CREs) with CRISPR-based perturbations can prioritize noncoding variants that lie within them and may affect their activity. Analytical strategies, such as epigenomic profiling, can predict CRE sequences, such as enhancers, but do not empirically confirm their functionality or link them to a target gene or pathological mechanism. 
Instead, this can be achieved by altering the sequence itself (deletion or mutation by CRISPR-KO) or increasing (CRISPR-activation) or decreasing (CRISPR-inhibition) CRE activity followed by characterization of the outcome. If a predicted sequence is a functional CRE, modifying its activity will in turn affect expression of a gene it regulates. Modern applications pair this with high-throughput genomics to efficiently characterize cell type-specific cis-regulatory landscapes genome-wide in various disease-relevant cell types. Nominated variants are then overlaid: for example, noncoding variants within CREs are more likely to disrupt a transcription factor binding motif to exert a pathological change in gene expression while those in non-regulatory regions of the genome are more likely to be non-functional. This is exemplified by prioritization of a C alternate allele variant lying within the orange CRE depicted here.
Variations on the allelic MPRA concept can also assess post-transcriptional effects such as steady-state mRNA abundance and stability67, ribosome loading and translational efficiency68, and splicing69. An allelic MPRA experiment must be designed carefully, from library construction and implementation (Box 1) to model used (Box 2). Methods of implementing MPRAs are under active improvement; for instance, a recently developed single-cell resolution MPRA approach yields cell type-specific results from transfections into heterogeneous tissues70. Additionally, efforts are ongoing to develop centralized repositories enabling dissemination of MPRA results to the community71.
Box 2. Considerations about experimental modeling.
The experimental approaches outlined in this review are only as powerful as the model system in which they are applied. Given that the disease-relevant effects of noncoding variants are often connected to specific tissues or cell types via context-specific gene-regulatory activities8,16,24–28, it is imperative to consider the precise cellular context in which a variant is being studied in any experimental approach. There is no perfect model system for human tissues, nor is it possible to provide explicit recommendations that cover all applications. There are many factors to consider when selecting an experimental model to study the human disease-associated noncoding genome.
Broadly, experimental models of human tissue include immortalized or cancer-derived cell lines, stem-cell-derived 2D systems, complex 3D multicellular systems, and ex vivo cultures of primary cells, tissues or organs. Immortalized or cancer-derived cell lines are generally easy to handle or manipulate and offer some tissue- and cell type-specificity117,118. However, there are often stark differences between these cells and their in vivo counterparts, for example in gene expression119 and epigenetic landscape120, due to their adaptation to immortalized growth on plastic, the genetic drift they experience over years in culture, and the oncogenic mutations or chromosomal abnormalities they frequently harbor. iPSC- or hESC-derived 2D in vitro models require more expertise and handling, yet can be directed to mimic nearly any cell type in the human body121. Further, multicellular and 3D systems provide yet better models of the cells’ in vivo environments. Polycultures of TF-induced iPSC-derived cell types offer a relatively rapid and scalable modeling solution compatible with functional genomics, such as CRISPR screens122. Organoids123, organ-on-a-chip124, assembloids125, and transplanted organoids126 provide further endogenous structural organization and cell type specificity. These 3D models yield complex cell-cell interactions that provide microenvironment stimuli and recapitulate in vivo tissue architectures, exemplified by brain organoids127. On the other hand, all of the above iPSC-based systems may lack age- or maturation-specific epigenetic marks because of the epigenetic reprogramming inherent in deriving iPSCs. Circumventing this, advances in ex vivo cultures have allowed for experimentation on the in vivo tissue of interest, cultured in 2D or 3D outside of the body128.
However, these more complicated models impose a tradeoff between accuracy and feasibility: they often require significant expertise, are time- and cost-prohibitive, suffer from poor reproducibility, and are not always amenable to the scale of screening approaches129. For many of the assays described in this review, iPSC-derived 2D and polyculture model systems offer an acceptable balance between accuracy and feasibility.
In the context of assessing the functionality of noncoding variants derived from GWASs of human disease, the following are core attributes of any appropriate model. (1) The genome of the model is human: the lack of conservation of the noncoding genome across species130 means that even if orthologous variants exist in a non-human model, the surrounding genetic context will be confounding. (2) The model is amenable to the necessary assays, such as the use of lentivirus in MPRAs61,112. (3) The model reflects a biological context (tissue, cell type and/or cell state) relevant to the disease or variant of interest, and meets the specific biological requirement of the assay or the question at hand. While the first two criteria are straightforward, the third one is much more opaque.
To elaborate on this final criterion, the relevant biological context of a model is inextricably linked to the assay being performed. Different assays require different biological characteristics to give valuable insight. As an example, consider the model features critical for allelic MPRAs and CRISPR/Cas9-based perturbations. Allelic MPRAs primarily rely on the activity of cell type-specific TFs that will act on the exogenously-introduced regulatory sequences. The genetic background of the model system and the precise regulatory landscape at the endogenous locus are largely irrelevant, since we are assaying an exogenous construct. Instead, what is most important for an MPRA is that the model system used expresses a collection of transcription factors (which will act on the exogenous constructs) that closely mirrors the in vivo context of interest. Conversely, CRISPR/Cas9-based perturbations at an endogenous locus require tight recapitulation of the genetic background and gene regulatory landscape at the locus of interest. For example, if a noncoding variant affects the function of a specific enhancer, but that enhancer is not active in the chosen model or is not interacting with its target promoter in 3D space in the same way, inhibiting this enhancer via CRISPRi will not yield meaningful results. Therefore, a putative functional variant within this enhancer may be overlooked. Similarly, if a specific variant is being introduced via base editing to assess the effects that this may have on downstream phenotypes, it may be important to properly model the local haplotype or relevant genetic background which could interact with the variant of interest.
In summary, no single model will be perfectly suited for every assay. We advise up-front molecular characterization of a variety of models to proactively select the best for the application at hand, and use of this characterization to contextualize any results. If feasible, recapitulating results across multiple model systems is a good way to solidify any findings.
MPRAs are powerful methods for high-throughput prioritization of variants. However, they do not assess the effects of variants within their endogenous genomic contexts – that is, in their natural position in the genome – and thus should not, independently, be considered functional validation of causative variants.
CRISPR/Cas9-based perturbations to identify functional CREs
While MPRAs directly assess the effects of variants on gene regulation, they lack the endogenous context of the genome. By contrast, CRISPR/Cas9 technology72 can perturb gene regulatory elements in their endogenous loci. However, with the exception of some cutting-edge prime editing-based screens73, most high-throughput CRISPR perturbations do not assess specific variants, but rather highlight functional CREs in disease-associated loci which could harbor causative variants. We therefore view these perturbation approaches as highly complementary to MPRAs, but we similarly caution against their interpretation as functional validation of variant effects.
Broadly, these methods interfere with or promote endogenous regulatory activity (Figure 4B). One approach is to modify the DNA sequence itself, either by deleting large regions in the locus of interest74 or by systematically introducing an array of small mutations to disrupt TF binding sites (tiling mutagenesis)75. Another is to recruit machinery that alters the local chromatin state around the CRE. Examples include CRISPR interference (CRISPRi)76 and CRISPR activation (CRISPRa)77 and similar approaches, which allow for inhibition or activation of CREs of interest and have been applied across in vitro and even in vivo models78. All of these perturbation approaches are followed by measurement of transcriptional output and large repositories of these transcriptomic data are now available. CRISPRi or CRISPRa can also be paired with single-cell (sc) RNA-seq to link inhibition or activation of a regulatory element to an individual cell’s transcriptome79,80. This single-cell resolution, combined with multiplexed screening, can reveal the impact of perturbing hundreds to thousands of disease-associated genes or regulatory elements in one experiment.
In sum, high-throughput CRISPR/Cas9-based perturbations are an ideal way of assessing CREs in their endogenous loci. They provide a useful backdrop for the exploration of putative variant effects at scale, thus complementing other methods of high-throughput empirical variant prioritization. Moreover, their ability to nominate downstream target genes provides important information that can guide further functional validation, described in detail below.
Pairing Variants to their Gene Targets
Identifying the genes affected by noncoding variants is important to streamline validation assays and paramount to understand the basis of disease. Among the methods outlined above, colocalization and CRISPR/Cas9-based perturbations by design link a CRE or variant to a target gene, while MPRAs do not. We advocate for the use of multiple forms of evidence to pair a CRE or variant to a gene, such as the experimental, analytical, and modeling approaches we discuss below.
Experimental strategies to assign regulatory elements to genes attempt to capture physical interactions between them, typically via chromosome conformation capture assays. These methods, such as Hi-C81, reveal the spatial proximity of linearly distant regions of DNA, and may identify enhancer-promoter pairs. However, most high-quality chromatin conformation datasets generated to date are derived from bulk tissues or cell lines and therefore lack the necessary resolution, since gene regulatory interactions are largely cell type-specific. Emerging protocol optimizations82 and single-cell derivations of these methods83,84 may circumvent this issue.
Alternatively, gene-regulatory interactions can be predicted using various single-cell and/or multi-modal datasets and novel computational techniques, reviewed in detail elsewhere85. With the increasing availability of single-cell Assay for Transposase Accessible Chromatin sequencing (ATAC-seq) data, correlation-based co-accessibility (see Glossary) approaches have been widely adopted. Co-accessibility analysis86 relies on the notion that a gene regulatory interaction would be reflected as a strong correlation in the accessibility of the promoter and the distal regulatory element. Similarly, peak-to-gene linkage analysis correlates the accessibility of a regulatory element with multi-modal gene expression measurements in the same cell87.
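The core idea of co-accessibility, correlating the accessibility of a promoter with that of a distal element, can be sketched as follows. This is a deliberately minimal illustration that uses a plain Pearson correlation across groups of cells; published tools typically add regularization, distance constraints, and background correction that are not reproduced here. The peak names, accessibility values, and the `min_r` threshold are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def coaccessible_links(promoter, distal_peaks, min_r=0.5):
    """Return distal peaks whose accessibility correlates with the promoter."""
    correlations = {name: pearson(promoter, values) for name, values in distal_peaks.items()}
    return {name: r for name, r in correlations.items() if r >= min_r}

# Hypothetical accessibility values, one entry per group of cells.
promoter = [0.1, 0.4, 0.5, 0.9, 1.0]
distal_peaks = {
    "peak_A": [0.2, 0.5, 0.6, 1.0, 1.1],  # tracks the promoter
    "peak_B": [0.9, 0.1, 0.8, 0.2, 0.5],  # does not track the promoter
}

links = coaccessible_links(promoter, distal_peaks)
```

Here only `peak_A` is retained as a candidate regulatory link. Peak-to-gene linkage follows the same pattern, with the promoter accessibility vector replaced by expression of the gene measured in the same cells.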
Analytical modeling combines the core concepts of the experimental and analytical methods mentioned above. For example, the Activity-By-Contact (ABC) model18,88 incorporates both the strength (activity) of an enhancer and how often it contacts the promoter of its target gene (contact). This information contributes to a quantitative score with which to compare regulatory elements targeting the same gene. In the absence of available 3D conformation data, the “contact” portion of the ABC model (see Glossary) can be replaced by genomic distance, requiring only measurement of the “activity” of the regulatory element to enable prediction of gene-regulatory interactions. More recently, tools combining many of these methods have been developed to readily link risk variants directly to target genes89,90, further narrowing the search space on disease-relevant interactions for the targeted functional validation described below. Similarly, ML models have been trained to predict such gene-regulatory interactions56.
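As a rough sketch of how activity and contact can be combined into a single score, the snippet below multiplies the two quantities for each element and normalizes by the sum over all candidate elements for the same gene, which is our understanding of the general form of the ABC score. The power-law distance proxy for contact, its exponent, and the minimum distance are assumptions made purely for illustration, as are the element names and values.

```python
def abc_scores(elements):
    """Score each element for one gene.

    elements maps an element name to an (activity, contact) pair.
    Each score is activity * contact divided by the sum of that product
    over all candidate elements for the same gene.
    """
    products = {name: activity * contact for name, (activity, contact) in elements.items()}
    total = sum(products.values())
    if total == 0:
        return {name: 0.0 for name in products}
    return {name: product / total for name, product in products.items()}

def contact_from_distance(distance_bp, exponent=1.0, min_distance_bp=5000):
    """Hypothetical power-law stand-in for contact when no 3D data exist."""
    return 1.0 / max(distance_bp, min_distance_bp) ** exponent

# Hypothetical candidate elements for a single gene: (activity, contact).
elements = {
    "enhancer_1": (10.0, 0.02),
    "enhancer_2": (5.0, 0.01),
    "enhancer_3": (20.0, 0.0025),
}
scores = abc_scores(elements)

# Same activities, with contact approximated from genomic distance.
distance_based = abc_scores({
    "enhancer_1": (10.0, contact_from_distance(20_000)),
    "enhancer_2": (5.0, contact_from_distance(50_000)),
})
```

Because the scores for one gene sum to 1, they can be read as each element's estimated share of the regulatory input to that gene, which is what makes elements targeting the same gene directly comparable.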
Functional Validation of Variants
All previously discussed methodologies serve only to predict the effects of a given variant, either in silico or in exogenous experimental contexts. The ultimate validation of these predictions requires measuring the molecular effects of variants in their endogenous genomic and cellular contexts (Figure 5A). While this view of validation is rather strict, it increases the translatability of functional validation to patient prognosis and to therapeutics design. For a noncoding variant, molecular phenotypes can be assessed using a combination of primary tissue profiling and controlled genome editing experiments. Ultimately, functional validation based on molecular phenotypes does not on its own prove that a variant causes disease. This level of proof requires linking the molecular phenotypes to disease phenotypes, but this linkage is beyond the scope of this review.
Figure 5: Targeted validation of noncoding variant effects on molecular phenotypes.

(A) Illustration of a mechanism by which a prioritized noncoding variant putatively functions in the endogenous locus. In this toy example, the alternate allele C disrupts the GATA binding motif, reducing the binding affinity of the GATA transcription factor at a putative enhancer and, in turn, reducing expression of this enhancer’s target gene. (B) Leveraging natural genetic variation, high-throughput sequencing assays can identify allelic imbalance of a molecular phenotype. For example, ATAC-seq is used to profile individuals who are homozygous or heterozygous for the reference allele T and alternate allele C. Because a sequencing read that spans the variant carries one allele or the other, individual reads can be assigned to the allele from which they derive. At right, the number of reads containing the reference allele is plotted against the number of reads containing the alternate allele for each individual, represented as a dot. Sequencing data from homozygotes (blue and red) contain reads with only one allele and therefore lie on the axes and are not informative for allelic imbalance. Conversely, sequencing data from heterozygotes (purple) contain reads of both alleles and can be used as direct observation of allelic effects on chromatin accessibility. In the bottom graph, there are roughly equal reads from each allele, and thus heterozygous individuals are distributed along the diagonal, indicating that the variant has no impact on chromatin accessibility. In the top graph, increased reads of the reference T allele skew heterozygotes off the diagonal. This indicates that the alternate C allele reduces chromatin accessibility at this locus, supporting its predicted functionality as depicted in (A). (C) CRISPR/Cas9-mediated genome editing strategies, such as HDR, base editing, and prime editing, are capable of engineering scarless single-base edits into a locus of interest.
Isogenic pairs of iPSC lines differing by only the variant of interest are produced and then differentiated into cell types of interest to interrogate molecular effects. Comparing epigenomic, transcriptomic, and comparable data between the lines functionally validates the effect of this noncoding variant on a molecular phenotype. In this example, the alternate allele C reduces expression of gene Z specifically in cell type A, thereby connecting the variant to a phenotype.
Allelic imbalance analyses assess the immediate molecular effects of variants in their endogenous genomic contexts
High-throughput sequencing methods such as ATAC-seq, DNase-seq, and ChIP-seq, performed on human tissues or cells, enable direct interrogation of the allele-specific molecular effects of noncoding variants. These allele-specific effects manifest as an imbalance in the prevalence of the two alleles in the sequencing data. The presence of allelic imbalance can support the effect of a variant on the given molecular phenotype (for example, chromatin accessibility; Figure 5B). Because this allelic imbalance is observed within primary human cells and in the endogenous genomic context, we consider it to be a form of validation for noncoding variant effects.
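One simple way to quantify such an imbalance at a heterozygous site is a two-sided binomial test of reference versus alternate read counts. The sketch below is a bare-bones illustration in standard-library Python and is not the procedure used by any specific published method; dedicated tools additionally model overdispersion and pool evidence across individuals. The function name and read counts are hypothetical, and the `expected_ref_fraction` parameter is an assumed way to account for reference mapping bias.

```python
from math import exp, lgamma, log

def allelic_imbalance_p(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Two-sided exact binomial p-value for allelic imbalance at one site.

    expected_ref_fraction is the reference-read fraction expected under
    no imbalance; values above 0.5 can absorb reference mapping bias.
    """
    if not 0.0 < expected_ref_fraction < 1.0:
        raise ValueError("expected_ref_fraction must lie strictly between 0 and 1")
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads overlap the variant, so there is no evidence
    p = expected_ref_fraction

    def pmf(k):
        # Binomial probability computed in log space for numerical stability.
        log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log(p) + (n - k) * log(1.0 - p))
        return exp(log_prob)

    probabilities = [pmf(k) for k in range(n + 1)]
    observed = probabilities[ref_reads]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    tail = sum(v for v in probabilities if v <= observed * (1.0 + 1e-7))
    return min(1.0, tail)

# Hypothetical read counts from two heterozygous individuals.
p_balanced = allelic_imbalance_p(ref_reads=10, alt_reads=10)
p_skewed = allelic_imbalance_p(ref_reads=30, alt_reads=10)
```

The balanced site yields a p-value of essentially 1, whereas the site with three-fold more reference reads yields a small p-value, consistent with the alternate allele reducing the measured signal.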
Allelic imbalance analyses are performed in heterozygous individuals, allowing a comparison of the two alleles in an internally controlled fashion. This can mitigate some of the effects of technical noise and other confounding factors such as population stratification91,92. Allele-specific analyses consider the individual contributions of each allele, unlike traditional molQTL analyses, which regress total normalized read counts mapping to a locus against allelic dosage37. The increased power gained by integrating allele-specific information reduces the sample sizes required compared to standard molQTL approaches91,93–95. Many methods37,91,93–95 have been developed to integrate between-individual molQTLs and within-individual allele-specific signals to identify cis-QTLs and fine-map causative variants while correcting for technical factors such as reference mapping bias. This type of allele-specific QTL mapping framework has been used to prioritize risk variants in diseases as diverse as cancer92, autoimmunity96, and neurodegeneration26.
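For contrast with the within-individual comparison, the between-individual molQTL regression mentioned above can be sketched as an ordinary least-squares fit of a normalized molecular phenotype against allelic dosage (0, 1, or 2 copies of the alternate allele). This is a minimal illustration without covariates, significance testing, or multiple-testing correction; the function name and data are hypothetical.

```python
def molqtl_slope(dosages, phenotype):
    """Least-squares slope and intercept of phenotype on allelic dosage."""
    n = len(dosages)
    if n != len(phenotype) or n < 2:
        raise ValueError("need matched dosage and phenotype values for at least 2 individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all individuals share one genotype; the effect is not estimable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical cohort: alternate-allele dosage and normalized accessibility at one peak.
dosages = [0, 0, 1, 1, 2, 2]
accessibility = [10.0, 11.0, 8.0, 8.5, 5.5, 6.0]

slope, intercept = molqtl_slope(dosages, accessibility)
```

The negative slope indicates lower accessibility with each additional copy of the alternate allele. Unlike the allele-specific comparison, this estimate relies entirely on differences between individuals, which is one reason it generally requires larger cohorts.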
While allelic imbalance analysis can be highly supportive of functional effects in endogenous genomic loci of primary tissues and is thus included in the functional validation section, it can still be subject to confounding. For example, variants residing within the same enhancer as a causative variant are likely to show similar levels of allelic imbalance; likewise, variants in promoters regulated by an imbalanced enhancer may also exhibit allelic imbalance due to imbalanced activity from the two enhancer alleles.
Scarless genome editing for isogenic comparisons
Scarless genome editing (see Glossary) overcomes some of the limitations of allelic imbalance analyses by probing the effects of a specific variant in its endogenous genomic locus. Scarless editing has been used to generate precise genotypes in isogenic cell lines, enabling analyses of allele-specific effects within controlled genetic backgrounds. The edited lines are then compared for alterations in molecular and, ultimately, disease-relevant cellular phenotypes. These isolated comparisons circumvent confounding by linkage disequilibrium and test the sufficiency of the variant to mediate the predicted molecular effect (Figure 2). While such isogenic comparisons can be applied in many contexts, induced pluripotent stem cells (iPSCs) offer an ideal system in which to implement these genome editing techniques, given their potential to be differentiated into many disease-relevant cell types and their availability via disease-specific iPSC banks (Figure 5C; see also Box 2).
Scarless genome editing for functional validation has traditionally relied on homology-directed repair (HDR). However, while reliable across many cellular contexts, HDR-based editing can generate indels or off-target effects at high frequency, and is limited to mitotic cells97. To mitigate some of these limitations, variations on the core concept have been developed98. These strategies have resulted in the functional validation of numerous noncoding variants spanning many polygenic diseases99, such as in Parkinson’s Disease100. Nevertheless, the inherent risks and inefficiencies of introducing and specifically repairing double-stranded breaks essential to HDR methods provided the impetus for the creation of base101 and prime102 editing strategies that do not rely on such breaks. These strategies have yet to be widely applied to noncoding variants and exhibit highly variable efficiency103. However, given their success in modeling coding variants, such as the causal variant of Sickle Cell disease104, we propose their use as a viable alternative to the strategies outlined above. Adoption of high-throughput and single-cell derivatives of these methods, such as prime editing screens73, will greatly accelerate the validation of noncoding variants moving forward.
The ever-evolving gene editing toolkit provides the ability to derive isogenic pairs in increasingly complex model systems. These isogenic pairs can be used to demonstrate variant effects on molecular phenotypes including gene regulation and gene expression. While individual GWAS variants often have a small effect on disease predisposition (and potentially cellular functions), their effect at the level of gene expression is not necessarily as small. Distilling these effects via single-base perturbations in defined contexts is essential for driving GWAS discoveries forward.
Concluding Remarks
Technological and methodological advances in the last decade have made possible the journey from GWAS to functional validation of noncoding variants. Here, we have proposed a workflow for prioritization and validation of noncoding variants, with the goal of outlining the limitations and advantages of each component in the workflow. Nevertheless, key questions and limitations remain in our understanding of the genetic basis of disease predisposition that warrant future exploration (See Outstanding Questions). In particular, genomic diversity must be deliberately increased in GWAS cohorts, making tools such as polygenic risk scores applicable across individuals and agnostic of ancestry105. In many of the methods described here, poor choice of a reference genome (e.g. assuming a largely European reference is universally applicable) can limit the relevance of the results across ancestries. Additionally, while we have focused on common noncoding variants, more concerted efforts must be made to uncover the roles of rare variants and structural variants in disease. These underappreciated drivers of polygenic disease are often excluded because of analytical or sampling constraints106,107.
Looking forward, even the most robust variant prioritization paradigms will be met with one key question: how do we know when we are done? As recent work has demonstrated experimentally that a single locus can harbor multiple putatively functional variants29, it follows that a single locus could also have multiple effects across multiple cell types or developmental contexts. Therefore, deciding when our understanding of a given locus is complete is a challenge. Nevertheless, as our understanding of the genetic drivers of complex diseases increases, so too will our ability to investigate mechanistic questions about the molecular and cellular drivers of disease, and ultimately inform efficacious genetically informed therapies. In the coming era of widely accessible clinical WGS, these efforts to identify causal variants and understand their molecular and cellular mechanisms will fundamentally change how polygenic diseases are tracked, diagnosed, and treated.
Acknowledgements
We thank all of the members of the Corces Lab and the Gladstone Institutes editorial team for thoughtful critique of this review. IMC was supported by a National Science Foundation Graduate Research Fellowship. ZAG and MRC were supported by U01AG072573, UM1HG012076, and P01AG073082 from the National Institutes of Health. MRC is additionally supported by the Farmer Family Foundation Parkinson’s Research Initiative.
Glossary
- Fine-mapping
the process of refining an association between a particular genomic locus and a given trait to a smaller genomic region, prioritized sets of variants, and ultimately individual causative variants.
- Massively Parallel Reporter Assay (MPRA)
a high-throughput experimental technique that assesses the gene regulatory activity of candidate sequences in a pooled library, outside of their endogenous genomic loci.
- Quantitative Trait Locus (QTL)
a genomic region exhibiting variation within a population that is associated with the occurrence of a trait.
- Cis-regulatory element (CRE)
a sequence of DNA, typically noncoding, which regulates the transcription of a nearby gene, most often via recruitment of transcription factors with sequence specificity to bind this sequence of DNA. Examples include enhancers, repressors, and insulators.
- Colocalization
an analytical approach which aims to refine disease associations by identifying genomic loci that simultaneously affect multiple distinct phenotypes, i.e. a molecular trait and disease predisposition.
- Co-accessibility
an analytical approach based on ATAC-seq data used to predict gene-regulatory interactions by correlating accessibility of a promoter and a CRE.
- Activity-By-Contact (ABC) model
a mathematical model used to predict the genes regulated by a given CRE, combining both experimental measurements to represent the strength of an enhancer such as H3K27ac ChIP-seq (activity) and 3D interaction data such as Hi-C to represent how often a CRE contacts a gene target (contact).
- Scarless genome editing
a class of methods that produce isogenic pairs of cell lines or model organisms that differ by only a variant of interest and are otherwise identical genome-wide.
Footnotes
Declaration of Interests
The authors declare no competing interests.
References Cited
- 1. Manolio TA et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
- 2. Boyle EA, Li YI & Pritchard JK An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell 169, 1177–1186 (2017).
- 3. Uffelmann E et al. Genome-wide association studies. Nat. Rev. Methods Primer 1, 1–21 (2021).
- 4. Sollis E et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
- 5. Schaid DJ, Chen W & Larson NB From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet 19, 491–504 (2018).
- 6. Broekema RV, Bakker OB & Jonkers IH A practical view of fine-mapping and gene prioritization in the post-genome-wide association era. Open Biol. 10, 190221 (2020).
- 7. Cano-Gamez E & Trynka G From GWAS to Function: Using Functional Genomics to Identify the Mechanisms Underlying Complex Diseases. Front. Genet 11, (2020).
- 8. Novikova G, Andrews SJ, Renton AE & Marcora E Beyond association: successes and challenges in linking non-coding genetic variation to functional consequences that modulate Alzheimer’s disease risk. Mol. Neurodegener 16, 27–27 (2021).
- 9. Alsheikh AJ et al. The landscape of GWAS validation; systematic review identifying 309 validated non-coding variants across 130 human diseases. BMC Med. Genomics 15, 74 (2022).
- 10. Claussnitzer M et al. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N. Engl. J. Med 373, 895–907 (2015).
- 11. Slatkin M Linkage disequilibrium — understanding the evolutionary past and mapping the medical future. Nat. Rev. Genet 9, 477–485 (2008).
- 12. Das S, Abecasis GR & Browning BL Genotype Imputation from Large Reference Panels. Annu. Rev. Genomics Hum. Genet 19, 73–96 (2018).
- 13. Vergara C et al. Genotype Imputation Performance of Three Reference Panels Using African Ancestry Individuals. Hum. Genet 137, 281–292 (2018).
- 14. Lappalainen T, Li YI, Ramachandran S & Gusev A Genetic and molecular architecture of complex traits. Cell 187, 1059–1075 (2024).
- 15. McAfee JC et al. Focus on your locus with a massively parallel reporter assay. J. Neurodev. Disord 14, 50 (2022).
- 16. Maurano MT et al. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science 337, 1190–1195 (2012).
- 17. Nicolae DL et al. Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS. PLOS Genet. 6, e1000888 (2010).
- 18. Nasser J et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
- 19. Kuksa Pavel P et al. Scalable approaches for functional analyses of whole-genome sequencing non-coding variants. Hum. Mol. Genet (2022) doi: 10.1093/hmg/ddac191.
- 20. Gaulton KJ, Preissl S & Ren B Interpreting non-coding disease-associated human variants using single-cell epigenomics. Nat. Rev. Genet 24, 516–534 (2023).
- 21. Lettice LA et al. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet 12, 1725–1735 (2003).
- 22. Liu X, Li YI & Pritchard JK Trans Effects on Gene Expression Can Drive Omnigenic Inheritance. Cell 177, 1022–1034.e6 (2019).
- 23. Võsa U et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet 53, 1300–1310 (2021).
- 24. Andersson R et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
- 25. Corces MR et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet 48, 1193–1203 (2016).
- 26. Corces MR et al. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer’s and Parkinson’s diseases. Nat. Genet 52, 1158–1168 (2020).
- 27. Nott A et al. Brain cell type–specific enhancer–promoter interactome maps and disease-risk association. Science 366, 1134–1139 (2019).
- 28. Song M et al. Mapping cis-regulatory chromatin contacts in neural cells links neuropsychiatric disorder risk variants to target genes. Nat. Genet 51, 1252–1262 (2019).
- 29.Abell NS et al. Multiple causal variants underlie genetic associations in humans. 9 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Hormozdiari F et al. Widespread Allelic Heterogeneity in Complex Traits. Am. J. Hum. Genet 100, 789–802 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Tibshirani R Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B Methodol 58, 267–288 (1996). [Google Scholar]
- 32.Cho S, Kim H, Oh S, Kim K & Park T Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis. BMC Proc. 3, S25 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Chen W et al. Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics. Genetics 200, 719–736 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Hormozdiari F, Kostem E, Kang EY, Pasaniuc B & Eskin E Identifying Causal Variants at Loci with Multiple Signals of Association. Genetics 198, 497–508 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Kichaev G et al. Integrating Functional Data to Prioritize Causal Variants in Statistical Fine-Mapping Studies. PLoS Genet. 10, e1004722 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Benner C et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Wang G, Sarkar A, Carbonetto P & Stephens M A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B Stat. Methodol 82, 1273–1300 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Ruan Y et al. Improving polygenic prediction in ancestrally diverse populations. Nat. Genet 54, 573–580 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Hormozdiari F et al. Colocalization of GWAS and eQTL Signals Detects Target Genes. Am. J. Hum. Genet 99, 1245–1260 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Kichaev G et al. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics 33, 248–255 (2017).
- 41.Weissbrod O et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet 52, 1355–1363 (2020).
- 42.Yang Z et al. CARMA is a new Bayesian model for fine-mapping in genome-wide association meta-analyses. Nat. Genet 55, 1057–1065 (2023).
- 43.The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
- 44.Giambartolomei C et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, 1–15 (2014).
- 45.Wen X, Pique-Regi R & Luca F Integrating molecular QTL data into genome-wide genetic association analysis: Probabilistic assessment of enrichment and colocalization. PLoS Genet. 13, e1006646 (2017).
- 46.Wallace C A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet. 17, e1009440 (2021).
- 47.Giambartolomei C et al. A Bayesian framework for multiple trait colocalization from summary association statistics. Bioinformatics 34, 2538–2545 (2018).
- 48.Foley CN et al. A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits. Nat. Commun 12, 764 (2021).
- 49.Nathan A et al. Single-cell eQTL models reveal dynamic T cell state dependence of disease loci. Nature 606, 120–128 (2022).
- 50.Rauluseviciute I et al. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 52, D174–D182 (2024).
- 51.Avsec Ž et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet 53, 354–366 (2021).
- 52.Chen KM, Wong AK, Troyanskaya OG & Zhou J A sequence-based global map of regulatory activity for deciphering human genetics. Nat. Genet 54, 940–949 (2022).
- 53.Zhou J et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet 50, 1171–1179 (2018).
- 54.Avsec Ž et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
- 55.Angermueller C, Lee HJ, Reik W & Stegle O DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 18, 67 (2017).
- 56.Fudenberg G, Kelley DR & Pollard KS Predicting 3D genome folding from DNA sequence with Akita. Nat. Methods 17, 1111–1117 (2020).
- 57.Schwessinger R et al. DeepC: predicting 3D genome folding using megabase-scale transfer learning. Nat. Methods 17, 1118–1124 (2020).
- 58.Zhou J Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale. Nat. Genet 54, 725–734 (2022).
- 59.Lan AY & Corces MR Deep learning approaches for noncoding variant prioritization in neurodegenerative diseases. Front. Aging Neurosci 14, (2022).
- 60.Inoue F & Ahituv N Decoding enhancers using massively parallel reporter assays. Genomics 106, 159–164 (2015).
- 61.Inoue F et al. A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity. Genome Res. 27, 38–52 (2017).
- 62.Melnikov A et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol 30, 271–277 (2012).
- 63.Tewhey R et al. Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell 165, 1519–1529 (2016).
- 64.Guo MG et al. Integrative analyses highlight functional regulatory variants associated with neuropsychiatric diseases. Nat. Genet 1–16 (2023) doi: 10.1038/s41588-023-01533-5.
- 65.McAfee JC et al. Systematic investigation of allelic regulatory activity of schizophrenia-associated common variants. Cell Genomics 3, 100404 (2023).
- 66.Agarwal V et al. Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types. 2023.03.05.531189 Preprint at 10.1101/2023.03.05.531189 (2023).
- 67.Griesemer D et al. Genome-wide functional screen of 3’UTR variants uncovers causal variants for human disease and evolution. Cell 184, 5247–5260.e19 (2021).
- 68.Sample PJ et al. Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nat. Biotechnol 37, 803–809 (2019).
- 69.Cheung R et al. A Multiplexed Assay for Exon Recognition Reveals that an Unappreciated Fraction of Rare Genetic Variants Cause Large-Effect Splicing Disruptions. Mol. Cell 73, 183–194.e8 (2019).
- 70.Zhao S et al. A single-cell massively parallel reporter assay detects cell-type-specific gene regulation. Nat. Genet 55, 346–354 (2023).
- 71.Zhao J et al. MPRAbase: A Massively Parallel Reporter Assay Database. BioRxiv Prepr. Serv. Biol 2023.11.19.567742 (2023) doi: 10.1101/2023.11.19.567742.
- 72.Wang JY & Doudna JA CRISPR technology: A decade of genome editing is only the beginning. Science 379, eadd8643 (2023).
- 73.Ren X et al. High-throughput PRIME-editing screens identify functional DNA variants in the human genome. Mol. Cell 83, 4633–4645.e9 (2023).
- 74.Diao Y et al. A tiling-deletion-based genetic screen for cis-regulatory element identification in mammalian cells. Nat. Methods 14, 629–635 (2017).
- 75.Sanjana NE et al. High-resolution interrogation of functional elements in the noncoding genome. Science 353, 1545–1549 (2016).
- 76.Thakore PI et al. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat. Methods 12, 1143–1149 (2015).
- 77.Li K et al. Interrogation of enhancer function by enhancer-targeting CRISPR epigenetic editing. Nat. Commun 11, 485 (2020).
- 78.Matharu N et al. CRISPR-mediated activation of a promoter or enhancer rescues obesity caused by haploinsufficiency. Science 363, eaau0629 (2019).
- 79.Xie S, Duan J, Li B, Zhou P & Hon GC Multiplexed Engineering and Analysis of Combinatorial Enhancer Activity in Single Cells. Mol. Cell 66, 285–299.e5 (2017).
- 80.Gasperini M et al. A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens. Cell 176, 377–390.e19 (2019).
- 81.Lieberman-Aiden E et al. Comprehensive mapping of long range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
- 82.Hsieh T-HS et al. Mapping Nucleosome Resolution Chromosome Folding in Yeast by Micro-C. Cell 162, 108–119 (2015).
- 83.Ramani V et al. Sci-Hi-C: A single-cell Hi-C method for mapping 3D genome organization in large number of single cells. Methods 170, 61–68 (2020).
- 84.Arrastia MV et al. Single-cell measurement of higher-order 3D genome organization with scSPRITE. Nat. Biotechnol 40, 64–73 (2022).
- 85.Hariprakash JM & Ferrari F Computational Biology Solutions to Identify Enhancers-target Gene Pairs. Comput. Struct. Biotechnol. J 17, 821–831 (2019).
- 86.Pliner HA et al. Cicero Predicts cis-Regulatory DNA Interactions from Single-Cell Chromatin Accessibility Data. Mol. Cell 71, 858–871.e8 (2018).
- 87.Corces MR et al. The chromatin accessibility landscape of primary human cancers. Science 362, eaav1898 (2018).
- 88.Fulco CP et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet 51, 1664–1669 (2019).
- 89.Gazal S et al. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nat. Genet 54, 827–836 (2022).
- 90.Weeks EM et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Nat. Genet 55, 1267–1276 (2023).
- 91.Harvey CT et al. QuASAR: quantitative allele-specific analysis of reads. Bioinformatics 31, 1235–1242 (2015).
- 92.Grishin D & Gusev A Allelic imbalance of chromatin accessibility in cancer identifies candidate causal risk variants and their mechanisms. Nat. Genet 54, 837–849 (2022).
- 93.Kumasaka N, Knights AJ & Gaffney DJ Fine-mapping cellular QTLs with RASQUAL and ATAC-seq. Nat. Genet 48, 206–213 (2016).
- 94.van de Geijn B, McVicker G, Gilad Y & Pritchard JK WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063 (2015).
- 95.Liang Y, Aguet F, Barbeira AN, Ardlie K & Im HK A scalable unified framework of total and allele-specific counts for cis-QTL, fine-mapping, and prediction. Nat. Commun 12, 1424 (2021).
- 96.Qu K et al. Individuality and Variation of Personal Regulomes in Primary Human T Cells. Cell Syst. 1, 51–61 (2015).
- 97.Paquet D et al. Efficient introduction of specific homozygous and heterozygous mutations using CRISPR/Cas9. Nature 533, 125–129 (2016).
- 98.Skarnes WC, Pellegrino E & McDonough JA Improving homology-directed repair efficiency in human stem cells. Methods 164–165, 18–28 (2019).
- 99.Rao S, Yao Y & Bauer DE Editing GWAS: experimental approaches to dissect and exploit disease-associated genetic variation. Genome Med. 13, 41 (2021).
- 100.Soldner F et al. Parkinson-associated risk variant in distal enhancer of α-synuclein modulates target gene expression. Nature 533, 95–99 (2016).
- 101.Porto EM, Komor AC, Slaymaker IM & Yeo GW Base editing: advances and therapeutic opportunities. Nat. Rev. Drug Discov 19, 839–859 (2020).
- 102.Chen PJ & Liu DR Prime editing for precise and highly versatile genome manipulation. Nat. Rev. Genet (2022) doi: 10.1038/s41576-022-00541-1.
- 103.Li X et al. Chromatin Context-Dependent Regulation and Epigenetic Manipulation of Prime Editing. http://biorxiv.org/lookup/doi/10.1101/2023.04.12.536587 (2023) doi: 10.1101/2023.04.12.536587.
- 104.Anzalone AV et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
- 105.Duncan L et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun 10, 3328 (2019).
- 106.Chiang C et al. The impact of structural variation on human gene expression. Nat. Genet 49, 692–699 (2017).
- 107.Li X et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).
- 108.Shen SQ et al. Massively parallel cis-regulatory analysis in the mammalian central nervous system. Genome Res. 26, 238–255 (2016).
- 109.Lambert JT et al. Parallel functional testing identifies enhancers active in early postnatal mouse brain. eLife 10, e69479 (2021).
- 110.Li C & Samulski RJ Engineering adeno-associated virus vectors for gene therapy. Nat. Rev. Genet 21, 255–272 (2020).
- 111.Klein JC et al. A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nat. Methods 17, 1083–1091 (2020).
- 112.Gordon MG et al. lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements. Nat. Protoc 15, 2387–2412 (2020).
- 113.Maricque BB, Chaudhari HG & Cohen BA A massively parallel reporter assay dissects the influence of chromatin structure on cis-regulatory activity. Nat. Biotechnol (2018) doi: 10.1038/nbt.4285.
- 114.Ellis DJ Silencing and Variegation of Gammaretrovirus and Lentivirus Vectors. (2005).
- 115.Visel A, Minovitsky S, Dubchak I & Pennacchio LA VISTA Enhancer Browser—a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92 (2007).
- 116.Lu F, Sossin A, Abell N, Montgomery SB & He Z Deep learning-assisted genome-wide characterization of massively parallel reporter assays. Nucleic Acids Res. 50, 11442–11454 (2022).
- 117.Kaur G & Dufour JM Cell lines: Valuable tools or useless artifacts. Spermatogenesis 2, 1–5 (2012).
- 118.Salvadores M, Fuster-Tormo F & Supek F Matching cell lines with cancer type and subtype of origin via mutational, epigenomic, and transcriptomic patterns. Sci. Adv 6, eaba1862 (2020).
- 119.Yu K et al. Comprehensive transcriptomic analysis of cell lines as models of primary tumors across 22 tumor types. Nat. Commun 10, 3574 (2019).
- 120.Rodger EJ et al. Comparison of Global DNA Methylation Patterns in Human Melanoma Tissues and Their Derivative Cell Lines. Cancers 13, 2123 (2021).
- 121.Volpato V & Webber C Addressing variability in iPSC-derived models of human disease: guidelines to promote reproducibility. Dis. Model. Mech 13, dmm042317 (2020).
- 122.Li E et al. CRISPRi-based screens in iAssembloids to elucidate neuron-glia interactions. 2023.04.26.538498 Preprint at 10.1101/2023.04.26.538498 (2023).
- 123.Corsini NS & Knoblich JA Human organoids: New strategies and methods for analyzing human development and disease. Cell 185, 2756–2769 (2022).
- 124.Low LA, Mummery C, Berridge BR, Austin CP & Tagle DA Organs-on-chips: into the next decade. Nat. Rev. Drug Discov 20, 345–361 (2021).
- 125.Birey F et al. Assembly of functionally integrated human forebrain spheroids. Nature 545, 54–59 (2017).
- 126.Revah O et al. Maturation and circuit integration of transplanted human cortical organoids. Nature 610, 319–326 (2022).
- 127.Di Lullo E & Kriegstein AR The use of brain organoids to investigate neural development and disease. Nat. Rev. Neurosci 18, 573–584 (2017).
- 128.Neil JE, Brown MB & Williams AC Human skin explant model for the investigation of topical therapeutics. Sci. Rep 10, 21192 (2020).
- 129.Hofer M & Lutolf MP Engineering organoids. Nat. Rev. Mater 6, 402–420 (2021).
- 130.Taher L et al. Genome-wide identification of conserved regulatory function in diverged sequences. Genome Res. 21, 1139–1149 (2011).
