Summary
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding transcription factor binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Keywords: Cis-regulation, Disease-associated variants, Transcription factors, Machine learning, Parallel assays, Functional analysis, CRISPR-Cas9
Graphical Abstract
This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and construction methods ranging from expression-based approaches to supervised machine learning. We explore experimental methods, such as MPRAs and CRISPR-Cas9-based screening, and analyze the potential of machine learning and artificial intelligence to unravel cis-regulatory logic.

Introduction
Gaining a comprehensive understanding of the gene regulatory code (Fig. 1) has been a challenge. While over 90% of disease-associated variants are in non-coding regions,1 incomplete knowledge of cis-regulatory grammar hinders our ability to predict the consequences of a mutation a priori.1–3 Several factors contribute to this complexity, including binding site degeneracy, DNA structural constraints, complex transcription factor (TF) interactions, flanking sequences around binding sites, and variations across cell types, developmental stages, and time points.
Figure 1.

Gene regulatory elements. A. Gene regulatory elements include enhancers and promoters. They play crucial roles in modulating the transcriptional activity of genes. B. The structural components of the eukaryotic gene include the upstream regulatory sequence, which includes the promoter region responsible for initiating transcription, the open reading frame (ORF) where the protein-coding sequence resides, and the downstream regulatory sequence involved in post-transcriptional regulation. (Created with BioRender.com).
To better understand cis-regulatory grammar, network models are utilized to represent gene regulation mediated by TFs (Fig. 1). These models consist of nodes representing genes and directed edges connecting TFs to their target genes, representing regulatory interactions.4 Numerous approaches exist to construct gene regulatory networks. Expression-based methods employ gene expression matrices derived from transcriptome sequencing and computational methods such as correlation metrics, probabilistic methods, and regression algorithms.5
Alternatively, machine learning approaches frame network construction as a binary classification problem and use methods such as support vector machines and decision trees.6 Although expression-based approaches are more straightforward and intuitive, they cannot model TF binding specificity because they omit sequence information. To address this limitation, alternative methods scan genomic sequences near transcription start sites (TSSs) to identify TF binding sites (TFBSs). Motif analysis and chromatin data can help to model sequence specificity for transcriptional regulatory networks.7
To systematically study genome-wide TF binding, researchers have employed various experimental methods, such as ChIP-seq, CUT&Tag, CUT&RUN,8 and DNA footprinting with DNase I.9,10 In vitro techniques, like systematic evolution of ligands by exponential enrichment (SELEX) sequencing, allow researchers to derive binding motifs by examining TFBSs.11,12 CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) technologies have facilitated the identification of the roles and grammar of cis-regulatory elements by enabling systematic knockout, perturbation, and activation of regulatory sequences.13–16 Large-scale, multi-center initiatives, such as the ENCODE consortium, have produced comprehensive genome-wide information maps for TFs across diverse cell types.17
These experimental approaches have led to the development of algorithms for inferring TF binding preferences at sequence motifs and genomic loci. Databases like JASPAR and CIS-BP store TF-motif preferences, while ReMap and UniBind compile binding sites for individual TFs across cell types.18–21 High-throughput assays, such as massively parallel reporter assays (MPRAs), have the ability to examine a multitude of cis-regulatory elements in parallel.2,22,23
Machine learning techniques and artificial intelligence have been instrumental in deciphering regulatory relationships. The use of hierarchical layers within convolutional neural network (CNN) architectures captures increasingly complex features from the input data and facilitates a more comprehensive understanding of cis-regulatory sequences. Subsequently, machine learning-based tools have been used to identify TFBSs, predict promoters, enhancers, and their interactions, and to infer the cis-regulatory grammar in DNA sequences.
Despite the potential of these techniques, their intricate structures and numerous parameters can impede interpretability. Ongoing efforts to build comprehensive catalogs of functional regulatory elements promise to facilitate the development of new machine learning-based tools, improve understanding of the gene regulatory code, and elucidate how specific variants can lead to unique phenotypes.24 This review will delve into current experimental and algorithmic advancements, their challenges, and their potential to provide insights into cis-regulatory elements and their roles in gene regulation.
Transcription factor networks
Transcriptional regulation is often represented through network models, which offer insights into the regulatory relationships among genes.25–29 These networks help identify downstream targets of TFs, support comparative analyses across developmental stages and disease conditions, and locate master regulator TFs, which drive cell differentiation by controlling numerous downstream genes. Despite the significance of transcription factor networks (TFNs), a standardized terminology is lacking in this field, with terms like gene co-expression networks (GCNs), gene regulatory networks (GRNs), and transcriptional regulatory networks (TRNs) used interchangeably.30 In this review, GCNs represent gene expression correlations with undirected edges, while GRNs use directed edges for regulatory interactions (Fig. 2). TRNs are specialized GRNs with edges originating exclusively from TFs, representing transcriptional regulation by TF gene-encoded proteins.30 This terminological distinction aims to reduce ambiguity when categorizing, comparing, and benchmarking network methodologies, although it’s worth noting that most GRN construction methods can infer TRNs by restricting regulator genes to TF genes.
Figure 2.

Comparison of GCNs, GRNs, and TRNs. A. GCNs represent gene expression correlations with undirected edges. B. GRNs use directed edges to represent regulatory interactions. C. In TRNs, the directed edges originate exclusively from transcription factors. (Created with BioRender.com).
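The relationship between GRNs and TRNs defined above can be illustrated with a small sketch. It assumes a GRN stored as a list of directed (regulator, target) edges; the gene names and the helper `restrict_to_trn` are hypothetical and serve only to show how restricting regulators to TF genes yields a TRN.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """A directed regulatory edge (hypothetical representation)."""
    source: str  # regulator gene
    target: str  # regulated gene


def restrict_to_trn(grn_edges, tf_genes):
    """Keep only the directed edges whose source gene encodes a TF."""
    return [edge for edge in grn_edges if edge.source in tf_genes]


# Toy GRN: gene names are illustrative only.
grn = [
    Edge("TF_A", "GENE_1"),
    Edge("TF_A", "TF_B"),
    Edge("GENE_1", "GENE_2"),  # non-TF regulator, dropped in the TRN
    Edge("TF_B", "GENE_2"),
]
trn = restrict_to_trn(grn, tf_genes={"TF_A", "TF_B"})
```

Here the edge from `GENE_1` to `GENE_2` is removed because its source is not a TF gene, while TF-to-TF edges are retained.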
There are more than one thousand TF genes within the human genome,31 and each of these genes can regulate the expression of hundreds to thousands of downstream genes through transcriptional control, resulting in an extensive network with millions of potential edges. Computational methods are essential for constructing and analyzing complex TRNs, and a multitude of approaches rely on supervised and unsupervised learning from next-generation sequencing data, using both expression and sequence data (Table 1).
Table 1.
Statistical and machine learning methods for GRN construction
| Method | Data type | Primary learning type |
|---|---|---|
| Correlation5,32–34 | Expression | Unsupervised |
| Mutual Information33,35–39 | Expression | Unsupervised |
| Probabilistic40–42 | Expression | Unsupervised |
| Nonnegative Matrix Factorization | Expression, Sequence | Unsupervised |
| Regression43–46 | Expression | Supervised |
| Logistic Regression6 | Expression | Supervised |
| K-Nearest Neighbor6 | Expression | Supervised |
| Naive Bayes6 | Expression | Supervised |
| Support Vector Machine47,48,49,50 | Expression, Sequence | Supervised |
| Decision Tree6 | Expression, Sequence | Supervised |
| Gradient Boosting Machines51 | Expression, Sequence | Supervised |
Evolving single-cell technologies have had a considerable impact on building TRNs. A large number of methods have been proposed to build TRNs from single-cell genomic data, using both expression (scRNA-Seq) and chromatin accessibility (scATAC-Seq) data, as discussed comprehensively elsewhere.52–59 Single-cell technologies present tremendous opportunities for constructing TRNs. Whereas bulk datasets usually contain tens of samples as data points, single-cell datasets contain thousands of cells as individual data points. This large increase in the number of observations provides an enormous advantage for training machine learning methods, opening the door to a wide range of supervised learning algorithms, including many different deep learning architectures.
Methodologies developed for bulk data cannot be directly used to infer TRNs from single-cell data, as single-cell data pose unique challenges that do not exist in bulk data.52,53,59 Dropout events lead to data sparsity, which can hinder data processing. This is generally addressed by data imputation, though this approach can sometimes introduce false interactions.60 As an additional challenge, the stochastic nature of gene expression in individual cells increases noise, which obscures the biological signal. This undesired effect diminishes as the size of the dataset grows.57 However, the immense increase in data size also poses challenges in terms of computational resources, as bulk methods that are tested on tens of samples can fail to complete on a single-cell dataset that includes thousands of cells. Typical single-cell methods tackle this obstacle by reducing the number of genes through filtering on gene expression values. However, this approach carries the risk of losing information in the output.57 Hence, single-cell-based TRN methods need to be designed efficiently to handle these issues and effectively utilize this specific type of data.
Expression-based methods for constructing TRNs
Most TRN derivation methods rely on gene expression data and employ statistical techniques like correlation coefficients (e.g. Pearson or Spearman) for pairwise gene expression comparisons.5,32–34 The networks that are based on transcriptional correlation in this context are classified as co-expression networks. One of the early developed and most widely used co-expression network methods is WGCNA,61 which not only builds gene networks but also identifies modules, which are highly correlated clusters of genes. WGCNA’s high dimensional counterpart hdWGCNA62 derives gene networks and modules from single-cell genomics and spatial transcriptomics data, using meta-cells, which are defined as groups of cells that are transcriptionally similar to one another.
One drawback of co-expression networks is that they lack the ability to establish directional relationships, which is a necessary feature for identifying the downstream targets of TFs in TRNs. Additionally, they cannot distinguish between direct and indirect interactions, making them susceptible to false positives. To mitigate this, alternative approaches employ partial correlations to model gene expression vectors as multivariate normal distributions.63–67 The resulting precision matrix, the inverse of the covariance matrix, captures partial correlations among the genes. While an improvement over simple correlations, this technique assumes a normal distribution, which may not always hold true for gene expression data.
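The partial correlation idea described above can be sketched in a few lines. The example assumes a small samples-by-genes matrix and inverts the sample covariance matrix directly; it does not reproduce any cited method, and the function names are hypothetical. Real expression data with more genes than samples would require a regularized estimator.

```python
import math


def covariance_matrix(samples):
    """Sample covariance of a samples-by-genes matrix (list of rows)."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def partial_correlations(samples):
    """Partial correlation of each gene pair given all other genes,
    read off the precision matrix P as -P[i][j] / sqrt(P[i][i] * P[j][j])."""
    prec = invert(covariance_matrix(samples))
    p = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(p)] for i in range(p)]


# Toy data: gene A drives genes B and C, which do not interact directly.
a = [1, 1, 1, 1, -1, -1, -1, -1]
noise_b = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
noise_c = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
samples = [[a[i], a[i] + noise_b[i], a[i] + noise_c[i]] for i in range(8)]
pcor = partial_correlations(samples)
```

In this toy setting, B and C are strongly correlated marginally because both follow A, yet their partial correlation given A vanishes, which is the indirect interaction that simple correlation cannot rule out.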
Another limitation of correlation metrics is that they capture only linear dependencies in gene expression data, which often contain non-linear relationships like feedback loops and sigmoidal responses. In contrast, mutual information (MI) can capture both linear and non-linear dependencies. Rooted in information theory, MI quantifies the interdependence between two variables by measuring, through the Kullback–Leibler divergence, the information gained about one variable by observing the other.33,35–39 An additional advantage of MI over correlation is that it does not assume a specific distribution of data. However, as a symmetric measure, it still cannot infer the directionality of gene interactions. In addition, as MI is designed to work with discrete data, continuous gene expression values cannot be used directly for calculating it.58 Hence, the continuous data must be discretized into bins,68 which can be computationally demanding.69 More importantly, the upper bound of MI values depends on the data distribution; unlike correlation, MI has no predefined range, which can make interpretation of the results difficult. Moreover, MI cannot distinguish direct from indirect dependencies, leading to increased levels of false positives.70
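The contrast between correlation and MI can be made concrete with a short sketch. It assumes equal-width binning with a hypothetical `n_bins` parameter and a plug-in estimator; the cited methods may use different discretization schemes or estimators.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Assign each continuous value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Plug-in MI estimate (in bits) from binned values."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log2((c / n) / ((px[i] / n) * (py[j] / n)))
               for (i, j), c in joint.items())


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# A symmetric non-linear dependency: y = x^2 on a grid centered at zero.
x = [i / 10 for i in range(-20, 21)]
y = [v * v for v in x]
r = pearson(x, y)              # close to zero: no linear trend
mi = mutual_information(x, y)  # clearly positive: y depends on x
```

The sketch also shows the symmetry noted above: swapping `x` and `y` leaves the MI estimate unchanged, so no direction can be read from it.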
On the other hand, regression-based methods naturally infer directionality in gene regulatory networks.43–46 These methods model gene expression using multiple regression models, treating the expression of each gene as the dependent variable, influenced by the expression of other genes as independent variables. To simplify the model, they apply regularization penalties to encourage coefficients to approach zero. An alternative approach involves constructing a regression tree to predict gene expression and determine the weight of regulator genes based on their significance within the tree.71,72 However, this approach can be computationally expensive, leading to longer runtimes as data size increases. To address this challenge, gradient boosting machines have been employed to enhance performance.51 Recently, a balanced approach for deriving TRNs was developed that combines random forest (RF), extra tree, and support vector regressors through an ensemble regression technique.73
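As a minimal sketch of the regularized regression idea above, the following uses an L1 (lasso) penalty solved by coordinate descent. The choice of lasso, the `penalty` value, and all names are assumptions made for illustration; the cited methods may use other penalties and solvers. The sketch assumes the target and regulator profiles are already centered.

```python
def soft_threshold(value, penalty):
    """Shrink a value toward zero by `penalty`, clipping at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_regulators(target, regulators, penalty=0.1, n_iter=200):
    """Coordinate-descent lasso minimizing
    (1 / 2n) * ||target - sum_j coef_j * regulator_j||^2 + penalty * sum_j |coef_j|.

    target: expression of one gene across n samples (centered).
    regulators: dict mapping regulator name -> expression across n samples (centered).
    """
    n = len(target)
    coef = {name: 0.0 for name in regulators}
    residual = list(target)
    for _ in range(n_iter):
        for name, x in regulators.items():
            # Correlation of this regulator with the residual, adding back its own fit.
            rho = sum(xi * (ri + coef[name] * xi) for xi, ri in zip(x, residual)) / n
            norm = sum(xi * xi for xi in x) / n
            new_coef = soft_threshold(rho, penalty) / norm
            delta = new_coef - coef[name]
            if delta:
                residual = [ri - delta * xi for ri, xi in zip(residual, x)]
            coef[name] = new_coef
    return coef


# Toy data: the target depends strongly on TF1 and only weakly on TF3.
tfs = {
    "TF1": [1, 1, 1, 1, -1, -1, -1, -1],
    "TF2": [1, 1, -1, -1, 1, 1, -1, -1],
    "TF3": [1, -1, 1, -1, 1, -1, 1, -1],
}
target = [2 * a + 0.05 * c for a, c in zip(tfs["TF1"], tfs["TF3"])]
weights = lasso_regulators(target, tfs)
```

Nonzero coefficients are read as directed edges from regulator to target; here the penalty sets the weak TF3 effect and the absent TF2 effect to zero while retaining TF1.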
Probabilistic methods for constructing TRNs utilize Bayesian networks for inferring gene interactions.40–42 These methods model the gene’s expression as a conditional probability of its parent genes (regulators) and the expression of all genes as a joint probability distribution of individual distributions. Similar to regression-based methods, Bayesian networks inherently infer directionality. However, a significant limitation is their assumption of the network topology as a directed acyclic graph, which reduces their ability to model loops in the networks. This limitation is critical because TFs can regulate themselves and other TFs in various pairwise or higher-order structures.74,75
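The acyclicity constraint discussed above can be illustrated with a short check. This sketch uses Kahn's topological sorting algorithm on hypothetical toy networks; it shows only why autoregulation and mutual regulation fall outside a directed acyclic graph, not how any cited Bayesian method is implemented.

```python
def is_acyclic(edges):
    """Return True if the directed (regulator, target) edges form a DAG."""
    nodes = {gene for edge in edges for gene in edge}
    indegree = {gene: 0 for gene in nodes}
    children = {gene: [] for gene in nodes}
    for regulator, target in edges:
        children[regulator].append(target)
        indegree[target] += 1
    # Repeatedly remove genes that have no remaining incoming edges.
    queue = [gene for gene in nodes if indegree[gene] == 0]
    visited = 0
    while queue:
        gene = queue.pop()
        visited += 1
        for child in children[gene]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Genes left unvisited sit on (or downstream of) a cycle.
    return visited == len(nodes)


feed_forward = [("TF_A", "TF_B"), ("TF_A", "GENE_1"), ("TF_B", "GENE_1")]
autoregulation = [("TF_A", "TF_A"), ("TF_A", "GENE_1")]
mutual_regulation = [("TF_A", "TF_B"), ("TF_B", "TF_A")]
```

Only the feed-forward motif passes the check; the self-regulating TF and the mutually regulating TF pair both contain cycles and therefore cannot be represented directly in a standard Bayesian network.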
As an alternative to statistical techniques, various supervised machine learning methods have been applied to predict TRNs from transcriptome data.47,48 These methods frame TRN construction as a binary classification task, aiming to identify TF-to-target interactions using the expression profiles of both genes. Support vector machine (SVM)-based methods classify these interactions directly based on the expression of both the TF and target gene47 or by indirectly converting the data into graph distance profiles for input in the kernel function.49 Transfer learning extends the utility of SVM-based methodologies by training models on one organism and transferring knowledge to another.50 Another variation incorporates positive-only data for training SVM classifiers.48 A recent study conducted a comprehensive evaluation of supervised learning methods for TRN inference using single-cell expression data.6 The authors formulated network inference as a binary classification problem and trained SVM, random forest (RF), K-nearest neighbor (KNN), naive Bayesian, decision tree, and logistic regression algorithms to solve the problem. Notably, these models outperformed unsupervised approaches, with SVM, RF and KNN emerging as top performers among the supervised methods.
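The binary classification framing described above can be sketched as follows. The example assumes that each candidate TF-to-target pair is represented by concatenating the two expression profiles and labeled from a set of known interactions; the function name and gene names are hypothetical, and the cited methods may construct features differently.

```python
def build_pair_examples(expression, tf_genes, known_edges):
    """Turn TF-target pairs into labeled feature vectors.

    expression: dict mapping gene -> list of expression values across samples.
    tf_genes: set of genes that encode TFs.
    known_edges: set of (tf, target) pairs treated as positive examples.
    """
    pairs, features, labels = [], [], []
    for tf in sorted(tf_genes):
        for target in sorted(expression):
            if target == tf:
                continue
            pairs.append((tf, target))
            # Feature vector: TF profile followed by target profile.
            features.append(expression[tf] + expression[target])
            labels.append(1 if (tf, target) in known_edges else 0)
    return pairs, features, labels


expression = {
    "TF_A": [1.0, 2.0],
    "GENE_1": [1.1, 2.1],
    "GENE_2": [5.0, 0.0],
}
pairs, features, labels = build_pair_examples(
    expression, tf_genes={"TF_A"}, known_edges={("TF_A", "GENE_1")}
)
```

Any of the classifiers named above (SVM, RF, KNN, and others) could then be trained on `features` and `labels`. Note that pairs labeled 0 here are merely unconfirmed rather than proven negatives, which is the motivation for the positive-only training variant mentioned in the text.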
Deep learning methods for constructing TRNs
Different variations of deep learning architectures have also been employed to infer TRNs76 and mostly use the BEELINE77 evaluation framework for benchmarking (Table 2). Convolutional neural networks, known for their unprecedented accuracy in image classification, have been modified for TRN construction.78 This adaptation involves converting gene expression data into image representations based on normalized empirical probability function matrices and training the classifiers with these images. A hybrid method combining CNNs and recurrent neural networks has been introduced and takes into account both correlations and pseudotimes for deriving TRNs from single-cell RNA-Seq data.79 Another approach uses 3D CNNs to predict regulatory interactions by using expression of gene triplets to reduce the effect of noise and dropout.80
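The conversion of a gene pair into an image-like input, as described above, can be sketched as a normalized two-dimensional histogram. The bin count, the equal-width binning, and the function name are assumptions for illustration; the cited method's exact preprocessing is not reproduced here.

```python
def pair_histogram(tf_expr, target_expr, n_bins=8):
    """Return an n_bins x n_bins matrix of co-occurrence frequencies
    for one (TF, target) pair across cells; entries sum to 1."""
    def bin_index(value, lo, hi):
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)

    lo_x, hi_x = min(tf_expr), max(tf_expr)
    lo_y, hi_y = min(target_expr), max(target_expr)
    image = [[0.0] * n_bins for _ in range(n_bins)]
    for x, y in zip(tf_expr, target_expr):
        image[bin_index(x, lo_x, hi_x)][bin_index(y, lo_y, hi_y)] += 1.0
    total = len(tf_expr)
    return [[count / total for count in row] for row in image]


# Perfectly co-varying toy pair: all mass falls on the diagonal.
image = pair_histogram([0, 1, 2, 3], [0, 1, 2, 3], n_bins=4)
```

Each gene pair yields one such matrix, which can be passed to a CNN in the same way as a single-channel image.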
Table 2.
Deep learning methods for TRN construction
| Method | Data type | Primary learning type |
|---|---|---|
| Self Organizing Maps79 | Expression, Sequence | Unsupervised |
| Latent Dirichlet Allocation80 | Expression, Sequence | Unsupervised |
| Recurrent Neural Networks77 | Expression | Supervised |
| Convolutional Neural Networks76 | Expression, Sequence | Supervised |
| Graph Neural Networks81,82,83,84,85 | Expression | Supervised |
| Graph Attention Neural Networks | Expression | Supervised |
| Structural Equation Modeling86,87,88 | Expression | Supervised |
| Transformer Learning Model89 | Expression | Supervised |
| Denoising Diffusion Probabilistic Models90 | Expression | Supervised |
Structural equation modeling (SEM) techniques, which are used to derive the relationships between observed and latent variables, are also employed in a deep learning context for TRN inference. DeepSEM adapts the SEM for inferring regulatory relationships using single-cell gene expression data.88 DAZZLE follows a similar approach to DeepSEM but introduces a Dropout Augmentation step, which adds random zero values to the expression data during training to increase robustness.89 MetaSEM couples SEM with a meta-learning approach for learning high-dimensional data features.90
Graph neural networks (GNNs) are also utilized for unsupervised network construction. GenKI frames network construction as an unsupervised learning problem by adapting a variational graph autoencoder.83 GRGNN also follows an unsupervised approach using GNNs in which the network construction task is handled as a graph classification problem.84 DeepRIG also utilizes GNNs for TRN construction, but only after first building a prior co-expression network.85 Another GNN-based approach is GRINCD, which first generates a graph representation of each gene and applies the additive noise model to predict causal regulation.86 GeneLINK uses a graph attention network (GAT) model, which is a specific type of GNN, to infer TRNs from incomplete prior networks for link prediction.87
The TRN inference problem has been addressed with other deep learning models as well. One of them is STGRNS, which formulates the network inference problem as a binary classification task and employs the transformer architecture, a deep learning model widely used in language models.91 RegDiffusion frames GRN construction as a supervised regression task and uses Denoising Diffusion Probabilistic Models. In contrast to DAZZLE, RegDiffusion adds Gaussian noise to the data, and the machine learning model is trained to predict the added noise.92
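The augmentation step described above for DAZZLE, adding random zeros to the expression data during training, can be sketched as follows. This is an illustration of the general idea only; the `rate` parameter, the seeding, and the function name are assumptions and do not reflect the published implementation.

```python
import random


def dropout_augment(matrix, rate=0.1, seed=None):
    """Return a copy of a cells-by-genes matrix in which each entry is
    independently set to zero with probability `rate`."""
    rng = random.Random(seed)
    return [[0.0 if rng.random() < rate else value for value in row]
            for row in matrix]


expression = [[1.0, 2.0], [3.0, 4.0]]
augmented = dropout_augment(expression, rate=0.5, seed=0)
```

A fresh augmented copy would typically be drawn at each training step, so the model repeatedly sees simulated dropout at different positions while the original matrix stays unchanged.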
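The noise-prediction objective mentioned above can be sketched with the standard forward step of a denoising diffusion probabilistic model. The formula below is the generic one, and the names `alpha_bar`, `add_gaussian_noise`, and `noise_prediction_loss` are assumptions for illustration; this is not RegDiffusion's implementation.

```python
import math
import random


def add_gaussian_noise(x0, alpha_bar, rng):
    """Generic DDPM forward step:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise.
    alpha_bar near 1 keeps the data almost intact; near 0 yields mostly noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    noisy = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
             for x, e in zip(x0, noise)]
    return noisy, noise


def noise_prediction_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and actually added noise."""
    return sum((p - t) ** 2 for p, t in zip(predicted_noise, true_noise)) / len(true_noise)


rng = random.Random(0)
cell_profile = [1.0, 2.0, 0.0, 3.5]  # toy expression values for one cell
noisy_profile, added_noise = add_gaussian_noise(cell_profile, alpha_bar=0.5, rng=rng)
```

During training, a model would receive `noisy_profile` and be scored with `noise_prediction_loss` against `added_noise`.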
Using epigenomic data for constructing TRNs
Expression-based methods for deriving TRNs are frequently validated using relatively simple organisms such as Escherichia coli or yeast, primarily because they are easy to genetically engineer and can facilitate the elegant construction of ground truth networks. However, when these methods are extended to complex organisms, independent benchmarking exposes a notable decrease in accuracy, which can be ascribed to the impact of epigenetic regulation via DNA and chromatin modifications in complex organisms (Fig. 1A), a factor not accounted for in expression-based methods.93,94 Therefore, the cell’s epigenetic landscape must be integrated into network models.
Prior studies have used histone modification data95 to achieve high accuracy at the bulk level, surpassing methods relying on gene expression. Other approaches have leveraged chromatin accessibility data from single-cell Assay for Transposase-Accessible Chromatin sequencing (scATAC-seq) for TRN construction.81,82,96–98 To identify TFBSs in accessible chromatin regions, epigenomic methods incorporate motif or footprinting analysis and link TFBSs to their target genes with proximity- or correlation-based approaches. Although using two data modalities increases cost, recent experimental approaches offer a potential solution by profiling gene expression and chromatin accessibility together to generate multiomic data.99,100
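The proximity-based linking step mentioned above can be sketched as follows. The example assumes motif hits have already been called inside accessible regions and links each hit to genes whose TSS lies within a fixed window; the 50 kb default, the data layout, and all names are hypothetical, and the cited methods use their own distance rules or correlation-based linking.

```python
def link_tfs_to_targets(motif_hits, tss_positions, max_distance=50_000):
    """Link TFs to candidate target genes by genomic proximity.

    motif_hits: list of (tf, chromosome, position) for motif matches
        found inside accessible chromatin regions.
    tss_positions: dict mapping gene -> (chromosome, TSS coordinate).
    """
    edges = set()
    for tf, chrom, position in motif_hits:
        for gene, (gene_chrom, tss) in tss_positions.items():
            if chrom == gene_chrom and abs(position - tss) <= max_distance:
                edges.add((tf, gene))
    return edges


motif_hits = [
    ("TF_A", "chr1", 1_000),
    ("TF_B", "chr1", 900_000),  # too far from any TSS
    ("TF_A", "chr2", 1_200),    # same TF, but no nearby TSS on chr2
]
tss_positions = {"GENE_1": ("chr1", 5_000), "GENE_2": ("chr2", 500_000)}
edges = link_tfs_to_targets(motif_hits, tss_positions)
```

Correlation-based linking would replace the distance test with a check on how well peak accessibility tracks gene expression across cells.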
For example, the multiomic-based SCENIC+ employs a modified version of cisTopic, which uses latent Dirichlet allocation, to calculate motif enrichment scores along putative enhancer regions and identify cis-regulatory enhancers and TF-to-target relationships.82 scREG also uses multiomic data but distinguishes itself in its utilization of non-negative matrix factorization (NMF) to derive a lower-dimensional representation for calculating cluster-specific regulatory scores.97 IReNA couples motif analysis with decision tree (DT) regressions to construct TRNs from single-cell multiomic data,96 while DIRECT-NET uses motif analysis and gradient boosting machines (GBM) to identify cis-regulatory enhancers and TF-to-target interactions.96,98 Self-organizing maps (SOMs) are also utilized to construct TRNs with multiomic data.81 In this deep learning technique, separate SOMs are generated for single-cell RNA-Seq and ATAC-Seq data and then integrated via a linking function to build the regulatory network based on motif enrichment. Additional single-cell multiomics TRN construction methods include Dictys,101 which performs footprint analysis with single-cell ATAC-Seq data to infer TF-to-target relationships, which are further refined by stochastic process modeling of regulatory relationships based on single-cell RNA-Seq data. Finally, scMEGA102 integrates scRNA-Seq and scATAC-Seq data and associates chromatin accessibility peaks with the genes based on signal correlation; it uses TF binding information to link TFs to their target genes.
Functional examination of cis-regulatory elements
Due to the aforementioned ambiguity regarding gene regulatory elements, it remains challenging to link variants in regulatory sequences with functional outcomes.103 Although assays like DNase-seq and ChIP-seq provide comprehensive genome-wide regulatory maps, they do not offer a functional readout for the identified sequences.2 Recent advancements have allowed for a more systematic and large-scale examination of the functions of cis-regulatory elements, and there are two main groups of methods: 1) Massively Parallel Reporter Assays (MPRAs) and Self-Transcribing Active Regulatory Region Sequencing (STARR-seq), and 2) CRISPR-Cas9 techniques (Fig. 3).
Figure 3.

Approaches to studying cis-regulatory elements. A. Standard reporter assays assess the ability of candidate promoter sequences to drive expression of a reporter gene or the ability of a candidate enhancer and minimal promoter together to drive expression of a reporter gene. B. Massively parallel reporter assays utilize transcribed barcodes following the reporter gene. C. STARR-seq uses the candidate enhancer, which is part of the downstream regulatory sequence, as the transcribed barcode. D. CRISPR/Cas9 screens use sgRNAs to assess changes in function or expression of a target gene. E. CRISPRi allows researchers to selectively inhibit the expression of target genes by using Cas9 to block transcription. (Created with BioRender.com).
MPRAs and STARR-seq in gene regulation research
Standard reporter assays test whether a regulatory element activates a reporter gene in a one-by-one manner, potentially leading to bottlenecks,2,104 whereas MPRAs enable the simultaneous assessment of thousands of cis-regulatory elements.105–107 In MPRAs, candidate regulatory sequences are linked to unique barcodes and incorporated into classic promoter or enhancer reporter vectors, enabling the regulatory element to drive its own transcription and that of the associated barcode. Subsequently, next-generation sequencing measures the barcode’s expression and normalizes it to the genomic element’s DNA abundance to represent cis-regulatory activity.108
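The normalization described above, barcode RNA abundance relative to DNA abundance, can be sketched as a log ratio per element. The pooling of barcodes, the pseudocount, and the function name are assumptions for illustration; published pipelines typically add library-size normalization and replicate handling.

```python
import math


def mpra_activity(element_barcodes, rna_counts, dna_counts, pseudocount=1.0):
    """Log2 ratio of RNA to DNA barcode counts for each candidate element.

    element_barcodes: dict mapping element -> list of its barcodes.
    rna_counts, dna_counts: dicts mapping barcode -> read count.
    """
    activity = {}
    for element, barcodes in element_barcodes.items():
        rna = sum(rna_counts.get(bc, 0) for bc in barcodes)
        dna = sum(dna_counts.get(bc, 0) for bc in barcodes)
        activity[element] = math.log2((rna + pseudocount) / (dna + pseudocount))
    return activity


element_barcodes = {"enhancer_1": ["bc1", "bc2"], "enhancer_2": ["bc3"]}
rna_counts = {"bc1": 30, "bc2": 33}  # bc3 has no RNA reads
dna_counts = {"bc1": 7, "bc2": 8, "bc3": 3}
activity = mpra_activity(element_barcodes, rna_counts, dna_counts)
```

In this toy input, `enhancer_1` has four times more RNA than DNA signal (log2 ratio of 2), whereas `enhancer_2` is detected in the plasmid pool but produces no transcripts (log2 ratio of -2).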
However, traditional MPRAs are conducted outside of the genome in an episomal manner and may not fully represent the in vivo functions of regulatory elements.2 To address this limitation, novel approaches, such as the lentivirus-based MPRA (LentiMPRA), integrate regulatory elements into the chromosomal context of the genome.109 In one study, LentiMPRA was employed to compare the functional activities of 2,236 candidate liver enhancers in episomal versus chromosomally integrated contexts, revealing significant differences. These findings have broad implications for the identification, prioritization, and functional validation of cis-regulatory elements. Additionally, MPRAs using adeno-associated virus (AAV) vectors offer solutions to mitigate this concern.110,111 Klein et al. (2020) performed a systematic comparison of MPRA experimental designs and found that sequence length had the greatest effect on the results of MPRAs, followed by assay design and then orientation.106
Recent research with MPRAs has advanced understanding of disease mechanisms, regulatory elements, and the functional consequences of genetic variation in the noncoding regions of the human genome. Studies have spanned a wide range of health-related topics, including human traits,112 vascular disease,113 inherited retinal degeneration,114 schizophrenia,115 Alzheimer's disease,115 osteoporosis,116 and early human neurodevelopment.117 To understand the functional consequences of more than 30,000 single nucleotide substitutions and deletions within twenty disease-related promoters and enhancers, Kircher et al. (2019) conducted saturation mutagenesis using MPRAs.118 The resulting dataset of functional measurements for potentially disease-causing regulatory mutations has emerged as a comprehensive resource for the development of predictive tools. The findings underscore the potential of MPRAs for identifying new biomarkers and potential therapeutic targets for diseases including cancer.119 Recent studies have also highlighted the role of MPRAs in identifying cis-regulatory elements that are evolutionarily conserved120 or essential for pluripotency.121
Although traditional MPRAs require serial assays across different cell types, single-cell MPRAs (scMPRAs) concurrently measure cis-regulatory sequences at the single-cell level, while also identifying cell identities through transcriptomes.122 This is accomplished using a two-level barcoding scheme that measures reporter gene copy numbers in single cells based on mRNA. By employing complex random barcodes (rBC) and specific CRS barcodes (cBC), this method minimizes repetition of cBC-rBC pairs within the same cell, ensuring precise measurements of cis-regulatory sequence activity across different cell populations with varying input abundances. The potential of scMPRA to assess subtle genetic variations in cis-regulatory sequences across various cell types was demonstrated in live mouse retinas.122
STARR-seq is a type of massively parallel reporter assay that identifies putative transcriptional enhancers by measuring the activity of fragments derived from across the entire genome.123,124 DNA fragments are cloned downstream of a core promoter and into the 3’ untranslated region of a reporter gene. Active enhancers within these fragments drive transcription and become part of the resulting reporter transcripts, enabling the simultaneous testing of millions of DNA sequences in a complex reporter library. Key features of STARR-seq include its independence from the location of candidate sequences within the genome and its avoidance of position effects that are typically associated with random genomic integration. It provides a quantitative measure of enhancer activity and can generate genome-wide cell type-specific maps of enhancer activity. Although STARR-seq does not directly measure enhancer activity within its endogenous chromatin environment, it identifies functional enhancers that overlap accessible chromatin and bear typical histone modifications, suggesting functionality within their endogenous context.23 Additionally, it can detect enhancer activity within inaccessible chromatin regions marked by specific histone modifications, which can provide insights into chromatin-mediated silencing mechanisms in gene regulation.
STARR-seq has allowed for the identification and examination of novel enhancers in human, animal,124 and plant genomes125 and for further characterization of genome-wide enhancer-promoter interactions.126 Whole Human Genome STARR-seq (WHG-STARR-seq) is a powerful method for assessing enhancer activity across the entire human genome and has been utilized to identify active enhancers in open chromatin regions and potentially functional enhancers in inaccessible chromatin regions.127 Additionally, STARR-seq facilitates high-resolution mapping of tissue-specific regulatory elements, including enhancers with highly biased activity towards the dorsal raphe nucleus in the brain.128 STARR-seq has also helped researchers understand how alterations in regulatory elements can contribute to diseases like coronary artery disease129; this has offered insights into potential drug targets within regulatory regions.
CRISPR-Cas9 in gene regulation research
In contrast to MPRAs and STARR-seq, CRISPR-Cas9 allows regulatory elements to be studied within their broader sequence context.13–16 The CRISPR-Cas9 gene-editing system selectively modifies or deletes specific cis-regulatory elements, and researchers can discern functional importance by observing the impact on gene expression. In gene regulation research, CRISPR-Cas9 has been utilized for screens, knockout/knock-in studies, CRISPR activation (CRISPRa), CRISPR interference (CRISPRi), and epigenetic studies.
Studies have highlighted the general value of CRISPR screens in identifying cis-regulatory elements associated with disease states,130–133 including neurodevelopmental disorders and neurodegenerative diseases134,135 and host-pathogen interactions for viruses such as SARS-CoV-2.136 These screens have also pinpointed specific TFs as potential immunotherapeutic targets in various cancer types137 and have been used to investigate the connection between regulatory elements and drug resistance or sensitivity in cancer cells.138 Pooled CRISPR screens, like Perturb-seq, couple genetic perturbations with single-cell transcriptomics to study the effects of these perturbations at the single-cell level. However, their limitations include prohibitive costs and challenges in efficiently measuring lowly expressed genes and small effects.139,140 To address these drawbacks, targeted Perturb-seq (TAP-seq) amplifies specific genes of interest, significantly increasing the scalability and cost-effectiveness of single-cell genetic and functional CRISPR screens, while also providing improved sensitivity for detecting gene expression changes.140,141
CRISPR interference (CRISPRi), which selectively inhibits target gene expression using catalytically inactive Cas9 protein, provides insights specifically into the effects of gene silencing on regulatory networks. To functionally validate distal cis-regulatory elements and link them to their target genes, researchers have combined CRISPRi and RNA-seq.16,139,142,143 One study introduced CRISPRi-FlowFISH, which combines CRISPRi with RNA fluorescence in situ hybridization and flow cytometry.144 By applying this technique to a large dataset of potential enhancer-gene connections, researchers developed an Activity-by-Contact (ABC) model; this model significantly improves predictions regarding enhancer-gene connections and offers a systematic way of mapping and predicting relationships based on chromatin state measurements. Over the past year, CRISPRi experiments have investigated regulatory element pathways important in various cancers,145–149 neurodegenerative diseases,150,151 systemic lupus erythematosus,152 and asthma153; these and similar studies highlight CRISPRi’s potential for therapeutic purposes.
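The Activity-by-Contact idea can be illustrated with a short sketch. In the ABC formulation, the score for an enhancer-gene pair is the enhancer's activity multiplied by its contact frequency with the gene, divided by the sum of that product over all candidate elements near the gene. The function names, the use of a geometric mean of two chromatin signals as the activity estimate, and all numbers below are illustrative assumptions, not the authors' code or data.

```python
import math

def activity(accessibility, h3k27ac):
    """Toy activity estimate: geometric mean of two chromatin signals (assumption)."""
    return math.sqrt(accessibility * h3k27ac)

def abc_scores(elements):
    """Return ABC-style scores for the candidate elements of one gene.

    `elements` maps a hypothetical element name to (activity, contact).
    Each score is activity * contact divided by the sum of that product
    over all candidate elements, so the scores sum to 1.
    """
    products = {name: a * c for name, (a, c) in elements.items()}
    total = sum(products.values())
    if total == 0:
        return {name: 0.0 for name in elements}
    return {name: p / total for name, p in products.items()}

# Hypothetical candidate elements for a single gene (made-up numbers).
candidates = {
    "enhancer_1": (activity(16.0, 4.0), 0.50),  # activity 8.0, strong contact
    "enhancer_2": (activity(9.0, 1.0), 0.10),   # activity 3.0, weak contact
    "enhancer_3": (activity(4.0, 4.0), 0.25),   # activity 4.0, moderate contact
}
scores = abc_scores(candidates)
top_element = max(scores, key=scores.get)
```

Because each score is normalized by the gene's total regulatory input, a moderately active element with frequent contact can outrank a more active element that rarely contacts the gene.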
Although CRISPRi experiments have led to advancements in cis-regulatory research, the vast majority of studies have worked on screening regulatory elements for necessity and not for sufficiency in the endogenous context.154 On the other hand, CRISPR activation (CRISPRa) leverages dCas9 fused to transcriptional activators to enhance gene expression and therefore allows researchers to test for sufficiency.155–158 In addition, CRISPRa offers other advantages: it can identify elements that, when targeted, upregulate the activity of already active genes, discover dormant regions that act as enhancers and increase gene expression when activated, and pinpoint cell type-specific activity in noncoding regulatory elements, making it a potential tool for targeted therapy in haploinsufficient and other low-dosage associated disorders, such as autism.159–163 However, one limitation of CRISPRa research thus far is that it has primarily been used in an ad hoc manner in workhorse cancer cell lines, not therapeutically relevant in vitro models.154 In response to this, Chardon et al. (2023) have introduced an experimental framework that combines multiple gRNA perturbations with scRNA-seq and allows for large-scale screening of gRNAs that activate therapeutically relevant genes in a cell type-dependent manner.154
Additionally, CRISPR-Cas9-based epigenome editing, specifically with dCas9-KRAB, has shown potential to induce targeted epigenetic changes in regulatory elements in the native genomic context.164 Researchers successfully silenced multiple globin genes by directing H3K9 trimethylation to the HS2 enhancer, demonstrating precise disruption of enhancer activity and gene expression without significant off-target effects.165 Klann et al. (2017) used the dCas9-KRAB repressor, as well as the dCas9-p300 activator, with lentiviral single guide RNA libraries to perform loss- and gain-of-function screens targeting DNase I hypersensitive sites near specific genes, successfully identifying known and novel regulatory elements.166 Kabadi et al. (2022) also used dCas9-KRAB and dCas9-p300, but investigated Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) gene regulation across 18 genomic regions, identifying enhancers and repressors and showing potential for enhancing CFTR expression as a therapeutic strategy for cystic fibrosis.167
Applying machine learning techniques to decipher the cis-regulatory code
Architectures including CNNs, recurrent neural networks (RNNs), and transformers have emerged as powerful machine learning tools for research on gene regulation (Fig. 4). Studies have focused on identifying TFBSs and regulatory elements, as well as deciphering the rules for cis-regulatory grammar.
Figure 4.

Machine learning approaches to studying cis-regulatory elements. A. Convolutional neural networks are deep learning models designed for processing structured grid-like data by using filters to detect patterns hierarchically. B. Recurrent neural networks are specialized in handling sequential data, where information cycles through recurrent connections. C. Transformers enable complex hierarchical relationships and dependencies within data to be captured. (Created with BioRender.com).
Methods for transcription factor binding identification
Numerous machine learning tools have been developed for identifying TFBSs. Deep learning-based methods, such as DeepBind, DeepSEA, DanQ, FCNA, and Basset, have been developed to predict TF-DNA binding motifs across diverse cell lines.168–172 However, these methods have faced challenges related to model interpretability and performance in capturing complex regulatory relationships. In response to these limitations, one algorithm, TF-MoDISco,173 leverages explainability techniques, such as DeepLIFT,174 and uses per-base importance scores to consolidate motifs learned by the network into a non-redundant set, subsequently improving the interpretability of the motifs.
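As a rough illustration of how sequence-based deep learning models see DNA, the sketch below one-hot encodes a sequence and slides a single filter along it, which is what one first-layer convolutional filter does and is closely analogous to scanning with a position weight matrix. The filter weights and the example sequence are made up, and the code is not taken from any of the tools named above.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def scan(seq, weights):
    """Slide a filter along the sequence, as a 1D convolution would.

    `weights` has one row of four weights (A, C, G, T) per motif position.
    Returns one score per window start.
    """
    encoded = one_hot(seq)
    width = len(weights)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            score += sum(x * w for x, w in zip(row, weights[offset]))
        scores.append(score)
    return scores

# Hypothetical filter that rewards the 4-mer "TGAC" (weights are made up).
toy_filter = [
    [-1.0, -1.0, -1.0, 1.0],  # position 1 prefers T
    [-1.0, -1.0, 1.0, -1.0],  # position 2 prefers G
    [1.0, -1.0, -1.0, -1.0],  # position 3 prefers A
    [-1.0, 1.0, -1.0, -1.0],  # position 4 prefers C
]
scores = scan("AATGACGG", toy_filter)
best_start = max(range(len(scores)), key=scores.__getitem__)
```

In a trained network the filter weights are learned from data rather than written by hand, and many filters are combined by deeper layers, which is part of why the learned motifs can be redundant and hard to interpret without a consolidation step.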
Recognizing the need for an interpretable method that considers TF cooperativity and is based on genome-wide experimental data, BPNet, a CNN, was developed to model the relationship between cis-regulatory sequences and TFBSs at base resolution.175 It successfully identified composite TFBSs, indirect binding footprints, and TFBS periodicity patterns, with CRISPR-validated motif syntax. Another machine-learning framework, AgentBind, utilizes CNNs to predict TF binding, but also assesses the importance of sequence context features.176 Ultimately, this model highlights the importance of training data quality and features such as open chromatin on prediction accuracy.
DeepGRN was subsequently developed to apply deep learning to massively parallel sequencing data, not just sequential data. It combines single and pairwise attention modules to capture long-range dependencies from DNA sequences and associated data.177 More recently, DeepSTF, a deep learning architecture for predicting TFBSs that integrates DNA shape and sequence profiles, was designed; it utilizes stacked CNNs and a novel transformer encoder structure.178 Lastly, to provide user-friendly TFBS prediction from ATAC-seq data, maxATAC, a large suite of deep neural network models, was established.179
Identifying promoters, enhancers, and their interactions
Recent advancements in machine learning have focused on computationally predicting promoters,180,181 their strength,182,183 and mRNA abundance.184,185 For instance, iSEGnet, a deep CNN integrating epigenetic modifications and RNA-seq data, identified potential regulatory sites within promoters and transcription termination sites and provided insight into specific epigenetic modifications within regulatory regions.186 CNNs have also been utilized in models to predict enhancers187–189 and their interactions with promoters.190–192
To assess the functional effects of human-chimpanzee variants in human accelerated regions, a recent study utilized Sei, a deep CNN model, coupled with lentiMPRA and epigenetic experiments.193 The findings highlighted nucleotide changes, predicted by TF footprints, as the primary source of differences in human-chimpanzee enhancer activity. Additionally, evidence of compensatory evolution to preserve ancestral enhancer activity was observed, which is significant given links between enhancer-active human accelerated regions and neurodevelopmental genes and neuropsychiatric diseases.
The Enformer model incorporates transformers in addition to CNNs to predict enhancers.190 Transformers, which attempt to capture the most relevant information through an attention mechanism, have shown promise in various other fields, such as natural language processing (NLP). Despite a recent push to favor large-scale attention-based transformer models in this field, some researchers have argued that, although these models perform well in protein structure prediction, text mining, and genomic data analysis, their quality can be overestimated under certain test scenarios.194,195 Concerns also persist regarding their ability to effectively capture long-range interactions.194
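The attention mechanism referred to here can be sketched in a few lines. The example below implements standard scaled dot-product attention, in which each position's output is a weighted average over all positions, with weights derived from query-key similarity. It is a minimal sketch that omits the learned projection matrices, multiple heads, and positional encodings used in real models, and it is not the Enformer implementation; the toy embeddings are made up.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for lists of equal-length vectors.

    Each output row is a weighted average of `values`, with weights given by
    softmax(query . key / sqrt(d)) across all positions.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(logits)
        all_weights.append(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs, all_weights

# Toy embeddings for three sequence positions (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
out, weights = attention(x, x, x)
```

In this toy case the first position attends to the third as strongly as to itself, because the weights depend on content similarity rather than distance. This is the property that makes attention attractive for modeling distal regulatory interactions.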
A relevant development has been LegNet, a CNN for modeling short gene regulatory regions that achieved first rank in predicting promoter expression from a gigantic parallel reporter assay at the DREAM 2022 challenge.195 The authors highlight that fully convolutional networks should be recognized as a dependable method for computationally modeling short gene regulatory regions and predicting the consequences of regulatory sequence modifications. Ultimately, however, the effectiveness of machine learning and artificial intelligence models hinges on the quality of experimental data, with current limitations in wet lab techniques contributing to challenges in precisely defining enhancers across the genome and occasionally leading to poor reproducibility even in replicates of the same experiment.
Deciphering the rules governing cis-regulatory sequences
Researchers have been challenged to decipher cis-regulatory grammars, the binding combinations and patterns that dictate regulatory activity in different cellular contexts. Models, ranging from the flexible TF billboard model196 to the more stringent enhanceosome model,197 have attempted to explain how regulatory grammar drives enhancer activity. To investigate the impact of TFBS orientation and order, Georgakopoulos-Soares et al. (2023) utilized an extensive lentiMPRA library of 209,440 sequences.198 Their findings indicated that TFBS orientation significantly impacts gene regulatory activity, especially with multiple copies of the same TFBS. Some TFBSs showed increased expression levels with specific orientations, while others performed best with a balanced proportion of orientations. In addition, the study highlighted that the order in which heterotypic TFBSs are placed can significantly influence gene expression. Ultimately, the research concluded that incorporating TFBS orientation into predictive models enhances their performance and may improve understanding of disease-associated genetic variants.
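To make the notion of TFBS orientation concrete: a site can be placed on the forward strand or as its reverse complement, so a construct with several copies of one site has many possible orientation patterns. The sketch below enumerates these patterns for a homotypic arrangement. The motif, spacer, and function names are hypothetical and chosen only for illustration; this is not the library design used in the cited study.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def orientation_variants(motif, copies, spacer):
    """Build every forward/reverse arrangement of `copies` identical sites.

    Returns a dict mapping an orientation pattern such as "+-" to the
    synthetic sequence with sites joined by `spacer`.
    """
    forms = {"+": motif.upper(), "-": reverse_complement(motif)}
    variants = {}
    for pattern in product("+-", repeat=copies):
        key = "".join(pattern)
        variants[key] = spacer.join(forms[sign] for sign in pattern)
    return variants

# Hypothetical 6-bp site and spacer, chosen only for illustration.
variants = orientation_variants("TGACGT", copies=2, spacer="AAAA")
```

The number of patterns doubles with each added copy (2 to the power of `copies`), which gives a sense of why large libraries are needed to test orientation effects systematically.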
Agarwal et al. (2023) also utilized lentivirus-based MPRAs to study the sequence features controlling the activity and cell-type-specific attributes of cis-regulatory elements within the human genome.24 Using lentiMPRAs, they tested over 680,000 sequences representing annotated CREs across three cell types and found that while promoters exhibited significant strand orientation effects, enhancers displayed tissue-specific characteristics. Ultimately, the research generated accurate sequence-based models for predicting CRE function, identified factors influencing cell-type specificity, and provided a comprehensive catalog of functional CREs in commonly used cell lines.
In addition to MPRAs, neural networks have been employed to predict cis-regulatory grammar. DeepSTARR, a deep learning model for cis-regulatory grammar, was designed to predict the activities of developmental and housekeeping enhancers in Drosophila melanogaster S2 cells directly from the DNA sequence.199 This model not only identifies relevant TF motifs, but also discerns the higher-order rules governing functional differences between instances of the same TF motif and facilitates the creation of custom synthetic enhancers. One deep learning method based on generative adversarial networks, ExpressionGAN, utilizes genomic and transcriptomic data to generate de novo synthetic regulatory DNA.200 To better understand how mutations in non-coding regulatory sequences impact cis-regulatory grammar, one study developed sequence-to-expression models using deep neural networks with convolutional layers.201 These models provided a framework for addressing questions in regulatory evolution. The study ultimately concluded that regulatory evolution occurs rapidly and is subject to diminishing returns epistasis, which means that as genetic changes accumulate, their effects become less pronounced.
In addition, residual neural network (ResNet) algorithms, designed to address vanishing gradients in very deep networks, have been utilized to classify enhancer sequences by simulating the sequences with various regulatory architectures, including homotypic/heterotypic clusters and enhanceosomes.202 Findings demonstrated that ResNets can effectively model regulatory grammars, even with heterogeneity in regulatory sequences and a significant proportion of TFBSs outside regulatory grammars. However, the network’s ability to learn regulatory grammar still depends on the nature of the prediction task.
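The feature that lets ResNets mitigate vanishing gradients is the skip connection: each block outputs its input plus a learned transformation of that input, so information and gradients can pass through unchanged even when the transformation contributes little. The sketch below shows one such block. It uses small fully connected layers in place of the convolutional layers typical of sequence models, purely for brevity, and all names and numbers are illustrative assumptions.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]

def dense(vector, weights, bias):
    """Apply a fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """Compute relu(x + F(x)), where F is two small dense layers."""
    hidden = relu(dense(x, w1, b1))
    transformed = dense(hidden, w2, b2)
    # Skip connection: add the block input back before the final activation.
    return relu([xi + ti for xi, ti in zip(x, transformed)])

# With all-zero weights, F(x) is zero and the block passes x through unchanged.
zeros = [[0.0, 0.0], [0.0, 0.0]]
no_bias = [0.0, 0.0]
passthrough = residual_block([1.0, 2.0], zeros, no_bias, zeros, no_bias)

# With nonzero weights, the block adds a learned correction on top of x.
w = [[0.5, 0.0], [0.0, 0.5]]
adjusted = residual_block([1.0, 2.0], w, no_bias, w, no_bias)
```

The all-zero case shows why stacking many such blocks remains trainable: each block starts close to the identity function and only needs to learn a correction to it.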
Conclusions
In summary, understanding cis-regulatory grammar has emerged as a challenge in genomics, but recent research has provided valuable information. Advancements in deep learning, such as GNNs and CNNs, have shown promise in inferring TRNs with higher accuracy and the ability to capture nonlinear dependencies. Additional research has demonstrated that integrating both gene expression and epigenetic data enhances our understanding of transcriptional regulation. Functional examination of cis-regulatory elements through techniques like MPRA, STARR-seq, and CRISPR-Cas9 has furthered research on their role in disease and allowed for the identification of biomarkers and therapeutic targets for precision medicine. Future advancements in single-cell genomics will allow researchers to dissect regulatory networks within individual cells, and machine learning models will allow for a more holistic view of cis-regulatory logic by integrating genomics, epigenomics, transcriptomics, and proteomics. A multitude of machine learning techniques rely on black-box artificial intelligence (AI) models, like CNNs and RNNs, which provide accurate predictions but often lack explanations for the underlying mechanisms.203 Consequently, there is growing emphasis on incorporating more explainable AI techniques into biological data analysis. The ongoing evolution of AI is set to enhance predictive abilities, potentially unveiling novel regulatory motifs and strengthening our understanding of cis-regulatory grammar.
Acknowledgements
C.M., I.M., N.C., and I.G.S. were supported by startup funds of I.G.S. from the Penn State College of Medicine. Y.U. was supported by the Four Diamonds Pediatric Cancer Research Center and by the National Institute of General Medical Sciences of the National Institutes of Health under award number R35GM150616. We also want to thank Vikram Agarwal for his insightful comments on the review.
Abbreviations
- ATAC-seq: Assay-for-Transposase-Accessible-Chromatin sequencing
- AI: Artificial Intelligence
- ChIP-seq: Chromatin Immunoprecipitation followed by Sequencing
- CUT&Tag: Cleavage Under Targets & Tagmentation
- CUT&RUN: Cleavage Under Targets & Release Using Nuclease
- ChIP-exo: ChIP-exonuclease
- ChIP-nexus: Chromatin-ImmunoPrecipitation with nucleotide resolution using exonuclease digestion, unique barcode and single ligation
- CNN: Convolutional neural network
- TF: Transcription factor
- TFN: Transcription factor networks
- TSSs: Transcription start sites
- GCNs: Gene co-expression networks
- GRNs: Gene regulatory networks
- TRNs: Transcriptional regulatory networks
- SVM: Support vector machine
- TFBS: Transcription factor binding sites
- MPRAs: Massively parallel reporter assays
- STARR-seq: Self-Transcribing Active Regulatory Region Sequencing
Footnotes
Conflict of Interest
All authors declare that they have no conflicts of interest.
Data Availability Statement
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
References
- 1. Maurano MT et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
- 2. Chatterjee S & Ahituv N. Gene Regulatory Elements, Major Drivers of Human Disease. Annu. Rev. Genomics Hum. Genet. 18, 45–63 (2017).
- 3. Manolio TA et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
- 4. He B & Tan K. Understanding transcriptional regulatory networks using computational models. Curr. Opin. Genet. Dev. 37, 101–108 (2016).
- 5. Marbach D et al. Wisdom of crowds for robust gene network inference. Nat. Methods 9, 796–804 (2012).
- 6. Yang B, Bao W, Chen B & Song D. Single_cell_GRN: gene regulatory network identification based on supervised learning method and Single-cell RNA-seq data. BioData Min. 15, 13 (2022).
- 7. Gao P et al. Risk variants disrupting enhancers of TH1 and TREG cells in type 1 diabetes. Proc. Natl. Acad. Sci. U. S. A. 116, 7581–7590 (2019).
- 8. Cooper YA, Guo Q & Geschwind DH. Multiplexed functional genomic assays to decipher the noncoding genome. Hum. Mol. Genet. 31, R84–R96 (2022).
- 9. Galas DJ & Schmitz A. DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res. 5, 3157–3170 (1978).
- 10. Meuleman W et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244–251 (2020).
- 11. Jolma A et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 20, 861–873 (2010).
- 12. Jolma A et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).
- 13. Borys SM & Younger ST. Identification of functional regulatory elements in the human genome using pooled CRISPR screens. BMC Genomics 21, 107 (2020).
- 14. Canver MC et al. BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature 527, 192–197 (2015).
- 15. Maeder ML et al. CRISPR RNA-guided activation of endogenous human genes. Nat. Methods 10, 977–979 (2013).
- 16. Fulco CP et al. Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science 354, 769–773 (2016).
- 17. Davis CA et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 46, D794–D801 (2018).
- 18. Castro-Mondragon JA et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2022).
- 19. Weirauch MT et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014).
- 20. Hammal F, de Langen P, Bergon A, Lopez F & Ballester B. ReMap 2022: a database of Human, Mouse, Drosophila and Arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments. Nucleic Acids Res. 50, D316–D325 (2022).
- 21. Puig RR, Boddie P, Khan A, Castro-Mondragon JA & Mathelier A. UniBind: maps of high-confidence direct TF-DNA interactions across nine species. BMC Genomics 22, 482 (2021).
- 22. Melnikov A et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30, 271–277 (2012).
- 23. Arnold CD et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
- 24. Agarwal V et al. Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types. bioRxiv (2023) doi: 10.1101/2023.03.05.531189.
- 25. De Smet R & Marchal K. Advantages and limitations of current network inference methods. Nat. Rev. Microbiol. 8, 717–729 (2010).
- 26. Bar-Joseph Z et al. Computational discovery of gene modules and regulatory networks. Nat. Biotechnol. 21, 1337–1342 (2003).
- 27. Reiss DJ, Baliga NS & Bonneau R. Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC Bioinformatics 7, 1–22 (2006).
- 28. Gerstein MB et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).
- 29. Jothi R et al. Genomic analysis reveals a tight link between transcription factor dynamics and regulatory network architecture. Mol. Syst. Biol. 5, 294 (2009).
- 30. Uzun Y. Approaches for benchmarking single-cell gene regulatory network inference methods. (2023) doi: 10.48550/ARXIV.2307.08463.
- 31. Shen W-K et al. AnimalTFDB 4.0: a comprehensive animal transcription factor database updated with variation and expression annotations. Nucleic Acids Res. 51, D39–D45 (2023).
- 32. Zhang B & Horvath S. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4, Article17 (2005).
- 33. Liu Z-P. Quantifying Gene Regulatory Relationships with Association Measures: A Comparative Study. Front. Genet. 8, 96 (2017).
- 34. Yin W, Mendoza L, Monzon-Sandoval J, Urrutia AO & Gutierrez H. Emergence of co-expression in gene regulatory networks. PLoS One 16, e0247671 (2021).
- 35. Margolin AA et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7 Suppl 1, S7 (2006).
- 36. Faith JJ et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 5, e8 (2007).
- 37. Meyer PE, Kontos K, Lafitte F & Bontempi G. Information-theoretic inference of large transcriptional regulatory networks. EURASIP J. Bioinform. Syst. Biol. 2007, 79879 (2007).
- 38. Butte AJ & Kohane IS. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac. Symp. Biocomput. 418–429 (2000).
- 39. Cover TM & Thomas JA. Elements of Information Theory. (John Wiley & Sons, 2012).
- 40. Statnikov A & Aliferis CF. Analysis and computational dissection of molecular signature multiplicity. PLoS Comput. Biol. 6, e1000790 (2010).
- 41. Hill SM et al. Bayesian inference of signaling network topology in a cancer cell line. Bioinformatics 28, 2804–2810 (2012).
- 42. Gao S, Dai Y & Rehman J. A Bayesian inference transcription factor activity model for the analysis of single-cell transcriptomes. Genome Res. 31, 1296–1311 (2021).
- 43. Haury A-C, Mordelet F, Vera-Licona P & Vert J-P. TIGRESS: Trustful Inference of Gene REgulation using Stability Selection. BMC Syst. Biol. 6, 145 (2012).
- 44. Bonneau R et al. The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol. 7, R36 (2006).
- 45. Fuente A. Gene Network Inference: Verification of Methods for Systems Genetics Data. (Springer Science & Business Media, 2014).
- 46. Michailidis G & d’Alché-Buc F. Autoregressive models for gene regulatory network inference: sparsity, stability and causality issues. Math. Biosci. 246, 326–334 (2013).
- 47. Mordelet F & Vert J-P. SIRENE: supervised inference of regulatory networks. Bioinformatics 24, i76–82 (2008).
- 48. Cerulo L, Elkan C & Ceccarelli M. Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinformatics 11, 228 (2010).
- 49. Razaghi-Moghadam Z & Nikoloski Z. Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data. NPJ Syst Biol Appl 6, 21 (2020).
- 50. Mignone P, Pio G, D’Elia D & Ceci M. Exploiting transfer learning for the reconstruction of the human gene regulatory network. Bioinformatics 36, 1553–1561 (2020).
- 51. Aibar S et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
- 52. Fiers MWEJ et al. Mapping gene regulatory networks from single-cell omics data. Brief. Funct. Genomics 17, 246–254 (2018).
- 53. Saint-André V. Computational biology approaches for mapping transcriptional regulatory networks. Comput. Struct. Biotechnol. J. 19, 4884–4895 (2021).
- 54. Stumpf MPH. Inferring better gene regulation networks from single-cell data. Curr. Opin. Syst. Biol. 27, 100342 (2021).
- 55. Dai H, Jin Q-Q, Li L & Chen L-N. Reconstructing gene regulatory networks in single-cell transcriptomic data analysis. Zool Res 41, 599–604 (2020).
- 56. Badia-I-Mompel P et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. (2023) doi: 10.1038/s41576-023-00618-5.
- 57. Nguyen H, Tran D, Tran B, Pehlivan B & Nguyen T. A comprehensive survey of regulatory network inference methods using single cell RNA sequencing data. Brief. Bioinform. 22, (2021).
- 58. Mercatelli D, Scalambra L, Triboli L, Ray F & Giorgi FM. Gene regulatory network inference resources: A practical overview. Biochim. Biophys. Acta Gene Regul. Mech. 1863, 194430 (2020).
- 59. Akers K & Murali TM. Gene regulatory network inference in single-cell biology. Curr. Opin. Syst. Biol. 26, 87–97 (2021).
- 60. Ly L-H & Vingron M. Effect of imputation on gene network reconstruction from single-cell RNA-seq data. Patterns (N Y) 3, 100414 (2022).
- 61. Langfelder P & Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
- 62. Morabito S, Reese F, Rahimzadeh N, Miyoshi E & Swarup V. hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data. Cell Rep Methods 3, 100498 (2023).
- 63. Schäfer J & Strimmer K. An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 21, 754–764 (2005).
- 64. Peng J, Wang P, Zhou N & Zhu J. Partial Correlation Estimation by Joint Sparse Regression Models. J. Am. Stat. Assoc. 104, 735–746 (2009).
- 65. Johansson A et al. Partial correlation network analyses to detect altered gene interactions in human disease: using preeclampsia as a model. Hum. Genet. 129, 25–34 (2011).
- 66. Yip KY, Alexander RP, Yan K-K & Gerstein M. Improved reconstruction of in silico gene regulatory networks by integrating knockout and perturbation data. PLoS One 5, e8121 (2010).
- 67. Friedman J, Hastie T & Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008).
- 68. Ross BC. Mutual information between discrete and continuous data sets. PLoS One 9, e87357 (2014).
- 69. Madhamshettiwar PB, Maetschke SR, Davis MJ, Reverter A & Ragan MA. Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets. Genome Med. 4, 41 (2012).
- 70. Lei J, Cai Z, He X, Zheng W & Liu J. An approach of gene regulatory network construction using mixed entropy optimizing context-related likelihood mutual information. Bioinformatics 39, (2023).
- 71. Huynh-Thu VA, Irrthum A, Wehenkel L & Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS One 5, (2010).
- 72. Huynh-Thu VA & Sanguinetti G. Combining tree-based and dynamical systems for the inference of gene regulatory networks. Bioinformatics 31, 1614–1622 (2015).
- 73. Alawad DM, Katebi A, Kabir MWU & Hoque MT. AGRN: accurate gene regulatory network inference using ensemble machine learning methods. Bioinform Adv 3, vbad032 (2023).
- 74. Sahu B et al. Sequence determinants of human gene regulatory elements. Nat. Genet. 54, 283–294 (2022).
- 75. Dotson GA et al. Deciphering multi-way interactions in the human genome. Nat. Commun. 13, 5498 (2022).
- 76. Li Z et al. Applications of deep learning in understanding gene regulation. Cell Rep Methods 3, 100384 (2023).
- 77. Pratapa A, Jalihal AP, Law JN, Bharadwaj A & Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147–154 (2020).
- 78. Yuan Y & Bar-Joseph Z. Deep learning for inferring gene relationships from single-cell expression data. Proc. Natl. Acad. Sci. U. S. A. 116, 27151–27158 (2019).
- 79. Zhao M, He W, Tang J, Zou Q & Guo F. A hybrid deep learning framework for gene regulatory network inference from single-cell transcriptomic data. Brief. Bioinform. 23, (2022).
- 80. Fan Y & Ma X. Gene regulatory network inference using 3D convolutional neural network. Proc. Conf. AAAI Artif. Intell. 35, 99–106 (2021).
- 81. Jansen C et al. Building gene regulatory networks from scATAC-seq and scRNA-seq using Linked Self Organizing Maps. PLoS Comput. Biol. 15, e1006555 (2019).
- 82. Bravo González-Blas C et al. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. Nat. Methods (2023) doi: 10.1038/s41592-023-01938-4.
- 83. Yang Y et al. Gene knockout inference with variational graph autoencoder learning single-cell gene regulatory networks. Nucleic Acids Res. 51, 6578–6592 (2023).
- 84. Wang J, Ma A, Ma Q, Xu D & Joshi T. Inductive inference of gene regulatory network using supervised and semi-supervised graph neural networks. Comput. Struct. Biotechnol. J. 18, 3335–3343 (2020).
- 85. Wang J, Chen Y & Zou Q. Inferring gene regulatory network from single-cell transcriptomes with graph autoencoder model. PLoS Genet. 19, e1010942 (2023).
- 86. Feng K, Jiang H, Yin C & Sun H. Gene regulatory network inference based on causal discovery integrating with graph neural network. Quant. Biol. 11, 434–450 (2023).
- 87. Chen G & Liu Z-P. Graph attention network for link prediction of gene regulations from single-cell RNA-sequencing data. Bioinformatics 38, 4522–4529 (2022).
- 88. Shu H et al. Modeling gene regulatory networks using neural network architectures. Nature Computational Science 1, 491–501 (2021).
- 89. Zhu H & Slonim DK. Improving gene regulatory network inference using Dropout Augmentation. bioRxiv (2023) doi: 10.1101/2023.01.26.525733.
- 90. Zhang Y et al. MetaSEM: Gene Regulatory Network Inference from Single-Cell RNA Data by Meta-Learning. Int. J. Mol. Sci. 24, (2023).
- 91. Xu J, Zhang A, Liu F & Zhang X. STGRNS: an interpretable transformer-based method for inferring gene regulatory networks from single-cell transcriptomic data. Bioinformatics 39, (2023).
- 92. Zhu H & Slonim DK. From noise to knowledge: Probabilistic diffusion-based neural inference of gene regulatory networks. bioRxiv (2023) doi: 10.1101/2023.11.05.565675.
- 93. Chen S & Mar JC. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics 19, 232 (2018).
- 94.Kang Y, Thieffry D & Cantini L Evaluating the Reproducibility of Single-Cell Gene Regulatory Network Inference Algorithms. Front. Genet. 12, 617282 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Gao P et al. Transcriptional regulatory network controlling the ontogeny of hematopoietic stem cells. Genes Dev. 34, 950–964 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96.Jiang J et al. IReNA: integrated regulatory network analysis of single-cell transcriptomes and chromatin accessibility profiles. bioRxiv 2021.11.22.469628 (2022) doi: 10.1101/2021.11.22.469628. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97.Duren Z et al. Sc-compReg enables the comparison of gene regulatory networks between conditions using single-cell data. Nat. Commun. 12, 4763 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 98.Zhang L, Zhang J & Nie Q DIRECT-NET: An efficient method to discover cis-regulatory elements and construct regulatory networks from single-cell multiomics data. Sci Adv 8, eabl7393 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99.Lee J, Hyeon DY & Hwang D Single-cell multiomics: technologies and data analysis methods. Exp. Mol. Med. 52, 1428–1442 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100.Ogbeide S, Giannese F, Mincarelli L & Macaulay IC Into the multiverse: advances in single-cell multiomic profiling. Trends Genet. 38, 831–843 (2022). [DOI] [PubMed] [Google Scholar]
- 101.Wang L et al. Dictys: dynamic gene regulatory network dissects developmental continuum with single-cell multiomics. Nat. Methods 20, 1368–1378 (2023). [DOI] [PubMed] [Google Scholar]
- 102.Li Z, Nagai JS, Kuppe C, Kramann R & Costa IG scMEGA: single-cell multi-omic enhancer-based gene regulatory network inference. Bioinform Adv 3, vbad003 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103.Matharu N & Ahituv N Modulating gene regulation to treat genetic disorders. Nat. Rev. Drug Discov. 19, 757–775 (2020). [DOI] [PubMed] [Google Scholar]
- 104.Oz-Levi D et al. Noncoding deletions reveal a gene that is critical for intestinal function. Nature 571, 107–111 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105.Inoue F & Ahituv N Decoding enhancers using massively parallel reporter assays. Genomics 106, 159–164 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106.Klein JC et al. A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nat. Methods 17, 1083–1091 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107.Roberts BS et al. Genome-wide strand asymmetry in massively parallel reporter activity favors genic strands. Genome Res. 31, 866–876 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108.Andersson R & Sandelin A Determinants of enhancer and promoter activities of regulatory elements. Nat. Rev. Genet. 21, 71–87 (2020). [DOI] [PubMed] [Google Scholar]
- 109.Inoue F et al. A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity. Genome Res. 27, 38–52 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110.Zheng Y & VanDusen NJ Massively Parallel Reporter Assays for High-Throughput In Vivo Analysis of Cis-Regulatory Elements. J Cardiovasc Dev Dis 10, (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 111.Guo MG et al. Integrative analyses highlight functional regulatory variants associated with neuropsychiatric diseases. Nat. Genet. (2023) doi: 10.1038/s41588-023-01533-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 112.Griesemer D et al. Genome-wide functional screen of 3’UTR variants uncovers causal variants for human disease and evolution. Cell 184, 5247–5260.e19 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 113.Toropainen A et al. Functional noncoding SNPs in human endothelial cells fine-map vascular trait associations. Genome Res. 32, 409–424 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114.Oh IY & Chen S High-Throughput Analysis of Retinal Cis-Regulatory Networks by Massively Parallel Reporter Assays. Adv. Exp. Med. Biol. 1185, 359–364 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 115.Myint L et al. A screen of 1,049 schizophrenia and 30 Alzheimer’s-associated variants for regulatory potential. Am. J. Med. Genet. B Neuropsychiatr. Genet. 183, 61–73 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 116.Klein JC et al. Functional testing of thousands of osteoarthritis-associated variants for regulatory activity. Nat. Commun. 10, 2434 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Capauto D et al. Characterization of enhancer activity in early human neurodevelopment using Massively parallel reporter assay (MPRA) and forebrain organoids. bioRxiv (2023) doi: 10.1101/2023.08.14.553170. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 118.Kircher M et al. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat. Commun. 10, 3583 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 119.Catizone AN et al. Locally acting transcription factors regulate p53-dependent cis-regulatory element activity. Nucleic Acids Res. 48, 4195–4213 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 120.Tan Y et al. Genome-wide enhancer identification by massively parallel reporter assay in Arabidopsis. Plant J. 116, 234–250 (2023). [DOI] [PubMed] [Google Scholar]
- 121.King DM et al. Synthetic and genomic regulatory elements reveal aspects of -regulatory grammar in mouse embryonic stem cells. Elife 9, (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 122.Zhao S et al. A single-cell massively parallel reporter assay detects cell-type-specific gene regulation. Nat. Genet. 55, 346–354 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 123.Muerdter F, Boryń ŁM & Arnold CD STARR-seq - principles and applications. Genomics 106, 145–150 (2015). [DOI] [PubMed] [Google Scholar]
- 124.Neumayr C, Pagani M, Stark A & Arnold CD STARR-seq and UMI-STARR-seq: Assessing Enhancer Activities for Genome-Wide-, High-, and Low-Complexity Candidate Libraries. Curr. Protoc. Mol. Biol. 128, e105 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 125.Jores T et al. Identification of Plant Enhancers and Their Constituent Elements by STARR-seq in Tobacco Leaves. Plant Cell 32, 2120–2131 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 126.Leung AK-Y, Yao L & Yu H Functional genomic assays to annotate enhancer-promoter interactions genome wide. Hum. Mol. Genet. 31, R97–R104 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 127.Liu Y et al. Functional assessment of human enhancer activities using whole-genome STARR-sequencing. Genome Biol. 18, 219 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 128.Nagayasu K, Andoh C, Shirakawa H & Kaneko S Diff-ATAC-STARR-Seq: A Method for Genome-Wide Functional Screening of Enhancer Activity in Vivo. Biol. Pharm. Bull. 45, 1590–1595 (2022). [DOI] [PubMed] [Google Scholar]
- 129.Örd T et al. Single-Cell Epigenomics and Functional Fine-Mapping of Atherosclerosis GWAS Loci. Circ. Res. 129, 240–258 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 130.Xu J et al. Subtype-specific 3D genome alteration in acute myeloid leukaemia. Nature 611, 387–398 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 131.Dietlein F et al. Genome-wide analysis of somatic noncoding mutation patterns in cancer. Science 376, eabg5601 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 132.Huang H et al. Defining super-enhancer landscape in triple-negative breast cancer by multiomic profiling. Nat. Commun. 12, 2242 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 133.Zhao X et al. Molecular Mechanisms of ARID5B-Mediated Genetic Susceptibility to Acute Lymphoblastic Leukemia. J. Natl. Cancer Inst. 114, 1287–1295 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 134.Kampmann M CRISPR-based functional genomics for neurological disease. Nat. Rev. Neurol. 16, 465–480 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 135.Lim CKW et al. CRISPR base editing of cis-regulatory elements enables the perturbation of neurodegeneration-linked genes. Mol. Ther. 30, 3619–3631 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 136.Schneider WM et al. Genome-Scale Identification of SARS-CoV-2 and Pan-coronavirus Host Factor Networks. Cell 184, 120–132.e14 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 137.Chen Z et al. In vivo CD8 T cell CRISPR screening reveals control by Fli1 in infection and cancer. Cell 184, 1262–1280.e22 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 138.Li Y et al. Integrative analysis of CRISPR screening data uncovers new opportunities for optimizing cancer immunotherapy. Mol. Cancer 21, 2 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 139.Gasperini M et al. A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens. Cell 176, 377–390.e19 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 140.Datlinger P et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 141.Schraivogel D et al. Targeted Perturb-seq enables genome-scale genetic screens in single cells. Nat. Methods 17, 629–635 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 142.Gasperini M, Tome JM & Shendure J Towards a comprehensive catalogue of validated and target-linked human enhancers. Nat. Rev. Genet. 21, 292–310 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 143.Xie S, Duan J, Li B, Zhou P & Hon GC Multiplexed Engineering and Analysis of Combinatorial Enhancer Activity in Single Cells. Mol. Cell 66, 285–299.e5 (2017). [DOI] [PubMed] [Google Scholar]
- 144.Fulco CP et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 145.Saito Y et al. A Therapeutically Targetable TAZ-TEAD2 Pathway Drives the Growth of Hepatocellular Carcinoma via ANLN and KIF23. Gastroenterology 164, 1279–1292 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 146.Morgens DW, Gulyas L, Souza AS & Glaunsinger BA Coding and non-coding elements comprise a regulatory network controlling transcription in Kaposi’s sarcoma-associated herpesvirus. bioRxiv (2023) doi: 10.1101/2023.07.08.548212. [DOI] [Google Scholar]
- 147.Wang T-M et al. High-throughput identification of regulatory elements and functional assays to uncover susceptibility genes for nasopharyngeal carcinoma. Am. J. Hum. Genet. 110, 1162–1176 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 148.Bril O, Schwarzmueller LJ, Moreno LF, Vermeulen L & Léveillé N Identifying essential long non-coding RNAs in cancer using CRISPRi-based dropout screens. STAR Protoc 4, 102588 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 149.Ng M et al. Myeloid leukemia vulnerabilities embedded in long noncoding RNA locus. iScience 26, 107844 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 150.Rogers BB et al. expression is mediated by long-range interactions with -regulatory elements. bioRxiv (2023) doi: 10.1101/2023.03.07.531520. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 151.Yang X et al. Functional characterization of Alzheimer’s disease genetic variants in microglia. Nat. Genet. 55, 1735–1744 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 152.Hou G et al. Integrative Functional Genomics Identifies Systemic Lupus Erythematosus Causal Genetic Variant in the IRF5 Risk Locus. Arthritis Rheumatol 75, 574–585 (2023). [DOI] [PubMed] [Google Scholar]
- 153.Koh KD et al. Genomic characterization and therapeutic utilization of IL-13-responsive sequences in asthma. Cell Genom 3, 100229 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 154.Chardon FM et al. Multiplex, single-cell CRISPRa screening for cell type specific regulatory elements. bioRxiv (2023) doi: 10.1101/2023.03.28.534017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 155.Gilbert LA et al. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell 159, 647–661 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 156.Tian R et al. Genome-wide CRISPRi/a screens in human neurons link lysosomal failure to ferroptosis. Nat. Neurosci. 24, 1020–1034 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 157.Konermann S et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583–588 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 158.Schmidt R et al. CRISPR activation and interference screens decode stimulation responses in primary human T cells. Science 375, eabj4008 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 159.Simeonov DR et al. Discovery of stimulation-responsive immune enhancers with CRISPR activation. Nature 549, 111–115 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 160.Matharu N et al. CRISPR-mediated activation of a promoter or enhancer rescues obesity caused by haploinsufficiency. Science 363, (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 161.Dai Z et al. Inducible CRISPRa screen identifies putative enhancers. J. Genet. Genomics 48, 917–927 (2021). [DOI] [PubMed] [Google Scholar]
- 162.Joung J et al. Genome-scale activation screen identifies a lncRNA locus regulating a gene neighbourhood. Nature 548, 343–346 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 163.Sala C & Verpelli C Neuronal and Synaptic Dysfunction in Autism Spectrum Disorder and Intellectual Disability. (Academic Press, 2016). [Google Scholar]
- 164.Klann TS, Crawford GE, Reddy TE & Gersbach CA Screening Regulatory Element Function with CRISPR/Cas9-based Epigenome Editing. Methods Mol. Biol. 1767, 447–480 (2018). [DOI] [PubMed] [Google Scholar]
- 165.Thakore PI et al. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat. Methods 12, 1143–1149 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 166.Klann TS et al. CRISPR-Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat. Biotechnol. 35, 561–568 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 167.Kabadi AM et al. Epigenome editing of the CFTR-locus for treatment of cystic fibrosis. J. Cyst. Fibros. 21, 164–171 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 168.Alipanahi B, Delong A, Weirauch MT & Frey BJ Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015). [DOI] [PubMed] [Google Scholar]
- 169.Zhou J & Troyanskaya OG Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 170.Quang D & Xie X DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 44, e107 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 171.Zhang Q et al. Locating transcription factor binding sites by fully convolutional neural network. Brief. Bioinform. 22, (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 172.Kelley DR, Snoek J & Rinn JL Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 173.Shrikumar A et al. Technical Note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) version 0.5.6.5. (2018). [Google Scholar]
- 174.Shrikumar A, Greenside P & Kundaje A Learning Important Features Through Propagating Activation Differences. in Proceedings of the 34th International Conference on Machine Learning (eds. Precup D & Teh YW.) vol. 70 3145–3153 (PMLR, 06--11 Aug 2017). [Google Scholar]
- 175.Avsec Ž et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 176.Zheng A et al. Deep neural networks identify sequence context features predictive of transcription factor binding. Nat Mach Intell 3, 172–180 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 177.Chen C et al. DeepGRN: prediction of transcription factor binding site across cell-types using attention-based deep neural networks. BMC Bioinformatics 22, 38 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 178.Ding P et al. DeepSTF: predicting transcription factor binding sites by interpretable deep neural networks combining sequence and shape. Brief. Bioinform. 24, (2023). [DOI] [PubMed] [Google Scholar]
- 179.Cazares TA et al. maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks. PLoS Comput. Biol. 19, e1010863 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 180.Liu B, Yang F, Huang D-S & Chou K-C iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC. Bioinformatics 34, 33–40 (2018). [DOI] [PubMed] [Google Scholar]
- 181.Oubounyt M, Louadi Z, Tayara H & Chong KT DeePromoter: Robust Promoter Predictor Using Deep Learning. Front. Genet. 10, 286 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 182.Zhao M, Yuan Z, Wu L, Zhou S & Deng Y Precise Prediction of Promoter Strength Based on a De Novo Synthetic Promoter Library Coupled with Machine Learning. ACS Synth. Biol. 11, 92–102 (2022). [DOI] [PubMed] [Google Scholar]
- 183.Li H et al. dPromoter-XGBoost: Detecting promoters and strength by combining multiple descriptors and feature selection using XGBoost. Methods 204, 215–222 (2022). [DOI] [PubMed] [Google Scholar]
- 184.Agarwal V & Shendure J Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks. Cell Rep. 31, 107663 (2020). [DOI] [PubMed] [Google Scholar]
- 185.Zhou J et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 186.Gao S, Rehman J & Dai Y Assessing comparative importance of DNA sequence and epigenetic modifications on gene expression using a deep convolutional neural network. Comput. Struct. Biotechnol. J. 20, 3814–3823 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 187.Chen S, Gan M, Lv H & Jiang R DeepCAPE: A Deep Convolutional Neural Network for the Accurate Prediction of Enhancers. Genomics Proteomics Bioinformatics 19, 565–577 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 188.Kaur A, Chauhan APS & Aggarwal AK Prediction of Enhancers in DNA Sequence Data using a Hybrid CNN-DLSTM Model. IEEE/ACM Trans. Comput. Biol. Bioinform. 20, 1327–1336 (2023). [DOI] [PubMed] [Google Scholar]
- 189.Min X et al. Predicting enhancers with deep convolutional neural networks. BMC Bioinformatics 18, 478 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 190.Avsec Ž et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 191.Kelley DR et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 192.Singh S, Yang Y, Póczos B & Ma J Predicting enhancer-promoter interaction from genomic sequence with deep neural networks. Quant Biol 7, 122–137 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 193.Whalen S et al. Machine learning dissection of human accelerated regions in primate neurodevelopment. Neuron 111, 857–873.e8 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 194.Karollus A, Mauermeier T & Gagneur J Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol. 24, 56 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 195.Penzar D et al. LegNet: a best-in-class deep learning model for short DNA regulatory regions. Bioinformatics 39, (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 196.Kulkarni MM & Arnosti DN Information display by transcriptional enhancers. Development 130, 6569–6575 (2003). [DOI] [PubMed] [Google Scholar]
- 197.Panne D The enhanceosome. Curr. Opin. Struct. Biol. 18, 236–242 (2008). [DOI] [PubMed] [Google Scholar]
- 198.Georgakopoulos-Soares I et al. Transcription factor binding site orientation and order are major drivers of gene regulatory activity. Nat. Commun. 14, 2333 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 199.de Almeida BP, Reiter F, Pagani M & Stark A DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54, 613–624 (2022). [DOI] [PubMed] [Google Scholar]
- 200.Zrimec J et al. Controlling gene expression with deep generative design of regulatory DNA. Nat. Commun. 13, 5099 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 201.Vaishnav ED et al. The evolution, evolvability and engineering of gene regulatory DNA. Nature 603, 455–463 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 202.Chen L & Capra JA Learning and interpreting the gene regulatory grammar in a deep learning framework. PLoS Comput. Biol. 16, e1008334 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 203.Novakovsky G, Dexter N, Libbrecht MW, Wasserman WW & Mostafavi S Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat. Rev. Genet. 24, 125–137 (2022). [DOI] [PubMed] [Google Scholar]
Data Availability Statement
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
