Keywords: Chromatin Conformation Capture, Enhancer-promoter interaction, cis-Regulatory element, Chromatin loop, Machine learning, Computational method
Abstract
Understanding the mechanisms underlying gene regulation is key to explaining how multicellular organisms with diverse cell types develop from the same genetic blueprint. Dynamic interactions between enhancers and genes play central roles in controlling gene transcription, but the determinants that link functional enhancer-promoter pairs remain elusive. A major challenge is the lack of a reliable approach to detect and verify functional enhancer-promoter interactions (EPIs). In this review, we summarize the current methods for detecting EPIs and, by assessing their merits and drawbacks, describe how developing techniques facilitate EPI identification. We also review recent state-of-the-art EPI prediction methods in terms of their rationale, data usage and characterization. Furthermore, we briefly discuss the evolved strategies for validating functional EPIs.
1. Seeking functional EPIs
In the last several decades, researchers have identified several types of functional DNA elements that can regulate tissue/cell type-specific gene expression in cis [1]. These cis-regulatory elements (CREs) are often located in non-coding genomic regions, which make up over 98% of the human genome [2]. Promoters and enhancers are two major CREs that control context-dependent gene transcription: promoters drive gene expression adjacent to transcription start sites (TSSs), whereas enhancers orchestrate transcription from distal positions regardless of orientation [3], [4]. However, when and how these elements precisely regulate target gene expression remain largely unknown.
Eukaryotic enhancers are bound by various transcription factors (TFs) when activated and upregulate target gene expression by forming chromatin loops with promoters [4], [5]. These chromatin loops are mostly driven by persistent or transient cohesin extrusion and by other unknown mechanisms [6], [7]. The complexity of EPI formation and function limits the accurate detection of functional EPIs in a high-throughput manner. First, the human genome is estimated to contain over 1,000,000 enhancers, whereas the number of promoters, even when alternative promoters are considered, is of the same order of magnitude as the number of gene transcripts [4], [8], [9], [10]. Such great redundancy enables target genes to be regulated by different enhancers and ensures robust gene control under different conditions, but in turn complicates the detection of tissue/cell type-specific EPIs [11], [12]. Second, enhancers are thought to regulate gene expression by forming loops with the promoters of target genes [13]. Although the distance spanned by most EPIs is less than 200 kb, in extreme cases an enhancer can be located over 1 Mb away from its regulatory target [14]. It is estimated that only 40% of enhancers regulate their nearest genes, so the targets of the remainder cannot be identified by the nearest-gene rule [11]. Third, interaction between an enhancer and a promoter does not necessarily imply functional causation. EPIs detected by spatial proximity more likely represent association than causation, given the contradictory evidence on the influence of EPIs on gene expression [15], [16], [17]. Fourth, the dynamics of EPIs and the corresponding maintenance of gene expression can be only partially explained by the loop extrusion model [18], [19], and how functional EPIs are established and maintained remains to be fully addressed.
Currently, techniques based on chromosome conformation capture (3C-based techniques) are commonly used to identify EPIs at different throughputs [20]. However, identifying true functional EPIs during development and homeostasis usually requires extra effort, such as profiling chromatin states, tracking TF binding and quantifying eventual gene expression [21], [22]. Put simply, in common practice three lines of evidence are used to define a functional EPI: 1) active or meaningful chromatin states; 2) close spatial proximity (although several lines of evidence indicate that enhancers can control target gene expression independently of EPIs [17], [23]); and 3) a positive transcriptional outcome (Fig. 1). In this review, we summarize and categorize the technologies for identifying active CREs, detecting chromatin proximity and validating potential EPIs, with a focus on their merits and drawbacks. We briefly introduce the roles of the evolved 3D genomic profiling assays in characterizing functional EPIs. Furthermore, we illustrate how state-of-the-art computational methods have been developed based on these functional genomics data.
Fig. 1.
Definition of functional EPIs. Functional EPIs require evidence from three aspects: (A) Active status of enhancers and promoters. (B) Spatial proximity between enhancer and promoter (though some studies have revealed exceptional cases). (C) Context-dependent alteration of gene expression.
2. Identifying active CREs
A key component of functional EPI detection is to determine whether the associated CREs are activated under particular conditions. Comparative genome analysis has been widely applied to identify functional DNA elements by searching for regions conserved across species [24], [25]. However, its application to enhancer detection is challenged by findings that enhancers evolve rapidly and that most of them are species-specific [12]. The inability to measure the activity state of CREs is another limitation of comparative genome analysis, which has prompted the development of new strategies to track active CREs beyond DNA sequence alone. Advances in biotechnology and high-throughput sequencing have greatly facilitated the identification of tissue/cell type-specific enhancers and promoters. For example, the binding of certain TFs, co-factors and histone modifiers (such as EP300, CDK7, BRD4, and Mediator) usually indicates active CREs. Histone modifications can mark the activity states of CREs (such as H3K27ac for active enhancers, H3K4me1 for primed enhancers, both H3K4me1 and H3K27me3 for poised enhancers, and H3K4me3 for active promoters [26], [27], [28]). These binding events and modifications can be measured by ChIP-chip, ChIP-seq, and the more recent CUT&RUN and CUT&Tag [29], [30], [31], [32]. Active enhancers and promoters are also closely related to chromatin states (such as open chromatin and nucleosome occupancy), which are highly relevant to the binding of various TFs. Therefore, genomic assays measuring chromatin states, including DNase-seq, ATAC-seq and MNase-seq, are all informative for detecting active CREs [33], [34], [35]. However, there is no one-to-one correspondence between any single epigenomic feature and a given type of CRE.
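The histone mark combinations listed above can be read as a coarse lookup from marks to activity states. The sketch below is only an illustration of that reading: the function name `label_cre_state`, the "unmarked" fallback and, in particular, the order in which the rules are checked are assumptions made here, not a published classification scheme. Because no single feature maps one-to-one onto a CRE type, real analyses rely on probabilistic segmentation rather than fixed rules.

```python
def label_cre_state(marks):
    """Return a coarse activity label for a region from the histone marks found there.

    `marks` is any iterable of mark names. The rule order is an assumption:
    H3K4me3 is checked first, then H3K27ac, then the poised and primed patterns.
    """
    marks = set(marks)
    if "H3K4me3" in marks:
        return "active promoter"
    if "H3K27ac" in marks:
        return "active enhancer"
    if {"H3K4me1", "H3K27me3"} <= marks:
        return "poised enhancer"
    if "H3K4me1" in marks:
        return "primed enhancer"
    return "unmarked"


# Hypothetical regions with the marks detected at each one.
regions = {
    "region_1": ["H3K4me1", "H3K27ac"],
    "region_2": ["H3K4me1", "H3K27me3"],
    "region_3": ["H3K4me1"],
    "region_4": ["H3K4me3"],
}
labels = {name: label_cre_state(found) for name, found in regions.items()}
```

Such a lookup is useful mainly as a first filter before candidate regions are combined with proximity and expression evidence.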
As part of the ENCODE project, ChromHMM and Segway are used to integrate multiple histone modifications and chromatin states across large numbers of tissues/cell types and to generate comprehensive predictions of different CREs using DNA segmentation algorithms [36], [37], [38]. Detecting nascent RNAs is another feasible approach for identifying active CREs. It is inspired by the similar properties of enhancers and promoters. For instance, enhancers can be transcribed into non-coding enhancer RNAs (eRNAs) when activated [39], [40], and the promoters of some genes have been found to act as enhancers and distally regulate the transcription of other genes [34], [41], [42]. Cap analysis of gene expression (CAGE) and similar techniques (like GRO-seq and PRO-seq) are well suited to detecting transcripts within the cell nucleus, and are thus widely applied in detecting eRNAs and nascent mRNAs [43], [44], [45], [46], [47].
Studies have shown that poised enhancers can physically contact their target genes in a polycomb-dependent manner in certain cellular and genomic contexts, but these permissive EPIs remain silent until active nuclear signatures are received [48], [49]. Therefore, detecting active enhancers and promoters could be a prerequisite for defining functional EPIs (Fig. 2A). Nevertheless, CRE activation as currently inferred from specific chromatin marks or nascent transcripts does not necessarily imply that the enhancers or promoters are truly functional. For example, it was reported that only 26% of the enhancers predicted in the ENCODE project have regulatory activity [50]. In addition, recent CRISPR screening has uncovered functional regulatory regions that lack known regulatory signatures [51], which highlights the importance of exploring novel chromatin features for defining active CREs. Taken together, these techniques do not provide direct evidence for the linkage between enhancers and promoters. Some patterns in the data, such as co-activation of enhancer and promoter across cells, are informative for functional EPI detection and are discussed further in the section on computational methods.
Fig. 2.
Conventional workflow for detecting, predicting and validating functional EPIs. (A) Epigenomic features and nascent transcripts are the major characteristics of active CREs. (B) Functional EPIs require enhancer and promoter to be spatially adjacent. (C) Candidate EPIs are routinely derived from the combination of active CREs and chromatin loops. (D) Computational methods are developed on candidate EPIs using either supervised or unsupervised algorithms. (E) Disrupting CREs and testing the effects on gene transcription are the main approaches to validate candidate EPIs.
3. Tracking spatial proximity
Reduced spatial distance through chromatin loop formation is another critical property of functional EPIs. 3C-based techniques have revolutionized the identification of DNA interactions [52] (Fig. 2B). By coupling proximity ligation with next-generation sequencing (NGS), 3C and its derivatives are able to capture three-dimensional interactions among chromatin regions. In particular, high-throughput 3C-derived techniques, including high-depth Hi-C, HiChIP and ChIA-PET using large numbers of cells, currently provide an efficient avenue to identify potential EPIs genome-wide [53], [54], [55]. By modeling contact frequencies and assuming a background of interactions generated by random collisions across the chromatin polymer, significant interactions can be called with various computational methods [56]. However, conventional 3C-based techniques still have limitations in terms of resolution, sensitivity and cost. In fact, without sufficient library complexity and ultra-high sequencing depth, the resolution of Hi-C is not fine enough to distinguish chromatin loops. A fundamental difficulty of 3C-based techniques is their dependence on proximity ligation. It has been suggested that proximity ligation fails to capture many known structures and introduces high background noise [57], which significantly affects the quality of the identified loops for follow-up EPI modeling. In addition, restriction enzyme-dependent techniques cannot resolve genomic regions smaller than a theoretical limit determined by the enzyme. For example, the theoretical maximum resolution of restriction enzymes with a 6 bp recognition site is about 4 kb. ChIA-PET has been suggested to achieve higher resolution at the same sequencing depth because it focuses on regions marked by specific factors, but it has been criticized for low sensitivity, which leads to a high false negative rate in detecting chromatin loops [55].
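The 4 kb figure for a 6 bp cutter follows from a simple counting argument: if the four bases were equally frequent and independent, a fixed site of length n would be expected once every 4^n bp, and 4^6 = 4096. The sketch below shows this idealized calculation alongside an empirical spacing measured on a sequence. The function names and the toy sequence are hypothetical, and the uniform-base assumption is an idealization.

```python
def expected_site_spacing(site_length, alphabet_size=4):
    """Expected spacing (bp) between occurrences of one fixed recognition site,
    assuming equal and independent base frequencies (an idealization)."""
    return alphabet_size ** site_length


def mean_site_spacing(sequence, site):
    """Observed mean spacing (bp) between consecutive occurrences of `site`.

    Returns None when the site occurs fewer than two times.
    """
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    if len(positions) < 2:
        return None
    return (positions[-1] - positions[0]) / (len(positions) - 1)


six_cutter = expected_site_spacing(6)   # 4096 bp, i.e. about 4 kb
four_cutter = expected_site_spacing(4)  # 256 bp
toy_spacing = mean_site_spacing("AAGATCTTGATCAA", "GATC")  # toy sequence, not real data
```

Base composition in real genomes is not uniform, which is one reason empirically measured cutting frequencies differ from this idealized value.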
Additionally, a large number of cells is usually required to achieve high resolution for loop calling, which not only makes these assays unaffordable for many studies but also prevents the detection of chromatin interactions at the single-cell level. The development of single-cell and single-molecule 3C techniques provides insights into single-cell loop calling, but most of them are immature and need further validation and improvement [57]. Therefore, unbiased genome-wide identification of loops (including EPIs) is not possible with conventional 3C-based techniques. Here we briefly summarize some considerations and evolved strategies from angles different from those of previous reviews [58], [59], [60], [61].
3.1. Crosslinking and proximity ligation introduce noise and artificial interactions
Crosslinking and ligation are two necessary steps in 3C library construction. Crosslinking can capture unwanted contacts that are not mediated by direct chromatin interactions and can introduce artifacts through intervening molecules or organelles, while ligation relies heavily on the specificity of the crosslinked sequences and on the physicochemical state of chromatin in the nucleus or in solution. Although several modified techniques, such as DLO Hi-C [62] and BL-Hi-C [63], have been developed to optimize the efficiency and specificity of crosslinking and ligation, the original defects still exist. Recently, several new methods complementary to previous 3C-based techniques have been invented to provide relatively unbiased investigation of chromosome interactions. Briefly, native 3C-based assays (i3C/iHi-C) capture spatial interactions without crosslinking [64], [65]; genome architecture mapping (GAM) leverages ultrathin nuclear cryosectioning followed by sequencing to detect long-distance and three-way contacts [66]; split-pool recognition of interactions by tag extension (SPRITE) discriminates different types of contacts by split-pool barcoding of DNA molecules within the same crosslinked complex [57]; transposase-mediated analysis of chromatin looping (Trac-looping) simultaneously detects chromatin accessibility and multiscale chromatin interactions without prior chromatin fragmentation or proximity-based ligation [67]; DamC, a method based on DNA adenine methyltransferase identification (DamID), provides the first crosslinking- and ligation-free demonstration of chromosome structure through DNA methylation-based detection [68]; and multiplex chromatin-interaction analysis via droplet-based and barcode-linked sequencing (ChIA-Drop) captures complex chromatin interactions with single-molecule precision [60], [69]. By overcoming some restrictions of conventional 3C-based library construction, these novel methods can achieve more reliable and powerful measurement of chromatin conformation.
3.2. Restriction enzyme-based fragmentation limits the resolution of chromosome folding mapping
Theoretically, HaeIII and MboI cut the human genome every 342 bp and 401 bp on average, respectively [63]. Although increasing sequencing read coverage has been widely applied in Hi-C experiments, reliance on restriction enzymes limits the resolution of the data regardless of read coverage. Factor-specific 3C-based techniques, like ChIA-PET, HiChIP and PLAC-seq targeting architectural proteins (such as CTCF and YY1), RNA polymerases and histone modifications (such as H3K27ac and H3K27me3), improve the yield of conformation-informative reads and increase resolution through chromatin immunoprecipitation and peak calling [55], [70], [71], [72], [73], [74]. In addition, Capture-C, Tiled-C, T2C, HiCap and other types of Capture Hi-C (CHi-C) enrich selected regions of interest with pre-designed capture oligonucleotides, which significantly increases the power of interaction detection at high resolution [75], [76], [77], [78], [79], [80]. Syn-HiC redesigns chromosomes with regularly spaced restriction sites, thus enabling an unbiased distribution of contact frequencies and a robust definition of Hi-C resolution [81]. Avoiding restriction enzymes altogether, Micro-C and DNase Hi-C use micrococcal nuclease and DNase I, respectively, to achieve mononucleosome resolution [82], [83].
3.3. Loop calling methods are heterogeneous in identifying significant interactions
Many computational tools have been developed to call loops from genome-wide 3C-based data (notably Hi-C), and the performance of several representative methods has been comprehensively benchmarked [56], [84]. Existing loop calling methods show extreme differences in the number of identified interactions, varied mean distances between interacting points, low reproducibility among replicates and high false discovery rates on simulated data. Such defects can be attributed to several intrinsic issues of the data. First, there is no internal normalization criterion or ground-truth reference to convert relative contact probabilities into absolutely comparable values, which complicates the choice of a background model for contact frequency and hampers the definition of a significance threshold corresponding to true EPIs. Multiplexed single-molecule FISH combined with the recent optical reconstruction of chromatin architecture (ORCA), which identifies kilobase-level EPIs, could provide a potentially complementary reference dataset [85]. Second, 3C-based library construction leaves loop calling methods with insufficient power to detect long-range cis (>1 Mb on the same chromosome) and trans interactions [86]. Some targeted enrichment approaches, such as CHi-C, HiChIP and ChIA-PET, can preferentially capture long-range interactions. In particular, ligation-free strategies, like GAM, SPRITE, Trac-looping and DamC, provide more adequate power to identify long-range cis and trans loops [60].
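To make the notion of a background model concrete, the following is a deliberately simplified sketch and not the procedure of any specific published loop caller. It assumes that the expected count at a given genomic separation is the mean observed count at that separation, and it scores each bin pair with a Poisson tail probability. All names, the toy counts and the `alpha` cutoff are hypothetical, and no bias normalization or multiple-testing correction is applied.

```python
import math
from collections import defaultdict


def expected_by_distance(contacts):
    """Mean observed count per bin separation.

    `contacts` maps (bin_i, bin_j) on one chromosome to a read count and is
    assumed to list every bin pair of interest, including pairs with zero reads.
    """
    totals = defaultdict(float)
    pairs = defaultdict(int)
    for (i, j), count in contacts.items():
        separation = abs(j - i)
        totals[separation] += count
        pairs[separation] += 1
    return {d: totals[d] / pairs[d] for d in totals}


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed by direct summation."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def call_enriched_pairs(contacts, alpha=0.01):
    """Return (pair, p_value) for pairs enriched over the distance-matched background."""
    expected = expected_by_distance(contacts)
    hits = []
    for (i, j), count in contacts.items():
        p_value = poisson_tail(int(count), expected[abs(j - i)])
        if p_value < alpha:
            hits.append(((i, j), p_value))
    return sorted(hits, key=lambda hit: hit[1])


# Toy contact counts between five bins (hypothetical values).
toy_contacts = {
    (0, 1): 2, (1, 2): 2, (2, 3): 2, (3, 4): 2,
    (0, 2): 1, (1, 3): 1, (2, 4): 20,
}
toy_hits = call_enriched_pairs(toy_contacts)
```

The choice of background (here, a per-distance mean) largely determines which pairs pass the threshold, which is one reason different loop callers can disagree so strongly on the same data.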
3.4. Conventional 3C-based processing lacks the ability to capture simultaneous or cooperative interactions
The redundancy of enhancers relative to promoters implies that the relationship between enhancers and promoters cannot be one-to-one, and the percentage of enhancers interacting with multiple promoters has been estimated at 9% to 50% [14], [87]. It has been speculated that simultaneous interactions between enhancers and promoters are important to ensure stable gene expression in transcription factories [14], [88]. Although conventional 3C-based techniques can be used to detect multiple contacts, they lack the throughput and resolution to locate CREs genome-wide [78], [89]. To capture simultaneous interactions at scale, several methods for identifying multi-way chromatin contacts have been developed. For example, tethered multiple 3C (TM3C) maps genome-wide simultaneous chromatin contacts via ligation of fragments in agarose gel beads followed by paired-end sequencing [90]. Chromosomal walks (C-walks) investigate higher-order organization by linking multiple genomic loci together into proximity chains [91]. GAM enables the detection of three-way chromatin contacts, but its resolution (>100 kb) is not sufficient to capture the interactions between multiple enhancers and promoters [66]. Multi-contact 4C (MC-4C) applies nanopore sequencing to measure multi-way DNA conformations of individual alleles using a modified 4C-seq method [92], [93]. Similar to MC-4C, Tri-C efficiently detects multiple ligation junctions within single sonicated 3C fragments by oligonucleotide capture [87]. SPRITE can identify multiple loci that interact simultaneously within a single cluster, including over long distances [57]. Likewise, Trac-looping provides unbiased detection of multi-way chromatin interactions and captures chromatin interactions across extremely large distances [67]. Taken together, the development of multi-way methods with diverse strategies has made it possible to detect multiple contacts among enhancers and promoters.
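Multi-way data of this kind are often summarized as pairwise contacts so that they can be compared with conventional maps. The sketch below shows one way this might be done: each cluster of co-occurring loci is expanded into all of its pairs. The optional 2/n weight per pair for a cluster of n loci is an assumption made here to keep large clusters from dominating, and the function name and toy clusters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_from_clusters(clusters, downweight=True):
    """Expand multi-way clusters into weighted pairwise contacts.

    Each cluster is an iterable of locus identifiers seen together. With
    `downweight=True`, every pair from a cluster of n loci contributes 2/n
    (an assumed weighting); otherwise every pair contributes 1.
    """
    weights = defaultdict(float)
    for cluster in clusters:
        loci = sorted(set(cluster))
        n = len(loci)
        if n < 2:
            continue  # a single locus carries no contact information
        weight = 2.0 / n if downweight else 1.0
        for a, b in combinations(loci, 2):
            weights[(a, b)] += weight
    return dict(weights)


# Two hypothetical clusters: a simple pair and a three-way contact.
toy_clusters = [["e1", "p1"], ["e1", "e2", "p1"]]
toy_pairs = pairwise_from_clusters(toy_clusters)
```

Collapsing clusters into pairs discards the information that several loci were observed together, so the original clusters are still needed to study cooperative enhancer-promoter contacts.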
3.5. Current techniques have difficulty resolving fine-scale interactions in single cells
Despite the rapid evolution of 3C-based techniques at the multiple-contact level, most of them capture snapshots of the 3D genome for whole cell populations at a specific time point. To describe the highly dynamic 3D chromatin in single cells, several groups have optimized 3C-based techniques and applied them to track chromatin conformation in single cells across different developmental stages and conditions [94], [95], [96], [97], [98]. However, the genomic resolution of current single-cell chromatin conformation methods is limited by either the highly variable chromatin structure among cells or the technical issue of subsampling, which significantly hinders the identification of stable EPIs at the single-cell level. Nonetheless, Dip-C improves the detection power for chromatin contacts by incorporating a transposon-based whole-genome amplification [99], while ChIA-Drop uses droplet-based chromatin interaction analysis and reveals many promoter-centered multivalent interactions at high resolution [69]. Such technical progress will enable in-depth exploration of fine-scale EPIs at the single-cell level.
4. Predicting unseen EPIs
The continuous evolution of proximity-based 3D genome techniques has made it possible to detect EPIs genome-wide. Meanwhile, accumulated genomic, epigenomic, and transcriptomic profiling data provide abundant resources for the identification of active CREs. To detect meaningful EPIs under particular conditions, the common practice is to overlap high-resolution chromatin interactions with tissue/cell type-specific active CREs (Fig. 2C) [20]. Although the discovery of EPIs has been fueled by integrating tissue/cell type-specific epigenomic profiles and 3D genomic data, high-resolution genome-wide loop data are available for only a limited number of human tissues/cell types thus far [100], [101]. Unless limitations such as low resolution, low sensitivity and high cost are properly addressed, 3C-based techniques cannot be widely applied for chromatin loop identification in most studies. Moreover, EPIs are not the only cause of spatial proximity, so chromatin loops identified from 3C data may represent either functional loops or chance interactions. To predict unrecognized EPIs in different contexts, many computational methods have been developed by learning from or modeling existing 3D genomic data and other molecular phenotype profiles, such as open chromatin, transcript expression, histone modification and TF binding [102], [103] (Fig. 2D). Pioneered by studies linking epigenetic marks to promoters [28], [104], over 30 in silico methods have been proposed to date to predict EPIs in human using diverse omics datasets and statistical models. Generally, existing computational EPI prediction methods can be divided into two major categories: unsupervised and supervised learning (Table 1).
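The overlap step described above (Fig. 2C) can be sketched as an interval intersection between loop anchors and annotated CREs. The code below is a minimal illustration under simple assumptions: intervals are `(chromosome, start, end)` tuples with half-open coordinates, a loop is a pair of anchors, and a quadratic scan is acceptable for small inputs. The function names and coordinates are hypothetical, and production pipelines would normally use an indexed interval tool instead.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def candidate_epis(loops, enhancers, promoters):
    """Return unique (enhancer, promoter) pairs bridged by a loop.

    A pair is reported when one loop anchor overlaps the enhancer and the
    other anchor overlaps the promoter, in either orientation.
    """
    pairs = []
    for anchor_1, anchor_2 in loops:
        for enh_anchor, pro_anchor in ((anchor_1, anchor_2), (anchor_2, anchor_1)):
            for enhancer in enhancers:
                if not overlaps(enh_anchor, enhancer):
                    continue
                for promoter in promoters:
                    if overlaps(pro_anchor, promoter):
                        pairs.append((enhancer, promoter))
    return list(dict.fromkeys(pairs))  # drop duplicates, keep first-seen order


# Hypothetical coordinates for one loop, one active enhancer and two promoters.
toy_loops = [(("chr1", 1000, 2000), ("chr1", 50000, 51000))]
toy_enhancers = [("chr1", 1500, 1800)]
toy_promoters = [("chr1", 50500, 50700), ("chr2", 50500, 50700)]
toy_epis = candidate_epis(toy_loops, toy_enhancers, toy_promoters)
```

Pairs produced this way are candidates only, since a loop overlapping two active elements may still reflect proximity without regulatory function.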
Table 1.
Computational methods for EPI prediction.
| Tool | Year | Method category | Features | Algorithm | Links |
|---|---|---|---|---|---|
| Ernst et al. [28] | 2011 | Correlation-based | Histone marks, TF binding | Pearson’s Correlation | http://compbio.mit.edu/ENCODE_chromatin_states/ |
| Thurman et al. [104] | 2012 | Correlation-based | DHS | Pearson’s Correlation | https://genome.ucsc.edu/ENCODE/downloads.html |
| DRE-target [115] | 2013 | Correlation-based | DHS, Sequence homology | Pearson’s Correlation | ftp://public:public@202.120.224.143/NAR2013.tar.gz |
| Andersson et al. [11] | 2014 | Correlation-based | CAGE | Pearson’s Correlation | http://fantom.gsc.riken.jp/5/ |
| PreSTIGE [106] | 2014 | Distance-based | Distance, Insulator | Linear Domain Models | http://mendel.gene.cwru.edu:8080/ |
| gkm-SVM [139] | 2014 | Train Classifier | DNA | Support Vector Machine | http://www.beerlab.org/gkmsvm/ |
| IM-PET [123] | 2014 | Train Classifier | Histone marks, TF binding, DNA, RNA-seq | Random Forest | www.healthcare.uiowa.edu/labs/tan/IM-PET_Package.tgz |
| ELMER [114] | 2015 | Correlation-based | DNA methylation, RNA-seq | Pearson’s Correlation | https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-015-0668-3/MediaObjects/13059_2015_668_MOESM4_ESM.xlsx |
| RIPPLE [125] | 2015 | Train Classifier | Histone marks, TF binding, DHS, DNA-seq | Random Forest | http://pages.discovery.wisc.edu/~sroy/ripple/index.html |
| PEGASUS [116] | 2015 | Correlation-based | Conservation | Linkage Scoring | ftp://ftp.biologie.ens.fr/pub/dyogen/PEGASUS/ |
| Basset [140] | 2016 | Train Classifier | DNA | CNN | https://github.com/davek44/Basset |
| TargetFinder [126] | 2016 | Train Classifier | Histone marks, TF binding, DHS, CAGE | Gradient Tree Boosting | https://github.com/shwhalen/targetfinder |
| PETModule [124] | 2016 | Train Classifier | Histone marks, Conservation, Motif | Random Forest | http://hulab.ucf.edu/research/projects/PETModule/ |
| EpiTensor [119] | 2016 | Decomposition-based | Histone marks | Tensor Decomposition | http://wanglab.ucsd.edu/star/EpiTensor/ |
| JEME [141] | 2017 | Regression-based | Histone marks, DHS, DNA methylation, eRNA | Linear Regression | https://github.com/yiplabcuhk/JEME |
| McEnhancer [135] | 2017 | Train Classifier | DHS | Markov Chain Model | https://ohlerlab.mdc-berlin.de/software/McEnhancer_134/ |
| SWIPE-NMF [120] | 2017 | Decomposition-based | eQTL, DHS | Matrix Factorization | https://github.com/kaiyuanmifen/SWIPE-NMF |
| EPIANN [130] | 2017 | Train Classifier | DNA | CNN + Attention Model | https://github.com/wgmao/EPIANN |
| PEP [131] | 2017 | Train Classifier | DNA | Gradient Tree Boosting | https://github.com/ma-compbio/PEP |
| CISD [138] | 2017 | Train Classifier | MNase-seq | Logistic Regression | https://github.com/huizhangucas/CISD |
| FOCS [142] | 2018 | Regression-based | DHS, CAGE, GRO-seq | Linear Regression | https://github.com/Shamir-Lab/FOCS |
| Cicero [111] | 2018 | Correlation-based | scATAC-seq | Graphical Lasso | https://github.com/cole-trapnell-lab/cicero-release |
| TransDecomp [121] | 2018 | Decomposition-based | CAGE | Decomposition | https://github.com/anderssonlab/transcriptional_decomposition |
| Rambutan [136] | 2018 | Train Classifier | DNA, DHS | CNN | https://github.com/jmschrei/rambutan |
| SPEID [129] | 2018 | Train Classifier | DNA | CNN | https://github.com/ma-compbio/SPEID |
| 3DEpiLoop [127] | 2018 | Train Classifier | Histone marks, TF binding | Random Forest | https://bitbucket.org/4dnucleome/3depiloop |
| EP2vec [132] | 2018 | Train Classifier | DNA | Word2vec + Gradient Boosted Regression Trees | https://github.com/wanwenzeng/ep2vec |
| DeepTACT [137] | 2019 | Train Classifier | DHS, DNA | CNN + Attention Model | https://github.com/liwenran/DeepTACT |
| C3D [112] | 2019 | Correlation-based | DHS | Pearson’s Correlation | https://github.com/LupienLab/C3D |
| EPIP [128] | 2019 | Train Classifier | DHS, Histone marks | Adaboost | http://www.cs.ucf.edu/~xiaoman/EPIP/ |
| DRAGON [152] | 2019 | Polymer Simulation | Histone marks, TF binding | Maximum Entropy | https://github.com/ZhangGroup-MITChemistry/DRAGON |
| CHINN [133] | 2019 | Train Classifier | DNA | CNN | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135052 |
| CT-FOCS [143] | 2019 | Regression-based | DHS | Linear Mixed Effect Models | http://acgt.cs.tau.ac.il/ct-focs |
| HiC-Reg [150] | 2019 | Regression-based | DHS, Histone marks, TF binding | Random Forests Regression | https://github.com/Roy-lab/HiC-Reg |
| ABC [110] | 2019 | Distance-based | Distance, DHS, Histone marks | Activity-by-contact Model | https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction |
| 3DPredictor [146] | 2020 | Train Classifier | CAGE, CTCF | Gradient Boosting | https://github.com/labdevgen/3Dpredictor |
4.1. Unsupervised learning methods
Attributing enhancers to their nearest genes is a commonly used approach to identify EPIs. In addition, the growing number of genomic/epigenomic features provides another route to detect EPIs, since active enhancers and promoters show patterns distinct from those of inactive ones. These characteristics make the correlation of epigenomic signals between enhancer and promoter a useful criterion for identifying potential interactions [28]. Similarly, the relationship between gene expression and enhancers offers a feasible strategy to detect active EPIs, using linear or non-linear regression to quantitatively estimate the regulatory potential of enhancers towards given genes. In general, these unsupervised methods can be classified into three categories: (1) distance-based methods; (2) correlation-based methods; and (3) decomposition-based methods (Fig. 3A–C).
Fig. 3.
Overview of computational methods for EPI prediction. Strategies to predict EPIs can be divided into two major categories, unsupervised learning and supervised learning. Unsupervised learning algorithms include: (A) distance-based methods, which assign enhancers to the nearest genes, with the regulatory scope restricted in some methods; (B) correlation-based methods, which detect EPIs according to high correlation of chromatin features between enhancer and promoter across a panel of samples; and (C) decomposition-based methods, which decompose a feature matrix/tensor into subspaces that capture spatial features of the genome and can thus be used to detect EPIs. Supervised learning algorithms include: (D) Training Classifier methods, which build machine learning classifiers to distinguish positive EPIs from a randomly selected negative set; and (E) regression-based methods, which measure the relationship between gene activity and enhancers by estimating the regulatory potential of enhancers for a specific gene. ML: machine learning, DL: deep learning.
4.1.1. Distance-based methods
Linking enhancers to the nearest promoter has been widely used in many studies. Despite its simple rationale, this strategy has proved to be considerably effective. It is estimated that enhancers regulate the nearest gene in 40% of cases [11], [105]. Setting additional criteria can increase the efficiency of distance-based methods. For example, by considering cell type-specific active enhancers and promoters, PreSTIGE captures EPIs that exist only in specific cell types [106]. In addition, the false positive rate of distance-based methods is greatly reduced by constraining EPIs to lie within topologically associating domains (TADs) [107] or insulated neighborhoods (INs) [108], which confine the regulatory potential of enhancers to specific 3D genome architectures [109]. The recent activity-by-contact (ABC) model scores the potential of an EPI by combining the distance effect with enhancer activity, and was reported to outperform existing approaches [110]. Nevertheless, as a naïve strategy that barely considers long-distance interactions, the distance-based approach is commonly used as a baseline for EPI prediction in many studies, and its performance is usually not as good as that of methods developed subsequently.
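Two of the ideas above lend themselves to a short sketch: nearest-gene assignment with an optional domain constraint, and an activity-by-contact style score. The code below is a simplified reading, not the published implementations. It assumes a single chromosome, and it assumes a score equal to each enhancer's activity multiplied by its contact with the promoter, normalized over all candidate enhancers of the same gene. How activity and contact are measured is left to the caller, and all names and numbers are hypothetical.

```python
def nearest_gene(enhancer_pos, tss_by_gene, domain=None):
    """Return the gene whose TSS is closest to the enhancer position.

    `tss_by_gene` maps gene name to TSS coordinate on the same chromosome.
    If `domain` is a (start, end) tuple (for example a TAD), only TSSs inside
    it are considered. Returns None when no candidate remains.
    """
    candidates = {
        gene: tss for gene, tss in tss_by_gene.items()
        if domain is None or domain[0] <= tss < domain[1]
    }
    if not candidates:
        return None
    return min(candidates, key=lambda gene: abs(candidates[gene] - enhancer_pos))


def activity_by_contact_scores(activity, contact):
    """Score each enhancer for one gene as activity * contact / sum(activity * contact).

    `activity` and `contact` map enhancer identifiers to non-negative numbers;
    `contact` is the contact frequency with the promoter of the gene of interest.
    """
    products = {enh: activity[enh] * contact[enh] for enh in activity}
    total = sum(products.values())
    if total == 0:
        return {enh: 0.0 for enh in products}
    return {enh: value / total for enh, value in products.items()}


# Hypothetical TSS positions and enhancer measurements.
toy_tss = {"geneA": 90_000, "geneB": 130_000}
unconstrained = nearest_gene(100_000, toy_tss)
within_domain = nearest_gene(100_000, toy_tss, domain=(95_000, 200_000))
toy_scores = activity_by_contact_scores({"e1": 10, "e2": 10}, {"e1": 3, "e2": 1})
```

In this toy case the domain constraint changes the assigned gene from geneA to geneB, and two equally active enhancers receive different scores purely because of their contact with the promoter.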
4.1.2. Correlation-based methods
Many EPI prediction methods, including those of Ernst et al. [28] and Thurman et al. [104], Cicero [111] and C3D [112], were developed based on the correlation of epigenomic marks between enhancers and promoters. Specifically, Ernst et al. calculated Pearson’s correlation of histone modifications and TF binding between enhancers and proximal promoters [28], while Thurman et al. leveraged DNase I hypersensitive sites (DHSs) across many cell types to compute correlation [104]. Similarly, C3D detects correlated CREs using more DHS data [112]. With single-cell ATAC-seq [35] profiles, Cicero was designed to detect correlation of open chromatin at the single-cell level [111]. In contrast, Andersson et al. [11] used cap analysis of gene expression (CAGE) [113] profiles to inspect whether consistent transcriptional events could be observed between enhancer loci (for example, as enhancer RNA (eRNA)) and target genes across different tissues/cell types. ELMER uses the inverse correlation between DNA methylation and the expression of nearby genes to predict transcriptional targets [114]. DRE-target [115] identifies the target genes of distal regulatory elements (DREs) as those showing high phylogenetic correlation with the DREs. Despite its improved performance, DRE-target depends on Hi-C data, which are not widely available. Likewise, PEGASUS relies on evolutionary conservation of synteny to estimate enhancer-gene associations [116], [117]. The performance of correlation-based methods is influenced not only by the choice of features but also by the algorithm used to calculate correlation [111]. Pearson’s correlation is very sensitive to outliers and usually generates large numbers of false positive predictions. Among the correlation-based EPI methods, only Cicero addresses this problem, by using the graphical Lasso [111], which estimates regularized correlation matrices and is thus more robust to outliers [118].
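The core computation shared by these methods can be sketched as a Pearson correlation between an enhancer signal and a promoter signal measured across the same panel of samples. The code below is a generic illustration rather than the pipeline of any method cited above: the function names, the 0.7 threshold and the toy signals are hypothetical, and no distance window or significance testing is applied.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence has no variance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(enhancer_signal, promoter_signal, threshold=0.7):
    """Return (enhancer, gene, r) for pairs whose correlation reaches `threshold`.

    Both arguments map an identifier to a list of signal values, one value per
    sample, with samples in the same order for every element.
    """
    hits = []
    for enhancer, e_values in enhancer_signal.items():
        for gene, p_values in promoter_signal.items():
            r = pearson(e_values, p_values)
            if r >= threshold:
                hits.append((enhancer, gene, r))
    return hits


# Hypothetical accessibility signals across four cell types.
toy_enhancers = {"e1": [1, 2, 3, 4], "e2": [4, 3, 2, 1]}
toy_promoters = {"geneA": [2, 4, 6, 8]}
toy_links = correlated_pairs(toy_enhancers, toy_promoters)

# A single extreme sample is enough to produce a perfect correlation.
outlier_r = pearson([0, 0, 0, 0, 10], [0, 0, 0, 0, 10])
```

The last line illustrates the outlier sensitivity noted above: one sample with a high signal in both elements yields a correlation of 1 even though the other four samples carry no information.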
4.1.3. Decomposition-based methods
With the accumulation of genomic, epigenomic and transcriptomic data from diverse tissues/cell types, matrix decomposition became a feasible approach to detect EPIs by extracting meaningful co-variation patterns from high-dimensional signals. To date, three EPI methods have been based on matrix decomposition: EpiTensor [119], SWIPE-NMF [120] and TransDecomp [121]. EpiTensor collected various assays from many cell types and combined all the data into a third-order tensor whose dimensions represent genomic locus, assay type and cell type, respectively. Tensor decomposition was then applied to resolve the combined tensor into three subspaces: the cell subspace, the assay subspace and the locus subspace. By analyzing the eigenvectors of the locus subspace, genomic interactions were captured by linking the peaks of the eigenvectors following distance-based approaches [119]. SWIPE-NMF first included six types of genomic segments and then established association matrices for every pair of segments. An extended version of three-factor penalized matrix factorization (PMF) was then used to factorize the association matrices, and enhancer-promoter interactions were characterized from the results of PMF [120]. TransDecomp implemented a very different strategy from EpiTensor and SWIPE-NMF. It collected CAGE signals and decomposed the data into two principal components, termed positional independent (PI) and positional dependent (PD). TransDecomp then derived 25 features from these components to train a random forest classifier that identifies EPIs derived from promoter-capture Hi-C [122]. The results showed that the features derived from the PI and PD components were uniquely informative in distinguishing active EPIs [121]. Unlike correlation-based methods, which rely on a limited number of data types across large numbers of samples, decomposition-based methods leverage multiscale information from omics data to learn unique patterns for putative EPIs.
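To make the idea of a locus subspace more concrete, the sketch below builds a tiny locus x assay x cell type tensor, unfolds it along the locus axis and extracts the leading locus vector by power iteration. This is a deliberately simplified stand-in (the first step of a higher-order SVD), not the EpiTensor algorithm itself; the tensor values and function names are hypothetical.

```python
import math

# Hypothetical signal tensor indexed as tensor[locus][assay][cell_type].
tensor = [
    [[1.0, 2.0], [3.0, 4.0]],  # locus 0
    [[2.0, 4.0], [6.0, 8.0]],  # locus 1, co-varies with locus 0
    [[0.0, 0.0], [0.0, 0.0]],  # locus 2, silent in every assay and cell type
]


def unfold_by_locus(tensor):
    """Flatten the assay and cell-type axes so that each locus is one row."""
    return [[value for assay in locus for value in assay] for locus in tensor]


def leading_locus_vector(matrix, n_iter=50):
    """Power iteration on M M^T: leading left singular vector of the unfolding."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    for _ in range(n_iter):
        # v = M^T u projects the locus weights onto the assay/cell-type axis.
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        # w = M v maps the result back to the locus axis.
        w = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    return u


loadings = leading_locus_vector(unfold_by_locus(tensor))
```

Loci with large loadings of the same sign in a given vector share a co-variation pattern across assays and cell types; in this toy tensor, loci 0 and 1 load together while the silent locus 2 receives a loading of zero.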
4.2. Supervised learning methods
With the development of 3C-based techniques, especially high-throughput methods such as high-depth Hi-C, HiChIP and ChIA-PET applied to large numbers of cells, EPIs could be effectively described across the entire genome, which made it possible to use supervised methods to identify potential EPIs (Fig. 3D and E).
4.2.1. Training classifier with machine learning
Many efforts have been made to detect EPIs by training classifiers. By leveraging various 1D genomic/epigenomic features, classifiers could be built to distinguish Hi-C/ChIA-PET-supported EPIs from randomly selected negatives. IM-PET was the first method to follow this strategy [123]. It implemented four features: the correlation between enhancer and promoter activity profiles, the correlation between transcription factor and target promoter, the coevolution of enhancer and target promoter, and the distance constraint between enhancer and target promoter [123]. IM-PET was therefore a combination of several strategies mentioned above, but this made it less user-friendly, because the procedure for calculating the integrated features was not easy to standardize, which raised additional challenges. PETModule implemented a feature set similar to that of IM-PET but showed higher performance [124]. In contrast, subsequent methods such as RIPPLE [125], TargetFinder [126], 3DEpiLoop [127] and EPIP [128] directly used epigenomic/transcriptomic profiles, including TF binding, histone marks, DHSs and expression data, to train comprehensive classifiers. Both RIPPLE and TargetFinder carried out careful feature evaluation and identified expression level as the most distinctive feature; DHS and CTCF binding were also informative in the trained models. In addition, TargetFinder included ChIP-seq data for over 100 TFs, and these features ranked among the most important. Both RIPPLE and TargetFinder achieved high performance in identifying EPIs within specific cell types. However, the large number of required features made it impractical to apply RIPPLE and TargetFinder to train specific classifiers for more tissues/cell types.
Efforts have also been made to explore whether EPIs can be detected from sequence features alone. Several methods were developed on this basis, including SPEID [129], EPIANN [130], PEP [131], EP2vec [132] and CHINN [133]. DNA sequences were usually represented with one-hot encoding, but with the development of deep learning methods, and especially the great success of word2vec in natural language processing, word embeddings such as dna2vec have also been considered for motif feature analysis [129], [130], [131], [132]. Among these methods, SPEID and EPIANN used one-hot encoding, while PEP and EP2vec used the dna2vec approach to represent DNA sequences. All sequence-based methods focused on efficiently extracting information from DNA sequences. To this end, SPEID implemented a bidirectional long short-term memory (BiLSTM) [134] module before training the classifier, while EPIANN applied an attention mechanism to directly locate functional DNA elements. PEP and EP2vec first trained a word embedding model and then trained classifiers on the embedded DNA sequences. In addition, to identify cell type-specific EPIs, sequence-based methods required active enhancers and promoters to be well defined in the given condition; for example, CHINN used the DNA sequences of interacting open chromatin regions [133].
To overcome this limitation of sequence-based methods, some algorithms used widely available open chromatin profiles to supply cell type-specific signatures. For example, McEnhancer [135] learned related DHS-gene pairs from a small number of known pairs and used a semi-supervised strategy to predict unlabeled ones. Rambutan [136] used a convolutional neural network (CNN) to extract features from both DNA sequence and DNase-seq data, and the summarized features, together with the distance between enhancer and promoter, were then used to make predictions. Similarly, DeepTACT [137] applied CNN layers to extract features from raw data, but further processed the merged features with BiLSTM and attention layers to achieve better integration. Other types of chromatin accessibility data, such as MNase-seq, could also be used in EPI prediction [34]. CISD used MNase-seq data to train a classifier that distinguishes EPIs derived from ChIA-PET loops from randomly selected pairs [138].
Based on the algorithms described above, statistical learning methods can be divided into two categories: classical machine learning and deep learning. Classical machine learning methods used random forest, logistic regression and support vector machine (SVM) to build classifiers directly from Hi-C/ChIA-PET loops and various feature sets. According to RIPPLE, ensemble learning methods, such as random forest, usually performed better than other classification methods [125]. However, random forest does not by itself yield a compact set of selected features. To solve this problem, RIPPLE combined random forest with Lasso for feature selection. On the other hand, deep learning methods used CNN, recurrent neural network (RNN) and attention models to extract informative features from raw input and then implemented simple machine learning models to train classifiers. Deep learning methods usually performed better when the sample size was large enough. In particular, when attention layers were implemented, the learned kernels could locate the relatively precise positions of enhancer elements [23], [130].
Interestingly, some machine learning methods that were not originally designed for EPI prediction could also be applied to this task. For example, gkm-SVM [139] and Basset [140] were developed to identify regulatory elements, but when EPI samples with concatenated enhancer and promoter features were supplied as input, both gkm-SVM and Basset could readily be adapted into EPI prediction methods.
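The two sequence representations discussed here can be illustrated with a short Python sketch: one-hot encoding turns each base into a four-element indicator vector, while k-mer tokenization produces the "words" on which an embedding model such as dna2vec would be trained. The function names, the fixed k of 3 and the all-zero encoding of ambiguous bases are assumptions made for illustration, and no embedding is actually trained.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode each base as a 4-element indicator; other symbols (e.g. N) become all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def kmer_tokens(seq, k=3, stride=1):
    """Split a sequence into overlapping k-mers, the 'words' fed to an embedding model."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


encoded = one_hot("ACGTN")          # 5 positions x 4 channels
tokens = kmer_tokens("ACGTAC", 3)   # ['ACG', 'CGT', 'GTA', 'TAC']
```

A one-hot matrix is the typical input of convolutional or recurrent layers, whereas the token list would be mapped to dense vectors before classification.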
4.2.2. Regression-based methods
Although training classifiers to distinguish active EPIs from other pairs was a promising strategy, it could not properly quantify the regulatory potential of specific enhancers on their target genes. This limitation could be addressed by regression-based methods. When the gene expression level is treated as a readout of the effect of neighboring CREs, the regulatory ability of individual enhancers can be inferred by training a regression model on a panel of expression data together with epigenomic profiles from many cell types. In other words, the parameters learned by the regression model may represent the degree to which enhancers influence their target genes. Based on this rationale, JEME [141] quantitatively estimated the universal regulatory potential of enhancers and then identified cell type-specific EPIs by training a random forest classifier. Similar to JEME, FOCS [142] implemented a leave-n-out algorithm to obtain robust regression models linking nascent RNA transcription (measured by GRO-seq [47]) with DHS data, capturing correlated enhancer and promoter activity across many samples. CT-FOCS extended the FOCS model to use multiple replicates per cell type to infer cell type-specific EPIs [143]. Therefore, compared with classifier learning, regression-based methods can evaluate regulatory potential more directly by associating CREs with gene transcription.
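The regression rationale can be sketched as an ordinary least-squares fit in which the expression of one gene across cell types is modeled as a weighted sum of the activities of its neighboring enhancers, so that each fitted coefficient serves as a crude estimate of regulatory potential. The data, the function names and the use of plain unregularized least squares are assumptions for illustration; this is not the model of JEME or FOCS.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(features, target):
    """Least squares with an intercept: solve (X^T X) beta = X^T y."""
    design = [[1.0] + row for row in features]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(p)]
    return solve(xtx, xty)


# Rows are cell types; columns are the activities of two hypothetical enhancers.
enhancer_activity = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 2.0]]
gene_expression = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated as 1 + 2 * e1 + 0 * e2

intercept, beta_e1, beta_e2 = fit_ols(enhancer_activity, gene_expression)
```

Here the first enhancer receives a coefficient close to 2 and the second a coefficient close to 0, which would be read as evidence that only the first enhancer contributes to the expression of this gene. With real data, many correlated enhancers and few samples make regularization and cross-validation necessary.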
4.3. Existing issues and challenges for EPI prediction
Despite these great achievements, the strategies applied in current state-of-the-art computational methods have limitations. Distance-based methods depend heavily on accurate identification of cell type-specific regulatory elements, which usually requires multiple genomic and epigenomic features. Distance-based methods are also hampered by the fact that many enhancers regulate distal promoters. Correlation-based and regression-based methods perform better, but they require large panels of samples and thus usually do not generalize to an unseen cell type. In contrast, supervised learning methods that train classifiers are applicable to predicting EPIs across cell types. The merits and disadvantages of each EPI prediction strategy have been discussed before [102], [103]. Here, we highlight some additional critical problems that significantly affect the performance of current methods and briefly summarize related challenges for future prediction tasks.
First, there is no widely accepted ground truth for systematically evaluating the existing methods, probably because of the low power and high false positive rate of current technologies for capturing chromatin interactions. A recent benchmark study generated a benchmark of candidate enhancer-gene interactions (BENGI) by integrating candidate active CREs with experimentally validated genomic interactions in specific tissues/cell types [144]. The authors found that, overall, correlation-based methods did not outperform distance-based methods, and that supervised methods such as TargetFinder [126] were only modestly better than distance-based methods on most benchmark datasets when trained and tested on the same cell type, and underperformed when applied across cell types. Second, inflated performance of supervised learning methods has frequently been observed [145]. The key problem can be attributed to improper generation of the training dataset and a biased sampling procedure, in which positive samples shared the same features whereas negative samples had a varied feature distribution. A more careful strategy for splitting training and test data, so that the two sets do not share the same genomic regions, seems to overcome such problems [145], [146]. Third, deep learning methods have difficulty with feature selection. To address this problem, SPEID evaluated feature importance by measuring the decrease in performance when certain feature values were replaced with random noise [129]. Feature importance in deep learning-based methods can also be evaluated with strategies such as SHAP [147], DeepLIFT [148] and Deep Feature Selection [149]. Fourth, most existing computational methods model chromatin interactions by either spatial proximity or transcriptional outcome, which makes it difficult for them to establish causal relationships rather than mere correlation.
Although methods such as HiC-Reg [150], 3Dpredictor [146], MEGABASE [151], DRAGON [152] and ABC [110] can quantitatively measure interaction probability or intensity, how well such predicted values correspond to the functional readout is largely unknown. Recent experiments showed that enhancer-promoter proximity can be uncoupled from, or even inversely related to, gene activation [17], [19], [23], which further complicates the establishment of a causal link between spatial proximity and transcriptional outcome. Lastly, computational EPI prediction methods face both challenges and new opportunities brought by advanced techniques. For example, some 3D genomic techniques, such as C-walks [91], GAM [66], MC-4C [92], SPRITE [57] and ChIA-drop [69], have been developed to detect multiplex chromatin interactions on a single allele. In these data, multiple cis-regulatory elements can interact with the same target gene simultaneously, which is a difficult scenario for current EPI methods, especially those that train classifiers. Regression-based methods could theoretically detect multiple interactions [153], [154], but whether these interactions are active simultaneously or merely reflect averaging over bulk cells remains elusive.
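One way in which performance can become inflated is leakage: when candidate pairs that share a promoter (and hence promoter-side features) are distributed across both the training and the test set, a classifier can score well by memorizing promoters. The sketch below contrasts a naive interleaved split with a split that holds out whole chromosomes. The pair records and function names are hypothetical, and the chromosome-wise split is shown as one possible safeguard rather than as the exact procedure of any cited study.

```python
# Hypothetical labeled pairs: (chromosome, enhancer_id, promoter_id, label).
pairs = [
    ("chr1", "e1", "p1", 1), ("chr1", "e2", "p1", 0),
    ("chr2", "e3", "p2", 1), ("chr2", "e4", "p2", 0),
    ("chr3", "e5", "p3", 1), ("chr3", "e6", "p3", 0),
]


def split_by_chromosome(pairs, test_chromosomes):
    """Hold out every pair located on the given chromosomes."""
    held_out = set(test_chromosomes)
    train = [p for p in pairs if p[0] not in held_out]
    test = [p for p in pairs if p[0] in held_out]
    return train, test


def shared_promoters(train, test):
    """Promoters present in both sets, i.e. a potential source of leakage."""
    return {p[2] for p in train} & {p[2] for p in test}


# Naive split: alternate pairs between the two sets.
naive_leak = shared_promoters(pairs[::2], pairs[1::2])

# Chromosome-wise split: chr3 is reserved for testing.
train, test = split_by_chromosome(pairs, ["chr3"])
chrom_leak = shared_promoters(train, test)
```

In the naive split every promoter appears on both sides, whereas the chromosome-wise split leaves no shared promoters, so test performance then reflects generalization to unseen loci.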
5. Validating functional EPIs and future direction
Although advances in biochemical and computational methods have deepened our understanding of precise transcriptional control in the 3D genome, the biological function of CREs, as well as the links between enhancers and their regulated targets identified by current techniques, remain to be validated (Fig. 2E). Using transgenic reporter assays and massively parallel reporter assays, candidate enhancers can be dissected regardless of their native genomic context and endogenous target genes [5], [155], [156]. In vivo enhancer manipulation can now be performed with the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system. Several CRISPR/Cas strategies, including nuclease-active genome-editing screens and nuclease-inactive epigenome-editing screens, have been successfully applied to characterize large numbers of enhancers in their native genomic context [5]. In addition, super-resolution DNA FISH and microscopy complement proximity-based techniques and provide an unprecedented view of chromatin interactions at kilobase-scale resolution [157]. However, accurate identification of active CREs and of bona fide EPIs under a particular condition usually involves two uncoupled experiments, which cannot determine which EPIs are truly functional and eventually modulate the transcription of target genes or other molecular phenotypes. The goal of validating truly functional EPIs has sparked enormous interest in designing novel integrated experiments.
For example, by introducing chromatin loops at desired genomic loci, chromatin loop reorganization using CRISPR-dCas9 (CLOuD9) can selectively inspect gene expression at targeted loci [158]; CRISPR affinity purification in situ of regulatory elements (CAPTURE) simultaneously identifies locus-specific transcriptional regulator complexes, chromatin-associated RNA and DNA interactions [159]; the CRISPR-genome organization (CRISPR-GO) system can efficiently control the spatial positioning of genomic loci relative to specific nuclear compartments, enabling interrogation of chromatin interaction dynamics and associated molecular events [160]; and the light-activated-dynamic-looping (LADL) system allows light-inducible loop formation followed by single-molecule RNA-FISH for quantification of nascent expression [161]. These new technologies greatly facilitate the one-stop evaluation of the true biological function of individual EPIs.
The evolution of advanced biotechnologies and the accumulation of functional genomics data are constantly revolutionizing the genome-wide identification of functional EPIs. By integrating multilayer tissue/cell type-specific evidence from uncoupled assays of genomic, epigenomic and transcriptomic profiles, the false positive rate of functional EPI discovery can be reduced [162]. For example, tissue/cell type-specific quantitative trait locus (QTL) mapping of gene expression (eQTL), chromatin accessibility (caQTL) and promoter interaction (pieQTL) has been used, together with active CRE profiling and 3C-based techniques, to refine or ascertain truly functional EPIs [77], [163], [164], [165], [166], which will ultimately facilitate the interpretation of the effects of non-coding regulatory variants on the 3D genome and complex disease [167]. In addition, high-throughput CRISPR/Cas-based perturbation screens on multiple target genes, such as Mosaic-seq [168], crisprQTL mapping [169] and CRISPRi-FlowFISH [110], offer promising strategies to simultaneously validate the endogenous effects of CREs on their putative target genes. Although these CRISPR/Cas-based systems are still in their infancy and have so far identified only several hundred high-confidence EPIs, we believe that, combined with advanced 3D genomic data, they will inspire fundamentally novel computational methods in the near future.
Credit authorship contribution statement
Hang Xu: Investigation, Writing - original draft. Shijie Zhang: Writing - review & editing. Xianfu Yi: Writing - review & editing. Dariusz Plewczynski: Writing - review & editing. Mulin Jun Li: Conceptualization, Investigation, Supervision, Writing - original draft.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by grants from the National Natural Science Foundation of China 31871327 (M.J.L), Natural Science Foundation of Tianjin 19JCJQJC63600 (M.J.L); and Polish National Science Centre (2014/15/B/ST6/05082) and Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund (D.P).
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.csbj.2020.02.013.
References
- 1. Kellis M., Wold B., Snyder M.P., Bernstein B.E., Kundaje A., Marinov G.K. Defining functional DNA elements in the human genome. Proc Natl Acad Sci U S A. 2014;111:6131–6138. doi: 10.1073/pnas.1318948111.
- 2. Maston G.A., Evans S.K., Green M.R. Transcriptional regulatory elements in the human genome. Annu Rev Genomics Hum Genet. 2006;7:29–59. doi: 10.1146/annurev.genom.7.080505.115623.
- 3. Lenhard B., Sandelin A., Carninci P. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat Rev Genet. 2012;13:233–245. doi: 10.1038/nrg3163.
- 4. Ong C.T., Corces V.G. Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat Rev Genet. 2011;12:283–293. doi: 10.1038/nrg2957.
- 5. Gasperini M., Tome J.M., Shendure J. Towards a comprehensive catalogue of validated and target-linked human enhancers. Nat Rev Genet. 2020. doi: 10.1038/s41576-019-0209-0.
- 6. Vian L., Pekowska A., Rao S.S.P., Kieffer-Kwon K.R., Jung S., Baranello L. The Energetics and Physiological Impact of Cohesin Extrusion. Cell. 2018;173:1165–1178.e20. doi: 10.1016/j.cell.2018.03.072.
- 7. Davidson I.F., Bauer B., Goetz D., Tang W., Wutz G., Peters J.M. DNA loop extrusion by human cohesin. Science. 2019;366:1338–1345. doi: 10.1126/science.aaz3418.
- 8. Landry J.R., Mager D.L., Wilhelm B.T. Complex controls: the role of alternative promoters in mammalian genomes. Trends Genet. 2003;19:640–648. doi: 10.1016/j.tig.2003.09.014.
- 9. Davuluri R.V., Suzuki Y., Sugano S., Plass C., Huang T.H.M. The functional consequences of alternative promoter use in mammalian genomes. Trends Genet. 2008;24:167–177. doi: 10.1016/j.tig.2008.01.008.
- 10. Yoo E.J., Cooke N.E., Liebhaber S.A. Identification of a secondary promoter within the human B cell receptor component gene hCD79b. J Biol Chem. 2013;288:18353–18365. doi: 10.1074/jbc.M113.461988.
- 11. Andersson R., Gebhard C., Miguel-Escalada I., Hoof I., Bornholdt J., Boyd M. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507:455–461. doi: 10.1038/nature12787.
- 12. Long H.K., Prescott S.L., Wysocka J. Ever-changing landscapes: transcriptional enhancers in development and evolution. Cell. 2016;167:1170–1187. doi: 10.1016/j.cell.2016.09.018.
- 13. Bulger M., Groudine M. Functional and mechanistic diversity of distal transcription enhancers. Cell. 2011;144:327–339. doi: 10.1016/j.cell.2011.01.024.
- 14. Chepelev I., Wei G., Wangsa D., Tang Q., Zhao K. Characterization of genome-wide enhancer-promoter interactions reveals co-expression of interacting genes and modes of higher order chromatin organization. Cell Res. 2012;22:490–503. doi: 10.1038/cr.2012.15.
- 15. Osterwalder M., Barozzi I., Tissieres V., Fukuda-Yuzawa Y., Mannion B.J., Afzal S.Y. Enhancer redundancy provides phenotypic robustness in mammalian development. Nature. 2018;554:239–243. doi: 10.1038/nature25461.
- 16. Symmons O., Pan L., Remeseiro S., Aktas T., Klein F., Huber W. The Shh topological domain facilitates the action of remote enhancers by reducing the effects of genomic distances. Dev Cell. 2016;39:529–543. doi: 10.1016/j.devcel.2016.10.015.
- 17. Benabdallah N.S., Williamson I., Illingworth R.S., Kane L., Boyle S., Sengupta D. Decreased enhancer-promoter proximity accompanying enhancer activation. Mol Cell. 2019;76:473–484.e7. doi: 10.1016/j.molcel.2019.07.038.
- 18. Schwarzer W., Abdennur N., Goloborodko A., Pekowska A., Fudenberg G., Loe-Mie Y. Two independent modes of chromatin organization revealed by cohesin removal. Nature. 2017;551:51–56. doi: 10.1038/nature24281.
- 19. Rao S.S.P., Huang S.C., Glenn St Hilaire B., Engreitz J.M., Perez E.M., Kieffer-Kwon K.R. Cohesin loss eliminates all loop domains. Cell. 2017;171:305–320.e24. doi: 10.1016/j.cell.2017.09.026.
- 20. Mora A., Sandve G.K., Gabrielsen O.S., Eskeland R. In the loop: promoter-enhancer interactions and bioinformatics. Brief Bioinform. 2016;17:980–995. doi: 10.1093/bib/bbv097.
- 21. Kim S., Shendure J. Mechanisms of Interplay between Transcription Factors and the 3D Genome. Mol Cell. 2019;76:306–319. doi: 10.1016/j.molcel.2019.08.010.
- 22. Stadhouders R., Filion G.J., Graf T. Transcription factors and 3D genome conformation in cell-fate decisions. Nature. 2019;569:345–354. doi: 10.1038/s41586-019-1182-7.
- 23. Alexander J.M., Guan J., Li B., Maliskova L., Song M., Shen Y. Live-cell imaging reveals enhancer-dependent Sox2 transcription in the absence of enhancer proximity. eLife. 2019;8. doi: 10.7554/eLife.41769.
- 24. Mouse Genome Sequencing Consortium, Waterston R.H., Lindblad-Toh K., Birney E., Rogers J., Abril J.F. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. doi: 10.1038/nature01262.
- 25. Siepel A., Bejerano G., Pedersen J.S., Hinrichs A.S., Hou M., Rosenbloom K. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034–1050. doi: 10.1101/gr.3715005.
- 26. Rada-Iglesias A., Bajpai R., Swigut T., Brugmann S.A., Flynn R.A., Wysocka J. A unique chromatin signature uncovers early developmental enhancers in humans. Nature. 2011;470:279–283. doi: 10.1038/nature09692.
- 27. Local A., Huang H., Albuquerque C.P., Singh N., Lee A.Y., Wang W. Identification of H3K4me1-associated proteins at mammalian enhancers. Nat Genet. 2018;50:73–82. doi: 10.1038/s41588-017-0015-6.
- 28. Ernst J., Kheradpour P., Mikkelsen T.S., Shoresh N., Ward L.D., Epstein C.B. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49. doi: 10.1038/nature09906.
- 29. Kim T.H., Barrera L.O., Zheng M., Qu C., Singer M.A., Richmond T.A. A high-resolution map of active promoters in the human genome. Nature. 2005;436:876–880. doi: 10.1038/nature03877.
- 30. Robertson G., Hirst M., Bainbridge M., Bilenky M., Zhao Y., Zeng T. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007;4:651–657. doi: 10.1038/nmeth1068.
- 31. Skene P.J., Henikoff S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife. 2017;6. doi: 10.7554/eLife.21856.
- 32. Kaya-Okur H.S., Wu S.J., Codomo C.A., Pledger E.S., Bryson T.D., Henikoff J.G. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun. 2019;10:1930. doi: 10.1038/s41467-019-09982-5.
- 33. Boyle A.P., Davis S., Shulha H.P., Meltzer P., Margulies E.H., Weng Z. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008;132:311–322. doi: 10.1016/j.cell.2007.12.014.
- 34. Kundaje A., Kyriazopoulou-Panagiotopoulou S., Libbrecht M., Smith C.L., Raha D., Winters E.E. Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome Res. 2012;22:1735–1747. doi: 10.1101/gr.136366.111.
- 35. Buenrostro J.D., Giresi P.G., Zaba L.C., Chang H.Y., Greenleaf W.J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10:1213–1218. doi: 10.1038/nmeth.2688.
- 36. Ernst J., Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012;9:215–216. doi: 10.1038/nmeth.1906.
- 37. Hoffman M.M., Buske O.J., Wang J., Weng Z., Bilmes J.A., Noble W.S. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Methods. 2012;9:473–476. doi: 10.1038/nmeth.1937.
- 38. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- 39. Kim T.K., Hemberg M., Gray J.M., Costa A.M., Bear D.M., Wu J. Widespread transcription at neuronal activity-regulated enhancers. Nature. 2010;465:182–187. doi: 10.1038/nature09033.
- 40. Wang D., Garcia-Bassets I., Benner C., Li W., Su X., Zhou Y. Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature. 2011;474:390–394. doi: 10.1038/nature10006.
- 41.Dao L.T.M., Galindo-Albarran A.O., Castro-Mondragon J.A., Andrieu-Soler C., Medina-Rivera A., Souaid C. Genome-wide characterization of mammalian promoters with distal enhancer functions. Nat Genet. 2017;49:1073–1081. doi: 10.1038/ng.3884. [DOI] [PubMed] [Google Scholar]
- 42.Engreitz J.M., Haines J.E., Perez E.M., Munson G., Chen J., Kane M. Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature. 2016;539:452–455. doi: 10.1038/nature20149. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Shiraki T., Kondo S., Katayama S., Waki K., Kasukawa T., Kawaji H. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A. 2003;100:15776–15781. doi: 10.1073/pnas.2136655100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Hirabayashi S., Bhagat S., Matsuki Y., Takegami Y., Uehata T., Kanemaru A. NET-CAGE characterizes the dynamics and topology of human transcribed cis-regulatory elements. Nat Genet. 2019;51:1369–1379. doi: 10.1038/s41588-019-0485-9. [DOI] [PubMed] [Google Scholar]
- 45.Churchman L.S., Weissman J.S. Nascent transcript sequencing visualizes transcription at nucleotide resolution. Nature. 2011;469:368–373. doi: 10.1038/nature09652. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Mahat D.B., Kwak H., Booth G.T., Jonkers I.H., Danko C.G., Patel R.K. Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq) Nat Protoc. 2016;11:1455–1476. doi: 10.1038/nprot.2016.086. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Core L.J., Waterfall J.J., Lis J.T. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science. 2008;322:1845–1848. doi: 10.1126/science.1162228.
- 48. Cruz-Molina S., Respuela P., Tebartz C., Kolovos P., Nikolic M., Fueyo R. PRC2 facilitates the regulatory topology required for poised enhancer function during pluripotent stem cell differentiation. Cell Stem Cell. 2017;20:689–705.e9. doi: 10.1016/j.stem.2017.02.004.
- 49. Entrevan M., Schuettengruber B., Cavalli G. Regulation of genome architecture and function by polycomb proteins. Trends Cell Biol. 2016;26:511–525. doi: 10.1016/j.tcb.2016.04.009.
- 50. Kwasnieski J.C., Fiore C., Chaudhari H.G., Cohen B.A. High-throughput functional testing of ENCODE segmentation predictions. Genome Res. 2014;24:1595–1602. doi: 10.1101/gr.173518.114.
- 51. Rajagopal N., Srinivasan S., Kooshesh K., Guo Y., Edwards M.D., Banerjee B. High-throughput mapping of regulatory DNA. Nat Biotechnol. 2016;34:167–174. doi: 10.1038/nbt.3468.
- 52. Dekker J., Rippe K., Dekker M., Kleckner N. Capturing chromosome conformation. Science. 2002;295:1306–1311. doi: 10.1126/science.1067799.
- 53. Lieberman-Aiden E., van Berkum N.L., Williams L., Imakaev M., Ragoczy T., Telling A. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
- 54. Fullwood M.J., Liu M.H., Pan Y.F., Liu J., Xu H., Mohamed Y.B. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature. 2009;462:58–64. doi: 10.1038/nature08497.
- 55. Mumbach M.R., Rubin A.J., Flynn R.A., Dai C., Khavari P.A., Greenleaf W.J. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat Methods. 2016;13:919–922. doi: 10.1038/nmeth.3999.
- 56. Forcato M., Nicoletti C., Pal K., Livi C.M., Ferrari F., Bicciato S. Comparison of computational methods for Hi-C data analysis. Nat Methods. 2017;14:679–685. doi: 10.1038/nmeth.4325.
- 57. Quinodoz S.A., Ollikainen N., Tabak B., Palla A., Schmidt J.M., Detmar E. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell. 2018;174:744–757.e24. doi: 10.1016/j.cell.2018.05.024.
- 58. Schoenfelder S., Fraser P. Long-range enhancer-promoter contacts in gene expression control. Nat Rev Genet. 2019;20:437–455. doi: 10.1038/s41576-019-0128-0.
- 59. Robson M.I., Ringel A.R., Mundlos S. Regulatory landscaping: how enhancer-promoter communication is sculpted in 3D. Mol Cell. 2019;74:1110–1122. doi: 10.1016/j.molcel.2019.05.032.
- 60. Kempfer R., Pombo A. Methods for mapping 3D chromosome architecture. Nat Rev Genet. 2019. doi: 10.1038/s41576-019-0195-2.
- 61. McCord R.P., Kaplan N., Giorgetti L. Chromosome conformation capture and beyond: toward an integrative view of chromosome structure and function. Mol Cell. 2020. doi: 10.1016/j.molcel.2019.12.021.
- 62. Lin D., Hong P., Zhang S., Xu W., Jamal M., Yan K. Digestion-ligation-only Hi-C is an efficient and cost-effective method for chromosome conformation capture. Nat Genet. 2018;50:754–763. doi: 10.1038/s41588-018-0111-2.
- 63. Liang Z., Li G., Wang Z., Djekidel M.N., Li Y., Qian M.P. BL-Hi-C is an efficient and sensitive approach for capturing structural and regulatory chromatin interactions. Nat Commun. 2017;8:1622. doi: 10.1038/s41467-017-01754-3.
- 64. Brant L., Georgomanolis T., Nikolic M., Brackley C.A., Kolovos P., van Ijcken W. Exploiting native forces to capture chromosome conformation in mammalian cell nuclei. Mol Syst Biol. 2016;12:891. doi: 10.15252/msb.20167311.
- 65. Mizi A., Gade Gusmao E., Papantonis A. iHi-C 2.0: a simple approach for mapping native spatial chromatin organisation from low cell numbers. Methods. 2020;170:33–37. doi: 10.1016/j.ymeth.2019.07.003.
- 66. Beagrie R.A., Scialdone A., Schueler M., Kraemer D.C., Chotalia M., Xie S.Q. Complex multi-enhancer contacts captured by genome architecture mapping. Nature. 2017;543:519–524. doi: 10.1038/nature21411.
- 67. Lai B., Tang Q., Jin W., Hu G., Wangsa D., Cui K. Trac-looping measures genome structure and chromatin accessibility. Nat Methods. 2018;15:741–747. doi: 10.1038/s41592-018-0107-y.
- 68. Redolfi J., Zhan Y., Valdes-Quezada C., Kryzhanovska M., Guerreiro I., Iesmantavicius V. DamC reveals principles of chromatin folding in vivo without crosslinking and ligation. Nat Struct Mol Biol. 2019;26:471–480. doi: 10.1038/s41594-019-0231-0.
- 69. Zheng M., Tian S.Z., Capurso D., Kim M., Maurya R., Lee B. Multiplex chromatin interactions with single-molecule precision. Nature. 2019;566:558–562. doi: 10.1038/s41586-019-0949-1.
- 70. Li X., Luo O.J., Wang P., Zheng M., Wang D., Piecuch E. Long-read ChIA-PET for base-pair-resolution mapping of haplotype-specific chromatin interactions. Nat Protoc. 2017;12:899–915. doi: 10.1038/nprot.2017.012.
- 71. Tang Z., Luo O.J., Li X., Zheng M., Zhu J.J., Szalaj P. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell. 2015;163:1611–1627. doi: 10.1016/j.cell.2015.11.024.
- 72. Rowley M.J., Nichols M.H., Lyu X., Ando-Kuri M., Rivera I.S.M., Hermetz K. Evolutionarily conserved principles predict 3D chromatin organization. Mol Cell. 2017;67:837–852.e7. doi: 10.1016/j.molcel.2017.07.022.
- 73. Weintraub A.S., Li C.H., Zamudio A.V., Sigova A.A., Hannett N.M., Day D.S. YY1 is a structural regulator of enhancer-promoter loops. Cell. 2017;171:1573–1588.e28. doi: 10.1016/j.cell.2017.11.008.
- 74. Fang R., Yu M., Li G., Chee S., Liu T., Schmitt A.D. Mapping of long-range chromatin interactions by proximity ligation-assisted ChIP-seq. Cell Res. 2016;26:1345–1348. doi: 10.1038/cr.2016.137.
- 75. Hughes J.R., Roberts N., McGowan S., Hay D., Giannoulatou E., Lynch M. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat Genet. 2014;46:205–212. doi: 10.1038/ng.2871.
- 76. Oudelaar A.M., Beagrie R.A., Gosden M., De Ornellas S., Georgiades E., Kerry J. Dissection of the 4D chromatin structure of the α-globin locus through in vivo erythroid differentiation with extreme spatial and temporal resolution. bioRxiv. 2019.
- 77. Javierre B.M., Burren O.S., Wilder S.P., Kreuzhuber R., Hill S.M., Sewitz S. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell. 2016;167:1369–1384.e19. doi: 10.1016/j.cell.2016.09.037.
- 78. Schoenfelder S., Furlan-Magaril M., Mifsud B., Tavares-Cadete F., Sugar R., Javierre B.-M. The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res. 2015;25:582–597. doi: 10.1101/gr.185272.114.
- 79. Kolovos P., Brouwer R.W.W., Kockx C.E.M., Lesnussa M., Kepper N., Zuin J. Investigation of the spatial structure and interactions of the genome at sub-kilobase-pair resolution using T2C. Nat Protoc. 2018;13:459–477. doi: 10.1038/nprot.2017.132.
- 80. Sahlen P., Abdullayev I., Ramskold D., Matskova L., Rilakovic N., Lotstedt B. Genome-wide mapping of promoter-anchored interactions with close to single-enhancer resolution. Genome Biol. 2015;16:156. doi: 10.1186/s13059-015-0727-9.
- 81. Muller H., Scolari V.F., Agier N., Piazza A., Thierry A., Mercy G. Characterizing meiotic chromosomes’ structure and pairing using a designer sequence optimized for Hi-C. Mol Syst Biol. 2018;14. doi: 10.15252/msb.20188293.
- 82. Hsieh T.H., Weiner A., Lajoie B., Dekker J., Friedman N., Rando O.J. Mapping nucleosome resolution chromosome folding in yeast by micro-C. Cell. 2015;162:108–119. doi: 10.1016/j.cell.2015.05.048.
- 83. Ma W., Ay F., Lee C., Gulsoy G., Deng X., Cook S. Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat Methods. 2015;12:71–78. doi: 10.1038/nmeth.3205.
- 84. Ben Zouari Y., Molitor A.M., Sikorska N., Pancaldi V., Sexton T. ChiCMaxima: a robust and simple pipeline for detection and visualization of chromatin looping in Capture Hi-C. Genome Biol. 2019;20:102. doi: 10.1186/s13059-019-1706-3.
- 85. Mateo L.J., Murphy S.E., Hafner A., Cinquini I.S., Walker C.A., Boettiger A.N. Visualizing DNA folding and RNA in embryos at single-cell resolution. Nature. 2019;568:49–54. doi: 10.1038/s41586-019-1035-4.
- 86. Kaul A., Bhattacharyya S., Ay F. Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2. Nat Protoc. 2020. doi: 10.1038/s41596-019-0273-0.
- 87. Oudelaar A.M., Davies J.O.J., Hanssen L.L.P., Telenius J.M., Schwessinger R., Liu Y. Single-allele chromatin interactions identify regulatory hubs in dynamic compartmentalized domains. Nat Genet. 2018;50:1744–1751. doi: 10.1038/s41588-018-0253-2.
- 88. Sutherland H., Bickmore W.A. Transcription factories: gene expression in unions? Nat Rev Genet. 2009;10:457–466. doi: 10.1038/nrg2592.
- 89. Jiang T., Raviram R., Snetkova V., Rocha P.P., Proudhon C., Badri S. Identification of multi-loci hubs from 4C-seq demonstrates the functional importance of simultaneous interactions. Nucleic Acids Res. 2016;44:8714–8725. doi: 10.1093/nar/gkw568.
- 90. Ay F., Vu T.H., Zeitz M.J., Varoquaux N., Carette J.E., Vert J.P. Identifying multi-locus chromatin contacts in human cells using tethered multiple 3C. BMC Genomics. 2015;16:121. doi: 10.1186/s12864-015-1236-7.
- 91. Olivares-Chauvet P., Mukamel Z., Lifshitz A., Schwartzman O., Elkayam N.O., Lubling Y. Capturing pairwise and multi-way chromosomal conformations using chromosomal walks. Nature. 2016;540:296–300. doi: 10.1038/nature20158.
- 92. Allahyar A., Vermeulen C., Bouwman B.A.M., Krijger P.H.L., Verstegen M., Geeven G. Enhancer hubs and loop collisions identified from single-allele topologies. Nat Genet. 2018;50:1151–1160. doi: 10.1038/s41588-018-0161-5.
- 93. Vermeulen C., Allahyar A., Bouwman B.A.M., Krijger P.H.L., Verstegen M., Geeven G. Multi-contact 4C: long-molecule sequencing of complex proximity ligation products to uncover local cooperative and competitive chromatin topologies. Nat Protoc. 2020;15:364–397. doi: 10.1038/s41596-019-0242-7.
- 94. Nagano T., Lubling Y., Stevens T.J., Schoenfelder S., Yaffe E., Dean W. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature. 2013;502:59–64. doi: 10.1038/nature12593.
- 95. Flyamer I.M., Gassler J., Imakaev M., Brandao H.B., Ulianov S.V., Abdennur N. Single-nucleus Hi-C reveals unique chromatin reorganization at oocyte-to-zygote transition. Nature. 2017;544:110–114. doi: 10.1038/nature21711.
- 96. Ramani V., Deng X., Qiu R., Gunderson K.L., Steemers F.J., Disteche C.M. Massively multiplex single-cell Hi-C. Nat Methods. 2017;14:263–266. doi: 10.1038/nmeth.4155.
- 97. Du Z., Zheng H., Huang B., Ma R., Wu J., Zhang X. Allelic reprogramming of 3D chromatin architecture during early mammalian development. Nature. 2017;547:232–235. doi: 10.1038/nature23263.
- 98. Stevens T.J., Lando D., Basu S., Atkinson L.P., Cao Y., Lee S.F. 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature. 2017;544:59–64. doi: 10.1038/nature21429.
- 99. Tan L., Xing D., Chang C.H., Li H., Xie X.S. Three-dimensional genome structures of single diploid human cells. Science. 2018;361:924–928. doi: 10.1126/science.aat5641.
- 100. Wang Y., Song F., Zhang B., Zhang L., Xu J., Kuang D. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 2018;19:151. doi: 10.1186/s13059-018-1519-9.
- 101. Huang D., Yi X., Zhang S., Zheng Z., Wang P., Xuan C. GWAS4D: multidimensional analysis of context-specific regulatory variant for human complex diseases and traits. Nucleic Acids Res. 2018;46:W114–W120. doi: 10.1093/nar/gky407.
- 102. Hariprakash J.M., Ferrari F. Computational Biology Solutions to Identify Enhancers-target Gene Pairs. Comput Struct Biotechnol J. 2019;17:821–831. doi: 10.1016/j.csbj.2019.06.012.
- 103. Cao Q., Yip K.Y. A survey on computational methods for enhancer and enhancer target predictions. Conf Proc. 2015.
- 104. Thurman R.E., Rynes E., Humbert R., Vierstra J., Maurano M.T., Haugen E. The accessible chromatin landscape of the human genome. Nature. 2012;489:75–82. doi: 10.1038/nature11232.
- 105. Shlyueva D., Stampfel G., Stark A. Transcriptional enhancers: from properties to genome-wide predictions. Nat Rev Genet. 2014;15:272–286. doi: 10.1038/nrg3682.
- 106. Corradin O., Saiakhova A., Akhtar-Zaidi B., Myeroff L., Willis J., Cowper-Sal·lari R. Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome Res. 2014;24:1–13. doi: 10.1101/gr.164079.113.
- 107. Dixon J.R., Selvaraj S., Yue F., Kim A., Li Y., Shen Y. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
- 108. Hnisz D., Day D.S., Young R.A. Insulated neighborhoods: structural and functional units of mammalian gene control. Cell. 2016;167:1188–1200. doi: 10.1016/j.cell.2016.10.024.
- 109. Heintzman N.D., Hon G.C., Hawkins R.D., Kheradpour P., Stark A., Harp L.F. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009;459:108–112. doi: 10.1038/nature07829.
- 110. Fulco C.P., Nasser J., Jones T.R., Munson G., Bergman D.T., Subramanian V. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat Genet. 2019;51:1664–1669. doi: 10.1038/s41588-019-0538-0.
- 111. Pliner H.A., Packer J.S., McFaline-Figueroa J.L., Cusanovich D.A., Daza R.M., Aghamirzaie D. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol Cell. 2018;71:858–871.e8. doi: 10.1016/j.molcel.2018.06.044.
- 112. Mehdi T., Bailey S.D., Guilhamon P., Lupien M. C3D: a tool to predict 3D genomic interactions between cis-regulatory elements. Bioinformatics. 2019;35:877–879. doi: 10.1093/bioinformatics/bty717.
- 113. Kodzius R., Kojima M., Nishiyori H., Nakamura M., Fukuda S., Tagami M. CAGE: cap analysis of gene expression. Nat Methods. 2006;3:211–222. doi: 10.1038/nmeth0306-211.
- 114. Yao L., Shen H., Laird P.W., Farnham P.J., Berman B.P. Inferring regulatory element landscapes and transcription factor networks from cancer methylomes. Genome Biol. 2015;16:105. doi: 10.1186/s13059-015-0668-3.
- 115. Lu Y., Zhou Y., Tian W. Combining Hi-C data with phylogenetic correlation to predict the target genes of distal regulatory elements in human genome. Nucleic Acids Res. 2013;41:10391–10402. doi: 10.1093/nar/gkt785.
- 116. Naville M., Ishibashi M., Ferg M., Bengani H., Rinkwitz S., Krecsmarik M. Long-range evolutionary constraints reveal cis-regulatory interactions on the human X chromosome. Nat Commun. 2015;6:6904. doi: 10.1038/ncomms7904.
- 117. Clement Y., Torbey P., Gilardi-Hebenstreit P., Crollius H.R. Enhancer-gene maps in the human and zebrafish genomes using evolutionary linkage conservation. Nucleic Acids Res. 2020. doi: 10.1093/nar/gkz1199.
- 118. Friedman J., Hastie T., Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9:432–441. doi: 10.1093/biostatistics/kxm045.
- 119. Zhu Y., Chen Z., Zhang K., Wang M., Medovoy D., Whitaker J.W. Constructing 3D interaction maps from 1D epigenomes. Nat Commun. 2016;7:10812. doi: 10.1038/ncomms10812.
- 120. Liu D., Davila-Velderrain J., Zhang Z., Kellis M. Integrative construction of regulatory region networks in 127 human reference epigenomes by matrix factorization. Nucleic Acids Res. 2019;47:7235–7246. doi: 10.1093/nar/gkz538.
- 121. Rennie S., Dalby M., van Duin L., Andersson R. Transcriptional decomposition reveals active chromatin architectures and cell specific regulatory interactions. Nat Commun. 2018;9:487. doi: 10.1038/s41467-017-02798-1.
- 122. Dryden N.H., Broome L.R., Dudbridge F., Johnson N., Orr N., Schoenfelder S. Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. Genome Res. 2014;24:1854–1868. doi: 10.1101/gr.175034.114.
- 123. He B., Chen C., Teng L., Tan K. Global view of enhancer-promoter interactome in human cells. Proc Natl Acad Sci U S A. 2014;111:E2191–E2199. doi: 10.1073/pnas.1320308111.
- 124. Zhao C., Li X., Hu H. PETModule: a motif module based approach for enhancer target gene prediction. Sci Rep. 2016;6:30043. doi: 10.1038/srep30043.
- 125. Roy S., Siahpirani A.F., Chasman D., Knaack S., Ay F., Stewart R. A predictive modeling approach for cell line-specific long-range regulatory interactions. Nucleic Acids Res. 2015;43:8694–8712. doi: 10.1093/nar/gkv865.
- 126. Whalen S., Truty R.M., Pollard K.S. Enhancer-promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat Genet. 2016;48:488–496. doi: 10.1038/ng.3539.
- 127. Al Bkhetan Z., Plewczynski D. Three-dimensional epigenome statistical model: genome-wide chromatin looping prediction. Sci Rep. 2018;8:5217. doi: 10.1038/s41598-018-23276-8.
- 128. Talukder A., Saadat S., Li X., Hu H. EPIP: a novel approach for condition-specific enhancer-promoter interaction prediction. Bioinformatics. 2019. doi: 10.1093/bioinformatics/btz641.
- 129. Singh S., Yang Y., Poczos B., Ma J. Predicting enhancer-promoter interaction from genomic sequence with deep neural networks. 2018. doi: 10.1007/s40484-019-0154-0.
- 130. Mao W., Kostka D., Chikina M. Modeling enhancer-promoter interactions with attention-based neural networks. 2017.
- 131. Yang Y., Zhang R., Singh S., Ma J. Exploiting sequence-based features for predicting enhancer-promoter interactions. Bioinformatics. 2017;33:i252–i260. doi: 10.1093/bioinformatics/btx257.
- 132. Zeng W., Wu M., Jiang R. Prediction of enhancer-promoter interactions via natural language processing. BMC Genomics. 2018;19:84. doi: 10.1186/s12864-018-4459-6.
- 133. Cao F., Zhang Y., Loh Y.P., Cai Y., Fullwood M.J. Predicting chromatin interactions between open chromatin regions from DNA sequences. 2019:720748.
- 134. Schuster M., Paliwal K.K. Bidirectional recurrent neural networks. IEEE Trans Signal Process. 1997;45:2673–2681.
- 135. Hafez D., Karabacak A., Krueger S., Hwang Y.C., Wang L.S., Zinzen R.P. McEnhancer: predicting gene expression via semi-supervised assignment of enhancers to target genes. Genome Biol. 2017;18:199. doi: 10.1186/s13059-017-1316-x.
- 136. Schreiber J., Libbrecht M., Bilmes J., Noble W. Nucleotide sequence and DNaseI sensitivity are predictive of 3D chromatin architecture. 2018:103614.
- 137. Li W., Wong W.H., Jiang R. DeepTACT: predicting 3D chromatin contacts via bootstrapping deep learning. Nucleic Acids Res. 2019. doi: 10.1093/nar/gkz167.
- 138. Zhang H., Li F., Jia Y., Xu B., Zhang Y., Li X. Characteristic arrangement of nucleosomes is predictive of chromatin interactions at kilobase resolution. Nucleic Acids Res. 2017;45:12739–12751. doi: 10.1093/nar/gkx885.
- 139. Ghandi M., Lee D., Mohammad-Noori M., Beer M.A. Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Comput Biol. 2014;10. doi: 10.1371/journal.pcbi.1003711.
- 140. Kelley D.R., Snoek J., Rinn J.L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 2016;26:990–999. doi: 10.1101/gr.200535.115.
- 141. Cao Q., Anyansi C., Hu X., Xu L., Xiong L., Tang W. Reconstruction of enhancer-target networks in 935 samples of human primary cells, tissues and cell lines. Nat Genet. 2017;49:1428–1436. doi: 10.1038/ng.3950.
- 142. Hait T.A., Amar D., Shamir R., Elkon R. FOCS: a novel method for analyzing enhancer and gene activity patterns infers an extensive enhancer-promoter map. Genome Biol. 2018;19:56. doi: 10.1186/s13059-018-1432-2.
- 143. Hait T.A., Elkon R., Shamir R. CT-FOCS: a novel method for inferring cell type-specific enhancer-promoter maps. bioRxiv. 2019. doi: 10.1093/nar/gkac048.
- 144. Moore J.E., Pratt H.E., Purcaro M.J., Weng Z. A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods. Genome Biol. 2020;21:17. doi: 10.1186/s13059-019-1924-8.
- 145. Cao F., Fullwood M.J. Inflated performance measures in enhancer-promoter interaction-prediction methods. Nat Genet. 2019;51:1196–1198. doi: 10.1038/s41588-019-0434-7.
- 146. Belokopytova P.S., Nuriddinov M.A., Mozheiko E.A., Fishman D., Fishman V. Quantitative prediction of enhancer-promoter interactions. Genome Res. 2020;30:72–84. doi: 10.1101/gr.249367.119.
- 147. Lundberg S.M., Erion G.G., Lee S.-I. Consistent individualized feature attribution for tree ensembles. arXiv e-prints. 2018.
- 148. Shrikumar A., Greenside P., Shcherbina A., Kundaje A. Not just a black box: learning important features through propagating activation differences. arXiv e-prints. 2016.
- 149. Li Y., Chen C.Y., Wasserman W.W. Deep feature selection: theory and application to identify enhancers and promoters. J Comput Biol. 2016;23:322–336. doi: 10.1089/cmb.2015.0189.
- 150. Zhang S., Chasman D., Knaack S., Roy S. In silico prediction of high-resolution Hi-C interaction matrices. Nat Commun. 2019;10:5449. doi: 10.1038/s41467-019-13423-8.
- 151. Di Pierro M., Cheng R.R., Lieberman Aiden E., Wolynes P.G., Onuchic J.N. De novo prediction of human chromosome structures: epigenetic marking patterns encode genome architecture. Proc Natl Acad Sci U S A. 2017;114:12126–12131. doi: 10.1073/pnas.1714980114.
- 152. Qi Y., Zhang B. Predicting three-dimensional genome organization with chromatin states. PLoS Comput Biol. 2019;15. doi: 10.1371/journal.pcbi.1007024.
- 153. Kim M., Zheng M., Tian S.Z., Lee B., Chuang J.H., Ruan Y. MIA-Sig: multiplex chromatin interaction analysis by signal processing and statistical algorithms. Genome Biol. 2019;20:251. doi: 10.1186/s13059-019-1868-z.
- 154. Zhang R., Ma J. Probing multi-way chromatin interaction with hypergraph representation learning. 2020:2020.01.22.916171.
- 155. Visel A., Minovitsky S., Dubchak I., Pennacchio L.A. VISTA enhancer browser–a database of tissue-specific human enhancers. Nucleic Acids Res. 2007;35:D88–D92. doi: 10.1093/nar/gkl822.
- 156. Patwardhan R.P., Hiatt J.B., Witten D.M., Kim M.J., Smith R.P., May D. Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol. 2012;30:265–270. doi: 10.1038/nbt.2136.
- 157. Boettiger A., Murphy S. Advances in chromatin imaging at kilobase-scale resolution. Trends Genet. 2020. doi: 10.1016/j.tig.2019.12.010.
- 158. Morgan S.L., Mariano N.C., Bermudez A., Arruda N.L., Wu F., Luo Y. Manipulation of nuclear architecture through CRISPR-mediated chromosomal looping. Nat Commun. 2017;8:15993. doi: 10.1038/ncomms15993.
- 159. Liu X., Zhang Y., Chen Y., Li M., Zhou F., Li K. In situ capture of chromatin interactions by biotinylated dCas9. Cell. 2017;170:1028–1043.e19. doi: 10.1016/j.cell.2017.08.003.
- 160. Wang H., Xu X., Nguyen C.M., Liu Y., Gao Y., Lin X. CRISPR-mediated programmable 3D genome positioning and nuclear organization. Cell. 2018;175:1405–1417.e14. doi: 10.1016/j.cell.2018.09.013.
- 161. Kim J.H., Rege M., Valeri J., Dunagin M.C., Metzger A., Titus K.R. LADL: light-activated dynamic looping for endogenous gene expression control. Nat Methods. 2019;16:633–639. doi: 10.1038/s41592-019-0436-5.
- 162. Fishilevich S., Nudel R., Rappaport N., Hadar R., Plaschkes I., Iny Stein T. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford). 2017;2017. doi: 10.1093/database/bax028.
- 163. Duggal G., Wang H., Kingsford C. Higher-order chromatin domains link eQTLs with the expression of far-away genes. Nucleic Acids Res. 2014;42:87–96. doi: 10.1093/nar/gkt857.
- 164. Gate R.E., Cheng C.S., Aiden A.P., Siba A., Tabaka M., Lituiev D. Genetic determinants of co-accessible chromatin regions in activated T cells across humans. Nat Genet. 2018;50:1140–1150. doi: 10.1038/s41588-018-0156-2.
- 165. Chandra V., Bhattacharyya S., Schmiedel B.J., Madrigal A., Fotsing S., Seumois G., et al. Promoter-interacting expression quantitative trait loci (pieQTLs) in human immune cell types. SSRN. 2019. http://dx.doi.org/10.2139/ssrn.3402070.
- 166. Zheng Z., Huang D., Wang J., Zhao K., Zhou Y., Guo Z. QTLbase: an integrative resource for quantitative trait loci across multiple human molecular phenotypes. Nucleic Acids Res. 2020;48:D983–D991. doi: 10.1093/nar/gkz888.
- 167. Sadowski M., Kraft A., Szalaj P., Wlasnowolski M., Tang Z., Ruan Y. Spatial chromatin architecture alteration by structural variations in human genomes at the population scale. Genome Biol. 2019;20:148. doi: 10.1186/s13059-019-1728-x.
- 168. Xie S., Duan J., Li B., Zhou P., Hon G.C. Multiplexed engineering and analysis of combinatorial enhancer activity in single cells. Mol Cell. 2017;66:285–299.e5. doi: 10.1016/j.molcel.2017.03.007.
- 169. Gasperini M., Hill A.J., McFaline-Figueroa J.L., Martin B., Kim S., Zhang M.D. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell. 2019;176:377–390.e19. doi: 10.1016/j.cell.2018.11.029.