iScience. 2022 Nov 30;25(12):105687. doi: 10.1016/j.isci.2022.105687

Integrating extrusion complex-associated pattern to predict cell type-specific long-range chromatin loops

Yajing Deng 1,2,3, Li Tang 1,3, Xiaolong Zhou 1, Wenkang Wang 1, Min Li 1,4
PMCID: PMC9768375  PMID: 36567710

Summary

Chromatin loops play a critical role in gene expression and disease. Supervised learning-based algorithms for predicting chromatin loops require a large amount of prior information for model construction, while the prediction sensitivity of unsupervised learning-based algorithms is still unsatisfactory. Therefore, we propose an unsupervised algorithm, Ecomap-loop. It takes advantage of extrusion complex-associated patterns, including CTCF, RAD21, and SMC enrichments, as well as the orientation distribution of CTCF motifs at loops, to build feature matrices; an eigen decomposition model is then employed to obtain the cell type-specific loops. We compare the performance of Ecomap-loop with the state-of-the-art unsupervised algorithm using Hi-C, ChIA-PET, expression quantitative trait locus (eQTL), and CRISPR interference (CRISPRi) screen data; the results show that Ecomap-loop achieves the best performance in four cell types. In addition, functional analysis reveals the ability of Ecomap-loop to predict active functionality-related and cell type-specific loops.

Subject areas: computational bioinformatics, genomic analysis, methodology in biological sciences, Computing methodology

Highlights

  • Extrusion complex-associated pattern facilitates the prediction of chromatin loops

  • Ecomap-loop outperforms the state-of-the-art unsupervised methods

  • Ecomap-loop can predict active functionality-related and cell type-specific loops


Introduction

The three-dimensional architecture of chromatin plays an important role in maintaining normal gene expression levels,1,2 marking cell specificity, and regulating cell growth and development.3,4,5 In the past decade, the development of chromosome conformation capture (3C) techniques6 has facilitated the identification of chromatin interactions. To obtain higher resolution and throughput, the 3C-based assay has evolved from "one-to-one" (3C)6 to "one-to-many" (circular chromosome conformation capture/chromosome conformation capture-on-chip, 4C),7 "many-to-many" (chromosome conformation capture carbon copy, 5C),8 and "high-throughput chromosome conformation capture" (Hi-C),9,10 which enables high-throughput, genome-wide analysis of long-range chromatin interactions. To detect the chromatin interactions mediated by specific proteins, chromatin interaction analysis using paired end tag sequencing (ChIA-PET)11,12 and in situ Hi-C followed by chromatin immunoprecipitation (HiChIP) and proximity ligation-assisted ChIP-Seq (PLAC-seq)13,14 have been proposed, which capture chromatin structure from the viewpoint of targeted proteins. The hierarchical chromatin organization can be studied mainly at four levels: chromosome territories, chromatin compartments, topologically associating domains (TADs), and chromatin loops.3 Chromatin loops are the basic building blocks of the 3D architecture of chromatin; they establish regulatory networks between distant elements through physical proximity.15 However, limited by the cost and technical issues of wet-lab experiments, it is still a great challenge to identify the chromatin loops of uncharacterized cell types or species.

Recently, several machine learning-based algorithms have emerged to address the difficulties of identifying chromatin loops and investigating their regulatory function. These methods can be categorized as supervised or unsupervised, according to whether they use 3C-identified loops for model training.16,17 For the supervised algorithms, multi-omics features or genomic sequences are usually used to construct the feature matrix, and the chromatin interactions from Hi-C, ChIA-PET, HiChIP, and so on are used to generate the positive and negative training sets.18,19,20,21 However, these algorithms require a large number of inputs to train the model, which makes them hard to apply to uncharacterized cell types, and the prediction procedure usually takes a long time to complete. For the unsupervised algorithms, genomic distance and other genomic characteristics are used as model features to infer the chromatin loops. Ernst et al.22 and Thurman et al.23 employed the correlations between enhancers and promoter DNaseI-hypersensitive sites (DHS) and the expression levels in specific regulatory regions to perform the prediction. PreSTIGE24 utilized a linear domain model to link the enhancers to their target genes. EpiTensor25 derived 3D interactions between distal genomic loci from 1D epigenomic data. Although these methods require fewer inputs and less running time than the supervised methods, their accuracy is still unsatisfactory and needs to be improved.16

With this in mind, we propose Ecomap-loop, which integrates extrusion complex-associated patterns to predict cell type-specific chromatin loops. Ecomap-loop extracts the relevant patterns between the three-dimensional structure of chromatin and epigenomics data and then generates the feature matrix with eigen decomposition, through which all the patterns are binned and summarized onto the linear genome. The predicted results were evaluated with 3C-based data, expression quantitative trait loci (eQTLs), and clustered regularly interspaced short palindromic repeats interference or inhibition (CRISPRi) screen data, which showed that Ecomap-loop outperformed the other methods. Finally, the functional analysis indicated that Ecomap-loop can be used to predict active functionality-related and cell type-specific loops.

Results

Extrusion complex-associated pattern (Ecomap) facilitates the prediction of chromatin loops

The loop extrusion model holds that a complex, including the proteins CCCTC-binding factor (CTCF) and cohesin, mediates the formation of loops by a process of extrusion.26 The cohesin complex consists of structural maintenance of chromosomes protein (SMC), double-strand break repair protein (RAD21), and other subunits.27 CTCF is a widely expressed transcription factor that is important for the local anchoring of loop structures.10,12 In the extrusion model, when a loop is established and the extrusion complex stops sliding, the DNA located around the extrusion complex is held rigid (Figure 1A). Recent studies have uncovered that the orientation of the CTCF motif, which can be convergent, tandem (leftward or rightward), or divergent (Figure 1B), is critical for the formation of the loop. Among these, the convergent orientation is preferred and shows a higher contact frequency than the other orientations.10,12

Figure 1.

Extrusion complex-associated pattern (Ecomap) facilitates the prediction of chromatin loops

(A) Diagram of extrusion complex in loop formation.

(B) Diagram of three types of CTCF motif orientation (divergent, tandem, and convergent) bound to DNA.

(C) Percentage of E-E, E-P, and P-P loops in K562-H3K27ac ChIA-PET dataset.

(D) The ChIP-seq peak enrichment of CTCF, RAD21, and SMC in three types of loops of K562-H3K27ac ChIA-PET dataset.

(E) Percentage of divergent, tandem, and convergent CTCF motifs.

(F) MDI feature importance in three types of loops.

To investigate whether the extrusion complex-associated pattern facilitates the prediction of chromatin loops, we first collected H3K27ac ChIA-PET loops and classified them as enhancer-enhancer (E-E), enhancer-promoter (E-P), or promoter-promoter (P-P) (see STAR Methods). E-P loops accounted for the largest share (43.35%), followed by E-E loops (32.1%) (Figure 1C), which was consistent with the finding that H3K27ac-mediated loops identify functional enhancer interactions.28 Then ChIP-seq datasets of CTCF, SMC, and RAD21 in the K562 cell line were collected and mapped to the different types of H3K27ac ChIA-PET loops, respectively (see STAR Methods); the results indicated that the ChIP-seq peaks of the three proteins were enriched near the loop anchors. Because the promoter anchors of H3K27ac-mediated loops are associated with high transcriptional activity and therefore tend to have higher ChIP-seq signal, the P-P loops showed the highest enrichment of ChIP-seq peaks, followed by E-P loops (Figure 1D). We also annotated all the ChIA-PET loops with CTCF motifs, of which 78% were associated with bound CTCF at both anchors; within these associated loops, 64% were convergent, 33% were tandem, and 3% were in the divergent orientation (Figure 1E). This result was consistent with the previous finding that the convergent orientation was required for the formation of loops.10,26,29,30 Then we used a random forest classifier to compute the feature importance with mean decrease in impurity (MDI), reported as the mean and SD of the accumulated impurity decrease within each tree of the classifier (Figure 1F). We observed that the ranking of importance was identical among the three types of loops, suggesting that long-range chromatin loops are predictable through these extrusion complex-associated patterns. The orientation of CTCF motifs showed higher importance than the other features, which provided the basis for the subsequent model design. The feature importance of CTCF motif orientation was similar for the three types of loops, indicating that the feature carries similar weight in the prediction model for all loops, regardless of anchor type.

Ecomap-loop: Integrating extrusion complex-associated pattern to predict cell type-specific long-range chromatin loops

The extrusion complex (CTCF, RAD21, and SMC) plays an important role in the formation of loops, and its binding pattern has been shown to benefit the prediction of chromatin interactions.12 Here we propose an unsupervised algorithm, Ecomap-loop (Figure 2), which integrates the binding pattern of the extrusion complex to predict long-range chromatin loops in a cell type-specific manner. The model of Ecomap-loop can be divided into three parts: calculating the read coverage of CTCF, RAD21, and SMC on each fragment, evaluating the CTCF motif orientations, and performing eigen decomposition on assays including histone mark ChIP-seq and DNase-seq of different cell types (see STAR Methods).

Figure 2.

The schema of Ecomap-loop

The model of Ecomap-loop can be divided into three parts: calculating the read coverage of CTCF, RAD21, and SMC on each fragment as Vc, calculating the evaluation score for CTCF motif orientations as VF, and the eigen decomposition by using the assays including histone marks ChIP-seq and DNase-seq of different cell types as VQ. The final E-E, E-P, and P-P loops are measured with score V. The blue peaks indicate the ChIP-seq peaks across the genome. The green arrows indicate the CTCF motif orientation.

Because the coverage rate changes with the length of each fragment, and considering the final prediction performance, the number of covered base pairs is used as the indicator of CTCF, RAD21, and SMC coverage on each fragment. The coverage value of the three proteins is then calculated on each fragment as Vc. Next, we evaluate the matching probability and the orientation of the CTCF motif for each fragment and calculate a matching score VF. As the four orientations occur at different frequencies across all the loops, we assign different weights to different orientations. Finally, the epigenomics data, including histone mark ChIP-seq and DNase-seq data, are used to build the eigen decomposition part, and we capture the peaks with covariation as chromatin interactions. We calculate an association score VQ to measure the strength of interaction. The final score of Ecomap-loop is defined as the sum of the three parts.

Evaluation of predicted chromatin loops with 3C-based experimental datasets

To evaluate the prediction results of Ecomap-loop, we downloaded the Hi-C experimental data of K562, GM12878, IMR90, and HepG2 at 5–10 kb resolution10 (see STAR Methods), which were regarded as positive samples. As most of the earlier unsupervised methods did not provide source code, here we used the state-of-the-art method EpiTensor for comparison. The predicted loops from EpiTensor were ranked in terms of their association scores (AS), and the predicted loops from Ecomap-loop were ranked in terms of the final evaluation scores (ES). The predicted loops validated by the Hi-C contacts were defined as true positives; the predicted loops not validated by the Hi-C contacts were defined as false positives; the loops not predicted but validated by the Hi-C contacts were defined as false negatives; and the loops neither predicted nor validated by the Hi-C contacts were defined as true negatives. To generate the receiver operating characteristic (ROC) curve, we changed the threshold of AS and ES gradually to calculate a series of sensitivity and specificity values. Then the area under the curve (AUC) was calculated for each cell line. The ROC curves of EpiTensor are simple polylines (piecewise-linear), which may be due to the low number of positive samples predicted by EpiTensor. The AUC values of Ecomap-loop were higher than those of EpiTensor for all three loop types across the K562, GM12878, IMR90, and HepG2 cell lines, with the largest AUC increase, 20.9%, achieved in the E-P loop (K562) dataset (Figure 3A).
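The threshold sweep described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names `roc_points` and `auc` are hypothetical, each candidate pair is assumed to carry one score (ES or AS) and a 0/1 label indicating Hi-C support, a loop counts as predicted when its score is at or above the threshold, and the AUC is computed with the trapezoidal rule.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, sensitivity) pairs, one per score threshold.

    scores: loop scores (ES or AS); labels: 1 if the candidate pair is
    supported by a Hi-C contact, otherwise 0 (assumed input layout).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted
    # Lower the threshold step by step through every distinct score.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Sensitivity is the y coordinate of each point and specificity is one minus the x coordinate, so the same points also give the sensitivity/specificity series mentioned in the text.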

Figure 3.

Evaluation of predicted chromatin loops with Hi-C experimental datasets

(A) The receiver operating characteristic (ROC) curves are generated by changing the threshold of AS and ES gradually; the area under the curve (AUC) for different cell types were calculated.

(B) Example of prediction results near MYC gene locus.

(C) Example of prediction results near JAK2 gene locus.

To further validate the prediction results, we collected the ChIA-PET experimental datasets of four cell types from ENCODE31 as positive samples (see STAR Methods). As in the validation with Hi-C experimental data, we changed the AS and ES thresholds gradually to generate the ROC curve, and defined true positives, false positives, false negatives, and true negatives according to the consistency between predicted loops and ChIA-PET loops. The comparison results indicated that Ecomap-loop outperformed EpiTensor in all the cell types and loop types, with the largest AUC increase, 15.0%, achieved in the P-P loop (GM12878) dataset (Figure S1).

In the study of Fulco,32 the interactions between MYC and 7 enhancers were identified in the K562 cell line. Here we checked the prediction results of Ecomap-loop and EpiTensor near the MYC locus with K562 datasets, which showed that Ecomap-loop predicted the loops of MYC-e1, MYC-e2, MYC-e4, and MYC-e5, while EpiTensor only predicted the loop of MYC-e5 (Figure 3B). In the study of Mattews,33 the JAK2 gene promoter interacts with an enhancer 222 kb away, and this interaction is related to myeloproliferative disorder. The enhancer has an H3K27ac peak and harbors the SNP rs385893. The presence of the SNP within the H3K27ac peak alters the transcription factor binding property of that region and thus reduces the interaction with the JAK2 promoter. The prediction results showed that Ecomap-loop predicted the loop of JAK2-rs385893, while no loop was detected by EpiTensor in the region (Figure 3C).

Overall, the validation comparison between Ecomap-loop and EpiTensor in Hi-C and ChIA-PET experimental datasets across four cell types revealed the high sensitivity of Ecomap-loop to predict E-E, E-P, and P-P chromatin loops.

Validation of predicted chromatin loops with eQTL and CRISPRi datasets

eQTLs are genetic loci that control the expression levels of genes; their use has paralleled the adoption of genome-wide association studies (GWAS) for analyzing complex traits and diseases in humans. It has become common to interpret noncoding variant-gene associations using eQTL data.34,35 Here we obtained the reliable chromatin associations from eQTL datasets of each cell line to validate the predicted loops (see STAR Methods). Because the number of eQTL loops was smaller than the number of loops detected by sequencing-based techniques (such as Hi-C and ChIA-PET), we used the overlapping percentage between predicted loops and eQTL loops to measure the precision of Ecomap-loop and EpiTensor. Before calculation, the predicted loops from Ecomap-loop and EpiTensor were ranked by ES and AS, respectively, and the top 20% of chromatin loops were retained for the calculation. The comparison results revealed that Ecomap-loop achieved higher precision than EpiTensor in four cell lines and three loop types. In addition, as expected, the overlapping percentage of E-E loops was the highest among the three loop types for both Ecomap-loop and EpiTensor, because eQTLs detect the chromatin associations of noncoding variants and most of these variants are linked to enhancers (Figure 4A).
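The top-20% overlap measure can be sketched as below. This is a hedged illustration rather than the published procedure: the names `anchors_overlap`, `loops_match`, and `top_fraction_overlap` are hypothetical, intervals are assumed to be half-open `(chromosome, start, end)` tuples, and a predicted loop is assumed to match an eQTL association when both anchors overlap by at least 1 bp in either order (the text does not state the exact matching rule).

```python
from typing import List, Tuple

Anchor = Tuple[str, int, int]   # (chromosome, start, end), half-open
Loop = Tuple[Anchor, Anchor]


def anchors_overlap(a: Anchor, b: Anchor) -> bool:
    """True when two intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def loops_match(p: Loop, q: Loop) -> bool:
    """Assumed rule: both anchors overlap, in direct or swapped order."""
    direct = anchors_overlap(p[0], q[0]) and anchors_overlap(p[1], q[1])
    swapped = anchors_overlap(p[0], q[1]) and anchors_overlap(p[1], q[0])
    return direct or swapped


def top_fraction_overlap(scored: List[Tuple[float, Loop]],
                         reference: List[Loop],
                         fraction: float = 0.2) -> float:
    """Percentage of the top-scoring predicted loops found in the reference set."""
    if not scored:
        raise ValueError("no predicted loops supplied")
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    keep = max(1, int(len(ranked) * fraction))  # retain the top 20% by default
    top = [loop for _, loop in ranked[:keep]]
    hits = sum(1 for loop in top if any(loops_match(loop, ref) for ref in reference))
    return 100.0 * hits / len(top)
```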

Figure 4.

Validation of predicted chromatin loops with eQTL and CRISPRi datasets

(A) Overlapping percentage between eQTL loops and Ecomap-loop/EpiTensor-predicted loops in four cell types and three loop types.

(B) Genomic tracks of virtual 4C profile, CRISPRi screen interactions, and Ecomap-loop predictions near the GATA1 gene locus. The red triangles indicate the CRISPRi testing positions.

We next validated Ecomap-loop-predicted loops by comparing them to functionally validated enhancer-promoter pairs identified via systematic CRISPRi screen.36 In the study of Klann et al., several candidate regulatory elements were perturbed and the expression changes of gene HBE1 were detected in K562 cell line.36 We collected the ChIP-seq signal tracks of H3K27ac, H3K4me1, H3K4me3, CTCF, RAD21, and SMC3 in K562 and mapped the Virtual 4C profile,37 CRISPRi screen interaction, and predicted loops from Ecomap-loop to these tracks. The mapping results showed the CRISPRi screen interactions were predicted by Ecomap-loop. Besides, the furthest upstream loops predicted by Ecomap-loop have been validated by 4C profile (Figure 4B).

Functional analysis revealed the cell type-specific prediction ability of Ecomap-loop

To analyze the regulatory functionality of predicted loops, we first annotated the loop anchors with active histone marks and selected the loops with both anchors active (see STAR Methods). We calculated the percentage of active loops for each cell type and loop type, which showed that the predictions for the GM12878 cell type had the most active loops (>60%) in all three loop types. The E-E loops showed the highest percentage of active loops across the four cell types (Figure 5A). Then we extracted the ChIP-seq peak signals of H3K27ac, H3K4me1, and H3K4me3 at the anchor loci; the active loops showed higher histone mark binding signal than the common loops in all four cell types. The p value was calculated by the Wilcoxon test, indicating that the active loops had higher chromatin activity than the common loops (Figure 5B).

Figure 5.

Functional analysis revealed the cell type-specific prediction ability of Ecomap-loop

(A) Percentage of active loops in four cell types and three loop types predicted by Ecomap-loop.

(B) Histone mark ChIP-seq peak signal of active loop anchors in four cell types.

(C) Cell identity GO enrichment of active loop anchors in four cell types.

We next extracted the genomic positions of active anchors for each cell type individually and annotated the positions with the nearest genes; the resulting active gene sets were used to detect gene ontology (GO) enrichment by Metascape38 with a p value cutoff of 0.01, a minimum overlap of 3, and a minimum enrichment of 1.5. The significantly enriched terms related to cell type identity were selected and shown. For K562, the GO terms were enriched in leukemia and immunity. For GM12878, the GO terms were enriched in lymphocyte activation and related regulation processes. For IMR90, the GO terms were enriched in lung cancer and morphogenesis. For HepG2, the GO terms were enriched in liver development and disease. These results indicated that the loop anchors contributed to the corresponding cell identity, suggesting that the loops predicted by Ecomap-loop were cell type-specific (Figure 5C).

Discussion

With the rapid development of 3C-based techniques and high-throughput sequencing, it has become known that human interphase chromosomes are folded into multiple layers of hierarchical structures, including chromosome territories, compartments, topologically associating domains (TADs), and chromatin loops. Among them, a chromatin loop is by definition a pair of genomic loci that are physically closer in the nucleus than their intervening sequences, and such loops play an important role in gene expression and disease-associated studies. Recently, some supervised-learning algorithms have been developed to overcome the obstacles of wet-lab experiments, but these algorithms require large data input and long running time. Thus, a fast and easy-to-use algorithm with high sensitivity is required in this area. In this study, we develop an unsupervised algorithm, Ecomap-loop, to predict cell type-specific long-range chromatin loops.

The contribution of CTCF and cohesin to the formation of E-P loops is still an open question. Some studies have revealed that the extrusion complexes do not contribute significantly to E-P interactions,39 while others found that CTCF is directly involved in E-P interactions.10,12,40 Although opposing views exist, the consensus is that CTCF and cohesin are important mediators of chromatin loops.9,10 In our study, we did not use the extrusion complexes to distinguish the anchor type (enhancer or promoter). For the classification of promoters and enhancers, we used GENCODE data and the EnhancerAtlas to annotate the loops (see STAR Methods). Through these steps, we obtain the prediction of E-P loops on the basis of all the chromatin loops.

Recently, some studies have incorporated CTCF and cohesin binding information to predict 3D loops in silico. In the studies of Oti et al.41 and Matthews et al.,42 the features of CTCF and cohesin were considered individually, and only the interactions anchored by cohesin and CTCF were predicted. In our study, we employed the extrusion complex-associated pattern, including CTCF orientation and CTCF/cohesin binding, to construct an unsupervised-learning model to perform the prediction. Ecomap-loop can predict all the possible loops across the genome, including the CTCF/cohesin-mediated ones. In addition, the prediction results of Ecomap-loop were classified into E-E, E-P, and P-P.

The prediction results of Ecomap-loop have been validated by Hi-C, ChIA-PET, eQTL, and CRISPRi data. The benchmarking results show that Ecomap-loop outperforms the state-of-the-art unsupervised algorithm EpiTensor. For further comparison, we evaluated the number of loops predicted, the median loop length, and the number of genes covered for the two methods (Table S1). The inputs for Ecomap-loop and EpiTensor are different. The data matrices used for EpiTensor are generated from histone ChIP-seq data. The input for Ecomap-loop included the CTCF, SMC, and RAD21 ChIP-seq data and the CTCF motif matrix with orientation information. To make the comparison as fair as possible, we processed the Hi-C and ChIA-PET datasets with the same steps for Ecomap-loop and EpiTensor to generate the positive samples. Then we defined the true positives, false positives, true negatives, and false negatives as described in the EpiTensor paper. To generate the ROC curve, we changed the threshold of the loop score gradually to calculate the sensitivity and specificity.

To check the ability of predicting inactive loops, we extracted the ChIP-seq signal of H3K27me3 at the locus of anchors; the inactive loops were observed with higher H3K27me3 histone mark binding signal than the common loops (Figure S2). Overall, Ecomap-loop can predict not only active loops but also inactive loops, which facilitates the further mining of gene regulation mechanism under the context of 3D architecture.

Limitations of the study

Compared with other unsupervised learning-based algorithms for predicting chromatin interactions, Ecomap-loop makes use of the orientation distribution of CTCF motifs at loops and the enrichments of CTCF, RAD21, and SMC on loops to improve the prediction accuracy. However, there are still some limitations. First, Ecomap-loop requires a variety of input files, including CTCF, RAD21, and SMC ChIP-seq data, a CTCF motif matrix, and gene annotation files. For some cell lines, the publicly accessible files may not completely satisfy these requirements. Second, Ecomap-loop takes a substantial amount of computational resources and time to process whole-genome data for a cell line, which needs to be improved in the future. Third, Ecomap-loop divides the whole genome into different segments, including promoter, enhancer, and other regions, and regions that overlap both a promoter and an enhancer still need to be treated properly.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Deposited data

GENCODE | Frankish et al., 2021 (ref. 43) | https://www.gencodegenes.org/
EnhancerAtlas | Gao and Qian, 2020 (ref. 44) | http://www.enhanceratlas.org/downloadv2.php
ROADMAP | Roadmap Epigenomics Consortium et al., 2015 (ref. 45) | http://www.roadmapepigenomics.org/
ENCODE | ENCODE Project Consortium, 2012 (ref. 31) | https://www.encodeproject.org/ (see Table S2)
GEO | Rao et al., 2014 (ref. 10) | http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525
UCSC | Lander et al., 2001 (ref. 46) | https://genome.ucsc.edu/

Software and algorithms

Ecomap-loop | This study | https://github.com/CSUBioGroup/Ecomap-loop
EpiTensor | Zhu et al., 2016 (ref. 25) | http://wanglab.ucsd.edu/star/EpiTensor/
BEDTools (v2.30) | Quinlan and Hall, 2010 (ref. 47) | https://bedtools.readthedocs.io/en/latest/
Deeptools | Ramírez et al., 2016 (ref. 48) | https://deeptools.readthedocs.io
FIMO (v5.4) | Grant et al., 2011 (ref. 49) | https://meme-suite.org/meme/doc/fimo.html

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Min Li (limin@mail.csu.edu.cn).

Materials availability

This study did not generate new unique reagents.

Method details

Annotation of loop types

The entire genome is divided into three parts: promoter, enhancer, and other. Other regions are defined as the remaining portion of the genome that does not overlap annotated promoters or enhancers. We downloaded the GFF3 file of human GRCh37 (v19) from GENCODE (https://www.gencodegenes.org/).43 The promoter regions are defined as the region from 1000 bp upstream to 1000 bp downstream of each gene transcription start site. The enhancer regions were obtained from EnhancerAtlas (http://www.enhanceratlas.org/downloadv2.php),44 which provides enhancers predicted by combining the results of different analyses of high-throughput data in humans (hg19). All predicted loop anchors that overlap promoters or enhancers by at least 1 bp were retained; BEDTools47 was used to count the overlapping length. Then we extract the promoter parts and enhancer parts from the predicted loops and concentrate on the promoter-enhancer, promoter-promoter, and enhancer-enhancer pairs.
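The annotation rule above can be sketched in a few lines. This is an illustrative stand-in for the BEDTools-based step, not the authors' code: the names `promoter_window`, `anchor_type`, and `loop_type` are hypothetical, intervals are assumed to be half-open `(chromosome, start, end)` tuples, and giving promoters precedence when an anchor overlaps both a promoter and an enhancer is an assumption made only for this example.

```python
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def promoter_window(chrom: str, tss: int, flank: int = 1000) -> Interval:
    """Region from `flank` bp upstream to `flank` bp downstream of a TSS."""
    return (chrom, max(0, tss - flank), tss + flank)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def anchor_type(anchor: Interval,
                promoters: List[Interval],
                enhancers: List[Interval]) -> Optional[str]:
    """Return 'P', 'E', or None ('other'). Promoter precedence is an assumption."""
    if any(overlaps(anchor, p) for p in promoters):
        return "P"
    if any(overlaps(anchor, e) for e in enhancers):
        return "E"
    return None


def loop_type(anchor1: Interval, anchor2: Interval,
              promoters: List[Interval],
              enhancers: List[Interval]) -> Optional[str]:
    """Return 'E-E', 'E-P', or 'P-P'; None if either anchor is 'other'."""
    t1 = anchor_type(anchor1, promoters, enhancers)
    t2 = anchor_type(anchor2, promoters, enhancers)
    if t1 is None or t2 is None:
        return None
    return "-".join(sorted((t1, t2)))
```

A linear scan over all promoters and enhancers is adequate for a small example; a genome-wide run would rely on an interval index or on BEDTools itself.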

Enrichment of ChIP-seq peaks in loops

We first downloaded the raw fastq sequencing files of K562-H3K27ac ChIA-PET, trimmed the linkers, and aligned the reads to the hg19 reference; the alignment results were converted to bigwig format with bamCoverage of deepTools.48 The ChIP-seq peaks of CTCF, RAD21, and SMC were collected in bed format, and the regions of E-E, E-P, and P-P loops were used to extract the corresponding alignments from the ChIA-PET bigwig file. We next calculated the read density with a bin length of 10 bp for the three loop types in a window from 3 kb upstream to 3 kb downstream of each ChIP-seq peak center. Finally, the matrices of read density were used to generate the enrichment plot.
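The binning step can be illustrated with a small pure-Python sketch. It is not the deepTools implementation: the function name `binned_density` is hypothetical, reads are assumed to be half-open `(start, end)` intervals on the same chromosome as the peak, and density is taken to be the mean per-base coverage within each 10 bp bin.

```python
from typing import List, Tuple


def binned_density(read_intervals: List[Tuple[int, int]],
                   center: int,
                   flank: int = 3000,
                   bin_size: int = 10) -> List[float]:
    """Mean per-base read coverage in consecutive bins spanning center +/- flank."""
    start = center - flank
    n_bins = (2 * flank) // bin_size
    end = start + n_bins * bin_size
    covered = [0] * n_bins
    for r_start, r_end in read_intervals:
        # Clip each read to the window, then count its bases bin by bin.
        lo = max(r_start, start)
        hi = min(r_end, end)
        for pos in range(lo, hi):
            covered[(pos - start) // bin_size] += 1
    return [count / bin_size for count in covered]
```

Stacking one such row per peak gives a peaks-by-bins matrix of the kind an enrichment plot is drawn from.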

Implementation of Ecomap-loop

The implementation of Ecomap-loop can be divided into three parts, calculating the read coverage of CTCF, RAD21 and SMC on each fragment, the evaluation for CTCF motif orientations and the eigen decomposition by using the assays including histone marks ChIP-seq and DNase-seq of different cell types.

As CTCF, RAD21, and SMC are reported to form the extrusion complex and play an important role in loop formation, the ChIP-seq peaks of CTCF, RAD21, and SMC are downloaded from ENCODE (https://www.encodeproject.org/) for each cell line, and the coverageBed function of BEDTools47 is used to obtain the coverage rate and the number of base pairs covered by CTCF, RAD21, and SMC peaks on each fragment. Because the coverage rate changes with the length of each fragment, and considering the final prediction performance, here we use the number of covered base pairs as the indicator of the coverage of CTCF, RAD21, and SMC on each fragment. To balance the impact of these large coverage values, we experimentally tested different coefficients for the coverage and set the final coefficient to 0.1. Then we combine all the promoters and enhancers genome-wide into E-E, E-P, and P-P pairs. After matching the CTCF, RAD21, and SMC data to the pairs, we sort the two fragments of each pair by genomic coordinate: the fragment with the smaller coordinate is placed first and named fragment-1, and the fragment with the larger coordinate is placed second and named fragment-2. The coverage of CTCF, RAD21, and SMC is defined as $V_c = 0.1 \times (C_1 + C_2 + R_1 + R_2 + S_1 + S_2)$, where $C_1$ represents the CTCF coverage score of fragment-1 and $C_2$ represents the CTCF coverage score of fragment-2. Similarly, $R_1$, $R_2$, $S_1$, and $S_2$ are the RAD21 coverage score of fragment-1, the RAD21 coverage score of fragment-2, the SMC coverage score of fragment-1, and the SMC coverage score of fragment-2, respectively.
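As a rough sketch of the coverage term, the example below counts the base pairs of a fragment covered by at least one peak and combines the six counts with the 0.1 coefficient. It is a pure-Python stand-in for the coverageBed step, written under assumptions: the names `covered_bp` and `coverage_score` are hypothetical, intervals are half-open `(chromosome, start, end)` tuples, and each coverage score is taken to be the covered base-pair count itself.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def covered_bp(fragment: Interval, peaks: List[Interval]) -> int:
    """Number of fragment bases covered by at least one peak (union, no double counting)."""
    chrom, f_start, f_end = fragment
    # Keep peaks that touch the fragment and clip them to its boundaries.
    clipped = sorted(
        (max(s, f_start), min(e, f_end))
        for c, s, e in peaks
        if c == chrom and s < f_end and f_start < e
    )
    total = 0
    cur_end = f_start
    for s, e in clipped:
        if e > cur_end:
            total += e - max(s, cur_end)
            cur_end = e
    return total


def coverage_score(frag1: Interval, frag2: Interval,
                   ctcf: List[Interval], rad21: List[Interval],
                   smc: List[Interval], coeff: float = 0.1) -> float:
    """V_c = coeff * (C1 + C2 + R1 + R2 + S1 + S2) for one fragment pair."""
    counts = [covered_bp(frag, peaks)
              for peaks in (ctcf, rad21, smc)
              for frag in (frag1, frag2)]
    return coeff * sum(counts)
```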

There are four orientations of CTCF motifs on interactions: convergent, tandem leftward, tandem rightward, and divergent. To evaluate the orientations of CTCF motifs on both ends of loops, we employ FIMO,49 a motif scanning tool in the MEME Suite, to scan each promoter and enhancer fragment. FIMO takes a motif file containing MEME-formatted motifs and a sequence file in FASTA format as input, then reports all possible positions in each sequence that match a motif, together with the corresponding strand, matched sequence, log likelihood ratio score, p value, and q-value. We process the orientations of CTCF motifs on each fragment as follows. First, we download the hg19 genome sequence from UCSC (https://hgdownload.soe.ucsc.edu/) and the MEME-format CTCF motif from JASPAR (https://jaspar.genereg.net/)50 to get the position-dependent letter-probability matrices that describe the probability of each possible letter at each position in the pattern. Then we extract the sequences in .fasta format for each promoter and enhancer by using the getfasta function in the BEDTools package. Next, we use FIMO to identify the candidate CTCF binding sites and their corresponding strands. Finally, from the output files of FIMO, we keep the maximum score on the forward strand and on the reverse strand of each promoter and enhancer and preserve the DNA strand information for each fragment.

We use F1+ to represent the CTCF motif score of fragment-1 on the forward strand, F2 represents the CTCF score of fragment-2 on the reverse strand, F1 represents the CTCF score of fragment-2 on the reverse strand, and F2+ represents the CTCF score of fragment-2 on the forward strand. Thus, the convergent orientation of CTCF on both ends of interactions can be described as Fc=(F1+×F2). Similarly, the tandem rightward orientation and the tandem leftward orientation are described as Ftr=(F1+×F2+) and Ftl=(F1×F2), respectively, and the Fd=(F1×F2+) stands for the divergent orientation. However, these orientations have different frequencies across all the loops. The convergent orientation has been proved be the majority part (around 64.5–92%) in four orientations, while the divergent orientation is rare because of its structural instability. Therefore, we assign different weights for these orientations in different experiments and validate the corresponding results in four cell lines to choose the optimal weights for different orientations. The equation to evaluate the orientations of CTCF motifs on both ends of interactions is defined as following,

$V_F = 0.7 \times F_c + 0.15 \times F_{tr} + 0.15 \times F_{tl}$
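As a worked illustration of this weighted sum, the following Python sketch computes the orientation score from the four strand-specific scores of a fragment pair. The function and argument names are hypothetical. It reads the reverse-strand scores as belonging to fragment-1 and fragment-2 respectively, and, as in the equation, the divergent term receives no weight.

```python
def orientation_score(f1_plus, f1_minus, f2_plus, f2_minus,
                      w_conv=0.7, w_tr=0.15, w_tl=0.15):
    """Weighted CTCF orientation score for one fragment pair."""
    f_c = f1_plus * f2_minus     # convergent: forward on fragment-1, reverse on fragment-2
    f_tr = f1_plus * f2_plus     # tandem rightward: forward on both fragments
    f_tl = f1_minus * f2_minus   # tandem leftward: reverse on both fragments
    # The divergent product (f1_minus * f2_plus) is not part of the weighted sum.
    return w_conv * f_c + w_tr * f_tr + w_tl * f_tl

# Illustrative numbers only: 0.7*6.0 + 0.15*2.0 + 0.15*3.0
v_f = orientation_score(f1_plus=2.0, f1_minus=1.0, f2_plus=1.0, f2_minus=3.0)
print(round(v_f, 2))  # 4.95
```

Keeping the weights as keyword arguments makes it straightforward to rerun the weight search described above with other settings.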

Finally, we use epigenomics data, including histone ChIP-seq and DNase-seq data, to construct the eigen Q, which can be decomposed into three feature matrices, as shown below,

$Q = G \times_1 A \times_2 B \times_3 C$

where A, B, and C are the feature matrices, obtained by eigen decomposition, of the cell types, the genomic loci of the epigenomic data, and the epigenomic datasets (such as DNase-seq data and ChIP-seq data for different histone marks), respectively. G is the core third-order tensor that links the three feature matrices. The 1-mode product $G \times_1 A$, the 2-mode product $G \times_2 B$, and the 3-mode product $G \times_3 C$ are defined as follows,

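$(G \times_1 A)_{i_1 j_2 j_3} = \sum_{j_1=1}^{J_1} g_{j_1 j_2 j_3}\, a_{i_1 j_1}$
$(G \times_2 B)_{j_1 i_2 j_3} = \sum_{j_2=1}^{J_2} g_{j_1 j_2 j_3}\, b_{i_2 j_2}$
$(G \times_3 C)_{j_1 j_2 i_3} = \sum_{j_3=1}^{J_3} g_{j_1 j_2 j_3}\, c_{i_3 j_3}$

Thus, we can get another equation as follows,

$q_{i_1 i_2 i_3} = \sum_{j_3=1}^{J_3} \sum_{j_2=1}^{J_2} \sum_{j_1=1}^{J_1} g_{j_1 j_2 j_3}\, a_{i_1 j_1}\, b_{i_2 j_2}\, c_{i_3 j_3}$

where $q_{i_1 i_2 i_3}$ is the specific value at position $(i_1, i_2, i_3)$ of Q, and $g_{j_1 j_2 j_3}$, $a_{i_1 j_1}$, $b_{i_2 j_2}$, and $c_{i_3 j_3}$ are likewise the specific values of G, A, B, and C.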
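To make the index bookkeeping concrete, the element-wise formula for Q can be written as a short, dependency-free Python sketch over nested lists. This only illustrates the triple sum; it is not how the decomposition itself is fitted, and an array library would normally be used in practice. The function name and the toy values are hypothetical.

```python
def tucker_reconstruct(G, A, B, C):
    """Rebuild Q entry by entry from the core G and the factors A, B, C.

    G is a J1 x J2 x J3 nested list; A is I1 x J1, B is I2 x J2, C is I3 x J3.
    q[i1][i2][i3] = sum over j1, j2, j3 of g[j1][j2][j3] * a[i1][j1] * b[i2][j2] * c[i3][j3]
    """
    J1, J2, J3 = len(G), len(G[0]), len(G[0][0])
    return [[[sum(G[j1][j2][j3] * A[i1][j1] * B[i2][j2] * C[i3][j3]
                  for j1 in range(J1)
                  for j2 in range(J2)
                  for j3 in range(J3))
              for i3 in range(len(C))]
             for i2 in range(len(B))]
            for i1 in range(len(A))]

# Toy example: a 1 x 1 x 1 core expanded to a 2 x 1 x 2 array.
G = [[[2.0]]]
A = [[1.0], [3.0]]   # 2 cell types x 1 component
B = [[1.0]]          # 1 locus x 1 component
C = [[1.0], [2.0]]   # 2 assays x 1 component
Q = tucker_reconstruct(G, A, B, C)
print(Q)  # [[[2.0, 4.0]], [[6.0, 12.0]]]
```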

Here we focus on the feature matrix of the genomic loci. Using dimensionality reduction, we capture the peaks that co-vary across different cell types and epigenomic datasets, infer that there is a physical association between them, and determine the type of association based on the gene regions in which the peak loci are located. We use $V_Q = h_1 \times h_2$ to define the association between two peaks, where $h_1$ and $h_2$ are the strengths of the two peaks, respectively. The final score of Ecomap-loop is defined as $V = V_C + V_F + V_Q$.

Process of 3C-based experimental datasets

To evaluate the predicted loops, we used the public Hi-C and ChIA-PET datasets of K562, GM12878, IMR90, and HepG2 to generate the positive loops. For the Hi-C datasets, the Hi-C matrices were downloaded from the 4DN data portal (https://data.4dnucleome.org/)51 at a resolution of 5 kb, and the Hi-C interactions were called by HiCCUPS52 with default parameters. Since we concentrate on the E-E, E-P, and P-P loops, we narrowed down the Hi-C interactions in a cell type-specific manner using the promoters from ENCODE and the enhancers from EnhancerAtlas. For the ChIA-PET datasets, we obtained the interactions from ENCODE in bedpe format and, as for the Hi-C data, used promoters and enhancers to narrow down the ChIA-PET interactions.

We divided the whole genome into different segments, including promoters, enhancers, and others. All the enhancer and promoter segments were combined to obtain the possible promoter-promoter, enhancer-enhancer, and promoter-enhancer pairs. For each pair, we computed a final score V, which is the sum of $V_C$, $V_F$, and $V_Q$, and ranked the possible pairs by V. The pairs with V greater than the preset threshold are regarded as loops predicted by Ecomap-loop (positive samples), and the others are regarded as loops not predicted by Ecomap-loop. Then, we regarded the pairs whose two ends both overlapped with a loop in the Hi-C data as loops validated by Hi-C experiments, and the other pairs as loops not validated by Hi-C experiments. The ChIA-PET data were processed for verification in the same way as the Hi-C data.
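The labelling scheme in this paragraph can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published implementation: intervals are taken as half-open `(chrom, start, end)` tuples, an overlap of at least 1 bp counts, a pair may match a reference loop in either anchor order, and the TP/FP/FN/TN naming is the conventional one, not taken from the text. All names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def is_validated(pair, reference_loops):
    """A pair is validated when both of its ends overlap the anchors of one reference loop."""
    left, right = pair
    return any(
        (overlaps(left, ref_l) and overlaps(right, ref_r)) or
        (overlaps(left, ref_r) and overlaps(right, ref_l))
        for ref_l, ref_r in reference_loops
    )

def confusion_counts(scored_pairs, reference_loops, threshold):
    """Count predicted/validated combinations for pairs given as (pair, v_c, v_f, v_q)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for pair, v_c, v_f, v_q in scored_pairs:
        predicted = (v_c + v_f + v_q) > threshold   # final score V = V_C + V_F + V_Q
        validated = is_validated(pair, reference_loops)
        key = ("T" if predicted == validated else "F") + ("P" if predicted else "N")
        counts[key] += 1
    return counts

# Made-up coordinates and scores for illustration.
hic_loops = [(("chr1", 1000, 6000), ("chr1", 50000, 55000))]
scored_pairs = [
    ((("chr1", 2000, 2500), ("chr1", 51000, 52000)), 1.0, 0.5, 0.2),
    ((("chr1", 2000, 2500), ("chr1", 90000, 91000)), 1.0, 0.5, 0.2),
    ((("chr1", 5500, 5900), ("chr1", 54000, 54500)), 0.1, 0.1, 0.1),
    ((("chr2", 2000, 2500), ("chr2", 51000, 52000)), 0.1, 0.0, 0.1),
]
counts = confusion_counts(scored_pairs, hic_loops, threshold=1.0)
print(counts)  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 1}
```

Sweeping `threshold` over the ranked scores and recomputing these counts is one way to obtain the points of a ROC curve.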

Process of eQTL datasets

We curate the published eQTL datasets from the eQTL Catalogue (https://www.ebi.ac.uk/eqtl/Data_access/). For K562, lymphoblastoid cell line (LCL) eQTL data are used; for GM12878, blood tissue eQTL data are used; for IMR90, lung tissue eQTL data are used; and for HepG2, liver tissue eQTL data are used. The variant-gene pair with the highest posterior inclusion probability (PIP) within each credible set is extracted as a candidate chromatin loop. We extend each variant by 1,000 bp on both sides; the extended variant regions are regarded as left anchors, and the paired target genes are regarded as right anchors.
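The selection and extension steps can be illustrated with a small Python sketch. The record fields (`credible_set`, `chrom`, `pos`, `gene`, `pip`) are hypothetical names chosen for this example and do not reproduce the actual column names of the downloaded files; clamping the left coordinate at zero is likewise an assumption.

```python
def eqtl_candidate_loops(records, flank=1000):
    """Keep the highest-PIP variant-gene pair per credible set and build anchors.

    Returns a list of (left_anchor, gene), where left_anchor is the variant
    position extended by `flank` bp on each side.
    """
    best = {}
    for rec in records:
        cs = rec["credible_set"]
        if cs not in best or rec["pip"] > best[cs]["pip"]:
            best[cs] = rec
    loops = []
    for rec in best.values():
        start = max(0, rec["pos"] - flank)   # assumption: never go below coordinate 0
        left_anchor = (rec["chrom"], start, rec["pos"] + flank)
        loops.append((left_anchor, rec["gene"]))
    return loops

# Made-up records for illustration.
records = [
    {"credible_set": "cs1", "chrom": "chr1", "pos": 5000, "gene": "GENE_A", "pip": 0.2},
    {"credible_set": "cs1", "chrom": "chr1", "pos": 5200, "gene": "GENE_A", "pip": 0.7},
    {"credible_set": "cs2", "chrom": "chr2", "pos": 300, "gene": "GENE_B", "pip": 0.9},
]
candidates = eqtl_candidate_loops(records)
print(candidates)
```

In this sketch the right anchor is kept as a gene identifier; mapping it to genomic coordinates would require a gene annotation, which is outside the scope of the example.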

Detection of active loops

Active and inactive promoters/enhancers are associated with the binding patterns of histone marks. The histone mark H3K27me3 is regarded as the sign of inactive promoters and inactive enhancers,53,54 while active enhancers have deposition of H3K27ac,55 and H3K4me3 localizes at the active promoter regions.56 In this study, active promoters are defined as the promoter regions that overlap with H3K4me3 peaks and do not overlap with H3K27me3 peaks, while active enhancers are defined as the enhancer regions enriched with H3K27ac. The active promoter-active enhancer pairs, active promoter-active promoter pairs, and active enhancer-active enhancer pairs (validated by Hi-C experimental data, ChIA-PET experimental data, etc.) are regarded as active loops. We use the intersect function of BEDTools to obtain the active enhancers by extracting the intersections of enhancers and the H3K27ac peaks for the corresponding cell types. The active promoters are obtained by extracting the intersections of promoters and the H3K4me3 peaks and then removing those that overlap with the H3K27me3 peaks.
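The definitions of active elements given here can be mirrored in plain Python. This is only a sketch of the overlap logic, not the BEDTools commands used in the study: it treats any overlap of at least 1 bp with an H3K27ac peak as "enriched", which is an assumption, and all names and coordinates are hypothetical.

```python
def overlaps_any(region, peaks):
    """True if a half-open (chrom, start, end) region overlaps any peak by >= 1 bp."""
    chrom, start, end = region
    return any(c == chrom and s < end and start < e for c, s, e in peaks)

def active_promoters(promoters, h3k4me3_peaks, h3k27me3_peaks):
    """Promoters that overlap H3K4me3 peaks and do not overlap H3K27me3 peaks."""
    return [p for p in promoters
            if overlaps_any(p, h3k4me3_peaks) and not overlaps_any(p, h3k27me3_peaks)]

def active_enhancers(enhancers, h3k27ac_peaks):
    """Enhancers that overlap at least one H3K27ac peak."""
    return [e for e in enhancers if overlaps_any(e, h3k27ac_peaks)]

# Made-up intervals for illustration.
promoters = [("chr1", 1000, 2000), ("chr1", 5000, 6000), ("chr2", 100, 900)]
h3k4me3 = [("chr1", 1500, 1800), ("chr1", 5500, 5600)]
h3k27me3 = [("chr1", 5900, 6500)]
enhancers = [("chr1", 20000, 20500), ("chr1", 30000, 30400)]
h3k27ac = [("chr1", 20400, 21000)]

act_prom = active_promoters(promoters, h3k4me3, h3k27me3)
act_enh = active_enhancers(enhancers, h3k27ac)
print(act_prom)  # [('chr1', 1000, 2000)]
print(act_enh)   # [('chr1', 20000, 20500)]
```

The linear scan over peaks is quadratic in the worst case and is kept only for readability; an interval index or BEDTools itself is preferable for genome-scale peak sets.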

Quantification and statistical analysis

Data were analyzed using Python. Details of specific statistical analyses are included in the main text. The AUC curves were generated using Matplotlib in Python. The bar graphs, box plots, and arc graphs were generated with the R package ggplot2. For differences between the peak signals of histone marks, we used the Wilcoxon test to calculate the p value. Statistical significance was defined as p < 0.05.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant no. 62225209) [M.L.], the science and technology innovation program of Hunan Province (2021RC4008) [M.L.], and the Fundamental Research Funds for the Central Universities of Central South University (2021zzts0203) [L.T.]. We are grateful to the High-Performance Computing Center of Central South University for partial support of this work.

Author contributions

L.T. and M.L. conceived the presented idea. Y.D. and L.T. collected the data and designed the model. Y.D. wrote the source code. X.Z. helped improve the bioinformatics analysis. W.W. helped organize the code. L.T. and M.L. aided in interpreting the results and provided input on the data presentation. All authors provided critical feedback and helped shape the research, analysis, and manuscript.

Declaration of interests

The authors declare no competing interests.

Published: December 22, 2022

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2022.105687.

Supplemental information

Document S1. Figures S1, S2 and Tables S1, S2
mmc1.pdf (676.1KB, pdf)

Data and code availability

This paper analyzes existing, publicly available data. The accession URLs for the datasets are listed in the key resources table. The accession numbers of the public epigenomics datasets used in this study are shown in Table S2.

Source code and tutorials are publicly available online at https://github.com/CSUBioGroup/Ecomap-loop.

Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

References

  • 1. Lupiáñez D.G., Kraft K., Heinrich V., Krawitz P., Brancati F., Klopocki E., Horn D., Kayserili H., Opitz J.M., Laxova R., et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell. 2015;161:1012–1025. doi: 10.1016/j.cell.2015.04.004.
  • 2. Hnisz D., Weintraub A.S., Day D.S., Valton A.-L., Bak R.O., Li C.H., Goldmann J., Lajoie B.R., Fan Z.P., Sigova A.A., et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science. 2016;351:1454–1458. doi: 10.1126/science.aad9024.
  • 3. Zheng H., Xie W. The role of 3D genome organization in development and cell differentiation. Nat. Rev. Mol. Cell Biol. 2019;20:535–550. doi: 10.1038/s41580-019-0132-4.
  • 4. Naumova N., Imakaev M., Fudenberg G., Zhan Y., Lajoie B.R., Mirny L.A., Dekker J. Organization of the mitotic chromosome. Science. 2013;342:948–953. doi: 10.1126/science.1236083.
  • 5. Zhang H., Lam J., Zhang D., Lan Y., Vermunt M.W., Keller C.A., Giardine B., Hardison R.C., Blobel G.A. CTCF and transcription influence chromatin structure re-configuration after mitosis. Nat. Commun. 2021;12:5157. doi: 10.1038/s41467-021-25418-5.
  • 6. Dekker J., Rippe K., Dekker M., Kleckner N. Capturing chromosome conformation. Science. 2002;295:1306–1311. doi: 10.1126/science.1067799.
  • 7. Simonis M., Klous P., Splinter E., Moshkin Y., Willemsen R., de Wit E., de Laat W., van Steensel B. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture–on-chip (4C). Nat. Genet. 2006;38:1348–1354. doi: 10.1038/ng1896.
  • 8. Dostie J., Richmond T.A., Arnaout R.A., Selzer R.R., Lee W.L., Honan T.A., Rubio E.D., Krumm A., Lamb J., Nusbaum C., et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 2006;16:1299–1309. doi: 10.1101/gr.5571506.
  • 9. Lieberman-Aiden E., van Berkum N.L., Williams L., Imakaev M., Ragoczy T., Telling A., Amit I., Lajoie B.R., Sabo P.J., Dorschner M.O., et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
  • 10. Rao S.S.P., Huntley M.H., Durand N.C., Stamenova E.K., Bochkov I.D., Robinson J.T., Sanborn A.L., Machol I., Omer A.D., Lander E.S., Aiden E.L. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
  • 11. Fullwood M.J., Liu M.H., Pan Y.F., Liu J., Xu H., Mohamed Y.B., Orlov Y.L., Velkov S., Ho A., Mei P.H., et al. An oestrogen-receptor-α-bound human chromatin interactome. Nature. 2009;462:58–64. doi: 10.1038/nature08497.
  • 12. Tang Z., Luo O.J., Li X., Zheng M., Zhu J.J., Szalaj P., Trzaskoma P., Magalska A., Wlodarczyk J., Ruszczycki B., et al. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell. 2015;163:1611–1627. doi: 10.1016/j.cell.2015.11.024.
  • 13. Mumbach M.R., Rubin A.J., Flynn R.A., Dai C., Khavari P.A., Greenleaf W.J., Chang H.Y. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods. 2016;13:919–922. doi: 10.1038/nmeth.3999.
  • 14. Fang R., Yu M., Li G., Chee S., Liu T., Schmitt A.D., Ren B. Mapping of long-range chromatin interactions by proximity ligation-assisted ChIP-seq. Cell Res. 2016;26:1345–1348. doi: 10.1038/cr.2016.137.
  • 15. Kumar S., Kaur S., Seem K., Kumar S., Mohapatra T. Understanding 3D genome organization and its effect on transcriptional gene regulation under environmental stress in plant: a chromatin perspective. Front. Cell Dev. Biol. 2021;9:774719. doi: 10.3389/fcell.2021.774719.
  • 16. Tang L., Zhong Z., Lin Y., Yang Y., Wang J., Martin J.F., Li M. EPIXplorer: a web server for prediction, analysis and visualization of enhancer-promoter interactions. Nucleic Acids Res. 2022;50:W290–W297. doi: 10.1093/nar/gkac397.
  • 17. Tao H., Li H., Xu K., Hong H., Jiang S., Du G., Wang J., Sun Y., Huang X., Ding Y., et al. Computational methods for the prediction of chromatin interaction and organization using sequence and epigenomic profiles. Brief Bioinform. 2021;22:bbaa405. doi: 10.1093/bib/bbaa405.
  • 18. Whalen S., Truty R.M., Pollard K.S. Enhancer–promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat. Genet. 2016;48:488–496. doi: 10.1038/ng.3539.
  • 19. Cao Q., Anyansi C., Hu X., Xu L., Xiong L., Tang W., Mok M.T.S., Cheng C., Fan X., Gerstein M., et al. Reconstruction of enhancer-target networks in 935 samples of human primary cells, tissues and cell lines. Nat. Genet. 2017;49:1428–1436. doi: 10.1038/ng.3950.
  • 20. Belokopytova P.S., Nuriddinov M.A., Mozheiko E.A., Fishman D., Fishman V. Quantitative prediction of enhancer–promoter interactions. Genome Res. 2020;30:72–84. doi: 10.1101/gr.249367.119.
  • 21. Tang L., Hill M.C., Wang J., Wang J., Martin J.F., Li M. Predicting unrecognized enhancer-mediated genome topology by an ensemble machine learning model. Genome Res. 2020;30:1835–1845. doi: 10.1101/gr.264606.120.
  • 22. Ernst J., Kheradpour P., Mikkelsen T.S., Shoresh N., Ward L.D., Epstein C.B., Zhang X., Wang L., Issner R., Coyne M., et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49. doi: 10.1038/nature09906.
  • 23. Thurman R.E., Rynes E., Humbert R., Vierstra J., Maurano M.T., Haugen E., Sheffield N.C., Stergachis A.B., Wang H., Vernot B., et al. The accessible chromatin landscape of the human genome. Nature. 2012;489:75–82. doi: 10.1038/nature11232.
  • 24. Corradin O., Saiakhova A., Akhtar-Zaidi B., Myeroff L., Willis J., Cowper-Sal·lari R., Lupien M., Markowitz S., Scacheri P.C. Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome Res. 2014;24:1–13. doi: 10.1101/gr.164079.113.
  • 25. Zhu Y., Chen Z., Zhang K., Wang M., Medovoy D., Whitaker J.W., Ding B., Li N., Zheng L., Wang W. Constructing 3D interaction maps from 1D epigenomes. Nat. Commun. 2016;7:10812. doi: 10.1038/ncomms10812.
  • 26. Sanborn A.L., Rao S.S.P., Huang S.-C., Durand N.C., Huntley M.H., Jewett A.I., Bochkov I.D., Chinnappan D., Cutkosky A., Li J., et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. USA. 2015;112:E6456–E6465. doi: 10.1073/pnas.1518552112.
  • 27. Peters J.-M., Tedeschi A., Schmitz J. The cohesin complex and its roles in chromosome biology. Genes Dev. 2008;22:3089–3114. doi: 10.1101/gad.1724308.
  • 28. Mumbach M.R., Satpathy A.T., Boyle E.A., Dai C., Gowen B.G., Cho S.W., Nguyen M.L., Rubin A.J., Granja J.M., Kazane K.R., et al. Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat. Genet. 2017;49:1602–1612. doi: 10.1038/ng.3963.
  • 29. de Wit E., Vos E.S.M., Holwerda S.J.B., Valdes-Quezada C., Verstegen M.J., Teunissen H., Splinter E., Wijchers P.J., Krijger P.H.L., de Laat W. CTCF binding polarity determines chromatin looping. Mol. Cell. 2015;60:676–684. doi: 10.1016/j.molcel.2015.09.023.
  • 30. Ghirlando R., Felsenfeld G. CTCF: making the right connections. Genes Dev. 2016;30:881–891. doi: 10.1101/gad.277863.116.
  • 31. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
  • 32. Fulco C.P., Munschauer M., Anyoha R., Munson G., Grossman S.R., Perez E.M., Kane M., Cleary B., Lander E.S., Engreitz J.M. Systematic mapping of functional enhancer–promoter connections with CRISPR interference. Science. 2016;354:769–773. doi: 10.1126/science.aag2445.
  • 33. Maurano M.T., Humbert R., Rynes E., Thurman R.E., Haugen E., Wang H., Reynolds A.P., Sandstrom R., Qu H., Brody J., et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. doi: 10.1126/science.1222794.
  • 34. Gilad Y., Rifkin S.A., Pritchard J.K. Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet. 2008;24:408–415. doi: 10.1016/j.tig.2008.06.001.
  • 35. Morley M., Molony C.M., Weber T.M., Devlin J.L., Ewens K.G., Spielman R.S., Cheung V.G. Genetic analysis of genome-wide variation in human gene expression. Nature. 2004;430:743–747. doi: 10.1038/nature02797.
  • 36. Klann T.S., Black J.B., Chellappan M., Safi A., Song L., Hilton I.B., Crawford G.E., Reddy T.E., Gersbach C.A. CRISPR–Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat. Biotechnol. 2017;35:561–568. doi: 10.1038/nbt.3853.
  • 37. Wang Y., Song F., Zhang B., Zhang L., Xu J., Kuang D., Li D., Choudhary M.N.K., Li Y., Hu M., et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 2018;19:151. doi: 10.1186/s13059-018-1519-9.
  • 38. Zhou Y., Zhou B., Pache L., Chang M., Khodabakhshi A.H., Tanaseichuk O., Benner C., Chanda S.K. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 2019;10:1523. doi: 10.1038/s41467-019-09234-6.
  • 39. Hnisz D., Day D.S., Young R.A. Insulated neighborhoods: structural and functional units of mammalian gene control. Cell. 2016;167:1188–1200. doi: 10.1016/j.cell.2016.10.024.
  • 40. Ren G., Jin W., Cui K., Rodrigez J., Hu G., Zhang Z., Larson D.R., Zhao K. CTCF-mediated enhancer-promoter interaction is a critical regulator of cell-to-cell variation of gene expression. Mol. Cell. 2017;67:1049–1058.e6. doi: 10.1016/j.molcel.2017.08.026.
  • 41. Oti M., Falck J., Huynen M.A., Zhou H. CTCF-mediated chromatin loops enclose inducible gene regulatory domains. BMC Genomics. 2016;17:252. doi: 10.1186/s12864-016-2516-6.
  • 42. Matthews B.J., Waxman D.J. Computational prediction of CTCF/cohesin-based intra-TAD loops that insulate chromatin contacts and gene expression in mouse liver. eLife. 2018;7:e34077. doi: 10.7554/elife.34077.
  • 43. Frankish A., Diekhans M., Jungreis I., Lagarde J., Loveland J.E., Mudge J.M., Sisu C., Wright J.C., Armstrong J., Barnes I., et al. GENCODE 2021. Nucleic Acids Res. 2021;49:D916–D923. doi: 10.1093/nar/gkaa1087.
  • 44. Gao T., Qian J. EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species. Nucleic Acids Res. 2020;48:D58–D64. doi: 10.1093/nar/gkz980.
  • 45. Roadmap Epigenomics Consortium, Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J., Ziller M.J., et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
  • 46. Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
  • 47. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
  • 48. Ramírez F., Ryan D.P., Grüning B., Bhardwaj V., Kilpert F., Richter A.S., Heyne S., Dündar F., Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.
  • 49. Grant C.E., Bailey T.L., Noble W.S. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064.
  • 50. Castro-Mondragon J.A., Riudavets-Puig R., Rauluseviciute I., Lemma R.B., Turchi L., Blanc-Mathieu R., Lucas J., Boddie P., Khan A., Manosalva Pérez N., et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2022;50:D165–D173. doi: 10.1093/nar/gkab1113.
  • 51. Reiff S.B., Schroeder A.J., Kırlı K., Cosolo A., Bakker C., Mercado L., Lee S., Veit A.D., Balashov A.K., Vitzthum C., et al. The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data. Nat. Commun. 2022;13:2365. doi: 10.1038/s41467-022-29697-4.
  • 52. Durand N.C., Shamim M.S., Machol I., Rao S.S.P., Huntley M.H., Lander E.S., Aiden E.L. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
  • 53. Lee M.G., Villa R., Trojer P., Norman J., Yan K.-P., Reinberg D., Di Croce L., Shiekhattar R. Demethylation of H3K27 regulates polycomb recruitment and H2A ubiquitination. Science. 2007;318:447–450. doi: 10.1126/science.1149042.
  • 54. Herz H.-M., Mohan M., Garruss A.S., Liang K., Takahashi Y.H., Mickey K., Voets O., Verrijzer C.P., Shilatifard A. Enhancer-associated H3K4 monomethylation by Trithorax-related, the Drosophila homolog of mammalian Mll3/Mll4. Genes Dev. 2012;26:2604–2620. doi: 10.1101/gad.201327.112.
  • 55. Creyghton M.P., Cheng A.W., Welstead G.G., Kooistra T., Carey B.W., Steine E.J., Hanna J., Lodato M.A., Frampton G.M., Sharp P.A., et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. USA. 2010;107:21931–21936. doi: 10.1073/pnas.1016071107.
  • 56. Allis C.D., Jenuwein T. The molecular hallmarks of epigenetic control. Nat. Rev. Genet. 2016;17:487–500. doi: 10.1038/nrg.2016.59.



Articles from iScience are provided here courtesy of Elsevier
