iScience. 2025 Jun 9;28(7):112855. doi: 10.1016/j.isci.2025.112855

Chromatin interaction-based annotation of regulatory elements reveals dynamic promoter-enhancer interactions in lymphocyte development

Johanna Tingvall-Gustafsson 1,2, Christina T Jensen 1, Jonas Ungerbäck 1,4, Mikael Sigvardsson 1,2,3,5
PMCID: PMC12275072  PMID: 40687796

Summary

Stage- and lineage-specific gene expression patterns are controlled by a complex interplay between transcription factors, the epigenetic landscape, and the three-dimensional (3D) structure of the DNA. The 3D structure allows for the formation of DNA loops that juxtapose distal regulatory elements (DREs) with promoters, allowing for tight control of gene expression. Having developed a tool to facilitate the exploration of complex gene regulatory networks based on chromosome configuration data in early lymphocytes, we show that lineage-specific transcription factors target regulatory elements annotated to both lineage-specific and broadly expressed genes. Several regulatory elements annotated to lineage-specific genes were also annotated to alternative promoters in a context-dependent manner, revealing a highly complex interplay between promoters and DREs in early lymphocyte development. These data highlight how efficient annotation procedures for linking distal regulatory elements to target genes provide valuable insights into gene regulatory networks.

Subject areas: Chromosome organization, Molecular genetics, Epigenetics, Components of the immune system

Highlights

  • ICE-A interaction-based annotation facilitates gene regulatory network exploration

  • Lineage-specific TFs target both lineage-restricted and broadly expressed genes

  • Lymphocyte development is associated with a highly dynamic genome 3D organization

  • EBF1 is a key mediator of the promoter-enhancer landscape in B cell development



Introduction

Differentiation of hematopoietic stem cells into highly differentiated blood cells requires the orchestrated action of transcription factors (TFs), establishing gene regulatory networks (GRNs). The TF networks that control the separation of the B- and T-lymphoid lineages have been extensively studied, and several key regulators of fate determination have been identified. T-lineage cell fate is dependent upon the TFs TCF7, GATA3, and BCL11B, which act in a functional hierarchy to activate T-lineage-specific gene expression programs and establish a stable commitment to the T-lineage cell fate.1,2,3,4,5 In a similar manner, the TFs TCF3, EBF1, and PAX5 act in a functional hierarchy to establish B-cell fate in lymphoid progenitors.6,7,8,9,10,11,12 While several key regulators of lymphoid cell fate determination have been identified, it remains to be resolved how these factors regulate transcription from target genes to drive lineage commitment and dictate the cell fates of multipotent progenitor cells. Although the binding sites for TFs in the genome can be identified, most of these cannot easily be assigned to any specific target gene using conventional annotation approaches.6,8,10,11,13

Regulatory elements make up around 6% of the human genome, and it has been estimated that each promoter can interact with on average 5–50 distal regulatory elements (DREs).14,15 While most of the regulatory elements that affect transcription in budding yeast are located within a few hundred base pairs (bp) of their target promoters,16 DREs located several kilobases (kb) from their promoter elements have been identified in Drosophila.17 These elements, however, act mainly on proximal genes.17 In mammals, it has been estimated that the average distance between promoters and DREs lies in the range of 100–500 kb,14,18,19,20 and only 27%–60% of these DREs act on their most-proximal promoter.21,22,23 It has been suggested that the ability of DREs to act over such long distances reflects the fact that the genome is arranged in a highly organized and dynamic three-dimensional (3D) architecture.24,25 This architecture comprises large-scale structures of chromosomal territories, compartments and topologically associated domains (TADs), as well as the smaller-scale structures of chromatin loops.26,27 These structures can serve to physically connect various DREs, such as enhancer elements, to promoters,28,29 and it has been proposed that they are important for functional gene regulation.30,31,32,33 Even though the classical model of static promoter-enhancer interactions has been challenged,34,35,36 most models predict that the functional interplay between regulatory elements involves proximity in 3D space at some point in the activation process.29

Even though it is well established that DREs in humans and mice act over long distances and not always on the most-proximal gene,21,22,23 the standard methods used for the annotation of DREs to target genes are based on proximity on the linear DNA strand. Improvements have been made to the standard proximity-based annotation to facilitate the annotation of more distal peaks. These include the Genomic Regions Enrichment of Annotations Tool (GREAT),37 which, by introducing a concept of gene regulatory domains surrounding the transcription start site (TSS) and extending the annotation to domains of neighboring genes, allows for increased inter-element distances and the assignment of more than one gene to each DRE. However, all proximity-based methods are highly dependent upon the local gene density and are restricted by an upper distance limit for the identification of regulatory elements (Figures S1A and S1B). This local gene density dependence, combined with the inability to incorporate the dynamic aspect of chromatin organization into the annotation, highlights the need for improved annotation methods for DREs. As an alternative to proximity-based annotation, methods to link gene expression to DRE accessibility have been employed.38 While this approach creates a functional link between a gene and a regulatory element, the method assumes that DRE activity is always reflected in changes in chromatin accessibility. With advancements in methods that capture 3D chromatin organization (e.g., Hi-C, HiChIP), interaction-based assignment of target genes to DREs has emerged as an alternative solution. Interaction-based annotation is not constrained by the simplistic linear representation of the genome and can incorporate cell type-specific or dynamic aspects of genome organization into the annotation process.
There are also publicly available databases for chromatin interaction data,39 ensuring that interaction datasets are readily accessible for use in annotation. To address the need for easy-to-use tools that can improve the annotation of DREs, we have generated an Interaction-based Cis-regulatory Element Annotator (ICE-A) (Figure S1C). ICE-A is based on the widely used and reproducibility-focused workflow management system Nextflow.40 It can, with a single command, perform interaction-based annotation of one or more sets of genomic regions (e.g., peak files from an ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) or ChIP-seq experiment). In addition, ICE-A performs interaction-centered annotation and provides several options for visualization of the results. Using this tool, we have investigated the regulatory landscapes in lymphoid progenitor cells. This has revealed that lineage-restricted TFs frequently bind to cis-regulatory elements that are annotated to broadly expressed genes, several of which are essential for B-lymphocyte development. Furthermore, regulatory elements annotated to lineage-specific genes could be annotated to alternative promoters in a context-dependent manner. The results show how interaction-based annotation of regulatory elements has the potential to significantly increase our understanding of GRNs in health and disease.

Results

Interaction-based annotation allows for cell type-specific identification of cis-regulatory elements

Current standards for the annotation of DREs to target genes are often based on the concept of proximity, defined as the distance between the DRE and the target promoter in the linear DNA (Figure S1D). Thus, the abilities of the conventional methods for annotation of DREs, such as proximity and GREAT, are dependent upon the local gene density and distances to neighboring genes. To determine how this affects the possibility to identify distal elements, we calculated the theoretical upper distance limit for assigning distal elements to each gene in the human (hg38) and mouse (mm10) genomes. Based on the method-specific annotation rules (i.e., half the distance to a neighboring gene for the standard proximity annotation and the default basal plus extension rule in GREAT), we determined the median distance threshold to be 35 kb or 67 kb for the mouse genome and 47 kb or 91 kb for the human genome, for the standard proximity annotation and GREAT, respectively. If directionality is considered, the ability to identify distal gene regulatory elements is even more restricted in terms of upstream and downstream distances from the TSS (Figure S1A). Even though there are gaps in the knowledge regarding DREs, estimates suggest that the median distance for enhancer-promoter pairs is in the range of 100–500 kb.14,18,19,20 Taking the constraints associated with proximity-based annotation into consideration, it follows that using the rather conservative estimate of 100 kb, it is possible to identify DREs for less than one-third of the genes in the human genome when applying the standard proximity-based annotation (Figure S1B). Thus, there is strong motivation to develop more advanced methods for the assignment of DREs to target genes.
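The distance limit implied by the standard proximity rule can be illustrated with a minimal Python sketch. It assumes a single TSS coordinate per gene on one chromosome and, as a simplification, takes the smaller of the upstream and downstream half-distances to the neighboring TSS. The function name and the toy coordinates are hypothetical; the sketch does not reproduce the calculation behind Figure S1A, nor the basal plus extension rule in GREAT.

```python
from statistics import median


def proximity_limits(tss_positions):
    """Return, for each TSS, half the distance to its nearest neighboring TSS.

    Illustrative assumption: all positions lie on one chromosome and each
    gene is represented by a single TSS coordinate (bp).
    """
    ordered = sorted(tss_positions)
    limits = {}
    for i, pos in enumerate(ordered):
        gaps = []
        if i > 0:
            gaps.append(pos - ordered[i - 1])
        if i < len(ordered) - 1:
            gaps.append(ordered[i + 1] - pos)
        # A lone gene has no neighbor, so no finite limit applies.
        limits[pos] = min(gaps) / 2 if gaps else float("inf")
    return limits


# Hypothetical TSS coordinates (bp), not taken from hg38 or mm10.
toy_tss = [10_000, 60_000, 200_000, 210_000]
limits = proximity_limits(toy_tss)
median_limit = median(limits.values())
```

In this toy example the two genes that sit 10 kb apart can only be assigned elements within 5 kb, which illustrates why the limit shrinks in gene-dense regions.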

To facilitate the annotation of DREs to their target genes, we developed ICE-A (Figures S1C and S1D), based on the reproducibility-focused workflow management system Nextflow.40 ICE-A incorporates chromatin interaction data into the annotation process using 2D-bed (bedpe) files, making it compatible with several different interaction-calling software tools. To counteract some of the negative features associated with the use of predefined interactions for peak annotation, such as bin size and minimum distance for interaction calling, ICE-A allows for combined usage of interaction-based and proximity-based peak annotation. The main output from ICE-A is a single text file for each provided input bed file with information about each annotated element, including the gene symbol, entrezID, distance to the TSS for each assigned target gene, type of annotation used (proximal or interaction-based), and the interaction score from the 2D-bed file (if provided). To allow for the incorporation of different types of data, ICE-A has three different modes. The Basic mode performs interaction-based annotation of one or more individual bed files. The Multiple mode is suitable in situations where the overlap between sets of regions (e.g., co-occupancy of multiple transcriptional regulators) is of interest for the analysis. In addition to performing annotation of every set of regions, the Multiple mode identifies and visualizes overlaps in the form of an UpSet plot or an interaction-based Circos plot. The third mode integrates gene expression data, allowing for annotated elements to be associated with changes in gene expression. As ICE-A works with pre-processed data files, run times are short. A run using Basic mode with four peak files takes approximately two minutes and, in Multiple mode, including generation of an UpSet plot, eight minutes on an eight-core laptop.
Thus, ICE-A is an analysis tool that facilitates the integration of different types of sequence-based omics data, making it highly suitable for the analysis of gene regulatory elements (GREs).
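The principle of combining interaction-based and proximity annotation can be sketched as follows. This is a simplified illustration and not the ICE-A implementation: the data layout, the function names, and the 2.5 kb promoter window are assumptions introduced here, and real bedpe files would first need to be parsed into the anchor tuples used below.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals on the same chromosome overlap."""
    return a_start < b_end and b_start < a_end


def annotate_peak(peak, interactions, tss_by_gene, promoter_window=2500):
    """Assign target genes to one peak (hypothetical helper).

    peak: (chrom, start, end)
    interactions: list of (anchor1, anchor2, score), where each anchor is
        (chrom, start, end), mirroring the two anchors of a bedpe record.
    tss_by_gene: {gene: (chrom, tss_position)}
    promoter_window: assumed half-width (bp) of the promoter around a TSS.
    Returns a list of (gene, annotation_type, score_or_None).
    """
    def promoter_genes(region):
        chrom, start, end = region
        return [
            gene
            for gene, (tss_chrom, tss) in tss_by_gene.items()
            if tss_chrom == chrom
            and overlaps(start, end, tss - promoter_window, tss + promoter_window)
        ]

    hits = []
    # Proximal annotation: the peak itself overlaps a promoter window.
    for gene in promoter_genes(peak):
        hits.append((gene, "proximal", None))
    # Interaction-based annotation: the peak overlaps one anchor and the
    # partner anchor overlaps a promoter window.
    for anchor1, anchor2, score in interactions:
        for near, far in ((anchor1, anchor2), (anchor2, anchor1)):
            if near[0] == peak[0] and overlaps(peak[1], peak[2], near[1], near[2]):
                for gene in promoter_genes(far):
                    hits.append((gene, "interaction", score))
    return hits


# Hypothetical toy data.
tss_by_gene = {"GeneA": ("chr1", 100_000), "GeneB": ("chr1", 500_000)}
interactions = [(("chr1", 95_000, 105_000), ("chr1", 400_000, 410_000), 12.5)]
distal_hits = annotate_peak(("chr1", 402_000, 402_500), interactions, tss_by_gene)
promoter_hits = annotate_peak(("chr1", 99_900, 100_100), interactions, tss_by_gene)
```

In the toy data, the distal peak lies about 98 kb from the GeneB TSS and about 302 kb from the GeneA TSS, so a nearest-gene rule would pick GeneB, whereas the interaction links it to GeneA.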

To compare the performances of ICE-A and proximity-based methods for the basic annotation of DREs to target genes, we used CRISPRi-FlowFISH data from functionally validated enhancer-gene pairs in the myelogenous leukemia cell line K562.41 In that study, the functionalities of enhancers based on their involvement in transcription were determined in an unbiased way, by evaluating the functional impacts of all open chromatin regions within 450 kb of the TSS as potential regulatory elements.41 The interaction-based annotation of distal elements with ICE-A was based on identification of significant interactions called from H3K27ac HiChIP data (GSM2705043-45) generated from K562 cells.42 This combination of datasets has previously been used for benchmarking of other tools for chromatin interaction processing and analysis,43,44 as H3K27ac can be found at the majority of the functional enhancer elements (Figure S1E). All the evaluated annotation tools identified most of the regulatory elements that were located <10 kb from the TSS (Figure 1A). However, for the annotation of distally located enhancers (>50 kb from the TSS), ICE-A was superior to the two proximity-based annotation methods (Figure 1A). ICE-A identified at least one distal enhancer (>10 kb from the target TSS) for 15 of the 17 genes with experimentally validated distal enhancers, as compared with 5 and 0 genes for the GREAT and HOMER proximity annotation, respectively (Figure 1B). Using ICE-A with the proximity annotation function inactivated revealed that about 30% of the elements located within 25 kb of the TSS lost annotation (Figure 1D), resulting in reduced efficiency in DRE annotation (Figure 1E). The effect on annotation of elements located more than 25 kb from the TSS was marginal (Figures 1D and 1E). The advantage of using ICE-A interaction-based annotation over the proximity-based methods is exemplified by the GATA1 locus (Figure 1C).
ICE-A identified three out of the five CRISPRi-FlowFISH-validated enhancers, including a distal element located 400 kb downstream of the TSS, as compared to zero and one element for the standard proximity-based annotation and GREAT, respectively. To estimate the precision of the different methods for DRE annotation, we determined the fraction of annotated elements that were functionally relevant in the context of the K562 cells (Figure 1F). Using conventional proximity annotation, about 14% of the distal enhancers located within 25 kb of the TSS were proven to be functionally relevant. With GREAT annotation, this frequency increased to 25%, while ICE-A in a default setting identified 20% functional elements. Exploring the ability to annotate elements located more than 25 kb from the TSS revealed that only ICE-A could efficiently identify such elements, with a precision of about 7% in a default setting (Figure 1F). The frequency of functional distal enhancers annotated could be increased by focusing on the highest ranked elements (Figure 1F). Hence, even if the precision of ICE-A was somewhat lower than that of GREAT in the identification of functional elements located within 25 kb of the TSS, ICE-A clearly outperformed the other methods in the identification of more distally located elements. Thus, the ability of ICE-A to identify a higher number of functional regulatory elements (Figure 1B) does not come with a dramatic reduction in precision.
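The two quantities compared above, the fraction of validated enhancers that are recovered and the fraction of annotated elements that are functional, can be written out explicitly. The sketch below uses hypothetical element-gene pairs and a hypothetical function name; it is an illustration of the definitions under those assumptions, not the evaluation code used for Figure 1.

```python
def recall_and_precision(annotated_pairs, validated_pairs):
    """Compare annotated (element, gene) pairs with validated ones.

    recall: fraction of validated pairs that the method annotated.
    precision: fraction of annotated pairs that are validated.
    """
    annotated = set(annotated_pairs)
    validated = set(validated_pairs)
    true_hits = annotated & validated
    recall = len(true_hits) / len(validated) if validated else 0.0
    precision = len(true_hits) / len(annotated) if annotated else 0.0
    return recall, precision


# Hypothetical example: five validated enhancers for one gene; the method
# annotates three of them plus seven elements without a measured effect.
validated = {(f"e{i}", "GENE") for i in range(1, 6)}
annotated = [("e1", "GENE"), ("e2", "GENE"), ("e3", "GENE")]
annotated += [(f"x{i}", "GENE") for i in range(7)]
recall, precision = recall_and_precision(annotated, validated)
```

A method can raise the first number simply by annotating more elements, which is why both numbers are reported together.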

Figure 1.


Interaction-based annotation of DREs has advantages over proximity-based approaches

(A–C) CRISPRi-FlowFISH-validated enhancers from Fulco et al. (2019)41 are used for comparisons of the HOMER proximity annotation, GREAT and ICE-A, with respect to their abilities to annotate distal elements. For the ICE-A interaction-based annotation, FitHiChIP significant interactions (q-value ≤0.05) based on H3K27ac HiChIP (GSM2705043-45) and ChIP-seq (GSM733656) data were used. (A) Fractions of validated enhancers correctly annotated to their corresponding target genes using the different annotation methods. The percentages of identified enhancers are reported for different distance intervals from the relevant TSS. (B) Comparisons of the numbers and percentages of identified functional distal enhancers (>10 kb from the TSS) per gene, for the HOMER proximity annotation, GREAT and ICE-A. (C) Example of enhancer annotation at the GATA1 locus. The H3K27ac ChIP-seq, CRISPRi-FlowFISH-validated GATA1 enhancers, and H3K27ac HiChIP (filtered for validated enhancer-promoter interactions) data are visualized in the WashU Epigenome Browser. For each annotation method, the identified enhancers are presented.

(D–F) Average number (D) and percentage (E) of significant enhancers from Fulco et al. identified per gene using different annotation methods. (F) Average percentage of total elements assigned to each gene with a significant impact on target gene expression. Annotation methods explored in D–F are as follows: Proximity (HOMER), GREAT, ICE-A default, ICE-A no proximity (annotations based on proximity annotation excluded), ICE-A top 50% (top 50% most significant interactions), ICE-A top 25% (top 25% most significant interactions), ICE-A top 5% (top 5% most significant interactions).

To test the ability of ICE-A to identify functional enhancer elements in a different cellular context, we analyzed an independent dataset assigning regulatory elements to essential genes in cell lines45 (Figure S1F). ICE-A identified 2–3 times as many functional enhancer-gene pairs in the colon cancer cell line (HCT116) as GREAT or the proximity annotation method (Figure S1G). Basing the analysis on HiChIP data from a lung carcinoma cell line (A549), or using ICE-A without proximity annotation, impaired the ability of ICE-A to identify relevant control elements (Figure S1G). As for the analysis of the K562 dataset, the larger number of enhancer-gene pairs identified using ICE-A did not impair the precision of the annotation analysis (Figure S1H). Thus, ICE-A allows for a more comprehensive annotation of DREs than the current standard methodology.

ICE-A identifies key regulatory networks in lymphocyte development

The 3D structure of the genome is not static, and chromatin interactions linking a distal enhancer to a target gene can be dependent upon the cellular state or developmental stage.46,47,48,49 One advantage of the chromatin interaction-based annotation is the ability to incorporate this dynamic aspect of genome organization into the annotation process, allowing for the identification of target genes for DREs in a cell type-specific manner. Having developed an efficient tool for interaction-based annotation of DREs, we used lineage-specific target gene assignment of DREs to explore the GRNs active in early lymphocyte development. To this end, we took advantage of the gene expression data for hematopoietic cells generated within the frame of the Immgen consortium to identify genes that are selectively expressed in B cell development, as compared to those expressed in T cell development (Figure 2A; Table S1B). To annotate regulatory elements to these genes in their respective cell types, we used the ICE-A annotation based on our previously published H3K4me3 HiChIP data from the B-cell progenitor cell line 230–238 (GSM4964247). For the annotation of elements to T-lineage genes, we generated H3K4me3 HiChIP data from the T cell progenitor cell line (Scid.adh.2C2). Exploring the ATAC accessibility of DREs annotated to B-lineage-restricted genes (B-DREs) in B-cell progenitors (pro-B cells), as compared to T cell progenitors (pro-T cells), revealed generally higher accessibility in B-lymphoid cells (Figure 2B). The opposite was observed for DREs annotated to T-lineage genes (T-DREs). Even though the majority of the elements were identified based on the interaction data (Figure S2A), a set of elements displaying lineage-restricted accessibility was identified via the proximity function of ICE-A (Figure S2B). These data suggest that ICE-A can be used to identify lineage-restricted regulatory elements.

Figure 2.


ICE-A interaction-based annotation can identify lineage-specific DREs

(A) Heatmap with z-scores of the normalized gene expression counts of early lymphoid progenitors from the Immgen consortium (GSE100738). Genes with significantly different expression patterns between FrBC B-cell progenitors and DN2b T cell progenitors are shown. K-means clustering was used to define the B- and T cell progenitor-specific gene sets.

(B) Violin plot of normalized ATAC-seq signals of distal elements annotated to a B- or T-lineage gene by ICE-A interaction-based annotation, with H3K4me3 PLAC-seq data from a cell line matching the cell-type-specific gene sets. The ATAC-seq signal levels are compared between B- (230–238) and T- (Scid.adh.2C2) cell progenitor cell lines. Statistical analyses are based on the Mann–Whitney U-test, ∗∗∗∗p < 0.0001.

(C and D) Top-rated enriched motifs based on HOMER de novo motif analysis for distal B- (C) and T- (D) cell elements. Bold text indicates lineage specific transcription factors. B- and T-lineage elements are defined as open chromatin regions annotated to B- or T-lineage genes sets using ICE-A with H3K4me3 PLAC-seq interaction in 230–238 and Scid.adh.2C2 cells, respectively.

(E and F) Output from a transcription factor (TF) co-occupancy analysis in ICE-A run in multiple mode, including an UpSet plot of TF co-occupancy of distal elements and a Circos plot of TF overlap in distal and promoter regions of previously defined, cell-type-specific gene sets.

To identify the key TFs of the GRNs in B-cell and T cell progenitors, we performed motif enrichment analysis of elements that were annotated to lineage-restricted genes by ICE-A (Tables S1C and S1D). The B-DREs were enriched for binding sites for BORIS, ETS (SPI-B), and RUNX proteins, as well as for the lineage-restricted TF EBF1 (Figure 2C). T-DREs were similarly predicted to bind the RUNX and ETS proteins as well as the TCF7 and GATA proteins (Figure 2D). To explore the abilities of these TFs to bind ICE-A-defined regulatory elements, we took advantage of the multiple mode in ICE-A for the identification of TF occupancy at the identified DREs. We included the ChIP-seq data for the TFs EBF1 and PAX5 in pro-B cells and for TCF7 and GATA3 in pro-T cells. Of the B-DREs, 27% were detected as being bound by EBF1, often in combination with PAX5 (Figure 2E). In addition, EBF1 bound 8% of the T-DREs in the pro-B cells (Figure 2F). TCF7 binding was detected at approximately 25% of the T-DREs and at 4% of the B-DREs. The observation that EBF1 and PAX5 bind to a substantial fraction of the T-DREs, as well as at promoters of T-lineage-restricted genes (Figure 2F) may reflect the abilities of these B-lineage TFs to repress alternative lineage programs in B-cell progenitors. Our findings are concordant with the essential functions of EBF1 in B cell development and TCF7 in T cell differentiation, suggesting that analysis of ICE-A annotations can be used to identify critical components of lineage-restricted GRNs in development.
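The co-occupancy counts that underlie an UpSet plot amount to tallying, for each element, the exact combination of TFs bound to it. The following is a minimal sketch under that assumption; the function name, element identifiers, and binding sets are hypothetical and do not represent the ICE-A multiple mode itself.

```python
from collections import Counter


def cooccupancy_counts(elements, bound_by):
    """Count elements per exact combination of bound TFs.

    elements: iterable of element identifiers.
    bound_by: {tf_name: set of element identifiers with a ChIP-seq peak}.
    Returns a Counter keyed by frozenset of TF names; the empty frozenset
    collects elements bound by none of the TFs.
    """
    counts = Counter()
    for element in elements:
        combo = frozenset(tf for tf, bound in bound_by.items() if element in bound)
        counts[combo] += 1
    return counts


# Hypothetical toy data: five elements and two TFs.
elements = ["e1", "e2", "e3", "e4", "e5"]
bound_by = {"EBF1": {"e1", "e2", "e3"}, "PAX5": {"e2", "e3", "e4"}}
counts = cooccupancy_counts(elements, bound_by)
fraction_ebf1 = sum(n for combo, n in counts.items() if "EBF1" in combo) / len(elements)
```

Each key of the resulting counter corresponds to one bar of an UpSet plot, and summing over the keys that contain a given TF gives the fraction of elements bound by that TF.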

Lineage-restricted transcription factors interact with the regulatory elements of broadly expressed genes

Even though lineage-specific TFs, such as EBF1 and PAX5, control the expression of lineage-restricted genes, they also appear to participate in the regulation of genes that are involved in basic biologic processes, such as proliferation, cell survival, and metabolism.6,7,8,9,10,11,50 To gain a better understanding of how lineage-restricted TFs are integrated into stage- and lineage-restricted regulatory networks, we used ICE-A to scrutinize GRNs from a TF-centered perspective, through the identification of target genes annotated to bound regulatory elements. Thus, we identified EBF1-bound promoters and DREs using ChIP-seq data derived from the 230–238 pro-B cells, and annotated the bound elements to coding genes using ICE-A. Using gene expression data from the Immgen consortium, we then investigated how these target genes were expressed by hematopoietic progenitors (Figure 3A). While B-lineage-restricted genes were identified among the target genes, only 20% (Cluster 5) displayed a B-lineage-restricted expression pattern, and a substantial fraction of the genes were broadly expressed. Performing the same analysis for PAX5, GATA3, and TCF7 yielded highly similar results (Figure S3A), identifying several broadly expressed target genes. The binding of lineage-restricted TFs to regulatory elements annotated to broadly expressed genes suggests that these proteins are integrated into the global GRN of the cell, supporting the idea that these lineage-specific TFs control basic cellular processes.6,7,8,9,10,11,50

Figure 3.


EBF1 interacts with the regulatory elements of broadly expressed genes

(A) Heatmap with the z-scores of normalized expression of genes annotated to EBF1-bound elements in early lymphoid progenitors from the Immgen dataset (GEO: GSE100738). Percentage of genes in each cluster is presented.

(B) Gene ontology (GO) analysis of enriched biologic processes in genes bound by EBF1 at promoters or distal regions. Gene sets are categorized based on differential expression between Wt and Ebf1−/− FL pro-B cells [Up-/Down-regulated: padj < 0.05 & |log2FC| > 1, common: |log2FC| < log2(1.5)].

(C) Diagram displaying the relative number of generated CD19+ cells after 11 days of in vitro incubation of KIT+ cells on OP9 stroma cells compared to R26 control. The red to blue color scale represents log2FC Wt vs. Ebf1−/− from RNA-seq. Data are based on at least 3 separate experiments, and samples (n = 3–12) with a single guide or pool of individual guides are aggregated per gene. Statistical analysis is based on Student’s t test using the R26 control guide as reference: ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.

(D) Promoter and distal elements associated with the gene sets from (B) are identified by ICE-A interaction-based annotation with H3K4me3 PLAC-seq data from FL Wt pro-B cells. Fractions of elements with a significant change (padj < 0.05 & |log2FC| > 1) in accessibility between Wt and Ebf1−/− FL pro-B cells are visualized in a stacked barplot, split into categories based on gene category and EBF1 occupancy.

(E) The Myc locus is shown as an example of a gene that shows EBF1-addicted activation, visualized in the WashU Epigenome Browser. Selected EBF1-bound elements located in a distal enhancer region ∼1.6 Mb from the Myc TSS are highlighted in gray. The epigenome browser tracks include EBF1 ChIP-seq (GEO: GSE159957), ATAC-seq data from Wt/Ebf1−/− FL pro-B cells and H3K4me3 PLAC-seq interactions from Wt FL pro-B cells.

(F) Venn diagram displaying the overlap between common genes (defined as in B) and genes with differential expression at EBF1 degradation (|log2FC| > log2(1.5)), from (GSE201141). Genes that show the strongest EBF1-addicted activation or repressive expression are listed.

To gain further insights into how lineage-restricted TFs, such as EBF1, are integrated into the transcriptional context of a lymphoid progenitor cell, we compared the expression levels of EBF1 target genes in in vitro-expanded Ebf1−/− cells and in wild-type (Wt) pro-B cells. The genes were classified as either up- or down-regulated (padj < 0.05 & |log2FC| > 1 in Wt vs. Ebf1−/−) or common (|log2FC| < log2(1.5)). In a gene ontology analysis, the differentially expressed EBF1 targets, upregulated or downregulated in Wt cells, were enriched for genes linked to GO terms such as B-cell activation and lymphocyte activation, independently of whether EBF1 was bound at the promoter or at a DRE (Figure 3B). The target genes that were not differentially expressed instead encoded proteins that are involved in basic biologic processes, such as RNA processing and translation (Figure 3B). To determine if broadly expressed target genes are of importance for B cell development, we used hematopoietic progenitor cells (KIT+ cells) that carry a doxycycline-responsive CAS9-encoding gene. The cells were transduced with lentiviruses carrying a fluorescent reporter gene (mCherry) and expressing guides that target genes annotated to EBF1-bound elements. After sorting for mCherry+ cells, the progenitors were seeded onto OP9 stroma cells in the presence of interleukin (IL)-7, FLT3/FLK-2 ligand (FL) and KIT ligand (KL), and the numbers of CD19+ cells generated in the cultures were determined by flow cytometry. While inactivation of the gene encoding KLF2 resulted in increased generation of CD19+ cells, targeting the histone-encoding gene H3f3a, the ribonucleoprotein-encoding genes Hnrnpa2b1 and Hnrnpu, or the DNA topoisomerase-encoding gene Top2a resulted in reduced generation of B-cell progenitors (Figure 3C).
To determine if any of these EBF1 target genes are important for T cell development, we incubated the progenitor cells on a NOTCH ligand-expressing stroma cell line, OP9 Delta-1 (OP9D), under conditions allowing for the combined generation of B- and T-lineage cells. While the generation of B-lineage cells was impaired by the downregulation of the same target genes as observed in the OP9-supported B-cell cultures, the generation of T-lineage cells was only significantly impaired upon targeting of Top2a (Figure S3B). Therefore, several broadly expressed EBF1 target genes are of importance for the generation of B-lineage cells.
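The thresholds used above to classify EBF1 target genes as up-regulated, down-regulated, or common can be expressed as a small decision function. The sketch assumes that log2FC is computed as Wt relative to Ebf1−/−, so that positive values mean higher expression in Wt cells; the function name and the "unclassified" label for genes that fall between the two cutoffs are introduced here for illustration.

```python
import math


def classify_gene(log2fc, padj, alpha=0.05, de_cutoff=1.0,
                  common_cutoff=math.log2(1.5)):
    """Classify a gene from its Wt vs. Ebf1-/- differential expression.

    Assumed convention: log2fc > 0 means higher expression in Wt cells.
    """
    if padj < alpha and log2fc > de_cutoff:
        return "up"
    if padj < alpha and log2fc < -de_cutoff:
        return "down"
    if abs(log2fc) < common_cutoff:
        return "common"
    # Genes between the cutoffs, or strongly changed but not significant.
    return "unclassified"


# Hypothetical values for illustration.
examples = {
    "strong_up": classify_gene(2.0, 0.001),
    "strong_down": classify_gene(-1.5, 0.01),
    "unchanged": classify_gene(0.2, 0.8),
    "in_between": classify_gene(0.8, 0.01),
}
```

Because log2(1.5) is roughly 0.58, a gene with |log2FC| between about 0.58 and 1 belongs to neither the differentially expressed nor the common set under these criteria.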

To attain a better understanding of how EBF1 is integrated into the GRNs of broadly expressed genes, we took advantage of the ATAC-seq data from in vitro-expanded Ebf1−/− and Wt pro-B cells. All the genes were annotated to regulatory elements with ICE-A, and using the EBF1 ChIP-seq data, we identified the elements that were directly bound by EBF1 (EBF1+) (Figure 3D). Exploring the changes in accessibility for the elements annotated to genes expressed at higher levels in Wt cells than in Ebf1−/− pro-B cells (Up), we detected increased accessibility (|log2FC| > log2(1)) at 45%–50% of the elements bound by EBF1. While the promoters linked to commonly expressed genes displayed few changes in accessibility, we detected increased accessibility for approximately half of the EBF1-bound DREs, whereas elements not bound by EBF1 frequently displayed a loss of accessibility. Motif enrichment analysis revealed that elements linked to differentially regulated genes were enriched for unique TF binding sites (Figure S3C), indicating that the differential function of EBF1 may be context dependent. Thus, EBF1 is integrated into the GREs of commonly expressed genes both by binding to already accessible elements and by modifying the epigenetic landscape at targeted DREs. This is exemplified by the Myc gene, where EBF1 interacts with DREs that display both EBF1-dependent and EBF1-independent accessibility (Figure 3E). This analysis suggests a substantial degree of epigenetic dynamics at elements that are involved in the regulation of broadly expressed genes, and that EBF1-mediated repression may not be accompanied by reductions in epigenetic accessibility at the targeted element.

To investigate the functional impact of EBF1 integration into the GRNs that control the expression of commonly expressed genes, we analyzed data from an experiment in which endogenous EBF1 was replaced by a protein fused with FKBP (F36V),51 resulting in rapid degradation of the protein following the addition of the drug dTAG13. Analyzing the RNA expression levels, we detected decreased expression (padj < 0.05 & |log2FC| > log2(1.5)) of 281 genes, 58 of which belong to the commonly expressed genes (|log2FC| < log2(1.5) in Wt vs. Ebf1−/− pro-B cells), upon degradation of EBF1 (Figure 3F). Using the same criteria, we found 354 genes whose expression levels were upregulated upon loss of EBF1; of these, 68 were classified as commonly expressed genes mainly associated with basic biological functions of the cell (Figure S3D). Thus, lineage-specific TFs can modulate the GRNs controlling the expression of broadly expressed genes, to establish tissue-specific control of basic biologic processes.

ICE-A annotation reveals the dynamics of promoter/enhancer interactions in early lymphocyte development

Just as promoters may collaborate with several DREs,14,15 it has been suggested that enhancers can target several promoters.21,22,36,52,53,54 The comprehensive annotation of DREs using ICE-A opens the exciting possibility to explore the complexity of promoter-enhancer pair interactions in different cellular contexts. To this end, using ICE-A, we identified DREs that were annotated to lineage-restricted genes (B-DREs or T-DREs) (Figure 2). We then determined their interactions with target promoters using H3K4me3 PLAC-seq data from in vitro-expanded Wt or Ebf1−/− pro-B cells, as well as pro-T cells (Scid.adh.2C2) (Figure 4A). In the pro-T cells, we detected 1,186 interactions between DREs and promoters of T-lineage-restricted genes (TPs) (Table S1H), as well as 854 interactions between DREs and promoters of B-lineage-restricted genes (BPs) (Table S1H), and 5,324 interactions between DREs and the promoters of commonly expressed genes (UPs). Examination of the B-DRE and T-DRE interactome in Wt pro-B cells revealed a substantial increase in the number of interactions with BPs and a reduction in the number of interactions with TPs, as compared to what we observed in T-lineage cells. However, the majority (66%) of the detected interactions were, just as in the T-lineage cells, with UPs. The interactome of Ebf1−/− pro-B cells appeared to have an intermediate profile, with comparable numbers of interactions between the BPs and TPs and the DREs. To investigate the dynamics of the interactomes for lineage-restricted DREs, we determined the levels of conservation of the detected interactions in the different cell types (Figure 4B). Among the 1,186 identified interactions with TPs in T-lineage cells, 42% were exclusively detected in the T-lineage cells, whereas 32% were conserved in the Wt pro-B cells.
Similar numbers were obtained when determining the interactome of the DREs linked to BPs, as 46% of the interactions were unique to the Wt pro-B cells, while 26% of the interactions were shared between the B- and T-lineages. Hence, even if the interactions between B- and T-DREs and UPs are relatively conserved, lineage-specific interactions were easily detectable. For the interactions with UPs, the degree of conservation between lineages was higher than for interactions with lineage-restricted genes, with >50% conservation between B cell and T cell progenitors for both B-DREs and T-DREs (Figure 4C). The analysis also showed that many of the lineage-restricted interactions were conserved in the Ebf1−/− pro-B cells (Figure 4B), confirming that the EBF1-deficient pro-B cells display an interactome that is intermediate to those of the pro-B and pro-T cells. This is exemplified by the Gata3 gene. The detected promoter-enhancer interactions are almost identical in pro-T cells and Ebf1−/− pro-B cells (Figure 4D). Even though the low level of H3K4me3 at the Gata3 promoter in the Wt pro-B cells makes it difficult to determine the actual structure of the locus, we detected an interaction with an alternative EBF1- and PAX5-bound promoter from one of the distal Gata3 elements solely in the pro-B cells. These data underline the complexity of GRNs in early lymphocyte development and underpin the importance of EBF1 for the establishment of the epigenetic landscape during B cell development.
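The conservation percentages reported above amount to set comparisons between interactomes. The following is a minimal sketch of that bookkeeping, assuming each interaction is represented as a (DRE, promoter) pair. The function name and the toy identifiers are hypothetical, and the sketch is not the code used to generate Figure 4.

```python
def conservation(reference, *others):
    """Summarize how many reference interactions are seen elsewhere.

    Returns (fraction unique to the reference, list with the fraction
    shared with each of the other interaction sets).
    """
    reference = set(reference)
    if not reference:
        raise ValueError("reference interaction set is empty")
    seen_elsewhere = set().union(*others)
    unique = reference - seen_elsewhere
    shared = [len(reference & set(other)) / len(reference) for other in others]
    return len(unique) / len(reference), shared


# Toy interactomes with placeholder identifiers (not data from the study).
pro_t = {("dre1", "P1"), ("dre2", "P2"), ("dre3", "P3"), ("dre4", "P4")}
wt_pro_b = {("dre1", "P1"), ("dre5", "P5")}
ko_pro_b = {("dre1", "P1"), ("dre2", "P2")}

unique_fraction, shared_fractions = conservation(pro_t, wt_pro_b, ko_pro_b)
```

Here half of the toy pro-T interactions are unique, a quarter are shared with the Wt pro-B set, and half are shared with the Ebf1−/− set. Because an interaction can be shared with more than one cell type, the fractions need not sum to one.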

Figure 4.

ICE-A annotation reveals the dynamics of promoter-enhancer interactions in early lymphocytic development

(A) Circos plot of chromatin interactions between putative lineage-specific distal elements (open chromatin regions in B-cell and/or T cell progenitor cell lines, with associations to the B-cell or T cell gene list defined in Figure 2A). Chromatin interactions are based on H3K4me3 PLAC-seq interactions for the Scid.adh.2C2 T cell line, Wt FL pro-B cells and Ebf1−/− FL pro-B cells and are restricted to interactions with H3K4me3-associated promoter regions. Interactions with alternative promoters are included.

(B and C) Venn diagrams displaying the overlap in DRE interactions with lineage-restricted (B) and common (C) promoters. For each lineage, the fractions of interactions from putative lineage-specific distal elements (B/T-DREs) that are conserved in the alternative lineage and in Ebf1−/− FL pro-B cells are presented.

(D) Visualization of TF binding, ATAC-seq signal and H3K4me3 status for the Scid.adh.2C2 T cell line, Wt and Ebf1−/− B-cell progenitors at the Gata3 locus in the WashU Epigenome Browser. H3K4me3 PLAC-seq data from all cell types are included, filtered for interactions associated with putative B- or T-lineage enhancers.

Discussion

To expand our understanding of stage- and lineage-specific gene activities, there is an increasing need to accurately assign regulatory elements to their target genes. We here present ICE-A, a software tool that allows for the easy incorporation of chromosomal configuration data to annotate DREs to coding genes. We find that using chromatin interactions to identify putative target genes for distal elements reduces the impact of known limitations that hamper the efficiency of proximity-based methods. In addition, the ability of chromatin capture methods to reflect the dynamic and cell type-specific aspects of genome organization will be reflected in the annotation process and can provide an additional layer of understanding of gene regulatory mechanisms. Even though ICE-A outperforms the proximity-based methods with respect to the annotation of distal enhancers, it is worth noting that GREAT detected the highest number of the most proximal enhancers (<10 kb from the target TSS) (Figure 1A). This is expected considering the principle of basal gene regulatory domains plus the extension principle applied by GREAT, which allows for the assignment of more than one gene per element. This is in contrast to the single closest gene assignment applied in standard proximity annotation as well as in ICE-A for distances below the user-specified interaction threshold. The improved detection rate of GREAT for proximal enhancers, which is linked to the potential for multiple annotations per regulatory region, may come at the cost of an increased rate of “false” annotations. Determining the frequency of elements verified as functionally relevant suggested that GREAT was somewhat more efficient than ICE-A (25% as compared to 21%) when it came to the identification of functional elements located within 25 kb of the TSS. However, ICE-A retained a good efficiency to identify relevant DREs over 25 kb from the TSS that was not matched by GREAT.
Even though ICE-A may identify DREs that are not critical for the regulation of the gene in question in each cell type, the annotation is based on direct interactions of the enhancers with the promoter that may be of relevance in another cellular context. However, false positives are inevitable, and the extent to which this poses a significant problem is highly dependent upon the application. ICE-A provides several options that could help tailor the annotation to specific needs, including the option to include only one annotation per region by filtering or ranking of interaction score (Figure 1F), selection, or adjustment of the way in which ICE-A handles nearby interactions. Another aspect that gives interaction-based annotation approaches an advantage over proximity annotation strategies is the potential to identify cell type-specific or dynamic regulatory regions. ICE-A not only outperforms existing programs but also allows for easy incorporation of ChIP-seq/Cut&Run data and gene expression data into the analysis, enabling complementary approaches that are centered on gene expression patterns, TF binding, the epigenetic landscape, or chromosomal interactions.

Using a gene-centered approach for the identification of DREs annotated to lineage-restricted genes, we were able to identify TFs that act as key regulators of early lymphocyte development (Figure 2). Motif enrichment analysis identified GATA- and TCF7-binding sites in elements annotated to T-lineage genes, while DREs linked to B-cell genes were enriched for EBF1-binding sites. Thus, ICE-A identifies DREs bound by lineage-specific TFs that act as key regulators of early T cell and B cell development.1,2,3,4,5,6,7,8,9,10,11,12 Exploring a TF-centered approach, in which ICE-A annotates TF-bound elements to coding genes, we were able to identify target genes for the TF, thereby providing insights into how specific DNA-bound proteins affect cellular functions. Our analysis of early lymphocytic development suggests that lineage-restricted TFs target regulatory elements that are annotated to many broadly expressed genes. Focusing on the role of EBF1 in B cell development, several of the bound DREs were annotated to commonly expressed genes, with no significant change in the gene expression levels in Ebf1−/− pro-B cells as compared with Wt pro-B cells. We provide experimental evidence that several of these EBF1 target genes are of importance for the generation of normal B-cell progenitors, supporting the relevance of these target genes (Figure 3C). There are also several examples of broadly expressed genes that become dependent upon EBF1 once this TF has been expressed (Figure 3D). These genes include Myc,50,55 which likely explains the significance of EBF1 for the survival and expansion of pro-B cells.10,50 The mechanism underlying the addiction to EBF1 may involve changes in enhancer or chromatin structure (Figure 3D)13 or, as in the case of the Myc gene, activated transcription of a repressor of the target gene.50 Based on our analysis of early lymphocytic development, we believe that ICE-A-based annotation provides unique and relevant insights into GRNs.

The use of ICE-A allows for easy and extensive exploration of the interactome that underlies the general principles of gene regulation in a specific cellular context. Several model systems have provided evidence that DREs can target multiple promoters.21,22,36,52,53,54 Our analysis of the DRE interactome in early lymphocyte progenitors suggests that a substantial fraction of the elements that interact with promoters to control the expression of lineage-specific genes may also interact with alternative promoters (Figure 4), often in a cell type-specific manner. While linear proximity appears to account for most of the promoter-enhancer specificity in yeast16 and for 79%–88% of the promoter-enhancer specificity in Drosophila,17 only about half of the DREs in a mammalian genome are estimated to follow this principle.21,22 Therefore, it remains a persistent challenge to resolve the mechanisms that control enhancer-promoter specificity in complex genomes. In Drosophila, certain DREs show a preference for specific core promoters56,57,58,59 or rely on a “tethering” element that is proximal to the promoter.60,61,62,63 Our analysis suggests that lymphocyte progenitors exploit a pool of DREs to control the expression of both lineage-specific and broadly expressed genes, and that the interactome is highly dynamic, with enhancers switching to alternative promoters depending on the cellular context (Figure 5). It has been proposed that alterations in the primary DNA sequence result in the formation of alternative promoter-enhancer interactions.36,52,64,65,66 Our data indicate that promoter selections may be highly lineage- and stage-specific, possibly because of changes to the epigenetic landscape.
Such dynamic changes in DRE-promoter interactions may also contribute to the abilities of TFs to act as both activators and repressors.9,10,67,68,69,70,71 This may be of importance in relation to the abilities of lineage-specific proteins such as EBF1, PAX5, GATA3, and TCF7 to interact with both activator and repressor complexes,13,72 possibly making their functions highly context-dependent. Our analysis of EBF1-repressed target genes does, however, suggest that EBF1 binding results in increased accessibility of the DRE, even when it is annotated to a repressed gene. This is concordant with the ability of EBF1 to act as a pioneer factor and to promote the formation of phase-separated structures at target genes.13,73,74 While the activation of regulatory elements may appear incongruous with target gene repression, we believe that our findings provide an alternative model for how a TF can contribute to transcriptional repression through diversion of DRE activity from lineage-specific promoters (Figure 5).

Figure 5.

Model of lineage-specific gene regulation in lymphoid development

Schematic representation of a model for lineage-specific transcriptional regulation in lymphoid development. Lymphoid progenitors explore a pool of distal regulatory elements (DREs) for control of both lineage-specific and common genes. A highly dynamic interactome, with distal elements switching to alternative promoters depending on cellular state and developmental stage, allows for control of both lineage-restricted and common genes in a cell type-specific manner.

Our use of ICE-A to dissect the roles of GRNs in early lymphocytic development reveals the versatility of the software and highlights the importance of integrating chromosomal configuration data into the DRE annotation processes. Given that more than 90% of disease- and trait-associated genetic variants reside in non-coding regions of the genome, accurate assignment of DREs to target genes could provide insights into the mechanisms operating in various disease states.75 Therefore, we believe that ICE-A can greatly facilitate the exploration of genomics data in an easy and user-friendly manner.

Limitations of the study

During the benchmarking of ICE-A against other tools for assignment of target genes for DREs, only HiChIP data were used for the chromatin interaction-based annotation. ICE-A is compatible with all types of chromatin interaction data provided in 2D-bed format, and similar results are expected from Hi-C or similar methods, provided that the quality and resolution are sufficient.

This study is focused on studying the actions of lineage-restricted TFs and the dynamics of enhancer-promoter interactions in early lymphocyte development. Considering the well-documented importance of lineage-specific TFs and genome 3D structure in development, it is reasonable to assume that some of the findings of this study could be of relevance in other contexts. However, further studies in other biological systems are required in order to determine the generalizability as well as the context-specific features of GRNs.

The exploration of the enhancer-promoter landscape in early lymphocyte development is based on H3K4me3 HiChIP data. Considering that H3K4me3 is a histone modification associated with active promoters, interactions between enhancers and promoters of non-expressed genes are likely to be excluded from the analysis due to non-existent or low levels of H3K4me3. However, it is worth noting that interactions with silenced genes with bivalent promoters (i.e., promoters marked by both H3K4me3 and H3K27me3) are retained.

Resource availability

Lead contact

Requests for further information and resources should be directed to and will be fulfilled by the lead contact, Mikael Sigvardsson (mikael.sigvardsson@liu.se).

Materials availability

This study did not generate unique reagents.

Data and code availability

  • Data: The sequencing datasets generated in this study are deposited in the Gene Expression Omnibus (GEO) database with accession numbers GSE279957 (ChIP-seq) and GSE279961 (PLAC-seq). The GEO accession numbers and information related to the previously published data used in this study are presented in the key resources table and Table S1A.

  • Code: ICE-A is available as an open source tool on GitHub: https://github.com/Tingvall/ICE_A. All the code related to the analysis and generation of figures for this article is available on GitHub: https://github.com/Tingvall/ICEA_analysis.

  • Other: Supplemental information including gene sets, element and guide info is available in Table S1.

Acknowledgments

We are grateful for editorial suggestions from Vincent Collins. This work was supported by grants from the Swedish Cancer Society (23-3019P), the Swedish Childhood Cancer Foundation (2022-0019), Stiftelsen för Strategisk Forskning (IB23-0001), the Swedish Research Council (2021-02379), and Linköping University.

Author contributions

J.T.-G. in collaboration with J.U. designed ICE-A. The analysis using ICE-A was performed by J.T.-G. in collaboration with M.S. The gene targeting KO experiments were performed by C.T.J. While J.T.-G. and M.S. wrote the first draft of the manuscript, all authors contributed to the finalization of the manuscript.

Declaration of interests

The authors declare no competing interests.

Declaration of generative AI and AI-assisted technologies in the writing process

AI tools have not been used for the generation of this manuscript.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies

Alexa Fluor 700 anti-mouse (30-F11) | BioLegend | Cat#103128; RRID:AB_493715
CD19 Monoclonal Antibody PE-Cyanine7 (1D3) | eBioscience | Cat#15360900; RRID:AB_657663
Pacific Blue anti-mouse CD90.2 (Thy1.2) (53-2.1) | BioLegend | Cat#140306; RRID:AB_10641693
APC/Cyanine7 anti-mouse Ly-6G/Ly-6C (Gr-1) (RB6-8C5) | BioLegend | Cat#108424; RRID:AB_2137485
APC/Cyanine7 anti-mouse/human CD11b (M1/70) | BioLegend | Cat#101226; RRID:AB_830642
PE anti-mouse NK-1.1 (PK136) | BioLegend | Cat#108708; RRID:AB_313395
APC anti-mouse CD11c (N418) | BioLegend | Cat#117310; RRID:AB_313779
Anti-trimethyl-Histone H3 (Lys4) | Millipore | Cat#07-473; RRID:AB_1977252

Chemicals, peptides, and recombinant proteins

X-tremeGENE HP DNA Transfection Reagent | Roche | Cat#6366244001
Lenti-X concentrator | Takara Bio | Cat#631232
CD117 MicroBeads | Miltenyi Biotec | Cat#130-097-146
RetroNectin | Takara Bio | Cat#T100A
Mouse SCF Recombinant Protein | PeproTech | Cat#250-03
Mouse Flt-3 Ligand (FLT3L) Recombinant Protein | PeproTech | Cat#250-31L
Mouse IL-7 Recombinant Protein | PeproTech | Cat#217-17
Doxycycline Hydrochloride | Sigma | Cat#D3072-1ML
CountBright Absolute Counting Beads | Invitrogen | Cat#C36950
cOmplete EDTA-free Protease Inhibitor Cocktail | Roche | Cat#11873580001
Triton X-100 | Sigma | Cat#T8787
Dynabeads Protein G | Invitrogen | Cat#10004D
IGEPAL CA-630 | Sigma | Cat#I8896
RNase A, DNase and protease-free | Thermo Fisher Scientific | Cat#EN0531
Recombinant Proteinase K Solution | Invitrogen | Cat#AM2546
NEBuffer 2 | New England Biolabs | Cat#B7002S
Biotin-14-dATP | Invitrogen | Cat#19524016
DNA Polymerase I, Large (Klenow) Fragment | New England Biolabs | Cat#M0210
T4 DNA Ligase Reaction Buffer | New England Biolabs | Cat#B0202
BSA | New England Biolabs | Cat#B9000
T4 DNA ligase | New England Biolabs | Cat#M0202
dNTPs | VWR | Cat#E636-40UMOLE
T4 DNA Polymerase | New England Biolabs | Cat#M0203
T4 Polynucleotide Kinase | New England Biolabs | Cat#M0201
Klenow Fragment (3′→5′ exo-) | New England Biolabs | Cat#M0212
Ampure XP Reagent | Beckman Coulter | Cat#A63881
dATP Solution | Thermo Fisher Scientific | Cat#R0141
dCTP Solution | Thermo Fisher Scientific | Cat#R0151
dGTP Solution | Thermo Fisher Scientific | Cat#R0161
dTTP Solution | Thermo Fisher Scientific | Cat#R0171
RPMI 1640 Medium, GlutaMAX Supplement | Thermo Fisher Scientific | Cat#61870-044
Fetal Bovine Serum | Thermo Fisher Scientific | Cat#10500064
Sodium Pyruvate | Thermo Fisher Scientific | Cat#11360-070
MEM Non-Essential Amino Acids Solution | Thermo Fisher Scientific | Cat#11140035
Gentamicin | Thermo Fisher Scientific | Cat#11520506
β-mercaptoethanol | Sigma | Cat#M3148
HEPES | Thermo Fisher Scientific | Cat#15630056
16% Formaldehyde, Methanol-free | Thermo Fisher Scientific | Cat#28908
Glycine | Sigma | Cat#50046
DPBS | Thermo Fisher Scientific | Cat#12037539
Potassium chloride | Sigma | Cat#P5405
Sodium Deoxycholate | Sigma | Cat#30970
MboI | New England Biolabs | Cat#R0147
Sodium Bicarbonate | Sigma | Cat#S5761
Opti-MEM I Reduced Serum Medium | Thermo Fisher Scientific | Cat#31985-054

Critical commercial assays

ChIP DNA Clean & Concentrator | Zymo Research | Cat#D5205
NextSeq 500/550 High Output Kit v2 75 cycles | Illumina | Cat#FC-404-2005
Fast-Link DNA Ligation Kit | Epicentre | Cat#LK0750H

Deposited data

ChIP-seq H3K27ac K562 | ENCODE Project Consortium | GEO:GSM733656
HiChIP H3K27ac K562 | Mumbach et al.42 | GEO:GSE101498
PLAC-seq H3K4me3 HCT116/A549 | Chen et al.45 | GEO:GSE161873
RNA-seq lymphoid populations | Yoshida et al.38 | GEO:GSE122597
ATAC-seq 230238 | Strid et al.13 | GEO:GSE162858
ATAC-seq P2C2 | Ungerbäck et al.3 | GEO:GSE93755
H3K4me3 PLAC-seq 230238 | Strid et al.13 | GEO:GSE162858
H3K4me3 PLAC-seq P2C2 | This study | GEO:GSE279961
ChIP-seq EBF1 | Somasundaram et al.50 | GEO:GSE159957
ChIP-seq PAX5 | Okuyama et al.76 | GEO:GSE126375
Cut&Run TCF7 | Astori et al.72 | GEO:GSE131673
ChIP-seq GATA3 | Ungerbäck et al.3 | GEO:GSE93755
RNA-seq WT/Ebf1KO | Strid et al.13 | GEO:GSE92434
RNA-seq degradation | Zolotarev et al.51 | N/A
ATAC-seq WT/Ebf1KO | Strid et al.13 | GEO:GSE92434
H3K4me3 PLAC-seq WT/Ebf1KO ProB | Somasundaram et al.50 | GEO:GSE159957
ChIP-seq H3K4me3 WT/Ebf1KO | Jensen et al.7 | GEO:GSE162858
ChIP-seq H3K4me3 P2C2 | This study | GEO:GSE279957

Experimental models: Cell lines

Scid.adh.2C2 | Dionne et al.77 | RRID:CVCL_B7SD
293T-HEK | Graham et al.78 | RRID:CVCL_0063
OP9 | Nakano et al.79 | RRID:CVCL_4398
OP9D | Schmitt et al.80 | RRID:CVCL_B218

Experimental models: Organisms/strains

TetO-Cas9 mice (B6.Cg-Col1a1tm1(tetO-cas9)Sho/J) | The Jackson Laboratory | Strain#029476; IMSR_JAX:029476
R26m2rtTA | Zhu et al.81 | N/A

Oligonucleotides

Guide oligos (See Table S1E) | N/A | N/A

Recombinant DNA

pAW13.lentiGuide-mCherry plasmid | Addgene | RRID:Addgene_104375
psPax2 | Addgene | RRID:Addgene_12260
pMD2.G | Addgene | RRID:Addgene_12259

Software and algorithms

Trim Galore (v0.6.7) | Babraham Bioinformatics | RRID:SCR_011847
Bowtie 2 (v2.4.4) | Langmead et al.82 | RRID:SCR_005476
Deeptools (v1.9) | Ramírez et al.83 | RRID:SCR_016366
MACS2 (v2.2.7.1) | Zhang et al.84 | RRID:SCR_013291
IDR (v2.0.4.2) | Li et al.85 | RRID:SCR_017237
HiC-Pro (v2.11.4) | Servant et al.86 | RRID:SCR_017643
FitHiChIP (v9.1) | Bhattacharyya et al.44 | N/A
ENCODE ATAC-seq pipeline (v2.2.1) | ENCODE | RRID:SCR_023100
STAR (v2.7.3) | Dobin et al.87 | RRID:SCR_004463
RSEM (v1.3.0) | Li et al.88 | RRID:SCR_000262
HOMER | Heinz et al.89 | RRID:SCR_010881
GREAT | McLean et al.37 | RRID:SCR_005807
ICE-A | This study | https://github.com/Tingvall/ICE_A; https://doi.org/10.5281/zenodo.15194059
WashU Epigenome Browser (v54.0.6) | N/A | RRID:SCR_006208
ComplexHeatmap (v2.18.0) | Gu et al.90 | RRID:SCR_017270
DESeq2 (v1.42.0) | Love et al.91 | RRID:SCR_015687
DiffBind (v3.12.0) | Stark et al.92 | RRID:SCR_012918
ggplot2 (v3.5.1) | Wickham et al.93 | RRID:SCR_014601
ggpubr (v0.6.0) | N/A | RRID:SCR_021139
clusterProfiler (v4.10.0) | Yu et al.94 | RRID:SCR_016884
bedtools (v2.30.0) | Quinlan et al.95 | RRID:SCR_006646
GimmeMotifs (v0.15.2) | Bruse et al.96 | RRID:SCR_001146
Benchling | Biology Software | RRID:SCR_013955
circlize (v0.4.16) | Gu et al.97 | RRID:SCR_002141
ggVennDiagram (v1.5.2) | Gao et al.98 | N/A
htslib (v1.10.2) | Bonfield et al.99 | N/A
Nextflow | Di Tommaso et al.40 | RRID:SCR_024135
UpSetPlot | N/A | RRID:SCR_023225
Cytoscape | Shannon et al.100 | RRID:SCR_003032
Macs2_IDR pipeline | N/A | https://github.com/Tingvall/macs2_idr

Experimental model and study participant details

Mouse models

TetO-Cas9 mice (stock no: 029476 - B6.Cg-Col1a1tm1(tetO-cas9)Sho/J), bought from JAX, were crossed with R26m2rtTA mice81 to obtain a Cas9 inducible (iCas9) mouse line. Both male and female mice aged 8–16 weeks were used for isolation of BM progenitor cells. Animal procedures were performed with consent from the local ethics committee at Lund University (Sweden).

Cell lines

The DN3-like T cell progenitor cell line Scid.adh.2C2 is a subclone of the Scid.adh cell line, derived from a spontaneous thymic lymphoma in a SCID mutant mouse.77 230-238 is an Abelson-transformed murine pre-B cell line. Cell lines in the laboratory are tested for mycoplasma contamination on a regular basis.

Cell culturing

Scid.adh.2C2 cells were cultured in RPMI1640 with 10% fetal bovine serum, sodium pyruvate, non-essential amino acids, gentamicin, and 50 μM β-mercaptoethanol. 230-238 cells were maintained in RPMI1640 supplemented with 10% heat-inactivated fetal calf serum, 25 mM HEPES, 50 μg/mL gentamicin, and 50 μM β-mercaptoethanol. Primary bone marrow cells were cultured on OP9 or OP9D stromal cells in OptiMEM media supplemented with 10% fetal calf serum, 50 μg/mL gentamicin, 50 μM β-mercaptoethanol, 50 ng/mL KIT ligand, 50 ng/mL Fms-like tyrosine kinase 3 ligand (FLT3L), and 50 ng/mL interleukin-7 (IL-7). Cas9 expression was induced with 0.1 μg/mL doxycycline (DOX, Sigma). All cytokines were obtained from PeproTech.

Method details

ICE-A

ICE-A is an open source tool for chromatin interaction-based annotation of genomic regions, built on the widely used and reproducibility-focused workflow management system Nextflow.40 The tool allows for the assignment of target genes to one or more sets of genomic regions (bed format) using user-specified chromatin interactions. ICE-A accepts interactions in 2D-bed (bedpe) format with any number of columns, provided that the first six columns specify the genomic coordinates, making it compatible with different interaction-calling tools. ICE-A can be run on a standard laptop; typical run times for the annotation of 4 peak files in Basic and Multiple mode are 2 and 8 min, respectively. More information and installation instructions can be found at: https://github.com/Tingvall/ICE_A.
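To make the input requirement concrete, the sketch below reads tab-separated 2D-bed (bedpe) records in which only the first six columns are interpreted as coordinates and any further columns are carried along untouched. This is an illustration and not ICE-A's own parser: the names `Interaction` and `parse_bedpe` are hypothetical, and the handling of comment lines is an assumption (a header row, if present, would need to be skipped separately).

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Interaction(NamedTuple):
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    extra: Tuple[str, ...]  # any additional columns, e.g. counts or scores


def parse_bedpe(lines: Iterable[str]) -> Iterator[Interaction]:
    """Yield interactions from tab-separated bedpe lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # assumption: blank and '#' lines carry no records
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got {len(fields)}")
        c1, s1, e1, c2, s2, e2 = fields[:6]
        yield Interaction(c1, int(s1), int(e1), c2, int(s2), int(e2),
                          tuple(fields[6:]))


# Toy record for illustration only.
example = ["# comment\n", "chr1\t0\t5000\tchr1\t50000\t55000\t12\t0.001\n"]
interactions = list(parse_bedpe(example))
```

Keeping the extra columns as an opaque tuple reflects the point made in the text: files from different interaction callers can differ beyond the sixth column without affecting the coordinates.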

Interaction-based annotation

Interaction-based annotation in ICE-A is based on overlaps between the user-defined regions and interaction anchor points. A region overlapping an anchor point is assigned to a target gene if the corresponding interaction anchor overlaps with its promoter region (default: ±2,500 bp from the target TSS). Even though the main concept of ICE-A is to use chromatin interactions to improve the accuracy and specificity of the DRE annotation, the dependence on predefined interactions introduces constraints (e.g., bin size and minimum distance for interaction calling), thereby preventing the annotation of peaks to proximally located genes. Given the high likelihood that peaks located within promoter regions will influence the expression of the corresponding gene, disregarding these constraints would be inappropriate. To handle this aspect, ICE-A combines conventional proximity-based and interaction-based annotation systems to improve the accuracy and cell specificity of the annotation of distal regulatory regions, while still allowing for peaks that are located close to promoter regions to be annotated to the corresponding gene. For peaks located at a distance to the closest TSS that is below the interaction threshold (default 2∗bin size), the proximal annotation obtained from annotatePeaks.pl in HOMER89 is applied, in addition to any available interaction-based annotations. The default behavior of ICE-A is to use only interactions that overlap with the peak in one of the anchor points and a promoter in the second anchor point. However, the resolution of the interaction-based annotation is limited by the bin size, with the consequence that loops where the true interaction is located several kilobases from the peak can be used for annotation, while closer interactions are ignored based on the arbitrarily defined bins.
To make the interaction-based annotation less stringent, ICE-A offers the possibility to also include neighboring interactions, defined by either the distance or the number of bins from the bin overlapping the peak/promoter, using the options --close_peak_type and --close_promoter_type. Adjusting these parameters can also be suitable if the chromatin interactions used for annotation have been subjected to nearby interaction filtering, and in the case of an alternative TSS. By default, ICE-A uses the TSS positions from HOMER, although it is possible to provide custom annotations to which the genomic regions can be assigned.
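The combined logic described in this section can be sketched as follows. This is a simplified illustration under stated assumptions and not the ICE-A implementation: the function names and data layout are hypothetical, the distance to the TSS is measured from the peak midpoint, the proximal step picks the closest TSS directly rather than calling HOMER, and neighboring-bin options are not modeled.

```python
PROMOTER_FLANK = 2500  # default promoter window (±2,500 bp) quoted in the text


def overlaps(start_a, end_a, start_b, end_b):
    """True if two half-open intervals share at least one base."""
    return start_a < end_b and start_b < end_a


def annotate_peak(peak, interactions, tss_by_gene, bin_size,
                  promoter_flank=PROMOTER_FLANK):
    """Return sorted (gene, annotation_type) pairs for one peak.

    peak: (chrom, start, end)
    interactions: list of (anchor, anchor), each anchor (chrom, start, end)
    tss_by_gene: {gene: (chrom, tss_position)}
    """
    chrom, start, end = peak
    annotations = set()

    # Interaction-based: the peak overlaps one anchor and a promoter
    # overlaps the other anchor of the same interaction.
    for anchor_a, anchor_b in interactions:
        for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            if near[0] != chrom or not overlaps(start, end, near[1], near[2]):
                continue
            for gene, (g_chrom, tss) in tss_by_gene.items():
                if g_chrom == far[0] and overlaps(
                        tss - promoter_flank, tss + promoter_flank,
                        far[1], far[2]):
                    annotations.add((gene, "interaction"))

    # Proximity-based: closest TSS, used only below the interaction
    # threshold (default 2 * bin size).
    midpoint = (start + end) // 2
    candidates = [(abs(tss - midpoint), gene)
                  for gene, (g_chrom, tss) in tss_by_gene.items()
                  if g_chrom == chrom]
    if candidates:
        distance, gene = min(candidates)
        if distance < 2 * bin_size:
            annotations.add((gene, "proximal"))
    return sorted(annotations)


# Toy example with placeholder gene names (not data from the study).
tss = {"GeneNear": ("chr1", 12000), "GeneFar": ("chr1", 502000)}
loops = [(("chr1", 100000, 105000), ("chr1", 500000, 505000))]
distal_call = annotate_peak(("chr1", 100200, 100600), loops, tss, bin_size=5000)
proximal_call = annotate_peak(("chr1", 11000, 11400), loops, tss, bin_size=5000)
```

In the toy example, the distal peak is assigned to GeneFar through the loop even though GeneNear is closer in linear distance, whereas the peak 800 bp from the GeneNear TSS receives a proximal annotation.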

The main output from the peak-centered annotation is a single text file for each provided input bed with information about each annotation, including the target gene symbol, entrezID, distance to the target TSS, type of annotation used (proximal/interaction-based), and interaction score (if provided). In contrast to conventional proximity-based annotation, interaction-based annotation can generate multiple annotations per region. ICE-A offers several options for how to handle this depending on the intended subsequent analysis. The default behavior is to present all annotations in a comma-separated format with one row per peak. It is also possible to obtain the output file in a format with one row for each unique annotation. In addition to the annotated peak file(s), a gene list for each input bed file is provided, which is suitable for gene ontology analysis etc.
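The two output layouts described above differ only in how multiple annotations per peak are arranged. A minimal sketch of the conversion is shown below; the column names `peak`, `genes`, and `types` are hypothetical stand-ins and do not reflect the actual column headers written by ICE-A.

```python
def explode_annotations(rows):
    """Convert one-row-per-peak records into one row per annotation.

    Each input row holds comma-separated gene symbols and annotation
    types in matching order (an assumption made for this sketch).
    """
    exploded = []
    for row in rows:
        genes = row["genes"].split(",")
        types = row["types"].split(",")
        if len(genes) != len(types):
            raise ValueError(f"mismatched annotations for {row['peak']}")
        for gene, kind in zip(genes, types):
            exploded.append({"peak": row["peak"], "gene": gene, "type": kind})
    return exploded


def gene_list(rows):
    """Unique, sorted gene symbols, e.g. as input for gene ontology analysis."""
    return sorted({entry["gene"] for entry in explode_annotations(rows)})


# Toy records with placeholder names (not data from the study).
per_peak = [
    {"peak": "peak1", "genes": "GeneA,GeneB", "types": "proximal,interaction"},
    {"peak": "peak2", "genes": "GeneB", "types": "interaction"},
]
per_annotation = explode_annotations(per_peak)
genes = gene_list(per_peak)
```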

Modes

To account for different types of data, ICE-A can be run in three different modes. Basic mode performs interaction-based annotation of one or more individual bed files. Multiple mode is suitable for cases where the identification of overlaps between different sets of regions is of interest, e.g., co-occupancy of multiple transcriptional regulators. In addition to performing an annotation of every peak file, Multiple mode identifies overlaps defined either at the bin level or based on overlaps with customized regions (user-provided or a union of input bed files). Differential mode handles the common situation of comparing two conditions (e.g., differential TF binding). By providing the corresponding gene expression data, distally located peaks associated with changes in gene expression can be identified and categorized as activating or repressive.

Visualization

ICE-A provides numerous options for the visualization of peak annotations depending on the mode. If run in Multiple mode, the overlaps between a set of regions can be identified and visualized in the forms of UpSet and Circos plots. UpSet plots, which are based on the UpSetPlot library (a Python implementation of the UpSet plots101), show the overlaps of promoters and distally located regions separately. For the Circos plot, overlaps between input bed files in interacting distal and proximal regions are visualized using the circlize R package.97 Both UpSet and Circos plots can be filtered on elements associated with a user-specified gene list if the option --filter_genes is used. For all three modes, there is the option to visualize the interactions in a network format using Cytoscape.100 If provided, interaction and/or peak scores can be represented by edge weight. If run in Differential mode, the network can be filtered according to differentially expressed genes, and separate networks for annotations associated with up- and down-regulated genes can be generated. In addition to the PDF output of the default network layout, ICE-A also provides the network XGMML files to be loaded into Cytoscape for visualization and customization in an interactive format.

Distance limits of proximity-based annotation

The theoretical distance limits for assignment of a distal element to a target gene were calculated for both the mouse (mm10) and human (hg38) genomes based on standard proximity annotation (annotation to the closest TSS) and GREAT. For standard proximity annotation, the upper threshold for assigning an element to a gene is half the distance to the upstream and downstream neighboring genes, respectively. For GREAT, the distance is extended to the basal gene regulatory domain (default −5 kb/+1 kb from the TSS) of the neighboring genes. Curated regions provided by GREAT (v4.0) are included in the distance calculations. For both annotation methods, the TSS coordinates are based on the UCSC Known Gene dataset (NCBI build 37 derived from Ensembl Biomart ver. 67 for mouse, and GRCh38 derived from Ensembl Biomart ver. 90 for human). A violin plot of the maximal distance for CRE annotation (irrespective of direction) for each gene, as well as a barplot of the median upper distance limit in the up- and down-stream direction, respectively, was generated using ggplot2 (v3.5.1).
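For standard closest-TSS annotation, the limit described above follows directly from the spacing of neighboring TSSs. The sketch below illustrates the calculation for genes on a single chromosome; it is not the code used for the figure. The function name and toy coordinates are hypothetical, strand is ignored (so "left" and "right" refer to lower and higher coordinates rather than upstream and downstream), and the GREAT regulatory domain extension is not modeled.

```python
def proximity_limits(tss_positions):
    """Maximum distance at which each gene is still the closest TSS.

    tss_positions: {gene: tss_position} for genes on one chromosome.
    Returns {gene: (left_limit, right_limit)}; a limit is None where a
    gene has no neighbor on that side.
    """
    ordered = sorted(tss_positions.items(), key=lambda item: item[1])
    limits = {}
    for index, (gene, position) in enumerate(ordered):
        left = right = None
        if index > 0:
            # Halfway to the neighboring TSS at lower coordinates.
            left = (position - ordered[index - 1][1]) / 2
        if index < len(ordered) - 1:
            # Halfway to the neighboring TSS at higher coordinates.
            right = (ordered[index + 1][1] - position) / 2
        limits[gene] = (left, right)
    return limits


# Toy TSS coordinates for illustration only.
limits = proximity_limits({"g1": 1000, "g2": 11000, "g3": 61000})
```

In this toy layout, an element more than 5 kb to the left or 25 kb to the right of the g2 TSS would be assigned to a neighboring gene under closest-TSS annotation, regardless of any chromatin interaction with the g2 promoter.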

ChIP-sequencing and data analysis

ChIP-seq

ChIP-seq was carried out in duplicates as previously reported for histone modifications in Strid et al. (2021).13 Cells of the Scid.adh.2C T-cell progenitor cell line were fixed in 1% formaldehyde for 10 min at RT, followed by quenching by the addition of 1/10 volume of 0.125 M glycine and a wash in PBS. Nuclei were isolated by incubation in Nuclei Isolation buffer (50 mM Tris, pH 8, 60 mM KCl, and 0.5% NP40) + protease inhibitor mixture (PIC) (Roche Diagnostics) for 10 min on ice. Pelleted nuclei were resuspended in Lysis buffer (0.5% SDS, 10 mM EDTA, 0.5 mM EGTA, and 50 mM Tris–HCl (pH 8)) + PIC and sonicated on a Bioruptor (Diagenode), followed by pelleting of debris. The supernatant was diluted 5× in HBSS (Lonza) + PIC and 2× radioimmunoprecipitation assay buffer (20 mM Tris–HCl (pH 7.5), 2 mM EDTA, 2% Triton X-100, 0.1% SDS, 0.2% sodium deoxycholate, and 200 mM NaCl) + PIC. 10 μL of H3K4me3 antibody (Millipore, 07-473) was bound to 70 μL Protein G/A Dynabeads (Life Technologies). ChIP was performed overnight at 4 °C, and the beads were subsequently washed 1x with 500 μL Low Salt Immune Complex Wash Buffer (0.1% SDS, 1% Triton-X, 2 mM EDTA, 20 mM Tris-HCl (pH 8.1), 150 mM NaCl), 1x with 200 μL High Salt Immune Complex Wash Buffer (0.1% SDS, 1% Triton-X, 2 mM EDTA, 20 mM Tris-HCl (pH 8.1), 500 mM NaCl), 1x with 200 μL LiCl Immune Complex Wash Buffer (0.25 M LiCl, 1% Igepal-CA630, 1% deoxycholic acid, 1 mM EDTA, 10 mM Tris (pH 8.1)), and 2x with 200 μL TE buffer (10 mM Tris-HCl (pH 8.0), 10 mM EDTA). Chromatin was eluted for 6 h at 65 °C in Elution buffer (20 mM Tris–HCl, pH 7.5, 5 mM EDTA, 50 mM NaCl, 1% SDS, 100 μg RNase A, and 50 μg proteinase K), and finally cleaned up using Zymo ChIP DNA Clean & Concentrator (Zymo Research). Libraries were prepared using NEXTflex DNA barcodes (Bioo Scientific). ChIP-seq libraries were subjected to 75 cycles of single-end sequencing on the NextSeq 500 system. The data were deposited in the GEO database (GSE279957).

ChIPseq data analysis

FASTQ files of the H3K4me3 histone ChIP-seq data from the 230–238, Scid.adh.2C and Wt/Ebf1−/− pro-B cells were trimmed with Trim Galore (v.0.6.7) using the following options: --fastqc --clip_R1 5 --three_prime_clip_R1 3. Trimmed reads were mapped to the mm10 reference genome (GRCm38) using Bowtie2 (v.2.4.4). Bigwig files for genome browser visualization were generated with bamCoverage (--normalizeUsing RPGC --centerReads) from deepTools (v1.9). Peak calling was performed with a custom pipeline (https://github.com/Tingvall/macs2_idr), which included peak calling with MACS2 (v2.2.7.1) followed by IDR (v.2.0.4.2). The pipeline was run in narrow mode and the following options were used: --macs_q 0.05 --idr_threshold 0.05. The IDR optimal peak set was used for the downstream analysis.

HiChIP and PLAC-sequencing and data analysis

PLAC-sequencing

PLAC-seq was carried out in duplicates as previously reported.50 Scid.adh.2C cells were cross-linked with 1% formaldehyde for 5 min, resuspended in ice-cold lysis buffer (10 mM Tris-HCl (pH 7.5), 10 mM NaCl, 0.2% NP-40) + Protease Inhibitor Cocktail (PIC, Roche) and incubated for 15 min on ice. Pelleted cells were washed in ice-cold lysis buffer + PIC, resuspended in 0.5% SDS and incubated at 62 °C for 10 min. The reaction was diluted with water and 10% Triton X-100, followed by incubation at 37 °C for 15 min. Restriction enzyme digestion with 40U MboI and 25 μL NEB2 buffer was performed by incubation at 37 °C for 2 h (shaking, 900 rpm), followed by inactivation by 20 min incubation at 62 °C. The fill-in reaction was conducted with 0.3 mM Biotin-14-dATP (ThermoFisher, 19524016), 0.3 mM dCTP, 0.3 mM dTTP, 0.3 mM dGTP and 40U Klenow (NEB, M0210) at 37 °C for 1.5 h with shaking, followed by a ligation reaction by incubation at RT for 2 h with rotation in ligation master-mix (T4 ligation buffer (NEB, B0202), 1% Triton-X 100, 120 μg BSA (NEB, B9000), 4000U T4 DNA ligase (NEB, M0202)). Pelleted cells were resuspended in 250 μL ChIP SDS-lysis buffer (0.5% SDS, 10 mM EDTA, 0.5 mM EGTA, 50 mM Tris-HCl (pH 8.0)) + PIC, followed by sonication on a Covaris ME220 for 6 min (peak power = 75, cycles per burst = 1000, duty factor = 15%). Chromatin was spun down at 13000 rpm for 10 min and supernatants were diluted in 750 μL HBSS + 1 mL RIPA (20 mM Tris–HCl, pH 7.5, 2 mM EDTA, 2% Triton X-100, 0.1% SDS, 0.2% sodium deoxycholate, 200 mM NaCl) + PIC. 1% input was removed and 10 μg H3K4me3 antibody (Millipore, 07-473) pre-absorbed to 60 μL Protein-G Dynabeads was added to the remaining sample, followed by overnight (ON) incubation at 4 °C with rotation. 
Samples were washed 2x in Low Salt Immune Complex Wash Buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 50 mM Tris-HCl pH 8, 150 mM NaCl), 2x in High Salt Immune Complex Wash Buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 50 mM Tris-HCl pH 8.0, 500 mM NaCl), 1x in LiCl Immune Complex Wash Buffer (0.25 M LiCl, 1% Igepal-CA630, 1% sodium deoxycholate, 1 mM EDTA, 10 mM Tris-HCl pH 8.0), and 2x in TE buffer (10 mM Tris–HCl, pH 8.0, 10 mM EDTA), followed by elution of chromatin with two rounds of 100 μL elution buffer (1% SDS, 100 mM NaHCO3) with shaking (1500 rpm). Eluted chromatin complexes were reverse cross-linked ON at 65 °C with the addition of 250 mM NaCl, 100 μg RNase A (ThermoFisher, EN0531) and 50 μg proteinase K (ThermoFisher, AM2546), followed by clean-up using ChIP DNA Clean & Concentrator (Zymo Research). 25 μL of Streptavidin T1 beads (Thermo Fisher, 65601), washed in Tween Wash Buffer (5 mM Tris-HCl pH 8.0, 0.5 mM EDTA, 1 M NaCl, 0.05% Tween-20) and then resuspended in 50 μL of Biotin Binding Buffer (10 mM Tris-HCl pH 7.5, 1 mM EDTA, 2 M NaCl), were added to the samples, followed by incubation at RT for 15 min with shaking. Beads were recovered using a magnet, washed 2x in Tween Wash Buffer and incubated at 55 °C for 2 min with shaking, followed by another wash in T4 DNA ligation buffer. Captured beads were resuspended in 100 μL end-repair master mix (0.5 mM dNTPs (VWR, E636-40UMOLE), 12U T4 DNA Polymerase (NEB, M0203), 50U T4 Polynucleotide Kinase (NEB, M0201), 5U Klenow (NEB, M0210), NEB T4 DNA ligase buffer), followed by incubation at RT for 30 min with shaking. Samples were washed 3x in Tween Wash Buffer, incubated at 55 °C for 2 min with shaking and washed 1x in NEB2 buffer. Beads were captured and incubated with 100 μL A-tailing reaction mix (0.5 mM dATP (ThermoFisher), 25U Klenow Exo- (NEB, M0212), NEB2 buffer) with shaking, then washed 2x with Tween Wash Buffer and incubated at 55 °C for 2 min with shaking. 
Beads were washed 1x in 100 μL Fast-Link ligation buffer (Epicenter, LK0750H), followed by NEXTflex DNA barcode ligation (Bioo Scientific) using the Fast-Link ligation kit (Epicenter, LK0750H), 3x washes in Tween Wash Buffer, incubation at 55 °C for 2 min and a final wash in 100 μL 10 mM Tris-HCl (pH 8.0). PCR amplification of libraries was performed according to the optimal number of cycles determined by qPCR, and beads were collected on a magnet. Clean-up of the libraries (supernatants) was performed with Ampure XP (0.8x sample volume). PLAC-seq libraries were subjected to 2 × 38 cycles of paired-end sequencing on the NextSeq 500 system. The data are deposited in the GEO database (GSE279961).

Preprocessing of PLAC-seq data

H3K4me3 PLAC-seq data from Wt and Ebf1−/− fetal liver pro-B cells were generated and preprocessed according to Strid et al.13 For the preprocessing of H3K4me3 PLAC-seq data from the Scid.adh.2C pro-T cells, FASTQ files from two replicates were merged and trimmed with Trim Galore (v0.6.7) using the following options: --paired --fastqc --clip_R1 10 --clip_R2 10 --three_prime_clip_R1 3 --three_prime_clip_R2 3. Trimmed reads were processed through the HiC-Pro pipeline (v2.11.4)86 for the generation of valid interaction pairs. The HiC-Pro processing included alignment to the mm10 reference genome (GRCm38) using Bowtie2 (v2.4.2)82,102 and the assignment of mapped reads to an mm10 MboI restriction map.

Interaction calling

For the H3K27ac HiChIP data obtained from the K562 cells and for the H3K4me3 PLAC-seq data from the Scid.adh.2C cells, interaction calling was performed with FitHiChIP (v9.1)44 using the merged validPairs output from the HiC-Pro pipeline with the following parameters: IntType = 3, BINSIZE = 5000, LowDistThr = 10000, UppDistThr = 3000000, UseP2PBackgrnd = 0, BiasType = 1, MergeInt = 1, QVALUE = 0.05. The corresponding H3K27ac ChIP-seq peaks in broadPeak format (GSM733656) were used as the reference peak file. The merged close PEAK-to-ALL interactions passing the significance threshold (q < 0.05) were used as the input for peak annotation conducted using ICE-A. The H3K27ac ChIP-seq peaks (GSM733656) and H3K4me3 ChIP-seq peaks (this study) from matching cell types were provided as PeakFile references. For the Wt and Ebf1−/− cells, ALL-to-ALL interactions were generated with the same parameters, except for IntType = 4. For comparison, ALL-to-ALL interactions were also generated for the Scid.adh.2C cells.
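To make the role of the bin size, distance thresholds and q-value cutoff concrete, the following Python sketch applies the same numerical limits to a list of already-scored bin pairs. It is an illustration only and not part of FitHiChIP: the function names, the tuple layout, and the assumption that the distance thresholds are inclusive are all hypothetical.

```python
def bin_start(position, bin_size=5000):
    """Return the start coordinate of the fixed-size bin containing position."""
    return (position // bin_size) * bin_size


def filter_interactions(interactions, q_max=0.05,
                        low_dist=10_000, upp_dist=3_000_000):
    """Keep bin pairs within the distance range and below the q-value cutoff.

    Each interaction is a tuple (chrom, bin1_start, bin2_start, q_value).
    Inclusive distance bounds are an assumption made for this sketch.
    """
    kept = []
    for chrom, start1, start2, q_value in interactions:
        distance = abs(start2 - start1)
        if low_dist <= distance <= upp_dist and q_value < q_max:
            kept.append((chrom, start1, start2, q_value))
    return kept
```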

ATAC-seq data analysis

ATAC-seq data were preprocessed with the ENCODE ATAC-seq pipeline (https://github.com/ENCODE-DCC/atac-seq-pipeline) (v.2.2.1). For paired-end data from 230-238 cells, the pipeline was run with the following arguments: “atac.paired_end”: true, “atac.auto_detect_adapter”: true, “atac.multimapping”: 4, “atac.mapq_thresh”: 30, “atac.dup_marker”: “picard”, “atac.no_dup_removal”: false, “atac.enable_idr”: true, “atac.idr_thresh”: 0.05, “atac.enable_count_signal_track”: true, “atac.filter_chrs”: [“chrM”, “MT”]. Single-end data from Scid.adh.2C and Wt/Ebf1−/− pro-B cells were analyzed as for the 230-238 cells with the following differences: “atac.paired_end”: true, “atac.no_dup_removal”: false. The IDR optimal peak set was used for the downstream analysis.

RNA-seq data analysis

RNA-seq data derived from common progenitor populations, as well as from early B- and T-cell progenitors from the Immgen consortium (GSE122597) were preprocessed by trimming with TrimGalore (v0.6.4), followed by alignment to the mm10 mouse reference genome (GRCm38.p5.vM15) using STAR (v2.7.3). Transcript levels were generated with rsem-calculate-expression in RSEM (v1.3.0), with the following options: --paired-end --alignments --forward-prob 0 --seed-length 20 --output-genome-bam --sampling-for-bam --estimate-rspd --calc-c.

Identification of functional K562 enhancers

CRISPRi-FlowFISH-validated enhancer-promoter pairs in K562 cells41 were used to assess the ability of ICE-A to annotate cis-regulatory elements compared to other available annotation tools [proximity annotation with HOMER89 and GREAT37]. Enhancer-promoter pairs [taken from Table S3A in the paper of Fulco et al. (2019)]41 were filtered to exclude: (i) candidate enhancer elements with a distance >3 Mb from the target TSS; and (ii) elements within promoters or gene bodies. Enhancer-promoter pairs with a reported false discovery rate (FDR) < 0.05 were considered significant enhancers with respect to their corresponding target gene(s) and were used as inputs for the evaluation of the different annotation methods. Proximity annotation of validated enhancers was performed using annotatePeaks.pl with HOMER (v4.10.0) using the hg19 RefSeq annotations. Annotation with GREAT (v.4.0) was performed using the Basal plus extension mode with default settings for the basal regulatory domain. The maximum distance for annotation of distal regions was increased to 3 Mb, corresponding to the maximum distance used for interaction calling of the HiChIP data GSM2705043-GSM2705045.42 For ICE-A, the interaction-based annotation of the validated enhancers was based on H3K27ac HiChIP significant interactions that were called with FitHiChIP. ICE-A was run in Basic mode with the default parameters, with the exception of --multiple_anno keep. For all three annotation methods, the numbers and percentages of correctly annotated enhancers were evaluated and presented, both overall and on the individual gene level for different distance ranges. In addition, the performance of ICE-A annotation without proximal annotation was evaluated and compared against the other annotation methods, with regard to the ability to identify significant enhancers and the precision (percentage of functional enhancers among total annotated elements). 
The fraction of enhancers with H3K27ac was evaluated, and the WashU Epigenome Browser (v54.0.6) was used to visualize identified enhancers of Gata1, along with the H3K27ac ChIP-seq and HiChIP data.
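The two evaluation metrics described above (the share of validated enhancer-gene pairs that are recovered, and the precision among all annotated pairs) can be sketched in Python as follows. This is an illustrative sketch rather than the evaluation script used in the study; the function name, the output keys and the representation of a pair as an (enhancer, gene) tuple are assumptions.

```python
def evaluate_annotation(validated_pairs, predicted_pairs):
    """Compare predicted enhancer-gene pairs with validated pairs.

    Both inputs are iterables of (enhancer_id, gene) tuples.
    Returns the number of correct pairs, the fraction of validated pairs
    that were recovered, and the precision among all predictions.
    """
    validated = set(validated_pairs)
    predicted = set(predicted_pairs)
    correct = validated & predicted
    recovered = len(correct) / len(validated) if validated else 0.0
    precision = len(correct) / len(predicted) if predicted else 0.0
    return {
        "n_correct": len(correct),
        "fraction_recovered": recovered,
        "precision": precision,
    }
```

Running the same function on the output of each annotation method, using one shared set of validated pairs, gives directly comparable numbers.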

Annotation of enhancers for cancer fitness

Benchmarking of ICE-A against alternative annotation methods was performed with enhancers essential for cancer fitness and proliferation, obtained from the CRISPRi screen of Chen et al.45 Essential enhancers in the HCT-116 colorectal cancer cell line were filtered for having the TSS of an essential target gene within 2 Mb. Target gene discovery for essential enhancers was performed with proximity annotation using HOMER (v4.10.0), GREAT (v.4.0) with default settings, and ICE-A (default settings except --close_peak_type distance and --close_promoter_type distance). Hg38 TSS locations from GREAT (v.4.0) were used for all annotation methods (specified with -gtf in HOMER and --tss in ICE-A). For ICE-A, H3K4me3 PLAC-seq interactions from the HCT-116 cell line, as well as from the A549 lung cancer cell line, were used for annotation (the interactions were down-sampled to have an equal number of total interactions, based on interaction score). In addition, ICE-A with proximity annotations excluded was compared against the other annotation strategies. The ability to identify relevant target genes was evaluated based on the fraction of essential enhancers assigned to at least one essential gene. In addition, the specificity was evaluated based on the fraction of essential genes among all target gene assignments.

Investigation of lineage-specific elements

B- and T-lineage specific elements

Gene-level expression estimates from RSEM for progenitor and lymphoid populations from the Immgen consortium (LTHSC_CD34-, LTHSC_CD34+, STHSC, MPP4, CLP, FrA, FrBC, FrE, DN1, DN2a, DN2b, DN3) were imported using tximport (v1.3.0) and then variance-stabilizing transformed using DESeq2 (v1.42.0). K-means clustering and visualization of z-scored variance stabilizing transformation (VST) counts averaged per population were performed using the ComplexHeatmap package (v2.18.0). The clustering was restricted to genes that were identified as being differentially expressed between committed FrBC B-progenitor and DN2b T-progenitor cell populations (padj <0.01 & |log2FC|>2) using DESeq2. Clusters with a lineage-specific expression profile were used to define B- and T-lineage-specific gene sets. Lineage-specific elements were defined as open regions from the 230-238 B-cell progenitor cell line (GSE162858) or Scid.adh.2C2 T-cell progenitor cell line (GSE93755), which interacted with the corresponding lineage-specific gene sets. Assignment of target genes to open chromatin regions was based on matching H3K4me3 PLAC-seq interactions using ICE-A with the default options, with the exceptions of: --close_peak_type distance --close_promoter_type distance. Distal elements are defined as non-promoter regions.
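The gene selection and row-wise z-scoring that precede the clustering can be sketched in plain Python. This sketch does not reproduce the DESeq2 or ComplexHeatmap code used in the study: the function names and the input layout are hypothetical, and the use of the sample standard deviation (as in R's scale()) is an assumption.

```python
from statistics import mean, stdev


def select_de_genes(results, padj_cut=0.01, lfc_cut=2.0):
    """Return genes passing padj < padj_cut and |log2FC| > lfc_cut.

    results maps gene name -> (log2FC, padj); a padj of None (not tested)
    is treated as non-significant.
    """
    return [
        gene
        for gene, (log2fc, padj) in results.items()
        if padj is not None and padj < padj_cut and abs(log2fc) > lfc_cut
    ]


def zscore(values):
    """Z-score one gene's per-population VST values (sample standard deviation)."""
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        # A gene with no variation carries no pattern; return zeros
        return [0.0] * len(values)
    return [(value - centre) / spread for value in values]
```

The z-scored rows of the selected genes would then be passed to a k-means implementation of choice.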

Chromatin accessibility and motif enrichment

ATAC-seq counts in open chromatin regions for the 230-238 B-cell progenitor cell line (GSE162858) and the Scid.adh.2C2 T-cell progenitor cell line (GSE93755) were extracted and normalized based on full library size using DiffBind (v3.12.0). Replicates were aggregated, and log2 mean normalized counts were visualized in a violin plot for each set of lineage-specific distal elements using ggplot2 (v3.5.1). Statistical analysis was performed with stat_compare_means from the ggpubr package (v0.6.0), based on the Mann–Whitney U test. The difference in chromatin accessibility in B- and T-cell-associated elements was also evaluated separately based on the type of annotation (proximity and/or interaction-based annotation) used by ICE-A for target gene assignment. De novo motif enrichment analysis was performed for B-cell- and T-cell-specific distal elements using findMotifsGenome.pl (-size 200 -mask) from HOMER (v4.10.0).

Transcription factor co-occupancy analysis

Co-occupancy of the lymphoid TFs EBF1 (GSE159957), PAX5 (GSE126375), TCF7 (GSE131673) and GATA3 (GSE93755) in lineage-specific elements was investigated using ICE-A run in Multiple mode (--mode multiple) with the following options: --upset_plot --circos_plot --circos_use_promoters --skip_promoter_promoter. The co-occupancy analysis was restricted to lineage-specific distal elements (--in_regions regions.bed) interacting with B- and T-cell gene sets (--filter_genes --genes genes.txt).

Regulation of broadly expressed genes

Investigation of Ebf1 target genes

EBF1 target genes were defined as genes associated with EBF1-bound elements by ICE-A annotation (default options used, with the exceptions of: --close_peak_type distance --close_promoter_type distance). The gene expression patterns of EBF1 targets in early lymphoid development were visualized in a heatmap of the Immgen gene expression data, as described for the identification of B-/T-cell-specific elements, except that EBF1-associated genes were selected. EBF1 targets were classified based on differential gene expression between FL Wt and Ebf1-deficient pro-B cells from GSE92434. Normalized gene expression data were merged per cell type and filtered for genes with a normalized score >1 in at least one of the cell types. Up-regulated or down-regulated genes in Wt vs. Ebf1−/− were defined as genes with a significant difference in gene expression (padj <0.05) and |log2FC| >1, and common genes were defined as non-differentially expressed genes with |log2FC| <log2(1.5). The compareCluster function from the clusterProfiler package (v4.10.0) was used to compare the levels of enrichment of biological process gene ontologies between the different categories of EBF1 targets, split into genes that are bound by EBF1 in promoter or distal (>2.5 kb from TSS) elements.
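The three-way classification of EBF1 targets can be written out as a small Python function. This is a sketch under stated assumptions rather than the analysis code: the function name and the "unclassified" label are hypothetical, a positive log2FC is assumed to mean higher expression in Wt than in Ebf1−/− cells, and "non-differentially expressed" is interpreted as padj not below 0.05.

```python
import math


def classify_target(log2fc, padj, padj_cut=0.05, lfc_cut=1.0,
                    common_cut=math.log2(1.5)):
    """Classify a gene as 'up', 'down', 'common' or 'unclassified'.

    log2fc is the Wt vs. Ebf1-/- log2 fold change (positive = higher in Wt,
    an assumption of this sketch); padj may be None if the gene was not tested.
    """
    significant = padj is not None and padj < padj_cut
    if significant and abs(log2fc) > lfc_cut:
        return "up" if log2fc > 0 else "down"
    if not significant and abs(log2fc) < common_cut:
        return "common"
    # Genes between the two definitions fall into neither category
    return "unclassified"
```

Note that the two definitions leave a gap: a gene with a small but significant change, or a moderate non-significant change, belongs to neither the differential nor the common category.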

Ebf1-mediated chromatin accessibility change

ATAC-seq data from Wt and Ebf1-deficient FL pro-B cells (GSE92434) were used to investigate the effect of EBF1 activity on chromatin accessibility. A consensus peak set for the Wt and Ebf1−/− data, defined as the aggregated union of peaks, was generated using bedtools (v2.30.0). ATAC-seq counts for the Wt and Ebf1−/− pro-B cells in the consensus peak set were extracted using DiffBind (v3.12.0). Differential accessibility analysis between Wt and Ebf1−/− cells was performed using DESeq2 (v1.42.0). The percentage of elements with a significant EBF1-mediated change in chromatin accessibility (|log2FC|>1, padj <0.05) was visualized in a stacked barplot. The analysis was restricted to elements that were open and associated with at least one gene in Wt pro-B cells, split into different categories based on EBF1-mediated changes in gene expression and type of element (promoter/distal). Comparative motif analysis of elements split into categories of EBF1 occupancy and direction of change was performed with gimme maelstrom from GimmeMotifs (v0.15.2).
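An aggregated union of peaks can be illustrated with the following Python sketch, which concatenates peak sets and merges overlapping or directly adjacent intervals. Which bedtools subcommand was used is not stated in the text, so equivalence to a sort-and-merge operation is an assumption; the function name and tuple layout are hypothetical.

```python
def union_peaks(*peak_sets):
    """Merge peaks from several sets into a non-overlapping consensus set.

    Each peak is a (chrom, start, end) tuple in BED-style half-open
    coordinates. Overlapping and book-ended peaks are merged.
    """
    intervals = sorted(peak for peaks in peak_sets for peak in peaks)
    merged = []
    for chrom, start, end in intervals:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Extend the previous interval when the new one touches or overlaps it
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(interval) for interval in merged]
```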

EBF1-addicted gene regulation

To investigate a potential scenario of EBF1-addicted gene regulation, gene expression data derived from the EBF1 degradation system of Zolotarev et al.51 were used. Genes with EBF1-addicted activation or repression were defined as commonly expressed genes in Wt vs. Ebf1−/− pro-B cells, with significantly differential expression at EBF1 degradation (padj <0.05, |FC| >1.5). Gene ontology enrichment analysis of the biological processes for the top 50 genes (based on log2FC in the degradation system) with EBF1-addicted activation/repression was performed using compareCluster from the clusterProfiler package (v4.10.0). Myc represents an example of a gene with EBF1-addicted activation. The WashU Epigenome Browser (v54.0.6) was used to visualize EBF1 binding, chromatin accessibility of Wt and Ebf1−/− pro-B cells, and H3K4me3 PLAC-seq interactions from Wt pro-B cells at the Myc locus, with a zoomed-in view of the BENC region located ∼1.6 Mb upstream of the TSS.

Inactivation of putative EBF1 target genes

Virus expressed guides for CRISPR/Cas9

CRISPR guides were identified by overlaying the genomic coordinates of coding genes of interest with potential CRISPR/Cas9 guides in Benchling (Biology Software. 2020) using the single guide, mm10 genome and 3′ NGG PAM settings. Guides which promote Cas9 mediated cutting within the coding regions were selected for cloning into pAW13.lentiGuide-emCherry plasmid (lentiguide-mCherry, Addgene plasmid #104375; http://n2t.net/addgene:104375; RRID: Addgene_104375, a gift from Richard Young). Lentiviruses were produced by transfecting 293T-HEK cells with lentiguide-mCherry plasmids as well as psPax2 and pMD2G packaging plasmids together with X-tremeGENE HP DNA Transfection Reagent (Sigma) according to the manufacturer’s instructions. The resulting virus was harvested after 54–64 h and concentrated using the Lenti-X concentrator according to the manufacturer’s instructions (Takara Bio).

Functional screen for target gene relevance

Hind bones from iCas9 mice aged 8–16 weeks were crushed in a mortar and the cell suspension was passed through a 50 μm filter. KIT+ cells were isolated on a magnetic-activated cell sorting column using anti-CD117 immunomagnetic beads (Miltenyi Biotec). The cells were transduced with lentiviruses carrying gRNAs targeting putative EBF1 target genes as well as positive (Ebf1) and negative (Rosa 26, R26) control gRNAs. Briefly, non-tissue culture-treated plates were coated with 40 μg/mL retronectin (Takara Bio) overnight at 4 °C, blocked with a 2% BSA solution at RT for 30 min and washed with PBS. Thereafter, viruses were added, and plates were spun at 2000×g and 32 °C for 2 h. The remaining viral supernatant was aspirated, and wells were washed with PBS. KIT+ cells were then added to the virus-coated wells and plates were spun at 300×g and 32 °C for five minutes. After an overnight culture in OptiMEM medium supplemented with 10% fetal calf serum, 50 μg/mL gentamicin, 50 μM β-mercaptoethanol, 50 ng/mL KIT ligand, 50 ng/mL Fms-like tyrosine kinase 3 ligand (FLT3L), 50 ng/mL interleukin-7 (IL-7) and doxycycline (DOX, Sigma) to induce Cas9 expression, transduced BM cells were put in co-culture with OP9 or OP9D stromal cells. The cells were cultured in OptiMEM medium supplemented with 10% fetal calf serum, 50 μg/mL gentamicin, 50 μM β-mercaptoethanol, 10 ng/mL KIT ligand, 10 ng/mL FLT3L, 10 ng/mL IL-7 and DOX (0.1 μg/mL DOX every two days). At day 3 post transduction, CD45+mCherry+ cells were sorted for subsequent OP9/OP9D co-culture as above. At day 11, the cells were stained with an antibody cocktail containing CD45 (30-F11, AF700, BioLegend, 103128), CD19 (1D3, PECy7, eBioscience, 15360900), Thy1.2 (53-2.1, Pacific Blue, BioLegend, 140306), Gr1 (RB6-8C5, APCCy7, BioLegend, 108424), CD11b (M1/70, APCCy7, BioLegend, 101226), NK1.1 (PK136, PE, BioLegend, 108708) and CD11c (N418, APC, BioLegend, 117310). 7AAD was used as a viability marker. 
The absolute number of B-cells (CD45+Gr1-CD11b-CD11c-NK1.1-CD19+) and T-cells (CD45+Gr1-CD11b-CD11c-NK1.1-CD19-Thy1+) generated was determined by FACS analysis using the addition of a set amount of CountBright Absolute Counting Beads (Fisher Scientific) and presented relative to R26 control (Table S1F).
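A bead-based absolute count followed by normalization to the R26 control can be sketched as below. The exact calculation is not given in the text, so this follows the general principle of counting beads (scaling acquired cell events by the ratio of beads added to beads acquired) and should be read as an assumption; the function names are hypothetical.

```python
def absolute_count(cell_events, bead_events, beads_added):
    """Estimate the total cell number in a sample from counting-bead events.

    cell_events: acquired events in the gated cell population
    bead_events: acquired counting-bead events
    beads_added: number of beads spiked into the sample
    """
    if bead_events <= 0:
        raise ValueError("No bead events acquired; count cannot be estimated")
    return cell_events * beads_added / bead_events


def relative_to_control(sample_count, control_count):
    """Express a sample count as a ratio to the control (R26) count."""
    return sample_count / control_count
```

For example, 2,000 gated cell events acquired together with 5,000 of 50,000 added beads corresponds to an estimated 20,000 cells in the sample.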

Dynamics of promoter/enhancer interactions

Interaction landscape in lymphoid development

A consensus set of distal elements involved in lymphoid development (based on open chromatin regions in 230-238 or Scid.adh.2C2 cells; see Figure 2) was annotated in a lineage-specific manner using ICE-A with significant H3K4me3 PLAC-seq interactions from Scid.adh.2C2 pro-T cells and Wt/Ebf1−/− pro-B cells. For comparison, the total ALL-to-ALL interactions of each dataset were subset to the size of the dataset with the fewest interactions (the top interactions based on FitHiChIP q-values were retained). Distal elements were filtered for interactions with B- or T-lineage-specific genes (defined in Figure 2) in any of the cell types. For these putative lineage-specific DREs, the overall enhancer-promoter interaction landscape was compared between cell types and visualized with Circos plots using the circlize package (v0.4.16). Only interactions with B-/T-lineage-restricted genes (defined in Figure 2A) or common target genes [|log2FC FrBCvsDN2b | <log2(1.5)] with activated promoters were included (defined by overlap with H3K4me3 in the promoter of the respective cell type). For visualization of the dynamics of enhancer/promoter interactions in early lymphocytic development, a Venn diagram displaying the numbers and percentages of interactions with B- or T-lineage-restricted genes present in each cell type was generated using the ggVennDiagram package (v1.5.2). In addition, a Venn diagram displaying the degree of overlap between the interactions with alternative common target genes for the lineage-restricted, gene-associated DREs was generated.
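The subsetting of interaction sets to a common size can be illustrated with a short Python sketch. It is not the code used in the study: the function name and the representation of an interaction as an (identifier, q-value) tuple are hypothetical, and ties in q-value are resolved by input order, which is an assumption.

```python
def downsample_to_smallest(datasets):
    """Trim every interaction set to the size of the smallest one.

    datasets maps a sample name to a list of (interaction_id, q_value)
    tuples. Within each sample, the interactions with the lowest q-values
    are retained.
    """
    n_keep = min(len(interactions) for interactions in datasets.values())
    return {
        name: sorted(interactions, key=lambda record: record[1])[:n_keep]
        for name, interactions in datasets.items()
    }
```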

Genome browser visualization

Genome browser tracks of interactions associated with putative lineage-specific DREs were generated for visualization in the WashU Epigenome Browser (v54.0.6), by compressing and indexing sorted and filtered H3K4me3 PLAC-seq interactions from the Scid.adh.2C2 and FL Wt and Ebf1−/− pro-B cells using bgzip and tabix from htslib (v1.10.2). The filtered interactions, along with ATAC-seq and H3K4me3 tracks from each cell type, as well as EBF1 and PAX5 binding were visualized in the WashU Epigenome Browser (v54.0.6) at the Gata3 locus.

Quantification and statistical analysis

Analysis workflows for HiChIP/PLAC-seq, ChIP-seq, ATAC-seq and RNA-seq are presented in detail in the method details section. For chromatin interaction data, FitHiChIP (v9.1) is used for calling significant (q-value < 0.05) interactions. For ChIP-seq/ATAC-seq data, peaks are called using MACS2 (v2.2.7.1) with a q-value threshold of 0.05. Significant peaks are defined as peaks passing an irreproducible discovery rate (IDR) threshold of 0.05 from two replicates. For ChIP-seq, matching input controls are used in peak calling. Transcript levels from RNA-seq data were generated with rsem-calculate-expression in RSEM (v1.3.0). Differential accessibility and expression analysis is performed with the Wald test using DESeq2 (v1.42.0). Features with a Benjamini-Hochberg adjusted p-value < 0.05 are considered significant if not specified otherwise. Log2FC cutoff values for specific analyses are provided in the respective figure legends. Motif enrichment analysis is performed with HOMER (v4.10.0) for de novo motif analysis or GimmeMotifs (v0.15.2) for comparative motif enrichment. Gene set enrichment analysis was performed with compareCluster from the clusterProfiler package (v4.10.0) with a significance level of padj (Benjamini-Hochberg correction) < 0.05. If not stated otherwise, statistical analysis is performed with stat_compare_means from the ggpubr package (v0.6.0) and based on the Mann–Whitney U test. Statistical significance levels are denoted as follows, p-value < 0.05: ∗, p-value < 0.01: ∗∗, p-value < 0.001: ∗∗∗, p-value < 0.0001: ∗∗∗∗.
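The significance notation above maps p-values to asterisks as in the following Python sketch. The function name is hypothetical, and the "ns" label for non-significant values is an assumption that does not appear in the text.

```python
def significance_stars(p_value):
    """Return the asterisk label for a p-value using the thresholds above."""
    thresholds = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]
    for cutoff, label in thresholds:
        # Thresholds are checked from strictest to most lenient
        if p_value < cutoff:
            return label
    return "ns"  # assumed label for p-values of 0.05 or above
```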

Additional resources

ICE-A is available as an open source tool on GitHub: https://github.com/Tingvall/ICE_A.

Published: June 9, 2025

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2025.112855.

Supplemental information

Document S1. Figures S1–S3
mmc1.pdf (1.4MB, pdf)
Table S1. Supplemental data

(A) Information about datasets.

(B) B- and T-lineage specific gene sets.

(C) Output from ICE-A TF co-occupancy analysis for B cell specific target genes.

(D) Output from ICE-A TF co-occupancy analysis for T cell specific target genes.

(E) Guide information for inactivation of putative EBF1 targets.

(F) Cell counts from EBF1 target inactivation experiment.

(G) Information about elements with EBF1 dependent regulation.

(H) Information about alternative promoters of T-lineage distal elements.

mmc2.xlsx (3.8MB, xlsx)

References

  • 1.Hosokawa H., Romero-Wolf M., Yui M.A., Ungerbäck J., Quiloan M.L.G., Matsumoto M., Nakayama K.I., Tanaka T., Rothenberg E.V. Bcl11b sets pro-T cell fate by site-specific cofactor recruitment and by repressing Id2 and Zbtb16. Nat. Immunol. 2018;19:1427–1440. doi: 10.1038/s41590-018-0238-4.
  • 2.Li L., Leid M., Rothenberg E.V. An early T cell lineage commitment checkpoint dependent on the transcription factor Bcl11b. Science. 2010;329:89–93. doi: 10.1126/science.1188989.
  • 3.Ungerbäck J., Hosokawa H., Wang X., Strid T., Williams B.A., Sigvardsson M., Rothenberg E.V. Pioneering, chromatin remodeling, and epigenetic constraint in early T-cell gene regulation by SPI1 (PU.1) Genome Res. 2018;28:1508–1519. doi: 10.1101/gr.231423.117.
  • 4.Weber B.N., Chi A.W.S., Chavez A., Yashiro-Ohtani Y., Yang Q., Shestova O., Bhandoola A. A critical role for TCF-1 in T-lineage specification and differentiation. Nature. 2011;476:63–68. doi: 10.1038/nature10279.
  • 5.Hosokawa H., Rothenberg E.V. Cytokines, Transcription Factors, and the Initiation of T-Cell Development. Cold Spring Harb. Perspect. Biol. 2018;10 doi: 10.1101/cshperspect.a028621.
  • 6.Lin Y.C., Jhunjhunwala S., Benner C., Heinz S., Welinder E., Mansson R., Sigvardsson M., Hagman J., Espinoza C.A., Dutkowski J., et al. A global network of transcription factors, involving E2A, EBF1 and Foxo1, that orchestrates B cell fate. Nat. Immunol. 2010;11:635–643. doi: 10.1038/ni.1891.
  • 7.Jensen C.T., Åhsberg J., Sommarin M.N.E., Strid T., Somasundaram R., Okuyama K., Ungerbäck J., Kupari J., Airaksinen M.S., Lang S., et al. Dissection of progenitor compartments resolves developmental trajectories in B-lymphopoiesis. J. Exp. Med. 2018;215:1947–1963. doi: 10.1084/jem.20171384.
  • 8.Treiber T., Mandel E.M., Pott S., Györy I., Firner S., Liu E.T., Grosschedl R. Early B Cell Factor 1 Regulates B Cell Gene Networks by Activation, Repression, and Transcription-Independent Poising of Chromatin. Immunity. 2010;32:714–725. doi: 10.1016/j.immuni.2010.04.013.
  • 9.Pongubala J.M.R., Northrup D.L., Lancki D.W., Medina K.L., Treiber T., Bertolino E., Thomas M., Grosschedl R., Allman D., Singh H. Transcription factor EBF restricts alternative lineage options and promotes B cell fate commitment independently of Pax5. Nat. Immunol. 2008;9:203–215. doi: 10.1038/ni1555.
  • 10.Györy I., Boller S., Nechanitzky R., Mandel E., Pott S., Liu E., Grosschedl R. Transcription factor Ebf1 regulates differentiation stage-specific signaling, proliferation, and survival of B cells. Genes Dev. 2012;26:668–682. doi: 10.1101/gad.187328.112.
  • 11.Revilla-I-Domingo R., Bilic I., Vilagos B., Tagoh H., Ebert A., Tamir I.M., Smeenk L., Trupke J., Sommer A., Jaritz M., Busslinger M. The B-cell identity factor Pax5 regulates distinct transcriptional programmes in early and late B lymphopoiesis. EMBO J. 2012;31:3130–3146. doi: 10.1038/emboj.2012.155.
  • 12.Sigvardsson M. Transcription factor networks link B-lymphocyte development and malignant transformation in leukemia. Genes Dev. 2023;37:703–723. doi: 10.1101/gad.349879.122.
  • 13.Strid T., Okuyama K., Tingvall-Gustafsson J., Kuruvilla J., Jensen C.T., Lang S., Prasad M., Somasundaram R., Åhsberg J., Cristobal S., et al. B Lymphocyte Specification Is Preceded by Extensive Epigenetic Priming in Multipotent Progenitors. J. Immunol. 2021;206:2700–2713. doi: 10.4049/jimmunol.2100048.
  • 14.Laverré A., Tannier E., Necsulea A. Long-range promoter-enhancer contacts are conserved during evolution and contribute to gene expression robustness. Genome Res. 2022;32:280–296. doi: 10.1101/gr.275901.121.
  • 15.Mills C., Muruganujan A., Ebert D., Marconett C.N., Lewinger J.P., Thomas P.D., Mi H. PEREGRINE: A genome-wide prediction of enhancer to gene relationships supported by experimental evidence. PLoS One. 2020;15 doi: 10.1371/journal.pone.0243791.
  • 16.Dobi K.C., Winston F. Analysis of transcriptional activation at a distance in Saccharomyces cerevisiae. Mol. Cell Biol. 2007;27:5575–5586. doi: 10.1128/MCB.00459-07.
  • 17.Kvon E.Z., Kazmar T., Stampfel G., Yáñez-Cuna J.O., Pagani M., Schernhuber K., Dickson B.J., Stark A. Genome-scale functional characterization of Drosophila developmental enhancers in vivo. Nature. 2014;512:91–95. doi: 10.1038/nature13395.
  • 18.Ren X., Wang M., Li B., Jamieson K., Zheng L., Jones I.R., Li B., Takagi M.A., Lee J., Maliskova L., et al. Parallel characterization of cis-regulatory elements for multiple genes using CRISPRpath. Sci. Adv. 2021;7 doi: 10.1126/sciadv.abi4360.
  • 19.Lidschreiber K., Jung L.A., von der Emde H., Dave K., Taipale J., Cramer P., Lidschreiber M. Transcriptionally active enhancers in human cancer cells. Mol. Syst. Biol. 2021;17 doi: 10.15252/msb.20209873.
  • 20.Clément Y., Torbey P., Gilardi-Hebenstreit P., Roest Crollius H. Enhancer-gene maps in the human and zebrafish genomes using evolutionary linkage conservation. Nucleic Acids Res. 2020;48:2357–2371. doi: 10.1093/nar/gkz1199.
  • 21.Li G., Ruan X., Auerbach R.K., Sandhu K.S., Zheng M., Wang P., Poh H.M., Goh Y., Lim J., Zhang J., et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell. 2012;148:84–98. doi: 10.1016/j.cell.2011.12.014.
  • 22.Sanyal A., Lajoie B.R., Jain G., Dekker J. The long-range interaction landscape of gene promoters. Nature. 2012;489:109–113. doi: 10.1038/nature11279. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.van Arensbergen J., van Steensel B., Bussemaker H.J. In search of the determinants of enhancer-promoter interaction specificity. Trends Cell Biol. 2014;24:695–702. doi: 10.1016/j.tcb.2014.07.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Ptashne M. Gene regulation by proteins acting nearby and at a distance. Nature. 1986;322:697–701. doi: 10.1038/322697a0. [DOI] [PubMed] [Google Scholar]
  • 25.Schleif R. DNA looping. Annu. Rev. Biochem. 1992;61:199–223. doi: 10.1146/annurev.bi.61.070192.001215. [DOI] [PubMed] [Google Scholar]
  • 26.Lieberman-Aiden E., van Berkum N.L., Williams L., Imakaev M., Ragoczy T., Telling A., Amit I., Lajoie B.R., Sabo P.J., Dorschner M.O., et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Dixon J.R., Selvaraj S., Yue F., Kim A., Li Y., Shen Y., Hu M., Liu J.S., Ren B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Rao S.S.P., Huntley M.H., Durand N.C., Stamenova E.K., Bochkov I.D., Robinson J.T., Sanborn A.L., Machol I., Omer A.D., Lander E.S., Aiden E.L. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Popay T.M., Dixon J.R. Coming full circle: On the origin and evolution of the looping model for enhancer-promoter communication. J. Biol. Chem. 2022;298 doi: 10.1016/j.jbc.2022.102117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Hnisz D., Weintraub A.S., Day D.S., Valton A.L., Bak R.O., Li C.H., Goldmann J., Lajoie B.R., Fan Z.P., Sigova A.A., et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science. 2016;351:1454–1458. doi: 10.1126/science.aad9024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Kragesteen B.K., Spielmann M., Paliou C., Heinrich V., Schöpflin R., Esposito A., Annunziatella C., Bianco S., Chiariello A.M., Jerković I., et al. Dynamic 3D chromatin architecture contributes to enhancer specificity and limb morphogenesis. Nat. Genet. 2018;50:1463–1473. doi: 10.1038/s41588-018-0221-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Lupiáñez D.G., Kraft K., Heinrich V., Krawitz P., Brancati F., Klopocki E., Horn D., Kayserili H., Opitz J.M., Laxova R., et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell. 2015;161:1012–1025. doi: 10.1016/j.cell.2015.04.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Stadhouders R., Filion G.J., Graf T. Transcription factors and 3D genome conformation in cell-fate decisions. Nature. 2019;569:345–354. doi: 10.1038/s41586-019-1182-7. [DOI] [PubMed] [Google Scholar]
  • 34.Lim B., Levine M.S. Enhancer-promoter communication: hubs or loops? Curr. Opin. Genet. Dev. 2021;67:5–9. doi: 10.1016/j.gde.2020.10.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Benabdallah N.S., Williamson I., Illingworth R.S., Kane L., Boyle S., Sengupta D., Grimes G.R., Therizols P., Bickmore W.A. Decreased Enhancer-Promoter Proximity Accompanying Enhancer Activation. Mol. Cell. 2019;76:473–484.e7. doi: 10.1016/j.molcel.2019.07.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Fukaya T., Lim B., Levine M. Enhancer Control of Transcriptional Bursting. Cell. 2016;166:358–368. doi: 10.1016/j.cell.2016.05.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.McLean C.Y., Bristor D., Hiller M., Clarke S.L., Schaar B.T., Lowe C.B., Wenger A.M., Bejerano G. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 2010;28:495–501. doi: 10.1038/nbt.1630. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Yoshida H., Lareau C.A., Ramirez R.N., Rose S.A., Maier B., Wroblewska A., Desland F., Chudnovskiy A., Mortha A., Dominguez C., et al. The cis-Regulatory Atlas of the Mouse Immune System. Cell. 2019;176:897–912.e20. doi: 10.1016/j.cell.2018.12.036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Reyna J., Fetter K., Ignacio R., Ali Marandi C.C., Ma A., Rao N., Jiang Z., Figueroa D.S., Bhattacharyya S., Ay F. Loop Catalog: a comprehensive HiChIP database of human and mouse samples. bioRxiv. 2025 doi: 10.1101/2024.04.26.591349. Preprint at. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Di Tommaso P., Chatzou M., Floden E.W., Barja P.P., Palumbo E., Notredame C. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 2017;35:316–319. doi: 10.1038/nbt.3820. [DOI] [PubMed] [Google Scholar]
  • 41.Fulco C.P., Nasser J., Jones T.R., Munson G., Bergman D.T., Subramanian V., Grossman S.R., Anyoha R., Doughty B.R., Patwardhan T.A., et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 2019;51:1664–1669. doi: 10.1038/s41588-019-0538-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Mumbach M.R., Satpathy A.T., Boyle E.A., Dai C., Gowen B.G., Cho S.W., Nguyen M.L., Rubin A.J., Granja J.M., Kazane K.R., et al. Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat. Genet. 2017;49:1602–1612. doi: 10.1038/ng.3963. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Sahin M., Wong W., Zhan Y., Van Deynze K., Koche R., Leslie C.S. HiC-DC+ enables systematic 3D interaction calls and differential analysis for Hi-C and HiChIP. Nat. Commun. 2021;12:3366. doi: 10.1038/s41467-021-23749-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Bhattacharyya S., Chandra V., Vijayanand P., Ay F. Identification of significant chromatin contacts from HiChIP data by FitHiChIP. Nat. Commun. 2019;10:4221. doi: 10.1038/s41467-019-11950-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Chen P.B., Fiaux P.C., Zhang K., Li B., Kubo N., Jiang S., Hu R., Rooholfada E., Wu S., Wang M., et al. Systematic discovery and functional dissection of enhancers needed for cancer cell fitness and proliferation. Cell Rep. 2022;41 doi: 10.1016/j.celrep.2022.111630. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Barajas-Mora E.M., Kleiman E., Xu J., Carrico N.C., Lu H., Oltz E.M., Murre C., Feeney A.J. A B-Cell-Specific Enhancer Orchestrates Nuclear Architecture to Generate a Diverse Antigen Receptor Repertoire. Mol. Cell. 2019;73:48–60.e5. doi: 10.1016/j.molcel.2018.10.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Barajas-Mora E.M., Lee L., Lu H., Valderrama J.A., Bjanes E., Nizet V., Feeney A.J., Hu M., Murre C. Enhancer-instructed epigenetic landscape and chromatin compartmentalization dictate a primary antibody repertoire protective against specific bacterial pathogens. Nat. Immunol. 2023;24:320–336. doi: 10.1038/s41590-022-01402-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Pongubala J.M.R., Murre C. Spatial Organization of Chromatin: Transcriptional Control of Adaptive Immune Cell Development. Front. Immunol. 2021;12 doi: 10.3389/fimmu.2021.633825. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Lin Y.C., Benner C., Mansson R., Heinz S., Miyazaki K., Miyazaki M., Chandra V., Bossen C., Glass C.K., Murre C. Global changes in the nuclear positioning of genes and intra- and interdomain genomic interactions that orchestrate B cell fate. Nat. Immunol. 2012;13:1196–1204. doi: 10.1038/ni.2432. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Somasundaram R., Jensen C.T., Tingvall-Gustafsson J., Åhsberg J., Okuyama K., Prasad M., Hagman J.R., Wang X., Soneji S., Strid T., et al. EBF1 and PAX5 control pro-B cell expansion via opposing regulation of the Myc gene. Blood. 2021;137:3037–3049. doi: 10.1182/blood.2020009564. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Zolotarev N., Bayer M., Grosschedl R. EBF1 is continuously required for stabilizing local chromatin accessibility in pro-B cells. Proc. Natl. Acad. Sci. USA. 2022;119 doi: 10.1073/pnas.2210595119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Tian K., Henderson R.E., Parker R., Brown A., Johnson J.E., Bateman J.R. Two modes of transvection at the eyes absent gene of Drosophila demonstrate plasticity in transcriptional regulatory interactions in cis and in trans. PLoS Genet. 2019;15 doi: 10.1371/journal.pgen.1008152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Oudelaar A.M., Harrold C.L., Hanssen L.L.P., Telenius J.M., Higgs D.R., Hughes J.R. A revised model for promoter competition based on multi-way chromatin interactions at the alpha-globin locus. Nat. Commun. 2019;10:5412. doi: 10.1038/s41467-019-13404-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Quintero-Cadena P., Sternberg P.W. Enhancer Sharing Promotes Neighborhoods of Transcriptional Regulation Across Eukaryotes. G3 (Bethesda) 2016;6:4167–4174. doi: 10.1534/g3.116.036228. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Ramamoorthy S., Kometani K., Herman J.S., Bayer M., Boller S., Edwards-Hicks J., Ramachandran H., Li R., Klein-Geltink R., Pearce E.L., et al. EBF1 and Pax5 safeguard leukemic transformation by limiting IL-7 signaling, Myc expression, and folate metabolism. Genes Dev. 2020;34:1503–1519. doi: 10.1101/gad.340216.120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Arnold C.D., Zabidi M.A., Pagani M., Rath M., Schernhuber K., Kazmar T., Stark A. Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution. Nat. Biotechnol. 2017;35:136–144. doi: 10.1038/nbt.3739. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Butler J.E., Kadonaga J.T. Enhancer-promoter specificity mediated by DPE or TATA core promoter motifs. Genes Dev. 2001;15:2515–2519. doi: 10.1101/gad.924301. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Juven-Gershon T., Kadonaga J.T. Regulation of gene expression via the core promoter and the basal transcriptional machinery. Dev. Biol. 2010;339:225–229. doi: 10.1016/j.ydbio.2009.08.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Zabidi M.A., Arnold C.D., Schernhuber K., Pagani M., Rath M., Frank O., Stark A. Enhancer-core-promoter specificity separates developmental and housekeeping gene regulation. Nature. 2015;518:556–559. doi: 10.1038/nature13994. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Akbari O.S., Bae E., Johnsen H., Villaluz A., Wong D., Drewell R.A. A novel promoter-tethering element regulates enhancer-driven gene expression at the bithorax complex in the Drosophila embryo. Development. 2008;135:123–131. doi: 10.1242/dev.010744. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Batut P.J., Bing X.Y., Sisco Z., Raimundo J., Levo M., Levine M.S. Genome organization controls transcriptional dynamics during development. Science. 2022;375:566–570. doi: 10.1126/science.abi7178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Kwon D., Mucci D., Langlais K.K., Americo J.L., DeVido S.K., Cheng Y., Kassis J.A. Enhancer-promoter communication at the Drosophila engrailed locus. Development. 2009;136:3067–3075. doi: 10.1242/dev.036426. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Qian S., Varjavand B., Pirrotta V. Molecular analysis of the zeste-white interaction reveals a promoter-proximal element essential for distant enhancer-promoter communication. Genetics. 1992;131:79–90. doi: 10.1093/genetics/131.1.79. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Oh S., Shao J., Mitra J., Xiong F., D'Antonio M., Wang R., Garcia-Bassets I., Ma Q., Zhu X., Lee J.H., et al. Enhancer release and retargeting activates disease-susceptibility genes. Nature. 2021;595:735–740. doi: 10.1038/s41586-021-03577-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Dillon N., Trimborn T., Strouboulis J., Fraser P., Grosveld F. The effect of distance on long-range chromatin interactions. Mol. Cell. 1997;1:131–139. doi: 10.1016/s1097-2765(00)80014-3. [DOI] [PubMed] [Google Scholar]
  • 66.Kmita M., Fraudeau N., Hérault Y., Duboule D. Serial deletions and duplications suggest a mechanism for the collinearity of Hoxd genes in limbs. Nature. 2002;420:145–150. doi: 10.1038/nature01189. [DOI] [PubMed] [Google Scholar]
  • 67.Eberhard D., Jiménez G., Heavey B., Busslinger M. Transcriptional repression by Pax5 (BSAP) through interaction with corepressors of the Groucho family. EMBO J. 2000;19:2292–2303. doi: 10.1093/emboj/19.10.2292. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Heavey B., Charalambous C., Cobaleda C., Busslinger M. Myeloid lineage switch of Pax5 mutant but not wild-type B cell progenitors by C/EBPalpha and GATA factors. EMBO J. 2003;22:3887–3897. doi: 10.1093/emboj/cdg380. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Hill L., Ebert A., Jaritz M., Wutz G., Nagasaka K., Tagoh H., Kostanova-Poliakova D., Schindler K., Sun Q., Bönelt P., et al. Wapl repression by Pax5 promotes V gene recombination by Igh loop extrusion. Nature. 2020;584:142–147. doi: 10.1038/s41586-020-2454-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Linderson Y., Eberhard D., Malin S., Johansson A., Busslinger M., Pettersson S. Corecruitment of the Grg4 repressor by PU.1 is critical for Pax5-mediated repression of B-cell-specific genes. EMBO Rep. 2004;5:291–296. doi: 10.1038/sj.embor.7400089. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Nechanitzky R., Akbas D., Scherer S., Györy I., Hoyler T., Ramamoorthy S., Diefenbach A., Grosschedl R. Transcription factor EBF1 is essential for the maintenance of B cell identity and prevention of alternative fates in committed cells. Nat. Immunol. 2013;14:867–875. doi: 10.1038/ni.2641. [DOI] [PubMed] [Google Scholar]
  • 72.Astori A., Tingvall-Gustafsson J., Kuruvilla J., Coyaud E., Laurent E.M.N., Sunnerhagen M., Åhsberg J., Ungerbäck J., Strid T., Sigvardsson M., et al. ARID1a Associates with Lymphoid-Restricted Transcription Factors and Has an Essential Role in T Cell Development. J. Immunol. 2020;205:1419–1432. doi: 10.4049/jimmunol.1900959. [DOI] [PubMed] [Google Scholar]
  • 73.Wang Y., Zolotarev N., Yang C.Y., Rambold A., Mittler G., Grosschedl R. A Prion-like Domain in Transcription Factor EBF1 Promotes Phase Separation and Enables B Cell Programming of Progenitor Chromatin. Immunity. 2020;53:1151–1167.e6. doi: 10.1016/j.immuni.2020.10.009. [DOI] [PubMed] [Google Scholar]
  • 74.Gao H., Lukin K., Ramírez J., Fields S., Lopez D., Hagman J. Opposing effects of SWI/SNF and Mi-2/NuRD chromatin remodeling complexes on epigenetic reprogramming by EBF and Pax5. Proc. Natl. Acad. Sci. USA. 2009;106:11258–11263. doi: 10.1073/pnas.0809485106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Maurano M.T., Humbert R., Rynes E., Thurman R.E., Haugen E., Wang H., Reynolds A.P., Sandstrom R., Qu H., Brody J., et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. doi: 10.1126/science.1222794. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Okuyama K., Strid T., Kuruvilla J., Somasundaram R., Cristobal S., Smith E., Prasad M., Fioretos T., Lilljebjörn H., Soneji S., et al. PAX5 is part of a functional transcription factor network targeted in lymphoid leukemia. PLoS Genet. 2019;15 doi: 10.1371/journal.pgen.1008280. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Dionne C.J., Tse K.Y., Weiss A.H., Franco C.B., Wiest D.L., Anderson M.K., Rothenberg E.V. Subversion of T lineage commitment by PU.1 in a clonal cell line system. Dev. Biol. 2005;280:448–466. doi: 10.1016/j.ydbio.2005.01.027. [DOI] [PubMed] [Google Scholar]
  • 78.Graham F.L., Smiley J., Russell W.C., Nairn R. Characteristics of a human cell line transformed by DNA from human adenovirus type 5. J. Gen. Virol. 1977;36:59–74. doi: 10.1099/0022-1317-36-1-59. [DOI] [PubMed] [Google Scholar]
  • 79.Nakano T., Kodama H., Honjo T. Generation of lymphohematopoietic cells from embryonic stem cells in culture. Science. 1994;265:1098–1101. doi: 10.1126/science.8066449. [DOI] [PubMed] [Google Scholar]
  • 80.Schmitt T.M., Zúñiga-Pflücker J.C. Induction of T cell development from hematopoietic progenitor cells by delta-like-1 in vitro. Immunity. 2002;17:749–756. doi: 10.1016/s1074-7613(02)00474-0. [DOI] [PubMed] [Google Scholar]
  • 81.Zhu H., Shyh-Chang N., Segrè A.V., Shinoda G., Shah S.P., Einhorn W.S., Takeuchi A., Engreitz J.M., Hagan J.P., Kharas M.G., et al. The Lin28/let-7 axis regulates glucose metabolism. Cell. 2011;147:81–94. doi: 10.1016/j.cell.2011.08.033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Ramírez F., Ryan D.P., Grüning B., Bhardwaj V., Kilpert F., Richter A.S., Heyne S., Dündar F., Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Zhang Y., Liu T., Meyer C.A., Eeckhoute J., Johnson D.S., Bernstein B.E., Nusbaum C., Myers R.M., Brown M., Li W., Liu X.S. Model-based analysis of ChIP-Seq (MACS) Genome Biol. 2008;9 doi: 10.1186/gb-2008-9-9-r137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Li Q.B., Brown J.B., Huang H., Bickel P.J. Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 2011;5:1752–1779. [Google Scholar]
  • 86.Servant N., Varoquaux N., Lajoie B.R., Viara E., Chen C.J., Vert J.P., Heard E., Dekker J., Barillot E. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. doi: 10.1186/s13059-015-0831-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinf. 2011;12:323. doi: 10.1186/1471-2105-12-323. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Heinz S., Benner C., Spann N., Bertolino E., Lin Y.C., Laslo P., Cheng J.X., Murre C., Singh H., Glass C.K. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Gu Z., Eils R., Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–2849. doi: 10.1093/bioinformatics/btw313. [DOI] [PubMed] [Google Scholar]
  • 91.Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Stark R., Brown G. 2011. DiffBind: differential binding analysis of ChIP-Seq peak data.http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf [Google Scholar]
  • 93.Wickham H. 2nd. Springer Publishing Company, Incorporated; 2016. Ggplot2: Elegant Graphics for Data Analysis. [Google Scholar]
  • 94.Yu G., Wang L.G., Han Y., He Q.Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16:284–287. doi: 10.1089/omi.2011.0118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Bruse N., van Heeringen S.J. GimmeMotifs: an analysis framework for transcription factor motif analysis. bioRxiv. 2018 doi: 10.1101/474403. Preprint at. [DOI] [Google Scholar]
  • 97.Gu Z., Gu L., Eils R., Schlesner M., Brors B. circlize Implements and enhances circular visualization in R. Bioinformatics. 2014;30:2811–2812. doi: 10.1093/bioinformatics/btu393. [DOI] [PubMed] [Google Scholar]
  • 98.Gao C.H., Chen C., Akyol T., Dusa A., Yu G., Cao B., Cai P. ggVennDiagram: Intuitive Venn diagram software extended. Imeta. 2024;3 doi: 10.1002/imt2.177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Bonfield J.K., Marshall J., Danecek P., Li H., Ohan V., Whitwham A., Keane T., Davies R.M. HTSlib: C library for reading/writing high-throughput sequencing data. GigaScience. 2021;10 doi: 10.1093/gigascience/giab007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Shannon P., Markiel A., Ozier O., Baliga N.S., Wang J.T., Ramage D., Amin N., Schwikowski B., Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Lex A., Gehlenborg N., Strobelt H., Vuillemot R., Pfister H. UpSet: Visualization of Intersecting Sets. IEEE Trans. Vis. Comput. Graph. 2014;20:1983–1992. doi: 10.1109/TVCG.2014.2346248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Langmead B., Trapnell C., Pop M., Salzberg S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Document S1. Figures S1–S3
mmc1.pdf (1.4MB, pdf)
Table S1. Supplemental data

(A) Information about datasets.

(B) B- and T-lineage specific gene sets.

(C) Output from ICE-A TF co-occupancy analysis for B cell specific target genes.

(D) Output from ICE-A TF co-occupancy analysis for T cell specific target genes.

(E) Guide information for inactivation of putative EBF1 targets.

(F) Cell counts from EBF1 target inactivation experiment.

(G) Information about elements with EBF1 dependent regulation.

(H) Information about alternative promoters of T-lineage distal elements.

mmc2.xlsx (3.8MB, xlsx)

Data Availability Statement

  • Data: The sequencing datasets generated in this study are deposited in the Gene Expression Omnibus (GEO) database under accession numbers GSE279957 (ChIP-seq) and GSE279961 (PLAC-seq). The GEO accession numbers and related information for the previously published data used in this study are presented in the key resources table and Table S1A.

  • Code: ICE-A is available as an open source tool on GitHub: https://github.com/Tingvall/ICE_A. All the code related to the analysis and generation of figures for this article is available on GitHub: https://github.com/Tingvall/ICEA_analysis.

  • Other: Supplemental information, including gene sets and element and guide information, is available in Table S1.


Articles from iScience are provided here courtesy of Elsevier
