Abstract
Inter-chromosomal interactions play a crucial role in genome organization, yet the organizational principles remain elusive. Here, we introduce a novel computational method to systematically characterize inter-chromosomal interactions using in situ Hi-C results from various cell types. Our method successfully identifies two apparently hub-like types of inter-chromosomal contacts, associated with nuclear speckles and nucleoli, respectively. Interestingly, we discover that nuclear speckle-associated inter-chromosomal interactions are highly cell-type invariant, with a marked enrichment of cell-type common super-enhancers (CSEs). Validation using DNA Oligopaint fluorescence in situ hybridization (FISH) shows a strong but probabilistic interaction behavior between nuclear speckles and CSE-harboring genomic regions. Strikingly, we find that the likelihood of speckle-CSE associations can accurately predict two experimentally measured inter-chromosomal contacts from Hi-C and Oligopaint DNA FISH. The hub-like structure observed at the population level is well described by our probabilistic establishment model as a cumulative effect of summing individual stochastic chromatin-speckle interactions. Lastly, we observe that CSEs are highly co-occupied by MAZ, and that MAZ depletion leads to significant disorganization of speckle-associated inter-chromosomal contacts. Taken together, our results suggest a simple organizational principle of inter-chromosomal interactions mediated by MAZ-occupied CSEs.
INTRODUCTION
The emergence of Hi-C and its variant methods has allowed the genome-wide observation of chromatin interactions and uncovered the principles and functions of multi-layered genome organization (1–3). Analyses of intra-chromosomal Hi-C contact maps have identified chromosomal compartmentalization at multiple length scales ranging from several megabases to a few kilobases, such as compartment A/B, topologically associating domains (TAD), and chromatin loops (1,3–5). The cell-type specific higher-order structures strongly support that the chromatin in a nucleus is not just randomly packed but rather adopts well-organized hierarchical structures tightly coupled with various genome functions (4,6–10). However, the poor detection efficiency and limited spatial distances measured by proximity ligation in Hi-C assays have hampered extraction of reliable information on inter-chromosomal interactions (11,12).
Newly developed sequencing and imaging approaches have revealed that inter-chromosomal interactions are a substantial part of genome organization. SPRITE, scSPRITE, and RD-SPRITE have identified the inter-chromosomal hubs organized around nuclear bodies, such as nuclear speckles and nucleoli (13–15). TSA-Seq also enables mapping the 3D genome organization relative to specific nuclear compartments, illustrating two types of transcription hot zones in terms of distance from nuclear speckles (16). The development of genome-scale chromatin imaging techniques including DNA-MERFISH, seqFISH+, OligoFISSEQ, and in situ genome sequencing has further confirmed the existence of inter-chromosomal interactions, albeit with considerable variability at single-cell resolution (17–20). In addition to experimental approaches, computational methods, such as SNIPER and SPIN, have enabled the analysis of the genome compartments from Hi-C data (21–23). Taken together, these results strongly suggest the presence of organizational principles that constrain the overall spatial arrangement of inter-chromosomal interactions.
Despite these technical advances, the governing principle underlying inter-chromosomal interactions is still poorly understood. In particular, it is unclear whether and how inter-chromosomal interaction hubs are organized in cell-type-specific or cell-type-independent manners. Moreover, the detailed molecular mechanism and the associated genomic elements responsible for these interactions have not yet been studied. Due to technical difficulties, the aforementioned techniques for probing inter-chromosomal interactions have been applied to only a limited number of cell types. Therefore, a systematic investigation of inter-chromosomal interactions across multiple cell types and their consequential effect on gene regulation has not been performed. In this regard, we reasoned that a new method enabling the effective extraction of inter-chromosomal interactions from Hi-C datasets, widely available for numerous cell/tissue types, can be highly useful to probe the principle underlying inter-chromosomal interactions.
To tackle this challenge, we devised a new computational method that projects sparse Hi-C inter-chromosomal contact maps into lower dimensions using non-negative matrix factorization (NMF) (24). The non-negativity of Hi-C contact maps allowed us to apply NMF to deconvolve distinct spatial conformations of inter-chromosomal organization. The new method successfully identified two types of inter-chromosomal interactions, associated with nuclear speckles and nucleoli, respectively. We showed that the speckle-associated inter-chromosomal interactions were highly conserved across various human cell types. Further, genomic regions involved in these interactions were significantly enriched with common super-enhancers (CSEs) (25). Importantly, DNA Oligopaint FISH analyses revealed probabilistic interaction behaviors of speckle-associated inter-chromosomal interactions at the single-cell level, in contrast to predictions from a fixed hub-like structure. Lastly, our DNA-binding motif analysis revealed that MYC Associated Zinc Finger Protein (MAZ) is highly co-occupied with CSEs, which might drive speckle-associated inter-chromosomal interactions. Consistent with this finding, DNA Oligopaint FISH and Hi-C experiments on MAZ knockdown (KD) cells showed a marked disorganization of inter-chromosomal interactions. Taken together, we propose that inter-chromosomal interactions are probabilistically established through a stochastic association between MAZ-occupied CSEs and nuclear speckles.
MATERIALS AND METHODS
Cell culture
IMR90 and HeLa cells were cultured in EMEM with 10% FBS and 1% Penicillin-Streptomycin at 37°C in a humidified incubator with 5% CO2. Every 2–3 days, the media were exchanged, and cells were passaged when they reached 90% confluency. HUVEC cells were cultured in identical incubation conditions except for culture media (EGM-2). Before the Oligopaint FISH experiments, cells were plated on the round 40-mm coverslips (Bioptechs, 40-1313-0319) and cultured for 2 days. Sources of cell lines used in the study are listed in Supplementary Table S1. Cells were tested for mycoplasma contamination.
DNA Oligopaint fluorescence in situ hybridization (Oligopaint FISH)
Oligonucleotide probe design
The primary probes are 120-nt oligonucleotides that bind directly to genomic DNA in the cell nucleus and are composed of the following five domains in 5′ to 3′ order.
(1) A 20-nt forward primer sequence for PCR amplification
(2) A unique 20-nt readout target sequence for each targeted locus, which is reverse complementary to the first 20-nt sequence of readout probes
(3) A 40-nt genomic homology sequence to hybridize targeted genomic locations
(4) A 20-nt readout target sequence identical to that in (2)
(5) A 20-nt reverse primer sequence for PCR amplification
The forward and reverse primers, as well as the readout target sequence, were chosen from a verified human genome orthogonal list in a previously published work (26). The genomic homology sequence was obtained through an open-source online interface (http://www.ifish4u.org/) (27). For each CSE/nCSE of 30 kb regions, a total of 300 primary probes were designed to hybridize.
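The five-domain layout of the primary probe can be illustrated with a minimal Python sketch. All sequences here are arbitrary placeholders and the helper names (`reverse_complement`, `build_primary_probe`) are hypothetical; only the domain lengths and their 5′ to 3′ order are taken from the description above.

```python
# Placeholder sequences and helper names are hypothetical; only the domain
# lengths and their 5'->3' order are taken from the text.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def build_primary_probe(fwd_primer, readout_probe, homology, rev_primer):
    """Concatenate the five domains of a primary probe in 5'->3' order."""
    # The readout target is reverse complementary to the first 20 nt
    # of the readout probe and appears twice, flanking the homology domain.
    readout_target = reverse_complement(readout_probe[:20])
    domains = [fwd_primer, readout_target, homology, readout_target, rev_primer]
    for domain, expected in zip(domains, (20, 20, 40, 20, 20)):
        if len(domain) != expected:
            raise ValueError(f"expected a {expected}-nt domain, got {len(domain)} nt")
    return "".join(domains)

readout_probe = "GATTACAGAT" * 5  # placeholder readout probe
probe = build_primary_probe("ACGT" * 5, readout_probe, "AC" * 20, "TTGCA" * 4)
print(len(probe))  # 120
```

The length check makes the 20 + 20 + 40 + 20 + 20 = 120 nt arithmetic explicit and rejects any domain of the wrong size.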
The readout probes are 51-nt oligonucleotides with 3 domains: 20-nt readout sequence, 10-nt toe-hold sequence, and 20-nt imaging target sequence. The 20-nt readout sequence binds to the readout target sequence in primary probes, and the imaging target sequence binds to the imaging probe, a 21-nt 5′-Cy5-conjugated oligonucleotide. The toe-hold sequence is unique for each targeted locus and enables the detachment of readout probes from primary probes through the action of a DNA strand displacement (DSD) probe, which is reverse complementary to the initial 30-nt sequence of the readout probe. The procedure for attaching and detaching readout probes is identical to that described in a previously published work (26). Sequence information of all probes used in the study is listed in Supplementary Tables S2–S5.
Primary probe synthesis
The primary probe synthesis is identical to the published RNA-MERFISH protocol (28). Briefly, the oligo-pool containing primary probe sequences was amplified by a limited-cycle PCR with the T7-promoter conjugated reverse primers. The ssRNA was generated through overnight in-vitro transcription at 37°C, and reverse transcription was performed to obtain the bulk of ssDNA primary probes.
Combination of immunofluorescence and primary probe hybridization
The immunofluorescence of SC35, a nuclear speckle marker, was conducted before DNA-FISH. The sample was briefly rinsed with 1× PBS and fixed with 4% formaldehyde in 1× PBS for 10 min at RT. Then, the sample was washed three times in 1× PBS and permeabilized with 0.5% Triton X-100 in 1× PBS for 10 min at RT. After washing in 1× PBS three times, the sample was blocked with the blocking buffer (10% Normal Goat Serum in 1× PBST) for 15 min at RT. The primary antibody was diluted 1:1000 in the same blocking buffer and incubated with the sample for 45 min at RT. The sample was then washed three times in 1× PBS and incubated with 1:1000 diluted secondary antibody in blocking buffer for 40 min at RT.
For Oligopaint FISH, the sample was washed three times in 1× PBS and post-fixed in 4% formaldehyde in 1× PBS for 10 min at RT. Then, further permeabilization was conducted with 0.5% Triton X-100 in 1× PBS and 0.1 M HCl for 10 min and 5 min, respectively, at RT. The sample was then washed three times with 1× PBS and incubated in 0.1 mg/ml RNase A in 1× PBS for 45 min at 37°C. After washing three times in 2× SSC, the sample was incubated in the hybridization buffer (50% formamide, 10% dextran-sulfate, and 0.1% Tween-20 in 2× SSC) for 1 h at 47°C. The sample coverslip was placed face-down and immersed in ∼2 μg of primary probes in 100 μl hybridization buffer inside a 60-mm petri dish. The dish was partially submerged in a 90°C water bath for 3 min and immediately incubated in a humidified chamber for 16 h at 47°C. After the incubation, the sample was gently detached from the dish and washed twice with 2× SSC for 10 min each at 47°C. The nuclei were stained with 500 ng/ml DAPI in 2× SSC for 10 min at RT, followed by 2× SSC washing. Then, the sample was incubated with 1:1000 diluted fiducial beads (Merck, L3280) in 2× SSC for an hour. The sample coverslip was assembled in the FCS2 chamber (Bioptechs, 060319-2-NH) for imaging.
Instrumental setup
Image acquisition was performed using a 100× oil immersion objective with 1.49 NA on a Nikon Ti2-E inverted microscope connected with four laser units (405, 488, 561, 645 nm) and a Nikon Intensilight C-HGFI to provide fluorescence illumination. The fluorescent signals were collected with an iXon Ultra 897 EMCCD camera.
The microfluidic device was designed to allow for sequential input of reagents into the FCS2 chamber. Two bidirectional 11-port/10-way microfluidic valves (Fluigent, ESSMSW003) were connected to each other, with a controller (Fluigent, ELUSEZ) connected to one of the valves. Each port of the valves was occupied by conical tubes containing either wash buffer (25% ethylene carbonate in 2× SSC) or imaging buffer (500 ng/ml DAPI, 0.5 mg/ml glucose oxidase, 40 μg/ml catalase, and 10% glucose in 2× SSC) or by microtubes containing readout solutions (15 nM readout probe, 16.5 nM imaging probe, 45 nM DSD probe and 25% ethylene carbonate in 2× SSC). All ports were connected to conical tubes or microtubes through metal air-tight caps (Fluigent, P-CAP2-HP, P-CAP15-HP, or P-CAP50-HP). Nitrogen pressure flow generated by a pressure pump (Fluigent, LU-FEZ) was redirected to each port through 10-way pressure manifolds. All devices were controlled automatically by MATLAB, and during the 40 rounds of sequential imaging, microtubes containing the readout solution were exchanged manually.
Sequential hybridization of readout probes and image acquisition
Each hybridization step included the following procedures.
Injection of 1 ml of readout solutions. The solution contained a readout probe for imaging the targeted locus and a 5′ Cy5-labeled imaging probe. Additionally, a matching DSD probe capable of detaching the previous readout probe was also included in the solution (except for the first round).
Following a 15-min incubation, the wash buffer was flowed through for 140 s.
Imaging buffer was pumped into the sample-containing flow chamber for 120 s and then incubated for 5 min.
Acquisition of cell images at multiple locations
For each field of view, the nucleus, speckles, and fiducial beads were first captured before any hybridization steps, using the 405-nm, 488-nm, and 561-nm channels, respectively. Then, multiple rounds of probe hybridization followed by image acquisition for FISH signals and beads were conducted, using the 645-nm and 561-nm channels, respectively. All image acquisitions involved z-stacks with a 0.2-μm step size to scan the entire cell nucleus.
40 Imaging target loci selection
To validate the association of CSEs with nuclear speckles compared to nCSEs and other regions, we selected 20 CSEs and 10 nCSEs for DNA Oligopaint FISH. Specifically, we selected 17 CSEs with top S-basis values, 3 CSEs with moderate S-basis values, and the 10 nCSEs with bottom S-basis values from IMR90 data. To ensure the generation of an unbiased DNA Oligopaint FISH inter-chromosomal contact map, we selected at most three SEs from the same chromosome. For the target genes, we chose four CSE-proximal genes (genomic distance to the nearest CSE < 1 Mb), including two highly active (FPKM > 100) genes and two inactive (FPKM < 1) genes. We also selected two highly active CSE-distal genes with criteria of FPKM > 100 and genomic distance to the nearest CSE > 40 Mb.
Image analysis
Nucleus and speckle localization
The cell nucleus was recognized through a threshold filter that retained only pixels with intensities more than one standard deviation (s.d.) above the mean value of DAPI signals. The speckle territories were identified by applying a threshold of 1.5 s.d. above the mean value of immunofluorescence signals in the nucleus region.
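The two thresholding rules can be sketched as follows. This is a minimal illustration on a flattened toy image, not the original analysis code: the function name `threshold_mask` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was applied.

```python
from statistics import mean, pstdev

def threshold_mask(pixels, n_sd, region=None):
    """Flag pixels brighter than mean + n_sd * s.d.

    The mean and s.d. are computed only over pixels where `region` is True
    (all pixels when `region` is None). Population s.d. is an assumption.
    """
    if region is None:
        region = [True] * len(pixels)
    inside = [p for p, r in zip(pixels, region) if r]
    cutoff = mean(inside) + n_sd * pstdev(inside)
    return [r and p > cutoff for p, r in zip(pixels, region)]

# Toy flattened images: 10 background pixels followed by 6 nuclear pixels.
dapi = [10] * 10 + [100] * 6
sc35 = [5] * 10 + [20, 20, 20, 20, 20, 200]

nucleus = threshold_mask(dapi, 1.0)                    # DAPI: 1 s.d. above the mean
speckles = threshold_mask(sc35, 1.5, region=nucleus)   # SC35: 1.5 s.d. within the nucleus
```

Restricting the speckle statistics to the nucleus mask mirrors the statement that the speckle threshold is defined relative to signals in the nucleus region.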
Drift correction and DNA loci localization
Fiducial bead signals were used to determine the drift of each imaging round. The correction values for sample drift were obtained from offsets that minimized the disparity of bead locations between the initial reference and those at later time points.
After Gaussian filtering was applied to the FISH images to exclude background noise, the locations of individual DNA loci were identified by applying thresholding with the manually defined threshold values. To localize the spatial coordinates of individual loci at the sub-pixel resolution, the centroids were computed using images cropped around each DNA locus.
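A minimal sketch of the two localization steps is given here under simplifying assumptions: the drift is treated as a pure 2D translation between matched beads (for which the least-squares offset is the mean displacement), and the centroid is intensity-weighted. The function names are hypothetical and the Gaussian filtering and thresholding steps are omitted.

```python
def drift_offset(ref_beads, cur_beads):
    """Translation minimizing the summed squared distance between matched beads.

    Assumes beads are given in the same order in both lists.
    """
    n = len(ref_beads)
    dx = sum(c[0] - r[0] for r, c in zip(ref_beads, cur_beads)) / n
    dy = sum(c[1] - r[1] for r, c in zip(ref_beads, cur_beads)) / n
    return (dx, dy)

def centroid(crop):
    """Intensity-weighted centroid (x, y) of a cropped image, in pixel units."""
    total = sum(sum(row) for row in crop)
    x = sum(j * v for row in crop for j, v in enumerate(row)) / total
    y = sum(i * v for i, row in enumerate(crop) for v in row) / total
    return (x, y)

def correct(point, offset):
    """Map a localization from a later round back to the reference frame."""
    return (point[0] - offset[0], point[1] - offset[1])

offset = drift_offset([(0.0, 0.0), (10.0, 10.0)], [(1.0, 2.0), (11.0, 12.0)])
locus = centroid([[0, 0, 0], [0, 2, 2], [0, 0, 0]])  # signal split over two pixels
```

Because the centroid is a weighted average over the crop, it can fall between pixel centers, which is what gives sub-pixel coordinates.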
RNA interference
RNA interference
HeLa cells were transfected with 100 nM of siRNA using Lipofectamine RNAiMAX and re-seeded after 48 h. Then, HeLa cells were transfected again with 100 nM of siRNA using Lipofectamine RNAiMAX and harvested at 48 h (96 h in total). Knockdown efficiencies of siRNA (Dharmacon, SMARTpool) are listed in Supplementary Table S6. The efficiency of the RNA interference was measured at the mRNA level by RT-qPCR.
RT-qPCR
RNA was extracted from 10⁶ harvested cells with the Nucleospin RNA plus kit (Macherey-Nagel, MN740984). 1 μg of total RNA was used for complementary DNA synthesis, using SuperScript™ IV Reverse Transcriptase. RT-qPCR was performed using iQ SYBR Green Supermix. The primer sequences used are listed in Supplementary Table S7.
Western blot
A total of 10⁶ harvested cells were incubated at 100°C with 1× sample buffer and resolved by SDS-PAGE using a 4–20% gradient gel (Biorad, 4561093). The proteins were transferred to a nitrocellulose membrane (Cytiva, 10600004) and blocked with 1% BSA in TBST. The membrane was incubated with MAZ antibody overnight at 4°C and washed with TBST. The membrane was then incubated with a secondary antibody (Biorad, BR1706515) for 1 h at RT and washed with TBST. The proteins were detected with ECL.
Chromatin immunoprecipitation sequencing
ChIP-seq library
Chromatin immunoprecipitation sequencing (ChIP-seq) was performed to profile genome-wide histone H3 lysine 27 acetylation (H3K27ac) on control and MAZ KD samples with two biological replicates. A total of 10⁶ harvested cells were crosslinked in a resuspension buffer of 10 ml PBS, 100 μl FBS, and 1% formaldehyde for 9 min at RT. The crosslinking was quenched with 250 mM glycine for 5 min at RT and 15 min on ice. The samples were suspended in SDS lysis buffer of 1% SDS, 50 mM Tris–HCl pH 8, 10 mM EDTA and protease inhibitor. Chromatin fragmentation was performed by sonication (Covaris, S220) in a volume of 100 μl. After centrifugation at 13 000 rpm for 15 min at 4°C, the sonicated chromatin in the supernatant was diluted 10 times with dilution buffer of 0.1% Triton X-100, 0.1% SDS, 15 mM Tris–HCl pH 8, 1 mM EDTA, 150 mM NaCl, and protease inhibitor. The sonicated chromatin was incubated with protein A Dynabeads (ThermoFisher, 10001D) coated with anti-H3K27ac antibody overnight at 4°C with rotation. The chromatin-antibody-bead complex was thoroughly washed with varying salt concentrations optimized for the antibody used. The complex was treated with RNase A and reverse-crosslinked overnight at 65°C. The immunoprecipitated DNA was extracted with AMPure XP beads (Beckman Coulter, A63881), and ChIP-seq libraries were generated using the NEBNext Ultra II DNA Library Prep Kit (NEB, E7645). The ChIP-seq libraries were sequenced in 75 bp paired-end mode on the Nextseq 550 platform.
ChIP-seq analysis
Paired-end ChIP-seq reads were aligned to the reference genome (hg38) using BWA-MEM with default parameters (29). Putative PCR duplicates were removed with Picard, and low-quality reads (MAPQ < 10) were filtered out. H3K27ac peak information was obtained with MACS2 callpeak, with a q-value cutoff of 0.05.
To define differential H3K27ac peaks, consensus H3K27ac peak regions were obtained by merging peaks from all HeLa samples. ChIP-seq read counts of the consensus peak regions were obtained using bedtools coverage (30). Differential peaks were defined from the read counts using DESeq2 with FDR < 0.05 (31).
In situ Hi-C
For two biological replicates each of control and MAZ KD samples, 10⁶ harvested cells were crosslinked in a resuspension buffer of 10 ml PBS, 100 μl FBS, and 1% formaldehyde for 9 min at RT. The crosslinking was quenched with 250 mM glycine for 5 min at RT and 15 min on ice. The crosslinked cells were lysed with 10 mM Tris–HCl pH 8, 10 mM NaCl, and 0.2% IGEPAL CA630, and digested with 100 U MboI. The digested fragments were labelled with biotin-14-dCTP and re-ligated with T4 DNA Ligase. The ligated samples were reverse-crosslinked with 2 μg/μl proteinase K, 1% SDS, and 500 mM NaCl overnight at 65°C. The DNA fragments were extracted with AMPure XP beads (Beckman Coulter, A63881) and sonicated (Covaris, S220) into optimal lengths (approximately 300–400 bp). The biotin-labelled DNA fragments were pulled down with Dynabeads MyOne streptavidin T1 beads (Invitrogen, 65602) and thoroughly washed. Hi-C libraries were generated by performing DNA end repair, removal of un-ligated ends, adenosine addition at the 3′ end (NEB, M0212), ligation of Illumina indexed adapters (NEB, M2200), and PCR amplification (Thermo Fisher Scientific, F549). The in situ Hi-C libraries were sequenced in 100 bp paired-end mode on the MGI DNBSEQ-G400 platform.
RNA sequencing
RNA-seq library
RNA was extracted from 10⁶ harvested cells from two replicates with a Nucleospin RNA XS kit (Macherey-Nagel, MN740902). RNA-seq libraries were prepared using the TruSeq stranded mRNA library prep kit (Illumina, 20020594). RNA-seq libraries were sequenced in 75 bp paired-end mode on the Nextseq 550 platform.
RNA-seq analysis
Paired-end reads were aligned to the reference genome (hg38) using STAR software v2.7.8a with default parameters (32). The gene counts were quantified with RSEM based on the GENCODE v32 annotation (33). The differentially expressed genes (DEGs) were defined using DESeq2 with a false discovery rate (FDR) < 0.05 (31). Among the obtained DEGs, only genes annotated as protein-coding genes with confidence levels 1 and 2 in GENCODE v32 were selected. Gene ontology (GO) analysis was performed using Metascape with biological process (BP), cellular component (CC), and KEGG pathway terms (34).
HiCAN
Hi-C data alignment and normalization
In situ Hi-C data were downloaded from the Sequence Read Archive (SRA). The downloaded fastq data were mapped to the mouse reference genome mm10 or the human reference genome hg38 with BWA-MEM. Chimeric reads spanning multiple sites of the genome were filtered out. Reads with low mapping quality (MAPQ < 10) were removed. We also removed read pairs mapped within 15 kb of each other to filter out self-ligated fragments. Considering the smaller number of inter-chromosomal Hi-C reads and the larger number of possible contact partners, the mapped reads were assigned to 500 kb genomic bins to generate a 500 kb Hi-C contact map containing both inter- and intra-chromosomal reads. To avoid biases caused by over-representation of reads during the experiment, we removed bins that overlapped with the ENCODE unified blacklist (35). To correct for possible genome-dependent bias, ICE normalization (36) was performed on the 500 kb Hi-C contact map using the R package ‘dryhic’. Before the normalization step, bins with coverage more than 3-fold below the average coverage were excluded to avoid putative coverage-dependent normalization bias. Lastly, quantile normalization was performed across the processed Hi-C contact maps obtained from different cell types to normalize depth differences.
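The read-level filters and the 500 kb binning can be sketched in a few lines of Python. This is an illustrative simplification, not the pipeline used in the study: the input tuple layout and the function name `bin_contacts` are assumptions, and the blacklist removal, ICE normalization, and quantile normalization steps are not shown.

```python
from collections import Counter

BIN_SIZE = 500_000   # 500 kb bins
MIN_MAPQ = 10        # reads with MAPQ < 10 are discarded
MIN_DIST = 15_000    # same-chromosome pairs closer than 15 kb are discarded

def bin_contacts(pairs):
    """Count contacts per pair of 500 kb bins.

    `pairs` is an iterable of (chrom1, pos1, mapq1, chrom2, pos2, mapq2)
    tuples; this input layout is a hypothetical convenience format.
    """
    counts = Counter()
    for c1, p1, q1, c2, p2, q2 in pairs:
        if q1 < MIN_MAPQ or q2 < MIN_MAPQ:
            continue  # low mapping quality
        if c1 == c2 and abs(p1 - p2) < MIN_DIST:
            continue  # putative self-ligation product
        a = (c1, p1 // BIN_SIZE)
        b = (c2, p2 // BIN_SIZE)
        counts[tuple(sorted((a, b)))] += 1  # order-independent bin pair
    return counts

pairs = [
    ("chr1", 100_000, 60, "chr2", 700_000, 60),    # kept, inter-chromosomal
    ("chr2", 900_000, 30, "chr1", 400_000, 30),    # kept, same bin pair as above
    ("chr1", 1_000, 60, "chr1", 5_000, 60),        # removed, < 15 kb apart
    ("chr1", 1_000, 5, "chr3", 5_000, 60),         # removed, MAPQ < 10
    ("chr1", 100_000, 60, "chr1", 1_200_000, 60),  # kept, intra-chromosomal
]
contacts = bin_contacts(pairs)
```

Sorting the two bins before counting keeps the map symmetric regardless of which mate is listed first.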
Gene annotation
The gene annotation information was obtained from GENCODE human comprehensive gene annotation version 27 (37). Only genes annotated as ‘protein-coding’ in levels 1 and 2 were used. To obtain genome-wide 500 kb gene density, we calculated the number of TSS sites of the protein coding genes within 500 kb genomic bins using ‘bedtools coverage’ (30).
HiCAN
To further analyze inter-chromosomal interactions from the Hi-C contact map, we need to ensure that the intra-chromosomal interactions have no impact on the feature extraction. To avoid a zero-inflated matrix issue, instead of eliminating intra-chromosomal interactions, we first generated the random values that follow the distribution of inter-chromosomal interactions of each chromosome. Next, we replaced the intra-chromosomal interactions of each chromosome with the randomly generated values so that the intra-chromosomal interactions can be considered as random noise during the feature extraction.
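The replacement of intra-chromosomal blocks can be sketched as follows. The text does not specify how the random values were drawn, so this sketch assumes resampling from the empirical inter-chromosomal values of each chromosome; the function name `mask_intra` and the toy matrix are hypothetical.

```python
import random

def mask_intra(matrix, chrom_of_bin, seed=0):
    """Replace intra-chromosomal entries with values resampled from the
    inter-chromosomal entries of the same chromosome (an assumed scheme)."""
    rng = random.Random(seed)
    n = len(matrix)
    out = [row[:] for row in matrix]
    for chrom in sorted(set(chrom_of_bin)):
        own = [i for i in range(n) if chrom_of_bin[i] == chrom]
        other = [j for j in range(n) if chrom_of_bin[j] != chrom]
        inter_values = [matrix[i][j] for i in own for j in other]
        for a in own:
            for b in own:
                if a <= b:
                    value = rng.choice(inter_values)
                    out[a][b] = value  # fill both triangles so the
                    out[b][a] = value  # map stays symmetric
    return out

chrom_of_bin = ["chr1", "chr1", "chr2", "chr2"]
matrix = [
    [100, 90, 1, 2],
    [90, 100, 3, 4],
    [1, 3, 100, 80],
    [2, 4, 80, 100],
]
masked = mask_intra(matrix, chrom_of_bin)
```

After masking, the large intra-chromosomal values no longer dominate the matrix, while every inter-chromosomal entry is left untouched.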
NMF was conducted on the Hi-C contact map to project interactions into low-dimensional space with the R package ‘NMF’. We used ‘nndsvd’ initialization (38) for NMF instead of random initialization to obtain optimized output without multiple iterations. We decomposed the Hi-C contact map with factorization rank r = 3, and we selected two of the three bases that are redundantly observed with different factorization rank values in the 8 cell lines. We considered these two bases as the major bases of the Hi-C inter-chromosomal contact maps. The two bases from each cell line were further categorized into the gene-rich basis and gene-poor basis based on gene density.
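To make the factorization step concrete, the sketch here decomposes a toy non-negative contact matrix with rank 3. It is not the R ‘NMF’ package workflow described above: it uses standard Lee–Seung multiplicative updates with a random (not ‘nndsvd’) initialization, and the toy matrix, function names, and iteration count are all illustrative assumptions.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (n x m) by W (n x rank) times H (rank x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Toy map: two groups of bins that contact each other strongly, plus background.
v = [[5.0] * 3 + [1.0] * 3 for _ in range(3)] + [[1.0] * 3 + [4.0] * 3 for _ in range(3)]
w0, h0 = nmf(v, rank=3, n_iter=0)  # random starting point only
w, h = nmf(v, rank=3)              # after 200 multiplicative updates
```

Each row of `w` then gives the weight of one genomic bin on each of the three bases, which is the quantity that is later compared across factorization ranks and cell lines.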
GB176 HiCAN analysis
Published Hi-C data obtained from a glioblastoma cell line (GB176) were downloaded (39), and HiCAN was applied to extract the S-basis. All reported reciprocal translocation coordinates were also downloaded. S-basis values within 5 Mb of the translocation breakpoints were considered as ‘near breakpoints’.
HiCAN analysis on MAZ KD/KO Hi-C data
In situ Hi-C results for HeLa MAZ KD samples and published mESC MAZ knockout (KO) samples were processed as described above. HiCAN was applied to the processed inter-chromosomal Hi-C contact maps. For sensitive detection of inter-chromosomal interaction alteration, we normalized the depth of inter-chromosomal contact frequencies between the samples. The S-basis values were log-transformed and used as an input to LIMMA (40). We defined genomic regions with significantly increased/decreased S-basis upon MAZ depletion using the limma-trend algorithm in HeLa (FDR < 0.05) and mESC (FDR < 0.1).
TAD boundary
For TAD boundary calling in MAZ KD HeLa samples, the 40 kb intra-chromosomal Hi-C matrices were obtained from the alignment file. The potential genome-dependent biases were normalized using covNorm (41). The TAD boundaries were defined using the directionality index (DI) score as previously described (4), with 40 kb of bin size and 2 Mb of window size.
Comparison of HiCAN results with other methods
MERFISH
Published IMR90 MERFISH results (19) containing in vivo distance information of ∼1000 genomic loci to the closest nuclear speckle and nucleolus were downloaded. Using this information, we obtained the empirical contact (distance < 250 nm) frequencies of each genomic locus to nuclear speckles and nucleoli, respectively. We compared the contact frequencies to the values of NMF bases of the corresponding genomic loci.
SPRITE
We obtained the genomic coordinates of the ‘active’ and ‘nucleolar’ hubs defined by SPRITE from the previous study (15). The given hg19 coordinates were lifted over to hg38 with the ‘UCSC liftover’.
Subcompartment annotation
To compare the HiCAN bases to previously defined compartment A/B patterns, we downloaded published subcompartment annotations (A1, A2, B1, B2 and B3) in GM12878 (3). The given hg19 subcompartment annotation was lifted over to hg38 with the ‘UCSC liftover’, and directly compared with HiCAN bases.
Super-enhancer analysis
Public histone ChIP-seq data
Public histone ChIP-seq data used in this research were obtained from ENCODE in replicated narrow peak format. The peak coverages in 500 kb genomic bins were calculated using bedtools (30).
Super-enhancer annotations
Coordinates, cell-type specificity (common, intermediate, and specific), and H3K27ac levels of super-enhancers were obtained from the previous research (25). The given hg19 coordinates were lifted over to hg38 using ‘UCSC liftover’. To categorize cell-type specificity of super-enhancers more strictly, we narrowed down the common super-enhancer list by selecting super-enhancers called in at least 10 cell types from the original list. The remaining super-enhancers of the original list were re-assigned as intermediate.
Super-enhancer call
Super-enhancer information in HeLa cells was obtained from HeLa siRNA control H3K27ac ChIP-seq using the ROSE algorithm with default parameters (42). The super-enhancers that overlapped with the common super-enhancer domains were defined as HeLa CSEs.
CSE protein motif analysis
We applied HOMER (43) to select protein motifs commonly enriched in CSE H3K27ac peaks over the genomic background with the command ‘findMotifsGenome.pl -size 500’ in each cell line. The enriched motif candidates were filtered by the JASPAR CORE vertebrate non-redundant list (44). We further filtered out protein motifs whose gene is not expressed (FPKM = 0) in at least one of the four cell types. The filtered motifs obtained from all 4 cell lines were selected as ‘pre-candidate’ (n = 76). For the pre-candidate motifs, we calculated motif enrichment scores in terms of the number of motifs in CSE H3K27ac peaks, SSE H3K27ac peaks, and the other H3K27ac peaks via FIMO with the command ‘fimo --bfile --uniform--’. K-means clustering (K = 10) was performed on motif enrichment scores of CSEs and SSEs over typical H3K27ac peaks from the 4 cell lines. We manually reordered the clusters and identified CSE-specific motifs (n = 25).
Protein sequence analysis
The disordered scores of the candidate proteins were estimated by the PONDR webserver using VSL2 (45). We counted the numbers of different amino acid types in the proteins, which were then normalized by the total number of residues in the disordered regions. The structures of the proteins were further predicted by AlphaFold (46).
Validation of the probabilistic establishment model
Model-predicted inter-chromosomal contact map
To test our model, which proposes that the formation of nuclear body-associated inter-chromosomal interactions is fully dependent on nuclear body-to-chromatin contact probabilities, we first calculated empirical speckle association (distance < 300 nm) probabilities of the 40 imaged loci from the DNA Oligopaint FISH results. We then constructed the model-predicted speckle-associated inter-chromosomal contact map among the imaged loci by calculating co-contact probabilities to the same speckle based on the obtained probabilities. Specifically, the predicted map was calculated as the outer product of the empirical speckle association vector with itself (that is, the column vector multiplied by its transpose).
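The model-predicted map amounts to a small calculation, sketched here on toy data. The distances and function names are hypothetical; the 300 nm cutoff and the product of per-locus association probabilities follow the description above.

```python
def association_probability(distances_nm, cutoff_nm=300.0):
    """Fraction of observed dots of one locus lying within the cutoff of a speckle."""
    return sum(d < cutoff_nm for d in distances_nm) / len(distances_nm)

def predicted_contact_map(p):
    """Outer product of the association vector with itself: entry (i, j) = p[i] * p[j]."""
    return [[pi * pj for pj in p] for pi in p]

# Toy distances (nm) to the nearest speckle for three hypothetical loci.
distances = [
    [100.0, 200.0, 400.0, 500.0],  # 2 of 4 dots within 300 nm
    [50.0, 350.0],                 # 1 of 2
    [600.0, 700.0, 800.0, 250.0],  # 1 of 4
]
p = [association_probability(d) for d in distances]
predicted = predicted_contact_map(p)
```

Multiplying the two probabilities treats the speckle association of each locus as an independent event, which is the sense in which the model is probabilistic and contains no locus-specific pairing term.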
As a control for the model, we replaced the empirical speckle association rates with a gene transcription score or chromatin activity. The gene transcription score was defined as the sum of the expression (FPKM) of all protein-coding genes within each imaged locus. For the chromatin activity, we calculated the number of H3K27ac peaks within each locus per bp. The predicted map was calculated as the outer product of the vector of gene transcription scores (or H3K27ac peak densities) with itself.
We also tested a deterministic model in which we assumed that each locus has fixed contact partners. Specifically, we assumed that each imaged locus can have at most five fixed contact partners. After sorting the DNA Oligopaint FISH-measured inter-chromosomal contact values for all pairs of loci in descending order, we selected the top pair of loci and considered them fixed contact partners of each other. We then repeated this process until all possible contact partners were selected. The predicted map was calculated as the product of a vector of DNA Oligopaint FISH-measured speckle contact frequencies with only fixed contact partners and its transpose. In this model, changing the number of contact partners did not produce meaningful prediction results.
Empirically-measured inter-chromosomal contact map
To generate the empirically measured speckle-associated inter-chromosomal contact map, we counted the pairs of FISH dots from distinct readouts (ROs) that were both associated (distance < 300 nm) with the same speckle. The count for each distinct RO pair was normalized by the corresponding total possible number of pairs, calculated as the sum over all cell nuclei of the product of the numbers of dots of the two ROs. For instance, to normalize the number of RO 1 and RO 2 contact pairs, we multiplied the numbers of RO 1 and RO 2 FISH dots in each cell and summed these products over all cells to obtain the total possible number of RO 1 and RO 2 contact pairs. The empirically counted number of RO 1 and RO 2 contact pairs was then divided by this total.
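The normalization can be written out as a short sketch. The per-cell data layout (each readout mapped to a list holding, for every FISH dot, the identifier of its associated speckle or `None`) and the function name are assumptions made for illustration only.

```python
def contact_frequency(cells, ro_a, ro_b):
    """Observed same-speckle dot pairs divided by all possible dot pairs.

    `cells` is a list of dicts, one per nucleus, mapping a readout name to a
    list with one entry per FISH dot: the id of the speckle it is associated
    with (distance < 300 nm), or None if it is not associated with any.
    """
    observed = 0
    possible = 0
    for cell in cells:
        dots_a = cell.get(ro_a, [])
        dots_b = cell.get(ro_b, [])
        possible += len(dots_a) * len(dots_b)  # all RO a x RO b dot pairs in this cell
        observed += sum(
            1 for sa in dots_a for sb in dots_b if sa is not None and sa == sb
        )
    return observed / possible if possible else 0.0

cells = [
    {"RO1": [1, None], "RO2": [1, 2]},   # 1 of 4 possible pairs share a speckle
    {"RO1": [3, 3], "RO2": [3, None]},   # 2 of 4 possible pairs share a speckle
]
freq = contact_frequency(cells, "RO1", "RO2")  # 3 / 8
```

Summing the products per cell before dividing accounts for cells in which a locus yields more or fewer dots than expected.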
Quantification and statistical analysis
Image averaging analysis
Image averaging analysis was performed to determine average speckle signals surrounding a set of CSE/nCSE FISH dots. Once images (41 × 41 pixels) centered on each FISH dot were cropped, average speckle and FISH images were generated by averaging pixel values for corresponding fluorescence channels.
Spatial distance between genomic locus and speckles
To compute the spatial distance of individual FISH dots to the nearest speckles, the Euclidean distances were first computed from the center of each FISH dot to pixels corresponding to the interior of speckles. Then, the minimum distance value was chosen as the spatial distance of each locus to the nearest speckle.
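The nearest-speckle distance can be sketched as a minimum over speckle pixels. For brevity this example works in 2D with a hypothetical pixel size of 100 nm, whereas the imaging described above used z-stacks; the function name and mask layout are likewise assumptions.

```python
from math import hypot

def distance_to_nearest_speckle(dot_xy, speckle_mask, pixel_size_nm):
    """Minimum Euclidean distance (nm) from a FISH dot to any speckle pixel.

    `speckle_mask[y][x]` is True for pixels inside a speckle. Returns
    infinity when the mask contains no speckle pixel.
    """
    x0, y0 = dot_xy
    nearest_px = min(
        (hypot(x - x0, y - y0)
         for y, row in enumerate(speckle_mask)
         for x, inside in enumerate(row) if inside),
        default=float("inf"),
    )
    return nearest_px * pixel_size_nm

mask = [[False] * 7 for _ in range(9)]
mask[4][3] = True   # speckle pixel at x = 3, y = 4
mask[8][6] = True   # speckle pixel at x = 6, y = 8
d = distance_to_nearest_speckle((0.0, 0.0), mask, pixel_size_nm=100.0)  # 500.0
```

A locus would then be called speckle-associated when this distance falls below the chosen cutoff.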
Bootstrapping analysis
Bootstrapping was performed to check the stability of the comparison between the model-predicted contact map and both the Hi-C contact map and the DNA Oligopaint FISH-measured contact map. The bootstrapped speckle association rates of the imaged loci were calculated by random sampling with replacement from the distances of the imaged dots to the nearest speckle. A model-predicted map was constructed from the bootstrapped speckle association rates as described before, and Pearson correlation coefficients between this map and the Hi-C contact map and the DNA Oligopaint FISH-measured contact map were calculated. We repeated these steps 10,000 times to obtain the bootstrapped distribution of Pearson correlation coefficients.
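The resampling loop can be sketched as below. The function names, the `statistic` callback, and the toy distances are assumptions for illustration. In the analysis described above the statistic would be the Pearson correlation between the model-predicted map and a measured map; here a simpler statistic (the predicted contact value for one pair of loci) keeps the example short.

```python
import random

def association_rate(distances_nm, cutoff_nm=300.0):
    """Fraction of dots closer than the cut-off to the nearest speckle."""
    return sum(d < cutoff_nm for d in distances_nm) / len(distances_nm)

def bootstrap(distances_by_locus, statistic, n_boot=1000, seed=0):
    """Bootstrap distribution of `statistic` over resampled association rates."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        rates = []
        for dists in distances_by_locus:
            # Sampling with replacement, same sample size as the original data.
            resampled = rng.choices(dists, k=len(dists))
            rates.append(association_rate(resampled))
        values.append(statistic(rates))
    return values

# Toy distances (nm) for two loci; values are made up.
distances = [[100, 250, 400, 900], [50, 350, 500, 800]]
pair_contact = lambda rates: rates[0] * rates[1]  # model-predicted contact of the pair
vals = bootstrap(distances, pair_contact, n_boot=200, seed=1)
```

Fixing the seed makes the resampling reproducible, which is convenient when comparing runs with different cut-off distances.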
Statistical tests
To compare data distributions between two groups, a two-sided t-test or a two-sided Welch t-test was used, depending on the data variance, when the data followed a normal distribution. To compare data from the same genomic regions under different conditions, a two-sided paired t-test was used. Otherwise, a two-sided KS test was used. Asterisks indicate P-values (* P < 0.05, ** P < 0.01, *** P < 0.005).
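The selection rule and the asterisk notation can be written compactly as below. Both function names are hypothetical, and the ordering of the checks in `choose_test` (paired design first, then normality, then variance) is one reading of the paragraph above rather than a documented procedure.

```python
def choose_test(paired, normal, equal_variance):
    """Return the name of the two-sided test under one reading of the rule."""
    if paired:
        return "two-sided paired t-test"
    if not normal:
        return "two-sided KS test"
    return "two-sided t-test" if equal_variance else "two-sided Welch t-test"

def significance_stars(p):
    """Map a P-value to the asterisk notation (* < 0.05, ** < 0.01, *** < 0.005)."""
    if p < 0.005:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

label = significance_stars(1.6e-4)
test_name = choose_test(paired=False, normal=True, equal_variance=False)
```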
RESULTS
HiCAN captures two types of nuclear hub-associated inter-chromosomal interactions
To systematically identify genomic regions establishing inter-chromosomal interactions from Hi-C contact maps, we devised a new computational method named Hi-C inter-chromosomal contact map analysis with NMF (HiCAN) (see Materials and Methods). Reasoning from the previously reported hub-like organizations (15), HiCAN aimed to extract distinct conformations of inter-chromosomal interactions by overcoming the limited capability of Hi-C assays to detect relatively scarce inter-chromosomal interactions. HiCAN consists of three main steps: construction of an intra-chromosomal interaction-filled inter-chromosomal Hi-C contact map, projection of the contact map onto three low-dimensional bases with non-negative matrix factorization (NMF), and annotation of the S (speckle-associated)-, N (nucleolus-associated)-, and U (undefined)-bases based on gene density (Figure 1A) (see Materials and Methods). The numeric value of each entry (or, equivalently, each genomic locus) in an NMF basis indicates the degree to which the genomic locus belongs to the corresponding basis. When we decomposed Hi-C contact maps with factorization rank 3, the S- and N-basis values showed distributions highly skewed toward specific, mutually exclusive genomic regions, representing distinct modes of inter-chromosomal organization. In contrast, U-basis values followed a normal distribution across the entire genome, indicating genome-wide background signals (Supplementary Figure S1A). Importantly, the non-orthogonality and non-negativity of NMF bases enable effective deconvolution of the complex intermingled inter-chromosomal interactions into biologically interpretable low-dimensional structures. Thus, using HiCAN, we anticipated identifying three different modes of inter-chromosomal organization.
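To make the factorization step concrete, the sketch below runs a generic rank-3 NMF with Lee–Seung multiplicative updates on a toy matrix. It illustrates NMF in general, not HiCAN itself: the solver used by HiCAN is not specified in this section, and the intra-chromosomal filling and gene-density annotation steps are omitted. All function names and the toy matrix are assumptions.

```python
import random

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(m, rank=3, n_iter=200, seed=0, eps=1e-9):
    """Factorize a non-negative matrix m as W (n x rank) times H (rank x p)."""
    rng = random.Random(seed)
    n, p = len(m), len(m[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(p)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T M) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, m), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(p)]
             for k in range(rank)]
        # W <- W * (M H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(m, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

def frobenius_error(m, w, h):
    approx = matmul(w, h)
    return sum((m[i][j] - approx[i][j]) ** 2
               for i in range(len(m)) for j in range(len(m[0]))) ** 0.5

# Toy symmetric matrix: two exclusive blocks plus a uniform background.
s = [2, 2, 0, 0, 0, 0]
nuc = [0, 0, 2, 2, 0, 0]
u = [0.5] * 6
toy = [[s[i] * s[j] + nuc[i] * nuc[j] + u[i] * u[j] for j in range(6)]
       for i in range(6)]
w, h = nmf(toy, rank=3)
```

Each column of `w` plays the role of one basis: large entries mark the loci that belong most strongly to that basis, and all entries stay non-negative by construction.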
Figure 1.
HiCAN extracts multiple modes of inter-chromosomal organization from Hi-C chromatin contact maps (A) A schematic illustration of HiCAN. The normalized Hi-C inter-chromosomal contact map was deconvoluted into three basis vectors (S: speckle-associated, N: nucleolus-associated, U: undefined) at 500 kb resolution. (B) A sorted IMR90 Hi-C inter-chromosomal contact map among the top 100-ranked genomic bins in each basis. The color of the heatmap indicates normalized Hi-C contact frequencies. The bar plots below the contact map indicate the corresponding basis vectors (orange for S-basis, sky blue for N-basis, and grey for U-basis). (C, D) A scatter plot illustrating the relation between S-basis (or N-basis) values and MERFISH-measured contact frequencies with nuclear speckles (or nucleolus) (n = 941). Spearman's correlation coefficients (SCC) were shown together. (E, F) A boxplot illustrating distributions of the three bases on ‘Active hub’ (or ‘Nucleolar hub’) regions previously defined by SPRITE (n = 110 for Active hub and n = 471 for Nucleolar hub) (P-values < 2.2 × 10−16, two-sided KS test). (G) Combined DNA Oligopaint FISH and SRSF2 immunofluorescence images of IMR90 cells. The high S-basis loci, labeled as S1 and S2, and the low S-basis loci, labeled as control 1 and 2, are shown together with the nuclear speckle marker, SRSF2. Inset images highlight the identified speckle boundary (green) and the spatial locations of genomic loci (circle: high S-basis loci, triangle: low S-basis loci). The dashed lines indicate the nuclear periphery. (H) A bar plot illustrating each S-basis value of imaged loci in G. (I) A boxplot illustrating distributions of distances to the nearest speckle of the imaged dots in G (coral: high S-basis loci, grey: low S-basis loci) (P-value = 4.99 × 10−8, two-sided KS test) (50 cells; n = 98, 98, 100 and 99 for s1, s2, control1, and control2, respectively). 
For the boxplots (E, F and I), the box represents the interquartile range (IQR), and the whiskers correspond to the highest and lowest points within 1.5× IQR.
We applied HiCAN to publicly available Hi-C results from eight different cell types (IMR90, lung fibroblasts; GM12878, lymphoblastoid cells; HMEC, human mammary epithelial cells; HUVEC, human umbilical vein endothelial cells; NHEK, normal human epidermal keratinocytes; GM23248, skin fibroblasts; teloHAEC, immortalized human aortic endothelial cells; and WI38-RAF, lung fibroblasts) at 500 kb resolution. We identified two major bases (S and N) consistently extracted from every cell type examined (Supplementary Figures S1B and C). These two bases were highly reproducible between biological replicates, at different resolutions, and even at different sequencing depths (Supplementary Figures S1D−F). Notably, we confirmed that the two bases were still reproducible with only 14 million inter-chromosomal reads (Supplementary Figure S1F). Further, genomic loci strongly belonging to the same basis showed higher inter-chromosomal contacts than those in different bases, indicating compartmentalized inter-chromosomal organizations (Figure 1B and Supplementary Figure S1G). Although one major basis, the S-basis, was enriched in gene density while the other, the N-basis, was depleted, similar to compartment A/B patterns (Supplementary Figure S1H), we did not observe a simple one-to-one relationship between the two major bases in HiCAN and compartment A/B (Supplementary Figure S1I). Thus, HiCAN captures unique 3D inter-chromosomal organizations from Hi-C contact maps.
Motivated by a recent study showing chromosomal interaction hubs surrounded by nuclear speckles and nucleoli (15), we next examined whether each mode of inter-chromosomal interactions extracted by HiCAN recapitulates previously reported hub-like organizations. Strikingly, we found that the S- and N-basis values closely resembled the contact frequency of each locus with the corresponding nuclear bodies in published MERFISH data (19). A strong correlation was observed between the gene-rich S-basis (or gene-poor N-basis) values and the loci-to-speckle (or loci-to-nucleolus) contact frequency (Figures 1C and D). In contrast, the U-basis showed no meaningful relationship with either the loci-to-speckle or the loci-to-nucleolus contact frequency (Supplementary Figures S1J and K).
Not only the MERFISH imaging results but also the SPRITE results (15) strongly supported the ability of HiCAN to capture previously known hub-like chromosomal organization (see Methods, Figures 1E and F). Compared to SPRITE results, the S-basis exhibited a marked enrichment of speckle hub genomic regions, while the N-basis showed a strong association with nucleolar hub genomic regions.
Using DNA Oligopaint FISH in IMR90, we also tested whether the high S-basis regions chosen by HiCAN are closely located to the nuclear speckles in a nucleus. Indeed, we observed a close association between genomic loci with high values of S-basis and nuclear speckles, compared to random genomic loci at single-cell resolution (see Methods, Figures 1G−I). Moreover, centromere-proximal regions showed high values of gene-poor basis, consistent with previous observations that centromere-proximal regions preferentially localize around nucleoli (Supplementary Figure S1L) (47,48).
Given the tight association between nuclear bodies and gene-rich/poor bases, we denoted the gene-rich S-basis as speckle-associated chromosome organization and the gene-poor N-basis as nucleolus-associated chromosome organization. We assigned the remaining U-basis to regions devoid of inter-chromosomal interactions. Taken together, from in situ Hi-C contact maps, HiCAN can extract three types of genomic loci exhibiting distinct modes of inter-chromosomal interactions as well as their association tendency with nuclear bodies.
Cell-type invariant speckle-associated inter-chromosomal interactions
Taking advantage of HiCAN, which can be easily extended to multiple cell and tissue types with published Hi-C results, we next investigated whether the values in the S-basis and N-basis depend on specific cell types. Interestingly, our HiCAN analysis revealed that the S-basis was substantially conserved across various cell types, consistent with recent experiments using TSA-seq (49), while the N- and U-bases were much less conserved (Figures 2A and B). We next asked whether the conservation of the S-basis is a consequence of cell-type-conserved spatial genome territories or of cell-type-independent genomic elements that drive preferential association with nuclear speckles.
Figure 2.
Speckle-associated inter-chromosomal interactions are cell-type invariant. (A) A line graph illustrating the conserved level of each basis across the eight cell lines. The plot illustrates the number of a union set (y-axis) of the top n regions from the 8 cell lines (x-axis). The top n regions are ranked by S-basis (orange), N-basis (cyan), and U-basis (grey), respectively. The dashed line above is a virtual line indicating that no overlap has occurred, while the dashed line below indicates the completely overlapped regions among the eight cell lines. (B) Heatmaps illustrating the S-basis value (left), N-basis value (middle), and U-basis value (right) of the top-100 genomic regions ranked by the three basis sets from eight cell lines. (C) The line graph illustrating the S-basis values on chromosome 1 (upper) and chromosome 22 (lower) from GB176 (red) and IMR90 (grey). The vertical dashed line on each chromosome indicates a genomic coordinate of reciprocal translocation breakpoints. (D) Combined DNA Oligopaint FISH images with SRSF2 immunofluorescence images of HeLa cells (maximum projection). The high S-basis loci, labeled as S1 and 2 (top), and the low S-basis loci, labeled as control 1 and 2 (bottom), are shown together with the nuclear speckle marker, SRSF2. Images on the right highlight the identified speckle boundary (green) and the spatial locations of genomic loci (circle: high S-basis loci, triangle: low S-basis loci). The dashed lines indicate the nuclear periphery, and insets show magnified views of corresponding nuclear regions. (E) A boxplot illustrating distances to the nearest nuclear speckle of the imaged loci in HeLa cells (P-value < 2.2 × 10−16, two-sided KS test) (98 cells for S1, 2 and 71 cells for control 1, 2; n = 103, 68, 92, and 74 for s1, s2, control 1, and control 2, respectively). For the boxplots, the box represents the IQR, and the whiskers correspond to the highest and lowest points within 1.5× IQR. 
(F) A scatter plot showing a relation between speckle association rates of the imaged loci in IMR90 and HeLa cells. Pearson correlation coefficient (PCC) was 0.94.
To test the possibilities, we utilized two cancer cell lines (GB176 and HeLa) harboring derivative chromosomes and chromosome aneuploidy. If the speckle-associated inter-chromosomal interactions are mediated in a sequence-specific manner, chromosomal aberrations may not strongly influence the association of genomic regions with speckles. Using Hi-C results of the human glioblastoma (GB176) cell line where multiple reciprocal translocations were identified (39), we investigated the S-basis of genomic regions near translocation breakpoints (Supplementary Figures S2A and B). Strikingly, we found that the values of GB176 S-basis located near breakpoints were still highly preserved, with similar trends to other cell types without translocations (Figure 2C, Supplementary Figures S2B and C).
To further validate the sequence-encoded nature of S-basis values, we performed DNA Oligopaint FISH on the Henrietta Lacks (HeLa) cell line containing 70–90 chromosomes with over 20 translocations (50,51). We used the two loci with high S-basis values in IMR90 from Figures 1G–I to investigate whether such speckle-associated inter-chromosomal interactions are well maintained in HeLa cells. Again, we observed that these loci are significantly associated with nuclear speckles in HeLa cells (Figures 2D and E). Furthermore, the nuclear speckle-associated interactions between IMR90 and HeLa cells showed striking conservation (Figure 2F). Our results suggest that the speckle-associated inter-chromosomal interactions are highly conserved across different cell types and are likely established in a manner dependent on cell-type-invariant genomic elements.
CSEs are highly enriched in speckle-associated inter-chromosomal interacting regions
Next, we sought to identify chromatin marks and regulatory elements responsible for establishing speckle-associated inter-chromosomal interactions using four well-characterized cell lines (IMR90, GM12878, HMEC and HUVEC). In agreement with previous reports (15,16), active histone marks (H3K4me3 and H3K27ac) were positively correlated with S-basis values but not with N-basis values (Figure 3A and Supplementary Figure S3A). Unexpectedly, H3K27me3, which is a well-known histone mark for facultative heterochromatin, also showed slight enrichment on S-basis, whereas the constitutive heterochromatin mark H3K9me3 was generally depleted (Figure 3A). In agreement with the enrichment test, a substantial portion of subcompartment B1 also possesses high S-basis (Supplementary Figure S1I). Although speckle-associated interactions were simply considered as ‘active hub’, our observations suggest that the speckle-associated inter-chromosomal interactions also include some portion of facultative heterochromatin, which is annotated as ‘inactive hub’ in the previous study (15).
Figure 3.
CSEs are enriched in the speckle-associated inter-chromosomal interacting regions. (A) A scatter plot illustrating the enriched chromatin marks and regulatory elements in S-basis. The x-axis indicates fold-change enrichment of each feature in the top-500 S-basis regions (top 500 regions / the rest of the genome) from the four cell lines (IMR90, GM12878, HMEC and HUVEC), and the y-axis indicates the average of multiplications between S-basis value and enrichment of each feature on every 500 kb genomic bin. The colors indicate cell line information, and the shapes of the dots indicate the corresponding genome features. The red dashed circle highlights enrichments of CSEs in the four cell lines. (B) A boxplot illustrating the distribution of S-basis values of super-enhancers in the four cell lines. The colors indicate the type of super-enhancers (coral: CSEs, green: ISEs, and cyan: SSEs, respectively) (IMR90: n = 143, 290, 25, GM12878: n = 83, 546, 112, HMEC: n = 164, 861, 58, HUVEC: n = 129, 775, 82) (P-values: IMR90 = 2.4 × 10−3, GM12878 = 5.6 × 10−7, HMEC = 3.2 × 10−14, HUVEC = 2.2 × 10−9, two-sided KS test). For the boxplots, the box represents the IQR, and the whiskers correspond to the highest and lowest points within 1.5 × IQR. (C) Left: 500 kb Hi-C inter-chromosomal contact maps from the four cell lines between the S-basis enriched regions of chromosomes 8 and 17. The vertical and horizontal red line plots indicate the S-basis values of the corresponding regions. The vertical and horizontal colored lines on the contact maps indicate the presence of the CSEs. Right: Genome tracks of published H3K27ac ChIP-seq (–log P-values) from the four cell lines. The highlighting colors indicate the corresponding genomic location from the left Hi-C contact maps. The black lines below each track indicate the genome coordinates of CSEs.
Next, we examined the enrichment of super-enhancers, considering their critical roles in genomic looping and phase separation (52,53). Notably, it has also been suggested that super-enhancers are mainly involved in cohesin-independent inter-chromosomal interactions, which might be directly related to the speckle-associated inter-chromosomal interactions (54). Although super-enhancers are widely known as key controllers of cell-type-specific genes, it was previously reported that a substantial portion of super-enhancers, known as common super-enhancers (CSEs), are constitutively active across multiple cell types and are associated with rapid chromatin loop recovery (25,55). Given the cell-type-invariant property of the S-basis, we hypothesized that CSEs may have a crucial function in speckle-associated inter-chromosomal interactions. In agreement with our hypothesis, we discovered that CSEs were over-represented at top-ranked genomic regions in the S-basis in all four cell lines examined, whereas cell-type-intermediate super-enhancers (ISEs) or cell-type-specific super-enhancers (SSEs) were weakly enriched or even depleted (Figures 3A−C). The S-basis value decreases rapidly with genomic distance from the CSEs, indicating that CSEs may anchor speckle-associated inter-chromosomal interactions of the nearby genome (Supplementary Figure S3B). In addition, we observed that the high enrichment of CSEs is independent of the H3K27ac level (Supplementary Figure S3C). Thus, the association of CSEs with the regions of high S-basis values is not a consequence of the strength of super-enhancer activities. When examined at a higher resolution (100 kb), we consistently obtained a strong association of CSEs with the S-basis (Supplementary Figure S3D). Furthermore, raw inter-chromosomal Hi-C reads were also enriched on the CSE regions compared to active genes (Supplementary Figures S3E and F).
In sum, CSE itself is a key genomic element responsible for speckle-associated inter-chromosomal interactions.
DNA Oligopaint FISH validates CSE-mediated cell-type invariant speckle-associated inter-chromosomal interactions
To validate the association of CSEs with nuclear speckles at single-cell resolution, we performed DNA Oligopaint FISH targeting 40 different loci. We included probes targeting 20 CSEs, 10 nCSEs (super-enhancers except for CSEs), two active and inactive genes close to the CSEs, two active genes far from the CSEs, and four high N-basis regions (Figure 4A). Then, we performed combined immunofluorescence and DNA Oligopaint FISH to acquire locations of nuclear speckles and genomic loci in individual IMR90 cells (Figure 4B). Consistently, we observed a significantly high association of CSEs to nuclear speckles compared to nCSEs (Figures 4B and C). Moreover, the speckle contact frequencies of each targeted locus, obtained from DNA Oligopaint FISH, showed a linear relationship with the strength of inter-chromosomal interactions defined by the S-basis (Figure 4D). Similar results were obtained from combined immunofluorescence and DNA Oligopaint FISH on HUVEC, re-confirming the cell-type-independent association of CSEs with speckles (Figures 4E, F and Supplementary Figure S4A). We note that the speckle association rates were very similar between CSE and CSE-proximal gene pairs (Figures 4D and F).
Figure 4.
DNA oligopaint FISH confirms CSE—speckle associations. (A) A schematic illustrating 40 loci targeted with DNA Oligopaint FISH. Genome tracks of published H3K27ac ChIP-seq (–log P-values) at the DNA oligopaint FISH-targeted regions are shown. Among 40 regions, ChIP-seq data around two representative CSEs, two non-CSEs (nCSEs), and one high N-basis region in IMR90 are shown. The two active genes (VPS28, FDPS) and two inactive genes (SCRT1, SEMA4A) proximal to the CSEs, and the two active genes (CTHRC1 and SARS) distal from the CSEs are targeted. The numeric values within parentheses indicate the expression value (FPKM) of the corresponding gene. (B) Combined DNA Oligopaint FISH and SRSF2 immunofluorescence images of IMR90 cells (maximum projection). (Left) All 40 loci FISH images are merged into a single image together with immunofluorescence images of the nuclear speckle marker, SRSF2. (Right) The identified speckle boundary (green) and the spatial locations of the genomic loci under scrutiny are shown. Inset images highlight the co-localization event of multiple loci on identical speckles. The nuclear peripheries are indicated with dashed lines. (C) A boxplot illustrating the distribution of the speckle association (distance to the closest nuclear speckle < 300 nm) rates (y-axis) of 20 CSEs (n = 7096) and 10 nCSEs (n = 3655) in IMR90 measured by DNA Oligopaint FISH (P-value = 1.6 × 10−4, two-sided t-test) (215 cells). (D) A scatter plot illustrating a relation between the S-basis value and log2 speckle association rate of each imaged locus in IMR90. The color indicates group information of the loci. The dashed line is a linear regression line of the dots. Pearson correlation coefficient (PCC) was 0.72. 
(E) A boxplot illustrating the distribution of the speckle association (distance to the closest nuclear speckle < 300 nm) rates (y-axis) of 20 CSEs (n = 6986) and 10 nCSEs (n = 3632) in HUVEC measured by DNA Oligopaint FISH (P-value = 2.2 × 10−4, two-sided t-test) (211 cells). For the boxplots, the box represents the IQR, and the whiskers correspond to the highest and lowest points within 1.5× IQR. (F) A scatter plot illustrating a relation between the S-basis value and log2 speckle association rate of each imaged locus in HUVEC. The color indicates group information of the loci. The dashed line is a linear regression line of the dots. Pearson correlation coefficient (PCC) was 0.81. (G) Averaged image of speckles (green) and FISH signals (red) around a set of FISH dots (N = 7096 and 3655 for CSE and nCSE, respectively). The FISH intensities are normalized with peak values. (H) The cumulative distribution of the distances of the imaged loci to the nearest speckles (CSE: n = 7096, CSE-prox: n = 1340, nCSE: n = 3655, CSE-distal: n = 679, and N-basis: n = 1381). The colors indicate group information of each locus. The shaded lines indicate the cumulative distribution of each locus, and the bold dashed lines indicate averaged cumulative distribution within the same group (P-value < 2.2 × 10−16, two-sided KS test). The vertical dashed line indicates a cut-off distance used to define speckle association (300 nm). (I) A barplot showing Pearson correlation coefficient value between speckle association rates of the imaged genes and genomic distances to the closest CSE (bp) (coral) or its expression value (FPKM) (grey). (J) A scatter plot illustrating the relation between the speckle association rates of the imaged loci measured by DNA Oligopaint FISH in IMR90 (x-axis) and HUVEC (y-axis). The color indicates group information of the loci. The dashed line is a linear regression line of the dots. PCC was 0.89.
Because measured speckle contact frequencies depend on the threshold distance used to define physical contact, we also performed image averaging analysis for each type of locus (see Methods). Image averaging analysis in both IMR90 and HUVEC revealed that nuclear speckle signals were highly enriched around the CSE loci, but this tendency was not observed for the nCSE loci (Figure 4G and Supplementary Figure S4B). Radial distribution profiles further illustrated that the peak of nuclear speckle signals coincided with the center of averaged CSE loci but not with that of nCSE loci (Supplementary Figure S4C). In addition, the cumulative distributions of distances to the nearest speckle showed that CSE loci were generally positioned closer to nuclear speckles than nCSE loci or active gene loci located far from CSEs in both IMR90 and HUVEC (Figure 4H and Supplementary Figure S4D).
We next investigated whether the expression level of the gene itself can impact the speckle association of the locus. For six genes examined in our DNA Oligopaint FISH, a strong correlation was observed between their genomic distances from respective CSEs and speckle association frequencies while the expression levels showed no correlation with speckle associations (Figure 4I). Lastly, we probed whether high N-basis values could signify a strong association of the genomic region with nucleoli, similar to the way S-basis values do for speckles. Combining immunofluorescence and DNA Oligopaint FISH, we indeed observed that the high N-basis regions were significantly associated with the nucleolus compared to the CSEs (Supplementary Figures S4E and F), confirming that HiCAN well predicts both speckle- and nucleolus-associated interactions.
Moreover, the speckle contact frequencies of each targeted locus were strongly correlated between the two cell types, indicating that even the association rates of individual CSE loci with nuclear speckles were highly conserved (Figure 4J and Supplementary Figure S4G).
Association probabilities between each CSE and speckles determine inter-chromosomal contact frequencies
Although DNA Oligopaint FISH revealed a strong association between CSEs and nuclear speckles, we rarely observed multi-contact events in which several CSE loci co-localized to the same speckle, in contrast to the Hi-C and SPRITE results, which suggested hub-like chromatin interactions (Figure 1B) (15). Furthermore, we observed that the specific interacting pairs co-localizing to the same speckle vary from one cell to another (Figure 4B). To describe how such stochastic association behaviors between CSEs and speckles can explain the hub-like inter-chromosomal organization, we propose a probabilistic establishment model that describes the formation of inter-chromosomal interactions involving speckles and CSEs.
Our model is derived from a simple mathematical interpretation of HiCAN’s matrix decomposition procedure. For a given Hi-C contact matrix M (n × n), HiCAN computes the NMF with k = 3, which yields matrices W (n × 3) and H (3 × n) such that M ≈ WH. The columns of W correspond to the S-basis, N-basis, and U-basis. Because the Hi-C contact matrix is symmetric, H equals W^T, the transpose of W. We have shown that the S-basis value of each genomic locus i, S_i, essentially represents the contact probability p_i of that locus with nuclear speckles, so the probability that two different loci i and j simultaneously contact the same nuclear speckle can be taken as proportional to the product p_i p_j. Thus, our model suggests that the association rates of individual loci with speckles can predict the speckle-associated inter-chromosomal contact frequencies measured in Hi-C. Notably, individual mammalian cells typically harbor tens of nuclear speckles in the nucleus, which would decrease the probability of multi-contact events with simultaneous co-localization of several CSE loci in a single speckle. Moreover, physical associations between genomic loci and speckles are highly dynamic, further contributing to the low number of multi-contacts observed using DNA Oligopaint FISH. In contrast, Hi-C assays conducted on a population of cells would exhibit hub-like profiles from the accumulation of individual inter-chromosomal contacts (Figure 5A).
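The intuition behind the model can be illustrated with a small Monte Carlo simulation. It rests on assumptions that go beyond the text: each locus contacts a speckle independently with its own probability, and the speckle it lands on is drawn uniformly from a fixed number of speckles per nucleus. The function name, the probabilities, and the speckle count are all illustrative.

```python
import random
from itertools import combinations

def simulate_colocalization(p, n_speckles=20, n_cells=20000, seed=0):
    """Population-level frequency with which two loci share a speckle."""
    rng = random.Random(seed)
    n = len(p)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_cells):
        # Each locus independently contacts a speckle with probability p[i];
        # the speckle identity is drawn uniformly (illustrative assumption).
        speckle = [rng.randrange(n_speckles) if rng.random() < p[i] else None
                   for i in range(n)]
        for i, j in combinations(range(n), 2):
            if speckle[i] is not None and speckle[i] == speckle[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return [[c / n_cells for c in row] for row in counts]

p = [0.8, 0.6, 0.2]  # made-up speckle contact probabilities
freq = simulate_colocalization(p, n_speckles=5, n_cells=20000, seed=1)
# Under these assumptions the expected frequency is p[i] * p[j] / n_speckles.
expected = [[p[i] * p[j] / 5 for j in range(3)] for i in range(3)]
```

In any single simulated nucleus a given pair rarely shares a speckle, yet the frequencies accumulated over many nuclei scale with the product of the two contact probabilities, which is the population-level pattern the model attributes to Hi-C.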
Figure 5.
The probabilistic establishment model of speckle-associated inter-chromosomal interactions (A) A schematic illustration of the probabilistic establishment model of speckle-associated inter-chromosomal interactions driven by each genomic region's speckle contacting probability. Simplified examples are shown, illustrating CSE—speckle associations with various contact probabilities at the single-cell level (left), in four cells (middle), and at the bulk level (right). (B) A model-predicted speckle-associated inter-chromosomal contact matrix for the DNA Oligopaint FISH-targeted genomic regions. The predicted matrix is generated by multiplication between ‘S’ (a vector of DNA Oligopaint FISH-measured speckle contact frequency of the loci in IMR90) and ST (a transpose of ‘S’). (C) An empirically measured speckle-associated inter-chromosomal contact matrix among the DNA Oligopaint FISH-targeted genomic regions (215 cells). The matrix values indicate the number of locus pairs located at the same speckle (< 300 nm), measured from the DNA Oligopaint FISH results of IMR90. (D) An IMR90 Hi-C contact map of 500 kb genomic regions containing the DNA Oligopaint FISH-targeted genomic regions. For B, C and D, the intra-interaction bins are marked as white. The black dashed boxes indicate speckle-associated inter-chromosomal interactions among CSEs and CSE-proximal genes, showing a hub-like pattern. (E) A scatter plot illustrating the relation between the DNA Oligopaint FISH-measured (y-axis) and model-predicted (x-axis) speckle-associated inter-chromosomal interactions. The dashed black line is a trend line of the dots. Pearson correlation coefficient (PCC) was 0.64. (F) A scatter plot illustrating the relation between the Hi-C (y-axis) and model-predicted (x-axis) speckle-associated inter-chromosomal interactions. The dashed black line is a trend line of the dots. Pearson correlation coefficient (PCC) was 0.64. 
(G) Violin plots illustrating bootstrapped PCC values (n = 10 000) of the model-predicted interactions against the DNA Oligopaint FISH-measured interactions (red) and Hi-C interactions (blue) (see Methods). For the boxplots, the box represents the IQR, and the whiskers correspond to the highest and lowest points within 1.5× IQR.
To assess the capability of our proposed model, we first computed co-localization probabilities of each pair of loci to speckles using speckle association rates of individual loci obtained in the DNA Oligopaint FISH assay. After that, we constructed a predicted inter-chromosomal contact map based on our probabilistic establishment model and measured speckle association rates (see Methods, Figure 5B). We then compared the predicted matrix to two experimentally measured contact maps: a Hi-C inter-chromosomal contact map and a DNA Oligopaint FISH-derived one where the number of co-localizing events at the same speckle was counted for each pair of loci (Figures 5C and D). Surprisingly, the predicted contact map exhibited a striking resemblance to both maps (Figures 5E−G). We consistently obtained such a strong resemblance using various distance thresholds and a different cell line, supporting that our model describes the formation of speckle-associated inter-chromosomal interactions with high accuracy (Supplementary Figures S5A−D). In contrast, when we utilized gene transcription scores as a sum of FPKM values at 30 kb resolution or chromatin activities using H3K27ac peaks, we could not obtain such accurate prediction results (Supplementary Figures S5E–G). We also tested the performance of a deterministic model in which we assumed fixed contact partners for each locus (see Methods). The deterministic model poorly resembled the hub-like structure of the Hi-C contact map (Supplementary Figures S5H–J). These results suggest that speckle-associated inter-chromosomal interactions are probabilistically shaped through independent associations of individual loci with nuclear speckles, leading to dynamic interactions within identical speckle compartments. Furthermore, our results imply that the hub-like structure observed in SPRITE and Hi-C experiments may reflect the likelihood of different loci simultaneously contacting the same speckle.
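The comparison between predicted and measured maps can be sketched as an outer product followed by a Pearson correlation over locus pairs. The association rates and the "measured" matrix below are invented toy numbers, the function names are hypothetical, and taking the upper triangle is a simplification of the masking of intra-chromosomal bins described in the figure legend.

```python
import math

def predicted_map(rates):
    """Outer product of per-locus speckle association rates."""
    return [[ri * rj for rj in rates] for ri in rates]

def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, one value per locus pair."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy inputs, made up for illustration only.
rates = [0.6, 0.5, 0.3, 0.1]
measured = [[0.00, 0.31, 0.17, 0.07],
            [0.31, 0.00, 0.16, 0.04],
            [0.17, 0.16, 0.00, 0.02],
            [0.07, 0.04, 0.02, 0.00]]
pcc = pearson(upper_triangle(predicted_map(rates)), upper_triangle(measured))
```

The same `pearson` call would be repeated with a Hi-C-derived matrix in place of `measured` to obtain the second comparison.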
Depletion of MAZ reorganizes speckle-associated inter-chromosomal interactions
Our probabilistic model described both the hub-like chromatin organization at the cell-population level and the stochastic behavior at the single-cell level well. However, it remained unclear which factor determines speckle-CSE associations. To dissect the underlying mechanism, we hypothesized that protein binding distinct from that at SSEs might underlie the speckle association of CSEs. To identify CSE-specific candidate binding proteins, we conducted motif analysis at CSEs in comparison with SSEs and typical enhancers (see Methods). We found that 25 protein-binding motifs were highly enriched at CSEs but depleted at SSEs (Figure 6A). Notably, these included multiple zinc finger proteins that play a role in transcription regulation and splicing. We further narrowed down the candidates to six proteins (MAZ, KLF5, ZNF740, ZNF263, ZNF148 and KLF4) based on the correlation between each motif enrichment at CSEs and the corresponding values in S-basis (see Methods, Supplementary Figure S6A). Interestingly, these DNA-binding proteins contain significant fractions of structurally disordered regions, whose amino acid compositions are biased toward charged (R, K, D, E) and pi (π) residues (F, Y) (Supplementary Figures S6B–I), which are known to drive liquid–liquid phase separation in RNA-binding proteins, such as FUS and hnRNPA1 (56–58).
Figure 6.
Depletion of MAZ reorganizes speckle-associated inter-chromosomal interactions. (A) The k-means (k = 10) clustered heatmap illustrating the enrichments of the selected protein motifs (see Materials and Methods) on super-enhancer regions (up: CSEs, down: SSEs) from the four cell lines. The colors indicate log2 fold changes of motif enrichment (super-enhancer/typical enhancer). The six final candidate proteins are shown below. (B) Combined DNA Oligopaint FISH and SRSF2 immunofluorescence images of control and MAZ KD HeLa cells (maximum projection). (Left) All CSE and nCSE FISH images are merged into a single image together with immunofluorescence images of the nuclear speckle marker, SRSF2. (Right) The identified speckle boundary (green) and the spatial locations of genomic loci (circle: CSE, triangle: nCSE) are shown. The color index for each locus is shown in the image. The nuclear peripheries are indicated with dashed lines. (C) Log2 radially averaged SRSF2 intensity (4 CSEs/4 nCSEs) as a function of the distance from the loci center in MAZ KD (138 cells; red; n = 1486 for CSE, n = 1432 for nCSE) and control (133 cells; grey; n = 1499 for CSE, n = 1397 for nCSE) HeLa cells (P-value < 2.2 × 10⁻¹⁶, two-sided KS test). (D) A heatmap showing relative S-basis values for the regions with significantly altered (FDR < 0.05) S-basis upon MAZ depletion. The color bar on the right side of the heatmap illustrates the original S-basis values in control HeLa cells. (E) (Left) An aggregated heatmap illustrating normalized Hi-C inter-chromosomal contact frequencies within 5 Mb upstream and downstream of all pairs of genomic regions between the HeLa top 500 S-basis regions and the regions whose S-basis decreased upon MAZ depletion. (Right) Similar to the left, between the mESC top 500 S-basis regions and the regions whose S-basis decreased upon MAZ knockout. For the heatmaps, the average values are plotted.
(F) Boxplots illustrating log2 (MAZ KD/control) HeLa S-basis values of CSEs (red) (n = 66), nCSEs (orange) (n = 117), and bottom-500 S-basis (grey) (n = 500) regions (P-values: CSE = 5.5 × 10⁻⁴, nCSE = 0.011, and bottom < 2.2 × 10⁻¹⁶, two-sided paired t-test between MAZ KD and control S-basis values). (G) Violin plots illustrating the corresponding log2 (HeLa MAZ KD S-basis/control S-basis) values of downregulated (blue, n = 726) and upregulated (red, n = 980) DEGs. The asterisks indicate the P-values for the S-basis difference between the HeLa MAZ KD and control in the corresponding genomic regions (P-values: down = 2.04 × 10⁻¹⁵, up = 3.60 × 10⁻⁵, two-sided paired t-test). The box represents the IQR, and the whiskers correspond to the highest and lowest points within 1.5× IQR. (H) A barplot showing –log10(P-value) of the top 10 GO biological process (BP), cellular component (CC) and KEGG pathway terms of the downregulated DEGs.
To test the function of the top six CSE-specific binding candidates in CSE-speckle associations, we performed a 96 h knockdown experiment with RNA interference in HeLa cells. The knockdown efficiency was confirmed by qPCR and western blot (Supplementary Table S6 and Supplementary Figure S7A). We then investigated the speckle association of four CSE and four nCSE loci with 8-round DNA Oligopaint FISH. Surprisingly, among the six target TFs, we observed that depletion of MAZ abolished the difference in speckle association between CSEs and nCSEs (Figures 6B, C and Supplementary Figure S7B). Specifically, upon MAZ depletion, genomic regions with high S-basis values showed decreased speckle association, whereas those with low S-basis values showed increased speckle association (Supplementary Figure S7C). In addition, MAZ ChIP-seq results from the IMR90 and GM12878 cell lines demonstrated that MAZ binds CSEs significantly more than the other types of SEs (Supplementary Figures S7D and E). Considering both the disorganization effect upon MAZ depletion and its strong binding pattern on CSEs, we hypothesized that MAZ may be crucial for establishing speckle-CSE associations.
To systematically investigate the global effect of MAZ depletion, we performed in situ Hi-C experiments on MAZ-depleted HeLa cells and applied HiCAN (Supplementary Figure S7F). We observed that about 40% of genomic regions underwent significant S-basis alteration after the depletion of MAZ (Figure 6D and Supplementary Figure S7G) (see Methods). Changes in speckle-associated inter-chromosomal interactions were also highly visible on the Hi-C contact maps (Figure 6E). Consistent with the 8-round DNA Oligopaint FISH result, high S-basis regions tended to lose speckle-associated inter-chromosomal interactions after MAZ depletion, whereas low S-basis regions tended to gain them (Figure 6D and Supplementary Figure S7H). In contrast, we did not observe dramatic changes in intra-chromosomal contacts in terms of directionality index (DI) scores, consistent with previous reports (59,60), nor in the N-basis (Supplementary Figures S7I and J). Notably, the effect of MAZ depletion was reproducible in published mESC MAZ knockout (KO) Hi-C data (60), indicating that MAZ is an important factor for shaping speckle-associated inter-chromosomal interactions in both human and mouse (Figure 6E, Supplementary Figures S7K and L).
Although we observed significant alteration of speckle association in many genomic regions, not all genomic regions were affected by MAZ depletion. Given the strong enrichment of MAZ binding on CSEs, we hypothesized that the effect of MAZ on speckle-associated inter-chromosomal interactions might be directly related to CSEs (Supplementary Figures S7D and E). To test this hypothesis, we performed H3K27ac ChIP-seq on the MAZ-depleted HeLa cells and defined 25,022 peaks (Supplementary Figure S8A). We also obtained 80 CSEs among the total of 218 super-enhancers in control HeLa cells. Notably, only about 0.1% of the peaks were significantly altered after the depletion of MAZ, indicating that the effect of MAZ depletion on inter-chromosomal interactions is independent of super-enhancer activity (Supplementary Figures S8B and C). Interestingly, we observed that the S-basis values of CSEs decreased more significantly upon MAZ depletion than those of nCSEs (Figure 6F). Overall, our results strongly support that MAZ depletion specifically disrupts speckle-associated inter-chromosomal interactions mediated by CSEs.
Finally, to investigate the functional impact of the disorganization of inter-chromosomal interactions, we performed RNA-seq on the MAZ-depleted HeLa cells (Supplementary Figure S8D). We defined 1,018 upregulated and 755 downregulated differentially expressed genes (DEGs) upon MAZ depletion (Supplementary Figure S8E, Supplementary Tables S8 and S9). Upregulated DEGs tended to be located in genomic regions with increased S-basis values, whereas downregulated DEGs tended to be located in regions with decreased S-basis values (Figure 6G). Gene ontology (GO) analysis revealed that downregulated DEGs were greatly enriched in the ‘cytosolic ribosome’ term, which includes genes encoding cytosolic ribosomal subunit proteins (Figure 6H). This result suggests that speckle-associated inter-chromosomal interactions might play an important role in the steady expression of ribosomal proteins. Although we cannot exclude the possibility that the DEGs are directly affected by MAZ depletion rather than by the structural alteration, the results indicate that speckle-associated inter-chromosomal interactions are tightly related to the expression of a certain functional group of genes.
DISCUSSION
As our understanding of chromatin structures in the nucleus has expanded, it is increasingly evident that chromatin is not just randomly distributed in the nucleus but rather hierarchically organized to precisely coordinate diverse genome functions. Importantly, recent studies have revealed that, in addition to cis-chromosomal interactions, numerous high-order inter-chromosomal interactions are present in the nucleus, gathered around various nuclear bodies (13–15). However, the presence of cell-type-independent inter-chromosomal interactions and the detailed molecular mechanisms driving these processes remain to be elucidated.
In this study, we devised a new computational method, HiCAN, to extract inter-chromosomal interactions from in situ Hi-C data. Our approach provides a unique advantage over recently developed experimental techniques for probing inter-chromosomal interactions, in that HiCAN works on in situ Hi-C data, where large public datasets are available for application. We were able to apply HiCAN to multiple human cell types and found the presence of cell-type invariant speckle-associated inter-chromosomal interactions. Furthermore, we provided multiple lines of evidence highlighting the key role of CSEs as a putative driver for speckle-to-chromatin associations. Finally, we proposed MAZ as a new structural protein regulating CSE-speckle mediated inter-chromosomal interactions.
Nuclear speckles, also called interchromatin granule clusters, have long been observed to form in the nuclear space between chromosomes (61). Emerging evidence suggests that nuclear speckles play an important role in genome organization through a preferential association with certain genomic regions. Recent studies using TSA-seq showed that genomic regions such as super-enhancers, housekeeping genes and heat-shock-induced genes are deterministically positioned near speckles (49). Our data show that speckle association rates can vary appreciably depending on the type of super-enhancer: CSEs exhibit much stronger speckle associations than SSEs. Notably, this tight association of CSEs is independent of enhancer strength, as evidenced by H3K27ac levels. Our motif analysis hinted that the recruitment of specific proteins to CSEs may facilitate an association between genomic elements and nuclear speckles. We therefore focused on MAZ as a key regulator of speckle-associated inter-chromosomal contacts. Indeed, we observed significant alteration of inter-chromosomal organization upon depletion of MAZ.
MAZ has recently been highlighted as a cofactor of CTCF insulation that contributes to TAD integrity and loop interactions. However, previous work pointed out that the depletion of MAZ did not have large-scale effects on intra-chromosomal genome organization (59,60). Recent in silico screening results also proposed MAZ as a new structural protein with a function independent of CTCF (62). Consistently, our results strongly support the structural role of MAZ, but uniquely propose that MAZ may exert its function on inter-chromosomal organization rather than intra-chromosomal organization. Our results may help explain how the genome can maintain its speckle associations across multiple cell types, even in the presence of highly aberrant chromosomal rearrangements in cancer genomes. The association between speckles and MAZ-occupied CSEs may act as knots on the chromatin fiber to preserve the higher-order chromatin organization, despite the dynamic reorganization of intra-chromosomal interactions.
Previous studies suggested that higher-order inter-chromosomal interactions form hub-like structures, simultaneously contacting each other in a nucleus (15). Similarly, we also observed hub-like interaction patterns in in situ Hi-C data. However, our DNA Oligopaint FISH assay did not reveal multi-contact events involving more than two genomic loci. We proposed a probabilistic establishment model to help reconcile this seemingly contradictory result. Our model connects inter-chromosomal interactions captured by bulk Hi-C data to single-cell level probabilistic associations of genomic regions with speckles by mathematically interpreting the NMF formulation. The model suggests that the observed hub-like structures are the accumulated effect of summing up individual, stochastic chromatin–speckle interactions. Thus, our model bridges the gap between imaging and bulk sequencing data and describes a basic principle of nuclear body-associated inter-chromosomal interaction formation. Importantly, although the model prediction from the assumption of purely independent speckle associations is in excellent agreement with experiments, we cannot rule out that cooperativity between other factors may play a role in inter-chromosomal genome organization.
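The idea that a population-level hub can emerge from independent single-cell events can be illustrated with a toy simulation. This is a sketch under simplifying assumptions that are not taken from the study: each locus associates with a speckle independently in each cell with a fixed rate, an associated locus picks one of `n_speckles` speckles uniformly at random, and a contact is counted when two loci sit at the same speckle in the same cell. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate_population_map(rates, n_cells=20_000, n_speckles=1, seed=0):
    """Average pairwise same-speckle co-localization over simulated single cells."""
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    n_loci = rates.size
    # associated[c, i] is True when locus i touches a speckle in cell c.
    associated = rng.random((n_cells, n_loci)) < rates
    # Each associated locus is assigned one speckle; -1 marks "no speckle".
    labels = rng.integers(0, n_speckles, size=(n_cells, n_loci))
    labels = np.where(associated, labels, -1)
    counts = np.zeros((n_loci, n_loci))
    for s in range(n_speckles):
        at_s = (labels == s).astype(float)  # cells x loci indicator for speckle s
        counts += at_s.T @ at_s             # pairs sharing speckle s, summed over cells
    np.fill_diagonal(counts, 0.0)
    return counts / n_cells

# Toy rates: a few loci associate often, others rarely.
toy_rates = np.array([0.6, 0.4, 0.2, 0.05])
population_map = simulate_population_map(toy_rates)
expected_map = np.outer(toy_rates, toy_rates)
np.fill_diagonal(expected_map, 0.0)
print(np.round(population_map, 3))
print(np.round(expected_map, 3))
```

Under these assumptions the averaged map approaches the outer product of the rates divided by `n_speckles`, so loci with high association rates appear as a dense block at the population level even though every simulated contact arises from independent single-cell events.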
In conclusion, we devised a new computational method, HiCAN, to reliably extract inter-chromosomal interactions from Hi-C contact maps with datasets widely available across many cell types. Our results showed that a group of genomic regions enriched with CSEs strongly participate in speckle-associated inter-chromosomal contacts in a conserved manner across cell types. These data indicate the critical role of genome–condensate interactions in establishing the higher-order organization of chromosomes. Combining DNA Oligopaint FISH and Hi-C results, we propose the probabilistic establishment of speckle-associated inter-chromosomal interactions, whereby stochastic associations of individual genomic loci with speckles provide a localized environment for further inter-chromosomal interactions. Finally, we find that MAZ is highly co-occupied with CSEs and verify that MAZ depletion disorganizes speckle-associated inter-chromosomal interactions. Our results highlight that MAZ is a new structural protein and that CSEs may serve as a binding platform for MAZ. Elucidating the functional consequences of these interactions and the way cells regulate these processes remains an exciting and key future challenge for understanding the mechanisms of inter-chromosomal interaction establishment.
DATA AVAILABILITY
The genome sequences used in this research were derived from the HeLa cell line. We deeply appreciate Henrietta Lacks and her family members for their huge contributions to genome research. The HeLa sequencing data generated in this study (in situ Hi-C, RNA-seq, and H3K27ac ChIP-seq on MAZ-depleted and control HeLa cells) are available from the corresponding author on request.
A preliminary version of HiCAN is available online at https://github.com/kaistcbfg/HiCAN.
All public datasets used in the research are listed in Supplementary Table S10.
ACKNOWLEDGEMENTS
We thank all members of the laboratories for support and critical suggestions throughout the course of this work.
Contributor Information
Jaegeon Joo, Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
Sunghyun Cho, Department of Mechanical Engineering, Seoul National University, Seoul 08826, Republic of Korea.
Sukbum Hong, Department of Mechanical Engineering, Seoul National University, Seoul 08826, Republic of Korea.
Sunwoo Min, Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
Kyukwang Kim, Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
Rajeev Kumar, Department of Chemistry and Chemistry Institute for Functional Materials, Pusan National University, Busan 46241, Republic of Korea.
Jeong-Mo Choi, Department of Chemistry and Chemistry Institute for Functional Materials, Pusan National University, Busan 46241, Republic of Korea.
Yongdae Shin, Department of Mechanical Engineering, Seoul National University, Seoul 08826, Republic of Korea; Interdisciplinary Program in Bioengineering, Seoul National University, Seoul 08826, Republic of Korea.
Inkyung Jung, Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
Ministry of Science and ICT through the National Research Foundation in Republic of Korea [2020R1A2C400146413 to I.J., 2022R1A5A1026413 to I.J., 2019R1C1C1006477 to Y.S.]; SUHF Fellowship [to I.J.]; Samsung Science and Technology Foundation [SSTF-BA1901-12 to Y.S.]. Funding for open access charge: SUHF Fellowship.
Conflict of interest statement. None declared.
REFERENCES
- 1. Lieberman-Aiden E., van Berkum N.L., Williams L., Imakaev M., Ragoczy T., Telling A., Amit I., Lajoie B.R., Sabo P.J., Dorschner M.O. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009; 326:289–293.
- 2. Mumbach M.R., Rubin A.J., Flynn R.A., Dai C., Khavari P.A., Greenleaf W.J., Chang H.Y. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods. 2016; 13:919–922.
- 3. Rao S.S., Huntley M.H., Durand N.C., Stamenova E.K., Bochkov I.D., Robinson J.T., Sanborn A.L., Machol I., Omer A.D., Lander E.S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014; 159:1665–1680.
- 4. Dixon J.R., Selvaraj S., Yue F., Kim A., Li Y., Shen Y., Hu M., Liu J.S., Ren B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012; 485:376–380.
- 5. Dixon J.R., Jung I., Selvaraj S., Shen Y., Antosiewicz-Bourget J.E., Lee A.Y., Ye Z., Kim A., Rajagopal N., Xie W. et al. Chromatin architecture reorganization during stem cell differentiation. Nature. 2015; 518:331–336.
- 6. Crane E., Bian Q., McCord R.P., Lajoie B.R., Wheeler B.S., Ralston E.J., Uzawa S., Dekker J., Meyer B.J. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature. 2015; 523:240–244.
- 7. Kim K., Eom J., Jung I. Characterization of structural variations in the context of 3D chromatin structure. Mol. Cells. 2019; 42:512–522.
- 8. Krijger P.H., Di Stefano B., de Wit E., Limone F., van Oevelen C., de Laat W., Graf T. Cell-of-origin-specific 3D genome structure acquired during somatic cell reprogramming. Cell Stem Cell. 2016; 18:597–610.
- 9. Pope B.D., Ryba T., Dileep V., Yue F., Wu W., Denas O., Vera D.L., Wang Y., Hansen R.S., Canfield T.K. et al. Topologically associating domains are stable units of replication-timing regulation. Nature. 2014; 515:402–405.
- 10. Siersbaek R., Madsen J.G.S., Javierre B.M., Nielsen R., Bagge E.K., Cairns J., Wingett S.W., Traynor S., Spivakov M., Fraser P. et al. Dynamic rewiring of promoter-anchored chromatin loops during adipocyte differentiation. Mol. Cell. 2017; 66:420–435.
- 11. Jerkovic I., Cavalli G. Understanding 3D genome organization by multidisciplinary methods. Nat. Rev. Mol. Cell Biol. 2021; 22:511–528.
- 12. Maass P.G., Barutcu A.R., Weiner C.L., Rinn J.L. Inter-chromosomal contact properties in live-cell imaging and in Hi-C. Mol. Cell. 2018; 69:1039–1045.
- 13. Arrastia M.V., Jachowicz J.W., Ollikainen N., Curtis M.S., Lai C., Quinodoz S.A., Selck D.A., Ismagilov R.F., Guttman M. Single-cell measurement of higher-order 3D genome organization with scSPRITE. Nat. Biotechnol. 2022; 40:64–73.
- 14. Quinodoz S.A., Jachowicz J.W., Bhat P., Ollikainen N., Banerjee A.K., Goronzy I.N., Blanco M.R., Chovanec P., Chow A., Markaki Y. et al. RNA promotes the formation of spatial compartments in the nucleus. Cell. 2021; 184:5775–5790.
- 15. Quinodoz S.A., Ollikainen N., Tabak B., Palla A., Schmidt J.M., Detmar E., Lai M.M., Shishkin A.A., Bhat P., Takei Y. et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell. 2018; 174:744–757.
- 16. Chen Y., Zhang Y., Wang Y., Zhang L., Brinkman E.K., Adam S.A., Goldman R., van Steensel B., Ma J., Belmont A.S. Mapping 3D genome organization relative to nuclear compartments using TSA-Seq as a cytological ruler. J. Cell Biol. 2018; 217:4025–4048.
- 17. Nguyen H.Q., Chattoraj S., Castillo D., Nguyen S.C., Nir G., Lioutas A., Hershberg E.A., Martins N.M.C., Reginato P.L., Hannan M. et al. 3D mapping and accelerated super-resolution imaging of the human genome using in situ sequencing. Nat. Methods. 2020; 17:822–832.
- 18. Payne A.C., Chiang Z.D., Reginato P.L., Mangiameli S.M., Murray E.M., Yao C.C., Markoulaki S., Earl A.S., Labade A.S., Jaenisch R. et al. In situ genome sequencing resolves DNA sequence and structure in intact biological samples. Science. 2021; 371:eaay3446.
- 19. Su J.H., Zheng P., Kinrot S.S., Bintu B., Zhuang X. Genome-scale imaging of the 3D organization and transcriptional activity of chromatin. Cell. 2020; 182:1641–1659.
- 20. Takei Y., Yun J., Zheng S., Ollikainen N., Pierson N., White J., Shah S., Thomassie J., Suo S., Eng C.L. et al. Integrated spatial genomics reveals global architecture of single nuclei. Nature. 2021; 590:344–350.
- 21. Wang Y., Zhang Y., Zhang R., van Schaik T., Zhang L., Sasaki T., Peric-Hupkes D., Chen Y., Gilbert D.M., van Steensel B. et al. SPIN reveals genome-wide landscape of nuclear compartmentalization. Genome Biol. 2021; 22:36.
- 22. Xiong K., Ma J. Revealing Hi-C subcompartments by imputing inter-chromosomal chromatin interactions. Nat. Commun. 2019; 10:5069.
- 23. Wang S., Su J.H., Beliveau B.J., Bintu B., Moffitt J.R., Wu C.T., Zhuang X. Spatial organization of chromatin domains and compartments in single chromosomes. Science. 2016; 353:598–602.
- 24. Lee D.D., Seung H.S. Learning the parts of objects by non-negative matrix factorization. Nature. 1999; 401:788–791.
- 25. Ryu J., Kim H., Yang D., Lee A.J., Jung I. A new class of constitutively active super-enhancers is associated with fast recovery of 3D chromatin loops. BMC Bioinf. 2019; 20:127.
- 26. Mateo L.J., Murphy S.E., Hafner A., Cinquini I.S., Walker C.A., Boettiger A.N. Visualizing DNA folding and RNA in embryos at single-cell resolution. Nature. 2019; 568:49–54.
- 27. Gelali E., Girelli G., Matsumoto M., Wernersson E., Custodio J., Mota A., Schweitzer M., Ferenc K., Li X., Mirzazadeh R. et al. iFISH is a publically available resource enabling versatile DNA FISH to study genome architecture. Nat. Commun. 2019; 10:1636.
- 28. Moffitt J.R., Zhuang X. RNA imaging with multiplexed error-robust fluorescence in situ hybridization (MERFISH). Methods Enzymol. 2016; 572:1–49.
- 29. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25:1754–1760.
- 30. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26:841–842.
- 31. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15:550.
- 32. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29:15–21.
- 33. Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinf. 2011; 12:323.
- 34. Zhou Y., Zhou B., Pache L., Chang M., Khodabakhshi A.H., Tanaseichuk O., Benner C., Chanda S.K. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 2019; 10:1523.
- 35. Amemiya H.M., Kundaje A., Boyle A.P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 2019; 9:9354.
- 36. Imakaev M., Fudenberg G., McCord R.P., Naumova N., Goloborodko A., Lajoie B.R., Dekker J., Mirny L.A. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods. 2012; 9:999–1003.
- 37. Harrow J., Frankish A., Gonzalez J.M., Tapanari E., Diekhans M., Kokocinski F., Aken B.L., Barrell D., Zadissa A., Searle S. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012; 22:1760–1774.
- 38. Boutsidis C., Gallopoulos E. SVD based initialization: a head start for nonnegative matrix factorization. Pattern Recognit. 2008; 41:1350–1362.
- 39. Harewood L., Kishore K., Eldridge M.D., Wingett S., Pearson D., Schoenfelder S., Collins V.P., Fraser P. Hi-C as a tool for precise detection and characterisation of chromosomal rearrangements and copy number variation in human tumours. Genome Biol. 2017; 18:125.
- 40. Ritchie M.E., Phipson B., Wu D., Hu Y., Law C.W., Shi W., Smyth G.K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015; 43:e47.
- 41. Kim K., Jung I. covNorm: an R package for coverage based normalization of Hi-C and capture Hi-C data. Comput Struct Biotechnol J. 2021; 19:3149–3159.
- 42. Whyte W.A., Orlando D.A., Hnisz D., Abraham B.J., Lin C.Y., Kagey M.H., Rahl P.B., Lee T.I., Young R.A. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013; 153:307–319.
- 43. Heinz S., Benner C., Spann N., Bertolino E., Lin Y.C., Laslo P., Cheng J.X., Murre C., Singh H., Glass C.K. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010; 38:576–589.
- 44. Castro-Mondragon J.A., Riudavets-Puig R., Rauluseviciute I., Lemma R.B., Turchi L., Blanc-Mathieu R., Lucas J., Boddie P., Khan A., Manosalva Perez N. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2022; 50:D165–D173.
- 45. Peng K., Radivojac P., Vucetic S., Dunker A.K., Obradovic Z. Length-dependent prediction of protein intrinsic disorder. BMC Bioinf. 2006; 7:208.
- 46. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Zidek A., Potapenko A. et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596:583–589.
- 47. Pollock C., Huang S. The perinucleolar compartment. J. Cell. Biochem. 2009; 107:189–193.
- 48. Tjong H., Li W., Kalhor R., Dai C., Hao S., Gong K., Zhou Y., Li H., Zhou X.J., Le Gros M.A. et al. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. Proc. Natl. Acad. Sci. U.S.A. 2016; 113:E1663–1672.
- 49. Zhang L., Zhang Y., Chen Y., Gholamalamdari O., Wang Y., Ma J., Belmont A.S. TSA-seq reveals a largely conserved genome organization relative to nuclear speckles with small position changes tightly correlated with gene expression changes. Genome Res. 2021; 31:251–264.
- 50. Chen T.R. Re-evaluation of HeLa, HeLa S3, and HEp-2 karyotypes. Cytogenet. Cell Genet. 1988; 48:19–24.
- 51. Landry J.J., Pyl P.T., Rausch T., Zichner T., Tekkedil M.M., Stutz A.M., Jauch A., Aiyar R.S., Pau G., Delhomme N. et al. The genomic and transcriptomic landscape of a HeLa cell line. G3 (Bethesda). 2013; 3:1213–1224.
- 52. Sabari B.R., Dall’Agnese A., Boija A., Klein I.A., Coffey E.L., Shrinivas K., Abraham B.J., Hannett N.M., Zamudio A.V., Manteiga J.C. et al. Coactivator condensation at super-enhancers links phase separation and gene control. Science. 2018; 361:eaar3958.
- 53. Dowen J.M., Fan Z.P., Hnisz D., Ren G., Abraham B.J., Zhang L.N., Weintraub A.S., Schujiers J., Lee T.I., Zhao K. et al. Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell. 2014; 159:374–387.
- 54. Rao S.S.P., Huang S.C., Glenn St Hilaire B., Engreitz J.M., Perez E.M., Kieffer-Kwon K.R., Sanborn A.L., Johnstone S.E., Bascom G.D., Bochkov I.D. et al. Cohesin loss eliminates all loop domains. Cell. 2017; 171:305–320.
- 55. Hnisz D., Abraham B.J., Lee T.I., Lau A., Saint-Andre V., Sigova A.A., Hoke H.A., Young R.A. Super-enhancers in the control of cell identity and disease. Cell. 2013; 155:934–947.
- 56. Bremer A., Farag M., Borcherds W.M., Peran I., Martin E.W., Pappu R.V., Mittag T. Deciphering how naturally occurring sequence features impact the phase behaviours of disordered prion-like domains. Nat. Chem. 2022; 14:196–207.
- 57. Martin E.W., Holehouse A.S., Peran I., Farag M., Incicco J.J., Bremer A., Grace C.R., Soranno A., Pappu R.V., Mittag T. Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science. 2020; 367:694–699.
- 58. Wang J., Choi J.M., Holehouse A.S., Lee H.O., Zhang X., Jahnel M., Maharana S., Lemaitre R., Pozniakovsky A., Drechsel D. et al. A molecular grammar governing the driving forces for phase separation of prion-like RNA binding proteins. Cell. 2018; 174:688–699.
- 59. Xiao T., Li X., Felsenfeld G. The Myc-associated zinc finger protein (MAZ) works together with CTCF to control cohesin positioning and genome organization. Proc. Natl. Acad. Sci. U.S.A. 2021; 118:e2023127118.
- 60. Ortabozkoyun H., Huang P.Y., Cho H., Narendra V., LeRoy G., Gonzalez-Buendia E., Skok J.A., Tsirigos A., Mazzoni E.O., Reinberg D. CRISPR and biochemical screens identify MAZ as a cofactor in CTCF-mediated insulation at Hox clusters. Nat. Genet. 2022; 54:202–212.
- 61. Chen Y., Belmont A.S. Genome organization around nuclear speckles. Curr. Opin. Genet. Dev. 2019; 55:91–99.
- 62. Tan J., Shenker-Tauris N., Rodriguez-Hernaez J., Wang E., Sakellaropoulos T., Boccalatte F., Thandapani P., Skok J., Aifantis I., Fenyo D. et al. Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening. Nat. Biotechnol. 2023; doi:10.1038/s41587-022-01612-8.
Data Availability Statement
The genome sequences used in this research were derived from the HeLa cell line. We are deeply grateful to Henrietta Lacks and her family members for their huge contributions to genome research. The HeLa sequencing data generated in this study (in situ Hi-C, RNA-seq, and H3K27ac ChIP-seq on MAZ-depleted and control HeLa cells) are available from the corresponding author on request.
A preliminary version of HiCAN is available online at https://github.com/kaistcbfg/HiCAN.
All public datasets used in the research are listed in Supplementary Table S10.






