Abstract
Comparing chromatin contact maps is an essential step in quantifying how three-dimensional (3D) genome organization shapes development, evolution, and disease. However, no gold standard exists for comparing contact maps, and even simple methods often disagree. In this study, we propose novel comparison methods and evaluate them alongside existing approaches using genome-wide Hi-C data and 22,500 in silico predicted contact maps. We also quantify the robustness of methods to common sources of biological and technical variation, such as boundary size and noise. We find that simple difference-based methods such as mean squared error are suitable for initial screening, but biologically informed methods are necessary to identify why maps diverge and propose specific functional hypotheses. We provide a reference guide, codebase, and benchmark for rapidly comparing chromatin contact maps at scale to enable biological insights into the 3D organization of the genome.
Introduction
The same genomic locus can adopt different three-dimensional (3D) conformations in different cells, species, and disease states, which can impact gene regulation, cell identity, and replication timing (Fig. 1A)1,2,3–7. Chromosome-conformation capture methods (3C, 4C, 5C, Hi-C, Micro-C)8–12 measure how the genome folds across scales, including chromosomal territories, topologically associating domains (TADs), enhancer-promoter loops, and architectural stripes 10,13–15. In recent years, single-cell and deep learning techniques accelerated the study of chromatin conformation across an expanding range of biological contexts 16–22.
Figure 1. Approaches for comparing 3D chromatin contact maps.
(A) 3D genome comparisons drive insights into many domains of chromatin biology. Differences observed between maps may reflect consequences of mutations, cell type differences, species differences, or technical biases. (B) 3D contact maps exhibit a range of functionally meaningful differences, e.g., in global folding patterns, contact intensity, or small, focal changes to part of the map. (C) We define three categories of comparison methods and evaluate 11 representative methods. Basic methods (left) compare the contact intensities at each contact bin across two maps with simple measures such as mean squared error or correlations. Map-informed methods (middle) transform the 2D contact maps into 1D tracks that describe qualities like the directionality index or insulation score. These tracks were compared to obtain a score. Feature-informed methods (right) are designed to identify relevant elements (e.g., from functional genomics data) or structures (e.g., TADs or loops).
There are many ways to compare chromatin conformation maps, but no gold standard exists. Existing approaches rank differences between pairs of maps6,7,23–26, test reproducibility between replicates and modalities 7,23,24,27, identify tissue specific contacts26, and highlight differential chromatin interactions6,25. Some scores are designed to identify global differences like boundaries and contact intensities (Fig. 1B, left and center), while others target focal changes like enhancer stripes (Fig. 1B, right). To rank thousands of loci with diverse folding patterns, one must consider how scoring metrics prioritize different map features and respond to technical artifacts.
Here, we develop a unifying framework to guide strategies for comparing contact maps for new use cases. We introduce three novel methods—eigenvector difference, contact decay probability difference, and triangle track comparison—and benchmark these along with representative methods from the literature to evaluate 11 total approaches (Fig. 1C). We quantify how methods differentially rank pairs of contact maps across experimental Hi-C data, 22,500 in silico sequence insertions and deletions, and simulated contact maps that capture both biological and technical variation. Our analyses identify when methods diverge and when they are consistent, which methods are redundant or complementary, and where methods commonly fail. The new methods we introduce have relatively high concordance with existing metrics while providing rich information about biological mechanisms. We summarize our recommendations and release a library of open-source code for scoring differences between contact frequency maps to enable scientists to choose and apply the right method for their research question.
Results
Diverse strategies for scoring pairs of contact maps
When scoring differences between pairs of contact maps, it is common to apply basic methods that consider entire 2D contact matrices (e.g., mean squared error 206,720) or feature-informed methods that sum differences in specific structures (e.g., loops28). These methods represent two extremes. Basic methods are global summary statistics that can overlook small differences that are most biologically interesting. In contrast, feature-informed algorithms specifically target elements such as TADs, stripes, and loops, but are agnostic to overall contact change and may emphasize artifactual differences. As a compromise between these extremes, we extend statistics previously developed to quantify compartments (eigenvectors/PCA29), boundaries (directionality index30, insulation31), and contact decay9 in individual maps to instead score differences between pairs of maps. We also propose a new method, called triangle score, which calculates average contact frequencies across all submatrices in a larger contact matrix. These new map-informed methods (Supplemental Text) transform 2D contact matrices into 1D tracks that capture features relevant to genome folding, and then score them using Spearman’s correlation or mean squared error (MSE). The intermediate 1D track allows for the interpretation of which regions contribute most.
To comprehensively characterize the behavior of the basic, map-informed, and feature-informed scoring approaches, we implemented 11 representative methods in open-source code (Fig. 1C, Supplementary Table 1, Supplemental Text): MSE, Spearman’s rank correlation coefficient (ρ), structural similarity (SSIM), stratum-adjusted correlation coefficient (SCC), eigenvector difference, directionality difference, insulation difference, contact probability decay difference, triangle score, the HiCCUPS loop caller28, and the cooltools TAD caller31,32. We evaluated how these methods perform across diverse settings. We first applied the methods to Micro-C from human foreskin fibroblasts (HFF) and embryonic stem cells (ESC) to develop biological intuition about the type of map differences each method captures. We then evaluated their performance using a mass screen of in silico genetic perturbations. Finally, simulations isolated the effects of specific kinds of technical and biological variation. This three-part benchmark focuses on how methods rank map pairs, rather than the statistical significance of specific differences; stricter or looser significance thresholds can be applied to any score. In sum, we explored and quantified the behavior of scoring methods to learn when they are discordant with each other.
Beware! Map comparison methods produce discordant rankings
Spearman’s correlation, Pearson’s correlation, and mean squared error are most commonly used to score two maps28,33,34, as they are computationally efficient and require no feature selection. We compared their behavior using Micro-C contact maps from HFF and ESC cells across all 7,840 1-Mb windows of the human genome (Methods). These basic methods prioritized markedly different regions (Fig. 2, r2 = 0.0002, Supplemental Fig. 1)11, often for reasons unrelated to underlying biology. For example, a pair of maps with visible structural rearrangements but a low range of contact frequencies was prioritized by correlation, but not by MSE, as the absolute difference between them is small (Fig. 2A). Conversely, two maps with similar overall structure but different contact frequency ranges produce a large MSE even though they are very strongly correlated with each other (Fig. 2D). These inconsistencies occur because Spearman’s correlation is agnostic to intensity changes, while MSE is sensitive to intensity. Basic methods were not designed to identify specific chromatin features, and therefore may not always be biologically interpretable on their own. They often disagree.
Figure 2. Basic methods to compare contact frequency maps rank map pairs differently.
Mean squared error (MSE) and Spearman’s correlation (ρ) were calculated across the genome on experimental contact maps from embryonic stem cell (ESC) and human foreskin fibroblast (HFF) (n = 7840). Each point represents a comparison score between a pair of contact maps. We highlight examples where (A) only correlation ranks highly, (B) both methods agree the maps are similar, (C) both methods agree the maps are different, and (D) only MSE ranks highly.
Map-informed methods highlight changes in genome structure
The map-informed methods we created or extended have never been benchmarked. To gain intuition about their behavior, we used our comparison across experimental Micro-C maps in HFF and ESCs to evaluate how these methods behave on contact maps containing three common changes linked to disruption in gene regulation: a boundary change, a stripe change, and a loop change (Fig. 3A i, red boxes). Triangle score, directionality index, insulation difference, and eigenvector difference all correctly identified large contact changes across the three examples (Fig. 3A ii–v). Eigenvector difference in particular showed a strong separation between tracks at the emergence of a new boundary and the strengthening of an existing boundary (Fig. 3A iii). Compared to other approaches, directionality index performed best in identifying focal changes, like the loss of loops (Fig. 3A iv), while eigenvector difference and insulation difference instead prioritized global changes in contact. Finally, eigenvector difference and contact decay were sensitive to overall contrast difference. We observed a divergence in the contact decay tracks across the first pair where a map gains distal contact (Fig. 3 vi). In sum, the design of these methods highlight different features in the tracks, from overall structural differences and average contact, to sharp changes in contrast.
Figure 3: Map-informed and feature-informed methods capture differences in TAD boundaries, stripes, and loops. A.
i. Examples of regions where contact frequency maps differ between HFF and ESCs across three structural changes: a lost TAD boundary (left panel), a lost stripe (middle panel), and lost loops (right panel), as marked by red boxes. ii-vi. Tracks corresponding to each map-informed disruption score method are shown below for ESCs (blue) and HFF (gray). Tracks for methods in (ii. - v.) correspond to the coordinates of the contact maps, while contact decay in (v.) is plotted across genomic distance. B. i. Two loci in HFF and ESC with a boundary and loop change (GRCh38 chr3:137129984–138178560 and GRCh38 chr3:138702848–139751424, respectively). ii. Applying a TAD boundary caller identifies a boundary change between cell types iii. Comparing chromatin loops identifies a genomic region with differential looping.
Feature-informed methods prioritize changes to interacting chromatin regions
To evaluate comparison approaches based on TAD and loop calling methods, we chose two regions with differential structure between ESC and HFF maps (Fig. 3B i) and tuned the parameters of the cooltools TAD caller 31,32 and the HiCCUPS loop caller 28 (Supplemental Fig. 2) 35,36. As expected, the TAD caller correctly identified all three TAD boundaries visible in ESCs, including one that is lacking in HFF (Fig. 3B ii). Similarly, the loop caller identified a loop that is unique to ESCs (Fig. 3B iii). While these feature-informed approaches are biologically interpretable, they tend to be slower, address only one element at a time, and require additional parameter selection (Table 1, Supplemental Table 1). These methods also require a significance cutoff for initial feature calls, which may result in missed features of low signal. Additionally, most maps contain fewer than ten called features in a 1-Mb window, creating a small range of possible scores. Therefore, caution should be exercised when using these scores at a large scale, especially in maps without strong TADs or loops, where they can produce artificial results.
Table 1:
Strengths, weaknesses, and suggested applications of disruption score methods.
Technical sensitivities | Application-based features | Specs | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Comparison method | Resistant to noise | Resistant to decreased resolution | Biased towards ___ contrast maps | Sensitive to small structural changes | Sensitive to signal change at boundaries | Sensitive to loss or gain of boundaries | Sensitive to changes in TAD substructures | Needs tuning | Time & memory | ||
High | Low | ||||||||||
MSE | X | ✓ | ✓✓ | ✓ | ✓ | ||||||
Corr | ✓ | X | ✓ | ✓✓ | ✓ | ||||||
SCC | X | ✓ | ✓✓ | ✓ | |||||||
SSIM | ✓ | ✓ | ✓✓ | ✓ | ✓ | ||||||
Eigenvector (corr)* | ✓✓ | ✓✓ | X | ✓✓ | ✓ | X | ✓ | ||||
DI (corr) | ✓✓ | ✓✓ | ✓ | ✓ | ✓✓ | ✓ | ✓ | ✓ | |||
Insulation (mse) | ✓ | X | ✓ | ✓✓ | ✓ | ✓ | ✓ | ||||
Insulation (corr) | ✓ | X | ✓✓ | ✓ | ✓ | ||||||
Contact decay (corr)** | ✓ | ✓✓ | X | ✓✓ | X | ||||||
Triangle (mse)* | ✓ | ✓ | X | ✓ | ✓✓ | ✓✓ | ✓ | X X | |||
Triangle (corr)* | ✓✓ | ✓✓ | X | ✓ | ✓✓ | X X | |||||
Loop calling | NA | NA | X | ✓ | X | ✓ | ✓ | ||||
TAD calling | NA | NA | X | X | X | ✓ | ✓ | ✓ |
Trends and patterns across disruption scores summarized from statistical comparisons (Fig. 4), simulations (Fig. 5), and manual parsing of the most highly disruptive perturbations for each method (Supp. Fig. 5). While this summary is not exhaustive of all possible outcomes, it provides qualitative guidelines for users to make informed decisions when selecting a comparison method based on the scale and application of their research. We use green checks to indicate advantages and red X’s to indicate disadvantages for each method category: basic methods (blue) map-informed methods (orange), and feature-informed methods (green). Double signs represent strong patterns, while no sign indicates no pattern, and NA denotes that the method was not tested.
In silico perturbation enables evaluation of contact map comparison methods at scale
Although differences between cell-types exist, 3D genome organization is often highly conserved7,30,37. To evaluate the performance of map comparison methods across a wider variety of possible changes in chromatin structure, we used an in silico approach to generate pairs of 1-Mb maps across the genome with a variety of perturbations. We applied Akita 20, a convolutional neural network that predicts genome folding from sequence alone, to generate contact frequency maps from sequences with and without a genetic perturbation likely to disrupt genome folding (Fig. 4A). We designed three types of perturbations: CTCF canonical motif insertions 38, endogenous CTCF motif deletions, and random 100 base pair deletions (Methods). In total, we produced 22,500 unique contact frequency map pairs on which to test all three types of methods. To enable large-scale evaluation, we applied the 11 methods and transformed their scores such that higher values indicate greater disruption of 3D organization and smaller values indicate more similar organization (Methods, Supplemental Fig. 3).
Figure 4. Comparison of disruption score methods.
(A) Schematic describing the strategy for comparing in silico perturbed contact maps. Random ~1 Mb windows of the human genome (GRCh38) are selected and input into Akita to predict chromatin contacts (left). The same window is also perturbed with a CTCF motif insertion, deletion, or random 100 base pair deletion. The resulting sequence is also input into Akita to predict chromatin contacts of this perturbed reference sequence (right). The perturbed and unperturbed maps were compared by applying the 11 basic, map-informed, and feature-informed methods. (B) Correlation matrix of the methods tested, where cells are shaded according to how well their scores correlated across perturbations. Concordance of the top-scoring perturbations (Supp. Fig. 6) also shows agreement between corr and SSIM, while highlighting that loops and triangle (corr) are quite concordant with other methods when considering only the top scores. Colors across the top of the heatmap identify the individual methods. (C) Principal component analysis of disruption scores of each method from perturbed map pairs. (D) Heatmap of normalized disruption scores across all methods and perturbations. The colored key along the top of the heatmap indicates whether the perturbation was a random deletion (pink), a CTCF insertion (navy), or a CTCF deletion (light blue). Method colors are the same as in (C). Four broad trends in disruption score patterns across methods are marked with brackets. (E) Representative example map pairs chosen from the groups identified in D: i. high scores across 5 methods; ii. low across all methods except for eigenvector (corr); iii. low scores across all methods; iv. low scores across methods but higher for MSE-based scores; v. high scores only for MSE-based scores; vi. high scores for correlation-based scores: triangle (corr), corr, and SCC.
We quantified the similarities and differences between methods by comparing the scores for all 22,500 in silico perturbations across all possible pairs of methods. We found that TAD- and loop-based scores are most different from the rest, as they only detect a specific type of change (Fig. 4B). Correlation-based measures (i.e., Spearman’s correlation, SSIM, and correlation of contact decay) cluster together distinct from MSE-based methods (i.e., MSE, triangle (MSE), insulation (MSE)). This result aligns with our initial observation that Spearman’s correlation and MSE often do not agree, especially across their top-scoring variants (Fig. 2, Supplemental Fig. 4, Supplemental Fig. 5). Principal component analysis (PCA) on the disruption scores shows similar clustering (Fig. 4C).
We next simultaneously clustered the perturbed map pairs and scores across methods to identify groups of perturbations that differentiate them (Fig. 4D). While all correlation-based methods exhibit similar behavior, insulation (corr), SSIM, and DI (corr) produce scores which are more uniformly distributed and less extreme across perturbations, highlighting the necessity of appropriate normalization when comparing across methods (Fig. 4E i and vi). We also find that perturbations created by CTCF insertion group together, as they are often the most disruptive of 3D organization. However, we observed substantial sub-structure within the cluster, reflecting differences in the behavior of scores on these maps. For example, cluster i is highly scored by all methods, and a representative perturbation example shows a variety of changes: gained loops, lost stripes, and boundary changes. The magnitude of changes in this set likely contributes to the universally high scores. Clusters iv and v are primarily composed of CTCF insertions, where scores are similar across most methods, but higher only for MSE-based methods. Profile v is the most dissimilar. Here, the representative map pair has minimal structural differences but extreme contrast, suggesting that this cluster is defined by examples of high dynamic range that are over-prioritized by MSE-based methods (Fig. 4E v).
We further compared methods by quantifying how well the top-ranked maps agree across methods. Some methods have high overlap (Supplemental Fig. 5, Supplemental Fig. 6). For example, 85% of map pairs are ranked in the top 5th percentile for both SCC and Spearman’s correlation, indicating some general agreement in the methods. However, many methods have minimal overlap, suggesting they prioritize different features. For example, only 32% of the top 5th percentile of maps ranked by insulation (MSE) and SSIM are shared. Finally, we applied methods to map pairs selected to represent a range of effect sizes and confirmed all methods are sensitive to large changes and insensitive to small changes (Supplemental Fig. 7).
Simulation studies quantify method sensitivity
Our in silico screen produced a diversity of structural alterations, often affecting multiple aspects of the map. For instance, a CTCF site insertion can both create a new TAD boundary and alter overall contact intensity. To disentangle how each method responds to changes in particular map features, we generated simulated maps and synthetically altered a single variable at a time. We then measured the sensitivity of each score to each specific change. As a template, we created a contact frequency map with two CTCF motifs forming a TAD and used this canvas to simulate both biologically meaningful changes (e.g., change in TAD size, substructure, or intensity) and technical artifacts (e.g., change in noise or resolution) (Methods). For each change, we gradually increased the strength of the perturbation across 100 maps and subsequently applied scoring methods (Supplemental Fig. 8).
Each method responded differently across the simulated changes (Fig. 5). Steeper curves represent high sensitivity to the perturbation, while flatter curves represent less sensitivity. We find that basic methods are most sensitive to technical variations, such as increased noise and decreased resolution, while map-informed methods are most robust (Fig. 5A–B). As expected, correlation-based methods are unaffected by changes in contrast and intensity, while MSE-based methods are highly sensitive (Fig. 5C–D). All methods except eigenvector difference identify TAD size and sub-structure changes. However, some prioritize certain types of organizational changes over others (Fig. 5E–F). For example, insulation difference and triangle profile are sensitive to boundary changes, while directionality index highlights new boundaries but is less effective in identifying changes to existing boundaries. We synthesized these results along with findings from in silico perturbations in the Guidelines to provide recommendations based on the intended application.
Figure 5. Simulated contact frequency maps with controlled perturbations estimate disruption score method sensitivities.
Normalized disruption scores are plotted for a simulated contact frequency map containing a TAD across 6 types of perturbations, plotted on the x-axis. Each perturbation was added at 100 different degrees. The images shown correspond to the final degree–the maximum perturbation added. Line plots show disruption scores from comparing the original map (top left corner) to each perturbed map. Maps corresponding to the incremental increases in perturbation are shown alongside the changed scores in Supp. Fig. 8. (A) Noise is added by introducing random values drawn from a Gaussian distribution to the maps; (B) Resolution is lowered by increasing bin size; (C) Contrast is applied by increasing the range of the signal; (D) Intensity is increased globally by adding a constant to all values; (E) Size is increased by slightly enlarging the domain width; (E) A sub-structure is added by gradually incorporating a new boundary at the center of the existing TAD.
Guidelines
Our study assessed the effectiveness of 11 existing and new methods for comparing 3D genome contact maps (Supplementary Table 1). Although there were similarities between the top-scoring variants of most methods, our results indicate that they differ substantially in their sensitivity to biological and technical variation (Supplemental Fig. 6). We summarize these findings and guidelines in Table 1.
All of the methods can identify structural changes, such as changes to domain size or the addition of substructure, but to varying degrees. Of the basic methods, MSE and SCC more readily identify subtle organizational changes. Among the map-informed methods, insulation difference and triangle difference are the most effective at identifying changes in both existing and new domain boundaries. Directionality index highlights new boundaries or substructures but less readily identifies changes in existing boundaries. Eigenvector difference and contact probability decay are the least sensitive to small-scale organizational changes, but prioritize larger-scale changes in the overall structure of the map. These statistics have been deployed primarily for identifying differences at the scale of compartments and whole chromosomes, so it is not surprising that they are not sensitive to map differences within 1-Mb windows.
In general, the new map-informed methods we proposed are concordant with basic methods and each other, especially when comparing the top 5% of scores genome-wide (30%−80% of examples are shared; Supplemental Fig. 6). Triangle difference stands out among our newly implemented methods as highly concordant with other methods and able to detect a variety of map differences, but it is also the slowest (Supplementary Table 1). Insulation difference is faster and also fairly concordant with other methods. Most top 5% map pairs called by other methods are also high-scoring with loop calling, but not TAD calling. Loop calling also identifies many additional map pairs that are not in the top 5% of other methods.
Correlation-based methods are insensitive to changes in contrast and intensity, while MSE-based methods are highly sensitive to these changes. In contrast, map-informed methods summarize maps across a feature track and are therefore more robust to these changes. One notable exception is insulation difference, which is more sensitive to resolution changes that obscure domain boundaries. Some map changes, such as contrast or intensity, may either be biologically meaningful or a consequence of technical variability, depending on the scenario. The basic methods, especially MSE, SCC, and SSIM, are particularly sensitive to technical variation such as increased noise and decreased resolution. SSIM falls in between. We also note that MSE is by far the fastest approach (Supplementary Table 1). All others require less than 10 seconds per thousand calls, aside from eigenvector difference and triangle track, which can be accelerated by decreasing the resolution of the maps prior to comparison.
We recommend using multiple methods in tandem. We find that there is no “one size fits all” metric that best identifies every feature of interest in a chromatin contact map. Researchers should consider the intended application and the types of changes that are meaningful when selecting the most effective and relevant metrics. We recommend first applying basic methods as an initial screen to identify the most disrupted maps, especially when evaluating large datasets. Using both correlation- and MSE-based scores will help mitigate biases of each. We next suggest applying a map-informed method, such as triangle or insulation difference, to a subset of disruptive perturbations to gain insight into the types of changes present. Finally, feature-informed methods can be used to explore TAD and loop gains/losses and to develop mechanistic hypotheses.
Code
Our codebase is publicly available to enable researchers to easily test and apply all 11 approaches to their own research questions. The code is written in Python and is accompanied by documentation and tutorials to help users get started. The methods have flexible hyperparameters and can be run simultaneously on one dataset, making it easier to compare the results of different approaches and select the most appropriate methods. To aid in interpretation of the methods, we also provide guidance on how to visualize map-informed and feature-informed approaches across contact matrices. Overall, our codebase provides a valuable resource for researchers who wish to apply multiple methods to their own datasets and rank pairs of maps based on their differences.
Discussion
In this study, we evaluated and compared the behavior of 11 methods for quantifying differences between pairs of 3D contact maps, including many methods that have not been previously used for this application. We introduced insulation difference, eigenvector difference, and contact decay difference, as well as the new triangle comparison method, which is robust to noise while capturing structural differences between maps. We found that the choice of scoring function can have a significant impact on the conclusions drawn from the data, and therefore suggest that multiple comparison metrics should be used when seeking biological insights into the function of the 3D genome.
Several limitations should be considered when evaluating our results. While we consider a range of experimental, predicted, and simulated maps, our findings may not apply to other experimental conditions, such as single-cell contact matrices or other scenarios in which maps have a high level of noise and/or sparsity. Additionally, some of the methods we evaluated have variables that can be tuned to optimize performance in a given context (Supplementary Text, Supplementary Table 1). We only tested one TAD caller and one loop caller to examine their general utility 35,36. Finally, we did not directly address the problem of identifying a threshold beyond which the differences should be considered biologically or statistically significant. One could apply previously proposed 6,25,39 and novel thresholding methods to the ranks computed with scoring methods to define a significant set of map pairs.
Our work provides useful guidelines for scoring contact maps that will enable further discovery into the mechanisms of the 3D genome. We provide a codebase of methods for flexible and fast scoring across contact maps under a unified framework. The experiments we performed as a part of this study, such as the in silico deletion and insertion of thousands of CTCF motifs genome-wide, provide a useful dataset for evaluating diverse biological questions or utility as controls for the level of 3D genome variation expected based on CTCF and random perturbations. We anticipate that incorporating methods with stronger biological interpretability, like those evaluated here, may further improve machine learning methods for predicting contact maps. Overall, by developing novel and more robust scoring functions, our study provides a foundation for analyzing contact maps at scale.
Methods
Datasets
Experimental maps
Maps of 3D chromatin contact are represented as 2D matrices of pairwise interaction frequencies. Regions of maps with high values indicate genomic loci with a high frequency of interaction in physical space, on average. Following experimental Hi-C, maps begin as raw read counts, which are subsequently balanced and normalized to reflect log(observed/expected) contact frequencies 40.
Experimental data considered in this study from HFF and ESCs were preprocessed as training datasets for the Akita model11,20. Specifically, these high-quality Micro-C datasets were normalized with genome-wide iterative correction (ICE), adaptively coarse-grained, normalized for distance-dependent decrease in contact frequency, log clipped to (−2,2), linearly interpolated to fill missing bins, and convolved with a 2D Gaussian filter for smoothing. Processing maps ensures consistency across the experimental data and computational predictions since we do not evaluate raw experimental read counts.
Predicted maps
To effectively compare contact maps at scale, we generated a dataset of thousands of maps predicted from in silico CTCF motive insertions, CTCF motif deletions, random 100 bp sequence insertions, and random 100 bp sequence deletions. These alterations were passed into Akita20, a model predicting genome folding from sequence, to generate pairs of maps with structural rearrangements. We first curated sequences for insertion. CTCF motif sequences were randomly selected from annotated CTCF sites in the reference genome from the hg38 build of the JASPAR database38. Random 100 bp fragments were also selected from chromosome 1 for insertion. Both the CTCF and random sequences were inserted into the center of 1-Mb of DNA with start locations randomly selected from chromosome 1. Akita requires a fixed input of 220 bp. Additional sequence was trimmed from the 3’ end, such that the final sequence remained 1-Mb. To curate deletions, we again selected random CTCF sites from JASPAR, pulled the surrounding 1-Mb of DNA, removed the motif sequence, and pulled in additional sequence from the 3’ end such that the entire sequence remained 1-Mb in length. The same strategy was applied to randomly selected 100 bp fragments for deletion. All generated 1-Mb genomic query sequences were filtered to exclude overlap with ENCODE blacklisted regions41. For each perturbation, both the original genomic sequence and the perturbed sequence were provided to Akita, resulting in two predicted 448×448 contact maps where the resolution of each pixel is 2048 bp representing a total length of ~1 Mb (220) of DNA sequence 20. This dataset consists of 7,500 matched contact maps for each category of perturbation for 30,000 total map pairs. Random 100 bp insertions were generally excluded from analysis, as they had almost no effect.
Simulated maps
To generate simulated maps, we initially generated predicted maps with Akita from random DNA sequence. Predicted maps still showed minimal structure from randomly occurring CTCF-like motifs. Sequence matches to the forward and reverse canonical CTCF motif38 were therefore shuffled to produce a predicted blank canvas map devoid of all higher-order folding patterns. Structure was reintroduced to simulated maps by inserting forward and reverse CTCF motifs ¼ and ¾ through the random DNA sequence, producing TAD-like boundaries. We tuned simulated parameters as described below.
Noise: Gaussian noise was added to the maps with a standard deviation ranging from 0 (no added noise) to 0.2.
Resolution: The original 448×448 map was downsampled ranging from a resolution of 2,048 bp (original resolution) to 50,972 bp.
Contrast: Pixel intensities of the contact map were multiplied by a scalar ranging from 1 (no increase in contrast) to 2.
Intensity: A scalar value ranging from 0 (no addition) to 0.2 was added to all pixels in the contact map.
Size: The size of the substructure within the map was increased by resizing the original map by a scalar and trimming the matrix back down to the original dimensions. Map sizes were increased by a factor of 1 (no resize) to 1.1.
Substructure: An additional map was created by introducing CTCF halfway into the random sequence to produce an additional boundary. The original map was combined with the substructure map with a multiplier ranging from 0 (no added structure) to 1 (total added structure).
Visualizations of these changes can be found in Supplemental Fig. 8.
Benchmarking methods
Adapting new methods
Triangle profile is a novel scoring method. Directionality index30, PCA29, insulation31, and contact decay9 are established methods for analysis on individual Hi-C maps, but have not previously been used to to score pairs of maps. For map-motivated and feature-motivated methods, it is possible to plot the scoring method results along the length of the map, or on the map itself, as seen in Figure 3. The common behavior across maps with a small change, a large change, and no change is illustrated in Supplemental Fig. 7.
Comparing contact maps
We applied all comparison metrics to pairs of experimental, predicted, and synthetic maps. For details regarding how each metric is computed, see Supplemental Text. Any missing values were masked prior to evaluation and not considered by the comparison metrics. Scoring method implementations can be found within scoring.py in the codebase. MSE, Spearman’s rank correlation coefficient, and Pearson correlation coefficient were applied to map-informed methods to collapse two 2D tracks into a scalar value. Pearson correlation behaved almost identically to Spearman’s rank correlation, and therefore was excluded from analysis (Supplemental Fig. 1). For computationally intensive methods, we reduced the resolution of the input from 2kb to 10kb to speed evaluation time across thousands of comparisons.
To ensure that scores across approaches are comparable, we flip some methods such that higher values indicate greater disruption and smaller values indicate more similar maps. For methods like correlations, we use 1 - correlation such that a perfect correlation (1) is flipped to mean no difference (0). For all the results, we provide raw scores and normalized scores so that it is easier to interpret how a raw score for one method compares to a raw score of another method. We additionally scale all values by the mean score of all random 100 bp deletions using Akita, which we find to have minimal impact (Supplemental Fig. 3). For example, a raw MSE of 0.0065 and a raw 1 - pearson correlation of 0.036 both correspond to the same normalized score of 2. That is, a disruption of that magnitude corresponds to 2 times the average disruption of a 100 bp deletion.
For loop and TAD callers, we quantify the ratio of changed (e.g. added or lost) features (TADs or loops) to extend these approaches and generate a single score for each pair of maps.
Method parameters
The following methods required no adjustable input parameters: mean squared error, Spearman’s rank correlation coefficient, and pearson correlation coefficient, SSIM, SCC, contact decay, eigenvector, and triangle correlation. We describe tunable parameters choices for the remaining methods below. We did not optimize tunable parameter choices but instead selected default choices from existing approaches. Results from alternative parameter selection are demonstrated in Supplemental Fig. 2 and Supplemental Fig. 9.
Insulation:
window_size=10: size of the diamond-shaped window considered
Directionality index:
window_resolution=10000: resolution of sliding window in bp
replace_ends: replaces ends of DI track with 0s
buffer=50: how far from the track ends to replace with 0
Loop difference:
p=2: the width of the interaction region surrounding the peak
width=5: the size to get the donut filter
ther=1.1: the threshold for the ratio of center windows to the donut filter and lower left filter
ther_H=1.1: the threshold for the ratio of center windows to the horizontal filter
ther_V=1.1: the threshold for the ratio of center windows to the vertical filter
radius=5: the upper bound of distance of two loop points considered as same
TAD difference:
window_size=5: size of the diamond-shaped window
ther=0.2: the threshold for TAD boundaries
radius=5: the upper bound of distance of two TADs considered as same
Supplementary Material
Acknowledgements
We gratefully acknowledge members of the Pollard lab for project feedback, as well as Vijay Ramani, for proposing eigenvector difference. We additionally thank Mike Keiser for feedback and support.
Funding Sources
This work was supported by the NIH 4D Nucleome Project (award #U01HL157989 to K.S.P.), the NIH Office of the Director (award #R03OD034499 to K.S.P.), NIGMS (award #R35GM127087 to J.A.C, award #T32GM007347), NHGRI (award #F30HG011200 to E.M.), Additional Ventures, a UCSF Achievement Rewards for College Scientists Scholarship (L.M.G.), and Gladstone Institutes.
Code and Data Availability
All original code and resulting data from in silico screens have been deposited at –https://github.com/pollardlab/contact_map_scoring.
References
- 1.Kragesteen B. K. et al. Dynamic 3D chromatin architecture contributes to enhancer specificity and limb morphogenesis. Nat. Genet. 50, 1463–1473 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Spielmann M., Lupiáñez D. G. & Mundlos S. Structural variation in the 3D genome. Nat. Rev. Genet. 19, 453–467 (2018). [DOI] [PubMed] [Google Scholar]
- 3.Gorkin D. U. et al. Common DNA sequence variation influences 3-dimensional conformation of the human genome. Genome Biol. 20, 255 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Eres I. E., Luo K., Hsiao C. J., Blake L. E. & Gilad Y. Reorganization of 3D genome structure may contribute to gene regulatory evolution in primates. PLoS Genet. 15, e1008278 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Hoencamp C. et al. 3D genomics across the tree of life reveals condensin II as a determinant of architecture type. Science 372, 984–989 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Galan S. et al. CHESS enables quantitative comparison of chromatin contact data and automatic feature extraction. Nat. Genet. 52, 1247–1255 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Yang T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Dekker J., Rippe K., Dekker M. & Kleckner N. Capturing Chromosome Conformation. Science 295, 1306–1311 (2002). [DOI] [PubMed] [Google Scholar]
- 9.Lieberman-Aiden E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Dixon J. R., Gorkin D. U. & Ren B. Chromatin Domains: The Unit of Chromosome Organization. Mol. Cell 62, 668–680 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Krietenstein N. et al. Ultrastructural Details of Mammalian Chromosome Architecture. Mol. Cell 78, 554–565.e7 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Hsieh T.-H. S. et al. Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin Folding. Mol. Cell 78, 539–553.e8 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Fudenberg G. et al. Formation of Chromosomal Domains by Loop Extrusion. Cell Rep. 15, 2038–2049 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Vian L. et al. The Energetics and Physiological Impact of Cohesin Extrusion. Cell 175, 292–294 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Kraft K. et al. Serial genomic inversions induce tissue-specific architectural stripes, gene misexpression and congenital malformations. Nat. Cell Biol. 21, 305–310 (2019). [DOI] [PubMed] [Google Scholar]
- 16.Nagano T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Nagano T. et al. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature 547, 61–67 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Tan L., Xing D., Chang C.-H., Li H. & Xie X. S. Three-dimensional genome structures of single diploid human cells. Science 361, 924–928 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Tan L. et al. Changes in genome architecture and transcriptional dynamics progress independently of sensory experience during post-natal brain development. Cell 184, 741–758.e17 (2021). [DOI] [PubMed] [Google Scholar]
- 20.Fudenberg G., Kelley D. R. & Pollard K. S. Predicting 3D genome folding from DNA sequence with Akita. Nat. Methods 17, 1111–1117 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Schwessinger R. et al. DeepC: predicting 3D genome folding using megabase-scale transfer learning. Nat. Methods 17, 1118–1124 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Zhang R., Zhou T. & Ma J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nat. Biotechnol. 40, 254–261 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Yan K.-K., Yardimci G. G., Yan C., Noble W. S. & Gerstein M. HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps. Bioinformatics 33, 2199–2201 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Yardımcı G. G. et al. Measuring the reproducibility and quality of Hi-C data. Genome Biol. 20, 57 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Stansfield J. C., Cresswell K. G., Vladimirov V. I. & Dozmorov M. G. HiCcompare: an R-package for joint normalization and comparison of HI-C datasets. BMC Bioinformatics 19, (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Yang D., Chung T. & Kim D. DeepLUCIA: predicting tissue-specific chromatin loops using Deep Learning-based Universal Chromatin Interaction Annotator. Bioinformatics 38, 3501–3512 (2022). [DOI] [PubMed] [Google Scholar]
- 27.Boninsegna L. et al. Integrative genome modeling platform reveals essentiality of rare contact events in 3D genome organizations. Nat. Methods 19, 938–949 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Rao S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Nichols M. H. & Corces V. G. Principles of 3D compartmentalization of the human genome. Cell Rep. 35, 109330 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Dixon J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Crane E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Open2C et al. Cooltools: enabling high-resolution Hi-C analysis in Python. bioRxiv 2022.10.31.514564 (2022) doi: 10.1101/2022.10.31.514564. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Imakaev M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Dixon J. R. et al. Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331–336 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Forcato M. et al. Comparison of computational methods for Hi-C data analysis. Nat. Methods 14, 679–685 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Zufferey M., Tavernari D., Oricchio E. & Ciriello G. Comparison of computational methods for the identification of topologically associating domains. Genome Biol. 19, 217 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.McArthur E. & Capra J. A. Topologically associating domain boundaries that are stable across diverse cell types are evolutionarily constrained and enriched for heritability. Am. J. Hum. Genet. 108, 269–283 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Castro-Mondragon J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Xu Z. et al. A hidden Markov random field-based Bayesian method for the detection of long-range chromosomal interactions in Hi-C data. Bioinformatics 32, 650–656 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Lyu H., Liu E. & Wu Z. Comparison of normalization methods for Hi-C data. Biotechniques 68, 56–64 (2020). [DOI] [PubMed] [Google Scholar]
- 41.Amemiya H. M., Kundaje A. & Boyle A. P. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci. Rep. 9, 9354 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
All original code and resulting data from in silico screens have been deposited at –https://github.com/pollardlab/contact_map_scoring.