Related to Figure 5
a. Bar plot of feature importance scores, calculated using a recursive feature selection method for predicting gene expression levels, showing that 8 out of 10 features scored as important. Features are ranked from high to low importance. Pink indicates 1D features, while blue indicates 3D variables. Light blue indicates the features that were not selected for model training. For further details see also Supplementary Table 6.
b. Spearman correlation values between each of the 10 variables considered for our 3D model and gene expression levels (left) or differential expression levels (right). For each feature, the dots represent the minimum, mean and maximum correlation scores across the 4 tested cell lines (ESC, TSC, XEN and EpiSC) (left) or across the 6 differential analysis pairs (right). For further details see also Supplementary Table 6.
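The min/mean/max summary described in panel b can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`average_ranks`, `spearman`, `summarize_feature`) and the dictionary layout keyed by cell line are assumptions made for the example, and ties are handled with average ranks.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def summarize_feature(feature_by_line, expression_by_line):
    """Return (min, mean, max) Spearman correlation across cell lines.

    Both arguments are hypothetical dicts mapping a cell line name to a
    list of per-gene values, in the same gene order.
    """
    rhos = [spearman(feature_by_line[k], expression_by_line[k])
            for k in feature_by_line]
    return min(rhos), mean(rhos), max(rhos)
```

Each feature would be passed through `summarize_feature` once for the expression comparison and once for the differential comparison, yielding the three dots plotted per feature.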
c. Area under the curve (AUC) and Spearman correlation scores for predicting gene expression classification (top 10% high- vs low-expressing genes, left graph) and absolute expression levels (right graph), respectively, in ESC or TSC cells using each of our 3D-HiChAT, Promoter-1D and Linear-1D models across various distances from the TSS (5kb-100kb). Each dot represents the average score across all 20 chromosomes using the LOCO approach, while error bars show the standard deviation. See also Extended Data Figure 5 for the rest of the cell lines and comparisons. For further details see also Supplementary Table 6.
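The per-chromosome averaging in panel c can be illustrated with a short sketch. It assumes that LOCO denotes a leave-one-chromosome-out scheme, in which genes on one chromosome are held out for testing while the model is trained on the rest. The helper names `loco_splits` and `summarize` are hypothetical, and the use of the sample standard deviation is an assumption, since the legend does not specify which estimator was used.

```python
from statistics import mean, stdev

def loco_splits(chroms):
    """Yield (held_out, train_idx, test_idx), holding out one chromosome at a time.

    `chroms` is a list giving the chromosome label of each gene.
    """
    for held_out in sorted(set(chroms)):
        test_idx = [i for i, c in enumerate(chroms) if c == held_out]
        train_idx = [i for i, c in enumerate(chroms) if c != held_out]
        yield held_out, train_idx, test_idx

def summarize(fold_scores):
    """Return the mean and sample standard deviation of per-chromosome scores."""
    return mean(fold_scores), stdev(fold_scores)
```

In use, a model would be fitted on `train_idx` and scored (AUC or Spearman correlation) on `test_idx` for each held-out chromosome. The resulting list of scores is then passed to `summarize` to obtain the dot and error bar for one model at one distance.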
d. Plots showing AUC and Spearman correlation scores for predicting gene expression classification (top 10% high- vs low-expressing genes, left graph) and absolute expression levels (right graph) using the 3D-HiChAT model (trained in TSCs) in various lineages, including mouse lineages (TSCs, ESCs, XEN, EpiSCs and MEFs42) and published data from human lineages (naïve T cells, T helper 17 cells (Th17) and T regulatory cells (Tregs))151,152.
e. Area under the curve (AUC) and Spearman correlation scores for predicting differential expression classification (top 10% up- or downregulated genes, left) and fold change in expression (right), respectively, between XEN and ESCs using each of the 3D-HiChAT, Promoter-1D and Linear-1D models across various distances from the TSS (5kb-100kb). Each dot represents the average score across all 20 chromosomes using the LOCO approach, while error bars show the standard deviation. For further details see also Supplementary Table 6.
f. Ranked perturbation scores (%) as predicted by in silico perturbations of ~46K E-P pairs in ESC, ~46.7K in TSC and ~53.1K in XEN using the 3D-HiChAT model. The dotted horizontal lines indicate the selected cut-offs for impactful versus non-impactful perturbations, defined as the points on the curves where the slope of the tangent is >1 (blue) or <−1 (red). The latter represent putative functional enhancer-promoter pairs, since in silico perturbation of the enhancers results in reduced predicted gene expression levels.
g. Scatterplot comparing, for each anchor, the predicted perturbation scores from our 3D-HiChAT model with the respective ABC scores. The Spearman correlation coefficient (R) is shown at the top.
h. Boxplots showing that enhancers with high 3D-HiChAT-predicted perturbation scores and low ABC scores (red) are significantly more distal to their target genes (loop size) than those with concordant high scores in both models (blue) (left plot). Similarly, comparison of all enhancers/anchors with either high 3D-HiChAT-predicted perturbation scores (perturbation <−10%, red) or high ABC scores (ABC>0.7) shows that 3D-HiChAT predicts potentially functional enhancers at larger distances (right plot). n indicates the number of anchors used for each comparison. Asterisks indicate significance (p-value <0.001).
Note: all statistics are provided in Supplementary Table 9.