Cell Reports Methods. 2023 Sep 25;3(9):100594. doi: 10.1016/j.crmeth.2023.100594

Predicting patient-specific enhancer-promoter interactions

Brittany Baur, Sushmita Roy
PMCID: PMC10545932  PMID: 37751694

Abstract

Computational methods that can predict hard-to-measure modalities from those that are easier to measure, in a patient-specific manner, play a critical role in personalized medicine. In this issue of Cell Reports Methods, Khurana et al. present differential gene targets of accessible chromatin (DGTAC), an approach which predicts patient-specific enhancer-promoter interactions.



Main text

Individualized patient-specific omics profiling has great potential for developing and delivering precise, tailored therapy and has shown early promise for a number of diseases, such as cancer and infectious diseases.1 A major hurdle in realizing the promise of personalized omic profiling is collecting multiple modalities from the small number of cells typical of patient samples. A key question is whether we need to measure all the modalities or whether we can predict some modalities of individual patients from a small number of informative assays.

In many diseases, such as cancer, mis-regulated gene expression is a driver of aberrant cellular phenotypes. Long-range gene regulation, which determines how distal regulatory elements can control the expression of a gene, has emerged as a major determinant of context-specific gene expression. Rewiring of enhancer-promoter interactions has been shown to be a core feature of cancer samples (Figure 1). Mis-interaction of enhancers and promoters in cancer can be caused by chromatin structural changes or genomic rearrangements that reposition the enhancer close to an oncogene.2 Additional mechanisms may cause activation of enhancers that cause overexpression of oncogenes. Overall, these changes can cause a loss of cell identity that promotes the development of cancer.2 Such long-range interactions are also possible mechanisms by which regulatory variants impact downstream gene expression programs. Defining long-range interactions in a patient-specific manner could significantly advance our understanding of mis-regulated gene expression in diseases, including cancer.

Figure 1.


Examining patient-specific omic profiles is important to provide customized therapy for individual patients

Many diseases are due to mis-regulation of gene expression, and there are different mechanisms by which gene expression can be disrupted between healthy individuals and those with disease. For example, structural rearrangements can break topologically associating domain (TAD) boundaries (shown as red triangles), resulting in promiscuous expression of genes now in the same TAD as an enhancer that was previously compartmentalized in normal cells. Regulatory variants in a distal enhancer can disrupt the binding landscape of an enhancer and impact the expression of the target gene. Finally, chromatin state, defined by histone modifications and accessibility, can change between normal individuals and those with disease, resulting in overexpression or underexpression of the target gene. The outputs from DGTAC can be used to examine these different modes of mis-regulation of gene expression.

Patient silhouette image by pikisuperstar on Freepik.

Long-range interactions can be defined either experimentally3 or computationally.4 However, applying these techniques to patient samples is not straightforward, because they require either a large number of cells to map the interactions experimentally or multiple modalities to predict them reliably. ATAC-seq requires fewer cells and can identify regions of accessible chromatin, a key feature of enhancers.5 However, ATAC-seq alone is not enough to distinguish poised from active enhancers or to identify the gene targets that an enhancer regulates. In this issue of Cell Reports Methods, Khurana and colleagues present differential gene targets of accessible chromatin (DGTAC),6 which offers a possible solution to the problem of identifying long-range interactions between enhancers and genes in cancer patient samples.

DGTAC is a machine-learning method that identifies differential target genes of accessible chromatin in a cell-type-specific manner using only ATAC-seq and RNA-seq. The authors used ATAC-seq and RNA-seq data from 371 patients spanning 22 cancer types and used ElasticNet regression to compute coefficients for the ATAC-seq peaks within 0.5 Mbp of a gene’s transcription start site as predictors of that gene’s expression. The key step is that the authors then convert this fit into a sample-specific error term by comparing the predicted expression for a patient with the observed gene expression. Smaller error terms indicate a stronger association between the peak and gene expression. A similar idea was used by the Joint Effect of Multiple Enhancers (JEME) method from Cao et al.7 In addition to the sample-specific error term, the authors use ATAC-seq signal strength, copy number, and expression of target genes to construct a feature matrix for each patient sample. The authors then train two random forest models to predict enhancer-gene interactions, using Cohesin/CTCF ChIA-PET datasets from two cell lines together with the patient samples that best match them: MCF-7, a breast cancer cell line, for breast invasive carcinoma (BRCA), and HeLa, a cervical cancer cell line, for cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC). With these trained models in hand, the authors generated predictions from ATAC-seq and RNA-seq alone. In addition to achieving high performance in a cross-validation setting, as measured by the area under the precision-recall curve (AUPRC; 92%), the models also generalized to cancer types on which they were not trained. In particular, the BRCA model performed well on the CESC samples, and the CESC model performed well on the BRCA samples. The sample-specific error term derived from the ElasticNet regression was the most important sample-specific feature.
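The first two steps described above, restricting candidate peaks to a window around the transcription start site and turning a fitted regression into a per-sample error term, can be illustrated with a small Python sketch. This is not the DGTAC implementation: the exact form of the error term is not given here, so the sketch assumes it is the absolute difference between predicted and observed expression, and it takes the regression coefficients as already fitted. All peak names, coordinates, coefficients, and expression values are invented for illustration.

```python
from dataclasses import dataclass

WINDOW_BP = 500_000  # 0.5 Mbp on either side of the TSS, as stated in the text


@dataclass(frozen=True)
class Peak:
    name: str
    chrom: str
    start: int
    end: int

    @property
    def midpoint(self) -> int:
        return (self.start + self.end) // 2


def peaks_near_tss(peaks, chrom, tss, window=WINDOW_BP):
    """Keep peaks on the same chromosome whose midpoint is within `window` bp of the TSS."""
    return [p for p in peaks if p.chrom == chrom and abs(p.midpoint - tss) <= window]


def predict_expression(coefs, intercept, accessibility):
    """Linear prediction from already-fitted coefficients (for example, an ElasticNet fit)."""
    return intercept + sum(coefs[name] * accessibility.get(name, 0.0) for name in coefs)


def sample_error(coefs, intercept, accessibility, observed):
    """Assumed error term: absolute gap between predicted and observed expression."""
    return abs(predict_expression(coefs, intercept, accessibility) - observed)


# Toy inputs: every value below is hypothetical.
peaks = [
    Peak("peak_a", "chr1", 800_000, 800_500),
    Peak("peak_b", "chr1", 1_300_000, 1_300_400),
    Peak("peak_c", "chr1", 4_000_000, 4_000_300),  # outside the 0.5 Mbp window
    Peak("peak_d", "chr2", 1_000_000, 1_000_200),  # different chromosome
]
tss = 1_000_000
candidates = peaks_near_tss(peaks, "chr1", tss)

coefs = {"peak_a": 2.0, "peak_b": 0.5}
intercept = 1.0
samples = {
    # sample id: (peak accessibility, observed expression)
    "patient_1": ({"peak_a": 1.0, "peak_b": 2.0}, 4.0),
    "patient_2": ({"peak_a": 1.0, "peak_b": 2.0}, 1.5),
}
errors = {s: sample_error(coefs, intercept, acc, obs) for s, (acc, obs) in samples.items()}

for sample_id, err in errors.items():
    print(sample_id, err)
```

In this toy case the two patients have identical accessibility, so the model predicts the same expression for both. Only patient_1's observed expression matches that prediction, giving it the smaller error term, which under the reading above would indicate the stronger peak-gene association.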

With these predicted interactions in hand, the authors identified “canonical” and “non-canonical” peak-gene interactions and distinguished poised from active enhancers. Canonical peak-gene interactions are the more intuitive case: if the peak is present in a sample, it is connected to its gene target in that sample. For non-canonical interactions, the authors hypothesized that sites where the ATAC-seq peak is always present in patient samples, but is predicted by DGTAC to connect to genes in only a subset of the samples, may be enhancers in a poised state. To show that open chromatin regions associated with predicted target genes are active regulatory elements, the authors devised a set of experiments. They generated predictions in 11 cell lines and showed a significantly higher H3K27ac signal, a hallmark of active enhancers, for peaks with a predicted target gene than for peaks without one. Peaks with a predicted target gene are also enriched in enhancer-like signatures and show a stronger correlation of H3K27ac with DNase. Likewise, predicted poised enhancers are enriched in H3K27me3, a repressive mark.
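The distinction between canonical and non-canonical interactions can be written as a small decision rule. The sketch below is one possible reading of the definitions in the paragraph above, not the authors' code: the function name, the label strings, and the handling of cases the text does not cover (returned as "unclassified") are assumptions.

```python
def classify_interaction(peak_present, link_predicted):
    """Label one peak-gene pair from per-sample booleans.

    peak_present:   dict of sample id -> True if the ATAC-seq peak is called in that sample
    link_predicted: dict of sample id -> True if the peak-gene link is predicted there
    """
    samples = list(peak_present)
    with_peak = [s for s in samples if peak_present[s]]
    linked = [s for s in with_peak if link_predicted.get(s, False)]

    if not linked:
        return "no_interaction"
    if len(linked) == len(with_peak):
        # The link is predicted wherever the peak is present.
        return "canonical"
    if len(with_peak) == len(samples):
        # The peak is present in every sample, but linked in only a subset.
        return "non_canonical"
    # Cases the text does not describe (assumption: leave them unlabeled).
    return "unclassified"


# Hypothetical samples s1..s3
canonical = classify_interaction(
    {"s1": True, "s2": False, "s3": True},
    {"s1": True, "s2": False, "s3": True},
)
non_canonical = classify_interaction(
    {"s1": True, "s2": True, "s3": True},
    {"s1": True, "s2": False, "s3": False},
)
print(canonical, non_canonical)
```

Under this rule, the samples in which a non-canonical peak is present but not linked are the ones where the enhancer would be hypothesized to be poised.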

Beyond characterizing poised and active enhancers, DGTAC is able to identify cancer-specific enhancer-gene connections, often representing novel connections to known cancer genes. For example, the authors showed that 74 peak-gene connections regulate 72 Catalogue of Somatic Mutations in Cancer (COSMIC) cancer genes, with 65 of them being novel connections identified by DGTAC. DGTAC predicted two peaks, enhDistal and enhIntron, that regulate ESR1 in estrogen receptor-positive (ER+) breast cancer subtypes but not in the majority of HER2 or triple-negative subtypes. Only 1% of ER+ cases are explained by altered copy number of ESR1. Remarkably, these two predicted peaks explain the high expression of ESR1 in 93% of ER+ samples. enhDistal represents a canonical interaction, whereas enhIntron is a non-canonical interaction, and both are novel ESR1 enhancers. By applying DGTAC to 8 breast cancer cell lines corresponding to 4 major breast cancer subtypes (LumA, LumB, HER2+, and triple-negative breast cancer [TNBC]), the authors were able to confirm the result from the patient data that the enhancers connect to ESR1 only in ER+ cell lines. Most excitingly, the authors then tested the enhancers using CRISPRi in T-47D (ER+), MCF-7 (ER+), and MDA-MB-231 (ER−) and found that guides targeting enhDistal and enhIntron resulted in decreased expression of ESR1 and reduced cell growth only in the ER+ cell lines.

There are a number of directions in which this line of work can be extended. As mentioned by Khurana and colleagues, some cancer types have too few samples to reliably obtain cancer-specific predictions. In such cases, systematic methods to borrow information from other related samples, including cell line models, could be a direction of future research. Beyond enhancer-gene predictions, the model could also be extended to directly predict counts to avoid additional pre-processing of inputs. Both deep and shallow methods have recently been developed to address the count prediction problem, and these can help us examine other units of long-range gene regulation, such as topologically associating domains (TADs).8,9,10 It would also be interesting to examine how well this model generalizes to additional disease types from patient cohorts. Finally, the explicit modeling of sequence, as done in models such as C.Origami, could be used to more directly predict the impact of sequence variants.11

Acknowledgments

This work was supported by NIH grant R01HG012349 to S.R.

Declaration of interests

The authors declare no competing interests.

References

  • 1. Babu M., Snyder M. Multi-Omics Profiling for Health. Mol. Cell. Proteomics. 2023;22 doi: 10.1016/j.mcpro.2023.100561.
  • 2. Okabe A., Kaneda A. Transcriptional dysregulation by aberrant enhancer activation and rewiring in cancer. Cancer Sci. 2021;112:2081–2088. doi: 10.1111/cas.14884.
  • 3. Kempfer R., Pombo A. Methods for mapping 3D chromosome architecture. Nat. Rev. Genet. 2020;21:207–226. doi: 10.1038/s41576-019-0195-2.
  • 4. Hariprakash J.M., Ferrari F. Computational Biology Solutions to Identify Enhancers-target Gene Pairs. Comput. Struct. Biotechnol. J. 2019;17:821–831. doi: 10.1016/j.csbj.2019.06.012.
  • 5. Buenrostro J.D., Giresi P.G., Zaba L.C., Chang H.Y., Greenleaf W.J. Transposition of native chromatin for multimodal regulatory analysis and personal epigenomics. Nat. Methods. 2013;10:1213–1218. doi: 10.1038/nmeth.2688.
  • 6. Xu D., Forbes A.N., Cohen S., Palladino A., Karadimitriou T., Khurana E. Recapitulation of patient-specific 3D chromatin conformation using machine learning. Cell Reports Methods. 2023;3 doi: 10.1016/j.crmeth.2023.100578.
  • 7. Cao Q., Anyansi C., Hu X., Xu L., Xiong L., Tang W., Mok M.T.S., Cheng C., Fan X., Gerstein M., et al. Reconstruction of enhancer-target networks in 935 samples of human primary cells, tissues and cell lines. Nat. Genet. 2017;49:1428–1436. doi: 10.1038/ng.3950.
  • 8. Zhang S., Chasman D., Knaack S., Roy S. In silico prediction of high-resolution Hi-C interaction matrices. Nat. Commun. 2019;10:5449–5518. doi: 10.1038/s41467-019-13423-8.
  • 9. Hong H., Jiang S., Li H., Du G., Sun Y., Tao H., Quan C., Zhao C., Li R., Li W., et al. DeepHiC: A generative adversarial network for enhancing Hi-C data resolution. PLoS Comput. Biol. 2020;16 doi: 10.1371/journal.pcbi.1007287.
  • 10. Yang R., Das A., Gao V.R., Karbalayghareh A., Noble W.S., Bilmes J.A., Leslie C.S. Epiphany: predicting Hi-C contact maps from 1D epigenomic signals. Genome Biol. 2023;24:134. doi: 10.1186/s13059-023-02934-9.
  • 11. Tan J., Shenker-Tauris N., Rodriguez-Hernaez J., Wang E., Sakellaropoulos T., Boccalatte F., Thandapani P., Skok J., Aifantis I., Fenyö D., et al. Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening. Nat. Biotechnol. 2023;41:1140–1150. doi: 10.1038/s41587-022-01612-8.

Articles from Cell Reports Methods are provided here courtesy of Elsevier
