Abstract
Amplification and overexpression of the SOX2 oncogene represent a hallmark of squamous cancers originating from diverse tissue types. Here, we find that squamous cancers selectively co-amplify with SOX2 a 3′ noncoding region that harbors squamous cancer-specific chromatin-accessible regions. We identify a single enhancer, e1, that predominantly drives SOX2 expression. Repression of e1 in SOX2-high cells causes collapse of the surrounding enhancers, a marked reduction in SOX2 expression, and a global transcriptional change reminiscent of SOX2 knockout. The e1 enhancer is driven by a combination of transcription factors, including SOX2 itself and the AP-1 complex, which facilitates recruitment of the coactivator BRD4. CRISPR-mediated activation of e1 in SOX2-low cells is sufficient to rebuild the e1-SOX2 loop and activate SOX2 expression. Our study shows that squamous cancers selectively amplify a predominant enhancer to drive SOX2 overexpression, uncovering functional links among enhancer activation, chromatin looping, and lineage-specific copy number amplifications of oncogenes.
Subject terms: Epigenomics, Gene regulation, Cancer genomics
SOX2 amplification and overexpression represent a hallmark of squamous cancers, with a distinct distribution of chromatin-accessible regions depending on cancer type. Here, the authors identify a single enhancer, e1, that predominantly drives SOX2 expression in squamous cancer.
Introduction
Lineage-specific oncogenes represent a class of genes that play important roles in the normal development of specific cell lineages, but drive tumorigenesis when dysregulated. Many lineage-specific oncogenes encode transcription factors such as MITF in melanomas1, AR in prostate cancer2, CDX2 in colorectal cancer3, and KLF5 in squamous cancer and colorectal cancer4,5. SOX2, a member of the SRY-box transcription factor family, is well known for its role in the pluripotency of embryonic stem cells (ESCs)6. SOX2 is also essential in maintaining the self-renewal ability of basal cells7, which have been reported as the cell of origin for squamous cancers, the most common type of solid tumors8. Squamous cancer can originate from diverse tissues such as lung (lung squamous cell carcinoma; LUSC), cervix (cervical squamous cell carcinoma; CESC), skin, esophagus (esophageal squamous cell carcinoma; ESSC), and upper digestive tissues in the head and neck (head and neck squamous cell carcinoma; HNSC). Genomic analyses have revealed that the SOX2 gene is widely amplified and overexpressed in squamous cancers, nominating SOX2 as a lineage-specific oncogene9–16. Indeed, previous in vivo studies have shown that Sox2 overexpression, together with the inactivation of tumor suppressors such as Pten or Lkb1, drives the formation of mouse lung squamous cancers8,17,18. In addition to squamous cancer, SOX2 amplification and overexpression have also been reported in glioma19, a common type of brain tumor that includes low-grade glioma (LGG), glioblastoma (GBM), and several other subtypes.
SOX2 overexpression in cancer cells has been largely attributed to copy number amplifications of the SOX2 gene itself9–11,14–16,19. However, our understanding of oncogene copy number amplifications is evolving. We and others have recently shown that noncoding enhancers outside oncogenes such as MYC, MYCN, AR, KLF5, and EGFR are selectively amplified with or without their respective oncogenes5,20–28, demonstrating novel mechanisms that transcriptionally activate oncogenes in diverse cancer types. Therefore, we decided to revisit the SOX2 locus and its associated copy number changes.
Here, we reveal distinct copy number profiles at the SOX2 locus in squamous cancers and gliomas, which correspond to the distribution of lineage-specific potential regulatory elements. Focusing on the noncoding region that is selectively co-amplified with SOX2 in squamous cancers, we discover a single predominant enhancer that is necessary and sufficient for SOX2 activation. Furthermore, we delineate its relationship with the surrounding enhancers, identify its associated transcription factors, reveal its vulnerability to bromodomain protein degradation, and illustrate its impact on 3D chromatin architecture. Our study reveals functional links among enhancer activation, enhancer–promoter interactions, and lineage-specific copy number amplifications in cancer.
Results
Squamous cancers selectively amplify lineage-specific chromatin-accessible noncoding regions adjacent to the SOX2 oncogene
To identify regions that are recurrently amplified in squamous cancers, we applied Genomic Identification of Significant Targets in Cancer (GISTIC)29 to the single-nucleotide polymorphism (SNP)-array-based copy number data of combined squamous cancer samples (LUSC, ESSC, HNSC, and CESC) from The Cancer Genome Atlas (TCGA)11,30,31 (Supplementary Fig. 1a). For comparison, we also analyzed copy number data of gliomas (combined LGG and GBM), another cancer type associated with SOX2 overexpression. The GISTIC focal amplification peaks showed that, although both cancer types significantly amplify the SOX2 gene, they also selectively amplify distinct noncoding regions adjacent to SOX2. Specifically, the squamous cancer peak (chr3:181,415,947–181,719,852) covers SOX2 and a ~290 kb noncoding region 3′ to SOX2, whereas the glioma peak (chr3:181,256,575–181,496,100) covers SOX2 and a ~173 kb noncoding region 5′ to SOX2 (Fig. 1a). Because squamous cancers are known to acquire arm-level or broad amplifications of chromosome 3q, where SOX2 resides32, we considered only focal copy number alterations smaller than 10 Mb. Focusing on samples that harbor focal amplifications of SOX2 (amplitude log2(copy number/2) > 0.1), which correspond to 9% of squamous cancers and 4% of gliomas, we profiled the averaged copy number amplitude surrounding the SOX2 locus (Fig. 1a and Supplementary Fig. 1b). The copy number profiles agree with the GISTIC results, showing cancer type-specific amplifications of the noncoding regions 3′ and 5′ to SOX2 (Fig. 1a and Supplementary Fig. 1c). TCGA squamous cancers and gliomas with SOX2 focal amplifications are associated with higher SOX2 expression compared with samples with non-focal amplifications or without amplifications (Supplementary Fig. 2). SOX2 overlaps with the SOX2-OT noncoding gene (Fig. 1b). 
We found that squamous cancers with SOX2 focal amplification are also associated with higher SOX2-OT expression, which was not observed in gliomas (Supplementary Fig. 2).
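The sample selection and averaged profiling described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the segment representation, the helper names (Segment, is_focal_sox2_amp, select_samples, average_profile), the convention that uncovered positions are copy-neutral, and the hg19 SOX2 coordinates are all assumptions. Only the < 10 Mb focality cutoff and the log2(copy number/2) > 0.1 amplitude threshold come from the text.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One copy number segment; log2_ratio is log2(copy number / 2)."""
    chrom: str
    start: int
    end: int
    log2_ratio: float

# Assumed hg19 coordinates of the SOX2 gene, for illustration only.
SOX2_LOCUS = ("chr3", 181_429_712, 181_432_224)
MAX_FOCAL_BP = 10_000_000   # focal alterations: smaller than 10 Mb (from the text)
MIN_LOG2_RATIO = 0.1        # amplitude threshold (from the text)

def is_focal_sox2_amp(seg: Segment) -> bool:
    """True if a segment is a focal gain (< 10 Mb, log2 ratio > 0.1) overlapping SOX2."""
    chrom, gene_start, gene_end = SOX2_LOCUS
    overlaps = seg.chrom == chrom and seg.start < gene_end and seg.end > gene_start
    is_focal = (seg.end - seg.start) < MAX_FOCAL_BP
    return overlaps and is_focal and seg.log2_ratio > MIN_LOG2_RATIO

def select_samples(samples):
    """Keep tumors (each a list of segments) that carry at least one focal SOX2 gain."""
    return [segs for segs in samples if any(is_focal_sox2_amp(s) for s in segs)]

def average_profile(samples, positions):
    """Mean log2 ratio at each (chrom, pos) across samples.

    A position not covered by any segment of a sample is treated as copy-neutral (0.0).
    """
    if not samples:
        return []
    profile = []
    for chrom, pos in positions:
        values = [
            next((s.log2_ratio for s in segs
                  if s.chrom == chrom and s.start <= pos < s.end), 0.0)
            for segs in samples
        ]
        profile.append(sum(values) / len(values))
    return profile

# Hypothetical tumors: a focal gain, an arm-level gain (excluded), and a weak gain (excluded).
tumor_a = [Segment("chr3", 181_300_000, 181_800_000, 0.8)]
tumor_b = [Segment("chr3", 150_000_000, 198_000_000, 0.5)]
tumor_c = [Segment("chr3", 181_400_000, 181_450_000, 0.05)]
selected = select_samples([tumor_a, tumor_b, tumor_c])
```

Treating uncovered positions as copy-neutral is one simple convention; excluding them from the average would be an equally defensible alternative.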
We hypothesized that the distinct copy number profiles between these cancers may be attributed to a lineage-specific distribution of regulatory elements. Therefore, we analyzed TCGA assay for transposase-accessible chromatin with sequencing (ATAC-seq) data, which profiled genome-wide chromatin accessibility to identify potential regulatory elements in diverse types of human primary tumors33. In squamous cancers, we found multiple chromatin-accessible regions within the 3′ noncoding region that is selectively amplified in squamous cancers (Fig. 1b). These regions exhibit little chromatin accessibility in gliomas, suggestive of a squamous cancer-specific function (Fig. 1b). In contrast, most of the glioma-specific chromatin-accessible regions, as defined by the ATAC-seq signal, are distributed in the 5′ noncoding region that is selectively amplified in gliomas (Fig. 1b). We then calculated cancer specificity Z-scores for each of the TCGA-annotated chromatin-accessible sites by comparing the ATAC-seq signal across all the profiled cancer types, which highlighted the distinct spatial distribution of squamous cancer- and glioma-specific chromatin accessibility (Fig. 1c; examples of highlighted regions are shown in Fig. 1d). Collectively, these data suggest that these two cancer types may selectively amplify lineage-specific regulatory elements together with the SOX2 oncogene (Fig. 1e).
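One way to compute such cancer specificity Z-scores is sketched below. The exact normalization applied to the TCGA ATAC-seq data is not described in this section, so the sketch assumes one normalized accessibility value per cancer type at each site and standardizes it against the mean and sample standard deviation across types. The function name and the example signal values are hypothetical.

```python
from statistics import mean, stdev

def specificity_z_scores(signal_by_type):
    """Standardize one site's accessibility across cancer types.

    signal_by_type maps cancer type -> normalized ATAC-seq signal at the site.
    Returns cancer type -> Z-score; a high positive value marks a site that is
    unusually accessible in that cancer type relative to all profiled types.
    """
    values = list(signal_by_type.values())
    if len(values) < 2:
        raise ValueError("need at least two cancer types")
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return {t: 0.0 for t in signal_by_type}
    return {t: (v - mu) / sd for t, v in signal_by_type.items()}

# Hypothetical signals at one 3' site: high in squamous cancers, low in gliomas.
site = {"LUSC": 9.0, "HNSC": 8.5, "ESSC": 8.0, "CESC": 7.5, "GBM": 1.0, "LGG": 1.2}
z = specificity_z_scores(site)
```

Because each site is standardized independently, Z-scores can be compared across sites to map where squamous- or glioma-specific accessibility concentrates along the locus.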
3D genomics identifies candidate functional SOX2 enhancers in squamous cancer cells
We next sought to interrogate the relationship between the SOX2 gene and the potential regulatory elements. The human genome is organized into a series of insulated neighborhoods, or topologically associating domains, demarcated by CCCTC-binding factor (CTCF) binding, which restrict promoter–enhancer interactions34. We found that the SOX2 promoter resides at the boundary of two adjacent insulated neighborhoods previously identified by CTCF chromatin interaction analysis with paired-end tag sequencing (ChIA-PET)35, suggesting that the SOX2 promoter may have access to regulatory elements in both neighborhoods (Fig. 2a). Indeed, we observed strong CTCF binding sites as well as DNA motifs of other chromatin looping factors such as YY1 (ref. 36) and ZNF143 (ref. 37) adjacent to the SOX2 promoter region (Fig. 2a and Supplementary Fig. 3a), suggesting that the SOX2 promoter region may serve as a docking site for chromatin loops.
Given the high frequency of SOX2 focal amplifications in squamous cancers, in addition to previous in vivo evidence demonstrating the oncogenic significance of SOX2 overexpression in squamous cancer8,17,18, we focused on characterizing the functional importance of the squamous cancer-specific chromatin-accessible regions co-amplified with SOX2. We first aimed to assess their enhancer activity and physical interaction with the SOX2 gene promoter. We analyzed chromatin immunoprecipitation-sequencing (ChIP-seq) data of H3K27ac, a marker of active regulatory elements, in the esophageal squamous cancer cell lines KYSE140, TT, and TE10 and the lung squamous cancer cell lines LK2 and NCI-H520 (refs. 5,38) (Fig. 2b). All of these cell lines are associated with high SOX2 expression and co-amplification of SOX2 and the candidate enhancers (Supplementary Fig. 3b). We found that several of the squamous cancer-specific chromatin-accessible regions exhibit strong and consistent H3K27ac signals (Fig. 2b). In particular, three individual candidate enhancers, which we refer to as e1–e3, form a super-enhancer element (chr3:181,624,870–181,635,218), as defined by strong and condensed enrichment of H3K27ac signal across SOX2-high squamous cancer cell lines (Fig. 2b). The nearby candidate enhancers e4–e5 are also enriched with varying levels of H3K27ac in these cell lines (Fig. 2b), whereas e6–e8 show noticeable H3K27ac signal only in KYSE140 and TE10 cells. The e6–e8 elements reside within ±5 kb of the transcription start site (TSS) of LINC01206, suggesting that they may serve as the promoter or promoter-proximal elements of this noncoding gene (Fig. 2a).
In contrast, little H3K27ac signal was detected at these regions in the LUSC cell line CALU1, which exhibits low SOX2 expression, or in the immortalized normal lung epithelial cell line AALE (ref. 39) (Fig. 2b and Supplementary Fig. 3b). Interestingly, human ESCs, which also express high levels of SOX2, exhibit little H3K27ac signal at these loci40 (Fig. 2b). Previous studies have identified a super-enhancer element that drives Sox2 expression in mouse ESCs41,42. We applied LiftOver43 to identify the mouse genomic regions orthologous to the human squamous cancer-specific enhancers, including the e1–e3 super-enhancer, and found that they are distinct from the reported mouse ESC super-enhancer (Supplementary Fig. 3c). Our findings suggest that this set of candidate enhancers is specific to SOX2-high squamous cancer cells.
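Super-enhancers are commonly called by stitching nearby H3K27ac peaks into larger regions and then ranking the stitched regions by total signal, as in the ROSE algorithm. The sketch below shows only the stitching step, assuming ROSE's default 12.5 kb stitching distance. This section does not state which caller or parameters the study used, and the constituent peak coordinates in the example are hypothetical.

```python
def stitch_enhancers(regions, max_gap=12_500):
    """Merge (start, end) enhancer intervals on one chromosome when the gap
    between consecutive intervals is at most max_gap base pairs.

    Returns (start, end, n_constituents) tuples sorted by start.
    """
    stitched = []
    for start, end in sorted(regions):
        if stitched and start - stitched[-1][1] <= max_gap:
            prev_start, prev_end, n = stitched[-1]
            stitched[-1] = (prev_start, max(prev_end, end), n + 1)
        else:
            stitched.append((start, end, 1))
    return stitched

# Hypothetical constituent peaks: three closely spaced peaks and one distant peak.
peaks = [(181_624_870, 181_626_500), (181_629_000, 181_630_800),
         (181_633_400, 181_635_218), (181_700_000, 181_701_000)]
stitched = stitch_enhancers(peaks)
```

In ROSE, the stitched regions would then be ranked by total H3K27ac signal, and those above a cutoff derived from the ranked-signal curve would be designated super-enhancers. That ranking step is omitted here.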
We then applied H3K27ac HiChIP assays44 to identify significant physical interactions (false discovery rate, FDR < 0.05) between the candidate enhancers and the SOX2 promoter in the SOX2-high esophageal squamous cancer cell lines KYSE140, KYSE70, and TT and the lung squamous cancer cell lines LK2 and NCI-H520 (Fig. 2c and Supplementary Fig. 3d). The results consistently show that, among the enhancers, the super-enhancer constituents e1–e3 have the strongest physical interactions with the SOX2 promoter region. In contrast, these interactions are absent in the LUSC cell line RERFLCAI, which exhibits low SOX2 expression (Fig. 2c). Taken together, these data support e1–e3 as likely functional enhancers of the SOX2 oncogene in squamous cancers.
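HiChIP loop callers typically test each candidate anchor pair and then control the false discovery rate across all tests. This section does not name the caller, so the sketch below shows only a generic Benjamini–Hochberg adjustment followed by the FDR < 0.05 filter. The loop tuples of the form (anchor_a, anchor_b, p_value) are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def significant_loops(loops, alpha=0.05):
    """Keep (anchor_a, anchor_b, p_value) loops whose BH q-value is below alpha."""
    q = benjamini_hochberg([p for _, _, p in loops])
    return [loop for loop, q_value in zip(loops, q) if q_value < alpha]

# Hypothetical enhancer-promoter tests.
loops = [("e1", "SOX2", 0.001), ("e4", "SOX2", 0.2), ("e2", "SOX2", 0.01)]
kept = significant_loops(loops)
```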
SOX2 activation is predominantly driven by a single enhancer in squamous cancer
Focusing on the enhancers e1–e3 within the squamous cancer-specific super-enhancer as well as the adjacent enhancers e4–e5, we sought to interrogate their impact on SOX2 expression. We applied an improved CRISPR interference (CRISPRi) system, which uses a catalytically inactive Cas9 (dCas9) fused to two transcriptional repressors, KRAB and MeCP2 (ref. 45), to inhibit each of the five enhancers in the SOX2-high squamous cancer cell lines KYSE140, LK2, and NCI-H520. ChIP coupled with quantitative PCR (ChIP-qPCR) of dCas9 in KYSE140 cells validated the on-target recruitment of the single guide RNAs (sgRNAs) (Supplementary Fig. 4a). We found that repression of the e1 enhancer, but not of the other four enhancers, consistently resulted in marked reductions (64–75%) in SOX2 expression, suggesting a predominant role of e1 (Fig. 3a). For validation, we included a second sgRNA targeting e1 and performed CRISPRi assays in six SOX2-high squamous cancer cell lines representing three tissue types (ESSC, LUSC, and HNSC). Repression of e1 consistently led to 62–82% reductions in SOX2 expression (averaged over the two sgRNAs) across the six cell lines (Fig. 3b). In addition, immunoblotting revealed clear reductions in SOX2 protein levels upon e1 repression in all six cell lines (Fig. 3c). Previous studies have shown that the proliferation of squamous cancer cells with SOX2 overexpression is dependent on the SOX2 gene9,46. We found that e1 repression significantly reduced the proliferation rate of the SOX2-high squamous cancer cell lines KYSE140, LK2, and NCI-H520 (Fig. 3d). The proliferation-inhibitory phenotype observed in KYSE140 cells was rescued by ectopic SOX2 expression (Supplementary Fig. 4b), indicating that e1 regulates cell proliferation by activating SOX2. 
We also transplanted LK2 cells with and without e1 repression into the flanks of nude mice, which showed that the activity of the e1 enhancer is required for in vivo tumor growth (Fig. 3e).
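Percent reductions such as those quoted above (for example, averaged over two sgRNAs) are typically derived from RT-qPCR data with the 2^−ΔΔCt method. This section does not state the quantification or normalization scheme, so the sketch assumes ΔΔCt normalization to a reference gene and to a non-targeting control sgRNA. The function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """2^-ΔΔCt: target level in the test sample relative to the control sample,
    each normalized to a reference (housekeeping) gene."""
    delta_ct_test = ct_target - ct_ref
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_ct_test - delta_ct_ctrl)

def mean_percent_reduction(relative_levels):
    """Percent reduction averaged over several sgRNAs (e.g. two per enhancer)."""
    mean_level = sum(relative_levels) / len(relative_levels)
    return 100.0 * (1.0 - mean_level)

# Hypothetical Ct values for SOX2 and a reference gene: e1 sgRNA vs non-targeting control.
sg1 = relative_expression(26.0, 20.0, 24.0, 20.0)  # ΔΔCt = 2.0 -> 0.25 of control
sg2 = relative_expression(25.5, 20.0, 24.0, 20.0)  # ΔΔCt = 1.5 -> ~0.354 of control
reduction = mean_percent_reduction([sg1, sg2])     # ~69.8% under these toy inputs
```

The 2^−ΔΔCt method assumes roughly 100% amplification efficiency for both primer pairs. When efficiencies differ, an efficiency-corrected model would be more appropriate.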
The SOX2 gene encodes an SRY-box transcription factor that is involved in both transcriptional activation and repression47. We therefore reasoned that, in addition to reducing SOX2 expression, repression of e1 may also dysregulate SOX2-associated gene expression programs. We first identified SOX2-activated and -repressed genes by performing RNA-seq in ESSC KYSE140 cells with and without CRISPR-mediated SOX2 knockout (Supplementary Fig. 4c). We selected the top 1000 genes activated or repressed by SOX2 (FDR-ranked; FDR < 0.05) and performed Gene Set Enrichment Analysis (GSEA), which showed that e1 repression changed the expression of these genes in a manner highly similar to SOX2 knockout (Fig. 3f). Indeed, e1 repression significantly downregulated the expression of SOX2-activated genes (normalized enrichment score = −2.06, P < 0.001) and upregulated SOX2-repressed genes (normalized enrichment score = 1.85, P < 0.001). Furthermore, the expression of e1-regulated genes (FDR < 0.05; fold change > 1.5) is significantly correlated with SOX2 expression in TCGA squamous cancer samples, suggesting that they are likely SOX2 target genes in human primary tumors (Fig. 3g). Genes activated by e1 are enriched in squamous cancer-related pathways such as MAPK signaling and Hedgehog signaling (Supplementary Table 1). Collectively, these results demonstrate the critical role of the e1 enhancer in SOX2 activation and in SOX2-associated cellular and molecular phenotypes.
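At its core, the GSEA enrichment score is a weighted running sum over a ranked gene list. The sum increases at genes in the set and decreases at all other genes, and the score is its maximum deviation from zero. The sketch below computes that score with the standard weight p = 1. The normalized enrichment scores and P values reported above also require permutation-based normalization, which is not shown, and the toy gene names are hypothetical.

```python
def enrichment_score(ranked_genes, ranked_stats, gene_set, p=1.0):
    """Weighted running-sum enrichment score in the style of GSEA.

    ranked_genes: genes sorted from most up- to most down-regulated;
    ranked_stats: the ranking metric for each gene, in the same order.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a proper, non-empty subset of the ranked list")
    norm_hit = sum(abs(s) ** p for s, h in zip(ranked_stats, hits) if h)
    if norm_hit == 0:
        raise ValueError("gene set members all have a zero ranking metric")
    running, best = 0.0, 0.0
    for s, h in zip(ranked_stats, hits):
        # Step up by the gene's weight if it is in the set, otherwise step down uniformly.
        running += (abs(s) ** p / norm_hit) if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranked list: g1 most upregulated, g4 most downregulated.
genes = ["g1", "g2", "g3", "g4"]
stats = [2.0, 1.0, -1.0, -2.0]
```

A gene set concentrated at the top of the ranking yields a positive score, and one concentrated at the bottom yields a negative score, matching the signs reported for SOX2-repressed and SOX2-activated genes, respectively.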
We then tested whether e1 and the surrounding enhancers directly regulate any genes other than SOX2. We analyzed the HiChIP data in SOX2-high squamous cancer cell lines, focusing on HiChIP anchors that harbor the e1–e8 elements (four anchors in total). We identified four additional candidate coding and noncoding genes, FXR1, ATP11B, SOX2-OT, and LINC01206, whose promoter regions each interact with at least one of the enhancer anchors in two or more of the five tested cell lines (Supplementary Fig. 5a). Among the interacting promoters, the SOX2 promoter has the strongest interactions with these enhancer anchors. We then performed CRISPRi assays in KYSE140 cells to assess the effects of e1–e8 on these candidate genes. In addition to SOX2, e1 repression also decreased SOX2-OT expression (Supplementary Fig. 5b). However, ectopic expression of SOX2, which did not restore endogenous SOX2 expression, rescued the decrease in SOX2-OT expression (Supplementary Fig. 5c). This result, together with our observation of several SOX2 binding sites at or near the SOX2-OT promoter region (Supplementary Fig. 5d), suggests that SOX2-OT is directly regulated by SOX2 rather than by e1. Repression of e6–e8 caused significant reductions in LINC01206 expression (Supplementary Fig. 5b), which, together with the proximity of e6–e8 to the LINC01206 TSS, suggests that they serve as promoter or promoter-proximal elements for this noncoding gene.
Given the predominant role of the e1 enhancer in SOX2 regulation, we sought to examine structural variants targeting e1 in squamous cancers. We downloaded whole-genome sequencing (WGS) data for 113 squamous cancers from the Pan-Cancer Analysis of Whole Genomes (PCAWG) dataset48,49. GISTIC analysis of the segment data validated the focal amplification of the SOX2-e1 locus (Supplementary Fig. 6a). We identified 16 tumor samples with tandem duplications at the SOX2-e1 region (Supplementary Fig. 6b). Duplications in 12 of these cases contain both SOX2 and e1. Interestingly, four tumor samples harbor duplications of only the enhancer region without the SOX2 gene (Supplementary Fig. 6b), reminiscent of our previous findings regarding duplications of MYC and KLF5 enhancers5,28. The presence of tandem duplications of the enhancer region alone further highlights the importance of the e1 enhancer in squamous cancer.
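Classifying tandem duplications by whether they include SOX2, e1, or both reduces to interval containment checks. In this sketch, the SOX2 coordinates are assumed hg19 values. The e1 interval is a stand-in taken from the e1–e3 super-enhancer span given earlier, because the exact e1 boundaries are not listed in this section. Requiring full containment rather than partial overlap is also an assumption, and the duplication coordinates in the example are hypothetical.

```python
# Assumed hg19 coordinates (chr3), for illustration only.
SOX2_GENE = (181_429_712, 181_432_224)
E1_REGION = (181_624_870, 181_635_218)  # stand-in: the e1-e3 super-enhancer span

def covers(dup, region):
    """True if the duplicated interval (start, end) fully contains the region."""
    return dup[0] <= region[0] and dup[1] >= region[1]

def classify_duplication(dup):
    """Label a chr3 tandem duplication by the elements it fully contains."""
    has_gene, has_enhancer = covers(dup, SOX2_GENE), covers(dup, E1_REGION)
    if has_gene and has_enhancer:
        return "SOX2 + e1"
    if has_enhancer:
        return "e1 only"
    if has_gene:
        return "SOX2 only"
    return "neither"
```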
The e1 enhancer drives the activity of the e1–e5 enhancer cluster
We next aimed to assess the functional link between e1 and the surrounding enhancers. Distal enhancers activate target gene expression by recruiting transcriptional coactivators, such as the bromodomain protein BRD4 and the mediator complex, that promote RNA polymerase II (Pol II) elongation50. ChIP-seq of BRD4 showed that e1–e7 are enriched with BRD4 binding in ESSC KYSE140 cells. Repression of e1 decreased BRD4 recruitment not only at e1 but also at e2–e7 (Fig. 4a). H3K27ac HiChIP data showed that e1 physically interacts with the other potential regulatory elements (Supplementary Fig. 7a), suggesting a structural basis for their interdependency. Globally, repression of e1 reduced BRD4 recruitment preferentially at high-confidence SOX2 binding sites (SOX2 ChIP-seq peaks containing SOX motifs) compared with other BRD4 sites in KYSE140 cells (Fig. 4b), likely owing to the reduced abundance of the SOX2 transcription factor. We then performed BRD4 ChIP-seq in three additional squamous cancer cell lines, LK2, NCI-H520, and HSC4, with and without e1 repression. The results consistently show that e1 activity is required for BRD4 recruitment at the surrounding enhancers (Fig. 4c).
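The comparison of BRD4 loss at SOX2-bound sites versus other BRD4 sites can be summarized per site as a log2 fold change. The following sketch assumes matched, depth-normalized BRD4 signal values per peak and a pseudocount of 1. The summary statistic (a difference of medians) and all names are illustrative choices, not the study's stated analysis.

```python
import math
from statistics import median

def log2_fold_changes(control, repressed, pseudocount=1.0):
    """Per-site log2((repressed + pc) / (control + pc)) for matched signal values."""
    if len(control) != len(repressed):
        raise ValueError("control and repressed must be matched per site")
    return [math.log2((r + pseudocount) / (c + pseudocount))
            for c, r in zip(control, repressed)]

def median_shift(sox2_site_lfc, other_site_lfc):
    """Median log2FC at SOX2-bound sites minus the median at other BRD4 sites.

    A negative value means SOX2-bound sites lose relatively more BRD4.
    """
    return median(sox2_site_lfc) - median(other_site_lfc)

# Hypothetical signals: SOX2-bound sites halve after e1 repression; other sites are stable.
sox2_lfc = log2_fold_changes([15.0, 31.0], [7.0, 15.0])
other_lfc = log2_fold_changes([7.0, 20.0], [7.0, 20.0])
```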
In addition to e1, the e2–e7 elements are also enriched with SOX2 binding in KYSE140 cells (Supplementary Fig. 7b), raising the question of whether these enhancers are directly regulated by e1 or by SOX2. To address this, we performed a rescue experiment using a doxycycline-inducible SOX2 expression system in KYSE140 cells with and without e1 repression. We performed BRD4 ChIP-qPCR focusing on e1–e7, as these show significant BRD4 enrichment in KYSE140 cells. Ectopic expression of SOX2 rescued only 27.0–32.4% of BRD4 binding at e2–e3 and 49.5–65.3% of the binding at e4–e5, whereas it fully rescued BRD4 binding at e6–e7 (Supplementary Fig. 7c). These results demonstrate that the enhancers e2–e5, but not e6–e7, are directly dependent on e1 to varying degrees, defining an e1–e5 enhancer cluster.
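The rescue percentages above can be read as the fraction of the BRD4 signal lost upon e1 repression that SOX2 re-expression restores. This section does not give the exact formula, so the sketch below assumes that definition and uses hypothetical enrichment values.

```python
def percent_rescue(control, repressed, rescued):
    """Percentage of the e1-repression-induced loss restored by ectopic SOX2.

    control:   BRD4 enrichment without e1 repression
    repressed: BRD4 enrichment with e1 repression
    rescued:   BRD4 enrichment with e1 repression plus doxycycline-induced SOX2
    """
    lost = control - repressed
    if lost <= 0:
        raise ValueError("no loss of signal to rescue")
    return 100.0 * (rescued - repressed) / lost

# Hypothetical ChIP-qPCR enrichment values (e.g. percent of input).
partial = percent_rescue(control=10.0, repressed=2.0, rescued=4.4)  # about 30%
full = percent_rescue(control=10.0, repressed=2.0, rescued=10.0)    # 100%
```

Under this definition, a site that depends on e1 only through SOX2 would approach 100% rescue, as observed for e6–e7, whereas sites that depend directly on e1 would remain well below 100%.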
A combination of transcription factors including SOX2 itself contributes to the activity of the e1 enhancer
We next sought to identify transcription factors that may contribute to e1 activity. Motif analysis of the e1 enhancer (chromatin-accessible region) identified DNA sequences recognized by multiple transcription factor families (Fig. 4d), most of which lie in regions that are highly conserved across species based on PhastCons scores51. We then applied CRISPR/Cas9 to specifically disrupt the DNA motifs within e1 and assessed the effects on SOX2 expression (as illustrated in Fig. 4d and Supplementary Fig. 8a) in ESSC KYSE140 and LUSC LK2 cells. We observed >25% reductions in SOX2 expression after disruption of DNA motifs recognized by the transcription factor families EHF, STAT, RUNX, SOX, and AP1 in KYSE140 cells and SNAIL, TCF, SOX, and AP1 in LK2 cells. The combinations of functional motifs differ between these two cell lines, likely because they represent two distinct types of squamous cancer: KYSE140 represents the classic SOX2-high and TP63-high squamous cancers, whereas LK2 was recently reported to represent a variant SOX2-high and POU3F2-high squamous cancer type that is enriched with neural signatures38. Despite this subtype difference, the transcriptional regulatory activity of e1 in both cell lines depends on the motifs recognized by SOX2 (SOX family motif) and by the AP1 complex (Fig. 4e), which was previously suggested to act as a SOX2 cofactor46, suggesting a positive feedback loop that activates SOX2 expression. For validation, we performed ChIP-qPCR in KYSE140 cells and showed that both SOX2 and FOSL1, a member of the AP1 complex, bind to the e1 enhancer, and that this binding was disrupted by CRISPR-mediated cutting of their respective motifs (Supplementary Fig. 8b). We also tested several additional SOX motifs in e2–e8 and the SOX2 promoter, which had modest or minimal effects on SOX2 expression (Supplementary Fig. 8c).
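This section does not name the motif analysis tool. As a simplified illustration of how motif occurrences can be located in an enhancer sequence, the sketch below scans both strands with regular-expression consensus patterns for two of the families highlighted above. The consensus strings are textbook approximations assumed here (real analyses score position weight matrices), and the example sequences are hypothetical.

```python
import re

# Simplified consensus sequences, assumed for illustration; real motif analyses score
# position weight matrices (e.g. from JASPAR or HOMER) rather than regular expressions.
CONSENSUS = {
    "AP1": "TGA[CG]TCA",        # TPA response element bound by AP-1 (FOS/JUN family)
    "SOX": "[AT][AT]CAA[AT]G",  # core SOX-binding consensus
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def scan_motifs(seq, motifs=CONSENSUS):
    """Return sorted (motif, start, strand) hits on both strands.

    Starts are 0-based positions on the + strand; overlapping hits are reported.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = []
    for name, pattern in motifs.items():
        finder = re.compile(f"(?=({pattern}))")  # lookahead allows overlapping matches
        for m in finder.finditer(seq):
            hits.append((name, m.start(), "+"))
        for m in finder.finditer(rc):
            length = len(m.group(1))
            # Convert the reverse-strand start back to + strand coordinates.
            hits.append((name, len(seq) - m.start() - length, "-"))
    return sorted(hits)
```

Because the AP-1 site TGACTCA is palindromic, a single site is reported once on each strand. A downstream step would typically collapse such pairs before counting motifs.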
We then applied CRISPR/Cas9 to simultaneously disrupt SOX (2nd), AP1, RUNX, and STAT (2nd) motifs within the e1 enhancer in KYSE140 and SOX (2nd), AP1, SNAIL, and TCF motifs in LK2, which may either alter the nucleotides of the motif sequences or delete DNA fragments covering the motifs. We found that combinatorial cutting of the motifs resulted in more dramatic reductions in SOX2 expression (76% for KYSE140 and 93% for LK2) as compared to individual motif disruptions, suggesting joint effects of the candidate transcription factors (Fig. 4e). Combinatorial disruptions of the motifs also caused reductions of BRD4 binding at not only e1 but also e2–e5 enhancers (Fig. 4f), which is consistent with the aforementioned finding that activity of the entire enhancer cluster is dependent on e1. Although some of the candidate functional motifs were also found in the e2–e5 enhancers, a full collection of the motifs was only observed at e1 (Fig. 4g), suggesting that combinatorial binding of the candidate transcription factors may determine the predominant role of e1 in the enhancer cluster.
The coactivator BRD4 is required for SOX2 activation, but is dispensable for the e1-SOX2 loop
Given the strong binding of the coactivator BRD4 at the e1 enhancer, we reasoned that the activity of e1 and the associated SOX2 overexpression may be sensitive to BRD4 perturbation. To test this hypothesis, we applied the proteolysis targeting chimera (PROTAC) molecule ARV-771, which recruits the von Hippel–Lindau (VHL) E3 ubiquitin ligase to degrade BRD4 (ref. 52). We found that 2 h of 0.5 µM ARV-771 treatment efficiently decreased the BRD4 protein level (Supplementary Fig. 9a) and removed the majority of BRD4 binding at e1 and its surrounding enhancers in KYSE140 cells (Fig. 5a). Indeed, the e1 enhancer ranked as the BRD4-bound regulatory element most sensitive to BRD4 degradation in KYSE140 cells (Fig. 5b). RNA-seq analysis showed that ~93% of SOX2 expression was lost in KYSE140 cells after 6 h of 0.5 µM ARV-771 treatment (Fig. 5c). Comparable reductions in SOX2 expression were observed in five additional SOX2-high squamous cancer cell lines (Fig. 5d), suggesting a common hypersensitivity of SOX2 expression to BRD4 degradation. We also observed >50% reductions in the proliferation of KYSE140 and LK2 cells in response to 2 days of 0.5 µM ARV-771 treatment (Supplementary Fig. 9b).
Although we observed only a ~35% reduction in SOX2 expression after 2 h of 0.5 µM ARV-771 treatment in KYSE140 cells (Fig. 5e), we reasoned that most of the remaining signal may come from SOX2 RNA transcribed before drug treatment. To assess the immediate effect of BRD4 degradation on SOX2 transcription, we labeled newly transcribed (nascent) RNA with 4-thiouridine (4sU), captured it by biotin pulldown, and quantified it by quantitative reverse transcription PCR (RT-qPCR). Two hours of ARV-771 treatment resulted in ~88% loss of nascent SOX2 transcription, demonstrating an acute and marked response of SOX2 transcription to BRD4 degradation (Fig. 5f).
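The reasoning above, that residual steady-state SOX2 RNA reflects transcripts made before treatment, can be made explicit with a simple first-order decay model. Assume transcription drops instantly at t = 0 to a residual fraction r of baseline and the RNA decays with half-life h. The fraction of RNA remaining after t hours is then r + (1 − r)·2^(−t/h). The sketch below is a toy calculation under these assumptions, not a measurement. With the ~88% nascent loss (r ≈ 0.12) and the ~35% steady-state reduction at 2 h as inputs, it shows that the two observations are compatible with a SOX2 RNA half-life of roughly 2.7 h, a value the study did not measure.

```python
import math

def remaining_fraction(t_hours, half_life_hours, residual_transcription):
    """Fraction of baseline RNA left t hours after transcription drops instantly
    to `residual_transcription` (fraction of baseline), assuming first-order decay."""
    decay = 2.0 ** (-t_hours / half_life_hours)
    return residual_transcription + (1.0 - residual_transcription) * decay

def implied_half_life(t_hours, remaining, residual_transcription):
    """Invert remaining_fraction() to find the half-life consistent with the inputs."""
    decay = (remaining - residual_transcription) / (1.0 - residual_transcription)
    if not 0.0 < decay < 1.0:
        raise ValueError("inputs are inconsistent with decay toward the residual level")
    return -t_hours / math.log2(decay)

# Toy inputs from the text: ~88% nascent loss (r = 0.12), ~35% steady-state loss at 2 h.
h = implied_half_life(2.0, 0.65, 0.12)  # roughly 2.7 h under these assumptions
```

The instantaneous-drop assumption is a simplification. BRD4 is degraded progressively over the first hours of treatment, which would make the true half-life shorter than this toy estimate suggests.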
We then assessed whether BRD4 degradation affects the chromatin interaction between the e1 enhancer and the SOX2 promoter. We performed H3K27ac HiChIP in KYSE140 cells after 2 h of dimethyl sulfoxide (DMSO) or 0.5 µM ARV-771 treatment. H3K27ac serves as a suitable bait for HiChIP capture in this experiment, as H3K27ac enrichment at the SOX2-e1 locus was barely affected by 2 h of ARV-771 treatment (Fig. 5a). Surprisingly, despite the dramatic response of e1 activity and SOX2 expression to BRD4 degradation, no appreciable change was observed in the chromatin interaction between e1 and the SOX2 promoter (Fig. 5g), suggesting that the bromodomain protein BRD4 is dispensable for maintaining the chromatin loop (illustrated in Fig. 5h).
Activation of e1 is sufficient to drive SOX2 expression and rebuild the e1-SOX2 chromatin loop
We next investigated whether activation of e1 is sufficient to drive SOX2 expression. We selected two LUSC cell lines, RERFLCAI and SKMES1, and one ESSC cell line, TE1, all of which have low SOX2 expression (Supplementary Fig. 3b). We applied an improved CRISPR activation (CRISPRa) system, which uses MS2 and PP7 RNA stem-loops to bring multiple transcriptional activators, such as VP64, p65, and HSF1, together with the dCas9 protein (ref. 53). We used two separate sgRNAs to recruit the CRISPRa complex to the e1 enhancer. Activation of e1 resulted in 8–146-fold increases in SOX2 expression (averaged over the two sgRNAs) across the tested cell lines, demonstrating that e1 is sufficient for SOX2 activation (Fig. 6a). Immunoblotting showed that e1 activation also increased the SOX2 protein level in SKMES1 cells, to a level comparable to that induced by activation of the SOX2 promoter (Fig. 6b). Consistent with the finding above that SOX2-OT is a target gene of the SOX2 transcription factor, activation of e1 also upregulated SOX2-OT expression in SKMES1 cells (Supplementary Fig. 10). Compared with e1, activation of the e2–e8 elements has modest or minimal effects on SOX2 expression, again highlighting the predominant role of e1. Activation of e6–e8, which lie next to the LINC01206 TSS, resulted in 10–45-fold increases in LINC01206 expression, in agreement with their role as promoter or promoter-proximal elements for this noncoding gene.
RNA-seq analysis revealed that SOX2 was the most significantly upregulated gene (fold change = 162; FDR = 8.8e−111) in SKMES1 cells after e1 activation (Fig. 6c). GSEA showed that e1 activation in SKMES1 cells significantly induced the SOX2-associated transcriptional programs that we had identified from SOX2 knockouts in the SOX2-high KYSE140 cell line. Indeed, e1 activation significantly upregulated SOX2-activated genes (normalized enrichment score = 1.43, P = 0.003) and downregulated SOX2-repressed genes (normalized enrichment score = −1.30, P = 0.009) (Fig. 6d). Finally, we performed H3K27ac HiChIP assays in SKMES1 cells with and without e1 activation, which revealed that e1 activation led to the formation of the e1-SOX2 chromatin loop (Fig. 6e). We also observed increased chromatin interactions between SOX2 and other enhancers surrounding e1, suggestive of a reconstitution of the chromatin architecture at the SOX2 locus. Taken together, we show that activation of e1 is sufficient to bridge the e1 enhancer to the SOX2 promoter, which results in transcriptional activation of the SOX2 oncogene in squamous cancer cells (illustrated in Fig. 6f).
Discussion
We and others have previously shown that overexpression of oncogenes can be driven by copy number amplifications of distal enhancers5,20–28. Here, we show that this phenomenon extends to lineage-specific enhancers in a cancer type-specific manner. Squamous cancers and gliomas selectively amplify enhancers located 3′ and 5′ to the SOX2 gene, respectively, exhibiting a spatial switch of cancer type-specific copy number amplifications. This pattern likely reflects the unique chromatin architecture surrounding the SOX2 gene: (1) glioma- and squamous cancer-specific enhancers are distributed in two adjacent insulated neighborhoods demarcated by CTCF binding; and (2) SOX2 resides right at the boundary of the two neighborhoods, so that it has access to both. A recent study showed that copy number amplifications of oncogenes, including SOX2, may occur through different types of structural events, such as linear tandem duplications, chromosomal rearrangements, or extrachromosomal circular DNA54. We reveal that SOX2 and distal enhancers that are looped to the SOX2 promoter are often co-amplified in squamous cancers, suggesting a common transcriptional regulatory mechanism that may be shared by different structural forms of SOX2 amplification.
Lineage-specific oncogenes are known to be driven by condensed clusters of enhancer elements, namely super-enhancers or stretch enhancers55,56,57, yet the hierarchical structures of most of these enhancer clusters remain largely unknown. We show that the enhancer cluster co-amplified with SOX2 in squamous cancer is predominantly driven by a single enhancer, e1, in line with previous findings of predominant enhancers in other enhancer clusters58,59. Within the SOX2 enhancer cluster, all the remaining enhancers physically interact with e1 and depend on the activity of e1, but individually have a minimal or modest impact on SOX2 expression. It is possible that some of these enhancers are redundant with one another in activating SOX2, such that repression of e1, by collapsing the entire enhancer cluster, eliminates this redundancy. Our work suggests that the predominant role of e1 may be driven by a series of squamous cancer-relevant transcription factors, such as SOX2 itself, AP1, and potential family members of the RUNX, STAT, SNAI, and TCF complexes. Identification of such predominant enhancers and their associated protein complexes will clarify the mechanisms underlying the activation of lineage-specific oncogenes.
Because transcription factors are difficult to target with small molecules, understanding the mechanisms underlying their transcriptional activation may suggest alternative therapeutic strategies. This is particularly important for squamous cancers, which are largely associated with copy number amplifications and transcriptional activation of transcription factor genes such as SOX2, TP63, and KLF5 (refs. 4,5,11–13,60). While the encoded transcription factors are difficult to target therapeutically, enhancer activation may create unique vulnerabilities in cancer cells driven by these oncogenes. We show that the activity of the SOX2 enhancer depends on the SOX2 transcription factor itself, its potential cofactor AP1, and the transcriptional coactivator BRD4, representing a self-regulatory circuit of the kind that is often hypersensitive to transcriptional inhibitors, a vulnerability that has been reported for other oncogenic transcription factors61. We show that PROTAC-mediated BRD4 degradation leads to an acute and dramatic reduction of SOX2 transcription, suggesting an alternative strategy to target squamous cancers with SOX2 activation, although the efficacy and specificity of such treatments require further preclinical investigation.
It remains elusive how enhancer–promoter loops are initiated and maintained. We show that CRISPR-mediated activation of the e1 enhancer is sufficient to rebuild the e1-SOX2 loop, suggesting that enhancer activation is a prerequisite for initiating enhancer–promoter loops. On the other hand, despite the remarkable impact of BRD4 degradation on SOX2 transcription, we find that it has little effect on maintaining the e1-SOX2 chromatin loop. The observation agrees with recent findings of the MYC and BCL2 loci in leukemia cells62. Similar findings have also been reported for the mediator complex, another important transcriptional coactivator63,64. These together suggest that enhancer activation may be dispensable for maintaining enhancer–promoter loops. Previous studies have shown that binding of several transcription factors such as CTCF, ZNF143, and YY1 to promoters or promoter-proximal regions is required for maintaining enhancer–promoter loops36,37,65,66. Indeed, we observed DNA recognition motifs of these transcription factors in the promoter-proximal region of SOX2. Future investigations are needed to determine whether and which enhancer-bound transcription factors play similar roles in promoter–enhancer interactions.
Methods
Cell lines
Squamous cancer cell lines KYSE140, KYSE70, LK2, NCI-H520, HSC4, TE1, TE10, SKMES1, and RERFLCAI were obtained from the Broad Institute Cancer Cell Line Encyclopedia (CCLE) project67,68. The esophageal squamous cancer cell line TT was obtained from the Japanese Collection of Research Bioresources Cell Bank. Cells were grown in RPMI-1640 medium supplemented with 10% fetal bovine serum and 1% penicillin–streptomycin, and tested negative for mycoplasma using the Lonza MycoAlert Detection kit. Cell lines were used for experiments within 3 months of passaging. Cell line identities were verified either by SNP-array-based fingerprinting, as previously described in the CCLE project67,68, or by short tandem repeat analysis.
Analysis of TCGA and PCAWG datasets
The TCGA pan-cancer copy number segment dataset was downloaded from the National Cancer Institute (NCI) Genomic Data Commons data portal (URL: https://gdc.cancer.gov/about-data/publications/pancanatlas). We performed GISTIC2 (ref. 29) analysis on combined glioma data (LGG and GBM) and combined squamous cancer data (LUSC, HNSC, ESSC, and CESC) to call significantly focally amplified regions in the two cancer types. TCGA ATAC-seq data were published by Corces et al.33 and downloaded from the NCI Genomic Data Commons data portal (URL: https://gdc.cancer.gov/about-data/publications/ATACseq-AWG). We used the published bigWig data for display in the Integrative Genomics Viewer, and the normalized ATAC-seq insertion counts across the identified pan-cancer peak set to calculate Z-scores for each cancer type. TCGA-processed RNA-seq data were downloaded from the Firehose GDAC data portal of the Broad Institute (URL: http://gdac.broadinstitute.org/). To correlate the RNA expression of SOX2 with that of genes regulated by the e1 enhancer, we first calculated a Z-score for each tumor based on either the SOX2 expression level (log2(RSEM + 1)) or the summed expression level of e1-regulated genes. A total of 1634 e1-activated and 1391 e1-repressed genes were used for this analysis. These genes were defined by RNA-seq in KYSE140 cells with and without e1 repression (FDR < 0.05; fold change > 1.5). We then used the Z-scores in Spearman's correlation analysis to examine the relationship between the expression of SOX2 and that of e1-regulated genes. Squamous cancer WGS-based copy number segment data and structural variant data were downloaded from the PCAWG data portal (URL: https://dcc.icgc.org/releases/PCAWG/).
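The correlation step can be illustrated with a minimal, standard-library sketch. It makes several assumptions of its own. Z-scores use the population standard deviation across tumors in one cohort. The gene-set score is taken as the sum of log2(RSEM + 1) values. Spearman's rho is computed as the Pearson correlation of average ranks. The toy expression values are hypothetical and are not TCGA data.

```python
import math
from statistics import mean, pstdev


def log2_rsem(value: float) -> float:
    """Transform an RSEM expression value as log2(RSEM + 1)."""
    return math.log2(value + 1.0)


def zscores(values: list[float]) -> list[float]:
    """Standardize values across tumors (population standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy cohort: SOX2 RSEM per tumor, and RSEM of a two-gene set per tumor.
sox2_rsem = [10.0, 250.0, 40.0, 900.0, 5.0]
gene_set_rsem = [[3.0, 8.0], [60.0, 90.0], [12.0, 20.0], [300.0, 150.0], [1.0, 2.0]]

sox2_z = zscores([log2_rsem(v) for v in sox2_rsem])
set_z = zscores([sum(log2_rsem(v) for v in genes) for genes in gene_set_rsem])
rho = spearman(sox2_z, set_z)
print(f"Spearman rho = {rho:.3f}")
```

Because Z-scoring is a monotonic transformation, Spearman's rho on the Z-scores equals rho on the untransformed per-tumor values. The Z-scores mainly serve to put SOX2 and the gene-set scores on a common scale for plotting.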
CRISPR-mediated enhancer repression and activation
For enhancer repression, we first subcloned dCas9-KRAB-MeCP2 (Addgene, 110821) into the BamHI–NheI sites of lenti-Cas9-Blast (Addgene, 52962) to generate a lentiviral dCas9-KRAB-MeCP2 vector. We then infected cells with this vector to stably express the dCas9-KRAB-MeCP2 fusion protein, and selected the infected cells with blasticidin (10 µg/ml) for at least 5 days. Enhancer-targeting sgRNAs were designed close to the summits of ATAC-seq peaks within the SOX2 enhancer cluster. We then infected the dCas9-KRAB-MeCP2 cells with LentiGuide-Puro (Addgene, 52963) carrying either previously published sgRNAs that have no recognition sites in the genome4,28 or sgRNAs targeting the SOX2 enhancers. The infected cells were selected with puromycin (2 µg/ml) for at least 2 days before any molecular or cellular assays. For enhancer activation, cells were first infected with lenti-dCas9-VP64-Blast (Addgene, 61425) and selected with blasticidin (10 µg/ml) for at least 5 days. We then infected the dCas9-VP64 cells with pXPR502 (Addgene, 96923) carrying either a negative control or an enhancer-targeting sgRNA, and selected the cells with puromycin (2 µg/ml) for at least 2 days. All sgRNA sequences are listed in Supplementary Table 2.
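The Methods state only that sgRNAs were designed close to ATAC-seq peak summits. As an illustration, the sketch below uses a hypothetical helper, `candidate_guides`, that enumerates SpCas9 protospacers: 20 nt followed by an NGG PAM, on either strand. It ranks them by the distance between the summit and the position 3 bp from the PAM. dCas9 does not cut, so that position serves only as a reference coordinate for each guide. The helper does no specificity or efficiency scoring, which real guide design would require, and it assumes an uppercase sequence free of N.

```python
# Complement table for A/C/G/T (assumes an uppercase, N-free sequence).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def candidate_guides(seq: str, summit: int, max_distance: int = 100):
    """List SpCas9 protospacers (20 nt + NGG PAM) on both strands of `seq`.

    Returns (distance_to_summit, guide_5to3, strand, ref_index) tuples sorted by
    distance. ref_index is the forward-strand position 3 bp from the PAM, i.e.
    where active Cas9 would cut, used here only to place each guide.
    """
    hits = []
    for i in range(len(seq) - 22):
        # Forward strand: protospacer seq[i:i+20], PAM seq[i+20:i+23] = NGG.
        if seq[i + 21:i + 23] == "GG":
            ref = i + 17
            hits.append((abs(ref - summit), seq[i:i + 20], "+", ref))
        # Reverse strand: PAM reads CCN at seq[i:i+3] on the forward strand and the
        # protospacer is the reverse complement of seq[i+3:i+23].
        if seq[i:i + 2] == "CC":
            ref = i + 6
            hits.append((abs(ref - summit), revcomp(seq[i + 3:i + 23]), "-", ref))
    return sorted(h for h in hits if h[0] <= max_distance)


# Hypothetical 62-bp toy region with a single NGG site; summit placed at index 26.
print(candidate_guides("A" * 30 + "GG" + "A" * 30, summit=26))
```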
Cell proliferation assays
For cell proliferation assays, infected and selected cells were seeded at equal numbers (0.025 or 0.05 million cells) in 6-well plates and counted after 6 days. For phenotype-rescue experiments, we first cloned SOX2 complementary DNA (cDNA) into the doxycycline-inducible expression vector pCW57.1-Puro (Addgene, 41393) and infected dCas9-KRAB-MeCP2 KYSE140 cells with pCW57.1-SOX2-Puro. We then used LentiGuide-Zeo (to avoid overlapping selection markers) to express sgRNAs. Cells were selected with zeocin (800 µg/ml) for 3 days, and equal numbers of cells were then seeded with or without 1 µg/ml doxycycline. The culture medium, with fresh doxycycline where applicable, was replaced every other day until counting.
CRISPR-mediated gene knockouts and DNA motif cutting
We first generated Cas9-expressing cells by infecting cells with lenti-Cas9-Blast (Addgene, 52962). The infected cells were selected with blasticidin (10 µg/ml) for at least 5 days. We then infected the Cas9-expressing cells with LentiGuide-Puro carrying previously published sgRNAs targeting negative control regions (sgAAVS1 and sg-Chr.2-2; refs. 4,69), the coding region of SOX2, or the transcription factor DNA-binding motifs identified in the e1 enhancer. The infected cells were selected with puromycin (2 µg/ml) for at least 2 days before any experiments. All sgRNA sequences are listed in Supplementary Table 2.
ChIP-seq and ChIP-qPCR
ChIP-seq assays were performed as previously described4. Briefly, five million cells were crosslinked with 1% formaldehyde (diluted in 1× phosphate-buffered saline (PBS)). Cells were lysed with Lysis Buffer I (5 mM PIPES pH 8.0, 85 mM KCl, 0.5% NP40) and then with Lysis Buffer II (1× PBS, 1% NP40, 0.5% sodium deoxycholate, 0.1% sodium dodecyl sulfate) supplemented with protease inhibitors. Chromatin extracts were sonicated with a QSonica Q800R (pulse: 30 s on/30 s off; sonication time: 20 min; amplitude: 70%) and immunoprecipitated with antibodies premixed with Dynabeads Protein A and Protein G. Antibodies used were: H3K27ac (Abcam, ab4729, rabbit polyclonal, 4 µg/ChIP), BRD4 (Bethyl, A301-985A100, rabbit polyclonal, 4 µg/ChIP), SOX2 (R&D Systems, AF2018, goat polyclonal, 4 µg/ChIP), CTCF (Cell Signaling, 2899, rabbit polyclonal, 10 µl/ChIP), FOSL1 (Cell Signaling, 5281, rabbit monoclonal, clone D80B4, 10 µl/ChIP), and Cas9 (Diagenode, C15310258, rabbit polyclonal, 4 µg/ChIP). ChIP-seq libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit and NEBNext Multiplex Oligos for Illumina (96 Unique Dual Index Primer Pairs), and sequenced on an Illumina NovaSeq. For ChIP-qPCR assays, we designed primers targeting individual SOX2 enhancers and used sonicated genomic DNA to normalize for differences in primer efficiency. All qPCR primers are listed in Supplementary Table 2.
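The Methods do not give the exact ChIP-qPCR arithmetic. One common way to use sonicated genomic DNA for primer normalization is to express each ChIP Ct relative to the gDNA Ct measured with the same primer pair. This cancels primer-specific offsets, assuming roughly 100% amplification efficiency, i.e., one doubling per cycle. The sketch below shows that calculation. The Ct values and the "negative region" comparison are hypothetical illustrations.

```python
def gdna_normalized(ct_chip: float, ct_gdna: float) -> float:
    """ChIP signal relative to sonicated gDNA for the same primer pair.

    Assumes ~100% PCR efficiency, so a 1-cycle Ct difference is a 2-fold change.
    """
    return 2.0 ** (ct_gdna - ct_chip)


# Hypothetical Ct values per primer pair: (ChIP sample, sonicated gDNA).
ct = {"e1": (26.0, 24.0), "neg_region": (30.0, 25.0)}

signal = {name: gdna_normalized(chip, gdna) for name, (chip, gdna) in ct.items()}
fold_over_negative = signal["e1"] / signal["neg_region"]
print(signal, fold_over_negative)
```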
We used Bowtie2 (ref. 70) to align sequencing reads to the hg19 human genome, Samtools71 to sort and index the aligned reads, and MACS2 (ref. 72) to calculate signal per million reads (SPMR) and to call significant ChIP-seq peaks (q value < 0.05). For super-enhancer analysis, we used the Homer pipeline73 to identify super-enhancers based on H3K27ac ChIP-seq signal. We then used bedtools74 to intersect the super-enhancers called in SOX2-high squamous cancer cell lines, which identified the shared super-enhancer region at the SOX2 locus. To compare enhancers between human and mouse cells, we applied the UCSC LiftOver tool43 to identify mouse genomic regions orthologous to the human squamous cancer SOX2 enhancers and then compared these regions with mouse ESC Sox2 enhancers. To measure the effect of e1 repression on the global BRD4 ChIP-seq signal, we first divided BRD4 binding sites into two groups. These sites were BRD4 ChIP-seq peaks identified in KYSE140 cells with sg-NC#1, and they were grouped by whether or not they overlap high-confidence SOX2 binding sites (SOX2 ChIP-seq peaks containing SOX motifs). We used deepTools75 to plot averaged BRD4 ChIP-seq profiles across these two groups of BRD4 sites in KYSE140 cells with and without e1 repression. To calculate the change in BRD4 ChIP-seq signal at individual enhancers in KYSE140 cells after ARV-771 treatment, we used the UCSC tool “bigWigAverageOverBed”76 to calculate the SPMR value for each BRD4 binding site (BRD4 ChIP-seq peaks identified in KYSE140 cells treated with DMSO). We then performed t tests to compare the values from cells treated with DMSO or ARV-771.
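The grouping of BRD4 sites by overlap with high-confidence SOX2 sites is an interval-overlap operation; the study itself does not say which tool performed it. The sketch below is a minimal, standard-library version. It assumes 0-based, half-open BED-style coordinates and that SOX2 peaks have already been filtered for SOX motifs. The peak coordinates are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# BED-style intervals: (chrom, start, end), 0-based and half-open.


def merge_by_chrom(intervals):
    """Merge overlapping intervals per chromosome into sorted, disjoint (starts, ends)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:  # overlapping or book-ended: extend current block
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def overlaps_any(peak, merged):
    chrom, start, end = peak
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    # Blocks with start < peak end are indices [0, idx). Blocks are disjoint and
    # sorted, so the last of these has the largest end and is the only one to check.
    idx = bisect_left(starts, end)
    return idx > 0 and ends[idx - 1] > start


def split_brd4_sites(brd4_peaks, sox2_sites):
    merged = merge_by_chrom(sox2_sites)
    with_sox2 = [p for p in brd4_peaks if overlaps_any(p, merged)]
    without_sox2 = [p for p in brd4_peaks if not overlaps_any(p, merged)]
    return with_sox2, without_sox2


# Hypothetical peaks.
sox2_sites = [("chr3", 100, 200), ("chr3", 150, 300), ("chr3", 1000, 1100)]
brd4_peaks = [("chr3", 250, 260), ("chr3", 300, 400), ("chr3", 1050, 1060), ("chr1", 100, 200)]
with_sox2, without_sox2 = split_brd4_sites(brd4_peaks, sox2_sites)
```

The peak at chr3:300–400 falls in the "without SOX2" group because half-open intervals that merely touch at position 300 do not overlap.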
RNA-seq and RT-qPCR
Total RNA was extracted using the Zymo Quick-RNA miniprep kit with on-column DNase treatment. mRNA was then purified using the NEBNext Poly(A) mRNA Magnetic Isolation Module. RNA-seq libraries were prepared using the NEBNext Ultra Directional RNA Library Prep Kit and NEBNext Multiplex Oligos for Illumina (96 Unique Dual Index Primers), and sequenced on an Illumina NovaSeq. Sequencing reads were aligned to the hg19 human genome using Bowtie2 (ref. 70). Expression levels (read counts) of each GENCODE gene were quantified with RSEM77. We used the edgeR package78 to normalize read counts and to perform differential expression analysis. Cutoffs of FDR < 0.05 and fold change > 1.5 were applied to identify genes that were significantly down- or upregulated after SOX2 knockout, e1 repression, or e1 activation. To compare gene expression changes caused by SOX2 knockout with those caused by e1 repression or activation, we ranked SOX2-regulated genes by their edgeR FDR values. We selected the top 1000 SOX2-activated and the top 1000 SOX2-repressed genes as gene sets and used them in GSEA to assess whether they were down- or upregulated after e1 repression or activation. For RT-qPCR, extracted RNA was first converted into cDNA with the NEB LunaScript SuperMix kit and then analyzed by real-time PCR using the NEB Luna Universal qPCR Master Mix on a Bio-Rad CFX96 qPCR instrument. qPCR signals were normalized to the internal reference gene ACTB or HPRT1. Primers used for RT-qPCR are listed in Supplementary Table 2.
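The thresholds above can be sketched as plain filters over exported edgeR results. Several things here are assumptions. The rows are hypothetical (gene, logFC, FDR) tuples, with logFC on the log2 scale that edgeR reports. The 1.5-fold cutoff is applied to the absolute log2 fold change. Genes downregulated after SOX2 knockout are treated as "SOX2-activated". The Methods do not say whether the top-1000 gene sets were also required to pass FDR < 0.05, so this sketch ranks all genes.

```python
import math

FDR_CUTOFF = 0.05
FC_CUTOFF = 1.5


def significant(results, fdr_cutoff=FDR_CUTOFF, fc_cutoff=FC_CUTOFF):
    """Split (gene, log2FC, FDR) rows into significantly up- and downregulated genes."""
    lfc_cut = math.log2(fc_cutoff)  # 1.5-fold is ~0.585 on the log2 scale
    up = [g for g, lfc, fdr in results if fdr < fdr_cutoff and lfc > lfc_cut]
    down = [g for g, lfc, fdr in results if fdr < fdr_cutoff and lfc < -lfc_cut]
    return up, down


def knockout_gene_sets(results, n=1000):
    """Top-n SOX2-activated (down after knockout) and -repressed genes, ranked by FDR."""
    ranked = sorted(results, key=lambda row: row[2])  # stable sort keeps ties in input order
    activated = [g for g, lfc, _ in ranked if lfc < 0][:n]
    repressed = [g for g, lfc, _ in ranked if lfc > 0][:n]
    return activated, repressed


# Hypothetical results after SOX2 knockout: (gene, log2 fold change, FDR).
results = [("A", 2.0, 0.001), ("B", -1.0, 0.01), ("C", 0.5, 0.001),
           ("D", -3.0, 0.2), ("E", 0.7, 0.04)]
up, down = significant(results)
activated, repressed = knockout_gene_sets(results, n=1)
```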
HiChIP and loop calling
HiChIP was performed according to the previously published protocol44, with several modifications as previously described4. Briefly, crosslinked chromatin was first digested with the restriction enzyme MboI, filled in with dCTP, dGTP, dTTP, and biotin-labeled dATP, ligated with T4 DNA ligase, and sonicated to an average fragment length of ~1 kb. H3K27ac antibodies (Abcam, ab4729, rabbit polyclonal, 7.5 µg/HiChIP; Active Motif, 39133, rabbit polyclonal, 7.5 µg/HiChIP) were used to capture DNA fragments with potential regulatory activity. Streptavidin C1 magnetic beads were used to enrich successfully ligated DNA fragments. HiChIP libraries were generated with the Illumina Tagment DNA Enzyme and Buffer kit and sequenced on an Illumina NextSeq or NovaSeq.
Sequencing reads were aligned to the hg19 human genome using the HiC-Pro pipeline79. To call chromatin loops, we first generated a union set of H3K27ac sites by merging H3K27ac ChIP-seq binding sites (broad peaks) identified in the SOX2-high squamous cancer cell lines KYSE140, TT, TE10, LK2, and NCI-H520. We then used these sites as “anchors” and counted the number of paired-end tags (PETs) spanning them using the Hichipper pipeline80. For display, we selected significant loops (PETs ≥ 5 and FDR < 0.05) connected to the anchor overlapping the SOX2 transcription start site. For HiChIP data from SKMES1 cells with and without CRISPR-mediated e1 activation, two biological replicates were merged for display.
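MboI recognizes GATC and cuts before the G, leaving a 5′-GATC overhang on each end. Filling in both ends and ligating them blunt duplicates the overhang, so a ligation junction reads GATCGATC. The simplified sketch below detects that junction in a read and keeps the 5′ portion up to the end of the first GATC. HiC-Pro handles chimeric reads in a similar spirit, but this is an illustration, not its implementation, and the reads are hypothetical.

```python
MBOI_SITE = "GATC"
# Fill-in of both MboI overhangs followed by blunt ligation duplicates GATC.
LIGATION_JUNCTION = MBOI_SITE + MBOI_SITE  # "GATCGATC"


def trim_at_junction(read: str):
    """Return (5' fragment, junction_found). The 5' fragment ends with the first GATC."""
    idx = read.find(LIGATION_JUNCTION)
    if idx == -1:
        return read, False
    return read[: idx + len(MBOI_SITE)], True


print(trim_at_junction("AAAAGATCGATCTTTT"))  # hypothetical chimeric read
print(trim_at_junction("AAAATTTT"))          # hypothetical read without a junction
```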
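The loop selection used for display (PETs ≥ 5, FDR < 0.05, one anchor overlapping the SOX2 TSS) amounts to a simple filter. The sketch below assumes hypothetical `Anchor` and `Loop` records with 0-based, half-open anchor coordinates. It does not reproduce the Hichipper output format, and the coordinates are toy values, not real SOX2 positions.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open
    end: int


class Loop(NamedTuple):
    a: Anchor
    b: Anchor
    pets: int
    fdr: float


def contains(anchor: Anchor, chrom: str, pos: int) -> bool:
    return anchor.chrom == chrom and anchor.start <= pos < anchor.end


def loops_at_tss(loops, chrom, tss, min_pets=5, max_fdr=0.05):
    """Keep significant loops (PETs >= min_pets, FDR < max_fdr) with an anchor on the TSS."""
    return [
        lp for lp in loops
        if lp.pets >= min_pets and lp.fdr < max_fdr
        and (contains(lp.a, chrom, tss) or contains(lp.b, chrom, tss))
    ]


# Hypothetical toy coordinates.
tss_anchor = Anchor("chr3", 900, 1100)
distal = Anchor("chr3", 5000, 6000)
loops = [
    Loop(tss_anchor, distal, pets=12, fdr=0.001),                  # kept
    Loop(tss_anchor, distal, pets=4, fdr=0.001),                   # too few PETs
    Loop(distal, tss_anchor, pets=8, fdr=0.2),                     # not significant
    Loop(Anchor("chr3", 2000, 3000), distal, pets=20, fdr=0.001),  # no TSS anchor
    Loop(distal, tss_anchor, pets=5, fdr=0.049),                   # kept
]
selected = loops_at_tss(loops, "chr3", tss=1000)
```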
Motif analysis
We used ATAC-seq signal from TCGA squamous cancer samples and BRD4 ChIP-seq signal from KYSE140 cells to narrow down the DNA coordinates of candidate SOX2 enhancer regions. We then used the FIMO software81 with a threshold of P < 10⁻⁴ to identify transcription factor binding motifs from the JASPAR motif database82 that are present in the SOX2 enhancers.
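To make the P < 10⁻⁴ threshold concrete, the sketch below shows how a motif match score can be converted to a P value. It uses the exact null score distribution of a position weight matrix, computed by dynamic programming over integer-scaled log-odds scores. FIMO follows the same general idea, but this is not FIMO's implementation. In particular, the sketch assumes a uniform background and scans the forward strand only (FIMO scans both strands by default). It also uses a hypothetical 8-bp toy motif rather than a JASPAR matrix and expects A/C/G/T-only input.

```python
import math
from collections import defaultdict

BASES = "ACGT"
BACKGROUND = {b: 0.25 for b in BASES}  # assumption: uniform nucleotide background


def log_odds_matrix(pfm, pseudocount=0.01, scale=1000):
    """Convert a frequency matrix (one {base: prob} per position) to integer log2-odds."""
    matrix = []
    for column in pfm:
        row = {}
        for b in BASES:
            prob = (column[b] + pseudocount) / (1 + 4 * pseudocount)
            row[b] = round(scale * math.log2(prob / BACKGROUND[b]))
        matrix.append(row)
    return matrix


def null_distribution(matrix):
    """Probability of each total score for a random background sequence."""
    dist = {0: 1.0}
    for row in matrix:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for b in BASES:
                nxt[score + row[b]] += prob * BACKGROUND[b]
        dist = dict(nxt)
    return dist


def scan(seq, matrix, p_threshold=1e-4):
    """Return (offset, site, score, p) for forward-strand windows with P < threshold."""
    width = len(matrix)
    dist = null_distribution(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width]
        score = sum(row[b] for row, b in zip(matrix, site))
        p_value = sum(p for s, p in dist.items() if s >= score)
        if p_value < p_threshold:
            hits.append((i, site, score, p_value))
    return hits


# Hypothetical toy motif (not from JASPAR) that strongly prefers CATTGTTC.
consensus = "CATTGTTC"
pfm = [{b: (0.97 if b == c else 0.01) for b in BASES} for c in consensus]
hits = scan("AAAACATTGTTCAAAA", log_odds_matrix(pfm))
print(hits)
```

With a uniform background, only the exact 8-bp consensus passes here. Its P value is 0.25⁸ ≈ 1.5 × 10⁻⁵, whereas any single mismatch gives P = 25/4⁸ ≈ 3.8 × 10⁻⁴.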
4sU labeling of nascent RNA
4sU labeling of nascent RNA was performed as previously described83 with minor modifications. Two million cells were seeded per 10 cm dish one day before labeling. Cells were first treated with 0.5 µM ARV-771 or the same volume of DMSO for 2 h at 37 °C, and then with 200 µM 4sU (Sigma Aldrich, T4509) or the same volume of DMSO for 20 min at 37 °C. Cells from each condition were harvested and processed with TRIzol followed by Zymo Quick-RNA miniprep purification. Twenty micrograms of purified RNA was mixed with 5 μg of MTSEA biotin-XX (Biotium, Cat#90066) in 400 μl of biotinylation buffer (10 mM HEPES pH 7.5, 1 mM EDTA, and 20% dimethylformamide) and incubated for 2 h with rotation in the dark. Free biotin was then removed with the Zymo RNA Clean & Concentrator kit. The RNA was then incubated with 100 µl of Dynabeads MyOne Streptavidin C1 (Thermo Fisher) in 200 µl of biotin binding buffer (10 mM Tris-HCl pH 7.5, 2 mM EDTA, 200 mM NaCl, 0.02% Tween-20) for 1 h with rotation. Beads were washed three times with biotin binding buffer. RNA was eluted sequentially with 5% β-mercaptoethanol for 15 min at room temperature and then with 100% β-mercaptoethanol for 5 min at 50 °C. The combined RNA eluate was purified with the Zymo RNA Clean & Concentrator kit. SOX2 nascent transcription was quantified by RT-qPCR and normalized to HPRT1 nascent transcription. All qPCR primers are listed in Supplementary Table 2.
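The Methods say only that nascent SOX2 signal was normalized to nascent HPRT1. A standard way to express such a comparison between ARV-771 and DMSO is the comparative Ct (2^−ΔΔCt) method, sketched below. That method assumes near-100% and equal amplification efficiencies for both amplicons. The Ct values are hypothetical.

```python
def ddct_fold_change(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative target level (treated vs control) by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference) within each condition
    ddCt = dCt(treated) - dCt(control)
    """
    d_treated = ct_target_treated - ct_ref_treated
    d_control = ct_target_control - ct_ref_control
    return 2.0 ** -(d_treated - d_control)


# Hypothetical Ct values: nascent SOX2 (target) and HPRT1 (reference).
fold = ddct_fold_change(ct_target_treated=25.0, ct_ref_treated=20.0,   # ARV-771
                        ct_target_control=22.0, ct_ref_control=20.0)   # DMSO
print(fold)  # 3-cycle increase in dCt -> 1/8 of control
```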
Immunoblotting
Cells were lysed with NP40 lysis buffer (1% NP40, 150 mM NaCl, and 50 mM Tris-HCl, pH 8.0) supplemented with protease inhibitors and sonicated with a QSonica Q800R (pulse: 30 s on/30 s off; sonication time: 2 min; amplitude: 50%). Protein extracts were denatured with LDS sample buffer (Thermo Fisher) supplemented with 20 mM dithiothreitol, separated on 4–12% NuPAGE Bis-Tris gels (Thermo Fisher), and transferred to nitrocellulose membranes. For immunoblotting, we used primary antibodies against SOX2 (Cell Signaling, 3728, rabbit monoclonal, clone C70B1, 1:1000 dilution), ACTIN (Santa Cruz, sc-47778, mouse monoclonal, clone C4, 1:2500 dilution), and BRD4 (Bethyl, A301-985A100, rabbit polyclonal, 1:1000 dilution). Secondary antibodies were goat anti-rabbit IRDye 800CW (LI-COR, 925-32211, 1:10,000 dilution) and goat anti-mouse IRDye 680CW (LI-COR, 925-68070, 1:10,000 dilution). In this study, we independently validated the specificity of the SOX2 antibody using cells with and without ectopic expression of SOX2 cDNA (Supplementary Fig. 4b) or cells with and without CRISPR-mediated SOX2 knockout (Supplementary Fig. 4c). Immunoblot images were acquired on a LI-COR instrument following the manufacturer’s instructions. For Figs. 3c and 6b and Supplementary Fig. 4b–c, the same membranes were probed with primary antibodies from different species (SOX2: rabbit; ACTIN: mouse) and then with LI-COR fluorescent secondary antibodies. For Supplementary Fig. 9a, the same membrane was cut in half to blot BRD4 and ACTIN separately. Uncropped and unprocessed scans of the immunoblots are provided in the Source Data file.
In vivo xenograft assays
All animal experiments were conducted in accordance with procedures approved by the Institutional Animal Care and Use Committee at the Dana-Farber Cancer Institute, in compliance with NIH guidelines. For subcutaneous implantation, 2.5 million LK2 cells with and without CRISPR-mediated e1 repression were resuspended in 200 μl of a 1:1 mixture of Matrigel and medium and injected into the flanks of female nude mice (6–8 weeks old, Nu/Nu; Jackson Laboratory). All mice were housed in a pathogen-free environment under a 12 h light/12 h dark cycle at 18–21 °C and 40–60% humidity. Mice were examined every 3–4 days, and tumor length and width were measured with calipers. The maximal tumor size permitted by our ethics committee is 2 cm in any dimension, and none of the tumors in our study exceeded this limit. Tumor volume was calculated as (length × width²) × 0.5.
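The caliper formula and the 2 cm size limit can be written directly as two small functions. Measurements are assumed to be in millimetres, and the example values are hypothetical.

```python
MAX_DIMENSION_MM = 20.0  # 2 cm limit in any dimension, as stated above


def tumor_volume(length_mm: float, width_mm: float) -> float:
    """Caliper estimate: (length x width^2) x 0.5, in mm^3."""
    return 0.5 * length_mm * width_mm ** 2


def exceeds_limit(length_mm: float, width_mm: float, limit_mm: float = MAX_DIMENSION_MM) -> bool:
    """True if either measured dimension is above the permitted maximum."""
    return max(length_mm, width_mm) > limit_mm


print(tumor_volume(10.0, 6.0))   # hypothetical tumor: 0.5 * 10 * 36 = 180.0 mm^3
print(exceeds_limit(10.0, 6.0))
```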
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Supplementary information
Acknowledgements
We thank Trudy Oliver (HCI) for useful discussions. S.D.B. is supported by the Canadian Institutes of Health Research (CIHR) and the Cancer Research Society (CRS). S.D.B. is the recipient of a Chercheur-boursier Junior 1 award from the Fonds de Recherche du Québec—Santé (FRQS), the Thomlinson award from McGill University, and the Dr. Ray Chiu Distinguished Scientist in Surgical Research award from the Montreal General Hospital Foundation. X.C. is supported by the National Natural Science Foundation of China (grant numbers: 82073637 and 82122060).
Source data
Author contributions
D.K.A.R. and K.L.M. contributed equally. Y.L., S.D.B., and X.Z. designed the research and wrote the manuscript. Y.L., Z.W., J.Z., D.K.A.R., K.L.M., E.A.-J., and X.Z. performed the biological experiments. Y.L., Z.W., J.Z., Y.Y., A.M.T., A.D.C., S.D.B., and X.Z. performed the computational and statistical analyses. X.Y., K.E.V., J.G., P.S.C., X.C., A.J.B., S.D.B., and X.Z. supervised the research and reviewed and revised the manuscript. K.E.V., J.G., P.S.C., and X.C. provided technical and material support.
Data availability
Publicly available TCGA copy number and ATAC-seq data were downloaded from the NCI Genomic Data Commons data portal (copy number URL: https://gdc.cancer.gov/about-data/publications/pancanatlas; ATAC-seq URL: https://gdc.cancer.gov/about-data/publications/ATACseq-AWG). Publicly available TCGA RNA-seq data were downloaded from the Broad Institute GDAC data portal (URL: http://gdac.broadinstitute.org/). Publicly available PCAWG whole-genome sequencing data were downloaded from the PCAWG data portal (URL: https://dcc.icgc.org/releases/PCAWG/). Publicly available H3K27ac ChIP-seq data used in this study were downloaded from the Gene Expression Omnibus (GEO) series GSE137461 (for LK2, NCI-H520, and CALU1 cells), GSE88976 (for TT and TE10 cells), GSE16256 (for hESC), and GSE31039 (for mESC). The ChIP-seq, HiChIP, and RNA-seq data generated in this study have been deposited in GEO under the series GSE166234. The remaining data are available within the Article, Supplementary information, or Source Data file. Source data are provided with this paper.
Competing interests
E.A.-J. is employed at Recursion Pharmaceuticals. K.E.V. is a cofounder and consultant for Kailos Genetics. A.M.T. receives research funding from Ono Pharmaceutical. A.D.C. receives research funding from Bayer. A.J.B. receives research funding from Bayer, Merck, and Novartis, serves as a consultant to Earli and HelixNano, and is a cofounder of Signet Therapeutics. The other authors declare no competing interests.
Footnotes
Peer review information Nature Communications thanks Yotam Drier and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Yanli Liu, Zhong Wu, Jin Zhou.
Contributor Information
Swneke D. Bailey, Email: swneke.bailey@mcgill.ca
Xiaoyang Zhang, Email: xiaoyang_zhang@fudan.edu.cn
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-021-27055-4.
References
1. Garraway LA, et al. Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma. Nature. 2005;436:117–122. doi: 10.1038/nature03664.
2. Heinlein CA, Chang C. Androgen receptor in prostate cancer. Endocr. Rev. 2004;25:276–308. doi: 10.1210/er.2002-0032.
3. Salari K, et al. CDX2 is an amplified lineage-survival oncogene in colorectal cancer. Proc. Natl Acad. Sci. USA. 2012;109:E3196–E3205. doi: 10.1073/pnas.1206004109.
4. Liu Y, et al. Chromatin looping shapes KLF5-dependent transcriptional programs in human epithelial cancers. Cancer Res. 2020;80:5464–5477. doi: 10.1158/0008-5472.CAN-20-1287.
5. Zhang X, et al. Somatic superenhancer duplications and hotspot mutations lead to oncogenic activation of the KLF5 transcription factor. Cancer Discov. 2018;8:108–125. doi: 10.1158/2159-8290.CD-17-0532.
6. Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006;126:663–676. doi: 10.1016/j.cell.2006.07.024.
7. Que J, Luo X, Schwartz RJ, Hogan BLM. Multiple roles for Sox2 in the developing and adult mouse trachea. Development. 2009;136:1899–1907. doi: 10.1242/dev.034629.
8. Ferone G, et al. SOX2 is the determining oncogenic switch in promoting lung squamous cell carcinoma from different cells of origin. Cancer Cell. 2016;30:519–532. doi: 10.1016/j.ccell.2016.09.001.
9. Bass AJ, et al. SOX2 is an amplified lineage-survival oncogene in lung and esophageal squamous cell carcinomas. Nat. Genet. 2009;41:1238–1242. doi: 10.1038/ng.465.
10. Campbell JD, et al. Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas. Nat. Genet. 2016;48:607–616. doi: 10.1038/ng.3564.
11. Campbell JD, et al. Genomic, pathway network, and immunologic features distinguishing squamous carcinomas. Cell Rep. 2018;23:194–212. doi: 10.1016/j.celrep.2018.03.063.
12. Dotto GP, Rustgi AK. Squamous cell cancers: a unified perspective on biology and genetics. Cancer Cell. 2016;29:622–637. doi: 10.1016/j.ccell.2016.04.004.
13. Guan Y, Wang G, Fails D, Nagarajan P, Ge Y. Unraveling cancer lineage drivers in squamous cell carcinomas. Pharmacol. Ther. 2020;206:107448. doi: 10.1016/j.pharmthera.2019.107448.
14. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489:519–525. doi: 10.1038/nature11404.
15. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature. 2015;517:576–582. doi: 10.1038/nature14129.
16. The Cancer Genome Atlas Research Network. Integrated genomic characterization of oesophageal carcinoma. Nature. 2017;541:169–175. doi: 10.1038/nature20805.
17. Mollaoglu G, et al. The lineage defining transcription factors SOX2 and NKX2-1 determine lung cancer cell fate and shape the tumor immune microenvironment. Immunity. 2018;49:764–779. doi: 10.1016/j.immuni.2018.09.020.
18. Mukhopadhyay A, et al. Sox2 cooperates with Lkb1 loss in a mouse model of squamous cell lung cancer. Cell Rep. 2014;8:40–49. doi: 10.1016/j.celrep.2014.05.036.
19. Brennan CW, et al. The somatic genomic landscape of glioblastoma. Cell. 2013;155:462–477. doi: 10.1016/j.cell.2013.09.034.
20. Helmsauer K, et al. Enhancer hijacking determines extrachromosomal circular MYCN amplicon architecture in neuroblastoma. Nat. Commun. 2020;11:5823. doi: 10.1038/s41467-020-19452-y.
21. Herranz D, et al. A NOTCH1-driven MYC enhancer promotes T cell development, transformation and acute lymphoblastic leukemia. Nat. Med. 2014;20:1130–1137. doi: 10.1038/nm.3665.
22. Morton AR, et al. Functional enhancers shape extrachromosomal oncogene amplifications. Cell. 2019;179:1330–1341. doi: 10.1016/j.cell.2019.10.039.
23. Shi J, et al. Role of SWI/SNF in acute leukemia maintenance and enhancer-mediated Myc regulation. Genes Dev. 2013;27:2648–2662. doi: 10.1101/gad.232710.113.
24. Takeda DY, et al. A somatically acquired enhancer of the androgen receptor is a noncoding driver in advanced prostate cancer. Cell. 2018;174:422–432. doi: 10.1016/j.cell.2018.05.037.
25. Viswanathan SR, et al. Structural alterations driving castration-resistant prostate cancer revealed by linked-read genome sequencing. Cell. 2018;174:433–447. doi: 10.1016/j.cell.2018.05.036.
26. Wu S, et al. Circular ecDNA promotes accessible chromatin and high oncogene expression. Nature. 2019;575:699–703. doi: 10.1038/s41586-019-1763-5.
27. Zhang X, Meyerson M. Illuminating the noncoding genome in cancer. Nat. Cancer. 2020;1:864–872. doi: 10.1038/s43018-020-00114-3.
28. Zhang X, et al. Identification of focally amplified lineage-specific super-enhancers in human epithelial cancers. Nat. Genet. 2016;48:176–182. doi: 10.1038/ng.3470.
29. Mermel CH, et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12:R41. doi: 10.1186/gb-2011-12-4-r41.
30. Bailey MH, et al. Comprehensive characterization of cancer driver genes and mutations. Cell. 2018;173:371–385. doi: 10.1016/j.cell.2018.02.060.
31. Ding L, et al. Perspective on oncogenic processes at the end of the beginning of cancer genomics. Cell. 2018;173:305–320. doi: 10.1016/j.cell.2018.03.033.
32. Taylor AM, et al. Genomic and functional approaches to understanding cancer aneuploidy. Cancer Cell. 2018;33:676–689. doi: 10.1016/j.ccell.2018.03.007.
33. Corces MR, et al. The chromatin accessibility landscape of primary human cancers. Science. 2018;362:eaav1898. doi: 10.1126/science.aav1898.
34. Schoenfelder S, Fraser P. Long-range enhancer-promoter contacts in gene expression control. Nat. Rev. Genet. 2019;20:437–455. doi: 10.1038/s41576-019-0128-0.
35. Hnisz D, et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science. 2016;351:1454–1458. doi: 10.1126/science.aad9024.
36. Weintraub AS, et al. YY1 is a structural regulator of enhancer-promoter loops. Cell. 2017;171:1573–1588. doi: 10.1016/j.cell.2017.11.008.
37. Bailey SD, et al. ZNF143 provides sequence specificity to secure chromatin interactions at gene promoters. Nat. Commun. 2015;6:6186. doi: 10.1038/ncomms7186.
38. Sato T, et al. Epigenomic profiling discovers trans-lineage SOX2 partnerships driving tumor heterogeneity in lung squamous cell carcinoma. Cancer Res. 2019;79:6084–6100. doi: 10.1158/0008-5472.CAN-19-2132.
39. Lundberg AS, et al. Immortalization and transformation of primary human airway epithelial cells by gene transfer. Oncogene. 2002;21:4577–4586. doi: 10.1038/sj.onc.1205550.
40. Roadmap Epigenomics Consortium, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
41. Li Y, et al. CRISPR reveals a distal super-enhancer required for Sox2 expression in mouse embryonic stem cells. PLoS ONE. 2014;9:e114485. doi: 10.1371/journal.pone.0114485.
42. Zhou HY, et al. A Sox2 distal enhancer cluster regulates embryonic stem cell differentiation potential. Genes Dev. 2014;28:2699–2711. doi: 10.1101/gad.248526.114.
43. Kent WJ, et al. The human genome browser at UCSC. Genome Res. 2002;12:996–1006. doi: 10.1101/gr.229102.
44. Mumbach MR, et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods. 2016;13:919–922. doi: 10.1038/nmeth.3999.
45. Yeo NC, et al. An enhanced CRISPR repressor for targeted mammalian gene regulation. Nat. Methods. 2018;15:611–616. doi: 10.1038/s41592-018-0048-5.
46. Watanabe H, et al. SOX2 and p63 colocalize at genetic loci in squamous cell carcinomas. J. Clin. Invest. 2014;124:1636–1645. doi: 10.1172/JCI71545.
47. Sarkar A, Hochedlinger K. The Sox family of transcription factors: versatile regulators of stem and progenitor cell fate. Cell Stem Cell. 2013;12:15–30. doi: 10.1016/j.stem.2012.12.007.
48. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020;578:82–93. doi: 10.1038/s41586-020-1969-6.
49. Rheinbay E, et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature. 2020;578:102–111. doi: 10.1038/s41586-020-1965-x.
50. Chen FX, Smith ER, Shilatifard A. Born to run: control of transcription elongation by RNA polymerase II. Nat. Rev. Mol. Cell Biol. 2018;19:464–478. doi: 10.1038/s41580-018-0010-5.
- 51.Siepel A, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034–1050. doi: 10.1101/gr.3715005.
- 52.Raina K, et al. PROTAC-induced BET protein degradation as a therapy for castration-resistant prostate cancer. Proc. Natl Acad. Sci. USA. 2016;113:7124–7129. doi: 10.1073/pnas.1521738113.
- 53.Sanson KR, et al. Optimized libraries for CRISPR-Cas9 genetic screens with multiple modalities. Nat. Commun. 2018;9:5416. doi: 10.1038/s41467-018-07901-8.
- 54.Kim H, et al. Extrachromosomal DNA is associated with oncogene amplification and poor outcome across multiple cancers. Nat. Genet. 2020;52:891–897. doi: 10.1038/s41588-020-0678-2.
- 55.Hnisz D, et al. Super-enhancers in the control of cell identity and disease. Cell. 2013;155:934–947. doi: 10.1016/j.cell.2013.09.053.
- 56.Lovén J, et al. Selective inhibition of tumor oncogenes by disruption of super-enhancers. Cell. 2013;153:320–334. doi: 10.1016/j.cell.2013.03.036.
- 57.Parker SCJ, et al. Chromatin stretch enhancer states drive cell-specific gene regulation and harbor human disease risk variants. Proc. Natl Acad. Sci. USA. 2013;110:17921–17926. doi: 10.1073/pnas.1317023110.
- 58.Carleton JB, et al. Regulatory sharing between estrogen receptor α bound enhancers. Nucleic Acids Res. 2020;48:6597–6610. doi: 10.1093/nar/gkaa454.
- 59.Huang J, et al. Dissecting super-enhancer hierarchy based on chromatin interactions. Nat. Commun. 2018;9:943. doi: 10.1038/s41467-018-03279-9.
- 60.Jiang Y-Y, et al. TP63, SOX2, and KLF5 establish a core regulatory circuitry that controls epigenetic and transcription patterns in esophageal squamous cell carcinoma cell lines. Gastroenterology. 2020;159:1311–1327. doi: 10.1053/j.gastro.2020.06.050.
- 61.Bradner JE, Hnisz D, Young RA. Transcriptional addiction in cancer. Cell. 2017;168:629–643. doi: 10.1016/j.cell.2016.12.013.
- 62.Crump NT, et al. BET inhibition disrupts transcription but retains enhancer-promoter contact. Nat. Commun. 2021;12:223. doi: 10.1038/s41467-020-20400-z.
- 63.El Khattabi L, et al. A pliable mediator acts as a functional rather than an architectural bridge between promoters and enhancers. Cell. 2019;178:1145–1158. doi: 10.1016/j.cell.2019.07.011.
- 64.Jaeger MG, et al. Selective mediator dependence of cell-type-specifying transcription. Nat. Genet. 2020;52:719–727. doi: 10.1038/s41588-020-0635-0.
- 65.Kubo N, et al. Promoter-proximal CTCF binding promotes distal enhancer-dependent gene activation. Nat. Struct. Mol. Biol. 2021;28:152–161. doi: 10.1038/s41594-020-00539-5.
- 66.Zhou Q, et al. ZNF143 mediates CTCF-bound promoter-enhancer loops required for murine hematopoietic stem and progenitor cell function. Nat. Commun. 2021;12:43. doi: 10.1038/s41467-020-20282-1.
- 67.Barretina J, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–607. doi: 10.1038/nature11003.
- 68.Ghandi M, et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature. 2019;569:503–508. doi: 10.1038/s41586-019-1186-3.
- 69.Chan EM, et al. WRN helicase is a synthetic lethal target in microsatellite unstable cancers. Nature. 2019;568:551–556. doi: 10.1038/s41586-019-1102-x.
- 70.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 71.Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 72.Zhang Y, et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
- 73.Heinz S, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004.
- 74.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 75.Ramírez F, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.
- 76.Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010;26:2204–2207. doi: 10.1093/bioinformatics/btq351.
- 77.Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinform. 2011;12:323. doi: 10.1186/1471-2105-12-323.
- 78.Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.
- 79.Servant N, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. doi: 10.1186/s13059-015-0831-x.
- 80.Lareau CA, Aryee MJ. hichipper: a preprocessing pipeline for calling DNA loops from HiChIP data. Nat. Methods. 2018;15:155–156. doi: 10.1038/nmeth.4583.
- 81.Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064.
- 82.Fornes O, et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2020;48:D87–D92. doi: 10.1093/nar/gkz1001.
- 83.Duffy EE, et al. Tracking distinct RNA populations using efficient and reversible covalent chemistry. Mol. Cell. 2015;59:858–866. doi: 10.1016/j.molcel.2015.07.023.
Data Availability Statement
Publicly available TCGA copy number and ATAC-seq data were downloaded from the NCI Genomic Data Commons data portal (copy number URL: https://gdc.cancer.gov/about-data/publications/pancanatlas; ATAC-seq URL: https://gdc.cancer.gov/about-data/publications/ATACseq-AWG). Publicly available TCGA RNA-seq data were downloaded from the Broad Institute GDAC data portal (URL: http://gdac.broadinstitute.org/). Publicly available PCAWG whole-genome sequencing data were downloaded from the PCAWG data portal (URL: http://gdac.broadinstitute.org/). Publicly available H3K27ac ChIP-seq data used in this study were downloaded from the Gene Expression Omnibus (GEO) series GSE137461 (LK2, NCI-H520, and CALU1 cells), GSE88976 (TT and TE10 cells), GSE16256 (hESC), and GSE31039 (mESC). The ChIP-seq, HiChIP, and RNA-seq data generated in this study have been deposited in GEO under series GSE166234. The remaining data are available within the Article, Supplementary Information, or Source Data file. Source data are provided with this paper.