Abstract
The human genome contains regulatory elements, such as enhancers, that are often rewired by cancer cells for the activation of genes that promote tumorigenesis and resistance to therapy. This is especially true for cancers that have little or no known driver mutations within protein coding genes, such as ovarian cancer. Herein, we utilize an integrated set of genomic and epigenomic datasets to identify clinically relevant super-enhancers that are preferentially amplified in ovarian cancer patients. We systematically probe the top 86 super-enhancers, using CRISPR-interference and CRISPR-deletion assays coupled to RNA-sequencing, to nominate two salient super-enhancers that drive proliferation and migration of cancer cells. Utilizing Hi-C, we construct chromatin interaction maps that enable the annotation of direct target genes for these super-enhancers and confirm their activity specifically within the cancer cell compartment of human tumors using single-cell genomics data. Together, our multi-omic approach examines a number of fundamental questions about how regulatory information encoded into super-enhancers drives gene expression networks that underlie the biology of ovarian cancer.
Subject terms: Gene regulatory networks, Cancer genomics, Transcriptomics, Gene regulation, Ovarian cancer
Super-enhancers and their associated transcription factor networks have been shown to influence ovarian cancer biology. Here, based on an integrated set of genomic and epigenomic datasets, the authors identify clinically relevant super-enhancers amplified in ovarian cancer patients and functionally validate their activity.
Introduction
Ovarian cancer is one of the deadliest cancers among women worldwide and is the leading cause of gynecologic-related cancer deaths in the US1. High-grade serous ovarian cancer (HGSOC) is the most common subtype (~80% of all ovarian cancer) and is characterized by a high number of copy number alterations and few driver mutations, which is thought to account for the clinical aggressiveness of this disease as well as the eventual development of chemoresistance2,3. The most commonly seen mutation in HGSOC is p53 (>90% of cases), followed by a low, but statistically significant, prevalence of recurrent somatic mutations in NF-1, BRCA1/2, and CDK2, which often lead to genomic instability4–6. Due to this genomic instability, ovarian cancer has a high rate of copy number abnormalities and recent studies have shown that these alterations can be used to stratify HGSOC2. However, the paucity of known driver mutations for ovarian cancer has made it difficult to develop effective targeted therapies. Consequently, the standard of care remains cytoreductive surgery followed by carboplatin/taxane chemotherapy, with ~75% of patients experiencing a recurrence2,3. Thus, additional analysis of the non-coding regions of the genome, that extends beyond gene profiling, is desperately needed.
Mounting evidence suggests that regulatory elements, such as transcriptional enhancers, can be rewired or hijacked by cancer cells for the activation of genes that promote tumor formation, metastasis, and resistance to therapy7–9. This is especially true for cancers that have little or no known driver mutations within protein-coding genes, such as ovarian cancer10. Enhancers are non-coding DNA elements that contain information for the binding of transcription factors and interact spatially with their target genes to orchestrate spatiotemporal patterns of gene expression11,12. It is estimated that there are hundreds of thousands of enhancers found throughout our genome and these can act independent of orientation and linear distance from their target genes, forming high-order chromatin loops with their target genes. Of note, the activity of enhancers is often restricted to a particular cell type or to specific physiological or pathological conditions, enabling their genomic function to determine precisely when, where, and at what level each of our genes is expressed13–15. Large clusters of neighboring enhancers that have unusually high occupancy of interacting factors are typically called super-enhancers (SEs)16. These super-enhancers are known to regulate key cell identity genes, and in cancer are known to drive oncogene expression17.
The high transcriptional output of cancer cells is thought to be sustained by the activity of super-enhancers, suggesting cancer cells can become addicted to super-enhancer-driven regulatory networks18. Furthermore, recent studies in ovarian cancer have demonstrated the capacity of super-enhancers and their associated networks of transcription factors to directly influence chemoresistance19,20. The molecular characteristics and high activity of super-enhancers make them exquisitely sensitive to epigenetic drugs, more so than typical enhancers21. Thus, there is a growing belief that exploiting transcriptional dependence by targeting oncogenic super-enhancers may be a valid therapeutic avenue21. For example, the bromodomain-containing protein 4 (BRD4) is a druggable transcription factor that recognizes acetylated histone proteins and is found in large quantities at super-enhancers22,23. Small molecule inhibition of BRD4 (such as JQ1 and BET inhibitors) has been shown to reduce cell proliferation and survival in vivo as well as increase therapeutic sensitivity of several cancer types, leading to the development of several clinical trials20,23,24. However, despite their effectiveness in inhibiting oncogenic processes in ovarian cancer cells, anti-BRD4 agonists remain a poor therapeutic option due to their overall toxicity and delivery constraints25. Nevertheless, the study of BRD4-associated super-enhancers in ovarian cancer may lead to the identification of biomarkers, downstream druggable targets, and a better understanding of the regulatory processes that drive this disease.
To this end, the studies described herein examine several fundamental questions about how regulatory information is encoded into super-enhancers, how they are preferentially amplified in ovarian cancer cells, and how they drive gene expression networks that underlie the biology of ovarian cancer cells. We use an integrated genomic and computational framework to (1) identify BRD4-enriched and copy number amplified super-enhancers in ovarian cancer patients, (2) systematically probe the functions of the top 86 ovarian cancer-enriched super-enhancers using CRISPR interference assays (CRISPRi) (dCas9-KRAB) coupled to RNA-seq, (3) validate their roles in driving the proliferation and migration of cancer cells via CRISPR-knockouts, (4) annotate direct target genes using chromatin looping information via Hi-C, and (5) confirm their activity specifically within the cancer cell compartment of human tumors using single-cell genomics data.
Results
Identification of BRD4-enriched super-enhancers in ovarian cancer
Super-enhancers are one of the most salient regulatory elements in the genome and are known to be repurposed by cancer cells to drive the expression of oncogenes16,26. Due to the unusually high levels of interacting transcription factors and the prominence of their target genes, super-enhancers contain untapped potential that can lead to a new set of markers with diagnostic and prognostic potential, or even serve as tractable targets for therapeutic intervention21,27. To identify enhancers likely to be associated with oncogenic gene expression programs, we leveraged both ovarian cancer cell line epigenetic data and patient tumor RNA-seq and copy number data from The Cancer Genome Atlas (TCGA)10 (Fig. 1a).
First, we used existing ChIP-seq data in the well-vetted high-grade serous ovarian cancer cell line OVCAR3 to identify active enhancers by searching for co-localization of the histone modification histone H3 lysine 27 acetylation (H3K27ac) and BRD4 (Fig. 1f)28–30. BRD4 enrichment was considered a critical component for the detection of potentially oncogenic enhancers due to key observations previously shown in ovarian cancer patients23. Namely, across the entirety of the TCGA Pan-Cancer dataset, ovarian cancer patients have the highest rate of genetic amplifications at the BRD4 locus, with ~11% of patients having an amplification of this region (Fig. 1b)5–7,31. Moreover, ovarian cancer has the highest overall expression of BRD4 across all TCGA cancer types and patients with increased expression of BRD4 experienced significantly reduced survival times as determined through Kaplan–Meier analysis (Fig. 1c, d)6,31–35. Therefore, we defined active enhancers as intergenic regions that contained at least a 1-base pair overlap between statistically significant BRD4 peaks and H3K27ac peaks called by the MACS2 peak calling algorithm (Fig. 1e)36. To focus on distal enhancer elements, any peaks that overlapped with annotated genes or promoter regions were removed. This pipeline identified 12,339 BRD4-enriched active enhancer elements in ovarian cancer cells. To determine if these enhancers are lineage-specific or extensible to other cancer types, we investigated the overlap with existing enhancer annotations across all normal tissues (defined by the ENCODE consortium), and across existing annotations in other cancer types (Supplementary Fig. 1c, d)16,37. In addition to this general analysis, we also specifically compared our 12,339 enhancer predictions to previously annotated enhancers in two distinct fallopian tube secretory epithelial cell lines (FTSEC)38, currently thought to be the most common precursor cell of origin for HGSOC (Supplementary Fig. 1b)39,40. We found that 44.1% of the 12,339 BRD4-enriched enhancers had at least 1-base pair overlap with active enhancers in normal tissues and this number increases to 73.6% when compared to active enhancers across several cancer types. Of particular interest, we also identified ~6000 enhancers in HGSOC cells that are not active in normal FTSEC cells (Supplementary Fig. 1b). The aforementioned importance of BRD4 and the high degree of overlap between the enhancers identified in this study with previously annotated enhancers in cancer cells gave us confidence for using these data for calling super-enhancers.
From our pool of 12,339 constituent enhancers, we identified 126 super-enhancer regions using the rank ordering of super-enhancers (ROSE) algorithm (Fig. 1g, h, and Supplementary Data 1–4)41,42. To determine if these BRD4-enriched super-enhancers are relevant to ovarian cancer patients, we leveraged the assay for transposase-accessible chromatin at single-cell resolution (scATAC-seq) data generated from HGSOC patients to measure the activity of the super-enhancers within these tumors43. We detected the activity (defined by chromatin accessibility) of 121 out of 126 (96%) super-enhancers in the cancer cell fraction of HGSOC patients (Supplementary Fig. 1a). Taken together, these data suggest that the super-enhancers identified using our pipeline are not cell line specific and may be relevant to both ovarian cancer and other cancer types. To further investigate the clinical utility of these SEs, we next looked for evidence in patient tumors using both TCGA RNA-seq and copy number variation data.
Copy Number Variation and expression Quantitative Trait Loci (CNVeQTL) analysis nominates potentially oncogenic super-enhancers
Given that copy number variation (CNV) has been previously identified as an important hallmark of ovarian cancer, we sought to investigate whether these BRD4-enriched super-enhancers were preferentially amplified in ovarian cancer patients2. To this end, we performed a computational experiment making use of publicly available copy number variation data across ~600 ovarian cancer patients10 to compare the copy number amplification values overlapping our SE regions to the amplification across the ovarian cancer genome as a whole, by both random-draw (pseudo-bootstrap) and direct comparison analyses (Fig. 2a). Copy number variation values across ~600 ovarian cancer patients were quantified by dividing the genome into uniform 15 kb sliding windows and assigning CNV segment values within each window (Fig. 2d). We then compared the amplification of the windows that overlap the SEs against an equivalent number of randomly drawn windows across the ovarian cancer genome (inclusive of our SE regions). The random drawing of windows was iterated 10,000 times and, in each comparison, there was significant enrichment in amplification of the SE overlapping windows compared to the random groups (Fig. 2c). This observation was reinforced by comparing SE CNV to the CNV across the ovarian cancer genome as a whole (Fig. 2e). Remarkably, amplification of the super-enhancers themselves was prognostic of clinical outcome44. In many cases, patients with increased copy numbers had significantly increased hazard ratio and reduced survival times, suggesting that super-enhancer copy numbers may be of prognostic value (Fig. 2b). Taken together, these data suggest that the SEs we identified in OVCAR3 cells are preferentially amplified in ovarian cancer patients and that some SE amplifications are associated with reduced survival.
To better understand how amplification of these SEs are associated with oncogenic gene expression networks, we leveraged the RNA-seq data generated from a subset of the same ovarian cancer patients (~300) to link the SEs to gene expression. We took inspiration from a commonly used approach in complex genetics which associates nucleotide variants to changes in gene expression called eQTL analysis45,46. However, unlike eQTL analysis which focuses on point mutations, the comparison, in this case, focuses on changes in copy number across SE loci to changes in gene expression within each patient (copy number variation expression quantitative trait loci (CNVeQTL)) (Supplementary Fig. 2a). The assumption is that amplification or deletion of SE regions should affect their target genes, therefore, looking across hundreds of patients for shared patterns of variation will identify putative target genes of each SE. However, since we altered the input data of the eQTL detection software to utilize two quantitative variables (copy number and gene expression), we needed to determine a robust indication of our null condition for statistical analysis.
To generate the null dataset, we broke the linkage of RNA to copy number by randomly permuting the columns of the RNA data matrix and then running Matrix eQTL46 on this permutated dataset, repeating this process 100k times, and using the median distribution from all 100k trials to inform our experimental analysis. Importantly, all 100k runs using the permutated null data showed a relatively uniform distribution of p-values across the null condition, suggesting no meaningful relationship between copy number and gene expression, and returned a similar count of total significant CNVeQTLs (the median number of CNVeQTLs across all 100k was 11,632) (Supplementary Fig. 2b). In contrast, the results from the true data show a much sharper peak around p-value = 0 and returned a much larger number of significant CNVeQTLs (n = 126,438) (Supplementary Fig. 2c and Supplementary Data 5). We used the results of the 100k null experiments to determine an empirical false discovery rate47 of about 0.092. This data also allowed us to investigate some higher-order questions, such as whether the number of CNVeQTL detected was strictly a function of size. While there was a modest linear relationship between these features, this analysis suggested something other than genomic size influenced the number of CNVeQTL (Supplementary Fig. 2d). Collectively, these data suggest that amplification of the super-enhancer regions is associated with pervasive gene expression changes in human tumors, reinforcing the idea they are not merely cell line specific, and they may be preferentially amplified for a biologically meaningful reason.
We recognize that the identification of 126,438 CNVeQTL linkages across 126 super-enhancers seems high, despite the null distributions tested, and that the vast majority of copy number amplifications will have very strong effects in cis (and most will have effects in trans) irrespective of their designation as a super-enhancer. Therefore, to functionally validate and assess the full scope of this data, we chose the top 86 super-enhancers ranked by BRD4 enrichment and H3K27ac signal (which were located both above and below the CNVeQTL prediction line) to perturb using a high throughput CRISPRi screen (Supplementary Fig. 2d).
High throughput CRISPR-interference screen highlights super-enhancer target gene relationships
To systematically probe the functions of each SE and determine the consequences on gene expression, we used high-throughput CRISPR-interference assays coupled to RNA-seq. For this experiment, we engineered OVCAR3 cells to stably express nuclease deficient Cas9 fused to the KRAB effector domain (dCas9-KRAB). The KRAB effector domain induces local chromatin repression via methylation of histone 3 lysine 9 (H3K9me3) and, when fused to dCas9, allows us to use the programmable properties of CRISPR to target and inhibit any genomic loci of interest (Fig. 3a)48–51. For this experiment, each well received a different set of custom-designed guide RNAs (sgRNAs) to specifically inhibit one SE per well (i.e. arrayed CRISPRi screen) (Fig. 3b and Supplementary Data 1). A total of 86 super-enhancers were tested plus 10 control wells. Two different sgRNAs, targeting the two highest BRD4 peak summits within each super-enhancer, were designed for each SE (see Methods)52. For negative controls, we used a non-targeting scrambled sgRNA in addition to a sgRNA designed to target a dormant region of the genome (Supplementary Data 1). After 72 h of epigenetic silencing, RNA was purified from each well and barcoded to specifically track which super-enhancer was probed per well (96 total barcodes). The RNA was prepped and sequenced on an Illumina platform to measure changes in gene expression as a consequence of super-enhancer inhibition (Fig. 3b).
Given our intent to survey as many super-enhancers as possible and to increase the chances of finding those that exhibited the most profound effects on gene expression, we decided to probe each SE once within the 96-well setup, prioritizing breadth over the inclusion of replicates (Fig. 3b). Therefore, a traditional differential gene expression analysis pipeline (requiring the use of replicates) had to be eschewed in favor of something better able to handle our experimental setup. We took inspiration from previous analyses performed on large-scale perturbation databases, such as the Connectivity Map project (CMap)53, and chose to focus on relative changes in rank for each gene (uprank or downrank) rather than traditional differential gene expression analysis or absolute expression counts. The resulting changes in rank could then be investigated across the entire dataset by iterating through a series of rank change cutoffs, identifying super-enhancers that affected significantly more genes at a particular cutoff as compared to the negative control wells (based on an empirical false discovery rate of 0.1) (see the “Methods” section). Any genes detected at these thresholds could then be tentatively assigned as target genes to each SE (Fig. 3 and Supplementary Data 6). To investigate whether a traditional relative expression approach would have identified similar target genes, we determined the log2-fold change of every gene for each SE relative to the controls. We then assessed the relationship between gene expression determined by the relative change in rank and relative expression for each SE. In every case, the correlation between log2-fold change (LFC) and rank change (RC) was highest when comparing each SE to itself, as opposed to all other SEs on the screen, suggesting that differential gene expression calculated in both ways gave similar results (Supplementary Fig. 3a). Notably, some of the correlations were much stronger than others, leading us to focus on SEs with an LFC versus RC correlation value above the mean. Of particular interest was super-enhancer 14 (SE14) which exhibited an LFC vs. RC correlation value of 0.95 (the highest in the entire dataset), suggesting particularly robust results for this SE (Supplementary Fig. 3b). Confident that our rank change approach was adequately supported by this comparison, we proceeded to look for other SEs that exhibited profound effects on gene expression.
First, we focused on a number of summary analyses from the CRISPRi screen. The median number of genes downregulated by each SE was four and there were a few salient SEs that affected a much larger number of genes (Fig. 3c). Interestingly, there was only a weak correlation between the number of differentially regulated genes and SE size (Fig. 3e) or enrichment of H3K27ac (Fig. 3f), suggesting that the effects on gene expression are not merely a function of size. Of note, super-enhancer 60 (SE60) was in the bottom half in terms of size, but it affected the greatest number of genes. Therefore, we felt it prudent to understand the specificity of our CRISPRi targeting process and empirically determine the extent of spreading of the repressive H3K9me3 mark upon dCas9-KRAB binding. To this end, we performed H3K9me3 ChIP-seq in ovarian cancer cells transfected with SE60 targeting sgRNAs versus non-targeting sgRNAs. Differential binding analysis revealed that only our region of interest (SE60) was significantly enriched for the H3K9me3 signal upon transfection of the targeting sgRNAs but not the scramble non-targeting sgRNA (Supplementary Fig. 4c, d). Additionally, there was an increase in the H3K9me3 signal at each of our two SE60 sgRNA locations, suggesting both guides successfully delivered dCas9-KRAB to the SE target sites (Supplementary Fig. 4a, b). We found that the region with increased H3K9me3 was about 20 kb, spreading ~10 kb from each sgRNA target site, enough to cover the entire SE. There also did not appear to be an increase in signal at the computationally predicted off-target sites, suggesting that the guides for SE60 were highly specific (Supplementary Fig. 4e, f). Taken together, these results validate our method for designing sgRNAs and reinforce that the observations from the screen (specifically for SE60) were not due to off-target effects (Supplementary Fig. 4).
Having supported the validity of our CRISPRi assay, we next wanted to examine the patterns of gene expression that resulted from the screen. To accomplish this, we utilized two clustering methods, K-means clustering for the target genes and unsupervised-hierarchical clustering for the SEs. We found that the differentially regulated genes could be divided into three optimal clusters that represent distinct gene expression pathways in cancer cells (Fig. 3d, g). Conversely, the SEs can be divided into 10 distinct clusters with shared patterns of gene expression (Supplementary Data 1). More specifically, CRISPRi targeting of the SEs in clusters 2–4 (containing SE14 and SE60) caused decreases in the expression of genes enriched for pathways such as KRAS signaling, estrogen response (both early and late), and epithelial to mesenchymal transition (EMT). In contrast, SEs in clusters 5–10 maintain some similarities (KRAS and early estrogen response) but also have a unique role in the regulation of the JAK–STAT pathway and immune-related pathways (Fig. 3g). Taken together, the CRISPRi screen in conjunction with our CNV analyses have allowed us to comprehensively determine which SEs have the most profound effects on gene expression and inform us of the enhancers that likely regulate key gene pathways in ovarian cancer. Based on these results, two salient SEs, SE60 and SE14, were selected for follow-up experiments.
Deletion of SE60 and SE14 causes dysregulation of oncogenic gene expression pathways leading to reduced proliferation and migration of cancer cells
Perturbation of SE60 affected the greatest number of genes in the CRISPRi screen. In addition, amplification of this SE in ovarian cancer patients is prognostic of worse patient outcomes, nominating it as a super-enhancer that is associated with oncogenic processes (Fig. 4a, b). Therefore, we wanted to experimentally determine whether SE60 drives critical gene expression programs in ovarian cancer. To that end, we designed sgRNAs flanking the BRD4 peak summit of the largest constituent enhancer within SE60 and generated three independent CRISPR-Knockout (KO) clones resulting from ~1700 to 1800 bp deletions (Fig. 4c and Supplementary Fig. 5).
Global changes in gene expression resulting from each SE60 KO clone were measured using RNA-seq. Differential expression analysis using DESeq254 revealed pervasive changes in gene expression with 660 genes being detected as significantly downregulated and 1090 genes being upregulated at a strict confidence threshold (adjusted p-value of 0.0005) (Fig. 4d, e, and Supplementary Data 7). Pathway analysis of the top 100 significantly downregulated genes (determined by p-value) identified significant enrichment in cell cycle progression, quiescence, metastasis, differentiation, and KRAS-signaling (Fig. 4f)55–57, further suggesting SE60 drives critical gene expression programs in ovarian cancer. This observation was supported by clinical analysis of these predicted target genes, where increased expression of the top 100 SE60 target genes is associated with worse clinical outcomes in HGSOC patients (utilizing data from 15 ovarian cancer datasets), even after considering additional clinical variables such as stage and grade (Fig. 4i, Supplementary Fig. 6, and Supplementary Data 10). The notion that SE60 plays a key role in ovarian cancer was further validated by the effects that deletion of this SE had on cancer cell proliferation and migration (Fig. 4g, h).
To substantiate our approach for identifying clinically relevant super-enhancers, we selected an additional candidate from the CRISPRi screen for validation. SE14 was chosen because (1) it has the highest correlation between LFC and RC differential gene expression analysis from the CRISPRi screen, (2) it is in the top four SEs that affected the greatest number of genes, and (3) its amplification portends a worse clinical outcome in ovarian cancer patients (Fig. 5a, b). To investigate the functional role of SE14, we designed sgRNAs flanking the BRD4 peak summit of the largest constituent enhancer within the super-enhancer and generated three independent CRISPR-KO clones resulting from ~2500 to 2800 bp deletions (Fig. 5c and Supplementary Fig. 5). Global changes in gene expression resulting from each knockout clone were measured using RNA-seq. Differential expression analysis identified 860 genes as significantly downregulated, and 629 genes as upregulated at our confidence threshold (adjusted p-value of 0.0005) (Fig. 5d, e, and Supplementary Data 7). Pathway analysis of the top 100 most significant downregulated genes identified significant enrichment in cell cycle progression, quiescence, metastasis, differentiation, and EMT (Fig. 5f)55–57, further suggesting that SE14 plays an important role in ovarian cancer. Kaplan–Meier analysis of the top 100 most significant downregulated genes after SE14 KO revealed a significant association with worse clinical outcomes in HGSOC patients (Fig. 5i, Supplementary Fig. 6, and Supplementary Data 10). Similar to the results obtained with SE60, the biological assays on all three SE14 KO cell lines exhibited a significant decrease in proliferation and migration compared to wild-type cells (Fig. 5g, h).
We had an interest in determining how similar the results of CRISPRi-based perturbation of SE60 and SE14 are to the gene expression changes caused by CRISPR-KO. Therefore, we performed additional dCas9-KRAB experiments coupled to RNA-seq (in replicate) for both SE60 and SE14. Differential gene expression analysis for both the CRISPRi and CRISPR-KO was performed with DESeq2 to facilitate the comparisons of the resulting changes in gene expression (Supplementary Figs. 7, 8, and Supplementary Data 7). For SE60, 169 genes were detected as differentially expressed by both CRISPRi and CRISPR-KO, and 11 of these genes were downregulated, suggesting that these are true target genes of SE60 (Supplementary Fig. 7c, d). Further analysis of the 11 downregulated genes detected by both CRISPRi and CRISPR-KO found this gene set to be enriched for Metastasis, Cell Cycle Progression, and Inflammation pathways, as well as being associated with reduced survivorship in HGSOC patients (Supplementary Fig. 7f, g, Supplementary Fig. 6, and Supplementary Data 10). The analysis of SE14 revealed 731 differentially expressed genes by both CRISPRi and CRISPR-KO and 169 of these genes were downregulated (Supplementary Fig. 8c, d). Analysis of the 169 shared downregulated genes detected by both CRISPRi and CRISPR-KO found this gene set to be enriched for Quiescence, Cell Cycle Progression, Differentiation, Inflammation, Stemness, and Estrogen Response pathways as well as being associated with reduced survivorship in HGSOC patients (Supplementary Figs. 8f, g, 6, and Supplementary Data 10). Taken together, these results validate our approach to identifying clinically relevant SEs and highlight the importance of these two SEs in ovarian cancer. Next, we investigated whether the mechanistic roles of SE60 and SE14 on cell proliferation and migration were due to direct or indirect target gene regulation.
3D-chromatin interactions defined by Hi-C in ovarian cancer cells establish direct target genes for SE60 and SE14
The significant effects on proliferation and migration caused by CRISPR-based deletion of SE60 and SE14 led us to investigate if these biological phenotypes were caused by direct or indirect regulation of target genes. We reasoned that direct target genes would exhibit increased chromatin looping interactions with the SE, whereas indirect target genes would be downstream of an effector gene that was directly regulated by the SE. To enable unbiased measurement of interaction frequencies between each super-enhancer and its target genes, we performed Hi-C in OVCAR3 cells to comprehensively annotate chromatin interactions across the ovarian cancer genome58. In order to maximize the breadth of this analysis, we focused on the target gene set detected from the CRISPR-KO experiments that represented the most statistically robust gene set for each SE, resulting from 4 replicates of RNA-seq across three independent knockout clones for each SE. Moreover, the constitutive perturbation of each SE, caused by CRISPR-based deletion, gave rise to consistent gene expression patterns that resulted in marked biological phenotypes, thus facilitating integration with the Hi-C data.
Since Hi-C is highly dependent on distance, we limited our search space to genes located on the same chromosome as the super-enhancers (cis genes) in order to get an accurate metric of interaction frequency59,60. To perform this analysis, we quantified the interaction frequency between each SE and its downregulated genes upon SE deletion. This was compared to a control dataset consisting of 100 permutations of distance-matched gene sets that exhibited no significant changes in gene expression upon SE deletion (see the “Methods section). This enabled us to compare distributions of interaction frequency measurements between each SE and a random set of genes based entirely on genomic distance. Direct targets were defined as SE-gene pairs with an observed/expected contact frequency greater than the 75th percentile of the control/background distribution. Overall, we observed that the target genes for each SE had a higher interaction frequency with their cognate SE compared to distance-matched genes found on the same chromosome (Fig. 6a, f).
We identified one cis direct target gene and four cis indirect target genes for SE60 (Fig. 6b). Of note, the cis direct target gene for SE60, RAE1, has previously been associated with invasion in ovarian cancer and has been shown to promote EMT in breast cancer61. In addition, increased expression of this gene portends a worse outcome in HGSOC patients34,35 (Fig. 6c, d, Supplementary Fig. 6, and Supplementary Data 10). Notably, RAE1 was also predicted as a target of SE60 by the CNVeQTL analysis (Supplementary Fig. 9a, b, and Supplementary Data 8). When looking at Hi-C contact frequency across chromosome 20, we notice a marked increase in contact between the RAE1 locus and the SE60 locus as compared to the background (Fig. 6e). This suggests that the decrease in migration detected upon SE60 deletion is due, in part, to its direct regulation of RAE1. We suspect that there may exist more direct target genes for SE60 located on other chromosomes based on the reference genome, that may be translocated to chromosome 20 in the ovarian cancer genome. However, these trans-chromosomal interaction frequencies are technically more challenging to detect via Hi-C.
Interestingly, we identified a much greater number of cis direct target genes (28 genes) and cis indirect targets (62 genes) for SE14 (Fig. 6g and Supplementary Data 8). Pathway analysis of the cis direct targets revealed key roles in cell cycle progression, quiescence, invasion, differentiation, metastasis, and stemness (Fig. 6h). Kaplan–Meier analysis of this gene signature highlighted a statistically significant decrease in survival for patients that had high expression of these genes (Fig. 6i, Supplementary Fig. 6, and Supplementary Data 10). Likewise, 8 of these cis direct targets had been predicted from our CNVeQTL analysis reinforcing the utility of CNVeQTLs to predict cis-direct targets (Supplementary Fig. 9c, d). Through our analysis of these of cis direct targets, we identified examples of both close-range (EPHA2) and distant (MAB21L3) connections to SE14 (Fig. 6j and Supplementary Fig. 10). Interestingly, there were genes within very close proximity to SE14 (such as ARHGEF19) that showed no evidence of interaction or differential gene expression. These data implicate SE14 as being directly involved in both proliferation and migration, as well as other key processes in ovarian cancer.
To validate our Hi-C analysis, we performed an orthogonal method, called Activity-by-Contact model (ABC)62, to identify interactions across the genome by integrating Hi-C data with measures of chromatin activity. This analysis allowed us to determine the top interacting cis-genes for both SEs regardless of the distance between both features. The results of the ABC analysis recapitulate those from our initial Hi-C analysis and independently nominated RAE1, EPHA2, and MAB21L3 as the genes most likely to interact with SE60 and SE14, respectively (Supplementary Fig. 11 and Supplementary Data 9). This analysis highlighted our ability to delineate true targets and suggests that both SE60 and SE14 are directly involved in clinically relevant processes in ovarian cancer.
SE60 and SE14 are specifically active within the epithelial cancer cell fraction of human HGSOC tumors as revealed by single cell genomics
Our previous experiments had demonstrated that these SEs are preferentially amplified in ovarian cancer patients and that they regulate gene expression pathways that govern the proliferation and migration of cancer cells. As a final validation experiment, we wanted to determine if SE60 and SE14 were specifically active within the cancer cell compartment of human HGSOC tumors and if their target genes are also active within the same cell type. To test this, we analyzed matched single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq) data from two HGSOC patients previously generated in our lab (Supplementary Fig. 12)43. We annotated seven distinct cell types present in these tumors by both scRNA-seq and scATAC-seq and identified the cancer cell population using the FDA-approved biomarker CA125 (also known as MUC16) (Fig. 7a, b)63. We found significant enrichment of RAE1, a SE60 cis direct target, and EPHA2, a SE14 cis direct target, within the cancer cell fraction as compared to the normal cell fraction (Wilcoxon Rank Sum tests, Bonferroni-corrected p-values < 2.2e−308 and average logFC ≥ 0.1) (Fig. 7b).
In order to assess whether SE60 and SE14 are uniquely active in ovarian cancer, we next leveraged the scATAC-seq data. These data showed significantly increased chromatin accessibility at three constituent enhancers of both SE60 and SE14, specifically within the cancer epithelial cell fraction as compared to the stromal compartments of these tumors (Wilcoxon Rank Sum tests, Benjamini–Hochberg FDR ≤ 0.10 and Log2FC ≥ 0.25) (Fig. 7c). Additionally, both HGSOC patients showed this pattern, suggesting that activation of these SEs is a common feature of HGSOC biology. While there is previous evidence from ENCODE that these regions contain regulatory elements in normal epithelial tissue, it appears that there is significantly more accessibility of these super-enhancers in ovarian cancer cells (Fig. 7c).
In order to investigate what transcription factors might be involved with these super-enhancers, we performed motif enrichment analysis using FIMO sequence analysis64. To provide confidence to the TF motif calls, we investigated the expression of the predicted TFs within the cancer epithelial cells. Within the cancer-enriched constituent enhancers of SE60, we found SOX4, ATF4, and YY1 as the top three predicted transcription factors. Of note, YY1 is known as an integral component of enhancer–promoter loop interactions and is a hallmark of an active enhancer65. Similarly, we detected binding motifs for ELF3, KLF, and JUN in the cancer-enriched constituent enhancers of SE14. Notably, ELF3 has been previously associated with vascular inflammation, tumorigeneses, epithelial differentiation, and the ERRB3 pathway providing additional evidence of the importance of this SE (Fig. 7d, e)66,67. Taken together, these data suggest that these SEs and their target genes show more activity in the cancer cell fraction of HGSOC tumors and serve to validate our computational pipeline for the identification of clinically relevant super-enhancers.
Discussion
Every year, an estimated 22,000 new cases of ovarian cancer will be diagnosed and around 14,000 women will die as a result of this disease1. The paucity of known drivers for ovarian cancer makes identifying at-risk individuals very difficult and has led to a lack of effective targeted therapies. Thus, platinum-based chemotherapy coupled with surgery remains the standard of care68. Given their critical functions in controlling gene regulation, enhancers are often required to achieve the levels of transcriptional activity needed to sustain cancer cells and have been shown to play an integral part in cancer development and patient survival. Additionally, super-enhancers have demonstrated the capacity to regulate many critical pathways for the development and maintenance of the cancer cell state as well as influence therapeutic resistance17–20.
With the advent of therapeutics designed to inhibit various epigenetic factors that convey functionality to enhancers, it is now possible to exploit the dependency of cancer cells on transcription as an effective strategy for treating therapeutically recalcitrant cancers such as ovarian cancer27,69. For example, the Bromodomain and Extra-Terminal motif inhibitors (BET inhibitors; such as JQ1) designed to interfere with the functions of bromodomain-containing proteins like BRD4 have shown promise in several pre-clinical models of cancer, although their efficacy in a clinical setting is still unknown25,70. Nonetheless, investigating enhancers with high BRD4 enrichment can lead to the identification of biomarkers, druggable targets, and an improved understanding of ovarian cancer. Notably, the expression of BRD4 is highest in ovarian cancer as compared to every other cancer type represented in The Cancer Genome Atlas and high expression portends a worse outcome in ovarian cancer patients (Fig. 1). Thus, we reasoned that co-enrichment of BRD4 and H3K27ac can be used as a surrogate to find SEs driving oncogenic processes in ovarian cancer. This was substantiated by the observation that the SEs identified in our study were preferentially copy number amplified in ovarian cancer patients and that some amplification events were themselves predictive of worse clinical outcomes (Fig. 2). Additionally, our CNVeQTL analyses across HGSOC patients demonstrate that the activity of these super-enhancers is pervasive. This is perhaps not surprising since genomic instability is a hallmark of ovarian cancer and several studies have demonstrated that somatic mutations at specific regulatory elements in the ovarian cancer genome play a pivotal role in subtype determination and overall progression2,71. Furthermore, the dysregulation of genomic architecture in ovarian cancer may allow for cancer cells to hijack existing enhancers for oncogenic purposes. In fact, several examples of enhancer hijacking exist in other types of cancer such as Burkett’s Lymphoma, B-Cell Lymphoma, and Glioblastoma9,72,73. Overall, these findings suggested that a number of our identified SEs were amplified for biologically meaningful reasons.
Rather than limiting our study to the standard taxonomic listing of super-enhancers, we used three orthogonal approaches to define the regulatory logic of SEs in ovarian cancer—CRISPRi, CRISPR-KO, and Hi-C. The CRISPRi screen allowed us to systematically determine the target genes for each of the top 86 most active SEs (Fig. 3). While most CRISPR screens involve a pool of sgRNAs and rely on a cellular endpoint (such as proliferation) to be able to capture the relative abundances of remaining sgRNAs, our screen was customized to provide a readout of gene expression for each super-enhancer. We knew, a priori, which sgRNAs were used and which SEs were affected in each well. On average, we found that each SE perturbation resulted in the downregulation of about four genes and the total number of genes regulated by each SE was not a function of size or enrichment of H3K27ac or BRD4. In fact, SE60 was in the bottom quartile of super-enhancers in terms of size and H3K27ac enrichment, but it had the most profound effects on gene expression. Therefore, we reasoned that SE60 harbored the most potential for further study due to its likely role in regulating genes that contribute to the pathology of ovarian cancer. While the goal of the CRISPRi screen was to broadly investigate the effects on gene expression across a large cohort of super-enhancers, we recognize that the CRISPRi screen was underpowered to definitively establish target genes for each SE. Thus, we elected to perform CRISPR-KOs of SE60 and SE14 to enable robust target gene detection.
CRISPR-KO of SE60 and SE14 had dramatic effects on gene expression programs involved in Quiescence, Metastasis, and Invasion, among other important pathways. Moreover, the gene sets for both SEs were associated with poor outcomes in HGSOC patients. This was supported by both proliferation and migration defects in the SE60 and SE14 knockout cells (Figs. 4 and 5). We note that there were hundreds of genes differentially regulated upon deletion of these two SEs, and that there was a modest overlap with the differentially expressed genes detected via CRISPRi. This is perhaps due to the technical nuances of CRISPR-KO (a constitutive genomic deletion arising from an individual clone) versus CRISPRi (a transient epigenetic inhibition of the enhancer locus) that may affect the target genes identified for a particular SE. In fact, the field as a whole has wrestled with the best way to assign target genes to enhancers, especially considering the genomic rearrangements observed in cancer cells. Thus, we reasoned that direct chromatin interactions between the SEs and their target genes (as measured by Hi-C) would give confidence to the annotation of target genes (Fig. 6).
Overall, the downregulated genes upon SE deletion showed higher interaction, both nearby and across long distances, with the SE as compared to the distance-matched control gene set. Importantly, several cis direct target genes are involved in oncogenic pathways and perhaps could serve as prognostic indicators or biomarkers in the future (Fig. 6, Supplementary Fig. 6, and Supplementary Data 10). We note that there may exist more target genes for each SE located on other chromosomes, however, we found no distinguishable interactions (via Hi-C) between our SEs and their target genes located on different chromosomes. This suggests that genes found on different chromosomes are likely indirect target genes where the SE directly regulates a gene in cis that is co-linear with the trans target gene. In the absence of Hi-C data for determining direct target genes, we posit that evidence from two orthogonal experiments (such as CRISPRi and CRISPR-KO or inclusion of reporter-based enhancer assays) would yield high confidence results since genes detected by multiple assays are agnostic to the technical nuances of each. In fact, a logical framework to describe the level of support needed to definitively annotate an enhancer and its bona fide target genes have been recently proposed74, and its implementation would yield a catalog of enhancers with confidently linked target genes.
Finally, both SE60 and SE14 were found to have a statistically significant increase in chromatin accessibility within the cancer cell fraction of human HGSOC tumors at single-cell resolution (Fig. 7), further suggesting that the SEs that we identified are not merely cell line specific. This validates our enhancer identification pipeline and reveals that certain super-enhancers involved in growth and migration are preferentially enriched and amplified in cancer cells. In addition, we found that cis direct target genes annotated for each SE (such as RAE1 and EPHA2) were more highly expressed in the cancer cells compared to the stromal/non-malignant cells within HGSOC tumors. Collectively, these results expound the idea that super-enhancers themselves and the genes they regulate represent viable therapeutic avenues and may aid in biomarker identification. More broadly, our study described a genomic and computational approach for identifying clinically relevant enhancers and their bona fide target genes which should be applicable to a wide variety of biological systems.
Methods
Cell culture
NIH:OVCAR-3 (OVCAR3) (ATCC, HTB-161) and HEK-293T (ATCC, CRL-3216) cell lines were used in this study. OVCAR3 cells were cultured in RPMI media (Gibco, 11875-093) supplemented with 10% FBS and 1% penicillin/streptomycin (Corning, MT30002CI). HEK-293T cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) (Gibco, 11995065) supplemented with 10% FBS and 1% penicillin/streptomycin. OVCAR3-dCas9-KRAB-blast (OVCAR-KRAB) cells were maintained in RPMI media with 10% FBS, 1% penicillin/streptomycin, and 1 µg/mL blasticidin (Corning, 30100RB) after the selection process. All cells were grown at 37 °C in 5% CO2. OVCAR3 cells were authenticated with Short Tandem Repeat profiling through ATCC before being used. Cell lines were tested for mycoplasma and were mycoplasma negative.
Engineering dCas9-KRAB expressing OVCAR3 cells
Lentivirus was packaged in HEK-293T cells and contained the Lenti-dCas9-KRAB-blast vector75 (Addgene plasmid #89567). Cells seeded in a T75 flask were transfected with the following: 6.67 µg Lenti-dCas9-KRAB-blast, 5 µg psPAX2 (Addgene, 12260), and 3.33 µg PMD2G (Addgene, 12259) using Fugene 6 (Promega, E2691) following the manufacturer’s protocol. 48–72 h post-transfection, the lentivirus containing supernatant was harvested and concentrated using the Lenti-X Concentrator (Takara, 631231) following the manufacturer’s protocol. To transduce OVCAR3 cells, cells were seeded at 50,000 cells/well in a six-well plate and treated with the harvested lentivirus in RPMI media with 10% FBS and 10 µg/mL polybrene (Millipore, TR1003G). After 72 h, transduced cells were placed in RPMI selection media with 3 µg/mL blasticidin for 7 days. After batch selection, OVCAR3-KRAB cells were collected for Western Blot to validate the presence of dCas9-KRAB. Cells were lysed with following lysis buffer: 50 mM Tris–HCl (pH 8), 0.5 M NaCl, 1% NP-40, 0.5% sodium deoxycholate, 0.1% SDS and 1× protease inhibitor. The β-tubulin antibody (Abcam, ab6046) was diluted at 1:5000 in 5% BSA in TBST and incubated overnight at 4 °C. The Cas9 antibody (7A9-3A3) (Santa Cruz, sc-517386) was diluted at 1:1500 in 5% BSA in TBST and incubated overnight at 4 °C. The secondary antibodies (Donkey anti-rabbit IgG HRP-linked (GE, NA934) and Donkey anti-mouse IgG HRP-linked (Invitrogen, PA1-28748)) were diluted at 1:5000 in 5% BSA in TBST.
CRISPRi screen sgRNA design
For sgRNAs targeting super-enhancers, target regions were chosen by selecting two regions within the super-enhancer with the highest BRD4 enrichment and clear H3K27ac signal. For each super-enhancer region, sgRNAs were designed using the CRISPOR web tool52 taking into account the specificity and off-target scores. If all suggested sgRNA sequences to a region had low specificity scores, a second sgRNA was instead designed to target the third highest BRD4 peak. Two sgRNAs were designed per super-enhancer to be transfected together. Genomic coordinates for all super-enhancers and their sgRNA sequences are found in Supplementary Data 1. sgRNA oligos were ordered from Integrated DNA Technologies. The negative control sgRNAs (Scramble1 and Scramble2) were previously published76.
sgRNA vector cloning
The sgRNA cloning vector pX-sgRNA-eGFP-MI was created by modifying pSpCas9(BB)−2A-Puro (pX459) v2.077 (Addgene plasmid #62988) by removing Cas9 and replacing it with eGFP to allow for visualization. Additionally, the sgRNA stem-loop was extended and an A–U base pair flip was utilized to improve sgRNA stability and assembly with dCas978. The vector cloning protocol was adapted from Feng Zheng’s group79. In short, sgRNA oligonucleotides ordered from Integrated DNA Technologies (IDT) were duplexed using: 10 µM sgRNA forward oligo, 10 µM sgRNA reverse oligo, 10U T4 polynucleotide kinase (NEB, M0201L), and 1x T4 ligation buffer under the following conditions: 37 °C for 30 min, 95 °C for 5 min, then ramp down to 25 °C at 5 °C/min. Next, duplexed sgRNAs were diluted to 1:100. 2 µL of diluted sgRNA was ligated with 100 ng pX-sgRNA-eGFP-MI that had been linearized with BbsI-HF (NEB, R3539S). Cloned sgRNA vectors were verified through Sanger sequencing with the human U6 promoter primer (GGC-CTA-TTT-CCC-ATG-ATT-CC).
CRISPRi screen
OVCAR3-KRAB cells were plated at 50,000 cells per well in 24-well plates using antibiotic-free RPMI media supplemented with 10% FBS. 24 h after plating, OVCAR3-KRAB cells were transfected with a total of 300 ng sgRNA vectors using Fugene 6 following the manufacturer’s protocol. Two sgRNAs were designed to target the BRD4 peak summit for each super-enhancer. For negative control wells (empty vector, scramble1, scramble2, Dorm1) and the well targeting the TP53 gene, a single sgRNA vector was transfected. For positive control wells (PLAG1 gene promoter, RNF4 gene promoter, FOXL2 gene promoter, RNF4 enhancer, FOXL2 enhancer) and wells targeting each super-enhancer, two sgRNA vectors were co-transfected in each well. Genomic coordinates for all super-enhancers and their sgRNA sequences are found in Supplementary Data 1. After a 72 h incubation, cells were visualized for GFP expression in order to verify good transfection efficiency. After visualization, wells were washed with 1× PBS and RNA was extracted using the Zymo Quick-RNA Miniprep Kit (Zymo, R1055) with the on-column DNAseI treatment step. RNA-seq libraries were prepared using the Lexogen Quantseq 3’ mRNA-seq FWD Library Prep Kit (Lexogen QuantSeq, 015.2 × 96) and the PCR Add-On Kit for Illumina (Lexogen QuantSeq, 020.96).
CRISPRi for SE14 and SE60
OVCAR3-KRAB cells were plated at 200,000 cells/well in six-well plates using antibiotic-free RPMI media supplemented with 10% FBS. 24 h after plating, each well of OVCAR3-KRAB cells were transfected with 1.5 µg sgRNA vector using Fugene 6 (Promega, E2691) following the manufacturer’s protocol. A single sgRNA vector was transfected for the negative control wells (Scramble1). Two sgRNA vectors were co-transfected for wells targeting SE14 and SE60. After a 72 h incubation, cells were visualized for GFP expression in order to verify good transfection efficiency. After visualization, wells were washed with 1× PBS and RNA was extracted using the Zymo Quick-RNA Miniprep Kit (Zymo, R1055) with the on-column DNAseI treatment step. Experiments were conducted three to four times to ensure reproducibility.
Super-enhancer knockout with CRISPR-Cas9
Deletion of super-enhancers was performed using CRISPR-cas9 following existing protocols79–81. In short, the BRD4 peak summit was targeted for each super-enhancer and sgRNA target sites were selected using the CRISPOR web tool52. The guide RNA sequences and their genomic coordinates can be found in Supplementary Table 1. Guide oligos used for knockouts were ordered from Integrated DNA Technologies. Oligos were duplexed and cloned into pSpCas9(BB)−2A-Puro (PX459) V2.0 (Addgene Plasmid #62988). Per super-enhancer targeted, four complete guide RNA plasmids (two 5’ and two 3’ of the target site) were transfected into OVCAR3 cells using Fugene 6 (Promega, E2691) following the manufacturer’s protocol. Three days post-transfection, positive clones underwent puromycin selection for 7 days. Deletion of super-enhancer targets was confirmed by genotyping PCR with two sets of primers: (1) external primers flanking the SE deletion sites, and (2) internal primers for identification of wild-type alleles (Supplementary Table 2). Deletion of SE14 resulted in a ~2500–2800 bp deletion. Deletion of SE60 resulted in a ~1700–1800 bp deletion. Additionally, correct super-enhancer knockout cells were submitted for Sanger DNA sequencing to verify the boundaries of deletions.
RNA-seq
For the CRISPRi screen, RNA-seq libraries were prepared using the Lexogen Quantseq 3’ mRNA-seq FWD Library Prep Kit (Lexogen QuantSeq, 015.2 × 96) and the PCR Add-On Kit for Illumina (Lexogen QuantSeq, 020.96). Libraries underwent 75 bp single-end sequencing on an Illumina NextSeq 500 instrument by UNC’s Translational Genomics Lab.
For RNA-seq of OVCAR3 WT, SE60KO1, SE60KO2, SE14KO1rep1, and SE14KO1rep2, libraries were prepared with the Illumina TruSeq Stranded mRNA Kit following the manufacturer’s protocol. Libraries underwent 75 bp paired-end sequencing on an Illumina NextSeq 500 instrument by UNC’s Translational Genomics Lab.
For SE60KO3, SE14KO2, SE14KO3, scramble1-KRAB, and SE60-KRAB, libraries were created and sequenced by Novogene. These libraries underwent 150 bp paired-end sequencing on an Illumina NovaSeq 6000 instrument.
ChIP-seq
OVCAR3-KRAB cells were transfected with sgRNAs targeting either scramble1 (non-targeting) or SE60 (2 pooled sgRNAs) following the same protocol mentioned above for “CRISPRi for SE14 and SE60.” For each of the two replicates conducted per condition, 1–2 million cells were used for fixation with 11% formaldehyde following Active Motif’s Epigenetic Services ChIP Fixation Protocol. ChIP-seq for H3K9me3 was performed by Active Motif using H3K9me3 antibody (Active Motif, 39161) with spike-in Drosophila normalization. ChIP-seq libraries underwent 75 bp single-end sequencing on an Illumina NextSeq 5000 instrument by Active Motif.
Cell proliferation assay
Cell collections were performed at Days 2, 4, and 6. On the day of collection, cells were fixed with 10% formaldehyde and stained using a 0.1% crystal violet solution. Incorporated crystal violet was extracted using 10% glacial acetic acid and the absorbance was read at 595 nm. This procedure was conducted four times to ensure reproducibility. Results are shown as the mean fold change of Day 4 and Day 6 OD 595 nm readings compared to Day 2 OD 595 nm readings ± SEM. Statistical analysis was conducted in R using a two-sided Student’s t-test.
Cell migration assay
OVCAR3 WT and SEKO cells in serum-free RPMI media were seeded to the upper chamber of a transwell insert at 60,000 cells per insert. The lower chamber contained RPMI with 10% FBS. Cells were incubated for 24 h, then all non-migrated cells were removed from the upper membrane. Cells were fixed and stained using the Hema 3 Staining Kit (Fisher Scientific, 122-911). Ten brightfield images were taken per insert and images were analyzed using the CellProfiler 4.2.1 software to count the number of cells per transwell insert. This procedure was conducted four times to ensure reproducibility. Results are shown as the mean cell count per insert ±SEM. Statistical analysis was conducted in R using a t-test.
General program versions
Unless specified, these are the versions used for scripting/analysis in R (R: 4.0.0) and Python (Python: 3.6.5) throughout the project for the bulk data analysis of CRISPRi, CRISPR-KO, CNV, and H3K27ac/BRD4 ChIP-Seq data. Unless otherwise stated, all “overlap” analysis visualization was performed using Intervene Intervene: (0.6.5)82.
RNA Seq: CRISPRi screen
General metrics
RNA-seq was performed following the pipeline put forth by LEXOGEN in the 3’ mRNA-Seq package; namely using STAR, HTSEQ, and DESEQ2. These processes will be explained in more detail below.
QC
Quality control was performed using the FastQC (version v0.11.7) tool and the results were analyzed (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). All of the metrics returned as acceptable with no clear failures. We thus proceeded with processing and analysis.
Trimming
Trimming was performed using the bbmap (https://sourceforge.net/projects/bbmap/) function bbduk.sh (version 38.46) with the following parameters ktrim=r, k = 13, useshortkmers = t, mink = 5, qtrim = r, trimq = 10, minlength = 20, ftm = 5.
Alignment
The trimmed and cleaned reads were then aligned to the HG38v12 human genome using STAR version 2.6.0a with the following parameter set–runMode alignReads,–outFilterType BySJout,–outFilterMultimapNmax 20,–alignSJoverhangMin 8,–alignSJDBoverhangMin 1,–outFilterMismatchNmax 999,–outFilterMismatchNoverLmax 0.6,–alignIntronMin 20,–alignIntronMax 1000000,–alignMatesGapMax 1000000,–readFilesCommand gunzip -c,–outSAMtype BAM SortedByCoordinate, and–outSAMattributes NH HI NM MD and otherwise default conditions83.
File formatting
The bam files from STAR were then indexed and sorted using functions in the SAMTOOLS package (version 1.9), namely samtools sort and samtools index84.
Quantification
The sorted and indexed bam files were quantified using htseq (version 0.11.2) and the gencode v29 primary assembly as a reference and with the following parameters -m intersection-nonempty, -s yes, -f bam, -r pos85.
Read distributions
The package RSeQC (version 3.0.0) was used to assess the distribution of reads across the genome. Specifically, the python program read_distribution.py was used with default parameterizations to create a summary of this information86.
Review QC
All of the alignment, counting, and cleaning program outputs were assessed with MultiQC (version 1.9) for potential issues, of which none were determined87. Default parameters were used.
Normalization
The count data were first normalized by removing all of the low count genes (genes with <1 count in every sample); this data was then read into DESEQ2 (DESeq2_1.30.1)54. Within DESEQ2 normalized by scaling and size factors followed by a VST transformation. Batch effects were addressed by utilizing the SVT program (part of the DESEQ2 package using sva_3.38.0) and variation from two surrogate variables was removed for the final analysis. The process used for this step of the analysis can be followed within the script Screen_Preprocessing.R.
Determination of DEGs
Differential gene expression was determined by utilizing a rank-based approach similar to the ranking method used by CMAP for their single replicate screens53. Genes were ranked in order of expression (rank 1 being the highest expressed, n being the lowest) within every sample, then all samples were aggregated and a global rank was assigned for every gene. Next, the change in rank was determined between the within-sample rank and the global rank for every gene in every sample. These changes in rank were used to build a distribution of all rank changes for eFDR analysis. The process used for this step of the analysis can be followed within the script OVCAR3_Screen_Analysis_with_Plotting_LFC_Comparison.ipynb.
Empirical false discovery rate analysis
Empirical false discovery rate, an empirically derived variation of the false discovery rate, was determined by choosing a rank change threshold and assessing the median number of genes across controls beyond that threshold as compared to a given sample47. For example, if there is a median of 4 genes in the controls and 40 genes in Sample A; the eFDR for this comparison would be 4/40 or 10%. The process used for this step of the analysis can be followed within the script OVCAR3_Screen_Analysis_with_Plotting_LFC_Comparison.ipynb.
Relative expression correlation analysis
A log2-fold change was calculated between all genes in a sample and the median of the controls. All genes determined as significant by the rank-based analysis (across all super-enhancers) were aggregated into one pool of genes. This pool of genes was then used to compare RC to LFC values within each super-enhancer to determine the correlation of these sets of values. The process used for this step of the analysis can be followed within the script OVCAR3_Screen_Analysis_with_Plotting_LFC_Comparison.ipynb.
Clustering
KMeans clustering analysis was used to cluster the differentially ranked gene list. Three clusters were determined as optimal by analysis of the elbow plot and these clusters were then applied to the data. Unsupervised hierarchical clustering was then used to determine the super-enhancer relationships. The process used for this step of the analysis can be followed within the script OVCAR3_Screen_Analysis_with_Plotting_LFC_Comparison.ipynb.
Pathway analysis
Genes detected from the differential expression analysis were analyzed using CancerSEA and the molecular signatures database55–57. This program performs pathway analysis using cell-type specific information relevant to cancer based on available single-cell datasets. All of the genes in a given KMeans cluster were fed into this set of programs as a gene list and results were retrieved.
RNA Seq: CRISPR KO
General metrics
RNA-Seq was performed following a similar pipeline to that used in the screen analysis with parameters adjusted to account for differences in the data (paired-end with greater depth); namely using STAR, HTSEQ, and DESEQ2. This will be expounded in more detail below.
QC
Quality control was performed using the FastQC (version v0.11.7) tool (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). All of the metrics returned as clear or warnings with no failures.
Trimming
No trimming was needed or performed.
Alignment
The reads were then aligned to the HG38v12 human genome using STAR version 2.6.0a with the following parameter set–runMode alignReads,–outFilterType BySJout,–outFilterMultimapNmax 20,–alignSJoverhangMin 8,–alignSJDBoverhangMin 1,–outFilterMismatchNmax 999,–outFilterMismatchNoverLmax 0.6,–alignIntronMin 20,–alignIntronMax 1000000,–alignMatesGapMax 1000000,–readFilesCommand gunzip -c,–outSAMtype BAM SortedByCoordinate,–outSAMattributes NH HI NM MD83.
File formatting
The bam files from STAR were then indexed and sorted using functions in the SAMTOOLS package (version 1.9), namely samtools sort and samtools index84.
Quantification
The sorted and indexed bam files were quantified using htseq (version 0.11.2) using the gencode v29 primary assembly (gencode.v29.annotation.gff3) as a reference and with the following parameters -m union, -nonunique all, -s reverse,–type=gene,–additional, attr=gene_name, -f bam, -r pos85.
Read distributions
The package RSeQC (version 3.0.0) was used to assess the distribution of reads across the genome. Specifically, the python program read_distribution.py was used with default parameterizations to create a summary of this information86.
Review QC
All of the alignment, counting, and cleaning program outputs were assessed with MultiQC (version 1.9) for potential issues87. Default parameters were used and all of the reports were good.
Normalization (batch effect detection)
The count data were first normalized by removing all of the low count genes (genes with <1 count in every sample); this data was then read into DESEQ2 (DESeq2_1.30.1)54. Within the DESEQ framework, the counts data were adjusted for scaling and size factors followed by a VST transformation. Batch effects were addressed by utilizing the SVT program (sva_3.38.0) and variation from one surrogate variable was accounted for in the DESEQ2 model. This process was completed and can be followed using DESEQ2_2021Reps_RNA_SVA_Plotting_V2.Rmd.
Normalization
The pre-VST data was used for standard in-program normalization by DESEQ2 during the differential expression analysis procedure. This process was completed and can be followed using DESEQ2_2021Reps_RNA_SVA_Plotting_V2.Rmd.
Determination of DEGs
Differential gene expression was determined by utilizing DESEQ2 and default parameters. Genes called as differentially expressed at an FDR-adjusted p-value less than 0.0005 were identified and collected for analysis and figure making. This process was completed and can be followed using DESEQ2_2021Reps_RNA_SVA_Plotting_V2.Rmd.
Pathway analysis
Genes detected from the differential expression analysis were analyzed using CancerSEA and the molecular signatures database55–57. This program performs pathway analysis using cell-type-specific information relevant to cancer based on available single-cell datasets. The top 100 most significant downregulated genes from differential expression analysis were fed into this program as a gene list and results relevant to ovarian cancer were retrieved. This process was completed and can be followed using DESEQ2_2021Reps_RNA_SVA_Plotting_V2.Rmd.
Survival analysis
To perform survival analyses we made use of the KM plotter tool34,35. This tool allows a user to look at the effect that the expression of induvial genes or a gene set has on overall survival across a number of cancer patients from 15 ovarian cancer datasets. We looked at the top 100 genes ordered by adjusted P-value (the top 100 most significant genes) as a set (using the median expression of the whole group); and/or looked at genes individually.
CRISPRi RNA-Seq analysis
General metrics
RNA-Seq was performed following a similar pipeline to that used in the screen analysis with parameters adjusted to account for differences in the data (paired-end with greater depth); namely using STAR, HTSEQ, and DESEQ2. This will be expounded in more detail below.
QC
Quality control was performed using the FastQC (version v0.11.7) tool (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). All of the metrics returned as clear or warnings with no failures.
Trimming
No trimming was needed or performed.
Alignment
The reads were then aligned to the HG38v12 human genome using STAR version 2.6.0a with the following parameter set–runMode alignReads,–outFilterType BySJout,–outFilterMultimapNmax 20,–alignSJoverhangMin 8,–alignSJDBoverhangMin 1,–outFilterMismatchNmax 999,–outFilterMismatchNoverLmax 0.6,–alignIntronMin 20,–alignIntronMax 1000000,–alignMatesGapMax 1000000,–readFilesCommand gunzip -c,–outSAMtype BAM SortedByCoordinate,–outSAMattributes NH HI NM MD83.
File formatting
The bam files from STAR were then indexed and sorted using functions in the SAMTOOLS (version 1.9) package, namely samtools sort and samtools index84.
Quantification
The sorted and indexed bam files were quantified using htseq (version 0.11.2) using the gencode v29 primary assembly (gencode.v29.annotation.ggf3) as a reference and with the following parameters parameters -m union, -nonunique all, -s reverse,–type=gene,–additional, attr=gene_name, -f bam, -r pos85.
Read distributions
The package RSeQC (version 3.0.0) was used to assess the distribution of reads across the genome. Specifically, the python program read_distribution.py was used with default parameterizations to create a summary of this information86.
Review QC
All of the alignment, counting, and cleaning program outputs were assessed with MultiQC (version 1.9) for potential issues87. Default parameters were used and all of the reports were good.
Normalization
The pre-VST data was used for standard in-program normalization by DESEQ2 during the differential expression analysis procedure. This process can be followed using DESEQ2_RNA_Plotting_CRISPRi_Analysis_Revised.Rmd.
Determination of DEGs
Differential gene expression was determined by utilizing DESEQ2 and default parameters. Genes called as differentially expressed at an FDR adjusted p-value less than 0.0005 were identified and collected for analysis and figure making. This process can be followed using DESEQ2_RNA_Plotting_CRISPRi_Analysis_Revised.Rmd.
Pathway analysis
Genes detected from the differential expression analysis were analyzed using CancerSEA and the molecular signatures database55–57.
Survival analysis
To perform survival analyses we made use of the KM plotter tool34,35. This tool allows a user to look at the effect that the expression of induvial genes or a gene set has on overall survival across a number of cancer patients from 15 ovarian cancer datasets. We looked at the top 100 genes ordered by adjusted P-value (the top 100 most significant genes) as a set (using the median expression of the whole group); and/or looked at genes individually.
Copy number analysis
Gathering
The copy number and RNA-seq data for this analysis were downloaded from the TCGA repository Firebrowse (http://firebrowse.org/) which contains the data used in the TCGA analysis of ovarian cancer10. We used the TCGA patient barcodes to determine if a tumor was from normal tissue or cancer patients. Samples were subset based on these barcodes to select for tumors. Additionally, for the CNVeQTL analysis, samples unique to each dataset (RNA or Copy Number) were removed. To perform this, we looked for matching patient identifiers between RNA-seq and copy number data and kept any data with ID overlaps.
Windowing
The autosomal (Chr 1–22) genome (hg19) was divided into 15 kb bins using python. We decided to use a sliding window size of 15 kb based on the overall size distribution of our super-enhancers. Since the median size of our super-enhancers is 21 kb, we wanted a window size similar to the median size but smaller, as smaller windows allow for better resolution. We settled on 15 kb as being close to the median size and small enough to give us good resolution, yet large enough to be computationally feasible (smaller window sizes create larger datasets and increase the computational burden of assigning signals and analyzing the data). This process was completed using Split_Genome_into_windows.ipynb.
Super enhancer overlap
Bedtools (version v2.25.0) intersect (one bp overlap) was used to create a subset of the whole genome 15 kb sliding windows that overlapped the super-enhancer regions. This gave us two data sets, one being the whole genome sliding windows and the other being SE overlapping sliding windows (a subset of the whole genome group).
Copy number assignment—whole genome
Patient copy number was assigned to each 15 kb window for every chromosome individually using the script OVLP_CNV_Whole_Genome.py. If the patient data overlapped a sliding window by at least one base pair, signal from the patient was assigned to this window. Once this was performed for every chromosome individually, the chromosome data were aggregated using Combine_CNV_Chr_Files.ipynb.
Copy number assignment—super-enhancer overlap
Patient copy number was assigned to each 15 kb bin for every chromosome individually using the script SEOVLP_CNV.py. If the patient data overlapped a sliding window by at least one base pair, signals from the patient was assigned to this window. Chromosome data was aggregated using Combine_CNV_Chr_Files.ipynb.
CNVeQTL analysis
Copy Number Expression QTL was identified using MatrixQTL where the SE overlapping CNV windows were defined as the “SNPs” and the matching RNA-Seq data served as the Expression dataset46. Of note, genes with over 100 NA, missing, or 0 values were removed from this dataset prior to analysis. The CNV and RNA data were also converted into float values for ease of use in MatrixEQTL. CNVeQTL was identified using the linear MatrixEQTL algorithm on the original data with a P-value threshold of 1e−3. In order to determine significance, the null hypothesis was induced and used to determine an empirical FDR. The null hypothesis, in which there is no association between specific copy number regions and gene expression, was induced by randomly permuting the column assignments of the RNA-seq data, the CNV data was left alone. This maintains the variance structure of the CNV data and merely changes which CNV data column gets matched with a given RNA-Data column. For example, CNV columns 1–4 (corresponding to patients 1–4) might now be matched with RNA columns 30, 75, 6, and 210; this allows us to use the same overall data and investigate what happens where there is no link between CNV and RNA values (as patient 1’s CNV values should be random in relation to the gene expression of patient 30). MatrixEQTL was then run using the original copy number data and the new column shuffled RNA-seq data; this shuffling and running of MatrixEQTL were performed 100,000 times. The median number of significant eQTLs detected across all 100k null conditions was used as the numerator for the empirical false discovery rate analysis, with the experimental results being the denominator. There is some variability in eFDR, as no seed was set and the permutations are random, but all repeats of 100k (3 repeats or 300k trials) returned an eFDR <0.1 or 10%. This process was completed using CNV_eQTL.R.
Determining super enhancer amplification
In order to assess whether the super-enhancer regions were amplified, we compared the distribution of CNV values in the super-enhancer overlapping sliding windows with the whole genome by sub-setting and direct comparison. We performed 10k random subset comparisons, and one direct comparison. In any given comparison, we took the 336 super-enhancer overlapping windows and then randomly drew 336 windows from the whole genome background; these two sets were then compared for significant differences using a Welch’s one-sided t-test. This analysis allowed us to determine if the super-enhancer overlapping group was significantly amplified relative to the randomly drawn subset. For the direct comparison, we took all 336 SE overlapping windows and directly compared the CNV values across these windows to the ~192,000 total regions using the same t-test metric. This process was completed using the script OVCAR_CNV_Comparison_Final.R.
Survival analysis
The effect of amplification of these regions on overall survival in patients was calculated using the Kaplan–Meier log rank change test and the Cox proportional hazards model32. The survival data were downloaded from the TCGA and the patient ID was mapped back to the CNV values for each patient44. These datasets were then combined into a single set formatted as described in CNV_KM_Plots.R. This combined survival and copy number dataset was then analyzed using the functions built in CNV_KM_Plots.R to provide a metric of significance for each 15 kb copy number region.
ChIP Seq (OVCAR3 BRD4 and H3K27ac)
Data acquisition
Publicly available ChIP-Seq data were downloaded from the SRA database associated with GSE101408 (experimental OVCAR3 H3K27ac condition) using fastq dump (version 2.9.0)30. This process was repeated to get BRD4 binding data for DMSO-treated OVCAR3 cells as well as the input control from GSE7756828.
Processing
The following steps were used to process each file separately (H3K27ac, BRD4 ChIP, and BRD4 sample input). At the peak calling step, the BRD4 ChIP data was informed by the processed input control. As there was no input provided for the H3K27ac data, no input was processed or utilized for this sample.
Data quality check
The quality of the data was assessed using fastqc and reads were trimmed using Trimmomatic (version 0.38) with the following parameters Leading: 30, Trailing: 30, Sliding Window: 4:30, MINLEN: 36, Phred3388.
Alignment
The fastq files were aligned to hg19 using Bowtie2 (version 2.3.5.1) with default parameters89. The output sam files were then converted to bam files using samtools and sorted/indexed.
Processing Bam files (marking duplicates)
The aligned and sorted bam files were then marked for duplicate reads using Picard MarkDuplicates (2.11.0) with the following parameters java -Xmx4G -jar $PICARD/picard.jar MarkDuplicates, VALIDATION_STRINGENCY = LENIENT, ASSUME_SORTED = true, REMOVE_DUPLICATES = false90.
Processing Bam files (removing duplicates)
The duplicate reads marked by Picard MarkDuplicates 2.11.0 were then removed by samtools using the command samtools view -F 1804 -b in.bam > clean.bam.
Create tagAlign files
A tagAlign file was generated using the following command.
bamToBed -i clean.bam | awk ‘BEGIN{OFS = “\t”}{$4 = “N”;$5 = “1000”;print $0}’ | tee clean.tagAlign | gzip -c > clean.tagAlign.gz.
Peak calling
Peaks were identified using MACS2 (version 2.2.6) with the following parameterization36. The input sample was used as the control for the BRD4 ChIP data; the H3K27ac data was processed without an input with MACS2 determining the control by default processes. The BRD4 data was processed with the following parameters -g hs, -p 1e-2,–nomodel,–extsize 121, -B. The H3K27ac data was processed with the parameter set -g hs, -p 1e-2,–nomodel,–extsize 218, -B.
Determination of the final peak set
The called peaks were then intersected with all genes in the hg19 human genome, using bedtools intersect (1 bp overlap) and overlapping regions were removed (https://bedtools.readthedocs.io/en/latest/). The remaining peaks from H3K27ac and BRD4 that did not overlap genes were then intersected using bedtools (1 bp overlap) and regions with both an H3K27ac and BRD4 peak were kept (using the BRD4 coordinates).
Creating BigWigs
The fold enrichment of the bam files was calculated across these peaks for both H3K27ac and BRD4 using macs2 bdgcmp and the -m ppois parameter. As the H3K27ac had no input we felt that in order to allow for fair comparison both H3K27ac and BRD4 bedgraph files should use the -m ppois parameter (we did also generate a fold enrichment aka FE bedgraph for BRD4 to ensure it was comparable to the ppois version). These bedgraph files were then converted to bigwigs using bedGraphToBigWig (version v4) from UCSC.
Calling super enhancers
Super enhancers were then identified using the ROSE41 pipeline (version 0.1, python 2.7) with default parameters.
Meta-analysis
Meta plots and heatmaps for these data were created using Deeptools (version 3.1.0). We generated matrices using signals from the bigwig files and the overlapping 12,339 peaks as the regions. These matrices were then used for plotting.
ChIP-Seq (H3K9me3)
ChIP-Seq analysis for H3K9me3 was performed by Active Motif following their spike in protocol. The following is a modified excerpt from the workflow provided to us.
Sequence analysis
The 75-nt single-end (SE75) sequence reads generated by Illumina sequencing (using NextSeq 500) were mapped to the genome using the BWA algorithm (“bwa aln/samse” with default settings). Alignment information for each read is stored in the BAM format. Only reads that pass Illumina’s purity filter, align with no more than 2 mismatches, and map uniquely to the genome were used in the subsequent analysis. In addition, duplicate reads (“PCR duplicates”) were removed.
Determination of fragment density
Since the 5´-ends of the aligned reads (=“tags”) represent the end of ChIP/IP-fragments, the tags were extended in silico (using Active Motif software) at their 3´- ends to a length of 200 bp, which corresponds to the average fragment length in the size-selected library. To identify the density of fragments (extended tags) along the genome, the genome was divided into 32-nt bins and the number of fragments in each bin is determined. This information (“signal map”; histogram of fragment densities) is stored in a bigWig file. bigWig files also provide the peak metrics in the Active Motif analysis program described below.
Peak finding
The generic term “Interval” is used to describe genomic regions with local enrichments in tag numbers. Intervals are defined by the chromosome number and a start and end coordinate. The peak caller used at Active Motif for this project was SICER91. This method was used to detect significant enrichments in the ChIP/IP data file when compared to the Input data file or relative to neighboring background regions.
Additional Analysis Steps:
(a) Standard Normalization: In the default analysis, the tag number of all samples (within a comparison group) is reduced by random sampling to the number of tags present in the smallest sample.
(b) Spike-in Adjustment: Spike-in of Drosophila chromatin was performed; the number of test tags was adjusted (again by down-sampling) by a factor that would result in the same number of spike-in Drosophila tags for each sample.
Merged region analysis
To compare peak metrics between 2 or more samples, overlapping Intervals (orange bars in diagram below) were grouped into “Merged Regions” (green bars), which are defined by the start coordinate of the most upstream Interval and the end coordinate of the most downstream Interval (=union of overlapping Intervals; “merged peaks”). In locations where only one sample has an Interval, this Interval defines the Merged Region. The use of Merged Regions was necessary because the locations and lengths of Intervals are rarely exactly the same when comparing different samples. Furthermore, with this approach fragment density values could be obtained even for samples for which no peak was called.
Annotations
After defining the Intervals and Merged Regions, their genomic locations along with their proximities to gene annotations and other genomic features are determined. In addition, average and peak (i.e. at “summit”) fragment densities within Intervals and Merged Regions were compiled.
Differential binding analysis
DESeq2 was used to determine regions of differential binding.
Active motif provided us with a list of program versions listed here bcl2fastq2 (v2.20), bwa (v0.7.12), Samtools (v0.1.19), BEDtools (v2.25.0), MACS2 (v2.1.0), SICER (v1.1), wigToBigWig (v4).
Hi-C
In situ Hi-C
OVCAR3 cells were grown under recommended culture conditions in RPMI media supplemented with 10% FBS and 1% penicillin/streptomycin. Four to five million cells were fixed with 1% formaldehyde for 10 min, then cell pellets were flash frozen and stored at −80 °C.
In situ Hi-C was performed as previously described92. In short, pellets were lysed in ice-cold Hi-C lysis buffer (10 mM Tris–HCl pH 8.0, 10 mM NaCl, 0.2% IGEPAL CA630) with 50 μL of protease inhibitors for 15 min on ice. Cells were pelleted and washed once more using the same buffer. Pellets were resuspended in 50 μL of 0.5% SDS and incubated at 62 °C for 7 min. Reactions were quenched with 145 μL water and 25 μL 10% Triton X-100 at 37 °C for 15 min. Chromatin was digested overnight with 25 μL of 10X NEBuffer2 and 100U of MboI at 37 °C with rotation.
Reactions were incubated at 62 °C for 20 min to inactivate MboI, then cooled to RT. Fragment overhangs were repaired by adding 37.5 μL 0.4 mM biotin-14-dATP; 1.5 μL each 10 mM dCTP, dGTP, dTTP; 8 μL 5U/μL DNA Polymerase I, Large (Klenow) Fragment and incubating at 37 °C for 1.5 h with rotation. Ligation was performed by adding 673 μL water, 120 μL 10X NEB T4 DNA ligase buffer, 100 μL 10% Triton X-100, 6 μL 20 mg/mL BSA, and 1 μL 2000 U/μL T4 DNA ligase and incubating at RT for 4 h with slow rotation. Samples were pelleted at 2500×g, resuspended in 432 μL water, 18 μL 20 mg/mL proteinase K, 50 μL 10% SDS, and 46 μL 5 M NaCl, incubated at 55 °C for 30 min, and then transferred to 68 °C overnight.
Samples were cooled to RT and 1.6× volumes of pure ethanol and 0.1× volumes of 3 M sodium acetate pH 5.2 were added to each sample, which were subsequently incubated at −80 °C for over 4-6 h. Samples were spun at max speed at 2 °C for 15 min and washed twice with 70% ethanol. The resulting pellet was dissolved in 130 μL of 10 mM Tris–HCl pH 8.0 and incubated at 37 °C for 1–2 h. Samples were stored at 4 °C overnight.
DNA was sheared using the Covaris LE220 (Covaris, Woburn, MA) to a fragment size of 300–500 bp in a Covaris microTUBE. DNA was transferred to a fresh tube and the Covaris microTUBE was rinsed with 70 μL of water and added to the sample. A 1:5 dilution of DNA was run on a 2% agarose gel to verify successful shearing.
Sheared DNA was size selected using AMPure XP beads. 0.55× volumes of 2× concentrated AMPure XP beads were added to each reaction and incubated at RT for 5 min. Beads were reclaimed on a magnet and the supernatant was transferred to a fresh tube. 30 μL of 2× concentrated AMPure XP beads were added and incubated for 5 min at RT. Beads were reclaimed on a magnet and washed with fresh 70% ethanol. Beads were dried for 5 min at RT prior to DNA elution in 300 μL of 10 mM Tris–HCl pH 8. Undiluted DNA was run on a 2% agarose gel to verify successful size selection between 300 and 500 bp.
150 μL of 10 mg/mL Dynabeads MyOne Streptavidin T1 beads were washed with 400 μL of 1× Tween washing buffer (TWB; 250 μL Tris–HCl pH 7.5, 50 μL 0.5 M EDTA, 10 mL 5 M NaCl, 25 μL Tween 20, 39.675 μL water). Beads were then resuspended in 300 μL of 2X binding buffer (500 μL Tris–HCl (pH 7.5), 100 μL 0.5 M EDTA, 20 mL 5 M NaCl, 29.4 mL water), added to the DNA sample, and incubated at RT for 15 min with rotation. DNA-bound beads were then washed twice with 600 μL of 1X TWB at 55 °C for 2 min with shaking. Beads were resuspended in 100 μL 1× NEBuffer T4 DNA ligase buffer, transferred to a new tube, and reclaimed.
Sheared ends were repaired by resuspending the beads in 88 μL of 1× NEB T4 DNA Ligase Buffer with 1 mM ATP, 2 μL of 25 mM dNTP mix, 5 μL of 10 U/μL NEB T4 PNK, 4 μL of 3 U/μL NEB T4 DNA polymerase I, and 1 μL of 5 U/μL NEB DNA polymerase 1, large (Klenow) fragment and incubating at RT for 30 min. Beads were washed two more times with 1× TWB for 2 min at 55 °C with shaking. Beads were washed once with 100 μL of 1× NEBuffer 2, transferred to a new tube, and resuspended in 90 μL of 1X NEBuffer 2, 5 μL of 10 mM dATP, and 5 μL of NEB Klenow exo minus, and incubated at 37 °C for 30 min. Beads were washed two more times with 1× TWB for 2 min at 55 °C with shaking. Beads were washed in 100 μL 1× Quick Ligation Reaction Buffer, transferred to a new tube, reclaimed, and resuspended in 50 μL of 1× NEB Quick Ligation Reaction Buffer. 2 μL of NEB DNA Quick Ligase and 3 μL of an appropriate Illumina indexed adapter (TruSeq nano) were added to each sample before incubating at RT for 15 min. Beads were reclaimed and washed twice with 1× TWB for 2 min at 55 °C. Beads were washed in 100 μL 10 mM Tris–HCl pH 8, transferred to a new tube, reclaimed and resuspended in 50 μL of 10 mM Tris–HCl pH 8.
Hi-C libraries were amplified directly off T1 beads with 10 cycles in 5 μL of PCR primer cocktail, 20 μL of Enhanced PCR mix, and 25 μL of DNA on beads. The PCR settings were as follows: 3 min at 95 °C followed by 4-12 cycles of 20 s 98 °C, 15 s at 60 °C, and 30 s at 72 °C. Samples were held at 72 °C for 5 min before lowering for holding at 4 °C. Amplified samples were transferred to a new tube and brought to 250 μL in 10 mM Tris–HCl pH 8.
Beads were reclaimed and the supernatant containing the amplified library was transferred to a new tube. Beads were resuspended in 25 μL of 10 mM Tris–HCl pH 8 and stored at −20 °C. 0.7× volumes of warmed AMPure XP beads were added to the supernatant sample and incubated at RT for 5 min. Beads were reclaimed and washed once with 70% ethanol without mixing. Ethanol was aspirated. Beads were resuspended in 100 μL of 10 mM Tris–HCl pH 8, 70 μL of fresh AMPure XP beads were added, and the solution was incubated for 5 min at RT. Beads were reclaimed and washed twice with 70% ethanol without mixing. Beads were left to dry and DNA was eluted in 25 μL of 10 mM Tris–HCl pH 8. The resulting libraries were next quantified by Qubit and Tapestation. A low-depth sequence was performed first using the Miniseq sequencer system (Illumina) and analyzed using the Juicer pipeline to assess quality. The resulting libraries underwent paired-end 2 × 150 bp sequencing on an Illumina NovaSeq sequencer. Each replicate was sequenced to an approximate depth of 730 million reads. The full sequencing depth was approximately 2.92 billion reads.
Hi-C data processing and analysis
In situ Hi-C datasets were processed using dietJuicer, a modified version of the Juicer Hi-C pipeline (https://github.com/EricSDavis/dietJuicer), using default parameterization93. Reads were aligned to the hg19 human genome (using Mbol restriction enzyme) with bwa (version 0.7.17). Four biological replicates were aligned and then these replicates were merged for a total of 2,922,558,308 Hi-C read pairs in OVCAR3 cells yielding 2,598,024,810 valid Hi-C contacts (88.90%). The resulting Hi-C contact matrix was next normalized with the “KR” matrix balancing algorithm. This was done in order to adjust for regional background differences in chromatin accessibility and allow for proper visualization of this data94.
CRISPR-KO gene targets were identified as direct or indirect targets using Hi-C contact frequency. Specifically, we compared the fold-change in observed over expected contact frequency between SE14 or SE60 and their respective gene targets with 100 permutations of distance-matched region-gene pairs as controls. Since distance-matching is only relevant for regions within a chromosome, we restricted our analysis to intra-chromosomal pairs. Direct targets were defined as SE-gene pairs with an observed/expected contact frequency greater than the 75th percentile of the control distribution. We performed this analysis on (1) CRISPR-KO-validated target genes and (2) significantly down-regulated (LFC < −0.5) CRISPR-KO-validated target genes. The analysis was conducted in R (4.1.0) using the following R/Bioconductor packages: GenomicRanges (1.45.0), data.table (1.14.2), Homo.sapiens (1.3.1), InteractionSet (1.21.1), plyranges (1.13.1), ggplot2 (3.3.5), ggrepel (0.9.1)95. Example regions were visualized with the plotgardener (1.0.3) Bioconductor package. Scripts can be made available upon request96.
Hi-C ABC analysis
We used the ABC method62 with slight modifications to determine direct super-enhancer (SE) KO target genes. BRD4 signal for each SE was used to measure enhancer activity and 50 kb resolution, KR-normalized Hi-C counts were used to measure contact between each SE and their putative target genes. Putative targets were defined as expressed genes (baseMean > 100 counts). Activity and contact vectors were log2-transformed and scaled before being multiplied together to create an ABC score for each SE-target pair.
Single-cell analysis
Data acquisition
We obtained the single-cell RNA-seq and single-cell ATAC-seq data from the GEO accession number GSE173682.
scRNA-seq data processing and barcode quality-control (QC)
The filtered feature-barcode matrix was converted into a Seurat object for each patient tumor sample using the Seurat R package (Seurat version 3.2)97. QC and doublet removal were performed for each patient dataset individually to emrich for high-quality cells. Outlier cells were defined in each of the following metrics: log(UMI counts) (>2 MADs, low end), log(percent mitochondrial read count +1) (>2 MADs, high end), and log(number of genes expressed) (>2 MADs, low end). Non-outlier cells, according to all three criteria, were kept for doublet detection. Cells marked as doublets by both DoubletDecon98(version 1.1.5) and DoubletFinder99 (version 2.0.3) were removed from downstream analysis.
scRNA-seq clustering and cell type annotation
Seurat objects were normalized using Seurat’s NormalizeData() with the normalization method set to “LogNormalize.” Seurat’s FindVariableFeatures() was used for feature selection with the selection method set to “vst” and the number of top variable features set to 2000. Prior to principal component analysis (PCA), gene expression values were scaled using Seurat’s ScaleData(). The top 2000 most variably expressed genes were summarized by PCA and the cells were visualized in a two-dimensional UMAP plot using Seurat’s RunUMAP() with 50 principal components (PCs), as suggested by the results of Seurat’s JackStraw(). To cluster cells, graph-based Louvain clustering was performed using Seurat’s FindNeighbors() with all 50 PCs and Seurat’s FindClusters() with a resolution of 0.7. scRNA-seq UMAP plots were generated in R using ggplot2.
Cell type annotation was performed using the R package SingleR100 and was verified with gene signature enrichment scores using Seurat’s AddModuleScore(). Both scRNA-seq datasets used in this study were annotated based on a reference scRNA-seq dataset from a human ovarian tumor (sample ID: HTAPP-624-SMP-3212)101. The individual patient datasets were then combined using Seurat’s merge() and subsequently reprocessed according to the normalization, feature selection, and clustering methods described above. The resulting clusters in the merged dataset were annotated based on the majority of cell type label within each cluster. SingleR cell type annotations were verified by calculating cell type gene signature enrichment scores from PanglaoDB102 using Seurat’s AddModuleScore().
scRNA-seq differential gene expression analysis
Differential gene expression was computed using Seurat’s FindMarkers() with the “test.use” parameter set to “wilcox” for the Wilcoxon Rank Sum test. Genes with a Bonferroni-corrected P-value ≤ 0.01 and average logFC ≥ 0.1 were deemed upregulated in the cancer epithelial fraction relative to the remaining cell type clusters.
scATAC-seq data processing and barcode quality-control (QC)
The scATAC-seq fragments file for each patient tumor sample was read into the R package ArchR (version 0.9.3) to perform barcode quality control and doublet removal103. To enrich for cellular barcodes, log10(TSS enrichement+1) and log10(number of unique fragments) were used as QC metrics, both of which showed a bimodal distribution. Gaussian mixture models (GMM), implemented in the R package mclust104, were used to estimate barcode cutoff thresholds for log10(TSS enrichment+1) and log10(number of unique fragments). Barcodes above these GMM-estimated thresholds in both metrics were retained. ArchR’s addDoubletScores() function was used to calculate doublet enrichment scores with the knnMethod set to “UMAP” and putative doublets were then filtered out using ArchR’s filterDoublets() with default parameters.
scRNA-seq cell type label transfer to scATAC-seq
Prior to transferring labels from scRNA-seq to scATAC-seq, ArchR’s addGeneScoreMatrix() was used to infer gene activity scores in scATAC-seq using default parameters. ArchR’s addGeneIntegrationMatrix() was used to assign each of the scATAC-seq cells a cell type subcluster identity from the matching scRNA-seq data and an associated label prediction score97. Of note, this label transfer procedure was constrained to only align cells of the same patient dataset. Only scATAC-seq cells with a label prediction score >0.5 were included in downstream analyses. Moreover, only inferred cell type subclusters with >30 cells were included in the downstream analysis.
scATAC-seq peak calling and data visualization
Pseudo-bulk replicates were generated for each inferred cell type subcluster using the R package ArchR and pseudo-bulk peak calling was performed within each inferred cell type subcluster using MACS236,103. The peak calls from each inferred cell type subcluster were then merged into a single profile using ArchR’s default iterative overlap procedure to create a merged peak by barcode matrix across all cellular barcodes from both patient tumor samples. ArchR’s plotBrowserTrack() was used to visualize the scATAC-seq coverage per inferred cell type.
scATAC-seq differential peak accessibility for determining cancer-enriched enhancers
ArchR’s getMarkerFeatures() was used for differential peak accessibility analysis with the bias argument set to include both “TSSEnrichment” and “log10(number of fragments).” Differentially accessible peaks (DEPs) were identified for each cell cluster by comparing the accessibility values of peaks across all cells in a cluster (group 1) relative to the accessibility values for a group of background cells matched for TSS enrichment and read depth (group 2). This comparison was made between cancer clusters (group 1) and all remaining cell-type clusters (group 2). Peaks with Benjamini–Hochberg FDR ≤ 0.10 and Log2FC ≥ 0.25 were deemed cancer-enriched with statistically significant increased accessibility relative to the non-cancer fraction.
Enhancer motif analysis in scATAC-seq
Bedtools getfasta() was used to extract the sequences of select cancer-enriched enhancers according to the hg38 reference genome105. FIMO motif scanning with default parameters was applied to the enhancer sequences using a motif database supplied by JASPAR202064,106. FIMO motif results were ranked by Benjamini–Hochberg corrected q-values and TF expression in the cancer fraction by summing the normalized TF counts across all cells within the cancer epithelial clusters. Seurat’s VlnPlot() was used to generate TF expression violin plots.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Supplementary information
Acknowledgements
We thank Dr. Victoria Bae-Jump for her helpful clinical insights on ovarian cancer. We also thank the members of the Franco Lab for their helpful comments and discussions along the way. We thank the UNC Translational Genomics Lab for their technical support. This work was funded by grants from the NIH/National Cancer Institute (5-P50-CA058223-25), the Susan G. Komen Breast Cancer Research Foundation (CCR19608601), the V Foundation for Cancer Research (V2019-015), and the Rivkin Center Pilot Study Award to H.L.F. Additional support was provided by NIH grants 5T32-CA217824 (Cancer Epigenetics Training Program) to A.A.P. and M.W.L., NIGMS Training grant T32-GM067553 to E.S.D., and NIH grant R35-GM128645 to D.H.P.
Source data
Author contributions
H.L.F and M.R.K conceived and supervised the study with input from K.W. The computational analyses were designed and performed by M.R.K. with input from M.J.R., J.S.P., and H.L.F. The CRISPR based experiments and molecular biology were performed by K.W. with help from M.W.L. The Hi-C data were generated and analyzed by A.A.P., E.S.D., and D.H.P. The manuscript was written by M.R.K., K.W., and H.L.F. with input from all authors.
Peer review
Peer review information
Nature Communications thanks Paul Spellman, Haoyue Zhang and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Data availability
Data generated in this study (CRISPRi/CRISPR KO RNA-seq, Hi-C, and H3K9me3 ChIP-Seq) have been uploaded and are publicly available in the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) database under the accession number GSE174259.
The single cell genomics data were downloaded from GEO accession number GSE17368243. The H3K27ac ChIP-seq and BRD4 ChIP-seq were downloaded from GEO accessions GSE10140828 and GSE7756828, respectively. All FTSEC data was downloaded from GEO accession GSE6810438. TCGA expression and copy-number data10 (RNA-seq and CNV data) was downloaded using the TCGA repository Firebrowse (http://firebrowse.org/) and survival data was attained from the supplement of the TCGA clinical paper44. All ENCODE data was downloaded from the Screen database37 (https://screen.encodeproject.org/#).
Kaplan–Meier plots for gene set survival analysis were created with the publicly available KM Plot tool34,35 (https://kmplot.com/analysis/). Patient data from Fig. 1b, c are publicly available through CBioPortal6,31 (https://bit.ly/3QY91sa). The bar chart from Fig. 1b can be found under Cancer Types Summary. The boxplot from Fig. 1c can be generated under Plots by plotting TCGA PanCanAtlas Cancer Type Acronym vs. mRNA Expression, RSEM (Batch normalized from Illumina HiSeq_RNASeqV2) (log2(value + 1)) and sorting the categories by a median. For both Fig. 1b, c, the 16 highest altered/expressed TCGA cancer types are presented.
The remaining data are available within the article, Supplementary Information, or Source Data file. Source data are provided with this paper.
Code availability
Programs and Scripts for the bulk data analysis, mentioned in the methods, are located at the Github repository https://github.com/mkelly9513/OV-Project-One/releases/tag/v1.1.3 (Release version 1.1.3).
Programs and scripts for the single cell data analysis are located at the Github repository https://github.com/RegnerM2015/scOVAR_SE_Screen/releases/tag/v1.0.0 (Release version 1.0.0).
Programs and scripts for the Hi-C data analysis are available on request to https://github.com/EricSDavis (no “original” scripts were used). There are no conditions required to access these scripts aside from contacting the above individual.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Michael R. Kelly, Kamila Wisniewska.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-31919-8.
References
- 1.Torre LA, et al. Ovarian cancer statistics, 2018. CA Cancer J. Clin. 2018;68:284–296. doi: 10.3322/caac.21456. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Macintyre G, et al. Copy number signatures and mutational processes in ovarian carcinoma. Nat. Genet. 2018;50:1262–1270. doi: 10.1038/s41588-018-0179-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Matulonis UA, et al. Ovarian cancer. Nat. Rev. Dis. Prim. 2016;2:16061. doi: 10.1038/nrdp.2016.61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Zhang S, et al. Frequencies of BRCA1 and BRCA2 mutations among 1,342 unselected patients with invasive ovarian cancer. Gynecol. Oncol. 2011;121:353–357. doi: 10.1016/j.ygyno.2011.01.020. [DOI] [PubMed] [Google Scholar]
- 5.Hoadley KA, et al. Cell-of-Origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell. 2018;173:291–304.e6. doi: 10.1016/j.cell.2018.03.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Cerami E, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2:401–404. doi: 10.1158/2159-8290.CD-12-0095. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Chen H, et al. A Pan-cancer analysis of enhancer expression in nearly 9000 patient samples. Cell. 2018;173:386–399.e12. doi: 10.1016/j.cell.2018.03.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Franco HL, et al. Enhancer transcription reveals subtype-specific gene expression programs controlling breast cancer pathogenesis. Genome Res. 2018;28:159–170. doi: 10.1101/gr.226019.117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Northcott PA, et al. Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature. 2014;511:428–434. doi: 10.1038/nature13379. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Cancer Genome Atlas Research, N. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474:609–615. doi: 10.1038/nature10166. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Lewis MW, Li S, Franco HL. Transcriptional control by enhancers and enhancer RNAs. Transcription. 2019;10:171–186. doi: 10.1080/21541264.2019.1695492. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Kim TK, Shiekhattar R. Architectural and functional commonalities between enhancers and promoters. Cell. 2015;162:948–959. doi: 10.1016/j.cell.2015.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Heinz S, Romanoski CE, Benner C, Glass CK. The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 2015;16:144–154. doi: 10.1038/nrm3949. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Shlyueva D, Stampfel G, Stark A. Transcriptional enhancers: from properties to genome-wide predictions. Nat. Rev. Genet. 2014;15:272–286. doi: 10.1038/nrg3682. [DOI] [PubMed] [Google Scholar]
- 15.Pennacchio LA, Bickmore W, Dean A, Nobrega MA, Bejerano G. Enhancers: five essential questions. Nat. Rev. Genet. 2013;14:288–295. doi: 10.1038/nrg3458. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Hnisz D, et al. Super-enhancers in the control of cell identity and disease. Cell. 2013;155:934–947. doi: 10.1016/j.cell.2013.09.053. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Thandapani P. Super-enhancers in cancer. Pharm. Ther. 2019;199:129–138. doi: 10.1016/j.pharmthera.2019.02.014. [DOI] [PubMed] [Google Scholar]
- 18.Bradner JE, Hnisz D, Young RA. Transcriptional addiction in cancer. Cell. 2017;168:629–643. doi: 10.1016/j.cell.2016.12.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Ma Q, et al. Super-enhancer redistribution as a mechanism of broad gene dysregulation in repeatedly drug-treated cancer cells. Cell Rep. 2020;31:107532. doi: 10.1016/j.celrep.2020.107532. [DOI] [PubMed] [Google Scholar]
- 20.Shang S, et al. Chemotherapy-induced distal enhancers drive transcriptional programs to maintain the chemoresistant state in ovarian cancer. Cancer Res. 2019;79:4599–4611. doi: 10.1158/0008-5472.CAN-19-0215. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Hnisz D, Shrinivas K, Young RA, Chakraborty AK, Sharp PA. A phase separation model for transcriptional control. Cell. 2017;169:13–23. doi: 10.1016/j.cell.2017.02.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Sabari, B. R. et al. Coactivator condensation at super-enhancers links phase separation and gene control. Science361, eaar3958 (2018). [DOI] [PMC free article] [PubMed]
- 23.Rhyasen GW, et al. BRD4 amplification facilitates an oncogenic gene expression program in high-grade serous ovarian cancer and confers sensitivity to BET inhibitors. PLoS ONE. 2018;13:e0200826. doi: 10.1371/journal.pone.0200826. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Yin M, et al. Potent BRD4 inhibitor suppresses cancer cell–macrophage interaction. Nat. Commun. 2020;11:1833. doi: 10.1038/s41467-020-15290-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Bakshi S, McKee C, Walker K, Brown C, Chaudhry GR. Toxicity of JQ1 in neuronal derivatives of human umbilical cord mesenchymal stem cells. Oncotarget. 2018;9:33853–33864. doi: 10.18632/oncotarget.26127. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Sur I, Taipale J. The role of enhancers in cancer. Nat. Rev. Cancer. 2016;16:483–493. doi: 10.1038/nrc.2016.62. [DOI] [PubMed] [Google Scholar]
- 27.Pfister SX, Ashworth A. Marked for death: targeting epigenetic changes in cancer. Nat. Rev. Drug Discov. 2017;16:241–263. doi: 10.1038/nrd.2016.256. [DOI] [PubMed] [Google Scholar]
- 28.Yokoyama Y, et al. BET inhibitors suppress ALDH activity by targeting ALDH1A1 super-enhancer in ovarian cancer. Cancer Res. 2016;76:6320–6330. doi: 10.1158/0008-5472.CAN-16-0854. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Bradbury, A., O’Donnell, R., Drew, Y., Curtin, N. J. & Sharma Saha, S. Characterisation of ovarian cancer cell line NIH-OVCAR3 and implications of genomic, transcriptomic, proteomic and functional DNA damage response biomarkers for therapeutic targeting. Cancers12, 1939 (2020). [DOI] [PMC free article] [PubMed]
- 30.Kalender-Atak Z, et al. Identification of cis-regulatory mutations generating de novo edges in personalized cancer gene regulatory networks. Genome Med. 2017;9:80. doi: 10.1186/s13073-017-0464-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Gao J, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 2013;6:pl1. doi: 10.1126/scisignal.2004088. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Dinse GE, Lagakos SW. Nonparametric estimation of lifetime and disease onset distributions from incomplete observations. Biometrics. 1982;38:921–932. doi: 10.2307/2529872. [DOI] [PubMed] [Google Scholar]
- 33.Nagy Á, Munkácsy G, Győrffy B. Pancancer survival analysis of cancer hallmark genes. Sci. Rep. 2021;11:6047. doi: 10.1038/s41598-021-84787-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Gyorffy B, Lánczky A, Szállási Z. Implementing an online tool for genome-wide validation of survival-associated biomarkers in ovarian-cancer using microarray data from 1287 patients. Endocr. Relat. Cancer. 2012;19:197–208. doi: 10.1530/ERC-11-0329. [DOI] [PubMed] [Google Scholar]
- 35.Lánczky A, Győrffy B. Web-based survival analysis tool tailored for medical research (KMplot): development and implementation. J. Med. Internet Res. 2021;23:e27633. doi: 10.2196/27633. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Zhang Y, et al. Model-based analysis of ChIP-Seq (MACS) Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Moore JE, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020;583:699–710. doi: 10.1038/s41586-020-2493-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Coetzee SG, et al. Cell-type-specific enrichment of risk-associated regulatory elements at ovarian cancer susceptibility loci. Hum. Mol. Genet. 2015;24:3595–3607. doi: 10.1093/hmg/ddv101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Lawrenson K, et al. A study of high-grade serous ovarian cancer origins implicates the SOX18 transcription factor in tumor development. Cell Rep. 2019;29:3726–3735.e4. doi: 10.1016/j.celrep.2019.10.122. [DOI] [PubMed] [Google Scholar]
- 40.Lõhmussaar K, et al. Assessing the origin of high-grade serous ovarian cancer using CRISPR-modification of mouse organoids. Nat. Commun. 2020;11:2660. doi: 10.1038/s41467-020-16432-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Whyte WA, et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153:307–319. doi: 10.1016/j.cell.2013.03.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Loven J, et al. Selective inhibition of tumor oncogenes by disruption of super-enhancers. Cell. 2013;153:320–334. doi: 10.1016/j.cell.2013.03.036. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Regner, M. J. et al. A multi-omic single-cell landscape of human gynecologic malignancies. Mol. Cell81, 4924–4941 (2021). [DOI] [PMC free article] [PubMed]
- 44.Liu J, et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell. 2018;173:400–416.e11. doi: 10.1016/j.cell.2018.02.052. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Nica AC, Dermitzakis ET. Expression quantitative trait loci: present and future. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2013;368:20120362. doi: 10.1098/rstb.2012.0362. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Shabalin AA. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28:1353–1358. doi: 10.1093/bioinformatics/bts163. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B. 1995;57:289–300. [Google Scholar]
- 48.Fulco CP, et al. Systematic mapping of functional enhancer–promoter connections with CRISPR interference. Science. 2016;354:769–773. doi: 10.1126/science.aag2445. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Larson MH, et al. CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nat. Protoc. 2013;8:2180–2196. doi: 10.1038/nprot.2013.132. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Gilbert LA, et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013;154:442–451. doi: 10.1016/j.cell.2013.06.044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Qi LS, et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 2013;152:1173–1183. doi: 10.1016/j.cell.2013.02.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Concordet JP, Haeussler M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 2018;46:W242–W245. doi: 10.1093/nar/gky354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Musa A, et al. A review of connectivity map and computational approaches in pharmacogenomics. Brief Bioinform. 2018;19:506–523. doi: 10.1093/bib/bbw112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Yuan H, et al. CancerSEA: a cancer single-cell state atlas. Nucleic Acids Res. 2019;47:D900–D908. doi: 10.1093/nar/gky939. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Liberzon A, et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1:417–425. doi: 10.1016/j.cels.2015.12.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Fudenberg G, Mirny LA. Higher-order chromatin structure: bridging physics and biology. Curr. Opin. Genet. Dev. 2012;22:115–124. doi: 10.1016/j.gde.2012.01.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Lajoie BR, Dekker J, Kaplan N. The Hitchhiker’s guide to Hi-C analysis: practical guidelines. Methods. 2015;72:65–75. doi: 10.1016/j.ymeth.2014.10.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Oh JH, et al. RAE1 mediated ZEB1 expression promotes epithelial-mesenchymal transition in breast cancer. Sci. Rep. 2019;9:2977. doi: 10.1038/s41598-019-39574-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Fulco CP, et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 2019;51:1664–1669. doi: 10.1038/s41588-019-0538-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Bonifacio VDB. Ovarian cancer biomarkers: moving forward in early detection. Adv. Exp. Med. Biol. 2020;1219:355–363. doi: 10.1007/978-3-030-34025-4_18. [DOI] [PubMed] [Google Scholar]
- 64.Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Weintraub AS, et al. YY1 is a structural regulator of enhancer–promoter loops. Cell. 2017;171:1573–1588.e28. doi: 10.1016/j.cell.2017.11.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Chen H, et al. E26 transformation (ETS)‑specific related transcription factor‑3 (ELF3) orchestrates a positive feedback loop that constitutively activates the MAPK/Erk pathway to drive thyroid cancer. Oncol. Rep. 2019;41:570–578. doi: 10.3892/or.2018.6807. [DOI] [PubMed] [Google Scholar]
- 67.Luk, I. Y., Reehorst, C. M. & Mariadason, J. M. ELF3, ELF5, EHF and SPDEF transcription factors in tissue homeostasis and cancer. Molecules23, 2191 (2018). [DOI] [PMC free article] [PubMed]
- 68.Stewart C, Ralyea C, Lockwood S. Ovarian cancer: an integrated review. Semin. Oncol. Nurs. 2019;35:151–156. doi: 10.1016/j.soncn.2019.02.001. [DOI] [PubMed] [Google Scholar]
- 69.Franco HL, Kraus WL. No driver behind the wheel? Targeting transcription in cancer. Cell. 2015;163:28–30. doi: 10.1016/j.cell.2015.09.013. [DOI] [PubMed] [Google Scholar]
- 70.Shorstova T, Foulkes WD, Witcher M. Achieving clinical success with BET inhibitors as anti-cancer agents. Br. J. Cancer. 2021;124:1478–1490. doi: 10.1038/s41416-021-01321-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Corona RI, et al. Non-coding somatic mutations converge on the PAX8 pathway in ovarian cancer. Nat. Commun. 2020;11:2020. doi: 10.1038/s41467-020-15951-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Dalla-Favera R, et al. Human c-myc onc gene is located on the region of chromosome 8 that is translocated in Burkitt lymphoma cells. Proc. Natl Acad. Sci. USA. 1982;79:7824–7827. doi: 10.1073/pnas.79.24.7824. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Ryan RJ, et al. Detection of enhancer-associated rearrangements reveals mechanisms of oncogene dysregulation in B-cell lymphoma. Cancer Discov. 2015;5:1058–1071. doi: 10.1158/2159-8290.CD-15-0370. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Gasperini M, Tome JM, Shendure J. Towards a comprehensive catalogue of validated and target-linked human enhancers. Nat. Rev. Genet. 2020;21:292–310. doi: 10.1038/s41576-019-0209-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Xie S, Duan J, Li B, Zhou P, Hon GC. Multiplexed engineering and analysis of combinatorial enhancer activity in single cells. Mol. Cell. 2017;66:285–299.e5. doi: 10.1016/j.molcel.2017.03.007. [DOI] [PubMed] [Google Scholar]
- 76.Lawhorn IE, Ferreira JP, Wang CL. Evaluation of sgRNA target sites for CRISPR-mediated repression of TP53. PLoS ONE. 2014;9:e113232. doi: 10.1371/journal.pone.0113232. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Ran FA, et al. Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 2013;8:2281–2308. doi: 10.1038/nprot.2013.143. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Chen B, et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell. 2013;155:1479–1491. doi: 10.1016/j.cell.2013.12.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Cong L, et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013;339:819–823. doi: 10.1126/science.1231143. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Canver MC, et al. Characterization of genomic deletion efficiency mediated by clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 nuclease system in mammalian cells. J. Biol. Chem. 2014;289:21312–21324. doi: 10.1074/jbc.M114.564625. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Mali P, et al. RNA-guided human genome engineering via Cas9. Science. 2013;339:823–826. doi: 10.1126/science.1232033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Khan A, Mathelier A. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets. BMC Bioinform. 2017;18:287. doi: 10.1186/s12859-017-1708-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Li H, et al. The sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Wang L, Wang S, Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012;28:2184–2185. doi: 10.1093/bioinformatics/bts356. [DOI] [PubMed] [Google Scholar]
- 87.Ewels P, Magnusson M, Lundin S, Kaller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32:3047–3048. doi: 10.1093/bioinformatics/btw354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Broad Institute. Picard Tools. http://broadinstitute.github.io/picard (2018).
- 91.Zang C, et al. A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. Bioinformatics. 2009;25:1952–1958. doi: 10.1093/bioinformatics/btp340. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Rao SS, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Durand NC, et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Knight, P. A. & Ruiz, D. A fast algorithm for matrix balancing. IMA J. Numer. Anal.33, 1029–1047 (2013).
- 95.Lawrence M, et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 2013;9:e1003118. doi: 10.1371/journal.pcbi.1003118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96.Kramer, N. E. et al. Plotgardener: cultivating precise multi-panel figures in R. Bioinformatics38, 2042–2045 (2022). [DOI] [PMC free article] [PubMed]
- 97.Stuart T, et al. Comprehensive integration of single-cell data. Cell. 2019;177:1888–1902.e21. doi: 10.1016/j.cell.2019.05.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 98.DePasquale EAK, et al. DoubletDecon: deconvoluting doublets from single-cell RNA-sequencing data. Cell Rep. 2019;29:1718–1727.e8. doi: 10.1016/j.celrep.2019.09.082. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99.McGinnis CS, Murrow LM, Gartner ZJ. DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. Cell Syst. 2019;8:329–337.e4. doi: 10.1016/j.cels.2019.03.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100.Aran D, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 2019;20:163–172. doi: 10.1038/s41590-018-0276-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.Slyper M, et al. A single-cell and single-nucleus RNA-Seq toolbox for fresh and frozen human tumors. Nat. Med. 2020;26:792–802. doi: 10.1038/s41591-020-0844-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102.Franzén, O., Gan, L.M. & Björkegren, J. L. M. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database2019, baz046 (2019). [DOI] [PMC free article] [PubMed]
- 103.Granja JM, et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 2021;53:403–411. doi: 10.1038/s41588-021-00790-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 104.Scrucca L, Fop M, Murphy TB, Raftery AE. mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R. J. 2016;8:289–317. doi: 10.32614/RJ-2016-021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106.Fornes O, et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2020;48:D87–D92. doi: 10.1093/nar/gkaa516. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Data generated in this study (CRISPRi/CRISPR KO RNA-seq, Hi-C, and H3K9me3 ChIP-Seq) have been uploaded and are publicly available in the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) database under the accession number GSE174259.
The single cell genomics data were downloaded from GEO accession number GSE17368243. The H3K27ac ChIP-seq and BRD4 ChIP-seq were downloaded from GEO accessions GSE10140828 and GSE7756828, respectively. All FTSEC data was downloaded from GEO accession GSE6810438. TCGA expression and copy-number data10 (RNA-seq and CNV data) was downloaded using the TCGA repository Firebrowse (http://firebrowse.org/) and survival data was attained from the supplement of the TCGA clinical paper44. All ENCODE data was downloaded from the Screen database37 (https://screen.encodeproject.org/#).
Kaplan–Meier plots for gene set survival analysis were created with the publicly available KM Plot tool34,35 (https://kmplot.com/analysis/). Patient data from Fig. 1b, c are publicly available through CBioPortal6,31 (https://bit.ly/3QY91sa). The bar chart from Fig. 1b can be found under Cancer Types Summary. The boxplot from Fig. 1c can be generated under Plots by plotting TCGA PanCanAtlas Cancer Type Acronym vs. mRNA Expression, RSEM (Batch normalized from Illumina HiSeq_RNASeqV2) (log2(value + 1)) and sorting the categories by a median. For both Fig. 1b, c, the 16 highest altered/expressed TCGA cancer types are presented.
The remaining data are available within the article, Supplementary Information, or Source Data file. Source data are provided with this paper.
Programs and Scripts for the bulk data analysis, mentioned in the methods, are located at the Github repository https://github.com/mkelly9513/OV-Project-One/releases/tag/v1.1.3 (Release version 1.1.3).
Programs and scripts for the single cell data analysis are located at the Github repository https://github.com/RegnerM2015/scOVAR_SE_Screen/releases/tag/v1.0.0 (Release version 1.0.0).
Programs and scripts for the Hi-C data analysis are available on request to https://github.com/EricSDavis (no “original” scripts were used). There are no conditions required to access these scripts aside from contacting the above individual.