Abstract
Mutations in protein-coding genes are well established as the basis for human cancer, yet it remains elusive how alterations within the non-coding genome, a substantial fraction of which contains cis-regulatory elements (CREs), contribute to cancer pathophysiology. Here, we developed an integrative approach to systematically identify and characterize non-coding regulatory variants with functional consequences in human hematopoietic malignancies. Combining targeted resequencing of hematopoietic lineage-associated CREs and mutation discovery, we uncovered 1,836 recurrently mutated CREs containing leukemia-associated non-coding variants. By enhanced CRISPR/dCas9-based CRE perturbation screening and functional analyses, we identified 218 variant-associated oncogenic or tumor suppressive CREs in human leukemia. Non-coding variants at KRAS and PER2 enhancers reside in proximity to nuclear receptor (NR) binding regions and modulate transcriptional activities in response to NR signaling in leukemia cells. NR binding sites frequently co-localize with non-coding variants across cancer types. Hence, recurrent non-coding variants connect enhancer dysregulation with nuclear receptor signaling in hematopoietic malignancies.
Keywords: Non-Coding Variants, Enhancers, Epigenetics, Leukemia, Nuclear Receptor
INTRODUCTION
Advances in genome sequencing have provided critical insights into the role of pathogenic DNA alterations as cancer drivers. However, current efforts are focused on protein-coding sequences (or exomes), which constitute only about 1% of the human genome. It remains unclear how alterations within the non-coding genome contribute to cancer pathophysiology. Similarly, genome-wide genotype-phenotype association studies continue to reveal non-coding sequences that are altered in human diseases, although identification of causal elements remains difficult, impeding drug development and therapeutics.
Enhancers are non-coding cis-regulatory elements (CREs) that determine cell identity by coordinating spatiotemporal gene expression. Major progress has been made to identify candidate enhancers by genome-wide mapping of enhancer-associated histone modifications including H3-Lys27 acetylation (H3K27ac) and H3-Lys4 mono-methylation (H3K4me1), transcription factors (TFs), and chromatin features (1–4). The biological significance of enhancers is underscored by gene expression studies showing the deterministic role of enhancers in directing cell-type-specific transcription (2,5–7). Highly marked clusters of enhancers or super-enhancers associated with developmental or cancer-related genes have been identified in various cell types (8–11). These studies suggest a model that a small set of lineage-defining enhancers determine cell identity in development and cancer.
Emerging evidence points to a critical role of non-coding regulatory variants as cancer drivers (12,13). For instance, recurrent mutations in the TERT promoter create new binding motifs for ETS family TFs to enhance gene transcription in familial and sporadic melanoma (14,15). In the context of leukemia, recurrent non-coding mutations upstream of the TAL1 proto-oncogene introduce de novo binding sites for the MYB oncoprotein in T-cell leukemia. MYB associates with these sites with CBP, RUNX1 and TAL1 to create an oncogenic super-enhancer and activates TAL1 expression (16). In another example, a single chromosomal rearrangement repositions a GATA2 enhancer to ectopically activate EVI1 and confer GATA2 haploinsufficiency in inv(3)/t(3;3) acute myeloid leukemia (AML) (17). These studies raise important questions about the extent to which non-coding variants contribute to cancer development, and how they work. Moreover, the functional impact of non-coding somatic mutations found in cancer genome sequencing has not been systematically investigated or validated. The main challenges are to distinguish candidate non-coding cancer drivers from numerous non-functional passenger mutations and to establish causality of non-coding variants in cancer pathophysiology.
Human nuclear receptors (NRs) are a family of signaling-activated TFs that play integral roles in development and cancer (18). The peroxisome proliferator-activated receptors (PPARα, β/δ, and γ) function as obligate heterodimers with the retinoid X receptors (RXRα, β, and γ) in the absence of ligands and bind to hormone response elements together with corepressor proteins. The retinoic acid receptors (RARα, β, and γ) also heterodimerize with RXRs and bind constitutively to DNA in the absence of ligands. Binding of agonist ligands to PPAR/RXR or RAR/RXR complexes leads to dissociation of corepressors and recruitment of coactivator proteins, resulting in transcriptional activation of downstream target genes. The ability of NRs to rapidly and dynamically respond to various developmental and environmental cues by modulating gene programs makes them versatile cellular ‘sensors’. As such, NRs have historically served as biomarkers for classification of several solid tumors, including breast and prostate cancers, and as targets for hormone therapy (19). In the hematopoietic system, the fusion oncogene PML-RARA provides a diagnostic biomarker for acute promyelocytic leukemia (APL) and the molecular basis for oncoprotein-targeted therapy (20).
In this work, we developed an integrative approach to identify recurrent non-coding variants by targeted resequencing of cis-regulatory elements in human leukemia. A major limiting step towards understanding non-coding cancer variants is the lack of a high-throughput platform to assess their functional effects. To that end, we employed enhanced dCas9-based epigenetic editing and performed CRE perturbation screens. This revealed a cohort of variant-associated oncogenic and tumor suppressive CREs, and established a new molecular link between non-coding regulatory variants and nuclear receptor signaling in modifying gene programs in hematopoietic malignancies.
RESULTS
Identification of non-coding DNA alterations by targeted CRE resequencing
To determine whether human leukemia genomes harbor recurrent non-coding alterations, we generated mutational landscapes of CREs by targeted resequencing (Fig. 1A; Fig. S1). We first annotated hematopoietic lineage-associated CREs based on genome-wide profiling of active enhancer or promoter-associated H3K27ac in various normal and diseased hematopoietic cell types, including CD34+ hematopoietic stem/progenitor cells (HSPCs), lymphocytes, macrophages, monocytes, erythroblasts, lymphoma, acute lymphoblastic leukemia (ALL), and AML (total 72 ChIP-seq datasets for 51 normal and 21 disease samples, respectively; Table S1 and Fig. S1A,B). We then designed a target enrichment panel consisting of 25.2 Mb DNA sequences and 161,328 capture oligos for 22,262 blood cell-associated CREs and 86 leukemia-associated genes including DNMT3A, TET2, IDH2, and EZH2 (Table S2 and Fig. S1). Targeted resequencing was performed on 120 primary samples or leukemia cell lines consisting of 45 AML or myelodysplastic syndromes (MDS) samples, 24 lymphoma samples, 19 ALL samples, and 32 control samples (29 matched lymphocytes from AML or MDS samples, two CD34+ HSPCs from healthy donors and one mixed genomic DNA control; Table S3). The samples were sequenced to an average depth of 80x and 97% on-target enrichment.
A major challenge in non-coding genome sequencing studies has been the lack of a consensus approach for de novo mutation detection with maximized sensitivity and confidence. We sought to identify somatic non-coding variants including single nucleotide variants (SNVs) and insertions/deletions (INDELs) with a higher confidence, rather than to ascertain a larger number of possible mutations with maximal sensitivity. To this end, we integrated 5 commonly used mutation callers including GATK (21), MuTect (22), Strelka (23), Scalpel (24) and VarScan2 (25), and employed a combination of 4 callers (GATK, MuTect, Strelka and VarScan2) to identify somatic SNVs and another combination (GATK, Strelka, VarScan2 and Scalpel) for somatic INDELs in 29 tumor/normal paired AML samples, respectively (Figs. S1C and S2A). After filtering with RepeatMasker region, high-confidence (Tier I) somatic SNVs and INDELs were identified as those called by at least 2 of 4 mutation callers (Fig. S2A; Table S4). We next identified non-coding variants in all leukemia or lymphoma samples and cell lines by GATK (21) followed by removing common variants found in dbSNP database and/or RepeatMasker regions. The resulting variants were identified as Tier II somatic SNVs and INDELs (Fig. S2B; Table S4). The majority of the identified high-confidence somatic SNVs (76.7%) and INDELs (82.4%) have variant allele frequencies (VAF) less than 50% (Fig. S2C).
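The Tier I rule described above (a variant is retained when at least 2 of the 4 callers report it) can be illustrated with a minimal sketch. The function name, the variant key layout `(chrom, pos, ref, alt)`, and the toy calls are assumptions made for illustration; the RepeatMasker and dbSNP filtering steps are not reproduced here, and this is not the authors' pipeline.

```python
from collections import Counter

def consensus_calls(calls_by_caller, min_callers=2):
    """Return variants reported by at least `min_callers` callers.

    calls_by_caller maps a caller name to a set of variant keys
    (chrom, pos, ref, alt). Hypothetical helper for illustration.
    """
    support = Counter()
    for variants in calls_by_caller.values():
        # Each caller contributes at most one vote per variant.
        support.update(set(variants))
    return {v for v, n in support.items() if n >= min_callers}

# Toy calls (hypothetical, not taken from the study's data tables).
calls = {
    "GATK": {("chr12", 25538881, "C", "T"), ("chr1", 100, "A", "G")},
    "MuTect": {("chr12", 25538881, "C", "T")},
    "Strelka": {("chr2", 200, "G", "A")},
    "VarScan2": {("chr1", 100, "A", "G"), ("chr2", 300, "T", "C")},
}
tier1 = consensus_calls(calls, min_callers=2)
```

Raising `min_callers` trades sensitivity for confidence, which mirrors the stated preference for higher-confidence calls over maximal sensitivity.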
We applied the same mutation calling pipeline to the 86 leukemia-associated genes included in the target enrichment panel, and identified recurrent protein-coding gene mutations (Fig. S3A). The mutation rate of coding genes is comparable to or slightly higher than other published studies (Fig. S2D). By combining somatic variants in all leukemia and lymphoma samples, we identified 0 to 446 (median 75) coding mutations and 8 to 530 (median 99) non-coding mutations per megabase DNA in each sample, respectively. The frequencies of both coding and non-coding mutations were significantly higher in AML, lymphoma and ALL samples compared to normal HSPCs or the mixed genomic DNA control (Fig. S3A,B), but not significantly different between samples isolated from bone marrow (BM) and peripheral blood (PB), male and female, AML subtypes (FAB M0 to M4), or different age groups (Fig. S3C–E). By clustering analysis of all non-coding variants across the genome, we identified 4,076 mutational hotspots containing non-coding variants within 100bp of each other (Fig. 1B; see Methods).
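One simple way to group variants that lie within 100bp of each other is single-linkage chaining along each chromosome, sketched below. The exact clustering procedure is given in the Methods and is not reproduced here; the chaining rule, the minimum of two variants per hotspot, and the toy coordinates are assumptions for illustration only.

```python
from collections import defaultdict

def find_hotspots(variants, max_gap=100, min_variants=2):
    """Chain variants on the same chromosome whose neighbors are <= max_gap apart.

    variants: iterable of (chrom, pos). Returns (chrom, start, end, count) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in variants:
        by_chrom[chrom].append(pos)

    hotspots = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        cluster = [positions[0]]
        for pos in positions[1:]:
            if pos - cluster[-1] <= max_gap:
                cluster.append(pos)          # extend the current cluster
            else:
                if len(cluster) >= min_variants:
                    hotspots.append((chrom, cluster[0], cluster[-1], len(cluster)))
                cluster = [pos]              # start a new cluster
        if len(cluster) >= min_variants:
            hotspots.append((chrom, cluster[0], cluster[-1], len(cluster)))
    return hotspots

# Toy coordinates (hypothetical).
toy = [("chr1", 1000), ("chr1", 1050), ("chr1", 1140),
       ("chr1", 5000), ("chr2", 300), ("chr2", 450)]
hotspots = find_hotspots(toy)
```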
To identify recurrently mutated CREs harboring somatic non-coding variants, we mapped the identified SNVs and INDELs to the 22,262 H3K27ac-associated CREs. CREs containing SNVs in at least six independent leukemia samples or INDELs in at least three leukemia samples were identified as recurrently mutated CREs based on a binomial distribution test (Fig. S2E; see Methods). We reasoned that multiple independent non-coding variants may impact the same CRE by affecting the binding sites (or motifs) of TFs in each CRE. Thus, we required the same CRE to be recurrently mutated in independent samples, whereas the non-coding variants can be identical or different within the same CRE. By this analysis, we identified 1,836 recurrently mutated CREs containing leukemia-associated non-coding variants (Fig. S2E). The gene targets and functional roles of the vast majority of the identified variant-associated CREs in leukemia pathophysiology remain unknown.
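The sample-count criterion above (SNVs in at least six independent samples, or INDELs in at least three) can be sketched as a counting step over CRE intervals. The binomial test that motivates these cutoffs is described in the Methods and is not reproduced; the data layout, identifiers such as `CRE_A`, and the toy variants are hypothetical.

```python
from collections import defaultdict

def recurrent_cres(cres, variants, min_snv_samples=6, min_indel_samples=3):
    """Return IDs of CREs mutated in enough independent samples.

    cres: {cre_id: (chrom, start, end)}, inclusive coordinates.
    variants: iterable of (sample, chrom, pos, kind), kind in {"SNV", "INDEL"}.
    Variants may differ within a CRE; only distinct samples are counted.
    """
    samples = defaultdict(lambda: {"SNV": set(), "INDEL": set()})
    for sample, chrom, pos, kind in variants:
        for cre_id, (c, start, end) in cres.items():
            if c == chrom and start <= pos <= end:
                samples[cre_id][kind].add(sample)
    return sorted(
        cre_id for cre_id, s in samples.items()
        if len(s["SNV"]) >= min_snv_samples or len(s["INDEL"]) >= min_indel_samples
    )

# Toy input (hypothetical identifiers and coordinates).
toy_cres = {"CRE_A": ("chr1", 100, 500), "CRE_B": ("chr1", 1000, 1500)}
toy_variants = [
    ("S1", "chr1", 120, "INDEL"), ("S2", "chr1", 300, "INDEL"),
    ("S3", "chr1", 480, "INDEL"),
    ("S1", "chr1", 1100, "SNV"), ("S1", "chr1", 1200, "SNV"),
    ("S2", "chr1", 1300, "SNV"),
]
recurrent = recurrent_cres(toy_cres, toy_variants)
```

The linear scan over CREs is adequate for a toy input; an interval index would be the natural choice at the scale of 22,262 CREs.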
High-throughput functional analysis of recurrently mutated CREs in leukemia
A major limiting step towards understanding non-coding variants in cancer pathophysiology is the lack of a high-throughput and unbiased approach to assess their functional effects. To overcome this challenge, we recently engineered enhanced CRISPR/dCas9-based epigenetic perturbation systems, enCRISPRi and enCRISPRa, for targeted modulation of CRE activities in native chromatin in situ and in vivo (Fig. 2A) (26). Specifically, we employed the sgRNA design containing two MS2 hairpins recognized by the MCP RNA-binding domains (27). For CRE inhibition (enCRISPRi), we fused dCas9 with LSD1 (or KDM1A), which catalyzes the removal of H3-Lys4 mono- and di-methylation (H3K4me1/2). We next engineered CRE-targeting sgRNAs containing MS2 hairpins to recruit the MCP-KRAB repressor domains. For CRE activation (enCRISPRa), we fused dCas9 with the core domain of p300, which catalyzes H3K27ac, together with sgRNA-MS2 to recruit the MCP-VP64 activator domains. Since H3K27ac and H3K4me1/2 are the signature histone marks for active enhancers and/or promoters (2,28), by inducible expression of dCas9-LSD1 (or dCas9-p300), sgRNA-MS2, and MCP-KRAB (or MCP-VP64), we engineered a CRE-targeting dual repressor or activator system (Fig. 2A) (26).
To investigate the functional role of the recurrently mutated CREs in human leukemia, we performed enCRISPRi and enCRISPRa-mediated perturbation screens in an established human AML cell line MKPL-1 (Fig. 2A). We first designed a pooled sgRNA library containing 4 or 5 individual sgRNAs per CRE (total 9,224 sgRNAs for 1,836 recurrently mutated CREs) and 198 non-targeting sgRNAs as negative controls (Fig. S4A; Tables S5 and S6). We then generated AML cell lines stably expressing Dox-inducible enCRISPRi or enCRISPRa transgenes (Fig. 2B). Upon lentiviral transduction of pooled sgRNAs at MOI < 0.5, the transduced cells were selected, induced for enCRISPRi or enCRISPRa expression, and cultured for 28 days before harvesting the genomic DNA. The sgRNA sequences were PCR amplified and measured by next-generation sequencing to determine the enrichment or dropout in day 28 (T28) relative to day 0 (T0) cells. We performed independent enCRISPRi and enCRISPRa perturbation screens (each with 3 replicate experiments) using the same pooled sgRNAs, respectively. The results from replicate screens were highly consistent (Fig. 2C and Fig. S4B–D). Moreover, the negative control sgRNAs displayed no or minimal dropout or enrichment relative to CRE-targeting sgRNAs in enCRISPRi and enCRISPRa screens (P = 0.69 and 0.472 by Welch’s t-test, respectively; Fig. S4E,F).
We reasoned that CREs showing strong leukemia cell growth inhibition (e.g. sgRNA dropout in T28 relative to T0) upon enCRISPRi-mediated repression and growth enhancement (e.g. sgRNA enrichment in T28 relative to T0) upon enCRISPRa-mediated activation should nominate candidate ‘oncogenic’ CREs. Likewise, CREs showing strong growth enhancement upon enCRISPRi and growth inhibition upon enCRISPRa should nominate candidate ‘tumor suppressive’ CREs. To this end, we combined results from enCRISPRi and enCRISPRa perturbation screens of 1,836 recurrently mutated CREs, and identified 131 candidate oncogenic CREs with significant sgRNA dropout by enCRISPRi (log2 fold change ≤ −1.5 in T28 relative to T0) and sgRNA enrichment by enCRISPRa (Figs. 2D and S4E,F; Table S7). Similarly, we identified 87 candidate tumor suppressive CREs, which displayed significant sgRNA dropout by enCRISPRa (log2 fold change ≤ −1.5 in T28 relative to T0) and sgRNA enrichment by enCRISPRi (Figs. 2D and S4E,F; Table S7). We next categorized all CREs into annotated promoters and enhancers, and identified the candidate oncogenic or tumor suppressive promoters and enhancers, respectively (Fig. S5A,B; Table S7). The candidate oncogenic or tumor suppressive promoters include several known leukemia-associated genes such as RUNX1, TAL1 and MSI2 (Fig. S5A). Importantly, while several candidate enhancers locate in the proximity of known leukemia-associated genes such as MYB and BCAT1, the vast majority of their gene targets are unknown, and their functional and mechanistic roles in leukemia remain unexplored. In addition, the oncogenic CRE-associated genes displayed strong dependency by CRISPR/Cas9-based screens in multiple AML cell lines (Fig. S6) (29). 
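The combined decision rule for the two screens can be written compactly. In this sketch the dropout cutoff (log2 fold change ≤ −1.5) comes from the text, whereas the enrichment cutoff (log2 fold change > 0) and the omission of any significance testing are simplifying assumptions; the function and example values are hypothetical.

```python
def classify_cre(lfc_crispri, lfc_crispra, dropout=-1.5, enrich=0.0):
    """Classify a CRE from its T28-vs-T0 log2 fold changes in both screens.

    dropout: cutoff stated in the text; enrich: assumed cutoff for enrichment.
    """
    if lfc_crispri <= dropout and lfc_crispra > enrich:
        # Repression slows growth and activation promotes it.
        return "oncogenic"
    if lfc_crispra <= dropout and lfc_crispri > enrich:
        # Activation slows growth and repression promotes it.
        return "tumor suppressive"
    return "unclassified"

# Hypothetical per-CRE values: (enCRISPRi log2FC, enCRISPRa log2FC).
examples = {
    "CRE_A": (-2.1, 0.8),
    "CRE_B": (0.6, -1.7),
    "CRE_C": (-0.3, 0.2),
}
labels = {cre: classify_cre(i, a) for cre, (i, a) in examples.items()}
```

Requiring opposite effects in the two screens is what distinguishes this rule from a single-screen dropout analysis: a CRE with dropout in both screens remains unclassified.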
By comparing the identified 218 functional CREs (131 oncogenic and 87 tumor suppressive) with CREs containing non-coding mutational hotspots, we observed that 82 of 218 (37.6%) functional CREs overlapped with hotspot-containing CREs, indicating that the functional CREs are enriched with non-coding mutational hotspots (P = 4.0e-15 by hypergeometric distribution; Fig. 2E).
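The enrichment P value reported here is an upper-tail hypergeometric probability, which can be computed exactly with integer arithmetic as sketched below. The total of 1,836 screened CREs, the 218 functional CREs, and the overlap of 82 come from the text; the number of hotspot-containing CREs among the screened set is not stated in this section, so `n_hotspot_cres` is a placeholder assumption and the resulting value is illustrative only.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)

n_screened = 1836        # recurrently mutated CREs screened (from the text)
n_functional = 218       # functional CREs (from the text)
n_overlap = 82           # functional CREs overlapping hotspots (from the text)
n_hotspot_cres = 400     # ASSUMPTION: placeholder, not reported in this section

p_value = hypergeom_upper_tail(n_overlap, n_screened, n_hotspot_cres, n_functional)
```

With the actual count of hotspot-containing CREs substituted for the placeholder, the same function gives the tail probability for the observed overlap.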
To validate the efficacy of enCRISPRi and enCRISPRa-mediated CRE perturbations on target gene expression, we cloned individual sgRNAs for 14 CREs including 5 candidate oncogenic CREs, 4 tumor suppressive CREs, and 5 other CREs (see Methods; Table S5). Upon co-expression of enCRISPRi or enCRISPRa with individual CRE-specific or non-targeting (sgGal4) sgRNAs in MKPL-1 cells, we noted that enCRISPRi resulted in significant downregulation of 13 of 14 genes relative to sgGal4 (≥ 2-fold, P < 0.05; Figs. 2F and S7A–C). Similarly, enCRISPRa resulted in significant upregulation of 12 of 14 genes (≥ 2-fold, P < 0.05). Importantly, we also observed significant gene repression or activation when targeting enCRISPRi or enCRISPRa to 4 of 5 other CREs (Figs. 2F and S7C), whereas no significant sgRNA enrichment or dropout was observed in the enCRISPRi/a screens (Fig. 2D). These results not only provide direct evidence to validate the efficacy of enCRISPRi/a in transcriptional repression/activation of CRE-associated genes, but also demonstrate that the lack of functional effect on AML cell growth (e.g. sgRNA enrichment or dropout) was not due to insufficient CRE repression/activation. To determine the efficacy of enCRISPRi/a across cell lines, we performed similar analyses in MOLM-13 AML and Jurkat ALL cells. We found that 13 of 14 genes were significantly repressed in AML (MKPL-1 and MOLM13) and ALL (Jurkat) cells by enCRISPRi (≥ 2-fold, P < 0.05), and 10 of 14 genes were significantly activated in AML and ALL cells by enCRISPRa (≥ 2-fold, P < 0.05) (Figs. 2F and S7A–C). By comparing the gene expression changes of all 14 CRE perturbations in AML (MKPL-1) and ALL (Jurkat) cells, we observed a significant and positive correlation of the targeted CREs (Pearson correlation coefficient R = 0.844, P < 0.0001; Fig. 2G). These results demonstrate that enCRISPRi and enCRISPRa resulted in consistent repression or activation of CRE-associated genes in both AML and ALL cells.
An enhancer controlling the KRAS proto-oncogene harbors a non-coding variant in leukemia
By enCRISPRi and enCRISPRa perturbation screens, we identified CRE4399, located ~135kb upstream of KRAS gene, as one of the variant-associated oncogenic enhancers (Figs. 2D and S5B). KRAS is a proto-oncogene that encodes a member of the small GTPase superfamily. Activating mutations in KRAS are common in solid tumors but relatively rare in hematologic malignancies (30,31); however, KRAS expression is increased and associates with poor survival in AML patients (Fig. S8A,B). CRE4399 displays strong chromatin accessibility by DNase I hypersensitivity (DHS), ATAC-seq and enrichment of H3K27ac signals, and interacts with the KRAS promoter through DNA looping by RNA Pol II (RNAPII)-mediated ChIA-PET or Hi-C analyses in normal CD34+ HSPCs and leukemia cell lines (K562 and THP1) (Figs. 3A and S8C), consistent with a putative enhancer element for KRAS. One of the identified non-coding variants locates at chr12:25538881:C>T within CRE4399 in an AML sample. To establish the functional requirement of variant-associated CRE4399 in KRAS gene regulation, we first performed CRISPR/Cas9-mediated knockout (KO) of CRE4399 in MKPL-1 cells (Fig. S10A,B). We found that the expression of KRAS mRNA was significantly impaired in multiple single-cell-derived CRE4399 KO clones, whereas the neighboring genes within the same topologically associating domain (TAD) were not affected (Figs. 3B and S10A,B). These results demonstrate that CRE4399 functions as a gene-distal enhancer for KRAS and is required for KRAS expression in AML cells.
We next determined the requirement of CRE4399 for AML cell growth by Dox-inducible enCRISPRi-mediated negative selection competition analysis in MKPL-1 cells (Figs. 3C and S8D; see Methods). We found that inducible repression of the KRAS enhancer (CRE4399) by independent sgRNAs resulted in consistent and significant inhibition of AML cell growth in vitro. Finally, we assessed the functional role of the KRAS enhancer in AML growth in vivo using xenotransplantation assays (Fig. 3D). As a positive control, KO of KRAS coding DNA sequences (KRAS KO) significantly impaired MKPL-1 cell growth in xenotransplanted NSG (NOD-scid IL2Rgnull) mice (Fig. 3D,E), consistent with the oncogenic role of KRAS in AML. Importantly, KO of the KRAS enhancer (two independent single-cell-derived CRE4399 KO1 and KO2 clones) also impaired AML growth, resulting in less tumor burden with significantly decreased bioluminescence signals of the luciferase-expressing AML cells 4 weeks post-transplantation (Fig. 3D,E). Compared to WT control cells, KRAS enhancer KO cells led to significantly less leukemic burden in bone marrow and peripheral blood of xenotransplanted recipients by flow cytometry and blood smear analyses (Fig. 3F,G). These results demonstrate that KRAS and its variant-associated enhancer are required for AML cell growth in vitro and in vivo.
KRAS enhancer variants co-localize with nuclear receptor binding sites
Non-coding variants can interfere with transcriptional regulation by modulating the DNA binding activity of TFs. By surveying candidate TF binding sites (or motifs) within or near the KRAS enhancer-associated variant, we identified several motifs for nuclear receptor (NR) family TFs including PPARG (PPARγ) and RXRA (RXRα) that co-localize with the KRAS enhancer variant (Fig. 3H). The canonical sequence of the PPAR response element (PPRE) is composed of two AGGTCA-like motifs directionally aligned with a single nucleotide spacer. When heterodimerized with RXRA, the PPARG:RXRA motif may contact two half sites from PPARG and RXRA, respectively. It is important to note that PPARs may recognize a 12-bp DNA sequence (WAWVTRGGBBAH) instead of the generally accepted 6-bp sequence (AGGTCA) in vitro and in vivo (32). The optimized RXRA hexad binding sequence is RGKTYA, which makes the optimal PPAR/RXRA binding sequence WAWVTRGGBBAHRGKTYA (32). The predicted PPARG:RXRA motif at the KRAS enhancer appears to match the optimized PPRE with the AML-associated non-coding variant overlapping with one of the half-sites (Fig. S8E). We next determined whether the KRAS enhancer harbors recurrent mutations in other leukemia samples or other types of human cancers. By analyzing the targeted CRE sequencing and whole genome sequencing (WGS) datasets from ICGC (http://icgc.org), COSMIC (33) and cBioPortal (34), we identified 13 additional variants at the KRAS enhancer in 18 individual cancer samples including T-ALL, breast, lung, and pancreatic cancers (Figs. 3I and S8E; Table S8). Importantly, 8 of 14 unique mutation sites at the KRAS enhancer locate in the proximity (+/− 10bp) of binding sites for PPARG/RXRA and/or PPARs (Fig. 3I and S8E), suggesting that non-coding variant-mediated alterations of NR binding at the KRAS enhancer may be a generalizable mechanism in cancer cells.
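The degenerate consensus WAWVTRGGBBAHRGKTYA uses standard IUPAC nucleotide codes, so it can be expanded into a regular expression and scanned against a sequence, as in the sketch below. This is an exact-match, forward-strand scan for illustration; motif discovery in practice typically relies on position weight matrices and both strands. The toy sequence is constructed to contain one match and is not the KRAS enhancer sequence.

```python
import re

# Standard IUPAC nucleotide codes used by the consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "W": "[AT]",
    "B": "[CGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif):
    """Translate an IUPAC consensus into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif(sequence, motif):
    """Return 0-based start positions of all (overlapping) forward-strand matches."""
    # A lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

PPRE = "WAWVTRGGBBAHRGKTYA"               # consensus quoted in the text (32)
toy_seq = "GGGAATCTAGGCCAAAGGTCATTT"      # hypothetical sequence with one match
hits = find_motif(toy_seq, PPRE)
```

A single-base change inside a fixed position of the consensus (for example the central GG) removes the match, which is the intuition behind a variant in a half-site altering predicted NR binding.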
PER2, a clock gene, is controlled by a distal enhancer in leukemia cells
By CRE perturbation screens, we also discovered that CRE12661 near the PER2 gene was among the candidate tumor suppressive enhancers in AML cells (Figs. 2D and S5B). PER2 acts as a negative regulator of CLOCK and BMAL1 genes in controlling circadian rhythm (35). Notably, Per2 mutant mice are deficient in circadian clock function and are unusually cancer prone (36). These mice show a neoplastic growth phenotype and an increased sensitivity to γ radiation, indicating a tumor suppressive role. Lowered PER2 expression is common in many tumors including AML (37,38) and is associated with poor survival (Fig. S9A,B), whereas forced expression of PER2 leads to growth inhibition and apoptosis of cancer cells (37). Emerging evidence in several in vivo models also points to important roles of other core circadian genes in tumorigenesis (39,40). Disruption of the circadian machinery leads to changes in cellular function such as cell division and metabolism, both highly relevant to cancer. Despite these findings, the function and regulation of PER2 in human leukemia remain poorly understood.
Located ~11kb downstream of the PER2 gene, CRE12661 displays strong chromatin accessibility and H3K27ac, and interacts with the PER2 promoter by DNA looping in normal CD34+ HSPCs and leukemia cell lines (Figs. 4A and S9C), consistent with a putative gene-distal enhancer for PER2. Using similar approaches, we found that CRISPR/Cas9-mediated CRE12661 KO resulted in significant downregulation of PER2 without affecting neighboring genes within the same TAD in MKPL-1 cells (Figs. 4B and S10C,D). Moreover, inducible enCRISPRa-mediated activation of CRE12661 resulted in consistent and significant inhibition of MKPL-1 cell growth in vitro (Figs. 4C and S9D), consistent with the tumor suppressive function of PER2 enhancer in leukemia. To establish the requirement of PER2 enhancer in leukemia development in vivo, we xenografted CRE12661 KO MKPL-1 cells (two independent single-cell-derived KO1 and KO2 clones) together with PER2 overexpression (OE) cells as controls. We observed that CRE12661 KO significantly promoted AML growth, tumor burden, and leukemic phenotypes in xenotransplanted NSG mice, whereas PER2 OE significantly inhibited leukemia development in vivo (Fig. 4D–G). Together, these results not only identify a variant-associated gene-distal enhancer for PER2, but also demonstrate that PER2 and its enhancer play a tumor suppressive role in AML cells.
Non-coding variants at a PER2 enhancer co-localize with NR binding sites
By targeted CRE sequencing, we identified two high-confidence non-coding variants within the PER2 enhancer (CRE12661) including chr2:239142998:A>CA in an AML sample. By surveying candidate TF binding sites near the PER2 enhancer variant, we also identified enrichment of PPARG:RXRA and PPARA:RXRA motifs within 10bp of the variant (Figs. 4H and S9E). The predicted PPARG:RXRA motif at the PER2 enhancer contains only one conserved half-site that matches the canonical PPRE, suggesting that it may be a non-canonical PPRE or a weak canonical PPRE (41). Of note, the identified AML-associated variant (chr2:239142998:A>CA) at the PER2 enhancer locates 6bp upstream of the predicted PPARG:RXRA motif (Fig. S11A). The 5’ flanking sequences at PPAR/RXR binding sites have also been shown to be essential for PPAR binding and NR subtype specificity (42–44). Therefore, the AML-associated non-coding variant at the PER2 enhancer may also interfere with NR function by affecting the 5’ flanking sequence of the PPARG/RXRA binding site. Moreover, analysis of the targeted CRE sequencing and WGS datasets from ICGC, COSMIC and cBioPortal revealed 10 additional non-coding variants at the PER2 enhancer in 15 individual cancer samples (Fig. 4I; Table S8). Importantly, 9 of 11 unique mutation sites at the PER2 enhancer locate in the proximity (+/− 10bp) of binding sites for PPARG/RXRA and/or PPARs (Figs. 4I and S9E), consistent with a broad role for non-coding variants in modulating NR binding at the PER2 enhancer in cancer cells.
Non-coding variants modulate NR-mediated transcriptional activity at KRAS and PER2 enhancers
To directly assess whether the non-coding variants impair NR-dependent transcriptional activities at KRAS and PER2 enhancers, we performed enhancer reporter assays in 3 independent cell types including AML (MKPL-1), erythroleukemia (K562) and 293T (Figs. 5A–C and S11B–E). Compared to the no enhancer control, wild-type (WT) KRAS enhancer sequence significantly activated luciferase reporter expression in all 3 cell types (3.0 to 7.9-fold, P < 0.01; Fig. S11B). The leukemia variant-containing KRAS enhancer sequence (MUT) further increased reporter expression (6.1 to 15.4-fold relative to control, P < 0.01; 1.4 to 2.2-fold relative to WT, P < 0.01). Importantly, activation of RXRs and RARs by 9-cis-RA (45), PPARs by bezafibrate (46) or PPARγ (PPARG) by Rosiglitazone (47), but not PPARα or PPARβ (fenofibrate and GW0742), further enhanced WT KRAS enhancer activity (2.0 to 3.2-fold relative to DMSO, P < 0.001; Figs. 5B and S11D), consistent with the presence of PPRE at the KRAS enhancer. Both PPARG and RXRA are expressed in 3 cell types (Fig. S11F). These results suggest that the KRAS enhancer contains a functional PPRE and PPARG/RXRA can activate KRAS enhancer upon ligand-induced NR signaling. More importantly, KRAS mutant enhancer displayed increased enhancer activity upon activation of RXR/RAR and PPARG signaling (4.4 to 5.8-fold relative to DMSO, P < 0.001; Figs. 5B and S11D), whereas depletion of PPARG and RXRA significantly impaired NR-mediated activation of KRAS WT and MUT enhancers in AML cells (Fig. S11G–I), suggesting that the non-coding variant at the KRAS enhancer increased both the basal and NR-induced enhancer activities in leukemia cells. Consistent with these findings, we found by ChIP-seq and ChIP-qPCR that RXRA and PPARG strongly associate with the endogenous KRAS enhancer in multiple human cancer cell types including MKPL-1 AML cells (Fig. 5D–G).
To test whether the non-coding variants indeed enhance NR binding at the KRAS enhancer, we performed a series of electrophoretic mobility shift assay (EMSA) experiments (Fig. 5H). First, we confirmed the binding of PPARG and RXRA individually or together using the validated PPARG:RXRA motif-containing DNA probe (Fig. S11J; Table S5). Second, we compared the binding affinity of PPARG/RXRA to the WT or MUT CRE4399 DNA sequence, and observed a modest but significant increase in PPARG/RXRA binding at the MUT sequence (Figs. 5H and S11K). Third, the PPARG/RXRA-binding probe, but not the non-specific competitor, effectively abolished protein binding to both WT and MUT sequences, indicating specific binding of PPARG/RXRA proteins. Finally, PPARG and RXRA antibodies further increased the mobility shift of protein-DNA complexes (supershift; Fig. 5H), suggesting that binding to the KRAS enhancer sequence involves PPARG/RXRA proteins. A significant increase in supershift was also observed at the MUT relative to WT sequence (Figs. 5H and S11K). Together, these results support a model that non-coding variant at KRAS enhancer functions to increase the DNA binding of NR proteins and enhance NR-mediated transcriptional activation of KRAS expression in leukemia cells.
We next investigated the role of AML-associated variant (chr2:239142998:A>CA) at the PER2 enhancer with or without NR agonists. PER2 WT enhancer sequence activated luciferase reporter expression by 2.3 to 35.3-fold, which was significantly impaired by the variant-containing MUT enhancer sequence (Fig. S11C). Activation of RXR/RAR by 9-cis-RA, PPARs by bezafibrate, or PPARG by Rosiglitazone further enhanced WT PER2 enhancer activity by 3.9- to 4.8-fold; however, the variant-containing MUT enhancer displayed decreased activity (Figs. 5C and S11E). Depletion of PPARG and RXRA significantly impaired the PER2 WT and MUT enhancer activity (Fig. S11I), suggesting that the PER2 enhancer variant attenuates both the basal and NR-induced enhancer activities. RXRA and PPARG strongly associated with the PER2 enhancer in cancer cell types including AML cells (Fig. 5E–G). By a series of EMSA experiments, we observed that the PER2 MUT enhancer sequence significantly impaired the binding of PPARG and RXRA proteins (Figs. 5I and S11K). Similar to the KRAS enhancer, PPARG/RXRA binding to the PER2 enhancer was effectively abolished by the PPARG/RXRA-binding probe and supershifted by PPARG/RXRA antibodies. Together, these results support a model that the non-coding variant at the PER2 enhancer functions to impair NR-mediated PER2 activation in leukemia.
Knock-in of non-coding variants modulates KRAS and PER2 enhancer activity in situ
Enhancers function by recruiting sequence-specific TFs, co-regulators and chromatin factors in native chromatin to control spatiotemporal gene transcription. We next determined whether the leukemia-associated variants impact endogenous enhancer activity by site-specific knock-in (KI) of non-coding variants to the wild-type enhancer sequences in leukemia cells. To this end, we designed sgRNAs targeting the endogenous KRAS (CRE4399) or PER2 (CRE12661) enhancer sequence, respectively. We next co-expressed the Cas9 nuclease, the KRAS or PER2 enhancer-targeting sgRNA, and the single-strand donor oligo (ssDNA donor) containing the targeted variant and a PAM site mutation, which rendered the targeted KI allele resistant to Cas9 cleavage, in human K562 leukemia cells (Fig. 6A). Upon Cas9/sgRNA-mediated cleavage of the targeted WT enhancer DNA, the KI allele containing the non-coding variant was introduced by homology directed repair (HDR).
We generated two independent single-cell-derived KI cell lines (KI-Mut1 and KI-Mut2), each carrying the non-coding variant at the KRAS (CRE4399) or PER2 (CRE12661) enhancer sequence as a monoallelic KI, respectively (Fig. S12A,B). We first determined the effects of enhancer variants on the baseline and NR-induced gene expression. Compared to the unmodified WT cells, KI of the KRAS enhancer variant (chr12:25538881:C>T) increased the baseline KRAS expression in DMSO-treated KI-Mut cells (2.7- and 2.2-fold, P < 0.001; Fig. 6B). Upon activation of NR signaling by treatment with 1 μM 9-cis-RA and 5 μM Rosiglitazone, the expression of KRAS was further increased in KI-Mut relative to WT cells (5.5- and 5.1-fold, P < 0.001). These results suggest that the leukemia-associated non-coding variant functions to enhance NR-mediated transcriptional activation of the KRAS enhancer in leukemia cells. By contrast, KI of the PER2 enhancer variant (chr2:239142998:A>CA) led to modest but significant downregulation of the baseline and NR-induced PER2 expression (Fig. 6C), consistent with the role of the PER2 enhancer variant in attenuating its activity in leukemia. We next determined the effects of enhancer variants on the binding of nuclear receptors in native chromatin by ChIP assays using antibodies for PPARG and RXRA in KI-Mut cells. The NR-immunoprecipitated and input control DNA were quantified for allele frequency by amplicon sequencing. Compared to input DNA, PPARG/RXRA ChIP DNA was significantly enriched for KI-Mut relative to WT alleles in two independent KI cell lines (Fig. 6D), suggesting that the KRAS enhancer variant enhanced NR chromatin occupancy in leukemia cells. By contrast, the PER2 KI-Mut allele was significantly depleted relative to the WT allele in PPARG/RXRA ChIP DNA (Fig. 6E), suggesting that the PER2 enhancer variant impaired NR chromatin occupancy in leukemia cells.
Therefore, by site-specific KI to endogenous enhancer sequences, our findings demonstrate that the leukemia-associated non-coding variants modulate NR chromatin occupancy at KRAS and PER2 enhancers in situ, resulting in altered enhancer function and aberrant expression of KRAS and PER2 in leukemia cells.
We next determined the effect of KRAS or PER2 enhancer KI on leukemia development by xenotransplantation of WT or KI cells into NSG mice. We found that KRAS enhancer KI enhanced leukemogenesis, resulting in significantly increased leukemia burden in the peripheral blood of recipients (Fig. S12C). These results are consistent with the oncogenic role of the KRAS enhancer variant in AML by enhancing NR-mediated activation of the proto-oncogene KRAS. Similarly, we noted enhanced leukemia development by PER2 enhancer KI (Fig. S12D), consistent with the function of the PER2 enhancer variant in leukemia by impairing NR-mediated activation of the tumor suppressor PER2. Finally, we performed motif enrichment analysis of DNA sequences surrounding the non-coding variants (±25 bp) at the functionally validated oncogenic CREs from enCRISPRi and enCRISPRa screens (Fig. 2D). We identified PPAR, RXRA and RARA as the top enriched NR motifs relative to genomic background (Fig. 6F), suggesting that alterations of NR transcriptional activity by non-coding variants are frequent events in human leukemia. Our analyses of two examples (KRAS and PER2) strongly support the context-specific function of non-coding variants that may confer either loss- or gain-of-function activity (Fig. 6G). Importantly, loss- and gain-of-function activities associated with enhancer variants can be explained by interference with the DNA binding of NRs and the signaling-induced transcriptional activity for at least a subset of affected CREs in vitro and in native chromatin. Taken together, these studies support a model in which pathogenic non-coding variants cooperate with nuclear receptors to rewire signaling-dependent gene programs in cancer.
DISCUSSION
Non-coding variants in cancer pathophysiology
Compared to coding mutations that often contribute to cancer development through ‘qualitative’ changes in protein expression, structure or function, non-coding variants display several unique features including: 1) ‘quantitative’ control of gene expression through modulation of cis-regulatory elements and/or 3D genome organization; 2) integration with gene transcription through modulation of DNA binding of TFs, chromatin modifying complexes and/or structural proteins; and 3) interaction with cellular signaling pathways and downstream effectors, as illustrated in this study. These features allow a more ‘plastic’ and reversible control of gene expression in response to altered growth and signaling pathways, which may be important for the maladaptation of cancer cells in tumor microenvironments.
Given that non-coding sequences constitute nearly 99% of the human genome, and that non-coding variants significantly outnumber coding variants, a major challenge is to assign functional relevance and prioritize non-coding variants for in-depth mechanistic studies. The development of the CRE-targeting epigenetic editing system allowed us to systematically interrogate the impact of CRE repression or activation on cell growth phenotypes, and prioritize functionally validated CREs in an orthogonal fashion with mutational analyses. One caveat of using cell growth phenotypes as the functional ‘readout’ is that CRE variants may affect other pathological features, such as cancer invasion and/or escape from immune surveillance, that are likely missed in growth-based studies. Integrative analyses of both CRE repression (enCRISPRi) and activation (enCRISPRa) screens provide higher confidence in establishing causality between CRE activity and growth phenotypes. It is reassuring that the top-ranked ‘oncogenic’ or ‘tumor suppressive’ CREs consist of promoters or enhancers in the proximity of several known leukemia-associated genes such as RUNX1, MYB, TAL1 and MSI2. More importantly, we also identified variant-associated enhancers for KRAS and PER2 with unexpected but strong effects on cell growth upon activation or repression. Overall, the mutational analyses (Fig. 1), CRE perturbation screens (Fig. 2), functional (Figs. 3,4) and molecular (Figs. 5,6) studies support the model that recurrent non-coding variants impinge on CREs and associated genetic pathways as critical regulatory nodes to control cancer cell phenotypes.
We noted heightened mutation frequency at non-coding mutational hotspots relative to protein-coding sequences or other genomic regions. The increased mutation frequency at the non-coding regulatory genome may be due to mechanisms associated with DNA accessibility, replication timing, chromatin organization and/or impaired nucleotide excision repair (48–51). The higher sequencing coverage in this study may also increase the detection sensitivity in heterogeneous tumor samples. It is also important to note the limitations of using the human reference genome in enhancer annotation and mutation analyses. For instance, genome rearrangements may cause aberrant CRE function by juxtaposition and/or re-organization of chromatin structure, and these aberrations can be missed by short read sequencing. In addition, it is possible that some of the highly recurrent non-coding mutations may be due to mis-annotation and/or non-representative sequences in the reference genome. Hence our studies highlight the importance of follow-up studies, such as the enCRISPRi and enCRISPRa screens described here, to identify ‘functionally’ relevant CREs or genes for detailed molecular studies.
KRAS and PER2 enhancer variants in hematopoietic malignancies
By demonstrating that the KRAS enhancer variant confers gain-of-function activity to promote KRAS expression in leukemia cells, our results provide a mechanism that may explain the lack of KRAS hotspot coding mutations compared to other RAS family proteins such as NRAS. NRAS hotspot coding mutations (e.g. G12D, G13D and Q61R) are prevalent in hematologic malignancies, whereas KRAS mutations are rarely found (30,31). The KrasG12D mutation was found to impair normal HSC self-renewal and differentiation of multiple hematopoietic lineages even in the heterozygous state, resulting in rapid and fatal myeloproliferative diseases in mice (52–55), suggesting that KRAS coding mutations may be detrimental to normal HSCs. By contrast, modest overexpression (~2-fold) of the wild-type KRAS protein in mice increased HSC proliferation and repopulating activity, and improved hematopoietic regeneration and survival following high-dose irradiation (56). Non-coding variants at the KRAS enhancer resulted in quantitative changes in KRAS expression without qualitative alterations in KRAS protein, which may be advantageous for the clonal progression of leukemia-initiating cells without impairing normal HSCs. Hence, the dosage regulation of proto-oncogenes through altered non-coding cis-elements may expand the current catalog of genetic alterations as cancer drivers.
Our studies focusing on KRAS and PER2 also support the context-dependent function of non-coding variants that may confer loss- or gain-of-function activity. Interestingly, alterations at both enhancers can be explained by the interference with nuclear receptors to enhance or impair NR binding. Since KRAS and PER2 variant-containing enhancers are located 135kb and 11kb away from the coding sequences, respectively, it remains challenging to assess how enhancer variants affect allele-specific gene transcription at native chromatin in response to NR signaling. Nonetheless, site-specific KI of enhancer variants resulted in alterations of KRAS or PER2 expression in leukemia cells, respectively. Moreover, motifs for NRs are highly enriched in variant-associated enhancers in leukemia and other cancers, suggesting that altered NR binding by non-coding variants may be a generalizable mechanism to deregulate proto-oncogenes or tumor suppressors. These findings contrast with coding mutations that usually function through distinct mechanisms for proto-oncogenes (gain-of-function) and tumor suppressors (loss-of-function). Rather, enhancer variants for proto-oncogenes or tumor suppressors may converge on the same regulatory pathways, such as those reported here, and represent a unique feature of pathogenic non-coding variants.
Non-coding variants link enhancer dysfunction with NR signaling
Non-coding regulatory variants may modulate NR function in human diseases by affecting NR expression or their chromatin binding. For instance, deregulated ER signaling due to non-coding variants at cis-elements controlling the estrogen receptor-α (ESR1) gene underlies the persistent expression of ESR1 in breast cancers (57). Natural genetic variations at PPARG binding sites modulate PPARG binding activity and subsequent expression of quantitative trait loci associated with disease risk and drug response in diabetes (58). Here, we uncover a new molecular link between NR signaling and non-coding variants through dysregulated enhancer function in human leukemia. In hematopoiesis, RXRA controls a set of genes required for HSC self-renewal, and its downregulation allows the differentiation of myeloid lineages (20). Non-coding variants may enhance or attenuate PPAR/RXR binding at a subset of enhancers (e.g. KRAS and PER2) to modulate NR-mediated transcriptional programs. Altered expression of NR target genes may impair HSC self-renewal and/or lineage differentiation to promote the development of hematological disorders. Moreover, physiological or therapeutic activation of NR signaling may further cooperate with variant-containing enhancers, as illustrated in this study, resulting in amplification of the effects associated with dysregulated transcriptional programs. Therefore, our results support a mechanism underlying deregulated NR signaling through interactions with non-coding regulatory variants in human cancer. Comprehensive categorization of non-coding variants may identify new diagnostic biomarkers that are independent of protein-coding genes. Pathogenic non-coding variants may also serve as potential therapeutic targets through genetic (e.g. enCRISPRa, enCRISPRi or base editing) or pharmacological (e.g. NR agonists and antagonists) approaches. 
Hence, our study provides an integrative framework to identify and functionally dissect non-coding regulatory variants in human diseases including cancer.
METHODS
Patient Samples
Primary human leukemia and lymphoma samples were collected for diagnosis and de-identified for our study. This study was approved by the Institutional Review Board at UT Southwestern (IRB STU 122013–023). Samples were frozen in fetal bovine serum (FBS) with 10% DMSO and stored in liquid nitrogen.
Cells and Cell Culture
Human MKPL-1, MOLM-13, K562, and Jurkat cells were cultured in RPMI1640 medium containing 10% FBS and 1% penicillin/streptomycin. 293T cells were cultured in DMEM medium containing 10% FBS and 1% penicillin/streptomycin. For agonist treatment, cells were cultured with 9-cis-RA (1 μM, Sigma-Aldrich #R4643), ATRA (1 μM, Sigma-Aldrich #R2625), Bezafibrate (60 μM, Sigma-Aldrich #B9273), Fenofibrate (30 μM, Sigma-Aldrich #F6020), GW0742 (0.05 μM, Sigma-Aldrich #G3295) or Rosiglitazone (5 μM, Sigma-Aldrich #R2408) for 48–72 h. All cultures were incubated at 37°C in 5% CO2. All cell lines were tested for mycoplasma contamination. No cell lines used in this study were found in the database of commonly misidentified cell lines maintained by ICLAC and NCBI BioSample.
Mice and Xenograft Experiments
NOD-SCID (NSG) mice were purchased from and maintained at the animal core facility of UT Southwestern. Animal studies were approved and conducted under the oversight of the Institutional Animal Care and Use Committee (IACUC) at UT Southwestern. In brief, a luciferase cassette was amplified and cloned into the pLVX-Puro vector (Clontech, Catalog No. 632164). Lentivirus was produced to infect the target cells. Puromycin selection (1 μg/ml) was performed 3 days after infection. Six- to eight-week-old female NSG mice were sub-lethally irradiated (2.5 Gy) half a day before transplantation. Cells (1 × 10^6 per mouse) were resuspended in PBS (200 μl per mouse) and intravenously transplanted. Transplanted mice underwent in vivo bioluminescence imaging at various time points to evaluate tumor growth. Briefly, following intraperitoneal injection of 150 mg/kg D-luciferin (Gold Biotechnology), mice were imaged, and bioluminescence intensity was quantitated using Living Image 3.2 acquisition and analysis software (Caliper Life Sciences). Total flux values were determined by drawing regions of interest (ROI) of identical size over each mouse and were presented in photons (p)/second (sec). PER2 and KRAS enhancer KO cells were analyzed side-by-side using the same control mice; thus, the same control bioluminescence images were used in Figs. 3D and 4D. Four weeks after transplantation, the peripheral blood and bone marrow cells were assessed for engraftment by flow cytometry. Blood smears were prepared and stained with May-Grunwald-Giemsa. For enhancer KI xenotransplantation, cells were transduced with pLVX-EF1a-IRES-zsGreen1 lentivirus and intravenously transplanted as described above, followed by intraperitoneal injections of 10 mg/kg rosiglitazone and 1 mg/kg 9-cis-RA twice a week for 6 weeks prior to peripheral blood collection and analysis.
Plasmids
To generate the inducible dCas9-p300 expression vector, the p300 core domain was amplified from the pcDNA-dCas9-p300-Core vector (Addgene, #61357) and cloned into MluI/BstXI digested pHR-TRE3G-KRAB-dCas9-P2A-mCherry backbone (59). To generate the inducible dCas9-LSD1 expression vector, LSD1 open reading frame (ORF) was amplified and cloned into the pHR-TRE3G-KRAB-dCas9-P2A-mCherry to replace the KRAB domain. To generate the enCRISPRa sgRNA vector, the MCP-VP64-IRES-mCherry cassette was amplified from the pJZC34 vector (Addgene, #62331) and cloned into BsrGI/EcoRI digested lenti-sgRNA (MS2)-zeo backbone (Addgene, #61427). Then the mCherry cassette was replaced by puromycin or zsGreen1 by In-Fusion® HD Cloning Kit (Clontech). To generate the enCRISPRi sgRNA vector, the KRAB sequence was amplified from the pLV hU6-sgRNA hUbC-dCas9-KRAB-T2A-Puro vector (Addgene, #71236) and cloned into the enCRISPRa sgRNA vector to replace VP64. The lentiviral shRNAs for PPARG (TRCN0000001673) and RXRA (TRCN0000330707) in pLKO.1 vector were obtained from Sigma-Aldrich.
Annotation of Blood CRE Repertoires and Target Genes
We collected 72 independent H3K27ac ChIP-seq datasets from 17 studies (5,7,8,16,60–71) including normal (lymphocyte, monocyte, macrophage, erythroblast, and CD34+ HSPC) and diseased hematopoietic cell types (AML, erythroleukemia, T-ALL, T-lymphoma, and B-lymphoma) (51 normal and 21 disease samples; Table S1). The erythroleukemia K562 cell line was grouped with AML and MDS samples in this study. ChIP-seq raw reads were aligned to the human (hg19) genome assembly using Bowtie2 (72) with -k 1. Only tags that uniquely mapped to the genome were used. Peaks were identified by MACS version 1.4.2 (73) with a P value cutoff of 10^−5, and ranked by fold enrichment and P value in each dataset. Then the regions 150bp upstream and downstream of the summits of the top 30% ranked peaks were combined within the normal and disease groups, respectively. To minimize the number of overlapping regions, peaks within a distance of 200–400bp were clustered and the overlapping peaks were then merged using Galaxy interval operation tools (https://usegalaxy.org/). In total, we defined 22,262 H3K27ac-associated CREs with an average size of 952bp. For the genome-wide analysis, the nearest neighbor genes within 50kb of a given gene-distal CRE were assigned as the CRE-associated genes. For CREs with no gene within 50kb, the associated gene was denoted as NA. For single gene loci (e.g. KRAS and PER2), the CREs were assigned to candidate target gene(s) by the following criteria: 1) the presence of CRE-promoter interactions based on Hi-C and/or ChIA-PET analyses; 2) the expression of the candidate target gene(s) was significantly impaired upon CRISPR/Cas9-mediated KO of the candidate CREs. Human genome assembly (hg19) was used for all gene annotation analyses.
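The nearest-gene rule used for the genome-wide analysis can be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: the function name, the tuple layouts and the choice to measure distance from the CRE midpoint to a transcription start site (TSS) are assumptions made for the example, since the distance definition is not specified above.

```python
from typing import List, Optional, Tuple

MAX_DISTANCE_BP = 50_000  # the 50kb window stated in the text


def assign_nearest_gene(
    cre: Tuple[str, int, int],
    genes: List[Tuple[str, str, int]],
    max_distance: int = MAX_DISTANCE_BP,
) -> Optional[str]:
    """Return the nearest gene within max_distance of a CRE, or None (reported as NA).

    cre   : (chrom, start, end)
    genes : list of (gene_name, chrom, tss_position)
    Assumption: distance is measured from the CRE midpoint to the gene TSS.
    """
    chrom, start, end = cre
    midpoint = (start + end) // 2
    best_name, best_dist = None, None
    for name, gene_chrom, tss in genes:
        if gene_chrom != chrom:
            continue  # only genes on the same chromosome are candidates
        dist = abs(tss - midpoint)
        if dist <= max_distance and (best_dist is None or dist < best_dist):
            best_name, best_dist = name, dist
    return best_name
```

A CRE for which the function returns None corresponds to the NA category described above.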
Targeted CRE Resequencing
Targeted resequencing of CREs was performed using the Ovation® Target Enrichment System (NuGEN) following the manufacturer’s protocols. Briefly, probes were designed based on both sense and anti-sense DNA sequences of the targeted genomic region every 200~250bp. Genomic DNA (gDNA) from human primary leukemia samples, control samples or leukemia cell lines was isolated using the DNeasy Blood & Tissue Kit (Qiagen) and quantified. After fragmentation to ~500bp using Covaris acoustic shearing following the manufacturer’s protocol, 10~500 ng of gDNA was end-repaired and individually ligated with barcoded adaptors. Following purification using AMPure XP beads (Beckman Coulter), samples were combined for multiplexed target enrichment. During this step, the targeting probes were annealed and extended to create amplifiable DNA molecules. Following a second bead purification, qPCR was performed to determine the optimal number of PCR cycles to use in the library amplification step. After library amplification, the multiplexed, target-enriched libraries were purified, quantified, and sequenced on an Illumina NextSeq500 system using the 150bp high output sequencing kit. Initial data processing and alignments were performed using commonly used analytical tools. Specifically, for each sample, the FASTQ files were aggregated into single files for read 1 and read 2. The adaptor and low-quality bases (quality < 20) were trimmed using trim_galore version 0.4.0 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) with parameters --stringency 1 -e 0.1 --length 20. Bowtie version 2.1.0 (72) was used to align the read pairs for each sample. PCR duplicates were marked using a customized pipeline. Then Picard (version 1.127) (https://broadinstitute.github.io/picard/) was used to sort and convert SAM to BAM files. RealignerTargetCreator/IndelRealigner from the Genome Analysis Toolkit (GATK version 3.3) (21) was used to realign the reads around insertions and deletions.
BaseRecalibrator/PrintReads from GATK was used to perform base quality score recalibration. The resulting BAM files were then used as input for mutation discovery analysis.
Mutation Discovery and Validation
For tumor-normal pairs of primary leukemia samples, the single-nucleotide variations (SNVs) were identified using GATK (21), MuTect version 1.1.4 (22), Strelka version 1.0.14 (23) and VarScan2 version 2.3.9 (25). The insertions and deletions (INDELs) were identified using GATK, Strelka version 1.0.14, VarScan2 version 2.3.9 and Scalpel version 0.4.1 (24). MuTect version 1.1.4 and Strelka version 1.0.14 were run with default parameters. VarScan2 version 2.3.9 and Scalpel version 0.4.1 were run in somatic mode with the recommended filtering scheme. High-confidence (Tier I) somatic SNVs were defined as SNVs called by at least two out of four mutation callers (GATK, MuTect, Strelka and VarScan2). High-confidence (Tier I) somatic INDELs were defined as INDELs called by at least two out of four callers (GATK, Strelka, Scalpel and VarScan2). Variants located in repeat-masked regions were excluded from our analysis. Repeat-masked regions were downloaded from UCSC (https://genome.ucsc.edu/cgi-bin/hgTables). For all leukemia or lymphoma samples and cell lines, SNVs and INDELs were identified using GATK HaplotypeCaller. SNVs were filtered using --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0". INDELs were filtered using --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0". Common genetic variants included in dbSNP (build 138) were filtered out. The resulting variants were identified as the Tier II somatic SNVs and INDELs. The variants in coding regions were annotated by ANNOVAR version 2015-06-17 (74). Synonymous SNVs were excluded. To validate the overall targeted sequencing and mutation calling pipelines, we included 3 control samples: two normal CD34+ HSPC samples from healthy donors and one mixed genomic DNA control (Ctrl-17) containing an equal molar ratio of gDNA from 17 individual non-cancer cell types.
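The Tier I rule (a variant reported by at least two of four callers) amounts to a simple vote across call sets. The sketch below illustrates that rule only; the function name and the representation of a variant as a (chrom, position, ref, alt) tuple are assumptions for the example, and normalization of INDEL representations between callers is not handled.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, position, ref, alt); hypothetical layout


def tier1_consensus(calls: Dict[str, Set[Variant]], min_callers: int = 2) -> Set[Variant]:
    """Keep variants reported by at least `min_callers` of the supplied callers."""
    support: Counter = Counter()
    for variants in calls.values():
        support.update(variants)  # each caller contributes one vote per variant
    return {variant for variant, votes in support.items() if votes >= min_callers}
```

For SNVs the dictionary would hold the GATK, MuTect, Strelka and VarScan2 call sets; for INDELs, the GATK, Strelka, Scalpel and VarScan2 call sets.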
To identify recurrently mutated CREs, we assessed the number of independent samples carrying mutations in each CRE using a binomial distribution as previously described (75) with modifications. Briefly, we assumed that the observed number of mutated samples, k, for a CRE followed a binomial distribution, Binomial(n, p_i), where n was the total number of samples with mutation data and p_i was the estimated sample mutation rate for CRE i under the null hypothesis that the region was not recurrently mutated. We could therefore compute the following P value:
$$P = \Pr(X \ge k) = \sum_{j=k}^{n} \binom{n}{j}\, p_i^{\,j}\, (1 - p_i)^{\,n-j}$$
Here we assumed that p_i depended on the length L_i of the CRE and the estimated nucleotide mutation rate q_i for the region under the null hypothesis as follows:
$$p_i = 1 - (1 - q_i)^{L_i}$$
The background mutation frequency q_i was estimated by dividing the total number of observed mutations by the length of all CREs. By this analysis, we found that the probability of observing a 952bp CRE (the average size of all CREs) harboring SNVs in ≥ 2 of 29 tumor/normal pairs, ≥ 2 of 28 leukemia samples and ≥ 2 of 29 leukemia cell lines is 0.027. Similarly, the probability of observing a 952bp CRE harboring INDELs in ≥ 1 of 29 tumor/normal pairs, ≥ 1 of 28 leukemia samples and ≥ 1 of 29 leukemia cell lines is 0.098.
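The recurrence test can be illustrated numerically with a short sketch. It assumes that the per-sample mutation probability of a CRE takes the form 1 − (1 − q)^L, i.e. the chance that at least one of L bases is mutated at a per-nucleotide rate q; this form and both function names are assumptions for the example, not a reproduction of the code used in this study.

```python
from math import comb


def cre_mutation_probability(q: float, length: int) -> float:
    """Probability that a CRE of `length` bp carries at least one mutation in a sample.

    Assumed form: 1 - (1 - q)^L, with q the per-nucleotide background rate.
    """
    return 1.0 - (1.0 - q) ** length


def recurrence_p_value(k: int, n: int, p: float) -> float:
    """Upper-tail binomial probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))
```

For a CRE of average size, one would call cre_mutation_probability(q, 952) with the estimated background rate and pass the result, together with the observed k and the cohort size n, to recurrence_p_value.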
Identification of Non-Coding Mutational Hotspots
Mutational hotspots were identified as previously described (75) with modifications. Specifically, all high-confidence Tier I mutations within 100bp of each other were merged using BEDTools (76) into hotspot clusters until no cluster was found within 100bp of another cluster. Clusters with only one or two mutations were removed from further consideration. A P value was calculated for each cluster using the negative binomial distribution, taking into account the length of the candidate hotspot cluster, the number of mutations in the cluster, and a background mutation rate for the cluster. The cluster background mutation rate was calculated as the mean of the background mutation probability for each sample that had a mutation represented in the cluster. The background mutation probability of each sample was calculated as the total number of mutations divided by the target region size. P values were adjusted for multiple testing with the multtest R package using the Benjamini-Hochberg method, and hotspot clusters were ranked accordingly. For the illustration of the top 100 recurrently mutated hotspots, Tier II non-coding mutations were included in order to compare mutations across different leukemia and lymphoma samples.
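Two steps of the hotspot procedure, distance-based clustering and Benjamini-Hochberg adjustment, can be sketched in plain Python. This is an illustration under stated assumptions: positions are taken from a single chromosome, "within 100bp" is interpreted as a gap of at most 100bp, and the function names are hypothetical. The study itself used BEDTools and the multtest R package, and the negative binomial P value calculation is not reproduced here.

```python
from typing import List


def cluster_positions(positions: List[int], max_gap: int = 100) -> List[List[int]]:
    """Merge mutation positions on one chromosome into clusters by distance.

    Neighbouring positions separated by at most max_gap bp join the same cluster.
    """
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    # Clusters with only one or two mutations are removed, as described in the text.
    return [c for c in clusters if len(c) >= 3]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value downwards
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Because the positions are processed in sorted order, a single pass gives the same result as repeatedly merging until no cluster lies within the gap of another.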
Motif Analysis
We scanned motifs around high-confidence somatic variants (Tier I) at KRAS and PER2 enhancers using the MotifScan tool (http://bioinfo.sibs.ac.cn/shaolab/motifscan/index.php). Briefly, to search for candidate motif targets in the given DNA sequences, the program scanned the sequences with a window of the same length as the motif, and defined the raw motif score of the sequence S in the window as the ratio of the probability of observing the target sequence S given the motif’s Position Weight Matrix (PWM) M to the probability of observing S given the genome background B. For each annotated motif, we modeled the genome background distribution of motif scores by randomly sampling the genome 10^6 times, and defined the targets of this motif as the candidates with motif scores higher than the cutoff. The PWMs of motifs were downloaded from the JASPAR database (77). The enrichment of each motif was represented by the ratio of motif target densities at target regions compared to random control regions, together with a P value calculated from the hypergeometric distribution.
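The raw motif score described above is a likelihood ratio that is evaluated in a sliding window. The sketch below shows that calculation for the forward strand only; it is not the MotifScan implementation, and the data layout (one base-to-probability dictionary per motif position) and function names are assumptions for the example. Zero probabilities would need pseudocounts, which are omitted here.

```python
from typing import Dict, List, Tuple

Pwm = List[Dict[str, float]]  # one {base: probability} column per motif position


def raw_motif_score(window: str, pwm: Pwm, background: Dict[str, float]) -> float:
    """Likelihood ratio P(window | PWM) / P(window | background)."""
    score = 1.0
    for base, column in zip(window, pwm):
        score *= column[base] / background[base]
    return score


def scan_sequence(seq: str, pwm: Pwm, background: Dict[str, float]) -> List[Tuple[int, float]]:
    """Score every window of motif length along seq; returns (offset, score) pairs."""
    width = len(pwm)
    return [
        (i, raw_motif_score(seq[i : i + width], pwm, background))
        for i in range(len(seq) - width + 1)
    ]
```

Windows whose score exceeds a cutoff derived from the background score distribution would then be counted as motif targets.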
ChIP and Data Analysis
ChIP-seq or ChIP-qPCR was performed as described (78) using antibodies for H3K27ac (Abcam, ab4729), PPARG (Santa Cruz, sc-271392) or RXRA (Santa Cruz, sc-515928) in HL60, K562, MKPL-1, MV4–11, THP1 and U937 cells, respectively. ChIP-seq libraries were generated using NEBNext Ultra II DNA library prep kit following the manufacturer’s protocol (NEB), and sequenced on an Illumina NextSeq500 system using the 75bp high output sequencing kit. ChIP-seq raw reads were aligned to the human hg19 genome assembly using Bowtie2 (72) with the default parameters. Only tags that uniquely mapped to the genome were used.
ATAC-seq and Data Analysis
ATAC-seq was performed in K562, MKPL-1, MV4–11, NB4 and U937 cells as previously described (79). ATAC-seq raw reads were trimmed to remove adaptor sequence and aligned to human genome assembly (hg19) using Bowtie2 (72) with default parameters. Only tags that uniquely mapped to the genome were used.
ChIA-PET and Hi-C Data Analysis
The RNAPII ChIA-PET data in K562 cells was downloaded from GEO with the accession number GSM970213. The in situ Hi-C data for K562 cells and CD34+ HSPCs were downloaded from GEO with accession numbers GSM1551618 and GSM2861708, respectively. The in situ Hi-C data for THP1 cells were downloaded from Juicebox (80). The processed bigwig or hic files by Juicer (80,81) were visualized using epigenome browser (https://epigenomegateway.wustl.edu/) as ARC or heatmap.
RNA Isolation and qRT-PCR Analysis
Total RNA was isolated using RNeasy Plus Mini Kit (Qiagen) following manufacturer’s protocol. For quantitative RT-PCR (qRT-PCR) analysis, RNA was reverse-transcribed using iScript cDNA Synthesis Kit (Bio-Rad) following manufacturer’s protocol. qRT-PCR was performed using the iQ SYBR Green Supermix (Bio-Rad) with CFX384 Touch Real-Time PCR Detection System (Bio-Rad). PCR amplification parameters were 95°C (3 min) and 45 cycles of 95°C (15 sec), 60°C (30 sec), and 72°C (30 sec). Primer sequences are listed in Table S5.
CRE Perturbation Screen
To generate inducible enCRISPRi and enCRISPRa stable cell lines, MKPL-1 cells were transduced with lentiviruses expressing dCas9-LSD1 (or dCas9-p300) and rtTA. Doxycycline (Dox) was added following infection and flow cytometry was used to sort cells expressing mCherry and BFP. These cells were then grown in the absence of doxycycline until mCherry fluorescence returned to uninduced levels. The 1,836 recurrently mutated CREs were selected for the perturbation screens. For each CRE, 4 or 5 sgRNAs were designed. Briefly, we divided each CRE into five equal segments, and then designed two or more sgRNAs in the segment closer to the ATAC-seq peak summit. All sgRNAs were designed and selected to minimize off-target cleavage based on publicly available filtering tools (http://crispr.mit.edu/). Negative control sgRNAs were also selected from the previous CRISPRi/a library (59). DNA oligonucleotide library synthesis was completed on a programmable microarray using a B3 Synthesizer (CustomArray). Full-length oligonucleotides (96nt) were amplified for 15 cycles by PCR using Phusion® High-Fidelity DNA Polymerase (referred to as PCR1). PCR2 was performed to remove barcodes and replace them with enCRISPRa/i sgRNA vector homology sequences. The amplified sequences were purified for the Gibson assembly reaction. The enCRISPRi/a sgRNA vectors were digested by BsmBI (NEB), dephosphorylated and purified. Gibson assembly was performed to generate the sgRNA library following the manufacturer’s protocol. Briefly, 1 μl of the Gibson Assembly reaction was added to 25 μl of E.cloni 10G ELITE electrocompetent cells (Lucigen, 60052–4) and electroporated using a Bio-Rad MicroPulser. The transformation was plated onto pre-warmed 24.5 cm^2 bioassay plates with ampicillin. The colonies were counted to calculate the library coverage (> 200x). All colonies were collected, and maxiprep was performed to isolate the sgRNA library. Lentiviruses were produced as previously described (5).
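The segment-based placement of sgRNAs can be sketched as follows. This is a hypothetical illustration of the rule stated above: the function names are invented for the example, and "closer to the summit" is interpreted here as the segment whose midpoint is nearest to the ATAC-seq summit, which is an assumption. Off-target filtering and PAM searching are outside the scope of the sketch.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open interval (start, end)


def split_into_segments(start: int, end: int, n_segments: int = 5) -> List[Segment]:
    """Split [start, end) into n contiguous segments of (near-)equal length."""
    length = end - start
    bounds = [start + (length * i) // n_segments for i in range(n_segments + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def segment_nearest_summit(segments: List[Segment], summit: int) -> Segment:
    """Return the segment whose midpoint lies closest to the ATAC-seq summit."""
    return min(segments, key=lambda s: abs((s[0] + s[1]) / 2 - summit))
```

The selected segment would receive two or more sgRNAs, with the remaining guides distributed over the other segments to reach 4 or 5 per CRE.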
To perform high-throughput pooled enCRISPRi/a screening, the inducible enCRISPRi/a cell lines were infected with lentiviruses containing the sgRNA library at MOI < 0.5. Two days after infection, cells were selected with 1 μg/ml puromycin for 3 days, and then cultured in fresh medium for 48 hours. Cells were expanded to obtain sufficient sgRNA coverage (>1,000x) and gDNA was extracted (T0). The remaining cells were cultured in medium with doxycycline (1 μg/ml) at a density between 500,000 and 1,000,000 cells/ml to maintain a library coverage of at least 1,000 cells per sgRNA. Genomic DNA was harvested from all samples at day 28 (T28), and the sgRNA-containing regions were amplified by PCR and sequenced on an Illumina NextSeq500 sequencer. All primers are listed in Table S5.
Three replicate experiments of each enCRISPRi or enCRISPRa screen were performed. sgRNA sequences were extracted from FASTQ files and matched to the sgRNA sequences of the enCRISPRi/a screen library. Reads of each sgRNA were counted and normalized to the total read counts for each sample. Pearson correlations between replicates were calculated using the log2 transformed sgRNA counts. Normalized sgRNA growth phenotypes were quantified as previously described (59). Specifically, the phenotype of each sgRNA was calculated by log2 transformed sgRNA ratio between end (T28) and starting time points (T0). To calculate phenotype relative to control samples, we further divided them by the median of negative control phenotypes. MAGeCK (82) was used to identify CREs positively or negatively enriched in each screen. We first determined the abundance of sgRNAs using MAGeCK “count” module in the raw FASTQ files. Then we used MAGeCK “test” module to test for robust gene-level enrichment with the following arguments: mageck test -k path/to/count_file -t end_timepoint -c start_timepoint -n sample_name --normcounts-to-file --norm-method total --gene-lfc-method secondbest. This step normalized sgRNA counts by total read counts in each sample and calculated log2 fold change (LFC) of each CRE using sgRNA with the second strongest LFC.
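The count normalization and CRE-level summary can be sketched in a few lines. This is a simplified illustration and not the MAGeCK implementation: the pseudocount, the scaling constant and the reading of "second strongest" as the second-largest absolute log2 fold change are assumptions for the example, and the function names are hypothetical.

```python
from math import log2
from typing import Dict, List


def normalize(counts: Dict[str, int], scale: float = 1_000_000) -> Dict[str, float]:
    """Scale raw sgRNA counts so that each sample sums to `scale` reads."""
    total = sum(counts.values())
    return {sg: c * scale / total for sg, c in counts.items()}


def sgrna_lfc(t0: Dict[str, int], t28: Dict[str, int], pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 ratio of normalized counts at the end (T28) versus the start (T0)."""
    n0, n28 = normalize(t0), normalize(t28)
    return {sg: log2((n28[sg] + pseudocount) / (n0[sg] + pseudocount)) for sg in n0}


def second_best_lfc(lfcs: List[float]) -> float:
    """LFC of the sgRNA with the second-largest absolute effect for one CRE.

    Falls back to the single value when a CRE has only one sgRNA.
    """
    ranked = sorted(lfcs, key=abs, reverse=True)
    return ranked[1] if len(ranked) > 1 else ranked[0]
```

Summarizing each CRE by its second-ranked sgRNA rather than the top one reduces the influence of a single outlier guide.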
To validate enCRISPRi and enCRISPRa screen results, we cloned 14 individual CRE-specific sgRNAs for 5 candidate oncogenic CREs (CRE18528, CRE18532, CRE19253, CRE726, and CRE4399), 4 tumor suppressive CREs (CRE16157, CRE9362, CRE11193, and CRE12661), and 5 other CREs (CRE17030, CRE4896, CRE2305, CRE13899, and CRE9836) into the MCP-KRAB-IRES-zsGreen1 (for enCRISPRi) and MCP-VP64-IRES-zsGreen1 (for enCRISPRa) vectors, respectively. Except for two CREs (CRE4399 and CRE12661), the other CREs are annotated promoters. Individual enCRISPRi and enCRISPRa experiments using CRE-specific (sgCRE) or control (sgGal4) sgRNAs were performed in AML (MKPL-1 and MOLM-13) and ALL (Jurkat) cells, respectively, followed by analysis of the expression of CRE-associated genes after 72 hours of Dox-induced enCRISPRi or enCRISPRa expression. All sgRNA and primer sequences are listed in Table S5.
Negative Selection Competition Assay
A negative selection competition assay was performed to investigate the functional role of candidate CREs that may regulate the proliferation of MKPL-1 cells. Briefly, sgRNAs were cloned into the enCRISPRa or enCRISPRi sgRNA vectors containing a zsGreen1 reporter. Lentiviruses were produced to infect the enCRISPRa- or enCRISPRi-expressing MKPL-1 cells. Cells were analyzed for zsGreen1 expression by flow cytometry 3 days after infection. Doxycycline (1 μg/ml) was then added to the medium to induce the expression of dCas9-p300 or dCas9-LSD1. Flow cytometry was repeated every 4 days to measure the percentage of sgRNA-expressing (zsGreen1+) cells, and the data were normalized to the starting time point. All sgRNA sequences are listed in Table S5.
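The normalization to the starting time point can be illustrated with a short Python sketch. The percentages below are invented for illustration only, and the function name is hypothetical.

```python
def normalize_to_start(percent_positive: list) -> list:
    """Express each zsGreen1+ percentage as a fraction of the first measurement."""
    if not percent_positive:
        raise ValueError("at least one measurement is required")
    start = percent_positive[0]
    if start <= 0:
        raise ValueError("starting percentage must be positive")
    return [p / start for p in percent_positive]


if __name__ == "__main__":
    # Hypothetical zsGreen1+ percentages measured every 4 days after Dox induction.
    sg_cre = [80.0, 60.0, 40.0, 20.0]
    print(normalize_to_start(sg_cre))
```

A normalized value that falls below 1 over time would indicate depletion of sgRNA-expressing cells relative to the non-transduced population in the same culture.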
Enhancer KO by CRISPR/Cas9
Enhancer KO in MKPL-1 cells was performed following published protocols (83,84) with modifications. Briefly, sgRNAs for site-specific cleavage of genomic targets were designed following described guidelines, and sequences were selected to minimize off-target cleavage based on publicly available filtering tools (http://crispr.mit.edu/). Oligonucleotides were annealed in the following reaction: 10 μM guide sequence oligo, 10 μM reverse complement oligo, T4 ligation buffer (1x), and 5 U of T4 polynucleotide kinase, with the cycling parameters of 37°C for 30 min, 95°C for 5 min, and then a ramp down to 25°C at 5°C/min. The annealed oligos were cloned into the pSpCas9(BB) (pX458) vector (Addgene, #48138) using a Golden Gate Assembly strategy including: 100 ng of circular pX458 plasmid, 0.2 μM annealed oligos, buffer 2 (1x) (NEB), 20 U of BbsI restriction enzyme, 0.2 mM ATP, 0.1 mg/ml BSA, and 750 U of T4 DNA ligase (NEB), with the cycling parameters of 20 cycles of 37°C for 5 min and 20°C for 5 min, followed by incubation at 80°C for 20 min. To induce deletions of enhancer DNA regions, CRISPR/Cas9 constructs were transfected into MKPL-1 cells by nucleofection using the ECM 830 Square Wave Electroporation System (Harvard Apparatus). The constructs were directed to sequences flanking the target genomic regions. To enrich for deletions, the top 1–5% of GFP+ cells were sorted 48–72 h post-transfection and plated in 96-well plates. Single-cell-derived clones were isolated and screened for CRISPR-mediated deletion. PCR amplicons were subcloned and analyzed by Sanger DNA sequencing to confirm non-homologous end-joining (NHEJ)-mediated repair upon double-strand break formation. The positive single-cell-derived clones containing deletion of the targeted sequences were expanded and processed for analysis.
Enhancer Reporter Assays
The genomic sequences containing wild-type or mutant allele of the enhancers were amplified and cloned into the firefly luciferase reporter constructs pGL4.24[luc2P/minP]. The reporter constructs (1 μg) were co-transfected with pRL-SV40-Renilla luciferase constructs (50 ng) into 293T, K562 or MKPL-1 cells maintained in DMEM containing 10% charcoal/dextran-treated FBS (HyClone) using FuGENE® 6 (Promega) or ECM 830 Square Wave Electroporation System (Harvard Apparatus) according to the manufacturer’s protocols. The nuclear receptor ligands were added to the medium 8h after transfection. Cells were maintained in DMEM containing 10% charcoal/dextran-treated FBS and harvested after 48h, and luciferase activity was measured by Dual-Luciferase Assay system (Promega). Briefly, cells were washed twice with PBS. Cell lysates were prepared by adding 100 μl of PLB to each well, followed by incubation for 15 min at room temperature. Cell lysates were transferred to white opaque 96-well microplates (3 replicates for each sample), and firefly luminescence was measured after adding 30 μl LAR II. After adding an equal volume of Stop&Glo Reagent to the wells, the Renilla luminescence was measured. To determine whether the sensitivity to the mutated enhancers is dependent on PPARG and RXRA expression, shRNA lentiviruses were produced to transduce MKPL-1 cells. Puromycin selection (1 μg/ml) was performed 3 days after infection. Selected cells and control cells were maintained in DMEM containing 10% charcoal/dextran-treated FBS for 48h prior to electroporation, and the luciferase activity was measured by Dual-Luciferase Assay system as described above.
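The dual-luciferase readout can be summarized as a firefly-to-Renilla ratio per well, followed by comparison between constructs. The Python sketch below illustrates that arithmetic. The luminescence values are invented, the comparison of a mutant against a wild-type construct by the ratio of means is an assumed summary statistic, and the function names are hypothetical.

```python
from statistics import mean


def relative_luciferase(firefly: list, renilla: list) -> list:
    """Per-well firefly signal normalized to the co-transfected Renilla control."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla must have the same number of wells")
    if any(r <= 0 for r in renilla):
        raise ValueError("Renilla readings must be positive")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(sample_ratios: list, reference_ratios: list) -> float:
    """Mean normalized activity of a sample relative to a reference construct."""
    return mean(sample_ratios) / mean(reference_ratios)


if __name__ == "__main__":
    # Hypothetical triplicate wells for wild-type and mutant enhancer reporters.
    wt = relative_luciferase([100.0, 110.0, 90.0], [100.0, 110.0, 90.0])
    mut = relative_luciferase([200.0, 220.0, 180.0], [100.0, 110.0, 90.0])
    print(fold_change(mut, wt))
```

Normalizing each well to Renilla before averaging corrects for well-to-well differences in transfection efficiency and cell number.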
Electrophoretic Mobility Shift Assay
EMSA was performed using a Thermo Fisher Scientific LightShift Chemiluminescent EMSA kit following the manufacturer’s instructions. Briefly, human PPARG and RXRA open reading frames (ORFs) were amplified and cloned into the lentiviral vectors pLVX-EF1a-IRES-zsGreen1 (Clontech, #631982) and pLVX-EF1a-IRES-mCherry (Clontech, #631987), respectively. Lentiviruses were produced as previously described (5). MKPL-1 cells were then infected with both lentiviruses, and zsGreen1+mCherry+ cells were sorted and expanded. Nuclear extracts were prepared using NE-PER Nuclear and Cytoplasmic Extraction Reagents (Thermo Fisher Scientific) according to the manufacturer’s protocols. Unlabeled and biotin-labelled probes were synthesized and annealed on a PCR thermal cycler (Bio-Rad) by heating the mixture at 95°C for 5 min and lowering the temperature to room temperature at a ramp of 0.1°C/s. EMSA reactions included 1x binding buffer, 50 ng/μl poly(dI-dC), 2.5% glycerol, 0.05% Nonidet P-40, 5 mM MgCl2, 2 μl nuclear extracts, and 20 fM biotin-labelled probes. Specificity of mobility shifts was analyzed by including 8 pM unlabeled WT or mutant PPAR/RXR competitor oligonucleotides. Supershift assays were performed by including 2 μg antibodies against PPARG (Santa Cruz, sc-7273X) and/or RXRA (Santa Cruz, sc-515929X). Reactions were incubated at room temperature for 20 min, size-separated on a 6% DNA retardation gel, and transferred to a charged nylon membrane (Hybond-XL, GE/Amersham Biosciences). Membranes were crosslinked at 120 mJ/cm2 using a Spectrolinker™ XL-1500. Free or protein-bound biotin-labelled probes were detected using streptavidin-horseradish peroxidase conjugates and chemiluminescent substrates according to the manufacturer’s protocols. The intensity of the resulting bands was measured with ImageJ. All probe sequences are listed in Table S5.
Site-Directed Knock-In of Enhancer Variants
The sgRNAs for site-specific cleavage of genomic targets were cloned into the pSpCas9(BB) (pX458) vector (Addgene, #48138) using a Golden Gate Assembly strategy. To generate independent single-cell-derived knock-in clones containing the non-coding variants, 5 μg sgRNA vector, 3 μM ssDNA donor and cell suspension were combined and electroporated using the ECM 830 Square Wave Electroporation System (Harvard Apparatus) according to the manufacturer’s protocol. To enrich for knock-in cells, the top 1–5% of GFP+ cells were sorted 48–72h post-transfection and plated in 96-well plates. Single-cell-derived clones were isolated and screened for CRISPR-mediated knock-in. Briefly, genomic DNA was extracted and amplified by specific genotyping primers. The PCR products were then subjected to NcoI (CRE4399) or BssSI (CRE12661) digestion, respectively. Positive clones were further validated by Sanger DNA sequencing. The validated single-cell-derived knock-in clones were expanded and processed for subsequent analyses. Sequences of sgRNAs, ssDNA donors, and genotyping primers are listed in Table S5.
To determine the effects of enhancer variants on baseline and NR-induced target gene expression, the single-cell-derived knock-in clones were maintained in DMEM containing 10% charcoal/dextran-treated FBS for 48h before treatment with DMSO or NR agonists (1 μM 9-cis-RA and 5 μM Rosiglitazone). Total RNA was isolated after 48h, and qRT-PCR analysis was performed to measure KRAS and PER2 expression. To determine the effects of enhancer variants on the chromatin occupancy of nuclear receptors, the single-cell-derived knock-in clones were maintained in DMEM containing 10% charcoal/dextran-treated FBS for 48h before treatment with NR agonists (1 μM 9-cis-RA and 5 μM Rosiglitazone). Cells were fixed after 48h, and ChIP experiments were performed using antibodies for PPARG (Santa Cruz, sc-271392) and RXRA (Santa Cruz, sc-515928). NR ChIP and input control DNA were PCR-amplified, and allele frequencies were quantified by amplicon sequencing.
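Allele frequency quantification from amplicon reads can be sketched in Python as follows. This is a simplified illustration and not the analysis used in the study. It assumes that reads are already trimmed to start at the same amplicon position, are on the same strand, and contain no indels, so that the variant falls at a fixed index. The read sequences, the index, and the function names are hypothetical.

```python
from collections import Counter


def allele_fractions(reads: list, variant_index: int,
                     ref_base: str, alt_base: str) -> dict:
    """Fraction of informative reads carrying the reference or variant base."""
    bases = Counter(r[variant_index] for r in reads if len(r) > variant_index)
    informative = bases[ref_base] + bases[alt_base]
    if informative == 0:
        raise ValueError("no reads carry the reference or variant base")
    return {"ref": bases[ref_base] / informative,
            "alt": bases[alt_base] / informative}


def chip_over_input(chip_reads: list, input_reads: list, variant_index: int,
                    ref_base: str, alt_base: str) -> float:
    """Variant allele fraction in ChIP DNA relative to input control DNA."""
    chip_alt = allele_fractions(chip_reads, variant_index, ref_base, alt_base)["alt"]
    input_alt = allele_fractions(input_reads, variant_index, ref_base, alt_base)["alt"]
    if input_alt == 0:
        raise ValueError("variant allele absent from input reads")
    return chip_alt / input_alt


if __name__ == "__main__":
    # Toy reads: the variant (G>T) sits at index 2 of each read.
    chip = ["ACGT"] * 1 + ["ACTT"] * 3
    inp = ["ACGT"] * 2 + ["ACTT"] * 2
    print(chip_over_input(chip, inp, variant_index=2, ref_base="G", alt_base="T"))
```

Comparing against input DNA from the same clone controls for allelic imbalance that arises from copy number or PCR amplification rather than from differential NR binding.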
QUANTIFICATION AND STATISTICAL ANALYSIS
Statistical details, including N, mean, and statistical significance values, are indicated in the text, figure legends, or Method Details. Error bars represent the standard error of the mean (SEM) or standard deviation (SD) from either independent experiments or independent samples. All statistical analyses were performed using GraphPad Prism, and detailed information about the statistical methods is specified in the figure legends or Method Details.
SIGNIFICANCE.
We describe an integrative approach to identify non-coding variants in human leukemia, and reveal cohorts of variant-associated oncogenic and tumor suppressive cis-elements including KRAS and PER2 enhancers. Our findings support a model that non-coding regulatory variants connect enhancer dysregulation with nuclear receptor signaling to modulate gene programs in hematopoietic malignancies.
ACKNOWLEDGMENTS
We thank Liqiang Wang, Yi Du and David Trudgian at UTSW BioHPC for assistance, Lin Li for technical support, and other Xu laboratory members for helpful discussion and technical support. K.L. and Y.L. were supported by the Cancer Prevention and Research Institute of Texas (CPRIT) training grants (RP160157). X.L. was supported by the American Heart Association postdoctoral fellowship (18POST34060219). J.X. is a Scholar of The Leukemia & Lymphoma Society (LLS) and an American Society of Hematology (ASH) Scholar. This work was supported by the NIH grants R01DK111430 and R01CA230631, by the Leukemia Texas Foundation research award, by the CPRIT grants (RR140025, RP180504, RP180826 and RP190417), and by the Welch Foundation grant I-1942 (to J.X.).
Footnotes
DATA AND SOFTWARE AVAILABILITY
All raw and processed targeted CRE resequencing, ATAC-seq and ChIP-seq data are available in the Gene Expression Omnibus (GEO): GSE137656.
Disclosure of Potential Conflicts of Interest:
No potential conflicts of interest were disclosed.
REFERENCES
- 1. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, et al. An atlas of active enhancers across human cell types and tissues. Nature 2014;507(7493):455–61 doi 10.1038/nature12787.
- 2. Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 2009;459(7243):108–12 doi 10.1038/nature07829.
- 3. Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, Hawkins RD, et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nature genetics 2007;39(3):311–8 doi 10.1038/ng1966.
- 4. Visel A, Blow MJ, Li Z, Zhang T, Akiyama JA, Holt A, et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 2009;457(7231):854–8 doi 10.1038/nature07730.
- 5. Huang J, Liu X, Li D, Shao Z, Cao H, Zhang Y, et al. Dynamic Control of Enhancer Repertoires Drives Lineage and Stage-Specific Transcription during Hematopoiesis. Developmental Cell 2016;36(1):9–23 doi 10.1016/j.devcel.2015.12.014.
- 6. Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 2011;470(7333):279–83 doi 10.1038/nature09692.
- 7. Xu J, Shao Z, Glass K, Bauer DE, Pinello L, Van Handel B, et al. Combinatorial assembly of developmental stage-specific enhancers controls gene expression programs during human erythropoiesis. Developmental Cell 2012;23(4):796–811 doi 10.1016/j.devcel.2012.09.003.
- 8. Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-Andre V, Sigova AA, et al. Super-enhancers in the control of cell identity and disease. Cell 2013;155(4):934–47 doi 10.1016/j.cell.2013.09.053.
- 9. Parker SC, Stitzel ML, Taylor DL, Orozco JM, Erdos MR, Akiyama JA, et al. Chromatin stretch enhancer states drive cell-specific gene regulation and harbor human disease risk variants. Proceedings of the National Academy of Sciences of the United States of America 2013;110(44):17921–6 doi 10.1073/pnas.1317023110.
- 10. Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 2013;153(2):307–19 doi 10.1016/j.cell.2013.03.035.
- 11. Loven J, Hoke HA, Lin CY, Lau A, Orlando DA, Vakoc CR, et al. Selective inhibition of tumor oncogenes by disruption of super-enhancers. Cell 2013;153(2):320–34 doi 10.1016/j.cell.2013.03.036.
- 12. Khurana E, Fu Y, Chakravarty D, Demichelis F, Rubin MA, Gerstein M. Role of non-coding sequence variants in cancer. Nature reviews Genetics 2016;17(2):93–108 doi 10.1038/nrg.2015.17.
- 13. Zhou S, Treloar AE, Lupien M. Emergence of the Noncoding Cancer Genome: A Target of Genetic and Epigenetic Alterations. Cancer discovery 2016;6(11):1215–29 doi 10.1158/2159-8290.cd-16-0745.
- 14. Horn S, Figl A, Rachakonda PS, Fischer C, Sucker A, Gast A, et al. TERT promoter mutations in familial and sporadic melanoma. Science (New York, NY) 2013;339(6122):959–61 doi 10.1126/science.1230062.
- 15. Huang FW, Hodis E, Xu MJ, Kryukov GV, Chin L, Garraway LA. Highly recurrent TERT promoter mutations in human melanoma. Science (New York, NY) 2013;339(6122):957–9 doi 10.1126/science.1229259.
- 16. Mansour MR, Abraham BJ, Anders L, Berezovskaya A, Gutierrez A, Durbin AD, et al. Oncogene regulation. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science (New York, NY) 2014;346(6215):1373–7.
- 17. Groschel S, Sanders MA, Hoogenboezem R, de Wit E, Bouwman BA, Erpelinck C, et al. A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in leukemia. Cell 2014;157(2):369–81 doi 10.1016/j.cell.2014.02.019.
- 18. Evans RM, Mangelsdorf DJ. Nuclear Receptors, RXR, and the Big Bang. Cell 2014;157(1):255–66 doi 10.1016/j.cell.2014.03.012.
- 19. Dhiman VK, Bolt MJ, White KP. Nuclear receptors in cancer - uncovering new and evolving roles through genomic analysis. Nature reviews Genetics 2018;19(3):160–74 doi 10.1038/nrg.2017.102.
- 20. de The H, Chen Z. Acute promyelocytic leukaemia: novel insights into the mechanisms of cure. Nature reviews Cancer 2010;10(11):775–83 doi 10.1038/nrc2943.
- 21. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research 2010;20(9):1297–303 doi 10.1101/gr.107524.110.
- 22. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature biotechnology 2013;31(3):213–9 doi 10.1038/nbt.2514.
- 23. Saunders CT, Wong WS, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics (Oxford, England) 2012;28(14):1811–7 doi 10.1093/bioinformatics/bts271.
- 24. Fang H, Bergmann EA, Arora K, Vacic V, Zody MC, Iossifov I, et al. Indel variant analysis of short-read sequencing data with Scalpel. Nature protocols 2016;11(12):2529–48 doi 10.1038/nprot.2016.150.
- 25. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome research 2012;22(3):568–76 doi 10.1101/gr.129684.111.
- 26. Li K, Liu Y, Cao H, Zhang Y, Gu Z, Liu X, et al. Interrogation of enhancer function by enhancer-targeting CRISPR epigenetic editing. Nature communications 2020;11(1):485 doi 10.1038/s41467-020-14362-5.
- 27. Konermann S, Brigham MD, Trevino AE, Joung J, Abudayyeh OO, Barcena C, et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 2015;517(7536):583–8 doi 10.1038/nature14136.
- 28. Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proceedings of the National Academy of Sciences of the United States of America 2010;107(50):21931–6 doi 10.1073/pnas.1016071107.
- 29. Tsherniak A, Vazquez F, Montgomery PG, Weir BA, Kryukov G, Cowley GS, et al. Defining a Cancer Dependency Map. Cell 2017;170(3):564–76.e16 doi 10.1016/j.cell.2017.06.010.
- 30. Li S, Balmain A, Counter CM. A model for RAS mutation patterns in cancers: finding the sweet spot. Nature reviews Cancer 2018;18(12):767–77 doi 10.1038/s41568-018-0076-6.
- 31. Wandler A, Shannon K. Mechanistic and Preclinical Insights from Mouse Models of Hematologic Cancer Characterized by Hyperactive Ras. Cold Spring Harbor perspectives in medicine 2018;8(4) doi 10.1101/cshperspect.a031526.
- 32. Tzeng J, Byun J, Park JY, Yamamoto T, Schesing K, Tian B, et al. An Ideal PPAR Response Element Bound to and Activated by PPARalpha. PloS one 2015;10(8):e0134996 doi 10.1371/journal.pone.0134996.
- 33. Forbes SA, Beare D, Boutselakis H, Bamford S, Bindal N, Tate J, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic acids research 2017;45(D1):D777–D83 doi 10.1093/nar/gkw1121.
- 34. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer discovery 2012;2(5):401–4 doi 10.1158/2159-8290.cd-12-0095.
- 35. Albrecht U, Sun ZS, Eichele G, Lee CC. A differential response of two putative mammalian circadian regulators, mper1 and mper2, to light. Cell 1997;91(7):1055–64.
- 36. Fu L, Pelicano H, Liu J, Huang P, Lee C. The circadian gene Period2 plays an important role in tumor suppression and DNA damage response in vivo. Cell 2002;111(1):41–50.
- 37. Gery S, Gombart AF, Yi WS, Koeffler C, Hofmann WK, Koeffler HP. Transcription profiling of C/EBP targets identifies Per2 as a gene implicated in myeloid leukemia. Blood 2005;106(8):2827–36 doi 10.1182/blood-2005-01-0358.
- 38. Ye Y, Xiang Y, Ozguc FM, Kim Y, Liu CJ, Park PK, et al. The Genomic Landscape and Pharmacogenomic Interactions of Clock Genes in Cancer Chronotherapy. Cell systems 2018;6(3):314–28.e2 doi 10.1016/j.cels.2018.01.013.
- 39. Papagiannakopoulos T, Bauer MR, Davidson SM, Heimann M, Subbaraj L, Bhutkar A, et al. Circadian Rhythm Disruption Promotes Lung Tumorigenesis. Cell metabolism 2016;24(2):324–31 doi 10.1016/j.cmet.2016.07.001.
- 40. Puram RV, Kowalczyk MS, de Boer CG, Schneider RK, Miller PG, McConkey M, et al. Core Circadian Clock Genes Regulate Leukemia Stem Cells in AML. Cell 2016;165(2):303–16 doi 10.1016/j.cell.2016.03.015.
- 41. Zhang Y, Dallner OS, Nakadai T, Fayzikhodjaeva G, Lu YH, Lazar MA, et al. A noncanonical PPARgamma/RXRalpha-binding sequence regulates leptin expression in response to changes in adipose tissue mass. Proceedings of the National Academy of Sciences of the United States of America 2018;115(26):E6039–E47 doi 10.1073/pnas.1806366115.
- 42. Palmer CN, Hsu MH, Griffin HJ, Johnson EF. Novel sequence determinants in peroxisome proliferator signaling. The Journal of biological chemistry 1995;270(27):16114–21.
- 43. Juge-Aubry C, Pernin A, Favez T, Burger AG, Wahli W, Meier CA, et al. DNA binding properties of peroxisome proliferator-activated receptor subtypes on various natural peroxisome proliferator response elements. Importance of the 5’-flanking region. The Journal of biological chemistry 1997;272(40):25252–9.
- 44. Chandra V, Huang P, Hamuro Y, Raghuram S, Wang Y, Burris TP, et al. Structure of the intact PPAR-gamma-RXR-alpha nuclear receptor complex on DNA. Nature 2008;456(7220):350–6 doi 10.1038/nature07413.
- 45. Heyman RA, Mangelsdorf DJ, Dyck JA, Stein RB, Eichele G, Evans RM, et al. 9-cis retinoic acid is a high affinity ligand for the retinoid X receptor. Cell 1992;68(2):397–406.
- 46. Tenenbaum A, Motro M, Fisman EZ. Dual and pan-peroxisome proliferator-activated receptors (PPAR) co-agonism: the bezafibrate lessons. Cardiovascular diabetology 2005;4:14 doi 10.1186/1475-2840-4-14.
- 47. Forman LM, Simmons DA, Diamond RH. Hepatic failure in a patient taking rosiglitazone. Annals of internal medicine 2000;132(2):118–21.
- 48. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 2013;499(7457):214–8 doi 10.1038/nature12213.
- 49. Schuster-Bockler B, Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 2012;488(7412):504–7 doi 10.1038/nature11273.
- 50. Polak P, Lawrence MS, Haugen E, Stoletzki N, Stojanov P, Thurman RE, et al. Reduced local mutation density in regulatory DNA of cancer genomes is linked to DNA repair. Nature biotechnology 2014;32(1):71–5 doi 10.1038/nbt.2778.
- 51. Sabarinathan R, Mularoni L, Deu-Pons J, Gonzalez-Perez A, Lopez-Bigas N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature 2016;532(7598):264–7 doi 10.1038/nature17661.
- 52. Braun BS, Archard JA, Van Ziffle JA, Tuveson DA, Jacks TE, Shannon K. Somatic activation of a conditional KrasG12D allele causes ineffective erythropoiesis in vivo. Blood 2006;108(6):2041–4 doi 10.1182/blood-2006-01-013490.
- 53. Schubbert S, Zenker M, Rowe SL, Boll S, Klein C, et al. Germline KRAS mutations cause Noonan syndrome. Nature genetics 2006;38(3):331–6.
- 54. Van Meter ME, Diaz-Flores E, Archard JA, Passegue E, Irish JM, Kotecha N, et al. K-RasG12D expression induces hyperproliferation and aberrant signaling in primary hematopoietic stem/progenitor cells. Blood 2007;109(9):3945–52 doi 10.1182/blood-2006-09-047530.
- 55. Braun BS, Tuveson DA, Kong N, Le DT, Kogan SC, Rozmus J, et al. Somatic activation of oncogenic Kras in hematopoietic cells initiates a rapidly fatal myeloproliferative disorder. Proceedings of the National Academy of Sciences of the United States of America 2004;101(2):597–602.
- 56. Sasine JP, Himburg HA, Termini CM, Roos M, Tran E, Zhao L, et al. Wild-type Kras expands and exhausts hematopoietic stem cells. JCI insight 2018;3(11) doi 10.1172/jci.insight.98197.
- 57. Bailey SD, Desai K, Kron KJ, Mazrooei P, Sinnott-Armstrong NA, Treloar AE, et al. Noncoding somatic and inherited single-nucleotide variants converge to promote ESR1 expression in breast cancer. Nature genetics 2016;48(10):1260–6 doi 10.1038/ng.3650.
- 58. Soccio RE, Chen ER, Rajapurkar SR, Safabakhsh P, Marinis JM, Dispirito JR, et al. Genetic Variation Determines PPARgamma Function and Anti-diabetic Drug Response In Vivo. Cell 2015;162(1):33–44 doi 10.1016/j.cell.2015.06.025.
- 59. Gilbert LA, Horlbeck MA, Adamson B, Villalta JE, Chen Y, Whitehead EH, et al. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell 2014;159(3):647–61 doi 10.1016/j.cell.2014.09.029.
- 60. Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nature biotechnology 2010;28(10):1045–8 doi 10.1038/nbt1010-1045.
- 61. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 2011;473(7345):43–9 doi 10.1038/nature09906.
- 62. Kasowski M, Kyriazopoulou-Panagiotopoulou S, Grubert F, Zaugg JB, Kundaje A, Liu Y, et al. Extensive variation in chromatin states across humans. Science (New York, NY) 2013;342(6159):750–2 doi 10.1126/science.1242510.
- 63. Knoechel B, Roderick JE, Williamson KE, Zhu J, Lohr JG, Cotton MJ, et al. An epigenetic mechanism of resistance to targeted therapy in T cell acute lymphoblastic leukemia. Nature genetics 2014;46(4):364–70 doi 10.1038/ng.2913.
- 64. Lin CY, Loven J, Rahl PB, Paranal RM, Burge CB, Bradner JE, et al. Transcriptional amplification in tumor cells with elevated c-Myc. Cell 2012;151(1):56–67 doi 10.1016/j.cell.2012.08.026.
- 65. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science (New York, NY) 2012;337(6099):1190–5 doi 10.1126/science.1222794.
- 66. Pham TH, Benner C, Lichtinger M, Schwarzfischer L, Hu Y, Andreesen R, et al. Dynamic epigenetic enhancer signatures reveal key transcription factors associated with monocytic differentiation states. Blood 2012;119(24):e161–71 doi 10.1182/blood-2012-01-402453.
- 67. Schmidl C, Hansmann L, Lassmann T, Balwierz PJ, Kawaji H, Itoh M, et al. The enhancer and promoter landscape of human regulatory and conventional T-cell subpopulations. Blood 2014;123(17):e68–78 doi 10.1182/blood-2013-02-486944.
- 68. Saeed S, Quintin J, Kerstens HH, Rao NA, Aghajanirefah A, Matarese F, et al. Epigenetic programming of monocyte-to-macrophage differentiation and trained innate immunity. Science (New York, NY) 2014;345(6204):1251086.
- 69. Wang H, Zang C, Taing L, Arnett KL, Wong YJ, Pear WS, et al. NOTCH1-RBPJ complexes drive target gene expression through dynamic interactions with superenhancers. Proceedings of the National Academy of Sciences of the United States of America 2014;111(2):705–10 doi 10.1073/pnas.1315023111.
- 70. Chapuy B, McKeown MR, Lin CY, Monti S, Roemer MG, Qi J, et al. Discovery and characterization of super-enhancer-associated dependencies in diffuse large B cell lymphoma. Cancer cell 2013;24(6):777–90 doi 10.1016/j.ccr.2013.11.003.
- 71. Weinstein JS, Lezon-Geyda K, Maksimova Y, Craft S, Zhang Y, Su M, et al. Global transcriptome analysis and enhancer landscape of human primary T follicular helper and T effector lymphocytes. Blood 2014;124(25):3719–29 doi 10.1182/blood-2014-06-582700.
- 72. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology 2009;10(3):R25.
- 73. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome biology 2008;9(9):R137.
- 74. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic acids research 2010;38(16):e164.
- 75. Weinhold N, Jacobsen A, Schultz N, Sander C, Lee W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nature genetics 2014;46(11):1160–5 doi 10.1038/ng.3101.
- 76. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics (Oxford, England) 2010;26(6):841–2 doi 10.1093/bioinformatics/btq033.
- 77. Khan A, Fornes O, Stigliani A, Gheorghe M, Castro-Mondragon JA, van der Lee R, et al. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic acids research 2018;46(D1):D260–D6 doi 10.1093/nar/gkx1126.
- 78. Liu X, Zhang Y, Chen Y, Li M, Zhou F, Li K, et al. In Situ Capture of Chromatin Interactions by Biotinylated dCas9. Cell 2017;170(5):1028–43.e19 doi 10.1016/j.cell.2017.08.003.
- 79. Gu Z, Liu Y, Cai F, Patrick M, Zmajkovic J, Cao H, et al. Loss of EZH2 Reprograms BCAA Metabolism to Drive Leukemic Transformation. Cancer discovery 2019;9(9):1228–47 doi 10.1158/2159-8290.cd-19-0152.
- 80. Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell systems 2016;3(1):99–101 doi 10.1016/j.cels.2015.07.012.
- 81. Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 2014;159(7):1665–80 doi 10.1016/j.cell.2014.11.021.
- 82. Li W, Xu H, Xiao T, Cong L, Love MI, Zhang F, et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome biology 2014;15(12):554 doi 10.1186/s13059-014-0554-4.
- 83. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, et al. Multiplex genome engineering using CRISPR/Cas systems. Science (New York, NY) 2013;339(6121):819–23 doi 10.1126/science.1231143.
- 84. Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, et al. RNA-guided human genome engineering via Cas9. Science (New York, NY) 2013;339(6121):823–6 doi 10.1126/science.1232033.