Abstract
Esophageal squamous cell carcinomas (ESCCs) harbor recurrent chromosome 3q amplifications that target the transcription factor SOX2. Beyond its role as an oncogene in ESCC, SOX2 acts in the development of the squamous esophagus and the maintenance of adult esophageal precursor cells. To compare Sox2 activity in normal and malignant tissue, we developed engineered murine esophageal organoids spanning normal esophagus to Sox2-induced squamous cell carcinoma and mapped Sox2 binding and the epigenetic and transcriptional landscape along the evolution from normal to cancer. While oncogenic Sox2 largely maintains actions observed in normal tissue, Sox2 overexpression with p53 and p16 inactivation promotes chromatin remodeling and evolution of the Sox2 cistrome. With Klf5, oncogenic Sox2 acquires new binding sites and enhances activity of oncogenes such as Stat3. Moreover, oncogenic Sox2 activates endogenous retroviruses, inducing expression of double-stranded RNA and dependence on the RNA editing enzyme ADAR1. These data reveal SOX2 functions in ESCC and define targetable vulnerabilities.
Squamous cell carcinomas (SCCs) arise in organs including the esophagus, head and neck, and lung, and share recurrent gain of chromosome 3q, which targets the locus encoding the transcription factor SOX2 (refs. 1-5). The Cancer Genome Atlas (TCGA) identified SOX2 as the most selective genomic amplification in SCCs, suggesting a fundamental role of SOX2 in SCC pathophysiology4,6. SOX2 is an SRY-related HMG-box (SOX) transcription factor that acts in pluripotent stem cells7-9 with its cofactor OCT4 (refs. 10,11). Beyond pluripotency, SOX2 acts in the development of the squamous esophagus12 and marks precursor populations of the adult esophagus and large airways12,13. SOX2’s ability to act both in pluripotency and in a specific lineage may reflect its capacity to use distinct binding sites with distinct cofactors. Indeed, we demonstrated that SOX2 physically interacts and colocalizes on the genome with the squamous transcription factor p63 in SCC14.
SOX2 exemplifies the ‘lineage-survival oncogenes’15, transcription factors that are selectively amplified and oncogenic in cancers originating from tissues in which the factor plays a physiologic role16-19. SOX2 functions in both normal squamous progenitor cells and SCCs20-26. However, it is not clear how these SOX2 functions diverge. Like SOX2, p63 is expressed in squamous progenitors; thus, both normal squamous progenitors and SCCs typically coexpress SOX2 and p63 (refs. 27,28). Distinguishing normal and neoplastic functions of SOX2 could reveal oncogenic mechanisms and nominate therapeutic approaches. We aimed to disentangle normal and neoplastic esophageal roles of Sox2 using a set of engineered murine organoids representing phenotypes from normal esophagus to ESCC. By characterizing SOX2 localization and the effects of SOX2 manipulation on chromatin, we define how SOX2 function evolves during progression to ESCC. These approaches yield therapeutic hypotheses and provide a paradigm for disentangling actions of transcription factors in transformed and normal cells.
Results
Sox2 activation and Trp53/Cdkn2a loss promote transformation.
We sought to generate a model in which we could isolate normal esophageal epithelium and engineer it into ESCC through genetic manipulation, including Sox2 overexpression. TCGA data revealed that losses of TP53 and CDKN2A (p16) are highly recurrent in SOX2-amplified ESCC (Fig. 1a), suggesting that these alterations may promote ESCC. To generate models, we bred mice with Cre-inducible expression of Sox2 and Cas9 (Rosa26CAG-loxp-stop-loxp-Sox2-IRES-Egfp; H11CAG-loxp-stop-loxp-Cas9; Fig. 1b). We collected esophageal epithelium from adult Sox2; Cas9 mice and control Cas9 mice and derived three-dimensional organoids. We constructed lentiviral vectors expressing Cre and single guide RNAs (sgRNAs) targeting Trp53 and Cdkn2a, so that transduction of Sox2; Cas9 organoids would simultaneously induce Sox2 overexpression and deletion of the tumor suppressors, a model designated ‘SCPP’ (Sox2, Cas9, p53, p16). We also engineered controls with Sox2 overexpression only (SC) by transducing Sox2; Cas9 organoids with Lenti-Cre and sgLacZ. Additionally, by infecting organoids from Cas9 mice with Lenti-Cre; sgCdkn2a; sgTrp53, we generated organoids with tumor suppressor inactivation but without Sox2 overexpression (‘CPP’; Cas9, p53, p16).
We characterized organoids using histology, immunohistochemistry and immunoblotting (Fig. 1c,d and Extended Data Fig. 1a,b). Normal organoids possessed stratified squamous epithelium with peripheral basal cells, weak Sox2 expression and moderate p63 expression. SC organoids retained some differentiation, with mild-to-moderate basal cell atypia and strong Sox2 and p63 expression. CPP organoids showed an increased number of nucleated cells with little nuclear atypia, weak Sox2 expression and reduced p63 expression (Fig. 1c). SCPP organoids were poorly differentiated, forming large, undifferentiated cellular clusters with hyperchromatic nuclei and a high nuclear-to-cytoplasmic ratio, consistent with advanced dysplasia or cancer. SCPP cells were strongly Sox2 positive with modest p63 expression (Fig. 1c). We confirmed Sox2 expression, endogenous p63 expression, and p53 and Cdkn2a loss by immunoblotting (Fig. 1d and Extended Data Fig. 1a). Sox2 expression in SCPP organoids was comparable with that in SOX2-amplified human ESCC cell lines (Extended Data Fig. 1b).
We implanted organoids orthotopically and subcutaneously into nude (Nu/Nu) mice (Fig. 1e). No tumors formed from normal or SC organoids. SCPP organoids gave rise to tumors in 80% of mice after a latency of 2–3 months, with histology showing keratinization and keratin pearl formation, and immunostaining for Sox2 and p63, consistent with SCC (Fig. 1f). By contrast, CPP organoids formed tumors in only 40% of injections after a longer latency of 4–5 months, and histology showed poorly differentiated carcinomas negative for Sox2 and p63 (Fig. 1e,f). To evaluate whether tumor formation was attributable to additional mutations, we performed whole-exome sequencing but did not identify clear pathogenic alterations (Supplementary Table 1). These results reveal that Sox2 overexpression combined with Trp53 and Cdkn2a inactivation promotes ESCC in murine organoids, allowing us to study Sox2 function in the progression from normal esophageal epithelium to ESCC.
Genomic occupancy of Sox2 evolves from normal to cancer.
We used our organoids to map Sox2 genomic localization by chromatin immunoprecipitation sequencing (ChIP–seq), profile messenger RNA expression (RNA sequencing (RNA-seq)), assess activity of regulatory elements (H3K27ac ChIP–seq) and explore chromatin accessibility (assay for transposase-accessible chromatin using sequencing (ATAC–seq)) (Extended Data Fig. 2a-c). We evaluated Sox2 binding across the union of Sox2 binding sites (or ‘peaks’) from the four organoid genotypes (Extended Data Fig. 2a). We categorized Sox2 binding sites by comparing normal and SCPP organoids, classifying peaks as significantly gained, significantly lost or not significantly different in SCPP compared with normal. Most binding sites (84.3%) were not significantly different, whereas 14.42% of peaks were gained and 1.28% were lost in SCPP compared with normal (Fig. 2a). Focusing on these sets of Sox2 binding sites, we evaluated Sox2 binding in SC and CPP. Isolated tumor suppressor inactivation (CPP) led to little change in Sox2 localization. Intriguingly, despite a marked increase in Sox2 expression in SC, Sox2 gained only a modest number of binding sites. Of the 13,758 binding sites increased in SCPP versus normal, only 4.7% were significantly gained by Sox2 in SC (Fig. 2b, right).
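The gained/lost/unchanged split above can be illustrated with a minimal Python sketch. It assumes each peak in the union set already carries a log2 fold change (SCPP versus normal) and an adjusted P value from some differential-binding test; the `PeakStat` record, the thresholds (adjusted P < 0.05, |log2 fold change| ≥ 1) and the example peaks are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class PeakStat:
    """Hypothetical per-peak result from a differential-binding test."""
    peak_id: str
    log2_fc: float  # log2(SCPP / normal) Sox2 ChIP signal
    padj: float     # multiple-testing-adjusted P value


def classify_peaks(stats, fc_cutoff=1.0, padj_cutoff=0.05):
    """Label each peak 'gained', 'lost' or 'unchanged' (assumed thresholds)."""
    labels = {}
    for s in stats:
        significant = s.padj < padj_cutoff
        if significant and s.log2_fc >= fc_cutoff:
            labels[s.peak_id] = "gained"
        elif significant and s.log2_fc <= -fc_cutoff:
            labels[s.peak_id] = "lost"
        else:
            labels[s.peak_id] = "unchanged"
    return labels


def category_percentages(labels):
    """Percentage of peaks in each category (the three values sum to 100)."""
    counts = {"gained": 0, "lost": 0, "unchanged": 0}
    for label in labels.values():
        counts[label] += 1
    total = len(labels)
    return {k: 100.0 * v / total for k, v in counts.items()}


# Toy example with four hypothetical peaks.
peaks = [
    PeakStat("peak1", 2.0, 0.01),    # strong, significant increase
    PeakStat("peak2", -1.5, 0.001),  # significant decrease
    PeakStat("peak3", 0.3, 0.50),    # small, non-significant change
    PeakStat("peak4", 1.2, 0.20),    # large change but not significant
]
labels = classify_peaks(peaks)
print(category_percentages(labels))  # {'gained': 25.0, 'lost': 25.0, 'unchanged': 50.0}
```

Requiring both an effect-size and a significance cutoff is why a peak like `peak4` stays ‘unchanged’ despite a large fold change.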
We then evaluated ATAC–seq signal at Sox2 peaks gained in SCPP, finding that 51.5% of these Sox2 ChIP sites overlapped with ATAC sites gained in SCPP compared with normal, indicating that gained Sox2 sites lie in newly opened chromatin regions (Fig. 2c and Extended Data Fig. 2d,e). Interestingly, ATAC profiles of SC revealed opening of the same regions that were newly opened and bound by Sox2 in SCPP (Fig. 2d and Extended Data Fig. 2f). Specifically, 55.6% of the 15,345 ATAC sites opened in SCPP versus normal were also opened in SC. Unsupervised hierarchical clustering showed that SC and SCPP share similar chromatin accessibility profiles, distinct from those of normal and CPP (Extended Data Fig. 2b). These data raised the question of why Sox2 does not bind these sites as robustly in SC, despite marked Sox2 expression and accessibility of the sites to which Sox2 binds in SCPP.
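Overlap percentages such as the 51.5% above reduce to asking, for each query interval, whether any interval in a second set shares at least one base. The Python sketch below assumes half-open (chrom, start, end) intervals, as in BED files, and a 1-bp minimum overlap; the function names are hypothetical and the overlap criterion the authors used is not stated in the text. It indexes the reference set by sorted start with a running maximum of end coordinates, so each query needs only one binary search.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(intervals):
    """Per chromosome: sorted starts and the running maximum of end coordinates."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [start for start, _ in ivs]
        max_end, running = [], float("-inf")
        for _, end in ivs:
            running = max(running, end)
            max_end.append(running)
        index[chrom] = (starts, max_end)
    return index


def overlaps_any(index, chrom, start, end):
    """True if [start, end) shares at least 1 bp with any indexed interval."""
    if chrom not in index:
        return False
    starts, max_end = index[chrom]
    k = bisect_left(starts, end)  # number of intervals starting before `end`
    return k > 0 and max_end[k - 1] > start


def fraction_overlapping(query, reference):
    """Fraction of query intervals overlapping at least one reference interval."""
    if not query:
        return 0.0
    index = build_index(reference)
    hits = sum(overlaps_any(index, c, s, e) for c, s, e in query)
    return hits / len(query)


# Toy example: hypothetical gained Sox2 peaks versus gained ATAC sites.
gained_sox2 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20), ("chr3", 0, 50)]
gained_atac = [("chr1", 150, 160), ("chr1", 590, 700), ("chr2", 20, 30)]
print(fraction_overlapping(gained_sox2, gained_atac))  # 0.5
```

Because coordinates are half-open, the adjacent intervals on chr2 (10–20 and 20–30) do not count as overlapping.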
We next evaluated transcriptional regulatory activity at Sox2 binding sites. Consistent with the Sox2 ChIP–seq results, we observed elevated signal for H3K27ac, a marker of transcriptional regulatory activity, at gained Sox2 binding sites in SCPP (Fig. 2e and Extended Data Figs. 2c and 3a). In contrast, H3K27ac was only modestly elevated at the same sites in SC, indicating blunted effects of isolated Sox2 overexpression. We further queried the enhancer-associated mark H3K4me1, finding that H3K4me1 ChIP–seq signal also increased at gained Sox2 binding sites (Extended Data Fig. 3d).
Data from SC suggest that isolated Sox2 overexpression may be sufficient to promote chromatin opening but not to increase regulatory activity, raising the hypothesis that partnering transcription factors promote Sox2 binding and activity in SCPP but not in SC. We queried transcription factor motifs at Sox2 sites acquired in SCPP but not in SC (Fig. 2f), finding enrichment of AP-1, Sox2, Klf5, Tead and Runx motifs and nominating factors that may stabilize Sox2 binding and contribute to its activity.
To investigate the transcriptional impact of Sox2 overexpression, we compared mRNA-seq between SCPP and normal organoids. Integrating mRNA-seq and Sox2 ChIP–seq using Binding and Expression Target Analysis (BETA)29 revealed that Sox2 acts mainly as a transcriptional activator (Kolmogorov–Smirnov test, P = 1.36 × 10^−27; Fig. 2g). Comparing genes upregulated by ectopic Sox2 in our models with TCGA ESCC data, we found that genes identified as Sox2-activated in SCPP were more likely to correlate positively with SOX2 expression in human tumors (Extended Data Fig. 3c). Additionally, a composite set of genes correlated with SOX2 in human ESCC was enriched in SCPP compared with CPP or normal models (Extended Data Fig. 3d,e). These data indicate that our models capture activities of Sox2 resembling its role in human ESCC.
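BETA summarizes, for each gene, the nearby binding sites as a regulatory potential and then compares the distribution of that score between regulated and unchanged genes with a Kolmogorov–Smirnov test. The Python sketch below assumes the distance-decay score described for BETA, S = Σ exp(−(0.5 + 4Δ)), where Δ is the peak-to-TSS distance as a fraction of a 100-kb window, and computes only the two-sample Kolmogorov–Smirnov D statistic (no P value). It is an illustration under those assumptions, not the BETA implementation, and the function names and toy inputs are hypothetical.

```python
import math
from bisect import bisect_right


def regulatory_potential(tss, summits, window=100_000):
    """Sum of exp(-(0.5 + 4 * d / window)) over summits within `window` bp of the TSS.

    Assumed form of the BETA distance-decay score; TSS and summits share one chromosome.
    """
    score = 0.0
    for summit in summits:
        distance = abs(summit - tss)
        if distance <= window:
            score += math.exp(-(0.5 + 4.0 * distance / window))
    return score


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy example: TSS at 0 and hypothetical summit positions for each gene.
up_scores = [regulatory_potential(0, s) for s in ([1_000, 5_000], [2_000], [10_000, 40_000])]
static_scores = [regulatory_potential(0, s) for s in ([90_000], [], [150_000])]
print(ks_statistic(up_scores, static_scores))  # 1.0
```

A D statistic near 1 means upregulated genes almost uniformly carry more nearby binding than unchanged genes, which is the pattern read as activator function.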
Sox2 promotes super-enhancer (SE) remodeling.
We used our models to identify candidate cancer-promoting genes activated by Sox2. Large clusters of enhancers with dense H3K27ac enrichment, termed SEs, have been associated with activation of oncogenes30. Analyzing H3K27ac ChIP–seq data, we identified 372 SEs in normal and 785 SEs in SCPP organoids (Fig. 3a and Supplementary Table 2). Of the 543 SEs acquired in SCPP, 74.6% overlapped with gained Sox2 binding sites, suggesting that Sox2 overexpression shapes the SE landscape (Fig. 3b and Extended Data Fig. 4a). Among the newly acquired SEs overlapping gained Sox2 binding sites in SCPP, ATAC–seq revealed that 49.4% of sites were opened following Sox2 overexpression (that is, were closed in normal organoids) (Extended Data Fig. 4a). Furthermore, nearly all gained SEs containing newly opened chromatin sites were also sites of Sox2 binding (Extended Data Fig. 4b), highlighting Sox2’s role in shaping the epigenome.
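The text does not specify the SE-calling procedure beyond ref. 30. A commonly used approach stitches nearby H3K27ac peaks, ranks the stitched regions by signal and calls as SEs the regions above the point where the ranked signal curve reaches slope 1. The Python sketch below assumes that approach with a 12.5-kb stitching distance, omits the promoter exclusion and input-background subtraction of fuller implementations, and uses hypothetical function names and toy peaks.

```python
def stitch(peaks, max_gap=12_500):
    """Merge (chrom, start, end, signal) peaks separated by <= max_gap bp; sum signals."""
    stitched = []
    for chrom, start, end, signal in sorted(peaks):
        last = stitched[-1] if stitched else None
        if last is not None and last[0] == chrom and start - last[2] <= max_gap:
            stitched[-1] = (chrom, last[1], max(last[2], end), last[3] + signal)
        else:
            stitched.append((chrom, start, end, signal))
    return stitched


def slope_one_cutoff(signals):
    """Signal at the point where the ranked signal curve has slope 1.

    Ranks are scaled to [0, 1] and signals are divided by their maximum, so the
    tangent point is where (scaled signal - scaled rank) is smallest.
    Assumes at least one positive signal.
    """
    ys = sorted(signals)
    n, y_max = len(ys), ys[-1]
    best_i, best_val = 0, float("inf")
    for i, y in enumerate(ys):
        x = i / (n - 1) if n > 1 else 0.0
        if y / y_max - x < best_val:
            best_i, best_val = i, y / y_max - x
    return ys[best_i]


def call_super_enhancers(peaks, max_gap=12_500):
    """Stitched regions whose summed signal exceeds the slope-one cutoff."""
    regions = stitch(peaks, max_gap)
    cutoff = slope_one_cutoff([r[3] for r in regions])
    return [r for r in regions if r[3] > cutoff]


# Toy example: two nearby peaks stitch into one strong region.
toy_peaks = [
    ("chr1", 0, 100, 5.0), ("chr1", 10_000, 10_100, 5.0),
    ("chr1", 50_000, 50_100, 1.0), ("chr2", 0, 100, 1.0),
]
print(call_super_enhancers(toy_peaks))  # [('chr1', 0, 10100, 10.0)]
```

The slope-one rule separates the steep tail of the ranked curve (a few very strong regions) from the many typical enhancers without a fixed signal threshold.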
We then evaluated putative gene targets of SEs acquired in SCPP, finding 886 genes associated with the 543 gained SEs, including squamous-related oncogenes (for example, MYC, MAX, STAT3, MET and CCND1) (Extended Data Fig. 4c). One example of an acquired SE with acquired Sox2 peaks is at Il6ra, whose mRNA expression increased fourfold in SCPP by quantitative PCR (qPCR) (Fig. 3c,d and Extended Data Fig. 4d). Similarly, Sox2 binding ~13 kilobases (kb) upstream of Stat3 is associated with SE formation, elevated Stat3 mRNA expression and Stat3 phosphorylation (Fig. 3c-e and Extended Data Fig. 4d,e). To explore these findings in human ESCC, we evaluated SEs in SOX2-amplified ESCC cell lines, finding significant overlap of gained SE-associated genes between SCPP and ESCC cell lines (Extended Data Fig. 4f), with SOX2 localizing at SEs near genes such as IL6R and STAT3 (Extended Data Fig. 4g). Furthermore, silencing SOX2 attenuated p-STAT3 in ESCC cell lines (Extended Data Fig. 4h).
We performed motif analysis at gained Sox2 binding sites within new SCPP SEs to identify candidate Sox2 cofactors, observing enrichment of AP-1, Sox2, Tead4 and Klf5 motifs (Fig. 3f). We focused on KLF5, a putative SOX2 partner in ESCC recently shown to colocalize with SOX2 and p63 in SCC cell lines28. KLF5 is focally amplified in SCC and associated with cancer-related activities31-33. Consistent with a potential role for Klf5 in SCPP, we observed a marked increase in Klf5 mRNA and protein in SCPP (Fig. 3g). We identified multiple Sox2 binding sites with H3K27ac signal at the Klf5 locus, suggesting that Sox2 promotes Klf5 activation (Fig. 3h). ATAC–seq revealed that specific Sox2 binding sites at Klf5 were newly opened in SC and SCPP compared with normal. To validate functional Sox2 binding at the Klf5 locus, we cloned three enhancers containing Sox2 binding sites gained in SCPP (Enhancer1–Enhancer3 (E1–E3)) into a reporter vector. Among the sites tested, E3 showed enhanced transcriptional regulatory activity when Sox2 was overexpressed, and its activity was elevated in SCPP, consistent with Sox2-mediated activation (Extended Data Fig. 5a,b). Deleting the Sox2 DNA binding motif from the E3 enhancer sequence and repeating the reporter assay in SCPP attenuated this activation, supporting a direct role for Sox2 (Extended Data Fig. 5c). We further used CRISPR interference to repress each of the E1–E3 enhancers individually, which significantly reduced Klf5 mRNA expression (Extended Data Fig. 5d). Our results suggest that Sox2 binding increases Klf5 expression (Fig. 3h).
KLF5 facilitates SOX2 activity in squamous tumorigenesis.
We next tested our hypothesis that Sox2 and Klf5 interact and colocalize on the genome. Coimmunoprecipitation revealed a physical interaction between Klf5 and Sox2 in SCPP and in SOX2-amplified ESCC cell lines (Fig. 4a and Extended Data Fig. 6a). We performed Klf5 ChIP–seq in organoids using an antibody recognizing endogenous Klf5, finding strong Klf5 binding at Sox2 sites gained in SCPP (Fig. 4b,c). Indeed, 56.6% of Klf5 binding sites gained in SCPP compared with normal overlapped with sites gained by Sox2 (Extended Data Fig. 6b). Querying Sox2-occupied SEs gained in SCPP compared with normal, we found Klf5 binding at 98% of these sites (Extended Data Fig. 6c).
We next asked whether perturbing Klf5 affected Sox2 binding. We attempted to ectopically express Klf5 complementary DNA in SC but found that these organoids could not tolerate Klf5 overexpression. We therefore evaluated the effects of silencing Klf5 by RNA interference in SCPP, followed by Sox2 ChIP–seq. We confirmed that Sox2 expression was not significantly affected by Klf5 silencing (Extended Data Fig. 6d). Analyzing Sox2 binding in SCPP following Klf5 silencing, we found attenuation of Sox2 binding at 3,950 sites, whereas only 48 sites were gained (Fig. 4d). Furthermore, 49.06% of the Sox2 binding sites whose intensity was significantly attenuated with Klf5 loss were sites gained by Sox2 in SCPP versus SC (Fig. 4e), supporting our hypothesis that Klf5 promotes Sox2 binding to new loci.
To test whether Klf5 contributes to Sox2-mediated gene regulation, we performed RNA-seq in SCPP with CRISPR-mediated Klf5 knockout. We observed a significant overlap between genes downregulated by Klf5 knockout in SCPP (Klf5 transcriptional targets) and genes upregulated in SCPP compared with normal (Fig. 4f). Genes associated with gained Sox2 binding sites, gained Klf5 binding sites, gained H3K27ac sites and gained open chromatin sites in SCPP were all highly enriched for Klf5 transcriptional targets (Fig. 4g). We also performed RNA-seq in ESCC cell lines after silencing SOX2 or KLF5 individually, finding significant overlap between genes regulated by SOX2 and by KLF5 (Extended Data Fig. 6e and Supplementary Table 3). Gene set enrichment analysis (GSEA) showed that genes downregulated by Klf5 silencing were strongly enriched among genes downregulated by SOX2 silencing in ESCC cell lines (Extended Data Fig. 6f). We asked whether genes coregulated by SOX2 and KLF5 were upregulated in ESCC by comparing TCGA RNA-seq data from ESCC with those from esophageal adenocarcinoma or normal esophagus, finding SOX2/KLF5-coregulated genes to be upregulated in ESCC (Extended Data Fig. 6g). GSEA of genes downregulated after SOX2 and KLF5 knockdown in ESCC cell lines likewise showed strong enrichment of these genes in ESCC tumors compared with esophageal adenocarcinoma or normal esophagus (Extended Data Fig. 6h). Furthermore, genes coregulated by SOX2 and KLF5 in human cell lines overlapped substantially with genes identified as regulated by Sox2 and Klf5 in SCPP (that is, genes with mRNA upregulated in SCPP versus normal and with gained binding sites for both Sox2 and Klf5) (Extended Data Fig. 6i and Supplementary Table 4).
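Claims of ‘significant overlap’ between two gene sets, as above, are often assessed with a one-sided hypergeometric test; the text does not state which test was used, so the Python sketch below is an assumption. Given a universe of N genes, sets of sizes a and b and an observed overlap k, it returns P(overlap ≥ k) under random draws, computed exactly with integer binomial coefficients, together with the fold enrichment over the chance expectation a·b/N. The gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_overlap_pvalue(n_universe, n_a, n_b, n_overlap):
    """One-sided P(overlap >= n_overlap) for random gene sets of sizes n_a and n_b."""
    total = comb(n_universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, min(n_a, n_b) + 1)
    )
    return tail / total


def fold_enrichment(n_universe, n_a, n_b, n_overlap):
    """Observed overlap divided by the overlap expected by chance, n_a * n_b / N."""
    return n_overlap * n_universe / (n_a * n_b)


# Hypothetical numbers: 20,000 genes, 800 Klf5 targets, 1,500 SCPP-up genes, 200 shared.
print(fold_enrichment(20_000, 800, 1_500, 200))
print(hypergeom_overlap_pvalue(20_000, 800, 1_500, 200))
```

Exact integer arithmetic avoids the underflow that naive floating-point products of binomial coefficients would hit for genome-scale gene counts.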
We also evaluated the impact of SOX2 and KLF5 on chromatin accessibility in human ESCC models. Silencing SOX2 in the SOX2-amplified cell line KYSE70 (Extended Data Fig. 7a) reduced ATAC–seq signal at 7,746 genomic regions, which significantly overlapped SOX2 binding sites. ATAC–seq following KLF5 silencing showed attenuated signal at 9,303 genomic regions, which significantly overlapped both SOX2 binding sites and SOX2-affected ATAC sites. Far fewer open chromatin sites were gained (3,068 regions) than lost (9,039 genomic regions) after KLF5 silencing (Extended Data Fig. 7b,c). These data show that both KLF5 and SOX2 regulate chromatin accessibility at SOX2 binding sites, supporting a role for KLF5 in promoting SOX2 activity.
We then asked whether KLF5 is essential in SOX2-driven ESCC cells. We observed a selective Klf5 dependency in SCPP organoids relative to CPP organoids (Fig. 4h and Extended Data Fig. 7d). We validated KLF5 dependency in human ESCC using small interfering RNA (Extended Data Fig. 7e,f) and doxycycline-inducible short hairpin RNA (shRNA) (Extended Data Fig. 7g-i). Finally, we validated KLF5 dependency in vivo using two doxycycline-inducible shRNAs in KYSE70, finding that KLF5 silencing reduced tumor growth to an extent comparable to that of targeting SOX2 (Extended Data Fig. 7j-l).
SOX2 overexpression promotes ADAR1 dependency.
Having identified activities of Sox2 in ESCC, we next asked whether our models could help evaluate candidate therapeutic targets. We began with an unbiased assessment of vulnerabilities in SOX2-amplified human cancer models, using data from 625 cancer cell lines interrogated by genome-wide loss-of-function CRISPR screening34 to identify genes preferentially essential in the setting of SOX2 amplification. Reassuringly, this analysis revealed SOX2, TP63 and KLF5 among the top dependencies in SOX2-amplified cancers (Fig. 5a). Notably, we also found significant dependence on the RNA adenosine deaminase ADAR1 in SOX2-amplified models (Fig. 5a and Extended Data Fig. 8a). ADAR1 dependency was greater overall in SCC cells than in adenocarcinoma cells, and among SCC cell lines, SOX2 amplification was also associated with ADAR1 dependence (Extended Data Fig. 8b,c).
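The selective-dependency comparison above can be framed, in simplified form, as a per-gene two-group test of CRISPR gene-effect scores in SOX2-amplified versus other cell lines. The Python sketch below assumes DepMap-style scores in which more negative values indicate stronger dependency and uses Welch's t statistic (without a P value) as a hypothetical ranking metric; the statistic actually used in the study is not specified in the text, and the gene and cell-line labels are placeholders.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs >= 2 values, and the groups must not both have zero variance.
    """
    sq_se_a = variance(group_a) / len(group_a)
    sq_se_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(sq_se_a + sq_se_b)
    df = (sq_se_a + sq_se_b) ** 2 / (
        sq_se_a ** 2 / (len(group_a) - 1) + sq_se_b ** 2 / (len(group_b) - 1)
    )
    return t, df


def rank_selective_dependencies(gene_effects, amplified_lines):
    """Sort genes so the most amplification-selective dependencies come first.

    gene_effects maps gene -> {cell_line: score}; a more negative score means a
    stronger dependency (assumed convention), so the most negative t ranks first.
    """
    ranked = []
    for gene, scores in gene_effects.items():
        amp = [v for line, v in scores.items() if line in amplified_lines]
        rest = [v for line, v in scores.items() if line not in amplified_lines]
        t, df = welch_t(amp, rest)
        ranked.append((gene, t, df))
    return sorted(ranked, key=lambda row: row[1])


# Placeholder genes and cell lines.
effects = {
    "GENE_A": {"amp1": -1.0, "amp2": -1.2, "other1": 0.0, "other2": 0.1},
    "GENE_B": {"amp1": 0.0, "amp2": 0.1, "other1": 0.0, "other2": 0.1},
}
print(rank_selective_dependencies(effects, {"amp1", "amp2"})[0][0])  # GENE_A
```

Welch's form is used here because amplified and non-amplified groups typically differ in both size and variance.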
ADAR1 has recently emerged as a selective cancer vulnerability because ADAR1 loss leads to aberrant accumulation of double-stranded RNA (dsRNA) and stimulation of interferon (IFN) responses, which are toxic to cells with stronger pre-existing IFN activity35-37. We validated ADAR1 dependence in human SOX2-amplified ESCC cell lines, finding that ADAR1 silencing decreased growth in vitro and in vivo (Fig. 5b,c and Extended Data Fig. 8d,e). We tested whether Sox2 directly promotes Adar1 dependence using the isogenic organoids CPP and SCPP. Targeting Adar1 with sgRNAs significantly inhibited growth of SCPP but not CPP (Fig. 5d and Extended Data Fig. 8f). Moreover, assessing induction of IFN-stimulated genes (ISGs) following Adar1 silencing in isogenic models, we found that Adar1 depletion induced markedly greater ISG responses in SCPP than in CPP (Fig. 5e and Extended Data Fig. 8g). These data suggest a direct connection between SOX2 and ADAR1 dependence.
We first evaluated the hypothesis that SOX2 promotes an IFN response that makes SOX2+ cells more sensitive to IFN-inducing stimuli, further promoting IFN activation. Providing initial support for this hypothesis, GSEA of pathways upregulated in SCPP compared with CPP found enrichment of IFN-α and IFN-γ pathways (Extended Data Fig. 8h,i). Indeed, the IFN-γ response pathway was enriched among genes activated by Sox2-dependent SEs (Extended Data Fig. 8j) and among Sox2 target genes in SCPP (Extended Data Fig. 9a). Moreover, we observed gained Sox2 binding sites with H3K27ac signal at the loci of Tmem173 (Sting, a mediator of cytosolic double-stranded DNA (dsDNA) sensing), Dhx58 (Lgp2, a dsRNA sensor) and Mavs (an adaptor for cytosolic dsRNA sensing) (Extended Data Fig. 9b), with upregulation of Sting and Lgp2 mRNA upon Sox2 expression (Extended Data Fig. 9c). We confirmed Klf5 localization at these loci (Extended Data Fig. 9d), consistent with Sox2 promoting IFN activity in ESCC.
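The GSEA results cited here rest on a running-sum enrichment score. The Python sketch below reproduces that score in its commonly described weighted form, assuming a weight exponent p = 1: genes in the set step the sum up in proportion to |metric|^p, all other genes step it down uniformly, and the score is the signed maximum deviation from zero. Permutation-based significance and normalization are omitted, and the gene names are placeholders.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score.

    ranked: list of (gene, metric) sorted from most up- to most downregulated.
    Set members step up by |metric|**p divided by the sum over members;
    non-members step down by 1 / (number of non-members).
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(m) ** p for (_, m), hit in zip(ranked, hits) if hit)
    running, best = 0.0, 0.0
    for (_, metric), hit in zip(ranked, hits):
        running += abs(metric) ** p / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder ranking (for example, log fold change SCPP versus CPP).
ranked_genes = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", -1.0)]
print(enrichment_score(ranked_genes, {"g1", "g2"}))  # 1.0
print(enrichment_score(ranked_genes, {"g3", "g4"}))  # -1.0
```

A positive score indicates that set members concentrate at the top of the ranking (upregulated), and a negative score that they concentrate at the bottom.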
We then tested stimuli that promote IFN activation in isogenic CPP and SCPP models by transfecting dsDNA (poly(deoxyadenylic-deoxythymidylic) acid, poly(dA:dT)) or dsRNA (polyinosinic-polycytidylic acid, poly(I:C)). Given the heightened responsiveness of SCPP following Adar1 silencing, we hypothesized that SCPP would show a greater transcriptional response to IFN stimulation and greater toxicity. However, poly(dA:dT) yielded no significant difference in growth between CPP and SCPP models, and poly(I:C) caused only a modest decrease in fitness in SCPP relative to CPP (Fig. 5f), far less striking than the effects of Adar1 silencing. Moreover, quantifying ISG induction after stimulation with poly(I:C) or poly(dA:dT), we found modest differences and, in some cases, greater induction in CPP (Fig. 5g). We also transfected another type of dsRNA, 5′ triphosphate dsRNA (5′ppp-dsRNA), into isogenic models, finding similarly modest ISG induction in CPP and SCPP (Extended Data Fig. 9e). As an alternative measure of responsiveness to IFN-inducing stimuli, we infected organoids with influenza A/PR/8/34 (H1N1), a single-stranded RNA virus that produces dsRNA during replication in host cells. ISGs induced by viral infection were greater in CPP than in SCPP (Extended Data Fig. 9f). This discrepancy between the greater sensitivity of SCPP to Adar1 loss and its modest response to introduced dsRNA stimuli suggests that SCPP is not simply more sensitive to stimuli that promote IFN induction.
We then evaluated an alternative hypothesis for the relationship between SOX2 and ADAR1 dependency: that ADAR1 mitigates dsRNA induced by SOX2. Double immunofluorescence for dsRNA and Adar1 revealed greater dsRNA accumulation, colocalizing with Adar1, in SCPP than in CPP (Fig. 5h and Extended Data Fig. 9g). We further found significant dsRNA accumulation after Adar1 silencing (Extended Data Fig. 9h,i). These data suggest that Adar1 dependence results from Sox2-mediated dsRNA induction and a requirement for the enzyme to process dsRNA.
Sox2 promotes dsRNA and endogenous retroviral expression.
We evaluated Sox2-mediated dsRNA formation, building upon our earlier epigenomic studies. Many ISGs contain endogenous retroviral elements (ERVs) in their 3′ UTRs that are oriented inversely to the coding sequence38. Derepressing these loci leads to coexpression of sense transcripts and antisense ERV RNAs, which can pair to form dsRNA. We found a significant overlap between genes upregulated by Sox2 in SCPP versus CPP and genes possessing antisense 3′ UTR ERVs (Fig. 6a and Supplementary Table 5). Interestingly, 45% of these upregulated genes were also associated with gained Sox2 binding sites (within 50 kb of the transcription start site (TSS)) (Fig. 6a). Whereas reverse-transcription qPCR (RT–qPCR) confirmed upregulation of mRNA from Sox2-regulated ISGs with 3′ UTR ERVs in both SC and SCPP models, focused quantitation of 3′ UTR-containing dsRNA found markedly elevated expression only in SCPP (Fig. 6b). We then evaluated ERVs outside of 3′ UTRs and found enrichment of ERV elements at gained Sox2 binding sites and at newly accessible chromatin regions with increased H3K27 acetylation in SCPP (Fig. 6c).
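The 50-kb association between gained Sox2 binding sites and genes can be made concrete with a short Python sketch. It assumes one TSS per gene and single-base peak summits, both hypothetical simplifications (the text does not state how alternative TSSs or peak widths were handled), and reports genes with at least one summit within the window by binary search on sorted summit positions. Gene names and coordinates are placeholders.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def genes_near_summits(tss_records, summits, window=50_000):
    """Genes whose TSS lies within `window` bp of at least one peak summit.

    tss_records: (gene, chrom, tss) tuples, one TSS per gene (simplification).
    summits: (chrom, position) tuples for gained Sox2 peak summits.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in summits:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    near = set()
    for gene, chrom, tss in tss_records:
        positions = by_chrom.get(chrom, [])
        # Any summit in [tss - window, tss + window] counts as an association.
        if bisect_right(positions, tss + window) > bisect_left(positions, tss - window):
            near.add(gene)
    return near


# Placeholder genes and coordinates.
tss = [("GeneA", "chr4", 1_000_000), ("GeneB", "chr5", 2_000_000), ("GeneC", "chr6", 500_000)]
gained_summits = [("chr4", 1_040_000), ("chr5", 2_060_000)]
upregulated = {"GeneA", "GeneB"}
near = genes_near_summits(tss, gained_summits)
print(len(upregulated & near) / len(upregulated))  # 0.5
```

The final line computes the kind of fraction reported above: the share of upregulated genes that also carry a gained binding site within the window.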
To further investigate ERVs induced by Sox2, we performed ribosomal RNA-depleted RNA-seq and compared ERV expression between SCPP and CPP. ERV families upregulated in SCPP (Fig. 6d and Extended Data Fig. 10a) were enriched for gained Sox2 and Klf5 binding sites, with the greatest enrichment of Sox2 binding in the RLTR13D6 family of ERVs (Fig. 6d,e and Extended Data Fig. 10b). Focused inspection of chromatin at upregulated RLTR13D6 ERVs revealed that these sites were selectively open, harbored Sox2 and Klf5 binding and had active regulatory elements in SCPP relative to CPP (Extended Data Fig. 10c), consistent with Sox2-mediated activation. We confirmed elevated expression of these ERVs in SCPP compared with all other organoid genotypes (Extended Data Fig. 10d,e). We next evaluated whether expression of Sox2-induced ERVs, both within 3′ UTRs and in the RLTR13D6 family, was regulated by Klf5 in SCPP, finding significantly reduced expression with Klf5 silencing (Fig. 6f) and highlighting the importance of Sox2 and Klf5 in ERV regulation. We further found a significant decrease in dsRNA signal detected by the J2 antibody after Klf5 silencing in SCPP and KLF5 silencing in the ESCC cell lines TT and TE10 (Extended Data Fig. 10f-h).
To investigate whether Sox2-regulated dsRNAs are enriched for Adar1 editing sites, we reanalyzed CPP and SCPP RNA-seq data to identify adenosine-to-inosine (A-to-I) editing events mediated by Adar1. We focused on A-to-I editing within ERV families that were specifically upregulated in SCPP compared with CPP and also enriched for gained Sox2 binding sites. As a control, we assessed A-to-I editing at short interspersed nuclear elements (SINEs), established sites of Adar1 editing36. Although one SINE family, AmnSINE1, was upregulated in SCPP, editing of these sites did not differ between SCPP and CPP. In contrast, of the 11 ERV families induced by Sox2 with gained Sox2 binding sites, five had a higher editing index in SCPP (indicating preferential editing in SCPP compared with CPP), six had equivalent editing and none had higher editing in CPP (Fig. 6g,h). By demonstrating that Sox2-induced ERVs are edited by Adar1, these data support our hypothesis that Sox2-mediated ERV induction promotes Adar1 dependence.
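An editing index of the kind compared above is commonly defined as the fraction of A-to-G mismatches among all reads covering reference adenosines within a set of elements. The Python sketch below assumes that definition, reads strand-aware pileups (a transcribed adenosine on the minus strand appears as a genomic T, and an edit reads as C) and omits the background-mismatch correction used by some published pipelines. The input format and values are hypothetical, and this is not the study's implementation.

```python
def editing_index(pileups):
    """Percentage of A-to-G (A-to-I) signal at reference adenosines.

    pileups: (ref_base, strand, base_counts) per genomic position in the
    elements of interest. strand is the element's transcribed strand, and
    base_counts maps read bases (genomic orientation) to read counts.
    On the minus strand a transcribed A is a genomic T and an edit reads as C.
    """
    edited = unedited = 0
    for ref_base, strand, counts in pileups:
        if strand == "+" and ref_base == "A":
            edited += counts.get("G", 0)
            unedited += counts.get("A", 0)
        elif strand == "-" and ref_base == "T":
            edited += counts.get("C", 0)
            unedited += counts.get("T", 0)
        # Other reference bases cannot carry A-to-I edits and are skipped.
    total = edited + unedited
    return 100.0 * edited / total if total else 0.0


# Hypothetical pileups from one ERV family in one sample.
toy_pileups = [
    ("A", "+", {"A": 90, "G": 10}),
    ("T", "-", {"T": 45, "C": 5}),
    ("C", "+", {"C": 100, "T": 3}),
]
print(editing_index(toy_pileups))  # 10.0
```

Comparing this index between SCPP and CPP for each ERV family gives the three outcomes described above: higher in SCPP, equivalent, or higher in CPP.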
We then re-examined chromatin profiling in human ESCC models to study SOX2-mediated regulation of ERVs. SOX2 bound preferentially to genes with ERVs in their 3′ UTRs or introns (Fig. 7a). Moreover, ERVs downregulated with SOX2 silencing overlapped substantially with those downregulated following KLF5 silencing, suggesting that KLF5 acts with SOX2 to regulate ERV expression in human ESCC (Fig. 7b,c and Supplementary Table 6). Indeed, visual inspection of SOX2 and KLF5 ChIP–seq at ERVs coregulated by the two transcription factors revealed overlapping KLF5 and SOX2 binding at active regulatory sites (Fig. 7d). These results highlight a possible mechanism by which SOX2-activated ESCCs become dependent on ADAR1. Activated SOX2 induces ERVs, promoting dsRNA formation that can trigger toxic IFN responses; ADAR1 editing of these dsRNAs may prevent such responses, making ADAR1 essential in SOX2-activated SCCs (Fig. 7e).
Discussion
We have characterized epigenetic and transcriptional programs induced by Sox2 overexpression in ESCC, demonstrating that while Sox2 maintains actions observed in normal squamous cells, its overexpression promotes opening of loci where Sox2 binds with partners, including Klf5, to activate a new gene expression program. These results are consistent with data demonstrating the capacity of SOX2 to facilitate nucleosome remodeling39. While we focused on KLF5, additional work is needed to dissect interactions of SOX2 with factors such as AP-1 and TEAD, given data that TEAD4 and AP-1 factors coordinate gene transcription in cancer40.
We demonstrated acquisition of Sox2 binding and active regulatory elements adjacent to several oncogenes. Moreover, this program induces expression of dsRNA from ERV elements, conferring dependency on the RNA-editing enzyme ADAR1. Sox2 overexpression also upregulates the IL6 receptor pathway and promotes STAT3 activation. The interleukin-6 (IL-6)/JAK/STAT3 pathway has been associated with proliferation, resistance to apoptosis, invasion, angiogenesis and evasion of immune surveillance41. Recent ESCC data demonstrated that cancer-associated fibroblasts can produce IL-6, promoting STAT3 and mitogen-activated protein kinase (MAPK) activity and sensitivity to IL6R inhibition42. Our data demonstrate that amplification and pathologic overexpression of SOX2 enable this transcription factor to adopt new functions that promote cancer formation.
Our data support an oncogenic role for KLF5. KLF5 is normally expressed in squamous epithelial cells, including proliferating esophageal cells43. Although studies have shown that KLF5 can either promote or inhibit ESCC31,32,43, genomic evidence points to KLF5 as an oncogene. KLF5 amplification has been found in esophageal, lung, and head and neck SCC4. Similarly, members of our team reported that amplification of an SE adjacent to KLF5 promotes head and neck SCC31. A recent study found that p63, SOX2 and KLF5 jointly regulate chromatin accessibility, epigenetic modifications and gene expression in ESCC28. Beyond validating the pairing of SOX2 with KLF5 in ESCC, our results address the question that inspired this project by characterizing the evolution of SOX2 action in progression to ESCC. With overexpression of Sox2 and inactivation of key tumor suppressors, a new transcriptional program is activated at newly accessible and occupied Sox2 sites, where Sox2 acts with Klf5 to promote a pro-tumorigenic program not active in the normal state.
By revealing this network of Sox2 and Klf5 binding sites activated in the progression from normal to cancer, these data raise the question of whether the ability of Sox2 to bind new sites with Klf5 may serve a physiologic function in specific contexts. One potential explanation is that SOX2/KLF5 pairing is active during injury or other acute stresses and that this process is hijacked by ESCC, consistent with a model in which tumors are ‘wounds that never heal’44. Indeed, pathways induced by SOX2 in ESCC (AP-1, IL6/JAK/STAT and Yap/TEAD) overlap with those activated in response to tissue injury and infection45. Our results resemble studies of cancers arising from hair follicle stem cells, in which KLF5 and SOX9 jointly bind enhancers normally activated by a stress response44. Similarly, in colon injury, KLF5 induction was noted as part of the compensatory response46. Additional studies of SOX2 in esophageal epithelia will be required to query nonmalignant roles for these acquired SOX2 functions.
Our study also demonstrates that SOX2 overexpression promotes ERV expression, IFN signaling, dsRNA accumulation and dependence on ADAR1. Induction of ERVs by SOX2 may be a byproduct of injury or acute stress responses, as ERVs have been posited as ‘danger signals’47 that stimulate cytoplasmic dsRNA sensors to enhance IFN responses48. Reflecting the potential role of chronic inflammation in promoting cancer, sustained ERV induction can promote chronic inflammatory conditions49. As ERV function has also been linked to self-renewal50, ERV induction by SOX2 may contribute to tumor formation beyond activation of innate immune signaling. Although we hypothesize that ERV induction contributes to tumorigenesis, other studies demonstrate anti-tumor functions of ERV stimulation51. Indeed, use of epigenetic agents to derepress ERVs and stimulate IFN induction can augment the efficacy of immune checkpoint inhibitors (ICIs)51. Therefore, SOX2’s ability to promote expression of ERVs and dsRNA may also reveal new vulnerabilities.
Loss of ADAR1 removes a checkpoint that restrains sensing of IFN-inducible dsRNA, leading to enhanced stimulation of IFNs and pro-inflammatory cytokines52. Our data demonstrate dependence upon ADAR1 in SOX2-driven ESCC models. ADAR1 is overexpressed in ESCC relative to normal esophagus, and ADAR1 levels correlate with poor survival53. We initially hypothesized that ADAR1 dependence resulted from greater baseline IFN expression with SOX2 amplification. However, the marked responsiveness of SCPP to Adar1 loss was out of proportion to the differences between CPP and SCPP in the IFN responses stimulated by dsRNA, dsDNA or viral infection. Coupled with the increased expression of ERVs and dsRNA, these data suggest that Adar1 mitigates the effects of Sox2-mediated induction of ERVs and dsRNAs. Interestingly, Ishizuka et al. also found that sites edited by Adar1 in B16 melanoma cells were enriched in 3′ UTRs of ISGs36. We confirmed greater Adar1 editing at several ERV families induced by Sox2 in SCPP compared with CPP. This proposed role for ERV induction in driving Adar1 dependence is consistent with recent data showing that epigenetic therapies trigger immunogenic dsRNA formation, leading to Adar1 dependency54. However, other Sox2 functions may also contribute to Adar1 dependence: Sox2’s induction of cytoplasmic dsRNA and dsDNA sensors may further amplify the cascade of IFN induction triggered by loss of Adar1 in the setting of ERV induction.
Results regarding ADAR1 dependence have multiple implications. First, they indicate that ADAR1 targeting may have anti-tumor effects in esophageal and other SCCs. Targeting ADAR1 has been shown to augment the efficacy of ICIs36, suggesting that ADAR1-directed therapy may be even more effective in combination with these agents. Second, the finding that oncogenic SOX2 promotes ERV expression has further implications for enhancing ICI therapy: epigenomic therapies can derepress ERVs, with resulting IFN induction and ICI potentiation55. Combining epigenetic therapies with ADAR1 inhibitors represents a promising strategy for cancer treatment54.
In summary, these results suggest not only that SOX2 amplification reflects selection for tumor cells to maintain SOX2 expression, but also that amplification promotes chromatin remodeling and marked evolution of the SOX2 cistrome. The observation that SOX2 acts jointly with other factors, notably KLF5, to engage this altered transcriptional program suggests that this acquired SOX2 program is ‘wired’ into squamous biology and hijacked by squamous cancers. Our results demonstrate that organoid models coupled to genomic engineering and epigenomic profiling enable study of how cells evolve functionally during transformation and how transcription factor action changes during tumorigenesis. Our ability to distinguish the functions of SOX2 in normal and neoplastic tissue raises the hope of identifying targets with greater dependence in SOX2+ ESCCs than in normal SOX2-expressing cells, ultimately facilitating development of therapies for ESCCs.
Methods
Generation of mouse cohorts.
All mice used in this study were housed in the animal facilities at Dana-Farber Cancer Institute (Boston, MA, USA). Rosa26R-lox-stop-lox-Sox2-IRES-GFP mice (hereafter Sox2) were generously gifted by Dr. Mark Onaitis, and H11-lox-stop-lox-Cas9 mice (hereafter Cas9) were obtained from The Jackson Laboratory (Igs2tm1.1(CAG-cas9*)Mmw/J, Stock No. 027650). Sox2 and Cas9 mice were bred to obtain a homozygous Sox2; Cas9 colony. Genotypes were confirmed with appropriate primers (Supplementary Table 7). All breeding and care procedures were approved by the Dana-Farber Animal Care and Use Committee (Protocol number: 11-009). All mice were housed in a pathogen-free environment at the DFCI animal facility with a 12-h light/12-h dark cycle, at 65–70 °F and 40–60% humidity, and procedures were carried out in strict accordance with the guidelines of the Institutional Animal Care and Use Committee of the Dana-Farber Cancer Institute.
Isolation and culture of murine esophageal organoids.
Following humane killing, the esophagus of one 6–8-week-old female mouse of each genotype was removed and opened longitudinally. The tissue was rinsed with vigorous shaking in Hank’s balanced salt solution (HBSS) supplemented with Gibco antibiotic-antimycotic and then in ice-cold PBS. After washing, the epithelial layer of the esophageal mucosa was isolated with tweezers and minced with fine scissors. Collagenase solution (1 ml, Invitrogen) was added to suspend the tissue fragments, which were incubated in a thermomixer for 10 min at 37 °C. We then incubated the epithelial fragments in 1 ml of 0.25% trypsin-EDTA at 37 °C for 10 min, briefly vortexed, rinsed the tissue with 8 ml of soybean trypsin inhibitor and repeated the trypsinization and wash cycle once more. When single epithelial units were observed separating from the larger tissue fragments under the dissection microscope, we filtered the dissociated tissue through a 40-μm cell strainer (BD) and washed the strainer with 9 ml of washing medium (penicillin (100 U ml−1), streptomycin (0.1 mg ml−1), l-glutamine (2 mM) and FBS (10%) in DMEM/Nutrient Mixture F-12 (DMEM/F12) (Invitrogen) with HEPES). We centrifuged the cells at 200g for 5 min before resuspending them in membrane matrix (Corning Matrigel, 20 μl per well). After solidification, organoid medium was applied (advanced DMEM/F12 medium supplemented with Glutamax, 0.15 mM HEPES, N2 Supplement, B27 Supplement, 1 μM N-acetylcysteine, 50 ng ml−1 human epidermal growth factor (EGF) and 3% conditioned medium from L-WRN cells containing Wnt3a, Noggin and R-spondin). Epithelial organoids were maintained through successive passages (>30 passages). For passaging, organoids were dissociated by incubation in 0.25% trypsin at 37 °C for 10 min; trypsinization disrupted the spherical organoids into cell aggregates, which were then embedded in fresh Matrigel.
Subcutaneous and orthotopic implantation.
Before flank or orthotopic transplantation, organoids were collected and dissociated to approximately single cells using trypsin (Life Technologies). For subcutaneous implantation in organoid tumor formation experiments and KYSE70 cancer cell line shRNA experiments, approximately 2.5 × 10⁶ cells in 200 μl of mixture (1:1 Matrigel/medium) were injected subcutaneously into the flank of nude mice (6–8 weeks old, female, Nu/Nu; Jackson Laboratory; 3–5 mice per group). For SCPP organoid shKLF5 tumor formation experiments, approximately 2.5 × 10⁶ cells in 200 μl of mixture (1:1 Matrigel/medium) were injected subcutaneously into the flank of NSG mice (6–8 weeks old, female, NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ; Jackson Laboratory; seven mice per group). For orthotopic implantation, female nude mice (6–8 weeks old, Nu/Nu; Jackson Laboratory) were anesthetized with isoflurane, and an incision was made along the left upper abdominal pararectal line through the peritoneum. The forestomach and stomach were carefully exposed. Resuspended cells (1 × 10⁶) in a 50-μl mixture of Matrigel and medium (1:1) were slowly injected into the squamous forestomach wall, creating a single bubble of cells beneath the outer layer of the forestomach. The abdominal wall and skin were closed with 5–0 Ethicon sutures (ET-661G). Buprenex was administered every 12 h for the first 48 h after surgery to relieve intra- and postoperative pain.
Animal treatment studies.
After subcutaneous implantation of cells or organoids, mice were examined every 5–7 d, and tumor length and width were measured using calipers. Tumor volume was calculated as (length × width²) × 0.5. For doxycycline treatment, mice were fed normal food or doxycycline food throughout the experiment. All animal experiments were conducted in accordance with procedures approved by the institutional Animal Care and Use Committee at the Dana-Farber Cancer Institute, in compliance with National Institutes of Health guidelines. At killing, portions of tumors were snap-frozen and stored in liquid nitrogen or fixed in 10% buffered formalin for routine histopathologic processing.
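The tumor volume formula, (length × width²) × 0.5, is a modified ellipsoid approximation. A minimal Python sketch follows, assuming caliper readings in millimetres with length taken as the longer axis (neither convention is stated in the text); the function name is illustrative only.

```python
def tumor_volume(length_mm: float, width_mm: float) -> float:
    """Modified ellipsoid volume: (length x width^2) x 0.5, in mm^3."""
    if length_mm < 0 or width_mm < 0:
        raise ValueError("caliper measurements must be non-negative")
    return length_mm * width_mm ** 2 * 0.5

# Example: a hypothetical 10 mm x 6 mm flank tumor
print(tumor_volume(10.0, 6.0))  # 180.0
```

Squaring the shorter axis treats the unmeasured depth as equal to the width, which is why width rather than length is squared.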
ChIP–seq and analysis.
ChIP–seq assays were performed as previously described56. Briefly, cells were crosslinked with 1% formaldehyde and lysed. The chromatin extract was sonicated with a Diagenode Bioruptor and immunoprecipitated with antibodies coincubated with Dynabeads Protein A and Protein G (Thermo Scientific). Antibodies used were SOX2 (6 μg per chromatin immunoprecipitation (ChIP); R&D AF2018), H3K27ac (2 μg per ChIP; Abcam, ab4729), KLF5 (4 μg per ChIP; Abcam, ab137676) and H3K4me1 (2 μg per ChIP; Abcam, ab8895). Sequencing libraries were prepared using the NEB ChIP–seq library prep kit (NEB, E6200L) and sequenced on an Illumina NextSeq instrument (75-base pair (bp) single-end reads). Sequencing reads were aligned to the hg19 human genome or the mm9 mouse genome. We used ChiLin pipeline 2.0.0 (ref. 57) for ChIP–seq quality control and preprocessing. We used Burrows–Wheeler Aligner (BWA)58 to align sequencing reads and Model-based Analysis of ChIP-Seq (MACS2)59,60 to call narrow peaks (false discovery rate cut-off = 0.01). We applied DESeq2 to identify differential binding sites between normal and SCPP, normal and SC, and CPP and SCPP organoids (two biological replicates per condition), defining gained and lost peaks at an adjusted P value cut-off of 0.05. We applied Hypergeometric Optimization of Motif EnRichment (HOMER) analysis61 to the differential binding sites to identify enriched transcription factor motifs. For heatmaps, we merged the biological replicates for each condition and used Deeptools62. We measured sample–sample correlation based on ChIP–seq signal (reads per kilobase per million mapped reads, rpkm) across all detected peaks with rpkm > 1.
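The sample–sample correlation step can be sketched as below: convert per-peak read counts to rpkm, keep peaks above the rpkm threshold, then correlate samples. This is a hypothetical illustration, not the ChiLin or Deeptools implementation; it assumes Pearson correlation and that a peak is kept when it exceeds rpkm 1 in at least one sample, neither of which the text specifies.

```python
import math

def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of peak per million mapped reads."""
    return count / (length_bp / 1e3) / (total_mapped_reads / 1e6)

def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-constant inputs)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(counts, lengths_bp, library_sizes, min_rpkm=1.0):
    """counts: one list of per-sample read counts per peak.

    Peaks are retained if rpkm > min_rpkm in any sample (assumed rule).
    """
    n = len(library_sizes)
    table = [[rpkm(row[j], length, library_sizes[j]) for j in range(n)]
             for row, length in zip(counts, lengths_bp)]
    kept = [row for row in table if any(v > min_rpkm for v in row)]
    columns = [[row[j] for row in kept] for j in range(n)]
    return [[pearson(columns[i], columns[j]) for j in range(n)] for i in range(n)]

# Toy data: sample 2 has twice the library size of sample 1.
counts = [[100, 200], [10, 20], [0, 1]]
print(correlation_matrix(counts, [1000, 1000, 1000], [1e6, 2e6]))
```

Normalizing by library size first is what makes the two toy samples perfectly correlated despite their different depths.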
SE identification.
For each organoid/cell type, H3K27ac ChIP–seq data from replicate experiments were merged into one dataset. Based on the merged ChIP–seq results, including the aligned reads and MACS2 peaks, we called putative SEs using the Rank Ordering of Super-Enhancers (ROSE) pipeline30,63,64, which ranks each stitched enhancer and assigns enhancer or super-enhancer status. SEs with any overlap (≥1 bp) with a Sox2 ChIP–seq peak were annotated as associated with Sox2 binding.
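The ≥1-bp overlap rule used to annotate Sox2-associated SEs can be illustrated with a short sketch. It assumes half-open BED coordinates (chrom, start, end) and a naive pairwise scan; ROSE's stitching and ranking are not reimplemented here, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open BED intervals (chrom, start, end) share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def annotate_sox2_bound(super_enhancers, sox2_peaks):
    """Flag each SE that overlaps any Sox2 peak (naive O(n*m) scan)."""
    return [any(overlaps(se, pk) for pk in sox2_peaks) for se in super_enhancers]

# Toy intervals, not study data.
ses = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
peaks = [("chr1", 199, 250), ("chr2", 200, 300)]
print(annotate_sox2_bound(ses, peaks))  # [True, False, False]
```

In half-open coordinates an SE ending at 200 and a peak starting at 200 only abut, so the third SE is not counted; genome-scale data would use an indexed tool such as BEDTools instead of this quadratic scan.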
BETA analysis to combine ChIP–seq and RNA-seq results.
BETA29 was used to predict whether Sox2 has activating or repressive function by integrating ChIP–seq and RNA-seq results, as previously described29. Briefly, BETA estimates a Sox2 regulatory potential score for each gene based on the number of Sox2 binding sites within ±50 kb of the gene’s TSS and the distance of each site from the TSS. BETA then uses a nonparametric Kolmogorov–Smirnov test to compare regulatory potential scores among genes that are upregulated, downregulated or unchanged in RNA-seq comparisons with and without Sox2 overexpression.
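A sketch of the two BETA steps follows. It assumes the exponential distance-decay score described in the BETA protocol, in which each site contributes exp(−(0.5 + 4Δ)) with Δ taken here as distance/100 kb, and it computes only a one-sided Kolmogorov–Smirnov statistic; a P value could be obtained from scipy.stats.ks_2samp. This is an illustration under those assumptions, not BETA's code.

```python
import math

def regulatory_potential(summits, tss, window_bp=50_000, scale_bp=100_000):
    """BETA-style regulatory potential of one gene (assumed formula).

    summits: Sox2 peak summit positions on the TSS's chromosome. Each summit
    within +/- window_bp adds exp(-(0.5 + 4 * delta)), delta = distance / scale_bp.
    """
    score = 0.0
    for s in summits:
        d = abs(s - tss)
        if d <= window_bp:
            score += math.exp(-(0.5 + 4.0 * d / scale_bp))
    return score

def ecdf(sample, x):
    """Fraction of values in sample that are <= x."""
    return sum(v <= x for v in sample) / len(sample)

def ks_shift_statistic(target_scores, background_scores):
    """One-sided KS statistic: max over x of ECDF_background(x) - ECDF_target(x).

    Large values mean target genes (e.g. upregulated with Sox2) tend to have
    higher regulatory potential than background (unchanged) genes.
    """
    pooled = sorted(set(target_scores) | set(background_scores))
    return max(ecdf(background_scores, x) - ecdf(target_scores, x) for x in pooled)

# Hypothetical toy genes with TSS at position 0.
up = [regulatory_potential([1_000, 20_000], 0), regulatory_potential([500], 0)]
unchanged = [regulatory_potential([45_000], 0), regulatory_potential([], 0)]
print(ks_shift_statistic(up, unchanged))  # 1.0
```

The same comparison run with downregulated genes as the target set would indicate repressive function if those genes show the larger shift.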
ATAC–seq.
Approximately 50,000 cells were resuspended in 1 ml of cold ATAC–seq resuspension buffer (RSB; 10 mM Tris-HCl pH 7.4, 10 mM NaCl and 3 mM MgCl2 in water). Cells were centrifuged at 17 × 10³g for 5 min in a prechilled (4 °C) fixed-angle centrifuge, and the supernatant was carefully aspirated. Cell pellets were then resuspended in 50 μl of ATAC–seq RSB containing 0.1% Nonidet P40, 0.1% Tween-20 and 0.01% digitonin by pipetting up and down three times, and this lysis reaction was incubated on ice for 3 min. After lysis, 1 ml of ATAC–seq RSB containing 0.1% Tween-20 (without Nonidet P40 or digitonin) was added, and the tubes were inverted to mix. Nuclei were then centrifuged for 5 min at 17 × 10³g in a prechilled (4 °C) fixed-angle centrifuge. The supernatant was removed, and nuclei were resuspended by pipetting up and down six times in 50 μl of transposition mix65 containing 25 μl of 2× TD buffer, 2.5 μl of transposase (100 nM final), 16.5 μl of PBS, 0.5 μl of 1% digitonin, 0.5 μl of 10% Tween-20 and 5 μl of water. Transposition reactions were incubated at 37 °C for 30 min in a thermomixer with shaking at 1,000 r.p.m. and cleaned up with Qiagen columns. Libraries were amplified as described previously66. Motif search was performed with HOMER2 (ref. 61). The sample–sample correlation heatmap was generated with the R package ‘ggplot2’ using all detected peaks, with an rpkm threshold of 1 applied to all samples.
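The final detergent concentrations in the transposition mix follow from C1V1 = C2V2. The sketch below assumes, as in the published Omni-ATAC recipe, that 25 μl of 2× TD buffer brings the listed components to 50 μl; the dictionary and function names are illustrative.

```python
# Transposition mix volumes in microlitres; the 2x TD buffer entry is assumed
# from the Omni-ATAC recipe so that the components total 50 ul.
MIX_UL = {
    "2x TD buffer": 25.0,
    "transposase": 2.5,
    "PBS": 16.5,
    "1% digitonin": 0.5,
    "10% Tween-20": 0.5,
    "water": 5.0,
}

def final_percent(stock_percent, volume_ul, total_ul):
    """Solve C1 * V1 = C2 * V2 for the final concentration C2."""
    return stock_percent * volume_ul / total_ul

total = sum(MIX_UL.values())
print(total)  # 50.0
print(final_percent(1.0, MIX_UL["1% digitonin"], total))   # 0.01
print(final_percent(10.0, MIX_UL["10% Tween-20"], total))  # 0.1
```

The resulting 0.01% digitonin and 0.1% Tween-20 match the detergent levels used in the lysis buffer.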
mRNA-seq analysis.
RNA was extracted using the Qiagen RNeasy kit with on-column DNase I treatment. RNA-seq libraries were prepared using the NEBNext Ultra Directional RNA Library Prep Kit (NEB, E7420S) and sequenced on an Illumina NextSeq instrument (75-bp single-end reads for baseline organoid mRNA-seq and 150-bp paired-end reads for SCPP shKlf5 mRNA-seq). Read alignment, quality control and data analysis were performed using Visualization Pipeline for RNA-seq (VIPER)67. Sequencing reads were aligned with STAR68, and read counts for each gene were generated by Cufflinks69. We used DESeq2 for downstream differential expression analysis70 and performed Gene Ontology term analysis and GSEA on the differentially expressed gene sets71.
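DESeq2 normalizes library depth with median-of-ratios size factors. The following re-implementation is for illustration only and is not how VIPER or DESeq2 is invoked; it shows the idea on a toy gene × sample count table, skipping genes with a zero count in any sample because their geometric mean is zero.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (DESeq2-style sketch).

    counts: list of genes, each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    usable, log_geo_means = [], []
    for row in counts:
        if all(c > 0 for c in row):  # zero counts give a zero geometric mean
            usable.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
    factors = []
    for j in range(n_samples):
        # ratio of each gene's count to its pseudo-reference, on the log scale
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy table: sample 2 is sequenced twice as deeply as sample 1.
print(size_factors([[10, 20], [100, 200], [5, 10], [0, 7]]))
```

Using the median ratio rather than total counts makes the factors robust to a few highly expressed or strongly regulated genes.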
Total RNA isolation, dsRNA enrichment and qPCR.
RNA was extracted 2 d after plating or transfection. Total RNA was extracted from whole-cell lysates using the QIAGEN RNeasy kit with on-column DNase I treatment (Qiagen). cDNA was reverse transcribed from 1 μg of RNA using the Invitrogen reverse transcription kit according to the manufacturer’s instructions. For dsRNA enrichment, RNA was first treated for 30 min with 50 μg ml−1 RNase A (Qiagen) at high salt concentration (0.35 M NaCl) to prevent dsRNA degradation. RNase A was then removed by ethanol precipitation, and the product was resuspended in sterile water. Gene-specific primers for SYBR Green real-time PCR were either obtained from previously published sequences or designed with PrimerBlast (https://www.ncbi.nlm.nih.gov/tools/primer-blast/); primer sequences are listed in Supplementary Table 7. mRNA expression levels were quantified in technical triplicate on a StepOnePlus Cycler (Applied Biosystems) with 10 ng of cDNA in a 20-μl reaction volume using ThermoFisher’s Power SYBR Green Master Mix. Relative mRNA expression was determined by normalizing to GAPDH or beta-ACTIN expression as an internal control.
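The text states that expression was normalized to GAPDH or beta-ACTIN but does not name the calculation. One common choice is the 2^−ΔΔCt method, sketched here under the assumption of roughly 100% amplification efficiency for both primer pairs; the Ct values in the example are hypothetical.

```python
def mean_ct(replicates):
    """Average Ct across technical replicates."""
    return sum(replicates) / len(replicates)

def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene versus a control sample by 2^-ddCt.

    ct_ref / ct_ref_ctrl are Ct values of the internal control gene
    (e.g. GAPDH or ACTB) in the test and control samples.
    """
    d_ct_sample = ct_target - ct_ref            # normalize to reference gene
    d_ct_control = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_sample - d_ct_control          # compare with control sample
    return 2.0 ** (-dd_ct)

# Hypothetical Cts: after normalization the target crosses threshold 2 cycles earlier.
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

Each cycle earlier corresponds to a doubling under the efficiency assumption, so a ΔΔCt of −2 gives a fourfold increase.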
Ribo-depleted RNA-seq.
RNA isolation was performed using the RNeasy Mini Kit (Qiagen) with the standard protocol. RNA quality was evaluated with a Bioanalyzer (Agilent Technologies). RNA samples were then fragmented at 94 °C for 8 min according to the manufacturer’s recommendations. The sequencing library was prepared from 100 ng of fragmented RNA using the KAPA RNA HyperPrep Kit with RiboErase (Roche). Library quantity and quality were assessed with a Qubit fluorometer and an Agilent TapeStation 2200. Multiple libraries were then pooled and sequenced on an Illumina NovaSeq in 2 × 50-bp paired-end mode at the Dana-Farber Cancer Institute Molecular Biology Core Facilities.
Ribo-depleted RNA-seq gene expression analysis.
Residual rRNA reads were removed from raw Fastq files with SortMeRNA (https://bioinfo.lifl.fr/RNA/sortmerna/)72. FastQC (Babraham Institute) was then run on the filtered Fastq files to check for sample outliers. Reads were aligned to the mouse reference genome mm9 using STAR (v.020201) with standard Encyclopedia of DNA Elements (ENCODE) parameters68, and transcript quantification was performed using RNA-Seq by Expectation-Maximization (RSEM v.1.2.31)73.
Mouse tumor whole exome sequencing.
Genomic DNA was extracted using the QIAamp DNA Mini Kit, quantified with the Qubit 2.0 DNA HS Assay (ThermoFisher) and assessed for quality with the Tapestation Genomic DNA Assay (Agilent Technologies). Library construction was performed using the Agilent SureSelectXT Mouse All Exon kit (Agilent Technologies) according to the manufacturer’s recommendations. Library quantity and quality were assessed with the Qubit 2.0 DNA HS Assay, the Tapestation High Sensitivity D1000 Assay (Agilent Technologies) and a QuantStudio 5 System (ThermoFisher). Multiple libraries were then pooled and sequenced on an Illumina NovaSeq (Illumina) in 2 × 150-bp paired-end mode, targeting 80 million reads per sample (~120X coverage). SNPs and small insertions and deletions (InDels) in each sample were called with the Genome Analysis Toolkit (GATK)74 and annotated with ANNOVAR75. Additional filters were then applied to retain only nonsynonymous exonic variants or splicing variants in genes included in the Cancer Gene Census Tier 1 list (https://cancer.sanger.ac.uk/cosmic). Germline information for these variants was annotated using the Mouse Genome Informatics database (http://www.informatics.jax.org/).
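The variant filter can be sketched over annotated rows as follows. The column names mirror ANNOVAR’s refGene-based output (Func.refGene, ExonicFunc.refGene, Gene.refGene) but are an assumption here, as is the set of consequence labels treated as nonsynonymous. The census set is a placeholder; mapping mouse genes to the human Cancer Gene Census is not shown.

```python
# Consequence labels assumed to count as "nonsynonymous" for this sketch.
NONSYNONYMOUS = {
    "nonsynonymous SNV", "stopgain", "stoploss",
    "frameshift insertion", "frameshift deletion",
    "nonframeshift insertion", "nonframeshift deletion",
}

def keep_variant(row, census_genes):
    """True for nonsynonymous exonic or splicing variants in a census gene."""
    funcs = row["Func.refGene"].split(";")          # e.g. "exonic;splicing"
    genes = set(row["Gene.refGene"].split(";"))
    if not genes & census_genes:
        return False
    if "splicing" in funcs:
        return True
    return "exonic" in funcs and row["ExonicFunc.refGene"] in NONSYNONYMOUS

def filter_variants(rows, census_genes):
    return [r for r in rows if keep_variant(r, census_genes)]

# Toy rows; gene symbols are illustrative only.
census = {"Trp53", "Kras"}
rows = [
    {"Func.refGene": "exonic", "ExonicFunc.refGene": "nonsynonymous SNV", "Gene.refGene": "Trp53"},
    {"Func.refGene": "exonic", "ExonicFunc.refGene": "synonymous SNV", "Gene.refGene": "Trp53"},
    {"Func.refGene": "splicing", "ExonicFunc.refGene": ".", "Gene.refGene": "Kras"},
    {"Func.refGene": "exonic", "ExonicFunc.refGene": "stopgain", "Gene.refGene": "Ttn"},
]
print(len(filter_variants(rows, census)))  # 2
```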
Dependency analysis.
We analyzed dependency (Achilles_gene_effect.csv) and copy number (CCLE_gene_cn.csv) data downloaded from the Cancer Dependency Map’s 19Q3 release (DepMap, Broad (2019): DepMap Achilles 19Q1 Public. figshare. Fileset. doi: 10.6084/m9.figshare.7655150)76,77. To identify dependencies enriched in SOX2-amplified cell lines (Fig. 5a), we used linear model analysis. Specifically, we used the R package limma78 to compute effect sizes (mean difference between cell line groups) for each gene dependency, and P values were based on empirical Bayes moderated t-statistics. SOX2-amplified lines are defined as those with copy number > 1, with copy number defined as log2(copy ratio), and cell line annotations are shown in Supplementary Table 8.
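The comparison above uses limma’s empirical Bayes moderated t-statistics, which borrow variance information across genes. As a simpler stand-in, the sketch below computes the same effect size (mean difference between cell line groups) with an ordinary Welch t statistic for a single gene; it is not limma, and the input dictionaries are hypothetical.

```python
from statistics import mean, variance

def split_by_amplification(copy_number, threshold=1.0):
    """Partition cell lines by SOX2 copy number (log2 scale); amplified if > threshold."""
    amplified = {line for line, cn in copy_number.items() if cn > threshold}
    return amplified, set(copy_number) - amplified

def dependency_difference(gene_effect, amplified, other):
    """Mean difference and Welch t statistic for one gene's dependency scores.

    gene_effect maps cell line -> dependency score; more negative means the gene
    is more essential, so a negative difference marks a dependency enriched in
    amplified lines. Requires at least two lines per group.
    """
    a = [gene_effect[c] for c in amplified if c in gene_effect]
    b = [gene_effect[c] for c in other if c in gene_effect]
    diff = mean(a) - mean(b)
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return diff, diff / se

# Toy values, not DepMap data.
cn = {"A": 1.5, "B": 1.2, "C": 0.1, "D": -0.2}
effect = {"A": -1.0, "B": -0.8, "C": 0.0, "D": -0.2}
amp, rest = split_by_amplification(cn)
print(dependency_difference(effect, amp, rest))
```

Repeating this for every gene and ranking by effect size gives a crude analogue of the genome-wide screen; limma is preferable with few cell lines because per-gene variances are then unstable.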
dsDNA or dsRNA stimulation.
Cells (2 × 10⁵ to 5 × 10⁵) were plated in six-well plates and transfected using X-tremeGENE HP DNA Transfection Reagent (Roche, Cat. no. 06366236001) with the indicated amount of poly(dA:dT) (InvivoGen, Cat. no. tlrl-patn), poly(I:C) (InvivoGen, Cat. no. tlrl-pic), 5′ppp-dsRNA (InvivoGen, Cat. no. tlrl-3prna) or 5′ppp-dsRNA Control (InvivoGen, Cat. no. tlrl-3prnac).
H1N1 virus stimulation.
Cells (2 × 10⁵ to 5 × 10⁵) were plated in six-well plates, cultured overnight and infected with GFP-labeled influenza A/PR8/34 (H1N1) virus at a multiplicity of infection of 0.01 in medium containing 2 μg ml−1 TPCK-trypsin. Cells were cultured for 24 h before total RNA was extracted.
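The multiplicity of infection fixes the amount of virus per well as MOI × number of cells. The sketch below converts that into a volume of stock, assuming a titer expressed in PFU per ml; the 1 × 10⁷ PFU ml−1 stock is a hypothetical value, since no titer is given in the text.

```python
def inoculum_volume_ul(n_cells, moi, titer_pfu_per_ml):
    """Microlitres of virus stock so that PFU delivered = MOI x number of cells."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    pfu_needed = moi * n_cells
    return pfu_needed / titer_pfu_per_ml * 1000.0

# Lower and upper plating densities from the text; stock titer is hypothetical.
for cells in (2e5, 5e5):
    print(cells, round(inoculum_volume_ul(cells, 0.01, 1e7), 3))
```

At MOI 0.01, 2 × 10⁵ cells need 2,000 PFU, so such small volumes are usually pipetted from a diluted working stock.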
Epigenomic analyses at ERV elements.
To compare ChIP–seq and ATAC–seq signals at ERV elements within gained Sox2 binding sites among different organoids, replicate bigwig files were merged and overlapped with ERV elements harboring gained Sox2 binding sites using bigWigAverageOverBed (http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64.v369/bigWigAverageOverBed). The calculated mean0 signal values were used for plotting.
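bigWigAverageOverBed reports, for each region, its size, the number of covered bases, the summed signal, mean0 (sum divided by region size, with uncovered bases counted as zero) and mean (sum divided by covered bases). This small sketch mimics the two averages for a single region from a hypothetical per-base signal dictionary; it does not read bigWig files.

```python
def bigwig_means(signal, start, end):
    """Return (mean0, mean) for one half-open region [start, end).

    signal: {position: value} for covered bases only; uncovered bases are absent.
    """
    size = end - start
    covered = [signal[p] for p in range(start, end) if p in signal]
    total = sum(covered)
    mean0 = total / size if size else 0.0          # uncovered bases count as 0
    mean = total / len(covered) if covered else 0.0
    return mean0, mean

print(bigwig_means({0: 2.0, 1: 4.0}, 0, 4))  # (1.5, 3.0)
```

Using mean0 keeps sparsely covered ERV elements from appearing as strongly as fully covered ones, which suits comparisons across organoids.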
RNA-seq analyses for ERV expression.
Mouse ERV expression was analyzed with rRNA-depleted RNA-seq reads. Residual rRNA reads were first removed with SortMeRNA (https://bioinfo.lifl.fr/RNA/sortmerna/)72. Filtered reads were aligned to the mouse reference genome mm9 using STAR with parameters modified as recommended for multi-mapping reads (--outFilterMultimapNmax 100 --winAnchorMultimapNmax 100)68. Mapped reads were assigned to annotated mouse transposable elements (GTF file downloaded from https://github.com/mhammell-laboratory/TEtranscripts) using TEcount from the TEtranscripts package79,80. Differentially expressed mouse transposable elements between organoids were identified with DESeq2 (ref. 70) using a false discovery rate cut-off of 0.1 and a log2 fold change cut-off of 0.5. Results were plotted using the ‘EnhancedVolcano’ R package v.1.01.
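The thresholds above can be applied to a DESeq2 results table as in this sketch. The keys follow DESeq2’s results() column names (padj, log2FoldChange); treating the 0.5 cut-off as an absolute value and the FDR cut-off as strict are assumptions, and the element names are placeholders.

```python
def significant_tes(results, fdr=0.1, min_abs_lfc=0.5):
    """Split transposable elements into up- and downregulated lists."""
    up, down = [], []
    for r in results:
        padj = r.get("padj")
        if padj is None or padj >= fdr:   # DESeq2 NA adjusted P values are skipped
            continue
        lfc = r["log2FoldChange"]
        if lfc >= min_abs_lfc:
            up.append(r["name"])
        elif lfc <= -min_abs_lfc:
            down.append(r["name"])
    return up, down

# Placeholder rows, not study results.
res = [
    {"name": "TE1", "padj": 0.01, "log2FoldChange": 1.2},
    {"name": "TE2", "padj": 0.05, "log2FoldChange": -0.7},
    {"name": "TE3", "padj": 0.2, "log2FoldChange": 2.0},
    {"name": "TE4", "padj": 0.01, "log2FoldChange": 0.3},
    {"name": "TE5", "padj": None, "log2FoldChange": 3.0},
]
print(significant_tes(res))  # (['TE1'], ['TE2'])
```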
Human ERV expression was analyzed using the ERVmap pipeline (https://www.ervmap.com/). In brief, mRNA-seq reads were aligned with BWA and annotated with the ERVmap database extracted from published studies81. DESeq2 was then performed to compute differentially expressed human ERVs between different conditions. Results were plotted using ‘EnhancedVolcano’ R package v.1.01.
Enrichment analyses of mouse ERV elements.
Genomic coordinates of mouse ERV elements were extracted from the mouse transposable element database (downloaded from https://github.com/mhammell-laboratory/TEtranscripts)79,80. The number of gained or unaffected Sox2 binding sites in ERV elements was calculated by intersecting the corresponding Sox2 peak BED files with mouse ERV genomic coordinates using BEDTools 2.28. The ratio between mouse ERVs with gained Sox2 binding sites and mouse ERVs with unaffected Sox2 binding sites was then calculated and compared against the background ratio determined across all mouse genes.
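The counting step resembles `bedtools intersect -u` followed by counting the reported features. The sketch below computes the gained-to-unaffected ratio for ERVs and divides it by the same ratio computed over genes; this exact form of the background comparison is an assumption, and the function names and intervals are hypothetical.

```python
def count_hit(features, peaks):
    """Number of features overlapping >= 1 peak by >= 1 bp (half-open coordinates)."""
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    return sum(
        1 for chrom, start, end in features
        if any(ps < end and start < pe for ps, pe in by_chrom.get(chrom, []))
    )

def gained_to_unaffected(features, gained, unaffected):
    """Ratio of features hit by gained peaks to features hit by unaffected peaks."""
    return count_hit(features, gained) / count_hit(features, unaffected)

def enrichment(ervs, genes, gained, unaffected):
    """ERV ratio relative to the gene background ratio (assumed comparison)."""
    return gained_to_unaffected(ervs, gained, unaffected) / gained_to_unaffected(genes, gained, unaffected)

# Toy intervals.
ervs = [("chr1", 0, 100), ("chr1", 200, 300), ("chr1", 400, 500)]
genes = [("chr1", 0, 1000), ("chr2", 0, 1000)]
gained = [("chr1", 50, 60), ("chr1", 250, 260)]
unaffected = [("chr1", 450, 460)]
print(enrichment(ervs, genes, gained, unaffected))  # 2.0
```

Either ratio raises ZeroDivisionError when no feature overlaps an unaffected peak, so real data would need a guard or a pseudocount.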
Analyses of SOX2-bound genes for ERV elements.
To determine whether SOX2-bound genes are enriched for genes harboring ERV elements in intron/3′ UTR regions, we took the following steps. First, we identified genes harboring ERV elements in intron/3′ UTR regions by intersecting known human ERV sites (https://www.girinst.org/repbase/) with regions annotated as introns or 3′ UTRs in the UCSC genome browser (https://genome.ucsc.edu/) using BEDTools 2.28 (https://bedtools.readthedocs.io/en/latest/). Of note, 3′ UTR regions were defined as extending from 5,000 bp upstream of each UCSC-annotated 3′ UTR start site to 5,000 bp downstream of its end site. Second, SOX2 binding within each gene of interest was determined by overlapping SOX2 ChIP–seq peaks with the gene’s genomic coordinates. Third, the ratio of SOX2-bound genes with or without intron/3′ UTR ERV elements was calculated and compared against the background, determined across all UCSC genes.
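The first step, including the ±5,000-bp extension of 3′ UTRs, can be sketched as follows. Because the extension is symmetric in genomic coordinates, strand does not change the result. The input structure (a gene mapped to (chrom, start, end, kind) tuples) and the function names are hypothetical.

```python
def extend_utr(start, end, flank=5_000):
    """Extend a 3' UTR by `flank` bp on both sides (genomic coordinates, clamped at 0)."""
    return max(0, start - flank), end + flank

def genes_with_erv(gene_regions, ervs):
    """Genes with an ERV overlapping an intron or an extended 3' UTR.

    gene_regions: {gene: [(chrom, start, end, kind)]}, kind is 'intron' or 'utr3'.
    ervs: [(chrom, start, end)] in half-open coordinates.
    """
    hits = set()
    for gene, regions in gene_regions.items():
        for chrom, start, end, kind in regions:
            if kind == "utr3":
                start, end = extend_utr(start, end)
            if any(c == chrom and es < end and start < ee for c, es, ee in ervs):
                hits.add(gene)
                break
    return hits

# Toy annotation.
regions = {
    "G1": [("chr1", 10_000, 11_000, "utr3")],
    "G2": [("chr1", 50_000, 51_000, "intron")],
    "G3": [("chr2", 100, 200, "utr3")],
}
ervs = [("chr1", 15_500, 15_600), ("chr1", 60_000, 60_100), ("chr2", 5_100, 5_200)]
print(sorted(genes_with_erv(regions, ervs)))  # ['G1', 'G3']
```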
ADAR1 editing index analysis.
The editing analysis was performed using the RNA-seq data. Read quality was assessed using FastQC v.0.11.5, and duplicates were marked using Picard 2.25.0. Reads were aligned to the mm10 genome using STAR 2.6.1b with default parameters, except that --outFilterMultimapNmax and --winAnchorMultimapNmax were set to 50 and 100, respectively. As described previously, TEtranscripts was used to generate a list of differentially expressed transposable element families. BED files for each of these families were generated with BEDTools v.2.27.1 from a GTF file covering the complete mouse transposable element database (downloaded from https://github.com/mhammell-laboratory/TEtranscripts). An A-to-I editing index was then computed for each family using a previously published method, the RNAEditingIndexer82. In short, this method divides the number of A-to-G mismatches by the total coverage of all ‘A’s in a set of predefined genomic regions, providing a normalized measure of total editing activity. A Docker image of the software was obtained (https://github.com/a2iEditing/RNAEditingIndexer) and used to assess each biological replicate across the genomic coordinates of every differentially expressed transposable element family. Default parameters were used except for --snps, and the --regions parameter was used to supply the BED file for each transposable element family.
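The editing index definition, A-to-G mismatches divided by total coverage of reference ‘A’ positions, can be illustrated on a toy pileup. This is not the RNAEditingIndexer code; the sketch assumes strand handling (reading T-to-C on minus-strand transcripts as A-to-G) has already been done upstream.

```python
def editing_index(pileup):
    """A-to-I editing index over a set of positions.

    pileup: iterable of (ref_base, base_counts), where base_counts maps the read
    base at that position to its count. Only reference 'A' positions contribute.
    """
    a_to_g = 0
    a_coverage = 0
    for ref, counts in pileup:
        if ref != "A":
            continue
        a_to_g += counts.get("G", 0)        # inosine is read as G
        a_coverage += sum(counts.values())  # all reads covering this A
    return a_to_g / a_coverage if a_coverage else 0.0

# Toy pileup for one transposable element family.
toy = [("A", {"A": 90, "G": 10}), ("C", {"C": 50}), ("A", {"A": 100})]
print(editing_index(toy))  # 0.05
```

Pooling mismatches and coverage across all positions before dividing weights each position by its depth, giving a family-level summary rather than an average of per-site editing levels.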
Statistical analysis.
Data are presented as mean ± s.d. or s.e.m., as indicated in the figure legends. For each experiment, the number of independent biological experiments is noted in the figure legends, and representative images are shown from replicates with similar results. Statistical analysis was performed using Microsoft Office statistical tools or Prism for macOS v.9.0.1 (128) (GraphPad). Pairwise comparisons between groups (experimental versus control) were performed using an unpaired two-tailed Student’s t-test, one-way analysis of variance (ANOVA) or two-way ANOVA, as appropriate. P < 0.05 was considered statistically significant. P values are denoted by *P < 0.05, **P < 0.01, ***P < 0.001 and ****P < 0.0001; exact P values were calculated with Prism, in which the smallest reported value is P < 0.0001. For all experiments, the variance between comparison groups was found to be equivalent.
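For reference, the unpaired Student’s t statistic with pooled variance and the asterisk convention above can be written as follows; two-tailed P values would come from a t distribution, for example via scipy.stats.ttest_ind, whose default assumes equal variances. The “ns” label for P ≥ 0.05 is an assumption not stated in the text.

```python
from statistics import mean, variance

def student_t(a, b):
    """Unpaired Student's t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def significance_label(p):
    """Asterisk convention from the figure legends; 'ns' is an assumed label."""
    for cutoff, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < cutoff:
            return label
    return "ns"

print(student_t([1, 2, 3], [4, 5, 6]))
print(significance_label(0.003))  # **
```

Pooling the variances is only appropriate when group variances are similar, which matches the statement that variances were found to be equivalent.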
Acknowledgements
We thank A. Iwasaki and M. Tokuyama for help with the ERV transcriptome analysis. We thank members of the Bass laboratory, S. Kitajima, S. Gu, X. Wang and H. Singh, for insightful discussions. We thank D.E. Ingber for kindly sharing the influenza A/WSN/33 (H1N1) virus strain and influenza A/PR8/34 (H1N1) virus strain, M. Berkeley and the MBCF Core for assistance with RNA sequencing, and the Molecular Pathology Core of P01 CA098101 for assistance with immunohistochemistry staining. The research was supported by the Twomey Family Fellowship in Esophageal Cancer Research (J.Z.). A.J.B. was supported by NIH grants no. R01 CA196932 and no. R01 CA187119. A.K.R., K.-K.W. and A.J.B. were supported by NIH grant no. P01 CA098101. X.Z. was supported by NIH grant no. R00 CA215244.
Footnotes
Competing interests
A.J.B. receives research funding from Bayer, Merck and Novartis; is a consultant to Earli and HelixNano; and is a cofounder of Signet Therapeutics. K.-K.W. is a founder and equity holder of G1 Therapeutics and has consulting/sponsored research agreements with MedImmune, Takeda, TargImmune, BMS, AstraZeneca, Janssen, Pfizer, Novartis, Merck, Ono and Array. The remaining authors declare no competing interests.
Extended data is available for this paper at https://doi.org/10.1038/s41588-021-00859-2.
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41588-021-00859-2.
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Code availability
All software and bioinformatic tools used in the present study are publicly available.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-021-00859-2.
Data availability
Further information and requests for resources and reagents should be directed to and will be fulfilled by the corresponding author. ChIP–seq, ATAC–seq, exome and RNA-seq data generated in this study were deposited to Gene Expression Omnibus (GEO) under the series GSE167367. Source data are provided with this paper.
References
- 1. Bass AJ et al. SOX2 is an amplified lineage-survival oncogene in lung and esophageal squamous cell carcinomas. Nat. Genet. 41, 1238–1242 (2009).
- 2. Campbell JD et al. Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas. Nat. Genet. 48, 607–616 (2016).
- 3. Taylor AM et al. Genomic and functional approaches to understanding cancer aneuploidy. Cancer Cell 33, 676–689.e3 (2018).
- 4. Campbell JD et al. Genomic, pathway network, and immunologic features distinguishing squamous carcinomas. Cell Rep. 23, 194–212.e6 (2018).
- 5. Dotto GP & Rustgi AK. Squamous cell cancers: a unified perspective on biology and genetics. Cancer Cell 29, 622–637 (2016).
- 6. Cancer Genome Atlas Research Network et al. Integrated genomic characterization of oesophageal carcinoma. Nature 541, 169–175 (2017).
- 7. Yu J et al. Induced pluripotent stem cell lines derived from human somatic cells. Science 318, 1917–1920 (2007).
- 8. Wernig M et al. In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448, 318–324 (2007).
- 9. Avilion AA et al. Multipotent cell lineages in early mouse development depend on SOX2 function. Genes Dev. 17, 126–140 (2003).
- 10. Takahashi K & Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006).
- 11. Masui S et al. Pluripotency governed by Sox2 via regulation of Oct3/4 expression in mouse embryonic stem cells. Nat. Cell Biol. 9, 625–635 (2007).
- 12. Que J et al. Multiple dose-dependent roles for Sox2 in the patterning and differentiation of anterior foregut endoderm. Development 134, 2521–2531 (2007).
- 13. Yuan P et al. Sex determining region Y-Box 2 (SOX2) is a potential cell-lineage gene highly expressed in the pathogenesis of squamous cell carcinomas of the lung. PLoS ONE 5, e9112 (2010).
- 14. Watanabe H et al. SOX2 and p63 colocalize at genetic loci in squamous cell carcinomas. J. Clin. Invest. 124, 1636–1645 (2014).
- 15. Garraway LA & Sellers WR. Lineage dependency and lineage-survival oncogenes in human cancer. Nat. Rev. Cancer 6, 593–602 (2006).
- 16. Sulahian R et al. An integrative analysis reveals functional targets of GATA6 transcriptional regulation in gastric cancer. Oncogene 33, 5637–5648 (2014).
- 17. Salari K et al. CDX2 is an amplified lineage-survival oncogene in colorectal cancer. Proc. Natl Acad. Sci. USA 109, E3196–E3205 (2012).
- 18. Garraway LA et al. Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma. Nature 436, 117–122 (2005).
- 19. Adler EK et al. The PAX8 cistrome in epithelial ovarian cancer. Oncotarget 8, 108316–108332 (2017).
- 20. Boumahdi S et al. SOX2 controls tumour initiation and cancer stem-cell functions in squamous-cell carcinoma. Nature 511, 246–250 (2014).
- 21. Siegle JM et al. SOX2 is a cancer-specific regulator of tumour initiating potential in cutaneous squamous cell carcinoma. Nat. Commun. 5, 4511 (2014).
- 22. Justilien V et al. The PRKCI and SOX2 oncogenes are coamplified and cooperate to activate Hedgehog signaling in lung squamous cell carcinoma. Cancer Cell 25, 139–151 (2014).
- 23. Ferone G et al. SOX2 is the determining oncogenic switch in promoting lung squamous cell carcinoma from different cells of origin. Cancer Cell 30, 519–532 (2016).
- 24. Mukhopadhyay A et al. Sox2 cooperates with Lkb1 loss in a mouse model of squamous cell lung cancer. Cell Rep. 8, 40–49 (2014).
- 25. Lazarus KA et al. BCL11A interacts with SOX2 to control the expression of epigenetic regulators in lung squamous carcinoma. Nat. Commun. 9, 3327 (2018).
- 26. Daniely Y et al. Critical role of p63 in the development of a normal esophageal and tracheobronchial epithelium. Am. J. Physiol. Cell Physiol. 287, C171–C181 (2004).
- 27. Jiang Y et al. Co-activation of super-enhancer-driven CCAT1 by TP63 and SOX2 promotes squamous cancer progression. Nat. Commun. 9, 3619 (2018).
- 28. Jiang YY et al. TP63, SOX2, and KLF5 establish a core regulatory circuitry that controls epigenetic and transcription patterns in esophageal squamous cell carcinoma cell lines. Gastroenterology 159, 1311–1327.e19 (2020).
- 29. Wang S et al. Target analysis by integration of transcriptome and ChIP-seq data with BETA. Nat. Protoc. 8, 2502–2515 (2013).
- 30. Hnisz D et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013).
- 31. Zhang X et al. Identification of focally amplified lineage-specific super-enhancers in human epithelial cancers. Nat. Genet. 48, 176–182 (2016).
- 32. Zhang X et al. Somatic superenhancer duplications and hotspot mutations lead to oncogenic activation of the KLF5 transcription factor. Cancer Discov. 8, 108–125 (2018).
- 33. Rogerson C et al. Repurposing of KLF5 activates a cell cycle signature during the progression from a precursor state to oesophageal adenocarcinoma. eLife 9, e57189 (2020).
- 34. Tsherniak A et al. Defining a cancer dependency map. Cell 170, 564–576.e16 (2017).
- 35. Gannon HS et al. Identification of ADAR1 adenosine deaminase dependency in a subset of cancer cells. Nat. Commun. 9, 5450 (2018).
- 36. Ishizuka JJ et al. Loss of ADAR1 in tumours overcomes resistance to immune checkpoint blockade. Nature 565, 43–48 (2019).
- 37. Liu H et al. Tumor-derived IFN triggers chronic pathway agonism and sensitivity to ADAR loss. Nat. Med. 25, 95–102 (2019).
- 38. Canadas I et al. Tumor innate immunity primed by specific interferon-stimulated endogenous retroviruses. Nat. Med. 24, 1143–1150 (2018).
- 39.Dodonova SO, Zhu F, Dienemann C, Taipale J & Cramer P Nucleosome-bound SOX2 and SOX11 structures elucidate pioneer factor function. Nature 580, 669–672 (2020). [DOI] [PubMed] [Google Scholar]
- 40.Liu X et al. Tead and AP1 coordinate transcription and motility. Cell Rep. 14, 1169–1180 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Lee H, Jeong AJ & Ye SK Highlighted STAT3 as a potential drug target for cancer therapy. BMB Rep. 52, 415–423 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Karakasheva TA et al. IL-6 mediates cross-talk between tumor cells and activated fibroblasts in the tumor microenvironment. Cancer Res. 78, 4957–4970 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Tarapore RS, Yang Y & Katz JP Restoring KLF5 in esophageal squamous cell cancer cells activates the JNK pathway leading to apoptosis and reduced cell survival. Neoplasia 15, 472–480 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Ge Y et al. Stem cell lineage infidelity drives wound repair and cancer. Cell 169, 636–650 e14 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Karin M & Clevers H Reparative inflammation takes charge of tissue regeneration. Nature 529, 307–315 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.McConnell BB et al. Kruppel-like factor 5 protects against dextran sulfate sodium-induced colonic injury in mice by promoting epithelial repair. Gastroenterology 140, 540–549 e2 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Mu X, Ahmad S & Hur S Endogenous retroelements and the host innate immune sensors. Adv. Immunol 132, 47–69 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Zeng M et al. MAVS, cGAS, and endogenous retroviruses in T-independent B cell responses. Science 346, 1486–1492 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
- 49.Baudino L, Yoshinobu K, Morito N, Santiago-Raber ML & Izui S Role of endogenous retroviruses in murine SLE. Autoimmun. Rev 10, 27–34 (2010). [DOI] [PubMed] [Google Scholar]
- 50.Ohnuki M et al. Dynamic regulation of human endogenous retroviruses mediates factor-induced reprogramming and differentiation potential. Proc. Natl Acad. Sci. USA 111, 12426–12431 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Sheng W et al. LSD1 ablation stimulates anti-tumor immunity and enables checkpoint blockade. Cell 174, 549–563 e19 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Liddicoat BJ et al. RNA editing by ADAR1 prevents MDA5 sensing of endogenous dsRNA as nonself. Science 349, 1115–1120 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Qin YR et al. Adenosine-to-inosine RNA editing mediated by ADARs in esophageal squamous cell carcinoma. Cancer Res. 74, 840–851 (2014). [DOI] [PubMed] [Google Scholar]
- 54.Mehdipour P et al. Epigenetic therapy induces transcription of inverted SINEs and ADAR1 dependency. Nature 588, 169–173 (2020). [DOI] [PubMed] [Google Scholar]
- 55.Roulois D et al. DNA-demethylating agents target colorectal cancer cells by inducing viral mimicry by endogenous transcripts. Cell 162, 961–973 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56. Zhang X, Cowper-Sal·lari R, Bailey SD, Moore JH & Lupien M. Integrative functional genomics identifies an enhancer looping to the SOX9 gene disrupted by the 17q24.3 prostate cancer risk locus. Genome Res. 22, 1437–1446 (2012).
- 57. Qin Q et al. ChiLin: a comprehensive ChIP-seq and DNase-seq quality control and analysis pipeline. BMC Bioinformatics 17, 404 (2016).
- 58. Li H & Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
- 59. Zhang Y et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
- 60. Liu T. Use Model-based Analysis of ChIP-Seq (MACS) to analyze short reads generated by sequencing protein–DNA interactions in embryonic stem cells. Methods Mol. Biol. 1150, 81–95 (2014).
- 61. Heinz S et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
- 62. Ramirez F et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
- 63. Loven J et al. Selective inhibition of tumor oncogenes by disruption of super-enhancers. Cell 153, 320–334 (2013).
- 64. Whyte WA et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153, 307–319 (2013).
- 65. Corces MR et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
- 66. Buenrostro JD, Wu B, Chang HY & Greenleaf WJ. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–21.29.9 (2015).
- 67. Cornwell M et al. VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis. BMC Bioinformatics 19, 135 (2018).
- 68. Dobin A et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
- 69. Trapnell C et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
- 70. Love MI, Huber W & Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
- 71. Subramanian A et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
- 72. Kopylova E, Noe L & Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 28, 3211–3217 (2012).
- 73. Li B & Dewey CN. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
- 74. McKenna A et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
- 75. Wang K, Li M & Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
- 76. Meyers RM et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
- 77. Ghandi M et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).
- 78. Ritchie ME et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
- 79. Jin Y, Tam OH, Paniagua E & Hammell M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics 31, 3593–3599 (2015).
- 80. Jin Y & Hammell M. Analysis of RNA-seq data using TEtranscripts. Methods Mol. Biol. 1751, 153–167 (2018).
- 81. Tokuyama M et al. ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. Proc. Natl Acad. Sci. USA 115, 12565–12572 (2018).
- 82. Roth SH, Levanon EY & Eisenberg E. Genome-wide quantification of ADAR adenosine-to-inosine RNA editing activity. Nat. Methods 16, 1131–1138 (2019).
Associated Data
Data Availability Statement
Further information and requests for resources and reagents should be directed to and will be fulfilled by the corresponding author. ChIP–seq, ATAC–seq, exome and RNA-seq data generated in this study were deposited to Gene Expression Omnibus (GEO) under the series GSE167367. Source data are provided with this paper.