Abstract
Chromosomal rearrangements can generate genetic fusions composed of two distinct gene sequences, many of which have been implicated in tumorigenesis and progression. Our study proposes a model whereby oncogenic gene fusions frequently alter the protein stability of the resulting fusion products, via exchanging protein degradation signal (degron) between gene sequences. Computational analyses of The Cancer Genome Atlas (TCGA) identify 2,406 cases of degron exchange events and reveal an enrichment of oncogene stabilization due to loss of degrons from fusion. Furthermore, we identify and experimentally validate that some recurrent fusions, such as BCR-ABL, CCDC6-RET and PML-RARA fusions, perturb protein stability by exchanging internal degrons. Likewise, we also validate that EGFR or RAF1 fusions can be stabilized by losing a computationally-predicted C-terminal degron. Thus, complementary to enhanced oncogene transcription via promoter swapping, our model of degron loss illustrates another general mechanism for recurrent fusion proteins in driving tumorigenesis.
Subject terms: Cancer genomics, Ubiquitin ligases, Ubiquitylation, Cancer genetics, Oncogenes
The impact of genetic fusions on degrons, which are motifs for ubiquitin-mediated protein degradation, has not been fully explored. Here, the authors analyse fusion genes affecting degrons in pan-cancer genomics data, validate their functional impact and find enrichment for both internal and C-terminal degron losses.
Introduction
Genetic alterations accumulate during the multistep processes of tumorigenesis, which lead to the transformation of normal cells into cancer cells1,2. Large-scale tumor sequencing has enabled the systematic identification of gene fusions derived from chromosomal rearrangements. The most famous chromosomal rearrangement, t(9;22), was identified in 1960 as a hallmark of chronic myeloid leukemia (LCML) and subsequently named the Philadelphia chromosome3. The Philadelphia chromosome prompted the discovery of the BCR (breakpoint cluster region)-ABL fusion4,5 and the clinical application of imatinib as a targeted therapy for treating LCML patients6. To date, chromosomal rearrangements have been reported as frequent genetic drivers of several types of human cancer, such as ETS-related gene (ERG) fusions in prostate cancer7, RET or anaplastic lymphoma kinase (ALK) fusions in lung cancer8,9, and fibroblast growth factor receptor 3 (FGFR3) fusions in bladder cancer10. According to a previous comprehensive analysis of The Cancer Genome Atlas (TCGA), there are more than 25,000 genetic fusion events, which might drive the development of approximately 16.5% of total cancer cases11. However, the molecular mechanisms underlying how these gene fusions are oncogenic remain largely unclear for most of these cases.
Several mechanisms have been proposed to explain the oncogenicity of fusion proteins12. One mechanism relies on transcriptional up-regulation due to promoter exchange between two genes, such as the fusion of ERG with the 5′-UTR of TMPRSS2 (transmembrane serine protease 2) to trigger the transcription of fusion products in prostate cancer13. Another mechanism for the oncogenic nature of fusion proteins is the constitutive activation of kinases, often achieved by dimerization or oligomerization, such as for ABL, ALK, and RET fusions5,8,9,14. A third mechanism is the loss of an auto-inhibitory segment, such as for BRAF fusions15. We hypothesized that altered protein stability could be an additional widespread mechanism for gene fusion events. While there have been a few instances characterized, such as TMPRSS2-ETV116 and TMPRSS2-ERG fusions17,18, altered protein stability has not been previously discussed as a major mechanism for the functional impact of gene fusions on tumorigenesis12.
Intracellular protein homeostasis is strictly controlled by the balance between protein synthesis by the ribosome and protein degradation by the ubiquitin proteasome system (UPS)19. Proteins are targeted for 26S proteasome-mediated proteolysis by conjugation of a poly-ubiquitin chain onto lysine residues20,21. The exquisite selectivity of the ubiquitination process on a cellular protein relies on its recognition by specific E3 ubiquitin ligase(s)22–24. There are more than 600 E3 ligases encoded in the human genome, but only a few have been extensively characterized22–24. The binding specificity of an E3 ligase is thought to be governed by short sequence motifs on the substrate, known as degrons25,26, which are typically several amino acids long. Some E3 ligases display strong locational preference for degrons at the C-terminus or N-terminus of a protein27–29, while other degrons can be found within the internal protein sequence26.
While our recent analysis suggests that oncogenic point mutations frequently perturb the function of the UPS (~19% of cancer driver genes)30, it remains unknown whether gene fusions also frequently alter protein stability. In this study, we identify 2406 fusion candidates with possible degron loss, preferentially occurring in oncogenes (OG), from bioinformatics analysis across 33 cancer types, and further experimentally validate the increased protein stability resulting from loss of degrons in 5 fusions, indicating that altering protein stability due to degron loss is a general mechanism for cancer-related genetic fusions to promote tumorigenesis.
Results
A systematic computational analysis of degron loss
Previous reports of degron loss in gene fusions have largely focused on prostate cancer due to the high frequency of oncogenic fusions7,11,16. The two most common fusion events in prostate cancers involve either ERG (>50% of primary tumor samples) or ETV family transcription factors (<10%). Notably, we and others have reported that through the fusions with TMPRSS2 or other 5′ partners, ERG loses an SPOP (speckle-type POZ protein) degron, thus leading to stabilization of the fusion protein (Supplementary Fig. 1a, b)17,18. Similarly, ETVs also lose two COP1 degrons during fusion, which leads to escape from COP1-mediated degradation (Supplementary Fig. 1c–f)16. These studies prompted us to hypothesize that degron loss could be a general mechanism for genetic fusion events in driving tumorigenesis beyond prostate cancer (Fig. 1).
We therefore systematically analyzed 24,336 fusion genes reported across 9624 tumor samples in TCGA to discern the importance of a degron loss mechanism (Fig. 2a and Supplementary Data 1). Consistent with a likely substantial contribution of gain-of-function fusions towards tumorigenesis in TCGA, we found that fusions containing previously implicated oncogenes were enriched for in-frame fusions (p < 5 × 10^−10, Fig. 2b) and preferentially retained functional protein domains (p < 8 × 10^−6, Fig. 2c). To understand the specific contribution of degron loss, we systematically predicted internal degrons for E3 ligases with known motifs using a Random Forest machine learning model (Supplementary Fig. 2a–d, “Methods”). In addition, we also unbiasedly predicted C-terminal degrons using the deepDegron method that we previously developed30 to identify degron motifs from the global protein stability assay31. Notably, among the highly recurrent fusions (>10 tumor samples), degron loss is significantly more enriched in oncogenes (Fig. 2d, e, 30.4%) than tumor suppressor genes (Fig. 2d, e, 14%, p = 0.01, Fisher’s exact test) or likely passenger genes (Fig. 2d, e, 13.2%, p = 5 × 10^−6). These results were robust to the choice of threshold for recurrent fusions (Supplementary Fig. 2e). Moreover, fusions involving oncogenes displayed a clear bias for degron loss over degron gain (Fig. 2e). In contrast to oncogenes, fusions involving tumor suppressor genes had a trend towards degron gain, although the overall number of events was relatively low. Taken together, these results indicate that degron loss could be a major contributor to the oncogenicity of gene fusions.
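The enrichment comparison described above (degron loss in oncogenes versus tumor suppressor genes) rests on a 2 × 2 contingency table and Fisher's exact test. The following is a minimal sketch of that test using only the Python standard library; the counts are hypothetical placeholders, not values from this study, and the function name is ours.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum the probabilities of all tables no more likely than the observed one
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))

# Hypothetical counts (NOT from the paper): recurrent fusions with / without degron loss
oncogene_loss, oncogene_no_loss = 14, 32
tsg_loss, tsg_no_loss = 7, 43
p_value = fisher_exact_two_sided(oncogene_loss, oncogene_no_loss, tsg_loss, tsg_no_loss)
print(f"p = {p_value:.3g}")
```

In practice a statistics library would normally be used for this test; the explicit version is shown only to make the calculation visible.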
We next systematically discovered the specific genes involved in fusions that preferentially underwent degron loss. By first analyzing internal degrons, we identified 47 genes where gene fusions led to more predicted loss of internal degrons than expected (q < 0.1, permutation test, “Methods”32), which contained several known oncogenes, such as ABL1, RET, and IGF1R (insulin like growth factor 1 receptor) fusions (Fig. 2e, Supplementary Fig. 2f, g and Supplementary Data 2). Likewise, genes that are fusion partners to well-known oncogenes were also common (Fig. 2g and Supplementary Fig. 2h), such as the statistically significant degron loss for CCDC6, particularly when fused with RET (p < 0.0001, permutation test). This suggested a potential selection pressure to avoid protein degradation in both members of a fusion gene product. By further restricting our analysis to only previously implicated oncogenes (q < 0.1), we found additional internal degron loss events for rare oncogenic fusions containing PDGFRA/FIP1L1 (platelet-derived growth factor receptor alpha/factor interacting with PAPOLA and CPSF1, Supplementary Fig. 2i) and a positive control ETV1 fusion (Supplementary Fig. 1c and Supplementary Data 2), with additional fusions containing ETV4 (q = 0.16, Supplementary Fig. 1e) and ETV5 (q = 0.13, Supplementary Fig. 1f) at the borderline of statistical significance. Thus, it is plausible that with greater sample size, additional fusions leading to degron loss in genes not previously known to be oncogenes will be found.
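To illustrate the idea of a permutation test for degron loss, the sketch below asks whether the fusions of a 3′ partner gene drop a degron more often than expected by chance. The null model used here (the degron start is placed uniformly at random along the protein) is an assumption made for illustration and is not necessarily the null used in "Methods"; the breakpoints, degron position, and protein length are hypothetical.

```python
import random

def count_losses(breakpoints, degron_start):
    """Count 3' partner fusions that drop the degron.

    A 3' partner keeps only residues from the breakpoint onward, so a degron
    that starts before the breakpoint is lost.
    """
    return sum(1 for bp in breakpoints if bp > degron_start)

def permutation_p_value(breakpoints, degron_start, protein_length,
                        n_perm=10000, seed=0):
    rng = random.Random(seed)
    observed = count_losses(breakpoints, degron_start)
    hits = 0
    for _ in range(n_perm):
        # Assumed null: the degron could start at any residue with equal probability
        shuffled_start = rng.randint(1, protein_length)
        if count_losses(breakpoints, shuffled_start) >= observed:
            hits += 1
    # Add-one correction avoids reporting an impossible p = 0
    return (hits + 1) / (n_perm + 1)

# Hypothetical example: eight fusions sharing one breakpoint near the N-terminus
p = permutation_p_value([27] * 8, degron_start=17, protein_length=1130)
print(f"permutation p = {p:.4f}")
```

Per-gene p-values obtained this way would still need multiple-testing correction before being compared with a q-value threshold.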
Genetic fusions with degron loss are likely cancer type-specific
Given that genetic fusions have been previously noted to exhibit tissue specificity, such as ETV family fusions in prostate cancer7 and ALK or MET fusions in lung cancer8,9,14, we hypothesized that inclusion of cancer type-specificity would likely improve our statistical power. To this end, using low entropy as a metric for specificity (Fig. 3a), we found that genes involved in highly cancer type-specific fusions were significantly enriched for previously known oncogenic fusions (p = 3.1 × 10^−16, Fisher’s exact test, Supplementary Fig. 3a and Supplementary Data 3). Interestingly, we observed statistical significance for the loss of internal degrons from several fusion genes, such as 5′ EML4 fusions, 3′ NSD1 fusions and the previously validated 3′ ETV4 fusions, only when considered in conjunction with cancer type (Fig. 3b and Supplementary Data 3). Overall, degron loss contributes to many of the most highly recurrent gene fusions specific to particular cancer types (Fig. 3c), including PML-RARA in acute myeloid leukemia (LAML), EGFR-SEPT14 in gliomas, and TMPRSS2-ERG in prostate cancers. Thus, by using an unbiased statistical approach, we found both known (e.g. ETV fusions) and previously unknown cases of gene fusions leading to degron loss.
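The use of low entropy as a specificity metric can be made concrete with Shannon entropy over the cancer types in which a gene's fusions are observed. The sketch below assumes that formulation (the exact definition used in the study is given in "Methods"); the cancer type labels are illustrative inputs.

```python
from collections import Counter
from math import log2

def cancer_type_entropy(cancer_types):
    """Shannon entropy (bits) of the cancer type distribution for one gene's fusions.

    0 bits means every fusion occurs in a single cancer type (maximal specificity);
    higher values mean the fusions are spread across more cancer types.
    """
    counts = Counter(cancer_types)
    total = sum(counts.values())
    return sum((c / total) * log2(total / c) for c in counts.values())

# Illustrative inputs: one label per tumor sample carrying a fusion of the gene
specific = ["PRAD"] * 10
spread = ["LUAD", "BRCA", "THCA", "GBM"]
print(cancer_type_entropy(specific), cancer_type_entropy(spread))
```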
As numerous gene fusions with degron loss exhibit cancer type-specificity, we sought to identify the corresponding E3 ligases likely involved in this specificity. These associations include APC/CDC20 for EML4-ALK fusions in lung cancers (Fig. 3d), SPOP for NUP98-NSD1 (Fig. 3e) and BCR-ABL fusions in LAML, and FBW7 (or FBXW7, F-box and WD repeat domain containing 7) for CCDC6-RET fusions in thyroid carcinomas (THCAs) (Supplementary Data 3). Indeed, the E3 ligase most frequently involved in degron loss is the known tumor suppressor gene SPOP (Supplementary Fig. 3b), which suggests a selective pressure to avoid protein degradation in a variety of cancer types.
Genetic fusions with degron loss are associated with downstream functional consequences
Based on the above analyses that degron loss may lead to increases in the stability of fusion proteins, we hypothesized that tumors containing these fusions would be associated with an altered proteomic and subsequent transcriptomic state of cancer cells. To validate this hypothesis, we first analyzed the abundance of 198 proteins measured by reverse phase protein arrays (RPPA) across the TCGA (Supplementary Data 4). Consistent with our finding of degron loss, 5′ ERBB2 and 5′ EGFR fusions had significantly higher expression levels and active phosphorylation of their respective proteins (Supplementary Fig. 3d). Furthermore, degron loss in CCDC6 fusions led to elevated levels of downstream effectors, including active phosphorylated forms of YAP, PKC, p38, and 4EBP1 (Supplementary Fig. 3d). Given the limited number of proteins assayed by RPPA, we next analyzed potential downstream consequences on the transcriptome mediated by changes in the activity of transcription factors (TFs). Since many oncogenic fusions are involved in protein signaling, we reasoned that TF activity could be best approximated by the expression of TF target genes. Here, TF target genes are defined by thousands of ChIP-seq profiles from the Cistrome database33. Using the RABIT algorithm34 to find coordinated differential expression of TF target genes, we found 113 significant associations between TF activity and fusion events (Supplementary Fig. 3e, f and Supplementary Data 5). Supporting the reliability of our analysis, previous studies corroborate several of the most significant associations identified, such as AR for ERG fusions, TTF1 for EML4-ALK fusions, and TAL1 for BCR-ABL fusions35–37. Interestingly, 5′ EGFR fusions were significantly associated with increased STAT1 activity, suggesting that it is either a downstream consequence of EGFR kinase activity or an immunogenic consequence of a predicted fusion neoantigen11,38.
Cumulatively, our analyses indicate that fusion events undergoing degron loss have significant downstream functional consequences on both the proteome as well as the transcriptome.
BCR-ABL fusion leads to loss of the SPOP degron in ABL and stabilization of fusion protein
Our systematic computational approach allowed us to potentially find, even for the most well-studied oncogenes, previously unknown degrons that were lost during fusion. For example, our analyses predicted that BCR-ABL fusions led to the loss of a SPOP degron originally found in the oncoprotein ABL1 (Fig. 4a). BCR-ABL is the gene fusion product of the Philadelphia chromosome found in LCML (Supplementary Fig. 4a, b)3–5, and it has been a therapeutic target for LCML treatment for decades6,39,40. Our computational analysis predicted that the fusion between ABL and its 5′ partner BCR leads to loss of a degron recognized by SPOP (Fig. 4a, b), which is a substrate adaptor of the Cullin 3 family of E3 ligases. The putative SPOP degron (17-LSSSS-21) is evolutionarily conserved in human and mouse ABL1 protein sequence, and similar to several known SPOP substrates, including ERG, AR, and DEK (Fig. 4c)17,18,41. This indicated that SPOP degron loss was plausible for BCR-ABL fusions, and thus might complement a previously proposed mechanism of constitutive kinase activity42.
First, we aimed to experimentally validate ABL1 as a bona fide substrate of the Cullin 3-SPOP E3 ligase. Indeed, similar to the known SPOP substrate ERG, the protein abundance of ABL1 increased in DU145 prostate cancer cells upon treatment with either the proteasome inhibitor MG132 or the neddylation inhibitor MLN4924 (Fig. 4d). Depletion of endogenous Cullin 3 (Supplementary Fig. 4c) or SPOP (Fig. 4e) led to an increase of ABL1 protein abundance. Furthermore, Spop−/− mouse embryonic fibroblasts (MEFs) had relatively higher protein abundance of ABL1 than wild-type (WT) MEFs (Fig. 4f), consistent with the positive control SPOP substrates DEK41 and SRC3 (ref. 43). As expected for abrogating protein degradation, the protein half-life of ABL1 was dramatically longer in Spop−/− MEFs than in WT MEFs (Supplementary Fig. 4d, e). Moreover, ectopic expression of SPOP promoted the ubiquitination and degradation of ABL1 protein, which could be largely inhibited by the proteasome inhibitor MG132, thus indicating a proteasome-dependent mechanism (Fig. 4g, h). To ensure the enhanced degradation of the ABL1 protein was due to an on-target mechanism, we evaluated whether cancer-derived mutations, including Y87C, F102C, W131G, and F133V (Supplementary Fig. 4f)44, that abrogate SPOP binding to substrates would fail to promote ABL1 protein degradation. Notably, ectopic expression of WT SPOP, but not the SPOP mutants, could degrade ABL1 protein (Supplementary Fig. 4g). Taken together, these results support our computational prediction and indicate that ABL1 is likely a bona fide substrate of the SPOP E3 ligase.
We next sought to investigate whether BCR-ABL fusion proteins, named p190 and p210, could escape SPOP-mediated degradation in cells. This first requires excluding the possibility of another SPOP degron in ABL1 which is not lost in a fusion. We found that after deleting the predicted SPOP degron in the ABL1 protein, the resultant ABL1-ΔD mutant was relatively resistant to SPOP-mediated degradation in cells (Fig. 4i). Secondly, ectopic expression of SPOP degrades only WT ABL1 (Fig. 4j), but not BCR-ABL1 fusion proteins (Fig. 4k, l), indicating that BCR-ABL fusions escape from SPOP-mediated degradation via loss of the sole SPOP degron in the N-terminus of ABL1. Apart from the most frequent fusions, p190 and p210, there are several other low-frequency fusions (e19a2) and rare fusions (e6a2, e8a2, e15a2, e1a3, e6a3, e8a3, e13a3, e14a3, and e19a3) in LCML, LAML, and acute lymphocytic leukemia (ALL, Fig. 4b). Notably, the SPOP degron in exon 1 of ABL is lost in all of these genetic fusions (Fig. 4b), suggesting a similar mechanism for these fusions in promoting tumorigenesis.
CCDC6-RET fusion escapes from FBW7-mediated degradation
Although the BCR-ABL fusion led to loss of a degron in the known oncoprotein ABL1, degron loss in the fusion partner to known oncogenes might also contribute towards increasing protein stability of fusion proteins. Our computational analysis showed that CCDC6-RET fusions were highly enriched for loss of predicted degrons in both fusion components, including FBW7 degrons in CCDC6 and a D-box degron in the oncogene RET (Fig. 5a). There are several variants of CCDC6-RET fusion, which contain N-terminal fragments of CCDC6 and C-terminus of RET, in thyroid carcinoma45, non-small cell lung cancer9, and other cancer types46 (Fig. 5b). Apart from RET, CCDC6 also fuses with other genes, including ROS1 (ref. 47) and PDGFRB48. Notably, the FBW7 degrons in CCDC6 were lost in all of these fusion proteins (Fig. 5b), suggesting an analogous mechanism of increasing protein stability.
Given these computational predictions, we expected the putative FBW7 degrons in CCDC6 would be similar to those found in previously known substrates. Sequence alignment showed that the predicted FBW7 degrons ((pT/pS)PXX(pS/pT), p indicating phosphorylation) were conserved in both human and mouse CCDC6, consistent with several known FBW7 substrates, such as c-Myc49,50, c-Jun51 and cyclin E52 (Fig. 5c). The recognition by FBW7 is known to depend on the phosphorylation of serine or threonine residues within its degron motif53. As expected, large-scale phospho-proteomics data (https://www.phosphosite.org)54 have detected phosphorylation on residues within the putative FBW7 degrons (Thr-357, Ser-361, Thr-380, Ser-384, and Thr-427), further supporting CCDC6 as a potential substrate of FBW7.
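As an illustration of how candidate sites matching the (pT/pS)PXX(pS/pT) consensus can be located in a sequence, the sketch below scans for the unphosphorylated pattern [ST]P..[ST] with a regular expression. This is a simple motif scan, not the Random Forest predictor used in this study; phosphorylation cannot be read from sequence alone and must be checked against external data, and the input sequence is a toy example rather than CCDC6.

```python
import re

# Lookahead so that overlapping candidate sites are all reported
FBW7_MOTIF = re.compile(r"(?=([ST]P..[ST]))")

def find_motif_sites(sequence, pattern=FBW7_MOTIF):
    """Return (start, end, matched_residues) with 1-based inclusive coordinates."""
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in pattern.finditer(sequence)
    ]

# Toy sequence (hypothetical, for illustration only)
toy_sequence = "MKTPPLSPEESAA"
for start, end, residues in find_motif_sites(toy_sequence):
    print(f"{start}-{residues}-{end}")
```

Each reported site would then be retained only if both flanking serine or threonine residues have phosphorylation evidence.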
To experimentally assess whether CCDC6-RET escapes FBW7-mediated degradation, we aimed to first validate CCDC6 as a bona fide FBW7 substrate. We found that the CCDC6 protein levels were relatively higher in FBXW7 (also known as FBW7) null DLD1 and HCT116 cells, compared with respective WT parental control cells (Fig. 5d). FBW7 is frequently mutated and inactivated in colorectal cancer (CRC), and FBW7 mutant CRC cells have relatively lower FBW7 expression and higher abundance of FBW7 substrates such as MCL1 (ref. 55). Thus, we further assessed CCDC6 protein levels in a panel of CRC cells with either WT or mutant FBW7, and found that FBW7-mutant cells tend to have relatively higher abundance of CCDC6 protein than FBW7-WT cells (Fig. 5e and Supplementary Fig. 4h). These data together indicate that CCDC6 is a ubiquitination substrate of FBW7. More importantly, compared with WT-CCDC6, CCDC6-RET fusion protein escaped recognition by FBW7 (Fig. 5f), leading to stabilization of the resultant fusion product in the in vivo ubiquitination assay (Fig. 5g). In keeping with this notion, depletion of FBW7 extended the half-life of CCDC6 protein in a cycloheximide (CHX) chase assay (Supplementary Fig. 4i, j).
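A protein half-life can be estimated from CHX chase data by assuming first-order decay and fitting a line to the log-transformed band intensities. The sketch below shows this calculation under that assumption; the time points and abundances are hypothetical, and the quantification procedure actually used for the blots is not specified here.

```python
from math import log

def half_life_from_chase(times_h, abundances):
    """Estimate half-life (hours) by least-squares fit of ln(abundance) vs. time.

    Assumes first-order decay; abundances must be positive (e.g. band intensity
    normalized to a loading control and to the 0 h time point).
    """
    ys = [log(a) for a in abundances]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    slope = (
        sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
        / sum((t - mean_t) ** 2 for t in times_h)
    )
    if slope >= 0:
        return float("inf")  # no measurable decay over the chase period
    return log(2) / -slope

# Hypothetical chase: abundance halves every 2 h
print(half_life_from_chase([0, 2, 4, 8], [1.0, 0.5, 0.25, 0.0625]))
```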
Unlike a prior report of CCDC6 as a substrate of FBW756, our findings support the relevance of FBW7 degron loss in CCDC6 fusions. Interestingly, given that CCDC6-RET fusions are predicted to generate neoantigens (Supplementary Fig. 4k)11, an increase of CCDC6-RET protein stability might also reduce the generation of antigenic peptides derived from proteasomal degradation57, thus evading an otherwise strong immune response (p = 0.02, likelihood ratio test; Supplementary Fig. 4l). To assess how loss of FBW7 degrons in the CCDC6 protein impacts tumorigenesis, we further generated DLD1 cell lines that stably express either WT CCDC6 or CCDC6-RET fusion protein (Fig. 5h). We found that the CCDC6-RET-expressing cells were more clonogenic than the WT-CCDC6-expressing cells in a colony formation assay (Fig. 5i, j) and resulted in larger tumors in a mouse xenograft model (Fig. 5k, l). Together, these data indicate that loss of FBW7 degrons in the CCDC6-RET fusion elevates its oncogenic phenotype.
PML-RARA escapes from β-TRCP-mediated degradation
Our systematic bioinformatic analyses of internal degrons relied on previously reported motifs for E3 ligases. However, we and others have validated degrons that may sometimes have unconventional motifs, such as the β-TRCP (F-box/WD repeat-containing protein 1A, FBXW1) degron in Twist (sSspvS)58, PER1 (tSgcsS)59, and CHK1 (tSggcS)60. Given that drugs that induce protein degradation of PML-RARA lead to high response rates in acute promyelocytic leukemia (APL)61–65, we hypothesized that the PML-RARA fusion may escape protein degradation through degron loss, but that this event was missed by our systematic analysis. Interestingly, when using an unconventional β-TRCP degron motif (SSSxxS) reported from a previous study58, we found that the PML-RARA fusion may lead to loss of a degron that is originally found in the PML protein (560-SSSEDS-565) (Supplementary Fig. 5a–c). Among all the genetic fusions observed in TCGA, PML-RARA is the second most frequent fusion event and is preferentially found in LAML11. Although not included in TCGA, nearly all APLs contain a PML-RARA fusion (95% of cases), which is caused by the reciprocal translocation t(15;17)(q24;q21)66 (Supplementary Fig. 5a). Depending on the exact location of the translocation, PML-RARA fusion yields two major fusion proteins, namely PML-RARa-s and PML-RARa-l (Supplementary Fig. 5b).
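Whether a degron is lost depends on which side of the breakpoint it lies and on whether the gene is the 5′ or 3′ partner. The sketch below combines a scan for the SSSxxS motif with that retention logic. The sequence fragment, its numbering offset (chosen so that the motif lands at 560–565 as reported above), and the breakpoint values are illustrative assumptions, not coordinates taken from the study's data.

```python
import re

BTRCP_MOTIF = re.compile(r"(?=(SSS..S))")  # unconventional SSSxxS motif

def find_degrons(fragment, first_residue=1, pattern=BTRCP_MOTIF):
    """Return (start, end) of each motif match in protein coordinates (1-based)."""
    return [
        (first_residue + m.start(), first_residue + m.start() + len(m.group(1)) - 1)
        for m in pattern.finditer(fragment)
    ]

def degron_retained(start, end, breakpoint, partner):
    """Breakpoint = last residue of the gene kept by a 5' partner,
    or last residue of the gene removed from a 3' partner."""
    if partner == "5prime":
        return end <= breakpoint   # 5' partner keeps residues 1..breakpoint
    if partner == "3prime":
        return start > breakpoint  # 3' partner keeps residues after the breakpoint
    raise ValueError("partner must be '5prime' or '3prime'")

# Illustrative fragment containing the motif; offset 558 is an assumption
sites = find_degrons("VISSSEDSDAE", first_residue=558)
print(sites)
# Hypothetical breakpoint upstream of the degron in a 5' partner
print([degron_retained(s, e, breakpoint=552, partner="5prime") for s, e in sites])
```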
Because of the high prevalence and therapeutic relevance of PML-RARA fusions, we next sought to experimentally validate PML as a bona fide substrate of β-TRCP and thereby implicate the predicted degron loss mechanism. Indeed, depleting endogenous BTRC (also known as β-TRCP), but not other F-box E3 ligases we tested, induced the accumulation of the endogenous PML protein (Supplementary Fig. 5d). In addition, depletion of β-TRCP extended the half-life of PML protein (Supplementary Fig. 5e). Consistent with the required phosphorylation of a β-TRCP degron, all four serine residues were observed to be phosphorylated in a previous unbiased screen67,68. Furthermore, depletion of CSNK2A1 (also known as CKII)69 also led to the accumulation of PML protein (Supplementary Fig. 5f), indicating that CKII is a potential kinase for PML. Using an in vitro phosphorylation assay, we found that mutation of serine residues within the putative β-TRCP degron (PML-4A) abolished the phosphorylation mediated by CKII kinase (Supplementary Fig. 5g). Moreover, the non-phosphorylated PML mutant (PML-4A) lost the interaction with β-TRCP, thus becoming resistant to β-TRCP-mediated degradation (Supplementary Fig. 5h). Taken together, these results indicate that PML is likely a bona fide substrate of β-TRCP, and loss of a β-TRCP degron likely renders greater stability to PML-RARA fusions.
Comprehensive analysis of C-terminal degron loss upon oncogenic gene fusion
The loss of a non-canonical degron in PML-RARA highlights that, even for well-studied E3 ligases like β-TRCP, our current knowledge of degron motifs is largely incomplete. This dearth of knowledge may lead to conclusions that overlook the role of degron loss in fusion events. Thus, we hypothesized that unbiased learning of degron motifs from data would reveal additional cases of degron loss in gene fusions. Although systematic profiling of degrons across the entire proteome has not yet been performed, a previous global protein stability (GPS) assay has systematically measured all C-terminal protein sequences for protein stability, which led to the discovery of several novel degron motifs31. We therefore leveraged a machine learning model trained on the GPS assay (deepDegron)30 to score whether gene fusions preferentially lead to C-terminal degron loss (Fig. 6a and Supplementary Fig. 6a). We found gene fusions overall were substantially enriched for C-terminal degron loss, with statistical significance further improved by including cancer type information (Supplementary Fig. 6b and Supplementary Data 6). 5′ EGFR and 5′ RAF1 fusions yielded the highest scores for C-terminal degron loss among the 16 statistically significant genes (Fig. 6b). EGFR and RAF1 fusions additionally displayed substantial cancer type-specificity, with 65% of 5′ EGFR fusions occurring in gliomas (Supplementary Fig. 6c) and 69% of 5′ RAF1 fusions occurring in thyroid carcinomas (THCAs) (Supplementary Fig. 6d). Interestingly, C-terminal and internal degrons can be simultaneously lost in a gene fusion, as observed for 5′ NCOA4 fused with 3′ RET (Supplementary Data 1 and Supplementary Fig. 6e).
EGFR-SEPT14 is the most frequent EGFR fusion and occurs mostly in glioblastoma (GBM) and low-grade gliomas (LGG). EGFR-SEPT14 fusions result in loss of a putative C-terminal degron (-GA*, Fig. 6c and Supplementary Fig. 6f), which is evolutionarily conserved among species (Supplementary Fig. 6g). To experimentally validate the key role of the -GA* motif in controlling the protein stability of EGFR protein, we generated two EGFR mutants with either deletion of the last alanine residue (G1209*) or glycine–alanine dipeptide (I1208*, S6F, Supplementary Fig. 6h). Notably, WT EGFR underwent significant ubiquitination, but both EGFR mutants resulted in a dramatic reduction in ubiquitination (Fig. 6d). This supports our computational finding that a C-terminal degron (-GA*) is lost in EGFR genetic fusions, which likely leads to increased stability of the resultant fusion proteins.
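The notation -GA* denotes a motif that must sit at the very C-terminus, immediately before the stop. A minimal sketch of checking for such a motif, and of how the truncation mutants described above remove it, is given below. It is a plain string comparison rather than the deepDegron model, and the short tail sequences are illustrative stand-ins for the full-length proteins.

```python
def has_c_terminal_motif(sequence, motif):
    """True if the sequence ends with the motif; lowercase 'x' matches any residue.

    Example motifs: 'GA' for -GA*, 'Vx' for -Vx*.
    """
    if len(sequence) < len(motif):
        return False
    tail = sequence[-len(motif):]
    return all(m == "x" or m == aa for m, aa in zip(motif, tail))

# Illustrative C-terminal tails (stand-ins, not full-length sequences)
wt_tail = "SEFIGA"
minus_a = wt_tail[:-1]   # deletion of the last alanine
minus_ga = wt_tail[:-2]  # deletion of the glycine-alanine dipeptide

for name, tail in [("WT", wt_tail), ("deltaA", minus_a), ("deltaGA", minus_ga)]:
    print(name, has_c_terminal_motif(tail, "GA"))
```

The same function handles motifs with a wildcard position, such as -Vx*, by passing "Vx" as the motif.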
Among RAF1 fusions, RAF1-AGGF1 is the most frequent fusion, with 3′ partners TRAK1 and PHC3 being observed less frequently. Our computational analysis predicts a putative C-terminal degron in RAF1 that is evolutionarily conserved among species (-Vx*, x means any amino acid, Fig. 6e and Supplementary Fig. 6i, j). Notably, all RAF1 fusions result in the loss of this putative C-terminal degron. To experimentally validate this finding, we mutated the putative RAF1 degron, by either deletion of the valine residue (P646F647*) or substitution of the valine to alanine (A647F648*, Supplementary Fig. 6k). Compared to WT RAF1, both mutants exhibited relatively less ubiquitination (Fig. 6f) and an extended protein half-life (Fig. 6g, h). To further assess whether loss of the C-terminal degron in RAF1 affects tumorigenesis, we generated HeLa cell lines that stably express either WT RAF1 or the degron loss mutants of RAF1 (P646F647* and A647F648*, Fig. 6i). Cells expressing the degron loss mutant forms of RAF1 were more clonogenic than those expressing WT-RAF1 in vitro in a colony formation assay (Fig. 6j, k). Furthermore, the RAF1 mutant-expressing cells (A647F648*) generated larger tumors in a mouse xenograft model than those expressing WT-RAF1 (Fig. 6l, m). These experimental results support our computational prediction that RAF1 loses a C-terminal degron (-Vx*) during fusion events, a process likely rendering greater stability to the fusion protein to facilitate tumorigenesis.
Discussion
While oncogenic gene fusions in human cancers have been extensively cataloged11,70, the molecular mechanisms underlying their oncogenicity are incompletely understood. By analyzing more than 9000 tumors across 33 cancer types, we provide a systematic analysis of genetic fusions that demonstrates the prevalence of degron loss as a mechanism to increase the resultant protein stability. Among the 2406 fusion events that are predicted by machine learning to undergo degron loss, we experimentally validated five highly recurrent oncogenic gene fusions for altered protein stability and oncogenicity, thus more than doubling the number of previously validated cases16–18. Prior systematic studies have largely focused on transcriptional over-expression of gene fusions caused by the exchange of promoters or enhancers11,13. Our results suggest that degron loss is a complementary and generally applicable mechanism by which genetic fusions increase protein expression levels and thus promote tumorigenesis. We note that degron loss is not necessarily mutually exclusive with other previously proposed mechanisms such as promoter swapping, and therefore might act in concert with them to explain the oncogenicity of a gene fusion. For example, genetic fusions that lead to loss of the C-terminal degrons (such as those for RAF1 or EGFR) might simultaneously promote the kinase activity through a similar mechanism of dimerization or oligomerization5,8,9,14.
Despite our study providing the most comprehensive examination of degron loss for genetic fusions to date, many instances of degron loss may still be missed for a couple of reasons. First, our analysis still has limited statistical power in identifying enrichment for degron loss in rare fusion events. For example, a previously validated KEAP1 degron in IKBKB71 was lost in HOOK3-IKBKB fusions in breast cancer (Supplementary Fig. 3b), but this fusion event did not surpass our stringent false discovery rate cutoff. Secondly, given the incomplete knowledge of degron motifs, we further prioritized likely true degrons by employing machine learning and ensuring requisite post-translational modifications. However, these stringent criteria may also lead to false negatives in degron motifs, such as the lack of a previously reported phosphorylation event in CCDC6 preventing the accurate prediction of a third FBW7 degron. Further basic science efforts to decipher additional degron motifs coupled with an increased throughput of tumor sequencing will be necessary to provide a complete landscape of degron loss for oncogenic fusions.
Our finding that fusion proteins preferentially escape protein degradation by degron loss suggests that tumors may be particularly sensitive to degradation of oncogenic fusions. Indeed, the standard of care for APL harboring the PML-RARA fusion is either all-trans-retinoic acid (ATRA) or arsenic trioxide, both of which lead to the degradation of the PML-RARA fusion protein61–65. Given recent advances in the development of compounds that induce targeted protein degradation such as PROTACs (PRoteolysis TArgeting Chimeras)72, other fusions besides PML-RARA that undergo degron loss could become efficacious therapeutic targets. Notably, compounds that specifically degrade the BCR-ABL and ALK fusion proteins have been developed73–75. An additional theoretical benefit of degrading fusion proteins is the possibility to overcome acquired resistance mutations to previously used inhibitors, such as imatinib for BCR-ABL76,77 and crizotinib for EML4-ALK fusions78. Because not all gene fusions undergo degron loss, our analysis may help prioritize the most promising targets for further PROTAC drug development. However, there are numerous questions that deserve further attention. For example, how can we understand the combinatorial impact of degron loss with other simultaneous mechanisms involved in gene fusions? Are there differences in the functional consequences of pharmacological inhibition versus degradation of stable fusion proteins? Could induced degradation of otherwise stable fusion proteins increase the presentation of neoantigens that yield an immune response against cancer? Future studies of gene fusions that combine mechanistic and bioinformatic insights may reveal the answers to these and more questions.
Methods
Human cell lines and culture conditions
Human embryonic kidney 293 (HEK293), HEK293T, HeLa, DU145, and LNCaP cells were purchased from the American Type Culture Collection (ATCC). Spop+/+ and Spop−/− MEFs were kind gifts from Dr. Nicholas Mitsiades (Baylor College of Medicine). The panel of colon cancer cell lines (Lim2405, RKO, DiFi, SW480, Lim1215, LoVo, LS411N, SW1463, SW48, SNU-C2B, HCT8, and SW837) was obtained from Dr. Lin Zhang (University of Pittsburgh), and the HCT116-FBW7-KO, HCT116 WT, DLD1-FBW7-KO, and DLD1-WT cell lines were kind gifts from Dr. Bert Vogelstein (Johns Hopkins University). HEK293, HEK293T, HeLa cells, Spop+/+, and Spop−/− MEFs were maintained in Dulbecco’s modified Eagle’s medium (DMEM) containing 10% fetal bovine serum (FBS), 100 units of penicillin and 100 µg/ml streptomycin. DU145, LNCaP, HCT116, DLD1, Lim2405, RKO, DiFi, SW480, Lim1215, LoVo, LS411N, SW1463, SW48, SNU-C2B, HCT8, SW837, HCT116-FBW7-KO, and DLD1-FBW7-KO cells were cultured in RPMI1640 containing 10% FBS, 100 units of penicillin and 100 µg/ml streptomycin.
General cloning
Expression vectors HA-ABL1, HA-FBW7, and HA-RAF1 were constructed by cloning the corresponding cDNAs into pcDNA3-HA vector. Flag-SPOP, Flag-SPOP-Y87C, Flag-SPOP-F102C, and Flag-SPOP-W131G were constructed as previously described17. Myc-β-TRCP1 was constructed as previously described79. GFP-CCDC6 (571577), GFP-CCDC6/RET (572024), and HA-EGFR (703594) were purchased from MRC PPU (University of Dundee). HA-ABL1-ΔD, Flag-PML-S518A, Flag-PML-4A, Flag-PML-5A, HA-EGFR-G1029*, HA-EGFR-I1028*, HA-RAF1-P646A647*, and HA-RAF1-A647A648* were constructed using the Site-Directed Mutagenesis Kit (Agilent) following the manufacturer’s instructions. GST-PML-WT, GST-PML-4A, and GST-PML-S518A were constructed by cloning the corresponding cDNA into pGEX-GST-4T1 vector. pLenti-HA-CCDC6, pLenti-HA-CCDC6-RET, pLenti-HA-RAF1, pLenti-HA-RAF1-P646A647*, and pLenti-HA-RAF1-A647A648* were constructed by cloning the corresponding cDNAs into pLenti-puro vector. The primers for site-directed mutagenesis are as follows: PML-S518A-f: 5′-GCACCTCCAAGGCAGTCGCACCACCCCACCTGG-3′; PML-S518A-r: 5′-CCAGGTGGGGTGGTGCGACTGCCTTGGAGGTGC-3′; PML-4A-f: 5′-CGCGTTGTGGTGATCGCCGCCGCGGAAGACGCAGATGCCGAAAACTCG-3′; PML-4A-r: 5′-CGAGTTTTCGGCATCTGCGTCTTCCGCGGCGGCGATCACCACAACGCG-3′; ABL1-ΔD-f: 5′-GCAAATCCAAGAAGGGGAGCTGTTATCTGGAAG-3′; ABL1-ΔD-r: 5′-CTTCCAGATAACAGCTCCCCTTCTTGGATTTGC-3′; EGFR-G1029-f: 5′-CAGTGAATTTATTGGATGAGCGGCCGCTTACC-3′; EGFR-G1029-r: 5′-GGTAAGCGGCCGCTCATCCAATAAATTCACTG-3′; EGFR-I1028-f: 5′-CAAAGCAGTGAATTTATTTGAGCGGCCGCTTACCC-3′; EGFR-I1028-r: 5′-GGGTAAGCGGCCGCTCAAATAAATTCACTGCTTTG-3′; RAF1-A647A648-f: 5′-CCCCGAGGCTGCCTATGTTCTAGTTGACTTTGCACC-3′; RAF1-A647A648-r: 5′-GGTGCAAAGTCAACTAGAACATAGGCAGCCTCGGGG-3′; RAF1-P646A647-f: 5′-CCCCGAGGCTGCCTTTCTAGTTGACTTTGCACCTG-3′; RAF1-P646A647-r: 5′-CAGGTGCAAAGTCAACTAGAAAGGCAGCCTCGGGG-3′. The shRNA vectors for SPOP were purchased from Sigma (TRCN0000122224, TRCN0000139181, TRCN0000145024).
Antibodies
The anti-ABL1 (2862, 1:1000), anti-p27 (3686, 1:1000), anti-DEK (13962, 1:1000), anti-SRC3 (2126, 1:1000), anti-CUL3 (2759, 1:1000), anti-GST (2625, 1:2000), anti-β-TRCP (4394, 1:1000), anti-p-ERK (9101, 1:1000), and anti-ERK (4695, 1:1000) antibodies were obtained from Cell Signaling Technology. Anti-ERG (EPR3864, 1:1000) antibody was obtained from Abcam. Anti-SPOP (16750-1-AP, 1:1000) antibody was obtained from Proteintech. Anti-GFP (A-11122, 1:5000) antibody was obtained from Thermo Fisher. Anti-FBW7 (A301-720A, 1:1000) and anti-PML (A301-167A, 1:1000) were obtained from Bethyl Laboratories. Anti-CCDC6 (sc-100309, 1:1000) and anti-α Tubulin (sc-8035, 1:2000) antibodies were obtained from Santa Cruz Biotechnology. Mouse monoclonal anti-HA.11 epitope tag (clone 16B12, 901513, 1:1000) was obtained from BioLegend. Anti-Vinculin (V9131, 1:50000), rabbit polyclonal anti-HA (H6908, 1:3000), mouse monoclonal ANTI-FLAG® M2 (F3165, 1:5000), rabbit polyclonal ANTI-FLAG® (F7425, 1:3000), anti-mouse IgG (whole molecule)-peroxidase (A4416, 1:5000), and anti-rabbit IgG (whole molecule)-peroxidase (A4914, 1:5000) were obtained from Sigma-Aldrich. Mouse monoclonal ANTI-FLAG® M2 affinity agarose gel (A2220) and mouse monoclonal anti-HA-agarose (A2095) were obtained from Sigma-Aldrich.
Annotation of fusion consequence
We annotated the protein sequence consequence of fusions using the software tool AGFusion80. To provide consistent annotation, we chose the Matched Annotation from NCBI and EMBL-EBI (MANE select transcripts v0.9) from GENCODE when possible81, or otherwise the longest transcript that is consistent with the fusion junction. Transcript annotations were based on Ensembl release 95 using pyensembl (https://github.com/openvax/pyensembl). Of 25,664 fusions reported in TCGA, 24,239 fusions could be annotated. The PFAM database was used to annotate the impact of fusions on protein domains82. Code used to analyze fusion genes can be found on GitHub (https://github.com/ctokheim/fusion_pipeline).
Annotation of cancer driver genes
A consensus among multiple sources was used to annotate previously implicated cancer driver genes, which included OncoKB (https://www.oncokb.org/, downloaded 4/2020)83, The Cancer Genome Atlas (TCGA)84, and the Cancer Gene Census (CGC, downloaded 4/9/2020)85. For CGC, we excluded genes with only support for germline mutations. For OncoKB, we only used genes that were annotated by OncoKB, rather than including additional genes from other sources. To further distinguish oncogenes versus tumor suppressor genes, we annotated based on the evidence from at least one source and no conflicting interpretations. Given that TCGA has cancer type-specific assessments of oncogene and tumor suppressor genes, we chose based on the most frequent annotation across cancer types.
Enrichment for in-frame fusions
To analyze whether fusions containing driver genes are biased towards in-frame fusions, we analyzed the odds ratio of in-frame vs out-of-frame fusions. The in-frame status of fusions was determined by the annotation software AGFusion (https://github.com/murphycj/AGFusion)80. The log odds ratio was calculated separately for oncogene and tumor suppressor gene fusions, relative to putative passenger fusions that do not contain a gene previously implicated in cancer. In cases where a fusion is composed of both an oncogene and a tumor suppressor, the fusion gene was regarded as an oncogene. The standard error for the log odds ratio was calculated using a normal approximation86.
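The odds ratio calculation described above can be sketched as follows. This is a minimal illustration, assuming the standard large-sample (normal approximation) standard error for a log odds ratio from a 2×2 table; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def log_odds_ratio(in_frame_driver, out_frame_driver,
                   in_frame_passenger, out_frame_passenger):
    """Log odds ratio of in-frame vs out-of-frame fusions (driver vs passenger).

    Returns (log_or, standard_error, 95% confidence interval).
    The standard error uses the usual normal approximation:
    sqrt(1/a + 1/b + 1/c + 1/d).
    """
    a, b = in_frame_driver, out_frame_driver
    c, d = in_frame_passenger, out_frame_passenger
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (log_or - 1.96 * se, log_or + 1.96 * se)
    return log_or, se, ci

# Hypothetical counts, for illustration only.
log_or, se, ci = log_odds_ratio(60, 40, 300, 600)
print(f"log OR = {log_or:.3f}, SE = {se:.3f}")
```

A positive log odds ratio whose confidence interval excludes zero would indicate that fusions containing the driver gene class are biased towards being in-frame relative to passenger fusions.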
Protein domain analysis
To analyze whether putatively oncogenic fusions preferentially retain protein domains, we compared the odds that a fusion retained at least one protein domain for implicated driver genes (oncogenes or tumor suppressors) to those for passenger genes. We used domains from PFAM to annotate whether a fusion retained or lost protein domains. Protein domains needed to be at least 25 amino acids long. For cases where the fusion junction interrupted a protein domain, we considered a protein domain as retained in the fusion gene if greater than 50% of the domain sequence was included.
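The domain retention rule above can be illustrated with a short sketch. The function name, the argument names, and the use of 1-based inclusive protein coordinates are assumptions made for this example; only the 25-residue minimum and the greater-than-50% rule come from the text.

```python
def domain_retained(domain_start, domain_end, kept_start, kept_end,
                    min_length=25, min_fraction=0.5):
    """Decide whether a protein domain counts as retained in a fusion.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    `kept_start`/`kept_end` delimit the part of the parent protein that
    is present in the fusion product.
    """
    length = domain_end - domain_start + 1
    if length < min_length:
        return False  # domains shorter than 25 residues are not considered
    overlap = min(domain_end, kept_end) - max(domain_start, kept_start) + 1
    overlap = max(overlap, 0)
    # Retained only if strictly more than half of the domain is included.
    return overlap / length > min_fraction

# A 100-residue domain (100-199) with the junction after residue 150.
print(domain_retained(100, 199, 1, 150))
```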
Motif search for internal degrons
We first curated known degron motifs from the Eukaryotic Linear Motif (ELM) database and other literature25 (Supplementary Data 7). Each motif is represented as a regular expression that describes the allowable amino acid residues at each position. Motifs were then searched against the protein translation of GENCODE transcripts using the Python “re” package. When multiple transcripts were available for a gene, the MANE select transcript (v0.9) was used. Some degron motifs require not only a particular protein sequence, but also that certain residues have appropriate post-translational modifications (PTMs). Towards this end, we collected all available PTMs in the PhosphoSitePlus database54 and filtered motif sequence matches for any requisite PTMs (phosphorylation or acetylation). For the non-standard BTRC degron, we used the regular expression “SSSxxS”. The motif search revealed 32,804 hits across 8623 genes involved in TCGA fusions.
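A motif search of this kind can be sketched with the Python “re” package as follows. This is an illustration rather than the pipeline's actual code: the function name, the toy sequence, and the PTM positions are hypothetical, the “x” in “SSSxxS” is assumed to mean any residue (written as “.” in Python regular expression syntax), and reporting overlapping matches via a lookahead is a design choice of this sketch.

```python
import re

def find_motif_hits(protein_seq, pattern,
                    required_ptm_offsets=(), ptm_sites=frozenset()):
    """Find motif matches and keep those carrying the requisite PTMs.

    Returns a list of (start, end, matched_sequence) using 1-based
    inclusive protein coordinates. `required_ptm_offsets` are 0-based
    offsets within the match that must be modified; `ptm_sites` is the
    set of 1-based modified positions (e.g. from a PTM database).
    """
    hits = []
    # A zero-width lookahead lets finditer report overlapping matches.
    for m in re.finditer(f"(?=({pattern}))", protein_seq):
        start = m.start(1) + 1
        matched = m.group(1)
        end = start + len(matched) - 1
        if all(start + off in ptm_sites for off in required_ptm_offsets):
            hits.append((start, end, matched))
    return hits

# Toy sequence and hypothetical phosphosite, for illustration only.
seq = "MASSSAPSKLSSSGGSQ"
print(find_motif_hits(seq, "SSS..S"))
print(find_motif_hits(seq, "SSS..S", required_ptm_offsets=(0,), ptm_sites={11}))
```

The second call keeps only the match whose first serine is annotated as modified, mirroring the idea of filtering sequence matches by requisite PTMs.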
Machine learning prioritization of internal degron motifs
Because motif instances may happen by chance in the proteome, we wanted to further prioritize motifs that are biologically plausible degrons. Previously, we developed a model to predict the potential of a motif to be a degron using a Random Forest algorithm. The model was trained on 83 features from the SNVBox database87,88 to distinguish previously reported degrons (n = 186)26 from random other sequences within the same set of proteins (n = 186). Features ranged from evolutionary conservation to biophysical properties of amino acid residues within a protein. To summarize features across the multiple amino acid residues in a motif, we took the average of each feature. Evaluated using 20-fold cross-validation, performance as measured by the area under the receiver operating characteristic curve (auROC) was 0.8 out of 1.0 (p = 2 × 10^−25, Mann–Whitney U test).
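Two pieces of this procedure can be made concrete without reference to the trained model itself: averaging per-residue features over a motif, and the relationship between the auROC and the Mann–Whitney U statistic (the auROC equals U divided by the product of the two group sizes). The sketch below uses made-up numbers; the function names are hypothetical and the Random Forest model is not reproduced here.

```python
def average_features(per_residue_features):
    """Average each feature across the residues of one motif.

    `per_residue_features` is a list of equal-length feature vectors,
    one per amino acid residue in the motif.
    """
    n = len(per_residue_features)
    return [sum(column) / n for column in zip(*per_residue_features)]

def auroc(positive_scores, negative_scores):
    """auROC computed as the Mann-Whitney U statistic / (n_pos * n_neg).

    Counts the fraction of (positive, negative) pairs in which the
    positive example scores higher; ties count as one half.
    """
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Made-up feature vectors and prediction scores, for illustration only.
print(average_features([[1.0, 0.0], [3.0, 1.0]]))
print(auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4]))
```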
Internal degron motif filtering
Because degron motifs are generally short, motif matches can happen by chance across the proteome. We therefore filtered out motifs that had low potential to actually be a degron according to a Random Forest algorithm (score ≤ 0.6 out of 1.0); see the section entitled “Machine learning prioritization of internal degron motifs”. This resulted in keeping 2485 high-likelihood degron motifs for downstream analysis.
C-terminal degron motif
In contrast to internal degron motifs from the ELM database, C-terminal degron motifs were defined based on de novo inference from the Global Protein Stability (GPS) assay31 as previously described30. Briefly, the C-terminal sequence of every protein in the proteome is ranked by a degron potential score by the deepDegron method. A binomial model is then used to test for motifs that are statistically enriched in high-scoring sequences (q < 0.05). This revealed 236 C-terminal degron motifs. Note that C-terminal degron motifs may partially overlap, so multiple motif matches in a protein sequence are regarded as the same as a single motif match. All C-terminal degron motifs can also be found in Supplementary Data 8. Documentation for deepDegron is available on Read the Docs (https://deepdegron.readthedocs.io/) and source code is available on GitHub (https://github.com/ctokheim/deepDegron).
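As a rough illustration of a binomial enrichment test of this kind, the sketch below computes a one-sided binomial tail probability. It is not the deepDegron implementation: the function name, the definition of the background fraction, and the example counts are assumptions made for this example.

```python
from math import comb

def binomial_enrichment_p(n_with_motif, n_high_with_motif,
                          background_high_fraction):
    """One-sided binomial p value for motif enrichment.

    Under the null, each of the `n_with_motif` sequences containing the
    motif falls in the high-scoring set with probability
    `background_high_fraction`. Returns P(X >= n_high_with_motif).
    """
    n, k, p0 = n_with_motif, n_high_with_motif, background_high_fraction
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical example: 30 sequences end with a motif, 12 of them are
# high scoring, and 10% of all sequences are high scoring.
print(binomial_enrichment_p(30, 12, 0.10))
```

The resulting p values across motifs would then be corrected for multiple testing before applying the q < 0.05 threshold.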
Statistical test for degron loss in fusion genes
A permutation-based approach was used to determine whether fusions preferentially lead to degron loss. Since a gene may have different fusion partners that all lead to degron loss (e.g. ETV fusions, Fig. 1), we chose to measure enrichment separately for 5′ and 3′ genes. For internal degrons, each gene involved in a fusion received a degron loss score, representing the sum of scores for degrons lost in the fusion. The degron loss score represents both the confidence that the degron exists and the frequency with which it is lost in fusion events. Specifically, each predicted degron in a protein sequence receives a score from a Random Forest machine learning model that reflects the confidence in the prediction. The score of that degron is then summed each time a fusion event leads to its loss. Likewise, for C-terminal degron analysis, each 5′ gene involved in a fusion received a delta degron potential score, representing the difference in degron potential scores between the 5′ gene and 3′ gene of a fusion. Only in-frame fusions were analyzed, as the loss of a degron in an out-of-frame fusion would not lead to increased activity of the fusion product. Additionally, as the previously reported validation rate of fusion calls is 63%11, we only analyzed genes involved in at least two fusions to mitigate the impact of spurious calls. The sum of degron loss scores for a gene across multiple fusions was then calculated as the test statistic. The observed scores were then compared to 10,000 permutations, where degron loss scores per fusion were randomly shuffled and the gene-based test statistic was recalculated. The p value for a gene’s observed test statistic is calculated as the fraction of permutations that have an equal or greater test statistic. Genes were regarded as statistically significant based on the false discovery rate (q < 0.1) using the Benjamini–Hochberg method32.
Given that oncogene fusions display a significant bias towards in-frame mutations, we only considered genes as degron loss candidates if they had at least 50% of fusions as in-frame. To further prioritize oncogene fusions that have degron loss in the oncogene itself rather than the partner gene, we also included analyses of only genes with a retained protein domain and a restricted hypothesis test analyzing only previously implicated oncogenes.
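The permutation procedure and false discovery rate correction described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the fusion_pipeline repository: each fusion is reduced to a (gene, degron loss score) pair, scores are shuffled across all fusions, and the function names, the small floating-point tolerance, and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def gene_sums(genes, scores):
    """Sum the per-fusion degron loss scores for each gene."""
    totals = defaultdict(float)
    for gene, score in zip(genes, scores):
        totals[gene] += score
    return dict(totals)

def permutation_p_values(genes, scores, n_perm=10000, seed=0):
    """p value per gene: fraction of permutations whose gene-level sum
    is equal to or greater than the observed sum."""
    rng = random.Random(seed)
    observed = gene_sums(genes, scores)
    exceed = dict.fromkeys(observed, 0)
    shuffled = list(scores)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffle per-fusion scores across fusions
        permuted = gene_sums(genes, shuffled)
        for gene, obs in observed.items():
            if permuted[gene] >= obs - 1e-12:  # tolerance for float sums
                exceed[gene] += 1
    return {gene: exceed[gene] / n_perm for gene in observed}

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Toy data: gene A loses a degron in both of its fusions.
genes = ["A", "A", "B", "B", "C", "C"]
scores = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
p = permutation_p_values(genes, scores, n_perm=2000)
names = sorted(p)
q = benjamini_hochberg([p[g] for g in names])
print(dict(zip(names, q)))
```

With real data, the filters described in the text (in-frame fusions only, genes with at least two fusions, and at least 50% of fusions in-frame) would be applied before computing the test statistic.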
Statistical test for cancer type-specificity of fusion genes
Similar to the statistical test for degron loss, we also used a permutation test to evaluate whether genes were involved in fusions preferentially found in particular cancer types. To quantify cancer type-specificity, we used entropy,
$$H_g = -\sum_{c \in C_g} p_{g,c} \log p_{g,c} \qquad (1)$$
where $H_g$ is the entropy for gene g, c reflects a particular cancer type, $C_g$ reflects all cancer types with fusions containing gene g, and $p_{g,c}$ reflects the fraction of fusions for gene g found in cancer type c. Lower entropy values represent higher cancer type-specificity. We randomly shuffled the labels for cancer types of the fusions 10,000 times, and recomputed $H_g$. The corresponding p value was calculated as the fraction of permutations i that had an entropy equal to or lower than the observed entropy. Genes were regarded as statistically significant based on the false discovery rate (q < 0.1) using the Benjamini–Hochberg method32.
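The entropy-based permutation test can be sketched as follows. This is a minimal illustration, assuming the natural logarithm (the base only rescales the entropy and does not change the permutation p value) and assuming that cancer type labels are shuffled across all fusions; the function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(cancer_types):
    """Shannon entropy of the cancer type distribution of a gene's fusions."""
    counts = Counter(cancer_types)
    total = len(cancer_types)
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def entropy_p_value(genes, cancer_types, gene, n_perm=10000, seed=0):
    """Fraction of label permutations with entropy <= the observed entropy."""
    rng = random.Random(seed)
    idx = [i for i, g in enumerate(genes) if g == gene]
    observed = entropy([cancer_types[i] for i in idx])
    labels = list(cancer_types)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffle cancer type labels across fusions
        if entropy([labels[i] for i in idx]) <= observed + 1e-12:
            hits += 1
    return hits / n_perm

# Toy data: all four fusions of gene G occur in a single cancer type.
genes = ["G"] * 4 + ["H"] * 8
types = ["PRAD"] * 4 + ["LUAD"] * 4 + ["BRCA"] * 4
print(entropy_p_value(genes, types, "G", n_perm=2000))
```

A gene whose fusions are confined to one cancer type has zero entropy, so few shuffled label assignments reach an equally low value and the resulting p value is small.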
Lollipop diagrams
Lollipop diagrams displaying fusion genes were generated using ProteinPaint (https://pecan.stjude.cloud/proteinpaint)89. Fusion junctions were submitted according to their genomic coordinates. Protein domains are shown as colored boxes along the protein sequence.
Immunoblots and immunoprecipitation (IP)
Cells were lysed in EBC buffer (50 mM Tris pH 7.5, 120 mM NaCl, 0.5% NP-40) supplemented with protease inhibitors (Thermo Fisher) and phosphatase inhibitors (phosphatase inhibitor cocktail set I and II, Calbiochem). The lysates were then resolved by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and immunoblotted with the indicated antibodies. For IP, 0.5–1 mg of lysate was incubated with the appropriate beads for 4 h at 4 °C. Immuno-complexes were washed four times with NETN buffer (20 mM Tris, pH 8.0, 100 mM NaCl, 1 mM EDTA, 0.5% NP-40) before being resolved by SDS-PAGE and immunoblotted for the indicated proteins. Primary antibodies were diluted in 5% BSA in TBST and secondary antibodies were diluted in 5% non-fat milk for immunoblotting analysis. The Quantity One software was used for the quantification of protein band intensity, and graphic and statistical analyses were generated using GraphPad 8.
In vitro kinase assays
PML in vitro kinase assays were performed as previously reported90. Briefly, GST-PML-WT, GST-PML-4A, and GST-PML-S518A were expressed in BL21 E. coli and purified using Glutathione Sepharose 4B according to the manufacturer’s instructions (Thermo). One microgram of GST-PML-WT, GST-PML-4A, or GST-PML-S518A protein was incubated with 32P-ATP in the absence or presence of CKII kinase in kinase assay buffer (10 mM HEPES, pH 8.0, 10 mM MgCl2, 1 mM dithiothreitol, 0.1 mM ATP). The reaction was initiated by the addition of 10× kinase assay buffer in a volume of 30 μL and proceeded for 45 min at 30 °C, followed by the addition of SDS-PAGE sample buffer to stop the reaction before being resolved by SDS-PAGE.
In vivo ubiquitination assays
Denatured in vivo ubiquitination assays were performed as previously described17. Briefly, HEK293T cells were transfected with the indicated constructs. Forty-eight hours after transfection, 30 μM MG132 was added to block proteasome degradation for 6 h and then cells were harvested in denaturing buffer (6 M guanidine-HCl, pH 8.0, 0.1 M Na2HPO4/NaH2PO4, 10 mM imidazole). After sonication, the ubiquitinated proteins were purified by incubation with Ni-NTA matrices for 3 h at room temperature. The pull-down products were washed sequentially twice in buffer A, twice in buffer A/TI mixture (buffer A: buffer TI = 1:3, v/v) and once in buffer TI (25 mM Tris-HCl, pH 6.8, 20 mM imidazole). The poly-ubiquitinated proteins were separated by SDS-PAGE for immunoblot analyses.
Protein half-life cycloheximide (CHX) chasing assays
To measure the half-life of ABL1 protein, a CHX-based assay was performed following our previously described experimental procedures90. Briefly, cells were treated with 200 μg/ml CHX for the indicated times before being harvested for immunoblot analysis of the indicated proteins.
Colony formation assays
Stable cell lines were seeded into six-well plates in medium (1,000 cells/well) and cultured for 2–3 weeks until colonies were visible. Then, the colonies were washed once with PBS, fixed with fixation buffer (10% acetic acid, 10% methanol) for 20 min, and then stained with staining solution (0.4% crystal violet, 20% ethanol) for 10 min. After staining, the plates were washed with distilled water and air-dried, and then colonies were counted for statistical analysis.
Mouse xenograft assays
Five- to six-week-old male nude mice were purchased from Taconic (#NCRNU) for xenograft studies. A total of 1 × 10^6 cells were resuspended in 100 µl PBS solution and injected subcutaneously into the mice (n = 9 or 10 mice for each group) as described previously79. At the end of the experiment, mice were sacrificed and tumors were dissected for imaging and weighing. All mouse experiments were approved by the Institutional Animal Care and Use Committee (IACUC, RN150D) at Beth Israel Deaconess Medical Center (BIDMC). The Institute is committed to the highest ethical standards of care for animals used for the purpose of continued progress in the field of human cancer research. All mice were housed in a pathogen-free environment at the BIDMC animal facility and were handled in strict accordance with the “Guide for the Care and Use of Laboratory Animals” and the applicable institutional regulations.
Association of CCDC6-RET with leukocyte fraction
To analyze whether CCDC6-RET fusions were associated with leukocyte infiltration, we utilized a previous estimate of immune infiltration for TCGA tumors91. A likelihood ratio test was performed after adjusting for tumor purity from ABSOLUTE (downloaded from https://gdc.cancer.gov/about-data/publications/pancanatlas)92, tumor mutation burden, and cancer type.
Association of fusion events with protein abundance (RPPA)
To analyze whether fusion events were associated with an altered proteome, we correlated the mutation status of fusion genes with protein abundance from reverse phase protein array (RPPA) in TCGA93. A Wald test was performed after adjustment for cancer type. Only fusions present in at least three tumors were considered.
Association of fusion events with transcription factor activity
We hypothesized that fusion events may be associated with altered activity of transcription factors. To quantify activity, we leveraged thousands of transcription factor ChIP-seq profiles in Cistrome DB to identify target genes33. Computational analysis was then carried out as previously performed30. Briefly, we first analyzed fusion events for differentially expressed genes, after adjusting for tumor purity and tumor subtype. RABIT34 was then used to infer the transcription factor regulators that explain the differentially expressed genes by using the transcription factor target genes defined by Cistrome DB. Associations were regarded as significant at a family-wise error rate of 0.01. The analysis only considered fusions with at least three events in a cancer type and transcription factors not deemed to be outliers (see below).
Defining outlier transcription factors
ChIP-seq data defining the target genes of transcription factors can be of inconsistent quality. We reasoned that ChIP-seq datasets that consistently arise as explaining differentially expressed genes for nearly all fusion events may reflect data artifacts. We therefore performed outlier analysis using robust covariance estimation (scikit-learn Python package)94, assuming a Gaussian distribution and a contamination rate of 0.01 (Supplementary Fig. 3f).
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
We thank the Liu and Wei lab members for suggestions and comments on this work. This work was supported by R35CA253027 (to W.W.) and Breast Cancer Research Foundation BCRF-20-100 (to X.S.L.). C.T. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (DRQ-04-20).
Author contributions
The idea was conceived by W.W., X.S.L. and P.P.P.; C.T. designed and performed the bioinformatics analysis; J.L., W.G. and B.J.N. designed and performed most of the experiments with assistance from Y.L.; J.L. and C.T. wrote the manuscript. W.W., X.S.L. and P.P.P. supervised the study and edited the manuscript. All authors commented on the manuscript.
Data availability
Data are available in the article, Supplementary Information, or Supplementary Data 1–8. The full list of recurrent genetic fusions, the full list of genes and oncogenes with internal and C-terminal degron loss, the full list of protein abundance of fused genes, and the full list of downstream transcription factors due to genetic fusions are included in the Supplementary Data. The original gene fusion calls were obtained from Supplementary Data 1 of Gao et al.11. The subsequently annotated and processed gene fusion data for downstream statistical analysis are available on GitHub (https://github.com/ctokheim/fusion_pipeline). The output from the analysis can be found in the Supplementary Data. All data used in the analyses described in this study are freely available within public databases, including TCGA (https://www.cancer.gov/tcga), OncoKB (https://www.oncokb.org/), CGC (https://cancer.sanger.ac.uk/census), UniProt (https://www.uniprot.org/), PFAM (http://pfam.xfam.org/), ELM (http://elm.eu.org/), and PhosphoSitePlus (https://www.phosphosite.org/). Source data are provided with this paper.
Code availability
Custom code for this manuscript is available on GitHub (https://github.com/ctokheim/fusion_pipeline) and is archived on Zenodo95. The README file in the GitHub repository describes how to reproduce the analysis. The code uses Python 3 and exact version numbers of dependencies are listed in the environment configuration file. The deepDegron code to analyze C-terminal degrons is also freely available on GitHub (https://github.com/ctokheim/deepDegron).
Competing interests
W.W. and P.P.P. are co-founders and stockholders of Rekindle Therapeutics. X.S.L. is a cofounder, board member, SAB member, and consultant of GV20 Oncotherapy and its subsidiaries; a stockholder of BMY, TMO, WBA, ABT, ABBV, and JNJ; and has received research funding from Takeda, Sanofi, and Novartis. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Jing Liu, Collin Tokheim, Jonathan D. Lee.
Contributor Information
X. Shirley Liu, Email: xsliu@ds.dfci.harvard.edu.
Pier Paolo Pandolfi, Email: PierPaolo.PandolfiDeRinaldis@renown.org.
Wenyi Wei, Email: wwei2@bidmc.harvard.edu.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-021-26871-y.
References
- 1.Vogelstein B, Kinzler KW. The multistep nature of cancer. Trends Genet. 1993;9:138–141. doi: 10.1016/0168-9525(93)90209-z. [DOI] [PubMed] [Google Scholar]
- 2.Knudson AG., Jr. Mutation and cancer: statistical study of retinoblastoma. Proc. Natl Acad. Sci. USA. 1971;68:820–823. doi: 10.1073/pnas.68.4.820. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Nowell P, Hungerford D. A minute chromosome in chronic granulocytic leukemia. Science. 1960;132:1488–1501. [Google Scholar]
- 4.Lugo TG, Pendergast A-M, Muller AJ, Witte ON. Tyrosine kinase activity and transformation potency of bcr-abl oncogene products. Science. 1990;247:1079–1082. doi: 10.1126/science.2408149. [DOI] [PubMed] [Google Scholar]
- 5.Ren R. Mechanisms of BCR–ABL in the pathogenesis of chronic myelogenous leukaemia. Nat. Rev. Cancer. 2005;5:172–183. doi: 10.1038/nrc1567. [DOI] [PubMed] [Google Scholar]
- 6.Druker BJ, et al. Efficacy and safety of a specific inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia. N. Engl. J. Med. 2001;344:1031–1037. doi: 10.1056/NEJM200104053441401. [DOI] [PubMed] [Google Scholar]
- 7.Kumar-Sinha C, Tomlins SA, Chinnaiyan AM. Recurrent gene fusions in prostate cancer. Nat. Rev. Cancer. 2008;8:497–511. doi: 10.1038/nrc2402. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Lipson D, et al. Identification of new ALK and RET gene fusions from colorectal and lung cancer biopsies. Nat. Med. 2012;18:382–384. doi: 10.1038/nm.2673. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Takeuchi K, et al. RET, ROS1 and ALK fusions in lung cancer. Nat. Med. 2012;18:378–381. doi: 10.1038/nm.2658. [DOI] [PubMed] [Google Scholar]
- 10.Wu YM, et al. Identification of targetable FGFR gene fusions in diverse cancers. Cancer Discov. 2013;3:636–647. doi: 10.1158/2159-8290.CD-13-0050. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Gao Q, et al. Driver fusions and their implications in the development and treatment of human cancers. Cell Rep. 2018;23:227–238 e223. doi: 10.1016/j.celrep.2018.03.050. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Tuna M, Amos CI, Mills GB. Molecular mechanisms and pathobiology of oncogenic fusion transcripts in epithelial tumors. Oncotarget. 2019;10:2095–2111. doi: 10.18632/oncotarget.26777. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Bastus NC, et al. Androgen-induced TMPRSS2:ERG fusion in nonmalignant prostate epithelial cells. Cancer Res. 2010;70:9544–9548. doi: 10.1158/0008-5472.CAN-10-1638. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Soda M, et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature. 2007;448:561–566. doi: 10.1038/nature05945. [DOI] [PubMed] [Google Scholar]
- 15.Palanisamy N, et al. Rearrangements of the RAF kinase pathway in prostate cancer, gastric cancer and melanoma. Nat. Med. 2010;16:793–798. doi: 10.1038/nm.2166. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Vitari AC, et al. COP1 is a tumour suppressor that causes degradation of ETS transcription factors. Nature. 2011;474:403–406. doi: 10.1038/nature10005. [DOI] [PubMed] [Google Scholar]
- 17.Gan W, et al. SPOP promotes ubiquitination and degradation of the ERG oncoprotein to suppress prostate cancer progression. Mol. Cell. 2015;59:917–930. doi: 10.1016/j.molcel.2015.07.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.An J, et al. Truncated ERG oncoproteins from TMPRSS2-ERG fusions are resistant to SPOP- Mol. Cell. 2015;59:904–916. doi: 10.1016/j.molcel.2015.07.025. [DOI] [PubMed] [Google Scholar]
- 19.Ciechanover A. The ubiquitin-proteasome proteolytic pathway. Cell. 1994;79:13–21. doi: 10.1016/0092-8674(94)90396-4. [DOI] [PubMed] [Google Scholar]
- 20.Pohl C, Dikic I. Cellular quality control by the ubiquitin-proteasome system and autophagy. Science. 2019;366:818–822. doi: 10.1126/science.aax3769. [DOI] [PubMed] [Google Scholar]
- 21.Komander D, Rape M. The ubiquitin code. Annu. Rev. Biochem. 2012;81:203–229. doi: 10.1146/annurev-biochem-060310-170328. [DOI] [PubMed] [Google Scholar]
- 22.Pickart CM. Mechanisms underlying ubiquitination. Annu. Rev. Biochem. 2001;70:503–533. doi: 10.1146/annurev.biochem.70.1.503. [DOI] [PubMed] [Google Scholar]
- 23.Zhou W, Wei W, Sun Y. Genetically engineered mouse models for functional studies of SKP1-CUL1-F-box-protein (SCF) E3 ubiquitin ligases. Cell Res. 2013;23:599–619. doi: 10.1038/cr.2013.44. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Bernassola F, Karin M, Ciechanover A, Melino G. The HECT family of E3 ubiquitin ligases: multiple players in cancer development. Cancer Cell. 2008;14:10–21. doi: 10.1016/j.ccr.2008.06.001. [DOI] [PubMed] [Google Scholar]
- 25.Kumar M, et al. ELM-the eukaryotic linear motif resource in 2020. Nucleic Acids Res. 2020;48:D296–D306. doi: 10.1093/nar/gkz1030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Mészáros B, Kumar M, Gibson TJ, Uyar B, Dosztányi Z. Degrons in cancer. Sci. Signal. 2017;10:eaak9982. doi: 10.1126/scisignal.aak9982. [DOI] [PubMed] [Google Scholar]
- 27.Varshavsky A. N-degron and C-degron pathways of protein degradation. Proc. Natl Acad. Sci. USA. 2019;116:358–366. doi: 10.1073/pnas.1816596116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Park SE, et al. Control of mammalian G protein signaling by N-terminal acetylation and the N-end rule pathway. Science. 2015;347:1249–1252. doi: 10.1126/science.aaa3844. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Chen, S. J., Wu, X., Wadas, B., Oh, J. H. & Varshavsky, A. An N-end rule pathway that recognizes proline and destroys gluconeogenic enzymes. Science355, eaal3655 (2017). [DOI] [PMC free article] [PubMed]
- 30.Tokheim C, et al. Systematic characterization of mutations altering protein degradation in human cancers. Mol. Cell. 2021;81:1292–1308. doi: 10.1016/j.molcel.2021.01.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Koren I, et al. The eukaryotic proteome is shaped by E3 ubiquitin ligases targeting C-terminal degrons. Cell. 2018;173:1622–1635 e1614. doi: 10.1016/j.cell.2018.04.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 1995;57:289–300. [Google Scholar]
- 33. Zheng R, et al. Cistrome Data Browser: expanded datasets and new tools for gene regulatory analysis. Nucleic Acids Res. 2019;47:D729–D735. doi: 10.1093/nar/gky1094.
- 34. Jiang P, Freedman ML, Liu JS, Liu XS. Inference of transcriptional regulation in cancers. Proc. Natl Acad. Sci. USA. 2015;112:7731–7736. doi: 10.1073/pnas.1424272112.
- 35. Yu J, et al. An integrated network of androgen receptor, polycomb, and TMPRSS2-ERG gene fusions in prostate cancer progression. Cancer Cell. 2010;17:443–454. doi: 10.1016/j.ccr.2010.03.018.
- 36. Inamura K, et al. EML4-ALK lung cancers are characterized by rare other mutations, a TTF-1 cell lineage, an acinar histology, and young onset. Mod. Pathol. 2009;22:508–515. doi: 10.1038/modpathol.2009.2.
- 37. Godavarthy PS, et al. The vascular bone marrow niche influences outcome in chronic myeloid leukemia via the E-selectin–SCL/TAL1–CD44 axis. Haematologica. 2020;105:136–147. doi: 10.3324/haematol.2018.212365.
- 38. Castro F, Cardoso AP, Goncalves RM, Serre K, Oliveira MJ. Interferon-gamma at the crossroads of tumor immune surveillance or evasion. Front. Immunol. 2018;9:847. doi: 10.3389/fimmu.2018.00847.
- 39. Peggs K, Mackinnon S. Imatinib mesylate—the new gold standard for treatment of chronic myeloid leukemia. N. Engl. J. Med. 2003;348:1048–1050. doi: 10.1056/NEJMe030009.
- 40. Schiffer CA. BCR-ABL tyrosine kinase inhibitors for chronic myelogenous leukemia. N. Engl. J. Med. 2007;357:258–265. doi: 10.1056/NEJMct071828.
- 41. Theurillat JP, et al. Prostate cancer. Ubiquitylome analysis identifies dysregulation of effector substrates in SPOP-mutant prostate cancer. Science. 2014;346:85–89. doi: 10.1126/science.1250255.
- 42. Hantschel O, et al. A myristoyl/phosphotyrosine switch regulates c-Abl. Cell. 2003;112:845–857. doi: 10.1016/s0092-8674(03)00191-0.
- 43. Li C, et al. Tumor-suppressor role for the SPOP ubiquitin ligase in signal-dependent proteolysis of the oncogenic co-activator SRC-3/AIB1. Oncogene. 2011;30:4350–4364. doi: 10.1038/onc.2011.151.
- 44. Barbieri CE, et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat. Genet. 2012;44:685–689. doi: 10.1038/ng.2279.
- 45. Grieco M, et al. PTC is a novel rearranged form of the ret proto-oncogene and is frequently detected in vivo in human thyroid papillary carcinomas. Cell. 1990;60:557–563. doi: 10.1016/0092-8674(90)90659-3.
- 46. Cerrato A, Visconti R, Celetti A. The rationale for druggability of CCDC6-tyrosine kinase fusions in lung cancer. Mol. Cancer. 2018;17:46. doi: 10.1186/s12943-018-0799-8.
- 47. Seo JS, et al. The transcriptional landscape and mutational profile of lung adenocarcinoma. Genome Res. 2012;22:2109–2119. doi: 10.1101/gr.145144.112.
- 48. Drechsler M, Hildebrandt B, Kundgen A, Germing U, Royer-Pokora B. Fusion of H4/D10S170 to PDGFRbeta in a patient with chronic myelomonocytic leukemia and long-term responsiveness to imatinib. Ann. Hematol. 2007;86:353–354. doi: 10.1007/s00277-006-0247-5.
- 49. Thompson BJ, et al. The SCFFBW7 ubiquitin ligase complex as a tumor suppressor in T cell leukemia. J. Exp. Med. 2007;204:1825–1835. doi: 10.1084/jem.20070872.
- 50. Yada M, et al. Phosphorylation-dependent degradation of c-Myc is mediated by the F-box protein Fbw7. EMBO J. 2004;23:2116–2125. doi: 10.1038/sj.emboj.7600217.
- 51. Wei W, Jin J, Schlisio S, Harper JW, Kaelin WG Jr. The v-Jun point mutation allows c-Jun to escape GSK3-dependent recognition and destruction by the Fbw7 ubiquitin ligase. Cancer Cell. 2005;8:25–33. doi: 10.1016/j.ccr.2005.06.005.
- 52. Koepp DM, et al. Phosphorylation-dependent ubiquitination of cyclin E by the SCFFbw7 ubiquitin ligase. Science. 2001;294:173–177. doi: 10.1126/science.1065203.
- 53. Davis RJ, Welcker M, Clurman BE. Tumor suppression by the Fbw7 ubiquitin ligase: mechanisms and opportunities. Cancer Cell. 2014;26:455–464. doi: 10.1016/j.ccell.2014.09.013.
- 54. Hornbeck PV, et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 2015;43:D512–520. doi: 10.1093/nar/gku1267.
- 55. Tong J, Tan S, Zou F, Yu J, Zhang L. FBW7 mutations mediate resistance of colorectal cancer to targeted therapies by blocking Mcl-1 degradation. Oncogene. 2017;36:787–796. doi: 10.1038/onc.2016.247.
- 56. Zhao J, Tang J, Men W, Ren K. FBXW7-mediated degradation of CCDC6 is impaired by ATM during DNA damage response in lung cancer cells. FEBS Lett. 2012;586:4257–4263. doi: 10.1016/j.febslet.2012.10.029.
- 57. Schumacher TN, Schreiber RD. Neoantigens in cancer immunotherapy. Science. 2015;348:69–74. doi: 10.1126/science.aaa4971.
- 58. Zhong J, Ogura K, Wang Z, Inuzuka H. Degradation of the transcription factor Twist, an oncoprotein that promotes cancer metastasis. Discov. Med. 2013;15:7–15.
- 59. Shirogane T, Jin J, Ang XL, Harper JW. SCFbeta-TRCP controls clock-dependent transcription via casein kinase 1-dependent degradation of the mammalian period-1 (Per1) protein. J. Biol. Chem. 2005;280:26863–26872. doi: 10.1074/jbc.M502862200.
- 60. Ma Y, et al. SCFbeta-TrCP ubiquitinates CHK1 in an AMPK-dependent manner in response to glucose deprivation. Mol. Oncol. 2019;13:307–321. doi: 10.1002/1878-0261.12403.
- 61. Shen ZX, et al. All-trans retinoic acid/As2O3 combination yields a high quality remission and survival in newly diagnosed acute promyelocytic leukemia. Proc. Natl Acad. Sci. USA. 2004;101:5328–5335. doi: 10.1073/pnas.0400053101.
- 62. Zhu J, et al. Retinoic acid induces proteasome-dependent degradation of retinoic acid receptor alpha (RARalpha) and oncogenic RARalpha fusion proteins. Proc. Natl Acad. Sci. USA. 1999;96:14807–14812. doi: 10.1073/pnas.96.26.14807.
- 63. Zhu J, et al. Arsenic-induced PML targeting onto nuclear bodies: implications for the treatment of acute promyelocytic leukemia. Proc. Natl Acad. Sci. USA. 1997;94:3978–3983. doi: 10.1073/pnas.94.8.3978.
- 64. Yoshida H, et al. Accelerated degradation of PML-retinoic acid receptor alpha (PML-RARA) oncoprotein by all-trans-retinoic acid in acute promyelocytic leukemia: possible role of the proteasome pathway. Cancer Res. 1996;56:2945–2948.
- 65. Raelson JV, et al. The PML/RAR alpha oncoprotein is a direct molecular target of retinoic acid in acute promyelocytic leukemia cells. Blood. 1996;88:2826–2832.
- 66. Kakizuka A, et al. Chromosomal translocation t(15;17) in human acute promyelocytic leukemia fuses RAR alpha with a novel putative transcription factor, PML. Cell. 1991;66:663–674. doi: 10.1016/0092-8674(91)90112-c.
- 67. Stehmeier P, Muller S. Phospho-regulated SUMO interaction modules connect the SUMO system to CK2 signaling. Mol. Cell. 2009;33:400–409. doi: 10.1016/j.molcel.2009.01.013.
- 68. Percherancier Y, et al. Role of SUMO in RNF4-mediated promyelocytic leukemia protein (PML) degradation: sumoylation of PML and phospho-switch control of its SUMO binding domain dissected in living cells. J. Biol. Chem. 2009;284:16595–16608. doi: 10.1074/jbc.M109.006387.
- 69. Scaglioni PP, et al. A CK2-dependent mechanism for degradation of the PML tumor suppressor. Cell. 2006;126:269–283. doi: 10.1016/j.cell.2006.05.041.
- 70. Yoshihara K, et al. The landscape and therapeutic relevance of cancer-associated transcript fusions. Oncogene. 2015;34:4845–4854. doi: 10.1038/onc.2014.406.
- 71. Lee DF, et al. KEAP1 E3 ligase-mediated downregulation of NF-kappaB signaling by targeting IKKbeta. Mol. Cell. 2009;36:131–140. doi: 10.1016/j.molcel.2009.07.025.
- 72. Schapira M, Calabrese MF, Bullock AN, Crews CM. Targeted protein degradation: expanding the toolbox. Nat. Rev. Drug Discov. 2019;18:949–963. doi: 10.1038/s41573-019-0047-y.
- 73. Zhang C, et al. Proteolysis targeting chimeras (PROTACs) of anaplastic lymphoma kinase (ALK). Eur. J. Med. Chem. 2018;151:304–314. doi: 10.1016/j.ejmech.2018.03.071.
- 74. Kong X, et al. Drug discovery targeting anaplastic lymphoma kinase (ALK). J. Med. Chem. 2019;62:10927–10954. doi: 10.1021/acs.jmedchem.9b00446.
- 75. Tong B, et al. A nimbolide-based kinase degrader preferentially degrades oncogenic BCR-ABL. ACS Chem. Biol. 2020;15:1788–1794. doi: 10.1021/acschembio.0c00348.
- 76. Yang Y, et al. Global PROTAC Toolbox for degrading BCR-ABL overcomes drug-resistant mutants and adverse effects. J. Med. Chem. 2020;63:8567–8583. doi: 10.1021/acs.jmedchem.0c00967.
- 77. Burslem GM, et al. Targeting BCR-ABL1 in chronic myeloid leukemia by PROTAC-mediated targeted protein degradation. Cancer Res. 2019;79:4744–4753. doi: 10.1158/0008-5472.CAN-19-1236.
- 78. Choi YL, et al. EML4-ALK mutations in lung cancer that confer resistance to ALK inhibitors. N. Engl. J. Med. 2010;363:1734–1739. doi: 10.1056/NEJMoa1007478.
- 79. Inuzuka H, et al. Phosphorylation by casein kinase I promotes the turnover of the Mdm2 oncoprotein via the SCF(beta-TRCP) ubiquitin ligase. Cancer Cell. 2010;18:147–159. doi: 10.1016/j.ccr.2010.06.015.
- 80. Murphy C, Elemento O. AGFusion: annotate and visualize gene fusions. Preprint at https://www.biorxiv.org/content/10.1101/080903v1 (2016).
- 81. Frankish A, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019;47:D766–D773. doi: 10.1093/nar/gky955.
- 82. El-Gebali S, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47:D427–D432. doi: 10.1093/nar/gky995.
- 83. Chakravarty D, et al. OncoKB: a Precision Oncology Knowledge Base. JCO Precis. Oncol. 2017. doi: 10.1200/PO.17.00011.
- 84. Bailey MH, et al. Comprehensive characterization of cancer driver genes and mutations. Cell. 2018;173:371–385 e318. doi: 10.1016/j.cell.2018.02.060.
- 85. Sondka Z, et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer. 2018;18:696–705. doi: 10.1038/s41568-018-0060-1.
- 86. Bland JM, Altman DG. Statistics notes: the odds ratio. BMJ. 2000;320:1468. doi: 10.1136/bmj.320.7247.1468.
- 87. Wong WC, et al. CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer. Bioinformatics. 2011;27:2147–2148. doi: 10.1093/bioinformatics/btr357.
- 88. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47:D506–D515. doi: 10.1093/nar/gky1049.
- 89. Zhou X, et al. Exploring genomic alteration in pediatric cancer using ProteinPaint. Nat. Genet. 2016;48:4–6. doi: 10.1038/ng.3466.
- 90. Inuzuka H, et al. SCF FBW7 regulates cellular apoptosis by targeting MCL1 for ubiquitylation and destruction. Nature. 2011;471:104. doi: 10.1038/nature09732.
- 91. Thorsson V, et al. The immune landscape of cancer. Immunity. 2018;48:812–830 e814. doi: 10.1016/j.immuni.2018.03.023.
- 92. Carter SL, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 2012;30:413–421. doi: 10.1038/nbt.2203.
- 93. Li J, et al. TCPA: a resource for cancer functional proteomics data. Nat. Methods. 2013;10:1046–1047. doi: 10.1038/nmeth.2650.
- 94. Rousseeuw PJ, Driessen KV. A fast algorithm for the minimum covariance determinant estimator. Technometrics. 1999;41:212–223.
- 95. Tokheim C. Genetic fusions favor tumorigenesis through degron loss in oncogenes, fusion_pipeline. Zenodo. 2021. doi: 10.5281/zenodo.5565550.
Data Availability Statement
Data are available in the article, Supplementary Information, or Supplementary Data 1–8. The Supplementary Data include the full lists of recurrent genetic fusions, of genes and oncogenes with internal and C-terminal degron loss, of protein abundance for fused genes, and of downstream transcription factors associated with genetic fusions. The original gene fusion calls were obtained from Supplementary Data 1 of Gao et al.11. The annotated and processed gene fusion data used for downstream statistical analysis are available on GitHub (https://github.com/ctokheim/fusion_pipeline). The output from the analysis can be found in the Supplementary Data. All data used in the analyses described in this study are freely available from public databases, including TCGA (https://www.cancer.gov/tcga), OncoKB (https://www.oncokb.org/), CGC (https://cancer.sanger.ac.uk/census), UniProt (https://www.uniprot.org/), Pfam (http://pfam.xfam.org/), ELM (http://elm.eu.org/), and PhosphoSitePlus (https://www.phosphosite.org/). Source data are provided with this paper.
Custom code for this manuscript is available on GitHub (https://github.com/ctokheim/fusion_pipeline) and is archived on Zenodo95. The README file in the GitHub repository describes how to reproduce the analysis. The code uses Python 3, and the exact version numbers of its dependencies are listed in the environment configuration file. The deepDegron code for analyzing C-terminal degrons is also freely available on GitHub (https://github.com/ctokheim/deepDegron).