Table 1. Truncated domains in the fusion proteins.
fusion protein | fplen | bp | Pfid | Dlen | Dbeg | Dend | N/C | Dfract | IUleft | IUright | Pfam Desc | Mode of Survival? |
MLL-ELL | 1966 | 1406 | PF10390 | 237 | 4 | 237 | C | 0.99 | 80.1 | 43.5 | RNA pol II ef | no PDB |
Col1A1-PDGFB | 488 | 269 | PF04692 | 76 | 2 | 76 | C | 0.99 | 97.3 | 64.3 | Platelet gf | almost full domain |
CBFB-MYH11 | 553 | 165 | PF02312 | 171 | 1 | 165 | N | 0.96 | 38.2 | 61.4 | Core bind f α | almost full domain |
RUNX1-CBFA2T3 | 806 | 204 | PF00853 | 135 | 1 | 130 | N | 0.96 | 41.9 | 67.4 | Runt domain | almost full domain |
NSD1-ANKRD28 | 811 | 412 | PF00023 | 33 | 3 | 33 | C | 0.94 | 38.6 | 24.0 | Ankyrin | short repeats |
TPM4-ALK | 784 | 221 | PF00261 | 237 | 1 | 210 | N | 0.89 | 52.1 | 24.4 | Tropomyosin | coiled-coil |
DAZAP1-MEF2D | 390 | 154 | PF00076 | 70 | 1 | 60 | N | 0.86 | 50.0 | 63.5 | RNA rec motif | 2dgs_A termini IUP (11, 14 aa) |
IL2-TNFRSF17 | 298 | 117 | PF00715 | 144 | 1 | 123 | N | 0.85 | 14.8 | 12.5 | Interleukin 2 | artefact |
MN1-ETV6 | 1653 | 1256 | PF02198 | 87 | 14 | 87 | C | 0.85 | 50.7 | 30.9 | SAM domain | 1x66_A N-terminal 18 aa IUP |
TMPRSS2-ETV1 | 422 | 5 | PF04621 | 333 | 61 | 333 | C | 0.82 | 62.3 | 55.1 | PEA3 ETS tf N | no PDB |
DDX10-NUP98 | 1462 | 219 | PF00270 | 171 | 1 | 133 | N | 0.78 | 22.7 | 61.7 | DEAD b h-ase | 1vec_A 151 |
CDK6-MLL | 2599 | 123 | PF00628 | 53 | 13 | 53 | C | 0.77 | 29.0 | 12.5 | PHD-finger | 1weu_A 12, 17 |
E2A-PRL | 825 | 483 | PF03792 | 200 | 56 | 200 | C | 0.73 | 65.7 | 48.4 | PBC domain | no PDB |
CLTC-TFE3 | 1212 | 932 | PF00637 | 140 | 1 | 101 | N | 0.72 | 19.7 | 42.3 | Clathrin rep | elongated coil of α-helices |
MLL-AFF1 | 2308 | 1444 | PF05110 | 1201 | 338 | 1201 | C | 0.72 | 67.2 | 72.1 | AF-4 oncoprot | no PDB |
RUNX1-AFF3 | 1184 | 322 | PF05110 | 1205 | 341 | 1205 | C | 0.72 | 54.3 | 62.9 | AF-4 oncoprot | no PDB |
MYH9-ALK | 2207 | 1644 | PF01576 | 858 | 1 | 605 | N | 0.71 | 55.6 | 41.4 | Myosin tail | no PDB |
RABEP1-PDGFRB | 1317 | 738 | PF09311 | 196 | 1 | 127 | N | 0.65 | 24.8 | 15.4 | Rab5 binding | coiled-coil |
NPM-RAR | 563 | 133 | PF03066 | 199 | 1 | 121 | N | 0.61 | 39.6 | 21.2 | Nucleoplasmin | nucleophosmin 2p1b_H, 122 aa |
COL6A3-CSF1 | 2089 | 1738 | PF00092 | 174 | 1 | 100 | N | 0.57 | 40.7 | 77.8 | VWA | 1dzi_A 104, 114 |
Col1A1-PDGFB | 488 | 269 | PF01391 | 60 | 1 | 34 | N | 0.57 | 97.3 | 64.3 | Collagen | coiled-coil, repeat |
ETV6-JAK2 | 622 | 154 | PF07714 | 261 | 118 | 261 | C | 0.55 | 31.8 | 28.0 | Tyr kinase | 2f4j_A 157, 167 |
MLL-FOXO3 | 1910 | 1444 | PF00250 | 98 | 50 | 98 | C | 0.51 | 49.1 | 58.0 | Fork head | 1jxs_A 43, 53, 63 |
MSN-ALK | 1005 | 448 | PF00769 | 240 | 1 | 111 | N | 0.46 | 62.3 | 34.9 | Ezrin | 1e5w_A 106, 206, 286, 306 |
CBFB-MYH11 | 553 | 165 | PF01576 | 858 | 512 | 858 | C | 0.40 | 38.2 | 61.4 | Myosin tail | no PDB |
CDK6-MLL | 2599 | 123 | PF00069 | 288 | 1 | 114 | N | 0.40 | 29.0 | 12.5 | Protein kinase | 1g3n_A 89, 94, 104 |
COL6A3-CSF1 | 1571 | 1211 | PF05337 | 268 | 181 | 268 | C | 0.33 | 32.6 | 81.2 | Mphag CSF1 | no PDB |
ATIC-ALK | 792 | 229 | PF01808 | 328 | 1 | 96 | N | 0.29 | 38.2 | 36.7 | AIC/IMPCHase | 1p4r_A 200(scop) |
RPN1-EVI1 | 1156 | 87 | PF04597 | 429 | 1 | 59 | N | 0.14 | 26.8 | 31.6 | Ribophorin I | no PDB |
MLL-CALM | 1803 | 1406 | PF07651 | 267 | 237 | 267 | C | 0.12 | 71.6 | 58.8 | ANTH domain | 1hf8_A 99, 139, 219 |
Nontrivial cases of fusion proteins are shown in which the breakpoint falls within a Pfam domain. The abbreviated column identifiers are as follows: fplen, fusion protein length; bp, breakpoint; Pfid, Pfam identifier; Dlen, domain length; Dbeg and Dend, beginning and end of the domain match, respectively; N/C, the retained (N- or C-terminal) half of the truncated domain; Dfract, the retained fraction of the truncated domain; IUleft and IUright, the predicted disorder for the truncated domain and its “mirror” (a segment with the same number of amino acids as the truncated domain) on the opposite side of the breakpoint. In the IUleft/IUright columns the value for the truncated domain is italicized, whereas the disorder value for its “mirror” is shown in bold. The last column lists possible strategies by which the truncated domain may avoid elimination by the proteasomal degradation system [49]. “No PDB” indicates that no PDB structure is associated with the protein family in question, which, together with high predicted disorder values, raises the suspicion that the domain is intrinsically disordered. When a PDB code is shown with a list of numbers (in bold italic), the numbers indicate positions in the actual domain that are presumably tolerant of truncation, judged by the exposed hydrophobic surface of the truncated domain (shown in detail in Figure 4).
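The legend defines Dfract only in words. As a minimal sketch, the Python snippet below assumes that Dfract is the length of the retained domain match divided by the domain length, (Dend − Dbeg + 1) / Dlen. This formula is inferred from the tabulated values and is not stated in the source; the function name `retained_fraction` is hypothetical. The four rows are copied from Table 1 for illustration.

```python
def retained_fraction(dlen: int, dbeg: int, dend: int) -> float:
    """Fraction of the domain covered by the retained match.

    Assumed formula (not given in the legend): (Dend - Dbeg + 1) / Dlen.
    """
    if not (1 <= dbeg <= dend <= dlen):
        raise ValueError("expected 1 <= Dbeg <= Dend <= Dlen")
    return (dend - dbeg + 1) / dlen


# (fusion protein, Dlen, Dbeg, Dend, N/C, reported Dfract), copied from Table 1
rows = [
    ("MLL-ELL", 237, 4, 237, "C", 0.99),
    ("TPM4-ALK", 237, 1, 210, "N", 0.89),
    ("NPM-RAR", 199, 1, 121, "N", 0.61),
    ("MLL-CALM", 267, 237, 267, "C", 0.12),
]

for name, dlen, dbeg, dend, half, reported in rows:
    computed = retained_fraction(dlen, dbeg, dend)
    print(f"{name:10s} {half}  computed={computed:.2f}  reported={reported:.2f}")
```

For these four rows the computed values agree with the reported Dfract to two decimals. Other rows may differ in the last digit, for example through rounding.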