Table 1. Select known and novel motifs found by FIRE-pro.
Name | Motif | Z-score | Best Match | Match details | Pos Bias | Domain | Dom. Overlap | Best GO term |
a) Known | ||||||||
CLB2: B-type cyclin | SP.[RK] | 312 | SP.[RK] | CDK kinase substrate | Y | Pkinase (1e-04) | −3.5 | cell cycle (1e-16) |
PTK2: Putative S/T kinase | RR.[SHP] | 122 | RR.S | PKA kinase substrate | - | | | phosphotransferase activity (0.01) |
GO: nuclear part | [KRN]KR[KSR] | 99 | K[KR].[KR] | Nuclear localization | | Bromodomain (0.001) | −1.1 | nuclear lumen (1e-91) |
TPK1: cAMP-dependent kinase | R[RK].S | 96 | R[KER].S | PKA kinase substrate | | | | |
LSB3: C-terminal SH3 domain | [PQ]P..P[PTM]R | 92 | P..P | SH3 general ligand | | | | actin cytoskeleton biogenesis (1e-05) |
GO: membrane | L[LAF]G | 89 | LLG | Beta2-Integrin binding | | Mito_carr (1e-06) | 0.3 | intrinsic to membrane (1e-67) |
GO: transcription | N[NTP]N[NAP] | 77 | NNNN | Poly-asparagine | Y | Zn_clus (0.001) | −0.7 | transcription (1e-10) |
RSP5: Ubiquitin-protein ligase | PP.Y | 76 | PP.Y | LIG_WW_1 | | | | |
CLB2: B-type cyclin | L..SP | 74 | SP | ERK1,2 Kinase substrate | | Pkinase (0.001) | −1.4 | bud neck (1e-06) |
RIM11: kinase | [GSQ]S..[ANV]SP | 72 | [ST]...[ST]P | RIM11 Kinase substrate | | | | |
GO: transcription | Q[QNH]Q | 68 | QQQ | Poly-glutamine | | zf-C2H2 (1e-11) | −0.9 | transcription (1e-14) |
GO: membrane-enclosed lumen | K[KRE][REH]K | 67 | KR | CLV_PCSK_PC1ET2_1 | Y | | | nuclear lumen (1e-10) |
GO: nucleus | LK | 67 | F.F.LK...K.R | Phosphatidylserine binding | | WD40 (1e-07) | −0.4 | nuclear lumen (1e-19) |
GO: cellular morphogenesis | [STL]S..[SAD]S | 66 | S..[ST] | Casein kinase I phos. site | | Pkinase (0.01) | −4.6 | cellular morphogenesis (1e-15) |
Localization: actin | PPP.[PHY] | 63 | PPP | Polyproline | Y | SH3_1 (1e-04) | −0.7 | actin cortical patch (1e-14) |
GO: cell cycle | [SYI]S...S | 54 | S...S | WD40 binding | | Pkinase (1e-04) | −4.8 | cell cycle (10) |
PPH22: phosphatase subunit | SP.[GD]R[LYN] | 52 | SP | ERK1,2 Kinase substrate | | Proteasome (1e-08) | −3.7 | proteasome core complex (1e-10) |
CDC15: MEN kinase | S..[PWH]S | 30 | S...S | WD40 binding | | Pkinase (1e-18) | −2 | protein kinase activity (1e-14) |
b) Semi-Novel | ||||||||
SMT3: SUMO family protein | A[DVA]A | 66 | [LV]IA[DE][PA] | Caveolin pattern | | | | carboxylic acid metabolism (1e-07) |
YCK1: membrane casein kinase | S.[SEV]D | 65 | HSTSDD | BCKDC kinase | | | | |
Plasmodium expression cluster | K..Y[ISH] | 47 | Y[LI] | SH2 ligand for PLCgamma1 | Y | Rifin_STEVOR (0.01) | −5.3 | |
PRE2: 20S proteasome subunit | VEYA | 46 | VIYAAPF | Abl kinase substrate | Y | Proteasome (1e-09) | −3.8 | proteasome core complex (1e-11) |
PPH22: phosphatase subunit | [TIV][FH]SP | 36 | SP | ERK substrate | Y | Proteasome (1e-12) | −4.5 | proteasome core complex (1e-16) |
PPH22: phosphatase subunit | EY.[LS]E[AS] | 36 | [DE]Y | EGFR kinase substrate | Y | Proteasome (1e-10) | −4.1 | proteasome core complex (1e-09) |
HTZ1: Histone | [GVH]G[KYQ]G | 32 | GGQ | N-methylation in E. coli | Y | Histone (1e-05) | −2.5 | nuclear chromatin (1e-06) |
PAB1: Poly(A) binding | G.[PRT]G | 31 | IQ.RG.RG | Binding on Calmodulin | | RRM_1 (0.001) | −4.1 | RNA metabolism (1e-09) |
Localization: periphery (S. pombe) | T..[PSL]N | 30 | T..[SA] | FHA of KAPP binding | | Pkinase (1e-04) | −2 | barrier septum (1e-54) |
Plasmodium expression cluster | R.[GSA]R | 29 | [AG]R | Protease matriptase site | | DEAD (1e-13) | −2.9 | ATP-dependent helicase activity (1e-12) |
ARC1: tRNA binding | S[DQP]S | 28 | R.S.S.P | 14-3-3 bindings | | Pkinase (1e-14) | −3.9 | protein kinase activity (1e-13) |
HHT1: histone | KP..[KFV][KHA] | 28 | KP..[QK] | LIG_SH3_4 | | Histone (0.01) | −2.8 | chromatin architecture (1e-07) |
PPI clusters | SP[STN] | 24 | SP | ERK substrate | | | | interphase (1e-06) |
Localization clusters (Huh, 2003) | P..[PSE]P | 21 | P.[ST]PP | ERK substrate | Y | PX (1e-05) | −0.3 | cell cortex part (1e-24) |
Localization multiclass (Huh, 2003) | T..[SFL]T | 11 | T..[SA] | FHA of KAPP binding | Y | | | nuclear pore (1e-29) |
Localization clusters (Huh, 2003) | TG.G[KLW][TFY] | 11 | TGY | ERK6/SAPK3 activation sites | | Helicase_C (1e-10) | −1.1 | RNA helicase activity (1e-11) |
c) Novel | ||||||||
GO: nuclear part | DE[EDK][ED] | 131 | | | Y | | | nuclear lumen (1e-09) |
Ubiquitin-conjugates (Peng, 2003) | L..[LDS]A | 125 | | | Y | IBN_N (1e-05) | −0.4 | Golgi apparatus (1e-08) |
GO: membrane | I[FIW]..V | 70 | | | | Adaptin_N (0.001) | 0.6 | transporter activity (1e-40) |
GO: ribosome biogenesis | E[EDK]..E[EKD] | 67 | | | | WD40 (0.01) | −2.3 | cytoplasm organization (1e-12) |
YAP1: Basic leucine zipper | QQ..M[QIV][QTA] | 66 | | | | | | RNA polymerase II TF activity (1e-06) |
NOP2: RNA methyltransferase | R[GST].[DQF]IP | 56 | | | Y | DEAD (1e-05) | −1.1 | ribosome biogenesis (1e-08) |
GO: DNA-dependent transcription | N.D[DST] | 52 | | | | zf-C2H2 (1e-06) | −1.5 | transcription, DNA-dependent (1e-23) |
GO: transcription | N.D[DST] | 52 | | | | zf-C2H2 (1e-06) | −1.5 | transcription, DNA-dependent (1e-23) |
SMT3: SUMO family protein | V.[DKG]A | 47 | | | Y | | | carboxylic acid metabolism (1e-04) |
POB3: Nucleosome maintenance | [GH]S..KA[SI] | 33 | | | | Histone (0.01) | −1.6 | chromatin architecture (0.001) |
UBP15: Ubiquitin-specific protease | A.[TSL]S | 28 | | | | Pkinase (0.001) | −2.1 | protein kinase activity (0.001) |
PRE2: 20S proteasome subunit | Q[VID]E | 26 | | | | Proteasome (1e-08) | −4.8 | proteasome complex (1e-19) |
Half-life (Belle, 2006) | R.[RSY]S | 25 | | | | | | reg. of cellular physiological process (1e-04) |
PPI clusters | GGL[FTL][GEP] | 13 | | | | | | snRNP protein import into nucleus (1e-07) |
Known: matches a previously identified motif; Semi-novel: matches in sequence but has a distinct biological context; Novel: no match.
Select (a) known, (b) semi-novel, and (c) novel motifs discovered by FIRE-pro. Known motifs match previously identified motifs from the literature in both sequence and biological context. Semi-novel motifs match previously identified motifs in sequence but not in biological context. Novel motifs do not match any previously identified motif. The motifs presented here were selected using a combination of criteria: high mutual information and z-score, low domain overlap score, positional bias, GO enrichment, and similarity to known motifs. Name refers to the dataset in which the motif was discovered and is abbreviated as follows. GO: term = binary profile of proteins annotated to the GO term; Protein: description = binary profile of proteins interacting with the protein; Localization: compartment = binary profile of proteins localized to the cellular compartment. See Text S1 for further description of the datasets.
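The motifs in the table use a regular-expression-like notation, in which "." stands for any residue and a bracketed set such as [RK] for one of the listed residues. As a rough illustration of how such a motif relates to a binary profile, the sketch below scans a few protein sequences for a motif and computes the mutual information between motif presence and profile membership. This is not the FIRE-pro implementation. The function names, the toy sequences, and the profile values are made up for illustration, and the z-score, positional bias, and domain overlap columns are not reproduced.

```python
import math
import re


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead reports every start position, so overlapping
    # occurrences (e.g. QQQ inside QQQQ) are not skipped.
    pattern = re.compile("(?=" + motif + ")")
    return [m.start() for m in pattern.finditer(sequence)]


def mutual_information(x, y):
    """Mutual information in bits between two equal-length binary vectors."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy data: four invented sequences and an invented binary
# profile (1 = protein belongs to the set of interest, 0 = it does not).
sequences = {
    "protA": "MSPTKAAASPLRQ",
    "protB": "MAAGLLQSPEKTT",
    "protC": "MGGHHLLAAVVTT",
    "protD": "MKKLLAAGGHHEE",
}
profile = {"protA": 1, "protB": 1, "protC": 0, "protD": 0}

motif = "SP.[RK]"  # first motif in section (a) of the table
names = sorted(sequences)
presence = [1 if find_motif(motif, sequences[name]) else 0 for name in names]
membership = [profile[name] for name in names]

print(find_motif(motif, sequences["protA"]))
print(mutual_information(presence, membership))
```

In this toy data the motif occurs only in the two proteins that belong to the profile, so presence and membership are perfectly associated. Real datasets would show a much weaker association, and a randomization-based score would be needed to judge its significance.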