"family","P_CPP","P_eff","P_all","GO_function","GO_process","description" "Protamine_P1",0.978618,0.644737,0.631528,"DNA binding","spermatogenesis","Protamine P1 NA" "Ribosomal_L41",0.966797,0.513672,0.496902,"structural constituent of ribosome","translation","Ribosomal protein L41 NA" "TP1",0.890625,0.457933,0.408316,"DNA binding","spermatogenesis","Nuclear transition protein 1 NA" "DUF1713",0.88101,0.467248,0.412342,"N/A","N/A","Mitochondrial domain of unknown function (DUF1713) This domain is found at the C terminal end of mitochondrial proteins of unknown function." "Ribosomal_L39",0.880849,0.564255,0.501161,"structural constituent of ribosome","translation","Ribosomal L39 protein NA" "Ribosomal_L34",0.876961,0.51068,0.447575,"structural constituent of ribosome","translation","Ribosomal protein L34 NA" "DUF3934",0.876202,0.460337,0.403471,"N/A","N/A","Protein of unknown function (DUF3934) This family of proteins is functionally uncharacterised. This family of proteins is found in bacteria. Proteins in this family are approximately 40 amino acids in length. There are two conserved sequence motifs: GTG and SKG." "Microvir_J",0.875,0.5,0.443726,"DNA binding","viral DNA genome packaging","Microvirus J protein This small protein is involved in DNA packaging, interacting with DNA via its hydrophobic carboxyl terminus. In bacteriophage phi-X174, J is present in 60 copies, and forms an S-shaped polypeptide chain without any secondary structure. It is thought to interact with DNA through simple charge interactions [1]." "Protamine_3",0.875,0.640625,0.560547,"DNA binding","sperm chromatin condensation","Spermatozal protamine family This family consists of the spermatozal protamines. Spermatozal protamines play an important role in remodelling of the sperm chromatin during mammalian spermiogenesis. Nuclear elongation and chromatin condensation are concomitant with modifications in the basic protein complement associated with DNA. 
Somatic histones are initially replaced by testis-specific histone variants, then by transitional proteins, and ultimately by protamines [1]."
"Mastoparan",0.869792,0.375,0.326457,"N/A","N/A","Mastoparan protein Mastoparans are a family of tetradecapeptides from wasp venom, that have been shown to directly activate GTP-binding regulatory proteins. These peptides show selectivity among G proteins: they strongly activate Go and Gi but not Gs or Gt. The peptides of this family are composed of 14 amino acids but they can assume different structures [1]."
"DUF3983",0.867839,0.537109,0.470022,"N/A","N/A","Protein of unknown function (DUF3983) This family of proteins is functionally uncharacterised. This family of proteins is found in bacteria and viruses. Proteins in this family are approximately 40 amino acids in length. There is a conserved AWRN sequence motif."
"AT_hook",0.857474,0.333761,0.286209,"DNA binding","N/A","AT hook motif AT hooks are DNA binding motifs with a preference for A/T rich regions."
"Antimicrobial_4",0.848958,0.442708,0.375692,"N/A","hemolysis by symbiont of host erythrocytes","Ant antimicrobial peptide This family consists of the ponericin family of antimicrobial peptides isolated from predatory ant Pachycondyla goeldii. The ponericin peptides may adopt amphipathic alpha-helical structure in polar environments. In the ant colony, these peptides exhibit a defensive role against microbial pathogens arising from prey introduction and/or ingestion [1]."
"Hap4_Hap_bind",0.82897,0.406672,0.336056,"DNA binding","regulation of transcription, DNA-dependent","Minimal binding motif of Hap4 for binding to Hap2/3/5 In Saccharomyces cerevisiae, the haem-activated protein complex Hap2/3/4/5 plays a major role in the transcription of genes involved in respiration [3]. Hap4_Hap_bind is the essential domain of Hap4 which allows it to associate with Hap2, Hap3 and Hap5 to form the Hap complex [2]."
"HC2",0.823582,0.341534,0.286704,"DNA binding","chromosome condensation","Histone H1-like nucleoprotein HC2 This family contains the bacterial histone H1-like nucleoprotein HC2 (approximately 200 residues long), which seems to be found mostly in Chlamydia. HC2 functions in DNA condensation, although it has been suggested that it also has other roles [1]."
"P19Arf_N",0.810547,0.585938,0.474823,"N/A","N/A","Cyclin-dependent kinase inhibitor 2a p19Arf N-terminus This family represents the N-terminus (approximately 50 residues) of cyclin-dependent kinase inhibitor 2a p19Arf, which seems to be restricted to mammals. This is a tumour-suppressor protein that has been shown to inhibit the growth of human tumour cells lacking functional p53 by inducing a transient G2 arrest and subsequently apoptosis [1]."
"Ribosomal_L23eN",0.806823,0.42499,0.344306,"N/A","N/A","Ribosomal protein L23, N-terminal domain The N-terminal domain appears to be specific to the eukaryotic ribosomal proteins L25, L23, and L23a."
"Protamine_P2",0.782031,0.597656,0.470227,"DNA binding","spermatogenesis","Sperm histone P2 This protein also known as protamine P2 can substitute for histones in the chromatin of sperm (Swiss). The alignment contains both the sequence of the mature P2 protein and its propeptide."
"Ribosomal_L35p",0.782014,0.440328,0.345104,"structural constituent of ribosome","translation","Ribosomal protein L35 NA"
"Asr",0.775,0.307292,0.234798,"N/A","response to acidity","Acid shock protein repeat The Asr protein is synthesised as a precursor and the cleavage is essential for moderate to high acid tolerance [1]."
"CCT",0.771471,0.497236,0.385252,"protein binding","N/A","CCT motif This short motif is found in a number of plant proteins. It is rich in basic amino acids and has been called a CCT motif after Co, Col and Toc1 [1]. The CCT motif is about 45 amino acids long and contains a putative nuclear localisation signal within the second half of the CCT motif [1].
Toc1 mutants have been identified in this region."
"DUF2986",0.757714,0.431764,0.32831,"N/A","N/A","Protein of unknown function (DUF2986) This family of proteins has no known function."
"SRRM_C",0.747243,0.549173,0.409661,"N/A","N/A","Serine/arginine repetitive matrix protein C-terminus This domain is found near to the C-terminus of Serine/arginine repetitive matrix proteins 3 and 4."
"Ribosomal_L36",0.74523,0.47004,0.352211,"structural constituent of ribosome","translation","Ribosomal protein L36 NA"
"Antimicrobial_9",0.734375,0.421875,0.30957,"N/A","innate immune response","Ponericin L family This family consists of the ponericin L family of antimicrobial peptides that are isolated from the venom of the predatory ant Pachycondyla goeldii. Ponericin L family shares similarities with dermaseptins. Ponericin L may adopt an amphipathic alpha-helical structure in polar environments and these peptides exhibit a defensive role against microbial pathogens arising from prey introduction and/or ingestion [1]."
"Ribosomal_L29e",0.729555,0.474502,0.349239,"structural constituent of ribosome","translation","Ribosomal L29e protein family NA"
"Ponericin",0.709375,0.553125,0.404443,"N/A","N/A","Ponericin This family contains a number of ponericin peptides (approximately 30 residues long) from the venom of the predatory ant Pachycondyla goeldii. These peptides exhibit antibacterial and insecticidal properties, and may adopt an amphipathic alpha-helical structure in polar environments such as cell membranes [1]."
"Flavi_capsid",0.709147,0.544434,0.393379,"structural molecule activity","N/A","Flavivirus capsid protein C Flaviviruses are small enveloped viruses with virions comprised of 3 proteins called C, M and E. Multiple copies of the C protein form the nucleocapsid, which contains the ssRNA molecule."
"Ribosomal_L37e",0.697959,0.49496,0.353274,"structural constituent of ribosome","translation","Ribosomal protein L37e This family includes ribosomal protein L37 from eukaryotes and archaebacteria. The family contains many conserved cysteines and histidines suggesting that this protein may bind to zinc."
"CaM_bdg_C0",0.697266,0.368164,0.256302,"N/A","N/A","Calmodulin-binding domain C0 of NMDA receptor NR1 subunit This is a very short highly conserved domain that is C-terminal to the cytosolic transmembrane region IV of the NMDA-receptor 1. It has been shown to bind Calmodulin-Calcium with high affinity. The ionotropic N-methyl-D-aspartate receptor (NMDAR) is a major source of calcium flux into neurons in the brain and plays a critical role in learning, memory, neural development, and synaptic plasticity. Calmodulin (CaM) regulates NMDARs by binding tightly to the C0 and C1 regions of their NR1 subunit. The conserved tryptophan is considered to be the anchor residue [1]."
"Ribosomal_S30",0.686581,0.483272,0.33642,"structural constituent of ribosome","translation","Ribosomal protein S30 NA"
"RPA_interact_N",0.682813,0.539062,0.372856,"N/A","N/A","Replication protein A interacting N-terminal This family of proteins represents the N-terminal domain of replication protein A (RPA) interacting protein. RPA interacting protein is involved in the import of RPA into the nucleus. The N-terminal domain is responsible for interaction with importin beta [1-2]."
"GYR",0.682292,0.499653,0.343631,"N/A","N/A","GYR motif The GYR motif is found in several Drosophila proteins. Its function is unknown, however the presence of completely conserved tyrosine residues may suggest it could be a substrate for tyrosine kinases."
"Tcell_CD4_Cterm",0.68006,0.425595,0.28876,"N/A","N/A","T cell CD4 receptor C terminal region This domain is the C terminal domain of the CD4 T cell receptor. The C terminal domain is the cytoplasmic domain which relays the signal for T cell activation.
This process involves co-receptor internalisation. This domain is involved in binding to the N terminal of Lck co-receptor in a Zn2+ clasp structure."
"Shugoshin_C",0.663396,0.439801,0.293695,"N/A","meiotic chromosome segregation","Shugoshin C terminus Shugoshin-like proteins contain this conserved sequence at the C terminus, which is rich in basic amino-acids. Shugoshin (Sgo1) protects Rec8 at centromeres during anaphase I (during meiosis) so that sister chromatids remain tethered [1]. Sgo2 is a paralogue of Sgo1 and is involved in correctly orienting sister-centromeres [1]."
"Cgr1",0.662588,0.437353,0.290622,"N/A","N/A","Cgr1 family Members of this family are coiled-coil proteins that are involved in pre-rRNA processing [1]."
"Ribosomal_L6e_N",0.646649,0.414157,0.270574,"structural constituent of ribosome","translation","Ribosomal protein L6, N-terminal domain NA"
"P120R",0.643229,0.334635,0.21108,"N/A","N/A","P120R (NUC006) repeat This characteristic repeat of proliferating cell nuclear antigen P120 is found in three copies [1]."
"Plant_zn_clust",0.641098,0.425821,0.278234,"N/A","N/A","Plant zinc cluster domain This zinc binding domain was identified by Babu and colleagues and found associated with the WRKY domain Pfam:PF03106 [1]."
"Phageshock_PspD",0.641016,0.506641,0.338586,"N/A","N/A","Phage shock protein PspD (Phageshock_PspD) Members of this family are phage shock protein PspD, found in a minority of bacteria that carry the defining genes of the phage shock regulon (pspA, pspB, pspC, and pspF). It is found in Escherichia coli, Yersinia pestis, and closely related species, where it is part of the phage shock operon. It is known to be expressed but its function is unknown."
"Bradykinin",0.638889,0.399306,0.254096,"hormone activity","response to stress","Bradykinin This family consists of several bradykinin sequences. The skins of anuran amphibians, in addition to mucus glands, contain highly specialised poison glands, which, in reaction to stress or attack, exude a complex noxious cocktail of biologically active molecules. These secretions often contain a plethora of peptides among which bradykinin or structural variants have been identified [1]."
"DUF1661",0.634375,0.464062,0.295516,"N/A","N/A","Protein of unknown function (DUF1661) This is a family containing bacterial proteins of unknown function. Many of the proteins in this family are hypothetical."
"Phage_1_1",0.62358,0.475852,0.296831,"N/A","N/A","Bacteriophage 1.1 Protein Gene 1.1 in Bacteriophage T7 encodes a 42 amino acid protein, rich in basic amino acids suggesting its interaction with nucleic acids [1]. Many homologs are present in different T7 and T3-like bacteriophage."
"CCT_2",0.617519,0.497443,0.309048,"protein binding","N/A","Divergent CCT motif This short motif is found in a number of plant proteins. It appears to be related to the N-terminal half of the CCT motif. The CCT motif is about 45 amino acids long and contains a putative nuclear localisation signal within the second half of the CCT motif [1]."
"Planc_extracel",0.61169,0.487269,0.313576,"N/A","N/A","Planctomycete extracellular This motif is conserved as the N terminus of several Rhodopirellula baltica proteins predicted to be extracellular."
"IQ",0.606256,0.44305,0.279275,"protein binding","N/A","IQ calmodulin-binding motif Calmodulin-binding motif."
"Myotoxins",0.601562,0.5625,0.337728,"sodium channel inhibitor activity","N/A","Myotoxin NA"
"Melittin",0.600446,0.392857,0.248012,"protein kinase inhibitor activity","N/A","Melittin NA"
"GN3L_Grn1",0.592118,0.457556,0.272733,"N/A","N/A","GNL3L/Grn1 putative GTPase Grn1 (yeast) and GNL3L (human) are putative GTPases which are required for growth and play a role in processing of nucleolar pre-rRNA [1]. This family contains a potential nuclear localisation signal."
"Somatostatin",0.5825,0.47125,0.28957,"hormone activity","N/A","Somatostatin/Cortistatin family Members of this family are hormones. Somatostatin inhibits the release of somatotropin. Cortistatin is a peptide that is related to the Somatostatins that is found to depress neuronal electrical activity but, unlike somatostatin, induces low-frequency waves in the cerebral cortex and antagonises the effects of acetylcholine on hippocampal and cortical measures of excitability [1]."
"NinE",0.581473,0.550223,0.335693,"N/A","N/A","NINE Protein This family consists of NINE proteins from several bacteriophages and from E. coli."
"DUF2462",0.576968,0.37662,0.216627,"N/A","N/A","Protein of unknown function (DUF2462) This protein is highly conserved, but its function is unknown. It can be isolated from HeLa cell nucleoli and is found to be homologous with Leydig cell tumour protein whose function is unknown [1, supplementary Table I]."
"CAP18_C",0.571546,0.472039,0.284848,"N/A","defense response to bacterium","LPS binding domain of CAP18 (C terminal) This domain family is found in eukaryotes, and is approximately 30 amino acids in length, and the family is found in association with Pfam:PF00666. CAP18 is a protein which is derived from rabbit granulocytes. It has two domains, an N terminal DUF and a C terminal Gram negative LPS binding domain. This domain is the C terminal domain."
"Antimicrobial_3",0.5625,0.442708,0.250814,"N/A","hemolysis by symbiont of host erythrocytes","Spider antimicrobial peptide This family includes antimicrobial peptides isolated from the crude venom of the wolf spider Oxyopes kitabensis. These peptides, known as oxyopinins, are the largest linear cationic amphipathic peptides chemically characterised and exhibit disrupting activities towards biological membranes [1]."
"Prion_bPrPp",0.558799,0.342516,0.193771,"N/A","N/A","Major prion protein bPrPp - N terminal This family represents the N-terminal domain (1-30) of the bovine prion protein (bPrPp).
The protein's structure consists of mainly alpha helices. BPrPp forms a stable helix which inserts in a transmembrane location in the bilayer, with the N-terminal (1-30) functioning as a cell-penetrating peptide [1]."
"RS4NT",0.558299,0.428656,0.254293,"N/A","N/A","RS4NT (NUC023) domain This is the N-terminal domain of Ribosomal S4 / S4e proteins. This domain is associated with S4 and KOW domains [1]."
"SARS_3b",0.553125,0.496875,0.276221,"N/A","N/A","Severe acute respiratory syndrome coronavirus 3b protein This family of proteins is found in viruses. Proteins in this family are typically between 32 and 154 amino acids in length. This family contains the SARS coronavirus 3b protein which is predominantly localized in the nucleolus, and induces G0/G1 arrest and apoptosis in transfected cells."
"IGF2_C",0.551953,0.542188,0.30423,"N/A","N/A","Insulin-like growth factor II E-peptide This domain is found at the C-terminal domain of the insulin-like growth factor II (IGF-2, also see Pfam:PF00049) in vertebrates and seems to represent the E-peptide [1,2]."
"Ribosomal_L20",0.549806,0.561186,0.314694,"structural constituent of ribosome, rRNA binding","translation","Ribosomal protein L20 NA"
"DVL",0.548587,0.41738,0.236783,"N/A","N/A","DVL family This family consists of the DVL family of proteins. In a gain-of-function genetic screen for genes that influence fruit development in Arabidopsis, DEVIL (DVL) gene was identified. DVL is a small protein and overexpression of the protein results in pleiotropic phenotypes featured by shortened stature, rounder rosette leaves, clustered inflorescences, shortened pedicles, and siliques with pronged tips. DVL family is a novel class of small polypeptides and the overexpression phenotypes suggest that these polypeptides may have a role in plant development [1]."
"Antimicrobial_1",0.548018,0.483232,0.278165,"N/A","N/A","Frog antimicrobial peptide This family includes antimicrobial peptides secreted from skins of frogs. The secretion of antimicrobial peptides from the skins of frogs plays an important role in the self defense of these frogs. Structural characterization of these peptides showed that they belonged to four known families: the brevinin-1 family, the esculentin-2 family, the ranatuerin-2 family and the temporin family [1]."
"Doppel",0.547619,0.462798,0.265823,"N/A","N/A","Prion-like protein Doppel Dpl is a homologue related to the prion protein (PrP). Dpl is toxic to neurons and is expressed in the brains of mice that do not express PrP. In DHPC and SDS micelles, Dpl shows about 40% alpha-helical structure however in aqueous solution it consists of a random coil. The alpha helical segment can adopt a transmembrane localisation also in a membrane [1]. The unprocessed Dpl protein is thought to possess a possible channel formation mechanism which may be related to toxicity through direct interaction with cell membranes and damage to the cell membrane [1]."
"DUF2256",0.545254,0.472288,0.269798,"N/A","N/A","Uncharacterized protein conserved in bacteria (DUF2256) Members of this family of hypothetical bacterial proteins have no known function."
"Sas10",0.542588,0.472085,0.256081,"N/A","gene silencing","Sas10 C-terminal domain Sas10 is an Essential subunit of U3-containing Small Subunit (SSU) processome complex involved in the production of the 18S rRNA and assembly of the small ribosomal subunit."
"DUF331",0.530068,0.453388,0.243989,"N/A","N/A","Domain of unknown function Members of this family are uncharacterised proteins from a number of bacterial species. The proteins range in size from 50-70 residues."
"Ribosomal_L40e",0.529228,0.426563,0.233416,"structural constituent of ribosome","translation","Ribosomal L40e family Bovine L40 has been identified as a secondary RNA binding protein [1]. L40 is fused to a ubiquitin protein [2]."
"Moricin",0.526042,0.480469,0.25944,"N/A","defense response to bacterium","Moricin Moricin is an antibacterial peptide that is highly basic. The structure of moricin reveals that it is comprised of a long alpha-helix. The N-terminus of the helix is amphipathic, and the C-terminus of the helix is predominately hydrophobic. The amphipathic N-terminal segment of the alpha-helix is mainly responsible for the increase in permeability of the bacterial membrane which kills the bacteria [1]."
"DUF3511",0.525069,0.402644,0.219152,"N/A","N/A","Domain of unknown function (DUF3511) This presumed domain is functionally uncharacterised. This domain is found in eukaryotes. This domain is about 50 amino acids in length. This domain has two completely conserved residues (Y and K) that may be functionally important."
"Leader_Trp",0.525,0.46875,0.262354,"N/A","N/A","Trp-operon Leader Peptide The tryptophan operon regulatory region of C. freundii's (leader transcript) encodes a 14-residue peptide containing characteristic tandem tryptophan residues. It is about 10 nucleotides shorter than those of E. coli and S. typhimurium [1]."
"FARP",0.523416,0.384211,0.211581,"N/A","neuropeptide signaling pathway","FMRFamide related peptide family The neuroactive peptide Phe-Met-Arg-Phe-NH2 (FMRF-amide) has a variety of effects on both mammalian and invertebrate tissues [1]."
"GIIM",0.521577,0.496101,0.270579,"N/A","N/A","Group II intron, maturase-specific domain This region is found mainly in various bacterial and archaeal species, but a few members of this family are expressed by fungal and chlamydomonal species. It has been implicated in the binding of intron RNA during reverse transcription and splicing [1]."
"IF2_assoc",0.520224,0.442431,0.227279,"N/A","N/A","Bacterial translation initiation factor IF-2 associated region Most of the sequences in this alignment come from bacterial translation initiation factors (IF-2, also Pfam:PF04760), but the domain is also found in the eukaryotic translation initiation factor 4 gamma in yeast and in a hypothetical Euglenozoa protein of unknown function."
"Ribosomal_L34e",0.519286,0.48431,0.251275,"structural constituent of ribosome","translation","Ribosomal protein L34e NA"
"Ribosomal_L44",0.517497,0.435196,0.227879,"structural constituent of ribosome","translation","Ribosomal protein L44 NA"
"p12I",0.5125,0.325,0.166626,"N/A","N/A","Human adult T cell leukemia/lymphoma virus protein This family of proteins is found in viruses. Proteins in this family are approximately 100 amino acids in length. p12I binds to the immature beta and gamma-c chains of the interleukin-2 receptor retarding their translocation to the plasma membrane. p12I forms dimers which bind to these chains."
"TP2",0.512074,0.434659,0.235041,"DNA binding","spermatogenesis","Nuclear transition protein 2 NA"
"Ribosomal_L28",0.511722,0.496412,0.262252,"structural constituent of ribosome","translation","Ribosomal L28 family The ribosomal 28 family includes L28 proteins from bacteria and chloroplasts. The L24 protein from yeast Swiss:P36525 also contains a region of similarity to prokaryotic L28 proteins. L24 from yeast is also found in the large ribosomal subunit."
"TMEMspv1-c74-12",0.509115,0.47526,0.243144,"N/A","N/A","Plectrovirus spv1-c74 ORF 12 transmembrane protein This is a family of proteins expressed by Plectroviruses. The plectroviruses are single-stranded DNA viruses belonging to the Inoviridae. Except that it is a putative transmembrane protein the function is not known."
"DNA_binding_2",0.508272,0.425551,0.228221,"DNA binding","N/A","DNA-binding domain This domain, often found on ovate proteins, binds to single-stranded and double-stranded DNA. Binding to DNA is not sequence-specific [1]."
"GnRH",0.507812,0.420312,0.216553,"hormone activity","multicellular organismal development","Gonadotropin-releasing hormone NA"
"N36",0.505859,0.431641,0.226501,"N/A","N/A","36-mer N-terminal peptide of the N protein (N36) The arginine-rich motif of the N protein is involved in transcriptional antitermination of phage lambda. N36 forms a complex with boxB RNA by binding tightly to the major groove of the boxB hairpin via hydrophobic and electrostatic interactions forming a bent alpha helix [1]."
"E2",0.503906,0.414062,0.214844,"N/A","N/A","Bacteriophage E2-like protein Short conserved protein described in Lactococcus Bacteriophage c2 of 37 amino acids [1]."