Figure 1.
Example of a HAMAP protein family annotation template (family rule), MF_00074 (http://www.expasy.org/unirule/MF_00074). Annotation templates contain three sections: ‘General rule information’, ‘Propagated annotation’ and ‘Additional information’. The general information comprises the family identification number (MF_xxxxx); the dates of creation and revision; the ‘Data class’, which indicates that the family rule annotates the whole protein rather than a specific domain; and the ‘Predictors’, which contain the distribution of matches and the alignment that was used to generate the family profile. The ‘Propagated annotation’ section contains the information that is propagated to all members of a protein family, or only to some members when the field is preceded by ‘cases’ or ‘conditions’. For MF_00074, the function field differs depending on taxonomic origin, but all proteins have ‘Cytoplasm’ as subcellular location and all belong to the family ‘RNA methyltransferase rsmG’. This section also contains cross-references to other protein family databases, such as Pfam and TIGRFAMs, and manually selected GO terms. The additional information includes the size range of the family members; any related protein families; the list of characterized protein(s) used to compile information for the creation of the protein family and its annotation template (for MF_00074, literature is available for the proteins of Escherichia coli, Bacillus subtilis, Mycobacterium tuberculosis and Streptomyces coelicolor); the scope, i.e. the taxonomic groups covered by this family; whether in at least one member the protein is fused to another protein in either the N-terminal or C-terminal region; and whether there are duplicates or plasmid-encoded copies in some species. In the ‘UniProtKB rule member sequences’ section, the complete set of member proteins can be retrieved, the taxonomic distribution can be browsed, and specific subsets of proteins can be retrieved.
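
The conditional propagation described in the caption can be pictured with a minimal Python sketch. It is an illustration only, not the HAMAP rule format or pipeline: the class names, the `propagate` function, the taxon labels `TaxonA` and `TaxonB`, and the placeholder function texts are all hypothetical. Only the rule identifier, the ‘Cytoplasm’ location and the family name come from the caption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Protein:
    accession: str       # hypothetical identifier, for illustration only
    lineage: List[str]   # taxonomic groups the source organism belongs to


@dataclass
class Annotation:
    field_name: str
    value: str
    # None means the annotation is unconditional; otherwise it mimics a
    # 'case' that only applies to proteins from the named taxon.
    required_taxon: Optional[str] = None

    def applies_to(self, protein: Protein) -> bool:
        return self.required_taxon is None or self.required_taxon in protein.lineage


@dataclass
class FamilyRule:
    rule_id: str
    annotations: List[Annotation]


def propagate(rule: FamilyRule, protein: Protein) -> Dict[str, str]:
    """Return the annotations of `rule` that apply to `protein`."""
    return {a.field_name: a.value for a in rule.annotations if a.applies_to(protein)}


# Toy version of MF_00074: function depends on taxon (placeholder texts and
# taxon names are assumptions), location and family apply to every member.
RULE = FamilyRule(
    rule_id="MF_00074",
    annotations=[
        Annotation("function", "placeholder function text for TaxonA", "TaxonA"),
        Annotation("function", "placeholder function text for TaxonB", "TaxonB"),
        Annotation("subcellular_location", "Cytoplasm"),
        Annotation("family", "RNA methyltransferase rsmG"),
    ],
)

if __name__ == "__main__":
    member = Protein(accession="EXAMPLE_1", lineage=["TaxonA"])
    for name, value in propagate(RULE, member).items():
        print(f"{name}: {value}")
```

In this sketch a member from `TaxonA` receives the taxon-specific function together with the two unconditional fields, whereas a member from a taxon that matches no case receives only the unconditional fields.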