Functional genomics of intein-containing proteins. (A) Dominant functional categories of proteins with inteins based on Clusters of Orthologous Groups (COGs). COG annotations are shown for bacteria (red bars) and archaea (blue bars). The frequency of proteins with inteins/COGs is shown above, and the frequency of proteins/COGs for each functional category within randomized data sets of proteins from bacteria and archaea is shown below. The frequency of intein-containing proteins in the top functional categories is indicated next to the arrows. Functional categories L (replication, recombination and repair) and F (nucleotide transport and metabolism) are dominant among intein-containing proteins in both bacteria and archaea. Functional categories are designated based on the conventional classification (Tatusov et al. 1997; Tatusov et al. 2003; Galperin et al. 2015) and are as follows: J, translation, ribosomal structure and biogenesis; K, transcription; D, cell cycle control, cell division, chromosome partitioning; M, cell wall/membrane/envelope biogenesis; N, cell motility; O, post-translational modification, protein turnover, chaperones; P, inorganic ion transport and metabolism; T, signal transduction mechanisms; C, energy production and conversion; E, amino acid transport and metabolism; G, carbohydrate transport and metabolism; H, coenzyme transport and metabolism; I, lipid transport and metabolism; Q, secondary metabolites biosynthesis, transport and catabolism; R, general function prediction only; S, function unknown; U, intracellular trafficking, secretion, and vesicular transport; V, defense mechanisms; W, extracellular structures; X, mobilome: prophage, transposons. (B) GO enrichment analysis for bacterial and archaeal intein-containing proteins. GO enrichment of 1,047 bacterial (red) and 502 archaeal (blue) intein-containing proteins was performed using WEGO (Ye et al. 2006). Enriched GO terms in binding and molecular function are shown.
DNA binding, ATP binding, and ATPase activity are the dominant GO terms among the intein-containing proteins from both bacteria and archaea. The percentage of associated proteins is indicated at the top for the dominant categories. CoF, cofactor; Me, metal clusters; Pr, protein; Ox/Red, oxidoreductase; Trans, transferase; Iso, isomerase; Lig, ligase; DA, deaminase.