ABSTRACT
Signal transduction is an essential process that allows bacteria to sense their complex and ever-changing environment and adapt accordingly. Three distinct major types of signal-transducing proteins (STPs) can be distinguished: one-component systems (1CSs), two-component systems (2CSs), and extracytoplasmic-function σ factors (ECFs). Since Actinobacteria are particularly rich in STPs, we comprehensively investigated the abundance and diversity of STPs encoded in 119 actinobacterial genomes, based on the data stored in the Microbial Signal Transduction (MiST) database. Overall, we observed an approximately linear correlation between the genome size and the total number of encoded STPs. About half of all membrane-anchored 1CSs are protein kinases. For both 1CSs and 2CSs, a detailed analysis of the domain architectures identified novel proteins that are found only in actinobacterial genomes. Many actinobacterial genomes are particularly enriched for ECFs. As a result of this study, almost 500 previously unclassified ECFs could be classified into 18 new ECF groups. This comprehensive survey demonstrates that actinobacterial genomes encode previously unknown STPs, which may represent new mechanisms of signal transduction and regulation. This information not only expands our knowledge of the diversity of bacterial signal transduction but also provides clear and testable hypotheses about their mechanisms, which can serve as starting points for experimental studies.
IMPORTANCE In the wake of the genomic era, with its enormous increase in the amount of available sequence information, the challenge has now shifted toward making sense and use of this treasure chest. Such analyses are a prerequisite to provide meaningful information that can help guide subsequent experimental efforts, such as mechanistic studies on novel signaling strategies. This work provides a comprehensive analysis of signal transduction proteins from 119 actinobacterial genomes. We identify, classify, and describe numerous novel and conserved signaling devices. Hence, our work serves as an important resource for any researcher interested in signal transduction of this important bacterial phylum, which contains organisms of ecological, biotechnological, and medical relevance.
INTRODUCTION
Bacterial survival critically depends on the ability to swiftly respond to environmental changes. To efficiently monitor the surrounding environment, microbial genomes encode numerous and highly diverse proteins that can sense a given extracellular stimulus, transmit the signal to the cytoplasm, and elicit a proper response. These signal-transducing proteins (STPs) can be divided into three major groups: one-component systems (1CSs), two-component systems (2CSs), and extracytoplasmic-function σ factors (ECFs).
The vast majority of STPs in bacteria are 1CSs. These systems are composed of a single protein that contains an input domain, which senses the stimulus, and an output domain, which elicits the response by binding nucleic acids, modifying proteins, or performing an enzymatic reaction (1). 2CSs, which are typically composed of a histidine kinase (HK) and a response regulator (RR), represent the second-most-abundant signaling principle. The HKs are usually membrane-associated proteins with an extracytoplasmic N-terminal input domain and a cytoplasmic C-terminal transmitter domain. Upon stimulus perception, the HKs autophosphorylate at a highly conserved histidine residue. This phosphohistidine then serves as a phosphoryl group donor to activate the cognate RR through phosphorylation of an invariant aspartate residue. RRs are usually soluble proteins that contain an N-terminal receiver domain, as the target site of the phosphotransfer, and a C-terminal output domain. The output domains of 2CSs are often phylogenetically related to those found in 1CSs (1) and hence also bind nucleic acids, modify proteins, perform some enzymatic activity, or, less frequently, bind other proteins (2). The third pillar of bacterial signal transduction is represented by ECFs. Like other σ factors, ECFs are components of the RNA polymerase holoenzyme that determine the promoter specificity (3). In contrast to the more complex and essential housekeeping σ factors, ECFs contain only two of the four conserved domains of σ70 proteins, termed σ2 and σ4, which are sufficient for interaction with the RNA polymerase core enzyme and for mediating promoter recognition. The activity of the ECFs is usually controlled by membrane-associated anti-σ factors (ASFs) that tightly bind (and thereby inactivate) the cognate ECFs (4). 
Upon perceiving an appropriate inducing stimulus, the ASFs are inactivated through modifications, conformational changes, or regulated proteolysis, thereby releasing the ECF to recruit RNA polymerase core enzyme and ultimately allowing transcription initiation from alternative and ECF-specific target promoters.
In the wake of the genomic era, with its enormous increase in the amount of available sequence information, the challenge has now shifted toward making sense and use of this treasure chest. Such analyses are a prerequisite to provide meaningful information that can help guide subsequent experimental efforts, such as mechanistic studies on novel signaling strategies. For STPs, this has created the need to group and classify them phylogenetically in order to identify conserved features that ultimately allow the development of hypotheses about their physiological roles and signaling mechanisms. Over the last decade, classification systems were proposed for 1CSs, HKs, RRs, and ECFs (1, 2, 5, 6). In 2005, Ulrich et al. proposed a classification of 1CSs based on the specific combinations of input and output domains (1). One year later, Galperin suggested classifying RRs based on their output domains (2). A functional grouping of HKs was based on the membrane topology, number of transmembrane (TM) helices, and sequential arrangement of the sensory domains within the N-terminal input domains (5). In the case of ECFs, a combination of sequence similarity, the domain architectures of both the ECFs and their cognate ASFs, genomic context conservation, and target promoter motifs was used to develop a classification scheme (6).
All of these studies indicated a number of unique features of STPs from actinobacterial genomes: for instance, they do not encode a number of RR types found in other bacteria (e.g., REC-SARP or NtrC type) and virtually lack chemotaxis-related proteins (2). Moreover, many actinobacterial genomes are particularly ECF rich and encode over a dozen unique ECF groups (6). For these reasons, and to account for the significantly increased number and diversity of actinobacterial genomes available now, we decided to comprehensively analyze the STP landscape of Actinobacteria. Our results significantly expand our knowledge compared to the earlier studies, which were based on the rather limited number of actinobacterial genome sequences available at that time. Our goal was to extract a comprehensive picture of how this phylum perceives the environment.
MATERIALS AND METHODS
Building the actinobacterial genome collection.
The phylogenetic tree presented in Fig. 1 was built from the set of 16S rRNA gene sequences of all genomes (see Table S1 in the supplemental material), retrieved via the NCBI Nucleotide Database (http://www.ncbi.nlm.nih.gov/nucleotide). The multiple-sequence alignment of the sequences was generated in Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) (7). Its gapless version was used to generate the phylogenetic tree, which was built in the BioEdit Sequence Alignment Editor (http://www.mbio.ncsu.edu/bioedit/bioedit.html) (8) using the neighbor-joining method (9) and visualized in Dendroscope (10) (available at http://ab.inf.uni-tuebingen.de/software/dendroscope/).
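As a minimal illustration of the "gapless" alignment step, the Python sketch below assumes that every alignment column containing a gap in any sequence is removed, and then computes uncorrected pairwise p-distances (fraction of differing sites), the kind of matrix a neighbor-joining program starts from. The function names are hypothetical, and the choice of p-distances is an assumption; the distance model applied by BioEdit is not stated here.

```python
from itertools import combinations


def strip_gapped_columns(alignment: dict[str, str], gap: str = "-") -> dict[str, str]:
    """Return a gap-free alignment by dropping every column that contains a gap."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    keep = [i for i in range(length) if all(s[i] != gap for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


def p_distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Uncorrected p-distance for every ordered pair of sequences."""
    dist = {}
    for a, b in combinations(alignment, 2):
        sa, sb = alignment[a], alignment[b]
        diffs = sum(x != y for x, y in zip(sa, sb))
        d = diffs / len(sa) if sa else 0.0
        dist[(a, b)] = dist[(b, a)] = d
    return dist
```

For example, `strip_gapped_columns({"A": "ACG-T", "B": "ACGAT", "C": "TCGAA"})` drops the fourth column, after which A and B are identical (distance 0) and A and C differ at 2 of 4 sites (distance 0.5).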
Characterization of one- and two-component systems.
For the complete set of proteins identified as 1CSs in the Microbial Signal Transduction (MiST) database (11), the protein annotation, organism, number of transmembrane helices (TMHs) predicted by TMHMM (available at http://www.cbs.dtu.dk/services/TMHMM/) (12), and conserved protein domains predicted by Pfam (http://pfam.xfam.org) (13) were extracted and used for a semiautomated classification with custom scripts written in MATLAB (The MathWorks Inc.). From the complete set of 1CSs, 1,999 proteins with at least one transmembrane helix were selected for further analysis. As a selection criterion, we considered TMH predictions by TMHMM Server v. 2.0 (12), which predicts only membrane-spanning TMHs. Note that the TMHs graphically represented in the MiST database (11) are predictions by the DAS-TMfilter server (http://mendel.imp.ac.at/sat/DAS/DAS.html) (14), which can include hydrophobic stretches only 2 amino acids long and thus lead to a large number of false positives in an automated screen. The selected proteins were then classified, based on their domain architecture (as predicted by Pfam [13]), into protein kinases, phosphatases, guanylate cyclases, and DNA- or RNA-binding proteins (Table 1; see Table S2 in the supplemental material).
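The selection and classification logic described above can be sketched as follows. This is a hypothetical Python re-expression, not the authors' MATLAB scripts: the rule table covers only a subset of the Pfam output domains listed in Table 1, the priority order (which makes a GGDEF-HD protein a phosphatase, as in group 1CS_2.2) is an assumption, and the final assignments in the study were manually curated.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule table: Pfam output domains mapped to the classes of Table 1.
# Classes are tested in this order, so the first matching class wins.
CLASS_RULES = [
    ("Kinases", ("Pkinase",)),
    ("Phosphatases", ("SpoIIE", "PP2C_2", "HD")),
    ("Guanylate cyclases", ("GGDEF", "EAL", "Guanylate_cyc")),
    ("DNA-binding proteins", ("GerE", "HTH", "TetR_N")),
    ("RNA-binding proteins", ("ANTAR",)),
]


@dataclass
class OneComponentSystem:
    protein_id: str
    tmhmm_helices: int       # membrane-spanning helices predicted by TMHMM 2.0
    pfam_domains: list[str]  # Pfam domains in N- to C-terminal order


def domain_matches(domain: str, key: str) -> bool:
    # "HTH" also covers numbered Pfam families such as HTH_18, HTH_25 and HTH_31.
    return domain == key or (key == "HTH" and domain.startswith("HTH_"))


def classify_membrane_1cs(protein: OneComponentSystem) -> Optional[str]:
    """Return a Table 1 class, "unclassified", or None for a soluble 1CS."""
    if protein.tmhmm_helices < 1:
        return None  # only proteins with at least one TMHMM-predicted TMH are kept
    for label, keys in CLASS_RULES:
        if any(domain_matches(d, k) for d in protein.pfam_domains for k in keys):
            return label
    return "unclassified"
```

Using TMHMM counts rather than the DAS-TMfilter predictions shown in MiST mirrors the selection criterion stated above; proteins left "unclassified" correspond to the rows that would need manual inspection.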
TABLE 1.
Group identifier | No. of proteins | Length[a] | Protein domain architecture[b] | Taxonomical span[c] | Conserved genomic context |
---|---|---|---|---|---|
Kinases | |||||
1CS_1.1 | 633 | 560 ± 136 | TMHn-Pkinase | Widespread | None |
 | | | Pkinase-TMHn | | |
 | | | TMHn-Pkinase-TMHn | | |
1CS_1.2 | 163 | 640 ± 61 | Pkinase-TMH1–2-PASTA1–5 | At, B, Cf, F | Transpeptidase, FtsW |
1CS_1.3 | 15 | 735 ± 106 | Pkinase-NHL1–4 | At, Cf, Dt, F, Pr | None |
1CS_1.4 | 12 | 852 ± 82 | Pkinase-TMH-WD402–7 | At, Cf, Cy, Dt, Pl, V | None |
1CS_1.5 | 9 | 609 ± 32 | Pkinase-TMH_PknH_C | At | None |
1CS_1.6 | 8 | 761 ± 37 | Pkinase-TMH-PQQ_22 | At, Cf, Cy, Dt | None |
1CS_1.7 | 5 | 789 ± 13 | TMH2-PAP2-Pkinase-UPF0104 | At | None |
1CS_1.8 | 5 | 587 ± 83 | Pkinase-TMH-DUF4352 | At, Cf | None |
1CS_1.9 | 5 | 619 ± 26 | Pkinase-TMH-Lipoprotein_21 | At | None |
1CS_1.unclassified | 47 | NA | Various | NA | NA |
Phosphatases | |||||
1CS_2.1 | 117 | 392 ± 44 | SpoIIE-TMH | Widespread | None |
 | | | TMH-SpoIIE1–4/16 | | |
 | | | TMH-PP2C_2 | | |
1CS_2.2 | 35 | 527 ± 153 | TMH2–8-HD | Widespread | None |
 | | | TMH1–10-GGDEF-HD | | |
 | | | TMH-(7TMR-HDED)-7TM_7MR_HD-HD | | |
1CS_2.3 | 7 | 609 ± 52 | (TMH)-CHASE-TMH-HAMP-SpoIIE1–2 | At, Cy | None |
1CS_2.4 | 5 | 680 ± 36 | MASE1-(PAS/GAF)-SpoIIE | At, Cy, Pt, Sp | None |
1CS_2.unclassified | 6 | NA | Various | NA | NA |
Guanylate cyclases | |||||
1CS_3.1 | 192 | 419 ± 92 | TMH1–10-GGDEF | Widespread | None |
1CS_3.2 | 130 | 783 ± 131 | TMH1–10-GGDEF-EAL | Widespread | None |
 | | | TMH2/5-GGDEF2-EAL | | |
 | | | TMH2-GGDEF-EAL-TMH | | |
 | | | TMH10-GGDEF-TMH9-GGDEF-EAL | | |
1CS_3.3 | 118 | 548 ± 67 | TMH2–7-HAMP-Guanylate_cyc | Widespread | None |
1CS_3.4 | 43 | 931 ± 110 | TMH1–2/5–6/8–9-PAS1–2/4-GGDEF-EAL | Widespread | None |
 | | | TMH5–7-GAF-GGDEF-EAL | | |
 | | | TMH5-GGDEF-EAL-GAF1–2 | | |
1CS_3.5 | 9 | 757 ± 133 | MASE1-(PAS2–3)-(GAF)-GGDEF-(EAL) | Ac, At, Cy, F, Pr | None |
1CS_3.6 | 6 | 725 ± 27 | TMH1–3-HAMP-GAF-GGDEF | At, Cf, Cy, Dt, F, Nt, Pr | None |
1CS_3.7 | 6 | 641 ± 177 | TMH6–9-GAF/PAS1–3-GGDEF | Widespread | None |
1CS_3.8 | 6 | 543 ± 156 | TMH2–7/PTS_EIIC-(PAS)-EAL | Widespread | None |
1CS_3.9 | 5 | 1,362 ± 48 | TMH2-PAS-GGDEF-(TMH1–3)-(PAS)-GGDEF1–2 | At | Acyl-CoA dehydrogenase |
1CS_3.unclassified | 12 | NA | Various | NA | NA |
DNA-binding proteins | |||||
1CS_4.1 | 188 | 469 ± 88 | TMH1–13-GerE | Widespread | None |
 | | | TMH8-GerE-TMH12 | | |
1CS_4.2 | 62 | 372 ± 409 | HTH1–2-TMH1–8 | Widespread | None |
 | | | TMH1–4-HTH | | |
1CS_4.3 | 34 | 232 ± 49 | TetR_N-TMH1–2-(TetR_C) | Widespread | None |
 | | | TMH4-TetR_N | | |
1CS_4.4 | 19 | 282 ± 22 | HTH_25-TMH-DUF4115 | Widespread | FtsK, 2-methylthioadenine synthetase, CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase |
1CS_4.5 | 7 | 371 ± 75 | HTH_31-TMH-DUF2690 | At | None |
1CS_4.6 | 6 | 279 ± 97 | DUF2637-HTH | At | None |
1CS_4.7 | 6 | 368 ± 55 | TMH-DUF4066-HTH_18 | Ac, At, B, Cf, Cy, Df, F, Gm, Pl, Pr, Sp, V | None |
1CS_4.unclassified | 35 | NA | Various | NA | NA |
RNA-binding proteins | |||||
1CS_5.1 | 1 | 200 | TMH2-ANTAR | At, B, F, Fu, Nt, Pr, Sp, Sy, T | None |
[a] Amino acids (mean ± standard deviation).
[b] Protein domain designations as in the Pfam database. Note that when TMHs are not explicitly mentioned in the domain architecture, they are part of one of the assigned domains.
[c] Ac, Acidobacteria; Aq, Aquificae; Ar, Armatimonadetes; At, Actinobacteria; B, Bacteroidetes; Ca, Caldiserica; Cf, Chloroflexi; Ch, Chlorobi; Cl, Chlamydiae; Cr, Chrysiogenetes; Cy, Cyanobacteria; Df, Deferribacteres; Dg, Dictyoglomi; Dt, Deinococcus-Thermus; E, Elusimicrobia; F, Firmicutes; Fb, Fibrobacteres; Fu, Fusobacteria; Gm, Gemmatimonadetes; I, Ignavibacteriae; L, Lentisphaerae; M, Marinimicrobia; Nn, Nitrospinae; Nt, Nitrospirae; Pl, Planctomycetes; Pr, Proteobacteria; Sp, Spirochaetes; Sy, Synergistetes; T, Tenericutes; Td, Thermodesulfobacteria; Tt, Thermotogae; V, Verrucomicrobia. NA, not applicable. In this context, “widespread” refers to 19 to 31 bacterial phyla.
The complete set of identified RRs was investigated based on the nature of their output domains as predicted by Pfam (13). The number of proteins with each individual domain was manually determined. Proteins with uncommon domain architectures were further analyzed regarding their genomic context conservation and taxonomical span (see Table S3 in the supplemental material). The first was investigated using the tree-based genome browser tool in the MicrobesOnline database (15) (http://www.microbesonline.org; March 2011 update) and the second using the NCBI Conserved Domain Architecture Retrieval Tool (CDART) (16) (http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi).
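The per-domain counting of RR outputs, which was done manually in this study, can be sketched in Python as below. The receiver name `Response_reg` is the Pfam receiver domain; the example output-domain names, the function names, and the cutoff used to flag "uncommon" architectures are hypothetical choices for illustration.

```python
from collections import Counter

RECEIVER = "Response_reg"  # Pfam receiver (REC) domain


def tally_output_domains(rr_domains: dict[str, list[str]]) -> tuple[Counter, Counter]:
    """Count RRs per individual output domain and per full domain architecture.

    rr_domains maps a protein identifier to its Pfam domains (N- to C-terminal).
    A protein with two different output domains is counted once under each.
    """
    per_domain: Counter = Counter()
    per_architecture: Counter = Counter()
    for domains in rr_domains.values():
        output = [d for d in domains if d != RECEIVER]
        per_domain.update(set(output) or {"none"})
        per_architecture["-".join(domains)] += 1
    return per_domain, per_architecture


def uncommon_architectures(per_architecture: Counter, max_count: int = 2) -> list[str]:
    """Architectures seen at most max_count times; max_count is a hypothetical cutoff."""
    return sorted(a for a, n in per_architecture.items() if n <= max_count)
```

The flagged architectures would then be the candidates passed on to the genomic context (MicrobesOnline) and taxonomical span (CDART) analyses.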
Similarly, the complete set of proteins in our genome collection identified by the MiST database (11) as HKs were further analyzed. These proteins were preclassified based on their Pfam (13) domain architecture and transmembrane helices predicted by the TMHMM Server (12) using custom scripts written in MATLAB (The MathWorks Inc.). After manual validation of the classification, one representative of each group was used to evaluate the genomic context and taxonomical span (Table 2; see Table S4 in the supplemental material). As for response regulators, the first was investigated using the tree-based genome browser tool in the MicrobesOnline database (15) and the second using NCBI CDART (16).
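A minimal sketch of the preclassification step, assuming each HK has already been reduced to an N- to C-terminal list of features (TMHs and Pfam domains): consecutive identical features are collapsed into the copy-number notation used in Table 2, and proteins with identical strings fall into the same preliminary group. The function names are hypothetical, and the sketch ignores loop lengths between TMHs (e.g., "TMH-[50–300 aa]-TMH"), which the manual validation step would still have to resolve.

```python
from collections import defaultdict
from itertools import groupby


def architecture_string(features: list[str]) -> str:
    """Collapse an N- to C-terminal feature list into a Table 2-style string.

    Consecutive identical features are merged and given a copy number, e.g.
    ["TMH", "TMH", "HAMP", "HisKA", "HATPase_c"] -> "TMH2-HAMP-HisKA-HATPase_c".
    """
    parts = []
    for name, run in groupby(features):
        copies = sum(1 for _ in run)
        parts.append(name if copies == 1 else f"{name}{copies}")
    return "-".join(parts)


def preclassify(proteins: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group protein identifiers by identical architecture string."""
    groups: dict[str, list[str]] = defaultdict(list)
    for protein_id, features in proteins.items():
        groups[architecture_string(features)].append(protein_id)
    return dict(groups)
```

Exact-string grouping is deliberately strict; related architectures that differ only in TMH number (such as TMH4/5 in HK_17) would be merged afterward by hand.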
TABLE 2.
Group identifier | No. of proteins | Length[a] | Protein domain architecture[b] | Taxonomical span[c] | Predicted sensing | Notes[d] |
---|---|---|---|---|---|---|
HK_01 | 42 | 547 ± 36 | TMH-CHASE3-TMH-HAMP-HisKA-HATPase_c | At, Pr, Cy, B, Dt, F, Pl, V, Df | Extracytoplasmic | CHASE-CHASE6 sensor-like; GenCon: in operon with RRs |
HK_02 | 68 | 547 ± 32 | TMH-(PAS)-TMH-PAS-HATPase_c | Pr, F, At, Sy, Sp, Df | Extracytoplasmic | CitA/DcuS-like; GenCon: RR-HK-transporter |
HK_03 | 730 | 507 ± 106 | TMH-[50–300 aa]-TMH-HAMP-HisKA-HATPase_c | ND | Extracytoplasmic | Prototypical sensors |
HK_04 | 30 | 409 ± 24 | TMH-[60 aa]-TMH-HAMP-HisKA-HATPase_c | ND | Extracytoplasmic | NarX/Q-like |
HK_05 | 68 | 382 ± 26 | TMH-[30–40 aa]-TMH-HAMP-HisKA-HATPase_c | ND | Extracytoplasmic | PrmB-like |
HK_06 | 26 | 374 ± 28 | TMH-[25–30 aa]-TMH-HAMP-HisKA-HATPase_c | ND | Extracytoplasmic | VanS-like |
HK_07 | 384 | 461 ± 108 | TMH-HAMP-HisKA-HATPase_c | ND | Membrane? | |
HK_08 | 244 | 419 ± 60 | TMH3-(HAMP)-HisKA-HATPase_c | ND | Membrane? | |
HK_09 | 5 | 404 ± 117 | TMH0–5-HisKA-GerE | F, At, Ap, Pr | Membrane? | Overrepresented in Firmicutes |
HK_10 | 35 | 724 ± 141 | TMH2–6-(HisKA)-HATPase-TMH1–6 | At | Membrane | |
HK_11 | 71 | 435 ± 33 | PspC-TMH4–6-HATPase_c | At | Membrane | GenCon: downstream of pspC |
HK_12 | 26 | 722 ± 234 | TMH9-HisKA-HATPase_c | ND | Membrane | |
HK_13 | 6 | 735 ± 70 | TMH2–5-HisKA_3-(HATPase_c)-TMH3–5-HisKA-HATPase_c | ND | Membrane? | Restricted to Actinomycetales |
HK_14 | 11 | 541 ± 52 | TMH4-HisKA-HATPase_c-TMH | ND | Membrane | Restricted to Actinomycetales |
HK_15 | 4 | 701 ± 138 | TMH2-HAMP-HisKA-HATPase_c-TMH11–14 | ND | Extracytoplasmic or membrane | |
HK_16 | 139 | 421 ± 108 | TMH-[5–25 aa]-TMH-(HAMP)-HisKA-HATPase_c | ND | Membrane | LiaS-like |
HK_17 | 937 | 416 ± 43 | TMH4/5-HisKA-HATPase_c-(TMH) | ND | Membrane | DesK-like |
HK_18 | 162 | 441 ± 59 | TMH6/7-HATPase_c | ND | Membrane | ComD/AgrC fit descriptor |
HK_19 | 1 | 664 | TMH-7TMR_DISM_7TM-HisKA-HATPase_c | F, Pr, Sp, B, At | Membrane | |
HK_20 | 5 | 841 ± 204 | TMH11/12-HisKA-HATPase_c-(TMH) | ND | Membrane | PutP-like |
HK_21 | 41 | 606 ± 74 | TMH8/10-HisKA-HATPase_c | ND | Membrane | ComP-like |
HK_22 | 2 | 680 ± 3 | TMH8/10-PAS-HisKA-HATPase_c | ND | Membrane | Restricted to Propionibacterineae |
HK_23 | 2 | 795 ± 177 | MASE1-(PAS3)-HisKA-HATPase | At | Membrane/cytoplasm | |
HK_24 | 11 | 567 ± 58 | TMH3–10-GAF-HisKA-HATPase_c | ND | Membrane/cytoplasm | |
HK_25 | 2 | 721 ± 6 | TMH2/3-PAS-GAF-HisKA-HATPase_c | ND | Membrane/cytoplasm | Restricted to Mycobacterium |
HK_26 | 12 | 881 ± 67 | TMH2-PAS-GAF-SpoIIE-HATPase_c | ND | Membrane/cytoplasm | Restricted to Streptomyces |
HK_27 | 71 | 858 ± 30 | KdpD-Usp-TMH3-HisKA-HATPase_c | Pr, At, F, B, T | Cytoplasm | KdpD-like; GenCon: part of kdp operon |
HK_28 | 40 | 415 ± 73 | PAS-(HisKA)-HATPase_c | ND | Cytoplasm | NtrB-like |
HK_29 | 9 | 554 ± 106 | PAS2/3-(HisKA)-HATPase_c | ND | Cytoplasm | KinA-like |
HK_30 | 3 | 798 ± 32 | PAS-GAF-PHY-HisKA-HATPase_c | Pr, Cy, B, At, Pl, V | Cytoplasm | |
HK_31 | 100 | 499 ± 12 | H_kinase_N-PAS-HisKA-HATPase_c | At, F | Cytoplasm | |
HK_32 | 5 | 1,598 ± 5 | Pkinase-AAA_16-(TRP_2/GAF_2)-HisKA-HATPase_c | F, At, Cy | Cytoplasm | |
HK_33 | 30 | 485 ± 11 | cNMP_binding-HATPase_c | At, Pr, Cy, Ad, B, V, Cf, Dt, F, Nt, Ar | Cytoplasm | GenCon: near thioredoxin reductase |
HK_34 | 104 | 386 ± 103 | HisKA-HATPase_c | At, F, Pr, B, Sp, Cy, Cf, V | Cytoplasm | |
HK_35 | 806 | 186 ± 103 | HATPase | F, At, Pr, Cf, Sp, Cy | Cytoplasm | |
HK_36 | 7 | 217 ± 95 | HisKA | At, F, Pr, Sp, B, Cy, Cf | Cytoplasm | |
HK_37 | 5 | 249 ± 9 | STAS-HATPase_c | At, Pr, F, Sp, B, Tt, Cf | Cytoplasm | Overrepresented in actinobacteria; GenCon: downstream of extracellular solute-binding protein and HK |
 | | | HATPase-STAS | | | |
HK_38 | 170 | 514 ± 86 | GAF1/2-HisKA-HATPase_c | At, F, Pr, Cf, Cy, Dt, Sp | Cytoplasm | |
HK_39 | 10 | 643 ± 88 | PAS-GAF1/2-(PAS)-HisKA-HATPase_c | At, Pr, Cf, Cy, V, F, Dt | Cytoplasm | |
HK_40 | 170 | 790 ± 141 | PAS1/2-(GAF1/2)-SpoIIE-HATPase_c | At, F | Cytoplasm | |
HK_41 | 26 | 815 ± 97 | HATPase_c-(PAS)-GAF1/2-SpoIIE | At, V, Pr | Cytoplasm | Overrepresented in actinobacteria |
HK_42 | 29 | 486 ± 175 | HATPase_c-SpoIIE | Pr, At, Dt, Cy, F, Ad, Pl, V, B | Cytoplasm | GenCon: part of operon of regulators of sigma B activity |
HK_43 | 24 | 622 ± 146 | SpoIIE-HATPase_c | At, Pr, F, B, Pl, Sp | Cytoplasm | Overrepresented in actinobacteria |
HK_44 | 12 | 600 ± 74 | TMH-PAS-HisKA-HATPase | ND | Cytoplasm | |
HK_45 | 11 | 389 ± 84 | (HAMP)-HisKA-HATPase | ND | Cytoplasm | |
HK_46 | 167 | 979 ± 191 | TMH-NIT-HAMP-HATPase | At | Cytoplasm | GenCon: part of operon encoding proteins of unknown function |
HK_47 | 4 | 292 ± 53 | HATPase_c-HTH | At | Cytoplasm | |
HK_48 | 12 | 692 ± 134 | (PAS/PAS-GAF-PAS)-SpoIIE-HATPase_c-STAS_2 | At | Cytoplasm | |
HK_49 | 3 | 739 ± 3 | RsbU_N-PAS-SpoIIE-HATPase_c | At | Cytoplasm | GenCon: downstream of 4 anti-anti-sigma regulatory factors |
HK_50 | 34 | 325 ± 29 | MEDS-HATPase_c | At | Cytoplasm | |
HK_00 | 12 | | TMH2-CHASE3-TMH-HAMP-GAF-PAS-HisKA-HATPase_c | | | Unique architectures |
 | | | HK_CA-GAF-PAS-HisKA-HATPase_c | | | |
 | | | GAF-PAS2-HisKA-HATPase_c | | | |
 | | | HAMP-PAS-HisKA-HATPase_c | | | |
 | | | MASE1-SpoIIE-HATPase_c | | | |
 | | | HATPase_c-RRXRR-HNH | | | |
 | | | HATPase-DUF3883 | | | |
 | | | HATPase_c-PCMT | | | |
 | | | HATPase-DUF4325 | | | |
 | | | DNA_ligase_A_M-DNA_ligase_A_C-His_kinase | | | |
 | | | SSF-HisKA-HATPase_c | | | |
 | | | ABC_trans-HisKA-HATPase_c | | | |
[a] Amino acids (mean ± standard deviation).
[b] Protein domain designations as in the Pfam database. aa, amino acids.
[c] Ad, Acidobacteria; Ar, Armatimonadetes; At, Actinobacteria; B, Bacteroidetes; Cf, Chloroflexi; Cy, Cyanobacteria; Df, Deferribacteres; Dt, Deinococcus-Thermus; F, Firmicutes; Nt, Nitrospirae; Pl, Planctomycetes; Pr, Proteobacteria; Sp, Spirochetes; Sy, Synergistetes; T, Tenericutes; Tt, Thermotogae; V, Verrucomicrobia; ND, not determined.
[d] GenCon, genomic context conservation.
Classification of ECFs.
Of the 2,203 ECFs identified in the 119 actinobacterial genomes, 526 ECFs could not be associated with any of the ECF groups defined previously (6). These protein sequences were then further analyzed (see Tables S5 and S6 in the supplemental material). A multiple-sequence alignment was generated in Clustal Omega (7) from the sequences of all unclassified ECF σ factors, trimmed to contain only the conserved regions σ2 and σ4. The unrooted tree was generated from the gapless multiple-sequence alignment using the neighbor-joining method (9) implemented in the BioEdit Sequence Alignment Editor (8). The grouping was then manually performed on the resulting tree. As before, the genomic context analysis was performed using the tree-based genome browser tool in MicrobesOnline (15).
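For readers unfamiliar with the tree-building step, the sketch below implements the textbook neighbor-joining algorithm on a symmetric distance matrix (for example, distances computed from the trimmed, gap-free σ2/σ4 alignment). It is not BioEdit's implementation; the function name and the internal node labels `node0`, `node1`, ... are hypothetical (input labels must not reuse them), and the subsequent grouping of ECFs on the tree was done manually.

```python
from itertools import combinations


def neighbor_joining(labels: list[str], matrix: list[list[float]]) -> tuple[str, list]:
    """Textbook neighbor-joining on a symmetric distance matrix.

    Returns an unrooted Newick string and the list of joins as
    (node_i, node_j, branch_length_i, branch_length_j).
    """
    if len(labels) < 3:
        raise ValueError("need at least three sequences")
    d = {}
    for a, b in combinations(range(len(labels)), 2):
        d[(labels[a], labels[b])] = d[(labels[b], labels[a])] = float(matrix[a][b])
    nodes = list(labels)
    newick = {name: name for name in labels}  # Newick fragment for each active node
    joins = []
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = d[(i, j)] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[(i, j)] - li
        u = f"node{len(joins)}"
        for k in nodes:
            if k not in (i, j):
                d[(u, k)] = d[(k, u)] = (d[(i, k)] + d[(j, k)] - d[(i, j)]) / 2
        newick[u] = f"({newick[i]}:{li:g},{newick[j]}:{lj:g})"
        joins.append((i, j, li, lj))
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Resolve the last three nodes as an unrooted trifurcation.
    a, b, c = nodes
    la = (d[(a, b)] + d[(a, c)] - d[(b, c)]) / 2
    lb = (d[(a, b)] + d[(b, c)] - d[(a, c)]) / 2
    lc = (d[(a, c)] + d[(b, c)] - d[(a, b)]) / 2
    tree = f"({newick[a]}:{la:g},{newick[b]}:{lb:g},{newick[c]}:{lc:g});"
    return tree, joins
```

Because neighbor-joining does not assume a molecular clock, the result is an unrooted tree, which is why the ECF tree above is described as unrooted.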
Characterization of ECFs containing C-terminal extensions.
Four ECF groups consisted of ECFs that are longer than typical ECFs. A multiple-sequence alignment was built in Clustal Omega (7) from the complete protein sequences of these ECFs and of representative standard ECFs, which revealed their C-terminal extensions (not shown). The complete protein sequences of these ECF σ factors were then submitted to TMHMM Server 2.0 (12) and Pfam (13) for prediction of TMHs and identification of protein domains, respectively. Multiple-sequence alignments were then generated in Clustal W2 (17) (http://www.ebi.ac.uk/Tools/msa/clustalw2/) from trimmed protein sequences encompassing individual identified conserved domains and were visualized with CLC Sequence Viewer software (CLC bio). The amino acid frequency distribution in the C-terminal extensions was also calculated in the CLC Sequence Viewer software.
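The amino acid frequency distribution of a C-terminal extension amounts to a percent composition of the residues that follow the conserved ECF core. The Python sketch below assumes a hypothetical `core_length` cutoff marking where the σ2/σ4 core ends; in practice that boundary would be read off the alignment with standard ECFs, and this is not the CLC Sequence Viewer procedure itself.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extension_composition(sequence: str, core_length: int) -> dict[str, float]:
    """Percent amino acid composition of the residues after position core_length.

    core_length is a hypothetical cutoff marking the end of the sigma2/sigma4 core.
    Non-standard letters (e.g. X) are ignored.
    """
    extension = sequence[core_length:].upper()
    counts = Counter(aa for aa in extension if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (100.0 * counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
```

Comparing such profiles between the extensions and the core (or a proteome-wide background) would highlight residues that are enriched in the extensions.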
Identification and characterization of ASFs.
Protein sequences of conserved genes located next to and presumably cotranscribed with ECFs were retrieved from MiST (11) (see Table S7 in the supplemental material). They were then submitted to TMHMM Server 2.0 (12) and Pfam (13) for identification of transmembrane helices and protein domains, respectively. Multiple-sequence alignments were generated in Clustal W2 (17) and visualized with CLC Sequence Viewer software. Sequence logos of predicted segments located in the cytoplasm were generated in the WebLogo tool (18) (http://weblogo.berkeley.edu/logo.cgi) and illustrate the degree of amino acid conservation through graphical representation of a position weight matrix. Secondary-structure predictions for segments located in the periplasm were made in the PSIPRED Protein Sequence Analysis Workbench (available at http://bioinf.cs.ucl.ac.uk/psipred/) using the PSIPRED v3.3 prediction method (19) and graphically represented through the Prosite MyDomains tool (http://prosite.expasy.org/cgi-bin/prosite/mydomains/).
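The quantity a sequence logo displays can be sketched as follows: for each alignment column, the information content is R = log2(alphabet size) − H, where H is the Shannon entropy of the residue frequencies, and each letter is drawn with height equal to its frequency times R. This is a simplified Python sketch (the function name is hypothetical); it ignores gaps and omits the small-sample correction that WebLogo can apply.

```python
import math
from collections import Counter


def logo_heights(aligned: list[str], alphabet_size: int = 20) -> list[dict[str, float]]:
    """Letter heights (bits) per column of a sequence logo.

    R = log2(alphabet_size) - H per column; a letter's height is its frequency
    times R. Gaps are ignored and no small-sample correction is applied.
    """
    heights = []
    for column in zip(*aligned):
        counts = Counter(ch for ch in column if ch != "-")
        total = sum(counts.values())
        if total == 0:
            heights.append({})
            continue
        freqs = {ch: n / total for ch, n in counts.items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        info = math.log2(alphabet_size) - entropy
        heights.append({ch: p * info for ch, p in freqs.items()})
    return heights
```

With `alphabet_size=20` this applies to the ASF protein segments; passing `alphabet_size=4` gives the DNA case used for the promoter logos.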
Identification of group-specific target promoters.
First, a library of upstream regulatory sequences was generated for each new ECF group (see Table S8 in the supplemental material). All 250-nucleotide-long sequences located immediately upstream of the start codon of the first gene in the ECF σ factor-encoding operon were retrieved from MicrobesOnline (15) or MiST (11). MicrobesOnline's operon predictions were used, except in cases in which the analyzed genome was not part of that database. In such cases, ECF-encoding operons were defined, as previously (20), as all consecutive genes adjacent to the ECF σ factor gene in the same orientation and separated by less than 50 nucleotides. Second, BioProspector (21; http://ai.stanford.edu/~xsliu/BioProspector/) was used to identify overrepresented motifs in those sequences, mostly as described previously (20). The parameter settings used to search for two-block motifs (which need not occur in every input sequence) on the forward strand only were as follows: lengths of the upstream and downstream blocks (W and w, respectively), 5 to 7 nucleotides; minimum distance (g) and maximum distance (G) separating the two blocks, 15 to 19 nucleotides and 16 to 20 nucleotides, respectively. These parameters were iteratively varied to encompass all possible combinations in which the difference between the maximum and minimum distances separating the two blocks was not more than 1 nucleotide. Third, the 10 highest-scoring motifs selected from 40 reinitializations in each run were manually analyzed. The collection of 450 sequence motifs obtained for each ECF group was initially restricted to those in which the number of motif hits was equal to or lower than the number of input sequences. Then, for each remaining motif, the number of sequences with multiple motif hits was manually determined. From those motifs with the lowest number of sequences with multiple motif hits, the one found in the highest number of sequences and with the highest score was selected.
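The operon rule used above for genomes absent from MicrobesOnline, and the extraction of the 250-nucleotide upstream region, can be sketched in Python as follows. The `Gene` record, 1-based inclusive coordinates, and the function names are hypothetical assumptions; genes are assumed to be sorted by start coordinate.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive leftmost coordinate
    end: int     # 1-based, inclusive rightmost coordinate
    strand: str  # "+" or "-"


def ecf_operon(genes: list[Gene], ecf_index: int, max_gap: int = 50) -> list[Gene]:
    """Extend from the ECF gene over neighbours on the same strand whose
    intergenic distance is below max_gap nucleotides (genes sorted by start)."""
    strand = genes[ecf_index].strand

    def linked(left: Gene, right: Gene) -> bool:
        gap = right.start - left.end - 1  # negative when genes overlap
        return left.strand == right.strand == strand and gap < max_gap

    lo = hi = ecf_index
    while lo > 0 and linked(genes[lo - 1], genes[lo]):
        lo -= 1
    while hi < len(genes) - 1 and linked(genes[hi], genes[hi + 1]):
        hi += 1
    return genes[lo:hi + 1]


def upstream_region(genome: str, operon: list[Gene], length: int = 250) -> str:
    """The `length` nucleotides immediately upstream of the operon's first gene,
    returned 5'->3' on the coding strand."""
    if operon[0].strand == "+":
        start = operon[0].start
        return genome[max(0, start - 1 - length):start - 1]
    end = operon[-1].end  # on the minus strand, the first transcribed gene is rightmost
    return genome[end:end + length].translate(COMPLEMENT)[::-1]
```

Returning minus-strand regions as reverse complements keeps every input sequence in the same orientation, which is what a forward-strand-only motif search requires.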
Finally, the sequence logos were generated, using the WebLogo tool (18), from all the motif-containing sequences except those that contained additional, lower-scored motif hits (i.e., only one hit per sequence was used).
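The parameter sweep and motif-selection rules described above can be sketched as below. Two points are assumptions on our part: the grid fixes G = g + 1 (with W and w each in 5 to 7 and g in 15 to 19 this gives 45 runs, and 45 runs × 10 reported motifs = 450, matching the 450 motifs per ECF group), and the final choice is read as "most sequences first, then highest score". The `Motif` record and function names are hypothetical; BioProspector itself is not called here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


def parameter_grid() -> list[tuple[int, int, int, int]]:
    """(W, w, g, G) settings: block widths 5-7 nt, g in 15-19, G in 16-20, G = g + 1."""
    widths = range(5, 8)
    distances = [(g, g + 1) for g in range(15, 20)]
    return [(W, w, g, G) for W, w, (g, G) in product(widths, widths, distances)]


@dataclass
class Motif:
    score: float
    hits: dict[str, int]  # input sequence id -> number of motif hits in it


def select_motif(motifs: list[Motif], n_input: int) -> Optional[Motif]:
    """Apply the selection rules to one ECF group's candidate motifs."""
    # 1. Drop motifs with more hits in total than there are input sequences.
    kept = [m for m in motifs if sum(m.hits.values()) <= n_input]
    if not kept:
        return None

    def multi_hit(m: Motif) -> int:
        return sum(1 for n in m.hits.values() if n > 1)

    # 2. Keep those with the fewest sequences carrying multiple hits.
    fewest = min(multi_hit(m) for m in kept)
    kept = [m for m in kept if multi_hit(m) == fewest]
    # 3. Prefer the motif found in most sequences, breaking ties by score.
    return max(kept, key=lambda m: (sum(1 for n in m.hits.values() if n > 0), m.score))
```

Preferring motifs with few multiple-hit sequences favors candidates that behave like a single promoter per upstream region, which is also why only one hit per sequence is used for the logos.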
RESULTS AND DISCUSSION
In order to generate the genome collection for our analysis, all 299 actinobacterial genomes present in the MiST database (11) were selected. This initial set was then reduced by excluding unfinished draft genomes and by eliminating redundancy: for each species, only the genome encoding the highest number of STPs was retained. Of the genomes of Mycobacterium sp. strains JLS, KMS, and MCS, only that of strain JLS was retained because of the similarity between their STP profiles. The remaining set of 119 genomes, which was used for further analysis, is listed in Table S1 in the supplemental material. Information regarding the organisms' lifestyles, as well as abundances and distributions of STPs, was retrieved from the MiST database (11).
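The automatic part of this dereplication can be sketched in Python as below. The `GenomeRecord` fields and function name are hypothetical; ties keep the first record encountered, and manual decisions such as retaining only one of the closely related unnamed Mycobacterium strains are not captured by the rule.

```python
from dataclasses import dataclass


@dataclass
class GenomeRecord:
    accession: str   # hypothetical identifier
    species: str
    complete: bool   # False for unfinished draft assemblies
    stp_count: int   # total STPs reported by MiST


def dereplicate(genomes: list[GenomeRecord]) -> list[GenomeRecord]:
    """Keep complete genomes only and, per species, the one with the most STPs."""
    best: dict[str, GenomeRecord] = {}
    for genome in genomes:
        if not genome.complete:
            continue
        current = best.get(genome.species)
        if current is None or genome.stp_count > current.stp_count:
            best[genome.species] = genome
    return sorted(best.values(), key=lambda g: g.species)
```

Drafts are filtered before the comparison, so a draft with more STPs never displaces a finished genome of the same species.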
Distribution of STPs.
The analyzed actinobacterial genomes have GC contents ranging from 40 to 75%, are 0.9 to 12 Mbp in size, and encode between 808 and 10,022 proteins, with an average of 4,380 proteins per genome (see Table S1 in the supplemental material). Of these, on average about 10% are involved in signal transduction (see Table S9 in the supplemental material). The morphological, metabolic, and habitat diversity of these organisms (see Table S1 in the supplemental material) suggests that their genomes may encode a corresponding diversity of signal-transducing systems.
Our definition of the different types of STPs follows that of the MiST database (11). Briefly, 1CSs are single proteins that contain both input and output domains but lack the phosphotransfer domains typical of 2CSs. 2CSs comprise, first, HKs, defined as proteins that have a transmitter unit (consisting of the catalytic HATPase_c domain and the DHp domain as the site of autophosphorylation) but not a receiver domain (a more detailed description of the domains can be found in Table S10 in the supplemental material). The second component, RRs, is defined as proteins that contain a receiver but not a transmitter domain. Also included as 2CSs are hybrid histidine kinases (HHKs) and hybrid response regulators (HRRs), which are proteins that have both transmitter and receiver domains. HHKs contain transmitter domains located N terminal to the receiver domain, while HRRs have transmitter domains located C terminal to the receiver domain. Chemotaxis (Che) proteins are specific types of 2CSs that are defined and classified according to the presence of conserved protein domains (e.g., CheW, CheB, or CheD) (11, 22). Finally, ECFs are members of the σ70 family of σ factors that contain only the conserved regions σ2 and σ4 (23).
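The HK/RR/HHK/HRR definitions above reduce to a rule on the presence and order of transmitter and receiver domains, sketched below in Python. Treating either a DHp (`HisKA*`) or a catalytic (`HATPase*`) domain as marking a transmitter is our assumption, chosen because Table 2 includes HKs with only one of the two; the function names are hypothetical, and Che proteins are not handled.

```python
from typing import Optional

RECEIVER = "Response_reg"  # Pfam receiver domain


def is_transmitter(domain: str) -> bool:
    # Assumption: a DHp (HisKA, HisKA_3, ...) or catalytic (HATPase_c, ...) domain
    # is taken to mark a transmitter unit.
    return domain.startswith(("HisKA", "HATPase"))


def classify_2cs(domains: list[str]) -> Optional[str]:
    """Classify a protein from its N- to C-terminal Pfam domains."""
    t = [i for i, d in enumerate(domains) if is_transmitter(d)]
    r = [i for i, d in enumerate(domains) if d == RECEIVER]
    if t and not r:
        return "HK"
    if r and not t:
        return "RR"
    if t and r:
        # Transmitter N-terminal to the receiver -> hybrid HK; otherwise hybrid RR.
        return "HHK" if t[0] < r[0] else "HRR"
    return None  # neither domain type: not a 2CS component
```

Only the first transmitter and first receiver positions are compared, which is sufficient for the two-domain-type definitions given above.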
Of a total of 51,138 STPs, 77% (39,590) are 1CSs, 5% of which (1,957) are membrane associated. Eighteen percent (9,141) of all STPs are part of 2CSs, of which 54% (4,928) are HKs, 44% (4,032) are RRs, and 2% (181) are HHKs and HRRs. Only 0.4% (193) are chemotaxis-related proteins, while 4.3% (2,214) are ECF σ factors. While there is no strong relationship between the distribution of STPs and the organisms' morphologies, metabolisms, or habitats (see Table S1 in the supplemental material), a careful analysis of the distribution of STPs among taxonomical families revealed that the distribution is not homogeneous. While the families Actinomycetaceae, Bifidobacteriaceae, Coriobacteriaceae, and Corynebacteriaceae possess relatively low numbers of STPs, members of the families Nocardiaceae, Pseudonocardiaceae, and Streptomycetaceae are particularly STP rich (Fig. 1A and B). Given the genome size distribution within the respective families, this observation seems to be in line with previous reports describing a correlation between the genome size and the number of STPs (24, 25). Indeed, a positive and almost linear correlation (coefficient of determination [R²] = 0.94) between the total number of STPs and the genome size was also observed in the actinobacterial genomes (Fig. 2A).
However, some Mycobacterium spp. (Mycobacterium marinum M, Mycobacterium ulcerans Agy99, and Mycobacterium leprae TN) deviate from this rule by harboring fewer STPs than expected from their genome sizes alone. This could be due to genome reduction during evolution, which might bias the relationship between STPs and genome size determined here and depicted in Fig. 2. For example, M. leprae is an obligate intracellular pathogen that underwent substantial genome reduction during evolution, so that less than half of its genome consists of functional genes (26).
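The percentages reported above for the 51,138 STPs can be reproduced from the absolute counts given in the text; the Python check below uses only those numbers plus the genome count (119) and the average proteome size (4,380 proteins). The last value is a ratio of averages rather than the per-genome average of ratios, so it only approximates the "about 10%" figure.

```python
def percent(part: float, whole: float) -> float:
    """Share of `part` in `whole`, in percent, rounded to one decimal place."""
    return round(100 * part / whole, 1)


TOTAL_STPS = 51_138
STP_TYPES = {"1CS": 39_590, "2CS": 9_141, "chemotaxis": 193, "ECF": 2_214}
TWO_CS = {"HK": 4_928, "RR": 4_032, "HHK+HRR": 181}

# The subtotals add up exactly to the reported totals.
assert sum(STP_TYPES.values()) == TOTAL_STPS
assert sum(TWO_CS.values()) == STP_TYPES["2CS"]

shares = {name: percent(n, TOTAL_STPS) for name, n in STP_TYPES.items()}
two_cs_shares = {name: percent(n, STP_TYPES["2CS"]) for name, n in TWO_CS.items()}
membrane_1cs_share = percent(1_957, STP_TYPES["1CS"])
# Mean STPs per genome divided by mean proteins per genome (ratio of averages).
stps_per_protein = percent(TOTAL_STPS / 119, 4_380)
```

The results (77.4, 17.9, 0.4, and 4.3% by STP type; 53.9, 44.1, and 2.0% within 2CSs; 4.9% membrane-associated 1CSs; 9.8% STPs per protein) agree with the rounded values in the text.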
To gain more detailed insight into the distribution of STPs, we subsequently analyzed the distribution of each type of STP. Given that the vast majority of STPs are 1CSs, the same linear correlation (R² = 0.94) was observed between the genome size and the number of 1CSs (Fig. 2B), while this correlation was weaker (R² = 0.79) for 2CSs (Fig. 2C). This deviation is mainly caused by significantly increased numbers of HKs relative to RRs in 2CS-rich genomes (Fig. 2C, inset), which may indicate an increased need for, and hence capacity for, signal integration in more complex organisms. The Mycobacteriaceae and Nocardiaceae (e.g., Rhodococcus jostii RHA1 and Rhodococcus opacus B4) are 2CS-poor bacterial families. The weakest correlation with genome size (R² = 0.69) was observed for the ECFs (Fig. 2D), with four outliers that are particularly ECF rich relative to their genome sizes (Catenulispora acidiphila, Kribbella flavida, Amycolatopsis mediterranei, and Streptosporangium roseum). As a general trend, ECFs seem to be underrepresented (and often absent) in small genomes, while they tend to be enriched in organisms with large genomes. In fact, investigating the correlation between the different types of STPs revealed that in organisms with small genomes, the STPs are almost exclusively 1CSs and 2CSs, while actinobacteria with larger genomes and hence more complex lifestyles start accumulating other types of STPs, particularly ECFs and chemotaxis-related proteins (see Fig. S1 in the supplemental material).
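The R² values above quantify how well a straight line relates genome size to STP counts. The fitting procedure is not stated here; the Python sketch below assumes an ordinary least-squares line with intercept (x = genome size, y = number of STPs of a given type) and R² = 1 − SS_res/SS_tot. The function name is hypothetical, and no study data are included.

```python
def linear_fit_r2(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares line y = a + b*x and its coefficient of determination."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long series with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot
```

For a simple linear fit with intercept, this R² equals the squared Pearson correlation coefficient, so an R² of 0.69 for ECFs corresponds to a correlation of about 0.83.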
Below, we analyze each signaling principle separately, with a special focus on extracellular sensing. In doing so, we emphasize and highlight systems that are prominent in or even unique to the phylum Actinobacteria.
1CSs.
One-component systems represent the most abundant and the simplest signaling principle in bacteria: a single protein combines the stimulus-perceiving input domain with the output domain that mediates the cellular response (1). The vast majority of 1CSs are soluble regulatory proteins that respond to intracellular cues. Given our focus on the perception of environmental signals, we restricted our analysis to membrane-associated 1CSs (i.e., those proteins that contain at least one TMH according to TMHMM analyses), which comprise approximately 5% (1,957) of all 1CSs.
These proteins were then further classified according to their domain architectures. Based on their output domains, about 18% of them are nucleic acid (mostly DNA) binding proteins and 27% are involved in second-messenger sensing. Remarkably, the remaining half of the membrane-associated 1CSs are involved in mediating protein modification as their predicted output, mostly through functioning as Ser/Thr protein kinases (Table 1). Below, we highlight some prominent features of the identified 1CS groups.
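The selection criterion for membrane-associated 1CSs (at least one predicted TMH) can be illustrated with a minimal sketch. It assumes TMHMM 2.0 "short" output, in which each protein occupies one line with whitespace-separated key=value fields including `PredHel`; the protein IDs and values are hypothetical.

```python
# Minimal sketch: keep proteins with >= 1 predicted TMH.
# Assumption: TMHMM 2.0 short-format lines ("<id> len=... PredHel=N ...").
# Protein IDs and field values below are hypothetical.

def predicted_helices(tmhmm_line):
    """Return (protein_id, number_of_predicted_TMHs) from one short-format line."""
    fields = tmhmm_line.split()
    protein_id = fields[0]
    for field in fields[1:]:
        if field.startswith("PredHel="):
            return protein_id, int(field.split("=", 1)[1])
    raise ValueError(f"no PredHel field in line: {tmhmm_line!r}")

def membrane_associated(tmhmm_lines, min_tmh=1):
    """IDs of proteins with at least min_tmh predicted TMHs."""
    hits = []
    for line in tmhmm_lines:
        if not line.strip():
            continue  # skip blank lines
        protein_id, n_tmh = predicted_helices(line)
        if n_tmh >= min_tmh:
            hits.append(protein_id)
    return hits

example_output = [
    "prot_A\tlen=412\tExpAA=23.1\tFirst60=21.9\tPredHel=1\tTopology=i20-42o",
    "prot_B\tlen=230\tExpAA=0.4\tFirst60=0.1\tPredHel=0\tTopology=o",
]
print(membrane_associated(example_output))
```

Keeping the threshold as a parameter makes it easy to see how sensitive the 5% estimate would be to requiring two or more TMHs instead of one.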
Protein kinases.
Protein kinases are the predominant type of membrane-anchored 1CSs in the Actinobacteria. Among them, Ser/Thr kinases are the most common type (about 80% of all protein kinases). Bacterial Ser/Thr kinases, like their eukaryotic counterparts, can phosphorylate a myriad of substrates and thereby build complex signaling networks involved in diverse cellular processes, e.g., pathogenesis (27), cell division (28), control of gene expression (29), stress response (30), and quorum sensing (31). Most bacterial genomes encode few (if any) Ser/Thr kinases; higher numbers are found only in bacteria with more complex lifestyles, such as the Planctomycetes, in which 20 to 45% of 1CSs are Ser/Thr kinases (32). Although Actinobacteria are not as rich in protein kinases (on average, only 2% of their 1CSs are protein kinases), some families are particularly protein kinase rich: Frankiaceae (5.5%), Actinomycetaceae (4.9%), Nocardiopsaceae (4.4%), Bifidobacteriaceae (4.2%), and Microbacteriaceae (3.1%).
Based on their domain architectures, nine different groups of protein kinases can be distinguished among the 902 such proteins in Actinobacteria (Table 1), with about 800 belonging to one of two major groups. Protein kinases of group 1CS_1.1 contain a variable number of TMHs in addition to the kinase domain. Such architectures are widespread in the bacterial world, and no predictions can be made about the stimulus sensed or the physiological role of these kinases.
In contrast, the second-most-abundant group, 1CS_1.2, is well defined. These membrane-anchored kinases contain one to five extracellular PASTA domains (Fig. 3A; see Table S10 in the supplemental material) that are implicated in sensing cell wall components and in regulating aspects of cell wall homeostasis and remodeling (33). The best-understood example of this group is PknB of Mycobacterium tuberculosis, which senses muropeptides and mediates the exit of cells from dormancy (34). Genes encoding 1CS_1.2 proteins are frequently preceded by genes encoding penicillin-binding proteins and the cell cycle protein FtsW (Fig. 3D). Such kinases are also found in the phyla Bacteroidetes, Chloroflexi, and Firmicutes.
In contrast, a number of minor protein kinase groups, containing only a few proteins each, are restricted to the Actinobacteria. Group 1CS_1.5 is restricted to the family Mycobacteriaceae and contains an extracellular PknH_C domain with unknown function (Fig. 3A; see Table S10 in the supplemental material). Such domains are also found in numerous other proteins, such as a number of lipoproteins from M. tuberculosis. They contain two conserved cysteine residues that likely form a disulfide bridge (13). Group 1CS_1.7 is restricted to the order Actinomycetales and is characterized by a cytoplasmic PAP2 and a membrane-integral UPF0104 domain (Fig. 3A; see Table S10 in the supplemental material). While the first provides a putative phosphatase activity, the second is uncharacterized but contains a highly conserved proline-glycine motif. Group 1CS_1.9 is restricted to the family Frankiaceae and contains an extracytoplasmic lipoprotein_21 domain with unknown function (Fig. 3A; see Table S10 in the supplemental material). This domain is also found in some lipoproteins from mycobacteria, including LppP, which is required for optimal growth of M. tuberculosis (information retrieved from Pfam [13]).
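Architecture-based grouping can be sketched as a set of rules over ordered domain lists. The rules below are a simplified, hypothetical illustration covering only the two major kinase groups described above (1CS_1.1: kinase plus TMHs only; 1CS_1.2: kinase plus TMH and one to five PASTA domains); they are not the full classification behind Table 1. Domain names follow Pfam (`Pkinase`, `PASTA`), and `TM` marks a predicted TMH.

```python
# Simplified, hypothetical rule set for two membrane-anchored kinase groups.
# Architectures are N- to C-terminal lists of domain names; "TM" = predicted TMH.
# This is NOT the complete classification used for Table 1.

def kinase_group(architecture):
    """Assign an architecture to '1CS_1.1', '1CS_1.2', or 'unassigned'."""
    if "Pkinase" not in architecture:
        return "unassigned"
    other = [d for d in architecture if d not in ("Pkinase", "TM", "PASTA")]
    if other:
        return "unassigned"  # extra domains (e.g., PknH_C) would need their own rule
    n_pasta = architecture.count("PASTA")
    if 1 <= n_pasta <= 5 and "TM" in architecture:
        return "1CS_1.2"     # kinase + TMH + one to five PASTA domains
    if n_pasta == 0 and "TM" in architecture:
        return "1CS_1.1"     # kinase + variable number of TMHs only
    return "unassigned"

print(kinase_group(["Pkinase", "TM", "PASTA", "PASTA", "PASTA", "PASTA"]))
print(kinase_group(["TM", "TM", "Pkinase"]))
print(kinase_group(["Pkinase", "TM", "PknH_C"]))
```

Rules of this kind make group boundaries explicit; in practice, architectures falling outside every rule are the candidates for new or minor groups such as 1CS_1.5 to 1CS_1.9.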
Protein phosphatases.
Most of the 170 phosphatases belong to two of the four distinct phosphatase groups identified (Table 1). The domain architectures of both groups can be found in many other bacterial phyla, and no group-specific genomic context conservation was observed that could help shed light on the physiological roles of these proteins. The domain architecture of group 1CS_2.3 contains one sensory input domain (CHASE) and one signal transduction domain (HAMP), in addition to the phosphatase domain (SpoIIE) (Fig. 3A; see Table S10 in the supplemental material). Members of the phylum Cyanobacteria also encode such proteins. Members of the small group 1CS_2.4 contain a membrane-associated sensor domain (MASE1) (Fig. 3A; see Table S10 in the supplemental material) and are also found in the phyla Cyanobacteria, Proteobacteria, and Spirochaetes (Table 1).
Guanylate cyclases.
Guanylate cyclases are the second-most-abundant type of membrane-anchored 1CSs in Actinobacteria. The 527 proteins can be subdivided into nine groups, three of which, 1CS_3.1 to 1CS_3.3, contain more than 100 members each (Table 1). The domain architectures of most groups can be found in many other bacterial phyla, and no group-specific genomic context conservation was observed. Only the smallest group, 1CS_3.9, is actinobacterium specific, and its five members are derived from the genus Frankia (Fig. 3A and Table 1).
Of the 119 actinobacterial genomes in our data set, 88 encoded membrane-associated guanylate cyclases (MAGCs). Of these, 58% have between one and four such enzymes, and 34% contain between five and nine. Only 17% of the analyzed genomes encode 10 or more MAGCs. The most dramatic example is Kineococcus radiotolerans SRS30216, which encodes 42 MAGCs.
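The distribution of MAGCs per genome can be summarized by binning counts into the categories used above (1 to 4, 5 to 9, and 10 or more). A minimal sketch, assuming percentages are taken relative to MAGC-encoding genomes only; the per-genome counts are hypothetical. Because the bins partition those genomes, the three percentages sum to 100% by construction.

```python
# Minimal sketch: bin per-genome MAGC counts and report bin percentages
# relative to genomes that encode at least one MAGC. Counts are hypothetical.
from collections import Counter

BINS = ("1-4", "5-9", ">=10")

def bin_label(n_magc):
    """Map a per-genome MAGC count to its bin; None for genomes without MAGCs."""
    if n_magc < 1:
        return None
    if n_magc <= 4:
        return "1-4"
    if n_magc <= 9:
        return "5-9"
    return ">=10"

def magc_distribution(counts_per_genome):
    """Percentage of MAGC-encoding genomes in each bin."""
    bins = Counter(b for b in map(bin_label, counts_per_genome) if b is not None)
    total = sum(bins.values())
    if total == 0:
        return {label: 0.0 for label in BINS}
    return {label: 100.0 * bins[label] / total for label in BINS}

hypothetical_counts = [0, 1, 3, 4, 6, 12, 42, 0, 2, 7]
print(magc_distribution(hypothetical_counts))
```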
The reasons why bacteria harbor multiple guanylate cyclases have been debated for many years. As reviewed previously (35, 36), enzymes involved in cyclic di-GMP (c-di-GMP) metabolism might be expressed at different times, as is the case for Escherichia coli YhjH and Yersinia pestis HmsT, involved in motility and virulence regulation, respectively. Moreover, such enzymes might have distinct localization patterns contributing to distinct local concentrations, as is the case for the Salmonella species curli regulator CsgG and those involved in regulation of the Caulobacter crescentus cell cycle. Finally, as is also apparent in our classification (Table 1), diguanylate cyclase GGDEF domains might cooccur with different signal input domains (e.g., PAS, GAF, or MASE1), which might also reflect a higher potential for signal integration.
DNA-binding 1CSs represent the third-most-abundant type of membrane-anchored 1CSs in the Actinobacteria. The 357 proteins fall into seven groups, with more than half of all the proteins containing a GerE output domain (group 1CS_4.1) (Table 1; see Table S10 in the supplemental material). Group 1CS_4.4 is characterized by an N-terminal intracellular HTH_25 output domain and an extracellular C-terminal domain of unknown function (DUF4115), separated by a single TMH (Fig. 3A; see Table S10 in the supplemental material). This domain architecture can be found in many bacterial phyla, but only the 19 actinobacterial members additionally share a genomic context: genes encoding 1CS_4.4 proteins are frequently preceded by genes encoding FtsK-like DNA translocases, and two downstream genes, encoding MiaB-like 2-methylthioadenine synthetases and CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferases, are potentially cotranscribed (Fig. 3D). The physiological relevance of this conservation remains to be determined. Two small actinobacterium-specific groups of HTH_31-containing 1CSs, 1CS_4.5 and 1CS_4.6, differ in their putative input domains: the former contains an extracellular DUF2690 domain of unknown function, whereas the latter contains a membrane-embedded DUF2637 domain that might perceive a stimulus at or within the membrane interface (Fig. 3A; see Table S10 in the supplemental material). Neither group shows genomic context conservation.
Only a single membrane-anchored RNA-binding 1CS (with a C-terminal ANTAR output domain) was found across all 119 actinobacterial genomes (Table 1; see Table S10 in the supplemental material).
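The output-domain shares quoted for membrane-anchored 1CSs can be checked against the group totals given in the subsections above. The following worked arithmetic uses only those stated counts; the grouping into three output classes follows the text (protein modification = kinases plus phosphatases, second-messenger signaling = guanylate cyclases, nucleic acid binding = DNA- plus RNA-binding 1CSs).

```python
# Worked arithmetic using the group totals stated in the text (Table 1).
counts = {
    "protein modification": 902 + 170,   # protein kinases + phosphatases
    "second-messenger": 527,             # guanylate cyclases
    "nucleic acid binding": 357 + 1,     # DNA-binding + RNA-binding 1CSs
}
total = sum(counts.values())
shares = {k: round(100 * v / total, 1) for k, v in counts.items()}
print(total, shares)
# 1957 {'protein modification': 54.8, 'second-messenger': 26.9, 'nucleic acid binding': 18.3}
```

The totals add up exactly to the 1,957 membrane-associated 1CSs, and the resulting shares match the approximately 55%, 27%, and 18% given for the three output classes.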
2CSs.
In contrast to 1CSs, which combine input and output domains in a single polypeptide chain, 2CSs separate these two domains on two different proteins. The stimulus-perceiving input domain is usually located at the N-terminal end of the HK, while the output domain is found at the C-terminal end of the cognate RR. Signal transduction requires specific communication between the two partner proteins, which is based on phosphoryl group transfer: upon stimulus perception, the C-terminal HATPase_c catalytic domain mediates ATP-dependent autophosphorylation of a highly conserved histidine residue located in the DHp (dimerization and histidine phosphorylation) domain (see Table S10 in the supplemental material). Together, the DHp and HATPase_c domains form the transmitter unit that characterizes HKs of 2CSs (5). The phosphohistidine then serves as the phosphodonor for activating the cognate RR at an invariant aspartate residue located in its N-terminal REC (receiver) domain (see Table S10 in the supplemental material). This phosphotransfer usually results in dimerization of the RRs, thereby activating the C-terminal output domain. While most RRs are transcriptional regulators, RRs can also carry a variety of (often homologous) output domains similar to those observed for 1CSs (2). The separation of input and output on two proteins facilitates the response to extracellular cues and also allows signal integration and amplification in more complex regulatory cascades, best exemplified by the 2CS-dependent phosphorelay that orchestrates the commitment to sporulation in Bacillus subtilis (37). Accordingly, over 50% of 2CSs are predicted to connect environmental stimuli with cellular responses, based on the presence of extracytoplasmic input domains; this estimate is derived from a comprehensive analysis of the input domain architectures of over 4,500 HKs (5). Nevertheless, many 2CSs also respond to cellular cues (5).
Over 9,100 2CS proteins were extracted from the MiST database. Of these, 55% represent HKs, while the remaining 45% are classified as RRs (see Table S9 in the supplemental material). To further understand two-component signaling in actinobacteria, we analyzed each component (HKs and RRs) individually in detail. For both types of proteins, an approach similar to that outlined for the membrane-anchored 1CSs was applied: the proteins were grouped based on their domain architectures (Table 2; see Tables S3 and S4 in the supplemental material), and the phylogenetic distribution of each group was analyzed to identify actinobacterium-specific groups. Such groups of HKs and RRs are presented below.
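The sequence of events in a canonical 2CS can be captured by a deliberately simplified toy model: a stimulus triggers His autophosphorylation on the HK, the phosphoryl group is transferred to the Asp of the RR, and the phosphorylated RR becomes active. This is a conceptual sketch only (no kinetics, phosphatase activity, or crosstalk), and the class and function names are hypothetical.

```python
# Toy, deliberately simplified model of 2CS phosphotransfer (not a kinetic model).
# Class and function names are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class HistidineKinase:
    his_phosphorylated: bool = False

    def sense(self, stimulus_present: bool) -> None:
        # Stimulus -> ATP-dependent autophosphorylation of the conserved His (DHp),
        # catalyzed by the HATPase_c domain.
        if stimulus_present:
            self.his_phosphorylated = True

@dataclass
class ResponseRegulator:
    asp_phosphorylated: bool = False

    @property
    def output_active(self) -> bool:
        # Phosphorylated RRs (usually) dimerize, activating the output domain.
        return self.asp_phosphorylated

def phosphotransfer(hk: HistidineKinase, rr: ResponseRegulator) -> None:
    # Phosphohistidine donates its phosphoryl group to the invariant Asp (REC).
    if hk.his_phosphorylated:
        hk.his_phosphorylated = False
        rr.asp_phosphorylated = True

hk, rr = HistidineKinase(), ResponseRegulator()
hk.sense(stimulus_present=True)
phosphotransfer(hk, rr)
print(rr.output_active, hk.his_phosphorylated)
```

Keeping perception (`sense`) and output (`output_active`) on separate objects mirrors the key design point of 2CSs: additional components could be inserted between them, as in phosphorelays.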
Histidine kinases.
Of the 4,928 HKs extracted from the MiST database, 4,916 could be classified into 50 groups based on their domain architectures (Table 2; see Table S4 in the supplemental material). Some of these groups were found only in actinobacterial proteins and contain unusual domain architectures, as illustrated in Fig. 3B and described below.
In the 35 members of the actinobacterium-specific group HK10, the transmitter unit is flanked on both sides by 2 to 6 TMHs (Fig. 3B). This architecture suggests that these HKs sense their stimulus via these transmembrane regions. The mechanistic reason for this unusual domain architecture, particularly the function of the C-terminal TMHs, remains to be determined.
The input domain of group HK11 proteins also contains 4 to 6 TMHs, indicative of a membrane-associated sensing mechanism. Remarkably, the N-terminal TMH represents a conserved phage shock protein C (PspC) domain (Fig. 3B; see Table S10 in the supplemental material). In proteobacteria, PspC is indirectly involved in transcriptional (auto)regulation of its encoding pspABCDE operon. The resulting PSP response plays a significant role in competition for survival under nutrient- or energy-limited conditions (38). Another PspC-like protein is encoded by an upstream and divergently oriented gene, a genomic context that is conserved in most of the 71 members of HK11 (Fig. 3D). Together, these observations indicate that HK11 proteins may play an important role in orchestrating the phage shock protein-like response of Actinobacteria. This might represent a novel mechanism that seems to combine the function of proteobacterial PspC proteins as sensors/membrane anchors with control of the PSP-like Lia response of Firmicutes bacteria by unique LiaRS-like 2CSs (39, 40).
Proteins of the large (167-member) actinobacterium-specific group HK46 are anchored to the membrane by an N-terminal TMH. Their unifying hallmark is a cytoplasmic NIT domain (see Table S10 in the supplemental material) located directly C terminal to the TMH; NIT domains are normally associated with microbial responses to nitrate and nitrite (41). Additionally, genes encoding HK46 proteins are putatively cotranscribed with genes encoding conserved proteins of unknown function.
The actinobacterium-specific groups HK47 to HK50 all represent soluble HK-like proteins that contain only a HATPase_c domain but seem to lack the His-containing DHp domain (Fig. 3B). This architecture indicates a cytoplasmic sensing mechanism and potentially the phosphorylation of a DHp-containing partner protein, which remains to be identified. The three HK47 proteins contain a C-terminal DNA-binding output domain. Group HK48 is restricted to the order Actinomycetales, and the 12 proteins show a rather complex domain architecture, including PAS and GAF domains (see Table S10 in the supplemental material) that might play a sensory role (42, 43). The presence of a SpoIIE domain (found in phosphatases, adenylate cyclases, and sporulation proteins [44]) and a STAS_2 domain (often present in the C-terminal region of sulfate transporters and ASF antagonists [45]) (see Table S10 in the supplemental material) might indicate a unique sensing and signaling mechanism for group HK48 proteins that remains to be investigated.
The presence of SpoIIE and RsbU domains in group HK49 proteins (Fig. 3B; see Table S10 in the supplemental material) points to a role of these HKs in more complex phosphorelay cascades, e.g., in differentiation and/or general stress responses that might also involve alternative σ factors. This HK group is restricted to the order Actinomycetales. The hallmark feature of group HK50 proteins is the presence of a MEDS (methanogen/methylotroph DcmR sensory) domain (see Table S10 in the supplemental material) that likely functions in sensing hydrocarbon derivatives (46).
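Two of the unusual HK architectures described above can be recognized from ordered domain lists: a transmitter unit flanked by TMHs on both sides (as in HK10) and a HATPase_c domain without a DHp domain (as in HK47 to HK50). A minimal sketch, assuming DHp domains are annotated with Pfam names beginning with `HisKA` (e.g., `HisKA`, `HisKA_3`) and that `TM` marks a predicted TMH; the example architectures are hypothetical.

```python
# Minimal sketch: flag two unusual HK architectures in N- to C-terminal domain lists.
# Assumptions: DHp domains carry Pfam names starting with "HisKA"; "TM" = predicted TMH.

def hk_flags(architecture):
    """Return a set of architecture flags for one HK-like protein."""
    flags = set()
    if "HATPase_c" not in architecture:
        return flags
    catalytic = architecture.index("HATPase_c")
    dhp_positions = [i for i, d in enumerate(architecture) if d.startswith("HisKA")]
    if not dhp_positions:
        flags.add("HATPase_c without DHp")   # cf. groups HK47 to HK50
    # The transmitter unit starts at the DHp domain if present, else at HATPase_c.
    start = min(dhp_positions, default=catalytic)
    if "TM" in architecture[:start] and "TM" in architecture[catalytic + 1:]:
        flags.add("TMHs flank transmitter")  # cf. group HK10
    return flags

print(hk_flags(["TM", "TM", "HisKA", "HATPase_c", "TM", "TM"]))
print(hk_flags(["PAS", "GAF", "SpoIIE", "HATPase_c", "STAS_2"]))
```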
Response regulators.
The 4,042 actinobacterial RRs extracted from the MiST database were also analyzed and grouped according to their output domains (see Table S3 in the supplemental material). Eighty percent of all RRs contain either GerE or Trans_reg_C output domains (see Table S10 in the supplemental material). The first is a LuxR-type DNA-binding helix-turn-helix domain, while the second is a C-terminal transcription-regulatory domain that also plays a role in DNA binding (44). Some actinobacterium-specific types of RRs with unusual domain architectures are illustrated in Fig. 3C. They include two membrane-anchored RR types (RR1 and RR2) and two groups of soluble RRs characterized by the presence of an N-terminal Trans_reg_C domain and an additional bacterial transcriptional activator (BTAD) domain (see Table S10 in the supplemental material), which can be found in the DnrI/RedD/AfsR family of transcriptional regulators (47). The regulatory mechanisms of such unique types of RRs remain to be determined.
Chemotaxis proteins.
Chemotaxis is a special form of 2CS-dependent regulation that is characterized by a unique type of CheA-like HK and a number of typical protein domains restricted to chemotaxis regulation, including CheW, CheZ, and CheR. All the proteins containing such domains are classified as chemotaxis proteins and have been extracted from the MiST database (see Table S9 in the supplemental material). An earlier study, based on only 17 actinobacterial genomes, concluded that chemotaxis proteins are absent from this phylum (2). Our own analysis of 119 actinobacterial genomes confirms that chemotaxis proteins are indeed very rare. Nevertheless, a few noteworthy exceptions to this rule could be identified and are briefly discussed below.
The genomes of five motile actinobacterial species (Conexibacter woesei DSM 14684, K. radiotolerans SRS30216, Jonesia denitrificans DSM 20603, Cellulomonas fimi ATCC 484, and Mobiluncus curtisii ATCC 43063) encode complete sets of chemotaxis proteins, and their genes are located in the immediate vicinity of flagellar operons. Hence, these proteins might be involved in chemotactic motility. In addition, Nocardioides sp. strain JS614 also contains a complete set of chemotaxis proteins, even though the organism has been described as nonmotile (48). Moreover, a number of nonmotile actinobacteria contain relatively high numbers of chemotaxis-related STPs, e.g., 21 in Sanguibacter keddieii DSM 10542. This observation raises the question of whether, and for what purpose, these chemotaxis proteins function in these organisms. Three explanations can be envisaged.
First, an incomplete set of chemotaxis proteins might represent an intermediate stage of reductive evolution. These species might derive from a motile ancestor; after they adopted a sessile lifestyle, chemotaxis proteins were no longer required and were gradually lost. If this assumption is true, the process should ultimately result in the complete loss of genes encoding chemotaxis-related functions.
Second, such chemotaxis proteins might have acquired new regulatory functions that are no longer associated with motility. A few such cases have been described in the literature. The che3 operon of Myxococcus xanthus is required for differentiation (49). Moreover, it was suggested that one of the four che operons of Pseudomonas aeruginosa regulates some pathogenicity genes, while one che operon of Pseudomonas fluorescens seems to be involved in cellulose biosynthesis (50). In actinobacteria, we identified several examples that might be in line with this idea. Eight nonmotile organisms (Pseudonocardia dioxanivorans, A. mediterranei, and six mycobacteria) harbor only CheB- and/or CheR-like proteins, and the corresponding genes are located next to genes encoding STAS domain-containing proteins (see Table S10 in the supplemental material). This domain is found in sulfate transporters and bacterial anti-σ factor antagonists (45), suggesting that these CheB/CheR proteins might be involved in methyl-mediated signal transduction processes unrelated to motility.
Third, the apparently missing chemotaxis proteins might be present but only weakly conserved and hence misannotated. In these cases, the organisms would indeed use chemotaxis-related proteins for regulating their motility. Because of the high conservation of chemotaxis pathways in bacteria, this hypothesis is probably the least likely but nevertheless cannot be ruled out.
Extracytoplasmic-function σ factors.
ECFs represent the third pillar of bacterial signal transduction, with an average of six ECFs per bacterial genome (6). We previously analyzed more than 2,700 predicted ECFs from 369 microbial genomes belonging to 11 different phyla and defined 67 ECF groups based on the sequence similarity and domain architecture of both the ECFs and their cognate anti-σ factors, genomic context conservation, and putative target promoter motifs (6). Nevertheless, numerous actinobacterial ECFs could not be classified at the time. The significantly increased number of genomes now available prompted a phylum-specific reanalysis of actinobacterial genomes for the present study. We identified 2,203 ECFs in our collection of 119 actinobacterial genomes. Of these, 76% (1,677) belonged to one of the ECF groups defined in our initial study (6), while the remaining 24% (526) could not be classified. These were then subjected to further in-depth analyses, as described below.
ECF distribution and abundance.
In actinobacteria, the most abundant ECF groups are ECF01, ECF39, ECF41, and ECF42. Together, they account for more than half of all actinobacterial ECFs (Fig. 4; see Table S5 in the supplemental material) (6). Eight ECF groups (ECF14, ECF17, ECF19, ECF27, ECF36, ECF38, ECF39, and ECF40) could be found exclusively in the phylum Actinobacteria (6). They represent almost 20% of all actinobacterial ECFs.
Members of the Corynebacteriaceae, Coriobacteriaceae, Microbacteriaceae, Micrococcaceae, Bifidobacteriaceae, and Actinomycetaceae encode few ECFs of limited diversity (Fig. 1). For example, members of the families Coriobacteriaceae and Bifidobacteriaceae possess ECFs of only two ECF groups: ECF01 (both) and ECF30 (Coriobacteriaceae) or ECF12 (Bifidobacteriaceae). In contrast, Nocardioidaceae, Streptomycetaceae, and Catenulisporaceae are particularly ECF-rich families, with ECFs belonging to 15 to 20 different ECF groups (Fig. 1).
Identification and classification of novel ECF groups.
Given that a quarter of all actinobacterial ECFs could not be assigned to any of the initially defined ECF groups (see Table S5 in the supplemental material), we aimed to classify them. We pursued a strategy identical to the one used previously (6), relying on sequence similarity of the ECFs and genomic context conservation. We defined 18 new ECF groups (10 groups with more than 10 ECF sequences [ECF47 to ECF56] and 8 “minor” groups [ECF125 to ECF132], each containing fewer than 10 sequences). This allowed us to classify 427 of the 526 ECFs not covered by the previous classification (6). Hence, only about 4% of all actinobacterial ECFs remain unclassified (Fig. 4).
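The bookkeeping behind these ECF numbers can be reproduced from the counts stated in the text (2,203 ECFs, 1,677 initially classified, 427 newly classified, groups ECF47 to ECF56 and ECF125 to ECF132). The following worked arithmetic uses only those values.

```python
# Worked arithmetic using the ECF counts stated in the text.
total_ecfs = 2203
classified_initially = 1677                                      # original ECF groups
unclassified_initially = total_ecfs - classified_initially       # 526
newly_classified = 427                                           # new groups
still_unclassified = unclassified_initially - newly_classified   # 99
new_groups = len(range(47, 57)) + len(range(125, 133))           # 10 + 8 = 18

for label, n in [("initially classified", classified_initially),
                 ("initially unclassified", unclassified_initially),
                 ("still unclassified", still_unclassified)]:
    print(f"{label}: {n} ({100 * n / total_ecfs:.1f}%)")
print("new groups:", new_groups)
# initially classified: 1677 (76.1%)
# initially unclassified: 526 (23.9%)
# still unclassified: 99 (4.5%)
# new groups: 18
```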
The vast majority of the novel groups identified here are taxonomically restricted to the Actinobacteria (see Table S11 in the supplemental material). The exceptions are ECFs of groups ECF55, ECF56, and ECF127, which are also found in other phyla, e.g., the Firmicutes, Bacteroidetes, Proteobacteria, and Chloroflexi.
Descriptions of novel ECF groups. (i) ECF47.
ECFs of group ECF47 occur only in the Actinobacteria but are widely distributed within the phylum, having been identified in 19 out of the 39 actinobacterial families analyzed (Fig. 1; see Table S11 in the supplemental material). Genes encoding these ECFs are putatively cotranscribed with their cognate ASFs, but genomic context conservation does not go beyond the σ/anti-σ pair.
The ASFs are membrane associated via a putative alanine- and valine-rich transmembrane helix that shows low similarity to that of RskA (regulator of ECF19 SigK) proteins, as reflected by a hit of the Pfam domain RskA (see Table S10 in the supplemental material) and by the multiple-sequence alignment shown in Fig. 5. Moreover, we identified an N-terminal anti-σ domain (ASD), a structural motif with low sequence conservation (only 21% sequence identity over 63 aligned residues) that was defined based on structural studies of E. coli RseA and Rhodobacter sphaeroides ChrR (51). This domain is commonly found in both membrane-associated and soluble ASFs and is involved in σ/anti-σ interaction. In contrast to the N-terminal regions, the C-terminal periplasmic regions of these ASFs are very diverse both in sequence and in predicted secondary structure (Fig. 5; see Fig. S2 in the supplemental material).
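A figure such as "21% sequence identity over 63 aligned residues" depends on how identity is normalized. A minimal sketch of one common convention, assuming identity is computed over alignment columns in which neither sequence has a gap (other conventions divide by alignment length or by the shorter sequence); the aligned sequences are hypothetical. Under this convention, 13 identical positions out of 63 columns give about 21%.

```python
# Minimal sketch: percent identity over gap-free columns of a pairwise alignment.
# Assumption: this normalization convention; other definitions exist.

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identity_percent, n_aligned_columns) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0, 0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs), len(pairs)

# Hypothetical aligned fragments (one gap column is excluded).
pct, n_cols = percent_identity("ACDE-FG", "ACDQHFW")
print(f"{pct:.1f}% identity over {n_cols} aligned residues")
```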
(ii) ECF48.
ECFs of group ECF48 are also restricted to the phylum Actinobacteria (see Table S11 in the supplemental material). They possess unusually long C-terminal extensions that contain one putative transmembrane helix (Fig. 6). Thus, ECF48 proteins represent membrane-associated ECFs, a feature that has so far been found only in the planctomycete-specific group ECF01-Gob (32). Moreover, this C-terminal extension contains a conserved HXXXCXXC sequence motif (Fig. 6B) characteristic of the zinc-containing anti-σ (ZAS) domain (see Table S10 in the supplemental material), in which the two conserved cysteine residues usually coordinate a zinc ion. Upon zinc release, a disulfide bond is formed, causing a drastic conformational change in the ASF that ultimately leads to the release of the σ factor (52). The ZAS domain can be either redox sensitive or insensitive, which is mainly determined by the identities of the amino acid residues flanking the two conserved cysteine residues (53). In the ECF48-associated ZAS domains, these flanking residues do not support redox sensing. Additionally, ECF48 proteins contain long (208 ± 53 amino acids), putatively periplasmic, proline-rich regions (Fig. 6) whose relevance remains to be determined.
The domain architecture of ECF48 proteins can be interpreted in two different ways. Stimulus perception by the extracytoplasmic C-terminal domain may result in a conformational change that is then transduced through the membrane to activate the cytoplasmic ECF output domain. In this case, the ECF would stay intact and mediate transcription initiation from its site at the membrane. Alternatively, a stimulus could trigger regulated proteolysis to release the cytoplasmic ECF domain for transcription initiation. This hypothesis is supported by a recent report demonstrating regulated proteolysis of PP2192, an unusual membrane-anchored ECF07 protein from Pseudomonas putida, which is cleaved within its TM domain by RseP, releasing an active soluble ECF into the cytoplasm (54). Future experimental studies will be required to distinguish between these two possibilities.
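Candidate ZAS signatures can be located with a simple pattern scan for HXXXCXXC (His, any three residues, Cys, any two residues, Cys). A minimal sketch using Python's `re` module with a lookahead so that overlapping occurrences are reported; the protein sequence is hypothetical. The scan only finds the motif and does not assess the flanking residues that determine redox sensitivity.

```python
# Minimal sketch: locate HXXXCXXC (ZAS-like) motifs in a protein sequence.
# The lookahead reports overlapping hits; the example sequence is hypothetical.
import re

ZAS_MOTIF = re.compile(r"(?=(H.{3}C.{2}C))")

def find_zas_motifs(protein_seq):
    """Return (1-based start position, matched motif) for each HXXXCXXC hit."""
    return [(m.start() + 1, m.group(1))
            for m in ZAS_MOTIF.finditer(protein_seq.upper())]

print(find_zas_motifs("MSTHAEACPLCGRT"))
# [(4, 'HAEACPLC')]
```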
(iii) ECF49 to ECF51.
ECFs of groups ECF49 to ECF51 are exclusively present in Actinobacteria (see Table S11 in the supplemental material), and the vast majority of those of ECF51 are found in Micromonosporaceae and Streptomycetaceae (Fig. 1). These σ factors are putatively cotranscribed with their cognate ASFs, but genomic context conservation does not go beyond the σ/anti-σ pair. The putative cognate ASFs are likely membrane associated, and in the cases of groups ECF49 and ECF51, the alanine- and valine-rich transmembrane helix shows weak similarity to RskA proteins (Fig. 5). The remaining regions of these ASFs are very diverse in terms of sequence and predicted secondary structure (Fig. 5; see Fig. S2 in the supplemental material).
(iv) ECF52.
ECFs of group ECF52 occur only in the phylum Actinobacteria (see Table S11 in the supplemental material). Like ECF48 proteins, they possess long C-terminal extensions containing a redox-insensitive ZAS domain with its characteristic HXXXCXXC signature. However, ECF52 proteins contain variable numbers of transmembrane helices (between one and six, as predicted by TMHMM Server 2.0) and a long (397 ± 257 amino acids) proline-rich C-terminal extension (Fig. 6). In this region, we identified carbohydrate-binding domains (e.g., PF08305 and PF00553) (see Table S10 in the supplemental material). Together with the location of their encoding genes in the vicinity of genes encoding proteins involved in carbohydrate metabolism (e.g., glycosyl transferases, xylanases, or pyruvate dehydrogenases), this suggests a possible role of ECF52 σ factors in regulating some aspect of carbohydrate metabolism.
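Proline-rich regions such as those in ECF48 and ECF52 extensions can be located with a sliding-window composition scan. A minimal sketch, in which the window size and the minimum proline fraction are illustrative assumptions rather than values used in this study; the test sequence is hypothetical.

```python
# Minimal sketch: sliding-window scan for proline-rich segments.
# Window size and threshold are illustrative assumptions, not study parameters.

def proline_rich_windows(seq, window=30, min_fraction=0.25):
    """Return 1-based start positions of windows with proline fraction >= min_fraction."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1):
        segment = seq[start:start + window]
        if segment.count("P") / window >= min_fraction:
            hits.append(start + 1)
    return hits

example_seq = "A" * 20 + "P" * 10 + "A" * 20   # hypothetical sequence
print(proline_rich_windows(example_seq, window=10, min_fraction=0.5))
```

Overlapping hit windows can then be merged into contiguous regions whose lengths are comparable to the extension lengths given above.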
(v) ECF53.
ECF53 σ factors have a very narrow taxonomic distribution and are found almost exclusively in organisms that belong to the family Streptomycetaceae (Fig. 1). These σ factors constitute an unusual group, since they possess a conserved σ2 region but not a well-conserved σ4 region (Fig. 6). Moreover, a redox-insensitive ZAS domain (Fig. 6; see Table S10 in the supplemental material) was identified in their C-terminal extension. Beyond this, these extensions are highly variable between different ECF53 proteins and may contain one of several different domains with predicted enzymatic activities, e.g., glycosyl hydrolase catalytic core domains or α-L-arabinofuranosidase B domains (see Table S10 in the supplemental material). This observation again points to a potential role of these ECFs in regulating carbohydrate metabolism.
(vi) ECF54.
ECF54 proteins are restricted to the phylum Actinobacteria (see Table S11 in the supplemental material). Their genes are located either upstream or downstream of a small gene encoding a protein containing a carboxypeptidase-regulatory-like domain, a gene encoding a peptidase S8/S53, and a larger gene encoding a protein containing a C-terminal CHAT domain and three weakly conserved N-terminal tetratricopeptide repeats (Fig. 4). The CHAT domain (see Table S10 in the supplemental material) appears to be related to peptidases (information retrieved from InterPro [44]), and tetratricopeptide repeats are involved in protein-protein interactions (55). Two scenarios can be hypothesized: either (i) the protein containing the carboxypeptidase-regulatory-like domain regulates the ECF via the adjacently encoded peptidase, or (ii) it regulates only the peptidase, and the proximity of these genes to the ECF-encoding gene merely reflects that they are under the transcriptional control of that ECF.
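Genomic context conservation, as used throughout this section, can be quantified by asking how often a given neighbor annotation occurs within a few genes of each ECF gene. A minimal sketch, assuming each genomic region is represented as an ordered list of annotation labels (strand orientation and intergenic distances are ignored for simplicity); the labels and loci are hypothetical.

```python
# Minimal sketch: fraction of anchor loci with each annotation within +/- window genes.
# Assumptions: ordered label lists per region; strand and distance ignored.
# All labels and loci below are hypothetical.
from collections import Counter

def neighbor_conservation(loci, anchor="ECF54", window=3):
    """Map each neighbor annotation to the fraction of anchor genes it flanks."""
    counts = Counter()
    n_anchor = 0
    for genes in loci:
        for i, label in enumerate(genes):
            if label != anchor:
                continue
            n_anchor += 1
            nearby = set(genes[max(0, i - window):i] + genes[i + 1:i + 1 + window])
            nearby.discard(anchor)
            counts.update(nearby)  # each annotation counted once per anchor gene
    return {label: n / n_anchor for label, n in counts.items()} if n_anchor else {}

loci = [
    ["hyp", "CPR_like", "peptidase_S8", "ECF54", "CHAT_TPR"],
    ["ECF54", "CPR_like", "peptidase_S8", "CHAT_TPR", "tRNA"],
    ["ECF54", "transporter", "hyp2", "hyp3", "CHAT_TPR"],
]
print(neighbor_conservation(loci))
```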
(vii) ECF56.
ECFs of group ECF56 can be found in the phyla Actinobacteria, Proteobacteria, and Gemmatimonadetes and are consistently present in 22 out of the 39 actinobacterial families analyzed (Fig. 1; see Table S11 in the supplemental material). ECF σ factors of this group (336 ± 21 amino acids) contain the characteristic conserved σ2 and σ4 regions followed by a SnoaL_2 domain (Fig. 6; see Table S10 in the supplemental material). This domain, which occurs in a large number of bacterial sequences (information retrieved from InterPro [44]), was originally described in SnoaL-like proteins, polyketide cyclases such as the one involved in the biosynthesis of nogalamycin, an anthracycline antibiotic produced by Streptomyces nogalater (56). No additional genomic context conservation could be identified for this ECF group.
(viii) ECF125.
ECF125 σ factors are restricted to Actinobacteria (see Table S11 in the supplemental material). Their genes are located upstream of a gene encoding a putative metalloprotein from the “glyoxalase/bleomycin resistance protein/dioxygenase” superfamily (Fig. 4). Such metalloproteins can be involved in peptide antibiotic resistance (57) and detoxification of metabolic by-products (58). A putative cognate ASF was not identified for ECF125 σ factors. Instead, a transcriptional regulator of the TetR family is frequently encoded in the vicinity of ECF125 genes.
(ix) ECF126.
ECF126 σ factors can be found only in Actinobacteria (see Table S11 in the supplemental material). Their encoding genes are frequently located in the immediate vicinity of those encoding their cognate ASFs, a putative anti-anti-σ factor and a calcium-binding protein (Fig. 4). Their cognate ASFs are long (447 ± 58 amino acids) soluble proteins containing an N-terminal redox-insensitive ZAS domain and, C-terminally, a domain similar to the N-terminal domain of the mycothiol maleylpyruvate isomerase.
(x) ECF127 and ECF128.
ECF127 σ factors can be found in Actinobacteria and Chloroflexi, while the ECF128 σ factors are restricted to Actinobacteria (see Table S11 in the supplemental material). Genes encoding putative ASFs were not identified. Instead, ECF127 genes are located upstream of genes encoding Rieske proteins (see Table S10 in the supplemental material), which are iron-sulfur proteins of cytochrome complexes (59), and ECF128 genes are surrounded by genes encoding two membrane-associated proteins with unknown functions and a sortase (Fig. 4).
(xi) ECF130 and ECF132.
Both groups ECF130 and ECF132 are restricted to Actinobacteria, and their genes are linked to those encoding the cognate ASFs (see Table S11 in the supplemental material). While the ASFs of group ECF130 are small soluble proteins with an ASD, those of ECF132 are membrane-associated proteins with an N-terminal ASD and a proline-rich C terminus.
(xii) ECF55, ECF129, and ECF131.
ECFs of group ECF55 are found in the phyla Actinobacteria, Firmicutes, and Bacteroidetes. Within Actinobacteria, however, they are restricted to the family Coriobacteriaceae (Fig. 1). ECFs of groups ECF129 and ECF131 are restricted to Actinobacteria (see Table S11 in the supplemental material). For all three groups, no conserved genomic context was observed and no putative ASF was identified.
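Statements such as "no conserved genomic context was observed" rest on comparing gene neighborhoods across many ECF loci. The minimal Python sketch below illustrates one simple way such conservation could be quantified; it is not the procedure used in this study. It assumes that each locus is given as an ordered list of hypothetical protein-family labels together with the index of the ECF gene, and the window of ±3 genes and the 50% cutoff are arbitrary illustrative choices.

```python
from collections import Counter


def neighbor_conservation(loci, window=3):
    """Fraction of ECF loci in which each protein family occurs near the ECF gene.

    loci: list of (genes, idx) pairs, where genes is an ordered list of
    (hypothetical) family labels for one genomic region and idx is the
    position of the ECF gene in that list.
    window: number of genes on either side to inspect (assumed value).
    """
    counts = Counter()
    for genes, idx in loci:
        lo = max(0, idx - window)
        hi = min(len(genes), idx + window + 1)
        # Collect each family once per locus, excluding the ECF gene itself.
        nearby = {g for pos, g in enumerate(genes[lo:hi], start=lo) if pos != idx}
        counts.update(nearby)
    n = len(loci)
    return {family: c / n for family, c in counts.items()}


def conserved_partners(loci, window=3, min_fraction=0.5):
    """Families found near the ECF gene in at least min_fraction of loci."""
    fractions = neighbor_conservation(loci, window)
    return sorted(f for f, x in fractions.items() if x >= min_fraction)
```

For example, with three toy loci in which a TetR-family gene neighbors the ECF gene twice, `conserved_partners` returns `["TetR"]`. Counting each family at most once per locus keeps tandem duplications from inflating the score.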
Identification of group-specific ECF target promoter motifs.
The combined evidence from comparative genomic predictions and experimental studies strongly suggests that ECFs belonging to the same group recognize similar target promoters (60, 61). Because autoregulation of their own expression is a hallmark of most ECFs (62), such target promoters are expected upstream of the respective transcriptional units. They can therefore be identified by searching the promoter regions of the members of one ECF group for overrepresented bipartite sequence motifs.
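The idea of searching for overrepresented bipartite motifs can be illustrated with a deliberately naive Python sketch. It is not the motif-discovery algorithm used in this study: it simply enumerates every pair of k-mers (a hypothetical "-35" and "-10" box) separated by a spacer within an assumed length range and counts in how many upstream sequences each pair occurs. The box length k = 4 and the 14- to 18-nt spacer range are illustrative assumptions, not values taken from this work.

```python
from collections import Counter


def bipartite_motif_support(seqs, k=4, min_gap=14, max_gap=18):
    """For each (box1, box2) pair of k-mers separated by min_gap..max_gap nt,
    count the number of sequences containing that pair at least once."""
    support = Counter()
    for seq in seqs:
        seq = seq.upper()
        seen = set()  # count each pair once per sequence
        for i in range(len(seq) - k + 1):
            box1 = seq[i:i + k]
            for gap in range(min_gap, max_gap + 1):
                j = i + k + gap  # start of the second box
                if j + k > len(seq):
                    break  # larger gaps would also run off the end
                seen.add((box1, seq[j:j + k]))
        support.update(seen)
    return support


def top_motifs(seqs, n=5, **kwargs):
    """Return the n box pairs supported by the most sequences."""
    return bipartite_motif_support(seqs, **kwargs).most_common(n)
```

Dedicated motif-discovery tools additionally tolerate mismatches and score enrichment against a background model; exact k-mer counting as sketched here would miss degenerate promoter variants.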
We therefore attempted to identify the target promoters of the newly defined ECF groups (ECF47 to ECF56) and of previously described ECF groups for which no promoter had yet been identified (ECF118, ECF122, and ECF123). Putative promoter motifs were indeed identified for all of these ECF groups, albeit in only about 70% of the analyzed promoter sequences (Fig. 7; see Table S8 in the supplemental material). In 7% of those sequences, additional, albeit degenerate, putative promoter motifs were also found. In one-fifth of all promoter sequences, the predicted promoter was located so close to the start codon that the +1 position of the mRNA would lie within the ribosome binding site or even immediately upstream of the start codon. This observation indicates that (i) the putative promoter may not be a genuine promoter; (ii) a leaderless mRNA (lacking a ribosome binding site) is transcribed from such promoters, implying distinct strategies for regulating translation initiation (63); or (iii) the start codon of those ECFs is misannotated. The last possibility is supported by a global study of M. tuberculosis H37Rv, which demonstrated that about 7% of all coding sequences in this strain were indeed misannotated (64).
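The reasoning about promoters that lie too close to the start codon can be made concrete with a small Python sketch that estimates the length of the 5′ untranslated region (UTR) from a predicted -10 box and sorts each case into the scenarios discussed above. The coordinate convention, the assumed distance of 7 nt from the end of the -10 box to the +1 site, and the 15-nt minimum leader assumed to accommodate a ribosome binding site are all illustrative assumptions, not parameters from this study.

```python
from collections import Counter


def five_prime_utr_length(minus10_end, start_codon, tss_offset=7):
    """Estimate 5' UTR length in nt.

    Coordinates are 0-based on the coding strand, increasing in the direction
    of transcription. minus10_end is the last nt of the predicted -10 box,
    start_codon the first nt of the annotated start codon. tss_offset is an
    assumed distance from the end of the -10 box to the +1 site.
    """
    tss = minus10_end + tss_offset
    return start_codon - tss


def classify_leader(utr_len, min_leader=15):
    """Map an estimated UTR length onto the scenarios discussed in the text."""
    if utr_len < 0:
        # +1 inside the annotated CDS: misannotated start or spurious promoter
        return "tss_inside_annotated_cds"
    if utr_len == 0:
        # transcript would begin with the start codon: leaderless mRNA
        return "leaderless"
    if utr_len < min_leader:
        # leader too short to contain a complete ribosome binding site
        return "short_leader"
    return "canonical_leader"


def summarize(predictions):
    """Tally classes for (minus10_end, start_codon) pairs."""
    return Counter(classify_leader(five_prime_utr_length(m, s)) for m, s in predictions)
```

For instance, a -10 box ending at position 100 with a start codon at position 150 gives an estimated 43-nt leader and is classed as canonical, whereas a start codon at position 107 would yield a leaderless transcript under these assumptions.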
Final considerations.
In the last 5 years alone, the number of completed microbial genomes has tripled. Next-generation sequencing efforts have further expanded the available sequence space of unfinished draft genomes by almost an order of magnitude. This massive increase in sequence information considerably increases the complexity of comparative genomic analyses but also facilitates the grouping of previously unclassified proteins with similar characteristics, thereby enabling new hypotheses regarding their functions or mechanisms of action.
This is reflected in the study by Jogler et al. (32), in which the analysis of eight genome sequences of Planctomycetes resulted in the definition of eight new ECF groups, thereby classifying almost 80% of the Planctomycetes ECFs that could not be grouped by our original classification (6). The same was observed in the present study: the analysis of 119 actinobacterial genomes resulted in the identification of 18 new ECF groups. This allowed the classification of 81% of the actinobacterial ECFs that were not covered by any of the original ECF groups (5, 6, 32).
Actinobacteria can live in aquatic (65) or terrestrial (66) environments and can also be pathogenic to both animals (67) and plants (68). Moreover, the phylum includes rod-shaped as well as filamentous bacteria, and some of these organisms can differentiate into spores (69) or other dormant forms (70). These complex lifestyles are mirrored in the complexity of their signal transduction, both in the number and diversity of STPs shown here and in the signaling networks already elucidated (71). Our initial study on ECF classification (6), in which Actinobacteria were highlighted as one of the most ECF-rich phyla, is a good example of this. Similarly, the current study revealed six actinobacterium-specific 1CS groups, eight actinobacterium-specific HK groups, three actinobacterium-specific RR architectures, and seven new actinobacterium-specific ECF groups.
The data presented in this study, which are based on a sequence analysis of over 50,000 STPs, not only serve as a key resource for researchers interested in actinobacterial signal transduction but also provide a comprehensive platform for generating hypotheses regarding the biological roles and regulatory mechanisms of newly identified STP groups. (i) A group-specific protein domain architecture may provide crucial hints about the regulatory mechanism (e.g., as described for the role of HK11 in mediating the actinobacterial phage shock response or the regulatory relevance of C-terminal extensions in ECF group ECF48 or ECF52). (ii) Genomic context conservation may point to important accessory regulatory proteins or conserved target genes, as indicated in Fig. 3D and 4D for 1CSs and ECFs, respectively. (iii) Predicted ECF group-specific target promoter motifs (Fig. 7) are key to identifying the corresponding regulons. Together, these indications can help to formulate clear hypotheses that focus and streamline subsequent experimental efforts. Ultimately, the information provided in this report will help to mechanistically explore the abundance, diversity, and uniqueness of actinobacterial STPs that contribute to the remarkable adaptability of these organisms to complex environments and distinct lifestyles.
Supplementary Material
ACKNOWLEDGMENTS
This work was supported by a grant from the Deutsche Forschungsgemeinschaft (MA2837/2-2 to T.M.). D.P. receives funding from the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme (FP7/2007-2013) under REA grant agreement no. 628509. X.H. is the recipient of a stipend from the Chinese Scholarship Council (CSC).
Footnotes
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JB.00176-15.
REFERENCES
- 1. Ulrich LE, Koonin EV, Zhulin IB. 2005. One-component systems dominate signal transduction in prokaryotes. Trends Microbiol 13:52–56. doi: 10.1016/j.tim.2004.12.006.
- 2. Galperin MY. 2006. Structural classification of bacterial response regulators: diversity of output domains and domain combinations. J Bacteriol 188:4169–4182. doi: 10.1128/JB.01887-05.
- 3. Helmann JD, Chamberlin MJ. 1988. Structure and function of bacterial sigma factors. Annu Rev Biochem 57:839–872. doi: 10.1146/annurev.bi.57.070188.004203.
- 4. Mascher T. 2013. Signaling diversity and evolution of extracytoplasmic function (ECF) σ factors. Curr Opin Microbiol 16:148–155. doi: 10.1016/j.mib.2013.02.001.
- 5. Mascher T, Helmann JD, Unden G. 2006. Stimulus perception in bacterial signal-transducing histidine kinases. Microbiol Mol Biol Rev 70:910–938. doi: 10.1128/MMBR.00020-06.
- 6. Staroń A, Sofia HJ, Dietrich S, Ulrich LE, Liesegang H, Mascher T. 2009. The third pillar of bacterial signal transduction: classification of the extracytoplasmic function (ECF) sigma factor protein family. Mol Microbiol 74:557–581. doi: 10.1111/j.1365-2958.2009.06870.x.
- 7. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG. 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7:539. doi: 10.1038/msb.2011.75.
- 8. Hall TTA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98.
- 9. Saitou N, Nei M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425.
- 10. Huson DH, Richter DC, Rausch C, Dezulian T, Franz M, Rupp R. 2007. Dendroscope: an interactive viewer for large phylogenetic trees. BMC Bioinformatics 8:460. doi: 10.1186/1471-2105-8-460.
- 11. Ulrich LE, Zhulin IB. 2010. The MiST2 database: a comprehensive genomics resource on microbial signal transduction. Nucleic Acids Res 38:D401–D407. doi: 10.1093/nar/gkp940.
- 12. Sonnhammer EL, von Heijne G, Krogh A. 1998. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc Int Conf Intell Syst Mol Biol 6:175–182.
- 13. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer ELL, Eddy SR, Bateman A, Finn RD. 2012. The Pfam protein families database. Nucleic Acids Res 40:D290–D301. doi: 10.1093/nar/gkr1065.
- 14. Cserzö M, Eisenhaber F, Eisenhaber B, Simon I. 2002. On filtering false positive transmembrane protein predictions. Protein Eng 15:745–752. doi: 10.1093/protein/15.9.745.
- 15. Alm EJ, Huang KH, Price MN, Koche RP, Keller K, Dubchak IL, Arkin AP. 2005. The MicrobesOnline Web site for comparative genomics. Genome Res 15:1015–1022. doi: 10.1101/gr.3844805.
- 16. Geer LY, Domrachev M, Lipman DJ, Bryant SH. 2002. CDART: protein homology by domain architecture. Genome Res 12:1619–1623. doi: 10.1101/gr.278202.
- 17. Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680. doi: 10.1093/nar/22.22.4673.
- 18. Crooks GE, Hon G, Chandonia J, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Res 14:1188–1190. doi: 10.1101/gr.849004.
- 19. McGuffin LJ, Bryson K, Jones DT. 2000. The PSIPRED protein structure prediction server. Bioinformatics 16:404–405. doi: 10.1093/bioinformatics/16.4.404.
- 20. Rhodius VA, Segall-Shapiro TH, Sharon BD, Ghodasara A, Orlova E, Tabakh H, Burkhardt DH, Clancy K, Peterson TC, Gross C, Voigt C. 2013. Design of orthogonal genetic switches based on a crosstalk map of σs, anti-σs, and promoters. Mol Syst Biol 9:702. doi: 10.1038/msb.2013.58.
- 21. Liu X, Brutlag D, Liu J. 2001. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput 6:127–138.
- 22. Wuichet K, Alexander RP, Zhulin IB. 2007. Comparative genomic and protein sequence analyses of a complex system controlling bacterial chemotaxis. Methods Enzymol 422:1–31.
- 23. Gruber TM, Gross C. 2003. Multiple sigma subunits and the partitioning of bacterial transcription space. Annu Rev Microbiol 57:441–466. doi: 10.1146/annurev.micro.57.030502.090913.
- 24. Ulrich LE, Zhulin IB. 2007. MiST: a microbial signal transduction database. Nucleic Acids Res 35:D386–D390. doi: 10.1093/nar/gkl932.
- 25. Galperin MY. 2005. A census of membrane-bound and intracellular signal transduction proteins in bacteria: bacterial IQ, extroverts and introverts. BMC Microbiol 5:35. doi: 10.1186/1471-2180-5-35.
- 26. Cole ST, Eiglmeier K, Parkhill J, James KD, Thomson NR, Wheeler PR, Honoré N, Garnier T, Churcher C, Harris D, Mungall K, Basham D, Brown D, Chillingworth T, Connor R, Davies RM, Devlin K, Duthoy S, Feltwell T, Fraser, Hamlin N, Holroyd S, Hornsby T, Jagels K, Lacroix C, Maclean J, Moule S, Murphy L, Oliver K, Quail MA, Rajandream MA, Rutherford KM, Rutter S, Seeger K, Simon S, Simmonds M, Skelton J, Squares R, Squares S, Stevens K, Taylor K, Whitehead S, Woodward JR, Barrell BG. 2001. Massive gene decay in the leprosy bacillus. Nature 409:1007–1011. doi: 10.1038/35059006.
- 27. Parandhaman DK, Hanna LE, Narayanan S. 2014. PknE, a serine/threonine protein kinase of Mycobacterium tuberculosis initiates survival crosstalk that also impacts HIV coinfection. PLoS One 9:e83541. doi: 10.1371/journal.pone.0083541. [Retracted]
- 28. Molle V, Kremer L. 2010. Division and cell envelope regulation by Ser/Thr phosphorylation: Mycobacterium shows the way. Mol Microbiol 75:1064–1077. doi: 10.1111/j.1365-2958.2009.07041.x.
- 29. Parandhaman DK, Sharma P, Bisht D, Narayanan S. 2014. Proteome and phosphoproteome analysis of the serine/threonine protein kinase E mutant of Mycobacterium tuberculosis. Life Sci 109:116–126. doi: 10.1016/j.lfs.2014.06.013.
- 30. Kumar D, Palaniyandi K, Challu VK, Kumar P, Narayanan S. 2013. PknE, a serine/threonine protein kinase from Mycobacterium tuberculosis has a role in adaptive responses. Arch Microbiol 195:75–80. doi: 10.1007/s00203-012-0848-4.
- 31. Cluzel M-E, Zanella-Cléon I, Cozzone AJ, Fütterer K, Duclos B, Molle V. 2010. The Staphylococcus aureus autoinducer-2 synthase LuxS is regulated by Ser/Thr phosphorylation. J Bacteriol 192:6295–6301. doi: 10.1128/JB.00853-10.
- 32. Jogler C, Waldmann J, Huang X, Jogler M, Glöckner FO, Mascher T, Kolter R. 2012. Identification of proteins likely to be involved in morphogenesis, cell division, and signal transduction in Planctomycetes by comparative genomics. J Bacteriol 194:6419–6430. doi: 10.1128/JB.01325-12.
- 33. Yeats C, Finn RD, Bateman A. 2002. The PASTA domain: a beta-lactam-binding domain. Trends Biochem Sci 27:438. doi: 10.1016/S0968-0004(02)02164-3.
- 34. Mir M, Asong J, Li X, Cardot J, Boons G-J, Husson RN. 2011. The extracytoplasmic domain of the Mycobacterium tuberculosis Ser/Thr kinase PknB binds specific muropeptides and is required for PknB localization. PLoS Pathog 7:e1002182. doi: 10.1371/journal.ppat.1002182.
- 35. Povolotsky TL, Hengge R. 2012. "Life-style" control networks in Escherichia coli: signaling by the second messenger c-di-GMP. J Biotechnol 160:10–16. doi: 10.1016/j.jbiotec.2011.12.024.
- 36. Hengge R. 2009. Principles of c-di-GMP signalling in bacteria. Nat Rev Microbiol 7:263–273. doi: 10.1038/nrmicro2109.
- 37. Trach K, Burbulys D, Strauch M, Wu JJ, Dhillon N, Jonas R, Hanstein C, Kallio P, Perego M, Bird T. 1991. Control of the initiation of sporulation in Bacillus subtilis by a phosphorelay. Res Microbiol 142:815–823. doi: 10.1016/0923-2508(91)90060-N.
- 38. Brissette JL, Weiner L, Ripmaster TL, Model P. 1991. Characterization and sequence of the Escherichia coli stress-induced psp operon. J Mol Biol 220:35–48. doi: 10.1016/0022-2836(91)90379-K.
- 39. Darwin AJ. 2005. The phage-shock-protein response. Mol Microbiol 57:621–628. doi: 10.1111/j.1365-2958.2005.04694.x.
- 40. Mascher T. 2014. Bacterial (intramembrane-sensing) histidine kinases: signal transfer rather than stimulus perception. Trends Microbiol 22:559–565. doi: 10.1016/j.tim.2014.05.006.
- 41. Shu CJ, Ulrich LE, Zhulin IB. 2003. The NIT domain: a predicted nitrate-responsive module in bacterial sensory receptors. Trends Biochem Sci 28:121–124. doi: 10.1016/S0968-0004(03)00032-X.
- 42. Taylor BL, Zhulin IB. 1999. PAS domains: internal sensors of oxygen, redox potential, and light. Microbiol Mol Biol Rev 63:479–506.
- 43. Aravind L, Ponting CP. 1997. The GAF domain: an evolutionary link between diverse phototransducing proteins. Trends Biochem Sci 22:458–459. doi: 10.1016/S0968-0004(97)01148-1.
- 44. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, de Castro E, Coggill P, Corbett M, Das U, Daugherty L, Duquenne L, Finn RD, Fraser M, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, McMenamin C, Mi H, Mutowo-Muellenet P, Mulder N, Natale D, Orengo C, Pesseat S, Punta M, Quinn AF, Rivoire C, Sangrador-Vegas A, Selengut JD, Sigrist CJA, Scheremetjew M, Tate J, Thimmajanarthanan M, Thomas PD, Wu CH, Yeats C, Yong S-Y. 2012. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 40:D306–D312. doi: 10.1093/nar/gkr948.
- 45. Aravind L, Koonin EV. 2000. The STAS domain: a link between anion transporters and antisigma-factor antagonists. Curr Biol 10:R53–R55. doi: 10.1016/S0960-9822(00)00335-3.
- 46. Anantharaman V, Aravind L. 2005. MEDS and PocR are novel domains with a predicted role in sensing simple hydrocarbon derivatives in prokaryotic signal transduction systems. Bioinformatics 21:2805–2811. doi: 10.1093/bioinformatics/bti418.
- 47. Yeats C, Bentley S, Bateman A. 2003. New knowledge from old: in silico discovery of novel protein domains in Streptomyces coelicolor. BMC Microbiol 3:3. doi: 10.1186/1471-2180-3-3.
- 48. Coleman NV, Wilson NL, Barry K, Brettin TS, Bruce DC, Copeland A, Dalin E, Detter JC, Del Rio TG, Goodwin L, Hammon NM, Han S, Hauser LJ, Israni S, Kim E, Kyrpides N, Land ML, Lapidus A, Larimer FW, Lucas S, Pitluck S, Richardson P, Schmutz J, Tapia R, Thompson S, Tice HN, Spain JC, Gossett JG, Mattes TE. 2011. Genome sequence of the ethene- and vinyl chloride-oxidizing actinomycete Nocardioides sp. strain JS614. J Bacteriol 193:3399–3400. doi: 10.1128/JB.05109-11.
- 49. Kirby JR, Zusman DR. 2003. Chemosensory regulation of developmental gene expression in Myxococcus xanthus. Proc Natl Acad Sci U S A 100:2008–2013. doi: 10.1073/pnas.0330944100.
- 50. Wadhams GH, Armitage JP. 2004. Making sense of it all: bacterial chemotaxis. Nat Rev Mol Cell Biol 5:1024–1037. doi: 10.1038/nrm1524.
- 51. Campbell EA, Greenwell R, Anthony JR, Wang S, Lim L, Das K, Sofia HJ, Donohue TJ, Darst SA. 2007. A conserved structural module regulates transcriptional responses to diverse stress signals in bacteria. Mol Cell 27:793–805. doi: 10.1016/j.molcel.2007.07.009.
- 52. Li W, Bottrill AR, Bibb MJ, Buttner MJ, Paget MSB, Kleanthous C. 2003. The role of zinc in the disulphide stress-regulated anti-sigma factor RsrA from Streptomyces coelicolor. J Mol Biol 333:461–472. doi: 10.1016/j.jmb.2003.08.038.
- 53. Jung Y-G, Cho Y-B, Kim M-S, Yoo J-S, Hong S-H, Roe J-H. 2011. Determinants of redox sensitivity in RsrA, a zinc-containing anti-sigma factor for regulating thiol oxidative stress response. Nucleic Acids Res 39:7586–7597. doi: 10.1093/nar/gkr477.
- 54. Bastiaansen KC, Ibañez A, Ramos JL, Bitter W, Llamas M. 2014. The Prc and RseP proteases control bacterial cell-surface signalling activity. Environ Microbiol 16:2433–2443. doi: 10.1111/1462-2920.12371.
- 55. D'Andrea LD, Regan L. 2003. TPR proteins: the versatile helix. Trends Biochem Sci 28:655–662. doi: 10.1016/j.tibs.2003.10.007.
- 56. Torkkell S, Kunnari T, Palmu K, Mäntsälä P, Hakala J, Ylihonko K. 2001. The entire nogalamycin biosynthetic gene cluster of Streptomyces nogalater: characterization of a 20-kb DNA region and generation of hybrid structures. Mol Genet Genomics 266:276–288. doi: 10.1007/s004380100554.
- 57. Dumas P, Bergdoll M, Cagnon C, Masson JM. 1994. Crystal structure and site-directed mutagenesis of a bleomycin resistance protein and their significance for drug sequestering. EMBO J 13:2483–2492.
- 58. Sousa Silva M, Gomes RA, Ferreira AEN, Ponces Freire A, Cordeiro C. 2013. The glyoxalase pathway: the first hundred years and beyond. Biochem J 453:1–15. doi: 10.1042/BJ20121743.
- 59. Schneider D, Schmidt CL. 2005. Multiple Rieske proteins in prokaryotes: where and why? Biochim Biophys Acta 1710:1–12. doi: 10.1016/j.bbabio.2005.09.003.
- 60. Rhodius VA, Suh WC, Nonaka G, West J, Gross CA. 2006. Conserved and variable functions of the sigmaE stress response in related genomes. PLoS Biol 4:e2. doi: 10.1371/journal.pbio.0040002.
- 61. Dufour YS, Landick R, Donohue TJ. 2008. Organization and evolution of the biological response to singlet oxygen stress. J Mol Biol 383:713–730. doi: 10.1016/j.jmb.2008.08.017.
- 62. Helmann JD. 2002. The extracytoplasmatic function (ECF) sigma factors. Adv Microb Physiol 46:47–110.
- 63. Moll I, Grill S, Gualerzi CO, Bläsi U. 2002. Leaderless mRNAs in bacteria: surprises in ribosomal recruitment and translational control. Mol Microbiol 43:239–246. doi: 10.1046/j.1365-2958.2002.02739.x.
- 64. DeJesus M, Sacchettini JC, Ioerger TR. 2013. Reannotation of translational start sites in the genome of Mycobacterium tuberculosis. Tuberculosis 93:18–25. doi: 10.1016/j.tube.2012.11.012.
- 65. Makovcova J, Slany M, Babak V, Slana I, Kralik P. 2014. The water environment as a source of potentially pathogenic mycobacteria. J Water Health 12:254–263. doi: 10.2166/wh.2013.102.
- 66. Park Y, Kook M, Ngo HTT, Kim K-Y, Park S-Y, Mavlonov GT, Yi T-H. 2014. Arthrobacter bambusae sp. nov., isolated from soil of a bamboo grove. Int J Syst Evol Microbiol 64:3069–3074. doi: 10.1099/ijs.0.064550-0.
- 67. Vázquez-Boland JA, Giguère S, Hapeshi A, MacArthur I, Anastasi E, Valero-Rello A. 2013. Rhodococcus equi: the many facets of a pathogenic actinomycete. Vet Microbiol 167:9–33. doi: 10.1016/j.vetmic.2013.06.016.
- 68. González AJ, Trapiello E. 2014. Clavibacter michiganensis subsp. phaseoli subsp. nov., pathogenic in bean. Int J Syst Evol Microbiol 64:1752–1755. doi: 10.1099/ijs.0.058099-0.
- 69. McCormick JR, Flärdh K. 2012. Signals and regulators that govern Streptomyces development. FEMS Microbiol Rev 36:206–231. doi: 10.1111/j.1574-6976.2011.00317.x.
- 70. Shleeva M, Mukamolova GV, Young M, Williams HD, Kaprelyants AS. 2004. Formation of "non-culturable" cells of Mycobacterium smegmatis in stationary phase in response to growth under suboptimal conditions and their Rpf-mediated resuscitation. Microbiology 150:1687–1697. doi: 10.1099/mic.0.26893-0.
- 71. Flärdh K, Buttner MJ. 2009. Streptomyces morphogenetics: dissecting differentiation in a filamentous bacterium. Nat Rev Microbiol 7:36–49. doi: 10.1038/nrmicro1968.