KEYWORDS: genome size, phylum-specific trends, riboswitches, sigma factors, transcription factors
ABSTRACT
In prokaryotes, the key players in transcription initiation are sigma factors and transcription factors that bind to DNA to modulate the process, while premature transcription termination at the 5′ end of genes is regulated by attenuation, in particular by attenuation associated with riboswitches. In this study, we describe the distribution of these regulators across phylogenetic groups of bacteria and archaea and find that their abundance depends not only on genome size, as previously described, but also on the phylogeny of the organism. Furthermore, we observed a tendency for organisms to compensate for a low frequency of one type of regulatory element (e.g., transcription factors) with a high frequency of another (e.g., sigma factors). This study provides a comprehensive description of the most abundant COG, KEGG, and Rfam families of transcriptional regulators present in prokaryotic genomes.
IMPORTANCE In this study, we analyzed the relationship between the relative frequencies of the primary regulatory elements in bacteria and archaea, namely, transcription factors, sigma factors, and riboswitches. In bacteria, we reveal compensatory behavior between transcription factors and sigma factors: in phylogenetic groups in which the relative number of transcription factors was low, the number of sigma factors tended to be high, and vice versa. For most of the phylogenetic groups analyzed here, except for Firmicutes and Tenericutes, no clear relationship with other mechanisms was detected for transcriptional riboswitches, suggesting that their low frequency in most genomes does not significantly affect the global variety of transcriptional regulatory elements in prokaryotic organisms.
INTRODUCTION
The regulation of gene expression is common to all living organisms and allows them to respond to intracellular and environmental changes. In general terms, gene expression in prokaryotes is regulated at the transcriptional and translational levels during the initiation, elongation, or termination stages (1, 2). Additionally, regulation can take place posttranscriptionally (e.g., mRNA processing or degradation [reviewed in references 3 and 4]) or posttranslationally (e.g., degradation or modification by phosphorylation, acetylation, hydroxylation, methylation, or glycosylation, among others [reviewed in reference 5]).
The three key players in prokaryotic transcriptional regulation are transcription factors (TFs), sigma factors, and riboswitches. TFs are proteins whose binding to DNA activates or represses gene transcription. Some TFs have dual activity, acting as activators or repressors depending on where they bind relative to the promoter. Based on protein sequence conservation, prokaryotic TFs are found in 91 groups in the COG (Clusters of Orthologous Groups) database (6–8) (see Table S1 in the supplemental material). These TF groups can be subdivided into smaller groups if, in addition to sequence similarity, the functions of the proteins and the cellular processes in which they participate are also considered. Thus, prokaryotic TFs are grouped into 369 KEGG orthology (KO) groups in the KEGG (Kyoto Encyclopedia of Genes and Genomes) database (9–11) (Table S1).
In addition to TF-based regulation, transcription in bacteria is also modulated by sigma factors, polypeptides required by the bacterial RNA polymerase holoenzyme to initiate transcription; they confer promoter recognition selectivity and participate in the initial steps of RNA synthesis. In Escherichia coli and other bacteria, the transcription of most genes, including all of the essential or “housekeeping” genes, depends on sigma70 (12). Nevertheless, under certain stresses or during developmental pathways such as sporulation, alternative sigma factors can redirect RNA polymerase toward particular sets of promoters, resulting in the coordinated transcription of large numbers of specific genes (13). According to the COG database, bacterial sigma factors can be clustered into four groups (Table S1). These groups can be further subdivided when the processes of the genes they regulate are considered, as in the KEGG database, where bacterial sigma factors are clustered into nine groups (Table S1). In contrast to bacteria, no sigma factors have been found in archaea. Instead, recruitment of their single RNA polymerase depends on the TATA-binding protein (TBP) and transcription factor B (TFB), the homolog of the eukaryotic transcription factor TFIIB (2, 14).
The regulation of gene expression in bacteria and archaea can also be modulated by a mechanism known as transcription attenuation, in which the cell senses a specific metabolic signal that causes RNA polymerase either to terminate transcription prematurely or to continue transcribing the downstream genes of an operon (15–18). In bacteria, this response is based on the formation, in the nascent transcript, of one of two mutually exclusive RNA hairpin structures: the terminator or a competing antiterminator (15–18). In contrast, transcription termination in archaea occurs in response to oligo(T)-rich sequences and seems to be independent of RNA secondary structure (19). Different molecular mechanisms are used to sense intracellular signals during attenuation. The first is ribosome-mediated transcription attenuation, in which the intracellular level of charged tRNAs determines whether a ribosome translates a short leader peptide with or without pausing. Pausing during translation of the leader peptide favors formation of the antiterminator, whereas translation without pausing favors formation of the terminator. This kind of attenuation regulates the transcription of the tryptophan biosynthetic operon in E. coli (20). The second is protein-mediated transcription attenuation, in which an RNA-binding protein either interacts with the nascent transcript and prevents formation of the antiterminator, as in the Bacillus subtilis pyrimidine nucleotide operon (21), or stabilizes the antiterminator, as in the E. coli bgl operon (22). The third is riboswitch-mediated transcription attenuation, in which the untranslated RNA leader folds into three-dimensional structures capable of sensing intracellular signals with high specificity and sensitivity in the complete absence of other factors, including proteins (23–25).
Riboswitches are composed of two platforms or domains: the recognition (sensor) domain and the expression domain. The recognition domain folds into three-dimensional structures that include highly selective binding pockets whose shape is complementary to that of their corresponding targets. The target molecules sensed by riboswitches are commonly small metabolites, including vitamin derivatives (thiamine pyrophosphate, flavin mononucleotide, adenosylcobalamin, etc.) (24–27), purines and their derivatives (guanine and adenine [28, 29]), amino acids (lysine and glycine [23, 30, 31]), and a phosphorylated sugar (27, 28). In addition to these compounds, a unique kind of riboswitch called the T-box senses uncharged tRNAs. Because it recognizes tRNAs rather than small molecules, the T-box was not originally classified as a riboswitch. However, given its similar mechanism of action and because it fulfills all of the other characteristics of riboswitches, the T-box is currently described as a riboswitch in the reference Rfam database and in many other articles. The T-box regulates the expression of genes encoding aminoacyl-tRNA synthetases and proteins involved in amino acid biosynthesis and transport (32–34).
The riboswitch recognizes its ligand through specific hydrogen bonds, electrostatic interactions, or stacking interactions. Ligand binding stabilizes the recognition platform, which commonly produces a conformational change in the expression platform; this platform lies adjacent to the sensor platform and is the active regulatory element of the riboswitch. The expression platform is commonly formed by a transcriptional or translational attenuator; however, on some occasions, it folds into a ribozyme or, in eukaryotes, defines alternative splicing patterns (35, 36).
A general relationship between the number of TFs and the genome size of model organisms was described 15 years ago by Cases et al. and Pérez-Rueda et al., who demonstrated an overrepresentation of TFs in larger genomes, usually from free-living organisms, and fewer TFs in the smallest genomes, commonly from intracellular organisms (37, 38). In this regard, these authors proposed that transcriptional regulation in intracellular and extremophilic bacteria relies almost exclusively on TFs, whereas in pathogenic bacteria, sigma factors contribute more, nearly as much as TFs do in free-living bacteria (39). Our analysis aims to provide an updated view of trends in transcriptional regulation by considering not only a larger and more representative set of organisms but also other key elements of transcriptional regulation, such as sigma factors and riboswitches. We identified compensatory behavior between TFs and sigma factors in bacteria. A similar compensatory behavior involving riboswitches was found only in the bacterial phyla Firmicutes and Tenericutes. Our study shows that the differential use of these regulatory elements may vary with the phylogenetic group to which the organisms belong. Our survey of the distribution of regulatory elements in prokaryotes is discussed in light of possible events during their evolution.
RESULTS AND DISCUSSION
The abundance of transcriptional regulators correlates with the genome size.
To elucidate how transcriptional regulators are distributed and whether their frequencies vary depending on the genome size of the organisms, we evaluated the total number of TFs and sigma factors in the genomic sequence of a set of 2,720 representative bacterial and archaeal organisms available in the KEGG database (see Table S2 in the supplemental material).
In the first instance, a protein was considered a TF or a sigma factor if its description corresponded to a transcription factor or a sigma factor according to the COG or KEGG databases (see Table S1 in the supplemental material). Note that while proteins in the COG database are grouped based on their sequence similarity, proteins in the KEGG database are grouped not only based on similarity but also considering up-to-date annotations of gene functions. Although this functional subdivision adds a degree of precision in identifying functional orthologs, the number of TFs with KO assignments is much lower than the number with COG classification. Thus, KEGG offers a more precise but limited view of their frequencies in the sequenced genomes. This limitation is less significant for sigma factors since most of their functions can be inferred based on amino acid sequence similarity (see Fig. S1 in the supplemental material).
We plotted the total number of each type of regulatory element versus the number of protein-coding sequences (CDS) in each genome. The rate of change between these two variables (the number of regulators versus the number of CDS) was expressed as the slope (m) of the regression line of the plotted values (Fig. 1A to C; see also Fig. S2 in the supplemental material). Because the number of CDS is expressed in units of 100, a value of m = 1 corresponds to a group of genomes with one regulatory element for every 100 CDS. Although an exponential regression model has been used previously to compare the number of TFs with genome size (37, 38), we used a linear model because it facilitates comparison of trends between phylogenetic groups without a substantial loss of fit.
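A minimal sketch of how such a slope and its Pearson coefficient can be computed, assuming each genome is summarized as a pair (number of CDS, number of regulators). The function name `fit_regulators_vs_cds` and the toy genome counts are hypothetical and only illustrate the unit convention: dividing the CDS count by 100 before an ordinary least-squares fit makes m read as regulators per 100 CDS.

```python
from math import sqrt

def fit_regulators_vs_cds(cds_counts, regulator_counts):
    """Ordinary least-squares fit of regulator number vs. CDS number (in hundreds).

    Hypothetical helper for illustration. Returns (slope m, intercept b, Pearson r).
    """
    if len(cds_counts) != len(regulator_counts) or len(cds_counts) < 2:
        raise ValueError("need at least two paired observations")
    # Express genome size in units of 100 CDS, so m = regulators per 100 CDS.
    x = [c / 100 for c in cds_counts]
    y = [float(r) for r in regulator_counts]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all genomes have the same CDS count")
    m = sxy / sxx
    b = mean_y - m * mean_x
    r = sxy / sqrt(sxx * syy) if syy > 0 else float("nan")
    return m, b, r

# Toy genomes (hypothetical numbers): exactly 6 TFs per 100 CDS plus 2.
cds = [1500, 3000, 4500, 6000]
tfs = [92, 182, 272, 362]
m, b, r = fit_regulators_vs_cds(cds, tfs)
print(f"m = {m:.2f} regulators per 100 CDS, r = {r:.3f}")  # m = 6.00 regulators per 100 CDS, r = 1.000
```

Applying the same function separately to the genomes of each phylum would give phylum-specific slopes of the kind reported in Fig. 1D to F.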
FIG 1.
The relationship between the abundance of transcriptional regulatory elements and genome size is phylum dependent. (A and B) The frequency of transcription factors (A) and sigma factors (B) in bacteria and archaea based on the COG database versus the number of CDS (in units of 100) per genome. (C) The abundance of rho-independent transcriptional terminator riboswitches in bacteria and archaea versus the number of CDS per genome. (D to F) The abundance of each regulator is split by phylum. The slope (m) from the linear regression model and the Pearson correlation coefficient (r) are shown for each phylum. The blue line in panel C corresponds to a linear regression model in which the number of riboswitches and the genome sizes are correlated. The red line in panel C corresponds to a linear regression model in which the number of riboswitches seemed to be invariant with respect to the genome size.
Consistent with previous results (37, 38), we observed that the number of genes coding for TFs varies in proportion to the size of the corresponding genomes (Fig. 1A; see also Fig. S2A). Regarding sigma factors, the numbers of genes coding for these regulatory elements in the genomes of our study were 37,271 and 32,391 according to the annotations in the COG and KEGG databases, respectively (see Tables S3 and S4 in the supplemental material). These genes follow a trend similar to that described for TFs; that is, the number of genes coding for sigma factors tends to vary in proportion to the total number of genes in the organism (Fig. 1B; see also Fig. S2B).
To search for riboswitches in the 5′ untranslated regions (UTRs) of genes, we used the covariance models defined in the Rfam database (40–42) and the CMsearch program (43). We searched the 5′ UTRs of all genes in the 2,720 representative genomes of bacteria and archaea and found 9,565 riboswitches (see Materials and Methods for further detail). To determine the level at which each riboswitch acts, we evaluated the sequence and secondary structure of its expression platform. We considered a riboswitch to act at the transcriptional level if its expression platform could fold into a rho-independent transcriptional terminator (bacteria) or contained an oligo(T)-rich sequence (archaea). Conversely, we considered a riboswitch to act at the translational level if its expression platform could fold into secondary structures capable of sequestering the Shine-Dalgarno region (see Materials and Methods). Of the 9,565 riboswitches identified, 4,099 were classified as transcriptionally acting riboswitches. These riboswitches belong to 32 different riboswitch families according to their Rfam classification (see Table S5 in the supplemental material). We quantified the number of different riboswitch families per genome and plotted this number against the number of CDS in the genomes. Unlike TFs and sigma factors, riboswitches are few in most prokaryotic genomes, and their number does not increase with genome size (Fig. 1C). Two different behaviors can be seen in this figure: in the first, the number of riboswitches correlates with genome size (Fig. 1C, blue line); in the second, the number of riboswitches is low and appears invariant with respect to genome size (Fig. 1C, red line).
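The classification rule described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the authors' pipeline: it assumes that the terminator and Shine-Dalgarno-sequestering predictions have already been produced by a separate RNA folding step and are passed in as booleans, that a hit meeting both criteria is counted as transcriptional (an ordering chosen here for illustration), and the oligo(T) threshold `min_run=4` and both function names are hypothetical.

```python
import re

def has_oligo_t(platform_seq: str, min_run: int = 4) -> bool:
    """Return True if the expression platform contains a run of >= min_run T (or U) bases.

    min_run=4 is a hypothetical threshold chosen for illustration.
    """
    seq = platform_seq.upper().replace("U", "T")
    return re.search(f"T{{{min_run},}}", seq) is not None

def classify_riboswitch(domain: str, platform_seq: str,
                        forms_terminator: bool, sequesters_sd: bool) -> str:
    """Assign a riboswitch hit to 'transcriptional', 'translational', or 'unassigned'.

    forms_terminator and sequesters_sd are assumed to come from a separate
    RNA secondary-structure prediction step (not shown).
    """
    if domain == "Bacteria" and forms_terminator:
        return "transcriptional"   # rho-independent terminator in the expression platform
    if domain == "Archaea" and has_oligo_t(platform_seq):
        return "transcriptional"   # oligo(T)-rich termination signal
    if sequesters_sd:
        return "translational"     # platform can hide the Shine-Dalgarno region
    return "unassigned"

print(classify_riboswitch("Archaea", "GCAUTTTTTAG", False, False))  # transcriptional
print(classify_riboswitch("Bacteria", "ACGGAGG", False, True))      # translational
```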
The ratio of transcriptional regulators versus genome size depends on the phylogenetic origin of the organisms.
The fact that the number of transcriptional regulatory elements increases in proportion to genome size does not necessarily imply that the relationship between these two variables is the same for all organisms. Our study aimed to explain the different trends in the frequencies of transcriptional regulator use in prokaryotes from an evolutionary perspective, taking into account the phylogenetic relationships among these organisms. We analyzed the genomic sequences of 2,518 bacteria and 202 archaea representative of their species and clustered them by phylum. The resulting graphs are shown in Fig. 1D to F. These figures show that the points fit their phylum-specific regression models better than the pooled data do and that the slope (m) varies considerably among phylogenetic groups. For example, for TFs, Actinobacteria (m = 7.80), followed by Verrucomicrobia (m = 6.14), Proteobacteria (m = 6.05), and Firmicutes (m = 5.44), were the phylogenetic groups with the highest ratio of TF-coding genes to CDS in their genomes. Conversely, Chlamydiae and Planctomycetes, both members of the same superphylum, were the groups with the smallest values, at m = 0.94 and 1.39, respectively (Fig. 1D). Note that although the slopes of these regression lines were low, the Pearson correlation coefficients (r) were significant. Regarding sigma factors (Fig. 1E), Planctomycetes was the phylogenetic group with the highest ratio of this kind of regulatory element per number of CDS (m = 2.07). By contrast, in Chlamydiae, the number of genes encoding sigma factors does not seem to have a significant relationship with the number of CDS per genome (m = −0.04).
Similar trends for riboswitches were observed only in Firmicutes and Tenericutes, which were by far the phylogenetic groups encoding the largest number of transcriptional riboswitches relative to the number of CDS in their genomes, as reflected in their slope values of 0.14 and 0.24, respectively (Fig. 1F). For the remaining phyla, the number of transcriptional riboswitches per genome was markedly lower (Spirochaetes, Proteobacteria, and Bacteroidetes) or zero (Crenarchaeota and Chlamydiae) and appeared to be unrelated to genome size, as indicated by regression slopes (m) close or equal to zero and low Pearson correlation coefficients (r) (Fig. 1F). A similar pattern is observed when counting all riboswitch copies per genome: Firmicutes and Tenericutes show both the highest copy numbers and increasing ratios of transcriptional riboswitches to genome size, mostly due to the overrepresentation of T-boxes in their genomes. In this case, the slopes of their regression lines are 0.60 and 0.57, respectively (see Fig. S3 in the supplemental material).
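The difference between counting distinct riboswitch families per genome (Fig. 1F) and counting every copy (Fig. S3) can be illustrated with a short sketch. The genome labels and hit lists below are hypothetical; the family names are real Rfam riboswitch classes, but the counts are invented. A genome carrying several T-box copies contributes one family but many copies, which is consistent with the larger copy-based slopes reported for Firmicutes and Tenericutes.

```python
from collections import Counter, defaultdict

# Hypothetical transcriptional riboswitch hits: (genome, Rfam family).
hits = [
    ("genome_A", "T-box"), ("genome_A", "T-box"), ("genome_A", "T-box"),
    ("genome_A", "TPP"), ("genome_A", "FMN"),
    ("genome_B", "TPP"),
    ("genome_C", "glycine"), ("genome_C", "glycine"),
]

def riboswitch_counts(hits):
    """Return {genome: (number of distinct families, total number of copies)}."""
    families = defaultdict(set)
    copies = Counter()
    for genome, family in hits:
        families[genome].add(family)
        copies[genome] += 1
    return {g: (len(families[g]), copies[g]) for g in copies}

for genome, (n_families, n_copies) in sorted(riboswitch_counts(hits).items()):
    print(genome, n_families, n_copies)
# genome_A 3 5
# genome_B 1 1
# genome_C 1 2
```

Genomes without any hit do not appear in the output; before fitting regression lines against genome size, such genomes would need to be added with counts of (0, 0).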
Trends in genomic frequency compensation for transcription factors, sigma factors, and riboswitches.
From the slope values of the regression lines mentioned above (Fig. 1; see also Fig. S2), a compensatory tendency can be observed between the genomic abundances of the different types of transcriptional regulators. To visualize these compensatory tendencies, we directly compared the number of TFs with the number of sigma factors per genome and fitted regression lines using the Theil-Sen estimator, which takes as the slope the median of the slopes of the lines through all pairs of data points (see Materials and Methods for further details) (Fig. 2; see also Fig. S4 in the supplemental material). The slope (m) of each of these regression lines represents the median number of TFs per sigma factor in the corresponding genomes.
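A minimal sketch of the Theil-Sen fit in plain Python, for illustration only (SciPy offers a library version as `scipy.stats.theilslopes`). The function name and toy data are hypothetical; x is the number of sigma factors and y the number of TFs per genome. The intercept is taken here as the median of y - m*x, which is one of several conventions. The toy data include one outlier genome to show why a median-based slope is less sensitive to extreme points than an ordinary least-squares slope.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Theil-Sen fit: slope = median of all pairwise slopes; intercept = median of y - m*x."""
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if xj != xi  # pairs with equal x carry no slope information
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    m = median(slopes)
    b = median(yi - m * xi for xi, yi in zip(x, y))
    return m, b

# Hypothetical genomes: sigma factors (x) and TFs (y); the last genome is an outlier.
sigma = [2, 4, 6, 8, 10]
tfs = [20, 40, 60, 80, 400]
m, b = theil_sen(sigma, tfs)
print(m, b)  # 10.0 0.0
```

Here the median slope stays at 10 TFs per sigma factor despite the outlier, whereas a least-squares line through the same points would be pulled much steeper.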
FIG 2.
Compensatory trends in the frequencies of transcription factors and sigma factors are phylum specific. The total numbers of sigma factors (x axis) and transcription factors (y axis) per genome are shown for each phylum. TFs and sigma factors were identified based on the descriptions in the COG database. The linear regression models were fitted using the Theil-Sen estimator. The slope (m), the median of the pairwise slopes, is shown for each phylum.
When TFs account for most of the regulators in a genome, sigma factors tend to be less represented in that genome. This is the case for Proteobacteria, with a median ratio of 14 TFs per sigma factor (Fig. 2A); Actinobacteria, with 9.27 TFs per sigma factor (Fig. 2B); and Spirochaetes, with 8.85 TFs per sigma factor (Fig. 2C). Conversely, when a phylum has fewer TF-coding genes, it has more sigma factor-coding genes. Some of the clearest examples of this trend are Bacteroidetes and Planctomycetes, with median ratios of 1.53 and 0.74 TFs per sigma factor, respectively (Fig. 2F and G). Furthermore, some phylogenetic groups have almost invariably low numbers of certain kinds of regulators, regardless of their genome size. The first example is sigma factors in Chlamydiae, whose members have only two or three sigma factors but variable numbers of TFs, resulting in the negative slopes shown in Fig. 2I and Fig. S4I. The second example is transcriptional riboswitches: with the exceptions of Firmicutes and Tenericutes, transcriptional riboswitches in bacteria and archaea tend to occur in small numbers. In many organisms, riboswitches act at the translational level; consequently, their contribution to the transcriptional compensation observed for TFs and sigma factors is not significant.
To determine whether the phylogenetic dependence of the relationship between genome size and the number of TF-coding genes observed at the phylum level also holds at finer taxonomic levels, we repeated our analysis at the class level. As shown in Fig. S5 in the supplemental material, class-specific rates and compensatory effects were observed. For example, Betaproteobacteria and Gammaproteobacteria had the highest ratios of TF-coding genes to sigma factor-coding genes. By contrast, Deltaproteobacteria tended to have the highest ratio of sigma factor-coding genes to TF-coding genes (Fig. S5).
We repeated this analysis in other phyla but did not observe trends similar to those found in Proteobacteria (data not shown). The class-specific rates observed in Proteobacteria could be explained by the enormous morphological, physiological, and metabolic diversity of this phylum (44).
Trends in the use of TF families in bacterial and archaeal genomes.
To assess whether the use of regulatory elements differs among phylogenetic groups, we calculated the frequency of each COG, KO, or Rfam family per phylum (see Tables S4 to S6 in the supplemental material) and performed Fisher's exact test (see Materials and Methods). The test compares the frequency of a TF family within a particular phylum with its frequency in the other phyla, such that an enrichment value, log10(odds ratio), greater than zero represents an overrepresentation of that TF family in the phylum compared with the other phyla, while a value less than zero reflects an underrepresentation. Note that a TF family with a high global frequency but homogeneously distributed across all phyla will tend to have an enrichment value close to zero, while a family with low abundance but present in only one phylum will have a high enrichment value. These enrichment values were used to construct heat maps comparing the trends in the different regulator families among phylogenetic groups. The frequencies of the TFs or sigma factors were evaluated using either the COG groups (Fig. 3A and C) or the KO groups (Fig. 3B and D) as a reference.
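The enrichment value can be sketched for a single family and phylum with a 2x2 contingency table. This is a minimal illustration, assuming a table layout in which a = members of family F in phylum P, b = members of other TF families in P, c = members of F in other phyla, and d = members of other TF families in other phyla; the exact layout used in the study is described in Materials and Methods and may differ. The p-value is the standard two-sided Fisher exact test written with integer arithmetic (in practice `scipy.stats.fisher_exact` performs the same test); the function names and counts are hypothetical.

```python
from math import comb, inf, log10

def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return inf if a * d > 0 else float("nan")
    return (a * d) / (b * c)

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Unnormalized probability of the table whose top-left cell is x.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    tail = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return tail / comb(n, row1)

def log10_enrichment(a, b, c, d):
    """log10(odds ratio): > 0 means overrepresented in the phylum, < 0 underrepresented."""
    ratio = odds_ratio(a, b, c, d)
    return -inf if ratio == 0 else log10(ratio)

# Hypothetical counts for one TF family in one phylum.
a, b, c, d = 8, 2, 1, 5
print(round(log10_enrichment(a, b, c, d), 3), round(fisher_two_sided(a, b, c, d), 4))  # 1.301 0.035
```

Computing the hypergeometric weights as exact integers avoids the floating-point tolerance that is otherwise needed when deciding which tables are "no more probable" than the observed one.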
FIG 3.
The enrichment of transcriptional regulator families is phylum dependent. (A and B) The enrichment of transcription factor families per phylum according to the COG (A) and KEGG (B) databases. (C and D) The enrichment of sigma factor families per phylum according to the COG (C) and KEGG (D) databases. (E) Enrichment of transcriptional riboswitches classified by their Rfam family. The color scale shows the log10 (odds ratio) value from the Fisher test.
Although the abundances of transcriptional regulatory elements may be biased by large differences among phyla in the number of sequenced organisms and their degree of characterization, our analysis of TF distribution by phylogenetic origin indicates that, to date, 91 TF families are defined by the orthologous relationships in the COG database. Most of these TF families are present in all of the phylogenetic groups studied here, although their proportions can vary significantly (Fig. 3A). The ability of TFs to recognize different intracellular or extracellular stimuli depends on the great diversity of their ligand-binding domains.
When TFs are classified according to COG groups, the most widely distributed TFs belong to the DtxR family of regulators (COG1321), the family of TFs with predicted helix-turn-helix (HTH)-like domains (COG2865), and the GntR family (COG1725). By contrast, some TF families are highly specific to particular phylogenetic clades; for example, a family of HTH-domain TFs (COG3373), a family of TFs involved in thiamine biosynthesis (COG1992), and a family of regulators of unknown function (COG1709) are specific to the archaeal domain, as represented by the phyla Euryarchaeota and Crenarchaeota. In Firmicutes, some of the most enriched TF families are those of the transcriptional regulator CodY (COG4465), the competence transcription factor ComK (COG4903), and a predicted transcriptional regulator containing CBS domains (COG4109). In Proteobacteria, the most enriched TF families were a predicted transcriptional regulator family (COG4957), the MltR family (COG3722), and a sigma54-dependent TF family (COG4650). The phyla with the fewest identified TFs were Verrucomicrobia, Planctomycetes, and Chlamydiae, which were grouped into one branch of the dendrogram shown in Fig. 3A, consistent with the fact that they belong to the Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) superphylum (45).
A similar TF enrichment analysis was performed using the KO groups as references. In this case, because of how KEGG defines its orthology groups, the grouping reflects not only orthologous relationships between proteins but also a common function defined by their regulatory target genes. According to this criterion, Proteobacteria is the phylum with the greatest diversity of TFs, with 301 different KO groups; by contrast, the phylum Chlamydiae has only 11 different KO groups of TFs. In addition, Proteobacteria, Firmicutes, Actinobacteria, and, to a lesser extent, Bacteroidetes and Euryarchaeota show a clear tendency to have phylum-specific KO groups of TFs (Fig. 3B; see also Table S7 in the supplemental material). We found that 25 of the 114 KO groups exclusively present in Proteobacteria belong to the LysR family, making it the most enriched class of TFs in the phylum, followed by the LuxR, AraC, and TetR/AcrR families, which were present in 21, 18, and 12 KO groups, respectively. In Firmicutes, 28 KO groups were exclusive, mainly members of the HTH-domain, MarR, and MerR TF families, which were present in 7, 4, and 3 KO groups, respectively. Among the KO groups found exclusively in Actinobacteria, 4 belong to the WhiB family. In addition, only two KO groups are exclusive to the Bacteroidetes phylum, belonging to the CRP/FNR family and the HTH-type TFs. Notably, Euryarchaeota and Crenarchaeota, both members of the Archaea domain, share 4 KO groups of TFs that are present only in these two phyla; most are annotated as putative TFs without an associated function. These TFs are excellent candidates for identifying new models of regulation in these organisms.
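Identifying phylum-exclusive KO groups, and groups shared only by a given set of phyla (such as the four shared only by Euryarchaeota and Crenarchaeota), reduces to inverting a phylum-to-KO presence map. The sketch below assumes such a map is already available; the function names and the placeholder labels "KO1" to "KO7" are hypothetical and are not real KEGG identifiers.

```python
from collections import defaultdict

def ko_owners(presence):
    """Invert phylum -> {KO groups} into KO group -> {phyla containing it}."""
    owners = defaultdict(set)
    for phylum, kos in presence.items():
        for ko in kos:
            owners[ko].add(phylum)
    return owners

def exclusive_groups(presence):
    """KO groups present in exactly one phylum, keyed by that phylum."""
    exclusive = defaultdict(set)
    for ko, phyla in ko_owners(presence).items():
        if len(phyla) == 1:
            (only,) = phyla
            exclusive[only].add(ko)
    return dict(exclusive)

def shared_only_by(presence, phyla):
    """KO groups found in all of `phyla` and in no other phylum."""
    target = set(phyla)
    return {ko for ko, owners in ko_owners(presence).items() if owners == target}

# Hypothetical presence data; KO labels are placeholders, not real KEGG identifiers.
presence = {
    "Proteobacteria": {"KO1", "KO2", "KO3"},
    "Firmicutes": {"KO2", "KO4"},
    "Euryarchaeota": {"KO5", "KO6"},
    "Crenarchaeota": {"KO5", "KO7"},
}
print(sorted(exclusive_groups(presence)["Proteobacteria"]))                # ['KO1', 'KO3']
print(sorted(shared_only_by(presence, ["Euryarchaeota", "Crenarchaeota"])))  # ['KO5']
```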
Trends in the use of sigma factor families in bacterial genomes.
In terms of their frequency in bacterial genomes, sigma factors are the second most common type of transcriptional regulator. They confer promoter selectivity on RNA polymerase to initiate transcription. As with the TFs, we quantified the enrichment of sigma factor families in genomes across the different bacterial phylogenetic groups. Members of the RpoD housekeeping sigmaD factor (sigma70), the RpoS stationary phase sigmaS factor (sigma38), and the RpoH heat shock sigmaH factor (sigma32) families are grouped into COG0568. As expected, all bacterial organisms possessed a housekeeping sigma factor and, therefore, at least one member of this COG in their genome (Fig. 3C). Other widely distributed sigma factors among bacteria are the flagellar synthesis and chemotaxis sigmaF (sigma28/FliA), the nitrogen limitation sigmaN (sigma54/RpoN), and the extracytoplasmic function (ECF) sigma factors, which constitute the most ubiquitous and diverse family of alternative sigma factors in bacterial genomes (Fig. 3C and D). ECF sigma factors commonly regulate genes coding for outer membrane or periplasmic proteins involved in the response to unfolded proteins during the stationary phase and to different kinds of cellular stress, for instance, heat shock and osmotic and oxidative stresses, as well as genes associated with bacterial virulence, among others (46–48); thus, the high abundance of this class of sigma factors likely reflects the capacity to respond to a wide range of environmental changes. Consistent with a previous study (49), we found a high abundance of ECF sigma factors in organisms from the Planctomycetes phylum, most of which are free living in the sea or soil (50), with up to 74 genes coding for proteins of this family in the case of Singulisphaera acidiphila; nevertheless, we did not find these sigma factors in the closely related Chlamydiae phylum. This lack of ECFs in Chlamydiae could be explained by their lifestyle as intracellular pathogens.
Conversely, some sigma factors are largely specific to a phylum, such as SigH, which is involved in the transition to the post-exponential phase at the beginning of sporulation, and SigI, which is involved in the regulation of cell wall metabolism in response to heat stress. These two sigma factors are highly enriched in Firmicutes (Fig. 3D). In addition, sigma32 (sigmaH/RpoH), which is distinct from the Firmicutes SigH despite the similar name, was found exclusively in Proteobacteria, where it directs the transcription of heat shock genes. Notably, both Chlamydiae and Tenericutes contain organisms with small genomes and a low abundance of alternative sigma factors (Fig. 3C and D).
In general terms, the number and type of alternative sigma factors vary substantially between bacterial organisms and usually reflect their lifestyle. For example, obligate intracellular organisms, which live in relatively stable environments, often possess only the housekeeping sigma factor. By contrast, some free-living bacteria, such as Streptomyces coelicolor, which inhabit environments with diverse and fluctuating physicochemical conditions, may have several dozen alternative sigma factors (ScoDB at http://strepdb.streptomyces.org.uk) (51).
By themselves, sigma factors have a very limited capacity to modify their activity in response to intracellular signals or stimuli, although exceptional examples of transcriptional switching through the direct interaction of small molecules with RNA polymerase have been described. This is the case for the so-called stringent response, in which the alarmone guanosine tetraphosphate (ppGpp) is recognized by RNA polymerase and induces an arrest of RNA synthesis in response to nutrient starvation or other stress conditions (52, 53). Sigma factors act as global regulators, allowing bacteria to switch between transcriptional programs based exclusively on the type of promoters that the RNA polymerase recognizes. These programs might include the coordinated expression of virulence-associated genes in pathogenic bacteria or of genes that respond to specific nutritional conditions, developmental processes, or stress-related signals (54, 55). Commonly, the availability of alternative sigma factors is controlled at the level of their synthesis, by proteolysis, or by reversible interaction with their cognate anti-sigma factors (12).
Note that neither eukaryotes nor archaea, represented here by the phyla Euryarchaeota and Crenarchaeota, have sigma factors. The archaeal transcription machinery has been widely documented to resemble that of eukaryotes, in that the archaeal RNA polymerase relies on transcription factors such as the TATA-binding protein (TBP) and transcription factor B (TFB, the homolog of eukaryotic TFIIB) instead of sigma factors for initiation (2).
Trends in the use of transcriptional acting riboswitch families in bacterial and archaeal genomes.
Unlike TFs and sigma factors, riboswitches are RNA elements that act in cis and, apart from a few exceptional cases, exclusively affect the genes and operons immediately downstream of them. Riboswitches recognize their ligands with high specificity, binding vitamin derivatives, nucleotides, amino acids, phosphorylated sugars, and metal ions; therefore, the genes they directly regulate are commonly involved in the biosynthesis or transport of the sensed molecule.
Using the number of different riboswitch families per genome, we performed Fisher's exact test to analyze the enrichment of transcriptional acting riboswitch families per phylum. As shown in Fig. 3E, the TPP riboswitch was the most widespread among bacteria, including the phyla with the lowest (e.g., Verrucomicrobia, Planctomycetes, and Spirochaetes) or the highest (e.g., Firmicutes and Proteobacteria) number of riboswitch families. For example, 48 of the 97 riboswitches found in Bacteroidetes, 6 of the 13 riboswitches found in Spirochaetes, and 3 of the 10 riboswitches found in Planctomycetes belong to the TPP family.
As noted previously, the Firmicutes phylum has by far the highest abundance of transcriptional acting riboswitch families of any phylum, with an average of 6 riboswitch families per genome, and some organisms in this phylum, such as Geosporobacter ferrireducens, contain as many as 14 of the 36 transcriptional acting riboswitch families reported in the Rfam database. Firmicutes are enriched in c-di-GMP-II, glmS, and AdoCbl riboswitches, which are almost exclusively found in this phylum. The T-box riboswitch is also found almost exclusively in this phylum, where it represents 16% of the riboswitches; in other phyla (e.g., Actinobacteria, Bacteroidetes, Euryarchaeota, Planctomycetes, Spirochaetes, and Verrucomicrobia), we did not detect any T-box riboswitches. This riboswitch is uncommon in Proteobacteria; however, we found that the gene coding for the enzyme 2-isopropylmalate synthase is regulated by a T-box in 12 of 73 organisms in the Deltaproteobacteria class (see Table S8 in the supplemental material). The common ancestor of these Deltaproteobacteria might have acquired the T-box riboswitch by horizontal transfer from a member of the Firmicutes.
As shown for Planctomycetes and Verrucomicrobia in Fig. 3E, members of the PVC superphylum have the lowest number of riboswitch families regardless of the number of CDS in their genomes. Among the eight Verrucomicrobia organisms studied, we found riboswitches from only three families: TPP (1 riboswitch), SAM (1 riboswitch), and SAM I-IV (2 riboswitches). Of the 18 Planctomycetes organisms studied here, 8 contained transcriptional riboswitches, belonging to 5 families: TPP (3 riboswitches), SAM (1 riboswitch), cobalamin (4 riboswitches), c-di-GMP-I (1 riboswitch), and mini-ykkC (1 riboswitch). Interestingly, some Planctomycetes have up to 2 or 4 riboswitch copies per genome, whereas many others have none. Of the 156 Euryarchaeota organisms analyzed here, only 9 have riboswitches from the FMN and crcB families. We did not find any transcriptional acting riboswitches in Chlamydiae or Crenarchaeota. We were also able to detect riboswitches previously reported as predominant in specific phylogenetic groups. For example, the SAM alpha riboswitch was found in only 137 of the 1,271 Proteobacteria organisms: 125 in Alphaproteobacteria, 7 in Betaproteobacteria, 4 in Gammaproteobacteria, and 1 in Deltaproteobacteria. In addition, as previously reported, we found transcriptional riboswitches of the SAM IV family only in Actinobacteria (Table S5).
Analysis of the expression platforms of all identified riboswitches allowed us to distinguish whether they exert their regulation at the transcriptional or the translational level (see Materials and Methods for further details). In certain organisms, the lack of transcriptional acting riboswitches seems to be compensated for by the corresponding translational acting versions and vice versa. To test this hypothesis, we looked for riboswitches whose expression platforms could form RNA secondary structures that sequester the Shine-Dalgarno sequence and thereby inhibit translation initiation (see Table S9 in the supplemental material and Materials and Methods for further details). Figure 4 shows the average number of riboswitches of each family per organism within each phylum, as well as the percentage that could regulate gene expression at the transcriptional versus the translational level. Notably, phylum-specific trends in the level of riboswitch-mediated regulation can be observed. For example, Firmicutes and Tenericutes predominantly use transcriptional riboswitches, whereas Actinobacteria and Proteobacteria use translational riboswitches from a wide range of Rfam families. All of the riboswitches in Crenarchaeota and most of those in Euryarchaeota, as representatives of Archaea, act at the translational level. Some phyla, e.g., Planctomycetes, Verrucomicrobia, Bacteroidetes, and Spirochaetes, possess both transcriptional and translational riboswitches. Moreover, whether a particular family acts as a transcriptional or a translational attenuator seems to depend on the phylum. This is the case for the mini-ykkC and TPP riboswitches, which act as transcriptional riboswitches in Planctomycetes but as translational riboswitches in Actinobacteria, Chlamydiae, Euryarchaeota, and Verrucomicrobia. In addition, Fig. 4 shows that in some cases, only a subgroup of the organisms in a given phylum possesses a particular riboswitch type (e.g., T-box and glutamine in Proteobacteria and FMN in Euryarchaeota). Given their sparse distribution within the phylum, we suggest that these riboswitches might have been acquired by horizontal gene transfer.
FIG 4.
Frequency comparison of transcriptional and translational acting riboswitches. The point size represents the mean total number of riboswitches (transcriptional plus translational) found per organism for each Rfam family and phylum. The color scale shows the mean percentage of transcriptional riboswitches for each Rfam family and phylum.
Finally, two riboswitch families deserve special mention because of the type of molecule they recognize and their wide distribution in certain groups of bacteria. The first is the T-box riboswitch, which, instead of binding a small metabolite, recognizes uncharged tRNAs as a signal of the intracellular levels of amino acids. According to our results, the T-box riboswitch is the most common riboswitch in Firmicutes and Tenericutes, where it regulates the expression of genes coding for the enzymes that charge amino acids onto their cognate tRNAs (aminoacyl-tRNA synthetases) and for amino acid biosynthesis and transport. As an example, we found that the firmicute Bacillus cereus has 68 genes in 37 operons regulated by the T-box riboswitch. The second family that deserves special mention comprises the c-di-GMP riboswitches (I and II). As their name indicates, these riboswitches recognize cyclic di-GMP, a second messenger used to signal metabolic states in bacteria. Accordingly, many genes involved in cellular processes such as virulence, motility, and biofilm formation are regulated by this family of riboswitches.
In summary, we report a tendency for organisms to compensate for the low frequencies of a particular type of regulatory element (i.e., transcription factors) with a high frequency of other types of regulatory element (i.e., sigma factors), providing a comprehensive description of the most abundant COG, KEGG, and Rfam families of transcriptional regulators present in prokaryotic genomes according to their genome size and phylogenetic origin.
MATERIALS AND METHODS
Selection of representative organisms.
Genome sequences were retrieved from the KEGG database, which contained 4,852 bacterial and 277 archaeal genomes. To avoid redundancy, we selected, for each species, the organism with the highest number of open reading frames as its representative. We also retained only phylogenetic groups with at least nine organisms. In total, we selected 2,518 bacterial and 202 archaeal genomes belonging to 11 phyla.
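The species-level deduplication and group-size filter described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: the tuple layout `(organism_id, species, phylum, n_orfs)`, the function name `select_representatives`, the use of phylum as the grouping level, and the tie-breaking rule (the first genome seen wins) are all assumptions, and the example records are toy values.

```python
from collections import Counter


def select_representatives(genomes, min_per_group=9):
    """Pick one genome per species, then drop sparsely sampled phyla.

    genomes: iterable of (organism_id, species, phylum, n_orfs) tuples
    (a hypothetical layout). For each species, the genome with the most ORFs
    is kept; on ties, the first one seen wins. Phyla with fewer than
    `min_per_group` representatives are then discarded.
    """
    best = {}
    for record in genomes:
        species, n_orfs = record[1], record[3]
        if species not in best or n_orfs > best[species][3]:
            best[species] = record
    per_phylum = Counter(record[2] for record in best.values())
    kept = [r for r in best.values() if per_phylum[r[2]] >= min_per_group]
    return sorted(kept, key=lambda r: r[0])  # reproducible output order


# Toy records (not real data); a low threshold keeps the example small.
genomes = [
    ("g1", "species_A", "phylum_X", 4100),
    ("g2", "species_A", "phylum_X", 4500),
    ("g3", "species_B", "phylum_X", 4400),
    ("g4", "species_C", "phylum_Y", 900),
]
print([r[0] for r in select_representatives(genomes, min_per_group=2)])  # -> ['g2', 'g3']
```

Here `g2` replaces `g1` because it has more ORFs, and `phylum_Y` is dropped because it has a single representative.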
Quantification of transcription factors and sigma factors.
Using the COG and KEGG orthology (KO) classifications for bacteria and archaea, we obtained all gene IDs and their corresponding COG/KO descriptions for the selected genomes. Based on these classifications, we quantified the genes corresponding to sigma factors and transcription factors.
All data obtained for TFs and sigma factors are summarized in tables containing, for each organism, the number of regulators per genome, the number of open reading frames (CDS), and the phylum and class. For better visualization, we expressed genome size as the number of CDS divided by 100. We then plotted the number of regulators per genome against genome size, split by phylum.
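A minimal sketch of the per-genome tally, assuming gene annotations are available as `(gene_id, COG/KO identifier)` pairs. The function `summarize_genome` and the identifier sets passed to it are hypothetical; the actual COG/KO lists used for sigma factors and TFs are not reproduced here (COG0568 is the only one named in the text). Genes are collected into sets so that a gene carrying several annotations is counted once.

```python
def summarize_genome(annotations, n_cds, sigma_ids, tf_ids):
    """Count sigma factor and TF genes in one genome.

    annotations: iterable of (gene_id, cog_or_ko_id) pairs; a gene may appear
    more than once if it has several annotations.
    n_cds: total number of CDS in the genome.
    sigma_ids, tf_ids: sets of COG/KO identifiers defining each regulator class.
    """
    annotations = list(annotations)  # allow generators; we iterate twice
    sigma_genes = {gene for gene, cat in annotations if cat in sigma_ids}
    tf_genes = {gene for gene, cat in annotations if cat in tf_ids}
    return {
        "sigma": len(sigma_genes),
        "tf": len(tf_genes),
        "cds_per_100": n_cds / 100,  # genome size scaled as in the text
    }


# Toy example: "TF_A" and "TF_B" are placeholder identifiers, not real COGs.
annotations = [("g1", "COG0568"), ("g2", "TF_A"), ("g3", "TF_A"), ("g3", "TF_B")]
print(summarize_genome(annotations, 250, {"COG0568"}, {"TF_A", "TF_B"}))
# -> {'sigma': 1, 'tf': 2, 'cds_per_100': 2.5}
```

Gene `g3` carries two TF annotations but is counted once, which is the behavior one would want when tallying regulator genes rather than annotations.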
Theil-Sen estimator analysis.
The total numbers of sigma factors and transcription factors per organism were compared and split by phylum. Linear regression models were fitted per phylum using the Theil-Sen estimator, which is available in the mblm R package (https://cran.r-project.org/package=mblm). This method takes the median of the slopes of all pairs of data points, resulting in an outlier-resistant model (56, 57).
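For readers unfamiliar with the estimator, the sketch below implements the basic Theil-Sen fit in plain Python rather than calling mblm. The slope is the median of all pairwise slopes, as described above; taking the intercept as the median of `y - slope * x` is one common convention and is an assumption here, not necessarily what mblm does. In this context `x` would be genome size (CDS/100) and `y` a regulator count for the organisms of one phylum, which is also an assumption about how the fits were set up.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) of a Theil-Sen fit.

    slope: median of the slopes between every pair of points with distinct x.
    intercept: median of y - slope * x (one common convention).
    """
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if xj != xi
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Toy data: a perfect line y = 2x with one outlier at x = 5.
print(theil_sen([1, 2, 3, 4, 5], [2, 4, 6, 8, 100]))  # -> (2.0, 0.0)
```

With one outlying point the fitted slope stays at 2, whereas ordinary least squares on the same toy data gives a slope of 20. Fitting each phylum separately amounts to calling `theil_sen` once per phylum.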
Riboswitch prediction.
We obtained the covariance models for all riboswitches reported in Rfam and concatenated them into a single compiled model file. For each gene in the available bacterial and archaeal genomes, we used the 400-bp region upstream of the coding region. We evaluated the putative presence of riboswitches by searching these regions with the covariance models using the cmsearch program (43), which takes advantage of both sequence and secondary-structure conservation in the search for RNA homologs. The parameters used to run cmsearch were as follows: E value ≤ 1e−3, cpu = 32, and toponly = TRUE (to search for riboswitches only on the top strand). The --nohmm option, which skips all HMM filter stages, was not used. The cmsearch results were filtered to retain only sequences with scores above the trusted cutoff reported for each riboswitch family in Rfam.
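Two parts of this step lend themselves to a short sketch: extracting the 400-bp upstream region on the correct strand, and filtering search hits by E value and Rfam trusted cutoff. The sketch below does not run cmsearch itself; the `Hit` record, the function names, and the cutoff values in the example are hypothetical, and the coordinate convention (1-based, inclusive, forward-strand) is an assumption.

```python
from dataclasses import dataclass

# Complement table for DNA; characters not listed (e.g., N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def upstream_region(genome, start, end, strand, length=400):
    """Return up to `length` nt immediately 5' of a CDS, read on the CDS strand.

    start/end are 1-based inclusive forward-strand coordinates; strand is "+" or "-".
    """
    if strand == "+":
        first = max(1, start - length)
        return genome[first - 1:start - 1]
    last = min(len(genome), end + length)
    region = genome[end:last]  # forward-strand positions end+1 .. last
    return region.translate(_COMPLEMENT)[::-1]  # reverse complement


@dataclass
class Hit:
    gene_id: str
    family: str     # Rfam family name
    score: float    # bit score from the covariance-model search
    evalue: float


def filter_hits(hits, trusted_cutoffs, max_evalue=1e-3):
    """Keep hits passing the E-value threshold and scoring above the family's
    trusted cutoff; families without a known cutoff are dropped."""
    return [
        h for h in hits
        if h.evalue <= max_evalue
        and h.score > trusted_cutoffs.get(h.family, float("inf"))
    ]


print(upstream_region("GGCTTAATGAAA", 7, 12, "+", length=3))  # -> TTA
print(upstream_region("CATGACCC", 1, 3, "-", length=3))        # -> GTC
hits = [Hit("g1", "TPP", 45.0, 1e-6), Hit("g2", "TPP", 35.0, 1e-6)]
print([h.gene_id for h in filter_hits(hits, {"TPP": 40.0})])     # -> ['g1']
```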
To identify transcriptional acting riboswitches, we developed an in-house Perl program based on the computational approach previously reported by Merino and Yanofsky in 2005 (17). Briefly, a 50-nt analysis window downstream of each region identified as a riboswitch was considered. Within this window, we searched for runs of 6 nucleotides containing at least 5 T residues. Then, using RNAfold version 2.4.0 (58), we searched for the most stable RNA secondary structure that could be formed, consisting of a stem-loop structure with a Gibbs free energy below −10 kcal/mol. When the analyzed region could fold into more than one stem-loop structure, only the one closest to the run of T residues was considered. We allowed two nucleotides between the base of the secondary structure and the run of T residues.
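The decision rule in this paragraph can be expressed as a small function. This is a sketch under stated assumptions, not the authors' Perl program: folding is delegated to an external tool (for example RNAfold), whose hairpins are assumed to be supplied as `(start, end, dG)` tuples with 0-based inclusive offsets in the 50-nt window; the helper names are hypothetical; and the two-nucleotide allowance is interpreted as a gap of at most two nucleotides between the hairpin base and the T run.

```python
def find_t_runs(window, run_len=6, min_t=5):
    """0-based start offsets of every run_len-nt stretch with >= min_t T (or U) residues."""
    seq = window.upper().replace("U", "T")
    return [
        i for i in range(len(seq) - run_len + 1)
        if seq[i:i + run_len].count("T") >= min_t
    ]


def is_transcriptional_attenuator(window, hairpins, max_dg=-10.0, max_gap=2):
    """Apply the terminator rule to the window downstream of a riboswitch.

    hairpins: (start, end, dG) tuples, 0-based inclusive offsets within
    `window`, dG in kcal/mol, assumed to come from an external folding tool.
    """
    for t_start in find_t_runs(window):
        upstream = [h for h in hairpins if h[1] < t_start]
        if not upstream:
            continue
        # Only the hairpin closest to this T run is considered.
        _, end, dg = max(upstream, key=lambda h: h[1])
        if dg < max_dg and t_start - end - 1 <= max_gap:
            return True
    return False


# Toy 50-nt window with a T run starting at offset 20.
window = "G" * 20 + "TTTTTT" + "A" * 24
print(find_t_runs(window))                                  # -> [19, 20, 21]
print(is_transcriptional_attenuator(window, [(2, 17, -12.5)]))  # -> True
```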
Unlike in bacteria, transcription termination in archaea does not require a stable RNA secondary structure but is stimulated by oligo(T) sequences. Therefore, for archaeal genomes, we considered a riboswitch to act at the level of transcription termination if its sequence was followed by a run of 6 consecutive T residues and its distance to the downstream regulated gene was greater than 50 nt. These results are summarized in Table S5 in the supplemental material.
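The archaeal criterion is simpler and can be sketched directly. The function name is hypothetical, and treating `downstream_seq` as the sequence immediately following the riboswitch (its length is not specified in the text) is an assumption.

```python
def is_archaeal_terminator(downstream_seq, distance_to_gene, run_len=6, min_distance=50):
    """True if the riboswitch is followed by >= run_len consecutive T residues
    and lies more than min_distance nt upstream of the regulated gene."""
    seq = downstream_seq.upper().replace("U", "T")
    return "T" * run_len in seq and distance_to_gene > min_distance


print(is_archaeal_terminator("ACGTTTTTTGCA", 80))  # -> True
print(is_archaeal_terminator("ACGTTTTTGCA", 80))   # -> False (only 5 T residues)
```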
To identify translational acting riboswitches, we selected riboswitches that do not form a predicted transcriptional attenuator and whose predicted 5′ position is up to 100 nt downstream from the translational start site of their corresponding genes. We then evaluated whether they could form secondary structures that sequester the Shine-Dalgarno sequence, with a Gibbs free energy ≤ −7 kcal/mol and a 3′ position up to 7 nt downstream from the translational start site. The resulting riboswitches are summarized in Table S9 in the supplemental material.
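A sketch of the translational criteria, with all positions expressed relative to the first nucleotide of the start codon (0; negative values are upstream). The `Candidate` record and the function name are hypothetical, the Shine-Dalgarno and hairpin coordinates are assumed to come from separate motif-search and folding steps, and the position rule for the riboswitch 5′ end is simplified here to an absolute distance of at most 100 nt from the start codon. Treating "sequestering" as the hairpin span fully covering the Shine-Dalgarno motif is also an assumption.

```python
from typing import NamedTuple, Tuple


class Candidate(NamedTuple):
    transcriptional: bool       # outcome of the attenuator check
    rs_5prime: int              # riboswitch 5' end, relative to the start codon
    sd: Tuple[int, int]         # Shine-Dalgarno motif (start, end)
    hairpin: Tuple[int, int]    # candidate SD-sequestering structure (start, end)
    dg: float                   # free energy of that structure, kcal/mol


def is_translational(c, max_rs_distance=100, max_dg=-7.0, max_3prime=7):
    """Positions are relative to the first start-codon nucleotide (0)."""
    return (
        not c.transcriptional
        and abs(c.rs_5prime) <= max_rs_distance   # simplified position rule
        and c.dg <= max_dg
        and c.hairpin[1] <= max_3prime            # 3' end at most 7 nt downstream
        and c.hairpin[0] <= c.sd[0] and c.sd[1] <= c.hairpin[1]  # SD covered
    )


c = Candidate(False, -90, (-12, -7), (-30, 5), -9.2)
print(is_translational(c))                    # -> True
print(is_translational(c._replace(dg=-5.0)))  # -> False (structure too weak)
```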
Enrichment analysis.
For each phylum, we compared the frequency of each COG, KO, or Rfam family within that phylum with its frequency in the rest of the phyla. We used these frequencies to perform Fisher's exact test with a standard Bonferroni correction. We then used the log10 of the enrichment value (odds ratio) for clustering and plotted the enrichment of each family as a heatmap using the default parameters of the “Heatmap” function from the R package ComplexHeatmap version 2.4.3, which uses Euclidean distance for hierarchical clustering of the rows and columns of the heatmap (59).
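The sketch below shows one way to compute the per-phylum enrichment with only the Python standard library: a two-sided Fisher's exact test (summing the probabilities of all tables no more likely than the observed one, the usual definition), a Bonferroni adjustment, and a log10 odds ratio. How the 2x2 table is built (members of the family versus members of all other families, in the phylum versus the rest) and the 0.5 pseudocount added to every cell so that log10 stays finite when a cell is zero are assumptions, not details given in the text. The counts in the example are toy values.

```python
from math import comb, log10


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of the table whose top-left cell is x
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - (n - row1)), min(row1, col1) + 1)
    # Sum every table at most as likely as the observed one (small tolerance
    # guards against floating-point ties).
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def enrichment(counts, family, phylum, pseudocount=0.5):
    """counts[phylum][family] -> number of members of `family` in `phylum`.

    Returns (log10 odds ratio, unadjusted p value) for `family` in `phylum`
    versus all other phyla.
    """
    a = counts[phylum].get(family, 0)
    b = sum(counts[phylum].values()) - a
    others = [fams for ph, fams in counts.items() if ph != phylum]
    c = sum(fams.get(family, 0) for fams in others)
    d = sum(sum(fams.values()) for fams in others) - c
    p = fisher_two_sided(a, b, c, d)
    k = pseudocount
    log_or = log10((a + k) * (d + k) / ((b + k) * (c + k)))
    return log_or, p


def bonferroni(pvalues):
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


counts = {"phylum_X": {"T-box": 30, "TPP": 10}, "phylum_Y": {"T-box": 1, "TPP": 40}}
print(enrichment(counts, "T-box", "phylum_X"))
```

The p values from all family-phylum comparisons would be passed together to `bonferroni`, since the correction scales each p value by the total number of tests.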
Software used.
The pipelines for counting transcriptional regulators from the COG/KEGG/Rfam databases were written in Perl version 5.18.2. Downstream analyses, plots, and the development of the erba package were performed in R (https://cran.r-project.org) version 4.0. This package integrates functions from the previously developed packages dplyr version 1.0.2 (https://dplyr.tidyverse.org) and ggplot2 version 3.3.2 (https://ggplot2.tidyverse.org).
Data access.
All functions developed for the analysis and plotting, which are contained within the R package erba, are available at https://github.com/josschavezf/erba. The pipelines used to create the figures and supplemental figures are located at the GitHub repository https://github.com/josschavezf/Chavez_et_al_2020.
Supplementary Material
ACKNOWLEDGMENTS
We sincerely thank Ricardo Ciria for computer support and Shirley Ainsworth for bibliographical assistance.
We acknowledge the Programa de Maestría y Doctorado en Ciencias Bioquímicas at the Instituto de Biotecnología-UNAM and the Consejo Nacional de Ciencia y Tecnología (CONACyT) for the Ph.D. scholarship, number 565669, awarded to J.C.
We declare no conflicts of interest.
Footnotes
Supplemental material is available online only.
REFERENCES
1. Browning DF, Busby SJW. 2004. The regulation of bacterial transcription initiation. Nat Rev Microbiol 2:57–65. doi: 10.1038/nrmicro787.
2. Werner F. 2013. Molecular mechanisms of transcription elongation in archaea. Chem Rev 113:8331–8349. doi: 10.1021/cr4002325.
3. Kavita K, de Mets F, Gottesman S. 2018. New aspects of RNA-based regulation by Hfq and its partner sRNAs. Curr Opin Microbiol 42:53–61. doi: 10.1016/j.mib.2017.10.014.
4. Van Assche E, Van Puyvelde S, Vanderleyden J, Steenackers HP. 2015. RNA-binding proteins involved in post-transcriptional regulation in bacteria. Front Microbiol 6:141. doi: 10.3389/fmicb.2015.00141.
5. Cain JA, Solis N, Cordwell SJ. 2014. Beyond gene expression: the impact of protein post-translational modifications in bacteria. J Proteomics 97:265–286. doi: 10.1016/j.jprot.2013.08.012.
6. Tatusov RL, Koonin EV, Lipman DJ. 1997. A genomic perspective on protein families. Science 278:631–637. doi: 10.1126/science.278.5338.631.
7. Galperin MY, Makarova KS, Wolf YI, Koonin EV. 2015. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res 43:D261–D269. doi: 10.1093/nar/gku1223.
8. Makarova KS, Wolf YI, Koonin EV. 2015. Archaeal clusters of orthologous genes (arCOGs): an update and application for analysis of shared features between Thermococcales, Methanococcales, and Methanobacteriales. Life (Basel) 5:818–840. doi: 10.3390/life5010818.
9. Kanehisa M, Goto S. 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 28:27–30. doi: 10.1093/nar/28.1.27.
10. Kanehisa M, Sato Y, Furumichi M, Morishima K, Tanabe M. 2019. New approach for understanding genome variations in KEGG. Nucleic Acids Res 47:D590–D595. doi: 10.1093/nar/gky962.
11. Kanehisa M. 2019. Toward understanding the origin and evolution of cellular organisms. Protein Sci 28:1947–1951. doi: 10.1002/pro.3715.
12. Paget MS. 2015. Bacterial sigma factors and anti-sigma factors: structure, function and distribution. Biomolecules 5:1245–1265. doi: 10.3390/biom5031245.
13. Weirauch MT, Hughes TR. 2011. A catalogue of eukaryotic transcription factor types, their evolutionary origin, and species distribution. Subcell Biochem 52:25–73. doi: 10.1007/978-90-481-9069-0_3.
14. Grohmann D, Werner F. 2011. Recent advances in the understanding of archaeal transcription. Curr Opin Microbiol 14:328–334. doi: 10.1016/j.mib.2011.04.012.
15. Henkin TM, Yanofsky C. 2002. Regulation by transcription attenuation in bacteria: how RNA provides instructions for transcription termination/antitermination decisions. Bioessays 24:700–707. doi: 10.1002/bies.10125.
16. Merino E, Yanofsky C. 2002. Regulation by termination-antitermination: a genomic approach, p 323–336. In Sonenshein A, Losick R, Hoch J (ed), Bacillus subtilis and its closest relatives. American Society for Microbiology, Washington, DC.
17. Merino E, Yanofsky C. 2005. Transcription attenuation: a highly conserved regulatory strategy used by bacteria. Trends Genet 21:260–264. doi: 10.1016/j.tig.2005.03.002.
18. Naville M, Gautheret D. 2010. Transcription attenuation in bacteria: theme and variations. Brief Funct Genomics 9:178–189. doi: 10.1093/bfgp/elq008.
19. French SL, Santangelo TJ, Beyer AL, Reeve JN. 2007. Transcription and translation are coupled in Archaea. Mol Biol Evol 24:893–895. doi: 10.1093/molbev/msm007.
20. Oxender DL, Zurawski G, Yanofsky C. 1979. Attenuation in the Escherichia coli tryptophan operon: role of RNA secondary structure involving the tryptophan codon region. Proc Natl Acad Sci U S A 76:5524–5528. doi: 10.1073/pnas.76.11.5524.
21. Turner RJ, Lu Y, Switzer RL. 1994. Regulation of the Bacillus subtilis pyrimidine biosynthetic (pyr) gene cluster by an autogenous transcriptional attenuation mechanism. J Bacteriol 176:3708–3722. doi: 10.1128/jb.176.12.3708-3722.1994.
22. Amster-Choder O, Wright A. 1993. Transcriptional regulation of the bgl operon of Escherichia coli involves phosphotransferase system-mediated phosphorylation of a transcriptional antiterminator. J Cell Biochem 51:83–90. doi: 10.1002/jcb.240510115.
23. Rodionov DA, Vitreschak AG, Mironov AA, Gelfand MS. 2003. Regulation of lysine biosynthesis and transport genes in bacteria: yet another RNA riboswitch? Nucleic Acids Res 31:6748–6757. doi: 10.1093/nar/gkg900.
24. Rodionov DA, Vitreschak AG, Mironov AA, Gelfand MS. 2002. Comparative genomics of thiamin biosynthesis in procaryotes. New genes and regulatory mechanisms. J Biol Chem 277:48949–48959. doi: 10.1074/jbc.M208965200.
25. Winkler W, Nahvi A, Breaker RR. 2002. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature 419:952–956. doi: 10.1038/nature01145.
26. Winkler WC, Cohen-Chalamish S, Breaker RR. 2002. An mRNA structure that controls gene expression by binding FMN. Proc Natl Acad Sci U S A 99:15908–15913. doi: 10.1073/pnas.212628899.
27. Nahvi A, Sudarsan N, Ebert MS, Zou X, Brown KL, Breaker RR. 2002. Genetic control by a metabolite binding mRNA. Chem Biol 9:1043–1049. doi: 10.1016/s1074-5521(02)00224-7.
28. Mandal M, Boese B, Barrick JE, Winkler WC, Breaker RR. 2003. Riboswitches control fundamental biochemical pathways in Bacillus subtilis and other bacteria. Cell 113:577–586. doi: 10.1016/s0092-8674(03)00391-x.
29. Mandal M, Breaker RR. 2004. Adenine riboswitches and gene activation by disruption of a transcription terminator. Nat Struct Mol Biol 11:29–35. doi: 10.1038/nsmb710.
30. Grundy FJ, Lehman SC, Henkin TM. 2003. The L box regulon: lysine sensing by leader RNAs of bacterial lysine biosynthesis genes. Proc Natl Acad Sci U S A 100:12057–12062. doi: 10.1073/pnas.2133705100.
31. Barrick JE, Corbino KA, Winkler WC, Nahvi A, Mandal M, Collins J, Lee M, Roth A, Sudarsan N, Jona I, Kenneth Wickiser J, Breaker RR. 2004. New RNA motifs suggest an expanded scope for riboswitches in bacterial genetic control. Proc Natl Acad Sci U S A 101:6421–6426. doi: 10.1073/pnas.0308014101.
32. Grundy FJ, Henkin TM. 1993. tRNA as a positive regulator of transcription antitermination in B. subtilis. Cell 74:475–482. doi: 10.1016/0092-8674(93)80049-K.
33. Vitreschak AG, Mironov AA, Lyubetsky VA, Gelfand MS. 2008. Comparative genomic analysis of T-box regulatory systems in bacteria. RNA 14:717–735. doi: 10.1261/rna.819308.
34. Gutiérrez-Preciado A, Henkin TM, Grundy FJ, Yanofsky C, Merino E. 2009. Biochemical features and functional implications of the RNA-based T-box regulatory mechanism. Microbiol Mol Biol Rev 73:36–61. doi: 10.1128/MMBR.00026-08.
35. Winkler WC, Nahvi A, Roth A, Collins JA, Breaker RR. 2004. Control of gene expression by a natural metabolite-responsive ribozyme. Nature 428:281–286. doi: 10.1038/nature02362.
36. Cheah MT, Wachter A, Sudarsan N, Breaker RR. 2007. Control of alternative RNA splicing and gene expression by eukaryotic riboswitches. Nature 447:497–500. doi: 10.1038/nature05769.
37. Cases I, de Lorenzo V, Ouzounis CA. 2003. Transcription regulation and environmental adaptation in bacteria. Trends Microbiol 11:248–253. doi: 10.1016/S0966-842X(03)00103-3.
38. Pérez-Rueda E, Collado-Vides J, Segovia L. 2004. Phylogenetic distribution of DNA-binding transcription factors in bacteria and archaea. Comput Biol Chem 28:341–350. doi: 10.1016/j.compbiolchem.2004.09.004.
39. Pérez-Rueda E, Janga SC, Martínez-Antonio A. 2009. Scaling relationship in the gene content of transcriptional machinery in bacteria. Mol Biosyst 5:1494–1501. doi: 10.1039/b907384a.
40. Kalvari I, Argasinska J, Quinones-Olvera N, Nawrocki EP, Rivas E, Eddy SR, Bateman A, Finn RD, Petrov AI. 2018. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res 46:D335–D342. doi: 10.1093/nar/gkx1038.
41. Kalvari I, Nawrocki EP, Argasinska J, Quinones-Olvera N, Finn RD, Bateman A, Petrov AI. 2018. Non-coding RNA analysis using the Rfam database. Curr Protoc Bioinformatics 62:e51. doi: 10.1002/cpbi.51.
42. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. 2003. Rfam: an RNA family database. Nucleic Acids Res 31:439–441. doi: 10.1093/nar/gkg006.
43. Cui X, Lu Z, Wang S, Jing-Yan Wang J, Gao X. 2016. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics 32:i332–i340. doi: 10.1093/bioinformatics/btw271.
44. Kersters K, De Vos P, Gillis M, Swings J, Vandamme P, Stackebrandt E. 2006. Introduction to the proteobacteria, p 3–37. In The prokaryotes. Springer, New York.
45. Wagner M, Horn M. 2006. The Planctomycetes, Verrucomicrobia, Chlamydiae and sister phyla comprise a superphylum with biotechnological and medical relevance. Curr Opin Biotechnol 17:241–249. doi: 10.1016/j.copbio.2006.05.005.
46. Pinto D, Liu Q, Mascher T. 2019. ECF σ factors with regulatory extensions: the one-component systems of the σ universe. Mol Microbiol 112:399–409. doi: 10.1111/mmi.14323.
47. Alba BM, Leeds JA, Onufryk C, Lu CZ, Gross CA. 2002. DegS and YaeL participate sequentially in the cleavage of RseA to activate the sigma(E)-dependent extracytoplasmic stress response. Genes Dev 16:2156–2168. doi: 10.1101/gad.1008902.
48. Rowley G, Spector M, Kormanec J, Roberts M. 2006. Pushing the envelope: extracytoplasmic stress responses in bacterial pathogens. Nat Rev Microbiol 4:383–394. doi: 10.1038/nrmicro1394.
49. Wiegand S, Jogler M, Boedeker C, Pinto D, Vollmers J, Rivas-Marín E, Kohn T, Peeters SH, Heuer A, Rast P, Oberbeckmann S, Bunk B, Jeske O, Meyerdierks A, Storesund JE, Kallscheuer N, Lücker S, Lage OM, Pohl T, Merkel BJ, Hornburger P, Müller RW, Brümmer F, Labrenz M, Spormann AM, Op den Camp HJM, Overmann J, Amann R, Jetten MSM, Mascher T, Medema MH, Devos DP, Kaster AK, Øvreås L, Rohde M, Galperin MY, Jogler C. 2020. Cultivation and functional characterization of 79 planctomycetes uncovers their unique biology. Nat Microbiol 5:126–140. doi: 10.1038/s41564-019-0588-1.
50. Lage OM, Van Niftrik L, Jogler C, Devos DP. 2019. Planctomycetes, p 614–626. In Encyclopedia of microbiology. Elsevier, Philadelphia, PA.
51. Bentley SD, Chater KF, Cerdeño-Tárraga A-M, Challis GL, Thomson NR, James KD, Harris DE, Quail MA, Kieser H, Harper D, Bateman A, Brown S, Chandra G, Chen CW, Collins M, Cronin A, Fraser A, Goble A, Hidalgo J, Hornsby T, Howarth S, Huang C-H, Kieser T, Larke L, Murphy L, Oliver K, O'Neil S, Rabbinowitsch E, Rajandream M-A, Rutherford K, Rutter S, Seeger K, Saunders D, Sharp S, Squares R, Squares S, Taylor K, Warren T, Wietzorrek A, Woodward J, Barrell BG, Parkhill J, Hopwood DA. 2002. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417:141–147. doi: 10.1038/417141a.
52. Dalebroux ZD, Swanson MS. 2012. ppGpp: magic beyond RNA polymerase. Nat Rev Microbiol 10:203–212. doi: 10.1038/nrmicro2720.
53. Hauryliuk V, Atkinson GC, Murakami KS, Tenson T, Gerdes K. 2015. Recent functional insights into the role of (p)ppGpp in bacterial physiology. Nat Rev Microbiol 13:298–309. doi: 10.1038/nrmicro3448.
54. Kazmierczak MJ, Wiedmann M, Boor KJ. 2005. Alternative sigma factors and their roles in bacterial virulence. Microbiol Mol Biol Rev 69:527–543. doi: 10.1128/MMBR.69.4.527-543.2005.
55. Feklístov A, Sharon BD, Darst SA, Gross CA. 2014. Bacterial sigma factors: a historical, structural, and genomic perspective. Annu Rev Microbiol 68:357–376. doi: 10.1146/annurev-micro-092412-155737.
56. Theil H. 1992. A rank-invariant method of linear and polynomial regression analysis, p 345–381. In Henri Theil’s contributions to economics and econometrics. Advanced Studies in Theoretical and Applied Econometrics, vol 23. Springer, Dordrecht, Netherlands.
57. Sen PK. 1968. Estimates of the regression coefficient based on Kendall’s tau. J Am Stat Assoc 63:1379–1389. doi: 10.1080/01621459.1968.10480934.
58. Lorenz R, Luntzer D, Hofacker IL, Stadler PF, Wolfinger MT. 2016. SHAPE directed RNA folding. Bioinformatics 32:145–147. doi: 10.1093/bioinformatics/btv523.
59. Gu Z, Eils R, Schlesner M. 2016. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32:2847–2849. doi: 10.1093/bioinformatics/btw313.