Abstract
Genes of prokaryotes and Archaea are often organized in cotranscribed groups, or operons. In contrast, eukaryotic genes are generally transcribed independently. Here we show that there is a substantial economic gain for the cell to cotranscribe genes encoding protein complexes because it synchronizes the fluctuations, or noise, in the levels of the different components. This correlation substantially reduces the shortfall in production of the complex. This benefit is relatively large in small cells such as bacterial cells, in which there are few mRNAs and proteins per cell, and is diminished in larger cells such as eukaryotic cells.
Introduction
What are the evolutionary forces that drive operon formation in prokaryotes but not in eukaryotes? One idea is that stronger genome size constraints in prokaryotes provide a large benefit to reducing the number of promoters. Another idea is that the high rate of horizontal gene transfer in prokaryotes provides an advantage for functionally related genes to be grouped together to increase their probability of cotransfer (1–3). Another benefit of operon formation is that it decreases the fluctuations between the concentrations of the coexpressed proteins (4). Fluctuations in relative protein concentrations can be wasteful, for example, when multiple proteins form a tight complex or act in concert (4–7). Translational coupling, in which ribosomes translating an upstream gene aid the translation of the downstream gene on the same mRNA molecule, has been emphasized as a way in which operon formation can reduce such fluctuations (6, 7). But strong translational coupling is not a general feature of operons (6). Here we show that cotranscription by itself can provide a substantial cost reduction in the production of protein complexes. This benefit decreases as the number of complexes increases, as is required in larger cells. Thus, reduction in the shortfall of protein complexes provides an additional explanation for the abundance of operons in prokaryotes and Archaea compared with their scarcity in eukaryotes.
Considering a functional 1:1 complex of two different proteins, we compare a system in which the two genes are cotranscribed but not translationally coupled (operon) to a system in which the two genes are transcribed independently from promoters of equal strength (split). If 100 copies of the complex are required, then in the absence of noise (and assuming a tight complex), it would be sufficient to produce 100 copies of each protein. In a living cell, both systems would fall short of the 100-complex target because the number of each protein will fluctuate around 100, and the number of complexes is determined by the minimum level of the two proteins. However, in the operon arrangement, the levels of the two proteins tend to fluctuate in synchrony, and thus, the shortfall is substantially less; in this example, the average number of complexes produced by the operon is ~20% higher than that produced with the split arrangement (Fig. 1). Put another way, to ensure that at least 100 complexes are present in the cell at least 95% (50%) of the time, the cell must make on average 180 (110) copies of each protein in the operon arrangement, while it needs to make 190 (126) copies of each in the split case.
Other factors being equal, this avoidance of a shortfall should thus provide an evolutionary pressure for genes encoding complex-forming proteins to be cotranscribed. This prediction is supported by comparisons of metabolic genes conserved in diverse eubacterial and archaebacterial genomes (Table 1). Genes encoding components of a strong complex (e.g., trpA-trpB) are more likely to be cotranscribed than genes encoding noncomplex-forming proteins acting in the same pathway (e.g., trpE-trpD).
TABLE 1.
Genes | Complex^a | Cotranscription likely^c (% of genera)^b | Cotranscription unlikely^d (% of genera)^b | No. of genera |
---|---|---|---|---|
trpB-trpA | + | 69 | 21 | 204 |
malF-malG | + | 100 | 0 | 9 |
carA-carB | + | 58 | 29 | 226 |
nrdA-nrdB | + | 63 | 18 | 68 |
trpE-trpD | – | 6 | 42 | 178 |
malE-malF | – | 43 | 43 | 7 |
thrA-thrB | – | 41 | 43 | 58 |
thrB-thrC | – | 20 | 80 | 164 |
^a +, complex forming in E. coli: trpB-trpA, tryptophan synthase; malF-malG, maltose ABC transporter; carA-carB, carbamoyl phosphate synthetase; nrdA-nrdB, ribonucleoside diphosphate reductase. –, pairs that are cotranscribed in E. coli and used in the same pathway but are not complex forming (15).
^b Data are shown for one representative member of each genus, obtained from the JCVI-CMR database (bacterial and archaeal genomes) (16). Some gene pairs were absent in some genera.
^c Genes were judged likely to be cotranscribed when they were in the same orientation and the end of the upstream gene was <150 bp from the start of the downstream gene.
^d Genes were judged unlikely to be cotranscribed when the intergenic distance was larger than 5,000 bp or when the genes were in opposite orientations. Some gene pairs could not be placed into either of these categories.
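The two criteria used to fill the "likely" and "unlikely" columns of Table 1 can be written as a small decision rule. The sketch below is only an illustration of that rule: the `Gene` record, the `classify_pair` function, and the convention that coordinates are inclusive base-pair positions are assumptions made for the example, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record; coordinates are inclusive bp positions."""
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def classify_pair(a: Gene, b: Gene) -> str:
    """Apply the Table 1 criteria to a pair of genes.

    Returns "likely", "unlikely", or "unclassified" (pairs that meet
    neither criterion).
    """
    # Opposite orientations: judged unlikely to be cotranscribed.
    if a.strand != b.strand:
        return "unlikely"
    # The intergenic distance does not depend on which strand is read,
    # so order the genes by position (assumed convention).
    first, second = sorted((a, b), key=lambda g: g.start)
    gap = second.start - first.end - 1  # negative if the genes overlap
    if gap < 150:
        return "likely"
    if gap > 5000:
        return "unlikely"
    return "unclassified"
```

For instance, two same-strand genes separated by 49 bp would be scored as "likely", whereas the same genes on opposite strands would be scored as "unlikely".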
This economic advantage of operons is lessened when fluctuations in protein numbers are smaller. The size of fluctuations decreases when the number of mRNAs or proteins is larger (Fig. 1), which can be achieved by increasing the transcription rate, the translation rate, or the lifetime of the mRNA (see Materials and Methods). In eukaryotic cells, noise generated in the production of complex-forming proteins is kept at a minimum by longer-lived mRNAs and higher transcription rates (5). In Escherichia coli, because the number of transcripts is generally low, it is the variation in these numbers (8, 9) that makes the largest contribution to noise. In Fig. 1 and elsewhere, we have examined fluctuations when an average of 20 proteins is produced per mRNA. However, recent measurements suggest that an average E. coli mRNA produces ~100 proteins (9). This suggests that the noise benefit of operons may be ~2-fold greater than our estimates, since the number of mRNAs needed to produce a given number of proteins would be smaller, and thus more noise sensitive, than what we assumed.
The metabolic benefit of operon organization increases when the number of different proteins in the complex is larger (Fig. 2). This is because having more proteins in the complex means that there are more chances for the level of one protein to fall below those of the others and thus to become limiting. The ~20% gain for a 2-protein operon increases to ~30% for a 4-protein complex and to more than 50% for a 30-protein complex (e.g., a bacteriophage particle). Thus, for E. coli, in which one-third of the transcription units are polycistronic (10), we estimate an overall lowering of the cell’s metabolic cost by at least 0.2% due to this effect of cotranscription (Table 2).
TABLE 2.
Abundance | Complex | No. of complexes per cell | % contribution to dry cell mass | Avg operon size | % gain per operon | Mass × gain (%) |
---|---|---|---|---|---|---|
High | Translation apparatus, heat shock proteins | 20,000 | 10 | 4 | 2 | 0.2 |
Medium | RNA polymerase β, β′ | 4,000 | 1 | 2 | 3 | 0.03 |
Low^a | Other | (~500) | (2.5 to 5) | (4) | (~10) | (~0.25 to 0.5) |
^a Numbers are estimates of average values or ranges for these complexes.
Conversely, operon organization has the potential disadvantage of increasing fluctuations in the level of the complex (Fig. 1) because the fluctuations in the components occur in synchrony and thus do not cancel each other out. Although this effect is small, roughly 20%, it does not diminish with increasing protein numbers (Fig. 2). Thus, it may be significant in larger systems which require complex regulation. This effect and the reduced regulatory flexibility of cotranscription may favor independent transcription units in eukaryotes. In bacteria and archaea, the metabolic savings in the production of protein complexes seem to dominate, promoting operon formation.
MATERIALS AND METHODS
Computer simulations.
In our simulations, the individual RNA production, degradation, and translation events occur randomly, with rates which secure a preset average protein number in a cell. Throughout the paper, we assume that each ribosome binding site initiates an average of 20 proteins before the mRNA is inactivated by degradation factors. We assume that mRNA has a much shorter lifetime than the encoded proteins and, accordingly, simulate protein production as an instant event happening immediately after the production of each mRNA. We implement this by assigning each newly synthesized mRNA a protein production capacity c drawn from an exponential distribution with a mean number of 20. Subsequently, we increase the concentration of each protein encoded by the mRNA by an amount drawn from a Poisson distribution with mean c. In this way, protein production by subsequent genes carried by a given polycistronic mRNA will vary to an extent, given by the variations in the number of random translation initiations. Finally, protein dilution upon cell division was taken into account by randomly distributing each protein between the daughter cells.
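The protein production step described above can be sketched in a few lines of Python. This is a deliberately reduced, single-snapshot version and not the simulation used for the figures: it assumes a Poisson-distributed number of mRNAs per cell and ignores protein carry-over between generations and dilution at cell division, so its noise levels are larger than those of the full simulation and its numbers should not be compared with Fig. 1. All function names are hypothetical.

```python
import math
import random

MEAN_PROTEINS_PER_MRNA = 20  # value used throughout the text


def poisson(rng, lam):
    """Poisson sample by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def burst(rng, n_genes):
    """Proteins made from one mRNA: one capacity c shared by all its genes."""
    c = rng.expovariate(1.0 / MEAN_PROTEINS_PER_MRNA)  # exponential, mean 20
    return [poisson(rng, c) for _ in range(n_genes)]


def complexes_operon(rng, target):
    """Both genes on the same mRNAs (cotranscribed, no translational coupling)."""
    a = b = 0
    for _ in range(poisson(rng, target / MEAN_PROTEINS_PER_MRNA)):
        da, db = burst(rng, 2)
        a += da
        b += db
    return min(a, b)  # a tight 1:1 complex is limited by the scarcer protein


def complexes_split(rng, target):
    """Each gene transcribed independently from a promoter of equal strength."""
    totals = []
    for _gene in range(2):
        n_mrna = poisson(rng, target / MEAN_PROTEINS_PER_MRNA)
        totals.append(sum(burst(rng, 1)[0] for _ in range(n_mrna)))
    return min(totals)


def mean_complexes(sampler, target=100, cells=5000, seed=1):
    """Average number of complexes over many simulated cells."""
    rng = random.Random(seed)
    return sum(sampler(rng, target) for _ in range(cells)) / cells
```

Calling `mean_complexes(complexes_operon)` and `mean_complexes(complexes_split)` gives the average complex number for the two arrangements under these simplified assumptions; both fall below the noise-free target of 100, with the split arrangement falling further.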
Assumption of equal production for proteins consecutively encoded by the same mRNA.
In our simulations, we assign identical protein production capacity to each ribosome binding site on a polycistronic mRNA. In this way, the intrinsic noise between two proteins (A and B) encoded by the same mRNA reflects only the random translation initiations and scales as $1/\langle A \rangle$, that is, it decays inversely with the protein concentration. Assuming equal protein production capacity from different parts of the same mRNA seems appropriate to estimate the gain of cotranscription, because a different average protein production capacity would obviously lead to systematic wastage if the proteins are required in equimolar amounts in a functional complex. Such a systematic difference in production capacity was present in a previous study (7) and resulted in the underestimation of the noise reduction due to cotranscription alone. Any systematic differences in protein production caused by a time delay in transcription or the directionality of mRNA decay (7) can easily be compensated by altering the translation initiation sites of the genes without affecting the advantage of the operon arrangement.
Premature termination of an RNA polymerase within an operon can produce systematic decreases in protein production from distal genes (11) and can be compensated for in the same way to maintain equal numbers of the components of the complex. However, this kind of polarity reduces the transcriptional coupling between the genes and reduces the noise benefits of operon organization.
Factors that control fluctuations in intracellular protein numbers.
In our paper, we focus on the amount of protein complex formed relative to the amount that would be produced if protein production and degradation were noise free. In our simulations, we assume that noise is independent of the mRNA lifetime. This is true when the mRNA lifetime is much shorter than the protein lifetime. Even in a more general case, where we do not make such an assumption, we can calculate noise in the protein number as follows:
$$\frac{\sigma_P^2}{\langle P \rangle^2} = \frac{1}{\langle P \rangle} + \frac{\gamma_m}{k_c}\,\frac{\gamma_p}{\gamma_m + \gamma_p}, \qquad \langle P \rangle = \frac{k_c k_l}{\gamma_m \gamma_p},$$
where $\langle P \rangle$ and $\sigma_P^2$ are the average and the variance of the protein number, respectively, $\gamma_m$ and $\gamma_p$ are the degradation rates for mRNA and protein, respectively, and $k_c$ and $k_l$ are transcription and translation rates, respectively (12).
Thus, the protein noise indeed approaches a constant for large γm and kl, provided that the average number of protein copies produced per mRNA, kl/γm (in our simulations, 20 copies), is kept fixed.
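This limiting behavior can be checked numerically. The sketch below assumes the standard two-stage (mRNA to protein) noise expression of reference 12 as we read it, σ²/〈P〉² = 1/〈P〉 + (γm/kc)·γp/(γm + γp) with 〈P〉 = kc·kl/(γm·γp); the function name and the parameter values are illustrative assumptions only.

```python
def protein_noise(k_c, k_l, gamma_m, gamma_p):
    """Squared coefficient of variation of the protein number.

    Assumed two-stage expression:
        sigma^2 / <P>^2 = 1/<P> + (1/<m>) * gamma_p / (gamma_m + gamma_p)
    with <m> = k_c / gamma_m and <P> = k_c * k_l / (gamma_m * gamma_p).
    """
    mean_m = k_c / gamma_m
    mean_p = k_c * k_l / (gamma_m * gamma_p)
    return 1.0 / mean_p + (1.0 / mean_m) * gamma_p / (gamma_m + gamma_p)


def noise_at_fixed_burst(gamma_m, burst=20.0, k_c=5.0, gamma_p=1.0):
    """Noise when k_l is scaled with gamma_m so that k_l / gamma_m stays fixed."""
    return protein_noise(k_c, burst * gamma_m, gamma_m, gamma_p)


def limiting_noise(burst=20.0, k_c=5.0, gamma_p=1.0):
    """Value approached for short-lived mRNA: (gamma_p / k_c) * (1 + 1/burst)."""
    return (gamma_p / k_c) * (1.0 + 1.0 / burst)
```

With these illustrative parameters, `noise_at_fixed_burst(gamma_m)` moves toward `limiting_noise()` as `gamma_m` is increased, which is the sense in which the noise becomes independent of the mRNA lifetime.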
Calculations of metabolic gain.
The estimate of overall synthetic gain of at least 0.23% for protein complex formation due to the use of operons in E. coli (Table 2) uses the measurements of Pedersen et al. (13), with later identifications of protein spots for high- and medium-abundance proteins, and an estimate of 50% protein for the dry cell mass (14). This gain is a minimum estimate, as it ignores the fraction of the protein mass comprising numerous different complex-forming proteins with lower expression levels whose encoding genes are cotranscribed. These could make a large contribution to the gain (Table 2).
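The arithmetic behind the 0.23% figure is the sum, over abundance classes, of the mass fraction multiplied by the gain per operon. A minimal sketch using the Table 2 values follows; the dictionary layout and function name are assumptions made for the example.

```python
# (% contribution to dry cell mass, % gain per operon), taken from Table 2
ABUNDANCE_CLASSES = {
    "high": (10.0, 2.0),    # translation apparatus, heat shock proteins
    "medium": (1.0, 3.0),   # RNA polymerase beta, beta'
}

# Low-abundance complexes are given only as rough ranges in Table 2
LOW_MASS_RANGE = (2.5, 5.0)  # % of dry cell mass
LOW_GAIN = 10.0              # % gain per operon (approximate)


def mass_times_gain(mass_pct, gain_pct):
    """Saving expressed as a percentage of dry cell mass."""
    return mass_pct * gain_pct / 100.0


# Minimum estimate: high- and medium-abundance classes only
minimum_gain = sum(mass_times_gain(m, g) for m, g in ABUNDANCE_CLASSES.values())

# Possible additional contribution from the low-abundance class
low_gain_range = tuple(mass_times_gain(m, LOW_GAIN) for m in LOW_MASS_RANGE)
```

Here `minimum_gain` reproduces the 0.2 + 0.03 = 0.23% estimate, and `low_gain_range` reproduces the approximate 0.25 to 0.5% range listed for the low-abundance class.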
ACKNOWLEDGMENTS
We thank Stanley Brown of CMOL for suggesting the examination of the evolutionary conservation of cotranscription of complex-encoding genes.
This work was supported by the Danish National Research Foundation.
Footnotes
Citation: Sneppen, K., S. Pedersen, S. Krishna, I. Dodd, and S. Semsey. 2010. Economy of operon formation: cotranscription minimizes shortfall in protein complexes. mBio 1(4):e00177-10. doi:10.1128/mBio.00177-10.
REFERENCES
1. Lawrence J. G. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110:407–413.
2. Pal C., Hurst L. D. 2004. Evidence against the selfish operon theory. Trends Genet. 20:232–234.
3. Price M. N., Arkin A. P., Alm E. J. 2006. The life-cycle of operons. PLoS Genet. 2:e96.
4. Tabor J. J., Bayer T. S., Simpson Z. B., Levy M., Ellington A. D. 2008. Engineering stochasticity in gene expression. Mol. Biosyst. 4:754–761.
5. Fraser H. B., Hirsh A. E., Giaever G., Kumm J., Eisen M. B. 2004. Noise minimization in eukaryotic gene expression. PLoS Biol. 2:e137.
6. Lovdok L., Bentele K., Vladimirov N., Muller A., Pop F. S., Lebiedz D., Kollmann M., Sourjik V. 2009. Role of translational coupling in robustness of bacterial chemotaxis pathway. PLoS Biol. 7:e1000171.
7. Swain P. S. 2004. Efficient attenuation of stochasticity in gene expression through post-transcriptional control. J. Mol. Biol. 344:965–976.
8. Golding I., Paulsson J., Zawilski S. M., Cox E. C. 2005. Real-time kinetics of gene activity in individual bacteria. Cell 123:1025–1036.
9. Taniguchi Y., Choi P. J., Li G. W., Chen H., Babu M., Hearn J., Emili A., Xie X. S. 2010. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329:533–538.
10. Cho B. K., Zengler K., Qiu Y., Park Y. S., Knight E. M., Barrett C. L., Gao Y., Palsson B. O. 2009. The transcription unit architecture of the Escherichia coli genome. Nat. Biotechnol. 27:1043–1049.
11. Cozy L. M., Kearns D. B. 2010. Gene position in a long operon governs motility development in Bacillus subtilis. Mol. Microbiol. 76:273–285.
12. Paulsson J. 2004. Summing up the noise in gene networks. Nature 427:415–418.
13. Pedersen S., Bloch P. L., Reeh S., Neidhardt F. C. 1978. Patterns of protein synthesis in E. coli: a catalog of the amount of 140 individual proteins at different growth rates. Cell 14:179–190.
14. Stouthamer A. H. 1973. A theoretical study on the amount of ATP required for synthesis of microbial cell material. Antonie Van Leeuwenhoek 39:545–565.
15. Keseler I. M., Bonavides-Martinez C., Collado-Vides J., Gama-Castro S., Gunsalus R. P., Johnson D. A., Krummenacker M., Nolan L. M., Paley S., Paulsen I. T., Peralta-Gil M., Santos-Zavaleta A., Shearer A. G., Karp P. D. 2009. EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res. 37:D464–D470.
16. Peterson J. D., Umayam L. A., Dickinson T., Hickey E. K., White O. 2001. The comprehensive microbial resource. Nucleic Acids Res. 29:123–125.