Abstract
Gene regulatory networks lie at the heart of cellular computation. In these networks, intracellular and extracellular signals are integrated by transcription factors, which control the expression of transcription units by binding to cis-regulatory regions on the DNA. The designs of both eukaryotic and prokaryotic cis-regulatory regions are usually highly complex. They frequently consist of both repetitive and overlapping transcription factor binding sites. To unravel the design principles of these promoter architectures, we have designed in silico prokaryotic transcriptional logic gates with predefined input–output relations using an evolutionary algorithm. The resulting cis-regulatory designs are often composed of modules that consist of tandem arrays of binding sites to which the transcription factors bind cooperatively. Moreover, these modules often overlap with each other, leading to competition between them. Our analysis thus identifies a new signal integration motif that is based upon the interplay between intramodular cooperativity and intermodular competition. We show that this signal integration mechanism drastically enhances the capacity of cis-regulatory domains to integrate signals. Our results provide a possible explanation for the complexity of promoter architectures and could be used for the rational design of synthetic gene circuits.
Synopsis
Transcription regulatory networks are the central processing units of living cells. They allow cells to integrate different intracellular and extracellular signals to recognize patterns in, for instance, the food supply of the organism. The elementary calculations are performed at the cis-regulatory domains of genes, where transcription factors bind to the DNA to regulate the expression level of the genes. The logic of the computations that are performed depends upon the design of the cis-regulatory region. Not only in eukaryotic cells, but also in prokaryotic cells, the architectures of the cis-regulatory regions are often highly complex. They often contain long arrays of transcription factor binding sites. Moreover, the binding sites often overlap with one another. Hermsen, Tans, and ten Wolde discuss whether such complex architectures can be explained from the basic function of cis-regulatory regions to integrate signals. The authors combine a physicochemical model of prokaryotic transcription regulation with an evolutionary algorithm to design cis-regulatory constructs with predefined elementary functions. The resulting architectures make extensive use of repeating binding sites that are organized into cooperative modules. More surprisingly, these modules often overlap with each other, leading to competition between them. This interplay between intramodular cooperativity and intermodular competition is a powerful mechanism to achieve complex functionality, which may explain the daunting complexity of promoter architectures found in nature.
Introduction
Cells continually have to make logical decisions. Many of these decisions are taken in the cis-regulatory regions of genes, which can function as analog implementations of logic gates [1–3]. A classical example is the lactose system in the bacterium Escherichia coli, where the lac operon is strongly expressed only if the concentration of active CRP, due to the absence of glucose, is high and that of active LacI, due to the presence of lactose, is low. This network can be interpreted as a logic gate with two input signals, namely the concentrations of the transcription factors (TFs) CRP and LacI, and one output signal, the expression level of the operon; indeed, this gate could be classified as an ANDN gate. The lactose system has been studied in much detail both theoretically and experimentally and is now fairly well understood [4–7]. However, even in prokaryotes, many cis-regulatory regions are much more complex than that of the lac operon. Figure 1, taken directly from the EcoCyc database version 9.5 [8], shows four typical examples. The cis-regulatory regions often contain long tandem arrays of TF binding sites. Moreover, many TFs can both activate and repress the same operon. Perhaps most strikingly, TF binding sites often overlap with one another. We have performed a statistical analysis of the importance of repetitive and overlapping binding sites in E. coli, based on the EcoCyc database [8]. The results are shown in Figure 2. We find that 37% of the TF–operon interactions are mediated by more than one binding site and 39% of the binding sites overlap with at least one other site. This raises the question of what kind of functionality these complex structures can convey [9]. Here we present theoretical results that suggest that these intricate structures are a consequence of the functional requirement of cis-regulatory domains to integrate signals. Our results identify a new mechanism for signal integration during transcriptional regulatory control, which is based upon the interplay between cooperative binding of TFs to adjacent sites and competitive binding of TFs to overlapping sites.
To elucidate the origin of the complicated structures shown in Figure 1, we have adopted a novel approach. Using an evolutionary algorithm [10], we have designed prokaryotic cis-regulatory domains with predefined functions in silico. In our approach, no specific promoter architectures are specified a priori: the space of possible architectures is sampled in an unbiased manner. This makes it possible to elucidate new architectures and to find the optimal design for a cis-regulatory domain that is consistent with a required function. The design principles of these architectures are then extracted a posteriori. As we will show below, this approach has allowed us to reveal new design principles of transcriptional regulation, which would have been difficult to obtain using the more conventional approach of studying particular architectures.
To design prokaryotic cis-regulatory domains, we have developed a new model of prokaryotic transcriptional regulation, in which the input–output relation of an operon is deduced from the amino-acid sequences of the TFs and the base-pair (bp) sequence of the cis-regulatory region of the operon. To go from sequence to network function, i.e., the input–output relation, the model contains the following key ingredients (see Figure 3): (i) each TF can bind anywhere on the cis-regulatory region, so that every TF can, in principle, bind at any given location; (ii) the affinity of a TF for a certain location is determined by the DNA sequence at that location and the amino acids in the DNA-binding domain of the TF; the binding energies of the amino-acid–bp contacts are extracted from a matrix that is based on crystallographically solved protein–DNA complexes [1]; (iii) TFs cannot overlap in space, but binding sites can overlap along the DNA; TFs thus compete with each other for binding to overlapping sites; (iv) TFs that bind close to each other on the DNA exhibit a cooperative interaction [6]; we consider the case where a TF can bind cooperatively with two neighboring TFs, thus allowing for oligomerization on the DNA; although some TFs are known to have this property, it is not likely to be a generic property of all TFs; (v) the transcription rate of operons is controlled via the mechanism of “regulated recruitment,” meaning that TFs function by stimulating or hindering the binding of RNA polymerase (RNAP) to the DNA [6]. Although this is the dominant mechanism in prokaryotes, we note that many alternative mechanisms are used as well (see Text S1). To describe the input–output relationship for an operon quantitatively, we employ the statistical mechanical approach developed by Shea and Ackers [12] and Buchler et al. [1].
This model makes it possible to design cis-regulatory domains by performing rounds of mutation and selection in an evolutionary algorithm. Because the input–output relation is completely specified at the microscopic level of the amino-acid sequences of the TFs and the bp sequences of the cis-regulatory regions, new architectures can be obtained by introducing mutations at the microscopic (sequence) level, while selecting at the macroscopic level of the input–output relation. Importantly, neither the architectures of the cis-regulatory regions, nor the functional form of the gene regulatory functions, have to be specified a priori: in the course of our simulations, TF binding sites emerge naturally as sites with a particularly high affinity for a certain TF. While the evolutionary algorithm is not designed to closely mimic natural or directed evolution, it does make it possible to freely explore the space of possible promoter architectures.
We have used our approach to design all possible transcriptional logic gates with two input signals and one output signal (see Table 1). These gates have been studied by Buchler et al. using a rational design approach [1]. Our simulations, however, unravel new design principles. In spite of the simplicity of the model, quite complex functionality can emerge. In particular, we find that promoter architectures are often constructed from modules that consist of tandem arrays of binding sites to which TFs can bind cooperatively (see Figure 4). Furthermore, these modules often overlap, leading to competition between them. We show that the intricate interplay between intramodular cooperativity and intermodular competition allows for a wide range of regulatory functions.
Table 1.
Methods
In the next section, we describe our model of prokaryotic transcriptional regulation, in which the input–output relation is determined by the amino-acid sequences of the TFs and the bp sequence of the cis-regulatory region of the operon. The evolutionary design method, in which mutations are made at the level of the TFs' amino-acid sequences and the promoter's bp sequence, while selection is performed on the input–output relation, is described in the subsequent section.
Model of Transcriptional Regulation
We assume that the transcription rate of an operon is proportional to the fraction of time RNAP is bound to the promoter [1,12–14]. The model we use to compute this quantity is illustrated in Figure 3. The RNAP-σ binds only to the −10 and −35 hexamers, called the core promoter, and we determine its binding energy by comparing the core promoter to a large set of real E. coli promoters [15–18] (see Text S1). We ignore the fact that, in some promoters, the affinity of the RNAP for the promoter is enhanced by interactions of its α C-terminal domain with DNA upstream of the −35 hexamer. TFs can bind to any site in the cis-regulatory region. Whenever a TF binds to the DNA, each amino acid interacts with exactly one bp, and the total binding free energy is the sum of the contributions of all amino-acid–bp contacts. This is known to be a reasonable approximation for many TFs, although exceptions have also been documented [16–21]. The binding energies associated with each amino-acid–bp contact are extracted from a matrix based on crystallographically solved protein–DNA complexes [1]. The results, however, do not depend critically upon the precise values of the matrix elements; random matrices with the same mean and standard deviation give similar results. Note that some real TFs can bind ligands or can become phosphorylated; in that case the TF concentration in our model corresponds to the concentration of the DNA-binding form of the TF.
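The additive contact model described above can be illustrated with a minimal sketch. The contact-energy matrix below is a random placeholder in arbitrary units, not the crystallography-derived matrix used in the paper, and the function names, the six-residue binding domain, and the DNA fragment are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BASES = "ACGT"

def make_placeholder_matrix(seed=0):
    """Random contact energies (arbitrary units); NOT the matrix used in the paper."""
    rng = random.Random(seed)
    return {(aa, b): rng.gauss(0.0, 1.0) for aa in AMINO_ACIDS for b in BASES}

def binding_energy(tf_domain, dna, start, matrix):
    """Sum of amino-acid-bp contact energies for a TF whose first residue sits at `start`."""
    site = dna[start:start + len(tf_domain)]
    if len(site) != len(tf_domain):
        raise ValueError("TF does not fit on the DNA at this position")
    # Each amino acid contacts exactly one bp; contributions are additive.
    return sum(matrix[(aa, b)] for aa, b in zip(tf_domain, site))

def energy_profile(tf_domain, dna, matrix):
    """Binding energy at every position where the TF fits on the DNA."""
    n_positions = len(dna) - len(tf_domain) + 1
    return [binding_energy(tf_domain, dna, s, matrix) for s in range(n_positions)]

matrix = make_placeholder_matrix()
tf = "RKHNQS"            # hypothetical DNA-binding domain
dna = "ACGTTGACCATGCA"   # hypothetical cis-regulatory fragment
profile = energy_profile(tf, dna, matrix)
best = min(range(len(profile)), key=profile.__getitem__)  # lowest-energy position
```

In this picture a binding site is simply a position whose energy in the profile is particularly low for a given TF, which is how sites can emerge without being specified beforehand.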
The model allows for two types of TF–TF and TF–RNAP interactions (see Figure 3) [6]. First, we include steric hindrance: molecules cannot overlap in space. Second, we include a cooperative interaction of energy $E_{\mathrm{TF-TF}}$ between any pair of TFs when they bind within a distance of $k$ bp. Likewise, if a TF and RNAP bind close together, we assume a synergetic energy $E_{\mathrm{TF-P}}$ [24]. We thus assume that in our model TFs can bind cooperatively with themselves, with RNAP, and with other TFs. Although some TFs are known to have all these properties (for instance MalT and MelR), it is unlikely that this is the case for all TFs. Our results will show, however, that combinations of some of these properties allow for myriad promoter functions.
Cooperative interactions between proteins can have two distinct origins. The first is via direct contact between patches on their surface. On the better-characterized, but relatively simple promoters, TFs typically exhibit cooperative interactions with one adjacent TF on the DNA, thus leading to dimers, and not to longer oligomers. Nevertheless, experiments show that TFs exist that bind cooperatively to multiple binding sites (e.g., MalT [25], MelR [26], MetJ [27], Lrp [28], Fur [29], and ArcA [30,31]). Indeed, complex promoters with long arrays of binding sites are frequently observed, as shown in Figures 1 and 2. It is conceivable that on these complex promoters the TFs have multiple patches, thus allowing them to bind cooperatively into long oligomers. In this context, it is important to note that these protein–protein interactions are very weak and are therefore not likely to be detected in large-scale experiments such as those of [32].
The second mechanism for protein–protein interactions is indirect. Here, the interactions are mediated via the DNA. Cooperativity can result from bending, stretching, or super-coiling the DNA by one of the molecules, thereby affecting the binding affinity of the other [6,33]. Although the nature and the strength of these cooperative interactions are still not fully understood, at the level of our statistical–mechanical model, such mechanisms can be described in the same way as cooperativity by direct contact. This means that most effects of local chromosome structure are implicitly included in the model. Importantly, such indirect interactions could also give rise to TFs binding cooperatively into long oligomers.
However, the model does not allow action at a distance. Therefore, mechanisms involving global chromosome structure, such as DNA looping, are not included. Also, mechanisms that rely on direct interactions between the RNAP and TFs bound farther upstream, for instance, through contact with the flexible RNAP α C-terminal domain, are not possible in our model, although it could be extended to incorporate such effects [1].
We use the statistical mechanical approach developed by Shea and Ackers [12] and Buchler et al. [1] to describe the input–output relationship for an operon in a quantitative way. To compute the influence of each TF on the transcription rate in a tractable way, we have developed a fast algorithm that efficiently takes into account all TF–DNA, TF–TF, and TF–RNAP interactions (see Text S1).
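To make the statistical mechanical description concrete, the sketch below computes the probability that RNAP is bound by brute-force enumeration of all configurations. This is not the fast algorithm of Text S1, whose details are not given here; it is a swapped-in, exponentially slow illustration. The dictionary layout, the interaction energies of −3 (in units of kT), the maximum gap of 2 bp, and the example positions and weights are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import exp

def overlaps(a, b):
    """True if two bound molecules would occupy at least one common bp."""
    return (a["start"] < b["start"] + b["length"]
            and b["start"] < a["start"] + a["length"])

def gap(a, b):
    """Number of bp separating two non-overlapping molecules."""
    first, second = sorted((a, b), key=lambda m: m["start"])
    return second["start"] - (first["start"] + first["length"])

def promoter_occupancy(molecules, e_tf_tf=-3.0, e_tf_p=-3.0, max_gap=2):
    """Probability that RNAP is bound, summed over all allowed configurations.

    Each molecule is a dict with keys start, length, weight, is_rnap, where
    `weight` is the Boltzmann weight of the isolated bound state (for a TF,
    concentration divided by dissociation constant). Energies are in kT.
    """
    z_total = 0.0
    z_rnap = 0.0
    for r in range(len(molecules) + 1):
        for config in combinations(molecules, r):
            pairs = list(combinations(config, 2))
            if any(overlaps(a, b) for a, b in pairs):
                continue  # steric hindrance: overlapping molecules are excluded
            w = 1.0
            for m in config:
                w *= m["weight"]
            for a, b in pairs:
                if gap(a, b) <= max_gap:
                    # cooperative interaction between close neighbours
                    energy = e_tf_p if (a["is_rnap"] or b["is_rnap"]) else e_tf_tf
                    w *= exp(-energy)
            z_total += w
            if any(m["is_rnap"] for m in config):
                z_rnap += w
    return z_rnap / z_total

rnap = dict(start=20, length=10, weight=0.05, is_rnap=True)
activator = dict(start=10, length=10, weight=1.0, is_rnap=False)  # adjacent to RNAP
repressor = dict(start=25, length=10, weight=1.0, is_rnap=False)  # overlaps RNAP

basal = promoter_occupancy([rnap])
activated = promoter_occupancy([rnap, activator])
repressed = promoter_occupancy([rnap, repressor])
```

With these illustrative numbers, an adjacent TF raises the occupancy above the basal level through recruitment, while an overlapping TF lowers it through exclusion.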
Evolutionary Design of Logic Gates
We combined our model with an evolutionary algorithm to design transcriptional logic gates consisting of one operon, regulated by two TFs. Typically, 250 gates, with initially random DNA and amino-acid sequences, were subjected to cycles of mutation and selection. In each cycle, point mutations were introduced; the probability of a mutation occurring within a given cis-regulatory region or TF was 0.85 and 0.3, respectively, but the results do not depend strongly on these values. Next, the top 20% of the gates were selected and the others were removed. To complete the cycle, we finally refilled the empty slots by copying randomly chosen genotypes from the selected gates.
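The mutation, selection, and refill cycle described above can be sketched as follows. The population size, the 20% cutoff, and the mutation probabilities 0.85 and 0.3 are taken from the text; the sequence lengths, the genotype layout, and the toy fitness function (which merely counts adenines) are hypothetical placeholders standing in for the model-based fitness.

```python
import random

BASES = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_genotype(rng, dna_len=60, tf_len=10):
    """One gate: a cis-regulatory sequence plus two TF binding domains."""
    return {
        "dna": [rng.choice(BASES) for _ in range(dna_len)],
        "tfs": [[rng.choice(AMINO_ACIDS) for _ in range(tf_len)] for _ in range(2)],
    }

def point_mutate(seq, alphabet, rng):
    """Return a copy of seq with one randomly chosen position redrawn."""
    seq = list(seq)
    seq[rng.randrange(len(seq))] = rng.choice(alphabet)
    return seq

def mutate(genotype, rng, p_dna=0.85, p_tf=0.3):
    """Copy the genotype, mutating the DNA and each TF with the given probabilities."""
    child = {"dna": list(genotype["dna"]),
             "tfs": [list(t) for t in genotype["tfs"]]}
    if rng.random() < p_dna:
        child["dna"] = point_mutate(child["dna"], BASES, rng)
    for i in range(len(child["tfs"])):
        if rng.random() < p_tf:
            child["tfs"][i] = point_mutate(child["tfs"][i], AMINO_ACIDS, rng)
    return child

def evolve(fitness, generations=50, pop_size=250, keep=0.2, seed=0):
    rng = random.Random(seed)
    population = [random_genotype(rng) for _ in range(pop_size)]
    n_keep = int(pop_size * keep)
    for _ in range(generations):
        population = [mutate(g, rng) for g in population]      # mutation
        population.sort(key=fitness, reverse=True)
        survivors = population[:n_keep]                        # keep the top 20%
        refill = [rng.choice(survivors) for _ in range(pop_size - n_keep)]
        population = survivors + refill                        # refill empty slots
    return max(population, key=fitness)

def toy_fitness(genotype):
    """Placeholder fitness: number of 'A' bases (NOT the fitness used in the paper)."""
    return genotype["dna"].count("A")

best = evolve(toy_fitness, generations=30, pop_size=50, seed=1)
```

In an actual design run, `toy_fitness` would be replaced by a function that evaluates the input–output relation of the gate with the transcription model and compares it with the desired logic.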
To select the top 20% of the gates, we define a fitness function that quantifies the quality of the gate. The transcription rate $A$ of a gate depends on the concentrations $c_1$ and $c_2$ of the two TFs: $A = A(c_1, c_2)$. First, we compute the transcription rate for 16 values of $(c_1, c_2)$ in the range 0–1,000 nM; for the AND gate in Figure 5, these 4 × 4 values are depicted as red dots. For each of these points, we determine how far $A$ deviates from a goal function $G(c_1, c_2)$, which is defined by the logic gate we are trying to obtain. Next, we compute the sum of the squares of these deviations. If this quantity is small, then the fitness is considered high (see Text S1). Our fitness function selects for rather steeply switching gates, since the switching is required to take place between $c_i = 333$ nM (considered low) and $c_i = 667$ nM (considered high). We also implicitly assume that all conditions are equally important; each of the 16 points has an equal weight in the fitness function. In reality, this is not necessarily the case: the fitness cost of a gene being “on” at the wrong time need not match the cost of one that is “off” when it should not be (see also [33]). To elucidate general design principles, we select for idealized promoter functions, although, clearly, in nature the input–output relations can be more intricate; an example is the lac promoter, which is not a perfect ANDN gate [7].
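A minimal sketch of such a fitness evaluation is given below. Several details are assumptions, since the exact definitions are in Text S1: the grid is taken to be 0, 333, 667, and 1,000 nM in each direction, outputs are normalized so that the goal is 1 for "on" and 0 for "off", the fitness is the negative sum of squares, and the ORN gate is read as "TF1 or not TF2". The Hill-type response used as an example is purely illustrative.

```python
GRID_NM = (0.0, 333.0, 667.0, 1000.0)   # assumed 4 x 4 sampling grid

# Truth tables of the two-input gates; inputs are True when the TF is "high".
GATES = {
    "AND":  lambda a, b: a and b,
    "OR":   lambda a, b: a or b,
    "NAND": lambda a, b: not (a and b),
    "NOR":  lambda a, b: not (a or b),
    "XOR":  lambda a, b: a != b,
    "EQU":  lambda a, b: a == b,
    "ANDN": lambda a, b: a and not b,
    "ORN":  lambda a, b: a or not b,    # assumed convention
}

def goal(gate, c1, c2, threshold=500.0):
    """Idealized target output G(c1, c2): 1.0 for on, 0.0 for off."""
    return 1.0 if GATES[gate](c1 > threshold, c2 > threshold) else 0.0

def squared_deviation(response, gate):
    """Sum over the 16 grid points of (A - G)^2; smaller is better."""
    return sum((response(c1, c2) - goal(gate, c1, c2)) ** 2
               for c1 in GRID_NM for c2 in GRID_NM)

def fitness(response, gate):
    """Assumed mapping: a small deviation corresponds to a high fitness."""
    return -squared_deviation(response, gate)

def hill(c, k=500.0, n=4):
    return c ** n / (k ** n + c ** n)

def hill_and(c1, c2):
    """Illustrative AND-like response (not an output of the paper's model)."""
    return hill(c1) * hill(c2)

and_score = fitness(hill_and, "AND")
or_score = fitness(hill_and, "OR")
```

Because every grid point carries equal weight, a response that is slightly leaky at all "off" points can score the same as one that fails badly at a single point, which is the equal-importance assumption noted in the text.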
Results
cis-Regulatory Constructs
Figures 5 and 6 show typical simulation results for the gates in Table 1. Clearly, the architectures can be quite complex. Interestingly, the final constructs do not depend much on the initial conditions; this can be regarded as a simple example of convergent evolution. Moreover, they are remarkably similar to the structures found in E. coli, as we now describe.
Homo-Cooperative Auxiliary Sites Provide Steep Responses
We can distinguish two kinds of binding sites. Binding sites from where the TFs directly interact with the RNAP are called primary sites. Primary activator sites are located right next to the −35 hexamer of the core promoter, while primary repressor sites directly overlap with the core promoter. The remaining binding sites are called auxiliary or secondary sites [9]. These sites provide cooperativity. The main function of cooperativity between identical TFs, called homo-cooperativity, is to create steep responses [1,34]. We find that activating and repressing binding sites are both regularly supported by (tandem arrays of) auxiliary sites.
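The way an auxiliary site steepens the response can be illustrated with a two-site toy model. The sketch computes the occupancy of a primary site that has one cooperative auxiliary neighbour and quantifies steepness by an effective Hill coefficient, ln 81 / ln(c90/c10). This steepness measure, the dissociation constant of 100 nM, and the cooperativity factor of 100 are assumptions for illustration; the measure used in the paper is defined in Text S1.

```python
from math import log

def primary_occupancy(c, k_primary, k_aux, omega):
    """Occupancy of the primary site with one auxiliary site (cooperativity omega)."""
    x, y = c / k_primary, c / k_aux
    # States: empty, primary only, auxiliary only, both (with cooperative bonus).
    return (x + omega * x * y) / (1.0 + x + y + omega * x * y)

def concentration_at(target, f, lo=1e-9, hi=1e9, iters=200):
    """Bisect in log space for the concentration where the increasing f equals target."""
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5

def effective_hill(f):
    """Effective Hill coefficient from the 10% and 90% saturation points."""
    c10 = concentration_at(0.1, f)
    c90 = concentration_at(0.9, f)
    return log(81.0) / log(c90 / c10)

K = 100.0  # nM, illustrative
independent = lambda c: primary_occupancy(c, K, K, omega=1.0)
cooperative = lambda c: primary_occupancy(c, K, K, omega=100.0)

n_independent = effective_hill(independent)  # equals 1 for non-interacting sites
n_cooperative = effective_hill(cooperative)  # between 1 and 2 for two sites
```

Longer tandem arrays extend the same idea: each additional cooperative site raises the upper bound on the effective Hill coefficient by one.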
Activation.
In cooperative arrays of activation sites, the auxiliary site farthest removed from the core promoter usually has the strongest affinity. This can be seen in the cis-regulatory regions of EQU, ORN, XOR, and ANDN. Further analysis shows that this pattern enhances the steepness of response (see Text S1). The steepness is optimal if the binding affinities of the farthest site and those of the other sites differ by a factor of 2 to 14, depending on the strength of the promoter, the value of the interaction energies ($E_{\mathrm{TF-P}}$ and $E_{\mathrm{TF-TF}}$), and the number of tandem repeats; in this way, the steepness can be enhanced by up to 27%. A similar result was presented in [14] for systems with one auxiliary site, in the context of the regulation of the phage λ promoter $P_{\mathrm{RM}}$. We therefore predict that activating auxiliary sites in real promoters regularly have a higher affinity than their primary sites.
It may be useful to repeat that we define auxiliary sites as sites that do not interact directly with the RNAP. If, in real E. coli promoters, one of the upstream sites does interact with RNAP, for instance via direct contact with the α C-terminal subdomain of the RNAP, then such a site is, by definition, a primary site. If such a distant primary site is accompanied by an auxiliary site, then this auxiliary site still needs to have a higher affinity than its primary site, in order to maximize the steepness of response.
In E. coli, homo-cooperative activation occurs regularly. For example, the TFs of the LysR family often bind to two sites, one at −65 and the other close to the −35 hexamer of the core promoter [35,36]. In some cases, the TFs bind cooperatively to these sites; in these cases the site at −65 has a stronger affinity than that near −35 [37,38], as one would expect from our results. Another example is the activation of the $P_{\mathrm{RM}}$ promoter in phage λ by CI, which binds more strongly to the auxiliary site (OR1) than to the primary activation site (OR2) [12,14]. We note, however, that this example is complicated by the fact that OR1 and OR2 are also involved in repressing the $P_{\mathrm{R}}$ promoter. We will get back to this in the next subsection.
Repression.
In contrast to the activation modules, the auxiliary sites in repressor complexes are usually much weaker than the primary ones (see, e.g., ORN and EQU). Further analysis (see Text S1) shows that the steepness of repression is optimal if the primary site has a 5× to 50× higher affinity than the auxiliary sites (depending on the promoter strength, the values of the interaction energies, and the number of tandem repeats). This pattern can increase the maximal steepness of the response by about 70%, as compared with the case where all sites have an equal affinity. We therefore predict that auxiliary sites in real repressor systems should often be weak.
Indeed, most well-characterized repressor systems in E. coli have auxiliary operators [9,39], many of which are weak. For example, the two cooperative Fur-binding sites that overlap the core promoter on the pColV-K30 plasmid are supported by an array of low-affinity auxiliary sites [29]. A second example is the duo of dnaA promoters, 1P and 2P [40]. At low concentrations, DnaA represses only 1P, but at high concentrations it blocks both promoters, as a result of the cooperative binding of up to four DnaA monomers to weak binding sites overlapping the 2P region [40]. Other examples are the TrpR repressor on the trp promoter [41] and the Fis repressor on the aldB promoter [42]. Finally, the gltA-sdhC intergenic region contains at least two high-affinity ArcA-P repressor sites, one overlapping the gltA promoter and one blocking the sdhC promoter; at higher ArcA-P concentrations, both binding regions broaden until ArcA-P covers a region of about 230 bp, suggesting ArcA-P oligomerization on the DNA [30,31].
In the previous subsection, we mentioned the activation of $P_{\mathrm{RM}}$ by CI in the bacteriophage λ as an example of cooperative activation, and argued that steep activation requires that the auxiliary site OR1 should be considerably stronger than the primary site OR2. Interestingly, the same CI binding sites OR1 and OR2 are also involved in repressing the $P_{\mathrm{R}}$ promoter. But the binding sites now have reversed roles: from the point of view of promoter $P_{\mathrm{R}}$, OR1 is the primary repressor site, and OR2 is auxiliary. However, since we just concluded that, in repressor systems, primary sites should be stronger than auxiliary sites, we conclude that both for steep activation of $P_{\mathrm{RM}}$ and for steep repression of $P_{\mathrm{R}}$, site OR1 needs to be stronger than site OR2, as is indeed the case.
As a final remark on homo-cooperativity, we point out that, while cooperativity is used widely, as Figure 1 shows, many of the better-characterized promoters, such as the lac promoter, have a simpler architecture. It should be realized that the number of binding sites not only depends upon the complexity of the desired input–output relation, but also upon the required cooperativity. If, for instance, we select for simpler gates with a weaker response function, we do obtain simpler promoter architectures (unpublished data).
Hetero-Cooperativity Provides Conditional Responses
While the benefit of homo-cooperativity is to create steep responses, the function of cooperativity between different molecular species, hetero-cooperativity, is rather to integrate signals. It can be used whenever a response should be conditional on the presence of more than one TF. A good example is the AND gate. As with the OR gate, this gate requires a weak promoter—this ensures that the operon is not transcribed when both TFs are absent. In contrast to the OR gate, however, the AND gate should be on only when both TF1 and TF2 are present. The activation is therefore mediated by a TF1 binding site that is too weak to be functional by itself. Next to this site, a stronger TF2 binding site is present. Only when TF1 and TF2 are both present do they bind cooperatively and induce activation [1]. The remaining sites can bind either TF1 or TF2 and are responsible for the steepness of the response.
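The AND-gate mechanism can be illustrated with a three-molecule toy model: RNAP at a weak promoter, a weak TF1 site that contacts RNAP, and a stronger TF2 site next to it that interacts only with TF1. Which site lies adjacent to the promoter, and all numerical values below (dissociation constants, promoter weight, cooperativity factors), are assumptions chosen for illustration rather than parameters from the paper.

```python
def and_gate_occupancy(c1, c2, k1=50000.0, k2=100.0,
                       p=0.05, omega_p=100.0, omega_t=50.0):
    """Probability that RNAP is bound in a minimal hetero-cooperative AND gate.

    c1, c2   : TF concentrations in nM
    k1, k2   : dissociation constants of the weak TF1 and strong TF2 sites (nM)
    p        : Boltzmann weight of RNAP on the (weak) promoter
    omega_p  : cooperativity factor between TF1 and RNAP
    omega_t  : cooperativity factor between TF1 and TF2
    """
    a = c1 / k1
    b = c2 / k2
    # Configurations without RNAP: empty, TF1, TF2, TF1 + TF2.
    z_off = 1.0 + a + b + a * b * omega_t
    # The same four configurations with RNAP bound; only TF1 contacts RNAP.
    z_on = p * (1.0 + a * omega_p + b + a * b * omega_t * omega_p)
    return z_on / (z_on + z_off)

corners = {
    (low1, low2): and_gate_occupancy(low1, low2)
    for low1 in (0.0, 1000.0) for low2 in (0.0, 1000.0)
}
```

With these numbers, TF1 alone rarely occupies its weak site and TF2 alone cannot contact RNAP, so the occupancy stays low unless both are present; TF2 then stabilizes TF1 on the DNA, and TF1 in turn recruits RNAP.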
Activation.
Hetero-cooperative activation is found regularly in naturally occurring promoters. A good example is the activation of the melAB operon by MelR, which binds to four sites [25,26]. A CRP binding site is present between MelR sites 2 and 3. Here, CRP binds cooperatively with the downstream MelR sites. This increases their fractional occupancy, resulting in transcription activation. Another excellent example is the malKp promoter (see Figure 1D) [25,43], which is discussed below.
Repression.
The CytR regulon provides an example of hetero-cooperative repression. CytR often binds cooperatively with cAMP-CRP to form a repression complex. Good examples are udp [44], nupG [45], tsx-p2 [46] and deoP2 [47]; see also [48,49]. Recently, it has also been shown that Lrp and H-NS act cooperatively at the rrnB promoter [50].
Competition between Modules
Whenever binding sites overlap, competition between TF complexes occurs. It is well known that the core promoter often overlaps with an operator; this is a standard repression mechanism [9]. The role of overlapping TF binding sites in signal integration has received less attention. Clearly, a repressor that binds to an operator overlapping with an activator site can be used to create anti-activation. Likewise, anti-repression occurs when a binding site overlaps with a repressor site, but not with the core promoter. But the full potential of this type of competition only becomes clear when it is combined with cooperativity. Our NOR, NAND, EQU, and XOR gates serve as instructive examples.
Sharpening repression by competitive activation.
The NOR gate (see Figures 5 and 6 and Table 1) combines competition and homo-cooperativity. This gate contains both activator and repressor sites for each TF. The single activator sites are strong compared with the repressor sites; as a result, activation dominates at low TF concentrations. However, as the TF concentrations increase, the affinity of the repressor module grows more rapidly; this is the result of the homo-cooperativity between the repressor sites. Consequently, at high TF concentrations repression dominates. The function of the activating sites is thus to counteract repression at low concentrations, thereby increasing the switching steepness. As it turns out, whenever we select for steep repression, we also get activation. The general message is that, by using competing modules containing different numbers of homo-cooperative binding sites, a TF can effectively be both an activator and a repressor, depending on its concentration.
The NAND gate looks rather similar to the NOR gate, but uses hetero- instead of homo-cooperativity. Repression dominates only if both TF1 and TF2 are present in sufficient concentrations. This shows that by combining competition and hetero-cooperativity, a TF can be either an activator or a repressor, conditional on the concentration of another TF.
Intramodular cooperativity and intermodular competition.
In the EQU gate all mechanisms act in concert. In an EQU gate the operon must be on when the concentrations of both TFs are low; this requires a strong promoter. If either TF1 or TF2 is present, the operon must be off; this requires homo-cooperative repression modules, which block the binding of RNAP when either TF1 or TF2 is present. However, if both TF1 and TF2 are present in similar concentrations, the operon must be on; this requires a hetero-cooperative activation module that counteracts the effect of the homo-cooperative repression modules.
In the XOR gate, the same mechanisms act, but in an opposite manner: if both TFs are absent, the operon should be off; this requires a weak promoter. If one of the two TFs is present, the operon should be on; this demands homo-cooperative activation modules, which recruit the RNAP when only one of the two TFs is present. If both TFs are present, however, the operon should be off; this requires a hetero-cooperative repression module that neutralizes the actions of the homo-cooperative modules when both TFs are present.
In both gates, the homo-cooperative and hetero-cooperative modules have to compete with one another. This is achieved via the binding of the TFs to overlapping binding sites. Which module wins the competition depends upon the TF concentrations, the number of TFs in the modules, and upon the quantitative details of the protein–protein and protein–DNA interactions. Text S1 discusses both gates quantitatively.
Similar mechanisms are known to occur in E. coli. The malKp promoter (see Figure 1D) provides a good example, although its full input–output relation is more complex than those of the logic gates studied here. In the presence of CRP, MalT binds to three tandem sites to form the activation complex [25,43]. In the absence of CRP, however, MalT binds with relatively high affinity to an alternative triplet of repressor sites that overlaps the activation complex, thereby repressing malK. As in the EQU gate presented here, the activation complex has to compete with the repression complex; the CRP concentration determines whether MalT acts as a repressor or as an activator [25,43].
Discussion
We have developed a model of transcriptional regulation and applied it to the evolutionary design of transcriptional logic gates in prokaryotes. Our approach has revealed new design principles, which would have been difficult to predict using a rational design approach. In particular, our analysis stresses the importance of the interplay of the following mechanisms: 1) homo-cooperative interactions between TFs within modules; 2) hetero-cooperative interactions between TFs within modules; 3) competition between TF modules. Using these mechanisms only, a wide range of input–output relations can be produced, including the full repertoire of cis-regulatory logic gates with two input signals and one output signal.
The resulting constructs make extensive use of cooperative tandem binding sites. Homo-cooperativity is often used as a means of achieving high Hill coefficients. In such tandem arrays of binding sites, weak sites can be important. In repressive arrays, auxiliary sites are usually weak, while in activating arrays the auxiliary sites tend to have the highest affinity. Hetero-cooperativity allows for regulation conditional on the presence of more than one TF species. Hetero-cooperativity within modules thus plays a central role in integrating different signals; in the gates studied here, a hetero-cooperative module only becomes active if both TFs are present. While many promoters in nature exhibit long arrays of binding sites (see Figures 1 and 2), it seems unlikely that all TFs of E. coli have the capacity to bind cooperatively into long arrays. Indeed, the origin and the degree of cooperativity in these complex structures are still far from understood. We hope that our simulation results encourage experimentalists to characterize complex promoter architectures in more detail.
The capacity to integrate signals is dramatically enhanced by the competition between different modules, as summarized in Figure 6. Competing modules allow the integration of signals, because a) both homo- and hetero-cooperative modules can act as activator modules or as repressor modules; b) when the concentrations of the TFs change, the relative activities of the activating and repressing modules also change. How their activities change with the TF concentrations depends upon the strength of the TF–DNA, TF–TF, and TF–RNAP interactions. It also depends upon the degree of cooperativity: the number of binding sites in a module not only determines the steepness of the response, but also affects the concentration range in which the module is active—a large module will dominate an overlapping, but smaller one at sufficiently high TF concentrations, even when the individual TFs in the larger module have a weaker affinity for the DNA. Indeed, not only hetero-cooperativity but also homo-cooperativity can play an essential role in signal integration (see also Figure 3 in Text S1). In Text S1, we discuss in more detail how the mechanisms of cooperative and competitive binding of TFs could be used for the rational design of transcriptional logic gates.
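The statement that a larger module can overtake a smaller, overlapping one at high concentrations can be made explicit in a simplified limit. The derivation below is a sketch under stated assumptions: both modules bind the same TF at concentration $c$, each module is either empty or fully occupied (strong cooperativity), all sites within a module share one dissociation constant ($K_m$ for the module with $m$ sites, $K_n$ for the module with $n > m$ sites), and $\omega = \exp(-E_{\mathrm{TF-TF}}/k_{\mathrm{B}}T) > 1$ is the Boltzmann factor of one cooperative contact. The symbols $W$, $K_m$, $K_n$, and $c^{*}$ are introduced here for illustration only.

```latex
\begin{aligned}
W_m(c) &= \omega^{\,m-1}\left(\frac{c}{K_m}\right)^{m}, \qquad
W_n(c) = \omega^{\,n-1}\left(\frac{c}{K_n}\right)^{n}, \qquad n > m, \\[4pt]
\frac{W_n(c)}{W_m(c)} &= (\omega c)^{\,n-m}\,\frac{K_m^{\,m}}{K_n^{\,n}}, \\[4pt]
c^{*} &= \frac{1}{\omega}\left(\frac{K_n^{\,n}}{K_m^{\,m}}\right)^{1/(n-m)} .
\end{aligned}
```

Because the ratio of statistical weights grows as $c^{\,n-m}$, the larger module dominates for $c > c^{*}$ even when its individual sites are weaker ($K_n > K_m$), while the smaller module dominates for $c < c^{*}$.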
Our results provide a possible explanation for the complexity of cis-regulatory regions found in E. coli, which, indeed, often contain tandem TF binding sites and overlapping sites. Our analysis suggests that these complex architectures are a natural consequence of, on the one hand, the basic mechanisms of transcriptional regulation and, on the other hand, the function of cis-regulatory domains to integrate signals. While we focus here on prokaryotes, similar integration mechanisms might also operate in the cis-regulatory domains of transcription units in eukaryotes; ample anecdotal evidence exists, e.g., for the role of adjacent and overlapping TF binding sites in signal integration during embryonic development of the sea urchin [3] and Drosophila [51]. Our results also emphasize that understanding the complex promoters observed both in our simulations and in nature requires quantitative knowledge of binding affinities and interactions: from the binding site locations alone, it is often not possible to distinguish an AND gate from an OR gate, or a NAND gate from a NOR gate.
In this paper, we have used our evolutionary design method to design cis-regulatory domains of single operons. This method, however, could also be applied to design larger networks, such as multi-input modules [52]. As the network size increases and regulons become larger, we expect that it will become increasingly difficult to fulfill all constraints imposed on the promoter and TF sequences. For these larger networks, not only positive design (selecting for desired TF–DNA interactions) but also negative design (selecting against unwanted TF–DNA interactions) may be an important design criterion. Our approach could also be extended to design feedback networks. By selecting transcription networks containing multiple genes based on their dynamics, we can design feedback systems such as transcriptional oscillators and bistable switches [10].
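A small Python sketch illustrates why site locations by themselves do not fix the logic. It assumes one hypothetical layout, two adjacent activator sites for TFs A and B next to the promoter, and changes only the affinities and the TF–TF interaction. The thermodynamic weights, the assumption that both bound TFs contact RNAP independently, the 0.5 threshold, and the names `p_rnap_bound`, `classify`, `omega_ab`, `omega_p`, and `p_basal` are all illustrative choices, not values or methods from the simulations.

```python
import itertools

# Truth tables for the inputs (A, B) = (0,0), (1,0), (0,1), (1,1).
GATES = {
    (False, False, False, True): "AND",
    (False, True, True, True): "OR",
}


def p_rnap_bound(conc_a, conc_b, k_d, omega_ab, p_basal=0.05, omega_p=100.0):
    """Probability that RNAP is bound, for two adjacent activator sites.

    Assumed weights: conc / k_d per bound TF, p_basal for bound RNAP,
    omega_ab if both TFs are bound, omega_p per TF-RNAP contact.
    """
    a, b = conc_a / k_d, conc_b / k_d
    z_on = z_off = 0.0
    for s_a, s_b, s_p in itertools.product((0, 1), repeat=3):
        weight = (a ** s_a) * (b ** s_b) * (p_basal ** s_p)
        weight *= omega_ab ** (s_a * s_b)
        weight *= omega_p ** (s_p * (s_a + s_b))
        if s_p:
            z_on += weight
        else:
            z_off += weight
    return z_on / (z_on + z_off)


def classify(k_d, omega_ab, threshold=0.5):
    """Name the gate implemented for absent (0) or present (1) TFs."""
    inputs = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0))
    table = tuple(
        p_rnap_bound(a, b, k_d, omega_ab) > threshold for a, b in inputs
    )
    return GATES.get(table, "other")


if __name__ == "__main__":
    # Same site positions, different quantitative parameters.
    print(classify(k_d=0.01, omega_ab=1.0))    # strong sites, no TF-TF interaction
    print(classify(k_d=50.0, omega_ab=100.0))  # weak sites, strong hetero-cooperativity
```

With strong sites and no TF–TF interaction, either TF suffices to recruit RNAP and the construct behaves as an OR gate; with weak sites and strong hetero-cooperativity, the same layout responds only when both TFs are present and behaves as an AND gate.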
Here, we used our method to design transcriptional logic gates. For this reason, our evolutionary algorithm was not developed to mimic natural or directed evolution. However, with suitable modifications and extensions, our approach could also be used to study questions that are pertinent to the evolution of functional promoter regions, such as what the pathways of evolution are, and how the evolution of logic gates depends upon factors such as population size, neutral drift, and mutation rates.
Finally, the proposed signal integration mechanism of intramodular cooperativity versus intermodular competition could be tested experimentally by rationally designing cis-regulatory constructs. But perhaps more interesting would be to see whether an evolutionary design method can be used. Recently, Yokobayashi et al. demonstrated experimentally that directed evolution can be used to change protein–DNA and protein–protein interactions in a rationally designed, but nonfunctional gene circuit to obtain a functional network [53]. Perhaps a similar method can be used to design, by experiment, transcriptional logic gates with desired input–output relations. Since no specific promoter designs have to be imposed, it would be interesting to see whether the resulting architectures exploit the signal integration mechanism of competing binding site modules.
Supporting Information
Accession Numbers
In Table 2 we list SwissProt database accession numbers of the genes and proteins mentioned in this article.
Table 2.
Acknowledgments
We would like to thank Dennis Bray, Nick Buchler, Rosalind Allen, Frank Poelwijk, and Simon Tindemans for helpful suggestions and their careful reading of the manuscript.
Abbreviations
- RNAP: RNA polymerase
- TF: transcription factor
Footnotes
Competing interests. The authors have declared that no competing interests exist.
A previous version of this article appeared as an Early Online Release on October 23, 2006 (doi:10.1371/journal.pcbi.0020164.eor).
Author contributions. RH and PRtW conceived and designed the experiments. RH performed the experiments. RH, ST, and PRtW analyzed the data and wrote the paper.
Funding. This work is part of the research program of the Stichting voor Fundamenteel Onderzoek der Materie (FOM), which is financially supported by the Nederlandse organisatie voor Wetenschappelijk Onderzoek (NWO).
References
- Buchler NE, Gerland U, Hwa T. On schemes of combinatorial transcription logic. Proc Natl Acad Sci U S A. 2003;100:5136–5141. doi: 10.1073/pnas.0930314100.
- Istrail S, Davidson EH. Gene regulatory networks special feature: Logic functions of the genomic cis-regulatory code. Proc Natl Acad Sci U S A. 2005;102:4954–4959. doi: 10.1073/pnas.0409624102.
- Yuh CH, Bolouri H, Davidson EH. Genomic cis-regulatory logic: Experimental and computational analysis of a sea urchin gene. Science. 1998;279:1896–1902. doi: 10.1126/science.279.5358.1896.
- Jacob F, Monod J. Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol. 1961;3:318–356. doi: 10.1016/s0022-2836(61)80072-7.
- Müller-Hill B. The lac operon: A short history of a genetic paradigm. Berlin: Walter de Gruyter; 1996. 207 p.
- Ptashne M, Gann A. Genes and signals. New York: Cold Spring Harbor Laboratory Press; 2002. 208 p.
- Setty Y, Mayo AE, Surette MG, Alon U. Detailed map of a cis-regulatory input function. Proc Natl Acad Sci U S A. 2003;100:7702–7707. doi: 10.1073/pnas.1230759100.
- Keseler IM, Collado-Vides J, Gama-Castro S, Ingraham J, Paley S, et al. EcoCyc: A comprehensive database resource for Escherichia coli. Nucl Acids Res. 2005;33:D334–D337. doi: 10.1093/nar/gki108.
- Müller-Hill B. Some repressors of bacterial transcription. Curr Opin Microbiol. 1998;1:145–151. doi: 10.1016/s1369-5274(98)80004-0.
- Francois P, Hakim V. Design of genetic networks with specified functions by evolution in silico. Proc Natl Acad Sci U S A. 2004;101:580–585. doi: 10.1073/pnas.0304532101.
- Mandel-Gutfreund Y, Margalit H. Quantitative parameters for amino acid-base interactions: Implications for prediction of protein–DNA binding sites. Nucl Acids Res. 1998;26:2306–2312. doi: 10.1093/nar/26.10.2306.
- Shea MA, Ackers GK. The OR control system of bacteriophage lambda. A physical–chemical model for gene regulation. J Mol Biol. 1985;181:211–230. doi: 10.1016/0022-2836(85)90086-5.
- Bintu L, Buchler NE, Garcia HG, Gerland U, Hwa T, et al. Transcription regulation by the numbers 1: Models. Curr Opin Gen Dev. 2005;15:116–124. doi: 10.1016/j.gde.2005.02.007.
- Bintu L, Buchler NE, Garcia HG, Gerland U, Hwa T, et al. Transcription regulation by the numbers 2: Applications. Curr Opin Gen Dev. 2005;15:125–135. doi: 10.1016/j.gde.2005.02.006.
- Lisser S, Margalit H. Compilation of E. coli mRNA promoter sequences. Nucl Acids Res. 1993;21:1507–1516. doi: 10.1093/nar/21.7.1507.
- Berg OG, von Hippel PH. Selection of DNA binding sites by regulatory proteins. Statistical–mechanical theory and application to operators and promoters. J Mol Biol. 1987;193:723–750. doi: 10.1016/0022-2836(87)90354-8.
- Berg OG. Selection of DNA binding sites by regulatory proteins. Functional specificity and pseudosite competition. J Biomol Struc Dynam. 1988;6:275–297. doi: 10.1080/07391102.1988.10507713.
- Berg OG, von Hippel PH. Selection of DNA binding sites by regulatory proteins II. The binding specificity of cyclic AMP receptor protein to recognition sites. J Mol Biol. 1988;200:709–723. doi: 10.1016/0022-2836(88)90482-2.
- Djordjevic M, Sengupta AM, Shraiman BI. A biophysical approach to transcription factor binding site discovery. Genome Res. 2003;13:2381–2390. doi: 10.1101/gr.1271603.
- Fields DS, He Y, Al-Uzri AY, Stormo GD. Quantitative specificity of the Mnt repressor. J Mol Biol. 1997;271:178–194. doi: 10.1006/jmbi.1997.1171.
- Stormo GD, Fields DS. Specificity, free energy and information content in protein–DNA interactions. Trends Biochem Sci. 1998;23:109–113. doi: 10.1016/s0968-0004(98)01187-6.
- Gerland U, Hwa T. On the selection and evolution of regulatory DNA motifs. J Mol Evol. 2002;55:386–400. doi: 10.1007/s00239-002-2335-z.
- Gerland U, Moroz DJ, Hwa T. Physical constraints and functional characteristics of transcription factor–DNA interaction. Proc Natl Acad Sci U S A. 2002;99:12015–12020. doi: 10.1073/pnas.192693599.
- Busby S, Ebright RH. Promoter structure, promoter recognition, and transcription activation in prokaryotes. Cell. 1994;79:743–746. doi: 10.1016/0092-8674(94)90063-9.
- Richet E. Synergistic transcription activation: A dual role for CRP in the activation of an Escherichia coli promoter depending on MalT and CRP. EMBO J. 2000;19:5222–5232. doi: 10.1093/emboj/19.19.5222.
- Wade J, Belyaeva T, Hyde E, Busby S. A simple mechanism for co-dependence on two activators at an Escherichia coli promoter. EMBO J. 2001;20:7160–7167. doi: 10.1093/emboj/20.24.7160.
- Marincs F, Manfield IW, Stead JA, McDowall KJ, Stockley PG. Transcript analysis reveals an extended regulon and the importance of protein–protein co-operativity for the Escherichia coli methionine repressor. Biochem J. 2006;396:227–234. doi: 10.1042/BJ20060021.
- Wang Q, Calvo JM. Lrp, a global regulatory protein of Escherichia coli, binds co-operatively to multiple sites and activates transcription of ilvIH. J Mol Biol. 1993;229:306–318. doi: 10.1006/jmbi.1993.1036.
- Escolar L, Perez-Martin J, de Lorenzo V. Evidence of an unusually long operator for the Fur repressor in the aerobactin promoter of Escherichia coli. J Biol Chem. 2000;275:24709–24714. doi: 10.1074/jbc.M002839200.
- Lynch A, Lin E. Transcriptional control mediated by the ArcA two-component response regulator protein of Escherichia coli: Characterization of DNA binding at target promoters. J Bacteriol. 1996;178:6238–6249. doi: 10.1128/jb.178.21.6238-6249.1996.
- Shen J, Gunsalus R. Role of multiple ArcA recognition sites in anaerobic regulation of succinate dehydrogenase (sdhCDAB) gene expression in Escherichia coli. Mol Microbiol. 1997;26:223–236. doi: 10.1046/j.1365-2958.1997.5561923.x.
- Butland G, Peregrin-Alvarez JM, Li J, Yang W, Yang X, et al. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature. 2005;433:531–537. doi: 10.1038/nature03239.
- Berg J, Willmann S, Lässig M. Adaptive evolution of transcription factor binding sites. BMC Evol Biol. 2004;4:42. doi: 10.1186/1471-2148-4-42.
- Alberts B, Bray D, Lewis J, Raff M, Roberts K, et al. Molecular biology of the cell. 3rd edition. New York: Garland; 1994. 1408 p.
- Wagner R. Transcription regulation in prokaryotes. New York: Oxford University Press; 2000. 384 p.
- Schell MA. Molecular biology of the LysR family of transcriptional regulators. Ann Rev Microbiol. 1993;47:597–626. doi: 10.1146/annurev.mi.47.100193.003121.
- Wilson R, Urbanowski M, Stauffer G. DNA binding sites of the LysR-type regulator GcvA in the gcv and gcvA control regions of Escherichia coli. J Bacteriol. 1995;177:4940–4946. doi: 10.1128/jb.177.17.4940-4946.1995.
- Lamblin A, Fuchs J. Functional analysis of the Escherichia coli K-12 cyn operon transcriptional regulation. J Bacteriol. 1994;176:6613–6622. doi: 10.1128/jb.176.21.6613-6622.1994.
- Rojo F. Mechanisms of transcriptional repression. Curr Opin Microbiol. 2001;4:145–151. doi: 10.1016/s1369-5274(00)00180-6.
- Lee YS, Hwang DS. Occlusion of RNA polymerase by oligomerization of DnaA protein over the dnaA promoter of Escherichia coli. J Biol Chem. 1997;272:83–88.
- Jeeves M, Evans PD, Parslow RA, Jaseja M, Hyde EI. Studies of the Escherichia coli Trp repressor binding to its five operators and to variant operator sequences. Eur J Biochem. 1999;265:919–928. doi: 10.1046/j.1432-1327.1999.00792.x.
- Xu J, Johnson R. aldB, an RpoS-dependent gene in Escherichia coli encoding an aldehyde dehydrogenase that is repressed by Fis and activated by Crp. J Bacteriol. 1995;177:3166–3175. doi: 10.1128/jb.177.11.3166-3175.1995.
- Richet E, Sogaard-Andersen L. CRP induces the repositioning of MalT at the Escherichia coli malKp promoter primarily through DNA bending. EMBO J. 1994;13:4558–4567. doi: 10.1002/j.1460-2075.1994.tb06777.x.
- Brikun I, Suziedelis K, Stemmann O, Zhong R, Alikhanian L, et al. Analysis of CRP-CytR interactions at the Escherichia coli udp promoter. J Bacteriol. 1996;178:1614–1622. doi: 10.1128/jb.178.6.1614-1622.1996.
- Pedersen H, Dall J, Dandanell G, Valentin-Hansen P. Gene-regulatory modules in Escherichia coli: Nucleoprotein complexes formed by cAMP-CRP and CytR at the nupG promoter. Mol Microbiol. 1995;17:843–853. doi: 10.1111/j.1365-2958.1995.mmi_17050843.x.
- Gerlach P, Sogaard-Andersen L, Pedersen H, Martinussen J, Valentin-Hansen P, et al. The cyclic AMP (cAMP)–cAMP receptor protein complex functions both as an activator and as a corepressor at the tsx-p2 promoter of Escherichia coli K-12. J Bacteriol. 1991;173:5419–5430. doi: 10.1128/jb.173.17.5419-5430.1991.
- Shin M, Kang S, Hyun S, Fujita N, Ishihama A, et al. Repression of deoP2 in Escherichia coli by CytR: Conversion of a transcription activator into a repressor. EMBO J. 2001;20:5392–5399. doi: 10.1093/emboj/20.19.5392.
- Tretyachenko-Ladokhina V, Ross J, Senear D. Thermodynamics of E. coli cytidine repressor interactions with DNA: Distinct modes of binding to different operators suggests a role in differential gene regulation. J Mol Biol. 2002;316:531–546. doi: 10.1006/jmbi.2001.5302.
- Meibom K, Sogaard-Andersen L, Mironov A, Valentin-Hansen P. Dissection of a surface-exposed portion of the cAMP-CRP complex that mediates transcription activation and repression. Mol Microbiol. 1999;32:497–504. doi: 10.1046/j.1365-2958.1999.01362.x.
- Pul U, Wurm R, Lux B, Meltzer M, Menzel A, et al. LRP and H-NS – Cooperative partners for transcription regulation at Escherichia coli rRNA promoters. Mol Microbiol. 2005;58:864–876. doi: 10.1111/j.1365-2958.2005.04873.x.
- Gilbert SF. Developmental biology. 7th edition. Sunderland (Massachusetts): Sinauer; 2003.
- Shen-Orr SS, Milo R, Mangan S, Alon U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002;31:64–68. doi: 10.1038/ng881.
- Yokobayashi Y, Weiss R, Arnold F. Directed evolution of a genetic circuit. Proc Natl Acad Sci U S A. 2002;99:16587–16591. doi: 10.1073/pnas.252535999.
- Madan Babu M, Teichmann SA. Functional determinants of transcription factors in Escherichia coli: Protein families and binding sites. Trends Genet. 2003;19:75–79. doi: 10.1016/S0168-9525(02)00039-2.
- Pérez-Rueda E, Collado-Vides J. The repertoire of DNA-binding transcriptional regulators in Escherichia coli K-12. Nucl Acids Res. 2000;28:1838–1847. doi: 10.1093/nar/28.8.1838.