Abstract
A crucial step towards engineering biological systems is the ability to precisely tune the genetic response to environmental stimuli. In the case of Escherichia coli inducible promoters, our incomplete understanding of the relationship between sequence composition and gene expression hinders our ability to predictably control transcriptional responses. Here, we profile the expression dynamics of 8269 rationally designed, IPTG-inducible promoters that collectively explore the individual and combinatorial effects of RNA polymerase and LacI repressor binding site strengths. We then fit a statistical mechanics model to measured expression that accurately models gene expression and reveals properties of theoretically optimal inducible promoters. Furthermore, we characterize three alternative promoter architectures and show that repositioning binding sites within promoters influences the types of combinatorial effects observed between promoter elements. In total, this approach enables us to deconstruct relationships between inducible promoter elements and discover practical insights for engineering inducible promoters with desirable characteristics.
Subject terms: Thermodynamics, Gene regulation, Bacterial genetics, Synthetic biology
Precisely tuning the genetic response to environmental stimuli is a key step in engineering synthetic biology systems. Here, the authors profile 8269 IPTG-induced promoters to deconstruct the relationship between sequence architecture and gene expression.
Introduction
Inducible promoters are key regulators of cellular responses to external stimuli and popular engineering targets for applications in metabolic flux optimization and biosensing1–3. For example, inducible systems have been designed to function as controlled cell factories for biosynthesis as well as non-invasive diagnostics for gut inflammation4,5. However, these applications generally rely on synthetic inducible promoters that can elicit precisely programmable responses, a quality that is not exhibited by native promoter systems. As a result, there is a demand for strategies to engineer inducible promoters with desirable characteristics, such as minimal expression in the uninduced state (minimal leakiness) and maximal difference between the induced and uninduced states (maximal fold change). More broadly, the design and analysis of synthetic inducible promoter function provide insight on the biophysical processes driving gene regulation.
A variety of approaches have been implemented to engineer inducible promoters; however, each has shortcomings. Previous studies have had great success implementing biophysical models to tune the relative behaviors of regulatory elements and explain promoter expression, but these models do not tell us how the repositioning of binding sites influences expression6–10. Directed evolution is a promising strategy that leverages stepwise random mutagenesis and selection to identify favorable promoters, but it is generally limited to optimizing within local, evolutionarily accessible sequence space11,12. While this black box approach can produce variants with the desired phenotype, it often requires iterative rounds of library screening12 and does not inform our ability to logically construct promoters. Lastly, rational design is a promising approach based on the application of pre-existing mechanistic knowledge of inducible systems to generate novel variants13,14. Although there is great potential in rationally designed promoters for achieving specific applications, this approach requires a fundamental understanding of how to engineer these systems.
Inducible promoters consist of cis-regulatory elements that work in concert with multiple trans-acting factors to determine overall expression output15,16. As such, a critical step towards learning how to engineer these systems is to interrogate the combinatorial regulatory effects between promoter-based elements. Years of studies on the inducible lacZYA promoter have revealed many sequence-based factors influencing its regulation. First, the binding affinities of operator sites are critical elements in determining the activity of the repressor protein, LacI17,18. Second, the nucleotide spacing between operator sites is vital as looping-mediated repression is dependent on repressor orientation17,19. Third, the positioning of the repressor sites relative to the RNA polymerase (RNAP) binding sites determines a variety of repression mechanisms and transcriptional behaviors13,14. Fourth, the strength of the core promoter modulates RNAP avidity and thus gene expression6. However, while previous studies have characterized these modular sequence components individually, the combinatorial effects of these features on promoter induction have yet to be explored.
Inspired by previous success in studying the combinatorial logic of E. coli promoters20, we sought to address these obstacles by integrating rational design with high-throughput screening of large DNA-encoded libraries. The recent development of massively parallel reporter assays (MPRAs) provides a framework for leveraging next-generation sequencing to measure cellular transcription levels of large numbers of DNA sequence variants. This approach enables the measurement of thousands of synthetic sequences in a single, multiplexed experiment, often using transcriptional barcodes as a readout20,21. This paradigm has previously been used to empirically examine both the individual and combinatorial effects of transcription factor binding sites on gene expression in eukaryotes, improving our ability to design synthetic eukaryotic promoters with programmable responses22–29. However, there have been few similar high-throughput studies in prokaryotes.
In this work, we implement a genomically-encoded MPRA to interrogate thousands of rationally designed variants of the lacZYA promoter and investigate the relationships between inducible promoter components across four cis-regulatory sequence architectures. We first explore the relationship between operator spacing and repression at the lacUV5 promoter using a variety of transcriptional repressors. Next, we design and characterize 8269 promoters composed of combinations of LacI repressor and RNAP-binding sites, exploring combinatorial interactions between elements and establishing relationships that guide transcriptional behavior. Lastly, we isolate and further characterize promoters with various levels of fold change and leakiness that may be useful in synthetic applications.
Results
Repression by transcription factors is dependent on operator spacing
The lacZYA promoter is a classic model for gene regulation in E. coli, with many studies investigating the relationship between sequence composition and induction properties. This promoter contains two LacI dimer sites positioned at the proximal +11 and distal −82 positions relative to the transcription start site (TSS)30,31, which flank a set of σ70 −10 and −35 elements (Fig. 1a, see WT PlacZYA). RNAP cooperatively binds these σ70 hexameric sequences and the relative binding affinity of these elements determines promoter strength6,8. Conversely, the LacI operator sites repress expression from the native lacZYA promoter when bound32. While LacI repressor bound at the proximal site blocks RNAP binding as well as promoter escape, binding at the distal site alone does not inhibit transcription and serves a more nuanced role in repression33. When both the proximal and distal sites are bound, LacI dimers at these sites can engage in homotetrameric protein interaction, tethering these sites together and forming a local DNA loop18,34,35. This repression loop further occludes RNAP binding, decreasing gene expression.
Studies exploring the formation of this repression loop have found that it is heavily dependent on the spacing between LacI operator sites (Fig. 1b)36–38. Because B-form DNA is helical and completes a full rotation roughly every 10.5 bp, the relative orientation of two operator sites around the face of the helix changes as the distance between them changes. As a result, the ability of the distal site to engage in this repression loop fluctuates as it is shifted along the promoter, with repression strength correlated with helical phasing between the two operator sites36,37. In our effort to optimize the lacZYA promoter, we sought to validate the effect operator spacing has on repression, as well as explore whether other repressors show the same behavior.
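The helical-phasing argument can be illustrated with a short calculation. The sketch below is illustrative only: it assumes an idealized, rigid helix with a repeat of exactly 10.5 bp, and the function names are hypothetical rather than taken from the study.

```python
import math

# Assumed helical repeat of B-form DNA (bp per full turn), as quoted in the text.
HELICAL_REPEAT_BP = 10.5


def helical_phase_degrees(spacing_bp, repeat_bp=HELICAL_REPEAT_BP):
    """Rotation between two sites around the helix axis, in degrees within [0, 360)."""
    return (spacing_bp / repeat_bp * 360.0) % 360.0


def same_face_score(spacing_bp, repeat_bp=HELICAL_REPEAT_BP):
    """Cosine of the phase angle: +1 when both sites face the same side, -1 when opposite."""
    return math.cos(math.radians(helical_phase_degrees(spacing_bp, repeat_bp)))


if __name__ == "__main__":
    # Scan a window of hypothetical center-to-center spacings.
    for spacing in range(90, 101):
        print(spacing, round(helical_phase_degrees(spacing), 1), round(same_face_score(spacing), 2))
```

Under this idealization the score oscillates with a period of about 10 to 11 bp, which is the periodicity one would look for in repression data if looping depends on the two operators sharing a face of the helix.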
Accordingly, we tested the relationship between spacing and repression for six transcription factors (TFs) at the most commonly utilized lacZYA-derived promoter, lacUV5: LacI, AraC, GalR, GlpR, LldR, and PurR. While LacI35,37,39, AraC40,41, and GalR42–46 have been experimentally shown to engage in DNA looping, there is evidence that GlpR47, LldR48, and PurR34 may also be capable of this mechanism. Using reported, natural binding sites for these TFs49 (Supplementary Table 1), we designed 624 sequences assessing the ability of these sites to repress a constitutive lacUV5 promoter across various operator spacings. The lacUV5 promoter models the lacZYA canonical architecture, but has a small 2 bp mutation in the −10 to drive more detectable levels of expression50. In our design, a proximal site for each TF was centered at +12, to avoid overlapping the transcription start site, and a series of variants were created in which the distal operator site was centered at each position from −83 to −116 relative to the TSS (Fig. 1a). Furthermore, to quantify the effect of the individual sites, we tested variants where either the proximal or distal site was replaced with a scrambled sequence variant that maintained the GC content of the native LacI site. We grew this library in MOPS rich-defined media supplemented with 0.2% glucose, a condition for which all TFs should be repressive, and measured expression of all variants using a previously described MPRA20 (Fig. 1c). In brief, we synthesized each variant and engineered these promoters to express uniquely barcoded GFP transcripts. Using recombination-mediated cassette exchange51, each barcoded variant was singly integrated into the essQ-cspB intergenic locus of the E. coli genome, positioned near the chromosomal midreplichore. 
We then grew the integrated libraries in rich, defined media, and quantified relative barcode expression levels by performing RNA-Seq of the transcribed barcodes and normalizing transcript levels to DNA copy number as determined by DNA-Seq. Using this assay, we recovered expression measurements for 615 (98.6%) of the variants we designed, measuring an average of 70 unique barcodes per variant (Supplementary Fig. 1). These measurements exhibited a high degree of correlation between technical replicates (Fig. 1d, r = 0.987, P < 2.2 × 10−16, two-sided Student’s t test).
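As a rough illustration of the normalization step, the sketch below computes a per-variant expression value as the ratio of summed RNA barcode counts to summed DNA barcode counts. The aggregation rule (summing over barcodes, skipping barcodes with no DNA reads) and all names are assumptions made for illustration; the actual pipeline may additionally correct for sequencing depth or filter barcodes differently.

```python
def variant_expression(barcodes):
    """Return the RNA/DNA ratio for one variant, or None if no DNA reads were observed.

    `barcodes` is a list of (rna_count, dna_count) pairs, one pair per barcode
    (hypothetical data layout).
    """
    # Only barcodes detected in the DNA library can be normalized.
    usable = [(rna, dna) for rna, dna in barcodes if dna > 0]
    total_dna = sum(dna for _, dna in usable)
    if total_dna == 0:
        return None
    total_rna = sum(rna for rna, _ in usable)
    return total_rna / total_dna


if __name__ == "__main__":
    # Made-up counts for a single variant carrying three barcodes.
    print(variant_expression([(120, 40), (90, 30), (0, 0)]))
```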
We first explored the ability of these TFs to repress the lacUV5 promoter when placed in the proximal position. To evaluate this, we compared the relative expression between variants with proximal sites to the lacUV5 promoter containing a scrambled LacI site in the proximal position (Fig. 1e). At this position, repression varied across operators although the AraC, LldR, and GlpR sites were ineffective (AraC: P = 0.06, LldR: P = 0.47, GlpR: P = 0.5837, Welch’s two-sided t test). LacI exhibited the strongest level of repression in the proximal position at 2.62-fold (P = 2.22 × 10−7, Welch’s two-sided t test), which may be due to the strong binding affinity of the native proximal operator site30.
To gauge the performance of these repressors at each position in the distal site, we looked at how expression changes as a function of distance from the proximal site. While LacI37 and AraC40,41 are known to exhibit a cyclic pattern of repression as the distance between operator sites is increased, there are no direct measurements showing that GalR, GlpR, LldR, or PurR share this phenomenon. First, we looked at the effect of moving the distal site across 33 nucleotides in the absence of a functional proximal site (Fig. 1f). We observed a uniformity of cyclic behavior across most repressors tested, suggesting position-dependent effects are a general phenomenon of many TFs. Lone GalR, GlpR, and PurR distal sites alternated between activation and repression, a phenomenon which has been observed in similar translocations of a LacI-binding site upstream of a promoter in the absence of inducer37. This may be due to helical positioning of the repressor relative to RNAP and resulting steric interference or binding-induced DNA distortions52. Interestingly, we observed TFs exhibiting opposing position-dependent behaviors, where GalR and PurR repressed when the center of the binding sites was approximately in-phase with the +1 TSS position and activated when out of phase. We observed the opposite effect with GlpR sites. Conversely, AraC-binding sites gradually increased repression as they moved further upstream, with a significant inverse relationship between operator distance and expression, though the effect size is small (P = 2.19 × 10−5, ANOVA). To see whether these relationships would change when DNA looping was possible, we evaluated the effect of moving the distal site when a proximal site was also present (Fig. 1g). To directly observe the impact of the distal site, we determined the expression at each distal position relative to expression when only the proximal site was present. 
Coupled with a proximal site, a majority of tested TFs exhibited different repression patterns as the distal site was moved. For AraC, GalR, and LacI the distal sites reduce expression more with a proximal site present than without (AraC: 1.18-fold, P = 1.83 × 10−8, Welch’s two-sided t test; GalR: 1.35-fold, P = 2.82 × 10−11, Welch’s two-sided t test; LacI: 1.37-fold, P = 4.65 × 10−14, Welch’s two-sided t test). This enhanced repression by distal sites when a proximal site is present indicates the existence of synergistic interactions between these sites. Furthermore, repression by these distal sites followed a 10–11 bp periodicity as they were placed incrementally further from the proximal site, which may indicate the formation of DNA loops at the lacUV5 promoter. LldR, PurR, and GlpR distal sites did not show significantly enhanced ability to repress when a proximal site was present (P > 0.4 in all cases, Welch’s two-sided t test), indicating these TFs may not participate in looping-based repression. Additionally, distal site and loop-mediated repression differed between TFs tested which is likely due to differences in how these proteins are situated on their binding sites or oligomerize to form DNA loops. Thus, we find different repressor systems exhibit unique relationships between operator spacings and repression, highlighting the need to study these systems individually.
Tuning binding site strengths alters inducible promoter behavior
Having identified the optimal spacing for LacI sites at the lacUV5 architecture, we next sought to learn how these sites may be manipulated to generate lacUV5 variants with minimal leakiness and maximal fold change, properties that are desirable in synthetic applications. Previously, we found testing large libraries of promoters composed of various combinations of sequence elements allows us to characterize the contribution of individual sequence elements and reveal interactions between them20,53. Employing a similar MPRA strategy, we designed and assayed a library of 1600 inducible promoters, referred to as Pcombo, composed of all possible combinations of one of ten proximal LacI-binding sites at +11, four −10 elements, four −35 elements, and ten distal LacI sites at −90 (Fig. 2a). To cover a wide range of expression, we selected −10 and −35 element variants previously shown to span a range of RNAP-binding affinities6,20,53. Similarly, we chose a range of LacI-binding site variants from well-characterized genomic operator sites (O1, O3, Osym)10,18, a variant of the natural O2 site, O2-var, and a series of LacI sites created from different combinations of the monomeric halves of each of these dimeric binding sites (Supplementary Table 2). While O1 is the naturally occurring operator site reported to have the highest affinity for LacI, the synthetic Osym is a symmetrized variant with an even higher affinity18,54. Expression data for these variants was collected in both uninduced (0 mM IPTG) and fully induced conditions (1 mM IPTG). We recovered expression measurements for 1493 variants within this library (93.3%) with an average of 9 barcodes measured per variant. We observed high expression correlation between biological replicates in both the induced and uninduced conditions (Induced: r = 0.945, P < 2.2 × 10−16, Uninduced: r = 0.955, P < 2.2 × 10−16, two-sided Student’s t test) (Supplementary Fig. 2a).
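The combinatorial design of Pcombo can be sketched as a Cartesian product over the four element positions. The element labels below are placeholders, not the sequences used in the study (those are listed in Supplementary Table 2); only the counts per position come from the text.

```python
from itertools import product

# Placeholder labels; only the number of options per position is taken from the text.
distal_sites = [f"distal_op_{i}" for i in range(1, 11)]      # ten distal LacI sites at -90
minus35_elements = [f"minus35_{i}" for i in range(1, 5)]     # four -35 elements
minus10_elements = [f"minus10_{i}" for i in range(1, 5)]     # four -10 elements
proximal_sites = [f"proximal_op_{i}" for i in range(1, 11)]  # ten proximal LacI sites at +11

# Every combination of one element per position.
pcombo_designs = list(product(distal_sites, minus35_elements, minus10_elements, proximal_sites))

if __name__ == "__main__":
    print(len(pcombo_designs))                   # number of designed variants
    print(round(1493 / len(pcombo_designs), 3))  # fraction recovered, per the text
```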
We first explored how the composition of sequence elements determined uninduced expression or leakiness. Library variants exhibited a 267-fold range of uninduced expression; even amongst variants containing the same core promoter σ70 elements, expression varied by up to 96-fold (Fig. 2b). As has been previously reported in comparable promoter variants20, detectable expression levels were only observed when either the −10 or −35 elements matched the consensus sequence. In the uninduced state, promoters composed of the consensus −10/−35 elements exhibited the greatest leakiness, with up to 21-fold higher average expression than that of promoters composed of weaker −10/−35 elements. Effective repression generally required a strong LacI operator site, such as Osym and O1, in the proximal position, especially amongst variants with consensus −10/−35 elements (Fig. 2c). Although pairs of repressors exhibited similar effects on different combinations of −10 and −35 elements, there was still variability in these effects which may be due to biological and experimental noise at low levels of expression or interactions between sequence elements9,20.
We next explored how the interplay between operator sites and RNAP-binding site strengths influences fold change between induced and uninduced states. We determined the fold change of variants by normalizing induced and uninduced measurements to negative controls in each condition and calculating the ratio of normalized induced expression to normalized uninduced expression. Overall, we observed a 40-fold range of fold changes in expression (Fig. 2d). Promoters consisting of the consensus −10 and −35 sites exhibited the highest fold changes; however, these values were highly variable depending on the variant’s operator site composition (Fig. 2e). Amongst promoters containing these core sites, we found that operators in the proximal site were largely deterministic of fold change, with promoters containing strong operators (O1 and Osym) in the proximal site yielding 4.61-fold higher fold changes on average than promoters containing weak operators in the proximal site (P = 1.44 × 10−6, Welch’s two-sided t test). We attribute this to the importance of the downstream operator in blocking RNAP binding and transcriptional initiation10,55. As expected, promoters containing Osym in the proximal site generally drove the highest fold change, however, pairing with another Osym in the distal site surprisingly decreased fold change relative to other variants. Notably, while the consensus core promoter containing Osym in both the proximal and distal sites yielded a fold change of 4.63, its counterpart containing the weaker O1 variant in the proximal site drove an increased fold change of 8.97. While the promoter containing Osym in both the proximal and distal sites had 1.77-fold lower uninduced expression compared to its counterpart with a weaker O1 in the proximal site, induced expression was also 3.43-fold lower (Supplementary Fig. 3a). 
Thus, Osym in both the proximal and distal sites decreased expression in the induced state by a larger magnitude than in the uninduced state, resulting in a lower fold change.
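The arithmetic behind this comparison can be checked directly. Because fold change is the ratio of induced to uninduced expression, the ratio of the two promoters' fold changes should equal the ratio of their induced-state and uninduced-state differences. The sketch below uses only values reported in the text; the function name and its arguments are illustrative.

```python
def fold_change(induced, uninduced, neg_ctrl_induced=1.0, neg_ctrl_uninduced=1.0):
    """Ratio of control-normalized induced expression to control-normalized uninduced expression."""
    return (induced / neg_ctrl_induced) / (uninduced / neg_ctrl_uninduced)


# Reported fold changes for the consensus core promoter with a distal Osym site.
fc_osym_proximal = 4.63
fc_o1_proximal = 8.97

# Reported differences of the Osym/Osym promoter relative to its O1-proximal counterpart.
uninduced_drop = 1.77
induced_drop = 3.43

# Losing more expression in the induced state than in the uninduced state
# lowers fold change by the ratio of the two drops.
expected_ratio = induced_drop / uninduced_drop
observed_ratio = fc_o1_proximal / fc_osym_proximal

if __name__ == "__main__":
    print(round(expected_ratio, 2), round(observed_ratio, 2))
```

The two ratios agree to within rounding, which is consistent with the interpretation that the lower fold change of the Osym/Osym promoter comes from residual repression in the induced state.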
To investigate this unusual phenomenon, we determined which proximal/distal site pair resulted in maximal fold change for other −10/−35 site pairs. Interestingly, we observe that maximal fold-change trends with the strength of the proximal site, but the optimal distal site varies on a core promoter basis. For example, the optimal distal site for promoters containing just one of the consensus −10/−35 sites was the comparably weaker O1:R-sym (Supplementary Fig. 3b), demonstrating that promoter architectures incorporating the strongest repressor binding elements available may not always yield the highest fold-change levels.
Biophysical modeling of inducible promoter activity
We set out to clarify the conditions for optimal fold change by combining our experimental measurements with a simple statistical mechanics binding model (described in Supplementary Note). To that end, we modeled promoter architecture by enumerating the various promoter states containing all combinations of RNAP binding, LacI binding, and LacI looping (Supplementary Fig. 4a). We assume that all states where RNAP is bound and the proximal LacI site is not bound give rise to gene expression rmax, whereas all other states have a small background level of gene expression rmin9,56. The relative probability of each state is given by exp(−βE), where β = 1/(kBT) and E equals the sum of all free energies arising from binding or looping (Supplementary Fig. 4a). We also include a term to scale values in the presence of IPTG. Using this statistical mechanics model of gene expression, we inferred the binding energies of each promoter element and compared the resulting fits for the 1493 different promoters in the absence of IPTG (Fig. 3a, R2 = 0.79, P < 2.2 × 10−16, two-sided Student’s t test, parameter values in Supplementary Fig. 4b). Interestingly, we found that all parameters could be fit using as little as 5% of the library while retaining the ability to accurately predict the other 95% of variants when used in this model framework (Supplementary Fig. 5a). Furthermore, this model enables us to extrapolate the gene expression for promoter architectures with arbitrary binding strengths spanning the theoretical parameter space (Fig. 3b).
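A minimal sketch of this kind of state-enumeration model is given below. It is not the model from the Supplementary Note: the state list (independent occupancy of the RNAP, proximal, and distal sites, plus a looped state that may or may not also carry RNAP), the parameter names, and the default rmax and rmin values are simplifying assumptions, and the IPTG scaling term is omitted. Energies are expressed in units of kBT.

```python
import math
from itertools import product


def predicted_expression(e_rnap, e_prox, e_dist, e_loop, r_max=1.0, r_min=0.01):
    """Expression from a Boltzmann-weighted sum over promoter states.

    Energies are in units of kBT (so beta = 1); more negative means tighter binding.
    All parameter values passed in are hypothetical.
    """
    z = 0.0       # partition function: sum of all state weights
    active = 0.0  # summed weight of states with RNAP bound and the proximal site free
    for rnap, prox, dist in product((0, 1), repeat=3):
        energy = rnap * e_rnap + prox * e_prox + dist * e_dist
        weight = math.exp(-energy)
        z += weight
        if rnap and not prox:
            active += weight
    # Looped states: both operators occupied and bridged, with or without RNAP.
    for rnap in (0, 1):
        z += math.exp(-(rnap * e_rnap + e_prox + e_dist + e_loop))
    p_active = active / z
    # Active states express at r_max; every other state expresses at r_min.
    return r_min + (r_max - r_min) * p_active


if __name__ == "__main__":
    print(predicted_expression(-3.0, 0.0, 0.0, 0.0))    # weak operators
    print(predicted_expression(-3.0, -5.0, -2.0, -1.0))  # strong proximal operator
```

In this sketch, strengthening the proximal operator always lowers predicted expression and strengthening the RNAP site always raises it, which is the qualitative behavior the fitted model is used to quantify.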
We then used the fitted gene expression in the induced and uninduced states to explore how fold change varies as a function of inferred LacI binding energies (Fig. 3c). Returning to our earlier result, we confirmed that pairing the consensus −35/−10 RNAP-binding site with proximal and distal Osym LacI sites (binding energy −2.4 kBT; Supplementary Fig. 4b) leads to suboptimal fold change. Previous measurements have shown that even at 1 mM IPTG, a small number of LacI dimers are still active57, and hence the large binding affinity to Osym sites may drive measurable repression levels58,59. Both our experimental measurements and statistical mechanics model support this notion, demonstrating that using Osym at both the proximal and distal sites leads to binding strong enough that even the small number of active repressors per cell reduces gene expression at 1 mM IPTG (Fig. 3b). Instead, the promoter architecture that maximizes fold change couples the strong −10 and −35 RNAP elements with near-maximal LacI operator site strengths that are strong enough to repress in the absence of IPTG but not in the presence of saturating IPTG. We also observed that to achieve optimal induction in weaker promoters, the strength of the LacI operator sites should decrease by a commensurate amount (Fig. 3d).
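The existence of an intermediate optimum can be reproduced in a toy calculation. The sketch below is a deliberately reduced, single-operator model and not the fitted model: it assumes that IPTG acts as a fixed, hypothetical shift (iptg_shift, in kBT) that weakens the effective operator energy without eliminating repressor activity, and that RNAP and repressor occupancy are independent.

```python
import math


def expression(e_rnap, e_lac, r_max=1.0, r_min=0.01):
    """Toy expression level with one operator; energies in kBT, values hypothetical."""
    p_rnap = 1.0 / (1.0 + math.exp(e_rnap))  # probability the RNAP site is occupied
    p_free = 1.0 / (1.0 + math.exp(-e_lac))  # probability the operator is unoccupied
    return r_min + (r_max - r_min) * p_rnap * p_free


def fold_change(e_rnap, e_lac, iptg_shift=6.0):
    """Induced over uninduced expression; IPTG is modeled as an energy shift (assumption)."""
    return expression(e_rnap, e_lac + iptg_shift) / expression(e_rnap, e_lac)


# Scan operator energies from very strong (-15 kBT) to very weak (+5 kBT).
energies = [step * 0.5 for step in range(-30, 11)]
folds = [fold_change(-2.0, e) for e in energies]
best_energy = energies[folds.index(max(folds))]

if __name__ == "__main__":
    print(best_energy, round(max(folds), 1))
```

In this toy setting the fold change falls toward 1 at both extremes: very weak operators never repress, and very strong operators remain occupied even after the IPTG shift. The maximum therefore lies at an intermediate operator strength, in line with the qualitative conclusion drawn from Fig. 3c.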
Additional operator sites can promote or reduce induction response
We next sought to explore how these behaviors would change in the context of alternative architectures in which we varied the operator number, placement, and RNAP-binding contacts. Based on our previous characterization of the 1600 Pcombo variants, we asked whether an additional distal operator site could improve the fold change of promoters. In particular, we expected that an additional distal site would enhance repression, as multiple upstream sites would increase the probability of repressor binding and loop formation. To investigate this, we synthesized and tested 2000 lacUV5 variants within a library we call Pmultiple. This library resembled Pcombo except for the inclusion of an additional modular LacI-binding site, which we refer to as the “distal+” site, immediately upstream of the distal binding site. The final design was composed of each combination of five distal+ operator sites, five distal operator sites, four −10 elements, four −35 elements, and five proximal operator sites for a total of 2000 variants (Fig. 4a, top). Using our MPRA, we measured expression for 1638 of these variants (81.9%) in the absence of IPTG and at 1 mM IPTG with an average of 8–9 barcodes measured per variant (Supplementary Fig. 2b). To determine the effect of the distal+ site, we compared the fold change of each Pmultiple variant to Pcombo variants composed of the same distal, −35, −10, and proximal sites. We limited our analysis to promoters with consensus core promoter elements as well as an O1 or Osym proximal site to best capture the repressive effects of the distal+ element. The effect of adding the distal+ site to the Pcombo architecture spanned a 5.4-fold range of fold change, largely determined by both distal and distal+ site identity (Fig. 4a, bottom). We observed that a strong distal+ operator site can consistently compensate for a weak distal operator site to decrease leakiness (Supplementary Fig. 6a) and improve fold change. 
For example, adding an O1 distal+ site to variants with the weakest distal operator, O3, resulted in a 2.93-fold change. However, when the distal site was already strong, adding a distal+ operator decreased expression fold change. Upon further investigation, we found that in cases where a strong distal site was already present, the addition of a strong distal+ site actually increased leakiness and induced expression of the system, suggesting that the distal+ site may be inhibiting distal site repression of the promoter (Supplementary Fig. 6a, b). Thus, we conclude that additional distal operator sites can improve the fold change of inducible systems by reducing the uninduced expression or have negative effects if they lead to competition with another strong distal site.
Finally, we explored whether our previously established statistical mechanics model could accurately predict the expression of variants in this library. We extended our model framework to account for the different promoter states available to the Pmultiple architecture (described in Supplementary Note) while retaining the same parameter values that fit the Pcombo library. Despite a lack of training on promoters of this architecture, the model was still able to predict the expression of Pmultiple variants with impressive accuracy (Supplementary Fig. 5b, R2 = 0.73, P < 2.2 × 10−16, two-sided Student’s t test). We expect the drop in accuracy is related to the observed interactions between the distal and distal+ sites, which will require further studies to parameterize. Nonetheless, we show that this adaptable model framework is robust even across previously unseen sequence architectures.
Changing repression mode alters activity independent of sequence element composition
Next, we explored how repositioning operator sites influences repression of the lacUV5 promoter. Previous work indicated operator sites placed within the spacer region, the segment of DNA between the −10 and −35 elements, enabled strong repression13. Notably, this positions the operator such that it directly competes with RNAP binding. Furthermore, this architecture is desirable for synthetic applications as it avoids placing operators downstream of the TSS, like the proximal site14. To explore this concept in depth, we synthesized Pspacer, a library of 4400 variants containing all combinations of five distal operator sites, four −35 elements, four −10 elements, and five spacer operator sites (Fig. 4b, top). Because this spacer region is 17 bp and the LacI operators we use are 21 bp, operator sequences were truncated by 2 bp at each terminus so as not to overlap the −10 and −35 motifs. To determine the optimal spacing between the distal and spacer operator sites, we also tested these combinations with interoperator distances between 46 and 56 bp. We recovered expression data for 3769 (85.7%) of these variants in the absence of IPTG and at 1 mM IPTG with an average of 7 barcodes per variant (Supplementary Fig. 2c). The distance between the spacer and distal operator sites did not appear to significantly affect the fold change of the promoters at the P < 0.05 threshold (ANOVA), which may be because some of the tested distances were insufficient to enable the formation of DNA loops17,37 (Supplementary Fig. 7a, b).
With all operator spacings tested appearing equivalent, we subset our analysis to variants with an interoperator distance of 55 bp, which is reportedly amenable to looping37. Similar to variants with the Pcombo architecture, we only observed strong induced expression with promoters containing −10 and −35 elements resembling the consensus (Supplementary Fig. 7c). To see how this change in architecture altered the performance of these promoters, we compared Pspacer variants to Pcombo promoters composed of the same cis-regulatory elements. Surprisingly, promoters with the Pspacer architecture had on average 2.16-fold higher uninduced and 1.93-fold higher induced expression (Fig. 4b, bottom). This may be because fewer repressed states are possible in this architecture, thereby pushing the system to be more active. Alternatively, this increased expression may be due to greater spacer %AT content within spacer LacI sites which may enhance promoter melting20,60 (Supplementary Table 3). Despite these higher expression values, Pspacer variants had comparable levels of fold change to corresponding variants of the Pcombo architecture (Fig. 4b, bottom).
Altering RNAP-binding contacts
Finally, we tested whether altering RNAP contacts could modify the behavior of inducible systems. Although all promoters tested thus far were designed to contact RNAP through the σ70 −35 and −10 elements, previous reports have suggested the possibility of engineering promoters lacking −35 elements61,62. In these cases, additional compensatory binding sites for transcription factors or RNAP are necessary to recruit RNAP and enable transcription. In addition to the −35 and −10 motifs, RNAP binding may be enhanced by an extended −10 TGn63,64 motif and an AT-rich UP element65,66 upstream of the −35 that stabilizes the RNAP α-subunit. However, it is not yet clear if these additional sequences are sufficient to compensate for the lack of a −35 element or how such an architecture would behave in an inducible context.
We synthesized and tested a library of 1600 lacUV5 variants, called Psteric, containing every combination of four −10 elements, five core operator sites centered at −26 instead of the −35 element, five proximal operator sites, and four UP elements in the presence or absence of an extended −10 motif (Fig. 4c, top). Furthermore, we positioned the proximal operator site centered at either the canonical +11 position or at the +30 position. At +30, the proximal operator is 56 nucleotides away from the core operator, which is near an optimal distance for repression loop formation37. We recovered expression data for 1369 of these variants (85.6%) in the absence of IPTG and at 1 mM IPTG with an average of 8 barcodes per variant (Supplementary Fig. 2d). We first examined library variants lacking functional LacI operator sites to identify combinations of −10 elements, extended −10 elements, and UP elements yielding functional promoters. Although weak or no transcription was detected from promoters with only a −10 element, we found the UP element and extended −10 synergistically increased expression, with up to 13-fold greater expression than promoters containing just a consensus −10 (Supplementary Fig. 8a).
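With all four libraries now described, their design sizes and recovery rates can be tallied as a consistency check. The per-position counts come from the text; the factor of 11 for Pspacer is an inference (interoperator distances of 46 to 56 bp inclusive), and the dictionary layout is purely illustrative.

```python
import math

# Options per design position for each library, plus the number of variants recovered.
libraries = {
    "Pcombo": {"factors": (10, 4, 4, 10), "recovered": 1493},
    "Pmultiple": {"factors": (5, 5, 4, 4, 5), "recovered": 1638},
    # Last factor: 11 interoperator distances (46-56 bp inclusive), inferred from the text.
    "Pspacer": {"factors": (5, 4, 4, 5, 11), "recovered": 3769},
    # Last two factors: extended -10 present or absent, proximal site at +11 or +30.
    "Psteric": {"factors": (4, 5, 5, 4, 2, 2), "recovered": 1369},
}

designed = {name: math.prod(lib["factors"]) for name, lib in libraries.items()}
recovery_percent = {
    name: round(100.0 * lib["recovered"] / designed[name], 1)
    for name, lib in libraries.items()
}
total_recovered = sum(lib["recovered"] for lib in libraries.values())

if __name__ == "__main__":
    for name in libraries:
        print(name, designed[name], recovery_percent[name])
    print("total recovered:", total_recovered)
```

The recovered counts sum to the 8269 promoters profiled across the four architectures, and each computed recovery rate matches the percentage reported for its library.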
Next, we compared two operator placements within this architecture to evaluate whether they enabled inducible behaviors. First, we found variants with the highest fold change were constructed with proximal operator sites located at the +30 position relative to the TSS, though the overall median fold change of promoters did not differ between the two proximal operator site positions (Supplementary Fig. 8b). Second, we found the inducibility of these promoters relies on the presence of a UP element, an extended −10, and a strong −10 motif. When all three are present, promoters containing a proximal operator site located at the +30 position exhibit up to an 11.8-fold response to IPTG (Fig. 4c, bottom). Despite the apparent viability of this architecture, we found that the highest expressing promoters generally contained Oscram or O1 core operator sites (Supplementary Fig. 8c). In these cases, we found operator sites tended to partially match the −35 motif, although they were not placed in the optimal position relative to the −10 motif (Supplementary Fig. 8d).
Comparison of optimized alternative lacUV5 promoter architectures
To gauge how our alternative inducible promoter architectures perform relative to one another, we compared the distributions of fold changes between each library. To focus on inducible variants, we limited our analysis to promoters with fold change ≥2. Of the thousands of promoters tested, relatively few were capable of induction, highlighting the difficulty in engineering these systems. Each architecture generated promoters with similarly wide ranges of uninduced expression, induced expression, and fold changes (Fig. 4d and Supplementary Table 8). However, overall comparisons revealed significant differences between these distinct architectures. In particular, Psteric members drove the lowest uninduced and induced expression, likely due to the noncanonical RNAP contacts with these promoters (P < 0.05, two-sided Mann–Whitney U test with Benjamini–Hochberg correction). Variants with the highest fold change were isolated from Pspacer and Pmultiple libraries, highlighting the potential benefits of exploring beyond canonical regulatory architectures. Although previously we found Pspacer variants exhibited greater uninduced and induced expression than Pcombo variants, we did not observe this phenomenon between these subsets of each library.
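The fold-change filter and the multiple-testing correction used in this comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the paper: the function names and the input layout (a dictionary of induced and uninduced expression per promoter) are hypothetical, and the Mann–Whitney U test itself is assumed to be run separately with a statistics package.

```python
def fold_change(induced, uninduced):
    """Fold change = induced expression / uninduced expression."""
    return induced / uninduced


def inducible(promoters, threshold=2.0):
    """Keep promoters whose fold change meets the threshold (>= 2 in the text).

    `promoters` maps a name to an (induced, uninduced) pair (hypothetical layout).
    """
    return {name: fold_change(ind, unind)
            for name, (ind, unind) in promoters.items()
            if fold_change(ind, unind) >= threshold}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pairwise comparison between libraries would then be called significant when its adjusted p-value falls below 0.05.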
Validation of functional inducible variants using a fluorescent reporter
Finally, we sought to identify inducible variants superior to the canonical lacUV5 promoter. From all four architectures, we individually evaluated promoter sequences exhibiting higher fold change with low leakiness by using flow cytometry to measure sfGFP expression in uninduced (0 mM IPTG) and fully induced (1 mM IPTG) conditions (Fig. 5). Compared to lacUV5, all variants exhibited improved fold change (min: 9.5×, max: 21.0×, lacUV5: 4.1×). In particular, a Pmultiple variant demonstrated >5-fold higher fold change than lacUV5. Many variants, especially Psteric promoters, exhibited low leakiness while maintaining comparable induced expression. Activity measurements using flow cytometry correlated well with MPRA measurements (induced: r = 0.701, uninduced: r = 0.981, fold change: r = 0.885) (Supplementary Fig. 9). Lastly, we found that all architectures demonstrated input–output relationships similar to that of lacUV5 in response to IPTG induction at variable concentrations (Supplementary Fig. 10).
Discussion
While current strategies for tuning inducible systems involve arbitrarily manipulating individual operator sites and core promoter elements, these approaches provide little insight into the combinatorial interactions modulating expression. Here, we implemented an MPRA to measure gene expression of nearly 9000 different promoter variants, learning the design logic for multiple sequence architectures. We found different repressors exhibit unique relationships between the operator placement and repression, highlighting the need to study these systems individually. We focused on the canonical PlacZYA inducible promoter, finding that induction largely depends on an interplay between the repressor and the core promoter elements. Notably, RNAP and repressors compete for binding, such that promoters containing near-consensus −35 and −10 σ70 elements are functionally irrepressible unless matched with correspondingly strong repressor sites. However, as has been previously shown56,57, the strongest LacI sites are repressive even in the presence of inducer, reducing fold change. Both a thermodynamic model and our empirical measurements agree that fold change is optimized by selecting repressor binding sites commensurate to the strength of the promoter.
Beyond studying combinatorial effects within the PlacZYA architecture, we investigated these interactions in alternative promoter contexts. Characterizing the dynamic range of expression of alternative inducible promoter architectures expands our ability to fine-tune metabolic pathways for generating chemical compounds, especially when products are toxic to the host system67. Furthermore, this approach could be applied towards identifying design rules to minimize leakiness and maximize fold change in other bacterial repressor systems that likely operate under similar thermodynamic principles. To our knowledge, a systematic analysis exploring a similar interplay between TF and core promoter strengths in eukaryotic systems has yet to be performed; however, many MPRAs have explored the regulatory role of TFs23–29,68 and core promoter69 binding site compositions separately. Thus, we predict the approach presented here can inform us about the interactions between TF and core promoter sites in other systems.
Ultimately, this systems analysis of inducible promoter regulation demonstrates the utility of combining rational design with large-scale multiplexed assays. Testing sequence libraries in multiplexed formats enabled the exploration of distinct functional designs as well as the discovery of promoter variants with desirable properties. In addition, this assay provides a reliable means for exploring the effects of specific genetic variants, which can reveal insights into promoter mechanisms and sequence–function relationships.
Methods
Promoter library design
A library of 624 variants was created to test the effects of altering the spacing between LacI, AraC, GalR, GlpR, LldR, and PurR operator sites. The core promoter, PlacL8-UV5, is the endogenous lacZYA promoter region with L8 and L29 mutations in the CAP site to render it catabolite insensitive (−55 C→T, −66 G→A) as well as UV5 mutations in the −10 region to increase activity (−9, −8 GT→AA)70–72. Pairs of 23-bp operator sites were acquired from endogenous loci reported by RegulonDB49 (ver 8.0) (Supplementary Table 1). For sites under 23 bp in length, the surrounding sequence of the native genomic context was included. In all cases, the downstream site found at the endogenous loci, with respect to the regulated promoter orientation, was used as a proximal site in our designs while the upstream sequence was used as the distal site. For each pair of operator sites, a series of variants was designed where the proximal operator was centered at +12 (spanning +1 to +23) and the distal operator varied from positions −83 to −116. Similar series of variants were also designed, in which the sequence of the proximal site or distal site was shuffled to obviate the activity of the operator.
A library (Pcombo) of 1600 lacUV5 variants composed of each combination of 10 proximal operator sites, 10 distal operator sites, four −10 elements, and four −35 elements was designed. The operator sites were selected to span a wide range of LacI-binding affinities (Supplementary Table 2). These consisted of two native LacI operators (O1 and O3) and a variant of the native O2 lac operator with three mutations (O2-var). In addition, Osym and five other synthetic operators (O1:R-sym, O2:L-sym, O2:R-sym, O3:L-sym, O3:R-sym) were used, with the latter being designed by creating palindromic sequences based on either the left or right halves of each native sequence. Lastly, a scrambled operator (Oscram) composed of a random scrambling of the O1 sequence served as a negative control. The −10 and −35 sites were selected to span a range of binding affinities for RNA polymerase and obtained from a previous characterization6,8,20 (Supplementary Tables 4–5). Each variant was composed of a combination of these elements placed onto a catabolite-insensitive (L8, L29 mutant) lacZYA promoter with the proximal site placed at +11 and the distal site placed at −90, which was found to enable strong looping in the assay of transcription factor spacing.
A library (Pmultiple) of 2000 lacUV5 variants composed of each combination of one of five distal+ operator sites, five distal operator sites, five proximal operator sites, four −10 elements, and four −35 elements was designed. The O1, O3, O2-var, Osym, and Oscram operators from the Pcombo library were selected as the five operator sites for testing. In addition, the same −10 and −35 elements from the Pcombo library were selected. This library was constructed with sequence elements placed in the same positions as the Pcombo library, with the exception of the distal+ sequence being placed immediately upstream of the distal site.
A library (Pspacer) of 4400 lacUV5 variants composed of each combination of five distal operator sites, four −35 elements, four −10 elements, and five spacer operator sites was designed. In order to fit the 17-bp spacer region, two base pairs were trimmed from each end of the spacer operator sites (Supplementary Table 2). The same operators, −10 elements, and −35 elements from the Pmultiple library were selected. Lastly, the distal operator site was tested at 11 different spacings relative to the core promoter, ranging from 20 to 30 bp from the 5′-most end of the −35 element. These 20–30 bp spacings resulted in an interoperator distance of 46–56 bp.
A library (Psteric) of 800 lacUV5 variants composed of each combination of four −10 elements, five core LacI sites centered at −26, five proximal operator sites, and one of four UP elements in the presence or absence of an extended −10 motif was designed. The same operator sites and −10 elements from the Pmultiple library were selected. Proximal operator sites were tested when centered at both the +11 and +30 positions relative to the TSS, giving 1600 variants in total. The UP elements selected were obtained from a previous characterization and range in their abilities to enhance transcription20,73 (Supplementary Table 6). In addition, the extended −10 element TGG was used as this is the most commonly found version of an extended −10 motif64.
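As a quick consistency check on the library designs described above, the full-factorial sizes can be recomputed from the element counts. This is only a sketch: the helper `design_size` and the placeholder labels are hypothetical, and the Pspacer count assumes that the 20–30 bp range is inclusive (11 spacings), which is what the stated total of 4400 implies.

```python
from itertools import product


def design_size(*element_counts):
    """Number of variants in a full-factorial design."""
    size = 1
    for n in element_counts:
        size *= n
    return size


# Element counts as described in the text.
pcombo = design_size(10, 10, 4, 4)        # proximal, distal, -10, -35
pmultiple = design_size(5, 5, 5, 4, 4)    # distal+, distal, proximal, -10, -35
# Assumption: 20-30 bp inclusive gives 11 spacings (matches the total of 4400).
pspacer = design_size(5, 4, 4, 5, len(range(20, 31)))
# 800 element combinations, each with the proximal operator at +11 or +30.
psteric = design_size(4, 5, 5, 4, 2) * 2

# Explicit enumeration of the Psteric design using placeholder indices.
psteric_variants = list(product(
    range(4),        # -10 elements
    range(5),        # core operator sites centered at -26
    range(5),        # proximal operator sites
    range(4),        # UP elements
    (True, False),   # extended -10 present or absent
    (11, 30),        # proximal operator position relative to the TSS
))
```

Enumerating the design with `itertools.product` also gives a convenient way to attach real sequences to each index when assembling the oligo pool.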
Library cloning
The library was synthesized by Agilent and then resuspended in 100 µL of elution buffer before cloning into plasmid pLibAcceptorV2 (Addgene ID no. 106250). The transcription factor spacing library was ordered separately from the other libraries, which were synthesized together and tested in a multiplexed pool. First, the library was amplified with KAPA SYBR FAST qPCR Master Mix (#KK4600) utilizing primers GU 132 and GU 133 at 10 µM to determine Cq values. Afterward, the library was amplified in triplicate with NEBNext® Q5® Hot Start HiFi PCR Master Mix (#M0543S) for 11 cycles, again using primers GU 132 and GU 133. Replicates were pooled, then cleaned with Zymo Clean and Concentrator Kit (#D40140).
To barcode the library, each library was amplified with NEBNext® Q5® Hot Start HiFi PCR Master Mix (#M0543S) for 10 cycles using primers GU 132 and GU 134. Library ends were then digested with SbfI-HF (NEB #R3642S) and XhoI (NEB #R0146S) by incubating at 37 °C for 1.5 h. The plasmid vector, pLibAcceptorV2, was first maxi-prepped with QIAGEN Plasmid Maxi Kit (#12162), concentrated with a Promega Wizard SV Gel and PCR Clean-up System (#A9281), and digested with SbfI-HF (NEB #R3642S), SalI-HF (NEB #R3138S), and rSAP (NEB #M0371S) for 1.5 h at 37 °C. Insert (library) and vector (pLibAcceptorV2) were ligated using T7 DNA Ligase (NEB #M0318S), incubating at room temperature for 1 h. The plasmid was then transformed into DH5α electrocompetent E. coli cells (New England Biolabs C2989K) and plated for 24 h at 30 °C on LB + kanamycin (25 µg/mL) agar plates. These plates were then harvested in 5 mL of LB and 400 × 10^6 cells (based on OD600) were grown overnight in 450 mL LB + kanamycin (25 µg/mL). This plasmid, consisting of the library cloned into pLibAcceptorV2, was isolated and concentrated with Zymo Clean and Concentrator Kit (#D40140).
To clone RiboJ::sfGFP into the plasmid, RiboJ::sfGFP was first amplified with NEBNext® Q5® Hot Start HiFi PCR Master Mix (#M0543S) for 25 cycles using primers GU 99 and GU 100 at 10 µM. This amplicon was then digested with BsaI-HF (NEB #R3535) and NcoI-HF (NEB #R3193S) for 1.5 h at 37 °C. pLib was digested with BsaI-HF (NEB #R3535) and NheI (NEB #R3131S). The pLib vector was then ligated with the GFP insert using T7 DNA Ligase (NEB #M0318S), incubating at room temperature for 1 h. This plasmid was next transformed into DH5α electrocompetent cells and plated for 24 h of growth at 30 °C, yielding the pLib_sfGFP plasmid after maxi-prep.
Library integration
The pLib_sfGFP plasmid was first digested with SalI-HF (NEB #R3138S) and NheI (NEB# R3131S) to remove the background. This was then transformed into the landing pad strain, an engineered20 E. coli MG1655 derivative (Yale Coli Genetic Stock Center no. 6300), and grown overnight for 24 h at 30 °C. The following day, plates were scraped and 800 million cells in 200 mL of LB + kan (25 µg/mL) were inoculated overnight at 30 °C.
For library integration, glycerol stocks of landing pad strain with the integration plasmid were grown overnight in 200 mL LB + kan (25 µg/mL) at 30 °C. The next day, 200 million cells from this overnight culture were inoculated into 250 mL LB + 0.2% arabinose + 25 µg/mL kan at 30 °C for 24 h to induce recombination. The following day, 800 million cells of the induced overnight culture were inoculated into 80 mL LB + 25 µg/mL kan at 42 °C for heat curing. This was grown to log phase (OD 0.3–0.7) for about 1.5 h. In total, 200 million cells from this log phase culture were plated at 42 °C for 16 h undiluted and at 10^−5 and 10^−6 dilutions. Plates grown overnight were then scraped, and 400 million cells were inoculated into 200 mL LB + kan (25 µg/mL) for overnight growth at 37 °C. Ultimately, this was plated again at 30 °C to validate integration (GFP instead of mCherry) and then glycerol stocked after colony PCR for further confirmation.
Barcode mapping
The promoter and barcode region from pLib was prepared for sequencing and downstream mapping of the barcodes to their respective variants. Two PCRs were performed to prepare pLib samples for sequencing, the first of which adds sites for the sequencing primer whereas the second PCR adds the adaptors for Illumina sequencing and a unique index DNA label. Each barcode mapping was performed in duplicate.
For the first PCR, the library was amplified with KAPA SYBR FAST qPCR Master Mix (#KK4600) with primers GU 60 and GU 79 at 5 µM to determine Cq values. Afterward, the library was amplified with NEBNext® Q5® Hot Start HiFi PCR Master Mix (#M0543S) at 11 cycles using primers GU 60 and GU 79 at 5 µM as well in triplicate. Replicates were pooled, then cleaned with Zymo Clean and Concentrator Kit (#D40140), eluting into 10 µL of Ultra-pure H2O.
For the second PCR, Illumina adapters P7, P5, and a unique DNA index were added. The product from the first PCR was amplified with primers GU 70 and GU 86 at 5 µM to determine Cq values. Afterward, the library was amplified with NEBNext® Q5® Hot Start HiFi PCR Master Mix (#M0543S) at ten cycles using primers GU 70 and GU 86 at 5 µM. Since different primers add different indices to each sample, we re-ran the second PCR with a different set of primers to serve as redundancy and allow us to compare sequencing replicates. This process was repeated in a separate PCR, with primers GU 70 and GU 87 also at 5 µM.
Ultimately, each technical replicate was performed in duplicate, cleaned with Zymo Clean and Concentrator Kit (#D40140), and run on a 1.0% agarose gel for final confirmation. After quality assessment, samples were sequenced on an Illumina NextSeq 500 using a paired-end 300-cycle kit (2 × 150 bp). Barcodes were mapped to their respective promoter variants using the pipeline from Urtecho et al.20. In brief, paired-end reads were merged using PEAR74 (version 0.9.1). We then extracted the first 150 bp of each read, which encodes the promoter variant, as well as the last 20 bp encoding the barcode, and generated a list of barcode-variant associations. Finally, we performed additional filtering steps for quality control purposes.
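The extraction step described above can be sketched as follows. This is a minimal illustration, not the published pipeline: it assumes merged reads are already available as plain strings, and the function names, the `min_reads` parameter, and the specific quality filter (dropping barcodes observed with more than one variant) are hypothetical stand-ins for the unspecified filtering steps.

```python
from collections import Counter, defaultdict

VARIANT_LEN = 150   # first 150 bp of the merged read encode the promoter variant
BARCODE_LEN = 20    # last 20 bp encode the barcode


def split_read(merged_read):
    """Return (variant, barcode) for one merged read, or None if it is too short."""
    if len(merged_read) < VARIANT_LEN + BARCODE_LEN:
        return None
    return merged_read[:VARIANT_LEN], merged_read[-BARCODE_LEN:]


def map_barcodes(merged_reads, min_reads=1):
    """Map each barcode to its variant, keeping only unambiguous barcodes.

    Hypothetical quality filter: barcodes seen with more than one variant, or
    supported by fewer than `min_reads` reads, are dropped.
    """
    seen = defaultdict(Counter)
    for read in merged_reads:
        parts = split_read(read)
        if parts is not None:
            variant, barcode = parts
            seen[barcode][variant] += 1
    mapping = {}
    for barcode, variants in seen.items():
        if len(variants) == 1:
            (variant, count), = variants.items()
            if count >= min_reads:
                mapping[barcode] = variant
    return mapping
```

The resulting dictionary plays the role of the barcode-variant association list used later to assign barcode counts to promoters.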
Library growth and sequencing preparation
Library pellets were prepared in both Induced and Uninduced conditions. First, glycerol stocks were inoculated in 100 mL of MOPS with 0.2% glucose + kanamycin (25 µg/mL) at 30 °C for 16 h overnight. The following day, the overnight culture was diluted to OD 0.0005, inoculated into 200 mL MOPS + kanamycin (25 µg/mL) with 0.2% glucose, and grown at 37 °C to OD 0.5–0.55 (~5 h) both with 1 mM IPTG and without.
To harvest RNA pellets, the culture was first cooled for two minutes in an ice slurry while periodically swirling. For each sample, three 50 mL aliquots of culture were poured into pre-chilled tubes and spun for two minutes at 13,000 × g at 4 °C. The supernatant was poured off. RNA was extracted from E. coli pellets using Qiagen RNeasy Midiprep kit (#75142). We performed technical replicates of this extraction (separate RNA extractions of the same culture) with the operator spacing library and biological replicates (different cultures grown in parallel and extracted separately). Subsequent wash steps concentrated isolated RNA with Qiagen MinElute Cleanup Kit (#74204). Next, isolated RNA was converted to cDNA with Thermo Fisher SuperScript IV (#18090010) following the manufacturer’s directions.
To harvest gDNA pellets, 5 mL samples of each culture were spun down for four minutes at 5000 × g. The supernatant was then poured off. DNA from each pellet was then isolated with Zymo Research ZR Plasmid Miniprep Kit (#D4015) for use as normalization.
The barcoded cDNA was amplified with NEBNext® Q5® Hot Start HiFi PCR Master Mix (#M0543S) from 1 µg of gDNA for 14 cycles with primers GU 59 and GU 60 at 5 µM. The product was cleaned with Zymo Clean and Concentrator Kit (#D40140). In all, 1 ng of this sample was amplified again for ten cycles with primers GU 65–68 and GU 70 for indexing, yielding 8 total samples; technical replicates for induced and uninduced cDNA, and induced and uninduced gDNA. Both prepared DNA and RNA library samples were quantified with Agilent Tapestation, then sent for sequencing on HiSeq2500 (SE 50-cycle) to the Broad Stem Cell Research Center at UCLA.
A comprehensive list of all primers used in this paper can be found in Supplementary Table 7.
Data processing
Following RNA-Seq and DNA-Seq of the barcodes, we quantify the relative abundance of each barcode. Demultiplexed RNA and DNA reads for each biological replicate were converted to counts of each barcode via a custom UNIX script that extracts barcode sequences from individual reads and counts the number of observed reads for each barcode. These barcode counts were normalized using the following formula:
Normalized read counts were then merged by common barcode to yield a comprehensive data frame containing normalized read counts for each barcode in each replicate. This data frame was then merged with the barcode mapping data to map normalized read counts to their corresponding promoter. Multiple barcodes could map to a single promoter, thereby providing replicability, and any promoter that contained fewer than three barcodes in any sample was removed. After this filtering step, promoter expression for each replicate was calculated using the following formula:
To normalize promoter expression between induced and uninduced samples, the expression of each promoter was normalized to the median negative control promoter expression in its respective biological replicate. Lastly, the mean expression of the biological replicates was calculated to obtain final expression values for the induced and uninduced conditions.
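The processing steps in this section can be sketched in code. The two formulas referred to above are not reproduced in this text, so the sketch substitutes clearly labeled assumptions: barcode counts are scaled to reads per million, and promoter expression is taken as summed normalized RNA over summed normalized DNA across a promoter's barcodes. These choices, along with all function names, are illustrative and may differ from the formulas actually used.

```python
from statistics import mean, median

MIN_BARCODES = 3  # promoters with fewer than three barcodes are removed


def normalize_counts(counts):
    """ASSUMED normalization: reads per million reads in the sample."""
    total = sum(counts.values())
    return {bc: n / total * 1e6 for bc, n in counts.items()}


def promoter_expression(rna, dna, barcode_to_promoter):
    """ASSUMED expression: summed normalized RNA over summed normalized DNA."""
    rna_sum, dna_sum, n_barcodes = {}, {}, {}
    for bc, promoter in barcode_to_promoter.items():
        if bc in dna and dna[bc] > 0:
            rna_sum[promoter] = rna_sum.get(promoter, 0.0) + rna.get(bc, 0.0)
            dna_sum[promoter] = dna_sum.get(promoter, 0.0) + dna[bc]
            n_barcodes[promoter] = n_barcodes.get(promoter, 0) + 1
    return {p: rna_sum[p] / dna_sum[p]
            for p in rna_sum if n_barcodes[p] >= MIN_BARCODES}


def normalize_to_controls(expression, negative_controls):
    """Divide by the median negative-control expression in the same replicate."""
    baseline = median(expression[p] for p in negative_controls if p in expression)
    return {p: value / baseline for p, value in expression.items()}


def average_replicates(replicates):
    """Mean across biological replicates for promoters present in all of them."""
    shared = set.intersection(*(set(r) for r in replicates))
    return {p: mean(r[p] for r in replicates) for p in shared}
```

Running these four functions in order for the induced and uninduced samples would yield one expression value per promoter and condition, from which fold change follows as the ratio of the two.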
Thermodynamic model of gene expression
For the Pcombo library, initial guesses for the binding energies of each LacI operator site were used as inputs and refined when fitting a statistical mechanics model to the Pcombo promoter expression data. The coefficient of determination (r2) between fit and actual gene expression values was calculated using log10-transformed values to reduce the effects of large expression outliers.
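The goodness-of-fit calculation can be illustrated with a short sketch. It assumes the coefficient of determination is defined as one minus the ratio of the residual to the total sum of squares; the text does not state which definition was used, and the function name is hypothetical.

```python
from math import log10


def r_squared_log10(measured, predicted):
    """Coefficient of determination computed on log10-transformed values.

    Assumed definition: 1 - SS_res / SS_tot, with both sums taken in log space.
    """
    obs = [log10(x) for x in measured]
    fit = [log10(x) for x in predicted]
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(obs, fit))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

Working in log space means a tenfold error on a weak promoter is weighted the same as a tenfold error on a strong one, which is the stated motivation for the transform.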
Individual promoter variant cloning
Two promoters were selected from each of the libraries, yielding eight total promoters in addition to two controls (a constitutive promoter and UV5). Individual promoter variants were selected from our library of variants based on the highest fold change (induced over uninduced expression) and fold change:noise ratio (fold change over uninduced expression). These sequences were ordered from IDT as gBlocks® Gene Fragments. Full RiboJ:sfGFP was PCR isolated from the original library. Since promoters were to be measured individually, we did not include a barcode in synthesis. The plasmid vector, pLibAcceptorV2, was linearized with SbfI-HF (NEB #R3642S) and SalI-HF (NEB #R3138S).
After synthesis by IDT, promoters were amplified using primers GU 142 and GU 89 and NEBNext® Q5® Hot Start HiFi PCR Master Mix (#M0543S). Each reporter was assembled with Gibson Assembly® Master Mix (NEB #E2611S) using 30 bp overlaps between the plasmid pLibAcceptorV2, the promoter, and RiboJ:sfGFP. Each assembled reporter was separately transformed into DH5α chemically competent E. coli (NEB #C2987H), yielding 10 total transformed E. coli strains containing their respective promoter, RiboJ:sfGFP, and kanamycin antibiotic resistance. Afterward, the promoter and downstream GFP segment were sequenced from isolated colonies using the same set of primers, GU 142 and GU 89, to confirm correct constructs. All products were cleaned with Zymo Clean and Concentrator Kit (#D40140) except for pLibAcceptorV2, which was cleaned with Promega Wizard SV Gel and PCR Clean-up System (#A9281) after DNA isolation with QIAGEN Plasmid Maxi Kit (#12162).
Individual promoter variant integration
E. coli strains containing library members were grown overnight for 16 h in 5 mL of Luria Broth and kanamycin (25 µg/mL). Afterward, the plasmid was isolated using Zymo ZR Plasmid Miniprep Kit (#D4054), transformed into electrocompetent MG1655 containing an engineered landing pad within the essQ-cspB intergenic locus20, and plated on LB and kanamycin (25 μg/mL) at 30 °C. Two colonies per promoter were resuspended in LB and inoculated into 5 mL of LB + kanamycin (25 μg/mL) for overnight growth.
Each promoter was separately integrated into the essQ-cspB locus using Cre-Lox-mediated cassette exchange. Following overnight growth, cells of this culture were inoculated into 5 mL of LB, kanamycin (25 μg/mL), and 0.2% arabinose (g/mL) and grown for 24 h to induce integration of the reporter cassette. After integration of the reporter cassette through the arabinose-induced Cre system, the residual plasmid was removed through heat-curing. In total, 200 million cells were inoculated into 3 mL of LB and kanamycin (25 μg/mL) and grown at 42 °C for about 1.5 h to reach log phase (OD 0.3–0.7). After this growth, cells were diluted to 10^−4 and plated on LB + kanamycin (25 µg/mL) plates overnight at 42 °C to complete the heat-curing process.
Flow cytometry
Glycerol stocks for each promoter were first scraped and inoculated into liquid cultures containing MOPS EZ-Rich Media (TEKNOVA #M2105) and 25 µg/mL of kanamycin at 30 °C for overnight growth. The following day, cells grown overnight were diluted to an OD of 0.002 in MOPS EZ-Rich Media (TEKNOVA #M2105) with 0.2% glucose (g/mL) and 25 µg/mL of kanamycin at 30 °C. These cells were then transferred to 100-mL flasks all containing 15 mL of MOPS EZ-Rich Media + 0.2% glucose. 1 mM IPTG + 25 µg/mL kanamycin were added to the “Induced” cultures, whereas 25 µg/mL kanamycin was added to the “Uninduced” cultures. These cultures were then grown at 37 °C for 3.5 h. In all, 5 mL of each sample was spun down, the supernatant was decanted, and the cell pellets were resuspended in 1 mL PBS (GIBCO® PBS Phosphate-Buffered Saline 10010023). In total, 1 mL of each sample was filtered into a Falcon 5 mL Polystyrene Round-Bottom Tube with Cell-Strainer Cap. E. coli MG1655 was used as a negative control for GFP expression while a constitutively active library member was used as a positive control. Data were collected using a BioRad S3 Cell Sorter with ProSort Version 1.6 and analyzed in FlowJo (version 10.0.8r1). E. coli cells were isolated by gating using FSC and SSC (Supplementary Fig. 11). Fold change was calculated by dividing the median GFP fluorescence of the induced samples by the median GFP fluorescence of the uninduced samples.
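The fold-change calculation from flow cytometry events can be sketched as below. This is an illustration only: the analysis described here was done in FlowJo, the rectangular gate and its bounds are hypothetical simplifications of the FSC/SSC gating, and the event layout (dictionaries with `fsc`, `ssc`, and `gfp` keys) is assumed for the example.

```python
from statistics import median


def gate(events, fsc_range, ssc_range):
    """Keep events whose FSC and SSC fall inside a rectangular gate.

    The rectangular shape and the bounds are hypothetical.
    """
    return [e for e in events
            if fsc_range[0] <= e["fsc"] <= fsc_range[1]
            and ssc_range[0] <= e["ssc"] <= ssc_range[1]]


def flow_fold_change(induced_events, uninduced_events):
    """Median GFP of induced cells divided by median GFP of uninduced cells."""
    induced = median(e["gfp"] for e in induced_events)
    uninduced = median(e["gfp"] for e in uninduced_events)
    return induced / uninduced
```

Using the median rather than the mean keeps the estimate robust to the small number of very bright or very dim events that typically survive gating.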
Plate reader assay
Glycerol stocks for each promoter were scraped and inoculated into liquid cultures containing MOPS EZ-Rich Media (TEKNOVA #M2105) and 25 µg/mL of kanamycin at 30 °C for overnight growth in 5 mL disposable culture tubes. The following day, each promoter was diluted to OD 0.005 in 500 µL of MOPS EZ-Rich Media (TEKNOVA #M2105) with 0.2% glucose (g/mL) and 25 µg/mL of kanamycin and set up for plate reader analysis in triplicate across an IPTG gradient: 0, 0.001, 0.005, 0.01, 0.1, and 1 mM. After samples were grown for five hours at 37 °C, 100 µL aliquots were transferred into 96-well flat-bottom microplates. Measurements were taken at 650 nm (to measure OD) and 520 nm (to measure GFP) on the Tecan Infinite M1000 Pro No. 30064852 plate reader. Data were analyzed in Excel (Version 16.41): the four reads per time point per well were averaged and divided by the OD measurement to calculate the OD-normalized GFP fluorescence.
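The plate reader normalization can be written out as a short sketch. The original analysis was performed in Excel; the function names and the well layout (dictionaries with `iptg_mM`, `gfp_reads`, and `od` keys) are hypothetical and chosen only for illustration.

```python
from statistics import mean


def od_normalized_gfp(gfp_reads, od):
    """Average the repeated GFP reads for a well, then divide by its OD."""
    return mean(gfp_reads) / od


def dose_response(wells):
    """Mean OD-normalized GFP per IPTG concentration across replicate wells."""
    by_dose = {}
    for well in wells:
        by_dose.setdefault(well["iptg_mM"], []).append(
            od_normalized_gfp(well["gfp_reads"], well["od"]))
    return {dose: mean(values) for dose, values in sorted(by_dose.items())}
```

Applied to the triplicate wells at each IPTG concentration, `dose_response` gives the input–output curve for one promoter.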
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
This work was supported by the National Science Foundation Graduate Research Fellowship 2015210106 to G.U., National Institutes of Health New Innovator Award DP2GM114829 to S.K., Searle Scholars Program to S.K., U.S. Department of Energy (DE-FC02-02ER63421 to S.K.), UCLA, and Linda and Fred Wudl. We thank the UCLA BSCRC high-throughput sequencing core and Technology Center for Genomics and Bioinformatics for technical assistance; all past and present members of the Kosuri lab for technical feedback; Suzannah Beeler for thoughtful discussions; and Reid C. Johnson for feedback on the paper. Lastly, we thank the UCLA Molecular Biology Interdepartmental Graduate Program and UCLA Bioinformatics Interdepartmental Graduate Program.
Author contributions
T.C.Y., G.U., W.L.L., J.E.D., J.S., G.B., T.E., and S.K. designed the study. T.C.Y. and K.D.I. generated the sequence libraries. T.C.Y., M.S.B., W.L.L., J.S., and G.B. performed the experiments. T.C.Y., G.U., J.E.D., and T.E. analyzed the data. W.L.L. designed the figures. T.E. and R.P. developed the statistical mechanics model. T.C.Y., G.U., W.L.L., J.E.D., J.S., G.B., and T.E. wrote the paper. All authors edited and approved the paper.
Data availability
Raw data and promoter expression datasets are available without restrictions through NCBI Gene Expression Omnibus (Accession no. GSE145630). All other relevant data are available from the authors upon reasonable request. Source data are provided with this paper.
Code availability
The Mathematica notebook used for the thermodynamic model, as well as all code for recreating plots, is available at https://github.com/timcyu/inducible_architecture (ref. 75). Statistical significance is reported to a lower limit of P < 2.2 × 10^−16, the lowest reportable value by R.
Competing interests
The authors declare no competing interests.
Footnotes
Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Timothy C. Yu, Winnie L. Liu.
These authors jointly supervised this work: Sriram Kosuri, Guillaume Urtecho.
Contributor Information
Sriram Kosuri, Email: sri@ucla.edu.
Guillaume Urtecho, Email: gurtecho@ucla.edu.
Supplementary information
Supplementary information is available for this paper at 10.1038/s41467-020-20094-3.
References
- 1.Khalil AS, Collins JJ. Synthetic biology: applications come of age. Nat. Rev. Genet. 2010;11:367–379. doi: 10.1038/nrg2775. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Rogers JK, et al. Synthetic biosensors for precise gene control and real-time monitoring of metabolites. Nucleic Acids Res. 2015;43:7648–7660. doi: 10.1093/nar/gkv616. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Alper H, Fischer C, Nevoigt E, Stephanopoulos G. Tuning genetic control through promoter engineering. Proc. Natl Acad. Sci. USA. 2005;102:12678–12683. doi: 10.1073/pnas.0504604102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Riglar DT, et al. Engineered bacteria can function in the mammalian gut long-term as live diagnostics of inflammation. Nat. Biotechnol. 2017;35:653–658. doi: 10.1038/nbt.3879. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Binder D, et al. Light-controlled cell factories: employing photocaged isopropyl-β-d-thiogalactopyranoside for light-mediated optimization of lac promoter-based gene expression and (+)-valencene biosynthesis in Corynebacterium glutamicum. Appl. Environ. Microbiol. 2016;82:6141–6149. doi: 10.1128/AEM.01457-16.
- 6.Brewster RC, Jones DL, Phillips R. Tuning promoter strength through RNA polymerase binding site design in Escherichia coli. PLoS Comput. Biol. 2012;8:e1002811. doi: 10.1371/journal.pcbi.1002811.
- 7.Boedicker JQ, Garcia HG, Johnson S, Phillips R. DNA sequence-dependent mechanics and protein-assisted bending in repressor-mediated loop formation. Phys. Biol. 2013;10:066005. doi: 10.1088/1478-3975/10/6/066005.
- 8.Kinney JB, Murugan A, Callan CG, Cox EC. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA. 2010;107:9158–9163. doi: 10.1073/pnas.1004290107.
- 9.Einav T, Phillips R. How the avidity of polymerase binding to the -35/-10 promoter sites affects gene expression. Proc. Natl Acad. Sci. USA. 2019;116:13340–13345. doi: 10.1073/pnas.1905615116.
- 10.Barnes SL, Belliveau NM, Ireland WT, Kinney JB, Phillips R. Mapping DNA sequence to transcription factor binding energy in vivo. PLoS Comput. Biol. 2019;15:e1006226. doi: 10.1371/journal.pcbi.1006226.
- 11.Cobb RE, Chao R, Zhao H. Directed evolution: past, present and future. AIChE J. 2013;59:1432–1440. doi: 10.1002/aic.13995.
- 12.Meyer AJ, Segall-Shapiro TH, Glassey E, Zhang J, Voigt CA. Escherichia coli ‘Marionette’ strains with 12 highly optimized small-molecule sensors. Nat. Chem. Biol. 2019;15:196–204. doi: 10.1038/s41589-018-0168-3.
- 13.Lutz R, Bujard H. Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements. Nucleic Acids Res. 1997;25:1203–1210. doi: 10.1093/nar/25.6.1203.
- 14.Camsund D, Heidorn T, Lindblad P. Design and analysis of LacI-repressed promoters and DNA-looping in a cyanobacterium. J. Biol. Eng. 2014;8:4. doi: 10.1186/1754-1611-8-4.
- 15.Wilson CJ, Zhan H, Swint-Kruse L, Matthews KS. The lactose repressor system: paradigms for regulation, allosteric behavior and protein folding. Cell. Mol. Life Sci. 2007;64:3–16. doi: 10.1007/s00018-006-6296-z.
- 16.Helmann JD, Chamberlin MJ. Structure and function of bacterial sigma factors. Annu. Rev. Biochem. 1988;57:839–872. doi: 10.1146/annurev.bi.57.070188.004203.
- 17.Müller J, Oehler S, Müller-Hill B. Repression of lac promoter as a function of distance, phase and quality of an auxiliary lac operator. J. Mol. Biol. 1996;257:21–29. doi: 10.1006/jmbi.1996.0143.
- 18.Oehler S, Amouyal M, Kolkhof P, von Wilcken-Bergmann B, Müller-Hill B. Quality and position of the three lac operators of E. coli define efficiency of repression. EMBO J. 1994;13:3348–3355. doi: 10.1002/j.1460-2075.1994.tb06637.x.
- 19.Becker NA, Peters JP, Maher LJ. Mechanism of promoter repression by Lac repressor-DNA loops. Nucleic Acids Res. 2013;41:3962–3962. doi: 10.1093/nar/gkt089.
- 20.Urtecho G, Tripp AD, Insigne K, Kim H, Kosuri S. Systematic dissection of sequence elements controlling σ70 promoters using a genomically-encoded multiplexed reporter assay in E. coli. Biochemistry. 2018. doi: 10.1021/acs.biochem.7b01069.
- 21.Tewhey R, et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell. 2016;165:1519–1529. doi: 10.1016/j.cell.2016.04.027.
- 22.de Boer CG, et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat. Biotechnol. 2020;38:56–65. doi: 10.1038/s41587-019-0315-8.
- 23.Sharon E, et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat. Biotechnol. 2012;30:521–530. doi: 10.1038/nbt.2205.
- 24.Mogno I, Kwasnieski JC, Cohen BA. Massively parallel synthetic promoter assays reveal the in vivo effects of binding site variants. Genome Res. 2013;23:1908–1915. doi: 10.1101/gr.157891.113.
- 25.van Dijk D, et al. Large-scale mapping of gene regulatory logic reveals context-dependent repression by transcriptional activators. Genome Res. 2017;27:87–94. doi: 10.1101/gr.212316.116.
- 26.Fiore C, Cohen BA. Interactions between pluripotency factors specify cis-regulation in embryonic stem cells. Genome Res. 2016;26:778–786. doi: 10.1101/gr.200733.115.
- 27.Grossman SR, et al. Systematic dissection of genomic features determining transcription factor binding and enhancer function. Proc. Natl Acad. Sci. USA. 2017;114:E1291–E1300.
- 28.Smith RP, et al. Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model. Nat. Genet. 2013;45:1021–1028. doi: 10.1038/ng.2713.
- 29.White MA, et al. A simple grammar defines activating and repressing cis-regulatory elements in photoreceptors. Cell Rep. 2016;17:1247–1254.
- 30.Oehler S, Eismann ER, Krämer H, Müller-Hill B. The three operators of the lac operon cooperate in repression. EMBO J. 1990;9:973–979. doi: 10.1002/j.1460-2075.1990.tb08199.x.
- 31.Reznikoff WS. The lactose operon-controlling elements: a complex paradigm. Mol. Microbiol. 1992;6:2419–2422. doi: 10.1111/j.1365-2958.1992.tb01416.x.
- 32.Beckwith J, Grodzicker T, Arditti R. Evidence for two sites in the lac promoter region. J. Mol. Biol. 1972;69:155–160. doi: 10.1016/0022-2836(72)90031-9.
- 33.Becker NA, Peters JP, Lionberger TA, Maher LJ. Mechanism of promoter repression by Lac repressor–DNA loops. Nucleic Acids Res. 2013;41:156–166. doi: 10.1093/nar/gks1011.
- 34.Cournac A, Plumbridge J. DNA looping in prokaryotes: experimental and theoretical approaches. J. Bacteriol. 2013;195:1109–1119. doi: 10.1128/JB.02038-12.
- 35.Flashner Y, Gralla JD. Dual mechanism of repression at a distance in the lac operon. Proc. Natl Acad. Sci. USA. 1988;85:8968–8972. doi: 10.1073/pnas.85.23.8968.
- 36.Johnson S, Lindén M, Phillips R. Sequence dependence of transcription factor-mediated DNA looping. Nucleic Acids Res. 2012;40:7728–7738. doi: 10.1093/nar/gks473.
- 37.Bond LM, Peters JP, Becker NA, Kahn JD, Maher LJ 3rd. Gene repression by minimal lac loops in vivo. Nucleic Acids Res. 2010;38:8072–8082. doi: 10.1093/nar/gkq755.
- 38.Brunwasser-Meirom M, et al. Using synthetic bacterial enhancers to reveal a looping-based mechanism for quenching-like repression. Nat. Commun. 2016;7:10407. doi: 10.1038/ncomms10407.
- 39.Krämer H, et al. lac repressor forms loops with linear DNA carrying two suitably spaced lac operators. EMBO J. 1987;6:1481–1491. doi: 10.1002/j.1460-2075.1987.tb02390.x.
- 40.Dunn TM, Hahn S, Ogden S, Schleif RF. An operator at -280 base pairs that is required for repression of araBAD operon promoter: addition of DNA helical turns between the operator and promoter cyclically hinders repression. Proc. Natl Acad. Sci. USA. 1984;81:5017–5020. doi: 10.1073/pnas.81.16.5017.
- 41.Lee DH, Schleif RF. In vivo DNA loops in araCBAD: size limits and helical repeat. Proc. Natl Acad. Sci. USA. 1989;86:476–480. doi: 10.1073/pnas.86.2.476.
- 42.Geanacopoulos M, Adhya S. Genetic analysis of GalR tetramerization in DNA looping during repressosome assembly. J. Biol. Chem. 2002;277:33148–33152. doi: 10.1074/jbc.M202445200.
- 43.Irani MH, Orosz L, Adhya S. A control element within a structural gene: the gal operon of Escherichia coli. Cell. 1983;32:783–788. doi: 10.1016/0092-8674(83)90064-8.
- 44.Choy HE, Adhya S. Control of gal transcription through DNA looping: inhibition of the initial transcribing complex. Proc. Natl Acad. Sci. USA. 1992;89:11264–11268. doi: 10.1073/pnas.89.23.11264.
- 45.Haber R, Adhya S. Interaction of spatially separated protein-DNA complexes for control of gene expression: operator conversions. Proc. Natl Acad. Sci. USA. 1988;85:9683–9687. doi: 10.1073/pnas.85.24.9683.
- 46.Mandal N, Su W, Haber R, Adhya S, Echols H. DNA looping in cellular repression of transcription of the galactose operon. Genes Dev. 1990;4:410–418. doi: 10.1101/gad.4.3.410.
- 47.Yang B, Larson TJ. Action at a distance for negative control of transcription of the glpD gene encoding sn-glycerol 3-phosphate dehydrogenase of Escherichia coli K-12. J. Bacteriol. 1996;178:7090–7098. doi: 10.1128/JB.178.24.7090-7098.1996.
- 48.Aguilera L, et al. Dual role of LldR in regulation of the lldPRD operon, involved in L-lactate metabolism in Escherichia coli. J. Bacteriol. 2008;190:2997–3005. doi: 10.1128/JB.02013-07.
- 49.Salgado H, et al. RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more. Nucleic Acids Res. 2013;41:D203–D213. doi: 10.1093/nar/gks1201.
- 50.Studier FW, Moffatt BA. Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J. Mol. Biol. 1986;189:113–130. doi: 10.1016/0022-2836(86)90385-2.
- 51.Enyeart PJ, et al. Generalized bacterial genome editing using mobile group II introns and Cre-lox. Mol. Syst. Biol. 2013;9:685. doi: 10.1038/msb.2013.41.
- 52.Kim S, et al. Probing allostery through DNA. Science. 2013;339:816–819. doi: 10.1126/science.1229223.
- 53.Einav T, Phillips R. How the avidity of polymerase binding to the -35/-10 promoter sites affects gene expression. Proc. Natl Acad. Sci. USA. 2019;116:13340–13345. doi: 10.1073/pnas.1905615116.
- 54.Spronk CA, et al. Hinge-helix formation and DNA bending in various lac repressor-operator complexes. EMBO J. 1999;18:6472–6480. doi: 10.1093/emboj/18.22.6472.
- 55.Bintu L, et al. Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev. 2005;15:116–124. doi: 10.1016/j.gde.2005.02.007.
- 56.Vilar JMG, Guet CC, Leibler S. Modeling network dynamics: the lac operon, a case study. J. Cell Biol. 2003;161:471–476. doi: 10.1083/jcb.200301125.
- 57.Razo-Mejia M, et al. Tuning transcriptional regulation through signaling: a predictive theory of allosteric induction. Cell Syst. 2018;6:456–469. doi: 10.1016/j.cels.2018.02.004.
- 58.Garza de Leon F, Sellars L, Stracy M, Busby SJW, Kapanidis AN. Tracking low-copy transcription factors in living bacteria: the case of the lac repressor. Biophys. J. 2017;112:1316–1327. doi: 10.1016/j.bpj.2017.02.028.
- 59.Browning DF, Godfrey RE, Richards KL, Robinson C, Busby SJW. Exploitation of the Escherichia coli lac operon promoter for controlled recombinant protein production. Biochem. Soc. Trans. 2019;47:755–763. doi: 10.1042/BST20190059.
- 60.Liu M, Tolstorukov M, Zhurkin V, Garges S, Adhya S. A mutant spacer sequence between -35 and -10 elements makes the Plac promoter hyperactive and cAMP receptor protein-independent. Proc. Natl Acad. Sci. USA. 2004;101:6911–6916. doi: 10.1073/pnas.0401929101.
- 61.Hook-Barnard IG, Hinton DM. Transcription initiation by mix and match elements: flexibility for polymerase binding to bacterial promoters. Gene Regul. Syst. Biol. 2007;1:275–293.
- 62.Orsini G, et al. Phage T4 early promoters are resistant to inhibition by the anti-sigma factor AsiA. Mol. Microbiol. 2004;52:1013–1028. doi: 10.1111/j.1365-2958.2004.04038.x.
- 63.Keilty S, Rosenberg M. Constitutive function of a positively regulated promoter reveals new sequences essential for activity. J. Biol. Chem. 1987;262:6389–6395. doi: 10.1016/S0021-9258(18)45582-2.
- 64.Djordjevic M. Redefining Escherichia coli σ70 promoter elements: −15 motif as a complement of the −10 motif. J. Bacteriol. 2011;193:6305–6314. doi: 10.1128/JB.05947-11.
- 65.Rhodius VA, Mutalik VK, Gross CA. Predicting the strength of UP-elements and full-length E. coli σE promoters. Nucleic Acids Res. 2012;40:2907–2924. doi: 10.1093/nar/gkr1190.
- 66.Chan B, Busby S. Recognition of nucleotide sequences at the Escherichia coli galactose operon P1 promoter by RNA polymerase. Gene. 1989;84:227–236. doi: 10.1016/0378-1119(89)90496-4.
- 67.Liu X, et al. De novo design of programmable inducible promoters. Nucleic Acids Res. 2019;47:10452–10463. doi: 10.1093/nar/gkz772.
- 68.Levo M, et al. Systematic investigation of transcription factor activity in the context of chromatin using massively parallel binding and expression assays. Mol. Cell. 2017;65:604–617.e6. doi: 10.1016/j.molcel.2017.01.007.
- 69.Weingarten-Gabbay S, et al. Systematic interrogation of human promoters. Genome Res. 2019;29:171–183. doi: 10.1101/gr.236075.118.
- 70.Hirschel BJ, Shen V, Schlessinger D. Lactose operon transcription from wild-type and L8-UV5 lac promoters in Escherichia coli treated with chloramphenicol. J. Bacteriol. 1980;143:1534–1537. doi: 10.1128/JB.143.3.1534-1537.1980.
- 71.Yudkin MD. Effect of point mutations in the lac promoter on transient and severe catabolite repression of the lac operon of Escherichia coli. Biochem. J. 1971;123:579–584. doi: 10.1042/bj1230579.
- 72.Miller JH. The Operon. Cold Spring Harbor Laboratory; 1980.
- 73.Estrem ST, Gaal T, Ross W, Gourse RL. Identification of an UP element consensus sequence for bacterial promoters. Proc. Natl Acad. Sci. USA. 1998;95:9761–9766. doi: 10.1073/pnas.95.17.9761.
- 74.Zhang J, Kobert K, Flouri T, Stamatakis A. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics. 2014;30:614–620. doi: 10.1093/bioinformatics/btt593.
- 75.Yu TC, et al. Multiplexed characterization of rationally designed promoter architectures deconstructs combinatorial logic for IPTG-inducible systems. inducible_architecture. 2020. doi: 10.5281/zenodo.4099975.
Data Availability Statement
Raw data and promoter expression datasets are available without restrictions through NCBI Gene Expression Omnibus (Accession no. GSE145630). All other relevant data are available from the authors upon reasonable request. Source data are provided with this paper.
The Mathematica notebook used for the thermodynamic model and all code for recreating plots are available at https://github.com/timcyu/inducible_architecture (ref. 75). Statistical significance is reported to a lower limit of P < 2.2 × 10⁻¹⁶, the smallest P value that R reports by default.
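The reporting floor of 2.2 × 10⁻¹⁶ corresponds to double-precision machine epsilon, which R exposes as .Machine$double.eps and uses as its default threshold when printing P values. The following is a minimal sketch of that convention, written in Python rather than R and not taken from the authors' repository; the helper name format_p_value is hypothetical.

```python
import sys

# Double-precision machine epsilon (about 2.22e-16). R exposes the same
# quantity as .Machine$double.eps and prints smaller P values as "< 2.2e-16".
EPS = sys.float_info.epsilon


def format_p_value(p: float, eps: float = EPS) -> str:
    """Hypothetical helper: format a P value, flooring the display at eps."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < eps:
        # Below the floor, report only the bound rather than the value itself.
        return f"P < {eps:.1e}"
    return f"P = {p:.1e}"


print(format_p_value(1e-40))   # falls below the floor
print(format_p_value(3.2e-5))  # reported as is
```

Under this convention, any test statistic extreme enough to give a P value below machine epsilon is reported only as a bound, so values at the floor should be read as "at most 2.2 × 10⁻¹⁶" rather than as exact estimates.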