eLife. 2018 Dec 20;7:e40618. doi: 10.7554/eLife.40618

Measuring cis-regulatory energetics in living cells using allelic manifolds

Talitha L Forcier 1, Andalus Ayaz 1, Manraj S Gill 1, Daniel Jones 1,2, Rob Phillips 2, Justin B Kinney 1
Editors: Richard A Neher 3, Naama Barkai 4
PMCID: PMC6301791  PMID: 30570483

Abstract

Gene expression in all organisms is controlled by cooperative interactions between DNA-bound transcription factors (TFs), but quantitatively measuring TF-DNA and TF-TF interactions remains difficult. Here we introduce a strategy for precisely measuring the Gibbs free energy of such interactions in living cells. This strategy centers on the measurement and modeling of ‘allelic manifolds’, a multidimensional generalization of the classical genetics concept of allelic series. Allelic manifolds are measured using reporter assays performed on strategically designed cis-regulatory sequences. Quantitative biophysical models are then fit to the resulting data. We used this strategy to study regulation by two Escherichia coli TFs, CRP and σ70 RNA polymerase. Doing so, we consistently obtained energetic measurements precise to 0.1 kcal/mol. We also obtained multiple results that deviate from the prior literature. Our strategy is compatible with massively parallel reporter assays in both prokaryotes and eukaryotes, and should therefore be highly scalable and broadly applicable.

Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that minor issues remain unresolved (see decision letter).

Research organism: E. coli

Introduction

Cells regulate the expression of their genes in response to biological and environmental cues. A major mechanism of gene regulation in all organisms is the binding of transcription factor (TF) proteins to cis-regulatory elements encoded within genomic DNA. DNA-bound TFs interact with one another, either directly or indirectly, forming cis-regulatory complexes that modulate the rate at which nearby genes are transcribed (Ptashne and Gann, 2002; Courey, 2008). Different arrangements of TF binding sites within cis-regulatory sequences can lead to different regulatory programs, but the rules that govern which arrangements lead to which regulatory programs remain largely unknown. Understanding these rules, which are often referred to as ‘cis-regulatory grammar’ (Spitz and Furlong, 2012), is a major challenge in modern biology.

Measuring the quantitative strength of interactions among DNA-bound TFs is critical for elucidating cis-regulatory grammar. In particular, knowing the Gibbs free energy of TF-DNA and TF-TF interactions is essential for building biophysical models that can quantitatively explain gene regulation in terms of simple protein-DNA and protein-protein interactions (Shea and Ackers, 1985; Bintu et al., 2005; Sherman and Cohen, 2012). Biophysical models have proven remarkably successful at quantitatively explaining regulation by a small number of well-studied cis-regulatory sequences. Arguably, the biggest successes have been achieved in the bacterium Escherichia coli, particularly in the context of the lac promoter (Vilar and Leibler, 2003; Kuhlman et al., 2007; Kinney et al., 2010; Garcia and Phillips, 2011; Brewster et al., 2014) and the OR/OL control region of the λ phage lysogen (Ackers et al., 1982; Shea and Ackers, 1985; Cui et al., 2013). But in both cases, this quantitative understanding has required decades of focused study. New approaches for dissecting cis-regulatory energetics, approaches that are both systematic and scalable, will be needed before a general quantitative understanding of cis-regulatory grammar can be developed.

Here we address this need by describing a systematic experimental/modeling strategy for dissecting the biophysical mechanisms of transcriptional regulation in living cells. Our strategy centers on the concept of an ‘allelic manifold’. Allelic manifolds generalize the classical genetics concept of allelic series to multiple dimensions. An allelic series is a set of sequence variants that affect the same phenotype (or phenotypes) but differ in their quantitative strength. Here we construct allelic manifolds by measuring, in multiple experimental contexts, the phenotypic strength of each variant in an allelic series. Each variant thus corresponds to a data point in a multi-dimensional ‘measurement space’. If the measurement space is of high enough dimension, and if one’s measurements are sufficiently precise, these data should collapse to a lower-dimension manifold that represents the inherent phenotypic dimensionality of the allelic series. These data can then be used to infer quantitative biophysical models that describe the shape of the allelic manifold, as well as the location of each allelic variant within that manifold. As we show here, such inference allows one to determine in vivo values for important biophysical quantities with remarkable precision.

We demonstrate this strategy on a regulatory paradigm in E. coli: activation of the σ70 RNA polymerase holoenzyme (RNAP) by the cAMP receptor protein (CRP, also called CAP). CRP activates transcription when bound to DNA at positions upstream of RNAP (Busby and Ebright, 1999), and the strength of these interactions is known to depend strongly on the precise nucleotide spacing between CRP and RNAP binding sites (Gaston et al., 1990; Ushida and Aiba, 1990). However, the Gibbs free energies of these interactions are still largely unknown. To our knowledge, only the CRP-RNAP interaction at the lac promoter has previously been quantitatively measured (Kuhlman et al., 2007; Kinney et al., 2010). By measuring and modeling allelic manifolds, we systematically determined the in vivo Gibbs free energy (ΔG) of CRP-RNAP interactions that occur at a variety of different binding site spacings. These ΔG values were consistently measured to an estimated precision of ~ 0.1 kcal/mol. We also obtained ΔG values for in vivo CRP-DNA and RNAP-DNA interactions, again with similar estimated precision.

The Results section that follows is organized into three Parts, each of which describes a different use for allelic manifolds. Part 1 focuses on measuring TF-DNA interactions, Part 2 focuses on TF-TF interactions, and Part 3 shows how to distinguish different possible mechanisms of transcriptional activation. Each Part consists of three subsections: Strategy, Demonstration, and Aside. Strategy covers the theoretical basis for the proposed use of allelic manifolds. Demonstration describes how we applied this strategy to better understand regulation by CRP and RNAP. Aside describes related findings that are interesting but somewhat tangential.

Results

Part 1. Strategy: Measuring TF-DNA interactions

We begin by showing how allelic manifolds can be used to measure the in vivo strength of TF binding to a specific DNA binding site. This measurement is accomplished by using the TF of interest as a transcriptional repressor. We place the TF binding site directly downstream of the RNAP binding site in a bacterial promoter so that the TF, when bound to DNA, sterically occludes the binding of RNAP. We then measure the rate of transcription from a few dozen variant RNAP binding sites. Transcription from each variant site is assayed in both the presence and in the absence of the TF.

Figure 1A illustrates a thermodynamic model (Shea and Ackers, 1985; Bintu et al., 2005; Sherman and Cohen, 2012) for this type of simple repression. In this model, promoter DNA can be in one of three states: unbound, bound by the TF, or bound by RNAP. Each of these three states is assumed to occur with a frequency that is consistent with thermal equilibrium, that is with a probability proportional to its Boltzmann weight.

Figure 1. Strategy for measuring TF-DNA interactions.

Figure 1.

(A) A thermodynamic model of simple repression. Here, promoter DNA can transition between three possible states: unbound, bound by a TF, or bound by RNAP. Each state has an associated Boltzmann weight and rate of transcript initiation. $F$ is the TF binding factor and $P$ is the RNAP binding factor; see text for a description of how these dimensionless binding factors relate to binding affinity and binding energy. $t_{\mathrm{sat}}$ is the rate of specific transcript initiation from a promoter fully occupied by RNAP. (B) Transcription is measured in the presence ($t_+$) and absence ($t_-$) of the TF. Measurements are made for an allelic series of RNAP binding sites that differ in their binding strengths (blue-yellow gradient). (C) If the model in panel A is correct, plotting $t_+$ vs. $t_-$ for the promoters in panel B (colored dots) will trace out a 1D allelic manifold. Mathematically, this manifold reflects Equation 1 and Equation 2 computed over all possible values of the RNAP binding factor $P$ while the other parameters ($F$, $t_{\mathrm{sat}}$) are held fixed. Note that these equations include a background transcription term $t_{\mathrm{bg}}$; it is assumed throughout that $t_{\mathrm{bg}} \ll t_{\mathrm{sat}}$ and that $t_{\mathrm{bg}}$ is independent of RNAP binding site sequence. The resulting manifold exhibits five distinct regimes (circled numbers), corresponding to different ranges for the value of $P$ that allow the mathematical expressions in Equations 1 and 2 to be approximated by simplified expressions. In regime 3, for instance, $t_+ \approx t_-/(1+F)$, and thus the manifold approximately follows a line parallel (on a log-log plot) to the diagonal but offset below it by a factor of $1+F$ (dashed line). Data points in this regime can therefore be used to determine the value of $F$. (D) The five regimes of the allelic manifold, including approximate expressions for $t_+$ and $t_-$ in each regime, as well as the range of validity for $P$.

The energetics of protein-DNA binding determine the Boltzmann weight for each state. By convention we set the weight of the unbound state equal to 1. The weight of the TF-bound state is then given by $F = [\mathrm{TF}]\,K_F$, where $[\mathrm{TF}]$ is the concentration of the TF and $K_F$ is the affinity constant in inverse molar units. Similarly, the weight of the RNAP-bound state is $P = [\mathrm{RNAP}]\,K_P$. In what follows we refer to $F$ and $P$ as the ‘binding factors’ of the TF-DNA and RNAP-DNA interactions, respectively. We note that these binding factors can also be written as $F = e^{-\Delta G_F / k_B T}$ and $P = e^{-\Delta G_P / k_B T}$, where $k_B$ is Boltzmann’s constant, $T$ is temperature, and $\Delta G_F$ and $\Delta G_P$ respectively denote the Gibbs free energy of binding for the TF and RNAP. Note that each Gibbs free energy accounts for the entropic cost of pulling each protein out of solution. In what follows, we report $\Delta G$ values in units of kcal/mol; note that 1 kcal/mol = 1.62 $k_B T$ at 37 °C.
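The conversion between a binding factor and a Gibbs free energy can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the molar gas constant is used in place of Boltzmann's constant so that energies come out per mole.

```python
import math

# Gas constant in kcal/(mol*K); using R instead of k_B gives energies per mole.
R_KCAL_PER_MOL_K = 1.987204e-3
T_KELVIN = 310.15  # 37 degrees C, the temperature quoted in the text

KBT = R_KCAL_PER_MOL_K * T_KELVIN  # thermal energy, roughly 0.616 kcal/mol


def binding_factor_to_dg(factor):
    """Return Delta G (kcal/mol) for a dimensionless binding factor."""
    return -KBT * math.log(factor)


def dg_to_binding_factor(dg_kcal_per_mol):
    """Return the dimensionless binding factor for a Delta G in kcal/mol."""
    return math.exp(-dg_kcal_per_mol / KBT)


if __name__ == "__main__":
    print(f"1 kcal/mol = {1.0 / KBT:.2f} kBT")
    print(f"F = 10 -> dG = {binding_factor_to_dg(10.0):.2f} kcal/mol")
```

A binding factor greater than 1 maps to a negative (favorable) free energy, and a factor below 1 maps to a positive one.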

The overall rate of transcription is computed by summing the amount of transcription produced by each state, weighting each state by the probability with which it occurs. In this case we assume the RNAP-bound state initiates at a rate of tsat, and that the other states produce no transcripts. We also add a term, tbg, to account for background transcription (e.g., from an unidentified promoter further upstream). The rate of transcription in the presence of the TF is thus given by 

$$t_+ = t_{\mathrm{sat}}\,\frac{P}{1+F+P} + t_{\mathrm{bg}}. \tag{1}$$

In the absence of the TF (F=0), the rate of transcription becomes 

$$t_- = t_{\mathrm{sat}}\,\frac{P}{1+P} + t_{\mathrm{bg}}. \tag{2}$$

Our goal is to measure the TF-DNA binding factor F. To do this, we create a set of promoter sequences where the RNAP binding site is varied (thus generating an allelic series) but the TF binding site is kept fixed. We then measure transcription from these promoters in both the presence and absence of the TF, respectively denoting the resulting quantities by t+ and t- (Figure 1B). Our rationale for doing this is that changing the RNAP binding site sequence should, according to our model, affect only the RNAP-DNA binding factor P. All of our measurements are therefore expected to lie along a one-dimensional allelic manifold residing within the two-dimensional space of (t-, t+) values. Moreover, this allelic manifold should follow the specific mathematical form implied by Equations 1 and 2 when P is varied and the other parameters (tsat, tbg, F) are held fixed; see Figure 1C.

The geometry of this allelic manifold is nontrivial. Assuming $F \gg 1$ and $t_{\mathrm{bg}} \ll t_{\mathrm{sat}}$, there are five different regimes corresponding to different values of the RNAP binding factor $P$. These regimes are listed in Figure 1D and derived in Appendix 4. In regime 1, $P$ is so small that both $t_+$ and $t_-$ are dominated by background transcription, that is, $t_+ \approx t_- \approx t_{\mathrm{bg}}$. $P$ is somewhat larger in regime 2, causing $t_-$ to be proportional to $P$ while $t_+$ remains dominated by background. In regime 3, both $t_+$ and $t_-$ are proportional to $P$ with $t_+/t_- \approx 1/(1+F)$. In regime 4, $t_-$ saturates at $t_{\mathrm{sat}}$ while $t_+$ remains proportional to $P$. Regime 5 occurs when both $t_+$ and $t_-$ are saturated, that is, $t_+ \approx t_- \approx t_{\mathrm{sat}}$.
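To make the five regimes concrete, the following sketch evaluates Equations 1 and 2 over a log-spaced range of P and prints the resulting (t-, t+) pairs. The parameter values are hypothetical and were chosen only so that every regime is visible; they are not fitted values from this study.

```python
def t_minus(P, t_sat, t_bg):
    """Equation 2: transcription rate in the absence of the TF."""
    return t_sat * P / (1.0 + P) + t_bg


def t_plus(P, F, t_sat, t_bg):
    """Equation 1: transcription rate when the TF can occlude RNAP."""
    return t_sat * P / (1.0 + F + P) + t_bg


def repression_manifold(F, t_sat, t_bg, log10_P_min=-8.0, log10_P_max=6.0, num=57):
    """Return (t_minus, t_plus) pairs for log-spaced values of P."""
    step = (log10_P_max - log10_P_min) / (num - 1)
    points = []
    for i in range(num):
        P = 10.0 ** (log10_P_min + i * step)
        points.append((t_minus(P, t_sat, t_bg), t_plus(P, F, t_sat, t_bg)))
    return points


if __name__ == "__main__":
    # Hypothetical parameters (arbitrary units), not values from the paper.
    F, t_sat, t_bg = 20.0, 1.0, 1e-6
    for tm, tp in repression_manifold(F, t_sat, t_bg, num=8):
        print(f"t- = {tm:.3e}   t+ = {tp:.3e}   t+/t- = {tp / tm:.3f}")
```

In the middle of the range the printed ratio settles near 1/(1+F), which is the regime 3 offset used to read off F.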

Part 1. Demonstration: Measuring CRP-DNA binding

The placement of CRP immediately downstream of RNAP is known to repress transcription (Morita et al., 1988). We therefore reasoned that placing a DNA binding site for CRP downstream of RNAP would allow us to measure the binding factor of that site. Figure 2 illustrates measurements of the allelic manifold used to characterize the strength of CRP binding to the 22 bp site GAATGTGACCTAGATCACATTT. This site contains the well-known consensus site, which comprises two palindromic pentamers (TGTGA and TCACA) separated by a 6 bp spacer (Gunasekera et al., 1992). We performed measurements using this CRP site centered at two different locations relative to the transcription start site (TSS): +0.5 bp and +4.5 bp. Note that the first transcribed base is, in this paper, assigned position 0 instead of the more conventional +1, and half-integer positions indicate centering between neighboring nucleotides. To avoid influencing CRP binding strength, the −10 region of the RNAP site was kept fixed in the promoters we assayed while the −35 region of the RNAP binding site was varied (Figure 2A). Promoter DNA sequences are shown in Appendix 1—figure 1.

Figure 2. Precision measurement of in vivo CRP-DNA binding.

Figure 2.

(A) Expression measurements were performed on promoters for which CRP represses transcription by occluding RNAP. Each promoter assayed contained a near-consensus CRP binding site centered at either +0.5 bp or +4.5 bp, as well as an RNAP binding site with a partially mutagenized −35 region (gradient). t+ (or t-) denotes measurements made using E. coli strain JK10 grown in the presence (or absence) of the small molecule cAMP. (B) Dots indicate measurements for 41 such promoters. A best-fit allelic manifold (black) was inferred from n=39 of these data points after the exclusion of 2 outliers (gray ‘X’s). Gray lines indicate 100 plausible allelic manifolds fit to bootstrap-resampled data points. The parameters of these manifolds were used to determine the CRP-DNA binding factor F and thus the Gibbs free energy ΔGF=-kBTlogF. Error bars indicate 68% confidence intervals determined by bootstrap resampling. See Appendix 3 for more information about our manifold fitting procedure.

We obtained t- and t+ measurements for these constructs using a modified version of the colorimetric β-galactosidase assay of Lederberg (1950) and Miller (1972); see Appendix 2 for details. Our measurements are largely consistent with an allelic manifold having the expected mathematical form (Figure 2B). Moreover, the measurements for promoters with CRP sites at two different positions (+0.5 bp and +4.5 bp) appear consistent with each other, although the measurements for +4.5 bp promoters appear to have lower values for P overall. A small number of data points do deviate substantially from this manifold, but the presence of such outliers is not surprising from a biological perspective (see Discussion). Fortunately, outliers appear at a rate small enough for us to identify them by inspection.

We quantitatively modeled the allelic manifold in Figure 2B by fitting n+3 parameters to our 2n measurements, where n=39 is the number of non-outlier promoters. The n+3 parameters were tsat, tbg, F, and P1, P2, …, Pn, where each Pi is the RNAP binding factor of promoter i. Nonlinear least squares optimization was used to infer values for these parameters. Uncertainties in tsat, tbg, and F were quantified by repeating this procedure on bootstrap-resampled data points. See Appendix 3 for details.
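The bootstrap step can be illustrated with a deliberately simplified stand-in. The sketch below does not perform the nonlinear least squares fit described above; instead it assumes that all promoters lie in regime 3 and that the background is known, so that F can be estimated from the ratio of background-subtracted rates. All data are synthetic, and the function names and noise model are assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_measurements(true_F, t_sat, t_bg, n, rng, noise_sd=0.1):
    """Synthetic (t_minus, t_plus) pairs from Equations 1 and 2 with log-normal noise."""
    data = []
    for _ in range(n):
        P = 10.0 ** rng.uniform(-3.0, -1.5)  # weak promoters only (regime 3)
        tm = (t_sat * P / (1.0 + P) + t_bg) * math.exp(rng.gauss(0.0, noise_sd))
        tp = (t_sat * P / (1.0 + true_F + P) + t_bg) * math.exp(rng.gauss(0.0, noise_sd))
        data.append((tm, tp))
    return data


def estimate_F(data, t_bg):
    """Regime 3 shortcut: F is roughly median((t- - t_bg) / (t+ - t_bg)) - 1."""
    ratios = [(tm - t_bg) / (tp - t_bg) for tm, tp in data if tm > t_bg and tp > t_bg]
    return statistics.median(ratios) - 1.0


def bootstrap_interval(data, t_bg, rng, n_boot=1000):
    """68% interval for F from resampling promoters with replacement."""
    estimates = sorted(
        estimate_F([rng.choice(data) for _ in data], t_bg) for _ in range(n_boot)
    )
    return estimates[int(0.16 * n_boot)], estimates[int(0.84 * n_boot)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    data = simulate_measurements(true_F=20.0, t_sat=1.0, t_bg=1e-6, n=40, rng=rng)
    low, high = bootstrap_interval(data, 1e-6, rng)
    print(f"F estimate = {estimate_F(data, 1e-6):.1f}, 68% interval = ({low:.1f}, {high:.1f})")
```

Resampling whole promoters, rather than individual measurements, keeps each (t-, t+) pair intact, which matters because both values share the same underlying P.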

These results yielded highly uncertain values for $t_{\mathrm{sat}}$ because none of our measurements appear to fall within regimes 4 or 5 of the allelic manifold. A reasonably precise value for $t_{\mathrm{bg}}$ was obtained, but substantial scatter about our model predictions in regimes 1 and 2 remains. This scatter likely reflects some variation in $t_{\mathrm{bg}}$ from promoter to promoter, variation that is to be expected since the source of background transcription is not known and the appearance of even very weak promoters could lead to such fluctuations.

These data do, however, determine a highly precise value for the strength of CRP-DNA binding: $F = 23.9^{+3.1}_{-2.5}$ or, equivalently, $\Delta G_F = -1.96 \pm 0.07$ kcal/mol. This allelic manifold approach is thus able to measure the strength of TF-DNA binding with a precision of ~0.1 kcal/mol. For comparison, the typical strength of a hydrogen bond in liquid water is −1.9 kcal/mol (Markovitch and Agmon, 2007).

We note that CRP forms approximately 38 hydrogen bonds with DNA when it binds to a consensus DNA site (Parkinson et al., 1996). Our result indicates that, in living cells, the enthalpy resulting from these and other interactions is almost exactly canceled by entropic factors. We also note that our in vivo value for $F$ is far smaller than expected from experiments in aqueous solution. The consensus CRP binding site has been measured in vitro to have an affinity constant of $K_F \approx 10^{11}\ \mathrm{M}^{-1}$ (Ebright et al., 1989). There are probably about $10^3$ CRP dimers per cell (Schmidt et al., 2016), giving a concentration $[\mathrm{CRP}] \approx 10^{-6}$ M. Putting these numbers together gives a binding factor of $F \approx 10^5$. The nonspecific binding of CRP to genomic DNA and other molecules in the cell, and perhaps limited DNA accessibility as well, might be responsible for this ~$10^5$-fold disagreement with our in vivo measurements.
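The order-of-magnitude estimate above can be reproduced with a short calculation. The cell volume of 1 femtoliter used here is an assumption that does not come from the text, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

# Assumption (not from the text): an E. coli cell volume of about 1 femtoliter.
CELL_VOLUME_LITERS = 1e-15


def molar_concentration(copies_per_cell, volume_liters=CELL_VOLUME_LITERS):
    """Convert a per-cell copy number into a molar concentration."""
    return copies_per_cell / (AVOGADRO * volume_liters)


def binding_factor(concentration_molar, affinity_per_molar):
    """Binding factor F = [TF] * K_F (dimensionless)."""
    return concentration_molar * affinity_per_molar


if __name__ == "__main__":
    crp = molar_concentration(1e3)  # about 1e3 CRP dimers per cell
    print(f"[CRP] ~ {crp:.1e} M")
    print(f"F ~ {binding_factor(crp, 1e11):.1e}")
```

With these inputs the concentration comes out on the order of a micromolar, consistent with the rounded figures quoted in the text.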

Part 1. Aside: Measuring changes in the concentration of active CRP

Varying cAMP concentrations in growth media changes the in vivo concentration of active CRP in the E. coli strain we assayed (JK10). Such variation is therefore expected to alter the CRP-DNA binding factor $F$. We tested whether this was indeed the case by measuring multiple allelic manifolds, each using a different concentration of cAMP when measuring $t_+$. These measurements were performed on promoters with CRP binding sites at +0.5 bp (Figure 3A). The resulting data are shown in Figure 3B. To these data, we fit allelic manifolds having variable values for $F$, but fixed values for both $t_{\mathrm{bg}}$ and $t_{\mathrm{sat}}$ ($t_{\mathrm{bg}} = 2.30 \times 10^{-3}$ a.u. was inferred in the prior analysis for Figure 2B; $t_{\mathrm{sat}} = 15.1$ a.u. was inferred in the subsequent analysis for Figure 5C).

Figure 3. Measuring in vivo changes in TF concentration.

Figure 3.

(A) Allelic manifolds were measured for the +0.5 bp occlusion promoter architecture using seven different concentrations of cAMP (ranging from 2.5 µM to 250 µM) when assaying t+. (B) As expected, these data follow allelic manifolds that have cAMP-dependent values for the CRP binding factor F. (C) Values for F inferred from the data in panel B exhibit a nontrivial power law dependence on [cAMP]. Error bars indicate 68% confidence intervals determined by bootstrap resampling.

This procedure allowed us to quantitatively measure changes in the CRP binding factor $F$, and thus changes in the in vivo concentration of active CRP. Our results, shown in Figure 3C, suggest a nontrivial power law relationship between $F$ and $[\mathrm{cAMP}]$. To quantify this relationship, we performed least squares regression ($\log F$ against $\log[\mathrm{cAMP}]$) using data for the four largest cAMP concentrations; measurements of $F$ for the three other cAMP concentrations have large asymmetric uncertainties and were therefore excluded. We found that $F \propto [\mathrm{cAMP}]^{1.41 \pm 0.18}$, with error bars representing a 95% confidence interval. We emphasize, however, that our data do not rule out a more complex relationship between $[\mathrm{cAMP}]$ and $F$.
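The log-log regression used to extract a power law exponent can be sketched as follows. The concentrations and binding factors in the example are synthetic, generated from an assumed power law, and are not the measurements shown in Figure 3C; the function name is hypothetical.

```python
import math


def fit_power_law(x_values, y_values):
    """Least-squares fit of log(y) = log(c) + k*log(x); returns (k, c)."""
    if len(x_values) != len(y_values) or len(x_values) < 2:
        raise ValueError("need at least two paired observations")
    lx = [math.log(x) for x in x_values]
    ly = [math.log(y) for y in y_values]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx                       # slope = power law exponent
    c = math.exp(mean_y - k * mean_x)   # intercept = prefactor
    return k, c


if __name__ == "__main__":
    # Synthetic data: hypothetical concentrations (micromolar) and an assumed
    # exponent of 1.4, chosen only to exercise the fit.
    camp_uM = [25.0, 50.0, 125.0, 250.0]
    F_values = [0.01 * c ** 1.4 for c in camp_uM]
    k, c = fit_power_law(camp_uM, F_values)
    print(f"exponent = {k:.2f}, prefactor = {c:.3g}")
```

An exponent of 1 would correspond to simple proportionality between F and the cAMP concentration; the regression makes any departure from that directly visible as the slope.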

There are multiple potential explanations for this deviation from proportionality. One possibility is cooperative binding of cAMP to the two binding sites within each CRP dimer. Such cooperativity could, for instance, result from allosteric effects like those described in Einav et al., 2018. Alternatively, this power law behavior might reflect unknown aspects of how cAMP is imported and exported from E. coli cells. It is worth comparing and contrasting this result to those reported in Kuhlman et al. (2007). JK10, the E. coli strain used in our experiments, is derived from strain TK310, which was developed in Kuhlman et al. (2007). In that work, the authors concluded that $F \propto [\mathrm{cAMP}]$, whereas our data lead us to reject this hypothesis. This illustrates one way in which using allelic manifolds to measure how in vivo TF concentrations vary with growth conditions can be useful.

Part 2. Strategy: Measuring TF-RNAP interactions

Next we discuss how to measure an activating interaction between a DNA-bound TF and DNA-bound RNAP. A common mechanism of transcriptional activation is ‘stabilization’ (also called ‘recruitment’; see Ptashne, 2003). This occurs when a DNA-bound TF stabilizes the RNAP-DNA closed complex. Stabilization effectively increases the RNAP-DNA binding affinity KP, and thus the binding factor P. It does not affect tsat, the rate of transcript initiation from RNAP-DNA closed complexes.

A thermodynamic model for activation by stabilization is illustrated in Figure 4A. Here promoter DNA can be in four states: unbound, TF-bound, RNAP-bound, or doubly bound. In the doubly bound state, a ‘cooperativity factor’ $\alpha$ contributes to the Boltzmann weight. This cooperativity factor is related to the TF-RNAP Gibbs free energy of interaction, $\Delta G_\alpha$, via $\alpha = e^{-\Delta G_\alpha / k_B T}$. Activation occurs when $\alpha > 1$ (i.e., $\Delta G_\alpha < 0$). The resulting activated transcription rate is given by Equation 3.

Figure 4. Strategy for measuring TF-RNAP interactions.

Figure 4.

(A) A thermodynamic model of simple activation. Here, promoter DNA can transition between four different states: unbound, bound by the TF, bound by RNAP, or doubly bound. As in Figure 1, $F$ is the TF binding factor, $P$ is the RNAP binding factor, and $t_{\mathrm{sat}}$ is the rate of transcript initiation from an RNAP-saturated promoter. The cooperativity factor $\alpha$ quantifies the strength of the interaction between DNA-bound TF and RNAP molecules; see text for more information on this quantity. (B) As in Figure 1, expression is measured in the presence ($t_+$) and absence ($t_-$) of the TF for promoters that have an allelic series of RNAP binding sites (blue-yellow gradient). (C) If the model in panel A is correct, plotting $t_+$ vs. $t_-$ (colored dots) will reveal a 1D allelic manifold that corresponds to Equation 4 (for $t_+$) and Equation 2 (for $t_-$) evaluated over all possible values of $P$. Circled numbers indicate the five regimes of this manifold. In regime 3, $t_+ \approx \alpha' t_-$, where $\alpha'$ is the renormalized cooperativity factor given in Equation 5; data in this regime can thus be used to measure $\alpha'$. Separate measurements of $F$, using the strategy in Figure 1, then allow one to compute $\alpha$ from knowledge of $\alpha'$. (D) The five regimes of the allelic manifold in panel C. Note that these regimes differ from those in Figure 1D.

$$t_+ = t_{\mathrm{sat}}\,\frac{P + \alpha F P}{1 + F + P + \alpha F P} + t_{\mathrm{bg}}. \tag{3}$$

This can be rewritten as

$$t_+ = t_{\mathrm{sat}}\,\frac{\alpha' P}{1 + \alpha' P} + t_{\mathrm{bg}}, \tag{4}$$

where

$$\alpha' = \frac{1 + \alpha F}{1 + F} \tag{5}$$

is a renormalized cooperativity that accounts for the strength of TF-DNA binding. As before, $t_-$ is given by Equation 2. Note that $\alpha' \leq \alpha$ and that $\alpha' \approx \alpha$ when $F \gg 1$ and $\alpha \gg 1/F$.
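The step from Equation 3 to Equation 4 can be checked by collecting the terms proportional to $P$ and then dividing numerator and denominator by $(1+F)$. The following is a sketch of that algebra using only symbols already defined in the text; the final identity for $\alpha - \alpha'$ is added here as a convenience.

$$
\begin{aligned}
\frac{P + \alpha F P}{1 + F + P + \alpha F P}
  &= \frac{(1+\alpha F)\,P}{(1+F) + (1+\alpha F)\,P} \\
  &= \frac{\dfrac{1+\alpha F}{1+F}\,P}{1 + \dfrac{1+\alpha F}{1+F}\,P}
   = \frac{\alpha' P}{1 + \alpha' P},
\end{aligned}
\qquad
\alpha - \alpha' = \frac{\alpha - 1}{1 + F}.
$$

For an activator ($\alpha \geq 1$) the last expression is non-negative, so $\alpha' \leq \alpha$, and the relative gap $(\alpha - \alpha')/\alpha = (1 - 1/\alpha)/(1+F)$ shrinks as $F$ grows.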

As before, we measure both $t_+$ and $t_-$ for an allelic series of RNAP binding sites (Figure 4B). These measurements will, according to our model, lie along an allelic manifold resembling the one shown in Figure 4C. This allelic manifold exhibits five distinct regimes (when $t_{\mathrm{sat}}/t_{\mathrm{bg}} \gg \alpha' \gg 1$), which are listed in Figure 4D.

Part 2. Demonstration: Measuring class I CRP-RNAP interactions

CRP activates transcription at the lac promoter and at other promoters by binding to a 22 bp site centered at −61.5 bp relative to the TSS. This is an example of class I activation, which is mediated by an interaction between CRP and the C-terminal domain of one of the two RNAP α subunits (the αCTDs) (Busby and Ebright, 1999). In vitro experiments have shown this class I CRP-RNAP interaction to activate transcription by stabilizing the RNAP-DNA closed complex.

We measured $t_+$ and $t_-$ for 47 variants of the lac* promoter (see Appendix 1—figure 1 for sequences). These promoters have the same CRP binding site assayed for Figure 2, but positioned at −61.5 bp relative to the TSS (Figure 5A). They differ from one another in the −10 or −35 regions of their RNAP binding sites. Figure 5B shows the resulting measurements. With the exception of 3 outlier points, these measurements appear consistent with stabilizing activation via a Gibbs free energy of $\Delta G_\alpha = -4.05 \pm 0.08$ kcal/mol, corresponding to a cooperativity of $\alpha = 712^{+102}_{-83}$. We note that, with $F = 23.9$ determined in Figure 2B, $\alpha' = \alpha$ to 4% accuracy.
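The 4% figure can be reproduced from Equation 5 using the central values quoted above. The sketch below also shows the inverse step, recovering the cooperativity from its renormalized value once F is known; the function names are hypothetical.

```python
def renormalized_cooperativity(alpha, F):
    """Equation 5: alpha' = (1 + alpha*F) / (1 + F)."""
    return (1.0 + alpha * F) / (1.0 + F)


def cooperativity_from_renormalized(alpha_prime, F):
    """Invert Equation 5 to recover alpha from alpha' and F."""
    return (alpha_prime * (1.0 + F) - 1.0) / F


if __name__ == "__main__":
    alpha, F = 712.0, 23.9  # central values quoted in the text
    alpha_prime = renormalized_cooperativity(alpha, F)
    print(f"alpha' = {alpha_prime:.1f}")
    print(f"relative difference = {(alpha - alpha_prime) / alpha:.1%}")
```

Because the correction scales roughly as 1/(1+F), a weaker CRP site (smaller F) would make the distinction between the two quantities more important.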

Figure 5. Precision measurement of class I CRP-RNAP interactions.

Figure 5.

(A) t+ and t- were measured for promoters containing a CRP binding site centered at −61.5 bp. The RNAP sites of these promoters were mutagenized in either their −10 or −35 regions (gradient), generating two allelic series. As in Figure 2, t+ and t- correspond to expression measurements respectively made in the presence and absence of cAMP. (B) Data obtained for 47 variant promoters having the architecture shown in panel A. Three data points designated as outliers are indicated by ‘X’s. The allelic manifold that best fits the n=44 non-outlier points is shown in black; 100 plausible manifolds, estimated from bootstrap-resampled data points, are shown in gray. The resulting values for α and ΔGα=-kBTlogα are also shown, with 68% confidence intervals indicated. (C) Allelic manifolds obtained for promoters with CRP binding sites centered at a variety of class I positions. (D) Inferred values for the cooperativity factor α and corresponding Gibbs free energy ΔGα for the 12 different promoter architectures assayed in panel C. Error bars indicate 68% confidence intervals. Numerical values for α and ΔGα at all of these class I positions are provided in Table 1.

This observed cooperativity is substantially stronger than suggested by previous work. Early in vivo experiments suggested a much lower cooperativity value, for example 50-fold (Beckwith et al., 1972), 20-fold (Ushida and Aiba, 1990), or even 10-fold (Gaston et al., 1990). These previous studies, however, only measured the ratio $t_+/t_-$ for a specific choice of RNAP binding site. This ratio is (by Equation 4) always less than $\alpha$ and the differences between these quantities can be substantial. However, even studies that have used explicit biophysical modeling have determined lower cooperativity values: Kuhlman et al. (2007) reported a cooperativity of $\alpha \approx 240$ ($\Delta G_\alpha \approx -3.4$ kcal/mol), while Kinney et al. (2010) reported $\alpha \approx 220$ ($\Delta G_\alpha \approx -3.3$ kcal/mol). Both of these studies, however, relied on the inference of complex biophysical models with many parameters. The allelic manifold in Figure 4, by contrast, is characterized by only three parameters ($t_{\mathrm{sat}}$, $t_{\mathrm{bg}}$, $\alpha'$), all of which can be approximately determined by visual inspection.
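A short calculation illustrates why a single fold-activation measurement can understate the cooperativity. The sketch below evaluates the ratio of Equation 4 to Equation 2 for several RNAP binding factors. The saturation and background rates are borrowed from values quoted elsewhere in the text purely to set the scale, while the renormalized cooperativity and the P values are illustrative choices, not measurements.

```python
def fold_activation(P, alpha_prime, t_sat, t_bg):
    """t+/t- from Equations 4 and 2 for a promoter with RNAP binding factor P."""
    t_plus = t_sat * alpha_prime * P / (1.0 + alpha_prime * P) + t_bg
    t_minus = t_sat * P / (1.0 + P) + t_bg
    return t_plus / t_minus


if __name__ == "__main__":
    # t_sat and t_bg (a.u.) are taken from the text for scale only; alpha_prime
    # and the P values below are hypothetical.
    t_sat, t_bg, alpha_prime = 15.1, 2.30e-3, 700.0
    for P in (1e-5, 1e-4, 1e-3, 1e-2, 1e-1):
        ratio = fold_activation(P, alpha_prime, t_sat, t_bg)
        print(f"P = {P:.0e}   t+/t- = {ratio:.1f}")
```

The ratio never reaches the renormalized cooperativity: background transcription limits it for weak RNAP sites, and saturation of the activated rate limits it for strong ones.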

To test the generality of this approach, we measured allelic manifolds for 11 other potential class I promoter architectures. At every one of these positions we clearly observed the collapse of data to a 1D allelic manifold of the expected shape (Figure 5C). We then modeled these data using values of $\alpha$ and $t_{\mathrm{bg}}$ that depend on CRP binding site location, as well as a single overall value for $t_{\mathrm{sat}}$. The resulting values for $\alpha$ (and equivalently $\Delta G_\alpha$) are shown in Figure 5D and reported in Table 1. As first shown by Gaston et al. (1990) and Ushida and Aiba (1990), $\alpha$ depends strongly on the spacing between the CRP and RNAP binding sites. In particular, $\alpha$ exhibits a strong ~10.5 bp periodicity reflecting the helical twist of DNA. However, as with the measurement in Figure 5B, the $\alpha$ values we measure are far larger than the $t_+/t_-$ ratios previously reported by Gaston et al. (1990) and Ushida and Aiba (1990); see Table 1. We also find $t_{\mathrm{sat}} = 15.1^{+0.6}_{-0.5}$ a.u. The single-cell observations of So et al. (2011) suggest that this corresponds to 13.8±6.6 transcripts per minute. By pure coincidence, the ‘arbitrary units’ (a.u.) we use in this paper correspond very closely to ‘transcripts per minute’.

Table 1. Summary of results for class I activation by CRP.

The $\alpha$ and $\Delta G_\alpha$ values listed here correspond to the values plotted in Figure 5D. The corresponding value inferred for the saturated transcription rate is $t_{\mathrm{sat}} = 15.1^{+0.6}_{-0.5}$ a.u. Error bars indicate 68% confidence intervals; see Appendix 3 for details. n is the number of data points used to infer these values, while ‘outliers’ is the number of data points excluded in this analysis. For comparison we show the fold-activation measurements (i.e., $t_+/t_-$) reported in Gaston et al. (1990) and Ushida and Aiba (1990); ‘-’ indicates that no measurement was reported for that position.

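| Position (bp) | n | Outliers | $\Delta G_\alpha$ (kcal/mol) | $\alpha$ | $t_+/t_-$ (Gaston et al., 1990) | $t_+/t_-$ (Ushida and Aiba, 1990) |
|---|---|---|---|---|---|---|
| −60.5 | 21 | 0 | $-2.09 \pm 0.08$ | $29.6^{+4.7}_{-3.5}$ | 3.85 | - |
| −61.5 | 44 | 3 | $-4.10 \pm 0.08$ | $763^{+113}_{-84}$ | 9.05 | 20.6 |
| −62.5 | 23 | 0 | $-2.43 \pm 0.11$ | $51.4^{+9.0}_{-8.5}$ | 4.22 | - |
| −63.5 | 20 | 1 | $-0.88 \pm 0.05$ | $4.15^{+0.30}_{-0.37}$ | - | - |
| −64.5 | 17 | 0 | $-1.08 \pm 0.08$ | $5.80^{+0.89}_{-0.67}$ | - | - |
| −65.5 | 17 | 0 | $-0.48 \pm 0.03$ | $2.16^{+0.10}_{-0.11}$ | - | - |
| −66.5 | 19 | 1 | $0.00 \pm 0.04$ | $0.99^{+0.07}_{-0.07}$ | 0.78 | 0.84 |
| −71.5 | 35 | 1 | $-2.88 \pm 0.04$ | $105^{+7}_{-7}$ | 2.50 | 16.4 |
| −72.5 | 20 | 0 | $-2.73 \pm 0.04$ | $83.0^{+5.2}_{-5.8}$ | 3.49 | - |
| −76.5 | 16 | 0 | $-0.15 \pm 0.04$ | $1.27^{+0.09}_{-0.06}$ | 0.54 | - |
| −81.5 | 32 | 0 | $-1.53 \pm 0.03$ | $11.9^{+0.4}_{-0.8}$ | - | - |
| −82.5 | 20 | 0 | $-1.82 \pm 0.05$ | $19.0^{+1.3}_{-1.8}$ | - | 6.99 |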
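As a consistency check, the two central-value columns of Table 1 can be compared through the relation between cooperativity and free energy given in the text. The sketch below assumes a temperature of 37 °C, as in the unit conversion quoted earlier, and uses only the central values from the table; the function names are hypothetical.

```python
import math

KBT = 1.987204e-3 * 310.15  # kcal/mol at 37 degrees C (assumed temperature)

# (position in bp, Delta G_alpha in kcal/mol, alpha): central values from Table 1.
TABLE_1 = [
    (-60.5, -2.09, 29.6), (-61.5, -4.10, 763.0), (-62.5, -2.43, 51.4),
    (-63.5, -0.88, 4.15), (-64.5, -1.08, 5.80), (-65.5, -0.48, 2.16),
    (-66.5, 0.00, 0.99), (-71.5, -2.88, 105.0), (-72.5, -2.73, 83.0),
    (-76.5, -0.15, 1.27), (-81.5, -1.53, 11.9), (-82.5, -1.82, 19.0),
]


def alpha_from_dg(dg_kcal_per_mol):
    """Cooperativity implied by an interaction free energy."""
    return math.exp(-dg_kcal_per_mol / KBT)


def relative_differences(rows):
    """Relative gap between tabulated alpha and exp(-dG/kBT) for each row."""
    return [(pos, abs(alpha_from_dg(dg) - alpha) / alpha) for pos, dg, alpha in rows]


if __name__ == "__main__":
    for pos, diff in relative_differences(TABLE_1):
        print(f"{pos:6.1f} bp   relative difference = {diff:.1%}")
```

Small residual differences are expected because both columns are rounded and may summarize the bootstrap distribution separately.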

Part 2. Aside: Difficulties predicting binding affinity from DNA sequence

The measurement and modeling of allelic manifolds sidesteps the need to parametrically model how protein-DNA binding affinity depends on DNA sequence. In modeling the allelic manifolds in Figure 5C, we obtained values for the RNAP binding factor, P=[RNAP]KP, for each variant RNAP binding site from the position of the corresponding data point along the length of the manifold.

RNAP has a very well established sequence motif (McClure et al., 1983). Indeed, its DNA binding requirements were among the first characterized for any DNA-binding protein (Pribnow, 1975). More recently, a high-resolution model for RNAP-DNA binding energy was determined using data from a massively parallel reporter assay called Sort-Seq (Kinney et al., 2010). This position-specific affinity matrix (PSAM) assumes that the nucleotide at each position contributes additively to the overall binding energy (Figure 6A). This model is consistent with previously described RNAP binding motifs but, unlike those motifs, it can predict binding energy in physically meaningful energy units (i.e., kcal/mol). In what follows we denote these binding energies as ΔΔGP, because they describe differences in the Gibbs free energy of binding between two DNA sites.
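The additivity assumption behind a PSAM can be illustrated with a toy example. The 4 bp matrix below is entirely made up for illustration and is not the model of Kinney et al. (2010); the function names are likewise hypothetical.

```python
# Hypothetical toy matrix: energy contributions (kcal/mol) for a 4 bp site.
# These numbers are invented for illustration only.
TOY_PSAM = [
    {"A": 0.0, "C": 0.6, "G": 0.9, "T": 0.4},
    {"A": 0.5, "C": 0.8, "G": 0.7, "T": 0.0},
    {"A": 0.3, "C": 0.9, "G": 0.0, "T": 0.6},
    {"A": 0.0, "C": 0.7, "G": 0.5, "T": 0.2},
]


def additive_energy(sequence, matrix):
    """Sum the per-position contributions for a sequence of matching length."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the matrix length")
    return sum(matrix[i][base] for i, base in enumerate(sequence))


def ddg_relative_to(sequence, reference, matrix):
    """Predicted binding energy difference between a variant and a reference site."""
    return additive_energy(sequence, matrix) - additive_energy(reference, matrix)


if __name__ == "__main__":
    reference = "ATGA"
    for variant in ("ATGA", "CTGA", "CTGC"):
        ddg = ddg_relative_to(variant, reference, TOY_PSAM)
        print(f"{variant}: ddG = {ddg:+.1f} kcal/mol")
```

Under this assumption the effect of a double mutant is simply the sum of the two single-mutant effects, which is the kind of prediction the additive model commits to.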

Figure 6. RNAP-DNA binding energy cannot be accurately predicted from sequence.

Figure 6.

(A) The PSAM for RNAP-DNA binding inferred by Kinney et al. (2010). This model assumes that the DNA base pair at each position in the RNAP binding site contributes independently to $\Delta G_P$. Shown are the $\Delta\Delta G_P$ values assigned by this model to mutations away from the lac* RNAP site. The sequence of the lac* RNAP site is indicated by gray vertical bars; see also Appendix 1—figure 1. A sequence logo representation for this PSAM is provided for reference. (B) PSAM predictions plotted against the values $\Delta G_P = -k_B T \log P$ inferred by fitting the allelic manifolds in Figure 5C. Error bars on these measurements represent 68% confidence intervals. Note that measured $\Delta G_P$ values are absolute, whereas the $\Delta\Delta G_P$ predictions of the PSAM are relative to the lac* RNAP site, which thus corresponds to $\Delta\Delta G_P = 0$ kcal/mol. The dashed line, provided for reference, has slope 1 and passes through this lac* data point.

There is good reason to believe this PSAM to be the most accurate current model of RNAP-DNA binding. However, subsequent work has suggested that the predictions of this model might still have substantial inaccuracies (Brewster et al., 2012). To investigate this possibility, we compared our measured values for the Gibbs free energy of RNAP-DNA binding (ΔGP=-kBTlogP) to binding energies (ΔΔGP) predicted using the PSAM from Kinney et al. (2010). These values are plotted against one another in Figure 6B. Although there is a strong correlation between the predictions of the model and our measurements, deviations of 1 kcal/mol or larger (corresponding to variations in P of 5-fold or greater) are not uncommon. Model predictions also systematically deviate from the diagonal, suggesting inaccuracy in the overall scale of the PSAM.
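The correspondence between energy differences and fold changes in P can be checked with a short calculation. The sketch below is only illustrative: the function names are hypothetical, and it relies on the relation ΔGP=-kBTlogP together with the conversion 1 kcal/mol = 1.62 kBT at 37 °C stated in Appendix 3.

```python
import math

# Conversion stated in Appendix 3: 1 kcal/mol = 1.62 kBT at 37 °C.
KBT_PER_KCAL = 1.62


def delta_g_kcal(P):
    """Binding free energy in kcal/mol from the dimensionless binding factor P.

    Implements Delta G_P = -kBT log P, then converts kBT units to kcal/mol.
    """
    return -math.log(P) / KBT_PER_KCAL


def fold_change_in_P(ddg_kcal):
    """Fold change in P implied by an energy difference of |ddg_kcal| kcal/mol."""
    return math.exp(abs(ddg_kcal) * KBT_PER_KCAL)


# A 1 kcal/mol deviation corresponds to roughly a 5-fold change in P.
print(round(fold_change_in_P(1.0), 1))
```

Because the relation is logarithmic, a fixed error in kcal/mol translates into a fixed multiplicative error in P, whatever the strength of the site.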

This finding is sobering: even for one of the best understood DNA-binding proteins in biology, our best sequence-based predictions of in vivo protein-DNA binding affinity are still quite crude. When used in conjunction with thermodynamic models, as in Kinney et al. (2010), the inaccuracies of these models can have major effects on predicted transcription rates. The measurement and modeling of allelic manifolds sidesteps the need to parametrically model such binding energies, enabling the direct inference of Gibbs free energy values for each assayed RNAP binding site.

Part 3. Strategy: Distinguishing mechanisms of transcriptional activation

E. coli TFs can regulate multiple different steps in the transcript initiation pathway (Lee et al., 2012; Browning and Busby, 2016). For example, instead of stabilizing RNAP binding to DNA, TFs can activate transcription by increasing the rate at which DNA-bound RNAP initiates transcription (Roy et al., 1998), a process we refer to as ‘acceleration’. CRP, in particular, has previously been reported to activate transcription in part by acceleration when positioned appropriately with respect to RNAP (Niu et al., 1996; Rhodius et al., 1997).

We investigated whether allelic manifolds might be used to distinguish activation by acceleration from activation by stabilization. First we generalized the thermodynamic model in Figure 4A to accommodate both α-fold stabilization and β-fold acceleration (Figure 7A). This is accomplished by using the same set of states and Boltzmann weights as in the model for stabilization, but assigning a transcription rate βtsat (rather than just tsat) to the TF-RNAP-DNA ternary complex. The resulting activated rate of transcription is given by

Figure 7. A strategy for distinguishing two different mechanisms of transcriptional activation.

(A) A TF can activate transcription in two ways: by stabilizing the RNAP-DNA complex or by accelerating the rate at which this complex initiates transcripts. (B) A thermodynamic model for the dual mechanism of transcriptional activation illustrated in panel A. Note that α multiplies the Boltzmann weight of the doubly bound complex, whereas β multiplies the transcript initiation rate of this complex. (C) Data points measured as in Figure 4C will lie along a 1D allelic manifold having the form shown here. This manifold is computed using t+ values from Equation 7 and t- values from Equation 2. Note that regime 5 occurs at a point positioned β′-fold above the diagonal, where β′ is related to β through Equation 8. Measurements in or near the strong promoter regime (P ≫ 1) can thus be used to determine the value of β′ and, consequently, the value of β. (D) The five regimes of this allelic manifold are listed.

$$t_+ = t_{sat}\,\frac{P}{1+F+P+\alpha F P} + \beta\, t_{sat}\,\frac{\alpha F P}{1+F+P+\alpha F P} + t_{bg}. \tag{6}$$

This simplifies to

$$t_+ = \beta'\, t_{sat}\,\frac{\alpha' P}{1+\alpha' P} + t_{bg}, \tag{7}$$

where α′ is the same as in Equation 5 and

$$\beta' = \frac{1+\alpha\beta F}{1+\alpha F} \tag{8}$$

is a renormalized version of the acceleration rate β. The resulting allelic manifold is illustrated in Figure 7C. Like the allelic manifold for stabilization, this manifold has up to five distinct regimes corresponding to different values of P (Figure 7D). Unlike the stabilization manifold, however, t+ ≠ t- in the strong RNAP binding regime (regime 5); rather, t+ ≈ β′tsat while t- ≈ tsat.
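The reduction from Equation 6 to Equation 7 can be verified numerically. The following sketch is not the authors' analysis code; function names and parameter values are hypothetical. It assumes that the renormalized stabilization factor of Equation 5 has the form (1+αF)/(1+F), which is the form required for Equation 6 to reduce to Equation 7, and that t- follows the unregulated expression tsat·P/(1+P)+tbg.

```python
def t_plus_full(P, F, alpha, beta, t_sat, t_bg):
    """Equation 6: sum over the RNAP-bound states, with the ternary
    TF-RNAP-DNA complex transcribing at rate beta * t_sat."""
    Z = 1.0 + F + P + alpha * F * P  # partition function
    return t_sat * P / Z + beta * t_sat * alpha * F * P / Z + t_bg


def t_plus_reduced(P, F, alpha, beta, t_sat, t_bg):
    """Equation 7, written with renormalized factors."""
    alpha_eff = (1.0 + alpha * F) / (1.0 + F)                # assumed form of Equation 5
    beta_eff = (1.0 + alpha * beta * F) / (1.0 + alpha * F)  # Equation 8
    return beta_eff * t_sat * alpha_eff * P / (1.0 + alpha_eff * P) + t_bg


def t_minus(P, t_sat, t_bg):
    """Unregulated transcription rate (no active TF)."""
    return t_sat * P / (1.0 + P) + t_bg


# Hypothetical parameter values, chosen only for illustration.
F, alpha, beta, t_sat, t_bg = 20.0, 10.0, 2.0, 15.1, 2.3e-3
for P in (1e-3, 1.0, 1e3):
    print(P, t_plus_full(P, F, alpha, beta, t_sat, t_bg),
          t_plus_reduced(P, F, alpha, beta, t_sat, t_bg))
```

For very large P the ratio of the background-subtracted t+ and t- values approaches the renormalized acceleration factor, which is why strong promoters are the informative ones for detecting acceleration.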

Part 3. Demonstration: Mechanisms of class I activation by CRP

We asked whether class I activation by CRP has an acceleration component. Previous in vitro work had suggested that the answer is ‘no’ (Malan et al., 1984; Busby and Ebright, 1999), but our allelic manifold approach allows us to address this question in vivo. We proceeded by assaying promoters containing variant alleles of the consensus RNAP binding site (Figure 8A). Note that the consensus RNAP site is 1 bp shorter than the lac* RNAP site (Appendix 1—figure 1, panel C versus panel B). We therefore positioned the CRP binding site at −60.5 bp in order to realize the same spacing between CRP and the −35 element of the RNAP binding site that was realized in −61.5 bp non-consensus promoters.

Figure 8. Class I activation by CRP occurs exclusively through stabilization.

(A) t+ and t- were measured for promoters containing variants of the consensus RNAP binding site as well as a CRP binding site centered at −60.5 bp. Because the consensus RNAP site is 1 bp shorter than the RNAP site of the lac* promoter, CRP at −60.5 bp here corresponds to CRP at −61.5 bp in Figure 5. (B) n=18 data points obtained for the constructs in panel A, overlaid on the measurements from Figure 5B (gray). The value tsat=15.1 a.u., inferred for Figure 5C, is indicated by dashed lines. (C) Values for β inferred using the data in Figure 5 for the 10 CRP positions that exhibited greater than 2-fold inducibility; β values at the two other CRP positions (−66.5 bp and −76.5 bp) were highly uncertain and are not shown. Error bars indicate 68% confidence intervals.

The resulting data (Figure 8B) are seen to largely fall along the previously measured all-stabilization allelic manifold in Figure 5B. In particular, many of these data points lie at the intersection of this manifold with the t+=t- diagonal. We thus find that β ≈ 1 for CRP at −61.5 bp. To further quantify possible β values, we fit the acceleration model in Figure 7 to each dataset shown in Figure 5B, assuming a fixed value of tsat=15.1 a.u. The resulting inferred values for β, shown in Figure 8C, indicate little if any deviation from β=1. Our high-precision in vivo results therefore substantiate the previous in vitro results of Malan et al. (1984) regarding the mechanism of class I activation.
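To see why data on the diagonal imply the absence of acceleration, one can invert Equation 8. The sketch below is a shortcut for illustration only, not the least-squares fit described above: it estimates the renormalized acceleration factor from a single strong-promoter data point and then solves Equation 8 for β. The function names and all numerical inputs are hypothetical.

```python
def renormalized_beta_from_point(t_plus, t_minus, t_bg):
    """Estimate the renormalized acceleration factor from one data point
    in the strong RNAP binding regime, where t+ ~ beta' * t_sat and
    t- ~ t_sat after background subtraction."""
    return (t_plus - t_bg) / (t_minus - t_bg)


def beta_from_renormalized(beta_eff, alpha, F):
    """Invert Equation 8, beta' = (1 + alpha*beta*F) / (1 + alpha*F), for beta."""
    return (beta_eff * (1.0 + alpha * F) - 1.0) / (alpha * F)


# Hypothetical strong-promoter measurement lying on the t+ = t- diagonal.
beta_eff = renormalized_beta_from_point(t_plus=15.0, t_minus=15.0, t_bg=2.3e-3)
print(beta_from_renormalized(beta_eff, alpha=10.0, F=20.0))
```

A point on the diagonal gives a renormalized factor of 1, and Equation 8 then returns β = 1 for any values of α and F.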

Part 3. Aside: Surprises in class II regulation by CRP

Many E. coli TFs participate in what is referred to as class II activation (Browning and Busby, 2016). This type of activation occurs when the TF binds to a site that overlaps the −35 element (often completely replacing it) and interacts directly with the main body of RNAP. CRP is known to participate in class II activation at many promoters (Keseler et al., 2011; Salgado et al., 2013), including the galP1 promoter, where it binds to a site centered at position −41.5 bp (Adhya, 1996). In vitro studies have shown CRP to activate transcription at −41.5 bp relative to the TSS through a combination of stabilization and acceleration (Niu et al., 1996; Rhodius et al., 1997).

We sought to reproduce this finding in vivo by measuring allelic manifolds. We therefore placed a consensus CRP site at −41.5 bp, replacing much of the −35 element in the process, and partially mutated the −10 element of the RNAP binding site (Figure 9A). Surprisingly, we observed that the resulting allelic manifold saturates at the same tsat value shared by all class I promoters. Thus, CRP appears to activate transcription in vivo solely through stabilization, and not at all through acceleration, when located at −41.5 bp relative to the TSS (Figure 9B).

Figure 9. Surprises in class II regulation by CRP.

(A) Regulation by CRP centered at −41.5 bp was assayed using an allelic series of RNAP binding sites that have variant −10 elements (gradient). (B) The observed allelic manifold plateaus at the value of tsat=15.1 a.u. (dashed lines) determined for Figure 5B, thus indicating no detectable acceleration by CRP. This lack of acceleration is at odds with prior in vitro studies (Niu et al., 1996; Rhodius et al., 1997). (C) Regulation by CRP centered at −40.5 bp was assayed in an analogous manner. (D) Unexpectedly, data from the promoters in panel C do not collapse to a 1D allelic manifold. This finding falsifies the biophysical models in Figures 4A and 7B and indicates that CRP can either activate or repress transcription from this position, depending on as-yet-unidentified features of the RNAP binding site. Error bars in panel D indicate 95% confidence intervals estimated from replicate experiments.

The genome-wide distribution of CRP binding sites suggests that CRP also participates in class II activation when centered at −40.5 bp (Keseler et al., 2011; Salgado et al., 2013). When assaying this promoter architecture, however, we obtained a 2D scatter of points that did not collapse to any discernible 1D allelic manifold (Figure 9D). Some of these promoters exhibit activation, some exhibit repression, and some exhibit no regulation by CRP.

These observations complicate the current understanding of class II regulation by CRP. Our in vivo measurements of CRP at −41.5 bp call into question the mechanism of activation previously discerned using in vitro techniques. The scatter observed when CRP is positioned at −40.5 bp suggests that, at this position, the −10 region of the RNAP binding site influences the values of at least two relevant biophysical parameters (not just P, as our model predicts). A potential explanation for both observations is that, because CRP and RNAP are so intimately positioned at class II promoters, even minor changes in their relative orientation caused by differences between in vivo and in vitro conditions or by changes in RNAP site sequence could have a major effect on CRP-RNAP interactions. Such sensitivity would not be expected to occur in class I activation, due to the flexibility with which the RNAP αCTDs are tethered to the core complex of RNAP.

Discussion

We have shown how the measurement and quantitative modeling of allelic manifolds can be used to dissect cis-regulatory biophysics in living cells. This approach was demonstrated in E. coli in the context of transcriptional regulation by two well-characterized TFs: RNAP and CRP. Here we summarize our primary findings. We then address some caveats and limitations of the work reported here. Finally, we elaborate on how future studies might be able to scale up this approach using massively parallel reporter assays (MPRAs), including for studies in eukaryotic systems.

Summary

In each of our experiments, we quantitatively measured transcription from an allelic series of variant RNAP binding sites, each site embedded in a fixed promoter architecture. Two expression measurements were made for each variant promoter: t+ was measured in the presence of the active form of CRP, while t- was measured in the absence of active CRP. This yielded a data point, (t-,t+), in a two-dimensional measurement space. We had expected the data points thus obtained for each allelic series to collapse to a 1D curve (the allelic manifold), with different positions along this manifold corresponding to different values of RNAP-DNA binding affinity. Such collapse was indeed observed in all but one of the promoter architectures we studied. By fitting the parameters of quantitative biophysical models to these data, we obtained in vivo values for the Gibbs free energy (ΔG) of a variety of TF-DNA and TF-TF interactions.

In Part 1, we showed how measuring allelic manifolds for promoters in which a DNA-bound TF occludes RNAP can allow one to precisely measure the ΔG of TF-DNA binding. We demonstrated this strategy on promoters where CRP occludes RNAP, thereby obtaining the ΔG for a CRP binding site that was used in subsequent experiments. As an aside, we demonstrated how performing such measurements in different concentrations of the small molecule cAMP allowed us to quantitatively measure in vivo changes in active CRP concentration.

In Part 2, we showed how allelic manifolds can be used to measure the ΔG of TF-RNAP interactions. We used this strategy to measure the stabilizing interactions by which CRP up-regulates transcription at a variety of class I promoter architectures. Our strategy consistently yielded ΔG values with an estimated precision of 0.1 kcal/mol. As an aside, we showed how ΔG values for RNAP-DNA binding could also be obtained from these data. Notably, these ΔG measurements for RNAP-DNA binding were seen to deviate substantially from sequence-based predictions using an established position-specific affinity matrix (PSAM) for RNAP. This highlights just how difficult it can be to accurately predict TF-DNA binding affinity from DNA sequence.

In Part 3, we showed how allelic manifolds can allow one to distinguish between two potential mechanisms of transcriptional activation: ‘stabilization’ (a.k.a. ‘recruitment’) and ‘acceleration’. Applying this approach to the data from Part 2, we confirmed (as expected) that class I activation by CRP does indeed occur through stabilization and not acceleration. As an aside, we pursued this approach at two class II promoters. In contrast to prior in vitro studies (Niu et al., 1996; Rhodius et al., 1997), no acceleration was observed when CRP was positioned at −41.5 bp relative to the TSS. Even more unexpectedly, no 1D allelic manifold was observed at all when CRP was positioned at −40.5 bp. This last finding indicates that the variant RNAP binding sites we assayed control at least one functionally important biophysical quantity in addition to RNAP-DNA binding affinity.

Caveats and limitations

An important caveat is that our ΔG measurements assume that the true transcription rates (of which we obtain only noisy measurements) exactly fall along a 1D allelic manifold of the hypothesized mathematical form. These assumptions are well-motivated by the data collapse that we observed for all except one promoter architecture. But for some promoter architectures, there were a small number of ‘outlier’ data points that we judged (by eye) to deviate substantially from the inferred allelic manifold. The presence of a few outliers makes sense biologically: the random mutations we introduced into variant RNAP binding sites will, with some nonzero probability, either shift the position of the RNAP site or create a new binding site for some other TF. However, even for promoters that exhibit clear clustering of 2D data around a 1D curve, the deviations of individual non-outlier data points from our inferred allelic manifold were often substantially larger than the experimental noise that we estimated from replicates. It may be that the biological cause of outliers is not qualitatively different from what causes these smaller but still detectable deviations from our assumed model.

The low-throughput experimental approach we pursued here also has important limitations. Each of the 448 variant promoters for which we report data was individually catalogued, sequenced, and assayed for both t+ and t- in at least three replicate experiments. We opted to use a low-throughput colorimetric assay of β-galactosidase activity (Lederberg, 1950; Miller, 1972) because this approach is well established in E. coli to produce a quantitative measure of transcription with high precision and high dynamic range. Such assays have also been used by other groups to develop sophisticated biophysical models of transcriptional regulation (Kuhlman et al., 2007; Cui et al., 2013). However, this low-throughput approach has limited utility because it cannot be readily scaled up.

Our reliance on cAMP as a small molecule effector of CRP presents a second limitation. In our experiments, we controlled the in vivo activity of CRP by growing a specially designed strain of E. coli in either the presence (for t+) or absence (t-) of cAMP. This mirrors the strategy used by Kuhlman et al. (2007), and the validity of this approach is attested to by the calibration data shown in Appendix 2—figure 1. However, controlling in vivo TF activity using small molecules has many limitations. Most TFs cannot be quantitatively controlled with small molecules, and those that can often require special host strains (e.g., see Kuhlman et al., 2007). Moreover, varying the in vivo concentration of a TF can affect cellular physiology in ways that can confound quantitative measurements.

Outlook

MPRAs performed on array-synthesized promoter libraries should be able to overcome both of these experimental limitations. Current MPRA technology is able to quantitatively measure gene expression for on the order of 10⁴ transcriptional regulatory sequences in parallel. We estimate that this would enable the simultaneous measurement of ~10² highly resolved allelic manifolds, each manifold representing a different promoter architecture. Moreover, by using array-synthesized promoters in conjunction with MPRAs, one can measure t+ and t- by systematically altering the DNA sequence of TF binding sites, rather than relying on small molecule effectors of each TF. This capability would, among other things, enable biophysical studies of promoters that have multiple binding sites for the same TF; in such cases it might make sense to use measurement spaces having more than two dimensions.

Will allelic manifolds be useful for understanding transcriptional regulation in eukaryotes? Both Sort-Seq MPRAs (Sharon et al., 2012; Weingarten-Gabbay et al., 2017) and RNA-Seq MPRAs (Melnikov et al., 2012; Kwasnieski et al., 2012; Patwardhan et al., 2012) are well established in eukaryotes so, on a technical level, experiments analogous to those described here should be feasible. The bigger question, we believe, is whether the results of such experiments would be interpretable. Eukaryotic transcriptional regulation is far more complex than transcriptional regulation in bacteria. Still, we believe that pursuing the measurement and modeling of allelic manifolds in this context is worthwhile. Despite the underlying complexities, simple ‘effective’ biophysical models might work surprisingly well. Similar approaches might also be useful for studying other eukaryotic regulatory processes that are compatible with MPRAs, such as alternative splicing (Wong et al., 2018).

Based on these results, we advocate a very different approach to dissecting cis-regulatory grammar than has been pursued by other groups. Rather than attempting to identify a single quantitative model that can explain regulation by many different arrangements of TF binding sites (Gertz et al., 2009; Sharon et al., 2012; Mogno et al., 2013; Smith et al., 2013; Levo and Segal, 2014; White et al., 2016), we suggest focused studies of the biophysical interactions that result from specific TF binding site arrangements. The measurement and modeling of allelic manifolds provides a systematic and stereotyped way of doing this. By coupling this approach with MPRAs, it should be possible to perform such studies on hundreds of systematically varied regulatory sequence architectures in parallel. General rules governing cis-regulatory grammar might then be identified empirically. We suspect that this bottom-up strategy to studying cis-regulatory grammar is likely to reveal regulatory mechanisms that would be hard to anticipate in top-down studies.

Materials and methods

Key resources table.

| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Genetic reagent (E. coli) | JK10 | this paper | none | genotype: ΔcyaA ΔcpdA ΔlacY ΔlacZ ΔdksA |
| Recombinant DNA reagent | pJK47.419 | this paper | none | cloning vector with BsmBI cut sites, ccdB cassette, lacZ reporter gene, kanamycin resistance, pSC101 origin |
| Recombinant DNA reagent | pJK48 and variants | this paper | none | reporter plasmids cloned from pJK47.419 |
| Chemical compound | cAMP | Sigma-Aldrich | A9501-1G | Adenosine 3’,5’-cyclic monophosphate, 1 gram |
| Chemical compound | IPTG | Sigma-Aldrich | I5502-1G | Isopropyl β-D-1-thiogalactopyranoside, 1 gram |
| Chemical compound | ONPG | Sigma-Aldrich | N1127-5G | 2-Nitrophenyl β-D-galactopyranoside, 5 gram |
| Commercial assay or kit | PureLink Genomic DNA Mini Kit | ThermoFisher | K182001 | none |
| Commercial assay or kit | Nextera XT DNA Library Preparation Kit | Illumina | FC-131–1024 | 24 samples |
| Other | RDM | Teknova | M2105 | growth media: MOPS EZ Rich Defined Medium Kit, 5 liter |
| Other | PopCulture Reagent | MilliporeSigma | 71092–4 | 75 milliliters |
| Other | Breathe-Easier film | USA Scientific | 9123–6100 | sterile, 100 per box |
| Other | Epoch 2 Microplate Spectrophotometer | BioTek | EPOCH2C | none |
| Software | analysis scripts | this paper | none | Available at https://github.com/jbkinney/17_inducibility (copy archived at https://github.com/elifesciences-publications/17_inducibility) |

Appendix 1 describes the media, strains, plasmids, and promoters assayed in this work. Appendix 2 describes the colorimetric β-galactosidase activity assay, adapted from Lederberg (1950) and Miller (1972), that was used to measure expression levels. Appendix 3 provides details about how quantitative models were fit to these measurements, as well as how uncertainties in estimated parameters were computed. Supplementary file 1 is an Excel spreadsheet containing the DNA sequences of all assayed promoters, all t+ and t- measurements used in this work, and all of the parameter values fit to these data, both with and without bootstrap resampling.

Acknowledgments

We thank Stirling Churchman, Barak Cohen, David McCandlish, Bryce Nickels, and Saurabh Sinha for helpful discussions. We also thank Naama Barkai, Ulrich Gerland, Richard Neher, and one anonymous referee for reviewing this manuscript and providing helpful feedback. This work was supported by a CSHL/Northwell Health Alliance grant to JBK and by NIH Cancer Center Support Grant 5P30CA045508.

Appendix 1

Media, strains, plasmids, and promoters

Appendix 1—figure 1. Promoter sequences used in this study.

In all panels, the −35 and −10 hexamers of the RNAP binding site are in bold. CRP binding site centers are indicated by small triangles. The palindromic pentamers of the core CRP binding site in each construct are underlined. The transcription start site (TSS) is bold and italicized. Lowercase bases (‘a’,‘c’,‘g’, and ‘t’) indicate positions synthesized with a 24% mutation rate. The lowercase character ‘n’ indicates completely randomized positions. (A) Occlusion promoters assayed for Figure 2. (B) Class I promoters assayed for Figure 5. In the main text we refer to the wild-type promoter with CRP at −61.5 bp as the lac* promoter. The lac* promoter served as the template for all of the promoters shown here. (C) Strong class I promoters assayed for Figure 8. (D) Class II promoters assayed for Figure 9.

Expression measurements were performed on cells grown in rich defined media (RDM; purchased from Teknova) (Neidhardt et al., 1974) supplemented with 10 mM NaHCO3, 1 mM IPTG (Sigma), and 0.2% glucose. We refer to this media as RDM’. RDM’ was further supplemented with 50 µg/ml kanamycin (Sigma) when growing cells, as well as 250 µM cAMP (Sigma) when measuring t+.

Expression measurements were performed in E. coli strain JK10, which has genotype ΔcyaA ΔcpdA ΔlacY ΔlacZ ΔdksA. JK10 is derived from strain TK310 (Kuhlman et al., 2007), which is ΔcyaA ΔcpdA ΔlacY. The ΔcyaA ΔcpdA mutations prevent TK310 from synthesizing or degrading cAMP, thus allowing in vivo cAMP concentrations to be quantitatively controlled by adding cAMP to the growth media. Into TK310 we introduced the ΔlacZ mutation, yielding strain DJ33; this mutation enables the use of β-galactosidase activity assays for measuring plasmid-based lacZ expression. In our initial experiments, we found that the growth rate of DJ33 in RDM’ varied strongly with the amount of cAMP added to the media. Fortunately, we isolated a spontaneous knock-out mutation in dksA (thus yielding JK10), which caused the growth rate (~ 30 min doubling time) in RDM’ to be independent of cAMP concentrations below ~500 µM. We note that JK10 will not grow in minimal media in the absence of cAMP. The TK310, DJ33, and JK10 genotypes were confirmed by whole genome sequencing using the PureLink Genomic DNA Mini Kit (ThermoFisher) for extracting genomic DNA from cultured cells and the Nextera XT DNA Library Preparation Kit (Illumina) for preparing whole-genome sequencing libraries.

Expression of the lacZ gene was driven from variants of a plasmid we call pJK48. These reporter constructs were cloned as follows. We started with the vector pJK14 from Kinney et al. (2010). pJK14 contains a pSC101 origin of replication (~ 5 copies per cell; Thompson et al., 2018), a kanamycin resistance gene, and a ccdB cloning cassette positioned immediately upstream of a gfpmut2 reporter gene and flanked by outward-facing BsmBI restriction sites. First, the gfpmut2 gene in this vector was replaced with lacZ, yielding pJK47. Next, the ribosome binding site in the 5’ UTR of lacZ was weakened, yielding pJK47.419; this weakening prevents lacZ expression from substantially slowing cell growth in RDM’. pJK47.419 was propagated in DB3.1 E. coli (Invitrogen), which is resistant to the CcdB toxin. The promoters we assayed were variants of what we call the ‘lac*’ promoter. The lac* promoter is similar to the endogenous lac promoter of E. coli MG1655 except that (i) it contains a CRP binding site with a consensus right pentamer and (ii) it contains mutations that were introduced in an effort to remove previously reported cryptic promoters (Reznikoff, 1992). Promoter-containing insertion cassettes were created through overlap-extension PCR and flanked by outward-facing BsaI restriction sites. All primers were ordered from Integrated DNA Technologies. Note that some of the primers used to create these inserts were synthesized using pre-mixed phosphoramidites at specified positions; this is how a 24% mutation rate in the −10 or −35 regions of the RNAP binding site was achieved. The resulting promoter sequences are illustrated in Appendix 1—figure 1. To clone variants of pJK48, we separately digested the pJK47.419 vector with BsmBI (NEB) and the appropriate insert with BsaI (NEB). Digests were then cleaned up (Qiagen PCR purification kit) and ligated together in a 1:1 molar ratio for 1 hr using T4 DNA ligase (Invitrogen). After 90 min dialysis, plasmids were transformed into electrocompetent JK10 cells. Individual clones were plated on LB supplemented with kanamycin (50 µg/ml). After initial cloning and plating, each colony was re-streaked, grown in LB+kan, and stored as a catalogued glycerol stock. The promoter region of each clone was sequenced in both directions. Only plasmids with validated promoter sequences were used for the measurements presented in this paper. The promoter sequences of all 448 plasmids used in this study, as well as their measured t+ and t- values, are provided at https://github.com/jbkinney/17_inducibility (copy archived at https://github.com/elifesciences-publications/17_inducibility).

Appendix 2

Miller assays and the calibration of expression measurements

Appendix 2—figure 1. Calibration of expression measurements with and without cAMP.

(A) Measurements of t+raw (in 250 µM cAMP) vs t-raw (in 0 µM cAMP) for promoters in which the CRP binding site has been replaced by a non-functional ‘null’ site. As expected, these data lie close to the t+raw=t-raw diagonal (dotted line). (B) Upon closer inspection, however, we found that t+raw values consistently fell slightly below corresponding t-raw values. Using least-squares fitting we found that, on average, $t_+^{raw}/t_-^{raw} = 0.852^{+0.056}_{-0.053}$, where uncertainties indicate a 95% confidence interval (reflecting 1.96 times the standard error of the mean in log space). To correct for this bias, we plot and fit models to t+=t+raw and t-=0.855×t-raw throughout this paper.

We obtained t+ and t- measurements for each promoter as follows. First, the corresponding E. coli clone was streaked out on LB+kan agar and grown overnight. A colony was then picked and used to inoculate a 1.5 ml overnight LB+kan liquid culture. Either 8 µl, 6 µl, or 4 µl of the overnight culture were then diluted into 200 µl RDM’+kan. 25 µl of each dilution was then added to 175 µl RDM’+kan in a 96-well optical bottom plate and supplemented with either 0 µM cAMP (for t-raw), 250 µM cAMP (for t+raw), or another cAMP concentration (for some t+raw measurements in Figure 3). The plate was then covered with Breathe-Easier film (USA Scientific) and cells were cultured for 3 hr at 37 °C, shaking at 900 RPM in a microplate shaker. During this time, 5.5 ml of lysis buffer was freshly prepared using 1.5 ml RDM’, 4.0 ml PopCulture reagent (Millipore), 114 µl of 35 mg/ml chloramphenicol (Sigma), and 44 µl of 40 U/µl rLysozyme (Sigma).

Microplate film was removed and cell density (quantified by A600) was measured using an Epoch 2 Microplate Spectrophotometer (BioTek). Cells were then lysed by adding 25 µl lysis buffer to each microplate well, incubating the microplate at room temperature for 10 min without shaking, then cooling the microplate at 4 °C for a minimum of 15 min. In each well of a 96-well optical bottom plate, 50 µl of lysate was then added to 50 µl of pre-chilled Z-buffer (Miller, 1972) containing 1 mg/ml ONPG (Sigma). Samples were sealed with optical film and both A420 and A550 were periodically measured in the plate reader over an extended period of time (every 1.5 min for 1 hr or every 15 min for 10 hr, depending on the level of expression expected).

The raw expression levels were quantified from these absorbance data using the formula

$$t_\pm^{raw} = \frac{\Delta A_{420} - \Delta A_{550}}{V\,\Delta T\,A_{600}}, \tag{9}$$

where V = 50 is the volume of lysate in µl added to the ONPG reaction, ΔT is the change in time from the beginning of the measurement, and ΔA_X indicates a change in absorbance at X nm over this time interval. Only data from wells that satisfied an A600 cutoff of 0.5 were analyzed. Note that the A550 term in Equation 9 is not multiplied by 1.75 as it is in Miller (1972). This is because our A550 measurements are used to compensate for condensation on the microplate film, not cellular debris as in Miller (1972); our lysis procedure produces no detectable cellular debris. In practice, Equation 9 was not evaluated using individual measurements, but was computed from the slope of a line fit to all of the non-saturated absorbance measurements. Raw A420, A550, and A600 values, as well as our analysis scripts, are available at https://github.com/jbkinney/17_inducibility (copy archived at https://github.com/elifesciences-publications/17_inducibility). Median values from at least three independent Miller measurements (and often more) were used to define each measurement shown in the main figures.
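The slope-based evaluation of Equation 9 can be sketched as follows. This is a minimal illustration rather than the analysis scripts in the repository: the function names are hypothetical, time is assumed to be given in minutes, and the caller is assumed to have already discarded saturated absorbance readings.

```python
def fit_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def raw_expression(times, a420, a550, a600, lysate_ul=50.0):
    """Raw expression level following Equation 9.

    The slope of (A420 - A550) versus time replaces the single-interval
    ratio (dA420 - dA550) / dT. The A550 term is not scaled by 1.75,
    as explained in the text. lysate_ul is V, the lysate volume in µl.
    """
    corrected = [p - q for p, q in zip(a420, a550)]
    return fit_slope(times, corrected) / (lysate_ul * a600)


# Hypothetical time course sampled every 1.5 min.
times = [0.0, 1.5, 3.0, 4.5]
a420 = [0.10, 0.13, 0.16, 0.19]
a550 = [0.05, 0.05, 0.05, 0.05]
print(raw_expression(times, a420, a550, a600=0.4))
```

Fitting a line through all time points uses the whole time course, so noise in any single absorbance reading has less influence than it would in a two-point difference.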

Because we controlled the in vivo activity of CRP by supplementing media with or without cAMP, we tested whether CRP-independent promoters produce measurements that vary between these growth conditions. Specifically, we measured t-raw (in 0 µM cAMP) and t+raw (in 250 µM cAMP) for 39 promoters in which the CRP binding site was replaced with a ‘null’ site (see Appendix 1—figures 1B and C). These measurements are plotted in Appendix 2—figure 1, and show a slight bias. To correct for this bias, we use an unadjusted t+=t+raw together with an adjusted t-=0.855×t-raw throughout the main text. Note that t+=t+raw was used for all nonzero cAMP concentrations, including those in Figure 3B that differ from 250 µM. Some upward bias is therefore possible in these t+ measurements, but we do not expect this to greatly affect our conclusions.
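The calibration step can be illustrated with a short sketch. The code below assumes that the average ratio is estimated as the mean of log ratios, which is one reading of "least-squares fitting" in log space and may differ from the authors' scripts. The function names are hypothetical; only the correction factor 0.855 and the 1.96 × SEM convention come from the text.

```python
import math
import statistics


def log_ratio_summary(t_plus_raw, t_minus_raw):
    """Geometric-mean ratio t+raw / t-raw with a 95% interval.

    The interval is the mean log ratio plus or minus 1.96 standard errors,
    mapped back to linear space.
    """
    logs = [math.log(p / m) for p, m in zip(t_plus_raw, t_minus_raw)]
    mean = statistics.fmean(logs)
    sem = statistics.stdev(logs) / math.sqrt(len(logs))
    return (math.exp(mean),
            math.exp(mean - 1.96 * sem),
            math.exp(mean + 1.96 * sem))


def calibrate(t_plus_raw, t_minus_raw, factor=0.855):
    """Apply the correction used in the main text: t+ is left unadjusted
    and t- is rescaled by the calibration factor."""
    return t_plus_raw, factor * t_minus_raw


# Hypothetical null-site measurements (arbitrary units).
plus = [0.84, 1.75, 8.3]
minus = [1.0, 2.0, 10.0]
print(log_ratio_summary(plus, minus))
print(calibrate(8.3, 10.0))
```

Working in log space treats weak and strong promoters on an equal footing, which matters because the measurements span several orders of magnitude.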

Appendix 3

Parameter inference

Allelic manifold parameters were fit to measured $t_+$ and $t_-$ values as follows. First, outlier data points were called by eye and excluded from the parameter fitting procedure. We denote the remaining measurements by $t_+^{i,\mathrm{data}}$ and $t_-^{i,\mathrm{data}}$, where $i = 1, 2, \ldots, n$ indexes the $n$ non-outlier data points. Corresponding model predictions $t_+^i(\theta)$ and $t_-^i(\theta)$, where $\theta$ denotes model parameters, were then fit to these data using nonlinear least squares optimization. Specifically, we inferred parameters $\theta^* = \mathrm{argmin}_\theta\, \mathcal{L}(\theta)$, where the loss function is given by

$$\mathcal{L}(\theta) = \sum_{i=1}^{n} \left\{ \left[ \log \frac{t_+^i(\theta)}{t_+^{i,\mathrm{data}}} \right]^2 + \left[ \log \frac{t_-^i(\theta)}{t_-^{i,\mathrm{data}}} \right]^2 \right\}. \tag{10}$$
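As an illustration of Equation 10, the following minimal Python sketch evaluates the loss for a given set of model predictions and measurements. The function name `log_loss` and the example values are hypothetical; they are not taken from the analysis scripts in the repository.

```python
import math

def log_loss(t_plus_model, t_plus_data, t_minus_model, t_minus_data):
    """Sum of squared log-ratios between model and data (Equation 10)."""
    total = 0.0
    for tp_m, tp_d, tm_m, tm_d in zip(t_plus_model, t_plus_data,
                                      t_minus_model, t_minus_data):
        total += math.log(tp_m / tp_d) ** 2 + math.log(tm_m / tm_d) ** 2
    return total

# Hypothetical measurements in arbitrary units, for illustration only.
data_plus, data_minus = [0.10, 1.0], [1.0, 5.0]

# A model that reproduces the data exactly has zero loss.
print(log_loss(data_plus, data_plus, data_minus, data_minus))
```

Because the residuals are differences of logarithms, a twofold overestimate and a twofold underestimate are penalized equally, regardless of the absolute transcription level.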

These optimal parameter values $\theta^*$ were used to generate the best-estimate allelic manifolds, which are plotted in black in the main figures. Uncertainties in $\theta^*$ were estimated by performing the same inference procedure on bootstrap-resampled data. For each variable $X \in \{F, P, \alpha, \beta, t_{\mathrm{sat}}, t_{\mathrm{bg}}\}$, we report

$$X = (X_{50})^{+(X_{84} - X_{50})}_{-(X_{50} - X_{16})} \tag{11}$$

where $X_{50}$, $X_{84}$, and $X_{16}$ respectively denote the median, 84th percentile, and 16th percentile of the $X$ values obtained from bootstrap resampling. In the case of $X \in \{F, P, \alpha\}$, we also report

$$\Delta G_X = -k_B T \log X_{50} \pm k_B T \left( \frac{\log X_{84} - \log X_{16}}{2} \right), \tag{12}$$

where 1 kcal/mol = 1.62 $k_B T$ at 37 °C. We now describe each specific inference procedure in more detail.
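The reporting conventions of Equations 11 and 12 can be sketched in a few lines of Python. This is only an illustration: the percentile interpolation rule is an assumption (the text does not specify one), and the names `percentile`, `summarize`, and `delta_g_kcal` are hypothetical.

```python
import math

KBT_PER_KCAL_MOL = 1.62  # 1 kcal/mol = 1.62 kBT at 37 °C, as stated in the text

def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100); assumed convention."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def summarize(bootstrap_values):
    """Return (median, minus_error, plus_error), following Equation 11."""
    x16, x50, x84 = (percentile(bootstrap_values, q) for q in (16, 50, 84))
    return x50, x50 - x16, x84 - x50

def delta_g_kcal(bootstrap_values):
    """Return (Delta G, uncertainty) in kcal/mol, following Equation 12."""
    x16, x50, x84 = (percentile(bootstrap_values, q) for q in (16, 50, 84))
    dg = -math.log(x50) / KBT_PER_KCAL_MOL
    err = (math.log(x84) - math.log(x16)) / 2.0 / KBT_PER_KCAL_MOL
    return dg, err

# Hypothetical bootstrap values for a binding factor, for illustration only.
samples = [21.0, 22.5, 23.9, 25.0, 27.5]
print(summarize(samples))
print(delta_g_kcal(samples))
```

Dividing by 1.62 converts from units of $k_B T$ to kcal/mol, so a binding factor of 23.9 corresponds to a free energy of roughly −2.0 kcal/mol.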

Inference for Figure 2B

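We inferred $\theta = \{t_{\mathrm{sat}}, t_{\mathrm{bg}}, F, P_1, P_2, \ldots, P_n\}$, with model predictions given by

$$t_+^i(\theta) = t_{\mathrm{sat}} \frac{P_i}{1 + F + P_i} + t_{\mathrm{bg}}, \qquad t_-^i(\theta) = t_{\mathrm{sat}} \frac{P_i}{1 + P_i} + t_{\mathrm{bg}}. \tag{13}$$

Parameters were fit to the $n = 39$ non-outlier measurements made for promoters with +0.5 bp or +4.5 bp architecture. We found that $F = 23.9^{+3.1}_{-2.5}$ and $t_{\mathrm{bg}} = 2.30 \times 10^{-3}$ a.u., while $t_{\mathrm{sat}}$ values remained highly uncertain.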
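To make the shape of this manifold concrete, the sketch below evaluates Equation 13 over a logarithmic sweep of $P$ and collects the resulting $(t_-, t_+)$ pairs. The function names and the sweep range are hypothetical. Because $t_{\mathrm{sat}}$ is poorly constrained by this fit, the sketch borrows $t_{\mathrm{sat}} = 15.1$ a.u. from the Figure 5C inference purely for illustration.

```python
def t_minus(P, t_sat, t_bg):
    """Unregulated transcription rate (second expression in Equation 13)."""
    return t_sat * P / (1.0 + P) + t_bg

def t_plus_occlusion(P, F, t_sat, t_bg):
    """Transcription rate repressed by occlusion (first expression in Equation 13)."""
    return t_sat * P / (1.0 + F + P) + t_bg

def occlusion_manifold(F, t_sat, t_bg, log10_P_min=-7.0, log10_P_max=3.0, n=11):
    """Trace (t_minus, t_plus) pairs by sweeping P on a log-spaced grid."""
    points = []
    for k in range(n):
        P = 10.0 ** (log10_P_min + (log10_P_max - log10_P_min) * k / (n - 1))
        points.append((t_minus(P, t_sat, t_bg),
                       t_plus_occlusion(P, F, t_sat, t_bg)))
    return points

# F and t_bg are the fitted values quoted above; t_sat is borrowed (see lead-in).
for tm, tp in occlusion_manifold(F=23.9, t_sat=15.1, t_bg=2.30e-3):
    print(f"t_minus = {tm:.4g}  t_plus = {tp:.4g}")
```

Each promoter variant corresponds to one value of $P$ and therefore to one point on this curve, which is why varying the RNAP site alone traces out a one-dimensional manifold.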

Inference for Figure 3B

We performed a separate inference procedure for each of the seven cAMP concentrations $C \in \{250, 125, 50, 25, 10, 5, 2.5\}$, indicated in µM units. Specifically, we inferred $\theta_C = \{F_C, P_1, P_2, \ldots, P_{n_C}\}$, where $n_C$ is the number of promoters for which $t_+$ was measured using cAMP concentration $C$. Model predictions were given by

$$t_+^i(\theta_C) = t_{\mathrm{sat}} \frac{P_i}{1 + F_C + P_i} + t_{\mathrm{bg}}, \qquad t_-^i(\theta_C) = t_{\mathrm{sat}} \frac{P_i}{1 + P_i} + t_{\mathrm{bg}}, \tag{14}$$

where $t_{\mathrm{sat}} = 15.1$ a.u. is the median saturated transcription rate from Figure 5C, and $t_{\mathrm{bg}} = 2.30 \times 10^{-3}$ a.u. is the median background transcription rate from Figure 2B. Note that many of the $t_-^i$ measurements were used in the inference procedures for multiple values of $C$, whereas each $t_+^i$ measurement was used in only one such inference procedure.
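A simplified version of this per-concentration fit can be sketched as follows. Note that this is not the joint inference described above. Instead of fitting every $P_i$ together with $F_C$, the sketch inverts the $t_-$ expression in Equation 14 to obtain each $P_i$ directly from its $t_-$ measurement, then scans $F_C$ on a logarithmic grid and keeps the value that minimizes the squared log-residuals in $t_+$. The function names, grid bounds, and synthetic data are assumptions made for illustration.

```python
import math

def p_from_t_minus(t_minus, t_sat, t_bg):
    """Invert t_minus = t_sat*P/(1+P) + t_bg (needs t_bg < t_minus < t_sat + t_bg)."""
    x = t_minus - t_bg
    return x / (t_sat - x)

def fit_F(t_plus_data, t_minus_data, t_sat, t_bg,
          log10_lo=-2.0, log10_hi=4.0, steps=6000):
    """Grid search over log10(F) minimizing squared log-residuals in t_plus."""
    P = [p_from_t_minus(tm, t_sat, t_bg) for tm in t_minus_data]
    best_F, best_loss = None, float("inf")
    for k in range(steps + 1):
        F = 10.0 ** (log10_lo + (log10_hi - log10_lo) * k / steps)
        loss = sum(math.log((t_sat * p / (1.0 + F + p) + t_bg) / tp) ** 2
                   for p, tp in zip(P, t_plus_data))
        if loss < best_loss:
            best_F, best_loss = F, loss
    return best_F

# Synthetic, noise-free data generated with a known F, for illustration only.
t_sat, t_bg, F_true = 15.1, 2.30e-3, 10.0
P_true = [0.01, 0.1, 1.0, 10.0]
tm_data = [t_sat * p / (1.0 + p) + t_bg for p in P_true]
tp_data = [t_sat * p / (1.0 + F_true + p) + t_bg for p in P_true]
print(fit_F(tp_data, tm_data, t_sat, t_bg))
```

With real measurements, noise in $t_-$ propagates into the inverted $P_i$ values, which is one reason a joint fit of all parameters is preferable to this shortcut.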

Inference for Figure 5B

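Using data from both the −10 and −35 allelic series for the −61.5 bp promoter architecture, we inferred $\theta = \{t_{\mathrm{sat}}, t_{\mathrm{bg}}, \alpha', P_1, \ldots, P_n\}$. Model predictions were given by

$$t_+^i(\theta) = t_{\mathrm{sat}} \frac{\alpha' P_i}{1 + \alpha' P_i} + t_{\mathrm{bg}}, \qquad t_-^i(\theta) = t_{\mathrm{sat}} \frac{P_i}{1 + P_i} + t_{\mathrm{bg}}. \tag{15}$$

For each inferred $\alpha'$, a value for $\alpha$ was computed using $\alpha = \alpha'(1 + F^{-1}) - F^{-1}$, where $F = 23.9$ is the median CRP binding factor inferred for Figure 2B.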

Inference for Figure 5C

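In a single fitting procedure, we inferred $\theta = \{t_{\mathrm{sat}}, t_{\mathrm{bg}}^{-82.5}, \ldots, t_{\mathrm{bg}}^{-60.5}, \alpha'_{-82.5}, \ldots, \alpha'_{-60.5}, P_1, \ldots, P_n\}$ using

$$t_+^i(\theta) = t_{\mathrm{sat}} \frac{\alpha'_{D_i} P_i}{1 + \alpha'_{D_i} P_i} + t_{\mathrm{bg}}^{D_i}, \qquad t_-^i(\theta) = t_{\mathrm{sat}} \frac{P_i}{1 + P_i} + t_{\mathrm{bg}}^{D_i}, \tag{16}$$

where each $D_i \in \{-82.5, -81.5, -76.5, -72.5, -71.5, -66.5, -65.5, -64.5, -63.5, -62.5, -61.5, -60.5\}$ represents the position of the CRP binding site (in bp relative to the TSS) for promoter $i$. Note that a single value for $t_{\mathrm{sat}}$ was inferred for all promoter architectures, while both $t_{\mathrm{bg}}^{D}$ and $\alpha'_D$ varied with CRP position $D$. The corresponding values of $\alpha$ plotted in Figure 5D and listed in Table 1 were computed using $\alpha_D = \alpha'_D(1 + F^{-1}) - F^{-1}$, where $F = 23.9$ is the median CRP binding factor inferred for Figure 2B. Among other results, we find that $t_{\mathrm{sat}} = 15.1^{+0.6}_{-0.5}$ a.u.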

Inference for Figure 8C

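For each spacing $D$, we separately inferred $\theta_D = \{\alpha'_D, \beta'_D, t_{\mathrm{bg}}^{D}\}$ using

$$t_+^i(\theta_D) = \beta'_D\, t_{\mathrm{sat}} \frac{\alpha'_D P_i}{1 + \alpha'_D P_i} + t_{\mathrm{bg}}^{D}, \qquad t_-^i(\theta_D) = t_{\mathrm{sat}} \frac{P_i}{1 + P_i} + t_{\mathrm{bg}}^{D}, \tag{17}$$

where $t_{\mathrm{sat}} = 15.1$ a.u. is the median saturated transcription rate inferred for Figure 5C. We then computed $\alpha_D = \alpha'_D(1 + F^{-1}) - F^{-1}$ and $\beta_D = \beta'_D(1 + \alpha_D^{-1}F^{-1}) - \alpha_D^{-1}F^{-1}$, using the median CRP binding factor $F = 23.9$ inferred for Figure 2B.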
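The conversion from fitted to reported interaction factors can be written out explicitly. In the sketch below, `alpha_fit` and `beta_fit` are hypothetical names for the quantities that enter the model predictions directly, and the two `*_from_fit` functions apply the conversion formulas given above. The helper `alpha_fit_from_alpha` is simply the algebraic inverse of the first formula and is included only as a consistency check.

```python
def alpha_from_fit(alpha_fit, F):
    """alpha = alpha_fit * (1 + 1/F) - 1/F."""
    return alpha_fit * (1.0 + 1.0 / F) - 1.0 / F

def beta_from_fit(beta_fit, alpha, F):
    """beta = beta_fit * (1 + 1/(alpha*F)) - 1/(alpha*F)."""
    inv = 1.0 / (alpha * F)
    return beta_fit * (1.0 + inv) - inv

def alpha_fit_from_alpha(alpha, F):
    """Algebraic inverse of alpha_from_fit: alpha_fit = (1 + alpha*F) / (1 + F)."""
    return (1.0 + alpha * F) / (1.0 + F)

F = 23.9  # median CRP binding factor inferred for Figure 2B

# Hypothetical fitted values, for illustration only.
alpha = alpha_from_fit(4.8, F)
beta = beta_from_fit(1.5, alpha, F)
print(alpha, beta)

# Round trip: converting alpha back recovers the fitted value.
print(alpha_fit_from_alpha(alpha, F))
```

A fitted factor of exactly 1 maps to a converted factor of exactly 1 for any $F$, so the absence of an interaction is preserved by the conversion. The correction matters most when $F$ is small, because a weakly occupied CRP site dilutes the apparent effect of the interaction.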

Appendix 4

Derivation of allelic manifold regimes

Appendix 4—figure 1. Derivation of the regimes of allelic manifolds.

Panels A-D show simulated induction curves for transcription $t$ as a function of the RNAP binding factor $P$. Dashed lines indicate boundaries between the minimal and linear regimes of each curve, while dotted lines indicate boundaries between linear and maximal regimes. A formula for the value of $P$ at each regime boundary is also shown. All simulations used $t_{\mathrm{sat}} = 1$ a.u., $t_{\mathrm{bg}} = 10^{-4}$ a.u., $F = 100$, and $P$ ranging from $10^{-9}$ to $10^{4}$. (A) Induction curve for unregulated transcription; see Equation 18. (B) Induction curve for transcription repressed by occlusion; see Equation 19. (C) Induction curve for transcription activated by stabilization ($\alpha' = 300$); see Equation 20. (D) Induction curve for transcription activated by acceleration ($\alpha' = 10$, $\beta' = 30$); see Equation 21. Panels E-G show how overlaps between the six regimes of two induction curves (three for $t_-$ and three for $t_+$) result in five distinct regimes for the corresponding allelic manifold. (E) Regimes of the allelic manifold for occlusion, which is shown in Figure 1C. (F) Regimes of the allelic manifold for stabilization, which is shown in Figure 4C. (G) Regimes of the allelic manifold for acceleration, which is shown in Figure 7C.

Each transcription rate modeled in this work is a sigmoidal function of the unitless RNAP-DNA binding factor $P$. As such, a log-log plot of transcription $t$ as a function of $P$ reveals a sigmoidal curve having three distinct regimes. The 'minimal' regime of this induction curve comprises values of $P$ that are sufficiently small for $t$ to be well-approximated by its smallest value ($t_{\mathrm{bg}}$ in all cases). The 'maximal' regime occurs when $P$ is so large that $t$ is well-approximated by its largest value (either $t_{\mathrm{sat}}$ or $\beta' t_{\mathrm{sat}}$). Between these maximal and minimal regimes lies a 'linear' regime in which $t$ is approximately proportional to $P$.

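For unregulated transcription, which in this paper is denoted $t_-$, these three regimes are given by

$$t_- = t_{\mathrm{sat}} \frac{P}{1 + P} + t_{\mathrm{bg}} \approx \begin{cases} t_{\mathrm{bg}} & \text{for } P \ll \dfrac{t_{\mathrm{bg}}}{t_{\mathrm{sat}}} \\ t_{\mathrm{sat}} P & \text{for } \dfrac{t_{\mathrm{bg}}}{t_{\mathrm{sat}}} \ll P \ll 1 \\ t_{\mathrm{sat}} & \text{for } 1 \ll P \end{cases}; \tag{18}$$

see Appendix 4—figure 1A. For transcription that is repressed by occlusion (with $F \gg 1$), which we denote here by $t_+^{\mathrm{occ}}$, these three regimes are shifted (relative to $t_-$) to larger values of $P$ by a factor of approximately $F$. As a result,

$$t_+^{\mathrm{occ}} = t_{\mathrm{sat}} \frac{P}{1 + F + P} + t_{\mathrm{bg}} \approx \begin{cases} t_{\mathrm{bg}} & \text{for } P \ll F \dfrac{t_{\mathrm{bg}}}{t_{\mathrm{sat}}} \\ t_{\mathrm{sat}} \dfrac{P}{1 + F} & \text{for } F \dfrac{t_{\mathrm{bg}}}{t_{\mathrm{sat}}} \ll P \ll F \\ t_{\mathrm{sat}} & \text{for } F \ll P \end{cases}; \tag{19}$$

see Appendix 4—figure 1B. By contrast, for transcription that is activated by stabilization, denoted here by $t_+^{\mathrm{stab}}$, these three regimes shift (relative to $t_-$) to lower values of $P$ by a factor of $1/\alpha'$, giving

$$t_+^{\mathrm{stab}} = t_{\mathrm{sat}} \frac{\alpha' P}{1 + \alpha' P} + t_{\mathrm{bg}} \approx \begin{cases} t_{\mathrm{bg}} & \text{for } P \ll \dfrac{t_{\mathrm{bg}}}{t_{\mathrm{sat}} \alpha'} \\ t_{\mathrm{sat}} \alpha' P & \text{for } \dfrac{t_{\mathrm{bg}}}{t_{\mathrm{sat}} \alpha'} \ll P \ll \dfrac{1}{\alpha'} \\ t_{\mathrm{sat}} & \text{for } \dfrac{1}{\alpha'} \ll P \end{cases}; \tag{20}$$

see Appendix 4—figure 1C. For transcription that is activated partially by acceleration and partially by stabilization, here denoted by $t_+^{\mathrm{acc}}$, two parameters govern the shape of the induction curve. The boundary between the minimal and linear regimes is shifted (relative to $t_-$) to lower values of $P$ by a factor of $1/(\alpha'\beta')$, while the boundary between the linear regime and the maximal regime is shifted down by a factor of only $1/\alpha'$. As a result,

$$t_+^{\mathrm{acc}} = \beta' t_{\mathrm{sat}} \frac{\alpha' P}{1 + \alpha' P} + t_{\mathrm{bg}} \approx \begin{cases} t_{\mathrm{bg}} & \text{for } P \ll \dfrac{t_{\mathrm{bg}}}{t_{\mathrm{sat}} \alpha' \beta'} \\ t_{\mathrm{sat}} \alpha' \beta' P & \text{for } \dfrac{t_{\mathrm{bg}}}{t_{\mathrm{sat}} \alpha' \beta'} \ll P \ll \dfrac{1}{\alpha'} \\ t_{\mathrm{sat}} \beta' & \text{for } \dfrac{1}{\alpha'} \ll P \end{cases}; \tag{21}$$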

see Appendix 4—figure 1D.
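The regime boundaries in Equations 18 to 21 can be collected into a small helper, sketched below. The function names are hypothetical, and the arguments `alpha` and `beta` stand for the stabilization and acceleration factors that appear in Equations 20 and 21. Because the regimes are defined by strong inequalities, the returned boundaries mark crossover points rather than sharp transitions.

```python
def regime(P, lower, upper):
    """Label P as lying in the minimal, linear, or maximal regime."""
    if P < lower:
        return "minimal"
    if P < upper:
        return "linear"
    return "maximal"

def boundaries(t_sat, t_bg, model, F=None, alpha=None, beta=None):
    """Return (minimal/linear boundary, linear/maximal boundary) in P."""
    if model == "unregulated":    # Equation 18
        return t_bg / t_sat, 1.0
    if model == "occlusion":      # Equation 19
        return F * t_bg / t_sat, F
    if model == "stabilization":  # Equation 20
        return t_bg / (alpha * t_sat), 1.0 / alpha
    if model == "acceleration":   # Equation 21
        return t_bg / (alpha * beta * t_sat), 1.0 / alpha
    raise ValueError(f"unknown model: {model}")

# Simulation parameters quoted in the caption of Appendix 4, figure 1.
lo, hi = boundaries(t_sat=1.0, t_bg=1e-4, model="occlusion", F=100.0)
for P in (1e-6, 1.0, 1e4):
    print(P, regime(P, lo, hi))
```

Applying `regime` to the same $P$ with the boundaries for $t_-$ and for $t_+$ yields a pair of labels, and the distinct pairs encountered as $P$ increases correspond to the regimes of the allelic manifold.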

Each allelic manifold described in the main text has five distinct regimes. These arise from overlaps between the three regimes of $t_-$ and the three regimes of $t_+$. Specifically, the five regimes of the allelic manifold for repression by occlusion, which are listed in Figure 1D, arise from the overlaps between the three regimes for $t_-$ and the three regimes for $t_+^{\mathrm{occ}}$. These overlaps are indicated in Appendix 4—figure 1E. Similarly, the five regimes of the allelic manifold for activation by stabilization (Figure 4D) arise from the overlaps between the regimes of $t_-$ and $t_+^{\mathrm{stab}}$, illustrated in Appendix 4—figure 1F, while the regimes of the manifold for activation by acceleration (Figure 7D) arise from overlaps between the regimes of $t_-$ and $t_+^{\mathrm{acc}}$, illustrated in Appendix 4—figure 1G.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Justin B Kinney, Email: jkinney@cshl.edu.

Richard A Neher, University of Basel, Switzerland.

Naama Barkai, Weizmann Institute of Science, Israel.

Funding Information

This paper was supported by the following grant:

  • National Cancer Institute 5P30CA045508 to Justin B Kinney.

Additional information

Competing interests

No competing interests declared.

Author contributions

Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing-review and editing.

Data curation, Validation, Investigation, Methodology, Writing-review and editing.

Data curation, Validation, Investigation, Methodology.

Conceptualization, Investigation, Methodology.

Supervision, Funding acquisition, Writing—review and editing.

Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing.

Additional files

Supplementary file 1. Numerical results plotted in the Figures and listed in Table 1.

Please refer to the ’overview’ sheet within this workbook for a description of each data sheet therein.

elife-40618-supp1.xlsx (2.3MB, xlsx)
DOI: 10.7554/eLife.40618.012
Transparent reporting form
DOI: 10.7554/eLife.40618.013

Data availability

All data used to make the figures are available in Supplementary file 1. The PSAM for RNAP, previously published by Kinney et al. (2010), is also provided in Supplementary file 1 (with permission). Raw data, processed data, and analysis scripts are also available at https://github.com/jbkinney/17_inducibility (copy archived at https://github.com/elifesciences-publications/17_inducibility). No datasets have been deposited in public databases as part of this work.

References

  1. Ackers GK, Johnson AD, Shea MA. Quantitative model for gene regulation by lambda phage repressor. PNAS. 1982;79:1129–1133. doi: 10.1073/pnas.79.4.1129.
  2. Adhya S. Regulation of Gene Expression in Escherichia coli. Switzerland: Springer Nature; 1996. The lac and gal operons today; pp. 181–200.
  3. Beckwith J, Grodzicker T, Arditti R. Evidence for two sites in the lac promoter region. Journal of Molecular Biology. 1972;69:155–160. doi: 10.1016/0022-2836(72)90031-9.
  4. Belliveau NM, Barnes SL, Ireland WT, Jones DL, Sweredoski MJ, Moradian A, Hess S, Kinney JB, Phillips R. Systematic approach for dissecting the molecular mechanisms of transcriptional regulation in bacteria. PNAS. 2018;115:E4796–E4805. doi: 10.1073/pnas.1722055115.
  5. Bintu L, Buchler NE, Garcia HG, Gerland U, Hwa T, Kondev J, Phillips R. Transcriptional regulation by the numbers: models. Current Opinion in Genetics & Development. 2005;15:116–124. doi: 10.1016/j.gde.2005.02.007.
  6. Brewster RC, Jones DL, Phillips R. Tuning promoter strength through RNA polymerase binding site design in Escherichia coli. PLOS Computational Biology. 2012;8:e1002811. doi: 10.1371/journal.pcbi.1002811.
  7. Brewster RC, Weinert FM, Garcia HG, Song D, Rydenfelt M, Phillips R. The transcription factor titration effect dictates level of gene expression. Cell. 2014;156:1312–1323. doi: 10.1016/j.cell.2014.02.022.
  8. Browning DF, Busby SJ. Local and global regulation of transcription initiation in bacteria. Nature Reviews Microbiology. 2016;14:638–650. doi: 10.1038/nrmicro.2016.103.
  9. Busby S, Ebright RH. Transcription activation by catabolite activator protein (CAP). Journal of Molecular Biology. 1999;293:199–213. doi: 10.1006/jmbi.1999.3161.
  10. Courey AJ. Mechanisms in Transcriptional Regulation. Malden, MA: Blackwell; 2008.
  11. Cui L, Murchland I, Shearwin KE, Dodd IB. Enhancer-like long-range transcriptional activation by λ CI-mediated DNA looping. PNAS. 2013;110:2922–2927. doi: 10.1073/pnas.1221322110.
  12. Ebright RH, Ebright YW, Gunasekera A. Consensus DNA site for the Escherichia coli catabolite gene activator protein (CAP): CAP exhibits a 450-fold higher affinity for the consensus DNA site than for the E. coli lac DNA site. Nucleic Acids Research. 1989;17:10295–10305. doi: 10.1093/nar/17.24.10295.
  13. Einav T, Duque J, Phillips R. Theoretical analysis of inducer and operator binding for cyclic-AMP receptor protein mutants. PLOS ONE. 2018;13:e0204275. doi: 10.1371/journal.pone.0204275.
  14. Foat BC, Morozov AV, Bussemaker HJ. Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE. Bioinformatics. 2006;22:e141–e149. doi: 10.1093/bioinformatics/btl223.
  15. Forcier TL, Kinney JB. Supplemental Code for Forcier et al., 2018. GitHub. 2018. 602ca57. https://github.com/jbkinney/17_inducibility
  16. Garcia HG, Phillips R. Quantitative dissection of the simple repression input-output function. PNAS. 2011;108:12173–12178. doi: 10.1073/pnas.1015616108.
  17. Gaston K, Bell A, Kolb A, Buc H, Busby S. Stringent spacing requirements for transcription activation by CRP. Cell. 1990;62:733–743. doi: 10.1016/0092-8674(90)90118-X.
  18. Gertz J, Siggia ED, Cohen BA. Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature. 2009;457:215–218. doi: 10.1038/nature07521.
  19. Gunasekera A, Ebright YW, Ebright RH. DNA sequence determinants for binding of the Escherichia coli catabolite gene activator protein. The Journal of Biological Chemistry. 1992;267:14713–14720.
  20. Keseler IM, Collado-Vides J, Santos-Zavaleta A, Peralta-Gil M, Gama-Castro S, Muniz-Rascado L, Bonavides-Martinez C, Paley S, Krummenacker M, Altman T, Kaipa P, Spaulding A, Pacheco J, Latendresse M, Fulcher C, Sarker M, Shearer AG, Mackie A, Paulsen I, Gunsalus RP, Karp PD. EcoCyc: a comprehensive database of Escherichia coli biology. Nucleic Acids Research. 2011;39:D583–D590. doi: 10.1093/nar/gkq1143.
  21. Kinney JB, Murugan A, Callan CG, Cox EC. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. PNAS. 2010;107:9158–9163. doi: 10.1073/pnas.1004290107.
  22. Kuhlman T, Zhang Z, Saier MH, Hwa T. Combinatorial transcriptional control of the lactose operon of Escherichia coli. PNAS. 2007;104:6043–6048. doi: 10.1073/pnas.0606717104.
  23. Kwasnieski JC, Mogno I, Myers CA, Corbo JC, Cohen BA. Complex effects of nucleotide variants in a mammalian cis-regulatory element. PNAS. 2012;109:19498–19503. doi: 10.1073/pnas.1210678109.
  24. Lederberg J. The beta-d-galactosidase of Escherichia coli, strain K-12. Journal of Bacteriology. 1950;60:381–392. doi: 10.1128/jb.60.4.381-392.1950.
  25. Lee DJ, Minchin SD, Busby SJ. Activating transcription in bacteria. Annual Review of Microbiology. 2012;66:125–152. doi: 10.1146/annurev-micro-092611-150012.
  26. Levo M, Segal E. In pursuit of design principles of regulatory sequences. Nature Reviews Genetics. 2014;15:453–468. doi: 10.1038/nrg3684.
  27. Malan TP, Kolb A, Buc H, McClure WR. Mechanism of CRP-cAMP activation of lac operon transcription initiation activation of the P1 promoter. Journal of Molecular Biology. 1984;180:881–909. doi: 10.1016/0022-2836(84)90262-6.
  28. Markovitch O, Agmon N. Structure and energetics of the hydronium hydration shells. The Journal of Physical Chemistry A. 2007;111:2253–2256. doi: 10.1021/jp068960g.
  29. McClure WR, Hawley DK, Youderian P, Susskind MM. DNA determinants of promoter selectivity in Escherichia coli. Cold Spring Harbor Symposia on Quantitative Biology. 1983;47:477–481. doi: 10.1101/SQB.1983.047.01.057.
  30. Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L, Rogov P, Feizi S, Gnirke A, Callan CG, Kinney JB, Kellis M, Lander ES, Mikkelsen TS. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nature Biotechnology. 2012;30:271–277. doi: 10.1038/nbt.2137.
  31. Miller J. Experiments in Molecular Genetics. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press; 1972.
  32. Mogno I, Kwasnieski JC, Cohen BA. Massively parallel synthetic promoter assays reveal the in vivo effects of binding site variants. Genome Research. 2013;23:1908–1915. doi: 10.1101/gr.157891.113.
  33. Morita T, Shigesada K, Kimizuka F, Aiba H. Regulatory effect of a synthetic CRP recognition sequence placed downstream of a promoter. Nucleic Acids Research. 1988;16:7315–7332. doi: 10.1093/nar/16.15.7315.
  34. Neidhardt FC, Bloch PL, Smith DF. Culture medium for enterobacteria. Journal of Bacteriology. 1974;119:736–747. doi: 10.1128/jb.119.3.736-747.1974.
  35. Niu W, Kim Y, Tau G, Heyduk T, Ebright RH. Transcription activation at class II CAP-dependent promoters: two interactions between CAP and RNA polymerase. Cell. 1996;87:1123–1134. doi: 10.1016/S0092-8674(00)81806-1.
  36. Parkinson G, Wilson C, Gunasekera A, Ebright YW, Ebright RH, Ebright RE, Berman HM. Structure of the CAP-DNA complex at 2.5 angstroms resolution: a complete picture of the protein-DNA interface. Journal of Molecular Biology. 1996;260:395–408. doi: 10.1006/jmbi.1996.0409.
  37. Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, Lee C, Andrie JM, Lee SI, Cooper GM, Ahituv N, Pennacchio LA, Shendure J. Massively parallel functional dissection of mammalian enhancers in vivo. Nature Biotechnology. 2012;30:265–270. doi: 10.1038/nbt.2136.
  38. Pribnow D. Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter. PNAS. 1975;72:784–788. doi: 10.1073/pnas.72.3.784.
  39. Ptashne M, Gann A. Genes and Signals. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press; 2002.
  40. Ptashne M. Regulated recruitment and cooperativity in the design of biological regulatory systems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2003;361:1223–1234. doi: 10.1098/rsta.2003.1195.
  41. Reznikoff WS. The lactose operon-controlling elements: a complex paradigm. Molecular Microbiology. 1992;6:2419–2422. doi: 10.1111/j.1365-2958.1992.tb01416.x.
  42. Rhodius VA, West DM, Webster CL, Busby SJ, Savery NJ. Transcription activation at class II CRP-dependent promoters: the role of different activating regions. Nucleic Acids Research. 1997;25:326–332. doi: 10.1093/nar/25.2.326.
  43. Roy S, Garges S, Adhya S. Activation and repression of transcription by differential contact: two sides of a coin. Journal of Biological Chemistry. 1998;273:14059–14062. doi: 10.1074/jbc.273.23.14059.
  44. Salgado H, Peralta-Gil M, Gama-Castro S, Santos-Zavaleta A, Muñiz-Rascado L, García-Sotelo JS, Weiss V, Solano-Lira H, Martínez-Flores I, Medina-Rivera A, Salgado-Osorio G, Alquicira-Hernández S, Alquicira-Hernández K, López-Fuentes A, Porrón-Sotelo L, Huerta AM, Bonavides-Martínez C, Balderas-Martínez YI, Pannier L, Olvera M, Labastida A, Jiménez-Jacinto V, Vega-Alvarado L, Del Moral-Chávez V, Hernández-Alvarez A, Morett E, Collado-Vides J. RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more. Nucleic Acids Research. 2013;41:D203–D213. doi: 10.1093/nar/gks1201.
  45. Schmidt A, Kochanowski K, Vedelaar S, Ahrné E, Volkmer B, Callipo L, Knoops K, Bauer M, Aebersold R, Heinemann M. The quantitative and condition-dependent Escherichia coli proteome. Nature Biotechnology. 2016;34:104–110. doi: 10.1038/nbt.3418.
  46. Sharon E, Kalma Y, Sharp A, Raveh-Sadka T, Levo M, Zeevi D, Keren L, Yakhini Z, Weinberger A, Segal E. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nature Biotechnology. 2012;30:521–530. doi: 10.1038/nbt.2205.
  47. Shea MA, Ackers GK. The OR control system of bacteriophage lambda. A physical-chemical model for gene regulation. Journal of Molecular Biology. 1985;181:211–230. doi: 10.1016/0022-2836(85)90086-5.
  48. Sherman MS, Cohen BA. Thermodynamic state ensemble models of cis-regulation. PLOS Computational Biology. 2012;8:e1002407. doi: 10.1371/journal.pcbi.1002407.
  49. Smith RP, Taher L, Patwardhan RP, Kim MJ, Inoue F, Shendure J, Ovcharenko I, Ahituv N. Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model. Nature Genetics. 2013;45:1021–1028. doi: 10.1038/ng.2713.
  50. So LH, Ghosh A, Zong C, Sepúlveda LA, Segev R, Golding I. General properties of transcriptional time series in Escherichia coli. Nature Genetics. 2011;43:554–560. doi: 10.1038/ng.821.
  51. Spitz F, Furlong EE. Transcription factors: from enhancer binding to developmental control. Nature Reviews Genetics. 2012;13:613–626. doi: 10.1038/nrg3207.
  52. Thompson MG, Sedaghatian N, Barajas JF, Wehrs M, Bailey CB, Kaplan N, Hillson NJ, Mukhopadhyay A, Keasling JD. Isolation and characterization of novel mutations in the pSC101 origin that increase copy number. Scientific Reports. 2018;8:1590. doi: 10.1038/s41598-018-20016-w.
  53. Ushida C, Aiba H. Helical phase dependent action of CRP: effect of the distance between the CRP site and the -35 region on promoter activity. Nucleic Acids Research. 1990;18:6325–6330. doi: 10.1093/nar/18.21.6325.
  54. Vilar JM, Leibler S. DNA looping and physical constraints on transcription regulation. Journal of Molecular Biology. 2003;331:981–989. doi: 10.1016/S0022-2836(03)00764-2.
  55. Weingarten-Gabbay S, Nir R, Lubliner S, Sharon E, Kalma Y, Weinberger A, Segal E. Deciphering transcriptional regulation of human core promoters. bioRxiv. 2017. doi: 10.1101/174904.
  56. White MA, Kwasnieski JC, Myers CA, Shen SQ, Corbo JC, Cohen BA. A simple grammar defines activating and repressing cis-regulatory elements in photoreceptors. Cell Reports. 2016;17:1247–1254. doi: 10.1016/j.celrep.2016.09.066.
  57. Wong MS, Kinney JB, Krainer AR. Quantitative activity profile and context dependence of all human 5' splice sites. Molecular Cell. 2018;71:1012–1026. doi: 10.1016/j.molcel.2018.07.033.

Decision letter

Editor: Richard A Neher

In the interests of transparency, eLife includes the editorial decision letter, peer reviews, and accompanying author responses.

[Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that minor issues remain unresolved.]

Thank you for submitting your article "Precision measurement of cis-regulatory energetics in living cells" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Richard A Neher as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Naama Barkai as the Senior Editor. The following individual involved in review of your submission has also agreed to reveal their identity: Ulrich Gerland (Reviewer #3). Reviewer #2 remains anonymous.

The Reviewing Editor has highlighted the concerns that require revision and/or responses, and we have included the separate reviews below for your consideration. If you have any questions, please do not hesitate to contact us.

Forcier et al. present a method to quantitatively estimate parameters of transcriptional regulation in vivo. The method is based on a phenomenological model of regulation that accounts for DNA binding of transcription factors and the RNA polymerase as well as interactions between them. The parameters of these models are estimated by comparing transcription for a range of promoter sequences in the presence and absence of the regulator. All reviewers agreed that the authors have devised an innovative and original way to quantify crucial parameters determining the fundamentals of bacterial gene regulation. However, the reviews and the ensuing discussion brought up a number of concerns that the authors should address.

1) How do the inferred parameters depend on growth rates and physiological state of the cells? Given that substantial contributions to the inferred free energies are entropic, changing concentrations of the interacting partners will affect the estimated energies. Comparing parameter inferences at different growth conditions would illuminate the nature of the measured free energies and make the precision of the measurements more interpretable. Repeating the measurements in different growth media would be one way to explore this effect.

2) The outlier classification and removal seem rather arbitrary. While many biophysical aspects change when promoter sequences are exchanged and these factors are difficult to include in a quantitative model, a more thorough discussion of why outliers might arise and how they can be distinguished from data that putatively conforms with the model is necessary.

3) What alternative scenarios might explain the failure of the model when CRP sits at -40.5 (see reviews below)? How do you meaningfully distinguish a 'failure to collapse' from 'more outliers'?

Separate reviews (please respond to each point):

Reviewer #1:

Forcier et al. present a method to quantitatively estimate parameters of transcriptional regulation in vivo. The method is based on a phenomenological model of regulation that accounts for DNA binding of transcription factors (TF) and the RNA polymerase (RNAP) as well as interactions between TF-RNAP and potential accelerated initiation by TF-RNAP interactions. The parameters of these models are estimated by comparing transcription in the presence and absence of the TF for a variety of the promoter sequences. If a particular scheme of regulation is valid, the two sets of transcription rates are expected to follow a path in the 2d plane and parameters can be estimated from the shape of this path. Failure to collapse indicates a model misspecification.

The nature of the method cancels out/avoids many pitfalls and inaccuracies that arise when fitting more complex explicit models to transcription data. The resulting consistent in vivo measurements are an important step at understanding the energetics of the simplest and most fundamental regulatory systems.

Overall, I feel this is a solid piece of work with consistent results obtained by a clever and original method. The authors discuss ways in which this method could be scaled up to high throughput assays, but this part of the manuscript remains very vague.

The bulk of the DNA-TF binding free energy is claimed to be of entropic nature favoring the unbound state, while RNAP-TF interaction energies are estimated to be much larger than previously thought. To make the manuscript stronger, I would like to see the experiments being performed at different concentrations of the TF (CRP), i.e. vary F via [TF] and not only P.

Minor Comments:

Figure 1C and D: Some of the regimes and their relation to the figure are confusing. All seems correct, but some approximations and definitions are not what I initially thought they were.

a) Why not combine regimes 1 and 2 into t_- = t_bg + Pt_sat and t_+ = t_bg + Pt_sat/(1+F)?

b) Regime 3: if you don't realize that the figure is meant on a logscale (it is marked, I know, but took me a while to realize). Might be better to mark the diagonal lines as t_-=t_+ and t_- = t_+*(1+F) or similar.

Subsection “Strategy for measuring TF-RNAP interactions in vivo”, second paragraph: cooperatively -> cooperativity

Reviewer #2:

The authors carefully quantify binding of the TF crp by changing the affinity of the RNAP binding sites. Experimental measurements of transcription rates are used to infer binding by parameterizing a biophysical model.

The premise of the paper is very interesting and creative. Because of this, I think it is worth investing more energy in making sure the method is made clear – although I am not sure how to do so. The manuscript was also very substantial, and at times difficult to get through. With some clarifications, I think it is worth publishing.

Major comments:

The model and theory.

With the caveat that I do not have strong theoretical expertise:

The use of "manifold" seems to complicate the matter in the context of this paper. As I understand it, the authors are fitting points to a (nonlinear, geometrically complex) model, and looking for deviations from the various regimes of the model. I expect that both t+ and t- are always monotonically increasing functions of P (as they are modelled). A manifold approach would be required if (for example) t+ increased and then decreased as a function of P, but such a case does not appear in the paper, and is not trivial to conceive. Perhaps the authors could explain when/why a manifold approach is necessary.

The classification of points as "outliers" is arbitrary (e.g. Figure 4C for the site at -66.5). A more objective approach could be taken. For example, changes to fitting method could mitigate this. The current loss function minimizes least squares on the log values. I think this is equivalent to assuming error is log normal and minimizing. Could this assumption be relaxed to assume log normal error for most points plus a fraction of points ("outliers") that are drawn from a uniform distribution with some reasonable limits?

Bootstrap: is this necessary? Can the authors infer some confidence interval using the loss function? The most important implication is that the bootstrap procedure may overestimate the precision of their measurements.

Experiments.

I am not sure whether 250uM cAMP is enough to guarantee full occupation of crp in glucose?

Could the authors provide some data for a small number of RNAP binding variants?

It also might be informative to have cAMP induction/repression curves for a few RNAP binding variants.

I'm surprised the authors opted for Miller assays over GFP cytometry (and microscopy). They mention single cell data from another paper, but do not provide any, nor any live cell data. There is information that could be gleaned from single cell data that is relevant. For example, the variance in txn among cells could corroborate the mean txn rate they observe and use to infer binding constants. GFP assays would also provide a corroborative measure of mean expression for the Miller assays.

Can the authors discuss briefly the question of the validity of using plasmid-based assays (in fact I think these are better than chromosomal-based assays for this experiment).

Subsection “Surprises in class II regulation”: The authors frame their result as "When measuring an expression manifold at this position, however, we obtained a scatter of 2D points that did not collapse to any discernible 1D expression manifold (Figure 7D)." I am not convinced. This is a bolder statement than the rest of the paper and requires a bit more evidence to assert. Another interpretation is that for reasons not established, there was more noise (or more outliers) in this experiment than in others.

Minor Comments:

Subsection “Precision measurement of in vivo CRP-DNA binding”: dyadic is an obscure term

Reviewer #3:

General assessment:

The authors devise a scheme to systematically determine the effective in vivo interactions between RNA polymerase, transcription factors, and their respective DNA binding sites. The scheme is based on quantitative gene expression measurements from a large number of promoter constructs, which are interpreted using models for competitive and/or cooperative protein-DNA binding. The authors illustrate the power of this approach with a study of cis-regulatory transcription control by the transcription factor CRP in E. coli.

The general question attacked by this study is clearly important: How can we characterize the in vivo interactions between DNA-bound proteins that determine transcription rates? The authors make significant progress on this question with their proof-of-principle study of CRP-mediated transcription control, since the underlying approach can readily be transferred and extended to other cases of cis-regulatory transcription control.

Comments and questions to the authors:

1) The authors stress the importance of having in vivo rather than in vitro interaction parameters, and the precision with which they determine these interactions. It is indeed nice to see how well the data collapses, and the quality of the fits is convincing. However, given these encouraging results, I find it important to assess the limitations of both the concept and the precision more broadly. In particular, are the in vivo interaction parameters fixed numbers for a given E. coli strain, or do they depend on the state of the cells? All of the experiments were done with the same growth rate and conditions. The effective strength of CRP binding to its consensus DNA site was found to be -2.1 kcal/mol with 0.1 kcal/mol precision under these conditions, but does this parameter change when the cells are put under conditions of e.g. slow growth? The same question applies to the CRP-RNAP interaction. If these parameters do change with the state of the cell, how do the changes compare to the 0.1 kcal/mol precision? This question is crucial to appreciate the significance of the numbers obtained – will they need to be remeasured under every condition or can they be measured just once and then applied to a broad range of conditions?

2) The analysis of CRP regulation from the -40.5 bp site provides an interesting example of a case where the model fails. However, at this point more insight might be gained by considering alternative biophysical models. For instance, could it be that β now depends on the -10 sequence of the promoter? Or could CRP bound in this position generate a situation of "frustrated binding" for RNAP, i.e., it can simultaneously contact the -10 and the -35 region of the promoter when CRP is absent, whereas in the presence of CRP it could only make either the -10 contact or the contact with CRP/-35, and would choose the better one? Perhaps these scenarios are also ruled out by the absence of data collapse – can the authors specify which types of scenarios are ruled out and which are still possible?

3) I think the authors should discuss more clearly which difficulties will need to be overcome when their approach is extended to regulation via more than one TF-binding site. In particular, it seems that determining pairwise interactions may not be enough, since the interaction strength between proteins A and B can depend on whether protein C is bound or not (i.e., 3-body interactions). This can significantly complicate the analysis. How will the approach be generalized – 3-dimensional plots with data collapse onto 2D surfaces? Personally, I think the best hope is that bottom-up approaches like this one will be complemented with top-down approaches like the one described in Hillenbrand et al., eLife (2016).

Minor Comments:

– Abstract: in my mind, RNAP is not a TF

– Results section, third paragraph: the conversion from kT to kcal/mol is wrong

– “Our result indicates that, in living cells, this Gibbs free energy is almost entirely canceled by the entropic cost of removing a CRP molecule from the cytoplasmic environment”: can the authors provide a back-of-the envelope estimate to interpret this conclusion – is this approximate cancellation reasonable/expected?

– Second paragraph of subsection “Strategy for measuring TF-RNAP interactions in vivo”: "cooperatively factor" α -> cooperativity α?

eLife. 2018 Dec 20;7:e40618. doi: 10.7554/eLife.40618.023

Author response


Responses to editorial critiques:

We thank the reviewers and editors for their thoughtful assessment of our manuscript. We have substantially revised this manuscript to address these critiques, as well as to further improve its clarity. Here is a summary of the major changes we have made.

1) We have changed the term “expression manifold” to “allelic manifold”, as we believe our approach is more accurately seen as an extension of the classical genetics concept of allelic series. We have also changed the title of our paper to emphasize the concept of allelic manifolds, and we have added a paragraph to the Introduction aimed at explicitly introducing this concept.

2) We have reduced the length of the Introduction, and we have divided the Discussion into three discrete sections: “Summary”, “Limitations and caveats”, and “Outlook”. We have also organized the Results section into three “Parts”, each of which is sectioned into “Strategy”, “Demonstration”, and “Aside”. We believe these changes will assist the reader in navigating our manuscript.

3) Technical information has been further compartmentalized and expanded in the appendices. These appendices now include a more complete discussion of our model inference methods, a derivation of the five regimes of each allelic manifold, and an added explanation of how t+ and t- measurements were calibrated relative to each other.

Please find below our responses to each specific critique.

1) How do the inferred parameters depend on growth rates and physiological state of the cells? Given that substantial contributions to the inferred free energies are entropic, changing concentrations of the interacting partners will affect the estimated energies. Comparing parameter inferences at different growth conditions would illuminate the nature of the measured free energies and make the precision of the measurements more interpretable. Repeating the measurements in different growth media would be one way to explore this effect.

To address this critique, we assayed occluding promoters using different concentrations of cAMP in the growth media. Our data (in the new Figure 3) suggest that the CRP-DNA binding factor F, and thus the in vivo concentration of active CRP, roughly follows a nontrivial power of cAMP concentration. We note that this nontrivial power-law dependence might result from cooperativity in cAMP-CRP binding, but it might also result from the dynamics of cAMP import and export from cells. Either way, these new data illustrate how allelic manifolds can be used to quantify changes in the in vivo concentrations of TFs.
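
As a rough illustration of what "follows a nontrivial power of cAMP concentration" means in practice, the sketch below fits F = A * [cAMP]^n by a straight-line fit in log-log space. The function name `fit_power_law`, the concentration grid, and the values of A and n are all hypothetical placeholders chosen for this example; they are not the measurements or the analysis code behind Figure 3.

```python
import math

def fit_power_law(concentrations, binding_factors):
    """Fit F = A * c**n by ordinary least squares on (log c, log F).

    Returns (A, n). Both inputs must be positive and of equal length.
    """
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(f) for f in binding_factors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx                      # slope of the log-log line
    prefactor = math.exp(mean_y - exponent * mean_x)
    return prefactor, exponent

# Synthetic, noise-free placeholder values (NOT measurements from the paper).
camp_uM = [10.0, 25.0, 50.0, 125.0, 250.0]
TRUE_PREFACTOR, TRUE_EXPONENT = 0.01, 1.4
F_synthetic = [TRUE_PREFACTOR * c ** TRUE_EXPONENT for c in camp_uM]

prefactor, exponent = fit_power_law(camp_uM, F_synthetic)
print(f"fitted exponent n = {exponent:.3f}, prefactor A = {prefactor:.4f}")
```

An exponent different from 1 is what "nontrivial" refers to here: a slope of exactly 1 on the log-log plot would mean F is simply proportional to the cAMP concentration.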

Our low-throughput experimental setup, however, makes it difficult to repeat our experiments in entirely different growth conditions. Specifically, the E. coli strain JK10 required by our experiments will not grow in minimal media in the absence of cAMP, thus precluding measurements in this common growth condition. This illustrates how using a small molecule effector to control TF concentration can be a major limitation. Future work using MPRAs will be able to overcome this hurdle, enabling the measurement of such biophysical parameters in a wide variety of growth conditions. These points are addressed in the revised Discussion.

2) The outlier classification and removal seem rather arbitrary. While many biophysical aspects change when promoter sequences are exchanged and these factors are difficult to include in a quantitative model, a more thorough discussion of why outliers might arise and how they can be distinguished from data that putatively conforms with the model is necessary.

We have clarified the nature of outliers in the revised Discussion. Briefly, because we are introducing random mutations into a promoter sequence, there is a possibility of shifting the RNAP binding site or introducing new binding sites for other TFs. We suspect that this is the primary cause of outliers, which are operationally defined as data points that deviate greatly from the proposed biophysical model.

We decided against implementing a specific mathematical criterion for calling outliers. The suggestion of Referee #2, that we perform Bayesian inference with a spike-and-slab prior, is reasonable. But introducing such a model would add a substantial complication to an already lengthy manuscript while being unlikely to substantively change our results. We note that readers can readily judge whether our outlier designations are reasonable, since all of our data is plotted and shown relative to the fitted manifolds. We expect, however, that an automated method for calling outliers, like that proposed by Referee #2, will be important in future MPRA-based studies.

3) What alternative scenarios might explain the failure of the model when CRP sits at -40.5 (see reviews below). How do you meaningfully distinguish a 'failure to collapse' from 'more outliers'?

We have clarified our view on this matter in the Discussion. Briefly, in the -40.5 bp architecture, changing the -10 element of the RNAP binding site sequence appears to control a biophysical parameter in addition to RNAP-DNA binding affinity. We suspect that this additional parameter is the strength of the CRP-RNAP interaction; this makes sense structurally, but we do not have additional evidence in support of this hypothesis.

More generally, outliers reflect promoters that substantially deviate from the predictions of the proposed biophysical model. If most promoters in an allelic series are outliers, it means that the proposed biophysical model is of little use and that one should consider an alternative model. There isn’t any fundamental difference between ‘failure to collapse’ and ‘more outliers’. But in all the promoters we investigated, outliers were either very rare or (for CRP at -40.5 bp) were so dominant that no convincing 1D manifold could be visually identified.

Separate reviews (please respond to each point):

Reviewer #1:

Forcier et al. present a method to quantitatively estimate parameters of transcriptional regulation in vivo. The method is based on a phenomenological model of regulation that accounts for DNA binding of transcription factors (TF) and the RNA polymerase (RNAP) as well as interactions between TF-RNAP and potential accelerated initiation by TF-RNAP interactions. The parameters of these models are estimated by comparing transcription in the presence and absence of the TF for a variety of the promoter sequences. If a particular scheme of regulation is valid, the two sets of transcription rates are expected to follow a path in the 2d plane and parameters can be estimated from the shape of this path. Failure to collapse indicates a model misspecification.

The nature of the method cancels out/avoids many pitfalls and inaccuracies that arise when fitting more complex explicit models to transcription data. The resulting consistent in vivo measurements are an important step at understanding the energetics of the simplest and most fundamental regulatory systems.

Overall, I feel this is a solid piece of work with consistent results obtained by a clever and original method. The authors discuss ways in which this method could be scaled up to high throughput assays, but this part of the manuscript remains very vague.

The revised Discussion section better explains how this assay might be scaled up using MPRAs.

The bulk of the DNA-TF binding free energy is claimed to be of entropic nature favoring the unbound state, while RNAP-TF interaction energies are estimated to be much larger than previously thought. To make the manuscript stronger, I would like to see the experiments being performed at different concentrations of the TF (CRP), i.e. vary F via [TF] and not only P.

The new Figure 3 shows expression manifolds measured for multiple cAMP concentrations. These data show that F, and thus the concentration of active CRP in cells, varies as a nontrivial power law of cAMP concentration. We would have liked to pursue additional measurements (e.g., of RNAP-TF interactions) at variable cAMP concentrations, but our paper is already quite lengthy and we do not believe this is essential to support our primary conclusions.

Minor Comments:

Figure 1C and D: Some of the regimes and their relation to the figure are confusing. All seems correct, but some approximations and definitions are not what I initially thought they were.

a) Why not combine regimes 1 and 2 into t_- = t_bg + Pt_sat and t_+ = t_bg + Pt_sat/(1+F).

Appendix 4 of the revised manuscript provides an explicit derivation of the 5 regimes of each allelic manifold. This should address the referee’s question.
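
To make the relationship between the referee's combined expression and the full manifold concrete, here is a minimal sketch. It assumes a simple occlusion model in which CRP and RNAP binding are mutually exclusive, so that t_- = t_bg + t_sat*P/(1+P) and t_+ = t_bg + t_sat*P/(1+F+P); this functional form, the function names, and the parameter values are assumptions made for illustration and should be checked against Appendix 4.

```python
import math

def occlusion_manifold(P, F, t_sat, t_bg):
    """Return (t_minus, t_plus) for RNAP binding factor P.

    Assumed model: CRP (binding factor F) and RNAP cannot bind simultaneously.
    """
    t_minus = t_bg + t_sat * P / (1.0 + P)      # no active CRP
    t_plus = t_bg + t_sat * P / (1.0 + F + P)   # active CRP competes with RNAP
    return t_minus, t_plus

def weak_promoter_limit(P, F, t_sat, t_bg):
    """Referee's combined expression, valid only when P << 1."""
    return t_bg + P * t_sat, t_bg + P * t_sat / (1.0 + F)

# Hypothetical parameter values, chosen only to span all regimes.
F, t_sat, t_bg = 20.0, 10.0, 1e-3

for P in (1e-4, 1e-2, 1.0, 1e2, 1e4):
    exact = occlusion_manifold(P, F, t_sat, t_bg)
    approx = weak_promoter_limit(P, F, t_sat, t_bg)
    print(f"P={P:8.0e}  exact=({exact[0]:.4g}, {exact[1]:.4g})  "
          f"weak-P approx=({approx[0]:.4g}, {approx[1]:.4g})")
```

Under these assumptions the combined expression tracks the exact curve while P is small and fails once P approaches 1, where t_- begins to saturate; at very large P both t_- and t_+ approach t_sat + t_bg.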

b) Regime 3: this is confusing if you don't realize that the figure is drawn on a log scale (it is marked, I know, but it took me a while to realize). It might be better to mark the diagonal lines as t_-=t_+ and t_- = t_+*(1+F) or similar.

We have implemented this suggestion in the new Figures 1, 4, and 7.

Subsection “Strategy for measuring TF-RNAP interactions in vivo”, second paragraph: cooperatively -> cooperativity

Fixed.

Reviewer #2:

The authors carefully quantify binding of the TF crp by changing the affinity of the RNAP binding sites. Experimental measurements of transcription rates are used to infer binding by parameterizing a biophysical model.

The premise of the paper is very interesting and creative. Because of this, I think it is worth investing more energy in making sure the method is made clear – although I am not sure how to do so. The manuscript was also very substantial, and at times difficult to get through. With some clarifications, I think it is worth publishing.

We have made substantial revisions to this manuscript to improve clarity and readability. In particular, we have renamed “expression manifold” to “allelic manifold”, a concept that is now directly addressed in the Introduction. We have also sectioned the Results and Discussion sections in a way that should help better guide the reader.

Major comments:

The model and theory.

With the caveat that I do not have strong theoretical expertise:

The use of "manifold" seems to complicate the matter in the context of this paper. As I understand it, the authors are fitting points to a (nonlinear, geometrically complex) model, and looking for deviations from the various regimes of the model. I expect that both t+ and t- are always monotonically increasing functions of P (as they are modelled). A manifold approach would be required if (for example) t+ increased and then decreased as a function of P, but such a case does not appear in the paper, and is not trivial to conceive. Perhaps the authors could explain when/why a manifold approach is necessary.

We have revised the Introduction to better motivate our adoption of the term “manifold”. The revised Discussion also points out that multi-dimensional generalizations of this concept might be appropriate in some situations, e.g., for promoters that contain multiple TF binding sites. It should be noted that expression from some bacterial promoters has been shown to decrease when the RNAP binding site is strengthened, due to “trapping” of RNAP by an overly high-affinity binding site, e.g. https://www.ncbi.nlm.nih.gov/pubmed/8006961.

The classification of points as "outliers" is arbitrary (e.g. Figure 4C for the site at -66.5). A more objective approach could be taken. For example, changes to fitting method could mitigate this. The current loss function minimizes least squares on the log values. I think this is equivalent to assuming error is log normal and minimizing. Could this assumption be relaxed to assume log normal error for most points plus a fraction of points ("outliers") that are drawn from a uniform distribution with some reasonable limits?

This is a fair suggestion, but as described above we felt that this would unnecessarily complicate the paper. Such an approach is likely to be required, however, as we transition to high-throughput experiments.
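
For readers who wish to try the referee's proposal, the following is a minimal sketch of such a mixture loss: each log-expression residual is modeled as Gaussian with probability 1 - eps and as uniform over a fixed log range with probability eps. The function names, the outlier fraction `eps`, the noise scale `sigma`, and the range limits are all hypothetical; this is not the loss function used in the paper, which is plain least squares on log values.

```python
import math

def inlier_density(y, m, sigma):
    """Gaussian density of an observed log value y around the model value m."""
    z = (y - m) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_nll(log_obs, log_pred, sigma, eps, log_lo, log_hi):
    """Negative log-likelihood: (1 - eps) * Gaussian + eps * Uniform[log_lo, log_hi]."""
    uniform = 1.0 / (log_hi - log_lo)
    return -sum(
        math.log((1.0 - eps) * inlier_density(y, m, sigma) + eps * uniform)
        for y, m in zip(log_obs, log_pred)
    )

def outlier_probability(y, m, sigma, eps, log_lo, log_hi):
    """Posterior probability that a single point came from the uniform component."""
    out = eps / (log_hi - log_lo)
    inl = (1.0 - eps) * inlier_density(y, m, sigma)
    return out / (out + inl)

# Hypothetical settings: 5% outliers, noise of 0.2 log units, 10 log units of range.
settings = dict(sigma=0.2, eps=0.05, log_lo=-10.0, log_hi=0.0)
for residual in (0.0, 0.4, 1.0):
    p = outlier_probability(residual, 0.0, **settings)
    print(f"residual = {residual:.1f} log units -> P(outlier) = {p:.4f}")
```

The design point is that the uniform component caps how much any single point can contribute to the loss, so a grossly deviant promoter cannot dominate the fit, and the per-point posterior gives an explicit criterion for calling outliers.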

Bootstrap: is this necessary? Can the authors infer some confidence interval using the loss function? The most important implication is that the bootstrap procedure may overestimate the precision of their measurements.

The actual experimental error function is somewhat unclear, and we are reluctant to take our simple Gaussian error assumption too seriously. We used bootstrap resampling primarily because it was the simplest and most transparent thing we could do that would give reasonable results. However, we do expect to use more sophisticated error modeling as we transition to MPRA-based experiments. In the meantime, we have made all of our data (both raw and processed) and analysis code available for researchers who might wish to redo this analysis.
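
The general shape of a bootstrap procedure of this kind can be sketched as follows: promoters are resampled with replacement, the estimator is re-run on each resample, and a percentile interval is read off the resulting distribution. The estimator shown here (a mean log ratio of t_- to t_+) is a deliberately simple stand-in, and the data points, function names, and settings are hypothetical; the actual analysis refits the full biophysical model on each resample.

```python
import math
import random

def bootstrap_interval(points, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for estimator(points)."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n = len(points)
    estimates = []
    for _ in range(n_boot):
        resample = [points[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    lo = estimates[int((alpha / 2.0) * n_boot)]
    hi = estimates[min(n_boot - 1, int((1.0 - alpha / 2.0) * n_boot))]
    return lo, hi

def mean_log_ratio(points):
    """Toy estimator: average of log(t_minus / t_plus) over promoters."""
    return sum(math.log(tm / tp) for tm, tp in points) / len(points)

# Hypothetical (t_minus, t_plus) pairs in arbitrary units; NOT data from the paper.
promoters = [(1.0, 0.05), (0.5, 0.022), (2.0, 0.11), (0.2, 0.0095),
             (0.8, 0.041), (1.5, 0.07), (0.3, 0.016), (0.6, 0.028)]

point_estimate = mean_log_ratio(promoters)
lo, hi = bootstrap_interval(promoters, mean_log_ratio)
print(f"estimate = {point_estimate:.3f}, 95% bootstrap interval = [{lo:.3f}, {hi:.3f}]")
```

Because the procedure only requires re-running the fit, it makes no explicit assumption about the form of the measurement error, which is the transparency argument given above.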

Experiments.

I am not sure whether 250uM cAMP is enough to guarantee full occupation of crp in glucose?

Could the authors provide some data for a small number of RNAP binding variants?

Our conclusions do not depend on CRP being fully activated by cAMP.

It also might be informative to have cAMP induction/repression curves for a few RNAP binding variants.

The new Figure 3 shows how allelic manifolds for occlusion-regulated promoters change in response to variable cAMP concentrations, thus illustrating how changes in the in vivo concentrations of TFs can be quantified.

I'm surprised the authors opted for Miller assays over GFP cytometry (and microscopy). They mention single cell data from another paper, but do not provide any, nor any live cell data. There is information that could be gleaned from single cell data that is relevant. For example, the variance in txn among cells could corroborate the mean txn rate they observe and use to infer binding constants. GFP assays would also provide a corroborative measure of mean expression for the Miller assays.

ONPG-based assays of β-galactosidase activity were introduced by Lederberg in 1950 and standardized by Miller in 1972. No other assay of gene expression has a longer track record of providing accurate quantitative measurements. Indeed, as the revised Discussion points out, this is the assay that has been used to establish the most sophisticated biophysical models of transcriptional regulation yet reported.

Can the authors discuss briefly the question of the validity of using plasmid-based assays (in fact I think these are better than chromosomal-based assays for this experiment).

Plasmid-based assays of promoter activity are standard in the bacterial transcription field, though on a quantitative level it is unclear how precisely these measurements recapitulate expression from chromosomally integrated constructs. Unfortunately, our present experimental setup does not allow us to address this question. It is worth emphasizing, though, that being able to systematically dissect cis-regulatory energetics, even just on plasmids, is a substantial advance over present capabilities. Moreover, the proof-of-principle experiments we present here are compatible with genome-integrated MPRAs that have already been developed (https://www.ncbi.nlm.nih.gov/pubmed/29388765), so it should be straightforward to address this question in future work.

Subsection “Surprises in class II regulation”: The authors frame their result as "When measuring an expression manifold at this position, however, we obtained a scatter of 2D points that did not collapse to any discernible 1D expression manifold (Figure 7D)." I am not convinced. This is a bolder statement than the rest of the paper and requires a bit more evidence to assert. Another interpretation is that for reasons not established, there was more noise (or more outliers) in this experiment than in others.

We have added error bars to this plot (now Figure 9D) to indicate the SEM estimated from replicate experiments. These show that the increased scatter is not due to increased measurement noise. The relationship between “no collapse” and “more outliers” is elaborated in the revised Discussion.

Minor Comments:

Subsection “Precision measurement of in vivo CRP-DNA binding”: dyadic is an obscure term

We have changed “dyadic” to “palindromic” in the revised text.

Reviewer #3:

[…] Comments and questions to the authors:

1) The authors stress the importance of having in vivo rather than in vitro interaction parameters, and the precision with which they determine these interactions. It is indeed nice to see how well the data collapses, and the quality of the fits is convincing. However, given these encouraging results, I find it important to assess the limitations of both the concept and the precision more broadly. In particular, are the in vivo interaction parameters fixed numbers for a given E. coli strain, or do they depend on the state of the cells? All of the experiments were done with the same growth rate and conditions. The effective strength of CRP binding to its consensus DNA site was found to be -2.1 kcal/mol with 0.1 kcal/mol precision under these conditions, but does this parameter change when the cells are put under conditions of e.g. slow growth? The same question applies to the CRP-RNAP interaction. If these parameters do change with the state of the cell, how do the changes compare to the 0.1 kcal/mol precision? This question is crucial to appreciate the significance of the numbers obtained – will they need to be remeasured under every condition or can they be measured just once and then applied to a broad range of conditions?

These are good questions. Please refer to our response to critique #1 in “Responses to editorial critiques” above.

2) The analysis of CRP regulation from the -40.5 bp site provides an interesting example of a case where the model fails. However, at this point more insight might be gained by considering alternative biophysical models. For instance, could it be that β now depends on the -10 sequence of the promoter? Or could CRP bound in this position generate a situation of "frustrated binding" for RNAP, i.e., it can simultaneously contact the -10 and the -35 region of the promoter when CRP is absent, whereas in the presence of CRP it could only make either the -10 contact or the contact with CRP/-35, and would choose the better one? Perhaps these scenarios are also ruled out by the absence of data collapse – can the authors specify which types of scenarios are ruled out and which are still possible?

This point is clarified in the revised Discussion. Briefly, the lack of collapse suggests that at least two quantitative parameters relevant for expression are changing in response to mutations to the -10 region of the RNAP binding site. We suspect that one of these additional parameters is the CRP-RNAP interaction energy, but we do not have additional evidence of this. DNA-dependence of this interaction could be due to the close proximity between CRP and RNAP when CRP binds at -40.5 bp. Measurements of this manifold embedded in higher dimensions (e.g., by using 3 cAMP concentrations) might allow us to critically assess this hypothesis, and we plan to pursue this in future work.

3) I think the authors should discuss more clearly which difficulties will need to be overcome when their approach is extended to regulation via more than one TF-binding site. In particular, it seems that determining pairwise interactions may not be enough, since the interaction strength between proteins A and B can depend on whether protein C is bound or not (i.e., 3-body interactions). This can significantly complicate the analysis. How will the approach be generalized – 3-dimensional plots with data collapse onto 2D surfaces? Personally, I think the best hope is that bottom-up approaches like this one will be complemented with top-down approaches like the one described in Hillenbrand et al., eLife (2016).

We believe the best way to address regulation by multiple molecules of a single TF will be to use MPRAs with array-synthesized oligos. Doing so will allow each individual TF binding site to be turned “on” and “off” independently without any changes to growth conditions. The use of higher-dimensional allelic manifolds might also be useful for this purpose. We expand on this point in the revised Discussion.

Minor Comments:

– Abstract: in my mind, RNAP is not a TF

Any protein that regulates transcription by binding DNA qualifies as a TF. In fact, in eukaryotes it is quite common to refer to components of the RNA polymerase as TFs (e.g., TFIID).

– Results section, third paragraph: the conversion from kT to kcal/mol is wrong

Thank you for catching this inversion! This was a typo and has been fixed; it does not reflect any errors in our reported results.
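
For reference, the direction of this conversion can be checked with a few lines of arithmetic. The sketch below assumes a temperature of 37 °C and uses the molar gas constant expressed in kcal/(mol·K); the function names are hypothetical, and the relation ΔG = -k_B T ln F between a binding factor and its free energy is stated here as an assumption about the convention in use.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3   # molar gas constant in kcal/(mol*K)
T_KELVIN = 310.15                # assumed temperature: 37 degrees C

KT_IN_KCAL_PER_MOL = R_KCAL_PER_MOL_K * T_KELVIN   # energy of 1 k_B*T per mole

def kcal_per_mol_to_kT(energy_kcal):
    """Convert an energy in kcal/mol to units of k_B*T."""
    return energy_kcal / KT_IN_KCAL_PER_MOL

def kT_to_kcal_per_mol(energy_kT):
    """Convert an energy in units of k_B*T to kcal/mol."""
    return energy_kT * KT_IN_KCAL_PER_MOL

def free_energy_from_binding_factor(F):
    """Assumed convention: Delta G = -k_B*T * ln(F), returned in kcal/mol."""
    return -KT_IN_KCAL_PER_MOL * math.log(F)

print(f"1 k_B*T    = {KT_IN_KCAL_PER_MOL:.3f} kcal/mol")
print(f"1 kcal/mol = {kcal_per_mol_to_kT(1.0):.3f} k_B*T")
```

The two factors are reciprocals of each other, which is why they are easy to swap: at this temperature one kcal/mol is larger than one k_B T, not smaller.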

– “Our result indicates that, in living cells, this Gibbs free energy is almost entirely canceled by the entropic cost of removing a CRP molecule from the cytoplasmic environment”: can the authors provide a back-of-the envelope estimate to interpret this conclusion – is this approximate cancellation reasonable/expected?

Upon a closer reading of the source reference, we realized that the previously quoted value of -15 kcal/mol does not represent what we thought it did. We have removed this cited quantity. In its place we have added a back-of-the-envelope estimate of what the binding factor F should be based on in vitro measurements of CRP affinity and estimated aqueous CRP concentrations. Our measured F value is far smaller than this estimate. We suggest that this discrepancy is due to the nonspecific binding of CRP and perhaps also to limited DNA accessibility in the cell.

– Second paragraph of subsection “Strategy for measuring TF-RNAP interactions in vivo”: "cooperatively factor" α -> cooperativity α?

This has been fixed.

Associated Data

    Supplementary Materials

    Supplementary file 1. Numerical results plotted in the Figures and listed in Table 1.

    Please refer to the ’overview’ sheet within this workbook for a description of each data sheet therein.

    elife-40618-supp1.xlsx (2.3MB, xlsx)
    DOI: 10.7554/eLife.40618.012
    Transparent reporting form
    DOI: 10.7554/eLife.40618.013

    Data Availability Statement

    All data used to make the Figures is available in Supplementary file 1. The PSAM for RNAP, previously published by Kinney et al. (2010), is also provided in Supplementary file 1 (with permission). Raw data, processed data, and analysis scripts are also available at https://github.com/jbkinney/17_inducibility (copy archived at https://github.com/elifesciences-publications/17_inducibility). No datasets have been deposited in public databases as part of this work.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
