ABSTRACT
The global regulator Lrp plays a crucial role in regulating metabolism, virulence, and motility in response to environmental conditions. Lrp has previously been shown to activate or repress approximately 10% of the genes in Escherichia coli. However, the full spectrum of targets, and how Lrp acts to regulate them, have stymied earlier study. We have combined matched chromatin-immunoprecipitation sequencing (ChIP-seq) and RNA sequencing (RNA-seq) under nine physiological conditions to comprehensively map the binding and regulatory activity of Lrp as it directs responses to nutrient abundance. In addition to identifying hundreds of novel Lrp targets, we observe two new global trends, as follows: first, that Lrp will often bind to promoters in a poised position under conditions when it has no regulatory activity to enable combinatorial interactions with other regulators, and second, that nutrient levels induce a global shift in the equilibrium between less-sequence-specific and more-sequence-specific DNA binding. The overall regulatory behavior of Lrp, which as we now show extends to 38% of E. coli genes directly or indirectly under at least one condition, thus arises from the interaction between changes in Lrp binding specificity and cooperative action with other regulators.
IMPORTANCE To survive, bacteria such as E. coli must rapidly respond to changing environmental conditions, including nutrient levels. A decrease in nutrient availability causes bacteria to stop rapid replication and enter stationary phase, where they perform limited to no cell division. The E. coli global regulatory protein Lrp has been previously implicated in modulating the expression of genes particularly important at this transition from rapid to slowed growth. Here, we monitor Lrp’s DNA binding locations and effect on gene expression under three different nutrient conditions across three growth stages. We find that Lrp’s role is even broader than previously suspected and that it appears to interact with many other bacterial regulators to perform its function in a condition-specific manner.
KEYWORDS: ChIP-seq, Lrp, transcription factor, transcriptional regulation
INTRODUCTION
Regulation in response to changing nutrient conditions is a vital characteristic for free-living microbes, which must rapidly sense and respond to their environment in order to optimize fitness. The frequently studied model microbe Escherichia coli uses a hierarchical regulatory architecture to coordinate responses to environmental changes, with the activity and actions of dozens of specific transcription factors organized by seven global regulators, ArcA, FNR, Fis, cAMP receptor protein (CRP), integration host factor (IHF), H-NS, and Lrp (1). E. coli Lrp is the eponymous member of the Lrp/AsnC protein family and regulates 70% of the 215 genes with differential expression upon entrance to stationary phase (2). It influences a variety of cellular processes: amino acid synthesis, degradation and transport, porin expression, and pilus formation (3, 4). Pilus formation represents an example of how Lrp homologues have recently been tied to the expression of virulence genes (5–10).
Lrp itself is an 18-kDa protein containing a helix-turn-helix DNA binding domain and a regulator of amino acid metabolism (RAM) domain (11). In vivo, it is thought to exist in an equilibrium between octameric and hexadecameric states (12). Binding of leucine to the RAM domain is known to favor formation of octamers over hexadecamers (13) and to increase the nonspecific DNA binding affinity of Lrp (14). In addition, the presence of leucine can affect Lrp’s regulatory role. Depending on the target, Lrp either activates or represses transcription, and in turn, leucine binding to Lrp either potentiates, inhibits, or has no effect on Lrp function (15). Recent studies also indicate that Lrp may respond to other amino acids, including alanine, methionine, isoleucine, histidine, and threonine (16). Cho et al. (15) performed chromatin-immunoprecipitation (ChIP) using epitope-tagged Lrp under three conditions, resulting in expansion of the known Lrp regulon to 138 binding sites. However, based on estimates about the levels of Lrp and the percent found free of the nucleoid (14), we estimate that there should be between 400 and 500 Lrp octamers bound and capable of modulating transcription levels at logarithmic growth under both rich- and minimal-medium conditions. Additionally, we still lack a mechanistic understanding of how Lrp regulation occurs.
Making use of a carefully refined ChIP-grade antibody for Lrp, we employed chromatin-immunoprecipitation followed by DNA sequencing (ChIP-seq) of native Lrp in a variety of medium conditions and growth phases to assess the full spectrum of Lrp binding sites. Coupled RNA sequencing (RNA-seq) experiments on both wild-type (WT) and Lrp knockout (lrp::kanR) cells enabled us to distinguish between productive and apparently nonfunctional binding events and between direct and indirect Lrp regulatory targets. This rich high-confidence data set has allowed us to categorize hundreds of novel direct and indirect Lrp targets, representing 38% of the genes in E. coli (roughly one-sixth of which are direct targets of Lrp) compared to the 2.3% currently documented in RegulonDB (17). The fact that many of the newly identified Lrp targets are only apparent under physiological conditions which had not been included in prior studies of the Lrp regulon underscores the importance of considering a wide range of conditions in any survey of transcription factor activity, and it also highlights the physiological role of Lrp in balancing foraging and biosynthetic strategies as nutrient conditions change.
We also identify a surprising but highly prevalent mode of Lrp binding in which Lrp binds to a site under many physiological conditions but only alters transcription under certain conditions, similar to poised transcription factor binding in eukaryotes (18, 19). We show that some of Lrp’s poised regulation may be explained by interactions with other regulatory factors, such as the nitrogen response sigma factor σ54, as well as by changes in Lrp binding occupancy that are consistent with changes in Lrp’s oligomerization state. Despite extensive efforts, we were unable to identify systematically enriched sequence determinants sufficient either to explain transitions from poised to active regulation or to distinguish Lrp activation from Lrp repression. However, we did observe a shift in Lrp’s DNA binding specificity in response to varying nutrient conditions. The conservation of Lrp across many species of bacteria and archaea (20) argues for its critical role in organismal survival, and here, we provide the most comprehensive picture of the Lrp regulon in E. coli to date, establishing rules for Lrp behavior that will likely illuminate study of the protein in many species. The general principles of Lrp’s behavior across conditions may also serve as the template for other bacterial global regulators.
RESULTS
We performed both ChIP sequencing (ChIP-seq) and RNA-seq on WT and Lrp knockout (lrp::kanR) cells to establish a global picture of Lrp binding and regulatory effects under nine physiological conditions. Conditions and time points will be referenced as follows: X_Log (logarithmic phase), X_Trans (transition point), and X_Stat (stationary phase), where the X may be MIN (minimal medium), LIV (minimal medium supplemented with branched-chain amino acids), or RDM (rich defined medium); representative growth curves for each condition are shown in Fig. S1 in the supplemental material. Overall, the combination of Lrp binding data from the ChIP-seq experiments and the expression data from the RNA-seq experiments resulted in the identification of hundreds of novel Lrp targets (Fig. 1A and Table 1). Care must be taken while interpreting these results, as knocking out a global regulator, such as Lrp, may induce some regulatory rewiring that is perhaps distantly related to Lrp’s normal biological function. For ChIP-seq analysis, we are only using the knockout strain samples as a control to filter out peaks resulting from nonspecific interactions of the antibody with complexes other than Lrp. For the RNA sequencing results, we are looking at transcriptional changes between the WT and lrp::kanR mutant strains, and it is possible that some of the changes in transcription are a result of this regulatory rewiring. Therefore, some false positives are unavoidable. Nevertheless, we find these data to be a valuable resource for exploring the full scope of Lrp regulatory activity. Thus, when we observe and state that a given gene is “regulated” by Lrp, we mean by this that its transcript level changes substantially in the complete absence of Lrp; we find this to be an appropriate definition to reflect the extremely broad impacts of a global regulator, such as Lrp, on cellular regulation and physiology. 
For the more specific definition of genes that are regulated directly by Lrp binding to their promoters, we introduce below the concept of a “direct target” of Lrp, which defines a more narrowly construed Lrp regulon based only on effects in cis at specific target promoters.
FIG 1.

ChIP-seq data show agreement with previous data and reveal novel Lrp binding sites. (A) Total number of high-confidence Lrp binding sites identified under each condition and the number of genes upregulated and downregulated by Lrp under each condition based on RNA-seq. (B) ChIP robust Z-score (top) and RNA-seq expression change [log2(WT/KO)] (bottom) for known Lrp-activated target ilvI. Error bars for the RNA-seq data indicate a percentile-based 95% confidence interval from 100 bootstrap replicates of expression levels, with conservative pooling of replicate information (see Materials and Methods for details). Labels above each bar indicate classification of the gene based on combining RNA-seq and ChIP-seq results (D, direct Lrp target; P, poised Lrp binding site with no regulatory effect under that condition; N, no Lrp link; see Fig. 2A and accompanying text for details of the classification). Dashed lines in RNA-seq plots indicate a 1.5-fold cutoff for the ratio between WT and KO strains needed for biological significance (see Materials and Methods for details). In the genomic diagram above the plots, open reading frames (ORFs) are shown in black, regulatory regions (as defined in Materials and Methods but without the 250-bp padding) are in purple, and the particular gene of interest is in orange; we follow this color scheme throughout the text. (C) ChIP robust Z-score (top) and RNA-seq expression change [log2(WT/KO)] (bottom) for known Lrp repressed target sdaA, as in panel B.
TABLE 1.
Genes with significant Lrp-dependent changes in expression
| Condition | No. (%) of total genes significantly upregulated by Lrp^a | No. (%) of total genes significantly downregulated by Lrp^a |
|---|---|---|
| MIN_Log | 251 (5.39) | 227 (4.87) |
| MIN_Trans | 453 (9.73) | 635 (13.63) |
| MIN_Stat | 90 (1.93) | 71 (1.52) |
| LIV_Log | 147 (3.16) | 258 (5.54) |
| LIV_Trans | 105 (2.25) | 138 (2.96) |
| LIV_Stat | 99 (2.13) | 71 (1.52) |
| RDM_Log | 41 (0.88) | 90 (1.93) |
| RDM_Trans | 58 (1.25) | 21 (0.45) |
| RDM_Stat | 728 (15.63) | 670 (14.38) |
^a Percentage is out of the total number of genes in E. coli (4,658).
Many well-studied Lrp targets are reproduced in our data. For example, IlvI (b0077) is an enzyme critical for valine and isoleucine biosynthesis that is known to be activated by Lrp (21). Consistent with prior work, we see a strong Lrp binding signal at the ilvI transcription start site (Fig. 1B, top) and an Lrp-dependent activation of ilvI transcription under several medium conditions (Fig. 1B, bottom). The extent of activation is weakened or eliminated completely under LIV or RDM conditions, in agreement with previous studies showing that leucine inhibits the Lrp-mediated activation of ilvI (22). Similarly, we also see strong binding at the promoter of sdaA (b1814), which encodes a serine deaminase and has been previously shown to be repressed by Lrp in minimal medium (15) (Fig. 1C). Consistent with prior reports, this repression and binding are relieved in the presence of exogenous leucine. Because our ChIP-seq data have higher resolution than prior chromatin immunoprecipitation on DNA microarray (ChIP-chip) studies, we are able to resolve an additional peak in the MIN conditions at the 3′ end of the sdaA coding region that may play a role in the repression seen exclusively under these conditions. Note that the lack of a unique transcription start site for pdeD precluded classification of pdeD in our analysis pipeline.
ChIP-seq identifies hundreds of novel Lrp binding sites.
While the level of Lrp protein remained fairly stable across the conditions that we tested (Fig. S2), we observed a 10-fold range (between 61 and 638) in the number of Lrp peaks identified across the nine physiological conditions examined here. Fewer Lrp binding sites are identified in media with higher nutrient levels (either LIV or RDM) than in the MIN medium (summarized in Fig. 1A), in agreement with previously published Lrp ChIP data (15) and with Lrp’s known role as a regulator which responds to decreasing nutrient levels. However, our data identify between 1.8-fold (at LIV_Log) and 4.8-fold (at MIN_Stat) more binding sites overall than in previous studies (15). In general, we document more Lrp binding sites at later time points (Trans and Stat) relative to Log (Fig. 1A), again in agreement with previously published Lrp ChIP data (15) and with the known role of Lrp as being a critical regulator at the transition to stationary phase. As would naturally be expected, we saw strong enrichment of Lrp binding sites among regulatory regions of the genome (see Text S1). Comparing our data to previously published ChIP-chip studies (15), we identify extensive overlap in binding locations, as 96% of the sites in prior ChIP-chip data are reproduced in our data at MIN_Log (27.7-fold enrichment compared to a null distribution of randomly shuffled peaks of identical lengths; P < 0.001, permutation test, r = 1,000 [here, we use r to refer to the number of replicates used for resampling tests]), 44% at LIV_Log (123.0-fold enrichment compared to a null distribution, P < 0.001, permutation test, r = 1,000), and 84% at MIN_Stat (15.5-fold enrichment compared to a null distribution, P < 0.001, permutation test, r = 1,000). The larger disparity at LIV_Log is likely due to differences in metabolic responses upon the addition of leucine alone (as in prior studies) versus supplementation with leucine, isoleucine, and valine as in our study. 
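The overlap statistics quoted above (fraction of prior sites recovered, fold enrichment over shuffled peaks of identical lengths, and a permutation P value with r = 1,000) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names, the linear (rather than circular) genome, the toy coordinates, and the add-one P value estimator are all assumptions made for illustration.

```python
import random

def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def fraction_recovered(reference_sites, peaks):
    """Fraction of reference sites overlapped by at least one peak."""
    hits = sum(any(overlaps(site, p) for p in peaks) for site in reference_sites)
    return hits / len(reference_sites)

def shuffle_peaks(peaks, genome_length, rng):
    """Move each peak to a random start while keeping its length.
    Assumption: the genome is treated as linear for simplicity."""
    shuffled = []
    for start, end in peaks:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled

def permutation_test(reference_sites, peaks, genome_length, r=1000, seed=0):
    """Return (observed fraction, fold enrichment over null mean, P value)."""
    rng = random.Random(seed)
    observed = fraction_recovered(reference_sites, peaks)
    null = [
        fraction_recovered(reference_sites, shuffle_peaks(peaks, genome_length, rng))
        for _ in range(r)
    ]
    null_mean = sum(null) / r
    fold = observed / null_mean if null_mean > 0 else float("inf")
    # Add-one estimator (an assumption) so the P value is never exactly zero.
    p_value = (1 + sum(n >= observed for n in null)) / (r + 1)
    return observed, fold, p_value

# Hypothetical coordinates, not taken from the study.
reference = [(1000, 1200), (5000, 5200), (9000, 9300), (20000, 20150)]
peaks = [(950, 1250), (5100, 5400), (9100, 9200), (40000, 40300)]
observed, fold, p = permutation_test(reference, peaks, genome_length=100_000)
print(f"recovered={observed:.2f} fold={fold:.1f} P={p:.4f}")
```

Because few shuffled peaks land on any given reference site, the null mean is small and even a moderate observed overlap yields a large fold enrichment, which is why conditions with fewer peaks can show higher fold values.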
Overall, across the conditions in our study, we identified over 730 novel Lrp binding sites, 198 sites only occurring under conditions that had not previously been tested and 532 sites occurring under conditions similar to those tested previously, highlighting both the enhanced resolution given by modern sequencing techniques (23) and the general need to explore a broad range of possible conditions to understand the complete regulon of any transcriptional regulator (binding sites under each condition in the regulatory regions of genes are enumerated in Data File S1; data on all identified binding sites can be accessed at GEO data set GSE111874). In fact, our newly identified binding sites found under conditions that have been previously studied with ChIP-chip (15) have, on average, lower ChIP signal at their peak summits than the peaks that overlap the peaks found in the previous ChIP-chip study (Fig. S3). However, the newly identified direct Lrp targets in this study have a distribution of magnitude log2 fold change in RNA expression similar to that of genes that were previously annotated as Lrp targets in RegulonDB (Fig. S4), showing far more overlap with the effect sizes shown for previously annotated targets than did the ChIP signals in Fig. S3. Taken together with our stringent data analysis pipeline, these data suggest that the novel sites we are identifying in this study represent functional Lrp binding sites that are revealed by our more sensitive experimental methods.
Lrp’s regulatory effects are broad and highly condition specific.
Through our use of parallel RNA-seq experiments in WT and lrp::kanR mutant cells, we were able to identify the full range of transcripts showing Lrp-dependent regulation across the conditions in our study. Based on our RNA-seq data, we find that the number of genes regulated by Lrp varies across conditions from 1.7% to 30.0% of all known E. coli genes (Fig. 1A and Table 1); in all, 2,459 genes (52.8% of all genes in E. coli) show Lrp-dependent changes in transcript levels under at least one condition (see Materials and Methods for details). Lrp-dependent RNA expression changes for each categorized gene can be found in Data File S1. Overall RNA expression changes for all annotated genes in both WT and lrp::kanR mutant cells can be accessed at GEO data set GSE111874.
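As a check on the range quoted above, the per-condition percentages can be recomputed from the counts in Table 1. The sketch below assumes that the upregulated and downregulated gene sets are disjoint within a condition, so the two counts can simply be summed; the variable and function names are illustrative.

```python
# Counts copied from Table 1: (upregulated by Lrp, downregulated by Lrp).
TOTAL_GENES = 4658  # denominator given in the Table 1 footnote

table1 = {
    "MIN_Log": (251, 227),
    "MIN_Trans": (453, 635),
    "MIN_Stat": (90, 71),
    "LIV_Log": (147, 258),
    "LIV_Trans": (105, 138),
    "LIV_Stat": (99, 71),
    "RDM_Log": (41, 90),
    "RDM_Trans": (58, 21),
    "RDM_Stat": (728, 670),
}

def percent_regulated(up, down, total=TOTAL_GENES):
    """Percentage of all genes with any Lrp-dependent expression change.
    Assumes a gene is not counted as both up- and downregulated."""
    return 100.0 * (up + down) / total

percents = {cond: percent_regulated(*counts) for cond, counts in table1.items()}
lowest = min(percents, key=percents.get)
highest = max(percents, key=percents.get)

for cond, pct in percents.items():
    print(f"{cond:10s} {pct:5.1f}%")
print(f"range: {percents[lowest]:.1f}% ({lowest}) to {percents[highest]:.1f}% ({highest})")
```

The extremes fall at RDM_Trans and RDM_Stat, matching the 1.7% and 30.0% bounds stated in the text.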
When we compare all genes with an Lrp-dependent change in expression in our RNA-seq data to genes previously identified as Lrp targets, our data set overlaps with 73% of the known targets in RegulonDB (1.38-fold enrichment compared to a null distribution of randomly shuffled gene names, P < 0.001, permutation test, r = 1,000), 81% of the previously identified ChIP-chip targets (1.53-fold enrichment compared to a null distribution, P < 0.001, permutation test, r = 1,000) (15), and 89% of the previously identified microarray targets (1.68-fold enrichment compared to a null distribution, P < 0.001, permutation test, r = 1,000) (2). These overlaps show good agreement across the variety of strains and medium conditions present in the compared studies, despite some variations in precise experimental conditions. It is important to note that the large fraction of the genome that is regulated by Lrp imposes a fairly low upper limit on the amount of enrichment possible compared with prior lists of targets. Our data also reveal 2,241 genes with previously undocumented Lrp-dependent expression.
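The ceiling on enrichment mentioned above follows from simple arithmetic, sketched here. The sketch approximates the null expectation by the share of the genome with Lrp-dependent expression (2,459 of 4,658 genes); this is an assumption standing in for the mean of the shuffled-gene-name null distribution, not a reimplementation of the permutation test itself.

```python
TOTAL_GENES = 4658    # genes in E. coli (Table 1 footnote)
LRP_DEPENDENT = 2459  # genes with an Lrp-dependent change under at least one condition

# Under random shuffling of gene names, the expected fraction of any prior
# target list that falls in the Lrp-dependent set is roughly that set's
# share of the genome (approximation, see lead-in).
expected_overlap = LRP_DEPENDENT / TOTAL_GENES

def fold_enrichment(observed_overlap, expected=expected_overlap):
    """Observed overlap fraction divided by the expected overlap fraction."""
    return observed_overlap / expected

# Even a prior list recovered at 100% cannot exceed this fold enrichment.
max_fold = fold_enrichment(1.0)

# Overlap fractions as reported in the text (rounded to whole percentages).
reported = {"RegulonDB": 0.73, "ChIP-chip": 0.81, "microarray": 0.89}
folds = {source: fold_enrichment(frac) for source, frac in reported.items()}

print(f"expected overlap under the null: {expected_overlap:.3f}")
print(f"maximum possible fold enrichment: {max_fold:.2f}")
for source, fold in folds.items():
    print(f"{source:10s} fold enrichment: {fold:.2f}")
```

Since the inputs are rounded percentages, the recomputed values agree with the reported fold enrichments only to within rounding.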
The majority of Lrp-dependent regulation occurs via indirect effects.
Global regulators are known to act both directly, by binding target sites and modulating transcription levels, and indirectly, by modulating the expression of other transcription factors or regulatory RNAs which have their own targets (1). Previously, most focus on Lrp regulation has been at the direct target level. By comparing the binding data from our ChIP-seq experiments and the corresponding expression data provided by our RNA-seq experiments, we are able to identify and categorize both direct and indirect targets under a variety of physiological conditions (see Materials and Methods). Direct and indirect targets are both characterized by Lrp-dependent changes in transcript level, but only direct targets have an Lrp binding signal in their regulatory regions, defined as 250 bp upstream and downstream of all annotated transcription start sites in RegulonDB (Fig. 2A; see Materials and Methods for details).
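The per-condition categorization described here, together with the poised and not-linked categories of Fig. 2A, reduces to a two-by-two decision on binding and expression change. The following is a minimal sketch under stated assumptions: the overlap rule (any peak touching a window of 250 bp on either side of a transcription start site), the gene names, and all coordinates are hypothetical, and the statistical calls for "bound" and "expression changed" are taken as given inputs.

```python
from enum import Enum

class LrpClass(Enum):
    DIRECT = "D"      # bound, with an Lrp-dependent expression change
    INDIRECT = "I"    # not bound, but with an Lrp-dependent expression change
    POISED = "P"      # bound, with no Lrp-dependent expression change
    NOT_LINKED = "N"  # neither bound nor changed

PAD_BP = 250  # regulatory region: 250 bp on either side of each annotated TSS

def is_bound(tss_positions, peaks, pad=PAD_BP):
    """True if any peak (start, end) overlaps any TSS +/- pad window."""
    for tss in tss_positions:
        lo, hi = tss - pad, tss + pad
        if any(start <= hi and lo <= end for start, end in peaks):
            return True
    return False

def classify(bound, expression_changed):
    """Assign one of the four categories for a single condition."""
    if bound and expression_changed:
        return LrpClass.DIRECT
    if expression_changed:
        return LrpClass.INDIRECT
    if bound:
        return LrpClass.POISED
    return LrpClass.NOT_LINKED

# Hypothetical peaks and genes for one condition (not data from the study).
peaks = [(900, 1100), (9200, 9400)]
genes = {
    "geneA": {"tss": [1000], "changed": True},
    "geneB": {"tss": [5000], "changed": True},
    "geneC": {"tss": [9000], "changed": False},
    "geneD": {"tss": [20000], "changed": False},
}

calls = {
    name: classify(is_bound(info["tss"], peaks), info["changed"])
    for name, info in genes.items()
}
for name, call in calls.items():
    print(name, call.value)
```

Running the classification independently for each condition, as the filtering in this study was done, yields one label per gene per condition.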
FIG 2.
Lrp regulates genes both directly and indirectly. (A) Schematic showing how genes were categorized, with direct targets of Lrp (Lrp-bound regulatory region and with a significant RNA expression change between WT and lrp::kanR mutant cells), indirect targets (not bound but with a significant RNA expression change), poised targets (bound but with no significant RNA expression change), or not linked (not bound and no significant RNA expression change). Filtering was done independently for each condition. (B) Heat map indicating how each gene was classified under the nine experimental conditions. Genes with no Lrp link under any condition were removed from visualization. Genes were hierarchically clustered using a Manhattan distance metric and average linkage clustering. Pink boxes mark out notable clusters of genes, those with leucine-dependent or -independent binding. (C) ChIP density and RNA-seq expression change [log2(WT/KO)] for direct Lrp target LrhA and its known target genes, the FimE, FlhC, and FlhD genes (17). Error bars for the RNA-seq data indicate a percentile-based 95% confidence interval from 100 bootstrap replicates of expression levels. Labels above each bar indicate classification of the gene based on combining RNA-seq and ChIP-seq results (D, direct Lrp target; I, indirect Lrp target; N, no Lrp link). (D) Proposed model of Lrp/LrhA-mediated regulation of LrhA targets. (E) ChIP density and RNA-seq expression change [log2(WT/KO)] for direct Lrp target CysB and some of its known target genes, the TcyP and CysI genes (17), as in panel C. (F) Proposed model of Lrp/CysB-mediated regulation of CysB targets. In both panels C and E, only conditions where LrhA or CysB was a direct target are shown.
In order to allow cross-referencing of our binding and expression data, we restrict our analysis here to the set of genes for which an annotated transcription start site exists in the PromoterSet data set in RegulonDB version 9.4 (17), or an unannotated transcription start site exists in the same direction within 500 bp upstream of the start of the coding region (covering 2,908 genes out of the possible total of 4,658). Typically, this categorizable subset includes the first gene in each transcriptional unit, as well as any genes with additional internal promoters. A heatmap of classifications for all categorizable genes across conditions is shown in Fig. 2B. Additionally, the total numbers of direct and indirect targets which were identified in previous studies are tabulated in Table S1.
From our analysis of that categorizable subset, we note that 37.8% of all E. coli genes are regulated by Lrp, either directly or indirectly, under at least one condition. Out of those, about 10% are only ever regulated directly across the experimental conditions, 84% are only ever regulated indirectly, and 6% are regulated directly and indirectly under different conditions. Due to the restriction on categorizing genes noted above, the counts given here are an underestimate. Even so, we also observe a dramatic increase in the number of indirect targets at MIN_Trans and RDM_Stat, going from 237 to 610 indirect targets from MIN_Log to MIN_Trans, and from 36 to 985 indirect targets from RDM_Trans to RDM_Stat.
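The breakdown into genes regulated only directly, only indirectly, or in both ways can be tallied from per-condition labels with a few set operations. The sketch below uses the single-letter labels from the figure legends (D, I, P, N) and a small hypothetical input; the function names and toy genes are assumptions, and the study's own percentages (about 10%, 84%, and 6%) come from the full data set rather than from this example.

```python
from collections import Counter

def regulation_mode(labels):
    """Summarize one gene's per-condition labels ('D', 'I', 'P', 'N').
    Returns None if the gene is never regulated (no 'D' or 'I')."""
    seen = set(labels)
    direct = "D" in seen
    indirect = "I" in seen
    if direct and indirect:
        return "both"
    if direct:
        return "direct_only"
    if indirect:
        return "indirect_only"
    return None

def summarize(classifications):
    """classifications: {gene: {condition: label}}.
    Returns (percentage per mode among regulated genes, number regulated)."""
    modes = Counter()
    for per_condition in classifications.values():
        mode = regulation_mode(per_condition.values())
        if mode is not None:
            modes[mode] += 1
    regulated = sum(modes.values())
    if regulated == 0:
        return {}, 0
    return {mode: 100.0 * n / regulated for mode, n in modes.items()}, regulated

# Hypothetical labels for five genes under a few conditions.
toy = {
    "gene1": {"MIN_Log": "D", "LIV_Log": "P", "RDM_Log": "N"},
    "gene2": {"MIN_Log": "I", "LIV_Log": "N"},
    "gene3": {"MIN_Log": "I", "RDM_Stat": "I"},
    "gene4": {"MIN_Trans": "D", "RDM_Stat": "I"},
    "gene5": {"MIN_Log": "P", "LIV_Log": "N"},
}
percentages, regulated = summarize(toy)
print(regulated, percentages)
```

Genes that are only ever poised or not linked, such as gene5 here, drop out of the denominator, mirroring the "out of those" phrasing in the text.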
Given the high proportion of indirect Lrp targets, and especially the dramatic increase in the number of indirect targets at MIN_Trans and RDM_Stat (Fig. 2B), we investigated whether some of the expression changes of those indirect targets can be explained by the activity of direct Lrp targets at those time points. As Lrp is a global regulator, we expected to find that some percentage of its indirect targets under each condition were annotated targets of transcriptional regulators categorized as direct Lrp targets under that condition (all transcription factor-gene interactions were taken from RegulonDB [17]; see Materials and Methods for details). We would expect that in such cases, we should observe an enrichment among Lrp indirect targets of genes known to be regulated by Lrp direct targets. We observe significant (q < 0.05, permutation test), albeit small, enrichment of explainable indirect targets at LIV_Log and RDM_Stat; a maximum of 5.2% of the indirect targets can be explained by the currently known targets of direct Lrp targets (Table S2). Several key transcription factors that are direct Lrp targets are responsible for explaining the identified indirect Lrp targets across conditions, as follows: Nac, LrhA, LeuO, ArgR, QseB, CysB, SlyA, SoxS, and GadW (Table S2). Many of these transcription factors have been previously identified as Lrp targets (24). Direct Lrp targets that are not currently identified as transcriptional regulators or regulators with incompletely documented regulons could account for why we are not able to explain more instances of indirect regulation, as could transcriptional units regulated by aspects of cellular state that are themselves Lrp dependent (recent large-scale studies based on current annotations of the E. coli transcriptional regulatory network demonstrate that our current enumeration of regulatory interactions is incomplete [25]). 
In addition, regulatory RNAs that are direct Lrp targets are also likely important in mediating indirect regulation. However, based on current RegulonDB annotations, we are unable to account for any indirect targets in this manner. Since our classification scheme allows for some genes within operons to be separately annotated from transcription of the full operon due to existing internal transcription start sites (TSSs), it is possible that some indirect targets could be accounted for by direct Lrp regulation of the gene’s parent operon. It is also possible that Lrp is mediating transcription by binding within a particular gene’s coding region rather than at its TSS. We therefore subclassified our indirect targets into these possibilities and noted that both cases represent a small fraction of indirect targets under all conditions (Fig. S5B). Furthermore, Lrp binding within the coding region of genes does not appear to meaningfully impact raw RNA-seq coverage, as would be expected if Lrp were interfering with transcription by binding within the coding region of genes (Fig. S5C to E).
Investigating at a local as opposed to global scale provides several informative examples of potential indirect regulation by Lrp. At LIV_Log, LIV_Trans, and RDM_Log, the gene for the dual regulator LrhA is a direct Lrp-activated target gene (Fig. 2C). LrhA represses flhC and flhD (Fig. 2D). At LIV_Trans, flhC is indirectly repressed, and at RDM_Log, both flhC and flhD are indirectly repressed (Fig. 2C). While this pattern does not show activity at all LrhA targets under each condition (for example, fimE is known to be activated by LrhA under some conditions but does not show an Lrp-dependent response under conditions tested here), overall, it suggests that indirect regulation of flhCD by Lrp may be explained in some cases by direct activation of lrhA by Lrp. All three target genes (fimE, flhC, and flhD) are also known to be regulated by other transcription factors, potentially explaining the incomplete activity from LrhA. Similarly, at MIN_Trans, the transcriptional regulator CysB is the product of a direct Lrp-repressed target gene (Fig. 2E). CysB is known to activate tcyP and cysI, among other genes (Fig. 2F). Both tcyP and cysI were categorized as indirect Lrp-repressed targets, supporting the hypothesis that Lrp repression of cysB is what leads to repression of tcyP and cysI. The transcription factor GadW is an interesting example in that it is a direct Lrp-repressed target at LIV_Log and a direct Lrp-activated target at RDM_Stat. Under both conditions, more than 75% of GadW’s classified annotated targets are indirect Lrp targets, all repressed at LIV_Log and all activated at RDM_Stat, as would be expected if GadW activates them. Thus, this illustrates another case where indirect Lrp-mediated regulation is explained by identifying a transcription factor which is a direct Lrp target. However, it is important to note that due to the interconnected regulatory network of E. coli, compensatory and interconnecting mechanisms likely also contribute to the regulation of these targets; Lrp is unlikely to be the sole regulator responsible for the observed behavior, and other Lrp-dependent pathways may also act in parallel with those suggested.
The majority of Lrp binding reflects poised, rather than active, regulatory sites.
In addition to the indirect regulation discussed above, our data show many examples of a converse mode of Lrp activity, in which binding of Lrp is apparent at a particular promoter but there are no Lrp-dependent changes in expression (Fig. 2A). In fact, these sites comprise as little as 53% (at LIV_Log) to as much as 92% (at MIN_Stat, LIV_Stat, and RDM_Trans) of all instances of Lrp binding (Fig. S5A). We refer to such cases as poised targets of Lrp, since they suggest that Lrp is bound in preparation for regulatory activity under changed conditions. The regulatory potential of the identified poised sites is apparent from the fact that across the set of nine experimental conditions in our study, 40% of the genes that are poised targets under at least one condition become direct targets under a different condition, and conversely, 93% of direct Lrp targets are poised targets under at least one other condition (several such cases are discussed in the following section; for further discussion on the classification of poised targets, including analysis of the ∼60% of poised targets that do not become direct targets, see Text S2). Among genes that undergo a transition between being a poised target and a direct target, 37.8% become activated, 45.9% become repressed, and 16.2% become both activated and repressed under different conditions. The gene brnQ, for example, shows a strong Lrp binding site in its regulatory region under all conditions that we studied, but it is only Lrp repressed under a subset of those conditions and under other conditions is clearly unaffected by Lrp (Fig. 3A). On the other hand, consistently poised binding is apparent at the mog gene; its promoter region is bound by Lrp in 7 of the 9 conditions we studied, but it never exhibits a significant change in expression, thus making it a poised target under all of those conditions (Fig. 3B). 
Interestingly, mog plays a role in the synthesis of molybdenum-containing cofactors (26) and is nonessential under the conditions we studied here (17). The consistent binding of mog’s promoter by Lrp suggests that Lrp is poised to regulate mog and that it may be a direct target of Lrp under conditions not tested here (perhaps those involving changing molybdenum concentration, as all of our conditions supply abundant molybdenum as part of the MOPS micronutrient mixture [27]). In our consideration of poised targets, it is important to note that some fraction of targets that we assign as poised may actually represent direct targets that our differential expression analysis lacks the power to detect, although this scenario likely accounts for only a minority of cases (see Text S2). At a system-wide level, it is particularly apparent from the highlighted blocks of leucine-dependent and leucine-independent binding sites in Fig. 2B that many genes are bound by Lrp under a far broader range of conditions than the set under which they are regulated by Lrp (or at least by Lrp alone). These findings suggest more broadly that Lrp is often poised at a particular gene under many conditions but may act combinatorially with some other factor or environmental stimulus in order to actually alter expression. The total number of poised genes which overlap those identified in previous studies is shown in Table S1.
FIG 3.
Poised targets show condition-dependent Lrp regulation separate from their Lrp binding profiles. (A) ChIP robust Z-scores (left) and Lrp-dependent RNA-seq expression change [log2(WT/KO), right] for Lrp-poised target brnQ, which shows condition-invariant binding but is clearly Lrp regulated only under some conditions (MIN_Log, MIN_Trans, and RDM_Stat), whereas under other conditions, it can be confidently said to have no substantial Lrp-dependent effect (LIV_Stat and MIN_Stat). Color code is as in Fig. 1B. (B) As in panel A, but for the mog gene, which shows constitutive Lrp binding but no direct Lrp-dependent regulation under the conditions studied.
Poised Lrp targets enable condition-specific combinatorial regulation.
The abundance of genes that shift between direct and poised Lrp targets across conditions suggests that Lrp binds some promoters in a poised position under a broad range of conditions but only regulates when certain additional criteria are met, perhaps by coordinating with a second regulatory factor to enable combinatorial logic or acting completely redundantly with a second factor to repress or activate transcription from a particular target. That second regulator could in principle be a sigma factor, another classical transcription factor, or even a nearby condition-dependent Lrp binding site; we in fact observe examples of all three such scenarios in our data.
Lrp binding at the potF promoter region represents a case of additional nearby Lrp binding being associated with conversion of a poised to a direct target. A strong Lrp binding signal is seen directly at the potF promoter under all nine conditions measured in our data, but potF expression is only activated by Lrp under six of the conditions that we studied (Fig. 4A). In contrast to the variable Lrp-dependent RNA expression levels, Lrp binding directly at the potF promoter is very similar across conditions, spanning a similar length of DNA and showing maximal signal at the same point. However, an adjacent upstream Lrp peak at the ybjN promoter shows nearly monotonically increasing occupancy with the strength of Lrp-dependent regulation. Interestingly, this secondary peak does not appear to modulate expression of the ybjN gene under any of the conditions in our study, suggesting that the primary role of this secondary peak under these conditions is in the modulation of potF. This secondary condition-specific binding site may represent an interaction between a weak and strong Lrp binding site or it may represent the formation of a hexadecamer through the bridging of these two sites (Fig. 4B). Future studies will be needed to differentiate these possibilities. Additional examples of secondary peaks appearing under the conditions where we can detect Lrp-dependent regulation can be seen clearly for sdaA (Fig. 1C), lrhA and alaA (Fig. S6A), and dadA and ycgB (Fig. S6B).
FIG 4.
Lrp sits at genes in a poised position in preparation for regulatory activity. ChIP robust Z-score (left) and RNA-seq expression change [log2(WT/KO), middle] for two Lrp targets. (A) potF represents a case where a secondary Lrp peak is seen only under the conditions where it is a direct target. ybjN is clearly not transcriptionally regulated by Lrp under these conditions. Here and in panel C, error bars for the RNA-seq data indicate worst-case percentile-based 95% confidence interval from 100 bootstrap replicates of expression estimates across different biological replicates (see Materials and Methods for details). Labels above each bar indicate classification of the gene based on combining RNA-seq and ChIP-seq results (D, direct; I, indirect; P, poised; N, no Lrp link; see Fig. 2A for details). Dashed lines in RNA-seq plots indicate a 1.5-fold cutoff for the ratio between WT and KO strains needed for biological significance (see Materials and Methods for details). (B) Model suggested by panel A, in which Lrp primarily interacts at the promoter site under most conditions (poised), but upon changes in condition, Lrp binds adjacent sites as a separate octamer (direct top) or together as a hexadecamer through potential looping of the DNA (direct bottom). (C) pepD represents a case where no obvious changes in Lrp binding signal occur, but differences in transcriptional regulation are clear. (D) Model suggested by the data in panel C, where Lrp is always bound at the site (poised), but under conditions where a secondary factor is needed for expression, Lrp is present to interact with the factor (direct) and block or enhance its activity. Here, gpt displays RNA expression patterns similar to those of pepD.
In contrast to potF, pepD has relatively invariant Lrp binding signal at its promoter across each condition (Fig. 4C). Although a small secondary peak can be seen under each MIN condition, only the MIN_Trans condition shows direct Lrp regulation. Additionally, the RDM_Stat condition does not show this secondary peak but is similarly Lrp repressed, suggesting that the secondary peak is not sufficient to explain Lrp regulation at this locus. Thus, pepD likely represents a case where Lrp is interacting with an additional factor; for example, the activity at pepD could be explained by Lrp’s presence blocking a transcriptional activator from binding (Fig. 4D), although other scenarios are also possible. An additional example of invariant Lrp binding with differential RNA regulation can be seen at ilvI (Fig. 1A). As detailed in the supplemental material (Text S3), systematic analysis of all genes with Lrp bound at their regulatory regions reveals that both the transcription factor NtrC and σ54 explain a subset of these transitions from poised to direct targets, but additional factors also likely interact with Lrp in similar ways.
Lrp directs distinct survival strategies across changing nutrient conditions.
By applying iPAGE (28) to search for gene ontology (GO) terms that show significant patterns in Lrp binding and regulation across conditions, we identified several key patterns in Lrp’s regulatory logic (summarized in Fig. 5A and detailed in Fig. S7). Consistent with its previously established physiological roles and the fitness effects of lrp loss-of-function mutations (see, e.g., references 24 and 29), the most prominent pathways regulated by Lrp involve the synthesis and uptake of amino acids, as well as nutrient foraging (via regulation of flagellar motility). In particular, Lrp tends to directly activate amino acid biosynthetic pathways during logarithmic- and transition-phase growth, particularly in nutrient-poor media, and at the same time to directly repress amino acid uptake pathways under the same conditions, presumably responding to a lack of available substrates in the environment (Fig. S7B). Regulation of leucine transport itself represents a special case, where Lrp appears to activate a subset of leucine transporters and repress others (Fig. 5B). A similar switch is apparent in Lrp’s regulation of flagellar motility in rich media, where Lrp acts as a global repressor of motility in log phase (presumably keeping cells static under conditions of optimal nutrition) but lifts repression and activates a small set of flagellar genes when nutrients are depleted during stationary phase (Fig. 5C). Thus, Lrp directly governs a shift between strategies of synthesizing or foraging for critical nutrients, depending on their availability in the cell’s surroundings. A very different Lrp-dependent regulatory program is apparent under conditions of slowing growth (typified by our MIN_Trans and RDM_Stat conditions), where Lrp additionally acts to inhibit translation, through indirect repression of ribosomal components and tRNA synthetases (Fig. S7B).
FIG 5.
Pathway analysis of genes regulated and bound by Lrp. (A) Pathway analysis using iPAGE identifying GO terms that show significant mutual information with gene classification (direct, indirect, poised, or no Lrp link). Color indicates magnitude of the log10 P value, with positive values indicating enrichment and negative values indicating depletion of members of a given GO term among genes in that class. Boxes indicate particularly outstanding cells (P < 0.01). Panel A is continued in panel A'. (B) Comparison of Lrp-dependent effects on gene expression for genes annotated with GO:0015820 (leucine transport). Stars indicate significant Lrp-dependent expression changes (following our standard criteria), with error bars indicating bootstrap-based 95% confidence intervals. (C) As in panel B, for genes annotated with GO:0071973 (bacterial-type flagellum-dependent cell motility). Error bars that pass to infinity under our bootstrap-based 95% confidence intervals are indicated with dashed lines. Bars where the WT/KO ratio could not be determined are not plotted. (D) iPAGE plots (as in panel A) showing genes with significant leucine dependence of nearby Lrp binding sites. Categories are: −, Lrp binds target genes only under low-leucine conditions (MIN); +, Lrp binds target genes only under high-leucine conditions (RDM and LIV); X, leucine independent (Lrp binds in at least 8 of 9 conditions); M, genes with any other pattern of Lrp binding. Several additional transposition-related terms with similar expression profiles to GO:0032196 are omitted for clarity. (E) As in panel D, showing dependence of binding on growth phase. Categories are L, Lrp binds target genes only in log phase (in one or more medium types, but at no other growth phase); T, Lrp binds target genes only in transition phase; S, Lrp binds target genes only in stationary phase; and M, all other genes with Lrp binding across multiple growth phases.
It is intriguing to note that several of Lrp’s pathway-level activities, such as the aforementioned switching between amino acid biosynthesis and transport, do not appear to depend solely on the presence of leucine, as similar regulatory behaviors are observed under both our MIN and LIV conditions. Indeed, the same behavior is also suggested by the leucine-independent cluster of Lrp targets noted in Fig. 2B. To assess if there is any class of genes that Lrp binds in a leucine-dependent manner, we identified GO terms showing informative patterns of leucine-dependent occupancy at their promoters (Fig. 5D). Aside from being surprisingly short, this list is especially notable for the fact that Lrp binds genes associated with several GO terms involved in amino acid metabolism independently of leucine, suggesting an important role for poised regulation in mediating the key metabolic functions of Lrp. A similarly small set of GO terms shows consistent occupancy patterns at different phases of growth (Fig. 5E), although the inclusion of tRNAs among those targets is notable. These findings highlight the important role played by Lrp in repressing the translational apparatus during stationary phase but at the same time demonstrate the importance of poised regulation and local regulatory interactions in setting the effects of Lrp at each bound promoter.
Lrp shows condition-dependent changes in DNA sequence specificity.
While not as invariant as motifs for other E. coli transcription factors, a 15-bp motif comprising terminal inverted repeats and an AT-rich center has been previously identified for Lrp (15, 30). To determine how well previously identified Lrp motifs could predict the binding sites identified in our study, we used a logistic regression model to classify 500-bp windows of the genome as either containing a Lrp binding site or not, using as predictors the presence of previously documented Lrp motifs and the AT content (given the AT richness of the Lrp motif itself). Starting with a minimal model containing only an intercept term, we created more complex models by adding a single predictor at a time and scoring each new model with the Bayesian information criterion (BIC), as displayed in Fig. 6A (note that a lower BIC indicates a more parsimonious model). A minimal model was chosen by adding to the new model the predictor with the largest decrease in BIC from the intercept-only model and iterating this process until the change in BIC switched signs (indicating that additional terms were no longer informative). A similar analysis was done in which we started with a full model containing all of the predictors and removed the predictor with the largest increase in BIC until the change in BIC switched signs (Fig. S8). In both cases, we arrived at the same set of minimal models for each condition. Intriguingly, among the minimal models for each condition, we see a shift between a general preference for low-information-content AT-rich regions at Log points and a preference for higher-information-content sequence motifs at later time points across all conditions (Fig. 6A); importantly, all coefficients for motifs in the final models shown in Fig. 6A are positive, indicating that their presence is informative of higher levels of Lrp occupancy. 
Here, we are referring to information content in the information-theoretic sense, i.e., a motif with higher information content indicates that protein has higher specificity for more positions within the motif, and we consider a motif with higher information content to indicate a higher sequence specificity. Under each condition, from early to late time points, there is a decrease in how predictive the general AT content is in terms of differentiating between Lrp binding sites and background genomic locations. While their relative importance to the model shifts, the minimal variables needed to explain most of the data include a combination of AT content and established Lrp motifs across all conditions. This suggests that Lrp binding is less influenced by higher-information-content sequence motifs in earlier phases of growth and only gains preference for these higher-information-content sequence motifs upon nutrient limitation and entrance into stationary phase, which also agrees with our observed increase in the number of peaks in later time points. Additionally, this pattern of specificity agrees with the proposed position of importance of Lrp as a regulator of the transition to stationary phase. However, since we see the same lack of preference for higher-information-content sequence motifs in LIV_Log and MIN_Log (two conditions with dramatically different leucine concentrations), we can conclude that leucine level alone is not sufficient to shift the binding specificity of Lrp, but rather that other signals (such as, potentially, energy/carbon source availability) must also be integrated somehow into Lrp’s binding.
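The add-one-in model selection described above can be illustrated with a short sketch. This is not the code used in the study: the function names (loglik, bic, forward_select), the Newton-Raphson fitting routine, and the synthetic "at_content" and "noise" features are assumptions introduced purely for illustration. The sketch starts from an intercept-only logistic regression, adds at each step the predictor giving the largest decrease in BIC, and stops once no remaining predictor lowers the BIC.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def loglik(X, y, n_iter=50):
    """Maximized log-likelihood of a logistic regression (Newton-Raphson fit)."""
    p = len(X[0])
    beta = [0.0] * p

    def linpred(row):
        # Clamp the linear predictor to keep exp() numerically safe.
        z = sum(bj * xj for bj, xj in zip(beta, row))
        return max(-30.0, min(30.0, z))

    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[1e-8 if i == j else 0.0 for j in range(p)] for i in range(p)]
        for row, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-linpred(row)))
            for i in range(p):
                grad[i] += (yi - mu) * row[i]
                for j in range(p):
                    hess[i][j] += mu * (1.0 - mu) * row[i] * row[j]
        step = solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(sj) for sj in step) < 1e-8:
            break
    return sum(yi * linpred(r) - math.log1p(math.exp(linpred(r)))
               for r, yi in zip(X, y))

def bic(X, y):
    """BIC = k ln(n) - 2 ln(L); lower values indicate a more parsimonious model."""
    return len(X[0]) * math.log(len(y)) - 2.0 * loglik(X, y)

def design(features, names, n):
    """Design matrix with an intercept column followed by the named features."""
    return [[1.0] + [features[name][i] for name in names] for i in range(n)]

def forward_select(features, y):
    """Add the predictor with the largest BIC decrease until none lowers the BIC."""
    selected = []
    best = bic(design(features, selected, len(y)), y)
    while len(selected) < len(features):
        trials = {name: bic(design(features, selected + [name], len(y)), y)
                  for name in features if name not in selected}
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            break
        selected.append(name)
        best = trials[name]
    return selected

# Synthetic windows (hypothetical): binding probability rises with AT content.
rng = random.Random(0)
at = [rng.random() for _ in range(400)]
noise = [rng.random() for _ in range(400)]
bound = [1 if rng.random() < 1.0 / (1.0 + math.exp(-6.0 * (a - 0.5))) else 0
         for a in at]
selected = forward_select({"at_content": at, "noise": noise}, bound)
print(selected)
```

In this toy setting, the informative feature is picked up first because it produces by far the largest drop in BIC; the backward (remove-one) variant mentioned above would follow the same logic, starting from the full model.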
FIG 6.
Lrp exhibits condition-dependent sequence preference. (A) Change in BIC for add-one-in logistic regression models. The y axis displays the position weight matrix (PWM) used to create a particular feature. PWMs were obtained from the publication indicated above the PWM (15, 30, 61), RegulonDB (17) or, in the case of SR motifs, the SwissRegulon (62). Features were created from a given PWM by dividing the count of matches within a sequence (as obtained by FIMO [63] with q < 0.0001) by the length of the sequence. AT-stretch indicates the longest stretch of continuous As and Ts normalized by the length of the sequence. AT content indicates the number of As and Ts normalized by the length of the sequence. Colors, moving from dark red (negative BIC, added term is favored in the model) to light blue (positive BIC, added term is disfavored in the model), then indicate the change in BIC when a given term is added to a minimal model containing only an intercept term. Heavy boxes indicate a feature was included in the final model for that condition. For both this panel and panel B, the positive class of sequences was obtained by taking 500 bp around the center of each peak for each condition. The negative class of sequences was obtained by taking three times the number of equal-sized random sequences from the subset of the genome that was not in a peak for that condition. (B) Receiver operator characteristic curves for each final model by condition. Curves were calculated at 0.01 increments from 0 to 1 for a predicted probability cutoff from the logistic regression. Full statistics, including 5-fold cross-validation are included in Table S3.
The derived models perform relatively well; the receiver operator characteristic (ROC) curves, which show the recall for every potential false-positive rate, trend toward the upper-left corner where a perfect model would be (Fig. 6B; quantified by the ROC-area under the curve [ROC-AUC] in Table S3). In addition, the Matthews correlation coefficient (MCC), a summary measure of binary classification quality that takes values from −1 to 1 (with 0 corresponding to chance-level prediction), ranges from 0.24 to 0.61 (Table S3). These performance metrics were robust to withholding of shuffled subsets of the data, as indicated by minimum and maximum values found in 5-fold cross-validation (values in parentheses in Table S3). Overall, the specificity of these models is much better than their sensitivity, indicating that they perform well in rejecting locations where Lrp does not bind. However, there is still substantial room for improvement in calling Lrp-bound sequences. Interestingly, the sensitivity drops under the conditions where higher-information-content sequence motifs are more informative. It is likely that we are missing additional features that would improve the sensitivity under these conditions; however, efforts to discover additional sequence determinants of Lrp binding were unsuccessful, as were efforts to determine any sequence elements that differentiated activated from repressed targets (data not shown). This could simply indicate that sequence-independent mechanisms, such as the well-established observation of Lrp cooperativity in binding (31), or recruitment of Lrp by binding of additional factors, could play a role in determining Lrp binding locations.
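For readers less familiar with these metrics, the following sketch shows how sensitivity, specificity, the MCC, and an ROC-AUC evaluated at 0.01 increments of the predicted-probability cutoff can be computed. The function names and the toy counts are hypothetical and do not correspond to values in Table S3; the 1:3 ratio of positive to negative examples in the toy counts simply mirrors the sampling scheme described for Fig. 6.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of bound windows that are recovered
    specificity = tn / (tn + fp)   # fraction of unbound windows that are rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

def roc_auc(probs, labels, n_steps=100):
    """Trapezoidal ROC-AUC over cutoffs 0, 1/n_steps, ..., 1 (0.01 increments by default)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for k in range(n_steps + 1):
        cutoff = k / n_steps
        tp = sum(1 for p, l in zip(probs, labels) if l == 1 and p >= cutoff)
        fp = sum(1 for p, l in zip(probs, labels) if l == 0 and p >= cutoff)
        points.append((fp / neg, tp / pos))   # (false-positive rate, recall)
    points.sort()
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Toy example: 100 bound and 300 unbound windows (hypothetical counts).
sens, spec, mcc = confusion_metrics(tp=50, fp=10, tn=290, fn=50)
print(round(sens, 3), round(spec, 3), round(mcc, 3))
```

A model with this profile rejects unbound windows well (high specificity) while recovering only half of the bound windows, which is the qualitative pattern described above.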
DISCUSSION
Lrp regulates hundreds of genes in distinct categories by direct and indirect mechanisms.
By investigating both the binding and regulatory activities of Lrp under several medium conditions and time points, we are able to present a broader view of the Lrp regulon. Our use of a high-quality antibody against native Lrp removes any possibility of epitope tagging hindering native behavior in our experiments, and the use of modern sequencing-based methods provides us with a high-resolution snapshot of both Lrp’s binding and regulatory activities. We document hundreds of novel targets and note the especially important effect of indirect regulation at MIN_Trans and RDM_Stat, which appear in our experimental setup to correspond to times of high Lrp activity due to dropping nutrient conditions. Targets may appear selectively under certain conditions due to a number of potential influences, including that (i) various levels of Lrp protein may dictate that only the strongest binding sites are occupied, (ii) required coregulators may only be expressed under certain conditions, and (iii) posttranslational modifications of Lrp may influence its binding or interaction with coregulators. In addition, the high number of indirect targets that are unique to one or two conditions is most easily explained by invoking one transcription factor that is regulated by Lrp, as we discussed above, but we cannot know from this study alone how many levels of regulatory control are actually in play for many indirect targets. Given that, it is clear that there are many possibilities for condition-specific regulation.
The differences between direct and indirect targets are borne out by the GO term analysis in which we see a shift between GO terms at direct targets (more transport- and biosynthesis-related genes) and those at indirect targets (flagellum-associated genes, among others). This could point to organization at a temporal level; the genes needing most urgent regulation (such as those involved directly in importing or generating needed nutrients) may be under direct Lrp control, while genes requiring less urgent modulation and instead governing foraging strategies may be indirectly regulated by Lrp. Many of the identified GO terms include genes previously implicated as Lrp targets, indicating agreement with previous work. However, newly identified targets and novel patterns of regulation (such as poised binding) suggest that further work on the mechanistic aspects of Lrp regulation is important.
Poised Lrp binding argues for interaction with coregulatory factors.
From our experiments, we identify many points at which Lrp binds the regulatory region of a gene without producing an effect on transcription, and even points at which an apparently identical Lrp binding pattern has no effect on transcription under one condition but has a substantial effect under another. Given that Lrp binding is enriched in regulatory regions relative to other locations in the genome, this argues against a purely DNA-organizing role for these poised sites. If that were the case, we would expect Lrp binding sites (the majority of which are poised sites under any condition) to be distributed more evenly across the genome. The idea of poised regulation is not without precedent, as poised regulation has also been reported for some eukaryotic transcription factors, such as the tumor suppressor p53 in binding to the mdm2 gene (19). Therefore, while Lrp itself is not conserved in eukaryotes, its ability to bind without regulating may have parallels to eukaryotic regulation, suggesting convergent evolution to a similar regulatory scheme.
There are several possibilities for why Lrp may not have regulatory function in all cases where it binds, including that (i) Lrp acts as a scaffold to interact directly with other proteins which are only present under certain conditions and modulate transcription, (ii) Lrp wraps DNA in order to control DNA accessibility of other regulators, reminiscent of eukaryotic histone-like behavior, and/or (iii) switching between the presence of a Lrp octamer or hexadecamer may control or influence the regulatory behavior of Lrp. We investigated the first possibility by analyzing if certain σ factors or transcription factors might be responsible for the condition-dependent regulation on a global scale (see Text S3). While many potential connections appear to be cases of convergent regulation, we do find that Lrp may facilitate NtrC-σ54 interaction by binding and bending DNA. This would agree with the connection between Lrp and nitrogen metabolism regulation seen previously in genome-wide studies (32). Analogous interactions with other transcription or regulatory factors may explain other poised/direct target transitions. For example, Lrp interaction with H-NS is important for regulating rRNA promoters (33), and Lrp competition with DNA adenine methyltransferase is critical in regulating expression of the pap operon, which produces pili (34). In addition, nonprotein small molecules, like guanosine tetraphosphate (ppGpp), are known to affect some Lrp-regulated target genes (35). Finally, although we do not see global evidence in our analysis, gene-level studies have previously implicated Lrp in interacting with σ38 (36, 37). Further studies are needed to investigate Lrp’s interactions with other regulatory factors and the alternate mechanisms proposed above. 
We must also acknowledge the possibility that some fraction of the always-poised sites present in our data set are in fact false positives; some false-positive rate is essentially unavoidable in a high-throughput experiment of this type, and thus, the behavior of any particular site can only be resolved with certainty through a targeted follow-up experiment. However, several lines of evidence point to the majority of poised sites being genuine and likely being cases where Lrp binding will play a regulatory function under an as-yet-unstudied condition, as follows: our ChIP-seq peak calling pipeline is designed to err toward being conservative; even the low-intensity binding sites near direct targets called in our study have levels of regulatory activity similar to those identified in previous experiments (see Fig. S4 in the supplemental material); and extrapolation from our existing set of conditions suggests that while we have neared saturation in our discovery of poised targets, the fraction of poised targets that become direct under at least one condition is likely to increase upon consideration of additional conditions not studied here (Fig. S9).
Lrp binding activity is partially predicted by known sequence motifs.
While we detected a preference for Lrp binding at several previously identified related motifs and AT-rich regions, there is still a substantial subset of peaks that are not predicted by these models. We were unable to improve Lrp binding prediction from additional sequence determinants despite application of several state-of-the-art motif finders. As mentioned above, this could be due to Lrp binding initially at a sequence-specific location and subsequent Lrp molecules binding due to cooperativity and the high local concentration of Lrp molecules provided by Lrp’s oligomeric nature. Alternatively, Lrp itself may be recruited by other proteins. Due to Lrp’s relatively high nonspecific DNA binding affinity, especially under rich conditions (14), it is reasonable to find that not all of its binding locations can be predicted based on sequence alone. It is again important to note that the switch in DNA binding specificity occurs regardless of the levels of leucine, suggesting that other small-molecule regulators (16) or potentially posttranslational modifications (38, 39) may play a role in Lrp regulatory activity. Additionally, despite extensive effort, we were unable to identify any sequence determinants capable of reliably explaining Lrp regulatory activity, either through predicting transitions from poised to active regulation or distinguishing Lrp activation from Lrp repression. Possible mechanisms for this behavior include interactions with condition-specific factors that bind near the multifunctional Lrp sites (many potential partners have likely not yet been characterized), condition-dependent DNA looping triggered by the binding of Lrp to nearby sites or by octamer-hexadecamer transitions, or posttranslational modifications to Lrp itself. Dissecting the detailed molecular mechanisms underlying the binding and regulatory landscape that we have revealed here will be a fruitful area for future research.
MATERIALS AND METHODS
Strains and media.
The WT strain used in this study was E. coli K-12 MG1655 (ATCC 47076). The Lrp deletion strain was constructed by homologous recombination, resulting in the insertion of a kanamycin resistance cassette (40). The primers used for strain construction and validation are listed in Table S4 in the supplemental material. The lrp::kanR mutant strain was validated by sizing of the P965/P1568/P1569 products and Sanger sequencing.
All routine cell growth during cloning was done in LB medium (10 g/liter tryptone, 5 g/liter yeast extract, 5 g/liter NaCl) or on LB plates (LB medium plus 15 g/liter Bacto agar) supplemented with 50 μg/ml kanamycin or 100 μg/ml ampicillin (both from United States Biological, Salem, MA) as required. For the ChIP-seq and RNA samples, a single colony of wild-type E. coli or the lrp::kanR mutant strain was inoculated into MOPS medium (Teknova, Hollister, CA) with 0.04% glucose (27) and grown overnight. The cells were then back-diluted to an optical density at 600 nm (OD600) of 0.003 in 100 ml of the appropriate target medium. Experiments were performed in MOPS with 0.2% glycerol (the MIN medium condition), MOPS with 0.2% glycerol and 0.2% (wt/vol) each leucine (Amresco, Solon, OH), isoleucine (Alfa-Aesar, Haverhill, MA), and valine (Amresco; the LIV condition), or MOPS plus 0.4% glycerol, ACGU, and EZ supplements (Teknova; the RDM condition). The medium conditions are summarized in Table S5.
The cells were grown at 37°C with shaking (200 rpm) until the OD600 was between 0.15 and 0.25 (for log-phase samples), between 1.8 and 2.2 (for transition point in MIN or LIV medium), between 2.3 and 2.7 (for transition point in RDM), or 12 h past the log point (for stationary-phase samples). The same incubator was used for all cell growth in order to limit variation in temperature or aeration. The OD600 range for transition point harvest was determined by monitoring the growth of cells grown under conditions identical to those of the experiment and selecting the point in the OD600 range during which exponential growth becomes nonlinear when visualized on a log scale (Fig. S1). Logarithmic-growth rates for all samples are summarized in Fig. S10. Associated data for growth rates can be found in Data File S2.
ChIP-seq.
At the appropriate time, either WT or lrp::kanR mutant cells were cross-linked by adding formaldehyde (37%; Sigma-Aldrich, St. Louis, MO) to a final concentration of 1% (vol/vol) and incubated with shaking for 15 min at room temperature. Formaldehyde cross-linking was quenched by the addition of Tris (pH 8) to a final concentration of 280 mM and incubation with shaking at room temperature for 10 min. The culture was then immediately centrifuged for 5 min at 5,500 × g at 4°C. The pellet was washed twice with 30 ml ice-cold TBS (50 mM Tris, 150 mM NaCl [pH 7.5]) before being resuspended in 1 ml TBS. Following a 3-min centrifugation at 10,000 × g at 4°C and removal of the supernatant, the pellet was flash-frozen in a dry ice-ethanol bath and then stored at −80°C. Two biological replicates grown on different days were prepared for each condition.
The cell pellet was resuspended in lysis buffer (phosphate-buffered saline [PBS], 0.1% Tween 20, 1 mM EDTA, 1× cOmplete Mini EDTA-free protease inhibitors [Roche, Basel, Switzerland], 0.6 mg lysozyme [Amresco, Solon, OH]), vortexed for 3 s, and incubated at 37°C for 30 min. The sample was then sonicated in 3 bursts of 10 s each at 25% power (Branson Digital Sonifier). Cellular debris was removed by centrifugation at 16,000 × g for 10 min at 4°C. To obtain an accurate representation of the isolated pool of DNA before the extraction procedure, 50 μl of the supernatant was removed and mixed with EDTA to 8.6 mM and 235 μl elution buffer (50 mM Tris [pH 8], 10 mM EDTA, 1% SDS [vol/vol]) to be the input sample. The remainder of the lysate was added to 50 μl prewashed SureBeads protein G magnetic beads (Bio-Rad, Hercules, CA) and rocked for 1 h at room temperature for preclearing. A separate aliquot of 100 μl of prewashed SureBeads protein G magnetic beads was incubated with 10 μg Lrp monoclonal antibody (NeoClone, Madison, WI) for 10 min at room temperature with rocking and then washed thrice with PBS–0.1% Tween 20 before the precleared supernatant was added. The bead-lysate mixture was again incubated with rocking for 1 h at room temperature. The beads were then washed thrice with PBS–0.1% Tween 20. To elute the cross-linked Lrp-DNA complexes, the beads were resuspended in 285 μl elution buffer and incubated at 65°C for 20 min, with vortexing every 5 min. The resulting eluate was incubated overnight at 65°C to reverse the cross-links.
The sample was treated with 0.05 mg RNase A (Thermo Fisher, Waltham, MA) for 2 h at 37°C and then 0.2 mg proteinase K (Thermo Fisher) for 2 h at 50°C before the DNA was isolated by phenol-chloroform extraction and ethanol precipitation. The samples were quantified (QuantiFluor double-stranded DNA [dsDNA] kit; Promega, Madison, WI) and prepared for sequencing using the NEBNext Ultra DNA library prep kit for Illumina (NEB, Ipswich, MA). The library was checked for quality by 2% agarose gel electrophoresis using GelRed stain (Biotium, Fremont, CA). Samples were pooled and the sequencing performed on an Illumina NextSeq 500, with 38 × 37-bp paired-end reads. We obtained at least 3,000,000 reads that passed all filters and aligned properly to the genome per biological replicate, with an average of 9,000,000 reads per replicate (Table S6). Input samples were treated identically to the ChIP-extracted samples beginning at the overnight incubation to reverse the cross-links.
RNA-seq.
For RNA-seq samples in both WT and lrp::kanR mutant cells, 2.5 ml of culture was removed when cells had reached the appropriate OD and mixed with 5 ml RNAprotect bacteria reagent (Qiagen, Hilden, Germany), vortexed, incubated for 5 min at room temperature, and then centrifuged for 10 min at 5,000 × g in a fixed-angle rotor at 4°C. The supernatant was removed, and the pellet was flash-frozen in a dry ice-ethanol bath before being stored at −80°C. The pellet was resuspended in TE and treated with 177 kU Ready-Lyse lysozyme solution (Epicentre, Madison, WI) and 0.2 mg proteinase K (Thermo Fisher, Waltham, MA) for 10 min at room temperature, with vortexing every 2 min. The RNA was purified using the RNA Clean & Concentrator kit (Zymo, Irvine, CA), treated with 5 units Baseline-ZERO DNase (Epicentre) in the presence of RNase inhibitor (NEB; Ipswich, MA) for 30 min at 37°C, and then again purified with the Zymo RNA Clean & Concentrator kit. RNA quality was assessed by electrophoresis in a denaturing agarose-guanidinium gel (41). rRNA depletion was performed using the Ribo-Zero rRNA removal kit for bacteria (Illumina, San Diego, CA), halving all reagent and input quantities but otherwise following the manufacturer’s instructions. cDNA synthesis and sequencing library preparation were performed following the NEBNext Ultra directional RNA library prep kit (NEB). The library was checked for quality by 2% agarose gel electrophoresis using GelRed stain (Biotium, Fremont, CA). Samples were pooled and the sequencing performed on a NextSeq 500 platform at the University of Michigan’s DNA Sequencing Core Facility.
Preprocessing of ChIP-seq data.
Sequencing adapters were removed from all sequences using CutAdapt version 1.8.1 (42) with parameters -a AGATCGGAAGAGC -A AGATCGGAAGAGC -n 3 -m 20 --mask-adapter --match-read-wildcards. Low-quality reads were trimmed with Trimmomatic version 0.32 (43) using the parameters TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20. The quality of the raw and preprocessed fastq files was assessed using FastQC version 0.10.1 (44) and MultiQC version 1.2 (45). The numbers of raw and surviving reads for each sample are described in Table S6.
Alignment of ChIP-seq data.
All samples were aligned to the MG1655 (GenBank accession no. U00096.2) genome modified to match the insertions and deletions for the ATCC 47076 variant of E. coli MG1655 as reported by Freddolino et al. (46). Alignments were performed using Bowtie version 2.1.0 (47) and arguments -X 2000 -q --end-to-end --very-sensitive -p 5 --phred33 --dovetail in order to maximize the sensitivity of the alignment. The final alignment rates for each sample are described in Table S6.
Calculation of ChIP-seq summary signal.
The amount of Lrp-mediated DNA enrichment under any given experimental condition or genotype is represented by two different sequencing reactions: an extracted sample, where the DNA cross-linked to Lrp is extracted and purified using a specific monoclonal antibody, and a matched input sample (taken from the same tube after lysis and digestion), where the total input DNA before the extraction procedure is sequenced. Here, references to the extracted and input samples refer to the definitions given above for any given pair of samples for each combination of experimental condition and genotype. To determine the raw enrichment for a set of paired extracted and input samples, the coverage c of paired-end reads at every tenth base pair n across the genome was calculated from the alignments for the ChIP-extracted and input reads for each sample separately using SAMtools (48) and custom python scripts. The raw read coverage for extracted and input samples was then scaled using the median coverage across the genome for each individual track to account for differences in sequencing depth between the two samples. The median coverage was chosen as a scaling factor, as it represents an estimator of the baseline read coverage in a given sample that is less impacted by the exact heights of the peaks within that sample. The raw enrichment (RE) was calculated using the log2 ratio of scaled extracted to input coverage separately for each pair of extracted and input samples, as shown below:
$$\mathrm{RE}_n = \log_2\left(\frac{c_n^{E} / \operatorname{median}(c^{E})}{c_n^{I} / \operatorname{median}(c^{I})}\right)$$
where E and I denote the extracted and input samples, respectively. Thus, the RE represents a log2-transformed ratio of normalized extracted DNA abundance to normalized input DNA abundance. In addition, these log-transformed ratios put each sample, for each condition, on the same scale, removing the need for additional normalization between samples, thereby allowing for a direct comparison between REs from any given genotype or experimental condition.
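The median scaling and log2 ratio described above can be sketched in a few lines of Python. This is a minimal illustration rather than the scripts used in the study: the function name `raw_enrichment` and the handling of zero coverage (returning NaN) are assumptions, since the text does not state how positions without coverage were treated.

```python
import math
from statistics import median


def raw_enrichment(extracted_cov, input_cov):
    """Median-scaled log2(extracted / input) coverage ratio per sampled position.

    Both arguments are coverage values sampled at the same genomic
    positions (every tenth base pair in the text).
    """
    med_e = median(extracted_cov)
    med_i = median(input_cov)
    re_track = []
    for c_e, c_i in zip(extracted_cov, input_cov):
        if c_e <= 0 or c_i <= 0:
            # ASSUMPTION: zero-coverage handling is not described in the
            # text; the ratio is marked as undefined here.
            re_track.append(float("nan"))
            continue
        re_track.append(math.log2((c_e / med_e) / (c_i / med_i)))
    return re_track


if __name__ == "__main__":
    # A single peak in the extracted track over a flat input track.
    print(raw_enrichment([10, 20, 40, 20, 10], [5, 5, 5, 5, 5]))
```

Because each track is divided by its own median before the ratio is taken, a position at the baseline of both tracks yields an RE of 0 regardless of sequencing depth.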
Unique to this experimental setup is the use of Lrp knockout samples as an additional control to account for any biases from the antibody-mediated extraction procedure. Possible biases could include antibody interactions with the DNA or low-level cross-reactivity with other cross-linked proteins during the extraction procedure. To the best of our knowledge, no existing ChIP-seq analysis pipeline is able to use two separate sets of control information from both an input control sample and an entirely separate set of extracted and input samples under the knockout genotype for the protein of interest. Therefore, we set out to create our own pipeline tailored to this experimental design with the goal of minimizing the high rate of false positives commonly seen in ChIP-seq experiments. We use the lrp::kanR mutant samples to remove enrichment that also exists in the absence of Lrp by subtracting the lrp::kanR mutant RE signal from the Lrp WT RE signal to obtain a raw subtracted enrichment signal (RSE) for any combination of WT and lrp::kanR mutant replicates within a single experimental condition (see Fig. S11B and C for an example of how this subtraction removes false-positive peaks). The RSE is represented mathematically below:
$$\mathrm{RSE}_n = \mathrm{RE}_n^{\mathrm{WT}} - \max\left(\mathrm{RE}_n^{\text{lrp::kanR}},\, 0\right)$$
The max function in the equation above ensures that the lrp::kanR signal is subtracted only if its RE was positive. Since both the Lrp WT RE signal and the lrp::kanR RE signal represent normalized log-transformed ratios, they can be directly subtracted without additional normalization between the samples from the two genotypes. The RSE can be interpreted as how much more enrichment with the monoclonal antibody over the purified DNA is obtained when Lrp is present in the WT genotype compared to when Lrp is not present in the lrp::kanR genotype. For each experimental condition in this paper, we generated two Lrp WT replicates and two lrp::kanR mutant replicates. The Lrp WT and lrp::kanR samples are not paired; therefore, we took each combination of a WT Lrp RE replicate and an lrp::kanR RE replicate to generate a raw subtracted enrichment signal representing the Lrp WT-lrp::kanR signal. This results in four possible RSEs for each condition and time point (i.e., WT rep1-KO rep1, WT rep1-KO rep2, WT rep2-KO rep1, and WT rep2-KO rep2). We next converted each of the RSE scores to a robust Z-score so that enrichments between different experimental conditions could more easily be interpreted on a universal scale. For each replicate pairing, the raw subtracted Lrp enrichment signals were converted to robust Z-score estimates (RZ) using the following formula:
$$\mathrm{RZ}_n = \frac{\mathrm{RSE}_n - \operatorname{median}(\mathrm{RSE})}{1.4826 \times \operatorname{MAD}(\mathrm{RSE})}$$
Here, the 1.4826 is a standard scaling factor used to convert the median absolute deviation (MAD) in the denominator into an estimator for the standard deviation under the assumption that the values follow a normal distribution (49, 50). This allows the RZ to be treated as a proper Z-score. The RZ replicates were then averaged to generate a final occupancy signal for visualization and estimates of the ChIP signal at a peak summit. The reproducibility of both the RE and RSE for each replicate can be seen in Fig. S11A.
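The subtraction, replicate pairing, and robust Z-score conversion described above can be illustrated with the following Python sketch. It is an approximation based on the prose, not the code used in the study; all function names are hypothetical, and the sketch assumes that the MAD of each track is nonzero.

```python
from itertools import product
from statistics import median

MAD_TO_SD = 1.4826  # scaling factor quoted in the text


def subtracted_enrichment(re_wt, re_ko):
    """RSE: subtract the knockout RE only where it is positive."""
    return [wt - max(ko, 0.0) for wt, ko in zip(re_wt, re_ko)]


def robust_z(track):
    """Center on the median and scale by 1.4826 * MAD (assumes MAD > 0)."""
    med = median(track)
    mad = median(abs(v - med) for v in track)
    scale = MAD_TO_SD * mad
    return [(v - med) / scale for v in track]


def all_pairings(wt_reps, ko_reps):
    """One RSE track per WT/knockout replicate combination (2 x 2 = 4)."""
    return [subtracted_enrichment(wt, ko) for wt, ko in product(wt_reps, ko_reps)]


def mean_track(tracks):
    """Position-wise average across tracks of equal length."""
    return [sum(col) / len(col) for col in zip(*tracks)]


if __name__ == "__main__":
    wt = [[0.0, 1.0, 2.0, 3.0, 4.0], [0.2, 1.1, 2.5, 2.9, 4.2]]
    ko = [[-1.0, 0.5, 0.0, 1.0, -2.0], [0.0, 0.1, 0.3, 0.8, 0.0]]
    rz_tracks = [robust_z(rse) for rse in all_pairings(wt, ko)]
    print(mean_track(rz_tracks))
```

Clamping the knockout signal at zero means that depletion in the lrp::kanR samples can never inflate the WT signal; only enrichment that persists without Lrp is removed.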
Determination of high-confidence Lrp binding sites.
To determine regions of high-confidence Lrp binding, we required Lrp enrichment to satisfy the following three criteria: (i) the enrichment must be technically reproducible, (ii) the enrichment must be above the input background, and (iii) the enrichment must be biologically reproducible. The following paragraphs detail how each of these criteria was assessed.
Assessment of technical reproducibility of Lrp enrichment.
To assess the technical reproducibility of the Lrp enrichment, we used custom python scripts to sample with replacement from the aligned reads separately for each paired extracted and input sample. The RSE for each of the four possible subtracted Lrp WT versus lrp::kanR mutant replicates was calculated, as described in the section “Calculation of ChIP-seq summary signal,” above, for each of 1,000 bootstrap replicates. To test for technically reproducible enrichment, we considered a null hypothesis that the RSE is normally distributed centered at 0. A Z-score for each location n was then determined as follows:
where RSE0 is the unsampled data set and RSEB represents the bootstrap replicates for which m = 1:1,000. The resulting Z-score was converted to a P value using a one-sided Z test through the scipy.stats normal cumulative distribution function (51). These P values were false-discovery rate (FDR) corrected using the procedure described by Benjamini and Hochberg (52). A region was considered to be technically reproducible if its q value was less than 0.001.
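A rough Python sketch of the test statistic, one-sided P value, and Benjamini-Hochberg correction follows. The exact Z-score formula is not reproduced in the text here, so the sketch assumes that Z is the unsampled RSE divided by the standard deviation of the bootstrap RSE values at that location; this form is an assumption, as are all function names. The study used scipy.stats for the normal cumulative distribution function, whereas the sketch uses the standard-library `statistics.NormalDist` to remain self-contained.

```python
from statistics import NormalDist, pstdev


def bootstrap_z(rse_unsampled, rse_bootstraps):
    """Z-score for one location under a null centered at 0.

    ASSUMPTION: the denominator is the standard deviation of the
    bootstrap replicates; the source formula is not shown in the text.
    """
    return rse_unsampled / pstdev(rse_bootstraps)


def one_sided_p(z):
    """Upper-tail P value from the standard normal distribution."""
    return 1.0 - NormalDist().cdf(z)


def benjamini_hochberg(pvals):
    """Return q values in the same order as the input P values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        qvals[idx] = running_min
    return qvals


if __name__ == "__main__":
    z_scores = [bootstrap_z(2.0, [1.5, 2.5, 2.0, 1.8]), bootstrap_z(0.1, [-0.4, 0.5, 0.2, 0.0])]
    q_values = benjamini_hochberg([one_sided_p(z) for z in z_scores])
    print([q < 0.001 for q in q_values])
```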
Assessment of Lrp-specific enrichment.
To assess enrichment of ChIP signal above the input background and to differentiate from off-target antibody enrichments seen in pulldown using the lrp::kanR mutant strain, an RZ score (see above) was calculated for each of the four possible combinations of WT-lrp::kanR replicates, yielding a positive signal only when the WT pulldown value was substantially above that of the lrp::kanR signal. We then tested for enrichment of the RZ score above the median signal for that track through the use of a one-sided Z-test using scipy.stats normal cumulative distribution function and FDR correction of the resulting P value to a q value. To be considered enriched above background, a region was required to have an enrichment q value of less than 0.001.
Assessment of biological reproducibility.
To assess the biological reproducibility of each region n, the irreproducible discovery rate (IDR) (53) was calculated for each data point between the RSE signals for each possible pairing of the four Lrp WT-lrp::kanR combinations for each condition and time point (i.e., WT1-KO1 and WT2-KO2; WT1-KO2 and WT2-KO1). Starting parameters for the IDR calculation for each condition included μ = 0.0, σ = 1.4826, ρ = 0.1, and an associated weight (W) based on the estimated number of bound Lrp octamers, Lx, for each nutrient condition x as determined in reference 14:
where L for MIN (LMIN) = 684, L for LIV (LLIV) = 616, and L for RDM (LRDM) = 188. A region was considered to be biologically reproducible if the FDR-corrected IDR q value for regions passing the previous two filters was less than 0.01 for both combinations of RSE replicates.
Combining enrichment and reproducibility into final peaks.
A region n was called as a final peak if it passed the biological reproducibility filter (using each possible pair of RSE signals) and at least one of the four subtracted replicate combinations passed both the technical and enrichment filters (using each possible RSE signal). Adjacent passing regions were consolidated into one region if they were within 30 bp of each other. The applied cutoffs and other thresholds were confirmed to be reasonable through manual inspection of called peaks and candidate peaks that narrowly missed one or more cutoffs. An example peak in comparison to a non-Lrp-specific peak can be seen in Fig. S11B and C.
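The combination of filters and the consolidation of adjacent regions can be sketched as follows. The function names are hypothetical, regions are represented as inclusive (start, end) tuples, and the sketch assumes that "within 30 bp" means a gap of at most 30 bp between the end of one region and the start of the next; the text does not specify these details.

```python
def is_final_peak(bio_pass, tech_pass, enrich_pass):
    """Combine the three filters for one region.

    bio_pass:    one flag per pair of RSE signals (both must pass).
    tech_pass:   one flag per subtracted replicate combination (four).
    enrich_pass: one flag per subtracted replicate combination (four).
    """
    same_combination_passes = any(t and e for t, e in zip(tech_pass, enrich_pass))
    return all(bio_pass) and same_combination_passes


def merge_regions(regions, max_gap=30):
    """Merge passing regions separated by at most max_gap bp (assumed meaning)."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


if __name__ == "__main__":
    print(is_final_peak([True, True], [False, True, False, False], [True, True, False, False]))
    print(merge_regions([(100, 110), (130, 150), (200, 210)]))
```

Requiring the technical and enrichment filters to pass for the same replicate combination is the stricter reading of the text; a looser reading would allow them to pass in different combinations.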
Preprocessing of RNA-seq data.
Similar to the ChIP-seq reads, sequencing adapters were removed from all sequences using CutAdapt version 1.8.1 (42) with parameters --quality-base=33 -a AGATCGGAAGAGC -A AGATCGGAAGAGC -n 3 -m 20 --mask-adapter --match-read-wildcards. Low-quality reads were trimmed with Trimmomatic version 0.32 (43) using the parameters LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20. The quality of the raw and preprocessed fastq files was assessed using FastQC version 0.10.1 (44) and MultiQC version 1.2 (45). The numbers of raw and surviving reads for each sample are described in Table S7.
Filtering highly abundant RNAs from analysis.
In some, but not all, of our samples, as much as 70% of our RNA-seq reads were ribosomal reads or the highly abundant RNA products from ssrA and ssrS (Table S7). To filter highly abundant RNA reads and thus avoid having variations in ribosome depletion efficiency interfere with proper normalization, all RNA-seq reads were aligned using Bowtie 2 version 2.1.0 (47) to the same ATCC 47076-modified version of the U00096.2 genome used for the ChIP-seq data. The following parameters were used for Bowtie 2: -q --end-to-end --very-sensitive -p 5 --phred33 --dovetail. The subsequent alignments were parsed for reads that overlapped with ribosomal reads in a strand-specific manner using custom python scripts. New fastq files were written that only included reads that did not overlap ribosomal reads, and these files were used for downstream gene expression analyses. In all replicates, at least two million reads survived this final filter, with the smallest replicate containing 2.6 million reads after filtering (Table S7).
Gene-centric quantification of RNA-seq data.
Gene-centric quantification of RNA expression for all samples was performed using kallisto version 0.43.0 (54) with the arguments quant -t 4 -b 100 --rf-stranded. The appropriate transcriptome file needed for alignment was created through converting the GeneProductSet data set from RegulonDB version 9.4 (17) to the appropriate ATCC 47076 coordinates and input file format for kallisto using custom python scripts.
Determination of Lrp-dependent changes in transcription.
To determine Lrp-dependent changes in transcription, we used kallisto’s companion postprocessing data analysis software, sleuth (55), to model the transcript abundance for each condition and time point. We tested for differential expression between the WT and lrp::kanR mutant strains separately for each condition and time point by using a Wald test on the genotype term of the simple model: transcript abundance ∼ genotype; here, the lrp::kanR is the baseline condition. Additionally, we used the bootstrapped read counts from kallisto to calculate the average WT to lrp::kanR mutant expression ratio. We first normalized the count k for each gene i using a scaling factor, sj, for each replicate j as adapted from equation 5 in reference 56 and shown below:
with N indicating the total number of genes. Expression for gene i in replicate j is thus
We then calculated the log2 expression ratio (expr ratio) between WT and lrp::kanR, as shown below:
Transcripts that passed both an FDR-corrected P value of less than 0.05 and a log2 expression ratio magnitude of greater than log2(1.5) were considered to have a significant Lrp-dependent RNA expression change under that condition.
To obtain a maximally conservative credible interval on the log2 expression ratio, we calculated the 95% credible interval on the log2 ratio of each of the four possible WT replicate to lrp::kanR mutant replicate pairs across all 100 bootstrap replicates performed by kallisto. We then chose the minimum of the minimum credible intervals of all possible pairs and the maximum of the maximum credible intervals of all possible pairs to report in each of our RNA-seq plots. This credible interval is on average two times larger than a credible interval obtained from bootstrap replicates of the average expression ratio and best represents the true uncertainty of each ratio.
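The significance thresholds and the conservative interval described above reduce to a few comparisons, sketched here in Python. The function names are hypothetical, and the per-pair 95% credible intervals are taken as given inputs because the text does not describe how they were computed from the bootstrap replicates.

```python
import math

LOG2_RATIO_CUTOFF = math.log2(1.5)  # biological cutoff from the text


def is_lrp_dependent(q_value, log2_ratio, q_cutoff=0.05):
    """Significant if both the FDR and the effect-size thresholds are met."""
    return q_value < q_cutoff and abs(log2_ratio) > LOG2_RATIO_CUTOFF


def conservative_interval(pair_intervals):
    """Envelope of per-pair credible intervals: (min of lows, max of highs)."""
    lows = [low for low, _ in pair_intervals]
    highs = [high for _, high in pair_intervals]
    return min(lows), max(highs)


if __name__ == "__main__":
    # One (low, high) interval per WT replicate / lrp::kanR replicate pair.
    intervals = [(-0.2, 0.4), (0.1, 0.9), (-0.5, 0.3), (0.0, 0.6)]
    print(conservative_interval(intervals))
    print(is_lrp_dependent(0.01, 1.0))
```

Taking the envelope of the four pairwise intervals cannot produce an interval narrower than any single pair, which is why it is described as maximally conservative.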
Antibody development and testing.
The monoclonal antibody used in these experiments was developed via a contract with NeoClone (Madison, WI). Using purified His-tagged Lrp, several rounds of potential antibodies were developed. The potential antibodies were tested for cross-reactivity with the known Lrp homologues AsnC and YbaO by enzyme-linked immunosorbent assay (ELISA) at NeoClone. We used an in vitro DNA pulldown assay to ensure that the potential antibodies did not inhibit Lrp-DNA binding (Fig. S12A). In addition, we tested the antibody for use in Western blotting (Fig. S12B). We also confirmed that the antibody did not bind the oligomerization interface by observing bands corresponding to Lrp octamers and hexadecamers in native Western blots (data not shown).
Filtering of genes into Lrp-dependent categories.
For gene target filtering, we established four categories through a two-level filtering scheme (Fig. 2A). We first tested whether the gene had an Lrp-dependent change in RNA expression by comparing the target gene’s expression in WT and lrp::kanR mutant strains using a Wald test, as described above. We next asked if the gene had any overlapping high-confidence Lrp binding site, as defined above, within the regulatory region (defined as 250 bp upstream and downstream from the annotated transcription start site [TSS; annotations from RegulonDB {17}]). If multiple TSSs were annotated for a gene, the regulatory region included 250 bp upstream of the most distal TSS and 250 bp downstream of the most proximal TSS. For unannotated TSSs present within the RegulonDB PromoterSet, we assigned the TSS to the nearest downstream gene within 500 bp based on the ORF definitions in RegulonDB’s GeneProductSet, and any TSS that fell outside this range was left unassigned. This automated TSS assignment is consistent with those used in other similar applications (e.g., reference 57).
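The regulatory-region definition lends itself to a short illustration. In the sketch below (hypothetical function names, inclusive coordinates assumed), extending 250 bp upstream of the most distal TSS and 250 bp downstream of the most proximal TSS is equivalent, on either strand, to padding the span of all annotated TSSs by 250 bp on each side.

```python
def regulatory_region(tss_positions, flank=250):
    """Region covering all TSSs of a gene, padded by `flank` bp on each side."""
    return min(tss_positions) - flank, max(tss_positions) + flank


def overlaps(region_a, region_b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return region_a[0] <= region_b[1] and region_b[0] <= region_a[1]


def has_lrp_binding(tss_positions, peaks, flank=250):
    """True if any high-confidence peak overlaps the regulatory region."""
    region = regulatory_region(tss_positions, flank)
    return any(overlaps(region, peak) for peak in peaks)


if __name__ == "__main__":
    print(regulatory_region([1000, 1100]))
    print(has_lrp_binding([1000], [(1240, 1300)]))
```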
Genes were thus categorized as either a direct target (RNA expression change and Lrp binding), an indirect target (RNA expression change but no Lrp binding), a poised target (no RNA expression change but Lrp binding), or unconnected to Lrp (neither RNA expression change nor Lrp binding). For the additional classification of poised targets in Fig. S5A, we further divided the poised targets into four subcategories: poised, poised_nearby_direct, poised_uncertain_direct, and poised_unexplained. Poised targets were considered true poised targets if, under any other condition, they transitioned to a direct target. Of the genes that did not fit the true poised classification, those for which a nearby gene (with a promoter region within 1,000 bp of the poised gene) was a direct target within the same condition were classified as poised_nearby_direct. Failing those two classifications, genes for which the conservative credible interval on the log2 expression values (described in “Determination of Lrp-dependent changes in transcription” above) spanned our log2(1.5) ratio biological cutoff were considered poised_uncertain_direct, since we do not have enough information in our data to definitively say that the RNA expression is not impacted by Lrp under that condition. Finally, any poised gene not falling into the above-mentioned classifications was considered a poised_unexplained gene, representing genes where Lrp binds at the promoter but never regulates the transcription levels of the gene under the conditions studied here.
We also subcategorized the Lrp indirect genes into three categories: indirect, indirect_operon_direct, and indirect_peak_in_cds (Fig. S5B). Some genes classified as indirect were the result of alternative TSSs within an operon, for which Lrp binds the TSS of the first gene in the operon. Genes that fall under this category were considered indirect_operon_direct. Additionally, some genes had a called Lrp peak that overlapped the coding region of the gene; these genes were classified as indirect_peak_in_cds. All other indirect genes were classified as truly indirect.
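The two-level scheme and the ordered subclassification of poised targets can be expressed as a small decision procedure. The Python sketch below uses hypothetical function and argument names; each boolean input is assumed to have been computed beforehand by the criteria described in the text.

```python
def classify_gene(expression_change, lrp_bound):
    """Top-level category from the two filters."""
    if expression_change and lrp_bound:
        return "direct"
    if expression_change:
        return "indirect"
    if lrp_bound:
        return "poised"
    return "unconnected"


def classify_poised(direct_in_other_condition, nearby_direct, ci_spans_cutoff):
    """Subcategory of a poised gene; the checks are applied in this order."""
    if direct_in_other_condition:
        return "poised"
    if nearby_direct:
        return "poised_nearby_direct"
    if ci_spans_cutoff:
        return "poised_uncertain_direct"
    return "poised_unexplained"


if __name__ == "__main__":
    category = classify_gene(expression_change=False, lrp_bound=True)
    if category == "poised":
        category = classify_poised(False, True, True)
    print(category)
```

The order of the checks matters: a gene that satisfies several criteria is assigned to the first matching subcategory, mirroring the "failing those two classifications" wording of the text.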
For comparing enrichment of Lrp targets with σ factor or transcription factor targets, we used permutation tests as noted in the text, implemented using custom python scripts and 1,000 to 10,000 permutations. When testing for enrichment across several different σ factors or transcription factors, we corrected for multiple hypothesis testing with the python statsmodels.sandbox.stats.multicomp.multipletests implementation of the Benjamini-Hochberg method (52, 58).
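A generic permutation test for overlap between two gene sets might look like the following Python sketch. It is not the procedure used in the study: the null model (random gene sets of the same size as the Lrp target set, drawn without replacement from all genes), the add-one P value estimate, and every name in the code are assumptions made for illustration.

```python
import random


def permutation_enrichment_p(all_genes, lrp_targets, regulator_targets,
                             n_perm=1000, seed=0):
    """One-sided P value for overlap at least as large as observed."""
    rng = random.Random(seed)
    lrp = set(lrp_targets)
    regulator = set(regulator_targets)
    observed = len(lrp & regulator)
    genes = sorted(all_genes)  # fixed order keeps the sampling reproducible
    hits = 0
    for _ in range(n_perm):
        sampled = rng.sample(genes, len(lrp))
        overlap = sum(1 for gene in sampled if gene in regulator)
        if overlap >= observed:
            hits += 1
    # ASSUMPTION: add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    genes = [f"g{i:03d}" for i in range(100)]
    print(permutation_enrichment_p(genes, genes[:10], genes[5:15]))
```

When several regulators are tested, the resulting P values would then be corrected together, for example with the Benjamini-Hochberg method named in the text.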
All plots except where noted were created using ggplot2 (59) or Matplotlib (60). All genomic features above plots were created using the DNA features viewer python library (https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer).
Data availability.
Raw sequencing data have been deposited in the Gene Expression Omnibus under accession number GSE111874. Source code for standalone analysis of the sequencing data is publicly available from https://github.com/freddolino-lab/2018_Lrp_ChIP.
ACKNOWLEDGMENTS
We thank Robert Blumenthal for his advice and suggestions on this project.
This research was supported by NIH grant R00 GM097033 (to L.F.), NIH grant R35 GM128637 (to L.F.), and startup funds from the University of Michigan (to L.F.). G.M.K. is supported by the Cellular Biotechnology Training Program (grant T32-GM008353) and a one-time Rackham Research Grant from the University of Michigan. M.B.W. is supported by a National Science Foundation Graduate Research Fellowship (DGE 1256260).
The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
G.M.K. and L.F. planned the experiments, G.M.K. performed the experiments, M.B.W. performed the sequencing analysis, M.B.W., G.M.K., and L.F. analyzed the data, M.B.W., G.M.K., and L.F. prepared the manuscript, and L.F. supervised the work.
Footnotes
Supplemental material for this article may be found at https://doi.org/10.1128/jb.00411-18.
REFERENCES
- 1. Martínez-Antonio A, Collado-Vides J. 2003. Identifying global regulators in transcriptional regulatory networks in bacteria. Curr Opin Microbiol 6:482–489. doi: 10.1016/j.mib.2003.09.002.
- 2. Tani TH, Khodursky A, Blumenthal RM, Brown PO, Matthews RG. 2002. Adaptation to famine: A family of stationary-phase genes revealed by microarray analysis. Proc Natl Acad Sci U S A 99:13471–13476. doi: 10.1073/pnas.212510999.
- 3. Haney SA, Platko JV, Oxender DL, Calvo JM. 1992. Lrp, a leucine-responsive protein, regulates branched-chain amino acid transport genes in Escherichia coli. J Bacteriol 174:108–115. doi: 10.1128/jb.174.1.108-115.1992.
- 4. Willins DA, Ryan CW, Platko JV, Calvo JM. 1991. Characterization of Lrp, an Escherichia coli regulatory protein that mediates a global response to leucine. J Biol Chem 266:10768–10774.
- 5. Baek C-H, Wang S, Roland KL, Curtiss R. 2009. Leucine-responsive regulatory protein (Lrp) acts as a virulence repressor in Salmonella enterica serovar Typhimurium. J Bacteriol 191:1278–1292. doi: 10.1128/jb.01142-08.
- 6. Engstrom MD, Mobley HLT. 2016. Regulation of expression of uropathogenic Escherichia coli nonfimbrial adhesin TosA by PapB homolog TosR in conjunction with H-NS and Lrp. Infect Immun 84:811–821. doi: 10.1128/IAI.01302-15.
- 7. Cordone A, Lucchini S, Felice M, Ricca E. 2011. Direct and indirect control of Lrp on LEE pathogenicity genes of Citrobacter rodentium. FEMS Microbiol Lett 325:64–70. doi: 10.1111/j.1574-6968.2011.02411.x.
- 8. Parti RPS, Shrivastava R, Srivastava S, Subramanian AR, Roy R, Srivastava BS, Srivastava R. 2008. A transposon insertion mutant of Mycobacterium fortuitum attenuated in virulence and persistence in a murine infection model that is complemented by Rv3291c of Mycobacterium tuberculosis. Microb Pathog 45:370–376. doi: 10.1016/j.micpath.2008.08.008.
- 9. Hussa EA, Casanova-Torres ÁM, Goodrich-Blair H. 2015. The global transcription factor Lrp controls virulence modulation in Xenorhabdus nematophila. J Bacteriol 197:3015–3025. doi: 10.1128/jb.00272-15.
- 10. Lin W, Kovacikova G, Skorupski K. 2007. The quorum sensing regulator HapR downregulates the expression of the virulence gene transcription factor AphA in Vibrio cholerae by antagonizing Lrp- and VpsR-mediated activation. Mol Microbiol 64:953–967. doi: 10.1111/j.1365-2958.2007.05693.x.
- 11. de los Rios S, Perona JJ. 2007. Structure of the Escherichia coli leucine-responsive regulatory protein Lrp reveals a novel octameric assembly. J Mol Biol 366:1589–1602. doi: 10.1016/j.jmb.2006.12.032.
- 12. Chen S, Rosner MH, Calvo JM. 2001. Leucine-regulated self-association of leucine-responsive regulatory protein (Lrp) from Escherichia coli. J Mol Biol 312:625–635. doi: 10.1006/jmbi.2001.4955.
- 13. Chen S, Calvo JM. 2002. Leucine-induced dissociation of Escherichia coli Lrp hexadecamers to octamers. J Mol Biol 318:1031–1042. doi: 10.1016/S0022-2836(02)00187-0.
- 14. Chen S, Hao Z, Bieniek E, Calvo JM. 2001. Modulation of Lrp action in Escherichia coli by leucine: effects on non-specific binding of Lrp to DNA. J Mol Biol 314:1067–1075. doi: 10.1006/jmbi.2000.5209.
- 15.Cho B-K, Barrett CL, Knight EM, Park YS, Palsson BØ. 2008. Genome-scale reconstruction of the Lrp regulatory network in Escherichia coli. Proc Natl Acad Sci U S A 105:19462–19467. doi: 10.1073/pnas.0807227105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Hart BR, Blumenthal RM. 2011. Unexpected coregulator range for the global regulator Lrp of Escherichia coli and Proteus mirabilis. J Bacteriol 193:1054–1064. doi: 10.1128/jb.01183-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Gama-Castro S, Salgado H, Santos-Zavaleta A, Ledezma-Tejeida D, Muñiz-Rascado L, García-Sotelo JS, Alquicira-Hernández K, Martínez-Flores I, Pannier L, Castro-Mondragón JA, Medina-Rivera A, Solano-Lira H, Bonavides-Martínez C, Pérez-Rueda E, Alquicira-Hernández S, Porrón-Sotelo L, López-Fuentes A, Hernández-Koutoucheva A, Del Moral-Chávez V, Rinaldi F, Collado-Vides J. 2016. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Res 44:D133–D143. doi: 10.1093/nar/gkv1156. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Graunke DM, Fornace AJ, Jr, Pieper RO. 1999. Presetting of chromatin structure and transcription factor binding poise the human GADD45 gene for rapid transcriptional up-regulation. Nucleic Acids Res 27:3881–3890. doi: 10.1093/nar/27.19.3881. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Xiao G, White D, Bargonetti J. 1998. p53 binds to a constitutively nucleosome free region of the mdm2 gene. Oncogene 16:1171–1181. doi: 10.1038/sj.onc.1201631. [DOI] [PubMed] [Google Scholar]
- 20.Brinkman AB, Ettema TJG, De Vos WM, Van Der Oost J. 2003. The Lrp family of transcriptional regulators. Mol Microbiol 48:287–294. doi: 10.1046/j.1365-2958.2003.03442.x. [DOI] [PubMed] [Google Scholar]
- 21.Willins DA, Calvo JM. 1992. In vitro transcription from the Escherichia coli ilvIH promoter. J Bacteriol 174:7648–7655. doi: 10.1128/jb.174.23.7648-7655.1992. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Platko JV, Willins DA, Calvo JM. 1990. The ilvIH operon of Escherichia coli is positively regulated. J Bacteriol 172:4563–4570. doi: 10.1128/jb.172.8.4563-4570.1990. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Park PJ. 2009. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 10:669–680. doi: 10.1038/nrg2641. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Shimada T, Saito N, Maeda M, Tanaka K, Ishihama A. 2015. Expanded roles of leucine-responsive regulatory protein in transcription regulation of the Escherichia coli genome: genomic SELEX screening of the regulation targets. Microb Genom 1:e000001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Fang X, Sastry A, Mih N, Kim D, Tan J, Yurkovich JT, Lloyd CJ, Gao Y, Yang L, Palsson BO. 2017. Global transcriptional regulatory network for Escherichia coli robustly connects gene expression to transcription factor activities. Proc Natl Acad Sci U S A 114:10286–10291. doi: 10.1073/pnas.1702581114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Neumann M, Leimkühler S. 2008. Heavy metal ions inhibit molybdoenzyme activity by binding to the dithiolene moiety of molybdopterin in Escherichia coli. FEBS J 275:5678–5689. doi: 10.1111/j.1742-4658.2008.06694.x. [DOI] [PubMed] [Google Scholar]
- 27.Neidhardt FC, Bloch PL, Smith DF. 1974. Culture medium for enterobacteria. J Bacteriol 119:736–747. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Goodarzi H, Elemento O, Tavazoie S. 2009. Revealing global regulatory perturbations across human cancers. Mol Cell 36:900–911. doi: 10.1016/j.molcel.2009.11.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Zinser ER, Kolter R. 2000. Prolonged stationary-phase incubation selects for lrp mutations in Escherichia coli K-12. J Bacteriol 182:4361–4365. doi: 10.1128/jb.182.15.4361-4365.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Cui Y, Wang Q, Stormo GD, Calvo JM. 1995. A consensus sequence for binding of Lrp to DNA. J Bacteriol 177:4872–4880. doi: 10.1128/jb.177.17.4872-4880.1995. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Chen S, Iannolo M, Calvo JM. 2005. Cooperative binding of the leucine-responsive regulatory protein (Lrp) to DNA. J Mol Biol 345:251–264. doi: 10.1016/j.jmb.2004.10.047. [DOI] [PubMed] [Google Scholar]
- 32.Ishihama A, Shimada T, Yamazaki Y. 2016. Transcription profile of Escherichia coli: genomic SELEX search for regulatory targets of transcription factors. Nucleic Acids Res 44:2058–2074. doi: 10.1093/nar/gkw051. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Pul U, Wurm R, Wagner R. 2007. The role of LRP and H-NS in transcription regulation: involvement of synergism, allostery and macromolecular crowding. J Mol Biol 366:900–915. doi: 10.1016/j.jmb.2006.11.067. [DOI] [PubMed] [Google Scholar]
- 34.Peterson SN, Reich NO. 2008. Competitive Lrp and Dam assembly at the pap regulatory region: implications for mechanisms of epigenetic regulation. J Mol Biol 383:92–105. doi: 10.1016/j.jmb.2008.07.086. [DOI] [PubMed] [Google Scholar]
- 35.Traxler MF, Zacharia VM, Marquardt S, Summers SM, Nguyen H-T, Stark SE, Conway T. 2011. Discretely calibrated regulatory loops controlled by ppGpp partition gene induction across the “feast to famine” gradient in Escherichia coli. Mol Microbiol 79:830–845. doi: 10.1111/j.1365-2958.2010.07498.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Bouvier J, Gordia S, Kampmann G, Lange R, Hengge-Aronis R, Gutierrez C. 1998. Interplay between global regulators of Escherichia coli: effect of RpoS, Lrp and H-NS on transcription of the gene osmC. Mol Microbiol 28:971–980. doi: 10.1046/j.1365-2958.1998.00855.x. [DOI] [PubMed] [Google Scholar]
- 37.Colland F, Barth M, Hengge-Aronis R, Kolb A. 2000. σ factor selectivity of Escherichia coli RNA polymerase: role for CRP, IHF and Lrp transcription factors. EMBO J 19:3028–3037. doi: 10.1093/emboj/19.12.3028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Potel CM, Lin M-H, Heck AJR, Lemeer S. 2018. Widespread bacterial protein histidine phosphorylation revealed by mass spectrometry-based proteomics. Nat Methods 15:187–190. doi: 10.1038/nmeth.4580. [DOI] [PubMed] [Google Scholar]
- 39.Baeza J, Dowell JA, Smallegan MJ, Fan J, Amador-Noguez D, Khan Z, Denu JM. 2014. Stoichiometry of site-specific lysine acetylation in an entire proteome. J Biol Chem 289:21326–21338. doi: 10.1074/jbc.M114.581843. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Datsenko KA, Wanner BL. 2000. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci U S A 97:6640–6645. doi: 10.1073/pnas.120163297. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Goda SK, Minton NP. 1995. A simple procedure for gel electrophoresis and northern blotting of RNA. Nucleic Acids Res 23:3357–3358. doi: 10.1093/nar/23.16.3357. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17:10–12. doi: 10.14806/ej.17.1.200. [DOI] [Google Scholar]
- 43.Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
- 45.Ewels P, Magnusson M, Lundin S, Käller M. 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32:3047–3048. doi: 10.1093/bioinformatics/btw354.
- 46.Freddolino PL, Amini S, Tavazoie S. 2012. Newly identified genetic variations in common Escherichia coli MG1655 stock cultures. J Bacteriol 194:303–306. doi: 10.1128/jb.06087-11.
- 47.Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. doi: 10.1038/nmeth.1923.
- 48.Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 49.Hampel FR. 1974. The influence curve and its role in robust estimation. J Am Stat Assoc 69:383–393. doi: 10.1080/01621459.1974.10482962.
- 50.Rousseeuw PJ, Croux C. 1993. Alternatives to the median absolute deviation. J Am Stat Assoc 88:1273–1283. doi: 10.1080/01621459.1993.10476408.
- 51.Jones E, Oliphant T, Peterson P. 2001. SciPy: open source scientific tools for Python. http://www.scipy.org/.
- 52.Hochberg Y, Benjamini Y. 1990. More powerful procedures for multiple significance testing. Stat Med 9:811–818. doi: 10.1002/sim.4780090710.
- 53.Li Q, Brown JB, Huang H, Bickel PJ. 2011. Measuring reproducibility of high-throughput experiments. Ann Appl Stat 5:1752–1779. doi: 10.1214/11-AOAS466.
- 54.Bray N, Pimentel H, Melsted P, Pachter L. 2016. Near-optimal RNA-seq quantification with kallisto. https://liorpachter.wordpress.com/2015/05/10/near-optimal-rna-seq-quantification-with-kallisto/.
- 55.Pimentel H, Bray NL, Puente S, Melsted P, Pachter L. 2017. Differential analysis of RNA-seq incorporating quantification uncertainty. Nat Methods 14:687–690. doi: 10.1038/nmeth.4324.
- 56.Anders S, Huber W. 2010. Differential expression analysis for sequence count data. Genome Biol 11:R106. doi: 10.1186/gb-2010-11-10-r106.
- 57.Thomason MK, Bischler T, Eisenbart SK, Förstner KU, Zhang A, Herbig A, Nieselt K, Sharma CM, Storz G. 2015. Global transcriptional start site mapping using differential RNA sequencing reveals novel antisense RNAs in Escherichia coli. J Bacteriol 197:18–28. doi: 10.1128/jb.02096-14.
- 58.Seabold S, Perktold J. 2010. Statsmodels: econometric and statistical modeling with python. 9th Python in Science Conference. 28 June to 3 July 2010, Austin, TX.
- 59.Wickham H. 2009. ggplot2: elegant graphics for data analysis. Springer-Verlag, New York, NY.
- 60.Hunter JD. 2007. Matplotlib: a 2D graphics environment. Comput Sci Eng 9:90–95. doi: 10.1109/MCSE.2007.55.
- 61.Rex JH, Aronson BD, Somerville RL. 1991. The tdh and serA operons of Escherichia coli: mutational analysis of the regulatory elements of leucine-responsive genes. J Bacteriol 173:5944–5953. doi: 10.1128/jb.173.19.5944-5953.1991.
- 62.Pachkov M, Erb I, Molina N, van Nimwegen E. 2007. SwissRegulon: a database of genome-wide annotations of regulatory sites. Nucleic Acids Res 35:D127–D131. doi: 10.1093/nar/gkl857.
- 63.Grant CE, Bailey TL, Noble WS. 2011. FIMO: scanning for occurrences of a given motif. Bioinformatics 27:1017–1018. doi: 10.1093/bioinformatics/btr064.
Data Availability Statement
Raw sequencing data have been deposited in the Gene Expression Omnibus under accession number GSE111874. Source code for standalone analysis of the sequencing data is publicly available at https://github.com/freddolino-lab/2018_Lrp_ChIP.