Abstract
Bacteria regulate gene expression to adapt to changing environments through transcriptional regulatory networks (TRNs). Although extensively studied, no TRN is fully characterized, since the identity and activity of all the transcriptional regulators comprising a TRN are not known. Here, we experimentally evaluate 40 uncharacterized proteins in Escherichia coli K-12 MG1655 that were computationally predicted to be transcription factors (TFs). First, we used multiplexed chromatin immunoprecipitation combined with lambda exonuclease digestion (multiplexed ChIP-exo) to characterize binding sites for these candidate TFs; 34 of them were found to be DNA-binding proteins. We then compared the locations of these binding sites with those of RNA polymerase (RNAP) and found that 48% (283/588) of the TF binding sites overlap with RNAP binding. Finally, we used these data to infer potential functions for 10 of the 34 TFs with validated DNA binding sites and consensus binding motifs. Taken together, this study: (i) significantly expands the number of confirmed TFs to 276, close to the estimated total of about 280 TFs; (ii) provides putative functions for the newly discovered TFs; and (iii) confirms the functions of four representative TFs through mutant phenotypes.
INTRODUCTION
Bacteria employ a broad range of mechanisms to regulate gene expression to achieve and maintain phenotypic states (1). The primary mechanism by which gene expression is regulated in bacteria relies on promoter recognition by the RNA polymerase (RNAP) holoenzyme and its subsequent initiation of transcription (2). Because the core enzyme (two α subunits, β, β′ and ω) cannot by itself recognize promoters or initiate transcription, a sigma factor, which directly recognizes its target sequence, binds to the core enzyme to form the RNAP holoenzyme. This complex then orchestrates transcription initiation from specific promoters (1). In addition to sigma factors, transcription factors (TFs) bind to intergenic regulatory regions of DNA, preventing or promoting RNAP binding upstream of a transcription start site (3). Thus, identifying TFs and their associations with sigma factors is fundamental to understanding how an organism responds to varying phenotypic demands through transcriptional regulation.
A complete description of the Escherichia coli K-12 transcriptional regulatory network (TRN) is of particular importance to the scientific community because it provides fundamental information not only for unravelling the network architecture formed by individual regulators and their target genes, but also for studying the interactions among multiple regulators. Although E. coli K-12 MG1655 is one of the best-understood model organisms, our knowledge of its TRN is still incomplete (4,5). Reconstructing the global TRN requires identifying the full set of TFs and expanding the TRN with new chromatin immunoprecipitation (ChIP) data for individual TFs.
We previously developed a pipeline for computational prediction of TFs followed by experimental validation via ChIP technology (6,7). The first application of this pipeline discovered ten novel TFs in E. coli and identified their regulatory roles. In this study, to move closer to a complete characterization of the E. coli K-12 MG1655 TRN, we apply this pipeline again to characterize an additional 40 candidate TFs and their target genes. Specifically, we use a high-throughput method (multiplexed ChIP-exo) to generate large protein–DNA interaction datasets for these candidate TFs, RNAP and the sigma factor RpoD. Combining these data, we uncover 588 binding sites for 34 TFs out of the 40 initial candidates, 283 of which are located upstream of target genes at promoter regions also bound by RNAP. Based on the number of target genes, we classify these TFs into three groups (8): (i) one global regulator (>100 target genes), (ii) twenty-nine local regulators (<100 target genes) and (iii) four single-target regulators. We further explore the physiological roles of four representative TFs using gene expression profiling and mutant phenotype analysis. Our results show that the newly discovered TFs vary in their number of regulatory targets and participate in key cellular processes in E. coli K-12 MG1655, ranging from replication and transcription to nutrient metabolism and stress responses. Taken together, our results expand the total number of validated TFs to 276 (an increase of ∼12%) and support the estimated total of 280∼300 TFs comprising the TRN in E. coli K-12 MG1655 (9).
MATERIALS AND METHODS
Computational prediction of candidate TFs
We previously generated a list of candidate TFs and used 16 of the top candidates to assess the discovery pipeline (10). Ten of the 16 candidates were found to be TFs. Here, we extended the experimental validation of these computational predictions by selecting additional candidates from this list. Briefly, the list was generated using the TFpredict algorithm (11) modified for use with bacterial genomes (10). TFpredict takes a protein sequence as input and returns a score in the range [0,1] representing the likelihood, based on sequence homology, that the protein is a TF; a score of 1 represents the highest confidence. We selected 40 of the top candidate TFs from this rank-ordered list. See reference (10) for a full description of the computational methods.
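As a minimal illustration of cutting a rank-ordered candidate list down to its top entries, the Python sketch below sorts proteins by a TFpredict-style score in [0,1]. The function name `top_candidates` and the example scores are hypothetical; the actual scoring and ranking are described in reference (10).

```python
from typing import Dict, List


def top_candidates(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-scoring proteins, best first.

    `scores` maps a gene name to a TF-likelihood score in [0, 1];
    1 is the highest confidence that the protein is a TF.
    """
    for gene, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {gene} is outside [0, 1]: {score}")
    # Descending score; ties broken alphabetically so the ranking is reproducible.
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return ranked[:n]


# Hypothetical scores, for illustration only.
example_scores = {"geneA": 0.91, "geneB": 0.88, "geneC": 0.95, "geneD": 0.40}
print(top_candidates(example_scores, 2))  # ['geneC', 'geneA']
```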
Bacterial strains, media and growth conditions
The strains used in this study are E. coli K-12 MG1655 and its derivatives, including deletion strains and myc-tagged strains (Dataset S1). For ChIP-exo experiments, E. coli strains harboring an 8-myc tag at the C-terminus were generated by λ Red-mediated site-specific recombination as described previously (12). Glycerol stocks of these strains were inoculated into M9 minimal medium (47.8 mM Na2HPO4, 22 mM KH2PO4, 8.6 mM NaCl, 18.7 mM NH4Cl, 2 mM MgSO4 and 0.1 mM CaCl2) with 0.2% (w/v) glucose. The M9 minimal medium was also supplemented with 1 ml of 100X trace element solution containing 1 g EDTA, 29 mg ZnSO4·7H2O, 198 mg MnCl2·4H2O, 254 mg CoCl2·6H2O, 13.4 mg CuCl2 and 147 mg CaCl2. The culture was incubated at 37°C overnight with agitation and was then used to inoculate fresh medium (1/200 dilution), 150 ml per biological replicate. The fresh culture was incubated at 37°C with agitation to mid-log phase (optical density at 600 nm (OD600) ≈ 0.5). To induce oxidative stress, overnight cultures were inoculated at an OD600 of 0.01 into 70 ml of fresh glucose M9 minimal medium in a 500 ml flask; when the culture reached an OD600 of 0.3, 250 μM paraquat (PQ) was added and incubation continued for 20 min with stirring. The growth conditions for each strain used in the ChIP-exo experiments are listed in Dataset S2.
To evaluate the susceptibility of bacterial cells to H2O2, mid-log phase cells (OD600 ≈ 0.5) were harvested, washed with phosphate-buffered saline (PBS) and resuspended in M9 minimal medium. The culture was then treated with 60 mM H2O2 (final concentration) for 15 min. Samples were taken before and after treatment, diluted and plated in triplicate on LB plates. Viable counts were determined after incubation at 37°C for up to 24 h. Sensitivity to H2O2 was expressed as the percent survival of treated cells relative to untreated cells sampled at time zero.
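The percent-survival calculation can be sketched as follows, assuming viable counts are converted to CFU/ml from the mean of the triplicate plates, that the dilution factor is given as its reciprocal (e.g., 1e6 for a 10^-6 dilution) and that 0.1 ml is plated. The helper names, colony counts and plating volume are hypothetical and not taken from this study.

```python
from statistics import mean
from typing import Sequence


def cfu_per_ml(colonies: Sequence[int], dilution_factor: float, plated_ml: float) -> float:
    """Viable count from replicate plates.

    dilution_factor is the reciprocal of the dilution (1e6 for a 10^-6 dilution).
    """
    return mean(colonies) * dilution_factor / plated_ml


def percent_survival(cfu_treated: float, cfu_untreated_t0: float) -> float:
    """Survival of treated cells relative to untreated cells at time zero."""
    if cfu_untreated_t0 <= 0:
        raise ValueError("untreated count must be positive")
    return 100.0 * cfu_treated / cfu_untreated_t0


# Hypothetical triplicate counts and plating scheme (0.1 ml plated).
before = cfu_per_ml([120, 130, 125], dilution_factor=1e6, plated_ml=0.1)
after = cfu_per_ml([50, 45, 55], dilution_factor=1e4, plated_ml=0.1)
print(f"{percent_survival(after, before):.2f}% survival")  # 0.40% survival
```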
To examine the effects of carbon sources on cell growth, E. coli K-12 MG1655 and the yciT deletion strain were grown overnight at 37°C with agitation in M9 minimal medium containing a sole carbon source (glucose, fructose or sorbitol) at 0.2% (w/v). These cultures were then used to inoculate the same fresh medium (1/200 dilution) and incubated again at 37°C with agitation. Growth curves were monitored by measuring OD600 every 30 min using a Bioscreen C (Growth Curves, USA); the experiment was repeated twice, each time with three biological replicates.
To determine the effects of osmotic stress on growth, E. coli K-12 MG1655 and the yciT deletion strain were grown in M9 minimal sorbitol (0.2% w/v) medium, with or without 0.5 M NaCl added at the start of the culture. Cultures were incubated at 37°C with agitation and monitored by measuring OD600 every 30 min using a Bioscreen C; the experiment was repeated twice, each time with three biological replicates.
Multiplexed ChIP-exo experiment
The multiplexed ChIP-exo experiment was performed through a simple modification of our standard ChIP-exo method described previously (13). After the first adapter was ligated to each sample separately, the samples were pooled and subjected together to the remaining enzymatic reactions of library preparation. Each sample received a different first adapter bearing a unique 6-base sequence (barcode), allowing the sequencing data to be demultiplexed.
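Demultiplexing by barcode can be sketched as below. The sketch assumes, hypothetically, that the 6-base barcode from the first adapter occupies the first six bases of each read and that at most one mismatch is tolerated; real adapter layouts and mismatch tolerances depend on the library design and sequencing software, and the barcodes and sample names shown are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BARCODE_LEN = 6  # unique 6-base sequence on each sample's first adapter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(
    reads: Iterable[str], barcodes: Dict[str, str], max_mismatch: int = 1
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign reads to samples by their leading barcode.

    barcodes maps a barcode sequence to a sample name. Reads with no match,
    or with matches to more than one barcode, are returned as unassigned.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    unassigned: List[str] = []
    for read in reads:
        prefix = read[:BARCODE_LEN]
        hits = [
            sample
            for bc, sample in barcodes.items()
            if len(prefix) == BARCODE_LEN and hamming(prefix, bc) <= max_mismatch
        ]
        if len(hits) == 1:
            bins[hits[0]].append(read[BARCODE_LEN:])  # strip the barcode
        else:
            unassigned.append(read)
    return dict(bins), unassigned


# Hypothetical barcodes and reads.
samples = {"ACGTAC": "TF1_rep1", "TGCATG": "TF2_rep1"}
reads = ["ACGTACGGATTC", "TGCATGAAAA", "ACGTAAGGG", "NNNNNNACGT"]
assigned, unassigned = demultiplex(reads, samples)
print(assigned)    # {'TF1_rep1': ['GGATTC', 'GGG'], 'TF2_rep1': ['AAAA']}
print(unassigned)  # ['NNNNNNACGT']
```

Rejecting reads that match more than one barcode keeps a single sequencing error from silently moving reads between samples; choosing barcodes that differ at several positions makes such ambiguity rare.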
To map the in vivo binding sites of each candidate TF, DNA bound to the TF in formaldehyde-cross-linked E. coli cells was isolated by chromatin immunoprecipitation (ChIP) with an antibody that specifically recognizes the myc tag (9E10, Santa Cruz Biotechnology) and Dynabeads Pan Mouse IgG magnetic beads (Invitrogen), followed by stringent washes (14). Cells were grown in glucose minimal medium to OD600 = 0.5 and incubated with 1% formaldehyde (Thermo Scientific) for 25 min at room temperature. Cross-linking was quenched with 2.5 M glycine (Thermo Fisher Scientific) for an additional 5 min, and the cells were washed three times with ice-cold TBS (Thermo Fisher Scientific). The resulting pellets were lysed with Ready-Lyse lysozyme solution (Epicentre). Lysates were sonicated (QSonic) to generate randomly sheared chromosomal DNA fragments of 300–500 bp. The extent of shearing was monitored on a 1% agarose gel and confirmed on an Agilent 2100 Bioanalyzer High Sensitivity chip (Agilent Technologies) after the immunoprecipitation was completed. Immunoprecipitation was carried out overnight at 4°C with 15 μl of anti-c-myc mouse antibody (9E10, Santa Cruz Biotechnology). The protein of interest, together with its cross-linked DNA and the bound mouse antibody, was captured with 50 μl of Dynabeads Pan Mouse IgG (Invitrogen) and washed with buffer I (50 mM Tris–HCl (pH 7.5), 140 mM NaCl, 1 mM EDTA, 1% Triton X-100).
ChIP materials (chromatin-beads) were used for the on-bead enzymatic reactions of the ChIP-exo method (7). The sheared DNA on the chromatin-beads was end-repaired with the NEBNext End Repair Module (New England Biolabs), followed by addition of a single dA overhang with the dA-Tailing Module (New England Biolabs) and ligation of a first adaptor (5′-phosphorylated) with the NEBNext Quick Ligation Module (New England Biolabs). The first adaptors carried different indices to distinguish DNA samples after sequencing. After ligation, multiple ChIP samples could be pooled. Nick repair was performed with PreCR Repair Mix (New England Biolabs). Chromatin treated with lambda exonuclease and RecJf exonuclease was eluted from the beads and incubated overnight at 65°C to reverse the protein–DNA cross-links. After removal of RNA and protein, the DNA samples were used for primer extension and second adaptor ligation with the following modifications: DNA samples subjected to primer extension as described previously (13) were treated with the dA-Tailing Module (New England Biolabs) and the NEBNext Quick Ligation Module (New England Biolabs) for second adaptor ligation. The DNA samples were purified with the GeneRead Size Selection Kit (Qiagen) and enriched by polymerase chain reaction (PCR) using Phusion High-Fidelity DNA Polymerase (New England Biolabs). The amplified DNA was purified again with the GeneRead Size Selection Kit (Qiagen) and quantified with the Qubit dsDNA HS Assay Kit (Life Technologies). DNA quality was checked with the Agilent High Sensitivity DNA Kit on an Agilent 2100 Bioanalyzer (Agilent) before sequencing on a HiSeq 2500 (Illumina) following the manufacturer's instructions. The antibody NT63 (Biolegend), which specifically recognizes the RNA polymerase β subunit, was used in ChIP-exo experiments to detect RNA polymerase binding sites in E. coli K-12 MG1655, and the antibody 2G10 (Biolegend), which specifically recognizes σ70, was used to detect σ70 binding sites. Each step was performed following the manufacturer's instructions. ChIP-exo experiments were performed in biological duplicate (Datasets S3 and S4).
Peak calling for ChIP-exo dataset
Peak calling was performed as previously described (13). Sequence reads generated from ChIP-exo were mapped onto the reference genome (NC_000913.2) using bowtie (15) with default options to generate SAM output files. The MACE program was used to define peak candidates from biological duplicates for each experimental condition, with sequencing-depth normalization (16). To reduce false-positive peaks, peaks with a signal-to-noise (S/N) ratio <1.5 and peaks lacking the expected bimodal shape were removed (17). The noise level was set to the top 5% of signals at genomic positions (13). The S/N ratio is calculated in the same way as ChIP-chip peak intensity, by dividing the IP signal by the mock signal. Finally, each peak was assigned to a target gene according to its genomic position (Supplementary Figure S1). Genome-scale data were visualized using MetaScope (https://sites.google.com/view/systemskimlab/software?authuser=0) and NimbleGen's SignalMap software.
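The two peak filters described above can be sketched as follows. This is not MACE: it assumes that peak signals and bimodality calls are already available, and it reads "top 5% of signals" as the nearest-rank 95th-percentile value of the per-position signal, which is our interpretation. Because the text also describes the S/N ratio as IP signal over mock signal, the denominator is passed in as a parameter so that either definition can be used. All names and values are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence


class Peak(NamedTuple):
    name: str
    signal: float   # IP signal at the peak (hypothetical units)
    bimodal: bool   # True if strand-specific reads form the expected bimodal shape


def noise_level(per_position_signal: Sequence[float], top_fraction: float = 0.05) -> float:
    """Nearest-rank value at the boundary of the top `top_fraction` of signals."""
    ordered = sorted(per_position_signal)
    rank = math.ceil((1.0 - top_fraction) * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def filter_peaks(peaks: Sequence[Peak], denominator: float, min_sn: float = 1.5) -> List[Peak]:
    """Keep bimodal peaks whose signal-to-noise ratio is at least min_sn."""
    if denominator <= 0:
        raise ValueError("noise/mock level must be positive")
    return [p for p in peaks if p.bimodal and p.signal / denominator >= min_sn]


# Hypothetical genome-wide signal and candidate peaks.
genome_signal = [1.0] * 90 + [10.0] * 10
noise = noise_level(genome_signal)  # 10.0
candidates = [Peak("p1", 25.0, True), Peak("p2", 12.0, True), Peak("p3", 40.0, False)]
print([p.name for p in filter_peaks(candidates, noise)])  # ['p1']
```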
Motif search from ChIP-exo peaks
Consensus DNA sequence motif analysis for the validated TFs was performed using the MEME software suite (E-value < 1e-3) (18). For YciT, YcjW, YdcN, YdhB, YfeC, YfeD and YidZ, sequences in the binding regions were extracted from the reference genome (NC_000913.2).
COG functional enrichment
Regulon genes were categorized according to their annotated clusters of orthologous groups (COG) category (19). Functional enrichment of COG categories in the target genes was determined by performing a hypergeometric test, and a P-value <0.01 was considered significant.
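The hypergeometric test for a single COG category can be written with exact binomial coefficients, as in the sketch below. With N annotated genes in the background, K of them in the category, and a regulon of n genes containing k category members, the P-value is the upper tail P(X ≥ k) = Σ_{i=k}^{min(K,n)} C(K,i)·C(N−K,n−i) / C(N,n). The choice of background (all annotated genes) is an assumption, and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k: int, N: int, K: int, n: int) -> float:
    """Upper-tail P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background; K: background genes in the COG category;
    n: genes in the regulon; k: regulon genes in the COG category.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("inconsistent gene counts")
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def enriched(categories, N: int, n: int, alpha: float = 0.01):
    """categories maps a COG code to (K, k); returns codes with P < alpha."""
    return sorted(c for c, (K, k) in categories.items() if hypergeom_pvalue(k, N, K, n) < alpha)


# Hypothetical counts: 20 background genes, a 4-gene regulon.
print(hypergeom_pvalue(4, N=20, K=5, n=4))  # 5/4845, about 0.00103
print(enriched({"G": (5, 4), "E": (5, 1)}, N=20, n=4))  # ['G']
```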
Transcriptomics
RNA-seq was performed using two biological replicates (Dataset S5). The strains were grown under the same conditions as those used in the ChIP-exo experiments. Transcripts were stabilized by mixing 3 ml of mid-log phase culture with 6 ml of RNAprotect Bacteria Reagent (Qiagen). Samples were immediately vortexed for 5 s, incubated for 5 min at room temperature and centrifuged at 5000 × g for 10 min. The supernatant was decanted, and any residual supernatant was removed by inverting the tube once onto a paper towel. Total RNA was then isolated using an RNeasy Plus Mini kit (Qiagen) following the manufacturer's instructions. RNA was quantified using a NanoDrop 1000 spectrophotometer (Thermo Scientific), and its quality was checked with the RNA 6000 Pico Kit on an Agilent 2100 Bioanalyzer (Agilent). Paired-end, strand-specific RNA-seq libraries were prepared using the KAPA RNA Hyper Prep kit (KAPA Biosystems) as described (20,21). The resulting libraries were analyzed on an Agilent Bioanalyzer DNA 1000 chip (Agilent). Sequencing was performed on a HiSeq 2500 sequencer (Illumina) at the Genomics Core facility of the University of California, San Diego.
Calculation of differentially expressed genes
Expression profiling was performed as previously described (13). Raw RNA-seq reads were mapped onto the reference genome (NC_000913.2) using bowtie v1.2.3 with a maximum insert size of 1000 bp and at most two mismatches, after trimming 3 bp from the 3′ ends (15). Transcript abundance was quantified using summarizeOverlaps from the R GenomicAlignments package, with strand inversion for the dUTP protocol and strict intersection mode (22). We then calculated the dispersion and differential expression of each gene using DESeq2 (23). DESeq2 uses empirical Bayes shrinkage for dispersion estimation, which substantially improves the stability and reproducibility of the results compared with maximum-likelihood-based approaches and makes DESeq2 applicable to small studies with few replicates (23). Transcripts per million (TPM) were calculated with DESeq2. For significance testing, DESeq2 uses the Wald test; P-values for the genes that pass an independent filtering step are adjusted for multiple testing using the Benjamini–Hochberg procedure (23). Genes with |log2(fold change)| ≥ 1 (i.e., a fold change of at least 2 in either direction) and an adjusted P-value <0.05 were considered differentially expressed (Dataset S6).
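The thresholding rule above, together with the Benjamini–Hochberg adjustment, can be sketched in Python as follows. This only illustrates the arithmetic: DESeq2 computes the Wald P-values and adjusted P-values itself (in R), and its independent filtering step is not reproduced here (a filtered gene is represented by an adjusted P-value of None). Gene names, fold changes and P-values are hypothetical.

```python
from typing import List, Optional, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (same ordering as the input)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_deg(log2_fc: float, padj: Optional[float],
           min_abs_log2_fc: float = 1.0, alpha: float = 0.05) -> bool:
    """|log2 FC| >= log2(2) = 1 and padj < alpha; padj None means filtered out."""
    return padj is not None and abs(log2_fc) >= min_abs_log2_fc and padj < alpha


# Hypothetical genes: (log2 fold change, raw Wald P-value).
genes = {"geneA": (2.1, 0.01), "geneB": (-1.5, 0.04), "geneC": (0.4, 0.03), "geneD": (3.0, 0.5)}
padj = dict(zip(genes, bh_adjust([p for _, p in genes.values()])))
print([g for g, (fc, _) in genes.items() if is_deg(fc, padj[g])])  # ['geneA']
```

Note that geneB passes the fold-change cut-off with a raw P-value of 0.04 but is not called differentially expressed, because its adjusted P-value (about 0.053) exceeds 0.05.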
Structural analysis of candidate TFs
Homology models of the candidate transcription factors YidZ, YfeC, YciT, YcjW, YdcN and YgbI were constructed using the SWISS-MODEL pipeline (24). Multiple templates were analyzed, and the oligomeric state was inferred from the reported interface conservation scores relative to existing complexes of similar sequence identity. The structures were annotated using information in UniProt (25) and visualized with VMD (26).
RESULTS
Here, we describe the discovery and characterization of candidate TFs in E. coli K-12 MG1655 following our previously reported and validated pipeline (10). First, we present an overview of the binding sites determined by multiplexed ChIP-exo for these candidate TFs, highlighting their structural and functional properties. We then describe the regulation of transcription initiation by these candidate TFs using a separate ChIP-exo screen for the RNAP holoenzyme. Next, we characterize the putative functions of 10 candidate TFs in E. coli to understand their biological roles (Figure 1). Finally, we provide further phenotypic analysis of the wild type and four mutant strains carrying a deletion of yfeC, yciT, ybcM or ygbI.
Screening putative transcription factors in E. coli K-12 MG1655
We previously generated a rank-ordered list of candidate TFs from a group of uncharacterized genes (‘y-genes’) using a homology-based algorithm (10). We experimentally tested 16 of the top hits from this list and verified that ten (62.5%) were indeed TFs. To expand this effort, in the present study we selected an additional 40 y-genes from the list of candidate TFs and tested them experimentally using multiplexed ChIP-exo (Table 1). Recently, several of these candidates have been independently proposed as TFs based on in vitro assays: ComR (YcfQ) (27), YcjW (28), SutR (YdcN) (29), RcdB (YhjC) (30), NimR (YeaM) (31), CsqR (YihW) (32,33) and YqhC (34,35). Our results add in vivo binding sites for these TFs, which expands knowledge of their target genes in E. coli K-12 MG1655.
Table 1.
Gene name | Locus_tag (b_number) | Total length (AA) | TF Family type* | Relative HTH position# |
---|---|---|---|---|
yahB | b0316 | 310 | LysR | 3–29% |
ybcM | b0546 | 265 | AraC | 80–99% |
ybdO | b0603 | 300 | LysR | 3–30% |
ybeF | b0629 | 317 | LysR | 8–33% |
ybhD | b0768 | 317 | LysR | 1.2–35% |
ycaN | b0900 | 302 | LysR | 1–37% |
ycfQ | b1111 | 210 | TetR | 5–39% |
yciT | b1284 | 249 | DeoR | 1–24% |
ycjW | b1320 | 332 | GalR/LacI | 0–17% |
ydcN | b1434 | 178 | N/A* | 2–40% |
ydcR | b1439 | 468 | GntR | 0–15% |
ydhB | b1659 | 310 | LysR | 2–28% |
ydiP | b1696 | 303 | AraC | 77–94% |
yeaM | b1790 | 273 | AraC | 72–94% |
yebK | b1853 | 289 | N/A* | 0–28% |
yedW | b1969 | 223 | CheY | 0–56% |
yeeY | b2015 | 309 | LysR | 2–28% |
yehT | b2125 | 239 | CheY | 0–51% |
yfeC | b2398 | 114 | N/A* | 0–50% |
yfeD | b2399 | 130 | N/A* | 4–63% |
yfiE | b2577 | 293 | LysR | 0–29% |
yfjR | b2634 | 233 | N/A* | N/A |
ygaV | b2667 | 99 | N/A* | 11–99% |
ygbI | b2735 | 255 | DeoR | 2–23% |
ygeR | b2865 | 251 | N/A* | N/A |
ygfI | b2921 | 298 | LysR | 3–29% |
yggD | b2929 | 169 | N/A* | N/A |
yhjB | b3520 | 200 | LuxR | N/A |
yhjC | b3521 | 299 | LysR | 1–28% |
yiaU | b3585 | 324 | LysR | 2–35% |
yidL | b3680 | 297 | AraC | 80–96% |
yidZ | b3711 | 319 | LysR | 2–25% |
yihL | b3872 | 236 | GntR | 1–31% |
yihW | b3884 | 261 | DeoR | 3–31% |
yjhI | b4299 | 262 | IclR | 3–29% |
yjjJ | b4385 | 443 | N/A* | N/A |
yneJ | b1526 | 293 | LysR | 0–37% |
ynfL | b1595 | 297 | LysR | 1–30% |
ypdC | b2382 | 285 | AraC | 82–99% |
yqhC | b3010 | 318 | AraC | 82–98% |
Note: TF Family type* was annotated using Hidden Markov Models (37).
N/A* indicates no annotation due to the lack of structural information.
Relative HTH position# gives the start and end of the HTH domain as percentages of the full-length protein sequence. N/A in this column indicates the absence of a predicted HTH domain in a given protein.
To predict the family types of the candidate TFs, we used Hidden Markov Models to annotate them based on homology to the collection of known protein structures in the SUPERFAMILY 2 database (36) (Table 1, Dataset S7). Most of the 40 candidate TFs contain winged helix-turn-helix (HTH) DNA-binding domains and can be grouped into TF families based on homology to known transcription factors (Supplementary Table S1) (37). The candidates fall into nine known TF families (LysR, AraC, GntR, CheY, TetR, LuxR, GalR/LacI, IclR and DeoR) and one unknown group (due to the lack of structural information), as listed in the ‘TF Family type’ column of Table 1 (Supplementary Figure S2A). We then calculated the relative position of the HTH domain in each candidate TF from the start and end positions of the domain within the amino acid sequence (9) (Supplementary Figure S2B). Several candidate TFs (YfjR, YgeR, YggD, YhjB and YjjJ) have no predicted DNA-binding domain due to a lack of structural information, so their relative HTH positions are annotated as N/A.
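A sketch of the relative HTH position calculation is shown below. It assumes residue positions are 1-based and that a domain beginning at residue 1 should map to 0% (hence start − 1 in the numerator); this matches the 0% entries in Table 1 but is our assumption rather than the documented formula. The domain boundaries in the example are hypothetical.

```python
from typing import Tuple


def relative_hth_position(start: int, end: int, length: int) -> Tuple[float, float]:
    """Start and end of an HTH domain as percentages of protein length.

    start and end are 1-based residue positions (inclusive).
    """
    if not 1 <= start <= end <= length:
        raise ValueError("domain boundaries must satisfy 1 <= start <= end <= length")
    return 100.0 * (start - 1) / length, 100.0 * end / length


# Hypothetical domain at residues 10-90 of a 310-residue protein.
lo, hi = relative_hth_position(10, 90, 310)
print(f"{round(lo)}-{round(hi)}%")  # 3-29%
```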
Identifying the binding sites for candidate TFs
Next, to characterize binding sites of these candidate TFs on the genome, we constructed 40 myc-tagged strains corresponding to each candidate TF of interest and employed a multiplexed ChIP-exo method to increase the throughput of the assay (Supplementary Figure S3).
We obtained binding profiles for all candidate TFs using the peak-calling algorithm MACE (16) and confirmed that 34 of the 40 bind DNA (Figure 2A). A total of 588 binding sites were identified for these candidate TFs (Figure 2B). Four of the six remaining candidates (YgeR, YggD, YjjJ and YfjR) showed no DNA binding, probably because they lack a predicted HTH domain (Table 1). The other two proteins, YpdC and YeeY, may not be active under the conditions tested in this study. These six candidates were therefore excluded from further analyses.
For the 34 validated TFs, we searched for conserved binding motifs using the MEME algorithm (38) and obtained consensus sequences for 7 TFs (Figure 2C). The consensus binding motifs for YciT, YcjW, YdcN and YidZ were palindromic. For several other validated TFs (YbcM, YbdO, YcaN, YcfQ, YdiP, YedW, YihW and YqhC), the binding sites exhibited sequence-specific patterns despite their limited number (Supplementary Figure S4).
The majority of promoters in E. coli are recognized by the sigma factor RpoD (σ70), also known as the housekeeping sigma factor (14). We therefore performed additional ChIP-exo experiments to investigate whether target genes of the 34 candidate TFs are also transcribed by RNAP holoenzyme containing σ70 (Supplementary Figure S5A). We focused on three categories of co-localization between RNAP, RpoD and the candidate TFs: (i) RNAP + RpoD: the binding site is located upstream of a target gene, and both RNAP and RpoD recognize the promoter region of this gene; (ii) RNAP_only: the binding site is located upstream of a target gene, but only RNAP recognizes the promoter region (since RpoD does not, an alternative sigma factor likely recognizes this promoter); and (iii) others: either the binding site is located within a coding region, or it is located upstream of a target gene but neither RNAP nor RpoD recognizes the promoter region. Using these criteria, we identified 208 binding events of type (i) and 75 of type (ii). Thus, a total of 283 binding events overlap with RNAP for the 34 candidate TFs, accounting for 48% (283/588) of all binding sites (Supplementary Figure S5B).
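The three-way classification can be expressed as a small decision function. The sketch assumes each binding site has already been annotated as upstream or intragenic and checked for RNAP and RpoD signal at the target promoter; the case in which RpoD but not RNAP is detected is not defined in the text and is assigned to "others" here as an assumption. The example reproduces the 48% overlap from the reported counts, using placeholder tuples in which only the category matters.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_site(upstream: bool, rnap: bool, rpod: bool) -> str:
    """Assign a TF binding site to one of the three categories used in the text."""
    if upstream and rnap and rpod:
        return "RNAP+RpoD"   # type (i)
    if upstream and rnap:
        return "RNAP_only"   # type (ii): an alternative sigma factor is likely
    # type (iii): intragenic sites, upstream sites with no RNAP/RpoD signal, and
    # (our assumption) the undefined case of RpoD without RNAP.
    return "others"


def rnap_overlap(sites: Iterable[Tuple[bool, bool, bool]]) -> Tuple[Counter, float]:
    """Tally categories and the percentage of sites overlapping RNAP."""
    counts = Counter(classify_site(*s) for s in sites)
    total = sum(counts.values())
    overlap = counts["RNAP+RpoD"] + counts["RNAP_only"]
    return counts, (100.0 * overlap / total if total else 0.0)


# Reported tallies: 208 type (i), 75 type (ii), 588 - 283 = 305 others.
sites = [(True, True, True)] * 208 + [(True, True, False)] * 75 + [(False, True, True)] * 305
counts, pct = rnap_overlap(sites)
print(counts["RNAP+RpoD"], counts["RNAP_only"], round(pct))  # 208 75 48
```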
Deciphering regulatory roles of candidate transcription factors
Having determined which candidate TFs bind DNA, we next assessed their putative functions. To classify the regulatory nature of the TFs studied here, we used the definition put forth by Shimada et al., which is based on the number of target genes (8). This definition uses four classes: (i) nucleoid-associated regulators (hundreds of target genes); (ii) global regulators (>100 target genes); (iii) local regulators (<100 target genes); and (iv) single-target regulators. The 34 validated TFs fall into the latter three classes: 1 global regulator (type I), 29 local regulators (type II) and 4 single-target regulators (type III). We further inferred the putative biological roles of ten validated TFs (YidZ, YfeC, YciT, YdhB, YbcM, YneJ, YjhI, YfiE, YgbI and YnfL) based on the annotated functions of their target genes (Table 2).
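The count-based part of this classification can be sketched as follows. Because nucleoid-associated regulators are distinguished by more than target count alone, that class is left out; treating exactly 100 targets as "local" is an assumption, since the definition uses >100 and <100. The function name and example counts are hypothetical.

```python
def regulator_class(n_targets: int) -> str:
    """Count-based class after Shimada et al.; nucleoid-associated regulators are not modelled."""
    if n_targets < 1:
        raise ValueError("a regulator needs at least one target gene")
    if n_targets == 1:
        return "single-target"
    if n_targets > 100:
        return "global"
    return "local"  # 2-100 targets; exactly 100 is treated as local (assumption)


for n in (150, 12, 1):
    print(n, regulator_class(n))  # 150 global / 12 local / 1 single-target
```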
Table 2.
Gene# (b-number) | Classification of candidate TFs (# of TFBSs) | Family Type | Binding sites associated with metabolic pathway | Proposed regulatory roles | Results |
---|---|---|---|---|---|
yidZ (b3711) | Type I (118) | LysR | Widespread, intragenic binding | Target genes have diverse functions | Figure 3 |
yfeC (b2398) | Type II (50) | N/A* | chaAB, panD, grxC, pqqL, hybE, lpp, rpmH, rpmB | yfeC mutant was reported to increase eDNA release (40) | Figure 4 |
yciT (b1284) | Type II (49) | DeoR | ybiO, ybiV, ybiY | A regulator involved in osmolarity | Figure 5 |
ydhB# (b1659) | Type II (29) | LysR | ydhB, ydhC | A regulator involved in purine metabolism | Supplementary Figure S6 |
ybcM (b0546) | Type II (12) | AraC | ybcL, ucpA | A regulator related to stress response | Figure 6 |
yneJ# (b1526) | Type II (8) | LysR | sad, yneJ | A regulator involved in glutamate metabolism | Supplementary Figure S7 (54) |
yjhI# (b4299) | Type II (5) | IclR | yjhG, yjhH, yjhI | A regulator related to the energy conversion between pyruvate and glycolaldehyde | Supplementary Figure S8 |
yfiE# (b2577) | Type II (4) | LysR | yfiE, eamB | A regulator related to the control of a cysteine and O-acetylserine exporter | Supplementary Figure S9 |
ygbI (b2735) | Type III (1) | DeoR | ygbJ, ygbK | A regulator involved in tartrate metabolism | Figure 7 |
ynfL# (b1595) | Type III (1) | LysR | ynfL, ynfM | A regulator involved in the control of arabinose efflux transporter | Supplementary Figure S10 |
*N/A indicates no prediction due to the lack of structural information.
Genes marked with # are analyzed and presented in the Supplementary Material.
For detailed analysis, 5 of the 10 validated TFs spanning the three classes were selected as representatives: one global regulator (YidZ), three local regulators (YfeC, YciT and YbcM) and one single-target regulator (YgbI). To infer their regulatory roles, we combined binding-site data with gene expression profiles to identify the pathways in which these TFs are most likely involved. Results for the remaining five validated TFs are presented in the Supplementary Material.
A global regulator (type I), YidZ
We identified 118 binding sites of YidZ (Figure 3A), and a consensus motif was found with high confidence in 108 of them (E-value = 1.2e−140; Figure 2C, Supplementary Figure S11). Based on SWISS-MODEL, YidZ is predicted to form a dimer or tetramer (Supplementary Table S2) (24).
To determine the location of YidZ binding in vivo relative to RNA polymerase, we integrated the YidZ binding data with the ChIP-exo data for RpoB and σ70. Among the 27 intergenic YidZ binding sites, 12 were located at promoters bound by both core RNAP and σ70, 9 at promoters bound by core RNAP only and 6 at promoters bound by neither (Supplementary Figure S5B). Of the 91 intragenic binding sites, 34 lie within genes whose promoters are bound by core RNAP; the remaining 57 lie within genes whose promoters lack core RNAP binding.
Finally, to explore the regulatory roles of YidZ, we compared the gene expression profiles of the wild-type strain and the yidZ knockout strain using RNA-seq. Upon deletion, 19 of the 118 target genes were differentially expressed, suggesting that these genes are directly regulated by YidZ. Genes/operons associated with acid stress and amino acid transport and metabolism (gadA, gadBC, hdeD, hdeAB-yhiD) were down-regulated, while genes involved in carbohydrate transport and metabolism (rbsD, malM, malE, malX) were up-regulated (Figure 3B, Dataset S6). The remaining ChIP-exo target genes were not differentially expressed after deletion of yidZ.
Overall, we observed two notable features of the YidZ binding profile. First, YidZ has a large number of binding sites, 77% (91/118) of which are located within coding regions and 23% (27/118) within intergenic regions. Second, based on Clusters of Orthologous Groups (COG) annotations of the differentially expressed genes (DEGs) (19), YidZ is associated with diverse gene functions (Figure 3C). However, no COG category was significantly enriched (P < 0.01), indicating that the genes directly or indirectly regulated by YidZ are not, as a group, strongly associated with any specific function.
A local regulator (type II), YfeC
We identified 50 YfeC binding sites in E. coli K-12 MG1655 (Figure 4A) and found a YfeC sequence motif (E-value = 7.1e−10, Figure 2C). The consensus DNA-binding sequence showed that the YfeC TFBSs contain TTC-rich inverted repeats separated by a 6-nt spacer. YfeC is likely to form a homodimer in the cell, as inferred from SWISS-MODEL (Supplementary Figure S12, Table S2).
Functional classification showed that the genes associated with the 50 YfeC binding sites belong to various functional groups, from DNA replication, transcription and translation to cell envelope biogenesis (Figure 4B). To identify genes directly regulated by YfeC, we compared the gene expression profiles of the wild-type strain and the yfeC knockout strain using RNA-seq and found 124 up-regulated and 81 down-regulated genes in the knockout, indicating that YfeC might act as a dual regulator in E. coli K-12 MG1655 (Figure 4C, Supplementary Figure S13). Combining the YfeC ChIP-exo results with the transcriptomic data, we found that 40% (20 of 50) of the genes bound by YfeC were differentially expressed, suggesting that these 20 genes are directly regulated by YfeC (Figure 4C, Supplementary Table S3). Of these 20 genes, 80% (16 of 20) are repressed by YfeC (Figure 4D). These data indicate that YfeC regulates genes in various functional groups, such as nutrient transport and metabolism (chaB, ychO, panD), translation (rpmH, rpmB, rpsU), post-translational modification (grxC, pqqL, hybE) and cell envelope (lpp).
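The integration of binding and expression data described above can be sketched as a set intersection: bound genes that are also differentially expressed are taken as putative direct targets, and the sign of the log2 fold change in the knockout relative to the wild type indicates repression (up-regulated upon deletion) or activation (down-regulated upon deletion). The function name, gene names and fold changes below are hypothetical.

```python
from typing import Dict, List, Set


def direct_targets(bound: Set[str], log2fc_ko_vs_wt: Dict[str, float]) -> Dict[str, List[str]]:
    """Split bound-and-differentially-expressed genes by inferred regulatory sign.

    log2fc_ko_vs_wt should contain only genes already called differentially
    expressed (knockout relative to wild type).
    """
    direct = bound & set(log2fc_ko_vs_wt)
    return {
        # Up in the knockout -> the TF normally represses the gene.
        "repressed": sorted(g for g in direct if log2fc_ko_vs_wt[g] > 0),
        # Down in the knockout -> the TF normally activates the gene.
        "activated": sorted(g for g in direct if log2fc_ko_vs_wt[g] < 0),
    }


bound_genes = {"geneA", "geneB", "geneC", "geneD"}
degs = {"geneA": 1.8, "geneB": 1.2, "geneC": -1.4, "geneE": 2.0}
result = direct_targets(bound_genes, degs)
print(result)  # {'repressed': ['geneA', 'geneB'], 'activated': ['geneC']}
n_direct = len(result["repressed"]) + len(result["activated"])
print(f"{n_direct}/{len(bound_genes)} bound genes differentially expressed")  # 3/4 bound genes differentially expressed
```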
A previous study reported that single-gene deletions of rna, hns, nlpI, rfaD and yfeC altered extracellular DNA (eDNA) production in E. coli. These genes are related to general cellular processes, such as transcription (rna, hns), lipid transport (nlpI) and cell envelope (rfaD), or are of unknown function (yfeC) (39). These results suggest that yfeC is associated with eDNA production in E. coli. Furthermore, although the underlying mechanisms remain unknown, the study hints that eDNA release may involve multiple cellular processes rather than a single biological pathway. To date, no detailed molecular study has determined how YfeC regulates eDNA release in E. coli; such a study is a direction for future work.
A local regulator (type II), YciT
YciT was annotated as a putative DeoR-type transcription factor by hidden Markov model (HMM)-based prediction, but its DNA-binding activity in vivo had not been reported. Here, we identified 49 binding sites of YciT in E. coli K-12 MG1655 (Figure 5A) and derived an enriched sequence motif from them (E-value = 1.8e−37, Figure 2C). To predict the putative functions of YciT, we assessed the YciT binding sites and the functions of the corresponding target genes. We found that 47% (23 of 49) of the binding sites are located within regulatory regions, indicating that these binding events may modulate the expression of target genes. Among these 23 binding sites, three target genes encode proteins involved in sugar metabolism: sugar phosphatase (ybiV), a putative pyruvate formate-lyase activating enzyme (ybiY) and fructose-6-phosphate aldolase 1 (fsaA) (Figure 5B). Several other target genes encode membrane-associated products, such as the moderate-conductance mechanosensitive channel YbiO (ybiO) (Figure 5C), the copper/silver export system periplasmic binding protein (cusF) and outer membrane protein X (ompX). The remaining target genes (such as ykfC, ycaP, ydbD and yfdQ) are of unknown function.
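Whether a binding site falls within a regulatory region depends on the window used, which this section does not specify. The following is a minimal, strand-aware sketch assuming a hypothetical window of 300 nt upstream to 50 nt downstream of each gene's start codon, used here as a stand-in for the transcription start site. The Gene type, coordinates and window sizes are illustrative assumptions. A site lying between two divergently transcribed genes is assigned to both, as with the divergent sites described in this study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost genome coordinate
    end: int     # rightmost genome coordinate
    strand: str  # "+" or "-"


def in_regulatory_region(peak_center: int, gene: Gene,
                         upstream: int = 300, downstream: int = 50) -> bool:
    """True if the peak center lies within [-upstream, +downstream] of the gene start.

    On the minus strand the gene starts at its rightmost coordinate, and
    "upstream" means larger coordinates.
    """
    if gene.strand == "+":
        return gene.start - upstream <= peak_center <= gene.start + downstream
    return gene.end - downstream <= peak_center <= gene.end + upstream


def assign_targets(peak_centers: list[int], genes: list[Gene],
                   **window: int) -> dict[int, list[str]]:
    """Map each peak center to the genes whose regulatory region contains it."""
    return {p: [g.name for g in genes if in_regulatory_region(p, g, **window)]
            for p in peak_centers}


# Hypothetical genes and peak centers for illustration only.
genes = [Gene("geneP", 1000, 2000, "+"), Gene("geneM", 5000, 6000, "-")]
targets = assign_targets([800, 1500, 6200, 9000], genes)
fraction_regulatory = sum(bool(t) for t in targets.values()) / len(targets)
print(targets, fraction_regulatory)
```

With the YciT counts reported above, the corresponding fraction is 23/49, or about 0.47.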
To confirm that YciT regulates these genes, we compared the transcriptomes of the wild-type and yciT deletion strains. Target genes involved in metabolic pathways (ybiV, ybiY, fsaA) and membrane components (cusF) were indeed differentially expressed upon deletion of yciT (Supplementary Figure S14), indicating that YciT may participate in controlling these metabolic pathways and/or the osmotic stress response in E. coli K-12 MG1655.
To test these hypotheses, we evaluated the impact of yciT deletion on the growth of E. coli in M9 minimal medium containing different carbon sources (glucose, fructose or sorbitol). Deletion of yciT did not cause significant growth defects relative to the wild-type strain, although the final OD600 of the yciT deletion strain in stationary phase was slightly lower (Supplementary Figure S15). Because the physiological roles of the YciT-regulated enzymes (YbiV, YbiY and FsaA) are not yet fully understood, the impact of YciT on these metabolic pathways remains unclear.
We further assessed the effects of osmotic stress on E. coli grown in M9 minimal medium with sorbitol as the sole carbon source (Figure 5D). Osmotic stress retarded the growth of both the wild-type and yciT deletion strains; in particular, high osmolarity impaired growth and reduced the growth rate of the yciT deletion strain. These results indicate that YciT is involved in the response to osmotic stress in E. coli K-12 MG1655.
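The comparison of growth rates above does not state how the rates were computed. One common approach, sketched here as an assumption, estimates the specific growth rate μ as the least-squares slope of ln(OD600) against time over points in exponential phase, with doubling time ln 2 / μ. The function names and the example readings are hypothetical.

```python
import math
from typing import Sequence


def specific_growth_rate(times_h: Sequence[float], od600: Sequence[float]) -> float:
    """Least-squares slope of ln(OD600) versus time, in h^-1.

    Assumes every point lies in exponential phase and OD600 > 0.
    """
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need at least two paired time/OD600 measurements")
    ln_od = [math.log(od) for od in od600]
    t_mean = sum(times_h) / len(times_h)
    y_mean = sum(ln_od) / len(ln_od)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, ln_od))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    return sxy / sxx


def doubling_time_h(mu: float) -> float:
    """Doubling time (h) corresponding to a specific growth rate mu (h^-1)."""
    return math.log(2) / mu


# Hypothetical readings: a culture that doubles every hour.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
ods = [0.05 * 2 ** t for t in times]
mu = specific_growth_rate(times, ods)
print(round(mu, 3), round(doubling_time_h(mu), 2))  # 0.693 1.0
```

Fitting on the log scale turns exponential growth into a straight line, so a lower slope for one strain under high osmolarity corresponds directly to slower growth.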
A local regulator (type II), YbcM
The ybcM gene was identified in a screen for genes whose products protect E. coli from the lethal effects of stress (40), but its DNA-binding activity had not been confirmed in vivo. To determine its binding sites, we performed ChIP-exo for YbcM under oxidative stress and identified 12 binding sites in E. coli K-12 MG1655 (Figure 6A). Of these, 11 (92%) are located upstream of target genes. One binding site lies upstream of the ybcLM operon, suggesting autoregulation (Figure 6B). The ybcL gene encodes the periplasmic protein YbcL, which shares sequence and structural similarity with rat/human RKIP (Raf kinase inhibitor protein), a modulator of signal transduction pathways (41).
To predict the functions of YbcM, we examined the 12 binding sites and the functions of their target genes, and found two binding sites associated with stress-response genes. The first lies upstream of ucpA, which encodes the oxidoreductase UcpA (Figure 6C, upper panel). Plasmid-based overexpression of ucpA was previously shown to improve tolerance to furan (42), a chemical likely to generate oxidative stress. The second is a divergent binding site between the ubiT-yhbS and ubiUV operons (Figure 6C, bottom panel). The ubiT gene encodes the anaerobic ubiquinone biosynthesis accessory factor UbiT, yhbS encodes the putative N-acetyltransferase YhbS, and ubiUV encodes the ubiquinone biosynthesis complex UbiUV. Another gene near the ubiUV operon, ubiW, encodes a putative luciferase-like monooxygenase. We also identified a consensus YbcM binding motif in the regulatory regions of these target genes (Supplementary Figure S4). Taken together, these data suggest that YbcM is a regulator of the oxidative stress response in E. coli K-12 MG1655.
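The consensus motif search described above can be illustrated with a position weight matrix (PWM) scan, the idea behind tools such as FIMO in the MEME Suite (18). This is a minimal sketch, not the study's pipeline: it builds a log-odds PWM from a few aligned sites with a pseudocount and a uniform background, then scores every window on both strands. The training sites, the threshold and all function names are hypothetical; they do not represent the YbcM motif.

```python
from __future__ import annotations

import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites: list[str], pseudocount: float = 0.5,
              background: float = 0.25) -> list[dict[str, float]]:
    """Log2-odds PWM from equal-length aligned sites against a uniform background."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [s[i].upper() for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score_window(window: str, pwm: list[dict[str, float]]) -> float:
    """Sum of per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def scan(seq: str, pwm: list[dict[str, float]],
         threshold: float) -> list[tuple[int, str, float]]:
    """Return (start, strand, score) for windows scoring at or above threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        fwd = seq[i:i + width]
        if any(b not in BASES for b in fwd):
            continue  # skip windows containing N or other ambiguity codes
        rev = fwd.translate(COMPLEMENT)[::-1]
        for strand, window in (("+", fwd), ("-", rev)):
            s = score_window(window, pwm)
            if s >= threshold:
                hits.append((i, strand, s))
    return hits


# Hypothetical aligned sites and target sequence for illustration only.
pwm = build_pwm(["TTGACA", "TTGACA", "TTGACT", "TTGATA"])
hits = scan("GGGGGTTGACAGGGGG", pwm, threshold=5.0)
print(hits)
```

The pseudocount keeps unseen bases from producing log(0); in practice the threshold would be set from a background score distribution rather than fixed by hand.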
To confirm the physiological role of YbcM, we compared the survival rates of the wild-type and ybcM deletion strains under oxidative stress (Figure 6D). After a 15-min treatment with 60 mM H2O2, the survival rate of the wild-type strain was 8-fold higher than that of the ybcM deletion strain. This observation confirms the involvement of YbcM in the reactive oxygen species (ROS) stress response.
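The survival comparison reduces to simple ratios: survival is the viable count after H2O2 treatment divided by the count before treatment, and the fold difference is the ratio of the two strains' survival. The sketch below assumes plate counts converted to CFU/mL; the colony counts, dilutions and plated volume are hypothetical and are not the study's measurements.

```python
def cfu_per_ml(colonies: int, dilution_factor: float, plated_volume_ml: float) -> float:
    """Viable cells per mL of the original culture from one plate count."""
    return colonies * dilution_factor / plated_volume_ml


def survival_fraction(cfu_treated: float, cfu_untreated: float) -> float:
    """Fraction of cells surviving the treatment."""
    return cfu_treated / cfu_untreated


# Hypothetical plate counts (0.1 mL plated); not data from this study.
wt = survival_fraction(cfu_per_ml(60, 1e4, 0.1), cfu_per_ml(150, 1e6, 0.1))
mutant = survival_fraction(cfu_per_ml(15, 1e4, 0.1), cfu_per_ml(150, 1e6, 0.1))
fold = wt / mutant
print(f"WT {wt:.1e}, mutant {mutant:.1e}, fold {fold:.1f}")
```

Because each survival fraction is normalized to its own untreated count, the fold difference is insensitive to small differences in starting cell density between strains.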
A single-target regulator (type III), YgbI
In this study, we identified a single divergent binding site between the ygbI and ygbJ genes, suggesting autoregulation of ygbI (Figure 7A). This binding site also overlaps the promoter region of ygbJ, which strongly suggested that YgbI binding competes with RNAP for this promoter, repressing the expression of the downstream genes (ygbJ, ygbK).
To test this hypothesis, we compared gene expression profiles between the wild-type and ygbI deletion strains (Figure 7B). The expression of a cluster of genes (ygbJ, ygbK, ygbL, ygbM, ygbN) was upregulated after deletion of ygbI (Figure 7C). This suggests that YgbI acts as a repressor of the downstream gene cluster (ygbJKLMN), consistent with our prediction.
Previous studies reported that the downstream gene cluster ygbJKLM has putative functions in catabolic pathways for acid sugars (43), and hypothesized that mutations in ygbI would confer a growth benefit to E. coli K-12 on tartrate (44). To verify the function of YgbI, we measured the growth profiles of the wild-type and ygbI deletion strains in medium containing 20 mM l-tartrate. Whereas the wild-type strain did not grow on l-tartrate, the ygbI deletion strain did (Figure 7D). On the basis of these results, we propose the following model: when YgbI is present and active in vivo, it binds directly to the promoter of the ygbJK operon and indirectly inhibits expression of the ygbLM operon and the ygbN gene. Deletion of ygbI de-represses the ygbJK and ygbLM operons and the ygbN gene (Figure 7E). Based on the putative functions of ygbJKLMN, we suggest that YgbI is a repressor involved in the catabolic pathway for l-tartrate in E. coli K-12 MG1655.
DISCUSSION
Despite extensive research on the E. coli genome over many decades, around 35% of its genes are still poorly characterized, including some uncharacterized transcription factors (10,45). Our primary goal in this study was to generate a large data set to identify further DNA-binding proteins from a pool of uncharacterized proteins in E. coli K-12 MG1655. We systematically tested 40 computationally predicted transcription factors, used a multiplexed ChIP-exo method to characterize their binding sites, validated 34 of them as DNA-binding proteins, and classified each TF based on this experimental evidence. Next, we compared the binding profiles of the candidate TFs with the binding peaks of the RNAP holoenzyme, identifying 283 of 588 sites that are likely to regulate a nearby promoter (Dataset S4) and providing a coarse-grained functional prediction. Finally, we inferred putative functions for ten of these candidate TFs (YidZ, YfeC, YciT, YdhB, YbcM, YneJ, YjhI, YfiE, YgbI, YnfL) and verified the biological roles of representative TFs through detailed analysis. The implications of our results are discussed below.
First, our study generated a large data set of 588 TFBSs and expanded the total number of verified TFs in E. coli K-12 MG1655 to 276, close to the estimated total of about 280 (Supplementary Figure S16). Comparative analysis of TF and RNAP binding sites enables the identification of target genes that are recognized by RNAP complexes. That 283 of the 588 TFBSs overlap RNAP binding sites means that almost half (48%) of the binding sites are likely to regulate a nearby promoter under the tested conditions. Because the interaction between RNAP and promoter recognition sequences may change with the conditions, some TFBSs that did not overlap RNAP binding sites here may be recognized by the RNAP complex under other conditions. Furthermore, discovering all of the TFs is fundamental to fully understanding the key role TRNs play in enabling bacteria to modulate the expression of thousands of genes in response to environmental and genetic perturbations (46). This study brings us closer to identifying all the TFs in E. coli K-12 MG1655.
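The TF-RNAP comparison can be sketched as an interval-overlap test. This section does not specify the exact overlap criterion, so the sketch assumes that a TFBS counts as RNAP-associated when its peak interval shares at least one base with any RNAP peak (closed integer intervals). RNAP peaks are merged and binary-searched, so each query costs O(log n). Function names and coordinates are hypothetical.

```python
import bisect
from typing import List, Tuple

Interval = Tuple[int, int]  # closed [start, end] genome coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals into disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps_any(query: Interval, merged: List[Interval], starts: List[int]) -> bool:
    """True if query shares at least one base with any merged interval."""
    # Only the last merged interval starting at or before the query end can overlap,
    # because merged intervals are disjoint and their ends increase monotonically.
    idx = bisect.bisect_right(starts, query[1]) - 1
    return idx >= 0 and merged[idx][1] >= query[0]


def fraction_overlapping(tf_peaks: List[Interval], rnap_peaks: List[Interval]) -> float:
    """Fraction of TF peaks that overlap at least one RNAP peak."""
    merged = merge_intervals(rnap_peaks)
    starts = [s for s, _ in merged]
    hits = sum(overlaps_any(p, merged, starts) for p in tf_peaks)
    return hits / len(tf_peaks)


# Hypothetical peaks for illustration only.
rnap = [(100, 200), (150, 260), (500, 600)]
tf = [(250, 300), (300, 400), (590, 650), (50, 99)]
print(fraction_overlapping(tf, rnap))
```

With the study's counts, the same fraction is 283/588, or about 0.48.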
Second, we used the classification reported by Shimada et al. to group the candidate TFs into three types: type I regulators, type II regulators and type III single-target regulators (8). This classification was based on the number of genes bound by each TF, as determined by genomic systematic evolution of ligands by exponential enrichment (SELEX) (47). Our rationale for using this classification was twofold: (i) the multiplexed ChIP-exo method employed here offers a readout similar to SELEX (i.e. the number of target genes), so the classification can be applied in the same context; and (ii) it has a successful track record of assigning annotations (e.g. ‘global’ or ‘local’ regulator) before the functions of validated TFs are fully understood, helping to guide their future study. We therefore classified the TFs by the number of target genes identified in genome-wide experiments. We expect that detailed characterization of these validated TFs will help develop a comprehensive understanding of transcriptional regulation in E. coli K-12 MG1655.
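The three-type scheme can be written as a small function of the number of target genes. The cutoff separating type I from type II regulators is not given in this section, so global_min below is a required, hypothetical parameter rather than the threshold used by Shimada et al.; only the single-target rule (exactly one target gene) follows directly from the definition.

```python
def classify_tf(n_target_genes: int, *, global_min: int) -> str:
    """Assign a TF to type I, II or III by its number of target genes.

    global_min is a hypothetical cutoff: TFs with at least this many targets
    are called type I; those with 2..global_min-1 targets are type II.
    """
    if global_min <= 2:
        raise ValueError("global_min must exceed 2 so that type II is non-empty")
    if n_target_genes < 1:
        raise ValueError("a TF with no target genes cannot be classified")
    if n_target_genes == 1:
        return "type III (single-target)"
    if n_target_genes < global_min:
        return "type II"
    return "type I"


# Hypothetical cutoff of 10 target genes, for illustration only.
for n in (1, 4, 25):
    print(n, classify_tf(n, global_min=10))
```

Making the cutoff a required keyword argument forces any caller to state the threshold explicitly instead of silently inheriting an assumed default.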
Third, we did not identify binding sites for six of the candidate TFs tested in this study (YgeR, YggD, YjjJ, YfjR, YeeY, YpdC). There may be two reasons for this. The first is false-positive prediction of candidate TFs owing to the limitations of sequence homology searches. Specifically, YgeR has recently been re-annotated as a putative lipoprotein involved in septation (48), and YggD has been verified as fumarase E (49). Overexpression of YjjJ is toxic to E. coli, suggesting that YjjJ is a toxin (50). YfjR was predicted to be a TF involved in biofilm formation (51), but a recent study searching for novel TFs involved in biofilm formation did not validate this prediction (30). The second reason is that DNA-binding activity may only be detectable under conditions in which the TF is active. YeeY and YpdC are annotated as a LysR-type regulator with a C-terminal HTH domain and an AraC-type regulator with a C-terminal HTH domain, respectively (Table 1), so they may bind DNA and exert regulatory functions under appropriate growth conditions.
Fourth, although we identified additional TFs from the experimental data, we did not fully decipher their mutant phenotypes. For example, we identified YciT as a TF and found that it directly regulates multiple target genes (fsaA, ybiY, ybiV). This result hinted at an uncharacterized pathway composed of genes encoding a DUF1479 domain-containing protein (ybiU), a sugar phosphatase (ybiV), a putative pyruvate formate-lyase (PFL) (ybiW), a putative pyruvate formate-lyase activating enzyme (PFL-AE) (ybiY) and fructose-6-phosphate aldolase 1 (FSA) (fsaA) (Supplementary Figure S17). However, these enzymes are poorly characterized, their corresponding substrates have not been identified, and little is known about their physiological roles in E. coli K-12 MG1655 (52). These gaps make it challenging to fully examine mutant phenotypes. Studying these enzymes should provide insight into the biological roles of YciT.
Finally, this collection of TFBS data sets lays a foundation for understanding the mechanisms of transcriptional regulation. In this study, we discovered that YfeC regulates multiple cellular processes in E. coli K-12 MG1655. Previous studies had not explored a possible relationship between eDNA release and YfeC, so we used a yfeC mutant to probe possible connections. A common mechanism of eDNA release in bacteria is the secretion of membrane vesicles (MVs) (39). eDNA production therefore relies on several biological processes: (i) DNA replication, to produce the DNA to be secreted (eDNA); (ii) nutrient transport and metabolism, to supply lipids for MVs; (iii) energy conversion, to supply energy for metabolism and MV secretion; (iv) transcription and translation, to produce the proteins for MV assembly; (v) post-translational modification, protein turnover and chaperones, to modify and fold proteins for secretion; and (vi) cell wall/envelope biogenesis, to repair the cell wall after eDNA secretion (Supplementary Figure S18) (53). Acting mainly as a repressor, YfeC regulates genes in many of these processes, including lipid metabolism, translation, post-translational modification and cell wall/envelope biogenesis, and accordingly these processes are up-regulated upon yfeC deletion. We propose that deletion of yfeC may accelerate these cellular processes, leading to eDNA release. Taken together, this study significantly expands the number of TFs with experimental evidence, broadening our knowledge of transcriptional regulation in E. coli K-12 MG1655.
DATA AVAILABILITY
The ChIP-exo and RNA-seq datasets have been deposited in GEO under accession numbers GSE159777 and GSE159658, respectively.
ACKNOWLEDGEMENTS
We thank Drs Donghyuk Kim and Sang Woo Seo for the insights, and Marc Abrams for reviewing and editing the manuscript.
Contributor Information
Ye Gao, Department of Biological Sciences, University of California San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.
Hyun Gyu Lim, Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.
Hans Verkler, Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.
Richard Szubin, Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.
Daniel Quach, Department of Biological Sciences, University of California San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.
Irina Rodionova, Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.
Ke Chen, Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.
James T Yurkovich, Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.
Byung-Kwan Cho, Department of Biological Sciences and KI for the BioCentury, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea.
Bernhard O Palsson, Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA; Department of Pediatrics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Novo Nordisk Foundation Center for Biosustainability, 2800, Kongens Lyngby, Denmark.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
Novo Nordisk Foundation [NNF10CC1016517]. Funding for open access charge: Novo Nordisk Foundation [NNF10CC1016517].
Conflict of interest statement. None declared.
REFERENCES
- 1. Browning D.F., Busby S.J.W. Local and global regulation of transcription initiation in bacteria. Nat. Rev. Microbiol. 2016; 14:638–650.
- 2. Mejía-Almonte C., Busby S.J.W., Wade J.T., van Helden J., Arkin A.P., Stormo G.D., Eilbeck K., Palsson B.O., Galagan J.E., Collado-Vides J. Redefining fundamental concepts of transcription initiation in bacteria. Nat. Rev. Genet. 2020; 21:699–714.
- 3. Balleza E., López-Bojorquez L.N., Martínez-Antonio A., Resendis-Antonio O., Lozada-Chávez I., Balderas-Martínez Y.I., Encarnación S., Collado-Vides J. Regulation by transcription factors in bacteria: beyond description. FEMS Microbiol. Rev. 2009; 33:133–151.
- 4. Dobrin R., Beg Q.K., Barabási A.-L., Oltvai Z.N. Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network. BMC Bioinformatics. 2004; 5:10.
- 5. Fang X., Sastry A., Mih N., Kim D., Tan J., Yurkovich J.T., Lloyd C.J., Gao Y., Yang L., Palsson B.O. Global transcriptional regulatory network for Escherichia coli robustly connects gene expression to transcription factor activities. Proc. Natl. Acad. Sci. U.S.A. 2017; 114:10286–10291.
- 6. Park P.J. ChIP-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 2009; 10:669–680.
- 7. Rhee H.S., Pugh B.F. ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr. Protoc. Mol. Biol. 2012; Chapter 21:Unit 21.24.
- 8. Shimada T., Ogasawara H., Ishihama A. Single-target regulators form a minor group of transcription factors in Escherichia coli K-12. Nucleic Acids Res. 2018; 46:3921–3936.
- 9. Pérez-Rueda E., Collado-Vides J. The repertoire of DNA-binding transcriptional regulators in Escherichia coli K-12. Nucleic Acids Res. 2000; 28:1838–1847.
- 10. Gao Y., Yurkovich J.T., Seo S.W., Kabimoldayev I., Dräger A., Chen K., Sastry A.V., Fang X., Mih N., Yang L. et al. Systematic discovery of uncharacterized transcription factors in Escherichia coli K-12 MG1655. Nucleic Acids Res. 2018; 46:10682–10696.
- 11. Eichner J., Topf F., Dräger A., Wrzodek C., Wanke D., Zell A. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors. PLoS One. 2013; 8:e82238.
- 12. Cho B.-K., Knight E.M., Palsson B.O. PCR-based tandem epitope tagging system for Escherichia coli genome engineering. BioTechniques. 2006; 40:67–72.
- 13. Seo S.W., Kim D., Latif H., O’Brien E.J., Szubin R., Palsson B.O. Deciphering Fur transcriptional regulatory network highlights its complex role beyond iron metabolism in Escherichia coli. Nat. Commun. 2014; 5:4910.
- 14. Cho B.-K., Kim D., Knight E.M., Zengler K., Palsson B.O. Genome-scale reconstruction of the sigma factor network in Escherichia coli: topology and functional states. BMC Biol. 2014; 12:4.
- 15. Langmead B., Trapnell C., Pop M., Salzberg S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009; 10:R25.
- 16. Wang L., Chen J., Wang C., Uusküla-Reimand L., Chen K., Medina-Rivera A., Young E.J., Zimmermann M.T., Yan H., Sun Z. et al. MACE: model based analysis of ChIP-exo. Nucleic Acids Res. 2014; 42:e156.
- 17. Kharchenko P.V., Tolstorukov M.Y., Park P.J. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat. Biotechnol. 2008; 26:1351–1359.
- 18. Bailey T.L., Boden M., Buske F.A., Frith M., Grant C.E., Clementi L., Ren J., Li W.W., Noble W.S. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009; 37:W202–W208.
- 19. Tatusov R.L., Galperin M.Y., Natale D.A., Koonin E.V. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000; 28:33–36.
- 20. Ross M.G., Russ C., Costello M., Hollinger A., Lennon N.J., Hegarty R., Nusbaum C., Jaffe D.B. Characterizing and measuring bias in sequence data. Genome Biol. 2013; 14:R51.
- 21. Quail M.A., Otto T.D., Gu Y., Harris S.R., Skelly T.F., McQuillan J.A., Swerdlow H.P., Oyola S.O. Optimal enzymes for amplifying sequencing libraries. Nat. Methods. 2011; 9:10.
- 22. Lawrence M., Huber W., Pagès H., Aboyoun P., Carlson M., Gentleman R., Morgan M.T., Carey V.J. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 2013; 9:e1003118.
- 23. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15:550.
- 24. Biasini M., Bienert S., Waterhouse A., Arnold K., Studer G., Schmidt T., Kiefer F., Gallo Cassarino T., Bertoni M., Bordoli L. et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014; 42:W252–W258.
- 25. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2017; 45:D158–D169.
- 26. Humphrey W., Dalke A., Schulten K. VMD: visual molecular dynamics. J. Mol. Graph. 1996; 14:33–38.
- 27. Mermod M., Magnani D., Solioz M., Stoyanov J.V. The copper-inducible ComR (YcfQ) repressor regulates expression of ComC (YcfR), which affects copper permeability of the outer membrane of Escherichia coli. Biometals. 2012; 25:33–43.
- 28. Luhachack L., Rasouly A., Shamovsky I., Nudler E. Transcription factor YcjW controls the emergency H2S production in E. coli. Nat. Commun. 2019; 10:2868–2875.
- 29. Yamamoto K., Nakano M., Ishihama A. Regulatory role of transcription factor SutR (YdcN) in sulfur utilization in Escherichia coli. Microbiology. 2015; 161:99–111.
- 30. Ogasawara H., Ishizuka T., Hotta S., Aoki M., Shimada T., Ishihama A. Novel regulators of the csgD gene encoding the master regulator of biofilm formation in Escherichia coli K-12. Microbiology. 2020; 166:880–890.
- 31. Ogasawara H., Ohe S., Ishihama A. Role of transcription factor NimR (YeaM) in sensitivity control of Escherichia coli to 2-nitroimidazole. FEMS Microbiol. Lett. 2015; 362:1–8.
- 32. Shimada T., Yamamoto K., Nakano M., Watanabe H., Schleheck D., Ishihama A. Regulatory role of CsqR (YihW) in transcription of the genes for catabolism of the anionic sugar sulfoquinovose (SQ) in Escherichia coli K-12. Microbiology. 2019; 165:78–89.
- 33. Kaznadzey A., Shelyakin P., Belousova E., Eremina A., Shvyreva U., Bykova D., Emelianenko V., Korosteleva A., Tutukina M., Gelfand M.S. The genes of the sulphoquinovose catabolism in Escherichia coli are also associated with a previously unknown pathway of lactose degradation. Sci. Rep. 2018; 8:3177.
- 34. Turner P.C., Miller E.N., Jarboe L.R., Baggett C.L., Shanmugam K.T., Ingram L.O. YqhC regulates transcription of the adjacent Escherichia coli genes yqhD and dkgA that are involved in furfural tolerance. J. Ind. Microbiol. Biotechnol. 2011; 38:431–439.
- 35. Kim G.B., Gao Y., Palsson B.O., Lee S.Y. DeepTFactor: a deep learning-based tool for the prediction of transcription factors. Proc. Natl. Acad. Sci. U.S.A. 2021; 118:e2021171118.
- 36. Gough J., Karplus K., Hughey R., Chothia C. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol. 2001; 313:903–919.
- 37. Pandurangan A.P., Stahlhacke J., Oates M.E., Smithers B., Gough J. The SUPERFAMILY 2.0 database: a significant proteome update and a new webserver. Nucleic Acids Res. 2019; 47:D490–D494.
- 38. Bailey T.L., Boden M., Buske F.A., Frith M., Grant C.E., Clementi L., Ren J., Li W.W., Noble W.S. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009; 37:W202–W208.
- 39. Sanchez-Torres V., Maeda T., Wood T.K. Global regulator H-NS and lipoprotein NlpI influence production of extracellular DNA in Escherichia coli. Biochem. Biophys. Res. Commun. 2010; 401:197–202.
- 40. Han X., Dorsey-Oresto A., Malik M., Wang J.-Y., Drlica K., Zhao X., Lu T. Escherichia coli genes that reduce the lethal effects of stress. BMC Microbiol. 2010; 10:35.
- 41. Serre L., Pereira de Jesus K., Zelwer C., Bureaud N., Schoentgen F., Bénédetti H. Crystal structures of YBHB and YBCL from Escherichia coli, two bacterial homologues to a Raf kinase inhibitor protein. J. Mol. Biol. 2001; 310:617–634.
- 42. Wang X., Miller E.N., Yomano L.P., Shanmugam K.T., Ingram L.O. Increased furan tolerance in Escherichia coli due to a cryptic ucpA gene. Appl. Environ. Microbiol. 2012; 78:2452–2455.
- 43. Zhang X., Carter M.S., Vetting M.W., SanFrancisco B., Zhao S., Al-Obaidi N.F., Solbiati J.O., Thiaville J.J., de Crécy-Lagard V., Jacobson M.P. et al. Assignment of function to a domain of unknown function: DUF1537 is a new kinase family in catabolic pathways for acid sugars. Proc. Natl. Acad. Sci. U.S.A. 2016; 113:E4161–E4169.
- 44. Guzmán G.I., Sandberg T.E., LaCroix R.A., Nyerges Á., Papp H., de Raad M., King Z.A., Hefner Y., Northen T.R., Notebaart R.A. et al. Enzyme promiscuity shapes adaptation to novel growth substrates. Mol. Syst. Biol. 2019; 15:e8462.
- 45. Ghatak S., King Z.A., Sastry A., Palsson B.O. The y-ome defines the 35% of Escherichia coli genes that lack experimental evidence of function. Nucleic Acids Res. 2019; 47:2446–2454.
- 46. Martínez-Antonio A., Janga S.C., Thieffry D. Functional organisation of Escherichia coli transcriptional regulatory network. J. Mol. Biol. 2008; 381:238–247.
- 47. Shimada T., Fujita N., Yamamoto K., Ishihama A. Genomic SELEX for the genome-wide search of regulation targets by transcription factors: SELEX-clos and SELEX-chip procedures. 2009 International Symposium on Micro-NanoMechatronics and Human Science. 2009; doi:10.1109/mhs.2009.5351954.
- 48. Keseler I.M., Mackie A., Santos-Zavaleta A., Billington R., Bonavides-Martínez C., Caspi R., Fulcher C., Gama-Castro S., Kothari A., Krummenacker M. et al. The EcoCyc database: reflecting new knowledge about Escherichia coli K-12. Nucleic Acids Res. 2017; 45:D543–D550.
- 49. Sévin D.C., Fuhrer T., Zamboni N., Sauer U. Nontargeted in vitro metabolomics for high-throughput identification of novel enzymes in Escherichia coli. Nat. Methods. 2017; 14:187–194.
- 50. Maeda Y., Lin C.-Y., Ishida Y., Inouye M., Yamaguchi Y., Phadtare S. Characterization of YjjJ toxin of Escherichia coli. FEMS Microbiol. Lett. 2017; 364:fnx086.
- 51. Herzberg M., Kaye I.K., Peti W., Wood T.K. YdgG (TqsA) controls biofilm formation in Escherichia coli K-12 through autoinducer 2 transport. J. Bacteriol. 2006; 188:587–598.
- 52. Schürmann M., Sprenger G.A. Fructose-6-phosphate aldolase is a novel class I aldolase from Escherichia coli and is related to a novel group of bacterial transaldolases. J. Biol. Chem. 2001; 276:11055–11061.
- 53. Ibáñez de Aldecoa A.L., Zafra O., González-Pastor J.E. Mechanisms and regulation of extracellular DNA release and its biological roles in microbial communities. Front. Microbiol. 2017; 8:1390.
- 54. Rodionova I.A., Gao Y., Sastry A., Monk J., Wong N., Szubin R., Lim H., Zhang Z., Saier M.H., Palsson B. PtrR (YneJ) is a novel E. coli transcription factor regulating the putrescine stress response and glutamate utilization. bioRxiv. 2020; doi:10.1101/2020.04.27.065417 (preprint, not peer reviewed; posted 29 April 2020).