Summary
Recent studies on targeted gene integrations in bacteria have demonstrated that chromosomal location can substantially affect a gene’s expression level. However, these studies have only provided information on small numbers of sites. To measure position effects on transcriptional propensity at high resolution across the genome, we built and analyzed a library of over 144,000 genome-integrated, standardized reporters in a single mixed population of Escherichia coli. We observed more than 20-fold variation in transcriptional propensity across the genome when the length of the chromosome was binned into broad 4 kbp regions; greater variability was observed over smaller regions. Our data reveal peaks of high transcriptional propensity centered on ribosomal RNA operons and core metabolic genes, while prophages and mobile genetic elements were enriched in less transcribable regions. In total, our work supports the hypothesis that E. coli has evolved gene-independent mechanisms for regulating expression from specific regions of its genome.
eTOC
Scholz et al. use barcodes to track thousands of reporters integrated into various sites across the E. coli genome in order to create a high-resolution map of the propensity for transcription. They find that ribosomal RNA operons and core metabolic genes are enriched in highly transcribable regions, while mobile genetic elements such as prophages are enriched in silenced regions.
Introduction
The bacterial nucleoid is a dense structure composed of DNA, RNA and proteins, and excludes other abundant cellular machinery, such as ribosomes and RNA polymerase (RNAP), from its interior (Bakshi et al., 2015; Chai et al., 2014; Jin and Cabrera, 2006). Several studies have demonstrated that packing of the nucleoid is non-random and condition dependent. For example, chromosome conformation capture studies in multiple bacterial species have revealed segments of DNA that preferentially self interact, and have been called chromosome interaction domains (Le et al., 2013; Lioy et al., 2018; Marbouty et al., 2015; Wang et al., 2015). During exponential growth, RNAPs are also organized into tight foci on the nucleoid surface, actively transcribing the ribosomal RNA operons (rrn) (Cabrera and Jin, 2006), most of which appear spatially co-localized (Gaal et al., 2016). Despite the specific localization of DNA and RNAP, previous findings based on site-specific integrations have suggested that gene expression from different genomic loci is roughly equivalent, except for the effect of gene dosage, which decreases from the origin of replication to the terminus during exponential growth (Beckwith et al., 1966; Schmid and Roth, 1987; Sousa et al., 1997). Higher gene dosage near the origin is a result of multiple replication initiation events before terminus replication and cell division (Cooper and Helmstetter, 1968); historically, the bacterial chromosome has otherwise been considered generally accessible structurally and for transcription, without detectable interference from chromosomal structure (Masters, 1977; Schmid and Roth, 1987).
By measuring GFP fluorescence from a terminator-flanked reporter integrated into several sites, Block et al. observed that gene expression variation from the origin to the terminus corresponded to expected growth-rate dependent gene dosage changes (Block et al., 2012), consistent with the expectations outlined above. More recently, however, the dogma of uniform expression capability across the genome has been challenged by several lines of evidence. Using a similar approach to Block et al., Bryant et al. demonstrated widely varying expression from a GFP reporter in E. coli that did not correlate with genome copy number (Bryant et al., 2014). Some of the lowest expressing sites were in transcriptionally silent Extended Protein Occupancy Domains (tsEPODs) (Vora et al., 2009), which are regions of high protein occupancy on the genome that appear to correlate with low transcript levels. In some cases, the reporter gene expression could be increased by replacing the tsEPOD with the reporter gene instead of integrating within it (Bryant et al., 2014). For some reporters outside of tsEPODs, expression interference from neighboring genes drove down reporter expression, depending on the relative gene orientation. Gene expression interference between neighboring genes has also been studied in more detail on plasmids within E. coli cells (Yeung et al., 2017). In that study, some of the gene expression interference observed between neighboring genes could be attributed to competition for negative DNA supercoiling and was gene orientation-specific. DNA gyrases and topoisomerases maintain negative supercoiling, which compacts the nucleoid and is important for gene expression (Dorman, 2006). Brambilla and Sclavi have also tracked expression of a reporter under a promoter known to be bound by the nucleoid protein H-NS from 9 different sites over the E. coli growth period and observed different site-specific expression levels depending on the growth phase (Brambilla and Sclavi, 2015).
Despite the specific observations described above, a systematic understanding of the effects of chromosomal position itself on gene expression has so far eluded the field. Previous studies on position-dependent expression variation have been limited to a small number of integration sites, which was appropriate for mechanistic studies into the effects of specific genomic features, but could not reveal the full range of position-dependent effects on transcription. DNA supercoiling, protein occupancy, transcriptional interference and binding of promoters and genes by various nucleoid associated proteins (NAPs) are examples of genomic features that affect expression of large portions of genes in the bacterial genome. Extensive work has been conducted to characterize the effects of a number of these factors for expression of specific genes. However, genomic features vary simultaneously across the genome, potentially leading to combinatorial effects on gene expression (Martínez-Antonio et al., 2009; Meyer et al., 2017). Specific loci may have unique features affecting transcription, which could only be identified by high-resolution mapping of position-dependent expression variation.
Here, we employ Tn5 transposase to perform massively parallel integration of a standardized, barcoded reporter construct, allowing us to obtain an empirical map of gene-independent transcriptional propensity (that is, the amount of RNA produced per unit DNA from a given reporter) across the bacterial genome (Fig. 1). High-resolution transcriptional propensity comparisons with genomic features can reveal both strong and weak correlations with high statistical power. To test the effect of genome position on gene expression, and not native gene regulation, we designed a reporter construct with strong bi-directional terminators (Chen et al., 2013) and its own inducible promoter (Fig. 1A). Each reporter construct is tagged with a unique barcode identifier, which allows simultaneous tracking of gene expression from thousands of integrations. Using a modified transposon footprinting procedure, unique barcodes were paired with integration location, allowing barcodes to serve as a proxy for the overall abundance of RNA or DNA at each integration address. The σ70-dependent TetO1 promoter drives expression of mNeonGreen (mNG) followed by a 15-base barcode in the 3’ UTR of the RNA upon induction by anhydrotetracycline (aTc) (Clavel et al., 2016). The reporter used here was designed to be relatively small in size and have an intermediate transcription rate (Kosuri et al., 2013), in order to minimize the effect of the reporter on the local genome structure (Le et al., 2013). The inclusion of an open reading frame in our construct ensures that the transcribed RNA will be subject to typical post-transcriptional phenomena (e.g., co-transcriptional translation and subsequent protection by ribosomes). In keeping with efforts to minimize reporter size, the selection marker is an FRT-flanked kanamycin resistance cassette, and was removed by Flp recombinase before the full-scale profiling procedure (Fig. 1A).
Figure 1: Library construction and data acquisition for position-dependent transcriptional propensity mapping.
A) mNeonGreen (mNG) reporter is controlled by the TetO1 promoter. The orange arrow indicates the position of the 15 bp barcode that is transcribed with mNG. The construct is flanked by strong bi-directional terminators and mosaic ends (ME), which are recognized by Tn5 transposase. P1 and P2 indicate sites used for light amplification in preparation for barcode sequencing. Construct size and features are shown before and after curing of a kanamycin resistance marker (KanR). B) To produce the reporter library, randomly barcoded reporter constructs in complex with Tn5 are electroporated into cells and randomly integrated into the E. coli genome in parallel. C) Transposon footprinting pairs barcode sequence (orange) with integration location on the genome (black). 4 bp recognition restriction enzymes cut upstream of the barcode and randomly in the downstream genomic DNA. After ligation of the Y-linker (red), construct-containing DNA fragments are specifically amplified and sequenced. Footprinting need only be done once for a given library to identify the insertion location corresponding to each barcode. D) To measure transcriptional propensities, the reporter library is grown to an optical density (OD) at 600 nm of 0.2. Total RNA and DNA are extracted. After nucleic acid processing (Fig. S7), the RNA/DNA ratio for each barcode is mapped to its corresponding genomic location.
Results
Tn5 transposition was used to integrate the barcoded reporter in a massively parallel fashion into the E. coli genome. We mapped 144,672 unique reporter barcodes to 98,034 unique genomic integration sites, corresponding to an average of one unique location every 47 bp. As integration rate was not uniform across the genome, resolution varies depending on the region (Fig. 2C and Fig. S1E). Neighboring integrations have high similarity in raw RNA barcode produced per unit DNA barcode (which we refer to as transcriptional propensity), indicating that reporter transcription is dependent on integration location (Fig. 2A, B). After smoothing the raw transcriptional propensity by taking the median value for reporters in a 500 bp window around each integration, the highly-correlated replicates (Fig. 2D, Spearman ρ=0.915) were averaged to produce the high-resolution transcriptional propensity map (Fig. 2E). The transcriptional propensity map for all analyses includes only sites where at least three independent integrations were measured within a 500 bp window. The transcriptional propensity signal is reported as a median of signal for all integration events within a 500 bp window centered around each integration in all calculations in order to minimize noise potentially arising from a single barcode (See Table S7 for all transcriptional propensity and count values). Several other potentially confounding features, such as barcode-specific GC content and reporter integration-specific growth rate changes, do not have systematic effects on this transcriptional propensity signal (Fig. S2 and STAR Methods - Elimination of potentially confounding features). N.b. the barcode abundance measurements used are for the reporter barcodes only; RNA from native transcripts is not sequenced in our experiments. We also note that in principle, any potential genome position-dependent effects on RNA stability would be part of the transcriptional propensity signal. 
Although there is a weak positive correlation between the degradation rate of RNA from neighboring operons (Selinger et al., 2003), RNA abundance is well-correlated with transcription rate of native genes in E. coli (Chen et al., 2015), whereas RNA stability is generally not predictive of overall transcript levels in E. coli (Bernstein et al., 2002), hence our use of the term transcriptional propensity (as opposed to RNA abundance propensity).
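The windowed-median smoothing described above can be illustrated with a minimal sketch. It assumes sorted integration positions with per-barcode RNA and DNA counts; the function name, the toy counts, and the linear (non-circular) treatment of the genome are illustrative assumptions, not the pipeline used in the study (see STAR Methods for the actual procedure).

```python
from bisect import bisect_left, bisect_right
from statistics import median


def windowed_median_propensity(positions, rna_counts, dna_counts,
                               window=500, min_sites=3):
    """Median RNA/DNA ratio in a window centered on each integration site.

    positions must be sorted in ascending order. Sites whose window holds
    fewer than min_sites integrations are omitted from the result.
    The genome is treated as linear here, which is a simplification.
    """
    ratios = [rna / dna for rna, dna in zip(rna_counts, dna_counts)]
    half = window / 2
    smoothed = {}
    for pos in positions:
        lo = bisect_left(positions, pos - half)
        hi = bisect_right(positions, pos + half)
        if hi - lo >= min_sites:
            smoothed[pos] = median(ratios[lo:hi])
    return smoothed


# Hypothetical toy data: five nearby integrations and one isolated integration.
positions = [100, 200, 300, 500, 700, 5000]
rna = [20, 30, 10, 80, 40, 500]
dna = [10, 10, 10, 10, 10, 10]
result = windowed_median_propensity(positions, rna, dna)
```

In this toy example the isolated site at position 5000 is dropped despite its high raw ratio, because no other integration falls within its window; this mirrors the requirement of at least three independent integrations per 500 bp window.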
Figure 2: Genome-position dependent transcriptional propensity from Tn5-based integration of a barcoded reporter is non-random.
A) Autocorrelation of raw RNA/DNA ratio values for replicate 1. Lag represents base pair distance. The dashed line represents the 99% autocorrelation confidence interval for a white noise process, thus representing the level of autocorrelation that would be observed in the absence of true signal. B) Autocorrelation of raw RNA/DNA ratio values for replicate 2. C) Reporter integration count within 1 kb windows throughout the genome. D) Correlation between replicates for calculated transcriptional propensity from 500 bp rolling median windows (Spearman ρ = 0.91). E) Average transcriptional propensity (over 500 bp median rolling windows with at least three unique barcodes) mapped to specific integration locations on the E. coli genome. The color indicates the number of unique transposon insertions in the same bin of RNA/DNA ratio. All values used to generate these plots can be found in Table S7. F) Shown are kernel density estimates of the distributions of log2 fold deviations of transcriptional propensities (smoothed by taking the median value from different window sizes around each site) vs the global median of transcriptional propensity from the same sample. “Int” indicates the minimum number of integrations required to generate a median-smoothed value for each window size. The blue curve corresponds to the smoothing used throughout the remainder of the text. G) Spectral analysis of the observed transcriptional propensity signal (averaged across biological replicates) using the Lomb-Scargle periodogram method (Lomb, 1976; Scargle, 1982). Only periods that represent an integer divisor of the total genome length were analyzed. The green dotted line represents an overall <1% false discovery rate as determined by a permutation test using blocks of 2,200 adjacent collection bins (~100,000 bp; see Methods for details). Inset: as in main figure, zoomed out to show periods < 160,000 bp.
H) Sinusoidal function fit (red line) of the period with the highest Lomb-Scargle power (663,093.14 bp, 7 repetitions per genome length) to the transcriptional propensity.
Transcriptional propensity is highly variable across the E. coli genome
Transcriptional propensity variation appears roughly periodic at the whole-genome scale (Fig. 2E). Several sharp troughs are also apparent, independent of the overall waveform. Transcriptional propensities are not a result of gene dosage resulting from high Ori-Ter ratios during exponential phase growth or from differing representation of a library member, because all transcriptional propensities are reported as RNA:DNA ratios. The distributions of transcriptional propensity values observed using different windowing sizes are plotted in Figure 2F. Substantial position-dependent variation is present throughout the genome. A smooth, roughly log-normal population is observed spanning a 16-fold range of propensities (with 99% of values contained within a central 4.2-fold range); furthermore, many genomic regions are apparent in the tails of the distribution that represent dramatically activated or silenced sites. There is a >250-fold propensity difference between the highest and lowest 500 bp windows (using the median of no fewer than three sites within each window, to avoid undue impact of individual outliers). Even considered over broader regions, an overall >195-fold range persists using a window size of 1 kbp (requiring no fewer than 5 reporters), and a >22-fold difference in transcriptional propensity remains between the highest and lowest 4 kbp regions of the chromosome (median of integration sites within a 4 kbp window with at least 8 reporters; see Fig. 2F for transcriptional propensity variation considered over different median-windowing sizes). A tradeoff of course exists in expanding the window size used in the analysis above, as larger windows will be less subject to statistical fluctuations, but also will likely miss biologically meaningful local variations in transcriptional propensity and instead provide a regional average across large chunks of the chromosome.
The superficially apparent periodicity in the transcriptional propensity is supported by spectral analysis via Lomb-Scargle periodograms (Lomb, 1976; Scargle, 1982). As shown in Fig. 2G, strong spectral lines are apparent at both 663 Kb and 773 Kb periods which would correspond to 7 and 6 repetitions throughout one genome length, suggesting that this length scale may be characteristic of a key unit of functional and/or spatial organization in the E. coli chromosome. Modeling the transcriptional propensity with a sinusoidal function at a period of 663 Kb shows a good fit (Fig. 2H). These period lengths are roughly consistent with the size of macrodomains observed in recent 3C-sequencing experiments (Lioy et al., 2018). We also note that the absence of a ~10kb component in our periodogram, which might be expected based on experiments measuring the propagation of supercoiling relaxation upon DNA damage (Postow et al., 2004), may arise simply due to a lack of periodicity in the ~10kb domain organization.
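To make the spectral analysis concrete, the following is a minimal sketch of a classical Lomb-Scargle power calculation evaluated only at periods that are integer divisors of the genome length. The genome length used here (4,641,652 bp) is an assumption inferred from the reported period of 663,093.14 bp at 7 repetitions; the synthetic signal, noise level, and function name are hypothetical, and the permutation-based significance threshold used in the study is not reproduced.

```python
import math
import random


def lomb_scargle_power(t, y, period):
    """Classical normalized Lomb-Scargle power at a single period."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    w = 2 * math.pi / period
    # Time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                     sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
    c = [math.cos(w * (ti - tau)) for ti in t]
    s = [math.sin(w * (ti - tau)) for ti in t]
    yc = sum((v - mean) * ci for v, ci in zip(y, c))
    ys = sum((v - mean) * si for v, si in zip(y, s))
    cc = sum(ci * ci for ci in c)
    ss = sum(si * si for si in s)
    return (yc * yc / cc + ys * ys / ss) / (2 * var)


# Synthetic, unevenly sampled signal with 7 repeats per genome plus noise.
rng = random.Random(0)
genome_length = 4_641_652  # assumed: 7 x 663,093.14 bp
positions = sorted(rng.uniform(0, genome_length) for _ in range(500))
signal = [math.sin(2 * math.pi * 7 * p / genome_length) + rng.gauss(0, 0.5)
          for p in positions]

# Evaluate only periods that divide the genome length an integer number of times.
powers = {k: lomb_scargle_power(positions, signal, genome_length / k)
          for k in range(1, 13)}
best_repeats = max(powers, key=powers.get)
```

The Lomb-Scargle approach is suited to this kind of data because integration sites are unevenly spaced along the chromosome, so a standard discrete Fourier transform would not apply directly.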
Ribosomal RNA operons are centered in broad transcriptional propensity peaks
Several genomic features are readily apparent as having substantial correlations with regions of high transcriptional propensity, as shown in Figure 3A. The seven rrn operons in the E. coli genome are located within the major peaks of transcriptional propensity, although a single major peak (near 1 Mb) occurs without an rrn operon. Thus, either the rrn operons have been selected to be contained in regions of exceptional transcriptional propensity, or they contain some feature that itself enhances transcriptional propensity in their surroundings. By subtracting a LOWESS local regression smoothing on transcriptional propensity with distance from the nearest rrn operon (Fig. 3C), from the overall transcriptional propensity signal, the major waveform pattern is mostly eliminated, while local peaks and troughs are still apparent (Fig. 3D); thus, several additional features must contribute locally to both position-dependent activation and silencing. Another feature that could contribute to the transcriptional propensity signal is the structural organization of the genome at both long and short length scales. In order to explore the relationship between long length scale organization and transcriptional propensity, we examined the interaction between our transcriptional propensity data and the macrodomain boundaries identified by Lioy et al. (2018), shown in Fig. 3A (lower panel). Working at the level of each macrodomain, we grouped the transcriptional propensity signal into 10 equally spaced bins according to length-normalized position within the corresponding macrodomain. Transcriptional propensity is higher at macrodomain boundaries near an rrn operon, but this effect is not present at macrodomain boundaries that do not have a nearby rrn operon (Fig. S3C). 
On shorter length scales, such as the scale of the chromosomal-interacting domains (CIDs) observed in 3C-sequencing experiments (Lioy et al., 2018) or topologically separated domains observed in isolated chromosomal DNA (Sinden and Pettijohn, 1981), we see a limited relationship between the transcriptional propensity and the position within a defined CID (Fig. S3A). In fact, like the macrodomain boundaries, the transcriptional propensity tends to be higher at CID boundaries that coincide with a ribosomal RNA operon, but this effect is not present at CID boundaries that do not have a nearby rrn operon. Together, these results suggest that measures of the structural organization of the genome alone are not predictive of transcriptional propensity and other features must also contribute to the transcribability of any particular region.
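The detrending by distance from the nearest rrn operon can be sketched as follows. This is a simplified illustration under stated assumptions: the toy circular genome, the two rrn coordinates, and the propensity values are hypothetical, and the smoother is a single-pass, tricube-weighted local linear fit (LOWESS without robustness iterations) rather than the exact implementation used in the study. The span of 0.33 matches the smoothing parameter quoted in the Figure 3 legend.

```python
import math


def circular_distance(a, b, genome_length):
    """Shortest distance between two coordinates on a circular chromosome."""
    d = abs(a - b) % genome_length
    return min(d, genome_length - d)


def distance_to_nearest(pos, features, genome_length):
    return min(circular_distance(pos, f, genome_length) for f in features)


def local_linear_smooth(x, y, frac=0.33):
    """Tricube-weighted local linear fit evaluated at every x (single pass)."""
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of neighbors defining the bandwidth
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        h = sorted(dists)[k - 1]
        if h == 0:
            weights = [1.0 if d == 0 else 0.0 for d in dists]
        else:
            weights = [(1 - min(1.0, d / h) ** 3) ** 3 for d in dists]
        sw = sum(weights)
        swx = sum(w * xi for w, xi in zip(weights, x))
        swy = sum(w * yi for w, yi in zip(weights, y))
        swxx = sum(w * xi * xi for w, xi in zip(weights, x))
        swxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-9:
            fitted.append(swy / sw)  # all weighted points share one x value
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw + slope * x0)
    return fitted


# Hypothetical toy genome: propensity decays with rrn distance, plus one local dip.
genome_length = 1000
rrn_sites = [100, 600]
positions = list(range(0, 1000, 50))
rrn_dist = [distance_to_nearest(p, rrn_sites, genome_length) for p in positions]
propensity = [5 - 0.01 * d for d in rrn_dist]
propensity[positions.index(800)] -= 2.0  # a local silenced site

trend = local_linear_smooth(rrn_dist, propensity)
residual = [p - t for p, t in zip(propensity, trend)]
```

After subtraction, the broad rrn-associated trend is removed from the toy signal while the local dip at position 800 remains as the most negative residual, analogous to the local troughs that persist in Fig. 3D.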
Figure 3: Transcriptional propensity peaks correspond to ribosomal RNA operon and macrodomain boundaries.

A) (top) Transcriptional propensity plot with macrodomain boundaries, as determined from Lioy et al. 2018. Red dashed lines indicate ribosomal RNA operons. Ribosomal RNAs labeled with a star indicate rRNA operons near macrodomain boundaries. Black dashed lines indicate macrodomain boundaries that are not near a ribosomal RNA operon and gene names for the overlapping or nearest gene to these boundaries are indicated above. (bottom) Directionality index (DI; see Methods) determined at 400 Kb scale for each of two biological replicates taken from exponentially growing E. coli cells at 37 °C in LB media with macrodomain boundaries indicated with black dashed lines, obtained by re-analysis of data from (Lioy et al., 2018). For details on determination of macrodomain boundaries and accession of Lioy et al. data see Methods. B) Correlation of transcriptional propensity and distance from the nearest rrn operon (Spearman ρ = −0.56). C) Lowess fit of transcriptional propensity with rrn distance (fitting using a smoothing parameter of 0.33). D) Transcriptional propensity signal with values from Lowess regression subtracted from the overall signal in panel A.
Binding of the NAPs H-NS and Fis is strongly correlated with transcriptional propensity
We next examined the correlation of transcriptional propensity with several characterized genomic features using rolling-window medians over 500 bp for each data set. We observed the strongest effects for binding of the nucleoid proteins Fis and H-NS, as well as global protein occupancy measured via in vivo protein occupancy display (IPOD; Fig. 4 and Fig. S4A,B; see Table S1 for all Spearman correlations). Despite the fact that the abundant NAP Fis is not expected to bind the reporter construct itself, transcriptional propensity is highly positively correlated with Fis binding level at genomic integration sites (Spearman ρ = 0.50, Fig. 4A). Conversely, transcriptional propensity is strongly negatively correlated with H-NS binding (Spearman ρ = −0.58, Fig. 4B), consistent with the previously described gene silencing role for H-NS (Kahramanoglou et al., 2011). Transcriptional propensity is also negatively correlated with overall protein occupancy, as it is significantly lower in tsEPODs than in other sites, strongly supporting the reporter silencing observed by Bryant et al. for integrations within tsEPODs (Fig. S4A) (Bryant et al., 2014; Vora et al., 2009).
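For readers less familiar with the statistic, a Spearman correlation is the Pearson correlation of the ranks of two variables, with tied values assigned their average rank. The sketch below illustrates this with hypothetical occupancy and propensity values; the numbers and variable names are invented for illustration and are not taken from the data sets analyzed here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical windowed values: a silencing protein's occupancy vs propensity.
occupancy = [0.1, 0.4, 0.4, 0.9, 1.5, 2.0]
propensity = [8.0, 6.5, 7.0, 3.0, 2.5, 1.0]
rho = spearman(occupancy, propensity)
```

Because only ranks enter the calculation, the statistic captures monotonic relationships without assuming that propensity scales linearly with occupancy.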
Figure 4: Correlation of transcriptional propensity with binding of abundant NAPs and nucleotide content.
A) Correlation of transcriptional propensity with Fis binding (500 bp rolling median, Spearman ρ = 0.50). B) Correlation of transcriptional propensity with enrichment by H-NS binding (500 bp rolling median, Spearman ρ = −0.58). C) Correlation of transcriptional propensity with AT content (500 bp rolling mean, Spearman ρ = −0.59). D) Genome browser view of a large tsEPOD (2.79 Mb - 2.81 Mb) and surrounding genomic context. Tracks from top to bottom for Fis binding, H-NS binding, AT content and transcriptional propensity, transcripts per million (TPM) RNA from native genes and tsEPOD ranges. Strand-specific gene annotations are indicated below the data tracks (Gama-Castro et al., 2015).
RNA abundance from native genes displays only a weak positive correlation with transcriptional propensity (Spearman ρ = 0.24, Fig. S4D). However, when larger rolling median windows (50 Kb) are used for RNA abundance from native genes, correlation with transcriptional propensity is much higher (Spearman ρ = 0.51, Fig. S4E). These results show that while highly expressed genes are more frequently located in high transcriptional propensity regions, the regulatory logic governing expression of individual genes is dominant over the underlying transcriptional propensity of a given region.
Binding of other NAPs (HU, LRP and SeqA) was not well correlated with transcriptional propensity, nor was RNAP binding (Fig. S4). We likewise found no substantial correlation of transcriptional propensity with a measure of DNA supercoiling density or with reporter location with respect to genes encoding proteins recognized by the signal recognition particle (Fig. S4) (Lal et al., 2016; Moffitt et al., 2016). In contrast, mean adenine and thymine (AT) content in a 500 bp window around insertion locations was strongly negatively correlated with transcriptional propensity (Fig. 4C). AT content is also highly correlated with H-NS binding and with overall protein occupancy (it is notable that both H-NS and Fis have consensus motifs with high AT content (Gama-Castro et al., 2015), although the Fis consensus sequence is bookended by a G/C).
Transcriptional propensity does not show substantial strand specificity and is insulated from native transcription
We also examined the correlation of neighboring RNA abundance in all orientations relative to the reporter on transcriptional propensity (Table S2). These data indicate that neighboring transcription has no more than a tiny impact on transcriptional propensity in our experimental setup, likely due to the strong bidirectional terminators flanking our insertion construct. Since the correlations of native RNA abundances with transcriptional propensity in the tandem orientation with respect to the reporter are very similar regardless of which is upstream (Spearman correlation between reporter and adjacent RNA abundance is 0.16 with the reporter upstream, 0.15 with the reporter downstream), insulation by the strong upstream transcriptional terminator of the reporter is also validated. Transcriptional propensity from reporters on the leading and lagging strands displays the same overall waveform pattern, and transcriptional propensities at the same positions are highly correlated with each other (Spearman ρ = 0.89, Fig. S4L-O).
Sequence composition and nucleoid protein occupancy make unique contributions to transcriptional propensity
Given the numerous correlations between transcriptional propensity and other genomic features observed above, it is useful to consider how much independent information is contributed by the various features that we have noted, and to what extent transcriptional propensity can be predicted solely on the basis of those features. We applied lasso regression to obtain regularized models that predict transcriptional propensities based on a minimal number of useful features. The input feature set included a total of 96 characteristics including sequence composition, protein occupancy data, and ribosomal RNA positioning (see Methods for details). During lasso regression, a regularization parameter (lambda) is gradually scaled from higher to lower values; as it does so, the penalty associated with having non-zero coefficients for various features falls, and thus more of the features contribute to the model. As seen in the fits in Fig. 5 and Fig. S5, the simplest justifiable model (based on five-fold cross validation) incorporates six features (given here in the order in which they appear during the regression): proximity to ribosomal RNA, AT content, H-NS occupancy, Fis occupancy, HU occupancy, and total protein occupancy (measured via IPOD).
Figure 5: Contributions of known genomic features to prediction of transcriptional propensity.

The top panel shows the mean-squared error (MSE) upon five-fold cross-validation for lasso regression as a function of the regularization parameter lambda (see Methods for details); dashed lines occur at the point of lowest MSE (left), and at the point with the highest value of lambda that is within 1 standard error of the lowest MSE (right). The bottom panel shows the signs of coefficients for several key genomic features as a function of lambda (grey: zero; blue: negative; red: positive). Fitted values for these coefficients at the two points shown by the dashed lines are given in Table S3, and a similar plot showing all features used in the fitted model is given as Fig. S5. “rRNA impact” refers to the LOWESS-fitted signal assignable to proximity to the nearest ribosomal RNA, as shown in Fig. 3C.
Consistent with the results in Fig. 4 and Fig. S4, proximity to ribosomal RNA, Fis occupancy, and HU occupancy seem to characterize regions with high transcriptional propensity, whereas high AT content, H-NS occupancy, and overall protein occupancy are associated with lower transcriptional propensity. The fact that all six of the features discussed here have non-zero coefficients in the penalized lasso regression model, even when accounting for uncertainty under cross-validation, demonstrates that each bears significant information content, rather than one of them (e.g., AT content) acting as an underlying basis for others (e.g., H-NS occupancy, which does show a preference for high AT regions (Kahramanoglou et al., 2011)). The resulting linear model explains 69.9% of the variance in transcriptional propensity using the most parsimonious parameterization, and 72.3% of the variance using the parameterization that minimizes the mean squared error (see Fig. 5, Fig. S5 and Table S3). Thus, while the predictions are not perfect, a substantial fraction of the observed fluctuations in transcriptional propensity may be predicted on the basis of simple chromosomal features and nucleoid-associated protein occupancy. Note that in less conservative parameterizations, many additional features are incorporated into the model, including transcript levels, initiating (rifampicin-treated) RNA polymerase occupancy, and density of motif matches for several other transcription factors (Fig. S5); however, in light of the observed uncertainties upon cross-validation, inclusion of features beyond those shown in Fig. 5 cannot be strongly justified.
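The way features enter a lasso model as lambda decreases can be illustrated with a minimal coordinate-descent sketch. Everything in it is an assumption made for illustration: the three synthetic features (one strong positive, one weaker negative, one uninformative), the noise level, and the lambda values are hypothetical, and the five-fold cross-validation used to select the reported models is not reproduced.

```python
import random


def standardize(column):
    """Center a feature column and scale it to unit (population) variance."""
    mean = sum(column) / len(column)
    sd = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / sd for v in column]


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(columns, y, lam, sweeps=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent.

    columns are standardized feature columns and y is centered.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # equals y while all coefficients are zero
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of feature j with the partial residual.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam)  # column variance is 1
            step = new_beta - beta[j]
            if step:
                residual = [r - c * step for c, r in zip(col, residual)]
                beta[j] = new_beta
    return beta


# Synthetic data: feature 0 strongly positive, feature 1 weakly negative,
# feature 2 unrelated to the response.
rng = random.Random(1)
n = 200
features = [standardize([rng.gauss(0, 1) for _ in range(n)]) for _ in range(3)]
y = [3 * a - 1 * b + rng.gauss(0, 0.5)
     for a, b in zip(features[0], features[1])]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

path = {lam: lasso_fit(features, y, lam) for lam in (10.0, 2.0, 0.05)}
```

In this toy path, all coefficients are zero at the largest lambda, only the strongest feature is retained at the intermediate value, and the weaker negative feature enters at the smallest value, which is the same qualitative behavior summarized in the bottom panel of Fig. 5.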
Locations of specific classes of genes are informative of transcriptional propensity
To obtain a global picture of the biological logic dictating the organization of genes into high and low transcriptional propensity regions, we used iPAGE analysis (Goodarzi et al., 2009) to identify Gene Ontology (Ashburner et al., 2000; The Gene Ontology Consortium, 2017) terms that are informative about the transcriptional propensity at each gene location (Fig. 6). As expected, large ribosomal subunit genes are informative of high transcriptional propensity (GO:0022625). Genes in pathways for enterobacterial common antigen biosynthesis or organic phosphonate catabolism (GO:0009246, GO:0019700) are clustered together in high transcriptional propensity peaks. Cellular amino acid biosynthetic process (GO:0008652), which has 105 genes in E. coli, is also predictive of high transcriptional propensity. We also identify intracellular protein transmembrane transport (GO:0065002) as being enriched in regions of high transcriptional propensity. However, we note that there is no difference in local transcriptional propensity between genes that encode products that are recognized by the SRP machinery and all other genes (Fig. S4K). Genes involved in cytolysis and DNA integration (GO:0019835, GO:0015074) are significantly enriched in regions of the lowest transcriptional propensity. They are also both composed predominantly of prophage genes, possibly reflecting selection for such genes to be in broadly silencing genomic contexts. Genes for lipopolysaccharide core region biosynthesis (GO:0009244) and O-antigen biosynthesis (GO:0009243) are also in regions of low transcriptional propensity, possibly because several of the GO term member genes are clustered in large tsEPODs between 3.795 Mb and 3.810 Mb and between 2.102 Mb and 2.115 Mb. 
It is important to note that iPAGE automatically filters out GO terms that do not convey substantial conditional mutual information above an already-present set; the absence of terms such as additional ribosomal protein GO terms simply arises due to this redundancy filtering.
Figure 6: Over-representation of GO terms in transcriptional propensity bins.
Pathways identified by iPAGE analysis (Goodarzi et al., 2009) as having significant mutual information with transcriptional propensity, and their overrepresentation within specific transcriptional propensity bins. Gene-specific metrics for transcriptional propensity are calculated as the log2 median values of replicate-averaged RNA/DNA ratios within the regions of the genes and all locations within 2.5 kb up- and down-stream. iPAGE then discretizes the gene-specific propensity metrics into nine equally populated bins, as shown in the upper panel, where the range of propensity metric within each bin (red boxes) is shown as a proportion of the overall range (background black boxes). The leftmost bins contain genes within the lowest transcriptional propensity regions, with the rightmost bins containing the highest. Enrichment of GO terms within each bin is identified as the absolute value of log-10 of enrichment p-value, with sign set such that under-represented GO terms are negative (blue end of left scale) and the overrepresented are positive (red end of scale). The heatmap of the sign-adjusted enrichment shows the over- (as red tiles in heatmap) and under-representation (as blue tiles) of GO terms in different transcriptional propensity levels that are visualized as separated bins on the horizontal axis of heatmap. Tiles with bold borders indicate significant individual enrichments (p<0.05 after Bonferroni correction across the row); note, however, that all displayed GO terms have significant mutual information with the transcriptional propensity profile (as assessed by the default series of tests used by iPAGE).
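The binning and sign convention described in this caption can be illustrated with a minimal sketch. It assumes a hypergeometric test for over- and under-representation, which is an assumption of the sketch and not a statement about the iPAGE implementation; the function names are hypothetical and the numbers in the usage note are synthetic.

```python
import math

def equal_population_bins(values, n_bins=9):
    """Assign each value to one of n_bins rank-based bins of (near-)equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins

def hypergeom_tail_probs(k, N, K, n):
    """Return (P(X >= k), P(X <= k)) for k term genes among n genes in a bin,
    drawn from N genes in total of which K carry the GO term."""
    total = math.comb(N, n)
    pmf = lambda i: math.comb(K, i) * math.comb(N - K, n - i) / total
    p_over = sum(pmf(i) for i in range(k, min(K, n) + 1))
    p_under = sum(pmf(i) for i in range(max(0, n - (N - K)), k + 1))
    return p_over, p_under

def signed_enrichment(k, N, K, n):
    """|log10(p)| with positive sign for over- and negative for under-representation."""
    p_over, p_under = hypergeom_tail_probs(k, N, K, n)
    if p_over <= p_under:
        return -math.log10(p_over)   # over-represented: positive score
    return math.log10(p_under)       # under-represented: negative score
```

For example, with 90 genes, 10 of which carry a term, finding 5 of them in a 10-gene bin gives a positive score, whereas finding none of 30 term genes in a 10-gene bin gives a negative score.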
RNA abundance from targeted reporter integrations reveals a range of expression compared to native gene expression.
In order to further test the effect of chromosomal position on reporter transcription, we used lambda Red recombineering to perform targeted integrations of the reporter construct into several different sites representing a range of transcriptional propensities. Spline-smoothed genomic DNA sequencing counts from cells grown under the same conditions (Fig. 7A) were used to transform the transcriptional propensity map of RNA per DNA into a measure of reporter RNA counts per cell (Fig. 7B). These dosage-corrected values can be used to identify the highest and lowest transcriptional propensity regions for heterologous gene expression (Table S4). We then used RT-qPCR to quantify reporter RNA from four targeted reporter integration strains relative to a set of native reference transcripts. To provide a representative range of transcriptional propensities for comparison with native promoters while avoiding extremes, sites from within EPODs (ybcK and eaeH) and from relatively high propensity regions (phnO and yihG) were selected to represent the upper- and lower-middle distribution of transcriptional propensity variation (compare to Fig. 2F). Reporter transcription from targeted integrations was in good quantitative agreement with dosage-transformed transcriptional propensity (Fig. 7C). Additionally, by measuring the RNA abundance per unit DNA from three native genes in each of the reporter integration strains, and comparing with insertions spanning a range of intermediate transcriptional propensities (1.5 – 8), we could determine the transcription from the targeted integration reporters relative to native gene expression (Fig. 7D). These results show that RNA abundance per DNA from the reporter construct is in the 80–86th percentile when compared to native genes (Fig. 7E), indicating a moderately strong (but far from overpowering) promoter, and thus confirming the physiological relevance of our reporter.
Figure 7: Expression from targeted reporter integrations indicates transcription level relative to native genes.
A) Genomic DNA counts (blue) were used to generate spline-smoothed values (red) to estimate DNA dosage for cells grown under the same conditions as the reporter library. B) Transcriptional propensity (as in Fig. 2E) transformed by the DNA dosage spline line in A. The map here reflects the transcriptional propensity per cell instead of per unit DNA (see Materials and Methods). C) qPCR measurement of RNA from targeted reporter integrations from strains grown under the same conditions as the reporter library compared to dosage-transformed transcriptional propensity. All values were normalized by opgG signal (Heng et al., 2011). D) qPCR measurements of mNeonGreen RNA per DNA for four targeted integration strains (green) and for three native genes (black). E) Histogram of RNA abundance (as estimated from TPM using RNA-seq (Kroner et al., 2018)) per DNA abundance (as estimated from DNA copy number as in A) for each annotated gene in E. coli. The three native genes that were measured by qPCR in D are indicated.
Table S4 lists broad regions with the most extreme transcriptional propensity per cell (after transformation by genomic DNA copy number estimates), which may be useful information for heterologous gene expression from genomic sites.
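As a rough illustration of the dosage transformation, the sketch below converts RNA per DNA into a per-cell quantity by multiplying by a smoothed, mean-normalized DNA dosage. It makes two labeled assumptions: a circular moving average stands in for the spline used in the study, and normalization to the genome-wide mean dosage is chosen only for convenience. The function names are hypothetical.

```python
def moving_average(values, half_window):
    """Stand-in smoother (the study used a spline): mean over a circular window."""
    n = len(values)
    smoothed = []
    for i in range(n):
        window = [values[(i + j) % n] for j in range(-half_window, half_window + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed

def propensity_per_cell(rna_per_dna, dna_counts, half_window=1):
    """Scale RNA per unit DNA by relative DNA dosage at each genomic bin."""
    dosage = moving_average(dna_counts, half_window)
    mean_dosage = sum(dosage) / len(dosage)
    # Assumption: per-cell signal = (RNA per DNA) x (local dosage / mean dosage).
    return [r * d / mean_dosage for r, d in zip(rna_per_dna, dosage)]
```

Under this convention, two bins with equal RNA per DNA but a threefold difference in DNA dosage differ threefold in per-cell signal, which is the origin-to-terminus dosage effect that the transformation is meant to capture.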
Reporter integration by Tn5 is biased toward low transcriptional propensity regions
The correlation between reporter integration density and known genomic features was also tested (Fig. S6). H-NS binding had a very strong positive correlation with integration density (Fig. S6F). In addition, RNA abundance from native transcription showed the strongest negative correlation with integration density. Consistent with these two observations, transcriptional propensity itself was also negatively correlated with integration density. Although integration density is generally high, these results indicate that the resolution in low transcriptional propensity regions is generally better than the resolution at high transcriptional propensity regions, and also suggest that the same biological mechanisms responsible for shaping low transcriptional propensity regions also tend to occur in an environment more permissive for transposon insertion. It is important to note that as the integration densities arise from libraries that have undergone growth and antibiotic selection, some bias may arise from exclusion of essential genes or those genes that cause severe growth phenotypes upon transposon insertion.
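The rank correlations referred to here can be reproduced in outline with a short, dependency-free Spearman calculation. This is a sketch and not the analysis code used in the study; the function names are hypothetical and any inputs are placeholders for per-window integration density and a genomic feature such as H-NS occupancy.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the result is insensitive to the heavy-tailed distributions typical of occupancy and count data.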
Discussion
Random integration of barcoded reporters in the E. coli genome has allowed us to map transcriptional propensity at an unprecedented resolution across the genome. Previously, a reporter has been integrated into 27,000 sites in parallel in mouse embryonic stem cells using piggyBac transposition (Akhtar et al., 2013). The average resolution of one integration per 100 kbp revealed a stronger association of low transcription with lamina-associated domains than with repressive H3K9me2 histone modification. To our knowledge, as many as 38 sites have previously been tested in a single study for position-dependent expression variation in bacteria, which, due to the small 4.22 Mb Bacillus subtilis genome size, is a similar resolution to the mouse genome study described above (Jeong et al., 2018). Here, we used Tn5 transposition to integrate and track 144,000 barcoded reporters into the 4.6 Mb E. coli genome, to produce a map with an average resolution of one integration per 47 bp, the highest resolution gene-independent expression map for any species to date that we are aware of. This integration density uniquely allowed testing of reporter transcription from multiple sites within genomic neighborhoods with rare and distinct features (e.g., ribosomal RNA operon regions, extreme nucleotide content).
Considered over the entire genome, the fold-change between the highest and lowest transcriptional propensity locations is 272-fold, which is on the order of the fold change from different sites reported from reporter fluorescence at a small number of integrations (Bryant et al., 2014). These calculated propensities represent the value for a rolling median over 500 bp windows that required at least three independent integrations, thus avoiding strong influences of single outliers. It is also important to note that most of the genome shows intermediate levels of transcriptional propensity, with 99% of sites contained within a 4.2-fold range centered upon the median (using our standard window averaging). The full range of observed propensities arises due to substantially higher values at biologically important sites such as rrn operons, and dramatically lower values in silenced regions such as some EPODs; we consider the biological meanings of both extremes in detail below.
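The window-averaging rule mentioned above (a median over 500 bp windows, reported only when at least three independent integrations fall in the window) can be sketched as follows. The centered, half-open window and the handling of a linear rather than circular coordinate are assumptions of this sketch; the function name is hypothetical.

```python
import bisect
import statistics

def windowed_median(sites, centers, window=500, min_sites=3):
    """Median propensity of integrations within `window` bp around each center.

    sites   -- iterable of (position_bp, propensity) pairs, one per integration
    centers -- genomic positions at which to evaluate the rolling median
    Returns None for windows containing fewer than `min_sites` integrations.
    """
    sites = sorted(sites)
    positions = [p for p, _ in sites]
    medians = []
    for c in centers:
        lo = bisect.bisect_left(positions, c - window / 2)
        hi = bisect.bisect_left(positions, c + window / 2)
        values = [v for _, v in sites[lo:hi]]
        medians.append(statistics.median(values) if len(values) >= min_sites else None)
    return medians
```

With three integrations of propensity 1, 50 and 2 inside a window, the median is 2, so the single extreme value does not dominate; a window holding only one integration yields no estimate at all.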
Ribosomal RNA operons occur in broad regions of high transcriptional propensity
Several large peaks of transcriptional propensity across the genome are centered on rrn operons (Fig. 3A). The rrn operons are the most highly transcribed genes in the E. coli genome, with an average of one RNA polymerase molecule per 85 bp, compared to one every 10–20 kb for the rest of the genome (French and Miller, 1989; Paul et al., 2004). An rrn encoded on a plasmid can also physically relocate RNAP away from the nucleoid, which causes a decrease in growth rate (Cabrera and Jin, 2006), suggesting that the rrn genes themselves affect RNAP localization. With the exception of rrnC (which appears to have its physical location controlled by its proximity to the origin), the rrn operons also colocalize in the cell (Gaal et al., 2016). Regardless, we find that rrnC is also within a transcriptional propensity peak. Together, these findings suggest a model in which the very high concentrations of RNAP involved in active transcription of rrn occur in regions of increased transcriptional propensity, although we cannot yet determine whether the local propensity is a consequence of rrn transcription, or has evolved to facilitate it. In general, highly transcribed native E. coli genes are more frequently located in rrn-proximal regions with high transcriptional propensity (Fig. S4E), suggesting that both gene-specific regulation and genome organization evolve for specific expression outcomes.
Fis and H-NS are markers for activation and repression, respectively
Transcriptional propensity is highly correlated with Fis binding and anticorrelated with H-NS binding, which are by far the two strongest correlations for protein binding (Fig. 4) (Kahramanoglou et al., 2011). As two of the top five most abundant NAPs during exponential phase growth, they bind to and affect expression of many genes both directly and indirectly (Ali Azam et al., 1999; Azam and Ishihama, 1999; Cho et al., 2008; Kahramanoglou et al., 2011). In general, genes bound by H-NS are directly repressed, as shown by increased expression of bound genes in hns knockout strains. Fis-bound genes are typically more highly expressed. However, only 15% of Fis-bound genes are differentially expressed in a fis knockout strain during mid-exponential phase (Kahramanoglou et al., 2011). Our reporter is essentially identical at every integration site. Therefore, any mechanistic effects of H-NS or Fis on transcriptional propensity must act on the chromosomal integration neighborhood or region, rather than through specific binding to the promoter of the reporter itself.
It is conceivable that Fis activates reporter transcription through binding to local regions around integration sites in a similar manner to rrn transcriptional activation (Bokal et al., 1995; Hirvonen et al., 2001; Ross et al., 1990). Alternatively, regions of high Fis density may be activated through other mechanisms, such as spatial localization to regions of high RNA polymerase availability, or effects on supercoiling state and DNA conformation that promote transcriptional initiation. Consistent with these models, Fis binding is also anticorrelated with distance from the nearest rrn (Spearman ρ=−0.52).
The negative correlation of transcriptional propensity with identified binding sites of the NAP H-NS from ChIP-seq experiments is consistent with a silencing role for H-NS (Fig. 4B) (Kahramanoglou et al., 2011). H-NS can also oligomerize along DNA, in a process dependent on non-specific electrostatic interactions with DNA (Gao et al., 2017). Therefore, H-NS, and likely other proteins such as StpA and Hha (Boudreau et al., 2018), may oligomerize/bridge from silenced genomic regions into small integrated reporters and silence their transcription (Lang et al., 2007; Rangarajan and Schnetz, 2018). The size, promoter strength, AT content and other features of the reporter itself may play an important role in determining the particular transcriptional outcome at different sites due to conflict between H-NS and RNAP, as has been suggested (Landick et al., 2015); we reiterate, however, that for comparison among the barcoded reporters used in our study, we showed that variations in AT content of the barcode itself have no impact on observed transcriptional propensities (Figure S2A-C). We also find that reporter integration density is most highly and positively correlated to H-NS binding and low transcriptional propensity (both of which partially overlap with genomic AT content). Furthermore, integration density is anticorrelated with RNA abundance from native genes (Spearman ρ=−0.34). These results were surprising because the opposite occurs in eukaryotes, where low gene expression is well-correlated with heterochromatin that is inaccessible to Tn5 transposon insertion, a fact used to great effect in ATAC-seq assays (Buenrostro et al., 2013). Although this is only a single observation for the present study, it suggests a model in which foreign DNA may more readily integrate into H-NS-bound sites, thereby increasing the likelihood that integrated foreign DNA is silenced, as has been previously proposed (Dorman, 2014; Fang and Rimsky, 2008; Higashi et al., 2016).
Such a mechanism would also be consistent with the enrichment of prophages and mobile elements that we observed in low transcriptional propensity regions (Fig. 6). As opposed to horizontally transferred genes, which are often bound by and silenced by H-NS and are generally AT-rich in E. coli and closely related species (Lucchini et al., 2006; Navarre, 2006), the integration construct has 53.7% GC content. Additionally, barcode GC content has no correlation with genomic GC content in 500 bp surrounding each integration site (Spearman ρ=0.001), indicating that small changes in overall reporter GC content do not affect integration site. The exact mechanism by which H-NS influences integration and expression of foreign DNA remains an important but challenging subject for ongoing studies, due to partial functional redundancies and suppressive potential of other NAPs (Ali et al., 2014; Uyar et al., 2009). We also observed very low expression from reporters integrated into tsEPODs (Fig. S4A), supporting and greatly expanding on previous functional tests from a few sites (Bryant et al., 2014; Vora et al., 2009). We note that there may also be sites that were silenced to the extent that integrations could not be selected for in kanamycin-containing media and were thereby lost from the library.
Transcriptional propensity has minor differences depending on the reporter strand
In general, we observe no more than minor effects of neighboring transcription on transcriptional propensity (Table S2). These results may indicate that DNA supercoiling regulation is highly efficient on the chromosome, ameliorating supercoiling-mediated transcription conflicts. Additionally, the expression level for genes in various orientations used in previous studies is likely high compared to the global transcriptional activity considered for this analysis (note the relative RNA abundances in Fig. 7D, E) (Bryant et al., 2014; Yeung et al., 2017). Similarly, we did not observe a bias in transcriptional propensity depending on whether reporters were encoded on the leading or lagging strands, indicating that replication conflicts generally do not impose a major effect on transcriptional propensity. It is possible that global strand differences would be detectable in cells that are deficient in R-loop resolution, as has been reported for reporters and native genes in RNase HIII mutant B. subtilis cells (Lang et al., 2017), or in the presence of higher levels of transcription through our integrated reporter.
Functional classes of genes are enriched at specific transcriptional propensity levels
Clustering of genes involved in the same pathway is a hallmark of bacterial genome organization (Demerec and Hartman, 1959; Lawrence, 1999; Ochman et al., 2000). By definition, clustered genes will end up within the same transcriptional propensity region. For example, the large operon encoding genes for organic phosphonate catabolism is entirely contained in a region of very high transcriptional propensity. However, there are other classes of genes that are not clustered which are nonetheless significantly enriched at specific transcriptional propensity levels. For example, the GO term for cellular amino acid biosynthetic process (GO:0008652) is composed of over 100 genes, which are scattered throughout the E. coli genome in operons and as single genes, but are significantly enriched in high transcriptional propensity regions. For genes within a specific pathway, however, clustering for co-regulation or as a result of horizontal transfer also allows genes within a common pathway to reside in the same transcriptional propensity neighborhood, which may be another evolutionary strategy by which genes in the same pathway are expressed at optimal levels. Perhaps gene clustering within a transcriptional propensity region could be considered another method of co-regulation.
In considering the implications of our results, it is important to bear in mind that all experiments described here were performed on cells growing in rich media during early exponential phase. In all likelihood, growth phase dependent changes in NAP occupancy (Talukder and Ishihama, 2015), as well as (potentially) local regulation of transcriptional propensity across changing physiological conditions, may substantially alter the positional effects of transcription. Future mapping of transcriptional propensity under different growth conditions will be particularly interesting in light of the enrichment of specific gene classes involved in rapid growth that we found in transcriptional propensity levels observed during exponential growth in rich defined media. It is also important to consider the properties of the reporter construct itself. Although design and analysis choices were made to optimize the collection of detectable signals while simultaneously minimizing the effect of the reporter on the underlying biology, there is a large diversity of gene organization in the E. coli genome, which may be differently affected by position depending on the physiological condition of the cell. To that end, future studies may elucidate how different gene architectures are affected by position for each cell in a population as opposed to the population averages reported here. Our findings also provide a roadmap for how chromosomal positioning can be utilized to add another layer of regulatory tuning to control expression of chromosomally integrated heterologous pathways, and potentially will enable the design of dedicated integration platforms to target particular expression levels (See Table S4). Future investigation into condition-dependent changes in transcriptional propensities at different genomic regions will be essential to realizing the full potential of this regulatory tool for synthetic biology applications.
Taken together, our results reveal the presence of regional variations in the transcriptional propensity of an identical construct integrated into different regions of the E. coli chromosome. Both extremes of transcriptional propensity appear to have functional significance: ribosomal RNA operons and important biosynthetic operons are disproportionately located in regions of high transcriptional propensity, whereas mobile genetic elements and prophages are located in regions of low transcriptional propensity. We have also elucidated several mechanistic details determining transcriptional propensity: regions of low transcriptional propensity are characterized by high levels of H-NS occupancy, high overall protein occupancy, and high AT content, whereas regions of high transcriptional propensity are characterized instead by higher binding of the nucleoid-associated proteins Fis and HU. The fact that high local levels of one nucleoid protein or another in adjacent regions of the chromosome can so profoundly impact the transcription of a uniform reporter suggests a functional compartmentalization in the bacterial chromosome akin to the euchromatin/heterochromatin distinction in eukaryotes, where active and silenced genes are characterized by the binding state and epigenetic marks of histone proteins (Kouzarides, 2007) and the three-dimensional structure of the chromosome itself (Lanctôt et al., 2007). We expect that future work will both more fully explore any additional molecular details giving rise to these distinctions in bacteria and determine the role played by position-dependent transcriptional propensity in gene regulation and evolution.
STAR METHODS
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Peter Freddolino (petefred@umich.edu).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
MG1655 (CGSC 7740) was obtained from the Coli Genetic Stock Center (CGSC, Yale) (Blattner, 1997). We used P1 vir transduction to introduce the Z1 cassette from MG1655 Z1 malE, a gift from Keith Tyo (Addgene plasmid # 65915), into MG1655. The TetR repressor itself is integrated at the attB site (genomic coordinates 807,328–807,342); as this is a region of high transcriptional propensity, the variations that we observe cannot be attributed to simple proximity to the site of repressor production. The MG1655 Z1 strain was then transformed with the lambda red plasmid pSIM5 (gift from Prof. Don Court). We then used the primers BT1promCh F and BT1promCh R to amplify the mCherry and ampicillin resistance cassette from pBT1-proD-mCherry, a gift from Michael Lynch (Addgene plasmid # 65823). The mCherry cassette was then integrated into a site directly downstream from yihG using lambda red recombination to produce ecSAS17 (MG1655 Z1 mCherry+ AmpR). We confirmed the mCherry integration by genotyping and the transduction of the Z1 cassette by observing TetR-mediated repression of mNG compared to a blank MG1655 strain. ecSAS17 was then transformed with the pBAD-Flp plasmid (see below) to provide the starting strain for library generation.
METHOD DETAILS
Reporter construct design
The mNeonGreen (mNG) coding sequence was obtained through license from Allele Biotech (Shaner et al., 2013). We put mNG under control of the TetO1 promoter and the B0030 ribosome binding site, which is predicted to have a 30-fold lower translation initiation rate than the highest rate of a native gene in E. coli (Espah Borujeni et al., 2014; Kosuri et al., 2013). Upstream of the mNG cassette, an FRT-flanked kanamycin resistance cassette amplified from a Keio collection strain was introduced in the divergent orientation relative to mNG (Baba et al., 2006). Directly downstream of the mNG coding sequence, we introduced an Illumina i5 adapter primer complement sequence and an AscI recognition site for later barcoding of the integration construct. The reporter and antibiotic cassettes are flanked by the strong bidirectional terminators L3S2P21 and ECK120026481 (Chen et al., 2013). Finally, the entire cassette is flanked by mosaic ends (MEs) to allow for binding to Tn5 transposase. The ME-flanked construct was modified to remove two PvuII restriction sites in order to allow for PvuII digestion of the plasmid pSAS31 and release of the integration construct for Tn5 transposase binding in vitro. The full annotated pSAS31 sequence, with the exception of the mNeonGreen CDS, can be found in Data S1.
Large-scale plasmid barcoding
pSAS31 was digested with the restriction enzyme AscI (NEB Cat#R0558). Primers were used to introduce the barcode and amplify the entire plasmid by PCR (Fig. S7). The reverse primer includes one base that is either an A or T directly 5’ of the annealing sequence. Fourteen hand-mixed random nucleotides followed by an AscI site are directly 5’ of the A/T (Integrated DNA Technologies) (Table S6). Six μg of resulting fragment was digested by DpnI (NEB Cat# R0176) and AscI. The digested DNA was column purified and then ligated in a 2.4 mL reaction with 40 μl T4 ligase overnight at 14°C (Invitrogen Cat# 15224017). The reaction was quenched with 40 μl of 0.5 M EDTA. We then scaled up the Hanahan procedure to transform chemically competent cells with the ligated plasmid (Hanahan et al., 1991). Cells were recovered in SOC for one hour at 37°C before removing an aliquot for transformation efficiency counts and adding 50 μg/mL kanamycin for 8h liquid selection at 37°C. Cells were then pelleted for 7 minutes at 4600 x g and snap frozen in liquid nitrogen. To obtain the plasmid, snap-frozen cells were resuspended in lysis buffer for plasmid miniprep. By colony counts we estimate that 48.55 million cells were uniquely transformed with a barcoded plasmid, with a transformation efficiency of approximately one in 4,500 cells.
pBAD-Flp plasmid construction
Upon initial attempts at library construction, the pCP20 plasmid (Cherepanov and Wackernagel, 1995) caused over 90% of cells with an FRT-flanked kanamycin resistance cassette to lose resistance even at the non-inducing 28°C temperature, presumably due to leaky expression of Flp recombinase (data not shown). Since leaky expression of Flp recombinase from the pCP20 plasmid appeared to be severely reducing transposon integration efficiency, probably due to the removal of the KanR cassette soon after integration and prior to liquid-phase selection, we replaced the PR temperature sensitive promoter on pCP20 with the arabinose-inducible promoter pBAD and repressor araC gene. The modified pBAD-Flp plasmid did not cause detectable loss of the KanR cassette under repressing conditions, yet still allowed efficient excision upon arabinose induction (data not shown). The full sequence of pBAD-Flp can be found in Data S2.
Tn5 integration of barcoded reporter constructs
To generate stable transposomes for electroporation into our target strain, we utilized the Epicentre EZ-Tn5 custom transposome construction kit following the manufacturer’s instructions. In brief, barcoded pSAS31 plasmid was digested with PvuII (NEB Cat#R0151) for one hour at 37°C and fragments were separated on a 0.8% agarose gel. The band corresponding to the integration fragment size was excised from the gel and purified (Qiagen Cat# 28706). Two μl of 200 ng/μl fragment was then incubated with 2 μl Tn5 transposase and 1 μl glycerol, according to the manufacturer’s instructions. After 30 minutes of incubation at room temperature, the mixture was stored at −20°C. Electrocompetent cells were prepared using ecSAS17 with 34 μg/mL chloramphenicol included in the growth medium in order to maintain the pBAD-Flp recombinase plasmid. One μl of the Tn5-DNA complex was mixed with 50 μl of fresh electrocompetent cells. Four separate electroporations were carried out in 2 cm electroporation cuvettes at 2500kV and immediately resuspended in 1 mL of 30°C SOC medium. The reactions were pooled into SOC medium including 34 μg/mL chloramphenicol and incubated at 30°C for 1.5 hours. An aliquot for plating on selective plates to assess integrant counts was removed from the recovery medium before adding 50 μg/mL kanamycin. Liquid selection proceeded for 16 hrs at 30°C. After liquid selection, all cells were pelleted at 4600 x g for 7 minutes. Cells were then resuspended in 30 mL 15% glycerol, pipetted into 30 1 mL aliquots and snap frozen in a dry-ice ethanol bath before storage of the transposon library at −80°C (the entire transformation procedure is adapted from Girgis et al., 2007).
According to colony forming unit counts from plating after recovery, 609,000 cells were uniquely transformed and maintained pBAD-Flp, as indicated by resistance to kanamycin and chloramphenicol, corresponding to approximately one in 5,600 cells integrated with a reporter; thus, the odds of dual integration in a single cell are exceedingly small, and we have never observed such an event in transposon footprinting experiments performed on single colonies (data not shown).
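The reasoning behind the low odds of dual integration can be made explicit with a back-of-the-envelope calculation. The sketch below assumes that integration events are independent and Poisson-distributed across cells, which is a simplifying assumption of this illustration and was not tested in the study; the variable names are hypothetical.

```python
import math

integrants = 609_000            # uniquely transformed cells (from CFU counts)
fraction_integrated = 1 / 5_600  # approximately one in 5,600 cells carries a reporter

# Implied number of viable cells recovered after electroporation.
cells_recovered = integrants / fraction_integrated

# Poisson mean chosen so that P(at least one integration) equals the observed fraction.
lam = -math.log(1 - fraction_integrated)
p_at_least_one = 1 - math.exp(-lam)
p_at_least_two = 1 - math.exp(-lam) * (1 + lam)

# Probability that a cell carrying a reporter actually carries two or more.
p_dual_given_integrated = p_at_least_two / p_at_least_one
```

Under these assumptions, `p_dual_given_integrated` is close to half the per-cell integration rate, i.e. on the order of one in ten thousand integrants, consistent with dual integrations not being seen when footprinting single colonies.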
Pairing integration site with barcode via transposon footprinting
Cells from one aliquot of the transposon library were recovered in 5 mL SOC for 30 minutes at 30°C with shaking. Genomic DNA was isolated from the library using the Qiagen Blood and Tissue kit for Gram negative bacteria. 1 μg of the resulting DNA was digested separately with each of CviAII (NEB Cat#0640) and CviQI (NEB Cat#R0639) restriction enzymes (each has a different 4 bp cut site but leaves compatible overhangs; the use of both enzymes prevents inability to identify footprints in the rare event when a restriction site is close to the transposon insertion). An annealed Y-linker (final concentration of 10 pM of each CviQI-YTA3 with CviQI-YTA5 or CviAII-YAT3 with CviAII-YAT5, Table S6) that complements the overhangs was ligated to the digested DNA fragments with T4 DNA ligase (Invitrogen Cat# 15224017) for 10 minutes. The reaction was quenched with 1 μl 0.5 M EDTA. The DNA from the ligation mix was purified with Axygen AxyPrep Mag PCR cleanup beads at a 0.9:1 bead to DNA ratio to remove unligated Y-linker. The resulting DNA was amplified by PCR using the primers that bind within the transposon and on the Y-linker to amplify transposon-genomic DNA specific fragments (P9, P10, Table S6). NEBnext dual index primers (NEB Cat#E7600) were then used to add sequencing adapters by PCR with Q5 Hot Start polymerase (NEB Cat#M0493). Sequencing preparation was completed in parallel for the CviQI and CviAII cleaved samples, and they were then combined computationally during postprocessing by concatenating the resulting reads.
Full-scale genome profiling procedure
For each biological replicate, a single aliquot of the cryopreserved transposon library was scraped into 1 mL of M9-EZrich medium (NH4Cl 1 g/L, KH2PO4 3 g/L, NaCl 0.5 g/L, Na2HPO4 6 g/L, MgSO4 240.7 mg/L, ferric citrate 2.45 mg/L, CaCl2 111 ng/L, 200 mL/L 5x Supplement EZ (Teknova cat # M2008), 1 mL/L micronutrient solution (Neidhardt et al., 1974)) and diluted into 50 mL of M9-EZrich with 1% Arabinose + 0.4% glycerol + 34 μg/mL chloramphenicol in a baffled 125mL flask to achieve OD600 (optical density at 600 nm) of 0.0031. Micronutrient solution is composed of 0.3 μM Ammonium heptamolybdate, 400 μM boric acid, 30 μM cobalt (ii) chloride, 10 μM copper (ii) sulfate, 80 μM manganese (ii) chloride, 10 μM zinc sulfate (Neidhardt et al., 1974). The flask was incubated at 30°C for 8 hours with shaking at 225 rpm to allow Flp recombinase to excise the kanamycin resistance cassette. Cells were then pelleted at 4600 x g for 7 minutes and resuspended in 25 mL PBS. In parallel, an aliquot of the culture was diluted and plated on LB-kanamycin and LB plates to determine the fraction of cells that permanently lost kanamycin resistance (>92.5%). As assessed by qPCR, 0.1% and 1.4% of kanR remained in replicates one and two, respectively, relative to the amount of kanR in the library where Flp recombinase was not induced (mNG primers P34 and P35 were used to normalize library DNA concentrations with and without Flp induction). See below for analysis and interpretation regarding the low rate of kanR retention.
After the pre-growth and kanR excision described above, cells were pelleted again and resuspended in 10 mL M9RDM. Cells were then diluted into 100 mL of M9RDM (Glucose, 4 g/L, NH4Cl 1 g/L, KH2PO4 3 g/L, NaCl 0.5 g/L, Na2HPO4 6 g/L, MgSO4 240.7 mg/L, ferric citrate 2.45 mg/L, CaCl2 111 ng/L, 200 mL/L 5x Supplement EZ, 100 mL/L 10X ACGU (Teknova cat # M2103), 1 mL/L micronutrient solution). Then a final concentration of 100 μg/L anhydrotetracycline (aTc) was added to 0.0031 OD600 cells. The culture was incubated at 37°C until an OD600 of 0.2 was reached (about 3.5 hours) to allow induction of the transposon-borne reporter construct. The entire flask was then immediately transferred to an ice-slurry bath. Three aliquots of 5 mL were then pelleted at 6600 x g for 3 minutes and snap-frozen in a dry-ice ethanol bath to allow harvest of genomic DNA. In parallel, three additional 5 mL aliquots of the culture were each rapidly mixed with 10 mL RNAProtect Bacteria Reagent (Qiagen) and frozen according to the manufacturer’s instructions to allow harvest of RNA from matched samples of the growing library. All samples were then stored at −80°C.
Nucleic acid processing and sequencing
Genomic DNA (gDNA) from harvested samples was extracted following the Qiagen Blood and Tissue kit instructions. 1 μg of gDNA was then digested for 1 hour with CviQI. The resulting DNA was purified with a PCR cleanup kit and eluted into 0.1x TE. The DNA was then amplified with primers P9 and P11 flanking the barcode for eight cycles using Q5 polymerase, resulting in a 186 bp fragment (Fig. S7 and Table S6). The DNA from the PCR mix was purified with Axygen AxyPrep Mag PCR cleanup beads at a 1.8:1 bead-to-DNA ratio to remove unincorporated primers.
RNA from the exponentially growing cells was extracted following the Qiagen RNeasy Bacterial RNA protect protocol including on-column DNaseI treatment. 1 μg of the resulting RNA and a single reverse primer (P11) were used for first strand synthesis with the NEB Protoscript II First Strand cDNA kit using the manufacturer’s instructions, and the resulting cDNA was stored at −20°C. No-polymerase controls (-RT) were included. 20 μl of the gDNA or 5 μl of cDNA reaction mixture was used for a 50 μl minimal-cycle PCR amplification using NEB Q5 hotstart polymerase, following the manufacturer’s instructions with the following modifications: NEB i5 or i7 primers were used to add Illumina adapter sequences, and EvaGreen dsDNA dye was added to each reaction to a final 1x concentration. 10 μl of each reaction (including -RT controls) was then monitored for qPCR fluorescence signal during PCR amplification. The remaining 40 μl of each reaction was then amplified with the number of PCR cycles corresponding to 25% of the maximum fluorescence observed in the 10 μl qPCR pilot reaction. We verified that the cycle thresholds for the -RT cDNA controls were at least 7 cycles greater than those of the standard cDNA samples (indicating background from DNA contamination of less than 1%). Each 40 μl PCR reaction was then purified with 90 μl of Axygen MAG-S1 beads and eluted in 0.1x TE. The purified, prepared DNA library was submitted to the University of Michigan sequencing core for sequencing on an Illumina NextSeq 550.
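The "less than 1%" bound quoted for the -RT controls follows from the cycle-threshold gap if one assumes that product roughly doubles every cycle. The sketch below makes that arithmetic explicit; the function name and the perfect-doubling default are illustrative assumptions, not part of the original protocol.

```python
def background_fraction(delta_ct, amplification_factor=2.0):
    """Upper bound on the fraction of signal attributable to contaminating DNA.

    delta_ct: how many cycles later the -RT control crosses threshold
              than the matched +RT sample.
    amplification_factor: per-cycle fold increase (2.0 = perfect doubling,
              an assumption; real reactions are usually slightly lower).
    """
    return amplification_factor ** (-delta_ct)


# A 7-cycle gap corresponds to 1/128 of the signal, i.e. about 0.78%.
print(f"{background_fraction(7):.2%}")
```

With a less efficient reaction (for example a factor of 1.9 per cycle) the same 7-cycle gap gives a slightly larger bound, so the 1% statement should be read as approximate.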
Construction and testing of targeted reporter integrations
The ecSAS17 strain was transformed with the lambda Red plasmid pSIM5 (gift from Prof. Don Court). The reporter construct was amplified with primers with 37–40 bp 5’ flanks to introduce homology domains for each integration site (Table S6). Each reporter DNA fragment was purified and digested with DpnI to remove the pSAS31 template plasmid. Integration constructs were then electroporated into the ecSAS17 + pSIM5 strain and plated on LB-kanamycin agar. Integration strain candidate colonies were streaked out and grown overnight at 37°C. A single colony from each candidate streak was then grown overnight at 37°C in LB broth in order to eliminate the temperature-sensitive pSIM5 plasmid. The resulting strain was then transformed with pCP20 and selected on ampicillin at 30°C in order to cure the KanR cassette. Single colonies were streaked out on LB ampicillin plates and grown overnight at 30°C. The resulting single colonies were then spotted onto both LB agar and LB-kanamycin agar in order to confirm loss of kanR. The strains showing sensitivity to kanamycin were then checked for integration using primers from each side of the chromosomal integration site and primers P34 and P35 within mNG (Table S6). PCR reactions that produced bands of the expected size were purified and sent to the University of Michigan Sequencing Core for Sanger sequencing to confirm the integration site and sequence. Confirmed strains were grown in LB broth at 37°C overnight and cryopreserved indefinitely.
For qPCR analysis of mNG transcript level in targeted integration strains, cells were grown overnight in LB broth at 37°C. The strains were then diluted 1:100 into M9RDM and grown for two hours at 37°C. After the pre-growth, the cells were diluted to a final concentration of 0.0031 OD600 cells in fresh M9RDM including 100 μg/L aTc. The culture was incubated at 37°C until an OD600 of 0.2 was reached (about 3.5 hours) to allow induction of the reporter construct. The entire flask was then immediately transferred to an ice-slurry bath. Two aliquots of 2 mL were then pelleted at 4600 x g for 6 minutes at 4°C and snap-frozen in a dry-ice ethanol bath for later harvest of genomic DNA. In parallel, two additional 650 μl aliquots of the culture were rapidly mixed with 1.3 mL RNAProtect Bacteria Reagent (Qiagen) and frozen according to the manufacturer’s instructions to allow harvest of RNA from matched samples of the growing culture. All samples were then stored at −80°C. The procedure was performed in its entirety on three separate days. Purified RNA was converted to cDNA using the standard protocol for NEB Protoscript II using random hexamers. cDNA and genomic DNA were quantified by qPCR with primers P46–P51 using iTaq™ Universal SYBR® Green Supermix. Cycle thresholds for opgG were used to normalize loading of DNA and cDNA for all other primer sets (Heng et al., 2011).
For growth rate analysis, the same pre-growth procedure as above was performed. Next, cells from the pre-growth were diluted to 0.0031 OD600 in fresh M9RDM with and without aTc and 150 μl was added into a clear-bottom, black-walled microplate in triplicate. Each culture was covered with 100 μl of sterile mineral oil. The microplate was shaken at 37°C and monitored every 10 minutes for OD600 in order to derive the doublings per hour. The procedure was completed in its entirety on three separate days.
QUANTIFICATION AND STATISTICAL ANALYSIS
Footprinting positions of insertions on the genome
Sequencing results returned from Illumina NextSeq 550 had sequencing depths of 33,197,291 and 39,030,663 for RNA barcodes, 13,260,924 and 14,763,447 for DNA barcodes, and 5,604,268 and 5,622,546 paired-end reads for kanR retention samples, for each replicate. Barcodes from inserted reporter constructs in the genome are included at the 3’ end of the mNG transcript. The footprinting process sequences both the barcodes and genomic sequences after the insertion site obtained from fragmented genomic DNA. Barcodes and genomic region sequences were extracted from the obtained Read 1 sequences, using Cutadapt 1.8.1 (Martin, 2011) to remove a fixed length of leading or trailing sequences and to remove construct sequences. Only barcodes and genomic region sequences from reads with an identifiable construct sequence were extracted (Fig. S7).
The extracted barcodes from the two sequencing runs for both CviAII and CviQI (four samples in total) were pooled into a single barcode read pool for further analysis, and the genomic region sequences were similarly treated. Pooled barcodes were filtered to remove any barcodes with any base of quality score below 30. The filtering survival rate for the barcodes was 65.96% (68,933,499 out of 104,513,169 reads). In parallel with barcode filtering, pooled genomic region sequences were trimmed by quality using Trimmomatic 0.33, removing trailing bases with quality scores below 3, any sliding window of 4 bases that had average quality score below 15, and keeping reads with a minimum length of 20 bases. Quality trimming survival rate for the pooled genomic region sequences was 74.74% (78,114,467 out of 104,513,169). Only reads which passed both filtering steps noted above were included in alignments to the genome.
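The barcode filter described above (discard a barcode if any base falls below quality 30) can be sketched as follows. This is a minimal illustration that assumes Phred+33 encoded quality strings; the function names are hypothetical and the original pipeline's implementation is not specified here.

```python
def passes_quality(quality_string, min_quality=30, phred_offset=33):
    """Return True if every base in the read has Phred quality >= min_quality.

    Assumes Phred+33 encoding: quality = ord(character) - 33.
    """
    return all(ord(ch) - phred_offset >= min_quality for ch in quality_string)


def filter_barcodes(records, min_quality=30):
    """Keep (barcode, quality_string) pairs in which no base is below min_quality."""
    return [(bc, q) for bc, q in records if passes_quality(q, min_quality)]


# '?' encodes Q30 and 'I' encodes Q40 under Phred+33; '5' encodes Q20.
example = [("ACGT", "III?"), ("TTGA", "II5I")]
print(filter_barcodes(example))  # only the first record survives
```

Because a single low-quality base removes the whole barcode, this filter is stricter than the sliding-window trimming applied to the genomic region sequences.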
For alignment, the reference was built using sequences from MG1655 genome (U00096.3), pBAD-Flp, pSAS31, and pBT1-proD-mCherry sequences. The alignment of the extracted genomic sequences to the reference was performed using Bowtie2 (2.1.0), under the “very sensitive” preset. Alignment rate was 58.17% (45,437,982 out of 78,114,467). The query read names, 5’ aligned positions, and strandness information were extracted to match the transcriptional propensity data.
The Y-linkers used in footprinting incorporated a random 4 bp unique molecular identifier (UMI), which was then observed in Read 2 of footprinting data. Cutadapt was used to cut construct sequences as anchored 5’ adapters, allowing no indels and discarding uncut sequences. The trailing sequences of construct were removed from the remainder sequences to retrieve UMI sequences. UMIs, barcode sequences, and insertion locations were matched to identify the corresponding insertion location and UMI count for each barcode, keeping only entries with all three types of information.
Tables of barcodes, UMIs, and genome positions were first deduplicated to keep only unique records of every combination of all three sources of information. Barcodes with multiple insertion positions were removed. The barcode-position pairs were then supplemented by the counts of unique corresponding UMIs. As a result, each barcode sequence was mapped to its unique insertion position on the genome and its UMI count. Combined, these data allowed mapping of transcriptional propensity for each barcode onto the E. coli genome. In total, 355,314 barcodes were mapped to the E. coli genome (excluding sequences derived from plasmids), of which 184,575 were supported by at least 2 different UMIs; only the latter category of locations was included in downstream analysis.
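The deduplication and UMI-counting logic in this paragraph can be illustrated with a short sketch. The record layout (one `(barcode, umi, position)` tuple per read) and the function name are assumptions made for illustration only.

```python
from collections import defaultdict


def map_barcodes(records, min_umis=2):
    """Map each barcode to a single insertion position and its unique-UMI count.

    records: iterable of (barcode, umi, position) tuples, one per read.
    Returns {barcode: (position, n_unique_umis)} for barcodes that map to
    exactly one position and are supported by at least min_umis distinct UMIs.
    """
    unique = set(records)  # collapse identical (barcode, UMI, position) triples
    positions = defaultdict(set)
    umis = defaultdict(set)
    for barcode, umi, position in unique:
        positions[barcode].add(position)
        umis[barcode].add(umi)

    mapped = {}
    for barcode, pos_set in positions.items():
        if len(pos_set) != 1:
            continue  # barcode seen at multiple insertion positions: removed
        n_umis = len(umis[barcode])
        if n_umis >= min_umis:
            mapped[barcode] = (next(iter(pos_set)), n_umis)
    return mapped


reads = [
    ("AAA", "u1", 100), ("AAA", "u1", 100), ("AAA", "u2", 100),  # kept
    ("CCC", "u1", 5), ("CCC", "u2", 9),                          # two positions
    ("GGG", "u1", 7),                                            # one UMI only
]
print(map_barcodes(reads))
```

Requiring two or more distinct UMIs means that each retained barcode-position pair was observed from at least two independent ligation events, which guards against pairs created by a single chimeric molecule.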
For each genomic position identified above, we used the number of integration sites falling into a 1-kb window (500 bases on each side) to describe the integration density across genome (Figure S1E). We investigated the percentage of genomic positions with at least a certain number of integration sites within the window, for all positions on genome. Integration sites included all integrations identified and mapped in footprinting that were on the genome. The geometric mean and median were calculated including sites with no integrations in the windows (that is, across the entire genome).
Quantitation and mapping of transcriptional propensity to integration sites
To retrieve the barcode sequences from RNA or DNA sequencing of the barcodes themselves, all Read 1 sequences containing the barcodes were processed using Cutadapt, with part of the construct and primer sequence removed as anchored 3’ adapters (Fig. S7). The construct cutting process allowed no indels and discarded any reads that were not cut. The counts of barcodes were measures of abundances of barcodes in RNA (as cDNA libraries) and DNA (as gDNA libraries) samples. Ambiguous barcodes were removed.
The barcode abundances were mapped to insertion positions by barcode sequences, keeping only barcodes that had both at least one count in either RNA or DNA samples, and had a mapped location on the genome from the footprinting data. The barcodes were further filtered to require at least two different UMIs. A total of 140,292 barcodes, mapping to 98,034 locations, passed all filters on the merge, and were thus included in the transcriptional propensity calculations described in the text.
Quantitation of knock-out and growth rate effects of insertions
“Knock-out effects” refer to the situation where the insertion of a barcoded reporter in or near a gene could disrupt that gene’s function. To identify potential knock-out effects, for each gene annotation (NCBI annotation for U00096.3), four statistics were calculated: the median RNA/DNA ratio within the windows covering the 500 base pairs upstream of the gene, the 500 base pairs downstream, the first 500 base pairs, and the last 500 base pairs of the gene. All calculations of medians required a minimum of 10 insertions in the window. For genes with length between 500 and 1000 base pairs, the two windows within the gene were allowed to overlap. Genes with length less than 500 base pairs were filtered out.
In addition to gene-specific knockout effects, we also considered the more general possibility that insertions with a strong impact on growth rate might affect the observed transcriptional propensity, due to alteration of the dilution rate of the transcript of interest. We have observed in low-throughput experiments that the impact of our reporter insertions on growth rate is very low, even for insertion locations with large differences in transcriptional propensity (Fig. S1F), thus immediately arguing against growth rate effects as a major confounder of our observations. Nevertheless, we also assessed impacts of clonal growth rate on transcriptional propensity observations using the relative abundance of DNA barcodes, regardless of their position relative to genes or the transcription level from any reporter (Fig. S1G,H). To evaluate the potential effect of cell growth on transcriptional propensity, the ratio of DNA abundance after induction and growth (post-) to that before induction (pre-) was calculated. Briefly, in the pre-/post-induction experiment, the reporter library was grown under the same conditions as in the “Full-scale genome profiling procedure” and genomic DNA was collected before induction (pre-) and after induction and growth to OD600 0.2 (post-), then processed and sequenced using the same methods as in the “Nucleic acid processing and sequencing” section. To examine reproducibility between the pre-/post-induction experiment and the experiment for transcriptional propensity profiling, we visualized and quantified the correlation of counts of barcodes in common between the two sets of experiments. To reduce the effect of noise, we filtered barcodes by requiring a minimum of 10 or 100 counts for each barcode in each sample (Figure S1 F and H for a minimum requirement of 10 counts).
For the growth rate experiment, we examined the reproducibility under different levels of filtering by examining how well the barcode counts from the two replicates agreed with each other. More specifically, after filtering by a minimum requirement of 10 or 100 counts, we calculated the Spearman correlation between replicates for post-growth counts (ρ=0.17 when filtered by 10 and ρ=0.90 when filtered by 100), pre-induction counts (ρ=0.30 when filtered by 10 and ρ=0.89 when filtered by 100), and ratios of post-growth barcode counts over pre-growth counts (ρ=0.02 when filtered by 10 and ρ=0.37 when filtered by 100). The correlations were generally low, suggesting a low signal-to-noise ratio (consistent with our low-throughput observations that the insertions present in the library generally had no effect on growth rate, possibly because clones containing detrimental insertions were already selected out).
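The four per-gene windows can be written out explicitly. The sketch below uses inclusive 1-based coordinates and treats "upstream" and "first" relative to the gene's strand; the strand handling, the function names, and the coordinate convention are assumptions, since the text does not state them.

```python
from statistics import median


def knockout_windows(start, end, strand, size=500):
    """Return the four 500 bp windows for a gene, or None if the gene is too short.

    start, end: inclusive gene coordinates with start <= end.
    strand: '+' or '-'; upstream/first are defined relative to this strand.
    """
    if end - start + 1 < size:
        return None  # genes shorter than 500 bp were filtered out
    left_flank = (start - size, start - 1)
    left_inner = (start, start + size - 1)
    right_inner = (end - size + 1, end)
    right_flank = (end + 1, end + size)
    if strand == "+":
        return {"upstream": left_flank, "first": left_inner,
                "last": right_inner, "downstream": right_flank}
    return {"upstream": right_flank, "first": right_inner,
            "last": left_inner, "downstream": left_flank}


def window_median(window, insertions, min_insertions=10):
    """Median RNA/DNA ratio of insertions in the window, or None if too few.

    insertions: iterable of (position, rna_dna_ratio).
    """
    lo, hi = window
    ratios = [ratio for pos, ratio in insertions if lo <= pos <= hi]
    return median(ratios) if len(ratios) >= min_insertions else None


# A 700 bp gene: the first and last windows overlap, as allowed for 500-1000 bp genes.
print(knockout_windows(1000, 1699, "+"))
```

Comparing the "first" or "last" window against the "downstream" window on a log2 scale would then give a per-gene fold change of the kind tabulated in Table S5.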
To directly test whether relative growth rate had an effect on RNA/DNA ratios or transcriptional propensity, we applied the minimum count filter, then calculated the Spearman correlations of the log2(RNA/DNA) or log2(transcriptional propensity), respectively, with log2(post-/pre-induction). We required barcodes evaluated for growth-rate effects to be detected in all four samples in both experiments: growth-rate experiment with two replicates and transcriptional propensity profiling experiment with two replicates. With a requirement of minimum of 10 barcode counts from growth rate experiment replicates, 64,003 barcodes passed the filter; with a requirement of a minimum of 100 counts, 185 barcodes passed the filter.
Profiling genome-wide HU binding landscape
HU binding data was obtained from (Prieto et al., 2011), accessed via Accession Number SRP008538 in NCBI SRA database. HupA (SRR353962) and HupB (SRR353967) single-end ChIP-seq data were downloaded as sra data files, converted into single fastq files via sra-toolkit (2.8.2–1). Quality controls were performed using FastQC (version 0.10.1). Cutadapt was used in preset Illumina sequencing cutting mode to remove adapter sequences from reads. Trimmomatic was used to remove reads and read ends of low quality. The trimming parameters were Phred+33 scores, leading quality score 3, trailing 3 and sliding windows of length 4 and quality score 15, and a minimum read length of 20 bases, in single-end mode. Alignments were performed using Bowtie2, in very sensitive preset mode, to E. coli genome version U00096.3. For each position on genome, the coverages were defined as the counts of aligned templates of reads that spanned the position, calculated using an in-house Python script. The resulting coverages were divided by a spline-smoothed version of the same data to correct for origin-to-terminus effect of circular genome of E. coli (see below for details).
Estimation of gene-level transcriptional propensity for functional analysis
For each gene annotation, the log2 median value of the RNA/DNA ratio within the region of the gene’s open reading frame plus a flanking region of 2500 bases on each side of the gene was calculated as the gene-level transcriptional propensity for functional analyses. iPAGE analysis was performed using nine uniformly populated bins with dependency of GO terms (Fig. 6). Genes were also categorized by whether or not the coding products were recognized by SRP, according to the list of gene names provided in (Moffitt et al., 2016). Gene names in the list were mapped to b numbers based on gene annotation for genome version U00096.3. For gene names that corresponded to multiple b numbers, the b number with the gene name as its primary name was prioritized over b numbers with the gene name as a synonym. For genes with no matching primary gene name but multiple synonym matches, the smallest b number was used.
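The gene-level statistic defined at the start of this section can be sketched in a few lines. The code below takes log2 of the median ratio (rather than the median of log2 ratios, which can differ slightly for even numbers of insertions) and ignores wrap-around at the chromosome ends; both choices, and the function name, are assumptions for illustration.

```python
import math
from statistics import median


def gene_level_propensity(gene_start, gene_end, insertions, flank=2500):
    """log2 of the median RNA/DNA ratio over the ORF plus `flank` bp on each side.

    insertions: iterable of (position, rna_dna_ratio) with positive ratios.
    Returns None when no insertion falls in the region.
    """
    lo, hi = gene_start - flank, gene_end + flank
    ratios = [ratio for pos, ratio in insertions if lo <= pos <= hi]
    if not ratios:
        return None
    return math.log2(median(ratios))


# Region is 500..6500; the insertion at 7000 lies outside it.
insertions = [(600, 1.0), (3500, 4.0), (6400, 16.0), (7000, 100.0)]
print(gene_level_propensity(3000, 4000, insertions))  # log2(4.0) = 2.0
```

The 2500 bp flanks mean that neighbouring genes share much of their supporting data, so gene-level values are smooth along the chromosome by construction.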
Autocorrelation
To compute the autocorrelation in transcriptional propensity across the genome, we estimated the Spearman correlation coefficient between pairs of loci separated by different base pair lags (ranging from 1 to 200,000 base pairs). For each base pair distance, we first created two lists representing expression levels at pairs of loci separated by a distance equal to the lag. We then computed the Spearman correlation coefficient between these two lists. If a given locus had multiple raw transcriptional propensity values (without median windowing), we took the median for all reporters at that coordinate. To compute the null distribution for the autocorrelation we used a white noise process with N samples where N is the total number of unique insertion locations (98,034) (The MathWorks, Inc., 2018).
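The lagged Spearman calculation described above can be sketched with the standard library alone. The example assumes 0-based coordinates, one (median) value per locus, and wrap-around pairing on the circular chromosome; the wrap-around and all function names are illustrative assumptions.

```python
from statistics import mean


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) < 2:
        return float("nan")
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def lag_autocorrelation(propensity_by_pos, lag, genome_length):
    """Spearman correlation between loci separated by exactly `lag` bp.

    propensity_by_pos: {0-based coordinate: propensity}, one value per locus.
    """
    first, second = [], []
    for pos, value in propensity_by_pos.items():
        partner = (pos + lag) % genome_length  # circular chromosome (assumed)
        if partner in propensity_by_pos:
            first.append(value)
            second.append(propensity_by_pos[partner])
    return spearman(first, second)


toy = {0: 1.0, 10: 2.0, 20: 3.0, 30: 4.0}
print(lag_autocorrelation(toy, lag=10, genome_length=40))
```

Repeating the call over every lag from 1 to 200,000 bp gives the autocorrelation curve; at large lags only a small fraction of loci have a partner at exactly that distance, so the number of pairs per lag is worth tracking alongside the coefficient.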
Discovery of periodic signals in transcriptional propensity
To detect periodic signals in the transcriptional propensity signal we used the astropy implementation (The Astropy Collaboration et al., 2013) of the Lomb-Scargle algorithm (Lomb, 1976; Scargle, 1982) to perform spectral analysis for signals at all possible frequencies that could repeat between 1 and 100,000 times across the E. coli genome. Due to the circular nature of a bacterial genome, any periodic signal must repeat at some integer divisor of the full length of the bacterial chromosome and thus we restricted our analysis to only frequencies that are possible under this constraint (n.b. each period need not, however, consist of an integer number of base pairs). The Lomb-Scargle algorithm was designed for unevenly sampled linear time series data and cannot natively handle data coming from a strictly periodic series as is the case for our data. Therefore, in order to better detect low-frequency, long-period signals over the circular chromosome, we ran the Lomb-Scargle algorithm on the linear transcriptional propensity signal repeated, end-to-end, five times to simulate the circular genome. For all Lomb-Scargle calculations, the spectral power was normalized using the standard normalization based on data residuals around a constant reference model as described in the astropy documentation. In order to assess the statistical significance of the spectral power for the periods we observed, we repeated the Lomb-Scargle analysis on 1000 permuted transcriptional propensity signals generated by shuffling blocks of 2200 adjacent collection bins (~100,000 bp) of the original transcriptional propensity signal, and repeating the same shuffled signal end-to-end five times. Periods discovered in the original transcriptional propensity with a Lomb-Scargle power higher than the period with the highest Lomb-Scargle power in all 1000 permuted signals represent periodic signals discovered at a false discovery rate of < 1%.
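The parts of this procedure that surround the spectral estimate (the integer-repeat frequency grid, the five-fold tiling, and the block-shuffled null) can be sketched without any external library. The spectral estimate itself (for example via astropy's LombScargle class) is deliberately left out, and all function names below are hypothetical.

```python
import random


def allowed_frequencies(genome_length, max_repeats):
    """Frequencies (cycles per bp) that repeat an integer number of times per chromosome."""
    return [k / genome_length for k in range(1, max_repeats + 1)]


def tile(signal, copies=5):
    """Repeat the linear signal end-to-end to mimic the circular chromosome."""
    return list(signal) * copies


def block_shuffle(signal, block_size, rng):
    """Permute contiguous blocks of bins while preserving the order inside each block."""
    blocks = [signal[i:i + block_size] for i in range(0, len(signal), block_size)]
    rng.shuffle(blocks)
    return [value for block in blocks for value in block]


def exceeds_null(observed_power, null_max_powers):
    """True if the observed power beats the maximum power of every permuted signal."""
    return observed_power > max(null_max_powers)


rng = random.Random(0)            # fixed seed so the example is reproducible
signal = list(range(10))          # stand-in for binned transcriptional propensity
shuffled = tile(block_shuffle(signal, block_size=3, rng=rng))
print(len(shuffled), allowed_frequencies(100, 3))
```

Shuffling blocks rather than individual bins keeps the local correlation structure of the signal intact, so the null distribution tests only for long-range periodicity rather than for short-range smoothness.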
Comparison with macrodomain and CID locations
Calculation of macrodomain and chromosome interacting domain (CID) boundaries
Macrodomains and chromosome interacting domains were calculated from data published by Lioy et al. (Lioy et al., 2018) using GCC contact matrices obtained from E. coli cells in exponential phase growth at 37°C in LB media (GEO data sets GSM2870426 and GSM2870427). Processed count matrices were taken directly from GEO and normalized using code from https://github.com/koszullab/E_coli_analysis. Directionality indices were calculated as described in Lioy et al. on normalized matrices at both 100 kb (CIDs) and 400 kb (macrodomains) scales. Significant boundaries were determined by choosing locations where the value of the directionality index t-test transitioned to a value of +2 or greater after previously obtaining a value of −2 or less upstream. Final boundaries were chosen by taking the average of boundaries found within 25 kb of each other between both replicates. Boundaries found only in one replicate were not considered, and final boundaries were converted from the U00096.2 gene coordinates to the U00096.3 gene coordinates used in this study. Boundaries were labeled with either the first gene overlapping or the closest gene to the boundary as found in the GeneProductSet dataset in RegulonDB version 9.4 and sorted by start coordinate in the annotation file.
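The boundary rule and the replicate merge described here can be read as a small state machine followed by a nearest-neighbour match. The sketch below is one possible reading under stated assumptions: a boundary is called at the first bin reaching +2 after the statistic has been at or below −2, replicate boundaries are paired with their nearest counterpart, and distances are linear rather than circular. The function names are hypothetical.

```python
def call_boundaries(t_values, low=-2.0, high=2.0):
    """Indices where the t-statistic reaches >= high after having been <= low.

    t_values: directionality-index t-statistics in genomic order.
    """
    boundaries = []
    armed = False  # becomes True once a value <= low has been seen
    for index, t in enumerate(t_values):
        if t <= low:
            armed = True
        elif t >= high and armed:
            boundaries.append(index)
            armed = False
    return boundaries


def merge_replicates(rep1, rep2, max_distance=25_000):
    """Average boundaries (in bp) found within max_distance in both replicates.

    Boundaries without a partner in the other replicate are dropped.
    """
    merged = []
    for b1 in rep1:
        close = [b2 for b2 in rep2 if abs(b1 - b2) <= max_distance]
        if close:
            nearest = min(close, key=lambda b2: abs(b1 - b2))
            merged.append((b1 + nearest) / 2)
    return merged


print(call_boundaries([0, -3, -1, 1, 2.5, 3, -2, 0, 2]))        # [4, 8]
print(merge_replicates([100_000, 500_000], [110_000, 900_000]))  # [105000.0]
```

Bin indices from `call_boundaries` would need to be multiplied by the contact-matrix bin size to obtain base-pair coordinates before merging.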
Transcriptional propensity, integration density and experimental data processing
RNA barcode counts were divided by DNA barcode counts separately for each replicate to generate raw transcriptional propensity values. Each replicate was then smoothed by a rolling median window over 500 bp for all windows with at least 3 reporters. Smoothed transcriptional propensity values for all integration sites were retained. The replicates (Fig. 2D) were then quantile normalized and averaged to generate the transcriptional propensity values used in this study, unless otherwise noted. All external experimental data sets (see Table S1) were subjected to the same smoothing and averaging of replicates described above. For each correlation, Python and Matplotlib were used to generate the hexbin plots, histograms, violin plots and Spearman statistics. All spline normalization was carried out using a smoothing B-spline with four knots, located at equidistant points along the chromosome including one at oriC, to provide a low-pass filter responding primarily to DNA abundance.
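The smoothing and normalization steps in this paragraph can be sketched as follows. The example assumes the 500 bp window is centred on each integration site (±250 bp), uses a simple quadratic-time scan, and breaks ties in quantile normalization by input order; these are illustrative assumptions, and the function names are hypothetical.

```python
from statistics import median


def rolling_median(sites, window=500, min_reporters=3):
    """Median-smooth raw propensity in a window centred on each site.

    sites: list of (position, raw_propensity).
    Returns a list aligned with `sites`; None where fewer than
    min_reporters reporters fall inside the window.
    """
    half = window / 2
    smoothed = []
    for pos, _ in sites:
        nearby = [value for p, value in sites if abs(p - pos) <= half]
        smoothed.append(median(nearby) if len(nearby) >= min_reporters else None)
    return smoothed


def quantile_normalize(rep1, rep2):
    """Give both replicates the same distribution: the mean of their sorted values."""
    order1 = sorted(range(len(rep1)), key=rep1.__getitem__)
    order2 = sorted(range(len(rep2)), key=rep2.__getitem__)
    norm1, norm2 = [0.0] * len(rep1), [0.0] * len(rep2)
    for i, j in zip(order1, order2):
        reference = (rep1[i] + rep2[j]) / 2  # mean of the two values at this rank
        norm1[i] = reference
        norm2[j] = reference
    return norm1, norm2


def average_replicates(rep1, rep2):
    """Element-wise mean of two equally long replicate vectors."""
    return [(a + b) / 2 for a, b in zip(rep1, rep2)]


sites = [(100, 1.0), (200, 2.0), (300, 9.0), (5000, 4.0)]
print(rolling_median(sites))  # the isolated site at 5000 yields None
n1, n2 = quantile_normalize([1.0, 3.0, 2.0], [20.0, 10.0, 30.0])
print(average_replicates(n1, n2))
```

The median makes each smoothed value robust to a single aberrant barcode, which is why several of the confounders examined later lose their correlation with propensity after windowing.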
Integration density was calculated by summing reporter integration over a rolling 500 bp window. Since the reporter was integrated during exponential growth phase, integration density was expected to be higher around the origin of replication (Fig. 2C). In order to eliminate density variation arising from gene dosage effects from all correlation analysis, we performed a B-spline smoothing of integration rates over the length of the chromosome (Fig S6). The raw integration density data was divided by the spline values to generate the gene-dosage corrected integration density values that were used for all correlation analysis (Fig S6A-C and Table S1).
In order to approximate the transcriptional propensity per cell instead of per DNA copy number (as in Fig. 2E), we multiplied transcriptional propensity by genomic DNA copy number during exponential phase for cells grown under the same conditions (Data from Thomas Goss; manuscript in preparation). Specifically, B-spline smoothing was used on read depth of total genomic DNA (Fig 7A). Transcriptional propensity was then multiplied by the spline values to generate the dosage transformed transcriptional values shown in Fig. 7B and in Table S4.
Modeling of transcriptional propensity based on chromosomal features
To obtain a minimal set of informative parameters for use in predicting transcriptional propensity, we performed lasso regression (Friedman et al., 2010) using the R package glmnet. Table S1 shows correlation statistics and data source for each experimental data set. We fitted the regression under five-fold cross validation, using a blocked strategy with each group consisting of four ~230 kilobase regions of contiguous locations, in order to account for the correlation structure inherent to the data itself. We only used data points for which all features were available, and thus the cross validation regions all contain consistent numbers of locations, but not necessarily precisely the same size of genomic region.
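The blocked cross-validation scheme can be illustrated by assigning each data point to a fold through contiguous, equal-count blocks. With 5 folds of 4 blocks each there are 20 blocks, which on the 4,641,652 bp U00096.3 chromosome is roughly 230 kb per block. How blocks were distributed among folds is not stated in the text; the interleaved assignment below, and the function name, are assumptions.

```python
def blocked_folds(positions, n_folds=5, blocks_per_fold=4):
    """Assign each genomic position to a cross-validation fold.

    Positions are sorted and cut into n_folds * blocks_per_fold contiguous
    blocks containing (nearly) equal numbers of data points; block b is
    assigned to fold b % n_folds (interleaving is an assumption).
    Returns {position: fold_index}.
    """
    ordered = sorted(positions)
    n_blocks = n_folds * blocks_per_fold
    fold_of = {}
    for rank, pos in enumerate(ordered):
        block = rank * n_blocks // len(ordered)
        fold_of[pos] = block % n_folds
    return fold_of


folds = blocked_folds(range(0, 4000, 10))
print(folds[0], folds[200], folds[1000])  # blocks 0, 1 and 5 -> folds 0, 1, 0
```

A vector of fold indices produced this way could then be handed to the cross-validation routine as explicit fold identifiers, so that neighbouring, correlated locations are never split between training and held-out data.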
Elimination of potentially confounding features
As described here, we considered and subsequently eliminated several possible sources of systematic bias in our transcriptional propensity measurements. For the effects considered here, we observed in some cases small effects at the level of individual barcodes, but all such effects were eliminated upon applying the window averaging used in our actual transcriptional propensity statistic (500 bp rolling median requiring at least three independent barcodes), demonstrating a lack of meaningful contribution from any of the factors noted here to the reported signal.
Barcode Sequences.
We observed very low correlation of transcriptional propensity with GC content of the barcode (Spearman ρ = 0.13 and 0.15 for each replicate, respectively, considered at the level of individual barcodes), and essentially all detectable bias was eliminated by the median windowing that we applied in our analysis (Spearman ρ=0.013 for the 500 bp moving median signal) (Fig. S2A-C), eliminating any impact on our final analysis.
Transposon-based Knockout Effects.
In principle, it is possible that the signal that we observed could be altered by the effects of gene disruptions caused by reporter integration. In practice, however, we observed that reporters integrated within the beginning of a gene coding sequence (thus knocking out its activity) were nearly identical to reporters integrated directly downstream of the CDS for the vast majority of genes (Fig. S2D and Table S5), and thus knockout effects appear to play little role in our observations. We could identify only five genes where transcriptional propensity from reporters within a gene differed from the surrounding neighborhood (by at least 1.7 fold) that could be potentially attributed to gene knockout effects instead of H-NS peak location (rep, bioH, dtpC, yfdL, and ftsN - see Table S5). The largest effect was a 2.4 fold-change and these represent exceptions rather than the rule. Most likely, integrations in genes that would globally affect transcriptional propensity also result in a competitive disadvantage during growth and were therefore lost from the library, as were integrations in essential genes. However, there may be some loci where reporter integration causes minor growth defects and results in the appearance of a slightly elevated transcriptional propensity over specific loci as a result of the decreased growth rate. Based on the threshold used for our identification of knockout effects above, we would expect most of these cases to have an effect of less than 1.7-fold. Potential cases that were not already identified with the knockout analysis (Fig. S2D) do not explain the genome-scale signal variation visible in Fig. 1E, which varies over 10 – 100 kb.
Effects of Clone-dependent Growth Rates.
We also considered the possibility that differences in growth rate might impact transcriptional propensity measurements by altering the effective stability of the transcript (through altering its dilution rate). Some evidence against this possibility arose from our consideration of targeted insertions (Fig. S1F), where we saw no growth rate effects from any insertion locations tested, despite those locations showing a 15-fold range in transcriptional propensities. To provide a more comprehensive test of possible effects of growth rate, we estimated the relative growth rates from many transposon-inserted reporters by measuring the abundance of each genomic DNA barcode at the start of the assay period compared to the end of the assay period. We found low correlations of growth-rate dependent genomic DNA abundance of barcodes with either raw RNA/DNA ratios or transcriptional propensity (see methods for correlation coefficients), which suggested that the differences in relative growth rates did not have a substantial effect on our measures of transcriptional propensity. We also noticed that the signal-to-noise ratio in these assays was generally low, likely due to a relative rarity of insertions that caused strong changes in growth rate. The Spearman correlation of the observed replicate-averaged genomic DNA abundance ratios of barcodes after and before growth with raw transcriptional propensity (RNA/DNA ratios before median windowing) was 0.01 using a threshold of 10 counts (Fig. S1G) and 0.14 using a threshold of 100 counts (Fig. S1H), demonstrating that any effect of growth rate on transcriptional propensity is exceedingly weak, and in particular is not needed even for very low or high propensities (Fig. S1H). Note that the correlations stated here are for individual barcodes, rather than the window-averaged statistics used for transcriptional propensities.
Indeed, the Spearman correlations of the window-averaged, growth-dependent changes in DNA barcode abundance ratios with transcriptional propensity (500 bp median windowing) at each genomic position that had insertions of filtered barcodes was −0.02 with a 10 count filtering threshold, and 0.02 with a 100 count filtering threshold, demonstrating a complete lack of meaningful correlation with the key statistic used in our work.
kanR Excision Efficiency.
Although the rate of kanR retention after the excision step is generally very low (0.1–1.4% of total mNG signal as assessed by qPCR - see Methods), there was a mild correlation of the rate of kanR retention between replicates for different reporters (Fig. S2G), and between the rate of kanR retention and the transcriptional propensity at individual integration sites (Fig. S2E,F). However, unlike the transcriptional propensity signal, the fluctuations in kanR retention were not highly correlated between nearby sites, and lost all correlation with the transcriptional propensity signal upon median windowing (Fig. S2H, I). The combination of this lack of overall correlation, and the very low absolute rates of retention of the kanR marker (see above), leads us to conclude that any site-specific variations in kanR excision efficiency have no meaningful effect on our overall transcriptional propensity profiles (indeed, it may well be that the site-level correlation that is observed is caused by the variation in propensity/accessibility per se, rather than by retention of the marker).
Effects of Neighboring Sequence Context.
The transcriptional propensity signal was not a result of bias in DNA barcode amplification that could be present due to variations in neighboring genomic AT content, as the two had no meaningful correlation (Fig. S2J). As expected from the large and diverse reporter library strain, counts for DNA barcodes and RNA barcodes vary substantially across the genome and at the local scale (Fig. S2K, L and M). Taken together, these figures indicate that genomic nucleotide content has a strong effect at the level of transcription (Fig. 4C), but that this effect is not a result of bias introduced by light PCR amplification of genomic DNA barcodes.
Supplementary Material
The header name (and description) are provided. Attribute (Gene and locus ID), Log2_CDS/downstream (the log2 fold change of the first or last 500 bp window of a gene CDS compared to 500bp downstream of a gene - a negative value indicates that potential integration knockouts cause the transcriptional propensity appear lower across part of the CDS compared to integrations directly downstream of the CDS), Description (ecocyc description of genes with a >1.7-fold change from the downstream window value), KO_effect (Manual annotation of genes with a >1.7-fold change from the downstream window value), First_Last500CDS (indicating whether the CDS value is from the first window or the last window).
We provide a comprehensive table of transcriptional propensity data, which contains the following columns (with description): t1_cDNA (replicate 1 raw RNA barcode counts), t1_gDNA (replicate 1 raw DNA barcode counts), t2_cDNA (replicate 2 raw RNA barcode counts), t2_gDNA (replicate 2 raw DNA barcode counts), pos (U00096.3 coordinates), strand (relative to U00096.3 reference), gc_fraction (fraction of GC in barcode), raw_propensity1 (replicate 1 RNA/DNA barcode counts), raw_propensity2 (replicate 2 RNA/DNA barcode counts), 500_med_win_propensity1 (replicate 1 median of RNA/DNA barcode counts in 500 bp window around each integration when at least 3 integrations are included), 500_med_win_propensity2 (replicate 2 median of RNA/DNA barcode counts in 500 bp window around each integration when at least 3 integrations are included), Avrg_500_med_win_propensity (average of replicates after median windowing and quantile normalization).
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Bacterial and Virus Strains | ||
| E. coli K12 MG1655 | CGSC | CGSC#: 7740 |
| DH5α™ | Invitrogen | 18265017 |
| MG1655 Z1 malE | Addgene | a gift from Keith Tyo (Addgene plasmid # 65915) |
| Strain BL21(DE3)/pCP20 | CGSC | CGSC#: 14177 (Cherepanov and Wackernagel 1995) |
| Chemicals, Peptides, and Recombinant Proteins | ||
| CviAII | NEB | Cat#0640 |
| CviQI | NEB | Cat#R0639 |
| T4 DNA ligase | Invitrogen | Cat# 15224017 |
| iTaq™ Universal SYBR® Green Supermix | Biorad | 1725120 |
| EZ-Tn5™ Transposase | Lucigen | TNP92110 |
| Q5 Hot Start polymerase | NEB | Cat#M0493 |
| 10X ACGU | Teknova | cat # M2103 |
| 5x Supplement EZ | Teknova | cat # M2008 |
| Protoscript II | NEB | Cat#M0368 |
| AscI | NEB | Cat#R0558 |
| PvuII | NEB | Cat#R0151 |
| EvaGreen | Biotium | Catalog #: 31000 |
| RNAProtect Bacteria Reagent | Qiagen | Cat No./ID: 76506 |
| Critical Commercial Assays | ||
| AxyPrep™ Mag PCR Clean-Up Kit | Axygen | MAGPCRCL50 |
| NEBNext® Multiplex Oligos for Illumina® (96 Index Primers) | NEB | E6609S |
| Deposited Data | ||
| Raw sequence files for cDNA barcodes, genomic DNA barcodes, KanR-associated barcodes and transposon footprinting | Sequence Read Archive (SRA) | SRP149841 |
| Experimental Models: Organisms/Strains | ||
| MG1655 Z1 with proD-mCherry | This Study | ecSAS17 |
| lambda red integration of yihG-pSAS31 integration fragment & CURED of KanR | This Study | ecSAS20 |
| lambda red integration of yafT-pSAS31 integration fragment & CURED of KanR | This Study | ecSAS21 |
| lambda red integration of eaeH-pSAS31 integration fragment & CURED of KanR | This Study | ecSAS22 |
| lambda red integration of htrL-pSAS31 integration fragment & CURED of KanR | This Study | ecSAS23 |
| lambda red integration of ybcK-pSAS31 integration fragment & CURED of KanR | This Study | ecSAS33 |
| lambda red integration of in_yagF-pSAS31 integration fragment & CURED of KanR | This Study | ecSAS34 |
| lambda red integration of eyeA-pSAS31 integration fragment & CURED of KanR | This Study | ecSAS35 |
| lambda red integration of nrfG-pSAS31 integration fragment & CURED of KanR | This Study | ecSAS36 |
| lambda red integration of phnO-pSAS31 integration fragment & CURED of KanR | This Study | ecSAS37 |
| Oligonucleotides | ||
| Table S6: primer list | This Study | |
| Recombinant DNA | ||
| pSIM5 | gift from Prof. Don Court | |
| pBT1-proD-mCherry | Addgene | a gift from Michael Lynch (Addgene plasmid # 65823) |
| PNCS-mNeonGreen | Allele Biotech | N/A |
| pBAD-Flp | This Study | N/A |
| pSAS31 | This Study | N/A |
| Software and Algorithms | ||
| Autocorrelation code | Shweta Ramdas | https://github.com/shwetaramdas/autocorrelation |
| cutadapt, version 1.8.1 | (Martin 2011) | https://cutadapt.readthedocs.io/en/stable/ |
| Trimmomatic, version 0.33 | (Bolger, Lohse, and Usadel 2014) | http://www.usadellab.org/cms/?page=trimmomatic |
| Bowtie2, version 2.1.0 | (Langmead and Salzberg 2012) | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml |
| iPAGE | (Goodarzi, Elemento, and Tavazoie 2009) | https://tavazoielab.c2b2.columbia.edu/iPAGE/ |
| Data analysis code | This Study | https://github.com/freddolino-lab/2018_genomeProfiling |
Highlights:
Barcoded reporters enable high-resolution mapping of transcriptional propensity
Ribosomal RNA operons are located in the center of highly transcribable regions
Nucleoid-associated proteins Fis and H-NS are predictive of transcriptional propensity
Genes involved in core metabolic processes are enriched in highly transcribable regions
Acknowledgements
We are grateful to all members of the Freddolino and Lin labs, and the University of Michigan Microbiology Super Group, for many helpful comments and suggestions. We thank Shweta Ramdas for implementation of the autocorrelation analysis. This work was supported by NIH R00-GM097033 (to P.L.F.), R03-AI130610 (to P.L.F.), and R35-GM128637 (to P.L.F.). S.A.S. was additionally supported by the NIH Cellular and Molecular Biology Training Grant T32-GM007315. M.B.W. is supported by an NSF Graduate Research Fellowship DGE 1256260.
Declaration of Interests
P.L.F., S.A.S., and X.N.L. have submitted a provisional patent application (US62/666,198) for aspects of the work described with regard to using high-expression genomic regions to optimize recombinant protein production. The authors declare that no other competing interests exist.
DATA AND SOFTWARE AVAILABILITY
Source code implementing the autocorrelation analysis: https://github.com/shwetaramdas/autocorrelation.
Source code implementing all other statistical analysis and modeling: https://github.com/freddolino-lab/2018_genomeProfiling
We provide a comprehensive table of transcriptional propensity data (Table S7), which contains the following columns (with description): t1_cDNA (replicate 1 raw RNA barcode counts), t1_gDNA (replicate 1 raw DNA barcode counts), t2_cDNA (replicate 2 raw RNA barcode counts), t2_gDNA (replicate 2 raw DNA barcode counts), pos (U00096.3 coordinates), strand (relative to U00096.3 reference), gc_fraction (fraction of GC in barcode), raw_propensity1 (replicate 1 RNA/DNA barcode counts), raw_propensity2 (replicate 2 RNA/DNA barcode counts), 500_med_win_propensity1 (replicate 1 median of RNA/DNA barcode counts in 500 bp window around each integration when at least 3 integrations are included), 500_med_win_propensity2 (replicate 2 median of RNA/DNA barcode counts in 500 bp window around each integration when at least 3 integrations are included), Avrg_500_med_win_propensity (average of replicates after median windowing and quantile normalization).
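To make the relationship between the raw count columns and the windowed propensity columns concrete, the following is a minimal sketch of one way they could be computed. It is not the code from the repository listed above. The function names are hypothetical, the 500 bp window is assumed to be centered on each integration (±250 bp), and the quantile normalization applied before averaging replicates is omitted.

```python
from bisect import bisect_left, bisect_right
from statistics import median

def raw_propensity(cdna_count, gdna_count):
    """RNA/DNA barcode count ratio; None when there are no DNA counts."""
    return cdna_count / gdna_count if gdna_count > 0 else None

def windowed_median(positions, values, window=500, min_sites=3):
    """Median of `values` within a window centered on each position.

    `positions` must be sorted in ascending order. A site gets None when
    fewer than `min_sites` integrations with a defined value fall in its window.
    """
    half = window // 2  # assumption: the window is centered on the integration
    result = []
    for p in positions:
        lo = bisect_left(positions, p - half)
        hi = bisect_right(positions, p + half)
        in_window = [v for v in values[lo:hi] if v is not None]
        result.append(median(in_window) if len(in_window) >= min_sites else None)
    return result

# Toy example: three nearby integrations and one isolated integration.
positions = [100, 200, 300, 1000]
ratios = [raw_propensity(c, g) for c, g in [(10, 10), (20, 10), (40, 10), (80, 10)]]
print(windowed_median(positions, ratios))
```

In this toy example the isolated site at position 1000 has no windowed value because fewer than three integrations fall within its window, mirroring the "at least 3 integrations" condition in the column description.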
We provide the raw sequence files for the cDNA barcodes, genomic DNA barcodes, KanR-associated barcodes and transposon footprinting reads (Accession SRP149841).
References
- Akhtar W, de Jong J, Pindyurin AV, Pagie L, Meuleman W, de Ridder J, Berns A, Wessels LFA, van Lohuizen M, and van Steensel B (2013). Chromatin position effects assayed by thousands of reporters integrated in parallel. Cell 154, 914–927.
- Ali SS, Soo J, Rao C, Leung AS, Ngai DH-M, Ensminger AW, and Navarre WW (2014). Silencing by H-NS Potentiated the Evolution of Salmonella. PLoS Pathog 10.
- Ali Azam T, Iwata A, Nishimura A, Ueda S, and Ishihama A (1999). Growth phase-dependent variation in protein composition of the Escherichia coli nucleoid. J. Bacteriol 181, 6361–6370.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet 25, 25–29.
- Azam TA, and Ishihama A (1999). Twelve species of the nucleoid-associated protein from Escherichia coli. Sequence recognition specificity and DNA binding affinity. J. Biol. Chem 274, 33105–33113.
- Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, and Mori H (2006). Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol 2, 2006.0008.
- Bakshi S, Choi H, and Weisshaar JC (2015). The spatial biology of transcription and translation in rapidly growing Escherichia coli. Front. Microbiol 6, 636.
- Beckwith JR, Signer ER, and Epstein W (1966). Transposition of the Lac Region of E. coli. Cold Spring Harb. Symp. Quant. Biol 31, 393–401.
- Bernstein JA, Khodursky AB, Lin P-H, Lin-Chao S, and Cohen SN (2002). Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc. Natl. Acad. Sci. U. S. A 99, 9697–9702.
- Blattner FR (1997). The Complete Genome Sequence of Escherichia coli K-12. Science 277, 1453–1462.
- Block DHS, Hussein R, Liang LW, and Lim HN (2012). Regulatory consequences of gene translocation in bacteria. Nucleic Acids Res 40, 8979–8992.
- Bokal AJ 4th, Ross W, and Gourse RL (1995). The transcriptional activator protein FIS: DNA interactions and cooperative interactions with RNA polymerase at the Escherichia coli rrnB P1 promoter. J. Mol. Biol 245, 197–207.
- Boudreau BA, Hron DR, Qin L, van der Valk RA, Kotlajich MV, Dame RT, and Landick R (2018). StpA and Hha stimulate pausing by RNA polymerase by promoting DNA–DNA bridging of H-NS filaments. Nucleic Acids Res 46, 5525–5546.
- Brambilla E, and Sclavi B (2015). Gene regulation by H-NS as a function of growth conditions depends on chromosomal position in Escherichia coli. G3 5, 605–614.
- Bryant JA, Sellars LE, Busby SJW, and Lee DJ (2014). Chromosome position effects on gene expression in Escherichia coli K-12. Nucleic Acids Res 42, 11383–11392.
- Buenrostro JD, Giresi PG, Zaba LC, Chang HY, and Greenleaf WJ (2013). Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218.
- Cabrera JE, and Jin DJ (2006). Active transcription of rRNA operons is a driving force for the distribution of RNA polymerase in bacteria: effect of extrachromosomal copies of rrnB on the in vivo localization of RNA polymerase. J. Bacteriol 188, 4007–4014.
- Chai Q, Singh B, Peisker K, Metzendorf N, Ge X, Dasgupta S, and Sanyal S (2014). Organization of ribosomes and nucleoids in Escherichia coli cells during growth and in quiescence. J. Biol. Chem 289, 11342–11352.
- Chen H, Shiroguchi K, Ge H, and Xie XS (2015). Genome-wide study of mRNA degradation and transcript elongation in Escherichia coli. Mol. Syst. Biol 11, 808.
- Chen Y-J, Liu P, Nielsen AAK, Brophy JAN, Clancy K, Peterson T, and Voigt CA (2013). Characterization of 582 natural and synthetic terminators and quantification of their design constraints. Nat. Methods 10, 659–664.
- Cherepanov PP, and Wackernagel W (1995). Gene disruption in Escherichia coli: TcR and KmR cassettes with the option of Flp-catalyzed excision of the antibiotic-resistance determinant. Gene 158, 9–14.
- Cho B-K, Knight EM, Barrett CL, and Palsson BØ (2008). Genome-wide analysis of Fis binding in Escherichia coli indicates a causative role for A-/AT-tracts. Genome Res 18, 900–910.
- Clavel D, Gotthard G, von Stetten D, De Sanctis D, Pasquier H, Lambert GG, Shaner NC, and Royant A (2016). Structural analysis of the bright monomeric yellow-green fluorescent protein mNeonGreen obtained by directed evolution. Acta Crystallogr D Struct Biol 72, 1298–1307.
- Cooper S, and Helmstetter CE (1968). Chromosome replication and the division cycle of Escherichia coli. J. Mol. Biol 31, 519–540.
- Demerec M, and Hartman PE (1959). Complex Loci in Microorganisms. Annu. Rev. Microbiol 13, 377–406.
- Dorman CJ (2006). DNA supercoiling and bacterial gene expression. Sci. Prog 89, 151–166.
- Dorman CJ (2014). H-NS-like nucleoid-associated proteins, mobile genetic elements and horizontal gene transfer in bacteria. Plasmid 75, 1–11.
- Espah Borujeni A, Channarasappa AS, and Salis HM (2014). Translation rate is controlled by coupled trade-offs between site accessibility, selective RNA unfolding and sliding at upstream standby sites. Nucleic Acids Res 42, 2646–2659.
- Fang FC, and Rimsky S (2008). New insights into transcriptional regulation by H-NS. Curr. Opin. Microbiol 11, 113–120.
- French SL, and Miller OL Jr (1989). Transcription mapping of the Escherichia coli chromosome by electron microscopy. J. Bacteriol 171, 4207–4216.
- Friedman J, Hastie T, and Tibshirani R (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw 33, 1–22.
- Gaal T, Bratton BP, Sanchez-Vazquez P, Sliwicki A, Sliwicki K, Vegel A, Pannu R, and Gourse RL (2016). Colocalization of distant chromosomal loci in space in E. coli: a bacterial nucleolus. Genes Dev 30, 2272–2285.
- Gama-Castro S, Salgado H, Santos-Zavaleta A, Ledezma-Tejeida D, Muñiz-Rascado L, García-Sotelo JS, Alquicira-Hernández K, Martínez-Flores I, Pannier L, Castro-Mondragón JA, et al. (2015). RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Res 44, D133–D143.
- Gao Y, Foo YH, Winardhi RS, Tang Q, Yan J, and Kenney LJ (2017). Charged residues in the H-NS linker drive DNA binding and gene silencing in single cells. Proceedings of the National Academy of Sciences 114, 12560–12565.
- Girgis HS, Liu Y, Ryu WS, and Tavazoie S (2007). A comprehensive genetic characterization of bacterial motility. PLoS Genet 3, 1644–1660.
- Goodarzi H, Elemento O, and Tavazoie S (2009). Revealing global regulatory perturbations across human cancers. Mol. Cell 36, 900–911.
- Hanahan D, Jessee J, and Bloom FR (1991). [4] Plasmid transformation of Escherichia coli and other bacteria. In Methods in Enzymology, pp. 63–113.
- Heng SSJ, Chan OYW, Keng BMH, and Ling MHT (2011). Glucan Biosynthesis Protein G Is a Suitable Reference Gene in Escherichia coli K-12. ISRN Microbiology 2011, 1–6.
- Higashi K, Tobe T, Kanai A, Uyar E, Ishikawa S, Suzuki Y, Ogasawara N, Kurokawa K, and Oshima T (2016). H-NS Facilitates Sequence Diversification of Horizontally Transferred DNAs during Their Integration in Host Chromosomes. PLoS Genet 12, e1005796.
- Hirvonen CA, Ross W, Wozniak CE, Marasco E, Anthony JR, Aiyar SE, Newburn VH, and Gourse RL (2001). Contributions of UP elements and the transcription factor FIS to expression from the seven rrn P1 promoters in Escherichia coli. J. Bacteriol 183, 6305–6314.
- Jeong D-E, So Y, Park S-Y, Park S-H, and Choi S-K (2018). Random knock-in expression system for high yield production of heterologous protein in Bacillus subtilis. J. Biotechnol 266, 50–58.
- Jin DJ, and Cabrera JE (2006). Coupling the distribution of RNA polymerase to global gene regulation and the dynamic structure of the bacterial nucleoid in Escherichia coli. J. Struct. Biol 156, 284–291.
- Joshi MC, Magnan D, Montminy TP, Lies M, Stepankiw N, and Bates D (2013). Regulation of sister chromosome cohesion by the replication fork tracking protein SeqA. PLoS Genet 9, e1003673.
- Kahramanoglou C, Seshasayee ASN, Prieto AI, Ibberson D, Schmidt S, Zimmermann J, Benes V, Fraser GM, and Luscombe NM (2011). Direct and indirect effects of H-NS and Fis on global gene expression control in Escherichia coli. Nucleic Acids Res 39, 2073–2091.
- Kosuri S, Goodman DB, Cambray G, Mutalik VK, Gao Y, Arkin AP, Endy D, and Church GM (2013). Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A 110, 14024–14029.
- Kouzarides T (2007). Chromatin Modifications and Their Function. Cell 128, 693–705.
- Kroner GM, Wolfe MB, and Freddolino P (2018). Escherichia coli Lrp regulates one-third of the genome via direct, cooperative, and indirect routes.
- Lal A, Dhar A, Trostel A, Kouzine F, Seshasayee ASN, and Adhya S (2016). Genome scale patterns of supercoiling in a bacterial chromosome. Nat. Commun 7, 11055.
- Lanctôt C, Cheutin T, Cremer M, Cavalli G, and Cremer T (2007). Dynamic genome architecture in the nuclear space: regulation of gene expression in three dimensions. Nat. Rev. Genet 8, 104–115.
- Landick R, Wade JT, and Grainger DC (2015). H-NS and RNA polymerase: a love–hate relationship? Curr. Opin. Microbiol 24, 53–59.
- Lang B, Blot N, Bouffartigues E, Buckle M, Geertz M, Gualerzi CO, Mavathur R, Muskhelishvili G, Pon CL, Rimsky S, et al. (2007). High-affinity DNA binding sites for H-NS provide a molecular basis for selective silencing within proteobacterial genomes. Nucleic Acids Res 35, 6330–6337.
- Lang KS, Hall AN, Merrikh CN, Ragheb M, Tabakh H, Pollock AJ, Woodward JJ, Dreifus JE, and Merrikh H (2017). Replication-Transcription Conflicts Generate R-Loops that Orchestrate Bacterial Stress Survival and Pathogenesis. Cell 170, 787–799.e18.
- Lawrence J (1999). Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr. Opin. Genet. Dev 9, 642–648.
- Le TBK, Imakaev MV, Mirny LA, and Laub MT (2013). High-resolution mapping of the spatial organization of a bacterial chromosome. Science 342, 731–734.
- Lioy VS, Cournac A, Marbouty M, Duigou S, Mozziconacci J, Espéli O, Boccard F, and Koszul R (2018). Multiscale Structuring of the E. coli Chromosome by Nucleoid-Associated and Condensin Proteins. Cell 172, 771–783.e18.
- Lomb NR (1976). Least-squares frequency analysis of unequally spaced data. Astrophys. Space Sci 39, 447–462.
- Lucchini S, Rowley G, Goldberg MD, Hurd D, Harrison M, and Hinton JCD (2006). H-NS mediates the silencing of laterally acquired genes in bacteria. PLoS Pathog 2, e81.
- Marbouty M, Le Gall A, Cattoni DI, Cournac A, Koh A, Fiche J-B, Mozziconacci J, Murray H, Koszul R, and Nollmann M (2015). Condensin- and Replication-Mediated Bacterial Chromosome Folding and Origin Condensation Revealed by Hi-C and Super-resolution Imaging. Mol. Cell 59, 588–602.
- Martin M (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10.
- Martínez-Antonio A, Medina-Rivera A, and Collado-Vides J (2009). Structural and functional map of a bacterial nucleoid. Genome Biol 10, 247.
- Masters M (1977). The frequency of P1 transduction of the genes of Escherichia coli as a function of chromosomal position: preferential transduction of the origin of replication. Mol. Gen. Genet 155, 197–202.
- Meyer S, Reverchon S, Nasser W, and Muskhelishvili G (2017). Chromosomal organization of transcription: in a nutshell. Curr. Genet 64, 555–565.
- Moffitt JR, Pandey S, Boettiger AN, Wang S, and Zhuang X (2016). Spatial organization shapes the turnover of a bacterial transcriptome. Elife 5.
- Mooney RA, Davis SE, Peters JM, Rowland JL, Ansari AZ, and Landick R (2009). Regulator trafficking on bacterial transcription units in vivo. Mol. Cell 33, 97–108.
- Navarre WW (2006). Selective Silencing of Foreign DNA with Low GC Content by the H-NS Protein in Salmonella. Science 313, 236–238.
- Neidhardt FC, Bloch PL, and Smith DF (1974). Culture medium for enterobacteria. J. Bacteriol 119, 736–747.
- Ochman H, Lawrence JG, and Groisman EA (2000). Lateral gene transfer and the nature of bacterial innovation. Nature 405, 299–304.
- Paul BJ, Ross W, Gaal T, and Gourse RL (2004). rRNA transcription in Escherichia coli. Annu. Rev. Genet 38, 749–770.
- Postow L, Hardy CD, Arsuaga J, and Cozzarelli NR (2004). Topological domain structure of the Escherichia coli chromosome. Genes Dev 18, 1766–1779.
- Prieto AI, Kahramanoglou C, Ali RM, Fraser GM, Seshasayee ASN, and Luscombe NM (2011). Genomic analysis of DNA binding and gene regulation by homologous nucleoid-associated proteins IHF and HU in Escherichia coli K12. Nucleic Acids Res 40, 3524–3537.
- Rangarajan AA, and Schnetz K (2018). Interference of transcription across H-NS binding sites and repression by H-NS. Mol. Microbiol 108, 226–239.
- Ross W, Thompson JF, Newlands JT, and Gourse RL (1990). E. coli Fis protein activates ribosomal RNA transcription in vitro and in vivo. EMBO J 9, 3733–3742.
- Scargle JD (1982). Studies in astronomical time series analysis. II-Statistical aspects of spectral analysis of unevenly spaced data. Astrophys. J 263, 835–853.
- Schmid MB, and Roth JR (1987). Gene location affects expression level in Salmonella typhimurium. J. Bacteriol 169, 2872–2875.
- Selinger DW, Saxena RM, Cheung KJ, Church GM, and Rosenow C (2003). Global RNA half-life analysis in Escherichia coli reveals positional patterns of transcript degradation. Genome Res 13, 216–223.
- Shaner NC, Lambert GG, Chammas A, Ni Y, Cranfill PJ, Baird MA, Sell BR, Allen JR, Day RN, Israelsson M, et al. (2013). A bright monomeric green fluorescent protein derived from Branchiostoma lanceolatum. Nat. Methods 10, 407–409.
- Sinden RR, and Pettijohn DE (1981). Chromosomes in living Escherichia coli cells are segregated into domains of supercoiling. Proc. Natl. Acad. Sci. U. S. A 78, 224–228.
- Sousa C, de Lorenzo V, and Cebolla A (1997). Modulation of gene expression through chromosomal positioning in Escherichia coli. Microbiology 143 (Pt 6), 2071–2078.
- Talukder A, and Ishihama A (2015). Growth phase dependent changes in the structure and protein composition of nucleoid in Escherichia coli. Sci. China Life Sci 58, 902–911.
- The Astropy Collaboration, Robitaille TP, Tollerud EJ, Greenfield P, Droettboom M, Bray E, Aldcroft T, Davis M, Ginsburg A, Price-Whelan AM, et al. (2013). Astropy: A community Python package for astronomy. Astron. Astrophys. Suppl. Ser 558, A33.
- The Gene Ontology Consortium (2017). Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res 45, D331–D338.
- The MathWorks, Inc. (2018). Confidence Intervals for Sample Autocorrelation - MATLAB & Simulink.
- Uyar E, Kurokawa K, Yoshimura M, Ishikawa S, Ogasawara N, and Oshima T (2009). Differential binding profiles of StpA in wild-type and h-ns mutant cells: a comparative analysis of cooperative partners by chromatin immunoprecipitation-microarray analysis. J. Bacteriol 191, 2388–2391.
- Vora T, Hottes AK, and Tavazoie S (2009). Protein occupancy landscape of a bacterial genome. Mol. Cell 35, 247–253.
- Wang X, Le TBK, Lajoie BR, Dekker J, Laub MT, and Rudner DZ (2015). Condensin promotes the juxtaposition of DNA flanking its loading site in Bacillus subtilis. Genes Dev 29, 1661–1675.
- Yeung E, Dy AJ, Martin KB, Ng AH, Del Vecchio D, Beck JL, Collins JJ, and Murray RM (2017). Biophysical Constraints Arising from Compositional Context in Synthetic Gene Networks. Cell Syst 5, 11–24.e12.





