(A) As an alternative to the measure presented in Figure 4E, which compares the number of predicted σ⁷⁰-RNAP binding sites in a random sequence with that in the Escherichia coli genome, we created 100 shuffled σ⁷⁰-RNAP energy matrices and used each of them to predict expression from every position in the E. coli genome. For each shuffle, we constructed cumulative histograms of free energy for inter-genic and within-gene regions. For each bin, we then calculated the p-value of the Extended model that used the actual σ⁷⁰-RNAP energy matrix, assuming a normal distribution with mean and standard deviation given by the set of models with shuffled matrices. This is a conservative estimate because, for energies ΔE < 1, the Gaussian assumption overestimates the standard deviation. The matrices were shuffled per position: an energy matrix of dimension 4×L, with L being the length of the binding site, is shuffled by randomly reordering the L columns while leaving the energy entries in each column unchanged in order and magnitude. Gray lines represent 95% confidence intervals. (B) To test for selection against σ⁷⁰-RNAP binding sites using only the inter-genic regions that contain experimentally confirmed promoters (based on RegulonDB), we compared the model-predicted binding energy across the region to the binding expected for a 10⁸ bp random sequence with the GC% of the corresponding region. Also shown is the selection against binding sites within genes (same as in Figure 4E). Gray shaded areas are 95% confidence intervals.
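
The per-position shuffle and the per-bin p-value described for panel A can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy 4×4 matrix, and the choice of a two-sided p-value are assumptions made for illustration, and the step that predicts energies across the genome is omitted.

```python
import math
import random
import statistics


def shuffle_columns(matrix, rng):
    """Randomly reorder the L columns of a 4 x L matrix.

    Entries within each column keep their order and magnitude.
    """
    length = len(matrix[0])
    order = list(range(length))
    rng.shuffle(order)
    return [[row[j] for j in order] for row in matrix]


def cumulative_counts(energies, bin_edges):
    """Count the positions whose energy is at or below each bin edge."""
    return [sum(1 for e in energies if e <= edge) for edge in bin_edges]


def normal_p_value(actual, shuffled_values):
    """P-value of `actual` under a normal fit to the shuffled values.

    Assumption: two-sided test; the caption does not state the sidedness.
    """
    mu = statistics.mean(shuffled_values)
    sd = statistics.stdev(shuffled_values)
    if sd == 0:
        return 1.0 if actual == mu else 0.0
    z = (actual - mu) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical toy matrix: rows are A, C, G, T; columns are binding-site positions.
toy_matrix = [
    [0.0, 1.0, 2.0, 3.0],
    [0.1, 1.1, 2.1, 3.1],
    [0.2, 1.2, 2.2, 3.2],
    [0.3, 1.3, 2.3, 3.3],
]
rng = random.Random(0)
shuffled_matrices = [shuffle_columns(toy_matrix, rng) for _ in range(100)]
```

In a full analysis, each shuffled matrix would be scored against every genome position, `cumulative_counts` would be applied to the resulting energies, and `normal_p_value` would be called once per bin with the count from the actual matrix and the 100 counts from the shuffled matrices.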