Summary
Quantifying and predicting growth rate phenotype given variation in gene expression and environment is complicated by epistatic interactions and the vast combinatorial space of possible perturbations. We developed an approach for mapping expression-growth rate landscapes that integrates sparsely sampled experimental measurements with an interpretable machine-learning model. We used mismatch CRISPRi across pairs and triples of genes to create over 8,000 titrated changes in E. coli gene expression under varied environmental contexts, exploring epistasis in up to 22 distinct environments. Our results show that a pairwise model previously used to describe drug interactions described these data well. The model yielded interpretable parameters related to pathway architecture and generalized to predict the combined effect of up to four perturbations when trained solely on pairwise perturbation data. We anticipate this approach will be broadly applicable in optimizing bacterial growth conditions, generating pharmacogenomic models, and understanding the fundamental constraints on bacterial gene expression. A record of this paper’s Transparent Peer Review process is included in the Supplemental Information.
Keywords: epistasis, CRISPRi, genetic interaction, fitness landscape, functional genomics, E. coli, essential genes, expression-fitness mapping
eTOC
Otto et al. measure the effect of thousands of transcriptional perturbations in diverse environments on E. coli growth rate. They use these data to train an interpretable machine learning model to reconstruct expression-growth rate landscapes, identify meaningful genetic interactions, and predict how combinations of genetic and environmental factors influence growth.
Introduction
Changes in gene expression provide the foundation for cells to differentiate, metabolize varied nutrient sources, adapt to new environmental niches, and respond to stress across all domains of life. In the specific context of bacterial infection, expression changes can lead to immune evasion1, trigger a switch between commensal and pathogenic behavior2, or confer antibiotic tolerance or resistance3,4. Yet our ability to quantify and predict the effects of gene expression changes on growth rate phenotype for even relatively simple, well-studied organisms like E. coli remains limited. Complicating factors include that the relationship between gene expression and cell growth rate is non-linear and often unknown5–9, epistatic interactions amongst genes and environments result in context-dependent phenotypes10–12, and the combinatorial space of possible genetic and environmental perturbations is vast. To address these challenges, we need integrated experimental and computational tools that enable reconstruction of high-dimensional gene expression-growth rate landscapes under relevant environmental contexts.
Recent advances in titratable or mismatch CRISPRi offer a fresh experimental strategy to quantify the relationship between expression and growth8,9,13–15, and have been combined with environmental perturbations to yield powerful chemical-genetic screens16–18. In contrast to traditional genetic screens, which typically make use of binary “on-off” perturbations in the form of a single strong knockdown or knockout, mismatch CRISPRi allows the creation of analog-like intermediate expression perturbations19. Prior work has shown that sampling intermediate perturbations leads to more complete detection of possible drug targets and environmental interactions9,20,21. In this work, we extend our previous single-gene mismatch CRISPRi methodology to titrate expression over pairwise and three-way combinations of genes under varied environments9. This strategy advances pairwise genetic interaction mapping from a set of discrete end-point epistasis measurements (Fig 1A) to the reconstruction of continuous epistasis landscapes (Fig 1B). These landscapes can then be used to predict the effect of previously uncharacterized perturbation combinations (Fig 1C). More specifically, by growing combinatorial libraries of CRISPRi knockdowns in a multiplexed turbidostat, we mapped growth rate landscapes across titrated variation in expression and environment for up to fourth-order perturbations (two genes, two environments). These data define fitness landscapes across combinatorial genetic and environmental perturbations (akin to Fig 1B). Our goal was then to quantitatively summarize these data and construct a predictive model that extends to arbitrary combinations of continuously varied genetic and environmental perturbations (Fig 1C). We sought a model that would readily describe diverse types of genetic interactions (e.g., both binding in physical complexes and indirect interactions across metabolism) with minimal prerequisite knowledge of mechanistic detail.
To do this, we drew upon recent ideas from pharmacological interaction modeling. A growing body of work has emerged centered on few-parameter models for predicting the effects of combinatorial drug treatments22–34. Following the rationale that a reduction in enzyme expression is functionally analogous to inhibiting an enzyme with a drug, we hypothesized that a similar coupling-based approach could capture context-dependence between genes and predict the impact of multiple gene expression changes on bacterial growth rate. We found that a pairwise model extended from the work of Zimmer and colleagues29 captured our data well, yielded interpretable parameters that provide insight into underlying pathway architecture and hierarchy, and enabled prediction of three and four perturbations at a time when trained on only pairwise data. Taken together, our experimental and computational methods provide a strategy to incorporate sparsely sampled epistasis measurements into a predictive model that is scalable to quantify expression-growth rate coupling across entire metabolic pathways or potentially even genomes.
Figure 1. Mismatch CRISPRi enables characterization of gene expression-growth rate relationships.

(A) A linear interpolation from experimental data exploring the growth rate effects of complete knockdown of one or two genes, akin to a classic genetic interaction measurement. Growth rate of wildtype cells (x = y = 0, red) linearly decreases with either or both knockdowns of genes 1 and 2 (dotted lines). Growth rate units here (and in all figures) are arbitrary units (AU) as growth rates are linearly rescaled such that wildtype growth is equal to 1 and no growth is equal to 0. (B) A computationally modeled, continuous, pairwise expression-growth rate model, fit from intermediate experimental data (gray dots). Growth rate no longer scales linearly with knockdown and is far more robust to expression perturbation than a linear model would suggest. (C) Tuning the expression of E. coli genes (dials) and thus relative protein abundances (gradients) alters cellular growth rate. (D) The concentration of mRNA in log-phase E. coli for each titrating sgRNA was quantified by RT-qPCR. A repression efficiency of 1 corresponds to complete knockdown, and a repression efficiency of 0 corresponds to the expression level following treatment with a non-targeting sgRNA control that does not perturb mRNA level. Individual colored bars represent different titrating sgRNAs targeting thyA, with error bars describing the standard error of the mean (SEM) across n ≥ 3 replicates. Individual measurements are shown as gray dots. (E) Growth rate was quantified by next-generation sequencing. We measured the log2 relative (Rel.) frequency of each sgRNA relative to a non-targeting control (gray dots along y = 0) over eight time points spanning 14 hours. Color coding of sgRNAs is identical to panel D. The slope of the line of best fit represents the growth rate of each knockdown relative to the non-targeting control for a single replicate. 
(F) Measurements of growth rate and repression efficiency show a sigmoidal relationship (blue line), with serious growth rate deficits emerging at severe repression levels. Color coding of sgRNAs is identical to panel D; error bars represent SEM of growth rate measurements (vertical, n ≥ 4) and RT-qPCR measurements (horizontal, n ≥ 3).
Results
Quantifying expression-dependent epistatic landscapes with pairwise CRISPRi
To quantitatively map the relationship between expression and growth rate, we used mismatch CRISPR interference (CRISPRi). CRISPRi uses a catalytically dead Cas9 endonuclease (dCas9) and a single guide RNA (sgRNA) to target a specific DNA locus, sterically repressing transcription at the target site35. In standard CRISPRi, the target of the sgRNA is specified by a 20-nucleotide homology region designed to be fully complementary to the target gene. In contrast, mismatch CRISPRi uses sgRNAs containing nucleotide mismatches in the target homology region8,9,36,37. These mismatches disrupt sgRNA-DNA interactions, decreasing gene repression strength. In prior work9, we defined empirical design rules that allow us to design a series of sgRNAs targeting the same DNA locus that effectively titrate a single gene’s transcription to numerous intermediate levels between no knockdown and complete knockdown (Fig 1D). Growth rate defects caused by these knockdowns can be quantified at high throughput using CRISPRi-seq10,36,38–40. More specifically, we estimate the growth rate effects of thousands of sgRNAs in a single experiment by fitting a linear model to the logarithm of each sgRNA’s frequency over time (Fig 1E). By plotting growth rate as a function of sgRNA repression strength, we define continuous expression-growth rate functions for genes of interest (Fig 1F).
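The slope-based growth rate estimate described above can be illustrated with a minimal sketch. The read counts, time points, and function names below are hypothetical and chosen only for illustration; the published analysis pipeline may differ in normalization and filtering.

```python
import math

def log2_relative_frequency(counts, control_counts):
    """log2 of an sgRNA's read count relative to the non-targeting control, per time point."""
    return [math.log2(c / c0) for c, c0 in zip(counts, control_counts)]

def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical example: eight time points spanning 14 hours (made-up counts).
hours = [0, 2, 4, 6, 8, 10, 12, 14]
control = [1000] * 8
# A knockdown whose frequency relative to the control halves every 4 hours.
knockdown = [1000 * 2 ** (-t / 4) for t in hours]

rel = log2_relative_frequency(knockdown, control)
relative_growth_rate = least_squares_slope(hours, rel)
print(round(relative_growth_rate, 3))  # approximately -0.25 log2 units per hour
```

In this sketch a negative slope indicates a knockdown that grows more slowly than the non-targeting control, and a slope of zero indicates control-like growth.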
To develop our integrated experimental and modeling approach, we selected nine genes to target with mismatch CRISPRi, with an eye to sampling distinct metabolic pathways, architectures, and epistatic relationships (Fig 2A–D). Included were genes encoding enzymes catalyzing consecutive metabolic steps (dapA/dapB, purN/purL), parallel reactions (gdhA/gltB), and a three-enzyme loop (folA/glyA/thyA). All genes except purN, gdhA, and gltB were previously shown to be essential in our experimental conditions41 (OD600 < 0.01 after 24 hours in minimal media). To avoid polar effects of CRISPRi knockdown, no gene was upstream of an essential gene in an operon36. Following the rationale behind classic genetic interaction mapping, we generally expected positive epistasis between genes in consecutive reactions and negative epistasis between those in parallel reactions42. Indeed, prior work showed that double knockout of gdhA/gltB is synthetically lethal43. We designed 9–12 distinct sgRNAs targeting each gene as well as a non-targeting sgRNA control to generate a 96 sgRNA library (Table S1). We quantified the effect of each CRISPRi perturbation on the mRNA concentration of its target gene using reverse transcription-quantitative PCR (RT-qPCR). As expected, our library of sgRNAs created a titrated range of expression levels for every gene in our data set (Table S1).
Figure 2. Mismatch CRISPRi for a diverse set of metabolic enzymes.

sgRNAs were designed to target 9 E. coli genes from (A) diaminopimelate biosynthesis, (B) purine, (C) folate, and (D) glutamate metabolism. Gene names are listed in black italic text underneath the corresponding metabolic enzyme (bold black text). Metabolite names are in blue. (E-M) Repression efficiency of each CRISPRi sgRNA, as measured by RT-qPCR, correlated with growth rate, as measured by next-generation sequencing. A repression efficiency of 1 corresponds to complete knockdown, and a repression efficiency of zero corresponds to the expression level following treatment with a non-targeting sgRNA control. Negative repression efficiency corresponds to an increase in relative mRNA abundance, which can be observed for sgRNAs with low homology to the gene of interest. Error bars represent SEM of growth rate measurements between n ≥ 4 replicates (vertical) and RT-qPCR measurements across n ≥ 3 replicates (horizontal). Blue lines represent the two-parameter logistic fit to each data set. Blue shaded area represents a pointwise 95% confidence interval of the logistic fit estimated by n = 100 bootstrapping iterations. The genes gdhA and gltB are nonessential in our experimental conditions and show no growth rate defect following repression.
We then assembled a library of all pairwise combinations of these sgRNAs in three distinct barcoded plasmid vectors to generate internal replicates9,44. Sequencing of our library prior to selection showed that it was complete and well-distributed (Fig S1A). We quantified the growth rate effect of each CRISPRi knockdown (both singles and pairwise) on exponentially growing E. coli cells in M9 minimal media with glucose using next-generation sequencing (all growth rate data is available on GitHub, for DOI see key resources table). The mixed bacterial culture was maintained at a dilute optical density (OD600=0.15) in a turbidostat to ensure a more constant environment and minimize the effects of communication or cross-feeding between cells45,46. Growth rate was well-correlated between internal replicates (r2 = 0.57–0.59, Fig S1B). In agreement with earlier work35, changing the sgRNA order within pairwise constructs (sgRNA1-sgRNA2 vs. sgRNA2-sgRNA1) did not systematically alter growth rate (r2 = 0.77, Fig S1C). Thus, after pooling our three internal replicates for each of two sgRNA orders, we had six independent measurements for each pairwise CRISPRi treatment. As in prior work9, we used Dixon’s Q test to identify and filter out “escapers”: replicates whose growth rate was significantly faster than all other replicates of the same CRISPRi treatment, likely due to evasion of the CRISPRi machinery (Fig S1D). Escaper correction removed 182 measurements (Fig S1E–F, 0.7% of the data). Finally, we averaged the remaining replicates and empirically rescaled our growth rate data, setting the minimum growth rate observed to 0 and the non-targeting CRISPRi control to 1 (in arbitrary growth rate units). Normalization to the non-targeting control (which doubles ~20% slower than wildtype MG1655 E. coli) deconvolutes the general growth rate defect caused by dCas9 induction from the specific effects of CRISPRi. 
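As an illustration of the escaper filter and rescaling steps, the sketch below applies Dixon's Q test to the fastest replicate of one CRISPRi treatment and then linearly rescales averaged growth rates. The 95% critical values are the commonly tabulated ones; the confidence level, the replicate values, and the function names are assumptions made for this example and are not taken from the analysis code.

```python
# Commonly tabulated 95% critical values for Dixon's Q, keyed by sample size.
# The confidence level actually used for escaper filtering is an assumption here.
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568}

def drop_fast_escaper(replicates):
    """Remove the fastest replicate if Dixon's Q flags it as an outlier."""
    values = sorted(replicates)
    n = len(values)
    spread = values[-1] - values[0]
    if n not in Q_CRIT_95 or spread == 0:
        return list(replicates)
    # Q = gap between the suspect value and its nearest neighbour, over the range.
    q = (values[-1] - values[-2]) / spread
    if q > Q_CRIT_95[n]:
        return values[:-1]
    return list(replicates)

def rescale(rates, control_rate):
    """Linearly map the slowest observed rate to 0 and the control rate to 1."""
    slowest = min(rates.values())
    return {k: (v - slowest) / (control_rate - slowest) for k, v in rates.items()}

# Hypothetical relative growth rates for six replicates of one sgRNA pair;
# the last replicate grows almost like the control and is a candidate escaper.
replicates = [-0.30, -0.31, -0.29, -0.32, -0.30, -0.02]
kept = drop_fast_escaper(replicates)
mean_rate = sum(kept) / len(kept)

# Hypothetical averaged rates, with the non-targeting control at 0 by construction.
scaled = rescale({"nont": 0.0, "sgA": mean_rate, "sgB": -0.60}, control_rate=0.0)
print(len(kept), round(scaled["sgA"], 3))
```

Only the upper tail is tested here because escapers are, by definition, faster than the other replicates of the same treatment.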
In total, we quantified the growth rate effect of 4,404 unique CRISPRi perturbations. We found that growth rates from this study correlated well with measurements from our previous work investigating single-gene knockdowns9 (Fig S1G, r2 = 0.90).
Key resources table.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Bacterial and virus strains | ||
| Escherichia coli XL1-Blue (recA1 endA1 gyrA96 thi-1 hsdR17 supE44 relA1 lac [F’ proAB lacIq ZΔM15 Tn10 (Tetr)]) | Agilent | #200236 |
| Escherichia coli K12 MG1655 + dCas9 (F- λ- ilvG- rfb-50 rph-1 HK022 attB:dCas9) | AddGene | #118727 |
| Chemicals, peptides, and recombinant proteins | ||
| Anhydrotetracycline | Cayman Chemical Company | #10009542 |
| Thymidine | Sigma-Aldrich | #T1895 |
| Methionine | Sigma-Aldrich | #M5308 |
| Kanamycin | Sigma-Aldrich | #K1377 |
| Critical commercial assays | ||
| Luna Universal One-Step RT-qPCR Kit | New England Biolabs | #E3005 |
| RNeasy Protect Bacteria Mini Kit | QIAGEN | #74524 |
| RNase-Free DNase Set | QIAGEN | #79254 |
| Picogreen assay | Thermo Fisher Scientific | #P7581 |
| Qubit assay | Thermo Fisher Scientific | #Q32851 |
| Deposited data | ||
| Next-generation sequencing reads | This paper | SRA: PRJNA877364 |
| All custom code, qPCR data, growth rate data, and model fits | This paper | 10.5281/zenodo.10278952 |
| Oligonucleotides | ||
| See Table S1 for all sgRNA sequences used in this study | This paper | Table S1 |
| See Table S7 for all primer sequences used in this study | This paper | Table S7 |
| Recombinant DNA | ||
| Barcoded pCRISPR3 plasmid backbones | This paper | AddGene: #191856-191861 |
| CRISPRi Libraries | This paper | Available upon request |
| Software and algorithms | ||
| Python version 3.9.12 | Python Software Foundation | RRID:SCR_008394 |
| Jupyter Notebook | Project Jupyter | RRID:SCR_018315 |
| CFX Maestro Software | Bio-Rad | https://www.bio-rad.com/en-us/category/qpcr-analysis-software |
| All custom code, qPCR data, growth rate data, and model fits | This paper | 10.5281/zenodo.10278952 |
| Other | ||
| CFX Opus 384 Real-Time PCR System | Bio-Rad | #12011452 |
| Synergy Neo2 Hybrid Multi Mode Reader | Agilent | #BTNEO2 |
We quantified single gene expression-growth rate relationships by considering the subset of sgRNA constructs consisting of one targeting and one non-targeting control sgRNA (Fig 2E–M). All genes showed monotonic relationships between repression strength and growth rate except gdhA and gltB, which were known to be nonessential in our growth condition. Notably, our approach to sgRNA library design reliably sampled the steep, intermediate growth regime of each gene, allowing us to precisely identify the knockdown level where each gene’s transcription became growth-limiting. We saw that the general shape of expression-growth rate curves was similar for biochemically related genes — for example dapA/dapB (Fig 2E–F) or purN/purL (Fig 2G–H), in line with previous findings8.
Our complete data set yielded a matrix of pairwise expression-growth rate landscapes (Fig 3, lower triangle). To quantify interactions between CRISPRi perturbations cooccurring within a cell, we quantified growth rate (g) epistasis between every pair of CRISPRi knockdowns using a multiplicative Bliss model, where epistasis between sgRNAs 1 and 2 is equal to g1,2 – g1 * g2 (Fig 3, upper triangle). For the majority of sgRNA pairs, we observed minimal epistasis. This is consistent with expectation given that: 1) many of the gene pairs are drawn from unrelated metabolic pathways and 2) many sgRNAs yielded only modest changes in gene expression. Across the library, we observed both positive and negative epistasis, particularly for sgRNAs with the strongest effects on gene expression. Our data also recapitulated the synthetic lethality previously reported for gdhA and gltB as negative epistasis43: we observed minimal growth rate defects when these genes were repressed individually, but severe growth rate defects when both genes were repressed together. However, even for gene pairs where epistasis was prevalent, the intensity of epistasis varied drastically with expression. These results motivated identifying a single measure of inter-gene interaction that could summarize the many disparate epistasis values collected across expression levels.
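The multiplicative Bliss calculation used above is simple enough to state directly. The following sketch uses made-up rescaled growth rates (1 for control-like growth, 0 for the slowest observed growth); the function name is illustrative.

```python
def bliss_epistasis(g_pair, g_single_1, g_single_2):
    """Observed pairwise growth rate minus the multiplicative (Bliss) expectation."""
    return g_pair - g_single_1 * g_single_2

# Hypothetical synthetic-lethal-like pair: each single knockdown is nearly
# neutral, but the double knockdown grows poorly (negative epistasis).
negative = bliss_epistasis(g_pair=0.10, g_single_1=0.95, g_single_2=0.90)

# Hypothetical buffering pair: the double knockdown grows better than the
# product of the single effects would suggest (positive epistasis).
positive = bliss_epistasis(g_pair=0.60, g_single_1=0.70, g_single_2=0.50)

print(round(negative, 3), round(positive, 3))
```

A value near zero means the two perturbations behave as if independent under the multiplicative expectation.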
Figure 3. Pairwise CRISPRi growth rate and epistasis measurements.

Each column and row represents a unique sgRNA perturbation. Gene names denote groups of sgRNAs targeting a given gene, and sgRNAs are sorted within each group by increasing CRISPRi repression strength (top-to-bottom and left-to-right). Nont indicates the non-targeting control sgRNA. The lower triangle of the matrix describes pairwise growth rate measurements calculated relative to the non-targeting control across 14 hours, averaged across n ≥ 4 experimental replicates, where wildtype-like (WT) growth is equal to 1. The upper triangle of the matrix describes pairwise growth rate epistasis. Epistasis was calculated as the difference between the observed growth rate and the multiplicative (Bliss) growth rate expectation of pairwise sgRNA perturbations (n ≥ 4).
A continuous, coupling-sensitive model of expression-growth rate relationships
To construct a continuous, predictive description of expression-growth rate relationships, we loosely followed the approach introduced for combinatorial drug interactions in Zimmer et al., 201629. In this approach, the authors fit dose-response curves relating drug concentration to growth rate. Interactions between drugs were treated as a change in the “effective dose” of one drug in the presence of the other. These interactions were summarized with two (potentially asymmetric) coupling constants: the effect of drug “A” on “B”, and the effect of drug “B” on “A”. The authors found that a model constructed using only these pairwise coupling terms could predict the effects of higher-order combinations of drugs. This model was unit-agnostic, meaning the framework could be trained on any quantitative input, not just drug treatment. We hypothesized that this model could be extended to make predictions of growth rate given variation in gene expression or nutrient abundance. Here, we sought to test if such a simple, mechanism-free model could be used to describe our data and predict the effect of combinatorial genetic perturbations.
Given that the number of measurements necessary to quantify single perturbation-phenotype relationships scales linearly with the number of perturbations, we used well-constrained single perturbation functions as the backbone of our modeling strategy. In analogy to Zimmer et al., we fit sigmoidal “dose-response” curves relating gene repression efficiency (R) to growth rate (g) using two parameters, steepness (n) and repression efficiency at half-maximal growth rate (R0) (Equation 1). Here, R = 0 represents unperturbed gene expression and R = 1 is a complete knockdown. Similarly, a relative growth rate of 1 is the bacterial growth rate in the absence of knockdown and 0 approximates no growth.
| (1) |
These sigmoidal fits provided a continuous description of the effects of gene repression on growth rate (Fig 2E–M, Table S2). We used bootstrap resampling of the data to estimate 95% confidence intervals (CIs) for each parameter of the sigmoidal fit (Table S2, STAR Methods). These fit parameters were well constrained for all genes except gdhA and gltB, two nonessential genes without serious growth rate defects. Although these two expression-growth rate functions can be fit by a wide range of n and R0 values, we will show that our model’s performance is insensitive to this variation and thus robust for essential and nonessential genes. Assuming independence between CRISPRi perturbations, we next constructed a Null model for pairwise knockdowns by multiplicatively combining the growth rate effects of two single gene knockdown functions (Equation 2).
| $g_{i,j}(R_i, R_j) = g_i(R_i) \cdot g_j(R_j)$ | (2) |
This initial model enabled continuous prediction of growth rate following pairwise gene knockdown but did not account for the possibility of gene-gene epistasis. To introduce epistasis, we fit coupling constants aij and aji to modulate the effective repression of a given gene based on the intensity of a secondary perturbation and vice versa, again as in Zimmer et al., 201629 (Equation 3). These effective repression values are then used in place of true repression values in equations 1 & 2.
| (3) |
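To make the structure of the Null and coupling-sensitive models concrete, the sketch below combines two single-gene curves multiplicatively and then substitutes effective repression values. The functional forms for the single-gene curve (a logistic in R with midpoint R0) and for the coupling term (R divided by one plus the coupling constant times the partner's repression) are assumptions chosen for illustration and are not taken from Equations 1 and 3. The parameter values and the indexing convention (a_ij as the effect of gene j on gene i) are likewise hypothetical.

```python
import math

def growth(R, n, R0):
    """Assumed logistic dose-response: equals 0.5 at R == R0 and falls as R rises."""
    return 1.0 / (1.0 + math.exp(n * (R - R0)))

def null_growth(Ri, Rj, params_i, params_j):
    """Null model: independent knockdowns combine multiplicatively."""
    return growth(Ri, *params_i) * growth(Rj, *params_j)

def effective_repression(R_self, R_other, a):
    """Assumed coupling form: a > 0 damps and a < 0 amplifies effective repression."""
    denom = 1.0 + a * R_other
    if denom <= 0:
        raise ValueError("coupling too negative for this illustrative form")
    return R_self / denom

def coupled_growth(Ri, Rj, params_i, params_j, a_ij, a_ji):
    """Evaluate the Null model at effective rather than measured repression."""
    Ri_eff = effective_repression(Ri, Rj, a_ij)
    Rj_eff = effective_repression(Rj, Ri, a_ji)
    return null_growth(Ri_eff, Rj_eff, params_i, params_j)

# Hypothetical (n, R0) parameters for two genes and a strong double knockdown.
p_i, p_j = (10.0, 0.7), (10.0, 0.7)
g_null = null_growth(0.8, 0.8, p_i, p_j)
g_pos = coupled_growth(0.8, 0.8, p_i, p_j, a_ij=0.3, a_ji=0.3)
g_neg = coupled_growth(0.8, 0.8, p_i, p_j, a_ij=-0.3, a_ji=-0.3)
print(round(g_neg, 4), round(g_null, 4), round(g_pos, 4))
```

Under these assumptions, positive couplings raise the predicted double-knockdown growth rate above the Null expectation and negative couplings lower it, and setting both couplings to zero recovers the Null model exactly.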
Based on the assumption of sparse gene-gene coupling, we introduced a regularization term to penalize the absolute values of aij and aji (STAR Methods). Using mild regularization, we reduced the spread of coupling constant values while retaining our model’s predictive power (Fig S2, Table S3).
The fit coupling constants acted to summarize inter-gene epistasis and, in some cases, captured asymmetric genetic interactions. For example, for gene pair dapB/purN (Fig 4A), we observed that pairwise knockdowns (lower right pixels) had less severe growth rate defects than would be expected from single knockdown measurements alone (upper right and lower left). Our continuous epistasis model captured this interaction with a pair of positive gene-gene coupling terms (Fig 4B, aij = 0.100, 95% CI: [0.066, 0.161], aji = 0.222, [0.078, 0.303]), while the Null model erroneously underestimated growth (Fig 4C). Notably, the couplings between dapB and purN are asymmetric: in this instance, the asymmetry reflects the fact that purN repression more strongly modulates the effect of dapB repression than the converse. In contrast, the synthetic lethal gdhA and gltB show negative epistasis, where the double knockdown phenotype was more severe than expected (Fig 4D). In our continuous epistasis model, this was captured with one negative gene-gene coupling term and one zero coupling term (Fig 4E, aij = 0, [−1.06, 0], aji = −0.425, [−1.19, 0]), but this phenotype was again missed by the Null model (Fig 4F). In this case, the apparent coupling directionality for gdhA/gltB is the result of our regularization strategy and, as the confidence intervals indicate, either coupling constant can be negative to recapitulate synthetic lethality. The gene pair dapA/dapB shows positive coupling with near-symmetrical coupling constants, which we infer is because their gene products catalyze consecutive metabolic reactions (aij = 0.361, [0.237, 0.449], aji = 0.327, [0.267, 0.387], Fig S3A–C). Metabolic flux is limited by only a single knockdown, and the effect of an additional knockdown is dampened by this first flux restriction.
However, we see almost no interaction between purN and purL, even though the products of these genes are also consecutive (Fig S3D–F, aij = 0.053, [0.034, 0.095], aji = 0.009, [0.002, 0.016]). Thus, the continuous epistasis model summarizes gene-gene coupling across entire pairwise expression-growth rate landscapes while providing insight that metabolic architecture alone may miss.
Figure 4. The continuous epistasis model captures gene-gene coupling at all expression levels.

(A) Pairwise expression-growth rate data following CRISPRi knockdown of both dapB and purN, which show positive coupling. Each row and column represents a unique sgRNA, and pixels represent the growth rate effect of a given sgRNA pair (n ≥ 4). Rows and columns are sorted by knockdown intensity, ranging from wildtype-like expression (top left) to maximal double-knockdown (bottom right). (B) Predicted expression-growth rate data utilizing the coupling-sensitive continuous epistasis model. (C) Predicted expression-growth rate data utilizing the coupling-insensitive Null model. (D-F) Same as A-C, but for the gdhA/gltB gene pair, which shows negative coupling and synthetic lethality. (G) The performance of the coupling-sensitive model compared to the Null model. Each data point represents model RMSD across all pairwise sgRNA combinations for a given gene pair. Full expression-growth rate data is shown for white, annotated dots in this figure and Fig S3 (e.g., the white dot labeled 4A-C represents dapB/purN). Error bars are standard deviations of models fit on n = 100 bootstrapped single perturbation-growth curves. Dotted gray line is y = x. (H) Coupling constants (aij and aji values) fit across all pairwise growth rate data, as calculated from the continuous epistasis model. (I) Bliss epistasis of growth rate following the strongest possible pairwise knockdown of each gene pair, calculated as the difference between the experimentally determined growth rate of the double knockdown and the product of each single-gene knockdown growth rate (n ≥ 4).
The resulting epistatic model effectively predicted growth rates following knockdown for every gene pair, as quantified by the root-mean-square deviation (RMSD) between the model’s predictions and experimentally determined growth rates (Fig 4G). Decreases in RMSD between the continuous epistasis and nonepistatic Null model (visualized as a point falling below y = x) highlight significant prediction improvements (epistatic model RMSD = 0.123, [0.122, 0.139], Null model RMSD = 0.158, [0.151, 0.173]). As our model includes additional parameters, this improvement was expected. To ensure that these accuracy gains warranted the additional complexity of fitting coupling constants, we computed the Akaike Information Criterion (AIC) for each model. This measure supported inclusion of these additional parameters (our model AIC = −15,716, [−15,785, −14,743], Null AIC = −13,958, [−14,299, −13,262]). We refer to this continuous, coupling-sensitive model as the “continuous epistasis” model.
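The comparison between the two models can be sketched as follows. The AIC expression used here is the standard least-squares form, n ln(RSS/n) + 2k, which assumes Gaussian residuals and drops constant terms; whether this matches the exact likelihood used in the analysis is an assumption. All growth rates and parameter counts below are made up for illustration.

```python
import math

def rmsd(predicted, observed):
    """Root-mean-square deviation between predicted and observed growth rates."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def aic_least_squares(predicted, observed, n_params):
    """Least-squares AIC: n * ln(RSS / n) + 2k, with constant terms dropped."""
    n = len(observed)
    rss = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return n * math.log(rss / n) + 2 * n_params

# Hypothetical observed growth rates and predictions for one gene pair.
observed = [1.00, 0.80, 0.50, 0.20]
null_pred = [0.98, 0.70, 0.30, 0.05]
epi_pred = [0.99, 0.78, 0.47, 0.22]

# Hypothetical parameter counts: two sigmoid parameters per gene for the Null
# model, plus two coupling constants for the coupling-sensitive model.
rmsd_null, rmsd_epi = rmsd(null_pred, observed), rmsd(epi_pred, observed)
aic_null = aic_least_squares(null_pred, observed, n_params=4)
aic_epi = aic_least_squares(epi_pred, observed, n_params=6)
print(round(rmsd_null, 3), round(rmsd_epi, 3), aic_epi < aic_null)
```

A lower AIC for the model with more parameters indicates that the gain in fit outweighs the penalty for the added coupling constants.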
Next, we compared the couplings learned by our approach (Fig 4H, aij and aji) to the Bliss epistasis between maximum knockdown-effect sgRNAs (Fig 4I). Multiplicative Bliss epistasis at the limit of strong knockdown mimics epistasis values calculated during a classical double-knockout study. We found that epistatic couplings fit using our model, which considers the entire pairwise expression-growth rate landscape, are less common than the couplings calculated by comparing end-point perturbations alone. Thus, our data suggest that severe knockdown measurements (which are subject to greater experimental noise) are enriched for false positives, while consistent epistasis that permeates the entirety of a pairwise expression landscape is much sparser.
Finally, we tested two alterations to our model. First, we tried an alternate fitting method with data requirements between the binary (strong-knockdown) approach above and our continuous model by rigorously fitting single gene expression-growth rate curves, but fitting coupling constants using only the most extreme double-knockdown measurement. This approach was prone to overfitting on the double-knockdown growth rate (regardless of regularization strength) and failed to recreate intermediate phenotypes or accurately quantify epistasis (Fig S3G, RMSD = 0.194, [0.174, 0.206]). Second, to test whether the asymmetries in epistatic coupling constants were indicative of true biological directionality, we swapped the aij and aji terms within each gene pair and reassessed model performance. This transposed model performed significantly worse than the model with correct coupling directionalities. However, it still outperformed our Null model as some epistatic coupling constants are truly symmetric (Fig S3H, RMSD = 0.137, [0.131, 0.154]).
The expression-growth rate model can be reliably inferred using a small number of measurements
A key strength of this modeling strategy is its ability to represent an entire pairwise expression-growth rate landscape using a short list of parameters (n and R0 for each of two genes, aij, and aji). However, if the pairwise landscape must be densely sampled to reliably fit these parameters, a predictive model will be of little additional use. To quantify the minimal sampling requirements to accurately fit coupling constants, we subsampled our pairwise expression data and refit the model with this smaller training set. We tested three subsampling patterns with the goal of informing future library design strategies: subsampling a fraction of sgRNA pairs from across our entire library (Fig S4A), subsampling a fraction of sgRNA pairs for each gene pair in our library (Fig S4B), and subsampling a fraction of sgRNAs from our initial oligo pool, then constructing all pairwise combinations of the remaining guides (Fig S4C). The first approach would be implemented by constructing a large pairwise library and then bottlenecking prior to growth rate selection and sequencing, while the last uses a smaller starting library. The second approach is a middle ground that may be more difficult to realize experimentally. Using each approach, we randomly split our pairwise expression-growth rate data into a training and test set, fit epistatic coupling constants for each gene pair using the training data, and evaluated the model’s growth rate predictions on the test data over 100 unique subsampling iterations. By varying the relative sizes of our training and test sets, we determined that a model fit with 20% of our original input data when subsampling using either of the first two strategies yielded a median RMSD increase of ≤ 5% relative to a model fit with the full dataset (Fig S4D–E). The third approach (subsampling sgRNAs from the starting library) was robust to subsampling down to 40% of the initial library size (Fig S4F).
In all cases, subsampling down to 5% of the library (2–4 perturbations per gene pair) gave model performance comparable to the Null model, supporting the finding from Fig S3G that extreme subsampling (regardless of the specific points sampled) failed to capture pairwise landscapes. Given these data, we recommend bottlenecking a diverse CRISPRi library to sample an average of 10–20 perturbations per gene pair as a simple but robust strategy to capture epistatic landscapes (Fig S5, RMSD = 0.128, [0.125, 0.144]).
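Two of the subsampling patterns described above can be sketched in a few lines: drawing a random fraction of sgRNA pairs for training, and drawing a fraction of sgRNAs before forming all pairs among them. The guide names, library size, random seed, and function names are hypothetical, and the model-fitting step itself is omitted.

```python
import random

def split_pairs(pair_ids, train_fraction, rng):
    """Randomly assign sgRNA pairs to a training set and a held-out test set."""
    shuffled = list(pair_ids)
    rng.shuffle(shuffled)
    n_train = max(1, round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

def subsample_guides(guides, keep_fraction, rng):
    """Keep a fraction of sgRNAs, then form all pairwise combinations among them."""
    n_keep = max(2, round(keep_fraction * len(guides)))
    kept = rng.sample(list(guides), n_keep)
    return [(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]]

rng = random.Random(0)  # fixed seed so the example is reproducible

# Hypothetical library of ten guides and all 45 unordered pairs among them.
guides = [f"sg{i}" for i in range(10)]
all_pairs = [(a, b) for i, a in enumerate(guides) for b in guides[i + 1:]]

# Pattern 1: train on 20% of the pairs and evaluate on the remainder.
train, test = split_pairs(all_pairs, 0.2, rng)

# Pattern 3: keep 40% of the guides, then build every pair from those guides.
sub_pairs = subsample_guides(guides, 0.4, rng)
print(len(train), len(test), len(sub_pairs))
```

Because the number of pairs grows with the square of the number of guides, keeping 40% of the guides in this toy library retains only 6 of the 45 pairs, which is one reason the guide-level pattern tolerates less aggressive subsampling than the pair-level patterns.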
Sparse sampling of pairwise expression landscapes enabled pathway-level epistatic studies
Next, we applied a sparse-sampling strategy to map epistasis and predict the effects of pairwise gene knockdown across four cellular pathways. The goals of this experiment were to (1) further test the sparse sampling approach and (2) inspect the connection between coupling magnitude, asymmetry, and annotated functional relationships across a larger set of genes. To do this, we constructed a pairwise CRISPRi library targeting 19 essential genes (171 gene pairs) sampled from glycolysis, diaminopimelate biosynthesis, purine biosynthesis, and DNA replication (Fig 5A–D, Table S1). We included genes from DNA replication to evaluate the capacity of our approach to generalize to non-metabolic genes and encompass several physical interactions. As before, no targeted genes were upstream of another essential gene in an operon. We designed 4–5 sgRNAs for each gene (79 sgRNAs total), resulting in ~20 knockdowns per gene pair (akin to a 15–20% subsampling of our previous assay), totaling 3,123 pairwise knockdowns. Inter-replicate correlations were consistent with our previous assays (Fig S6A, r2 = 0.52–0.53), sgRNA order did not systematically affect growth rate (Fig S6A, r2 = 0.74), and CRISPRi perturbations explored in both our initial screen and this experiment correlated well (Fig S6B, r2 = 0.72).
Figure 5. Continuous epistatic analysis across cellular pathways connects pathway architecture, sensitivity to knockdown, and protein conservation.

(A) Schematic of E. coli glycolysis. Colored boxes are metabolites, and protein names represent enzymes. Protein names in light gray are not targeted in this assay. Arrows represent epistatic couplings, with thicknesses proportional to coupling strength and orientation indicative of coupling direction. (B-C) Same as A, but for diaminopimelate biosynthesis (B) and purine biosynthesis (C). (D) E. coli DNA replication machinery. Gray boxes indicate individual proteins or protein complexes. Protein names and arrows formatted as in A-C. (E) Single-gene expression-growth rate curves. Relative repression efficiency is estimated from a model based on sgRNA mismatches, and growth rate is quantified through next-generation sequencing. Sigmoidal fits (solid lines) and 95% CIs (shaded areas) are calculated as in Fig 2E–M. Error bars are SEM of growth rate measurements between n ≥ 4 measurements. Genes are sorted by epistatic cluster, as determined in Fig 5G. (F) Growth rate predictions for all data using the continuous epistasis (orange points) and Null model (gray points) compared to experimentally determined growth rates (n ≥ 4). Dotted line is y = x. (G) Coupling constants (aij and aji values) fit across all pairwise growth rate data, as calculated from the continuous epistasis model, as in Fig 4H. Genes are clustered hierarchically, and fall into three distinct groups. Dendrograms are colored by cluster; these colors map to the same colors in E and H. (H) Pairwise sequence identities for all genes in each cluster across all Enterobacteriaceae species, compared to the reference E. coli gene sequence. All distributions are significantly different from one another, as determined by Welch’s t-tests (Cluster 1–2 p = 10^−29, Cluster 1–3 p = 10^−111, Cluster 2–3 p = 10^−32).
In lieu of directly quantifying the repression efficiency of each sgRNA using qPCR, we estimated the repression efficiency of each mismatched sgRNA relative to its “parent” fully on-target sgRNA by fitting a sigmoidal model relating the number of mismatches in an sgRNA to relative repression efficiency (Fig S6C, calculated as: repression efficiency of the sgRNA / repression efficiency of the parent). This model, trained using all 96 sgRNAs from our initial study, is similar to (albeit simpler than) other approaches estimating CRISPRi strength, and is better suited to capture the effect of our compounding mutation library design (wherein mutations are systematically added to the sgRNA homology region) than prior models trained on single or double point mutations scattered throughout the sgRNA homology region8,37. Using this proxy, we fit single-gene expression-growth rate relationships for all 19 genes (Fig 5E, Table S2). We then trained our continuous epistasis model on pairwise knockdown data to obtain coupling parameters between all 171 gene pairs. The fit model effectively described our experimentally determined growth rates (Fig 5F, RMSD: 0.059, [0.063, 0.083]) and significantly outperformed the non-epistatic Null model (RMSD: 0.117, [0.096, 0.121]). While the shapes of the dose-response curves for genes present in both the sparsely and densely sampled epistatic studies (dapA, dapB, purL, and purN) were in general consistent, we observed some deviations. In particular, the dapA dose-response curve appeared steeper while the purN curve appeared shallower. Though much of the 95% CI for the sparsely sampled curves overlapped with the initial densely sampled fits, the limited sampling of single-gene perturbations and variance in repression approximation added uncertainty to our model. Nonetheless, the model’s prediction accuracy indicated that sparse sampling is sufficient to model growth rates for perturbations in both metabolic and non-metabolic genes.
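As a rough illustration of relating mismatch count to relative repression efficiency, the sketch below fits a two-parameter logistic curve by brute-force least squares. The functional form, the parameter names (`midpoint`, `steepness`), the grid ranges, and the synthetic data are all assumptions made for illustration; the model actually fit in Fig S6C may be parameterized differently.

```python
import math

def relative_repression(mismatches, midpoint, steepness):
    """Assumed logistic form: ~1 with no mismatches, decaying toward 0."""
    return 1.0 / (1.0 + math.exp(steepness * (mismatches - midpoint)))

def fit_sigmoid(data):
    """Grid-search least squares over midpoint (0-20) and steepness (0.1-5)."""
    best = None
    for mi in range(0, 201):
        midpoint = mi / 10
        for si in range(1, 51):
            steepness = si / 10
            sse = sum(
                (relative_repression(m, midpoint, steepness) - y) ** 2
                for m, y in data
            )
            if best is None or sse < best[0]:
                best = (sse, midpoint, steepness)
    return best[1], best[2]

# Synthetic, noise-free observations: (number of mismatches, efficiency
# relative to the fully on-target parent sgRNA). Not data from the study.
toy_data = [(m, relative_repression(m, 8.0, 0.6)) for m in range(0, 15, 2)]

fit_midpoint, fit_steepness = fit_sigmoid(toy_data)
```

A grid search is used here only to keep the example dependency-free; a nonlinear least-squares routine would be the more usual choice for real data.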
Next, we performed hierarchical clustering to inspect the pattern of couplings across all targeted genes (Fig 5G, Table S4). We obtained three gene clusters that roughly assort by pathway. In Cluster 1 (light blue dendrogram in Fig 5G), we observed strong, pervasive growth rate epistasis, primarily between glycolytic genes. This linear, central metabolic pathway showed positive (buffering) coupling between nearly all genes tested (barring fbaA, which is partially redundant with fbaB, though fbaB is primarily associated with gluconeogenesis47). Cluster 2 contained most purine biosynthetic genes (orange dendrogram); these genes showed weak epistasis with one another and moderate, asymmetrical epistasis with the glycolytic gene cluster. Finally, most DNA replication genes showed minimal epistasis and fell in Cluster 3 (purple dendrogram). Notable exceptions are parE and, to a lesser extent, dnaE, which were highly epistatic with the glycolytic gene cluster. The parE gene is uniquely induced by CreB, a transcription factor that regulates carbon metabolism, which may lead to the epistatic phenotype observed48. These patterns of epistasis are distinct from those observed when considering only the maximal knockdown effect (Fig S6D); the severe growth rate deficits induced by near-complete knockdown of essential genes resulted in noisy epistasis measurements and failed to even loosely recreate pathways. This limitation of epistasis calculations for large genetic perturbations may contribute to the common practice of reconstructing pathways using correlations between coupling profiles as opposed to learning interactions from the coupling values themselves42.
We then evaluated whether asymmetry in coupling magnitude between genes in a shared pathway was indicative of pathway architecture. Instead of assuming bidirectional coupling, we allowed only a single coupling direction to be nonzero, identifying the orientation and magnitude of epistasis with the strongest impact on predictive power (Fig 5A–D (arrows), Fig S6E). Gene pairs with no strong coupling (coupling magnitude < 0.1) were not connected. Within glycolysis, epistasis almost universally propagated “upwards”, such that knockdown of genes near the end of glycolysis strongly affected genes near the beginning of the pathway, but not vice versa (exception: TpiA and GapA, which lie at a “split” in carbon flux following an aldolase cleavage). Couplings in the other pathways were sparser and generally more symmetric; however, we found that much of the epistasis observed in DNA replication occurred between components of the DNA Pol III holoenzyme.
Epistatic clusters also reported on the sensitivity of growth rate to gene knockdown, i.e., coupling magnitude was related to the Hill coefficient for each gene’s expression-growth rate function, which has been reported in other studies of pharmacological coupling34 (Fig 5E). Consistent with the fact that growth is more sensitive to modest expression variation for genes in Cluster 1 than Cluster 3, we see a bias in sequence conservation (Fig 5H). As a metric of conservation, we computed the pairwise sequence identity between all Enterobacterales orthologs of a given gene and the E. coli sequence (STAR Methods). Cluster 1 is highly conserved and affects growth at even moderate knockdowns, Cluster 2 is intermediate, and Cluster 3 is least conserved and linked to growth only at extreme knockdown. This echoes the well-known result that essential genes are more strongly conserved49; however, here we note differences in conservation even amongst essential genes based on the sensitivity of the cell to partial gene knockdown. Thus, our approach identifies a hierarchy of relationships between gene activity and growth within the broad grouping of essential genes.
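For readers unfamiliar with the conservation metric, pairwise sequence identity can be sketched as below. This assumes the ortholog has already been aligned to the reference (gaps written as `-`) and counts identity over ungapped columns only; the alignment tool and the exact denominator used in the STAR Methods are not restated here, so both are assumptions.

```python
def percent_identity(reference, ortholog):
    """Percent identical residues over alignment columns without gaps."""
    if len(reference) != len(ortholog):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for ref_residue, orth_residue in zip(reference, ortholog):
        if ref_residue == "-" or orth_residue == "-":
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if ref_residue == orth_residue:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Toy aligned fragments (hypothetical, not real gene sequences).
identity = percent_identity("MKT-LV", "MKSALV")
```

Repeating this for every ortholog of every gene in a cluster yields the per-cluster identity distributions that are compared in Fig 5H.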
High-order predictions can be made using low-order training data
We next wanted to evaluate whether a model trained on only single and pairwise expression-growth rate data could predict the effects of higher-order CRISPRi perturbations. If so, this would greatly mitigate the combinatorial explosion problem. We constructed a third-order sgRNA library across all nine genes from the “densely sampled” dataset (Fig 2E–M), bottlenecked the library to a total size of 911 constructs, and quantified growth rates following these third-order CRISPRi perturbations across five barcoded replicates (Fig 6A). The library comprised 4 single, 76 pairwise, and 830 triple gene knockdowns and a non-targeting control (Fig 6B). Four additional non-targeting sgRNA controls were added to the library to ensure the presence of low-order sgRNA constructs in the final library. These alternative non-targeting sgRNAs phenocopy our original non-targeting control (Fig S7A), and replicate measurements were well-correlated (Fig S7B, r2 = 0.43–0.65). As each sequencing experiment measured growth rates relative to the bulk culture doubling time (which varies with library composition), we rescaled relative growth rates from this experiment such that our single and pairwise CRISPRi knockdowns were well-correlated to those from our densely sampled pairwise data set (Fig S7C, r2 = 0.80).
Figure 6. A model trained on pairwise measurements predicts growth rate following third-order gene knockdowns.

(A) Diagram depicting the design of the third-order sgRNA library. After generating a diverse three-sgRNA library with >10^6 possible constructs, we bottlenecked our library to a diversity of ~10^3 constructs. We then used inverse PCR to generate five copies of the library with identical sgRNA constructs but distinct barcodes (BC). (B) Distribution of targeting sgRNAs in the library. The library contained zero- through third-order targeting sgRNA combinations, with bars scaled to illustrate each order’s representation in the library. (C-D) Model performance on third-order sgRNA combinations, where experimentally determined growth rates are plotted against predicted growth rates for the (C) continuous epistasis and (D) Null model. Error bars are SEM across n ≥ 4 experimental replicates (horizontal) and n = 100 bootstrapped model fits (vertical). Data are colored based on the presence of pairwise positive or negative epistasis (orange and purple points, respectively) within a construct. For a construct to be considered epistatic, two sgRNAs must target genes with a magnitude of epistasis (aij or aji) > 0.15 with fewer than 9 sgRNA mismatches. Dotted gray line is y = x.
We used the coupling values fit from our previous pairwise data to predict growth rates following all third-order CRISPRi perturbations. These predictions outperformed the coupling-insensitive Null model (Fig 6C–D, RMSD = 0.174 vs RMSD = 0.234 for the epistatic and Null models respectively). Both models underestimated growth rate near wildtype, as these minor perturbations combined sub-additively. However, when considering moderate and severe growth rate defects (where ~75% of the data lie), the models’ predictions diverged. The continuous epistasis model recapitulated growth rates observed in our data, while the Null model continued underestimating growth. The accuracy of the Null model was also highly variable based on the presence or absence of epistasis between sgRNAs in a given construct, while the continuous epistasis model accounted for these interactions and performed comparably across all epistatic regimes.
We then compared our model’s predictions to two other commonly used methods of estimating high-order phenotypes from low-order data: the Isserlis model50 and a regression-based model51. In pharmacology, the Isserlis model is often used to predict the combined effects of multiple perturbations using only single- and pairwise perturbation data (Equation 4).
| (4) |
In this equation, growth rate following a third-order perturbation is predicted directly from low-order combinations of the same perturbations (e.g., , etc.). Regression-based models encompass a broader class of equations; here we considered a model used in machine-learning applications that is truncated to pairwise terms (Equation 5).
| (5) |
A strength of these models is their precision when predicting the combined effects of perturbations given the effects of all single and pairwise combinations at a defined perturbation dosage or strength. However, these models cannot generalize to predict the effects of unseen perturbation strengths, and both models have limited interpretability. Some regression-based models optimize epistatic coefficients that are conceptually similar to those discussed here, but the number of coefficients scales exponentially with both the number of perturbations and number of dosages explored, which makes identifying biologically relevant parameters a challenge51. We can address the first of these concerns — the discrete nature of these models — by utilizing their “smoothed” versions29. Briefly, the low-order perturbation effects used in the Isserlis and regression equations can be sampled from a continuous model as opposed to measured directly (STAR Methods). This enables these models to predict the combined effects of arbitrary perturbation strengths, even if these exact conditions weren’t experimentally observed. The smoothed Isserlis model performed comparably to our own, outperforming our model for growth rates near wildtype at the cost of increased uncertainty under slower growth conditions (Fig S7D, RMSD = 0.160). The smoothed regression-based model performed poorly due to its sensitivity to variance in low-order measurements (Fig S7E, RMSD = 0.238). Our low-parameter model outputs predictions on par or superior to these alternative approaches, inherently possesses the ability to extrapolate from pairwise data, and offers unique biological insight in the form of optimized coupling constants.
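To make the comparison concrete, the sketch below implements the third-order Isserlis formula in the form commonly used in the drug-combination literature, where g1, g2, g3 are relative growth rates under each single perturbation and g12, g13, g23 are those under each pair. The variable names are assumptions, and the notation in Equation 4 of this paper may differ. In the smoothed variant described above, the six inputs would be sampled from fitted continuous models rather than measured directly.

```python
def isserlis_third_order(g1, g2, g3, g12, g13, g23):
    """Third-order growth rate predicted from single and pairwise effects."""
    return g1 * g23 + g2 * g13 + g3 * g12 - 2.0 * g1 * g2 * g3

# Hypothetical single-perturbation relative growth rates.
g1, g2, g3 = 0.9, 0.8, 0.7

# With no pairwise interaction (each pair equals the product of its singles),
# the formula collapses to the simple product g1 * g2 * g3.
independent = isserlis_third_order(g1, g2, g3, g1 * g2, g1 * g3, g2 * g3)

# A buffering interaction between perturbations 1 and 2 (the pair grows
# faster than the product of singles) raises the third-order prediction.
buffered = isserlis_third_order(g1, g2, g3, 0.80, g1 * g3, g2 * g3)
```

Because the formula consumes pairwise measurements at one specific dosage, it has no parameters to carry over to unseen perturbation strengths, which is the limitation the smoothed versions are meant to address.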
The continuous epistasis model captured gene-by-environment interactions
Importantly, the continuous epistasis model is unit-agnostic: because the model considers perturbation intensities on a relative scale, couplings need not be limited to gene-gene epistasis, but might more generally describe interactions between genes, environments, and other continuous perturbations. To assess the ability of the model to capture gene-by-environment interactions, we used mismatch CRISPRi to repress transcription of the folate metabolic genes folA and/or thyA while supplementing culture growth media with two products of folate metabolism: thymidine and/or methionine (Fig 7A, red text). Previous studies demonstrated that these perturbations have higher-order growth rate interactions52–54. In our own prior work, we observed that folA and thyA form an environmentally-dependent synthetic rescue: growth rate defects due to loss of function mutations to DHFR (the protein encoded by folA) are rescued by loss of function mutations to TYMS (the protein encoded by thyA) in the presence (but not the absence) of thymidine52. Moreover, the balance between thymidine starvation and amino acid limitation influences whether DHFR inhibition by the antibiotic trimethoprim results in cell death53,54. Thus, these combined genetic and environmental perturbations serve as a non-trivial test of the model’s ability to predict high-order phenotypes given low-order data.
Figure 7. The continuous epistasis model captures gene-by-environment interactions.

(A) Simplified schematic of a portion of E. coli folate metabolism. Gene names are listed in italics under each arrow, and enzyme abbreviations are listed in bold, black text above each arrow. Red text indicates a gene or metabolite perturbed over the course of the experiment. (B) Gene-by-gene-by-environment-by-environment relative growth rates, normalized to the growth rate of unsupplemented E. coli harboring a non-targeting control sgRNA (n ≥ 4). Within each heatmap, rows represent distinct folA-targeting sgRNAs, and columns represent distinct thyA-targeting sgRNAs. Thymidine and methionine supplementation varies between heat maps, and gray boxes represent pairwise media combinations not explored in this study or individual measurements with insufficient replicates to fit growth rates. (C-D) Growth rate predictions for all data generated using the continuous epistasis (C) and Null model (D). Error bars are SEM across n ≥ 4 experimental replicates (horizontal) and n = 100 bootstrapped model fits (vertical). Dotted line is y = x. The combined effects of thymidine supplementation, folA knockdown, and thyA knockdown (thy-folA-thyA) result in a third-order interaction (orange points), while all others lack this third-order epistasis (blue points). (E) Expression-growth rate data (n ≥ 4) and predictions following pairwise folA/thyA knockdown in unsupplemented media (top row) and media supplemented with 50 ng/μL thymidine (bottom row). (F) Continuous epistasis model performance, split by perturbation combination. For each second- and higher-order combination of expression and environmental perturbations (y-axis labels), the RMSD of growth rate measurements for the continuous epistasis model (red) and Null model (blue) are indicated. Violin plots represent model performance over 100 bootstrapped replicates.
To construct our gene-by-environment model, we first measured: (1) the effect of media supplementation on bacterial growth rate in the absence of expression perturbations and (2) the growth effect of expression perturbations in the absence of media supplementation. For wildtype E. coli harboring a non-targeting sgRNA control, the addition of either methionine or thymidine slightly accelerated growth (Fig S8A–B, Table S5). These data were modeled with a 4-parameter sigmoid (using additional parameters gmin and gmax to represent the minimum and maximum growth rate effects). We also refit expression-growth rate functions for folA and thyA using expression-growth rate data gathered in this experiment and bootstrapped all single perturbation-growth functions to estimate uncertainty in model performance and coupling strengths (Fig S8C–D, Table S5). To investigate gene-by-environment interactions within folate metabolism, we created a compact mismatch CRISPRi library containing 110 pairwise sgRNA combinations targeting folA and thyA. We transformed this library into our CRISPRi strain and grew the transformed population in a turbidostat for parallel control of multiple supplemented media conditions45,46. We again quantified relative growth rates following CRISPRi by next-generation sequencing and rescaled these measurements such that the growth rate of E. coli harboring the non-targeting CRISPRi control in unsupplemented media was equal to one. We then identified and removed escapers (125 measurements, 0.9% of data) and averaged across the remaining growth rate replicates, yielding 2,394 unique growth rate measurements across 22 growth conditions (Fig 7B). Due in part to increased sequencing depth, barcoded replicates were extremely well-correlated (Fig S8E–G, r2 = 0.94–0.95). We observed a “stripe” of slightly elevated growth in the last row of our pairwise expression-growth rate matrices. 
This apparent nonmonotonicity likely arises because matrix rows are ordered by sgRNA repression strength, and the strongest knockdowns (bottommost rows) are nearly indistinguishable by qPCR and thus may be misordered. This sensitivity highlights the importance of bootstrapping when fitting single perturbation-growth functions, as we know the model’s overall performance is generally insensitive to these variations at maximal knockdown. Taken together, these experiments generated a rich dataset to train and evaluate the continuous epistasis model on gene-by-environment interactions.
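The rescaling and replicate-averaging steps described above might look like the following. This is a sketch under stated assumptions: rescaling is taken to be division by the mean growth rate of the non-targeting control in unsupplemented media, the minimum of four replicates mirrors the n ≥ 4 reported in the figure legends, and escaper removal is omitted because its criteria are not restated in this section. All keys and values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def normalize_and_average(replicates, control_key, min_replicates=4):
    """Scale growth rates so the control equals one, then report mean and SEM."""
    control_rate = mean(replicates[control_key])
    summary = {}
    for key, values in replicates.items():
        if len(values) < min_replicates:
            continue  # too few replicates to report (assumed threshold)
        scaled = [value / control_rate for value in values]
        sem = stdev(scaled) / sqrt(len(scaled))
        summary[key] = (mean(scaled), sem)
    return summary

# Hypothetical raw growth rates keyed by (condition, sgRNA construct).
raw = {
    ("unsupplemented", "non-targeting"): [0.5, 0.5, 0.5, 0.5],
    ("unsupplemented", "folA_sg1"): [0.25, 0.25, 0.25, 0.25],
    ("thymidine", "thyA_sg3"): [0.4, 0.45, 0.5],  # only 3 replicates
}
summary = normalize_and_average(raw, ("unsupplemented", "non-targeting"))
```

Anchoring every condition to the same unsupplemented control keeps growth rates comparable across media, so that a supplement's own effect appears as a value different from one rather than being normalized away.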
We isolated our pairwise perturbation data and fit couplings between perturbations as previously described (Table S6). We found that our model effectively recapitulated growth rates for this independent data set, with an epistatic model RMSD of 0.145, [0.144, 0.176] and a Null RMSD of 0.273, [0.260, 0.293]. As with our gene-gene model, subsampling as little as 20% of our gene-by-environment data was sufficient to reliably fit the continuous epistasis model (RMSD = 0.151, [0.147, 0.194]). We performed subsampling using the same regularization strength used in our pairwise study, supporting the generalizability of this mild regularization. Thus, we see that our unit-agnostic model can capture not only gene-gene interactions, but also gene-by-environment and environment-by-environment interactions.
Next, we evaluated the ability of our model, trained exclusively on first- and second-order perturbations, to predict the growth rate effect of 2,035 third- and fourth-order perturbations. The continuous epistasis model outperformed the Null model on these higher-order predictions: including pairwise couplings yielded a prediction RMSD of 0.275, [0.252, 0.351], while Null model predictions had an RMSD of 0.584, [0.555, 0.612]. High-order predictions remained robust when subsampling pairwise training data (RMSD = 0.300, [0.237, 0.412]). Combining all perturbations, the continuous epistasis model significantly outperformed the Null model on gene-by-environment predictions in terms of RMSD (Fig 7C–D, continuous epistasis: 0.260, [0.239, 0.330], Null: 0.547, [0.521, 0.575]) and AIC (continuous epistasis: −6,416, [−6,817, −5,278], Null: −2,868, [−3,108, −2,637]). We again compared our model to the smoothed versions of the Isserlis and regression models (Equations 9 and 10, respectively, STAR Methods), and all models performed similarly (Isserlis RMSD = 0.300, regression RMSD = 0.254). However, only the continuous epistasis model yielded summarized coupling values between genes and environments in addition to growth rate predictions.
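The two comparison metrics can be sketched as follows. RMSD is computed in the usual way; for AIC, the common least-squares form n·ln(RSS/n) + 2k is assumed, since the exact expression used in this study is not restated in this section. The data and parameter counts below are hypothetical.

```python
import math

def rmsd(observed, predicted):
    """Root-mean-square deviation between measured and predicted growth rates."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def aic_least_squares(observed, predicted, n_params):
    """AIC under Gaussian errors (assumed form): n * ln(RSS / n) + 2k."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params

# Hypothetical measurements and two sets of model predictions.
observed = [1.00, 0.80, 0.55, 0.30, 0.10]
coupled_prediction = [0.98, 0.83, 0.50, 0.33, 0.12]
null_prediction = [0.98, 0.70, 0.40, 0.15, 0.02]

rmsd_coupled = rmsd(observed, coupled_prediction)
rmsd_null = rmsd(observed, null_prediction)
aic_coupled = aic_least_squares(observed, coupled_prediction, n_params=2)
aic_null = aic_least_squares(observed, null_prediction, n_params=0)
```

AIC penalizes the coupled model for its additional coupling parameters, so a lower AIC indicates that the improvement in fit outweighs the added complexity.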
We can visualize why the continuous epistasis model outperformed the Null model using select environmental case-studies (Fig 7E). The top left matrix (red outline) contains growth rate measurements following CRISPRi knockdown in unsupplemented M9-glucose media, and as expected, knockdown of either gene is deleterious to growth. Both the continuous epistasis and Null models accurately recapitulated these experimental data, as there is not strong pairwise epistasis between folA and thyA in this environment. However, upon addition of thymidine into the media (bottom left, green outline), the models’ predictions diverged. The addition of thymidine to the media decouples thyA expression from growth rate (thyA knockdown no longer caused a growth defect in the first few rows of this matrix). This phenotype was captured by the continuous epistasis model, but the Null model’s predictions did not change following nutrient supplementation except to uniformly increase growth rate in the richer media condition. We also note that a phenotype present in our experimental data is missing from both the continuous epistasis and Null models: when thymidine is present in the media and folA is strongly repressed, thyA repression actually benefits growth. As mentioned above, this third-order gene-by-gene-by-environment interaction was expected and was not captured by our pairwise model (Fig 7C–D, orange points). This highlights the need for careful evaluation of model performance, as a systematic failure in a pairwise model may be indicative of biologically interesting high-order interactions. Although this interaction can be identified empirically as a systematic deviation from our model’s predictions, other instances of high-order epistasis may be less apparent. 
While not explored in this study, one could develop a top-model trained on the residuals of our pairwise model’s high-order predictions to identify and account for these additional, potentially sparse epistatic terms (see Kuzmin et al., 201855 for a survey of third-order epistatic prevalence in yeast).
Finally, we separated our data by genetic and environmental perturbation types and plotted each model’s error separately for each perturbation subset (Fig 7F). The continuous epistasis model outperformed the Null model the most when thyA was repressed and thymidine was added to the media. As noted previously, thymidine replaces the key metabolite made by the thyA gene product, rescuing growth and resulting in a positive pairwise coupling (aij = 0, [0, 0], aji = 0.73, [0.20, 1.29]). This asymmetric coupling is indicative of a hierarchy between thymidine and thyA expression, where the addition of thymidine to growth media fully decouples growth rate from thyA expression but not vice versa. Thus, the continuous epistasis model can be applied to identify directional gene-by-environment and environment-by-environment interactions.
Discussion
Our work shows that a continuous, coupling-sensitive model trained on sparsely sampled growth rate data can predict the effect of new combinatorial perturbations (up to fourth order) over both genes and environment. The model yielded interpretable parameters that described: (1) the sensitivity of growth rate to expression changes and (2) the pattern of hierarchical coupling within and between genes. Based on these findings, we propose a general hybrid experimental/modeling approach to construct fitness landscapes across genotypes and environments. The first step is to precisely quantify the growth rate effect of individual perturbations (genes or environments) at varied intensities to construct dose-response curves. Because this step scales linearly with the number of genes (or perturbations), it is feasible to densely sample these curves (with 7–10 measurements apiece) to enable accurate estimation of the Hill coefficient and R0. The second step is to sparsely sample the growth rate effect of pairwise perturbations (exact sampling density may vary; this study found that tens of measurements per combination suffice). This can be achieved by bottlenecking or subsampling a pairwise library constructed from all possible combinations of the single gene titrating CRISPRi library. Finally, one uses these data to fit a continuous epistasis model that defines the pairwise perturbation landscape and enables the prediction of growth rate effects for unmeasured pairwise perturbations and high-order combinations. For reference, if one wished to perform an experiment with the recommended sampling density (20 measurements per pair), replicates (6), and similar sequencing depth (≥70x coverage) to our present study, but for 1,000 genes of interest (totaling ~1 million epistatic interactions and half a million pairwise landscapes), the sequencing data required would fit on a single NovaSeq X 25B flow cell. 
This scale makes it cost-feasible to map interactions across the entirety of bacterial metabolism.
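The scale estimate above can be checked with a short back-of-the-envelope calculation. The gene count, sampling density, replicate number, and coverage come from the text; the nominal flow cell capacity of 25 billion reads is an assumption inferred from the flow cell's name (consult the manufacturer's specifications for actual output), as is the premise that each barcoded replicate needs the stated coverage at every sequenced timepoint.

```python
from math import comb

n_genes = 1000
perturbations_per_pair = 20   # recommended sampling density
replicates = 6                # barcoded replicates
coverage = 70                 # reads per construct per replicate

gene_pairs = comb(n_genes, 2)              # pairwise landscapes
couplings = 2 * gene_pairs                 # aij and aji for each pair
constructs = gene_pairs * perturbations_per_pair
reads_per_timepoint = constructs * replicates * coverage

# Assumed nominal capacity; not a figure stated in this paper.
ASSUMED_FLOW_CELL_READS = 25_000_000_000
timepoints_supported = ASSUMED_FLOW_CELL_READS // reads_per_timepoint

print(f"{gene_pairs:,} gene pairs, {couplings:,} couplings")
print(f"{reads_per_timepoint:,} reads per sequenced timepoint")
print(f"about {timepoints_supported} timepoints per flow cell")
```

Under these assumptions, roughly 4.2 billion reads are needed per sequenced timepoint, so the number of timepoints used to estimate growth rates is the main factor that determines whether a single flow cell suffices.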
The limiting step to expanding this workflow then becomes quantifying the repression strength of individual sgRNAs. RT-qPCR is relatively low-throughput, precluding genome-level studies. Using single-cell approaches to quantify CRISPRi repression such as microSPLiT56 (bacterial cells), PETRI-seq57 (bacterial cells) or perturb-seq58 (mammalian cells) alongside next-generation sequencing-based growth rate measurements would remove these throughput restrictions and permit much larger CRISPRi studies. Alternatively, one could utilize a predictive model relating sgRNA homology sequence to CRISPRi repression strength, although (as the uncertainty in our sgRNA repression model indicates) these tools may need to be tailored to a specific assay or sgRNA design strategy to replace experimental quantification8,37. We anticipate that future work will combine targeted RNAseq or single-cell measurements of sgRNA efficiencies with machine learning to train more effective models of sgRNA repression strength.
While the continuous epistasis model generally captured our data well, we did observe some instances of poor model fit, particularly for select gene and environmental perturbations with higher-order epistasis. Moreover, as with all machine-learning models, noise in the underlying training data can influence predictive capacity. For instance, differences in sequencing depth and experiment length affect growth rate measurement precision, effectively setting an accuracy limit on predictive models. Future work can reduce this experimental noise by sequencing at higher depth, using genomically integrated sgRNAs to reduce the potential for copy number variation15, and/or using a ligation-based (rather than PCR) strategy to add sequencing adaptors for NGS59. At the analysis level, a growing body of pharmacological models for predicting the effects of multidrug cocktails provides inspiration for further model development and refinement22,23,30,32–34. For example, the MuSyC approach allows for couplings at the level of both drug efficacy (the maximum effect achieved) and drug potency (the dose at which a significant effect is achieved, related to R0)33,34, while methods like SynBa strive to provide improved uncertainty estimates32. Prior work has identified a conceptual synergy between these pharmacological approaches and modeling the combinatorial effect of adjuvants on the immune response27; our data now show that this general class of models may naturally extend across genotype-phenotype-environment relationships.
Beyond the mechanism-free modeling presented here, Metabolic Control Analysis (MCA) offers another approach to constructing fitness landscapes in the specific context of metabolism60–62. MCA relates variation in enzyme velocity to flux using a series of hyperbolic Michaelis-Menten-like equations not dissimilar to the sigmoidal functional forms in our own work63. Flux is then quantitatively connected to fitness, yielding fitness landscapes that satisfyingly link microscopic catalytic parameters and enzyme abundances to macroscopic growth rates. The fitness landscapes typically feature a plateau (where enzyme activity is saturated) flanked by drop-offs as a function of enzyme activity (analogous to Fig 1B). We anticipate that closely inspecting the connection between MCA60–64, other detailed mechanistic models of fitness65, and our own mechanism-free approach will lead to improvements in both model functional form and interpretation. For example, a key difference between inhibiting an enzyme with a drug (as in pharmacologically inspired models) and reducing expression (as in our experiments) lies in the cost of protein synthesis. Recent work has extended MCA to account for the tradeoff between the benefit of increasing flux and cost of increased protein synthesis when increasing gene expression64,66,67. This intriguingly results in a non-monotonic relationship between enzyme velocity and growth, such that there is no longer a fitness plateau but a peak near some optimal activity. We observe some inklings of this in our own data, as modest levels of thyA repression actually seem to benefit growth (Fig 1F). More generally, including a non-monotonic “cost” component may become necessary to capture overexpression data or expression-growth rate relationships in rich environments. By generating further datasets across metabolic pathways, we hope to test alternate functional forms and the connection to mechanistic frameworks like MCA.
Our combined experimental and computational approach now provides a foundation for more deeply quantifying the connection between expression and growth rate across the cell. The reduced sampling requirements make pathway-level epistatic analyses feasible in single experiments. We anticipate that our method can be used to better understand the fundamental constraints on enzyme abundance and stoichiometry across pathways68,69, construct pharmacogenomic models relating gene expression to antibiotic sensitivity, and rationally optimize biosynthetic pathway yields or growth conditions70. We hope that interpretable, continuous, and coupling-sensitive models like the one described in this work can continue to improve computational predictions to extract more information from the data-rich -omics experiments we now regularly perform.
STAR Methods
Resource Availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Kimberly Reynolds (kimberly.reynolds@utsouthwestern.edu).
Materials availability
Plasmids generated in this study have been deposited to Addgene, #191856–191861. Additionally, all CRISPRi libraries are available upon request.
Data and code availability
All FASTQ files generated from next-generation sequencing experiments are available through the Sequence Read Archive. The accession number is listed in the key resources table. All qPCR and E. coli growth rate data from this work are available as formatted, machine-parseable Excel spreadsheets in our GitHub repository. The DOI is listed in the key resources table.
All original code has been deposited at GitHub and is publicly available as of the date of publication. The DOI is listed in the key resources table.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Experimental Model and Study Participant Details
Cell lines
All cloning and library preparation was performed in E. coli XL1-Blue cells (recA1 endA1 gyrA96 thi-1 hsdR17 supE44 relA1 lac [F´ proAB lacIq ZΔM15 Tn10 (Tetr)]). All CRISPRi experiments were performed in E. coli K12 MG1655 cells (F- λ- ilvG- rfb-50 rph-1 HK022 attB:dCas9). Culture conditions for each assay are listed under their respective subheading in the Method Details section.
Method Details
Pairwise CRISPRi Library Construction
We designed a pool of CRISPRi sgRNAs targeting nine E. coli K-12 MG1655 genes. For each gene, we selected 9–12 distinct sgRNAs to provide a titrated range of mRNA knockdown. We previously characterized CRISPRi knockdown of seven of these genes9 (dapA, dapB, folA, thyA, glyA, purN, and purL), giving us an initial list of sgRNAs for each gene with mutually distinguishable growth rate effects. We supplemented these sgRNAs with up to two fully on-target sgRNAs per gene and added sgRNAs by hand to populate underrepresented growth regimes. The final two genes (gdhA and gltB) were previously uncharacterized by our lab. We designed two on-target sgRNAs for each gene using previously published design principles35. In addition, we introduced 4, 6, 8, 10, and 14 mismatches into the homology region of each on-target sgRNA9. For a full list of sgRNA sequences, see Table S1.
All sgRNAs were synthesized as a mixed oligo pool by Integrated DNA Technologies. We resuspended the library to a final concentration of 50 nM in molecular biology-grade (MB) H2O and added BsaI flanking sites using PCR. We performed two separate PCRs to add two distinct sets of flanking sites onto the library for Golden Gate cloning. We used the GG_UnivF/GG_R1 primers to generate the insert 1 library and the GG_F2/GG_UnivR primers to generate the insert 2 library (for primer sequences, see Table S7). The PCR mix contained 10.75 μL MB H2O, 5 μL Q5 Reaction Buffer (NEB), 5 μL High GC Enhancer (NEB), 0.5 μL 10 mM dNTPs, 1.25 μL 10 μM forward primer, 1.25 μL 10 μM reverse primer, 0.25 μL Q5 High-Fidelity DNA Polymerase (NEB, #M0491), and 1 μL of the 50 nM oligo pool. Initial denaturation was carried out at 98°C for 30 seconds, followed by 25 cycles of 98°C for 30 seconds, 58°C for 10 seconds, and 72°C for 20 seconds. A final elongation was performed at 72°C for 60 seconds, followed by an indefinite hold at 4°C. We ran the PCR product on a 1% agarose 1X TAE gel with EtBr staining at 80 V for 45 minutes and excised the 176 bp band. We extracted the band using a DNA Clean & Concentrator-5 Kit (Zymo, #D4014), eluted in 30 μL MB H2O, and quantified dsDNA concentration using a PicoGreen assay (Thermo, #P7581). To ensure representation of the non-targeting sgRNA sequence (Nont) in the final library, we isolated this control sgRNA using the same protocol illustrated above, replacing the 50 nM oligo pool with 1 μL of purified 20–50 ng/μL plasmid containing the Nont sgRNA9 (pCRISPR2). We spiked Nont inserts into each of the mixed libraries at a 1:100 Nont:library ratio.
We performed three Golden Gate cloning reactions to ligate the CRISPRi inserts into three distinct barcoded CRISPRi vectors (pCRISPR3, AddGene accession numbers are listed in the key resources table). The reaction mix contained 75 ng of barcoded vector, 6 ng of insert 1 mix, 6 ng of insert 2 mix, 1.5 μL of 10X T4 DNA Ligase Buffer (NEB), 1 μL T4 DNA Ligase (2,000,000 U/mL, NEB, #M0202M), 1 μL BsaI-HF v2 (NEB, #R3733), and MB H2O to 15 μL. Reaction conditions were 40 cycles of 37°C for 2 minutes and 16°C for 3 minutes. These cycles were followed by 50°C for 10 minutes, 80°C for 20 minutes, and an indefinite 4°C hold. A no-insert control, where both inserts were left out of the reaction, was performed identically.
We cleaned and concentrated each reaction using a DNA Clean & Concentrator-5 Kit (Zymo, #D4014), eluting in 20 μL MB H2O. We transformed 2 μL of each purified library into 100 μL of XL1 Blue electrocompetent cells followed by a 1-hour recovery at 37°C with shaking. We plated 50 μL of a 1:10 and 1:100 dilution to quantify library coverage and added the remaining cells to 9 mL SOB + 35 μg/mL kanamycin. We incubated this culture overnight at 37°C with shaking. We also transformed and plated the no-insert control and a no DNA control and adjusted library coverage estimates accordingly. Library coverage ranged from 10–14x for each barcoded vector. In addition to these mixed CRISPRi libraries, we synthesized constructs containing two Nont sgRNAs in six distinct barcoded vectors using the same protocol described above, substituting each of the mixed insert libraries for 6 ng of purified Nont sgRNA insert.
We isolated plasmids using the GeneJET Plasmid Miniprep Kit (Thermo, #K0503), eluted DNA in 50 μL MB H2O, and quantified dsDNA concentration with a PicoGreen assay (Thermo, #P7581). We combined the three barcoded CRISPRi libraries in an equimolar ratio. We also mixed the Nont+Nont controls in an equimolar ratio, then spiked this control into the library at a 1:5,000 Nont+Nont:library ratio. We transformed 3 ng of this final mixture into 45 μL MG1655 + chromosomal dCas9 electrocompetent cells (gift from the Bikard lab) using the same transformation protocol as above, plating 50 μL of 1:100 and 1:1,000 dilutions. We performed a no DNA control with zero breakthrough colonies. Coverage exceeded 200x, and we grew the remaining cells in 9 mL SOB + 35 μg/mL kanamycin overnight at 37°C with shaking.
The pairwise pathway CRISPRi library and pairwise folA/thyA CRISPRi library were constructed as described above. For the pathway library, coverage after the first transformation event in the three barcoded vectors ranged from 6x to 10x. Library coverage after the second transformation event was 137x. For the folA/thyA library, coverage calculated after the first transformation event in the three barcoded vectors ranged from 14x to 89x. We mixed three barcoded Nont+Nont CRISPRi constructs in equimolar ratios and spiked them into the folA/thyA library at a 1:400 Nont+Nont:library ratio. As two continuous culture experiments were required to investigate all environments of interest, we freshly transformed the library two separate times. For each transformation, we added 1 ng of mixed library to 45 μL MG1655 + dCas9 electrocompetent cells and achieved 583x and 127x library coverage.
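The coverage figures quoted above follow from colony counts on dilution plates. The sketch below shows one way such an estimate could be computed. The function name, the recovery volume, the library size, and the choice to subtract background colonies from the no-insert control are all illustrative assumptions and are not taken from the protocol.

```python
def library_coverage(colonies, dilution_factor, plated_ul, recovery_ul,
                     library_size, background_colonies=0):
    """Estimate fold coverage of a pooled library from one dilution plate.

    All parameters are hypothetical illustration values, not protocol values.
    """
    # Assumption: background (e.g. no-insert control) colonies counted at the
    # same dilution are subtracted before scaling.
    net_colonies = max(colonies - background_colonies, 0)
    # Scale the plate count up to the whole recovery culture.
    transformants = net_colonies * dilution_factor * (recovery_ul / plated_ul)
    # Coverage is the number of transformants per unique construct.
    return transformants / library_size


# Hypothetical example: 120 colonies on the 1:100 plate, 50 uL plated out of a
# 1,000 uL recovery, 10 background colonies, 20,000 unique constructs.
coverage = library_coverage(120, 100, 50, 1000, 20_000, background_colonies=10)
print(f"Estimated coverage: {coverage:.1f}x")
```

Comparing the estimate from the two plated dilutions is a quick check that colony counts fall in a countable, roughly linear range.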
Continuous culture
Following transformation of the MG1655 dCas9 electrocompetent E. coli, cells were grown overnight in LB + 35 μg/mL kanamycin. We spun saturated outgrowth cultures at 1,000 rcf for 5 minutes, decanted media, and resuspended cells in 1 mL M9 minimal media pH 6.5, 0.4% glucose, 35 μg/mL kanamycin (M9+Kan). We repeated this wash, then resuspended 10 μL of concentrated cells in 8 mL M9+Kan. We incubated this culture for 12 hours at 37°C with shaking.
Following outgrowth, we spun the culture at 1,000 rcf for 5 minutes, decanted, and resuspended cells in 1 mL fresh M9+Kan. We repeated this wash, diluted the culture to optical density at 600 nm (OD600) = 0.05 in 15 mL M9+Kan, and grew these cells in a continuous culture device at 37°C, clamping the optical density at 0.15. Our continuous culture device follows the design specifications from Toprak et al., 2012 and 201345,46. Following an overnight adaptation, we added anhydrotetracycline (Cayman Chemical Company, #10009542) to a final concentration of 50 ng/mL to induce CRISPRi. We took the first timepoint (T0) after 3 hours of CRISPRi by extracting 1 mL of cells from the culture and took additional timepoints every 2 hours until T14. We centrifuged timepoints at 3,000 rcf for 5 minutes to pellet cells, decanted by pipetting, and stored them at −20°C.
We performed continuous culture for the subsequent CRISPRi experiments with slight modifications. We terminated these experiments after T10, as the final two timepoints were shown to be unnecessary for growth rate quantification. We inoculated the pathway CRISPRi and third-order CRISPRi libraries into the continuous culture device at a starting OD600 = 0.2. Media conditions were altered during the folA/thyA gene-by-environment experiment. Thymidine (Sigma, #T1895) and/or methionine (Sigma, #M5308) were supplemented into M9+Kan to create the following conditions: No additive, 0.05 ng/μL thymidine, 1 ng/μL thymidine, 2 ng/μL thymidine, 5 ng/μL thymidine, 10 ng/μL thymidine, 50 ng/μL thymidine, 0.01 mM methionine, 0.02 mM methionine, 0.05 mM methionine, 0.1 mM methionine, 0.3 mM methionine, 1 mM methionine, 0.05 ng/μL thymidine + 0.01 mM methionine, 0.05 ng/μL thymidine + 0.05 mM methionine, 0.05 ng/μL thymidine + 0.1 mM methionine, 1 ng/μL thymidine + 0.01 mM methionine, 1 ng/μL thymidine + 0.05 mM methionine, 1 ng/μL thymidine + 0.1 mM methionine, 2 ng/μL thymidine + 0.01 mM methionine, 2 ng/μL thymidine + 0.1 mM methionine, and 5 ng/μL thymidine + 0.01 mM methionine.
Sample preparation and next-generation sequencing
We resuspended each frozen cell pellet (each corresponding to a sample — one time point and environmental condition) in 100 μL MB H2O with vortexing and lysed them at 95°C for 3 minutes. Following a 10 minute, 20,000 rcf centrifugation to pellet debris, we took 50 μL of the supernatant as a working stock for use in downstream reactions.
We then added adaptors for Illumina sequencing in two consecutive PCR steps. First, we added flanking sequences to each timepoint or environmental sample. The reaction mix was composed of 10.5 μL MB H2O, 5 μL Q5 Reaction Buffer (NEB), 5 μL 50% glycerol, 0.5 μL 10 mM dNTPs, 1.25 μL 10 μM TruSeqF primer, 1.25 μL 10 μM TruSeqR_trunc primer, 0.5 μL Q5 High-Fidelity DNA Polymerase (NEB, #M0491), and 1 μL timepoint working stock. Reaction conditions were 98°C for 30 seconds, 7 cycles of 98°C for 30 seconds, 61°C for 10 seconds, and 72°C for 15 seconds, followed by a final elongation at 72°C for 45 seconds and an indefinite hold at 4°C.
In the second PCR, we added Illumina sequencing adapters and i5/i7 barcoding sequences to each sample. The reaction mix contained 10.5 μL MB H2O, 5 μL Q5 Reaction Buffer (NEB), 5 μL 50% glycerol, 0.5 μL 10 mM dNTPs, 1.25 μL 10 μM i5 primer, 1.25 μL 10 μM i7 primer, 0.5 μL Q5 High-Fidelity DNA Polymerase (NEB, #M0491), and 1 μL template from the previous reaction. Reaction conditions were 98°C for 30 seconds, 20 cycles of 98°C for 30 seconds, 61°C for 10 seconds, and 72°C for 20 seconds, followed by a final elongation at 72°C for 60 seconds and an indefinite hold at 4°C.
Note: While glycerol was used in the preparation of timepoints for the pairwise CRISPRi library experiment and the folA/thyA sublibrary experiment, we have since observed amplification reactions that failed in the presence of glycerol.
We quantified dsDNA concentration from each reaction using a PicoGreen assay (Thermo, #P7581), and mixed timepoints in an equimolar ratio to create the mixed sequencing library. We ran the library on a 1% agarose 1X TAE gel with EtBr staining at 90 V for 60 minutes and excised the 486 bp construct. We performed gel extraction using the DNA Clean & Concentrator-5 Kit (Zymo, #D4014) and eluted in 20 μL MB H2O. Following a Qubit Assay (Thermo, #Q32851) to determine the final dsDNA concentration, we submitted the library to Azenta for Illumina HiSeq sequencing on a 300-cycle paired-end run.
Isolation of single CRISPRi knockdown strains
To quantify the concentration of mRNA following each individual CRISPRi perturbation, we isolated plasmids expressing a single targeting sgRNA paired with the Nont sgRNA. For this purpose, we constructed a CRISPRi:Nont library using Golden Gate cloning as described previously, adding the Nont sgRNA insert in place of the mixed library in the second sgRNA position. We screened individual colonies from this mixed library to isolate the majority of CRISPRi:Nont constructs.
We individually synthesized plasmids not isolated during this screen using inverse PCR (iPCR). First, we phosphorylated iPCR primers with a PNK reaction. The reaction mix consisted of 6.5 μL MB H2O, 1 μL 10X T4 DNA Ligase Buffer (NEB), 1 μL 100 μM iPCR_sgRNA_F primer, 1 μL 100 μM iPCR_UnivR primer, and 0.5 μL T4 PNK (NEB, #M0201S). Thermocycler conditions were 37°C for 30 minutes, 65°C for 20 minutes, and an indefinite 4°C hold. Then, we performed iPCR with a reaction mix of 8.25 μL MB H2O, 5 μL Q5 Reaction Buffer (NEB), 0.5 μL 10 mM dNTPs, 0.5 μL Q5 High-Fidelity DNA Polymerase (NEB, #M0491), 10 μL primer phosphorylation reaction, and 1 μL 30–50 ng/μL template plasmid (pCRISPR2). Reaction conditions consisted of 98°C for 30 seconds, followed by 25 cycles of 98°C for 30 seconds, 58°C for 30 seconds, and 72°C for 90 seconds. A final 3-minute elongation at 72°C preceded an indefinite hold at 4°C. We cleaned and concentrated the reaction product using a DNA Clean & Concentrator-5 Kit (Zymo, #D4014) and eluted in 25 μL MB H2O. We removed the template plasmid using a DpnI digestion. Reaction conditions were 13 μL cleaned and concentrated product, 2.5 μL 10x Cutsmart Buffer (NEB, #B6004), 8.25 μL MB H2O, and 1.25 μL DpnI (NEB, #R0176). The reaction was held at 37°C for 1 hour, then held at 4°C. Finally, we circularized the PCR product in a ligation reaction consisting of 8 μL DpnI product, 1 μL 10X T4 DNA Ligase Buffer (NEB), and 1 μL T4 DNA Ligase (400,000 U/mL, NEB, #M0202S). Ligation occurred at 25°C for 2 hours, followed by 10 minutes at 65°C and a hold at 4°C. We transformed this product into XL1 Blue chemically competent cells (1 μL DNA, 45 μL competent cells) with 1 hour recovery at 37°C with shaking. We purified plasmids using a GeneJET Plasmid Miniprep Kit (Thermo, #K0503) and transformed into CRISPRi MG1655 + dCas9 cells as described previously.
RNA Extraction and RT-qPCR
To collect RT-qPCR data following CRISPRi, we grew CRISPRi-Nont strains from glycerol stocks in 4 mL LB + 35 μg/mL kanamycin for 6–8 hours at 37°C with shaking. We washed these cultures into M9+Kan as described previously and incubated them for 8–14 hours at 37°C with shaking. We then washed cultures into M9+Kan+50 ng/mL anhydrotetracycline (Cayman Chemical Company, #10009542) and incubated them for 5 hours at 37°C with shaking at a starting OD600 = 0.05. After treatment, we isolated RNA using the Qiagen RNeasy Protect Bacteria Mini Kit (QIAGEN, #74524), with on-column DNase digestion (QIAGEN, #79254), as described in their protocol. We extracted RNA from control Nont+Nont strains using the same procedure. We performed RT-qPCR on a CFX Opus 384 Real-Time PCR System (Bio-Rad, #12011452) using the Luna Universal One-Step RT-qPCR Kit (NEB, #E3005). No template and genomic DNA controls were included in all experiments. We analyzed raw results using the CFX Maestro Software (Bio-Rad), and calculated changes in mRNA concentration using the ΔΔCt method with the hcaT gene as reference in a custom Python script (available on GitHub, DOI is listed in the key resources table).
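As a minimal sketch of the ΔΔCt calculation referenced above, the function below converts Ct values into a relative mRNA level, with hcaT as the reference gene and the Nont+Nont strain as the control. It assumes perfect (two-fold per cycle) amplification efficiency. The function name and the Ct values are hypothetical and are not taken from the authors' script.

```python
def relative_expression_ddct(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Relative mRNA level of a target gene by the delta-delta-Ct method.

    *_kd   : Ct values from the CRISPRi knockdown strain
    *_ctrl : Ct values from the non-targeting control strain
    'ref' is the reference gene (hcaT in the text).
    Assumes 100% amplification efficiency (a doubling per cycle).
    """
    delta_ct_kd = ct_target_kd - ct_ref_kd        # normalize to reference gene
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta_ct = delta_ct_kd - delta_ct_ctrl  # compare knockdown to control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 3 cycles later in the
# knockdown strain, while the reference gene is unchanged.
relative_mrna = relative_expression_ddct(25.0, 20.0, 22.0, 20.0)
print(f"Relative expression: {relative_mrna:.3f}")  # 2**-3 = 0.125
```

A value of 0.125 would correspond to 87.5% repression relative to the non-targeting control.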
Third-order sgRNA library construction and next-generation sequencing
We constructed the third-order sgRNA library in a single barcoded vector (pCRISPR3, AddGene accession numbers are listed in the key resources table) using Golden Gate cloning as described previously with minor modifications. Three sgRNA inserts were used instead of two, using primers GG_UnivF/GG_R1 for insert 1, GG_F2/GG_R2 for insert 2, and GG_F3/GG_UnivR for insert 3 (Table S7). During Golden Gate cloning, insert concentrations were doubled to 12 ng per insert per reaction, and reaction conditions were adjusted to 60 cycles of 2 minutes at 37°C and 3 minutes at 16°C. These cycles were followed by 10 minutes at 50°C, 20 minutes at 80°C, and an indefinite 4°C hold. Transformed cells were serially diluted and plated, and we selected a library dilution containing ~1,000 colony forming units to form our bottlenecked library.
This library was used as a template in an iPCR reaction to replace the 6-nucleotide barcode with a new barcode sequence while maintaining sgRNA diversity. iPCR was carried out as described above using primers BC_iPCR_F_BC[1–6] and BC_iPCR_UnivR_v2. These libraries were sequence verified and transformed at sufficient efficiency to avoid further bottlenecking. These sublibraries were quantified by Qubit and combined in an equimolar ratio to generate the final third-order CRISPRi library with five barcodes. pCRISPR3 barcode 4 was excluded from the reaction. Finally, to ensure that we could normalize sgRNA constructs to a wildtype-like subpopulation, CRISPRi-sensitive E. coli harboring a Nont+Nont+Nont construct (in all five barcodes) were spiked into the final population at a 1:500 ratio before CRISPRi treatment.
We altered the sample preparation protocol to accommodate the increased amplicon size of the third-order sgRNA CRISPRi library. For the first PCR, we used a reverse primer with 4 additional base pairs of homology (TruSeqR). The reaction conditions were changed to 30 seconds at 98°C, 17 cycles of 30 seconds at 98°C and 60 seconds at 76°C, followed by 180 seconds at 72°C and an indefinite 4°C hold. In the second PCR, we altered thermal cycler conditions to 30 seconds at 98°C, 10 cycles of 30 seconds at 98°C and 75 seconds at 72°C, followed by 180 seconds at 72°C and an indefinite 4°C hold. We quantified dsDNA from these reactions using a Qubit Assay (Thermo, #Q32851), mixed them in an equimolar ratio, and concentrated DNA using a DNA Clean & Concentrator-5 Kit (Zymo, #D4014) with a 30 μL MB H2O elution. We ran this concentrated library on a 2% agarose 1X TAE gel with an EtBr stain at 90 V for 85 minutes and extracted the resulting 636 bp band using a DNA Clean & Concentrator-5 Kit (Zymo, #D4014) with a 20 μL MB H2O elution. We quantified dsDNA with a Qubit Assay (Thermo, #Q32851) and sequenced in-house using two Illumina MiSeq Nano v2 500-cycle paired-end runs and one Illumina MiSeq v2 500-cycle paired-end run.
Plate reader growth rate assay
We cultured E. coli strains harboring a non-targeting CRISPRi control overnight in 4 mL LB + 35 μg/mL kanamycin at 37°C with shaking. The next day, we washed these cultures twice into M9+Kan as described previously. We diluted cultures 1:200 into each of the 22 media conditions described in Continuous Culture and adapted cells for 4 hours at 37°C with shaking. Following adaptation, we washed cells into 4 mL of their respective supplemented media + 50 ng/mL anhydrotetracycline at OD600 = 0.005. Three 200 μL replicates of each culture were grown in a Synergy Neo2 Hybrid Multi Mode Reader (Agilent, #BTNEO2) overnight at 37°C while taking regular OD600 measurements. Growth rates were fit across replicates using a log-linear fit of exponential phase growth (empirically determined, log2(OD) between −8 and −4) and averaged across triplicate measurements.
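The log-linear fit described above can be sketched as follows, using only the Python standard library. The window of log2(OD) between −8 and −4 is taken from the text. The function names, the synthetic growth curves, and the assumption that OD values are already background-subtracted are illustrative.

```python
import math


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def exponential_growth_rate(times_h, od600, log2_lo=-8.0, log2_hi=-4.0):
    """Growth rate (doublings per hour) from the exponential-phase window.

    Only points with log2_lo <= log2(OD) <= log2_hi enter the fit.
    Assumes od600 values are already background-subtracted.
    """
    points = [(t, math.log2(od)) for t, od in zip(times_h, od600)
              if od > 0 and log2_lo <= math.log2(od) <= log2_hi]
    if len(points) < 3:
        raise ValueError("too few points inside the exponential window")
    t_vals, y_vals = zip(*points)
    return least_squares_slope(t_vals, y_vals)


# Synthetic triplicate: start at OD600 = 0.005, 0.8 doublings per hour,
# saturating at OD 0.5 (hypothetical values for illustration).
times = [0.5 * i for i in range(25)]
replicates = [[min(0.005 * 2 ** (0.8 * t), 0.5) for t in times] for _ in range(3)]

# Fit each replicate separately, then average across the triplicate.
rates = [exponential_growth_rate(times, rep) for rep in replicates]
mean_rate = sum(rates) / len(rates)
print(f"Mean growth rate: {mean_rate:.3f} doublings/h")
```

Restricting the fit to a fixed log2(OD) window keeps lag phase and saturation out of the slope estimate.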
Calculating CRISPRi growth rates
We wrote all analysis code in Python 3.9.12 using Jupyter Notebook (available on GitHub, DOI is listed in the key resources table). We extracted sgRNA sequences from Illumina FASTQ files using regular expressions, searching for a 20 bp region between flanking sequences CTAGCTCTAAAAC and A. Up to 3 mismatches and no gaps were permitted in the flanking sequences. The identified homology region was required to match a desired sequence from the CRISPRi library. In addition, to increase confidence, we imposed a filter requiring two specific base calls in each sequence to have a Q-score above 30. The filtered bases were the final “on-target” and first “mismatch” base in the homology region, as this two-nucleotide combination is unique to a single sgRNA within each sgRNA family. We identified the plasmid barcode similarly, with flanking sequences GTACAGCGAGGCAAC and ACGGATCCCCAC, imposing a maximum of 3 mismatches and allowing no gaps. The barcode sequence needed to match a target barcode exactly, and no Q-score filter was imposed.
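The flank-based extraction described above can be illustrated with a small Hamming-distance scan. This is a standard-library sketch and not the authors' regular-expression implementation. It uses the barcode flanking sequences quoted in the text, but the example read, the candidate barcodes, and the choice to apply the mismatch limit to each flank separately are assumptions. The Q-score filter is omitted.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_between_flanks(read, left, right, insert_len, max_mismatches=3):
    """Return the insert_len bases located between approximate flank matches.

    Gaps are not allowed; each flank may carry up to max_mismatches
    substitutions (assumption: the limit applies per flank).
    Returns None when no window satisfies both flanks.
    """
    span = len(left) + insert_len + len(right)
    for start in range(len(read) - span + 1):
        ins_start = start + len(left)
        ins_end = ins_start + insert_len
        left_obs = read[start:ins_start]
        right_obs = read[ins_end:ins_end + len(right)]
        if (hamming(left_obs, left) <= max_mismatches
                and hamming(right_obs, right) <= max_mismatches):
            return read[ins_start:ins_end]
    return None


LEFT_FLANK = "GTACAGCGAGGCAAC"   # barcode 5' flank from the text
RIGHT_FLANK = "ACGGATCCCCAC"     # barcode 3' flank from the text
TARGET_BARCODES = {"ACGTCA", "TGCATG"}  # hypothetical 6-nt barcodes

# Hypothetical read with one substitution in the left flank.
read = "TT" + "GTACAGCGAGGCTAC" + "ACGTCA" + "ACGGATCCCCAC" + "GG"
barcode = extract_between_flanks(read, LEFT_FLANK, RIGHT_FLANK, insert_len=6)

# The barcode itself must match a target exactly, as described in the text.
called = barcode if barcode in TARGET_BARCODES else None
print(called)
```

The same function could be reused for the 20 bp sgRNA homology region by changing the flanks and `insert_len`.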
For the three-sgRNA library, the following modifications were made. As sequencing quality decreases for a 500-cycle kit, the sgRNA flanking sequences were extended to CTAGCTCTAAAAC and ACTAGTATTATAC. These flanking sequences were permitted to have up to 10 mismatches for sgRNAs 1 and 2, and 6 mismatches for sgRNA 3. Flanking sequences for the barcode were the same as above and were permitted to have up to 6 mismatches. The barcode itself was required to match a target exactly; however, the cutoff to call a specific sgRNA homology region was lowered. Sequences in position 1 were allowed to have as many as 2 mismatches, and sequences in position 2 were allowed to have as many as 10 mismatches. This permissive filter was required due to the sharp decrease in sequencing confidence at the end of the sequencing run. However, if a sequenced sgRNA ambiguously mapped to multiple library sequences with the same number of mismatches, no identity was called. We performed next-generation sequencing analysis from HiSeq data on a high-performance computing cluster, while all subsequent analysis was performed locally.
We calculated relative growth rates following each CRISPRi treatment using the change in sgRNA counts over time. At each timepoint (t), we normalized counts for a specific barcoded CRISPRi construct (sgRNA) to the control construct (Nont) and our initial timepoint (t0) using Equation 6.
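$$\text{Relative frequency}(t) = \frac{N_{\text{sgRNA}}(t) \,/\, N_{\text{Nont}}(t)}{N_{\text{sgRNA}}(t_0) \,/\, N_{\text{Nont}}(t_0)} \tag{6}$$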
We then log2-transformed this relative frequency prior to growth rate fitting. For the pairwise sgRNA libraries, if fewer than 10 counts were identified for a construct at a given timepoint, no future timepoints were considered for that construct. Growth rates were calculated for CRISPRi constructs with sufficient counts over at least the first three timepoints. For pairwise and third-order CRISPRi experiments, the time axis was rescaled from hours to generations based on the overall culture generation time in each continuous culture vial. We then fit a line to our log2(Relative frequency) vs. generation data using scipy.stats.linregress and defined the slope of the best fit line as the raw growth rate. At this step, two sgRNAs (gdhA_1_42_B_MM14 and gdhA_3_216_B_MM8) were determined to have off-target effects and were removed from future analysis (see Jupyter Notebook 3_Pairwise_Growth_Rates.ipynb for details). For gene-by-environment analysis, continuous culture growth rates for each media and sgRNA combination were rescaled using plate reader absolute growth data using Equation 7.
| (7) |
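The count normalization and slope fit described above can be sketched as follows. The text reports using scipy.stats.linregress; this sketch substitutes a plain least-squares slope so that it runs with the standard library alone. The function names and example counts are hypothetical. One interpretive assumption is flagged in the code: the first timepoint with fewer than 10 counts is excluded along with all later timepoints.

```python
import math


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def raw_growth_rate(generations, sg_counts, nont_counts,
                    min_counts=10, min_points=3):
    """Slope of log2(relative frequency) versus generations for one construct.

    Relative frequency at each timepoint is the construct/Nont count ratio
    divided by the same ratio at the first timepoint.
    Returns None if fewer than min_points usable timepoints remain.
    """
    usable = len(sg_counts)
    for i, count in enumerate(sg_counts):
        if count < min_counts:
            usable = i  # assumption: drop this timepoint and all later ones
            break
    if usable < min_points:
        return None
    log_rel = []
    for i in range(usable):
        rel = (sg_counts[i] / nont_counts[i]) / (sg_counts[0] / nont_counts[0])
        log_rel.append(math.log2(rel))
    return least_squares_slope(generations[:usable], log_rel)


# Hypothetical construct that halves in frequency every generation.
generations = [0.0, 1.0, 2.0, 3.0, 4.0]
nont = [1000, 1000, 1000, 1000, 1000]
rate = raw_growth_rate(generations, [1024, 512, 256, 128, 64], nont)
dropout = raw_growth_rate(generations, [1024, 5, 3, 0, 0], nont)
print(rate, dropout)
```

Because the Nont control defines the baseline, a construct with no growth defect has a slope near zero and slower-growing constructs have negative slopes.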
We then removed escapers, CRISPRi constructs that abnormally lacked a growth rate deficit, using a one-sided Dixon’s Q-test for outliers at 95% confidence, as in Mathis et al. 20219. Briefly, this test determines whether a given growth rate measurement is significantly faster than all other replicates of the same condition and removes the measurement if it is identified as an outlier. If at least four growth rates were successfully calculated across barcodes and sgRNA orders after removing escapers, the replicates were averaged to determine a mean relative growth rate. Finally, we normalized growth rates so the fully Nont control had a growth rate of 1 and the most severe growth rate perturbation observed (glyA_1_26_C + thyA_1_60_B_MM2, Min GR) had a growth rate of 0 using Equation 8.
$$GR_{\text{norm}} = \frac{GR - GR_{\text{Min}}}{GR_{\text{Nont}} - GR_{\text{Min}}} \tag{8}$$
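The normalization described above is a linear rescaling that maps the Nont control to 1 and the most severe perturbation to 0. A minimal sketch, with a hypothetical function name and made-up growth rates, is:

```python
def normalize_growth_rate(gr, gr_nont, gr_min):
    """Linearly rescale a growth rate so that gr_nont -> 1 and gr_min -> 0."""
    return (gr - gr_min) / (gr_nont - gr_min)


# Hypothetical relative growth rates: the Nont control sits at 0.0 and the
# most severe construct at -0.8.
gr_nont, gr_min = 0.0, -0.8
normalized = normalize_growth_rate(-0.2, gr_nont, gr_min)
print(f"{normalized:.2f}")
```

Under this scaling, values above 1 indicate constructs that grow faster than the non-targeting control.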
Fitting the Coupling Model
We fit single-gene expression-growth rate curves using scipy.optimize.least_squares, bounding R0 to non-negative values with an initial guess of 0.5 for R0 and 0 for n. We fit gene-gene coupling values for each gene pair using the strategy described in Zimmer et al., 201629, with some modifications. We optimized coupling constants aij and aji using scipy.optimize.least_squares, bounding between −1 and 10 with an initial guess of 0 for both. To limit the spread of coupling constant values, we calculated the RMSD between growth rate predictions and experimental measurements with an added regularization term. We used a regularization value of λ = 10^−1.25 after evaluating a range of values on subsampled data for both the pairwise and folA/thyA libraries (Fig S2). We calculated pairwise and third-order predicted growth rates using each gene’s individual repression-growth rate parameters and relevant coupling constants.
Estimating Uncertainty with Bootstrapping
We used bootstrapping to computationally evaluate how sensitive each model’s predictions were to changes in single perturbation-growth rate fits. For each single perturbation curve, we generated a new training set of the same size as our input data set by resampling data with replacement from individual perturbation-growth rate measurements. For cases where both perturbation intensity and growth rate measurements had associated uncertainties (e.g., qPCR-based repression measurements and NGS-based growth rate measurements), we resampled these measurements independently. In all cases, we refit perturbation-growth curves using resampled data 100 times and used the resulting curves to estimate variance and establish 95% confidence intervals on our model’s fit parameters and accuracy. For our pathway CRISPRi experiment, we set an upper bound for the Hill coefficient of these bootstrapped single-gene curves at 1000 to avoid numerical instability on steep growth curves. As this experiment consisted of 19 sparsely sampled single gene expression-growth rate curves, our bootstrapped models showed a slight but consistent increase in RMSD relative to our model fit on the original data. As these expression-growth rate curves are fit on 3–6 data points (instead of 10–12, as in our initial study), we expect that at least one bootstrapped curve will be poorly fit in each iteration, and this stochasticity drives the slight RMSD increase.
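The resampling scheme described above can be sketched generically. For brevity, a straight-line slope stands in for the sigmoidal single-gene fit, so this illustrates the bootstrap procedure and does not reproduce the model. The function names, the example data, and the percentile method for the interval are assumptions.

```python
import random


def fit_slope(points):
    """Least-squares slope through (x, y) pairs; placeholder for the real fit."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def bootstrap_ci(points, fit, n_boot=100, alpha=0.05, seed=0):
    """Percentile confidence interval for fit(points) by resampling.

    Each bootstrap set has the same size as the input and is drawn with
    replacement; degenerate resamples that cannot be fit are skipped.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]
        try:
            estimates.append(fit(sample))
        except ZeroDivisionError:
            continue  # all resampled x values identical
    estimates.sort()
    last = len(estimates) - 1
    lower = estimates[int(round((alpha / 2) * last))]
    upper = estimates[int(round((1 - alpha / 2) * last))]
    return lower, upper


# Hypothetical repression versus growth-rate measurements.
data = [(0.0, 1.00), (0.2, 0.97), (0.4, 0.90), (0.6, 0.71), (0.8, 0.42), (1.0, 0.10)]
ci_low, ci_high = bootstrap_ci(data, fit_slope)
print(f"95% CI for slope: [{ci_low:.2f}, {ci_high:.2f}]")
```

With only a handful of points per curve, individual resamples can fit poorly, which is consistent with the slight RMSD increase noted above for sparsely sampled curves.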
Identifying Coupling Directionality between Perturbations
We assessed coupling directionality by fitting two separate continuous epistasis models on each set of pairwise perturbation data. In the two models, we bounded one coupling constant to zero and optimized the remaining coupling constant given the pairwise data. We identified the strongest coupling based on which model (the model fit using only aij or only aji) more accurately captured the data, based on RMSD. The coupling resulting in the lower RMSD was retained. In all cases for our pathway screen, one coupling significantly outperformed the other (>20% decrease in RMSD). Based on coupling constant distributions (Fig S6E), we set an empirical cutoff at 0.1 and visualized couplings above this threshold.
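The model comparison described above reduces to choosing the single-coupling fit with the lower RMSD. The sketch below assumes the two RMSD values have already been computed; the function names and example values are hypothetical, and applying the 0.1 cutoff to the absolute value of the coupling is an assumption.

```python
def pick_coupling_direction(rmsd_aij_only, rmsd_aji_only):
    """Choose the coupling whose single-coupling model has the lower RMSD.

    Returns the retained coupling's label and the fractional RMSD decrease
    relative to the worse model.
    """
    if rmsd_aij_only <= rmsd_aji_only:
        kept, best, worse = "a_ij", rmsd_aij_only, rmsd_aji_only
    else:
        kept, best, worse = "a_ji", rmsd_aji_only, rmsd_aij_only
    decrease = (worse - best) / worse if worse > 0 else 0.0
    return kept, decrease


def is_visualized(coupling, cutoff=0.1):
    """Apply the empirical cutoff (assumption: compared on absolute value)."""
    return abs(coupling) > cutoff


# Hypothetical RMSDs from the two single-coupling fits for one gene pair.
kept, decrease = pick_coupling_direction(0.08, 0.12)
print(kept, f"{decrease:.0%}")
```

In this example the a_ij-only model lowers RMSD by about a third, which would exceed the 20% margin reported for the pathway screen.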
Computing Pairwise Sequence Identity
For each protein in our pathway CRISPRi data set, we collected orthologous protein sequences spanning all Enterobacterales species using OrthoDB v11 on 9/14/2023. We generated a multiple sequence alignment for each group of orthologs with Clustal Omega with default settings. To ensure equal representation for each gene, we filtered these alignments to include only the 343 species with orthologs to all 19 genes. Finally, we computed the pairwise sequence identity between each ortholog and the reference E. coli sequence, generating a distribution of pairwise sequence identities for each gene.
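Pairwise identity between two rows of a multiple sequence alignment can be computed as sketched below. Several definitions of percent identity exist. This sketch assumes that columns with a gap in either sequence are skipped and that the denominator is the number of remaining columns, which may differ from the convention used in the study. The sequences are made up.

```python
def pairwise_identity(aligned_ref, aligned_query, gap="-"):
    """Fraction of identical residues over ungapped alignment columns.

    Both inputs must be rows of the same alignment (equal length).
    Assumption: columns with a gap in either sequence are ignored.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for ref_res, query_res in zip(aligned_ref, aligned_query):
        if ref_res == gap or query_res == gap:
            continue
        compared += 1
        if ref_res == query_res:
            matches += 1
    return matches / compared if compared else 0.0


# Hypothetical aligned fragments: reference (E. coli) versus one ortholog.
identity = pairwise_identity("MKT-AYLG", "MRTQAY-G")
print(f"{identity:.3f}")
```

Applying this function to every ortholog against the E. coli reference yields the per-gene identity distribution described above.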
Isserlis and Regression Model
Predictions from the Isserlis and regression-based models were made for third-order perturbations using Equations 3 and 4 (respectively). For fourth-order perturbations, we used Equations 9 and 10. Fourth-order Isserlis (Equation 9):
| (9) |
Fourth-order Regression (Equation 10):
| (10) |
To fairly compare these models to our continuous epistasis model, we utilized their “smoothed” versions, as in Zimmer et al., 201629. In this case, growth rates following first-order perturbations (, , etc.) were drawn from single perturbation-growth curves (Fig 2E–M), just as in our continuous epistasis model. Pairwise perturbation terms (, , etc.) were drawn from pairwise expression-growth rate landscapes generated by the continuous epistasis model. This ensures that all models (including our own) utilize the same low-order training data to predict high-order phenotypes.
Quantification and Statistical Analysis
Quantification details for all methods are specified under the relevant subheading in the Method Details section.
We identified outliers in growth rate replicates using a one-sided Dixon’s Q-test at 95% confidence, as described in the Method Details section, and the test was performed using custom Python code (available on GitHub, DOI is listed in the key resources table).
Model performance (RMSD or AIC) was determined to be significantly different if the 95% confidence intervals for two models were exclusive, meaning the upper bound of one confidence interval was lower than the lower bound of the other interval.
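The significance criterion above amounts to checking that two confidence intervals do not overlap. A minimal sketch, with a hypothetical function name and made-up RMSD intervals:

```python
def intervals_exclusive(ci_a, ci_b):
    """True when two (lower, upper) confidence intervals do not overlap."""
    lower_a, upper_a = ci_a
    lower_b, upper_b = ci_b
    return upper_a < lower_b or upper_b < lower_a


# Hypothetical 95% confidence intervals on RMSD for two models.
model_a = (0.050, 0.070)
model_b = (0.080, 0.100)
print(intervals_exclusive(model_a, model_b))
```

Non-overlap of 95% intervals is a conservative criterion, since two models can differ significantly even when their intervals overlap slightly.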
We performed a Welch’s t-test to compare pairwise sequence identity distributions using scipy.stats.ttest_ind setting equal_var=False.
Supplementary Material
Table S1: sgRNA homology sequences and their respective repression efficiencies (measured by qPCR) or relative repression efficiencies (estimated).
Table S4: Gene-gene couplings (aij and aji values) for all gene pairs in our 19 gene pathway CRISPRi assay.
Highlights.
We measured E. coli growth rate in response to ~8,000 titrated, pairwise gene knockdowns
A sparse subsampling of these data sufficed to train a predictive machine learning model
A model trained on pairwise data could predict the effects of high-order perturbations
Model coupling parameters identified gene clusters that hierarchically control growth
Acknowledgments
The authors thank Olivier Rivoire for insightful discussions and data analysis ideas. We are grateful to Prashant Mishra for thoughtful comments on our manuscript, and Andrew Mathis for feedback in the early stages of this work. We also thank the Reynolds lab for feedback on the project and manuscript. Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R01GM136842. R.O. was also partly supported by a National Institutes of Health training grant 5T32GM007062-46. This content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Inclusion and Diversity
One or more of the authors of this paper self-identifies as a gender minority in their field of research.
Footnotes
Declaration of Interests
The authors declare no competing interests.
References
- 1. Bernard Q, Smith AA, Yang X, Koci J, Foor SD, Cramer SD, Zhuang X, Dwyer JE, Lin Y-P, Mongodin EF, et al. (2018). Plasticity in early immune evasion strategies of a bacterial pathogen. Proc. Natl. Acad. Sci. 115. 10.1073/pnas.1718595115.
- 2. Abt MC, McKenney PT, and Pamer EG (2016). Clostridium difficile colitis: pathogenesis and host defence. Nat. Rev. Microbiol. 14, 609–620. 10.1038/nrmicro.2016.108.
- 3. Depardieu F, Podglajen I, Leclercq R, Collatz E, and Courvalin P (2007). Modes and Modulations of Antibiotic Resistance Gene Expression. Clin. Microbiol. Rev. 20, 79–114. 10.1128/CMR.00015-06.
- 4. Woodford N, and Ellington MJ (2007). The emergence of antibiotic resistance by mutation. Clin. Microbiol. Infect. 13, 5–18. 10.1111/j.1469-0691.2006.01492.x.
- 5. Keren L, Hausser J, Lotan-Pompan M, Vainberg Slutskin I, Alisar H, Kaminski S, Weinberger A, Alon U, Milo R, and Segal E (2016). Massively Parallel Interrogation of the Effects of Gene Expression Levels on Fitness. Cell 166, 1282–1294.e18. 10.1016/j.cell.2016.07.024.
- 6. Zelcbuch L, Antonovsky N, Bar-Even A, Levin-Karp A, Barenholz U, Dayagi M, Liebermeister W, Flamholz A, Noor E, Amram S, et al. (2013). Spanning high-dimensional expression space using ribosome-binding site combinatorics. Nucleic Acids Res. 41, e98. 10.1093/nar/gkt151.
- 7. Bauer CR, Li S, and Siegal ML (2015). Essential gene disruptions reveal complex relationships between phenotypic robustness, pleiotropy, and fitness. Mol. Syst. Biol. 11, 773. 10.15252/msb.20145264.
- 8. Hawkins JS, Silvis MR, Koo B-M, Peters JM, Osadnik H, Jost M, Hearne CC, Weissman JS, Todor H, and Gross CA (2020). Mismatch-CRISPRi Reveals the Co-varying Expression-Fitness Relationships of Essential Genes in Escherichia coli and Bacillus subtilis. Cell Syst. 11, 523–535.e9. 10.1016/j.cels.2020.09.009.
- 9. Mathis AD, Otto RM, and Reynolds KA (2021). A simplified strategy for titrating gene expression reveals new relationships between genotype, environment, and bacterial growth. Nucleic Acids Res. 49, e6–e6. 10.1093/nar/gkaa1073.
- 10. Jaffe M, Dziulko A, Smith JD, St.Onge RP, Levy SF, and Sherlock G (2019). Improved discovery of genetic interactions using CRISPRiSeq across multiple environments. Genome Res. 29, 668–681. 10.1101/gr.246603.118.
- 11. Price MN, Wetmore KM, Waters RJ, Callaghan M, Ray J, Liu H, Kuehl JV, Melnyk RA, Lamson JS, Suh Y, et al. (2018). Mutant phenotypes for thousands of bacterial genes of unknown function. Nature 557, 503–509. 10.1038/s41586-018-0124-0.
- 12. Domingo J, Baeza-Centurion P, and Lehner B (2019). The Causes and Consequences of Genetic Interactions (Epistasis). Annu. Rev. Genomics Hum. Genet. 20, 433–460. 10.1146/annurev-genom-083118-014857.
- 13. Vigouroux A, Oldewurtel E, Cui L, Bikard D, and van Teeffelen S (2018). Tuning dCas9’s ability to block transcription enables robust, noiseless knockdown of bacterial genes. Mol. Syst. Biol. 14. 10.15252/msb.20177899.
- 14. Li X-T, Jun Y, Erickstad MJ, Brown SD, Parks A, Court DL, and Jun S (2016). tCRISPRi: tunable and reversible, one-step control of gene expression. Sci. Rep. 6, 39076. 10.1038/srep39076.
- 15. Silvis MR, Rajendram M, Shi H, Osadnik H, Gray AN, Cesar S, Peters JM, Hearne CC, Kumar P, Todor H, et al. (2021). Morphological and Transcriptional Responses to CRISPRi Knockdown of Essential Genes in Escherichia coli. mBio 12, e0256121. 10.1128/mBio.02561-21.
- 16. Jiang W, Oikonomou P, and Tavazoie S (2020). Comprehensive Genome-wide Perturbations via CRISPR Adaptation Reveal Complex Genetics of Antibiotic Sensitivity. Cell 180, 1002–1017.e31. 10.1016/j.cell.2020.02.007.
- 17. Ward RD, Tran JS, Banta AB, Bacon EE, Rose WE, and Peters JM (2023). Essential Gene Knockdowns Reveal Genetic Vulnerabilities and Antibiotic Sensitivities in Acinetobacter baumannii. BioRxiv Prepr. Serv. Biol, 2023.08.02.551708. 10.1101/2023.08.02.551708.
- 18. Mitchell G, Silvis MR, Talkington KC, Budzik JM, Dodd CE, Paluba JM, Oki EA, Trotta KL, Licht DJ, Jimenez-Morales D, et al. (2022). Ceragenins and Antimicrobial Peptides Kill Bacteria through Distinct Mechanisms. mBio 13, e0272621. 10.1128/mbio.02726-21.
- 19. Hogan AM, and Cardona ST (2022). Gradients in gene essentiality reshape antibacterial research. FEMS Microbiol. Rev. 46, fuac005. 10.1093/femsre/fuac005.
- 20. Bosch B, DeJesus MA, Poulton NC, Zhang W, Engelhart CA, Zaveri A, Lavalette S, Ruecker N, Trujillo C, Wallach JB, et al. (2021). Genome-wide gene expression tuning reveals diverse vulnerabilities of M. tuberculosis. Cell 184, 4579–4592.e24. 10.1016/j.cell.2021.06.033.
- 21. Choudhery S, DeJesus MA, Srinivasan A, Rock J, Schnappinger D, and Ioerger TR (2023). A dose-response based model for statistical analysis of chemical genetic interactions in CRISPRi libraries. BioRxiv Prepr. Serv. Biol, 2023.08.03.551759. 10.1101/2023.08.03.551759.
- 22. Wood K, Nishida S, Sontag ED, and Cluzel P (2012). Mechanism-independent method for predicting response to multidrug combinations in bacteria. Proc. Natl. Acad. Sci. 109, 12254–12259. 10.1073/pnas.1201281109.
- 23. Wood KB, Wood KC, Nishida S, and Cluzel P (2014). Uncovering Scaling Laws to Infer Multidrug Response of Resistant Microbes and Cancer Cells. Cell Rep. 6, 1073–1084. 10.1016/j.celrep.2014.02.007.
- 24. Zimmer A, Tendler A, Katzir I, Mayo A, and Alon U (2017). Prediction of drug cocktail effects when the number of measurements is limited. PLOS Biol. 15, e2002518. 10.1371/journal.pbio.2002518.
- 25. Katzir I, Cokol M, Aldridge BB, and Alon U (2019). Prediction of ultra-high-order antibiotic combinations based on pairwise interactions. PLOS Comput. Biol. 15, e1006774. 10.1371/journal.pcbi.1006774.
- 26. Tendler A, Zimmer A, Mayo A, and Alon U (2019). Noise-precision tradeoff in predicting combinations of mutations and drugs. PLOS Comput. Biol. 15, e1006956. 10.1371/journal.pcbi.1006956.
- 27. Pandey S, Gruenbaum A, Kanashova T, Mertins P, Cluzel P, and Chevrier N (2020). Pairwise Stimulations of Pathogen-Sensing Pathways Predict Immune Responses to Multi-adjuvant Combinations. Cell Syst. 11, 495–508.e10. 10.1016/j.cels.2020.10.001.
- 28. Larkins-Ford J, Greenstein T, Van N, Degefu YN, Olson MC, Sokolov A, and Aldridge BB (2021). Systematic measurement of combination-drug landscapes to predict in vivo treatment outcomes for tuberculosis. Cell Syst. 12, 1046–1063.e7. 10.1016/j.cels.2021.08.004.
- 29. Zimmer A, Katzir I, Dekel E, Mayo AE, and Alon U (2016). Prediction of multidimensional drug dose responses based on measurements of drug pairs. Proc. Natl. Acad. Sci. 113, 10442–10447. 10.1073/pnas.1606301113.
- 30. Twarog NR, Stewart E, Hammill CV, and Shelat AA (2016). BRAID: A Unifying Paradigm for the Analysis of Combined Drug Action. Sci. Rep. 6, 25523. 10.1038/srep25523.
- 31. Lotfollahi M, Klimovskaia Susmelj A, De Donno C, Hetzel L, Ji Y, Ibarra IL, Srivatsan SR, Naghipourfar M, Daza RM, Martin B, et al. (2023). Predicting cellular responses to complex perturbations in high-throughput screens. Mol. Syst. Biol. 19, e11517. 10.15252/msb.202211517.
- 32. Zhang H, Ek CH, Rattray M, and Milo M (2023). SynBa: improved estimation of drug combination synergies with uncertainty quantification. Bioinforma. Oxf. Engl. 39, i121–i130. 10.1093/bioinformatics/btad240.
- 33. Meyer CT, Wooten DJ, Paudel BB, Bauer J, Hardeman KN, Westover D, Lovly CM, Harris LA, Tyson DR, and Quaranta V (2019). Quantifying Drug Combination Synergy along Potency and Efficacy Axes. Cell Syst. 8, 97–108.e16. 10.1016/j.cels.2019.01.003.
- 34. Wooten DJ, Meyer CT, Lubbock ALR, Quaranta V, and Lopez CF (2021). MuSyC is a consensus framework that unifies multi-drug synergy metrics for combinatorial drug discovery. Nat. Commun. 12, 4607. 10.1038/s41467-021-24789-z.
- 35. Qi LS, Larson MH, Gilbert LA, Doudna JA, Weissman JS, Arkin AP, and Lim WA (2013). Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression. Cell 152, 1173–1183. 10.1016/j.cell.2013.02.022.
- 36. Peters JM, Colavin A, Shi H, Czarny TL, Larson MH, Wong S, Hawkins JS, Lu CHS, Koo B-M, Marta E, et al. (2016). A Comprehensive, CRISPR-based Functional Analysis of Essential Genes in Bacteria. Cell 165, 1493–1506. 10.1016/j.cell.2016.05.003.
- 37. Jost M, Santos DA, Saunders RA, Horlbeck MA, Hawkins JS, Scaria SM, Norman TM, Hussmann JA, Liem CR, Gross CA, et al. (2020). Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs. Nat. Biotechnol. 38, 355–364. 10.1038/s41587-019-0387-5.
- 38. Larson MH, Gilbert LA, Wang X, Lim WA, Weissman JS, and Qi LS (2013). CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nat. Protoc. 8, 2180–2196. 10.1038/nprot.2013.132.
- 39. Rousset F, Cui L, Siouve E, Becavin C, Depardieu F, and Bikard D (2018). Genome-wide CRISPR-dCas9 screens in E. coli identify essential genes and phage host factors. PLOS Genet. 14, e1007749. 10.1371/journal.pgen.1007749.
- 40. Wang T, Guan C, Guo J, Liu B, Wu Y, Xie Z, Zhang C, and Xing X-H (2018). Pooled CRISPR interference screening enables genome-scale functional genomics study in bacteria with superior performance. Nat. Commun. 9, 2475. 10.1038/s41467-018-04899-x.
- 41. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, and Mori H (2006). Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. 2. 10.1038/msb4100050.
- 42. Kryazhimskiy S (2021). Emergence and propagation of epistasis in metabolic networks. eLife 10, e60200. 10.7554/eLife.60200.
- 43. Côté J-P, French S, Gehrke SS, MacNair CR, Mangat CS, Bharat A, and Brown ED (2016). The Genome-Wide Interaction Network of Nutrient Stress Genes in Escherichia coli. mBio 7, e01714–16. 10.1128/mBio.01714-16.
- 44. Jiang W, Bikard D, Cox D, Zhang F, and Marraffini LA (2013). RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat. Biotechnol. 31, 233–239. 10.1038/nbt.2508.
- 45. Toprak E, Veres A, Yildiz S, Pedraza JM, Chait R, Paulsson J, and Kishony R (2013). Building a morbidostat: an automated continuous-culture device for studying bacterial drug resistance under dynamically sustained drug inhibition. Nat. Protoc. 8, 555–567. 10.1038/nprot.2013.021.
- 46. Toprak E, Veres A, Michel J-B, Chait R, Hartl DL, and Kishony R (2012). Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat. Genet. 44, 101–105. 10.1038/ng.1034.
- 47. Scamuffa MD, and Caprioli RM (1980). Comparison of the mechanisms of two distinct aldolases from Escherichia coli grown on gluconeogenic substrates. Biochim. Biophys. Acta 614, 583–590. 10.1016/0005-2744(80)90247-8.
- 48. Avison MB, Horton RE, Walsh TR, and Bennett PM (2001). Escherichia coli CreBC is a global regulator of gene expression that responds to growth in minimal media. J. Biol. Chem. 276, 26955–26961. 10.1074/jbc.M011186200.
- 49. Luo H, Gao F, and Lin Y (2015). Evolutionary conservation analysis between the essential and nonessential genes in bacterial genomes. Sci. Rep. 5, 13210. 10.1038/srep13210.
- 50. Isserlis L (1918). On a Formula for the Product-Moment Coefficient of any Order of a Normal Frequency Distribution in any Number of Variables. Biometrika 12, 134. 10.2307/2331932.
- 51. Otwinowski J, and Nemenman I (2013). Genotype to phenotype mapping and the fitness landscape of the E. coli lac promoter. PloS One 8, e61570. 10.1371/journal.pone.0061570.
- 52. Schober AF, Mathis AD, Ingle C, Park JO, Chen L, Rabinowitz JD, Junier I, Rivoire O, and Reynolds KA (2019). A Two-Enzyme Adaptive Unit within Bacterial Folate Metabolism. Cell Rep. 27, 3359–3370.e7. 10.1016/j.celrep.2019.05.030.
- 53. Kwon YK, Higgins MB, and Rabinowitz JD (2010). Antifolate-Induced Depletion of Intracellular Glycine and Purines Inhibits Thymineless Death in E. coli. ACS Chem. Biol. 5, 787–795. 10.1021/cb100096f.
- 54. Sangurdekar DP, Zhang Z, and Khodursky AB (2011). The association of DNA damage response and nucleotide level modulation with the antibacterial mechanism of the anti-folate drug Trimethoprim. BMC Genomics 12, 583. 10.1186/1471-2164-12-583.
- 55. Kuzmin E, VanderSluis B, Wang W, Tan G, Deshpande R, Chen Y, Usaj M, Balint A, Mattiazzi Usaj M, van Leeuwen J, et al. (2018). Systematic analysis of complex genetic interactions. Science 360, eaao1729. 10.1126/science.aao1729.
- 56. Kuchina A, Brettner LM, Paleologu L, Roco CM, Rosenberg AB, Carignano A, Kibler R, Hirano M, DePaolo RW, and Seelig G (2021). Microbial single-cell RNA sequencing by split-pool barcoding. Science 371, eaba5257. 10.1126/science.aba5257.
- 57. Blattman SB, Jiang W, Oikonomou P, and Tavazoie S (2020). Prokaryotic single-cell RNA sequencing by in situ combinatorial indexing. Nat. Microbiol. 5, 1192–1201. 10.1038/s41564-020-0729-6.
- 58. Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, Marjanovic ND, Dionne D, Burks T, Raychowdhury R, et al. (2016). Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 167, 1853–1866.e17. 10.1016/j.cell.2016.11.038.
- 59. Head SR, Komori HK, LaMere SA, Whisenant T, Van Nieuwerburgh F, Salomon DR, and Ordoukhanian P (2014). Library construction for next-generation sequencing: overviews and challenges. BioTechniques 56, 61–64, 66, 68, passim. 10.2144/000114133.
- 60. Kacser H, and Burns JA (1981). The molecular basis of dominance. Genetics 97, 639–666. 10.1093/genetics/97.3-4.639.
- 61. Kacser H, and Burns JA (1995). The control of flux. Biochem. Soc. Trans. 23, 341–366. 10.1042/bst0230341.
- 62. Heinrich R, and Rapoport TA (1974). A linear steady-state treatment of enzymatic chains. General properties, control and effector strength. Eur. J. Biochem. 42, 89–95. 10.1111/j.1432-1033.1974.tb03318.x.
- 63. Hartl DL, Dykhuizen DE, and Dean AM (1985). Limits of adaptation: the evolution of selective neutrality. Genetics 111, 655–674. 10.1093/genetics/111.3.655.
- 64. Chou H-H, Delaney NF, Draghi JA, and Marx CJ (2014). Mapping the fitness landscape of gene expression uncovers the cause of antagonism and sign epistasis between adaptive mutations. PLoS Genet. 10, e1004149. 10.1371/journal.pgen.1004149.
- 65. Perfeito L, Ghozzi S, Berg J, Schnetz K, and Lässig M (2011). Nonlinear fitness landscape of a molecular pathway. PLoS Genet. 7, e1002160. 10.1371/journal.pgen.1002160.
- 66. Dekel E, and Alon U (2005). Optimality and evolutionary tuning of the expression level of a protein. Nature 436, 588–592. 10.1038/nature03842.
- 67. Noor E, Flamholz A, Bar-Even A, Davidi D, Milo R, and Liebermeister W (2016). The Protein Cost of Metabolic Fluxes: Prediction from Enzymatic Rate Laws and Cost Minimization. PLoS Comput. Biol. 12, e1005167. 10.1371/journal.pcbi.1005167.
- 68. Taggart JC, Lalanne J-B, and Li G-W (2021). Quantitative Control for Stoichiometric Protein Synthesis. Annu. Rev. Microbiol. 75, 243–267. 10.1146/annurev-micro-041921-012646.
- 69. Li G-W, Burkhardt D, Gross C, and Weissman JS (2014). Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635. 10.1016/j.cell.2014.02.033.
- 70. Byun G, Yang J, and Seo SW (2023). CRISPRi-mediated tunable control of gene expression level with engineered single-guide RNA in Escherichia coli. Nucleic Acids Res. 51, 4650–4659. 10.1093/nar/gkad234.
Data Availability Statement
All FASTQ files generated from next-generation sequencing experiments are available through the Sequence Read Archive. The accession number is listed in the key resources table. All qPCR and E. coli growth rate data from this work are available as formatted, machine-parseable Excel spreadsheets in our GitHub repository. The DOI is listed in the key resources table.
All original code has been deposited at GitHub and is publicly available as of the date of publication. The DOI is listed in the key resources table.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
