Abstract
Sequence to activity mapping technologies are rapidly developing, enabling the generation and isolation of mutations conferring novel phenotypes. Here we used the CRISPR EnAbled Trackable genome Engineering (CREATE) technology to investigate the inhibition of the essential ispC gene in its native genomic context in Escherichia coli. We created a full saturation library of 33 sites proximal to the ligand binding pocket and challenged this library with the antimalarial drug fosmidomycin, which targets the ispC gene product, DXR. This selection is especially challenging since it is relatively weak in E. coli, with multiple naturally occurring pathways for resistance. We identified several previously unreported mutations that confer fosmidomycin resistance, in highly conserved sites that also exist in pathogens including the malaria-inducing Plasmodium falciparum. This approach may have implications for the isolation of resistance-conferring mutations and may affect the design of future generations of fosmidomycin-based drugs.
Keywords: sequence to activity mapping, Deoxyxylulose Phosphate Reductoisomerase, fosmidomycin, acquired resistance, malaria, CRISPR/Cas9
Drug resistance remains a major concern in relation to human health. Resistance may arise through multiple mechanisms including the horizontal transfer of resistance genes, differential expression of various genes such as efflux pumps, and through mutations of the drug target or auxiliary genes.1–3 Mutational resistance is an evolutionary process that takes advantage of increased mutation rates that are triggered by genome instability 4,5 or stress signals,6–9 to scan the mutational landscape for an adaptive solution. The reduced pace of new antibiotic discovery, combined with widespread use of current antibiotics has led to numerous fatalities due to multidrug-resistant bacterial infections suggesting that we are entering a post-antibiotic age.10–12 Similarly, tumor cells frequently mutate the target of specific chemotherapeutics and thus achieve resistance.13–15 Hence, the ability to predict which mutations will confer resistance to a particular drug is a critical early step in the drug development pipeline.
Deoxyxylulose Phosphate Reductoisomerase (DXR) catalyzes the first committed step of the DXP metabolic pathway and converts 1-deoxy-D-xylulose 5-phosphate (DOXP) to 2-C-methyl-D-erythritol 4-phosphate (MEP). The DXP pathway produces isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP), which are essential metabolites in all life forms and are the precursors of the broadly important isoprenoid pathway. While most eukaryotes use the mevalonate pathway to produce these molecules, bacteria and some protozoa use the alternative DXP pathway.16 Thus, the DXP pathway is an attractive target for inhibition.17,18 Indeed, the DXR inhibitor fosmidomycin (FSM) was successfully used to treat malaria patients either as a monotherapy or in combination with other drugs.19–21 Several pathways were found for FSM resistance in various organisms, including lack of uptake and increased copy number of the encoding genes.22–24 To date, a single point mutation in the E. coli DXR gene was found to confer FSM resistance.25
Sequence to activity mapping approaches explore the mutational landscape by assessing the impact of mutations on fitness. Such approaches link methods for producing sequence diversity with selection or screening techniques to isolate phenotypes of interest. Sequence diversity may be generated randomly, for example by error-prone PCR or adaptive laboratory evolution experiments. However, these approaches are not systematic and might miss important information. Saturation mutagenesis experiments are more comprehensive but often require some a priori knowledge about the exact sites to be mutated.26 Recent advances in large-scale DNA synthesis and sequencing allow more thorough investigations by saturating every site within the target gene.27–31 While most methods were developed to search the mutational landscape in the plasmid context, genome editing is sometimes a preferable approach since it keeps the target gene in its native regulatory context. Moreover, expressing a gene from a plasmid when it also exists in the genome might give noisy results because of the wild-type allele in the background. While desirable, direct genome editing is still challenging, especially when editing essential genes.
We recently reported on a genome editing method that combines recombineering and CRISPR-based methods to produce large-scale libraries with single amino acid polymorphism resolution.27 In short, the CRISPR EnAbled Trackable genome Engineering (CREATE) approach uses plasmid-based homology arms containing the desired mutation and a silent PAM mutation that protects the edited genome from further digestion (Fig. 1A). Synthesis of multiple such homology arms, together with their corresponding guide RNAs, allows the pooled cloning of plasmid libraries that are then introduced to the target cells (Fig. 1B-D). Unedited wild-type genomes are attacked by the CRISPR-Cas9 system, inducing genomic double-strand breaks and subsequent death, while edited cells are protected by the silent PAM mutation that introduces synonymous nucleotide changes. Relative fitness values can be calculated by deep sequencing of the editing plasmids that serve as trans-acting barcodes for mutation identification. Here, we took advantage of available experimental and structural information and mapped the sequence to activity relationships of a preselected set of 660 targeted mutations in the E. coli DXR gene, ispC. This library targeted residues proximal to the FSM binding pocket with the aim of isolating FSM-resistant mutants. As opposed to non-biased approaches, where every possible mutant is tested,27 smaller, rationally designed and more restricted libraries are expected to perform better under weak selection such as FSM. Due to the presence of multiple pathways conferring resistance, FSM is considered a relatively weak selection in E. coli.32,33 In addition, FSM only restricts bacterial growth for about 16–24 hours, after which cells regain normal cellular growth, providing a short window for mutant generation and selection. Thus, this weak selection pressure introduces an opportunity to test the boundaries of the CREATE approach under less-than-ideal conditions.
Fig. 1.
A schematic representation of the experimental procedure. A, an individual CREATE cassette includes a homology arm, harboring both the target site and adjacent mutations, and the corresponding gRNA. B, 660 such cassettes were synthesized on an array and were inserted into a plasmid backbone in a single multiplex cloning reaction (C). D, Plasmid library was transformed in triplicate into E. coli cells with the transcription induction of Cas9 and the recombineering machinery. Following 2 hours of recovery (E), cells were either plated on FSM-containing plates (F), or cultured overnight with (G) or without (H) the presence of FSM. Samples for deep sequencing were taken from E, G, and H. For a more detailed CREATE cassette, please refer to Fig. S1.
Nevertheless, we isolated several previously uncharacterized mutations that confer resistance. These results may prove valuable when designing future drugs that are based on FSM structure. Moreover, the approach described here is broadly applicable to map many other resistance phenotypes under suboptimal selection conditions, with potential implications for drug development timelines.
RESULTS
The ispC target sites included amino acids within the NADPH, ligand and FSM binding sites as described previously,34,35 and resulted in a total of 32 target amino acids (Materials and Methods). An additional site was randomly added for control purposes, adding to a total of 33 sites (Fig. 2A). Each site was completely saturated to all remaining 19 amino acids and additionally to the native amino acid for control, resulting in a 660-mutant library. Each CREATE cassette harbors a mutated target site, an adjacent silently mutated PAM and a corresponding gRNA as depicted in Figure 1A and Fig. S1. The 660 cassette library was synthesized on an Agilent array, together with other CREATE oligos designed for different experiments, as previously described.27 Genome-edited library cells were prepared in triplicates and allowed to recover for two hours. Library cells were then incubated overnight either with or without 100μM FSM as well as plated on agar plates containing FSM (100μM).
Fig. 2.
A, the DXR target sites. The DXR structure is shown in gray (PDB #1q0l). Red residues represent mutated sites enriched following FSM incubation, while the green amino acids represent the rest of the library. FSM is colored orange, and NADPH is in blue. B, eight different mutations were isolated from FSM-resistant colonies (step F in Fig. 1). The CREATE plasmids were isolated, retransformed into fresh E. coli cells and treated with FSM. Optical density was measured following 24 hours of incubation. Data represent three independent experiments. C, library cell growth in the presence of FSM. Optical density was measured 16 hours following FSM addition. Results correspond to steps G and H in Fig. 1. Control cells were edited in the E. coli galK gene which is not related to growth in LB or FSM resistance.
FSM-resistant colonies
24 colonies were recovered from the FSM selection plates and were subjected to both CREATE plasmid and ispC genomic sequencing. Sequencing results are shown in Figure S2. Eight unique mutations in four sites were identified and confirmed genomically (Fig. 2A). The genomically edited colonies included CREATE plasmids that span between zero and four single base mismatches relative to the cassette design, indicating that CREATE editing can be robust to a small number of mutations within the homology arm that can arise in the synthesis, amplification or cloning steps of the process.
Out of the nine colonies that were not edited (37.5%), plasmid sequencing revealed that seven had crossovers between the homology arm and the gRNA, and two had extensive deletions (Fig. S2). Crossovers could occur during multiplex PCR library amplification, cloning or preparation for sequencing and result in a chimeric cassette.36 A mismatch between the homology arm and the gRNA portions of a CREATE cassette is likely to result in cell death since there is no repair template. Indeed, all of the crossed-over gRNAs displayed additional gRNA spacer mutations, most probably rendering the gRNA ineffective (Fig. S2). These wild-type resistant colonies may have adapted to resist FSM via different mechanisms (see Discussion).
Cells harboring each of the eight individual mutations were cultured overnight, and the CREATE plasmid was extracted and re-transformed into new electrocompetent cells. Each culture, now harboring a single point mutation, was subjected to 100μM FSM, similar to the original selection. All mutant cells grew to higher optical density than control cells, validating that these mutations indeed confer FSM resistance (Fig. 2B).
Culture selection
The library cells in the selected liquid cultures grew to a higher optical density than control cells that were edited in the galK gene, a non-essential and non-DXP pathway gene (Fig. 2C). It should be noted that incubation times over 16 hours resulted in growth in the control tube, implying that adaptive events ultimately occur, as previously reported (see Discussion).
CREATE Plasmid deep sequencing
The CREATE plasmids were extracted from all cultures (Fig. 1E, G, H), the editing cassettes were PCR amplified, and amplicons were subjected to deep sequencing. Sequences were filtered to include 99% identity to the designs, to ensure high editing confidence and adequate mutant identification. Total usable reads summed to about 3.36×10⁶ reads. Samples prior to selection included 630 unique CREATE cassettes, constituting 95.4% of the designed library.
All three repeats correlate with each other in a statistically significant manner (Fig. S3A-C).
To determine whether the library cells grew in the presence of FSM due to a specific set of mutations, enrichment analysis of the selected vs. the non-selected overnight cultures was performed (Fig. 1G, H). To establish which mutations were significantly enriched, the average enrichment of the silent mutations was calculated as a baseline (see Materials and Methods). Mutations in proline 274 were highly enriched in all three replicates, indicating it may be a significant player in FSM resistance, in line with the results from individual colonies (Fig. S4). Importantly, a P274K mutation was common to all three repeats (Fig. S4). The complete list of the significantly enriched mutants is shown in Supplementary Table 1. All mutations in the control site, G14, were within the error boundaries.
Genomic sequencing of enriched mutants
Targeting a single gene allows measuring the edited genomes directly, in addition to indirect measurements via the CREATE plasmid. Since the ispC gene is longer than the maximal read length of the MiSeq platform (ispC is 1197 bp in length while the maximal read length using an Illumina MiSeq is currently 600 bases), the gene was amplified from the liquid cultures, and we used the Nextera XT (Illumina) kit to fragment the gene into smaller segments, resulting in less sequencing depth (see Discussion). Overall, a total of about 452,000 usable reads were obtained. Still, all three replicates significantly correlated with each other, although to a lesser extent than the plasmid sequencing (Fig. S3D-F). Sequence analysis showed enriched mutations in all repeats and confirmed the plasmid results with multiple substitutions in P274 being highly enriched (Fig. 3). This analysis led to many more statistically significant enriched mutations, probably due to the differences in the mean and standard deviation calculations (see Materials and Methods). Still, the mutagenesis effects on the control site, G14, were within the error bounds (Fig. S5). Both analyses agree on the majority of mutations within the repeats: the intersections between the significant plasmid and genomic enrichments are P274K for repeat 1, P274K, L230W, P274H and P274M for repeat 2, and P274K and S186Q for repeat 3. Interestingly, while P274K is common to all repeats, both repeats 2 and 3 include unique mutations that are shared in both sequencing approaches (L230W and S186Q, respectively). These differences between repeats may be due to unequal input DNA during library preparation or due to stochastic dynamics of cell survival following electroporation and recovery and were the rationale behind the decision to analyze every repeat individually. In addition to the intersect mutations, D275W and M276E/T were shown to have significant enrichments as well.
Apart from L230W, all sites were also found in the colony sequencing (Fig. 2B). Plotting the genomic enrichment vs. the plasmid enrichment values for every repeat showed that both sequencing approaches mostly agree regarding the enrichment and dilution of specific mutants (Fig. S6).
Fig. 3.
Enrichment analysis of library cells at the genomic level. A, enrichment histogram of all three repeats. B-D, Enriched mutants from repeats 1–3, respectively. The X-axis represents the DXR sequence. The dashed line represents the average of all data points, and the solid lines are two standard deviations from the average. Mutants with enrichment values exceeding two standard deviations are labeled. For clarity, the randomly selected internal control (G14) was omitted and can be found in Fig. S5. A similar analysis was also performed using the CREATE reporter plasmid and can be found in Fig. S4.
The P274 site
Mutations in P274 were found to be highly enriched in all FSM-treated repeats and the FSM-resistant colonies, suggesting that these mutations may confer FSM resistance while still retaining normal enzymatic activity. Figure 4 focuses on the genomic saturation data of this residue, ordered by the average enrichment values of the three repeats. P274 in its three-dimensional context is shown in Figure S7. The identity of the enriched mutations indicates that a replacement of the proline with a charged amino acid increases the chances for resistance while replacing it with another hydrophobic residue does not generally result in enrichment.
Fig. 4.
Saturation data of the P274 residue. A, The X-axis represents the final amino acids replacing the wild-type proline and are ordered according to the average value of the three repeats. Y-axis represents the enrichment values of each mutant. B, Growth response curve to FSM of the P274K, P274M, and P274R mutants relative to wild-type DXR and the previously reported S222T mutant by Armstrong et al. The data points and error bars represent mean ± standard error of the mean (SEM). Data shown represents three independent replicates. EC50 values derived from these curves are shown in Table S2.
To allow direct comparison of FSM resistance conferred by the mutants isolated here and the previously reported S222T mutant,25 we evaluated the EC50 of several mutants (Fig. 4B). Indeed, the three P274 mutants we tested (P274K, M and R) exhibited higher EC50 values than control E. coli cells that express wild-type DXR and were edited at the galK gene. The highly enriched mutation we observed in our CREATE experiments, P274K, had an EC50 value of 6.697μM, 5.4 times that of the wild-type DXR (EC50=1.245μM), and similar to the S222T mutant (EC50=6.33μM). The EC50 values of P274M and R were 3.36μM and 5.532μM respectively, situating these mutants between the S222T mutant and the wild-type allele. EC50 values and their corresponding confidence intervals of all tested mutants are shown in Table S2.
DISCUSSION
Drug resistance is an emerging concern for human health, affecting millions of patients worldwide. It spans from aggressive multidrug-resistant bacteria in hospitals to acquired resistance of cancer cells, to resistance to infectious diseases such as malaria. Hence, the ability to predict the potential emergence of resistance mutations and their relative probability before they arise in the clinic is important for future drug design. Here, we applied the CREATE technology to generate a relatively small genomic saturation mutagenesis library of the E. coli ispC gene, with the aim of isolating mutations that confer resistance to its specific inhibitor, FSM.
The CREATE approach enables the editing of target genomes rapidly, efficiently and systematically, leaving only a silent scar at an adjacent PAM site.27 Such genome editing techniques provide the opportunity to study more biologically relevant phenotypes, as plasmid-derived expression levels can introduce biases or ignore contributions of wild-type genomic alleles that may potentially interfere with interpretation, particularly when targeting essential genes as described here.
Sequencing individual FSM-resistant colonies revealed that a 100% match to the CREATE cassette design is not necessary, and at least four single base mismatches are tolerated. Most unedited colonies harbored a chimeric cassette, with non-matching homology arms and gRNAs. While such errors are common in multiplex PCR reactions, there are several approaches to reduce their abundance.36,37 Such crossovers, if not further mutated, will result in cell death due to the gRNA-induced genome cleavage. However, further mutations in the gRNA spacer region (as seen in all wild-type colonies, Figure S2) are a concern, since cells carrying them will not undergo any double-strand breaks and might proliferate significantly faster than edited cells, which require some recovery time prior to further growth. Hence, increasing the cassette design integrity is of importance. Here, we filtered the deep sequencing data to include cassettes that are 99% identical to the original designs. For a 200bp cassette, this filter allows at most two point mutations within a cassette and removes all more heavily mutated cassettes from further analysis.
CREATE was designed to provide genome-wide variant analysis indirectly via the plasmid-encoded editing cassette sequence. In this study we have shown that by targeting a single locus we can readily validate the CREATE tracking method using direct sequencing of genomically-derived amplicons. Both direct and indirect sequencing identified mutants at the P274 site to be highly enriched following FSM selection (Figures 3 and S4). Either method has its advantages and disadvantages: first, the CREATE plasmid sequencing was much deeper, with significantly more relevant reads than the genomic sequencing. This is due to the sequencing method used for every run; while the CREATE cassette was designed to fit into a single deep sequencing read (300bp), the E. coli ispC gene is significantly longer, requiring a different sequencing approach.
We used the Nextera XT DNA Library Preparation Kit (Illumina) that involves fragmenting the PCR-amplified genomic segment. Since every gene copy was mutated in a single site, most fragments were wild-type and were filtered out at the analysis phase. In contrast, every read of the CREATE plasmid sequencing reaction contained relevant mutant information. However, because CREATE plasmid reads serve as a genomic proxy, they carry a higher probability of false positive reads, correlating with the selection strength. Our colony sequencing resulted in some wild-type genomic ispC, indicating background adaptation. Indeed, unlike trimethoprim resistance that can be acquired mostly by mutations in its target, DHFR,38–40 FSM resistance may arise via other mechanisms. Expression of the efflux pump coded by the E. coli FSM-resistance gene, fsr, was shown to confer resistance,32 as well as deletions of the adenylate cyclase (cya) and the glycerol-3-phosphate transporter (glpT) genes.33 Such adaptation events might have occurred in our selections thus allowing enrichment regardless of a mutation within the ispC gene. False positives may be identified and ignored by isolating mutations that are consistently enriched in several experimental repeats, such as the case with the P274K mutation. Nevertheless, since both direct and indirect approaches agree on the most enriched mutants, we demonstrate that CREATE is amenable for use in relatively weak selections such as FSM.
Mutations in proline 274 were highly enriched following FSM treatment. Adaptive mutations include charged residues which represent four out of the top six (K, H, E, and R) while the most diluted mutations are composed of hydrophobic amino acids. Replacing proline with another hydrophobic amino acid may either not alter FSM binding at all thus not conferring any resistance, or it may disrupt the DXR structure to the point that it interferes with its essential enzymatic activity. The current study cannot distinguish between these two options, and further studies are required for elucidating the resistance mechanisms of P274 mutations to FSM.
Interestingly, five out of the six most adaptive mutations cannot be achieved by mutating a single base in the proline codon used at this position and require at least two base changes: for the CCG codon to be mutated to K, H, E, and M, two base changes are required, while a mutation to N requires a complete codon change. This is in line with previous reports that in some cases the highest peaks in the mutational landscape are inaccessible when limited to single-base changes within a codon.27,41–43 The reason for this is that dramatic phenotypic changes sometimes require a considerable difference in structural or chemical properties, which is buffered by the genetic code structure.43 Two or three base changes within a codon may occur either simultaneously (given enough genomes and high error rates), or sequentially (given enough time and depending on the fitness contribution of each individual step). Nevertheless, these are not expected to emerge in laboratory timescales and are less likely to occur naturally. Hence, the P274R mutation may prove to be more clinically relevant than the other enriched mutations in this site (Fig. 4).
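The codon arithmetic in this paragraph can be checked with a short sketch. It assumes the standard genetic code and the CCG (E. coli) and CCA (P. falciparum) proline codons stated in the text; the function name `min_base_changes` is illustrative and is not part of the original analysis.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order
# (first base slowest, third base fastest); '*' marks stop codons.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_base_changes(codon, amino_acid):
    """Fewest base substitutions that turn `codon` into any codon for `amino_acid`."""
    return min(
        sum(a != b for a, b in zip(codon, target))
        for target, aa in CODON_TABLE.items()
        if aa == amino_acid
    )


# Base changes needed from the E. coli P274 codon (CCG) for the residues
# discussed in the text.
for aa in "KHEMNR":
    print(aa, min_base_changes("CCG", aa))
```

Under these assumptions only arginine is one substitution away from CCG (via CGG), and the same holds for the CCA codon (via CGA).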
To date, a single mutation, S222T, was reported to confer FSM resistance in E. coli.25 While S222 was targeted in our library, it was not detected as enriched following FSM selection, but rather was depleted between the sample taken after two hours of recovery and the overnight, unselected culture (Fig. 1E vs. 1H). The source of this discrepancy is unknown, but there are at least two possible explanations. First, since the CREATE cassette design is dictated by the genomic sequence, some cassettes will be more efficient than others. The S222T cassette might have been less efficient in editing, thus not generating enough mutants to support a stable subpopulation. The second explanation may be the differences between the two approaches: while Armstrong et al. isolated the mutant by colony picking, here the selection was made in culture, at the population level, adding an element of competition between resistant clones during recovery as well as during overnight culture. Indeed, the Km of the S222T mutant was shown to be significantly increased, reducing the affinity of the enzyme to its substrate, DOXP.25 While Armstrong et al. did not identify any effect of the S222T mutation on E. coli doubling time, this difference in Km may become apparent in our system, where different mutants are pooled and compete for resources. We have reconstructed the S222T mutation using the CREATE approach and validated its FSM-resistance properties (Fig. 4B, S8).
The ability to perform rapid and systematic sequence to activity mapping in multiplex facilitates the isolation of drug-specific resistant mutants before they occur in the field. Armed with such knowledge, drug developers may derive drugs that are less likely to develop acquired resistance via mutagenesis, increasing the efficiency of the development cycle. While this study was performed on the ispC gene of the model organism E. coli, the results presented here are of relevance for other, more clinically relevant pathogens. Multiple alignments of the complete list of reviewed DXR genes in UniProt,44 spanning from bacteria through protozoa to plants, show high conservation values of the identified sites: P274 is conserved in 87.8% of the proteins while S186, D275, and M276 are 100%, 95.9% and 99.3% conserved, respectively. Importantly, and similarly to the previously reported S222 site, these amino acids are conserved in Plasmodium falciparum, making these results especially relevant for antimalarial drug development.25 Specifically, while the Plasmodium falciparum DXR proline residue codon differs from the E. coli P274 codon (CCA vs. CCG, respectively), the evolutionary implications on the development of resistance are similar to those of E. coli, with the P to R mutation being the most clinically relevant by requiring a single base change in the codon. Nevertheless, these results may be expanded by replacing the native E. coli gene with the Plasmodium ortholog and testing for organism-exclusive mutations. Ultimately, with the expansion of high throughput sequence to activity technologies to non-model organisms, similar experiments may be performed directly on the organism of interest.
MATERIALS AND METHODS
Plasmids and strains:
The plasmid coding for Cas9 was previously described.45 The CREATE backbone plasmid was described in the first CREATE paper.27 The recombineering plasmid used in this study was pSim5.46 All experiments were performed using the standard E. coli K12 MG1655 strain.
CREATE cassette design:
A CREATE cassette is 200bp long and harbors two elements: (1) an editing segment, or homology arm, that corresponds to a genomic region containing the target site. The target site is altered to translate to the amino acid of interest, and an adjacent PAM site is silently mutated to eliminate Cas9 recognition while preserving the amino acid sequence. (2) A corresponding gRNA, targeting the mutated PAM site. For a sequence example refer to Fig. S1. The complete library design can be found as Supplementary Database S1.
Target sites:
Target sites are based on previous structural analyses and represent portions of the NADPH, ligand, and FSM interacting amino acids.34,35 The library targets the following residues: 125, 150, 151, 152, 153, 186, 207, 208, 209, 210, 211, 212, 213, 214, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 254, 257, 274, 275, 276. Amino acid number 14 was randomly added to the library as an internal control.
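The library size follows directly from this residue list. The sketch below enumerates every (position, amino acid) pair, including the native amino acid at each site as the Results describe; the variable and function names are illustrative only, and the wild-type residue identities are deliberately left out because the text does not list them for every site.

```python
# Residue positions targeted in ispC, as listed in the text.
TARGET_RESIDUES = [
    125,
    *range(150, 154),   # 150-153
    186,
    *range(207, 215),   # 207-214
    *range(222, 235),   # 222-234
    254,
    257,
    *range(274, 277),   # 274-276
]
CONTROL_RESIDUE = 14    # randomly added internal control

# The 20 standard amino acids (one-letter codes); each site is saturated to
# all of them, the native residue included as a synonymous control.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_library(residues, amino_acids=AMINO_ACIDS):
    """Return every (position, amino acid) pair in the saturation design."""
    return [(pos, aa) for pos in residues for aa in amino_acids]


library = enumerate_library(TARGET_RESIDUES + [CONTROL_RESIDUE])
print(len(TARGET_RESIDUES), "target sites;", len(library), "designed variants")
```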
Library preparation:
For a detailed protocol refer to Garst et al.27 ispC library oligos were purchased from Agilent and were received pooled with other CREATE libraries. DNA was purified using acrylamide gel electrophoresis. The ispC library was amplified using its unique primers, was cloned into a plasmid backbone, and was transformed into electrocompetent cells. Library plasmids were extracted (miniprep kit, Qiagen) and were transformed into electrocompetent, heat-shocked K12 MG1655 cells that were already harboring both pSim5 and Cas9 plasmids. For control, a galK (a Y145 to a stop codon mutation) CREATE plasmid was also transformed into separate cells. The galK edited cells were also used to assess editing efficiency using a pink/white colorimetric assay.47,48 All cells were recovered in LB with carbenicillin (CREATE plasmid), kanamycin (Cas9), and arabinose (Cas9 induction) for two hours.
Selection:
All selection experiments were performed with 100μM FSM (Molecular Probes, catalog number: F23103). Recovered cells were divided into three groups and were either plated on FSM-containing plates, or cultured in fresh media with or without FSM, and with carbenicillin, kanamycin, and arabinose (Fig. 1). Cells were incubated overnight at 37°C, followed by optical density measurements or colony picking.
Deep sequencing:
For CREATE plasmid sequencing, the plasmid was extracted from the cultures (steps E, G, and H in Fig. 1), and the editing cassettes were PCR-amplified, with each sample harboring a different sequence barcode, as shown before.27 Amplicons were purified and pooled. For the genomic sequencing, cells were boiled for 5 minutes and the complete ispC gene was amplified from the different samples, including 200 base pairs up- and downstream from the gene using the following primers: forward - GCACTGTTGAAAGATAAAGAGATCAGCG and reverse - CTGGCAATTTTTCGCTAAGTGGTTGAG. Samples were digested and barcoded using the Nextera XT kit (Illumina). Both samples were sequenced using a MiSeq deep sequencer (Illumina) by the University of Colorado Boulder sequencing core.
Plasmid-based CREATE tracking analysis:
Processing of high-throughput sequencing reads and query matching: High-throughput sequencing of CREATE plasmids was performed using an Illumina MiSeq 2×150 paired-end reads run. Reads were demultiplexed according to an experiment-specific unique barcode, allowing a maximum of 3 mismatches, and then merged using the PANDAseq assembler (v2.10). Merged reads were matched to the database of all designed cassettes using the usearch_global algorithm (v9.2.64), with an identity threshold of 90% and a minimal alignment length of 150 bp (75% of total). 40 hits were allowed for each query, which were subsequently sorted by percent identity, and the best-matching cassette was chosen. To generate read counts for each designed cassette, only reads that had a full alignment and an identity higher than 99% were used.
Data filtering and enrichment analysis: Data frames of read counts were generated and processed using the Pandas data analysis python package (v0.20.2). Final read counts for each cassette after FSM selection were compared against counts for a 2-hour growth baseline and an overnight growth baseline (without selection). Enrichment scores were calculated as the logarithm (base 2) of the ratio of post-selection (FSM) to pre-selection frequencies for each individual replicate. Frequencies were determined by dividing the read counts for each variant by the total experimental reads. Since low-count variants are subject to counting error, variants with initial counts that sum up to less than 100 across all three replicates were not included in the analysis.
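The enrichment calculation described here can be sketched in a few lines. This is a minimal illustration using only the Python standard library rather than the Pandas workflow the authors used; the function names and the toy read counts are hypothetical, and skipping variants with zero counts is an assumption, since the text does not say how zeros were handled.

```python
import math

MIN_TOTAL_COUNT = 100  # minimum pre-selection reads summed over replicates (from the text)


def frequencies(counts):
    """Convert raw read counts per variant into fractions of all reads."""
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def enrichment_scores(pre_counts, post_counts):
    """log2(post-selection frequency / pre-selection frequency) for one replicate."""
    pre_freq = frequencies(pre_counts)
    post_freq = frequencies(post_counts)
    scores = {}
    for variant, f_pre in pre_freq.items():
        f_post = post_freq.get(variant, 0.0)
        # Assumption: variants with a zero count on either side are skipped.
        if f_pre > 0 and f_post > 0:
            scores[variant] = math.log2(f_post / f_pre)
    return scores


def passes_count_filter(variant, pre_counts_per_replicate, min_total=MIN_TOTAL_COUNT):
    """Keep a variant only if its pre-selection counts sum to at least `min_total`."""
    return sum(rep.get(variant, 0) for rep in pre_counts_per_replicate) >= min_total


# Toy counts for a single replicate (not real data).
pre = {"P274K": 100, "P274L": 100, "G14G": 200}
post = {"P274K": 400, "P274L": 50, "G14G": 350}
scores = enrichment_scores(pre, post)
print(scores)
```

A positive score means the variant's share of reads grew under FSM selection, and a negative score means it shrank.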
The enrichment scores were used to rank the fitness contributions of all variants under FSM selection in each individual replicate. To assess significance, the average of the enrichment scores of all synonymous mutations included in the library was used as a reference (i.e., the average wild-type enrichment, μ). Bootstrap analysis (resampling with replacement 20,000 times) was performed to obtain a 95% confidence interval for μ. Variants were considered significantly enriched under FSM selection if their enrichment scores were at least μ ± 2σ (i.e., a p-value of 0.05, assuming a normal distribution of enrichment scores).
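As a rough illustration of the bootstrap step, the following sketch resamples the synonymous-mutation enrichment scores with replacement and reads a 95% interval for their mean from the percentiles of the resampled means. The percentile method, the function name, and the fixed random seed are assumptions; the source does not state how the interval was derived from the resamples.

```python
import random

def bootstrap_mean_ci(values, n_resamples=20000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`.

    Returns (mean, (lower, upper)). The percentile method is an assumption.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n  # one resample with replacement
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(values) / n, (lower, upper)

# Made-up enrichment scores standing in for the synonymous (wild-type-like) variants.
synonymous_scores = [0.1, -0.2, 0.05, 0.0, 0.15, -0.1]
mu, (ci_low, ci_high) = bootstrap_mean_ci(synonymous_scores, n_resamples=2000)
```

The default of 20,000 resamples matches the number given in the text; the usage line lowers it only to keep the toy example quick.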
Genomic ispC variant sequencing analysis:
Data processing and variant calling: Following standard quality filtering and demultiplexing of the reads, variants were called using a k-mer strategy. Briefly, a database was generated containing all designed ispC mutations. Then, for each variant, the database was searched for a 10-mer sequence unique to that variant using custom Python scripts. It should be noted that 23 cassettes, all encoding synonymous mutations, did not contain any unique k-mers and were not included in the analysis. After the list of unique k-mers was generated, variants were called and counted using the Unix grep command.
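The k-mer strategy can be sketched as follows. This is an illustrative reconstruction, not the authors' scripts: the function names are hypothetical, the reads are matched in Python rather than with grep, reverse-complement reads are not handled, and the toy sequences use k = 4 instead of the 10-mers described in the text.

```python
from collections import Counter

def unique_kmers(variant_seqs, k=10):
    """Map each variant to the k-mers that occur in that variant's sequence only."""
    owners = {}  # k-mer -> set of variants containing it
    for name, seq in variant_seqs.items():
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            owners.setdefault(kmer, set()).add(name)
    unique = {name: [] for name in variant_seqs}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].append(kmer)
    # Variants left with an empty list have no unique k-mer and cannot be called.
    return {name: sorted(kmers) for name, kmers in unique.items()}

def count_variants(reads, tag_by_variant):
    """Count reads containing each variant's chosen k-mer (similar to `grep -c`)."""
    counts = Counter()
    for read in reads:
        for name, tag in tag_by_variant.items():
            if tag in read:
                counts[name] += 1
    return counts

# Toy sequences (not real ispC sequences); "wt" stands in for the unedited gene.
designs = {"wt": "ACGTACGTAA", "m1": "ACGTTCGTAA"}
tags = unique_kmers(designs, k=4)
counts = count_variants(["GGCGTTGG", "ACGTACGT", "TTCGTT"], {"m1": tags["m1"][0]})
```

Designs that differ from each other only outside every k-mer window, or not at all at the sequence level, end up with an empty list, which corresponds to the cassettes that had to be excluded.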
Enrichment analysis:
Variant counts post-selection (FSM) were compared to pre-selection counts (2 hours or overnight) using the Pandas data analysis Python package (v0.20.2). As in the analysis performed for CREATE plasmids, enrichment scores were calculated for each individual replicate as the base-2 logarithm of the ratio of post-selection (FSM) to pre-selection frequencies, with frequencies determined by dividing the read count for each variant by the total number of reads in the experiment. To assess significance, the mean value of each complete comparison was calculated, and the significance threshold was set to 2 standard deviations from the calculated mean.
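A minimal sketch of the two-standard-deviation criterion is given below. It assumes that the mean and standard deviation are taken over all enrichment scores in one comparison, that the sample standard deviation is used, and that only the enriched (upper) side is of interest; none of these details is specified in the text, and the function name is hypothetical.

```python
import statistics

def flag_enriched(scores, n_sd=2.0):
    """Return (variants above mean + n_sd * SD, cutoff) for one comparison.

    scores maps variant name -> enrichment score.
    """
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    cutoff = mean + n_sd * sd
    flagged = {name: s for name, s in scores.items() if s > cutoff}
    return flagged, cutoff

# Toy scores (not data from the study): twenty neutral variants and one strong hit.
toy_scores = {f"v{i}": 0.0 for i in range(20)}
toy_scores["hit"] = 5.0
flagged, cutoff = flag_enriched(toy_scores)
```

Because the cutoff is computed from the same distribution that contains the hits, a few very strongly enriched variants raise the threshold for the rest.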
Mutant validation:
Single colonies that grew successfully on FSM were picked and cultured overnight. CREATE plasmids were extracted and retransformed into fresh, heat-shocked, electrocompetent K12 MG1655 cells already harboring the Cas9 and pSim5 plasmids. The retransformed strains were allowed to recover for two hours at 37°C. Following recovery, cells were diluted 1:1000 in fresh LB medium supplemented with carbenicillin and 100 μM FSM and were grown at 37°C with continuous shaking in a BioTek Synergy 2 microplate reader. Optical density was measured after 24 hours of incubation. The means and corresponding standard deviations from three independent experiments are shown (Fig. 2B).
Growth inhibition assay in E. coli
Growth inhibition of E. coli was measured as described by Zhang et al.49 Overnight cultures of E. coli MG1655 strains carrying the CREATE plasmids were diluted 1:100 into fresh LB medium supplemented with carbenicillin and grown to an optical density at 600 nm (OD600) of 0.5. Cultures were diluted to 10⁵ CFU/mL in 150 μL/well of a 96-well plate, with the indicated amounts of fosmidomycin. Strains were grown at 37°C with continuous shaking, and serial OD600 measurements were taken with a BioTek Synergy 2 microplate reader. GraphPad Prism (version 8.00 for Macintosh, GraphPad Software, La Jolla, California, USA, www.graphpad.com) was used to calculate the half-maximal effective concentrations and their corresponding confidence intervals (EC50 and CI) at 8 hours of growth (mid-logarithmic growth phase). Data points in Figure 4B represent three independent experiments, with eight replicates each.
Supplementary Material
ACKNOWLEDGMENTS
We would like to thank Yael David for her helpful insights and Emily Freed for her help with the manuscript preparation.
This work was funded by the US Department of Energy, Grant No. DE-SC008812, and by the National Institute of Allergy and Infectious Diseases (NIAID) at the NIH, grant number 1R21AI128296-01A1.
Footnotes
ASSOCIATED CONTENT
Supporting Information including library sequences and examples, statistical correlations, additional enrichment graphs and tables, growth curves, and EC50 values, is available free of charge on the ACS Publications website at http://pubs.acs.org.
Conflict of interest
R.T.G. and A.D.G. have financial interests in Inscripta, Inc., which is commercializing the CREATE technology.
REFERENCES
- (1).Davies J; Davies D. Origins and Evolution of Antibiotic Resistance. Microbiol. Mol. Biol. Rev 2010, 74 (3), 417–433.
- (2).Morar M; Wright GD The Genomic Enzymology of Antibiotic Resistance. Annu. Rev. Genet 2010, 44, 25–51.
- (3).Yelin I; Kishony R Antibiotic Resistance. Cell 2018, 172 (5), 1136–1136.e1.
- (4).Hoeijmakers JH Genome Maintenance Mechanisms for Preventing Cancer. Nature 2001, 411 (6835), 366–374.
- (5).Negrini S; Gorgoulis VG; Halazonetis TD Genomic Instability--an Evolving Hallmark of Cancer. Nat. Rev. Mol. Cell Biol 2010, 11 (3), 220–228.
- (6).Wagner J; Gruz P; Kim SR; Yamada M; Matsui K; Fuchs RP; Nohmi T The dinB Gene Encodes a Novel E. Coli DNA Polymerase, DNA Pol IV, Involved in Mutagenesis. Mol. Cell 1999, 4 (2), 281–286.
- (7).Tang M; Pham P; Shen X; Taylor JS; O’Donnell M; Woodgate R; Goodman MF Roles of E. Coli DNA Polymerases IV and V in Lesion-Targeted and Untargeted SOS Mutagenesis. Nature 2000, 404 (6781), 1014–1018.
- (8).Rosenberg SM Evolving Responsively: Adaptive Mutation. Nat. Rev. Genet 2001, 2 (7), 504–515.
- (9).Winkler JD; Halweg-Edwards AL; Erickson KE; Choudhury A; Pines G; Gill RT The Resistome: A Comprehensive Database of Escherichia Coli Resistance Phenotypes. ACS Synth. Biol 2016.
- (10).Laxminarayan R; Duse A; Wattal C; Zaidi AKM; Wertheim HFL; Sumpradit N; Vlieghe E; Hara GL; Gould IM; Goossens H; et al. Antibiotic Resistance—the Need for Global Solutions. Lancet Infect. Dis 2013, 13 (12), 1057–1098.
- (11).Brown ED; Wright GD Antibacterial Drug Discovery in the Resistance Era. Nature 2016, 529 (7586), 336–343.
- (12).Aminov RI A Brief History of the Antibiotic Era: Lessons Learned and Challenges for the Future. Front. Microbiol 2010, 1, 134.
- (13).Yarden Y; Pines G The ERBB Network: At Last, Cancer Therapy Meets Systems Biology. Nat. Rev. Cancer 2012, 12 (8), 553–563.
- (14).Holohan C; Van Schaeybroeck S; Longley DB; Johnston PG Cancer Drug Resistance: An Evolving Paradigm. Nat. Rev. Cancer 2013, 13 (10), 714–726.
- (15).Camidge DR; Pao W; Sequist LV Acquired Resistance to TKIs in Solid Tumours: Learning from Lung Cancer. Nat. Rev. Clin. Oncol 2014, 11 (8), 473–481.
- (16).Rohmer M The Discovery of a Mevalonate-Independent Pathway for Isoprenoid Biosynthesis in Bacteria, Algae and Higher Plants†. Nat. Prod. Rep 1999, 16 (5), 565–574.
- (17).Imlay L; Odom AR Isoprenoid Metabolism in Apicomplexan Parasites. Curr Clin Microbiol Rep 2014, 1 (3–4), 37–50.
- (18).Rodríguez-Concepción M The MEP Pathway: A New Target for the Development of Herbicides, Antibiotics and Antimalarial Drugs. Curr. Pharm. Des 2004, 10 (19), 2391–2400.
- (19).Jomaa H; Wiesner J; Sanderbrand S; Altincicek B; Weidemeyer C; Hintz M; Türbachova I; Eberl M; Zeidler J; Lichtenthaler HK; et al. Inhibitors of the Nonmevalonate Pathway of Isoprenoid Biosynthesis as Antimalarial Drugs. Science 1999, 285 (5433), 1573–1576.
- (20).Borrmann S; Issifou S; Esser G; Adegnika AA; Ramharter M; Matsiegui P-B; Oyakhirome S; Mawili-Mboumba DP; Missinou MA; Kun JFJ; et al. Fosmidomycin-Clindamycin for the Treatment of Plasmodium Falciparum Malaria. J. Infect. Dis 2004, 190 (9), 1534–1540.
- (21).Borrmann S; Adegnika AA; Moussavou F; Oyakhirome S; Esser G; Matsiegui P-B; Ramharter M; Lundgren I; Kombila M; Issifou S; et al. Short-Course Regimens of Artesunate-Fosmidomycin in Treatment of Uncomplicated Plasmodium Falciparum Malaria. Antimicrob. Agents Chemother 2005, 49 (9), 3749–3754.
- (22).Brown AC; Parish T Dxr Is Essential in Mycobacterium Tuberculosis and Fosmidomycin Resistance Is due to a Lack of Uptake. BMC Microbiol. 2008, 8, 78.
- (23).Nair SC; Brooks CF; Goodman CD; Sturm A; Strurm A; McFadden GI; Sundriyal S; Anglin JL; Song Y; Moreno SNJ; et al. Apicoplast Isoprenoid Precursor Synthesis and the Molecular Basis of Fosmidomycin Resistance in Toxoplasma Gondii. J. Exp. Med 2011, 208 (7), 1547–1559.
- (24).Dharia NV; Sidhu ABS; Cassera MB; Westenberger SJ; Bopp SE; Eastman RT; Plouffe D; Batalov S; Park DJ; Volkman SK; et al. Use of High-Density Tiling Microarrays to Identify Mutations Globally and Elucidate Mechanisms of Drug Resistance in Plasmodium Falciparum. Genome Biol. 2009, 10 (2), R21.
- (25).Armstrong CM; Meyers DJ; Imlay LS; Meyers CF; Odom AR Resistance to the Antimicrobial Agent Fosmidomycin and an FR900098 Prodrug through Mutations in the Deoxyxylulose Phosphate Reductoisomerase Gene (dxr). Antimicrob. Agents Chemother 2015, 59 (9), 5511–5519.
- (26).Pines G; Gill RT Dynamic Management of Codon Compression for Saturation Mutagenesis. In Synthetic Biology: Methods and Protocols; Braman, J. C., Ed.; Springer; New York: New York, NY, 2018; pp 171–189.
- (27).Garst AD; Bassalo MC; Pines G; Lynch SA; Halweg-Edwards AL; Liu R; Liang L; Wang Z; Zeitoun R; Alexander WG; et al. Genome-Wide Mapping of Mutations at Single-Nucleotide Resolution for Protein, Metabolic and Genome Engineering. Nat. Biotechnol 2016.
- (28).Firnberg E; Labonte JW; Gray JJ; Ostermeier M A Comprehensive, High-Resolution Map of a Gene’s Fitness Landscape. Mol. Biol. Evol 2014, 31 (6), 1581–1592.
- (29).Starita LM; Young DL; Islam M; Kitzman JO; Gullingsrud J; Hause RJ; Fowler DM; Parvin JD; Shendure J; Fields S Massively Parallel Functional Analysis of BRCA1 RING Domain Variants. Genetics 2015, 200 (2), 413–422.
- (30).Jacquier H; Birgy A; Le Nagard H; Mechulam Y; Schmitt E; Glodt J; Bercot B; Petit E; Poulain J; Barnaud G; et al. Capturing the Mutational Landscape of the Beta-Lactamase TEM-1. Proceedings of the National Academy of Sciences 2013, 110 (32), 13067–13072.
- (31).Stiffler MA; Hekstra DR; Ranganathan R Evolvability as a Function of Purifying Selection in TEM-1 β-Lactamase. Cell 2015, 160 (5), 882–892.
- (32).Fujisaki S; Ohnuma S; Horiuchi T; Takahashi I; Tsukui S; Nishimura Y; Nishino T; Kitabatake M; Inokuchi H Cloning of a Gene from Escherichia Coli That Confers Resistance to Fosmidomycin as a Consequence of Amplification. Gene 1996, 175 (1–2), 83–87.
- (33).Sakamoto Y; Furukawa S; Ogihara H; Yamasaki M Fosmidomycin Resistance in Adenylate Cyclase Deficient (cya) Mutants of Escherichia Coli. Biosci. Biotechnol. Biochem 2003, 67 (9), 2030–2033.
- (34).Mac Sweeney A; Lange R; Fernandes RPM; Schulz H; Dale GE; Douangamath A; Proteau PJ; Oefner C The Crystal Structure of E. Coli 1-Deoxy-D-Xylulose-5-Phosphate Reductoisomerase in a Ternary Complex with the Antimalarial Compound Fosmidomycin and NADPH Reveals a Tight-Binding Closed Enzyme Conformation. J. Mol. Biol 2005, 345 (1), 115–127.
- (35).Steinbacher S; Kaiser J; Eisenreich W; Huber R; Bacher A; Rohdich F Structural Basis of Fosmidomycin Action Revealed by the Complex with 2-C-Methyl-D-Erythritol 4-Phosphate Synthase (IspC) Implications for the Catalytic Mechanism and Anti-Malaria Drug Development. J. Biol. Chem 2003, 278 (20), 18401–18407.
- (36).Kanagawa T Bias and Artifacts in Multitemplate Polymerase Chain Reactions (PCR). J. Biosci. Bioeng 2003, 96 (4), 317–323.
- (37).Zeitoun RI; Garst AD; Degen GD; Pines G; Mansell TJ; Glebes TY; Boyle NR; Gill RT Multiplexed Tracking of Combinatorial Genomic Mutations in Engineered Cell Populations. Nat. Biotechnol 2015, 33 (6), 631–637.
- (38).Toprak E; Veres A; Michel J-B; Chait R; Hartl DL; Kishony R Evolutionary Paths to Antibiotic Resistance under Dynamically Sustained Drug Selection. Nat. Genet 2012, 44 (1), 101–105.
- (39).Oz T; Guvenek A; Yildiz S; Karaboga E; Tamer YT; Mumcuyan N; Ozan VB; Senturk GH; Cokol M; Yeh P; et al. Strength of Selection Pressure Is an Important Parameter Contributing to the Complexity of Antibiotic Resistance Evolution. Mol. Biol. Evol 2014, 31 (9), 2387–2401.
- (40).Lázár V; Nagy I; Spohn R; Csörgő B; Györkei Á; Nyerges Á; Horváth B; Vörös A; Busa-Fekete R; Hrtyan M; et al. Genome-Wide Analysis Captures the Determinants of the Antibiotic Cross-Resistance Interaction Network. Nat. Commun 2014, 5, 4352.
- (41).Firnberg E; Ostermeier M The Genetic Code Constrains yet Facilitates Darwinian Evolution. Nucleic Acids Res. 2013, 41 (15), 7420–7428.
- (42).Miyazaki K; Arnold FH Exploring Nonnatural Evolutionary Pathways by Saturation Mutagenesis: Rapid Improvement of Protein Function. J. Mol. Evol 1999, 49 (6), 716–720.
- (43).Pines G; Winkler JD; Pines A; Gill RT Refactoring the Genetic Code for Increased Evolvability. MBio 2017, 8 (6), e01654–17.
- (44).The UniProt Consortium. UniProt: The Universal Protein Knowledgebase. Nucleic Acids Res. 2017, 45 (D1), D158–D169.
- (45).Pines G; Pines A; Garst AD; Zeitoun RI; Lynch SA; Gill RT Codon Compression Algorithms for Saturation Mutagenesis. ACS Synth. Biol 2015, 4 (5), 604–614.
- (46).Datta S; Costantino N; Court DL A Set of Recombineering Plasmids for Gram-Negative Bacteria. Gene 2006, 379, 109–115.
- (47).Sawitzke JA; Costantino N; Li X-T; Thomason LC; Bubunenko M; Court C; Court DL Probing Cellular Processes with Oligo-Mediated Recombination and Using the Knowledge Gained to Optimize Recombineering. J. Mol. Biol 2011, 407 (1), 45–59.
- (48).Pines G; Freed EF; Winkler JD; Gill RT Bacterial Recombineering: Genome Engineering via Phage-Based Homologous Recombination. ACS Synth. Biol 2015, 4 (11), 1176–1185.
- (49).Zhang B; Watts KM; Hodge D; Kemp LM; Hunstad DA; Hicks LM; Odom AR A Second Target of the Antimalarial and Antibacterial Agent Fosmidomycin Revealed by Cellular Metabolic Profiling. Biochemistry 2011, 50 (17), 3570–3577.