Significance
This work presents the genome-wide replacement of all rare AGR (AGA and AGG) arginine codons in the essential genes of Escherichia coli with synonymous CGN alternatives. Synonymous codon substitutions can lethally impact noncoding function by disrupting mRNA secondary structure and ribosomal binding site-like motifs. Here we quantitatively define the range of tolerable deviation in these metrics and use this relationship to provide critical insight into codon choice in recoded genomes. This work demonstrates that genome-wide removal of AGR is likely to be possible and provides a framework for designing genomes with radically altered genetic codes.
Keywords: codon choice, genome editing, recoded genomes
Abstract
The degeneracy of the genetic code allows nucleic acids to encode amino acid identity as well as noncoding information for gene regulation and genome maintenance. The rare arginine codons AGA and AGG (AGR) present a case study in codon choice, with AGRs encoding important transcriptional and translational properties distinct from the other synonymous alternatives (CGN). We created a strain of Escherichia coli with all 123 instances of AGR codons removed from all essential genes. We readily replaced 110 AGR codons with the synonymous CGU codon, but the remaining 13 "recalcitrant" AGRs required diversification to identify viable alternatives. Successful replacement codons tended to conserve local ribosomal binding site-like motifs and local mRNA secondary structure, sometimes at the expense of amino acid identity. Based on these observations, we empirically defined metrics for a multidimensional "safe replacement zone" (SRZ) within which alternative codons are more likely to be viable. To evaluate synonymous and nonsynonymous alternatives to essential AGRs further, we implemented a CRISPR/Cas9-based method to deplete a diversified population of a wild-type allele, allowing us to evaluate exhaustively the fitness impact of all 64 codon alternatives. Using this method, we confirmed the relevance of the SRZ by tracking codon fitness over time in 14 different genes, finding that codons that fall outside the SRZ are rapidly depleted from a growing population. Our unbiased and systematic strategy for identifying unpredicted design flaws in synthetic genomes and for elucidating rules governing codon choice will be crucial for designing genomes exhibiting radically altered genetic codes.
The genetic code possesses inherent redundancy (1), with up to six different codons specifying a single amino acid. Although it is tempting to approximate synonymous codons as equivalent (2), most prokaryotes and many eukaryotes (3, 4) display a strong preference for certain codons over synonymous alternatives (5, 6). Although different species have evolved to prefer different codons, codon bias is largely consistent within each species (5). However, within a given genome, codon bias varies among individual genes and with codon position within a gene, suggesting that codon choice has functional consequences. For example, rare codons are enriched at the beginning of essential genes (7, 8), and codon use strongly affects protein levels (9–11), especially at the N terminus (12). These observations suggest that codon use plays a poorly understood role in regulating protein expression. Several hypotheses attempt to explain how codon use mediates this effect, including but not limited to facilitating ribosomal pausing early in translation to optimize protein folding (13); adjusting mRNA secondary structure to optimize translation initiation or to modulate mRNA degradation; preventing ribosome stalling by coevolving with tRNA levels (6); providing a "translational ramp" for proper ribosome spacing and effective translation (14); and providing a layer of translational regulation for independent control of each gene in an operon (15). Additionally, codon use may impact translational fidelity (16), and the proteome may be tuned by fine control of the decoding tRNA pools (17). Although Quax et al. (18) provide an excellent review of how biology chooses codons, systematic and exhaustive studies of codon choice in whole genomes are lacking. Empirical studies of codon choice have so far examined only a relatively small number of reporter genes (12, 19–22).
Several important questions must be answered as a first step toward designing custom genomes exhibiting new functions: How flexible is genome-wide codon choice? How does codon choice interact with the maintenance of cellular homeostasis? What heuristics can be used to predict which codons will conserve genome function?
Replacing all essential instances of a codon in a single strain would provide valuable insight into the constraints that determine codon choice and aid in the design of recoded genomes. Although the UAG stop codon has been completely removed from Escherichia coli (23), no genome-wide replacement of a sense codon has been reported. The AGG codon has been shown to permit efficient suppression with nonstandard amino acids (24–26), but in each of these studies AGG necessarily remains translated as Arg as well. No study has yet demonstrated that all instances of any sense codon can be removed from essential genes, a demonstration that is crucial for unambiguously reassigning sense codon translation function.
We chose to study the rare Arg codons AGA and AGG [termed “AGR” according to International Union of Pure and Applied Chemistry (IUPAC) conventions] because the literature suggests that they are among the most difficult codons to replace and that their similarity to ribosome-binding sequences (RBSs) underlies important noncoding functions (8, 27–30). Furthermore, their sparse use (123 instances in the essential genes of E. coli MG1655 and 4,228 instances in the entire genome) (Table 1 and Dataset S1) made replacing all AGR instances in essential genes a tractable goal, with essential genes serving as a stringent test set for identifying any fitness impact from codon replacement (31). Additionally, recent work has shown the difficulty of directly mutating some AGR codons to other synonymous codons (25), although the authors do not explain the mechanism of failure or report successful implementation of alternative designs. We attempted to remove all 123 instances of AGR codons from essential genes by replacing them with the synonymous CGU codon. CGU was chosen to disrupt the primary nucleic acid sequence maximally (AGR→CGU). We hypothesized that this strategy would maximize design flaws, thereby revealing rules for designing genomes with reassigned genetic codes. Importantly, individual codon targets were not inspected a priori to ensure an unbiased empirical search for design flaws.
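The codon labels used throughout this work (e.g., holB_AGA4, dnaT_AGA10, ftsI_AGA1759) appear to encode the 1-based nucleotide position of the codon's first base within the ORF, which for an in-frame codon is always of the form 3k + 1. The sketch below assumes that reading. It scans a coding sequence for in-frame AGA/AGG codons, labels them in this style, reports the normalized ORF position plotted in Fig. 2A, and applies the uniform AGR→CGU design (written as CGT in DNA). The function names and the toy sequence are hypothetical, and counting the stop codon in the ORF length is an assumption.

```python
# Hypothetical helpers: scan a CDS for in-frame AGR codons and apply AGR->CGU.
AGR_CODONS = {"AGA", "AGG"}


def find_agr_codons(gene: str, cds: str) -> list[dict]:
    """Return in-frame AGA/AGG codons of a CDS given as DNA from the start codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3  # ORF length in codons (stop codon included; an assumption)
    hits = []
    for i in range(n_codons):
        codon = cds[3 * i : 3 * i + 3]
        if codon in AGR_CODONS:
            nt_pos = 3 * i + 1  # 1-based position of the codon's first base
            hits.append({
                "name": f"{gene}_{codon}{nt_pos}",  # e.g. holB_AGA4
                "residue": i + 1,
                "normalized_position": (i + 1) / n_codons,  # residue / ORF length, as in Fig. 2A
            })
    return hits


def recode_agr_to_cgt(cds: str) -> str:
    """Replace every in-frame AGA/AGG with CGT (CGU in mRNA); other codons are unchanged."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    codons = (cds[i : i + 3] for i in range(0, len(cds), 3))
    return "".join("CGT" if c in AGR_CODONS else c for c in codons)


if __name__ == "__main__":
    toy = "ATGAGAGCAAGGTAA"  # hypothetical ORF: M R A R stop
    for hit in find_agr_codons("toy", toy):
        print(hit)
    print(recode_agr_to_cgt(toy))  # ATGCGTGCACGTTAA
```

Only in-frame codons are counted, which matches the text's focus on codons rather than on AGR trinucleotides in other frames.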
Table 1.
Summary of AGR codons changed by location in the genome, and failure rates by pool
Results
To construct this modified genome, we used coselection multiplex automatable genome engineering (CoS-MAGE) (32, 33) to create an E. coli strain (C123) with all 123 AGR codons removed from its essential genes (see Fig. 1A and Dataset S1 for a complete list of AGR codons in essential genes). CoS-MAGE leverages Lambda Red-mediated recombination (34, 35) and exploits the linkage between a mutation in a selectable allele (e.g., tolC) and nearby edits of interest (e.g., AGR conversions), thereby enriching for cells carrying those edits (Fig. S1). To streamline C123 construction, we started with E. coli strain EcM2.1, which was previously optimized for efficient Lambda Red-mediated genome engineering (33, 36). CoS-MAGE in EcM2.1 improves allele replacement frequency 10-fold over MAGE in nonoptimized strains but performs optimally when all edits are on the same replichore and within 500 kb of the selectable allele (33). To accommodate this requirement, we divided the genome into 12 segments containing all 123 AGR codons in essential genes. A tolC cassette was moved around the genome to enable CoS-MAGE in each segment, allowing us to prototype each set of AGR→CGU mutations rapidly across large cell populations in vivo (see General Replacement Strategy and Troubleshooting Strategy in Materials and Methods for a more detailed discussion). Of the 123 AGR codons in essential genes, 110 could be changed to CGU by this process (Fig. 1), revealing considerable flexibility of codon use for most essential genes. The frequency of allele replacement (here, AGR→CGU codon substitution) varied widely across these 110 permissive codons, with no clear correlation between allele replacement frequency and the normalized position of the AGR codon in its gene (Fig. 2A).
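The segmentation constraint above (edits on the same replichore as the selectable tolC allele and within 500 kb of it) can be illustrated with a simple greedy grouping. This is a hypothetical sketch, not the procedure used to define the 12 segments. It assumes each locus has already been assigned to a replichore, treats coordinates as linear (ignoring wrap-around of the circular chromosome), and ignores the downstream orientation shown in Fig. S1.

```python
from collections import defaultdict


def group_loci(loci: list[tuple[int, str]], window_bp: int = 500_000) -> list[tuple[str, list[int]]]:
    """Greedily group (position_bp, replichore) loci so each group spans <= window_bp on one replichore.

    Hypothetical sketch: linear coordinates, no wrap-around at the origin, and the
    selectable marker is imagined at the first (lowest-coordinate) locus of each group.
    """
    by_arm: dict[str, list[int]] = defaultdict(list)
    for pos, arm in loci:
        by_arm[arm].append(pos)

    groups = []
    for arm in sorted(by_arm):
        current: list[int] = []
        for pos in sorted(by_arm[arm]):
            # Start a new group once a locus falls outside the window opened by the group's first locus.
            if current and pos - current[0] > window_bp:
                groups.append((arm, current))
                current = []
            current.append(pos)
        if current:
            groups.append((arm, current))
    return groups


if __name__ == "__main__":
    # Hypothetical coordinates and replichore labels ("L"/"R"), for illustration only.
    demo = [(10_000, "R"), (200_000, "R"), (650_000, "R"), (300_000, "L"), (100_000, "L")]
    for arm, members in group_loci(demo):
        print(arm, members)
```

A greedy left-to-right sweep keeps every group within one window, but it is not guaranteed to minimize the number of marker moves when real constraints such as orientation are added.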
Fig. 1.
Construction of strain C123. (Inner) Workflow used to create and analyze strain C123. The DESIGN phase involved identification of 123 AGR codons in the essential genes of E. coli. MAGE oligos were designed to replace all instances of these AGR codons with the synonymous CGU codon. The BUILD phase used CoS-MAGE to convert 110 AGR codons to CGU and to identify 13 AGR codons that required additional troubleshooting. The in vivo TROUBLESHOOTING phase resolved the 13 codons that could not be readily converted to CGU and identified mechanisms potentially explaining why AGR→CGU was not successful. In the STUDY phase, next-generation sequencing, evolution, and phenotyping were performed on strain C123. (Outer) Schematic of the C123 genome (nucleotide 0 is oriented up; numbering is according to strain MG1655). Exterior labels indicate the set groupings of AGR codons. Successful AGR→CGU conversions (110 instances) are indicated by radial green lines, and recalcitrant AGR codons (13 instances) are indicated by radial red lines.
Fig. S1.
Strategy for replacing each set of AGR codons in all of the essential genes of E. coli (EcM2.1). The AGR codons are marked with open triangles (various colors). To start, a dual-selectable tolC cassette (double green line) is recombined into the genome using Lambda Red in a multiplexed recombination along with several oligos targeting nearby (<500 kb) downstream AGR loci (various colored lines). Upon selection for tolC insertion clones, successfully converted AGR codons (filled triangles) are also observed at a higher frequency because of strong linkage between recombination events at tolC and at other nearby (<500 kb) downstream AGR loci. Next, a second recombination is carried out using the same AGR conversion oligo pool but now paired with another oligo that disrupts the tolC ORF with a premature stop codon; the tolC counterselection is then applied, again enriching the population for AGR conversions. A third multiplexed recombination then restores the tolC ORF, again targeting AGR loci. After the tolC selection is applied, clones are assayed by mascPCR. If most conversions in a given set have been made, the selectable marker is removed using a repair oligo in a singleplexed or multiplexed recombination (depending on need). The tolC counterselection is then used both to leave a scarless chromosome and to free up the tolC cassette for use elsewhere in the genome.
Fig. 2.
Analysis of attempted AGR→CGU replacements. (A) AGR recombination frequency (mascPCR, n = 96 clones per cell population) was plotted versus the normalized ORF position (residue number of the AGR codon divided by the total length of the ORF). Failed AGR→CGU conversions are indicated by vertical red lines below the x axis. (B) Doubling time of strains in the C123 lineage in LBL medium at 34 °C was determined in triplicate on a 96-well plate reader. Colored bars indicate the set of codons under construction when a doubling time was determined (coloring based on Fig. 1). Each data point represents a different stage of strain construction. Alternative codons were identified for 13 recalcitrant AGR codons in our troubleshooting pipeline, the optimized replacement sequences were incorporated into the final strain (gray section at right, labeled with an asterisk), and the resulting doubling times were measured. Error bars represent SEM in doubling time from at least three replicates of each strain.
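The 1% detection limit cited in the Results follows from screening 96 clones per population: one positive clone is 1/96 ≈ 1.04% of those screened. Under a simple binomial sampling model (an assumption made here for illustration; the Materials and Methods are not reproduced in this section), a population whose true replacement frequency is exactly 1% would still yield zero positive clones about 38% of the time, and a frequency of about 3% is needed before at least one positive is expected with 95% probability. The helper names below are hypothetical.

```python
# Sampling arithmetic for a mascPCR screen of a fixed number of clones (binomial model assumed).
N_CLONES = 96  # clones screened per population (Fig. 2A)


def replacement_frequency(positive_clones: int, n: int = N_CLONES) -> float:
    """Observed allele replacement frequency: positive clones / clones screened."""
    return positive_clones / n


def prob_zero_positives(p: float, n: int = N_CLONES) -> float:
    """Binomial probability of seeing no edited clones when the true frequency is p."""
    return (1.0 - p) ** n


def min_freq_detectable(confidence: float = 0.95, n: int = N_CLONES) -> float:
    """Smallest true frequency that gives at least one positive clone with the stated probability."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    print(f"{replacement_frequency(1):.4f}")   # 0.0104
    print(f"{prob_zero_positives(0.01):.3f}")  # 0.381
    print(f"{min_freq_detectable():.4f}")      # 0.0307
```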
The remaining 13 AGR→CGU mutations were not observed, suggesting a codon substitution frequency below our detection limit of 1% of the bacterial population (Materials and Methods and Dataset S2). These "recalcitrant codons" were assumed to be deleterious or nonrecombinogenic and were triaged into a troubleshooting pipeline for further analysis (Fig. 1). Interestingly, all but 1 of the 13 recalcitrant codons were located near the termini of their respective genes, suggesting the importance of codon choice at these positions: seven were at most 30 nt downstream of the start codon, and five were at most 30 nt upstream of the stop codon (Fig. 2A, Lower and Dataset S3). Because of our unbiased design strategy, we anticipated that several AGR→CGU mutations would present obvious design flaws, such as introducing nonsynonymous mutations (two instances) or RBS disruptions (four instances) in overlapping genes. For example, ftsI_AGA1759 overlaps the second and third codons of murE, an essential gene, so its conversion to CGU introduces a missense mutation (murE D3V) that may impair fitness. Replacing ftsI_AGA1759 with CGA instead removed the forbidden AGA codon while conserving the primary amino acid sequence of MurE, with minimal impact on fitness (Fig. 3A and Dataset S2). Similarly, holB_AGA4 overlaps the upstream essential gene tmk, and replacing AGA with CGU converts the tmk stop codon to a Cys codon, adding 14 amino acids to the C terminus of Tmk. Although some C-terminal extensions are well tolerated in E. coli (37), extending Tmk appears to be deleterious. We successfully replaced holB_AGA4 with CGC by inserting three nucleotides comprising a stop codon before the holB start codon. This insertion reduced the tmk/holB overlap and preserved the coding sequences of both genes (Fig. S2A).
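The holB/tmk failure can be reproduced on a toy sequence. Judging from the Cys outcome, the first base of holB_AGA4 appears to be the final A of the tmk TGA stop codon, so any CGN replacement turns TGA into TGC (Cys) and tmk translation reads through into holB. The sequence below is hypothetical. It was chosen only to reproduce that frame arrangement (tmk offset by one base from holB), and it is not the real E. coli locus, so the length of the read-through differs from the 14 residues observed in vivo.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate_until_stop(seq: str) -> tuple[str, bool]:
    """Translate frame 0 of seq; return (peptide, whether a stop codon was reached)."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i : i + 3]]
        if aa == "*":
            return "".join(peptide), True
        peptide.append(aa)
    return "".join(peptide), False


# Hypothetical toy locus: the tail of an upstream ORF ("tmk") followed by a downstream
# ORF ("holB") whose first four bases (ATGA) contain the upstream TGA stop codon.
UPSTREAM_TAIL = "AAAGC"     # tmk frame reads AAA GCA TGA across the junction
HOLB_WT = "ATGAGAAAATAA"    # M R K stop; AGA at nt 4, like holB_AGA4
HOLB_CGU = "ATGCGTAAATAA"   # AGA -> CGT (CGU in mRNA)

if __name__ == "__main__":
    for label, holb in (("wild type", HOLB_WT), ("AGA->CGU", HOLB_CGU)):
        locus = UPSTREAM_TAIL + holb
        tmk = translate_until_stop(locus)
        holb_peptide = translate_until_stop(locus[len(UPSTREAM_TAIL):])
        print(label, "tmk:", tmk, "holB:", holb_peptide)
```

holB itself is unchanged at the protein level in both cases. Only the overlapping upstream frame loses its stop codon, which is why a purely synonymous check of holB would miss this flaw.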
Fig. 3.
Examples of failure mechanisms for four recalcitrant AGR replacements. Wild-type AGR codons are indicated by bold black letters, design flaws are indicated by red letters, and optimized replacement genotypes are indicated by green letters. (A) The genes ftsI and murE overlap with each other. An AGA→CGU mutation in ftsI would introduce a nonconservative Asp3Val mutation in murE. The amino acid sequence of murE was preserved by using an AGA→CGA mutation. (B) Gene secE overlaps with the RBS for the downstream essential gene nusG. An AGG→CGU mutation is predicted to diminish the RBS strength by 97% (53). RBS strength is preserved by using a nonsynonymous AGG→GAG mutation. (C) Gene ssb has an internal RBS-like motif shortly after its start codon. An AGA→CGU mutation would diminish the RBS strength by 94%. RBS strength is preserved by using an AGA→CGA mutation combined with additional wobble mutations indicated by green letters. (D) Gene rnpA has a defined mRNA structure that would be changed by an AGG→CGU mutation. The original RNA structure is preserved by using an AGG→CGG mutation. The RBS (green), start codon (blue), and AGR codon (red) are annotated with like-colored boxes on the predicted RNA secondary structures.
Fig. S2.
Schematic of three different cases of failure in recalcitrant AGR→CGU mutations. In each case, the top row is the initial sequence, the middle row is the AGR→CGU mutation, and the bottom row of the primary DNA sequence is the optimized solution converged on during troubleshooting. Green boxes below the DNA sequence indicate the amino acid sequence in the same order (the top box is the initial sequence, the middle box gives results from AGR→CGU, and the bottom box shows results from the troubleshooting solution). (A) Cases of C-terminal overlap of AGRs at the ends of essential genes with downstream ORFs. (i) The genes ftsI and murE overlap with each other. An AGA→CGU mutation in ftsI would introduce a nonconservative Asp3Val mutation in murE. The amino acid sequence of murE was preserved by using an AGA→CGA mutation. (ii) The genes holB and tmk overlap with each other. An AGA→CGU mutation in holB would introduce a Stop214Cys mutation in tmk. The amino acid sequence of tmk was preserved by using an AGA→CGC mutation and adding three nucleotides. (B) Cases of C-terminal overlap of AGRs at the ends of essential genes with the RBS of a downstream gene. (i) Gene secE overlaps with the RBS for the downstream essential gene nusG. An AGG→CGU mutation would diminish the RBS strength by 97% (53). RBS strength is preserved by using an AGG→GAG mutation. (ii) Gene dnaT overlaps with the RBS for the downstream essential gene dnaC. An AGG→CGU mutation would diminish the RBS strength by 77% (53). RBS strength is preserved by using an AGG→CGA mutation. (iii) Gene folC overlaps with the RBS for the downstream gene dedD, shown to be essential in our strain. An AGGAGA→CGUCGU mutation would diminish the RBS strength by 99% (53). RBS strength is preserved by using an AGGAGA→CGGCGA mutation. (C) N-terminal RBS-like motifs causing recalcitrant AGR conversions at the beginning of essential genes. (i) Gene dnaT has an internal RBS-like motif. An AGA→CGU mutation would increase the RBS strength 26 times (53). RBS strength is better preserved by using an AGA→CGU mutation combined with additional wobble mutations. (ii) Gene prfB has an internal RBS-like motif. This RBS-like motif is involved in a downstream programmed frameshift in prfB (39). The AGG→CGU mutation was possible only by removing the frameshift (leaving a poor RBS-like site). To maintain the frameshift, an AGG→CGG mutation and additional wobble mutations were required; in that case, local RBS strength was maintained (fourth row). (iii) Gene ssb has an internal RBS-like motif. An AGA→CGU mutation would diminish the RBS strength by 94%. RBS strength is preserved by using an AGA→CGA mutation combined with additional wobble mutations.
Additionally, the four remaining C-terminal failures were AGR→CGU mutations that disrupt RBS motifs belonging to downstream genes (secE_AGG376 for nusG, dnaT_AGA532 for dnaC, and folC_AGAAGG1249,1252 for dedD, the last comprising two adjacent codons). Both nusG and dnaC are essential, suggesting that replacing AGR with CGU in secE and dnaT lethally disrupts translation initiation, and thus expression, of nusG and dnaC, respectively (Fig. 3B and Fig. S2B). Although dedD is annotated as nonessential (31), we hypothesized that replacing the AGR codons with CGU in folC disrupted a portion of dedD that is essential to the survival of EcM2.1 (E. coli K-12). In support of this hypothesis, we were unable to delete the 29 nucleotides of dedD that were neither deleted by Baba et al. (31) nor overlapping with folC, suggesting that this sequence is essential in our strain. The unexpected failure of this conversion highlights the challenge of predicting design flaws even in well-annotated organisms. Consistent with our observation that disrupting these RBS motifs underlies the failed AGR→CGU conversions, we overcame all four design flaws by selecting codons that conserved RBS strength, including a nonsynonymous (Arg→Gly) conversion for secE.
These lessons, together with previous observations that ribosomes pause during translation when they encounter RBS-like motifs in coding sequences (20), provided key insights into the N-terminal AGR→CGU failures. Three of the N-terminal failures (ssb_AGA10, dnaT_AGA10, and prfB_AGG64) had RBS-like motifs that were either disrupted or created by the CGU replacement. Although prfB_AGG64 is part of the RBS-like motif that triggers the essential programmed ribosomal frameshift in prfB (21, 38, 39), pausing motif-mediated regulation of ssb and dnaT expression has not been reported. Nevertheless, ribosomal pausing data (20) show ribosome occupancy peaks directly downstream of the AGR codon for ssb but not for dnaT (Fig. S3). Meanwhile, the unsuccessful CGU mutations were predicted to weaken the RBS-like motif for prfB and ssb and to strengthen the RBS-like motif for dnaT (Fig. 3C and Fig. S2C), suggesting a functional relationship between ribosome occupancy at these motifs and cell fitness. Consistent with this hypothesis, successful codon replacements from the troubleshooting pipeline conserve predicted RBS strength, in contrast to the large predicted deviation caused by unsuccessful AGR→CGU mutations (Fig. 4, y axis; compare orange asterisks and green dots). Interestingly, attempts to replace dnaT_AGA10 with either CGN or NNN failed; dnaT_AGA10 could be replaced only by conserving Arg and manipulating the wobble positions of surrounding codons (Fig. S2C). These wobble variants appear to compensate for the increased RBS strength caused by the AGA→CGU mutation: RBS motif strength with wobble variants deviated eightfold from the unmodified sequence, whereas RBS motif strength for AGA→CGU alone deviated 27-fold.
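The fold changes quoted above can be placed on the logarithmic scale used for the SRZ threshold later in the Results (less than half a log unit of RBS deviation). Assuming base-10 logarithms, an eightfold deviation is about 0.90 log units and a 27-fold deviation is about 1.43. The percentage drops quoted for secE (97%, Fig. 3B) and dnaT_AGA532 (77%, Fig. S2B) correspond to about 1.52 and 0.64 log units. The helper names below are hypothetical.

```python
import math

HALF_LOG = 0.5  # SRZ bound on RBS-like motif deviation, assumed here to be in log10 units


def log_deviation_from_fold(fold: float) -> float:
    """|log10(fold)| for a fold change in predicted RBS strength (increase or decrease)."""
    return abs(math.log10(fold))


def log_deviation_from_percent_drop(percent: float) -> float:
    """|log10| deviation for a predicted decrease of `percent` % (e.g. 97 leaves 3%)."""
    return abs(math.log10(1.0 - percent / 100.0))


if __name__ == "__main__":
    for label, dev in [
        ("dnaT_AGA10 wobble solution, 8-fold", log_deviation_from_fold(8)),
        ("dnaT_AGA10 AGA->CGU, 27-fold", log_deviation_from_fold(27)),
        ("secE AGG->CGU, 97% drop", log_deviation_from_percent_drop(97)),
        ("dnaT_AGA532 (Fig. S2B), 77% drop", log_deviation_from_percent_drop(77)),
    ]:
        print(f"{label}: {dev:.2f} log units, within half log: {dev < HALF_LOG}")
```

On this scale the dnaT_AGA10 wobble solution still lies outside a half-log band. That is consistent with the statement later in the Results that the optimized solutions fall almost entirely, rather than entirely, within the SRZ.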
Fig. S3.
Ribosomal-pausing data drawn from previous work (20) for genes ssb, dnaT, and prfB. The green line represents ribosome-profiling data for each gene. The orange line is the average for all genes with an AGR codon within the first 30 nt of the annotated start codon. The region between the two vertical red lines indicates the zone of interest (centered 12 bp after the AGR codon). Interestingly, prfB and ssb show a peak after the AGR codon, but no peak in that location is observed for dnaT. Based on predictions from the Salis calculator (53), replacing AGR with CGU in these three cases is expected to disrupt ribosomal pausing (prfB and ssb) or to introduce ribosomal pausing (dnaT).
Fig. 4.
RBS strength and mRNA structure predict synonymous mutation success. Scatter plot showing predicted RBS strength [y axis, calculated with the Salis RBS calculator (53)] versus deviations in mRNA folding [x axis, calculated at 37 °C by the UNAFold calculator (40)]. Small gray dots represent nonessential genes in E. coli MG1655 that have an AGR codon within the first 10 or last 10 codons. Large gray dots represent successful AGR→CGU conversions in the first 10 or last 10 codons of essential genes. Orange asterisks represent unsuccessful AGR→CGU mutations (recalcitrant codons) in essential genes. Green dots represent optimized solutions for these recalcitrant codons. The SRZ (blue-shaded region) is an empirically defined range of mRNA folding and RBS strength deviations, based on the successful AGR→CGU replacement mutations observed in this study. Most unsuccessful AGR→CGU mutations (orange asterisks) cause large deviations in RBS strength or mRNA structure that are outside the SRZ. The genes holB and ftsI are two notable exceptions because their initial CGU mutations caused amino acid changes in overlapping essential genes. Gene folC corresponds to two AGRs. Arrows for four examples of optimized replacement codons (ftsA, folC, rnpA, and rpsJ) show that deviations in RBS strength and/or mRNA structure are reduced. Arrows are omitted for the remaining eight optimized replacement codons to increase readability.
To better understand the remaining cases of N-terminal failure that did not exhibit considerable deviations in RBS strength (rnpA_AGG22, ftsA_AGA19, frr_AGA16, and rpsJ_AGA298), we examined other potential nucleic acid determinants of protein expression. Because mRNA secondary structure near the 5′ end of ORFs strongly impacts protein expression (12), we examined these four AGR→CGU mutations and found that each changed the predicted folding energy and structure of the mRNA near the start codon of the target gene (Fig. 3D and Fig. S4). Successful codon replacements obtained from degenerate MAGE oligos reduced the disruption of mRNA secondary structure compared with CGU (Fig. 4, green dots). For example, rnpA has a predicted mRNA loop near its RBS and start codon that relies on base pairing between both guanines of the AGG codon and nearby cytosines (Fig. 3D and Fig. S5A). Importantly, only AGG22CGG was observed out of all attempted rnpA AGG22CGN mutations, and the fact that only CGG preserves this mRNA structure suggests that the structure is physiologically important (Fig. 3D and Fig. S5 B and C). In support of this notion, we successfully introduced an rnpA AGG22CUG mutation (Arg→Leu) only when we changed the complementary nucleotides in the stem from CC (which pair with the GG of AGG) to CA (which pair with the UG of CUG), thus preserving the natural RNA structure (Fig. S5D) while changing both RBS motif strength and amino acid identity. Our analysis of all four optimized gene sequences showed reduced deviation in predicted mRNA folding energy [computed with UNAFold (40)] compared with the unsuccessful CGU mutations (Fig. 4, x axis; compare orange asterisks and green dots). Similarly, the predicted mRNA structure [computed with a different folding program, NUPACK (41)] for these genes was strongly changed by CGU mutations and was restored in our empirically optimized solutions (Fig. S4).
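At its core, the rnpA argument is a base-pairing check: a stem that pairs the two guanines of AGG22 with a 5′-CC-3′ arm survives only if the replacement keeps complementary bases at those positions. The toy function below is not a folding algorithm (the predictions above come from UNAFold and NUPACK). It only tests antiparallel Watson-Crick pairing between two short arms, and it assumes that the last two bases of the codon pair with the two-base arm. Under that assumption, CGG is the only CGN codon that pairs with CC, and the UG of CUG pairs with the compensatory CA arm.

```python
# Toy Watson-Crick stem check for short RNA arms (not a folding algorithm).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_WOBBLE = {("G", "U"), ("U", "G")}


def arms_pair(arm5: str, arm3: str, allow_gu: bool = False) -> bool:
    """True if two RNA arms, each written 5'->3', can pair antiparallel base by base."""
    if len(arm5) != len(arm3):
        return False
    allowed = (WATSON_CRICK | GU_WOBBLE) if allow_gu else WATSON_CRICK
    # Antiparallel: the first base of arm5 pairs with the last base of arm3, and so on.
    return all((a, b) in allowed for a, b in zip(arm5, reversed(arm3)))


if __name__ == "__main__":
    # Assumption: the last two bases of the codon at AGG22 pair with a 5'-CC-3' arm.
    cgn_keeping_stem = [c for c in ("CGA", "CGC", "CGG", "CGU") if arms_pair(c[1:], "CC")]
    print(cgn_keeping_stem)                              # ['CGG']
    print(arms_pair("UG", "CC"), arms_pair("UG", "CA"))  # False True
```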
Fig. S4.
mRNA folding predictions for the four recalcitrant AGR→CGU mutations explained by mRNA folding variations. Folding was predicted with UNAFold (40) for the region from 100 nt upstream to 30 nt downstream of the start codon. Both the shape of the predicted fold and the folding energy must be taken into account to understand the failure of the AGR→CGU conversion. AGR depicts the predicted wild-type mRNA, CGU is the mRNA folding prediction with an AGR→CGU mutation (generally not observed), and "Optimized" corresponds to the mRNA folding prediction of the AGR replacement solution found after in vivo troubleshooting. The predicted free energy of folding of the visualized structure, in kilocalories per mole, is listed under each structure.
Fig. S5.
mRNA folding predictions for the gene rnpA. Folding was predicted with UNAFold (40) for the region from 30 nt upstream to 100 nt downstream of the rnpA start site. (A) The wild-type rnpA sequence, with AGG in the blue box. (B) The wild-type rnpA sequence with AGG→CGU in the blue box (not observed). (C) The wild-type rnpA sequence with AGG→CGG in the blue box (observed with no growth-rate defect). (D) The wild-type rnpA sequence with AGG→CTG in the blue box and one complementary mutation, CCC→CCA, to maintain the mRNA loop (in the blue box) (also observed with no growth-rate defect).
Troubleshooting these 13 recalcitrant codons revealed that mutations causing large deviations from natural mRNA folding energy or RBS strength are associated with failed codon substitutions. By calculating these two metrics for all attempted AGR→CGU mutations, we empirically defined a safe replacement zone (SRZ) within which most CGU mutations were tolerated (Fig. 4, shaded area). The SRZ is defined as the largest multidimensional space that contains none of the AGR→CGU failures associated with mRNA folding energy or RBS strength (Fig. 4, orange asterisks). It comprises deviations in mRNA folding energy of less than 10% relative to the natural codon and deviations in RBS-like motif scores of less than half a log unit relative to the natural codon, providing a quantitative guideline for codon substitution. Notably, the optimized solution for each of the 13 recalcitrant codons always exhibited reduced deviation in at least one of these two parameters compared with the corresponding CGU mutation. Furthermore, the solutions for the 13 recalcitrant codons fell almost entirely within the empirically defined SRZ. These results suggest that computational predictions of mRNA folding energy and RBS strength can be used as a first approximation to predict whether a designed mutation is likely to be viable. In silico heuristics that flag problematic alleles streamline the use of in vivo genome engineering methods such as MAGE to identify viable replacement codons empirically. These heuristics therefore reduce the search space required to redesign viable genomes, raising the prospect of creating radically altered genomes exhibiting expanded biological functions.
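The two SRZ bounds lend themselves to a simple screening rule for candidate codons. The sketch below is an illustration under stated assumptions, not the authors' code. It takes predicted values from external tools as inputs and measures folding-energy deviation as |ΔG_mut − ΔG_wt| / |ΔG_wt|. It also reads "half a log" as 0.5 log10 units of RBS strength. Both normalizations are assumptions, and the example numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Prediction:
    """Predicted properties of one sequence variant (values come from external tools)."""
    dg_fold: float       # folding free energy in kcal/mol (e.g. from UNAFold); must be nonzero for wild type
    rbs_strength: float  # predicted RBS-like motif strength in arbitrary units; must be > 0


def in_srz(wt: Prediction, mut: Prediction,
           max_dg_fraction: float = 0.10, max_log10_rbs: float = 0.5) -> bool:
    """True if a variant lies inside the rectangular safe replacement zone."""
    if wt.dg_fold == 0 or wt.rbs_strength <= 0 or mut.rbs_strength <= 0:
        raise ValueError("need a nonzero wild-type dG and positive RBS strengths")
    dg_dev = abs(mut.dg_fold - wt.dg_fold) / abs(wt.dg_fold)       # fractional change
    rbs_dev = abs(math.log10(mut.rbs_strength / wt.rbs_strength))  # log10 units
    return dg_dev < max_dg_fraction and rbs_dev < max_log10_rbs


if __name__ == "__main__":
    wt = Prediction(dg_fold=-30.0, rbs_strength=1000.0)  # hypothetical numbers
    print(in_srz(wt, Prediction(-28.0, 2000.0)))  # True: 6.7% dG change, 0.30 log units
    print(in_srz(wt, Prediction(-25.0, 1000.0)))  # False: 16.7% dG change
    print(in_srz(wt, Prediction(-30.0, 30.0)))    # False: about 1.52 log units
```

Because the zone is a rectangle, a candidate must pass both tests. That matches the text's description of failures that sit outside the SRZ along either axis.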
Once we had identified viable replacement sequences for all 13 recalcitrant codons, we combined the successful 110 CGU conversions with the 13 optimized codon substitutions to produce strain C123, in which all 123 AGR codons have been removed from all of its annotated essential genes. C123 then was sequenced to confirm AGR removal and analyzed using Millstone, a publicly available genome resequencing analysis pipeline (https://github.com/churchlab/millstone). Two spontaneous AAG (Lys) to AGG (Arg) mutations were observed in the essential genes pssA and cca. Although attempts to revert these mutations to AAG were unsuccessful—perhaps suggesting functional compensation—we were able to replace them with CCG (Pro) in pssA and CAG (Gln) in cca using degenerate MAGE oligos. The resulting strain, C123a, is the first strain completely devoid of AGR codons in its annotated essential genes (https://github.com/churchlab/agr_recoding) (Dataset S4). Although some AGR codons in nonessential genes could prove unexpectedly difficult to change, our success in replacing all 123 instances of AGR codons in essential genes provides strong evidence that the remaining 4,105 AGR codons can be completely removed from the E. coli genome, permitting the unambiguous reassignment of AGR translation function (23).
Kinetic growth analysis showed that the doubling time increased from 52.4 (±2.6) min in EcM2.1 (no AGR codons changed) to 67 (±1.5) min in C123a (123 AGR codons changed in essential genes) in lysogeny broth (LB) at 34 °C in a 96-well plate reader (Materials and Methods). Notably, fitness varied substantially during construction of the C123 strain (Fig. 2B). This variation may be attributed to codon deoptimization (AGR→CGU) and to compensatory spontaneous mutations that alleviate fitness defects in a mismatch repair-deficient (mutS-) background. Overall, the reduced fitness of C123a may be caused by on-target (AGR→CGU) or off-target (spontaneous) mutations that occurred during strain construction. In this way, mutS inactivation is simultaneously a useful evolutionary tool and a liability. Final genome sequence analysis revealed that, along with the 123 desired AGR conversions, C123a had 419 spontaneous nonsynonymous mutations not found in the EcM2.1 parental strain (Fig. S6). Of particular interest was the mutation argU_G15A, located in the D arm of tRNA-Arg (argU), which arose during CoS-MAGE with AGR set 4. We hypothesized that argU_G15A compensates for increased CGU demand and decreased AGR demand, but we observed no direct fitness cost associated with reverting this mutation in C123, and argU_G15A does not impact aminoacylation efficiency in vitro or aminoacyl-tRNA pools in vivo (Fig. S7 and Dataset S5). Consistent with the findings of Mukai et al. (25) and Baba et al. (31), argW (tRNA-Arg(CCU); decodes AGG only) was dispensable in C123a because it can be complemented by argU (tRNA-Arg(UCU); decodes both AGG and AGA). However, argU is the only E. coli tRNA that can decode AGA and remains essential in C123a, probably because it is required to translate the AGR codons in the rest of the proteome (23).
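The plate-reader analysis is summarized only briefly here, so the sketch below shows a generic approach rather than the authors' pipeline. It estimates doubling time as ln 2 divided by the slope of a least-squares fit of ln(OD) against time over a user-chosen exponential-phase window. It then reports the standard error of the mean across replicates. By the same arithmetic, the change from 52.4 to 67 min corresponds to a roughly 28% longer doubling time. Function names and the synthetic data are hypothetical.

```python
import math
import statistics


def doubling_time(times_min: list[float], od: list[float]) -> float:
    """Doubling time (min) from a least-squares fit of ln(OD) vs time.

    Pass only exponential-phase points; choosing that window is left to the user.
    """
    n = len(times_min)
    if n < 2 or n != len(od):
        raise ValueError("need at least two paired time/OD points")
    y = [math.log(v) for v in od]
    t_mean = sum(times_min) / n
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times_min, y))
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    growth_rate = sxy / sxx  # specific growth rate, per minute
    return math.log(2) / growth_rate


def sem(values: list[float]) -> float:
    """Standard error of the mean across replicates."""
    return statistics.stdev(values) / math.sqrt(len(values))


times = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
ods = [0.05 * 2 ** (t / 50) for t in times]  # synthetic culture doubling every 50 min

if __name__ == "__main__":
    print(round(doubling_time(times, ods), 6))  # 50.0
    print(f"{(67 - 52.4) / 52.4:.1%}")          # 27.9%
```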
Fig. S6.
Representational graph of the fully recoded genome relative to MG1655. The outer ring shows the set grouping in which each AGR codon (vertical line) is located. Each line is colored by troubleshooting status (red if troubleshooting was required, green if it was not). Relative recombination frequency is represented by the position of the dot. Each inner ring represents the mutations that accumulated during strain construction, with the target set of AGR codons for that ring highlighted. The inner rings with black radial lines represent the mutations that accumulated while the 13 recalcitrant codons were mutated to their optimized codon replacements.
Fig. S7.
argU G15A does not affect expression or aminoacylation levels in wild-type and recoded E. coli strains. Acid-urea PAGE followed by Northern blotting was performed on wild-type and G15A argU tRNA in wild-type E. coli (WT-WT and WT-G15A) and in the final strains C123a and b (501 and 503) under several growth conditions. Aminoacylation levels are comparable to those in wild type for all conditions and combinations, suggesting that the mutation has no effect on charging levels despite sweeping into the population (Dataset S5).
To evaluate the genetic stability of C123a after removal of all AGR codons from all the known essential genes, we passaged C123a for 78 d (640 generations) to test whether AGR codons would recur and/or whether spontaneous mutations would improve fitness. After 78 d, no additional AGR codons were detected in a sequenced population (sequencing data are available at https://github.com/churchlab/agr_recoding), and doubling time of isolated clones ranged from 22% faster to 22% slower than C123a (n = 60).
To gain more insight into how local RBS strength and mRNA folding impact codon choice, we performed an evolution experiment to examine the competitive fitness of all 64 possible codon substitutions at each of the AGR codons (Dataset S6). Although MAGE is a powerful method for exploring viable genomic modifications in vivo, we were interested in mapping the fitness cost associated with less-optimal codon choices. This required a codon-randomized population depleted of the parental genotype, which we hypothesized to be at or near the global fitness maximum. To do so, we developed a method called “CRAM” (CRISPR-assisted MAGE). First, we designed oligos that not only changed the target AGR codon to NNN but also introduced several synonymous changes at least 50 nt downstream that would disrupt a 20-bp CRISPR target locus. MAGE was used to replace each AGR with NNN in parallel, and CRISPR/Cas9 was used to deplete the population of cells with the parental genotype. This approach allowed exhaustive exploration of the codon space, including the original codon, but without the preponderance of the parental genotype. Following CRAM, the population was passaged 1:100 every 24 h for 6 d and was sampled before each passage for Illumina sequencing (Fig. 5 and Dataset S6).
Fig. 5.
Codon preference of 14 N-terminal AGR codons. CRAM was used to explore codon preference for several AGR codons located within the first 10 codons of their CDS. Briefly, MAGE was used to diversify a population by randomizing the AGR of interest; CRISPR/Cas9 was then used to deplete the parental (unmodified) population, allowing exhaustive exploration of all 64 codons at the position of interest. Codon abundance was then monitored over time by serially passaging the population and sequencing on an Illumina MiSeq. The left y axis (codon frequency) indicates the relative abundance of each codon (stacked area plot). The right y axis indicates the combined deviations in mRNA folding structure (red line) and internal RBS strength (blue line) in arbitrary units (AU), normalized to 0.5 at the initial time point; zero indicates no deviation from wild type. The x axis indicates the time point (h) at which population diversity was sampled. The genes bcsB and chpS are nonessential in our strains and thus serve as controls for AGR codons that are not under essential gene pressure.
Sequencing 24 h after CRAM showed that all codons were present, including stop codons (Fig. S8), validating CRAM as a technique for generating high diversity in a population. All sequences used for further analysis were amplified by PCR with allele-specific primers containing the changed downstream sequence. Subsequent passaging of these populations revealed many gene-specific trends (Fig. 5 and Figs. S8 and S9). Notably, all codons that required troubleshooting (dnaT_AGA10, ftsA_AGA19, frr_AGA16, and rnpA_AGG22) converged to their wild-type AGR codon, suggesting that the original codon was globally optimized. For all cases in which an alternate codon replaced the original AGR, we computed the predicted deviation in mRNA folding energy and local RBS strength (as a proxy for ribosome pausing) for these alternative codons and compared these metrics with the change in codon distribution at this position over time. We also computed the fraction of sequences that fall within the SRZ inferred from Fig. 4 (Materials and Methods). CRAM initially introduced a wide range of mRNA folding energies and RBS strengths, but in many cases these genotypes rapidly converged toward parameters similar to the parental AGR values (overlays in Fig. 5). Codons that strongly disrupted predicted mRNA folding and internal RBS strength near the start of genes were disfavored after several days of growth, suggesting that these metrics can be used to predict optimal codon substitutions in silico. In contrast, the nonessential control genes bcsB and chpS did not converge toward codons that conserved RNA structure or RBS strength, supporting the conclusion that the observed conservation of RNA secondary structure and RBS strength is biologically relevant for essential genes. Interestingly, tilS_AGA19 was less sensitive to this effect, suggesting that codon choice at that particular position is not under selection.
Additionally, the average internal RBS strength for the ispG populations converged toward the parental AGR values, but the average mRNA folding energy did not, suggesting that this position in the gene may be more sensitive to RBS disruption than to mRNA folding. Gene lptF showed the opposite trend: mRNA folding energy converged toward the parental values, whereas internal RBS strength did not.
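The SRZ fraction described above can be sketched as a read-weighted membership test. This is a minimal illustration, not the authors' pipeline: it assumes the SRZ is a rectangle bounded by absolute deviations in mRNA folding energy and RBS strength, and the bounds `max_fold_dev` and `max_rbs_dev` are hypothetical inputs (the empirical limits from Fig. 4 are not reproduced here). The function name and the example counts are likewise illustrative.

```python
from typing import Mapping, Tuple


def fraction_in_srz(
    read_counts: Mapping[str, int],
    deviations: Mapping[str, Tuple[float, float]],
    max_fold_dev: float,
    max_rbs_dev: float,
) -> float:
    """Return the fraction of reads whose codon falls inside a rectangular SRZ.

    read_counts: codon -> read count at one position and time point.
    deviations: codon -> (mRNA folding-energy deviation, RBS-strength deviation)
        relative to the wild-type AGR codon.
    max_fold_dev, max_rbs_dev: hypothetical SRZ bounds (assumption).
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads at this position")
    inside = 0
    for codon, count in read_counts.items():
        fold_dev, rbs_dev = deviations[codon]
        if abs(fold_dev) <= max_fold_dev and abs(rbs_dev) <= max_rbs_dev:
            inside += count
    return inside / total


# Illustrative numbers only, not data from the study.
counts = {"AGA": 60, "CGT": 30, "CCG": 10}
devs = {"AGA": (0.0, 0.0), "CGT": (0.5, 0.2), "CCG": (3.0, 0.1)}
print(fraction_in_srz(counts, devs, max_fold_dev=1.0, max_rbs_dev=1.0))  # → 0.9
```

Tracking this fraction across time points would show whether a population moves into the zone as selection proceeds.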
Fig. S8.
The number of reads for each codon and for each gene in the CRAM experiment at the 24-h time point. CRAM was used to explore codon preference for several N-terminal AGR codons. The left y axis (number of reads) indicates the abundance of each codon. The x axis lists the 64 possible codons in alphabetical order, from AAA to TTT. Data are shown for the 24-h time point. Diversity was assayed by Illumina sequencing. The genes bcsB and chpS are nonessential and thus serve as controls for AGR codons that are not under essential gene pressure.
Fig. S9.
The number of reads for each codon and for each gene in the CRAM experiment at the 144-h time point. CRAM was used to explore codon preference for several N-terminal AGR codons. The left y axis (number of reads) indicates the abundance of each codon. The x axis lists the 64 possible codons in alphabetical order, from AAA to TTT. Data are shown for the 144-h time point. Diversity was assayed by Illumina sequencing. The genes bcsB and chpS are nonessential and thus serve as controls for AGR codons that are not under essential gene pressure.
Interestingly, several genes (lptF, ispG, tilS, gyrA, and rimN) preferred codons that changed the amino acid identity from Arg to Pro, Lys, or Glu, suggesting that noncoding functions trump amino acid identity at these positions. Importantly, all successful codon substitutions in essential genes fell within the SRZ (Fig. 6), validating our heuristics in an unbiased test of all 64 codons. Meanwhile, the nonessential control gene chpS showed less dependence on the SRZ.
Fig. 6.
RBS strength and mRNA structure predict codon preference of 14 N-terminal codon substitutions. Scatter plots show the results of the CRAM experiment (Fig. 5). Each panel represents a different gene. The y axis represents RBS strength deviation [calculated with the Salis RBS calculator (53)], and the x axis represents deviation in mRNA folding energy [calculated at 37 °C with the UNAFold calculator (40)]. Codon abundance at the intermediate time point (t = 72 h, chosen to show maximal diversity after selection) is represented by dot size. Green dots represent the wild-type codon. Blue dots represent the five synonymous arginine codons. Orange dots represent the remaining 58 nonsynonymous codons, which may introduce nonviable amino acid substitutions. Black squares represent unsuccessful AGR→CGU conversions observed in the genome-wide recoding effort (Fig. 1 and Table 1). The SRZ (blue-shaded region) is the empirically defined range of mRNA folding and RBS strength deviations, based on the successful AGR→CGU replacement mutations observed in this study (Fig. 3). The genes bcsB and chpS are nonessential in our strains and thus serve as controls for AGR codons that are not under essential gene pressure.
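The grouping used in Fig. 6 (the wild-type codon, its synonymous arginine codons, and the remaining nonsynonymous codons) follows directly from the standard genetic code. The sketch below uses DNA letters (T for U) and a hypothetical helper, `classify_codons`, to partition all 64 NNN outcomes for a wild-type AGA, showing why 58 nonsynonymous codons (including the three stop codons) remain.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codons(wild_type: str) -> dict:
    """Split all 64 codons into wild-type, synonymous, and nonsynonymous groups."""
    wt_aa = CODON_TABLE[wild_type]
    groups = {"wild_type": [], "synonymous": [], "nonsynonymous": []}
    for codon, aa in CODON_TABLE.items():
        if codon == wild_type:
            groups["wild_type"].append(codon)
        elif aa == wt_aa:
            groups["synonymous"].append(codon)
        else:
            groups["nonsynonymous"].append(codon)  # includes stop codons ("*")
    return groups


groups = classify_codons("AGA")
print({name: len(codons) for name, codons in groups.items()})
# → {'wild_type': 1, 'synonymous': 5, 'nonsynonymous': 58}
print(groups["synonymous"])  # → ['CGT', 'CGC', 'CGA', 'CGG', 'AGG']
```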
Discussion
These observations indicate that, although global codon bias may be affected by tRNA availability (6, 42–44), codon choice at a given position may be defined by at least three parameters: (i) amino acid sequence; (ii) mRNA structure near the start codon and RBS; and (iii) RBS-mediated pausing. In some cases, a subset of these parameters may not be under selection, resulting in an evolved sequence that converges for only a subset of the metrics. In other cases, all metrics may be important, but the primary nucleic acid sequence might not have the flexibility to accommodate all of them equally, resulting in codon substitutions that impair cellular fitness.
These rules were used to generate a draft genome in silico with all AGR codons replaced genome-wide, reducing by almost fourfold the number of predicted design flaws (i.e., synonymous codons with metrics outside of the SRZ) as compared with the naive replacement strategy (Materials and Methods, Fig. 7, Fig. S10, and Dataset S7). Furthermore, predicted recalcitrant codons provide hypotheses that can be tested rapidly in vivo using MAGE, and successful replacement sequences can then be implemented together in a redesigned genome. Encouragingly, because all newly predicted design flaws occur in nonessential genes, they would be less likely to impact fitness unless (i) despite the “nonessential” annotation, the gene is actually essential or quasi-essential (i.e., its inactivation would impair growth) or (ii) the codon in a nonessential gene affects the expression of a neighboring essential gene (e.g., by altering an RBS motif or RNA structure). Although incorrect genome annotations can only be addressed empirically (as demonstrated with gene dedD), further analysis reveals that AGR codons in nonessential genes should rarely affect annotated essential genes. In E. coli MG1655, only three AGR codons in nonessential genes overlap with the initial mRNA and RBS motifs of essential genes, and for all three, at least one synonymous CGN codon is predicted to obey the SRZ. Furthermore, even if no synonymous mutation obeyed the SRZ, nonsynonymous mutations in a nonessential gene would be expected to be viable as long as they conserve crucial motifs affecting expression of the essential gene, because disrupting nonessential gene function should not compromise viability. Importantly, we confirmed by MAGE that AGR→CGU codon replacement was possible in two of these three cases and that an alternative synonymous solution could be found in the remaining case (Materials and Methods).
Fig. 7.
Predicting optimal replacements for AGR codons reduces the number of codons that are predicted to require troubleshooting. (A) Empirical data from the construction of C123. One hundred ten AGR codons were successfully recoded to CGU (green), and 13 recalcitrant AGR codons required troubleshooting (red, striped). (B) Predicted recalcitrant codons (codons for which no CGN alternative falls within the SRZ in Fig. 4) for replacing all instances of AGR codons genome-wide. The reference genome used for this analysis had insertion elements and prophages removed (54) to reduce the total number of nucleotides synthesized and to increase genome stability, leaving 3,222 AGR codons to be replaced (Materials and Methods). Our analysis predicts that replacing all instances of AGR with CGU would have resulted in 229 failed conversions (Naive Replacement, red striped). However, implementing the rules from this work (Informed Replacement) to identify the best CGN alternative reduces the predicted failure rate from 7.1% (229/3,222) to 2.0% (64/3,222), and only a small subset of these 64 codons will directly impact fitness because the rest are located in nonessential genes. In such cases, MAGE with degenerate oligos could be used to identify replacement codons empirically, as demonstrated herein. Each synonymous CGN codon is shown in a distinct, labeled shade of green.
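The failure rates in this legend, and the "almost fourfold" reduction stated in the Discussion, follow from simple arithmetic on the reported counts. A short check:

```python
TOTAL_AGR = 3222        # AGR codons in the reduced reference genome
NAIVE_FAILURES = 229    # predicted recalcitrant codons if every AGR -> CGU
INFORMED_FAILURES = 64  # predicted recalcitrant codons with the best CGN per site

naive_rate = NAIVE_FAILURES / TOTAL_AGR
informed_rate = INFORMED_FAILURES / TOTAL_AGR
fold_reduction = NAIVE_FAILURES / INFORMED_FAILURES

print(f"naive {naive_rate:.1%}, informed {informed_rate:.1%}, {fold_reduction:.2f}-fold fewer")
# → naive 7.1%, informed 2.0%, 3.58-fold fewer
```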
Fig. S10.
The number of predicted recalcitrant AGR codons for each AGR replacement strategy. Four possible genomes replacing all 3,222 AGRs were designed using four replacement strategies. First, AGRs were changed to CGU genome-wide (green bars). Second, AGR synonyms were chosen to minimize local mRNA folding deviation near the start of genes (orange bars). Third, AGR synonyms were chosen to reduce RBS strength deviation (blue bars). Finally, AGR synonyms were chosen to minimize both local mRNA folding deviation and RBS strength deviation (purple bars). These genomes were then scored and compared using custom software available on GitHub (https://github.com/churchlab/agr_recoding). Every codon whose deviation falls outside the safe replacement zone is predicted to be recalcitrant.
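The informed-replacement idea can be sketched as choosing, for each AGR, the CGN synonym whose deviations are smallest. This is a hypothetical illustration rather than the scoring in the churchlab/agr_recoding repository. It assumes a rectangular SRZ with hypothetical bounds, scales each deviation by its bound, and takes the larger of the two scaled values as the score, so a best score above 1 marks the site as predicted recalcitrant. The naive strategy corresponds to always picking CGT (CGU).

```python
from typing import Mapping, Tuple

CGN_SYNONYMS = ("CGT", "CGC", "CGA", "CGG")  # DNA spelling of CGU, CGC, CGA, CGG


def choose_replacement(
    deviations: Mapping[str, Tuple[float, float]],
    max_fold_dev: float,
    max_rbs_dev: float,
) -> Tuple[str, bool]:
    """Pick the CGN synonym with the smallest scaled deviation.

    deviations: codon -> (mRNA folding deviation, RBS strength deviation) relative
        to the original AGR codon at this site.
    max_fold_dev, max_rbs_dev: hypothetical SRZ bounds (assumption).
    Returns (best codon, predicted_recalcitrant).
    """

    def score(codon: str) -> float:
        fold_dev, rbs_dev = deviations[codon]
        # Largest deviation in units of its SRZ bound; <= 1 means inside the assumed SRZ.
        return max(abs(fold_dev) / max_fold_dev, abs(rbs_dev) / max_rbs_dev)

    best = min(CGN_SYNONYMS, key=score)
    return best, score(best) > 1.0


# Illustrative deviations for a single AGR site (not data from the study).
site = {"CGT": (2.5, 0.1), "CGC": (0.4, 0.9), "CGA": (0.2, 0.3), "CGG": (1.1, 0.2)}
print(choose_replacement(site, max_fold_dev=1.0, max_rbs_dev=1.0))  # → ('CGA', False)
```

In this toy site, the naive CGT choice would fall outside the assumed SRZ, whereas CGA stays inside it. This mirrors the naive versus informed comparison summarized in Fig. S10.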
Comprehensively removing all instances of AGR codons from all E. coli essential genes revealed 13 design flaws that could be explained by a disruption in coding DNA sequence, RBS-mediated translation initiation, RBS-mediated translation pausing, or mRNA structure. Although the importance of each factor has been reported, our work systematically explores how often and to what extent each affects genome function. Furthermore, our work establishes quantitative guidelines to reduce the chance of designing nonviable genomes. Although additional factors undoubtedly impact genome function, the fact that these guidelines captured all instances of failed synonymous codon replacements (Fig. 4) suggests that they provide a strong first approximation of acceptable modifications to the primary sequence of viable genomes. These design rules, coupled with inexpensive DNA synthesis, will facilitate the construction of radically redesigned genomes exhibiting useful properties such as biocontainment, virus resistance, and expanded amino acid repertoires (45).
Materials and Methods
Strains and Culture Methods.
The strains used in this work were derived from EcM2.1 (E. coli MG1655 mutS_mut dnaG_Q576A exoX_mut xonA_mut xseA_mut 1255700::tolQRA Δ(ybhB-bioAB)::[λcI857 N(cro-ea59)::tetR-bla]) (33). Liquid culture medium consisted of the Lennox formulation of lysogeny broth (LBL) [1% (wt/vol) bacto tryptone, 0.5% (wt/vol) yeast extract, 0.5% (wt/vol) sodium chloride] (46) with appropriate selective agents: carbenicillin (50 μg/mL) and SDS [0.005% (wt/vol)]. For tolC counterselections, colicin E1 (ColE1) was used at a 1:100 dilution from an in-house purification (47) that measured 14.4 μg protein/μL (22, 36), and vancomycin was used at 64 μg/mL. Solid culture medium consisted of LBL autoclaved with 1.5% (wt/vol) Bacto Agar (Fisher), supplemented as necessary with the same concentrations of selective agents. ColE1 agar plates were generated as described previously (33). Doubling times were determined on a BioTek Eon microplate reader with orbital shaking at 365 cycles/min at 34 °C overnight and were analyzed using a Matlab script available on GitHub (https://github.com/churchlab/agr_recoding).
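Doubling time can be estimated from a plate-reader growth curve by fitting log2(OD) against time during exponential growth. The sketch below is a generic least-squares version in Python, not the Matlab script referenced above. The exponential-phase OD window (`od_low`, `od_high`) is a hypothetical choice, and blank subtraction and multi-well handling are omitted.

```python
import math
from typing import Sequence


def doubling_time(times_min: Sequence[float], od: Sequence[float],
                  od_low: float = 0.05, od_high: float = 0.3) -> float:
    """Estimate doubling time (min) from one well of a plate-reader growth curve.

    Fits log2(OD) against time by ordinary least squares, using only points whose
    OD falls in [od_low, od_high] (a hypothetical exponential-phase window), and
    returns 1/slope, since the slope is in doublings per minute.
    """
    pts = [(t, math.log2(y)) for t, y in zip(times_min, od) if od_low <= y <= od_high]
    if len(pts) < 2:
        raise ValueError("fewer than two points in the exponential-phase window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("no growth detected in the selected window")
    return 1.0 / slope


# Usage with a synthetic curve that doubles every 52.4 min (values are illustrative).
times = list(range(0, 400, 10))
ods = [0.01 * 2 ** (t / 52.4) for t in times]
print(round(doubling_time(times, ods), 1))  # → 52.4
```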
Oligonucleotides, PCR, and Isothermal Assembly.
A complete table of MAGE oligonucleotides and PCR primers can be found in Dataset S1.
PCR products used in recombination or for Sanger sequencing were amplified with KAPA2G Fast polymerase according to the manufacturer’s standard protocols. Multiplex allele-specific PCR (mascPCR) was used for multiplexed genotyping of AGR-replacement events using the KAPA2G Fast Multiplex PCR Kit, according to previous methods (22, 48). Sanger sequencing was performed by a third party (GENEWIZ). CRAM plasmids were constructed by isothermal assembly at 50 °C for 60 min (50) from plasmid backbones linearized by PCR (49) and CRISPR/protospacer adjacent motif (PAM) sequences obtained as gBlocks from Integrated DNA Technologies.
Lambda Red Recombinations, MAGE, and CoS-MAGE.
Lambda Red recombineering, MAGE, and CoS-MAGE were carried out as described previously (33, 51). In singleplex recombinations, the MAGE oligo was used at 1 μM; the coselection oligo was used at 0.2 μM; and in multiplex recombinations (7–14 oligos), the total oligo pool was used at 5 μM. When double-stranded PCR products were recombined (e.g., tolC insertion), 100 ng of double-stranded PCR product was used. Because we used CoS-MAGE with tolC selection to replace target AGR codons, each recombination was paired with a water-only control recombination to monitor tolC selection performance. The standard CoS-MAGE protocol for each oligo set was to insert tolC, inactivate tolC, reactivate tolC, and delete tolC. mascPCR screening was performed at the tolC insertion, inactivation, and deletion steps. All Lambda Red recombinations were followed by recovery in 3 mL LBL and then by SDS selection (tolC insertion, tolC reactivation) or ColE1 counterselection (tolC inactivation, tolC deletion), carried out as previously described (33).
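The four-step CoS-MAGE cycle described above can be summarized as data, pairing each step with the selection applied and noting whether mascPCR screening follows. The step order, selections, and screening points come from the text; the `CosMageStep` structure is only an illustrative way to lay them out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CosMageStep:
    action: str
    selection: str         # "SDS" selects for functional tolC; "ColE1" selects against it
    masc_pcr_screen: bool  # whether clones are genotyped by mascPCR after this step


# Order, selections, and screening points as described in the text.
COS_MAGE_CYCLE = (
    CosMageStep("insert tolC", "SDS", True),
    CosMageStep("inactivate tolC", "ColE1", True),
    CosMageStep("reactivate tolC", "SDS", False),
    CosMageStep("delete tolC", "ColE1", True),
)

for step in COS_MAGE_CYCLE:
    screen = "mascPCR" if step.masc_pcr_screen else "no mascPCR"
    print(f"{step.action:<16} {step.selection:<6} {screen}")
```

Alternating SDS and ColE1 selection is what lets the same tolC marker be reused across the insertion, inactivation, reactivation, and deletion steps.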
General AGR Replacement Strategy.
AGR codons in essential genes were identified by cross-referencing essential gene annotations from two complementary resources (31, 52) to find the shared set (107 coding regions), which contained 123 unique AGR codons (82 AGA, 41 AGG). We used optMAGE (35, 51) to design 90-mer oligos (targeting the lagging strand of the replication fork) that convert each AGR to CGU (Datasets S1 and S8). We reduced the total number of AGR replacement oligos to 119 by designing oligos to encode multiple edits where possible, maintaining at least 20 bp of homology at the 5′ and 3′ ends of each oligo. The oligos were then pooled by chromosomal position into 12 MAGE oligo sets of varying complexity (minimum: 7, maximum: 14 oligos) such that a single marker (tolC) could be inserted at most 564,622 bp upstream (relative to the direction of replication) of all targets within a given set. We then identified tolC insertion sites for each of the 12 pools in either intergenic regions or nonessential genes that met the distance criteria for that pool. See Table 1 for descriptors of each of the 12 oligo pools.
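The pooling constraint (every target in a set lying within 564,622 bp downstream of a single tolC insertion) can be illustrated with a simple greedy grouping. This is a hypothetical sketch, not the procedure used to build the 12 sets in Table 1. It assumes target positions have already been converted to coordinates along one replichore in the direction of replication, places tolC at the first target of each group, and caps group size at the maximum of 14 oligos from the text while ignoring the minimum of 7.

```python
from typing import List, Sequence

MAX_SPAN_BP = 564_622  # from the text: max distance from tolC to any target in a set
MAX_SET_SIZE = 14      # largest oligo set used in the text


def group_targets(positions: Sequence[int],
                  max_span: int = MAX_SPAN_BP,
                  max_size: int = MAX_SET_SIZE) -> List[List[int]]:
    """Greedily group target coordinates so each group fits within max_span.

    positions: hypothetical coordinates measured along the direction of
        replication on a single replichore, so "upstream" means smaller values.
    If tolC is inserted at (or just upstream of) the first target of a group,
    every target in that group lies at most max_span bp downstream of it.
    """
    groups: List[List[int]] = []
    for pos in sorted(positions):
        if groups and len(groups[-1]) < max_size and pos - groups[-1][0] <= max_span:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


print(group_targets([10_000, 200_000, 500_000, 700_000, 1_300_000]))
# → [[10000, 200000, 500000], [700000], [1300000]]
```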
Troubleshooting Strategy.
A recalcitrant AGR was defined as one that was not converted to CGU in any of at least 96 clones picked after the third step of the conversion process. The recalcitrant AGR codon was then triaged for troubleshooting (Fig. S1) in the parental strain (EcM2.1). First, the sequence context of the codon was examined for design errors or potential issues, such as misannotation or a disrupted RBS for an overlapping gene. In most cases, corrected oligos could be easily designed and tested. If no such obvious redesign was possible, we attempted to replace AGR with CGN. If CGN replacement failed to yield recombinants, we tested compensatory synonymous mutations in a 3-aa window around the recalcitrant AGR. If needed, we finally relaxed synonymous stringency by recombining with oligos encoding AGR-to-NNN mutations.
After each step in the troubleshooting workflow, we screened 96 clones from two successive CoS-MAGE recombinations by allele-specific PCR with primers that hybridize to the wild-type genotype. Clones that failed to yield a wild-type amplicon were Sanger-sequenced to confirm conversion. We also measured the doubling time of all clones in LBL to pair sequencing data with fitness data and chose the recombined clone with the shortest doubling time. Doubling time was determined from growth curves obtained on a BioTek plate reader (either an Eon or H1) and was analyzed using web-based open-source genome resequencing software available on GitHub at https://github.com/churchlab/millstone. This genotype was then implemented in the complete strain at the end of strain construction using MAGE and was confirmed by mascPCR screening.
AGR Codons in Nonessential Genes with Impact on Essential Genes.
In E. coli MG1655, only three AGR codons in nonessential genes overlap with the initial mRNA and RBS motifs of essential genes, and for all three, at least one synonymous CGN codon is predicted to obey the SRZ. As in the troubleshooting pipeline, we attempted to replace AGR with CGU using MAGE. After four cycles of MAGE, cells were plated, and 96 clones were screened. Synonymous codon replacement was possible for genes rffT and mraW but not for gene yidD. We then relaxed synonymous stringency by recombining with oligos encoding AGR-to-NNN mutations for yidD and found multiple alternative solutions, including CGA, UGA, GUG, GCG, and UAA. Importantly, the synonymous CGA solutions were less disruptive than CGU to RBS strength and mRNA folding (Dataset S7), further confirming our rules as useful guidelines.
mRNA Folding and RBS Strength Computations.
A custom Python pipeline (available at https://github.com/churchlab/agr_recoding) was used to compute mRNA folding and RBS strength values for each sequence. mRNA folding was computed with the UNAFold calculator (40), and RBS strength with the Salis calculator (53). The parameters for mRNA folding were the temperature (37 °C) and the sequence window: based on ref. 12, folding energy was averaged over two windows, −30 to +100 nt and −15 to +100 nt relative to the start site of the gene. The only parameter for RBS strength is the distance between the RBS and the promoter; based on Li et al. (20), we averaged over distances of 9 and 10 nt after the codon of interest. Data were visualized with custom Matlab code.
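The two-window averaging of folding energy can be sketched with a pluggable energy function. The `fold` argument stands in for a call to UNAFold, which is an external program and is not shown. The half-open slicing of the windows, the signed difference used as the deviation, and the toy GC-count energy in the usage example are assumptions for illustration only.

```python
from typing import Callable, Sequence, Tuple

# Windows relative to the first nucleotide of the start codon, from the text.
# Treating them as half-open slices [start + lo, start + hi) is an assumption.
FOLD_WINDOWS: Sequence[Tuple[int, int]] = ((-30, 100), (-15, 100))


def windowed_folding_energy(seq: str, start: int, fold: Callable[[str], float]) -> float:
    """Average a folding-energy function over the two windows around the start codon.

    fold: placeholder for a real minimum-free-energy calculation (e.g. a wrapper
        around UNAFold); any callable taking a sequence and returning a float works.
    """
    energies = []
    for lo, hi in FOLD_WINDOWS:
        if start + lo < 0 or start + hi > len(seq):
            raise ValueError("sequence does not cover the folding window")
        energies.append(fold(seq[start + lo:start + hi]))
    return sum(energies) / len(energies)


def folding_deviation(wt_seq: str, mut_seq: str, start: int,
                      fold: Callable[[str], float]) -> float:
    """Signed change in windowed folding energy caused by a codon substitution."""
    return (windowed_folding_energy(mut_seq, start, fold)
            - windowed_folding_energy(wt_seq, start, fold))


# Toy energy: -1 per G or C. This is NOT a folding model; it only exercises the code.
def toy_fold(s: str) -> float:
    return -float(s.count("G") + s.count("C"))


wt = "A" * 30 + "ATG" + "AGA" + "A" * 94   # start codon at index 30
mut = "A" * 30 + "ATG" + "CGT" + "A" * 94  # AGA -> CGT (CGU) at codon 2
print(folding_deviation(wt, mut, start=30, fold=toy_fold))  # → -1.0
```

The same averaging pattern would apply to the RBS strength metric computed at 9 and 10 nt after the codon of interest, with a different scoring function in place of `fold`.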
For in silico predictions on the entire genome, all 3,222 AGR codons in nonphage genes were analyzed using this custom pipeline; data are presented in Dataset S7. Phage genes were not analyzed, to reduce the complexity of the genome, inspired by other reduced-genome efforts (54).
Whole-Genome Sequencing of Strains Lacking AGR Codons in Their Essential Genes.
Whole-genome library preparation was carried out as previously described (55). Briefly, 130 μL of purified genomic DNA was sheared overnight in a Covaris E210 ultrasonicator with the following protocol: duty cycle 10%, intensity 5, 200 cycles per burst, 780 s per sample. Shearing was assessed on an agarose gel, and, if the size distribution was acceptable (peak ∼400 nt), the samples were size-selected by solid-phase reversible immobilization (SPRI)/reverse-SPRI purification as described in ref. 55. The fragments were then blunted, and P5/P7 adaptors were ligated, followed by fill-in and gap repair (New England Biolabs). Each sample was then quantified by quantitative PCR (qPCR) using SYBR green and KAPA HiFi, and the results were used to determine the number of cycles needed to amplify each library for barcoding with P5-sol and P7-sol primers. The resulting individual libraries were quantified by NanoDrop (Thermo Scientific) and pooled. The pooled library was quantified by qPCR and an Agilent TapeStation and sequenced on a MiSeq (2 × 150). Data were analyzed to confirm AGR conversions and to identify off-target mutations using Millstone, a web-based open-source genome resequencing tool.
Sequences are available online at https://github.com/churchlab/agr_recoding.
NNN-Sequencing and CRISPR.
CRISPR/Cas9 was used to deplete the wild-type parental genotype by selectively cutting chromosomes at unmodified target sites next to the desired AGR codon changes. Candidate sites near the targeted AGR codon were identified using the built-in target site finder in Geneious. Sites were chosen if they were less than 50 bp upstream of the AGR codon and could be disrupted with synonymous changes. If multiple sites fulfilled these criteria, the site with the lowest sequence similarity to other regions of the genome was chosen. Oligos of ∼130 nt were designed for all 14 genes with an AGR codon in the first 30 nt after the translation start site. These oligos incorporated both an NNN random codon at the AGR position and multiple (up to six) synonymous changes in a CRISPR target site at least 50 nt downstream of the AGR codon. This design modifies the AGR locus and simultaneously disrupts the CRISPR target site, ensuring that the locus remains randomized after the parental genotype is depleted.
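Candidate SpCas9 target sites are 20-nt protospacers immediately followed by an NGG PAM on either strand. The sketch below is a plain regular-expression search written as a hypothetical stand-in for the Geneious site finder. It reports every site in an uppercase ACGT sequence; the distance and genome-wide specificity filters described above are not implemented.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# 20-nt protospacer followed by an NGG PAM; the lookahead lets matches overlap.
SPCAS9_SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str) -> List[Tuple[int, str, str]]:
    """Return (start, strand, protospacer) for each SpCas9 site in an uppercase ACGT sequence.

    start is the 0-based forward-strand coordinate of the leftmost base of the
    23-nt protospacer + PAM, for hits on either strand.
    """
    n = len(seq)
    hits = [(m.start(), "+", m.group(1)) for m in SPCAS9_SITE.finditer(seq)]
    rc = reverse_complement(seq)
    for m in SPCAS9_SITE.finditer(rc):
        # rc[i:i + 23] is the reverse complement of seq[n - i - 23:n - i].
        hits.append((n - m.start() - 23, "-", m.group(1)))
    return sorted(hits)


print(find_spcas9_sites("A" * 20 + "TGG"))
```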
Specifically, we constructed a plasmid containing the SpCas9 gene [plasmid details: DS-SPcas (Addgene plasmid 48645): cloDF13 origin, specR, proC promoter, SPcas9, unused tracrRNA (with native promoter and terminator), J23100 promoter, one repeat (added to facilitate cloning a spacer onto the same plasmid)]. We also constructed 14 plasmids containing guide RNAs directed toward the unmodified sequences (plasmid details: PM-!T4Y: p15a origin, chlorR, J23100 promoter, spacer targeting T4, one repeat).
For each of 24 genes, five cycles of MAGE were performed with the specific mutagenesis oligo at a concentration of 1 μM. CRISPR repeat-spacer plasmids carrying guides designed to target the chosen sites were electroporated into each diversified pool after the last recombineering cycle. After 1 h of recovery, both the SpCas9 and repeat-spacer plasmids were selected for and passaged in three parallel lineages for each of the 24 AGR codons for 144 h. After 2 h of selection, and at every 24-h interval, samples were taken, and the cells were diluted 1/100 in selective medium.
Each randomized population was amplified using PCR primers that specifically amplify strains incorporating the CRISPR-site modifications. The resulting triplicate libraries for each AGR codon were then pooled, barcoded with P5-sol and P7-sol primers, and run on a MiSeq (1 × 50). Data were analyzed using custom Matlab code available at https://github.com/churchlab/agr_recoding.
For each gene and each time point, reads were aligned to the reference genome, and the frequency of each codon was computed. In Fig. 5, the mRNA structure deviation (red line) and RBS strength deviation (blue line), in arbitrary units, were computed as the sum over all codons of each codon's frequency multiplied by its corresponding deviation.
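The population-level deviation traces in Fig. 5 can be sketched as a frequency-weighted sum, followed by scaling so that the first time point equals 0.5, as stated in the Fig. 5 legend. This is an illustrative reading of the text, not the authors' Matlab code. It assumes per-codon deviations are supplied as non-negative magnitudes and that normalization is a single multiplicative scale set by the first time point.

```python
from typing import List, Mapping, Sequence


def weighted_deviation(counts: Mapping[str, float], deviation: Mapping[str, float]) -> float:
    """Sum over codons of (codon frequency x codon deviation) at one time point.

    counts: codon -> read count (or frequency); converted to frequencies here.
    deviation: codon -> non-negative deviation from wild type (assumption).
    """
    total = sum(counts.values())
    return sum(c / total * deviation[codon] for codon, c in counts.items())


def normalized_trace(series: Sequence[Mapping[str, float]],
                     deviation: Mapping[str, float],
                     initial_value: float = 0.5) -> List[float]:
    """Scale the weighted-deviation time series so its first point equals initial_value."""
    raw = [weighted_deviation(counts, deviation) for counts in series]
    if raw[0] == 0:
        raise ValueError("initial deviation is zero; cannot normalize")
    return [initial_value * r / raw[0] for r in raw]


# Illustrative counts at two time points (not data from the study).
dev = {"AGA": 0.0, "CGT": 1.0, "CCG": 4.0}
t0 = {"AGA": 1, "CGT": 1, "CCG": 2}
t1 = {"AGA": 3, "CGT": 1, "CCG": 0}
print(normalized_trace([t0, t1], dev))
```

A trace that falls toward zero, as in this toy example, corresponds to a population converging on codons whose metrics resemble the wild-type AGR.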
SI Materials and Methods
Strains and Culture Methods.
The strains used in this work were derived from EcM2.1 (E. coli MG1655 mutS_mut dnaG_Q576A exoX_mut xonA_mut xseA_mut 1255700::tolQRA Δ(ybhB-bioAB)::[λcI857 N(cro-ea59)::tetR-bla]) (33). Liquid culture medium consisted of the Lennox formulation of lysogeny broth (LBL) [1% (wt/vol) bacto tryptone, 0.5% (wt/vol) yeast extract, 0.5% (wt/vol) sodium chloride] (46) with appropriate selective agents: carbenicillin (50 μg/mL) and SDS [0.005% (wt/vol)]. For tolC counterselections, colicin E1 (ColE1) was used at a 1:100 dilution from an in-house purification (47) that measured 14.4 μg protein/μL (22, 36), and vancomycin was used at 64 μg/mL. Solid culture medium consisted of LBL autoclaved with 1.5% (wt/vol) Bacto Agar (Fisher), supplemented as necessary with the same concentrations of selective agents. ColE1 agar plates were generated as described previously (33). Doubling times were determined on a BioTek Eon microplate reader with orbital shaking at 365 cycles per minute at 34 °C overnight and were analyzed using a Matlab script available on GitHub (https://github.com/churchlab/agr_recoding).
Oligonucleotides, PCR, and Isothermal Assembly.
A complete table of MAGE oligonucleotides and PCR primers can be found in Dataset S1.
PCR products used in recombination or for Sanger sequencing were amplified with KAPA2G Fast polymerase according to the manufacturer’s standard protocols. mascPCR was used for multiplexed genotyping of AGR replacement events using the KAPA2G Fast Multiplex PCR Kit, according to previous methods (22, 48). Sanger sequencing was performed by a third party (GENEWIZ). CRAM plasmids were constructed by isothermal assembly at 50 °C for 60 min (50) from plasmid backbones linearized by PCR (49) and CRISPR/PAM sequences obtained as gBlocks from Integrated DNA Technologies.
Lambda Red Recombinations, MAGE, and CoS-MAGE.
Lambda Red recombineering, MAGE, and CoS-MAGE were carried out as described previously (33, 51). In singleplex recombinations, the MAGE oligo was used at 1 μM; the coselection oligo was used at 0.2 μM; and in multiplex recombinations (7–14 oligos), the total oligo pool was used at 5 μM. When double-stranded PCR products were recombined (e.g., tolC insertion), 100 ng of double-stranded PCR product was used. Because we used CoS-MAGE with tolC selection to replace target AGR codons, each recombination was paired with a water-only control recombination to monitor tolC selection performance. The standard CoS-MAGE protocol for each oligo set was to insert tolC, inactivate tolC, reactivate tolC, and delete tolC. mascPCR screening was performed at the tolC insertion, inactivation, and deletion steps. All Lambda Red recombinations were followed by recovery in 3 mL LBL and then by SDS selection (tolC insertion, tolC reactivation) or ColE1 counterselection (tolC inactivation, tolC deletion), carried out as previously described (33).
General AGR Replacement Strategy.
AGR codons in essential genes were identified by cross-referencing essential gene annotations from two complementary resources (31, 52) to find the shared set (107 coding regions), which contained 123 unique AGR codons (82 AGA, 41 AGG). We used optMAGE (35, 51) to design 90-mer oligos (targeting the lagging strand of the replication fork) that convert each AGR to CGU. We reduced the total number of AGR replacement oligos to 119 by designing oligos to encode multiple edits where possible, maintaining at least 20 bp of homology at the 5′ and 3′ ends of each oligo. The oligos were then pooled by chromosomal position into 12 MAGE oligo sets of varying complexity (minimum: 7, maximum: 14 oligos) such that a single marker (tolC) could be inserted at most 564,622 bp upstream (relative to the direction of replication) of all targets within a given set. We then identified tolC insertion sites for each of the 12 pools in either intergenic regions or nonessential genes that met the distance criteria for that pool. See Table 1 for descriptors of each of the 12 oligo pools.
Troubleshooting Strategy.
A recalcitrant AGR was defined as one that was not converted to CGU in any of at least 96 clones picked after the third step of the conversion process. The recalcitrant AGR codon was then triaged for troubleshooting (asterisks in Fig. 3A) in the parental strain (EcM2.1). First, the sequence context of the codon was examined for design errors or potential issues, such as misannotation or a disrupted RBS for an overlapping gene. In most cases, corrected oligos could be easily designed and tested. If no such obvious redesign was possible, we attempted to replace AGR with CGN. If CGN replacement failed to yield recombinants, we tested compensatory synonymous mutations in a 3-aa window around the recalcitrant AGR. If needed, we finally relaxed synonymous stringency by recombining with oligos encoding AGR-to-NNN mutations.
After each step in the troubleshooting workflow, we screened 96 clones from two successive CoS-MAGE recombinations by allele-specific PCR with primers that hybridize to the wild-type genotype. Clones that failed to yield a wild-type amplicon were Sanger-sequenced to confirm conversion. We also measured the doubling time of all clones in LBL to pair sequencing data with fitness data and chose the recombined clone with the shortest doubling time. Doubling time was determined from growth curves obtained on a BioTek Eon or H1 plate reader and was analyzed using web-based open-source genome resequencing software available on GitHub at https://github.com/churchlab/millstone. This genotype was then implemented in the complete strain at the end of strain construction using MAGE and was confirmed by mascPCR screening.
AGR Codons in Nonessential Genes with Impact on Essential Genes.
In E. coli MG1655, only three AGR codons in nonessential genes overlap with the initial mRNA and RBS motifs of essential genes, and for all three, at least one synonymous CGN codon is predicted to obey the SRZ. As in the troubleshooting pipeline, we attempted to replace AGR codons with CGU using MAGE. After four cycles of MAGE, cells were plated, and 96 clones were screened. Synonymous codon replacement was possible for genes rffT and mraW but not for gene yidD. We then relaxed synonymous stringency by recombining with oligos encoding AGR-to-NNN mutations for yidD and found multiple alternative solutions, including CGA, UGA, GUG, GCG, and UAA. Importantly, the synonymous CGA solutions were less disruptive than CGU to RBS strength and mRNA folding (Dataset S7), further confirming our rules as useful guidelines.
mRNA Folding and RBS Strength Computations.
A custom Python pipeline (available at https://github.com/churchlab/agr_recoding) was used to compute mRNA folding and RBS strength values for each sequence. mRNA folding was based on the UNAFold calculator (40), and RBS strength was based on the Salis calculator (53). The parameters for mRNA folding are the temperature (37 °C) and the folding window; following ref. 12, values were averaged over two windows, −30 to +100 nt and −15 to +100 nt relative to the start site of the gene. The only parameter for RBS strength is the distance between the RBS and the start codon; following Li et al. (20), we averaged values for distances of 9 and 10 nt after the codon of interest. Data visualization was performed with custom Matlab code.
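The windowing and averaging described above can be sketched independently of any particular calculator. In this minimal sketch the folding and RBS scoring functions are passed in as callbacks, hypothetical stand-ins for UNAFold and the Salis calculator (for example, a thin wrapper around an installed folding package). Two conventions are assumptions: windows are treated as half-open intervals, and the 9 and 10 nt spacings are counted from the last nucleotide of the codon of interest.

```python
from statistics import mean
from typing import Callable

# Folding windows (nt) relative to the first nucleotide of the start codon,
# as stated in the text; treated here as half-open intervals [lo, hi).
FOLD_WINDOWS = ((-30, 100), (-15, 100))
# Spacings (nt) after the codon of interest used for the RBS-like score.
RBS_OFFSETS = (9, 10)


def window(seq: str, anchor: int, lo: int, hi: int) -> str:
    """Slice seq[anchor + lo : anchor + hi], clipped at the sequence start."""
    return seq[max(0, anchor + lo):max(0, anchor + hi)]


def averaged_folding(seq: str, start_idx: int,
                     fold_dg: Callable[[str], float]) -> float:
    """Mean folding energy over both windows.

    `fold_dg` is a caller-supplied minimum-free-energy function, a stand-in for
    UNAFold run at 37 °C.
    """
    return mean(fold_dg(window(seq, start_idx, lo, hi)) for lo, hi in FOLD_WINDOWS)


def averaged_rbs(seq: str, codon_idx: int,
                 rbs_strength: Callable[[str, int], float]) -> float:
    """Mean RBS-like strength, scoring a putative start position placed
    9 and 10 nt after the end of the codon of interest.

    `rbs_strength(seq, start_pos)` is a caller-supplied stand-in for an RBS
    calculator; counting the spacing from the codon's last nucleotide is an assumption.
    """
    codon_end = codon_idx + 3
    return mean(rbs_strength(seq, codon_end + off) for off in RBS_OFFSETS)
```

Injecting the scoring functions keeps the window arithmetic testable without any folding software installed.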
For in silico predictions across the entire genome, all 3,222 AGR codons in nonphage genes were analyzed using this custom pipeline. Data are presented in Dataset S7. Phage genes were not analyzed, to reduce the complexity of the genome, following other reduced-genome efforts (54).
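Enumerating AGR codons genome-wide only requires scanning each coding sequence in frame. The sketch below assumes a hypothetical input: a dictionary mapping gene names to coding sequences, plus a caller-supplied set of phage gene names to skip. It is an illustration, not the authors' pipeline.

```python
from typing import Iterable

AGR_CODONS = frozenset({"AGA", "AGG"})


def agr_positions(cds: str) -> list[int]:
    """0-based offsets of in-frame AGA/AGG codons in a coding sequence."""
    cds = cds.upper()
    last = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [i for i in range(0, last, 3) if cds[i:i + 3] in AGR_CODONS]


def count_agr(genes: dict[str, str], exclude: Iterable[str] = ()) -> int:
    """Total in-frame AGR codons across genes, skipping excluded gene names."""
    skip = set(exclude)
    return sum(len(agr_positions(seq)) for name, seq in genes.items()
               if name not in skip)
```

Scanning in steps of three ensures that AGA or AGG motifs spanning codon boundaries (out of frame) are not counted.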
Whole-Genome Sequencing of Strains Lacking AGR Codons in their Essential Genes.
Whole-genome library preparation was carried out as previously described (55). Briefly, 130 μL of purified genomic DNA was sheared overnight in a Covaris E210 focused ultrasonicator with the following protocol: duty cycle 10%, intensity 5, 200 cycles per burst, 780 s per sample. Shearing was checked on an agarose gel, and, if the size distribution was acceptable (peak ∼400 nt), the samples were size-selected by SPRI/reverse-SPRI purification as described in ref. 55. The fragments were then blunted, and P5/P7 adapters were ligated, followed by fill-in and gap repair (New England Biolabs). Each sample was quantified by qPCR using SYBR Green and KAPA HiFi, and these results were used to determine the number of cycles needed to amplify each library for barcoding with P5-sol and P7-sol primers. The resulting individual libraries were quantified by NanoDrop (Thermo Scientific) and pooled. The pooled library was quantified by qPCR and an Agilent TapeStation and was sequenced on a MiSeq (2 × 150). Data were analyzed using Millstone, a web-based open-source genome resequencing tool, to confirm AGR conversions and to identify off-target mutations.
Sequences are available online at https://github.com/churchlab/agr_recoding.
NNN Sequencing and CRISPR.
CRISPR/Cas9 was used to deplete the wild-type parental genotype by selectively cutting chromosomes at unmodified target sites adjacent to the targeted AGR codons. Candidate sites close to each targeted AGR codon were identified using the built-in target site finder in Geneious. Sites were chosen if they lay within 50 bp downstream of the AGR codon and could be disrupted with synonymous changes. If multiple sites fulfilled these criteria, the site with the lowest sequence similarity to other portions of the genome was chosen. Oligos of ∼130 nt were designed for all 14 genes with an AGR codon in the first 30 nt after the translation start site. These oligos incorporated both an NNN random codon at the AGR position and multiple (up to six) synonymous changes in the CRISPR target site within 50 nt downstream of the AGR codon. Each oligo therefore randomizes the AGR locus and simultaneously disrupts the CRISPR target site, so that once Cas9 cleavage depletes the parental genotype, the surviving population carries the randomized alleles.
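The site-selection rule (a nearby SpCas9 site, preferring the one least similar to the rest of the genome) can be illustrated with a brute-force search. This sketch is a stand-in for the Geneious target finder, not a reproduction of it. It scans only the forward strand for 20-nt protospacers followed by an NGG PAM within a caller-supplied window of offsets relative to the AGR codon. It scores specificity as the fewest mismatches to any other genomic 20-mer and does not model the requirement that a site be disruptable by synonymous changes. The exhaustive scan is illustrative; a genome-scale search would normally use an index or aligner.

```python
from typing import Optional

SPACER_LEN = 20  # SpCas9 protospacer length; PAM is the NGG immediately 3'


def protospacers(seq: str, lo: int, hi: int) -> list[tuple[int, str]]:
    """Forward-strand protospacers starting in [lo, hi) that are followed by NGG."""
    sites = []
    last_start = len(seq) - (SPACER_LEN + 3)  # room for spacer + 3-nt PAM
    for i in range(max(0, lo), min(hi, last_start + 1)):
        if seq[i + SPACER_LEN + 1:i + SPACER_LEN + 3] == "GG":
            sites.append((i, seq[i:i + SPACER_LEN]))
    return sites


def min_mismatches(spacer: str, genome: str, self_pos: int) -> int:
    """Fewest mismatches between `spacer` and any other same-length window of `genome`."""
    k = len(spacer)
    best = k
    for j in range(len(genome) - k + 1):
        if j != self_pos:
            best = min(best, sum(a != b for a, b in zip(spacer, genome[j:j + k])))
    return best


def choose_site(genome: str, agr_pos: int,
                window: tuple[int, int]) -> Optional[tuple[int, str]]:
    """Among sites starting in agr_pos + [window[0], window[1]), return the one whose
    closest other genomic match has the most mismatches (least similar elsewhere)."""
    lo, hi = window
    candidates = protospacers(genome, agr_pos + lo, agr_pos + hi)
    if not candidates:
        return None
    return max(candidates, key=lambda s: min_mismatches(s[1], genome, s[0]))
```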
Specifically, we constructed a plasmid containing the SpCas9 protein gene [plasmid details: DS-SPcas (Addgene plasmid 48645): cloDF13 origin, specR, proC promoter, SPcas9, unused tracrRNA (with native promoter and terminator), J23100 promoter, one repeat (added to facilitate cloning in a spacer onto the same plasmid)]. We also constructed 14 plasmids containing the guide RNA directed toward the unmodified sequences (plasmid details: PM-!T4Y: p15a origin, chlorR, J23100 promoter, spacer targeting T4, one repeat).
For each of the 14 genes, five cycles of MAGE were performed with the gene-specific mutagenesis oligo at a concentration of 1 μM. CRISPR repeat-spacer plasmids carrying guides designed to target the chosen sites were electroporated into each diversified pool after the last recombineering cycle. After 1 h of recovery, cells were placed under selection for both the SpCas9 and repeat-spacer plasmids and passaged in three parallel lineages for each of the 14 AGR codons for 144 h. Samples were taken after 2 h of selection and at every 24-h interval thereafter, and at each sampling the cells were diluted 1/100 in selective medium.
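The serial-passage scheme sets how many generations selection acts over. The sketch below shows the arithmetic under the assumption that each culture regrows to its pre-transfer density before the next 1/100 dilution. It also includes a standard haploid selection model (each codon's share multiplied by its relative fitness raised to the number of generations, then renormalized) to illustrate why less fit codons are depleted quickly. The passage count is left as a parameter, and the selection model and the fitness values used with it are assumptions for illustration, not fitted values.

```python
import math


def doublings_per_passage(dilution_factor: float) -> float:
    """Doublings needed to regrow to the pre-transfer density after a 1/dilution_factor transfer."""
    return math.log2(dilution_factor)


def total_doublings(dilution_factor: float, n_passages: int) -> float:
    """Cumulative doublings, assuming each culture regrows fully before the next transfer."""
    return n_passages * doublings_per_passage(dilution_factor)


def project_frequencies(freqs: dict[str, float], rel_fitness: dict[str, float],
                        generations: float) -> dict[str, float]:
    """Codon frequencies after `generations` under a simple haploid selection model:
    each codon's share is multiplied by w**g and the result renormalized."""
    weighted = {c: f * rel_fitness[c] ** generations for c, f in freqs.items()}
    total = sum(weighted.values())
    return {c: v / total for c, v in weighted.items()}
```

With 1/100 transfers, each passage corresponds to log2(100), or about 6.6 doublings, so even a modest per-generation fitness deficit compounds quickly over several passages.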
Each randomized population was amplified using PCR primers allowing specific amplification of strains incorporating the CRISPR-site modifications. The resulting triplicate libraries for each AGR codon were then pooled, barcoded with P5-sol and P7-sol primers, and sequenced on a MiSeq (1 × 50). Data were analyzed using custom Matlab code available at https://github.com/churchlab/agr_recoding.
For each gene and each time point, reads were aligned to the reference genome, and the frequency of each codon was computed. In Fig. 5, the mRNA structure deviation (red line) and RBS strength deviation (blue line), in arbitrary units, were computed by multiplying each codon's frequency by its corresponding deviation and summing over all codons.
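The per-time-point codon frequencies and the frequency-weighted deviation plotted in Fig. 5 can be sketched as follows. Two simplifications are assumptions here: reads are treated as already trimmed to a common reference start (standing in for full alignment), and the Fig. 5 quantity is read as the sum over codons of frequency times per-codon deviation. The per-codon deviation table is a caller-supplied input. This is not the authors' Matlab code.

```python
from collections import Counter

VALID = set("ACGT")


def codon_frequencies(reads: list[str], codon_offset: int) -> dict[str, float]:
    """Frequency of each codon at `codon_offset` in reads that already start at
    the same reference position (a simplification of full alignment)."""
    counts = Counter()
    for read in reads:
        codon = read[codon_offset:codon_offset + 3].upper()
        if len(codon) == 3 and set(codon) <= VALID:  # skip short reads and Ns
            counts[codon] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def weighted_deviation(freqs: dict[str, float], deviation: dict[str, float]) -> float:
    """Sum over codons of frequency x per-codon deviation; every observed codon
    must have a deviation value."""
    return sum(f * deviation[c] for c, f in freqs.items())
```

Computing this value for each sampled time point traces how the population's average deviation shifts as poorly tolerated codons are depleted.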
Acknowledgments
Stephanie Yaung generously provided CRISPR plasmids and technical support. This work was supported by US Department of Energy Grant DE-FG02-02ER63445; by US Defense Advanced Research Projects Agency Grant N66001-12-C-4211 (to F.J.I. and D.S.); by National Institute of General Medical Sciences Grant M22854-42 (to D.S.); by US Department of Defense National Defense Science and Engineering Graduate Fellowships (M.J.L. and G.K.); by a US National Science Foundation Graduate Research Fellowship (D.B.G.); by the Lynch Foundation (M.L.); by an Amazon Web Services in Education Grant Award (G.K.); and by the Arnold & Mabel Beckman Foundation and DuPont, Inc. (F.J.I.). Funding for open access was provided through US Department of Energy Grant DE-FG02-02ER63445.
Footnotes
Conflict of interest statement: G.K., M.J.L., M.L., M.G.N., D.B.G., and G.M.C. are inventors on patent application #62350468 submitted by the President and Fellows of Harvard College. G.M.C. is a founder of Enevolv Inc. and Gen9bio. Other potentially relevant financial interests are listed at arep.med.harvard.edu/gmc/tech.html.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the BioProject database, ncbi.nlm.nih.gov/bioproject (accession no. PRJNA298327).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605856113/-/DCSupplemental.
References
1. Crick FH. On the genetic code. Science. 1963;139(3554):461–464. doi: 10.1126/science.139.3554.461.
2. Kimura M. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature. 1977;267(5608):275–276. doi: 10.1038/267275a0.
3. Newton R, Wernisch L. A meta-analysis of multiple matched copy number and transcriptomics data sets for inferring gene regulatory relationships. PLoS One. 2014;9(8):e105522. doi: 10.1371/journal.pone.0105522.
4. dos Reis M, Savva R, Wernisch L. Solving the riddle of codon usage preferences: A test for translational selection. Nucleic Acids Res. 2004;32(17):5036–5044. doi: 10.1093/nar/gkh834.
5. Hershberg R, Petrov DA. Selection on codon bias. Annu Rev Genet. 2008;42:287–299. doi: 10.1146/annurev.genet.42.110807.091442.
6. Plotkin JB, Kudla G. Synonymous but not the same: The causes and consequences of codon bias. Nat Rev Genet. 2011;12(1):32–42. doi: 10.1038/nrg2899.
7. Chen GT, Inouye M. Role of the AGA/AGG codons, the rarest codons in global gene expression in Escherichia coli. Genes Dev. 1994;8(21):2641–2652. doi: 10.1101/gad.8.21.2641.
8. Chen GF, Inouye M. Suppression of the negative effect of minor arginine codons on gene expression; preferential usage of minor codons within the first 25 codons of the Escherichia coli genes. Nucleic Acids Res. 1990;18(6):1465–1473. doi: 10.1093/nar/18.6.1465.
9. Kane JF. Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Curr Opin Biotechnol. 1995;6(5):494–500. doi: 10.1016/0958-1669(95)80082-4.
10. Sharp PM, Li WH. The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987;15(3):1281–1295. doi: 10.1093/nar/15.3.1281.
11. Sharp PM, Stenico M, Peden JF, Lloyd AT. Codon usage: Mutational bias, translational selection, or both? Biochem Soc Trans. 1993;21(4):835–841. doi: 10.1042/bst0210835.
12. Goodman DB, Church GM, Kosuri S. Causes and effects of N-terminal codon bias in bacterial genes. Science. 2013;342(6157):475–479. doi: 10.1126/science.1241934.
13. Zhou M, et al. Non-optimal codon usage affects expression, structure and function of clock protein FRQ. Nature. 2013;495(7439):111–115. doi: 10.1038/nature11833.
14. Tuller T, et al. An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell. 2010;141(2):344–354. doi: 10.1016/j.cell.2010.03.031.
15. Li GW. How do bacteria tune translation efficiency? Curr Opin Microbiol. 2015;24:66–71. doi: 10.1016/j.mib.2015.01.001.
16. Hooper SD, Berg OG. Gradients in nucleotide and codon usage along Escherichia coli genes. Nucleic Acids Res. 2000;28(18):3517–3523. doi: 10.1093/nar/28.18.3517.
17. Gingold H, et al. A dual program for translation regulation in cellular proliferation and differentiation. Cell. 2014;158(6):1281–1292. doi: 10.1016/j.cell.2014.08.011.
18. Quax TE, Claassens NJ, Söll D, van der Oost J. Codon bias as a means to fine-tune gene expression. Mol Cell. 2015;59(2):149–161. doi: 10.1016/j.molcel.2015.05.035.
19. Kudla G, Murray AW, Tollervey D, Plotkin JB. Coding-sequence determinants of gene expression in Escherichia coli. Science. 2009;324(5924):255–258. doi: 10.1126/science.1170160.
20. Li GW, Oh E, Weissman JS. The anti-Shine-Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature. 2012;484(7395):538–541. doi: 10.1038/nature10965.
21. Lajoie MJ, et al. Probing the limits of genetic recoding in essential genes. Science. 2013;342(6156):361–363. doi: 10.1126/science.1241460.
22. Isaacs FJ, et al. Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science. 2011;333(6040):348–353. doi: 10.1126/science.1205822.
23. Lajoie MJ, et al. Genomically recoded organisms expand biological functions. Science. 2013;342(6156):357–360. doi: 10.1126/science.1241459.
24. Lee BS, et al. Incorporation of unnatural amino acids in response to the AGG codon. ACS Chem Biol. 2015;10(7):1648–1653. doi: 10.1021/acschembio.5b00230.
25. Mukai T, et al. Reassignment of a rare sense codon to a non-canonical amino acid in Escherichia coli. Nucleic Acids Res. 2015;43(16):8111–8122. doi: 10.1093/nar/gkv787.
26. Zeng Y, Wang W, Liu WR. Towards reassigning the rare AGG codon in Escherichia coli. ChemBioChem. 2014;15(12):1750–1754. doi: 10.1002/cbic.201400075.
27. Rosenberg AH, Goldman E, Dunn JJ, Studier FW, Zubay G. Effects of consecutive AGG codons on translation in Escherichia coli, demonstrated with a versatile codon test system. J Bacteriol. 1993;175(3):716–722. doi: 10.1128/jb.175.3.716-722.1993.
28. Spanjaard RA, van Duin J. Translation of the sequence AGG-AGG yields 50% ribosomal frameshift. Proc Natl Acad Sci USA. 1988;85(21):7967–7971. doi: 10.1073/pnas.85.21.7967.
29. Spanjaard RA, Chen K, Walker JR, van Duin J. Frameshift suppression at tandem AGA and AGG codons by cloned tRNA genes: Assigning a codon to argU tRNA and T4 tRNA(Arg). Nucleic Acids Res. 1990;18(17):5031–5036. doi: 10.1093/nar/18.17.5031.
30. Bonekamp F, Andersen HD, Christensen T, Jensen KF. Codon-defined ribosomal pausing in Escherichia coli detected by using the pyrE attenuator to probe the coupling between transcription and translation. Nucleic Acids Res. 1985;13(11):4113–4123. doi: 10.1093/nar/13.11.4113.
31. Baba T, et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: The Keio collection. Mol Syst Biol. 2006;2:2006. doi: 10.1038/msb4100050.
32. Carr PA, et al. Enhanced multiplex genome engineering through co-operative oligonucleotide co-selection. Nucleic Acids Res. 2012;40(17):e132. doi: 10.1093/nar/gks455.
33. Gregg CJ, et al. Rational optimization of tolC as a powerful dual selectable marker for genome engineering. Nucleic Acids Res. 2014;42(7):4779–4790. doi: 10.1093/nar/gkt1374.
34. Yu D, et al. An efficient recombination system for chromosome engineering in Escherichia coli. Proc Natl Acad Sci USA. 2000;97(11):5978–5983. doi: 10.1073/pnas.100127597.
35. Ellis HM, Yu D, DiTizio T, Court DL. High efficiency mutagenesis, repair, and engineering of chromosomal DNA using single-stranded oligonucleotides. Proc Natl Acad Sci USA. 2001;98(12):6742–6746. doi: 10.1073/pnas.121164898.
36. Lajoie MJ, Gregg CJ, Mosberg JA, Washington GC, Church GM. Manipulating replisome dynamics to enhance lambda Red-mediated multiplex genome engineering. Nucleic Acids Res. 2012;40(22):e170. doi: 10.1093/nar/gks751.
37. Ohtake K, et al. Efficient decoding of the UAG triplet as a full-fledged sense codon enhances the growth of a prfA-deficient strain of Escherichia coli. J Bacteriol. 2012;194(10):2606–2613. doi: 10.1128/JB.00195-12.
38. Craigen WJ, Cook RG, Tate WP, Caskey CT. Bacterial peptide chain release factors: Conserved primary structure and possible frameshift regulation of release factor 2. Proc Natl Acad Sci USA. 1985;82(11):3616–3620. doi: 10.1073/pnas.82.11.3616.
39. Curran JF. Analysis of effects of tRNA:message stability on frameshift frequency at the Escherichia coli RF2 programmed frameshift site. Nucleic Acids Res. 1993;21(8):1837–1843. doi: 10.1093/nar/21.8.1837.
40. Markham NR, Zuker M. UNAFold: Software for nucleic acid folding and hybridization. Methods Mol Biol. 2008;453:3–31. doi: 10.1007/978-1-60327-429-6_1.
41. Zadeh JN, et al. NUPACK: Analysis and design of nucleic acid systems. J Comput Chem. 2011;32(1):170–173. doi: 10.1002/jcc.21596.
42. Novoa EM, Ribas de Pouplana L. Speeding with control: Codon usage, tRNAs, and ribosomes. Trends Genet. 2012;28(11):574–581. doi: 10.1016/j.tig.2012.07.006.
43. Novoa EM, Pavon-Eternod M, Pan T, Ribas de Pouplana L. A role for tRNA modifications in genome structure and codon usage. Cell. 2012;149(1):202–213. doi: 10.1016/j.cell.2012.01.050.
44. Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 1985;2(1):13–34. doi: 10.1093/oxfordjournals.molbev.a040335.
45. Lajoie MJ, Söll D, Church GM. Overcoming challenges in engineering the genetic code. J Mol Biol. 2016;428(5 Pt B):1004–1021. doi: 10.1016/j.jmb.2015.09.003.
46. Lennox ES. Transduction of linked genetic characters of the host by bacteriophage P1. Virology. 1955;1(2):190–206. doi: 10.1016/0042-6822(55)90016-7.
47. Schwartz SA, Helinski DR. Purification and characterization of colicin E1. J Biol Chem. 1971;246(20):6318–6327.
48. Mosberg JA, Gregg CJ, Lajoie MJ, Wang HH, Church GM. Improving lambda red genome engineering in Escherichia coli via rational removal of endogenous nucleases. PLoS One. 2012;7(9):e44638. doi: 10.1371/journal.pone.0044638.
49. Yaung SJ, Esvelt KM, Church GM. CRISPR/Cas9-mediated phage resistance is not impeded by the DNA modifications of phage T4. PLoS One. 2014;9(6):e98811. doi: 10.1371/journal.pone.0098811.
50. Gibson DG, et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods. 2009;6(5):343–345. doi: 10.1038/nmeth.1318.
51. Wang HH, et al. Programming cells by multiplex genome engineering and accelerated evolution. Nature. 2009;460(7257):894–898. doi: 10.1038/nature08187.
52. Hashimoto M, et al. Cell size and nucleoid organization of engineered Escherichia coli cells with a reduced genome. Mol Microbiol. 2005;55(1):137–149. doi: 10.1111/j.1365-2958.2004.04386.x.
53. Salis HM. The ribosome binding site calculator. Methods Enzymol. 2011;498:19–42. doi: 10.1016/B978-0-12-385120-8.00002-4.
54. Umenhoffer K, et al. Reduced evolvability of Escherichia coli MDS42, an IS-less cellular chassis for molecular and synthetic biology applications. Microb Cell Fact. 2010;9:38. doi: 10.1186/1475-2859-9-38.
55. Rohland N, Reich D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 2012;22(5):939–946. doi: 10.1101/gr.128124.111.