Abstract
The RNA-guided Cas9s serve as powerful tools for programmable gene editing and regulation; their targeting scopes and efficacies, however, are always constrained by the PAM sequence stringency. Most Streptococci Cas9s, including the prototype SpCas9 from S. pyogenes, specifically recognize a canonical NGG PAM via a conserved RxR PAM-binding motif within the PAM-interaction (PI) domain. Here, SpCas9-based mining unveils three distinct and rarely presented PAM-binding motifs (QxxxR, QxQ and RxQ) among Streptococci Cas9 orthologs. With the catalytically-dead QxxxR-containing SedCas9 from S. equinus, we dissect its NAG PAM specificity and elucidate its underlying recognition mechanism via computational prediction and mutagenesis analysis. Replacing the SedCas9 PI domain with alternate PAM-binding motifs rewires its PAM specificity to NGG or NAA. Moreover, a semi-rational design with minimal mutation creates a SedCas9-NQ variant showing robust activity towards expanded NNG and NAA PAMs, based upon which we engineered a compact ω-SedCas9-NQ transcriptional regulator for PAM-directed bifunctional and titratable gene control. The ω-SedCas9-NQ mediated metabolic reprogramming of endogenous genes in Escherichia coli affords a 2.6-fold increase of 4-hydroxycoumarin production. This work reveals new Cas9 scaffolds with distinct PAM-binding motifs for PAM relaxation and creates a new PAM-diverse Cas9 variant for versatile gene control in bacteria.
Keywords: Cas9 engineering, CRISPR interference and activation, gene control, metabolic engineering, 4-hydroxycoumarin
1. Introduction
CRISPR/Cas are RNA-based adaptive immune systems against invasive genetic elements in almost all archaea and many bacteria (Barrangou et al., 2007; Brouns et al., 2008; Makarova et al., 2015; Terns and Terns, 2011). Of diverse CRISPR/Cas systems, the type II CRISPR/Cas9 systems have been most extensively engineered for genome editing or repurposed for gene regulation or base editing in a wide range of prokaryotic and eukaryotic organisms (Doudna and Charpentier, 2014; Hsu et al., 2014; Komor et al., 2016; Konermann et al., 2015; Qi et al., 2013). To recognize DNA targets, Cas9 requires a programmable single guide RNA (sgRNA) that typically contains a 20–30 bp spacer sequence complementary to the target DNA strand, and a short protospacer adjacent motif (PAM) on the non-target DNA strand (Anders et al., 2014; Jiang et al., 2015; Nishimasu et al., 2014; Sternberg et al., 2014). Due to the programmability of sgRNA, the target range of Cas9 is mainly determined by PAM specificity, rendering Cas9 only accessible to PAM-containing DNA sites. The most robust and widely used Cas9 from Streptococcus pyogenes (SpCas9) adopts a canonical NGG PAM that theoretically occurs once per every 16 bp of a random single-stranded DNA (Hu et al., 2018). Via different PAM-interacting mechanisms, representative Cas9s from diverse bacterial species like SaCas9 from Staphylococcus aureus, LgaCas9 from Lactobacillus gasseri, FnCas9 from Francisella novicida and CjCas9 from Campylobacter jejuni recognize NNGRRT, NTAA, NGR, and NNNVRYM PAMs, respectively (Hirano et al., 2016a; Ran et al., 2015; Sanozky-Dawes et al., 2015; Yamada et al., 2017). 
This stringent PAM requirement significantly constrains the targeting scopes and efficacies of Cas9s, and thus limits their wide applications when precise or broad positioning is needed, such as homology-directed repair (Findlay et al., 2014; Weber et al., 2015), base or epigenetic editing (Gaudelli et al., 2017; Hilton et al., 2015; Komor et al., 2016), and CRISPR-based gene regulation (Bikard et al., 2013; Dong et al., 2018; Fontana et al., 2020).
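The "once per 16 bp" figure quoted above for NGG follows from a simple counting argument, which the sketch below makes explicit. It assumes independent, equiprobable bases on a single strand, an idealisation that real genomes do not satisfy. The helper names `pam_probability` and `expected_spacing` are illustrative only and are not taken from this work.

```python
from fractions import Fraction

# IUPAC nucleotide codes mapped to the bases each code permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def pam_probability(pam: str) -> Fraction:
    """Probability that a random position matches `pam`.

    Assumption: bases are independent and equiprobable (1/4 each).
    """
    p = Fraction(1)
    for code in pam.upper():
        p *= Fraction(len(IUPAC[code]), 4)
    return p


def expected_spacing(pam: str) -> Fraction:
    """Mean number of positions per match on one strand (1 / probability)."""
    return 1 / pam_probability(pam)


# PAMs mentioned in the text; output is the expected spacing in bp.
for pam in ["NGG", "NAG", "NAA", "NNG", "NNGRRT"]:
    print(pam, expected_spacing(pam))
```

Under the same assumption, NNG and NAA are mutually exclusive at the third position, so a variant accepting both would match with probability 1/4 + 1/16 = 5/16, roughly one site every 3.2 bp per strand.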
To broaden targetable genomic sites, extensive efforts have endeavored to engineer and confer known Cas9s with altered PAM specificities or expanded PAM ranges. As a prototype, SpCas9 has been well elucidated with regard to its DNA recognition mechanism and served as the most intensively engineered template for PAM rewiring (Siksnys and Gasiunas, 2016). SpCas9 accommodates the PAM-containing duplex (PAM duplex) in the groove formed by the PAM-interacting (PI) domain, and recognizes the 2nd and 3rd guanine (G) of the NGG PAM via bidentate hydrogen bonding with the RxR (R1333/R1335) PAM-binding motif in the PI domain (Anders et al., 2016; Hirano et al., 2016b). Protein evolution of SpCas9 along with high-throughput screening has generated SpCas9 mutants with altered PAM specificities to NGA and NGNG, NGCG, and NAAG (Anders et al., 2016; Hirano et al., 2016b; Kleinstiver et al., 2015), or evolved mutants with expanded PAM ranges, including xCas9-3.7 and SpCas9-NG targeting NGN, and SpCas9-NRRH/NRTH/NRCH targeting non-G PAM (Hu et al., 2018; Miller et al., 2020; Nishimasu et al., 2018). Very recently, two near-PAMless SpCas9 mutants SpRY and SpdNG-LWQT have been developed, covering almost all PAMs (NRN>NYN) but with compromised overall activity (Walton et al., 2020; Wang et al., 2021a). These SpCas9 variants significantly lessened the PAM restriction and showcased the capability of modifying PAM specificity via laboratory protein evolution. Alternatively, instead of tentative mutagenesis and extensive screening of large-library SpCas9 mutants, mining new natural SpCas9 orthologs with diverse PI domains or relaxed PAM specificities offers a bypass to identify or engineer new Cas9s with expanded target scope (Gasiunas et al., 2020; Jacobsen et al., 2020; Karvelis et al., 2017). Moreover, by harnessing natural evolution, new SpCas9 orthologs with structural plasticity for relaxed PAMs would serve as ideal candidates to expedite the PAM rewiring or expansion.
Here, to create new PAM-flexible Cas9s with broad applications in metabolic reprogramming, we mined and engineered Streptococci Cas9s with diverse PAM compatibilities and robust activities. First, we reported the discovery of a wider and distinguished subset of PAM-binding motifs including QxxxR, QxQ and RxQ motifs among SpCas9-like orthologs, implying possibly different PAM specificities and PAM recognition mechanisms. Using the QxxxR-containing SeCas9 from S. equinus as a prototype, we determined its NAG PAM preference and elucidated the PAM recognition mechanism via AlphaFold-based protein structure prediction (Jumper et al., 2021) combined with molecular dynamics (MD) simulations. Our work also showed that, albeit at a close evolutionary distance, Cas9 orthologs with divergent PI domains could evolve altered PAM specificities. To exploit the unique PAM-binding motif and relaxed NRG PAM compatibility of SeCas9, we obtained an SeCas9-NQ variant with expanded NNG and NAA PAMs. With less PAM restriction, we further engineered an SedCas9-based transcriptional regulator ω-SedCas9-NQ capable of two-layered gene regulation: (1) PAM position directed bifunctional gene control, and (2) PAM preference mediated titratable gene control. The ω-SedCas9-NQ regulator was applied and demonstrated in metabolic control for enhanced 4-hydroxycoumarin production in E. coli. This work revealed a previously underestimated PAM diversity and the underlying PAM recognition mechanism of SpCas9-like orthologs, based upon which new Cas9 variants with less PAM constraint and broader applications in metabolic reprogramming were created via semi-rational protein engineering.
2. Results
2.1. Revealing PAM diversity of SpCas9-like orthologs
The many known Cas9 orthologs identified from Streptococci share a conserved RxR PAM-binding motif within the PI domain that confers a uniform NGG PAM specificity. To reveal the PAM diversity among Streptococci Cas9s, SpCas9 was used as a query to mine SpCas9-like proteins via BLASTp in the NCBI database. Of more than 500 putative Streptococci Cas9s with more than 50% amino acid identity to SpCas9, the majority (87%) contained the RxR motif, while the rest contained three noncanonical PAM-binding motifs: the QxQ motif recently identified in SmacCas9 from S. macacae (Chatterjee et al., 2020b), and the QxxxR and RxQ motifs (Fig. 1a, Supplementary Fig. 1 and Supplementary Data 1). The QxQ motif, which affords NAA PAM specificity, was found here to be more prevalent among S. mutans Cas9s (Supplementary Data 1). The PAM specificities of QxxxR- or RxQ-containing Cas9s were previously uncharacterized. Re-annotation of these SpCas9-like cas loci showed high sequence identities of the cas-associated genes (cas9-cas1-cas2-csn2) with those of SpCas9, including almost identical tracrRNAs and crRNA repeats (Fig. 1b–c and Supplementary Fig. 2). Interestingly, a putative toxin-antitoxin system (hicA-hicB) was also present in certain cas loci (Supplementary Fig. 2). These observations implied that, although phylogenetically closely related, SpCas9-like orthologs may recognize different PAMs by adopting distinct PAM-binding motifs.
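The sorting of orthologs into RxR, QxQ, QxxxR and RxQ classes can be illustrated with a small pattern-matching sketch. It assumes the PAM-binding window has already been extracted from a multiple sequence alignment against SpCas9; that extraction step is not shown and is not described in detail here. The function name `classify_motif` and the short test windows are hypothetical, apart from the QxxxR loops quoted elsewhere in this paper.

```python
import re
from collections import Counter

# Motif classes named in the text; "." stands for any residue (the "x").
MOTIF_PATTERNS = [
    ("QxxxR", re.compile(r"Q...R")),
    ("RxR", re.compile(r"R.R")),
    ("QxQ", re.compile(r"Q.Q")),
    ("RxQ", re.compile(r"R.Q")),
]


def classify_motif(window: str) -> str:
    """Classify a pre-extracted PAM-binding window by its motif class.

    Assumption: `window` spans exactly the motif (3 or 5 residues),
    taken from an alignment; whole proteins must not be passed in,
    because short patterns would match by chance elsewhere.
    """
    w = window.strip().upper()
    for name, pattern in MOTIF_PATTERNS:
        if pattern.fullmatch(w):
            return name
    return "unclassified"


# QxxxR loops quoted in the text, plus the RSNLR swap, which fits no class.
windows = ["QSNLR", "QGRLR", "QGNLR", "QDDVR", "QSNVR", "QSSVR", "RSNLR"]
print(Counter(classify_motif(w) for w in windows))
```

Tallying the classes over all aligned orthologs in this way would give the kind of motif distribution reported above, though the actual analysis pipeline may have differed.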
Fig. 1. Characterization of the QxxxR PAM-binding motif containing SeCas9.
a. Sequence alignment of different PAM-binding motifs among representative SpCas9-like orthologs. Previously identified Cas9s are underlined. b. The cas locus and protein domain structure of SpCas9 from S. pyogenes M1 GAS and SeCas9 from S. equinus ATCC 9812. c. Pair-wise sequence comparison of crRNA and tracrRNA between Sp-sgRNA and Se-sgRNA. d. The dual-plasmid eGFP repression assay system: pZE12 with randomized NNN PAM library and sgRNA, and pCS27 containing catalytically inactive SedCas9. The NNN was located after the ATG start codon of eGFP. e. PAM profile determination of SedCas9 with eGFP repression assay. f. The influence of the 4th nucleotide of the PAM on SedCas9 activity. g. SeCas9-mediated in vivo genome cleavage at NNG PAM sites of aslB locus from E. coli BW25113 (F′) and eGFP locus from E. coli BW25113 (F′) with chromosomally integrated eGFP (E. coli::eGFP). Data indicated the mean ± standard deviation (n = 3 independent biological replicates).
To gain insights into the PAM diversity, we chose the QxxxR-bearing SeCas9 (EFW88975.1) from S. equinus ATCC 9812 as a prototype (Fig. 1b–c). PAM prediction via BLASTn-based protospacer search indicated that almost all QxxxR-bearing Cas9s exhibit an NAG preference (Supplementary Fig. 2). To experimentally determine its PAM profile, we adopted our previously established dual-plasmid eGFP repression assay using the nuclease-deficient SedCas9 (D10A/H847A) (Wang et al., 2021a), in which the extent of eGFP repression is directly linked to PAM recognition (Fig. 1d). Owing to the high sequence identity and functional exchangeability of the engineered Se-sgRNA and Sp-sgRNA (Supplementary Fig. 3), the previously established PAM library plasmids pZE-NNN-eGFP-sgRNA, which harbor an Sp-sgRNA and an NNN PAM library adjacent to the start codon of eGFP, were applicable for PAM profiling of SedCas9 (Wang et al., 2021a). Consistent with the prediction, SedCas9 exhibited a preference towards NAG PAMs, with reduced recognition of NGG PAMs and collateral recognition of RYG PAMs (Fig. 1e). SedCas9 showed comparably high repression activity against all AAGN PAMs, ruling out a requirement for the 4th nucleotide in PAM recognition (Fig. 1f). The NAG PAM preference was further validated by the cell depletion assay, in which SeCas9 was lethal only when targeting the representative AAG PAM within either aslB of E. coli or eGFP of E. coli with chromosomally integrated eGFP (Fig. 1g). These results corroborated the structural similarity but PAM diversity between SeCas9 and SpCas9.
2.2. Dissecting the PAM recognition mechanism of SeCas9 orthologs
The distinct PAM preference and PI domain implied a different PAM recognition mechanism between SeCas9 and SpCas9. Hinted by the consensus QxxxR motif, we hypothesized that the conserved glutamine and arginine within the QxxxR motif of SeCas9 and its close orthologs specifically interact with the 2nd adenine (A) and 3rd guanine (G) within the NAG PAM. To elucidate the PAM recognition mechanism, the structure of SeCas9 was computationally predicted using the high-accuracy machine learning program AlphaFold (Jumper et al., 2021). The predicted SeCas9 showed high structural similarity to SpCas9 (PDB ID: 4UN3), and was distinguished by adopting a loop structure of the PAM-binding motif (1340-QSNLR-1344) within the PI domain, which presumably rendered Q1340 and R1344 lying near the PAM-proximal interface (Fig. 2a, and Supplementary Fig. 4). The molecular dynamics (MD) simulations were then exploited to refine and analyze the interactions between the QSNLR motif and NAG or NGG PAM. Based on the MD trajectory files, we calculated the minimum distances between each PAM and Q1340/R1344. The simulations indicated that Q1340 and R1344 could form two bidentate hydrogen bonds with 2nd A and 3rd G of the NAG PAM, respectively (Fig. 2b). In contrast, Q1340 could barely form a hydrogen bond with the 2nd G of the NGG PAM (Fig. 2b).
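The per-frame minimum-distance analysis mentioned above can be sketched in a few lines. This is an illustration only: it assumes atom coordinates (in Å) have already been exported from the trajectory as plain tuples, and the 3.5 Å cutoff used as a hydrogen-bond proxy is a common rule of thumb rather than a value stated in this work. All function names are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Frame = Sequence[Point]  # coordinates of one atom group in one frame


def min_distance(group_a: Frame, group_b: Frame) -> float:
    """Smallest pairwise distance between two atom groups in one frame."""
    return min(dist(a, b) for a in group_a for b in group_b)


def min_distance_series(traj_a: Sequence[Frame],
                        traj_b: Sequence[Frame]) -> List[float]:
    """Minimum distance per frame, e.g. side-chain atoms vs. a PAM base."""
    if len(traj_a) != len(traj_b):
        raise ValueError("trajectories must have the same number of frames")
    return [min_distance(a, b) for a, b in zip(traj_a, traj_b)]


def contact_fraction(series: Sequence[float], cutoff: float = 3.5) -> float:
    """Fraction of frames at or below `cutoff` (assumed H-bond proxy, in Å)."""
    return sum(d <= cutoff for d in series) / len(series)


# Toy two-frame example with made-up coordinates.
residue = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
base = [[(4.0, 0.0, 0.0)], [(6.0, 0.0, 0.0)]]
series = min_distance_series(residue, base)
print(series, contact_fraction(series))
```

A residue that stays within the cutoff for most frames would be consistent with a persistent hydrogen bond, whereas a low fraction would indicate only transient or absent contact.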
Fig. 2. Dissecting the PAM recognition mechanism via computational modeling and mutagenesis analysis.
a. Structural superimposition of SpCas9 (PDB ID: 4UN3, grey) and SeCas9 (pink) predicted by AlphaFold 2.0. Colored components include the targeted dsDNA (blue), sgRNA (purple), PAM sequence (yellow), and the QSNLR loop from SeCas9 (green). b. Molecular dynamics simulations of interactions between the SeCas9 PI domain and TAG or TGG PAMs. Within the zoomed-in view of the left SeCas9-TAG and SeCas9-TGG panels, potential hydrogen bonds are denoted by dashed lines. The two right panels show the simulation results for predicted hydrogen-bonding distances between Q1340/R1344 and the 2nd/3rd bases of TAG or TGG PAMs. c. The eGFP repression assay of SedCas9 PI domain variants targeting TAG and TGG PAMs. The dashed boxes indicate the mutation of S1341/N1342/L1343 to leucine (L) or lysine (K). d. Replacement of the PI domain of SedCas9 (grey) with that of SpCas9 (SpPI, orange), putative SmCas9 (VEF19407.1) from S. mutans NCTC10832 (SmPI, green), and putative SeCas9-HC5 (KEY47635.1) from S. equinus HC5 (HC5PI, blue). The chimeric mutants were created by fusing the N-terminal region (1–1108 aa) of SedCas9 with the SpPI (1100–1368 aa), SmPI (1090–1350 aa), and HC5PI (1109–1375 aa), respectively. The eGFP repression assays were conducted with SedCas9 chimeric variants towards NNG and NAA PAMs. Data indicated the mean ± standard deviation (n = 3 independent biological replicates).
To validate the computational modeling, we investigated the impact of mutating the QSNLR motif on PAM recognition. Both Q1340A and Q1340V showed markedly impaired recognition of TAG but slightly improved recognition of TGG targets, while Q1340R switched the PAM specificity from TAG to TGG (Fig. 2c). R1344 is critical in recognizing the 3rd G, since either R1344Q or R1344V abolished recognition of both TAG and TGG PAMs (Fig. 2c). However, N1342, which was presumably the counterpart to the first arginine (R1333) of the RxR motif in SpCas9, was not involved in direct PAM recognition, as mutating N1342 to A, V, or R retained full repression activity towards the TAG target and unchanged or slightly reduced activity towards the TGG target (Fig. 2c). Notably, N1342R could not switch the PAM specificity to TGG, further confirming that N1342 is located at a PAM-distal position (Supplementary Fig. 4). These results demonstrated that Q1340 and R1344 determine the NAG PAM specificity. To further evaluate the importance of the intervening region (1341-SNL-1343) of the loop, we first disrupted it by deleting SN (ΔSN) or replacing it with K. Both mutations drastically impaired the repression activity of SedCas9 against either TGG or TAG targets, suggesting that an intact QSNLR motif is necessary for PAM recognition (Fig. 2c). Interestingly, replacing QSNL with RK switched the PAM specificity to TGG (Fig. 2c), further emphasizing the importance of the loop structure of the QSNLR motif in NAG PAM recognition. The general role of the QxxxR motif was also validated by swapping the QSNLR of SedCas9 with that of its close orthologs, including QGRLR from ScCas9, QGNLR from SdCas9, QDDVR from SphCas9, QSNVR from SmiCas9, and QSSVR from SoCas9 (Fig. 1a). All substitutions retained full recognition of the TAG PAM, though they showed comparable or compromised activity towards the TGG PAM (Fig. 2c). Intriguingly, replacing the RxR of SpdCas9 with RSNLR retained its full activity towards the TGG PAM, while replacing it with QSNLR abolished its activity towards both the TGG and TAG PAMs (Supplementary Fig. 5a). Collectively, these results demonstrated that the QxxxR-bearing Cas9s adopt a PAM recognition mechanism different from that of their ortholog SpCas9.
Divergent evolution of the PI domain might have given rise to PAM diversity among SpCas9-like orthologs (Gasiunas et al., 2020; Jacobsen et al., 2020; Karvelis et al., 2017). This was supported by the facts that all of these orthologs are more conserved in the N-termini than in the PI-containing C-termini (63–91% vs. 36–41% identities with SpCas9) (Supplementary Fig. 6), and that all QxQ-, QxxxR- or RxQ-bearing Cas9s have equivalent RxR-bearing orthologs with almost identical N-termini (Supplementary Fig. 7). In line with that, swapping the PI domain of SpdCas9 (SpPI, 1100–1368 aa) with that of SedCas9 (SePI, 1109–1377 aa) switched its PAM preference from TGG to TAG and vice versa (Fig. 2d and Supplementary Fig. 5b). These findings suggest that the PI domain alone could determine the PAM specificity. Thus, to reveal the PAM specificity of Cas9 orthologs with alternate PAM-binding motifs, the SedCas9 PI domain was replaced with SmPI (the QxQ motif) of a putative S. mutans Cas9 ortholog (SmCas9, VEF19407.1), and HC5PI (the RxQ motif) of a putative S. equinus Cas9 ortholog (SeCas9-HC5, KEY47635.1). Interestingly, SeCas9-HC5 shared an almost identical N-terminus with SeCas9 (Supplementary Fig. 7). Consistent with in silico PAM prediction and a previous study (Chatterjee et al., 2020b), the resultant SedCas9N-SmPI showed high specificity to NAA PAMs (Fig. 2d and Supplementary Fig. 2). Surprisingly, although consistent with the PAM prediction, SedCas9N-HC5PI showed an NAG PAM preference similar to that of SedCas9, and the underlying PAM recognition mechanism remains elusive (Fig. 2d and Supplementary Fig. 2). These results implied that the PAM diversity was a result of divergent evolution of the PI domain and underlined the feasibility of rewiring or expanding PAM specificity by substituting or evolving PI domains.
2.3. Semi-rational engineering of SeCas9 with expanded PAMs
The relaxed PAM requirement and unique QxxxR PAM-binding motif rendered SeCas9 a potential molecular scaffold with structural plasticity to evolve expanded targeting capabilities. Amino acids auxiliary for PAM recognition from SpCas9 are conserved in SeCas9, including D1147, S1148 and E1231 (counterpart to D1135, S1136 and E1219 in SpCas9) (Fig. 3a). In SpCas9, D1135 and S1136 confer water-mediated contact to the third G of the PAM, while E1219 forms a salt bridge with the R1335 stabilizing its interaction with the third G (Anders et al., 2014; Kleinstiver et al., 2015). These residues, especially D1135, have been frequently mutated along with the PAM-interacting R1333/R1335 to afford new or expanded PAM specificities (Kleinstiver et al., 2015; Miller et al., 2020; Nishimasu et al., 2018; Walton et al., 2020). These imply that, although SeCas9 deploys a different PAM-binding motif from SpCas9, it shares an overall similar PAM duplex binding pocket as SpCas9 (Fig. 3a).
Fig. 3. Semi-rational generation of SedCas9 mutants with expanded PAM scope.
a. PAM-interacting (Q1340 and R1344) or PAM-proximal (L1120, D1147, S1148, T1230, E1231 and K1346) residues in SeCas9 identified via AlphaFold modeling and SpCas9-based alignment. b. Comparison of SedCas9 variants towards NNG PAMs via the eGFP repression assay. c. The schematic of the eGFP repression assay of SpdCas9 variants and SedCas9-NQ towards single-R PAMs. d. Comparison of eGFP repression activity between previously engineered PAM-flexible SpdCas9 variants (dxCas9-3.7 and SpdRY) towards NGN PAMs and SedCas9-NQ towards NNG PAMs. e. Comparison of eGFP repression activity between SpdRY towards NAN PAMs and SedCas9-NQ towards NNA PAMs. The data towards all PAMs tested were generated from three biological replicates (n = 3). In boxplots of d and e, the dash and ‘+’ within each box respectively represent median and mean values. * P≤ 0.05, ** P≤ 0.01, ***P≤ 0.001, ****P≤ 0.0001 (two-tailed t-test; n = 12 independent biological replicates)
To expand the PAM range of SedCas9, we first generated D1147 variants and profiled the PAM specificity against NNG PAMs using the eGFP repression assay. Compared with wild-type (wt) SedCas9, the D1147N variant showed the most improved recognition of all NNG PAMs tested (NAG>NGG>NCG>NTG) (Fig. 3b and Supplementary Fig. 8a). To further improve its activity against all NNG PAMs, we then mutated PAM-proximal residues including L1120, T1230, E1231, and K1346, based on sequence alignment and the predicted SeCas9 structure (Fig. 3a). D1147N-based L1120R, T1230K, and T1230R variants were created to enhance PAM-proximal DNA contacts (Chatterjee et al., 2020a; Nishimasu et al., 2018), of which T1230R increased the recognition of NCG and NTG PAMs (Fig. 3b and Supplementary Fig. 8b). Mutating E1231 was presumed to eliminate the predicted salt bridge with R1344 (Chen et al., 2019), thus weakening the specific interaction between R1344 and the third G. However, mutating E1231 to either V or F in D1147N/T1230R could neither expand nor enhance PAM recognition. Interestingly, while SpCas9 and other RxR-containing Cas9s usually adopt a Thr or Ser at the PAM-interaction face (RxRx(T/S)), SeCas9 and other QxxxR-containing Cas9s adopt charged or polar residues instead (QxxxRx(K/R/Q/E)) (Fig. 1a). Thus, to introduce potential hydrogen bonds with the PAM duplex (Kleinstiver et al., 2015; Nishimasu et al., 2018), we mutated K1346 to R or Q in SedCas9 D1147N. Surprisingly, both D1147N/K1346R and D1147N/K1346Q showed robust recognition of all NNG PAMs, with the latter, hereafter named SedCas9-NQ, exhibiting the best performance among all mutants (Fig. 3b and Supplementary Fig. 8b). Like SedCas9, SedCas9-NQ did not require the 4th base for PAM recognition (Supplementary Fig. 9).
SedCas9-NQ was compared with known PAM-flexible dCas9 variants derived from SpCas9, including dxCas9-3.7 and SpdRY, which recognize single-base NGN and NRN PAMs, respectively (Fig. 3c). When tested against NGN PAMs, dxCas9-3.7 showed better eGFP repression activities than SpdRY, and repressed eGFP by 63.8% to 95.6% on average (Fig. 3d). SedCas9-NQ outperformed all SpdCas9 mutants and showed more robust eGFP repression towards almost all tested NNG PAMs (82.7% to 99.0% on average, NAG>NGG>NCG>NTG) (Fig. 3d). To confirm whether SedCas9-NQ could tolerate single-A PAMs and to compare its activity with the near-PAMless SpdRY, SedCas9-NQ and SpdRY were tested against NNA and NAN PAMs, respectively. The eGFP repression assay showed that, though SpdRY showed broader recognition of single-A PAMs, SedCas9-NQ exhibited substantially higher specificity and repression activity towards NAA PAMs than SpdRY (92.7% versus 74.3%) (Fig. 3e). These results indicated that SedCas9-NQ showed both broad compatibility and robust activity towards an expanded NNG and NAA PAM range. The expanded PAM profile was further validated by restoring nuclease activity to give SeCas9-NQ and testing it in an in vitro DNA cleavage assay. The cleavage assay against linearized plasmid DNA showed robust nuclease activity of SeCas9-NQ towards all NNG and NAA PAM targets (Supplementary Fig. 10), demonstrating the broadened PAM recognition of SeCas9-NQ and its potential application in gene editing.
2.4. Bifunctional and titratable gene regulation via the engineered ω-SedCas9-NQ
A nuclease-deficient dCas9 with a lessened PAM constraint could empower numerous applications, including CRISPR-based gene regulation. In particular, PAM-expanded Cas9s could facilitate the application of CRISPRa, in which optimal positioning of dCas9-fused transcription factors on promoters is critical (Bikard et al., 2013; Dong et al., 2018; Fontana et al., 2020). In this study, to create a programmable dCas9-based transcriptional regulator with compact functionalities, we fused the ω subunit variant (RpoZ I13N) of E. coli RNAP to SedCas9-NQ at its N-terminus with a GGGGS fusion linker. The resultant ω-SedCas9-NQ could potentially achieve a PAM-directed two-layered transcriptional regulation: (1) PAM-position-based activation (CRISPRa) or interference (CRISPRi), and (2) PAM-preference-based titratable gene regulation within a narrow DNA target window. As activation targets, eGFP was placed under the control of two weak promoters, i.e., Plpp0.03 derived from the lipoprotein promoter Plpp1 (Wang et al., 2017a), and PchbB from the cryptic cellobiose utilization gene cluster chb (Vinuselvi and Lee, 2011). An sgRNA library was then designed to target both DNA strands upstream of the transcription start site (TSS, +1) with either NNG or NAA PAMs every 8–12 bp (Fig. 4a). Compared to ω-SedCas9-NQ lacking a targeting sgRNA as a control, 2.1- and 7.0-fold eGFP activation was observed with sgRNAs targeting the non-template strand of the Plpp0.03 and PchbB promoters, respectively (Fig. 4b and c). To demonstrate the effects of CRISPRa on stronger promoters, we further replaced Plpp0.03 with Plpp0.2, which resulted in a similar activation trend and fold change (Supplementary Fig. 11). These results corroborated the functionality and general applicability of ω-SedCas9-NQ for gene activation. When targeting PAMs downstream of promoters or start codons, ω-SedCas9-NQ could serve as a gene repressor (Fig. 4a). To this end, we tested the gene repression efficiency of ω-SedCas9-NQ by targeting five representative NNG and NAA PAMs downstream of the start codon of the reporter gene eGFP. The repression assay showed that ω-SedCas9-NQ repressed all targets in a PAM-specific fashion (AAG>AGG=ACG>AAA>ATG), reaching overall 88.3–98.8% repression efficiencies (Fig. 4d). These results together demonstrated that the PAM-diverse ω-SedCas9-NQ could afford bifunctional and titratable gene control under the guidance of programmable sgRNAs.
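Enumerating candidate target sites with NNG or NAA PAMs on both strands, as needed for the promoter-walking and repression libraries described above, can be sketched as follows. This is a minimal illustration, not the design procedure used in this work: it assumes a 20-nt protospacer immediately 5′ of the PAM on the same strand and an input sequence containing only A, C, G and T. It does not apply the 8–12 bp spacing or any strand or off-target filtering. The names `matches_pam` and `find_sites` are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an ACGT-only DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_pam(triplet: str) -> bool:
    """True for NNG (any, any, G) or NAA (any, A, A)."""
    return triplet[2] == "G" or triplet[1:] == "AA"


def find_sites(seq: str, spacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """List (strand, pam_index, protospacer, pam) for every NNG/NAA site.

    Assumptions: the protospacer is the `spacer_len` nt directly 5' of
    the PAM; `pam_index` is 0-based on the scanned strand, so for "-"
    it refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(spacer_len, len(s) - 2):
            pam = s[i:i + 3]
            if matches_pam(pam):
                sites.append((strand, i, s[i - spacer_len:i], pam))
    return sites


# Toy sequence: a 20-nt stretch followed by a single TGG PAM.
demo_sites = find_sites("A" * 20 + "TGG")
print(demo_sites)
```

In practice, the resulting candidates would still need to be thinned to the desired spacing and checked against the rest of the genome before use.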
Fig. 4. The fusion transcription regulator ω-SedCas9-NQ mediated bifunctional and titratable gene control.
a. Schematic of the ω-SedCas9-NQ mediated promoter walking for CRISPR activation (CRISPRa) and targeting of the coding sequence for CRISPR interference (CRISPRi). For CRISPRa, ω-SedCas9-NQ binds upstream of the transcription start site (TSS, +1 bp) every 8–12 bp on both non-template (N) and template (T) strands and recruits RNA polymerase (RNAP subunits β′βα2). The sgRNAs are designed to target available NNG or NAA PAMs on the Plpp0.03 and PchbB promoters. For CRISPRi, ω-SedCas9-NQ binds to representative NNG or NAA PAMs and blocks RNAP during transcriptional elongation. b. CRISPRa of eGFP under control of an engineered low-strength promoter Plpp0.03. NC, negative control. c. CRISPRa of eGFP under control of the PchbB promoter from the cryptic cellobiose utilization locus chb. NC, negative control. d. ω-SedCas9-NQ mediated titratable CRISPRi of eGFP at NNG and NAA PAMs. * P≤ 0.05, ** P≤ 0.01, ***P≤ 0.001, ****P≤ 0.0001 (two-tailed t-test; n = 3 independent biological replicates). Data indicated the mean ± standard deviation (n = 3 independent biological replicates).
2.5. The ω-SedCas9-NQ mediated metabolic control for 4-hydroxycoumarin enhancement
To evaluate the applicability of ω-SedCas9-NQ in metabolic engineering practices, it was implemented to enhance production of the anticoagulant precursor 4-hydroxycoumarin (4HC) in engineered E. coli. The four-step 4HC pathway previously developed in our lab starts from the shikimate pathway intermediate chorismate and ends with condensation of salicoyl-CoA and malonyl-CoA to 4HC, involving isochorismate synthase (EntC), isochorismate pyruvate lyase (PchB), salicoyl-CoA ligase (SdgA) and β-ketoacyl-ACP synthase III (FabH)-type quinolone synthase (PqsD) (Fig. 5a) (Lin et al., 2013). To redirect carbon flux from glycerol to the 4HC pathway, CRISPRi was targeted to genes potentially competing with the shikimate pathway (ptsI, eno, csrA, pykA, pykF, gltA and ppc) and genes involved in fatty acid biosynthesis from malonyl-CoA (fabD and fabF), and CRISPRa to genes potentially beneficial to malonyl-CoA supply (accB and fadR).
Fig. 5. The ω-SedCas9-NQ mediated metabolic control for 4-hydroxycoumarin production enhancement in E. coli.
a. Metabolic pathway of 4-hydroxycoumarin (4HC) production from glycerol in E. coli. The 4HC pathway was assembled on two plasmids, pZE12-EP-APTA and pCS27-PS. sgRNAs were constructed on the pCS27-PS plasmid. Genes to be repressed are shown in red, including ptsI (phosphoenolpyruvate-protein phosphotransferase), eno (enolase), ppc (phosphoenolpyruvate carboxylase), pykA (pyruvate kinase II), pykF (pyruvate kinase I), gltA (citrate synthase), fabD (malonyl-CoA-acyl carrier protein transacylase), fabF (β-ketoacyl-(acyl carrier protein) synthase II), and csrA (carbon storage regulator). Genes to be activated are shown in green, including accB (biotin carboxyl carrier protein) and fadR (fatty acid degradation regulator). Genes to be over-expressed on plasmids are shown in blue, including ppsA (phosphoenolpyruvate synthetase), tktA (transketolase I), aroGfbr (feedback-inhibition-resistant 3-deoxy-7-phosphoheptulonate synthase), aroL (shikimate kinase II), entC (isochorismate synthase), pchB (isochorismate pyruvate lyase), sdgA (salicoyl-CoA ligase) and pqsD (β-ketoacyl-ACP synthase III (FabH)-type quinolone synthase). Metabolite abbreviations: DHA, dihydroxyacetone; DHAP, dihydroxyacetone phosphate; G3P, glycerol 3-phosphate; F6P, fructose-6-phosphate; E4P, erythrose-4-phosphate; PEP, phosphoenolpyruvate; PYR, pyruvate; AcCoA, acetyl-CoA; α-KG, α-ketoglutarate; Mal, malate; OAA, oxalacetate; DAHP, 3-deoxy-arabino-heptulosonate 7-phosphate; Cho, chorismate; ICho, isochorismate. b. Schematic of the ω-SedCas9-NQ mediated CRISPRi and CRISPRa of genes of interest (GOI). sgRNAs for repression targets targeted five neighboring NNG and NAA PAMs. sgRNAs for activation targets targeted NNG and NAA PAMs every 5–10 bp upstream of the transcription start site (TSS, +1 bp). c. Fold increase in 4HC titers in the first-round screening of repression targets at different PAMs in test tube cultures with 3 ml M9Y medium containing 10 g/l glycerol. d. Fold increase in 4HC titers in the first-round screening of activation targets at different positions on the promoters in test tube cultures with 3 ml M9Y medium containing 10 g/l glycerol. e. The second-round experiments for 4HC production from first-round CRISPRi and CRISPRa targets in shake flask cultures with 20 ml M9Y medium containing 10 g/l glycerol. The bars denote 4HC titers and the circles denote OD600. All production tests were performed with the engineered strain E. coli::ω-SedCas9-NQ. Negative control (NC), E. coli::ω-SedCas9-NQ with the 4HC pathway plasmids pZE-EP-APTA and pCS27-PS. * P≤ 0.05, ** P≤ 0.01, ***P≤ 0.001, ****P≤ 0.0001 (two-tailed t-test; n = 3 independent biological replicates). Data indicated the mean ± standard deviation (n = 3 independent biological replicates).
We first engineered a host strain, E. coli::ω-SedCas9-NQ, with ω-SedCas9-NQ chromosomally integrated at the dkgB locus for the 4HC production test. The sgRNAs for repression targeted five neighboring NNG and NAA PAMs at the 5′-end of the coding sequences, while the sgRNAs for activation targeted available NNG or NAA PAMs spaced every 5–10 bp upstream of the TSS (+1) on the non-template DNA strand (Fig. 5b). All sgRNAs were incorporated into the downstream 4HC pathway plasmid pCS27-PS expressing pqsD and sdgA. The engineered E. coli::ω-SedCas9-NQ was co-transformed with pCS27-PS-sgRNA and the upstream 4HC pathway plasmid pZE-EP-APTA, and cultivated in M9Y minimal medium containing 10 g/l glycerol for the 4HC production test. First-round screening in test tubes showed that, although CRISPRi of all target genes could improve the 4HC titer, the fold increase was PAM-dependent (Fig. 5c). In particular, targeting eno and pykA at an NGG PAM, csrA and ptsI at an NCG PAM, and fabD at an NAA PAM each afforded an over 60% increase in 4HC production. This indicated that PAM-directed titratable CRISPRi could locate the optimal gene repression level in production scenarios, especially when stringent repression of essential genes was detrimental to cell fitness or production. To increase malonyl-CoA availability, CRISPRa of accB or fadR at TSS-distal PAMs tended to improve 4HC production, reaching maximal increases of 43.3% (N2, 72 bp upstream of the TSS of accB) and 50.8% (N3, 77 bp upstream of the TSS of fadR), respectively (Fig. 5d). The second-round 4HC production test was conducted in shake flasks with M9Y minimal medium containing 10 g/l glycerol. 
All selected repression and activation targets were validated for 4HC production enhancement, among which repression of fabD at the NAA PAM enabled the highest 4HC titer (395.5 mg/l) in 48 h, a 2.6-fold increase over the control without any sgRNA (152.4 mg/l) (Fig. 5e). Thus, the engineered ω-SedCas9-NQ allowed bifunctional and titratable metabolic control based on PAM position and choice, and could permit fast and reliable high-throughput screening of high-performance phenotypes.
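As an illustration of how candidate target sites for a PAM-relaxed dCas9 might be enumerated, the following minimal Python sketch scans one strand of a sequence for NNG or NAA PAMs and reports the protospacer immediately 5′ of each PAM. The 20-nt spacer length, the function name and the toy sequences are assumptions made for illustration and are not taken from this study; choosing the strand and the position relative to the TSS or start codon is left to the user.

```python
import re

# PAMs recognized by SedCas9-NQ according to the text: NNG and NAA.
# A lookahead is used so that overlapping PAM positions are all reported.
PAM_REGEX = re.compile(r"(?=([ACGT][ACGT]G|[ACGT]AA))")

SPACER_LEN = 20  # assumed spacer length, for illustration only


def find_target_sites(seq, spacer_len=SPACER_LEN):
    """Return (protospacer_start, protospacer, pam) for every NNG/NAA PAM.

    The sequence is read 5'->3' on the PAM-containing strand; the
    protospacer is the spacer_len bases immediately 5' of the PAM.
    """
    seq = seq.upper()
    sites = []
    for match in PAM_REGEX.finditer(seq):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough upstream sequence for a full protospacer
        protospacer = seq[pam_start - spacer_len:pam_start]
        sites.append((pam_start - spacer_len, protospacer, match.group(1)))
    return sites


if __name__ == "__main__":
    toy = "C" * 20 + "TAA"  # hypothetical sequence with a single NAA PAM
    for start, protospacer, pam in find_target_sites(toy):
        print(start, protospacer, pam)
```

Sites found this way would still need to be filtered by position and strand before being used as repression or activation targets.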
3. Discussion
The targeting scope limitation of Cas9 caused by the stringent PAM specificity has been a major obstacle for its precise and multipurpose uses (Anzalone et al., 2020; Nishimasu et al., 2018; Pan et al., 2021). One solution to overcome this inherent limitation is to identify new Cas9s with altered or minimal PAM specificity (Chatterjee et al., 2018; Chatterjee et al., 2020b; Gasiunas et al., 2020; Hirano et al., 2016a). In this study, by unveiling the diversity of PAM-binding motifs among closely-related Streptococci Cas9s, we discovered a subset of SpCas9-like orthologs with distinct PAM-binding motifs including the QxxxR-containing SeCas9. We identified the NAG PAM specificity of SeCas9 and elucidated its PAM recognition mechanism. Serving as a new molecular scaffold, model-based engineering of SeCas9 created a variant SedCas9-NQ with expanded NNG and NAA PAM scopes and robust activities that surpass known SpCas9-derived variants in gene transcription control in bacteria.
The identification of SeCas9 and its close orthologs implied divergent evolution of the PI domain for PAM diversity among closely related Streptococci Cas9s. Although structurally similar, as underlined by high sequence identities and sgRNA compatibilities, SpCas9 orthologs exhibited distinct PAM preferences, possibly by evolving their PI domains or PAM-binding motifs. Other than the prevalent RxR motif, three uncanonical PAM-binding motifs, i.e. the QxQ, QxxxR and RxQ motifs (Fig. 1a), were found here to be distributed among SpCas9 orthologs. These distinct PAM-binding motifs afford NAA, NAG and NGG PAM specificities, respectively, via specific hydrogen bonding (Chatterjee et al., 2020b; Shields et al., 2020). An intriguing observation is that the N-terminal regions of all these Cas9s are highly similar, suggesting rapid evolution of the PI domain for PAM diversity, possibly during horizontal gene transfer (Jacobsen et al., 2020; Shmakov et al., 2017). Inspired by this, we created chimeric SedCas9 variants with orthologous PI domains, which successfully rewired the PAM of SedCas9 and simultaneously dissected the PAM profiles of SeCas9 orthologs (Fig. 2d). Moreover, compensatory mutations within PI domains might also be necessary to obtain new PAM specificities. This may explain why simply mutating the RxR motif in SpCas9 failed to alter PAM specificity in both our work and prior research (Anders et al., 2016; Chatterjee et al., 2020b). Together, these results hinted at the modularity of PAM alteration by recruiting PI domains from diverse Cas9 orthologs and emphasized the feasibility of mutating the PI domain alone for PAM reprogramming.
The relaxed NRG PAM compatibility and distinct QxxxR PAM-binding motif render SeCas9 an ideal molecular scaffold for engineering expanded recognition capabilities. In SpCas9, the NGG PAM is located between the RxR motif and residues D1135/S1136, of which D1135 is one of the most frequently mutated positions for PAM alteration (Anders et al., 2014; Hirano et al., 2016b). We therefore started empirically with scanning mutagenesis of the equivalent D1147 in SedCas9; among the resulting mutants, D1147N substantially increased recognition of most NNG PAMs tested (Fig. 3b). Notably, T1337 is highly conserved in NGG-recognizing Cas9s, whereas the equivalent position is generally occupied by charged residues in NAA- or NAG-recognizing orthologs, implying potential contacts with PAM sequences (Fig. 1a). The final variant SedCas9-NQ (D1147N/K1346Q) further improved recognition of all NNG PAMs and showed substantial recognition of NAA PAMs (Fig. 3). Thus, with a semi-rational design approach, we created a PAM-diverse and activity-improved variant, SedCas9-NQ, with minimal mutations that could aid versatile dCas9-based applications in bacteria with almost no target restriction. As demonstrated in this work, the engineered ω-SedCas9-NQ could enable PAM-directed bifunctional and titratable bacterial gene control, circumventing the need to shorten sgRNA spacers (Qi et al., 2013; Wang et al., 2021b), regulate sgRNA expression (Fontana et al., 2018; Santos-Moreno et al., 2020), or create mismatches in the sgRNA (Feng et al., 2021; Jost et al., 2020) to tune gene control.
When employed for metabolic control of 4HC production in E. coli, the engineered ω-SedCas9-NQ allowed rapid screening of essential gene targets for carbon flux redirection. It has been noted that CRISPR/SpdCas9-mediated gene repression often leads to cell fitness defects, especially when targeting essential genes, owing to its stringent repression (Cui et al., 2018; de Bakker et al., 2022). Its bifunctional and tunable functionality rendered ω-SedCas9-NQ an ideal transcriptional regulator for maximizing 4HC production by controlling gene expression to different extents. By screening major genes involved in PEP consumption and malonyl-CoA supply, we identified multiple targets that enhanced 4HC production. Among the repression targets, fabD, which is involved in the initiation of fatty acid biosynthesis, was identified as the most effective target for rewiring carbon flux towards 4HC. The ω-SedCas9-NQ-mediated PAM-dependent CRISPRi of fabD at an NAA PAM eventually afforded a 2.6-fold improvement in 4HC titer (Fig. 5e). For CRISPRa screening, activation of accB, encoding a component of acetyl-CoA carboxylase, and fadR, encoding a global regulator that upregulates the accABCD genes (Zhang et al., 2012), improved 4HC production by 43.3% and 50.8%, respectively (Fig. 5d). Together, these results suggested that malonyl-CoA availability might be a major bottleneck for 4HC production, and demonstrated the feasibility of improving product formation via PAM-dependent gene regulation.
In conclusion, we presented here the in silico mining and biochemical characterization of a unique subgroup of Streptococci Cas9s with a distinct PAM-binding motif. Elucidation of NAG PAM recognition via the QxxxR motif sheds light on PAM diversity and its underlying recognition machinery within Streptococci Cas9 orthologs. These new Cas9s also suggest that PAM diversity evolved through diversification of the PI domain. Further protein engineering of SeCas9 created the SedCas9-NQ variant, with an expanded PAM range and improved activity that surpasses some widely used PAM-flexible SpCas9 mutants. The SedCas9-NQ variant, as well as other natural SpCas9-like orthologs, holds great potential for application in other organisms with further engineering. This work exemplified the feasibility of PAM rewiring through mining and engineering natural orthologs and generated a new variant that enriches the toolkit of PAM-expanded Cas9s for multipurpose genetic manipulations.
4. Materials and Methods
4.1. Bacterial strains and plasmid construction.
All bacterial strains and plasmids used in this study are listed in Supplementary Table 1. E. coli strain XL1-Blue (Stratagene, La Jolla, CA) was used for plasmid cloning and storage, BL21 Star (DE3) (Invitrogen, Waltham, MA) for protein expression, and BW25113 (F′) for eGFP repression assays. E. coli::ω-SedCas9-NQ was constructed by integrating the Plpp1-ω-SedCas9-NQ cassette into the chromosome of E. coli BW25113 (F′) at the dkgB locus using λ Red-mediated recombination (Gerlach et al., 2007). Compatible plasmids pZE12-luc and pCS27 were used for constructing the eGFP repression assay system. pETDuet-1 was used for SeCas9 expression and purification. pCP20 was used for curing the kanamycin resistance marker during gene recombination.
All DNA manipulations were performed following standard molecular cloning protocols (Sambrook et al., 1989). Enzymes including Phusion high-fidelity DNA polymerase, restriction endonucleases, and the Quick Ligation kit were purchased from New England Biolabs (Ipswich, MA). The cas9 gene encoding SeCas9 (EFW88975.1) was amplified from Streptococcus equinus ATCC 9812 and inserted into pCS27 between Acc65I and BamHI under the constitutive Plpp1 promoter (Wang et al., 2017a), yielding the plasmid pCS27-SeCas9. The nuclease-dead SeCas9 (SedCas9, D10A and H847A) and its derived mutants were obtained using the method described by Chiu et al. (2004). The reporter plasmids pZE-eGFP and pZE-eGFP-sgegfp and the NNN PAM library plasmids pZE-NNN-eGFP-sgegfp were obtained from our previous study (Wang et al., 2021a). The plasmid pZE-eGFP-Se-sgRNA was obtained by replacing the Sp-sgRNA in pZE-eGFP-sgegfp with the Se-sgRNA targeting eGFP. The chimeric genes encoding SedCas9-SpPI, SedCas9-SmPI and SedCas9-HC5PI were obtained by overlapping PCR. The gene fragments encoding SmPI and HC5PI were synthesized by Eurofins Genomics (Louisville, KY). Plasmid pCS27-PS-sgRNA was constructed by inserting the PLlacO1-sgRNA cassette into the pCS27-PS plasmid.
4.2. Culture media and conditions.
E. coli cells were grown at 37 °C in Luria-Bertani (LB) medium (containing 5 g/l yeast extract, 10 g/l NaCl, and 10 g/l tryptone) supplemented with appropriate antibiotics. The cell culture was incubated in a rotary shaker with a speed of 270 rpm. Ampicillin (Amp) and kanamycin (Kan) were applied with a final concentration of 100 and 50 μg/ml, respectively. 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) was added when needed. The M9Y minimal medium (6 g/l Na2HPO4, 0.5 g/l NaCl, 3 g/l KH2PO4, 1 g/l NH4Cl, 1 mM MgSO4, 0.1 mM CaCl2, and 5 g/l yeast extract) containing 10 g/l glycerol was used for 4-hydroxycoumarin production tests in both test tubes and in 125-ml conical shake flasks.
4.3. 4HC production tests.
For 4HC production, the host strain E. coli::ω-SedCas9-NQ was transformed with the 4HC pathway plasmids pCS27-PS-sgRNA and pZE-EP-APTA. For test tube production tests, single colonies were inoculated into test tubes with 3 ml fresh M9Y medium containing 10 g/l glycerol. For shake flask experiments, single colonies were inoculated in 3 ml LB medium and grown at 37 °C for 8–10 h, and then the seed cultures were transferred to 20 ml fresh M9Y medium in 125-ml shake flasks as 3% (v/v) inoculum. For all production experiments, IPTG was added at a final concentration of 0.5 mM during initial inoculation, and cells were grown in a rotary shaker (New Brunswick Scientific, Edison, NJ) at 30 °C with a speed of 270 rpm. Test tube samples were taken at 24 h and shake flask samples were taken at 24 and 48 h.
4.4. HPLC analysis.
All culture samples were centrifuged at 13,000 rpm for 10 min, and the supernatants were analyzed on an Agilent 1260 Infinity II HPLC system (Agilent Technologies) equipped with a reverse-phase ZORBAX SB-C18 column and a 1260 Infinity II Diode Array Detector. Solvent A was water with 0.1% trifluoroacetic acid (TFA) and solvent B was 100% methanol. Metabolites were analyzed with a program of 5 to 60% solvent B over 30 min, followed by 60 to 5% solvent B over 5 min, at a flow rate of 1 ml/min. 4HC was detected and quantified by UV absorbance at 280 nm.
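For readers who wish to reproduce the elution program, the following short Python sketch returns the solvent B percentage at a given time. It assumes that both segments are linear ramps, which the text does not state explicitly, and the function name is hypothetical.

```python
def percent_b(t_min):
    """Solvent B (%) at time t_min, assuming linear ramps in both segments.

    Segment 1: 5 -> 60 % B over 0-30 min.
    Segment 2: 60 -> 5 % B over 30-35 min.
    """
    if not 0 <= t_min <= 35:
        raise ValueError("time is outside the 35-min program")
    if t_min <= 30:
        return 5 + (60 - 5) * t_min / 30
    return 60 - (60 - 5) * (t_min - 30) / 5


if __name__ == "__main__":
    for t in (0, 15, 30, 32.5, 35):
        print(t, percent_b(t))
```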
4.5. PAM profiling assay.
To determine PAM specificity, pCS-SedCas9 was co-transformed with pZE-NNN-eGFP-sgRNA into E. coli BW25113 (F′) for eGFP repression assays. Triplicate transformants were grown in 3 ml LB in test tubes with appropriate antibiotics and induced with IPTG for 24 h. The eGFP fluorescence was measured using a Synergy microplate reader (BioTek, Winooski, VT) as described previously (Wang et al., 2017b). Briefly, 20 μl of culture was sampled and diluted with 180 μl distilled H2O in a black 96-well plate. Cell density was measured as absorbance at 600 nm (OD600), and eGFP fluorescence was determined using an excitation filter of 485/20 nm and an emission filter of 528/20 nm. The relative eGFP expression was calculated as the ratio of the normalized eGFP fluorescence (eGFP/OD600) of the tested dCas9 samples to that of the control group without any sgRNA. Triplicate transformants were used for all experiments.
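The relative eGFP expression defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis script: it assumes that each replicate is normalized to the mean of the no-sgRNA control and that no blank subtraction is applied, and the function name and readings are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(sample, control):
    """Relative eGFP expression of a sample versus the no-sgRNA control.

    sample, control: lists of (fluorescence, OD600) pairs, one per replicate.
    Returns (mean, standard deviation) of the per-replicate ratios.
    Assumption: each sample replicate is divided by the mean control value.
    """
    control_mean = mean(f / od for f, od in control)
    ratios = [(f / od) / control_mean for f, od in sample]
    return mean(ratios), stdev(ratios)


if __name__ == "__main__":
    # Hypothetical plate-reader values (fluorescence, OD600)
    control = [(1000.0, 1.0), (1100.0, 1.1), (900.0, 0.9)]
    sample = [(100.0, 1.0), (200.0, 1.0), (300.0, 1.0)]
    print(relative_expression(sample, control))
```

A value near 1 indicates no repression, while values near 0 indicate strong repression at the tested PAM.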
4.6. Protein purification and in vitro enzyme assay.
E. coli BL21 Star (DE3) was transformed with pETDuet-SeCas9 or its derived plasmids. Single colonies were picked, inoculated into 3 ml LB in test tubes, and grown at 37 °C. The overnight cultures were transferred into 50 ml LB medium, induced with 0.5 mM IPTG when the optical density (OD600) reached 0.6–0.8, and incubated at 30 °C on a rotary shaker for another 9–12 h. The cells were harvested by centrifugation, and the protein was purified using the His-Spin Protein Miniprep Kit (Zymo Research, Irvine, CA) according to the manufacturer’s instructions. The purified protein was verified by Tricine-SDS-polyacrylamide gel electrophoresis (Tricine-SDS-PAGE) on a 12% gel, and the protein concentration was measured using a Pierce BCA Protein Assay Kit (Thermo Scientific, Waltham, MA) per the manufacturer’s instructions.
The sgRNA was prepared by in vitro transcription using T7 RNA polymerase (NEB). The products were treated with DNase I (NEB) and purified with the Monarch® RNA Cleanup Kit (NEB) following the manufacturer’s instructions. Purified SeCas9 protein (5 μM) and sgRNA (30 μM) were used for in vitro cleavage, with SacI-linearized reporter plasmids (15 μg, 500 nM) as the substrate. The reaction was carried out at 37 °C for 30 min in 15 μl of reaction buffer containing 20 mM Tris-HCl (pH 7.5), 100 mM NaCl, 2 mM MgCl2, 1 mM DTT, and 5% glycerol, and was stopped by heating at 70 °C for 10 min. Cleavage products were resolved by electrophoresis on a 1% agarose gel.
4.7. In silico PAM prediction.
The spacer sequences (~30 nt) extracted from the crRNA arrays of each Streptococci Cas9 were aligned against Streptococcus phage genomes or plasmids using BLASTn searches (Johnson et al., 2008). Candidate protospacer hits were accepted with at most two mismatches. The 8-bp sequences flanking the 3′-end of all protospacer hits were analyzed with WebLogo to predict potential PAM sequences (Crooks et al., 2004).
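The logo-based PAM prediction can be approximated with a short Python sketch that tabulates per-position base frequencies of the 3′ flanking sequences and computes the information content in bits, the quantity plotted by a sequence logo. This is an illustrative simplification: the small-sample correction applied by WebLogo is omitted, and the function names and flanking sequences are hypothetical.

```python
from collections import Counter
from math import log2

BASES = "ACGT"


def position_frequencies(flanks):
    """Per-position base frequencies for equal-length flanking sequences."""
    length = len(flanks[0])
    if any(len(f) != length for f in flanks):
        raise ValueError("all flanking sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(f[i].upper() for f in flanks)
        total = sum(counts[b] for b in BASES)
        freqs.append({b: counts[b] / total for b in BASES})
    return freqs


def information_content(freq):
    """Information content (bits) at one position: 2 - Shannon entropy."""
    entropy = -sum(p * log2(p) for p in freq.values() if p > 0)
    return 2.0 - entropy


if __name__ == "__main__":
    # Hypothetical 8-bp flanks sharing an NAG motif at positions 1-3
    flanks = ["TAGAAAAA", "CAGCCCCC", "GAGGGGGG", "AAGTTTTT"]
    for pos, freq in enumerate(position_frequencies(flanks), start=1):
        print(pos, round(information_content(freq), 2), freq)
```

Positions with high information content and a dominant base are candidate PAM positions; uniformly distributed positions correspond to N.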
4.8. Protein structure modeling and molecular dynamics (MD) simulations.
The three-dimensional structure of SeCas9 was computationally modeled using AlphaFold 2.0 (Jumper et al., 2021; Senior et al., 2020). The predicted SeCas9 structure was superimposed on SpCas9 (PDB ID: 4UN3) obtained from the Protein Data Bank (www.rcsb.org) (Anders et al., 2014; Berman et al., 2000). The sgRNA and TGG PAM-containing DNA from 4UN3 were combined with the predicted SeCas9 structure to generate the SeCas9-TGG model in Maestro (Schrödinger, version 12.4). The TGG PAM was then mutated to TAG in PyMOL to generate the SeCas9-TAG model. MD simulations of both models were conducted using GROMACS version 2018 and the CHARMM36 force field (Huang et al., 2017). CHARMM-GUI was used to build the solvation box for the MD simulations, a cubic box with an edge length of 137 Å filled with water (Jo et al., 2008; Lee et al., 2016). The minimized structures were equilibrated in an NVT ensemble (constant Number of particles, Volume, and Temperature) and then an NPT ensemble (constant Number of particles, Pressure, and Temperature). The target equilibration temperature was set at 303.15 K, and MD simulations were run for 100 ns. After the MD simulations, the distances between selected atoms of amino acids and nucleotides were calculated.
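The final distance analysis can be illustrated with a minimal Python sketch that computes the distance between two atoms in each trajectory frame and summarizes how often they lie within a hydrogen-bonding distance. The 3.5 Å cutoff, the function names and the coordinates are assumptions for illustration; periodic boundary conditions are ignored, and in practice the coordinates would be exported from the trajectory with dedicated analysis tools.

```python
from math import dist
from statistics import mean

HBOND_CUTOFF = 3.5  # Å; assumed donor-acceptor cutoff, not taken from the text


def distance_series(coords_a, coords_b):
    """Euclidean distance between two atoms in each frame.

    coords_a, coords_b: per-frame (x, y, z) coordinates in Å.
    Periodic boundary conditions are not handled in this sketch.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both atoms need the same number of frames")
    return [dist(a, b) for a, b in zip(coords_a, coords_b)]


def summarize(distances, cutoff=HBOND_CUTOFF):
    """Return (mean distance, fraction of frames at or below the cutoff)."""
    within = sum(d <= cutoff for d in distances)
    return mean(distances), within / len(distances)


if __name__ == "__main__":
    # Hypothetical coordinates for a side-chain atom and a base atom
    atom_a = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    atom_b = [(3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    print(summarize(distance_series(atom_a, atom_b)))
```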
4.9. Statistics.
All data from the eGFP repression assays and 4HC production tests were presented as the mean ± standard deviation of biological triplicates (n = 3). The colonies used for data collection were randomly selected from the agar plates. The two-tailed t-test was performed with JMP Pro 16 software, and three or twelve independent biological replicates (n = 3 or 12) were applied.
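As a worked illustration of the summary statistics, the following Python sketch computes the mean ± standard deviation of replicates, the fold change relative to a control, and the asterisk notation used in the figure legends. It is not the JMP Pro workflow used in this study; the function names and replicate values are hypothetical, while the titers in the fold-change example are those reported for fabD repression and the control.

```python
from statistics import mean, stdev


def mean_sd(values):
    """Mean and sample standard deviation of biological replicates."""
    return mean(values), stdev(values)


def fold_change(treated, control):
    """Ratio of a treated value to the control value."""
    return treated / control


def significance_stars(p):
    """Map a P value to the asterisk notation used in the figure legends."""
    for threshold, stars in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p <= threshold:
            return stars
    return ""


if __name__ == "__main__":
    print(mean_sd([390.0, 395.0, 400.0]))        # hypothetical replicate titers
    print(round(fold_change(395.5, 152.4), 1))   # titers reported in the text
    print(significance_stars(0.003))
```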
Supplementary Material
Acknowledgements
This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R35GM128620. We also acknowledge the support from the College of Engineering, The University of Georgia, Athens.
Footnotes
Competing interests
The authors declare that they have no competing interests.
Data and materials availability
All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data or materials related to this paper may be requested from the corresponding author.
References
- Anders C, Bargsten K, Jinek M, 2016. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902.
- Anders C, Niewoehner O, Duerst A, Jinek M, 2014. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature. 513, 569–573.
- Anzalone AV, Koblan LW, Liu DR, 2020. Genome editing with CRISPR–Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol 38, 824–844.
- Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P, 2007. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 315, 1709–1712.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE, 2000. The protein data bank. Nucleic Acids Res. 28, 235–242.
- Bikard D, Jiang W, Samai P, Hochschild A, Zhang F, Marraffini LA, 2013. Programmable repression and activation of bacterial gene expression using an engineered CRISPR-Cas system. Nucleic Acids Res. 41, 7429–7437.
- Brouns SJ, Jore MM, Lundgren M, Westra ER, Slijkhuis RJ, Snijders AP, Dickman MJ, Makarova KS, Koonin EV, Van Der Oost J, 2008. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science. 321, 960–964.
- Chatterjee P, Jakimo N, Jacobson JM, 2018. Minimal PAM specificity of a highly similar SpCas9 ortholog. Sci. Adv 4, eaau0766.
- Chatterjee P, Jakimo N, Lee J, Amrani N, Rodríguez T, Koseki SR, Tysinger E, Qing R, Hao S, Sontheimer EJ, 2020a. An engineered ScCas9 with broad PAM range and high specificity and activity. Nat. Biotechnol 38(10), 1154–1158.
- Chatterjee P, Lee J, Nip L, Koseki SR, Tysinger E, Sontheimer EJ, Jacobson JM, Jakimo N, 2020b. A Cas9 with PAM recognition for adenine dinucleotides. Nat. Commun 11, 1–6.
- Chen W, Zhang H, Zhang Y, Wang Y, Gan J, Ji Q, 2019. Molecular basis for the PAM expansion and fidelity enhancement of an evolved Cas9 nuclease. PLoS Biol. 17, e3000496.
- Chiu J, March PE, Lee R, Tillett D, 2004. Site-directed, Ligase-Independent Mutagenesis (SLIM): a single-tube methodology approaching 100% efficiency in 4 h. Nucleic Acids Res. 32, e174–e174.
- Crooks GE, Hon G, Chandonia J-M, Brenner SE, 2004. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190.
- Cui L, Vigouroux A, Rousset F, Varet H, Khanna V, Bikard D, 2018. A CRISPRi screen in E. coli reveals sequence-specific toxicity of dCas9. Nat. Commun 9, 1–10.
- de Bakker V, Liu X, Bravo AM, Veening J-W, 2022. CRISPRi-seq for genome-wide fitness quantification in bacteria. Nat. Protoc 17, 252–281.
- Dong C, Fontana J, Patel A, Carothers JM, Zalatan JG, 2018. Synthetic CRISPR-Cas gene activators for transcriptional reprogramming in bacteria. Nat. Commun 9, 1–11.
- Doudna JA, Charpentier E, 2014. The new frontier of genome engineering with CRISPR-Cas9. Science. 346, 1258096.
- Feng H, Guo J, Wang T, Zhang C, Xing X-H, 2021. Guide-target mismatch effects on dCas9–sgRNA binding activity in living bacterial cells. Nucleic Acids Res. 49, 1263–1277.
- Findlay GM, Boyle EA, Hause RJ, Klein JC, Shendure J, 2014. Saturation editing of genomic regions by multiplex homology-directed repair. Nature. 513, 120–123.
- Fontana J, Dong C, Ham JY, Zalatan JG, Carothers JM, 2018. Regulated expression of sgRNAs tunes CRISPRi in E. coli. Biotechnol. J 13, 1800069.
- Fontana J, Dong C, Kiattisewee C, Chavali VP, Tickman BI, Carothers JM, Zalatan JG, 2020. Effective CRISPRa-mediated control of gene expression in bacteria must overcome strict target site requirements. Nat. Commun 11, 1–11.
- Gasiunas G, Young JK, Karvelis T, Kazlauskas D, Urbaitis T, Jasnauskaite M, Grusyte MM, Paulraj S, Wang P-H, Hou Z, 2020. A catalogue of biochemically diverse CRISPR-Cas9 orthologs. Nat. Commun 11, 1–10.
- Gaudelli NM, Komor AC, Rees HA, Packer MS, Badran AH, Bryson DI, Liu DR, 2017. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature. 551, 464–471.
- Gerlach RG, Hölzer SU, Jäckel D, Hensel M, 2007. Rapid engineering of bacterial reporter gene fusions by using Red recombination. Appl. Environ. Microbiol 73, 4234–4242.
- Hilton IB, D’ippolito AM, Vockley CM, Thakore PI, Crawford GE, Reddy TE, Gersbach CA, 2015. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol 33, 510–517.
- Hirano H, Gootenberg JS, Horii T, Abudayyeh OO, Kimura M, Hsu PD, Nakane T, Ishitani R, Hatada I, Zhang F, 2016a. Structure and engineering of Francisella novicida Cas9. Cell. 164, 950–961.
- Hirano S, Nishimasu H, Ishitani R, Nureki O, 2016b. Structural basis for the altered PAM specificities of engineered CRISPR-Cas9. Mol. Cell 61, 886–894.
- Hsu PD, Lander ES, Zhang F, 2014. Development and applications of CRISPR-Cas9 for genome engineering. Cell. 157, 1262–1278.
- Hu JH, Miller SM, Geurts MH, Tang W, Chen L, Sun N, Zeina CM, Gao X, Rees HA, Lin Z, 2018. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature. 556, 57–63.
- Huang J, Rauscher S, Nawrocki G, Ran T, Feig M, De Groot BL, Grubmüller H, MacKerell AD, 2017. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods 14, 71–73.
- Jacobsen T, Ttofali F, Liao C, Manchalu S, Gray BN, Beisel CL, 2020. Characterization of Cas12a nucleases reveals diverse PAM profiles between closely-related orthologs. Nucleic Acids Res. 48, 5624–5638.
- Jiang F, Zhou K, Ma L, Gressel S, Doudna JA, 2015. A Cas9–guide RNA complex preorganized for target DNA recognition. Science. 348, 1477–1481.
- Jo S, Kim T, Iyer VG, Im W, 2008. CHARMM‐GUI: a web‐based graphical user interface for CHARMM. J. Comput. Chem 29, 1859–1865.
- Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL, 2008. NCBI BLAST: a better web interface. Nucleic Acids Res. 36, W5–W9.
- Jost M, Santos DA, Saunders RA, Horlbeck MA, Hawkins JS, Scaria SM, Norman TM, Hussmann JA, Liem CR, Gross CA, 2020. Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs. Nat. Biotechnol 38, 355–364.
- Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, 2021. Highly accurate protein structure prediction with AlphaFold. Nature. 596, 583–589.
- Karvelis T, Gasiunas G, Siksnys V, 2017. Harnessing the natural diversity and in vitro evolution of Cas9 to expand the genome editing toolbox. Curr. Opin. Microbiol 37, 88–94.
- Kleinstiver BP, Prew MS, Tsai SQ, Topkar VV, Nguyen NT, Zheng Z, Gonzales AP, Li Z, Peterson RT, Yeh J-RJ, 2015. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. 523, 481–485.
- Komor AC, Kim YB, Packer MS, Zuris JA, Liu DR, 2016. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature. 533, 420–424.
- Konermann S, Brigham MD, Trevino AE, Joung J, Abudayyeh OO, Barcena C, Hsu PD, Habib N, Gootenberg JS, Nishimasu H, 2015. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature. 517, 583–588.
- Lee J, Cheng X, Swails JM, Yeom MS, Eastman PK, Lemkul JA, Wei S, Buckner J, Jeong JC, Qi Y, 2016. CHARMM-GUI input generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM simulations using the CHARMM36 additive force field. J. Chem. Theory Comput 12, 405–413.
- Lin Y, Shen X, Yuan Q, Yan Y, 2013. Microbial biosynthesis of the anticoagulant precursor 4-hydroxycoumarin. Nat. Commun 4, 1–8.
- Makarova KS, Wolf YI, Alkhnbashi OS, Costa F, Shah SA, Saunders SJ, Barrangou R, Brouns SJ, Charpentier E, Haft DH, 2015. An updated evolutionary classification of CRISPR–Cas systems. Nat. Rev. Microbiol 13, 722–736.
- Miller SM, Wang T, Randolph PB, Arbab M, Shen MW, Huang TP, Matuszek Z, Newby GA, Rees HA, Liu DR, 2020. Continuous evolution of SpCas9 variants compatible with non-G PAMs. Nat. Biotechnol 38, 471–481.
- Nishimasu H, Ran FA, Hsu PD, Konermann S, Shehata SI, Dohmae N, Ishitani R, Zhang F, Nureki O, 2014. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell. 156, 935–949.
- Nishimasu H, Shi X, Ishiguro S, Gao L, Hirano S, Okazaki S, Noda T, Abudayyeh OO, Gootenberg JS, Mori H, 2018. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science. 361, 1259–1262.
- Pan C, Wu X, Markel K, Malzahn AA, Kundagrami N, Sretenovic S, Zhang Y, Cheng Y, Shih PM, Qi Y, 2021. CRISPR–Act3.0 for highly efficient multiplexed gene activation in plants. Nat. Plants 7, 942–953.
- Qi LS, Larson MH, Gilbert LA, Doudna JA, Weissman JS, Arkin AP, Lim WA, 2013. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 152, 1173–1183.
- Ran FA, Cong L, Yan WX, Scott DA, Gootenberg JS, Kriz AJ, Zetsche B, Shalem O, Wu X, Makarova KS, 2015. In vivo genome editing using Staphylococcus aureus Cas9. Nature. 520, 186–191.
- Sambrook J, Fritsch EF, Maniatis T, 1989. Molecular cloning. Cold Spring Harbor Laboratory Press, New York.
- Sanozky-Dawes R, Selle K, O’Flaherty S, Klaenhammer T, Barrangou R, 2015. Occurrence and activity of a type II CRISPR-Cas system in Lactobacillus gasseri. Microbiology 161, 1752–1761.
- Santos-Moreno J, Tasiudi E, Stelling J, Schaerli Y, 2020. Multistable and dynamic CRISPRi-based synthetic circuits. Nat. Commun 11, 1–8.
- Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AW, Bridgland A, 2020. Improved protein structure prediction using potentials from deep learning. Nature. 577, 706–710.
- Shields RC, Walker AR, Maricic N, Chakraborty B, Underhill SA, Burne RA, 2020. Repurposing the Streptococcus mutans CRISPR-Cas9 system to understand essential gene function. PLoS Pathog. 16, e1008344.
- Shmakov S, Smargon A, Scott D, Cox D, Pyzocha N, Yan W, Abudayyeh OO, Gootenberg JS, Makarova KS, Wolf YI, 2017. Diversity and evolution of class 2 CRISPR–Cas systems. Nat. Rev. Microbiol 15, 169–182.
- Siksnys V, Gasiunas G, 2016. Rewiring Cas9 to target new PAM sequences. Mol. Cell 61, 793–794.
- Sternberg SH, Redding S, Jinek M, Greene EC, Doudna JA, 2014. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature. 507, 62–67.
- Terns MP, Terns RM, 2011. CRISPR-based adaptive immune systems. Curr. Opin. Microbiol 14, 321–327.
- Vinuselvi P, Lee SK, 2011. Engineering Escherichia coli for efficient cellobiose utilization. Appl. Microbiol. Biotechnol 92, 125–132.
- Walton RT, Christie KA, Whittaker MN, Kleinstiver BP, 2020. Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science. 368, 290–296.
- Wang J, Mahajani M, Jackson SL, Yang Y, Chen M, Ferreira EM, Lin Y, Yan Y, 2017a. Engineering a bacterial platform for total biosynthesis of caffeic acid derived phenethyl esters and amides. Metab. Eng 44, 89–99.
- Wang J, Teng Y, Zhang R, Wu Y, Lou L, Zou Y, Li M, Xie Z-R, Yan Y, 2021a. Engineering a PAM-flexible SpdCas9 variant as a universal gene repressor. Nat. Commun 12, 1–10.
- Wang J, Wu Y, Sun X, Yuan Q, Yan Y, 2017b. De novo biosynthesis of glutarate via α-keto acid carbon chain extension and decarboxylation pathway in Escherichia coli. ACS Synth. Biol 6, 1922–1930.
- Wang J, Zhang R, Zhang J, Gong X, Jiang T, Sun X, Shen X, Wang J, Yuan Q, Yan Y, 2021b. Tunable hybrid carbon metabolism coordination for the carbon-efficient biosynthesis of 1,3-butanediol in Escherichia coli. Green Chem 23, 8694–8706.
- Weber T, Wefers B, Wurst W, Sander S, Rajewsky K, Kühn R, 2015. Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells. Nat. Biotechnol 33, 543–548.
- Yamada M, Watanabe Y, Gootenberg JS, Hirano H, Ran FA, Nakane T, Ishitani R, Zhang F, Nishimasu H, Nureki O, 2017. Crystal structure of the minimal Cas9 from Campylobacter jejuni reveals the molecular diversity in the CRISPR-Cas9 systems. Mol. Cell 65, 1109–1121.e3.
- Zhang F, Ouellet M, Batth TS, Adams PD, Petzold CJ, Mukhopadhyay A, Keasling JD, 2012. Enhancing fatty acid production by the expression of the regulatory transcription factor FadR. Metab. Eng 14, 653–660.