Abstract
The ability to alter the genomes of living cells is key to understanding how genes influence the functions of organisms and will be critical to modify living systems for useful purposes. However, this promise has long been limited by the technical challenges involved in genetic engineering. Recent advances in gene editing have bypassed some of these challenges but they are still far from ideal. Here we use FuncLib to computationally design Cas9 enzymes with substantially higher donor-independent editing activities. We use genetic circuits linked to cell survival in yeast to quantify Cas9 activity and discover synergistic interactions between engineered regions. These hyperactive Cas9 variants function efficiently in mammalian cells and introduce larger and more diverse pools of insertions and deletions into targeted genomic regions, providing tools to enhance and expand the possible applications of CRISPR-based gene editing.
Subject terms: Protein design, CRISPR-Cas9 genome editing
Introduction
The myriad applications of Cas9 have significantly accelerated biological research1–4; however, a number of issues remain unsolved. Perhaps the most surprising limitation of the CRISPR-Cas system is that Cas9 is particularly inefficient in its cleavage activity. Although the majority of DNA substrates can be cleaved in seconds5, the cutting half-life and catalytic lifetime for Streptococcus pyogenes Cas9 (Cas9) are both ~6 h2,6,7, which compares unfavorably with other nucleases, such as restriction enzymes. The DNA cleavage rate therefore becomes limited by the time required for Cas9 to dissociate from its DNA substrate and resample the population of target sites in a cell. In fact, kinetic studies have revealed that Cas9 is effectively a single turnover nuclease5,8–10. However, the precise in vivo kinetics of Cas9 cleavage can be target strand- and locus-dependent, given that other cellular processes, such as transcription, are thought to promote post-cleavage dissociation and thus turnover in a context-dependent manner11. Interestingly, mutations designed to lower Cas9 off-target activity often result in a decreased affinity for its target sequence and a reduced mutagenesis rate, thus exacerbating the low efficiency problem12. For many applications, both in cells and in vitro, Cas9 enzymes with higher catalytic efficiency would be particularly beneficial.
In this study, we engineered the HNH-like nuclease domain of Cas9 (Fig. 1) to substantially increase the frequency of donor-independent gene editing through computational design. Recently, we developed a computational design approach, called FuncLib, that mutates positions within the active sites of enzymes to increase their catalytic rate or specificity13. The FuncLib approach combines phylogenetic analysis of sequence homologs with Rosetta atomistic design calculations. The resulting designs each exhibit several mutations in the enzyme active site that are predicted to form low-energy constellations, while conserving the key catalytic residues in their active conformation. Furthermore, the designs are highly diverse, exhibiting several mutations relative to one another and the wild-type enzyme. We and others have demonstrated several cases in which FuncLib yielded large gains in efficiency or specificity and that, unlike in vitro evolution, it does not require screening more than several dozen designs14. Crucially, FuncLib has been successfully applied to design enzymes in their apo states, suggesting that it can be applied to cases, like Cas9, which are dynamic and in which not all the conformations that are relevant for the catalytic cycle have been characterized structurally.
Results
Computational design and testing of Cas9 efficacy using a synthetic genetic circuit in yeast
We focused our engineering efforts on the HNH nuclease domain as it orchestrates Cas9 cleavage, moving between multiple different conformations during the catalytic cycle, and also regulates cleavage by Cas9’s other active site, the RuvC-like nuclease domain5,10,15. Based on several crystal structures, three separate regions in the HNH domain, containing amino acids 765–780, 838–853, and 911–924 (Fig. 1), were chosen for in silico design using FuncLib. These three regions were chosen as they are either in contact with the target DNA or are required to position active site residues for enzymatic cleavage (Supplementary Fig. 1a–d). For each region, the 10 lowest energy designs according to FuncLib were assessed for enzymatic activity by coupling Cas9 cleavage with cell survival in yeast (Supplementary Fig. 1e). We designed a system consisting of a tetracycline-inducible Cas9 expression plasmid and a constitutive gRNA plasmid (Supplementary Fig. 1f). Specific gRNAs were designed to knock out the open reading frames of the auxotrophic genes ADE2 and HIS3. Loss of function of these genes prevents the growth of yeast on media lacking adenine or histidine, respectively. We also used a gRNA targeting CAN1, which encodes an arginine permease whose loss enables survival in the presence of canavanine. Cas9 designs were tested using each gRNA separately to determine if the mutations were compatible with Cas9 function. Mutation of ADE2 and HIS3 reduced growth in the absence of adenine or histidine, respectively, while targeting CAN1 rescued growth on media containing canavanine (Supplementary Fig. 1g). Three designs for region 1 (1.4–1.5, 1.8) were found to be catalytically active, while regions 2 and 3 each produced 7 active designs (2.1–2.2, 2.4, 2.6–2.8, 2.10, 3.2–3.4, and 3.7–3.10) (Fig. 2a).
While our system was highly effective in establishing the functionality of wild-type Cas9 (Supplementary Fig. 1g) and our Cas9 designs (Fig. 2a, d) using all three gRNAs, we found that the targeted yeast genes were rapidly mutated upon transformation, complicating a detailed comparison of catalytic activities. To eliminate this confounding variable, we introduced two known Cas9 inhibitors, AcrIIA2 and AcrIIA4, which were shown to bind in distinct modes to the Cas9-gRNA complex16. Notably, it has been found that mutations in Cas9 that eliminate the inhibitory effect of AcrIIA2 have no effect on inhibition of Cas9 activity by AcrIIA4 and vice versa16. AcrIIA2 and AcrIIA4 were fused by a self-cleaving peptide (P2A) and their expression was controlled with a copper-inducible promoter (Fig. 2b, Supplementary Fig. 1h). Using this method, we were able to inhibit preemptive Cas9 activity with 100 mM copper sulfate, while allowing efficient induction of Cas9 in the absence of copper (Supplementary Figs. 1i and 2). This enabled quantification of the in vivo activity of designer Cas9 enzymes targeting ADE2 and HIS3 in comparison to wild-type Cas9 via cell survival mediated through loss of either ADE2 or HIS3 (Fig. 2b and Supplementary Fig. 1j). We observed significant fold changes in activity for 9 designs ranging from 1.4-fold for design 3.9 to 2.3-fold for design 3.10 using the HIS3 gRNA compared to wild-type Cas9 (Fig. 2c). Interestingly, only 2 out of 17 designs (1.4 and 2.4) were found to have a significant decrease in survival and a significant fold change in activity for both the ADE2 and HIS3 gRNAs (Fig. 2c and Supplementary Fig. 3a–g). Taken together, these data demonstrate a strength of the FuncLib approach compared, for instance, to in vitro evolution, since nearly a third of the HNH nuclease-domain designs were significantly more active than their wild-type counterpart, allowing us to choose the most active ones through small-scale experimental testing.
Each of the designs in the three different regions were computed independently of one another, and as such might not be compatible with each other. We hypothesized, however, that there could be a potential to further increase the enzymatic activity by combining designs from different regions. We made designs with all possible combinations of the mutated regions that had a significant increase in activity and found that all combinations, with the exception of design 2.10 + 3.8, retained their enzymatic activity (Fig. 2d). Furthermore, most combinations were found to have a significant increase in activity when compared to wild-type Cas9 for both the ADE2 and HIS3 gRNAs (Fig. 2e and Supplementary Fig. 4a–i). However, in order to establish that the combinations result in a synergistic increase in activity, we compared the activity of each combination relative to their single mutant counterparts (e.g. design 1.4 + 2.1 compared to both designs 1.4 and 2.1) (Supplementary Fig. 4j). Testing using the HIS3 gRNA showed that most combined designs were found to have either a neutral effect (~1-fold change, p > 0.05) or a positive fold change (fold change >1.0, p < 0.05). Only designs 1.4 + 3.9 and 1.5 + 3.10 were found to have negative epistasis (fold change <1.0, p < 0.05) for the HIS3 gRNA and none demonstrated a significant improvement in activity for the ADE2 gRNA compared to the individual mutants (Supplementary Fig. 3j).
Hyperactive Cas9 functionality in mammalian cells
We selected the 10 most active Cas9 designs for testing in mammalian systems. These designs were codon optimized for mammalian-cell expression. We used a well-characterized VEGFA gRNA, with known off target cleavage sites, and determined donor-independent editing efficiencies in human HEK293T cells by next-generation sequencing of targeted DNA amplicons. Several designs showed a significant increase in the number of mutated sequences at the VEGFA locus, particularly designs 2.2 and 2.1 + 3.9, with 95% and 79%, respectively, of VEGFA alleles edited (Fig. 3a and Supplementary Fig. 5a), whereas wild-type Cas9 mutated 64% of VEGFA alleles. This result represents a 1.5-fold improvement in editing for design 2.2, which we dub TurboCas9, and a 1.2-fold increase for design 2.1 + 3.9 (Fig. 3b). Several other Cas9 designs were trending towards improved editing, but these were not statistically significant, while the others remained as active as the wild-type Cas9.
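The fold changes quoted above follow directly from the reported percentages of edited alleles. The short sketch below reproduces that arithmetic using only the values given in the text; the function and variable names are illustrative and are not taken from the authors' analysis code.

```python
def fold_change(variant_pct: float, wild_type_pct: float) -> float:
    """Ratio of edited-allele percentages (variant over wild type)."""
    if wild_type_pct <= 0:
        raise ValueError("wild-type percentage must be positive")
    return variant_pct / wild_type_pct


# Percentages of edited VEGFA alleles as reported in the text.
WILD_TYPE_PCT = 64.0
reported_pct = {"2.2 (TurboCas9)": 95.0, "2.1 + 3.9": 79.0}

# Round to one decimal place, matching the precision used in the text.
fold_changes = {
    name: round(fold_change(pct, WILD_TYPE_PCT), 1)
    for name, pct in reported_pct.items()
}

for name, value in fold_changes.items():
    print(f"design {name}: {value}-fold relative to wild-type Cas9")
```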
We developed a computational pipeline to classify donor-independent editing into three broad categories: single events of either a deletion or insertion, and combined events in which insertion and deletion or multiples thereof occurred within the same allele (“Multiple”)(Fig. 3c). Wild-type Cas9-mediated editing resulted predominantly in single deletion and insertion events; however, combined events were comparatively sparse (Fig. 3c). Single deletion events occurred at a similar rate for the Cas9 designs and were not significantly different from wild-type Cas9. Our designs had an approximately two-fold decrease in the number of insertions (Fig. 3c), although the insertion lengths were similar (Supplementary Fig. 5b). Still, the designs caused a three-fold or more increase in the number of multiply edited alleles (Fig. 3c). The accumulation of indels has been shown to be dependent on the rate at which cleaved DNA is produced and made available to the cellular repair machinery by editing enzymes6, indicating that the designed mutations successfully increased the activity of Cas9. Furthermore, in addition to the number of resulting mutations, every one of the engineered Cas9 enzymes induced significantly larger deletions (Fig. 3d). Increases in the sizes of the deletions for single events ranged from ~two-fold for design 2.4 to well over four-fold for design 1.5 + 2.2.
TurboCas9 cleavage of the VEGFA gene caused a large number of deletions that nearly consumed the entire amplicon (Supplementary Fig. 6). To examine whether the size of the target region for amplicon sequencing might obscure even larger deletions we amplified a 4.2 kb region of the VEGFA locus surrounding the gRNA target site. Agarose gel electrophoresis revealed a minor proportion of smaller PCR products in VEGFA alleles targeted by wild-type Cas9 and this was amplified in samples treated with TurboCas9 (Supplementary Fig. 5k), indicating that there are several large deletions that would not have been detected in standard amplicon sequencing. Long amplicons were also subjected to tagmentation followed by deep sequencing to catalog the types of large deletions present. These data revealed a preponderance of large deletions ranging up to 3.9 kb in size that was more frequent in cells expressing TurboCas9 (Supplementary Fig. 5l, m). Taken together, we conclude that the designs significantly increase Cas9 activity as well as improving the enzyme’s ability to generate indels that create a knockout or delete larger parts of target genes.
Next we examined the generality of these observations using additional diverse gRNAs. These included a therapeutic gRNA targeting the BCL11A gene, a selection of gRNAs that were verified in our laboratory previously (ZBED5, MECRCN, MRPL45, MRPL58, MTRF1L, VWA8), and an additional two from the literature (CXCR4 and RUNX1). We found that TurboCas9 significantly enhanced donor-independent editing with six of these gRNAs (ZBED5, MECRCN, MRPL45, MRPL58 and MTRF1L), had no effect on editing by two gRNAs (VWA8 and CXCR4), and decreased editing for one gRNA (RUNX1) (Fig. 3e, f). Therefore, for the majority of gRNAs TurboCas9 enables more effective gene disruption (Fig. 3g). Given the increased activity of TurboCas9 that was observed in HEK293T cells, we examined donor-independent editing of our most active mutant TurboCas9 in HeLa, another commonly used cell line (Fig. 3h). Interestingly, even though HeLa cells exhibit a lower level of donor-independent editing using the VEGFA gRNA than HEK293T cells, likely due to the fact that HEK293T cells are partially deficient in DNA mismatch repair17, we observed that TurboCas9 was threefold more active than wild-type Cas9. This indicates that TurboCas9 might be particularly useful in systems where gene editing is typically less efficient.
Increasing Cas9 activity would result in a requirement for an increased number of repair events and thus potentially increase the complexity of DNA repair outcomes at these sites. To examine the nature of the induced mutations in more detail, we mapped the exact locations and lengths of mutations for the VEGFA donor-independent experiments (Supplementary Fig. 6) and categorized indel events based on their respective CIGAR (concise idiosyncratic gapped alignment report) complexity level (Supplementary Fig. 5c), where higher CIGAR complexity (CC) levels comprise deletions and insertions occurring simultaneously in more complex combinations. We observed that the number of reads categorized in these higher CIGAR complexity levels in our designs was significantly increased relative to wild-type Cas9 (Supplementary Fig. 5d). All mutants were found to have at least a twofold increase in the number of reads in CC4, TurboCas9 had an approximately threefold increase in the number of reads present in CC6 and CC7, and TurboCas9, design 1.5 + 2.4 and 2.1 + 3.9 were found to have a significant increase of alleles in CC7 (Supplementary Fig. 5d), which include multiple deletions and multiple insertions within a single allele. We did not find a significant change in the occurrence of frameshifts as a result of all editing events combined (Supplementary Fig. 5e), although the larger single deletion events induced by the designs resulted in significantly more frameshifts in 8 out of 10 designs (Supplementary Fig. 5f). Several of our designs are either trending towards or have significantly increased activity relative to wild-type Cas9; however, all designs increased the number of complex editing events in comparison to wild-type Cas9.
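The text does not state how frameshifts were scored. A common convention, assumed here purely for illustration, is that an allele is frameshifted when the net change in length (inserted minus deleted bases) is not a multiple of three. The function name and inputs below are hypothetical.

```python
from typing import Sequence


def is_frameshift(insertion_lengths: Sequence[int],
                  deletion_lengths: Sequence[int]) -> bool:
    """Return True if the net indel length is not a multiple of 3.

    Assumption: frameshift status depends only on the net length change
    of the allele, not on where each event falls in the reading frame.
    """
    net_change = sum(insertion_lengths) - sum(deletion_lengths)
    return net_change % 3 != 0


# A single 10 bp deletion shifts the frame; a 3 bp deletion does not.
print(is_frameshift([], [10]))   # True
print(is_frameshift([], [3]))    # False
# A 2 bp insertion combined with a 5 bp deletion nets -3 bp: in frame.
print(is_frameshift([2], [5]))   # False
```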
For most of the other gRNAs tested the average length of all different indel types was not significantly changed (Supplementary Fig. 5g–j). However, we found that for VWA8, although its level of mutation was not changed significantly, there was a significant increase in the length of insertions as single events as well as in the context of multiply edited events. Furthermore, we found that ZBED5 and MRPL58 genes targeted by TurboCas9 had a significant increase in the average length of deletions, while MRPL58 and RUNX1 exhibited a significant increase in the average length of deletions in the context of multiply edited alleles. The frequency at which deletions or multiply edited events occur was increased for TurboCas9 with six out of 10 gRNAs and five out of 10 gRNAs, respectively (Fig. 4a–f). In contrast, insertions were only significantly increased for two out of 10 gRNAs. These data are in line with a trend we observed for all of our Cas9 designs, as all show a reduced level of single insertion events whilst providing a significant increase in deletions and multiply edited events.
Efficacy of TurboCas9 in homology directed repair
Although harnessing non-homologous end joining to mutate target genes is currently the most common application of CRISPR, using Cas9 to stimulate homology-directed repair (HDR) to make more substantial and scarless genome modifications is highly desirable. To examine whether it is possible to perform HDR using TurboCas9, we targeted BCL11A and VEGFA loci with homologous single-stranded oligonucleotide templates that carry several mutations within the gRNA to introduce a PstI recognition site. PstI digestion of amplicons generated from these samples showed that efficient levels of HDR were achieved for both wild-type Cas9 as well as TurboCas9 (Fig. 4g–j). We sequenced short amplicons flanking the HDR site and found that levels of HDR were similar for oligonucleotides corresponding to both the sense and antisense strands for both target genes (Fig. 4i–j). Interestingly, although wild-type Cas9 and TurboCas9 stimulated similar levels of HDR at the BCL11A locus, TurboCas9 enabled significantly more HDR events at the VEGFA target site. These data indicate that TurboCas9 is compatible with HDR approaches and in some cases may enable more efficient HDR.
Fidelity of hyperactive Cas9 enzymes in mammalian cells
Increased fidelity has been observed to be inversely correlated with on-target activity12. Therefore, we examined whether Cas9 designs that increase on-target activity would exhibit a similar increase in off-target activity. We amplified the top 4 known off-target sites for the VEGFA gRNA, named OFF22, OFF10, OFF5-1, and OFF5-2, after editing by TurboCas9, compared to wild-type Cas9. Interestingly, we observed that the designs increased editing at two off targets but did not significantly increase editing at two other off-targets (Fig. 5a–d and Supplementary Figs. 7-9). OFF5-2 differs from the VEGFA gRNA by two bp with one mismatch occurring at base 18 of the seed sequence, which is typically less tolerated by Cas9 and corroborated in our data by the low levels of editing for the wild-type Cas9. The increased activity of TurboCas9 does not seem to have lessened the fidelity of Cas9 when mismatches between the seed sequence and the target occur near the PAM sequence. OFF22 has a mismatch at bp 14 of the gRNA sequence and no significant difference was observed between our designs and the wild-type Cas9. OFF10 and OFF5-1 were both found to have been edited significantly more by our designs and both have mutations in the first 10 bp of the gRNA. Unlike the on-target site, we did not observe an increase in multiply edited alleles nor a reduction in insertions for these off-target sites (Fig. 5d, Supplementary Figs. 8 and 9). Similar observations were found for the distribution of reads in the different levels of CIGAR complexity (Fig. 5d, Supplementary Fig. 7a, b). Interestingly, the previously seen increase in deletion size for both the single deletions and also deletions within multiply edited alleles for the engineered Cas9 enzymes was not observed for the off-target sites. On the contrary, for several of the off-target sites, we observed a significant decrease in deletion size. 
Thus, the designs significantly increase Cas9 on-target activity without a consistently negative impact on fidelity. To extend this observation, we predicted off-target sites for each of the gRNAs targeting ZBED5 and BCL11A and used amplicon sequencing to compare TurboCas9 and wild-type Cas9. We observed overall low levels of editing for these off-target sites, similar to untreated cells, with no significant differences between TurboCas9 and wild-type Cas9. The average length of each of the four classified indel types also showed no significant differences (Supplementary Fig 7c, d). Taken together all the data examining the fidelity of TurboCas9 versus wild-type Cas9 showed we achieved similar levels of fidelity while significantly increasing the on-target activity (Fig. 5g).
Structural analysis of the TurboCas9 designed mutations
To understand the structural basis for the improved efficiency observed in TurboCas9, we examined its design model. When we conducted the design calculations, the most relevant molecular structure of the Cas9 active conformation was PDB entry 5F9R. This structure, however, places the catalytic His840 distant from the DNA and does not include the Mg2+ ions that are crucial for catalytic activity. One of the five mutations in TurboCas9, Gln844Arg, is in close proximity to the catalytic His840 and its role in improving activity cannot be understood in the context of the molecular structure which we had available at the time (Fig. 6a and Supplementary Fig. 10a). We therefore modeled the five TurboCas9 mutations on a more recent structure (PDB entry: 7S4X)18, which has the His840 in a catalytically competent position as well as the crucial Mg2+ ions (Fig. 6b). This model shows that Arg844 may form stabilizing saltbridge interactions with Asp850 (within the HNH domain) and Glu60 (within the Arg domain). Thus, the Gln844Arg mutation is tolerated in the inactive HNH conformation where position 844 is solvent exposed and may specifically stabilize the active form by interacting with two acidic residues, one of which resides in a different domain. We analyzed the evolutionary prevalence of the Gln844Arg mutation, finding that Cas9 homologs predominantly exhibit the wild type Gln identity (80% compared to only 16% for Arg; Supplementary Fig. 10b). Thus, even mutations that occur in the minority of homologs may have a beneficial impact on activity.
The other four mutations in TurboCas9 (Val842Leu, Phe846Tyr, Leu847Met, and Ile852Phe) optimize core packing and hydrogen bonding interactions within the Cas9 enzyme (Supplementary Fig. 10a). Such mutations may indirectly stabilize the catalytically competent form of the enzyme to increase its efficiency. Thus, the combination of phylogenetic and atomistic design calculations that forms the basis for FuncLib calculations focuses design calculations on a small number of constructs. Our modeling suggests that the designed mutations stabilize the HNH domain and its interactions with the Arg domain in the catalytically active conformation of Cas9.
Discussion
In summary, the computational design of the Cas9 HNH domain enabled the production of hyperactive Cas9 enzymes with the ability to introduce larger and more extensive mutational signatures. A few amino acid positions that were targeted by FuncLib were previously targeted for alanine scanning mutagenesis by Slaymaker et al.19 to reduce off-target effects, rather than to increase the on-target activity. Interestingly, Slaymaker et al.19 also designed a number of mutations to decrease specificity by strengthening the interaction of Cas9 with the non-target strand, with the main one of interest being an L847R substitution. Our design 2.1 and TurboCas9 both have this position substituted with a methionine instead of an arginine and in the case of TurboCas9, we observed no overall changes in off-target activity.
The combinatorial layering of activity-enhancing variants resulted in epistatic effects in our study, in some cases increasing activity when two designed regions were combined. Interestingly, two of the four best single region designs are from region 2, which is centered on H840 of the HNH active site. The other two are from regions 1 and 3, which make up the hinges of the HNH domain, supporting its dynamic repositioning20,21. Furthermore, all our combined designs that have been verified in both mammalian and yeast settings comprise a region 2 and either a region 1 or region 3 mutant. This suggests that for enhanced Cas9 on-target activity and perhaps tolerable off-target activity, having mutations in region 2 is essential and can be potentially further enhanced by adding mutations in either region 1 or 3. This result and our discovery of numerous hyperactive Cas9 enzymes verify the hypothesis that the HNH domain is not naturally optimized for DNA cleavage efficiency. This might reflect the competing evolutionary pressures on natural Cas9 enzymes to balance activity, specificity and Cas9 abundance relative to tracrRNA, crRNAs, and their target DNAs, in their roles of targeting viral genomes for destruction as well as PAM-specific spacer acquisition22.
Structural studies have shown that Cas9 positions its gRNA and target DNA prior to reorientation of the HNH domain for cleavage20. Displacement of the non-target strand and R-loop formation then enable cleavage by the HNH domain5,21. For designs with the same number of cleaved alleles but more extensively mutated targets (such as design 1.4) it could be that the introduced mutations enhance cleavage without improving R-loop formation, while for others (such as TurboCas9) both binding and cleavage might be enhanced, or alternatively TurboCas9 might bind less strongly after cleavage, thereby exposing the DNA to more diverse repair outcomes.
Since the discovery of CRISPR gene editing much attention has focused on improving the specificity of Cas proteins23; however, for many applications efficiency could be more important than specificity. We recommend using TurboCas9 where multiple genes need to be targeted simultaneously, for example in cancer, in which multiple oncogenes might need to be disabled to halt cancer cell growth. The higher activity would also enable applications where multiple cleavage events would be required, for example in vitro applications using Cas9 in a way more analogous to a restriction enzyme24, or in other situations where cleavage efficiency might be limiting. The FuncLib designed hyperactive Cas9 enzymes described here provide tools to address these gaps. Furthermore, the ability of these enzymes to introduce more extensive deletions and complex repair scars from multiple edits could be useful to more effectively knockout genes or to provide diverse signatures for cellular recording and lineage tracing25.
Methods
Computational enzyme design
We performed three separate FuncLib13 runs, independently mutating regions including amino acids 765–780, 838–853, and 911–924 of SpCas9 using PDB:5F9R as a structural model21. The sequence space for each position in regions 1–3 was explored using a combination of position-specific scoring matrix (PSSM) and Rosetta energy filters. The protein elements from ~100,000 designs were explicitly modeled for each region, each with 3–5 mutations from the parental Cas9. The catalytic histidine (H840) was fixed during the production of the region 2 designs. All designs were ranked according to the all-atom Rosetta energy function26, which is dominated by van der Waals packing, solvation, electrostatics and hydrogen bonding; interactions with DNA and metal ions were not included in energy calculations. The best ten designs of each region, which differed by at least 3 mutations from one another, were selected for experimental testing.
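The final selection step, choosing the ten best-ranked designs that differ from one another by at least three mutations, can be sketched as a greedy filter over an energy-sorted list. This is an assumption about how such a selection might be carried out, not a description of FuncLib's internals; the function names and the toy sequences and energies are invented for illustration.

```python
from typing import List, Tuple

Design = Tuple[str, float]  # (region sequence, energy); lower energy is better


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def select_diverse(designs: List[Design], n: int = 10,
                   min_diff: int = 3) -> List[Design]:
    """Greedily keep the lowest-energy designs that differ from every
    previously kept design by at least `min_diff` positions."""
    selected: List[Design] = []
    for seq, energy in sorted(designs, key=lambda d: d[1]):
        if all(hamming(seq, kept) >= min_diff for kept, _ in selected):
            selected.append((seq, energy))
            if len(selected) == n:
                break
    return selected


# Toy input: made-up six-residue "regions" with made-up energies.
toy = [("AAAAAA", -10.0), ("AAAAAB", -9.5), ("ABBBAA", -9.0), ("BBBBBB", -8.0)]
picked = select_diverse(toy, n=3, min_diff=3)
print([seq for seq, _ in picked])
```

In this toy run the second-ranked sequence is skipped because it differs from the top-ranked one at only a single position.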
Plasmid construction
SpCas9 was codon optimized for S. cerevisiae using Gene Designer software (ATUM) and synthesized by Integrated DNA Technologies (IDT). Wild-type Cas9 was designed in four gBlocks and assembled using Gibson assembly in pJ201 (ATUM). The Cas9 ORF was sub-cloned into the yeast expression plasmids pCM251 and pCM252 using BamHI and NotI. The three regions of the HNH domain that were selected for FuncLib mutagenesis were flanked by SpeI-BsaI, BsmBI-SacII and XbaI-StuI restriction sites, respectively. Each region containing the mutations of interest was designed in Gene Designer and synthesized by Twist Bioscience. Designed regions were cloned into Cas9 either individually or as combinations. The pRS426-CAN1 gRNA plasmid27 (Addgene #43803) was used for gRNA expression in yeast and two separate gRNAs targeting ADE2 and HIS3 were synthesized by IDT and sub-cloned in place of the CAN1 gRNA using NheI and MluI. The Cas9 inhibitors AcrIIA2 and AcrIIA4 fused with a P2A peptide and flanked by the CUP1 promoter and PGI1 terminator were synthesized by IDT and cloned into pRS426 gRNA plasmids using KpnI and MluI. For experiments in human cells, the designed HNH domains were human codon optimized and subcloned into the mammalian Cas9 expression plasmid pD1311-AD (ATUM) using the restriction enzymes SacI and SphI. Cassettes for expressing the various gRNAs were produced as gBlocks (IDT) and subcloned into pD1311-AD using BbsI. All plasmids were verified using Sanger sequencing by the Australian Genomics Research Facility (AGRF), Perth, Western Australia. Sequences of designed open reading frames are provided in Supplementary Table 1.
Yeast transformation and survival assays
Saccharomyces cerevisiae BY4738 cells (MATα trp1Δ63 ura3Δ0)28 were obtained from Jef Boeke, New York University. Cells were transformed with plasmids using the LiAc method according to Gietz and Woods29, plated on SC-T-U, and incubated at 30 °C for 2–3 days. A single colony was grown overnight in 10 ml of SC-T-U media at 30 °C. Yeast cultures were standardized to an OD600 of 1 in TE and three serial 1/10 dilutions were made in TE. Of each dilution, 5 μl was plated on selective media (SC) lacking the appropriate nutrients and supplemented with anhydrotetracycline and canavanine where indicated. Plates were grown for 2–3 days at 30 °C until adequate growth was observed for the control yeast samples on each plate. For quantitative survival assays, single colonies were grown overnight in 10 ml of SC-T-U media at 30 °C. Cells were standardized to an OD600 of 1 and diluted to 2.8 × 10⁻³ in TE. Of each sample, 100 μl was plated on selective media lacking the appropriate auxotrophic nutrients, with or without anhydrotetracycline, and grown for 2 days at 30 °C.
Mammalian cell transfection and amplicon sequencing
HEK293T (CRL-3216) and HeLa (CCL-2) cells were obtained from ATCC and cultured at 37 °C under humidified 95% air/5% CO2 in Dulbecco’s modified Eagle’s medium (DMEM) containing glucose (4.5 g/l), 1 mM sodium pyruvate, 2 mM glutamine and 10% fetal bovine serum (FBS). Cells were seeded at 60% confluence in 24-well plates and transfected after 24 h with 500 ng of DNA in OptiMEM media (Gibco, Life Technologies) using a 1:1 mix of two transfection reagents (Lipofectamine LTX Reagent with Plus Reagent, Invitrogen, and FuGene HD, Promega)30. Each Cas9 variant was transfected in triplicate. For HDR experiments, a short single-stranded homologous DNA was co-transfected in a total amount of 500 ng of DNA including Cas9 plasmid. At 72 h after transfection, the cells were trypsinized and the cell pellets lysed for DNA extraction using the KAPA Express Extract Kit, according to the manufacturer’s instructions (Sigma-Aldrich). Amplicons were generated using primers flanking the gRNA and incorporating Illumina adaptor sequences (Supplementary Table 2). For long amplicon deep sequencing, PCR products were converted into sequencing libraries by tagmentation (Illumina). Libraries were sequenced on an Illumina MiSeq using 250 bp paired-end chemistry by the Australian Genomics Research Facility (AGRF), Perth, Western Australia.
Next generation sequencing analyses
Sequenced reads were trimmed with TrimGalore31 (v0.6.6) using cutadapt32 (v1.18) and fastqc33 (v0.11.9) (--paired --nextera --fastqc). Trimmed reads were merged with FLASH34 (v1.2.11) (--min-overlap 10 --max-overlap 250) and aligned to the amplicon sequences with BLAT35 (v37x1) (-minScore=0 -stepSize=1 -out=psl). To avoid a heterozygous genomic insertion that complicated interpretation of the CIGAR string, 23 nt were trimmed from the 3′ end of MRPL45 reads with TrimGalore (--three_prime_clip_R1 23). To avoid a 5′ terminal nucleotide next to the adapter region that complicated interpretation of the CIGAR string, 1 nt was trimmed from the 5′ end of BCL11A chr10 off-target reads with TrimGalore (--clip_R1 1) after merging. The resultant .psl file was converted to SAM/BAM format with the uncle_psl.py script (https://github.com/bsipos/uncle_psl). The resulting BAM files were parsed with command-line tools based on the number of alphabetic characters in the CIGAR string (which we have termed CIGAR complexity). Since these characters represent specific alignment characteristics (match, insertion, deletion, or soft-clipping) and are paired with a number describing their length, we used this information to determine the lengths and locations of deletion and insertion events for all alignments. Alignments that contained soft-clipped sequences, or with a CIGAR complexity of 7 or above, were excluded. All configurations of alignment up to a CIGAR complexity of 6 and the simplest of complexity 7 (MIDMIDM) were collated and summarized.
For variant calling, sequenced reads were trimmed with TrimGalore31 (v0.6.6) using cutadapt32 (v1.18) and fastqc33 (v0.11.9) (--paired --nextera --fastqc). Trimmed reads were aligned to amplicon sequences with Bowtie236 (v2.4.4) with default parameters, and variants were called with freebayes37 (v1.3.4) with ploidy (-p) set to 3 for BCL11A or 5 for VEGFA, and the parameters: -F 0.01 --haplotype-length 20 --pooled-continuous.
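The CIGAR complexity measure described above can be illustrated with a minimal Python sketch. This is not the authors' command-line pipeline; the function names (parse_cigar, cigar_complexity, indel_events, has_soft_clip) and the 0-based reference coordinate convention are assumptions introduced here for illustration only.

```python
import re
from typing import List, Tuple

# One CIGAR operation is a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '50M3D47M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings that the pattern did not consume completely.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def cigar_complexity(cigar: str) -> int:
    """Number of alphabetic characters (operations) in the CIGAR string."""
    return len(parse_cigar(cigar))


def has_soft_clip(cigar: str) -> bool:
    """True if any operation is a soft clip (S)."""
    return any(op == "S" for _, op in parse_cigar(cigar))


def indel_events(cigar: str, ref_start: int = 0) -> List[Tuple[str, int, int]]:
    """Return (kind, reference position, length) for each insertion/deletion.

    Positions are 0-based offsets on the reference (hypothetical convention).
    """
    events = []
    ref_pos = ref_start
    for length, op in parse_cigar(cigar):
        if op == "I":
            # Insertions do not consume reference bases.
            events.append(("insertion", ref_pos, length))
        elif op == "D":
            events.append(("deletion", ref_pos, length))
            ref_pos += length
        elif op in "MN=X":
            ref_pos += length
        # S, H and P consume no reference bases.
    return events
```

Under these assumptions, an alignment would be kept for summary when has_soft_clip is false and cigar_complexity falls within the range stated above.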
For long amplicon deep sequencing analyses, adaptor trimming was performed using Trimmomatic38 (v0.39), and reads were aligned with Hisat2 (ref. 39) in paired-end mode with spliced alignment disabled (no-spliced-alignment), read gap open/close set to 1,1 and reference gap open/close set to 10000,1000. The CIGAR strings of the resultant alignments were parsed using a Python 3 script to identify any read pairs with soft clipping or deletions. Deletions were taken as called; however, where soft clipping occurred, the soft-clipped end was searched for in prior bases, with the search string length starting at 30 and reducing to 5. Where a unique match was observed, this was taken as evidence of a large deletion. The resultant read pairs were then aligned via a custom Python script (https://github.com/ORAFLAB/DeletionQuantifier) and those reads whose mapping or deletion coordinates included any positions within the gRNA (positions 2059–2078 within the reference amplicon sequence) were retained for display.
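The soft-clip search described above might look something like the following Python sketch. It is not taken from the DeletionQuantifier repository; the function names, the choice to probe with the clip bases nearest the aligned portion of the read, and the treatment of "prior bases" as an upstream reference substring are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def find_all(text: str, probe: str) -> List[int]:
    """Return start indices of every (possibly overlapping) match of probe."""
    hits = []
    start = text.find(probe)
    while start != -1:
        hits.append(start)
        start = text.find(probe, start + 1)
    return hits


def locate_soft_clip(
    clip: str, upstream: str, max_len: int = 30, min_len: int = 5
) -> Optional[Tuple[int, int]]:
    """Search the upstream reference for the soft-clipped end of a read.

    The probe starts at max_len bases (or the whole clip if shorter) and is
    shortened one base at a time down to min_len. The first probe length that
    matches exactly once is returned as (position in upstream, probe length).
    None means no unique match was found at any length.
    """
    for k in range(min(max_len, len(clip)), min_len - 1, -1):
        probe = clip[-k:]  # assumption: bases adjacent to the aligned segment
        hits = find_all(upstream, probe)
        if len(hits) == 1:
            return hits[0], k
    return None
```

A unique hit would then be interpreted as the far breakpoint of a large deletion, with the deleted interval lying between the hit and the start of the aligned segment.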
Mathematical analysis of gene editing results
Variables:
Raw sample CFU = raw colony counts from plates missing either adenine or histidine
Raw CFU from control plate = raw colony counts from loading control plates
CFU (Standardized to 100) = normalized CFU
Variables:
Normalized CFU = normalized CFU standardized to a CFU of 100
Average CFU of WT = average of the normalized counts for WT Cas9
CFU (relative to WT) = Relative CFU compared to WT
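The variable definitions above suggest a two-step normalization, which can be sketched in Python as follows. The exact formulas are an assumption inferred from the variable names (sample counts divided by loading-control counts and scaled to 100, then divided by the mean of the WT Cas9 replicates); the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def normalize_cfu(raw_sample_cfu: float, raw_control_cfu: float) -> float:
    """Assumed form: CFU standardized to a loading-control CFU of 100."""
    if raw_control_cfu <= 0:
        raise ValueError("control plate count must be positive")
    return raw_sample_cfu / raw_control_cfu * 100


def cfu_relative_to_wt(normalized_cfu: float, wt_normalized: Sequence[float]) -> float:
    """Assumed form: normalized CFU divided by the average normalized WT CFU."""
    return normalized_cfu / mean(wt_normalized)


# Example with made-up counts: 50 colonies on the selective plate and
# 200 on the loading control plate.
sample = normalize_cfu(50, 200)
relative = cfu_relative_to_wt(sample, [20.0, 25.0, 30.0])
```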
Variables:
Normalized sample CFU = normalized CFU standardized to a CFU of 100
Average control CFU = average of the normalized counts for control
CFU WT #n = relative CFU for WT sample 1 through 3
Average WT CFU = average of the normalized counts for WT Cas9
Variables:
Average length of Indels = average length of different indels in bp
Total number of reads for indel type = sum of all reads for indel type
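One plausible reading of these variables is a read-weighted mean indel length. The Python sketch below assumes that form; the function name and the dictionary input (indel length in bp mapped to read count) are illustrative choices and not the authors' implementation.

```python
from typing import Mapping


def average_indel_length(reads_by_length: Mapping[int, int]) -> float:
    """Read-weighted mean indel length in bp (assumed formula).

    reads_by_length maps an indel length in bp to the number of reads
    carrying an indel of that length.
    """
    total_reads = sum(reads_by_length.values())
    if total_reads == 0:
        raise ValueError("no indel-containing reads supplied")
    weighted_sum = sum(length * count for length, count in reads_by_length.items())
    return weighted_sum / total_reads
```

For instance, 60 reads with 1 bp indels, 30 reads with 2 bp indels and 10 reads with 10 bp indels would give (60 + 60 + 100) / 100 bp.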
Variables:
Edited read count for TurboCas9 #n = number of edited reads for TurboCas9 sample 1 through 3
Edited read count for WT Cas9 #n = number of edited reads for WT Cas9 sample 1 through 3
Fidelity = fidelity of TurboCas9 relative to WT Cas9
Statistical analysis
Statistical significance of the yeast survival assay results was calculated using a two-tailed Student’s t-test assuming unequal variance between samples. An FDR-adjusted Student’s t-test, using the same parameters, was used to calculate statistical significance in the mammalian gene editing experiments.
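As a rough illustration of these tests, the Python sketch below computes the unequal-variance (Welch) t statistic with its Welch–Satterthwaite degrees of freedom, and adjusts a set of p-values for false discovery rate. The source does not state which FDR procedure was used; the Benjamini–Hochberg step-up method shown here is an assumption, as are the function names. Converting the t statistic to a two-tailed p-value requires the t distribution and is left to a statistics package.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for an unequal-variance t-test."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return FDR-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```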
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
Work in our laboratories is supported by fellowships and project grants from the National Health and Medical Research Council (APP1154646 to A.F. and APP1154932 to O.R.), the Australian Research Council (DP210103816 to A.F. and O.R.), the European Research Council (815379 to S.J.F.), the Israel Science Foundation (1884/19 to S.J.F.) and by the Dr. Barry Sherman Institute for Medicinal Chemistry (to S.J.F.). S.J.F., A.F., and O.R. are investigators of the ARC Centre of Excellence in Synthetic Biology (CE200100029). P.D.V. was supported by Colliers International. J.L.M. and A.P.G. are supported by Research Training Program (RTP) Scholarships. We thank David Chandler from the Australian Genomics Research Facility (AGRF), Perth, Western Australia, for assistance with amplicon sequencing and Jef Boeke, New York University, for the kind gift of the S. cerevisiae BY4738 strain.
Author contributions
P.D.V., A.F., and O.R. designed the research. P.D.V., G.R., J.L.M., A.P.G., and M.B. carried out biological experiments. S.J.S., S.A.R., O.K. and S.J.F. performed computational experiments. S.J.F., A.F., and O.R. supervised the research. P.D.V. and O.R. wrote the manuscript, with contributions from all authors, and all authors edited the manuscript.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Data availability
The authors declare that the data supporting the findings of this study are available within the paper and its supplementary information files. Deep sequencing data are available at the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA762160. Source data are provided with this paper.
Competing interests
The authors declare the following competing interests: P.D.V., A.F., O.R., O.K., and S.J.F. are inventors of a patent application (Australian Provisional Patent Application No. 2020904609) filed by The University of Western Australia on designed Cas9 enzymes. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-30598-9.
References
- 1. Knott GJ, Doudna JA. CRISPR-Cas guides the future of genetic engineering. Science. 2018;361:866–869. doi: 10.1126/science.aat5011.
- 2. Hsu PD, Lander ES, Zhang F. Development and applications of CRISPR-Cas9 for genome engineering. Cell. 2014;157:1262–1278. doi: 10.1016/j.cell.2014.05.010.
- 3. Karvelis T, Gasiunas G, Siksnys V. Harnessing the natural diversity and in vitro evolution of Cas9 to expand the genome editing toolbox. Current Opinion in Microbiology. 2017;37:88–94. doi: 10.1016/j.mib.2017.05.009.
- 4. Sander JD, Joung JK. CRISPR-Cas systems for editing, regulating and targeting genomes. Nat. Biotechnol. 2014;32:347–355. doi: 10.1038/nbt.2842.
- 5. Gong S, Yu HH, Johnson KA, Taylor DW. DNA unwinding is the primary determinant of CRISPR-Cas9 activity. Cell Rep. 2018;22:359–371. doi: 10.1016/j.celrep.2017.12.041.
- 6. Brinkman EK, et al. Kinetics and fidelity of the repair of Cas9-induced double-strand DNA breaks. Mol. Cell. 2018;70:801–813.e6. doi: 10.1016/j.molcel.2018.04.016.
- 7. Richardson CD, Ray GJ, DeWitt MA, Curie GL, Corn JE. Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA. Nat. Biotechnol. 2016;34:339–344. doi: 10.1038/nbt.3481.
- 8. Sternberg SH, Redding S, Jinek M, Greene EC, Doudna JA. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature. 2014;507:62–67. doi: 10.1038/nature13011.
- 9. Yourik P, Fuchs RT, Mabuchi M, Curcuru JL, Robb GB. Staphylococcus aureus Cas9 is a multiple-turnover enzyme. RNA. 2019;25:35–44. doi: 10.1261/rna.067355.118.
- 10. Raper AT, Stephenson AA, Suo Z. Functional insights revealed by the kinetic mechanism of CRISPR/Cas9. J. Am. Chem. Soc. 2018;140:2971–2984. doi: 10.1021/jacs.7b13047.
- 11. Clarke R, et al. Enhanced bacterial immunity and mammalian genome editing via RNA-polymerase-mediated dislodging of Cas9 from double-strand DNA breaks. Mol. Cell. 2018;71:42–55.e8. doi: 10.1016/j.molcel.2018.06.005.
- 12. Liu MS, et al. Engineered CRISPR/Cas9 enzymes improve discrimination by slowing DNA cleavage to allow release of off-target DNA. Nat. Commun. 2020;11:3576. doi: 10.1038/s41467-020-17411-1.
- 13. Khersonsky O, et al. Automated design of efficient and functionally diverse enzyme repertoires. Mol. Cell. 2018;72:178–186.e5. doi: 10.1016/j.molcel.2018.08.033.
- 14. Bengel LL, et al. Engineered enzymes enable selective N-alkylation of pyrazoles with simple haloalkanes. Angew. Chem. Int. Ed. 2021;60:5554–5560. doi: 10.1002/anie.202014239.
- 15. Dagdas YS, Chen JS, Sternberg SH, Doudna JA, Yildiz A. A conformational checkpoint between DNA binding and cleavage by CRISPR-Cas9. Sci. Adv. 2017;3:eaao0027. doi: 10.1126/sciadv.aao0027.
- 16. Liu L, Yin M, Wang M, Wang Y. Phage AcrIIA2 DNA mimicry: structural basis of the CRISPR and anti-CRISPR arms race. Mol. Cell. 2019;73:611–620.e3. doi: 10.1016/j.molcel.2018.11.011.
- 17. Trojan J, et al. Functional analysis of hMLH1 variants and HNPCC-related mutations using a human expression system. Gastroenterology. 2002;122:211–219. doi: 10.1053/gast.2002.30296.
- 18. Bravo JPK, et al. Structural basis for mismatch surveillance by CRISPR–Cas9. Nature. 2022;603:343–347. doi: 10.1038/s41586-022-04470-1.
- 19. Slaymaker IM, et al. Rationally engineered Cas9 nucleases with improved specificity. Science. 2016;351:84–88. doi: 10.1126/science.aad5227.
- 20. Jiang F, Zhou K, Ma L, Gressel S, Doudna JA. A Cas9-guide RNA complex preorganized for target DNA recognition. Science. 2015;348:1477–1481. doi: 10.1126/science.aab1452.
- 21. Jiang F, et al. Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage. Science. 2016;351:867–871. doi: 10.1126/science.aad8282.
- 22. McGinn J, Marraffini LA. Molecular mechanisms of CRISPR–Cas spacer acquisition. Nat. Rev. Microbiol. 2019;17:7–12. doi: 10.1038/s41579-018-0071-7.
- 23. Pineda M, Lear A, Collins JP, Kiani S. Safe CRISPR: challenges and possible solutions. Trends Biotechnol. 2019;37:389–401. doi: 10.1016/j.tibtech.2018.09.010.
- 24. Karvelis T, Gasiunas G, Siksnys V. Programmable DNA cleavage in vitro by Cas9. Biochem. Soc. Trans. 2013;41:1401–1406. doi: 10.1042/BST20130164.
- 25. Farzadfard F, Lu TK. Emerging applications for DNA writers and molecular recorders. Science. 2018;361:870–875. doi: 10.1126/science.aat9249.
- 26. Lapidoth GD, et al. AbDesign: An algorithm for combinatorial backbone design guided by natural conformations and sequences. Proteins Struct. Funct. Bioinf. 2015;83:1385–1406. doi: 10.1002/prot.24779.
- 27. DiCarlo JE, et al. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Res. 2013;41:4336–4343. doi: 10.1093/nar/gkt135.
- 28. Brachmann CB, et al. Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast. 1998;14:115–132. doi: 10.1002/(SICI)1097-0061(19980130)14:2<115::AID-YEA204>3.0.CO;2-2.
- 29. Gietz RD, Woods RA. Transformation of yeast by lithium acetate/single-stranded carrier DNA/polyethylene glycol method. Methods Enzymol. 2002;350:87–96. doi: 10.1016/S0076-6879(02)50957-5.
- 30. Ishiguro K, et al. Combinational use of lipid-based reagents for efficient transfection of primary fibroblasts and hepatoblasts. Biotechniques. 2017;63:37–39. doi: 10.2144/000114569.
- 31. Krueger F, James F, Ewels P, Afyounian E, Schuster-Boeckler B. FelixKrueger/TrimGalore: v0.6.7 - DOI via Zenodo. 2021. doi: 10.5281/ZENODO.5127899.
- 32. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17:10–12. doi: 10.14806/ej.17.1.200.
- 33. de Sena Brandine G, Smith AD. Falco: high-speed FastQC emulation for quality control of sequencing data. F1000Research. 2019;8:1874. doi: 10.12688/f1000research.21142.1.
- 34. Magoč T, Salzberg SL. FLASH: Fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27:2957–2963. doi: 10.1093/bioinformatics/btr507.
- 35. Kent WJ. BLAT-The BLAST-like alignment tool. Genome Res. 2002;12:656–664. doi: 10.1101/gr.229202.
- 36. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 37. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv 1207.3907. 2012.
- 38. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 39. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 2015;12:357–360. doi: 10.1038/nmeth.3317.