Abstract
Phosphorothioate (PT)-modification was discovered in prokaryotes and is involved in many biological functions such as restriction-modification systems. PT-modification can be recognized by the sulfur binding domains (SBDs) of PT-dependent restriction endonucleases, through coordination with the sulfur atom, accompanied by interactions with the DNA backbone and bases. The unique characteristics of PT recognition endow SBDs with the potential to be developed into gene-targeting tools, but previously reported SBDs display sequence-specificity for PT-DNA, which limits their applications. In this work, we identified a novel sequence-promiscuous SBDHga from Hahella ganghwensis. We solved the crystal structure of SBDHga complexed with PT-DNA substrate to 1.8 Å resolution and revealed the recognition mechanism. A shorter L4 loop of SBDHga interacts with the DNA backbone, in contrast with previously reported SBDs, which interact with DNA bases. Furthermore, we explored the feasibility of using SBDHga and a PT-oligonucleotide as targeting tools for site-directed adenosine-to-inosine (A-to-I) RNA editing. A GFP non-sense mutant RNA was repaired at about 60% by harnessing a chimeric SBD-hADAR2DD (deaminase domain of human adenosine deaminase acting on RNA), comparable with currently available RNA editing techniques. This work provides insights into understanding the mechanism of sequence-specificity for SBDs and for developing new tools for gene therapy.
Graphical Abstract

INTRODUCTION
Epigenetic modifications of nucleic acids play important roles in biological processes, such as bacterial restriction-modification systems (1), development (2), aging (3) and cancer (4). Most natural DNA modifications occur on the bases; however, the recently discovered DNA phosphorothioate (PT)-modification involves the replacement of a non-bridging oxygen with sulfur in the DNA backbone (5,6). PT-modification is catalyzed by the products of genes dndABCDE in a sequence-selective and Rp stereo-specific manner (5,6). PT-modification is distributed in bacterial and archaeal species, and occurs in six known consensus sequence patterns, with GPSGCC/GPSGCC (PS denotes PT link) in Streptomyces lividans, GPSAAC/GPSTTC, GPSATC/GPSATC and GPSTAC/GPSTAC in Escherichia coli B7A and Salmonella enterica, GPSATC/GPSATC in Bermanella marisrubri RED65, and CPSCA/TGG (hemi PT-modification) in Vibrio cyclitrophicus FF75 as a few examples (7,8).
DNA modifications are usually associated with restriction-modification systems, in which sequence-specific DNA modifications protect self-DNA from cleavage by restriction endonucleases (REases) that attack unmodified invading foreign DNA, such as that of phages and mobile genetic elements (9). As a result of this arms race, phages have developed much more diverse DNA modifications, mimicking the host genomic DNA to escape recognition and cleavage by the host restriction systems during infections (10). To prevent invasion of genome-modified phages, bacteria developed the modification-dependent REases, which cleave modified DNA, e.g. DNA that has undergone methylation, 5-hydroxymethylation, or glucosylated hydroxymethylation (11). In previous studies, we identified the PT-dependent REases ScoMcrA and SprMcrA, which specifically recognize and cleave PT-modified DNA (12,13). This kind of REase consists of a PT-recognizing SBD and a cleavage domain, which is usually an HNH endonuclease domain at the carboxyl terminus (12,13). The SBD binds PT-DNA by embedding the sulfur atom into a hydrophobic surface cavity and interacting with the DNA backbone, as well as bases, by hydrogen bonds and electrostatic interactions (14). So far, about 10 SBD homologs have been characterized, all of which have sequence-specificity to varying extents (13–15). Protein engineering based on crystal structure expanded the target sequence range of SBDSco (16), but a more promiscuous PT-dependent SBD that could bind all patterns of PT-modified nucleic acids has not been identified yet.
Many worthwhile techniques have been developed from restriction systems, among which genome editing is regarded as an efficient tool for correction of genetic mutations and treatment of numerous diseases (17,18). Additionally, site-directed RNA editing is capable of altering genetic codes to correct mutations in proteins, with potential for mitigating genetic diseases caused by point mutations in RNA (19). RNA editing will not cause permanent genetic changes, and thus it is probably safer than genome editing (19). The genome editing systems, such as the CRISPR system, zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) (20), and the site-directed RNA editing systems, such as the REPAIR system (19) and CIRTS (21), are usually composed of a targeting module and an effector. The targeting modules, which are responsible for guiding the system to the target region, are usually derived from bacterial restriction systems or nucleic acid binding proteins. For example, the CRISPR systems are derived from bacterial restriction systems that restrict foreign DNA by targeting DNA with a guide RNA (gRNA) (22,23), while the targeting modules of ZFNs and TALENs are derived from zinc-finger DNA-binding domains and transcription activator-like effector DNA-binding proteins, respectively (24,25). For RNA editing, sequence-specific RNA binding domains are also utilized for direction of the editing system, such as the LambdaN peptide (26), MS2 coat protein (27), the RNA binding domain of native ADAR2 (adenosine deaminase acting on RNA) (28) or CRISPR-Cas13 (29). The effectors of genome editing systems are usually nucleases (e.g. the native CRISPR nuclease or FokI cleavage domain) (30,31) or DNA deaminases (e.g. TadA or cytidine deaminase) (32,33).
For RNA editing, the effectors are usually RNA deaminases, such as hADAR2DD for adenosine-to-inosine (A-to-I) editing or APOBEC1 for cytosine-to-uridine (C-to-U) editing, with the former more prevalent due to correction of non-sense mutations (19). Although the therapeutic potential of site-directed RNA editing is comparable to that of genome editing, the precise and efficient guidance of the effector remains a challenge, and new systems need to be developed. In currently available techniques, either a sequence-specific nucleic acid binding domain or a gRNA was utilized to direct the editing system (21,28,34). However, the possibility of harnessing a modified nucleic acid and a modification-dependent nucleic acid binding protein for gene targeting has not been studied yet.
As a novel modification, the rareness of PT-modification in eukaryotic cells makes it a unique marker with potential as the basis for designing new gene targeting techniques (35). Therefore, we conceived the idea of introducing a chemically synthesized PT-oligonucleotide as a probe to anneal with the target nucleic acid, resulting in a double-stranded hemi PT-modified nucleic acid that could be specifically bound by SBD. Chemically synthesized PT-nucleic acids are widely used in nucleic acid-based drugs to improve metabolic stability and protein interactions (36). Although the PT-probe is able to adopt any sequence pattern and thus anneal with any designated target sequence, the innate sequence-specificity of SBDs still limits the target range to only the sequences recognized by SBDs. Consequently, it would be important to find a promiscuous SBD that could bind all PT-modified sequence patterns for application in RNA editing.
In this work, we first identified the sequence-promiscuous PT-dependent SBDHga from Hahella ganghwensis to overcome the limitation of sequence-specificity. We then solved the crystal structure of SBDHga complexed with PT-DNA substrate to 1.8 Å. In the broad-spectrum sequence-recognition mechanism of SBDHga, a short L4 loop interacts with the DNA backbone, replacing the longer L4 loops of other SBDs, which insert into the DNA major groove. Strengthening of other contacts between SBDHga and the DNA backbone also contributes to the broad sequence range of SBDHga. Taking advantage of the broad sequence recognition of SBDHga, we developed a new site-directed RNA editing system by fusion of SBDHga and hADAR2DD, with PT-modified RNA oligonucleotides as guide RNA. This system was validated by repairing a non-sense mutation in GFP at a frequency of up to 60%, which is comparable with currently available methods. This study is the first utilization of site-directed RNA editing with a targeting domain that recognizes nucleic acid modification. As a proof of concept, this work provides insights for developing new molecular targeting tools for gene therapy.
MATERIALS AND METHODS
Phylogenic analysis of SBD homologs
To find the SBD homologs, a list of experimentally verified, SBD-containing proteins (ScoMcrA, SprMcrA, Eco, Mmo, Sga, EcoWI, Ksp11411I, Bsp305I, Mae9806I and Sau43800I) was used to search against the public protein database on February 23, 2022. The hits were dereplicated by CD-HIT software at a sequence identity cut-off of 90%, and sequences shorter than 200 aa were removed since they were likely cases of truncated proteins. The type II REase EcoRI was added and used as an outgroup. Then the 1723 remaining sequences were aligned with MUSCLE5 (37) using the Super5 algorithm, and the phylogenic tree was generated using the FastTree (38) software using the WAG + CAT model. The iTOL (39) software was used for tree visualization. The protein domains were predicted with CD-search software (40).
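The length filter applied after dereplication can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: the helper names `parse_fasta` and `filter_by_length` and the toy sequences are hypothetical, and only the 200 aa cut-off is taken from the text.

```python
from typing import Dict, Iterable

MIN_LENGTH_AA = 200  # cut-off stated in the text


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def filter_by_length(records: Dict[str, str],
                     min_length: int = MIN_LENGTH_AA) -> Dict[str, str]:
    """Keep sequences with at least `min_length` residues (drop likely truncations)."""
    return {name: seq for name, seq in records.items() if len(seq) >= min_length}


# Toy input: hypothetical sequences, for illustration only.
toy_fasta = [
    ">full_length_homolog",
    "M" + "A" * 249,
    ">truncated_fragment",
    "M" + "A" * 99,
]
kept = filter_by_length(parse_fasta(toy_fasta))
print(sorted(kept))
```

In practice the same filter would be run on the dereplicated CD-HIT output before alignment.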
Molecular cloning and site-directed mutagenesis
Strains, plasmids, oligonucleotides and primers used in this study are listed in Supplementary Tables S3 and S4. The gene encoding Hga (WP_020410240.1) was synthesized based on the protein sequence with codons optimized for E. coli, followed by cloning into the bacterial expression vector pET28a between the NdeI and XhoI sites and with a 6 × His tag at the N-terminus. The SBD (residues 1–160) of Hga was amplified by high-fidelity PCR using primers listed in Supplementary Table S3 and cloned into pET28a as above. The 6 × His tagged SBDHga-hADAR2DD (SBD-hADAR2DD) expression vector pHWY4 was constructed by cloning SBDHga and hADAR2DD (E488Q) into the pESC-URA vector by high-fidelity PCR with Q5 DNA polymerase and Gibson assembly (NEB). Site-directed mutagenesis was carried out by whole-plasmid inverse PCR and DpnI digestion, followed by chemical transformation. All mutants were constructed in E. coli DH10B and expressed in E. coli BL21(DE3).
Protein expression and purification in E. coli
The protein expression plasmids were transformed into E. coli BL21(DE3), and single colonies were inoculated into 50 mL LB medium containing 50 μg/mL kanamycin and cultured overnight at 37°C. Then the overnight culture was diluted 1:100 in 1 L fresh LB medium supplemented with 50 μg/mL kanamycin and cultured until the OD600 reached 0.8, and then protein expression was induced by 0.2 mM IPTG for 16–20 h at 16°C. Overexpression of proteins in E. coli at a lower temperature such as 16°C can slow down the rate of protein synthesis, allowing the cellular machinery to efficiently process and fold the newly synthesized proteins, and avoid protein aggregation as inclusion bodies. The cells were harvested and resuspended in 20 mL binding buffer (20 mM Tris–HCl, pH 8.0, 20 mM imidazole and 300 mM NaCl) and lysed by cell homogenizer (JNBio) at 4°C. After centrifugation at 16000 × g for 60 min at 4°C, the supernatant was applied to a 2.5 mL Ni-NTA column (GE Healthcare) pre-equilibrated with binding buffer. The proteins were eluted from the column with 10 mL of elution buffer (20 mM Tris–HCl, pH 8.0, 300 mM imidazole, and 300 mM NaCl) after washing with 200 mL binding buffer. Eluted proteins were further purified with a Source 15Q anion exchange chromatography column, and a Superdex 200 GL 10/300 gel filtration chromatography column using the AKTA FPLC system (GE Healthcare). The peak fractions were combined and concentrated to 7 mg/mL. Purified proteins were visualized by Coomassie staining with 15% SDS-PAGE analysis, and protein concentration was determined using a Bradford Protein Assay Kit (Bio-Rad). The protein was finally stored in buffer containing 10 mM Tris–HCl, pH 8.0, 100 mM NaCl and 2 mM DTT at −80°C.
Expression and purification of the chimeric SBD-hADAR2DD in Saccharomyces cerevisiae
The chimeric SBD-hADAR2DD expression vector was transformed into chemically competent S. cerevisiae BY4741 using the Frozen‐EZ Yeast Transformation II Kit (Zymo Research) according to recommendations of the manufacturer. The colonies were selected on synthetic medium without uracil (SC-U) plates at 30°C and verified by PCR using the DNA extracted from small-scale cultures. For protein expression, cells were first inoculated into 50 mL liquid SC-U containing glucose for 2 days at 30°C and then switched to SC-U containing galactose for induction. After 24 h, cells were harvested, lysed by cell homogenizer (JNBio) at 4°C (1800 bar, 3 runs) and purified with the same procedure used for proteins from E. coli.
Preparation of stereo-specific PT-DNA
The PT-DNA oligonucleotides were chemically synthesized by GENEWIZ Co., Ltd. Then the Rp and Sp stereoisomers of PT-DNA were separated by anion exchange HPLC with a DNAPac PA-100 analytical column (Thermo-Fisher Scientific) on an Agilent 1260 Infinity series system at a flow rate of 1 mL/min with the following parameters: column temperature, 50–55°C; solvent A, 10 mM Tris–HCl, pH 8.0; solvent B, 10 mM Tris–HCl, pH 8.0, 1 M NaCl; gradient of 15% B for 5 min, 15% B to 65% B over 30 min, and then 100% B for 10 min; detection, UV absorbance at 260 nm. The peak corresponding to Rp PT-DNA was collected and desalted with Copure C18 column (Biocomma), dried on an RVC 2–25 rotational vacuum concentrator (Christ) and then dissolved with distilled de-ionized water. Double-stranded PT-DNA was prepared by mixing complementary strands, heating to 95°C for 5 min and gradual cooling. The concentration of prepared DNA was determined by spectrophotometric measurement on a NanoDrop 2000 spectrophotometer (Thermo-Fisher Scientific). All DNA oligonucleotides are listed in Supplementary Table S3; PT core sequences were underlined and modification sites were indicated as PS.
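The HPLC gradient described above can be written as a piecewise function of time, which makes the salt concentration at any point of the run explicit. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the step from 65% B to 100% B is instantaneous at 35 min, which the text does not state.

```python
def percent_b(t_min: float) -> float:
    """Percentage of solvent B at time t (min) for the gradient in the text.

    0-5 min:   15% B (isocratic)
    5-35 min:  linear ramp from 15% B to 65% B
    35-45 min: 100% B (assumed to be an instantaneous step)
    """
    if t_min < 0 or t_min > 45:
        raise ValueError("time outside the 45 min programme")
    if t_min <= 5:
        return 15.0
    if t_min <= 35:
        return 15.0 + (t_min - 5.0) * (65.0 - 15.0) / 30.0
    return 100.0


def nacl_molar(t_min: float) -> float:
    """NaCl concentration (M): solvent A has no NaCl, solvent B has 1 M NaCl."""
    return percent_b(t_min) / 100.0 * 1.0


for t in (0, 5, 20, 35, 40):
    print(f"{t:>2} min: {percent_b(t):5.1f}% B, {nacl_molar(t):.2f} M NaCl")
```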
Electrophoretic mobility shift assay (EMSA)
Each EMSA reaction (10 μl) contained about 40 ng DNA (600 nM of a 10 bp oligonucleotide) with the molar ratio of protein:DNA at 2:1, in a buffer composed of 20 mM Tris–HCl, pH 8.0, 150 mM NaCl and 5% glycerol. After 15 min incubation on ice, the reaction mixtures were loaded onto a 12% non-denaturing polyacrylamide gel (acrylamide:bis-acrylamide = 79:1, w/w) and electrophoresed in 0.5 × TBE buffer at 150 V for 60 min. Gels were stained with a 1:10 000 dilution of SYBR Gold (Invitrogen) and imaged on a Gel Doc XR+ Molecular Imager (Bio-Rad).
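The relation between the DNA mass and the molar concentration quoted for each EMSA reaction can be checked with a short calculation. This is a rough sketch: the average mass of 650 g/mol per base pair is a common approximation and an assumption here (the exact value depends on sequence and on the PT modification), and the function name is hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 650.0  # assumption: average mass of one DNA base pair


def dsdna_mass_ng(conc_nm: float, volume_ul: float, length_bp: int,
                  bp_mass: float = AVG_BP_MASS_G_PER_MOL) -> float:
    """Mass (ng) of double-stranded DNA at a given concentration and volume."""
    moles = conc_nm * 1e-9 * volume_ul * 1e-6   # mol = (mol/L) * L
    grams = moles * length_bp * bp_mass         # g = mol * (g/mol)
    return grams * 1e9                          # convert g to ng


dna_nm = 600.0                 # DNA concentration from the text
protein_nm = 2 * dna_nm        # protein:DNA molar ratio of 2:1
mass = dsdna_mass_ng(dna_nm, volume_ul=10.0, length_bp=10)

# Roughly 39 ng, consistent with the "about 40 ng" stated in the text.
print(f"DNA mass: {mass:.1f} ng; protein concentration: {protein_nm:.0f} nM")
```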
Transformation efficiency assay
To test the restriction frequency of Hga for plasmids bearing dnd genes of E. coli B7A and V. cyclitrophicus FF75, expression vectors encoding Hga (pHWY2) were first introduced into E. coli BL21 (DE3). Then 100 ng of pACYCDuet-1 vector (PT−); pYH9 (PT+, carrying the dnd gene cluster from E. coli B7A that resulted in GPSAAC/GPSTTC, GPSATC/GPSATC and GPSTAC/GPSTAC modification); or pHWY1 (PT+, carrying the dnd gene cluster from V. cyclitrophicus FF75 that resulted in CPSCA/TGG PT-modification) were transformed into competent cells containing pHWY2. To test the restriction frequency of Hga mutants D355A, E370A and K372A, the pACYCDuet-1 vector (PT−) and pYH9 were first introduced individually into E. coli BL21 (DE3). Then 100 ng of pET28a vector or its derivatives carrying WT Hga or its mutants were transformed into competent cells containing pACYCDuet-1 or pYH9. The transformation efficiency was determined by counting the E. coli colony-forming units of serial dilutions.
A GPSGCC restriction efficiency assay was conducted by introducing Hga into S. lividans 1326 (PT+, GPSGCC) and S. lividans HXY16 (PT−) via conjugation. The uptake efficiency of Hga by the PT+ Streptomyces host decreased by nearly 10-fold when compared to that of the PT− host. The restriction activity on Hga by the host was calculated by comparing the exconjugant numbers for pPM927 harboring Hga and the pPM927 empty vector. Each experiment was carried out in triplicate, and the histograms were generated using GraphPad Prism software (version 8.0).
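The way colony counts from serial dilutions translate into a transformation efficiency and a fold restriction can be sketched as follows. All colony counts, dilution factors and volumes below are hypothetical placeholders chosen for illustration; only the 100 ng of plasmid DNA per transformation comes from the text, and the function names are not from the original work.

```python
def cfu_per_ug(colonies: int, dilution_factor: float,
               total_volume_ul: float, plated_volume_ul: float,
               dna_ng: float) -> float:
    """Transformation efficiency in colony-forming units per microgram of DNA."""
    # Scale the plate count back to the whole, undiluted transformation.
    total_cfu = colonies * dilution_factor * total_volume_ul / plated_volume_ul
    return total_cfu * 1000.0 / dna_ng  # 1000 ng per microgram


def fold_restriction(eff_unmodified: float, eff_modified: float) -> float:
    """Ratio of efficiencies for PT-free versus PT-modified plasmid."""
    return eff_unmodified / eff_modified


# Hypothetical counts: PT-free plasmid counted at a 1000-fold dilution,
# PT-modified plasmid counted undiluted.
eff_pt_minus = cfu_per_ug(150, 1000, total_volume_ul=1000, plated_volume_ul=100, dna_ng=100)
eff_pt_plus = cfu_per_ug(150, 1, total_volume_ul=1000, plated_volume_ul=100, dna_ng=100)

print(f"PT- efficiency: {eff_pt_minus:.2e} CFU/ug")
print(f"PT+ efficiency: {eff_pt_plus:.2e} CFU/ug")
print(f"Fold restriction: {fold_restriction(eff_pt_minus, eff_pt_plus):.0f}")
```

The same ratio applies to the conjugation assay, with exconjugant numbers in place of transformant counts.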
Crystallization and structure determination
Crystals of SBDHga complexed with the Rp form of 10 bp double-stranded PT-DNA (generated by annealing the Rp form of oligonucleotides 5′-CGAGPSTTCCGGC-3′ and 5′-GCCGGAACTCG-3′) were grown at 12°C by the sitting-drop vapor-diffusion method with drops consisting of 1 μl protein (7 mg/mL) and 1 μl reservoir solution. The reservoir solution contained 0.1 M potassium chloride, 0.05 M sodium cacodylate trihydrate pH 6.0, 16% w/v polyethylene glycol 1000 and 0.0005 M spermine. Crystals were transferred to paraffin oil before being flash-frozen. An X-ray diffraction data set at 1.8 Å resolution was collected at beam line BL19U1 at the National Facility for Protein Science in Shanghai (NFPS, China). The diffraction data were processed using the HKL-3000 software. The crystal of the SBDHga-DNA complex belonged to the P2₁2₁2₁ space group with two molecules of SBDHga-DNA in an asymmetric unit. The structure of the SBDHga-DNA complex was solved by molecular replacement using Phenix.Phaser. The primary predicted model of SBDHga-DNA was acquired on tFold (41) and was further mutated to poly-alanine before use as a search model for molecular replacement. The complete model of the SBDHga-DNA complex was obtained by using iterative refinement in Phenix.Refine and manual building in Coot. The data collection statistics and the refinement statistics for the SBDHga-DNA complexes are summarized in Supplementary Table S1.
In vitro RNA editing assay
The GFPuv-encoding gene (Takara Bio.) was subcloned into pET28a, and the W57stop nonsense mutant (TGG→TAG) was constructed by whole-plasmid PCR and DpnI digestion using primers listed in Supplementary Table S3. The editing mRNA substrate was prepared by the HiScribe™ T7 High Yield RNA Synthesis Kit (NEB), followed by DNase I digestion and RNA purification by the RNA Clean & Concentrator (Zymo Research) according to the recommendations of the manufacturer. The absence of DNA template was always confirmed by PCR prior to cDNA synthesis. Editing reactions were performed in buffer composed of 20 mM Tris‐HCl, 100 mM NaCl, 1 mM DTT and 2.5% glycerol, pH 8.0, and to minimize RNA degradation, the reaction mixture was supplied with 0.5 U/μl recombinant RNase inhibitor (Takara Bio.). For each editing reaction, 100 nM mRNA substrate, 200 nM PT-modified guide RNA (ptgRNA) and 500 nM SBD-hADAR2DD protein were incubated in a thermocycler at 30°C for 30 min followed by 37°C for 30 min for 3 cycles. After editing, the edited mRNA was reverse transcribed into cDNA using the PrimeScript™ RT reagent Kit (Takara Bio.) with primer GFP-cDNA-RP as the specific primer, and PCR amplification using cDNA as template was then carried out for subsequent analysis. To calculate the editing efficiency, the amplification products containing the GFPuv fluorescent reporter gene were first subjected to Sanger sequencing analysis with EditR v1.0.1 (MoriarityLab); additionally, the amplification products were cloned into a plasmid vector followed by transformation into E. coli BL21. The fluorescent colonies and total colonies were counted to calculate the RNA editing efficiency. The results were plotted by GraphPad Prism software (version 8.0).
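The logic of the reporter assay can be illustrated in code: A-to-I editing of the UAG stop codon yields UIG, which is read as UGG (tryptophan) because inosine pairs like guanosine, and the editing efficiency is the fraction of fluorescent colonies. The sketch below is a simplified illustration; the function names are hypothetical, the codon table covers only the two codons involved, and the colony counts are invented placeholders rather than measured data.

```python
# Minimal codon table covering only the two codons relevant to the W57stop reporter.
CODON_TABLE = {"UAG": "stop", "UGG": "Trp"}


def edit_adenosine(codon: str, position: int) -> str:
    """Deaminate the adenosine at `position` (0-based) to inosine (I)."""
    if codon[position] != "A":
        raise ValueError("target base is not adenosine")
    return codon[:position] + "I" + codon[position + 1:]


def decode(codon: str) -> str:
    """Translate a codon, reading inosine as guanosine."""
    return CODON_TABLE[codon.replace("I", "G")]


def editing_efficiency(fluorescent: int, total: int) -> float:
    """Percentage of colonies in which GFP fluorescence was restored."""
    if total <= 0 or not 0 <= fluorescent <= total:
        raise ValueError("counts must satisfy 0 <= fluorescent <= total, total > 0")
    return 100.0 * fluorescent / total


edited = edit_adenosine("UAG", 1)
print(decode("UAG"), "->", edited, "->", decode(edited))

# Hypothetical colony counts, for illustration only.
print(f"Editing efficiency: {editing_efficiency(57, 96):.1f}%")
```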
RESULTS
Hga is a phosphorothioate-dependent PD-(D/E)XK endonuclease
Modification-dependent REases are usually composed of separate modification binding domains and catalytic nuclease domains responsible for recognition and cleavage of targets, respectively. About 10 SBDs have been identified so far, all of which are connected with HNH domains, with only SBDSco and SBDSga bearing an additional SET and RING-associated (SRA) domain or unknown domain, respectively (Figure 1A). To explore the diversity of SBDs, we searched for homologs of previously characterized SBDs in public databases and identified 1722 dereplicated targets. The SBD homologs were divided into seven clades by phylogenetic analysis, and the coupling domains were also analyzed (Figure 1A). We found that, in five of the clades, 1623 proteins or 94% of the homologs contained an HNH domain, among which one clade or more than 80% of the SBD homologs contained a single HNH domain, with eight of the phosphorothioate-dependent REases reported before (14,15). Two clades or 4.7% of the SBD homologs had an extra SRA domain to the N or C terminus of the SBD, indicating they could recognize 5mC-modified DNA as well, consistent with the previously reported ScoMcrA (12). Another two clades or 5.6% of the SBD homologs were linked with an HNH domain and a domain of unknown function, containing the reported SBD homolog Sga. In the SBD homologs without HNH domains, one clade or 3% of the SBD homologs were coupled with an unknown domain and an extra kinase domain. To our surprise, one unique clade or 2.8% of the SBD homologs carried a PD-(D/E)XK domain instead of an HNH domain. The PD-(D/E)XK phosphodiesterase superfamily is extremely diverse and mainly encompasses nucleases, including those involved in DNA restriction, Holliday junction resolution, DNA recombination and DNA repair (42), implying this unique clade may also restrict PT-DNA (Figure 1A).
We focused on this unique clade and found an SBD homolog from the marine bacterial strain Hahella ganghwensis (Figure 1A), naming this homolog Hga.
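As a quick consistency check, the domain-composition percentages quoted above can be recomputed from the counts given in the text. The snippet uses only numbers stated in this section (1722 homologs, 1623 with an HNH domain, and the rounded clade percentages); the variable names are illustrative.

```python
TOTAL_HOMOLOGS = 1722   # dereplicated SBD homologs
HNH_CONTAINING = 1623   # homologs carrying an HNH domain

# Rounded percentages quoted in the text for the individual clade groups.
SRA_PLUS_HNH = 4.7
HNH_PLUS_UNKNOWN = 5.6
UNKNOWN_PLUS_KINASE = 3.0
PD_DE_XK = 2.8

hnh_percent = 100.0 * HNH_CONTAINING / TOTAL_HOMOLOGS
single_hnh_percent = hnh_percent - SRA_PLUS_HNH - HNH_PLUS_UNKNOWN
total_percent = hnh_percent + UNKNOWN_PLUS_KINASE + PD_DE_XK

print(f"HNH-containing: {hnh_percent:.1f}%")          # about 94%
print(f"Single HNH domain: {single_hnh_percent:.1f}%")  # more than 80%
print(f"All clades: {total_percent:.1f}%")              # close to 100% after rounding
```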
Figure 1.
Identification of the phosphorothioate-dependent PD-(D/E)XK endonuclease Hga. (A) Phylogenic analysis for SBD homologs (left panel). The experimentally verified SBDs are marked by triangles around the tree, with the encoding species stated in parentheses. For each clade, the percentage of proteins and the domain composition are shown in the middle and right panels, respectively. (B) Hga restricts dnd gene clusters from E. coli B7A and V. cyclitrophicus FF75. Error bars indicate the standard deviation of three replicates. (C) Analysis of the PD-(D/E)XK domain. The predicted structure and the alignment with known PD-(D/E)XK domains indicate that Hga harbors a PD-(D/E)XK domain.
We first tested whether this combination of PD-(D/E)XK and SBD scaffold truly conducted PT-DNA restriction activity. For the in vivo restriction assay, Hga was tested with transformation of plasmids bearing dnd gene clusters from E. coli B7A, which conveyed a double-stranded GPSAAC/GPSTTC, GPSATC/GPSATC and GPSTAC/GPSTAC PT-modification, or from V. cyclitrophicus FF75, which conveyed a hemi CPSCA/TGG PT-modification. The E. coli BL21(DE3) strain expressing Hga restricted the dnd gene cluster from E. coli B7A and the dnd gene cluster from V. cyclitrophicus FF75 1000-fold and 10-fold, respectively (Figure 1B). We also tested Hga for restriction activity on dnd gene clusters of the GPSGCC/GPSGCC sequence pattern in Streptomyces lividans. Hga exhibited a 10-fold degree of restriction on the GPSGCC/GPSGCC dnd gene cluster (Supplementary Figure S1A). To further ensure that this restriction function was achieved by the PD-(D/E)XK domain, we generated the structure of the C-terminal domain (amino acid positions 300–436) of Hga using AlphaFold 2 (43). The predicted model (Figure 1C) shows a classic PD-(D/E)XK domain consisting of a central, four-stranded, mixed β-sheet flanked by two α-helices on both sides (with αβββαβ topology). Also, the canonical active site of the PD-(D/E)XK domain was found, which was formed by aspartic acid (D355), located in the N-terminus of the second core β-strand, and glutamic acid (E370), followed by lysine (K372) from the third β-strand (Figure 1C, upper panel), with these three amino acids being conserved in several of its structural analogs (Figure 1C, lower panel). Mutants of D355A, E370A and K372A displayed more than a 300-fold decrease in restriction activity towards the dnd gene, which supported the predictions that Hga is a PT-dependent REase possessing cleavage activity by the PD-(D/E)XK domain and that D355, E370 and K372 together form its active site (Supplementary Figure S1B and C).
SBDHga recognizes PT-DNA of a broad sequence range
Hga restricted not only the double-stranded GPSAAC/GPSTTC, GPSATC/GPSATC and GPSTAC/GPSTAC PT-modification but also the hemi CPSCA/TGG PT-modification in vivo, indicating that the SBD from Hga (SBDHga) recognizes the CPSCA/TGG sequence pattern. We therefore measured the PT-DNA binding of SBDHga on chemically synthesized 10 bp PT-DNA with five naturally occurring consensus sequences by EMSA, and found that SBDHga binds all the PT-modified substrates but not the unmodified substrates (Figure 2A). SBDHga is the only SBD homolog reported hitherto to recognize all five natural consensus sequence patterns, showing the broadest sequence range on PT-DNA.
Figure 2.
SBDHga binds PT-DNA without sequence selectivity. (A) SBDHga binds all known naturally occurring PT-modified sequence patterns. (B) Substitution of flanking sequences of the core PT site does not affect SBDHga binding. (C) Substitution in the core sequence does not change SBDHga binding. Substituted bases are marked in red.
We further tested the binding ability of SBDHga on non-naturally occurring sequences. We used the 10 bp GPSGCC hemi PT-DNA as substrate, sequentially changed the bases and then evaluated binding by band shifts on EMSAs. We first made single base substitutions on the flanking sequence upstream or downstream of the GPSGCC core site and found no obvious affinity change by EMSA for the substrates with different base types or a degenerate N base (a mixture of four bases) (Figure 2B). Then we evaluated replacing the four bases of the GPSGCC core site sequentially. To our surprise, no apparent decrease in binding capacity of SBDHga was observed (Figure 2C). This experiment demonstrated that SBDHga is insensitive to the base type of PT-DNA in both the flanking and core sequences, indicating that SBDHga is a novel PT-dependent, sequence-promiscuous DNA binding domain.
Overall crystal structure of SBDHga-PT-DNA
To discover the molecular mechanism that SBDHga uses to bind a broad range of PT-DNA sequences, we determined the crystal structure of SBDHga complexed with a 10 bp double-stranded PT-DNA oligonucleotide (5′-CGAGPSTTCGGC-3′) at 1.8 Å resolution by X-ray diffraction (Supplementary Table S1, PDB accession number 8H0L). Two SBDHga-PT-DNA complexes were arranged side by side in an asymmetric unit (Supplementary Figure S2A), and each complex included an SBDHga consisting of eight α-helices (labeled A1-A7 and A4′); two β-sheets (labeled B1-B2); four loops (labeled L1-L4); and a double-stranded PT-DNA molecule, which was adjacent to the loops (Figure 3A and Supplementary Figure S2B). The sulfur atom in the PT-DNA was buried in the hydrophobic surface cavity composed of non-polar atoms, including the methylene groups from the side chains of H25 and K26 on A2; L75 and H79 on A4; and the pyrrolidine ring of P76 on A4 (Figure 3A and B). Amino acid residues were also involved in interactions with the DNA molecule by hydrogen bonds and electrostatic interactions, including N16, R17, R18 and H19 on L1; H25 and K26 on A2; R70 and N72 on L3; H79 and T82 on A4; S104 on L4; and A106 and S107 on A5 (Figures 3A, 4A and Supplementary Figure S2C). These residues mainly interacted with oxygen atoms of the DNA backbone with the exception of N16, which formed hydrogen bonds with the nitrogen atom of G4 (Figure 4A). The detailed interactions are listed in Supplementary Table S2. Mutations of residues responsible for these interactions decreased the binding of PT-DNA (Figure 4D).
Figure 3.
Revealing the SBDHga residues that interact with the PT-DNA and the sulfur atom of PT-DNA. (A) Overall structure of SBDHga binding with its substrate PT-DNA. Residues that interact with the DNA are colored as follows: H25, K26, L75, P76 and H79, which form the sulfur atom-binding cavity, are in yellow; N16, which recognizes the DNA base, is in purple; and R17, R18, H19, R70, N72, T82, S104, A106 and S107, which interact with phosphates through hydrogen bonds or electrostatic interactions, are in blue. (B) Sulfur atom-binding cavity on SBDHga composed of H25, K26, L75, P76 and H79.
Figure 4.
Comparison of the structures for SBDHga, SBDSpr and SBDSco complexed with PT-DNA. (A) Schematic summary of the interactions between SBDHga and PT-DNA. (B and C) Schematic summary of the interactions between SBDSpr (B) or SBDSco (C) and PT-DNA. (D) EMSAs of SBDHga wild-type (WT) and mutants on all known naturally occurring PT-DNA sequences. H25A, R70A and N72A indicate the influence of interactions on the DNA backbone; P76A indicates the influence of the sulfur binding cavity; and H25Y and L75Y indicate the influence of steric hindrance. (E) Schematic summary of the interactions between SBDHga (left), SBDSpr (center) and SBDSco (right) and the PT-DNA substrate. PT-DNA is represented by the trunk and interactions by the leaves. The interactions with bases are marked in brown and interactions with the DNA backbone are marked in green (hydrogen bonds) or yellow (electrostatic interactions). The numbers are listed in the table (F).
Comparison of the structures of SBDHga, SBDSpr and SBDSco complexed with DNA
To reveal the mechanism underlying the different sequence specificities of SBDs, we compared the structures of SBDHga and the previously reported PT-DNA binding proteins SBDSpr and SBDSco. The superimposition of SBDSpr or SBDSco with SBDHga presented similar topologies, with RMSDs of 4.67 Å over 74 Cα atoms, and 2.98 Å over 74 Cα atoms, respectively. There were common features in the three complexes, with the DNAs adjacent to one side of the SBDs (Supplementary Figure S3); mainly positively charged residues in the interface of the SBD and PT-DNA (Figure 5A); and the sulfur atoms buried in a hydrophobic surface cavity, which was composed of a central pyrrolidine and four methyl or methylene groups (Supplementary Figure S4). Furthermore, the SBDs interacted with the DNA backbone and bases via hydrogen bonds and electrostatic interactions in addition to the hydrophobic interactions with the sulfur atom (Figure 4A-C). However, there were still some differences in SBDHga, which are probably related to its broad sequence tolerance for PT-DNA. SBDHga had 14 hydrogen bonds and electrostatic interactions with the DNA, compared with 11 interactions for SBDSpr and SBDSco (Figure 4E, F). Most interactions of SBDHga were not sequence-specific, including 13 contacts with the DNA backbone, and the only base contact between N16 and G4 could also be formed with the other three base types (Supplementary Figure S5). The abundant sequence-nonspecific contacts with DNA in SBDHga provide sufficient interactions to stabilize the complex with PT-DNA of all sequence patterns. In contrast, SBDSpr and SBDSco have 4 and 7 sequence-specific interactions, respectively, with the DNA bases (Figure 4E, F). These contacts are indispensable for stable binding of PT-DNA by SBDs, so SBDSpr and SBDSco bind only specific sequences.
Figure 5.
Comparison of the structures for SBDHga, SBDSpr and SBDSco complexed with PT-DNA. (A) Surface charge of SBDHga (left), SBDSpr (middle) and SBDSco (right). The surface charge distribution at neutral pH is displayed with blue for positive, red for negative, and white for neutral. The L4 loops of SBDHga, SBDSpr and SBDSco are marked with dashed circles. (B) Sequence alignment of SBDHga, SBDSpr and SBDSco. Secondary structure elements of SBDHga, SBDSpr and SBDSco are numbered according to crystal structures (PDB code: 8H0L, 7CC9 and 5ZMO). The conserved residues forming the sulfur binding cavity are marked with yellow boxes; loops that interact with DNA bases are marked by red boxes; and blue triangles indicate SBDHga residues that interact with PT-DNA. The alignment was generated with ESPript. (C) The L4 loop of SBDHga, SBDSpr and SBDSco. Dashed lines indicate the interactions.
To further explain the absence of sequence-specific interactions in SBDHga, we found that the interactions of SBDHga were almost equally distributed on L1, A2, L3, A4 and L4, and thus more dispersed than for SBDSpr and SBDSco, which mainly had interactions on A2, A4 and especially L4 (Supplementary Figure S6). In SBDSpr and SBDSco, the long and flexible L4, containing positively charged residues such as histidine and arginine, could insert into the DNA major groove and form sequence-specific interactions with the bases (Figure 5A). In contrast, in SBDHga, the L4 region, composed of mainly neutral residues, was too short to insert into the DNA major groove, and the L4 region anchored the DNA by interacting with the ends of the DNA backbone through S104, A106 (interaction through a nitrogen atom in the main chain), and S107 instead, with little effect on base recognition (Figure 5A).
Secondary structure alignment of SBDHga, SBDSpr and SBDSco revealed that the three SBDs share similar secondary structure arrangements except in the region between B1 and B2. This region of SBDHga had two long α-helices (A4′ and A5) and the linking L4 loop was short (only 6 residues), whereas in SBDSpr, A4′ was missing, so L4 was longer (12 residues) despite a longer A5. In SBDSco, there was also a long L4 (11 residues) due to shorter A4′ and A5 α-helices (Figure 5B). In SBDHga, the rigid long A4′ and A5 regions restricted the flexibility of L4, preventing its contacts with the DNA bases, so that L4 made sequence-nonspecific interactions with the DNA backbone instead. In SBDSpr and SBDSco, L4 was relatively long and flexible and was able to make sequence-specific interactions with the DNA bases (Figure 5C). We also observed that L1 of SBDHga had strong interactions with the DNA backbone via the basic residues R17, R18 and H19, and an interaction between N16 and the base G4 was also found (Figure 3A and Supplementary Figure S6). Interactions of this kind were not detected in SBDSpr and SBDSco; only a weak interaction between A22 and the DNA backbone was found in SBDSpr.
A previously reported surface patch was positively charged in SBDSpr and contained R73, which interacted with the DNA backbone, whereas this patch was negatively charged in SBDSco, contained E156 and D157 and could not interact with DNA (16); the corresponding region in SBDHga was also positively charged but had more interactions with DNA, via R70 and N72, than found with R73 in SBDSpr. EMSA experiments revealed that mutations of R70 or N72 in SBDHga eliminated the interactions between these sites and DNA, thus partially reducing the interactions between the protein and DNA (Figure 4D). In addition, the hydrophobic cavity of SBDHga was composed of residues smaller than the tyrosine present in the hydrophobic cavities of SBDSpr and SBDSco. Mutation of L75 in SBDHga to tyrosine, which had no effect on the hydrophobicity of the SBD cavity, reduced binding to PT-DNA with the sequences GPSTTC and GPSATC. This experiment suggests that the hydrophobic cavity of SBDHga has lower steric hindrance due to the smaller side chain, which is conducive to binding different types of PT-modified sequences.
In summary, compared with SBDSpr and SBDSco, the long and rigid A4′ and A5 reduced the length and flexibility of L4 in SBDHga. The shorter L4 of SBDHga, which could not reach the DNA bases, made sequence-nonspecific interactions with the DNA backbone instead, and the missing base interactions were compensated for by dispersed sequence-nonspecific interactions between the DNA backbone and L1 and L3 (Supplementary Figure S6). The lower steric hindrance of the SBDHga hydrophobic cavity also facilitated binding of different types of PT-modified sequences.
Site-directed RNA editing with chimeric SBD-hADAR2DD
Since SBDHga is a PT-dependent, sequence-promiscuous DNA binding domain, we reasoned that it should be possible to harness SBDHga as a targeting tool to direct effectors to perform gene editing. DNA editing requires unwinding of the double helix, so here we tested the concept by site-directed RNA editing, using SBDHga and a PT-modified guide RNA probe (ptgRNA) to direct the effector hADAR2DD to deaminate the target site (Figure 6A). We first tested binding of SBDHga to PT-dsRNA with one or several consecutive hemi PT-modifications, which is the predicted product after annealing of ptgRNA and target mRNA. The EMSA results revealed that SBDHga binds hemi PT-dsRNA and that the binding efficiency improved with increasing PT-modification number (Supplementary Figure S7), consistent with our previous observation that consecutive PT-modifications improve the binding affinity of SBDs for PT-modified nucleic acids (unpublished data).
Figure 6.
In vitro RNA editing experiment with SBDHga-ADAR2DD. (A) Schematic summary of the RNA editing process. (B) Sanger sequencing results for editing products at the target region. The target TAG stop codon is marked with a red box. A/G indicates the mixture of A (green peak) and G (black peak) bases at the edited site. (C, D) RNA editing efficiency of hADAR2DD (C) and SBDHga-ADAR2DD (D) measured with Sanger sequencing. Error bars indicate the standard deviation, and statistical analysis used an unpaired two-sample t-test, in which the tested samples were all compared to the sample with ptgRNA-0PT. *P < 0.05, **P < 0.01. The experiments were carried out in triplicate.
To perform RNA editing, we used the GFPuv nonsense mutant as the editing template, in which the codon TGG of W57 was changed to the TAG stop codon (Supplementary Figure S8). We constructed the chimeric SBDHga-hADAR2DD expression vector by fusing the deaminase domain hADAR2DD-E488Q (29) to SBDHga via a GGGGS linker, expressed the recombinant protein in S. cerevisiae BY4741 and purified it. We then designed ptgRNA complementary to the target sequence to anneal with the mRNA, forming a dsRNA with hemi PT-modification(s), which could be recognized by SBDHga and would thereby direct hADAR2DD to edit the target site (Figure 6A). A group of 40 nt RNAs with 1, 3, 5 or 7 consecutive PT-modifications, named ptgRNA-1PT, ptgRNA-3PT, ptgRNA-5PT and ptgRNA-7PT, respectively, and a non-PT-modified RNA, named ptgRNA-0PT, as a negative modification control, were synthesized (Supplementary Figure S8).
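The logic of the reporter can be illustrated with a short sketch. It assumes only what is standard for A-to-I editing, namely that inosine (I) is decoded as guanosine (G); the helper names and the two-entry codon table are hypothetical simplifications introduced here for illustration and are not part of the experimental workflow.

```python
# Minimal codon lookup covering only the two codons discussed in the text.
# "W" = tryptophan, "*" = stop.
CODON_TABLE = {"UGG": "W", "UAG": "*"}


def read_inosine_as_guanosine(codon):
    """Inosine base-pairs like guanosine, so it is decoded as G."""
    return codon.replace("I", "G")


def decode(codon):
    """Translate a codon after substituting any inosine with guanosine."""
    return CODON_TABLE[read_inosine_as_guanosine(codon)]


wild_type = "UGG"  # W57 codon in GFPuv (TGG at the DNA level)
mutant = "UAG"     # nonsense mutant used as the editing template
edited = mutant[0] + "I" + mutant[2]  # A-to-I deamination at the middle base

print(decode(wild_type), decode(mutant), decode(edited))
```

Under this assumption the edited codon UIG is read as UGG, so successful deamination of the middle adenosine restores tryptophan at position 57 and, with it, a full-length GFP reading frame.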
After RNA editing by SBDHga-hADAR2DD, the mRNA products were reverse transcribed to cDNA for measurement of efficiency. We determined the RNA editing efficiency by measuring the peak area from Sanger sequencing of the cDNA (Figure 6B). If hADAR2DD performed RNA editing at the target site, the A base of codon 57 (UAG) would have changed to I (UIG), and the I base would be interpreted as G in Sanger sequencing. So, we were able to calculate the editing efficiency by measuring the peak area of G. No editing was detected in the blank control, which was carried out without ptgRNA. The modification control, which was performed using the ptgRNA-0PT without PT-modification, gave an editing efficiency of 30%, which is consistent with the natural function of hADAR2DD on mismatched RNA (44). We also tested hADAR2DD as a control, and the editing efficiency was about 20% with all types of ptgRNA (Figure 6C). The editing efficiency of PT-modified ptgRNA-1PT was about 40%, implying that binding of the PT-modified RNA by SBDHga helped direct hADAR2DD, resulting in higher efficiency. By using 3, 5 or 7 consecutive PT-modifications, we achieved efficiencies of about 45%, 50% and 60%, respectively (Figure 6D). The editing efficiency of approximately 60% is comparable with that of currently prevailing RNA editing technologies (19). Additionally, no obvious off-target sites were detected by Sanger sequencing (Supplementary Figure S9). The editing efficiency was also calculated in a confirmatory assay examining GFP fluorescence (Supplementary Figure S10). Since the I (inosine) base would be interpreted as G by in vivo translational machinery, we were able to calculate the editing frequency by counting the fluorescent colonies. The cDNA was cloned into an E. coli expression vector, and the editing efficiency was measured by counting the colonies with functional GFP. The results were consistent with the editing efficiency measured by Sanger sequencing.
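The two read-outs described above can be summarized in a small sketch. The text states only that the G peak area was measured; normalizing it to the sum of the A and G peak areas at the target position is an assumption made here, as are the function names and the example peak areas, which are hypothetical and are not measured values from this study.

```python
def editing_efficiency_from_peaks(area_a, area_g):
    """Edited fraction at the target base, taken as G / (A + G).

    Assumption: the A and G Sanger peak areas at the edited position are
    proportional to the unedited and edited cDNA populations.
    """
    total = area_a + area_g
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return area_g / total


def editing_efficiency_from_colonies(fluorescent, total):
    """Edited fraction estimated as fluorescent colonies / all colonies."""
    if total <= 0:
        raise ValueError("total colony count must be positive")
    return fluorescent / total


# Hypothetical peak areas chosen to illustrate the calculation only.
example = editing_efficiency_from_peaks(area_a=400.0, area_g=600.0)
print(f"{example:.0%}")

# Fold change between two efficiencies reported in the text (60% vs 30%).
fold_over_control = 60 / 30
print(fold_over_control)
```

Both estimators return a fraction between 0 and 1, so the Sanger-based and colony-based measurements can be compared directly.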
DISCUSSION
Modification-dependent REases are known for their ability to recognize modified nucleic acids, and therefore they have been used for detection or mapping of DNA modifications (45). Moreover, the unique PT-dependent REases are promising new targeting domains for nucleic acid detection (46,47), genome editing and site-directed RNA editing, on the basis of their sequence promiscuity. To validate this hypothesis, we systematically analyzed the phylogenetic distribution of SBDs, identified the promiscuous PT-dependent SBDHga, revealed the mechanism of the broad sequence recognition spectrum of SBDHga by solving the SBDHga crystal structure and performed site-directed RNA editing in vitro using SBDHga. Our work provides insights into the basis for the sequence-specificity of SBDs and demonstrates a new approach for the development of REases as tools for gene therapy.
SBDs are widely distributed in bacteria, with most of them linked with HNH to constitute PT-dependent HNH endonucleases. Among them, some possess an SRA domain, which shares a common HNH motif with SBD to restrict 5mC methylated DNA. Some homologs bear extra unknown domains, which may function in restriction of other types of modifications or regulation of the nuclease domain, and some homologs without an HNH domain have acquired kinase domains, which may play a role in PT-dependent regulation of biological processes. Additionally, a small group of SBD homologs has the PD-(D/E)XK motif characteristic of many Type II REases, and members of this group are consistently phylogenetically distant from other currently identified SBDs (Figure 1A). Mining of SBDs with diverse cognate domains is an efficient way to discover SBDs with diversified sequence specificity and new PT-related functions in addition to restriction.
Hga, identified in Hahella ganghwensis, consists of an SBD and a PD-(D/E)XK domain. In our assays, Hga restricted the E. coli B7A dnd gene cluster (which modifies DNA in the GPSAAC/GPSTTC, GPSATC/GPSATC and GPSTAC/GPSTAC sequence patterns) by 1000-fold, and restricted the V. cyclitrophicus FF75 dnd gene cluster (which modifies DNA in the CPSCA/TGG sequence pattern) by 10-fold. The distinct restriction activities may reflect the difference in modification type: Dnd from B7A phosphorothioates DNA on both strands, whereas that from FF75 phosphorothioates DNA only on the CCA strand. The function of the PD-(D/E)XK domain was confirmed by site-directed mutagenesis of the catalytic center, which abolished restriction of the dnd gene cluster. This observation again indicates that the toxic effects most likely result from the cooperative cleavage of PT-DNA by SBD and PD-(D/E)XK rather than from tight binding of SBD to PT-DNA. Hga is the first reported PT-dependent REase that can recognize the CPSCA/TGG sequence, prompting us to systematically examine its sequence specificity. Substitution of bases flanking or inside the core sequence did not reduce the binding of SBDHga to PT-DNA, and using 10 bp PT-DNA with degenerate bases, we demonstrated that SBDHga is a PT-dependent and sequence-promiscuous DNA binding domain. Other work from our group also supported this sequence promiscuity using a PT-modified random oligonucleotide binding and selection assay (46).
Interactions between DNA and proteins are usually classified into sequence-nonspecific and sequence-specific interactions; the former provide stabilization and are primarily established by protein residues via hydrogen bonds or electrostatic contacts with the sugar-phosphate DNA backbone, and the latter provide specificity and are established via base-specific interactions (48). For the PT-dependent SBDs, the hydrophobic interactions with the sulfur atoms in the DNA backbone, as well as the hydrogen bonds and electrostatic contacts with the DNA backbone, are sequence-nonspecific but significant for stabilizing the SBD-PT-DNA complex. The interactions between protein residues and distinct DNA bases are responsible for the sequence specificity. The crystal structures of SBDSco, SBDSpr and SBDHga in complex with their PT-DNA substrates revealed similar sulfur-binding cavities and hydrophobic interactions, but the interactions with the DNA backbone and bases were rather distinct. Previous work (14,16) revealed 4 and 7 hydrogen bonds or electrostatic interactions with the DNA backbone and bases, respectively, in SBDSco, and 7 and 4 such interactions with the DNA backbone and bases, respectively, in SBDSpr. Removing one or more of these interactions by mutagenesis reduces the binding affinity. Compared with SBDSco, which binds only the GPSGCC core sequence, SBDSpr has more sequence-nonspecific interactions with the DNA backbone and thus requires fewer sequence-specific interactions with the bases for stabilization of the complex, allowing recognition of a broader sequence spectrum, including GPSAAC/GPSTTC and GPSATC. Engineering of SBDSco by mutation of E156R/D157R provided additional interactions with the DNA backbone and extended the sequence spectrum.
The crystal structure of SBDHga complexed with PT-DNA revealed 13 sequence-nonspecific interactions with the DNA backbone, considerably more than the respective 7 and 4 sequence-nonspecific interactions in SBDSpr and SBDSco. Although SBDHga makes only one contact with a DNA base, the 13 interactions with the DNA backbone are sufficient for stabilizing the SBD-PT-DNA complex, making SBDHga a promiscuous PT-DNA binding domain.
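The contact counts quoted in the text can be tabulated to make the comparison explicit. The sketch below uses only the backbone and base contact numbers stated in this article; the dictionary layout and the function name `summarize` are illustrative choices, and the derived percentages are simple ratios rather than a measure of binding energy.

```python
# Hydrogen-bond and electrostatic contacts with PT-DNA, as quoted in the text.
CONTACTS = {
    "SBDHga": {"backbone": 13, "base": 1},
    "SBDSpr": {"backbone": 7, "base": 4},
    "SBDSco": {"backbone": 4, "base": 7},
}


def summarize(contacts):
    """Return (name, total contacts, fraction of contacts made with bases)."""
    rows = []
    for name, counts in contacts.items():
        total = counts["backbone"] + counts["base"]
        rows.append((name, total, counts["base"] / total))
    return rows


for name, total, base_fraction in summarize(CONTACTS):
    print(f"{name}: {total} contacts, {base_fraction:.0%} with bases")
```

The totals reproduce the 14, 11 and 11 interactions reported in the Results, and the share of base contacts rises in the order SBDHga, SBDSpr, SBDSco, in line with their increasingly narrow sequence spectra.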
The residues responsible for base interactions are mainly located in the long, flexible L4 loop in SBDSpr and SBDSco. However, in SBDHga, part of the residues corresponding to L4 of SBDSpr and SBDSco formed a new rigid A4′ α-helix instead, so the remaining L4 loop is rather short. The short L4 loop of SBDHga thus cannot insert into the DNA major groove, but interacts with the DNA backbone. In addition, the L1 loop and L3 loop in SBDHga each have 6 interactions with the DNA backbone, compared with 3 interactions in SBDSpr and no interaction in SBDSco. In all, SBDHga recognizes PT-DNA of all sequence patterns by enhanced interactions with the DNA backbone through protein residues in L1, L3, and L4. The PT-DNA recognition mechanism of SBDHga may provide insights for expanding the sequence recognition range of DNA-binding proteins, such as the PAM site extension of CRISPR-Cas9 (49).
The broad sequence spectrum of SBDHga makes it a potential tool for gene targeting, with PT-modified oligonucleotides as probes. Since DNA editing requires unwinding of the double helix, we used RNA editing as a proof of concept, and to avoid potential degradation by RNase H, we used PT-modified RNA oligonucleotides as probes. The ptgRNA annealed with target sequences and recruited recombinant SBD-hADAR2DD for the base editing reactions. The editing efficiency of ptgRNA-1PT was slightly higher than that of the ptgRNA-0PT control due to binding by SBD, and with increasing PT-modifications, the editing efficiency was also improved from 40% to 60%, due to improved binding by SBD on consecutive PT-modifications. PT-modifications raised the editing efficiency to 60%, which is 2-fold above the control value and comparable with currently available methods (26–29,34,50). We ascribe the improvement of editing efficiency to the enhancement of the SBDHga binding affinity to tandem PT-modified substrates (unpublished data). In fact, PT-modifications extend the intracellular half-lives of oligonucleotides by increasing resistance to nucleases and improve delivery by enhancing interactions with proteins (36). The tandem PT-modified ptgRNA is a promising tool for guiding SBD-hADAR2DD to perform base editing at a high efficiency. Additionally, ptgRNA and SBD-hADAR2DD could be applied for gene therapy via non-viral delivery strategies such as RNP delivery (51).
ACKNOWLEDGEMENTS
We thank the staff of the BL17B/BL18U1/BL19U1/BL19U2/BL01B beamlines of the Shanghai Synchrotron Radiation Facility for assistance in data collection.
Contributor Information
Wenyue Hu, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China.
Bingxu Yang, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China.
Qingjie Xiao, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, People's Republic of China.
Yuli Wang, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China.
Yuting Shuai, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China.
Gong Zhao, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China.
Lixin Zhang, State Key Laboratory of Bioreactor Engineering, and School of Biotechnology, East China University of Science and Technology (ECUST), Shanghai 200237, People's Republic of China.
Zixin Deng, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China.
Xinyi He, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China.
Guang Liu, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China.
DATA AVAILABILITY
Atomic coordinates and structure factors for crystal structures of SBDHga-PT-DNA were deposited in the Protein Data Bank. The accession code is 8H0L.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
National Key Research and Development Program of China [2022YFC3400200, 2022YFA0912200, 2020YFA0907200], National Natural Science Foundation of China [32170047, 31900060], Shanghai Pilot Program for Basic Research - Shanghai Jiao Tong University [21TQ1400204] and Natural Science Foundation of Shanghai [22ZR1430100, 20ZR1414500]. Funding for open access charge: National Key Research and Development Program of China and Shanghai Pilot Program for Basic Research – Shanghai Jiao Tong University.
Conflict of interest statement. None declared.
REFERENCES
- 1. Loenen W.A., Dryden D.T., Raleigh E.A., Wilson G.G., Murray N.E. Highlights of the DNA cutters: a short history of the restriction enzymes. Nucleic Acids Res. 2014; 42:3–19.
- 2. Greenberg M.V.C., Bourc’his D. The diverse roles of DNA methylation in mammalian development and disease. Nat. Rev. Mol. Cell Biol. 2019; 20:590–607.
- 3. Unnikrishnan A., Freeman W.M., Jackson J., Wren J.D., Porter H., Richardson A. The role of DNA methylation in epigenetics of aging. Pharmacol. Ther. 2019; 195:172–185.
- 4. Koch A., Joosten S.C., Feng Z., de Ruijter T.C., Draht M.X., Melotte V., Smits K.M., Veeck J., Herman J.G., Van Neste L. et al. Analysis of DNA methylation in cancer: location revisited. Nat. Rev. Clin. Oncol. 2018; 15:459–466.
- 5. Zhou X., He X., Liang J., Li A., Xu T., Kieser T., Helmann J.D., Deng Z. A novel DNA modification by sulphur. Mol. Microbiol. 2005; 57:1428–1438.
- 6. Wang L., Chen S., Xu T., Taghizadeh K., Wishnok J.S., Zhou X., You D., Deng Z., Dedon P.C. Phosphorothioation of DNA in bacteria by dnd genes. Nat. Chem. Biol. 2007; 3:709–710.
- 7. Wang L., Chen S., Vergin K.L., Giovannoni S.J., Chan S.W., DeMott M.S., Taghizadeh K., Cordero O.X., Cutler M., Timberlake S. et al. DNA phosphorothioation is widespread and quantized in bacterial genomes. Proc. Natl. Acad. Sci. U.S.A. 2011; 108:2963–2968.
- 8. Wu X., Cao B., Aquino P., Chiu T.P., Chen C., Jiang S., Deng Z., Chen S., Rohs R., Wang L. et al. Epigenetic competition reveals density-dependent regulation and target site plasticity of phosphorothioate epigenetics in bacteria. Proc. Natl. Acad. Sci. U.S.A. 2020; 117:14322–14330.
- 9. Bickle T.A., Kruger D.H. Biology of DNA restriction. Microbiol. Rev. 1993; 57:434–450.
- 10. Tock M.R., Dryden D.T. The biology of restriction and anti-restriction. Curr. Opin. Microbiol. 2005; 8:466–472.
- 11. Loenen W.A., Raleigh E.A. The other face of restriction: modification-dependent enzymes. Nucleic Acids Res. 2014; 42:56–69.
- 12. Liu G., Ou H.Y., Wang T., Li L., Tan H., Zhou X., Rajakumar K., Deng Z., He X. Cleavage of phosphorothioated DNA and methylated DNA by the type IV restriction endonuclease ScoMcrA. PLoS Genet. 2010; 6:e1001253.
- 13. Yu H., Liu G., Zhao G., Hu W., Wu G., Deng Z., He X. Identification of a conserved DNA sulfur recognition domain by characterizing the phosphorothioate-specific endonuclease SprMcrA from Streptomyces pristinaespiralis. Mol. Microbiol. 2018; 110:484–497.
- 14. Liu G., Fu W., Zhang Z., He Y., Yu H., Wang Y., Wang X., Zhao Y.L., Deng Z., Wu G. et al. Structural basis for the recognition of sulfur in phosphorothioated DNA. Nat. Commun. 2018; 9:4689.
- 15. Lutz T., Czapinska H., Fomenkov A., Potapov V., Heiter D.F., Cao B., Dedon P., Bochtler M., Xu S.Y. Protein Domain Guided Screen for Sequence Specific and Phosphorothioate-Dependent Restriction Endonucleases. Front. Microbiol. 2020; 11:1960.
- 16. Yu H., Li J., Liu G., Zhao G., Wang Y., Hu W., Deng Z., Wu G., Gan J., Zhao Y.L. et al. DNA backbone interactions impact the sequence specificity of DNA sulfur-binding domains: revelations from structural analyses. Nucleic Acids Res. 2020; 48:8755–8766.
- 17. Hsu P.D., Lander E.S., Zhang F. Development and applications of CRISPR-Cas9 for genome engineering. Cell. 2014; 157:1262–1278.
- 18. Knott G.J., Doudna J.A. CRISPR-Cas guides the future of genetic engineering. Science. 2018; 361:866–869.
- 19. Vogel P., Stafforst T. Critical review on engineering deaminases for site-directed RNA editing. Curr. Opin. Biotechnol. 2019; 55:74–80.
- 20. Kim H., Kim J.S. A guide to genome engineering with programmable nucleases. Nat. Rev. Genet. 2014; 15:321–334.
- 21. Rauch S., He E., Srienc M., Zhou H., Zhang Z., Dickinson B.C. Programmable RNA-Guided RNA Effector Proteins Built from Human Parts. Cell. 2019; 178:122–134.
- 22. Wiedenheft B., Sternberg S.H., Doudna J.A. RNA-guided genetic silencing systems in bacteria and archaea. Nature. 2012; 482:331–338.
- 23. Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 2012; 337:816–821.
- 24. Urnov F.D., Rebar E.J., Holmes M.C., Zhang H.S., Gregory P.D. Genome editing with engineered zinc finger nucleases. Nat. Rev. Genet. 2010; 11:636–646.
- 25. Miller J.C., Tan S., Qiao G., Barlow K.A., Wang J., Xia D.F., Meng X., Paschon D.E., Leung E., Hinkley S.J. et al. A TALE nuclease architecture for efficient genome editing. Nat. Biotechnol. 2011; 29:143–148.
- 26. Montiel-Gonzalez M.F., Vallecillo-Viejo I., Yudowski G.A., Rosenthal J.J. Correction of mutations within the cystic fibrosis transmembrane conductance regulator by site-directed RNA editing. Proc. Natl. Acad. Sci. U.S.A. 2013; 110:18285–18290.
- 27. Katrekar D., Chen G., Meluzzi D., Ganesh A., Worlikar A., Shih Y.R., Varghese S., Mali P. In vivo RNA editing of point mutations via RNA-guided adenosine deaminases. Nat. Methods. 2019; 16:239–242.
- 28. Reautschnig P., Wahn N., Wettengel J., Schulz A.E., Latifi N., Vogel P., Kang T.W., Pfeiffer L.S., Zarges C., Naumann U. et al. CLUSTER guide RNAs enable precise and efficient RNA editing with endogenous ADAR enzymes in vivo. Nat. Biotechnol. 2022; 40:759–768.
- 29. Cox D.B.T., Gootenberg J.S., Abudayyeh O.O., Franklin B., Kellner M.J., Joung J., Zhang F. RNA editing with CRISPR-Cas13. Science. 2017; 358:1019–1027.
- 30. Guilinger J.P., Thompson D.B., Liu D.R. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat. Biotechnol. 2014; 32:577–582.
- 31. Tsai S.Q., Wyvekens N., Khayter C., Foden J.A., Thapar V., Reyon D., Goodwin M.J., Aryee M.J., Joung J.K. Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat. Biotechnol. 2014; 32:569–576.
- 32. Komor A.C., Kim Y.B., Packer M.S., Zuris J.A., Liu D.R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature. 2016; 533:420–424.
- 33. Gaudelli N.M., Komor A.C., Rees H.A., Packer M.S., Badran A.H., Bryson D.I., Liu D.R. Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage. Nature. 2017; 551:464–471.
- 34. Stafforst T., Schneider M.F. An RNA-deaminase conjugate selectively repairs point mutations. Angew. Chem. Int. Ed Engl. 2012; 51:11166–11169.
- 35. Tong T., Chen S., Wang L., Tang Y., Ryu J.Y., Jiang S., Wu X., Chen C., Luo J., Deng Z. et al. Occurrence, evolution, and functions of DNA phosphorothioate epigenetics in bacteria. Proc. Natl. Acad. Sci. U.S.A. 2018; 115:E2988–E2996.
- 36. Eckstein F. Phosphorothioates, essential components of therapeutic oligonucleotides. Nucleic Acid Ther. 2014; 24:374–387.
- 37. Edgar R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004; 32:1792–1797.
- 38. Price M.N., Dehal P.S., Arkin A.P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 2009; 26:1641–1650.
- 39. Letunic I., Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 2016; 44:W242–W245.
- 40. Lu S., Wang J., Chitsaz F., Derbyshire M.K., Geer R.C., Gonzales N.R., Gwadz M., Hurwitz D.I., Marchler G.H., Song J.S. et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 2020; 48:D265–D268.
- 41. Shen T., Wu J.X., Lan H.D., Zheng L.Z., Pei J.G., Wang S., Liu W., Huang J.Z. When homologous sequences meet structural decoys: accurate contact prediction by tFold in CASP14-(tFold for CASP14 contact prediction). Proteins. 2021; 89:1901–1910.
- 42. Steczkiewicz K., Muszewska A., Knizewski L., Rychlewski L., Ginalski K. Sequence, structure and functional diversity of PD-(D/E)XK phosphodiesterase superfamily. Nucleic Acids Res. 2012; 40:7016–7045.
- 43. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Zidek A., Potapenko A. et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596:583–589.
- 44. Qu L., Yi Z., Zhu S., Wang C., Cao Z., Zhou Z., Yuan P., Yu Y., Tian F., Liu Z. et al. Programmable RNA editing by recruiting endogenous ADAR using engineered RNAs. Nat. Biotechnol. 2019; 37:1059–1069.
- 45. Yang W., Fomenkov A., Heiter D., Xu S., Ettwiller L. High-throughput sequencing of EcoWI restriction fragments maps the genome-wide landscape of phosphorothioate modification at base resolution. PLoS Genet. 2022; 18:e1010389.
- 46. Shuai Y., Xu A., Li J., Han Z., Ma D., Duan H., Wang X., Jiang L., Zhang J., Tan G.-Y. et al. Profile and relaxation of sequence-specificity of DNA sulfur binding domains facilitate new nucleic acid detection platform. Science Bulletin. 2023; 68:1752–1756.
- 47. Shuai Y., Ju Y., Li Y., Ma D., Jiang L., Zhang J., Tan G.-Y., Liu X., Wang S., Zhang L. et al. A rapid nucleic acid detection platform based on phosphorothioate-DNA and sulfur binding domain. Synth. Syst. Biotechnol. 2023; 8:213–219.
- 48. Travers A.A. DNA-Protein Interactions. 1993; 1st edn. London; NY: Chapman & Hall.
- 49. Walton R.T., Christie K.A., Whittaker M.N., Kleinstiver B.P. Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science. 2020; 368:290–296.
- 50. Yi Z., Qu L., Tang H., Liu Z., Liu Y., Tian F., Wang C., Zhang X., Feng Z., Yu Y. et al. Engineered circular ADAR-recruiting RNAs increase the efficiency and fidelity of RNA editing in vitro and in vivo. Nat. Biotechnol. 2022; 40:946–955.
- 51. Zuris J.A., Thompson D.B., Shu Y., Guilinger J.P., Bessen J.L., Hu J.H., Maeder M.L., Joung J.K., Chen Z.Y., Liu D.R. Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo. Nat. Biotechnol. 2015; 33:73–80.