J Am Chem Soc. Author manuscript; available in PMC: 2025 Dec 6.
Published in final edited form as: J Am Chem Soc. 2024 Dec 3;146(51):35129–35138. doi: 10.1021/jacs.4c11043

Joining Natural and Synthetic DNA Using Biversal Nucleotides: Efficient Sequencing of Six-Nucleotide DNA

Bang Wang 1, Hyo-Joong Kim 2, Kevin M Bradley 3, Cen Chen 4, Chris McLendon 5, Zunyi Yang 6, Steven A Benner 7
PMCID: PMC12679606  NIHMSID: NIHMS2114117  PMID: 39625448

Abstract

By rearranging hydrogen bond donor and acceptor groups within a standard Watson–Crick geometry, DNA can add eight independently replicable nucleotides forming four additional pairs not found in standard Terran DNA. For many applications, the orthogonality of standard and nonstandard pairs offers a key advantage. However, other applications require standard and nonstandard nucleotides to communicate with each other. This is especially true when seeking to recruit high-throughput instruments (e.g., Illumina), designed to sequence standard 4-nucleotide DNA, to sequence DNA that includes added nucleotides. For this purpose, PCR workflows are needed to replace nonstandard nucleotides in (for example) a 6-letter DNA sequence by defined mixtures of the four standard nucleotides. High-throughput sequencing can then report the sequences of those mixtures to bioinformatic alignment tools, which infer the original 6-nucleotide sequence by analyzing the mixtures. Unfortunately, the intrinsic orthogonality of standard and nonstandard nucleotides often demands polymerases that violate pairing biophysics to do this replacement, leading to inefficiencies in this “transliteration” process. Thus, laboratory in vitro evolution (LIVE) using “anthropogenic evolvable genetic information systems” (AEGIS), an important “consumer” of new sequencing tools, has been slow to be democratized; robust sequencing is needed to identify the AEGISbodies and AEGISzymes that AEGIS-LIVE delivers. This work introduces a new way to connect synthetic and standard molecular biology: biversal nucleotides. In the example presented here, a pyrimidine analogue (pyridine-2-one, y) pairs with Watson–Crick geometry both to a nonstandard base (2-amino-8-imidazo-[1,2a]-1,3,5-triazin-[8H]-4-one, P, the Watson–Crick partner of 6-amino-5-nitro-[1H]-pyridin-2-one, Z) and to a base that completes the Watson–Crick hydrogen bonding pattern (2-amino-2′-deoxyadenosine, amA).
PCR amplification of GACTZP DNA with dyTP delivers products where Z:P pairs are cleanly transliterated to A:T pairs. In parallel, PCR of the same GACTZP sample at higher pH delivers products where Z:P pairs are cleanly transliterated to C:G pairs. By allowing robust sequencing of 6-letter GACTZP DNA, this workflow will help democratize AEGIS-LIVE. Further, other implementations of the biversal concept can enable communication across and between standard DNA and synthetic DNA more generally.

INTRODUCTION

Molecules that bind other molecules are central to research, diagnostics, and therapeutics. Small-molecule binders come via medicinal chemistry, which includes quantitative structure–activity analysis,1 distributed computing,2 and screening.3 For protein binders, antibodies are the go-to molecules, especially if they are improved by laboratory-directed evolution.4 Other protein scaffolds may replace antibody protein scaffolds.5

Unfortunately, even today, medicinal chemistry remains a “hard slog” as a route to binders. Commercial antibodies often give irreproducible outcomes, a problem that has been the subject of much discussion.6 Further, they are essentially impossible to multiplex in soluble form. More generally, peptide libraries are dominated by insoluble species,7 as amide backbone linkages combined with hydrophobic side chains create precipitates that form in preference to folded states.8

Binding and catalytic molecules might be easier to develop if protein scaffolds were replaced with nucleic acid (NA) scaffolds. These, unlike protein scaffolds, are intrinsically soluble, making them easier to evolve and easier to multiplex. Indeed, many have suggested that an early episode of life on Earth (the “RNA World”9) used RNA as the only encoded component of catalysts, and that these catalysts supported a complex metabolism.10 This model is consistent with the structures of RNA cofactors,11 the biosynthesis of proteins by catalytic rRNA,12 and examples of RNA catalysis in modern molecular biology.13 With less confidence, some have suggested that Terran life itself emerged via the abiotic generation of an RNA molecule able to catalyze template-directed polymerization of RNA.14

This raises a question: If early life evolved binding and catalytic NAs to support complex lifestyles, could not a sophisticated molecular biology laboratory do the same, perhaps even more efficiently? Thus, Gold,15 Szostak,16 Joyce,17 and others suggested that libraries of NA molecules might support laboratory in vitro evolution (LIVE), from which library components that bound a target receptor (aptamers) or catalyzed a target reaction (aptazymes) might be extracted and PCR amplified. Then, in a process analogous to lead development in medicinal chemistry or the maturation of primary antibodies, tighter binders and more efficient catalysts might be evolved by rounds of selection with mutation.

This approach had many successes.18 However, over time, it became clear that natural NAs have difficulty meeting the performance metrics of proteins as binders and catalysts. Natural NAs have just four building blocks, few functional groups, low information density, and poorly controlled folding. Indeed, in one case where an evolved DNAzyme was examined in detail, its limited catalytic power was found to be due in large part to folding ambiguity.19 Separately, experiments that reduced the number of nucleotides to fewer than four could still produce RNA catalysts, but these were much less efficient.20 Conversely, adding functional groups to four-nucleotide DNA gives better catalysts.21

Limited information density has also confounded efforts to add to NAs the functional groups thought to be important for binding and catalysis.22 With standard 4-letter DNA, a functional group must be carried by one of the four nucleotides that make up the two base pairs, and it therefore appears at every position where that nucleotide occurs. Since both pairs are needed to define a fold, that nucleotide cannot simply be avoided, and functional groups end up being present in greater numbers than needed: the “overdecoration problem”.23

For example, Hirao found that a single hydrophobic nucleotide added to an aptamer could improve a typical affinity (50 nM) to subpicomolar affinity.24 However, if a hydrophobic group is present on every thymine in an aptamer (for example, a SOMAmer25), the resulting molecule can have limited binding specificity and lack the robust solubility that is normally seen to be a strength of NA as a scaffold for molecular evolution. Thus, SOMAmers are used in immobilized form on arrays, where the pattern of interaction with many low-specificity SOMAmers gives a high-specificity readout of analytes that might be present.

Anticipating these issues in 1987,26 the Benner lab proposed an “anthropogenic expanded genetic information system” (AEGIS)27 that exploits all patterns of hydrogen bonding possible within a Watson–Crick pairing geometry; the four standard nucleotides in the two standard pairs exploit only two of these. Rearranging the hydrogen bond donor and acceptor groups adds eight independently replicable nucleotides, forming four additional independently replicable pairs beyond the two pairs of standard Terran DNA (Figure 1A). This completes the Watson–Crick base pairing concept28 that Terran prebiotic chemistry (and subsequent evolution) failed to complete on Earth.29 Functional groups could be added to a few nucleotides, with the others controlling folding, avoiding the overdecoration problem. The repeating backbone charge would allow the system to evolve without encountering insolubility problems.

Figure 1.

Chemical structures of standard and nonstandard nucleotides, which may interact with each other through proton gain and loss or by using biversal nucleotide analogues. (A) By exploiting all hydrogen bonding patterns within the Watson–Crick geometry in duplex DNA, the total number of DNA letters has been expanded to 12 in an “anthropogenic evolvable genetic information system (AEGIS)”. These hydrogen bonding patterns are designated by the letters C, G, T, A, K, X, S, B, Z, P, V, and J. (B) AEGIS Z and standard G form a pair joined by three hydrogen bonds with Watson–Crick geometry if Z is deprotonated. However, AEGIS Z cannot pair with standard A. (C) AEGIS P cannot interact with standard T or C by any geometry. (D) The biversal nucleotide y can form a pair joined by two hydrogen bonds with Watson–Crick geometry with both AEGIS P and amino-A, a functional analogue of standard adenine. Thus, y acts as an intermediary to allow A to communicate with AEGIS P.

AEGIS improves nucleic acid scaffolds, largely as anticipated. The higher information density of AEGIS DNA, its more rapid hybridization, and the orthogonality of the added pairs to the standard pairs have supported over $1 billion in diagnostics products.30 AEGIS-LIVE gives AEGISzymes31 and AEGISbodies32 that exploit higher information density to control folding.33 They also give new folds,34 including the recently discovered fZ-motif.35

AEGIS libraries also proved to be richer sources of binding affinity and catalytic power. LIVE has been able to take increasing advantage of this richness as the enzymes required to PCR amplify AEGIS DNA without losing nonstandard components have improved. Thus, a recent study estimated a 25-nucleotide, 6-letter GACTZP AEGIS library to be 100,000 times richer as a reservoir of ribonuclease-type catalysts than a standard 4-letter GACT library. In this case, Z in the AEGISzyme serves as a general acid-general base catalyst.31

Reinventing DNA required campaigns to synthesize AEGIS phosphoramidites and triphosphates to make AEGIS oligonucleotides.36 Campaigns of screening,37 protein engineering,38 and directed evolution39 were required to get polymerases that replicate AEGIS DNA and RNA.27,40 Other efforts were required to develop the structural biology and solution biophysics of AEGIS pairs.27,41

Of course, a reinvented DNA also requires sequencing tools, preferably those that exploit “next-generation sequencing” instruments that already benefit from enormous past investment to sequence 4-letter DNA. In the ideal approach, for example, Z:P pairs would, during PCR amplification, be transliterated in equal amounts to C:G and T:A pairs. The amplicon products would then be deep sequenced, and the reads aligned by informatics to match those that arise from a single amplicon (but with Z:P transliterated to C:G) with those that arise from the same amplicon (but with Z:P transliterated to T:A). Sites containing C and T in the alignment would be back-inferred to be “Z” in the ancestral amplicon; sites containing G and A would be called “P” in that ancestor.
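The back-inference step just described can be illustrated with a short Python sketch. It assumes idealized, complete transliteration and two reads from the same strand that are already aligned base by base without indels; the function name `back_infer` and the use of `N` for unexplained discordances are illustrative choices, not part of a published BNA-Seq pipeline.

```python
def back_infer(read_cg: str, read_ta: str) -> str:
    """Back-infer an ancestral GACTZP sequence from two aligned, transliterated reads.

    read_cg: read from the amplicon in which Z:P was transliterated to C:G.
    read_ta: read from the amplicon in which Z:P was transliterated to T:A.
    Both reads are assumed to come from the same strand and to be gap-free alignments.
    """
    if len(read_cg) != len(read_ta):
        raise ValueError("reads must be aligned to the same length")
    calls = []
    for a, b in zip(read_cg, read_ta):
        if a == b:
            calls.append(a)    # standard base: identical in both workflows
        elif (a, b) == ("C", "T"):
            calls.append("Z")  # C in one workflow, T in the other -> ancestral Z
        elif (a, b) == ("G", "A"):
            calls.append("P")  # G in one workflow, A in the other -> ancestral P
        else:
            calls.append("N")  # discordance not explained by Z or P (e.g., an error)
    return "".join(calls)


# Example with the P-1 template (Table 1): simulate both ideal transliterations.
P1 = "TAAGATGAGAGTTGAGGAGAGTTAAGGPCCCAACAGTCGATTTPAATATAGTAGTGTAAGTAGATAGTGGA"
read_cg = P1.replace("P", "G")
read_ta = P1.replace("P", "A")
print(back_infer(read_cg, read_ta) == P1)  # True
```

Real NGS data would additionally require read alignment, quality filtering, and a consensus over many reads before a per-site call like this could be made.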

Robustly transliterating Z:P pairs to C:G pairs proved to be easy.42 This transliteration exploited a pair having standard Watson–Crick geometry joined by three hydrogen bonds between G and deprotonated Z (Figure 1B, top). This was possible because Z has a relatively low pKa (approximately 7.8).42
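Because Z pairs with G only when deprotonated, the fraction of Z in that form at a given pH follows the Henderson-Hasselbalch relation. The Python sketch below uses the approximate pKa of 7.8 quoted above; the pH values in the loop are illustrative and are not the buffer conditions used in this work.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a weak acid present as its conjugate base."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


Z_PKA = 7.8  # approximate pKa of Z quoted in the text

for ph in (7.0, 7.8, 8.5, 9.0):  # illustrative pH values only
    frac = fraction_deprotonated(Z_PKA, ph)
    print(f"pH {ph:.1f}: {frac:.0%} of Z deprotonated")
```

At pH = pKa, half of Z is deprotonated, and each additional pH unit increases the ratio of deprotonated to protonated Z tenfold, consistent with running the C:G transliteration PCR at higher pH. The same function gives a protonated fraction as `1 - fraction_deprotonated(pka, ph)` for a basic heterocycle. Both pKa and buffer pH shift with temperature, so these are room-temperature estimates only.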

Unfortunately, parallel conditions to robustly transliterate Z:P pairs to T:A pairs were difficult to find. This transliteration would require either a P:T mismatch or a Z:A mismatch. A P:T mismatch requires a C=O···O=C clash (Figure 1C, top). A Z:A mismatch requires a noncanonical “wobble” geometry joined by a single hydrogen bond (Figure 1B, bottom). Thus, although forcing this transliteration by depriving a PCR system of dZTP and dPTP allowed expert laboratories to sequence AEGISzymes31 and AEGISbodies,32 the workflow is too complex to democratize AEGIS-LIVE.

Therefore, an alternative sequencing approach was developed43 that exploits enzymatic deamination of natural cytidine (C), transliterating it to uridine (U). As before, during PCR amplification, AEGIS Z is transliterated to cytosine (C) through robust pairing of standard G with deprotonated Z. Bioinformatic alignment with the direct PCR products was then used to call the bases in the parent sequences. Since all of the pre-existing C’s had been transliterated to “T”, the sequences could be deconvoluted by comparing the sequences obtained with and without deamination.

Unfortunately, the deamination process gave A/T-rich amplicon products, which yielded poor-quality high-throughput sequencing reads (for details, see Supporting Figure 1).
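The composition problem can be illustrated by simulating complete deamination of one strand. This Python sketch assumes 100% conversion of C to U (copied and read as T) and uses the Nat sequence from Table 1 only as a convenient input; real amplicons and conversion efficiencies will differ.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the bases of one strand."""
    return sum(base in "GC" for base in seq) / len(seq)


def deaminate_all_c(seq: str) -> str:
    """Assume complete deamination: every C becomes U, which PCR copies and reads as T."""
    return seq.replace("C", "T")


NAT = "TAAGATGAGAGTTGAGGAGAGTTAAGGGCCCAACAGTCGATTTAAATATAGTAGTGTAAGTAGATAGTGGA"
before = gc_fraction(NAT)
after = gc_fraction(deaminate_all_c(NAT))
print(f"GC content before: {before:.1%}, after: {after:.1%}")
```

After conversion, the top strand contains only A, G, and T, so its GC content equals its G fraction, and the complementary strand contains no G; such low-complexity, A/T-rich libraries are generally harder to sequence well.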

These outcomes motivated a search for a new way to transliterate robustly and, in particular, to transliterate Z:P pairs to T:A pairs. The new idea involves “biversal nucleotides”. Biversal nucleotides are nucleobase analogues that form Watson–Crick pairs with two size-complementary nucleobases without being deprotonated or protonated and without tautomerism, relying instead only on strategic manipulation of hydrogen bonding groups.44

Here, we exemplify this general idea by robustly solving the six-letter GACTZP sequencing problem. This required a biversal pyrimidine analogue that could pair both with AEGIS P and standard A to allow robust transliteration of Z:P pairs to T:A pairs.

To construct this biversal analogue, we began with a C-glycosidic analogue of thymidine first synthesized by Solomon and Hopkins45 and studied by Ishikawa et al.46 This is a C3-pyridine-2-one C glycoside (y, Figure 1D) that is missing a “top” exocyclic hydrogen bonding group. Thus, y cannot form the “top” hydrogen bond to the N6 amino group of adenine. Further, since natural adenosine is missing a minor groove “bottom” hydrogen bonding amino group, the y:A pair is joined by only one hydrogen bond. This is insufficient to compete with the hydrogen bonds between the nucleobases and solvent water.

However, if adenine is replaced by 2-aminoadenine, the y:amA pair is joined by two hydrogen bonds, which can compete with hydrogen bonding to water. Further, y forms a pair with Watson–Crick geometry with P, also joined by two hydrogen bonds (Figure 1D). Thus, y is a “biversal” nucleobase: it pairs through two hydrogen bonds in a Watson–Crick geometry with two different purine analogues. As one of these (amA) is read as standard A and the other (P) is nonstandard, y might allow the two DNAs to communicate with each other.

This led to experiments that applied the biversal concept to sequencing 6-letter GACTZP DNA, with controlled transliteration connecting AEGIS DNA to standard DNA. We transliterate Z:P pairs to T:A pairs during PCR amplification with added dyTP, damATP, and (optionally) dPTP. In GACTZP DNA, template P first pairs with y in the absence of dZTP (Figure 1D, top). In the second step, y (now in a template) pairs with damATP (Figure 1D, bottom), directing amA incorporation into the complementary strand. Then amA, now in the template, pairs with standard dTTP, and the resulting T in turn templates the incorporation of standard dATP. These events, all in one pot, transliterate Z:P pairs to T:amA pairs, which are then sequenced as T:A pairs. The overall result is biversal nucleotide-assisted sequencing (BNA-Seq) for GACTZP AEGIS DNA.
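The copying sequence above can be modeled as a small rule system. In this Python sketch, each base is represented by a simplified hydrogen-bonding pattern read from the major groove (top) to the minor groove (bottom); a pair is allowed only between a purine and a pyrimidine with no donor/donor or acceptor/acceptor clash and at least two hydrogen bonds. The pattern assignments, the clash rule, and the "strongest partner wins" choice in `trace` are simplifying assumptions, not measured properties or the authors' model, and deprotonation of Z is not modeled.

```python
# Simplified hydrogen-bonding patterns, top (major groove) -> bottom (minor groove).
# D = donor, A = acceptor, "-" = no group. These assignments are assumptions.
PATTERNS = {
    "T": "ADA", "C": "DAA", "y": "-DA",                  # pyrimidines
    "A": "DA-", "amA": "DAD", "G": "ADD", "P": "AAD",   # purines
}
PYRIMIDINES = {"T", "C", "y"}


def hbonds(pyr: str, pur: str):
    """Count donor:acceptor contacts; return None on any D:D or A:A clash."""
    count = 0
    for p, q in zip(PATTERNS[pyr], PATTERNS[pur]):
        if p == "-" or q == "-":
            continue          # missing group: no bond, no clash
        if p == q:
            return None       # D:D or A:A clash disfavors Watson-Crick pairing
        count += 1
    return count


def partners(template: str, triphosphates, min_bonds: int = 2):
    """Triphosphates that can pair with a template base, strongest first.

    Requires purine:pyrimidine size complementarity and at least `min_bonds`
    hydrogen bonds (a single bond cannot compete with water).
    """
    scored = []
    for t in triphosphates:
        if (t in PYRIMIDINES) == (template in PYRIMIDINES):
            continue          # same size class: not size-complementary
        pyr, pur = (t, template) if t in PYRIMIDINES else (template, t)
        n = hbonds(pyr, pur)
        if n is not None and n >= min_bonds:
            scored.append((n, t))
    return [t for _, t in sorted(scored, reverse=True)]


def trace(start: str, triphosphates, cycles: int):
    """Follow one position through successive copying steps, taking the strongest partner."""
    path = [start]
    for _ in range(cycles):
        options = partners(path[-1], triphosphates)
        if not options:
            break
        path.append(options[0])
    return path


CONDITION_2 = {"A", "T", "C", "G", "y", "amA"}
STANDARD = {"A", "T", "C", "G"}

print(trace("P", CONDITION_2, 3))   # ['P', 'y', 'amA', 'T']
print(partners("P", STANDARD))      # [] : no allowed partner among standard dNTPs
print(partners("y", {"A", "amA"}))  # ['amA'] : y:A has only one hydrogen bond
```

Because `trace` always takes the partner with the most hydrogen bonds, it does not model competition between dTTP and dyTP opposite template amA (both are allowed by these rules); in either case, amA itself is read as A by the sequencer.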

RESULTS

Synthesis of dyTP.

Solomon and Hopkins45 delivered a stereocontrolled synthesis of dyTP from the acetonide of D-glyceraldehyde in a process that included a two-carbon homologation with diallyl zinc followed by reaction with a lithiated fluoropyridine. To implement BNA-Seq in a democratizable form, we required a simpler route. This started with a glycal derived from thymidine (Figure 2). Palladium-catalyzed coupling of the glycal with an O-protected iodopyridinol gave the protected nucleoside analogue. This was deprotected by oxidation of the sulfide, followed by the formation of a triphosphate and base-catalyzed deprotection. The analytical data (NMR, HRMS spectra) are summarized in the Supporting Information.

Figure 2.

New synthesis of dyTP. Reaction conditions: (i) 2-(phenylthio)ethanol, Ph3P, DEAD, THF, RT, 30 min, 56%. (ii) (a) Pd(OAc)2, AsPh3, Ag2CO3, CHCl3, 70 °C, overnight; (b) Et3N·3HF, THF, RT, 30 min; (c) NaBH(OAc)3, CH3CN/AcOH, RT, 30 min; 45% (total yield). (iii) H2O2, AcOH, 50 °C, 3 h, 80%. (iv) (a) DMTrCl, pyridine, DMF, RT, 1 h; (b) Ac2O, pyridine, DMAP, DCM, RT, 30 min; (c) DCA (3%), DCM, RT, 2 h; 68% (total yield). (v) (a) pyridine, CLOP, dioxane, RT, 15 min; (b) pyrophosphate/tributylamine in DMF, RT, 20 min; (c) iodine, water, pyridine, RT, 30 min; (d) NH4OH, RT, overnight; 10% (total yield).

Enzymology.

In BNA-Seq, the biversal nucleotide dyTP is incorporated opposite template P during template-directed polymerization. Here, the y:P pair is joined by two hydrogen bonds with a Watson–Crick geometry (Figure 3A). 2-Aminoadenine (amA) triphosphate then operates in the next copying step to complete the transliteration: amA pairs with y, also via two hydrogen bonds with a Watson–Crick geometry, and, in the next cycle, with T via three hydrogen bonds. Thus, in one amplification, Z:P pairs are transliterated to T:A pairs. To implement this application of the biversal concept, the DNA molecules in Table 1 were made by solid-phase synthesis and used with primers in PCR amplifications. The “Nat” sequence includes two restriction sites, cut by PspOMI (GGGCCC) and DraI (TTTAAA). In contrast, the AEGIS sequence “P-1” is designed to contain P within each recognition sequence (GGPCCC and TTTPAA), which the restriction enzymes do not cut. Thus, if P is transliterated into G or A during PCR, the PspOMI or (respectively) DraI site is formed (see Supporting Figure 2). The desired transliteration can then be monitored by restriction digestion.
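The restriction readout can be checked in silico. This Python sketch substitutes P in P-1 (Table 1) with G or A and tests for each recognition sequence; it is a sequence-level check only, assuming complete transliteration, and does not model digestion itself.

```python
PSPOMI_SITE = "GGGCCC"  # PspOMI recognition sequence
DRAI_SITE = "TTTAAA"    # DraI recognition sequence

P1 = "TAAGATGAGAGTTGAGGAGAGTTAAGGPCCCAACAGTCGATTTPAATATAGTAGTGTAAGTAGATAGTGGA"


def cuttable_sites(seq: str) -> dict:
    """Report which recognition sequences occur in a sequence.

    Both sites are palindromic, so searching one strand is enough. This checks
    sequence only; it does not model digestion efficiency or partial digestion.
    """
    return {"PspOMI": PSPOMI_SITE in seq, "DraI": DRAI_SITE in seq}


for label, p_to in (("P -> G (Condition 1)", "G"), ("P -> A (Condition 2)", "A")):
    print(label, cuttable_sites(P1.replace("P", p_to)))
print("untransliterated P-1", cuttable_sites(P1))
```

Under these assumptions, P to G creates only the PspOMI site and P to A creates only the DraI site, while the untransliterated template contains neither, which is the pattern the gel assay is designed to detect.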

Figure 3.

Biversal nucleotide-assisted sequencing of GACTZP DNA through transliteration. (A) Z:P pairs were transliterated to C:G pairs via PCR 1 amplification in the presence of five triphosphates d(A, T, C, G, Z)TP. This exploits a Watson–Crick pair between G and deprotonated Z. Transliteration of Z:P pairs to T:A pairs via PCR 2 amplification with six triphosphates d(A, T, C, G, y, amA)TP. (B) PAGE-urea analysis of the PCR products from DNA templates (Nat or P-1) under various PCR conditions, followed by digestion with restriction endonuclease specific for the natural sequence after specific transliterations. (C) Sanger sequencing is used to illustrate the sequencing of the P-1 template across a range of PCR conditions. (D) Next-generation sequencing (NGS) to measure the transliteration ratio of P to A or G under various PCR conditions.

Table 1.

DNA Sequences Used in This Study

name sequence (5′–3′)

Nat TAAGATGAGAGTTGAGGAGAGTTAAGGGCCCAACAGTCGATTTAAATATAGTAGTGTAAGTAGATAGTGGA
P-1 TAAGATGAGAGTTGAGGAGAGTTAAGGPCCCAACAGTCGATTTPAATATAGTAGTGTAAGTAGATAGTGGA
ZZ TAAGATGAGAGTTGAGGAGAGTTATCCAAGZTATAGGGCZZTTCAGTATAGTAGTGTAAGTAGATAGTGGA
ZP-1 TAAGATGAGAGTTGAGGAGAGTTACGTGZACGCPTPGTCAZCACAGTATAGTAGTGTAAGTAGATAGTGGA
ZP-2 TAAGATGAGAGTTGAGGAGAGTTATCAPCGTAGCAZPCTTPTZATGTATAGTAGTGTAAGTAGATAGTGGA
Z-Ran TAAGATGAGAGTTGAGGAGAGTTATNNNZNNNGTATAGTAGTGTAAGTAGATAGTGGA
P-Ran TAAGATGAGAGTTGAGGAGAGTTATNNNPNNNGTATAGTAGTGTAAGTAGATAGTGGA

The Nat DNA template was amplified by PCR with the four standard triphosphates d(A, T, C, and G)TP. In parallel, the P-1 DNA template was PCR amplified under three different conditions.

PCR Condition (1): dATP, dTTP, dCTP, dGTP, and dZTP, to force P to be transliterated to G via a deprotonated Z:G match with Watson–Crick geometry (Figure 1B, top).

PCR Condition (2): dATP, dTTP, dCTP, dGTP, dyTP, and damATP, to force P to be transliterated to A via y:P and y:amA matches, both again with Watson–Crick geometry (Figure 1D).

Regular PCR: the four standard triphosphates dATP, dTTP, dCTP, and dGTP. P is transliterated to a G/A mixture via P:C and P:T mismatches, which are noncanonical and less robust (Figure 1C).

Following the initial amplification, products were diluted 2000-fold and subjected to a second PCR under standard conditions. The PCR products were then digested with restriction endonucleases and analyzed by denaturing urea polyacrylamide gel electrophoresis.

In the first group, PCR amplicons derived from the Nat template by standard 4-triphosphate PCR were digested by PspOMI and DraI, giving a distinct short band on the urea-PAGE gel (Lanes 2 and 3, Figure 3B). These data serve as a positive control, showing that the restriction enzyme strategy can be used to analyze the products.

In the second group, the AEGIS template (P-1) was used under conditions where P might be transliterated to A. Reducing the concentrations of dCTP and dGTP (to 0.1 mM) while maintaining those of dTTP and dATP (0.2 mM) did not significantly increase the transliteration of P to A. However, when dyTP (0.1 mM) and damATP (0.1 mM) were added to the PCR mixture, P was efficiently transliterated to A: 86.6% at position 28 (P-28) and 89.9% at position 44 (P-44), as quantified by NGS (Supporting Figure 5). This is consistent with P first pairing with y, y then pairing with amA, amA then pairing with T, and T then pairing with A.
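Per-position transliteration ratios such as those for P-28 and P-44 are, in essence, base counts at one column of a read alignment. The Python sketch below assumes reads that are already aligned, trimmed to the template length, and in template orientation; the toy reads are hypothetical and are not the study's data.

```python
from collections import Counter


def base_fractions(reads, position: int) -> dict:
    """Fraction of each base at a 1-based position across aligned, equal-length reads."""
    counts = Counter(read[position - 1] for read in reads)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


P1 = "TAAGATGAGAGTTGAGGAGAGTTAAGGPCCCAACAGTCGATTTPAATATAGTAGTGTAAGTAGATAGTGGA"
assert P1[27] == "P" and P1[43] == "P"  # P-28 and P-44 in 1-based numbering

# Hypothetical toy data: 9 reads with P -> A and 1 read with P -> G.
reads = [P1.replace("P", "A")] * 9 + [P1.replace("P", "G")]
for pos in (28, 44):
    print(pos, base_fractions(reads, pos))
```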

The concentration of dyTP in the PCR was then systematically varied (0.1–0.5 mM); results are presented in Supporting Figure 6. At 0.1 mM dyTP, 86.6% of the initial P was transliterated to A, and this fraction increased with increasing dyTP. Raising dyTP above 0.3 mM did not push the transliteration yield above 96%. Separately, damATP concentrations above 0.2 mM reduced the efficacy of the PCR.

Optimized PCR Condition 2 used dATP (0.2 mM), dTTP (0.2 mM), dCTP (0.1 mM), dGTP (0.1 mM), dyTP (0.3 mM), and damATP (0.1 mM). Under these conditions, transliteration of P to A reached acceptable levels: 94.7% at P-28 and 96.3% at P-44. The gel-based restriction enzyme strategy also showed these high levels of transliteration: amplicons from template P-1 under optimized Condition 2, with six triphosphates d(A, T, C, G, y, amA)TP, resisted digestion by PspOMI (Figure 3B, Lane 5) but were digested by DraI (Figure 3B, Lane 6). The amplicon was also sequenced by Sanger sequencing (Figure 3C).

Analogous success was achieved when amplifying the AEGIS template (P-1) under PCR Condition 1 (third group), with five triphosphates d(A, T, C, G, Z)TP. The products were digested by PspOMI (GGGCCC) (Figure 3B, Lane 8) but not by DraI (TTTAAA) (Figure 3B, Lane 9). These results show that the template P nucleotide was transliterated to G under Condition 1, as expected from previous reports.43 Sanger sequencing confirmed the sequence of the amplicons (Figure 3C), and NGS quantitated the ratio of transliteration products: P-28 was transliterated to G (97.7%) and P-44 to G (99.3%) (Figure 3D).

In the fourth set of experiments, the P-1 template was used under standard PCR conditions with the four standard triphosphates: d(A, T, C, G)TP. The amplicons were only partly digested by PspOMI and DraI (lanes 11 and 12, Figure 3B). This outcome suggests that the P nucleotide in a template underwent transliteration to an A/G mixture with standard PCR. This was confirmed by Sanger sequencing (Figure 3C); NGS quantitated the transliteration ratios. P at position 28 was transliterated to A (57.3%) and G (42%), and P at position 44 was transliterated to A (59.1%) and G (40.2%) (Figure 3D).

We then examined the fate of Z in the template. Here, no standard nucleotide forms a match with Z in Watson–Crick geometry. Thus, dPTP was added at different concentrations (0.1–0.5 mM) so that, in the first copying cycle, dPTP pairs with template Z. A second template with two consecutive Z’s showed that a dPTP concentration of 0.5 mM gave the best transliteration (~85%). Results from Sanger sequencing and restriction enzyme digestion are summarized in Supporting Figures 7 and 8.

Sequencing AEGIS DNA Containing both Z and P.

A full sequencing workflow was then developed to sequence AEGIS DNA molecules that contain both Z and P. Under PCR Condition 1, the Z:P pairs of the ZP-1 and ZP-2 templates are transliterated to C:G, always through pairs with Watson–Crick geometry. The same is true for PCR Condition 2, which transliterates Z:P to T:A via steps that all involve pairs with Watson–Crick geometry (Figure 4A). The resulting transliterated amplicons were analyzed separately by Sanger sequencing, and bioinformatics reliably back-inferred the original sequences (Figure 4C,D). NGS also quantified the results, which are displayed as sequence logos (Figure 4E,F). The transliteration ratios of all bases (excluding the primer regions) in the P-1, ZP-1, and ZP-2 templates after PCR 1 or PCR 2 are summarized in Figure 4B. Standard bases (A, T, C, G) were called with >99% accuracy, showing that the added nonstandard triphosphates had no impact on those calls.
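The expected Sanger reads in Figure 4C,D follow from applying each condition's substitution to the template. This Python sketch assumes complete transliteration and reports the template-strand sequence (reads of the complementary strand would show the complement); the dictionary and function names are illustrative.

```python
TRANSLITERATION = {
    "Condition 1": str.maketrans({"Z": "C", "P": "G"}),  # Z:P -> C:G
    "Condition 2": str.maketrans({"Z": "T", "P": "A"}),  # Z:P -> T:A
}

TEMPLATES = {  # sequences from Table 1
    "ZP-1": "TAAGATGAGAGTTGAGGAGAGTTACGTGZACGCPTPGTCAZCACAGTATAGTAGTGTAAGTAGATAGTGGA",
    "ZP-2": "TAAGATGAGAGTTGAGGAGAGTTATCAPCGTAGCAZPCTTPTZATGTATAGTAGTGTAAGTAGATAGTGGA",
}


def expected_amplicon(template: str, condition: str) -> str:
    """Ideal (100%) transliteration of one strand under the given condition."""
    return template.translate(TRANSLITERATION[condition])


def differing_positions(a: str, b: str) -> list:
    """1-based positions where two equal-length reads disagree."""
    return [i for i, (p, q) in enumerate(zip(a, b), start=1) if p != q]


for name, seq in TEMPLATES.items():
    read_1 = expected_amplicon(seq, "Condition 1")
    read_2 = expected_amplicon(seq, "Condition 2")
    print(name, differing_positions(read_1, read_2))
```

The positions where the two expected amplicons differ are exactly the original Z and P sites, which is what the back-inference relies on.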

Figure 4.

GACTZP DNA sequencing through transliteration. (A) Z:P transliteration to C:G under PCR Condition 1 in the presence of five triphosphates d(A, T, C, G, Z)TP; Z:P transliteration to T:A under Condition 2 in the presence of seven triphosphates d(A, T, C, G, P, y, amA)TP. (B) NGS data quantitating the transliteration (%) of original bases [A, T, C, G, Z, and P] in P-1, ZP-1, and ZP-2 sequences under Conditions 1 and 2. Standard deviations are taken over the multiple occurrences of each base in all sequences. (C, D) Sanger sequencing showing transliteration amplicons of ZP-1 and ZP-2 templates under Conditions 1 and 2. (E, F) Sequence logos showing next-generation sequencing (NGS) quantitation of transliterated amplicons of ZP-1 and ZP-2 templates under Conditions 1 and 2. (G, H) Sanger sequencing and NGS of -NNNZNNN- and -NNNPNNN- sequences under PCR Condition 1 or 2 show that the transliteration outcomes do not depend materially on the three preceding or three trailing nucleotides.

Sequence Context.

We then asked whether the sequencing results obtained using the biversal nucleotide analogue y were affected by neighboring sequences. Here, libraries having all four standard nucleotides at each of the three positions preceding and following an AEGIS Z or P (-NNNZNNN- and -NNNPNNN-, 4096 combinations each) were analyzed by Sanger sequencing and NGS. No obvious context bias in sequence calls was seen (Figure 4G,H). The detailed sequencing analytics are summarized in the Supporting Information.
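The size of each randomized library follows from three N positions on each side of the AEGIS base: 4^6 = 4096 flanking contexts. The Python sketch below enumerates them and shows one hypothetical way to tabulate, per flank, the fraction of reads carrying the expected transliterated base; the function names and toy reads are illustrative assumptions, not the analysis actually used.

```python
from collections import defaultdict
from itertools import product


def all_contexts(center: str) -> list:
    """Every -NNN<center>NNN- heptamer with standard bases at the six flanking positions."""
    return ["".join(left) + center + "".join(right)
            for left in product("ACGT", repeat=3)
            for right in product("ACGT", repeat=3)]


def fraction_by_context(reads, center_pos: int, expected: str) -> dict:
    """Per-flank fraction of reads carrying `expected` at the former Z/P position.

    reads: aligned reads of the randomized template; center_pos is 1-based.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    i = center_pos - 1
    for read in reads:
        flank = read[i - 3:i] + read[i + 1:i + 4]  # three bases on each side
        totals[flank] += 1
        hits[flank] += read[i] == expected
    return {flank: hits[flank] / totals[flank] for flank in totals}


print(len(all_contexts("Z")))

Z_RAN = "TAAGATGAGAGTTGAGGAGAGTTATNNNZNNNGTATAGTAGTGTAAGTAGATAGTGGA"
Z_POS = Z_RAN.index("Z") + 1  # 1-based position of Z (29)
# Two hypothetical reads sharing one flank, one transliterated to T and one to C.
reads = [Z_RAN.replace("NNNZNNN", "ACGTTTT"), Z_RAN.replace("NNNZNNN", "ACGCTTT")]
print(fraction_by_context(reads, Z_POS, "T"))
```

With real data, a context bias would show up as flanks whose fractions fall well outside the spread of the others.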

Applying BNA-Seq to a Mixture of DNA Molecules in a Pool That Might Arise from an AEGIS-LIVE Evolution Experiment.

In democratized AEGIS-LIVE, users will primarily use BNA-Seq to deconvolute sequences from survivor pools having some degree of sequence diversity. To demonstrate the value of BNA-Seq in this application context, we prepared a mixture of DNA molecules with natural and added Z and P nucleotides. This mixture included five types of DNA molecules in a 1:1:1:1:1 ratio: Nat DNA (only ATCG nucleotides), ZZ DNA (ATCGZ nucleotides), P-1 DNA (ATCGP nucleotides), and ZP-1 and ZP-2 DNA (ATCGZP nucleotides).

To test the sensitivity of this method, we prepared a 10-fold dilution series from 1 to 0.001 pM and submitted each sample to BNA-Seq (Figure 5). All five templates were successfully read, even at a DNA concentration of 0.001 pM. The Nat, ZZ, and ZP-2 templates were each sequenced at approximately 20% of reads, corresponding to the initial preparation ratio. However, the P-1 template was read at a lower fraction (7%), while the ZP-1 template was read at a higher fraction (30%) than the initial 20%. This discrepancy may result from PCR efficiency bias, which is common in PCR generally.
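To put the 0.001 pM result in perspective, the concentrations can be converted into molecule counts. The reaction volume is not given here, so the Python sketch below assumes a hypothetical 1 µL input; it also expresses the reported read fractions relative to the 20% expected from a 1:1:1:1:1 mixture.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules(conc_pM: float, volume_uL: float) -> float:
    """Number of molecules in `volume_uL` microliters at `conc_pM` picomolar."""
    return conc_pM * 1e-12 * volume_uL * 1e-6 * AVOGADRO


VOLUME_UL = 1.0  # hypothetical input volume; not stated in the text
for conc in (1, 0.1, 0.01, 0.001):  # 10-fold dilution series, in pM
    total = molecules(conc, VOLUME_UL)
    print(f"{conc} pM: ~{total:.0f} molecules in total, ~{total / 5:.0f} per template")

# Read fractions quoted in the text versus the 1:1:1:1:1 input (20% each).
observed = {"Nat": 0.20, "ZZ": 0.20, "ZP-2": 0.20, "P-1": 0.07, "ZP-1": 0.30}
expected = 1 / len(observed)
for name, frac in observed.items():
    print(f"{name}: {frac / expected:.2f}x of expected")
```

Under the assumed 1 µL volume, 0.001 pM corresponds to roughly 600 molecules in total, or about 120 of each template; P-1 is then read at 0.35x and ZP-1 at 1.5x of the expected share.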

Figure 5.

Application and sequencing of AEGIS DNA mixtures. Samples from a 10-fold dilution series were sequenced using BNA-Seq.

Steady-State Kinetic Assays.

To compare quantitatively the efficiencies of nucleotide incorporation opposite template Z or P during transliteration PCR, steady-state kinetic assays were performed with TaKaRa Taq polymerase, following the literature.47 Efficiency is expressed as the pseudo-second-order rate constant Vmax/Km (%·min−1·μM−1).

In this primer extension assay, single-nucleotide incorporation (n + 1 product formation) was followed at final dNTP concentrations ranging from 0.0013 to 100 μM, and the kinetic parameters Km and Vmax were calculated. The insertion efficiency of dCTP opposite dG was 478%·min−1·μM−1. The insertion efficiencies of dZTP and dyTP opposite dP were 175 and 1.26%·min−1·μM−1, respectively, whereas no incorporation of dTTP or dCTP opposite dP was detected under the same conditions (Supporting Table 4). The insertion efficiencies of dPTP and dGTP opposite dZ were 810 and 152%·min−1·μM−1, respectively, whereas no incorporation of dATP or damATP opposite dZ was detected under the same conditions.
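The efficiencies above are Vmax/Km values from steady-state fits. The text does not state which fitting procedure was used; the Python sketch below uses a Hanes-Woolf linearization only because it needs nothing beyond the standard library (nonlinear least-squares fitting is usually preferred for noisy data). It is checked on synthetic, noise-free rates, and the last lines compute the fold preferences implied by the reported numbers.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hanes_woolf(substrate, rates):
    """Estimate (Km, Vmax, Vmax/Km) by regressing [S]/v on [S].

    [S]/v = [S]/Vmax + Km/Vmax, so Vmax = 1/slope, Km = intercept/slope,
    and the efficiency Vmax/Km = 1/intercept.
    """
    slope, intercept = linear_fit(substrate, [s / v for s, v in zip(substrate, rates)])
    vmax = 1.0 / slope
    km = intercept / slope
    return km, vmax, vmax / km


# Synthetic, noise-free check: Km = 2 uM and Vmax = 50 %/min should give Vmax/Km = 25.
conc_uM = [0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
rates = [50.0 * s / (2.0 + s) for s in conc_uM]
print(hanes_woolf(conc_uM, rates))

# Fold preferences implied by the reported efficiencies (%·min−1·μM−1).
print("dZTP vs dyTP opposite dP:", round(175 / 1.26))    # 139
print("dPTP vs dGTP opposite dZ:", round(810 / 152, 1))  # 5.3
```

By these reported values, dZTP is preferred over dyTP opposite P by about 139-fold, which is why dZTP must be absent from the T:A transliteration PCR.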

DISCUSSION

Several general themes have emerged over the past three decades, as researchers have sought to expand the DNA alphabet. The first, represented by AEGIS, retains both the size complementarity and hydrogen bonding complementarity of the standard Watson–Crick pairs. The second, represented by work from Kool,48 Hirao,49 and Romesberg,50 dispenses with hydrogen bonding, relying only on size complementarity to ensure faithful replication. The third, represented by “fat” and “skinny” pairs and notably developed in the Kool laboratory to give an 8-letter DNA alphabet, retains hydrogen bonding complementarity but dispenses with size complementarity.51

Each theme has its strengths. Indeed, pairs of the second type have been tested in vivo in semisynthetic organisms.52 However, lacking hydrogen bonds, these systems face challenges in evolution. Likewise, systems of the third type do not easily interact with DNA polymerases.

This work with biversals shows how interbase hydrogen bonding can be manipulated to solve a specific problem with the first theme: letting existing high-throughput sequencing platforms sequence GACTZP DNA. However, the concept can be generalized.

Central to the biversal concept is the observation that unless a nucleobase forms at least two hydrogen bonds to its partner, hydrogen bonding to solvent can compete effectively with interbase hydrogen bonding.53 This constrains the design of biversals. Within these constraints, Figure 6 applies the same concept across the 12-letter DNA alphabet. Thus, careful manipulation of hydrogen bonding groups on pyrimidine analogues gives biversals that should allow G/X (pyDA-), J/A (pyAD-), P/A (py-DA), and B/X (py-AD) interconversion (D represents a hydrogen bond donor group, A a hydrogen bond acceptor group, and “-” a deleted group; patterns are read from the top, major groove position to the bottom, minor groove position). As no purine analogue presents a DDD or an AAA hydrogen bonding pattern, the py-DD and py-AA heterocycles are not biversal.

Figure 6.

Chemical structures of biversal nucleotide analogues that allow various standard and nonstandard nucleotides to communicate. (A) Pyrimidine biversal nucleotides pair with purine nucleotides. (B) Purine biversal nucleotides pair with pyrimidine nucleotides.

Likewise, careful manipulation of hydrogen bonding groups on purine analogues gives biversals that should allow C/K, S/K, T/Z, and T/V interconversion. Since no pyrimidine analogue presents an AAA hydrogen bonding pattern, the G analogue with the top hydrogen bonding group deleted has no second partner.
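The pattern logic behind the biversal pairs listed above can be checked by enumeration. This Python sketch assumes the standard AEGIS letter-to-pattern assignments (for example C = pyDAA, Z = pyDDA, P = puAAD, K = pyDAD), uses amA (puDAD) in place of standard A, and considers only deletion of the top or bottom exocyclic group, since the middle position is a ring atom. It treats pairing as pure pattern complementarity and ignores sterics, stacking, and polymerase acceptance.

```python
from itertools import product

# Assumed AEGIS hydrogen-bonding patterns, read top (major groove) to bottom (minor groove).
# D = donor, A = acceptor. amA (DAD) stands in for adenine, whose own pattern is DA-.
PYRIMIDINES = {"C": "DAA", "T": "ADA", "Z": "DDA", "S": "AAD", "K": "DAD", "V": "ADD"}
PURINES = {"G": "ADD", "amA": "DAD", "P": "AAD", "B": "DDA", "X": "ADA", "J": "DAA"}
COMPLEMENT = {"D": "A", "A": "D"}


def partners(design: str, alphabet: dict) -> list:
    """Letters whose pattern complements `design` wherever `design` still has a group."""
    return sorted(
        name
        for name, pattern in alphabet.items()
        if all(d == "-" or COMPLEMENT[d] == p for d, p in zip(design, pattern))
    )


def one_deletion_designs() -> list:
    """All patterns with the top or the bottom exocyclic group deleted ("-")."""
    designs = []
    for middle, other in product("AD", repeat=2):
        designs.append("-" + middle + other)  # top group deleted
        designs.append(other + middle + "-")  # bottom group deleted
    return designs


def biversal_designs(partner_alphabet: dict) -> dict:
    """Designs that pair with exactly two letters of the partner alphabet."""
    result = {}
    for design in one_deletion_designs():
        found = partners(design, partner_alphabet)
        if len(found) == 2:
            result[design] = found
    return result


print("pyrimidine biversals:", biversal_designs(PURINES))
print("purine biversals:", biversal_designs(PYRIMIDINES))
print("partners of py-DD:", partners("-DD", PURINES))
print("partners of the top-deleted G analogue:", partners("-DD", PYRIMIDINES))
```

Run as written, the enumeration recovers the four pyrimidine biversals (G/X, J/amA, P/amA, B/X) and the four purine biversals (C/K, S/K, T/Z, T/V) named in the text, and shows that py-DD, py-AA, and the top-deleted G analogue each have only a single partner.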

Note that, just as biversality can be supplemented by deprotonation in the Z system, it can be supplemented by changes in protonation state in other systems. For example, the diaminopyrimidine implementation of the K (DAD) hydrogen bonding pattern is a relatively strong base, becoming protonated with a pKa value of ~7. In its protonated form, this heterocycle presents a DDD hydrogen bonding pattern.

Separately, reviewers have noted that a polyelectrolyte backbone, which the Polyelectrolyte Theory of the Gene28 posits is necessary for any informational biopolymer to support Darwinian evolution, seems to have difficulty forming core folds through backbone–backbone interactions. Such core folds are exactly what produce compact structures in proteins, including β sheets and α helices, and they are commonly thought to be essential for very tight binding or truly effective catalysis.

Expanded genetic alphabets offer an opportunity to explore this hypothesis. With polyelectrolytes, core folds can instead arise from side chain–side chain (base–base) interactions. With standard 4-nucleotide DNA, the G-quadruplex54 is the only such fold available near neutral pH.

However, AEGIS offers many more base–base interactions that give folds. These include assemblies formed by isoguanosine55 (AEGIS B), fat and skinny pairs34b (AEGIS T:K, S:Z, and V:C), and the fZ-motif.35 Further, AEGIS components that carry hydrophobic tags can form hydrophobic cores through tag–tag interactions between bases, provided the tags are introduced sparingly enough that overdecoration does not overwhelm the intrinsic solubility of the polyanionic backbone and cause precipitation.

CONCLUSIONS

This work introduces a new idea, “biversality”, for sequencing expanded DNA in particular, but also for connecting the molecular biological universe of the natural world to the molecular biological universe being reinvented by synthetic biologists. The biversality concept can connect many partners within AEGIS DNA, as it does within natural DNA, and can also connect AEGIS and natural DNA. Thus, it introduces a new concept into the design of nucleic acid systems.

Supplementary Material

Supplemental Information (supplementary source data Excel file)

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/jacs.4c11043.

Source Data (XLSX)

Detailed methods, materials, NMR spectra, and additional studies (PDF)

ACKNOWLEDGMENTS

This work was supported by the National Institute of General Medical Sciences under Award Number 1R01GM141391-01A1 and by the National Science Foundation under Award Number MCB-2123995.

Footnotes

The authors declare the following competing financial interest(s): SAB is the sole owner of Firebird Biomolecular Sciences, which provides AEGIS components at no profit to the research communities.

Contributor Information

Bang Wang, Foundation for Applied Molecular Evolution, Alachua, Florida 32601, United States; Firebird Biomolecular Sciences, LLC, Alachua, Florida 32601, United States.

Hyo-Joong Kim, Foundation for Applied Molecular Evolution, Alachua, Florida 32601, United States.

Kevin M. Bradley, Firebird Biomolecular Sciences, LLC, Alachua, Florida 32601, United States.

Cen Chen, Foundation for Applied Molecular Evolution, Alachua, Florida 32601, United States.

Chris McLendon, Firebird Biomolecular Sciences, LLC, Alachua, Florida 32601, United States.

Zunyi Yang, Foundation for Applied Molecular Evolution, Alachua, Florida 32601, United States; Firebird Biomolecular Sciences, LLC, Alachua, Florida 32601, United States.

Steven A. Benner, Foundation for Applied Molecular Evolution, Alachua, Florida 32601, United States; Firebird Biomolecular Sciences, LLC, Alachua, Florida 32601, United States.

REFERENCES

  • (1).Hansch C; Fujita T ρ-σ-π Analysis. A Method for the Correlation of Biological Activity and Chemical Structure. J. Am. Chem. Soc. 1964, 86 (8), 1616–1626.
  • (2).Venkatraman V; Colligan TH; Lesica GT; Olson DR; Gaiser J; Copeland CJ; Wheeler TJ; Roy A Drugsniffer: an open source workflow for virtually screening billions of molecules for binding affinity to protein targets. Front. Pharmacol. 2022, 13, No. 874746.
  • (3).Wildey MJ; Haunso A; Tudor M; Webb M; Connick JH High-throughput Screening. In Annual Reports in Medicinal Chemistry; Elsevier, 2017; Vol. 50, pp 149–195.
  • (4).Winter G; Griffiths AD; Hawkins RE; Hoogenboom HR Making antibodies by phage display technology. Annu. Rev. Immunol. 1994, 12 (1), 433–455.
  • (5).Stumpp MT; Dawson KM; Binz HK Beyond antibodies: the DARPin drug platform. Biodrugs 2020, 34 (4), 423–433.
  • (6).(a) Baker M Blame it on the antibodies. Nature 2015, 521 (7552), 274–276. (b) Begley CG; Ellis LM Raise standards for preclinical cancer research. Nature 2012, 483 (7391), 531–533.
  • (7).Hayhurst A; Harris WJ Escherichia coli skp chaperone coexpression improves solubility and phage display of single-chain antibody fragments. Protein Expression Purif. 1999, 15 (3), 336–343.
  • (8).Acharya VV; Chaudhuri P Modalities of protein denaturation and nature of denaturants. Int. J. Pharm. Sci. Rev. Res. 2021, 69 (2), 19–24.
  • (9).Gilbert W Origin of life: The RNA world. Nature 1986, 319 (6055), 618.
  • (10).Benner SA; Ellington AD; Tauer A Modern metabolism as a palimpsest of the RNA world. Proc. Natl. Acad. Sci. U.S.A. 1989, 86 (18), 7054–7058.
  • (11).White HB Coenzymes as fossils of an earlier metabolic state. J. Mol. Evol. 1976, 7, 101–104.
  • (12).Ban N; Nissen P; Hansen J; Moore PB; Steitz TA The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 2000, 289 (5481), 905–920.
  • (13).Altman S Enzymatic cleavage of RNA by RNA (Nobel lecture). Angew. Chem., Int. Ed. Engl. 1990, 29 (7), 749–758.
  • (14).Rich A On the Problems of Evolution and Biochemical Information Transfer. In Horizons in Biochemistry; Academic Press, 1962; pp 103–126.
  • (15).Tuerk C; Gold L Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 1990, 249 (4968), 505–510.
  • (16).Lorsch JR; Szostak JW Chance and necessity in the selection of nucleic acid catalysts. Acc. Chem. Res. 1996, 29 (2), 103–110.
  • (17).Joyce GF Directed evolution of nucleic acid enzymes. Annu. Rev. Biochem. 2004, 73 (1), 791–836.
  • (18).(a) Wang B; Pan X; Teng I-T; Li X; Kobeissy F; Wu Z-Y; Zhu J; Cai G; Yan H; Yan X; et al. Functional Selection of Tau Oligomerization-Inhibiting Aptamers. Angew. Chem., Int. Ed. 2024, 63 (18), No. e202402007. (b) Li N; Ebright JN; Stovall GM; Chen X; Nguyen HH; Singh A; Syrett A; Ellington AD Technical and biological issues relevant to cell typing with aptamers. J. Proteome Res. 2009, 8 (5), 2438–2448. (c) Byun J Recent progress and opportunities for nucleic acid aptamers. Life 2021, 11 (3), No. 193. (d) Rozenblum GT; Lopez VG; Vitullo AD; Radrizzani M Aptamers: current challenges and future prospects. Expert Opin. Drug Discovery 2016, 11 (2), 127–135. (e) Wang B; Kobeissy F; Golpich M; Cai G; Li X; Abedi R; Haskins W; Tan W; Benner SA; Wang KKW Aptamer Technologies in Neuroscience, Neuro-Diagnostics and Neuro-Medicine Development. Molecules 2024, 29 (5), No. 1124.
  • (19).Carrigan MA; Ricardo A; Ang DN; Benner SA Quantitative analysis of a RNA-cleaving DNA catalyst obtained via in vitro selection. Biochemistry 2004, 43 (36), 11446–11459.
  • (20).Reader JS; Joyce GF A ribozyme composed of only two different nucleotides. Nature 2002, 420 (6917), 841–844.
  • (21).Hollenstein M DNA catalysis: the chemical repertoire of DNAzymes. Molecules 2015, 20 (11), 20777–20804.
  • (22).Wang Y; Ng N; Liu E; Lam CH; Perrin DM Systematic study of constraints imposed by modified nucleoside triphosphates with protein-like side chains for use in in vitro selection. Org. Biomol. Chem. 2017, 15 (3), 610–618.
  • (23).Wolk SK; Mayfield WS; Gelinas AD; Astling D; Guillot J; Brody EN; Janjic N; Gold L Modified nucleotides may have enhanced early RNA catalysis. Proc. Natl. Acad. Sci. U.S.A. 2020, 117 (15), 8236–8242.
  • (24).(a) Kimoto M; Nakamura M; Hirao I Post-ExSELEX stabilization of an unnatural-base DNA aptamer targeting VEGF165 toward pharmaceutical applications. Nucleic Acids Res. 2016, 44 (15), 7487–7494. (b) Kimoto M; Yamashige R; Matsunaga K.-i.; Yokoyama S; Hirao I Generation of high-affinity DNA aptamers using an expanded genetic alphabet. Nat. Biotechnol. 2013, 31 (5), 453–457.
  • (25).Gold L; Ayers D; Bertino J; Bock C; Bock A; Brody EN; Carter J; Dalby AB; Eaton BE; Fitzwater T; et al. Aptamer-Based Multiplexed Proteomic Technology for Biomarker Discovery. PLoS One 2010, 5 (12), No. e15004.
  • (26).Benner SA; Allemann RK; Ellington AD; Ge L; Glasfeld A; Leanz GF; Krauch T; MacPherson LJ; Moroney S; Piccirilli JA; Weinhold E Natural selection, protein engineering, and the last riboorganism: rational model building in biochemistry. Cold Spring Harbor Symp. Quant. Biol. 1987, 52, 53–63.
  • (27).Hoshika S; Leal NA; Kim M-J; Kim M-S; Karalkar NB; Kim H-J; Bates AM; Watkins NE Jr; SantaLucia HA; Meyer AJ; et al. Hachimoji DNA and RNA: A genetic system with eight building blocks. Science 2019, 363 (6429), 884–887.
  • (28).Benner SA Rethinking nucleic acids from their origins to their applications. Philos. Trans. R. Soc., B 2023, 378 (1871), No. 20220027.
  • (29).Benner SA; Kim H-J; Biondi E Prebiotic chemistry that could not not have happened. Life 2019, 9 (4), No. 84.
  • (30).(a) Yang Z; Chen F; Chamberlin SG; Benner SA Expanded genetic alphabets in the polymerase chain reaction. Angew. Chem., Int. Ed. 2010, 49 (1), 177–180. (b) Hoshika S; Chen F; Leal NA; Benner SA Artificial Genetic Systems: Self-Avoiding DNA in PCR and Multiplexed PCR. Angew. Chem., Int. Ed. 2010, 49 (32), 5554–5557.
  • (31).Jerome CA; Hoshika S; Bradley KM; Benner SA; Biondi E In vitro evolution of ribonucleases from expanded genetic alphabets. Proc. Natl. Acad. Sci. U.S.A. 2022, 119 (44), No. e2208261119.
  • (32).Zhang L; Yang Z; Sefah K; Bradley KM; Hoshika S; Kim M-J; Kim H-J; Zhu G; Jiménez E; Cansiz S; et al. Evolution of Functional Six-Nucleotide DNA. J. Am. Chem. Soc. 2015, 137 (21), 6734–6737.
  • (33).Biondi E; Lane JD; Das D; Dasgupta S; Piccirilli JA; Hoshika S; Bradley KM; Krantz BA; Benner SA Laboratory evolution of artificially expanded DNA gives redesignable aptamers that target the toxic form of anthrax protective antigen. Nucleic Acids Res. 2016, 44 (20), 9565–9577.
  • (34).(a) Chaput JC; Switzer C A DNA pentaplex incorporating nucleobase quintets. Proc. Natl. Acad. Sci. U.S.A. 1999, 96 (19), 10614–10619. (b) Hoshika S; Singh I; Switzer C; Molt RW Jr; Leal NA; Kim M-J; Kim M-S; Kim H-J; Georgiadis MM; Benner SA “Skinny” and “Fat” DNA: two new double helices. J. Am. Chem. Soc. 2018, 140 (37), 11655–11660.
  • (35).Wang B; Rocca JR; Hoshika S; Chen C; Yang Z; Esmaeeli R; Wang J; Pan X; Lu J; Wang KK; Cao YC; Tan W; Benner SA A folding motif formed with an expanded genetic alphabet. Nat. Chem. 2024, 16 (10), 1715–1722.
  • (36).Kim H-J; Chen F; Benner SA Synthesis and properties of 5-cyano-substituted nucleoside analog with a donor–donor–acceptor hydrogen-bonding pattern. J. Org. Chem. 2012, 77 (7), 3664–3669.
  • (37).Hendrickson CL; Devine KG; Benner SA Probing minor groove recognition contacts by DNA polymerases and reverse transcriptases using 3-deaza-2′-deoxyadenosine. Nucleic Acids Res. 2004, 32 (7), 2241–2250.
  • (38).Chen F; Gaucher EA; Leal NA; Hutter D; Havemann SA; Govindarajan S; Ortlund EA; Benner SA Reconstructed evolutionary adaptive paths give polymerases accepting reversible terminators for sequencing and SNP detection. Proc. Natl. Acad. Sci. U.S.A. 2010, 107 (5), 1948–1953.
  • (39).(a) Laos R; Shaw R; Leal NA; Gaucher E; Benner S Directed evolution of polymerases to accept nucleotides with nonstandard hydrogen bond patterns. Biochemistry 2013, 52 (31), 5288–5294. (b) Laos R; Thomson JM; Benner SA DNA polymerases engineered by directed evolution to incorporate non-standard nucleotides. Front. Microbiol. 2014, 5, No. 565.
  • (40).(a) Lutz MJ; Horlacher J; Benner SA Recognition of 2′-deoxyisoguanosine triphosphate by HIV-1 reverse transcriptase and mammalian cellular DNA polymerases. Bioorg. Med. Chem. Lett. 1998, 8 (5), 499–504. (b) Sismour AM; Lutz S; Park JH; Lutz MJ; Boyer PL; Hughes SH; Benner SA PCR amplification of DNA containing non-standard base pairs by variants of reverse transcriptase from Human Immunodeficiency Virus-1. Nucleic Acids Res. 2004, 32 (2), 728–735. (c) Leal NA; Kim H-J; Hoshika S; Kim M-J; Carrigan MA; Benner SA Transcription, reverse transcription, and analysis of RNA containing artificial genetic components. ACS Synth. Biol. 2015, 4 (4), 407–413.
  • (41).(a) Wang X; Hoshika S; Peterson RJ; Kim M-J; Benner SA; Kahn JD Biophysics of artificially expanded genetic information systems. Thermodynamics of DNA duplexes containing matches and mismatches involving 2-amino-3-nitropyridin-6-one (Z) and imidazo[1,2-a]-1,3,5-triazin-4(8H)one (P). ACS Synth. Biol. 2017, 6 (5), 782–792. (b) Pham TM; Miffin T; Sun H; Sharp KK; Wang X; Zhu M; Hoshika S; Peterson RJ; Benner SA; Kahn JD; Mathews DH DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters. ACS Synth. Biol. 2023, 12 (9), 2750–2763.
  • (42).Yang Z; Chen F; Alvarado JB; Benner SA Amplification, mutation, and sequencing of a six-letter synthetic genetic system. J. Am. Chem. Soc. 2011, 133 (38), 15105–15112.
  • (43).Wang B; Bradley KM; Kim M-J; Laos R; Chen C; Gerloff DL; Manfio L; Yang Z; Benner SA Enzyme-assisted high throughput sequencing of an expanded genetic alphabet at single base resolution. Nat. Commun. 2024, 15 (1), No. 4057.
  • (44).Yang Z; Kim H-J; Le JT; McLendon C; Bradley KM; Kim M-S; Hutter D; Hoshika S; Yaren O; Benner SA Nucleoside analogs to manage sequence divergence in nucleic acid amplification and SNP detection. Nucleic Acids Res. 2018, 46 (12), 5902–5910.
  • (45).(a) Solomon MS; Hopkins PB Stereocontrolled syntheses of C-linked deoxyribosides of 2-hydroxypyridine and 2-hydroxyquinoline. Tetrahedron Lett. 1991, 32 (28), 3297–3300. (b) Ishikawa M; Hirao I; Yokoyama S Synthesis of 3-(2-deoxy-β-d-ribofuranosyl)pyridin-2-one and 2-amino-6-(N,N-dimethylamino)-9-(2-deoxy-β-d-ribofuranosyl)purine derivatives for an unnatural base pair. Tetrahedron Lett. 2000, 41 (20), 3931–3934.
  • (46).Ishikawa M; Hirao I; Yokoyama S Synthesis of 3-(2-deoxy-β-D-ribofuranosyl) pyridin-2-one and 2-amino-6-(N, N-dimethylamino)-9-(2-deoxy-β-D-ribofuranosyl) purine derivatives for an unnatural base pair. Tetrahedron Lett. 2000, 41 (20), 3931–3934.
  • (47).Petruska J; Goodman MF; Boosalis MS; Sowers LC; Cheong C; Tinoco I Jr Comparison between DNA melting thermodynamics and DNA polymerase fidelity. Proc. Natl. Acad. Sci. U.S.A. 1988, 85 (17), 6252–6256.
  • (48).(a) Morales JC; Kool ET Efficient replication between non-hydrogen-bonded nucleoside shape analogs. Nat. Struct. Biol. 1998, 5 (11), 950–954. (b) Guckian KM; Krugh TR; Kool ET Solution structure of a DNA duplex containing a replicable difluorotoluene–adenine pair. Nat. Struct. Biol. 1998, 5 (11), 954–959.
  • (49).Kimoto M; Hirao I Genetic alphabet expansion technology by creating unnatural base pairs. Chem. Soc. Rev. 2020, 49 (21), 7602–7626.
  • (50).(a) Malyshev DA; Romesberg FE The expanded genetic alphabet. Angew. Chem., Int. Ed. 2015, 54 (41), 11930–11944. (b) Feldman AW; Romesberg FE Expansion of the genetic alphabet: A chemist’s approach to synthetic biology. Acc. Chem. Res. 2018, 51 (2), 394–403.
  • (51).Gao J; Liu H; Kool ET Assembly of the complete eight-base artificial genetic helix, xDNA, and its interaction with the natural genetic system. Angew. Chem., Int. Ed. 2005, 44 (20), 3118–3122.
  • (52).Zhang Y; Ptacin JL; Fischer EC; Aerni HR; Caffaro CE; Jose KS; Feldman AW; Turner CR; Romesberg FE A semi-synthetic organism that stores and retrieves increased genetic information. Nature 2017, 551 (7682), 644–647.
  • (53).Geyer CR; Battersby TR; Benner SA Nucleobase pairing in expanded Watson-Crick-like genetic information systems. Structure 2003, 11 (12), 1485–1498.
  • (54).Lipps HJ; Rhodes D G-quadruplex structures: in vivo evidence and function. Trends Cell Biol. 2009, 19 (8), 414–422.
  • (55).Roberts C; Chaput JC; Switzer C Beyond guanine quartets: cation-induced formation of homogenous and chimeric DNA tetraplexes incorporating iso-guanine and guanine. Chem. Biol. 1997, 4 (12), 899–908.
