Graphical Abstract
A “Self-Avoiding Molecular Recognition System” (SAMRS) is a species of DNA that binds to natural DNA but not to other members of the same SAMRS species. SAMRS should be useful in clinical analyses where many DNA molecules must interact with target DNA but not with each other. We report here a SAMRS based on 2-aminopurine (A*), 2-thiothymine (T*), hypoxanthine (G*) and N4-ethylcytosine (C*) and its use in multiplexed polymerase chain reactions.
Keywords: PCR, Self-avoiding genetic systems, Pseudocomplementarity, DNA polymerases, Synthetic biology, DNA analogs
Many applications of DNA chemistry in biology and medicine would be enhanced if procedures that efficiently analyze single DNA molecules also worked well to analyze many DNA molecules (multiplexing). Unfortunately, multiplexing often requires adding many DNA probes and primers to an assay at the same time, often in great excess over the targeted DNA molecules. Multiple primers built from standard nucleotides can easily interact with each other, even when well designed. These interactions can create artifacts and noise that defeat the analysis, especially when polymerases are involved in the analytic architecture, as in multiplexed PCR. With more than a dozen target amplicons, multiplexed PCR generally fails because of PCR artifacts.[i]
Recently, we reported that the efficiency and consistency of multiplexed PCR could be greatly improved by placing components of our artificially expanded genetic information system (Aegis) in the external primers in a nested PCR architecture.[ii] Aegis increases the number of independently replicable nucleotides from the natural four (A, T, G, and C) to as many as 12. Aegis is now in the clinic, where it today personalizes the care of some 400,000 patients annually who are infected with HIV, hepatitis B, or hepatitis C viruses.[iii] However, a nested PCR architecture still does not prevent the analyte-specific segments of the chimeric primers from interacting with each other, as these must be constructed from natural nucleotides.
In a different strategy, multiplexed PCR might be enabled if the analyte-specific portions of the primers were built from a “self-avoiding molecular recognition system” (Samrs). The opposite of Aegis DNA, Samrs DNA binds to natural DNA, but not to other members of the same Samrs species. Schematically, a Samrs replaces T, A, G and C by the nucleotide analogs T*, A*, G* and C*, where T* pairs with A, A* pairs with T, G* pairs with C and C* pairs with G, but neither the T*-A* pair nor the G*-C* pair contributes substantially to the stability of a duplex. In particular, if PCR primers were built from Samrs components, they should enable multiplexed PCR without artifacts arising from primer-primer interactions.
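The schematic pairing rule in the preceding paragraph can be written down as a small lookup. The following Python sketch is illustrative only: the function names are hypothetical, a trailing "*" marks a Samrs analog as in the text, and the rule is the idealized yes/no scheme described above, not a thermodynamic model.

```python
# Idealized Samrs pairing rule (assumption: a pair either "contributes"
# to duplex stability or it does not; real duplexes are not this binary).

WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def parse(base: str) -> tuple[str, bool]:
    """Split 'A*' into ('A', True) and 'A' into ('A', False)."""
    is_samrs = base.endswith("*")
    letter = base.rstrip("*")
    if letter not in WATSON_CRICK:
        raise ValueError(f"unknown base: {base!r}")
    return letter, is_samrs


def contributes_to_stability(x: str, y: str) -> bool:
    """Return True if the x:y pair is expected to stabilize a duplex."""
    (letter_x, samrs_x), (letter_y, samrs_y) = parse(x), parse(y)
    if WATSON_CRICK[letter_x] != letter_y:
        return False  # formal mismatch
    # Samrs:standard pairs count; Samrs:Samrs pairs are self-avoiding.
    return not (samrs_x and samrs_y)


if __name__ == "__main__":
    for pair in [("T*", "A"), ("A*", "T"), ("T*", "A*"), ("G*", "C*")]:
        print(pair, contributes_to_stability(*pair))
```

Under this sketch, T*:A and A*:T are treated like ordinary pairs, while T*:A* and G*:C* are treated as non-contributing.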
Empirical studies show that pairs joined by two hydrogen bonds contribute to duplex stability, but not pairs joined by one hydrogen bond.[iv] Accordingly, a candidate for G* in a “first generation” Samrs heterocycle might be hypoxanthine (found in inosine), which pairs with C using the top two hydrogen bonding units (Figure 1, top left). The corresponding first generation candidate for C* would be pyrimidin-2-one (found in zebularine),[v] which pairs with standard G using the bottom two hydrogen bonding units (Figure 1, top left). As hypoxanthine and pyrimidin-2-one can form only one hydrogen bond in a standard Watson-Crick geometry, their pair should not contribute to duplex stability; the inosine-zebularine pair would be a G*-C* self-avoiding pair.
Figure 1.
Two generations of candidate nucleobases for self-avoiding molecular recognition systems (Samrs). (a) First generation Samrs candidates are simple implementations in which the top two hydrogen bonding units of the standard nucleobase are used for one pair and the bottom two are used for the other. (b) Second generation Samrs exploits 2-thiothymine as T* to resolve issues arising from the weak bonding of the first generation T* to adenine, and N4-ethylcytosine as C* to resolve issues arising from the chemical instability of the first generation C*.
For the second self-avoiding pair, pyridone might be a first generation T* candidate, pairing with standard A using its top two hydrogen bonding units (Figure 1, top right). As standard adenine lacks a “bottom” hydrogen bonding unit, 2-aminopurine would be a candidate for A*, pairing with standard T at the bottom two sites. 2-Aminopurine and pyridone would form only one hydrogen bond (Figure 1, top right), and therefore would not contribute to duplex stability. The aminopurine-pyridone pair would then be an A*-T* self-avoiding pair. Some representative melting temperatures of duplexes incorporating these Samrs components are shown in Suppl. Tables 1 and 2.
Samrs should work for simple binding assays. For example, in 1996, Kutyavin et al. [vi] reported that “pseudocomplementary” diaminopurine and 2-thiothymine[vii] bound to thymine and adenine respectively, but that diaminopurine did not bind to 2-thiothymine. Using 2-thiothymine instead of pyridone as a T* candidate is consistent with a need for minor groove solvation to stabilize double helices.[viii] Indeed, 2-thiothymine pairs with A slightly better than T itself (Suppl. Table 3).
Pseudocomplementarity of this limited type has been used in peptide nucleic acids (PNAs) to invade duplex DNA.[ix,x] Gamper showed that similar species as triphosphates could be incorporated into DNA, suggesting that the products from this incorporation might not fold and therefore be more uniformly captured on arrays.[xi,xii]
Accordingly, we attempted to extend the Samrs concept to PCR by incorporating various Samrs candidates into PCR primers based on what we learned by analyzing duplexes built from a first generation Samrs alphabet (Suppl. Tables 1 and 2). We encountered multiple difficulties. First, 2′-deoxy-6-methylzebularine proved to be insufficiently stable in both acid and base to be useful in standard phosphoramidite DNA synthesis.[xiii] This problematic chemical reactivity was only partly mitigated by placing substituents on the heterocycle. Unfortunately, 5-phenyl and 5-propynyl substituted 2′-deoxyzebularines could not be made, while DNA containing 4,5-dimethylzebularine had a low Tm.
Further problems were encountered with hypoxanthine as a candidate G*. A dozen thermophilic DNA polymerases were tested for their ability to support PCR using primers containing 5 or 6 inosines as G* (data not shown). Most polymerases from extreme thermophiles rejected hypoxanthine, possibly because it is a deamination product of adenosine occurring at very high temperatures where extreme thermophiles live.[xiv] In contrast, Taq DNA polymerase performed well reading through Samrs components in a template (Suppl. Figure 1). We therefore focused on Taq to develop PCR with primers that incorporated Samrs components (Suppl. Figure 2).
Here, we encountered the surprising result that standard DNA duplexes held together entirely by pairs joined by just two hydrogen bonds were remarkably poor primers. Melting temperatures of such duplexes were also surprisingly low.
To mitigate this problem, we first sought to replace zebularine derivatives as C*’s. N4-Methyl and N4-ethyl cytosines[xv] with adjacent 5-methyl groups proved not to form stable pairs. However, both N4-methyl and N4-ethyl cytosine performed well as C* (Suppl. Table 4); the N4-ethyl variant was chosen because it better distinguished various matches (Suppl. Table 5).
A complete Samrs using 2-thiothymine, 2-aminopurine, hypoxanthine, and N4-ethylcytosine as T*, A*, G* and C* was then developed. When introduced individually into a reference DNA duplex, the corresponding Samrs:standard pairs contributed to duplex stability as well as an A:T pair (Table 1).[xvi] In every case, however, the Samrs:Samrs pair contributed to the stability of the reference duplex less than the corresponding Samrs:standard pair.
Table 1.
Melting temperatures (Tms) for 5′-ACCAAGCXATCAAGT-3′ and 3′-TGGTTCGYTAGTTCA-5′. Boxes with bold outline hold Tms for complementary pairs, in two contexts. Doubly underlined Tms are from duplexes having matched Samrs:Samrs pairs; these are lower than the corresponding singly underlined Tms, which have Samrs:standard or standard:Samrs pairs. The underlined Tms are similar to those having A:T and T:A pairs, also joined by 2 hydrogen bonds. Note also how the off-diagonal terms representing formal mismatches have lower Tms than the N:N* pairs.
We then turned to develop polymerases that used this optimized chemistry, recognizing that the properties of polymerases are rarely predictable.[xvii] Surprisingly, we found that 25-mer primers forming duplexes joined uniformly by two hydrogen bonds performed unpredictably, even at low temperatures using the Klenow fragment of DNA polymerase I (Suppl. Figure 3). The inefficiency in priming correlated with the low Tms of their Samrs:standard duplexes. Thus, the Tms of a set of Samrs 25-mers paired with complementary standard DNA were all approximately 40 °C (Suppl. Table 6), far below the 60–70 °C Tm of a typical 25-mer duplex built from equal proportions of A, T, G, and C.
To mitigate this problem, various backbones were examined to support Samrs, including 2′-O-alkyl ribonucleosides. While improving the stability of duplexes joined by Samrs:standard pairs, these diminished the ability of the oligonucleotide to support PCR.
We therefore asked whether chimeric primers containing Samrs at their 3′-segments and standard nucleotides in their 5′-segments would still display useful self-avoidance. A pair of primers targeting the Taq gene that perfectly matched in their last nine nucleotides (Figure 2) were prepared with zero, four or eight Samrs components in their 3′-segments, with the 3′-terminal nucleotide remaining standard (to lower the cost of synthesis).
Figure 2.
Amplification of the Taq gene demonstrating the ability of Samrs to manage PCR artifacts in a “worst case design” scenario, where the forward and reverse primers are formally complementary in their last nine nucleotides. Standard primer pairs give only primer dimers in these cases. Primer pairs having Samrs components in the 3′-segment, even as few as four, give the desired 1109 nucleotide amplicon. + and − indicate with and without target gene. A* = 2-aminopurine. G* = hypoxanthine. T* = 2-thiothymine. C* = N4-ethylcytosine.
Standard-F: 5′-TATCTGCGTGCCCTGTCTCTGGAGG-3′
Standard-R: 5′-CCAATGCCAACCTCTACCTCCAGAG-3′
SAMRS-20+4*+1-F: 5′-TATCTGCGTGCCCTGTCTCTG*G*A*G*G-3′
SAMRS-20+4*+1-R: 5′-CCAATGCCAACCTCTACCTCC*A*G*A*G-3′
SAMRS-16+8*+1-F: 5′-TATCTGCGTGCCCTGTC*T*C*T*G*G*A*G*G-3′
SAMRS-16+8*+1-R: 5′-CCAATGCCAACCTCTAC*C*T*C*C*A*G*A*G-3′
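The primer notation above can be checked mechanically. The sketch below, in Python with hypothetical helper names, rebuilds the chimeric {20+4*+1} and {16+8*+1} primers from the two standard sequences and confirms that the last nine nucleotides of the forward and reverse primers are reverse complements of each other. It assumes only the sequences listed in this caption; nothing here models hybridization or polymerase behavior.

```python
# Rebuild the chimeric primers of Figure 2 and check the "worst case"
# 3'-end complementarity. Helper names are illustrative, not from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

STANDARD_F = "TATCTGCGTGCCCTGTCTCTGGAGG"
STANDARD_R = "CCAATGCCAACCTCTACCTCCAGAG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a standard DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def chimeric(seq: str, n_samrs: int, n_3prime_standard: int = 1) -> str:
    """Mark n_samrs positions with '*', leaving the 3'-terminal
    n_3prime_standard nucleotides (and everything 5' of the Samrs
    segment) standard."""
    end = len(seq) - n_3prime_standard
    start = end - n_samrs
    if n_samrs < 0 or n_3prime_standard < 0 or start < 0:
        raise ValueError("segment lengths do not fit the sequence")
    samrs_segment = "".join(base + "*" for base in seq[start:end])
    return seq[:start] + samrs_segment + seq[end:]


if __name__ == "__main__":
    # The two primers are formally complementary over their last nine nucleotides.
    assert reverse_complement(STANDARD_F[-9:]) == STANDARD_R[-9:]
    for n in (4, 8):
        print(f"SAMRS-{24 - n}+{n}*+1-F: 5'-{chimeric(STANDARD_F, n)}-3'")
        print(f"SAMRS-{24 - n}+{n}*+1-R: 5'-{chimeric(STANDARD_R, n)}-3'")
```

Run as a script, this prints the four Samrs-containing primer sequences in the same notation as the list above.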
PCR results were striking. Primer pairs built from standard nucleotides failed completely to yield the desired amplicon (1109 base pairs, Figure 2); only primer-dimer was observed. The combination where one of the primers was built from standard nucleotides while the other contained four Samrs nucleotides gave the amplicon only inefficiently, with primer dimer arising from Samrs-standard mismatching between the primers. However, when both primers had four or eight Samrs components, PCR amplification efficiently gave only the desired amplicon. This was a surprising demonstration of the Samrs effect in PCR, even for short Samrs segments. This result was confirmed in real time PCR using primers with eight Samrs components near their 3′-ends in a chimeric {16+8*+1} architecture (Suppl. Fig. 4).
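For bookkeeping, the pairs that would form if the 3′-ends of two primers annealed to each other can be classified by type. The Python sketch below does this for the nine-nucleotide overlap; the function names and category labels are hypothetical, and the tally is purely formal. It says nothing about duplex thermodynamics or about how a polymerase treats the resulting complex.

```python
# Classify the pairs in the antiparallel 3'-overlap of two primers written
# in the "*" notation of Figure 2. Purely formal counting (an assumption of
# this sketch), not a model of primer-dimer formation.
from collections import Counter

WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def tokenize(primer: str) -> list[tuple[str, bool]]:
    """Turn 'AG*T' into [('A', False), ('G', True), ('T', False)]."""
    tokens: list[tuple[str, bool]] = []
    for ch in primer:
        if ch == "*":
            if not tokens or tokens[-1][1]:
                raise ValueError("'*' must follow an unmarked base")
            tokens[-1] = (tokens[-1][0], True)
        elif ch in WATSON_CRICK:
            tokens.append((ch, False))
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return tokens


def tally_overlap(forward: str, reverse: str, n: int = 9) -> dict[str, int]:
    """Count pair types when the last n nucleotides of each primer align
    antiparallel to each other."""
    fwd = tokenize(forward)[-n:]
    rev = tokenize(reverse)[-n:][::-1]  # reverse strand read 3'->5'
    counts: Counter[str] = Counter()
    for (f_base, f_samrs), (r_base, r_samrs) in zip(fwd, rev):
        if WATSON_CRICK[f_base] != r_base:
            counts["mismatch"] += 1
        elif f_samrs and r_samrs:
            counts["samrs:samrs"] += 1
        elif f_samrs or r_samrs:
            counts["samrs:standard"] += 1
        else:
            counts["standard:standard"] += 1
    return dict(counts)


if __name__ == "__main__":
    print(tally_overlap("TATCTGCGTGCCCTGTCTCTGGAGG",
                        "CCAATGCCAACCTCTACCTCCAGAG"))
    print(tally_overlap("TATCTGCGTGCCCTGTC*T*C*T*G*G*A*G*G",
                        "CCAATGCCAACCTCTAC*C*T*C*C*A*G*A*G"))
```

The same function can be applied to a mixed combination, such as a standard forward primer with a Samrs-containing reverse primer, to see which Samrs:standard pairs remain in the overlap.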
We then tested multiplexed PCR using Samrs primers. Ten pairs of chimeric {16+8*+1} primers were prepared to target ten cancer-relevant genes. The primers were chosen to give a ladder of amplicons of increasing length, facilitating analysis of the ten PCR products by agarose gel electrophoresis. They were not designed by computer programs to explicitly avoid PCR artifacts. Control primers had analogous sequences built entirely from standard nucleotides.
Singleplexed PCR was successful with all of the {16+8*+1} primer pairs (Figure 3c). With standard primer pairs, singleplexed PCRs were also successful, except with the PTPN11 amplicon, which failed because of primer dimer formation. Grouping these primer pairs in sets showed the advantage of Samrs primer pairs over standard primer pairs in multiplexed PCR. For example, with standard primer pairs, 5-fold multiplexing (FLT3, TSHR, EGFR, CTNNB1, APC, Figure 3b, right) gave only two of the five desired amplicons. In contrast, the analogous multiplexing with {16+8*+1} Samrs primer pairs generated all desired amplicons (Figure 3d, right). PCR with all ten {16+8*+1} Samrs primer pairs gave all ten amplicons (Figure 3d). In contrast, PCR with standard primer pairs gave only five (or possibly six) of the ten desired amplicons (Figure 3b).
Figure 3.
PCR amplification of 10 cancer genes using primer pairs chosen to allow the amplicons to be conveniently separated by size on a 3% agarose gel, using sequences in Suppl. Table 7. (a) Singleplexed PCR with indicated primer pairs containing only standard nucleotides. (b) Attempted multiplexed PCR with indicated pairs containing only standard nucleotides. (c) Singleplexed PCR with indicated pairs containing {16+8*+1} standard-Samrs-standard primer pairs. (d) Multiplexed PCR with indicated pairs containing {16+8*+1} standard-Samrs-standard primer pairs.
This work reinforces the evolving view of DNA as a complex organic molecule rather than a simple linear string that pairs following simple rules, the first generation model for DNA of Watson and Crick.[xviii] Thus, while we expected that duplexes joined by more A:T-like base pairs would have lower Tms, we did not expect the Tm to drop so severely as the fraction of such pairs approached unity. Considering the etiology of nucleic acids,[xix] it is tempting to infer from this a need in DNA for at least one pair to be joined by three hydrogen bonds.
Acknowledgments
This work was supported by a grant from Nucleic Acids Licensing LLC and by the National Human Genome Research Institute under 1R01 HG004831.
Footnotes
Supporting information for this article is available on the WWW under http://www.angewandte.org or from the author.
References
- i. Fredriksson S, Banér J, Dahl F, Chu A, Ji H, Welch K, Davis RW. Nucleic Acids Res. 2007;35:e47, 6. doi: 10.1093/nar/gkm078.
- ii. Yang Z, Chen F, Chamberlin SG, Benner SA. Angew Chem Int Ed. 2010;49:177–180. doi: 10.1002/anie.200905173.
- iii. Benner SA. Accounts Chem Res. 2004;37:784–797. doi: 10.1021/ar040004z.
- iv. Geyer CR, Battersby TR, Benner SA. Structure. 2003;11:1485–1498. doi: 10.1016/j.str.2003.11.008.
- v. Cech D, Holy A. Coll Czech Chem Comm. 1977;42:2246–2260.
- vi. Kutyavin IV, Rhinehart RL, Lukhtanov EA, Gorn VV, Meyer RB Jr, Gamper HB Jr. Biochemistry. 1996;35:11170–11176. doi: 10.1021/bi960626v.
- vii. Connolly BA, Newman PC. Nucleic Acids Res. 1989;17:4957–4974. doi: 10.1093/nar/17.13.4957.
- viii. Lan T, McLaughlin LW. J Am Chem Soc. 2000;122:6512–6513.
- ix. Ishizuka T, Yoshida J, Yamamoto Y, Sumaoka J, Tedeschi T, Corradini R, Sforza S, Komiyama M. Nucleic Acids Res. 2008;36:1464–1471. doi: 10.1093/nar/gkm1154.
- x. Demidov VV, Protozanova E, Izvolsky KI, Price C, Nielsen PE, Frank-Kamenetskii MD. Proc Natl Acad Sci USA. 2002;99:5953–5958. doi: 10.1073/pnas.092127999.
- xi. Gamper HB, Gewirtz A, Edwards J, Hou YM. Biochemistry. 2004;43:10224–10236. doi: 10.1021/bi049196w.
- xii. Lahoud G, Timoshchuk V, Lebedev A, de Vega M, Salas M, Arar K, Hou YM, Gamper H. Nucleic Acids Res. 2008;36:3409–3419. doi: 10.1093/nar/gkn209.
- xiii. Vives M, Eritja R, Tauler R, Marquez VE, Gargallo R. Biopolymers. 2004;73:27–43. doi: 10.1002/bip.10515.
- xiv. Kamiya H, et al. Chem Pharm Bull. 1992;40:2792–2795. doi: 10.1248/cpb.40.2792.
- xv. Nguyen HK, Bonfils E, Auffray P, Costaglioli P, Schmitt P, Asseline U, Durand M, Maurizot JC, Dupret D, Thuong NT. Nucleic Acids Res. 1998;26:4249–4258. doi: 10.1093/nar/26.18.4249.
- xvi. Bommarito S, Peyret N, SantaLucia J. Nucleic Acids Res. 2000;28:1929–1934. doi: 10.1093/nar/28.9.1929.
- xvii. Horlacher J, Hottiger M, Podust VN, Huebscher U, Benner SA. Proc Natl Acad Sci USA. 1995;92:6329–6333. doi: 10.1073/pnas.92.14.6329.
- xviii. Watson JD, Crick FHC. Nature. 1953;171:737–738. doi: 10.1038/171737a0.
- xix. Eschenmoser A. Science. 1999;284:2118–2124. doi: 10.1126/science.284.5423.2118.