Abstract
The assembly of synthetic oligonucleotides into genes and genomes is an important methodology. Several methods for such synthesis have been developed, but they have two drawbacks: 1) the processes are slow and 2) the error frequencies are high (typically 1 – 3 errors/kb of DNA). Thermal damage is a major contributor to biosynthetic errors. In this paper, we elucidate the advantages of rapid gene synthesis by polymerase chain assembly (PCA) when used in combination with smart error control strategies. We used a high speed thermocycler (PCRJet®) to minimize thermal damage and to perform rapid assembly of synthetic oligonucleotides into two different genes: endothelial protein C receptor (EPCR) and the endothelial cell thrombin receptor, thrombomodulin (TM). First, the intact EPCR gene (EPCR-1, 612 bp) and a mutant EPCR-2 (576 bp) that lacked 4 N-linked glycosylation sites were constructed from 35 and 33 oligonucleotides, respectively. Next, for direct error comparison, a longer gene, the 1548 bp TM gene, was constructed from 87 oligonucleotides by both rapid and conventional PCA. The fidelity and accuracy of the synthetic genes generated in this manner were confirmed by sequencing. The combined steps of PCA and DNA amplification were completed in about 10 minutes for the EPCR-1 and EPCR-2 genes and about 22 minutes for the TM gene, with comparably low error rates in the DNA sequence. Furthermore, we subcloned synthetic TM, EPCR-1, EPCR-2 and native EPCR-1 (amplified from cDNA) into a Pichia pastoris expression vector to evaluate their expression and to compare the synthetic genes with the native gene. We show that the synthetic genes assembled by rapid PCA successfully directed the expression of functional proteins and, importantly, that the synthetic and native genes expressed proteins with the same efficiency.
Keywords: PCRJet® thermocycler, high-speed polymerase chain assembly, thermal damage of DNA and control, error comparison, gene synthesis and expression
1. Introduction
The chemical synthesis of a 77 bp yeast ala-tRNA gene was first described in 1970 (Agarwal et al., 1970). Refinements in chemical techniques (e.g., the use of organic synthesis followed by enzymatic 5′ phosphorylation and DNA ligase-catalyzed joining) facilitated the total synthesis of a 126 bp gene encoding the E. coli suIII suppressor tyr-tRNA (Khorana, 1979). By the mid-1980s, automated phosphoramidite synthesizers had been developed (Alvarado-Urbina et al., 1981; Hunkapiller et al., 1984; Engels and Uhlmann, 1989). Today, automated phosphoramidite synthesis machines routinely synthesize deoxyoligonucleotides 15 to 100 nucleotides long. In the classic gene assembly method (Khorana, 1979), synthetic DNA oligonucleotides are 5′ phosphorylated using T4 kinase, annealed to form overlapping duplexes, and then enzymatically joined using T4 DNA ligase. A variation of that method was used to synthesize a 5,386 bp ΦX174 RFI DNA molecule in two weeks (Smith et al., 2003): 42-mer oligonucleotides were 5′ phosphorylated, gel-purified, annealed, and ligated using Taq ligase at 55°C. After overnight high temperature ligation, the DNA reaction product was amplified using the polymerase chain assembly (PCA) strategy described by Stemmer et al. (1995). More recently, the gene for green fluorescent protein (GFPuv) was assembled from synthetic oligonucleotides by applying both coincidence filtering and consensus shuffling protocols to reduce errors in the resultant DNA populations (Binkowski et al., 2005). In previously described PCA strategies (Barnett and Erfle, 1990; Ciccarelli et al., 1991), several overlapping non-phosphorylated 5′OH oligonucleotides were annealed in a single reaction at concentrations of less than 0.1 μM. After 20 – 30 cycles of PCA amplification, the concentrations of the two outer primers were raised to ~1 μM and another 30 – 35 PCA cycles were carried out (Stemmer et al., 1995).
Although only a few full-length DNA molecules assemble during the first PCR stage, they are selectively amplified during the second stage. Using this technique, Stemmer et al. (1995) constructed an ~2,700 bp synthetic plasmid from 134 different overlapping 5′OH oligonucleotides. Refinements to the protocol of Stemmer et al. (1995) have been described by Mehta et al. (1997), Gao et al. (2004), Shevchuk et al. (2004), and Young and Dong (2004). Subsequently, double-stranded DNA molecules of 1.2 to 20 kb have been assembled by PCA processes.
To our knowledge, the recent report of a 1,200 bp synthetic DNA fragment constructed in one week is the fastest kilobase-scale DNA synthesis to date, although the synthesis of a 5,386 bp DNA molecule in two weeks has also been documented (Smith et al., 2003; Young and Dong, 2004). In another study, parallel processing to construct about 10 fragments of ~5 kb in two weeks was proposed (Kodumal et al., 2004). However, all reported experiments using PCA-based synthesis have shown error rates of 2 – 5 errors/1,000 bp. For example, 11 sequence mistakes per synthetic 5,386 bp ΦX174 RFI molecule were observed (Smith et al., 2003). At an error rate of 2 errors/1,000 bp (Smith et al., 2003), a synthetic 48.5 kilobase pair phage λ DNA molecule would contain ~97 errors, a molecule >100 kb long would contain >200 errors, and a chromosome-sized one megabase pair DNA molecule would contain ~2,000 errors. Therefore, a clear understanding of the sources of synthetic errors, and of methods to control them, is an essential part of the biomolecular engineering approach to preparing synthetic genes. We recently proposed a quantitative model of error accumulation during PCR amplification (Rouillard et al., 2004); it predicts how errors accumulate over the course of a PCR cycle. Our model analysis shows that thermal damage contributes significantly to the total number of detectable errors; therefore, consideration must be given to thermal management of the PCA process.
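The scaling argument above is simple proportional arithmetic, and a short sketch makes it easy to repeat for other target lengths. The function names below are illustrative only and do not come from the cited work; the inputs are the figures quoted in the text.

```python
def expected_errors(length_bp: float, errors_per_kb: float) -> float:
    """Expected number of sequence errors for a molecule of the given length."""
    return length_bp * errors_per_kb / 1000.0


def observed_rate_per_kb(n_errors: int, length_bp: float) -> float:
    """Error rate in errors/kb from an observed error count."""
    return n_errors / length_bp * 1000.0


# 11 mistakes in the 5,386 bp PhiX174 molecule (Smith et al., 2003)
phix_rate = observed_rate_per_kb(11, 5386)
print(f"PhiX174: {phix_rate:.2f} errors/kb")

# Extrapolation at 2 errors/kb to the lengths discussed in the text
targets_bp = {"phage lambda": 48_500, "100 kb molecule": 100_000, "1 Mb chromosome": 1_000_000}
for name, length_bp in targets_bp.items():
    print(f"{name}: ~{expected_errors(length_bp, 2.0):.0f} errors")
```

The extrapolation assumes that the error rate is independent of molecule length, which is the same assumption made in the text.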
Few precautions have been taken to minimize DNA thermal damage during gene assembly experiments. Furthermore, the thermocyclers used during these experiments are slow because they are often used in combination with conservatively long thermocycling protocols. We hypothesize that high-fidelity synthesis of DNA molecules requires an awareness of biosynthetic errors and a practical strategy to minimize these errors. Error control requires a fundamental understanding of 1) the relationship between PCR kinetics and errors that occur during DNA replication and 2) thermal management of the process to minimize high temperature exposure. We used a combination of a fast thermocycler (in which DNA spends little time at elevated temperature) with kinetically optimized DNA biochemistry to optimize gene assembly. Using fast PCRJet® thermocycling, not only is the PCA process faster, but unwanted thermal damage (A+G depurination and cytosine deamination) is effectively minimized. As a case study, we selected the de novo synthesis of two genes associated with endothelial protein-C receptor (EPCR): EPCR-1, a synthetic gene of ~612 bp that encodes all the extracellular domains of EPCR, and EPCR-2, a synthetic gene of ~576 bp that encodes the same extracellular domains but lacks the four N-glycosylation sites. For error comparison, a longer gene, the thrombin receptor gene TM, was constructed by both rapid and conventional PCA. The synthetic TM gene (~1548 bp, with a 6-His tag and a factor Xa cleavage site) was designed to encode all the extracellular domains of TM but to lack the transmembrane domain. The correctness of the assembled genes was confirmed by DNA restriction digest analysis, and the fidelity of the synthetic gene constructs assembled by the PCA process was authenticated by dideoxy sequence analysis. We further verified the functionality of our method by cloning and expression in a Pichia pastoris expression system.
2. Materials and Methods
2.1. Oligonucleotide design and synthesis
The templates coding for EPCR-1 (i.e., residues 1 to 193) and EPCR-2 (i.e., residues 1 to 193 but lacking 36 bp corresponding to 4 N-glycosylation sites) were based on the sequence of EPCR. Oligonucleotides for a 612 bp EPCR-1 gene and a 576 bp mutant EPCR-2 gene were designed from a cDNA sequence of the human soluble EPCR deposited in GenBank (GenBank accession no. L35545) using the Gene2Oligo computer program (Rouillard et al., 2004). Using the same program, oligonucleotides for the 1548 bp TM gene were designed from a cDNA sequence of human TM (accession no. BC035602) after codon optimization using P. pastoris codons and addition of a 6-His tag and a factor Xa site. The resulting oligonucleotides were purchased from Integrated DNA Technologies (Coralville, IA). 5′OH oligonucleotides were synthesized on a 25 nmol scale with standard purification, and stock solutions were obtained at 50 μM in water. KOD Hot Start polymerase was obtained from Novagen (Madison, WI). A gel elute extraction kit was obtained from Qiagen (CA).
2.2. Instrumentation
PCA assembly was carried out in a PCRJet® thermocycler (Megabase Research Products, Lincoln, NE). The PCRJet® is a pneumatic device typically operated with 50 – 70 psig compressed air. Fast thermocycling with the PCRJet® is achieved by the flow of high velocity air through a thermostated reaction chamber that houses an array of capillaries. Two inlet air streams (one hot, one cold) are preconditioned to achieve the required temperature and flow rate in the reaction chamber (Moore, 2005; Quintanar and Nelson, 2002). Reactions are carried out in glass capillaries in volumes of 25 μl or less.
2.3. Assembly PCR and amplification of EPCR -1, EPCR -2 and TM genes
The PCA (Stemmer et al., 1995) of the EPCR-1 gene was carried out in a 25 μl volume containing 0.1 μM of each oligonucleotide, 200 μM of each dNTP, 4 mM MgSO4, 400 μg/ml non-acetylated BSA and 0.5 unit KOD Hot-Start Polymerase in 1X manufacturer’s buffer. The assembly PCR reaction was conducted in a PCRJet® under the following conditions: a 30-second hot start at 92°C followed by 30 cycles × [92°C for 1 second and 68°C for 4 seconds]. Figure 1 explains the methodology used to assemble the 35 oligonucleotides into 612 bp molecules. The full-length product from the assembly PCR reaction was amplified by PCR using the outer primers 5′-ATCTCGAGAAAAGATTTTGTAGCCAAGACGCCTC-3′ (forward) and 5′-TAGCGGCCGCTTACGAAGTGTAGGAGCGGCTT-3′ (reverse) at a concentration of 0.5 μM. PCR was carried out in 25 μl reaction volumes containing 0.5 μl assembly PCR mix, 200 μM of each dNTP, 4 mM MgSO4, 400 μg/ml non-acetylated BSA, 0.75 μM of each primer, and 0.5 unit KOD Hot-Start Polymerase in 1X manufacturer’s buffer. PCR was conducted in the PCRJet® thermocycler under the following conditions: a 30-second hot start at 92°C followed by 30 cycles × [92°C for 1 second, 60°C for 2 seconds, and 72°C for 4 seconds].
Figure 1.
The assembly protocol for a 612 bp EPCR Gene 1. A) 35 oligonucleotides were designed by the Gene2Oligo software. The length of overlapping hybridization units is optimized by the software to ensure specificity and uniform melting temperature. F and R correspond to the forward or (+) and the reverse or (−) strand of the input sequence. B) Oligonucleotides are mixed at equimolar concentration and hybridized using polymerase chain assembly procedure (PCA). C) PCA results in a few full length molecules and several incomplete molecules. D) The full length synthetic gene is selectively amplified by PCR with outer primers
For error comparison, a control EPCR gene was assembled under conventional PCR conditions using the same PCRJet®. The assembly PCR reaction was conducted under the following conditions: a 30-second hot start at 92°C followed by 30 cycles × [92°C for 30 seconds and 68°C for 60 seconds]. PCR amplification was conducted under the following conditions: a 30-second hot start at 92°C followed by 30 cycles × [92°C for 30 seconds, 60°C for 30 seconds, and 72°C for 30 seconds]. Synthesis of EPCR-2 was undertaken in a manner similar to EPCR-1.
The PCA of the TM gene was carried out in a 100 μl volume containing 0.1 μM of each oligonucleotide, 200 μM of each dNTP, 4 mM MgSO4, 400 μg/ml non-acetylated BSA and 0.5 units KOD Hot-Start Polymerase in 1X manufacturer’s buffer. The assembly PCR reaction was conducted in a PCRJet® under the following conditions: a 30-second hot start at 94°C followed by 30 cycles × [94°C for 2 seconds, 60°C for 10 seconds, and 72°C for 5 seconds]. The full-length product from the PCA reaction was amplified by PCR using the following outer primers: 5′-ATCTCGAGAAAAGACATCATC-3′ (forward) and 5′-TAGCGGCCGCTTAAGAATG-3′ (reverse). PCR was carried out in 25 μl reaction volumes containing 0.25 μl assembly PCR mix, 200 μM of each dNTP, 4 mM MgSO4, 400 μg/ml non-acetylated BSA, 0.75 μM of each primer, and 0.5 units KOD Hot-Start Polymerase in 1X manufacturer’s buffer. PCR was conducted in the PCRJet® thermocycler under the following conditions: a 30-second hot start at 94°C followed by 35 cycles × [94°C for 2 seconds, 60°C for 3 seconds, and 72°C for 10 seconds].
For error comparison, a control TM gene was assembled under conventional PCR conditions using the same PCRJet®. The assembly PCR reaction was conducted under the following conditions: a 30-second hot start at 94°C followed by 30 cycles × [94°C for 30 seconds, 60°C for 60 seconds, and 72°C for 30 seconds]. PCR amplification was conducted using the above-mentioned primers under the following conditions: a 30-second hot start at 94°C followed by 35 cycles × [94°C for 30 seconds, 60°C for 30 seconds, and 72°C for 60 seconds].
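To make the difference in thermal exposure between the rapid and conventional protocols concrete, the programmed hold times listed in this section can be summed. The sketch below is illustrative: the `Stage` class and `summarize` helper are hypothetical names, ramp times are ignored because they are not listed here, and the hot start is counted as time at the denaturation setpoint.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stage:
    """One thermocycling stage: a hot start followed by identical cycles."""
    hot_start_s: float
    cycles: int
    holds_s: Tuple[float, ...]  # per-cycle holds; the first entry is the denaturation hold

    def hold_time_s(self) -> float:
        # Total programmed hold time (ramps not included).
        return self.hot_start_s + self.cycles * sum(self.holds_s)

    def denaturation_time_s(self) -> float:
        # Time at the 92/94 °C setpoint, counting the hot start (an assumption).
        return self.hot_start_s + self.cycles * self.holds_s[0]


# (assembly stage, amplification stage), hold times as listed in Section 2.3
PROTOCOLS = {
    ("EPCR-1", "rapid"): (Stage(30, 30, (1, 4)), Stage(30, 30, (1, 2, 4))),
    ("EPCR-1", "conventional"): (Stage(30, 30, (30, 60)), Stage(30, 30, (30, 30, 30))),
    ("TM", "rapid"): (Stage(30, 30, (2, 10, 5)), Stage(30, 35, (2, 3, 10))),
    ("TM", "conventional"): (Stage(30, 30, (30, 60, 30)), Stage(30, 35, (30, 30, 60))),
}


def summarize(gene: str, mode: str) -> Tuple[float, float]:
    """Return (total hold time, time at denaturation setpoint) in seconds."""
    stages = PROTOCOLS[(gene, mode)]
    return (sum(s.hold_time_s() for s in stages),
            sum(s.denaturation_time_s() for s in stages))


for gene, mode in PROTOCOLS:
    total_s, denat_s = summarize(gene, mode)
    print(f"{gene:7s}{mode:13s} hold {total_s / 60:6.1f} min, "
          f"at denaturation setpoint {denat_s / 60:5.1f} min")
```

Because ramp times are excluded, the hold-time totals for the rapid protocols (7 minutes for EPCR-1 and about 18 minutes for TM) are lower bounds on the run times of about 10 and 22 minutes reported for the combined steps.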
After amplification, the purified PCR products from both rapid and conventional PCR were digested with Xho I and Not I and then directionally ligated into pPICZαA vector that had previously been digested with both enzymes. The ligated plasmid was transformed into competent cells of E. coli strain TOP-10 (Invitrogen) and 10 positive clones were then selected from each sample. Purified plasmids from the selected clones were sequenced at the University of Nebraska-Lincoln sequencing facility using vector- and gene-specific primers.
2.4. Vector constructions and expression of synthetic and native EPCR-1, EPCR-2 and TM genes in a Pichia pastoris system
Synthetic EPCR-1, EPCR-2, native EPCR-1 [corresponding to the residues 1 – 194, mature protein numbering] (Simmonds and Lane, 1999) and TM [corresponding to the residues 19–515 mature protein numbering, with 6 His and factor Xa site at the N-terminal] genes were expressed in Pichia pastoris (strain X-33) system (Invitrogen, Carlsbad, CA) according to the manufacturer’s protocol. In brief, cDNA encoding human soluble EPCR (GenBank accession no. L35545) was PCR amplified from the Gateway entry vector pENTR™221 (Invitrogen, Life Technologies) using primer pairs designed on the basis of the EPCR sequence (Simmonds and Lane, 1999). PCR was performed using Hot Start DNA polymerase (Stratagene). The respective sequences of the forward and reverse primers used to construct pPICZαA-native EPCR-1 were: (forward) 5′-ATCTCGAGAAAAGATTTTGTAGCCAAGACGCCTC-3′ (reverse) 5′-TAGCGGCCGCTTACGAAGTGTAGGAGCGGCTT-3′. The bold underlined nucleotide bases indicate restriction-enzyme digestion-site for Xho I and Not I, respectively.
The 612 bp EPCR-1, 576 bp EPCR-2 and 1548 bp TM genes were synthesized with added Xho I and Not I restriction sites as described above. The amplified PCR product of the wild-type EPCR-1 gene and the synthetic EPCR-1, EPCR-2 and TM genes were digested with Xho I and Not I for at least 3 hours at 37°C and then directionally ligated into pPICZαA vector that had previously been digested with both enzymes. The ligated plasmid was transformed into competent cells of E. coli strain TOP-10 (Invitrogen) and positive clones were then selected. The expression plasmids were isolated and sequenced to confirm the accuracy of the synthetic genes and to verify that the ligations and inserts were correctly in-frame. Plasmids were linearized with Pme I and transformed into X-33 Pichia pastoris competent cells. Transformation, expression and methanol induction of recombinant proteins were performed according to the manufacturer’s protocol provided with the Pichia expression kit (Invitrogen). Samples of yeast culture supernatants were analyzed by SDS-PAGE and Western blotting. For evaluation of intracellular accumulation of TM, 0.25 gram of cells was disrupted with a Mini-Beadbeater-48 using zirconia/silica beads, the cell lysates were centrifuged at 10,000g at 4°C, and the supernatant and pellet were analyzed by SDS-PAGE and Western blotting. N-Glycosylation of the wild-type and synthetic EPCR-1 and of TM was studied by digestion with endoglycosidase Endo H (New England Biolabs, Beverly, MA).
2.5. SDS-PAGE and Western blotting
For SDS-PAGE, yeast culture medium containing expressed synthetic and native EPCR-1 proteins was concentrated 6-fold using Micron YM-30 (Millipore), and 20 μl of concentrated culture medium from each sample was separated on 10% Bis-Tris NuPAGE gels according to the manufacturer’s protocol (Novex Pre-cast gel, Invitrogen). For Western blotting, protein samples of 2 μl (native and synthetic EPCR-1) or 10 μl (EPCR-2) of yeast culture medium, or 10 micrograms of soluble protein (TM), were separated on 10% Bis-Tris NuPAGE gels (Novex Pre-cast gel, Invitrogen), transferred to a PVDF membrane (Immobilon, Millipore), and then probed with a rabbit anti-human EPCR antibody (for EPCRs) or an anti-His tag monoclonal antibody (for TM). The signals were visualized with ECL Plus detection reagent and Hyperfilm ECL from Amersham Biosciences (Piscataway, NJ).
3. Results
3.1. Rapid Polymerase Chain Assembly of EPCR-1, EPCR-2 and TM genes
The two-step polymerase chain assembly (PCA) method detailed by Stemmer et al. (1995) was adapted to high-speed assembly reaction conditions using the PCRJet®. EPCR-1 (Figure 3A), EPCR-2 (Figure 3B) and TM (Figure 3D) were successfully assembled from 35, 33 and 87 synthetic oligonucleotides, respectively, using PCA followed by amplification of the desired full-length gene by PCR. The gene assembly method we delineate here differs from previously described methods in a number of ways. Our method uses a fast pressurized-gas thermocycler, the PCRJet® (Quintanar and Nelson, 2002), for rapid PCA. As a result, the combined steps of PCR assembly and DNA amplification are completed in about 10 minutes. The temperature versus time profile of the PCA step using the PCRJet® is shown in Figure 2A. The total time for this step is about 4 minutes 30 seconds, including a 30-second hot start. In recent experiments, we reduced the number of cycles to 15, since only a few copies of the full-length assembled product are needed for further PCR amplification. The temperature versus time profile for the PCR amplification step is shown in Figure 2B. The average elongation rate of KOD Pol is 290 – 300 nucleotides per second (Griep et al., 2006); thus, a 4-second elongation is more than sufficient to complete extension. For error comparison, the EPCR-1 and TM genes were also assembled by conventional PCR using the same PCRJet®. Ten clones were selected from each sample and the purified DNA was analyzed by sequencing. No DNA sequence errors were observed in the 612 bp EPCR-1 gene synthesized by fast PCR, whereas 1.1 errors/kb were observed for the same gene assembled by conventional PCR. For the TM gene, error rates of 0.53 errors/kb (fast PCA) versus 0.9 errors/kb (conventional PCA) were observed. Taken together, these findings indicate that fast PCR minimizes thermal damage to DNA and reduces errors in DNA sequences.
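The claim that the elongation holds are long enough can be checked with a one-line estimate. The sketch below assumes that the polymerase extends at the lower quoted rate of 290 nucleotides per second for the whole hold and ignores enzyme binding and ramp times; the function name is illustrative.

```python
def full_length_extension_time_s(length_nt: int, rate_nt_per_s: float) -> float:
    """Time to copy a template of the given length at a constant elongation rate."""
    return length_nt / rate_nt_per_s


KOD_RATE_LOW = 290.0  # nt/s, lower bound of the range quoted in the text

# gene: (length in nt, 72 °C elongation hold in the rapid amplification step, s)
CASES = {"EPCR-1": (612, 4.0), "EPCR-2": (576, 4.0), "TM": (1548, 10.0)}

for gene, (length_nt, hold_s) in CASES.items():
    needed_s = full_length_extension_time_s(length_nt, KOD_RATE_LOW)
    print(f"{gene}: needs ~{needed_s:.1f} s of extension, hold is {hold_s:.0f} s")
```

Under these assumptions each hold exceeds the estimated full-length extension time by roughly a factor of two.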
Figure 3.
Two-step polymerase chain assembly of a 612 bp EPCR gene 1 (A) and a 576 bp EPCR-2 gene (B). A) Eight microliters of each sample was analyzed on a 2% agarose gel. Lanes: (M) 1 Kb DNA Ladder (Fermentas Inc.); Lane 1: 612 bp assembly PCR product with a 4-second elongation; Lane 2: 612 bp assembly PCR product with 5-second elongation. B) Eight microliters of each sample was analyzed on a 2% agarose gel. Lanes: (M) 100 bp DNA Ladder (Fermentas Inc.); Lanes 1 and 2: 576 bp assembly PCR product with a 4-second elongation. C) Gel purified EPCR genes 1 and 2. Eight microliters of each sample was analyzed on a 2% agarose gel. Lanes: (M) 100 bp DNA Ladder (Fermentas Inc.); Lane 1: gel purified 612 bp EPCR gene 1, Lane 2: gel purified 576 bp EPCR-2 gene. D) gel purified 1548 bp TM gene
Figure 2.
Temperature versus time profiles for two-step polymerase chain assembly in the PCRJet® thermocycler. A) First-round assembly PCR: 30-second KOD Pol hot start at 92°C followed by 30 cycles × [1 second at 92°C and 4 seconds at 68°C] in a 25 μl reaction volume. B) Second-round PCR amplification: 0.5 μl of the first-round reaction product was amplified in a new 25 μl reaction volume containing 0.5 μM outer primers: a 30-second hot start at 92°C followed by 30 cycles × [1 second at 92°C, 2 seconds at 60°C, and 4 seconds at 72°C].
3.2. Expression of synthesized genes in Pichia pastoris
The functionality of the synthesized genes was verified by expressing synthetic EPCR-1, synthetic EPCR-2, native EPCR-1 (for a native versus synthetic comparison) and TM genes in the methylotrophic yeast Pichia pastoris, a system highly successful for expressing heterologous proteins (Daly and Hearn, 2005). Four constructs were prepared for recombinant protein expression: 1) pPICZαA/EPCR-1 (native), 2) pPICZαA/EPCR-1 (synthetic), 3) pPICZαA/EPCR-2 (synthetic) and 4) pPICZαA/TM (synthetic). For expression, all four constructs were linearized with Pme I in the vector 5′ AOX1 promoter region, allowing integration into the AOX1 locus of the X-33 genome by homologous recombination. Transformants were selected by their ability to grow on Zeocin-containing medium. The selected clones were induced according to the manufacturer’s protocol provided with the Pichia expression kit (Invitrogen), and significant secretion of the EPCR proteins was achieved after 24 hours; TM, however, was not secreted into the culture medium but was highly expressed intracellularly (see below). Yeast culture supernatants were analyzed by SDS-PAGE and Western blotting using rabbit anti-human soluble EPCR antibody. The SDS-PAGE and Western blotting results presented in Figure 4 show that the proteins expressed by the native and synthetic EPCR-1 genes are similar in three respects: a) both are immunoreactive with rabbit anti-human soluble EPCR antibody, b) both show the same migration pattern on denaturing SDS-PAGE, with a band of ~40 kDa (Figure 4A, lanes 1 and 2), and c) incubation of both gene products with N-glycosidase F resulted in a single, uniform lower band of ~21 kDa (Figure 4B, lanes 2 and 3 and Figure 4C, lanes 2 and 3), indicating that glycosylation was similar for the native and synthetic genes.
Based on Western blotting (Figure 4A, lanes 1 and 2; Figure 4B, lanes 2 and 3) and SDS-PAGE results (Figure 4C, lanes 1 and 2), the expression efficiencies of the native and the synthetic genes were found to be very similar. Figure 4B, lane 1, shows that the product of the synthetic EPCR-2 gene, which lacks 4 glycosylation sites, was also expressed in Pichia pastoris, cross-reacted well with rabbit anti-human soluble EPCR antibody, and migrated as an ~20.5 kDa protein, the expected size. The expressed TM was analyzed by Western blotting using anti-His tag monoclonal antibody. As described above, recombinant TM (rTM), a highly disulfide-bonded protein, was not secreted into the culture medium; however, high-level expression of rTM under the AOX1 promoter was observed intracellularly as a ~70 kDa glycoprotein with a deglycosylated molecular mass of ~60 kDa, the expected size (Fig. 4D). The molecular mass of expressed recombinant TM was higher under reducing conditions (data not shown) than under non-reducing conditions, similar to that of TM purified from human lung endothelial membrane preparations. Collectively, these findings provide evidence that 1) the synthetic genes synthesized by our method produced functional proteins and 2) high-speed polymerase chain assembly is accurate.
Figure 4.
Expression of recombinant synthetic and native EPCR-1, EPCR-2 and TM genes in P. pastoris. A and B) Native EPCR-1 (A, lane 1), synthetic EPCR-1 (A, lane 2), and synthetic EPCR-2 (B, lane 1) were subjected to SDS-PAGE followed by Western blotting with a rabbit anti-human sEPCR antibody. 2 μl (native and synthetic EPCR-1) and 10 μl (EPCR-2) of yeast culture supernatant were electrophoresed under reducing conditions. Lanes 2 and 3 (B) show native and synthetic EPCR-1 after deglycosylation. Proteins from 2 μl (native and synthetic EPCR-1) of yeast culture supernatant were deglycosylated by digestion with Peptide-N-Glycosidase F endoglycosidase (New England Biolabs). C) Proteins from yeast culture medium were concentrated 6-fold using Micron YM-30 (Millipore) and 20 μl of concentrated culture medium from each sample was subjected to 10% SDS-PAGE. M: protein standards; 1: P. pastoris clone that secretes native EPCR-1; 2: P. pastoris clone that secretes synthetic EPCR-1. D) Cell lysate of P. pastoris harboring the TM construct was prepared as described in Materials and Methods. 10 μg of soluble proteins were run on SDS-PAGE followed by Western blotting using anti-His tag monoclonal antibody (lane 1) as described in Materials and Methods. 10 μg of soluble proteins were deglycosylated by treatment with Endo H endoglycosidase (New England Biolabs) (lane 2).
4. Discussion
High-fidelity molecular DNA synthesis requires an awareness of biosynthetic errors and a practical strategy for the minimization of these errors. Error control requires a fundamental understanding of 1) the relationship between PCR kinetics and the errors that occur during polymerase-catalyzed nucleotide insertions and 2) thermal management to minimize high temperature exposure. The combination of a fast thermocycler (in which DNA spends very little time at elevated temperatures) with kinetically optimized DNA biochemistry is ideal to assemble synthetic DNA fragments.
Three sources of error may be identified during the construction of synthetic DNA molecules: 1) errors from the phosphoramidite synthesis of the oligodeoxyribonucleotides, 2) editing errors that occur during DNA polymerase-catalyzed enzymatic copying, and 3) errors that result from thermal damage to DNA. We used synthetic 5′ hydroxylated oligonucleotides obtained from a commercial vendor; thus, deletions, mismatch errors, and/or guanine oxidation products that occur during phosphoramidite synthesis (23–25) were beyond our control. However, shorter synthetic oligomers have fewer errors – for that reason, we limited their size to ≤40 nucleotides.
The second source of error stems from enzyme editing mistakes when annealed oligomers are enzymatically extended. Extension errors depend on the fidelity of the polymerase. Extension errors can be reduced with a high fidelity polymerase enzyme and optimization of the biochemical reaction conditions during DNA extension. The fast thermostable β-type DNA polymerase from Pyrococcus kodakaraensis (KOD Pol; Toyobo Co. Ltd., Osaka; Mizuguchi et al., 1999) has proofreading capability and is preferred for rapid DNA amplification. The error frequency of KOD Pol under equimolar dNTP pool conditions was reported as 1.1 errors per million base pairs (Mizuguchi et al., 1999). In other words, errors induced by KOD Pol are expected to occur at a rate significantly lower than the rate of errors stemming from phosphoramidite synthesis. A further advantage of KOD Pol is its high extension rate, typically 270 – 300 nucleotides per second (Griep et al., 2006a; Griep et al., 2006b), which further promotes rapid PCR assembly.
The greatest source of error in synthetic DNA molecule construction is thermally induced damage. Generally speaking, thermal damage to DNA is minimized when thermolabile nucleic acids spend as little time as possible at elevated temperatures (> 50°C). For dsDNA, the rate of depurination increases four-fold for every 10°C increase in temperature. The depurination rate of ssDNA is approximately 3.3 times higher than that of dsDNA (Lindahl and Nyberg, 1972). Likewise, cytosine deamination occurs much faster in ssDNA than in dsDNA (Lindahl and Nyberg, 1972). Therefore, enzymatic steps such as a 65°C overnight Taq ligase reaction (Smith et al., 2003) are expected to result in substantial thermal depurination and C deamination to U in DNA, on the order of 2 – 3 errors/1,000 bp. Furthermore, many commonly used PCR thermocycling protocols, such as that of Saiki et al. (1985; 1 minute at 94°C, 1 minute at 55°C, and 1 minute at 72°C), spend unnecessarily long times at elevated temperatures. High-speed thermocycling protocols (Quintanar and Nelson, 2002; Whitney et al., 2004; Wittwer et al., 1990; Padhye et al., 1990) that use parameters such as 0.5 – 2.0 seconds at 94°C, 0.5 – 10.0 seconds at 55°C, and 0.5 – 10.0 seconds at 72°C avoid excess “cooking” of nucleic acids, especially in the thermolabile ssDNA form. With fast PCRJet® thermocycling, not only is the PCR process faster, but unwanted thermal damage (A+G depurination and cytosine deamination) is effectively minimized. Our results indicate that fast PCR indeed minimizes thermal damage to DNA and substantially reduces errors in DNA sequences. No DNA sequence errors were observed in the 612 bp EPCR-1 gene synthesized by fast PCR, whereas 1.1 errors/kb were observed for the same gene assembled by conventional PCR using the same PCRJet®. For the TM gene, assembled by both fast and conventional PCR, the error rates were 0.53 errors/kb and 0.9 errors/kb, respectively.
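The temperature rule of thumb quoted above (a four-fold increase in the dsDNA depurination rate per 10°C, and a 3.3-fold higher rate for ssDNA) can be turned into a small relative-rate calculation. This is only a sketch: the function name and the 72°C reference temperature are arbitrary choices, and applying the same temperature dependence to ssDNA is an assumption that the text does not state.

```python
def relative_depurination_rate(temp_c: float,
                               ref_temp_c: float = 72.0,
                               single_stranded: bool = False) -> float:
    """Depurination rate relative to dsDNA at ref_temp_c.

    Uses the four-fold-per-10-°C rule for dsDNA; the 3.3x ssDNA factor is
    applied at every temperature, which is an assumption.
    """
    rate = 4.0 ** ((temp_c - ref_temp_c) / 10.0)
    return rate * 3.3 if single_stranded else rate


for temp_c in (55.0, 72.0, 94.0):
    ds = relative_depurination_rate(temp_c)
    ss = relative_depurination_rate(temp_c, single_stranded=True)
    print(f"{temp_c:.0f} °C: dsDNA x{ds:.2f}, ssDNA x{ss:.2f} (relative to dsDNA at 72 °C)")
```

With this rule, each second at a 94°C denaturation setpoint contributes roughly twenty times the dsDNA depurination of a second at 72°C, which is why shortening the denaturation hold matters most.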
As mentioned earlier, we used synthetic 5′ hydroxylated oligonucleotides obtained from a commercial vendor, so any errors introduced during phosphoramidite synthesis were extrinsic. However, shorter synthetic oligomers are known to have fewer errors and, for that reason, we limited their size to 40 nucleotides. In the case of EPCR-1, we did not observe any errors in the ten clones that were sequenced. This suggests that the starting oligonucleotides used for PCA had an error rate of less than 1:6000. High-throughput gene synthesis on microchips using an improved oligonucleotide synthesis method was recently reported (Tian et al., 2004; Engels, 2005). Tian et al. (2004) developed a stringent hybridization method to remove error-containing oligonucleotides and, using such oligonucleotides, assembled a complete operon of 14.6 kb encoding 21 genes. The error rate for the oligonucleotides was estimated to be 1:7300 bp, confirming that the error rate of synthetic oligonucleotides is one of the limiting factors.
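The 1:6000 figure follows from the number of bases sequenced without an observed error. The sketch below reproduces that arithmetic and, as a rough consistency check, converts the reported error rates back into approximate error counts. It assumes that all ten clones per sample were sequenced over the full gene length; the implied counts are back-calculations for illustration, not reported data.

```python
def bases_sequenced(n_clones: int, gene_length_bp: int) -> int:
    """Total bases covered when every clone is sequenced over the full gene."""
    return n_clones * gene_length_bp


def implied_error_count(rate_per_kb: float, n_clones: int, gene_length_bp: int) -> float:
    """Error count implied by a reported errors/kb rate (back-calculated)."""
    return rate_per_kb * bases_sequenced(n_clones, gene_length_bp) / 1000.0


epcr_bases = bases_sequenced(10, 612)
print(f"EPCR-1: 0 errors in {epcr_bases} bp, i.e. fewer than 1 error per {epcr_bases} bp")

reported = {
    "EPCR-1, conventional": (1.1, 612),
    "TM, fast": (0.53, 1548),
    "TM, conventional": (0.9, 1548),
}
for label, (rate_per_kb, length_bp) in reported.items():
    count = implied_error_count(rate_per_kb, 10, length_bp)
    print(f"{label}: ~{count:.1f} errors implied")
```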
A quantitative model of error production during the PCA process was recently developed (Pienaar et al., 2006). Errors due to thermal damage stem mostly from three sources: 1) A+G depurination (Lindahl and Nyberg, 1972), 2) oxidative damage of guanine to 8-oxoG (Frelon et al., 2002; Hsu et al., 2004), and 3) cytosine deamination to uracil (Lindahl and Nyberg, 1974). Oxidative damage can be reduced in PCR experiments by purging mixtures with argon to remove dissolved oxygen. Cytosine deamination is slowest at pH 8 to 8.5; the rate increases sharply at higher pH and more gradually at lower pH (when a small positive salt effect is also exhibited). The deamination rates are different for ssDNA and dsDNA; Fryxell and Zuckerkandl (2000) used data from three different sources to obtain the following dsDNA deamination rate:
C deamination of double stranded DNA
| (1) |
Lindahl and Nyberg (1974) measured cytosine deamination rates in denatured E. coli DNA. At 95°C and 80°C, the rates are 2.2×10⁻⁷ s⁻¹ and 1.3×10⁻⁸ s⁻¹, respectively. If an Arrhenius plot is fitted between the two points, the rate constant is:
C deamination of single stranded DNA
| (2) |
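The two-point Arrhenius fit described above can be reproduced numerically from the two quoted rate constants. The sketch below is an independent calculation, assuming the standard form k = A·exp(−Ea/RT); the resulting parameters may differ in rounding from those in the original equation (2), and the function names are illustrative.

```python
import math

R_GAS = 8.314462618  # J/(mol K)


def arrhenius_two_point(t1_c: float, k1: float, t2_c: float, k2: float):
    """Fit k = A * exp(-Ea / (R T)) exactly through two (temperature, rate) points."""
    t1_k, t2_k = t1_c + 273.15, t2_c + 273.15
    ea = R_GAS * math.log(k1 / k2) / (1.0 / t2_k - 1.0 / t1_k)  # J/mol
    a = k1 * math.exp(ea / (R_GAS * t1_k))                      # 1/s
    return a, ea


def rate_constant(temp_c: float, a: float, ea: float) -> float:
    """Evaluate the fitted rate constant (1/s) at a temperature in °C."""
    return a * math.exp(-ea / (R_GAS * (temp_c + 273.15)))


# ssDNA cytosine deamination rates quoted in the text (Lindahl and Nyberg, 1974)
A_SS, EA_SS = arrhenius_two_point(95.0, 2.2e-7, 80.0, 1.3e-8)
print(f"Ea = {EA_SS / 1000:.1f} kJ/mol, A = {A_SS:.2e} 1/s")

# Interpolated value at a typical extension temperature
print(f"k(72 °C) = {rate_constant(72.0, A_SS, EA_SS):.2e} 1/s")
```

Because the fit uses only two points, values outside the 80 to 95°C range are extrapolations and should be treated with caution.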
Cytosine deamination of ssDNA not only has a much larger activation energy than deamination of dsDNA, but is also much faster over the temperature range 60 – 95°C.
Depurination reactions are strongly catalyzed at low pH. Based on the experimental data of Lindahl and Nyberg (1972), the rate constant for depurination of dsDNA at pH 7.4 is:
A+G depurination of double stranded DNA
| (3) |
The depurination rate constant of ssDNA is nominally 3.3 times higher than for dsDNA as illustrated elsewhere (28):
A+G depurination of single stranded DNA
| (4) |
To apply equations (1 – 4) to a target template, it is necessary to know the degree of melting. At the beginning of a PCR cycle (i.e., the completion of primer/template annealing), the bulk of the template is in the single stranded form. As extension progresses, the fraction of the template that is double stranded increases. However, it is not correct to assume the polymerase/template complex marks the transition between dsDNA and ssDNA. Double stranded DNA may form bubbles, and localized melting may occur even at moderate temperatures. As the temperature increases to the denaturing temperature, the fraction of ssDNA template increases. Thus, a model that describes the helix/coil transitions is needed to assess the thermal damage.
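One simple way to express this dependence on the degree of melting is to weight the single- and double-stranded rate constants by the single-stranded fraction. The expression below is an assumed illustration, not the model of Pienaar et al. (2006): the symbols f_ss (fraction of template that is single stranded), k_eff (effective damage rate constant), D (damage accumulated per base over one cycle), and t_cycle (cycle duration) are introduced here for illustration only, while k_ss and k_ds stand for the ssDNA and dsDNA rate constants of equations (1 – 4).

```latex
k_{\mathrm{eff}}(t) = f_{ss}(t)\,k_{ss}\bigl(T(t)\bigr)
                    + \bigl[1 - f_{ss}(t)\bigr]\,k_{ds}\bigl(T(t)\bigr),
\qquad
D = \int_{0}^{t_{\mathrm{cycle}}} k_{\mathrm{eff}}(t)\,dt
```

In this form, shortening the time spent near the denaturation temperature reduces D in two ways: the integration interval is shorter, and less time is spent where both f_ss and the rate constants are largest.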
A comparison of the protocol proposed by Saiki et al. (1985) (1 minute at 94°C, 1 minute at 55°C, and 1 minute at 72°C) with a typical PCRJet® protocol (0 seconds at 94°C, 1 second at 55°C, and 2 seconds at 72°C) shows that the total times per cycle (including ramp times) are 206 seconds and 4.66 seconds, respectively. In the fast protocol, the sample is first heated to 94°C. No time is spent at the denaturation temperature, and the sample is immediately cooled to 55°C. It is annealed for one second, and the temperature is then raised at a rate of 47°C/second to 72°C, where it remains for 2 seconds before it is raised to 94°C again for the next cycle. The total thermal errors over one PCR cycle (depurination and C deamination) are 7.2 per 10⁶ bp for the slow protocol and 0.026 per 10⁶ bp for the fast protocol. The bulk of the damage occurs during the denaturation hold. Thus, a roughly 50-fold reduction in the PCR period leads to a roughly 270-fold reduction in errors.
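The fold reductions quoted above follow directly from the per-cycle figures. The short sketch below recomputes them; the ratios come out near 44 and 277, in line with the rounded values in the text. The helper `expected_thermal_errors` is hypothetical and assumes that damage accumulates linearly with the number of cycles and with template length, which is a simplification; the cycle count of 30 used in the example is a placeholder, not a value from this work.

```python
# Per-cycle figures quoted in the text
SLOW_CYCLE_S = 206.0        # Saiki et al. (1985) protocol, seconds per cycle
FAST_CYCLE_S = 4.66         # PCRJet protocol, seconds per cycle
SLOW_ERRORS_PER_MBP = 7.2   # thermal errors per 1e6 bp per cycle
FAST_ERRORS_PER_MBP = 0.026

time_fold = SLOW_CYCLE_S / FAST_CYCLE_S
error_fold = SLOW_ERRORS_PER_MBP / FAST_ERRORS_PER_MBP


def expected_thermal_errors(length_bp, cycles, errors_per_mbp_per_cycle):
    """Expected thermal errors in one template copy.

    Assumes (hypothetically) that damage adds up linearly over cycles
    and over template length.
    """
    return length_bp * cycles * errors_per_mbp_per_cycle * 1e-6


if __name__ == "__main__":
    print(f"cycle time reduction: {time_fold:.1f}-fold")
    print(f"thermal error reduction: {error_fold:.0f}-fold")
    # Placeholder example: 1548 bp TM gene, 30 cycles (cycle count assumed)
    for name, e in (("slow", SLOW_ERRORS_PER_MBP), ("fast", FAST_ERRORS_PER_MBP)):
        print(name, f"{expected_thermal_errors(1548, 30, e):.4f} errors per copy")
```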
Rapid Polymerase Chain Assembly of DNA with the PCRJet®
The two-step polymerase chain assembly method detailed by Stemmer et al. (1995) was adapted to high-speed assembly reaction conditions using the PCRJet® to assemble the synthetic 612 bp EPCR-1 and 1548 bp TM genes from 35 and 87 synthetic 5′-OH oligonucleotides, respectively. A web-based tool (Gene2Oligo), which designs oligonucleotides of uniform melting temperature (Tm), was used to design the oligonucleotides. For both EPCR-1 and EPCR-2, the Tm ranged from 67°C to 75°C, with an average of 69.8°C. For the TM gene, the Tm ranged from 60°C to 66°C, with an average of 62°C. We performed the assembly reactions at the highest possible temperature because mismatches have a stronger destabilizing effect at high temperature, which diminishes the probability of incorporating mutated oligonucleotides into the final product. The full-length assembled product was further amplified by PCR. The total time needed for PCRJet® polymerase chain amplification of the 612 bp EPCR-1 gene, starting with a mixture of 35 synthetic 5′-OH oligonucleotides, was about 10 minutes; for the 1548 bp TM gene, it was about 22 minutes. The fidelity of the synthesized PCA constructs was authenticated by dideoxy sequence analysis. The DNA sequences obtained were consistent with the cDNA sequences of the soluble EPCR and TM deposited in GenBank (accession nos. L35545 and D00210). On average, ten clones were sequenced for each PCA construct assembled, and the types of observed mutations are shown in Table 1. As the table shows, errors stem mostly from (A+G) depurination, oxidative damage of G to 8-oxoG, and C deamination. These results support our hypothesis that thermal damage is a source of error in the gene assembly process. Additionally, the discrepancy between the fast-protocol error rates for EPCR-1 (zero) and TM (0.53 errors/kb) may have arisen from the difference in size between TM (~1548 bp) and EPCR (~612 bp).
Although only the final, optimal conditions are reported in this paper, the best annealing temperature for assembly was always determined by an optimization scheme in which the assembly process was started 8°C to 10°C below the average melting temperature and the temperature was gradually increased to produce the highest yield. In this way, we were able to achieve both specificity and rapid assembly. The time required for annealing during the assembly step is longer than what is required for standard PCR; this reflects the time required for the oligos to hybridize. The annealing time has been optimized with gene fragments of various lengths (data not shown); we used the standard time based on the number of oligos. After the assembly and expression of the synthetic wild type was authenticated, the deletion mutant EPCR-2 (576 bp), which lacked 4 glycosylation sites, was assembled from a template, resulting in no detectable errors in the DNA sequence of the assembled EPCR-2 gene product.
Table 1.
Comparison of observed mutations in the EPCR-1 (612 bp) and TM (1548 bp) genes assembled by conventional and fast PCR
| Assembled genes | Number of clones sequenced | Errors/kb* | Total errors in 10 sequenced clones | Observed mutations at A, % | Observed mutations at T, % | Observed mutations at C, % | Observed mutations at G, % |
|---|---|---|---|---|---|---|---|
| EPCR-1 conventional PCR | 10 | 1.1 | 7 | 28.5 | 14.28 | 42.85 | 14.28 |
| EPCR-1 fast PCR | 10 | 0 | 0 | 0 | 0 | 0 | 0 |
| TM-1 conventional PCR | 10 | 0.9 | 14 | 35.71 | 28.57 | 14.28 | 21.42 |
| TM-1 fast PCR | 10 | 0.53 | 8 | 25.0 | 0 | 50.0 | 25.0 |
*The numbers are reported in the manuscript.
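The errors/kb column can be checked against the raw counts. The sketch below assumes that errors/kb is the total number of errors divided by the total sequenced length (clones × gene length); this derivation is an assumption, since the manuscript does not spell out the calculation. Under that assumption the recomputed values are approximately 1.14, 0, 0.90, and 0.52, close to the tabulated values (the tabulated 0.53 for TM fast PCR differs slightly, possibly because of rounding or a different sequenced length). The same sequenced length also gives the detection bound for a construct with no observed errors, for example 10 × 612 = 6120 bp for EPCR-1.

```python
def errors_per_kb(total_errors, clones, gene_length_bp):
    """Errors per kilobase over all sequenced clones (assumed derivation)."""
    sequenced_bp = clones * gene_length_bp
    return 1000.0 * total_errors / sequenced_bp


def sequenced_length_bp(clones, gene_length_bp):
    """Total bases read; with zero errors, the error rate is below 1 per this length."""
    return clones * gene_length_bp


# (total errors, clones sequenced, gene length in bp), taken from Table 1
TABLE_1 = {
    "EPCR-1 conventional PCR": (7, 10, 612),
    "EPCR-1 fast PCR": (0, 10, 612),
    "TM-1 conventional PCR": (14, 10, 1548),
    "TM-1 fast PCR": (8, 10, 1548),
}

if __name__ == "__main__":
    for name, (errors, clones, length) in TABLE_1.items():
        print(f"{name}: {errors_per_kb(errors, clones, length):.2f} errors/kb")
    print("EPCR-1 sequenced length:", sequenced_length_bp(10, 612), "bp")
```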
The following conclusions can be drawn from the work described here: 1) our de novo gene assembly methodology allows genes to be synthesized on an expedited time scale once important proteins are identified; 2) the fast PCA process affords the efficient assembly of genes from templates of putative DNA sequences with comparably low error rates; 3) the resulting synthetic genes can further be cloned and overexpressed to provide large quantities of pure protein, which may then be used to generate high-affinity reagents in protein arrays; and 4) although we described the synthesis of a deletion mutant, our strategy can also be used to generate genes with substitutions, insertions, or targeted rearrangements in an expeditious manner. We believe the proposed strategy of rapid de novo gene synthesis, when coupled with proteomics, has the potential to accelerate the development of new disease markers and drug targets via the analysis of clinically relevant molecular events. Our future research efforts will aim to synthesize larger genes using an automated strategy in the rapid PCA thermocycler and to undertake the PCA and PCR of genes with a high GC content to generate error-time-temperature plots.
Acknowledgments
The authors thank Casey Kotera for his technical assistance. This work was supported in part by funds from the NIH (1 R21 RR022860-01).
References
- Agarwal KL, Buchi H, Caruthers MH, Gupta N, Khorana HG, Kleppe K, Kumar A, Ohtsuka E, Rajbhandary UL, Van de Sande JH, Sgaramella V, Weber H, Yamada T. Total Synthesis of the Gene for an Alanine Transfer Ribonucleic Acid from Yeast. Nature. 1970;227:27–34. doi: 10.1038/227027a0.
- Alvarado-Urbina G, Sathe GM, Liu WC, Gillen MF, Duck PD, Bender R, Ogilvie KK. Automated Synthesis of Gene Fragments. Science. 1981;214:270–273. doi: 10.1126/science.6169150.
- Barnett RW, Erfle H. Rapid Generation of DNA Fragments by PCR Amplification of Crude, Synthetic Oligonucleotides. Nucleic Acids Res. 1990;18:3094. doi: 10.1093/nar/18.10.3094.
- Beaucage SL, Iyer RP. Advances in the Synthesis of Oligonucleotides by the Phosphoramidite Approach. Tetrahedron. 1992;48:2223–2311.
- Binkowski BF, Richmond KE, Kaysen J, Sussman R, Belshaw PJ. Correcting errors in synthetic DNA through consensus shuffling. Nucleic Acids Res. 2005;33:e55. doi: 10.1093/nar/gni053.
- Ciccarelli RB, Bunyuzlu P, Huan J, Scott C, Oakes F. Construction of synthetic genes using PCR after automated DNA synthesis of their entire top and bottom strands. Nucleic Acids Res. 1991;19:6007–6013. doi: 10.1093/nar/19.21.6007.
- Daly R, Hearn MT. Expression of heterologous proteins in Pichia pastoris: A useful experimental tool in protein engineering and production. J Mol Recognit. 2005;18:119–138. doi: 10.1002/jmr.687.
- Eadie JS, Davidson DS. Guanine modification during chemical DNA synthesis. Nucleic Acids Res. 1987;15:8333–8349. doi: 10.1093/nar/15.20.8333.
- Engels JW, Uhlmann E. Gene Synthesis. Angewandte Chemie International Edition. 1989;28:716–733.
- Engels JW. Gene Synthesis on Microchips. Angewandte Chemie International Edition. 2005;44:7166–7169. doi: 10.1002/anie.200501896.
- Frelon S, Douki T, Cadet J. Radical oxidation of the adenine moiety of nucleoside and DNA: 2-hydroxy-2′-deoxyadenosine is a minor decomposition product. Free Radic Res. 2002;36:499–508. doi: 10.1080/10715760290025889.
- Fryxell KJ, Zuckerkandl E. Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol Biol Evol. 2000;14:1371–1383. doi: 10.1093/oxfordjournals.molbev.a026420.
- Gao X, Yo P, Keith A, Ragan T, Harris T. Thermodynamically Balanced Inside-Out (TBIO) PCR-Based Gene Synthesis: A Novel Method of Primer Design for High-Fidelity Assembly of Longer Gene Sequences. Nucleic Acids Res. 2004;31:e143–e164. doi: 10.1093/nar/gng143.
- Griep M, Whitney S, Nelson M, Viljoen H. AIChE Journal. 2006;52:384–392.
- Griep M, Kotera C, Nelson M, Viljoen HJ. Comp Biol & Chem. 2006. In press.
- Hsu GW, Ober M, Carell T, Beese LS. Error-prone replication of oxidatively damaged DNA by a high-fidelity DNA polymerase. Nature. 2004;431:217–221. doi: 10.1038/nature02908.
- Hunkapiller M, Kent S, Caruthers M, Hood L. A microchemical facility for the analysis and synthesis of genes and proteins. Nature. 1984;310:105–111. doi: 10.1038/310105a0.
- Khorana HG. Total Synthesis of a Gene. Science. 1979;203:614–625. doi: 10.1126/science.366749.
- Kodumal SJ, Patel KG, Reid R, Menzella HG, Welch M, Santi DV. Total synthesis of long DNA sequences: synthesis of a contiguous 32-kb polyketide synthase gene cluster. Proc Natl Acad Sci U S A. 2004;101:5573–8. doi: 10.1073/pnas.0406911101.
- Lindahl T, Nyberg B. Rate of depurination of native DNA. Biochemistry. 1972;11:3610–3618. doi: 10.1021/bi00769a018.
- Lindahl T, Nyberg B. Heat-induced deamination of cytosine residues in deoxyribonucleic acid. Biochemistry. 1974;13:3405–3410. doi: 10.1021/bi00713a035.
- McClain WH, Foss K, Mittelstadt KL, Schneider J. Variants in clones of gene-machine-synthesized oligodeoxynucleotides. Nucleic Acids Res. 1986;14:6770. doi: 10.1093/nar/14.16.6770.
- Mehta DV, DiGate RJ, Banville DL, Guiles RD. Optimized Gene Synthesis, High Level Expression, Isotopic Enrichment, and Refolding of Human Interleukin-5. Protein Expression & Purification. 1997;11:86–94. doi: 10.1006/prep.1997.0785.
- Mizuguchi H, Nakatsuji M, Fujiwara S, Takagi M, Imanaka T. Characterization and application to hot start PCR of neutralizing monoclonal antibodies against KOD DNA polymerase. J Biochem. 1999;126:762–768. doi: 10.1093/oxfordjournals.jbchem.a022514.
- Moore P. PCR: Replicating Success. Nature. 2005;435:235–238. doi: 10.1038/435235a.
- Quintanar A, Nelson RM. A process and apparatus for high-speed amplification of DNA. US Patent No. 6,472,186. 2002.
- Padhye NV, Nelson RM, Quintanar A, Viljoen HJ, Whitney SE, Shoemaker D, Henchal EA. High-speed Amplification of Bacillus anthracis DNA using a Helium-CO2 Gas Thermocycler. Genetic Engineering News. 2002;22(11):42–43.
- Pienaar E, Theron M, Nelson M, Viljoen HJ. A quantitative model of error accumulation during PCR amplification. Comp Biol Chem. 2006. doi: 10.1016/j.compbiolchem.2005.11.002.
- Rouillard JM, Lee W, Truan G, Gao X, Zhou X, Gulari E. Gene2Oligo: oligonucleotide design for in vitro gene synthesis. Nucleic Acids Res. 2004;32:W176–180. doi: 10.1093/nar/gkh401.
- Saiki RK, Scharf S, Faloona F, Mullis KB, Horn GT, Erlich HA, Arnheim N. Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science. 1985;230:1350–1354. doi: 10.1126/science.2999980.
- Shevchuk NA, Bryksin AV, Nusinovich YA, Cabello FC, Sutherland M, Ladisch S. Construction of long DNA molecules using long PCR-based fusion of several fragments simultaneously. Nucleic Acids Res. 2004;32:e19–e47. doi: 10.1093/nar/gnh014.
- Simmonds RE, Lane DA. Structural and functional implications of the intron/exon organization of the human endothelial cell protein C/activated protein C receptor (EPCR) gene: comparison with the structure of CD1/major histocompatibility complex alpha1 and alpha2 domains. Blood. 1999;94:632–41.
- Smith HO, Hutchison CA, Pfannkoch C, Venter JC. Generating a synthetic genome by whole genome assembly: ΦX174 bacteriophage from synthetic oligonucleotides. Proc Natl Acad Sci U S A. 2003;100:15440–15445. doi: 10.1073/pnas.2237126100.
- Stemmer WPC, Crameri A, Ha KD, Brennan TM, Heyneker HL. Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene. 1995;164:49–53. doi: 10.1016/0378-1119(95)00511-4.
- Tian J, Gong H, Sheng N, Zhou X, Gulari E, Gao X, Church G. Accurate multiplex gene synthesis from programmable DNA microchips. Nature. 2004;432:1050–1053. doi: 10.1038/nature03151.
- Whitney SE, Sudhir A, Nelson RM, Viljoen HJ. Principles of rapid polymerase chain reactions: mathematical modeling and experimental verification. Computational Biology and Chemistry. 2004;28:195–209. doi: 10.1016/j.compbiolchem.2004.03.001.
- Wittwer CT, Filmore GC, Garling DJ. Minimizing the time required for DNA amplification by efficient heat transfer to small samples. Anal Biochem. 1990;186:328–331. doi: 10.1016/0003-2697(90)90090-v.
- Young L, Dong Q. Two-step total gene synthesis method. Nucleic Acids Res. 2004;32:e59–e71. doi: 10.1093/nar/gnh058.