Proceedings of the National Academy of Sciences of the United States of America. 2004 Oct 20;101(44):15573–15578. doi: 10.1073/pnas.0406911101

Total synthesis of long DNA sequences: Synthesis of a contiguous 32-kb polyketide synthase gene cluster

Sarah J Kodumal 1, Kedar G Patel 1, Ralph Reid 1, Hugo G Menzella 1, Mark Welch 1, Daniel V Santi 1,*
PMCID: PMC524854  PMID: 15496466

Abstract

To exploit the huge potential of whole-genome sequence information, the ability to efficiently synthesize long, accurate DNA sequences is becoming increasingly important. An approach proposed toward this end involves the synthesis of ≈5-kb segments of DNA, followed by their assembly into longer sequences by conventional cloning methods [Smith, H. O., Hutchison, C. A., III, Pfannkoch, C. & Venter, J. C. (2003) Proc. Natl. Acad. Sci. USA 100, 15440–15445]. The major current impediment to the success of this tactic is the difficulty of building the ≈5-kb components accurately, efficiently, and rapidly from short synthetic oligonucleotide building blocks. We have developed and implemented a strategy for the high-throughput synthesis of long, accurate DNA sequences. Unpurified 40-base synthetic oligonucleotides are built into 500- to 800-bp “synthons” with low error frequency by automated PCR-based gene synthesis. By parallel processing, these synthons are efficiently joined into multisynthon ≈5-kb segments by using only three endonucleases and “ligation by selection.” These large segments can be subsequently assembled into very long sequences by conventional cloning. We validated the approach by building a synthetic 31,656-bp polyketide synthase gene cluster whose functionality was demonstrated by its ability to produce the megaenzyme and its polyketide product in Escherichia coli.


The chemical synthesis of genes and genomes has received considerable attention for several decades and is becoming increasingly important in the exploitation of whole-genome sequence information. The field was pioneered by Khorana and coworkers with the then-heroic total synthesis of tRNA structural genes (1, 2) and by Itakura et al. (3) with the synthesis and expression of the somatostatin gene. Since then, DNA synthesis methodology has made steady progress, with current approaches relying on the enzyme-catalyzed assembly of short, chemically synthesized oligonucleotides. Of the various methods, polymerase cycling assembly (PCA) (4) is the most widely used because of its inherent simplicity. Overlapping, complementary oligonucleotides are annealed and recursively elongated with a heat-stable DNA polymerase to ultimately yield a full-length sequence, which is amplified by conventional PCR. PCA, first reported for synthesis of the 303-bp HIV-2 Rev gene (5), has since evolved (6–8) into a widely used general method for synthesis of genes of up to ≈1 kb.

The 1-kb size barrier was broken in 1990 by Mandecki et al. (9), who synthesized a 2.1-kb plasmid by ligation of 30 fragments, and again in 1995 when Stemmer et al. (7) reported the one-step PCA synthesis of a 2.7-kb plasmid that was purified by antibiotic selection. Smith et al. (4) assembled the 5,386-bp ϕX174 bacteriophage genome from a single pool of chemically synthesized oligonucleotides by using a combination of ligation and PCA methods, but purification of the product again required biological selection. In 2002, Cello et al. (10) described a stepwise synthesis of a 7,558-bp poliovirus cDNA by ligation and PCA. This sequence appears to be the longest synthetic DNA reported to date. Visionaries have even projected application of DNA synthesis technology to build synthetic, minimal genomes (11). If such goals are to be realized, methods will be needed to prepare long, contiguous, and perfect sequences of DNA without requiring biological selection for purification.

Our efforts to this end stemmed from a desire to develop heterologous expression of large polyketide synthase (PKS) genes in Escherichia coli. Type I modular PKS genes encode the giant enzymes (among the largest proteins known) that synthesize polyketide natural products such as erythromycin, epothilone, and tacrolimus (12). These genes reside within the high G+C genomes of the actinomycete and myxobacterial groups of prokaryotes and encode proteins with multiple sets, or modules, of active sites (domains). Each module catalyzes the assembly of a specific two-carbon-unit component of the polyketide product. We sought to recreate PKS genes with the twin objectives of optimizing their codon composition for efficient expression in E. coli and introducing common restriction sites flanking modules and domains that would permit facile interchangeability, thus exploiting the full potential of combinatorial biosynthesis of “unnatural natural products” (12).

Smith et al. (4) proposed building very long DNA sequences by synthesis of ≈5-kb segments of DNA from short synthetic oligonucleotides, followed by their assembly into longer sequences by conventional methods. However, methodologies for preparing segments of ≈5 kb were not sufficiently accurate or facile to enable implementation of the approach. For the large number of sequences to be synthesized in our project, we therefore excluded the possibility of one-step synthesis of the ≈5-kb segments, because this would have required time-consuming, manual correction of numerous errors (4). Instead, we developed methods to build them in two error-free steps. First, we constructed multiple perfect sequences ≈500 bp in length called “synthons” (13); then we used a facile method, dubbed ligation by selection (LBS), to connect them into multisynthon segments of ≈5,000 bp. These segments, in turn, were readily assembled into larger sequences by conventional cloning strategies, as illustrated by our construction of a contiguous, synthetic 31.7-kb PKS gene cluster.

Materials and Methods

Enzymes were obtained from New England Biolabs, unless otherwise noted, and used as recommended. Molecular biological techniques followed standard protocols (14). The pUC18-derived plasmids pKOS293-172-2 and pKOS293-172-A76 were reported in ref. 15. DH5α E. coli was made chemically competent with a kit from Zymogen Research (Orange, CA). Oligonucleotides were from Qiagen/Operon Technologies (Alameda, CA). NTPs were PCR-grade from Roche Applied Sciences. DNA sequencing was performed on an ABI 3730 DNA analyzer (Applied Biosystems) according to the manufacturer's recommended protocol.

For uracil DNA glycosidase/ligation-independent cloning (UDG/LIC), the forward primer was 5′-GCUAUAUCGCUAUCGAUGAGCUGCCACTGAGCACCAACTACG, and the reverse primer was 5′-GCUAGUGAUCGAUGCAUUGAGCUGGCACTTCGCTCACTACACC.

Gene synthesis and sequencing were assisted by an integrated automation system consisting of a BioMek FX, a robotic ORCA arm, and a tip lift (Beckman Coulter), a plate sealer and piercer (Velocity11, Palo Alto, CA), two tetrad thermal cyclers (MJ Research, Cambridge, MA), and a cytomat hotel (Kendro, Asheville, NC).

Vector Construction. The BsaI sites in the ampicillin-resistance (ApR) genes of pKOS293-172-2 and pKOS293-172-A76 were changed to GAGATC by PCR-based site-directed mutagenesis (SDM) (16) to give pKOS309-52 [ApR and chloramphenicol-resistance (CmR)] and pKOS309-53 [ApR and kanamycin-resistance (KmR)], respectively. The tetracycline-resistance gene (TetR), obtained from pACYC184 by PCR, was introduced into the EcoRV site of pKOS309-52 with the 5′ end of the gene adjacent to the ApR gene to generate pKOS399-16-78 (ApR, CmR, and TetR). By using PCR-based SDM, the BbsI site in the TetR gene of pKOS399-16-78 was changed to GTATTC to give pKOS399-21-1, and the EcoRI site in the CmR gene was changed to GAGTTC to give pKOS399-51-1. The streptomycin-resistance (StrR) marker, obtained by PCR from pAY1105, was introduced at the StuI site in pKOS309-53 with the 5′ end of the gene proximal to the ApR gene to generate pKOS399-16-69 (ApR, KmR, and StrR).

The linker 5′-AATTGGCACCGGGTTAATTAAGCGACCCGTTAA was inserted into the EcoRI site of pKOS399-51-1 and pKOS399-16-69, introducing a PacI site (nucleotides 14–21) and destroying the EcoRI site, to provide pKOS399-55-9 (ApR, CmR, and TetR) and pKOS399-56-2 (ApR, KmR, and StrR).
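The stated position of the PacI site can be checked directly against the linker sequence. The short sketch below is illustrative only; the helper name `site_span` is hypothetical, and the only external fact assumed is the standard PacI recognition sequence, TTAATTAA.

```python
# Locate the PacI recognition site (TTAATTAA) in the linker quoted above.
LINKER = "AATTGGCACCGGGTTAATTAAGCGACCCGTTAA"
PACI_SITE = "TTAATTAA"


def site_span(seq: str, site: str) -> tuple:
    """Return the 1-based, inclusive (start, end) of the first occurrence of site."""
    index = seq.find(site)
    if index < 0:
        raise ValueError(f"{site} not found in sequence")
    return index + 1, index + len(site)


print(site_span(LINKER, PACI_SITE))  # (14, 21)
```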

In preparation for ligation-independent cloning (LIC), a solution of 80 μl of each vector (20 μg) was digested with 8 μl of SacI (20 units/μl) for 2 h at 37°C. Then, 8 μl of nicking endonuclease N. BbvC IA (10 units/μl) was added and incubated for 2 h at 37°C. The mixture was heated at 65°C for 20 min and extracted with phenol. Samples were precipitated with 2 vol of cold EtOH, resuspended in 50 μl of 10 mM Tris-HCl at pH 7.5, adjusted to a final concentration of 20 ng/μl DNA, and stored at -20°C.

Synthetic Insert Design. DNA sequences to be synthesized were designed by using custom-designed software (http://software.kosan.com/GeMS). Briefly, it accepts a DNA or protein sequence and chooses, randomizes, and harmonizes codons of ORFs according to codon preference tables; the user may retain part or all of the natural DNA sequence. Allowed restriction sites are predicted, desired sites are chosen and inserted either by the user or automatically at defined intervals, and undesired sites are purged. Long genes are then divided into segments suitable for synthesis (here, 500–800 bp), and stem–loop structures are identified. After adding user-defined sequences at the insert ends to facilitate cloning procedures, primer specificity is optimized, and the software provides the overlapping oligo components in a format ready for production.
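One of the design steps described above, purging undesired restriction sites, begins with finding them on both strands. The following is a minimal sketch of such a scan, not the GeMS implementation; the function names are hypothetical, and the recognition sequences are the standard ones for BsaI (GGTCTC), BbsI (GAAGAC), and XhoI (CTCGAG).

```python
# Recognition sequences of the three enzymes reserved for ligation steps.
FORBIDDEN_SITES = {"BsaI": "GGTCTC", "BbsI": "GAAGAC", "XhoI": "CTCGAG"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_forbidden_sites(seq: str, sites=FORBIDDEN_SITES) -> dict:
    """Map enzyme name -> sorted 0-based start positions of its site on either strand."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        # A site on the minus strand reads as its reverse complement on the plus strand.
        patterns = {site, reverse_complement(site)}
        positions = sorted(
            i
            for pattern in patterns
            for i in range(len(seq) - len(pattern) + 1)
            if seq.startswith(pattern, i)
        )
        if positions:
            hits[enzyme] = positions
    return hits


print(find_forbidden_sites("ATGGAGACCAAA"))  # BsaI site present on the minus strand
print(find_forbidden_sites("ATGGAGATCAAA"))  # the same stretch after a C-to-T change
```

The second call mirrors the GAGACC-to-GAGATC change used to remove the BsaI site from the vector backbone: a single base change is enough to clear the site from both strands.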

All DNA sequences were designed to lack the BsaI, BbsI, and XhoI sites needed for ligations (see below) and those sites required for subsequent manipulations. The ATG start codons of genes encoding an N-terminal fragment were preceded by the sequence CAT to allow cleavage with NdeI. The 5′ end of one fragment and the 3′ end of the adjacent fragment were designed to contain the same 6-nt sequence. The 5′ ends of each strand were appended with a distinct 20-nt universal template for PCR followed by a Type IIS restriction site, a BsaI site on the plus strand, and a BbsI site followed by G on the minus strand. Treatment with these endonucleases cleaves within the common 6 nt at the ends of each insert to provide cohesive 4-nt 5′ overhangs for ligation. Oligonucleotides of 40 bases were synthesized that collectively encoded both strands of the insert, each having 20-nt overlaps with 40-mer oligos from the opposite strand. The single-stranded 5′ overhangs were allowed to vary in size and filled during assembly.
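The oligonucleotide layout described above (40-mers on both strands, each overlapping its two opposite-strand neighbors by 20 nt) can be sketched as a simple tiling. This is an illustration under simplifying assumptions, not the design software: the function `tile_oligos` is hypothetical, terminal oligonucleotides are simply left shorter when the insert length is not a multiple of 40, and no optimization of primer specificity or secondary structure is attempted.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def tile_oligos(seq: str, oligo_len: int = 40):
    """Tile both strands with oligos staggered by half an oligo length.

    Plus-strand oligos are laid end to end from position 0. Minus-strand oligos
    are offset by oligo_len // 2, so each one bridges two plus-strand oligos.
    All oligos are returned 5' to 3'.
    """
    seq = seq.upper()
    half = oligo_len // 2
    plus = [seq[i:i + oligo_len] for i in range(0, len(seq), oligo_len)]
    minus = [
        reverse_complement(seq[i:i + oligo_len])
        for i in range(half, len(seq), oligo_len)
    ]
    return plus, minus


# Usage with an arbitrary 120-nt test sequence (not from the paper).
insert = "ACGTTGCAAGCTTAGGCATC" * 6
plus, minus = tile_oligos(insert)
print(len(plus), len(minus))
```

The staggered layout leaves the first 20 nt of the plus strand unpaired at the start; in PCA such single-stranded ends are filled in by polymerase extension.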

Synthon Synthesis. Oligonucleotide consolidation and assembly. To each well of a microtiter plate was added 5 μl of a 50 μM solution (250 pmol) of each of the oligonucleotide components of a synthon, and sufficient water was added to double the volume. For synthons up to 1 kb, each well of the “assembly” microtiter plate was loaded with 48 μl of a stock solution containing 0.5 μl of Expand High Fidelity polymerase (5 units/μl, Roche), 1.0 μl of 10 mM dNTPs, 5.0 μl of 10× PCR buffer, 3.0 μl of 25 mM MgCl2, and 38.5 μl of water. To separate wells of the assembly plate, 2.0 μl of each oligonucleotide mixture was added. For synthons >1 kb, additional oligonucleotide mixture was added to keep the final concentration of individual oligonucleotides at 1 μM. Thermal cycling began with a 5-min denaturing step at 95°C, and continued with 25 cycles at 95°C for 30 s, 50°C for 30 s, and 72°C for 90 s.

Amplification. Each well of the “amplification” microtiter plate was loaded with 48.75 μl of a stock solution containing 0.5 μl of Expand High Fidelity polymerase (5 units/μl, Roche), 1.0 μl of 10 mM dNTPs, 5.0 μl of 10× PCR buffer, 3.0 μl of 25 mM MgCl2, 39.25 μl of water, and 1.0 μl each of the forward and reverse LIC primers. To separate wells, 1.25 μl of each assembly mixture was added. Amplification began with a 5-min denaturing step at 95°C and continued with 25 cycles at 95°C for 30 s, 62°C for 30 s, and 72°C for 60 s, with a final extension of 10 min.

UDG/LIC. Each well of the “ligation” microtiter plate was loaded with 2 μl of a solution containing 1 μl (20 ng) of vector previously digested with SacI and N. BbvC IA and 1 μl (1 unit) of USER enzyme mix (endonuclease VIII plus UDG; New England Biolabs). Five microliters of the amplification reaction product was added to each well. The reaction was incubated for 15 min at 37°C, followed by 15 min at room temperature. The plate was placed on ice for 2 min, and 5 μl of each reaction mixture was added to chemically competent DH5α E. coli cells on ice. After 30 min, cells were treated at 42°C for 45 s and 0°C for 2 min; 200 μl of LB was added to each well, and the contents of each were plated on LB plates containing 100 μg/ml carbenicillin. The sizes and sequences of inserts were verified by DNA sequencing.

LBS. Wells of a microtiter plate contained 6 μl of DNA (100–200 ng), 3 μl of the appropriate 10× NEB buffer, 3 μl of 10× BSA (250 μg/ml), and water to give a final volume of 28 μl. For reactions requiring BbsI digestion, 1 μl (5 units) of BbsI and 1 μl (20 units) of XhoI were included, and digestion was performed at 37°C for 2 h. Reactions requiring BsaI were first treated with 1 μl (5 units) of BsaI at 50°C for 1 h and then 1 μl (20 units) of XhoI at 37°C for 1 h. Samples were heated for 20 min at 80°C and analyzed by gel electrophoresis to verify digestion.

The ligation mixture, containing 3–4 μl (10–30 ng) of each of the digested donor and acceptor plasmids, 1.5 μl (600 units) of T4 ligase, and sufficient water to give 30 μl, was kept at ambient temperature for 2 h. A sample of 5 μl was used to transform DH5α, and the mixture was plated on LB containing the appropriate pair of antibiotics (chosen from kanamycin, chloramphenicol, tetracycline, and streptomycin) to select for the plasmid containing the ligated inserts. Plasmids isolated from clones were digested with 5 units of NotI plus 10 units of PacI for 2 h at 37°C and analyzed by gel electrophoresis. Plasmids containing the inserts of correct size were purified and used for the next LBS cycle.

Results and Discussion

Building Error-Free Synthons. To identify an optimal length range for the synthons, we assumed, and later demonstrated (Table 1), that the error distribution in a population of PCA-synthesized DNAs would follow a Poisson distribution. Thus, the error frequency (EF) and sequence length (L) allow estimation of the fraction (F) of clones with accurate sequences and the number of clones (N) requiring sequencing to give 50% confidence of identifying a correct one. F is estimated as exp(−EF × L), and N = −0.301/log(1 − F), where log is the base-10 logarithm and −0.301 = log(0.5). Accordingly, with a low EF of 0.2%, a 500-bp fragment would require sequencing of ≈1.5 clones to obtain a correct one, whereas a 2,000-bp sequence requires sequencing of ≈38 clones. Further, because N increases exponentially with EF and L, small increases in EF have a more pronounced effect on N for longer fragments. For example, with a 0.25% EF, only 0.05% higher than in the previous example, a 500-bp sequence requires sequencing of only ≈2.1 clones to obtain a correct one, whereas a 2,000-bp segment requires sequencing >100 clones. Moreover, additional error-causing effects are introduced as the length of synthesized genes increases. In PCA, for example, there is an inevitable increase in mispriming of component oligonucleotides during assembly as their numbers increase (8, 17). Indeed, assuming an EF of 0.20%, we calculate that the 5,386-bp ϕX174 synthetic genome preparation (4) would contain an average of 10.8 errors per genome, and 2 × 10⁻⁵ correct sequences; it was estimated that the synthetic genome contained ≈5 × 10⁻⁵ infective sequences of which about one in four had the exact sequence intended. With our current DNA sequencing capabilities, we could tolerate ≤0.4% EF for ≈500-bp synthetic fragments (N = 4.8). Based on such considerations, we chose a synthon length of ≈500 bp. We concentrated on developing robust, reproducible methods for preparing synthons with low EF, efficient methods for joining these together, and molecular biology techniques to enable parallel processing and automation.
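The clone counts quoted in this section (for example, ≈1.5 clones for 500 bp at 0.2% EF and N = 4.8 at 0.4% EF) are reproduced by N = log(0.5)/log(1 − F), that is, the number of clones giving an even chance that at least one is correct; the constant 0.301 equals −log10(0.5). The sketch below assumes that form. The function names and the `confidence` parameter are introduced here for illustration and are not part of the original analysis.

```python
import math


def fraction_correct(ef: float, length_bp: int) -> float:
    """Poisson estimate of the error-free fraction, F = exp(-EF * L).

    ef is the per-base error frequency as a fraction (0.002 for 0.2%).
    """
    return math.exp(-ef * length_bp)


def clones_needed(ef: float, length_bp: int, confidence: float = 0.5) -> float:
    """Clones to sequence so that P(at least one correct clone) >= confidence."""
    f = fraction_correct(ef, length_bp)
    return math.log(1.0 - confidence) / math.log(1.0 - f)


for ef, length_bp in [(0.002, 500), (0.002, 2000), (0.0025, 500), (0.0025, 2000), (0.004, 500)]:
    n = clones_needed(ef, length_bp)
    print(f"EF {ef:.2%}, L {length_bp} bp: F = {fraction_correct(ef, length_bp):.3f}, N = {n:.1f}")

# Expected errors and error-free fraction for a 5,386-bp genome at 0.20% EF.
print(0.002 * 5386, fraction_correct(0.002, 5386))
```

Raising `confidence` to 0.95 shows how quickly the sequencing burden grows when a stricter guarantee is wanted, which is the practical argument for keeping synthons short.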

Table 1. Summary of DNA synthons synthesized.

| Experiment* | No. of synthons | Synthon size range, bp | Synthon size average, bp | Total bp synthesized | Total clones sequenced | Sequenced DNA, bp | Correct clones predicted†, % | Correct clones found, % | EF‡ mismatch, % | EF deletion, % | EF insertion, % | EF total, % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 102 | 229–541 | 496 | 50,634 | 647 | 308,938 | 43 | 39 | 0.12 | 0.047 | 0.004 | 0.176 |
| 2 | 118 | 129–781 | 510 | 59,884 | 797 | 409,267 | 30 | 32 | 0.17 | 0.070 | 0.004 | 0.24 |
| 3 | 44 | 286–748 | 519 | 27,004 | 706 | 346,222 | 20 | 20 | 0.27 | 0.06 | 0.006 | 0.34 |
| 4 | 85 | 112–650 | 502 | 42,743 | 589 | 317,382 | 19 | 16 | 0.24 | 0.078 | 0.02 | 0.33 |
| Total/avg | 349 | 112–781 | 505 | 180,265 | 2,739 | 1,381,809 | 28 | 27 | 0.20 | 0.064 | 0.007 | 0.27 |

* Each experiment represents the parallel processed synthesis of the DNAs indicated.

† Assuming Poisson distribution of errors.

‡ Any specific error was counted only once.
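As a rough consistency check on Table 1, the Poisson prediction can be recomputed from the average synthon length and total EF of each experiment. This sketch is an approximation introduced here: it uses one average length per experiment rather than the individual synthon lengths, so its output can differ by a few percentage points from the Predicted column.

```python
import math

# (experiment, average synthon length in bp, total EF in %, correct clones found in %)
TABLE_1 = [
    (1, 496, 0.176, 39),
    (2, 510, 0.24, 32),
    (3, 519, 0.34, 20),
    (4, 502, 0.33, 16),
]


def predicted_percent_correct(avg_len_bp: float, ef_percent: float) -> float:
    """Percentage of error-free clones expected if errors are Poisson distributed."""
    return 100.0 * math.exp(-(ef_percent / 100.0) * avg_len_bp)


for experiment, avg_len, ef_percent, found in TABLE_1:
    predicted = predicted_percent_correct(avg_len, ef_percent)
    print(f"experiment {experiment}: predicted {predicted:.0f}% correct, found {found}%")
```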

Because of the numerous variables involved in PCA and the dearth of large comparative studies, we could not predict general conditions to minimize EF. In initial experiments with conditions typically reported for PCA, we experienced unacceptably high EFs of 0.5–0.9% (N = 8–62 for L = 500 bp). Although certain variables were codependent, by varying reagents (e.g., crude vs. purified oligos, polymerases, and NTPs) and conditions (cycle number and annealing temperature) we pragmatically converged on conditions that ultimately gave an EF of ≈0.2–0.3% by using commercially available, unpurified oligonucleotides.

The data from four large PCA experiments (Table 1) with unpurified 40-mer synthetic oligonucleotides allowed evaluation of certain factors that contribute to EF. Overall, the results represent 349 synthons (180,265 bp) covering a length range of 112–781 (average 505) bp. In this study, ≈99% of PCA reactions provided a fragment of the predicted size, and the remainder were obtained by using a higher annealing temperature (62°C) during assembly. The average EF was ≈0.27% over 1,382 kb sequenced, and values of F reasonably tracked those predicted by a Poisson distribution of errors. The observed EF in PCA is significantly higher than that of the polymerase mixture used (≈1 × 10⁻⁵) and, as indicated by others (4, 8), suggests that PCR errors are insignificant contributors to errors in synthons. Enigmatically, the EFs observed were lower than expected in the component oligonucleotides. Interestingly, when varying cycle numbers for assembly, the EF of 2.6 ± 0.6% was similar up to ≈25 cycles but nearly doubled to 4.1 ± 0.8% at 50 cycles. Perhaps a kinetic selection for correct sequences during the earlier cycles caused perfect oligonucleotide hybrids to be extended more rapidly than imperfect ones. There was no correlation of EF with size of the synthesized fragment or error location in component oligonucleotides, but a higher frequency of sequences with EF >0.3% occurred when G+C exceeded 60%. We did not observe products resulting from mispriming of oligonucleotides, perhaps reflecting desirable features of the sequence design software or the modest sequence lengths prepared. In final products, there was a 0.007% frequency of base insertions attributable to primer slippage during assembly, a 0.06% deletion frequency (90% single nucleotide) attributable to slippage or, more likely, n − 1 oligonucleotide components, and a 0.20% point mutation frequency. Because errors occurring during PCA are expected to involve mainly insertion/deletions from primer-template slippage and are expected to be clustered at the ends of synthetic oligonucleotide components, we conclude that most errors originated from the short synthetic oligonucleotides. Unexpectedly, the predominant errors were mismatches due to point mutations rather than deletions from n − 1 errors, generally believed to predominate in oligonucleotide synthesis. Thus, there is little benefit to be gained by purifying oligonucleotides by gel electrophoresis before PCA, and, contrary to a recent report (4), high-fidelity PCA can be achieved with unpurified oligonucleotide components. Nevertheless, batches of synthetic oligonucleotides occasionally gave unacceptably high EFs in PCA. Thus, each large experiment should be preceded by synthesis of a control sequence comparing the EF by using the trial oligonucleotides vs. a previously validated set.

To avoid time-consuming purification of the PCR products, we used UDG/LIC (18–20) to clone synthons into vectors containing appropriate resistance markers to allow subsequent LBS (vide infra). The primers used in the amplification step for PCA had 22 or 23 bases of U-containing sequences on the 5′ ends, followed by a 20-bp priming sequence complementary to sequences introduced at the ends of synthons. UDG treatment of PCR products gave long (22 or 23 bp) 3′ overhangs on both ends and concurrently destroyed any primer dimers formed during PCR. After annealing the UDG-processed PCR products with the recipient LIC vector containing complementary 5′ overhangs, the mixture was introduced into E. coli and plated to give several hundred colonies. After performing >300 UDG/LIC reactions and sequencing inserts from six to eight colonies of each, the statistics of EFs described above and cloning efficiencies were determined. More than 95% of clones contained inserts of the correct size, the remainder being parent LIC vectors. With more stringent quality control in preparation of the LIC vector, we attain 100% cloning efficiency. Compared with this remarkable value, cloning crude PCR products into a directional TOPO-cloning vector resulted in a cloning efficiency of only 56%. Here, the incorrect clones had inserts of small fragments of the synthesized DNA containing the directional primer.

The synthesis of multiple ≈500-bp sequences was facilitated by parallel processing and automation to yield ≈50,000 bp of synthetic DNA in ≈1 week. On day 1, oligos in microtiter plates are loaded into the plate hotel, and the robot consolidates components of each synthon into individual wells of the 96-well assembly plate. The assembly and amplification PCRs are performed robotically with an ORCA arm delivering the assembly plate to and from the plate sealer and thermocycler. On day 2, the robot transfers samples to wells of a plate containing the LIC vector and, after UDG/endonuclease treatment, to wells containing competent E. coli cells for transformation. Mixtures are then manually plated on agar. Colonies are picked on day 3 and grown. On day 4, plasmids are isolated and sequenced to identify correct sequences by day 6.

Building Larger DNA Segments from Synthons. The next challenge was to develop methods for efficiently connecting synthons into segments of ≈5 kb. Conventionally, these connections require fragment cleavage and purification, ligation to the adjacent fragment, transformation, cell growth, and plasmid isolation, with each cycle needing significant intervention over ≈3 days. We recently reported a technology, termed LBS, for facile ligation of multiple DNA fragments (15). Complementary overhangs were generated by cleavage of restriction sites common to both fragments, and, after ligation, antibiotic selection along with restriction-purification (21) served to purify the plasmid containing the conjoined fragments. This procedure possessed certain deficiencies. (i) Although the ligated inserts were readily identified by size, ≈25% of clones were starting donor vector containing the marker, which survived restriction-purification. (ii) The multiple unique restriction enzymes needed to connect multiple fragments often required salt and/or temperature changes that encumbered parallel processing. (iii) The need for regularly spaced unique restriction sites placed constraints on gene design and, most importantly, precluded their subsequent use when needed for further assembly into longer sequences.

Apart from being simpler, and therefore amenable to parallel processing, the improved LBS strategy in Fig. 1 provides two big advantages for building large DNA sequences. (i) The same three restriction enzymes, XhoI and the type IIS enzymes BsaI and BbsI, are used for all ligations, so that all others are available for subsequent use. (ii) Each of the vectors harbors two unique selectable markers that segregate during LBS to give four possible pairs when recombined, and, by alternating the vectors used in sequential ligations (see below), each ligated fragment pair is associated with a unique marker pair that allows the product to be isolated in very high efficiency by double antibiotic selection.

Fig. 1.

LBS with type IIS restriction enzymes and double antibiotic selection of ligated products. The procedure is as described in Results and Discussion. P and P′ represent the PCR primer sites incorporated at the insert ends.

As shown in Fig. 1, each of the two parent vectors used for UDG/LIC cloning of synthons contains two unique antibiotic resistance markers (TetR + CmR or StrR + KmR), one on each side of the cloning site. Both vectors contain the ApR for convenience of propagation, 5′ NotI and 3′ PacI sites flanking the cloning site to facilitate fragment size determination, and a XhoI site between ApR and the unique antibiotic marker proximal to the 3′ end of the cloning site. By using LIC, the PCA product corresponding to the 5′ fragment was cloned into one plasmid to give the “acceptor” vector, and the product corresponding to the 3′ fragment was cloned into the other to give the “donor” vector. Restriction sites for the type IIS enzymes BsaI and BbsI adjacent to the 5′ and 3′ ends of the insert, respectively, are introduced on the plus strands of the PCR product such that treatment of the acceptor vector with BbsI and the donor vector with BsaI cuts within the same sequence at one end of each insert to create complementary four-base overhangs for a seamless ligation. The acceptor vector, cleaved with XhoI and BbsI, gives a large fragment containing the insert and a unique marker (e.g., TetR) and a small fragment containing the other unique marker (CmR). The donor vector is cleaved with XhoI and BsaI to give a small fragment containing the insert and a unique marker (e.g., KmR) and a large fragment containing a different unique marker (StrR). When the mixture of four fragments is annealed and then ligated with a T4 ligase, the vector containing the fused inserts has a unique pair of selectable markers (e.g., TetR + KmR). The ligation mixture is introduced into E. coli, and colonies are selected for resistance to the appropriate pair of antibiotics to yield the vector with the fused inserts. To identify the product, fragment size analysis is performed by electrophoresis of the NotI/PacI digest. A complete LBS cycle requires ≈3 days. On day 1, the cleavage and ligation reactions are performed, and the bacteria are transformed and plated; day 2 involves colony picking and cell growth; and day 3 involves plasmid preparation and insert size analysis.
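The marker bookkeeping in one LBS step can be captured in a toy model inferred from the description of Fig. 1: the acceptor keeps the marker on the 5′ side of its insert, and the donor contributes the marker on the 3′ side of its insert. The class and function below are hypothetical names for illustration; the model tracks only inserts and unique markers, not sequences, ApR, or restriction sites.

```python
from typing import NamedTuple, Tuple


class Plasmid(NamedTuple):
    inserts: Tuple[str, ...]  # fragment names, 5' to 3'
    marker_5p: str            # unique marker retained with the insert after BbsI/XhoI cleavage
    marker_3p: str            # unique marker released with the insert after BsaI/XhoI cleavage


def ligate_by_selection(acceptor: Plasmid, donor: Plasmid) -> Plasmid:
    """Fuse the donor insert 3' of the acceptor insert and return the product plasmid."""
    # Double selection only works if the product's marker pair differs from both parents.
    if acceptor.marker_5p == donor.marker_5p or acceptor.marker_3p == donor.marker_3p:
        raise ValueError("product would share a marker pair with a parent plasmid")
    return Plasmid(acceptor.inserts + donor.inserts, acceptor.marker_5p, donor.marker_3p)


ab = ligate_by_selection(Plasmid(("A",), "TetR", "CmR"), Plasmid(("B",), "StrR", "KmR"))
cd = ligate_by_selection(Plasmid(("C",), "StrR", "KmR"), Plasmid(("D",), "TetR", "CmR"))
abcd = ligate_by_selection(ab, cd)
print(ab, cd, abcd, sep="\n")
```

In this model the two-fragment products carry TetR + KmR and StrR + CmR, and joining them gives a four-fragment plasmid selectable on tetracycline plus chloramphenicol.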

For the efficient connection of multiple DNA segments by LBS, it is imperative that a plan be prepared at the outset that (i) allows for alternation of resistance markers of acceptor and donor vectors in each LBS cycle, (ii) defines which LBS vector the synthon is cloned into, and (iii) minimizes the number of cycles required in parallel processing. In the example in Fig. 1, two DNA fragments residing on parent vectors containing TetR + CmR and StrR + KmR markers are joined to give a two-fragment plasmid with a unique TetR + KmR marker pair; by alternating the parent vectors, the two-fragment plasmid product carries a StrR + CmR marker pair. The LBS product of these two-fragment plasmid products yields a four-fragment plasmid with the unique TetR + CmR marker pair. Thus, with appropriate planning to minimize LBS cycles and parallel processing, the method can be used recursively to efficiently assemble any number of fragments into a single, contiguous DNA segment. Fig. 2 shows a dendrographic plan used for connecting eight DNA fragments together to form a 3,408-bp PKS module. As shown, construction of the sequence requires seven ligations that can be processed in parallel in three 3-day cycles. In the first cycle, eight single DNA fragments cloned into the appropriate sister plasmids were connected to form four two-fragment plasmids. In the next cycle, each two-fragment plasmid was ligated to another two-fragment insert. In cycle 3, the two four-fragment inserts were combined to give the 3,408-bp module. Including the synthesis of component synthons, the preparation of a ≈5 kb sequence requires ≈3 weeks, and ≈10 such sequences can be processed simultaneously.
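The cycle and ligation counts for a pairwise plan follow from repeatedly halving the number of plasmids. The sketch below is a simplified planner introduced for illustration (the function name `lbs_plan` is hypothetical): it counts ligations per cycle and total days at 3 days per cycle, and it ignores the marker-alternation constraint, which a real plan must also satisfy.

```python
def lbs_plan(n_fragments: int, days_per_cycle: int = 3):
    """Return (ligations in each cycle, total days) for pairwise joining of fragments."""
    if n_fragments < 1:
        raise ValueError("need at least one fragment")
    ligations_per_cycle = []
    remaining = n_fragments
    while remaining > 1:
        pairs = remaining // 2
        ligations_per_cycle.append(pairs)
        # An unpaired fragment waits for the next cycle.
        remaining = pairs + remaining % 2
    return ligations_per_cycle, len(ligations_per_cycle) * days_per_cycle


per_cycle, days = lbs_plan(8)
print(per_cycle, sum(per_cycle), days)  # [4, 2, 1] 7 9
```

For eight fragments this gives seven ligations in three cycles, matching the plan described for the 3,408-bp module; any n fragments need n − 1 ligations in total.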

Fig. 2.

Dendrogram of the plan for a three-cycle, eight-fragment LBS synthesis of a 3,408-bp PKS module. Assigned synthon numbers, insert length in bp, and resistance markers used for LBS are indicated. Other sequences were planned and prepared in a similar fashion.

Thus far, the current version of LBS has provided >90% success in obtaining desired products, with the few failures being due to contamination by a cotransformed parent plasmid or poor DNA cleavage by BbsI. We note that by including a restriction site at the ends of the synthesized fragments as in the first version of LBS (15), and the type IIS sites adjacent to the insert as described here, advantages of both approaches can be realized. The versatility of the system could be further enhanced by including attB sites in the primer site that would enable Gateway cloning as an option.

Assembling the ≈5-kb Segments into a Gene Cluster. The next stage was to connect the ≈5-kb segments to form PKS genes and combine these into transcription units and on into a whole-gene cluster. The ≈5-kb segments of PKS ORFs were designed to contain unique 5′ and 3′ restriction sites that would facilitate construction of these large sequences (Fig. 3 and Table 2). N-terminal modules were prepared with an NdeI site at the start codon and an XbaI site on the C terminus. Internal modules began with an SpeI site and had an XbaI site at the end of the ACP. C-terminal components (thioesterase and C-terminal linkers) possessed an SpeI site on the N terminus and an EcoRI site on the C terminus. The design is such that the 3′ XbaI overhang of an N-terminal or internal module can be joined with the 5′ SpeI overhang of the adjacent module, linker, or thioesterase with concomitant destruction of both sites (Fig. 3).
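The XbaI/SpeI junction can be illustrated at the sequence level. The sketch assumes the standard recognition sequences and cut positions (XbaI T^CTAGA and SpeI A^CTAGT, both leaving 5′-CTAG overhangs); the function name and the flanking bases in the usage example are hypothetical.

```python
XBAI_SITE = "TCTAGA"  # cut after the first T
SPEI_SITE = "ACTAGT"  # cut after the first A
SERINE_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def join_xbai_to_spei(upstream: str, downstream: str) -> str:
    """Join an upstream fragment ending in an XbaI site to a downstream one starting with SpeI."""
    if not upstream.endswith(XBAI_SITE) or not downstream.startswith(SPEI_SITE):
        raise ValueError("fragments must end with XbaI and start with SpeI sites")
    left = upstream[:-5]    # everything up to and including the T before the cut
    right = downstream[1:]  # CTAG overhang plus the rest of the downstream fragment
    return left + right


product = join_xbai_to_spei("GCCTCTAGA", "ACTAGTGGC")
junction = product[3:9]
print(product, junction)  # GCCTCTAGTGGC TCTAGT
print(junction[:3] in SERINE_CODONS, junction[3:] in SERINE_CODONS)
```

The hybrid junction TCTAGT is recognized by neither enzyme, so the joined product can be cut again with XbaI or SpeI elsewhere, and in frame it reads as two serine codons.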

Fig. 3.

Construction of synthetic DEBS ORFs, transcription units (TUs), and gene cluster. (Top) Components of the DEBS ORFs were excised from their LBS vectors and assembled in a pUC derivative to give pKOS 422-33-1 (DEBS 1), pKOS 422-51-1 (DEBS 2), and pKOS 422-31-2 (DEBS 3). (Middle) The TU cloning vector, pKOS 422-174-3, was created by cloning a 270-bp synthetic fragment containing, from 5′ to 3′, a BglII site, a T7 promoter (Pr), a lac operator (Op), a ribosome binding site (RBS), NdeI/EcoRI cloning sites, a T7 transcriptional terminator (TT), and a MfeI restriction site into the BglII/EcoRI sites of pET22b. The DEBS ORFs were excised as NdeI/EcoRI fragments and cloned into the NdeI/EcoRI sites of pKOS 422-74-3 to generate the TUs pKOS 422-80-1 (DEBS 1), pKOS 422-80-2 (DEBS 2), and pKOS 422-80-3 (DEBS 3). (Bottom) The XbaI/PacI fragment of pKOS 422-80-3 containing the DEBS 3 TU was cloned into the SpeI/PacI sites of pKOS 422-80-2, adjacent to the DEBS 2 TU to give pKOS 422-81-1. The XbaI/PacI fragment of this plasmid was inserted into the SpeI/PacI sites of pKOS 422-80-1 containing the DEBS 1 TU to obtain the three-ORF gene cluster, pDE1.

Table 2. Synthetic components of DEBS ORFs.

| ORF | Component | Size, bp | Amino acids encoded* | 5′ site | 3′ site | Designed substitutions | Synthons, no. | Synthon size range, bp |
|---|---|---|---|---|---|---|---|---|
| DEBS1 | Load module | 1,614 | 1-538 | NdeI | XbaI | A537S; A538S | 3 | 528-559 |
| DEBS1 | Module 1 | 4,440 | 537-2016 | SpeI | XbaI | V560I; E1436Q | 9 | 370-754 |
| DEBS1 | Module 2 | 4,344 | 2015-3462 | SpeI | XbaI | G2015S; G2016S; L2908Q | 10 | 325-737 |
| DEBS1 | C-terminal linker | 252 | 3461-3544 (+3) | SpeI | EcoRI | G3461S; T3462S | 1 | 252 |
| DEBS2 | Module 3 | 4,416 | 1-1472 | NdeI | XbaI | V1471S; G1472S | 9 | 326-737 |
| DEBS2 | Module 4 | 6,060 | 1471-3490 | SpeI | XbaI | F3489S; A3490S | 13 | 377-642 |
| DEBS2 | C-terminal linker | 237 | 3489-3567 (+3) | SpeI | EcoRI | - | 1 | 237 |
| DEBS3 | Module 5 | 4,398 | 1-1466 | NdeI | XbaI | P900Q | 9 | 376-718 |
| DEBS3 | Module 6 | 4,290 | 1465-2894 | SpeI | XbaI | V1465S; G1466S; A2357Q | 9 | 355-735 |
| DEBS3 | Thioesterase | 834 | 2893-3170 | SpeI | EcoRI | D2893S | 2 | 417-422 |

Other than specified modifications, the sequence of the synthetic gene was designed to encode the protein predicted from the recently corrected DEBS gene cluster sequence (GenBank AY661566).

* Adjacent components have two overlapping codons that provide 3′ XbaI (TCTAGA) and 5′ SpeI (ACTAGT) ligation sites; upon joining, both sites are destroyed, leaving the bases TCTAGT, which encode SS at the junction of the components.

The C-terminal linkers have nine bases (GGGAATTCN) encoding three amino acids, GNS, added to the natural sequence to incorporate an EcoRI cloning site at the 3′ ends; GGGAATTCN was used to encode the natural GNS at the C terminus of the thioesterase
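The EcoRI tail described in the footnote above can be checked with a short sketch. It assumes the standard genetic code and reads the nine bases GGGAATTCN in frame for each choice of N; the names `CODON_TABLE`, `translate`, and `tails` are illustrative only.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

ECORI = "GAATTC"


def translate(dna: str) -> str:
    """Translate an in-frame DNA string whose length is a multiple of 3."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Map each possible 9-base tail to (encoded peptide, contains EcoRI site?).
tails = {}
for n in "ACGT":
    tail = "GGGAATTC" + n
    tails[tail] = (translate(tail), ECORI in tail)

for tail, (peptide, has_site) in tails.items():
    print(tail, peptide, has_site)
```

Because TCN encodes serine for every N, all four tails encode GNS and carry an EcoRI site, which is consistent with the footnote.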

One of the PKS gene clusters targeted for synthesis was the 6-deoxyerythronolide B synthase (DEBS) cluster that encodes the structure of the erythromycin aglycone. We designed the cluster from the 10 components listed in Table 2. The accuracy of each component was verified by DNA sequencing before use in the next step of assembly, which consisted of making the three large ORFs of the DEBS gene cluster. These ORFs comprise the 10.6-kb DEBS 1 containing a loading domain, followed by modules 1 and 2; the 10.7-kb DEBS 2 containing modules 3 and 4; and the 9.5-kb DEBS 3 containing modules 5 and 6, followed by a thioesterase domain.
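The ORF sizes quoted above can be reproduced from the component sizes in Table 2, as a rough consistency check. The sketch assumes that each junction between adjacent components shares two codons (6 bp), as stated in the table footnote, and that the three codons added for the EcoRI site and any stop codon are not counted; the names `TABLE2_SIZES_BP` and `orf_length` are illustrative.

```python
# Component sizes in bp, in 5'-to-3' order, copied from Table 2.
TABLE2_SIZES_BP = {
    "DEBS 1": [1614, 4440, 4344, 252],  # load module, modules 1-2, linker
    "DEBS 2": [4416, 6060, 237],        # modules 3-4, linker
    "DEBS 3": [4398, 4290, 834],        # modules 5-6, thioesterase
}

OVERLAP_BP = 6  # two codons shared by each pair of adjacent components


def orf_length(sizes):
    """Length of the assembled ORF once the junction overlaps are merged."""
    return sum(sizes) - OVERLAP_BP * (len(sizes) - 1)


lengths = {orf: orf_length(sizes) for orf, sizes in TABLE2_SIZES_BP.items()}

for orf, bp in lengths.items():
    print(f"{orf}: {bp} bp = {bp // 3} codons (~{bp / 1000:.1f} kb)")
```

With these assumptions the sketch gives 10,632, 10,701, and 9,510 bp, that is, 3,544, 3,567, and 3,170 codons. These match the last residue numbers listed for each ORF in Table 2 and the 10.6-, 10.7-, and 9.5-kb sizes given in the text.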

One of several strategies for assembling the components into PKS ORFs relied on a special ORF cloning vector containing a 5′ NdeI–NotI–SpeI–EcoRI multiple cloning site (Fig. 3). The assembly was initiated by cloning the C-terminal linkers for the first two ORFs, or the thioesterase domain for the third, into the SpeI/EcoRI sites of the ORF vector. Internal modules were then cloned, sequentially if more than one, as NotI/XbaI fragments excised from the LBS vectors into the NotI/SpeI sites of the ORF vector; this ligation destroyed the SpeI site of the ORF vector and introduced another derived from the synthetic segment. Finally, the N-terminal module was cloned into the NdeI/SpeI sites to complete construction of the ORF. The three DEBS ORFs were thus assembled in individual vectors, flanked by unique 5′ NdeI and 3′ EcoRI sites.
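The cloning order described above can be followed with a small symbolic sketch for DEBS 1. Constructs are modeled as lists of site and part names rather than sequences. The layout assumed for the excised LBS fragment (NotI, then SpeI, then the module, then XbaI) is inferred from the text, and the function name `clone` and the token `SCAR` are hypothetical.

```python
SCAR = "scar"  # XbaI end joined to SpeI end; cut by neither enzyme


def clone(vector, fragment, v_left, v_right):
    """Replace the vector span v_left..v_right with a fragment.

    fragment[0] and fragment[-1] name the sites used to excise it.
    """
    i, j = vector.index(v_left), vector.index(v_right)
    f_left, f_right = fragment[0], fragment[-1]
    if f_left != v_left:
        raise ValueError("5' ends were not cut with the same enzyme")
    if (f_right, v_right) == ("XbaI", "SpeI"):
        junction = SCAR          # compatible ends, both sites destroyed
    elif f_right == v_right:
        junction = v_right       # same enzyme, site regenerated
    else:
        raise ValueError("3' ends are not compatible")
    return vector[:i] + fragment[:-1] + [junction] + vector[j + 1:]


orf_vector = ["NdeI", "NotI", "SpeI", "EcoRI"]

# 1. C-terminal linker into the SpeI/EcoRI sites.
step1 = clone(orf_vector, ["SpeI", "C-terminal linker", "EcoRI"], "SpeI", "EcoRI")
# 2. Internal modules, 3'-most first, as NotI/XbaI fragments into NotI/SpeI.
step2 = clone(step1, ["NotI", "SpeI", "Module 2", "XbaI"], "NotI", "SpeI")
step3 = clone(step2, ["NotI", "SpeI", "Module 1", "XbaI"], "NotI", "SpeI")
# 3. N-terminal module as an NdeI/XbaI fragment into NdeI/SpeI.
debs1 = clone(step3, ["NdeI", "Load module", "XbaI"], "NdeI", "SpeI")

print(" - ".join(debs1))
```

In this model each internal-module step consumes the vector's SpeI site and brings in a fresh one from the fragment, so the same pair of enzymes can be reused until the N-terminal module closes the ORF between NdeI and EcoRI.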

The three DEBS ORFs were next converted into transcription units (TUs) in which each ORF was preceded by a 5′ T7 promoter and a ribosome binding site and followed by a 3′ transcriptional terminator (Fig. 3). A special TU cloning vector was constructed with a 270-bp synthetic insert containing the ϕ10 promoter of T7 phage with an overlapping lac operator, a Shine–Dalgarno sequence, and a T7 transcriptional terminator; NdeI/EcoRI sites were appropriately placed for insertion of PKS ORFs, and 5′ XbaI and adjacent 3′ SpeI and PacI sites were positioned at the ends for mobilization of the entire TU. Each of the three DEBS ORFs was cloned into the NdeI/EcoRI sites of the vector to create the corresponding TUs.

Finally, the three DEBS TUs were assembled to give pDE1, which contains a 31,656-bp contiguous sequence of synthetic DNA comprising the complete DEBS gene cluster (Fig. 3). For this assembly, the XbaI/PacI fragment of the DEBS 2 TU was cloned into the SpeI/PacI sites of the DEBS 1 TU, and then the XbaI/PacI TU fragment of DEBS 3 was cloned into the SpeI/PacI sites of the vector containing the DEBS 1 + DEBS 2 TUs. pDE1 was completely sequenced to verify that it had the DNA sequence anticipated from its components.
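The TU stacking step can be sketched symbolically in the same spirit. Each TU vector is modeled as the ordered list XbaI, TU, SpeI, PacI, which is an assumption based on the description of the TU cloning vector; the function names `tu_vector` and `append_tu` are hypothetical. The sketch also compares the cloning order given in the text with the order given in the Fig. 3 legend.

```python
SCAR = "scar"  # SpeI end joined to XbaI end; cut by neither enzyme


def tu_vector(name):
    """Symbolic TU vector: 5' XbaI, the TU, then adjacent 3' SpeI and PacI."""
    return ["XbaI", name, "SpeI", "PacI"]


def append_tu(recipient, donor):
    """Insert the donor XbaI/PacI fragment into the recipient SpeI/PacI sites."""
    fragment = donor[donor.index("XbaI"): donor.index("PacI") + 1]
    i = recipient.index("SpeI")
    j = recipient.index("PacI")
    # The recipient SpeI end meets the donor XbaI end (scar); PacI is regenerated.
    return recipient[:i] + [SCAR] + fragment[1:] + recipient[j + 1:]


tu1, tu2, tu3 = (tu_vector(f"DEBS {n} TU") for n in (1, 2, 3))

# Order given in the text: DEBS 2 into DEBS 1, then DEBS 3 into the product.
text_order = append_tu(append_tu(tu1, tu2), tu3)

# Order given in the Fig. 3 legend: DEBS 3 into DEBS 2, then both into DEBS 1.
legend_order = append_tu(tu1, append_tu(tu2, tu3))

print(" - ".join(text_order))
print("Same arrangement:", text_order == legend_order)
```

In this model both orders give the same arrangement of the three TUs, and the product keeps a single XbaI site at the 5′ end and single SpeI and PacI sites at the 3′ end, so further TUs could in principle be added the same way.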

When E. coli K207-3 (22) harboring pDE1 was induced with IPTG, the large DEBS subunits were observed on SDS/PAGE of the soluble protein, and 6-deoxyerythronolide B (6-dEB) was observed by liquid chromatography–MS.

Perspective. Our strategy for preparing long, accurate DNA sequences has a number of advantages over other procedures. The approach begins with the high-throughput synthesis and cloning of DNA sequences ≈500 bp long, called synthons, as the primary building blocks for longer sequences. The benefits of using smaller rather than larger segments are that their synthesis can be processed in parallel and that the frequency of perfect sequences is sufficient to permit facile identification by sequence screening. Then, the synthons of ≈500 bp are efficiently assembled into multisynthon sequences of ≈5 kb by using LBS. Our current LBS method uses the same three restriction enzymes for all ligations so that others are reserved for later use, and conjoined products are simply purified by double-antibiotic selection. By parallel processing, about 10 ≈5-kb segments of accurate DNA can be made from the shorter components in ≈2 weeks. Finally, the ≈5-kb sequences are assembled into larger sequences by efficient but conventional cloning methods.

Using these methods, we have prepared well over 30 PKS gene modules, including all seven modules of the DEBS gene cluster as well as its thioesterase domain. The modular components of the DEBS gene cluster were further combined into a complete DEBS gene cluster composed of 31.7 kb of contiguous synthetic DNA. The functionality of the gene cluster was demonstrated by successfully expressing the polyketide synthase and producing its polyketide product in E. coli.

Although applied here to the modular PKS gene clusters, the technology currently at hand can be used to synthesize any sequence of this size, and such sequences could be combined to give even longer sequences. It is not unrealistic to propose that the DNA synthesis technology described here is poised to challenge the task of making artificial chromosomes and even the prophetic synthetic minimal genome (11).

Acknowledgments

We thank David Hopwood for helping in the preparation of the manuscript. This work was supported in part by National Institute of Standards and Technology Advanced Technology Program Grant 70NANB2H3014.

Author contributions: S.J.K., R.R., H.G.M., M.W., and D.V.S. designed research; S.J.K., K.G.P., H.G.M., and M.W. performed research; S.J.K., K.G.P., R.R., H.G.M., M.W., and D.V.S. analyzed data; and D.V.S. wrote the paper.

Abbreviations: DEBS, 6-deoxyerythronolide B synthase; EF, error frequency; LBS, ligation by selection; LIC, ligation-independent cloning; PCA, polymerase cycling assembly; PKS, polyketide synthase; TU, transcription unit; UDG/LIC, uracil DNA glycosidase/ligation-independent cloning.

Data deposition: The sequences in this paper have been deposited in the GenBank database (accession nos. AY661566 and AY771999).

Footnotes

In synthetic chemistry, synthons are defined as “structural units within a molecule which are related to possible synthetic operations” (13).

References

  • 1. Khorana, H. G., Yamada, T., Weber, H., Terao, T., RajBhandary, U. L., Otsuka, E., Kumar, A., Gupta, N. K., Buchi, H., Agarwal, K. L., et al. (1972) J. Mol. Biol. 72, 209-217.
  • 2. Sekiya, T., Takeya, T., Brown, E. L., Belagaje, R., Contreras, R., Fritz, H. J., Gait, M. J., Lees, R. G., Ryan, M. J., Khorana, H. G., et al. (1979) J. Biol. Chem. 254, 5787-5801.
  • 3. Itakura, K., Hirose, T., Crea, A. D., Riggs, A. D., Heyneker, H. L., Bolivar, F. & Boyer, H. W. (1977) Science 198, 1056-1063.
  • 4. Smith, H. O., Hutchison, C. A., III, Pfannkoch, C. & Venter, J. C. (2003) Proc. Natl. Acad. Sci. USA 100, 15440-15445.
  • 5. Dillon, P. J. & Rosen, C. A. (1990) BioTechniques 9, 298-300.
  • 6. Prodromou, C. & Pearl, L. H. (1992) Protein Eng. 5, 827-829.
  • 7. Stemmer, W. P., Crameri, A., Ha, K. D., Brennan, T. M. & Heyneker, H. L. (1995) Gene 164, 49-53.
  • 8. Hoover, D. M. & Lubkowski, J. (2002) Nucleic Acids Res. 30, e43.
  • 9. Mandecki, W., Hayden, M. A., Shallcross, M. A. & Stotland, E. (1990) Gene 94, 103-107.
  • 10. Cello, J., Paul, A. V. & Wimmer, E. (2002) Science 297, 1016-1018.
  • 11. Hutchison, C. A., III, Peterson, S. N., Gill, S. R., Cline, R. T., White, O., Fraser, C. M., Smith, H. O. & Venter, J. C. (1999) Science 286, 2165-2169.
  • 12. Walsh, C. (2003) Antibiotics: Actions, Origins, Resistance (Am. Soc. Microbiol., Washington, DC).
  • 13. Corey, E. J. (1967) Pure Appl. Chem. 14, 19-37.
  • 14. Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab. Press, Plainview, NY), 2nd Ed.
  • 15. Kodumal, S. J. & Santi, D. V. (2004) BioTechniques 37, 34-40.
  • 16. Fisher, C. L. & Pei, G. K. (1997) BioTechniques 23, 570-574.
  • 17. Zhang, H., Howard, E. M. & Roepe, P. D. (2002) J. Biol. Chem. 277, 49767-49775.
  • 18. Rashtchian, A., Buchman, G. W., Schuster, D. M. & Berninger, M. S. (1992) Anal. Biochem. 206, 91-97.
  • 19. Chambers, R. S. & Johnston, S. A. (2003) Nat. Biotechnol. 21, 1088-1092.
  • 20. Smith, C., Day, P. J. R. & Walker, M. R. (1993) PCR Methods Appl. 2, 328-332.
  • 21. Wells, J. A., Cunningham, B. C., Graycar, T. P. & Estell, D. A. (1986) Philos. Trans. R. Soc. London 317, 415-423.
  • 22. Murli, S., Kennedy, J., Dayem, L. C., Carney, J. R. & Kealey, J. T. (2003) J. Ind. Microbiol. Biotechnol. 30, 500-509.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
