Synthetic Biology. 2020 Nov 5;5(1):ysaa023. doi: 10.1093/synbio/ysaa023

Building a custom high-throughput platform at the Joint Genome Institute for DNA construct design and assembly—present and future challenges

Ian K Blaby, Jan-Fang Cheng
PMCID: PMC7737003  PMID: 34746437

Abstract

The rapid design and assembly of synthetic DNA constructs have become a crucial component of biological engineering projects via iterative design–build–test–learn cycles. In this perspective, we provide an overview of the workflows used to generate the thousands of constructs and libraries produced each year at the U.S. Department of Energy Joint Genome Institute. Particular attention is paid to describing pipelines, tools used, types of scientific projects enabled by the platform and challenges faced in further scaling output.

Keywords: bio-CAD, DNA assembly, DNA synthesis, platform development, synthetic biology

1. Introduction

The redesign of biological systems has historically been impeded by an incomplete understanding of the system and the difficulty in unhinging molecular components from the constraints, such as regulation and enzyme metabolite preference, under which they have evolved. With the broad objective of overcoming the latter and providing novel and accelerated approaches toward the former, synthetic biology lies at the interface of a number of biological disciplines and engineering. To help overcome incomplete knowledge of biological complexity, design–build–test–learn (DBTL) cycles are commonly employed as a means of iterating toward exploiting biological machinery, with applications ranging from energy to agriculture to health. This approach has been enabled by technological achievements in the last decade: just as sequencing costs have decreased over the last 15 years, the cost of chemical DNA synthesis has declined to <$0.1/bp, while synthesized sequence accuracy and fragment length have both increased. In recognition of this potential, the Joint Genome Institute (JGI), a user facility that provides capabilities and scientific expertise on a competitive peer-reviewed basis for U.S. Department of Energy (DOE)-relevant research, added to its suite of genomics capabilities by initiating the DNA synthesis program for synthetic biology in 2012. With similar goals to the JGI platform, a growing number of bio-foundries have been built that serve the regional needs for synthetic biology and bio-manufacturing process engineering (1).

For several decades, core facilities at academic institutes have been providing simple access to DNA sequencing services with rapid and cost-effective turnaround. Given cost reductions, development of biological computer-aided design tools and laboratory automation, it is conceivable that DNA synthesis and construct assembly could become another major service of core facilities. This perspective aims to provide an overview of our experience in building a high-throughput platform incorporating construct design, assembly, cloning and sequence verification processes at scale for synthetic biology products.

2. Capabilities offered by the platform and user projects

The DNA synthesis platform at JGI comprises an end-to-end pipeline from design to assembled constructs and presently generates ∼7 Mbp/annum of custom DNA synthesis and assemblies. The platform team is divided between bioinformatics, production and research, which support the workflow by developing new tools for design and optimization, assembling constructs for users and developing new capabilities, respectively. Presently, the platform focuses on four classes of project (Figure 1): (i) small inserts, such as single or a few small genes (typically <5 kb/insert total); (ii) large inserts including complete pathways or multiple operons (up to ∼50 kb); (iii) combinatorial constructs or libraries and (iv) libraries of small inserts with a high degree of variation, such as gRNA or promoter libraries. DNA is not synthesized internally at the JGI, but is ordered as linear or clonal fragments from commercial vendors and used to generate the building blocks of each construct (or oligonucleotide pools in the case of small size libraries; see Supplementary File for details). These are then seamlessly assembled via Gibson (2), Golden Gate or MoClo assembly (3, 4), or yeast recombinase-mediated cloning (5, 6) as appropriate for each project. For large constructs exceeding ∼25 kb, sequential rounds of these methods are employed.

Figure 1.

Overview of construct types. (a) Single or multiple small genes are typically constructed by Gibson assembly. gRNA libraries can be similarly constructed using oligonucleotide pools flanked with vector homology. (b) Pathways and multiple operons can be compiled by digestion with a type IIS restriction enzyme (whose recognition site either does not occur in composite sequences or has been removed by sequence refactoring) followed by Golden Gate assembly or inclusion of overlapping sequences and yeast-mediated recombination. (c) Combinatorial libraries are constructed by Golden Gate assembly or for libraries with increased complexity using modular cloning (MoClo) approaches. (d) Libraries containing higher degrees of variants are generated using multiple compatible inserts and assembled into vectors via Gibson or Golden Gate assembly. Regions of overlapping homology for Gibson assembly or yeast recombination are signified by matching colors. Promoters, terminators and enzyme cut sites are designated by green arrows, red Ts and scissor cartoons, respectively.
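The requirement in panel (b), that the type IIS recognition site must not occur inside the parts being assembled, can be checked computationally before a design is ordered. The following is a minimal sketch of such a check, not the JGI pipeline itself. It assumes BsaI (recognition site GGTCTC) as the example enzyme, which the text does not specify, and the function names and the example sequence are hypothetical.

```python
# Sketch: flag type IIS recognition sites in a part before Golden Gate assembly.
# Assumption: BsaI (GGTCTC) is used as the example enzyme; the source names none.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = "GGTCTC") -> list[tuple[int, str]]:
    """Return (0-based position, strand) for every occurrence of `site`.

    Both strands are scanned, because a site on the reverse strand is cut too.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical part carrying one forward and one reverse-strand site.
example_part = "ATGGGTCTCAAACCCGAGACCTGA"
internal_sites = find_sites(example_part)
```

Any part for which `find_sites` returns a non-empty list would need either a different enzyme or sequence refactoring to remove the site, as described in the caption.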

As JGI is a DOE Biological and Environmental Research (BER) user facility, all granted proposals contribute to BER priorities (7) and, more broadly, to the DOE mission. Consequently, many projects aim to address or query gene function, for example generating large numbers of enzymes for biochemical or biophysical characterization, or performing mutant library screens to identify all genes responding to a given condition. Other projects focus on optimizing a previously characterized pathway by combinatorially arranging components (e.g. promoters, coding regions and terminators) from different sources to achieve elevated product levels.

Projects can be further enhanced in scope by additional JGI capabilities that intersect with DNA synthesis. Multiple projects have benefited from mining sequence databases developed and maintained by JGI (such as Integrated Microbial Genomes and Microbiomes (8), Phytozome (9) and MycoCosm (10)) to maximize the breadth of phylogenetic diversity surveyed, contributing to the design phase of the DBTL cycle. Other opportunities exist with other technologies, for example synthesizing genes encoding transcription factors for regulon interrogation by DAP-seq (11), or metabolomics to investigate the metabolic consequences of introduced pathways (supporting the test and learn phases of DBTL). Frameworks are also in place allowing DNA synthesis (and other JGI capabilities) to be coupled with complementary DOE user facilities.

Many projects aim to obtain a greater understanding of protein function and focus on characterizing proteins expressed from synthetic DNA constructs either in vitro in cell-free extracts or in vivo using model organisms. An affinity purification tag is usually used when expressing genes heterologously in model organisms to yield sufficient protein quantity and purity for downstream enzymology and structural analysis. To mitigate potential protein insolubility or toxicity, there is growing interest in expressing the same set of genes in multiple organisms, thus elevating the likelihood of obtaining the desired protein. For both identifying specific sequences to work with and ensuring that the taxonomic breadth of a protein family is captured, sequence data mining often constitutes part of the design of gene function discovery-based projects. Some examples include characterization of terpene biosynthesis (12–15) and the glycoside hydrolase protein families (16–19). Such projects often require single gene constructs, but interest is growing in much larger fragments, for example for functional interrogation of biosynthetic gene clusters (BGCs).

Another growing area of interest is CRISPR-related projects for tool development and libraries of engineered strains exploiting CRISPR nuclease, interference or activation-based screens (CRISPRi (20) and CRISPRa (21), respectively). Multiple libraries, for organisms ranging from prokaryotes to yeasts (22) and algae, have been generated or are presently in progress.

The high capacity of the platform facilitates the synthesis of large construct numbers allowing for pathway screening. Examples of these approaches include the use of combinatorial screens to uncover novel and optimal activities (23–25) and metabolic engineering for the development of new pathways (26).

3. Project management and workflows

The general workflow of the JGI platform has been modeled on DBTL engineering cycles, with a typical project beginning with an initiating conference call between JGI staff scientists and the proposers to discuss the project goals, experimental design and DNA assembly strategies suggested by JGI staff to aid users with their research (Supplementary Figure S1). In this regard, work by JGI scientists provides the Design and Build capabilities, and, depending on the nature of the project, can contribute to capabilities of Test and Learn, which are usually conducted in the researcher's own laboratory (Supplementary Figure S1).

Once sequence information has been received, the data are processed through a suite of custom software pipelines, as appropriate for the individual project, and the final designs are sent back to the user for confirmation prior to ordering individual fragments (termed ‘building blocks’). These computational tools have recently been comprehensively reviewed (27) and are summarized in the Supplementary File. Once all of the synthetic DNA fragments and oligonucleotides required for a project have been received, their delivery is recorded in our Laboratory Information Management System (LIMS) for tracking and the molecular workflows are initiated (Supplementary File; Supplementary Figure S2).

4. Challenges and limitations of scaling output

Total output of the platform has grown steadily each year (Supplementary Figure S3). In 2019, 41 user projects were initiated. The year 2019 saw the construction of 4182 cloned fragments of <5 kb and 338 of >5 kb, which totals 7.44 Mb of constructs delivered (Supplementary Figure S3). Of these 4520 requested constructs, 4030 were delivered (89%). The 7.44 Mb delivered compares with 6.59 Mb synthesized; this discrepancy is accounted for by several projects requiring only PCR amplification from genomic or plasmid DNA templates. In addition, nine libraries of high-degree variants were constructed and delivered. Users are predominantly based in the USA, but the platform is globally accessible and projects also originate from institutes in other countries. Nevertheless, the platform is not presently approaching maximum capacity; the potential for scaling beyond these numbers is dependent upon multiple factors, as discussed below. Detailed information regarding our molecular workflows, apparatus and protocols is provided in the Supplementary File and summarized in Supplementary Figure S2.
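The 2019 figures quoted above can be cross-checked with a few lines of arithmetic. This is only a restatement of the numbers in the text; the variable names are illustrative.

```python
# Cross-check of the 2019 output figures quoted in the text.
small_fragments = 4182      # cloned fragments of <5 kb
large_fragments = 338       # cloned fragments of >5 kb
delivered_constructs = 4030

requested_constructs = small_fragments + large_fragments
delivery_rate = delivered_constructs / requested_constructs

mb_delivered = 7.44
mb_synthesized = 6.59
# Portion of delivered sequence not accounted for by synthesis, which the
# text attributes to projects needing only PCR amplification.
mb_not_synthesized = round(mb_delivered - mb_synthesized, 2)

print(f"{requested_constructs} requested, {delivery_rate:.1%} delivered")
print(f"{mb_not_synthesized} Mb delivered without synthesis")
```

The sum of the two size classes reproduces the 4520 requested constructs, the ratio rounds to the stated 89%, and the gap between delivered and synthesized sequence is 0.85 Mb.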

A limitation in achieving the theoretical maximum capacity of our pipeline infrastructure is that user projects are often split into smaller batches. This enables the collaborator to pilot test their vector or downstream assays before committing to their full request, but frequently results in partially filled plates during the assembly process, reducing machine and staff time efficiency. One option that we are currently exploring to mitigate this is to combine projects on plates to fill every well, and deconvolute constructs post-completion.
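The idea of combining projects on plates and deconvoluting afterwards can be illustrated with a small sketch. It is a hypothetical illustration under simple assumptions (96-well plates filled row by row, one construct per well, no compatibility constraints between projects); the function names and project labels are invented for the example and do not describe the JGI LIMS.

```python
# Sketch: pool constructs from several projects onto shared plates, then
# regroup them by project afterwards. Assumes 96-well plates filled row by
# row; all names here are hypothetical.
from string import ascii_uppercase


def well_names(rows: int = 8, cols: int = 12) -> list[str]:
    """Well labels in row-major order: A1..A12, B1..B12, and so on."""
    return [f"{ascii_uppercase[r]}{c}" for r in range(rows) for c in range(1, cols + 1)]


def pack_plates(batches: dict[str, list[str]], rows: int = 8, cols: int = 12):
    """Assign constructs from all projects to consecutive wells.

    Returns a list of plates; each plate maps well -> (project, construct).
    """
    wells = well_names(rows, cols)
    queue = [(project, c) for project, constructs in batches.items() for c in constructs]
    plates = []
    for start in range(0, len(queue), len(wells)):
        chunk = queue[start:start + len(wells)]
        plates.append(dict(zip(wells, chunk)))
    return plates


def deconvolute(plates):
    """Regroup wells by project: project -> [(plate number, well, construct)]."""
    by_project = {}
    for plate_number, plate in enumerate(plates, start=1):
        for well, (project, construct) in plate.items():
            by_project.setdefault(project, []).append((plate_number, well, construct))
    return by_project


# Two hypothetical pilot batches that would each leave a plate part-empty.
batches = {
    "project_A": [f"A_construct_{i}" for i in range(60)],
    "project_B": [f"B_construct_{i}" for i in range(50)],
}
plates = pack_plates(batches)
by_project = deconvolute(plates)
```

Run separately, the two batches would occupy two plates with 36 and 46 empty wells; pooled, they fill one plate completely and use 14 wells of a second. A production version would also need to respect vector, antibiotic and assembly-method compatibility, which this sketch ignores.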

Standardization of project type has generally been resisted in order to maximize project flexibility, and accordingly project goals and the scientific questions posed by users vary significantly. Whether fragments originate from PCR amplicons or synthesized DNA, they are assembled into the user’s vector of choice precisely as agreed upon in discussions. Often this necessitates vectors being modified or built prior to assembly of the final constructs. A further complication that occasionally derives from custom vectors is incorrect sequence data. By default, platform staff sequence-validate all incoming vectors to ensure constructs are assembled as intended. On rare occasions, point mutations or larger sequence deviations identified in the provided plasmid DNA must be repaired before a project can proceed. While this vector flexibility, which most companies do not offer (or offer only with on-boarding fees for new vectors and/or additional cloning costs), represents a limitation to the potential platform output, this approach ensures the final constructs are of maximal utility to collaborators for addressing their scientific questions.

Another restriction to throughput is that for some projects DNA synthesis is inappropriate or not applicable. For instance, if the DNA sequence of the final construct cannot be refactored (such as for gene-flanking regions for homologous recombination-mediated gene deletions where the cloned regions must be identical at the nucleotide level) and/or synthesis constraints cannot be overcome, PCR amplification from the source DNA or direct cloning are the only affordable options. Conversely, DNA synthesis provides an opportunity to study coding capacity if the naturally occurring DNA is not available, such as the characterization of genes encoded in a sequenced environmental sample or in unculturable microbes. Since projects that depend upon significant PCR amplification from genomic DNA are more prone to failing to obtain all required building block fragments (due to PCR limitations of high or low GC skew, repetitive sequence, secondary structure and the possibility of point mutations being introduced by the polymerase into the amplified DNA), projects utilizing synthetic DNA generally yield higher delivery rates and are more amenable to both automation and high-throughput approaches. Operonic structures and BGCs present a different challenge since (depending on the use case) coding regions may be refactored, but alteration of regulatory regions such as promoters or terminators from the native sequence is generally undesirable. As well as surmounting synthesis of problematic sequence, refactoring provides opportunities to modify each codon, potentially impacting gene expression levels by mimicking the codon usage of the host organism’s genome.
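Refactoring a coding region toward host codon usage, as described above, can be sketched as a synonymous recoding step. The example below is a simplified illustration, not the platform's design software: it uses the standard genetic code, replaces each codon with a host-preferred synonym where one is supplied, and confirms that the protein sequence is unchanged. The `preferred` table and the input sequence are hypothetical placeholders rather than measured codon usage.

```python
# Sketch: synonymous recoding of a coding sequence toward preferred codons.
# The preference table below is a hypothetical placeholder, not real host data.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict[str, str]) -> str:
    """Swap each codon for the preferred synonymous codon, if one is given.

    Amino acids absent from `preferred` keep their original codon.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    recoded = "".join(
        preferred.get(CODON_TABLE[cds[i:i + 3]], cds[i:i + 3])
        for i in range(0, len(cds), 3)
    )
    # The whole point of refactoring: the encoded protein must not change.
    if translate(recoded) != translate(cds):
        raise AssertionError("recoding altered the protein sequence")
    return recoded


# Hypothetical input: M L K G stop, with illustrative preferences.
native = "ATGTTAAAGGGTTAA"
preferred = {"L": "CTG", "K": "AAA", "G": "GGC"}
refactored = recode(native, preferred)
```

Only coding regions would be passed through a step like this; as noted above, promoters and terminators are generally left at their native sequence. The same synonymous-swap mechanism could be used to remove a problematic motif, such as an internal restriction site, without changing the protein.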

In our present workflows, some steps may be performed manually using multi-channel pipettes instead of automation for small numbers of assemblies or if a machine is in use. While affording maximal flexibility with methods for assembly and general workflows, this approach both prevents maximal capacity from being achieved and necessitates constant human oversight. To help overcome this, one avenue that is presently being explored is a complete end-to-end integrated system for the automated assembly of constructs via Gibson assembly, for which the inputs would be DNA building blocks, oligonucleotides and reagents, and the output would be complete final constructs arrayed in plates.

Finally, an ongoing challenge faced is balancing the testing of emergent technologies to identify novel approaches, potentially leading to efficiency gains or new product types, versus investing in scaling up methods already in place for which robust protocols have been developed. New products are frequently assessed for their possible benefits to the platform as they become available, and subsequent to validating for compatibility with existing workflows and protocol robustness, are incorporated into the platform.

5. Future directions

Catalyzed by technological advances and the resulting decreases in price and turnaround times, applications of DNA synthesis remain a fast-moving field. Accordingly, a series of computational and biological applications are presently in development with potential future availability to users, briefly summarized here.

Extensive genome sequencing has revealed new coding capacity whose function has yet to be discovered. However, functional characterization of these sequences has been impeded by difficulty in expressing them in traditional model organisms, possibly due to a combination of misfolding of proteins, absence of required precursor metabolites and/or low tolerance to gene products. To help overcome this, an area of active development is the engineering of diverse microorganisms as modular chassis strains for heterologous expression of synthetic genes and pathways. Presently, this includes around two dozen new strains in γ-Proteobacteria (28, 29), with work progressing on additional prokaryotic lineages and unicellular eukaryotes.

Finally, if trends from the past decade continue, notwithstanding progress in new technologies, the annual capacity of the platform might be expected to continue to increase, leading to possible increases in the scope and/or number of projects worked on.

Supplementary data

Supplementary Data are available at SYNBIO Online.

Funding

This work has been supported by the DOE Joint Genome Institute (http://jgi.doe.gov) by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, through Contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the U.S. Department of Energy.

Acknowledgments

We are extremely grateful to Sangeeta Nath for input on drafting figures.

Conflict of interest statement. None declared.

References

1. Hillson N., Caddick M., Cai Y., Carrasco J.A., Chang M.W., Curach N.C., Bell D.J., Le Feuvre R., Friedman D.C., Fu X. et al. (2019) Building a global alliance of biofoundries. Nat. Commun., 10, 2040.
2. Gibson D.G., Young L., Chuang R.-Y., Venter J.C., Hutchison C.A., Smith H.O. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods, 6, 343–345.
3. Weber E., Engler C., Gruetzner R., Werner S., Marillonnet S. (2011) A modular cloning system for standardized assembly of multigene constructs. PLoS One, 6, e16765.
4. Engler C., Kandzia R., Marillonnet S. (2008) A one pot, one step, precision cloning method with high throughput capability. PLoS One, 3, e3647.
5. Joska T.M., Mashruwala A., Boyd J.M., Belden W.J. (2014) A universal cloning method based on yeast homologous recombination that is simple, efficient, and versatile. J. Microbiol. Methods, 100, 46–51.
6. Kouprina N., Larionov V. (2016) Transformation-associated recombination (TAR) cloning for genomics studies and synthetic biology. Chromosoma, 125, 621–632.
7. BERAC. (2017) Grand Challenges for Biological and Environmental Research: Progress and Future Vision. A Report from the Biological and Environmental Research Advisory Committee DOE/SC–0190. http://science.osti.gov/~/media/ber/berac/pdf/Reports/BERAC-2017-Grand-Challenges-Report.pdf
8. Chen I.-M.A., Chu K., Palaniappan K., Pillay M., Ratner A., Huang J., Huntemann M., Varghese N., White J.R., Seshadri R. et al. (2019) IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes. Nucleic Acids Res., 47, D666–D677.
9. Goodstein D.M., Shu S., Howson R., Neupane R., Hayes R.D., Fazo J., Mitros T., Dirks W., Hellsten U., Putnam N. et al. (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res., 40, D1178–D1186.
10. Grigoriev I.V., Nikitin R., Haridas S., Kuo A., Ohm R., Otillar R., Riley R., Salamov A., Zhao X., Korzeniewski F. et al. (2014) MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res., 42, D699–D704.
11. Bartlett A., O'Malley R.C., Huang S-s. C., Galli M., Nery J.R., Gallavotti A., Ecker J.R. (2017) Mapping genome-wide transcription-factor binding sites using DAP-seq. Nat. Protoc., 12, 1659–1672.
12. Murphy K.M., Ma L.T., Ding Y., Schmelz E.A., Zerbe P. (2018) Functional characterization of two class II diterpene synthases indicates additional specialized diterpenoid pathways in maize (Zea mays). Front. Plant Sci., 9, 1542.
13. Pelot K.A., Chen R., Hagelthorn D.M., Young C.A., Addison J.B., Muchlinski A., Tholl D., Zerbe P. (2018) Functional diversity of diterpene synthases in the biofuel crop switchgrass. Plant Physiol., 178, 54–71.
14. Ding Y., Murphy K.M., Poretsky E., Mafu S., Yang B., Char S.N., Christensen S.A., Saldivar E., Wu M., Wang Q. et al. (2019) Multiple genes recruited from hormone pathways partition maize diterpenoid defences. Nat. Plants, 5, 1043–1056.
15. Pelot K.A., Hagelthorn D.M., Hong Y.J., Tantillo D.J., Zerbe P. (2019) Diterpene synthase-catalyzed biosynthesis of distinct clerodane stereoisomers. Chembiochem, 20, 111–117.
16. Macdonald S.S., Armstrong Z., Morgan-Lang C., Osowiecka M., Robinson K., Hallam S.J., Withers S.G. (2019) Development and application of a high-throughput functional metagenomic screen for glycoside phosphorylases. Cell Chem. Biol., 26, 1001–1012.e5.
17. Glasgow E.M., Vander Meulen K.A., Takasuka T.E., Bianchetti C.M., Bergeman L.F., Deutsch S., Fox B.G. (2019) Extent and origins of functional diversity in a subfamily of glycoside hydrolases. J. Mol. Biol., 431, 1217–1233.
18. Deng K., Guenther J.M., Gao J., Bowen B.P., Tran H., Reyes-Ortiz V., Cheng X., Sathitsuksanoh N., Heins R., Takasuka T.E. et al. (2015) Development of a high throughput platform for screening glycoside hydrolases based on oxime-NIMS. Front. Bioeng. Biotechnol., 3, 153.
19. Heins R.A., Cheng X., Nath S., Deng K., Bowen B.P., Chivian D.C., Datta S., Friedland G.D., D’Haeseleer P., Wu D. et al. (2014) Phylogenomically guided identification of industrially relevant GH1 beta-glucosidases through DNA synthesis and nanostructure-initiator mass spectrometry. ACS Chem. Biol., 9, 2082–2091.
20. Larson M.H., Gilbert L.A., Wang X., Lim W.A., Weissman J.S., Qi L.S. (2013) CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nat. Protoc., 8, 2180–2196.
21. Perez-Pinera P., Kocak D.D., Vockley C.M., Adler A.F., Kabadi A.M., Polstein L.R., Thakore P.I., Glass K.A., Ousterout D.G., Leong K.W. et al. (2013) RNA-guided gene activation by CRISPR-Cas9-based transcription factors. Nat. Methods, 10, 973–976.
22. Schwartz C., Cheng J.-F., Evans R., Schwartz C.A., Wagner J.M., Anglin S., Beitz A., Pan W., Lonardi S., Blenner M. et al. (2019) Validating genome-wide CRISPR-Cas9 function improves screening in the oleaginous yeast Yarrowia lipolytica. Metab. Eng., 55, 102–110.
23. Xie L., Zhang L., Wang C., Wang X., Xu Y-m., Yu H., Wu P., Li S., Han L., Gunatilaka A.A.L. et al. (2018) Methylglucosylation of aromatic amino and phenolic moieties of drug-like biosynthons by combinatorial biosynthesis. Proc. Natl. Acad. Sci. U S A, 115, E4980–E4989.
24. Dossani Z.Y., Reider Apel A., Szmidt-Middleton H., Hillson N.J., Deutsch S., Keasling J.D., Mukhopadhyay A. (2018) A combinatorial approach to synthetic transcription factor-promoter combinations for yeast strain engineering. Yeast, 35, 273–280.
25. Wang X., Wang C., Duan L., Zhang L., Liu H., Xu Y-m., Liu Q., Mao T., Zhang W., Chen M. et al. (2019) Rational reprogramming of O-methylation regioselectivity for combinatorial biosynthetic tailoring of benzenediol lactone scaffolds. J. Am. Chem. Soc., 141, 4355–4364.
26. Schwander T., Schada von Borzyskowski L., Burgener S., Cortina N.S., Erb T.J. (2016) A synthetic pathway for the fixation of carbon dioxide in vitro. Science, 354, 900–904.
27. Oberortner E., Evans R., Meng X., Nath S., Plahar H., Simirenko L., Tarver A., Deutsch S., Hillson N.J., Cheng J.-F. (2020) An integrated computer-aided design and manufacturing workflow for synthetic biology. In: DNA Cloning and Assembly: Methods and Protocols, pp. 3–18. Humana Press, New York, NY.
28. Ke J., Yoshikuni Y. (2020) Multi-chassis engineering for heterologous production of microbial natural products. Curr. Opin. Biotechnol., 62, 88–97.
29. Wang G., Zhao Z., Ke J., Engel Y., Shi Y.-M., Robinson D., Bingol K., Zhang Z., Bowen B., Louie K. et al. (2019) CRAGE enables rapid activation of biosynthetic gene clusters in undomesticated bacteria. Nat. Microbiol., 4, 2498–2510.
