Abstract
The cyanobactin biosynthetic pathways pat and tru, isolated from metagenomes of marine animals, lead to diverse natural products containing heterocycles derived from Cys, Ser, and Thr. Previous work indicated that PatD and TruD are heterocyclases with extremely broad substrate tolerance. These enzymes are virtually identical in their putative N-terminal catalytic domains but only ~77% identical in their putative C-terminal substrate-binding domains. Here, we show that these differences allow the enzymes to control the regioselectivity of posttranslational modification, helping to determine product chemistry in this hypervariable family of marine natural products.
A problem for chemists in the metagenomic era is how to approach the vast amounts of data coming from metagenome sequences. Metagenomes are mixed samples of multiple organisms from the environment, including symbiotic associations of bacteria with animals.1–4 We have been developing the use of metagenomes from symbiotic bacteria for pathway engineering, allowing natural sequence variation to define parameters for gene-based organic synthesis.5,6 Here, we provide an example of how metagenome sequence analysis contributes to understanding biosynthetic enzyme function using the thiazoline and oxazoline synthases, PatD and TruD. These proteins in turn are useful in the synthesis of diverse natural product libraries.
The first natural product pathways identified and characterized by metagenomic DNA sequencing and comparative analysis were the patellamide (pat) and trunkamide (tru) pathways to the cyanobactins, ribosomal peptides that are extensively posttranslationally modified (Figure 1A).5,7 We previously used metagenome sequence analysis to determine that the pat and tru enzymes have extremely broad substrate tolerance and can be used for in vivo synthesis of diverse natural and unnatural cyanobactin derivatives. Essentially identical (>99%) enzymes within the tru or pat pathway accept numerous substrates. Although the enzymes are identical within each pathway, they differ between pat and tru: some enzyme domains are virtually identical between the two pathways, while others are more divergent. These differences have functional consequences that can be precisely tracked (Figure 1).
Both pat and tru were initially cloned from bacteria living symbiotically with marine animals, the ascidians. Over 60 cyanobactins, including the well-known metabolites trunkamide and patellamide C,8,9 have been isolated from ascidians, and sequencing of metagenomic DNA has identified more than 30 additional relatives.6 Cyanobactins are ribosomal peptide natural products, and comparing metagenomic sequence data with known compounds provided a natural mutagenesis experiment in which evolution has mutated every amino acid in the natural products while keeping the modifying enzymes identical.5 The peptides are encoded within precursor peptides of ~70 amino acids. Precursors contain a leader sequence (~35 amino acids) plus two recognition sequences and two product-coding sequences.
Despite the demonstrated usefulness of these pathways for synthesizing diverse compounds, the biosynthetic rules governing posttranslational modification patterns remained elusive. For example, products of the pat pathway contain Cys, Ser, and Thr residues that are mostly heterocyclized. By contrast, known products of the tru pathway are heterocyclized at Cys but reverse prenylated at Ser and Thr (Figure 1).
Previous work has shown that essentially identical enzymes within the pat or tru pathways lead to these diverse products.5,6,10 The enzymes modify precursor peptides that have conserved leader sequences and enzyme recognition sequences. Between the conserved recognition sequences are cassettes that directly encode the final natural products. These cassettes are modified by heterocyclization, prenylation, and N-C circularization to yield the mature natural products. Cassette sequences are hypervariable, with substitutions accepted at any amino acid position, leading to large natural product libraries.
Ascidian cyanobactins contain six, seven, or eight amino acids. Numbering from the C- to the N-terminus, every position except position 2 contains a heterocyclizable residue (Cys, Ser, or Thr) in one or more of the ~60 known cyanobactins. The pat group includes members with heterocycles at positions 1, 3, 5, 6, and 7 (Figure 2). By contrast, in the tru group heterocycles are found only at position 1. In tru products, only Cys is heterocyclized, while Ser and Thr are prenylated; in pat products, Cys, Ser, and Thr can all be heterocyclized, and prenylation is not found. In all of these peptides, numerous Ser residues outside the cassettes remain unmodified.
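To make the C-to-N numbering convention concrete, the short Python sketch below numbers a cassette from its C-terminal residue and lists the candidate heterocyclizable positions. It is an illustration only: the cassette string is a placeholder written N->C, not a known cyanobactin sequence, and the helper names are hypothetical.

```python
# Illustrative sketch (hypothetical helpers): number a cyanobactin cassette
# from its C-terminus, so that position 1 is the C-terminal residue.
HETEROCYCLIZABLE = {"C": "Cys", "S": "Ser", "T": "Thr"}


def number_from_c_terminus(cassette: str) -> dict[int, str]:
    """Map position (1 = C-terminal residue) to the one-letter residue code.

    The cassette is given as a conventional N->C one-letter string.
    """
    return {pos: aa for pos, aa in enumerate(reversed(cassette), start=1)}


def heterocyclizable_positions(cassette: str) -> dict[int, str]:
    """Return {position: residue name} for every Cys, Ser, or Thr."""
    numbered = number_from_c_terminus(cassette)
    return {
        pos: HETEROCYCLIZABLE[aa]
        for pos, aa in numbered.items()
        if aa in HETEROCYCLIZABLE
    }


if __name__ == "__main__":
    # Placeholder 8-residue cassette (N->C); not a real cyanobactin sequence.
    demo = "ATGCVSVC"
    print(heterocyclizable_positions(demo))
    # {1: 'Cys', 3: 'Ser', 5: 'Cys', 7: 'Thr'}
```

The sketch only enumerates candidates; whether a listed position is actually modified depends on the enzyme and pathway, which is the question this study addresses.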
These natural substitution patterns are particularly striking given the overall enzyme conservation. Despite chemical differences between pat and tru, all enzymes are homologous between these pathways. In fact, the didomain heterocyclase enzymes, PatD and TruD, are >99% identical in their N-terminal domains and only ~77% identical in their C-terminal domains. The questions addressed in this study are how such similar enzymes and pathways lead to different posttranslational products and how these enzymes can be used to synthesize diverse chemical derivatives.
Microcin B17 and streptolysin S synthetases were the first and second heterocyclase enzymes to be characterized in vitro.11–13 Although these enzymes are only distantly related to PatD and TruD, comparison with them suggested that the PatD/TruD catalytic domain lies at the N-terminus, while the C-terminal domain functions primarily to bind the substrate peptide. This raised the question of whether these enzymes operate chemoselectively (O vs. S in heterocycles) or whether regioselectivity is important, as might be implied by differences in peptide binding. Here, we use biochemical experiments to define PatD and TruD as regioselective heterocyclases that catalyze thiazoline and oxazoline biosynthesis.
When the pat gene cluster encoding PatD was previously expressed in E. coli, we observed products containing the natural heterocycle pattern, including thiazoline and oxazoline.6,7 Similarly, heterologous expression of the tru pathway produced thiazoline-containing natural products with prenylated Ser/Thr residues.5 Thus, PatD and TruD were implicated as probable heterocyclases, but they had not been purified or characterized. Here, both genes were cloned and expressed as N-terminally His-tagged proteins (Figure S1). The patD clone includes two point mutations in the N-terminal domain, which did not change enzyme function. Because of toxicity problems, the C-terminal region of truD was cloned into an existing patD construct. In practice, this cloning strategy ensured that PatD and TruD were 100% identical (rather than merely >99%) in their presumed catalytic N-terminal domains. A series of substrates was cloned and expressed recombinantly, an approach far more efficient than peptide synthesis given the large size of the substrates (~70 amino acids).
Purified PatD or TruD was incubated with the substrate analogs PatEdm, PatEα, TruE2, TruE4, and TruE5 (Figure 3). All enzyme experiments in this study were performed at varying substrate concentrations, and each condition was repeated independently at least three times. We first noticed that PatD and TruD products showed different band shifts on SDS-PAGE. In all cases, TruD products migrated more rapidly than unmodified peptides, and PatD products migrated more rapidly still (Figure 3). The mobility shift was consistent with formation of thiazoline and/or oxazoline rings, because these heterocycles impose a significant conformational restraint on the peptide backbone.14–17
Because each heterocyclization results in a loss of 18 Da (one molecule of water), we used an MS approach to confirm that each of the five precursor peptides was modified by heterocyclization. First, intact-mass electrospray ionization (ESI) MS was used to determine the total number of dehydrations catalyzed by PatD or TruD on these ~9 kDa substrates, relative to unmodified controls (Figure S2). Next, the enzyme products were treated with PatA, a specific protease that cleaves upstream of each cassette's start site.10 The resulting smaller fragments were analyzed by MALDI and ESI MS to localize dehydrations to single cassettes (Figures S3 and S8). Finally, the PatA-digested cassettes were subjected to LC-Fourier transform ion cyclotron resonance (FT-ICR) MS and MS/MS to determine which amino acids within each cassette were heterocyclized (Figures S4 and S5).
For Cys, the MS data confirmed thiazoline formation and ruled out other possible dehydration routes. For Thr, all available evidence supported oxazoline formation. This evidence included the same type of fragmentation suppression seen in other heterocycle-containing peptides,18 the absence of MS/MS fragments consistent with other modifications, and the SDS-PAGE mobility shift, which, as noted above, is consistent with heterocycle formation. Oxazoline formation is further supported by the facts that these genes produce oxazolines in vivo and that no other type of Thr dehydration has been observed in this compound family. Nevertheless, we sought to rule out completely the possibility of a reverse-Michael reaction (Figure S6). Because activated double bonds react poorly with acids, whereas oxazolines are acid-labile, the modified peptides were subjected to very mild acidic conditions. The resulting rehydrated products confirmed that all Thr modifications were indeed oxazolines (Figure S6).
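The mass bookkeeping behind this strategy is simple arithmetic: n cyclodehydrations lower the mass by n × 18 Da, and acid rehydration of oxazolines adds the water back. A minimal Python sketch follows, assuming the average mass of water (18.015 Da) and a ±2 Da matching tolerance for deconvoluted intact-mass data; both values, the helper names, and the example masses are illustrative assumptions, not values from this study.

```python
# Sketch: infer the number of cyclodehydrations from intact masses.
# Assumptions (not from the study): average water mass and a +/-2 Da window.
from typing import Optional

WATER_AVG_DA = 18.015  # average mass of H2O lost per heterocycle


def expected_mass(unmodified_da: float, n_heterocycles: int) -> float:
    """Mass after n thiazoline/oxazoline formations (one H2O lost each)."""
    return unmodified_da - n_heterocycles * WATER_AVG_DA


def rehydrated_mass(modified_da: float, n_oxazolines: int) -> float:
    """Mass after acid hydrolysis opens n oxazolines (one H2O gained each)."""
    return modified_da + n_oxazolines * WATER_AVG_DA


def count_dehydrations(observed_da: float, unmodified_da: float,
                       tol_da: float = 2.0) -> Optional[int]:
    """Return the integer number of water losses, or None if no fit."""
    n = round((unmodified_da - observed_da) / WATER_AVG_DA)
    if n < 0 or abs(expected_mass(unmodified_da, n) - observed_da) > tol_da:
        return None
    return n


if __name__ == "__main__":
    # Hypothetical ~9 kDa precursor; masses are placeholders, not data.
    unmodified = 9000.00
    observed = 8945.96
    print(count_dehydrations(observed, unmodified))  # 3
```

With real spectra, the tolerance should reflect the instrument's mass accuracy, and the same arithmetic applies to the smaller PatA-digested fragments used to localize dehydrations to individual cassettes.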
Therefore, PatD and TruD catalyze the formation of thiazolines and oxazolines; no other products were observed in extensive experimental analyses.
TruE2 is a natural substrate containing two Cys residues that are present as thiazolines in the final natural products, as well as a number of Ser and Thr residues that are naturally prenylated (Figure 1A). TruE2 contains two cassettes, each with Cys at position 1. When the tru cluster was expressed in E. coli, these natural products were synthesized,5 indicating that tru gene products modify TruE2 to produce thiazolines. When TruE2 was treated with TruD, both Cys residues were heterocyclized as expected, while none of the Ser/Thr residues in the molecule were modified. By contrast, when it was treated with PatD, which is not normally associated with TruE2, both Cys residues and an additional Thr residue at position 3 of cassette I were cyclized (Figure 3). An unnatural analog of TruE2, TruE4, contains only cassette II and a single Cys residue; this residue was also cyclized by both PatD and TruD.
We next analyzed reactivity using unnatural substrate analogs from the pat pathway. In cassette I, both PatEdm and PatEα encode the natural patellamide C sequence, which would normally be modified to contain two thiazole and two oxazoline residues (Figure 1). Indeed, patellamide C is synthesized in E. coli when PatEdm is co-expressed with pat enzymes.6,7 When these peptide substrates were treated with PatD, two Cys and two Thr residues were cyclized in both PatEdm and PatEα, as observed during expression in E. coli. TruD, which is not normally associated with the pat pathway, behaved differently, cyclizing only the Cys residues. None of the tru products we have examined so far contain Cys at position 5, as found in PatEdm; nevertheless, TruD readily cyclized the Cys residues at both position 1 and position 5 (Figure 3).
From these experiments, we could not determine whether TruD was truly chemoselective for Cys or whether regioselectivity played a role. We therefore constructed an unnatural variant, TruE5, containing Thr in place of Cys at position 1 of cassette I. Both PatD and TruD cyclized this new Thr residue (Figure S7), although this reaction was much slower than Cys heterocyclization. This result indicates that the reaction specificity of these enzymes is mainly due to regioselectivity. Thus, although the enzymes select residues for modification primarily by their position within cassettes, the chemical identity of the residue may also influence selectivity.
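The logic of the TruE5 experiment can be illustrated with a toy Python comparison of two selectivity hypotheses. A purely chemoselective rule (modify any Cys) and a purely regioselective rule (modify Cys, Ser, or Thr only at permitted positions) make the same prediction for a cassette with Cys at position 1 but diverge once that residue becomes Thr. The cassette strings, the permitted-position set, and the function names are hypothetical, and neither rule is a model of PatD or TruD; in particular, the sketch ignores the slower Thr kinetics noted above.

```python
# Toy comparison of two hypothetical heterocyclase selectivity rules.
# Neither rule models PatD or TruD; positions count from the C-terminus.
from typing import Callable

Rule = Callable[[int, str], bool]  # (position, one-letter residue) -> modify?


def chemoselective_cys_only(position: int, residue: str) -> bool:
    """Hypothesis A: modify every Cys, regardless of position."""
    return residue == "C"


def make_regioselective(allowed_positions: set[int]) -> Rule:
    """Hypothesis B: modify Cys/Ser/Thr, but only at allowed positions."""
    def rule(position: int, residue: str) -> bool:
        return residue in {"C", "S", "T"} and position in allowed_positions
    return rule


def predicted_sites(cassette: str, rule: Rule) -> list[int]:
    """Positions (1 = C-terminal residue) that the rule would cyclize."""
    return [pos for pos, aa in enumerate(reversed(cassette), start=1)
            if rule(pos, aa)]


if __name__ == "__main__":
    wild_type = "AVTASVC"  # hypothetical cassette (N->C), Cys at position 1
    variant = "AVTASVT"    # same cassette with Thr at position 1
    regio = make_regioselective({1})  # assumed permitted position set
    for name, seq in [("wild type", wild_type), ("Cys->Thr", variant)]:
        print(name, predicted_sites(seq, chemoselective_cys_only),
              predicted_sites(seq, regio))
```

Only hypothesis B predicts cyclization of the swapped Thr at position 1, which is the outcome reported for both PatD and TruD with TruE5.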
The regioselectivity of these enzymes explains the observed product patterns in the ~60 ascidian-derived cyanobactins. For example, the cyanobactin comoramide A was previously isolated with Thr heterocyclized at position 5 and prenylated at position 3.19 Although the genes for comoramide synthesis have not been cloned, the data described here allow this pathway to be defined as tru-like. Another pathway type contains a prenylated Ser at position 5. In TruE2, PatD did not modify a Ser at position 6.
These experiments also indicate that TruD and PatD do not determine the regioselectivity of prenylation, since TruD leaves unmodified the residues that are presumably prenylated later. Although the pathways contain no prenyltransferase homologs, our current hypothesis is that the TruF1 and TruF2 proteins are involved in this step.
These enzymes are useful primarily for in vivo synthesis of new compounds, not necessarily for in vitro modification of discrete substrates. Previously, we demonstrated that large libraries of natural and unnatural cyanobactin derivatives could in principle be synthesized and screened in E. coli.6 The enzymatic specificity results reported here should help determine which sequences are appropriate for developing a chemically diverse cyanobactin library. Metagenome sequence analysis enabled the discovery of methods to synthesize these libraries, and in this study it provided new insights into enzyme function. These methods should be applicable to other enzyme systems. Because the underlying genetic methods are now extremely fast and inexpensive, they are of practical utility for the enzymatic synthesis of new molecules.
Supplementary Material
Acknowledgment
This work was funded by NIH GM071425 and by a Willard Eccles Fellowship and an ACS Medicinal Chemistry Predoctoral Fellowship sponsored by Sanofi Aventis to J.A.M. We thank Brian Hathaway and Michael Mathews for cloning PatEα and PatD; Chad Nelson, Krishna Parawar, and Jim Muller for mass spectrometry assistance; Adele Flail for graphical assistance; and Archana Yerra for technical assistance.
Footnotes
Supporting information available: additional mass spectrometry data, including FT-ICR, MS/MS, ESI-MS, and MALDI data, and SDS-PAGE analyses.
References
- 1. Stein JL, Marshall TL, Wu KY, Shizuya H, DeLong EF. J. Bacteriol. 1996;178:591–599. doi: 10.1128/jb.178.3.591-599.1996.
- 2. Haygood MG, Davidson SK. Appl. Environ. Microbiol. 1997;63:4612–4616. doi: 10.1128/aem.63.11.4612-4616.1997.
- 3. Schloss PD, Handelsman J. Curr. Opin. Biotechnol. 2003;14:303–310. doi: 10.1016/s0958-1669(03)00067-3.
- 4. Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM. Chem. Biol. 1998;5:R245–R249. doi: 10.1016/s1074-5521(98)90108-9.
- 5. Donia MS, Ravel J, Schmidt EW. Nat. Chem. Biol. 2008;4:341–343. doi: 10.1038/nchembio.84.
- 6. Donia MS, Hathaway BJ, Sudek S, Haygood MG, Rosovitz MJ, Ravel J, Schmidt EW. Nat. Chem. Biol. 2006;2:729–735. doi: 10.1038/nchembio829.
- 7. Schmidt EW, Nelson JT, Rasko DA, Sudek S, Eisen JA, Haygood MG, Ravel J. Proc. Natl. Acad. Sci. U. S. A. 2005;102:7315–7320. doi: 10.1073/pnas.0501424102.
- 8. Ireland CM, Durso AR, Newman RA, Hacker MP. J. Org. Chem. 1982;47:1807–1811.
- 9. Carroll AR, Coll JC, Bourne DJ, MacLeod JK, Zabriskie T, Ireland CM, Bowden BF. Aust. J. Chem. 1996;49:659–667.
- 10. Lee J, McIntosh JA, Hathaway BJ, Schmidt EW. J. Am. Chem. Soc. 2009;131:2122–2124. doi: 10.1021/ja8092168.
- 11. Lee SW, Mitchell DA, Markley AL, Hensler ME, Gonzalez D, Wohlrab A, Dorrestein PC, Nizet V, Dixon JE. Proc. Natl. Acad. Sci. U. S. A. 2008;105:5879–5884. doi: 10.1073/pnas.0801338105.
- 12. Milne JC, Roy RS, Eliot AC, Kelleher NL, Wokhlu A, Nickels B, Walsh CT. Biochemistry. 1999;38:4768–4781. doi: 10.1021/bi982975q.
- 13. Li YM, Milne JC, Madison LL, Kolter R, Walsh CT. Science. 1996;274:1188–1193. doi: 10.1126/science.274.5290.1188.
- 14. Milne BF, Long PF, Starcevic A, Hranueli D, Jaspars M. Org. Biomol. Chem. 2006;4:631–638. doi: 10.1039/b515938e.
- 15. Abbenante G, Fairlie DP, Gahan LR, Hanson GR, Piersens GK, van den Brenk AL. J. Am. Chem. Soc. 1996;118:10382–10388.
- 16. Bernhardt PV, Comba P, Fairlie DP, Gahan LR, Hanson GR, Lotzbeyer L. Chemistry. 2002;8:1527–1536. doi: 10.1002/1521-3765(20020402)8:7<1527::aid-chem1527>3.0.co;2-f.
- 17. Walsh CT, Nolan EM. Proc. Natl. Acad. Sci. U. S. A. 2008;105:5655–5656. doi: 10.1073/pnas.0802300105.
- 18. Belshaw PJ, Roy RS, Kelleher NL, Walsh CT. Chem. Biol. 1998;5:373–384. doi: 10.1016/s1074-5521(98)90071-0.
- 19. Rudi A, Aknin M, Gaydou EM, Kashman Y. Tetrahedron. 1998;54:13203–13210.