Author manuscript; available in PMC: 2019 Nov 14.
Published in final edited form as: Nat Prod Rep. 2018 Nov 14;35(11):1156–1184. doi: 10.1039/c8np00044a

Trapping Interactions Between Catalytic Domains and Carrier Proteins of Modular Biosynthetic Enzymes with Chemical Probes

Andrew M Gulick 1, Courtney C Aldrich 2
PMCID: PMC6235721  NIHMSID: NIHMS983052  PMID: 30046790

Overview

The Nonribosomal Peptide Synthetases (NRPSs) and Polyketide Synthases (PKSs) are families of modular enzymes that produce a tremendous diversity of natural products, with antibacterial, antifungal, immunosuppressive, and anticancer activities. Both enzyme families utilize a fascinating modular architecture in which the synthetic intermediates are covalently attached to a peptidyl- or acyl-carrier protein that is delivered to catalytic domains for natural product elongation, modification, and termination. An investigation of the structural mechanism therefore requires trapping the often transient interactions between the carrier and catalytic domains. Many novel chemical probes have been produced to enable the structural and functional investigation of multidomain NRPS and PKS structures. This review will describe the design and implementation of the chemical tools that have proven to be useful in biochemical and biophysical studies of these natural product biosynthetic enzymes.

TOC Graphic

A review of chemical probes used to characterize interactions between carrier and catalytic domains of modular NRPS and PKS enzymes.


1. The Biochemistry and Structural Biology of the Modular Biosynthetic Enzymes

1.1. Natural Product Biosynthesis by Modular Enzymes

Most microorganisms produce a chemically diverse array of compounds that allow them to adapt to the wide range of environments they may encounter. These natural products can play many roles, including coordinating communication between bacterial cells, inhibition of competing microbial or host cells, and acquisition of essential nutrients to support growth. Almost by definition, microbial natural products have interesting biological activities, which have been exploited in the discovery of new therapeutic agents. Moreover, their stability in biological environments coupled with their ability to cross cellular membranes have been instrumental for their development as pharmaceuticals.1,2

As detailed in the reviews of this issue, many natural products are synthesized by two families of large, modular enzymes known as Polyketide Synthases (PKSs) and Nonribosomal Peptide Synthetases (NRPSs), which produce polyketide and peptide molecules, respectively. The elucidation of the modular synthetic strategy used by both PKSs and NRPSs, whereby building blocks are covalently attached to an integrated carrier protein domain and then shuttled to neighboring catalytic domains for chemical extension or modification, was one of the pioneering achievements in the field of chemical biology.3-6

Along with the functional characterization of the PKS and NRPS enzymes, understanding the structural basis of catalysis was equally important. The requirement for the loaded carrier domains to visit multiple active sites as well as the large size of these multidomain enzymes provided challenges to standard structural techniques of NMR or protein crystallography. The early advances in understanding the structural basis of NRPS and PKS catalysis logically focused on individual domains, including the study of naturally occurring free-standing catalytic domains or those generated by genetic truncations. Additional insights were obtained by the study of structurally and functionally homologous enzymes, which provided clues to the features that control catalysis or dictate the necessary conformational changes required for the proper delivery of the loaded substrates to their catalytic domains.

In several cases, the structural interrogation of modular biosynthetic enzymes was facilitated by the use of chemical probes. These molecules were designed to mimic the substrates and intermediates. The probes stabilized the interaction of multiple domains, enabling the capture of catalytically-relevant states. These tools thus provided both the views of the active site, allowing the identification or confirmation of proposed catalytic residues, as well as the proper interaction interfaces between different domains. This review will describe the design and use of these tools for both the PKS and NRPS enzymes.

We will begin with a brief overview of the structure and function of the modular enzyme systems, highlighting several recent reviews. Both systems utilize a modular approach, where the substrates and synthetic intermediates are covalently attached to small carrier domains that deliver the substrates to neighboring catalytic domains. The necessary steps for both NRPS and PKS systems therefore include loading, chemical modification, extension, and termination with associated release of the final product.4, 7 Specialized catalytic domains are present in the modular enzymes for carrying out each of these steps. While their presence and order are often indicative of the nature of the final product, the literature contains many examples of unexpected domain organization, inactive domains, and catalytic domains that break the expected rules and carry out unusual chemistry.

1.2. Carrier Protein Domains

During catalysis, the building blocks for the PKS and NRPS enzymes are loaded on an acyl- or peptidyl carrier protein domain. These domains, which also share homology with the acyl carrier proteins of fatty acid biosynthesis or transport, are composed of four α-helices and the intervening loops. Helices α1, α2, and α4, are 10-15 residues in length, while helix α3 is shorter, encompassing only one or two turns of the helix. This shorter helix lies nearly perpendicular to the other α-helices.

A critical post-translational modification is placed on a conserved serine residue at the start of α2. A phosphopantetheine moiety derived from Coenzyme A (CoA) is transferred to the serine through the activity of a phosphopantetheinyl transferase.8, 9 This modification results in the addition of a 16-20 Å flexible cofactor that terminates in a thiol. The pantetheine thiol can then reach into the active site of the neighboring catalytic domains to become initially loaded or deliver the substrate or intermediate. The ability to post-translationally modify the carrier protein domain enables the installation of pantetheine analogs that have been exploited in the studies described below. Critically, the Sfp enzyme from B. subtilis is sufficiently promiscuous to allow a wide variety of CoA analogs to be used as substrates to tag the carrier protein domain with novel chemical tools. We will use PCP and ACP to describe the carrier protein domains of NRPS and PKS systems, respectively, regardless of the exact chemical nature of the substrate loaded on the different carrier proteins.

The presence of the pantetheine at the start of the α2 helix defines the face of the carrier domain that is presented to the catalytic domains. Because of this, the elements of the carrier domain that interact with other domains are largely the hydrophilic face of helix α2 and loop 1 between helices α1 and α2. This loop is 15-20 residues in length and exhibits flexibility in early NMR structures. Additionally, residues of helix α3 have also been shown to interact with neighboring catalytic domains. The helical architecture of the carrier domain offers another interesting feature as the pantetheine may fold back upon the protein domain to allow the loaded substrate to be partly or completely buried within the carrier domain, interacting with helices α2 and α3, as well as the dynamic loop 1.10-12 The buried and extended forms are likely in equilibrium10 and may serve to protect the thioester from hydrolysis.

1.3. NRPS Catalytic Domains

The substrates or building blocks for NRPSs encompass a broad range of carboxylates, including proteinogenic and less common amino acids, as well as fatty acids and aromatic acids. These substrates are loaded onto the carrier protein pantetheine cofactor through a two-step reaction that is analogous to the activation of amino acids by aminoacyl-tRNA synthetases. The NRPS adenylation (A) domains catalyze the initial adenylation of the amino acid through a reaction with ATP resulting in the formation of an acyladenylate (a mixed anhydride of the carboxylic acid substrate and adenosine monophosphate (AMP), i.e. acyl-AMP).13 This intermediate then serves as the substrate for the second partial reaction in which the pantetheine thiol attacks the activated substrate to displace AMP and load the amino acid substrate as a thioester on the pantetheine cofactor.14, 15 The NRPS adenylation domains belong to a family of adenylate-forming enzymes that also includes acyl- and aryl-CoA synthetases (ligases) and firefly luciferases.13 These enzymes are all structurally and functionally similar and all adopt two catalytic conformations to carry out the individual steps of their complete reaction.
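The two partial reactions described above can be summarized schematically as follows. This is only a shorthand sketch: R stands for the side chain of any carboxylate substrate, Ppant for the phosphopantetheine arm, and the release of pyrophosphate (PP_i) in the first step is the standard stoichiometry for adenylate-forming enzymes rather than something spelled out in the paragraph above.

```latex
\begin{align*}
\text{R--COO}^{-} + \text{ATP}
  &\longrightarrow \text{R--CO--AMP} + \text{PP}_i
  && \text{(adenylation)} \\
\text{R--CO--AMP} + \text{HS--Ppant--PCP}
  &\longrightarrow \text{R--CO--S--Ppant--PCP} + \text{AMP}
  && \text{(thioesterification)}
\end{align*}
```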

In many NRPS modules, the loaded building block is modified before subsequent peptide bond formation that extends the nascent peptide. These chemical modifications include N- or O-methylation, epimerization, halogenation, cyclization, or hydroxylation. While some modifications are catalyzed in trans by free-standing enzymes, other changes are installed by catalytic domains embedded within the NRPS modules. Among the most common of the latter are methyltransferases16 that transfer a methyl group from S-adenosylmethionine to the substrate amino group, epimerization17 domains, which produce D-amino acids, and cyclization domains responsible for the generation of heterocycles from serine, threonine, or cysteine residues.18, 19

Two substrates loaded onto PCP domains of adjacent modules must be joined via formation of an amide or peptide bond. Upon catalysis, the amino group on the amino acid attached to the downstream PCP attacks the thioester of the amino acid or peptide that is loaded on the upstream PCP, resulting in the extension of the peptide chain by one unit. This reaction is catalyzed by an NRPS condensation (C) domain. The active site of the condensation domain lies between two lobes and contains a conserved HHxxxDG motif.20, 21 The binding sites for the donor and acceptor carrier protein domains are positioned on opposite sides of the condensation domain, allowing the pantetheine-bound substrates to approach the core of the domain through the cavities between these two lobes.

In the final NRPS module of a biosynthetic pathway, the completed peptide must be released from the carrier protein domain. The most common domain at the C-terminus is a thioesterase (TE) domain, a serine or cysteine hydrolase that catalyzes transfer of the peptide to the active site nucleophile. This acyl-enzyme intermediate is then released through either hydrolysis, resulting in a linear peptide, or cyclization.22-24 Cyclization can occur through the N-terminal amine or with side chain nucleophiles and, not uncommonly, may also involve multimerization of the peptide through iterative synthesis of the monomeric peptide units. Additional means of product release include C-terminal reductase domains that result in the formation of a peptide aldehyde or peptide alcohol.25, 26 Finally, some fungal NRPSs terminate with macrocyclization domains, which are homologous to condensation rather than thioesterase domains.27, 28

1.4. PKS Catalytic Domains

The PKS modules similarly must catalyze the same fundamental steps in the biosynthesis of the polyketide product.29-32 The building blocks for PKS systems are acyl-CoA molecules. The initial substrate is generally an acetyl- or propionyl-CoA, while subsequent modules utilize a malonyl-CoA or methylmalonyl-CoA thioester. The building blocks are covalently attached to the ACP by an acyltransferase (AT) domain.33 The acyltransferase domains can be either embedded within the PKS module, as a so-called cis-AT domain, or a free-standing protein referred to as a trans-AT.31

The extension of the acyl chains is then catalyzed by the ketosynthase (KS) domain.30 This domain first self-acylates, transferring the acyl chain from an upstream acyl-carrier protein onto a nucleophilic cysteine residue. The ketosynthase domain then catalyzes a Claisen condensation with the loaded unit on the subsequent acyl-carrier protein. The malonyl (or methylmalonyl) unit is decarboxylated, resulting in a carbanion on the downstream ACP, which attacks the upstream acyl chain of the KS acyl-enzyme intermediate, leading to an extended acyl chain containing a β-keto moiety. In the absence of modifying domains, repetition of this cycle results in the formation of the polyketide chain.

The presence of modifying domains, however, will result in different structural units within the polyketide chain. The presence of a ketoreductase (KR) can convert the β-keto group to a β-hydroxyl. A dehydratase (DH) domain will eliminate the hydroxyl, resulting in the formation of an α-β double bond. Finally, an enoyl reductase (ER) can reduce the double bond to an alkyl moiety. These three steps are in fact used by fatty acid synthase to produce fully saturated fatty acids, pointing to the evolutionary relationship of the fatty acid and polyketide synthases. The variable inclusion of these domains in PKS modules leads to the chemical diversity of polyketide products.
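The dependence of the β-carbon outcome on the modifying domains present in a module can be sketched as a small lookup. This is a minimal illustration only, assuming every annotated domain is active and acts in the canonical KR, DH, ER order; the function name `beta_carbon_state` and the string labels are hypothetical, and real modules can break these rules through inactive domains or unusual chemistry.

```python
def beta_carbon_state(domains):
    """Predict the beta-carbon state left by one PKS extension module.

    `domains` is a set of domain abbreviations (e.g. {"KS", "AT", "KR", "ACP"}).
    Assumption: each modifying domain is active and needs the product of the
    preceding one (KR before DH before ER).
    """
    state = "beta-keto"  # product of the KS-catalyzed Claisen condensation
    if "KR" in domains:
        state = "beta-hydroxyl"  # ketoreductase reduces the ketone
    if state == "beta-hydroxyl" and "DH" in domains:
        state = "alpha,beta-double bond"  # dehydratase eliminates the hydroxyl
    if state == "alpha,beta-double bond" and "ER" in domains:
        state = "saturated alkyl"  # enoyl reductase reduces the double bond
    return state


if __name__ == "__main__":
    examples = [
        {"KS", "AT", "ACP"},
        {"KS", "AT", "KR", "ACP"},
        {"KS", "AT", "KR", "DH", "ACP"},
        {"KS", "AT", "KR", "DH", "ER", "ACP"},
    ]
    for module in examples:
        print(sorted(module), "->", beta_carbon_state(module))
```

A module carrying all three modifying domains gives the fully saturated unit, which is the case used by fatty acid synthase.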

As in the NRPSs, the polyketide product is bound to the terminal ACP domain and therefore a final activity is required to catalyze release of the PKS product. Most commonly, this is catalyzed by a thioesterase domain that releases the polyketide through either hydrolysis or macrocyclization with an internal nucleophile. The PKS TE domains also contain an Asp-His-Ser catalytic triad and proceed through an acyl-enzyme intermediate. Often, the active site pocket excludes water, favoring the lactonization of the polyketide product.30

2. Chemical Tools to Interrogate NRPS Enzymes

2.1. Adenylation Domains

The activation and transfer of amino acids as well as related carboxylic acid substrates onto their downstream carrier protein domains by adenylation domains represents the first step in the NRPS catalytic cycle. Structures of individual excised A domains and stand-alone A domains have been solved in the apo-form and with bound substrates, revealing an open unliganded conformation as well as closed liganded conformations capturing the adenylation conformational state.34-37 The co-crystal structures of A domains have provided a detailed understanding of the substrate recognition requirements, which in turn has enabled development of algorithms to predict substrate specificity of adenylation domains solely from the primary amino acid sequence.38, 39 In the second thioesterification partial reaction, wherein the activated substrate is transferred onto the phosphopantetheine arm of the carrier protein domain, a large conformational shift of the C-terminal subdomain of the A domain occurs in order to present a binding interface for the incoming carrier protein domain and enable the phosphopantetheine arm to insert into the active site. This conformational change was first observed with the structural and functional homolog, acetyl-CoA synthetase and the free-standing A domain DltA.40, 41 This conformational change largely involves the rotation of the C-terminal subdomain around a conserved aspartic acid or lysine hinge residue that separates the N-terminal subdomain from the ~120 residue C-terminal subdomain.13 Capture of the thioester-forming conformation initially appeared to require the presence of the CoA or pantetheine analog; however, some family members42, 43 have been structurally characterized in the thioester-forming conformation in the absence of ligands expected to drive this conformational state. Therefore, it seems the energy barrier separating these two states is small and the proteins are able to exchange relatively easily between these two conformations.41 While this has important implications for the tailoring of a single active site for the catalysis of the two distinct partial reactions, it also proves challenging for structural studies aimed to capture specifically one state or the other.

2.1.1. The design of AMS inhibitors.

The synthesis of small-molecule chemical probes has proven extremely useful to stabilize the adenylation and thioesterification conformations. Non-hydrolyzable mimics of the labile acyl-adenylate intermediate can be easily prepared through substitution of the adenylate (i.e. AMP) with adenosine monosulfamate (AMS) [Fig. 1]. The AMS template for NRPS adenylation inhibitors, introduced by Marahiel et al.,44, 45 is based on the functionally related aminoacyl-tRNA synthetase inhibitors, which can be traced to earlier seminal reports from Ishida and Isono on the nucleoside antibiotic ascamycin 5.46-48 The acyl-AMS mimics bind tightly to their cognate adenylation domains and possess dissociation constants in the nanomolar to picomolar range owing to the large number of interactions with both substrate-binding pockets. Binding of acyl-AMS derivatives to A domains enhances thermal stability measured using differential scanning fluorimetry (DSF), typically resulting in a greater than 20 °C increase in the melting temperature, thereby promoting crystallization of A domains in the adenylation conformation.

Fig. 1.


Nucleoside inhibitors of NRPS adenylation domains. A. The two step reaction carried out by NRPS adenylation domains. In the first partial reaction, the amino acid reacts with ATP to form an acyl-adenylate intermediate. In a second partial reaction, the pantetheine cofactor from the carrier protein forms a covalent thioester with the activated amino acid. B. Acylsulfamate inhibitor that mimics the acyl adenylate. C. Natural product ascamycin (5) that inspired the acylsulfamate inhibitors.

2.1.2. The design and biochemical characterization of AVS inhibitors.

The design of ligands to trap adenylation domains in the thioesterification conformation proved more challenging, but was ultimately realized through synthesis of acyl-adenylate surrogates containing a Michael acceptor at the precise position of the incoming phosphopantetheine thiol. This was accomplished by replacing the acyl-phosphate linkage with a vinylsulfonamide that was predicted to mimic the folded conformation of the acyl-adenylate in the active site. The vinylsulfonamide was selected to minimize non-specific thiol addition because it is among the least reactive Michael acceptors (relative reactivity: enone > vinylsulfone > vinylsulfonate > enoate > vinylsulfonamide). Representative syntheses of acyl-AMS and related acyl-adenosine vinylsulfonamides (acyl-AVS) are shown in Fig. 2.

Fig. 2.


Synthesis of salicyl-AMS and valinyl-AVS. Representative syntheses of an AMS inhibitor (panel A) and an AVS inhibitor (panel B).

To establish initial proof-of-principle, 17-18 were synthesized and evaluated against the prototypical stand-alone adenylation domain EntE,49 which is part of a three-module NRPS assembly line comprising EntE, EntB, and EntF responsible for the synthesis of the siderophore enterobactin.50 EntE activates 2,3-dihydroxybenzoic acid (2,3-DHB) and then transfers it onto the carrier protein domain of EntB [Fig. 3]. Compared to acyl-AMS ligands, acyl-AVS 17-18 were shown to bind relatively weakly to EntE, possessing dissociation constants of 6.3 μM for 18 and 63 μM for 17 measured using a fluorescence polarization assay. The substantially weaker affinity is attributed to the deletion of the carbonyl group that H-bonds to the conserved Lys520 as well as removal of the ionized sulfamate nitrogen, which has an electrostatic interaction with two threonine residues from conserved A3 and A5 motifs.3, 13 However, the modest micromolar affinity was deemed sufficient to ensure binding to EntE before channeling onto EntB. Incubation of 1 mM 18 with 10 μM EntE and 10 μM EntB resulted in quantitative labelling of EntB determined by electrospray mass spectrometry.49 Exclusion of EntE from the reaction mixture did not lead to covalent modification of EntB, demonstrating that 18 does not react non-specifically. Indeed, measurement of the pseudo-first-order rate constant for reaction of 17 with N-acetylcysteamine revealed it had low intrinsic reactivity towards thiols (kobs = 1.85 × 10⁻³ min⁻¹ at 25 °C and pH 8).51
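To put the numbers above in perspective, the following back-of-the-envelope sketch estimates how much of EntE would be occupied by each probe at 1 mM and what half-life the reported kobs implies for the non-enzymatic thiol reaction. It assumes a simple single-site binding model with the probe in large excess over the enzyme (so free probe is approximately total probe) and ignores the subsequent covalent transfer; the function names are illustrative, and the half-life applies only under the thiol concentration of the original measurement, which is not stated here.

```python
import math


def fraction_bound(ligand_uM, kd_uM):
    """Single-site equilibrium occupancy, assuming ligand is in large excess."""
    return ligand_uM / (ligand_uM + kd_uM)


def half_life_min(k_obs_per_min):
    """Half-life of a (pseudo-)first-order process with rate constant k_obs."""
    return math.log(2) / k_obs_per_min


if __name__ == "__main__":
    probe_uM = 1000.0  # 1 mM probe, as used in the labelling experiment
    for name, kd in (("DHB-AVS 18", 6.3), ("Sal-AVS 17", 63.0)):
        occupancy = fraction_bound(probe_uM, kd)
        print(f"{name}: Kd = {kd} uM, estimated EntE occupancy = {occupancy:.1%}")

    t_half = half_life_min(1.85e-3)  # kobs for 17 with N-acetylcysteamine
    print(f"Half-life of 17 toward the model thiol: {t_half:.0f} min ({t_half / 60:.1f} h)")
```

Under these assumptions both probes would nearly saturate EntE at 1 mM despite their micromolar affinities, while the background reaction with a free thiol proceeds on a timescale of hours, consistent with the specificity argument in the text.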

Fig. 3.


Enterobactin biosynthesis. A. NRPS pathway for the synthesis of the siderophore enterobactin. EntE (a) activates and loads a molecule of 2,3-dihydroxybenzoate (DHB) onto the pantetheine cofactor of the carrier protein of EntB. Similarly, the EntF adenylation domain (b) loads a molecule of serine onto the EntF carrier domain. The EntF condensation domain (c) catalyzes the transfer of DHB onto the serine amine, generating the N-(2,3-dihydroxybenzoyl)serine amide. This intermediate is then (d) transferred to the catalytic serine within the thioesterase domain. Two additional rounds of reactions (a,b,c) produce the second and third molecules of N-(2,3-dihydroxybenzoyl)serine that are sequentially used to extend the trimer. Finally, the intermediate is (g) cyclized and released from the thioesterase domain to produce enterobactin (16). B. The adenylation domain EntE is able to install a molecule of Sal-AVS (17) or DHB-AVS (18) on the cofactor of the EntB carrier protein to form loaded proteins 19 and 20. C. Summary table of the ability of EntE, but not homologous adenylation domains, to load the AVS inhibitors onto EntB.

The ability of other non-cognate adenylation domains VibE, BasE, and MbtA, involved in biosynthesis of related aryl-capped siderophores,52 to transfer probe 18 onto EntB was assessed to determine the specificity of the A domain-PCP interaction. Incubation of VibE, BasE or MbtA under otherwise identical conditions (1 mM 18, 10 μM EntB) failed to show any labelling of EntB with the probe. Analogously, incubation of 17 and MbtA with its corresponding carrier protein MbtB led to efficient covalent modification of the phosphopantetheinyl thiol of MbtB as confirmed by trypsin digestion and LC-MS/MS analysis. Like the EntE-EntB protein pair, MbtA-MbtB were highly specific and transfer of probe 17 was not observed with other non-cognate A domains. Additionally, probe 17 stabilized a protein-protein interaction between MbtA and MbtB as measured in an electrophoretic mobility shift assay in a non-denaturing gel.51 Taken together, these results suggest the A domain first binds the acyl-AVS and then, upon proper recognition of the partner PCP and delivery of the pantetheine to the active site, the inactivator is loaded onto the carrier protein domain cofactor. Although the probe does not specifically cross-link the A and PCP domains, the affinity of the A domain for the loaded substrate analog stabilized the interaction between these two domains.

2.1.3. AVS inhibitors as tools in crystallographic studies.

The demonstration of the ability to promote the productive interaction between the A domain and its cognate PCP suggested that the AVS inhibitors might be useful tools for the structural characterization of the functional interaction between these domains. Indeed, AVS inhibitors have been used in four different NRPS systems to trap the adenylation domains in the thioester-forming conformation in a complex with the holo carrier protein domain. The first system to be studied was the enterobactin NRPS system. To facilitate crystallization, a fusion protein was genetically designed to create an EntE-B chimeric protein.49, 53 This didomain protein was shown to be functionally active and formed crystals that showed the formation of the covalent linkage between the pantetheine thiol and the salicyl-AVS inhibitor 17.

This strategy was also adopted to solve the structure of PA1221, a didomain A-PCP protein from an uncharacterized pathway in Pseudomonas aeruginosa.54 Because the substrate for this adenylation domain was unknown, it was first necessary to screen for activity using a variety of potential substrates. Biochemical assays employing the common pyrophosphate exchange assay showed a substrate preference for valine. The production of Val-AVS 15 allowed for crystallization of the didomain protein, once again trapped in the thioester-forming conformation. Although it was possible to crystallize the protein in the absence of the inhibitor, the carrier protein was disordered in the crystal lattice, illustrating that the inhibitor stabilized the adenylation and carrier protein interface.

While the AVS inhibitors first provided a view of the functional interaction between the adenylation and carrier protein domains, they have been used more recently to examine larger multidomain NRPS proteins. A Ser-AVS inhibitor of the EntF adenylation domain was used to capture this terminal module in the thioester-forming conformation.55, 56 Finally, a Gly-AVS inhibitor was used to determine the structure of the A-PCP-C tridomain protein from DhbF, a dimodular NRPS from the bacillibactin system of Geobacillus sp. Y4.1MC1. The A-PCP-C protein used for structural studies spanned the boundaries between two adjacent modules.57 Interestingly, overnight incubation with the Gly-AVS resulted in complete covalent modification of the holo carrier protein domain but activity of the A domain was only partly blocked. This suggested that even when modified with the AVS inhibitor, the PCP is able to dissociate from the A domain, leaving the adenylation domain active site free to carry out the adenylating partial reaction. Despite this demonstration that the AVS inhibitors may not promote a uniformly stable adenylation-PCP complex, the Gly-AVS inhibitor did facilitate crystallization of the DhbF A-PCP-C protein in which the pantetheine cofactor was covalently bound to the inhibitor within the adenylation domain active site.

All four crystal structures, EntE-B, PA1221, EntF, and DhbF, illustrate shared features and a common interface between the A and PCP domains. The A domains all adopt the thioester-forming conformation in which the C-terminal subdomain is rotated to direct the conserved A8 motif immediately following the hinge into the active site (Fig. 4). The domain interface is contributed by residues from both subdomains of the A domain. Residues from the N-terminal subdomain interact with residues from loop 1 that extends between helices α1 and α2 of the PCP, while residues from the C-terminal subdomain interact primarily with residues from helix α2 of the PCP. Interestingly, the exact nature of the interface is not shared between the different systems that have been structurally characterized. The interface between EntE and EntB is much more polar, with a number of charged residues existing on both EntE and EntB. In contrast, the interface of the didomain protein PA1221 shows a largely hydrophobic interface between loop 1 and a helix on the adenylation N-terminal subdomain. The PA1221 interface is further formed by hydrogen bonding and ionic interactions between loop 1 of the PCP and the C-terminal subdomain of the A domain. The interfaces used by EntF and DhbF are also rather hydrophobic along helix α2. It is possible that the more polar nature of the EntE-EntB interface reflects the requirements for these two proteins to exist freely and interact in trans for the necessary thioesterification reaction. In contrast, the existence of the A and PCP domains on a single multidomain polypeptide in the other three systems may dictate the more hydrophobic nature of the interface.

Fig. 4.


Structure of EntF bound to Ser-AVS probe (PDB 5JA1). A. The Ser-AVS probe was incubated with holo-EntF and crystallized. The EntF adenylation domain (N-terminus, pink; C-terminus red) adopts the thioester-forming conformation. B. The active site of the adenylation domain shows the pantetheine from Ser1006 of the PCP (yellow) entering the active site where it interacts with the inhibitor. Side chains are shown for Arg673, which interacts with the phosphate moiety, Asp648 and Asp754, which interact with the serine substrate, as well as the Asp840 residue that positions the ribose hydroxyls.

2.1.4. Use of the AMS and AVS motif for activity based proteomic profiling.

Fumihiro Ishikawa and colleagues have developed a series of AMS and AVS derivatives that exploit the high affinity binding or covalent modification to create novel chemical tools [Fig. 5]. These studies describe multiple uses that take advantage of the specific catalytic activities of the A domains to identify new NRPS pathways and inhibitors of specific domains. The probes are able to selectively label A domains within proteomic lysates and are sensitive to cognate inhibitors, raising the potential to interrogate NRPS biosynthetic pathways by activity based proteomic profiling,58, 59 as has been used by others to target serine hydrolases60 or kinases.61

Fig. 5.


Chemical probes for proteomic profiling. As a representative aminoacyl AMS probe, PheAMS is shown modified with biotin (21) or photocrosslinking (22) probes to allow detection of phenylalanine specific adenylation domains.

The ability of Sal-AMS or DHB-AMS fluorescent probes modified on the 2’-hydroxyl of the ribose to maintain robust affinity for a series of free-standing adenylation domains62 suggested this site could be modified in the design of the proteomic probes. A panel of AA-AMS derivatives was synthesized containing a biotin molecule attached via a polyethylene linker to the 2’-hydroxyl site, including phenylalanine-based probe 21. These compounds were used to develop a protein-protein interaction assay that was inspired by the commonly used Enzyme-Linked Immunosorbent Assay (ELISA) technique. In this approach, the biotinylated acyl-AMS derivative is bound to a streptavidin-coated bead. The immobilized probe was then used to capture the A domain. The presence of a His6-tag on the adenylation domain enabled the detection of binding through the use of anti-His6 antibodies and antibody-directed enzyme conjugates. This assay was shown to be selective for the probe-adenylation domain interaction, could enable the mutation of the adenylation domain active site to alter substrate specificity, and finally could be used to calculate affinity constants of additional inhibitors.63, 64

A subsequent series of experiments by Ishikawa and colleagues incorporated a photoactive crosslinker into the affinity probes to enable the covalent labeling of NRPS adenylation domains (22). These compounds altered the aminoacyl-AMS derivatives to incorporate a benzophenone moiety as well as an alkyne onto the 2’-hydroxyl of the ribose moiety. The resulting high affinity photocrosslinking probe could be subsequently modified by click chemistry to label the A domain with a fluorophore.65 This probe functioned with purified protein and in a crude cell lysate to label specifically the A domain. Additionally, multiple probes could simultaneously label a multi-domain NRPS protein containing more than one A domain.66

To expand the utility of the designed chemical probes, these photoreactive crosslinking tools were then used to identify specific inhibitors of A domains. Alternate aminoacyl-AMS inhibitors were tested for their ability to block crosslinking. Results correlated with the known promiscuous activity of several adenylation domains toward alternate substrates.67 Finally, Ishikawa and colleagues also used this same set of probes to examine substrate specificity through directed mutagenesis of the A domains or as an affinity purification probe.68, 69

Very recently, the electrophilic vinylsulfonamide inhibitors have been exploited to specifically label carrier protein domains providing additional proteomic profiling tools.70 The probes were designed to contain an alkyne on the 2’-hydroxyl of the inhibitor. As demonstrated in the prior biochemical and structural studies, the new probes could be used to covalently label the pantetheine cofactor of the carrier protein. The reaction was specific as controls lacking the A domain or performed with an apo PCP domain resulted in no background labeling.

2.2. Thioesterase Domains

The NRPS thioesterase (TE) domains are members of the α/β hydrolase family of enzymes and employ a catalytic triad of Ser-His-Asp for release of the linear polypeptide from the antecedent PCP domain through a conventional two-step mechanism involving initial transesterification of the peptidyl-S-PPant-PCP to a peptidyl-O-Ser-TE intermediate. The TE acyl-enzyme intermediate is then either hydrolyzed to afford a linear peptide or more commonly cyclized through intramolecular nucleophilic attack of a side-chain heteroatom within the polypeptide to form a macrolactone or macrolactam.71

2.2.1. α-Chloroacetamide-CoA probes for TE domains.

Bruner and colleagues employed an innovative strategy to capture a stable interaction between a TE domain and carrier protein domain by attaching a reactive α-chloroacetyl warhead to the terminus of the phosphopantetheinyl (PPant) arm of a carrier protein domain.72 Halomethyl carbonyl inhibitors have been widely used as affinity labels for proteases.73 Another key design feature was replacement of the labile thioester bond by a more stable amide bond to limit hydrolysis by the TE domain. Following insertion of the modified PPant arm into the active site of the TE, it was hypothesized that addition of the nucleophilic serine to the substrate amide would afford a tetrahedral intermediate. Intramolecular displacement of the chloride by the oxyanion then furnishes an intermediate epoxide that can be intercepted by the active site catalytic histidine leading to the predicted cross-linked species, in analogy to the mechanism observed with the related chloromethyl ketone protease inhibitors.73 Alternatively, hydrolysis of the intermediate epoxide can lead to formation of a non-cross-linked hydroxymethylamide (Fig. 6).

Fig. 6.

Chemical probes to target the EntF thioesterase domain. A. Synthesis of acyl-desthioamino-CoA probes requires chemical production of pantetheine analog 26 which is enzymatically converted to des-thioaminoCoA through the activities of pantothenate kinase (PanK), phosphopantetheine adenylyltransferase (PPAT), and dephosphocoenzyme A kinase (DPCK). The amide analog of the CoA thioester is then produced chemically to generate the final probe 27. B. The promiscuous phosphopantetheinyl transferase (Sfp) was then used to load the pantetheine analog on the carrier protein of the truncated PCP-TE didomain of EntF. C. The catalytic nucleophile Ser1138 forms an epoxide intermediate 29 with probe 27. Subsequent hydrolysis results in the 2-hydroxyacetyl product observed in the crystal structure.

Chemoenzymatic synthesis of the required α-chloroacetylamino-CoA chemical probe featured Burkart’s solid-phase synthesis for preparation of the des-thio-aminopantetheine 26, which was efficiently transformed to the corresponding CoA derivative in one pot by pantothenate kinase (CoaA), phosphopantetheine adenylyltransferase (CoaD), and dephospho coenzyme A kinase (CoaE) (Fig. 6A). The amino-CoA probe was a versatile intermediate and could be coupled to chloroacetic acid employing PyBOP to provide probe 27.

As a model system, the authors selected EntF, a four domain protein (C-A-PCP-TE) discussed in the previous section. The amino-CoA probe was attached to apo-EntF using the promiscuous phosphopantetheinyl transferase Sfp. To verify installation of the amino-PPant cofactor arm on the PCP domain of this 142 kDa protein, the authors incubated EntF with [14C]-serine and ATP, which resulted in incorporation of serine onto the amino-PPant moiety through an amide bond (EntF-PPant-NH-Ser). Autoradiographic analysis of an SDS-PAGE gel revealed a ~140 kDa band corresponding to [14C]-serine-labelled EntF. By contrast, no band was observed in a control experiment with wild-type holo-EntF possessing the native PPant moiety, presumably due to TE-mediated hydrolysis of the labile thioester bond. Following these initial experiments, the authors attached chloroacetyl probe 27 to apo-EntF utilizing Sfp. They observed the TE activity was significantly reduced, suggesting the probe had cross-linked the PCP and TE domains.

2.2.2. Capture of PCP-TE interaction using an α-chloroacetamide probe.

Direct confirmation of cross-linked EntF was impeded by the large molecular weight of this protein, thus a recombinant truncated PCP-TE didomain protein was purified.74 This construct was the same as an apo-PCP-TE from EntF that had been structurally characterized by NMR.75 The PCP-TE didomain was incubated with Sfp and 27 for 2 hours to allow post-translational phosphopantetheinylation of the PCP domain and subsequent covalent modification of the TE domain. Analysis of the protein by mass spectrometry indicated quantitative conversion to a species incorporating 27 with loss of chlorine. The modified didomain protein readily crystallized whereas neither the apo- nor the holo-PCP-TE crystallized under any of the conditions examined.74 A trapped covalent intermediate was not observed in the structure (Fig. 7); rather, the initial acyl enzyme is hydrolyzed either directly via a water molecule or indirectly, perhaps through a second acyl enzyme intermediate with another nucleophilic residue in the active site. Nonetheless, this novel chemical approach enabled the crystallographic observation of the functional interaction between the holo-PCP and the TE domain. The pantetheine from the carrier protein is directed into the TE domain active site through a channel below the lid domain. The PCP domain forms a large, mostly hydrophobic interface that includes residues from the α2 and α3 helices of the PCP and both the lid and the core of the TE domain. There are no significant interactions between the phosphate moiety of the cofactor and the TE domain, and only two hydrogen bonds between protein and pantetheine group, occurring between Ser1075 and the main chain amide of Ala1074 and the carbonyl moieties of the cofactor. Within the catalytic site, the carbonyl of the α-hydroxyacetyl moiety, resulting from the hydrolysis of the acyl-enzyme intermediate, is positioned in the oxyanion hole formed by the main chain amides of Ala1074 and Leu1139.

Fig. 7.

Structure of the EntF holo-PCP-TE didomain (PDB 3TEJ). Chemical probe 27 was used to capture the didomain crystal structure of the carrier protein engaged with the thioesterase domain of EntF. The thioesterase catalytic triad is composed of Asp1165, His1271, and Ser1138. Several residues that form the hydrophobic interactions between the carrier domain (yellow) and the mouth of the thioesterase domain are shown.

Ideally, these techniques to describe the PCP-TE structure could also be employed in the context of a complete NRPS module. Unfortunately, to date, in multiple crystal structures of termination modules, the thioesterase domain has been observed to adopt strikingly different orientations in relation to the core of the module formed by the condensation and adenylation domains.55, 56, 76 This suggests that there may not be a single stable position of the TE. Upon completion of the condensation reaction to form the linear peptide, the PCP and the TE domains may be relatively dynamic and free to engage in order to catalyze the final release of the peptide through either hydrolysis or cyclization.

2.3. Condensation Domains

NRPS condensation (C) domains bind to two loaded holo-PCP proteins to catalyze peptide bond formation. Structures of C domains reveal two lobes forming a V-shaped pseudo-dimer and have been observed in open and closed conformations.20, 21 The loaded pantetheine chains of the donor and acceptor substrate approach the active site from opposite sides of the domain through two clefts which meet at the central active site. Given the high intrinsic reactivity of amines with thioesters, the C domain facilitates catalysis primarily by positioning the α-amino group of the acceptor substrate for nucleophilic attack on the reactive thioester of the donor substrate. The second histidine residue in the conserved active site motif HHxxxDG is important for proper orientation of the α-amino group, but does not appear to serve as a general base. A recent comprehensive review by Bloudoff and Schmeing describes the background biochemistry as well as structural studies of C domains.77

Structures of C domains interacting with holo-PCP domains illustrate both the donor and acceptor pantetheine tunnels. The structures of the SrfA-C76 and the AB340355 modules present the interface between the condensation and the acceptor PCP. These two structures represent the interactions between the apo- and holo-PCP, respectively, with the C domain and show modest differences in the orientation of the PCP relative to the C domain. A crystal structure of a C domain with the donor PCP has not been published; however, recent structures have been determined between PCP and condensation domain homologs. These new structures include the N-terminal domain of GrsA,17 which catalyzes epimerization of the phenylalanine residue loaded on a downstream PCP, and the macrocyclization domain of TqaA, a fungal protein that catalyzes the macrolactamization-mediated release of fumiquinazoline F.28 Together, these structures define the binding tunnels of the substrates to the C domain active site.

2.3.1. Alkyl halide probes of condensation domains.

While the structures of complete NRPS modules containing C domains with holo-PCP domains offer a view of the protein-protein interface and the interactions with the pantetheine moiety, to date none of the structures contained a loaded pantetheine arm. Attempts to obtain co-crystal structures by soaking aminoacyl thioester substrates have also not been successful, likely owing to the low affinity of substrates without their PCP domains. The Schmeing group thus employed an elegant chemical biology approach to probe the C domain by covalently tethering small-molecule aminoacyl substrates near the active site.78 This was accomplished by introducing a cysteine into the pantetheine tunnel of the acceptor substrate. Acceptor aminoacyl moieties were prepared containing a bromoalkyl linker, through which the acyl group could then be covalently attached to the cysteine by alkylation. As done with the thioesterase probes, the authors replaced the labile thioester by an isosteric amide.

The first C domain of the calcium-dependent antibiotic synthetase (CDA-C1) was selected for these studies because it has been structurally characterized, forms well-diffracting crystals, and retains activity when excised from the full-length synthetase.20 This C domain is responsible for the condensation of a 2,3-epoxyhexanoyl-ACP donor with a serine-PCP acceptor. A glutamate residue of CDA-C1 was chosen for mutation to cysteine (Glu17Cys) because it is located in the acceptor pantetheine channel between the surface of the protein and the active site. Next, a panel of probes was synthesized employing alanine, rather than the natural substrate serine, as the acceptor substrate. The alkylbromide moiety for conjugation to the cysteine was attached to the alanine carboxylate via an amide linkage. Because the length required to reach the active site could not be precisely modelled, probes 31a-c were prepared incorporating alkyl chains ranging from three to five methylenes (Fig. 8).

Fig. 8.

Chemical biological approach to obtain CDA-C1 structure with ligands. Glu17 of the CDA-C1 protein was mutated to cysteine. Incubation of the mutant protein with alkyl halide probes 31a–c afforded covalently modified protein 32a–c. The protein was catalytically competent as incubation with the loaded holo-carrier protein donor enabled formation of the condensation domain harboring the N-acylamino acids 33a–c.

Reaction of the probes with the condensation domain was monitored via mass spectrometry, which confirmed the probes were installed at Cys17 of the condensation domain affording 32a-c. Further, the probes were unable to modify wild-type protein demonstrating the specificity of labelling. The ability of the C domain tethered probes 32a-c to react with the 2,3-epoxyhexanoyl-ACP donor group could also be monitored by mass spectrometry to produce 33a-c showing that all three were competent substrates leading to the expected N-acylamino acid products. Among the acceptor substrates, 32b was most efficiently acylated, indicating optimal spacer length was achieved by a four-carbon linkage. Additionally, amide bond formation was dependent on the presence of His157, a key catalytic residue of the C domain (the second histidine in the HHxxxDG conserved motif). Collectively, these results recapitulated the biochemical selectivity of the wild-type CDA-C1 condensation employing substrates delivered by ACP domains, suggesting the tethered acceptor substrate largely retains the native interactions with CDA-C1.

The chemically modified condensation domains were used in crystallization studies. Structures of 32b and 32c were solved at 1.6 Å resolution. Clear electron density was observed for both probes bound to C17 and directed into the active site. The alanine substrate analog was positioned in the same place in both structures, in spite of the different linker lengths of the probes, suggesting the bound conformation represents that of the native substrate. The primary amine of the tethered alanine formed hydrogen bonds with both His157 and the backbone carbonyl of Ser386 (Fig. 9). The authors then used the structural information to interrogate substrate specificity by mutagenesis of active-site residues. They were able to both expand and restrict the substrate specificity of the acceptor substrate. This strategy to tether a substrate in the pantetheine tunnel by strategically introducing a cysteine mutation allowed for the first view of a loaded amino acid within the C domain active site. Moreover, this general approach can likely be applied to other NRPS and PKS domains that have proven recalcitrant to structural characterization using low affinity substrates.

Fig. 9.

Structure of CDA-C1 condensation domain covalently modified with probe 31b (PDB 5DU9). Structure of the engineered CDA condensation domain is shown. The cysteine mutation introduced at Glu17 is highlighted, bound to the alkyl halide chemical probe. His157 and Ser386, which both interact with the alanine amine, are represented in stick form.

2.3.2. Capture of peptide intermediates with C domain probes.

A series of probes was described by Tosin and colleagues.79 The probes were designed as nonhydrolyzable analogs of the downstream aminoacyl loaded pantetheine cofactor that bind to the acceptor site of the condensation domain then attack the donor peptide, transferring it to the probe. Since the probes are not covalently attached to a PCP, but are diffusible small molecules, the peptide products are released from the NRPS assembly line resulting in chain termination. The captured peptide intermediate(s) could be extracted from liquid culture and characterized by mass spectrometry (Fig. 10). These probes were therefore designed not to foster structural studies but as tools to identify and analyze synthetic intermediates.

Fig. 10.

Chemical probes to capture biosynthetic intermediates. In place of the normal transfer to a downstream carrier protein (a), the probes were designed to enter the condensation domain and interrupt biosynthesis (b). Probes replaced the labile thioester linkage with an amide. Probes were designed with varied amino acids to examine the ability to enter multiple condensation domains within the NRPS. While N-acetyl probe 34a was unable to enter the cell, probes with longer acyl moieties, including N-(2-(2-aminopropanamido)ethyl)butyramide and heptanamide 34b and 34c, were able to capture multiple peptide intermediates.

The probes were first investigated for their ability to interrupt the production of the antitumor antibiotic echinomycin in the soil bacterium Streptomyces lasaliensis.80 The echinomycin NRPS system uses five modules to produce a cyclized dimer of two pentapeptides formed by ester linkages between a serine side chain as well as an internal thioacetal between two cysteine residues.

Initial tests using an N-acetyl L-alanine probe 34a fed to S. lasaliensis in liquid culture failed to identify any peptide intermediates, which the authors attributed to the hydrophilicity of the probe that hindered extraction. Probes with longer N-acyl chains 34b and 34c were then synthesized, enabling the capture of multiple peptide intermediates. A variety of aminoacyl groups (L-Val, L-Ser, D-Ser, β-Ala, L-Phe) were additionally prepared employing varying N-acyl chains and used to interrupt biosynthesis at different modules and extract synthetic intermediates. Several of the probes were isolated bound to di-, tri-, tetra- and even penta-peptides, illustrating that the acceptor sites of the condensation domains were able to accommodate probes containing different aminoacyl side chains. The finding that dipeptide intermediates accumulated at higher concentrations than subsequent downstream compounds suggests the first peptide bond forming reaction may be among the rate limiting steps in the pathway. Probes lacking a side chain, i.e., those composed of either glycine or β-alanine, led to more synthetic intermediates than probes containing larger amino acyl side chains, indicating these smaller probes were able to enter multiple condensation domain acceptor sites. This innovative in vivo method simultaneously reports on the substrate specificity of all individual NRPS modules and potentially identifies the rate-limiting step in the entire pathway, providing crucial information for biosynthetic engineering.

2.4. NRPS Auxiliary Proteins

Some NRPS products contain hydroxylations that are installed while the peptide is still bound to the PCP. Among the numerous NRPS products possessing these modifications are the glycopeptide antibiotics such as vancomycin and teicoplanin, which retain activity against highly drug resistant pathogenic bacteria such as Staphylococcus aureus and have therefore been labelled as antibiotics of last resort.81-83 The hydroxylations are installed through the activity of cytochrome P450 monooxygenases. In vancomycin and teicoplanin biosynthesis, the interaction of the P450 protein with the carrier protein is assisted by the presence of the so-called X-domain, a catalytically defective condensation domain that forms an interface with the P450 enzymes.84 In the X-domain from Tcp12 of the teicoplanin NRPS, which has been structurally characterized, the canonical HHxxxDG catalytic motif contains two substitutions. The second histidine is replaced by an arginine and the glycine is replaced by an aspartic acid. These two residues sterically interfere at the conventional active site in the related condensation domains.
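The two substitutions described above can be expressed as a simple comparison against the canonical HHxxxDG pattern. The following Python sketch is illustrative only: the function name is hypothetical, positions are numbered within the seven-residue motif rather than in Tcp12, and the alanine placeholders at the variable positions are assumptions because the text does not give those residues.

```python
# Canonical condensation-domain motif; 'x' marks variable positions that are
# ignored in the comparison.
CANONICAL = "HHxxxDG"


def motif_substitutions(motif, canonical=CANONICAL):
    """Return substitutions at conserved positions, numbered within the motif."""
    if len(motif) != len(canonical):
        raise ValueError("motif and canonical pattern must have the same length")
    subs = []
    for position, (ref, obs) in enumerate(zip(canonical, motif), start=1):
        # Only conserved (non-'x') positions are compared.
        if ref != "x" and ref != obs:
            subs.append(f"{ref}{position}{obs}")
    return subs


# Hypothetical motif strings: 'A' stands in for the unspecified variable residues.
print(motif_substitutions("HHAAADG"))  # canonical pattern
print(motif_substitutions("HRAAADD"))  # second His -> Arg, Gly -> Asp
```

With these placeholder inputs the canonical string yields no substitutions, while the X-domain-like string reports the His-to-Arg and Gly-to-Asp changes at motif positions 2 and 7.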

A chemical strategy exploiting the inherent affinity of P450 enzymes for azoles, which directly coordinate to the heme iron and were among the first ligands to be co-crystallized with a P450,85 was adopted by Cryle and colleagues to capture the functional interaction between the PCP domain and the P450 involved in biosynthesis of skyllamycin, a cyclic depsipeptide with multiple β-hydroxylated amino acids.86 A panel of coenzyme A conjugates containing imidazole or pyridine heterocyclic thioesters as substrate mimics were synthesized using HBTU/HOBt coupling. These probes were designed to have subtly different lengths and positions of the coordinating nitrogen atom to empirically identify the optimal binding probe. The CoA thioester with imidazole-4-carboxylate 38 exhibited substrate-like heme binding as measured by UV-Vis spectroscopy and was thus selected for loading onto the cognate PCP7 using Sfp (Fig. 11). The crypto-38-PCP bound to the P450 with a KD of 10 μM. Interestingly, apo-PCP7 or a loaded non-cognate PCP10 displayed no interaction with P450Sky. This illustrates the importance of both the protein-protein and ligand-heme interaction for productive binding.
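To give a sense of what a KD of 10 μM implies for complex formation, the short Python sketch below estimates the fraction of P450 occupied at several concentrations of loaded PCP. It assumes a simple 1:1 binding model with the PCP in excess so that free and total ligand are approximately equal; the model, the function name, and the example concentrations are assumptions for illustration and are not taken from the original study.

```python
def fraction_bound(free_ligand_uM, kd_uM=10.0):
    """Fractional occupancy for 1:1 binding: [L] / (KD + [L])."""
    if free_ligand_uM < 0 or kd_uM <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return free_ligand_uM / (kd_uM + free_ligand_uM)


# Hypothetical loaded-PCP concentrations (micromolar).
for conc in (1.0, 10.0, 100.0):
    print(f"{conc:6.1f} uM loaded PCP -> {fraction_bound(conc):.2f} of P450 bound")
```

Under this model half of the P450 is occupied when the loaded PCP concentration equals the KD, which is one reason a tight, ligand-assisted interaction can help stabilize a complex at the protein concentrations used for crystallization.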

Fig 11.

Imidazole probe to capture carrier protein interaction with cytochrome P450. A. Imidazole probe 39 was synthesized via coupling 1H-imidazole-5-carboxylic acid with coenzyme A. The CoA derivative was then installed on the carrier protein using Sfp. B. Incubation of loaded PCP with the P450 allowed the imidazole moiety to interact with the heme iron in the P450 active site.

This probe enabled the crystallization and structure determination of the transient complex between the loaded PCP7 and P450Sky (Fig. 12). The structure shows that α-helix G of P450Sky forms a platform on which the loaded PCP domain interacts. The interactions are primarily hydrophobic in nature and positioned near the α2 and α3 helices of the PCP domain. Hydrogen bonds and ionic interactions are formed between Thr46, Lys47, and Arg63 of the PCP domain and Asn197, Glu235, and Asp191 of P450Sky, respectively.

Fig. 12.

Structure of the PCP7-P450Sky interaction (PDB 4PXH). The P450-PCP structure captured with imidazole probe 39 is shown. The carrier protein domain is shown in yellow, while the P450Sky structure is shown in tan. Several residues that form interactions between the proteins are highlighted, along with a core of hydrophobic residues from helix α3 of the carrier protein and helix G of the P450.

3. Chemical Tools to Interrogate PKS Enzymes

3.1. Ketosynthase Domains

The ketosynthase domain (KS) catalyzes formation of the crucial carbon-carbon bond between the (methyl)malonyl-ACP extender unit and the growing polyketide chain. Modular PKS systems contain KS domains in each module, whereas iterative PKSs and the related fatty acid synthase (FAS) employ a discrete KS domain for all extension reactions. The KS catalytic cycle begins by transthioesterification of the growing ACP-tethered polyketide onto a conserved KS active site cysteine facilitated by a pair of histidine residues that function as a general base and acid.87 The dipole moment of an adjacent α-helix lowers the pKa of the cysteine thiol while an oxyanion hole stabilizes the anionic tetrahedral intermediate. Insertion of the downstream (methyl)malonyl-ACP into the active site brings both reactive partners in close proximity. The pair of conserved histidine residues are also critical for initiating decarboxylation of the (methyl)malonyl-ACP to generate a transient enolate species, which immediately reacts with the KS-tethered polyketide via a Claisen condensation resulting in a net two-carbon extension of the polyketide and translocation onto the downstream ACP. Productive processing by the KS domain is dependent on both substrate recognition and proper protein-protein interactions between the KS and ACP domains.87, 88

The priming KS domain of ZhuH, the Type II PKS from the pathway for the anthraquinone estrogen receptor antagonist R1128, shows a dimer of subunits that each contain alternating α-helices and β-sheets.89 The buried active site, harboring the catalytic cysteine and histidine residues, can be reached by the PPant of the ACP, enabling the transfer of the acyl group to the catalytic cysteine. The structure determination of the KS-AT didomain protein from the Type I deoxyerythronolide B (DEBS) polyketide synthase illustrates that the KS dimer is maintained, with the AT domains extending in opposite directions from the KS core.90 In the structure, the active site of the AT domain is ~80 Å from the KS catalytic cysteine, illustrating the need for large conformational changes to bring the two reacting groups together for carbon-carbon bond formation. Insight into the necessary conformational changes was revealed by the cryo-electron microscopy (cryo-EM) structures of PikAIII, the fifth module from the pikromycin PKS containing a KS-AT-KR-ACP architecture.91 In the cryo-EM structures, the AT is rotated ~120° and forms an extensive interaction with the KS subunit, creating a smaller reaction chamber into which the ACP domain can rotate to direct the PPant to the different active sites. Structures of the PikAIII with either an upstream or downstream loaded holo-PCP illustrate the two tunnels that meet at the KS active site.

3.1.1. Phosphopantetheine probes to cross-link KS and ACP domains.

Since the upstream ACP delivers the growing polyketide chain to the KS active site cysteine, Burkart and co-workers incorporated a reactive electrophile into the PPant arm of the ACP to irreversibly label the cysteine residue.92 Inspired by the antibiotic cerulenin, a known covalent inhibitor of bacterial KS domains in FAS and PKS pathways, an epoxide group was initially chosen as the reactive warhead. Acrylamide Michael acceptors were also conceived with E and Z-configured β-chloro substituents to prevent reversible retro-Michael addition. The chemoenzymatic synthesis of the pantetheine analogues 40, 41, and 42 is illustrated in Fig. 13. These CoA analogues could be efficiently loaded onto apo-ACP domains through the action of the promiscuous phosphopantetheinyl transferase Sfp to afford the corresponding crypto-ACP derivative containing a modified PPant arm.

Fig. 13.

Chemical probes for trapping the ACP-KS interaction. A. Schematic representation of the chain extending reaction catalyzed by the KS domain illustrating interactions with upstream (yellow) or downstream (orange) ACP domain. B. Strategy for trapping the KS with the upstream ACP using epoxide- or β-chloroacrylate-functionalized pantetheine analogs. C. Probes 40, 41, and 42 used to produce derivatives loaded on the ACP. D. Synthesis of pantetheine analogs, which were subsequently converted to 40 and 41 with CoA biosynthetic enzymes, enabled production of crypto-ACPs. Synthesis of the Z-configured analog (not shown) coupled cis-3-chloroacrylic acid to the amine.

The ketosynthase KAS from the E. coli FAS pathway was selected because this is a stand-alone protein, enabling dissection of the specificity of the probe using cognate and non-cognate ACP domains.92 Incubation of the CoA probes 40, 41, and 42 with KAS, the E. coli apo-ACP (AcpP), and Sfp resulted in KAS-ACP cross-linking as visualized by SDS-PAGE, whereas the non-cognate ACPs from VibB or EntB remained unmodified, highlighting the role of distinct protein-protein interactions for effective labelling. Verification of the cross-linked KAS-ACP species was performed by in-gel digestion followed by MALDI-TOF tandem mass spectrometry. Burkart has extended these initial results to other FAS, type II PKS, and fungal nonreducing polyketide synthases (NR-PKS) as well as further explored the molecular basis for selectivity employing steady-state kinetic and thermodynamic binding experiments.93-95 Khosla et al. applied these probes to study the KS3 and KS5 domains in DEBS PKS, which were the first type I PKS KS domains to be structurally characterized. Their findings recapitulate Burkart’s observations that proper protein-protein interactions drive cross-linking; however, the DEBS KS domains demonstrated greater selectivity as only the (E)-β-chloroacrylamide probe 41 was active.96

While these reports illustrate the ability of the probes to trap the interaction between KS and upstream ACP domains, they have not enabled the structure of a complex to be determined by crystallographic studies. Rather, computational docking of models of the interacting domains have identified motifs important for functional protein-protein interaction.93 Specifically, two acidic residues on the loops preceding and following the α2 helix were identified that interacted with several positive residues on the KS domain. The role of these putative interactions was examined by mutagenesis experiments wherein the cross-linking efficiency of the probes reported on proper protein complex formation.

Very recently, a similar crypto-ACP derived from loading an ACP with 2-bromopropionyl aminopantetheine was used to determine the cryo-EM structure of the ACP with the three domain AT-KS-AT protein from the iterative, non-reducing PKS for cercosporin.97 The two AT domains are distinguished as the starter AT (SAT) and malonyl-CoA specific AT (MAT), respectively. An initial crystal structure illustrated that both the KS and the SAT domains align along the dimeric interface, while the MAT extends from the core, in an orientation reminiscent of the KS-AT structure of the DEBS PKS system.90 The ACP domain bound in a cleft formed between the KS dimer and a small linker motif that exists between the KS and the MAT domain (Fig. 14). Interestingly, only a single ACP was visualized, which suggests that only one of the two monomers of the KS dimer is active at any time.

Fig. 14.

Cryo-EM structure of the SAT-KS-MAT protein from cercosporin PKS biosynthesis (PDB 6FIK). The dimeric structure is shown bound to one ACP protein. The complex was stabilized for analysis through the use of a crypto-ACP derived from a β-chloro amino-CoA analog. At the resolution of the EM structure, the atoms from the pantetheine analog cannot be visualized. The KS dimer is shown in pink, the two SAT domains in grey, and the two MAT domains in tan and olive. The ordered helices from the ACP are shown in yellow.

The pioneering studies of Burkart and co-workers, which were the first in the field, using chemical probes to covalently cross-link ACP and KS domains have proven not only instrumental in advancing our understanding of the protein interface between these core PKS domains, but also more broadly in stimulating research to apply chemical approaches to stabilize carrier protein interactions.

3.1.2. Thiocyanate probes to cross-link KS and ACP domains.

In 2014, Charkoudian and co-workers reported a powerful yet simple method to monitor the structural dynamics of carrier protein domains by converting the terminal thiol of the PPant arm to a thiocyanate (ACP-SCN).98 The solvato-sensitive thiocyanate probe has a unique, diagnostic infrared absorbance (IR) at ~2163 cm−1 in aqueous solution, which is red-shifted 5-10 cm−1 when buried in the hydrophobic environment of a partner PKS domain.

The cyanylation of carrier proteins was conveniently performed by incubating a holo-ACP (0.2 mM) in phosphate buffer (pH 7.0) with 8 equivalents of 5,5’-dithiobis-(2-nitrobenzoic acid) (DTNB) to form a mixed disulphide species (ACP-S-TNB). Subsequent treatment with excess NaCN (55 equivalents) furnished the ACP-SCN product that was isolated by a Sephadex desalting column. The cyanylated ACPs were shown to faithfully report on the dynamics as well as local environments of several ACPs from FAS and PKS systems.98
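The reagent stoichiometry quoted above can be converted into working concentrations with a short calculation. The Python sketch below assumes that both equivalents are stated relative to the holo-ACP and that the reaction volume does not change appreciably on addition of each reagent; these are simplifying assumptions, and the function name is hypothetical.

```python
ACP_MM = 0.2      # holo-ACP concentration stated in the text (mM)
DTNB_EQUIV = 8    # equivalents of DTNB stated in the text
NACN_EQUIV = 55   # equivalents of NaCN stated in the text


def reagent_concentration_mM(protein_mM, equivalents):
    """Reagent concentration (mM) for a given number of molar equivalents."""
    if protein_mM < 0 or equivalents < 0:
        raise ValueError("inputs must be non-negative")
    return protein_mM * equivalents


dtnb_mM = reagent_concentration_mM(ACP_MM, DTNB_EQUIV)
nacn_mM = reagent_concentration_mM(ACP_MM, NACN_EQUIV)
print(f"DTNB: {dtnb_mM:.1f} mM, NaCN: {nacn_mM:.1f} mM")
```

Under these assumptions the protocol corresponds to roughly 1.6 mM DTNB and 11 mM NaCN.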

Incubation of ACP-SCN (E. coli ACP) and KAS (E. coli KS) led to cross-linking as determined by SDS-PAGE wherein the KS cysteine residue reacted with the thiocyanate to form a mixed disulphide species.99 Significantly, the interaction could be readily monitored by IR spectroscopy following the appearance of the released cyanide anion at 2120 cm−1. Incubation of KAS with non-cognate ACP domains or bovine serum albumin did not lead to cross-linking or to release of the cyanide anion. These results demonstrate the thiocyanate electrophile is selective for functional communication between partner domains. The thiocyanate cross-linking approach is notable for the ease of preparation of the cyanylated ACP domains and the ability to simply monitor productive engagement between protein domains by IR.
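The IR readout described in this section lends itself to a simple band-assignment rule. The Python sketch below uses the wavenumbers quoted in the text (about 2163 cm−1 for the solvent-exposed thiocyanate, a 5-10 cm−1 red shift when buried, and 2120 cm−1 for released cyanide). The ±2 cm−1 matching window and the function name are assumptions chosen for illustration and are not taken from the cited work.

```python
SOLVENT_EXPOSED_SCN = 2163.0  # cm^-1, thiocyanate in aqueous solution (from the text)
BURIED_RED_SHIFT = (5.0, 10.0)  # cm^-1, shift to lower wavenumber when buried (from the text)
FREE_CYANIDE = 2120.0         # cm^-1, released cyanide anion (from the text)
TOLERANCE = 2.0               # cm^-1, hypothetical matching window


def assign_band(wavenumber):
    """Assign an observed IR band to one of the species discussed in the text."""
    if abs(wavenumber - FREE_CYANIDE) <= TOLERANCE:
        return "released cyanide"
    if abs(wavenumber - SOLVENT_EXPOSED_SCN) <= TOLERANCE:
        return "solvent-exposed thiocyanate"
    # A red shift lowers the wavenumber, so the buried window lies below 2163.
    low = SOLVENT_EXPOSED_SCN - BURIED_RED_SHIFT[1] - TOLERANCE
    high = SOLVENT_EXPOSED_SCN - BURIED_RED_SHIFT[0] + TOLERANCE
    if low <= wavenumber <= high:
        return "buried thiocyanate"
    return "unassigned"


for band in (2163.0, 2156.0, 2120.0, 2100.0):
    print(band, "->", assign_band(band))
```

In practice band shapes and overlaps would need to be considered, so this rule should be read as a summary of the reported peak positions rather than an analysis procedure.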

3.1.3. Protein engineering to interrogate protein-protein interactions.

Although not yet implemented in a polyketide system, an elegant study with the fatty acid synthase (FAS) ketosynthase protein FabF illustrates the potential of protein engineering to illuminate the protein-protein interface in modular enzymes.100, 101 Site-directed mutagenesis was employed along with amber suppression to introduce a p-benzoylphenylalanine UV-activatable cross-linker into either the acyl carrier protein or the FAS ketosynthase to enable protein cross-linking that could be monitored by gel electrophoresis. The probe was placed either on the carrier protein, at or adjacent to the serine of the pantetheinylation site, or on the FabF ketosynthase. The cross-linking was used as a functional assay, enabling the examination of the influence of specific residues on either protein, or of carrier protein acylation, on cross-linking efficiency.

3.2. β-Processing Reductive Domains

Polyketide modules often carry out varying degrees of β-carbon processing by successive action of ketoreductase (KR), dehydratase (DH), and enoyl reductase (ER) domains. These catalytic domains transform the β-keto moiety produced by the KS catalytic step into hydroxyl, olefin, and saturated alkane products depending on their presence in the biosynthetic pathway. The β-processing domains create stereogenic centers and set the olefin geometry present in the final natural product with extremely stringent fidelity.

3.2.1. Ketoreductase Probes.

KR-ACP and substrate-protein interactions have proven difficult to capture. Initial work by Leadlay and co-workers employed simple N-acetylcysteamine (NAC) thioesters of diketides as surrogates of the ACP-bound substrates to interrogate the substrate specificity of KR domains.102 Diketide NAC thioesters have proven invaluable in the study of PKS systems, but have clear limitations as they cannot properly mimic the majority of potential substrates. Indeed, a recent investigation of KR3 from the tylosin (Tyl) PKS showed that a simple diketide NAC substrate was not turned over by the enzyme (kcat/KM <0.15 M−1 s−1) whereas a native tetraketide NAC substrate was efficiently processed (kcat/KM 16.3 M−1 s−1).103 To address the reactivity of thioesters, especially in cases where the native substrate possesses a δ-hydroxyl group, which would spontaneously cyclize to afford a δ-lactone, Fecik and co-workers introduced thioether substrates containing 1-2 carbon atom spacers between the sulfur atom of the NAC moiety and the C-1 carbonyl group of the polyketide substrate.104, 105 However, attempts to obtain substrate-bound KR structures with NAC substrates either through soaking experiments or co-crystallization have not been successful, highlighting the importance of the ACP domain for substrate presentation.
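Because the diketide value is reported only as an upper limit, the two specificity constants define a lower bound on the preference of TylKR3 for the tetraketide. A minimal sketch of that arithmetic follows; the function name is hypothetical and the calculation assumes the two values were measured under comparable conditions.

```python
import math


def min_fold_preference(kcat_km_good: float, kcat_km_limit_poor: float) -> float:
    """Lower bound on fold preference when the poor substrate has only an upper limit."""
    return kcat_km_good / kcat_km_limit_poor


tetraketide = 16.3      # kcat/KM in M^-1 s^-1 (reported value)
diketide_limit = 0.15   # kcat/KM in M^-1 s^-1 (reported upper limit)

fold = min_fold_preference(tetraketide, diketide_limit)
print(f"Tetraketide preferred by more than {math.floor(fold)}-fold")
```

The result, a preference of more than two orders of magnitude, illustrates why a diketide surrogate can be a poor reporter of KR activity toward its native substrate.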

3.2.2. Dehydratase Probes.

Dehydratase (DH) domains catalyze the reversible stereospecific syn-dehydration of the β-hydroxy group of polyketide chain intermediates, resulting in the formation of either cis- or trans-olefin products. DH domains share a high degree of structural similarity despite low sequence identity and employ a double-hotdog fold with similar active sites.105, 106 A histidine residue in the His-Asp catalytic dyad is the general base for deprotonation of the α-proton while the aspartic acid serves as the general acid for protonation of the alcohol. The histidine residue is located in a conserved active site motif HXXGXXXP while the aspartate is found in the HPALLD motif.

Capturing the transient ACP-DH interaction has proven challenging owing to the simple nature of the DH enzymatic reaction. The Burkart group thus employed their crypto-ACP strategy (see section 3.1.1) utilizing an alkyne moiety appended to the PPant arm designed to covalently modify the conserved histidine residue in the DH active site.107-109 The first-generation pantetheine analogue 49 was based on Bloch’s mechanism-based inhibitor 3-decynoyl-N-acetylcysteamine 48, which was shown to inactivate the E. coli dehydratase FabA (Fig. 15).110

Fig. 15.

Chemical probes for trapping the ACP-DH interaction. A. Structure of Bloch’s mechanism-based inhibitor (48) along with first- (50) and second-generation (52) DH probes. Probe 52 was loaded onto the apo-ACP domain AcpP by Sfp to afford crypto-AcpP 53. B. Chemical mechanism for covalent labeling of the catalytic His residue in DH domains by sulfonyl probes proceeding through allene intermediate 55. C. Schematic of ACP-DH cross-linking by crypto-ACP 53.

To test this strategy, FabA, the stand-alone DH domain involved in the FAS pathway in E. coli, and the cognate apo-ACP (AcpP) were chosen. The pantetheine analogue 49 was converted to the corresponding CoA derivative 50, then loaded onto the apo-ACP to furnish the 3-decynoyl-crypto-ACP. Incubation with FabA resulted in the formation of the ACP-FabA complex, which was visualized by SDS-PAGE. The extent of cross-linking was low, presumably due to the instability of the thioester linkage. However, unlike the related KS- and C domain pantetheine probes (see sections 3.1.1 and 2.3.1), the isosteric amide analogue of 50 was completely inactive, highlighting the importance of the pKa of the α-proton (the pKa of a thioester α-proton is several units lower than that of an amide). This motivated development of a second-generation DH probe 52 wherein the thioester was replaced by a nonhydrolyzable sulfone moiety containing a sufficiently acidic α-proton to participate in enzymatic inactivation. Probe 52 was similarly elaborated to the crypto-ACP in quantitative yield. Addition of the DH domain FabA afforded clear, intense bands in SDS-PAGE corresponding to the cross-linked ACP-FabA complex. Optimization of the reaction conditions (pH 7.0, 36 h, 37 °C) led to greater than 90% yields for the cross-linked adduct. The cross-linking exhibited a high level of specificity for native DH-ACP protein pairs. The NRPS crypto-ACPs EntB and VibB failed to show any interaction with FabA, while several type II PKS ACPs, which bear homology to type II FAS ACPs, displayed a moderate level of cross-linking.

The mechanism of cross-linking proceeds through a novel propargylic rearrangement to an allene intermediate 55 mediated by the catalytic histidine residue (Fig. 15). Subsequent Michael addition of the histidine into the allene affords the covalent adduct 56 that can tautomerize to 57. The process is highly selective since the alkyne is intrinsically unreactive and requires enzymatic activation to form the reactive allene species.

The structure of FabA in complex with its cognate ACP was determined by crystallography, revealing a heterotetrameric complex in which each subunit of the dimeric FabA interacts with a partner ACP.111 In the structure (Fig. 16), a negatively charged patch on the ACP located near helices α2 and α3 interacts with an arginine-rich patch formed by Arg136 and Arg137 of FabA. Although no direct ionic interactions form between the Arg residues and the acidic residues on the ACP, the complementary electrostatic surfaces likely promote the transient interaction and proper orientation. Upon complex formation, the acyl chain from probe 54 enters the active site to form a covalent cross-link with His70 of the alternate chain of the FabA dimer.

Fig. 16.

Structure of complex between ACP and FabA dehydratase domain (PDB 4KEH). The structure of the FabA2-ACP2 heterotetramer was solved using crypto-ACP probe 53 to capture the transient interaction. The ACP (yellow) interacts with one chain of the FabA dimer (grey) while cross-linking to His70 from the alternate chain (pink). Complementary electrostatic charges between the FabA and ACP proteins stabilize the complex interaction.

Probe 54 has also been used to characterize interactions between ACPs and the structurally and functionally related product template (PT) domains discussed further in section 3.4.112

3.2.3. Enoylreductase Probes.

The enoylreductase (ER) domains of PKS and FAS enzymes belong to the NAD(P)H-dependent medium-chain dehydrogenase/reductase superfamily of enzymes. ER domains catalyze the reduction of the carbon-carbon double bond of the enoyl thioester substrate by transfer of hydride from NADPH to the C-3 position with concomitant protonation of the generated carbanion. Mutational analysis of active site residues in a model PKS ER domain failed to identify catalytic residues, as all mutants retained significant activity; thus, the catalytic general acid that protonates the C-2 carbanion remains to be elucidated.113

Attempts to capture detailed substrate-protein interactions of ER domains have also proven challenging, as observed with the KR domains, given the simple nature of the reaction (hydride transfer). Although an ER PKS probe has not been reported, Burkart and co-workers have described an ingenious probe that stabilizes an interaction between the ER and ACP proteins in a bacterial type II FAS pathway.114 The critical feature of their probe was the incorporation of the tight-binding ER ligand triclosan, which is a potent nanomolar inhibitor of the enoyl-ACP reductase (FabI) in E. coli. Triclosan was coupled via a diazo linker to pantothenate to afford 58, which was ligated onto the E. coli ACP (AcpP) to afford crypto-58-AcpP (Fig. 17). Crypto-58-AcpP was shown to tightly bind FabI with a KD of 712 nM. Overall, these results provide a foundation for further exploring ACP-ER interactions. However, application to PKS systems will require the identification of suitable ligands.
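To put the reported KD in practical terms, the fraction of FabI in complex can be estimated for a given concentration of crypto-58-AcpP. The following is a minimal sketch that assumes simple 1:1 equilibrium binding with the crypto-ACP in excess over FabI, so that its free concentration approximates its total concentration; the binding model, the function name, and the example concentrations are assumptions for illustration and are not taken from the original study.

```python
def fraction_bound(free_partner_nM: float, kd_nM: float) -> float:
    """Fraction of protein in complex for 1:1 binding with the partner in excess."""
    return free_partner_nM / (kd_nM + free_partner_nM)


KD_NM = 712.0  # reported KD for crypto-58-AcpP binding to FabI

# Hypothetical crypto-ACP concentrations spanning 0.1x, 1x and 10x KD
for conc_nM in (71.2, 712.0, 7120.0):
    occupancy = fraction_bound(conc_nM, KD_NM)
    print(f"{conc_nM:7.1f} nM crypto-ACP: {occupancy:.2f} of FabI bound")
```

Under this model, half of the FabI is bound when the crypto-ACP concentration equals the KD, and about 90% is bound at a tenfold excess over the KD.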

Fig. 17.

Design of ER probe 58. A. Structure of ER probe containing triclosan (blue), a linker (gray), and a pantetheine arm (green). B. An apo-ACP can be enzymatically modified with probe 58 to yield crypto-58-ACP, which can bind ER domains.

3.3. Thioesterase/Termination Domains

PKS TE domains operate by a mechanism analogous to that described for NRPS TEs in section 2.2, and the tools are equally applicable. The PKS TE domains direct the serine of the catalytic triad toward a binding pocket for the ACP domain, enabling the positioning of the acyl chain for transacylation.115, 116

To examine the mechanism of substrate macrocyclization and hydrolysis, Fecik and co-workers designed a series of substrate-based affinity labels capable of modifying the catalytic serine residue to form a stable adduct.117, 118 The key design element was replacement of the terminal thioester of the ACP bound polyketide linkage by a stable diphenylphosphonate moiety, which has been used to characterize serine proteases. Mechanistically, the serine nucleophilically attacks the diphenylphosphonate warhead with displacement of one phenoxy group. The second phenoxy group is slowly hydrolysed during aging of the complex to form a tetrahedral complex that mimics the tetrahedral intermediate of the substrate during transacylation. Additionally, one of the oxygen atoms of the resulting phosphonic ester is bound in the oxyanion hole.

The TE domain from the pikromycin (Pik) polyketide pathway was selected for these studies because of its inherent relaxed substrate specificity to produce both 12- and 14-membered macrolactone products from hexa- and heptaketide chain intermediates channelled from the antecedent ACP domain in the PikAIV monomodule. Moreover, the PikTE domain had been previously structurally characterized.115 A full-length pentaketide phosphonate affinity label 61 was synthesized, which mimics the C1-C9 segment of the pikromycin heptaketide intermediate (Fig. 18). Incubation of probe 61 with PikTE resulted in time-dependent inactivation. A structure was obtained by soaking PikTE crystals with 61, which showed the acyl chain of the mimic curling back upon itself in a position primed for macrocyclization.118 The hydroxyls of 61 made direct or water-mediated hydrogen bonds to the side chains of Thr70 and Gln183 that helped orient the acyl chain. Of note, Gln183 is located on the first of the two lid helices. This helix in the PikTE structure displays a pronounced kink. The second lid helix also forms a wall of the acyl chain binding pocket, further facilitating the orientation of the substrate for cyclization (Fig. 19). A recent report from Boddy and Schmeing details their efforts to apply this general approach to the DEBS TE.119

Fig. 18.

Polyketide probe to target the PikAIV thioesterase domain. A. Macrocyclization of the native heptaketide substrate 59 by PikTE. B. Pentaketide diphenylphosphonate affinity label 60 that mimics the C1-C9 segment of the native heptaketide. C. Mechanism of affinity labeling. The catalytic serine (Ser148) nucleophilically attacks the phosphonate to form intermediate 62. Subsequent hydrolysis results in the observed serine-phosphopentaketide 63 in the crystal structure.

Fig. 19.

Structure of PikTE with affinity probe 60 (PDB 2HFJ). The phosphonate probe is attacked by the serine nucleophile. The acyl chain turns back upon itself as partly directed by hydrogen bonding interactions with the ketide intermediate oxygen atoms. The two lid helices are colored darker blue.

3.4. Polyketide Mimetics.

The biosynthesis of aromatic polyketides fundamentally differs from that of the classical polyketides discussed in the previous sections, which employ modular type I PKSs with embedded reductive domains for β-keto processing at each step of elongation.120 Aromatic polyketide biosynthesis proceeds through polymerization of malonyl-CoA to afford a poly-β-ketone intermediate, which undergoes consecutive intramolecular aldol reactions mediated by specialized domains to afford linearly or angularly fused multicyclic PKS scaffolds (Fig. 20). Efforts to obtain insight into the substrate-protein interactions that govern the regioselectivity of cyclization have been impeded by the highly reactive nature of the poly-β-ketone intermediates, which spontaneously react intra- and inter-molecularly to afford a complex mixture of products.

Fig. 20.

Polyketide mimetics. A. Representative aromatic polyketides. B. Aflatoxin biosynthesis requires one molecule of hexanoyl CoA and seven malonyl-CoA molecules, which are condensed by the NR-PKS PksA to afford poly-β-ketone 67, which is cyclized by the PT domain within PksA to 68. The polyketide is off-loaded by the TE domain through intramolecular cyclization to 69, a key intermediate in the synthesis of aflatoxin. C. Polyketide mimetic 71 was designed as an isostere of the poly-β-ketone 68. A single depiction of the ketoenol tautomerization states is illustrated. D. Phosphopantetheine polyketide mimetic 72 was prepared through chemoenzymatic synthesis.

To address the inherent instability of the poly-β-ketones, the Burkart and Tsai groups described the synthesis of substrate mimetics wherein one or several of the reactive ketone functional groups were replaced by a stable isostere to preserve the overall structural topology while rendering the polyketide less prone to non-enzymatic degradation.121 The thioether was first selected as a ‘ketone’ isostere in a model tetraketide; however, attempts to prepare longer polyketides utilizing this strategy with multiple thioethers suffered from inherent reactivity of the remaining β-di-ketones.121 The isoxazole moiety employed in the synthesis of the intermediates was subsequently recognized as a 1,3-diketone isostere and used in combination with thioethers for the construction of polyketide substrate mimetics. The heptaketide substrate mimic 71 contains thioether replacements for the second and fifth ketone groups while an isoxazole is used to substitute the third, fourth, sixth and seventh carbonyls (Fig. 20). The representative phosphopantetheine substrate 72 was synthesized for crystallization trials.

To visualize substrate-protein interactions between these polyketide mimetics and cyclization domains, simple phosphopantetheine analogues were found to be sufficient for productive binding. The non-polar mimetics partitioned into the hydrophobic binding pocket of the cyclization domains and did not necessitate presentation by an ACP domain. As a model system the Burkart and Tsai groups investigated the product template (PT) domain embedded within the multifunctional non-reducing polyketide synthase (NR-PKS) PksA responsible for cyclizing a 20-carbon poly-β-ketone intermediate 67 to afford 68 in the biosynthesis of aflatoxin (Fig. 20).122 This PT domain catalyzes two intramolecular aldol reactions once the poly-β-ketone intermediate has achieved the correct length while bound to the ACP. A crystal structure had previously been solved by the Tsai lab bound to a co-purifying fatty acid as well as to a bicyclic substrate analog depicting the binding pocket.123 The domain forms a modified “double hot dog” fold and contains a two-part active site that includes both a hexyl binding region for the unreacting fatty acid and a cyclization chamber where the chemistry occurs. Mutation of the catalytic dyad residues His1345 and Asp1543 abolished activity, while mutation of Thr1546 and Asn2548, which are both directed into the cyclization chamber, also reduced activity.123
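The 20-carbon length of intermediate 67 follows from the building blocks listed in the legend to Fig. 20. A minimal sketch of the count is given below, assuming a linear chain in which each malonyl-CoA extender contributes two carbons because one carboxylate carbon is lost as CO2 during the decarboxylative condensation; the function name is illustrative only.

```python
def chain_carbons(starter_carbons: int, n_malonyl: int) -> int:
    """Carbons in a linear polyketide chain built from a starter and malonyl extenders.

    Each malonyl unit adds two carbons; its third carbon leaves as CO2.
    """
    return starter_carbons + 2 * n_malonyl


HEXANOYL_CARBONS = 6  # hexanoyl-CoA starter unit
N_MALONYL = 7         # malonyl-CoA extender units (Fig. 20)

total = chain_carbons(HEXANOYL_CARBONS, N_MALONYL)
print(total)
```

One hexanoyl starter and seven malonyl extenders therefore account for the 20 carbons of the poly-β-ketone substrate of the PT domain.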

Use of heptaketide mimetic 71 enabled the determination of a 1.8 Å resolution structure that provided more detailed insight into the active site.122 The active site is further divided into a third component, which is responsible for pantetheine binding. Residues shown to interact with the phosphopantetheine were also probed by mutagenesis. Within the cyclization chamber, the structure illustrates a highly ordered water network near the dyad that forms a polar environment that is exploited for the aldol cyclization (Fig. 21). Although sterically constrained by the intervening isoxazole moiety, the interaction of the isosteric atoms for both C4 and C9 with this water network suggested that the environment would properly orient these two atoms for aldol cyclization. Molecular docking further provided plausible orientations for the linear polyketide substrate, the monocyclic intermediate, and the final bicyclic product of the PT domain.

Fig. 21.

Co-crystal structure of PksA-PT domain and probe 72 (PDB 5KBZ). The pantetheine and hexyl binding pockets are highlighted. The highly ordered solvent network of the core cyclization chamber is shown, along with the catalytic dyad composed of residues Asp1543 (not labeled) and His1345. The C4 and C9 isosteres that react in the aldol cyclization are labeled with green asterisks.

In a complementary approach, Tsai and Vanderwal have proposed the use of an oxetane as a surrogate for the carbonyl groups in poly-β-ketones.124 The oxetane is a well-known carbonyl isostere that maintains the lone pairs on the oxygen atom in the same vector, but lacks the inherent reactivity of the carbonyl. As a model system, the authors prepared the phosphopantetheine malonyl mimic 74 (Fig. 22). Compound 74 was successfully co-crystallized with the ketosynthase DpsC, which catalyzes the first step in the biosynthesis of the aromatic polyketide daunorubicin. The use of probe 74 enabled the structural characterization of the acylenzyme form of DpsC bound to the unreactive extender unit. The structure reveals the proper distance between the two reactive atoms for the Claisen condensation, namely the β carbons of the propionyl-serine of the KS domain and of the probe. The carboxylate is positioned for decarboxylation, but decarboxylation has not occurred. The thioester oxygen is oriented away from the putative oxyanion hole formed by the side chain of His148, suggesting that, upon decarboxylation, the resulting carbanion rotates to exploit the potential interaction with the histidine (Fig. 23). The structure was used to seed molecular dynamics simulations with either the oxetane probe or a computationally generated carbonyl. The two simulations behaved similarly, showing comparable backbone fluctuations across the full-length proteins and high-frequency movements around the active site.

Fig. 22.

Oxetane-based polyketide surrogate. A. Structure of the isosteric oxetane malonyl phosphopantetheine probe 74 compared to malonyl phosphopantetheine 73. B. Chemoenzymatic synthesis of probe 74 from D-pantothenic acid.

Fig. 23.

Co-crystal structure of DpsC and probe 74 (PDB 5WGC). The oxetane probe is bound to the propionyl-enzyme form of the KS domain DpsC to mimic the malonyl-ACP extender unit. Arg271 and Thr163 interact with the carboxylate of the probe, while His198 (partially obscured) is proposed to stabilize the oxyanion formed upon decarboxylation.

The oxetane isostere is expected to find more widespread application to study not only aromatic polyketide biosynthesis, but also other PKS systems where the intrinsic reactivity of thioesters or ketones renders the substrates or products unstable.104, 105, 125

3.5. Non-cleavable Malonyl Derivatives

The use of non-hydrolyzable malonyl coenzyme A analogues to capture polyketide intermediates was first reported by Spencer and co-workers.126 These authors prepared thioether probe 79 containing an extra methylene between the sulfur atom and the carbonyl, which preserves the reactivity of the substrate, allowing it to react in a decarboxylative aldol Claisen condensation (Fig. 24). However, this small change removes the reactive thioester; consequently, the chain-extended polyketide cannot undergo further elongation, resulting in chain termination. Because the probes are not covalently attached to an ACP, but are diffusible small molecules, the polyketide products are released from the PKS assembly line and can be isolated and subsequently detected by LC-MS/MS. Probe 79 was used to study biosynthesis by the type III PKS stilbene synthase (STS) from Pinus sylvestris. Incubation of probe 79 along with malonyl CoA and the starter unit p-hydroxyphenylacetyl-CoA led to the formation of a triketide, validating this approach. However, no product was detected with the natural starter unit, suggesting the extra methylene group in 79 was deleterious for substrate recognition. Thus, closer mimics of malonyl CoA were prepared wherein the sulfur atom was replaced with an oxygen or CH2 moiety.127 The malonylcarba(dethio)-CoA 80 was efficiently processed by STS, resulting in formation of di-, tri-, and tetraketide products employing the natural starter unit, highlighting the better isosteric design of probe 80.

Fig. 24.

Chemical probes to capture biosynthetic intermediates. A. Whereas the ketosynthase domain would normally transfer the loaded substrate from an upstream to a downstream carrier protein (a), the probes were designed to enter the ketosynthase domain and interrupt biosynthesis (b). B. CoA probes 79–80 designed to replace the labile thioester linkage with a ketone. C. malonyl carba(dethio)-N-acetylcysteamine probes 81 and 83 along with the corresponding methyl esters 82 and 84 for cellular penetration. D. malonyl carba(dethio)-N-acetylcysteamine probes 85–86 with bioorthogonal handles. E. acetoxy methyl ester (AM) of malonyl carba(dethio)-N-acetylcysteamine 87 designed for improved esterase cleavage.

To facilitate in vivo studies, simplify synthesis, and enhance MS characterization, Tosin next truncated CoA probe 80 to the malonyl carba(dethio)-N-acetylcysteamine analog 81 (Fig. 24),128 based on the established behavior of the corresponding N-acetylcysteamine (NAC) thioester derivatives, which have been widely used as surrogates of natural CoA or ACP-bound substrates.102 Protection of the free carboxylate as the methyl ester enhanced cellular penetration, enabling in vivo profiling directly in the producing microorganism where the esters are hydrolyzed intracellularly by endogenous esterase(s). The malonyl and methylmalonyl carba(dethio)-N-acetylcysteamine probes 81 and 83 have now been used to study a multitude of PKS systems in vitro and in vivo including the prototypical 6-deoxyerythronolide B synthase (DEBS) PKS,128 the polyether Lasalocid (LAS) PKS,129 the iterative partially reducing (PR) PKSs involved in the syntheses of 6-methylsalicylate and 6-pentylsalicylate (6-MSAS),130, 131 and the thiotetronate PKS responsible for thiolactomycin biosynthesis.132, 133 A representative metabolite from in vivo feeding studies of the lasalocid PKS system is shown in Fig. 24, highlighting the useful information that one can obtain from this analysis, which in the case of lasalocid revealed the timing of ring formation in this polyether antibiotic.

Tosin has further expanded their non-hydrolyzable malonyl carba(dethio)-NAC probes by modulating the chain length and incorporating alkyne, azide, and fluoro functional groups into the N-acyl portion of the NAC probes.134 Surprisingly, most modifications appear to be well tolerated, affording many unique metabolites. The bioorthogonal functional groups also allow late-stage diversification by azide-alkyne Huisgen cycloaddition and Staudinger-phosphine reactions, providing a further series of analogues. Moreover, the azide or alkyne synthetic handles could in principle be exploited to facilitate purification and enhance ionization for subsequent MS analysis.

The in vivo feeding studies rely on intracellular hydrolysis of the malonyl carba(dethio)-N-acetylcysteamine probes 82 and 84 by endogenous esterases. The Tosin group observed that hydrolytic release depended on the bacterial strain as well as the nature of the probe, resulting in highly variable yields from 5-70%. The acetoxy methyl ester (AM) was evaluated because it has been shown in other systems to have enhanced reactivity, since hydrolysis occurs on the remote and less sterically encumbered carbonyl, affording an intermediate acyl-hemiacetal that decomposes to release formaldehyde and the free malonate probe.135 Feeding studies with probe 89 (Fig. 24) led to a nearly 10-fold increased yield of polyketide species relative to the methyl ester counterpart, demonstrating the superiority of probe 89.135 Analysis of the culture extracts also showed the probe was completely hydrolysed over five days, confirming the improved reactivity of the acetoxy methyl ester.

In summary, these non-hydrolyzable malonyl probes have proven remarkably versatile and extremely powerful to decipher PKS biosynthetic pathways. Isolation of interrupted biosynthetic chain intermediates reveals direct insight into the timing of biosynthetic events such as cyclization and identifies modules or domains with relaxed or strict substrate specificity. Second-generation probes have increased the in vivo activity and contain bioorthogonal functional handles for further polyketide diversification.

4. Conclusions and Future Directions

4.1. Structural Characterization of Modular Biosynthetic Enzymes.

Since the discovery and elucidation of the thiotemplate mechanism used by the modular NRPS and PKS enzymes nearly 25 years ago,3, 4, 136 a search for both the mechanistic and structural features that govern their function has motivated many research labs. Novel chemistry has been described, identifying the surprising ways that nature has evolved to produce an arsenal of natural products. These elegant studies have described new products and new ways that they are produced.

The structural investigation of the modular enzymes has identified common themes in the interactions between the catalytic and carrier protein domains. Not surprisingly, given the conserved positioning of the PPant at the start of helix α2, this helix has been shown to contribute to many of the interactions with other domains. The NRPS adenylation domains have been seen in multiple distinct proteins, including both multi-domain and free-standing systems, to exploit residues on the loop that precedes helix α2 for proper recognition and docking. Indeed, structure-guided mutagenesis of either the adenylation or the PCP domains with subsequent biochemical analysis confirms the importance of these interactions in a functional interaction.49, 137 The other structures of carrier proteins with NRPS catalytic domains show that a different face of the carrier protein is used. In complexes with the NRPS condensation and TE domains, or with the auxiliary P450 protein, the domain interfaces include helix α3 in addition to α2. Similarly, in the interactions of the ACP with the dehydratase domain from fatty acid synthase, and presumably then with PKS DH domains, residues from α3 play an important role.

Nonetheless, as the number of biosynthetic gene clusters continues to grow, we are reminded that there is also a great deal to learn.

Understanding the structural and catalytic basis for natural product biosynthesis is critical for several reasons. First, knowledge of the features that dictate distinct reactions or building block specificity will allow for the prediction of function of new, uncharacterized biosynthetic gene clusters. Bioinformatic approaches are available to predict the specificity of NRPS adenylation domains. However, the features that endow noncanonical catalytic domains with novel biochemical activities are not fully understood. Identifying the key sequence motifs and structural features that dictate unusual chemistry will allow for the prediction of which natural product clusters may produce the next set of active molecules.

Further, the field has long sought the ability to engineer new modular catalysts, either through changes in active site residues that govern specificity or activity, or through the wholesale reorganization of the domain or module assemblies. A clear definition of the features that enable the functional and physical interaction of domains and modules will provide the basis for the more routine generation of new biosynthetic systems.

The structures of single catalytic domains have provided insight into the residues that contribute to activity and substrate preference. The goals of elucidating the fundamental structural basis for NRPS and PKS function as well as combining larger assemblies into functional catalysts to produce novel natural products have compelled the push for structures of multidomain proteins. By their very nature, the multidomain NRPS and PKS enzymes are highly dynamic; they contain features to allow the PCP/ACP domains to migrate between different catalytic domains, which may themselves also exhibit conformational flexibility. Brute force approaches with multidomain proteins have had some success, leading to views of the compatible orientations of the domains and the intervening linkers.

The use of carefully designed chemical probes, however, has proved to be extremely useful in studies of multidomain NRPS and PKS enzymes. These probes have at least two important features that make them valuable tools for the structural biologist. First, these probes reduce the conformational flexibility of the large multidomain systems. Particularly for crystallographic studies, limiting the dynamics of a protein system can produce a homogeneous protein sample with a higher propensity for crystallization. Even for cryo-EM, which is able to computationally segregate multiple ensembles of particles, minimizing the heterogeneity of the target can facilitate structural analysis.

Perhaps more significant, the best chemical probes described herein are meaningful analogues for the active site contents to describe relevant states of the assembly line. Many of the best probes for the NRPS and PKS systems exploit the ACP and PCP domains that migrate between different catalytic domains. By loading a pantetheine analog with a reactive group onto the carrier domain, the natural dynamics of the assembly line can position the probe in the proximity of a reactive group on the catalytic domain, enabling the trapping of the functional interaction between the domains. Many labs have adopted this common ACP/PCP labelling approach while exploiting a diverse range of reactive warheads that take advantage of the features of each targeted domain. Undoubtedly, other probes will be developed and, as new chemistry being carried out by the NRPS and PKS catalytic domains is discovered, unusual chemical probes will be designed, synthesized and used to continue to unravel the fascinating biosynthetic capabilities of these enzymes.

6. Acknowledgement

This work is funded by a grant (GM116957) from the National Institute of General Medical Sciences of the National Institutes of Health.

Footnotes

5. Conflicts of Interest

There are no conflicts of interest to declare.

Contributor Information

Andrew M. Gulick, University at Buffalo, Department of Structural Biology Jacobs School of Medicine and Biomedical Sciences 955 Main St Buffalo, NY 14203 amgulick@buffalo.edu.

Courtney C. Aldrich, University of Minnesota, Department of Medicinal Chemistry College of Pharmacy 8-101 Weaver-Densford Hall 308 Harvard St, SE Minneapolis, MN 55455 aldri015@umn.edu.

References
