Abstract
The fungus Aspergillus flavus produces an alkaloid terpenoid, flavunoidine, through a hybrid biosynthetic pathway that combines terpene cyclase and nonribosomal peptide synthetase enzymes. Flavunoidine consists of a tetracyclic, oxygenated sesquiterpene core decorated with dimethylcadaverine and 5,5-dimethyl-L-pipecolate moieties. Unique to the flavunoidine biosynthetic pathway is FlvF, a putative enzyme implicated in stereospecific C–N bond formation, in which dimethylcadaverine is linked to the sesquiterpene core to generate pre-flavunoidine. Here, we report the 2.6 Å-resolution crystal structure of FlvF, which adopts the α-helical fold of a class I terpene synthase. However, FlvF is not a terpene synthase, as indicated by its lack of enzymatic activity with farnesyl diphosphate and by the absence of the signature metal-binding motifs that coordinate catalytic Mg2+ ions. Thus, FlvF is the first example of a protein that adopts a terpene synthase fold but is not a terpene synthase. Two Bis-Tris molecules bind in the active site of FlvF, and the binding of these ligands guided the docking of pre-flavunoidine to generate a model of the enzyme-product complex. Phylogenetic analysis of FlvF and related fungal homologues reveals conservation of residues that interact with the tetracyclic sesquiterpene in this model, but less conservation of residues that interact with the pendant amino moiety. This raises the possibility that FlvF homologues link alternative amino substrates to a common sesquiterpene core to generate flavunoidine congeners, such as the phospholipase C inhibitor hispidospermidin.
Introduction
Natural products are complex organic molecules synthesized by living organisms. Because of their incredible structural diversity, natural products are instrumental in myriad applications including drug discovery and development,1 generation of renewable biofuels,2 and use in commercial fragrances and cosmetics.3 Several families of biosynthetic enzymes generate such natural products through well-defined sequences of intricate chemical reactions catalyzed by multi-enzyme complexes. For example, polyketide synthases (PKSs) are molecular assembly lines containing an acyl carrier protein that shuttles biosynthetic intermediates from one catalytic domain to another, utilizing acetyl-CoA as a source of C2 building blocks.4 Nonribosomal peptide synthetases (NRPSs) comprise a second example – these multi-enzyme complexes contain a peptidyl carrier protein that similarly shuttles a growing peptide chain from one catalytic domain to another in assembly-line fashion.5 Finally, a third example is provided by the greater family of terpene synthases (TSs), which generally consist of prenyltransferases and cyclases. Prenyltransferases link the C5 isoprenoid precursors dimethylallyl diphosphate and isopentenyl diphosphate to generate linear isoprenoids such as C10 geranyl diphosphate (GPP) and C15 farnesyl diphosphate (FPP); cyclases then convert these linear isoprenoids into structurally complex products typically containing multiple rings and stereocenters.6–8
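As a rough illustration of the chain elongation described above, the sketch below does formula bookkeeping for head-to-tail condensation: each prenyltransferase step adds one C5 unit from isopentenyl diphosphate (IPP) and releases inorganic pyrophosphate (PPi). It is a minimal sketch that assumes the fully protonated (neutral acid) forms of the diphosphates; the helper names `elongate` and `formula` are hypothetical.

```python
from collections import Counter

# Neutral-acid (fully protonated) formulas, assumed for bookkeeping only.
DMAPP = Counter({"C": 5, "H": 12, "O": 7, "P": 2})
IPP = Counter({"C": 5, "H": 12, "O": 7, "P": 2})
PPI = Counter({"H": 4, "O": 7, "P": 2})  # inorganic pyrophosphate leaving group


def elongate(allylic: Counter, n_ipp: int) -> Counter:
    """Add n_ipp IPP units head-to-tail, releasing one PPi per condensation."""
    product = Counter(allylic)
    for _ in range(n_ipp):
        product.update(IPP)    # incoming C5 unit
        product.subtract(PPI)  # diphosphate released from the allylic substrate
    return product


def formula(composition: Counter) -> str:
    """Hill-order string (C, H, O, P) for the elements present."""
    return "".join(f"{el}{composition[el]}" for el in ("C", "H", "O", "P") if composition[el])


GPP = elongate(DMAPP, 1)  # C10 geranyl diphosphate
FPP = elongate(DMAPP, 2)  # C15 farnesyl diphosphate
print(formula(GPP), formula(FPP))
```

Running it prints C10H20O7P2 for GPP and C15H28O7P2 for FPP, the neutral-acid formulas of these two diphosphates.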
Each individual family of biosynthetic enzymes described above generates myriad natural products; product chemodiversity is even further amplified through the combination of pathways (e.g., TS/PKS, TS/NRPS, or PKS/NRPS) to yield hybrid natural products.9 Consider fungal meroterpenoids, which are TS/PKS-derived natural products containing terpene and polyketide components.10–12 Prominent examples include the meroterpenoid pyripyropene, which is a potent inhibitor of acyl-CoA:cholesterol acyltransferase generated by Aspergillus fumigatus;13 the antibiotic and immunosuppressant mycophenolic acid, which is generated by Penicillium brevicompactum;14 and ascofuranone, which is a potential antiparasitic drug produced by Acremonium egyptiacum.15 Biosynthetic systems that generate such hybrid natural products can be further exploited for the generation of derivatives with unique chemical structures and biological activities.16
Hybrid TS/NRPS biosynthetic pathways have also been discovered in certain fungi. Recently, Tang and co-workers isolated and characterized the biosynthetic gene cluster from Aspergillus flavus responsible for producing flavunoidine, a hybrid terpene alkaloid (Figure 1).17 Formation of the tetracyclic sesquiterpene core of flavunoidine begins with the FlvE-catalyzed cyclization of FPP to generate acoradiene, which is then oxygenated by the P450 enzyme FlvD. Separately, a dimethylcadaverine moiety is generated from L-lysine by FlvH, an S-adenosylmethionine-dependent N-methyltransferase, and the PLP-dependent decarboxylase FlvG. Dimethylcadaverine is covalently linked to the sesquiterpene core at the C7 axial position through C–N bond-forming chemistry governed by FlvF. To complete the biosynthetic sequence, a 5,5-dimethyl-L-pipecolate moiety is produced from O-acetyl-homoserine and α-keto-isovalerate by the didomain enzyme FlvA, which contains an N-terminal PLP-dependent lyase and a C-terminal non-heme-iron α-ketoglutarate-dependent oxygenase, followed by FlvB-catalyzed reduction. Lastly, pre-flavunoidine is hydroxylated by the P450 FlvC and subsequently acylated at the C10 position by the NRPS FlvI to yield flavunoidine.
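The sequence of steps above can be summarized as a dependency graph in which each product lists the enzyme(s) and precursors named in the text. The sketch below is an illustrative bookkeeping aid, not a kinetic or mechanistic model; it assumes Python 3.9+ for `graphlib`, and the intermediate labels (e.g., "oxygenated sesquiterpene intermediate") are descriptive placeholders rather than structure names.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# product -> (enzyme(s), precursors), following the pathway summarized in the text.
PATHWAY = {
    "acoradiene": ("FlvE", {"FPP"}),
    "oxygenated sesquiterpene intermediate": ("FlvD", {"acoradiene"}),
    "dimethylcadaverine": ("FlvH and FlvG", {"L-lysine"}),
    "pre-flavunoidine": ("FlvF", {"oxygenated sesquiterpene intermediate",
                                  "dimethylcadaverine"}),
    "5,5-dimethyl-L-pipecolate": ("FlvA, then FlvB", {"O-acetyl-homoserine",
                                                      "alpha-keto-isovalerate"}),
    "hydroxylated pre-flavunoidine": ("FlvC", {"pre-flavunoidine"}),
    "flavunoidine": ("FlvI", {"hydroxylated pre-flavunoidine",
                              "5,5-dimethyl-L-pipecolate"}),
}

# TopologicalSorter expects node -> predecessors; starting materials become leaf nodes.
graph = {product: precursors for product, (_, precursors) in PATHWAY.items()}
order = list(TopologicalSorter(graph).static_order())

for compound in order:
    if compound in PATHWAY:  # skip starting materials, which have no enzyme step
        enzyme, precursors = PATHWAY[compound]
        print(f"{enzyme:>16}: {' + '.join(sorted(precursors))} -> {compound}")
```

Because every starting material feeds into flavunoidine, any valid ordering ends with flavunoidine, and the FlvF step appears only after both the FlvD product and dimethylcadaverine are available.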
A unique feature of the flavunoidine biosynthetic pathway is the putative terpene synthase FlvF. Although annotated as a cyclase, FlvF is responsible for the covalent linkage of dimethylcadaverine to the polycyclic sesquiterpene core.17 However, the chemical mechanism underlying this reaction is not well understood. As the first step toward understanding structure-function relationships in this unusual enzyme, we now report the X-ray crystal structure of FlvF determined at 2.6 Å resolution. Notably, the initial electron density map was phased using a search model generated by AlphaFold2.18 FlvF adopts the α-fold of a metal-dependent class I terpene synthase, but it does not contain the canonical metal-binding motifs required for prenyltransferase or cyclase activity. Therefore, the class I terpene cyclase fold has been repurposed for an alternative function. Bound molecules of Bis-Tris provide intriguing clues regarding intermolecular interactions in the FlvF active site that may be important for understanding substrate binding and catalysis.
Materials and Methods
Reagents
All chemicals and buffers used for protein expression and purification were purchased from Sigma-Aldrich Chemical Company, Fisher Scientific, RPI, and Goldbio. All crystallization reagents were purchased from Hampton Research. All reagents were used without further purification.
Gene Cloning
The full-length flvF gene (without the start methionine)17 was amplified by polymerase chain reaction using the forward primer 5’-TACTTCCAATCCAATGCAGAAGGACTCAGAAGA-3’ and the reverse primer 5’-TTATCCACTTCCAATGTTATTAGTTTGCGTACCGAACTTC-3’. The linearized flvF gene was inserted into a pET-His6-GFP-TEV-LIC vector (S. Gradia, Addgene plasmid 29663). The resulting pET-His6-GFP-TEV-FlvF plasmid was transformed into NEB-5α E. coli competent cells (New England Biolabs) and purified using the Qiagen miniprep kit. The sequence was verified using services provided by the University of Pennsylvania DNA Sequencing Facility.
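The primers above combine 5′ ligation-independent cloning (LIC) tails with gene-specific annealing regions. The sketch below separates the two and gives a crude melting-temperature estimate by the Wallace rule. It assumes that the tails are the standard QB3 MacroLab LIC sequences (TACTTCCAATCCAATGCA and TTATCCACTTCCAATGTTATTA), which the text does not state; the code raises an error if a primer does not begin with the assumed tail. The Wallace rule is only a rough guide for oligonucleotides of this length.

```python
# Assumed standard MacroLab LIC tails (not stated in the text).
LIC_FORWARD_TAIL = "TACTTCCAATCCAATGCA"
LIC_REVERSE_TAIL = "TTATCCACTTCCAATGTTATTA"

FORWARD = "TACTTCCAATCCAATGCAGAAGGACTCAGAAGA"
REVERSE = "TTATCCACTTCCAATGTTATTAGTTTGCGTACCGAACTTC"


def gene_specific(primer: str, tail: str) -> str:
    """Strip the 5' LIC tail, returning the annealing (gene-specific) region."""
    if not primer.startswith(tail):
        raise ValueError("primer does not start with the assumed LIC tail")
    return primer[len(tail):]


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm in deg C: 2*(A+T) + 4*(G+C); crude for longer oligos."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


for name, primer, tail in (("forward", FORWARD, LIC_FORWARD_TAIL),
                           ("reverse", REVERSE, LIC_REVERSE_TAIL)):
    region = gene_specific(primer, tail)
    print(f"{name}: {region} ({len(region)} nt, "
          f"GC {gc_fraction(region):.0%}, Tm ~{wallace_tm(region)} C)")
```

Under these assumptions, the forward annealing region is GAAGGACTCAGAAGA (15 nt, Wallace Tm about 44 °C) and the reverse annealing region is GTTTGCGTACCGAACTTC (18 nt, about 54 °C); the forward region begins at GAA rather than ATG, consistent with omission of the start methionine.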
Protein Purification
The pET-His6-GFP-TEV-FlvF plasmid was transformed into BL21(DE3) E. coli competent cells (New England Biolabs). Selected colonies were used to inoculate 5-mL cultures of Terrific Broth (TB) supplemented with 50 μg/mL kanamycin, which were grown overnight at 37 °C with shaking at 250 rpm. Four 1-L flasks of TB medium supplemented with 50 μg/mL kanamycin were each inoculated with a 5-mL starter culture and incubated at 37 °C with shaking at 250 rpm until the OD600 reached 0.6–0.9. The cultures were cooled to 16 °C, and protein expression was induced by the addition of isopropyl β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 1 mM. Incubation continued overnight at 16 °C with shaking at 250 rpm. Cells were pelleted by centrifugation and stored at −80 °C.
A cell pellet from 1 L of culture was resuspended in lysis buffer [50 mM Tris (pH 8.0), 300 mM NaCl, 2 mM TCEP, and 10 mM imidazole] supplemented with 6 mg of DNase and one cOmplete protease inhibitor cocktail tablet (Roche), and incubated for 1 h at 4 °C. Cells were lysed by sonication, and cell debris was removed by ultracentrifugation. The supernatant was applied to a 5-mL HisTrap HP column (Cytiva), and protein was eluted with a linear imidazole gradient up to 300 mM. The eluted protein was treated with 3 mg of TEV protease to cleave the His6-GFP tag and dialyzed against lysis buffer overnight at 4 °C. The digested protein was applied to a 5-mL HisTrap HP column to separate cleaved FlvF from the His6-GFP tag. Flow-through fractions containing FlvF were pooled, concentrated, and applied to a HiLoad 26/600 Superdex 200 prep grade column (Cytiva) in sizing buffer [50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 2 mM TCEP, and 10% glycerol]. Peak fractions from size-exclusion chromatography were analyzed for purity by SDS-PAGE, pooled, and concentrated to approximately 15 mg/mL. The final yield was approximately 4 mg of protein per g of wet cell pellet.
Native-PAGE
Native (non-denaturing) polyacrylamide gel electrophoresis was performed using NativePAGE™ 4–16% Bis-Tris gels (Invitrogen), which were imaged with a white-light illuminator. Ten-microliter samples containing 50–150 μg of FlvF were loaded onto the gel alongside the NativeMark™ unstained protein standard (20–1236 kDa).
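Apparent masses from native-PAGE are usually read from a standard curve of log10(mass) against the relative migration (Rf) of the marker bands. The sketch below fits such a curve with NumPy; the marker masses are those commonly listed for the NativeMark standard, but the Rf values, the band position, and the monomer mass are hypothetical placeholders, not measurements from this work. Because native-PAGE migration also depends on shape and charge, the resulting masses are approximate and are best used to distinguish oligomeric states such as dimers and tetramers.

```python
import numpy as np

# Marker masses (kDa) as commonly listed for the NativeMark standard.
MARKER_KDA = np.array([1236, 1048, 720, 480, 242, 146, 66, 20], dtype=float)
# Hypothetical relative migration (Rf) values; measure these from the gel image.
MARKER_RF = np.array([0.11, 0.14, 0.21, 0.29, 0.43, 0.52, 0.68, 0.91])


def fit_standard_curve(kda: np.ndarray, rf: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of log10(mass) as a linear function of Rf."""
    slope, intercept = np.polyfit(rf, np.log10(kda), 1)
    return float(slope), float(intercept)


def apparent_mass(rf: float, slope: float, intercept: float) -> float:
    """Interpolated apparent mass (kDa) for a band at the given Rf."""
    return float(10 ** (slope * rf + intercept))


slope, intercept = fit_standard_curve(MARKER_KDA, MARKER_RF)
band_rf = 0.60      # hypothetical band position
monomer_kda = 42.0  # hypothetical monomer mass, for illustration only
mass = apparent_mass(band_rf, slope, intercept)
print(f"apparent mass ~{mass:.0f} kDa (~{mass / monomer_kda:.1f}x monomer)")
```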
Crystallization
Initial crystallization conditions were identified using high-throughput screens from Hampton Research (Index, PEG/Rx, PEG/Ion, and Crystal Screen) set up in sitting-drop microplates (MRC Plate 96-well 2 drop) with a Mosquito liquid handler (SPT Labtech). A 300-nL drop of protein solution [5 or 10 mg/mL FlvF, 50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 2 mM TCEP, and 10% glycerol] was mixed with 300 nL of precipitant solution and equilibrated against an 80-μL reservoir of precipitant solution. Crystals were grown by the sitting-drop vapor diffusion method at 4 °C. The precipitant solution that yielded crystals [0.2 M MgCl2, 0.1 M Bis-Tris (pH 6.5), and 25% PEG 3,350] was systematically optimized using the Additive Screen (Hampton Research); the addition of 10 mM yttrium(III) chloride hexahydrate yielded higher-quality crystals, which were harvested, soaked in cryoprotectant solution (mother liquor plus 20% glycerol), and frozen in liquid nitrogen.
Data Collection and Structure Determination
X-ray diffraction data from crystals of FlvF were collected remotely on the Northeastern Collaborative Access Team 24-ID-E beamline at the Advanced Photon Source (APS) operated by the Argonne National Laboratory (Lemont, IL), using a Dectris EIGER 16M detector at 100 K under N2 gas flow. Diffraction data were indexed and integrated using iMosflm19 and scaled using Aimless20 in the CCP4 program suite.21 Crystals belonged to space group P2₁ with unit cell dimensions a = 90.6 Å, b = 195.0 Å, c = 101.4 Å, β = 106.2°. With eight molecules in the asymmetric unit, the Matthews coefficient VM = 2.54 Å³/Da (52% solvent content). Additional data collection statistics are listed in Table 1.
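The Matthews coefficient quoted above follows from V_M = V/(Z·M), where V is the unit-cell volume, Z the number of chains per cell, and M the chain mass, with the solvent fraction approximately 1 − 1.23/V_M. Because the text does not state the FlvF chain mass, the sketch below instead back-calculates the mass implied by the reported V_M. It assumes the two asymmetric units per cell of space group P2₁ (Z = 2 × 8 = 16) and the usual average protein density that underlies the 1.23 constant.

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit-cell volume (A^3) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(volume: float, mw_da: float, n_per_cell: int) -> float:
    """V_M = V / (Z * M), in A^3/Da."""
    return volume / (n_per_cell * mw_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent content, 1 - 1.23 / V_M (assumes average protein density)."""
    return 1.0 - 1.23 / vm


V = monoclinic_cell_volume(90.6, 195.0, 101.4, 106.2)
Z = 2 * 8  # P2_1: two asymmetric units per cell, eight chains in each
implied_mw = V / (Z * 2.54)  # chain mass implied by the reported V_M
print(f"V = {V:.3e} A^3, implied M ~ {implied_mw / 1000:.1f} kDa, "
      f"solvent ~ {solvent_fraction(2.54):.0%}")
```

With the reported cell, V ≈ 1.72 × 10⁶ Å³, the implied chain mass is about 42 kDa, and the solvent fraction rounds to 52%, matching the value quoted above.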
Table 1.
| Parameter | Value |
|---|---|
| Unit cell | |
| Space group | P2₁ |
| a, b, c (Å) | 90.6, 195.0, 101.4 |
| α, β, γ (deg) | 90, 106.2, 90 |
| Data collection | |
| Wavelength (Å) | 0.979 |
| Resolution (Å) | 97.3–2.6 (2.7–2.6) |
| Total/unique no. of reflections | 197,549/99,760 |
| Rmerge ᵃ,ᵇ | 0.093 (0.479) |
| Rpim ᵃ,ᶜ | 0.032 (0.096) |
| CC1/2 ᵃ,ᵈ | 0.983 (0.474) |
| I/σ(I) ᵃ | 18.7 (9.4) |
| Redundancy ᵃ | 2.0 (2.0) |
| Completeness (%) ᵃ | 98.9 (95.8) |
| Refinement | |
| No. of reflections used in refinement/test set | 99,684/9,635 |
| Rwork ᵉ | 0.208 (0.274) |
| Rfree ᶠ | 0.243 (0.311) |
| No. of protein chains | 8 |
| No. of non-hydrogen atoms | 22,284 |
| Average B factor, overall (Å²) | 49.3 |
| Average B factor, macromolecules (Å²) | 49.3 |
| Average B factor, ligands (Å²) | 55.7 |
| RMSD from ideal geometry, bonds (Å) | 0.006 |
| RMSD from ideal geometry, angles (deg) | 1.16 |
| Ramachandran plot, favored (%) ᵍ | 95.6 |
| Ramachandran plot, allowed (%) ᵍ | 3.8 |
| Ramachandran plot, outliers (%) ᵍ | 0.5 |
| PDB entry | 8D7F |
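ᵃ Values in parentheses refer to the highest-resolution shell of the data.
ᵇ $R_{\mathrm{merge}} = \sum |I_h - \langle I_h \rangle| / \sum \langle I_h \rangle$, where $I_h$ is the intensity measured for reflection $h$ and $\langle I_h \rangle$ is the average intensity for reflection $h$ calculated from replicate data.
ᶜ $R_{\mathrm{pim}} = \sum \left[1/(n-1)\right]^{1/2} \sum |I_h - \langle I_h \rangle| / \sum \langle I_h \rangle$, where $n$ is the number of observations (redundancy).
ᵈ $CC_{1/2} = \sigma_\tau^2 / (\sigma_\tau^2 + \sigma_\varepsilon^2)$, where $\sigma_\tau^2$ is the true measurement error variance and $\sigma_\varepsilon^2$ is the independent measurement error variance.
ᵉ $R_{\mathrm{work}} = \sum ||F_o| - |F_c|| / \sum |F_o|$ for reflections contained in the working set; $|F_o|$ and $|F_c|$ are the observed and calculated structure factor amplitudes, respectively.
ᶠ $R_{\mathrm{free}} = \sum ||F_o| - |F_c|| / \sum |F_o|$ for reflections contained in the test set held aside during refinement.
ᵍ Assessed by MolProbity.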
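The footnote definitions above can be made concrete with a short sketch. The first function computes R_merge and R_pim from unmerged intensities grouped by Miller index; the second computes the R factor used for R_work and R_free (applied to the working and test sets, respectively). It is a minimal illustration under stated assumptions: only reflections observed at least twice contribute to the merging statistics, no scale factor is applied between |F_o| and |F_c|, and the observations in the example are hypothetical. Note that summing ⟨I_h⟩ over all observations of a reflection gives the same total as summing I_h, so the denominator agrees with the more common form of these statistics.

```python
import math
from collections import defaultdict


def merging_r_factors(observations):
    """Return (R_merge, R_pim) from (hkl, intensity) pairs of unmerged data.

    Only reflections observed at least twice contribute (an assumption that
    matches common practice, since both statistics need replicates).
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    num_merge = num_pim = denom = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        num_merge += abs_dev
        num_pim += math.sqrt(1.0 / (n - 1)) * abs_dev  # precision-indicating weight
        denom += n * mean  # sum of <I_h> over observations equals sum of I_h
    return num_merge / denom, num_pim / denom


def r_factor(f_obs, f_calc):
    """sum ||Fo| - |Fc|| / sum |Fo|, with no scale factor applied."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)


# Hypothetical unmerged observations: ((h, k, l), measured intensity).
obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((0, 1, 0), 50.0), ((0, 1, 0), 50.0), ((0, 1, 0), 56.0),
       ((0, 0, 1), 80.0)]  # singly measured; excluded from merging statistics
print(merging_r_factors(obs))
```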
The crystal structure of FlvF was determined by molecular replacement using Phaser-MR in Phenix.22 The search model was generated using AlphaFold218 with access provided by ColabFold23 (open-source software available at github.com/sokrypton/ColabFold). A single solution was identified with a log-likelihood gain (LLG) of 21,044 and a translation-function Z-score (TFZ) of 59.1, indicating a correct solution.22
The starting model was subjected to iterative rounds of refinement in Phenix Refine24 using simulated annealing, rigid-body, individual B-factor, and translation-libration-screw (TLS) refinement. All manual model building, including the addition of ligands, was performed using Coot,25,26 and ligand placement was validated using polder electron density maps in Phenix.27 Disordered protein side chain atoms with poor electron density in the 2|Fo| − |Fc| map were omitted from the model. Parenthetically, we note that superposition of the initial model of FlvF and the final refined experimentally-determined structure of FlvF yields a root-mean-square (rms) deviation of 0.9 Å for 283 Cα atoms (calculated with PyMOL), although there are peripheral regions of the protein that differ in structure (Figure S1). Refinement statistics are listed in Table 1. All images were rendered using PyMOL.28
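The rms deviations quoted here and in the Results come from optimal rigid-body superposition of paired Cα atoms. A minimal NumPy sketch of that calculation, using the Kabsch algorithm, is shown below. It assumes that residue correspondence has already been established and performs a single unweighted fit; PyMOL's align and the Dali server also perform their own sequence or structural alignment and, in PyMOL's case, outlier rejection, so their values can differ from this plain calculation.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD (same units as input) between paired (N, 3) coordinate sets
    after optimal rigid-body superposition (Kabsch algorithm)."""
    if p.shape != q.shape or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of paired atoms")
    p = p - p.mean(axis=0)              # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                         # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0  # exclude reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T      # maps p onto q
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

For two protein chains, the inputs would be the matched Cα coordinates, for example the 283 pairs in the comparison above.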
Multiple Sequence Alignment and Phylogenetic Tree Generation
Protein-protein BLAST was run using the full protein sequence of FlvF as the query against all fungal sequences deposited in the MycoCosm fungal genomics portal (http://jgi.doe.gov/fungi),29 developed by the US Department of Energy Joint Genome Institute. Since proteins that share 40% or greater sequence identity are more likely to share functional similarity,30 and since proteins below this threshold can exhibit more divergent functions,31 40 hits exhibiting 40–100% sequence identity and >90% sequence coverage with FlvF were downloaded (E-value < 1e−65). To generate the phylogenetic tree, amino acid sequences of FlvF and homologues were aligned using ClustalOmega and visualized using Jalview.32–34 The resulting multiple sequence alignment was used to generate a maximum likelihood phylogenetic tree using MEGA11.35 Initial trees for the heuristic search were generated automatically using the default neighbor-joining and BioNJ algorithms, and the tree with the highest log likelihood (−8,220.42) was used for further analysis. The substitution model was the Whelan and Goldman (WAG) model36 with a gamma distribution (5 discrete gamma categories). Additional parameters were as follows: gaps/missing data treatment, use all sites; ML heuristic method, nearest-neighbor interchange (NNI); branch swap filter, none; and number of threads, 3. Phylogeny was tested by the bootstrap method with 500 replicates; nodes with bootstrap support >50% were deemed significant, and these values are shown above the branches.
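The hit-selection criteria above (40–100% sequence identity, >90% query coverage, and an E-value threshold) can be applied to BLAST output with a few lines of code. The sketch assumes results exported in BLAST+ tabular format (-outfmt 6, twelve default columns), which may differ from what the MycoCosm portal provides; it treats the E-value threshold as an upper bound (1e−65), computes coverage from a single high-scoring segment pair, and uses a hypothetical query length and hypothetical hits in the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str
    percent_identity: float
    query_start: int
    query_end: int
    evalue: float


def parse_tabular(line: str) -> Hit:
    """Parse one BLAST+ -outfmt 6 line (qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore)."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[1], float(f[2]), int(f[6]), int(f[7]), float(f[10]))


def query_coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query spanned by the alignment (single HSP assumed)."""
    return (hit.query_end - hit.query_start + 1) / query_length


def passes_filters(hit: Hit, query_length: int, min_identity: float = 40.0,
                   min_coverage: float = 0.90, max_evalue: float = 1e-65) -> bool:
    return (min_identity <= hit.percent_identity <= 100.0
            and query_coverage(hit, query_length) > min_coverage
            and hit.evalue < max_evalue)


# Hypothetical example: a 380-residue query and one tabular BLAST line.
example = parse_tabular("FlvF\thitA\t55.2\t370\t160\t5\t3\t372\t1\t368\t1e-120\t400")
print(example.subject_id, passes_filters(example, 380))
```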
For additional analysis, the dataset of 40 FlvF homologue sequences was condensed using the Cluster Database at High Identity with Tolerance (CD-HIT) webserver (http://weizhongli-lab.org/cdhit_suite/cgi-bin/index.cgi) 37 with a 90% identity cutoff. Thus, the dataset was reduced by clustering all sequences with >90% identity and selecting one sequence from each cluster. Nine unique representatives were used for further analysis along with the three terpene cyclases most closely related to FlvF: fusicoccadiene synthase (cyclase domain) from Phomopsis amygdali (PaFS), FgGS from Fusarium graminearum, and aristolochene synthase from Aspergillus terreus. A multiple sequence alignment was calculated using ClustalOmega32,33 and visualized using ESPript 3.0.38 The color-by-sequence-conservation feature in UCSF Chimera39 was used to illustrate the conservation of FlvF active site residues.
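CD-HIT reduces redundancy by greedy incremental clustering: sequences are sorted from longest to shortest, and each one either joins an existing cluster whose representative it matches at or above the identity cutoff or founds a new cluster. The sketch below illustrates only that idea. It is a simplification: CD-HIT itself uses short-word filtering and its own alignment and identity definitions, whereas this version assumes pre-aligned sequences of equal length and counts identity over ungapped columns; the function names and toy sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither aligned
    sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_cluster(seqs: dict[str, str], cutoff: float = 0.90) -> dict[str, list[str]]:
    """Simplified CD-HIT-style clustering: longest (ungapped) sequences first;
    each sequence joins the first representative matched at >= cutoff,
    otherwise it founds a new cluster as its representative."""
    order = sorted(seqs, key=lambda k: len(seqs[k].replace("-", "")), reverse=True)
    clusters: dict[str, list[str]] = {}
    for name in order:
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= cutoff:
                clusters[rep].append(name)
                break
        else:  # no representative was similar enough
            clusters[name] = [name]
    return clusters


# Hypothetical aligned toy sequences.
toy = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "MSTLWVPEQR", "d": "MKTAY-AKQR"}
print(greedy_cluster(toy))  # one representative per cluster
```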
Results and Discussion
The 2.6 Å-resolution crystal structure of FlvF consists of 8 independent molecules in the asymmetric unit, revealing the canonical α fold of a class I terpene synthase as first observed in the crystal structure of avian farnesyl diphosphate synthase.40 Based on amino acid sequence analysis, FlvF is most similar to the cyclase domain of fusicoccadiene synthase from Phomopsis amygdali (PaFS; PDB 5ERM)41 with 20%/36% sequence identity/similarity, followed by the terpene cyclase FgGS from Fusarium graminearum (PDB 6VYD)42 (17%/33%) and aristolochene synthase from Aspergillus terreus (PDB 2OA6)43 (16%/25%). In terms of structural homology as assessed with the Dali server,44 the tertiary structure of FlvF most closely resembles that of FgGS and the cyclase domain of PaFS with rms deviations of 1.7 Å for 287 Cα atoms and 1.9 Å for 259 Cα atoms, respectively; aristolochene synthase is more distantly related, with an rms deviation of 9.3 Å for 205 Cα atoms (Figure 2a).
Despite its close structural homology with class I terpene cyclases, FlvF lacks the intact metal-binding motifs characteristic of all class I terpene synthases: prenyltransferases contain two aspartate-rich DDXX(D/E) motifs, and cyclases contain one aspartate-rich DDXX(D/E) motif and one (N,D)D(L,I,V)X(S,T)XXXE motif (Figure 2b).6 The usual metal ligands are the first aspartate and the terminal D/E of the aspartate-rich motif, and the N/D, S/T, and terminal E of the second motif. In both prenyltransferases and cyclases, these residues coordinate a cluster of 3 Mg2+ ions together with the substrate diphosphate group or the co-product inorganic pyrophosphate (PPi).45 In FlvF, the aspartate-rich motif is not conserved, although E95 of FlvF aligns with D94 in the aspartate-rich motif of the terpene cyclase FgGS. Although the terminal glutamate of the second metal-binding motif is conserved in FlvF as E233, sequence alignment of FlvF and FgGS shows that Y225 and H229 of FlvF correspond to the metal ligands N209 and S213, respectively, in FgGS (Figure 2b). Notably, the side chain of Y225 occupies a cavity in the active site that would accommodate PPi and one of the Mg2+ ions (Figure 2c). In FgGS, the PPi anion is additionally stabilized by hydrogen bonds with R165, T169, N209, R216, R300, and Y301.42 With the exception of R165 (R181 in FlvF), these residues are likewise not conserved in FlvF. Because neither the metal ligands nor the residues that would activate farnesyl diphosphate for catalysis are conserved, and because FlvF exhibits no activity when incubated with farnesyl diphosphate, we conclude that FlvF is not a terpene cyclase but instead has been repurposed to support an alternative catalytic function in flavunoidine biosynthesis.
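The two metal-binding motifs discussed above can be written as regular expressions, with X matching any residue. The sketch below scans a sequence for both patterns; the test fragments are hypothetical, not the FlvF sequence. A literal pattern match is only a first pass: as FlvF illustrates, degenerate motifs are better identified by alignment with characterized terpene synthases.

```python
import re

# Literal translations of DDXX(D/E) and (N,D)D(L,I,V)X(S,T)XXXE, X = any residue.
# Lookaheads let overlapping matches be reported.
ASP_RICH = re.compile(r"(?=(DD..[DE]))")
NSE_DTE = re.compile(r"(?=([ND]D[LIV].[ST]...E))")


def find_motifs(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Return 1-based start positions and matched text for each motif."""
    seq = sequence.upper()
    return {
        "DDXX(D/E)": [(m.start() + 1, m.group(1)) for m in ASP_RICH.finditer(seq)],
        "(N,D)D(L,I,V)X(S,T)XXXE": [(m.start() + 1, m.group(1))
                                    for m in NSE_DTE.finditer(seq)],
    }


# Hypothetical fragment containing one copy of each motif.
print(find_motifs("GADDLLDAGNDLASVKAEG"))
```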
The putative active site of FlvF consists of a large cavity in the center of the α-helical bundle, and the binding of two Bis-Tris molecules in this cavity clearly demonstrates its capacity to bind small molecules. The active site cavity is divided into two regions, designated the upper cavity and the lower cavity. One molecule of Bis-Tris binds in the lower cavity of chains A, B, and D, where it associates with the base of the active site, which is largely defined by F68, W187, F313, and W316, with Y225 and H322 lining one side (Figure 3). The orientation of Bis-Tris varies among chains. Electron density for Bis-Tris is most prominent in chain B; here, the hydroxyl groups of Bis-Tris form hydrogen bonds with the side chains of Y225 and H322 and with the backbone carbonyl of N88. The lower active site cavities of chains C, E, F, G, and H are either empty or contain density that could not be reliably interpreted as Bis-Tris and was therefore left unmodeled.
In the upper active site cavities of chains A and B, electron density is observed for a second molecule of Bis-Tris, which is positioned between Y184 and Y94 (Figure 3) and forms hydrogen bonds with the phenolic hydroxyl group of Y94 and with the backbone carbonyl groups of Y184 and N183. Notably, in protein chains that do not bind a second molecule of Bis-Tris (chains C, D, E, F, G, and H), electron density for the K98–K118 loop is poorly resolved, and this loop is presumed to be disordered (the residue range varies slightly among chains). In chain B, the hydroxyethyl moieties of the two Bis-Tris molecules are hydrogen bonded (O•••O separation = 3.1 Å), whereas this separation is much longer in chain A (7.0 Å). This result demonstrates that disordered loops flanking the active site in the unliganded enzyme have the capacity to become ordered to stabilize bound ligands.
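Separations such as the 3.1 Å and 7.0 Å O•••O distances above are Euclidean distances between atomic coordinates. The sketch below lists atom pairs within a distance cutoff as a first screen for possible hydrogen bonds. The 3.5 Å default is a common rule of thumb rather than a criterion from this work, the atom names and coordinates in the example are hypothetical, and a full assignment would also consider donor-acceptor geometry; in practice the coordinates would be read from the deposited model with a structure-parsing library.

```python
import math

Coord = tuple[float, float, float]


def close_contacts(atoms_a: dict[str, Coord], atoms_b: dict[str, Coord],
                   cutoff: float = 3.5) -> list[tuple[str, str, float]]:
    """Atom pairs (one from each set) within `cutoff` Angstroms, closest first.

    Distance-only screen; the default cutoff is an assumed rule of thumb.
    """
    contacts = [
        (name_a, name_b, math.dist(xyz_a, xyz_b))
        for name_a, xyz_a in atoms_a.items()
        for name_b, xyz_b in atoms_b.items()
        if math.dist(xyz_a, xyz_b) <= cutoff
    ]
    return sorted(contacts, key=lambda contact: contact[2])


# Hypothetical hydroxyl oxygens of two bound ligands.
ligand_1 = {"O1": (0.0, 0.0, 0.0)}
ligand_2 = {"O2": (3.1, 0.0, 0.0), "O3": (7.0, 0.0, 0.0)}
print(close_contacts(ligand_1, ligand_2))
```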
With regard to the quaternary structure of FlvF, native-PAGE analysis indicates that FlvF is predominantly a dimer (Figure 4a). A minor tetrameric species is also observed at high protein concentrations, although this oligomeric state may not be physiologically relevant. Surprisingly, analysis of macromolecular assemblies with PISA46 predicts FlvF to be a monomer, despite the burial of 1304 Å² of surface area at the dimer interface, which is stabilized by several hydrogen bonds and salt bridges (Figure 4b,c). We cannot explain this PISA result, especially since the mode of dimer assembly is similar to that observed previously in other terpene cyclases (Figure S2).47–49
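Buried surface area at an interface is commonly computed as SASA(A) + SASA(B) − SASA(AB), where SASA is the solvent-accessible surface area. The NumPy sketch below implements this with the Shrake–Rupley test-point method and a 1.4 Å probe. It is illustrative only: atomic radii must be supplied by the user, the number of test points controls accuracy, and reporting conventions differ (some programs, PISA among them, report half of this total as the interface area), so its output should not be compared directly with the values in the text without checking the convention.

```python
import numpy as np

PROBE = 1.4  # water probe radius in Angstroms (conventional value)


def sphere_points(n: int = 400) -> np.ndarray:
    """Roughly uniform unit-sphere points from a golden-section spiral."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1 - 2 * i / n)
    theta = np.pi * (1 + 5 ** 0.5) * i
    return np.column_stack((np.cos(theta) * np.sin(phi),
                            np.sin(theta) * np.sin(phi),
                            np.cos(phi)))


def sasa(coords: np.ndarray, radii: np.ndarray, n_points: int = 400) -> float:
    """Shrake-Rupley solvent-accessible surface area (A^2) of a set of atoms."""
    unit = sphere_points(n_points)
    expanded = radii + PROBE
    total = 0.0
    for k in range(len(coords)):
        pts = coords[k] + expanded[k] * unit  # test points on atom k's expanded sphere
        d2 = ((pts[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
        buried = d2 < expanded[None, :] ** 2  # inside another expanded sphere?
        buried[:, k] = False                  # ignore the atom's own sphere
        accessible = ~buried.any(axis=1)
        total += 4 * np.pi * expanded[k] ** 2 * accessible.mean()
    return float(total)


def buried_area(coords_a, radii_a, coords_b, radii_b) -> float:
    """Total buried area: SASA(A) + SASA(B) - SASA(A and B together)."""
    both_coords = np.vstack((coords_a, coords_b))
    both_radii = np.concatenate((radii_a, radii_b))
    return sasa(coords_a, radii_a) + sasa(coords_b, radii_b) - sasa(both_coords, both_radii)
```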
In the crystal, two dimers assemble with an orientation of approximately 90° relative to one another to form a tetramer (Figure S3). Tetramer assembly buries 811 Å² of surface area at the dimer-dimer interface. This relatively small interface could explain why tetramers form to a much smaller extent than dimers in native-PAGE analysis (Figure 4a). Tetramer assembly has been observed previously in A. terreus aristolochene synthase.43 Aristolochene synthase forms an antiparallel dimer similar to that of FlvF, but its tetramer has 222 point group symmetry and a larger average buried surface area of 1634 Å². Therefore, we conclude that FlvF likely functions as a dimer in solution, and that tetramer assembly is mainly a consequence of crystal packing interactions.
Based on the structure of the FlvF–(Bis-Tris)2 complex, the positions of the Bis-Tris molecules guided the manual docking of pre-flavunoidine in the active site to generate a plausible model of the enzyme-product complex (Figure 5). In this model, the tetracyclic sesquiterpene core of pre-flavunoidine is surrounded by aromatic residues, and the pendant dimethylcadaverine moiety is positioned between Y184 and Y94. This model provides a framework for speculation regarding FlvF function, since several residues in the vicinity of the C–N bond, such as E95, H229, and H322, could serve as a catalytic general acid or base. At present, however, we cannot propose a detailed chemical mechanism for C–N bond formation in the absence of additional structural information regarding substrate binding in the active site.
To further probe the puzzle of FlvF structure-function relationships, we performed a protein-protein BLAST search of fungal genomes using FlvF as the query. We identified 40 hits with 42.8–100% amino acid sequence identity, two of which were identical to FlvF from Aspergillus flavus (Table S1). This range was selected because proteins sharing ≥40% sequence identity are more likely to share functional similarity,30 and proteins below this threshold often exhibit more divergent functions.31 To reduce redundancy in further analyses, the 40 BLAST hits were clustered using a 90% sequence identity cutoff. Nine uncharacterized fungal sequences with 43–89% sequence identity to FlvF were selected for the generation of a multiple sequence alignment (Figure S4). Based on this alignment, amino acid sequence conservation was mapped onto the FlvF structure (Figure 6a). Overall, the structural core containing the active site is generally conserved, whereas residues on peripheral helices are not. The aromatic base of the active site, i.e., the lower cavity, is highly conserved; in contrast, the upper cavity exhibits greater variation, particularly at residues N183, Y184, H229, and H322 (Figure 6b). This might suggest that FlvF homologues bind a similar tetracyclic sesquiterpene core while accommodating different pendant amino moieties for C–N bond formation. For example, the phospholipase C inhibitor hispidospermidin contains a tetracyclic sesquiterpene core identical to that of flavunoidine, but instead of a pendant dimethylcadaverine moiety, hispidospermidin contains a pendant trimethylspermidine moiety.50 Although the hispidospermidin biosynthetic pathway has not yet been elucidated, we speculate that this pathway contains an analogue of FlvF that similarly governs the C–N bond-forming reaction.
Other amino acid-derived amino moieties might be similarly accommodated in the active sites of other FlvF homologues for the generation of diverse aminoterpenes.
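Per-position conservation of the kind mapped onto the structure can be quantified in several ways; one common choice is the Shannon entropy of each alignment column, which is zero for an invariant position and increases with variability. The sketch below computes it for a set of aligned sequences. The toy alignment is hypothetical (its last column mimics a histidine position replaced by threonine, arginine, or lysine), and Chimera's conservation coloring may use a different measure.

```python
import math
from collections import Counter


def column_entropy(column: str, ignore_gaps: bool = True) -> float:
    """Shannon entropy (bits) of one alignment column; 0 means fully conserved."""
    residues = [r for r in column.upper() if not (ignore_gaps and r == "-")]
    if not residues:
        return float("nan")
    counts = Counter(residues)
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return [column_entropy("".join(col)) for col in zip(*alignment)]


# Hypothetical three-column alignment of four homologues.
print(conservation_profile(["WYH", "WYT", "WFR", "WFK"]))
```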
The fungal sequences identified by protein-protein BLAST were used to generate a multiple sequence alignment, from which a maximum likelihood phylogenetic tree was calculated using MEGA11 (Figure 6c).35 The tree branches at several points; FlvF forms a clade with homologues from other Aspergillus species, such as sequences from Aspergillus parvisclerotigenus CBS 117616, Aspergillus oryzae RIB40, Aspergillus caelatus CBS 763.97, and Aspergillus pseudocaelatus CBS 121.62.
Interestingly, these fungal homologues do not contain the intact metal-binding motifs DDXX(D/E) or (N,D)D(L,I,V)X(S,T)XXXE characteristic of class I terpene cyclases (Figure 6c, Figure 2b), with the exception of the enzyme from A. caespitosus CBS 103.45, in which the first two aspartates, but not the third, of the aspartate-rich motif are conserved (Figure S4). The motif sequences separate distinctly between the Aspergillus species on one hand and the Fusarium species, Tolypocladium ophioglossoides CBS100239, and Neocosmospora boninensis NRRL 22470 on the other, mirroring the division of clades in the phylogenetic tree. While E95, E98, D226, and E233 are highly conserved (or conservatively replaced by aspartate), the primary differences are the substitution of C96 by alanine and of H229 by threonine, arginine, or lysine. The most divergent motif sequences are observed in Xylariaceae sp. FL0662B, for which the DDXX(D/E) motif does not fully align with those of the remaining FlvF homologues. Because FlvF belongs to a clade separate from those of the Fusarium homologues and the remaining Aspergillus homologues, it is reasonable to expect that FlvF exhibits a different molecular function and/or substrate preference. Mutagenesis studies will be employed to probe the functional importance of these residues once the precise substrate of FlvF is identified.
In closing, we highlight that the FlvF structure was solved by molecular replacement using a model generated by AlphaFold2. No previously determined terpene synthase structure yielded a successful molecular replacement solution, so the advent of AlphaFold2 was critical for structure determination. This work thus demonstrates the powerful complementarity of computational and experimental approaches to protein crystal structure determination; together, these approaches are more powerful than either one alone. With regard to FlvF structure-function relationships, the proposed mechanism17 by which a secondary carbocation could act as a substrate in a reaction catalyzed by FlvF remains elusive in view of the crystal structure. While FlvF contains numerous aromatic residues in the lower active site that could stabilize a carbocation through cation-π interactions, it is unknown how the carbocation is generated in, or arrives at, the FlvF active site. It has been speculated that FlvF might form a complex with the P450 enzyme FlvD to enable transit of the carbocation from one active site to the next.17 However, given the typical reactivity of secondary carbocations, with lifetimes calculated to be on the femtosecond timescale,51 it is difficult to envision how such a transfer could occur. Possibly, a more stable derivative is transferred to FlvF and then activated for catalysis. We conclude that the crystal structure of FlvF represents a critical first step toward resolving these ambiguities as we study structure-function relationships in a repurposed terpene cyclase fold.
ACKNOWLEDGEMENTS
We are grateful to Professor Karen Allen for helpful and insightful discussions. This work is based upon research conducted at the Northeastern Collaborative Access Team beamlines, which are funded by the National Institute of General Medical Sciences of the NIH (P30 GM124165). The EIGER 16M detector on the 24-ID-E beamline is funded by an NIH-ORIP HEI grant (S10OD021527). This research used resources of the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357.
Funding
This research was supported by NIH grants R01GM056838 (D.W.C.) and R35GM118056 (Y.T.).
Footnotes
Conflict of Interest Statement
The authors declare no competing financial interests.
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI:
Accession Code
Atomic coordinates and structure factor amplitudes of FlvF have been deposited in the Protein Data Bank (www.rcsb.org) with accession code 8D7F.
References
- (1).Newman DJ, and Cragg GM (2020) Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. J. Nat. Prod. 83, 770–803. [DOI] [PubMed] [Google Scholar]
- (2).Beller HR, Lee TS, and Katz L (2015) Natural products as biofuels and bio-based chemicals: fatty acids and isoprenoids. Nat. Prod. Rep. 32, 1508–1526. [DOI] [PubMed] [Google Scholar]
- (3).Sharmeen JB, Mahomoodally FM, Zengin G, and Maggi F (2021) Essential oils as natural sources of fragrance compounds for cosmetics and cosmeceuticals. Molecules 26, 666. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (4).Dutta S, Whicher JR, Hansen DA, Hale WA, Chemler JA, Congdon GR, Narayan ARH, Håkansson K, Sherman DH, Smith JL, and Skiniotis G (2014) Structure of a modular polyketide synthase. Nature 510, 512–517. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (5).Martínez-Núñez MA, and López V. E. L. y. (2016) Nonribosomal peptides synthetases and their applications in industry. Sustain. Chem. Process. 4, 1–8. [Google Scholar]
- (6).Christianson DW (2017) Structural and chemical biology of terpenoid cyclases. Chem. Rev. 117, 11570–11648. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (7).Chen M, Harris GG, Pemberton TA, Christianson DW (2016) Multi-domain terpenoid cyclase architecture and prospects for proximity in bifunctional catalysis. Curr. Op. Struct. Biol. 41, 27–37. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (8).Faylo JL, Ronnebaum TA, Christianson DW (2021) Assembly-line catalysis in bifunctional terpene synthases. Acc. Chem. Res. 54, 3780–3791. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (9). Wei X, Wang WG, and Matsuda Y (2022) Branching and converging pathways in fungal natural product biosynthesis. Fungal Biol. Biotechnol. 9, 1–24.
- (10). Mitsuhashi T, Barra L, Powers Z, Kojasoy V, Cheng A, Yang F, Taniguchi Y, Kikuchi T, Fujita M, Tantillo DJ, Porco JA, and Abe I (2020) Exploiting the potential of meroterpenoid cyclases to expand the chemical space of fungal meroterpenoids. Angew. Chem. Int. Ed. 59, 23772–23781.
- (11). Barra L, and Abe I (2021) Chemistry of fungal meroterpenoid cyclases. Nat. Prod. Rep. 38, 566–585.
- (12). Awakawa T, and Abe I (2021) Reconstitution of polyketide-derived meroterpenoid biosynthetic pathway in Aspergillus oryzae. J. Fungi (Basel) 7, 486.
- (13). Itoh T, Tokunaga K, Matsuda Y, Fujii I, Abe I, Ebizuka Y, and Kushiro T (2010) Reconstitution of a fungal meroterpenoid biosynthesis reveals the involvement of a novel family of terpene cyclases. Nat. Chem. 2, 858–864.
- (14). Zhang W, Du L, Qu Z, Zhang X, Li F, Li Z, Qi F, Wang X, Jiang Y, Men P, Sun J, Cao S, Geng C, Qi F, Wan X, Liu C, and Li S (2019) Compartmentalized biosynthesis of mycophenolic acid. Proc. Natl. Acad. Sci. U.S.A. 116, 13305–13310.
- (15). Araki Y, Awakawa T, Matsuzaki M, Cho R, Matsuda Y, Hoshino S, Shinohara Y, Yamamoto M, Kido Y, Inaoka DK, Nagamune K, Ito K, Abe I, and Kita K (2019) Complete biosynthetic pathways of ascofuranone and ascochlorin in Acremonium egyptiacum. Proc. Natl. Acad. Sci. U.S.A. 116, 8269–8274.
- (16). Tsukada K, Shinki S, Kaneko A, Murakami K, Irie K, Murai M, Miyoshi H, Dan S, Kawaji K, Hayashi H, Kodama EN, Hori A, Salim E, Kuraishi T, Hirata N, Kanda Y, and Asai T (2020) Synthetic biology based construction of biological activity-related library of fungal decalin-containing diterpenoid pyrones. Nat. Commun. 11, 1830.
- (17). Yee DA, Kakule TB, Cheng W, Chen M, Chong CTY, Hai Y, Hang LF, Hung YS, Liu N, Ohashi M, Okorafor IC, Song Y, Tang M, Zhang Z, and Tang Y (2020) Genome mining of alkaloidal terpenoids from a hybrid terpene and nonribosomal peptide biosynthetic pathway. J. Am. Chem. Soc. 142, 710–714.
- (18). Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, and Hassabis D (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589.
- (19). Battye TGG, Kontogiannis L, Johnson O, Powell HR, and Leslie AGW (2011) iMOSFLM: a new graphical interface for diffraction-image processing with MOSFLM. Acta Crystallogr., Sect. D: Biol. Crystallogr. 67, 271–281.
- (20). Evans PR, and Murshudov GN (2013) How good are my data and what is the resolution? Acta Crystallogr., Sect. D: Biol. Crystallogr. 69, 1204–1214.
- (21). Winn MD, Ballard CC, Cowtan KD, Dodson EJ, Emsley P, Evans PR, Keegan RM, Krissinel EB, Leslie AGW, McCoy A, McNicholas SJ, Murshudov GN, Pannu NS, Potterton EA, Powell HR, Read RJ, Vagin A, and Wilson KS (2011) Overview of the CCP4 suite and current developments. Acta Crystallogr., Sect. D: Biol. Crystallogr. 67, 235–242.
- (22). McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, and Read RJ (2007) Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674.
- (23). Mirdita M, Ovchinnikov S, and Steinegger M (2021) ColabFold - Making protein folding accessible to all. bioRxiv. doi: 10.1101/2021.08.15.456425
- (24). Afonine PV, Grosse-Kunstleve RW, Echols N, Headd JJ, Moriarty NW, Mustyakimov M, Terwilliger TC, Urzhumtsev A, Zwart PH, and Adams PD (2012) Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr., Sect. D: Biol. Crystallogr. 68, 352–367.
- (25). Emsley P, and Cowtan K (2004) Coot: model-building tools for molecular graphics. Acta Crystallogr., Sect. D: Biol. Crystallogr. 60, 2126–2132.
- (26). Emsley P, Lohkamp B, Scott WG, and Cowtan K (2010) Features and development of Coot. Acta Crystallogr., Sect. D: Biol. Crystallogr. 66, 486–501.
- (27). Liebschner D, Afonine PV, Moriarty NW, Poon BK, Sobolev OV, Terwilliger TC, and Adams PD (2017) Polder maps: improving OMIT maps by excluding bulk solvent. Acta Crystallogr., Sect. D: Biol. Crystallogr. 73, 148–157.
- (28). Schrödinger LLC (2015) The PyMOL Molecular Graphics System, Version 2.0.
- (29). Grigoriev IV, Nikitin R, Haridas S, Kuo A, Ohm R, Otillar R, Riley R, Salamov A, Zhao X, Korzeniewski F, Smirnova T, Nordberg H, Dubchak I, and Shabalov I (2014) MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res. 42, D699–D704.
- (30). Pearson WR (2013) An introduction to sequence similarity (“homology”) searching. Curr. Protoc. Bioinformatics, Chapter 3, Unit 3.1.
- (31). Sangar V, Blankenberg DJ, Altman N, and Lesk AM (2007) Quantitative sequence-function relationships in proteins based on gene ontology. BMC Bioinformatics 8, 294.
- (32). Sievers F, Wilm A, Dineen DG, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, and Higgins DG (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539.
- (33). Goujon M, McWilliam H, Li W, Valentin F, Squizzato S, Paern J, and Lopez R (2010) A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 38, W695–W699.
- (34). Waterhouse AM, Proctor JB, Martin DM, Clamp M, and Barton GJ (2009) Jalview version 2 – a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191.
- (35). Tamura K, Stecher G, and Kumar S (2021) MEGA11: molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 38, 3022–3027.
- (36). Whelan S, and Goldman N (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699.
- (37). Huang Y, Niu B, Gao Y, Fu L, and Li W (2010) CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26, 680–682.
- (38). Robert X, and Gouet P (2014) Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res. 42, W320–W324.
- (39). Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, and Ferrin TE (2004) UCSF Chimera – a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612.
- (40). Tarshis LC, Sacchettini JC, Yan M, and Poulter CD (1994) Crystal structure of recombinant farnesyl diphosphate synthase at 2.6-Å resolution. Biochemistry 33, 10871–10877.
- (41). Chen M, Chou WKW, Toyomasu T, Cane DE, and Christianson DW (2016) Structure and function of fusicoccadiene synthase, a hexameric bifunctional diterpene synthase. ACS Chem. Biol. 11, 889–899.
- (42). He H, Bian G, Herbst-Gervasoni CJ, Mori T, Shinsky SA, Hou A, Mu X, Huang M, Cheng S, Deng Z, Christianson DW, Abe I, and Liu T (2020) Discovery of the cryptic function of terpene cyclases as aromatic prenyltransferases. Nat. Commun. 11, 3958.
- (43). Shishova EY, Di Costanzo L, Cane DE, and Christianson DW (2007) X-ray crystal structure of aristolochene synthase from Aspergillus terreus and evolution of templates for the cyclization of farnesyl diphosphate. Biochemistry 46, 1941–1951.
- (44). Holm L, and Rosenström P (2010) Dali server: conservation mapping in 3D. Nucleic Acids Res. 38, W545–W549.
- (45). Aaron JA, and Christianson DW (2010) Trinuclear metal clusters in catalysis by terpenoid synthases. Pure Appl. Chem. 82, 1585–1597.
- (46). Krissinel E, and Henrick K (2007) Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372, 774–797.
- (47). Lesburg CA, Zhai G, Cane DE, and Christianson DW (1997) Crystal structure of pentalenene synthase: mechanistic insights on terpenoid cyclization reactions in biology. Science 277, 1820–1824.
- (48). Rynkiewicz MJ, Cane DE, and Christianson DW (2001) Structure of trichodiene synthase from Fusarium sporotrichioides provides mechanistic inferences on the terpene cyclization cascade. Proc. Natl. Acad. Sci. U.S.A. 98, 13543–13548.
- (49). Harris GG, Lombardi PM, Pemberton TA, Matsui T, Weiss TM, Cole KE, Köksal M, Murphy FV, Vedula LS, Chou WKW, Cane DE, and Christianson DW (2015) Structural studies of geosmin synthase, a bifunctional sesquiterpene synthase with αα domain architecture that catalyzes a unique cyclization-fragmentation reaction sequence. Biochemistry 54, 7142–7155.
- (50). Yanagisawa M, Sakai A, Adachi K, Sano T, Watanabe K, Tanaka Y, and Okuda T (1994) Hispidospermidin, a novel phospholipase C inhibitor produced by Chaetosphaeronema hispidulum (Cda) Moesz NR 7127. I. Screening, taxonomy, and fermentation. J. Antibiot. 47, 1–5.
- (51). Pemberton RP, and Tantillo DJ (2014) Lifetimes of carbocations encountered along reaction coordinates for terpene formation. Chem. Sci. 5, 3301–3308.