Author manuscript; available in PMC: 2025 Jul 6.
Published in final edited form as: J Am Chem Soc. 2024 Oct 23;146(44):30194–30203. doi: 10.1021/jacs.4c08761

Ancestral sequence reconstruction to enable biocatalytic synthesis of azaphilones

Chang-Hwa Chiang 1,2, Ye Wang 2, Azam Hussain 3, Charles L Brooks III 1,4,5, Alison R H Narayan 1,2,4
PMCID: PMC11923553  NIHMSID: NIHMS2064985  PMID: 39441831

Abstract

Biocatalysis can be powerful in organic synthesis but is often limited by enzymes’ substrate scope and selectivity. Developing a biocatalytic step involves identifying an initial enzyme for the target reaction, followed by optimization through rational design, directed evolution, or both. These steps are time-consuming, resource-intensive, and require expertise beyond typical organic chemistry. Thus, an effective strategy streamlining the process from enzyme identification to implementation is essential to expanding biocatalysis. Here, we present a strategy combining bioinformatics-guided enzyme mining and ancestral sequence reconstruction (ASR) to resurrect enzymes for biocatalytic synthesis. Specifically, we achieve an enantioselective synthesis of azaphilone natural products using two ancestral enzymes: a flavin-dependent monooxygenase (FDMO) for stereodivergent oxidative dearomatization and a substrate-selective acyltransferase (AT) for acylation of the enzymatically installed hydroxyl group. This cascade, stereocomplementary to established chemoenzymatic routes, expands access to enantiomeric linear tricyclic azaphilones. By leveraging the co-occurrence and co-evolution of FDMO and AT in azaphilone biosynthetic pathways, we identified an AT candidate, CazE, and addressed its low solubility and stability through ASR, obtaining a more soluble, stable, promiscuous, and reactive ancestral AT (AncAT). Sequence analysis revealed AncAT as a chimeric composition of its descendants, with enhanced reactivity likely due to ancestral promiscuity. Flexible receptor docking and molecular dynamics simulations showed that the most reactive AncAT best promotes a reactive geometry between substrates. We anticipate that our bioinformatics-guided, ASR-based approach can be broadly applied in target-oriented synthesis, reducing the time required to develop biocatalytic steps and providing efficient access to superior biocatalysts.


INTRODUCTION

Biocatalysis is a powerful tool, as enzymes can provide catalyst-controlled selectivity in the synthesis of complex molecules such as the HIV drug islatravir and insulin derivatives.1,2 However, the implementation of biocatalysis is often limited by the native substrate scope or selectivity of enzymes, necessitating additional effort in protein engineering to achieve different substrate scope, selectivity, or reactivity.3 Directed evolution has been a cornerstone in optimizing enzymes for various purposes.4 However, directed evolution is generally time- and resource-intensive, which increases the barrier to its use. Additionally, this approach can face many practical challenges at different stages, including identifying a soluble and stable starting enzyme with the desired activity which can be optimized and navigating decreasing protein stability over the course of an evolution campaign.5–8 Finding a suitable starting point is essential to a protein engineering approach, and the properties of the initial protein directly affect the evolvability of an enzyme and its compatibility with a high-throughput screening workflow, which eventually determines the success of an evolution campaign.6 As an alternative to directed evolution, de novo enzyme design has shown some success in modifying or extending enzymatic activities; however, it commonly requires further optimization after the original design.3 Other approaches, such as consensus sequence design9 and ProteinMPNN10, have also been applied to improve enzyme expression, stability, and function. Although these two approaches can achieve excellent outcomes, they are currently limited to optimizing inherent enzymatic activities and have not been used for generating enzymes with non-native substrate scope or selectivity, which is often required for target-oriented applications. Therefore, more efficient and effective biocatalyst development workflows are necessary to find or create enzymes for target-oriented synthesis (Figure 1A).

Figure 1.

Common strategies for biocatalyst development and the targeted biocatalytic synthesis in the study expected to diversify the azaphilone chemical space. (A) Strategies include screening wild-type enzymes, directed evolution, and using computational tools such as consensus sequence, ProteinMPNN, enzyme design, and bioinformatics-guided ancestral sequence reconstruction. (B) Stereocomplementary chemoenzymatic synthesis of linear tricyclic azaphilones enabled by the identification of a suitable flavin-dependent monooxygenase and acyltransferase. (C) Based on the versatility of groups R1, R2, and R3 on the azaphilones, stereocomplementarity doubles the number of combinations and thus chemical diversity.

With experience in the challenges of applying biocatalysis in target-oriented synthesis, we envisioned using ancestral sequence reconstruction (ASR) and resurrection in combination with bioinformatics to identify and optimize enzymes for synthesis. ASR is commonly employed in studying protein sequence-function relationships.11–17 As ancestral enzymes often possess attractive features over their extant relatives, such as enhanced solubility, thermostability, and promiscuity,18–21 ASR shows great potential for the application of biocatalysis. For example, researchers have used ASR to engineer P450 enzymes and ene-reductases with improved stability and solubility,22–24 enhance the activity of ketol-acid reductoisomerases,22 and obtain enzymes with greater promiscuity.15,18,23 As ASR is based on phylogeny, it can be seamlessly combined with other bioinformatic tools such as homology searches,25 sequence similarity networks,26,27 and BGC analyses26,27 to guide the identification of an initial enzyme candidate. We anticipated that a bioinformatics-guided ASR approach could provide a retrospective path to find enzymes with desired traits and substrate selectivity if an appropriate sequence space was targeted and sampled.

To test our bioinformatics-guided ASR strategy, we chose azaphilones, a class of fungal polyketide natural products, as our synthetic target.28,29 Azaphilones, which feature a conserved oxygenated pyranoquinone bicyclic core and a tetrasubstituted stereocenter, are particularly valuable due to their wide range of biological activities.28,29 In nature, the C7-stereocenter of azaphilones is installed through a flavin-dependent monooxygenase (FDMO) catalyzed oxidative dearomatization.30 The azaphilone core can be further elaborated by introducing the third ring through acylation by an acyltransferase (AT) and subsequent spontaneous Knoevenagel condensation.31,32 Inspired by nature, we recently reported chemoenzymatic syntheses of linear tricyclic azaphilones,32 including rubropunctatin (1) and monascorubrin (2),33 using the enzymatic sequence involving AzaH34 and MrPigD31 (Figure 1B). Specifically, the chemoenzymatic synthesis requires FDMOs which can catalyze stereodivergent oxidative dearomatization reactions to transform 3 into (S)-4 or (R)-4 selectively. This stereodivergent step can be followed by an acylation event with 5, which can be mediated by an AT to form (S)-6 or (R)-6, respectively (Figure 1B). By combining modifications at R1 and R2, and also diversification through amination, a number of linear tricyclic azaphilone derivatives can be accessed (Figure 1C). However, realization of this stereodivergent strategy was stymied by the lack of an AT capable of acylating (S)-4 to form (S)-6, as no biosynthetic gene clusters (BGCs) responsible for the synthesis of linear tricyclic azaphilones bearing the (S)-configuration at C7 have been reported. In addition, AfoD35 is the only extant FDMO reported to act on 3, and it does so with low conversion.36 Therefore, synthetic access to (S)-tricyclic linear azaphilone structures was not possible through this chemoenzymatic approach.

To solve this problem and achieve a stereocomplementary biocatalytic synthesis, we used an ancestral flavin-dependent monooxygenase (AncFDMO) reconstructed and engineered for our previous enzyme evolutionary study37 with a newly reconstructed ancestral acyltransferase (AncAT). The reconstruction and sampling of AncATs was guided by knowledge of FDMO stereoselectivity16 and co-localization of FDMOs and ATs in biosynthetic pathways.38 This strategy allowed us to focus on a smaller sequence space most likely to have the substrate selectivity of interest. By bringing together bioinformatic tools and computational molecular evolution, ASR is positioned to guide the generation of biocatalysts tailored for specific synthetic challenges. We anticipated that this type of ASR-based strategy could be more broadly applied in the biocatalytic synthesis of natural products and derivatives.

RESULTS AND DISCUSSION

Guiding acyltransferase discovery through bioinformatics

To identify a candidate AT to achieve a stereocomplementary synthetic strategy, we leveraged knowledge of FDMO enantioselectivity to increase the likelihood of discovering a candidate AT. Since FDMO and AT genes are generally co-localized in the same BGC,30 our strategy was to find candidate ATs that are part of the same gene cluster as FDMOs that stereoselectively afford (S)-4 (Figure 1B). This assumes that co-localization leads to co-evolution on the related substrate scaffolds, increasing the likelihood of identifying a candidate gene to further explore. We began our AT mining by testing five FDMO co-localizing ATs in our library but did not find any hits (Table S1). Therefore, we redirected our attention to reported azaphilone-related BGCs that possess a FDMO-AT pair. The search returned three FDMO-AT pairs for rubropunctatin/monascorubrin33 (1/2), azanigerone A34 (7), and chaetoviridin A39 (8, Figure 2A). The AT in the BGC of azanigerone A, AzaD34, was excluded as the pyranoquinone core of the product 7 has the opposite stereochemical configuration to our targeted substrate (S)-4 (R1 = n-Pr). Additionally, the acyl group incorporated by AzaD does not possess the 3-ketoacyl motif required to provide access to the butanolide ring. MrPigD31, the AT in the BGCs of rubropunctatin and monascorubrin was tested for activity with the substrate 4 (R1 = n-Pr). As expected, MrPigD only reacted with (R)-4 (R1 = n-Pr) and did not show any activity toward (S)-4 (R1 = n-Pr), implying that absolute configuration is important for the substrate binding in the active site of these ATs. Chaetoviridin A has the same absolute configuration at C7 as rubropunctatin and monascorubrin,39 so we omitted its corresponding AT, CazE, from the initial testing.

Figure 2.

Acyltransferase mining based on the co-localization and the upstream-downstream relationship of flavin-dependent monooxygenase (FDMO) and acyltransferase (AT) in azaphilone biosynthetic gene clusters (BGCs). (A) Well-studied azaphilone-related BGCs. (B) Enzymatic cascade featuring a sequential relationship between FDMO and AT. (C) Agreement between the FDMO and AT clusterings exemplified by three FDMO-AT clustering pairs and quantitated by the adjusted mutual information (AMI) score. (D) Workflow for AT mining starting from the BLAST search of AfoD and subsequent identification of ATs co-occurring with an adjacent homolog of AfoD. The following mining workflow diverges into an ancestral sequence reconstruction (ASR) based approach and a protein engineering-based approach using a wild-type enzyme as a template.

After not finding any active AT candidates from previously reported BGCs, we turned our focus to the BGC of asperfuranone (9, Figure 2A). Although this BGC does not contain an AT-encoding gene, it possesses a FDMO, AfoD35, that can afford the (S)-dearomatized product akin to (S)-4 (Figure 2A).36 Considering the frequent co-localization and sequential relationship of FDMO and AT in the azaphilone biosynthetic pathway (Figure 2B), we reasoned that identifying the ATs paired with FDMOs which are homologous to AfoD could offer an additional avenue of exploration. The FDMO homologs in PF01494 were searched by performing BLAST using AfoD as the query on EFI-EST26,27 (Figure S3), and these genes were then filtered to retain those having a co-occurring AT located within a range of 10 genes either upstream or downstream by EFI-GNT.26,27 Next, we generated separate sequence similarity networks (SSNs) for those FDMOs in PF01494 and ATs in PF02458. Clustering by SSN is commonly used to search for homologous sequences based on the assumption that the sequences in the same cluster are likely to be “isofunctional.” Built upon clustering data and genome neighborhood analysis of FDMO and AT genes, the agreement between the two clusterings was assessed using the adjusted mutual information (AMI) score,40 with scores near 1 indicating a high degree of agreement (shared information). We observed an AMI of 0.85, supporting our co-localization and co-evolution hypothesis. The high degree of mutual information shared can be visualized in Figure 2C by mapping the SSN clusters between the co-localized FDMOs and ATs, which shows that most FDMOs from the same cluster in the SSN of PF01494 have their adjacent AT partners in the same cluster in the SSN of PF02458. In brief, this proof-of-concept lays the foundation for the subsequent AT mining workflow shown in Figure 2D that leverages knowledge about the enantioselectivity of FDMOs,16 given the co-localization of FDMO and AT genes.
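To make the clustering-agreement measure concrete, the following is a minimal pure-Python sketch of an AMI calculation between two cluster assignments of the same FDMO-AT pairs. It is an illustration only and not the authors' script: the function name `adjusted_mutual_info` is hypothetical, and the arithmetic-mean normalization of the two entropies is an assumption, since the text does not state which normalization was used.

```python
from collections import Counter
from math import exp, lgamma, log


def _log_factorial(k):
    """log(k!) via the log-gamma function, stable for large k."""
    return lgamma(k + 1)


def _entropy(counts, n):
    """Shannon entropy (natural log) of a clustering given its cluster sizes."""
    return -sum(c / n * log(c / n) for c in counts)


def adjusted_mutual_info(labels_u, labels_v):
    """AMI between two clusterings of the same items.

    Assumption: normalization by the arithmetic mean of the two entropies.
    labels_u[i] and labels_v[i] are the cluster labels of item i (for
    example, the FDMO cluster and the AT cluster of one co-localized pair).
    """
    n = len(labels_u)
    if n == 0 or n != len(labels_v):
        raise ValueError("label lists must be non-empty and of equal length")
    a = Counter(labels_u)                     # cluster sizes in clustering U
    b = Counter(labels_v)                     # cluster sizes in clustering V
    joint = Counter(zip(labels_u, labels_v))  # contingency table

    # Observed mutual information
    mi = sum(nij / n * log(n * nij / (a[u] * b[v]))
             for (u, v), nij in joint.items())

    # Expected mutual information under random (hypergeometric) assignment
    emi = 0.0
    for ai in a.values():
        for bj in b.values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (_log_factorial(ai) + _log_factorial(bj)
                         + _log_factorial(n - ai) + _log_factorial(n - bj)
                         - _log_factorial(n) - _log_factorial(nij)
                         - _log_factorial(ai - nij) - _log_factorial(bj - nij)
                         - _log_factorial(n - ai - bj + nij))
                emi += nij / n * log(n * nij / (ai * bj)) * exp(log_p)

    denom = (_entropy(a.values(), n) + _entropy(b.values(), n)) / 2 - emi
    if abs(denom) < 1e-12:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (mi - emi) / denom


# Toy example: cluster names differ but the partitions are the same
fdmo_clusters = ["F1", "F1", "F2", "F2"]
at_clusters = ["A7", "A7", "A3", "A3"]
print(adjusted_mutual_info(fdmo_clusters, at_clusters))
```

Because AMI compares partitions rather than label names, identical groupings score 1 regardless of how the clusters are named, and the chance correction pushes unrelated groupings toward 0 or below.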

After demonstrating that the substrate selectivity of ATs can be inferred from the stereoselectivity of FDMOs in the same BGC, we followed the workflow depicted in Figure 2D to significantly narrow the AT sequence space to target (39,000 PF02458 sequences in the UniProtKB database as of November 2023). The PSI-BLAST41 search was performed on the NCBI database using AfoD as our query to potentially cover a more diverse sequence set of FDMOs, and these sequences were then filtered to retain those with adjacent ATs in PF02458. In the top twenty hits of the BLAST search, four FDMOs with 55–60% sequence identity to AfoD were found to coexist with an AT (Table S3). Surprisingly, CazL,39 which is the associated FDMO of CazE (AT), was found to be one of the four hits (Table S4). We proceeded to investigate the reactivity of CazE. Interestingly, although CazE was previously reported to react with 11 (Figure 3A), about 10% substrate conversion was seen when (S)-12 was incubated with purified CazE, encouraging us to further explore CazE and its close homologs.
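The neighborhood filter used in this workflow (retaining FDMO homologs that have an AT within 10 genes upstream or downstream) can be sketched as follows. This is a hypothetical illustration rather than the EFI-GNT implementation: the `Gene` record, its fields, and the function `fdmo_at_pairs` are assumed names, and gene order is represented by a simple ordinal index per contig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    contig: str        # contig or scaffold identifier
    index: int         # ordinal position of the gene along the contig
    pfams: frozenset   # Pfam families annotated on the encoded protein


def fdmo_at_pairs(genes, window=10, fdmo_pfam="PF01494", at_pfam="PF02458"):
    """Return (FDMO, AT) gene-id pairs where the AT lies within `window`
    genes of the FDMO on the same contig; the nearest AT is reported."""
    by_contig = {}
    for gene in genes:
        by_contig.setdefault(gene.contig, []).append(gene)

    pairs = []
    for contig_genes in by_contig.values():
        ats = [g for g in contig_genes if at_pfam in g.pfams]
        for fdmo in contig_genes:
            if fdmo_pfam not in fdmo.pfams:
                continue
            nearby = [a for a in ats
                      if a is not fdmo and abs(a.index - fdmo.index) <= window]
            if nearby:
                nearest = min(nearby, key=lambda a: abs(a.index - fdmo.index))
                pairs.append((fdmo.gene_id, nearest.gene_id))
    return pairs


# Toy genome: only fdmo1 has an AT within 10 genes on the same contig
toy = [
    Gene("fdmo1", "c1", 5, frozenset({"PF01494"})),
    Gene("at1", "c1", 12, frozenset({"PF02458"})),
    Gene("fdmo2", "c1", 40, frozenset({"PF01494"})),
    Gene("at2", "c2", 41, frozenset({"PF02458"})),
]
print(fdmo_at_pairs(toy))
```

Reporting only the nearest AT per FDMO is a simplifying choice made here; a real analysis might keep every AT in the window.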

Figure 3.

Bioinformatics-guided sampling of ancestral sequence space of CazE and resurrection of ancestral acyltransferases (ATs). (A) Reported CazL/CazE (FDMO/AT) sequence in the biosynthetic gene cluster of chaetoviridin A. (B) Targeted two-enzyme sequence to access a linear tricyclic azaphilone. (C) Unrooted phylogenetic tree comprising 240 extant ATs reconstructed using the LG maximum likelihood model. The CazE clade was zoomed in, and all the ATs experimentally sampled were highlighted in cyan for ancestors and green for extant enzymes. (D) Preliminary study of resurrected ATs. AT-expressing E. coli BL21(DE3) pellets were lysed, and the soluble expression level of ATs was evaluated using SDS-PAGE. The preliminary screening of ATs’ reactivities was performed following the conditions: 2.5 mM substrate 13, 10 μM flavin-dependent monooxygenase (FDMO, AncFDMO), 1 mM NADP+, 5 mM glucose-6-phosphate (G6P), 2 U·mL−1 glucose-6-phosphate dehydrogenase (G6PDH), 50 mM potassium phosphate buffer, pH 8.0, 30 °C, 1 h. 1.1 equiv of thioester 14 was added into 50 μL of the reaction mixture, mixed with 50 μL of the freshly-lysed clarified cell lysate, and incubated at 30 °C for another 1 h. (E) Melting temperatures of two extant and three ancestral ATs determined by differential scanning fluorimetry. (F) AlphaFold2 model of AncAT4 with an incorporated phosphopantetheine C5-thioester. The active site of AT4 was defined using the geometric center of His166 and colored based on the identity to AncAT5 (light green) and AT6 (green).

Reconstruction and resurrection of ancestral acyltransferases

The acylation reaction catalyzed by CazE possessed too low a yield to make it practical for implementation in a synthetic route. To increase enzyme activity, we considered using directed evolution,42,43 which relies on iterative rounds of high-throughput screening of CazE variants. A critical prerequisite of fitting an enzyme into a directed evolution campaign is whether the enzyme is sufficiently soluble.44 However, we were not able to detect any product when performing reactions using the clarified cell lysate of CazE (Figure 3B). After careful consideration of the applicability of directed evolution, we recognized that CazE would not be compatible with the high-throughput screening due to its low level of soluble expression. To overcome this hurdle, we leveraged ASR as an approach to acquire more stable, soluble, and promiscuous enzymes, and to avoid a resource-intensive evolution campaign. Although the reactivity of CazE was low, the preliminary result suggested that we may have already found sequence space which captured our desired selectivity. From the ASR, we anticipated that we could access more reactive ancestral enzymes by deliberately introducing promiscuity or at least generate enzymes with higher stability and soluble expression.45–47 The ASR was started by reconstructing an unrooted maximum likelihood phylogenetic tree built from 240 homologous AT sequences acquired through BLAST search using CazE as the query (Figure 3C and S4). From this tree, we resurrected seven ancestral ATs (AncAT1–7) that are from the ancestral nodes either on the evolutionary trajectory to CazE or in the close sister clades, and sampled a closely homologous extant AT from Talaromyces proteolyticus, TpAT. The ancestral sequences were then aligned with CazE and TpAT, demonstrating that all ATs share over 58% identity with each other (Figure 3D). The expression level and reactivity of all ATs were qualitatively characterized in the clarified cell lysate format. All ancestors showed superior expression levels and solubility compared to CazE and TpAT based on SDS-PAGE (Figure 3D). Additionally, we observed that the reaction mixture with AncAT4 turned orange compared to the negative control after 1 h in the screening against (S)-12 (dearomatized product of 13) and the C5-thioester 14, indicating that some (S)-15 was likely formed (Figure 3D). Based on these preliminary results, the representative ATs, including AncAT4–6, CazE, and TpAT, were further purified, and their melting temperatures were measured using differential scanning fluorimetry. This revealed that, although AT4–6 were all recent ancestors, they were still generally more stable than the extant enzymes including CazE and TpAT (43.5 vs. 40.4 °C, Figure 3E and Table S2), aligning with many previous reports that ancestral proteins tend to be more thermally stable.20 AlphaFold2 homology models of AncAT4–6 and CazE were generated, and all of them showed high structural alignment with each other (Figure 3F). In summary, all ancestral ATs were successfully resurrected.
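The pairwise identity comparison among the ancestral and extant ATs can be illustrated with a short sketch that takes pre-aligned sequences and reports the lowest pairwise identity. The function names and the toy sequences are hypothetical, and the identity definition (matches divided by columns where neither sequence has a gap) is an assumption; other conventions for the denominator exist and would give slightly different values.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gaps are written as '-' and both strings come from the same
    multiple sequence alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def lowest_pairwise_identity(alignment):
    """Return (identity, name_a, name_b) for the least similar pair."""
    return min((percent_identity(alignment[a], alignment[b]), a, b)
               for a, b in combinations(sorted(alignment), 2))


# Toy alignment with made-up sequences, not the real AT sequences
toy_alignment = {
    "seqA": "MKTLA",
    "seqB": "MRTLA",
    "seqC": "MKT-A",
}
print(lowest_pairwise_identity(toy_alignment))
```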

Exploring expanded substrate scope in ancestral acyltransferases

With select purified ATs in hand, we evaluated the activity of each enzyme against substrate (S)-12. In the two-enzyme sequence, (S)-4 was first generated from 3 through the oxidative dearomatization reaction catalyzed by the purified AncFDMO (Figure 4A). After one hour, thioester acyl donor and acyltransferase were added and incubated for an additional hour. Based on the consumption of (S)-12, the activity of the acyltransferases with the C5-thioester as an acyl donor followed the trend of AT4 > AT5 > CazE > AT6 ~ TpAT, with AT4 producing the highest conversion (40%, Figure 4B). In addition to the C5-thioester, thioesters with varying chain lengths (C1, 3, 6–7, and 9) were tested against (S)-12 using AncAT4. Alternative chain lengths were tolerated, but the activity of AncAT4 peaked with the C5-thioester (Figure 4B). Notably, we observed much lower enzymatic activity in AT6 and TpAT, which was surprising considering the similarity shared between AT4 and AT6 or TpAT. Additional azaphilone substrates, including (S)-16–18, which possess modifications of the R1-substituent, were tested against the panel of purified enzymes with the C5-thioester as the acyl donor. Again, AncAT4 showed the greatest activity among all the enzymes. We were also curious as to whether AncAT4 is more selective for (S)-12 over (R)-12. To answer this question, the oxidative dearomatization was catalyzed by AzaH, and the (R)-product was incubated with each purified AT to be coupled with the C5-thioester. Interestingly, we found that AT4 was more reactive with (S)-12 than (R)-12 (40% vs. 9% conversion, Figure 4B), implying that the reconstructed enzyme has higher selectivity for the (S)-substrate. These observations support the conclusion that our workflow successfully guided the sampling of ancestral ATs to an appropriate sequence space with desired substrate selectivity (enantiodiscrimination). Finally, the enzymatic cascade for the synthesis of (S)-15 was performed on preparative scale to showcase its potential synthetic implementation. (S)-15 was successfully purified and characterized with a 1H-NMR spectrum matching the previously reported (R)-15, but opposite optical rotation and circular dichroism signals compared to (R)-15 (Figure S17). In summary, the results demonstrated that we successfully acquired the stereocomplementary product.

Figure 4.

Substrate scope of the acyltransferases (ATs) to access linear tricyclic azaphilones. (A) Reaction conditions for the first step from 3 to 4: 2.5 mM substrate 3, 10 μM flavin-dependent monooxygenase (FDMO, AzaH or AncFDMO), 1 mM NADP+, 5 mM glucose-6-phosphate (G6P), 2 U·mL−1 glucose-6-phosphate dehydrogenase (G6PDH), 50 mM potassium phosphate buffer, pH 8.0, 30 °C, 1 h; conditions for the second step from 4 to 6: 1.1 equiv of thioester 5 and 250 μM AT (2 mol %) were added to the reaction mixture from step 1. Conversion was calculated by the consumption of the substrate 4 and only reported when the product 6 was formed and detected by UPLC-UV or UPLC-MS. (B) Activity of ATs between (S)-12 and (R)-12. (C) Activity of AncAT4 reacting with various thioester chain lengths in the range from C1 to C9. (D) Substrates with modifications on R1.

Our initial hypothesis was that the function and the (S)-substrate selectivity of AncAT4 could diverge during evolution, leading to that seen either in the clade including AT5 and CazE or that in the clade of AT6 and TpAT (Figure 3C). However, the inferior activity shown in all AT4 descendants suggested that the higher reactivity of AT4 might be a different function/selectivity compared with CazE and TpAT and is less likely a function/selectivity kept in any lineage. This is supported by the fact that our targeted reaction was different from the native reaction of CazE and that no activity was seen when running the targeted reaction using TpAT. To better understand the major differences between AT4 and other enzymes at the molecular level, all ancestors and extant enzymes in the AncAT4 clade were aligned and the active site center was assigned as the geometric center of His166 in the key HX3D motif. Based on the AlphaFold2 model of AncAT4, 96 residues were identified within 18 Å of the active site (SI). AncAT4 shares 87% sequence identity to AT5 and 95% identity to AT6 (Figure 3F). Interestingly, all 28 closest residues within 10.4 Å are identical between AT4, AT5 and CazE (Table S5). From 10.7 to 13.7 Å, 35 residues are identical between AncAT4 and AT6. The remaining 33 more distant residues of AncAT4 from 14.0 to 18.0 Å have one and four substitutions compared with AT5 and AT6, respectively (Table S5). Taken together, the core region of AncAT4 centered around the active site can be considered a chimera of AT5 and AT6 (Figure 3F). Based on this analysis, we suggested that the first-sphere identity of AncAT4 to AT5 endows the basal activity of AT4 and the second-sphere identity to AT6 potentially introduces a different substrate selectivity and thus activity further.
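The distance-shell comparison above (28, 35, and 33 residues in three shells, together accounting for the 96 residues within 18 Å) can be sketched as a small routine that bins residues by distance from the active-site center and counts identical positions in each shell. This is an assumed reconstruction for illustration: the function names are hypothetical, one representative coordinate per residue (for example Cα) is an assumption because the distance definition is given only in the SI, and the two sequences are assumed to be aligned position by position without gaps.

```python
from math import dist


def geometric_center(points):
    """Unweighted mean position of a set of 3D points (e.g. His166 atoms)."""
    pts = list(points)
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))


def shell_identity(seq_ref, seq_other, residue_coords, center, shells):
    """Count residues and identical residues in each distance shell.

    residue_coords[i] is the representative coordinate of residue i of the
    reference model. Each shell is a half-open interval [lo, hi) in Å.
    Returns a list of (lo, hi, n_residues, n_identical).
    """
    if not (len(seq_ref) == len(seq_other) == len(residue_coords)):
        raise ValueError("sequences and coordinates must have equal length")
    report = []
    for lo, hi in shells:
        members = [i for i, xyz in enumerate(residue_coords)
                   if lo <= dist(xyz, center) < hi]
        identical = sum(seq_ref[i] == seq_other[i] for i in members)
        report.append((lo, hi, len(members), identical))
    return report


# Toy data: four residues on a line, center at the origin
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (12.0, 0.0, 0.0), (16.0, 0.0, 0.0)]
toy_shells = [(0.0, 10.0), (10.0, 14.0), (14.0, 18.0)]
print(shell_identity("HAGD", "HAGE", toy_coords, (0.0, 0.0, 0.0), toy_shells))
```

Running the same routine against two different descendants would show at a glance which shell each one matches, which is the pattern described for AT5 (inner shell) and AT6 (middle shell).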

Structural basis for observed reactivity in acyltransferases

Considering the high similarity between AncAT4 and other less reactive ATs, we were interested in gaining further insight into what sets AncAT4 apart from the others. We hypothesized that molecular modeling would provide a structural basis for the enzyme’s enhanced activity. Structural models of AT4–6 and CazE were generated with AlphaFold248 and the phosphopantetheine C5-thioester was introduced based on the homologous structure of 2E1T,49 a malonyl transferase (Figure 5A). The model substrate (S)-12 was docked into these models using the CDOCKER module from pyCHARMM50 (Figure 5B). To identify potential reactive poses, we calculated the distance between the oxygen of the tertiary alcohol on (S)-12 and the carbonyl carbon of the phosphopantetheine C5-thioester for the top cluster poses of each docking approach, plotted in Figure 5B. We observed that with flexible CDOCKER51,52 modified to use a flexible phosphopantetheine C5-thioester, AncAT4 was the only system with a putative reactive pose for the majority of the top cluster poses. This suggests that the apo-structure prediction from AlphaFold requires significant reorientation of the binding site and phosphopantetheine C5-thioester, and that flexible receptor docking is a useful tool for future screening of acyltransferases.

Figure 5.

Substrate docking and molecular dynamics (MD) simulations. (A) Near attack complex of the substrate (S)-12 in the active site of AncAT4. The near attack complex was selected by finding the smallest distance between the (S)-12 hydroxyl oxygen and the C5-thioester carbonyl carbon with a Bürgi-Dunitz angle between 90° and 107.5° from the 40 ns simulation. (B) Substrate OH and C5-thioester carbonyl carbon distance of top cluster poses from rigid receptor docking (Rigid CDOCKER), flexible receptor docking (Flexible CDOCKER), and Flexible CDOCKER with a flexible C5-thioester group. (C) Joint plot of the OH-carbonyl carbon distance of (S)-12 with the Bürgi-Dunitz angle obtained from the 40 ns simulation. The black dashed lines indicate an OH-carbonyl carbon distance of less than 6 Å and a Bürgi-Dunitz angle of greater than 90°. (D) Plot of the histidine-OH distance obtained from the 40 ns simulation.

The top clustered pose for AncAT4 with a flexible phosphopantetheine C5-thioester was positioned in CazE and AncAT4–6 and simulated for 40 ns to assess the stability of the reactive geometry for each enzyme. In the simulations, AncAT4 maintained the (S)-12 or (R)-12 near a reactive geometry most effectively, while other enzymes explored various potentially unreactive conformations. This is depicted in Figure 5C and S19, which plot the Bürgi-Dunitz angle53 of the acylation reaction against the distance between the carbonyl carbon and the hydroxyl oxygen of 12. The Bürgi-Dunitz angle characterizes the angle of attack of the nucleophile on the carbonyl and ranges from ~90° to 107.5° for effective nucleophilic attack. We adopted a reactivity criterion of an angle between ~90° and 107.5° and an OH-carbonyl distance of less than 6 Å, similar to previous work characterizing human acyltransferases.54 With these criteria, AncAT4 is the only enzyme that occupies a reactive geometry during the course of the simulation, with 2% of frames within this cutoff. The pose with the closest OH-carbonyl distance and reactive angle is depicted in Figure 5A. In addition, the majority of the enzymes maintain a histidine-OH distance of less than 6 Å (Figure 5D), suggesting that the orientation of the substrate and C5-thioester is key for determining reactivity. The docking and simulation results collectively suggest that the structural basis for AncAT4’s increased reactivity is its superior positioning of (S)-12 in a reactive conformation, with the identified reactive conformation illustrated in Figure 5A and S20. Furthermore, this study demonstrates the potential of molecular modeling as a system to guide the exploration of other ancestral enzymes and the testing of potential binding site and second-sphere mutations to modulate activity.
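The reactivity criterion described here (Bürgi-Dunitz angle between about 90° and 107.5°, OH-carbonyl carbon distance below 6 Å) can be expressed as a simple per-frame geometric test. The sketch below is an illustration with assumed function names and toy coordinates; it takes the Bürgi-Dunitz angle as the nucleophile-O···C=O angle with the carbonyl carbon at the vertex, and it does not reproduce the authors' trajectory-analysis code.

```python
from math import acos, degrees, dist


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_angle = dot / (dist(a, b) * dist(c, b))
    return degrees(acos(max(-1.0, min(1.0, cos_angle))))  # clamp rounding error


def is_reactive(nucleophile_o, carbonyl_c, carbonyl_o,
                max_distance=6.0, angle_range=(90.0, 107.5)):
    """True if one frame satisfies both the distance and the angle cutoff."""
    distance = dist(nucleophile_o, carbonyl_c)
    bd_angle = angle_deg(nucleophile_o, carbonyl_c, carbonyl_o)
    return distance < max_distance and angle_range[0] <= bd_angle <= angle_range[1]


def reactive_fraction(frames):
    """Fraction of frames, each (nucleophile_o, carbonyl_c, carbonyl_o), that pass."""
    frames = list(frames)
    if not frames:
        return 0.0
    return sum(is_reactive(*frame) for frame in frames) / len(frames)


# Toy frames: carbonyl carbon at the origin, carbonyl oxygen along +x
C, O = (0.0, 0.0, 0.0), (1.2, 0.0, 0.0)
toy_frames = [
    ((-0.5, 3.0, 0.0), C, O),  # about 99.5 degrees, 3.0 Å: passes
    ((-1.0, 3.0, 0.0), C, O),  # about 108.4 degrees: angle too wide
    ((-3.0, 1.0, 0.0), C, O),  # about 161.6 degrees: angle too wide
    ((-1.0, 7.0, 0.0), C, O),  # good angle but about 7.1 Å: too far
]
print(reactive_fraction(toy_frames))
```

Applied to every saved frame of a trajectory, `reactive_fraction` gives the kind of occupancy figure quoted in the text (2% of frames for AncAT4).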

Enzyme promiscuity is generally defined as the ability of an enzyme to perform reactions beyond its primary physiological function.55 This latent trait is crucial since the ability of an enzyme to carry out non-native reactions provides its corresponding organism with some selective advantages, enabling the organism to adapt to changing selection more quickly.56 For this reason, promiscuity is widely considered a driving force of evolution.57 Inspired by nature, intentionally introducing promiscuity into an extant enzyme within a sequence space of interest by ASR would, where possible, be a preferable strategy over the random search for an extant enzyme to achieve desired chemical transformations in some circumstances.58 Additionally, based on a reliable reconstruction, sampling representative ancestral sequences provides a more efficient and economical way than sampling extant sequences to explore the sequence space of interest.59 In our case, five ancestral ATs (AT3–7) can potentially cover the exploration of the sequence space including ten extant sequences. In summary, our bioinformatics-guided ASR method offers a strategically efficient approach to generate novel enzymes with different selectivity. This is particularly beneficial for designing non-native selectivity or reactivity that is challenging to obtain even through extensive screening of existing enzymes and subsequent engineering.

CONCLUSIONS

Tailoring enzymes toward a specific selectivity or novel reactivity is a goal that biocatalysis practitioners strive to achieve, and one that is often met with challenges. In the present study, we solved the challenge of achieving a stereocomplementary chemoenzymatic synthesis of azaphilones as an example to demonstrate a gene mining approach that can be seamlessly combined with ASR to generate ancestral enzymes fulfilling a designed set of synthetic requirements. Specifically, a chemoenzymatic sequence comprising two ancestral enzymes, AncFDMO and AncAT, was designed to enable the biocatalytic synthesis of tricyclic azaphilones. To acquire a suitable AncAT, a bioinformatics-guided workflow was adopted to find an appropriate sequence space where ancestral enzymes were sampled. Starting from identification of initial hits, the strategy enables users to efficiently access active enzymes by circumventing the time- and resource-intensive process of traditional protein engineering and by overcoming challenges associated with protein starting points with limited solubility and/or thermal stability. The best reconstructed AT demonstrated markedly improved soluble expression, reactivity, and selectivity, and significant potential to be further engineered. According to our results from experiments, sequence alignment, docking, and MD simulations, we proposed that the reactivity seen in the best reconstructed AT, which is a chimera of multiple extant enzymes, was likely related to the promiscuity introduced during the ASR. Considering that promiscuity has been leveraged by nature to drive evolution, we envision that our approach to introduce promiscuity into potential ancestral sequence space by ASR will be valuable for tailoring enzymes for specific selectivity or reactivity in biocatalysis.

Supplementary Material

SI

ACKNOWLEDGMENT

This research was supported by funds from the University of Michigan Life Sciences Institute, the University of Michigan Department of Chemistry, NIH R35 GM124880 (A.R.H.N.), and NIH R35 GM130587 (C.L.B.). The authors thank Dr. Troy Wymore for introducing ancestral sequence reconstruction to our groups.

ABBREVIATIONS

ASR, ancestral sequence reconstruction; FDMO, flavin-dependent monooxygenase; AT, acyltransferase; BGC, biosynthetic gene cluster.

Footnotes

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website.
  • Experimental procedures, acyltransferase DNA and protein sequences, SDS-PAGE gel, differential scanning fluorimetry (DSF) raw data, sequence similarity network (SSN) analysis, BLAST search results, biosynthetic gene cluster (BGC) analysis, phylogenetic tree, active-site residue comparison, UPLC chromatograms, and product characterization (circular dichroism and NMR spectrum) (PDF)
  • SSN files, Python scripts, and Jupyter notebooks to reproduce the cluster analysis: https://github.com/BrooksResearchGroup-UM/Ancestral_AT_paper
  • Python scripts, Jupyter notebooks, and all computational results to reproduce the modeling analysis: https://github.com/BrooksResearchGroup-UM/anc_acyl

The authors declare no competing financial interest.

REFERENCES

  • (1) Huffman MA; Fryszkowska A; Alvizo O; Borra-Garske M; Campos KR; Canada KA; Devine PN; Duan D; Forstater JH; Grosser ST, et al., Design of an in vitro biocatalytic cascade for the manufacture of islatravir. Science 2019, 366 (6470), 1255–1259.
  • (2) Fryszkowska A; An C; Alvizo O; Banerjee G; Canada KA; Cao Y; DeMong D; Devine PN; Duan D; Elgart DM, et al., A chemoenzymatic strategy for site-selective functionalization of native peptides and proteins. Science 2022, 376 (6599), 1321–1327.
  • (3) Buller R; Lutz S; Kazlauskas RJ; Snajdrova R; Moore JC; Bornscheuer UT, From nature to industry: Harnessing enzymes for biocatalysis. Science 2023, 382 (6673), eadh8615.
  • (4) Chen K; Arnold FH, Engineering new catalytic activities in enzymes. Nat. Catal. 2020, 3 (3), 203–213.
  • (5) Wang Y; Xue P; Cao M; Yu T; Lane ST; Zhao H, Directed Evolution: Methodologies and Applications. Chem. Rev. 2021, 121 (20), 12384–12444.
  • (6) Bloom JD; Labthavikul ST; Otey CR; Arnold FH, Protein stability promotes evolvability. Proc. Natl. Acad. Sci. U. S. A. 2006, 103 (15), 5869–5874.
  • (7) Romero PA; Arnold FH, Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 2009, 10 (12), 866–876.
  • (8) Tokuriki N; Stricher F; Serrano L; Tawfik DS, How Protein Stability and New Functions Trade Off. PLoS Comput. Biol. 2008, 4 (2), e1000002.
  • (9) Sternke M; Tripp KW; Barrick D, Consensus sequence design as a general strategy to create hyperstable, biologically active proteins. Proc. Natl. Acad. Sci. U. S. A. 2019, 116 (23), 11275–11284.
  • (10) Sumida KH; Nunez-Franco R; Kalvet I; Pellock SJ; Wicky BIM; Milles LF; Dauparas J; Wang J; Kipnis Y; Jameson N, et al., Improving Protein Expression, Stability, and Function with ProteinMPNN. J. Am. Chem. Soc. 2024, 146 (3), 2054–2061.
  • (11) Bridgham JT; Carroll SM; Thornton JW, Evolution of Hormone-Receptor Complexity by Molecular Exploitation. Science 2006, 312 (5770), 97–101.
  • (12) Harms MJ; Thornton JW, Analyzing protein structure and function using ancestral gene reconstruction. Curr. Opin. Struct. Biol. 2010, 20 (3), 360–366.
  • (13) Hochberg GKA; Thornton JW, Reconstructing Ancient Proteins to Understand the Causes of Structure and Function. Annu. Rev. Biophys. 2017, 46, 247–269.
  • (14) Nicoll CR; Bailleul G; Fiorentini F; Mascotti ML; Fraaije MW; Mattevi A, Ancestral-sequence reconstruction unveils the structural basis of function in mammalian FMOs. Nat. Struct. Mol. Biol. 2020, 27 (1), 14–24.
  • (15) Schriever K; Saenz-Mendez P; Rudraraju RS; Hendrikse NM; Hudson EP; Biundo A; Schnell R; Syren PO, Engineering of Ancestors as a Tool to Elucidate Structure, Mechanism, and Specificity of Extant Terpene Cyclase. J. Am. Chem. Soc. 2021, 143 (10), 3794–3807.
  • (16) Chiang CH; Wymore T; Rodríguez Benítez A; Hussain A; Smith JL; Brooks CL III; Narayan ARH, Deciphering the evolution of flavin-dependent monooxygenase stereoselectivity using ancestral sequence reconstruction. Proc. Natl. Acad. Sci. U. S. A. 2023, 120 (15), e2218248120.
  • (17) DeMars MD II; O’Connor SE, Evolution and diversification of carboxylesterase-like [4+2] cyclases in aspidosperma and iboga alkaloid biosynthesis. Proc. Natl. Acad. Sci. U. S. A. 2024, 121 (7), e2318586121.
  • (18) Devamani T; Rauwerdink AM; Lunzer M; Jones BJ; Mooney JL; Tan MA; Zhang ZJ; Xu JH; Dean AM; Kazlauskas RJ, Catalytic Promiscuity of Ancestral Esterases and Hydroxynitrile Lyases. J. Am. Chem. Soc. 2016, 138 (3), 1046–1056.
  • (19) Zou T; Risso VA; Gavira JA; Sanchez-Ruiz JM; Ozkan SB, Evolution of Conformational Dynamics Determines the Conversion of a Promiscuous Generalist into a Specialist Enzyme. Mol. Biol. Evol. 2015, 32 (1), 132–143.
  • (20) Wheeler LC; Lim SA; Marqusee S; Harms MJ, The thermostability and specificity of ancient proteins. Curr. Opin. Struct. Biol. 2016, 38, 37–43.
  • (21) Risso VA; Sanchez-Ruiz JM; Ozkan SB, Biotechnological and protein-engineering implications of ancestral protein resurrection. Curr. Opin. Struct. Biol. 2018, 51, 106–115.
  • (22) Gumulya Y; Baek JM; Wun SJ; Thomson RES; Harris KL; Hunter DJB; Behrendorff JBYH; Kulig J; Zheng S; Wu XM, et al., Engineering highly functional thermostable proteins using ancestral sequence reconstruction. Nat. Catal. 2018, 1 (11), 878–888.
  • (23) Livada J; Vargas AM; Martinez CA; Lewis RD, Ancestral Sequence Reconstruction Enhances Gene Mining Efforts for Industrial Ene Reductases by Expanding Enzyme Panels with Thermostable Catalysts. ACS Catal. 2023, 13 (4), 2576–2585.
  • (24) Jones BS; Ross CM; Foley G; Pozhydaieva N; Sharratt JW; Kress N; Seibt LS; Thomson RES; Gumulya Y; Hayes MA, et al., Engineering Biocatalysts for the C–H Activation of Fatty Acids by Ancestral Sequence Reconstruction. Angew. Chem. Int. Ed. Engl. 2024, 63 (18), e202314869.
  • (25) Johnson M; Zaretskaya I; Raytselis Y; Merezhuk Y; McGinnis S; Madden TL, NCBI BLAST: a better web interface. Nucleic Acids Res. 2008, 36 (Web Server issue), W5–9.
  • (26) Zallot R; Oberg N; Gerlt JA, The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019, 58 (41), 4169–4182.
  • (27) Oberg N; Zallot R; Gerlt JA, EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J. Mol. Biol. 2023, 435 (14), 168018.
  • (28) Gao JM; Yang SX; Qin JC, Azaphilones: Chemistry and Biology. Chem. Rev. 2013, 113 (7), 4755–4811.
  • (29) Chen C; Tao H; Chen W; Yang B; Zhou X; Luo X; Liu Y, Recent advances in the chemistry and biology of azaphilones. RSC Adv. 2020, 10 (17), 10197–10220.
  • (30) Williams K; Greco C; Bailey AM; Willis CL, Core Steps to the Azaphilone Family of Fungal Natural Products. ChemBioChem 2021, 22 (21), 3027–3036.
  • (31) Chen W; Chen R; Liu Q; He Y; He K; Ding X; Kang L; Guo X; Xie N; Zhou Y, et al., Orange, red, yellow: biosynthesis of azaphilone pigments in Monascus fungi. Chem. Sci. 2017, 8 (7), 4917–4925.
  • (32) Wang Y; Torma KJ; Pyser JB; Zimmerman PM; Narayan ARH, Substrate-Selective Catalysis Enabled Synthesis of Azaphilone Natural Products. ACS Cent. Sci. 2024, 10 (3), 708–716.
  • (33) Fielding BC; Holker JSE; Jones DF; Powell ADG; Richmond KW; Robertson A; Whalley WB, 898. The chemistry of fungi. Part XXXIX. The structure of monascin. J. Chem. Soc. 1961, 4579–4589.
  • (34) Zabala AO; Xu W; Chooi YH; Tang Y, Characterization of a silent azaphilone gene cluster from Aspergillus niger ATCC 1015 reveals a hydroxylation-mediated pyran-ring formation. Chem. Biol. 2012, 19 (8), 1049–1059.
  • (35) Chiang YM; Szewczyk E; Davidson AD; Keller N; Oakley BR; Wang CC, A Gene Cluster Containing Two Fungal Polyketide Synthases Encodes the Biosynthetic Pathway for a Polyketide, Asperfuranone, in Aspergillus nidulans. J. Am. Chem. Soc. 2009, 131 (8), 2965–2970.
  • (36) Pyser JB; Dockrey SAB; Benítez AR; Joyce LA; Wiscons RA; Smith JL; Narayan ARH, Stereodivergent, Chemoenzymatic Synthesis of Azaphilone Natural Products. J. Am. Chem. Soc. 2019, 141 (46), 18551–18559.
  • (37) Chiang CH; Wymore T; Rodríguez Benítez A; Hussain A; Smith JL; Brooks CL III; Narayan ARH, Deciphering the evolution of flavin-dependent monooxygenase stereoselectivity using ancestral sequence reconstruction. Proc. Natl. Acad. Sci. U. S. A. 2023, 120 (15), e2218248120.
  • (38) Fischbach MA; Walsh CT; Clardy J, The evolution of gene collectives: How natural selection drives chemical innovation. Proc. Natl. Acad. Sci. U. S. A. 2008, 105 (12), 4601–4608.
  • (39) Winter JM; Sato M; Sugimoto S; Chiou G; Garg NK; Tang Y; Watanabe K, Identification and Characterization of the Chaetoviridin and Chaetomugilin Gene Cluster in Chaetomium globosum Reveal Dual Functions of an Iterative Highly-Reducing Polyketide Synthase. J. Am. Chem. Soc. 2012, 134 (43), 17900–17903.
  • (40) Vinh NX; Epps J; Bailey J, Information Theoretic Measures for Clusterings Comparison: is a Correction for Chance Necessary? In Proceedings of the 26th Annual International Conference on Machine Learning, Association for Computing Machinery: Montreal, Quebec, Canada, 2009; pp 1073–1080.
  • (41) Altschul SF; Madden TL; Schaffer AA; Zhang J; Zhang Z; Miller W; Lipman DJ, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17), 3389–3402.
  • (42) Chen K; Arnold FH, Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc. Natl. Acad. Sci. U. S. A. 1993, 90 (12), 5618–5622.
  • (43) Arnold FH, Directed Evolution: Bringing New Chemistry to Life. Angew. Chem. Int. Ed. Engl. 2018, 57 (16), 4143–4148.
  • (44) Wang T; Badran AH; Huang TP; Liu DR, Continuous directed evolution of proteins with improved soluble expression. Nat. Chem. Biol. 2018, 14 (10), 972–980.
  • (45) Risso VA; Gavira JA; Mejia-Carmona DF; Gaucher EA; Sanchez-Ruiz JM, Hyperstability and Substrate Promiscuity in Laboratory Resurrections of Precambrian Beta-Lactamases. J. Am. Chem. Soc. 2013, 135 (8), 2899–2902.
  • (46) Nicoll CR; Massari M; Fraaije MW; Mascotti ML; Mattevi A, Impact of ancestral sequence reconstruction on mechanistic and structural enzymology. Curr. Opin. Struct. Biol. 2023, 82, 102669.
  • (47) Chaloupkova R; Liskova V; Toul M; Markova K; Sebestova E; Hernychova L; Marek M; Pinto GP; Pluskal D; Waterman J, et al., Light-Emitting Dehalogenases: Reconstruction of Multifunctional Biocatalysts. ACS Catal. 2019, 9 (6), 4810–4823.
  • (48) Jumper J; Evans R; Pritzel A; Green T; Figurnov M; Ronneberger O; Tunyasuvunakool K; Bates R; Zidek A; Potapenko A, et al., Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596 (7873), 583–589.
  • (49) Unno H; Ichimaida F; Suzuki H; Takahashi S; Tanaka Y; Saito A; Nishino T; Kusunoki M; Nakayama T, Structural and Mutational Studies of Anthocyanin Malonyltransferases Establish the Features of BAHD Enzyme Catalysis. J. Biol. Chem. 2007, 282 (21), 15812–15822.
  • (50) Buckner J; Liu X; Chakravorty A; Wu Y; Cervantes LF; Lai TT; Brooks CL III, pyCHARMM: Embedding CHARMM Functionality in a Python Framework. J. Chem. Theory Comput. 2023, 19 (12), 3752–3762.
  • (51) Gagnon JK; Law SM; Brooks CL III, Flexible CDOCKER: Development and application of a pseudo-explicit structure-based docking method within CHARMM. J. Comput. Chem. 2016, 37 (8), 753–762.
  • (52) Wu Y; Brooks CL III, Flexible CDOCKER: Hybrid Searching Algorithm and Scoring Function with Side Chain Conformational Entropy. J. Chem. Inf. Model. 2021, 61 (11), 5535–5549.
  • (53) Bürgi HB; Dunitz JD; Lehn JM; Wipff G, Stereochemistry of reaction paths at carbonyl centres. Tetrahedron 1974, 30 (12), 1563–1572.
  • (54) Panina IS; Krylov NA; Chugunov AO; Efremov RG; Kordyukova LV, The Mechanism of Selective Recognition of Lipid Substrate by hDHHC20 Enzyme. Int. J. Mol. Sci. 2022, 23 (23), 14791.
  • (55) Arora B; Mukherjee J; Gupta MN, Enzyme promiscuity: using the dark side of enzyme specificity in white biotechnology. Sustain. Chem. Process. 2014, 2 (1), 25.
  • (56) Aharoni A; Gaidukov L; Khersonsky O; McQ Gould S; Roodveldt C; Tawfik DS, The ‘evolvability’ of promiscuous protein functions. Nat. Genet. 2005, 37 (1), 73–76.
  • (57) Leveson-Gower RB; Mayer C; Roelfes G, The importance of catalytic promiscuity for enzyme design and evolution. Nat. Rev. Chem. 2019, 3 (12), 687–705.
  • (58) Champagne SE; Chiang C-H; Gemmel PM; Brooks CL III; Narayan ARH, Biocatalytic Stereoselective Oxidation of 2-Arylindoles. J. Am. Chem. Soc. 2024, 146 (4), 2728–2735.
  • (59) Gumulya Y; Gillam EM, Exploring the past and the future of protein evolution with ancestral sequence reconstruction: the ‘retro’ approach to protein engineering. Biochem. J. 2017, 474 (1), 1–19.
