Author manuscript; available in PMC: 2025 Oct 17.
Published in final edited form as: Nat Prod Rep. 2024 Oct 17;41(10):1543–1578. doi: 10.1039/d4np00009a

Advances, Opportunities, and Challenges in Methods for Interrogating the Structure Activity Relationships of Natural Products

Christine Mae F Ancajas 1, Abiodun S Oyedele 1, Caitlin Butt 1, Allison S Walker 1,2,3,*
PMCID: PMC11484176  NIHMSID: NIHMS2004621  PMID: 38912779

Abstract

Natural products play a key role in drug discovery, both as a direct source of drugs and as a starting point for the development of synthetic compounds. Most natural products are not suitable to be used as drugs without further modification due to insufficient activity or poor pharmacokinetic properties. Choosing what modifications to make requires an understanding of the compound’s structure-activity relationships. Use of structure-activity relationships is commonplace and essential in medicinal chemistry campaigns applied to human-designed synthetic compounds. Structure-activity relationships have also been used to improve the properties of natural products, but several challenges still limit these efforts. Here, we review methods for studying the structure-activity relationships of natural products and their limitations. Specifically, we will discuss how synthesis, including total synthesis, late-stage derivatization, chemoenzymatic synthetic pathways, and engineering and genome mining of biosynthetic pathways can be used to produce natural product analogs and discuss the challenges of each of these approaches. Finally, we will discuss computational methods including machine learning methods for analyzing the relationship between biosynthetic genes and product activity, computer aided drug design techniques, and interpretable artificial intelligence approaches towards elucidating structure-activity relationships from models trained to predict bioactivity from chemical structure. Our focus will be on these latter topics as their applications for natural products have not been extensively reviewed. We suggest that these methods are all complementary to each other, and that only collaborative efforts using a combination of these techniques will result in a full understanding of the structure-activity relationships of natural products.

Graphical Abstract

This review highlights experimental and computational methods for studying structure activity relationships of natural products and proposes that these methods are complementary and could be used to build an iterative computational-experimental workflow.

1. Introduction

Natural products (NPs) play an essential role in drug discovery: they have been used as medicines since far back in human history, long before humans understood the nature of chemical matter.1 In the modern era, NPs make up a large portion of FDA-approved drugs; between 1981 and 2019, NPs and botanical mixtures accounted for 4.6% and NP derivatives for an additional 18.9% of FDA-approved drugs.2 One potential explanation for the great utility of NPs in drug discovery is that they have evolved to target specific proteins and can therefore be used as drugs acting against those targets or their homologs. However, the fact that an NP has evolved to target a specific protein does not mean that it is the ideal compound to treat a related disease. Many NPs are proposed to serve a defensive function for their producer by killing or inhibiting the growth of competitors. These compounds can be used against human pathogens or tumors that share the molecular target of that competitor. However, these homologous targets are unlikely to have identical binding site structures, and therefore the NP may not act with as high an efficacy as it does against its natural target. In addition, there is likely little or no selection on NPs for other qualities that are necessary for a successful drug, for example pharmacokinetic properties such as bioavailability in humans. This is because most NPs originate from environments quite different from the human body, such as soil, ocean environments, or plants. As a result, synthetic derivatives of NPs are generally more likely to be approved as drugs than NPs themselves.3,4 Another problem when using NPs against infectious agents or cancer is that the target cells can evolve resistance against the NP, rendering it ineffective at treating the disease.5,6

Because of these limitations, NPs must often be modified in a way that maintains or improves their activity while improving their pharmacokinetic properties before they can be used as successful drugs. To accomplish this, it is important to understand the structure-activity relationships (SARs) of the NP. An SAR describes how a molecule's structure relates to its activity. A related concept is quantitative SAR (QSAR), in which mathematical models are used to quantitatively relate structure to activity. SARs are commonly used in medicinal chemistry to guide optimization of a lead compound.7 While there are many examples of SAR being used to develop NP leads into drugs (for example, caspofungin is a semi-synthetic analog of a natural echinocandin with lower toxicity,8 and many rapamycin analogs with improved therapeutic properties have been developed9), SAR efforts are generally much more extensive for human-designed compounds.10 This is because synthetic compounds are generally less structurally complex and more amenable to synthetic diversification. Here we discuss experimental and computational methods that enable the study of SAR, their challenges and limitations, and how these methods can be applied to NP drug discovery. Our focus in this review is primarily on the methodology used for SAR studies, rather than the SARs of individual NPs. In addition, because experimental methods for SAR studies have been reviewed relatively recently,11–14 we focus more on computational methods, which have not been reviewed extensively in the context of NPs.
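As a concrete illustration of the QSAR idea, the following is a minimal sketch, not a method from this review: an ordinary least-squares fit relating a single numeric descriptor to activity. The function names `fit_linear_qsar` and `predict_activity` and the descriptor and activity values are hypothetical toy inputs; real QSAR models typically use many descriptors and validated statistical or machine learning methods.

```python
from statistics import mean

def fit_linear_qsar(descriptor, activity):
    """Fit activity = slope * descriptor + intercept by ordinary least squares."""
    if len(descriptor) != len(activity) or len(descriptor) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(descriptor), mean(activity)
    sxx = sum((x - x_bar) ** 2 for x in descriptor)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def predict_activity(model, descriptor_value):
    """Apply the fitted linear model to a new descriptor value."""
    slope, intercept = model
    return slope * descriptor_value + intercept

# Hypothetical toy data: one descriptor (e.g. logP) and pIC50 for four analogs.
logp_values = [1.0, 2.0, 3.0, 4.0]
pic50_values = [5.1, 5.9, 7.1, 7.9]
model = fit_linear_qsar(logp_values, pic50_values)
print("slope = %.2f, intercept = %.2f" % model)
print("predicted pIC50 at logP 2.5:", round(predict_activity(model, 2.5), 2))
```

A linear one-descriptor model is chosen only because its closed-form solution is easy to read; the same fit-then-predict pattern carries over to multivariate and nonlinear QSAR models.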

The most definitive way to determine how a functional group on a molecule contributes to its activity is to remove or chemically modify the group and measure the resulting change in activity. This requires access to the corresponding analog. For synthetic compounds, this is accomplished through chemical synthesis, and the same strategy can be applied to NPs. NP derivatives can be accessed through total synthesis, the complete chemical synthesis of the product from simple and commercially available precursors, or through synthetic derivatization. Due to the structural complexity of NPs, this process is more challenging than for compounds of synthetic origin, and we will discuss several total synthesis and derivatization strategies that have been developed to handle these challenges. The natural origin of NPs enables the use of enzymes and even entire biosynthetic pathways to aid in their production, and we will also highlight synthetic studies that made use of natural or engineered enzymes to produce NP derivatives, as well as those that engineered the entire biosynthetic gene cluster (BGC) of an NP to produce analogs. Another advantage of the natural origin of NPs is that evolution has likely already sampled the chemical space around them, and variants that are adaptive, perhaps against a different homolog of an ancestral target, will have been selected for. Therefore, there are likely evolutionarily related BGCs that can be mined for analogs with a spectrum of activity against different targets.

Despite the number of tools available to chemists for accessing NP analogs, it is still an extremely time-consuming process, and there may be some analogs that are inaccessible without considerable effort or development of new synthetic technologies. We propose that traditional computational drug design methods as well as more modern AI methods, which are more commonly applied to synthetic compounds, can also be used to learn more about NP SARs. These computational results can then guide NP analog synthesis and discovery efforts by prioritizing those analogs that are more likely to improve activity (Figure 1). We will also discuss these methods and provide suggestions for their application to NPs. First, we will discuss some recently reported methods for predicting bioactivity from BGC sequence and how those methods can be used to deduce SARs. We will then discuss traditional computer aided drug design (CADD) methods and artificial intelligence (AI) techniques, with a focus on how explainable AI (XAI) can be used to elucidate SARs. The experimental and computational techniques are complementary. We propose that the best way to study NP SARs is with an experimental-computational feedback loop (Figure 1). Due to the range of expertise needed for the different experimental and computational techniques, this approach will require that groups of interdisciplinary scientists collaborate to elucidate NP SARs and fully realize the potential of NPs in drug discovery.

Figure 1. Proposed experimental-computational feedback loop.

We propose that a combination of current experimental and computational techniques for studying SAR is necessary to fully understand the SARs of NPs. In this loop, experimental methods will be used to provide NP analog-activity pairs for training and validation of QSAR and XAI models and these models can in turn be used to guide synthetic and discovery efforts.
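The loop in Figure 1 can be sketched as a simple greedy cycle that alternates model-based prioritization with experiments. This is an illustrative sketch under stated assumptions, not a workflow from the cited studies: `score_fn` stands in for any trained QSAR or XAI-interpretable model, `run_assay` stands in for synthesis plus bioassay, and both, like the function name `feedback_loop`, are hypothetical.

```python
def feedback_loop(candidates, score_fn, run_assay, rounds=3, batch_size=2):
    """Alternate model-guided prioritization with experimental testing.

    candidates: hashable identifiers of analogs that have not been tested yet.
    score_fn(measured, analog): predicted activity (higher is better); it may use the
        analog-activity pairs measured so far, which is how the model is updated.
    run_assay(analog): measured activity (stand-in for synthesis plus bioassay).
    Returns a dict mapping each tested analog to its measured activity.
    """
    untested = set(candidates)
    measured = {}
    for _ in range(rounds):
        if not untested:
            break
        # Computational step: rank untested analogs (ties broken by name for determinism).
        ranked = sorted(untested, key=lambda a: (score_fn(measured, a), str(a)), reverse=True)
        # Experimental step: make and test only the top-ranked analogs.
        for analog in ranked[:batch_size]:
            measured[analog] = run_assay(analog)
            untested.discard(analog)
    return measured

# Hypothetical usage: integer "analogs" whose true activity equals their label, scored by
# a perfect stand-in model, so the highest-activity analogs are made first.
result = feedback_loop(range(10), lambda measured, a: a, lambda a: float(a), rounds=2)
print(sorted(result))  # → [6, 7, 8, 9]
```

In a real campaign, `score_fn` would be retrained on `measured` each round, so newly measured analog-activity pairs feed back into the next round of prioritization.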

2. Synthetic and Semisynthetic Approaches to SAR studies.

2.1. Total synthesis for production of analogs.

There have been many impressive total syntheses of NPs, which are often incredibly complex and therefore challenging to synthesize efficiently. In this review, we will only focus on a few selected examples where the same synthetic strategy was used to generate a large amount of chemical diversity, which could in turn be used for SAR studies. This discussion is not meant to be a comprehensive account of all studies that used total synthesis to study NP SARs or produce NP analogs but rather a general discussion of techniques and a highlight of a few studies that illustrate the use of total synthesis in NP SAR studies well.

Most early total syntheses of NPs used a target-oriented approach, in which the synthesis was designed to generate a single target compound.13 Analogs were difficult to access with this approach, because any modification either had to be made at the end of the synthetic route or had to be introduced at an intermediate step while remaining compatible with the rest of the route. It is possible to design synthetic routes for specific analogs of interest, but this is inefficient on a large scale. The focus on target-oriented approaches began to change once the community recognized the importance of screening synthetic analogs of NPs to optimize them for therapeutic application15 and with the development of diversity-oriented synthesis approaches for small molecule library generation.16,17 One of the main approaches for accessing synthetic NP analogs for SAR studies is diverted total synthesis (Figure 2A), a term first introduced by Danishefsky and applied to the synthesis of migrastatin analogs, which yielded some analogs with improved antitumor activity without sacrificing plasma stability (Figure 3, Table 1).18 This strategy, also referred to as collective total synthesis,19 involves first determining points on the target for diversification and then identifying the corresponding branch points from a common intermediate. It gives access to changes in the core of the molecule that cannot easily be installed at the end of the synthesis or by semisynthesis.20 Earlier branch points can lead to greater diversity but require more reactions.15 This strategy can also modify the skeletal structure of the product, as seen in the synthesis of pleuromutilin analogs by the Herzon group.21 In some cases, a single divergent strategy can be developed for an entire NP class. For example, the Baran lab developed a two-phase synthesis of terpenes, inspired by terpene biosynthesis, in which the terpene skeleton is first built through cyclization and then divergently oxidized.14,22–25

Figure 2: Synthetic strategy for diversification.

A) Diverted total synthesis and B) convergent total synthesis are both total synthetic routes that diversify NPs; diverted synthesis has branch points, while convergent synthesis feeds different starting materials or intermediates into the same downstream pathway. C) Organo- and organometallic catalysts that interact with a substrate in a specific orientation can lead to site-specific modification. D) Promiscuous enzymes can act on multiple substrates to produce a variety of products.

Figure 3.

Divergent synthesis of migrastatin analogs described in ref 18. Positions that are altered relative to natural migrastatin are highlighted. For simplicity, reaction conditions are not shown. Arrows labeled a or b indicate that two alternative reagents could be used at that step, which introduce differences in the bond order of the bond between carbons 2 and 3. Bioactivity data for these structures are available in Table 1.

Table 1.

Activities of migrastatin analogs reported in ref 18.

| Compound | 4T1 tumor cell migration, IC50 | Stability in mouse plasma, t1/2 |
| --- | --- | --- |
| migrastatin | 29 μM | >60 min |
| 2,3-dihydromigrastatin | 10 μM | >60 min |
| N-methyl 2,3-dihydromigrastatin | 7.0 μM | >60 min |
| migrastatin core | 22 nM | 20 min |
| macrolactone | 24 nM | <5 min |
| acetylated macrolactone | 192 nM | NA |
| oxidized macrolactone | 223 nM | NA |
| hydrolyzed core | 378 nM | NA |
| macrolactam | 255 nM | >60 min |
| macroketone | 100 nM | >60 min |
| (S)-isopropyl macrolactone | 227 μM | >60 min |
| (R)-isopropyl macrolactone | 146 μM | >60 min |
| macrocyclic secondary alcohol | 8.9 μM | NA |
| macrocyclic tertiary alcohol | 3.1 μM | NA |
| macrocyclic CF3-alcohol | 101 nM | NA |
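Because the IC50 values in Table 1 mix nanomolar and micromolar units, potencies are usually compared, and modeled in QSAR studies, on a logarithmic scale: pIC50 = -log10(IC50 in mol/L). The sketch below converts a subset of the Table 1 entries; the helper name `pic50` is ours, and only values from the table are used.

```python
import math

# Conversion factors to mol/L; "uM" is used here as an ASCII stand-in for μM.
UNIT_TO_MOLAR = {"nM": 1e-9, "uM": 1e-6}

def pic50(value, unit):
    """Return pIC50 = -log10(IC50 expressed in mol/L)."""
    return -math.log10(value * UNIT_TO_MOLAR[unit])

# A subset of the 4T1 cell migration IC50 values listed in Table 1.
table_1 = {
    "migrastatin": (29, "uM"),
    "2,3-dihydromigrastatin": (10, "uM"),
    "migrastatin core": (22, "nM"),
    "macrolactam": (255, "nM"),
    "macroketone": (100, "nM"),
}

# Rank from most to least potent (highest pIC50 first).
ranked = sorted(table_1, key=lambda name: pic50(*table_1[name]), reverse=True)
for name in ranked:
    print(f"{name:24s} pIC50 = {pic50(*table_1[name]):.2f}")
```

On this scale the migrastatin core (pIC50 of about 7.66) sits roughly three log units above migrastatin (about 4.54), that is, about 1,300-fold more potent in this assay.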

Convergent synthesis is another strategy for generating diversity; it involves feeding alternative starting materials or intermediates into the same downstream synthetic route, enabling diversification of structural motifs that must be installed earlier in the synthetic route (Figure 2B). This approach has been applied to generate more than 300 macrolide antibiotic candidates.26 Another strategy for studying SARs is pharmacophore-directed retrosynthesis.13,27 This strategy is similar to the truncated synthesis strategy28 in that it does not aim to synthesize the entire NP, but rather targets the pharmacophore necessary for activity from the outset of the total synthesis effort. A related strategy developed by the Shenvi group uses computation to identify the parts of the molecule that are important for target affinity and to exclude parts that are unimportant but difficult to synthesize. In one study, the Shenvi group aimed to improve the potency of salvinorin A, which has two epimers, one of which is significantly less active. They used computation to identify a change, in this case removal of a methyl group, that would favor the active epimer and improve ease of synthesis, and used molecular docking to confirm that the altered compound was likely to bind in the same pose; synthesis of the altered compound then confirmed the computational results.29 This type of analysis could also be used to first computationally confirm binding of a minimal molecule composed of just the proposed pharmacophore and then synthesize it to confirm the pharmacophore's identity. A number of reviews go into more depth on these general synthetic strategies with examples of successful applications, and readers should refer to these reviews to learn more.3,11–13,15,30–33

One challenge of the diverted and convergent approaches is that many reactions must be carried out to generate the diverse products. The number of reactions needed increases exponentially with the number of branch points in the pathway and linearly with the number of parallel steps. Therefore, pathways that can be automated are ideal for producing a large number of analogs for SAR studies. Solid-phase reactions, and in particular solid-phase peptide synthesis, are especially amenable to automation, and peptide synthesizers are now commonplace. Solid-phase peptide synthesis has been used for SAR studies of a number of important peptides, including teixobactin,34–46 polymyxin,47 lysocin,48 jasplakinolide,49 and daptomycin.50–55 There are still challenges with solid-phase peptide synthesis. Some nonribosomal peptides contain rare amino acids that are not trivial to synthesize, and if a synthesis is not developed for these rare amino acids, they must be substituted in all synthetic analogs. Synthesis of these peptides also often requires multiple orthogonal protecting groups,56 and peptides with complex topology introduced by cyclizations cannot be easily synthesized by solid-phase synthesis. Solid-phase synthesis has also been used to synthesize polyketides, for example epothilone.57,58 While most automated syntheses of NPs are currently limited to those accomplished by a peptide synthesizer, there is considerable interest in developing general automated chemical synthesis platforms, which could ultimately be used to generate a greater diversity of NPs.59–62 Automated synthesis will also likely be complemented by future developments in computer-aided retrosynthetic planning, which can further automate NP analog production.63–67
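The scaling of a divergent route can be made concrete with an idealized count. The sketch below assumes, hypothetically, that each of k successive branch points splits every intermediate into the same number b of alternatives and that each resulting segment takes the same number s of reactions. The route then yields b^k analogs from s(b + b^2 + ... + b^k) reactions, which is exponential in k and linear in s. The function name `divergent_route_cost` is ours.

```python
def divergent_route_cost(branch_points, branches_per_point, steps_per_segment):
    """Count analogs and reactions in an idealized divergent (diverted) route.

    Hypothetical assumptions: every branch point splits each intermediate into the
    same number of alternatives, and every segment (from one branch point to the
    next, or to the end of the route) takes the same number of reactions.
    Shared steps before the first branch point are ignored.
    """
    segments = sum(branches_per_point ** i for i in range(1, branch_points + 1))
    analogs = branches_per_point ** branch_points
    reactions = steps_per_segment * segments
    return analogs, reactions

# Three alternatives per branch point, two reactions per segment.
for k in range(1, 5):
    analogs, reactions = divergent_route_cost(k, 3, 2)
    print(f"{k} branch points: {analogs} analogs from {reactions} reactions")
```

Doubling `steps_per_segment` only doubles the reaction count, whereas each additional branch point multiplies it by roughly `branches_per_point`, which is why chemistry that can be run by machines is attractive for analog libraries.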

2.2. Synthetic modification of natural products for SAR studies.

The total syntheses of NPs discussed above often require many steps and are not feasible for producing large quantities of different analogs. If an NP or biosynthetic intermediate can be isolated in large quantities from a native or heterologous producer through fermentation, then modification of the NP through chemical reactions becomes a valid strategy for accessing analogs for SAR studies. This approach is termed semisynthesis. The same methods can also be applied to an NP obtained through total synthesis rather than fermentation; modification at this stage is also referred to as late-stage functionalization. Many reviews already cover this strategy,68–75 so again we will limit our discussion to examples that illustrate the general techniques and challenges involved.

Even if sufficient quantities of an NP can be isolated for input into functionalization reactions, a number of challenges with this approach remain. These challenges mainly center on developing reactions with sufficient chemo-, site-, and stereoselectivity to modify the NP in the desired manner. NPs often contain multiple copies of the same reactive group, so developing a reaction that targets just one of them is challenging. Different instances of the same functional group often differ in reactivity because of differences in their local environment. If these differences are large enough, it becomes possible to modify the most reactive group selectively. Steric effects can also control which group is modified. If the reactivities are too similar, or if the target for modification is not the most reactive group, then a catalyst that alters the relative reactivities of the different functional groups to give the desired modification is required74 (Figure 2C). In this section, we highlight studies that achieved selective modification at different sites on the same NP through alterations to the catalyst or reactants, rather than studies that simply modified the most reactive or sterically accessible sites on an NP or that relied on the incorporation of directing or protecting groups.
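How large a reactivity difference needs to be can be estimated with transition state theory. Under kinetic control, two sites reacting by competing pseudo-first-order pathways give a product ratio equal to the ratio of their rate constants, and the Eyring equation gives k_A/k_B = exp(ΔΔG‡/RT), where ΔΔG‡ is the difference in activation free energy between the two sites. The sketch below applies this textbook relation; the function names are ours, and real systems can deviate (for example, under reversible conditions or when the product reacts further).

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def site_selectivity(ddg_kcal, temperature_k=298.15):
    """Product ratio (site A : site B) under kinetic control.

    ddg_kcal is the activation free energy of site B minus that of site A
    (kcal/mol). From the Eyring equation, k_A / k_B = exp(ddg / RT), and for
    competing pseudo-first-order reactions the product ratio equals k_A / k_B.
    """
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))

def ddg_for_ratio(ratio, temperature_k=298.15):
    """Activation free energy difference (kcal/mol) needed for a given ratio."""
    return R_KCAL * temperature_k * math.log(ratio)

for ratio in (10, 20, 100):
    print(f"{ratio}:1 site selectivity needs ~{ddg_for_ratio(ratio):.2f} kcal/mol")
```

At room temperature, about 1.4 kcal/mol separates a 10:1 outcome from an unselective one, and about 2.7 kcal/mol is needed for 100:1, which illustrates why small structural differences between similar groups on an NP often call for a catalyst that amplifies them.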

One very effective strategy pioneered by the Miller group is the use of peptide catalysts for site-selective modification. They have applied this strategy to acylation of the hydroxyl groups of erythromycin76,77 and apoptolidin A,78 thiocarbonylation, deoxygenation, or lipidation of vancomycin,79,80 phosphorylation of teicoplanin hydroxyl groups,81 and bromination of the aryl groups of vancomycin82 and teicoplanin.83 Some of the peptides used as catalysts for the modification of the glycopeptide antibiotics mimicked their natural target, D-Ala-D-Ala, to promote specific binding of the catalyst to the substrate (Figure 4).80–83 Peptides are ideal catalysts for this application because they are easy to synthesize and screen in order to identify catalysts that promote derivatization at different locations.74,84 In addition to peptides, other organocatalysts have also been used to selectively modify different positions in an NP. Chiral 4-pyrrolidinopyridine catalysts have been used to catalyze site-selective acylations of avermectin B2a, and changes in solvent were shown to reverse the site-selectivity of the catalyst.85 Other examples include the use of chiral phosphoric acids derived from 1,1′-bi-2-naphthol (BINOL) to alter the site-selectivity of acylations of steroidal and flavonoid NPs.86

Figure 4. Example of peptide catalyst altering site selectivity of reaction.

Peptide catalyst reported by the Miller lab that alters the site-selectivity of a thiocarbonylation reaction with vancomycin as a substrate.79 The peptide is a modified version of the target of vancomycin such that the catalytic residue is positioned near the desired modification site. Vancomycin is shown in green, the peptide in purple, and the catalytic residue of the peptide in cyan. Only the change that introduces the catalytic residue is shown; other changes made to the peptide are not shown. The structure was modified manually from PDB ID 1FVM; therefore, it may not represent the actual structure, and some dihedral angles may be inaccurate.

Organometallic catalysts have also been extensively applied in NP total synthesis and derivatization. They are especially useful for derivatizing NPs at C-H bonds, which are relatively inert and therefore difficult to activate for modification. C-H activation is a major area of research in chemistry, and some of the resulting techniques have been applied to the derivatization of NPs. The White group developed iron catalysts that they applied to oxidize C-H bonds in the NPs (+)-artemisinin87 and cycloheximide.88 They demonstrated that altering the catalyst's ligands can lead to catalyst-controlled selectivity, including selective reaction at alternative sites previously thought to be too similar in reactivity for selective modification.89 The Costas group has also applied similar iron catalysts to the site-selective oxidation of C-H bonds in various NPs.90 Oxidation of C-H bonds makes additional downstream modifications possible, including those that alter the underlying scaffold, such as ring expansion.91 Overall, although there has been considerable progress in this area, further advances in catalyst development are needed before it becomes possible to easily edit any site on an NP by late-stage functionalization.

Some of the derivatization methods discussed here can also be used to insert handles, for example for click chemistry reactions, that can later be used to transform the NP into a probe, a strategy used by the Romo group and previously reviewed by them.73 While these analogs are not directly useful for SAR studies, they can be used to discover the molecular target of the NP, which is useful for guiding future SAR studies. Probe handles can be incorporated at any site on the NP so long as they do not interfere with target binding. The probe can then be added to cells or lysate from the target organism. Probes bearing biotin or another affinity tag can then be used in pull-down experiments to enrich proteins that bind the NP, and proteomics can be used to measure the enrichment of these proteins. The most highly enriched proteins are the most likely targets of the NP.92 Proteomic strategies for target identification have been extensively reviewed, for both synthetic and natural compounds.92–102 Once the target is known, a crystal structure of the NP bound to its target can be obtained. Crystal structures enable rational and structure-based computational design, reducing the number of analogs that need to be screened before one with improved activity is obtained.
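The ranking step of this target identification workflow can be illustrated with a minimal sketch: compute a log2 enrichment ratio for each protein's abundance in the probe pull-down relative to a control pull-down (for example, beads alone or competition with excess unmodified NP), then sort. The protein names and abundances are hypothetical placeholders, the function name is ours, and real chemoproteomic analyses use replicates and statistical testing rather than a single ratio.

```python
import math

def rank_by_enrichment(probe, control, pseudocount=1.0):
    """Rank proteins by log2((probe + c) / (control + c)), most enriched first.

    probe, control: dicts mapping protein ID -> measured abundance.
    The pseudocount c avoids division by zero for proteins absent from the control.
    """
    proteins = set(probe) | set(control)
    scores = {
        p: math.log2((probe.get(p, 0.0) + pseudocount) /
                     (control.get(p, 0.0) + pseudocount))
        for p in proteins
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical abundances (arbitrary units) from a biotinylated-probe pull-down.
probe = {"ProteinA": 1500.0, "ProteinB": 210.0, "ProteinC": 95.0}
control = {"ProteinA": 20.0, "ProteinB": 200.0, "ProteinC": 90.0}
ranking = rank_by_enrichment(probe, control)
for protein, log2_fc in ranking:
    print(protein, round(log2_fc, 2))
```

Abundant but nonspecifically bound proteins (ProteinB and ProteinC here) score near zero, while a protein captured specifically by the probe stands out by several log2 units.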

2.3. Enzymatic modification of synthetic products for SAR studies.

In this section we will focus on the use of individual enzymes applied to make specific modifications to an NP. We will discuss the engineering of full biosynthetic pathways in the next section. Enzymes are generally much more selective than the organo- and organometallic catalysts discussed previously. This is a trade-off because while enzymes often only catalyze the reaction at a specific location on a molecule in a highly stereoselective fashion, they generally have extremely narrow substrate scopes. Therefore, enzymes often must be engineered for the desired substrate. We will present a few illustrative examples of how enzymatic modification can be incorporated into synthetic routes to enable selective access to more diverse products. For general reviews on biocatalysis for NP modifications, readers should refer to refs 103–105.

As is the case with organo- and organometallic catalysts, it is costly to develop an enzyme to catalyze a specific desired transformation. However, with sufficient effort it is possible to engineer enzymes to act on novel substrates or even catalyze a different reaction. This was made possible by the Arnold group's pioneering work on the directed evolution of enzymes, for which Frances Arnold was awarded the 2018 Nobel Prize in Chemistry. In directed evolution, large mutant libraries of an enzyme are screened to identify variants that catalyze the desired reaction. This process can be repeated over multiple rounds, each starting from the best candidates of the previous round, to improve selectivity and enzyme efficiency.106 Mutant libraries are often constructed by randomly mutating positions in the active sites of enzymes. Occasionally, naturally occurring enzymes provide a path for more rational engineering. For example, the Rieske oxygenases SxtT and GxtA share 88% sequence identity but install hydroxyl groups on different carbons of the saxitoxin scaffold. A study by the Bridwell-Rabb and Narayan labs compared the structures of the two enzymes to identify the positions that determine site-selectivity and used this information to switch the selectivity of the enzymes (Figure 5).107
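The iterative mutate, screen, and select cycle of directed evolution can be sketched in a few lines. This is a toy model, not an experimental protocol: the "enzyme" is a short string of active-site residues, `toy_fitness` is a hypothetical stand-in for a screening assay (it simply counts matches to an arbitrary target sequence), and the library size, number of rounds, and mutated positions are illustrative parameters.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(sequence, positions, rng):
    """Return a copy of sequence with one randomly chosen allowed position randomized."""
    seq = list(sequence)
    pos = rng.choice(positions)
    seq[pos] = rng.choice(AMINO_ACIDS)
    return "".join(seq)

def directed_evolution(parent, fitness, positions, rounds=10, library_size=50, seed=0):
    """Each round: build a mutant library from the current best, screen it, keep the best."""
    rng = random.Random(seed)
    best, best_score = parent, fitness(parent)
    history = [best_score]
    for _ in range(rounds):
        library = [mutate(best, positions, rng) for _ in range(library_size)]
        # Screening step: evaluate every variant with the (hypothetical) assay.
        top = max(library, key=fitness)
        if fitness(top) > best_score:
            best, best_score = top, fitness(top)
        history.append(best_score)
    return best, history

# Hypothetical assay: number of active-site residues matching an arbitrary optimum.
TARGET = "WHYRSTVL"

def toy_fitness(seq):
    return sum(a == b for a, b in zip(seq, TARGET))

best, history = directed_evolution("AAAAAAAA", toy_fitness, positions=list(range(8)))
print(best, history)
```

Restricting `positions` mimics focusing mutagenesis on active-site residues, and keeping only improved variants reproduces the stepwise gains over rounds that the text describes.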

Figure 5. Natural enzymes with altered regioselectivity.

Two natural enzymes, SxtT (green) and GxtA (purple) catalyze hydroxylation of β-saxitoxinol at two different sites. This is due to the different orientation of the substrate in the enzyme binding pocket. Residue R204 is involved in altering the orientation in SxtT relative to GxtA. Y198 is also positioned differently in SxtT enabling it to make a hydrogen bond with a water molecule that also interacts with the substrate.107

The Renata group has used natural enzymes from NP biosynthesis to access challenging precursors, simplifying the synthetic route and making it possible to invest more effort in producing analogs. Their efforts in this area include the use of natural enzymes for the hydroxylation of amino acids for the production of cepafungin I analogs108 and GE81112 analogs,109 and for oxidations of terpene scaffolds using P450s from terpene BGCs.110 In addition to natural enzymes, the Fasan and Renata groups have also used engineered enzymes for divergent NP chemoenzymatic synthesis, including the use of engineered P450s for C-H oxidation of terpenes or chiral terpene building blocks.111–114 This work is reviewed in more depth in refs 115 and 116. Similar approaches have also been applied to the chemoenzymatic synthesis of polyketides. One study used synthetic intermediates, terminal PKS modules, and different combinations of glycosyltransferases and P450s to produce a variety of structurally related polyketides with different glycosylation and oxidation patterns.117 Mutations to P450s that catalyze multiple reactions in a cascade have also been shown to alter regioselectivity; this strategy, applied to a P450 from the tirandamycin BGC, was used to generate five tirandamycin analogs.118

While enzymes are often applied to make specific modifications to a single substrate, another approach is to use a natural or engineered promiscuous enzyme on a library of compatible substrates to synthesize a variety of products (Figure 2D). Enzymes with sufficient promiscuity can be used in convergent synthetic routes, where diverse intermediates are enzymatically transformed to produce diverse products.119 One example of this is the use of Stig cyclases and Fam prenyltransferases from hapalindole and fischerindole BGCs, many of which were found to have broad substrate tolerance, to produce 11 hapalindole derivatives and eight fischerindole derivatives.120 Another example of a promiscuous enzyme that can be used to generate many structural analogs is Ulm16, a penicillin-binding protein (PBP)-like cyclase, which the Parkinson lab discovered to be highly promiscuous in terms of both precursor sequence and product ring size. They then used Ulm16 to generate libraries of cyclic hexa-, penta-, and tetrapeptides from precursors produced by solid-phase peptide synthesis. This is especially notable for the tetrapeptides, which are difficult to produce without the help of biocatalysis.121 This strategy could be applied to explore the chemical space around nonribosomal peptides and provide insight into their SARs.

A limitation of total synthesis, semisynthesis, and chemoenzymatic strategies for generating analogs is that a custom strategy must be developed for each NP of interest. For a total synthesis approach, a divergent or convergent route must be planned for each product class. For semisynthesis and chemoenzymatic late-stage derivatization, a catalyst must be chosen. If no catalyst exists that is both promiscuous and selective enough for general use, then a new catalyst must be designed for each desired modification site. This makes SAR studies by synthesis relatively low throughput. However, AI and automation are becoming more common in all areas of synthesis, for example in synthetic route planning,122 synthetic catalyst design,123 identification of synthetic steps that can be completed biocatalytically,63 and enzyme design.124 As these technologies become more advanced, it should be possible to access more NP analogs for SAR studies.

3. Biosynthetic approaches to SAR studies

3.1. Natural product classes and nature’s way of diversification

One method for producing derivatives of an NP is to edit or engineer the biosynthetic machinery that synthesizes it. This requires an understanding of NP biosynthetic machinery, so we will first introduce the biosynthesis of the NP classes to which this strategy has been applied.

Over time, advances in genomics and structural biology have unraveled the biosynthetic machineries and origins of NPs, offering insights into nature's diversification strategies. For instance, the pathway for non-ribosomal peptides (NRPs) are governed by NRP synthetases (NRPSs). NRPSs are composed of multi-modular enzymes following an assembly-line logic. Each adenylation (A) domain is dedicated to incorporating specific amino acids into the peptide chain. The activated building blocks are then transferred to the peptidyl carrier protein (PCP) or thiolation (T) domain while the condensation (C) domain catalyzes the peptide bond formation and the thioesterase (Te) domain releases the peptide chain (Figure 6A).125,126 Similarly, a minimal set up of polyketide synthases (PKS), specifically Type 1 (T1PKS), consists of a module containing an acyltransferase (AT) domain to load an activated starter or extender unit such as acetyl-CoA, an acyl carrier protein (ACP), a ketosynthase (KS) domain for catalyzing a condensation reaction to extend the growing polyketide chain while a Te domain catalyzes the cleaving of the assembly line (Figure 6B).127,128 Substrate specificity of the A and AT domains controls diversity of building blocks and starting units that make up the final product.127,129 Additional domains such as ketoreductase (KR), dehydratase (DH), enoylreductase (ER), methyltransferase (Mt) domains modify the polyketide core while NRPS have optional epimerization (E), N-methylation (NMt), heterocyclization (Cy), and oxidation (Ox) domains (Figure 6).126,130 Another group of peptidic NPs are ribosomally synthesized and posttranslationally modified peptides (RiPPs).125 RiPPs are formed first by biosynthesis of a precursor peptide, comprising an N-terminal leader peptide and a C-terminal core region, by the ribosome. 
The leader peptide contains a recognition sequence that recruits post-translational modifying (PTM) enzymes to modify the core peptide, and the mature peptide is formed after removal of the leader peptide by peptidases. Because the modifying enzymes tolerate sequence diversity in the core peptide, this arrangement provides a mechanism for diversification (Figure 6C).131,132 More detailed information on the biosynthetic logic of these classes has been discussed in many recent reviews.125,126,128,133,134 Increasing understanding of the NRPS, PKS, and RiPP biosynthetic pathways, together with their genetic manipulability and enzyme promiscuity, has made these important classes of NPs amenable to engineering, enabling production of analogs for SAR studies. Other classes of NPs, such as terpenes, have also shown amenability to engineering efforts.135,136
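The effect of the optional reductive domains on a polyketide can be made concrete with a small toy model. The Python sketch below is a hypothetical illustration, not a tool from the cited work: the class and function names are our own, and it ignores stereochemistry, Mt domains, and non-canonical domain behavior. It encodes the canonical order of the reductive loop (KR, then DH, then ER) to predict the β-carbon functionality each module leaves behind, and the α-branch contributed by its extender unit.

```python
from dataclasses import dataclass, field

@dataclass
class PKSModule:
    """A hypothetical type I PKS extension module (toy model)."""
    extender: str                                # "malonyl-CoA" or "methylmalonyl-CoA"
    reductive: set = field(default_factory=set)  # subset of {"KR", "DH", "ER"}

# Side-chain branch contributed by each extender unit at the alpha carbon.
BRANCH = {"malonyl-CoA": None, "methylmalonyl-CoA": "methyl"}

def beta_carbon_state(reductive):
    """Beta-carbon functionality left by a module's reductive loop.

    KR reduces the beta-keto group, DH dehydrates the resulting alcohol, and
    ER reduces the resulting alkene, so each domain only acts if the
    previous one did.
    """
    if "KR" not in reductive:
        return "ketone"
    if "DH" not in reductive:
        return "hydroxyl"
    if "ER" not in reductive:
        return "alkene"
    return "methylene"

def extension_units(modules):
    """(branch, beta-carbon state) for each unit, in assembly-line order."""
    return [(BRANCH[m.extender], beta_carbon_state(m.reductive)) for m in modules]

line = [
    PKSModule("methylmalonyl-CoA", {"KR"}),
    PKSModule("malonyl-CoA", {"KR", "DH", "ER"}),
    PKSModule("methylmalonyl-CoA"),
]
print(extension_units(line))
# [('methyl', 'hydroxyl'), (None, 'methylene'), ('methyl', 'ketone')]
```

Treating KR as a prerequisite for DH, and DH as a prerequisite for ER, is what makes a module carrying only a DH domain leave an unreduced ketone in this model.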

Figure 6. Schematic overview of natural product biosynthetic pathways.


A) Assembly-line logic of the NRPS biosynthetic route, with the associated amino acid substrates, forming daptomycin. B) PKS modules with their starter and extender units forming erythromycin A. C) Formation of the mature RiPP cacaoidin.

3.2. Methods to manipulate biosynthetic pathways and examples

Combinatorial biosynthesis is a promising route to diversifying the NP arsenal, both structurally and functionally, that takes advantage of genetic engineering techniques and the inherent properties of biosynthetic pathways. These strategies play a crucial role in SAR studies, furnishing a versatile toolkit to probe the impact of structural variations on the biological activities of NPs. Combinatorial biosynthesis encompasses a spectrum of approaches, including domain/module shuffling, targeted mutagenesis, artificial pathways, directed evolution, and manipulation of tailoring modifications (Figure 7). Extensive reviews of these methods have been published.125,132,136–142 Here, we highlight examples that employ combinatorial biosynthetic approaches to create derivatives for SAR studies, along with related studies.

Figure 7. Expansion of PK and NRP diversity.


A) PKS and B) NRPS systems feature multiple tailoring domains including noncanonical NMt, OMt, Cy, and Ox domains. C) Stepwise assembly facilitates combinatorial biosynthesis such as module swapping. Additionally, independent tailoring enzymes like glycosyltransferases can add further modifications to the scaffolds during later stages as shown in the formation of a hypothetical glycosylated hybrid PK/NRP.

A property shared by PKSs and NRPSs that renders them highly amenable to combinatorial biosynthesis is their inherent assembly-line logic. This allows for predictable diversification through strategic deletion, insertion, duplication, and exchange of domains, modules, and units. Under assembly-line logic, for example, deleting modules controls chain length and substituting A or AT domains alters building block incorporation, while other changes can target stereochemistry and further tailoring steps.143 One of the pioneering examples applied this logic to polyketides by transferring genes involved in actinorhodin biosynthesis into the producers of medermycin/lactoquinomycin or dihydrogranaticin to produce mederrhodins A and B.144,145 Comparison of their antimicrobial activity against a range of bacterial strains revealed that while mederrhodin A (lacking the OH group at C6) exhibited activity similar to that of medermycin against gram-negative bacteria, it displayed reduced activity against gram-positive bacteria. In contrast, mederrhodin B (lacking the cyclic lactone) was inactive against both types of bacteria,145 highlighting the importance of both the lactone and the hydroxyl group in medermycin. The early promise of combinatorial biosynthesis for generating products in a predictable manner spurred further interest in its use for SAR studies. Given its role as the prototypical T1PKS and as the source of the macrolactone precursor of the clinically relevant antibiotic erythromycin, 6-deoxyerythronolide B synthase (DEBS) was used in hybridization studies.
The AT domains of DEBS have been exchanged with those from other PKS clusters with different extender unit specificities, including the rapamycin146,147 and avermectin PKSs.148,149 In total, a library of 61 6-deoxyerythronolide B (6dEB) analogs was systematically constructed,150 laying the foundation for further optimization of polyketide cores.151–153 Examples of such studies include those aimed at producing rapamycin analogs (rapalogues)154,155 and avermectin analogs at titers sufficient for SAR studies.156
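A library of 61 analogs is small relative to the number of edits that are possible in principle. As a back-of-the-envelope sketch, assume (hypothetically) that every module of a type I PKS could independently take one of two AT specificities and one of four reductive-loop variants. The number of possible assembly lines then grows as 8 to the power of the number of modules. The option lists below are illustrative assumptions, not the edits made in the cited DEBS studies, and many such designs would fail or give low titers in practice.

```python
from itertools import product

# Hypothetical per-module options for a toy T1PKS design space.
AT_OPTIONS = ("malonyl-CoA", "methylmalonyl-CoA")
REDUCTIVE_OPTIONS = ((), ("KR",), ("KR", "DH"), ("KR", "DH", "ER"))

def module_choices():
    """All (AT specificity, reductive loop) combinations for one module."""
    return list(product(AT_OPTIONS, REDUCTIVE_OPTIONS))

def design_space_size(n_modules):
    """Number of distinct assembly lines when every module is edited independently."""
    return len(module_choices()) ** n_modules

def enumerate_designs(n_modules):
    """Explicit list of designs; only sensible for small n_modules."""
    return list(product(module_choices(), repeat=n_modules))

print(design_space_size(6))  # 262144 designs for six extension modules, as in DEBS
```

Counting with an exponent rather than enumerating keeps the calculation cheap; explicit enumeration is only practical for a few modules.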

In the context of NRPS, similar attempts have been made to use combinatorial biosynthesis to produce peptide analogs. One successful example involves the NPs of the A21978 and A54145 complexes, including daptomycin.157–159 Although these products are active against clinically relevant gram-positive pathogens, only daptomycin has been developed as a clinical drug, and even daptomycin's clinical use is limited because it is inhibited through interaction with pulmonary surfactant, a mixture of compounds present in the epithelial lining fluid of the lungs.160,161 Combinatorial biosynthesis has been used to produce analogs to probe the SAR of these related lipopeptides. Guided by careful consideration of A, C, and T domain specificities,162 gene deletions, exchanges, module shuffling,126,141,163–165 and changes in lipidation generated over 120 compounds; however, only around 40 were produced in sufficient amounts for further analysis. The effects of the modifications were analyzed against Staphylococcus aureus with and without 1% bovine surfactant (Figure 6, Table 2). The best results came from substituting Kyn13 with the aliphatic residues Ile13 or Val13. Similarly, the related compounds A54145D and A54145E have relatively good antibacterial activity, and A54145D in particular is only weakly inhibited by surfactant.166,167 Further optimization was conducted by modifying eight positions of the core peptide.162 Notably, CB-182,390 had a minimum inhibitory concentration (MIC) of 2 μg/mL both with and without surfactant, indicating the importance of the modified positions Asn3, Asp9, and 3mGlu12, the last of which has been shown to correlate with antibacterial activity.50 These combinatorial engineering efforts on the NRPS pathway of daptomycin and related peptides allowed the interrogation of core peptide residues as well as the preparation of non-proteinogenic amino acids such as 3mGlu.
This has propelled further studies and derivatization of related lipopeptides that leverage stereochemistry, Te domain cyclization, N-terminal modifications, and lipidation.50,54,167–169 Recently, the concept of evolution-guided identification of exchange units was developed for NRPS170 and trans-AT PKS engineering,171 greatly increasing the efficiency of engineering these biosynthetic pathways. The NRPS exchange unit strategy was applied to biosynthesize analogs of fellutamide B, a protease inhibitor, yielding a compound that was the most potent reported inhibitor of the Mycobacterium tuberculosis proteasome.170
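Under the collinearity rule, the n-th NRPS module installs the n-th residue, so module deletions and exchanges translate directly into predicted changes in the peptide core. The toy Python sketch below illustrates this for the Kyn13 to Ile13 exchange discussed above. Residues at positions 2, 3, 5, 6, 8, 9, 11, 12, and 13 are taken from Table 2, and positions 1, 4, 7, and 10 are the standard daptomycin residues (Trp, Thr, Asp, Gly). The model is a deliberate simplification with hypothetical function names: it treats the pathway as a single list of modules, labels D-residues per module even though D-configuration usually arises from an E domain, ignores Thr4-Kyn13 ester cyclization and N-decanoyl lipidation, and cannot predict whether an edited assembly line is productive.

```python
# Daptomycin residues 1-13 in module order (collinearity: module n -> residue n).
DAPTOMYCIN_MODULES = ["Trp", "D-Asn", "Asp", "Thr", "Gly", "Orn", "Asp",
                      "D-Ala", "Asp", "Gly", "D-Ser", "3mGlu", "Kyn"]

def predicted_core(modules):
    """Linear residue order predicted from the modules (no cyclization or lipid)."""
    return ", ".join(modules)

def exchange_module(modules, position, new_residue):
    """Copy of the line with the module at 1-based `position` exchanged."""
    edited = list(modules)
    edited[position - 1] = new_residue
    return edited

def delete_module(modules, position):
    """Copy of the line with the module at 1-based `position` removed."""
    return modules[:position - 1] + modules[position:]

ile13 = exchange_module(DAPTOMYCIN_MODULES, 13, "Ile")
print(predicted_core(ile13))
print(len(delete_module(DAPTOMYCIN_MODULES, 7)))  # 12: deleting a module shortens the chain
```

Both helpers return copies, so the parent line stays available for comparing analogs side by side.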

Table 2. Daptomycin and lipopeptide antibiotics generated by combinatorial biosynthesis. Adapted from Baltz 2014.172

A schematic representation of daptomycin with numbered amino acids is available in Figure 6A.

Compound Amino acid at position S. aureus MIC (μg/mL)
2 3 5 6 8 9 11 12 13 Side chain −surfactant +surfactant Ratio (+/−)
Daptomycin D-Asn Asp Gly Orn D-Ala Asp D-Ser 3mGlu Kyn N-decanoyl 0.5 64 128
CB-182,107 D-Asn Asp Gly Orn D-Ala Asp D-Ser 3mGlu Ile Anteiso-undecanoyl 2 8 4
CB-182,106 D-Asn Asp Gly Orn D-Ala Asp D-Ser 3mGlu Val Anteiso-undecanoyl 4 8 2
A54145E D-Glu hAsn Sar Ala D-Lys moAsp D-Asn 3mGlu Ile Anteiso-undecanoyl 1 32 32
A54145D D-Glu hAsn Sar Ala D-Lys moAsp D-Asn Glu Ile Anteiso-undecanoyl 2 4 2
CB-183,296 D-Glu hAsn Sar Ala D-Lys moAsp D-Asn Glu Kyn Anteiso-undecanoyl 1 2 2
CB-182,390 D-Glu Asn Sar Ala D-Lys Asp D-Asn 3mGlu Ile Anteiso-undecanoyl 2 2 1
CB-182,561 D-Asn Asp Sar Ala D-Lys moAsp D-Asn 3mGlu Ile Anteiso-undecanoyl 1 2 2
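The ratio column in Table 2 is the MIC measured with surfactant divided by the MIC measured without it, so a value of 1 means surfactant has no effect on potency. The short Python sketch below recomputes the ratios from the tabulated values and ranks the compounds from least to most surfactant-sensitive. The dictionary only transcribes Table 2; the function name is our own.

```python
# S. aureus MICs (μg/mL) from Table 2: (without surfactant, with 1% bovine surfactant)
TABLE_2_MIC = {
    "Daptomycin": (0.5, 64),
    "CB-182,107": (2, 8),
    "CB-182,106": (4, 8),
    "A54145E": (1, 32),
    "A54145D": (2, 4),
    "CB-183,296": (1, 2),
    "CB-182,390": (2, 2),
    "CB-182,561": (1, 2),
}

def surfactant_ratio(mic_without, mic_with):
    """Fold increase in MIC caused by surfactant; 1.0 means no loss of potency."""
    return mic_with / mic_without

ratios = {name: surfactant_ratio(*mics) for name, mics in TABLE_2_MIC.items()}
for name in sorted(ratios, key=ratios.get):  # least surfactant-sensitive first
    print(f"{name:12s} {ratios[name]:6.1f}")
```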

While the previously mentioned examples succeeded in generating a relatively substantial number of derivatives, the overall success rate of these approaches tends to be low because combinatorial editing of PKS and NRPS assembly lines is not straightforward. Recurring failures are primarily due to disruptions in PKS or NRPS systems, which can be attributed to gatekeeper domains and inter-domain communication.128,139,173 Although recent efforts have established several high-throughput methods for NRPS and PKS engineering149,174–178 and developed computational engineering tools such as ClusterCAD for multimodular T1PKS and NRPS,179–181 producing libraries of derivatives for SAR studies is still limited by low production titers.182,183 Another challenge is that derivatives generated by manipulating the assembly line typically cover a limited chemical space, mainly modifying the reduction level and side chains of the core polyketide or peptide scaffold, and do not necessarily exhibit improved activity compared with the parent compound. One example is a recent SAR study of a hemiacetal-less rapamycin with diminished activity, suggesting that nature has already optimized some of these scaffolds.184 Given the value of SAR insights for optimizing NPs for clinical use, engineering these enzymes may require substantial modifications that nature has not yet sampled. This could include the incorporation of non-natural building blocks, a possibility achievable through combined approaches such as mutasynthesis.185

On the other hand, nature continues to provide examples that inspire other routes to diversification. Nature has exploited the shared features of PKS and NRPS, as evident from the multitude of hybrid NRPS-PKS NPs,186–189 such as the antitumor agent bleomycin,190 FK520,191 and vatiamides A–F.192 These examples showcase the versatility of hybrid PKS and NRPS assemblies and highlight the potential of hybrid combinatorial pathways for expanding biosynthetically accessible chemical space. In recent examples, domains were exchanged between similar antimycin-like hybrid enzymes to generate new neoantimycin and JBIR-06 derivatives in relatively good yields.193,194 NRPS/PKS engineering was also used to produce a rapamycin analog. While the rapamycin core is biosynthesized mostly by PKSs, which have been extensively manipulated to produce other rapalogues, its gene cluster also contains an NRPS gene that incorporates pipecolic acid but is promiscuous enough to accept alternative substrates such as L-proline, enabling production of additional analogs.195 This strategy has also been attempted in fungal hybrid systems to swap non-cognate PKS and NRPS modules, with mixed success.196–198 A better understanding of PKS and NRPS compatibility is imperative for hybrid engineering.

Despite the still limited number of RiPPs that have received clinical approval, RiPP enzyme engineering has emerged as a promising avenue to access peptides that might be better suited as drugs than their natural counterparts, as emphasized by recent reviews.132,133,140,142 In RiPP biosynthesis, the organization of the leader peptide, core peptide, and PTM enzymes can be viewed as parallel to the modular logic of NRPS and PKS, and it makes peptide diversification even easier than NRPS engineering. Moreover, because RiPPs are directly gene-encoded, precursor peptide mutants can be generated by mutagenesis and recombinant techniques, offering a facile approach to creating libraries of derivatives. One example is the work of the Müller lab on the promising RiPP antibiotic darobactin, which used heterologous expression to increase titers and identify analogs with improved activity.199 Technologies for generating RiPP analogs in high throughput have been improving rapidly. A novel nanoFleming platform was used to screen for bioactive molecules, 11 of which had improved activity against Enterococci and Staphylococci strains.200 Another recent study generated a library of over 90,000 ubonodin lasso peptide variants.201 Fifteen selected variants showed antimicrobial activity against Burkholderia cenocepacia, and one variant (H17G) had a lower MIC than wild-type ubonodin, which already has an MIC comparable to those of clinically approved antibiotics.201,202 The large data set also enabled training of a deep learning model to predict RNA polymerase (RNAP) inhibition, which was validated by the measured RNAP inhibitory activity of the variants. Compared with NRPS and PKS engineering efforts, these SAR studies of RiPPs sample a large chemical space in high throughput and at titers sufficient for activity assays. The potential of RiPP engineering can be expanded further by generating artificial libraries inspired by hybrid RiPP pathways and NRP mimics.125,140,203,204
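Because a RiPP core peptide is gene-encoded, a saturation-mutagenesis library can be enumerated directly from its sequence. The Python sketch below is a hypothetical illustration rather than the design used in the cited ubonodin study. It lists every single substitution using the naming convention of H17G (wild-type residue, position, substituted residue) and counts how quickly multi-site libraries grow. The four-residue core "GHAK" is made up for the example.

```python
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic amino acids

def single_substitutions(core, first_position=1):
    """All single-residue variants of a core peptide, keyed like 'H17G'.

    The key gives the wild-type residue, its position (numbered from
    `first_position`), and the substituted residue.
    """
    variants = {}
    for offset, wild_type in enumerate(core):
        for substitute in AMINO_ACIDS:
            if substitute != wild_type:
                name = f"{wild_type}{first_position + offset}{substitute}"
                variants[name] = core[:offset] + substitute + core[offset + 1:]
    return variants

def library_size(core_length, n_substitutions):
    """Variants carrying exactly n substitutions, with 19 alternatives per site."""
    return comb(core_length, n_substitutions) * 19 ** n_substitutions

library = single_substitutions("GHAK")  # hypothetical 4-residue core
print(len(library), library["H2G"])     # 76 GGAK
print(library_size(20, 2))              # 68590 double mutants of a 20-residue core
```

The combinatorial count shows why even double-mutant libraries of a short core reach tens of thousands of members, which is where high-throughput expression and screening become essential.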

Tailoring enzymes, which catalyze reactions including glycosylation, halogenation, and alkylation, are commonly observed in many classes of NPs.205 These enzymes decorate NP scaffolds, increasing their structural diversity and pharmaceutical potential, and provide another catalytic toolbox to probe SAR. One versatile tailoring reaction is the addition of sugars by glycosyltransferases (GTs), which can improve solubility and bioavailability. While glycosylation has been well explored for polyketides such as erythromycin206,207 and glycopeptides such as mannopeptimycin,208 RiPPs are an interesting new target, as only a few glycosylated RiPPs have been isolated, such as cacaoidin,209 glycocins,210 and NAI-112.211,212 A few glycopeptide engineering tools have been developed and applied to produce peptides that inhibited Bacillus cereus with a lower MIC than sublancin, a natural glycocin.213 A high-throughput screening assay (SELECT-GLYCOCIN) was developed for facile generation of O- and S-linked glycopeptide (enterocin-like) libraries, in which the di-glycosylated variant G16E-H24L showed improved activity against Listeria monocytogenes.214 Combined with previous studies reporting relaxed substrate specificity of S-glycosyltransferases,215 these strategies provide powerful tools for producing novel glycopeptides. Beyond RiPP glycosylation, other recent reports of improved activity from combinatorial biosynthesis using tailoring enzymes across polyketides,216,217 NRPs,218 and other classes219–222 highlight the power of this strategy.

4. Genome Mining of Natural Products: Unveiling Evolutionary Relationships between Biosynthetic Gene Clusters for Valuable SAR Insights

4.1. Evolution and SAR of natural products.

NPs, also called secondary or specialized metabolites, are thought to help their producing organisms adapt to specific ecological niches or lifestyles.223 Therefore, the genes that are essential for producing NPs should be under selection when their product provides a fitness advantage to the organism. Changes in environment could lead to changes in selective pressures; for example, if a new competing organism enters an environment, there could be selective pressure for the original organism to produce compounds that inhibit the growth of the new competitor. Introduction of antimicrobial resistance genes into a population would also likely change the selective pressures on genes that produce antimicrobial compounds. One challenge facing this work is that the true ecological role of NPs is often unknown and may not be the same as their potential clinical applications.224 Some have even suggested that NPs do not serve specific adaptive roles and are instead neutrally evolving offshoots of primary metabolism or a way to dispose of unneeded precursors.225 However, production of NPs is costly, and there is mounting evidence that they are under selection.
For example, there is evidence that BGCs producing synergistic compounds coevolve, as in the case of the β-lactams and β-lactamase inhibitors (such as clavulanic acid) or of pairs of compounds that inhibit a target at different sites, such as the streptogramins.226 There is also evidence of convergent evolution of chemical structures; for example, dentigerumycin and gerumycin from the fungus-growing ant system have similar activities and chemical structures but unrelated BGCs.224 Convergent evolution is also observed among unrelated BGCs that produce similar β-lactam scaffolds.227 Another example of convergent evolution is the existence of multiple unrelated pathways for producing phosphonate NPs such as fosfomycin and fosmidomycin.228,229 Understanding the ecological roles of NPs and the mechanisms behind BGC evolution, specifically how selection acts on genetic variation to give rise to active compounds, can provide insight into NP SAR.

Several research articles and reviews have investigated the evolutionary mechanisms and dynamics that give rise to diverse NP structures. Here, we highlight some of the common themes across these publications. Genetic variation in BGCs capable of leading to differences in product structure can arise through several mechanisms, including de novo assembly, gene duplication, gene diversification, rearrangements, and horizontal gene transfer.129,223,224,227,230 Medema et al. performed a comprehensive analysis of BGC evolution and observed the same evolutionary mechanisms discussed in these reviews, as well as frequent merging of smaller "sub-clusters". These sub-clusters appear to function as independent evolutionary units that can be transferred and recombined between different BGCs, giving rise to new chemical entities.231 It has also been proposed that enzymes from secondary metabolic pathways are more promiscuous than those from primary metabolism, enabling diversification and faster evolution.223,230,232,233 Changes to the structure can also occur through the gain or loss of tailoring enzymes, leading to different modifications of a shared scaffold.129 One cluster may also produce multiple compounds due to incomplete modification by a tailoring enzyme or, perhaps, differences in the expression of the tailoring enzyme relative to the core enzymes.234 Such clusters are a source of closely related compounds that could be used for SAR studies.

4.2. Bioinformatics tools and databases for gene cluster comparison for natural variant exploration

Advances in genome sequencing, curated databases, and bioinformatic tools, some powered by machine learning, have greatly facilitated the examination of gene clusters to uncover bioactive secondary metabolites.235 Software for analyzing and comparing sequences, such as BLAST,236 DIAMOND,237 and HMMER,238 enables exploration of large quantities of genetic data. A drawback of these methods is that they annotate only one gene at a time. Therefore, multiple BGC-specific tools have been built on these technologies to characterize and compare sets of genes in order to identify and analyze BGCs. Openly available BGC tools include CLUSEAN,239 NP.searcher,240 antiSMASH,241 MultiGeneBlast,242 DeepBGC,243 RODEO,244 BiG-SCAPE,245 BiG-SLiCE,246 CORASON,245 EvoMining,247 PRISM,248 ARTS,249 ClusterScout,250 and the more recently developed cblaster,251 clinker,252 CAGECAT,253 and lsaBGC.254 To understand how changes in BGCs over evolutionary history influence the structure and activity of their products, it is necessary to compare evolutionarily related BGCs and identify the insertions, deletions, duplications, and recombinations that change product structure. Many BGC tools provide methods for comparing clusters (Table 3). antiSMASH offers knownclusterblast and clusterblast, which compare BGCs to characterized BGCs from the MIBiG database and to BGCs from the larger antiSMASH database, respectively.
These methods identify clusters that have homologous genes, as defined by a set sequence similarity threshold, and provide a visual representation of which genes in each pair of clusters are homologous, along with a percent similarity score.255 Knownclusterblast and clusterblast are limited by their reliance on specific databases for comparison and can only be used to analyze BGCs that belong to well-established biosynthetic classes identified by antiSMASH. Other tools, such as MultiGeneBlast, ClusterScout, lsaBGC, and cblaster, use BLAST or HMMER searches of the NCBI database to find co-occurring sets of user-specified genes.242,250,252,254 Clinker provides a mechanism for visualizing results from cblaster or other search methods, coloring genes by homology and connecting homologous genes with links shaded by the level of sequence identity. The cblaster-clinker workflow was recently combined into a single user-friendly webserver, CompArative Gene Cluster Analysis Toolbox (CAGECAT).253 RODEO uses a different approach: it queries a single gene family and then analyzes gene co-occurrence patterns in the genomic neighborhood of the hits.244 The EvoMining approach also searches first for individual genes, specifically enzymes related to those of primary metabolism that may have functionally diverged to become part of secondary metabolism, and then analyzes the surrounding genome for similar domains that could indicate a BGC. This method enables the identification of previously unknown classes of BGC.247
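The percent similarity idea can be sketched as the share of genes in a reference cluster that have a homolog in the query above an identity threshold. The Python function below is a simplified illustration that assumes pairwise identities have already been computed (for example, parsed from BLAST output). Its name, the 30% default threshold, and the gene names are hypothetical, and antiSMASH's own implementation applies additional filters and ranking criteria.

```python
def percent_reference_genes_hit(reference_genes, hits, min_identity=30.0):
    """Share (%) of reference-cluster genes with a query hit at or above min_identity.

    hits maps (query_gene, reference_gene) -> percent identity.
    """
    matched = {ref for (_query, ref), identity in hits.items()
               if identity >= min_identity and ref in reference_genes}
    return 100.0 * len(matched) / len(reference_genes)

reference = ["refA", "refB", "refC", "refD"]  # hypothetical characterized cluster
hits = {("q1", "refA"): 62.0, ("q2", "refB"): 28.0, ("q3", "refC"): 45.5}
print(percent_reference_genes_hit(reference, hits))  # 50.0
```

Counting each reference gene at most once (via the set) keeps duplicated query genes from inflating the score.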

Table 3. Methods for comparing BGCs.

Query searches use one or more domains or genes from a cluster as a query; untargeted clustering compares all BGCs in an input database and does not rely on a specific query.

Method Type of search Type of visualization
antiSMASH query colored by homology
MultiGeneBLAST query colored by homology
ClusterScout query colored by homology, BGC similarity network
cblaster/clinker/CAGECAT query gene presence/absence table, colored by homology
lsaBGC untargeted clustering colored by homology, GCF phylogeny heatmap
RODEO query colored by homology
BiG-SCAPE untargeted clustering BGC network, BGCs colored by matching domains
BiG-SLiCE untargeted clustering BGCs colored by matching domains
CORASON BiG-SCAPE cluster colored by homology

All the methods discussed so far rely on the user identifying a specific BGC or family of BGCs to use as a query. However, understanding the broader evolutionary history of BGCs will likely require identifying multiple groups of related BGCs. Several methods cluster BGCs based on the sequence similarity of their genes or their shared biosynthetic domains. The Biosynthetic Gene Similarity Clustering and Prospecting Engine (BiG-SCAPE) calculates the distance between clusters from a combination of the shared types of protein family (PFAM) domains, the percentage of shared adjacent domains, and the sequence identity of shared domains, which is measured using HMM profiles to improve computational speed. These scores are weighted differently for different classes of BGCs to account for class-specific evolutionary dynamics. The distances are then used to build similarity networks that group BGCs into gene cluster families (GCFs); different thresholds allow for hierarchical clustering.245 While BiG-SCAPE was designed to process many clusters quickly, it is not fast enough to process all putative BGC sequences in one run. The Biosynthetic Genes Super-Linear Clustering Engine (BiG-SLiCE) was developed to address this issue. It first converts each BGC into a vector of presence/absence and similarity bitscores from a search against profile HMMs (pHMMs), and then uses the BIRCH clustering algorithm, which runs with near-linear complexity, to cluster large numbers of BGCs into GCFs.246 Both BiG-SCAPE and BiG-SLiCE offer interfaces for visualizing the shared domains among members of the same GCF, allowing identification of evolutionarily conserved core biosynthetic proteins. BiG-SCAPE also allows visualization of the BGC similarity network.
A complementary method to BiG-SLiCE, clust-o-matic, applies agglomerative hierarchical clustering to an all-versus-all distance matrix of BGCs based on sequence similarity; the two methods were found to generally agree.256 lsaBGC also provides a method to cluster BGCs, focusing on genes identified as homologous by OrthoFinder2 rather than on PFAM similarity, together with a synteny score similar to that of BiG-SCAPE. Another advantage of lsaBGC is that it can calculate various evolutionary statistics, such as the rates of synonymous and nonsynonymous mutations in homologous genes, in addition to analyzing overall gene co-occurrence patterns. These statistics can reveal parts of biosynthetic enzymes under purifying or directional selection, which could correlate with the activity of the product.254
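The BiG-SCAPE-style combination of domain content, domain sequence similarity, and domain adjacency can be sketched as a weighted similarity converted into a distance, followed by grouping BGCs whose distance falls under a cutoff. This Python sketch rests on several assumptions: the domain names and the domain sequence similarity (DSS) value are made up; the weights (0.2, 0.75, 0.05) are illustrative placeholders for BiG-SCAPE's class-specific weights; and families here are simply connected components of the thresholded network, whereas BiG-SCAPE applies an additional clustering step when defining GCFs. Consult the BiG-SCAPE documentation before relying on any of these values.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def adjacent_pairs(domains):
    """Unordered pairs of neighbouring domains along the cluster."""
    return {frozenset(pair) for pair in zip(domains, domains[1:])}

def bgc_distance(dom_a, dom_b, dss, weights=(0.2, 0.75, 0.05)):
    """1 - weighted sum of domain Jaccard, DSS, and adjacency index (weights hypothetical)."""
    w_jaccard, w_dss, w_adjacency = weights
    similarity = (w_jaccard * jaccard(dom_a, dom_b)
                  + w_dss * dss
                  + w_adjacency * jaccard(adjacent_pairs(dom_a), adjacent_pairs(dom_b)))
    return 1.0 - similarity

def gene_cluster_families(bgcs, dss, cutoff=0.3):
    """Connected components of the network with edges where distance <= cutoff."""
    parent = {name: name for name in bgcs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(bgcs, 2):
        if bgc_distance(bgcs[a], bgcs[b], dss.get(frozenset((a, b)), 0.0)) <= cutoff:
            parent[find(a)] = find(b)
    families = {}
    for name in bgcs:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())

# Hypothetical BGCs described by their ordered domain content.
bgcs = {
    "BGC1": ["PKS_AT", "PKS_KS", "PP-binding", "Thioesterase"],
    "BGC2": ["PKS_AT", "PKS_KS", "PP-binding", "Condensation"],
    "BGC3": ["Terpene_synth", "p450"],
}
dss = {frozenset(("BGC1", "BGC2")): 0.8}  # made-up domain sequence similarity
print(gene_cluster_families(bgcs, dss))   # [['BGC1', 'BGC2'], ['BGC3']]
```

Changing `cutoff` plays the role of the different thresholds mentioned above: lower values split families apart, higher values merge them.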

Finally, while all the methods described so far can identify potentially homologous genes between clusters, they do not provide insight into the evolutionary relationships of the clusters. One way to learn more about these relationships is to build phylogenetic trees for individual genes shared between the clusters of interest.257 This type of analysis is especially useful when applied to genes whose evolutionary history is highly correlated with the product's structure, such as the ketosynthase (KS) domains of trans-acyltransferase polyketide synthases (trans-AT PKSs).258 However, in many cases these results apply only to the individual domains or proteins and not to the whole cluster, because frequent recombination events in BGCs mean that different proteins in the cluster may have distinct evolutionary histories.231 CORe Analysis of Syntenic Orthologs (CORASON) can be used to generate a multi-locus phylogeny of a set of related BGCs from the sequences of one or more genes conserved across the BGCs, revealing clades that may be responsible for the biosynthesis of a family of NPs. CORASON has also been integrated with BiG-SCAPE to build phylogenetic trees for GCFs identified by BiG-SCAPE.245
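To illustrate how pairwise distances between shared genes (for example, 1 minus the fractional identity between aligned KS domains) become a tree, the Python sketch below implements UPGMA, a simple distance-based method chosen here only for brevity. UPGMA assumes a constant rate of evolution, and this sketch is not intended to reproduce CORASON's pipeline or the maximum-likelihood methods typically used in published phylogenies. The labels and distances are hypothetical.

```python
def upgma(distances):
    """Build a rooted Newick tree from pairwise distances with UPGMA.

    distances maps (label_a, label_b) -> distance; every pair must be present.
    Clusters are keyed by their Newick string, so labels must not contain
    Newick punctuation.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    labels = sorted({label for pair in distances for label in pair})
    size = {label: 1 for label in labels}      # number of leaves in each cluster
    height = {label: 0.0 for label in labels}  # node height (half the merge distance)
    while len(size) > 1:
        current = list(size)
        # Closest pair of current clusters.
        a, b = min(((x, y) for i, x in enumerate(current) for y in current[i + 1:]),
                   key=lambda pair: d[frozenset(pair)])
        h = d[frozenset((a, b))] / 2
        node = f"({a}:{h - height[a]:.3f},{b}:{h - height[b]:.3f})"
        # Size-weighted average distance from the new cluster to every other one.
        for c in current:
            if c not in (a, b):
                d[frozenset((node, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[node] = size.pop(a) + size.pop(b)
        height[node] = h
        del height[a], height[b]
    return next(iter(size)) + ";"

# Hypothetical distances (1 - fractional identity) between aligned KS domains.
ks_distances = {("KS1", "KS2"): 0.2, ("KS1", "KS3"): 0.6, ("KS2", "KS3"): 0.6}
print(upgma(ks_distances))  # (KS3:0.300,(KS1:0.100,KS2:0.100):0.200);
```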

The successful application of the tools discussed above requires high-quality, open sequence databases. Available databases include BAGEL,259 antiSMASH-db,260 IMG-ABC,250 MIBiG,261 and BiG-FAM.262 BAGEL is a web-based database that provides sequences of putative bacteriocins and RiPPs.259 antiSMASH-db260 and the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC)250 include both experimentally verified and predicted BGCs. BiG-FAM is a database of putative BGCs clustered into GCFs, enabling comparisons of related BGCs for users who cannot run BGC clustering themselves.262 These databases are useful for genome mining efforts and for evaluating sequence variation between related BGCs, which could provide insight into how they have evolved to have different structures and functions. MIBiG is unique in that it is curated to include only BGCs with experimental evidence linking them to a specific NP.261 MIBiG also interfaces with NPAtlas,263 a database of NP structures, enabling the study of BGC-structure-activity relationships.

4.3. Example case studies that illustrate how BGC comparison informs knowledge of SAR

Various combinations of the techniques and datasets described above have been used to identify structurally related compounds produced by evolutionarily related BGCs. Such linkages help explain how evolutionary processes shape NP structural diversity and alter bioactivity, possibly as a means of adaptation. Several existing reviews describe the use of phylogenetic techniques in NP studies.257,264–266 Here, we highlight some studies that applied these approaches to isolate compounds related to known compounds and compared the activity of the resulting structural analogs.

The Brady lab developed a phylogenetic approach to identify BGCs that produce analogs of known compounds from environmental DNA (eDNA) libraries. The approach first selectively amplifies core biosynthetic genes whose phylogeny is linked to product structure, sequences the amplicons, and then builds a phylogenetic tree from the resulting sequences and those from known BGCs. Finally, heterologous expression is used to isolate the product of interest. Using the chromopyrrolic acid synthase gene from tryptophan dimer BGCs, they identified several novel tryptophan dimer NPs (Figure 8, Table 4), including members of previously unknown subclasses. These compounds showed different degrees of cytotoxic activity against tumor, fungal, and bacterial cells and likely have different molecular targets.267–270
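The prioritization step of this workflow can be sketched in a simplified form. Instead of building a full tree, each amplicon is assigned to the reference sequence it is most identical to, and amplicons that are distant from every reference are flagged as candidate new clades. This nearest-reference shortcut is a hypothetical stand-in for real phylogenetic placement, and the sequences, names, and 80% cutoff are illustrative assumptions, not real chromopyrrolic acid synthase data.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)

def place_amplicons(amplicons, references, novel_below=80.0):
    """Assign each amplicon to its most identical reference and flag distant ones.

    Returns {amplicon: (closest reference, identity rounded to 0.1, is_candidate_new_clade)}.
    """
    placements = {}
    for name, seq in amplicons.items():
        best_ref, best_identity = max(
            ((ref, percent_identity(seq, ref_seq)) for ref, ref_seq in references.items()),
            key=lambda item: item[1],
        )
        placements[name] = (best_ref, round(best_identity, 1), best_identity < novel_below)
    return placements

# Hypothetical pre-aligned protein fragments (not real sequences).
references = {"ref_clade_X": "MKLVAGT", "ref_clade_Y": "MQLIAGS"}
amplicons = {"amp1": "MKLVAGS", "amp2": "WQRIPHS"}
print(place_amplicons(amplicons, references))
```

In the real workflow, amplicons falling on long, isolated branches of the tree are the ones prioritized for recovery and heterologous expression; the identity cutoff here only mimics that decision.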

Figure 8. Genome mining for tryptophan dimer NP analogs.


This figure shows a comparison of BGCs and their corresponding products. The BGC image was created using clinker with BGCs retrieved from the MIBiG database. The genes in black are homologs of staD, the gene used as a handle for eDNA genome mining by the Brady lab. Connections between genes indicate percent sequence identity. We identified several genes that lead to structural divergence and colored them by enzymatic activity, consistent with the coloring of the corresponding functional groups on the product structures. Homologous genes are given the same color even if they have divergent enzymatic activities. Note that not all structural differences in the products are due to gene gain or loss. For example, staC and rebC, which are involved in the conversion of chromopyrrolic acid to the staurosporine and rebeccamycin aglycones, respectively, result in different oxidation states at the C-7 position, suggesting that differences in these enzymes' catalytic activities result in structural divergence. Information on the biosynthesis of the compounds shown in the figure was obtained from refs 268–270 and 277–279.

Table 4: Bioactivities of tryptophan dimer NP analogs.

This table shows the activities of the tryptophan dimer analogs shown in Figure 8. Abbreviations: MIC = minimum inhibitory concentration, MTD = maximum tolerated dose, IC50 = half maximal inhibitory concentration.268–270,277–279

NPs Bioactivities
MIC in μg/mL: S. aureus MIC in μg/mL: E. coli IC50 (μM): Human HCT116 MIC in μg/mL: C. albicans
Reductasporine 105 >150 503 36.3
Hydroxysporine >150 >150 36.5 36.0
Erdasporine A 4.2 4.3
Erdasporine B 4.2 5.3
Erdasporine A 8.5 13
Violacein 6.2 >200 5–10 4.375
MIC in μg/mL: S. aureus MIC in μg/mL: E. coli MIC in μg/mL: S. faecalis IC50 (μM): Human HCT116 IC50 (μM): P388 leukemia
Rebeccamycin 1 >125 8 0.41 6.0
MIC in μg/mL: S. aureus MIC in μg/mL: E. coli IC50 (μM): Human HCT116 MIC in μg/mL: S. cerevisiae
Arixanthomycin A 1.6 >50 0.15 >50
Arixanthomycin B 25 >50 5.14 >50
Arixanthomycin C >50 >50 25.42 >50
MIC in μg/mL: M. tuberculosis MIC in μg/mL: S. griseus IC50 (μM): PfPK5 IC50 (μM): PknB
Staurosporine 5–50 125 1.0 0.60
MIC in μg/mL: M. tuberculosis MIC in μg/mL: S. griseus IC50 (μM): PfCDPK1 IC50 (μM): PknB
K252a 5–50 12.5 0.045 0.096
MIC in μg/mL: S. aureus MIC in μg/mL: E. coli IC50 (μM): Human HCT116 MIC in μg/mL: B. subtilis
Borregomycin A >25 >25 1.2 >25
Borregomycin B 0.20 >25 1.4 0.20
Borregomycin C 0.39 >25 1.9 1.6
Borregomycin D 3.1 >25 3.9 3.1
MIC in μg/mL: S. pombe mutants MIC in μg/mL: E. coli IC50 (μM): Human HCT116 MIC in μg/mL: B. subtilis
BE-54017 0.031 0.079
Cladoniamide 0.078 0.0088
MIC in μg/mL: S. aureus MIC in μg/mL: M. luteus MTD (mg/kg): P388 leukemia MIC in μg/mL: B. subtilis
AT2433-A1 16 <0.25 2 4.0
AT2433-A2 16 1.0 8.0
AT2433-B1 32 0.25 4 8.0
AT2433-B2 32 4.0 64.0

The Brady lab has also applied this approach to KS domains from anthracycline and pentangular polyphenol BGCs. This led to the discovery of new anthracyclines, arimetamycins A–C, which are produced by a gene cluster most closely related to the steffimycin BGC. The cluster contained additional glycosyltransferases, and the arimetamycins carry sugars not previously found in the steffimycin family. Arimetamycin A, which is glycosylated with two rare sugar moieties, showed greater activity than doxorubicin and daunorubicin against multiple cancer cell lines, including two multidrug-resistant cell lines. This suggests that these sugars may be important for activity and that the glycosyltransferases in the arimetamycin BGC could help their host compete against microbes that have evolved resistance to monoglycosylated steffimycins.271 Applying the same approach to the pentangular polyphenol family of polyketides led to the discovery of arixanthomycins A–C, which differ from previously discovered pentangular polyphenols in several ways, including the addition of a carboxylated oxazolidine ring, glycosylation at C-13, and different oxidation states of some of the rings. The arixanthomycins showed antiproliferative activity, with arixanthomycin A being the most active. The authors attributed this improved activity to the sugar moiety present on arixanthomycin A but absent from arixanthomycins B and C.272

The glycopeptide antibiotics (GPAs) are an especially interesting class to study with a phylogenetic approach because resistance mechanisms do not protect equally against all GPAs,273 and some GPAs may have evolved to escape those mechanisms. Several studies have built phylogenies of different protein domains in GPA clusters to reconstruct the natural history of glycopeptide antibiotic biosynthesis and resistance.274,275 A later study first used phylogenetic mining to identify relatives of GPA BGCs and then prioritized BGCs that lacked known resistance genes, because such BGCs would be more likely to produce antibiotics with a novel mechanism of action (MOA). BGCs that appeared to have diverged from GPA BGCs but lacked the GPA resistance genes were found to produce a known compound, complestatin, as well as a compound first identified in that study, corbomycin. Both compounds were active against vancomycin-resistant and vancomycin-intermediate strains. They act by binding to peptidoglycan and blocking autolysin activity, unlike “true” GPAs, which bind the D-Ala-D-Ala terminus of peptidoglycan precursors and thereby inhibit transpeptidation and transglycosylation.276 These studies illustrate how phylogenetic mining can enable the discovery of structurally and evolutionarily related compounds with different MOAs and resistance profiles. Further study of synthetic or natural intermediates between these compounds could identify the structural motifs responsible for their divergent MOAs.

5. Machine learning analysis of biosynthetic gene clusters

As discussed above, variation in BGC genetic content leads to variation in product structure. With sufficient data and knowledge of biosynthetic enzymes, it should be possible to predict the structure of an NP from the sequence of the BGC that produces it. Similarly, with enough knowledge of SAR trends for the product's chemical scaffold, it should be possible to predict activity from the structure of the product. Because the BGC determines product structure and product structure determines activity, it follows that NP activity can, in principle, be predicted directly from the sequence of the BGC encoding it (Figure 9A). Beyond biosynthetic genes, BGCs also contain additional clues to the activity of their product, since they often carry genes that confer resistance to it. Several methods exist for identifying these resistance genes.249,280

Figure 9. Use of BGC product activity prediction algorithms to infer NP SARs.


A) Relationship between BGC, NP structure, and NP activity. B) Workflow for using methods that predict activity from BGCs to infer SARs.

Recently, several machine learning methods have been reported that predict NP bioactivities from features of the BGCs that produce them. While all of these methods have limited accuracy, likely due to a severe lack of training data, we expect them to improve greatly as more data and more advanced AI models become available. SARs can be gleaned from these methods in two ways. First, explainable AI methods (discussed further below) can be used to identify which biosynthetic features contribute to a prediction of activity or inactivity (Figure 9B). These biosynthetic genes can then be linked to the functional groups they install in the final product, and those functional groups can in turn be assumed to contribute to activity or inactivity. Second, activity can be predicted for different natural variants of a BGC discovered using the methods described in the previous section. If a method is accurate enough to predict the relative activity of the products of two related BGCs, and the structural difference between the products can be inferred from the BGCs, the comparison could reveal an SAR. Although current methods are probably not accurate enough for the second approach to work on most BGCs, we expect that this type of analysis will become feasible in the future. In the remainder of this section, we describe how each of the reported prediction methods works and suggest how each could be adapted for studying the SARs of NPs.
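The second route can be sketched in a few lines. This is a minimal illustration, not any published predictor: it assumes a hypothetical, already-trained logistic model over the presence or absence of biosynthetic features, and the feature names, weights, and helper functions (`predicted_activity`, `compare_variants`) are invented for illustration.

```python
import math

# Hypothetical weights of an already-trained logistic model over the
# presence/absence of biosynthetic features. Values are illustrative only.
WEIGHTS = {
    "glycosyltransferase": 1.2,
    "halogenase": 0.4,
    "N-methyltransferase": 0.8,
    "P450_oxidase": -0.3,
}
BIAS = -1.0


def predicted_activity(features):
    """Predicted probability of activity for a BGC described by a set of features."""
    z = BIAS + sum(WEIGHTS.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))


def compare_variants(bgc_a, bgc_b):
    """Return features gained and lost going from variant A to variant B,
    plus the change in predicted activity (B minus A)."""
    gained = sorted(bgc_b - bgc_a)
    lost = sorted(bgc_a - bgc_b)
    delta = predicted_activity(bgc_b) - predicted_activity(bgc_a)
    return gained, lost, delta


# Two hypothetical natural variants differing by one tailoring gene.
bgc_a = {"halogenase", "P450_oxidase"}
bgc_b = {"halogenase", "P450_oxidase", "glycosyltransferase"}
gained, lost, delta = compare_variants(bgc_a, bgc_b)
print(gained, lost, round(delta, 3))
```

A positive change would flag the gained glycosyltransferase, and hence the sugar it installs, as a candidate SAR to test experimentally; such a conclusion is only as reliable as the model's relative predictions.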

The first reported method to predict NP activity from BGCs was DeepBGC. DeepBGC's primary function is to identify BGCs using deep learning, specifically a bidirectional long short-term memory (BiLSTM) recurrent neural network that takes as input a sequence of embedded vectors representing protein family (Pfam) domain annotations. For activity prediction, DeepBGC uses a random forest model trained on a count vector of Pfam domains. Random forest models provide a feature importance score that measures how much each feature contributes to the model's classifications. If this analysis were applied to DeepBGC's activity classifier, it could identify biosynthetic genes that are correlated with activity and, by extension, the structural motifs they install. DeepBGC is trained to predict four bioactivities: antibacterial, cytotoxic, inhibitor, and antifungal. Because its activity predictor was trained on only 370 data points, it has limited accuracy,243 and attributing activity to specific biosynthetic domains with this model would likely be similarly inaccurate.
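The count-vector featurization described above can be sketched as follows. This is an illustrative assumption, not DeepBGC's code: the vocabulary is a handful of real Pfam accessions picked by hand, whereas a trained model would use every domain observed in its training set, and the cluster annotation is made up.

```python
from collections import Counter

# Illustrative vocabulary of Pfam accessions (a real model would use every
# domain observed in its training data).
VOCAB = [
    "PF00109",  # beta-ketoacyl synthase, N-terminal domain
    "PF02801",  # beta-ketoacyl synthase, C-terminal domain
    "PF00501",  # AMP-binding enzyme (adenylation domain)
    "PF00668",  # condensation domain
    "PF00550",  # phosphopantetheine attachment site (carrier protein)
]


def pfam_count_vector(domain_hits, vocab=VOCAB):
    """Convert the Pfam hits of one BGC into a fixed-length count vector.
    Domains outside the vocabulary are ignored."""
    counts = Counter(domain_hits)
    return [counts.get(pfam, 0) for pfam in vocab]


# Hypothetical annotation of a small hybrid NRPS-PKS cluster.
hits = ["PF00501", "PF00550", "PF00668", "PF00501",
        "PF00550", "PF00109", "PF02801", "PF00550"]
print(pfam_count_vector(hits))
```

Each vector position corresponds to one Pfam domain, so an importance score for that position (for example, the `feature_importances_` attribute of a fitted scikit-learn `RandomForestClassifier`) can be read directly as the importance of that domain.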

The next reported method to predict bioactivity from BGCs is PRISM 4. PRISM 4's primary function is to identify BGCs and predict the chemical structures of their products, but it also offers activity prediction. The authors of the PRISM 4 study trained support vector machines (SVMs) to predict bioactivities and compared two BGC featurization strategies: a Pfam count vector and the chemical fingerprint of the PRISM-predicted product structure. They found that models using predicted structures were more accurate than those using Pfam domains. PRISM 4 was trained to predict five bioactivities: antibacterial, antifungal, antiviral, antitumor, and immunomodulatory activity.248 SVMs are generally less interpretable than the random forest used by DeepBGC because they often use nonlinear kernel functions that mix features. However, because the model is applied to predicted structures, one can modify a predicted structure and analyze how those changes affect the predicted probability of activity.
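Why nonlinear kernels hamper interpretation can be shown with a small sketch. It assumes an SVM has already been fitted; the support vectors, labels, dual coefficients, and bias below are invented for illustration. With a linear kernel, the decision function f(x) = Σ αᵢyᵢK(xᵢ, x) + b collapses into one weight per feature, whereas with an RBF kernel every feature enters through a single shared distance.

```python
import math

# Hypothetical fitted SVM: support vectors, labels (+1/-1), dual
# coefficients (alpha) and bias. Values are illustrative only.
SUPPORT = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0]]
LABELS = [1, -1, 1]
ALPHAS = [0.5, 0.8, 0.3]
BIAS = -0.2


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=0.5):
    # exp(-gamma * ||a - b||^2): all features are mixed inside one distance,
    # so the output cannot be split into per-feature contributions.
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def decision(x, kernel):
    """SVM decision function: sum_i alpha_i * y_i * K(sv_i, x) + b."""
    return sum(a * y * kernel(sv, x) for a, y, sv in zip(ALPHAS, LABELS, SUPPORT)) + BIAS


def linear_weights():
    """For a linear kernel, w = sum_i alpha_i * y_i * sv_i gives one weight per feature."""
    n = len(SUPPORT[0])
    return [sum(a * y * sv[j] for a, y, sv in zip(ALPHAS, LABELS, SUPPORT)) for j in range(n)]


x = [1.0, 2.0, 0.5]
w = linear_weights()
# The kernel form and the explicit w . x + b form agree for the linear kernel.
assert abs(decision(x, linear_kernel) - (linear_kernel(w, x) + BIAS)) < 1e-12
print(w)
print(decision(x, rbf_kernel))  # no equivalent weight vector exists here
```

For the RBF kernel no such weight vector exists, so attributing a prediction to individual features requires post hoc explanation methods of the kind discussed later in this review.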

We previously reported a third method for predicting bioactivity from BGCs. This method relies on counts not only of Pfam domains but also of other biosynthetic domain annotations supported by antiSMASH, predicted monomers for NRPS and PKS modules, and resistance genes annotated by the Resistance Gene Identifier.281 We used three different models in this study (random forest, logistic regression, and SVMs) and found that the choice of model did not significantly affect accuracy. We trained the models to predict six activities: antibacterial activity, activity against Gram-positive bacteria, activity against Gram-negative bacteria, activity against eukaryotic cells, antifungal activity, and antitumor activity. As discussed above, random forests are interpretable through their feature importance scores, and logistic regression is interpretable through the coefficient it assigns to each feature, with larger-magnitude coefficients having more influence on predictions. Although this type of analysis could also be performed with the activity predictors of DeepBGC and PRISM 4, we were the first to report feature importance analysis for these types of predictions. Our models recovered several known structure-activity trends, for example, that amines are associated with activity against Gram-negative bacteria and that N-methylation of peptides is associated with activity against eukaryotic cells,282 as well as some associations that have not previously been studied. Our method was subsequently adapted for use on fungal as well as bacterial BGCs, although its accuracy on fungal BGCs is currently limited by a lack of training data.283
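Reading coefficients from a logistic regression can be sketched without any ML library. The toy data below are invented: feature 0 stands in for a hypothetical "amine present" annotation constructed to track the activity label, and feature 1 is an unrelated annotation only weakly associated with it. The gradient-descent routine is a simple stand-in, not the solver used in our published models.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit an unregularized logistic regression by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted minus observed
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Toy data: feature 0 = hypothetical "amine present", feature 1 = unrelated annotation.
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

w, b = fit_logistic(X, y)
for name, coef in zip(["amine_present", "other_annotation"], w):
    print(f"{name}: {coef:+.2f}")
```

On these data the amine coefficient comes out large and positive while the other is much smaller in magnitude. Because both features are binary, their coefficients are directly comparable; for descriptors on different scales, features should be standardized before magnitudes are compared.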

Each of the methods described above can predict bioactivity from the sequence of BGCs, either directly or by first predicting the structure of the product. Explainable AI tools, which will be discussed further in a later section, can then be used to reveal what biosynthetic or molecular features are correlated with activity. This process has been shown to reveal previously known SARs and predict additional SARs that have yet to be validated. Currently, these methods are severely limited by a lack of well-curated training data, which reduces their accuracy in activity prediction as well as in identification of SARs.

6. Structure-based docking and modeling studies to predict SAR

6.1. Computational methods in drug discovery

NPs offer a gateway to a diverse structural and biological arsenal, but the chemical space surrounding them is too vast to explore with experimental approaches alone. Advances in technological resources, statistical methods, and structural biology have propelled computational methods to the forefront as indispensable, time-efficient, and cost-effective tools in drug discovery. These methods fall under the umbrella term computer-aided drug design (CADD) and are divided into two general approaches: ligand-based (LB) and structure-based (SB) methods (Figure 10). CADD methods have played a significant role since the 1960s284 and have been incorporated into every step of the drug discovery process, from target identification to lead optimization. CADD has contributed to the development of various pharmaceuticals that are in clinical trials or approved for use, including Captopril, Dorzolamide, Saquinavir, Zanamivir, Oseltamivir, Aliskiren, Boceprevir, Nolatrexed, Rupintrivir, Imatinib, Indinavir, Tirofiban, and Raltegravir.285–288 More comprehensive information on these methodologies can be found in other reviews.289–293 Here, we briefly outline these methods and highlight their utility in exploring and predicting the SAR of NP-derived analogs.

Figure 10. CADD strategies to study SAR.


A) Ligand-based methods primarily use information from known active molecules. B) Structure-based techniques involve the 3D structures of target receptors.

Ligand-based (LB) methods rely on the molecular similarity principle: molecules with similar structural and physicochemical qualities are likely to share similar properties or activities (Figure 10A). One such LB method is pharmacophore modeling, which extracts the essential molecular features of active ligands, such as electronegativity, symmetry, hydrogen bond donors and acceptors, and aromaticity, to generate a model of the features the ligands have in common.294,295 Another widely used LB method is quantitative structure-activity relationship (QSAR) modeling, which establishes quantitative correlations between ligand properties, represented by 1D to nD numerical descriptors, and biological activity. Earlier work relied primarily on simple 1D and 2D descriptors such as molecular weight and logP, whereas later work incorporated descriptors of higher dimensionality.296 QSAR models employ statistical techniques such as multiple linear regression (MLR) and principal component analysis (PCA),295,297,298 while Comparative Molecular Field Analysis (CoMFA)299 and Comparative Molecular Similarity Indices Analysis (CoMSIA)300 have become prominent 3D-QSAR techniques.301 A fitted QSAR model can then be used to estimate the activities of related new compounds from their structural attributes. Like other LB methods, this approach does not explicitly depend on the interaction of the molecule with its target protein. For example, to identify potential dengue protease inhibitors, LB-QSAR and pharmacophore models were developed from derivatives of 4-benzyloxyphenylglycine, an important residue in previously identified protease inhibitors.302,303 The models were used to virtually screen the ZINC database for compounds with similar features, which identified two promising compounds; subsequent docking studies validated their favorable binding to the dengue protease. Another study used 2D- and 3D-QSAR to design new anti-osteosarcoma chemotherapy drugs. First, 2D-QSAR models were generated from dipeptide-alkylated nitrogen-mustard derivatives, followed by construction of a CoMSIA model to account for 3D spatial characteristics. Key descriptors identified in the 2D-QSAR experiments and the contour map from the 3D-QSAR model guided the design of 200 new nitrogen-mustard compounds, which were then screened against potential targets by docking.304 The LB approach enables compound design even when the target is unknown, but building high-quality LB models requires proper identification and handling of molecular descriptors, adequate data, and validation. Another potential limitation of LB QSAR models is that they rely on previously observed trends and are unlikely to correctly predict the activity of compounds unrelated to those used to build the model.290
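A minimal sketch of an MLR QSAR model with a simple applicability-domain guard, assuming NumPy is available. The descriptors, activity values, and the min-max range rule are illustrative assumptions; published studies use many more compounds, validated descriptor sets, external test sets, and usually more sophisticated domain definitions (for example, leverage- or similarity-based).

```python
import numpy as np

# Hypothetical training set: [molecular weight, logP] for five compounds.
descriptors = np.array([
    [300.0, 1.0],
    [320.0, 2.0],
    [350.0, 1.5],
    [380.0, 3.0],
    [400.0, 2.5],
])
# Invented pIC50 values generated from pIC50 = 2.0 + 0.005*MW + 0.6*logP
# so that the fit can be checked; real data would be noisy.
pic50 = np.array([4.1, 4.8, 4.65, 5.7, 5.5])

# Multiple linear regression: add an intercept column and solve by least squares.
X = np.column_stack([np.ones(len(descriptors)), descriptors])
coef, *_ = np.linalg.lstsq(X, pic50, rcond=None)


def in_domain(query, train=descriptors):
    """Range-based applicability domain: every descriptor of the query must
    lie within the min-max range observed in the training set."""
    query = np.asarray(query, dtype=float)
    return bool(np.all(query >= train.min(axis=0)) and np.all(query <= train.max(axis=0)))


def predict(query):
    """Predict pIC50, returning None instead of extrapolating outside the domain."""
    if not in_domain(query):
        return None
    return float(coef[0] + coef[1:] @ np.asarray(query, dtype=float))


print(coef)                   # intercept, MW coefficient, logP coefficient
print(predict([360.0, 2.2]))  # inside the training range
print(predict([650.0, 6.0]))  # far outside it: returns None
```

Refusing to predict outside the training range is a blunt but transparent way of acknowledging that the model only encodes trends seen in related compounds.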

Structure-based (SB) methods play an equally important role in drug design by leveraging the 3D structures of biologically relevant target proteins to elucidate their interactions with ligands. The two main SB techniques are molecular docking and molecular dynamics (MD) simulations (Figure 10B). Molecular docking predicts the preferred orientation and position of a ligand in the active site of a target protein, and the scoring functions built into docking programs provide a rapid, simplified quantitative assessment of binding affinity and of the quality of the binding poses among the many conformations generated.305,306 These scoring functions are classified as physics-based, empirical, or knowledge-based, and rely on atomic force fields, physicochemical properties, and statistical analyses of protein-ligand complexes, respectively.86,307 Because scoring functions rank ligands by predicted binding affinity, they help identify modifications that influence binding strength, as illustrated by virtual screening studies applied to GPCRs.308,309 In another example, docking scores were correlated with acetylcholinesterase (AChE) inhibition potency, demonstrating a quantitative connection between scoring functions and activity.310 Meanwhile, a computational study of fatty acid binding proteins (FABPs) guided the design of a new class of antinociceptive and anti-inflammatory agents.311 Docking studies established that the α-truxillic acid scaffold is essential for FABP binding, and two lead candidates were identified after showing promising in vivo efficacy. In these studies, reliable scoring functions helped distinguish binders from nonbinders and highlight important molecular structures. However, the major weakness of most docking studies is the approximations made by scoring functions, which limit the accuracy of predicted binding affinities.305 Docking results can be further refined with techniques such as free energy perturbation (FEP) and thermodynamic integration (TI), which give improved predictions of binding free energy, another indicator of binding strength.312–314
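The kind of check used to connect scoring functions with activity, as in the AChE example above, can be sketched as a rank correlation between docking scores and measured potency. The scores and pIC50 values below are hypothetical, and the simple Spearman formula used here assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical docking scores (kcal/mol; more negative = better) and measured pIC50.
docking_scores = [-9.1, -7.4, -8.2, -6.5, -10.3]
pic50 = [7.2, 6.8, 5.9, 5.1, 7.9]

# Negate the scores so that better predicted binding maps to larger values;
# rho near +1 then means the scoring function ranks ligands as the assay does.
rho = spearman([-s for s in docking_scores], pic50)
print(round(rho, 3))
```

Rank correlation is a natural choice here because docking scores are meant to order ligands rather than reproduce absolute affinities.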

While molecular docking provides a static model of a protein-ligand interaction, it does not capture the inherent conformational flexibility of most biomolecules, which limits further SAR analysis. MD simulations, in contrast, can probe the dynamic behavior of ligand-protein complexes over time and provide more accurate estimates of binding affinity. MD simulations capture the flexibility and fluctuations of the complex by integrating Newton's equations of motion. In SAR studies, MD simulations are typically used to reevaluate docking results, providing additional quantitative insight into the strength and stability of the ligand-protein interaction. To obtain results comparable to experiment, these simulations must also account for realistic solvent conditions. Post-processing approaches such as the linear interaction energy (LIE) method315 and methods that use implicit solvent models, such as Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) and Molecular Mechanics/Generalized Born Surface Area (MM/GBSA), are efficient ways to estimate binding free energies.316–318 Because they resolve atomic detail, MD simulations are considerably more computationally expensive than docking, especially for complex molecular interactions over longer time scales; nevertheless, this SB method provides a more robust calculation and serves as another metric for optimizing the pharmacological properties of drug candidates.
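For reference, the endpoint estimates named above, and the exponential-averaging (Zwanzig) relation on which the FEP approach mentioned in the docking discussion is built, are commonly written as follows. These are standard textbook forms: the LIE coefficients α, β, and γ are fitted empirically and are system dependent, V_{l-s} denotes ligand-surroundings interaction energies averaged over bound and free (solvated ligand) simulations, and k_B T is the thermal energy.

```latex
\begin{aligned}
\Delta G_{\mathrm{bind}}^{\mathrm{LIE}} &\approx
  \alpha\left(\langle V^{\mathrm{vdW}}_{l\text{-}s}\rangle_{\mathrm{bound}} - \langle V^{\mathrm{vdW}}_{l\text{-}s}\rangle_{\mathrm{free}}\right)
  + \beta\left(\langle V^{\mathrm{el}}_{l\text{-}s}\rangle_{\mathrm{bound}} - \langle V^{\mathrm{el}}_{l\text{-}s}\rangle_{\mathrm{free}}\right) + \gamma \\
\Delta G_{\mathrm{bind}}^{\mathrm{MM/PB(GB)SA}} &= \langle G_{\mathrm{complex}}\rangle - \langle G_{\mathrm{receptor}}\rangle - \langle G_{\mathrm{ligand}}\rangle,
  \qquad G = E_{\mathrm{MM}} + G_{\mathrm{polar}}^{\mathrm{PB/GB}} + G_{\mathrm{nonpolar}}^{\mathrm{SA}} - TS \\
\Delta A_{0 \to 1}^{\mathrm{FEP}} &= -k_B T \ln \left\langle \exp\!\left(-\frac{U_1 - U_0}{k_B T}\right) \right\rangle_0
\end{aligned}
```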

SB methods require knowledge of the target's 3D structure, which has traditionally been determined experimentally by nuclear magnetic resonance (NMR) spectroscopy, X-ray crystallography, or cryo-electron microscopy, or predicted by homology modeling. Until recently, it was impossible to determine the structures of the vast majority of protein targets computationally unless they had a close homolog that could serve as a modeling template. AlphaFold has since enabled the prediction of many protein structures, including those without any structurally characterized homologs.319 However, it is still unclear how suitable these models are for docking and other SB methods.320–323 Several AI-based docking methods have also been developed recently that could potentially be faster and more accurate than traditional methods,324–326 but these methods generally do not perform well on benchmarks.327 This underscores the general limitations of computational methods arising from the complexity of biological molecules, the availability and quality of data, and resource constraints. Computational methods are approximations with varying levels of accuracy, and experimental verification is ultimately required to assess their results. In practice, their performance is first evaluated by comparison with previously obtained experimental data. For SB methods, a common validation is to calculate the root-mean-square deviation (RMSD) of a docking pose or MD trajectory relative to an experimentally determined structure; an RMSD of ≤ 2 Å is generally considered satisfactory. For LB methods, internal and external validation against datasets with experimental values, using metrics such as cross-validation, is standard. Despite these challenges, these calculations provide valuable insights, especially in SAR studies, into how variations in ligand structure influence binding affinity and binding free energy, which relate to biological activity. They also enable extremely high-throughput studies that would be impossible in the wet lab.
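The RMSD check can be sketched in a few lines. This minimal version assumes the two poses list the same atoms in the same order and share a reference frame, as is typical when a redocked ligand is compared with the crystallographic ligand in a fixed receptor; comparing MD snapshots would normally require a superposition step first, which is omitted here. The coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in angstroms) between two equally ordered
    lists of (x, y, z) coordinates, without any superposition."""
    if len(coords_a) != len(coords_b):
        raise ValueError("poses must contain the same atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


# Hypothetical 3-atom ligand: crystal pose vs. docked pose shifted along z.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

value = rmsd(crystal, docked)
print(value, "acceptable" if value <= 2.0 else "poor")  # 2 A threshold from the text
```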

6.2. Applications of CADD to natural product SAR

Most CADD studies have focused primarily on unnatural compounds. However, CADD is just as applicable to NPs as it is to synthetic compounds, although conformational searches for NPs are often more challenging because of their generally higher complexity and greater number of rotatable bonds. The chemical space surrounding a known NP, or general regions of NP-like chemical space (e.g., peptides built from the amino acids found in NRPS products or RiPPs), can be used to create a library for virtual screening. Virtual screening is the process of applying docking and other CADD techniques to large libraries of chemical structures.328 SARs can be derived from the results of a virtual screen and confirmed with targeted experiments designed on the basis of those results. Virtual screening can generally evaluate many more compounds than experimental screening. This is especially true for NPs, whose analogs must first be obtained by synthesis, biosynthesis, or isolation from a natural source, all of which are costly. We therefore propose that virtual screening be incorporated into NP drug discovery efforts more often than it currently is.
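Building an NP-like virtual library can be as simple as combinatorial enumeration. A minimal sketch, assuming a hypothetical set of building blocks written as plain residue labels; a real pipeline would convert each sequence into a 3D structure before docking, a step omitted here. The library size grows as (number of building blocks) raised to the peptide length, which is one reason such libraries are better screened virtually than experimentally.

```python
from itertools import product

# Hypothetical building blocks: proteinogenic residues plus monomers of the
# kind NRPS pathways can incorporate (D-amino acids, ornithine, N-methylated residues).
BUILDING_BLOCKS = ["Ala", "Gly", "Ser", "D-Leu", "Orn", "N-Me-Phe"]


def enumerate_peptides(length, blocks=BUILDING_BLOCKS):
    """Yield every linear peptide of the given length as a tuple of residue labels."""
    yield from product(blocks, repeat=length)


library = list(enumerate_peptides(3))
print(len(library))  # len(BUILDING_BLOCKS) ** 3
print(library[:2])
```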

Despite the focus on unnatural compounds in most virtual screens, there have been a few studies that applied CADD methods to NP drug discovery. Sometimes, these efforts focus on optimizing a single NP scaffold. For example, the Shenvi group used docking to determine if a proposed stable analog of Salvinorin A was still able to bind the κ-opioid receptor before investing in the synthesis of the analog.29 Conversely, complexes predicted by docking can be used to rationalize experimentally observed differences in binding affinity, as was done in a study of synthetic cannabidiol analogs with activity against the μ-opioid receptor.329

Other studies have performed virtual screening of large NP libraries. Available resources include databases of NP structures such as NPAtlas,263 COCONUT,330 and Canvass,331 as well as the ZINC library.332 ZINC contains both synthetic and natural compounds and is especially useful for screening because many of its compounds can be purchased, which makes experimental follow-up on virtual hits straightforward. Ideally, several of the techniques described in the previous section are combined to improve the efficiency and accuracy of a virtual screen. Several studies have combined pharmacophore-based and molecular docking screening of NP libraries against the following targets: the X-linked inhibitor of apoptosis protein,333 the SARS-CoV-2 main protease,334 and the enzyme 5-enolpyruvylshikimate-3-phosphate synthase.335 Other studies have combined docking and MD to screen for inhibitors of penicillin-binding proteins and β-lactamases,336 Fascin,337 the SARS-CoV-2 main protease,338 and RAF and MEK kinases.339 One limitation of these studies is that few NPs are commercially available, which makes it difficult to validate hits experimentally. One study addressed this challenge by using extracts from herbs likely to be rich in the virtual screening hit.340 Although CADD is still limited by a lack of accuracy, we believe it is a useful tool, especially when combined with creative computational-experimental feedback loops, and we therefore expect it to play an increasingly important role in NP drug discovery.
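A combined screen is often organized as a funnel in which a cheap ligand-based filter shrinks the library before more expensive structure-based scoring. The sketch below is hypothetical throughout: compounds are represented as sets of fingerprint bits, the similarity cutoff of 0.5 is an arbitrary assumption, and the docking step is represented by precomputed, invented scores rather than calls to any real docking program.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def screening_funnel(library, reference, docking_scores, sim_cutoff=0.5, top_n=2):
    """Stage 1: keep compounds similar to a known active (ligand-based).
    Stage 2: rank survivors by docking score, lower = better (structure-based)."""
    stage1 = [name for name, fp in library.items() if tanimoto(fp, reference) >= sim_cutoff]
    stage2 = sorted(stage1, key=lambda name: docking_scores[name])
    return stage1, stage2[:top_n]


reference_active = {1, 3, 5, 8, 13}
library = {
    "NP-1": {1, 3, 5, 8, 21},
    "NP-2": {1, 3, 5, 13, 34},
    "NP-3": {2, 4, 6, 9},
    "NP-4": {1, 3, 8, 13},
}
docking_scores = {"NP-1": -7.9, "NP-2": -9.4, "NP-3": -10.1, "NP-4": -8.6}  # kcal/mol

passed, hits = screening_funnel(library, reference_active, docking_scores)
print(passed, hits)
```

Note that NP-3, despite the best docking score, never reaches the docking stage; this is the usual trade-off of a funnel, which saves computation at the risk of discarding structurally novel hits.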

7. Explainable AI/ML models for analysis of SAR

7.1. Overview of AI/ML and SAR in Small Molecules

In the past few decades, machine learning (ML) has been increasingly used to develop ML-based SAR models. ML is a subfield of artificial intelligence (AI) that uses data and algorithms to identify patterns and make predictions. The integration of ML has enabled more complex, nonlinear approaches to SAR analysis.341 ML can be broken down into two main categories: supervised and unsupervised learning (Figure 11). This review will focus mainly on supervised learning (Figure 11A), which learns from data labeled with the property or class to be predicted. In ML-based SAR models, supervised learning is used to predict compound properties such as bioactivity or ADMET (absorption, distribution, metabolism, excretion, and toxicity).342,343 Unsupervised learning uses unlabeled training data and identifies patterns without labels or human guidance. In ML-based SAR models, it is useful for learning general patterns of chemical structure to generate feature representations of the data344,345 or for clustering similar compounds.346 Beyond drug discovery, ML-based models have also been applied in materials science347 and organic synthesis.64,348 This section focuses on the use of ML to predict biological activity and its potential applications to the study of NP SAR.

Figure 11. Overview of ML workflows used for the SAR analysis of small molecules.


The workflows are split into two categories: A) supervised learning, in which ML is used for property prediction, and B) unsupervised learning, in which ML methods are mainly used for clustering or dimensionality reduction.

To predict SAR with ML techniques, curated molecular datasets must first be encoded into numerical form. These encodings, termed molecular representations or molecular descriptors, can be 1D, 2D, 3D, or even higher-dimensional.349 The most common are 2D representations, which capture the atoms and their connectivity; popular examples are molecular fingerprints and the molecular graph.350 ML algorithms then use these descriptors to find relationships between molecular structure and the property of interest. ML algorithms range from interpretable linear models, such as linear regression, to more complex deep neural networks (DNNs). Although the more complex models have shown higher prediction accuracy, they achieve it at the expense of interpretability.351–353 Common ML models used in SAR analysis, such as random forests (RFs), SVMs,354 and DNNs,355 are termed “black boxes” because they lack the interpretability of linear models. In other words, users cannot directly understand how black-box models arrive at their predictions. To address this, the field of explainable artificial intelligence (XAI) has emerged to develop methods for interpreting black-box models.
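As a concrete example of a 2D representation, the molecular graph treats atoms as nodes and bonds as edges. The sketch below hand-encodes the heavy atoms of ethanol (hydrogens implicit) as an adjacency matrix; in practice a cheminformatics toolkit would parse this from a SMILES string, and the node features here (element symbols only) are deliberately minimal.

```python
# Ethanol (CCO) as a molecular graph: heavy atoms as nodes, bonds as edges.
atoms = ["C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1)]  # (atom_i, atom_j, bond order)


def adjacency_matrix(n_atoms, bond_list):
    """Symmetric matrix whose (i, j) entry is the bond order between atoms i and j."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bond_list:
        adj[i][j] = order
        adj[j][i] = order
    return adj


adj = adjacency_matrix(len(atoms), bonds)
degrees = [sum(1 for v in row if v) for row in adj]
print(adj)      # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(degrees)  # [1, 2, 1]
```

Graph neural networks consume exactly this kind of node list plus adjacency structure, whereas fingerprint methods hash substructures of the same graph into a fixed-length vector.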

7.2. Explainable Artificial Intelligence

XAI is a broad concept, and in this section we define the most commonly used terminology and explain why XAI is needed in ML-based SAR models. The definitions of two terms, explainability and interpretability, are debated in the literature: some researchers use them interchangeably, while others define them as separate concepts.356,357 In this review, explainability and interpretability are defined separately. Explainability is an active characteristic of a model: its decisions are explained by separate algorithms that probe its internal functions or logic.356,358 Interpretability, on the other hand, is a passive characteristic and refers to a model that a user can inherently understand.359 Under these definitions, linear models and decision tree models are interpretable, whereas black-box models are not.

XAI is a useful technique for ML-based SAR models. Knowing which portions of a chemical structure the model deems important for predicting bioactivity adds support to any predictions the model makes. This helps avoid the Clever Hans effect, which occurs when a model learns spurious correlations in the data, i.e., it produces correct predictions for the wrong reasons.360 XAI also helps bridge the gap between the scientific and machine learning communities, as it provides justifications for predictions that could affect humans and has the potential to improve human understanding of SARs.

7.3. Types of XAI

XAI has been categorized in multiple ways. In this review, we classify XAI methods using a taxonomy (Figure 12) from a previously published survey, which is based on complexity, level of dependency, and scope.361 Because the complexity of a model often determines how dependent an XAI technique is on that model, these two classifications are grouped together.

Figure 12. Taxonomy scheme of XAI methods.


XAI models can be classified by their complexity and scope. Those classified as extrinsic require post-hoc methods for explainability.

XAI models classified by their complexity are either intrinsic or extrinsic. For intrinsic models, explainability comes directly from an interpretable model. Intrinsic XAI methods are model-dependent, meaning they can only be applied to specific models, and include simple, white-box models like linear or decision-tree models. For extrinsic models, explainability comes from post-hoc methods which are applied to the model after training. Explainable methods for deep learning models fall under the category of extrinsic models as they require separate post-hoc methods to understand their decisions. Many post-hoc techniques are model-agnostic, meaning they can be applied to any model.

The scope of an XAI model refers to whether explanations look to understand the model as a whole (global interpretations) or understand individual datapoints (local interpretations). In ML-based SAR models, global interpretations capture general SAR trends and would typically contain multiple SARs. Global interpretations are useful when using a structurally and chemically diverse dataset. Conversely, local interpretations capture SAR trends of individual compounds, identifying functional groups or structural motifs that affect bioactivity. Local interpretations are useful in the optimization stage of drug development when researchers look to improve bioactivity and/or the ADMET profile.353
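The distinction between local and global explanations can be made concrete for a linear model, where a compound's prediction decomposes exactly into per-feature contributions w_j·x_j. A minimal sketch, assuming hypothetical weights and binary substructure features; averaging the absolute local contributions over a dataset is one common way, though not the only one, to summarize them globally.

```python
# Hypothetical linear SAR model over three binary substructure features.
FEATURES = ["primary_amine", "glycoside", "halogen"]
WEIGHTS = [1.5, 0.7, -0.4]


def local_explanation(x):
    """Per-feature contributions w_j * x_j for one compound (local view)."""
    return {f: w * xj for f, w, xj in zip(FEATURES, WEIGHTS, x)}


def global_explanation(dataset):
    """Mean absolute contribution of each feature across a dataset (global view)."""
    totals = {f: 0.0 for f in FEATURES}
    for x in dataset:
        for f, c in local_explanation(x).items():
            totals[f] += abs(c)
    return {f: t / len(dataset) for f, t in totals.items()}


compounds = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
print(local_explanation(compounds[0]))  # why this compound scores as it does
print(global_explanation(compounds))    # which features matter across the set
```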

7.4. Common XAI methods in SAR of small molecules

SAR models of small molecules are typically interpreted by determining descriptor importance, which identifies correlations between descriptors and the predicted property.351,362 If the molecular descriptors are fingerprint- or graph-based, visual explanations can be created that highlight the substructures identified as important for predicting the property. Visual explanations for small molecules include colored molecules and heat maps that color atoms or bonds by importance.363,364 This importance can be derived from models trained on activity without target structural information (a ligand-based approach) or from models trained on protein-ligand structures labeled with binding affinity, in which case the importance should approximate the contribution of a group to ligand binding affinity.365 Note that the choice of molecular descriptors when developing ML-based SAR models is important and can affect the explainability of the model. Interpretable descriptors are those with a clear physicochemical meaning, including various 1D descriptors (e.g., molecular weight and the number of hydrogen bond donors) and topological descriptors. This section is not intended as an exhaustive review of XAI techniques but rather highlights XAI methods that are useful for the SAR analysis of small molecules; readers should refer to existing reviews for more details.356,358,366
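Atom-level heat maps are often built by projecting substructure-level importances back onto atoms. A minimal sketch, assuming a hypothetical mapping from each fingerprint bit to the atom indices of the substructure that set it, and splitting each bit's importance evenly among those atoms; this even split is one simple convention among several.

```python
def atom_importance(bit_importance, bit_to_atoms, n_atoms):
    """Distribute each bit's importance evenly over the atoms of its substructure
    and sum the shares per atom; the result can drive atom coloring."""
    scores = [0.0] * n_atoms
    for bit, weight in bit_importance.items():
        atoms = bit_to_atoms.get(bit, ())
        for a in atoms:
            scores[a] += weight / len(atoms)
    return scores


# Hypothetical 5-atom molecule: two favorable bits and one unfavorable bit.
bit_importance = {12: 0.6, 40: 0.3, 77: -0.2}
bit_to_atoms = {12: (0, 1, 2), 40: (2, 3), 77: (4,)}

scores = atom_importance(bit_importance, bit_to_atoms, n_atoms=5)
print([round(s, 3) for s in scores])
```

Atoms shared by several important substructures (atom 2 here) accumulate the highest scores, which is what makes them stand out in the rendered heat map.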

Feature attribution techniques are post hoc methods that calculate an attribution score for each feature based on its contribution to the model’s prediction. Feature attribution methods can be split into two broad categories: perturbation-based and gradient-based. Perturbation-based methods mask or modify each input feature and measure the effect on the model’s output.367 These methods are typically model-agnostic, as they do not need access to the inner workings of a model. However, they require many forward passes through the model to calculate feature importance and are therefore less computationally efficient than gradient-based techniques. Examples of perturbation-based methods include Local Interpretable Model-Agnostic Explanations (LIME),368 the permutation-based variable importance (VI) measure,369 Randomized Input Sampling for Explanation (RISE),370 and GNNExplainer.371
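The perturbation idea can be illustrated with a minimal occlusion sketch, assuming a binary fingerprint input and a hypothetical toy model (`predict`, `WEIGHTS`, and `BIAS` are illustrative and not taken from any cited work). Each feature is set to a baseline value in turn and the resulting change in output is taken as its attribution, which is why one extra forward pass per feature is required.

```python
import math

# Hypothetical toy model: a logistic function of a binary fingerprint.
# Any trained black-box predictor could be used in its place.
WEIGHTS = [1.5, -0.8, 0.0, 2.0]
BIAS = -1.0


def predict(fingerprint):
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, fingerprint))
    return 1.0 / (1.0 + math.exp(-z))


def occlusion_attributions(model, features, baseline_value=0):
    """Perturbation-based attribution by occluding one feature at a time.

    Each feature is replaced by `baseline_value` and the resulting drop in
    the model output is recorded. Only forward passes are needed (one extra
    pass per feature), so the model is treated as a black box.
    """
    reference = model(features)
    scores = []
    for i in range(len(features)):
        perturbed = list(features)
        perturbed[i] = baseline_value
        scores.append(reference - model(perturbed))
    return scores


x = [1, 1, 0, 1]
attributions = occlusion_attributions(predict, x)
print([round(a, 3) for a in attributions])
```

Positive scores mark bits whose presence raises the prediction, negative scores mark bits that lower it, and bits that are already off receive zero.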

LIME is a local, model-agnostic method that explains a model's predictions through a surrogate model.368 It perturbs the input features of a specific instance and observes the model's corresponding predictions. These results are then used to train a simple interpretable model (e.g., a linear model or decision tree) that approximates the original model's behavior in the neighborhood of that instance. Whitmore et al. used LIME to provide structural interpretations for a model trained to predict Research Octane Number.372 A major limitation of LIME is its sampling procedure, which can generate unrealistic data points373 and frequently produces unstable explanations for complex, nonlinear models.374 In other words, for complex models, LIME can generate very different explanations for neighboring inputs that differ only slightly.
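A minimal LIME-style sketch follows, assuming features can be "switched off" by setting them to zero and using a simplified distance (the fraction of features removed) inside an exponential kernel. The kernel width, sample count, and ridge penalty are arbitrary illustrative choices, not the defaults of the reference LIME implementation, and `toy_model` is a hypothetical black box chosen to be linear so that the recovered surrogate weights can be checked.

```python
import math
import random


def solve(a, b):
    """Solve the square linear system a @ coef = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_explain(model, x, n_samples=500, kernel_width=0.75, ridge=1e-3, seed=0):
    """Minimal LIME-style local surrogate for one instance `x`.

    Perturbed samples switch random subsets of features off (set to 0); each
    sample is weighted by an exponential kernel on the fraction of features
    removed, and a weighted ridge regression on the on/off mask is fitted.
    Returns one coefficient per feature (the intercept is dropped).
    """
    rng = random.Random(seed)
    d = len(x)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in range(d)]            # 1 = feature kept
        perturbed = [xi * mi for xi, mi in zip(x, mask)]
        distance = 1.0 - sum(mask) / d                          # fraction removed
        rows.append([1.0] + [float(mi) for mi in mask])         # intercept + mask
        targets.append(model(perturbed))
        weights.append(math.exp(-distance ** 2 / kernel_width ** 2))
    # Normal equations of weighted ridge regression: (X^T W X + lambda I) beta = X^T W y
    p = d + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             + (ridge if i == j and i > 0 else 0.0)             # no penalty on intercept
             for j in range(p)] for i in range(p)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets)) for i in range(p)]
    return solve(xtwx, xtwy)[1:]


def toy_model(v):
    # Hypothetical black box; linear here so the recovered weights can be checked.
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[3]


coefs = lime_explain(toy_model, [1.0, 1.0, 1.0, 1.0])
print([round(c, 3) for c in coefs])
```

For a nonlinear model, the surrogate weights would only hold near `x`, and rerunning with a different `seed` could change them, which is the source of the instability described above.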

The permutation-based variable importance (VI) measure was first proposed by Breiman for random forest models.369 A model-agnostic version, called model reliance, was later developed by Fisher et al.375 This technique measures the change in prediction error after the values of an input feature are permuted; permuting an important feature causes a large increase in error. Guha and Jurs developed a variant of this method for neural network SAR models.376
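The permutation measure can be sketched in a model-agnostic form, assuming a fitted regression model and a descriptor matrix to evaluate it on; `fitted_model` and the random data are hypothetical. Unlike Breiman's original random forest formulation, which uses out-of-bag samples, this sketch simply permutes columns of the supplied dataset.

```python
import random


def mean_squared_error(model, X, y):
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Model-agnostic permutation importance (increase in error per feature).

    For each feature column, the values are shuffled across samples, which
    breaks its relationship with the target, and the increase in mean squared
    error relative to the unshuffled data is averaged over `n_repeats`.
    """
    rng = random.Random(seed)
    base_error = mean_squared_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(mean_squared_error(model, X_perm, y) - base_error)
        importances.append(sum(increases) / n_repeats)
    return importances


def fitted_model(row):
    # Hypothetical trained model that only uses the first descriptor.
    return 3.0 * row[0]


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(50)]
y = [3.0 * row[0] for row in X]
importances = permutation_importance(fitted_model, X, y)
print(importances)
```

The unused second descriptor receives zero importance because shuffling it never changes a prediction.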

RISE, which is generally applied to tasks with image input data, estimates feature importance by multiplying the input elementwise with random masks and measuring the model's response to each masked input.370 It then generates a saliency map as a linear combination of the masks, weighted by the model's output for each masked input. To our knowledge, RISE has yet to be used to explain an ML-based SAR model. However, it has been used to generate instance-level and model-level explanations for a pollen classification model trained on fluorescence spectra, suggesting that it could also be useful for explaining models trained on image-based representations of small molecules.377
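A simplified RISE sketch is shown below, assuming a 1D input such as a flattened image or spectrum and a hypothetical `toy_model` score. One deliberate simplification: the original method samples low-resolution masks and upsamples them smoothly, whereas here each element is masked independently.

```python
import random


def rise_saliency(model, x, n_masks=2000, keep_prob=0.5, seed=0):
    """Simplified RISE saliency for a 1D input (e.g. a flattened image or spectrum).

    Each random binary mask keeps an element with probability `keep_prob`.
    The saliency of element i is the sum of model scores over the masks in
    which i was kept, normalised by n_masks * keep_prob.
    """
    rng = random.Random(seed)
    saliency = [0.0] * len(x)
    for _ in range(n_masks):
        mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in x]
        score = model([xi * mi for xi, mi in zip(x, mask)])
        for i, mi in enumerate(mask):
            saliency[i] += score * mi
    return [s / (n_masks * keep_prob) for s in saliency]


def toy_model(v):
    # Hypothetical classifier whose score depends only on element 2.
    return v[2]


sal = rise_saliency(toy_model, [1.0] * 5)
print([round(s, 2) for s in sal])
```

Because the toy score depends only on element 2, its saliency approaches 1, while the others approach `keep_prob` (0.5).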

GNNExplainer is applicable to any graph neural network (GNN)-based model.371 It explains a GNN's predictions by learning a graph mask and a feature mask that mask out unimportant parts of the input. To do this, GNNExplainer randomly initializes the masks and then optimizes them to maximize the mutual information between the prediction for the original graph and the prediction for the masked graph. By learning which edges and features can be masked without changing the prediction, GNNExplainer identifies the important subgraph and node features that drive the model's predictions (Figure 13). Wojtuch et al. recently used this technique to determine important molecular features for models trained on four datasets: the ESOL dataset (water solubility), the QM9 dataset (quantum chemical properties), a human metabolic stability dataset, and a rat metabolic stability dataset.378
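The mask-learning step can be sketched on a toy problem, assuming a hypothetical stand-in for a trained GNN whose class logit is a bias plus a fixed contribution from each retained edge (the edge list, contributions, and coefficients are illustrative). The objective mirrors the one GNNExplainer uses for a single prediction: a cross-entropy term for the originally predicted class under the masked graph, plus size and entropy penalties on the mask. The feature mask is omitted, and gradients are derived by hand rather than by automatic differentiation.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical stand-in for a trained GNN: the logit of the originally
# predicted class is a bias plus a fixed contribution from each kept edge.
EDGES = [("C1", "N1"), ("N1", "O1"), ("N1", "O2"), ("C1", "C2"), ("C2", "C3")]
EDGE_CONTRIB = [0.5, 1.5, 1.5, -0.3, 0.0]
BIAS = -1.0


def predict(edge_mask):
    """Probability of the originally predicted class for a soft edge mask."""
    return sigmoid(BIAS + sum(m * w for m, w in zip(edge_mask, EDGE_CONTRIB)))


def explain_edges(steps=1000, lr=0.2, size_coef=0.05, entropy_coef=0.1, seed=0):
    """Learn a soft edge mask with a GNNExplainer-style objective.

    Minimises  -log p(predicted class | masked graph)
               + size_coef * sum(mask) + entropy_coef * sum(H(mask)),
    where mask = sigmoid(theta) and H is the binary entropy, by plain
    gradient descent on theta (gradients derived by hand for the toy model).
    """
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 0.1) for _ in EDGES]          # random initialisation
    for _ in range(steps):
        mask = [sigmoid(t) for t in theta]
        p = predict(mask)
        for e, (m, w) in enumerate(zip(mask, EDGE_CONTRIB)):
            grad_m = (-(1.0 - p) * w                      # prediction term
                      + size_coef                         # sparsity term
                      + entropy_coef * math.log((1.0 - m) / m))  # entropy term
            theta[e] -= lr * grad_m * m * (1.0 - m)       # chain rule through sigmoid
    return [sigmoid(t) for t in theta]


learned_mask = explain_edges()
for edge, value in zip(EDGES, learned_mask):
    print(edge, round(value, 2))
```

The two N-O edges of the hypothetical nitro group end up with mask values near 1, while edges that contribute nothing or lower the prediction are driven towards 0.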

Figure 13. Generated explanation of a SAR model using GNNExplainer.371


A GNN was trained on the MUTAG dataset, which contains mutagenicity data for nitroaromatic compounds, and its predictions were explained with GNNExplainer. The approach was based on the blog post Why should I trust my Graph Neural Network? and its associated Colab notebook.379 a) Feature importance plot generated by GNNExplainer for the compound. b) Visualization of the explanation for this compound. Edges (bonds) colored blue have high mask values, indicating that the model deemed them important for the prediction task; the darker the blue, the more important the bond. For this molecule, the nitro group (NO2), a known mutagenic substructure,380 was highlighted as important when predicting the molecule to be mutagenic.

Gradient-based methods rely on backpropagation to compute the gradients of the model’s output with respect to each input feature,381 which are then used to estimate attribution scores. Gradient-based methods are model-dependent, as they require differentiable models, such as those trained by gradient descent. They also tend to be noisy, producing feature importance maps with irrelevant contributions.382 Examples of gradient-based techniques used in ML-based SAR models include gradient-weighted class activation mapping (Grad-CAM),383 Integrated Gradients,384 Layer-wise Relevance Propagation (LRP),385 and Shapley Additive Explanations (SHAP).386
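To make the idea concrete, the sketch below backpropagates the output of a tiny two-layer network to its inputs by hand. The weights are hypothetical, and the finite-difference function is included only to confirm that the backpropagated gradient is correct. The absolute gradient gives a vanilla saliency map, and multiplying the gradient by the input gives the gradient × input variant.

```python
import math

# Hypothetical weights of a small trained network:
# 3 input features -> 2 tanh hidden units -> 1 output score.
W1 = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7]]
B1 = [0.1, -0.2]
W2 = [1.2, -0.8]
B2 = 0.05


def forward(x):
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, B1)]
    hidden = [math.tanh(h) for h in pre]
    return sum(w * h for w, h in zip(W2, hidden)) + B2, hidden


def input_gradient(x):
    """Backpropagate d(output)/d(input) through the network by hand."""
    _, hidden = forward(x)
    # d(output)/d(pre_k) = W2[k] * (1 - tanh(pre_k)^2)
    d_pre = [w2 * (1.0 - h * h) for w2, h in zip(W2, hidden)]
    return [sum(d_pre[k] * W1[k][i] for k in range(len(W1))) for i in range(len(x))]


def numerical_gradient(x, eps=1e-6):
    """Central finite differences, used only to check the backpropagated gradient."""
    grads = []
    for i in range(len(x)):
        up = [xj + (eps if j == i else 0.0) for j, xj in enumerate(x)]
        down = [xj - (eps if j == i else 0.0) for j, xj in enumerate(x)]
        grads.append((forward(up)[0] - forward(down)[0]) / (2 * eps))
    return grads


x = [1.0, 0.5, -2.0]
grad = input_gradient(x)
saliency = [abs(g) for g in grad]                        # vanilla gradient map
grad_times_input = [g * xi for g, xi in zip(grad, x)]    # gradient x input
print(saliency, grad_times_input)
```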

Grad-CAM is a generalization of class activation mapping (CAM) that can be applied to any convolutional neural network (CNN) architecture. It uses the gradients of the class score with respect to the feature maps of the final convolutional layer to visualize the regions of the input image that the CNN used for classification.383 It has been used to interpret many small molecule SAR models; for example, Zhong et al. used it to interpret SARs in a model predicting the rate constant of a compound’s reaction with OH radicals and to validate the model by visualizing the molecular regions linked to its predictions.387
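The Grad-CAM computation itself is short once the feature maps of the last convolutional layer and the gradients of the class score with respect to them are available. Obtaining those tensors (for example through framework hooks) is assumed here, and the 2×2 maps are hypothetical. In practice, the resulting map is also upsampled to the input resolution before being overlaid on the image.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map for one convolutional layer.

    activations[k] and gradients[k] are H x W nested lists holding the k-th
    feature map A^k of the last convolutional layer and d(class score)/dA^k.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient, i.e. the weight of feature map k
    alphas = [sum(sum(row) for row in grad) / (height * width) for grad in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that increase the class score
    return [[max(0.0, v) for v in row] for row in cam]


# Hypothetical 2x2 feature maps and gradients, as would be captured from a CNN
activations = [[[1.0, 0.0], [0.0, 0.0]],
               [[0.0, 0.0], [0.0, 2.0]]]
gradients = [[[0.4, 0.4], [0.4, 0.4]],
             [[-0.2, -0.2], [-0.2, -0.2]]]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

The region activated by the second feature map is suppressed because that map's pooled gradient is negative, so only the top-left cell remains in the heat map.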

Integrated Gradients is another popular gradient-based technique, designed to satisfy what Sundararajan et al. describe as two fundamental axioms of attribution methods: sensitivity and implementation invariance. To determine the important features of a deep neural network, Integrated Gradients averages the gradients along a straight-line path from a baseline input (an input for which the prediction is neutral or near zero) to the actual input, and multiplies this average by the difference between the input and the baseline.384 Integrated Gradients has been used to investigate protein-ligand binding, cytochrome P450 inhibition, hERG channel inhibition, and passive permeability.388,389 In these studies, the technique recovered known important molecular features for these properties and also identified models that achieved high prediction accuracy by learning spurious correlations.
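A minimal sketch, assuming a differentiable function with a known gradient (the hypothetical `f` and `grad_f` stand in for a trained network), approximates the path integral with a midpoint Riemann sum along the straight line from the baseline to the input. A useful check is the completeness property of Integrated Gradients: the attributions should sum to f(x) minus f(baseline).

```python
def f(x):
    # Hypothetical differentiable "model" with an interaction term
    return x[0] ** 2 * x[1] + 3.0 * x[2]


def grad_f(x):
    return [2.0 * x[0] * x[1], x[0] ** 2, 3.0]


def integrated_gradients(grad_fn, x, baseline, steps=100):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    The gradient is averaged over `steps` points on the straight line from
    `baseline` to `x`, then multiplied by (x - baseline) feature by feature.
    """
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * d for bi, d in zip(baseline, diffs)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    return [d * g for d, g in zip(diffs, avg_grad)]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
ig = integrated_gradients(grad_f, x, baseline)
print(ig, sum(ig), f(x) - f(baseline))
```

Here f(x) = 11 and f(baseline) = 0, and the three attributions come out close to 4/3, 2/3, and 9, which sum to 11.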

LRP interprets the predictions of black-box models by propagating relevance backward through the network.385 It begins at the output layer, assigning relevance to the output neuron(s), and then redistributes this relevance layer by layer down to the input neurons using a set of local propagation rules. LRP is not inherently gradient-based. However, a variant of LRP, ϵ-LRP, can be reformulated in terms of (modified) gradients, and LRP is therefore typically classified as a gradient-based technique.381 For example, Baldassarre and Azizpour used LRP to explain a graph neural network trained to predict the aqueous solubility of organic compounds.390
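The ϵ-rule can be sketched for a tiny bias-free ReLU network with hypothetical weights, assuming the relevance at the output is initialized to the output value. Each layer redistributes relevance in proportion to each input's share a_i·w_ij of the pre-activation z_j, with a small ϵ added to the denominator for numerical stability.

```python
def lrp_epsilon(x, W1, W2, eps=1e-6):
    """LRP-epsilon for a bias-free network: x -> ReLU(W1 x) -> w2 . h.

    Relevance starts as the output value and is redistributed layer by layer
    in proportion to each input's contribution a_i * w_ij to the pre-activation z_j.
    """
    z_hidden = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    hidden = [max(0.0, z) for z in z_hidden]
    output = sum(w * h for w, h in zip(W2, hidden))

    def stabilise(z):
        # The epsilon term keeps denominators away from zero
        return z + eps if z >= 0 else z - eps

    # Output layer -> hidden layer
    r_hidden = [h * w / stabilise(output) * output for h, w in zip(hidden, W2)]
    # Hidden layer -> input layer (inactive ReLU units carry zero relevance)
    r_input = [
        sum(xi * W1[j][i] / stabilise(z_hidden[j]) * r_hidden[j] for j in range(len(W1)))
        for i, xi in enumerate(x)
    ]
    return output, r_input


# Hypothetical weights and input descriptor vector
W1 = [[0.5, 0.2, -0.3], [-1.0, 0.4, 0.1], [0.3, 0.5, -0.2]]
W2 = [1.0, 0.7, -0.4]
x = [1.0, 2.0, -1.0]
output, relevance = lrp_epsilon(x, W1, W2)
print(output, relevance, sum(relevance))
```

Because the network has no biases, the input relevances sum (up to the ϵ term) to the output; adding biases would let part of the relevance be absorbed by them.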

SHAP unifies six existing explanation methods, namely three local explanation methods (LIME, LRP, and DeepLIFT) and three classic Shapley value estimation methods, within a single class of additive feature attribution methods.386 Shapley values are derived from cooperative game theory and were originally used in economics to fairly distribute resources within a group (such as dividing profits or payouts) according to each player’s contribution to the game. Lundberg et al. developed both model-agnostic and model-specific approximation techniques for calculating Shapley values to explain ML models. For example, Kernel SHAP, a model-agnostic technique, combines Linear LIME with Shapley values, whereas Deep SHAP, a model-specific technique, combines DeepLIFT with Shapley values. SHAP satisfies three desirable properties of additive feature attribution methods: local accuracy, missingness, and consistency. In small molecule SAR analysis, SHAP has been used to identify substructure features that affect metabolic stability391 and bioactivity.392
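For a handful of features, Shapley values can be computed exactly by enumerating every coalition, which is the quantity that Kernel SHAP and related estimators approximate. The sketch below assumes that absent features are replaced by a single baseline vector, which is one common but not unique choice of value function; `toy_model` is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction.

    A feature that is 'absent' from a coalition is replaced by its baseline
    value. All 2^d coalitions are enumerated, so this is only practical for a
    handful of features; Kernel SHAP and related estimators approximate it.
    """
    d = len(x)

    def value(coalition):
        return model([x[i] if i in coalition else baseline[i] for i in range(d)])

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_model(v):
    # Hypothetical model with an interaction between features 1 and 2
    return 2.0 * v[0] + v[1] * v[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The interaction term v[1]·v[2] is split equally between features 1 and 2, and the values sum to the model output at x minus the output at the baseline, which is the local accuracy property.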

7.5. Applications of XAI in SAR of NP

ML-based SAR models of NPs have only recently begun to grow in popularity, in part because curated, freely available NP databases of sufficient size and quality for ML have only recently become available. Given the abundance of NP and NP-derived drugs,2 SAR models of NPs are commonly developed to predict bioactivities. Commonly predicted bioactivities include anti-cancer,393–396 anti-microbial,397–400 and anti-inflammatory activities.401,402

Popular encyclopedic NP databases include NPAtlas,263 COCONUT,330 and the Universal Natural Product Database.403 Most notable is COCONUT, which contains the largest and most diverse collection of NPs. Many other databases contain only a particular type of NP, such as NPAtlas, which focuses on microbial NPs, while others are no longer updated or supported. These encyclopedic databases mainly contain structural information and lack bioactivity data. Training an ML model to predict bioactivities therefore requires more specialized databases; for example, anti-cancer NPs can be found in the NPACT404 and NPCARE405 databases. Sorokina and Steinbeck’s review provides a more in-depth survey of the current state of NP databases.406

Despite the growing number of databases in the field, publicly available NP bioactivity data remain scarce. For this reason, many ML-based SAR models of NPs are trained on datasets of synthetic small molecules. However, NPs differ from synthetic small molecules: they typically have higher molecular weights, more hydrogen bond donors and acceptors, more oxygen atoms, and fewer nitrogen atoms. As a result, ML-based SAR models of small molecules are not inherently transferable to NPs, because NPs fall outside these models’ applicability domains,407 that is, the region of chemical space, defined by the model’s training set, in which the model can make reliable and accurate predictions. One potential solution is transfer learning, an ML strategy used when there is insufficient training data for the task of interest. The learned parameters of a model pre-trained on one task, such as predicting the bioactivity of small molecules, can be transferred to or fine-tuned for a new task or domain, such as predicting the bioactivity of NPs (Figure 14).408 Qiang et al. used this technique to fine-tune a model pre-trained on ChEMBL data to predict multiple targets of NPs.409
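One common way to check whether a compound lies within a model's applicability domain is a similarity-based test against the training set. The sketch below assumes fingerprints stored as sets of on-bit indices and an illustrative Tanimoto cutoff of 0.4; both the fingerprints and the threshold are hypothetical, and other domain definitions (e.g. descriptor ranges or leverage) are also in use.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def applicability_domain(query, training_set, threshold=0.4):
    """Similarity-based applicability domain check.

    Returns (inside, nearest): the query is treated as inside the domain if its
    most similar training compound has Tanimoto similarity >= `threshold`.
    """
    nearest = max(tanimoto(query, fp) for fp in training_set)
    return nearest >= threshold, nearest


# Hypothetical fingerprints (sets of on-bit indices)
synthetic_training_set = [{1, 2, 3, 4}, {2, 3, 5, 6}, {1, 4, 7}]
synthetic_like_query = {1, 2, 3, 7}
np_like_query = {3, 10, 11, 12}

print(applicability_domain(synthetic_like_query, synthetic_training_set))
print(applicability_domain(np_like_query, synthetic_training_set))
```

The NP-like query shares only one bit with any training compound, so its nearest-neighbour similarity (1/7) falls below the cutoff and its prediction would be flagged as unreliable.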

Figure 14. Overview of transfer learning in ML-based SAR models of NPs.


An ML model trained on a large dataset of synthetic small molecules can be fine-tuned on a smaller NP dataset to predict the properties of NPs.
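The simplest form of the fine-tuning shown in Figure 14 keeps the pre-trained layers frozen and trains only a new output head on the small NP dataset. In the sketch below, `PRETRAINED_W` is a hypothetical stand-in for weights learned on a large synthetic small-molecule dataset, and the four-compound "NP" dataset is invented; full fine-tuning would also update the pre-trained weights, typically with a smaller learning rate.

```python
import math

# Hypothetical frozen layer, standing in for weights pre-trained on a large
# synthetic small-molecule dataset (values are illustrative only).
PRETRAINED_W = [[1.0, 0.0, -1.0, 0.5], [0.0, 1.0, 0.5, -1.0]]


def embed(x):
    """Frozen feature extractor: ReLU(W x). Not updated during fine-tuning."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fine_tune_head(X, y, epochs=2000, lr=0.5):
    """Train only a new logistic-regression head on the small target dataset."""
    feats = [embed(x) for x in X]
    w, b = [0.0] * len(PRETRAINED_W), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for f, t in zip(feats, y):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - t
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(x, w, b):
    return sigmoid(sum(wi * fi for wi, fi in zip(w, embed(x))) + b)


# Hypothetical small NP dataset: binary descriptors and activity labels
X_np = [[1, 0, 0, 0], [1, 0, 0, 1], [0, 1, 0, 0], [0, 1, 1, 0]]
y_np = [1, 1, 0, 0]
w, b = fine_tune_head(X_np, y_np)
print([round(predict(x, w, b), 2) for x in X_np])
```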

However, the use of XAI in ML-based SAR models of NPs remains limited. The most common XAI application in this area is the classification of compounds as NPs. Kim et al. used a supervised feed-forward neural network, NPClassifier, to classify NP structures at three levels: Pathway (specialized metabolism), Superclass (taxonomic information and chemical properties), and Class (chemical structure).410 Although the authors did not use any of the XAI techniques described in this section, they manually examined how NPClassifier responded to perturbations of NP input structures to determine which structural features the model relied on and why it misclassified certain structures. NP-Scout, developed by Chen et al., is another ML method for classifying small molecules as NPs.411 The classified molecules were visualized using similarity maps412 that highlight the parts of each molecule that the random forest model used to classify it as either an NP or a synthetic small molecule. To our knowledge, the only instance of one of the XAI techniques described above being used in an ML-based SAR model of NPs is the work of Maroni et al.413 Their model was trained on both natural and synthetic molecules to classify compounds as either sweet or bitter, and they used SHAP to obtain both global and local explanations of the model’s decisions.

As the use of black-box models in the SAR analysis of NPs continues to rise, so should the use of XAI techniques to interpret them. Any of the XAI methods described in this review can be applied to ML-based SAR models of NPs. Given the many applications of NPs in drug discovery, XAI can foster collaboration between the scientific and machine learning communities by providing explanations for predictions. In addition to giving insight into a model’s decisions, any substructures or features it identifies could guide the optimization of lead compounds. Going forward, we recommend that results from ML-based SAR models of NPs be supported by explanations from an XAI technique.

Conclusion.

In this review, we have presented experimental and computational methods that can be used to study the SARs of NPs. All of these methods are complementary: different approaches to NP synthesis, derivatization, biosynthesis, and isolation are likely to give access to different analogs. We have presented several examples, such as the antibiotic daptomycin, that have been studied using several of these techniques, illustrating their complementarity. However, many computational techniques, in particular QSAR models and XAI models, require experimental data to build the models. Therefore, we propose that the optimal way to study NP SAR is through an experimental-computational feedback loop, in which experiments are used to validate computational studies and to generate their training data, and computational studies are used to focus synthetic and biosynthetic efforts on the compounds that are most likely to have improved activity or to be informative for refining the computational models. Successful execution of such a feedback loop requires expertise in many domains, including chemical synthesis, bioactivity assay development, synthetic biology, bioinformatics, cheminformatics, and artificial intelligence, and will therefore likely require collaboration between researchers in the NP field. We expect that these collaborative efforts will play a key role in future drug development, especially against emerging threats such as antimicrobial-resistant pathogens and future pandemics.

Acknowledgements.

Writing of this review was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R35GM146987. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References.

1. Dias DA, Urban S. and Roessner U, Metabolites, 2012, 2, 303–336.
2. Newman DJ and Cragg GM, J. Nat. Prod, 2020, 83, 770–803.
3. Huffman BJ and Shenvi RA, J. Am. Chem. Soc, 2019, 141, 3332–3346.
4. Yñigez-Gutierrez AE and Bachmann BO, J. Med. Chem, 2019, 62, 8412–8428.
5. Seukep AJ, Nembu NE, Mbuntcha HG and Kuete V, in Advances in Botanical Research, ed. Kuete V, Academic Press, 2023, vol. 106, pp. 21–45.
6. Yusuf RZ, Duan Z, Lamendola DE, Penson RT and Seiden MV, Curr. Cancer Drug Targets, 2003, 3, 1–19.
7. Guha R, Methods Mol. Biol, 2013, 993, 81.
8. Szymański M, Chmielewska S, Czyżewska U, Malinowska M. and Tylicki A, J. Enzyme Inhib. Med. Chem, 2022, 37, 876–894.
9. Mohamed MA, Elkhateeb WA and Daba GM, Bioresour Bioprocess, 2022, 9, 65.
10. Simoben CV, Babiaka SB, Moumbock AFA, Namba-Nzanguim CT, Eni DB, Medina-Franco JL, Günther S, Ntie-Kang F. and Sippl W, RSC Adv, 2023, 13, 31578.
11. Szpilman AM and Carreira EM, Angew. Chem. Int. Ed Engl, 2010, 49, 9592–9628.
12. Itoh H. and Inoue M, Chem. Rev, 2019, 119, 10002–10031.
13. Truax NJ and Romo D, Nat. Prod. Rep, 2020, 37, 1436–1453.
14. Wu Z-C and Boger DL, Nat. Prod. Rep, 2020, 37, 1511–1531.
15. Li L, Chen Z, Zhang X. and Jia Y, Chem. Rev, 2018, 118, 3752–3832.
16. Tan DS, Foley MA, Shair MD and Schreiber SL, J. Am. Chem. Soc, 1998, 120, 8565–8566.
17. Schreiber SL, Science, 2000, 287, 1964–1969.
18. Gaul C, Njardarson JT, Shan D, Dorn DC, Wu K-D, Tong WP, Huang X-Y, Moore MAS and Danishefsky SJ, J. Am. Chem. Soc, 2004, 126, 11326–11337.
19. Jones SB, Simmons B, Mastracchio A. and MacMillan DWC, Nature, 2011, 475, 183–188.
20. Wilson RM and Danishefsky SJ, J. Org. Chem, 2006, 71, 8329–8351.
21. Goethe O, DiBello M. and Herzon SB, Nat. Chem, 2022, 14, 1270–1277.
22. Ishihara Y. and Baran PS, Synlett, 2010, 2010, 1733–1745.
23. Kanda Y, Ishihara Y, Wilde NC and Baran PS, J. Org. Chem, 2020, 85, 10293–10320.
24. Mendoza A, Ishihara Y. and Baran PS, Nat. Chem, 2011, 4, 21–25.
25. Yuan C, Jin Y, Wilde NC and Baran PS, Angew. Chem. Int. Ed Engl, 2016, 55, 8280–8284.
26. Seiple IB, Zhang Z, Jakubec P, Langlois-Mercier A, Wright PM, Hog DT, Yabu K, Allu SR, Fukuzaki T, Carlsen PN, Kitamura Y, Zhou X, Condakes ML, Szczypiński FT, Green WD and Myers AG, Nature, 2016, 533, 338–345.
27. Abbasov ME, Alvariño R, Chaheine CM, Alonso E, Sánchez JA, Conner ML, Alfonso A, Jaspars M, Botana LM and Romo D, Nat. Chem, 2019, 11, 342–350.
28. Crane EA and Gademann K, Angew. Chem. Int. Ed Engl, 2016, 55, 3882–3902.
29. Roach JJ, Sasano Y, Schmid CL, Zaidi S, Katritch V, Stevens RC, Bohn LM and Shenvi RA, ACS Cent. Sci, 2017, 3, 1329–1336.
30. Bebbington MWP, Chem. Soc. Rev, 2017, 46, 5059–5109.
31. Fürstner A, Acc. Chem. Res, 2021, 54, 861–874.
32. Li G, Lou M. and Qi X, Org. Chem. Front, 2022, 9, 517–571.
33. Nandy JP, Prakesch M, Khadem S, Reddy PT, Sharma U. and Arya P, Chem. Rev, 2009, 109, 1999–2060.
34. Ng V, Kuehne SA and Chan WC, Chemistry, 2018, 24, 9136–9147.
35. Yang H, Chen KH and Nowick JS, ACS Chem. Biol., 2016, 11, 1823–1826.
36. Wu C, Pan Z, Yao G, Wang W, Fang L. and Su W, RSC Adv, 2017, 7, 1923–1926.
37. Jin K, Sam IH, Po KHL, Lin D, Ghazvini Zadeh EH, Chen S, Yuan Y. and Li X, Nat. Commun, 2016, 7, 12394.
38. Abdel Monaim SA, Jad YE, Acosta GA, Naicker T, Ramchuran EJ, El-Faham A, Govender T, Kruger HG, de la Torre BG and Albericio F, RSC Adv, 2016, 6, 73827–73829.
39. Abdel Monaim SAH, Jad YE, Ramchuran EJ, El-Faham A, Govender T, Kruger HG, de la Torre BG and Albericio F, ACS Omega, 2016, 1, 1262–1265.
40. Chen KH, Le SP, Han X, Frias JM and Nowick JS, Chem. Commun., 2017, 53, 11357–11359.
41. Schumacher CE, Harris PWR, Ding X-B, Krause B, Wright TH, Cook GM, Furkert DP and Brimble MA, Org. Biomol. Chem, 2017, 15, 8755–8760.
42. Monaim SAHA, Noki S, Ramchuran EJ, El-Faham A, Albericio F. and de la Torre BG, Molecules, 2017, 22, 1632.
43. Abdel Monaim SAH, Ramchuran EJ, El-Faham A, Albericio F. and de la Torre BG, J. Med. Chem, 2017, 60, 7476–7482.
44. Parmar A, Iyer A, Prior SH, Lloyd DG, Leng Goh ET, Vincent CS, Palmai-Pallag T, Bachrati CZ, Breukink E, Madder A, Lakshminarayanan R, Taylor EJ and Singh I, Chem. Sci, 2017, 8, 8183–8192.
45. Parmar A, Iyer A, Lloyd DG, Vincent CS, Prior SH, Madder A, Taylor EJ and Singh I, Chem. Commun., 2017, 53, 7788–7791.
46. Abdel Monaim SAH, Jad YE, El-Faham A, de la Torre BG and Albericio F, Bioorg. Med. Chem, 2018, 26, 2788–2796.
47. Gallardo-Godoy A, Muldoon C, Becker B, Elliott AG, Lash LH, Huang JX, Butler MS, Pelingon R, Kavanagh AM, Ramu S, Phetsang W, Blaskovich MAT and Cooper MA, J. Med. Chem, 2016, 59, 1068–1077.
48. Murai M, Kaji T, Kuranaga T, Hamamoto H, Sekimizu K. and Inoue M, Angew. Chem. Int. Ed Engl, 2015, 54, 1556–1560.
49. Tannert R, Milroy L-G, Ellinger B, Hu T-S, Arndt H-D and Waldmann H, J. Am. Chem. Soc, 2010, 132, 3063–3077.
50. Chow HY, Po KHL, Jin K, Qiao G, Sun Z, Ma W, Ye X, Zhou N, Chen S. and Li X, ACS Med. Chem. Lett, 2020, 11, 1442–1449.
51. Barnawi G, Noden M, Goodyear J, Marlyn J, Schneider O, Beriashvili D, Schulz S, Moreira R, Palmer M. and Taylor SD, ACS Infect. Dis, 2022, 8, 778–789.
52. Moreira R, Barnawi G, Beriashvili D, Palmer M. and Taylor SD, Bioorg. Med. Chem, 2019, 27, 240–246.
53. Barnawi G, Noden M, Taylor R, Lohani C, Beriashvili D, Palmer M. and Taylor SD, Biopolymers, 2018, 111, e23094.
54. Lohani CR, Taylor R, Palmer M. and Taylor SD, Org. Lett, 2015, 17, 748–751.
55. Lohani CR, Taylor R, Palmer M. and Taylor SD, Bioorg. Med. Chem. Lett, 2015, 25, 5490–5494.
56. Conda-Sheridan M. and Krishnaiah M, in Peptide Synthesis: Methods and Protocols, eds. Hussein WM, Skwarczynski M. and Toth I, Springer US, New York, NY, 2020, pp. 111–128.
57. Nicolaou KC, Winssinger N, Pastor J, Ninkovic S, Sarabia F, He Y, Vourloumis D, Yang Z, Li T, Giannakakou P. and Hamel E, Nature, 1997, 387, 268–272.
58. Nicolaou KC, Vourloumis D, Li T, Pastor J, Winssinger N, He Y, Ninkovic S, Sarabia F, Vallberg H, Roschangar F, King NP, Finlay MRV, Giannakakou P, Verdier-Pinard P. and Hamel E, Angew. Chem. Int. Ed Engl, 1997, 36, 2097–2103.
59. Gao W, Raghavan P. and Coley CW, Nat. Commun, 2022, 13, 1–4.
60. Liu C, Xie J, Wu W, Wang M, Chen W, Idres SB, Rong J, Deng L-W, Khan SA and Wu J, Nat. Chem, 2021, 13, 451–457.
61. Burke MD, Denmark SE, Diao Y, Han J, Switzky R. and Zhao H, AI Mag, 2024, 45, 117–123.
62. Wang W, Angello N, Blair D, Medine K, Tyrikos-Ergas T, Laporte A. and Burke M, ChemRxiv, , DOI: 10.26434/chemrxiv-2023-qpf2x.
63. Sankaranarayanan K. and Jensen KF, Chem. Sci, 2023, 14, 6467–6475.
64. Coley CW, Green WH and Jensen KF, Acc. Chem. Res, 2018, 51, 1281–1289.
65. Coley CW, Rogers L, Green WH and Jensen KF, ACS Cent Sci, 2017, 3, 1237–1245.
66. Szymkuć S, Gajewska EP, Klucznik T, Molga K, Dittwald P, Startek M, Bajczyk M. and Grzybowski BA, Angew. Chem. Int. Ed Engl, 2016, 55, 5904–5937.
67. Shen Y, Borowski JE, Hardy MA, Sarpong R, Doyle AG and Cernak T, Nature Reviews Methods Primers, 2021, 1, 1–23.
68. Majhi S. and Das D, Tetrahedron, 2021, 78, 131801.
69. Huo T, Zhao X, Cheng Z, Wei J, Zhu M, Dou X. and Jiao N, Yao Xue Xue Bao, , DOI: 10.1016/j.apsb.2023.11.021.
70. Kim KE, Kim AN, McCormick CJ and Stoltz BM, J. Am. Chem. Soc, 2021, 143, 16890–16901.
71. Wang Z. and Hui C, Org. Biomol. Chem, 2021, 19, 3791–3812.
72. Lin D, Jiang S, Zhang A, Wu T, Qian Y. and Shao Q, Nat. Products Bioprospect, 2022, 12, 8.
73. Robles O. and Romo D, Nat. Prod. Rep, 2014, 31, 318–334.
74. Shugrue CR and Miller SJ, Chem. Rev, 2017, 117, 11894–11951.
75. Hong B, Luo T. and Lei X, ACS Cent. Sci, 2020, 6, 622–635.
76. Lewis CA and Miller SJ, Angew. Chem. Int. Ed Engl, 2006, 45, 5616–5619.
77. Lewis CA, Merkel J. and Miller SJ, Bioorg. Med. Chem. Lett, 2008, 18, 6007.
78. Lewis CA, Longcore KE, Miller SJ and Wender PA, J. Nat. Prod, 2009, 72, 1864–1869.
79. Fowler BS, Laemmerhold KM and Miller SJ, J. Am. Chem. Soc, 2012, 134, 9755–9761.
80. Yoganathan S. and Miller SJ, J. Med. Chem, 2015, 58, 2367–2377.
81. Han S. and Miller SJ, J. Am. Chem. Soc, 2013, 135, 12414–12421.
82. Pathak TP and Miller SJ, J. Am. Chem. Soc, 2012, 134, 6120–6123.
83. Pathak TP and Miller SJ, J. Am. Chem. Soc, 2013, 135, 8415–8422.
84. Metrano AJ, Chinn AJ, Shugrue CR, Stone EA, Kim B. and Miller SJ, Chem. Rev, 2020, 120, 11479–11615.
85. Yamada T, Suzuki K, Hirose T, Furuta T, Ueda Y, Kawabata T, Ōmura S. and Sunazuka T, Chem. Pharm. Bull., 2016, 64, 856–864.
86. Li J, Grosslight S, Miller SJ, Sigman MS and Toste FD, ACS Catal, 2019, 9, 9794–9799.
87. Gormisky PE and White MC, J. Am. Chem. Soc, 2013, 135, 14052–14055.
88. Howell JM, Feng K, Clark JR, Trzepkowski LJ and White MC, J. Am. Chem. Soc, 2015, 137, 14590–14593.
89. White MC and Zhao J, J. Am. Chem. Soc, 2018, 140, 13988–14009.
90. Gómez L, Canta M, Font D, Prat I, Ribas X. and Costas M, J. Org. Chem, 2013, 78, 1421–1433.
91. Zhao C, Ye Z, Ma Z-X, Wildman SA, Blaszczyk SA, Hu L, Guizei IA and Tang W, Nat. Commun, 2019, 10, 1–10.
92. Chen X, Wang Y, Ma N, Tian J, Shao Y, Zhu B, Wong YK, Liang Z, Zou C. and Wang J, Signal Transduction and Targeted Therapy, 2020, 5, 1–13.
93. Meissner F, Geddes-McAlister J, Mann M. and Bantscheff M, Nat. Rev. Drug Discov, 2022, 21, 637–654.
94. Prudent R, Annis DA, Dandliker PJ, Ortholand J-Y and Roche D, Nature Reviews Chemistry, 2020, 5, 62–71.
95. Li G, Peng X, Guo Y, Gong S, Cao S. and Qiu F, Front. Chem, 2021, 9, 761609.
  • 96.Wright MH and Sieber SA, Nat. Prod. Rep, 2016, 33, 681–708. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Bhukta S, Gopinath P. and Dandela R, RSC Adv, 2021, 11, 27950–27964. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Zhang H-W, Lv C, Zhang L-J, Guo X, Shen Y-W, Nagle DG, Zhou Y-D, Liu S-H, Zhang W-D and Luan X, Biomed. Pharmacother, 2021, 141, 111833. [DOI] [PubMed] [Google Scholar]
  • 99.Gao Y, Ma M, Li W. and Lei X, Adv. Sci, 2024, 11, e2305608. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Lomenick B, Olsen RW and Huang J, ACS Chem. Biol., 2011, 6, 34–46. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Mateus A, Kurzawa N, Perrin J, Bergamini G. and Savitski MM, Annu. Rev. Pharmacol. Toxicol, 2022, 62, 465–482. [DOI] [PubMed] [Google Scholar]
  • 102.Moseley FL, Bicknell KA, Marber MS and Brooks G, J. Pharm. Pharmacol, 2007, 59, 609–628. [DOI] [PubMed] [Google Scholar]
  • 103.Kries H, Trottmann F. and Hertweck C, Angew. Chem. Weinheim Bergstr. Ger,, DOI: 10.1002/ange.202309284. [DOI] [PubMed] [Google Scholar]
  • 104.Romero E, Jones BS, Hogg BN, Rué Casamajo A, Hayes MA, Flitsch SL, Turner NJ and Schnepel C, Angew. Chem. Int. Ed Engl, 2021, 60, 16824–16855. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Zetzsche LE and Narayan ARH, Nat Rev Chem, 2020, 4, 334–346. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Chen K. and Arnold FH, Nature Catalysis, 2020, 3, 203–213. [Google Scholar]
  • 107.Liu J, Tian J, Perry C, Lukowski AL, Doukov TI, Narayan ARH and Bridwell-Rabb J, Nat. Commun, 2022, 13, 1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Amatuni A, Shuster A, Abegg D, Adibekian A. and Renata H, ACS Cent. Sci, 2023, 9, 239–251. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Zwick Iii CR, Sosa MB and Renata H, J. Am. Chem. Soc, 2021, 143, 1673–1679. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Zhang X, King-Smith E, Dong L-B, Yang L-C, Rudolf JD, Shen B. and Renata H, Science, 2020, 369, 799–806. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Li F, Deng H. and Renata H, J. Am. Chem. Soc, 2022, 144, 7616–7621. [DOI] [PubMed] [Google Scholar]
  • 112.Li J, Li F, King-Smith E. and Renata H, Nat. Chem, 2020, 12, 173–179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Kolev JN, O’Dwyer KM, Jordan CT and Fasan R, ACS Chem. Biol., 2014, 9, 164–173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Alwaseem H, Giovani S, Crotti M, Welle K, Jordan CT, Ghaemmaghami S. and Fasan R, ACS Cent Sci, 2021, 7, 841–857. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Stout CN and Renata H, Acc. Chem. Res, 2021, 54, 1143–1156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Ren X. and Fasan R, Curr Opin Green Sustain Chem,, DOI: 10.1016/j.cogsc.2021.100494. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Lowell AN, DeMars MDII, Slocum ST, Yu F, Anand K, Chemler JA, Korakavi N, Priessnitz JK, Park SR, Koch AA, Schultz PJ and Sherman DH, J. Am. Chem. Soc, 2017, 139, 7913–7920.
  • 118.Espinoza RV, Haatveit KC, Grossman SW, Tan JY, McGlade CA, Khatri Y, Newmister SA, Schmidt JJ, Garcia-Borràs M, Montgomery J, Houk KN and Sherman DH, ACS Catal, 2021, 11, 8304–8316.
  • 119.Zetzsche LE, Chakrabarty S. and Narayan ARH, J. Am. Chem. Soc, 2022, 144, 5214–5225.
  • 120.Hohlman RM, Newmister SA, Sanders JN, Khatri Y, Li S, Keramati NR, Lowell AN, Houk KN and Sherman DH, ACS Catal, 2021, 11, 4670–4681.
  • 121.Budimir ZL, Patel RS, Eggly A, Evans CN, Rondon-Cordero HM, Adams JJ, Das C. and Parkinson EI, Nat. Chem. Biol, 2023, 20, 120–128.
  • 122.Kassa G, Liu J, Hartman TW, Dhiman S, Gadhamshetty V. and Gnimpieba E, in Microbial Stress Response: Mechanisms and Data Science, American Chemical Society, 2023, vol. 1434, pp. 93–111.
  • 123.Yang W, Fidelis TT and Sun W-H, ACS Omega, 2020, 5, 83–88.
  • 124.Yang J, Li F-Z and Arnold FH, ACS Cent. Sci, DOI: 10.1021/acscentsci.3c01275.
  • 125.Mordhorst S, Ruijne F, Vagstad AL, Kuipers OP and Piel J, RSC Chem. Biol, 2023, 4, 7–36.
  • 126.Süssmuth RD and Mainz A, Angew. Chem. Int. Ed Engl, 2017, 56, 3770–3821.
  • 127.Hertweck C, Angew. Chem. Int. Ed Engl, 2009, 48, 4688–4716.
  • 128.Nivina A, Yuet KP, Hsu J. and Khosla C, Chem. Rev, 2019, 119, 12524–12547.
  • 129.Fischbach MA, Walsh CT and Clardy J, Proc. Natl. Acad. Sci. U. S. A, 2008, 105, 4601–4608.
  • 130.Robbins T, Liu Y-C, Cane DE and Khosla C, Curr. Opin. Struct. Biol, 2016, 41, 10–18.
  • 131.Ortega MA and van der Donk WA, Cell Chem Biol, 2016, 23, 31–44.
  • 132.Hudson GA and Mitchell DA, Curr. Opin. Microbiol, 2018, 45, 61–69.
  • 133.Montalbán-López M, Scott TA, Ramesh S, Rahman IR, van Heel AJ, Viel JH, Bandarian V, Dittmann E, Genilloud O, Goto Y, Grande Burgos MJ, Hill C, Kim S, Koehnke J, Latham JA, Link AJ, Martínez B, Nair SK, Nicolet Y, Rebuffat S, Sahl H-G, Sareen D, Schmidt EW, Schmitt L, Severinov K, Süssmuth RD, Truman AW, Wang H, Weng J-K, van Wezel GP, Zhang Q, Zhong J, Piel J, Mitchell DA, Kuipers OP and van der Donk WA, Nat. Prod. Rep, 2021, 38, 130–239.
  • 134.Arnison PG, Bibb MJ, Bierbaum G, Bowers AA, Bugni TS, Bulaj G, Camarero JA, Campopiano DJ, Challis GL, Clardy J, Cotter PD, Craik DJ, Dawson M, Dittmann E, Donadio S, Dorrestein PC, Entian K-D, Fischbach MA, Garavelli JS, Göransson U, Gruber CW, Haft DH, Hemscheidt TK, Hertweck C, Hill C, Horswill AR, Jaspars M, Kelly WL, Klinman JP, Kuipers OP, Link AJ, Liu W, Marahiel MA, Mitchell DA, Moll GN, Moore BS, Müller R, Nair SK, Nes IF, Norris GE, Olivera BM, Onaka H, Patchett ML, Piel J, Reaney MJT, Rebuffat S, Ross RP, Sahl H-G, Schmidt EW, Selsted ME, Severinov K, Shen B, Sivonen K, Smith L, Stein T, Süssmuth RD, Tagg JR, Tang G-L, Truman AW, Vederas JC, Walsh CT, Walton JD, Wenzel SC, Willey JM and van der Donk WA, Nat. Prod. Rep, 2013, 30, 108–160.
  • 135.Hoshino Y. and Gaucher EA, Mol. Biol. Evol, 2018, 35, 2185–2197.
  • 136.Malico AA, Calzini MA, Gayen AK and Williams GJ, J. Ind. Microbiol. Biotechnol, 2020, 47, 675–702.
  • 137.Weissman KJ, Nat. Prod. Rep, 2016, 33, 203–230.
  • 138.Neves RPP, Ferreira P, Medina FE, Paiva P, Sousa JPM, Viegas MF, Fernandes PA and Ramos MJ, Top. Catal, 2022, 65, 544–562.
  • 139.Ruijne F. and Kuipers OP, Biochem. Soc. Trans, 2021, 49, 203–215.
  • 140.Fu Y, Xu Y, Ruijne F. and Kuipers OP, FEMS Microbiol. Rev, DOI: 10.1093/femsre/fuad017.
  • 141.Winn M, Fyans JK, Zhuo Y. and Micklefield J, Nat. Prod. Rep, 2016, 33, 317–347.
  • 142.Do T. and Link AJ, Biochemistry, 2023, 62, 201–209.
  • 143.Beck C, Garzón JFG and Weber T, Biotechnol. Bioprocess Eng, 2020, 25, 886–894.
  • 144.Hopwood DA, Malpartida F, Kieser HM, Ikeda H, Duncan J, Fujii I, Rudd BA, Floss HG and Omura S, Nature, 1985, 314, 642–644.
  • 145.Omura S, Ikeda H, Malpartida F, Kieser HM and Hopwood DA, Antimicrob. Agents Chemother, 1986, 29, 13–19.
  • 146.Oliynyk M, Brown MJ, Cortés J, Staunton J. and Leadlay PF, Chem. Biol, 1996, 3, 833–839.
  • 147.Staunton J. and Wilkinson B, Chem. Rev, 1997, 97, 2611–2630.
  • 148.Marsden AF, Wilkinson B, Cortés J, Dunster NJ, Staunton J. and Leadlay PF, Science, 1998, 279, 199–202.
  • 149.Klaus M. and Grininger M, Nat. Prod. Rep, 2018, 35, 1070–1081.
  • 150.McDaniel R, Thamchaipenet A, Gustafsson C, Fu H, Betlach M. and Ashley G, Proc. Natl. Acad. Sci. U. S. A, 1999, 96, 1846–1851.
  • 151.Zhu ZJ, Krasnykh O, Pan D, Petukhova V, Yu G, Liu Y, Liu H, Hong S, Wang Y, Wan B, Liang W. and Franzblau SG, Tuberculosis, 2008, 88 Suppl 1, S49–63.
  • 152.Mamada SS, Nainu F, Masyita A, Frediansyah A, Utami RN, Salampe M, Emran TB, Lima CMG, Chopra H. and Simal-Gandara J, Mar. Drugs, DOI: 10.3390/md20110691.
  • 153.Buyachuihan L, Zhao Y, Schelhas C. and Grininger M, ACS Chem. Biol., 2023, 18, 1500–1509.
  • 154.Wlodek A, Kendrew SG, Coates NJ, Hold A, Pogwizd J, Rudder S, Sheehan LS, Higginbotham SJ, Stanley-Smith AE, Warneck T, Nur-E-Alam M, Radzom M, Martin CJ, Overvoorde L, Samborskyy M, Alt S, Heine D, Carter GT, Graziani EI, Koehn FE, McDonald L, Alanine A, Rodríguez Sarmiento RM, Chao SK, Ratni H, Steward L, Norville IH, Sarkar-Tyson M, Moss SJ, Leadlay PF, Wilkinson B. and Gregory MA, Nat. Commun, 2017, 8, 1–10.
  • 155.Gregory MA, Kaja AL, Kendrew SG, Coates NJ, Warneck T, Nur-e-Alam M, Lill RE, Sheehan LS, Chudley L, Moss SJ, Sheridan RM, Quimpere M, Zhang M-Q, Martin CJ and Wilkinson B, Chem. Sci, 2013, 4, 1046–1052.
  • 156.Zhang J, Yan Y-J, An J, Huang S-X, Wang X-J and Xiang W-S, Microb. Cell Fact, 2015, 14, 152.
  • 157.Debono M, Abbott BJ, Molloy RM, Fukuda DS, Hunt AH, Daupert VM, Counter FT, Ott JL, Carrell CB and Howard LC, J. Antibiot., 1988, 41, 1093–1105.
  • 158.Debono M, Barnhart M, Carrell CB, Hoffmann JA, Occolowitz JL, Abbott BJ, Fukuda DS, Hamill RL, Biemann K. and Herlihy WC, J. Antibiot., 1987, 40, 761–777.
  • 159.Fukuda DS, Du Bus RH, Baker PJ, Berry DM and Mynderse JS, J. Antibiot., 1990, 43, 594–600.
  • 160.Piper KE, Steckelberg JM and Patel R, J. Infect. Chemother, 2005, 11, 207–209.
  • 161.Silverman JA, Mortin LI, Vanpraagh ADG, Li T. and Alder J, J. Infect. Dis, 2005, 191, 2149–2152.
  • 162.Baltz RH, ACS Synth. Biol, 2014, 3, 748–758.
  • 163.Miao V, Coëffet-Le Gal M-F, Nguyen K, Brian P, Penn J, Whiting A, Steele J, Kau D, Martin S, Ford R, Gibson T, Bouchard M, Wrigley SK and Baltz RH, Chem. Biol, 2006, 13, 269–276.
  • 164.Nguyen KT, Ritz D, Gu J-Q, Alexander D, Chu M, Miao V, Brian P. and Baltz RH, Proc. Natl. Acad. Sci. U. S. A, 2006, 103, 17462–17467.
  • 165.Doekel S, Coëffet-Le Gal M-F, Gu J-Q, Chu M, Baltz RH and Brian P, Microbiology, 2008, 154, 2872–2880.
  • 166.Nguyen KT, He X, Alexander DC, Li C, Gu J-Q, Mascio C, Van Praagh A, Mortin L, Chu M, Silverman JA, Brian P. and Baltz RH, Antimicrob. Agents Chemother, 2010, 54, 1404–1413.
  • 167.Moreira R. and Taylor SD, ACS Infect. Dis, 2022, 8, 1935–1947.
  • 168.Kopp F, Grünewald J, Mahlert C. and Marahiel MA, Biochemistry, 2006, 45, 10474–10481.
  • 169.Karas JA, Carter GP, Howden BP, Turner AM, Paulin OKA, Swarbrick JD, Baker MA, Li J. and Velkov T, J. Med. Chem, 2020, 63, 13266–13290.
  • 170.Bozhüyük KAJ, Präve L, Kegler C, Schenk L, Kaiser S, Schelhas C, Shi Y-N, Kuttenlochner W, Schreiber M, Kandler J, Alanjary M, Mohiuddin TM, Groll M, Hochberg GKA and Bode HB, Science, 2024, 383, eadg4320.
  • 171.Mabesoone MFJ, Leopold-Messer S, Minas HA, Chepkirui C, Chawengrum P, Reiter S, Meoded RA, Wolf S, Genz F, Magnus N, Piechulla B, Walker AS and Piel J, Science, 2024, 383, 1312–1317.
  • 172.Baltz RH, in Natural Products, John Wiley & Sons, Inc., Hoboken, NJ, USA, 2014, pp. 433–454.
  • 173.Bozhüyük KAJ, Micklefield J. and Wilkinson B, Curr. Opin. Microbiol, 2019, 51, 88–96.
  • 174.Huang H-M, Stephan P. and Kries H, Cell Chem. Biol, 2021, 28, 221–227.e7.
  • 175.Camus A, Gantz M. and Hilvert D, ACS Chem. Biol, 2023, 18, 2516–2523.
  • 176.Bozhüyük KAJ, Linck A, Tietze A, Kranz J, Wesche F, Nowak S, Fleischhacker F, Shi Y-N, Grün P. and Bode HB, Nat. Chem, 2019, 11, 653–661.
  • 177.Bozhüyük KAJ, Fleischhacker F, Linck A, Wesche F, Tietze A, Niesert C-P and Bode HB, Nat. Chem, 2017, 10, 275–281.
  • 178.Bozhueyuek KAJ, Watzel J, Abbood N. and Bode HB, Angew. Chem. Int. Ed Engl, 2021, 60, 17531–17538.
  • 179.Eng CH, Backman TWH, Bailey CB, Magnan C, García Martín H, Katz L, Baldi P. and Keasling JD, Nucleic Acids Res, 2018, 46, D509–D515.
  • 180.Tao XB, LaFrance S, Xing Y, Nava AA, Martin HG, Keasling JD and Backman TWH, Nucleic Acids Res, 2023, 51, D532–D538.
  • 181.Alanjary M, Cano-Prieto C, Gross H. and Medema MH, Nat. Prod. Rep, 2019, 36, 1249–1261.
  • 182.Gao L, Guo J, Fan Y, Ma Z, Lu Z, Zhang C, Zhao H. and Bie X, Microb. Cell Fact, 2018, 17, 84.
  • 183.Su L, Hôtel L, Paris C, Chepkirui C, Brachmann AO, Piel J, Jacob C, Aigle B. and Weissman KJ, Nat. Commun, 2022, 13, 1–16.
  • 184.Kudo K, Nishimura T, Kozone I, Hashimoto J, Kagaya N, Suenaga H, Ikeda H. and Shin-ya K, Sci. Rep, 2021, 11, 1–10.
  • 185.Hwang S, Lee N, Cho S, Palsson B. and Cho B-K, Front Mol Biosci, 2020, 7, 87.
  • 186.Wang H, Fewer DP, Holm L, Rouhiainen L. and Sivonen K, Proc. Natl. Acad. Sci. U. S. A, 2014, 111, 9259–9264.
  • 187.Cane DE and Walsh CT, Chem. Biol, 1999, 6, R319–25.
  • 188.Du L, Sánchez C. and Shen B, Metab. Eng, 2001, 3, 78–95.
  • 189.Fisch KM, RSC Adv, 2013, 3, 18228–18247.
  • 190.Shen B, Du L, Sanchez C, Edwards DJ, Chen M. and Murrell JM, J. Ind. Microbiol. Biotechnol, 2001, 27, 378–385.
  • 191.Wu K, Chung L, Revill WP, Katz L. and Reeves CD, Gene, 2000, 251, 81–90.
  • 192.Moss NA, Seiler G, Leão TF, Castro-Falcón G, Gerwick L, Hughes CC and Gerwick WH, Angew. Chem. Int. Ed Engl, 2019, 58, 9027–9031.
  • 193.Awakawa T, Fujioka T, Zhang L, Hoshino S, Hu Z, Hashimoto J, Kozone I, Ikeda H, Shin-Ya K, Liu W. and Abe I, Nat. Commun, 2018, 9, 1–10.
  • 194.Awakawa T, Chem. Pharm. Bull., 2021, 69, 415–420.
  • 195.Ritacco FV, Graziani EI, Summers MY, Zabriskie TM, Yu K, Bernan VS, Carter GT and Greenstein M, Appl. Environ. Microbiol, 2005, 71, 1971–1976.
  • 196.Boettger D. and Hertweck C, Chembiochem, 2013, 14, 28–42.
  • 197.Nielsen ML, Isbrandt T, Petersen LM, Mortensen UH, Andersen MR, Hoof JB and Larsen TO, PLoS One, 2016, 11, e0161199.
  • 198.Zhang L, Wang C, Chen K, Zhong W, Xu Y. and Molnár I, Nat. Prod. Rep, 2023, 40, 62–88.
  • 199.Groß S, Panter F, Pogorevc D, Seyfert CE, Deckarm S, Bader CD, Herrmann J. and Müller R, Chem. Sci, 2021, 12, 11882–11893.
  • 200.Schmitt S, Montalbán-López M, Peterhoff D, Deng J, Wagner R, Held M, Kuipers OP and Panke S, Nat. Chem. Biol, 2019, 15, 437–443.
  • 201.Thokkadam A, Do T, Ran X, Brynildsen MP, Yang ZJ and Link AJ, ACS Cent. Sci, 2023, 9, 540–550.
  • 202.Becka SA, Zeiser ET, LiPuma JJ and Papp-Wallace KM, Antimicrob. Agents Chemother, 2021, 65, e0133221.
  • 203.Wu C. and van der Donk WA, Curr. Opin. Biotechnol, 2021, 69, 221–231.
  • 204.Zhao X, Li Z. and Kuipers OP, Cell Chem Biol, 2020, 27, 1262–1271.e4.
  • 205.Malico AA, Nichols L. and Williams GJ, Curr. Opin. Chem. Biol, 2020, 58, 45–53.
  • 206.Quirós LM, Carbajo RJ, Braña AF and Salas JA, J. Biol. Chem, 2000, 275, 11713–11720.
  • 207.Olano C, Méndez C. and Salas JA, Nat. Prod. Rep, 2010, 27, 571–616.
  • 208.He H, Appl. Microbiol. Biotechnol, 2005, 67, 444–452.
  • 209.Ortiz-López FJ, Carretero-Molina D, Sánchez-Hidalgo M, Martín J, González I, Román-Hurtado F, de la Cruz M, García-Fernández S, Reyes F, Deisinger JP, Müller A, Schneider T. and Genilloud O, Angew. Chem. Int. Ed Engl, 2020, 59, 12654–12658.
  • 210.Norris GE and Patchett ML, Curr. Opin. Struct. Biol, 2016, 40, 112–119.
  • 211.Iorio M, Sasso O, Maffioli SI, Bertorelli R, Monciardini P, Sosio M, Bonezzi F, Summa M, Brunati C, Bordoni R, Corti G, Tarozzo G, Piomelli D, Reggiani A. and Donadio S, ACS Chem. Biol., 2014, 9, 398–404.
  • 212.Sheng W, Xu B, Chen S, Li Y, Liu B. and Wang H, Org. Biomol. Chem, 2020, 18, 6095–6099.
  • 213.Ren H, Biswas S, Ho S, van der Donk WA and Zhao H, ACS Chem. Biol., 2018, 13, 2966–2972.
  • 214.Choudhary P. and Rao A, Glycoconj. J, 2021, 38, 233–250.
  • 215.Oman TJ, Boettcher JM, Wang H, Okalibe XN and van der Donk WA, Nat. Chem. Biol, 2011, 7, 78–80.
  • 216.Peng H, Ishida K, Sugimoto Y, Jenke-Kodama H. and Hertweck C, Nat. Commun, 2019, 10, 1–14.
  • 217.Ye S, Ballin G, Pérez-Victoria I, Braña AF, Martín J, Reyes F, Salas JA and Méndez C, Microb. Biotechnol, 2022, 15, 2905–2916.
  • 218.Yan Y, Wang H, Song Y, Zhu D, Shen Y. and Li Y, ACS Synth. Biol, 2021, 10, 2434–2439.
  • 219.Song X, Lv J, Cao Z, Huang H, Chen G, Awakawa T, Hu D, Gao H, Abe I. and Yao X, Acta Pharm. Sin. B, 2021, 11, 1676–1685.
  • 220.Yeo WL, Heng E, Tan LL, Lim YW, Ching KC, Tsai D-J, Jhang YW, Lauderdale T-L, Shia K-S, Zhao H, Ang EL, Zhang MM, Lim YH and Wong FT, Microb. Cell Fact, 2020, 19, 3.
  • 221.Kato N, Furutani S, Otaka J, Noguchi A, Kinugasa K, Kai K, Hayashi H, Ihara M, Takahashi S, Matsuda K. and Osada H, ACS Chem. Biol., 2018, 13, 561–566.
  • 222.Tibrewal N. and Tang Y, Annu. Rev. Chem. Biomol. Eng, 2014, 5, 347–366.
  • 223.Chevrette MG, Gutiérrez-García K, Selem-Mojica N, Aguilar-Martínez C, Yañez-Olvera A, Ramos-Aboites HE, Hoskisson PA and Barona-Gómez F, Nat. Prod. Rep, 2020, 37, 566–599.
  • 224.Chevrette MG and Currie CR, J. Ind. Microbiol. Biotechnol, 2019, 46, 257–271.
  • 225.Stone MJ and Williams DH, Mol. Microbiol, 1992, 6, 29–34.
  • 226.Challis GL and Hopwood DA, Proc. Natl. Acad. Sci. U. S. A, 2003, 100 Suppl 2, 14555–14561.
  • 227.Fischbach MA, Curr. Opin. Microbiol, 2009, 12, 520–527.
  • 228.Ju K-S and Nair SK, Curr. Opin. Chem. Biol, 2022, 71, 102214.
  • 229.Parkinson EI, Erb A, Eliot AC, Ju K-S and Metcalf WW, Nat. Chem. Biol, 2019, 15, 1049–1056.
  • 230.Rokas A, Mead ME, Steenwyk JL, Raja HA and Oberlies NH, Nat. Prod. Rep, 2020, 37, 868–878.
  • 231.Medema MH, Cimermancic P, Sali A, Takano E. and Fischbach MA, PLoS Comput. Biol, 2014, 10, e1004016.
  • 232.Firn RD and Jones CG, Nat. Prod. Rep, 2003, 20, 382–391.
  • 233.Firn RD and Jones CG, Mol. Microbiol, 2000, 37, 989–994.
  • 234.Bode HB, Bethe B, Höfs R. and Zeeck A, Chembiochem, 2002, 3, 619–627.
  • 235.Scherlach K. and Hertweck C, Nat. Commun, 2021, 12, 1–12.
  • 236.Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ, J. Mol. Biol, 1990, 215, 403–410.
  • 237.Buchfink B, Xie C. and Huson DH, Nat. Methods, 2015, 12, 59–60.
  • 238.Finn RD, Clements J. and Eddy SR, Nucleic Acids Res, 2011, 39, W29–37.
  • 239.Weber T, Rausch C, Lopez P, Hoof I, Gaykova V, Huson DH and Wohlleben W, J. Biotechnol, 2009, 140, 13–17.
  • 240.Li MHT, Ung PMU, Zajkowski J, Garneau-Tsodikova S. and Sherman DH, BMC Bioinformatics, 2009, 10, 185.
  • 241.Blin K, Shaw S, Augustijn HE, Reitz ZL, Biermann F, Alanjary M, Fetter A, Terlouw BR, Metcalf WW, Helfrich EJN, van Wezel GP, Medema MH and Weber T, Nucleic Acids Res, 2023, 51, W46–W50.
  • 242.Medema MH, Takano E. and Breitling R, Mol. Biol. Evol, 2013, 30, 1218–1223.
  • 243.Hannigan GD, Prihoda D, Palicka A, Soukup J, Klempir O, Rampula L, Durcak J, Wurst M, Kotowski J, Chang D, Wang R, Piizzi G, Temesi G, Hazuda DJ, Woelk CH and Bitton DA, Nucleic Acids Res, 2019, 47, e110.
  • 244.Tietz JI, Schwalen CJ, Patel PS, Maxson T, Blair PM, Tai H-C, Zakai UI and Mitchell DA, Nat. Chem. Biol, 2017, 13, 470–478.
  • 245.Navarro-Muñoz JC, Selem-Mojica N, Mullowney MW, Kautsar SA, Tryon JH, Parkinson EI, De Los Santos ELC, Yeong M, Cruz-Morales P, Abubucker S, Roeters A, Lokhorst W, Fernandez-Guerra A, Cappelini LTD, Goering AW, Thomson RJ, Metcalf WW, Kelleher NL, Barona-Gomez F. and Medema MH, Nat. Chem. Biol, 2020, 16, 60–68.
  • 246.Kautsar SA, van der Hooft JJJ, de Ridder D. and Medema MH, Gigascience, 2021, 10, giaa154.
  • 247.Cruz-Morales P, Kopp JF, Martínez-Guerrero C, Yáñez-Guerra LA, Selem-Mojica N, Ramos-Aboites H, Feldmann J. and Barona-Gómez F, Genome Biol. Evol, 2016, 8, 1906–1916.
  • 248.Skinnider MA, Johnston CW, Gunabalasingam M, Merwin NJ, Kieliszek AM, MacLellan RJ, Li H, Ranieri MRM, Webster ALH, Cao MPT, Pfeifle A, Spencer N, To QH, Wallace DP, Dejong CA and Magarvey NA, Nat. Commun, 2020, 11, 1–9.
  • 249.Mungan MD, Alanjary M, Blin K, Weber T, Medema MH and Ziemert N, Nucleic Acids Res, 2020, 48, W546–W552.
  • 250.Hadjithomas M, Chen I-MA, Chu K, Huang J, Ratner A, Palaniappan K, Andersen E, Markowitz V, Kyrpides NC and Ivanova NN, Nucleic Acids Res, 2017, 45, D560–D565.
  • 251.Gilchrist CLM, Booth TJ, van Wersch B, van Grieken L, Medema MH and Chooi Y-H, Bioinformatics Advances, 2021, 1, vbab016.
  • 252.Gilchrist CLM and Chooi Y-H, Bioinformatics, 2021, 37, 2473–2475.
  • 253.van den Belt M, Gilchrist C, Booth TJ, Chooi Y-H, Medema MH and Alanjary M, BMC Bioinformatics, 2023, 24, 181.
  • 254.Salamzade R, Cheong JZA, Sandstrom S, Swaney MH, Stubbendieck RM, Starr NL, Currie CR, Singh AM and Kalan LR, Microb Genom, DOI: 10.1099/mgen.0.000988.
  • 255.Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E. and Breitling R, Nucleic Acids Res, 2011, 39, W339–46.
  • 256.Gavriilidou A, Kautsar SA, Zaburannyi N, Krug D, Müller R, Medema MH and Ziemert N, Nat Microbiol, 2022, 7, 726–735.
  • 257.Adamek M, Alanjary M. and Ziemert N, Nat. Prod. Rep, 2019, 36, 1295–1312.
  • 258.Helfrich EJN, Ueoka R, Dolev A, Rust M, Meoded RA, Bhushan A, Califano G, Costa R, Gugger M, Steinbeck C, Moreno P. and Piel J, Nat. Chem. Biol, 2019, 15, 813–821.
  • 259.van Heel AJ, de Jong A, Song C, Viel JH, Kok J. and Kuipers OP, Nucleic Acids Res, 2018, 46, W278–W281.
  • 260.Blin K, Shaw S, Medema MH and Weber T, Nucleic Acids Res, 2023, 52, D586–D589.
  • 261.Terlouw BR, Blin K, Navarro-Muñoz JC, Avalon NE, Chevrette MG, Egbert S, Lee S, Meijer D, Recchia MJJ, Reitz ZL, van Santen JA, Selem-Mojica N, Tørring T, Zaroubi L, Alanjary M, Aleti G, Aguilar C, Al-Salihi SAA, Augustijn HE, Avelar-Rivas JA, Avitia-Domínguez LA, Barona-Gómez F, Bernaldo-Agüero J, Bielinski VA, Biermann F, Booth TJ, Carrion Bravo VJ, Castelo-Branco R, Chagas FO, Cruz-Morales P, Du C, Duncan KR, Gavriilidou A, Gayrard D, Gutiérrez-García K, Haslinger K, Helfrich EJN, van der Hooft JJJ, Jati AP, Kalkreuter E, Kalyvas N, Kang KB, Kautsar S, Kim W, Kunjapur AM, Li Y-X, Lin G-M, Loureiro C, Louwen JJR, Louwen NLL, Lund G, Parra J, Philmus B, Pourmohsenin B, Pronk LJU, Rego A, Rex DAB, Robinson S, Rosas-Becerra LR, Roxborough ET, Schorn MA, Scobie DJ, Singh KS, Sokolova N, Tang X, Udwary D, Vigneshwari A, Vind K, Vromans SPJM, Waschulin V, Williams SE, Winter JM, Witte TE, Xie H, Yang D, Yu J, Zdouc M, Zhong Z, Collemare J, Linington RG, Weber T. and Medema MH, Nucleic Acids Res, 2022, 51, D603–D610.
  • 262.Kautsar SA, Blin K, Shaw S, Weber T. and Medema MH, Nucleic Acids Res, 2020, 49, D490–D497.
  • 263.van Santen JA, Poynton EF, Iskakova D, McMann E, Alsup TA, Clark TN, Fergusson CH, Fewer DP, Hughes AH, McCadden CA, Parra J, Soldatou S, Rudolf JD, Janssen EM-L, Duncan KR and Linington RG, Nucleic Acids Res, 2021, 50, D1317–D1323.
  • 264.Schmitt I. and Barker FK, Nat. Prod. Rep, 2009, 26, 1585–1602.
  • 265.Kang H-S, J. Ind. Microbiol. Biotechnol, 2017, 44, 285–293.
  • 266.Ziemert N. and Jensen PR, Methods Enzymol, 2012, 517, 161–182.
  • 267.Chang F-Y, Ternei MA, Calle PY and Brady SF, J. Am. Chem. Soc, 2013, 135, 17906–17912.
  • 268.Chang F-Y and Brady SF, Proc. Natl. Acad. Sci. U. S. A, 2013, 110, 2478–2483.
  • 269.Chang F-Y, Ternei MA, Calle PY and Brady SF, J. Am. Chem. Soc, 2015, 137, 6044–6052.
  • 270.Chang F-Y and Brady SF, J. Am. Chem. Soc, 2011, 133, 9996–9999.
  • 271.Kang H-S and Brady SF, Angew. Chem. Int. Ed Engl, 2013, 52, 11063–11067.
  • 272.Kang H-S and Brady SF, ACS Chem. Biol., 2014, 9, 1267–1272.
  • 273.Courvalin P, Clin. Infect. Dis, 2006, 42 Suppl 1, S25–34.
  • 274.Waglechner N, McArthur AG and Wright GD, Nat Microbiol, 2019, 4, 1862–1871.
  • 275.Donadio S, Sosio M, Stegmann E, Weber T. and Wohlleben W, Mol. Genet. Genomics, 2005, 274, 40–50.
  • 276.Culp EJ, Waglechner N, Wang W, Fiebig-Comyn AA, Hsu Y-P, Koteva K, Sychantha D, Coombes BK, Van Nieuwenhze MS, Brun YV and Wright GD, Nature, 2020, 578, 582–587.
  • 277.Nakano H. and Ōmura S, J. Antibiot., 2009, 62, 17–26.
  • 278.Gao Q, Zhang C, Blanchard S. and Thorson JS, Chem. Biol, 2006, 13, 733–743.
  • 279.Kim S-Y, Park J-S, Chae C-S, Hyun C-G, Choi BW, Shin J. and Oh K-B, Appl. Microbiol. Biotechnol, 2007, 75, 1119–1126.
  • 280.Alcock BP, Huynh W, Chalil R, Smith KW, Raphenya AR, Wlodarski MA, Edalatmand A, Petkau A, Syed SA, Tsang KK, Baker SJC, Dave M, McCarthy MC, Mukiri KM, Nasir JA, Golbon B, Imtiaz H, Jiang X, Kaur K, Kwong M, Liang ZC, Niu KC, Shan P, Yang JYJ, Gray KL, Hoad GR, Jia B, Bhando T, Carfrae LA, Farha MA, French S, Gordzevich R, Rachwalski K, Tu MM, Bordeleau E, Dooley D, Griffiths E, Zubyk HL, Brown ED, Maguire F, Beiko RG, Hsiao WWL, Brinkman FSL, Van Domselaar G. and McArthur AG, Nucleic Acids Res, 2023, 51, D690–D699.
  • 281.Walker AS and Clardy J, J. Chem. Inf. Model, 2021, 61, 2560–2571.
  • 282.Bockus AT, Schwochert JA, Pye CR, Townsend CE, Sok V, Bednarek MA and Lokey RS, J. Med. Chem, 2015, 58, 7409–7418.
  • 283.Riedling O, Walker AS and Rokas A, Microbiol. Spectr, 2024, 12, e03400–23.
  • 284.Bajorath J, F1000Res, DOI: 10.12688/f1000research.6653.1.
  • 285.Talele TT, Khedkar SA and Rigby AC, Curr. Top. Med. Chem, 2010, 10, 127–141.
  • 286.Zhu K-F, Yuan C, Du Y-M, Sun K-L, Zhang X-K, Vogel H, Jia X-D, Gao Y-Z, Zhang Q-F, Wang D-P and Zhang H-W, Mil Med Res, 2023, 10, 10.
  • 287.Ugariogu SN, International Journal of Pharmacognosy & Chinese Medicine, 2020, 4, 1–8.
  • 288.Agrwal A, in CADD and Informatics in Drug Discovery, eds. Rudrapal M. and Khan J, Springer Nature Singapore, Singapore, 2023, pp. 35–52.
  • 289.Yu W. and MacKerell AD Jr, Methods Mol. Biol., 2017, 1520, 85–106.
  • 290.Temml V. and Kutil Z, Comput. Struct. Biotechnol. J, 2021, 19, 1431–1444.
  • 291.Tsagkaris C, Corriero AC, Rayan RA, Moysidis DV, Papazoglou AS and Alexiou A, in Computational Approaches in Drug Discovery, Development and Systems Pharmacology, eds. Gautam RK, Kamal MA and Mittal P, Academic Press, 2023, pp. 237–253.
  • 292.Walters WP and Wang R, J. Chem. Inf. Model, 2020, 60, 4109–4111.
  • 293.Ma D-L, Chan DS-H and Leung C-H, Chem. Sci, 2011, 2, 1656–1665.
  • 294.Tyagi R, Singh A, Chaudhary KK and Yadav MK, in Bioinformatics, eds. Singh DB and Pathak RK, Academic Press, 2022, pp. 269–289.
  • 295.Choudhuri S, Yendluri M, Poddar S, Li A, Mallick K, Mallik S. and Ghosh B, Kinases and Phosphatases, 2023, 1, 117–140.
  • 296.Vedani A, Dobler M. and Lill MA, J. Med. Chem, 2005, 48, 3700–3703.
  • 297.Pirhadi S, Shiri F. and Ghasemi JB, RSC Adv, 2015, 5, 104635–104665.
  • 298.Yoo C. and Shahlaei M, Chem. Biol. Drug Des, 2018, 91, 137–152.
  • 299.Cramer RD, Patterson DE and Bunce JD, J. Am. Chem. Soc, 1988, 110, 5959–5967.
  • 300.Klebe G, Abraham U. and Mietzner T, J. Med. Chem, 1994, 37, 4130–4146.
  • 301.Bahia MS, Kaspi O, Touitou M, Binayev I, Dhail S, Spiegel J, Khazanov N, Yosipof A. and Senderowitz H, Mol. Inform, 2023, 42, e2200186.
  • 302.Behnam MAM, Graf D, Bartenschlager R, Zlotos DP and Klein CD, J. Med. Chem, 2015, 58, 9354–9370.
  • 303.Poola AA, Prabhu PS, Murthy TPK, Murahari M, Krishna S, Samantaray M. and Ramaswamy A, Front Mol Biosci, 2023, 10, 1106128.
  • 304.Zhuo W, Lian Z, Bai W, Chen Y. and Xia H, Front. Mol. Biosci, 2023, 10, 1164349.
  • 305.Meng X-Y, Zhang H-X, Mezei M. and Cui M, Curr. Comput. Aided Drug Des, 2011, 7, 146–157.
  • 306.Bender BJ, Gahbauer S, Luttens A, Lyu J, Webb CM, Stein RM, Fink EA, Balius TE, Carlsson J, Irwin JJ and Shoichet BK, Nat. Protoc, 2021, 16, 4799–4832.
  • 307.Lyne PD, Drug Discov. Today, 2002, 7, 1047–1055.
  • 308.Vilar S, Ferino G, Phatak SS, Berk B, Cavasotto CN and Costanzi S, J. Mol. Graph. Model, 2011, 29, 614–623.
  • 309.Kamenik AS, Singh I, Lak P, Balius TE, Liedl KR and Shoichet BK, Proc. Natl. Acad. Sci. U. S. A, 2021, 118, e2106195118.
  • 310.Šinko G, Chem. Biol. Interact, 2019, 308, 216–223.
  • 311.Yan S, Elmes MW, Tong S, Hu K, Awwa M, Teng GYH, Jing Y, Freitag M, Gan Q, Clement T, Wei L, Sweeney JM, Joseph OM, Che J, Carbonetti GS, Wang L, Bogdan DM, Falcone J, Smietalo N, Zhou Y, Ralph B, Hsu H-C, Li H, Rizzo RC, Deutsch DG, Kaczocha M. and Ojima I, Eur. J. Med. Chem, 2018, 154, 233–252.
  • 312.Beveridge DL and DiCapua FM, Annu. Rev. Biophys. Biophys. Chem, 1989, 18, 431–492.
  • 313.Enyedy IJ and Egan WJ, J. Comput. Aided Mol. Des, 2008, 22, 161–168.
  • 314.Zev S, Raz K, Schwartz R, Tarabeh R, Gupta PK and Major DT, J. Chem. Inf. Model, 2021, 61, 2957–2966.
  • 315.Rifai EA, van Dijk M. and Geerke DP, Front Mol Biosci, 2020, 7, 114.
  • 316.Hou T, Wang J, Li Y. and Wang W, J. Chem. Inf. Model, 2011, 51, 69–82.
  • 317.Genheden S. and Ryde U, Expert Opin. Drug Discov, 2015, 10, 449–461.
  • 318.Wang E, Sun H, Wang J, Wang Z, Liu H, Zhang JZH and Hou T, Chem. Rev, 2019, 119, 9478–9508.
  • 319.Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P. and Hassabis D, Nature, 2021, 596, 583–589.
  • 320.Holcomb M, Chang Y-T, Goodsell DS and Forli S, Protein Sci, 2023, 32, e4530.
  • 321.Scardino V, Di Filippo JI and Cavasotto CN, iScience, 2023, 26, 105920.
  • 322.Lyu J, Kapolka N, Gumpper R, Alon A, Wang L, Jain MK, Barros-Álvarez X, Sakamoto K, Kim Y, DiBerto J, Kim K, Tummino TA, Huang S, Irwin JJ, Tarkhanova OO, Moroz Y, Skiniotis G, Kruse AC, Shoichet BK and Roth BL, bioRxiv,, DOI: 10.1101/2023.12.20.572662. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 323.Wong F, Krishnan A, Zheng EJ, Stärk H, Manson AL, Earl AM, Jaakkola T. and Collins JJ, Mol. Syst. Biol, 2022, 18, e11081. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 324. Corso G, Stärk H, Jing B, Barzilay R and Jaakkola T, arXiv [q-bio.BM], 2022.
  • 325. Stärk H, Ganea O-E, Pattanaik L, Barzilay R and Jaakkola T, arXiv [q-bio.BM], 2022.
  • 326. Gentile F, Agrawal V, Hsing M, Ton A-T, Ban F, Norinder U, Gleave ME and Cherkasov A, ACS Cent. Sci, 2020, 6, 939–949.
  • 327. Buttenschoen M, Morris GM and Deane CM, Chem. Sci, DOI: 10.1039/D3SC04185A.
  • 328. Lavecchia A and Di Giovanni C, Curr. Med. Chem, 2013, 20, 2839–2860.
  • 329. Bosquez-Berger T, Gudorf JA, Kuntz CP, Desmond JA, Schlebach JP, VanNieuwenhze MS and Straiker A, J. Med. Chem, 2023, 66, 9466–9494.
  • 330. Sorokina M, Merseburger P, Rajan K, Yirik MA and Steinbeck C, J. Cheminform, 2021, 13, 1–13.
  • 331. Kearney SE, Zahoránszky-Kőhalmi G, Brimacombe KR, Henderson MJ, Lynch C, Zhao T, Wan KK, Itkin Z, Dillon C, Shen M, Cheff DM, Lee TD, Bougie D, Cheng K, Coussens NP, Dorjsuren D, Eastman RT, Huang R, Iannotti MJ, Karavadhi S, Klumpp-Thomas C, Roth JS, Sakamuru S, Sun W, Titus SA, Yasgar A, Zhang Y-Q, Zhao J, Andrade RB, Brown MK, Burns NZ, Cha JK, Mevers EE, Clardy J, Clement JA, Crooks PA, Cuny GD, Ganor J, Moreno J, Morrill LA, Picazo E, Susick RB, Garg NK, Goess BC, Grossman RB, Hughes CC, Johnston JN, Joullie MM, Kinghorn AD, Kingston DGI, Krische MJ, Kwon O, Maimone TJ, Majumdar S, Maloney KN, Mohamed E, Murphy BT, Nagorny P, Olson DE, Overman LE, Brown LE, Snyder JK, Porco JA Jr, Rivas F, Ross SA, Sarpong R, Sharma I, Shaw JT, Xu Z, Shen B, Shi W, Stephenson CRJ, Verano AL, Tan DS, Tang Y, Taylor RE, Thomson RJ, Vosburg DA, Wu J, Wuest WM, Zakarian A, Zhang Y, Ren T, Zuo Z, Inglese J, Michael S, Simeonov A, Zheng W, Shinn P, Jadhav A, Boxer MB, Hall MD, Xia M, Guha R and Rohde JM, ACS Cent. Sci, 2018, 4, 1727–1741.
  • 332. Sterling T and Irwin JJ, J. Chem. Inf. Model, 2015, 55, 2324–2337.
  • 333. Opo FAD, Rahman MM, Ahammad F, Ahmed I, Bhuiyan MA and Asiri AM, Sci. Rep, 2021, 11, 1–17.
  • 334. Ang D, Kendall R and Atamian HS, Biology, DOI: 10.3390/biology12040519.
  • 335. de Oliveira MVD, Fernandes GMB, da Costa KS, Vakal S and Lima AH, RSC Adv, 2022, 12, 18834–18847.
  • 336. Kumar KM, Anbarasu A and Ramaiah S, Mol. Biosyst, 2014, 10, 891–900.
  • 337. Lin L, Lin K, Wu X, Liu J, Cheng Y, Xu L-Y, Li E-M and Dong G, Front. Chem, 2021, 9, 719949.
  • 338. Joshi T, Joshi T, Pundir H, Sharma P, Mathpal S and Chandra S, J. Biomol. Struct. Dyn, 2021, 39, 6728–6746.
  • 339. Muthu Kumar T and Ramanathan K, J. Comput. Biophys. Chem, 2022, 21, 515–528.
  • 340. Liang H, Liu H, Kuang Y, Chen L, Ye M and Lai L, J. Chem. Inf. Model, 2020, 60, 4350–4358.
  • 341. Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz’min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A and Tropsha A, J. Med. Chem, 2014, 57, 4977–5010.
  • 342. Wang T, Wu M-B, Lin J-P and Yang L-R, Expert Opin. Drug Discov, 2015, 10, 1283–1300.
  • 343. Zhang L, Tan J, Han D and Zhu H, Drug Discov. Today, 2017, 22, 1680–1685.
  • 344. Jaeger S, Fulle S and Turk S, J. Chem. Inf. Model, 2018, 58, 27–35.
  • 345. Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M, Palmer A, Settels V, Jaakkola T, Jensen K and Barzilay R, J. Chem. Inf. Model, 2019, 59, 3370–3388.
  • 346. Hadipour H, Liu C, Davis R, Cardona ST and Hu P, BMC Bioinformatics, DOI: 10.1186/s12859-022-04667-1.
  • 347. Schmidt J, Marques MRG, Botti S and Marques MAL, npj Comput. Mater, 2019, 5, 1–36.
  • 348. Polishchuk P, Madzhidov T, Gimadiev T, Bodrov A, Nugmanov R and Varnek A, J. Comput. Aided Mol. Des, 2017, 31, 829–839.
  • 349. Puzyn T, Leszczynski J and Cronin MT, Recent Advances in QSAR Studies: Methods and Applications, Springer Science & Business Media, 2010.
  • 350. Rogers D and Hahn M, J. Chem. Inf. Model, 2010, 50, 742–754.
  • 351. Guha R, J. Comput. Aided Mol. Des, 2008, 22, 857–871.
  • 352. Johansson U, Sönströd C, Norinder U and Boström H, Future Med. Chem, 2011, 3, 647–663.
  • 353. Polishchuk P, J. Chem. Inf. Model, 2017, 57, 2618–2639.
  • 354. Bruce CL, Melville JL, Pickett SD and Hirst JD, J. Chem. Inf. Model, 2007, 47, 219–227.
  • 355. Ma J, Sheridan RP, Liaw A, Dahl GE and Svetnik V, J. Chem. Inf. Model, 2015, 55, 263–274.
  • 356. Linardatos P, Papastefanopoulos V and Kotsiantis S, Entropy, 2020, 23, 18.
  • 357. Ponzoni I, Páez Prosper JA and Campillo NE, Wiley Interdiscip. Rev. Comput. Mol. Sci, DOI: 10.1002/wcms.1681.
  • 358. Barredo Arrieta A, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, Garcia S, Gil-Lopez S, Molina D, Benjamins R, Chatila R and Herrera F, Inf. Fusion, 2020, 58, 82–115.
  • 359. Miller T, Artif. Intell, 2019, 267, 1–38.
  • 360. Lapuschkin S, Wäldchen S, Binder A, Montavon G, Samek W and Müller K-R, Nat. Commun, 2019, 10, 1–8.
  • 361. Adadi A and Berrada M, IEEE Access, 2018, 6, 52138–52160.
  • 362. Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A and Tropsha A, Chem. Soc. Rev, 2020, 49, 3525–3564.
  • 363. Rosenbaum L, Hinselmann G, Jahn A and Zell A, J. Cheminform, 2011, 3, 1–12.
  • 364. Sheridan RP, J. Chem. Inf. Model, 2019, 59, 1324–1337.
  • 365. Brown BP, Mendenhall J, Geanes AR and Meiler J, J. Chem. Inf. Model, 2021, 61, 603–620.
  • 366. Das A and Rad P, arXiv.org.
  • 367. Ivanovs M, Kadikis R and Ozols K, Pattern Recognit. Lett.
  • 368. Ribeiro MT, Singh S and Guestrin C, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, New York, NY, USA, 2016, pp. 1135–1144.
  • 369. Breiman L, Mach. Learn, 2001, 45, 5–32.
  • 370. Petsiuk V, Das A and Saenko K, British Machine Vision Conference.
  • 371. Ying Z, Bourgeois D, You J, Zitnik M and Leskovec J, in Advances in Neural Information Processing Systems, eds. Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E and Garnett R, Curran Associates, Inc., 2019, vol. 32.
  • 372. Whitmore LS, George A and Hudson CM, arXiv [stat.ML], 2016.
  • 373. Molnar C, Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 2nd edn., 2022.
  • 374. Alvarez-Melis D and Jaakkola TS, arXiv [cs.LG], 2018.
  • 375. Fisher A, Rudin C and Dominici F, J. Mach. Learn. Res, 2019, 20, 177.
  • 376. Guha R and Jurs PC, J. Chem. Inf. Model, 2005, 45, 800–806.
  • 377. Brdar S, Panić M, Matavulj P, Stanković M, Bartolić D and Šikoparija B, Sci. Rep, 2023, 13, 1–14.
  • 378. Wojtuch A, Danel T, Podlewska S and Maziarka Ł, J. Cheminform, 2023, 15, 81.
  • 379. C. G. Explanations, Why should I trust my Graph Neural Network? - Stanford CS224W GraphML Tutorials - Medium, https://medium.com/stanford-cs224w/why-should-i-trust-my-graph-neural-network-4d964052bd85, (accessed February 16, 2024).
  • 380. Debnath AK, Lopez de Compadre RL, Debnath G, Shusterman AJ and Hansch C, J. Med. Chem, 1991, 34, 786–797.
  • 381. Samek W, Montavon G, Vedaldi A, Hansen LK and Müller K, Explainable AI, DOI: 10.1007/978-3-030-28954-6.
  • 382. Adebayo J, Gilmer J, Muelly M, Goodfellow I, Hardt M and Kim B, Adv. Neural Inf. Process. Syst.
  • 383. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D and Batra D, in 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, 2017, pp. 618–626.
  • 384. Sundararajan M, Taly A and Yan Q, in Proceedings of the 34th International Conference on Machine Learning, eds. Precup D and Teh YW, PMLR, 6–11 Aug 2017, vol. 70, pp. 3319–3328.
  • 385. Bach S, Binder A, Montavon G, Klauschen F, Müller K-R and Samek W, PLoS One, 2015, 10, e0130140.
  • 386. Lundberg SM and Lee S-I, in Advances in Neural Information Processing Systems, eds. Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S and Garnett R, Curran Associates, Inc., 2017, vol. 30.
  • 387. Zhong S, Hu J, Yu X and Zhang H, Chem. Eng. J, 2021, 408, 127998.
  • 388. McCloskey K, Taly A, Monti F, Brenner MP and Colwell LJ, Proc. Natl. Acad. Sci. U. S. A, 2019, 116, 11624–11629.
  • 389. Jiménez-Luna J, Skalic M, Weskamp N and Schneider G, J. Chem. Inf. Model, 2021, 61, 1083–1094.
  • 390. Baldassarre F and Azizpour H, arXiv [cs.LG], 2019.
  • 391. Wojtuch A, Jankowski R and Podlewska S, J. Cheminform, 2021, 13, 74.
  • 392. Rodríguez-Pérez R and Bajorath J, J. Med. Chem, 2020, 63, 8761–8777.
  • 393. Pereira F, Latino DARS and Gaudêncio SP, Molecules, 2015, 20, 4848–4873.
  • 394. Yue Z, Zhang W, Lu Y, Yang Q, Ding Q, Xia J and Chen Y, PeerJ, 2015, 3, e1425.
  • 395. Sahu SK, Kumar R, Singh VK and Ojha KK, Mol. Simul, 2023, 49, 1077–1090.
  • 396. Gahl M, Kim HW, Glukhov E, Gerwick WH and Cottrell GW, J. Nat. Prod, DOI: 10.1021/acs.jnatprod.3c00879.
  • 397. Egieyeh S, Syce J, Malan SF and Christoffels A, PLoS One, 2018, 13, e0204644.
  • 398. Dias T, Gaudêncio SP and Pereira F, Mar. Drugs, DOI: 10.3390/md17010016.
  • 399. Masalha M, Rayan M, Adawi A, Abdallah Z and Rayan A, Mol. Med. Rep, 2018, 18, 763–770.
  • 400. Wong F, Zheng EJ, Valeri JA, Donghia NM, Anahtar MN, Omori S, Li A, Cubillos-Ruiz A, Krishnan A, Jin W, Manson AL, Friedrichs J, Helbig R, Hajian B, Fiejtek DK, Wagner FF, Soutter HH, Earl AM, Stokes JM, Renner LD and Collins JJ, Nature, 2023, 626, 177–185.
  • 401. Zhang R, Ren S, Dai Q, Shen T, Li X, Li J and Xiao W, J. Cheminform, 2022, 14, 1–11.
  • 402. Brown KS, Jamieson P, Wu W, Vaswani A, Alcazar Magana A, Choi J, Mattio LM, Cheong PH-Y, Nelson D, Reardon PN, Miranda CL, Maier CS and Stevens JF, Antioxid. Redox Signal, 2022, 11, 1400.
  • 403. Gu J, Gui Y, Chen L, Yuan G, Lu H-Z and Xu X, PLoS One, 2013, 8, e62839.
  • 404. Mangal M, Sagar P, Singh H, Raghava GPS and Agarwal SM, Nucleic Acids Res, 2013, 41, D1124–9.
  • 405. Choi H, Cho SY, Pak HJ, Kim Y, Choi J-Y, Lee YJ, Gong BH, Kang YS, Han T, Choi G, Cho Y, Lee S, Ryoo D and Park H, J. Cheminform, 2017, 9, 1–9.
  • 406. Sorokina M and Steinbeck C, J. Cheminform, 2020, 12, 1–51.
  • 407. Feher M and Schmidt JM, J. Chem. Inf. Comput. Sci, 2003, 43, 218–227.
  • 408. Cai C, Wang S, Xu Y, Zhang W, Tang K, Ouyang Q, Lai L and Pei J, J. Med. Chem, 2020, 63, 8683–8694.
  • 409. Qiang B, Lai J, Jin H, Zhang L and Liu Z, Int. J. Mol. Sci, 2021, 22, 4632.
  • 410. Kim HW, Wang M, Leber CA, Nothias L-F, Reher R, Kang KB, van der Hooft JJJ, Dorrestein PC, Gerwick WH and Cottrell GW, J. Nat. Prod, 2021, 84, 2795–2807.
  • 411. Chen Y, Stork C, Hirte S and Kirchmair J, Biomolecules, DOI: 10.3390/biom9020043.
  • 412. Riniker S and Landrum GA, J. Cheminform, 2013, 5, 43.
  • 413. Maroni G, Pallante L, Di Benedetto G, Deriu MA, Piga D and Grasso G, Curr. Res. Food Sci, 2022, 5, 2270–2280.
