Author manuscript; available in PMC: 2021 Jul 1.
Published in final edited form as: Nat Prod Rep. 2020 Jan 8;37(7):879–892. doi: 10.1039/c9np00050j

Recent developments in self-resistance gene directed natural product discovery

Yan Yan a, Nicholas Liu a, Yi Tang a,b,*
PMCID: PMC7340575  NIHMSID: NIHMS1069080  PMID: 31912842

Abstract

Natural products (NPs) are important sources of human therapeutic agents and pesticides. To prevent self-harm from bioactive NPs, some microbial producers employ self-resistance genes to protect themselves. One effective strategy is to employ a self-resistance enzyme (SRE), which is a slightly mutated version of the original metabolic enzyme, and is resistant to the toxic NP but is still functional. The presence of a SRE in a gene cluster can serve as a predictive window to the biological activity of the NPs synthesized by the pathway. In this highlight, we summarize representative examples of NP biosynthetic pathways that utilize self-resistance genes for protection. Recent discoveries based on self-resistance gene identification have helped in bridging the gap between activity-guided and genome-driven approaches for NP discovery and functional assignment.

Graphic abstract

This review covers recent natural product research directed by self-resistance genes, which bridges the gap between activity-guided and genome-driven approaches.

1. Introduction

Natural products (NPs) are small organic molecules derived from the secondary metabolism of living organisms, mostly found in microbes and plants. NPs have been evolutionarily optimized to bind to specific but diverse macro-biomolecule targets, facilitated by their complex structures unrivalled by synthetic molecule libraries. As a result, NPs continue to inspire innovative discoveries in the fields of chemistry, biology and medicine. From 1981 to 2014, among 1562 newly approved drugs, 710 were derived from NPs. Between 1997 and 2010, NPs and their derivatives made up approximately 36% of all new registered pesticide ingredients1, 2. NPs will remain an important source for discovering new human therapeutic agents or agrochemicals3.

In the past century, the rapid development of biological assays and chemical separation techniques has led to the identification and isolation of many bioactive NPs1, 2. In activity-guided NP discovery, a biomass or extract with bioactivity is first identified using an activity screen. The desired activity is enriched and tracked by several rounds of fractionation and purification, until a pure bioactive compound is obtained4 (Fig. 1a). The most famous example is perhaps the serendipitous discovery of penicillin, identified through inhibition of bacterial growth. Such activity-guided discovery of NPs has remained a mainstay for almost a century, becoming more versatile with the development of higher-resolution and higher-throughput assays5.

Fig. 1.

Workflow of NP discovery. (a) activity-guided NP discovery approach. (b) genome-driven NP discovery approach.

The rapid development of genome sequencing technologies has revolutionized NP discovery6, 7. Extensive experimental studies of the biosynthetic genes of NPs showed that the genes encoding enzymes responsible for microbial biosynthesis of one NP are typically clustered in a genome as a biosynthetic gene cluster (BGC)8. This may facilitate co-regulation, co-expression and horizontal transfer of all genes within a BGC together. With the clustering of biosynthetic genes and powerful bioinformatics tools9, 10, 11, tens of thousands of BGCs have been identified from sequenced microbial genomes. Identification of a BGC is typically performed first through cataloguing of anchoring biosynthetic core enzymes, such as polyketide synthases (PKSs), nonribosomal peptide synthetases (NRPSs), terpene synthases (TSs), etc12, that build up the backbone of the molecules. Among identified BGCs, it is estimated that fewer than 10% are associated with characterized NPs, and the other >90% of gene clusters have unknown products13. This led to the common characterization of these BGCs as transcriptionally silent, cryptic and/or biosynthetic dark matter. The lack of production of NPs from these BGCs can be attributed to several reasons: 1) the production of many NPs in native strains is in response to complex combinations of environmental and growth signals, which are difficult to recapitulate under axenic laboratory culturing conditions; 2) some NPs are only produced at very low concentrations, which could elude detection; and 3) the biological activities of the NPs are not targeted during activity-guided screens, and are thereby overlooked during extract analysis. Uncovering the structures and activities of the >90% of cryptic BGCs with unknown products is therefore a tantalizing new approach to drug discovery.

In the last decade, genome-driven NP discovery approaches have led to the effective mining of these cryptic BGCs14 (Fig. 1b). The application of modern cloning and synthetic biology tools, which include overexpression of pathway-specific transcription factors, epigenetic modification of chromatin structure, and heterologous expression of pathways in model hosts, has demonstrated that there is significant potential to increase the NP structure space from microbes11, 15–17. Although many NPs with new structures have been discovered using genome mining approaches, the molecular targets and biological functions of these NPs are not easily identified. This is because genome mining efforts have generally targeted interesting biosynthetic enzymes and ecologically unique species, and are not activity-guided. As the number of cryptic BGCs increases exponentially, the challenge is to prioritize BGC mining by biological activity to identify the most therapeutically promising NPs18.

How can we predict the activity of a NP based on a genomic sequence? The answer to this question can help us pinpoint the untapped bioactive NPs from microbial secondary metabolism. In this review, we will summarize the occurrence of self-resistance genes in characterized BGCs. We will then highlight the use of the self-resistance gene as a predictive window for linking BGCs to NP activities, and to mine NPs with desired activity. In our opinion, this approach could be effective in leading the renaissance of NP discovery for novel human therapeutic agents and agricultural chemicals.

2. Self-resistance gene co-localized with BGC implies the molecular target of a NP

Many NPs are believed to be produced by organisms (the host) to kill or limit the growth of competitor organisms through the inhibition or inactivation of essential housekeeping enzymes. However, if the same enzyme target is conserved in the producing organism and is essential, the NP will also be toxic to the producing host. Therefore, self-protecting methods must be present in the host to confer resistance to the biosynthesized NPs. Several mechanisms of self-resistance are known, including efflux pumps that actively transport the metabolites to extracellular space; resistance proteins that are present to sequester or modify highly active NPs that remain in the cell; and enzymes that modify housekeeping enzymes in the host to evade NP inhibition19, 20. The accumulating knowledge of self-resistance mechanisms has facilitated NP research. For example, Wright and co-workers designed a resistance-based NP discovery platform for the discovery of novel antibiotics21. This platform can be summarized in the following steps: 1. Screening of environmental samples using the desired antibiotic to enrich microbial producers of selected antibiotic scaffolds; 2. Determining the presence of a desired BGC in unique microbial isolates using polymerase chain reaction (PCR)-based screening; 3. Further phylogenetic analysis of the producers of new NPs.

An additional strategy nature employs for self-resistance is to encode a functionally equivalent, self-resistance enzyme (SRE) that is a variant of the housekeeping enzyme. The SRE is highly similar in sequence to the housekeeping target, but contains mutations that render the enzyme insensitive to NP inhibition, while still maintaining activity22, 23. Because of the sequence similarities, the functions of SREs can be readily predicted through bioinformatic analysis. The SRE is often co-clustered with the BGC of the NP and is transcriptionally co-regulated with the biosynthetic genes during NP production. This ensures the SRE is present when the NP starts to accumulate intracellularly. The use of SRE strategy is widely adopted by NP producing microorganisms in both bacterial and fungal species22, 23.

From a genome mining perspective, the SRE effectively serves as a readout of the enzyme target of the associated NP. Here we summarize 36 examples of SRE co-localizations, which are grouped into different modes of action of NPs (Table 1). Among these 36 examples, 24 proposed SREs have been functionally validated by either in vivo or in vitro experiments (Table 1). The examples demonstrate co-localization of biosynthetic and SRE genes is prevalent for NP producing microorganisms.

Table 1.

Examples of BGCs containing a self-resistance gene as a duplicate copy of the housekeeping gene (molecular target)

natural product | organism | molecular target | biosynthetic core gene | SRE verified (a) | FDA approved (b) | SRE inspired (c)

inhibitors of DNA replication
novobiocin (1) | Streptomyces niveus | bacterial DNA gyrase | NRPS | V | Y | N
chlorobiocin (2) | Streptomyces roseochromogenes | bacterial DNA gyrase | NRPS | V | N | N
coumermycin A1 (3) | Streptomyces rishiriensis | bacterial DNA gyrase | NRPS | V | N | N
griselimycin (4) | Streptomyces sp. DSM-40835 | DnaN | NRPS | V | N | Y

inhibitors of protein biosynthesis
mupirocin (5) | Pseudomonas fluorescens | isoleucyl-tRNA synthetase | PKS | V | Y | N
thiomarinol A (6) | Pseudoalteromonas sp. SANK 73390 | isoleucyl-tRNA synthetase | PKS | V | N | N
cladosporin (7) | Cladosporium cladosporioides | lysyl-tRNA synthetase | NRPS | V | Y | N
borrelidin (8) | Streptomyces parvulus | threonyl-tRNA synthetase | PKS | V | N | N
albomycin δ2 (9) | Streptomyces sp. ATCC 700974 | seryl-tRNA synthetase | NRPS | V | N | N
indolmycin (10) | Streptomyces griseus | tryptophanyl-tRNA synthetase | other | P | N | N
agrocin 84 (11) | Agrobacterium radiobacter | leucyl-tRNA synthetase | other | V | N | N
thiocillin (12) | Bacillus cereus | 50S ribosomal protein L11 | precursor peptide | P | N | N
rubradirin (13) | Streptomyces achromogenes | translation initiation factor | PKS | P | N | N
GE2270 (14) | Planobispora rosea | elongation factor Tu | precursor peptide | P | N | N
thiomuracin (15) | Thermobispora bispora | elongation factor Tu | | P | N | N
bengamide B (16) | Myxococcus virescens | methionine aminopeptidase, type I | NRPS | V | N | N
fumagillin (17) | Aspergillus fumigatus | methionine aminopeptidase, type II | PKS | P | N | N

inhibitors of protein degradation
salinosporamide A (18) | Salinispora tropica | proteasome subunit beta | PKS-NRPS hybrid | V | N | N
eponemycin (19) | Streptomyces hygroscopicus | proteasome subunit beta | NRPS | V | N | N
fellutamide B (20) | Aspergillus nidulans | proteasome component C5 | NRPS | V | N | Y

inhibitors of lipid metabolism
andrimid (21) | Pantoea agglomerans | acetyl-CoA carboxylase | NRPS | P | N | N
platensimycin (22) / platencin (23) | Streptomyces platensis | FabF | TS | V | N | N
kalimantacin (24) | Pseudomonas fluorescens | FabI | PKS | V | N | N
lovastatin (25) | Aspergillus terreus | 3-hydroxy-3-methylglutaryl-coenzyme A reductase | PKS | V | Y | N
zaragozic acid (26) | Curvularia lunata | squalene synthase | PKS | P | N | N
thiolactomycin (27) | Salinispora pacifica | fatty acid synthase II | PKS | V | N | Y
thiotetroamide C (28) | Streptomyces afghaniensis | fatty acid synthase II | PKS | V | N | Y

inhibitors of carbohydrate and energy metabolism
pentalenolactone (29) | Streptomyces avermitilis | glyceraldehyde-3-phosphate dehydrogenase | TS | V | N | N
heptelidic acid (30) | Aspergillus oryzae | glyceraldehyde-3-phosphate dehydrogenase | TS | V | N | N
aurovertin E (31) | Metarhizium anisopliae | ATP synthase beta chain | PKS | P | N | N
citreoviridin (32) | Aspergillus terreus | ATP synthase beta chain | PKS | P | N | N

inhibitors of amino acid metabolism
phaseolotoxin (33) | Pseudomonas syringae | ornithine carbamoyltransferase | other | V | N | N
aspterric acid (34) | Aspergillus terreus | dihydroxy-acid dehydratase | TS | V | N | Y

inhibitors of nucleotide metabolism
mycophenolic acid (35) | Penicillium brevicompactum | inosine-5′-monophosphate dehydrogenase | PKS | V | Y | Y

inhibitors of xenobiotics biodegradation
cephamycin C (36) | Streptomyces clavuligerus | beta-lactamase | NRPS | P | N | N
a: verified (V) or proposed (P).

b: approved (Y) or not (N).

c: discovery is SRE inspired (Y) or not (N).

2.1. Inhibitors of DNA replication

DNA replication is a fundamental life process that produces identical copies of DNA molecules using the original chromosome as a template. This process requires multiple enzyme complexes in coordination to achieve high efficiency and accuracy. This process and the enzymes involved are highly conserved among prokaryotes, while divergent in eukaryotes. Therefore, bacterial DNA replication enzymes are ideal targets for mining antibacterial agents24. The first NP identified to target the DNA replication process was novobiocin (1), which was isolated in 1955 and used clinically as an anti-infective agent against Staphylococcus25 (Fig. 2). The enzyme target of 1 was determined to be DNA gyrase, which belongs to a subclass of type II topoisomerases26. This enzyme unwinds double-stranded DNA by reducing the topological strain in an ATP-dependent manner27. Biosynthetic studies of 1 revealed that the gyrB gene present within the BGC encodes a variant of the housekeeping DNA gyrase that is not sensitive to 1 (refs 28, 29; Table 1). Co-localization of self-resistant DNA gyrase variants was also found in the BGCs of other DNA gyrase inhibitors, such as chlorobiocin (2)30 and coumermycin A1 (3)31 (Fig. 2 and Table 1).

Fig. 2.

Example NP DNA replication inhibitors that employ a mutated target as SRE encoded in biosynthetic gene clusters.

2.2. Inhibitors of protein biosynthesis

Enzymes involved in protein biosynthesis are classic targets for the development of antibacterial agents because of the well-studied functional differences between prokaryotes and eukaryotes32. During protein biosynthesis, transfer RNAs (tRNAs) are first aminoacylated with their cognate amino acids by the 20 aminoacyl-tRNA synthetases (aaRSs). Several NPs target different aaRSs, including mupirocin (5) (IleRS)33, 34, thiomarinol (6) (IleRS)35, cladosporin (7) (LysRS)36, borrelidin (8) (ThrRS)37, albomycin δ2 (9) (SerRS)38, indolmycin (10) (TrpRS)39 and agrocin 84 (11) (LeuRS)40, 41 (Fig. 3). Among these natural inhibitors, 5 is approved by the FDA to treat the skin infection impetigo, and 7 is a FDA approved drug against malarial parasites. To prevent inhibition of housekeeping aaRSs by these potent inhibitors, the producing strains of these NPs encode mutated versions of the corresponding aaRSs as SREs in the BGCs34–38, 41 (Table 1).

Fig. 3.

Example NP protein biosynthesis inhibitors that employ a mutated target as SRE encoded in biosynthetic gene clusters.

Many classic NPs inhibit the translation process at the ribosome, including tetracyclines, erythromycins, etc. The producers of these well-studied NPs use different mechanisms of self-resistance as noted earlier. The use of SREs for host protection against ribosomal inhibitors is also a widely adopted strategy. Thiopeptides such as thiocillin (12) perturb translation by binding within a cleft located between the ribosomal protein L11 and rRNA to block elongation factor G42. Study of the biosynthesis of 12 revealed that two identical copies of a variant of the housekeeping 50S ribosomal protein L11 are encoded in the BGC of 12, which are proposed to function as SREs to 12 (ref. 43; Fig. 3 and Table 1). In another example, rubradirin (13) can selectively bind to one specific translation initiation factor to arrest protein biosynthesis44 (Fig. 3). Similar to 12, two identical copies of the translation initiation factor in the BGC are proposed to protect the host during biosynthesis of 13 (ref. 45; Table 1). Similarly, identification of the BGCs of the NPs thiomuracin (15)46 and GE2270 (14)47, which inhibit elongation factor Tu (EF-Tu), revealed that both clusters contain genes encoding variants of housekeeping EF-Tu as SREs (Fig. 3 and Table 1).

Most nascent proteins synthesized by the ribosome require post-translational modifications to become mature and fully functional. One required modification in bacteria is removal of the N-terminal methionine residue, catalysed by methionine aminopeptidase (MetAP). The bacterial antibiotic bengamide B (16), produced by Myxococcus virescens, is able to block type I MetAP48 (Fig. 3). Experimental studies of the type I MetAP variant encoded within the corresponding BGC confirmed that it can confer self-resistance in the presence of 16 (ref. 49; Table 1). The presence of MetAP variants as SREs is also observed in the BGC of fumagillin (17), which is a fungal MetAP inhibitor produced by Aspergillus fumigatus. Interestingly, two proposed SREs are encoded in the cluster, which correspond to additional copies of type I and type II MetAP50 (Fig. 3 and Table 1).

2.3. Inhibitors of protein degradation

Protein degradation is a cellular process that controls the intracellular concentration of proteins and enzymes, and is a key checkpoint for programmed cell death. Accurate and timely protein degradation is catalysed by protein complexes known as proteasomes. Dysfunction of a proteasome subunit is lethal to the organism. Salinosporamide A (18), isolated from the marine bacterium Salinispora tropica, is able to covalently inactivate proteasome subunit beta through a β-lactone ring opening step51 (Fig. 4a). In the BGC of 18, an additional, mutated copy of the proteasome subunit beta is encoded. Functional assays confirmed that the additional copy is indeed a SRE that confers resistance to 18 (ref. 52; Table 1). Moore and coworkers showed that the SRE contains a single A49V mutation, which has been shown to constrict the S1 binding pocket and hinder 18 binding52. The biosynthesis of the proteasome inhibitor eponemycin (19), produced by Streptomyces hygroscopicus, also involves a mutated copy of proteasome subunit beta in the BGC that functions as a SRE53 (Fig. 4a and Table 1). Although 19 itself was not developed further into a new drug, the epoxyketone warhead of 19 was used to design the FDA approved therapeutic agent Carfilzomib, which targets the proteasome in the treatment of multiple myeloma54.

Fig. 4.

Example NP metabolic enzyme inhibitors that employ a mutated target as SRE encoded in biosynthetic gene clusters.

2.4. Inhibitors of lipid metabolism

Lipids are essential biomolecules involved in formation of cellular membranes, energy storage and signal transduction, etc. Therefore, most enzymes involved in the synthesis and degradation of lipids are essential to living organisms. A number of BGCs producing NPs targeting the fatty acid biosynthetic pathway encode SREs to confer self-resistance. For example, andrimid (21) targets acetyl-coenzyme A (acetyl-CoA) carboxylase, which catalyses the synthesis of malonyl-CoA via carboxylation of acetyl-CoA55 (Fig. 4b). A second copy of acetyl-CoA carboxylase in the BGC of 21 was validated to be the self-resistance-gene56, 57 (Table 1). Both platensimycin (22)58 and platencin (23)59 produced by Streptomyces platensis are potent inhibitors of FabF, the beta-ketoacyl-acyl carrier protein synthase II in fatty acid biosynthesis (Fig. 4b). The FabF homologs encoded within 22 and 23 BGCs have been demonstrated to be SREs60, 61 (Table 1). The bacterial polyketide kalimantacin (24) inhibits FabI, the enoyl-acyl carrier protein reductase in the fatty acid biosynthetic pathway62 (Fig. 4b). The second copy of fabI co-localized with 24 biosynthetic genes was verified to be the self-resistance gene62 (Table 1).

In addition to fatty acids, steroids and steroid alcohols are also lipids that function as vital components of cell membranes or signalling molecules in microorganisms. Both cholesterol (used by animals) and ergosterol (used by fungi and protozoa) are derived from triterpene precursors such as squalene, which in turn are polymerized from isopentenyl pyrophosphate building blocks derived from the mevalonate pathway. Whereas the ergosterol pathway in fungi is the target of antifungal drugs such as azoles63, the cholesterol biosynthetic pathway in humans has been a very successful target for treating hypercholesterolemia64. One of the most famous and successful fungal NPs, lovastatin (25), targets the rate-limiting step of the mevalonate pathway, catalysed by 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGR). 25 is an approved drug (Mevacor) that treats high blood cholesterol levels65 (Fig. 4b), and has inspired the development of both semisynthetic and synthetic statin drugs. 25 is synthesized by the lov BGC from Aspergillus terreus, presumably to target the sterol biosynthetic pathways of competing fungal species. In the middle of the lov BGC is a second copy of the gene encoding a putative HMGR, which has been demonstrated to confer self-resistance66, 67 (Table 1). Other NPs inhibiting the sterol biosynthetic pathway have been explored as potential cholesterol-lowering drugs. One notable example is zaragozic acid (26), which inhibits squalene synthase, the enzyme that dimerizes farnesyl diphosphate to afford squalene68 (Fig. 4b). As squalene synthase is vital to the producing fungus, a gene encoding a slightly mutated squalene synthase was found to be co-localized in the gene cluster, although its functional role as an SRE has not been verified69, 70 (Table 1).

2.5. Inhibitors of carbohydrate and energy metabolism

Not surprisingly, the central metabolism of microorganisms is heavily targeted by NPs in microbial warfare, including both anaerobic (glycolysis) and aerobic (respiration) processes. Because of the highly conserved enzymes in metabolism across species, SREs are commonly found in biosynthetic pathways of producing hosts. For example, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) catalyses the sixth step of glycolysis, which converts glyceraldehyde 3-phosphate to the high energy D-glycerate 1,3-bisphosphate in an NAD+ dependent fashion. There are currently two known NPs, the bacterial pentalenolactone (29)71 and fungal heptelidic acid (30)72, that can covalently inactivate GAPDH using an epoxide moiety as a warhead targeting the active site cysteine (Fig. 4c). In the BGC of both NPs, an additional copy of GAPDH has been found and is verified to serve as SRE73, 74 (Table 1). At the end of aerobic respiration, adenosine triphosphate (ATP) synthase is an essential enzyme to generate ATP driven by the electrochemical gradient. Two structurally similar fungal NPs, aurovertin E (31)75, 76 and citreoviridin (32)77, produced by Metarhizium anisopliae and A. terreus, target ATP synthase beta chain as noncompetitive inhibitors75 (Fig. 4c). An additional mutated copy of ATP synthase beta chain is encoded by the BGC of each molecule, presumably to protect the producing hosts76, 78 (Table 1).

2.6. Inhibitors of amino acid metabolism

Phaseolotoxin (33), isolated from the plant pathogen Pseudomonas syringae, is a potent irreversible inhibitor of ornithine carbamoyltransferase in the arginine biosynthetic pathway79 (Fig. 4d). 33 is a devastating toxin in agriculture, because it leads to halo blight disease, which causes losses in bean production worldwide80. The producing organism is able to survive during NP production by co-expressing a mutated ornithine carbamoyltransferase SRE, which is confirmed to be insensitive to this toxin81, 82 (Table 1).

2.7. Inhibitors of nucleotide metabolism

Inosine-5’-monophosphate dehydrogenase (IMPDH) catalyses the first step of guanine biosynthesis. Inhibition of this pathway has been effective in arresting the development of lymphocytes, because de novo biosynthesis is the only way to supply both adenosine monophosphate (AMP) and guanosine monophosphate (GMP) in these cells. The fungal NP mycophenolic acid (35), which targets this enzyme, was approved by the FDA as an immunosuppressant to prevent rejection during organ transplantation83 (Fig. 4e). Biosynthetic studies of 35 in Penicillium brevicompactum revealed that a second copy of IMPDH is encoded in the BGC as a SRE84 (Table 1).

2.8. Inhibitors of xenobiotics biodegradation

Xenobiotics are substances that are not naturally produced within the host cells, such as some environmental toxins, external antibiotics or synthetic chemicals. Biodegradation of these xenobiotic compounds is often a necessary process to ensure the survival of the host. β-lactam antibiotics are commonly produced by microorganisms as cell wall biosynthesis inhibitors and are widely used as antibiotics. In nature, an evasive strategy employed by bacteria is to evolve β-lactamases that cleave the β-lactam compounds, making the bacteria resistant. Interestingly, nature has evolved NPs that can inhibit β-lactamase, which renders the organisms sensitive again. This has led to the co-use of both β-lactams and β-lactamase inhibitors to overcome clinical resistance. Cephamycin C (36), produced by Streptomyces clavuligerus, is such a NP85 (Fig. 4f). In the BGC of 36, a mutated copy of β-lactamase is co-expressed and is not sensitive to 36 (ref. 86; Table 1).

3. Recent studies on natural products using a resistance gene directed approach

The above examples illustrate that the co-localization of SRE genes within BGCs is a common occurrence. Most importantly, the identification of a putative SRE gene in the BGC through bioinformatic analysis or functional verification can connect the encoded NP to a potential target. The SRE, if predicted correctly, can therefore be a powerful predictive window to the bioactivity of a NP. This has several important implications for genome mining (Fig. 5): i) If the activity of the compound is already known, searching for a BGC with a potential resistance gene can help locate the gene cluster; ii) Many NPs are discovered without a molecular target. If a BGC is identified and contains a potential SRE, this can connect the molecule to its activity; and iii) Using a molecular target as a query, searching for BGCs that contain a corresponding SRE can lead to target-guided genome mining. Here we will highlight recent examples of using SREs in these three approaches.

Fig. 5.

NP discovery inspired by a self-resistance gene. a. Locating the BGC of a NP with known activity. b. Rediscovering the activity of a known NP. c. Discovering NPs with desired activity.

3.1. Locating the BGCs of NPs with known biomolecular targets

With both the structure and activity of a NP elucidated, one can use both pieces of information as probes to locate the BGC (Fig. 5a). For example, the BGC of gyrase inhibitor 3 was located in the producing strain Streptomyces rishiriensis by identifying a BGC that contains both the biosynthetic core gene dTDP-glucose 4,6-dehydratase and the self-resistance-gene gyrB20, 23. The latter was hinted at by the presence of a SRE in the BGC of the related compound 1. In an example from fungi, the BGC of IMPDH inhibitor 35 was located within the producing strain Penicillium brevicompactum (Fig. 4e). The biosynthetic origin of 35 was not immediately clear at the onset of the study83, and core enzymes were not useful in pinpointing the BGC83, 84. Although there was no precedent of BGCs encoding IMPDH inhibitors requiring a SRE, the authors proposed that a homolog of IMPDH may be present to confer resistance to 35 (ref. 84). Based on this hypothesis, the BGC of 35 was successfully discovered using the IMPDH gene as a probe.

An example of using a SRE to locate a gene cluster from our lab is that of the fungal MetAP2 inhibitor 17 (ref. 50). It is known that the bis-epoxide cyclohexanol portion of 17 (fumagillol) is derived from a farnesyl diphosphate precursor87. When we searched through the sequenced and annotated A. fumigatus genome for BGCs containing canonical sesquiterpene cyclases, no suitable BGC could be identified. Suitability here refers to the presence of expected tailoring enzymes that match the structure of 17. An alternative strategy was to search for additional copies of MetAP, which was an elucidated target of 17 (ref. 88). In doing so, a BGC on the eighth chromosome was identified that contains additional copies of both MetAP1 and MetAP2. While the BGC encoded a multitude of tailoring enzymes, as well as a PKS that could synthesize the dioic acid, no terpene cyclase was found. Further detailed annotation of the gene cluster revealed that one gene was misannotated. Upon correction of the 5’ region assignment, the gene was reannotated as a putative membrane-bound prenyltransferase with weak sequence homology to UbiA. Biochemical characterization of the enzyme demonstrated that it is a new type of sesquiterpene cyclase that synthesizes the 17 precursor β-trans-bergamotene50. All the remaining genes in the BGC were characterized to be responsible for synthesizing 17 by both the Keller group and our group50, 89, 90. This example illustrates the utility of using the SRE hypothesis to find BGCs that may otherwise be difficult to identify using a core enzyme alone.

3.2. Discovering the biomolecular targets of known NPs

Besides locating the BGC of a NP of known bioactivity, the presence of a SRE in a BGC can also reveal the bioactivity of discovered NPs (Fig. 5b). For example, 24, isolated from Pseudomonas fluorescens, was identified as a promising novel antibacterial agent with strong, selective anti-staphylococcal activity (Fig. 4b). The mechanism of action of 24, however, had remained unknown62. When characterizing the function of each gene within the BGC, the Lavigne group observed that a knockout of BatG, a predicted FabI isozyme, did not affect the production of this antibiotic. On the other hand, heterologous expression of BatG in a 24-sensitive host conferred resistance to 24 on this host (ref. 62). FabI is an essential metabolic enzyme that functions as a trans-2-enoyl-ACP reductase in the fatty acid biosynthetic pathway. Thus they proposed that the molecular target of 24 is FabI, and that BatG is able to complement the function of FabI during production of 24 (ref. 62; Table 1). One may ask how the original host can survive when batG is inactivated, since the SRE is no longer present. It may be that the housekeeping copy of FabI has already accumulated some mutations to gain resistance to 24, and the SRE is an additional safeguard. This may explain why there is not always an additional copy of the SRE in the BGCs of bioactive NPs. The ability of the housekeeping copy to be resistant may already be built in during evolution, rendering the SRE unnecessary. However, identification of these self-resistant housekeeping copies will need higher-resolution bioinformatic analysis.

In another example, griselimycin (4) is a cyclic peptide isolated from Streptomyces strains in the 1960s (Fig. 2). Although it has excellent broad-spectrum antibacterial activity, including against Mycobacterium tuberculosis, which causes tuberculosis (TB), further development of this NP as an anti-TB drug was impeded by its poor pharmacokinetic properties91. Although analogues of 4 were synthesized and tested, efforts were abandoned when rifampin became available to treat TB92. Without knowledge of the mode of action, further medicinal chemistry development for activity improvement was difficult. Recently, Kling et al. discovered that the molecular target of 4 is the DNA polymerase sliding clamp DnaN, which was hinted at by a DnaN homolog encoded in the BGC that was confirmed to be the SRE93 (Table 1). This discovery unveiled a novel mode of action of 4, and resurrected its promising potential to be developed into a new anti-TB drug. This research outcome again indicated that the BGCs of NPs contain much more information than just the biosynthetic pathway: they reveal not only how a NP is made, but also how that NP functions.

3.3. (Re)discovering NPs with desired biomolecular targets

While the above examples successfully leveraged the presence of a SRE to connect NPs to their biological activity, the more useful application of a SRE is to accurately connect biological activity and genomic information in target-guided mining of NPs. The concept is to use SREs as a predictive window to help prioritize and identify BGCs that encode a NP inhibiting the desired target. This effectively bridges the gap discussed earlier between activity-guided screening and untargeted genome mining. A workflow of resistance-gene directed NP discovery could be proposed as follows: 1) Identifying a desired drug or pesticide target that is also essential and conserved in the microorganisms of interest; 2) Searching a genome database for BGCs that contain the desired biosynthetic core enzymes and carry a duplicate copy of the target; 3) Activating the biosynthetic genes to produce the NP using various synthetic biology tools; 4) Isolating the NP and determining its structure via multiple analytical chemistry techniques; 5) Validating the bioactivity of the NP using in vitro biochemical assays of housekeeping enzymes or in vivo growth inhibition assays of a susceptible heterologous host with or without expression of the SRE (Fig. 5c). This can serve as a general workflow for resistance-gene directed NP discovery using a genomic database.
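Step 2 of this workflow can be sketched as a simple filter over predicted clusters. This is only an illustrative sketch: the `Cluster` record, its fields, the family labels and the toy data are all hypothetical, and a real search would derive them from genome annotation and homology searches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical record describing one predicted BGC."""
    name: str
    core_type: str            # e.g. "terpene", "PKS", "NRPS"
    gene_families: frozenset  # family labels of the genes inside the cluster


def prioritize(clusters, target_family, core_types, genome_copy_number):
    """Keep clusters that (a) use a desired core enzyme, (b) carry a homolog
    of the target, and (c) come from a genome with at least two copies of the
    target, so that one copy can remain as the housekeeping enzyme."""
    return [
        c for c in clusters
        if c.core_type in core_types
        and target_family in c.gene_families
        and genome_copy_number.get(target_family, 0) >= 2
    ]


# Toy data, assumed for illustration only
clusters = [
    Cluster("bgc_1", "terpene", frozenset({"terpene_cyclase", "P450", "DHAD"})),
    Cluster("bgc_2", "PKS", frozenset({"PKS", "methyltransferase"})),
    Cluster("bgc_3", "NRPS", frozenset({"NRPS", "DHAD"})),
]
hits = prioritize(clusters, "DHAD", {"terpene", "PKS"}, {"DHAD": 2})
print([c.name for c in hits])
```

In this toy run only `bgc_1` passes: `bgc_2` lacks a copy of the target and `bgc_3` does not use one of the requested core enzyme types. The clusters that survive the filter would then proceed to steps 3 to 5.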

To identify BGCs associated with SREs for genome mining, new bioinformatic tools with more efficient search algorithms are being developed. Ziemert and co-workers developed a web-based genome mining tool, Antibiotic Resistant Target Seeker (ARTS), which specifically uses SRE-directed approaches for NP discovery in bacteria94. The user provides genomic data of an organism of interest as input, and putative BGCs whose products may inhibit candidate targets are displayed in a dynamic graphic as output. The search algorithm processes data in the following steps: 1) The genome is first processed by automated genome mining tools to detect BGCs based on biosynthetic core genes; 2) Putative resistance genes are located using known self-resistance models and duplication of essential housekeeping genes, and are validated by phylogenetic analysis for evidence of horizontal gene transfer; 3) BGCs co-localized with a self-resistance gene are highlighted and summarized in an interactive output table for rapid visualization of both known and putative targets. Although ARTS is currently focused on actinobacteria, the methodology of this automated pipeline can be extended to other organisms to prioritize genome mining based on potential bioactivity.
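The duplication and co-localization criteria described above can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the ARTS implementation: the tuple layouts, identifiers and coordinates are hypothetical, BGC boundaries and gene family assignments are assumed to come from upstream tools, and the phylogenetic check for horizontal gene transfer is omitted.

```python
from collections import Counter


def overlaps(start, end, region_start, region_end):
    """True if the gene interval [start, end] intersects the region."""
    return start <= region_end and end >= region_start


def colocalized_candidates(bgcs, genes, essential_families):
    """bgcs: list of (bgc_id, start, end).
    genes: list of (gene_id, family, start, end).
    Returns rows (bgc_id, gene_id, family) for members of duplicated
    essential families that fall inside a BGC."""
    copies = Counter(family for _, family, _, _ in genes)
    duplicated = {f for f in essential_families if copies[f] >= 2}
    rows = []
    for bgc_id, b_start, b_end in bgcs:
        for gene_id, family, g_start, g_end in genes:
            if family in duplicated and overlaps(g_start, g_end, b_start, b_end):
                rows.append((bgc_id, gene_id, family))
    return rows


# Toy coordinates, assumed for illustration only
bgcs = [("bgc_A", 1000, 30000), ("bgc_B", 50000, 80000)]
genes = [
    ("g1", "fabI", 500000, 500800),  # housekeeping copy, outside any BGC
    ("g2", "fabI", 12000, 12800),    # second copy, inside bgc_A
    ("g3", "gyrB", 60000, 62000),    # single-copy gene inside bgc_B
]
table = colocalized_candidates(bgcs, genes, {"fabI", "gyrB"})
print(table)
```

Here only `g2` is reported, because `fabI` is present in two copies and one of them lies within `bgc_A`; `g3` sits inside a cluster but is not duplicated, so it is not flagged.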

Another genome mining software package, clusterTools, developed by Challis and co-workers, also enables in silico resistance-gene directed NP discovery95. In contrast to other existing bioinformatics tools, clusterTools is designed to identify putative BGCs of interest using Hidden Markov Models (HMMs) of specific functional elements. A customized search can be performed with HMM inputs that include not only functional domains of biosynthetic core enzymes and tailoring enzymes, but also other elements such as resistance or regulatory genes. This software thus enables a more precise search of BGCs that is driven by a biosynthetic hypothesis and directed by resistance genes at the same time.
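The idea of a combined query can be illustrated as set matching over HMM hits. The sketch below is not the clusterTools interface; the function, the HMM names and the toy clusters are hypothetical, and the HMM search itself is assumed to have been run beforehand so that each cluster is represented by the set of models with a hit.

```python
def matches_query(cluster_hits, required, any_of=()):
    """cluster_hits: names of HMMs with at least one hit in the cluster.
    required: names that must all be present (e.g. core enzyme domains).
    any_of: groups of names; at least one member of each group must be
    present (e.g. alternative models for a resistance gene)."""
    hits = set(cluster_hits)
    if not set(required) <= hits:
        return False
    return all(hits & set(group) for group in any_of)


# Toy clusters with hypothetical HMM names, assumed for illustration only
clusters = {
    "c1": {"KS", "AT", "ACP", "FabF_like"},
    "c2": {"KS", "AT", "ACP"},
    "c3": {"Condensation", "FabF_like"},
}
selected = sorted(
    name for name, hits in clusters.items()
    if matches_query(hits, {"KS", "AT"}, [{"FabF_like", "FabB_like"}])
)
print(selected)
```

Only `c1` satisfies both parts of the query: it has the required core domains and a hit for one of the candidate resistance models. Keeping the biosynthetic and resistance criteria as separate arguments makes it easy to tighten or relax either one independently.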

Recently, Andersen and co-workers developed a similar pipeline to perform resistance-gene directed genome mining in fungal species96. Applying the search algorithm to 51 Aspergillus and Penicillium species identified 72 unique putative resistance genes and clusters.

The first detailed experimental demonstration was reported by Moore’s group97. They successfully discovered a group of FASII inhibitors using a search pipeline designed to identify NPs that may inhibit parts of the fatty acid biosynthetic pathway. The pipeline can be summarized as follows: 1) Possible self-resistance genes were selected from duplicate members within delineated orthologous groups in the pan-genome of 86 Salinispora strains; 2) One PKS44 cluster was identified to contain a putative SRE, which is a homolog of a fatty acid biosynthetic enzyme; 3) This PKS44 cluster was reconstituted in the heterologous host Streptomyces coelicolor, which produced a group of thiolactomycins (27) that were previously reported as FASII inhibitors98 (Fig. 4b and Table 1). Following a similar procedure, the thiotetroamide C (28) gene cluster was also identified in Streptomyces afghaniensis97, 99 (Fig. 4b and Table 1); and 4) The putative self-resistance genes ttmE and ttmJ were verified97 to help the heterologous host survive in the presence of 28. The output of this pipeline demonstrated the practicality of a self-resistance-gene directed approach for finding NPs with a desired mode of action.
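The first step of this pipeline, selecting duplicate members within orthologous groups, can be sketched as follows. This is a hedged illustration rather than the published procedure: the input layout, the strain and gene identifiers, and the `min_strains` threshold are hypothetical, and the ortholog assignments are assumed to come from an upstream pan-genome analysis.

```python
from collections import defaultdict


def duplicated_groups(assignments, min_strains=1):
    """assignments: iterable of (strain, gene_id, ortholog_group).
    Returns {group: {strain: [gene_ids]}} restricted to strains that carry
    two or more members of the group, for groups duplicated in at least
    `min_strains` strains."""
    per_group = defaultdict(lambda: defaultdict(list))
    for strain, gene_id, group in assignments:
        per_group[group][strain].append(gene_id)
    result = {}
    for group, strains in per_group.items():
        dup = {s: ids for s, ids in strains.items() if len(ids) >= 2}
        if len(dup) >= min_strains:
            result[group] = dup
    return result


# Toy assignments, assumed for illustration only
assignments = [
    ("strain_1", "s1_0001", "OG_fabF"),
    ("strain_1", "s1_0452", "OG_fabF"),  # second copy in the same strain
    ("strain_2", "s2_0003", "OG_fabF"),
    ("strain_1", "s1_0100", "OG_rpoB"),
    ("strain_2", "s2_0101", "OG_rpoB"),
]
candidates = duplicated_groups(assignments)
print(candidates)
```

In this toy pan-genome only `OG_fabF` is reported, with both copies from `strain_1`. The duplicated members would then be checked for co-localization with a BGC, as in step 2 of the pipeline.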

The proteasome inhibitor fellutamide B (20) was rediscovered by Yeh and co-workers, guided by a putative resistance gene encoding a proteasome component C5 homolog100 (Fig. 4a and Table 1). First, the authors identified a BGC in Aspergillus nidulans containing the gene inpE, which encodes a putative proteasome subunit. This cluster was then activated by serial promoter replacement to produce 20, which was previously characterized as a potent proteasome inhibitor101. Finally, inpE was demonstrated to confer self-resistance to 20 by in vivo studies100.

Although the above examples validate SRE-directed mining, 27, 28 and 20 are known NPs with previously established modes of action. In the last few years, a couple of examples have emerged in which NPs with previously unknown targets were discovered. For example, Müller’s group discovered novel topoisomerase inhibitors from myxobacteria using a topoisomerase-targeting pentapeptide repeat protein (PRP) as a probe102. This SRE was previously observed in the BGCs of the gyrase inhibitors albicidin and cystobactamid. A BGC encoding a polyketide metabolite was found to contain a PRP gene in the genome of Pyxidicoccus fallax. Two new NPs, pyxidicycline A and B, were produced upon mutating the promoters of the polyketide synthase genes. Bioactivity studies confirmed that both pyxidicycline A and B are selective inhibitors of topoisomerases. Although the resistance gene here is not a mutated copy of the targeted housekeeping topoisomerase, this example validated the possibility of using SREs to discover new NPs with desired bioactivities.

We applied target-guided genome mining to find herbicide leads with a new mode of action from a fungal genomic database. We targeted the enzyme dihydroxy-acid dehydratase (DHAD), the third enzyme in the branched chain amino acid biosynthesis pathway103. The first enzyme in this pathway, acetolactate synthase, is the most targeted enzyme for herbicide development, with over 50 commercialized agents. However, no inhibitor of DHAD has been reported to work in planta. After confirming that DHAD enzymes in plants are close homologs of the housekeeping fungal DHADs, we scanned fungal genomes in publicly available databases for a BGC that encodes a possible SRE copy of DHAD. We located a conserved gene cluster that encodes a duplicate copy of DHAD that is ~60% identical to the well-conserved housekeeping DHAD present in all fungi. The biosynthetic core genes from Aspergillus terreus were then reconstituted in a heterologous host to produce aspterric acid (34), a known sesquiterpene NP with an unknown biological target104 (Fig. 4d, Table 1). We demonstrated that 34 is a competitive inhibitor of DHAD, and that the SRE AstD is indeed resistant to 34. The broad spectrum herbicidal activities of 34 were then confirmed using a variety of model plant species. Furthermore, we introduced astD into Arabidopsis thaliana to construct a transgenic plant that is resistant to 34. The combination of a compound with a new mode of action and a resistant plant makes 34 a promising herbicide lead. Finally, the structure of A. thaliana DHAD was solved and a homology model of AstD was built, which showed that the active site entrance in AstD may be significantly narrowed to prevent binding of 34. Our rediscovery of 34 through target-guided mining demonstrates that this approach can indeed lead to NPs with novel modes of action.

4. Conclusions and prospects

The rapid development of genome sequencing technologies has reinvigorated NP discovery using genome-mining approaches. The duplicated and insensitive SREs co-localized with NP BGCs likely co-evolved with the biosynthetic pathways of NPs. Studying the molecular mechanisms by which SREs remain insensitive can yield new insights into how nature evolves resistance to powerful antibiotics, and the SRE also provides a unique window for predicting the biological activities of NPs22, 23.

However, it is important to note that, because our biosynthetic knowledge is limited, the prediction of a self-resistance gene in a BGC can sometimes be inaccurate. In an example from our lab, while searching for a new IMPDH inhibitor based on the presence of a putative SRE, we discovered a BGC that synthesizes the PKS-NRPS product pyranonigrin A. However, pyranonigrin A was shown not to be an IMPDH inhibitor105. Therefore, a housekeeping enzyme homologue can be coincidentally duplicated in the BGC of a NP without serving as a SRE.

In other cases, a housekeeping enzyme variant in a BGC could be a bona fide biosynthetic enzyme that is involved in the assembly of the NP rather than a SRE. For example, similar to the BGC of 9, the valanimycin BGC contains a second copy of seryl-tRNA synthetase. However, further functional characterization revealed that this seryl-tRNA synthetase is responsible for catalysing seryl transfer in the valanimycin biosynthetic pathway106. In another recent example, Abe and coworkers discovered the BGC of the Ile-tRNA synthetase (IleRS) inhibitor SB-203208 in Streptomyces sp. NCIMB 40513 using the presence of a SRE copy of IleRS as a search criterion107. Surprisingly, the additional copy of IleRS (SbzA) in the BGC was demonstrated to participate in the biosynthetic steps of SB-203208. Therefore, it is challenging to accurately predict true SREs, especially for enzymes that may play functional roles in the biosynthetic pathway. However, we expect such predictions to become less ambiguous as more knowledge of biosynthetic enzymology is built up. The accumulating knowledge of self-resistance mechanisms will also facilitate new automated bioinformatics tools that prioritize cryptic BGCs with higher accuracy. We expect a self-resistance-gene guided approach to bridge the traditional and modern methods of NP discovery during this exciting period of NP renaissance.

6. Acknowledgements

This work was supported by the NIH (1R35GM118056) to Yi Tang.

Biography

Nicholas Liu received his B.S. in Chemical and Biomolecular Engineering from Johns Hopkins University in 2014. He is currently working toward his Ph.D. under the guidance of Prof. Yi Tang at the University of California, Los Angeles. His research interests focus on self-resistance gene directed natural product discovery and biosynthesis, particularly on polyketide natural products.

Yi Tang received his undergraduate degree in Chemical Engineering and Material Science from Penn State University. He received his Ph.D. in Chemical Engineering from California Institute of Technology in 2002. After NIH postdoctoral training in Chemical Biology at Stanford University, he started his independent career at the University of California Los Angeles in 2004. He is currently Professor in the Department of Chemical and Biomolecular Engineering at UCLA, and in the Department of Chemistry and Biochemistry. His lab is interested in natural product biosynthesis, biocatalysis and protein engineering.

Yan Yan received his B.S. in Biology from China Agricultural University in 2008, and his M.S. in Organic Chemistry from Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences in 2013. After receiving his Ph.D. under the direction of Prof. Yi Tang at University of California, Los Angeles in 2019, he is currently a postdoctoral scholar in the same group. His research interests focus on self-resistance-gene directed natural products discovery, self-resistance mechanism of natural products, and natural products biosynthesis.

Footnotes

5. Conflicts of interest

Yi Tang is a cofounder of Hexagon Biosciences, a start-up biotechnology company using genome mining to discover new bioactive natural products.

7. References
