Significance
Numerous iterative type I polyketide synthases (iT1PKSs) that can synthesize structurally diverse bioactive molecules have been discovered in fungi. However, very few iT1PKSs have been discovered in bacteria. In this work, by combining extensive phylogenetic analysis and experimental verification, we discovered many iT1PKSs in Streptomyces (a bacterial genus), some of which were experimentally confirmed to produce various branched/nonbranched linear intermediates. In particular, we discovered two bioactive allenic polyketides that are conditionally essential to the producer. In addition, we validated that phylogenetic analysis is effective in discerning bacterial iT1PKSs. The widespread existence of iT1PKSs in bacteria is of great significance for both fundamental studies of T1PKS evolution and practical discovery of novel polyketide natural products.
Keywords: polyketide synthase, iterative, natural product discovery, biosynthetic gene clusters
Abstract
Type I polyketide synthases (T1PKSs) are among the most extensively studied PKSs and can act either iteratively or via an assembly-line mechanism. Domains in the T1PKSs can readily be predicted by computational tools based on their highly conserved sequences. However, distinguishing iterative from noniterative T1PKSs at the module level remains an overwhelming challenge, which may account for the seemingly biased distribution of T1PKSs in fungi and bacteria: small iterative monomodular T1PKSs that are responsible for the enormously diverse fungal natural products exist almost exclusively in fungi. Here we report the discovery of iterative T1PKSs that are unexpectedly both abundant and widespread in Streptomyces. Seven of 11 systematically selected T1PKS monomodules from monomodular T1PKS biosynthetic gene clusters (BGCs) were experimentally confirmed to be iteratively acting, synthesizing diverse branched/nonbranched linear intermediates, and two of them produced bioactive allenic polyketides and citreodiols as end products, respectively. This study indicates the huge potential of iterative T1PKS BGCs from streptomycetes in the discovery of novel polyketides.
Polyketides are synthesized by multifunctional megaenzymes or enzyme complexes called polyketide synthases (PKSs) and have proven to be a rich source of pharmaceutical and agrochemical lead compounds (1). Based on the architecture of their biosynthetic machinery, PKSs can be divided into three types: the modular type I PKSs (T1PKSs), the iterative type II PKSs, and the acyl-carrier protein (ACP)-independent freestanding iterative type III PKSs (2, 3). T1PKSs are composed of modules, which in turn consist of multiple domains covalently linked in a very long polypeptide to catalyze specific reactions, including the mini-PKS consisting of ketosynthase (KS), acyltransferase (AT), and ACP and the accessory domains such as ketoreductase (KR), dehydratase (DH), enoyl reductase (ER), and methyltransferase. Furthermore, AT can be either integrated in a module as a domain (termed cis-AT) or act as a discrete enzyme (trans-AT) (4–6). Both subtypes of modules are organized into an assembly line and used nonrepeatedly to condense various acyl-building blocks into the full-length polyketide dictated by a rule of “co-linearity,” although exceptions exist (7–10). Therefore, the backbone of a polyketide compound is largely determined by the modules, including the number and order of modules and the module compositions. Important polyketides, such as the antibiotic erythromycin (11) (synthesized by 6 modules), the insecticide spinosyn (12) (10 modules), and the antitumor ansamycin geldanamycin (7 modules), are synthesized by multimodular T1PKS biosynthetic gene clusters (BGCs), which appear to be nature’s example of “more is better” (1, 13), although this view remains controversial. So far, the pathway for synthesis of the antiproliferative compounds, stambomycins, represents one of the largest PKS assembly lines, in which 25 modules work in concert (14).
In contrast, fungal species employ a different mechanism to synthesize their enormously diverse and bioactive polyketides, ranging from the carcinogenic compound aflatoxin B1 to the pharmaceutical drug lovastatin (15). It involves iterative monomodular T1PKSs that act as multimodules condensed into one module; strikingly, the single module is highly programmed to catalyze several iterations, and a different combination of domains may be employed in each iteration, generating compounds with diverse chain lengths and degrees of reduction. Based on the presence of KR, DH, and ER domains, the fungal iterative T1PKSs can further be divided into highly reducing, nonreducing, and partially reducing subtypes (16).
Although iterative T1PKSs have occasionally been discovered in bacteria, both their number and their types are far fewer than those in fungi (17), which appears to reflect a divergent evolution of T1PKSs: noniterative multimodular T1PKSs in bacteria and iterative monomodular T1PKSs in fungi. T1PKSs are extensively studied, and various computational tools, such as NP.searcher (18) and antiSMASH (19), can readily be used to detect the domains within modules. However, distinguishing iterative modules from noniterative modules remains very challenging, which we believe accounts for the seemingly biased distribution of T1PKSs in bacteria and fungi.
An outstanding feature of fungal iterative T1PKSs is that they mostly lie in a BGC containing only one PKS module, termed a monomodular T1PKS BGC (15, 20). In bacteria, they may be disguised in various forms, such as a standalone PKS gene in a monomodular T1PKS BGC, like most of their fungal counterparts, or a module blended into a multimodular T1PKS [e.g., aureothin (21) and borrelidin (22)] or a T1PKS/nonribosomal peptide synthetase (NRPS) hybrid BGC (23). The former, which contains strictly only one T1PKS module in a BGC, is the focus of the current work. We used bioinformatics tools to extensively analyze the T1PKS BGCs in Streptomyces, a representative bacterial genus, and discovered that the short monomodular T1PKS BGCs are unexpectedly both abundant and widespread. Seven of 11 systematically selected T1PKS monomodules were experimentally confirmed to be iteratively acting, synthesizing diverse branched/nonbranched linear intermediates, and two of them produced bioactive allenic polyketides and citreodiols as end products, respectively. This study indicates the huge potential of iterative T1PKS BGCs from streptomycetes in the discovery of novel polyketides.
Results and Discussion
Extensive Analysis of Streptomyces T1PKSs Uncovered a Large Number of Monomodular T1PKS BGCs.
We systematically investigated the distribution and function of the monomodular T1PKS BGCs in the representative bacterial genus Streptomyces, with a dual goal of generating knowledge for discerning bacterial iterative T1PKSs and discovering natural products. Of note, the monomodular T1PKS BGCs defined in this study are strictly confined to those that encode only a single T1PKS module in a single gene, which excludes any other modular forms of T1PKS BGCs or T1PKS/NRPS hybrid BGCs. To identify monomodular T1PKS BGCs in Streptomyces, we first reprocessed in-house all of the data on T1PKS BGCs from the antiSMASH database (24), which contains over 600 Streptomyces species, based on conserved domain signatures (SI Appendix, Table S1). Individual T1PKS BGCs were extracted based on a 20-kb rule similar to that adopted by antiSMASH (19) and NP.searcher (18), but we removed those with flanking regions shorter than 20 kb, i.e., those located adjacent to the contig edges. In our analysis, the number of T1PKS BGCs before applying the 20-kb flanking filter was 3,804 from 595 Streptomyces species, which is roughly the same as the number retrieved from the antiSMASH database (3,975/625). However, only 1,784 of them met this rule, which suggested that more than half of these T1PKS BGCs are experimentally unexploitable because they are incompletely sequenced. By manual reinspection, a total of 641 hits were identified as typical monomodular T1PKS BGCs (Fig. 1A) and were present in more than two-thirds of the analyzed Streptomyces species (Fig. 1B), which demonstrated that the abundance and wide distribution of monomodular T1PKS BGCs in Streptomyces have been greatly underestimated.
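The 20-kb extraction and flanking-region filter described above can be illustrated with a minimal sketch. It is not the in-house script used in this study: domain hits are simplified to single coordinates on one contig, and the names `group_hits` and `has_full_flanks` are hypothetical.

```python
from dataclasses import dataclass

WINDOW = 20_000  # the 20-kb distance used in the text


@dataclass
class Cluster:
    start: int   # coordinate of the first domain hit
    end: int     # coordinate of the last domain hit
    n_hits: int  # number of domain hits merged into this cluster


def group_hits(hit_positions, window=WINDOW):
    """Merge domain-hit coordinates into clusters.

    A hit extends the current cluster when it lies within `window` bp of
    the previous hit; otherwise it starts a new cluster.
    """
    clusters = []
    for pos in sorted(hit_positions):
        if clusters and pos - clusters[-1].end <= window:
            clusters[-1].end = pos
            clusters[-1].n_hits += 1
        else:
            clusters.append(Cluster(pos, pos, 1))
    return clusters


def has_full_flanks(cluster, contig_length, window=WINDOW):
    """True when at least `window` bp of sequence lie on both sides."""
    return cluster.start >= window and contig_length - cluster.end >= window


# Toy example (made-up coordinates on a 200-kb contig)
hits = [25_000, 30_000, 44_000, 90_000, 195_000]
clusters = group_hits(hits)
kept = [c for c in clusters if has_full_flanks(c, contig_length=200_000)]
```

In this toy input the last cluster sits 5 kb from the contig edge and is discarded, which mirrors how clusters near contig edges were removed as potentially incomplete.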
Fig. 1.
Genome mining of Streptomyces species for monomodular T1PKS BGCs. (A) General workflow of the bioinformatics analysis and demonstration of typical monomodular T1PKS BGCs. (B) Distribution of monomodular T1PKS BGCs at the single species level. The number and percentage related to each pie slice refer to the amount of monomodular T1PKS BGCs per genome and the percentage thereof in the database. (C) SSN analysis of the total KSs from T1PKS BGCs based on a cutoff value of E < 10^−130 and visualized by Cytoscape (25). Colors in the nodes show KSs from different classes of T1PKS BGCs. Edges (lines) indicate that pairwise alignments of nodes are better than the cutoff value while those singlets are dissimilar to any other nodes. The boxed four classes (G1 to G4) from clade 1 were the focus of this study with corresponding module compositions added. The majority of the gray nodes in clade 1 (4,583 of 6,022 nodes) are KSs from standard multimodular T1PKS BGCs. Among them is the class G4-situated subclade which is composed of 3,845 of 6,022 nodes. Other separate gray clades are mainly KSs from T1PKS/NRPS hybrid BGCs, trans-AT T1PKS BGCs, and bimodular T1PKS BGCs (SI Appendix, Fig. S1).
We hypothesized that the strictly monomodular composition of a T1PKS BGC suggests an iterative mode of action and that, similar to their fungal counterparts, the corresponding iterative T1PKSs might have characteristic sequence features that can be identified by phylogenetic analysis. To test this hypothesis, the corresponding KSs were extracted and subjected to an extensive phylogenetic analysis, which was presented as a sequence similarity network (SSN) and visualized by Cytoscape (25); known references were included for optimizing the threshold (26). Surprisingly, after mapping all of the KSs from monomodular T1PKS BGCs, we found not only that they cluster into separate clades but also that each clade features a distinct domain organization (Fig. 1C and SI Appendix, Fig. S1). This clustering pattern is in sharp contrast to that of the multimodular T1PKSs, whose KSs mainly cluster either with members from their own BGCs (cis-AT) or on the basis of the preceding module-determined substrate structures (trans-AT), regardless of module composition (5). However, it closely resembles the fungal pattern (27) and includes more than 20 classes that cover all of the fungal types, suggesting the evolutionary relatedness of fungal and bacterial monomodular T1PKSs and the iterative nature of the latter. One piece of evidence supporting this speculation is the discovery, made while this paper was in preparation, of the anti-Staphylococcus aureus antibiotic amycomicin produced by an Amycolatopsis species (28) (another bacterial genus); the corresponding iterative monomodular T1PKS BGC belongs to class G12 (SI Appendix, Fig. S1). Notably, the monomodular T1PKS BGCs are rich in other types of KSs and/or long-chain fatty-acyl CoA ligases (SI Appendix, Fig. S1), which suggests a preference for special initiation that usually leads to a higher order of structural diversity and complexity (29).
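As a rough illustration of how an SSN groups sequences, the sketch below keeps only pairwise hits whose E-value is below the cutoff and reports the resulting connected components. It is not the Cytoscape workflow used in this study: the function name `build_components`, the `(query, subject, evalue)` tuple layout, and the toy identifiers are assumptions made for the example.

```python
E_CUTOFF = 1e-130  # SSN threshold quoted in the Fig. 1C legend


def build_components(nodes, hits, cutoff=E_CUTOFF):
    """Group nodes into connected components of the thresholded network.

    `hits` is an iterable of (query, subject, evalue) tuples, e.g. parsed
    from an all-to-all BLASTp table. Every query and subject is assumed to
    be listed in `nodes`. Self-hits and hits at or above the cutoff are
    ignored.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        if query != subject and evalue < cutoff:
            parent[find(query)] = find(subject)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Largest component first; ties broken by member names
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# Toy example with made-up KS identifiers and E-values
nodes = ["ks1", "ks2", "ks3", "ks4"]
hits = [
    ("ks1", "ks2", 1e-150),
    ("ks2", "ks3", 1e-140),
    ("ks3", "ks4", 1e-90),   # too weak: no edge is drawn
    ("ks1", "ks1", 0.0),     # self-hit: ignored
]
components = build_components(nodes, hits)
```

Here `ks4` ends up as a singleton because its only hit is weaker than the cutoff, which corresponds to the unconnected nodes in the network.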
Structure Characterization Revealed the Iterative Nature of the Selected Monomodular T1PKSs.
To investigate whether any of the above-mentioned 641 monomodular T1PKS BGCs, which were grouped into 23 classes (Fig. 1C and SI Appendix, Fig. S1), are iterative, we submitted them to antiSMASH v5 for analysis (30). To our surprise, only 16 of 70 BGCs, specifically from class G3, were annotated as containing iterative T1PKSs, which would indicate a lack of prototypes for current iterative T1PKS prediction if our hypothesis turns out to be true. We then set out to verify their iterativity by experimentally characterizing the associated products. Of note, a module that catalyzes a minimum of two acyl condensations in a programmed fashion is generally regarded as iterative (31). To increase the chance of successful heterologous expression, from the four biggest classes we selected 11 BGCs that are seemingly simple yet difficult to distinguish from the canonical multimodular T1PKSs (clade 1 in Fig. 1C) due to their high sequence similarity. We cloned either the single monomodules alone or the single monomodules together with closely related genes under the control of the strong promoter SF14 (32) and transformed them into Streptomyces lividans TK24 (SI Appendix, Fig. S2). Class G2 was reported to be involved in organoarsenic metabolism (33), but the arseno-polyketide has not been conclusively characterized. We identified and characterized the polyketide moiety [1] produced by both sliM8 and scoM23 (see SI Appendix, Table S2, for designations of sliM8 and scoM23) as a 2,4,6-trimethylnon-2-enoic acid (Fig. 2 and SI Appendix, Figs. S2A and S3–S7 and Table S3). The branched tetraketide structure clearly indicates that the responsible monomodules act in an iterative manner. Furthermore, the C3-unit specificity determined by the AT domain does not seem to support the proposed nonbranched hexaketide or heptaketide (SI Appendix, Fig. S8).
Although similar in module composition, the two class G1 targets from slavM41 and ssp2M5 did not yield detectable products, and neither did two of the class G3 targets from g21M14 and g31M15. This is presumably because their highly reduced carbon chains make ultraviolet (UV) detection difficult, similar to the aforementioned amycomicin (28).
Fig. 2.
Summary of the heterologous expression experiments for the T1PKS monomodules. The single PKS modules were heterologously expressed except for class G2 in which a KSIII gene immediately downstream was also included. The module compositions of each class are shown in Fig. 1C, while the BGC designations are shown in SI Appendix, Table S2. The diverse branched/nonbranched linear polyketide intermediates indicate the iterative nature of the corresponding PKS modules; e.g., class G4 employs a CoA-ligase to load 3- or 4-hydroxybenzoic acid as a starter unit and then runs the KS-AT-DH-KR-ACP for seven iterations by using malonyl-CoA as the extender unit and finally releases the octaketide polyenomycins by thioesterase.
The other two class G3 targets, sgriM25 and sgriM38, produced a group of heptadecapentaenoic acid isomers [2] (Fig. 2 and SI Appendix, Fig. S2B) and a 2,6-dimethylocta-2,4,6-trienoic acid [3] (Fig. 2 and SI Appendix, Figs. S2C and S9–S13 and Table S4), respectively. Of note, the heptadecapentaenoic acid isomers could not be purified due to their polyene-like spontaneous isomerization, and their structure prediction was based on a high-resolution mass spectrum and UV-vis absorption profile (SI Appendix, Fig. S2B) and comparison with the structure of allenomycin A [6] (Fig. 3D). In addition, all three targets (s4M7, g17aM16, and ssp1M2) from class G4 surprisingly synthesized the same arylpolyene compounds—polyenomycins A [4] and B [5]—when externally fed with 4-hydroxybenzoic acid and 3-hydroxybenzoic acid, respectively (Fig. 2 and SI Appendix, Fig. S2D). Conclusive characterization of 4 or 5 failed despite exhaustive efforts. However, the UV-Vis absorption profile and high-resolution mass spectrum data (SI Appendix, Fig. S2D) support the structure deduction that they both are all-trans aryl polyene carboxylic acids similar to known arylpolyenes (34, 35).
Fig. 3.
Discovery of end products from native producers. (A) Gene organization of sgriM25 BGC. The core monomodular T1PKS is labeled yellow. (B) Disruptant IsgriM25 cannot grow in MYM medium, but can be rescued by external addition of the pure allenomycins (6 and 7). (C) Double agar layer assay for the mode-of-action of the active allenomycins. IsgriM25 was streaked all through the upper layer, but only those right on top of sgriCK grew out. As shown in the side view, two strains were obviously separated by a thick layer of agar. It suggests that the active molecule(s) produced by sgriCK are permeable and can be imported back into IsgriM25. (D) HPLC analysis of crude extracts from the disruptants, IsgriM25 and IsgriM38, and the associated compound structures. For peak 8, two wavelengths were used in the zoomed-in Inset to distinguish it from the overlapped background peak. (E) HPLC analysis of the crude extracts from heterologously expressed sgriM25 and sgriM38 BGCs. Corresponding HPLC peaks were produced in agreement with those from the native producers.
We hypothesized that PKS modules from the monomodular T1PKS BGCs are more likely to be iteratively acting, and this turned out to be true for more than half of the analyzed targets. However, it is worth noting that T1PKSs are extraordinarily complex enzymes that in some cases exhibit context-dependent behaviors. A typical example is the PikAIII module, which is responsible for a single chain elongation when it is part of the pikromycin assembly line but catalyzes two iterations in vitro when it is isolated as a pure protein (31). In this respect, we cannot rule out that the iterative T1PKS modules verified in the heterologous host may work differently in their native hosts, but if so, it would probably indicate that, similar to the pikromycin pathway, the iterative T1PKSs are still evolving, generating an assembly line or becoming a part of it by module duplication and recombination (36).
Exploring the End Products from Native Producers Led to Discovery of Bioactive Allenomycins.
Because the T1PKS monomodules from relatively simple BGCs alone can synthesize diverse linear branched or nonbranched “intermediates” by their intrinsic iterative strategies, more interesting chemical structures could be expected for their corresponding end products. Therefore, we sought to determine the end products of the iterative T1PKS BGCs in their corresponding native hosts and selected g31M15, sgriM25 (Fig. 3A), sgriM38, and s4M7 as initial target BGCs. RT-PCR was first performed to evaluate their transcription levels. Compared with the corresponding internal reference, the sigma factor hrdB (37), all target BGCs were shown to be moderately expressed (SI Appendix, Fig. S14), suggesting their potential biological roles in the organisms. Because only Streptomyces griseofuscus is genetically tractable (38), we focused our subsequent studies on sgriM25 and sgriM38 BGCs.
To rapidly inactivate the sgriM25 and sgriM38 BGCs, we employed a suicide plasmid, pKC1132, to achieve single-crossover homologous recombination-mediated gene disruption (SI Appendix, Fig. S15). The resulting mutants, IsgriM25 and IsgriM38, grown on mannitol soya flour solid medium showed no obvious phenotypic difference compared to the control strain sgriCK (wild type bearing the self-replicable plasmid pKC1139). However, switching to maltose–yeast extract–malt extract (MYM) liquid medium (38) appeared to be lethal for IsgriM25 (Fig. 3 B and C), a phenomenon that suggested that sgriM25 may be conditionally essential for the producer. Fortunately, we found another liquid medium, R5, in which not only did IsgriM25 grow quite well but the BGC was also transcriptionally active, albeit at a lower level than in MYM (SI Appendix, Fig. S14). We purified and characterized the sgriM25-associated High-Performance Liquid Chromatography (HPLC) peaks as two volatile allenic polyketides, allenomycins A [6] and B [7] (Fig. 3D and SI Appendix, Figs. S16–S25 and Table S5), and sgriM25 was accordingly named the aln BGC. So far, only around 150 allenic natural products are known, including the fatty-acid–derived linear allenes and bromoallenes, and the allenic carotenoids and terpenoids (39–42). However, the biosynthetic mechanism for allene formation remains unknown, probably because the known allenic compounds are derived mostly from primary metabolism of fungi and higher plants and the responsible genes are almost impossible to identify (39). Recently, we successfully cloned and heterologously expressed the complete aln BGC in Streptomyces avermitilis SUKA17 (Fig. 3E), which should facilitate the elucidation of the allene formation mechanism. In parallel, we tested external feeding of pure 6 and 7 and found that both of them can effectively rescue IsgriM25 (Fig. 3B).
Furthermore, the double-layer agar plate assay suggests that the two compounds might function as communication molecules, as insect pheromones (43) or A-factor-like signaling molecules (44) do, yet they differ from these in that the allenic polyketides are indispensable for growth (Fig. 3C).
In addition, we eventually discerned and isolated the end products of the sgriM38 BGC from large background HPLC peaks and characterized them as an epimeric mixture of citreodiol and epi-citreodiol [8a and 8b] (Fig. 3D and SI Appendix, Figs. S26–S30 and Table S6). Subsequent cloning and heterologous expression of a 25-kb DNA region containing the sgriM38 BGC successfully produced the citreodiols, which validated this BGC-to-compound linkage (Fig. 3E). Citreodiols were reported to be produced by both prokaryotic streptomycetes (45) and eukaryotic fungi (46). This kind of cross-kingdom distribution itself strongly suggests that they have important and universal roles, yet their BGCs had remained undiscovered until this work.
To conclude, we uncovered numerous uncharacterized monomodular T1PKS BGCs in streptomycetes that potentially contain iterative T1PKS modules. We experimentally verified the iterative nature of seven of them from three major classes and identified several corresponding bioactive compounds. Although the iterativity of other classes defined in this work needs further investigation and it is challenging to extract the sequence features associated with iterativity, new rules for predicting the iterative classes G2, G3, and G4 may be established based on extensive phylogenetic analysis and biochemical characterization. Other modular forms such as bimodular and trimodular T1PKSs or T1PKS/NRPS hybrids might also be iterative (17). The standalone bimodular T1PKS BGCs identified using the strategy described in this work are therefore also promising targets for further exploration of bacterial iterative T1PKSs. In addition, our discovery of the abundant and diverse potential iterative T1PKSs in bacteria might have evolutionary implications for the horizontal gene-transfer origin of fungal T1PKSs and the module duplication hypothesis of the origin of bacterial multimodular T1PKSs (SI Appendix, Supporting Note).
Methods
Bacterial Strains, Culture Conditions, and General Remarks.
All Escherichia coli strains were cultured in Luria–Bertani broth at 37 °C supplemented with appropriate antibiotics when needed. E. coli NEB5α was used for general cloning and E. coli WM6026 for conjugation. S. lividans TK24 and E. coli BAP1 (47) were used for heterologous expression of T1PKSs. Designations of Streptomyces strains, plasmids, and primers used in this study are listed in SI Appendix, Tables S2 and S7.
General Procedures.
Genomic DNAs used for PCR were extracted as described elsewhere (48). PCR primers are listed in SI Appendix, Table S7. Plasmid construction was carried out by Gibson Assembly (49). Specifically, plasmid pSET616 (50) was predigested by BamHI and EcoRI, and pET28a by NcoI and HindIII. DNA fragments were amplified by using Q5 hot-start high-fidelity DNA polymerase [New England Biolabs (NEB), Ipswich, MA], and a two- or three-fragment assembly protocol was used at a backbone:fragment ratio of 1:3 (NEB). The assembly mixture was incubated at 50 °C for 1 h and then directly transformed into NEB5α high-efficiency competent cells (NEB). Plasmids were extracted and verified by restriction digestion analysis. Correct pSET plasmids were then transformed into E. coli WM6026, which was cultured with 40 μg/mL (final concentration) of diaminopimelic acid. Conjugation was performed as described elsewhere (48). Exconjugants were purified by three successive passages. The fermentation medium commonly used was MYM liquid medium (51), except for culturing IsgriM25, for which R5 liquid medium (48) was used. Fermentation analyses were performed at least twice to confirm the reproducibility or for preparation of compounds. Glycerol stocks of the spores were used for small-scale fermentation analysis, and overnight seed cultures were used for large-scale fermentation. Plasmid pET28a-1MPKSsgri01-His was transformed into E. coli BAP1, which was induced with 0.1 mM of isopropyl β-d-1-thiogalactopyranoside.
Direct Cloning of sgriM25 (aln) and sgriM38 BGCs.
Genomic DNA of S. griseofuscus NRRL B-5429 was extracted and digested by XbaI and MfeI (for sgriM25) or by MfeI and NdeI (for sgriM38). For each construct, two receiver plasmids were amplified with homology arms matching the ends of the cloned region. After DNA purification, the digested genomic DNAs were assembled with the corresponding receiver plasmids by HiFi Gibson Assembly (NEB) at 50 °C for 2 h. The assembly mixtures were dialyzed before being electrotransformed into E. coli NEB10β, which expresses Cre protein under L-arabinose induction. Finally, single colonies were picked the next day for plasmid extraction and digestion verification.
Metabolite Extraction, HPLC, and Liquid Chromatography–Mass Spectrometry Analyses.
The supernatant of the fermentation broth was adjusted to pH ∼3 by adding HCl and then mixed with an equal volume of ethyl acetate. The organic phase was dried by rotovap and then dissolved in 1 mL of methanol. After being filtered through a 0.2-μm filter, the crude extract was injected into the HPLC. All HPLC analyses were carried out on an Agilent 1260 equipped with a multiple wavelength detector, using the analytical column Kinetex SB-C18 (4.6 × 180 mm, 5 μm) at a flow rate of 1.0 mL/min. Two methods were used for analytical HPLC, which differed only in detector wavelength settings. Method 1 was set to 220, 260, 280, 300, and 320 nm for detection of compounds other than polyenomycins A and B. Method 2 was set to 300, 320, 360, 400, and 450 nm for detection of the polyenomycins. Solvent A was water with 0.1% trifluoroacetic acid (TFA), and Solvent B was acetonitrile with 0.1% TFA. The following gradient program was used: 5 to 50% B in 15 min, 50 to 100% B in 5 min, held at 100% B for 4 min, 100 to 5% B in 1 min, and held at 5% B for 5 min. For Liquid Chromatography–Mass Spectrometry (LC-MS) analysis, ESI positive ion mode (Bruker, Amazon SL Ion Trap) was used, equipped with a Kinetex 2.6-μm XB-C18 100 Å (Phenomenex).
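For readers who want to reproduce the analytical gradient on another instrument, the program stated above can be written as a table of linear segments. The sketch below is only an illustration of that table; the names `SEGMENTS` and `percent_b` are hypothetical and nothing here comes from the instrument method files.

```python
# Each segment: (duration in min, %B at segment start, %B at segment end)
SEGMENTS = [
    (15, 5, 50),     # 5 to 50% B in 15 min
    (5, 50, 100),    # 50 to 100% B in 5 min
    (4, 100, 100),   # hold at 100% B for 4 min
    (1, 100, 5),     # 100 to 5% B in 1 min
    (5, 5, 5),       # hold at 5% B for 5 min
]

TOTAL_RUN_MIN = sum(duration for duration, _, _ in SEGMENTS)


def percent_b(t_min, segments=SEGMENTS):
    """Return %B at time `t_min`, interpolating linearly within a segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in segments:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient program")
```

For example, `percent_b(7.5)` gives 27.5% B, halfway through the first ramp, and `TOTAL_RUN_MIN` is 30 min per injection.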
Isolation of Tetraketides 1, 3, and 8.
The same procedure was applied to the isolation of 1, 3, and 8. Specifically, fermentation was performed using 4 L of MYM liquid medium. Crude extract was obtained by 1:1 ethyl acetate extraction and dissolved in methanol. After centrifugation to remove the insolubles, the sample was loaded onto a Sephadex LH-20 column pre-equilibrated with methanol. The first 30 min of eluent was discarded, and the subsequent fractions were collected every 5 min. Each fraction was checked by HPLC, and fractions containing the target peak were pooled and finally subjected to semipreparative HPLC for further purification. The freshly purified compound was injected into the HPLC to check purity. Acetonitrile was partially removed from the HPLC collections by rotovap, leaving mostly water, and then the sample was frozen at −80 °C for more than 1 h and finally lyophilized overnight. About 0.5 to 1 mg of pure compound was obtained. Deuterated dimethylsulfoxide was used to dissolve the sample for NMR experiments. NMR analysis was performed on an Agilent 600-MHz NMR spectrometer (SI Appendix, Figs. S3–S7, S9–S13, and S26–S30 and Tables S3, S4, and S6).
Isolation of Allenomycins 6 and 7.
Fermentation was performed using 10 L of R5 liquid medium. After centrifugation, the supernatant was directly extracted by 1:1 ethyl acetate after the pH was adjusted to 3, and the cell pellets were extracted first by acetone and then by ethyl acetate. The two ethyl acetate fractions were pooled and then semidried by rotovap (never overdried). Hexane and silica gel were then added into the rotovap flask. After removing most of the organic phase (never overdried), the sample was loaded onto a silica gel column. Two column volumes of each eluent were applied, i.e., pure hexane, hexane:EtOAc 1:1, hexane:EtOAc 1:2, pure EtOAc, and pure methanol. Each fraction was analyzed by HPLC, and the fractions containing the target peak were pooled and semidried (never overdried) and finally subjected to semipreparative HPLC for further purification. The HPLC collection was diluted fivefold and then applied to C18 solid-phase extraction. D2O was used to wash the column to completely remove H2O and acetonitrile. Then the compound was eluted with 1 mL of CD3OD and directly loaded into the NMR tube for NMR experiments (SI Appendix, Figs. S16–S25 and Table S5).
Double Agar Layer Assay.
The control strain sgriCK was first spot-inoculated on MYM solid medium containing 0.01% apramycin (the bottom layer) and incubated at 28 °C for 24 h until aerial hyphae were about to grow out. Then another layer of MYM medium with 0.01% apramycin (kept at 40 °C to avoid solidification) was poured onto the bottom layer. After the medium solidified, the disruption mutant IsgriM25 was streaked out all through the upper layer. Finally the double agar layer plate was incubated at 28 °C for another 4 d. Different inoculation patterns of sgriCK on the bottom layer generated consistent results indicating that IsgriM25 can grow only on top of sgriCK.
RT-PCR.
Total RNAs were extracted using the PureLink RNA Mini Kit (Thermo Fisher Scientific). Target Streptomyces species G31, s4, and sgri were grown for 36 h in yeast extract–malt extract medium (48) before mycelia were collected for RNA extraction. Isolated total RNA was first subjected to genomic DNA removal and then reverse-transcribed using SuperScript IV VILO Master Mix (Thermo Fisher Scientific). The internal reference genes were the sigma factor genes homologous to hrdB (37). Gene-specific primers are listed in SI Appendix, Table S7.
Phylogenetic Analysis.
A total of 609 Streptomyces genomes were downloaded from the antiSMASH database (24) as a local genome database. In order to make the numbering system orderly, contigs in a GenBank file were merged with an Nx100 linker, which was temporarily called “gap.” The general bioinformatic workflow was as follows: 1) A tBLASTn search was performed using the conserved domain signatures listed in SI Appendix, Table S1, against the local genome database. All BGCs containing at least the KS-AT didomain within 800 bp were extracted based on a widely accepted 20-kb rule; i.e., any domain hit within 20 kb of the last hit is considered part of the same BGC, and the BGC is extended until the last 20 kb of the expanded gene cluster covers no more hits. It should be noted that the ACP/PCP domain sequence varies considerably and is thus difficult to predict precisely. We therefore did not require the ACP/PCP domain to be present for prediction. 2) Another 20-kb rule was set to remove those gap-containing BGCs and those with flanking regions shorter than 20 kb, i.e., those located adjacent to the contig edges. This step filtered out about 50% of the total gene clusters, from 3,804 BGCs in 595 species down to 1,784 BGCs in 508 species. 3) The BGCs were sorted and plotted based on the number of KS-AT didomains. The monomodular T1PKS BGCs were thus readily identified. 4) All of the KS domain sequences were extracted for the Cytoscape (25) SSN construction. An all-to-all BLASTp was performed, and the output data were subjected to Cytoscape mapping, in which the prefuse force-directed layout was applied. A cutoff value was set to E = 10^−130 after several rounds of optimization based on the internal reference sequences. 5) All of the KSs from monomodular T1PKS BGCs were written into a Node Attribute file and imported to Cytoscape for visualization.
The module compositions of each clade were checked with the BGC plotting, and the nodes not from monomodular T1PKS BGCs but located in the monomodular clades were manually double-checked and corrected. 6) For construction of the phylogenetic tree in order to reflect the distance and relatedness of each clade in the SSN map, we performed Multiple Sequence Comparison by Log-Expectation (52) using Fas (Mycobacterium leprae) and FasA (Aspergillus arachidicola) as outgroups. Due to its 500-sequence limitation, we randomly sampled five datasets of 500 KSs with each containing KSs from 30 reference iterative T1PKSs, 400 from monomodular T1PKS BGCs, and 70 from nonmonomodular T1PKS BGCs (multimodular T1PKSs and T1PKS/NRPS hybrid) out of 6,022 KS sequences. The output data were downloaded and modified by MEGA 4.
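Steps 1 to 3 of the workflow can be illustrated with a minimal Python sketch. It is not the script used in this study. The Hit record, the function names, and the coordinates in the usage note are hypothetical. The sketch further assumes that the 800-bp limit refers to the spacing between the end of a KS hit and the start of the next AT hit on the same strand, and that a BGC is discarded when its region, expanded by 20 kb on each side, overlaps a linker gap or runs past either end of the merged sequence.

```python
from dataclasses import dataclass
from typing import List, Tuple

WINDOW_BP = 20_000      # the 20-kb rule described in the text
DIDOMAIN_GAP_BP = 800   # KS-AT spacing limit (interpretation assumed here)


@dataclass(frozen=True)
class Hit:
    """One domain hit on a merged genome sequence (hypothetical record type)."""
    start: int   # start coordinate in bp
    end: int     # end coordinate in bp
    domain: str  # domain label, e.g. "KS", "AT", "KR"


def group_hits(hits: List[Hit], window: int = WINDOW_BP) -> List[List[Hit]]:
    """Step 1: chain hits into clusters until no further hit lies within `window` bp."""
    clusters: List[List[Hit]] = []
    reach = 0  # furthest end coordinate covered by the current cluster
    for hit in sorted(hits, key=lambda h: h.start):
        if clusters and hit.start - reach <= window:
            clusters[-1].append(hit)
            reach = max(reach, hit.end)
        else:
            clusters.append([hit])
            reach = hit.end
    return clusters


def keep_cluster(cluster: List[Hit], gaps: List[Tuple[int, int]],
                 seq_len: int, window: int = WINDOW_BP) -> bool:
    """Step 2: keep a cluster only if its 20-kb flanks avoid gaps and sequence ends."""
    lo = min(h.start for h in cluster) - window
    hi = max(h.end for h in cluster) + window
    if lo < 0 or hi > seq_len:
        return False
    # Every linker gap must lie entirely outside the expanded region [lo, hi).
    return all(g_end <= lo or g_start >= hi for g_start, g_end in gaps)


def count_ks_at(cluster: List[Hit], max_gap: int = DIDOMAIN_GAP_BP) -> int:
    """Step 3: count KS hits directly followed by an AT hit within `max_gap` bp."""
    ordered = sorted(cluster, key=lambda h: h.start)
    return sum(
        1
        for a, b in zip(ordered, ordered[1:])
        if a.domain == "KS" and b.domain == "AT" and b.start - a.end <= max_gap
    )
```

For example, with hypothetical hits KS (30,000 to 31,200), AT (31,500 to 32,400), KR (45,000 to 45,600), and a second KS (90,000 to 91,200) on a 120-kb merged sequence carrying a linker gap at 70,000 to 70,100, group_hits returns two clusters. The first contains one KS-AT didomain and passes keep_cluster, so it would count as a monomodular candidate; the second is discarded because its expanded region overlaps the gap.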
Data Availability.
All genome sequences used in this study for bioinformatics analysis can be accessed from the antiSMASH database (https://antismash-db.secondarymetabolites.org/#!/browse), including the aln BGC, which is Cluster 24 in S. griseofuscus NRRL B-5429. The accession numbers for the single T1PKS proteins contained in the monomodular T1PKS BGCs are WP_032920506.1, WP_078620314.1, CAB71915.1, WP_042799407, KOV99488.1, WP_030381252.1, WP_078967536, WP_125210158.1, WP_033279691, KOV67645.1, and KUL64015.1. Any additional information is available from the corresponding author upon reasonable request.
Acknowledgments
We thank Keqiang Fan and Jinwei Ren (Institute of Microbiology, Chinese Academy of Sciences) for writing the CDS_plotting script and for advice on handling volatile compounds, respectively; Zhong (Lucas) Li (School of Molecular and Cellular Biology Metabolomics Center) for LC-MS training and data analysis; and Furong Sun (School of Chemical Sciences Mass Spectrometry Laboratory) for collecting High Resolution Mass Spectrometry data and data analysis. This work was supported by NIH Grants GM077596 and AI144967 (to H.Z.). Some of the data were collected by the Carl R. Woese Institute for Genomic Biology Core on a 600 MHz NMR funded by NIH Grant S10-RR028833.
Footnotes
The authors declare no competing interest.
This article is a PNAS Direct Submission. D.H.S. is a guest editor invited by the Editorial Board.
Data deposition: All genome sequences used in this study for bioinformatics analysis can be accessed from the antiSMASH database, https://antismash-db.secondarymetabolites.org/#!/browse, and the accession nos. for the single T1PKS proteins contained in the monomodular T1PKS BGCs are WP_032920506.1, WP_078620314.1, CAB71915.1, WP_042799407, KOV99488.1, WP_030381252.1, WP_078967536, WP_125210158.1, WP_033279691, KOV67645.1, and KUL64015.1.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1917664117/-/DCSupplemental.
References
- 1. Katz L., Baltz R. H., Natural product discovery: Past, present, and future. J. Ind. Microbiol. Biotechnol. 43, 155–176 (2016).
- 2. Walsh C. T., Polyketide and nonribosomal peptide antibiotics: Modularity and versatility. Science 303, 1805–1810 (2004).
- 3. Staunton J., Weissman K. J., Polyketide biosynthesis: A millennium review. Nat. Prod. Rep. 18, 380–416 (2001).
- 4. Lohman J. R., et al., Structural and evolutionary relationships of “AT-less” type I polyketide synthase ketosynthases. Proc. Natl. Acad. Sci. U.S.A. 112, 12693–12698 (2015).
- 5. Nguyen T., et al., Exploiting the mosaic structure of trans-acyltransferase polyketide synthases for natural product discovery and pathway dissection. Nat. Biotechnol. 26, 225–233 (2008).
- 6. Cheng Y.-Q., Tang G.-L., Shen B., Type I polyketide synthase requiring a discrete acyltransferase for polyketide biosynthesis. Proc. Natl. Acad. Sci. U.S.A. 100, 3149–3154 (2003).
- 7. Fischbach M. A., Walsh C. T., Assembly-line enzymology for polyketide and nonribosomal peptide antibiotics: Logic, machinery, and mechanisms. Chem. Rev. 106, 3468–3496 (2006).
- 8. Hertweck C., The biosynthetic logic of polyketide diversity. Angew. Chem. Int. Ed. Engl. 48, 4688–4716 (2009).
- 9. Moss S. J., Martin C. J., Wilkinson B., Loss of co-linearity by modular polyketide synthases: A mechanism for the evolution of chemical diversity. Nat. Prod. Rep. 21, 575–593 (2004).
- 10. Zhang J. J., Tang X., Huan T., Ross A. C., Moore B. S., Pass-back chain extension expands multimodular assembly line biosynthesis. Nat. Chem. Biol. 16, 42–49 (2020).
- 11. Staunton J., The extraordinary enzymes involved in erythromycin biosynthesis. Angew. Chem. Int. Ed. Engl. 30, 1302–1306 (1991).
- 12. Waldron C., et al., Cloning and analysis of the spinosad biosynthetic gene cluster of Saccharopolyspora spinosa. Chem. Biol. 8, 487–499 (2001).
- 13. Keatinge-Clay A. T., The structures of type I polyketide synthases. Nat. Prod. Rep. 29, 1050–1073 (2012).
- 14. Laureti L., et al., Identification of a bioactive 51-membered macrolide complex by activation of a silent polyketide synthase in Streptomyces ambofaciens. Proc. Natl. Acad. Sci. U.S.A. 108, 6258–6263 (2011).
- 15. Chooi Y.-H., Tang Y., Navigating the fungal polyketide chemical space: From genes to molecules. J. Org. Chem. 77, 9933–9953 (2012).
- 16. Cox R. J., Skellam E., Williams K., “Biosynthesis of fungal polyketides” in Physiology and Genetics: Selected Basic and Applied Aspects, Anke T., Schüffler A., Eds. (Springer International Publishing, Cham, Switzerland, 2018), pp. 385–412.
- 17. Chen H., Du L., Iterative polyketide biosynthesis by modular polyketide synthases in bacteria. Appl. Microbiol. Biotechnol. 100, 541–557 (2016).
- 18. Li M. H., Ung P. M., Zajkowski J., Garneau-Tsodikova S., Sherman D. H., Automated genome mining for natural products. BMC Bioinformatics 10, 185 (2009).
- 19. Weber T., et al., antiSMASH 3.0: A comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. 43, W237–W243 (2015).
- 20. Herbst D. A., Townsend C. A., Maier T., The architectures of iterative type I PKS and FAS. Nat. Prod. Rep. 35, 1046–1069 (2018).
- 21. He J., Hertweck C., Iteration as programmed event during polyketide assembly: Molecular analysis of the aureothin biosynthesis gene cluster. Chem. Biol. 10, 1225–1232 (2003).
- 22. Olano C., et al., Biosynthesis of the angiogenesis inhibitor borrelidin by Streptomyces parvulus Tü4055: Cluster analysis and assignment of functions. Chem. Biol. 11, 87–97 (2004).
- 23. Fisch K. M., Biosynthesis of natural products by microbial iterative hybrid PKS–NRPS. RSC Adv. 3, 18228–18247 (2013).
- 24. Blin K., et al., The antiSMASH database version 2: A comprehensive resource on secondary metabolite biosynthetic gene clusters. Nucleic Acids Res. 47, D625–D630 (2019).
- 25. Shannon P., et al., Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
- 26. Gerlt J. A., Genomic enzymology: Web tools for leveraging protein family sequence-function space and genome context to discover novel functions. Biochemistry 56, 4293–4308 (2017).
- 27. Kroken S., Glass N. L., Taylor J. W., Yoder O. C., Turgeon B. G., Phylogenomic analysis of type I polyketide synthase genes in pathogenic and saprobic ascomycetes. Proc. Natl. Acad. Sci. U.S.A. 100, 15670–15675 (2003).
- 28. Pishchany G., et al., Amycomicin is a potent and specific antibiotic discovered with a targeted interaction screen. Proc. Natl. Acad. Sci. U.S.A. 115, 10124–10129 (2018).
- 29. Ray L., Moore B. S., Recent advances in the biosynthesis of unusual polyketide synthase substrates. Nat. Prod. Rep. 33, 150–161 (2016).
- 30. Blin K., et al., antiSMASH 5.0: Updates to the secondary metabolite genome mining pipeline. Nucleic Acids Res. 47, W81–W87 (2019).
- 31. Beck B. J., Aldrich C. C., Fecik R. A., Reynolds K. A., Sherman D. H., Iterative chain elongation by a pikromycin monomodular polyketide synthase. J. Am. Chem. Soc. 125, 4682–4683 (2003).
- 32. Labes G., Bibb M., Wohlleben W., Isolation and characterization of a strong promoter element from the Streptomyces ghanaensis phage I19 using the gentamicin resistance gene (aacC1) of Tn1696 as reporter. Microbiology 143, 1503–1512 (1997).
- 33. Cruz-Morales P., et al., Phylogenomic analysis of natural products biosynthetic gene clusters allows discovery of arseno-organic metabolites in model Streptomycetes. Genome Biol. Evol. 8, 1906–1916 (2016).
- 34. Schöner T. A., et al., Aryl polyenes, a highly abundant class of bacterial natural products, are functionally related to antioxidative carotenoids. ChemBioChem 17, 247–253 (2016).
- 35. Cimermancic P., et al., Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158, 412–421 (2014).
- 36. Jenke-Kodama H., Dittmann E., Evolution of metabolic diversity: Insights from microbial polyketide synthases. Phytochemistry 70, 1858–1866 (2009).
- 37. Shiina T., Tanaka K., Takahashi H., Sequence of hrdB, an essential gene encoding sigma-like transcription factor of Streptomyces coelicolor A3(2): Homology to principal sigma factors. Gene 107, 145–148 (1991).
- 38. Wang B., Guo F., Dong S.-H., Zhao H., Activation of silent biosynthetic gene clusters using transcription factor decoys. Nat. Chem. Biol. 15, 111–114 (2019).
- 39. Kenar J. A., Moser B. R., List G. R., “Naturally occurring fatty acids: Source, chemistry, and uses” in Fatty Acids Chemistry, Synthesis, and Applications, Ahmad M. U., Ed. (AOCS Press, 2017), pp. 23–82.
- 40. Ahmad M. U., Ed., Fatty Acids Chemistry, Synthesis, and Applications (AOCS Press, 2017), pp. xxi–xxii.
- 41. Dembitsky V. M., Maoka T., Allenic and cumulenic lipids. Prog. Lipid Res. 46, 328–375 (2007).
- 42. Hoffmann-Röder A., Krause N., Synthesis and properties of allenic natural products and pharmaceuticals. Angew. Chem. Int. Ed. Engl. 43, 1196–1216 (2004).
- 43. Horler D. F., (-) Methyl n-tetradeca-trans-2,4,5-trienoate, an allenic ester produced by the male dried bean beetle, Acanthoscelides obtectus (Say). J. Chem. Soc. Perkin 1 6, 859–862 (1970).
- 44. Horinouchi S., Beppu T., A-factor as a microbial hormone that controls cellular differentiation and secondary metabolism in Streptomyces griseus. Mol. Microbiol. 12, 859–864 (1994).
- 45. Cao Z., Yoshida R., Kinashi H., Arakawa K., Blockage of the early step of lankacidin biosynthesis caused a large production of pentamycin, citreodiol and epi-citreodiol in Streptomyces rochei. J. Antibiot. (Tokyo) 68, 328–333 (2015).
- 46. Shizuri Y., et al., Isolation and stereostructures of citreoviral, citreodiol, and epicitreodiol. Tetrahedron Lett. 25, 4771–4774 (1984).
- 47. Pfeifer B. A., Admiraal S. J., Gramajo H., Cane D. E., Khosla C., Biosynthesis of complex polyketides in a metabolically engineered strain of E. coli. Science 291, 1790–1792 (2001).
- 48. Kieser T., Bibb M. J., Buttner M. J., Chater K. F., Hopwood D. A., Practical Streptomyces Genetics (John Innes Foundation, Norwich, 2000).
- 49. Gibson D. G., et al., Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
- 50. Wang B., et al., Kinamycin biosynthesis employs a conserved pair of oxidases for B-ring contraction. Chem. Commun. (Camb.) 51, 8845–8848 (2015).
- 51. Yang K., Han L., He J., Wang L., Vining L. C., A repressor-response regulator gene pair controlling jadomycin B production in Streptomyces venezuelae ISP5230. Gene 279, 165–173 (2001).
- 52. Madeira F., et al., The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 47, W636–W641 (2019).