Abstract
Borosins are ribosomally synthesized and post-translationally modified peptides (RiPPs) with α-N-methylations installed on the peptide backbone that impart unique properties like proteolytic stability to these natural products. The borosin RiPP family was initially reported only in fungi until our recent discovery and characterization of a Type IV split borosin system in the metal-respiring bacterium Shewanella oneidensis. Here, we used hidden Markov models and sequence similarity networks to identify over 1600 putative pathways that show split borosin biosynthetic gene clusters are widespread in bacteria. Noteworthy differences in precursor and α-N-methyltransferase open reading frame sizes, architectures, and core peptide properties allow further subdivision of the borosin family into six additional discrete structural types, of which five have been validated in this study.
INTRODUCTION
Early top-down approaches to natural product discovery relied heavily on culturing microbes and performing bioactivity-guided screens. Scores of important and potent antibiotics, anticancer agents, and immunosuppressants were discovered using this approach. Over time, as much of the “low-hanging fruit” was picked, rediscovery rates increased, and the identification of new bioactive scaffolds plummeted.1 However, rapid advances in DNA sequencing technologies and bioinformatics tools in the 21st century have reinvigorated the field of natural products. The plethora of accessible (meta)genomic data has enabled a genomics-driven approach to natural product discovery that offers insight into the true biosynthetic potential of microbes through culture-independent methods.2
One class of natural products that has benefited from in silico approaches is the ribosomally synthesized and post-translationally modified peptides (RiPPs).3,4 RiPPs are produced as genomically encoded precursor peptides composed of a leader and core peptide (Figure 1). Tailoring enzymes, typically encoded alongside the precursor gene in a biosynthetic gene cluster (BGC), recognize the leader peptide and modify the core peptide. The final steps in RiPP biosynthesis usually involve proteolytic cleavage of the core peptide and export of the mature natural product. While RiPPs are limited to the 20 canonical amino acids as building blocks, exquisite chemical diversity is generated by hypervariable core sequences and extensive post-translational modifications (PTMs) that enhance their stability and activity.5
Figure 1.

RiPP precursor peptide maturation. Borosin precursor peptides are α-N-methylated on the peptide backbone through an S-adenosylmethionine (SAM)-dependent process. SAH = S-adenosylhomocysteine.
Along with our collaborators, we and others identified the fungal nematicide omphalotin A as the first RiPP natural product with amide backbone α-N-methylations.6,7 Once thought to be an exclusive feature of nonribosomal peptides, amide backbone α-N-methylation imparts peptides with properties that include enhanced backbone rigidity, membrane permeability, and resistance to proteases, making these modifications attractive for pharmacological development. In omphalotin biosynthesis, an α-N-methyltransferase domain (NMT) in the RiPP precursor OphMA is fused to the core peptide substrate and iteratively α-N-methylates the omphalotin A sequence in an N- to C-terminal fashion.6 We named this family of α-N-methylated RiPPs the borosins, with the omphalotins as their founding members. Soon after this discovery, the cytotoxic gymnopeptides8 and the omphalotin-related lentinulin A and dendrothelin A9 were also found to be from the borosin RiPP family. A more thorough analysis of the fungal borosin pathways revealed different protein architectures of the fused methyltransferase-encoding precursors. Based on differences found in the leader and core peptides, the precursors in the putative borosin pathways were categorized into three structural types (Figure 2a).8
Figure 2.

Borosin α-N-methyltransferases and precursor peptides are subdivided into Types I–X based on overall protein architecture and core peptide composition. Cartoon representations are displayed for borosin NMTs and/or precursor peptide structural types. (a) Previously characterized Type I–III fused borosin systems from fungi listed alongside verified examples. (b) “Split” borosins Types IV–VIII identified in bacteria. Types V–VIII are newly described in this study: RceA/M, Rhodospirillum centenum SW; SliA/M, Spirosoma linguale DSM 74; AinA/M, Achromobacter insuavis AXX-A; and PmoA/M, Pseudomonas mosselii CIP 105259. Sequences below protein architectures shown in black text were identified by LC–MS/MS; sequences in bracketed gray italics were not identified by LC–MS/MS. (c) Multi-domain-encoding bacterial “split” borosins Types IX and X (new to this study). Diffuse methylation patterns of Asp and Glu residues are observed with the Type IX borosin system SurA/M1 from Streptomyces ureilyticus. As the core peptide is not well defined, the entire precursor is depicted in pale yellow.
Crystallographic interrogation of the Type I omphalotin-encoding precursor OphMA spurred a proposal for the molecular mechanism of catalysis.10 Briefly, water-mediated proton abstraction from the target amide forms an imidate stabilized by an oxyanion hole. Subsequent nucleophilic attack on the methyl-donating cofactor S-adenosylmethionine (SAM) yields an α-N-methylated amide, with S-adenosylhomocysteine (SAH) as the byproduct. This process occurs iteratively in an N-to-C direction on the core peptide to create the omphalotin A α-N-methylated backbone.
The diversity of substrate-fused NMT architectures across fungi raised the intriguing question of whether bacteria also harbor borosin BGCs. Iterative PSI-BLAST searches of the OphMA NMT domain yielded putative hits that included one from the metal-respiring bacterium Shewanella oneidensis MR-1.11 Unlike the canonical substrate-fused NMT, this putative borosin BGC followed traditional RiPP biosynthetic logic with discrete NMT and precursor peptide open reading frames. Mass spectrometric and kinetics analyses revealed α-N-methylation of two residues in the core peptide, indicating methylation occurred truly in trans, unlike previously identified substrate-fused Type I–III borosin systems. Crystal structures of fully resolved NMT-precursor complexes illustrate a five-helix bundle termed the borosin binding domain (BBD) as the dominant interaction domain of the precursor peptide with the NMT. In addition, stark conformational changes observed among different NMT-precursor mutants sparked a hypothesis that core peptide secondary structure changes direct iterative N-to-C α-N-methylation. Kinetic analysis revealed multiple substrate turnover and an efficient system with an apparent kcat of 0.52 min−1 compared to 0.0053 min−1 for single peptide turnover observed with OphMA. Divergence of the S. oneidensis-type borosin protein architectures from the canonical fused Type I–III fungal systems warranted the designation of Type IV borosins informally referred to as split borosins.11
In this work, we perform extensive analyses and verification of split borosin pathways to reveal a diverse family of RiPPs widespread across bacteria. We validate five putative bacterial split borosin NMT and precursor pairs with distinct methylation patterns among six additional borosin structural types (Types V–X). Bioinformatic analysis of split borosins from major bacterial groups detected in BLASTP searches uncovered >1600 putative split borosin BGCs among many of the architectural types. The combination of in vivo and in silico data analysis allowed us to develop a set of rules using hidden Markov models (HMMs) from conserved borosin NMT regions to allow researchers to detect and mine putative borosin gene clusters in programs such as antiSMASH.12 The discovery and detailed analyses of borosin BGCs open the door to many putative bioactive metabolites and biotechnology applications for amide backbone α-N-methylated peptides.
RESULTS AND DISCUSSION
Split Borosin Architecture Variability.
Due to the variable architectures of fused RiPP precursors observed in fungal borosin structural Types I–III, we sought to determine whether the newly identified bacterial split borosin NMTs and precursors were similarly diverse in size and sequence. Manual inspection was initially performed on bacterial genetic loci surrounding PSI-BLAST hits in bacterial genomes using the omphalotin methyltransferase domain in OphMA as a query. Interestingly, putative NMTs ranged in length from ~250 amino acids (AAs) to over 1000 AAs (Figure 2). Heterologous expression of select bacterial NMTs alone followed by high-resolution, high-pressure liquid chromatography–mass spectrometric (LC–MS/MS) analysis did not yield observable autocatalytic self-methylation as in the fungal borosin systems.
As with many RiPP classes, identification of split borosin precursors was not straightforward, albeit for different reasons than the typically hard-to-identify, short hypervariable precursor peptides.4 While some putative borosin pathways encoded relatively short peptides (50–100 AAs) that were homologous to SonA and easily pinpointed, many BGCs did not. Careful analysis of the surrounding genes revealed several large proteins of unknown function, some in excess of 600 AAs. Fortunately, a subset of these putative proteins contained sequence repeats reminiscent of the cores in fungal borosin Type II and Type III pathways8 and were subsequently flagged as possible precursors.
To validate putative split borosin BGCs, NMTs and putative precursors were categorized into additional structural types based on their overall protein sizes and compositions (Figure 2). Precursor–NMT pairs from each new structural type were then heterologously expressed in Escherichia coli and purified via nickel-chelate affinity chromatography. Precursors from in vivo coexpressions with their cognate NMTs were subsequently digested and analyzed by LC–MS/MS. Through extensive manual analysis and use of the proteomics software MaxQuant,13 a wide variety of split borosin core peptides in structural Types V–IX were found to be extensively α-N-methylated (Figures 2 and S1).
Borosin Types V–IX Methylation Patterns.
Ordering of the split borosin architectural types was based on the collective length of NMT and precursor, similarly to what was reported for the fused borosin fungal systems Types I–III, and continuing from the first Type IV split borosins identified in bacteria.11 Type V split borosins were verified through expression of RceM and RceA from R. centenum SW. Similarly to the fused fungal Type II borosins, Type V split borosins are signified by a multicore precursor peptide, where a single methylation is incorporated in each near-identical core repeat. Ten near-identical core peptide sequences of “DVIELSSGGEL” are found in the precursor RceA. Due to proteolytic limitations for MS/MS analysis, the mutant RceA S78F was also created and analyzed by LC–MS/MS to provide evidence for methylation in all 10 copies of the core sequence repeat (Figures S1 and S2). Curiously, the borosin binding domain (BBD), shown crucial for precursor peptide docking with the NMT of Type IV borosins, was predicted by HHpred14 to be encoded in the RceM NMT domain and not within the short RceA leader peptide (Figure 2). Consequently, the structural determinants required for precursor–NMT interactions in Type V split borosins may differ from those observed in Type IV borosins.
For split borosin Types VI–VIII, we identified putative precursor genes adjacent to the NMTs encoding repetitive sequences reminiscent of the methylated “DVDVT” repeats found in the fungal Type III borosin AboMA.8 Each of Types VI–VIII has distinct architectures and methylation patterns. For example, SliA (Type VI), a putative precursor protein encoded in a BGC from S. linguale DSM 74, is a 300-AA protein containing several TEVX repeats (X = Thr, Val, Ala). A defining feature of Type VI split borosins is the successive methylation pattern observed within its core peptide. Combining the results of several LC–MS/MS-verified peptide fragments, we observed 32 consecutive α-N-methylated amino acids (Figures 3 and S1). Due to the difficulties in overexpression, purification, and a high incidence of soluble aggregate formation with many of these new split borosin NMT-precursor pairs (Figure S3), in vitro characterization has proven cumbersome, and complete methylation of the core peptide has not been achieved for Types VI and VIII. For instance, we hypothesize that up to 48 successive methylations are incorporated into SliA by SliM.
Figure 3.

Type VI split borosin BGC verification and workflow. The putative split borosin BGC encoded in S. linguale DSM 74 was selected for verification. The NMT and precursor were coexpressed in E. coli, and the precursor was purified and proteolytically digested prior to HPLC-MS/MS analysis. A series of five partially α-N-methylated fragments were detected and summed to report the maximal number of N-to-C–incorporated methylations observed. The 32 consecutive α-N-methylations observed in this system are in contrast to reports that bacterial borosins harbor fewer modifications than their fungal counterparts.15
A relatively short precursor protein (~110 AA) encoding two small sets of “DVDV” repeats and an unusually large NMT in excess of 1000 AA signify Type VII split borosin pairs. Type VII split borosins were verified through coexpression of AinA and AinM from A. insuavis AXX-A (formerly identified as Achromobacter xylosoxidans).16 Interestingly, methylation occurs on the acidic residues, the opposite of the pattern in the AboMA Type III fused borosin, where methylation occurs on Val and Thr residues (Figure S1). Despite substantial mutational analyses and core peptide replacements, the “rules” for α-N-methylation incorporation into borosins are still enigmatic.6,17 A combination of sequence and secondary structural features of the core peptide likely contributes to the fidelity of borosin methylation in each RiPP system.11
Similar to Type VII, the Type VIII split borosin systems also incorporate methylations of acidic residues in “D[V/I]D[V/I]” repeats. However, the substantial size of the precursor peptide (~600 AA) along with the location of the core peptide in the middle of the protein makes this RiPP precursor quite an unusual example and justifies its own division among the split borosin BGCs (Figure 2).
The affixed domains found in the split borosin precursors and larger α-N-methyltransferases garnered interest as to whether additional catalytic domains might be encoded in these genes. Beyond the autocatalytic fungal borosin methyltransferases, RiPP precursors have been recently found in plants to encode repetitive core sequences and a C-terminal BURP domain that functions as a macrocyclase.18 However, no additional catalytic domains were identified in NMTs or precursors in Types VI–VIII by programs such as HHpred.14 As an alternative method, we ran all of our newly identified NMT-precursor pairs for structural predictions through AlphaFold2.19 While AlphaFold2 predicted folds for several of the additional domains (Figure S4), the predictions are based on very few homologous sequences, and the resulting low-confidence structures have yet to reveal additional catalytic functions. Despite our best efforts, we have not yet observed PTMs beyond α-N-methylation in Type VI–VIII split borosin NMT-precursor coexpressions.
In contrast, a subset of split borosin α-N-methyltransferases do appear to encode additional catalytic domains. Borosin NMTs can be found fused to the C-terminus of GGDEF-containing proteins, similar to what is found in the S. oneidensis-Type IV borosin BGC.11 GGDEF proteins produce cyclic di-GMP, a secondary messenger in bacteria that has been linked to the lifestyle decision between motility and biofilm formation.20 These proteins are predicted to span the inner membrane with a tetratricopeptide repeat (TPR) domain21 displayed in the periplasm. Another multidomain example (Type IX) found in S. ureilyticus appears to have a duplicated and fused second borosin methyltransferase domain (Figure 2). The two domains are highly similar, with 84.1% AA identity and 91.4% AA similarity and all of the active site residues conserved (Figure S5). As expected, heterologous expression and purification revealed that Type IX SurM1 from S. ureilyticus is monomeric in solution as observed by size exclusion chromatography, unlike all other borosin NMTs analyzed to date (Figure S3). Coexpression with its putative precursor protein, SurA, with or without an additional borosin NMT encoded in the BGC, SurM2 (a Type IV borosin NMT with only one catalytic domain), resulted in methylations distributed throughout SurA on Asp and Glu residues, even within the predicted BBD (Figure S1). Inactivation of either SurM1 methyltransferase domain through the mutations R69A or R363A led to overall reduced levels of methylation, indicating that both domains are active (Figure S1). However, repeated expressions of SurM1 and SurM2 alone or together with the precursor SurA resulted in a subset of different Asp and Glu methylated residues. As such, more information is needed to define the maturation of the precursor in this system.
Sequence Similarity Network (SSN) Analysis.
To determine the abundance of the newly identified borosin architectures (Types IV–X) distributed in nature, a sequence similarity network (SSN) was constructed of OphMA homologs. The sequences were obtained through a BLASTP search querying the NMT domain (residues 1–250) of OphMA against the nonredundant protein database (April 28, 2021). Because an overview of borosins found in fungal genomes has been previously explored,8 only nonfungal sequences were queried. The search returned 1804 sequences with E-values ranging from 4.00 × 10−94 to 0.037 and 21.4–58.2% identity, with the longest sequence comprising 1151 AA (Supplemental Dataset S1). The 1804 sequences are derived from sequenced genomes of 948 distinct organismal strains: 934 bacteria, 11 archaea, and 3 eukaryota. Within the bacterial representatives, the majority are classified as Gammaproteobacteria (66.1%), in particular the orders Alteromonadales (23.4% of all BLASTP hits) and Xanthomonadales (22.5% of all BLASTP hits). Comparatively, Gammaproteobacteria, Alteromonadales, and Xanthomonadales represent only 42.9, 0.5, and 1.3% of available nonfungal RefSeq genome assemblies, respectively (as of April 28, 2021). Of the archaea, eight of the 11 (72.7%) representatives are Halobacteria, though Halobacteria comprise only 41.5% of archaeal RefSeq assemblies. Interestingly, the eukaryotic sequences are from the sea anemones Actinia tenebrosa and Exaiptasia diaphana, and the mollusc Pecten maximus, and have E-values between 1.0 × 10−79 and 3.0 × 10−58. Inspection of the surrounding genes supports the notion that the protein sequences are encoded in nonfungal eukaryotes due to the presence of introns and the observation that the most closely related homologs of the surrounding genes belong to nonfungal eukaryotes.
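The over-representation described above can be expressed as a simple ratio between a taxon's share of BLASTP hits and its share of RefSeq assemblies. The following is a minimal sketch that uses only the percentages quoted in the text; the function name `fold_enrichment` is illustrative and is not part of the original analysis, and the ratios are rough because the quoted percentages do not all share the same denominator.

```python
# Percentages quoted in the text:
# (share of BLASTP hits, share of the corresponding RefSeq assemblies).
# Note: the denominators differ slightly between entries (bacterial
# representatives, all BLASTP hits, or archaeal representatives), so the
# ratios below are approximate indicators only.
SHARES = {
    "Gammaproteobacteria": (66.1, 42.9),
    "Alteromonadales": (23.4, 0.5),
    "Xanthomonadales": (22.5, 1.3),
    "Halobacteria": (72.7, 41.5),
}


def fold_enrichment(hit_pct: float, background_pct: float) -> float:
    """Ratio of a taxon's share among hits to its share in the background."""
    if background_pct <= 0:
        raise ValueError("background percentage must be positive")
    return hit_pct / background_pct


enrichment = {taxon: fold_enrichment(*pcts) for taxon, pcts in SHARES.items()}

for taxon, value in sorted(enrichment.items(), key=lambda kv: -kv[1]):
    print(f"{taxon}: {value:.1f}-fold")
```

By this measure, Alteromonadales and Xanthomonadales are far more enriched among the hits than Gammaproteobacteria as a whole.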
An SSN was constructed by running an all-vs-all analysis of the sequences using the Needleman–Wunsch global alignment scoring algorithm.22 Sequences included in the network are composed of the BLASTP sequences containing a complete methyltransferase domain (1704 sequences), previously identified borosin α-N-methyltransferases found in fungi (55 sequences),8 and representative sequences of the YabN_N_like domain subfamily (cd11723) of tetrapyrrole methylases (76 sequences), for a total of 1834 sequences, with one sequence appearing in the BLASTP hits as well as the YabN group (Supplemental Dataset S2). The YabN_N_like domains were used as an outgroup as it is a subfamily of the TP_methylase domain family (cd09815) that was identified to be closely related to the borosin methyltransferase domains (cd11724).
The functionally verified borosin architectural types are easily distinguishable from one another in the SSN at a cutoff value of 1.6 (Figure 4). To reduce redundancy, non-representative sequences were identified and removed from the network, resulting in 947 unique sequences. This removal was performed post-clustering to ensure that taxonomic diversity within each group was not obscured. In the network, nine of the BLASTP sequences clustered with the outgroup, and manual inspection confirmed these sequences to be members of the YabN_N_like outgroup.
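Conceptually, an SSN at a given cutoff is a graph in which two sequences are connected when their pairwise score meets the cutoff, and the clusters are the connected components of that graph. The sketch below illustrates this with invented sequence names and scores; only the cutoff of 1.6 comes from the text, and the assumption that a higher normalized score means greater similarity is ours.

```python
from collections import defaultdict


def ssn_clusters(nodes, pair_scores, cutoff):
    """Connected components of a network that keeps edges with score >= cutoff."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= cutoff:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical toy data (not taken from the study).
nodes = ["nmt_A", "nmt_B", "nmt_C", "nmt_D", "outgroup_1"]
pair_scores = {
    ("nmt_A", "nmt_B"): 2.4,
    ("nmt_B", "nmt_C"): 1.7,
    ("nmt_A", "nmt_C"): 1.2,
    ("nmt_C", "nmt_D"): 1.5,
    ("nmt_D", "outgroup_1"): 0.4,
}

strict = ssn_clusters(nodes, pair_scores, cutoff=1.6)
relaxed = ssn_clusters(nodes, pair_scores, cutoff=1.0)
print(strict)
print(relaxed)
```

Raising the cutoff removes weaker edges, so groups that are merged at a relaxed cutoff separate into distinct clusters at a stricter one.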
Figure 4.

Sequence similarity network of the global architecture of putative borosin NMTs. The SSN consists of sequences from a BLASTP search of the α-N-methyltransferase domain of OphMA, sequences of Type I–III methyltransferases involved in fungal borosin biosynthesis (orange), and sequences of YabN_N_like methyltransferases (maroon). Select sets of nodes are colored by the taxonomic groups to which their host organisms are assigned. Diamonds indicate proteins characterized in this manuscript.
Type IV borosins, which include the well-characterized SonM–SonA split borosin pathway pair, are by far the most commonly found borosins (746 members, 78.8%), with most sequences from this group coming from the orders Alteromonadales (167 sequences, 22.4%) and Legionellales (81 sequences, 10.9%). Type V borosins, represented by the multicore RceM–RceA split borosin pair, contain only nine members, all from the order Rhodospirillales. Type VI borosin BGCs contain 19 members, including the sequentially methylating NMT SliM, and are most frequently found in Burkholderiales and Streptomycetales with four members each. Type VII borosins contain only two unique members, one from A. insuavis (this study) and the other from an uncharacterized Magnetococcales bacterium. Type VIII borosins contain 41 representative members, again with the highest represented groups being Burkholderiales and Streptomycetales (seven and five members, respectively). There were only two identified sequences of the duplicate NMT domain-containing Type IX borosins, one each from Streptomycetales and Xanthomonadales. Finally, the transmembrane-spanning, multidomain Type X borosins contain 35 representative members, with the majority (24 sequences) coming from Xanthomonadales. Curiously, some bacterial sequences remained clustered with the fungal sequences, mainly sequences from Enterobacterales.
A number of additional clusters also appeared in the network. One prominent cluster of note is a 20-member group containing both bacterial and archaeal sequences. Within this cluster, the most highly represented taxonomic groups are Streptomycetales and the archaeal Halobacteriales, each with four members. The overall domain architecture of this group does not appear to differ from that of Type IV borosins; the sequences fall between 210 and 309 residues in length, and no other conserved domains were identified. This suggests strong divergence within the sequences of the methyltransferase domains between this group of putative borosins and the general Type IV borosins. Indeed, percent identity to OphMA in the original BLASTP search for this group is an average of 28.5% compared to 38.7% for Type IV borosins. Additionally, an SSN constructed from an all-vs-all BLAST analysis using only the methyltransferase domains from each sequence shows these sequences breaking off from the main group at a relatively less stringent cutoff value of 1 × 10−60 (Figure S6).
This domain-specific SSN also demonstrates that sequence divergence within the NMT domains is not the predominating factor in the overall separation seen among borosin architectural types in the network built from the global alignment algorithm. Under a relatively stringent cutoff value of 1 × 10−90, the Type VI, Type VIII, and fungal (Types I–III) borosin α-N-methyltransferase domains remain clustered together (Figure S7). In addition, Type X borosins cluster closely with a small set of Type IV borosins consisting predominantly of sequences from Gammaproteobacteria and Chromatiales. This indicates that the NMT domains are highly similar among these groups, though they differ in the overall architecture. Interestingly, sequences from Legionellales dissociate from the main group at relatively low-stringency cutoff values (Figure S8). This is a striking contrast to the global analysis, where they remained clustered with other Type IV borosin NMTs, even at high-stringency cutoffs. These observations indicate that, while the overall architecture is consistent with Type IV borosins, the NMT domains in Legionellales diverge from other borosin NMT domains, Type IV or otherwise.
BGC Analysis Through BiG-SCAPE.
We next utilized the antiSMASH–BiG-SCAPE pipeline to identify genes with high incidence within borosin BGCs. Genome assemblies available in the RefSeq database corresponding to the protein accession IDs for putative borosin NMTs identified in the BLASTP search were downloaded from NCBI (1902 assemblies). Borosin BGCs (2789 total) were identified by running downloaded assemblies through a locally installed copy of antiSMASH12 containing an added borosin detection rule, BorosinMT, based on highly conserved regions within the NMT domains (Figure 5). Predicted borosin BGC regions were dereplicated using a 90% cutoff value in MMseqs2 (ref 23; 1055 resulting regions) before the construction of a BGC similarity network in BiG-SCAPE (Figures S9 and S10).24 Unsurprisingly, BGCs from closely related organisms showed the highest degree of similarity.
Figure 5.

Profile hidden Markov models (HMMs) used to identify borosin BGCs. Sequence logos for the profile HMMs of conserved motifs within the NMT domain, YGHP_v3 (left) and DCLFAD_v3 (right). The profile HMMs were built from an alignment of unique borosin α-N-methyltransferase domains. Logos were generated using Skylign.25
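As an illustration of how two motif profiles can be combined into a single detection criterion, the sketch below flags proteins that have hits to both YGHP_v3 and DCLFAD_v3. This is not the antiSMASH rule itself: the text does not state the exact logic of BorosinMT, so requiring both motifs is an assumption, and the input tuples, protein identifiers, and the `min_score` threshold are hypothetical.

```python
# Profile names are taken from Figure 5; everything else here is illustrative.
REQUIRED_PROFILES = frozenset({"YGHP_v3", "DCLFAD_v3"})


def candidate_nmts(hits, min_score=0.0):
    """Return protein IDs with hits to every required profile.

    hits: iterable of (protein_id, profile_name, bit_score) tuples, for
    example as parsed from a profile-HMM search table (parsing not shown).
    """
    found = {}
    for protein_id, profile, score in hits:
        if profile in REQUIRED_PROFILES and score >= min_score:
            found.setdefault(protein_id, set()).add(profile)
    return sorted(p for p, profiles in found.items() if profiles == REQUIRED_PROFILES)


# Hypothetical hits (scores invented for the example).
hits = [
    ("prot_1", "YGHP_v3", 35.2),
    ("prot_1", "DCLFAD_v3", 28.9),
    ("prot_2", "YGHP_v3", 31.0),
    ("prot_3", "DCLFAD_v3", 12.0),
    ("prot_3", "YGHP_v3", 40.1),
    ("prot_4", "Other_profile", 50.0),
]

all_candidates = candidate_nmts(hits)
strict_candidates = candidate_nmts(hits, min_score=20.0)
print(all_candidates, strict_candidates)
```

Requiring both motifs trades sensitivity for specificity; a rule that accepts either motif would recover more divergent sequences at the cost of more false positives.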
Looking globally at Pfam domains within the split borosin BGCs, the most common, aside from the NMT-containing TP_methylase domain, is the GGDEF domain at a frequency of 56% (Table 1). GGDEF or GGEEF domain-containing proteins synthesize the bacterial secondary messenger cyclic di-GMP.20 Cyclic di-GMP is involved in a variety of cellular processes, including metabolic decisions between motility and biofilm formation, exopolysaccharide production, and virulence, among others.26 In contrast to the Type X borosins, where the GGDEF domain is fused to the borosin NMT, the vast majority are encoded as separate genes, similar to what is seen in the S. oneidensis BGC. Furthermore, these domains appear in a diverse set of BGCs, including those from Types IV, V, VI, VIII, and X borosins in multiple taxonomic groups, including Proteobacteria, Actinobacteria, Firmicutes, and Bacteroidetes (Supplemental Dataset S3).
Table 1.
25 Most Common Pfam Domains Encoded in Borosin BGCs, Listed in Descending Order
| Pfam_ID | Pfam name | count | incidence (%) |
|---|---|---|---|
| PF00590.23 | TP_methylase | 1037 | 100.0 |
| PF00990.24 | GGDEF | 580 | 55.9 |
| PF13302.10 | Acetyltransf_3 | 271 | 26.1 |
| PF00072.27 | Response_reg | 184 | 17.7 |
| PF02518.29 | HATPase_c | 181 | 17.5 |
| PF07690.19 | MFS_1 | 169 | 16.3 |
| PF00589.25 | Phage_integrase | 151 | 14.6 |
| PF00512.28 | HisKA | 147 | 14.2 |
| PF04355.16 | SmpA_OmlA | 133 | 12.8 |
| PF03364.23 | Polyketide_cyc | 132 | 12.7 |
| PF01668.21 | SmpB | 132 | 12.7 |
| PF00126.30 | HTH_1 | 131 | 12.6 |
| PF03466.23 | LysR_substrate | 131 | 12.6 |
| PF03658.17 | Ub-RnfH | 131 | 12.6 |
| PF00005.30 | ABC_tran | 124 | 12.0 |
| PF01274.25 | Malate_synthase | 114 | 11.0 |
| PF00593.27 | TonB_dep_Rec | 109 | 10.5 |
| PF07715.18 | Plug | 99 | 9.5 |
| PF01381.25 | HTH_3 | 98 | 9.5 |
| PF00583.28 | Acetyltransf_1 | 93 | 9.0 |
| PF08668.15 | HDOD | 91 | 8.8 |
| PF00854.24 | PTR2 | 91 | 8.8 |
| PF04055.24 | Radical_SAM | 83 | 8.0 |
| PF00271.34 | Helicase_C | 80 | 7.7 |
| PF00440.26 | TetR_N | 76 | 7.3 |
The complete list of Pfams is available (Supplemental Dataset S3).
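The incidence column in Table 1 is consistent with each count being divided by the number of BGCs that contain the TP_methylase domain. The sketch below reproduces a few rows under that assumption; the denominator of 1037 is inferred from the TP_methylase row being listed at 100% and is not stated explicitly in the text.

```python
# Counts copied from Table 1.
PFAM_COUNTS = {
    "TP_methylase": 1037,
    "GGDEF": 580,
    "Acetyltransf_3": 271,
    "Response_reg": 184,
    "Radical_SAM": 83,
}

# Assumption: every BGC in the analysis contains TP_methylase, so its count
# serves as the total number of BGCs.
TOTAL_BGCS = PFAM_COUNTS["TP_methylase"]


def incidence_pct(count: int, total: int = TOTAL_BGCS) -> float:
    """Percentage of BGCs that encode a given Pfam domain, to one decimal."""
    return round(100.0 * count / total, 1)


incidence = {name: incidence_pct(n) for name, n in PFAM_COUNTS.items()}

for name, pct in incidence.items():
    print(f"{name}: {pct}%")
```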
Several frequently occurring Pfam domains distributed across split borosin BGCs are known to be involved in protein maintenance and degradation. PF04355 domains (SmpA_OmlA) include BamE homologs (SmpA, OmlA) that are a part of the BAM complex involved in outer membrane β-barrel protein assembly.27 Defects in BamE have been associated with minor defects in outer membrane protein assembly, activation of the σE-mediated envelope stress response, and increased susceptibility to various compounds, including detergents and antibiotics.28,29 Pfam domains PF01668 (SmpB) and PF03658 (Ub-RnfH) are associated with tmRNA-dependent trans-translation protein degradation systems.30 Interestingly, the tmRNA gene ssrA is directly adjacent to the Type IV split borosin BGC recently identified in S. oneidensis.11 Many other prevalent Pfam domains found in split borosin BGCs are related to transport, signal transduction, and regulation (Table 1).
Relatively few of the top-25 identified protein domains are associated with biosynthetic enzymes. Two frequently observed Pfams in split borosin BGCs (PF13302, ~26% incidence; PF00583, ~9% incidence) are acetyltransferases known to acetylate a wide variety of small molecule and protein substrates.31 Several RiPPs, including LAPs and microviridins, are α-N-acetylated at their N-termini.5 Approximately 13% of split borosin BGCs encode PF03364-containing proteins that have homology to a variety of polyketide cyclases and desaturases. Although rare, polyketide-RiPP hybrids do exist, as first confirmed with the discovery of microvionin.32 The last notable Pfam domain in Table 1 encoding likely biosynthetic enzymes is in the radical SAM superfamily with PF04055 domains identified in 8% of split borosin BGCs. Radical SAM enzymes are a particularly diverse set of enzymes catalyzing a wide array of chemically difficult biosynthetic transformations.33 Members of the radical SAM family PF04055 have been found in a wide variety of putative34 and known35 RiPP families that include the bottromycins,36 sactipeptides,37 ranthipeptides,38 streptide,39,40 and PQQ.41 Despite few highly represented biosynthetic enzyme families found across all split borosin BGCs, individual borosin pathways can encode a number of putative post-translationally modifying enzymes. Of the examples functionally validated in this study, the Type VII BGC from A. insuavis encodes several additional biosynthetic enzymes, including putative P450, glycosyltransferase, carbamoyltransferase, polyketide synthase, and hydroxylase enzymes (Figure S11). For a more detailed distribution of Pfam domains among closely related borosin BGCs identified through BiG-SCAPE, please refer to Supplemental Dataset S4.
CONCLUSIONS
A combination of advances in DNA sequencing, gene synthesis, and bioinformatics tools has shifted the natural product discovery pipeline in many ways to favor a genomics-driven approach. Through a combination of in silico analyses and in vivo heterologous expressions, we have expanded the borosin RiPP family to encompass an architecturally diverse set of α-N-methyltransferases and precursor pairs predominantly found in bacteria. We have developed a set of profile HMMs that, when used in a locally installed version of antiSMASH, successfully identify borosin α-N-methyltransferases and their surrounding BGCs. Unlike the fused NMT-precursor architectures observed in fungi, bacterially derived borosin BGCs encode discrete precursors and modifying enzymes, as are more commonly seen with RiPPs. However, substantially different protein architectures are found in both NMTs and precursors in these split borosin pathways. NMTs greater than 1000 AAs and precursors in excess of 600 AAs with cores embedded in the middle of the protein are a subset of distinct structural features that have contributed to the diversification of split borosins into Types IV–X. This diversity in the overall architecture as well as in core peptide sequence and methylation pattern, with examples of precursors methylated successively for >30 residues, raises several questions: what are the final metabolites, what are their biological functions, and why have they remained elusive despite their widespread distribution? We are passionately pursuing these questions and many others surrounding these disparate α-N-methylating borosin RiPP pathways. This work continues to highlight the advantages of genomics-driven approaches to natural product BGC discovery.
METHODS
Protein Expression and Purification.
Protein expression and purification were performed as described previously.8 Briefly, genes were expressed in BL21(DE3) or LOBSTR (Kerafast) E. coli at 16 °C for 24, 48, or 72 h. Cells were harvested and lysed using sonication. Recombinant proteins were purified via nickel-affinity chromatography based on the manufacturer’s recommendations (Ni-NTA resin, Gold Biotechnology). An SDS-PAGE gel of all proteins characterized in this work is included (Figure S12).
Peptide Mass Spectrometric Analysis.
LC–MS/MS analysis of digested peptides was performed as described previously.8 Briefly, data were recorded on a Thermo Scientific Fusion or Fusion Lumos mass spectrometer equipped with a Dionex Ultimate 3000 UHPLC system using an nLC column (200 mm × 75 μm) packed using Vydac 5-μm particles with a 300 Å pore size (Hichrom Limited). Elution was performed with a linear gradient using water with 0.1% FA (solvent A) and ACN with 0.1% FA (solvent B) at a flow rate of 0.3 μL min⁻¹. The column was equilibrated with 20% solvent B for 5 min, followed by a linear increase of solvent B to 85% over 32 min and a final elution step with 85% solvent B for 2 min. Mass spectra were acquired in positive ion mode. Full MS was done at a resolution of 60 000 [automatic gain control (AGC) target, 4 × 10⁵; maximum injection time (IT), 50–100 ms; range, 300–1800 m/z], and data-dependent as well as targeted MS/MS was performed at a resolution of 15 000 (AGC target, 5 × 10⁵; maximum IT, 100–500 ms; isolation window, 2.2 m/z) using higher-energy collisional dissociation (HCD) energies from 14 to 25%, with steps of ±4%. Data were processed using Thermo Fisher Xcalibur software and MaxQuant v1.6.10.
Sequence Similarity Networks.
A BLASTP search of the nonredundant protein database using residues 1–250 of the OphMA sequence was conducted using default settings (April 2021). The search returned 1804 protein sequences. Sequences that did not contain a full methyltransferase region (corresponding to residues 1–250 of OphMA) were removed. An all-vs-all BLASTP analysis under default settings was performed on the remaining 1704 sequences in addition to the amino acid sequences of 55 fungal borosins and 76 representative sequences from the YabN_N_like protein superfamily (Supplemental Dataset S2). The all-vs-all similarity scores were visualized in Cytoscape v3.6.1 with various E-value cutoff values. For global alignment, all-vs-all analysis was performed using the needleall module in the EMBOSS suite with a gap penalty of −10, an extension penalty of −0.5, and an endgap penalty of −10. Scores were normalized by alignment length and then visualized in Cytoscape using various cutoff values. Representative sequences were identified using MMseqs2 v13.45111 (ref 23) with a minimum identity of 90% using cluster mode (0). Nonrepresentative sequences were then removed from the networks.
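As an illustration of the network-building step, the following is a minimal sketch of how all-vs-all BLASTP hits might be reduced to network edges at a chosen E-value cutoff, and how a global alignment score might be normalized by alignment length. It is not the script used in this work. It assumes the hits are available in the standard 12-column BLAST tabular format (`-outfmt 6`, with the E-value in column 11); the function names `ssn_edges` and `length_normalized_score`, the cutoff value, and the example rows are hypothetical.

```python
import csv
import io


def ssn_edges(blast_tab, evalue_cutoff):
    """Collapse tabular all-vs-all BLASTP hits into undirected network edges.

    Assumes 12-column BLAST tabular rows (query, subject, ..., evalue, bitscore).
    Self hits are dropped and the lowest E-value is kept per unordered pair.
    """
    edges = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue = float(row[10])
        if query == subject or evalue > evalue_cutoff:
            continue
        pair = tuple(sorted((query, subject)))
        if pair not in edges or evalue < edges[pair]:
            edges[pair] = evalue
    return edges


def length_normalized_score(raw_score, alignment_length):
    """Divide a global alignment score by the alignment length."""
    return raw_score / alignment_length


# Made-up example rows, for illustration only.
example = "\n".join([
    "A\tA\t100.0\t250\t0\t0\t1\t250\t1\t250\t0.0\t500",
    "A\tB\t60.0\t240\t90\t2\t1\t240\t5\t244\t1e-80\t300",
    "B\tA\t60.0\t240\t90\t2\t5\t244\t1\t240\t1e-82\t305",
    "A\tC\t25.0\t100\t70\t5\t1\t100\t1\t100\t1e-5\t40",
])
edges = ssn_edges(io.StringIO(example), evalue_cutoff=1e-40)
```

Rerunning `ssn_edges` with a series of cutoffs gives one edge list per threshold, which mirrors the idea of inspecting the network at various E-value cutoff values before importing it into a viewer such as Cytoscape.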
Domain Analysis of Putative Borosin BGCs.
Genome assemblies in the RefSeq database linked to the protein accession IDs of the borosin α-N-methyltransferases identified through BLASTP were downloaded (1902 assemblies) and analyzed through antiSMASH (ref 12) containing the previously described BorosinMT rule. Representative sequences were identified from the resulting 2789 clusters using the cluster function in MMseqs2 (ref 23) with a minimum identity of 90% with cluster mode (2) and coverage mode (1) (80% coverage). This resulted in 1055 BGC region sequences, which were then run through BiG-SCAPE v1.1.0 (ref 24) for cluster analysis. Clusters with fewer than five predicted Pfam domains or that did not contain a predicted TP_methylase Pfam domain were removed from the dataset, leaving 1030 clusters (Supplemental Dataset S3). Network information was then imported into and visualized using Cytoscape. BGCs that clustered together with a raw distance cutoff value of 0.5 were assigned to the same BGC family (Supplemental Dataset S4). Heat maps showing the most commonly occurring Pfam domains within cluster families were created using heatmaply in RStudio v1.2.5033.
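The cluster-filtering criteria and the Pfam tallies behind the heat maps can be sketched as follows. This is a minimal illustration rather than the analysis code used here: it assumes each BGC has already been reduced to a list of predicted Pfam domain labels, it counts every predicted domain hit (including repeats) toward the five-domain minimum, and the names `filter_clusters` and `pfam_incidence`, as well as the example clusters, are hypothetical.

```python
from collections import Counter

REQUIRED_PFAM = "TP_methylase"  # domain that every retained cluster must contain
MIN_DOMAINS = 5                 # clusters with fewer predicted domains are dropped


def filter_clusters(cluster_domains):
    """Keep clusters with at least MIN_DOMAINS domains that include REQUIRED_PFAM."""
    return {
        cluster_id: domains
        for cluster_id, domains in cluster_domains.items()
        if len(domains) >= MIN_DOMAINS and REQUIRED_PFAM in domains
    }


def pfam_incidence(cluster_domains):
    """Return the fraction of clusters in which each Pfam occurs at least once."""
    total = len(cluster_domains)
    if total == 0:
        return {}
    counts = Counter()
    for domains in cluster_domains.values():
        counts.update(set(domains))  # count each Pfam once per cluster
    return {pfam: n / total for pfam, n in counts.items()}


# Made-up clusters, for illustration only.
clusters = {
    "bgc1": ["TP_methylase", "PF13302", "PF04055", "PF00583", "PF03364"],
    "bgc2": ["TP_methylase", "PF13302"],                              # too few domains
    "bgc3": ["PF13302", "PF04055", "PF00583", "PF03364", "PF13302"],  # no TP_methylase
    "bgc4": ["TP_methylase", "TP_methylase", "PF13302", "PF00583", "PF03364", "PF13302"],
}
kept = filter_clusters(clusters)
incidence = pfam_incidence(kept)
```

Counting each Pfam once per cluster gives a per-BGC incidence, the kind of quantity reported as a percentage of split borosin BGCs; counting every hit instead would weight clusters with repeated domains more heavily.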
ACKNOWLEDGMENTS
The authors would like to thank L. Amoureux for strain A. insuavis AXX-A. The authors are grateful to M. Jensen and F. Miller for their initial work with RceA and RceM as well as M. Quijano with AinM. The authors thank M. Medema for helpful discussions concerning the work in this manuscript.
Funding
This work was supported by the National Institute of General Medical Sciences (T32 GM008347 for A.R.L. and R35 GM133475 for M.F.F.) along with the University of Minnesota and the BioTechnology Institute.
Footnotes
Complete contact information is available at: https://pubs.acs.org/10.1021/acschembio.1c01002
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acschembio.1c01002.
Additional experimental details, LC–MS/MS spectra, and additional clustering analyses (PDF); Supplemental Dataset S1: primers, genes, and protein sequences used in this study (XLSX); Supplemental Dataset S2: protein sequences used for the SSN in Figure 4 (XLSX); Supplemental Dataset S3: global BGC Pfam table with organism taxonomic data from BiG-SCAPE analysis (XLSX); and Supplemental Dataset S4: BGC family-specific Pfam tables from BiG-SCAPE analysis related to Figure S9 (XLSX)
The authors declare the following competing financial interest(s): M.F.F. is an inventor on patents US20190112583A1, WO2017EP58327, and on patent application Nos. PCT/US2021/019009 and 62/979,947.
Contributor Information
Aman S. Imani, Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Twin Cities, St. Paul, Minnesota 55108, United States.
Aileen R. Lee, Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Twin Cities, St. Paul, Minnesota 55108, United States.
Nisha Vishwanathan, BioTechnology Institute, University of Minnesota, Twin Cities, St. Paul, Minnesota 55108, United States.
Floris de Waal, Bioinformatics Group, Wageningen University, 6708 PB Wageningen, The Netherlands.
Michael F. Freeman, Department of Biochemistry, Molecular Biology, and Biophysics and BioTechnology Institute, University of Minnesota, Twin Cities, St. Paul, Minnesota 55108, United States.
REFERENCES
- (1) Lewis K. The Science of Antibiotic Discovery. Cell 2020, 181, 29–45.
- (2) Baltz RH. Natural Product Drug Discovery in the Genomic Era: Realities, Conjectures, Misconceptions, and Opportunities. J. Ind. Microbiol. Biotechnol. 2019, 46, 281–299.
- (3) Skinnider MA; Johnston CW; Edgar RE; Dejong CA; Merwin NJ; Rees PN; Magarvey NA. Genomic Charting of Ribosomally Synthesized Natural Product Chemical Space Facilitates Targeted Mining. Proc. Natl. Acad. Sci. U.S.A. 2016, 113, E6343–E6351.
- (4) Tietz JI; Schwalen CJ; Patel PS; Maxson T; Blair PM; Tai H-C; Zakai UI; Mitchell DA. A New Genome-Mining Tool Redefines the Lasso Peptide Biosynthetic Landscape. Nat. Chem. Biol. 2017, 13, 470–478.
- (5) Montalbán-López M; Scott TA; Ramesh S; Rahman IR; van Heel AJ; Viel JH; Bandarian V; Dittmann E; Genilloud O; Goto Y; et al. New Developments in RiPP Discovery, Enzymology and Engineering. Nat. Prod. Rep. 2021, 38, 130–239.
- (6) van der Velden NS; Kälin N; Helf MJ; Piel J; Freeman MF; Künzler M. Autocatalytic Backbone N-Methylation in a Family of Ribosomal Peptide Natural Products. Nat. Chem. Biol. 2017, 13, 833–835.
- (7) Ramm S; Krawczyk B; Mühlenweg A; Poch A; Mösker E; Süssmuth RD. A Self-Sacrificing N-Methyltransferase Is the Precursor of the Fungal Natural Product Omphalotin. Angew. Chem., Int. Ed. 2017, 56, 9994–9997.
- (8) Quijano MR; Zach C; Miller FS; Lee AR; Imani AS; Künzler M; Freeman MF. Distinct Autocatalytic α-N-Methylating Precursors Expand the Borosin RiPP Family of Peptide Natural Products. J. Am. Chem. Soc. 2019, 141, 9637–9644.
- (9) Matabaro E; Kaspar H; Dahlin P; Bader DLV; Murar CE; Staubli F; Field CM; Bode JW; Künzler M. Identification, Heterologous Production and Bioactivity of Lentinulin A and Dendrothelin A, Two Natural Variants of Backbone N-Methylated Peptide Macrocycle Omphalotin A. Sci. Rep. 2021, 11, No. 3541.
- (10) Song H; van der Velden NS; Shiran SL; Bleiziffer P; Zach C; Sieber R; Imani AS; Krausbeck F; Aebi M; Freeman MF; et al. A Molecular Mechanism for the Enzymatic Methylation of Nitrogen Atoms within Peptide Bonds. Sci. Adv. 2018, 4, No. eaat2720.
- (11) Miller FS; Crone KK; Jensen MR; Shaw S; Harcombe WR; Elias MH; Freeman MF. Conformational Rearrangements Enable Iterative Backbone N-Methylation in RiPP Biosynthesis. Nat. Commun. 2021, 12, No. 5355.
- (12) Blin K; Shaw S; Steinke K; Villebro R; Ziemert N; Lee SY; Medema MH; Weber T. AntiSMASH 5.0: Updates to the Secondary Metabolite Genome Mining Pipeline. Nucleic Acids Res. 2019, 47, W81–W87.
- (13) Cox J; Neuhauser N; Michalski A; Scheltema RA; Olsen JV; Mann M. Andromeda: A Peptide Search Engine Integrated into the MaxQuant Environment. J. Proteome Res. 2011, 10, 1794–1805.
- (14) Zimmermann L; Stephens A; Nam S-Z; Rau D; Kübler J; Lozajic M; Gabler F; Söding J; Lupas AN; Alva V. A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at Its Core. J. Mol. Biol. 2018, 430, 2237–2243.
- (15) Cho H; Lee H; Hong K; Chung H; Song I; Lee J; Kim S. Bioinformatic Expansion of Borosins Uncovers Trans-Acting Peptide Backbone N-Methyltransferases in Bacteria. Biochemistry 2022, 61, 183–194.
- (16) Amoureux L; Bador J; Fardeheb S; Mabille C; Couchot C; Massip C; Salignon AL; Berlie G; Varin V; Neuwirth C. Detection of Achromobacter xylosoxidans in Hospital, Domestic, and Outdoor Environmental Samples and Comparison with Human Clinical Isolates. Appl. Environ. Microbiol. 2013, 79, 7142–7149.
- (17) Song H; Fahrig-Kamarauskaitè J; Matabaro E; Kaspar H; Shirran SL; Zach C; Pace A; Stefanov B-A; Naismith JH; Künzler M. Substrate Plasticity of a Fungal Peptide α-N-Methyltransferase. ACS Chem. Biol. 2020, 15, 1901–1912.
- (18) Chigumba DN; Mydy LS; de Waal F; Li W; Shafiq K; Wotring JW; Mohamed OG; Mladenovic T; Tripathi A; Sexton JZ; et al. Discovery and Biosynthesis of Cyclic Plant Peptides via Autocatalytic Cyclases. Nat. Chem. Biol. 2022, 18, 18–28.
- (19) Jumper J; Evans R; Pritzel A; Green T; Figurnov M; Ronneberger O; Tunyasuvunakool K; Bates R; Žídek A; Potapenko A; et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596, 583–589.
- (20) Simm R; Morr M; Kader A; Nimtz M; Römling U. GGDEF and EAL Domains Inversely Regulate Cyclic Di-GMP Levels and Transition from Sessility to Motility. Mol. Microbiol. 2004, 53, 1123–1134.
- (21) Blatch GL; Lässle M. The Tetratricopeptide Repeat: A Structural Motif Mediating Protein-Protein Interactions. BioEssays 1999, 21, 932–939.
- (22) Needleman SB; Wunsch CD. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol. 1970, 48, 443–453.
- (23) Steinegger M; Söding J. MMseqs2 Enables Sensitive Protein Sequence Searching for the Analysis of Massive Data Sets. Nat. Biotechnol. 2017, 35, 1026–1028.
- (24) Navarro-Muñoz JC; Selem-Mojica N; Mullowney MW; Kautsar SA; Tryon JH; Parkinson EI; De Los Santos ELC; Yeong M; Cruz-Morales P; Abubucker S; et al. A Computational Framework to Explore Large-Scale Biosynthetic Diversity. Nat. Chem. Biol. 2020, 16, 60–68.
- (25) Wheeler TJ; Clements J; Finn RD. Skylign: A Tool for Creating Informative, Interactive Logos Representing Sequence Alignments and Profile Hidden Markov Models. BMC Bioinform. 2014, 15, No. 7.
- (26) Whiteley CG; Lee D-J. Bacterial Diguanylate Cyclases: Structure, Function and Mechanism in Exopolysaccharide Biofilm Development. Biotechnol. Adv. 2015, 33, 124–141.
- (27) Knowles TJ; Scott-Tucker A; Overduin M; Henderson IR. Membrane Protein Architects: The Role of the BAM Complex in Outer Membrane Protein Assembly. Nat. Rev. Microbiol. 2009, 7, 206–214.
- (28) Ruiz N; Wu T; Kahne D; Silhavy TJ. Probing the Barrier Function of the Outer Membrane with Chemical Conditionality. ACS Chem. Biol. 2006, 1, 385–395.
- (29) Sklar JG; Wu T; Gronenberg LS; Malinverni JC; Kahne D; Silhavy TJ. Lipoprotein SmpA Is a Component of the YaeT Complex That Assembles Outer Membrane Proteins in Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 6400–6405.
- (30) Keiler KC. Biology of Trans-Translation. Annu. Rev. Microbiol. 2008, 62, 133–151.
- (31) Favrot L; Blanchard JS; Vergnolle O. Bacterial GCN5-Related N-Acetyltransferases: From Resistance to Regulation. Biochemistry 2016, 55, 989–1002.
- (32) Wiebach V; Mainz A; Siegert M-AJ; Jungmann NA; Lesquame G; Tirat S; Dreux-Zigha A; Aszodi J; Le Beller D; Süssmuth RD. The Anti-Staphylococcal Lipolanthines Are Ribosomally Synthesized Lipopeptides. Nat. Chem. Biol. 2018, 14, 652–654.
- (33) Oberg N; Precord TW; Mitchell DA; Gerlt JA. RadicalSAM.Org: A Resource to Interpret Sequence-Function Space and Discover New Radical SAM Enzyme Chemistry. ACS Bio Med Chem Au 2021, 2, 22–35.
- (34) Haft DH; Basu MK. Biological Systems Discovery in Silico: Radical S-Adenosylmethionine Protein Families and Their Target Peptides for Posttranslational Modification. J. Bacteriol. 2011, 193, 2745–2755.
- (35) Mahanta N; Hudson GA; Mitchell DA. Radical S-Adenosylmethionine Enzymes Involved in RiPP Biosynthesis. Biochemistry 2017, 56, 5229–5244.
- (36) Franz L; Kazmaier U; Truman AW; Koehnke J. Bottromycins - Biosynthesis, Synthesis and Activity. Nat. Prod. Rep. 2021, 38, 1659–1683.
- (37) Flühe L; Knappe TA; Gattner MJ; Schäfer A; Burghaus O; Linne U. The Radical SAM Enzyme AlbA Catalyzes Thioether Bond Formation in Subtilosin A. Nat. Chem. Biol. 2012, 8, 350–357.
- (38) Precord TW; Mahanta N; Mitchell DA. Reconstitution and Substrate Specificity of the Thioether-Forming Radical S-Adenosylmethionine Enzyme in Freyrasin Biosynthesis. ACS Chem. Biol. 2019, 14, 1981–1989.
- (39) Schramma KR; Bushin LB; Seyedsayamdost MR. Structure and Biosynthesis of a Macrocyclic Peptide Containing an Unprecedented Lysine-to-Tryptophan Crosslink. Nat. Chem. 2015, 7, 431–437.
- (40) Bushin LB; Clark KA; Pelczer I; Seyedsayamdost MR. Charting an Unexplored Streptococcal Biosynthetic Landscape Reveals a Unique Peptide Cyclization Motif. J. Am. Chem. Soc. 2018, 140, 17674–17684.
- (41) Barr I; Latham JA; Iavarone AT; Chantarojsiri T; Hwang JD; Klinman JP. Demonstration That the Radical S-Adenosylmethionine (SAM) Enzyme PqqE Catalyzes de Novo Carbon-Carbon Cross-Linking within a Peptide Substrate PqqA in the Presence of the Peptide Chaperone PqqD. J. Biol. Chem. 2016, 291, 8877–8884.