Author manuscript; available in PMC: 2023 Jan 24.
Published in final edited form as: Methods Enzymol. 2022 Nov 24;679:191–233. doi: 10.1016/bs.mie.2022.08.050

Bioinformatic prediction and experimental validation of RiPP recognition elements

Kyle E Shelton a,b, Douglas A Mitchell a,b,c
PMCID: PMC9871372  NIHMSID: NIHMS1860968  PMID: 36682862

Abstract

Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a family of natural products for which discovery efforts have rapidly grown over the past decade. Of the more than 40 known RiPP classes, 37 are of prokaryotic origin. More than half of the prokaryotic RiPP classes include a protein domain called the RiPP Recognition Element (RRE) for successful installation of post-translational modifications on a RiPP precursor peptide. In most cases, the RRE domain binds to the N-terminal ‘leader’ region of the precursor peptide, facilitating enzymatic modification of the C-terminal ‘core’ region. The prevalence of the RRE domain renders it a theoretically useful bioinformatic handle for class-independent RiPP discovery; however, first-in-class RiPPs have yet to be isolated and experimentally characterized using an RRE-centric strategy. Moreover, because most known RRE domains engage their cognate precursor peptide(s) with high specificity and nanomolar affinity, evaluation of the residue-specific interactions that govern RRE:substrate complexation is a necessary first step to leveraging the RRE domain for various bioengineering applications. This chapter details protocols for developing custom bioinformatic models to predict and annotate RRE domains in a class-specific manner. Next, we outline methods for experimental validation of precursor peptide binding using fluorescence polarization binding assays and in vitro enzyme activity assays. We anticipate the methods herein will guide and enhance future critical analyses of the RRE domain, eventually enabling its use as a customizable tool for molecular biology.

1. Introduction

Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a rapidly growing family of natural products (Montalbán-López et al., 2021). RiPPs are categorized into molecular classes, which are usually defined by a characteristic post-translational modification (PTM) present in the mature product (Arnison et al., 2013). RiPPs exhibit a range of desirable biological functions, such as antibacterial, antiviral, and antifungal activities. Additional RiPPs have been shown to act as redox cofactors, exhibit anticancer activities, and display antinociceptive properties (e.g., the FDA-approved conotoxin ziconotide) (Guerrero-Garzón et al., 2020; Hegemann and Süssmuth, 2020; Tocchetti et al., 2021; Walker et al., 2022).

Accordingly, natural product researchers have a vested interest not only in discovering new RiPPs with clinically useful properties, but also in discovering RiPPs with unexploited biological targets and mechanisms of action. Historically, natural product discovery was approached primarily through phenotypic screening of organisms known to be prolific natural product producers, such as soil-dwelling actinobacteria (Dias et al., 2012; Katz and Baltz, 2016; Worthen, 2007). However, after many decades of intense screening campaigns by industrial and academic groups, the low-hanging fruit has been picked. Indeed, untargeted methods that rely on readily cultured bacteria now suffer from a high rediscovery rate; in other words, the likelihood of finding a known compound far outweighs the likelihood of discovering a new natural product (Kloosterman et al., 2021).

1.1. Challenges of using bioinformatic genome mining for RiPP discovery

Over the past decade, discovery efforts in the RiPP field have focused to a greater extent on bioinformatics methods, as well as on methods to elicit production from cryptic biosynthetic gene clusters (BGCs), i.e., those that are not expressed by the host under standard laboratory growth conditions. Newer tools (e.g., RiPPER and RiPPMiner) incorporate improved algorithms for high-confidence prediction and dereplication of RiPP BGCs, taking advantage of the wealth of publicly available genomic data (de los Santos, 2019; Santos-Aberturas et al., 2019). BGCs mined using these bioinformatic tools are often expressed in a heterologous host, obviating the need to obtain and culture the native host organisms, many of which are not readily cultivable or may only produce a target RiPP under specific growth conditions (Ahmed et al., 2020; Myronovskyi et al., 2018). Likewise, elicitor screening has been carried out to induce expression of cryptic BGCs in the native host, which may only produce detectable titers of product under specific culturing conditions, such as nutrient depletion or other forms of stress (Abdelmohsen et al., 2015; Moon et al., 2019; Pettit, 2011; Pimentel-Elardo et al., 2015). Heterologous expression methods are required for reconstitution of pathways mined from metagenomic data, where the precise host is not usually known. In all, these methods allow for interrogation of lesser-studied bacterial (and archaeal) genera for natural product production and obviate the need for a native host that can be cultured, thus expanding the breadth of RiPP biosynthetic space that can be accessed (Robinson et al., 2021).

There are several advantages to using bioinformatics-guided approaches to prioritize BGCs with a high probability of producing novel RiPP compounds. Most RiPP BGCs are highly compact and require <10 kb of genomic space. For example, most characterized ranthipeptide and streptide BGCs are roughly 3 kb in size, encoding a precursor peptide, one PTM enzyme, and a transporter (Precord et al., 2019; Schramma et al., 2015). However, BGCs of some RiPP classes with numerous canonical PTMs (e.g., thiopeptides) can range upwards of ~30 kb (Yu et al., 2009). In many RiPP BGCs, the requisite genes are encoded in a single operon. Thus, analysis of gene direction can sometimes assist in identification of BGC boundaries and of possible multi-enzyme complexes acting on a previously unrecognized precursor peptide.
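To make the gene-direction idea concrete, the sketch below groups neighboring genes that lie on the same strand and are separated by only a short intergenic gap into candidate transcriptional units. The `Gene` type, the coordinates, and the 300 bp gap threshold are hypothetical illustration values rather than parameters from this chapter, and real boundary calls also weigh gene annotations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # 1-based genomic start coordinate
    end: int     # 1-based genomic end coordinate (inclusive)
    strand: str  # "+" or "-"


def co_directional_runs(genes: List[Gene], max_gap: int = 300) -> List[List[str]]:
    """Group adjacent genes that share a strand and are at most max_gap bp apart.

    max_gap is a hypothetical threshold chosen for illustration only.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    runs: List[List[Gene]] = []
    for gene in ordered:
        if runs:
            prev = runs[-1][-1]
            gap = gene.start - prev.end - 1  # negative if the genes overlap
            if gene.strand == prev.strand and gap <= max_gap:
                runs[-1].append(gene)
                continue
        runs.append([gene])  # strand switch or large gap starts a new run
    return [[g.name for g in run] for run in runs]
```

For a hypothetical locus of a precursor, a radical SAM enzyme, and a transporter on the + strand followed by a regulator on the - strand, the first three genes form one run and the regulator forms its own.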

Given these parameters, homology-based searching for known RiPP PTM enzymes, paired with analysis of co-occurring protein domains, can be used to generate a list of BGCs homologous to known RiPP biosynthetic pathways in which the class-defining PTMs are characterized (Kloosterman et al., 2020; Walker et al., 2020). However, mining for RiPP BGCs by using class-defining enzymes as handles for homology-based searching has intrinsic limitations. Many PTM-installing enzymes are unique to specific RiPP classes, and thus are less useful for class-independent discovery efforts. Furthermore, many RiPP enzymes are homologous to other cellular machinery, which can lead to high false positive rates. For example, lasso peptide cyclases evolved from asparagine synthetase enzymes (DiCaprio et al., 2019; Tietz et al., 2017). Nevertheless, homology-based searching of known PTM enzymes has shown promise in discovering new hybrid RiPP classes that have co-opted canonical PTMs from other RiPP pathways, such as the recently discovered nocathioamides and streptamidines (Figure 1) (Russell et al., 2021; Saad et al., 2021).

Figure 1.


Function and genomic context of the RRE domain. (A) RiPP recognition elements (RREs) bind the leader peptide region, allowing a fused or complexed enzyme to act on the core peptide residues. (B) Representative examples of fused and discrete RRE domains in RiPP biosynthesis. As shown, RRE domains can exist as N-terminal, C-terminal, or internal domain fusions to other proteins. (C) Two recently discovered hybrid RiPP classes that are RRE-dependent. Nocathioamide BGCs contain an RRE required for thiazole installation, analogous to thiopeptide BGCs. Streptamidine BGCs contain an RRE required for azoline installation, analogous to the cyclodehydratase in linear azole-containing peptide (LAP) biosynthesis. Figure adapted from (Kloosterman et al., 2020).

RiPP discovery can also be viewed from the perspective of substrate precursor peptides. RiPP biosynthesis starts with a ribosomal precursor peptide, typically ~50 amino acids in length (Hudson and Mitchell, 2018). This feature of RiPP BGCs allows, in many cases, for high-confidence prediction of final product structures, based on the sequence of the precursor peptide combined with knowledge about the biosynthetic enzymes encoded nearby. Unfortunately, the small size of precursor peptides means they are largely missed by automated gene finders, which makes homology-based searching using precursor peptides infeasible. Our group previously addressed this challenge with the development of RODEO, a bioinformatic tool for prediction and annotation of genomic regions with numerous modules for class-specific precursor peptide prediction (Georgiou et al., 2020; Oberg et al., 2022; Ramesh et al., 2021; Tietz et al., 2017).
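Because a ~50-residue precursor corresponds to only ~150 nucleotides of coding sequence, a targeted small-ORF scan is one way to revisit regions that a gene finder skipped. The toy scan below reports ATG-initiated ORFs on the three forward reading frames within a length window. The 20 to 120 codon window is a hypothetical choice, the reverse strand and alternative start codons are ignored, and this is not how RODEO predicts precursor peptides.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def short_orfs(dna: str, min_codons: int = 20, max_codons: int = 120) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of forward-strand ATG...stop ORFs.

    Length is counted in codons from ATG up to, but not including, the stop codon.
    The codon window is illustrative only.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # keep the first ATG, giving the longest ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    hits.append((start, i + 3))  # end includes the stop codon
                start = None
    return sorted(hits)
```

A 26-codon toy ORF such as `"CC" + "ATG" + "GCT" * 25 + "TAA" + "GG"` is reported as `(2, 83)`, and disappears if `min_codons` is raised to 30.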

By using RODEO and related tools, researchers can prioritize putative RiPP BGCs that contain unique precursor peptide sequences or unprecedented enzymes encoded as genomic neighbors, reducing overall rates of rediscovery (Kloosterman et al., 2021). However, determination of exact BGC boundaries can be complicated by large numbers of PTM enzymes, the presence of tailoring modification enzymes, or distally encoded elements. In particular, in some cases RiPP precursor peptides have been shown to be distally encoded from related PTM-installing enzymes (Haft, 2009; Harris et al., 2020; Li et al., 2010). This phenomenon is likely underrepresented in the current literature because RiPP mining bioinformatic tools typically only annotate potential precursors within the local genomic space.

1.2. Initial genomic identification and experimental validation of the RRE domain

While homology-based searching of characterized RiPP PTM enzymes has shown great success in the discovery of new compounds from known RiPP classes, this approach inherently limits the likelihood of novel bioactivity (Bushin et al., 2018; Hudson et al., 2019; Oberg et al., 2022; Schwalen et al., 2018; Tietz et al., 2017; Walker et al., 2020). For example, although thiopeptides are a molecularly diverse RiPP class, many thiopeptides inhibit protein translation by targeting either elongation factor Tu or the 50S ribosomal subunit (Chan and Burrows, 2021). Class-independent discovery of RiPPs requires an approach that is agnostic toward specific enzymes or PTMs, as these features are usually unique to one or several RiPP classes.

One solution to the challenge of class-independent RiPP discovery is to use the RiPP recognition element (RRE) as a bioinformatic handle for identifying novel RiPP BGCs. First described in 2015, the RRE domain serves to recognize and bind precursor peptides (Figure 1) (Burkhart et al., 2015). Although the RRE domain was not formally defined until then, several crystal structures of leader-bound RRE domains had been reported earlier (e.g., NisB from nisin biosynthesis and LynD from cyanobactin biosynthesis) (Koehnke et al., 2015; Ortega et al., 2014). In addition, the necessity of the RRE domain for precursor peptide processing had been recognized in some systems, such as the streptolysin S biosynthetic pathway (Mitchell et al., 2009).

RREs can exist in RiPP BGCs either as discretely encoded proteins ~80–100 amino acids long, or as fusions to other biosynthetic proteins (Figure 2) (Kloosterman et al., 2020). RRE domains generally bind their cognate leader peptides with nanomolar affinity. This strong and specific binding is, in all structurally characterized cases, driven by interactions between the third alpha helix and beta strand of the RRE, and a short motif within the leader peptide, herein called the recognition sequence (Figure 3) (Chekan et al., 2019).

Figure 2.


Representative crystal structures of RRE-precursor peptide complexes available in the PDB. (A) Structures of two discretely encoded RRE proteins in lasso peptide biosynthetic gene clusters (PDB codes 5V1V and 6JX3). TbiA: precursor for therbactin (Chekan et al., 2019). FusA: precursor for fusilassin (Alfi et al., 2022). (B) An N-terminal RRE fusion to a radical SAM-SPASM enzyme involved in biosynthesis of the ranthipeptide thermocellin (Grove et al., 2017; Hudson et al., 2019). (C) An internal RRE fusion to a dehydratase protein involved in biosynthesis of the lanthipeptide nisin (Repka et al., 2017).

Figure 3.


Leader peptide recognition sequences observed in crystal structures of RREs bound to their cognate leader peptides. (A) The lasso peptide therbactin uses a YxxP leader peptide motif as the recognition sequence, typical of discrete RRE:leader peptide interactions in the lasso peptide class. The TbiB1 RRE binds its cognate precursor peptides with a KD of 266 nM (Chekan et al., 2019). (B) The lanthipeptide nisin employs an FNLD recognition sequence, typical of class I lanthipeptides. NisB binds the unmodified NisA precursor peptide with a KD of 1.05 μM, an example of a lower-affinity RRE interaction (Bothwell et al., 2019; Mavaro et al., 2011). (C) In general, RRE:leader peptide interactions are driven by hydrophobic packing interactions at the interface of the third alpha helix of the RRE and the leader peptide. The exact residues responsible for recognition are dependent on the class of RiPP being studied. Parts of figure adapted from (Chekan et al., 2019).
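For intuition about the dissociation constants quoted above, the sketch below applies a simple one-site (hyperbolic) binding model of the kind often used to interpret fluorescence polarization titrations. It assumes the labeled leader peptide is present well below KD, so free and total RRE concentrations are approximately equal, and that binding does not change fluorophore brightness. This is a textbook approximation, not the fitting procedure described in this chapter.

```python
def fraction_bound(rre_nM: float, kd_nM: float) -> float:
    """Fraction of labeled leader peptide bound under a one-site binding model.

    Assumes the labeled peptide is far below KD, so free RRE ~= total RRE.
    """
    if rre_nM < 0 or kd_nM <= 0:
        raise ValueError("RRE concentration must be >= 0 and KD must be > 0")
    return rre_nM / (kd_nM + rre_nM)


def anisotropy(rre_nM: float, kd_nM: float, r_free: float, r_bound: float) -> float:
    """Expected anisotropy as the bound-fraction-weighted average of free and bound values.

    Assumes binding does not change the fluorophore's brightness.
    """
    f = fraction_bound(rre_nM, kd_nM)
    return r_free + (r_bound - r_free) * f
```

Under this model, half of the peptide is bound when the RRE concentration equals KD. At 1 μM RRE, roughly 79% of peptide would be bound by an RRE with the TbiB1 KD of 266 nM, versus about 49% for the NisB KD of 1.05 μM.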

Recognition sequences are highly conserved motifs within RiPP classes, for example the YxxP motif of lasso peptides, the FNLD motif of lanthipeptides, or the FxxxB (B, branched-chain amino acid) motif in cytolysins (DiCaprio et al., 2019; Mitchell et al., 2009; van der Donk and Nair, 2014). Experimental determination of key binding residues, in both the RRE and the RiPP precursor, is especially important for engineering applications. For example, new-to-nature RiPPs can be produced by concatenating the recognition sequences for multiple RRE domains on one leader peptide (Burkhart et al., 2017).
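Motifs written in this notation translate directly into regular expressions, which allows a quick first-pass scan of candidate leader peptides. In the sketch below, `x` becomes any residue and `B` becomes a branched-chain residue (I, L, or V). The example leader sequence is invented for illustration, and a motif match alone does not establish RRE binding.

```python
import re
from typing import Dict, List, Tuple

# Motifs as written in the text: "x" = any residue, "B" = branched-chain residue.
MOTIFS = {"lasso (YxxP)": "YxxP", "lanthipeptide (FNLD)": "FNLD", "cytolysin (FxxxB)": "FxxxB"}


def motif_to_regex(motif: str) -> str:
    """Translate a simple motif string into a regular expression."""
    translation = {"x": ".", "B": "[ILV]"}
    return "".join(translation.get(ch, ch) for ch in motif)


def scan_leader(leader: str, motifs: Dict[str, str] = MOTIFS) -> List[Tuple[str, int, str]]:
    """Return (motif label, 1-based start position, matched residues) for every hit."""
    hits = []
    for label, motif in motifs.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
        for m in pattern.finditer(leader.upper()):
            hits.append((label, m.start() + 1, m.group(1)))
    return hits
```

For the invented leader `MSTFNLDVKYETPA`, the scan reports YETP at position 10, FNLD at position 4, and FNLDV (an FxxxB match) at position 4.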

As of the most recent survey of RRE domains, roughly half of prokaryotic RiPP classes encode one or more RRE domains in the BGC, presumably for substrate recognition and processing, although this has not been experimentally validated in every case (Kloosterman et al., 2020). The role of the RRE domain in precursor peptide engagement (i.e., substrate recognition) has been biochemically determined for several RiPP classes, including lanthipeptides, lasso peptides, thiopeptides, linear azole-containing peptides, and bottromycins, among others (Burkhart et al., 2015; DiCaprio et al., 2019; Dunbar et al., 2015; Hegemann and van der Donk, 2018; Mavaro et al., 2011; Melby et al., 2012; Schwalen et al., 2017). Taken as a whole, the RRE is a common domain spanning a range of known RiPP classes and, although it is not ubiquitous to RiPP biosynthetic pathways, it has been proposed as a potentially useful bioinformatic handle for first-in-class RiPP discovery (Figure 1).

1.3. The RRE domain as a class-independent discovery tool

Although RRE domains are common to a host of RiPP classes, they are difficult to detect by traditional homology-based searches (e.g., BLAST searches). This is because RRE domains share structural homology but exhibit high levels of sequence divergence (Figure 2) (Kloosterman et al., 2020). One useful model for bioinformatically defining and predicting a sequence-diverse protein family is the hidden Markov model (HMM), a statistical model that assigns each position in a multiple sequence alignment a probability for each amino acid (Eddy, 2004). For example, the Pfam database uses a collection of HMMs to define conserved protein families (Mistry et al., 2021).
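The core idea of per-position amino acid probabilities can be illustrated with a position-specific scoring matrix built from a gapless toy alignment. This is a deliberate simplification: a profile HMM, such as those built by HMMER, also models insertions and deletions. The pseudocount, the uniform background, and the YxxP-like toy sequences are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Return one amino acid probability distribution per alignment column.

    Gapless toy alignment only; the pseudocount keeps unseen residues above zero.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = len(alignment) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_odds_score(profile: List[Dict[str, float]], query: str) -> float:
    """Score an ungapped query in bits: sum of log2(P(aa | column) / P(aa | background)).

    Assumes a uniform background frequency of 1/20 per amino acid.
    """
    if len(query) != len(profile):
        raise ValueError("query length must match the profile length")
    background = 1.0 / len(AMINO_ACIDS)
    return sum(math.log2(column[aa] / background) for column, aa in zip(profile, query))
```

A query resembling the training sequences (e.g., `YKSP` against a profile built from `YKSP`, `YRTP`, and `YKAP`) scores above zero bits, whereas an unrelated query such as `GGGG` scores below zero. Summed log2 odds are in bits, the same unit as the bit scores discussed below, although HMMER's scores come from a full profile HMM and null model.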

Until 2020, the only Pfam HMM that accurately predicted the presence of an RRE domain was the PqqD model (PF05402), which is named after the first RRE domain to be structurally characterized, the PqqD protein in pyrroloquinoline quinone (PQQ) biosynthesis (Evans et al., 2017; Tsai et al., 2009). While PF05402 robustly identifies discretely encoded RRE domains (i.e., small proteins that contain only the RRE domain) in both PQQ and lasso peptide BGCs, it fails to identify most RRE domains in other RiPP classes, particularly in cases where the RRE domain exists as a fusion to a PTM enzyme (Figure 2) (Kloosterman et al., 2020; Tietz et al., 2017). In short, one bioinformatic model is not sufficient to capture the diversity of primary RRE sequences found across RiPP classes.

In 2020, we created a bioinformatic tool, RRE-Finder, which addresses the shortcomings of existing models for RRE prediction (Kloosterman et al., 2020). Its two modes were designed to enable class-independent RiPP discovery and class-specific annotation of RRE domains, respectively. First, exploratory mode uses a truncated version of HHpred (described in the following section), which uses secondary structure prediction and comparison to existing RRE crystal structures to predict RRE domains (Kloosterman et al., 2020; Soding et al., 2005). Second, precision mode uses a library of custom HMMs, each designed to predict RRE domains belonging to a specific RiPP class. This chapter includes protocols and guidelines for building such custom HMMs, including how to curate RRE sequences representative of a subfamily and how to validate the accuracy and precision of resulting models. Although this example is specific to RRE domains, the techniques employed here could in principle be applied to annotation of any protein domain that is inconsistently annotated by protein family databases, such as Pfam and TIGRFAM (Figure 4) (Haft, 2001; Mistry et al., 2021).

Figure 4.


Development and validation of the original RRE-Finder bioinformatic tool. (A) Overall pipeline of RRE-Finder development. The methodology behind precision mode of RRE-Finder is detailed in Section 3 of this chapter. Exploratory mode uses a truncated version of the HHpred pipeline, with further details found in the original publication. (B) The predicted family of RRE-dependent RiPPs retrieved from the UniProtKB database using RRE-Finder in precision mode with a bit score cutoff of 25 (The UniProt Consortium et al., 2021). Predicted RREs are colored based on RiPP class annotation from precision mode HMMs. (C) Validation of RRE-Finder precision and exploratory models using true positive RiPP BGCs from the MIBiG database at a bit score cutoff of 25 (Kautsar et al., 2019). (D) Schematic of HMM predictive overlap in the selected models of RRE-Finder precision mode. Numbers in parentheses indicate the RREs retrieved from UniProtKB in June 2020 using a class-specific HMM at a bit score cutoff of 25. Numbers in overlapping regions indicate that multiple models called this RRE at a greater significance than the bit score threshold. Figure adapted from (Kloosterman et al., 2020).
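The thresholding and overlap counting summarized in panels B and D can be sketched as follows. The nested-dictionary hit table, model names, protein identifiers, and scores are hypothetical; in practice, per-sequence bit scores would be parsed from hmmsearch or hmmscan output. Treating a score exactly equal to the cutoff as passing is an assumption.

```python
from collections import defaultdict
from typing import Dict, Set

BIT_SCORE_CUTOFF = 25.0  # cutoff used for the precision-mode analyses in Figure 4


def calls_per_model(scores: Dict[str, Dict[str, float]],
                    cutoff: float = BIT_SCORE_CUTOFF) -> Dict[str, Set[str]]:
    """Map each model name to the proteins it scores at or above the cutoff."""
    return {model: {prot for prot, s in hits.items() if s >= cutoff}
            for model, hits in scores.items()}


def multiply_called(calls: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return proteins called by more than one model, with the models that called them."""
    by_protein: Dict[str, Set[str]] = defaultdict(set)
    for model, proteins in calls.items():
        for prot in proteins:
            by_protein[prot].add(model)
    return {prot: models for prot, models in by_protein.items() if len(models) > 1}
```

With toy scores in which protein P3 scores 30.1 against a lasso model and 26.5 against a thiopeptide model, P3 is reported as called by both, corresponding to an overlapping region in a diagram like panel D.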

Following bioinformatic prioritization of RRE domains, there are several ways to experimentally validate that these RREs are functional and necessary for installation of class-defining PTMs, and to determine which residue-level interactions are necessary for RRE:leader peptide binding. Herein, we cover methods for heterologous expression and affinity purification of RRE domains and their cognate precursor peptides. Next, methods are provided for systematic mutagenesis of RRE domains and precursors, paired with fluorescence polarization binding assays, which we have found to be a fast and effective means of assessing which residues are indispensable for the nanomolar affinity observed in most RRE binding interactions. Finally, we provide methods for in vitro activity assays to determine the functional role of a given RRE in a RiPP biosynthetic pathway. In general, we believe the methods contained herein could be adapted to study any RRE-dependent binding interaction or RiPP biosynthetic pathway, opening the door to discovering and studying novel RRE-dependent RiPP biochemistry.

3. Generation of custom models for RRE prediction and annotation

This section covers strategies for generating custom hidden Markov models (HMMs) for prediction of RRE domains from genomic data. The pipeline for generating custom HMMs (Figure 5) requires a suite of bioinformatic tools, listed in section 3.1. Sections 3.2 and 3.3 outline the process for selecting representative protein sequences to define a domain and generating custom HMMs. Where possible, we have indicated options at each stage of this pipeline to use either the web-based or the downloadable versions of the tools employed. Those without prior bioinformatics experience may wish to use web tools to reduce the chance of user error and save hard drive storage. However, the downloadable versions of these programs will be more suitable for researchers wishing to generate a large library of HMMs and those with command line experience. Creating effective HMMs is an iterative process, and section 3.4 outlines the process for improvement and validation of custom HMMs (Johnson et al., 2010). Although this workflow applies specifically to generating custom HMMs for RRE domain identification, the strategy here could easily be extended to any other protein domain. While certain numbers, such as bit score cutoffs, will change from those outlined below, the general pipeline and guidelines can be applied to any domain that is not well defined by existing databases, such as Pfam or TIGRFAM (Haft, 2001; Mistry et al., 2021).

Figure 5.


General pipeline for creating a set of custom HMMs. (A) Datasets of known and predicted RRE-containing proteins are generated and visualized using Cytoscape and the Enzyme Function Initiative Enzyme Similarity Tool (Shannon et al., 2003; Zallot et al., 2019). From the sequence similarity network, seed sequences are selected. (B) RRE domains fused to other proteins and enzymes are truncated in silico using the HHpred web tool. (C) MAFFT is used to generate a multiple sequence alignment representative of the target RRE family (Katoh and Standley, 2013). (D) HMMER tools and protein database search tools are used to generate and validate the custom HMM (Mistry et al., 2013, p. 3).

3.1. Software, online tools, and hardware requirements

Hardware

Use of HMMER to generate HMMs requires a Unix-based operating system or an emulated Unix environment. If you are a Mac user, you are already using a Unix-based system and can use the Terminal application for these steps. If you are using Windows, you will need a Unix-like environment such as Cygwin (e.g., with the Mintty console emulator, found at https://mintty.github.io).

Memory is a key consideration for successful manipulation of sequence similarity networks (SSNs) in Cytoscape. Networks with large numbers of edges (lines on the output network that connect homologous proteins) can be difficult to manipulate. The Cytoscape user manual suggests that users have 1 GB of RAM per 150,000 edges. Network sizes are significantly reduced by employing RepNode networks, which are described in section 3.2; the number of nodes in the network at each RepNode cutoff is reported as part of the EFI-EST output.
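The Cytoscape guideline above translates into a quick back-of-the-envelope estimate. The sketch treats 1 GB per 150,000 edges as a linear rule of thumb; actual memory use also depends on node and edge attributes and on the Cytoscape version.

```python
EDGES_PER_GB = 150_000  # rule of thumb from the Cytoscape user manual, as cited above


def estimated_ram_gb(n_edges: int) -> float:
    """Approximate RAM (in GB) needed to work with a network of n_edges edges."""
    if n_edges < 0:
        raise ValueError("edge count must be non-negative")
    return n_edges / EDGES_PER_GB


def fits_in_memory(n_edges: int, available_gb: float) -> bool:
    """True if the estimated requirement does not exceed the available RAM."""
    return estimated_ram_gb(n_edges) <= available_gb


def max_edges(n_nodes: int) -> int:
    """Upper bound on edges in an undirected network: every pair of nodes connected."""
    return n_nodes * (n_nodes - 1) // 2
```

A network with 1,200,000 edges would need about 8 GB by this rule. A fully connected 2,000-node network could reach 1,999,000 edges (roughly 13 GB), which illustrates why collapsing near-identical sequences into RepNodes helps.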

HMMER Suite (Finn et al., 2015)

The HMMER suite can be downloaded at www.hmmer.org, where complete installation instructions and usage documentation can be found. This workflow employs several HMMER programs, including hmmscan, hmmbuild, and hmmpress. Validation of HMMs will require the hmmsearch web tool, which can be found at www.ebi.ac.uk/Tools/hmmer/search/hmmsearch.
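For users scripting many models, the hmmbuild and hmmpress steps of section 3.3 can be wrapped in Python. This minimal sketch assumes HMMER is installed and on the PATH and uses only the basic forms `hmmbuild [-n <name>] <hmmfile_out> <msafile>` and `hmmpress <hmmfile>`. The helper names and file names are hypothetical. The HMMER user guide describes the model NAME as a single word without whitespace, so an underscore is used in the example name.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# hmmpress writes four binary files whose names append these suffixes to the HMM file name.
PRESSED_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def hmmbuild_cmd(msa_file: str, hmm_file: str, name: Optional[str] = None) -> List[str]:
    """Argument list for hmmbuild; the optional -n flag sets the model NAME."""
    cmd = ["hmmbuild"]
    if name is not None:
        cmd += ["-n", name]
    return cmd + [hmm_file, msa_file]


def pressed_files(hmm_file: str) -> List[Path]:
    """Paths hmmpress is expected to create for hmm_file."""
    return [Path(hmm_file + suffix) for suffix in PRESSED_SUFFIXES]


def build_and_press(msa_file: str, hmm_file: str, name: Optional[str] = None) -> None:
    """Run hmmbuild then hmmpress (requires HMMER on PATH); raise if either step fails."""
    subprocess.run(hmmbuild_cmd(msa_file, hmm_file, name), check=True)
    subprocess.run(["hmmpress", hmm_file], check=True)
    missing = [str(p) for p in pressed_files(hmm_file) if not p.exists()]
    if missing:
        raise FileNotFoundError(f"expected hmmpress outputs not found: {missing}")
```

For example, `build_and_press("thio_seeds.afa", "thio_RRE.hmm", name="thiopeptide_RRE")` would produce `thio_RRE.hmm` plus `thio_RRE.hmm.h3f`, `.h3i`, `.h3m`, and `.h3p`. hmmpress does not overwrite existing pressed files unless given its `-f` option, so rebuilding a model in place may require removing the old files first.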

EFI-EST Web Tools (Zallot et al., 2019)

The EFI-EST tools, used for generation of sequence similarity networks and genome neighborhood networks, are available only as web tools. They can be found at https://efi.igb.illinois.edu/efi-est.

Cytoscape (Shannon et al., 2003)

Cytoscape must be downloaded for the visualization and manipulation of sequence similarity networks. This open-source program can be downloaded at https://cytoscape.org.

RODEO (Georgiou et al., 2020; Kloosterman et al., 2020; Ramesh et al., 2021; Schwalen et al., 2018; Tietz et al., 2017)

RODEO is an artificial intelligence-driven tool for compilation, categorization, visualization, and annotation of RiPP BGCs, including precursor peptide prediction. The web tool can be accessed at https://rodeo.igb.illinois.edu. For large batch runs (>1,000 query sequences), users are encouraged to download the command line version from https://github.com/the-mitchell-lab/rodeo2.

HHpred (Soding et al., 2005)

HHpred predicts protein domain homology using a combination of primary and predicted secondary structure alignment to structures in the Protein Data Bank (PDB; https://www.rcsb.org) (Zardecki et al., 2016). For most applications, using the web tool will be sufficient: https://toolkit.tuebingen.mpg.de/tools/hhpred. A downloadable version of HHpred may be needed for large-scale analysis, but requires local download of the entire PDB, so this is not recommended for casual users.

MAFFT (optional) (Katoh and Standley, 2013)

MAFFT is a command line tool for generating multiple sequence alignments (MSAs), appropriate for either HMM or phylogenetic tree generation. It can be downloaded at https://mafft.cbrc.jp/alignment/software/. Other tools for MSA generation can be used in place of MAFFT, and several web tools can serve as substitutes, such as the Clustal Omega tool available at https://www.ebi.ac.uk/Tools/msa/clustalo/.
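Step 1 of section 3.3 calls for MAFFT's L-INS-i strategy. The MAFFT documentation gives L-INS-i as `mafft --localpair --maxiterate 1000 input > output`. The sketch below runs that command from Python, assuming MAFFT is on the PATH; the helper names and file names are hypothetical.

```python
import subprocess
from typing import List


def linsi_cmd(fasta_in: str) -> List[str]:
    """MAFFT L-INS-i: local pairwise alignment with up to 1000 refinement iterations."""
    return ["mafft", "--localpair", "--maxiterate", "1000", fasta_in]


def run_linsi(fasta_in: str, msa_out: str) -> None:
    """Run MAFFT (must be on PATH) and save the aligned FASTA it writes to stdout."""
    with open(msa_out, "w") as handle:
        subprocess.run(linsi_cmd(fasta_in), stdout=handle, check=True)
```

For example, `run_linsi("rre_seeds.fasta", "rre_seeds.afa")` writes an aligned FASTA file that can serve as the `<msafile>` input to hmmbuild.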

3.2. Curation of seed sequences for model generation

Timing: 2–3 hours

  1. Acquire or generate a dataset from which to select sequences to use for hidden Markov model generation. Ideally, this dataset should represent all known members of a given RRE class, e.g., all sactipeptide associated RREs. In cases where a dataset does not exist, one should be generated by using a target RRE as a query for a PSI-BLAST search. Parameters should be set to three iterative rounds of searching with an expect value cutoff of 0.05.

  2. Generate a sequence similarity network of the target dataset using the EFI-EST tools. Input for the SSN tool can be in the form of a FASTA file with all query sequences or a list of UniProt or GenBank accession IDs. All other settings are kept at default.

  3. When prompted by email, generate the final SSN by choosing a starting alignment score. The alignment score determines how proteins in the SSN cluster together as a function of their sequence identity. Choose an alignment score that corresponds to clustering of sequences at 40% identity, as determined by the output graphs generated by EFI-EST.

  4. The output SSN can be downloaded at several different RepNode cutoffs. RepNode networks collapse proteins that share greater than a set percentage identity into a single representative node. For custom HMM generation, it is recommended to download the RepNode60 network. This ensures that proteins occupying separate nodes on the network do not share greater than 60% sequence identity.

  5. Use Cytoscape to visualize the sequence similarity network. Data can be imported as an xgmml file. Select the Layouts drop-down menu, then select the “organic” layout under the yFiles tab. In this view, clustered proteins are visualized as circular “nodes”, and edges connecting two nodes represent homology more significant than the cutoff specified by the chosen alignment score (Figure 6).

  6. Select 5–20 nodes from the SSN that will comprise the seed sequences for the HMM. See notes below for considerations on choosing a diverse set of seed sequences and selecting an appropriate number of sequences to represent your target RRE family.

  7. Once nodes are selected, output the selected data to a tab-delimited file (readable either with a text editor or Microsoft Excel). This file contains data useful for a variety of downstream analyses, including phylogeny of the producing organism, sequence length, and matches to families in Pfam or InterPro databases. The column called “shared name” can be copied and pasted into the InterPro protein retrieval tool (https://www.ebi.ac.uk/interpro/search/sequence/) to generate a FASTA file of target RRE seed sequences. This FASTA file can either be used as direct input for MSA generation, as covered in the following section, or be truncated to contain only RRE domains (highly recommended if you are dealing with RRE-domain fusions to larger enzymatic domains).

  8. (Optional) Truncate seed sequences using HHpred to contain only the residues comprising the RRE domain. Each seed sequence should be individually submitted to the HHpred web tool (https://toolkit.tuebingen.mpg.de/tools/hhpred). Output from this tool shows alignment of the query protein to proteins in the Protein Data Bank (https://www.rcsb.org), using a combination of primary and (predicted) secondary structure homology (Figure 5). The most common PDB accessions identified for RRE domains are listed in Table 1. Navigate to the alignment section for the top-scoring RRE hit to identify the query protein residues corresponding to the RRE domain. Your FASTA file from the previous step should be modified to contain only the RRE domain residues before generating a multiple sequence alignment.
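Steps 7 and 8 can be partly scripted. The sketch below reads the “shared name” column from the exported tab-delimited node table and trims FASTA records to RRE boundaries read off HHpred alignments. The column header, the assumption that the first word of each FASTA header is the accession, and the example sequence and coordinates are hypothetical; check them against your exported file and HHpred output.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def accessions_from_rows(rows: Iterable[str], column: str = "shared name") -> List[str]:
    """Pull accession IDs from the lines of a tab-delimited node table."""
    reader = csv.DictReader(rows, delimiter="\t")
    return [row[column].strip() for row in reader if row.get(column)]


def accessions_from_node_table(path: str, column: str = "shared name") -> List[str]:
    """Read accession IDs from a node table file exported from Cytoscape."""
    with open(path, newline="") as handle:
        return accessions_from_rows(handle, column)


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {first word of header: sequence}."""
    records: Dict[str, str] = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif line and current is not None:
            records[current] += line
    return records


def truncate_to_domain(records: Dict[str, str],
                       bounds: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Keep residues start..end (1-based, inclusive), e.g., as read off an HHpred alignment."""
    trimmed = {}
    for acc, (start, end) in bounds.items():
        seq = records[acc]
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"bounds {start}-{end} out of range for {acc}")
        trimmed[acc] = seq[start - 1:end]
    return trimmed


def to_fasta(records: Dict[str, str]) -> str:
    """Serialize {accession: sequence} back to FASTA text."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in records.items())
```

The trimmed FASTA text can then be written to a file and used as MAFFT input.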

Figure 6.


Selection of an appropriate alignment score for SSN visualization. This network shows a dataset of RRE-containing proteins from various RiPP natural products with azoles/thiazoles in their final structures. As shown, at a low alignment score, RREs belonging to disparate classes of RiPPs cluster together (e.g., thiopeptides and heterocycloanthracins). This is known as underfractionation. Raising the alignment score above a reasonable threshold fragments clusters to an extent that they are not useful for HMM generation. This is known as overfractionation.
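The effect of the threshold can be mimicked with a toy network in which an edge is kept only when pairwise identity meets a cutoff. Counting connected components then shows how a low cutoff merges groups (underfractionation) and a high cutoff splits them (overfractionation). Identity here is computed over pre-aligned, equal-length toy strings, a simplification of the alignment scores EFI-EST actually uses.

```python
from itertools import combinations
from typing import Dict, List, Set


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def clusters_at(seqs: Dict[str, str], min_identity: float) -> List[Set[str]]:
    """Connected components of the network whose edges have identity >= min_identity."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= min_identity:
            parent[find(a)] = find(b)
    groups: Dict[str, Set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

For four toy sequences whose pairwise identities range from 0% to 90%, a 40% cutoff merges everything into one cluster, an 80% cutoff leaves three clusters, and a 95% cutoff leaves every sequence as a singleton.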

Table 1.

List of RRE crystal and NMR structures available in the PDB. LP: leader peptide.

RRE | RiPP class | PDB accession | UniProtKB accession | Precursor bound? | Citation DOI
LynD | Cyanobactin | 4V1T | A0YXD2 | Yes | 10.1038/nchembio.1841
TruD | Cyanobactin | 4BS9 | B2KYG8 | No | 10.1002/anie.201306302
ThcOx | Cyanobactin | 5LQ4 | B8HTZ1 | No | 10.1107/S2059798316015850
MibB | Lanthipeptide | 5EHK | E2IHB7 | No | 10.1016/j.chembiol.2015.11.017
NisB | Lanthipeptide | 4WD9/6M7Y | P20103 | Yes | 10.1038/nature13888
McbB | LAP | 6GOS/6GRH | P23184 | Yes | 10.1016/j.molcel.2018.11.032
FusB1 | Lasso peptide | 6JX3 | Q47QT5 | Yes (LP only) | 10.1021/acschembio.9b00348
TbiB1 | Lasso peptide | 5V1V | D1CIZ5 | Yes (LP only) | 10.1073/pnas.1908364116
MccB | Microcin | 6OM4 | Q47506 | Yes | 10.1039/c8sc03173h
PaaA | Pantocin | 5FF5 | Q9ZAR3 | No | 10.1021/jacs.5b13529
PqqD | PQQ | 3G2B/5SXY | Q8P6M8 | No | 10.1002/prot.22461; 10.1021/acs.biochem.7b00247
CteB | Ranthipeptide | 5WGG | A3DDW1 | Yes (LP only) | 10.1021/jacs.7b01283
SkfB | Sactipeptide | 6EFN | O31423 | No | 10.1074/jbc.RA118.005369
SuiB | Streptide | 5V1T | A0A0Z8EWX1 | Yes | 10.1073/pnas.1703663114
TbtB | Thiopeptide | 6EC7/6EC8 | D6Y502 | No | 10.1073/pnas.1905240116
Notes:
  1. There is no minimum or maximum number of seed sequences that can be incorporated into an HMM; however, we have found that using too many sequences can lead to a model that is biased toward known, characterized RREs and not useful for exploratory, genome-mining applications. For smaller RiPP classes (<1,000 predicted members), 10 diversity-maximized sequences are usually sufficient. For large RiPP classes, more sequences should be employed.

  2. As shown in Figure 7, sequences should be chosen to represent the natural diversity of an RRE family and not bias the model toward commonly occurring motifs. If multiple clusters occur in the SSN at the starting alignment score (representing clustering at 40% identity), seed sequences should be selected to sample from each primary cluster. Singleton nodes or clusters with a small number of nodes (<5 nodes) should usually be ignored as these are outliers that are not representative of the RRE class being targeted.
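The two notes above can be combined into a simple selection routine: discard clusters with fewer than five nodes, then draw seeds round-robin across the remaining clusters so that each primary cluster is sampled. The round-robin order, visiting larger clusters first, and the default of 10 seeds follow the guidance above but are otherwise assumptions; this sketch does not optimize diversity within a cluster.

```python
from typing import Dict, List


def select_seeds(clusters: Dict[str, List[str]], n_seeds: int = 10,
                 min_cluster_size: int = 5) -> List[str]:
    """Pick up to n_seeds node IDs, taking one node from each large-enough cluster per round.

    clusters maps a cluster label to its node IDs (e.g., RepNode60 representatives),
    in whatever order the user prefers within each cluster.
    """
    kept = [list(nodes) for nodes in clusters.values() if len(nodes) >= min_cluster_size]
    # Visit larger clusters first so they contribute when the seed budget runs out.
    kept.sort(key=len, reverse=True)
    seeds: List[str] = []
    depth = 0
    while len(seeds) < n_seeds and any(depth < len(nodes) for nodes in kept):
        for nodes in kept:
            if depth < len(nodes) and len(seeds) < n_seeds:
                seeds.append(nodes[depth])
        depth += 1
    return seeds
```

Singletons and clusters with fewer than five nodes never contribute seeds, and if the retained clusters hold fewer nodes than requested, all of them are returned.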

Figure 7.

Implications of seed sequence selection on custom HMM predictive capabilities. (A) Using too few or too many seed sequences for MSA generation can limit HMM scope as shown in this theoretical dataset. Green nodes represent selected seed sequences, while the larger dotted green circles represent homologous sequences called by the resulting HMM. (B) Choosing highly identical seed sequences is not recommended. This problem is largely eliminated by employing RepNode networks. (C) For large or highly diverse RiPP classes, multiple HMMs can be employed to better cover a sequence space. In our dataset, such division into sub-classes was carried out for sactipeptides and linear azole-containing peptides, both of which contain sequence-diverse RREs spanning the class.

3.3. Custom HMM generation

Timing: <1 hour

  1. Use MAFFT to generate a multiple sequence alignment (MSA) for the target RRE class using the L-INS-i strategy (mafft --localpair --maxiterate 1000), which MAFFT's automatic mode also selects for small inputs such as a seed set. The input for MAFFT should be a FASTA-format file of your individual RRE seed sequences. The output MSA is a plain-text aligned FASTA file.

  2. Use the command line terminal to navigate to the folder containing your MSA. Use the following command to generate a custom HMM from the output MSA:
    % hmmbuild <hmm_file> <msafile>
    

    Give the output HMM file the suffix .hmm for the following steps.

  3. If the output HMM will be used with the RODEO command line tool (rather than the web tool), press it into binary form using the following HMMER command:
    % hmmpress <hmm_file>
    

    This command should produce four binary files with the same base name as the original HMM and the suffixes .h3f, .h3i, .h3m, and .h3p.

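  4. To ensure the HMM is properly annotated by RODEO, use a text editor to change or add the following fields at the top of the HMM file. Make these edits before running hmmpress, or re-run hmmpress with the -f option afterward so that the pressed files reflect them. In the file itself, each tag is separated from its value by spaces rather than a colon.

    NAME: Target RRE class, written as a single word without spaces (e.g., thiopeptide_RRE)

    ACC: Identifier of your HMM, helpful if you are creating multiple models (e.g., RREfam004)

    DESC: Description of the target protein domain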
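Because HMMER requires NAME to be a single word, a small script can make the header edits in step 4 reproducible when several models are built. The Python sketch below assumes a single-model HMMER3 text file written by hmmbuild (which always includes a NAME line); the function name and example values are hypothetical. hmmbuild's -n option can also set the model name at build time.

```python
import re
from typing import Optional


def set_hmm_header(hmm_text: str, name: str, acc: Optional[str] = None,
                   desc: Optional[str] = None) -> str:
    """Return hmm_text with NAME (and optionally ACC/DESC) set in the header.

    Assumes a single-model HMMER3 text file as written by hmmbuild, whose
    header always contains a NAME line. Existing ACC/DESC values are kept
    unless new ones are given; fields are written right after NAME.
    """
    if re.search(r"\s", name):
        raise ValueError("HMMER model names must be a single word (no spaces)")
    lines = hmm_text.splitlines()
    # The header ends at the line whose first token is exactly "HMM".
    end = next((i for i, ln in enumerate(lines) if ln.split()[:1] == ["HMM"]),
               len(lines))
    header, body = lines[:end], lines[end:]
    old = {}
    for ln in header:
        parts = ln.split(maxsplit=1)
        if parts and parts[0] in ("ACC", "DESC"):
            old[parts[0]] = parts[1] if len(parts) > 1 else ""
    values = {"NAME": name,
              "ACC": acc if acc is not None else old.get("ACC"),
              "DESC": desc if desc is not None else old.get("DESC")}
    new_header = []
    for ln in header:
        tag = ln.split()[:1]
        if tag in (["ACC"], ["DESC"]):
            continue  # re-emitted after NAME below
        if tag == ["NAME"]:
            # Tags are padded to a 6-character column, as in hmmbuild output.
            new_header += [f"{k:<6}{v}" for k, v in values.items() if v is not None]
            continue
        new_header.append(ln)
    return "\n".join(new_header + body) + "\n"


example = """HMMER3/f [3.3 | Nov 2019]
NAME  seeds
LENG  80
ALPH  amino
HMM          A        C
//
"""
print(set_hmm_header(example, "thiopeptide_RRE", "RREfam004",
                     "Thiopeptide RiPP recognition element"))
```

After editing, re-run hmmpress -f on the file if it will be used with the RODEO command line tool.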

3.4. Custom HMM model validation

Custom HMMs generated using the method above should be validated either against a trusted dataset of RRE-containing proteins or, where such a dataset does not exist and would be difficult to generate, by using the online hmmsearch tool to interrogate the UniProtKB database with the custom HMM. In many cases, model validation may reveal predictive gaps of the custom HMM, necessitating iterative improvement of the HMM by modifying the input seed sequences. The section below outlines a protocol for assessing model recall and precision, with advice for improving HMMs whose recall fails to meet a designated threshold.

Timing: 2–3 hours

  1. Validate recall of your HMM against the original dataset of RREs generated in Section 3.2 using hmmsearch from the HMMER suite. The inputs are a FASTA-format file containing all RRE sequences in the original dataset and the HMM generated in the previous section. Run hmmsearch in the terminal using the following command:
    % hmmsearch --tblout <f> -T <x> <hmmfile> <seqdb>
    

    In the above command, <hmmfile> points to your custom HMM and <seqdb> to the FASTA file of RRE sequences. The --tblout output is a whitespace-delimited table, and <f> is your desired output file name. The -T option sets a per-sequence bit score reporting cutoff; we have found a bit score of 25 to be an appropriate cutoff for high-confidence RRE domains.

  2. An effective custom HMM should detect >90% of the RRE domains in the original dataset at a bit score threshold of 25. If this threshold is not met, revise the seed sequence selection (Section 3.2) to better sample the sequences the model misses, rebuild the HMM, and repeat the validation.

  3. Evaluate custom HMM specificity through an unbiased query of the UniProtKB protein database. Using the hmmsearch web tool (https://www.ebi.ac.uk/Tools/hmmer/search/hmmsearch), input the custom HMM to query the UniProtKB database. For RRE domains, we have found a bit score threshold of 25 to be ideal for retrieving high confidence, class-specific RRE domains with minimal false positives. Lowering the bit score threshold may be necessary for very large or diverse RiPP classes. Raising the bit score threshold reduces false positive rates and is useful in cases where class-specific identification is important, or model overlap is high (Figure 5).

  4. Use all UniProtKB hits at the chosen bit score threshold as queries for RODEO (either the web tool or the command line tool) to gather the genome neighborhood flanking each predicted RRE. RODEO takes NCBI accession identifiers as input; thus, UniProtKB accessions must first be converted to “EMBL/NCBI CDS” accessions using the UniProt ID mapping tool (https://www.uniprot.org/id-mapping).
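Recall in steps 1 and 2 is the fraction of dataset sequences that appear as hits at or above the bit score cutoff. A minimal Python sketch, assuming the standard hmmsearch --tblout layout (comment lines beginning with #, whitespace-separated columns, target name in column 1, full-sequence bit score in column 6) and that FASTA identifiers are the first word of each header line, which is how HMMER names targets. Re-applying the cutoff in code lets you compare thresholds if hmmsearch was run without -T. Function names, file paths, and the toy data are hypothetical.

```python
from typing import Iterable, Set


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """Identifiers = first word after '>' on each FASTA header line."""
    return {ln[1:].split()[0] for ln in fasta_lines
            if ln.startswith(">") and ln[1:].strip()}


def tblout_hits(tbl_lines: Iterable[str], min_score: float = 25.0) -> Set[str]:
    """Targets in hmmsearch --tblout output with full-sequence score >= min_score."""
    hits = set()
    for ln in tbl_lines:
        if not ln.strip() or ln.startswith("#"):
            continue  # skip blank and comment lines
        cols = ln.split()
        if float(cols[5]) >= min_score:  # column 6: full-sequence bit score
            hits.add(cols[0])            # column 1: target name
    return hits


def recall(fasta_lines: Iterable[str], tbl_lines: Iterable[str],
           min_score: float = 25.0) -> float:
    """Fraction of dataset sequences recovered at or above min_score."""
    truth = fasta_ids(fasta_lines)
    found = tblout_hits(tbl_lines, min_score) & truth
    return len(found) / len(truth) if truth else 0.0


# Usage with real files (paths are hypothetical):
# with open("rre_dataset.fasta") as fa, open("hits.tbl") as tb:
#     r = recall(fa, tb)
#     print(f"recall = {r:.1%}", "(passes)" if r > 0.90 else "(revise seeds)")
```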

4. Cloning, expression, and purification of RRE domains and precursor peptides

Proper expression and purification of RRE domains and their cognate precursor peptides is a critical first step to performing in vitro activity-based assays or binding assays such as fluorescence polarization. Impurities could result in miscalculation of protein concentration, which will decrease the accuracy of KD and IC50 measurements, as outlined in a later section. In general, we have had success expressing codon-optimized RRE and precursor constructs in the E. coli BL21 (DE3)-RIPL strain. RREs are generally small (~80 amino acids) and fold readily, so the use of chaperone proteins or cold induction temperatures is usually unnecessary when expressing a discrete or excised RRE. We have had success purifying both RREs and their cognate precursor peptides as constructs with N-terminal His6 tags or maltose-binding protein (MBP) tags. The MBP tag is preferable in most cases for two reasons. First, MBP is highly soluble and will help ensure the RRE can be concentrated without risking precipitation. Second, the MBP tag increases the overall size difference between the tagged RRE and precursor peptide, which improves fluorescence polarization signals during binding assays.

4.1. Materials, reagents, and equipment

Equipment

  1. Thermo Scientific Sorvall Legend Micro 17

  2. Sorvall RC6 Plus floor centrifuge with SS-34 rotor (Thermo Scientific)

  3. Ultrasonic cell disruptor (Microson, NY)

Materials

  1. EconoSpin silica membrane mini-spin column (Epoch Life Science, TX)

  2. Amylose resin (New England BioLabs, MA)

  3. 1.5 × 20 cm CrystalCruz chromatography column (Santa Cruz Biotechnology, CA)

  4. Amicon Ultra centrifugal filter, 30 kDa molecular weight cutoff (MWCO) (EMD Millipore)

Reagents and Buffers

  1. Q5 DNA polymerase (New England BioLabs, MA)

  2. Restriction enzymes (New England BioLabs, MA)

  3. T4 DNA ligase (New England BioLabs, MA)

  4. Protease inhibitor cocktail (1× solution contains 2 μM leupeptin, 2 μM benzamidine HCl, and 2 μM E64; the solution can be prepared at 100× concentration and stored at −80 °C until use)

  5. Luria-Bertani (LB) growth medium: 1 L of LB medium contains 10 g tryptone, 10 g NaCl, 5 g yeast extract; supplement with 15 g of agar for growth on solid medium

  6. Lysis buffer: 50 mM Tris-HCl at pH 7.5, 500 mM NaCl, 2.5% glycerol (v/v), and 0.1% Triton X-100 (v/v)

  7. Wash buffer: 50 mM Tris-HCl, 500 mM NaCl, 2.5% glycerol (v/v), and 0.5 mM tris(2-carboxyethyl)phosphine (TCEP)

  8. Elution buffer: 50 mM Tris-HCl at pH 7.5, 150 mM NaCl, 2.5% glycerol (v/v), 10 mM maltose, and 0.5 mM TCEP

  9. Protein storage buffer: 50 mM N-2-hydroxyethylpiperazine-N′-2-ethanesulfonic acid (HEPES) at pH 7.5, 300 mM NaCl, 2.5% glycerol (v/v), and 0.5 mM TCEP

  10. Phosphate-buffered saline (PBS): 137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, and 1.8 mM KH2PO4
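The buffer recipes above are given as molar concentrations; the mass of each component follows from mass = concentration × volume × formula weight. A short Python sketch for the PBS recipe, assuming anhydrous salts (hydrated forms such as Na2HPO4·7H2O have different formula weights, so check the reagent label); the helper name is hypothetical.

```python
# Formula weights (g/mol) of the anhydrous salts used in the PBS recipe above.
FORMULA_WEIGHTS = {"NaCl": 58.44, "KCl": 74.55, "Na2HPO4": 141.96, "KH2PO4": 136.09}
PBS_mM = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 10.0, "KH2PO4": 1.8}


def grams_needed(conc_mM: float, volume_L: float, fw_g_per_mol: float) -> float:
    """mass (g) = concentration (mol/L) x volume (L) x formula weight (g/mol)."""
    return conc_mM / 1000.0 * volume_L * fw_g_per_mol


for salt, mM in PBS_mM.items():
    print(f"{salt}: {grams_needed(mM, 1.0, FORMULA_WEIGHTS[salt]):.3f} g per 1 L")
```

For 1 L of PBS this gives roughly 8.0 g NaCl, 0.20 g KCl, 1.42 g Na2HPO4, and 0.245 g KH2PO4.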

4.2. Cloning strategies

The E. coli DH5α strain was used for plasmid amplification and storage. Genes of interest can be ordered as codon-optimized constructs directly from companies such as GenScript or Twist Bioscience. Alternatively, if the native host strain is available and culturable, it can be grown on an appropriate medium and its genomic DNA (gDNA) extracted using a commercial gDNA extraction kit. Genes of interest are then amplified from gDNA by PCR with primers that introduce 5′ and 3′ restriction sites appropriate for ligation into an expression vector (in this case a pET28-MBP expression vector). The resulting PCR products and host vectors are digested, purified, and ligated using T4 DNA ligase following conventional molecular biology protocols. If genes of interest are ordered for custom synthesis, PCR amplification can be carried out directly on the received plasmid, and the product then subcloned into an appropriate expression vector. The resulting constructs were validated through Sanger sequencing.
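The primer design in this paragraph (restriction sites added to the gene-specific ends) can be sketched in Python as follows. The enzyme sites shown (BamHI, GGATCC; XhoI, CTCGAG), the 4-nt clamp, and the annealing length are hypothetical placeholders: use the sites present in your expression vector's cloning region, and confirm that the insert stays in frame with the N-terminal MBP tag.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def cloning_primers(gene: str, site_5: str, site_3: str,
                    anneal_len: int = 20, clamp: str = "GCGC") -> tuple:
    """Return (forward, reverse) primers, 5'->3', adding restriction sites.

    gene: coding sequence 5'->3' as it should appear in the construct.
    site_5 / site_3: recognition sequences placed upstream / downstream.
    clamp: extra 5' bases, since many enzymes cut poorly at the very end of DNA.
    """
    gene, site_5, site_3 = gene.upper(), site_5.upper(), site_3.upper()
    for site in (site_5, site_3):
        if site in gene or reverse_complement(site) in gene:
            raise ValueError(f"{site} occurs inside the gene; choose another enzyme")
    forward = clamp + site_5 + gene[:anneal_len]
    # Reverse primer = reverse complement of (gene end + downstream site
    # + reverse complement of the clamp).
    reverse = clamp + reverse_complement(site_3) + reverse_complement(gene[-anneal_len:])
    return forward, reverse


# Hypothetical example: BamHI (GGATCC) upstream, XhoI (CTCGAG) downstream.
fwd, rev = cloning_primers("ATGAAACTGCTGGTGATTAGCTAA", "GGATCC", "CTCGAG", anneal_len=10)
print(fwd)
print(rev)
```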

4.3. Expression protocol for RRE domains

Timing: 2–3 days

  1. Transform the pET28-MBP-RRE plasmid into BL21 (DE3) chemically competent cells using standard heat shock procedures.

  2. Grow cells overnight (~16 hours) on LB agar plates containing 50 μg/mL kanamycin at 37 °C. If an expression vector other than pET28 is employed, the antibiotic resistance cassette it carries determines which antibiotic should be added to the growth medium.

  3. The following day, pick an isolated colony and inoculate a 10 mL starter culture of liquid LB containing 50 μg/mL kanamycin and 34 μg/mL chloramphenicol. Chloramphenicol can be omitted when using a regular BL21 strain of E. coli that lacks the pACYC-based rare codon plasmid.

  4. Incubate the starter culture overnight (16–18 hours) at 37 °C with moderate shaking.

  5. Inoculate 1 L of sterile LB medium with the entire 10 mL starter culture. This larger culture should contain the same concentrations of kanamycin and chloramphenicol as the starter culture.

  6. Grow the 1 L culture at 37 °C with moderate shaking until it reaches the target optical density: an OD600 of 0.4–0.6 for cultures expressing RRE domains, or a higher density of 0.8–1.0 for cultures expressing precursor peptides. This will take several hours; OD600 can be measured periodically using any standard spectrophotometer.

  7. After the proper optical density is reached, cool the cultures in ice water for 20 minutes or place in a 4 °C cold room.

  8. Induce protein expression by adding isopropyl β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 0.4 mM. For precursor peptides, induce with 1 mM IPTG.

  9. Incubate RRE cultures overnight (16 hours) at 22 °C with shaking at 220 rpm. Because precursor peptides are highly susceptible to proteolytic cleavage, cultures expressing them should instead be incubated for only 1–2 hours at a higher temperature, usually 37 °C.

  10. Harvest cells by centrifugation at 3,000 × g for 10 minutes. Discard spent media, resuspend cell pellet in 50 mL of chilled PBS, and harvest cells once more by centrifugation at 3,000 × g for 10 minutes. Resuspension of the cell pellet in PBS provides an opportunity to transfer cells to a 50 mL conical tube, which is suitable for storage at −80 °C.

  11. Discard the supernatant. Flash-freeze cell pellets in liquid nitrogen and store at −80 °C until use.

4.4. Purification protocol for RRE domains and precursor peptides

Timing: 6–8 hours

  1. Thaw cell pellets on ice for 30 minutes, then resuspend them in pre-chilled lysis buffer. Add lysozyme (4 mg/mL), 1× protease inhibitor cocktail, and 10 mg of the protease inhibitor phenylmethylsulfonyl fluoride (PMSF); PMSF is needed for precursor peptide purification and optional for RRE purification.

  2. Resuspend cell pellet fully in lysis buffer by using a vortex mixer.

  3. Lyse cells with an ultrasonic cell disruptor (sonicator) for 30 second increments with a power output of 10–12 W. Alternate sonication intervals with 10-minute periods of gentle rocking at 4 °C until cell pellets are fully broken up and solution appears cloudy. It is easiest to perform this step in a cold room; otherwise, cells should be kept on ice whenever possible.

  4. Remove insoluble cellular components by centrifugation at 20,000 × g for 1 hour at 4 °C.

  5. Load affinity chromatography columns with amylose resin (10 mL per 1 L of original cell culture) and pre-equilibrate each column by running through 20 mL of chilled lysis buffer supplemented with 0.5 mM TCEP.

  6. Load the entire cell lysate supernatant onto the pre-equilibrated column and allow it to flow through by gravity. It is good practice to collect the flow-through in case the analyte fails to bind the column.

  7. Wash the column with 40 mL of wash buffer, then elute target protein/peptide into a clean falcon tube using 30 mL of chilled elution buffer.

  8. Concentrate the eluate using a 30 kDa MWCO Amicon Ultra centrifugal filter.

  9. Perform buffer exchange with 30 mL of chilled protein storage buffer prior to determination of final concentration and storage.

  10. Determine a rough concentration of the eluted protein/peptide by absorbance at 280 nm, using an extinction coefficient predicted with the ExPASy ProtParam tool (http://web.expasy.org/protparam). A Bradford or bicinchoninic acid (BCA) colorimetric protein concentration assay should also be performed. If an accurate protein concentration is needed, dry weight or quantitative amino acid analysis should be conducted. Evaluate protein purity by Coomassie-stained SDS-PAGE with a molecular weight standard to confirm full-length expression.
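Step 10 relies on an extinction coefficient predicted from sequence. As we understand ProtParam's method, it sums 5500 M−1 cm−1 per Trp, 1490 per Tyr, and 125 per cystine; the Python sketch below makes that assumption and adds the Beer-Lambert conversion to concentration. Cystines are ignored by default because the buffers above contain TCEP, and for MBP-tagged constructs the sequence passed in must include the tag. A sequence with no Trp or Tyr has no predicted A280 signal, so the sketch raises an error in that case. Function names and example values are hypothetical.

```python
def extinction_coefficient_280(seq: str, cystines: bool = False) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from sequence.

    cystines=False assumes all Cys are reduced (e.g., in TCEP-containing buffer).
    """
    seq = seq.upper()
    eps = 5500 * seq.count("W") + 1490 * seq.count("Y")
    if cystines:
        eps += 125 * (seq.count("C") // 2)  # one cystine per Cys pair
    return eps


def concentration_from_a280(a280: float, eps: float, mw_da: float,
                            path_cm: float = 1.0) -> tuple:
    """Return (micromolar, mg/mL) from Beer-Lambert: A = eps * c * l."""
    if eps <= 0:
        raise ValueError("no Trp/Tyr: A280 is uninformative; use a colorimetric assay")
    molar = a280 / (eps * path_cm)
    return molar * 1e6, molar * mw_da  # g/L is numerically equal to mg/mL


eps = extinction_coefficient_280("MWYYCC")  # toy sequence
print(eps, concentration_from_a280(0.848, eps, mw_da=10000.0))
```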

Notes:
  1. The best results will come from performing the cell lysis and affinity chromatography steps above in a 4 °C cold room. If this is not possible, protein solutions should be kept on ice whenever possible, including directly before and after column chromatography.

  2. RRE domains are highly soluble and can generally be concentrated to up to 20 mg/mL before there is risk of precipitation.

  3. If columns become clogged during affinity chromatography, flow can be re-established by gently resuspending the amylose resin using a pipette. This can be done without affecting overall yield.

  4. The above protocol will generally yield protein purities >95%, as determined by band intensity on an SDS-PAGE gel. If additional purification is required for a particular application, either size-exclusion or ion-exchange chromatography can be employed.

5. Fluorescence polarization binding assays

We have found that combining site-directed mutagenesis (SDM) with fluorescence polarization (FP) assays is a straightforward and relatively high-throughput way to identify key binding residues (Burkhart et al., 2015; Zhang et al., 2016). First, residues suspected to contribute to key binding interactions are selected; there are several strategies for selecting target residues, outlined in this section. Then, the plasmid construct for either the MBP-tagged precursor peptide or the RRE is subjected to SDM. Variants are then expressed and purified following the protocols in the previous section. Finally, binding parameters are measured using either direct FP or competition FP assays.

Because the FP signal is enhanced by a large molecular weight difference between protein and ligand, we have had the most success with assays where the RRE is kept as an MBP-tagged protein, but the leader (or precursor) peptide is not. For this purpose, we employ a pET28-MBP expression plasmid with a TEV-cleavable linker, which is a peptide motif recognized and cleaved by the tobacco etch virus protease (Raran-Kurussi et al., 2017). The combination of nanomolar affinity of the RRE for its cognate LP and the large size difference between protein and ligand makes FP an effective and rapid way to assess binding for any RRE-dependent RiPP pathway.

5.1. Materials, reagents, and equipment

Equipment

  1. Thermo Scientific Sorvall Legend micro 17 centrifuge

  2. Synergy H4 hybrid plate reader (BioTek)

  3. Multi-channel pipette, 0.5–10 μL and 20–200 μL

Materials

  1. 96-well microplates (Corning)

  2. 384-well black polystyrene microplates (Corning)

Reagents

  1. Q5 DNA polymerase (New England BioLabs, MA)

  2. DpnI restriction enzyme (New England BioLabs, MA)

  3. pET28b-MBP-precursor peptide plasmid (constructed as described in Section 4)

  4. pET28b-MBP-RRE plasmid (constructed as described in Section 4)

  5. Fluorescein isothiocyanate (Sigma Aldrich)

5.2. Strategy and protocol for site-directed mutagenesis

Timing: 2–3 days

  1. Use Q5 DNA polymerase to amplify the pET28-MBP plasmid containing either the target RRE domain or the leader peptide. Design primers that carry the desired mutation flanked by 15 base pairs of template-matching sequence on each side (upstream and downstream of the targeted region). The PCR cycling protocol is shown in Table 2.

  2. Following PCR, digest any remaining parental plasmid by adding 1 μL DpnI restriction enzyme and incubating at 37 °C for 2–4 hours.

  3. Transform chemically competent E. coli DH5α cells with the mutated plasmid by standard heat shock procedures. Plate transformed cells onto LB agar plates supplemented with 50 μg/mL kanamycin, and grow colonies at 37 °C overnight (12–16 hours).

  4. Pick three distinct colonies and use each to inoculate a 10 mL LB culture containing 50 μg/mL kanamycin. Grow the cultures at 37 °C with moderate agitation for 12–16 hours.

  5. Once cultures are visibly cloudy (OD600 of ~2), harvest cells in microcentrifuge tubes (4,000 × g for 10 minutes). Isolate plasmid DNA from the cell pellets with a commercial miniprep kit according to the manufacturer's protocol.

  6. Verify the sequence of the mutated plasmid using primers that flank the target region. For the pET28-MBP vector, we use an MBP forward primer and a T7 terminator primer with the following sequences:

    MBP F primer: GAGGAAGAGTTGGCGAAAGATCCAGGTA

    T7 R primer: GCTAGTTATTGCTCAGCGG

  7. RRE and peptide variants should be expressed and purified using the protocols outlined in the previous section.
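The primer design in step 1 (the mutation flanked by 15 bp of template-matching sequence on each side) can be sketched in Python as below. This assumes a fully complementary primer pair, a common SDM design; the chapter does not state whether its primers overlap fully or partially. The template string, CDS start offset, and function name are hypothetical, and GCG is used only as an example alanine codon for an alanine scan.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def sdm_primers(template: str, cds_start: int, residue: int,
                new_codon: str, flank: int = 15) -> tuple:
    """Complementary mutagenic primers for one codon substitution.

    template: plasmid (or insert) sequence, 5'->3' on the coding strand.
    cds_start: 0-based index of the first base of the start codon in template.
    residue: 1-based residue number within the coding sequence.
    new_codon: replacement codon; flank: matching bases on each side.
    """
    template, new_codon = template.upper(), new_codon.upper()
    if len(new_codon) != 3:
        raise ValueError("new_codon must be 3 nt")
    start = cds_start + 3 * (residue - 1)
    end = start + 3
    if start - flank < 0 or end + flank > len(template):
        raise ValueError("not enough flanking template; include more vector sequence")
    forward = template[start - flank:start] + new_codon + template[end:end + flank]
    return forward, reverse_complement(forward)


# Hypothetical 5-codon ORF embedded in vector-like sequence;
# mutate residue 2 (AAA, Lys) to Ala (GCG).
template = "G" * 20 + "ATGAAACTGCTGGTG" + "C" * 20
fwd, rev = sdm_primers(template, cds_start=20, residue=2, new_codon="GCG")
print(fwd, len(fwd))
```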

Table 2.

PCR cycling protocol used for site-directed mutagenesis. The annealing temperature of 65 °C should work for most amplifications using long primers designed as in Section 5.2, but it can be adjusted if the initial PCR fails.

| Step | Temperature (°C) | Duration (min:sec) |
|---|---|---|
| 1 | 98 | 10:00 |
| 2 | 98 | 0:30 |
| 3 | 65 | 0:30 |
| 4 | 72 | 16:00 |
| 5 | Repeat steps 2–4 for 10 cycles | |
| 6 | 98 | 0:30 |
| 7 | 65 | 0:30 |
| 8 | 72 | 16:00 (+15 s/cycle) |
| 9 | Repeat steps 6–8 for 20 cycles | |
| 10 | 72 | 30:00 |
| 11 | 12 | Hold until use |

5.3. FITC-labeling of precursor peptides for use in binding assays

A stock solution of fluorescently labeled precursor peptide must be prepared for the fluorescence polarization (FP) and competition FP experiments in the following sections. Synthetic peptides containing the leader peptide of interest can be ordered from GenScript. At higher cost, fluorescently labeled peptides can be ordered directly, in which case this protocol can be skipped entirely. We have had success using fluorescein isothiocyanate (FITC) as a fluorescent tag, due to its high extinction coefficient and compatibility with the filters on most commercial plate readers. The example protocol herein uses the thiomuracin leader peptide, but it can be adapted to any leader peptide that can be purified by standard reverse-phase HPLC.

Timing: 3 days

  1. Dissolve the synthetic leader peptide in dimethyl sulfoxide (DMSO) to a final concentration of 10 mg/ml.

  2. Dissolve FITC in 10% aqueous DMSO to a final concentration of 6 mg/mL. Adjust the pH of the resulting solution to 9.5 using NaHCO3 and Na2CO3.

  3. Combine 100 μL of dissolved peptide with 300 μL of FITC solution. Allow N-terminal FITC labeling to proceed overnight (12–16 hours) at 25 °C, with the reaction tube wrapped in foil to prevent photobleaching of the FITC label.

  4. Quench the reaction by adding NH4Cl to a final concentration of 50 mM, and let the quenched reaction sit in darkness for at least 1 hour.

  5. Evaluate labeling by MALDI-TOF MS using the methods in Section 6.3. FITC labeling should result in an overall mass shift of +389.4 Da relative to the mass of the unmodified leader peptide.

  6. Purify the labeled peptide using a CombiFlash instrument equipped with a reverse-phase C18 column (RediSep Rf 4.3 g). General separation conditions are a 5–90% methanol gradient, with 10 mM NH4HCO3 as the aqueous phase.

  7. Spot fractions on a stainless-steel target and analyze by MALDI-TOF MS. Combine fractions containing labeled peptide and concentrate them using a rotary evaporator.

  8. Re-dissolve labeled peptide in 5% methanol. Remove insoluble components by filtering mixture through a 0.22 μm syringe filter.

  9. Inject the peptide solution onto a Betasil C18 HPLC column (250 × 4.6 mm). Separate at a flow rate of 1 mL/min using a gradient of 20–98% methanol, with 100 mM NH4HCO3 as the aqueous phase.

  10. Analyze fractions by MALDI-TOF MS for presence of labeled peptide. Pool fractions containing the desired peptide and concentrate using rotary evaporation.

  11. Dissolve the dried peptide in binding buffer to a final concentration of 250 nM. This is the stock solution used in the FP and competition FP protocols below. The concentration of labeled peptide can be estimated from the absorbance at 495 nm using the extinction coefficient of FITC (73,000 M−1 cm−1).
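The quantities in this protocol can be checked with a few lines of arithmetic. The Python sketch below computes the FITC:peptide molar ratio implied by steps 1–3, the expected masses of labeled species (step 5), and the stock concentration and dilution volume from A495 (step 11). The peptide molecular weight and absorbance values are hypothetical; the FITC masses are those of C21H11NO5S (average 389.38 Da, consistent with the 389.4 Da shift above; monoisotopic 389.036 Da). Because isothiocyanates also react with lysine side-chain amines, peaks corresponding to two or more FITC additions may indicate extra labeling of lysine-containing leader peptides.

```python
FITC_AVG = 389.38      # Da, average mass; the whole molecule is added on labeling
FITC_MONO = 389.036    # Da, monoisotopic mass, for isotope-resolved spectra
FITC_EPS_495 = 73_000  # M^-1 cm^-1, value used in step 11


def fitc_molar_excess(peptide_mw: float, pep_mg: float = 1.0,
                      fitc_mg: float = 1.8) -> float:
    """Moles FITC per mole peptide. Defaults follow steps 1-3:
    100 uL of 10 mg/mL peptide (1 mg) + 300 uL of 6 mg/mL FITC (1.8 mg)."""
    return (fitc_mg / FITC_AVG) / (pep_mg / peptide_mw)


def expected_labeled_masses(unlabeled: float, max_labels: int = 2,
                            mono: bool = False) -> list:
    """Masses for 0..max_labels FITC additions."""
    shift = FITC_MONO if mono else FITC_AVG
    return [unlabeled + n * shift for n in range(max_labels + 1)]


def labeled_peptide_uM(a495: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert concentration of the FITC-labeled peptide, in uM."""
    return a495 / (FITC_EPS_495 * path_cm) * 1e6


def stock_volume_ul(stock_uM: float, target_nM: float = 250.0,
                    final_ul: float = 1000.0) -> float:
    """C1*V1 = C2*V2: uL of concentrated stock to reach target_nM in final_ul."""
    return (target_nM / 1000.0) * final_ul / stock_uM


print(round(fitc_molar_excess(4000.0), 1))   # hypothetical 4 kDa leader peptide
print(expected_labeled_masses(4000.0))
print(stock_volume_ul(labeled_peptide_uM(0.73)))
```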

Notes
  1. Use of fluorescein is advantageous because of its high extinction coefficient, compatibility with excitation/emission filters for most plate readers, and its sequence-independent labeling of the peptide N-terminus. Addition of an N-terminal Gly-Gly linker to the synthesized leader peptide is recommended to spatially separate fluorescein from the RRE and obviate any potential binding interference.

  2. We have had the most success with a single Gly-Gly linker. A longer linker gives the fluorophore too much flexibility relative to the bound leader peptide, which reduces the polarization change observed upon binding.

5.4. Fluorescence polarization (FP) binding assay

Timing: 2–4 hours

  1. Dilute a master solution of the MBP-tagged RRE of choice to an appropriate starting concentration. The starting concentration should be chosen to center the FP binding curve around the predicted KD of the RRE:LP interaction (~60–100 nM for most RRE:LP binding interactions). For the example outlined in Figure 8, we have chosen a starting concentration of 3.2 μM.

  2. Perform 11 consecutive 2-fold dilutions of the MBP-tagged RRE in binding buffer. This is most easily done in a 96-well plate.

  3. Transfer the serially diluted MBP-RRE solutions to a non-binding-surface, 384-well black microplate. To each well, add FITC-labeled precursor peptide to a final concentration of 25 nM, mixing very gently by pipetting to avoid forming bubbles. Each unique RRE:LP interaction should be assayed in triplicate.

  4. Let the binding partners equilibrate for 30 minutes with shaking at 25 °C. If a microplate shaker is not available, plates can also be equilibrated for 1 hour without shaking and will result in the same binding curve.

  5. Collect FP data using a plate reader with the appropriate excitation and emission filters installed (λex = 485 nm; λem = 538 nm for FITC labels). Our data were collected using a Synergy H4 Hybrid plate reader with Gen5 software (BioTek).

  6. Calculate polarization at each concentration using the following equation, where P is the polarization, I∥ is the emission fluorescence intensity parallel to the excitation plane, and I⊥ is the emission fluorescence intensity perpendicular to it. G is a plate-reader-specific correction factor for the instrument's differential sensitivity to the two polarization directions.
    $$P = \frac{I_{\parallel} - G\,I_{\perp}}{I_{\parallel} + G\,I_{\perp}}$$
  7. Fit the data in OriginPro 9.1 (OriginLab) with a non-linear dose-response curve to estimate the dissociation constant (KD). For most RRE:leader peptide interactions, the KD will lie in the mid-nanomolar range.
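Steps 2, 6, and 7 can be reproduced outside OriginPro as a quick check. The Python sketch below builds the 2-fold dilution series, computes polarization from the two intensities (in mP, i.e., P × 1000), and fits a simple 1:1 hyperbolic binding model, mP = bottom + span × c/(KD + c), by a log-spaced grid search over KD with exact least squares for bottom and span at each trial KD. This is a swapped-in model, not the dose-response fit used in the chapter, and it assumes that the labeled peptide (25 nM) is not substantially depleted by binding; when KD approaches the labeled peptide concentration, a quadratic (ligand-depletion) model is more rigorous. The data here are synthetic and the function names hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def polarization_mP(i_par: float, i_perp: float, g: float = 1.0) -> float:
    """Polarization in mP: 1000 * (I_par - G*I_perp) / (I_par + G*I_perp)."""
    return 1000.0 * (i_par - g * i_perp) / (i_par + g * i_perp)


def twofold_series(start: float, n_dilutions: int = 11) -> List[float]:
    """Starting concentration followed by n consecutive 2-fold dilutions."""
    return [start / 2 ** k for k in range(n_dilutions + 1)]


def fit_kd(conc: Sequence[float], mp: Sequence[float],
           log_step: float = 0.001) -> Tuple[float, float, float]:
    """Fit mP = bottom + span * c / (KD + c); return (KD, bottom, span).

    KD is scanned on a log10 grid extending 100-fold beyond the tested range;
    for each trial KD the model is linear in bottom and span, so those are
    solved exactly by ordinary least squares.
    """
    n = len(conc)
    y_mean = sum(mp) / n
    lo = math.log10(min(conc) / 100.0)
    hi = math.log10(max(conc) * 100.0)
    best = (math.inf, math.nan, math.nan, math.nan)
    for k in range(int((hi - lo) / log_step) + 1):
        kd = 10 ** (lo + k * log_step)
        f = [c / (kd + c) for c in conc]  # fraction of labeled peptide bound
        f_mean = sum(f) / n
        sxx = sum((fi - f_mean) ** 2 for fi in f)
        if sxx == 0.0:
            continue
        span = sum((fi - f_mean) * (yi - y_mean) for fi, yi in zip(f, mp)) / sxx
        bottom = y_mean - span * f_mean
        sse = sum((yi - bottom - span * fi) ** 2 for fi, yi in zip(f, mp))
        if sse < best[0]:
            best = (sse, kd, bottom, span)
    return best[1], best[2], best[3]


# Synthetic example: 3.2 uM top concentration, true KD = 0.08 uM (80 nM).
conc_uM = twofold_series(3.2)
readings = [50.0 + 150.0 * c / (0.08 + c) for c in conc_uM]
kd, bottom, span = fit_kd(conc_uM, readings)
print(f"KD ~ {kd * 1000:.0f} nM, free {bottom:.0f} mP, bound {bottom + span:.0f} mP")
```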

Figure 8.

Analysis of recognition sequence binding residues in the thiomuracin system via fluorescence polarization assays. (A) Fluorescence polarization binding curves for wild-type thiomuracin leader peptide (TbtA) binding to the RRE of TbtF versus the fused C-terminal portion of the protein. In all cases, error bars represent a standard deviation of the mean (n=3). (B) Competition binding curve for wild-type TbtA leader competing against a FITC-labeled leader peptide. In the leader sequence shown, important binding residues (>2-fold perturbation of binding upon mutation) are highlighted in blue, while critical binding residues (ablation of binding upon mutation) are highlighted in green. (C) Competitive FP binding curve for a low affinity variant of the TbtA leader peptide. This data is representative of a variant with severely impacted binding and an IC50 outside of the range of the assay parameters. (D) Summary of competitive FP data for alanine scan of the thiomuracin leader peptide. The negative numbers indicate residue position N-terminal to the leader peptide cleavage site. Figure adapted from (Zhang et al., 2016).

Notes:
  1. An ideal FP binding curve is sigmoidal on a logarithmic concentration axis, flattening out at both low and high concentrations of the MBP-tagged RRE. These plateaus represent the polarization of unbound leader peptide and of fully saturated binding, respectively. If either plateau is missing from a binding curve, adjust the starting concentration of MBP-tagged RRE to center the curve around the observed dissociation constant. Although many RRE:LP interactions are of nanomolar affinity, there are known examples of weaker RRE interactions (Melby et al., 2012). In these cases, a higher initial concentration of MBP-tagged RRE should be employed.

5.5. Competition FP assay

Timing: 2–4 hours

  1. Competition FP can be used to assess the effect of recognition sequence mutations on leader peptide affinity for the RRE. The same FITC-labeled wild-type leader peptide used in Section 5.4 can be employed here.

  2. Perform 11 consecutive 2-fold dilutions of the MBP-tagged leader peptide variant in binding buffer. This is most easily done in a 96-well plate. As in Section 5.4, the initial concentration of leader peptide variant should be chosen to center the inhibition curve around the expected IC50 value. We have found 20 μM to be a reasonable starting concentration for many RRE:leader peptide interactions.

  3. Mix each well of serially diluted leader peptide variant with MBP-RRE (to a final concentration of 4 μM) and 3 μL of the 250 nM FITC-labeled leader peptide stock solution. Transfer the mixed solutions to a 384-well black microplate. Let mixtures equilibrate at 25 °C, either with shaking for 30 minutes or without shaking for 1 hour. All assays should be performed in triplicate.

  4. Fluorescence polarization data can be collected and processed in an identical manner to Section 5.4. When using OriginPro to estimate the IC50 value, data should be fit through a non-linear regression analysis.

  5. Ki, the inhibition constant, can be calculated from the estimated IC50 value using the following equation, where L50 is the concentration of FITC-labeled peptide and P0 is the final concentration of MBP-tagged RRE used in step 3.
    $$K_i = \frac{\mathrm{IC}_{50}}{1 + \frac{L_{50}}{K_D} + \frac{P_0}{K_D}}$$
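A worked example of the Ki correction, as a Python sketch with hypothetical values (IC50 = 4 μM, L50 = 25 nM, KD = 80 nM, P0 = 3 μM). It illustrates that when P0 is much larger than KD, the P0/KD term dominates the denominator, so Ki can be far smaller than the measured IC50. As we understand the derivation behind this equation, L50 and P0 are formally free rather than total concentrations; using the total concentrations, as in step 5, is an approximation.

```python
def ki_from_ic50(ic50: float, l50: float, kd: float, p0: float) -> float:
    """K_i = IC50 / (1 + L50/K_D + P0/K_D); all inputs in the same units."""
    return ic50 / (1.0 + l50 / kd + p0 / kd)


# Hypothetical values in nM: IC50 = 4 uM, 25 nM FITC peptide, KD = 80 nM, P0 = 3 uM.
ki = ki_from_ic50(ic50=4000.0, l50=25.0, kd=80.0, p0=3000.0)
print(f"Ki ~ {ki:.0f} nM")  # denominator = 1 + 0.3125 + 37.5 = 38.8125
```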

6. Enzyme activity assays and mass spectrometric analysis

While the FP-based binding assays described in the previous section are suitable for confirming and quantifying binding of an RRE to its cognate precursor peptide, it can also be useful to assess RRE binding in the broader context of the PTMs it supports. RREs do not themselves install PTMs on their cognate precursor; rather, they act as components of larger enzymatic complexes, positioning the core peptide region in the active site of the PTM-installing enzymes. When the RRE is fused to a larger enzymatic domain, the relationship between leader peptide binding and the associated PTMs is more obvious. When the RRE is a discrete protein, it may be less apparent which partner enzyme(s) form the PTM-installing complex. If potential partner enzymes have been identified and are amenable to heterologous expression and purification using the methods outlined in Section 4, we have found in vitro enzymatic activity assays using a “leave one out” approach to be a fast way to assess which PTMs in each RiPP biosynthetic pathway are RRE-dependent. Analysis is carried out by MALDI-TOF mass spectrometry. In the section below, we outline an example activity-based assay for assessing the role of the RRE in thiopeptide biosynthesis (Hudson et al., 2015). These conditions are a starting point for assessing the RRE’s role in novel RiPP biochemistry; some parameters, such as enzyme loading, equilibration time, and reaction additives, will necessarily change depending on the type of RiPP chemistry involved. For example, RREs associated with radical S-adenosylmethionine (rSAM) enzymes will be much more difficult to assess through in vitro activity assays because rSAM enzymes are oxygen-sensitive (Oberg et al., 2022).

6.1. Materials, reagents, and equipment

Equipment

  1. Bruker UltrafleXtreme mass spectrometer (Bruker Daltonics)

  2. Speedvac concentrator (Optional; Thermo Fisher)

Materials

  1. C18 ZipTip pipette tips (Millipore Sigma)

Reagents

  1. TEV protease (prepared in-house through Ni-NTA affinity chromatography and stored at −80 °C until use)

  2. Synthetase buffer: 50 mM Tris-HCl at pH 7.5, 125 mM NaCl, 20 mM MgCl2, 5 mM DTT, 5 mM ATP

  3. HPLC-grade acetonitrile (Millipore Sigma)

  4. ZipTip solutions: Solution A (100% HPLC-grade acetonitrile), solution B (50% acetonitrile, 50% H2O, 0.1% formic acid), and solution C (H2O with 0.1% formic acid)

  5. ProteoMass Peptide MALDI-MS calibration kit (Millipore Sigma)

6.2. Activity assay protocol: thiopeptide azole/azoline installation

Timing: 6–8 hours

  1. All reaction components should be expressed and purified separately using the protocols outlined in Section 4. For this activity-based assay, there are four reaction components: the full-length precursor peptide (TbtA), the RRE-containing protein (TbtF), the azoline-installing YcaO protein (TbtG), and a dehydrogenase (TbtE) that oxidizes azolines to azoles.

  2. Digest all MBP-tagged components using TEV protease. Add protein components (TbtE/F/G) to a final concentration of 4 μM, peptide to a final concentration of 100 μM, and TEV (1:10 molar ratio). Dilute reactions to 50 μL using synthetase buffer. Allow digestion to proceed for 30 minutes at 25 °C without agitation.

  3. After TEV digestion is complete, add ATP to a final concentration of 5 mM to initiate azoline/azole formation. Let the reaction proceed for 5–6 hours at 25 °C.

  4. Desalt the reaction for analysis by MALDI-TOF MS using the standard ZipTip procedure, outlined in the following section.

  5. (Optional) If the ZipTip clogs in the previous step, first precipitate larger protein components by adding acetonitrile to 50% (v/v), which also quenches the reaction. Remove precipitated proteins by centrifuging the reaction mixture (17,000 × g, 10 minutes, 25 °C).
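The “leave one out” design described at the start of this section, combined with the concentrations in step 2, can be turned into a pipetting plan. A Python sketch, assuming hypothetical stock concentrations (1 mM TbtA; 50 μM for each enzyme) and omitting TEV protease for brevity; the TEV volume would be added the same way and subtracted from the buffer volume. The sketch generates the complete reaction plus each single-enzyme dropout, with the precursor peptide present in every condition. Function names are hypothetical.

```python
from typing import Dict, List


def leave_one_out(enzymes: List[str]) -> Dict[str, List[str]]:
    """Complete reaction plus one reaction omitting each enzyme in turn."""
    conditions = {"complete": list(enzymes)}
    for omitted in enzymes:
        conditions[f"minus {omitted}"] = [e for e in enzymes if e != omitted]
    return conditions


def pipetting_plan(stocks_uM: Dict[str, float], finals_uM: Dict[str, float],
                   total_ul: float = 50.0) -> Dict[str, float]:
    """uL of each stock (C1*V1 = C2*V2), topped up with synthetase buffer."""
    plan = {name: finals_uM[name] * total_ul / stocks_uM[name] for name in finals_uM}
    buffer_ul = total_ul - sum(plan.values())
    if buffer_ul < 0:
        raise ValueError("stocks are too dilute for this reaction volume")
    plan["synthetase buffer"] = buffer_ul
    return plan


# Hypothetical stock concentrations (uM); final concentrations follow step 2.
stocks = {"TbtA": 1000.0, "TbtE": 50.0, "TbtF": 50.0, "TbtG": 50.0}
finals = {"TbtA": 100.0, "TbtE": 4.0, "TbtF": 4.0, "TbtG": 4.0}

for label, enzymes in leave_one_out(["TbtE", "TbtF", "TbtG"]).items():
    present = ["TbtA"] + enzymes  # precursor peptide is in every reaction
    plan = pipetting_plan(stocks, {k: finals[k] for k in present})
    print(label, {k: round(v, 1) for k, v in plan.items()})
```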

6.3. MALDI-TOF-MS qualitative analysis of modified products

Timing: 30 minutes

  1. Prime a ZipTip by gently pipetting up and down in each solution in sequence, from solution A to solution C, using a 10 or 20 μL micropipette. Avoid forming air bubbles during this process.

  2. Load the reaction mixture onto the pre-equilibrated ZipTip by pipetting up and down 10 times into the reaction supernatant, avoiding disruption of the pelleted protein precipitate.

  3. Desalt the sample by washing 3 times with 10 μL of solution C. Discard the washes.

  4. Elute target peptides from the ZipTip using 3 μL of 70% aqueous acetonitrile saturated with sinapinic acid. Deposit the eluted peptide onto a stainless-steel MALDI target plate and allow the spot to dry before analysis. Drying can be sped up with a small fan, and lightly scraping the target’s surface with the pipette tip during elution can encourage crystallization.

  5. Analyze samples by MALDI-TOF mass spectrometry using a Bruker UltrafleXtreme instrument operating in reflector positive-ion mode. Calibrate the instrument with the MALDI calibration kit before data collection. Set the detection mass window to cover the expected product mass; the molecular weight of the unmodified precursor peptide can be obtained with the Expasy ProtParam tool (https://web.expasy.org/protparam/).

  6. Analyze data using the Bruker FlexAnalysis software (this software is only available for machines running a Windows OS).
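When choosing the mass window in step 5, it helps to list the expected [M+H]+ values for the unmodified peptide and for each possible number of heterocycles. Cyclodehydration to an azoline removes one H2O (about 18.011 Da), and oxidation of an azoline to an azole removes a further H2 (about 2.016 Da). The sketch below sums standard monoisotopic residue masses. The peptide sequence is hypothetical; substitute the peptide actually present after TEV cleavage, including any residues the cleavage leaves behind. ProtParam reports an average mass, and for larger peptides the monoisotopic peak can be weak relative to the rest of the isotope envelope, so compare the two with care.

```python
# Standard monoisotopic residue masses (Da) for the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O: added once for the free termini, lost once per azoline
H2 = 2.01565       # additionally lost when an azoline is oxidized to an azole
PROTON = 1.00728   # added for the singly protonated [M+H]+ ion


def mh_plus(sequence, azolines=0, azoles=0):
    """Monoisotopic [M+H]+ of a linear peptide carrying the given numbers of
    azolines (-H2O each) and azoles (-H2O and -H2 each)."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
    neutral -= (azolines + azoles) * WATER + azoles * H2
    return neutral + PROTON


# Hypothetical sequence; only Cys residues are assumed to be modified,
# as for TbtG in Figure 9.
peptide = "GSCAVCLTCG"
for n in range(peptide.count("C") + 1):
    print(f"{n} azole(s): [M+H]+ = {mh_plus(peptide, azoles=n):.4f}")
```

Mixed states, such as two azoles plus one azoline, are computed by passing both `azoles=` and `azolines=`.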

6.4. ESI-HR-MS/MS analysis of modified products

Timing: 1 hour

  1. Desalt the reaction supernatant with ZipTips as described above, but elute into 30 μL of 75% aqueous acetonitrile. Repeat this process with 3–5 separate ZipTips to obtain a concentrated sample of the analyte peptides.

  2. Dry the desalted sample in a SpeedVac concentrator set to 25 °C.

  3. Redissolve the dried sample in 35% aqueous acetonitrile containing 0.1% formic acid. This solution can be infused directly into the Orbitrap Fusion ESI-MS using a TriVersa NanoMate 100.

  4. Before infusing the sample, calibrate and tune the ESI-MS. We calibrated using the Pierce LTQ Velos ESI Positive Ion Calibration Solution.

  5. Analyze data using Thermo Fisher Xcalibur or equivalent software, averaging spectra across the acquisition time.
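High-resolution ESI spectra usually show a peptide at several charge states, so it is convenient to compute the expected [M+zH]z+ ions from the neutral monoisotopic mass M of the expected product and to report each match as a ppm error. The sketch below is self-contained; the neutral mass, the charge range of 1 to 8, and the observed peak in the usage lines are hypothetical.

```python
PROTON = 1.00728  # mass of H+ in Da


def mz_for_charge(neutral_mass, z):
    """m/z of the [M+zH]z+ ion for a neutral monoisotopic mass M."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return (neutral_mass + z * PROTON) / z


def charge_ladder(neutral_mass, charges=range(1, 9)):
    """Expected m/z for each charge state in `charges`."""
    return {z: mz_for_charge(neutral_mass, z) for z in charges}


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def best_match(observed_mz, neutral_mass, charges=range(1, 9)):
    """Charge state whose expected m/z lies closest to the observed peak, with its ppm error."""
    ladder = charge_ladder(neutral_mass, charges)
    z = min(ladder, key=lambda c: abs(ladder[c] - observed_mz))
    return z, ppm_error(observed_mz, ladder[z])


mass = 4321.0  # hypothetical neutral monoisotopic mass of a modified product
for z, mz in charge_ladder(mass).items():
    print(f"z={z}: m/z {mz:.4f}")
z, err = best_match(865.2100, mass)  # hypothetical observed peak
print(f"closest: z={z}, {err:+.1f} ppm")
```

Matching only the nearest charge state is a simplification. In practice, confirming the assignment from the isotope spacing (about 1/z in m/z) is more robust.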

7. Summary

In this chapter, we have outlined the main challenges and opportunities associated with bioinformatic prediction of RiPP recognition elements (RREs), a prevalent domain in prokaryotic RiPP biosynthesis. RRE domains are highly diverse in sequence but display well-conserved secondary and tertiary structures. We have provided strategies to customize the prediction and annotation of RRE domains using state-of-the-art bioinformatic tools. We have also summarized the most useful methods for expressing and purifying RRE domains for experimental assays, such as in vitro binding and enzyme activity assays, and provided methods for targeted mutagenesis of RRE and leader peptide binding residues and for quantifying binding by fluorescence polarization (FP) and competition FP assays. Given that RRE domains are present in the majority of known prokaryotic RiPP classes, and that numerous novel RRE-dependent natural products have been reported in recent years, we anticipate that yet-undiscovered RiPP classes will emerge in the coming years. The strategies herein will enable natural product researchers both to mine novel gene clusters using the RRE as a bioinformatic handle and to confirm the function of these RRE domains through binding and enzyme activity assays.

Figure 9.

MALDI-TOF MS data for an in vitro activity assay using the thiomuracin RRE domain (TbtF). (A) Reaction scheme for thiazoline/thiazole formation on the thiomuracin precursor peptide. Cysteine residues highlighted in purple are cyclized by TbtG in an RRE-dependent process and further oxidized by TbtE. (B) MALDI-TOF mass spectra from in vitro activity assays in which individual reaction components were omitted. TbtG-catalyzed cyclization depends on both the RRE (TbtF) and ATP. Figure adapted from (Hudson et al., 2015).

Acknowledgements

This work was supported by the National Institutes of Health (AI144967 to D.A.M.) and the Chemistry-Biology Interface Research Training Program (GM070421 to K.E.S.).

Footnotes

Conflict of interest statement

The authors declare no conflicts of interest.

References

  1. Abdelmohsen UR, Grkovic T, Balasubramanian S, Kamel MS, Quinn RJ, Hentschel U, 2015. Elicitation of secondary metabolism in actinomycetes. Biotechnology Advances 33, 798–811. 10.1016/j.biotechadv.2015.06.003
  2. Ahmed Y, Rebets Y, Estévez MR, Zapp J, Myronovskyi M, Luzhetskyy A, 2020. Engineering of Streptomyces lividans for heterologous expression of secondary metabolite gene clusters. Microb Cell Fact 19, 5. 10.1186/s12934-020-1277-8
  3. Alfi A, Popov A, Kumar A, Zhang KYJ, Dubiley S, Severinov K, Tagami S, 2022. Cell-Free Mutant Analysis Combined with Structure Prediction of a Lasso Peptide Biosynthetic Protease B2. ACS Synth. Biol 11, 2022–2028. 10.1021/acssynbio.2c00176
  4. Arnison PG, Bibb MJ, Bierbaum G, Bowers AA, Bugni TS, Bulaj G, Camarero JA, Campopiano DJ, Challis GL, Clardy J, Cotter PD, Craik DJ, Dawson M, Dittmann E, Donadio S, Dorrestein PC, Entian K-D, Fischbach MA, Garavelli JS, Göransson U, Gruber CW, Haft DH, Hemscheidt TK, Hertweck C, Hill C, Horswill AR, Jaspars M, Kelly WL, Klinman JP, Kuipers OP, Link AJ, Liu W, Marahiel MA, Mitchell DA, Moll GN, Moore BS, Müller R, Nair SK, Nes IF, Norris GE, Olivera BM, Onaka H, Patchett ML, Piel J, Reaney MJT, Rebuffat S, Ross RP, Sahl H-G, Schmidt EW, Selsted ME, Severinov K, Shen B, Sivonen K, Smith L, Stein T, Süssmuth RD, Tagg JR, Tang G-L, Truman AW, Vederas JC, Walsh CT, Walton JD, Wenzel SC, Willey JM, van der Donk WA, 2013. Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat. Prod. Rep 30, 108–160. 10.1039/C2NP20085F
  5. Bothwell IR, Cogan DP, Kim T, Reinhardt CJ, van der Donk WA, Nair SK, 2019. Characterization of glutamyl-tRNA–dependent dehydratases using nonreactive substrate mimics. Proc. Natl. Acad. Sci. U.S.A 116, 17245–17250. 10.1073/pnas.1905240116
  6. Burkhart BJ, Hudson GA, Dunbar KL, Mitchell DA, 2015. A prevalent peptide-binding domain guides ribosomal natural product biosynthesis. Nat. Chem. Biol 11, 564–570. 10.1038/nchembio.1856
  7. Burkhart BJ, Kakkar N, Hudson GA, van der Donk WA, Mitchell DA, 2017. Chimeric Leader Peptides for the Generation of Non-Natural Hybrid RiPP Products. ACS Cent. Sci 3, 629–638. 10.1021/acscentsci.7b00141
  8. Bushin LB, Clark KA, Pelczer I, Seyedsayamdost MR, 2018. Charting an Unexplored Streptococcal Biosynthetic Landscape Reveals a Unique Peptide Cyclization Motif. J Am Chem Soc 140, 17674–17684. 10.1021/jacs.8b10266
  9. Chan DCK, Burrows LL, 2021. Thiopeptides: antibiotics with unique chemical structures and diverse biological activities. J Antibiot 74, 161–175. 10.1038/s41429-020-00387-x
  10. Chekan JR, Ongpipattanakul C, Nair SK, 2019. Steric complementarity directs sequence promiscuous leader binding in RiPP biosynthesis. Proc. Natl. Acad. Sci. U.S.A 116, 24049–24055. 10.1073/pnas.1908364116
  11. de los Santos ELC, 2019. NeuRiPP: Neural network identification of RiPP precursor peptides. Sci Rep 9, 13406. 10.1038/s41598-019-49764-z
  12. Dias DA, Urban S, Roessner U, 2012. A Historical Overview of Natural Products in Drug Discovery. Metabolites 2, 303–336. 10.3390/metabo2020303
  13. DiCaprio AJ, Firouzbakht A, Hudson GA, Mitchell DA, 2019. Enzymatic Reconstitution and Biosynthetic Investigation of the Lasso Peptide Fusilassin. J. Am. Chem. Soc 141, 290–297. 10.1021/jacs.8b09928
  14. Dunbar KL, Tietz JI, Cox CL, Burkhart BJ, Mitchell DA, 2015. Identification of an Auxiliary Leader Peptide-Binding Protein Required for Azoline Formation in Ribosomal Natural Products. J Am Chem Soc 137, 7672–7677. 10.1021/jacs.5b04682
  15. Eddy SR, 2004. What is a hidden Markov model? Nat Biotechnol 22, 1315–1316. 10.1038/nbt1004-1315
  16. Evans RL, Latham JA, Xia Y, Klinman JP, Wilmot CM, 2017. Nuclear Magnetic Resonance Structure and Binding Studies of PqqD, a Chaperone Required in the Biosynthesis of the Bacterial Dehydrogenase Cofactor Pyrroloquinoline Quinone. Biochemistry 56, 2735–2746. 10.1021/acs.biochem.7b00247
  17. Finn RD, Clements J, Arndt W, Miller BL, Wheeler TJ, Schreiber F, Bateman A, Eddy SR, 2015. HMMER web server: 2015 update. Nucleic Acids Res 43, W30–W38. 10.1093/nar/gkv397
  18. Georgiou MA, Dommaraju SR, Guo X, Mast DH, Mitchell DA, 2020. Bioinformatic and Reactivity-Based Discovery of Linaridins. ACS Chem. Biol 15, 2976–2985. 10.1021/acschembio.0c00620
  19. Grove TL, Himes PM, Hwang S, Yumerefendi H, Bonanno JB, Kuhlman B, Almo SC, Bowers AA, 2017. Structural Insights into Thioether Bond Formation in the Biosynthesis of Sactipeptides. J. Am. Chem. Soc 139, 11734–11744. 10.1021/jacs.7b01283
  20. Guerrero-Garzón JF, Madland E, Zehl M, Singh M, Rezaei S, Aachmann FL, Courtade G, Urban E, Rückert C, Busche T, Kalinowski J, Cao Y-R, Jiang Y, Jiang C, Selivanova G, Zotchev SB, 2020. Class IV Lasso Peptides Synergistically Induce Proliferation of Cancer Cells and Sensitize Them to Doxorubicin. iScience 23, 101785. 10.1016/j.isci.2020.101785
  21. Haft DH, 2009. A strain-variable bacteriocin in Bacillus anthracis and Bacillus cereus with repeated Cys-Xaa-Xaa motifs. Biol Direct 4, 15. 10.1186/1745-6150-4-15
  22. Haft DH, 2001. TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Research 29, 41–43. 10.1093/nar/29.1.41
  23. Harris LA, Saint-Vincent PMB, Guo X, Hudson GA, DiCaprio AJ, Zhu L, Mitchell DA, 2020. Reactivity-Based Screening for Citrulline-Containing Natural Products Reveals a Family of Bacterial Peptidyl Arginine Deiminases. ACS Chem. Biol 15, 3167–3175. 10.1021/acschembio.0c00685
  24. Hegemann JD, Süssmuth RD, 2020. Matters of class: coming of age of class III and IV lanthipeptides. RSC Chem. Biol 1, 110–127. 10.1039/D0CB00073F
  25. Hegemann JD, van der Donk WA, 2018. Investigation of Substrate Recognition and Biosynthesis in Class IV Lanthipeptide Systems. J. Am. Chem. Soc 140, 5743–5754. 10.1021/jacs.8b01323
  26. Hudson GA, Burkhart BJ, DiCaprio AJ, Schwalen CJ, Kille B, Pogorelov TV, Mitchell DA, 2019. Bioinformatic Mapping of Radical S-Adenosylmethionine-Dependent Ribosomally Synthesized and Post-Translationally Modified Peptides Identifies New Cα, Cβ, and Cγ-Linked Thioether-Containing Peptides. J. Am. Chem. Soc 141, 8228–8238. 10.1021/jacs.9b01519
  27. Hudson GA, Mitchell DA, 2018. RiPP antibiotics: biosynthesis and engineering potential. Curr Opin Microbiol 45, 61–69. 10.1016/j.mib.2018.02.010
  28. Hudson GA, Zhang Z, Tietz JI, Mitchell DA, van der Donk WA, 2015. In Vitro Biosynthesis of the Core Scaffold of the Thiopeptide Thiomuracin. J. Am. Chem. Soc 137, 16012–16015. 10.1021/jacs.5b10194
  29. Johnson LS, Eddy SR, Portugaly E, 2010. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11, 431. 10.1186/1471-2105-11-431
  30. Katoh K, Standley DM, 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30, 772–80. 10.1093/molbev/mst010
  31. Katz L, Baltz RH, 2016. Natural product discovery: past, present, and future. Journal of Industrial Microbiology and Biotechnology 43, 155–176. 10.1007/s10295-015-1723-5
  32. Kautsar SA, Blin K, Shaw S, Navarro-Muñoz JC, Terlouw BR, van der Hooft JJJ, van Santen JA, Tracanna V, Suarez Duran HG, Pascal Andreu V, Selem-Mojica N, Alanjary M, Robinson SL, Lund G, Epstein SC, Sisto AC, Charkoudian LK, Collemare J, Linington RG, Weber T, Medema MH, 2019. MIBiG 2.0: a repository for biosynthetic gene clusters of known function. Nucleic Acids Research gkz882. 10.1093/nar/gkz882
  33. Kloosterman AM, Medema MH, van Wezel GP, 2021. Omics-based strategies to discover novel classes of RiPP natural products. Current Opinion in Biotechnology 69, 60–67. 10.1016/j.copbio.2020.12.008
  34. Kloosterman AM, Shelton KE, van Wezel GP, Medema MH, Mitchell DA, 2020. RRE-Finder: a Genome-Mining Tool for Class-Independent RiPP Discovery. mSystems 5, e00267-20. 10.1128/mSystems.00267-20
  35. Koehnke J, Mann G, Bent AF, Ludewig H, Shirran S, Botting C, Lebl T, Houssen WE, Jaspars M, Naismith JH, 2015. Structural analysis of leader peptide binding enables leader-free cyanobactin processing. Nat Chem Biol 11, 558–563. 10.1038/nchembio.1841
  36. Li B, Sher D, Kelly L, Shi Y, Huang K, Knerr PJ, Joewono I, Rusch D, Chisholm SW, van der Donk WA, 2010. Catalytic promiscuity in the biosynthesis of cyclic peptide secondary metabolites in planktonic marine cyanobacteria. Proc. Natl. Acad. Sci. U.S.A 107, 10430–10435. 10.1073/pnas.0913677107
  37. Mavaro A, Abts A, Bakkes PJ, Moll GN, Driessen AJM, Smits SHJ, Schmitt L, 2011. Substrate Recognition and Specificity of the NisB Protein, the Lantibiotic Dehydratase Involved in Nisin Biosynthesis. J Biol Chem 286, 30552–30560. 10.1074/jbc.M111.263210
  38. Melby JO, Dunbar KL, Trinh NQ, Mitchell DA, 2012. Selectivity, Directionality, and Promiscuity in Peptide Processing from a Bacillus sp. Al Hakam Cyclodehydratase. J. Am. Chem. Soc 134, 5309–5316. 10.1021/ja211675n
  39. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A, 2021. Pfam: The protein families database in 2021. Nucleic Acids Research 49, D412–D419. 10.1093/nar/gkaa913
  40. Mistry J, Finn RD, Eddy SR, Bateman A, Punta M, 2013. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Research 41, e121–e121. 10.1093/nar/gkt263
  41. Mitchell DA, Lee SW, Pence MA, Markley AL, Limm JD, Nizet V, Dixon JE, 2009. Structural and Functional Dissection of the Heterocyclic Peptide Cytotoxin Streptolysin S. J Biol Chem 284, 13004–13012. 10.1074/jbc.M900802200
  42. Montalbán-López M, Scott TA, Ramesh S, Rahman IR, van Heel AJ, Viel JH, Bandarian V, Dittmann E, Genilloud O, Goto Y, Grande Burgos MJ, Hill C, Kim S, Koehnke J, Latham JA, Link AJ, Martínez B, Nair SK, Nicolet Y, Rebuffat S, Sahl H-G, Sareen D, Schmidt EW, Schmitt L, Severinov K, Süssmuth RD, Truman AW, Wang H, Weng J-K, van Wezel GP, Zhang Q, Zhong J, Piel J, Mitchell DA, Kuipers OP, van der Donk WA, 2021. New developments in RiPP discovery, enzymology and engineering. Nat. Prod. Rep 38, 130–239. 10.1039/D0NP00027B
  43. Moon K, Xu F, Zhang C, Seyedsayamdost MR, 2019. Bioactivity-HiTES Unveils Cryptic Antibiotics Encoded in Actinomycete Bacteria. ACS Chem. Biol 14, 767–774. 10.1021/acschembio.9b00049
  44. Myronovskyi M, Rosenkränzer B, Nadmid S, Pujic P, Normand P, Luzhetskyy A, 2018. Generation of a cluster-free Streptomyces albus chassis strains for improved heterologous expression of secondary metabolite clusters. Metabolic Engineering 49, 316–324. 10.1016/j.ymben.2018.09.004
  45. Oberg N, Precord TW, Mitchell DA, Gerlt JA, 2022. RadicalSAM.org: A Resource to Interpret Sequence-Function Space and Discover New Radical SAM Enzyme Chemistry. ACS Bio Med Chem Au 2, 22–35. 10.1021/acsbiomedchemau.1c00048
  46. Ortega MA, Hao Y, Zhang Q, Walker MC, van der Donk WA, Nair SK, 2014. Structure and mechanism of the tRNA-dependent lantibiotic dehydratase NisB. Nature. 10.1038/nature13888
  47. Pettit RK, 2011. Small-molecule elicitation of microbial secondary metabolites: Elicitation of microbial secondary metabolites. Microbial Biotechnology 4, 471–478. 10.1111/j.1751-7915.2010.00196.x
  48. Pimentel-Elardo SM, Sørensen D, Ho L, Ziko M, Bueler SA, Lu S, Tao J, Moser A, Lee R, Agard D, Fairn G, Rubinstein JL, Shoichet BK, Nodwell JR, 2015. Activity-Independent Discovery of Secondary Metabolites Using Chemical Elicitation and Cheminformatic Inference. ACS Chem. Biol 10, 2616–2623. 10.1021/acschembio.5b00612
  49. Precord TW, Mahanta N, Mitchell DA, 2019. Reconstitution and Substrate Specificity of the Thioether-Forming Radical S-Adenosylmethionine Enzyme in Freyrasin Biosynthesis. ACS Chem. Biol 14, 1981–1989. 10.1021/acschembio.9b00457
  50. Ramesh S, Guo X, DiCaprio AJ, De Lio AM, Harris LA, Kille BL, Pogorelov TV, Mitchell DA, 2021. Bioinformatics-Guided Expansion and Discovery of Graspetides. ACS Chem. Biol 16, 2787–2797. 10.1021/acschembio.1c00672
  51. Raran-Kurussi S, Cherry S, Zhang D, Waugh DS, 2017. Removal of Affinity Tags with TEV Protease, in: Burgess-Brown NA (Ed.), Heterologous Gene Expression in E. coli, Methods in Molecular Biology. Springer, New York, NY, pp. 221–230. 10.1007/978-1-4939-6887-9_14
  52. Repka LM, Chekan JR, Nair SK, van der Donk WA, 2017. Mechanistic Understanding of Lanthipeptide Biosynthetic Enzymes. Chem. Rev 117, 5457–5520. 10.1021/acs.chemrev.6b00591
  53. Robinson SL, Piel J, Sunagawa S, 2021. A roadmap for metagenomic enzyme discovery. Nat. Prod. Rep 38, 1994–2023. 10.1039/D1NP00006C
  54. Russell AH, Vior NM, Hems ES, Lacret R, Truman AW, 2021. Discovery and characterisation of an amidine-containing ribosomally-synthesised peptide that is widely distributed in nature. Chem. Sci 12, 11769–11778. 10.1039/D1SC01456K
  55. Saad H, Aziz S, Gehringer M, Kramer M, Straetener J, Berscheid A, Brötz‐Oesterhelt H, Gross H, 2021. Nocathioamides, Uncovered by a Tunable Metabologenomic Approach, Define a Novel Class of Chimeric Lanthipeptides. Angew. Chem. Int. Ed 60, 16472–16479. 10.1002/anie.202102571
  56. Santos-Aberturas J, Chandra G, Frattaruolo L, Lacret R, Pham TH, Vior NM, Eyles TH, Truman AW, 2019. Uncovering the unexplored diversity of thioamidated ribosomal peptides in Actinobacteria using the RiPPER genome mining tool. Nucleic Acids Res 47, 4624–4637. 10.1093/nar/gkz192
  57. Schramma KR, Bushin LB, Seyedsayamdost MR, 2015. Structure and biosynthesis of a macrocyclic peptide containing an unprecedented lysine-to-tryptophan crosslink. Nat Chem 7, 431–437. 10.1038/nchem.2237
  58. Schwalen CJ, Hudson GA, Kille B, Mitchell DA, 2018. Bioinformatic Expansion and Discovery of Thiopeptide Antibiotics. J. Am. Chem. Soc 140, 9494–9501. 10.1021/jacs.8b03896
  59. Schwalen CJ, Hudson GA, Kosol S, Mahanta N, Challis GL, Mitchell DA, 2017. In Vitro Biosynthetic Studies of Bottromycin Expand the Enzymatic Capabilities of the YcaO Superfamily. J Am Chem Soc 139, 18154–18157. 10.1021/jacs.7b09899
  60. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T, 2003. Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res. 13, 2498–2504. 10.1101/gr.1239303
  61. Soding J, Biegert A, Lupas AN, 2005. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research 33, W244–W248. 10.1093/nar/gki408
  62. The UniProt Consortium, Bateman A, Martin M-J, Orchard S, Magrane M, Agivetova R, Ahmad S, Alpi E, Bowler-Barnett EH, Britto R, Bursteinas B, Bye-A-Jee H, Coetzee R, Cukura A, Da Silva A, Denny P, Dogan T, Ebenezer T, Fan J, Castro LG, Garmiri P, Georghiou G, Gonzales L, Hatton-Ellis E, Hussein A, Ignatchenko A, Insana G, Ishtiaq R, Jokinen P, Joshi V, Jyothi D, Lock A, Lopez R, Luciani A, Luo J, Lussi Y, MacDougall A, Madeira F, Mahmoudy M, Menchi M, Mishra A, Moulang K, Nightingale A, Oliveira CS, Pundir S, Qi G, Raj S, Rice D, Lopez MR, Saidi R, Sampson J, Sawford T, Speretta E, Turner E, Tyagi N, Vasudev P, Volynkin V, Warner K, Watkins X, Zaru R, Zellner H, Bridge A, Poux S, Redaschi N, Aimo L, Argoud-Puy G, Auchincloss A, Axelsen K, Bansal P, Baratin D, Blatter M-C, Bolleman J, Boutet E, Breuza L, Casals-Casas C, de Castro E, Echioukh KC, Coudert E, Cuche B, Doche M, Dornevil D, Estreicher A, Famiglietti ML, Feuermann M, Gasteiger E, Gehant S, Gerritsen V, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, Hyka-Nouspikel N, Jungo F, Keller G, Kerhornou A, Lara V, Le Mercier P, Lieberherr D, Lombardot T, Martin X, Masson P, Morgat A, Neto TB, Paesano S, Pedruzzi I, Pilbout S, Pourcel L, Pozzato M, Pruess M, Rivoire C, Sigrist C, Sonesson K, Stutz A, Sundaram S, Tognolli M, Verbregue L, Wu CH, Arighi CN, Arminski L, Chen C, Chen Y, Garavelli JS, Huang H, Laiho K, McGarvey P, Natale DA, Ross K, Vinayaka CR, Wang Q, Wang Y, Yeh L-S, Zhang J, Ruch P, Teodoro D, 2021. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research 49, D480–D489. 10.1093/nar/gkaa1100
  63. Tietz JI, Schwalen CJ, Patel PS, Maxson T, Blair PM, Tai H-C, Zakai UI, Mitchell DA, 2017. A new genome-mining tool redefines the lasso peptide biosynthetic landscape. Nat Chem Biol 13, 470–478. 10.1038/nchembio.2319
  64. Tocchetti A, Iorio M, Hamid Z, Armirotti A, Reggiani A, Donadio S, 2021. Understanding the Mechanism of Action of NAI-112, a Lanthipeptide with Potent Antinociceptive Activity. Molecules 26, 6764. 10.3390/molecules26226764
  65. Tsai T-Y, Yang C-Y, Shih H-L, Wang AH-J, Chou S-H, 2009. Xanthomonas campestris PqqD in the pyrroloquinoline quinone biosynthesis operon adopts a novel saddle-like fold that possibly serves as a PQQ carrier. Proteins 76, 1042–1048. 10.1002/prot.22461
  66. van der Donk WA, Nair SK, 2014. Structure and mechanism of lanthipeptide biosynthetic enzymes. Current Opinion in Structural Biology 29, 58–66. 10.1016/j.sbi.2014.09.006
  67. Walker JA, Hamlish N, Tytla A, Brauer DD, Francis MB, Schepartz A, 2022. Redirecting RiPP Biosynthetic Enzymes to Proteins and Backbone-Modified Substrates. ACS Cent. Sci 8, 473–482. 10.1021/acscentsci.1c01577
  68. Walker MC, Eslami SM, Hetrick KJ, Ackenhusen SE, Mitchell DA, van der Donk WA, 2020. Precursor peptide-targeted mining of more than one hundred thousand genomes expands the lanthipeptide natural product family. BMC Genomics 21, 387. 10.1186/s12864-020-06785-7
  69. Worthen DB, 2007. Streptomyces in Nature and Medicine: The Antibiotic Makers. Journal of the History of Medicine and Allied Sciences 63, 273–274. 10.1093/jhmas/jrn016
  70. Yu Y, Duan L, Zhang Q, Liao R, Ding Y, Pan H, Wendt-Pienkowski E, Tang G, Shen B, Liu W, 2009. Nosiheptide Biosynthesis Featuring a Unique Indole Side Ring Formation on the Characteristic Thiopeptide Framework. ACS Chem. Biol 4, 855–864. 10.1021/cb900133x
  71. Zallot R, Oberg N, Gerlt JA, 2019. The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 58, 4169–4182. 10.1021/acs.biochem.9b00735
  72. Zardecki C, Dutta S, Goodsell DS, Voigt M, Burley SK, 2016. RCSB Protein Data Bank: A Resource for Chemical, Biochemical, and Structural Explorations of Large and Small Biomolecules. J. Chem. Educ 93, 569–575. 10.1021/acs.jchemed.5b00404
  73. Zhang Z, Hudson GA, Mahanta N, Tietz JI, van der Donk WA, Mitchell DA, 2016. Biosynthetic Timing and Substrate Specificity for the Thiopeptide Thiomuracin. J. Am. Chem. Soc 138, 15511–15514. 10.1021/jacs.6b08987

RESOURCES