Abstract
Peptide-based therapeutics are increasingly pushing to the forefront of biomedicine with their promise of high specificity and low toxicity. Although noncanonical residues can always be used, employing only the natural 20 residues restricts the chemical space to a finite dimension allowing for comprehensive in silico screening. Towards this goal, the dataset comprising all possible di-, tri-, and tetra-peptide combinations of the canonical residues has been previously reported. However, with increasing computational power, the comprehensive set of pentapeptides is now also feasible for screening as the comprehensive set of cyclic peptides comprising four or five residues. Here, we provide both the complete and prefiltered libraries of all di-, tri-, tetra-, and penta-peptide sequences from 20 canonical amino acids and their homodetic (N-to-C-terminal) cyclic homologues. The FASTA, simplified molecular-input line-entry system (SMILES), and structure-data file (SDF)-three dimension (3D) libraries can be readily used for screening against protein targets. We also provide a simple method and tool for conducting identity-based filtering. Access to this dataset will accelerate small peptide screening workflows and encourage their use in drug discovery campaigns. As a case study, the developed library was screened against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease to identify potential small peptide inhibitors.
Keywords: Dipeptides, Tripeptides, Tetrapeptides, Pentapeptides, N-to-C-terminal cyclic peptides, Peptide library
Graphical abstract
Highlights
-
•
All possible sequences of di, tri, tetra, and pentapeptides using canonical amino acids provided.
-
•
N-to-C-terminal cyclization of all peptides provided.
-
•
Methods for the filtration of peptides based on ease of synthesis, solubility, and stability provided.
-
•
Methods for the removal of duplicate cyclicpeptides implemented.
-
•
The toolkit is used to identify SARS-CoV-2 Mpro inhibitors.
1. Introduction
With few exceptions, including insulin, therapeutic peptides have only recently become increasingly prevalent. Approximately 10% of the drugs approved between 2009 and 2013 were peptide-derived, and most of these compounds are first-in-class drugs, e.g., bivalirudin, desmopressin, and octreotide [1]. As of this study, over 80 unique peptide drugs have been approved worldwide, while over 150 are in development [2,3].
Most drugs work by interfering, strengthening, or modifying protein-ligand interactions. This defines ligand in the broadest possible way: small molecules, ions, proteins, nucleic acids, lipids, peptides, or peptidomimetics. Over 40% of all intracellular interactions involve peptide-protein interactions, and many more involve protein-protein interactions. In both cases, peptides and peptidomimetics are ideal interfering agents [4]. Cyclic peptides have emerged as a promising class of molecules with the potential to selectively target “undruggable” or “blank wall” surfaces of proteins involved in protein-protein interactions. Such targets are often challenging to access with conventional small molecules, and cyclic peptides offer an alternative approach to address this limitation. Moreover, cyclic peptides have a unique position in drug discovery as they occupy a niche between small molecules and biological therapeutics [5].
Biologics comprise a broad range of polysaccharides, proteins, nucleic acids or their conjugates, e.g., vaccines, blood components, somatic cells, and recombinant proteins. They are usually produced in living organisms (animals, microorganisms, or even humans) and then processed ex vivo. They often offer treatment options for disorders with no available treatments [5,6]. However, biologics are often complex mixtures, difficult to characterize. Furthermore, they are generally high-cost, low stability, heat sensitive, susceptible to microbial contamination, and immunogenic [7,8]. Because of their limited membrane permeability, sensitivity to digestion, and first-pass metabolism, they are frequently administrated parenterally, which causes patient compliance issues [9].
In contrast, small molecules (usually < 1,000 Da) offer enormous chemical diversity to interact with their targets. Inherent advantages typically associated with small molecules include low cost, high stability, good bioavailability, and easy administration [5]. They often effectively bind to small binding pockets in target proteins but have limited application in blocking protein-protein interactions due to their small surface area [10]. However, this small size means they can often interact with other proteins, and many exhibit off-target effects and toxicity. Their non-endogenous metabolites can also induce undesired physiological responses [11].
Small peptides lie between both classes of therapeutics. They are often better tolerated and more specific than small molecules, while being lower cost and more stable than biologics. They share the single characterizable entity benefit of small molecules but can present specificity similar to biologics [12,13]. Tetra- and penta-peptides offer an optimum size to occupy catalytic or molecular recognition sites often occupied by small molecule metabolites or the termini of endogenous peptides or proteins [14]. Larger peptides can imitate proteins to block surfaces. Together these opportunities have driven the rise of peptides [3,[15], [16], [17]].
Working with peptides has the benefit of occupying an extensive yet finite chemical space, provided that one only considers the canonical residues. This indicates that it can be thoroughly investigated either in vitro or, theoretically, in vivo. Even though these experiments are generally cost-prohibitive, a thorough preliminary in silico screening is still doable.
Recent advances in docking algorithms for the discovery of small molecules against particular targets have been astounding [18,19]. Many docking algorithms are available for protein-protein and peptide-protein docking that can accommodate the conformational restrictions on the inherently flexible nature of peptides and specific amino acids. A number of benchmark studies have evaluated the accuracy of these algorithms, and they are continually getting better [20,21]. Despite this progress, high throughput screens of small peptides in computer-aided drug discovery (CADD) campaigns are still not frequently performed [22], possibly due to the large number of calculations required, difficulties in handling the conformational complexities of peptides, and a more computationally expensive workflow compared to small molecules. But a primary limitation is that comprehensive machine-readable peptide libraries of dockable peptides with all possible sequences are unavailable.
To address these challenges, we have developed, and here make available, the comprehensive library of all possible di-, tri-, tetra-, and penta-peptides sequences from 20 canonical amino acids. We have also provided a library of N-to-C-terminal cyclic analogues of these peptides. To demonstrate the potential of this tool, as part of our broader effort in CADD, we provide a case study screening this tool against the target-du-jour, the main protease (Mpro) or 3-chymotrypsin-like protease (3CLpro) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a cysteine protease [23].
Although the novel coronavirus has a unique Mpro, it is still very similar to that of the original SARS-CoV-1 and is a well-understood enzyme and a well-investigated drug target [24]. This, of course, accelerated with the emergence of SARS-CoV-2 in late 2019 and early 2020, and the crystal structure of the SARS-CoV-2 Mpro was solved extremely quickly, and nearly simultaneously, by several different groups [[25], [26], [27], [28]] as part of what is undoubtedly the fastest and most intense drug discovery effort in human history. Mpro cleaves polypeptides after a glutamine residue, making it a promising drug target because no human proteases share this substrate specificity, meaning off-target effects can be expected to be minimal [27,29]. The active site of Mpro consists of a catalytic dyad of Cys145 and His41. Other anchoring residues important in ligand interactions in cocrystallized structures include Thr25, Thr26, Asn142, Gly143, Ser144, Met165, Glu166, and Gln189 [30].
Inhibiting SARS-CoV-2 Mpro would prevent the virus from processing the polypeptide properly and would consequently prevent capsid formation and release from the infected cell; this would halt infection [31]; however, despite very significant effort, there are no drugs in advanced clinical trials that are highly effective, and despite the efforts from many academic groups suggesting drug repurposing might prove helpful, it would be incredibly naïve to consider that all relevant extant drugs were not investigated immediately by major players within weeks and months of the emergence of the disease. Furthermore, 61 independent reports evaluated by Macip and coworkers [32] do not agree or converge around a common series. Regardless, repurposed drugs have been thoroughly screened in vitro [33]. Although some are initially promising [[34], [35], [36], [37]], none have proven clinically effective, including those that were developed to target the main proteases of other related viruses: the active site is significantly different, even from closely related SARS-CoV-1 [38], and the entire protein is highly dynamic for a globular enzyme that makes small molecule drugs difficult to design. New chemical material is required.
The MPro enzyme is responsible for cleaving the viral polypeptide at various points along its sequence, and as such, its active site must be dynamic. While the natural ligand for MPro is a peptide, researchers have identified a peptidomimetic molecule with some activities against the enzyme. This peptidomimetic was discovered through a high-throughput screen of Ugi reaction products, a process that involves rapidly testing large numbers of chemical compounds for activity against a specific target [39]. Although natural peptides are unlikely to be useful drugs themselves [40], they might prove to be useful starting points for structure-activity relationships through both incorporating unnatural residues and modifying the backbone to increase the stability of the peptide (both physiologically and to avoid cleavage by peptidase activity in the binding site) and affinity for the target [41]. But for this to be viable, we require a method to identify promising peptides to be validated in vitro as potential inhibitors. This makes SARS-CoV-2 Mpro a relevant example target for our peptide screening library to be evaluated against.
2. Methods
2.1. Preparation of peptide library
Single letter codes of all 20 canonical amino acids (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y) were used to make all possible sequences of di, tri-, tetra-, and penta-peptides based on the relation 20n (n = 2, 3, 4, and 5) using MS Excel. These peptides were either filtered for their physicochemical properties or linked via N-to-C-terminal amide linkage to get cyclic peptides.
2.1.1. Peptide filtration based on physico-chemical properties
The interplay between the constituent amino acids of a peptide is responsible for its solubility, stability, and ease of synthesis; consequently, we chose seven filters that screen these parameters. Although many complex tools could be used to conduct these filtrations, the digital nature of the peptide sequences (consisting of only 20 values) makes them extremely easy to manipulate in Excel, which also lowers the barrier to use. Excel also simplifies modifying the filters we applied, allowing other users to either easily expand them should they believe we were insufficiently restrictive in our selection process or reduce them should the user believe we were overly zealous in our elimination of sequences. This can be accomplished through simple manipulation of the provided formulae. However, in the case of reducing the filters, we caution that the user will need to optimize additional peptide conformations. In all cases, we have provided examples of the process in the Supplementary data that can be applied to the full dataset.
2.1.1.1. Filter 1: peptides that can form gels in an aqueous solution
Peptides having a very high ratio (>75%) of amino acids that can form intermolecular hydrogen bonds (D, E, H, K, N, Q, R, S, T or Y) reliably result in gel formation in water, rendering them water-insoluble. We removed these peptides by simply eliminating the entries in the Excel file. This was accomplished by copying the column containing all sequences (column A) to column B. Using the “replace” function on column B only, substitute “X” for all of D, E, H, K, N, Q, R, S, T, and Y. To remove peptides that are comprised of >75% of above stated resides, cells in column B containing >75% X were deleted. This is achieved by using replace option of Excel for the column: for a pentapeptide, put ∗X∗X∗X∗X∗ in the “find” field, leaving the “replace” field blank. All cells containing four or five Xs (4 Xs is 80%) will become blank. Sort the entire worksheet by column B. This will put the blank cells at the top. Remove all rows that have no entry in column B. Column B can then be deleted. This removes all of these peptides from the list, and this new list can then be applied to the next filter.
2.1.1.2. Filter 2: peptides with 50% or more hydrophobic residues are insoluble or partly soluble in water
We can employ an analogous method to remove entries with >50% hydrophobic residues (A, I, L, M, F, W, Y, and V). Again, duplicate column A to column B, convert the hydrophobic residues to X, and replace ∗X∗X∗X∗ with a blank for pentapeptides with blank cell. Once the empty rows are sorted and deleted, these sequences are removed.
2.1.1.3. Filter 3: multiple cysteines and methionines in a peptide are prone to oxidation, and multiple cysteines can also form a disulfide bridge
This filter should be run at the discretion of the user. As the peptides examined here are so small, intramolecular disulfide bridges are unlikely, and free cysteines could be more likely to form intermolecular bridges. So, any in vitro test of these peptides might provide data about complex mixtures of oligomers rather than the planned peptide. Although these peptides are synthetically accessible, the complexities they could introduce into both synthesis and screening, in our opinion and experience in peptide synthesis, merits their exclusion. The filter was applied as above by replacing any strings containing ∗C∗C∗ and ∗M∗M∗ with blank cells to remove entries having two or more cysteines or methionines.
2.1.1.4. Filter 4: multiple prolines or serines in a small peptide can readily cause deletions during synthesis, and multiple prolines can also experience cis/trans isomerization
Again, this filter may or may not be appropriate for a given application. We employ it to ensure a higher hit rate for our peptides in any in vitro test. To employ this filter, replace ∗S∗S∗ and ∗P∗P∗ with blank cells to remove entries having two or more serine or proline residues.
2.1.1.5. Filter 5: aspartic acid next to glycine, serine, or proline can readily undergo hydrolytic peptide cleavage in an acidic solution
Again, this filter may not always be appropriate, and this complication can be evaded with sufficient care in synthesis. However, these anticipated side reactions could introduce challenges for a simple screen. To employ this filter, remove aspartic acid glycine pairs, and remove ∗GD∗ and ∗DG∗ with blank cells. Similarly, use ∗PD∗ and ∗DP∗ to remove aspartic acid and proline pairs and ∗SD∗ and ∗DS∗ to remove aspartic acid serine pairs.
2.1.1.6. Filter 6: N-terminal glutamine can form pyroglutamate in an acidic solution
This could even be desirable under specific conditions. But it can be simpler to exclude these examples. To execute this filter, put the FASTA data in column A and put: in all cells of column B. Use formula “=B1&A1” to get: at the start of every entry (left-hand side, i.e., N terminal). Paste the formula-generated column to column C as text to avoid the formula. Remove columns A and B. Put: Q∗ in find and leave replace field blank to remove sequences starting with Q. This same syntax can be used to remove any other problematic N-terminal sequence. An analogous process could be used to remove any problematic C-terminal sequence.
2.1.1.7. Filter 7: protecting group on N-terminal asparagine are difficult to remove
This filter was employed simply to ease synthesis. Again, this is synthetically solvable but requires specialist peptide synthesis knowledge. It may not be appropriate for all groups looking to screen and prepare peptides; if they are being made in-house by specialists, it can be ignored. If they are being ordered in bulk with limited characterization by the supplier, we suggest that it might be relevant. It was implemented using the protocol used in Filter 6, but employing: N∗.
It is worth noting that these filters were only performed on pentapeptides. Filtered pentapeptides and all di-, tri-, and tetra-peptides were placed in text files in FASTA format along with their FASTA identifiers. The files were loaded in MarvinView [42], and the structures were saved as simplified molecular-input line-entry system (SMILES) [43] and structure-data file (SDF) files along with their FASTA [44] identifiers for later use in docking. The filters were employed to reduce the number of peptides needing to be screened. Of course problematic sequences can be removed from the final list by visual inspection by one skilled in the art; the balance of increased screening time with the risk of missing promising hits must be carefully considered by the user and all filters should be carefully considered.
2.1.2. Preparation of cyclic peptides
All possible combinations of di-, tri-, tetra-, and penta-peptides were used without any prefiltration to generate head-to-tail cyclic amide-linked peptides. To generate this library, the peptides in FASTA format were placed in Excel in column A. Column B was populated with X (the FASTA notation for a general amino acid with an arbitrary sidechain). The formula “=B1&A1&B1” was applied in column C to get X at the start and end of each peptide (i.e., XpeptideX). These peptides were copied in a.txt file, and the file opened in MarvinView. The structures were saved in SMILES format. The resulting SMILES file was opened in notepad++, and the following replacements were made:
For N terminal: ([H])C(=O)[C@]([H])([∗])N([H]) to 9, and (C(=O)[C@]([H])([∗])N([H])[H]) to 9.
For C terminal: [H]OC(=O)[C@]([H])([∗])N([H])C to C9, and C(=O)N([H])[C@@]([H])([∗])C(=O)O[H] to C9(=O).
The FASTA identifiers were placed next to the SMILES separated by a space. The resulting SMILES were filtered for duplicates using Open Babel [45] through the option “remove duplicates with descriptor” InChI (the IUPAC international chemical identifier). After removing the duplicate entries, the structures were exported as SMILES files. The SMILES files were loaded in MarvinView and the structures were saved as an SDF file for docking.
2.2. Virtual screening
2.2.1. Ligand preparation
The ligands in the SDF file were imported into Maestro, and the three dimension (3D) structures of the compounds were batch prepared using the LigPrep utility of Maestro (Schrödinger Release 2020-4) [46]. The possible tautomers of the compounds were produced using Epik [47] at the target pH of 7 ± 2. Only one structure was produced for each ligand.
2.2.2. Protein preparation
The crystal structure of SARS-CoV-2 Mpro was downloaded from the Protein Data Bank (PDB ID 6XR3, resolution 1.45 Å) [48]. The Protein Preparation Wizard of the Maestro was used to optimize the protein for docking. The preparation steps used are standard: addition of hydrogens, optimization of the hydrogen bond networks, and assignment of protonation states of histidine residues, as described elsewhere [42,49]. Water molecules were removed, followed by a restrained minimization step using the OPLS4 force field, which uses an implicit water model, generally useful for globular proteins [47]. Maestro's Receptor Grid Generation utility was then employed to specify the docking site by generating a cubical grid box with the center at (9.11, 27.14, 22.64) and 23 Å length. This is the cener of the active site, so we are avoiding an examination of allosteric inhibitors or peptides that might be useful in preventing dimerization.
2.2.3. Docking
2.2.3.1. Glide
In the first step of virtual screening, the docking was performed using the Glide docking tool of Maestro with high throughput virtual screening (HTVS) mode giving one output pose for each ligand [50]. Due to the sheer size of the ligand library, the top 600,000 (∼30%) hits were selected for a robust screening step using Glide standard precision-peptide (SP-peptide) docking mode. The top 60,000 (10%) hits were subjected to Glide extra precision (XP) docking.
2.2.3.2. Gold
The peptide library was also screened using the Gold ChemPLP (piecewise linear potential) [51] scoring function. The top 600,000 (∼30%) hits were subjected to the Gold fitness score [52] with 10% search efficiency (Goldscore 10%). The top 60,000 (10%) hits were selected for a vigorous screening using a 200% search efficiency of Gold fitness score (Goldscore 200%).
2.2.4. Molecular mechanics generalized Born surface area (MMGBSA)
The top 6,000 (10%) hits from Glide XP and the top 6,000 (10%) hits from Gold docking were selected for the MMGBSA analysis. The estimated binding free energy of the top hits was calculated with the Maestro Prime/MMGBSA module using the VSGB solvation model and the OPLS4 force field [50].
2.2.5. Molecular dynamics simulation
The Maestro System Builder utility was employed to prepare docked protein-ligand complexes for molecular dynamics (MD) simulation. The simple point-charge (SPC) water model was used with an orthorhombic box extending 10 Å from protein. The placement of ions was excluded within 20 Å of the ligand. Sodium chloride was added to make the concentration 0.15 M, and extra sodium or chloride ions were added to neutralize the system. The OPLS4 force field was used for simulation. Initially, the NPT ensemble was used at 300 K and 1.01325 bar, and the system was relaxed for 100 ps before simulation. The simulation was recorded for 12 ns with a 3 ps recording interval. The complexes presenting reasonable stability for 12 ns were selected for 100 ns simulation with recording interval of 4 ps.
3. Results
3.1. Peptide library production
The overall scheme to build the peptide library is displayed in Fig. 1. All possible combinations of the twenty canonical amino acids provided 400 dipeptides, 8,000 tripeptides, 160,000 tetrapeptides, and 3.2 million pentapeptides according to the relation 20n, where n is 2, 3, 4, and 5 for tripeptide, tetrapeptide, and pentapeptide, respectively. Di-, tri- and tetra-peptides were not subjected to filters due to their smaller size and relative ease of synthesis compared to pentapeptides—they can be reasonably made in the solution phase, and the synthetic challenges associated with solid-phase synthesis can be evaded. Based on aqueous solubility, stability, and ease of synthesis, pentapeptides were passed through a set of seven filters, giving 1,169,013 pentapeptides.
Fig. 1.
Building peptide dataset. Process diagram displaying the key steps in building the peptide dataset in SMILES and SDF format after applying peptide filters. SDF: structured data files; SMILES: simplified molecular-input line-entry system.
Homodetic cyclic peptides were prepared using 400 dipeptides, 8,000 tripeptides, 160,000 tetrapeptides, and the 3.2 million pentapeptides. N-to-C-terminal amide cyclization of all possible sequences generates substantial number of duplicate peptides. The removal of duplicate entries yielded 210 cyclic dipeptides, 2,680 cyclic tripeptides, 40,110 cyclic tetrapeptides, and 640,016 cyclic pentapeptides.
Collectively, a total of 2,020,429 filtered peptides (linear and cyclic) were generated (Table 1). The datasets of linear and cyclic peptides are available in FASTA and SMILES format in separate files. These files can be readily converted into any desired format (PDB, MOL2, etc.) for molecular docking.
Table 1.
Number of peptides generated during dataset construction. The three datasets are available in FASTA, simplified molecular-input line-entry system (SMILES) and structure-data file (SDF) formats.
| Form | Dipeptide | Tripeptide | Tetrapeptide | Pentapeptide | Total |
|---|---|---|---|---|---|
| Linear | 400 | 8,000 | 160,000 | 3,200,000 | 3,368,400 |
| Linear filtered | (400) | (8,000) | (160,000) | (1,169,013) | (1,337,413) |
| Cyclic | 210 | 2,680 | 40,110 | 640,016 | 683,016 |
| Total | 610 | 10,680 | 200,110 | 3,840,016 (1,809,029 filtered) | 4,051,416 (2,020,429 filtered) |
A total of 2,020,429 peptides (di-, tri-, tetra-, and penta-peptides in their linear and cyclic forms) were used for docking against SARS-CoV-2 Mpro as a case study to implement the peptide dataset. For computational simplicity, multiple conformations of each peptide were not generated due to the large number of input compounds and the proof-of-principle nature of this work; instead, only one most suitable conformation for each peptide was selected for docking. A more rigorous effort, but considerably more computationally expensive, would want to generate 20 low lying conformations per peptide to ensure the conformational space is being sampled.
3.2. Virtual screening of peptide dataset on SARS-CoV-2 main protease
3.2.1. Glide
A quick docking screen of the library using Glide HTVS provided binding energy as low as −10.9 kcal/mol. The top 600,000 hits (∼30% of the dataset, −10.9 to −5.1 kcal/mol) were subjected to Glide SP peptide docking, and the top 60,000 hits from this screen (10%, −12.1 to −5.2 kcal/mol) were docked using Glide XP mode. Following this, the top 6,000 hits (10%, −13.7 to −8.8 kcal/mol) were subjected to Prime MMGBSA calculations. The MMGBSA data of these “binders” ranged between −90.2 and −9.1 kcal/mol (Fig. 2). We note that we selected the center of the docking grid at the MPro catalytic dyad. The screen is looking for orthosteric inhibitors and would not identify allosteric binders to the protein at other sites.
Fig. 2.
Schematic workflow for peptide screening. The schematic block diagram of the in silico workflow followed in the study to screen peptides against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease. Lighter to darker colour represents the process of a workflow. HTVS: high-throughout-virtual screening; SP: standard precision; XP: extra precision; ChemPLP: piecewise linear potential; MMGBSA: molecular mechanics generalized Born surface area.
3.2.2. Gold
An initial screen of the library was performed with the ChemPLP score in Gold. The top 600,000 hits (∼30%, 113.5 to 46 ChemPLP score) were docked using Goldscore with 10% efficiency. The top 60,000 hits from this assay (top 10% with 112.6 to 52.2 Goldscore 10% efficiency) were further docked using Goldscore 200% efficiency. The Goldscore 200% values of these hits ranged from 117.3 to 50.1. The top 6,000 hits from this step (top 10% with 117.3 to 89.6 Goldscore 200% efficiency) were subjected to Prime MMGBSA. The results from this MMGBSA calculation ranged between −89.3 and −12.4 kcal/mol (Fig. 2, Fig. 3).
Fig. 3.
Peptide screening results from two independent workflows. (A) Top 6,000 (10%) hits obtained from Glide high throughput virtual screening (HTVS) > standard precision (SP) > extra percision (XP) score, distributed in Glide XP docking score slabs. (B) Top 6,000 (10%) hits obtained from Gold ChemPLP > Goldscore 10% efficiency > Goldscore 200% efficiency, distributed in Goldscore 200% efficiency slabs. (C) Scatter plot of Glide XP docking score vs Goldscore 200% efficiency of the 547 common top hits found from both top 6,000 hits from each study taken from corresponding top-ranked datasets. (D, E) Docked conformations of the top hits identified from workflow 1 (Glide HTVS > SP-peptide > XP > Prime MMGBSA) (D) and workflow 2 (Gold ChemPLP score > Goldscore 10% > Goldscore 200% > Prime-MMGBSA) € inside the binding pocket of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease. ChemPLP docking: piecewise linear potential; MMGBSA: molecular mechanics generalized Born surface area.
3.2.3. Molecular dynamics simulations
With this integrated parallel virtual screening pipeline, the top hits were selected for MD simulation after careful consideration, completed in 3 steps: 1) top 25 hits from Glide HTVS > Glide SP peptide > Glide XP > Prime MMGBSA workflow; 2) top 25 hits from Gold ChemPLP > Goldscore 10% > Goldscore 200% > Prime MMGBSA workflow, and 3) top 25 common hits that displayed consensus binding poses after both workflows (Fig. 3). The latter was selected extensively through visual inspection, i.e., peptides that revealed consensus binding conformation by both docking approaches and displayed significant molecular interactions with the critical binding pocket residues, excluding peptides that showed unrealistic docking conformations. These might not be the top hits arising from either algorithm, but a major drawback of single tool-docking studies is their poor correlation to experimental data [21,32,53,54]. Correlation is always improved when molecules are identified using multiple different workflows, and we consider two methods the absolute minimum acceptable. We suggest at least four mutually independent methods for any planned CADD campaign.
These 75 hits were subjected to 12 ns MD simulations to analyze the stability of the complex, interaction energetics, and the conservation of specific intermolecular interactions. After MDs, the following parameters were taken into consideration to select the best hits: 1) a favorable binding affinity (kcal/mol) in terms of MMGBSA calculations; 2) minimum fluctuations of peptide inside the binding pocket (RMSD < 3.5 Å); 3) favorable interactions as determined from per-residue decomposition analysis, and 4) having at least one stable H-bond (highest occupancy) with one of the residues of the catalytic dyad over the 12 ns simulation.
The initial poses of all the selected 75 peptides fit pretty well in the Mpro binding pocket, predominantly interacting with the catalytic dyad. This is to be expected that only the best-fitting peptides would occupy the active site, and a strong hydrogen bond with a member of the catalytic dyad that is already primed to be nucleophilic and acidic was almost going to be a certainty. As the MD simulation progressed, some of the peptides migrated significantly and decreased their interaction with the catalytic dyad. This resulted in an unstable binding conformation; therefore, they could be excluded from further studies. The least disrupted 20 peptides were selected for further analysis and were subjected to a moderate 100 ns MD simulation. Of these, ten protein-peptide complexes displayed a stable RMSD over the simulation period, with the peptides staying in the binding pocket and interacting tightly with the protein's residues. All these peptides displayed significant interactions, including H-bond with His41 and Cys145 of the catalytic dyad (occupancy, >30%; average distance, <3 Å). We discussed the two best-binding peptides (HLFNT and GQYWH); the remainder (GHYFH, FKQH, IKRDM, CHQLN, GQRFL, AQRFS, AQQFN, and EEYCM) are discussed in the Supplementary data. We would like to highlight the diversity of these hits; it would be potentially expected that the algorithms would collapse into a series of closely related sequences with variability at only one residue. This is not the case here, and this variety maximizes the possibility of finding a hit that is amenable to further optimization by providing good coverage of the possible chemical space explored.
3.2.4. HLFNT and GQYWH as high affinity binders for Mpro
An analyzes of HLFNT and GQYWH's interactions with Mpro was conducted on the MD simulation data (Fig. 4, Fig. 5, Fig. 6). The final frame of the 12 ns production run was used as the initial frame for the extended 100 ns simulation. The RMSD of the Cα-backbone atoms of Mpro with bound peptide was examined and provided information about the overall stability of the system. The results indicated that the Mpro remained stable in the presence of peptides, and the peptides remained closely converged within 3 Å. This observation suggests that the formation of a stable complex was achieved, which was analyzed throughout the simulation duration.
Fig. 4.
The binding conformation of (A) GQYWH and (B) HLFNT inside the binding pocket of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease. The binding pocket residues are labelled and displayed in stick representation, while catalytic dyad residues are colored red.
Fig. 5.
Post-molecular dynamics (MD) simulation analysis of HLFNT@Mpro. (A) The root mean square deviation (RMSD) of Mpro Cα-backbone bound to HLFNT over the simulation period in ns. (B) Ligand-root mean square fluctuation (RMSF) of HLFNT for characterizing changes in the peptide atom positions during the final 45 ns of the MD simulations. (C) Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Mpro interactions with the peptide were monitored throughout the MD simulations. While interaction fraction represents the percentage of specific interaction maintained during simulation time, these are categorized into four types: hydrogen bonds, hydrophobic, ionic and water bridges. (D) A schematic plot of detailed ligand atom interactions with the protein residues; interactions that occur more than 30% during the simulation period are displayed.
Fig. 6.
Post-molecular dynamics (MD) simulation analysis of GQYWH@Mpro. (A) The root mean square deviation (RMSD) of Mpro Cα-backbone bound (RMSF) to GQYWH over the simulation period in ns. (B) Ligand-root mean square fluctuation (RMSF) of GQYWH for characterizing changes in the peptide atom positions during the final 50 ns of the MD simulations. (C) Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Mpro interactions with the peptide were monitored throughout the the final 50 ns of the MD simulations. While interaction fraction represents the percentage of specific interaction maintained during simulation time, these are categorized into four types: hydrogen bonds, hydrophobic, ionic and water bridges. (D) A schematic plot of detailed ligand atom interactions with the protein residues; interactions that occur more than 30% during the simulation period are displayed.
In the HLFNT@Mpro complex, the RMSD values of Mpro initially (for the first ∼50 ns) displayed slight fluctuation but then stabilized and shifted around 1 Å for the rest of ∼50 ns, indicating a tight and conserved interaction mode of HLFNT (Fig. 5A). The GQYWH@Mpro complex remained stable throughout the simulation period, and RMSD values rapidly converged (Fig. 6A). In HLFNT@Mpro, the initial fluctuations of Cα-backbone atoms of Mpro were evident from the ligand-root-mean-square-fluctuations (L-RMSF) values of HLFNT (∼3 Å) compared to the stable conformation of GQYWH (<3 Å) inside Mpro binding site throughout the simulation period. The presence of more aromatic residues in GQYWH added stability due to the presence of significant π-stacking interactions. Both peptides bind near the catalytic dyad. Besides their H-bond interactions with the catalytic dyad, both peptides formed H-bonds with adjacent binding site residues: Thr25, Asn119, Gly143, Glu166 and Gln189. Glu166 established a well-populated H-bond (occupancy, >90%) with backbone atoms of both peptides. In comparison, all other interactions were supported with more than 30% occupation (average distance <3 Å). The tryptophan in GQYWH established a well-populated π–π stacking interaction (occupancy, 97%) with the catalytic His41. The stable GQYWH@Mpro complex formation over the simulation period was possibly due to the formation of this consistent stacking interaction.
4. Discussion
A dataset of all sequences of linear di-, tri-, and tetra-peptides has been previously published by others to help perform CADD [55]. This or in-house libraries have been used, and there are reports of efforts to comprehensively screen all possible di- or tri-peptides against a target protein [56,57]; however, to the best of our knowledge, no machine-readable database is available with all possible pentapeptides or the N-to-C-terminal cyclic tri-, tetra-, and penta-peptides. This manuscript is accompanied by the publication of this library for use by others. We also provide a simple method for conducting prefiltration on these peptides. These libraries are provided in FASTA and SMILES formats that can be easily converted to dock-able formats, e.g., PDB, and MOL2. We also provide the SDF file with the prefiltered library we used in this study.
We chose to prefilter the dataset. It would not take significantly more computational time to screen all possible pentapeptides, but it is important to have a method for arbitrarily reducing the number of simulations to be run according to a research team's whims. The filtrations we conducted here are reasonable to eliminate problematic sequences, but it would also be reasonable for a researcher to not use these filters on a small dataset. The filters were selected in this case so that all peptides arising from the screen would be trivial to synthesize using automated solid phase peptide synthesis [58]. The method of filtration and this filtered data set are provided for ease of use by others should they so desire.
Most small peptides (di-, tri-, tetra-, and penta-peptides) have suitable physicochemical properties and are easy to synthesize. The natural zwitterion assists with solubility. However, this charge density decreases as the peptide grows in length so some would be sparingly soluble, while other sequences can prove unstable.
The chosen filters fall into three categories: solubility, stability, and ease of synthesis. In terms of solubility, we filtered out sequences liable for gel formation (>75% D, E, H, K, N, Q, R, S, T or Y, Filter 1). These would not likely be soluble as distinct species and would likely give false positives on many binding assays. We also removed sequences with a preponderance of aliphatic hydrophobic residues (Filter 2), and again the concern is the solubility of these peptides providing false positives in assays due to precipitation and non-specific binding [59]. We then removed a series of notoriously unstable sequences. Cysteine and methionine are liable to rapid oxidation and cleavage of protecting groups during synthesis, making subsequent purification complicated; peptides containing multiple of either sulfur-containing residues were removed (Filter 3). The final category is synthetically challenging sequences that might have reduced purity coming off a bead. A typical pentapeptide needs only a simple precipitation followed by solid phase extraction to render a sufficiently pure material for evaluation. Some combinations of residues are known to be problematic and could interfere with this workflow. Short peptides containing two or more serines or prolines are more likely to lead to deletions during synthesis and were removed (Filter 4). Similarly, aspartic acid residues adjacent to glycine, serine, or proline are susceptible to hydrolysis during acidic cleavage conditions, typically used in solid-phase peptide synthesis (Filter 5). Finally, N-terminal glutamines readily cyclize to pyroglutamate and N-terminal asparagines are irritating to deprotect without decomposition. Peptides with either motif were removed (Filters 6 and 7). As this is a virtual study, these filters could have been omitted, but were included to show the ease of use of the technique and the library manipulation. Even then, we didn't prefilter the tetra-, tri-, or di-peptides in any way (Fig. 2). A limitation of employing filters, like those discussed in this article, is that they will eliminate some of promising sequences. We emphasize the employment of the filters on case-to-case bases, e.g., Filter 5, removing sequences where D is adjacent to P, G, or S as aspartic acid-containing peptides can suffer hydrolysis resulting in peptide cleavage under acidic conditions with these sequences in place. This suggests that these combinations should be avoided when designing orally administrable peptides. However, RGD is also the key triplet for driving cell adhesion and is extremely common in many cell surface peptides and proteins [60]. Depending on the desired application, the user should use filters with care. We provide this template for the end users to employ or ignore as they see fit for their given biomedical challenge.
To understand the challenges associated with screening these peptides in a CADD workflow, we selected SARS-CoV-2 Mpro as a potential target due to its compact and flexible catalytic site [61]. Precise binding affinity and peptide conformational prediction of the protein-peptide complex is challenging [53,62]. We established an integrated peptide virtual screening pipeline to overcome this by incorporating two state-of-the-art molecular docking suits (Fig. 2). The benchmark studies have reported that the Goldscore is the most efficient for small peptide docking among different scoring functions available in the Gold docking suite [21]. Goldscore can predict near-native conformations of top-scoring peptides (having more than 9 residues) with an accuracy higher than 30% [21]. The Glide SP-peptide has been proven to be 53% accurate in predicting peptide poses. The accuracy could be improved to 58% when SP-peptide is preceded by Prime MMGBSA [63,64].
The binding pocket of SARS-CoV-2 Mpro is not deep, and most of it is solvent-exposed. The binding pocket of SARS-CoV-2 Mpro in two apo structures, 7ALH and 6M2Q, interact with 12 and 10 water molecules, respectively [65,66]. The part of the inhibitors sitting in this solvent-accessible region could interact with the solvent, thereby reducing the strength of the protein-ligand interaction. Under these conditions, most docking protocols do not account for the role of water molecules, and docking accuracy can be very poor in highly solvent-accessible binding pockets. The MMGBSA and, especially the MD simulations become very important due to the involvement of water molecules.
Among the top ten selected hits, one linear tetrapeptide and nine linear pentapeptides were identified. No cyclic peptide was identified in the selected top hits because the binding site of SARS-CoV-2 Mpro has a tight deep gorge that could not accommodate cyclic peptides. Neither of our top hits were previously identified in the literature nor evaluated against Mpro. Some custom small peptides (1–8) analogous to our selected top hits (GQYWH and HLFNT) tested as SARS-CoV-1 Mpro inhibitors are presented in Fig. 7. The peptides 1, 2, 3, and 4 have been tested to inhibit Mpro at 100 μM with percentage inhibition values of 72%, 23%, <10%, and <10%, respectively [67]. The peptide 5 is a potent inhibitor of SARS-CoV-1 Mpro with Ki of 58 nM [68]. Peptides 6, 7, and 8 have been cocrystallized with Mpro of SARS-CoV-1. The Ki value of 6 was 10.7 μM [69], and the IC50 of 7 was 80 μM [70]. The Kinact/Ki value of 8 against Mpro of SARS-CoV was 1900 M−1s−1 [71].
Fig. 7.
Structures of small peptidomimetics reported as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Mpro inhibitors. Structures of two top hits (GQYWH and HLFNT) identified in this study. The hydroxyl group of threonine (T) in entries 1, 3, 4, and 5 is in the ether form with benzyl or tert-butyl groups. PDB: Protein Data Bank.
Peptide 8 has a strong structural analogy with our identified hit, HLFNT. The N-terminal histidine of HLFNT is replaced with benzyl hydrogen carbonate in 8 and the following two amino acids, leucine and phenylalanine, are identical in both peptides. The fourth asparagine residue of HLFNT is replaced with glutamine type of residue having N in place of α-carbon in 8, having only one extra methylene residue. The C-terminal threonine of HLFNT is substituted with ethyl ester of 2-hydroxypropanoic acid in 8, both having hydroxyl groups. This close analogy validates the use of the prepared library and authenticates our in silico screening workflow.
5. Conclusion
Among the marketed therapeutics used today, peptides have proved a valuable niche in the drug development spectrum substituting small molecules and large biological therapeutics. The study provides a library of small linear peptides and their N-to-C-terminal cyclized analogues using all possible sequences of 20 canonical amino acids. We have also devised a comprehensive strategy to handle such large peptide datasets utilizing an appropriate array of tools to extract desirable outcomes. The given library could be screened against any desired target protein to identify small peptide hits. These hits can be further refined, e.g., through the attachment of amino acids on the C- or N-terminal to extend the length, substitution with noncanonical amino acids, or coupling additional organic moieties to the termini to further improve activity. We hope and believe that this library of high-quality, curated peptides provides a valuable resource for any researcher looking to find new peptide ligands and therapeutic agents against a variety of targets.
CRediT author statement
Sarfraz Ahmad: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Data curation, Writing - Original draft preparation, Reviewing and Editing, Visualization; Muhammad Usman Mirza: Methodology, Validation, Formal analysis, Investigation, Data curation, Writing - Original draft preparation, Reviewing and Editing, Visualization; John F. Trant: Conceptualization, Funding acquisition, Resources, Writing - Reviewing and Editing, Supervision.
Declaration of competing interest
The authors declare that there are no conflicts of interest.
Acknowledgments
The authors highly acknowledge the Natural Sciences and Engineering Research Council of Canada (Grant No.: 2018-06338) for providing the resources to conduct this work. Sarfraz Ahmad, Muhammad Usman Mirza and John F. Trant wish to recognize that this work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET: www.sharcnet.ca) and Compute/Calculation Canada, now the Digital Research Alliance of Canada (https://alliancecan.ca/en). The authors are also affiliated with Binary Star Research Services (https://binarystarchem.ca/). The content of this article is exclusively the responsibility of the authors and does not necessarily reflect the official views of Binary Star Research Services. Binary Star Research Services had no input into the conclusions of this article.
Footnotes
Peer review under responsibility of Xi'an Jiaotong University.
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jpha.2023.04.008.
Contributor Information
Muhammad Usman Mirza, Email: mumirza@uwindsor.ca.
John F. Trant, Email: j.trant@uwindsor.ca.
Appendix A. Supplementary data
The following is the Supplementary data to this article.
References
- 1.Bruno B.J., Miller G.D., Lim C.S. Basics and recent advances in peptide and protein drug delivery. Ther. Deliv. 2013;4:1443–1467. doi: 10.4155/tde.13.104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Anand U., Bandyopadhyay A., Jha N.K., et al. Translational aspect in peptide drug discovery and development: An emerging therapeutic candidate. Biofactors. 2023;49:251–269. doi: 10.1002/biof.1913. [DOI] [PubMed] [Google Scholar]
- 3.Muttenthaler M., King G.F., Adams D.J., et al. Trends in peptide drug discovery. Nat. Rev. Drug Discov. 2021;20:309–325. doi: 10.1038/s41573-020-00135-8. [DOI] [PubMed] [Google Scholar]
- 4.Petsalaki E., Russell R.B. Peptide-mediated interactions in biological systems: New discoveries and applications. Curr. Opin. Biotechnol. 2008;19:344–350. doi: 10.1016/j.copbio.2008.06.004. [DOI] [PubMed] [Google Scholar]
- 5.Wang L., Wang N., Zhang W., et al. Therapeutic peptides: Current applications and future directions. Signal Transduct. Target Ther. 2022;7:48. doi: 10.1038/s41392-022-00904-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Johnston S.L. Biologic therapies: What and when? J. Clin. Pathol. 2007;60:8–17. doi: 10.1136/jcp.2005.032300. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Boehncke W.-H., Brembilla N.C. Immunogenicity of biologic therapies: Couses and consequences. Expert Rev. Clin. Immunol. 2018;14:513–523. doi: 10.1080/1744666X.2018.1468753. [DOI] [PubMed] [Google Scholar]
- 8.Makurvet F.D. Biologics vs. small molecules: Drug costs and patient access. Med. Drug Discov. 2021;9 [Google Scholar]
- 9.Škalko-Basnet N. Biologics: The role of delivery systems in improved therapy. Biol. Targets Ther. 2014;8:107–114. doi: 10.2147/BTT.S38387. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Smith M.C., Gestwicki J.E. Features of protein–protein interactions that translate into potent inhibitors: Topology, surface area and affinity. Expert Rev. Mol. Med. 2012;14:e16. doi: 10.1017/erm.2012.10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.de Lomana M.G., Svensson F., Volkamer A., et al. Consideration of predicted small-molecule metabolites in computational toxicology. Digit. Discov. 2022;1:158–172. [Google Scholar]
- 12.Waldmann H. Human monoclonal antibodies: The residual challenge of antibody immunogenicity. Methods Mol. Biol. 2014:1–8. doi: 10.1007/978-1-62703-586-6_1. [DOI] [PubMed] [Google Scholar]
- 13.Fosgerau K., Hoffmann T. Peptide therapeutics: Current status and future directions. Drug Discov. Today. 2015;20:122–128. doi: 10.1016/j.drudis.2014.10.003. [DOI] [PubMed] [Google Scholar]
- 14.Hale M., Oyler G., Swaminathan S., et al. Basic tetrapeptides as potent intracellular inhibitors of type A botulinum neurotoxin protease activity. J. Biol. Chem. 2011;286:1802–1811. doi: 10.1074/jbc.M110.146464. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Vlieghe P., Lisowski V., Martinez J., et al. Synthetic therapeutic peptides: Science and market. Drug Discov. Today. 2010;15:40–56. doi: 10.1016/j.drudis.2009.10.009. [DOI] [PubMed] [Google Scholar]
- 16.Craik D.J., Fairlie D.P., Liras S., et al. The future of peptide-based drugs. Chem. Biol. Drug Des. 2013;81:136–147. doi: 10.1111/cbdd.12055. [DOI] [PubMed] [Google Scholar]
- 17.Kaspar A.A., Reichert J.M. Future directions for peptide therapeutics development. Drug Discov. Today. 2013;18:807–817. doi: 10.1016/j.drudis.2013.05.011. [DOI] [PubMed] [Google Scholar]
- 18.Caballero J. The latest automated docking technologies for novel drug discovery. Expert Opin. Drug Discov. 2021;16:625–645. doi: 10.1080/17460441.2021.1858793. [DOI] [PubMed] [Google Scholar]
- 19.Stanzione F., Giangreco I., Cole J.C. Use of molecular docking computational tools in drug discovery. Prog. Med. Chem. 2021;60:273–343. doi: 10.1016/bs.pmch.2021.01.004. [DOI] [PubMed] [Google Scholar]
- 20.Weng G., Gao J., Wang Z., et al. Comprehensive evaluation of fourteen docking programs on protein-peptide complexes. J. Chem. Theory Comput. 2020;16:3959–3969. doi: 10.1021/acs.jctc.9b01208. [DOI] [PubMed] [Google Scholar]
- 21.Hauser A.S., Windshügel B.R. LEADS-PEP: A benchmark data set for assessment of peptide docking performance. J. Chem. Inf. Model. 2016;56:188–200. doi: 10.1021/acs.jcim.5b00234. [DOI] [PubMed] [Google Scholar]
- 22.Salo-Ahen O.M.H., Alanko I., Bhadane R., et al. Molecular dynamics simulations in drug discovery and pharmaceutical development. Processes. 2020;9:71. [Google Scholar]
- 23.Steuten K., Kim H., Widen J.C., et al. Challenges for targeting SARS-CoV-2 proteases as a therapeutic strategy for COVID-19. ACS Infect. Dis. 2021;7:1457–1468. doi: 10.1021/acsinfecdis.0c00815. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Anand K., Ziebuhr J., Wadhwani P., et al. Coronavirus main proteinase (3CLpro) structure: Basis for design of anti-SARS drugs. Science. 2003;300:1763–1767. doi: 10.1126/science.1085658. [DOI] [PubMed] [Google Scholar]
- 25.Zhao Y., Fang C., Zhang Q., et al. Crystal structure of SARS-CoV-2 main protease in complex with protease inhibitor PF-07321332. Prot. Cell. 2022;13:689–693. doi: 10.1007/s13238-021-00883-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Lee J., Worrall L.J., Vuckovic M., et al. Crystallographic structure of wild-type SARS-CoV-2 main protease acyl-enzyme intermediate with physiological C-terminal autoprocessing site. Nat. Commun. 2020;11:5877. doi: 10.1038/s41467-020-19662-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Zhang L., Lin D., Sun X., et al. Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors. Science. 2020;368:409–412. doi: 10.1126/science.abb3405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Mirza M.U., Froeyen M. Structural elucidation of SARS-CoV-2 vital proteins: Computational methods reveal potential drug candidates against main protease, Nsp12 polymerase and Nsp13 helicase. J. Pharm. Anal. 2020;10:320–328. doi: 10.1016/j.jpha.2020.04.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Zhang L., Lin D., Kusov Y., et al. α-Ketoamides as broad-spectrum inhibitors of coronavirus and enterovirus replication: Structure-based design, synthesis, and activity assessment. J. Med. Chem. 2020;63:4562–4578. doi: 10.1021/acs.jmedchem.9b01828. [DOI] [PubMed] [Google Scholar]
- 30.Ahmad S., Mirza M.U., Lee Y.K., et al. Fragment-based in silico design of SARS CoV-2 main protease inhibitors. Chem. Biol. Drug Des. 2021;98:604–619. doi: 10.1111/cbdd.13914. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Wu C., Liu Y., Yang Y., et al. Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods. Acta Pharm. Sin. B. 2020;10:766–788. doi: 10.1016/j.apsb.2020.02.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Macip G., Garcia-Segura P., Mestres-Truyol J., et al. Haste makes waste: A critical review of docking-based virtual screening in drug repurposing for SARS-CoV-2 main protease (M-pro) inhibition. Med. Res. Rev. 2022;42:744–769. doi: 10.1002/med.21862. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Breidenbach J., Lemke C., Pillaiyar T., et al. Targeting the main protease of SARS-CoV-2: From the establishment of high throughput screening to the design of tailored inhibitors. Angew. Chem. Int. Ed. Engl. 2021;60:10423–10429. doi: 10.1002/anie.202016961. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Pillaiyar T., Flury P., Krüger N., et al. Small-molecule thioesters as SARS-CoV-2 main protease inhibitors: Enzyme inhibition, structure-activity relationships, antiviral activity, and X-ray structure determination. J. Med. Chem. 2022;65:9376–9395. doi: 10.1021/acs.jmedchem.2c00636. [DOI] [PubMed] [Google Scholar]
- 35.Hu Q., Xiong Y., Zhu G.-H., et al. The SARS-CoV-2 main protease (Mpro): Structure, function, and emerging therapies for COVID-19. MedComm. 2022;3:e151. doi: 10.1002/mco2.151. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.La Monica G., Bono A., Lauria A., et al. Targeting SARS-CoV-2 main protease for treatment of COVID-19: Covalent inhibitors structure-activity relationship insights and evolution perspectives. J. Med. Chem. 2022;65:12500–12534. doi: 10.1021/acs.jmedchem.2c01005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Gao K., Wang R., Chen J., et al. Perspectives on SARS-CoV-2 main protease inhibitors. J. Med. Chem. 2021;64:16922–16955. doi: 10.1021/acs.jmedchem.1c00409. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Bzówka M., Mitusińska K., Raczyńska A., et al. Structural and evolutionary analysis indicate that the SARS-CoV-2 Mpro is a challenging target for small-molecule inhibitor design. Int. J. Mol. Sci. 2020;21:3099. doi: 10.3390/ijms21093099. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Quan B.-X., Shuai H., Xia A.-J., et al. An orally available Mpro inhibitor is effective against wild-type SARS-CoV-2 and variants including Omicron. Nat. Microbiol. 2022;7:716–725. doi: 10.1038/s41564-022-01119-7. [DOI] [PubMed] [Google Scholar]
- 40.P. Kashyap, V.K. Bhardwaj, M. Chauhan, et al., A ricin-based peptide BRIP from Hordeum vulgare inhibits Mpro of SARS-CoV-2, Sci. Rep. 12 (2022), 12802. [DOI] [PMC free article] [PubMed]
- 41.Johansen-Leete J., Ullrich S., Fry S.E., et al. Antiviral cyclic peptides targeting the main protease of SARS-CoV-2. Chem. Sci. 2022;13:3826–3836. doi: 10.1039/d1sc06750h. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.M.U. Mirza, I. Alanko, M. Vanmeert, et al., The discovery of Zika virus NS2B-NS3 inhibitors with antiviral activity via an integrated virtual screening approach, Eur. J. Pharm. Sci. 175 (2022), 106220. [DOI] [PubMed]
- 43.Anderson E., Veith G.D., Weininger D. US Environmental Protection Agency, Environmental Research Laboratory; Duluth, MN: 1987. SMILES, A Line Notation and Computerized Interpreter for Chemical Structures. [Google Scholar]
- 44.Lipman D.J., Pearson W.R. Rapid and sensitive protein similarity searches. Science. 1985;227:1435–1441. doi: 10.1126/science.2983426. [DOI] [PubMed] [Google Scholar]
- 45.O’Boyle N.M., Banck M., James C.A., et al. Open Babel: An open chemical toolbox. J. Cheminformatics. 2011;3:33. doi: 10.1186/1758-2946-3-33. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Sastry G.M., Adzhigirey M., Day T., et al. Protein and ligand preparation: Parameters, protocols, and influence on virtual screening enrichments. J. Comput. Aided Mol. Des. 2013;27:221–234. doi: 10.1007/s10822-013-9644-8. [DOI] [PubMed] [Google Scholar]
- 47.Roos K., Wu C., Damm W., et al. OPLS3e: Extending force field coverage for drug-like small molecules. J. Chem. Theory Comput. 2019;15:1863–1874. doi: 10.1021/acs.jctc.8b01026. [DOI] [PubMed] [Google Scholar]
- 48.Anson B., Ghosh A.K., Mesecar A. Protein Databank; 2020. X-ray structure of SARS-CoV-2 main protease bound to GRL-024-20 at 1.45 A, 6XR3. [Google Scholar]
- 49.Mirza M.U., Saadabadi A., Vanmeert M., et al. Discovery of HIV entry inhibitors via a hybrid CXCR4 and CCR5 receptor pharmacophore-based virtual screening approach. Eur. J. Pharm. Sci. 2020;155 doi: 10.1016/j.ejps.2020.105537. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Li J., Abel R., Zhu K., et al. The VSGB 2.0 model: A next generation energy model for high resolution protein structure modeling. Proteins Struct. Funct. Bioinform. 2011;79:2794–2812. doi: 10.1002/prot.23106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Jones G., Willett P., Glen R.C., et al. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997;267:727–748. doi: 10.1006/jmbi.1996.0897. [DOI] [PubMed] [Google Scholar]
- 52.Hartshorn M.J., Verdonk M.L., Chessari G., et al. Diverse, high-quality test set for the validation of protein-ligand docking performance. J. Med. Chem. 2007;50:726–741. doi: 10.1021/jm061277y. [DOI] [PubMed] [Google Scholar]
- 53.Agrawal P., Singh H., Srivastava H.K., et al. Benchmarking of different molecular docking methods for protein-peptide docking. BMC Bioinform. 2019;19:105–124. doi: 10.1186/s12859-018-2449-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Sanner M.F., Dieguez L., Forli S., et al. Improving docking power for short peptides using random forest. J. Chem. Inf. Model. 2021;61:3074–3090. doi: 10.1021/acs.jcim.1c00573. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Prasasty V.D., Istyastono E.P. Data of small peptides in SMILES and three-dimensional formats for virtual screening campaigns. Data Br. 2019;27 doi: 10.1016/j.dib.2019.104607. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Panyayai T., Sangsawad P., Pacharawongsakda E., et al. The potential peptides against angiotensin-I converting enzyme through a virtual tripeptide-constructing library. Comput. Biol. Chem. 2018;77:207–213. doi: 10.1016/j.compbiolchem.2018.10.001. [DOI] [PubMed] [Google Scholar]
- 57.Mollica A., Zengin G., Durdagi S., et al. Combinatorial peptide library screening for discovery of diverse α-glucosidase inhibitors using molecular dynamics simulations and binary QSAR models. J. Biomol. Struct. Dyn. 2019;37:726–740. doi: 10.1080/07391102.2018.1439403. [DOI] [PubMed] [Google Scholar]
- 58.Petrou C., Sarigiannis Y. Peptide synthesis: Methods, trends, and challenges, Pept. Appli. Biomed. Biotech. Bioengin. 2018:1–21. [Google Scholar]
- 59.Sarma R., Wong K.-Y., Lynch G.C., et al. Peptide solubility limits: Backbone and side-chain interactions. J. Phys. Chem. B. 2018;122:3528–3539. doi: 10.1021/acs.jpcb.7b10734. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Kapp T.G., Rechenmacher F., Neubauer S., et al. A comprehensive evaluation of the activity and selectivity profile of ligands for RGD-binding integrins. Sci. Rep. 2017;7 doi: 10.1038/srep39805. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Rocha R.E., Chaves E.J., Fischer P.H., et al. A higher flexibility at the SARS-CoV-2 main protease active site compared to SARS-CoV and its potentialities for new inhibitor virtual screening targeting multi-conformers. J. Biomol. Struct. Dyn. 2022;40:9214–9234. doi: 10.1080/07391102.2021.1924271. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Ciemny M., Kurcinski M., Kamel K., et al. Protein-peptide docking: Opportunities and challenges, Drug Discov. Today. 2018;23:1530–1537. doi: 10.1016/j.drudis.2018.05.006. [DOI] [PubMed] [Google Scholar]
- 63.Tubert-Brohman I., Sherman W., Repasky M., et al. Improved docking of polypeptides with Glide. J. Chem. Inf. Model. 2013;53:1689–1699. doi: 10.1021/ci400128m. [DOI] [PubMed] [Google Scholar]
- 64.Feher M., Williams C.I. Numerical errors and chaotic behavior in docking simulations. J. Chem. Inf. Model. 2012;52:724–738. doi: 10.1021/ci200598m. [DOI] [PubMed] [Google Scholar]
- 65.Su H., Yao S., Zhao W., et al. Anti-SARS-CoV-2 activities in vitro of Shuanghuanglian preparations and bioactive ingredients. Acta Pharmacol. Sin. 2020;41:1167–1177. doi: 10.1038/s41401-020-0483-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.ul Qamar M.T., Mirza M.U., Song J.-M., et al. Probing the structural basis of Citrus phytochrome B using computational modelling and molecular dynamics simulation approaches. J. Mol. Liq. 2021;340 [Google Scholar]
- 67.Zhang J., Pettersson H.I., Huitema C., et al. Design, synthesis, and evaluation of inhibitors for severe acute respiratory syndrome 3C-like protease based on phthalhydrazide ketones or heteroaromatic esters. J. Med. Chem. 2007;50:1850–1864. doi: 10.1021/jm061425k. [DOI] [PubMed] [Google Scholar]
- 68.Yang S., Chen S.-J., Hsu M.-F., et al. Synthesis, crystal structure, structure-activity relationships, and antiviral activity of a potent SARS coronavirus 3CL protease inhibitor. J. Med. Chem. 2006;49:4971–4980. doi: 10.1021/jm0603926. [DOI] [PubMed] [Google Scholar]
- 69.Yang H., Xie W., Xue X., et al. Design of wide-spectrum inhibitors targeting coronavirus main proteases. PLoS Biol. 2005;3:e324. doi: 10.1371/journal.pbio.0030324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Ghosh A.K., Xi K., Grum-Tokars V., et al. Structure-based design, synthesis, and biological evaluation of peptidomimetic SARS-CoV 3CLpro inhibitors. Bioorg. Med. Chem. Lett. 2007;17:5876–5880. doi: 10.1016/j.bmcl.2007.08.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Lee T.-W., Cherney M.M., Huitema C., et al. Crystal structures of the main peptidase from the SARS coronavirus inhibited by a substrate-like aza-peptide epoxide. J. Mol. Biol. 2005;353:1137–1151. doi: 10.1016/j.jmb.2005.09.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.








