Abstract
Membranolytic anticancer peptides represent a potential strategy in the fight against cancer. However, our understanding of the underlying structure-activity relationships and the mechanisms driving their cell selectivity is still limited. We developed a computational approach as a step towards the rational design of potent and selective anticancer peptides. This machine learning model distinguishes between peptides with and without anticancer activity. This classifier was experimentally validated by synthesizing and testing a selection of 12 computationally generated peptides. In total, 83% of these predictions were correct. We then utilized an evolutionary molecular design algorithm to improve the peptide selectivity for cancer cells. This simulated molecular evolution process led to a five-fold selectivity increase with regard to human dermal microvascular endothelial cells and more than ten-fold improvement towards human erythrocytes. The results of the present study advocate for the applicability of machine learning models and evolutionary algorithms to design and optimize novel synthetic anticancer peptides with reduced hemolytic liability and increased cell-type selectivity.
Subject terms: Medicinal chemistry, Protein design, Cheminformatics
Introduction
Cancer therapy faces the challenge of resistance to chemotherapeutics and receptor-targeted anticancer agents. Several cell resistance mechanisms, such as drug inactivation or efflux, target protein alteration, DNA damage repair and signaling cascade alteration have been identified1,2. Moreover, the indiscriminate action of most chemotherapeutics towards all rapidly dividing cells causes a variety of severe side effects3,4. Membranolytic anticancer peptides (ACPs) represent a new class of potential cancer therapeutics. Their receptor-independent mechanism of action may hinder the development of cellular resistance3–5. Nevertheless, the underlying structure-activity relationship that explains the membranolytic properties of these peptides is not completely understood. Peptide amphipathicity, moderate overall hydrophobicity, and a positive net charge are known requirements for ACP activity6–9. However, no simple combination of these properties has been found sufficient to fully explain the activity and selectivity of ACPs towards cancer cells10. Producing novel peptides lacking toxicity against nonneoplastic cells also remains challenging11. Various machine learning methods have been successfully applied to guide the rational design of both ACPs12–18 and antimicrobial peptides (AMPs)19,20, as well as other membrane-active peptides21. The lack of a systematic annotation of the selectivity of ACPs towards cancer cells in the literature and in peptide databases has hindered the development of predictive models that take selectivity into account. There is a need for innovative methods that do not require selectivity data for peptide optimization.
Simulated molecular evolution (SME) is a stochastic optimization algorithm pioneered in the 1990s for computational peptide design22–24. SME belongs to the class of evolutionary algorithms, which also includes genetic algorithms. It enables the optimization of peptide properties that are encoded in a theoretical fitness function, alone or in combination with an experimental fitness evaluation, which is useful when structure-activity relationships cannot be determined a priori. We have recently applied this design concept to generate innovative membrane-targeting peptides25,26. Here, we present a peptide design approach that is based on a novel ACP prediction model and on SME for the optimization of ACP selectivity for cancer cells. The predictive machine learning model led to the discovery of four novel synthetic ACPs with low-micromolar activity (1–20 µM) against A549 lung cancer and MCF7 breast cancer cells. One of these peptides was then subjected to SME. After the first iteration of the optimization process, we obtained a novel ACP that showed micromolar activities against a range of cancer cell types with significantly reduced activity towards human dermal microvascular endothelial cells (HDMEC) and human erythrocytes. The results of this study advocate for machine-learning models in combination with computational sequence generators for designing and optimizing functional peptides in silico.
Results and Discussion
ACP classifier model
We developed a machine learning model to classify peptides into ACPs and non-ACPs based on their amino acid sequence representations. The machine-learning classifier was trained on “positive” (ACPs, active) and “negative” (non-ACPs, inactive) peptides. We retrieved alpha-helical ACPs from the CancerPPD database27 as positive examples (N = 339). For the negative class, we retrieved alpha-helices from nontransmembrane proteins in the PDB database28 (N = 680). All amino acid sequences were represented numerically in a computer-readable form by the use of molecular descriptors. For this purpose, we utilized a combination of PEPCATS pharmacophore feature descriptors29 and four global properties, namely, Eisenberg’s hydrophobicity, Eisenberg’s hydrophobic moment30, charge density, and peptide length (number of residues). The PEPCATS descriptor represents the amino acid sequences as binary vectors indicating cross-correlated pharmacophore features of the individual amino acids (hydrophobic, aromatic, hydrogen-bond acceptor, hydrogen-bond donor, positively ionizable, negatively ionizable). The cross-correlation of pharmacophoric feature pairs is determined within a sliding sequence window encompassing seven residues. The resulting 151-dimensional descriptor vector was reduced to an 18-dimensional feature vector by covariance elimination and sequential feature elimination (Fig. S1, Supplementary Information). The dataset was split into a training set (2/3) and an independent test set (1/3) by stratified sampling, preserving the proportion between the positive and negative classes. Two machine learning algorithms were considered for model development: random forests31 and support vector machines (SVM)32. We optimized the SVM model’s hyperparameter by 10-fold cross-validation on the training data and chose a linear kernel for SVM training to enable straightforward feature interpretation. 
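The two Eisenberg-based global descriptors named above can be illustrated with a minimal, dependency-free sketch. It assumes the commonly tabulated Eisenberg consensus scale values, an ideal alpha-helical angle of 100° per residue, and a moment computed over the whole sequence; the function names are hypothetical, and the numbers it returns need not match the descriptor values reported in this study, which may have been computed with a different windowing or scaling.

```python
import math

# Eisenberg consensus hydrophobicity scale (values as commonly tabulated;
# treat them as an assumption and check against ref. 30 before reuse).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def global_hydrophobicity(seq):
    """Mean Eisenberg hydrophobicity over all residues of the sequence."""
    return sum(EISENBERG[aa] for aa in seq) / len(seq)


def hydrophobic_moment(seq, angle_deg=100.0):
    """Per-residue hydrophobic moment of the whole sequence.

    Residue i contributes a vector of length H_i that points at
    i * angle_deg on the helical wheel; the moment is the length of
    the vector sum divided by the number of residues.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


# Example call on the AmphiArc2 sequence from Table 4.
peptide = "KIFKKFKTIIKKVWRIFGRF"
print(global_hydrophobicity(peptide), hydrophobic_moment(peptide))
```

A homopolymer of 18 residues spans exactly five helical turns at 100° per residue, so its contributions cancel and the moment is zero; an amphipathic sequence, in contrast, accumulates a large moment.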
The performance of both classifiers exceeded 0.9 for both the training and the test data for all calculated metrics (Table 1). The SVM model was selected for further analysis due to the robustness of its decision function, which is determined solely by the support vectors and therefore unaltered by the addition of new data points that lie outside the decision margin32. Additionally, an analytical decision function as a linear combination of the model features can be extracted from linear support vector machines, whose weights indicate feature importance for the classification problem32.
Table 1.
Metrics | SVM CV score | SVM Train score | SVM Test score | Random Forest CV score | Random Forest Train score | Random Forest Test score |
---|---|---|---|---|---|---|
MCC | 0.88 ± 0.05 | 0.91 | 0.90 | 0.90 ± 0.05 | 1 | 0.91 |
Accuracy | 0.94 ± 0.02 | 0.96 | 0.96 | 0.95 ± 0.02 | 1 | 0.96 |
Precision | 0.89 ± 0.04 | 0.92 | 0.91 | 0.96 ± 0.03 | 1 | 0.97 |
Recall | 0.95 ± 0.06 | 0.96 | 0.95 | 0.90 ± 0.06 | 1 | 0.90 |
Scores obtained from ten-fold cross-validation (CV score, mean ± std), on the whole training dataset (Train score) and on the independent test dataset (Test score) for the support vector machine and random forest models.
We then compared the performance of our SVM model on the test dataset with that of ACP prediction tools publicly available online, specifically the AntiCP models 1 and 2 (ref. 13), the iACP model33, and the MLACP model18. These ACP prediction models are also based on an SVM classifier but utilize different descriptors and training data (Table S1, Supplementary Information). The prediction performance of the four classifiers was assessed on the same independent test dataset (Table 2) and compared with the test scores of our SVM model (Table 1). In this experiment, our SVM model outperformed the four publicly available ACP prediction models on every metric, with two exceptions: the precision of MLACP (0.96 vs. 0.91) and the recall of AntiCP model 1 (0.99 vs. 0.95), the latter obtained at a precision of only 0.29. The MLACP model showed higher precision but lower Matthews correlation coefficient (MCC), accuracy and recall than our SVM model. Therefore, the MLACP model is better at avoiding false positives but produces more false negatives than the SVM model developed in this study.
Table 2.
Metrics | AntiCP Model 1 | AntiCP Model 2 | iACP | MLACP |
---|---|---|---|---|
MCC | −0.04 | 0.81 | 0.51 | 0.84 |
Accuracy | 0.29 | 0.92 | 0.77 | 0.93 |
Precision | 0.29 | 0.81 | 0.58 | 0.96 |
Recall | 0.99 | 0.92 | 0.78 | 0.80 |
Feature importance for ACP activity
We analyzed the feature weights of the SVM classifier to gain an understanding of important discriminatory features for distinguishing between ACPs and non-ACPs (Table 3, Fig. S2, Supplementary Information). Features were ranked by their absolute weight values as a measure of their relative importance for ACP classification. The global hydrophobicity (H), hydrophobic moment (µH) and the frequency of positively charged amino acid pairs separated by one residue (PPd2) were identified as important features of the classifier (weight values w = 1.65, w = 0.5 and w = 0.39, respectively). This finding is in accordance with previous reports on ACPs that highlight the relevance of the hydrophobicity, the hydrophobic moment and a net positive charge for anticancer activity7,34. The peptide length was also identified as a discriminatory feature (w = 0.4), indicating that longer peptides were considered more likely to be active. Two features that take into account the frequency of amino acids with hydrogen-bond donor and acceptor groups (ADd0, DDd0) were identified as bearing the greatest absolute weights (w = −1.94 and w = 1.67, respectively), emphasizing their role in distinguishing ACPs from inactive peptides (Table 3).
Table 3.
Feature | Weight | Description |
---|---|---|
ADd0 | −1.94 | Frequency of amino acids with hydrogen-bond acceptor and donor groups (T, C, Q, N, S and Y) |
DDd0 | 1.67 | Frequency of amino acids with hydrogen-bond donor groups (K, T, C, Q, H, R, W, N, S and Y) |
H | 1.65 | Global peptide hydrophobicity (Eisenberg consensus scale30) |
RPd0 | −0.72 | Frequency of aromatic amino acids with a positively ionizable group (H) |
ADd2 | 0.65 | Frequency of amino acids with hydrogen-bond acceptor and amino acids with donor groups at distance 2 |
µH | 0.50 | Peptide hydrophobic moment |
LDd0 | 0.40 | Frequency of lipophilic amino acids with hydrogen-bond donor groups |
Len | 0.40 | Peptide length |
PPd2 | 0.39 | Frequency of amino acids with positively ionizable groups at distance 2 |
RPd5 | 0.38 | Frequency of aromatic amino acids and amino acids with positively ionizable groups at distance 5 |
APd6 | −0.38 | Frequency of amino acids with hydrogen-bond acceptor groups and amino acids with positively ionizable groups at distance 6 |
RAd3 | −0.26 | Frequency of aromatic amino acids and amino acids with hydrogen-bond acceptor groups at distance 3 |
RAd2 | −0.25 | Frequency of aromatic amino acids and amino acids with hydrogen-bond acceptor groups at distance 2 |
APd1 | −0.25 | Frequency of amino acids with hydrogen-bond acceptor groups and amino acids with positively ionizable groups at distance 1 |
DNd1 | −0.16 | Frequency of amino acids with hydrogen-bond donor groups and amino acids with negatively ionizable groups at distance 1 |
APd2 | −0.11 | Frequency of amino acids with hydrogen-bond acceptor groups and amino acids with positively ionizable groups at distance 2 |
RPd2 | −0.08 | Frequency of aromatic amino acids and amino acids with positively ionizable groups at distance 2 |
RRd6 | 0.02 | Frequency of aromatic amino acids at distance 6 |
The top scoring features are ranked by their absolute support vector machine weight values, as a measure of their relative importance for ACP classification. An interpretation of each feature is provided.
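The ranking by absolute weight and the linear decision function described for the SVM can be reproduced from the values in Table 3. The sketch below is illustrative only: the weights are copied from the table, whereas the bias term and the feature values passed to `decision_value` are hypothetical, because the intercept and the feature scaling of the trained model are not given in the text.

```python
# SVM weights of the 18 selected features, copied from Table 3
# ("muH" stands for the hydrophobic moment).
WEIGHTS = {
    "ADd0": -1.94, "DDd0": 1.67, "H": 1.65, "RPd0": -0.72, "ADd2": 0.65,
    "muH": 0.50, "LDd0": 0.40, "Len": 0.40, "PPd2": 0.39, "RPd5": 0.38,
    "APd6": -0.38, "RAd3": -0.26, "RAd2": -0.25, "APd1": -0.25,
    "DNd1": -0.16, "APd2": -0.11, "RPd2": -0.08, "RRd6": 0.02,
}


def rank_by_importance(weights):
    """Sort (feature, weight) pairs by decreasing absolute weight."""
    return sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)


def decision_value(features, weights, bias=0.0):
    """Linear decision function: sum over w_k * x_k plus a bias term.

    The bias default of 0.0 is a placeholder, not the fitted intercept.
    Features missing from the input are treated as zero.
    """
    return sum(w * features.get(name, 0.0) for name, w in weights.items()) + bias


for name, weight in rank_by_importance(WEIGHTS)[:5]:
    print(name, weight)
```

A positive weight pushes the decision value towards the ACP class as the corresponding (scaled) feature increases, and a negative weight pushes it towards the non-ACP class.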
De novo design of ACPs
To make use of the SVM model for the in silico design of novel ACPs, we generated three virtual peptide libraries of 100,000 peptides each, based on different design principles (Fig. S3, Supplementary Information):
The Helical library contains peptides with the position-dependent amino acid distribution of alpha-helical ACPs11.
The Amphipathic Arc library contains amphipathic peptides with differently sized hydrophobic arcs and a high probability of being cationic.
The Gradient library contains amphipathic peptides that possess a linear hydrophobic gradient.
We predicted the activity of the peptides from each library with our SVM model (Fig. S4, Supplementary Information). More than 80% of the peptides from the Amphipathic Arc and Gradient libraries and more than 60% of the peptides from the Helical library received an SVM score >0.5, indicating potential actives. In contrast, only 10% of peptides with random sequences were predicted to be active. The design principles, therefore, enriched the libraries with potentially active peptides in contrast with random peptide generation.
The similarity of the peptides in the training data was analyzed to consider the applicability domain of the SVM model35; this domain is the chemical space in which the model predictions may be considered reliable. The SVM model was utilized to estimate the pseudo-probabilities (i.e., the probabilities predicted by the model) of the peptides to belong to the active and inactive classes. These scores were subsequently weighted by the similarity to the training data to obtain similarity-weighted scores that consider the model’s applicability domain (ϕACP, ϕNeg, Eqs 5 and 6).
From each peptide library, we selected the two peptides with the highest ϕACP scores and the two peptides with the highest ϕNeg scores. None of the selected peptides were found in the training data or the CancerPPD database, and no peptides with >95% similarity to the selected ones were retrieved from the CancerPPD database, as determined by the CD-HIT program36. We then synthesized the 12 peptides and determined their half-effective concentration (EC50) values against the MCF7 and A549 cancer cell lines. For 10 of the 12 synthesized peptides, the predictions were correct (Table 4). None of the six peptides predicted to be inactive killed more than 50% of the cancer cells at a concentration of 50 µM. Of the six peptides predicted to be active, two were determined to be false positives (inactive at 50 µM) (Figs S10 and S11, Supplementary Information). Of the four correctly predicted active peptides, three were active in a low-micromolar range against both of the tested cancer cell lines, and the fourth (Gradient2) showed activity solely against MCF7 cells (Table 4).
Table 4.
Peptide | Sequenceᵃ | ϕACP | ϕNeg | Predictionᵇ | MCF7 EC50 (µM) | A549 EC50 (µM) | Outcomeᶜ |
---|---|---|---|---|---|---|---|
Helical1 | FLWIKLGKLAGAVLKLILGLKKVV | 0.94 | 0.45 | + | 4.4 ± 1.3 | 8.3 ± 2.0 | TP |
Helical2 | GLWAIAVKAGKVILKLIVFIWIRV | 0.94 | 0.45 | + | >50 | >50 | FP |
Helical3 | GLLDIAGGNAETLAGHAV | 0.44 | 0.90 | − | >50 | >50 | TN |
Helical4 | GLFDVIGSQAGGAAPHFLG | 0.46 | 0.89 | − | >50 | >50 | TN |
AmphiArc1 | KWVKKVHNWLRRWIKVFEALFG | 0.96 | 0.46 | + | 7.0 ± 0.5 | 18.4 ± 0.7 | TP |
AmphiArc2 | KIFKKFKTIIKKVWRIFGRF | 0.95 | 0.46 | + | 5.7 ± 0.7 | 9.3 ± 1.5 | TP |
AmphiArc3 | AFRHSVKEELNYIRRRLERFPNRL | 0.42 | 0.91 | − | >50 | >50 | TN |
AmphiArc4 | RIENGLRKRLQSIYRHLEE | 0.42 | 0.91 | − | >50 | >50 | TN |
Gradient1 | KWVRIWIKVLRGLFVWVWFF | 0.96 | 0.46 | + | >50 | >50 | FP |
Gradient2 | AWLKRIKKFLKALFWVWVW | 0.96 | 0.46 | + | 19.0 ± 1.8 | >50 | TP |
Gradient3 | KVVDNFENILII | 0.40 | 0.85 | − | >50 | >50 | TN |
Gradient4 | RVNAAIPNIIV | 0.41 | 0.84 | − | >50 | >50 | TN |
The peptides from each virtually designed library were evaluated according to a similarity-weighted score for belonging to the positive (ϕACP) and negative (ϕNeg) class. The two peptides with the highest ϕACP and ϕNeg scores for each library were synthesized and tested for anticancer activity on breast adenocarcinoma (MCF7) and lung adenocarcinoma (A549) cell lines (EC50, mean ± std, N = 3).
ᵃAll peptides were synthesized with amidated C-termini; ᵇPrediction: +, predicted to be active; −, predicted to be inactive; ᶜOutcome: TP, true positive; FP, false positive; TN, true negative.
The AmphiArc2 peptide, the shortest of the low-micromolar active peptides, has a high hydrophobic moment (µH = 0.87) and a 180° arc of hydrophobic residues in an idealized helical structure (Fig. 1a). As determined by circular dichroism (CD) spectroscopy, the AmphiArc2 peptide is unstructured in pure water but adopts an alpha-helical structure in a hydrophobic environment (50% v/v water:2,2,2-trifluoroethanol, TFE) (Fig. 1b). Helix formation in a hydrophobic, membrane-like environment has been shown to be a characteristic of certain alpha-helical AMPs and ACPs37,38. To further investigate its membranolytic action, we observed the activity of AmphiArc2 on single MCF7 cells entrapped in a microfluidic chip. Video recordings showed morphological changes in the cell membrane and leakage of the cytosolic components as early as 30 seconds after initial contact of the cells with the peptide (Fig. 1c, Supplementary Information, Video SV1). After 95 seconds, the dye encapsulated in the cancer cell had leaked out, and the cell membrane showed deformations and blebbing.
After characterizing the anticancer activity of the AmphiArc2 peptide, we tested its cell-type selectivity. We determined its EC50 value against the noncancer HDMEC primary cell line and half-effective hemolytic concentration (HC50) against human erythrocytes (Fig. 1d). Both values were found to be in the same low-micromolar range as the EC50 against cancer cell lines, indicating toxicity of this peptide against noncancer cells.
Selectivity optimization of a de novo designed ACP
We applied the SME algorithm to improve the selectivity of the AmphiArc2 peptide for cancer cells over noncancer cells. SME comprises a variation (mutation) operator and a selection operator (Fig. 2a). By variation, a series of offspring is generated from a parent sequence. The fittest offspring of a generation is selected and used as the parent in the next SME iteration. In this study, parents were selected among the offspring that maintained anticancer activity but showed enhanced selectivity for cancer cells (selection operator). The mutations in the sequence variation step were performed according to a normalized Gaussian probability distribution of pairwise amino acid similarity (dij) (Fig. 2b). As a similarity measure, we utilized the Grantham matrix, which takes into account the atom composition, the polarity and the molecular volume of the residues39. The probability of substituting residue i with residue j decreases with decreasing pairwise amino acid similarity. The degree of similarity of the offspring peptides to the parent sequence (offspring diversity) was controlled via the sigma (σ) parameter (Fig. 2b). A higher sigma value allowed the generation of sequences further away from the parent peptide (Fig. S5, Supplementary Information).
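The variation operator can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: the four-letter alphabet and its distance values are toy numbers scaled to [0, 1] (the study used the Grantham matrix), the Gaussian weight is taken as exp(−d²/2σ²) and row-normalized, and retaining the parent residue (d = 0) is treated as one of the possible outcomes. The selection step is experimental in this work and is therefore not coded.

```python
import math
import random

# Toy residue distances on a reduced alphabet, scaled to [0, 1].
# Hypothetical values for illustration; not the Grantham matrix.
ALPHABET = ["K", "R", "L", "F"]
DIST = {
    "K": {"K": 0.00, "R": 0.12, "L": 0.50, "F": 0.47},
    "R": {"K": 0.12, "R": 0.00, "L": 0.47, "F": 0.45},
    "L": {"K": 0.50, "R": 0.47, "L": 0.00, "F": 0.10},
    "F": {"K": 0.47, "R": 0.45, "L": 0.10, "F": 0.00},
}


def transition_probabilities(residue, sigma):
    """Row-normalized Gaussian weights for mutating `residue` into each letter."""
    weights = {j: math.exp(-DIST[residue][j] ** 2 / (2.0 * sigma ** 2))
               for j in ALPHABET}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def mutate(parent, sigma, rng):
    """Draw one offspring by sampling every position independently."""
    child = []
    for aa in parent:
        probs = transition_probabilities(aa, sigma)
        child.append(rng.choices(ALPHABET, weights=[probs[j] for j in ALPHABET])[0])
    return "".join(child)


def make_offspring(parent, n_offspring, sigma, seed=0):
    """Generate n_offspring mutated sequences from one parent sequence."""
    rng = random.Random(seed)
    return [mutate(parent, sigma, rng) for _ in range(n_offspring)]


print(make_offspring("KLKFRLKF", n_offspring=10, sigma=0.1))
```

With a small σ, nearly all probability mass stays on the parent residue and its closest neighbors, which yields conservative substitutions; a larger σ flattens the distribution and produces offspring further from the parent.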
We performed a total of three SME iterations, starting from the AmphiArc2 peptide. In the first iteration, we generated 10 offspring peptides with σ = 0.1 (Fig. 2c). The mutations introduced by this sigma value were conservative amino acid changes that maintained the overall amphipathicity of the peptide. We synthesized the ten offspring peptides of each of the three SME generations and tested them against the MCF7 and A549 cancer cell lines to determine their anticancer activity. For selectivity assessment, we tested their activity against the noncancer HDMEC primary cell line and measured their hemolytic activity on human erythrocytes (Fig. 2d).
The results obtained demonstrate that small conservative amino acid replacements affected the activity and selectivity of these ACPs while conserving their overall amphipathic helical structure in a lipophilic environment. Offspring n.2 (Off2) maintained the low-micromolar activity of the AmphiArc2 peptide against the A549 and MCF7 cancer cells but showed a 12-fold reduction of hemolytic activity against human erythrocytes and a ten-fold reduction of activity against HDMEC cells (Fig. 2d). Therefore, we selected Off2 as the parent for the next SME iteration (Fig. 3), in which ten new peptides were generated (Off2.1 to Off2.10).
The second generation of peptide variation did not achieve meaningful selectivity improvements with respect to HDMEC cells (Fig. S7, Supplementary Information). Five of the offspring peptides (Off2.1, Off2.3, Off2.4, Off2.9, Off2.10) were inactive. This loss of activity correlated with the introduction of a proline residue in the sequence (Fig. S7, Supplementary Information). Prolines affect alpha-helical conformation by introducing helix kinks and breaks40. We corroborated this secondary structure disruption with circular dichroism analysis of Off2.1, Off2.3, Off2.4 and Off2.9 (Fig. S9, Supplementary Information).
In the third SME generation (Off2.2.1 – Off2.2.10), we actively omitted proline residues and reduced the sigma value from 0.1 to 0.06 to explore close analogs of Off2 and Off2.2 (Fig. S8, Supplementary Information). Off2.2.10 showed decreased activity towards the noncancer HDMEC primary cells (Fig. 3d). This increase in selectivity was accompanied by a decreased activity against both the A549 and MCF7 cell lines.
The most active, but nonselective, AmphiArc2 parent peptide and the most cancer-cell selective Off2.2.10 peptide possess several differences and commonalities in their physicochemical properties. Even though both peptides display a hydrophobic arc of 180°, the hydrophobic moment of Off2.2.10 (µH = 0.64) is lower than that of AmphiArc2 (µH = 0.87) (Fig. 3c). The parent peptide bears eight positive charges, while Off2.2.10 contains seven positively ionizable residues caused by the N-terminal K1Q mutation. This moderate reduction of both the hydrophobic moment and the net positive charge improved the peptide selectivity for cancer cells and reduced the risk of killing non-transformed cells. To further explore these sequence features, we analyzed the ratios of the EC50 in the noncancer cells and in the cancer cell lines of all tested peptides. The more selective peptides (higher EC50 ratio) are characterized by moderate hydrophobic moments and charge densities (Supplementary Information, Fig. S10), suggesting a guideline for optimizing the cancer-cell selectivity of ACPs. This observation is in accordance with reports stating that decreasing the hydrophobic moment of helical ACPs reduces both their hemolytic potential and anticancer activity7–9.
NCI-60 cancer cell panel testing
The ACP candidates AmphiArc2 (parent), Off2 and Off2.2.10 were tested on the NCI-60 cancer cell panel41. The three tested peptides inhibited the growth of all the cancer cell lines in the NCI-60 panel at low-micromolar concentrations (Table 5, Supplementary Information Table S3). This result corroborated the wide-spectrum effect of the anticancer peptides across a range of cancer types. The activities of both the Off2 and Off2.2.10 peptides on the cell lines tested were significantly lower than the anticancer activity of the AmphiArc2 peptide (p-value = 4.9 × 10⁻¹³ and 1.7 × 10⁻¹², respectively, Welch two-sample t-test), suggesting that the initial increase in cancer cell selectivity comes at the cost of an activity loss. No significant difference in anticancer activity was found between the Off2 and Off2.2.10 peptides (p-value = 0.66, Welch two-sample t-test), indicating that the additional improvement in selectivity did not affect the average anticancer activity of these two peptides.
Table 5.
Cancer type | AmphiArc2 log GI50 | Off2 log GI50 | Off2.2.10 log GI50 |
---|---|---|---|
Leukemia | −5.5 | −5.2 | −5.3 |
Lung | −5.6 | −5.4 | −5.2 |
Colon | −5.6 | −5.2 | −5.1 |
CNS | −5.6 | −5.2 | −5.2 |
Melanoma | −5.6 | −5.3 | −5.2 |
Ovarian | −5.6 | −5.2 | −5.2 |
Renal | −5.7 | −5.3 | −5.2 |
Prostate | −5.7 | −5.5 | −5.4 |
Breast | −5.6 | −5.4 | −5.4 |
The averaged peptide activity for the cancer types tested is shown as the logarithm of the half growth inhibitory concentration (GI50, Supplementary Information Eq. S1), which is the molar concentration of peptide needed to inhibit half of the normal cancer cell growth. The tabulated value n is the base-10 logarithm of the GI50 in mol/L, i.e., GI50 = 10^n M. Values from −5 to −6 correspond to growth inhibition in the 1–10 µM range. The growth inhibition values for the individual cell lines are displayed in Supplementary Information Table S4.
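As a worked conversion, the logarithmic values of Table 5 can be turned back into micromolar concentrations. The helper name below is hypothetical; the example values are the leukemia row of Table 5.

```python
def log_gi50_to_micromolar(log_gi50):
    """Convert log10(GI50 in mol/L) to a concentration in µM."""
    return 10.0 ** log_gi50 * 1e6


# Leukemia row of Table 5.
leukemia = {"AmphiArc2": -5.5, "Off2": -5.2, "Off2.2.10": -5.3}
for peptide, log_value in leukemia.items():
    print(peptide, round(log_gi50_to_micromolar(log_value), 1), "µM")
```

Because the scale is logarithmic, a difference of 0.3 log units between two peptides corresponds to roughly a two-fold difference in GI50.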
Conclusions
In this study, the combination of a machine learning model and the SME algorithm resulted in ACPs with low-micromolar potency against a wide variety of cancer cells (NCI-60 panel) and selectivity with respect to non-transformed cells (HDMEC) and human erythrocytes. The machine-learning classifier alone was able to identify active peptides but was insufficient to identify cancer cell selective peptides. Virtual screening of computationally designed peptide libraries with the implemented machine-learning classifier led to the discovery of four novel ACPs as the starting point for selectivity optimization by SME. In the first design-synthesize-test cycle, peptide hemolysis was reduced ten-fold, and after three cycles, peptide activity towards noncancer cells was reduced more than 20-fold while retaining anticancer activity compared to the parent peptide (AmphiArc2). The results of this study advocate for the SME method for experiment-guided peptide design and for exploration of the ACP structure-activity landscape. SME is applicable to all kinds of experimental readouts and provides an alternative to more conventional peptide optimization techniques, e.g., alanine scanning. At the same time, the results suggest that additionally increased cancer cell selectivity of membranolytic ACPs might come at the price of reduced peptide potency. This working hypothesis provides a basis for future study.
Methods
Machine learning model
Both machine learning models were constructed in Python v2.7 using the Scikit-Learn v0.18 library. For model training, the peptide dataset was split into 2/3 training and 1/3 testing subsets. Random forest classifier: the number of trees (“n_estimators”) was set to 500, and the number of features considered at each split (“max_features”) was set to the square root of the total number of features (“sqrt”). SVM classifier: a linear kernel was employed, and the hyperparameter C was optimized by ten-fold cross-validation, in which the model is trained on 90% of the training data and validated on the remaining 10% in ten repetitions. The mean over the 10 repetitions (cross-validation MCC score) was used to evaluate the performance of the models. The test scores were obtained with the independent test set.
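The data partitioning can be illustrated without any third-party dependency. The sketch below is an assumption-laden stand-in for the Scikit-Learn utilities used in the study: `stratified_split` and `k_fold` are hypothetical helper names, the rounding of class sizes and the random seed are arbitrary choices, and the folds are plain (not stratified) folds. The class sizes are those stated in the Results (339 ACPs, 680 non-ACPs).

```python
import random


def stratified_split(labels, test_fraction=1.0 / 3.0, seed=0):
    """Return (train_idx, test_idx) with the class ratio preserved in both."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold(indices, k=10, seed=0):
    """Yield (train, validation) index lists for plain k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


labels = [1] * 339 + [0] * 680          # 1 = ACP, 0 = non-ACP
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
for train, validation in k_fold(train_idx, k=10):
    pass  # fit on `train`, score on `validation`, then average the scores
```

In each of the ten rounds, roughly 90% of the training indices are used for fitting and the remaining 10% for validation, and every training index is used for validation exactly once.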
Scoring metrics
The Matthews correlation coefficient (MCC, Eq. 1), accuracy (Eq. 2), precision (Eq. 3) and recall (Eq. 4) were calculated. TP, FP, TN and FN correspond to the number of true positives, false positives, true negatives and false negatives predicted by the model, respectively.
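$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{1}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{3}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$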
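The four metrics of Eqs 1–4 can be computed directly from the confusion counts. As a worked example, the sketch below applies them to the experimental outcome of the 12 synthesized peptides in Table 4 (4 TP, 2 FP, 6 TN, 0 FN); the function name and the convention of returning 0 for an undefined ratio are assumptions made for illustration.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute MCC, accuracy, precision and recall from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Undefined ratios (zero denominator) are reported as 0.0 here.
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Outcome column of Table 4: 4 TP, 2 FP, 6 TN, 0 FN.
scores = classification_metrics(tp=4, fp=2, tn=6, fn=0)
for name, value in scores.items():
    print(name, round(value, 2))
```

The accuracy of 10/12 corresponds to the 83% of correct predictions quoted in the Abstract.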
Data weighted scoring functions
To appropriately consider the applicability domain of the SVM classifier, the final scoring function for ACPs (ϕACP, Eq. 5) and inactive (negative) peptides (ϕNeg, Eq. 6) considers both the pseudo-probability of the peptide to be an ACP (PACP) as predicted by the SVM model and the similarity of the predicted peptides to the training data (Sim. score). k-means clustering with k = 3 was performed with Python v2.7 and the Scikit-Learn v0.18 library package. The similarity score is calculated as the inverse of the Euclidean distance in descriptor space of the peptides to the three centroids.
5 |
6 |
Virtual peptide libraries
Three virtual peptide libraries were generated according to three different design principles. For each library, the peptide length was restricted to a range of 11 to 30 amino acids, as peptides able to fold in an alpha-helix are typically inside this range42. Duplicate sequences were eliminated, and the similarity of the sequences was restricted with the CD-HIT36 program to a threshold of 0.8 similarity. A total of 106 peptides were selected from each of the libraries.
Helical library. The Helical library was generated with the position-dependent amino acid distributions of 62 anuran and hymenopteran alpha-helical ACPs11 in amino acid positions 1–18 (exactly 5 helical turns). For longer peptides, the pattern was repeated. The method to generate this library is included in the modlAMP43 Python package (modlamp.sequences.HelicesACP).
Amphipathic Arc library. The Amphipathic Arc library was designed to contain amphipathic, potentially alpha-helical peptide sequences with a preference for positively charged amino acids on the polar face of the helix and hydrophobic arcs varying in the range 100–260°. The method to generate this library is included in the Python package modlAMP as the class AmphipathicArc (modlamp.sequences.AmphipathicArc).
Gradient library. The Gradient library was designed using the same procedure as the Amphipathic Arc library but with an additional hydrophobic gradient in the peptide structure from the N- to the C-terminus. For this, the amino acids in the C-terminal third of the peptide sequence were substituted with hydrophobic amino acids. In the modlAMP package, this is achieved by the method make_H_gradient of the modlamp.sequences.AmphipathicArc class.
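The arc-based design principle can be approximated with a helical-wheel heuristic. The sketch below is not the modlAMP implementation: the residue pools, the 20° arc step, the helper names and the rule "hydrophobic if the wheel angle falls inside the arc" are all assumptions chosen for illustration, and the hydrophobic gradient of the Gradient library is not modeled.

```python
import random

HYDROPHOBIC = "AILVFW"
# Hypothetical polar pool with K and R over-represented to favor cationic peptides.
POLAR_CATIONIC = "KKKRRRSTQNGH"


def amphipathic_arc_peptide(length, arc_deg, rng):
    """Place hydrophobic residues wherever the helical-wheel angle lies in the arc."""
    seq = []
    for i in range(length):
        angle = (i * 100) % 360  # 100 degrees per residue in an ideal alpha-helix
        pool = HYDROPHOBIC if angle < arc_deg else POLAR_CATIONIC
        seq.append(rng.choice(pool))
    return "".join(seq)


def make_library(n_peptides, seed=0):
    """Build a duplicate-free toy library with lengths 11-30 and arcs 100-260 degrees."""
    rng = random.Random(seed)
    library = set()
    while len(library) < n_peptides:
        length = rng.randint(11, 30)
        arc = rng.choice(range(100, 261, 20))
        library.add(amphipathic_arc_peptide(length, arc, rng))
    return sorted(library)


for peptide in make_library(5):
    print(peptide)
```

Using a set removes exact duplicates only; the similarity filtering with CD-HIT described above would be a separate step.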
Simulated molecular evolution
The simulated molecular evolution (SME) algorithm is based on the (1, λ) evolution strategy44 in which λ mutated sequences (offspring) are generated from a parent sequence22,23,25. The offspring were scored according to a fitness function, which was defined as the experimentally determined peptide anticancer activity and selectivity with respect to non-transformed cells. The best offspring was selected as the parent for the following optimization iteration. The amino acid mutations were generated according to an amino acid similarity matrix that has been row-normalized (dij) to allow for a pseudo-probability calculation of the amino acid transitions (Eq. 7). Here, the Grantham amino-acid similarity matrix was utilized39. The amino acids cysteine and methionine were excluded from the mutation matrix to avoid potential peptide cyclization and facilitate peptide synthesis.
7 |
where σ is a strategy parameter that controls the distance of the offspring sequences to the parent sequence and, thus, the sequence diversity among the offspring. The σ strategy parameter was set to 0.1 for the two initial SME iterations. Sequence diversity was characterized by the Shannon entropy45 (H) of the residue distribution among the offspring (Eq. 8), where pi corresponds to the frequency of amino acid i in a certain sequence position. The Shannon entropy values were normalized to [0, 1]. The simulated molecular evolution strategy and Shannon entropy calculation were programmed with Python v2.7.
8 |
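The position-wise diversity measure can be sketched as follows. The entropy itself is the standard Shannon formula; the normalization to [0, 1] by the logarithm of the alphabet size, and the default alphabet size of 20, are assumptions, since the exact normalization is not spelled out in the text. The function names are hypothetical.

```python
import math
from collections import Counter


def positional_entropy(sequences, position, alphabet_size=20):
    """Shannon entropy of the residues at one position, scaled to [0, 1].

    Scaling divides by log(alphabet_size), the maximum entropy reached
    when all residues of the alphabet are equally frequent (assumption).
    """
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = sum(-p * math.log(p) for p in probs)
    return entropy / math.log(alphabet_size)


def entropy_profile(sequences, alphabet_size=20):
    """Normalized entropy for every position shared by all sequences."""
    length = min(len(seq) for seq in sequences)
    return [positional_entropy(sequences, pos, alphabet_size) for pos in range(length)]


offspring = ["KIFK", "KIFR", "KLFK", "RIFK"]  # toy offspring set
print([round(h, 2) for h in entropy_profile(offspring)])
```

A position that is conserved across all offspring has an entropy of 0, and higher values indicate positions where the variation operator introduced more diverse residues.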
Acknowledgements
The authors thank Sarah Haller for technical support and Prof. Cornelia Halin and Prof. Stephanie Krämer for the use of the cell culture facilities. The Developmental Therapeutics Program of the National Cancer Institute and Dr. John A. Beutler kindly performed the NCI-60 cancer cell panel tests. We thank Dr. Francesca Grisoni and Alexander L. Button for constructive input and discussions. This work was financially supported by the Swiss National Science Foundation (Grant No. 2000021_157190 to G.S. and J.A.H.).
Author Contributions
G.G., D.G., A.T.M. and C.S.N. performed the peptide syntheses and activity assays. G.G. and L.A. performed the microfluidics assay. J.A.H., P.S.D. and G.S. designed and supervised the study. G.G., A.T.M. and G.S. programmed the software. All authors analyzed the data and contributed to the manuscript. G.G. and G.S. wrote the manuscript.
Competing Interests
G.S. declares a potential financial conflict of interest in his role as life-science industry consultant and cofounder of inSili.com GmbH, Zurich. No further competing interests are declared.
Footnotes
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary information accompanies this paper at 10.1038/s41598-019-47568-9.
References
- 1. Holohan C, Van Schaeybroeck S, Longley DB, Johnston PG. Cancer drug resistance: an evolving paradigm. Nat. Rev. Cancer. 2013;13:714–726. doi: 10.1038/nrc3599.
- 2. Chatterjee S, Damle SG, Sharma AK. Mechanisms of resistance against cancer therapeutic drugs. Curr. Pharm. Biotechnol. 2014;15:1105–1112. doi: 10.2174/1389201015666141126123952.
- 3. Papo N, Shai Y. Host defense peptides as new weapons in cancer treatment. Cell. Mol. Life Sci. 2005;62:784–790. doi: 10.1007/s00018-005-4560-2.
- 4. Schweizer F. Cationic amphiphilic peptides with cancer-selective toxicity. Eur. J. Pharmacol. 2009;625:190–194. doi: 10.1016/j.ejphar.2009.08.043.
- 5. Mader JS, Hoskin DW. Cationic antimicrobial peptides as novel cytotoxic agents for cancer treatment. Expert Opin. Investig. Drugs. 2006;15:933–946. doi: 10.1517/13543784.15.8.933.
- 6. Riedl S, et al. In search of a novel target – phosphatidylserine exposed by non-apoptotic tumor cells and metastases of malignancies with poor treatment efficacy. Biochim. Biophys. Acta - Biomembr. 2011;1808:2638–2645. doi: 10.1016/j.bbamem.2011.07.026.
- 7. Harris F, Dennison SR, Singh J, Phoenix DA. On the selectivity and efficacy of defense peptides with respect to cancer cells. Med. Res. Rev. 2013;33:190–234. doi: 10.1002/med.20252.
- 8. Huang Y, Wang X, Wang H, Liu Y, Chen Y. Studies on mechanism of action of anticancer peptides by modulation of hydrophobicity within a defined structural framework. Mol. Cancer Ther. 2011;10:416–426. doi: 10.1158/1535-7163.MCT-10-0811.
- 9. Yang Q-Z, et al. Design of potent, non-toxic anticancer peptides based on the structure of the antimicrobial peptide, temporin-1CEa. Arch. Pharm. Res. 2013;36:1302–1310. doi: 10.1007/s12272-013-0112-8.
- 10. Dennison SR, Harris F, Bhatt T, Singh J, Phoenix DA. A theoretical analysis of secondary structural characteristics of anticancer peptides. Mol. Cell. Biochem. 2010;333:129–135. doi: 10.1007/s11010-009-0213-3.
- 11. Gabernet G, Müller AT, Hiss JA, Schneider G. Membranolytic anticancer peptides. Med. Chem. Commun. 2016;7:2232–2245. doi: 10.1039/C6MD00376A.
- 12. Lin Y-C, et al. Multidimensional design of anticancer peptides. Angew. Chem. Int. Ed. 2015;54:10370–10374. doi: 10.1002/anie.201504018.
- 13. Tyagi A, et al. In silico models for designing and discovering novel anticancer peptides. Sci. Rep. 2013;3:2984. doi: 10.1038/srep02984.
- 14. Chen W, Ding H, Feng P, Lin H, Chou K. iACP: a sequence-based tool for identifying anticancer peptides. Oncotarget. 2016;7:16895–16909. doi: 10.18632/oncotarget.7815.
- 15. Hajisharifi Z, Piryaiee M, Mohammad Beigi M, Behbahani M, Mohabatkar H. Predicting anticancer peptides with Chou’s pseudo amino acid composition and investigating their mutagenicity via Ames test. J. Theor. Biol. 2014;341:34–40. doi: 10.1016/j.jtbi.2013.08.037.
- 16. Saravanan V, Lakshmi PTV. ACPP: A web server for prediction and design of anti-cancer peptides. Int. J. Pept. Res. Ther. 2015;21:99–106. doi: 10.1007/s10989-014-9435-7.
- 17. Grisoni F, et al. Designing anticancer peptides by constructive machine learning. ChemMedChem. 2018;13:1300–1302. doi: 10.1002/cmdc.201800204.
- 18. Manavalan B, et al. MLACP: machine-learning-based prediction of anticancer peptides. Oncotarget. 2017;8:77121–77136. doi: 10.18632/oncotarget.20365.
- 19. Fjell CD, et al. Identification of novel antibacterial peptides by chemoinformatics and machine learning. J. Med. Chem. 2009;52:2006–2015. doi: 10.1021/jm8015365.
- 20. Müller AT, Hiss JA, Schneider G. Recurrent neural network model for constructive peptide design. J. Chem. Inf. Model. 2018;58:472–479. doi: 10.1021/acs.jcim.7b00414.
- 21. Lee EY, Wong GCL, Ferguson AL. Machine learning-enabled discovery and design of membrane-active peptides. Bioorg. Med. Chem. 2018;26:2708–2718. doi: 10.1016/j.bmc.2017.07.012.
- 22. Schneider G, Wrede P. The rational design of amino acid sequences by artificial neural networks and simulated molecular evolution: de novo design of an idealized leader peptidase cleavage site. Biophys. J. 1994;66:335–344. doi: 10.1016/S0006-3495(94)80782-9.
- 23. Schneider G, Schuchhardt J, Wrede P. Peptide design in machina: development of artificial mitochondrial protein precursor cleavage sites by simulated molecular evolution. Biophys. J. 1995;68:434–447. doi: 10.1016/S0006-3495(95)80205-5.
- 24. Schneider G, et al. Peptide design by artificial neural networks and computer-based evolutionary search. Proc. Natl. Acad. Sci. USA. 1998;95:12179–12184. doi: 10.1073/pnas.95.21.12179.
- 25. Hiss JA, Stutz K, Posselt G, Weßler S, Schneider G. Attractors in sequence space: peptide morphing by directed simulated evolution. Mol. Inf. 2015;34:709–714. doi: 10.1002/minf.201500089.
- 26. Stutz K, et al. Peptide–membrane interaction between targeting and lysis. ACS Chem. Biol. 2017;12:2254–2259. doi: 10.1021/acschembio.7b00504.
- 27. Tyagi A, et al. CancerPPD: a database of anticancer peptides and proteins. Nucleic Acids Res. 2015;43:837–843. doi: 10.1093/nar/gku892.
- 28. Berman HM. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
- 29. Koch CP, et al. Scrutinizing MHC-I binding peptides and their limits of variation. PLoS Comput. Biol. 2013;9:e1003088. doi: 10.1371/journal.pcbi.1003088.
- 30. Eisenberg D, Weiss RM, Terwilliger TC. The helical hydrophobic moment: a measure of the amphiphilicity of a helix. Nature. 1982;299:371–374. doi: 10.1038/299371a0.
- 31. Breiman L. Random Forests. Mach. Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324.
- 32. Cortes C, Vapnik V. Support-vector networks. Mach. Learn. 1995;20:273–297.
- 33. Chen Y, et al. Comparison of biophysical and biologic properties of alpha-helical enantiomeric antimicrobial peptides. Chem. Biol. Drug Des. 2006;67:162–173. doi: 10.1111/j.1747-0285.2006.00349.x.
- 34. Riedl S, Zweytick D, Lohner K. Membrane-active host defense peptides – Challenges and perspectives for the development of novel anticancer drugs. Chem. Phys. Lipids. 2011;164:766–781. doi: 10.1016/j.chemphyslip.2011.09.004.
- 35. Schroeter TS, et al. Estimating the domain of applicability for machine learning QSAR models: A study on aqueous solubility of drug discovery molecules. J. Comput. Aided. Mol. Des. 2007;21:651–664. doi: 10.1007/s10822-007-9160-9.
- 36. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–1659. doi: 10.1093/bioinformatics/btl158.
- 37. Marion D, Zasloff M, Bax A. A two-dimensional NMR study of the antimicrobial peptide magainin 2. FEBS Lett. 1988;227:21–26. doi: 10.1016/0014-5793(88)81405-4.
- 38. Zelezetsky I, Tossi A. Alpha-helical antimicrobial peptides – Using a sequence template to guide structure–activity relationship studies. Biochim. Biophys. Acta Biomembr. 2006;1758:1436–1449. doi: 10.1016/j.bbamem.2006.03.021.
- 39. Grantham R. Amino acid difference formula to help explain protein evolution. Science. 1974;185:862–864. doi: 10.1126/science.185.4154.862.
- 40. Nilsson I, et al. Proline-induced disruption of a transmembrane α-helix in its natural environment. J. Mol. Biol. 1998;284:1165–1175. doi: 10.1006/jmbi.1998.2217.
- 41. Monks A, et al. Feasibility of a high-flux anticancer drug screen using a diverse panel of cultured human tumor cell lines. J. Natl. Cancer Inst. 1991;83:757–766. doi: 10.1093/jnci/83.11.757.
- 42. Manning MC, Illangasekare M, Woody RW. Circular dichroism studies of distorted alpha-helices, twisted beta-sheets, and beta turns. Biophys. Chem. 1988;31:77–86. doi: 10.1016/0301-4622(88)80011-5.
- 43. Müller AT, Gabernet G, Hiss JA, Schneider G. modlAMP: Python for antimicrobial peptides. Bioinformatics. 2017;33:2753–2755. doi: 10.1093/bioinformatics/btx285.
- 44. Rechenberg I. Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Stuttgart: Frommann-Holzboog; 1973.
- 45. Asadi M, Ebrahimi N, Soofi ES. Shannon entropy measures. In: Wiley StatsRef: Statistics Reference Online. New York: John Wiley & Sons; 2017. p. 1–8. doi: 10.1002/9781118445112.stat07920.