Theor Chem Acc. 2021 Jan 22;140(2):15. doi: 10.1007/s00214-020-02707-8

The sequence of amino acids as the basis for the model of biological activity of peptides

Alla P Toropova, Maria Raškova, Ivan Raška Jr, Andrey A Toropov
PMCID: PMC7820519  PMID: 33500680

Abstract

An algorithm is proposed for building a model of the biological activity of peptides as a mathematical function of the amino acid sequence. The general scheme is as follows. The available data are distributed into an active training set, a passive training set, a calibration set, and a validation set. The training sets (active and passive) and the calibration set together serve to generate the model, in which each amino acid receives its own correlation weight. The correlation weights are calculated by the Monte Carlo method using the CORAL software (http://www.insilico.eu/coral). The target function is designed to give the best result for the calibration set rather than for the training set. The final check of the model is carried out on the validation set (peptides that are not visible during the creation of the model). The computational experiments described here confirm that the approach can serve as a tool for designing predictive models of the biological activity of peptides (expressed as pIC50).

Keywords: QSAR, Amino acid, Peptide, Monte Carlo method, Index of ideality of correlation

Introduction

The history of mathematical chemistry contains contributions from many outstanding scientists, such as A.T. Balaban, M. Randić, I. Gutman, N. Trinajstić, S.C. Basak, and R. Carbó-Dorca, as well as many others [1–15]. Mathematical chemistry [1] is the area of research engaged in novel applications of mathematics to chemistry, biochemistry, and biology. It concerns itself principally with the mathematical modeling of complex molecular phenomena [2].

Major areas of research in mathematical chemistry include chemical graph theory, which deals with the development of topological descriptors that find application in quantitative structure–property relationships [3, 4], and chemical aspects of group theory, which find application in stereochemistry and quantum chemistry [5, 6].

Chemoinformatics is a relatively young field of the natural sciences. By analogy with "in vivo" and "in vitro," the results of chemoinformatics are termed "in silico" [7].

Of particular note are the contributions of Prof. R. Carbó-Dorca to the development of chemoinformatics tools applied to quantum mechanical theoretical problems, which made it possible to address chemical problems such as catalysis and reactivity with simple computational schemes [8–12]. Chemoinformatics is gradually extending to tasks in theoretical chemistry, computational chemistry, and modeling [13–15].

Applying mathematical methods to problems in chemistry and biochemistry can be effective [16, 17]. Peptides are important objects of study in chemistry, biochemistry, and medicine. Much of the interest in proteins and peptides stems from their application in drug design [18]. The amino acid residues of the epitope-peptide substrate and of the SARS coronavirus main protease interact. Hence, the affinity of epitope-peptides for class I MHC (major histocompatibility complex) molecules can be used in the development of antiviral agents, e.g., against coronaviruses [18].

A widely accepted principle for understanding complex systems is “Everything should be made as simple as possible, but no simpler” [19]. The approach used here is perhaps hard to evaluate against this principle, since a simpler method is not possible, or at least has not yet been described in the literature [20–23]. Nor would it be correct to call the approach “simpler” than “simple,” since it gives quite good models [20–23]. The model of the biological activity of peptides described here is based on sequences of amino acids represented by 1-letter codes (Table 1).

Table 1.

1-letter codes for amino acids (structure images omitted)

| Amino acid | 1-letter code |
|---|---|
| Alanine | A |
| Arginine | R |
| Asparagine | N |
| Aspartic acid | D |
| Cysteine | C |
| Glutamic acid | E |
| Glutamine | Q |
| Glycine | G |
| Histidine | H |
| Isoleucine | I |
| Leucine | L |
| Lysine | K |
| Methionine | M |
| Phenylalanine | F |
| Proline | P |
| Serine | S |
| Threonine | T |
| Tryptophan | W |
| Tyrosine | Y |
| Valine | V |

The aim of the present study is to assess whether the CORAL software can provide a satisfactory model for the bioactivity of peptides. The representation of a peptide as a sequence of amino acids is similar to the well-known simplified molecular input-line entry system (SMILES) [24]. Consequently, the CORAL software (www.insilico.eu/coral), which is designed to build quantitative structure–activity relationships (QSARs) from the SMILES representation, can be a tool for building a predictive model of peptide activity as a function of the sequence of 1-letter codes of the corresponding amino acids [25]. In effect, the sequences of amino acids represented by 1-letter codes are quasi-SMILES [20, 21].

Method

Data

The numerical data on the biological activity of epitope-peptides with class I MHC (major histocompatibility complex) molecules were taken from the literature [18]. The endpoint is expressed as the negative logarithm of the half-maximal inhibitory concentration IC50 (pIC50). Table 4 lists the amino acid sequences of the epitope-peptides studied here.

The available epitope-peptides were randomly distributed into the active training set (25%), passive training set (25%), calibration set (25%), and validation set (25%). Each set has a defined task. The task of the active training set is to build up optimal correlation weights for the optimal descriptor. The task of the passive training set is to check whether the current correlation weights (and the optimal descriptor) are satisfactory for peptides that are not involved in the calculation of the correlation weights. The task of the calibration set is to detect the onset of overtraining. The task of the validation set is the final estimation of the predictive potential of the model.
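The four-way distribution described above can be sketched as follows. This is only an illustration: the function name `split_four_ways` and the `seed` parameter are hypothetical, and the procedure CORAL actually uses to generate its random splits is not described in this paper.

```python
import random


def split_four_ways(items, seed=0):
    """Shuffle the items and cut them into four equally sized, disjoint sets."""
    if len(items) % 4 != 0:
        raise ValueError("this sketch assumes the number of items is divisible by 4")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle of a copy
    quarter = len(shuffled) // 4
    names = ("active_training", "passive_training", "calibration", "validation")
    return {
        name: shuffled[i * quarter:(i + 1) * quarter]
        for i, name in enumerate(names)
    }


# Forty peptide identifiers, P01 ... P40, as in Table 4.
peptide_ids = [f"P{i:02d}" for i in range(1, 41)]
split = split_four_ways(peptide_ids, seed=3)
```

With 40 peptides, each of the four sets receives ten peptides and no peptide appears in more than one set.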

Quantitative structure–activity relationships (QSARs)

The CORAL software provides models, which are linear one-variable correlations obtained by the Monte Carlo method (http://www.insilico.eu/coral). The generalized representation of the model for the biological activity of peptides is the following one-variable correlation:

$$\mathrm{pIC}_{50} = C_0 + C_1 \times \mathrm{DCW}(T, N) \tag{1}$$

The DCW(T,N) is the descriptor of correlation weights (DCW). The C0 and C1 are regression coefficients. The T and N are parameters of the Monte Carlo optimization discussed below.

The descriptor of correlation weights (DCW)

The descriptors applied to the QSAR analysis are calculated as follows:

$$\mathrm{DCW}(T, N) = \sum_{k} \mathrm{CW}(A_k) \tag{2}$$

Here Ak is the 1-letter code of the k-th amino acid in the sequence, and CW(Ak) is the correlation weight of Ak.

T is an integer threshold that divides amino acids into two classes: (i) rare and (ii) non-rare. If the frequency of Ak in the active training set is less than T, then Ak is rare and CW(Ak) = 0 (i.e., Ak is removed from the modeling process). Thus, the model is based solely on the correlation weights of amino acids that are non-rare in the active training set. N is the number of iterations of the Monte Carlo optimization. T = T* and N = N* are the values that provide the best statistical quality of the model for the calibration set.
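The role of the threshold T can be illustrated with a minimal sketch. The helper names `peptide_frequencies` and `dcw` and the values in `toy_weights` are hypothetical. The sequences are the rows marked " + " in Table 4 (active training set of split 3), and the frequency is assumed to count each peptide at most once, which reproduces the NAT column of Table 5.

```python
from collections import Counter

# Active training set of split 3: the rows marked "+" in Table 4.
ACTIVE_TRAINING = [
    "YLEPGPVTL", "ITWQVPFSV", "ITYQVPFSV", "YLYPGPVTV", "YLDQVPFSV",
    "WTDQVPFSV", "YLYPGPVTA", "YLAPGPVTV", "YLWPGPVTA", "FLDQVPFSV",
]


def peptide_frequencies(sequences):
    """Count, for each 1-letter code, the sequences that contain it at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))
    return counts


def dcw(sequence, weights, frequencies, threshold):
    """Sum CW(Ak) over the sequence; codes rarer than the threshold contribute 0."""
    return sum(
        weights.get(code, 0.0)
        for code in sequence
        if frequencies.get(code, 0) >= threshold
    )


freq = peptide_frequencies(ACTIVE_TRAINING)

# Hypothetical weights, chosen only to show the effect of the threshold.
toy_weights = {"I": 0.5, "M": 2.0, "V": 0.25}
value = dcw("IMV", toy_weights, freq, threshold=1)
```

Methionine (M) does not occur in this active training set, so with T = 1 it is rare and its weight is ignored: `value` is 0.5 + 0.25 = 0.75.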

Monte Carlo optimization

The scheme of the Monte Carlo optimization is described in [23, 25]. The essence of this version of the optimization procedure is the application of the index of ideality of correlation (IIC). The models for the inhibitory activity of peptides were built here with two different target functions, TF1 and TF2:

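$$\mathrm{TF}_1 = R_{AT} + R_{PT} - \left|R_{AT} - R_{PT}\right| \times 0.1 \tag{3}$$

$$\mathrm{TF}_2 = \mathrm{TF}_1 + \mathrm{IIC}_{CLB} \times 0.5 \tag{4}$$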

RAT and RPT are the correlation coefficients between the observed and predicted endpoints for the active training set and the passive training set, respectively.

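IICCLB is calculated from the data on the calibration set as follows:

$$\mathrm{IIC}_{CLB} = r_{CLB} \times \frac{\min\left({}^{-}\mathrm{MAE}_{CLB},\ {}^{+}\mathrm{MAE}_{CLB}\right)}{\max\left({}^{-}\mathrm{MAE}_{CLB},\ {}^{+}\mathrm{MAE}_{CLB}\right)} \tag{5}$$

$${}^{-}\mathrm{MAE}_{CLB} = \frac{1}{{}^{-}N} \sum_{k=1}^{{}^{-}N} \left|\Delta_k\right|, \quad \Delta_k < 0; \quad {}^{-}N \text{ is the number of } \Delta_k < 0 \tag{6}$$

$${}^{+}\mathrm{MAE}_{CLB} = \frac{1}{{}^{+}N} \sum_{k=1}^{{}^{+}N} \left|\Delta_k\right|, \quad \Delta_k \ge 0; \quad {}^{+}N \text{ is the number of } \Delta_k \ge 0 \tag{7}$$

$$\Delta_k = \mathrm{observed}_k - \mathrm{calculated}_k \tag{8}$$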

Here observed and calculated are the corresponding values of pIC50 for the k-th peptide.
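The target functions and the IIC defined in Eqs. 3–8 can be sketched in a few lines. This is an illustrative reading of those equations, not the CORAL implementation. The function names are hypothetical, the input values are toy data, and returning 0 when all residuals have the same sign is an assumption made here, since the equations leave that case undefined.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def iic(observed, calculated):
    """Index of ideality of correlation for one set of peptides (Eqs. 5-8)."""
    deltas = [o - c for o, c in zip(observed, calculated)]
    negative = [abs(d) for d in deltas if d < 0]
    non_negative = [abs(d) for d in deltas if d >= 0]
    if not negative or not non_negative:
        return 0.0  # assumption: one of the two MAE values is undefined
    mae_neg = sum(negative) / len(negative)
    mae_pos = sum(non_negative) / len(non_negative)
    r = pearson_r(observed, calculated)
    return r * min(mae_neg, mae_pos) / max(mae_neg, mae_pos)


def tf1(r_at, r_pt):
    """Target function TF1 (Eq. 3)."""
    return r_at + r_pt - abs(r_at - r_pt) * 0.1


def tf2(r_at, r_pt, iic_clb):
    """Target function TF2 (Eq. 4): TF1 plus the weighted calibration-set IIC."""
    return tf1(r_at, r_pt) + iic_clb * 0.5


# Toy data: residuals are -0.5, +0.5, -0.5, +0.5, so both MAE values equal 0.5.
toy_observed = [1.0, 2.0, 3.0, 4.0]
toy_calculated = [1.5, 1.5, 3.5, 3.5]
toy_iic = iic(toy_observed, toy_calculated)
```

For the toy data the negative and non-negative errors are balanced, so the IIC equals the correlation coefficient itself. An imbalance between the two error groups lowers the IIC below r.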

Figure 1 compares the histories of the Monte Carlo optimization with target functions TF1 and TF2. TF2 seems preferable: with TF2, no decrease in statistical quality is observed for the calibration and validation sets, whereas with TF1 such a decrease is observed.

Fig. 1.

Histories of the Monte Carlo optimization (Split 1) with target functions TF1 and TF2

Domain of applicability

The domain of applicability for the CORAL model is defined according to the distribution of SMILES attributes in the active training set and the calibration set, in two steps.

Step 1: the statistical defect (dk) is defined for each SMILES attribute involved in building the model:

$$d_k = \frac{\left|P(A_k) - P'(A_k)\right|}{N(A_k) + N'(A_k)} \tag{9}$$

where P(Ak) and P′(Ak) are the probabilities of Ak in the training and calibration sets, respectively, and N(Ak) and N′(Ak) are the frequencies of Ak in the training and calibration sets, respectively.

Step 2: the statistical SMILES-defect (Dj) is calculated for each substance:

$$D_j = \sum_{k=1}^{N_A} d_k \tag{10}$$

where NA is the number of non-blocked SMILES attributes in the SMILES.

A substance falls in the domain of applicability if

$$D_j < 2 \times \bar{D} \tag{11}$$

where $\bar{D}$ is the average of the statistical SMILES-defect for the training set.

The same operation can be carried out with sequences of 1-letter codes of amino acids if Ak is taken to be the 1-letter code of an amino acid instead of a SMILES attribute.
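A minimal sketch of the two steps for amino acid sequences is given below, using the NAT and NC frequencies listed in Table 5 and the average defect 0.08757 from the header of Table 4. Two assumptions are made here: the probability of Ak is taken as its frequency divided by the set size (ten peptides), and a blocked amino acid with zero frequency in both sets contributes a defect of 0. Under these assumptions the sketch reproduces the dk column of Table 5 and the defect column of Table 4. The function names are hypothetical.

```python
# Frequencies from Table 5: (active training set NAT, calibration set NC).
COUNTS = {
    "A": (3, 3), "D": (3, 3), "E": (1, 1), "F": (5, 8), "G": (5, 4),
    "I": (2, 4), "L": (7, 7), "M": (0, 0), "P": (10, 10), "Q": (5, 6),
    "S": (5, 8), "T": (8, 6), "V": (10, 10), "W": (3, 1), "Y": (7, 5),
}
SET_SIZE = 10    # each set contains ten peptides
D_BAR = 0.08757  # average defect for the training set, from Table 4


def statistical_defect(n_train, n_calib, set_size=SET_SIZE):
    """Eq. 9, with probabilities assumed to be frequency / set size."""
    if n_train + n_calib == 0:
        return 0.0  # assumption: blocked (absent) amino acids contribute nothing
    p_train = n_train / set_size
    p_calib = n_calib / set_size
    return abs(p_train - p_calib) / (n_train + n_calib)


def sequence_defect(sequence):
    """Eq. 10: sum of the defects over every position of the sequence."""
    return sum(statistical_defect(*COUNTS[code]) for code in sequence)


def in_domain(sequence, d_bar=D_BAR):
    """Eq. 11: a peptide is in the domain if its defect is below 2 * d_bar."""
    return sequence_defect(sequence) < 2 * d_bar


defect_p01 = sequence_defect("WLEPGPVTA")
```

For WLEPGPVTA the defect rounds to 0.0754, the value listed for P01 in Table 4. A sequence containing an amino acid that is absent from `COUNTS` raises a `KeyError` in this sketch.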

Results and discussion

The models obtained for three random splits into the training set (the union of the active training, passive training, and calibration sets) and the validation set are the following:

Target Function TF1

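$$\mathrm{pIC}_{50} = 5.0637012\ (\pm 0.3150527) + 0.9790357\ (\pm 0.1064904) \times \mathrm{DCW}(1, 3) \tag{12}$$

$$\mathrm{pIC}_{50} = 5.3015843\ (\pm 0.1783155) + 1.4109089\ (\pm 0.1001528) \times \mathrm{DCW}(1, 3) \tag{13}$$

$$\mathrm{pIC}_{50} = 2.6879582\ (\pm 0.2459626) + 1.0011131\ (\pm 0.0482456) \times \mathrm{DCW}(1, 3) \tag{14}$$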

Target Function TF2

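$$\mathrm{pIC}_{50} = 4.0179522\ (\pm 0.5296001) + 0.4553542\ (\pm 0.0634366) \times \mathrm{DCW}(1, 15) \tag{15}$$

$$\mathrm{pIC}_{50} = 4.8689021\ (\pm 0.3087049) + 0.6850851\ (\pm 0.0712025) \times \mathrm{DCW}(1, 15) \tag{16}$$

$$\mathrm{pIC}_{50} = 5.3828941\ (\pm 0.4250702) + 0.7649124\ (\pm 0.1215301) \times \mathrm{DCW}(1, 15) \tag{17}$$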

Table 2 contains the statistical characteristics of the models calculated with Eqs. 12–17.

Table 2.

Statistical quality of models for three random splits

Split Set R2 Q2 IIC RMSE
Optimization with TF1
1 Active training 0.7625 0.5558 0.8732 0.360
Passive training 0.8250 0.7065 0.6739 0.395
Calibration 0.6012 0.4017 0.3695 0.506
Validation 0.6220 0.4816 0.490
2 Active training 0.8205 0.7052 0.9058 0.333
Passive training 0.9165 0.8301 0.4709 0.374
Calibration 0.5223 0.2836 0.4258 0.592
Validation 0.5481 0.3476 0.515
3 Active training 0.8846 0.8229 0.9406 0.265
Passive training 0.7283 0.5982 0.8264 0.599
Calibration 0.5053 0.2612 0.3745 0.927
Validation 0.5900 0.3277 0.700
Optimization with TF2
1 Active training 0.6416 0.3506 0.5340 0.442
Passive training 0.7231 0.5868 0.4120 0.507
Calibration 0.9486 0.9157 0.9679 0.142
Validation 0.7766 0.6298 0.306
2 Active training 0.6976 0.4905 0.5568 0.432
Passive training 0.9543 0.9192 0.8516 0.332
Calibration 0.7102 0.5447 0.8406 0.337
Validation 0.7856 0.6596 0.270
3 Active training 0.5326 0.1846 0.7298 0.533
Passive training 0.8128 0.6796 0.6251 0.562
Calibration 0.8743 0.8139 0.8827 0.214
Validation 0.7909 0.6721 0.248

Each set contains ten peptides

The best model is indicated by bold

As Table 2 shows, the predictive potential of the models calculated using the IIC (TF2) is better: the validation-set R2 is 0.78–0.79 with TF2 versus 0.55–0.62 with TF1.

With numerical data on the correlation weights of the amino acids obtained in several runs of the optimization, one can identify two classes of amino acids: (1) amino acids with stable positive correlation weights, which are promoters of an increase of pIC50; and (2) amino acids with stable negative correlation weights, which are promoters of a decrease of pIC50. Thus, the approach gives a statistical mechanistic interpretation of the models. Table 3 lists the amino acids that are promoters of an increase or a decrease of pIC50. Note that the prevalence of the corresponding amino acids should also be considered.

Table 3.

Amino acids that are promoters of an increase or a decrease of pIC50 for the examined peptides

| Comment | Ak | CWs Probe 1 | CWs Probe 2 | CWs Probe 3 | NAT | NPT | NC | dk |
|---|---|---|---|---|---|---|---|---|
| Increase | V | 0.47695 | 0.30991 | 0.26611 | 10 | 10 | 10 | 0.0000 |
| Increase | L | 1.29542 | 0.73164 | 0.31587 | 8 | 5 | 7 | 0.0067 |
| Increase | F | 1.07326 | 0.70770 | 0.37614 | 6 | 6 | 7 | 0.0077 |
| Increase | I | 0.76211 | 0.16717 | 0.34684 | 6 | 3 | 4 | 0.0200 |
| Increase | A | 0.54686 | 0.01821 | 0.06304 | 4 | 3 | 2 | 0.0333 |
| Increase | G | 0.44966 | 0.52819 | 0.73395 | 4 | 5 | 4 | 0.0000 |
| Increase | Y | 1.46411 | 0.65332 | 0.40546 | 4 | 5 | 5 | 0.0111 |
| Increase | M | 1.29967 | 0.55126 | 0.39601 | 2 | 0 | 3 | 0.0200 |
| Decrease | T | −0.26044 | −0.28480 | −0.34702 | 6 | 9 | 6 | 0.0000 |
| Decrease | E | −0.62472 | −0.62778 | −0.55954 | 1 | 3 | 1 | 0.0000 |

NAT, NPT, and NC are the frequencies of an amino acid in the active training set, passive training set, and the calibration set, respectively
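The classification behind Table 3 can be sketched as a sign check across the optimization runs. The weights below are copied from the three probe columns of Table 3. The function name `classify` and the label "unstable" for mixed signs are hypothetical additions made for illustration.

```python
# Correlation weights from three optimization runs (Probes 1-3), from Table 3.
PROBES = {
    "V": (0.47695, 0.30991, 0.26611),
    "L": (1.29542, 0.73164, 0.31587),
    "F": (1.07326, 0.70770, 0.37614),
    "I": (0.76211, 0.16717, 0.34684),
    "A": (0.54686, 0.01821, 0.06304),
    "G": (0.44966, 0.52819, 0.73395),
    "Y": (1.46411, 0.65332, 0.40546),
    "M": (1.29967, 0.55126, 0.39601),
    "T": (-0.26044, -0.28480, -0.34702),
    "E": (-0.62472, -0.62778, -0.55954),
}


def classify(weights):
    """Label an amino acid by the sign stability of its correlation weights."""
    if all(w > 0 for w in weights):
        return "increase"
    if all(w < 0 for w in weights):
        return "decrease"
    return "unstable"  # mixed signs or zeros: no stable interpretation


labels = {code: classify(weights) for code, weights in PROBES.items()}
```

Every amino acid in Table 3 keeps its sign in all three probes, so each one is labelled either "increase" or "decrease". An amino acid whose weight changed sign between runs would be labelled "unstable" and left out of the interpretation.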

Table 4 contains the experimental pIC50 values and those calculated with Eq. 17. Table 5 contains the correlation weights of the amino acids used to calculate the model with Eq. 17.

Table 4.

Experimental pIC50 and pIC50 calculated with Eq. 17 for the model obtained with split 3 (the best model): “+” indicates the active training set; “–” indicates the passive training set; “#” indicates the calibration set; and “*” indicates the validation set

| Set | ID | Sequence of amino acids | DCW(1,15) | pIC50 Expr | pIC50 Calc | Dj (D̄ = 0.08757) | Applicability |
|---|---|---|---|---|---|---|---|
| – | P01 | WLEPGPVTA | 1.98966 | 6.0820 | 6.9048 | 0.0754 | YES |
| – | P02 | ITSQVPFSV | 1.62921 | 6.1960 | 6.6291 | 0.1259 | YES |
| # | P03 | FLEPGPVTA | 2.17966 | 6.8980 | 7.0501 | 0.0485 | YES |
| # | P04 | ITAQVPFSV | 2.21389 | 7.0200 | 7.0763 | 0.1029 | YES |
| + | P05 | YLEPGPVTL | 2.98174 | 7.0580 | 7.6637 | 0.0421 | YES |
| # | P06 | YTDQVPFSV | 2.39417 | 7.0660 | 7.2142 | 0.0862 | YES |
| – | P07 | YLEPGPVTI | 2.21031 | 7.1870 | 7.0736 | 0.0754 | YES |
| * | P08 | YLEPGPVTV | 2.20698 | 7.3420 | 7.0710 | 0.0421 | YES |
| # | P09 | YLSPGPVTA | 3.06834 | 7.3830 | 7.7299 | 0.0651 | YES |
| # | P10 | IIDQVPFSV | 3.11987 | 7.3980 | 7.7693 | 0.1219 | YES |
| + | P11 | ITWQVPFSV | 1.93195 | 7.4630 | 6.8607 | 0.1529 | YES |
| + | P12 | ITYQVPFSV | 2.14528 | 7.4800 | 7.0238 | 0.1195 | YES |
| # | P13 | ILSQVPFSV | 3.05039 | 7.6990 | 7.7162 | 0.1117 | YES |
| – | P14 | IMDQVPFSV | 2.69191 | 7.7190 | 7.4420 | 0.0886 | YES |
| * | P15 | YLMPGPVTV | 3.23638 | 7.9320 | 7.8584 | 0.0421 | YES |
| # | P16 | WLDQVPFSV | 3.60203 | 7.9390 | 8.1381 | 0.1052 | YES |
| * | P17 | YLAPGPVTA | 3.65302 | 8.0320 | 8.1771 | 0.0421 | YES |
| + | P18 | YLYPGPVTV | 3.58840 | 8.0510 | 8.1277 | 0.0587 | YES |
| * | P19 | YLWPGPVTV | 3.37507 | 8.1250 | 7.9645 | 0.0921 | YES |
| # | P20 | ILYQVPFSV | 3.56646 | 8.3100 | 8.1109 | 0.1052 | YES |
| – | P21 | ILDQVPFSV | 3.89130 | 8.4810 | 8.3594 | 0.0886 | YES |
| – | P22 | YLFPGPVTA | 3.56108 | 8.4950 | 8.1068 | 0.0651 | YES |
| + | P23 | YLDQVPFSV | 3.81535 | 8.6380 | 8.3013 | 0.0719 | YES |
| – | P24 | ILFQVPFSV | 3.54314 | 8.6990 | 8.0931 | 0.1117 | YES |
| – | P25 | ILWQVPFSV | 3.35313 | 8.7700 | 7.9477 | 0.1386 | YES |
| + | P26 | WTDQVPFSV | 2.18084 | 6.1450 | 7.0510 | 0.1195 | YES |
| * | P27 | YLEPGPVTA | 2.20298 | 6.6680 | 7.0680 | 0.0421 | YES |
| * | P28 | ITDQVPFSV | 2.47011 | 6.9470 | 7.2723 | 0.1029 | YES |
| * | P29 | ITFQVPFSV | 2.12196 | 7.1790 | 7.0060 | 0.1259 | YES |
| * | P30 | FTDQVPFSV | 2.37085 | 7.2120 | 7.1964 | 0.0926 | YES |
| – | P31 | ITMQVPFSV | 1.79326 | 7.3980 | 6.7546 | 0.1029 | YES |
| # | P32 | YLSPGPVTV | 3.07233 | 7.6420 | 7.7330 | 0.0651 | YES |
| + | P33 | YLYPGPVTA | 3.58440 | 7.7720 | 8.1246 | 0.0587 | YES |
| + | P34 | YLAPGPVTV | 3.65702 | 7.8180 | 8.1802 | 0.0421 | YES |
| * | P35 | ILAQVPFSV | 3.63508 | 7.9390 | 8.1634 | 0.0886 | YES |
| * | P36 | ILMQVPFSV | 3.21445 | 8.1250 | 7.8417 | 0.0886 | YES |
| # | P37 | YLFPGPVTV | 3.56508 | 8.2370 | 8.1099 | 0.0651 | YES |
| – | P38 | YLMPGPVTA | 3.23239 | 8.3670 | 7.8554 | 0.0421 | YES |
| + | P39 | YLWPGPVTA | 3.37107 | 8.4950 | 7.9615 | 0.0921 | YES |
| + | P40 | FLDQVPFSV | 3.79203 | 8.6580 | 8.2835 | 0.0783 | YES |

Table 5.

Numerical data on the correlation weights used to calculate the model with Eq. 17

| Amino acid, Ak | CW(Ak) | NAT | NPT | NC | dk |
|---|---|---|---|---|---|
| A | 0.42063 | 3 | 3 | 3 | 0.0000 |
| D | 0.67685 | 3 | 2 | 3 | 0.0000 |
| E | −1.02940 | 1 | 2 | 1 | 0.0000 |
| F | 0.32870 | 5 | 7 | 8 | 0.0231 |
| G | 0.17820 | 5 | 4 | 4 | 0.0111 |
| I | 0.42796 | 2 | 7 | 4 | 0.0333 |
| L | 1.19938 | 7 | 7 | 7 | 0.0000 |
| M | 0.0 | 0 | 3 | 0 | 0.0000 |
| P | 0.43966 | 10 | 10 | 10 | 0.0000 |
| Q | 0.13354 | 5 | 6 | 6 | 0.0091 |
| S | −0.16405 | 5 | 6 | 8 | 0.0231 |
| T | −0.22180 | 8 | 6 | 6 | 0.0143 |
| V | 0.42463 | 10 | 10 | 10 | 0.0000 |
| W | 0.13869 | 3 | 2 | 1 | 0.0500 |
| Y | 0.35202 | 7 | 3 | 5 | 0.0167 |

NAT, NPT, and NC are the frequencies of an amino acid in the active training set, passive training set, and the calibration set, respectively

Table 6 contains an example of the calculation of DCW(1,15) for the epitope-peptide “WLEPGPVTA” together with the calculation of the corresponding pIC50 using Eq. 17.

Table 6.

Calculation of DCW(1,15) and pIC50 for the epitope-peptide WLEPGPVTA (structure images omitted)

| Ak | CW(Ak) |
|---|---|
| W | 0.13869 |
| L | 1.19938 |
| E | −1.02940 |
| P | 0.43966 |
| G | 0.17820 |
| P | 0.43966 |
| V | 0.42463 |
| T | −0.22180 |
| A | 0.42063 |

DCW(1,15) = ∑CW(Ak) = 1.98966

pIC50 = 5.3828941 + 0.7649124 × DCW(1,15) = 6.9048
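The calculation in Table 6 can be reproduced with a short sketch that uses the correlation weights of Table 5 and the coefficients of Eq. 17. The function names `dcw` and `predict_pic50` are hypothetical, and the sketch applies only to sequences built from the fifteen amino acids listed in Table 5.

```python
# Correlation weights CW(Ak) from Table 5 (split 3, T = 1, N = 15).
CW = {
    "A": 0.42063, "D": 0.67685, "E": -1.02940, "F": 0.32870, "G": 0.17820,
    "I": 0.42796, "L": 1.19938, "M": 0.0, "P": 0.43966, "Q": 0.13354,
    "S": -0.16405, "T": -0.22180, "V": 0.42463, "W": 0.13869, "Y": 0.35202,
}

# Regression coefficients of Eq. 17.
C0 = 5.3828941
C1 = 0.7649124


def dcw(sequence):
    """Descriptor of correlation weights: sum of CW(Ak) over the sequence (Eq. 2)."""
    return sum(CW[code] for code in sequence)


def predict_pic50(sequence):
    """One-variable linear model of Eq. 17."""
    return C0 + C1 * dcw(sequence)


dcw_p01 = dcw("WLEPGPVTA")
pic50_p01 = predict_pic50("WLEPGPVTA")
```

For WLEPGPVTA the sketch gives a DCW(1,15) that agrees with the 1.98966 of Table 6 to within rounding of the tabulated weights, and a calculated pIC50 that rounds to 6.9048.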

Thus, the described approach can be a tool to build up models for pIC50 for epitope-peptides.

Conclusions

The described approach gives a robust model for the biological activity of peptides (Table 4). The results are quite acceptable for three random splits into the training set and validation set. The approach obeys the OECD principles [26]. Once again, the possibility of building predictive models for endpoints related to complex molecular systems (peptides) is confirmed [5–8]. In addition, the described study confirms once more that applying the IIC improves the predictive potential of models [20, 25].

Acknowledgements

AAT and APT are grateful to the project LIFE-VERMEER (contract LIFE16 ENV/ES/000167) for financial support.

Compliance with ethical standards

Conflict of interest

The authors confirm they have no conflict of interest.

Footnotes

Published as part of the special collection of articles "Festschrift in honour of Prof. Ramon Carbó-Dorca".

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Restrepo G (2016) Mathematical chemistry, a new discipline. In: Scerri E, Fisher G (Eds) Essays in the philosophy of chemistry. Oxford University Press, New York, UK, Chapter 15, pp 332–351.
  • 2. Gutman I, Polansky OE. Mathematical concepts in organic chemistry. SIAM Review. 1988;30(2):348–350. doi: 10.1137/1030083.
  • 3. Trinajstić N, Gutman I. Mathematical chemistry. Croat Chem Acta. 2002;75:329–356.
  • 4. Balaban AT. Reflections about mathematical chemistry. Found Chem. 2005;7:289–306. doi: 10.1007/s10698-005-0779-0.
  • 5. Restrepo G, Villaveces JL. Mathematical thinking in chemistry. HYLE. 2012;18:3–22.
  • 6. Basak SC, Restrepo G, Villaveces JL (Eds) (2015) Advances in mathematical chemistry and applications. Volume 2. Bentham Science eBooks. ISBN: 9781681080529
  • 7. Engel T. Basic overview of chemoinformatics. J Chem Inf Model. 2006;46(6):2267–2277. doi: 10.1021/ci600234z.
  • 8. Fradera X, Amat L, Besalú E, Carbó-Dorca R. Application of molecular quantum similarity to QSAR. Quant Struct-Act Rel. 1997;16(1):25–32. doi: 10.1002/qsar.19970160105.
  • 9. Carbó-Dorca R. About the prediction of molecular properties using the fundamental quantum QSPR (QQSPR) equation. SAR QSAR Environ Res. 2007;18(3–4):265–284. doi: 10.1080/10629360701304113.
  • 10. Poater A, Saliner AG, Carbó-Dorca R, Poater J, Solà M, Cavallo L, Worth AP. Modeling the structure-property relationships of nanoneedles: a journey toward nanomedicine. J Comput Chem. 2009;30(2):275–284. doi: 10.1002/jcc.21041.
  • 11. Carbó-Dorca R, Besalú E. Construction of coherent nano quantitative structure-properties relationships (nano-QSPR) models and catastrophe theory. SAR QSAR Environ Res. 2011;22(7–8):661–665. doi: 10.1080/1062936X.2011.623319.
  • 12. Ayers PL, Boyd RJ, Bultinck P, Caffarel M, Carbó-Dorca R, Causá M, Cioslowski J, Contreras-Garcia J, Cooper DL, Coppens P, Gatti C, Grabowsky S, Lazzeretti P, Macchi P, Martín Pendás Á, Popelier PLA, Ruedenberg K, Rzepa H, Savin A, Sax A, Schwarz WHE, Shahbazian S, Silvi B, Solà M, Tsirelson V. Six questions on topology in theoretical chemistry. Comput Theor Chem. 2015;1053:2–16. doi: 10.1016/j.comptc.2014.09.028.
  • 13. Carbó-Dorca R. Toward a universal quantum QSPR operator. Int J Quantum Chem. 2018;118(15):1. doi: 10.1002/qua.25602.
  • 14. Carbó-Dorca R, Chakraborty T. Divagations about the periodic table: BOOLEAN hypercube and quantum similarity connections. J Comput Chem. 2019;40(30):2653–2663. doi: 10.1002/jcc.26044.
  • 15. Carbó-Dorca R, Chakraborty T. Hypercubes defined on n-ary sets, the Erdös–Faber–Lovász conjecture on graph coloring, and the description spaces of polypeptides and RNA. J Math Chem. 2019;57(10):2182–2194. doi: 10.1007/s10910-019-01065-6.
  • 16. Carbó-Dorca R, Van Damme S. Solutions to the quantum QSPR problem in molecular spaces. Theor Chem Acc. 2007;118(3):673–679. doi: 10.1007/s00214-007-0352-0.
  • 17. Ponec R, Bultinck P, Van Damme S, Carbó-Dorca R, Tantillo DJ. Geometric and electronic similarities between transition structures for electrocyclizations and sigmatropic hydrogen shifts. Theor Chem Acc. 2005;113(4):205–211. doi: 10.1007/s00214-004-0625-9.
  • 18. Du Q-S, Huang R-B, Wei Y-T, Wang C-H, Chou K-C. Peptide reagent design based on physical and chemical properties of amino acid residues. J Comput Chem. 2007;28(12):2043–2050. doi: 10.1002/jcc.20732.
  • 19. Hogeweg P. Multilevel cellular automata as a tool for studying bioinformatic processes. In: Kroc J, Sloot P, Hoekstra A, editors. Simulating complex systems by cellular automata. Springer, Berlin, Heidelberg: Understanding Complex Systems; 2010. pp. 19–28.
  • 20. Toropov AA, Toropova AP, Leszczynska D, Leszczynski J. “Ideal correlations” for biological activity of peptides. Biosystems. 2019;181:51–57. doi: 10.1016/j.biosystems.2019.04.008.
  • 21. Toropova AP, Toropov AA, Benfenati E, Leszczynska D, Leszczynski J. Prediction of antimicrobial activity of large pool of peptides using quasi-SMILES. Biosystems. 2018;169–170:5–12. doi: 10.1016/j.biosystems.2018.05.003.
  • 22. Toropova AP, Toropov AA, Beeg M, Gobbi M, Salmona M. Utilization of the monte carlo method to build up QSAR models for hemolysis and cytotoxicity of antimicrobial peptides. Curr Drug Discov Technol. 2017;14(4):229–243. doi: 10.2174/1570163814666170525114128.
  • 23. Toropov AA, Toropova AP, Raska I, Jr, Benfenati E, Gini G. QSAR modeling of endpoints for peptides which is based on representation of the molecular structure by a sequence of amino acids. Struct Chem. 2012;23(6):1891–1904. doi: 10.1007/s11224-012-9995-0.
  • 24. Weininger D. SMILES, a chemical language and information system: 1: Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988;28(1):31–36. doi: 10.1021/ci00057a005.
  • 25. Toropov AA, Carbó-Dorca R, Toropova AP. Index of Ideality of correlation: new possibilities to validate QSAR: a case study. Struct Chem. 2018;29(1):33–38. doi: 10.1007/s11224-017-0997-9.
  • 26. Toropova AP, Toropov AA. CORAL software: prediction of carcinogenicity of drugs by means of the monte carlo method. Eur J Pharm Sci. 2014;52(1):21–25. doi: 10.1016/j.ejps.2013.10.005.

Articles from Theoretical Chemistry Accounts are provided here courtesy of Nature Publishing Group
