The sequence of amino acids as the basis for the model of biological activity of peptides

Alla P Toropova; Maria Raškova; Ivan Raška Jr; Andrey A Toropov

doi:10.1007/s00214-020-02707-8

. 2021 Jan 22;140(2):15. doi: 10.1007/s00214-020-02707-8

The sequence of amino acids as the basis for the model of biological activity of peptides

Alla P Toropova ¹, Maria Raškova ², Ivan Raška Jr ², Andrey A Toropov ^1,^✉

PMCID: PMC7820519 PMID: 33500680

Abstract

The algorithm of building up a model for the biological activity of peptides as a mathematical function of a sequence of amino acids is suggested. The general scheme is the following: The total set of available data is distributed into the active training set, passive training set, calibration set, and validation set. The training (both active and passive) and calibration sets are a system of generation of a model of biological activity where each amino acid obtains special correlation weight. The numerical data on the correlation weights calculated by the Monte Carlo method using the CORAL software (http://www.insilico.eu/coral). The target function aimed to give the best result for the calibration set (not for the training set). The final checkup of the model is carried out with data on the validation set (peptides, which are not visible during the creation of the model). Described computational experiments confirm the ability of the approach to be a tool for the design of predictive models for the biological activity of peptides (expressed by pIC50).

Keywords: QSAR, Amino acid, Peptide, Monte Carlo method, Index of ideality of correlation

Introduction

History of mathematical chemistry contains contributions of many outstanding scientists, such as A.T. Balaban, M. Randić, I. Gutman, N.Trinajstić, S.C. Basak, R. Carbó-Dorca, as well as many others [1–15]. Mathematical chemistry [1] is the area of research engaged in novel applications of mathematics to chemistry, biochemistry, and biology. It concerns itself principally with the mathematical modeling of complex molecular phenomena [2].

Most areas of research in mathematical chemistry include chemical graph theory, which deals with the development of topological descriptors which find application in quantitative structure–property relationships [3, 4], as well as chemical aspects of group theory, which finds applications in stereochemistry and quantum chemistry [5, 6].

Chemoinformatics is a relatively young field of natural sciences. By analogy with "in viva" and "in vitro," the results of cheminformatics denominate as "in silico" [7].

It is to be noted, contributions of Prof. R. Carbó-Dorca, related to the development of cheminformatics tools applied to quantum mechanical theoretical problems, which gave the possibility to solve chemical problems, like catalysis and reactivity, by simple computational schemes [8–12]. Chemoinformatic gradually extends to solve tasks in fields of theoretical chemistry, computational chemistry, and modeling [13–15].

Apply mathematical methods to solve the tasks of chemistry and biochemistry can be effective [16, 17]. Peptides are important objects of chemistry, biochemistry, and medicine. Most interest in using proteins and peptides is caused by their application in drug design [18]. The amino acid residues of epitope-peptide substrate and SARS coronavirus main protease are interacting. Hence, the affinity of epitope-peptides with class I MHC (major histocompatibility complex) molecules can be used to development of antiviral agents, e.g., toward coronaviruses [18].

A fundamentally widely accepted science principle to understand complex systems is “Everything should be made as simple as possible, but no simpler” [19]. Perhaps, the approach used here cannot be adequately evaluated using the above principle, since a simpler method is not possible, or at least a simpler approach has not yet been described in the literature [20–23]. To state the approach “simpler” than “simple” is not correct, since the approach gives quite good models [20–23]. The model of biological activity of peptides described here is based on sequences of amino acids, represented by 1-letter codes (Table 1).

Table 1.

Structures and 1-letter codes for Amino acids

Amino acid	1-letter code	Structure
Alanine	A
Arginine	R
Asparagine	N
Aspartic Acid	D
Cysteine	C
Glutamic acid	E
Glutamine	Q
Glycine	G
Histidine	H
Isoleucine	I
Leucine	L
Lysine	K
Methionine	M
Phenylalanine	F
Proline	P
Serine	S
Threonine	T
Tryptophan	W
Tyrosine	Y
Valine	V

Open in a new tab

The aim of the present study is the estimation of the CORAL software to provide a satisfactory model for the bioactivity of peptides. Representation of peptides via a sequence of amino acids is like a well-known simplified molecular input-line entry system (SMILES) [24]. Consequently, the CORAL software (www.insilico.eu/coral) that is oriented to build up quantitative structure–activity relationships (QSARs) using the SMILES representation can be a tool to build up a predictive model for the activity of peptides as a function of sequences of the 1-letter codes of corresponding amino acids [25]. Factually, the sequences of amino acids represented by 1-letter codes are quasi-SMILES [20, 21].

Method

Data

The numerical data on the biological activity of epitope-peptides with class I MHC (major histocompatibility complex) molecules taken from the literature [18]. The endpoint expressed via a negative logarithm of half-maximal inhibitory concentration IC₅₀ (pIC₅₀). Table 1 contains sequences of amino acids represent epitope-peptides studied here.

The available epitope-peptides were randomly distributed into the active training set (25%), passive training set (25%), calibration set (25%), and validation set (25%). Each above set has a defined task. The task for the active training set is to build up optimal correlation weights for the optimal descriptor. The task for the passive training set is to checkup whether current correlation weights (and the optimal descriptor) are satisfactory for peptides, which are not involved in the calculation of the correlation weights. The task for the calibration set is to detect the moment of the begin overtraining. The task of peptides from the validation set is the final estimation of the predictive potential of the model.

Quantitative structure–activity relationships (QSARs)

The CORAL software provides models, which are linear one-variable correlations obtained by the Monte Carlo method (http://www.insilico.eu/coral). The generalized representation of the model for the biological activity of peptides is the following one-variable correlation:

p I C_{50} = C_{0} + C_{1} \times D C W (T, N)

The DCW(T,N) is the descriptor of correlation weights (DCW). The C₀ and C₁ are regression coefficients. The T and N are parameters of the Monte Carlo optimization discussed below.

The descriptor of correlation weights (DCW)

The descriptors applied to QSAR analysis are calculated as the following:

D C W (T^{*}, N^{*}) = \sum_{}^{} C W (A_{k})

The A_k is a 1-letter code of amino acid; CW(A_k) is the correlation weights for the A_k.

The T is an integer to define two classes (i) the rare and (ii) non-rare. If the frequency of A_k in the active training set is less than T, the A_k is rare, and the CW(A_k) = 0 (i.e., the A_k is removed from the modeling process). Thus, the model is based on correlation weights solely non-rare in the active training set amino acids. The N is the number of iterations for the Monte Carlo optimization. The T = T* and N = N* are values which provide the best statistical quality of the model for the calibration set.

Monte Carlo optimization

The scheme of the Monte Carlo optimization is described in [23, 25]. The essence of this version of the optimization procedure is the application of the Index of ideality of correlation (IIC). Models for the inhibitory activity of peptides built up here are build up to apply two different target functions TF₁ and TF₂:

T F_{1} = R_{AT} + R_{PT} - |R_{AT} - R_{PT}| * 0.1

T F_{2} = T F_{1} + I I C_{CLB} * 0.5

The $R_{AT}$ and $R_{PT}$ are the correlation coefficient between observed and predicted endpoints for the active training set and passive training set, respectively.

The IIC_CLB is calculated with data on the calibration set as the following:

I I C_{CLB} = r_{CLB} \frac{min (_{}^{-} M A E_{CLB},_{}^{+} M A E_{CLB})}{max (_{}^{-} M A E_{CLB},_{}^{+} M A E_{CLB})}

_{}^{-} M A E_{CLB} = \frac{1}{_{}^{-} N} \sum_{k = 1}^{_{}^{-} N} |Δ_{k}|, Δ_{k} \leq 0 ;_{}^{-} N i s t h e n u m b e r o f Δ_{k} \leq 0

^{+} M A E_{CLB} = \frac{1}{^{+} N} \sum_{k = 1}^{^{+} N} |Δ_{k}|, Δ_{k} \leq 0 ;^{+} N is the number o f Δ_{k} \leq 0

Δ_{k} = o b s e r v e d_{k} - c a l c u l a t e d_{k}

The observed and calculated are the corresponding values of pIC₅₀.

Figure 1 contains the comparison of histories of the Monte Carlo optimizations with target functions TF₁ and TF₂. One can see, the TF₂ seems preferable because factually the decrease in the statistical quality for calibration set and validation set is not observed, whereas in the case of TF₁ the decrease in the statistical quality for the calibration set and validation set is observed.

Fig. 1 — Histories of the Monte Carlo optimization (Split 1) with target functions TF₁ and TF₂

Domain of applicability

The domain of applicability for the CORAL model is defined according to the distribution of SMILES attributes in the active training set and calibration set as two steps:

Step 1: the definition of the statistical defect (d_k) for each SMILES attribute involved in building up of a model:

d_{k} = \frac{|P (A_{k}) - P^{'} (A_{k})|}{N (A_{k}) + N (A_{k})}

where P(A_k) and P’(A_k) are the probability of A_k in the training and calibration sets, respectively.

N(A_k) and N’(A_k) are frequencies of A_k in the training and calibration sets, respectively.

Step 2: the calculation for all substances the statistical SMILES-defect (D_j):

D_{j} = \sum_{k = 1}^{NA} d_{k}

where NA is the number of non-blocked SMILES attributes in the SMILES.

A substance falls in the domain of applicability if

D j < 2 * \bar{D}

where $\bar{D}$ is the average of the statistical SMILES-defect for the training set.

The same operation can be carried out with the sequences of 1-letter codes of amino acids, if instead of A_k defined as a SMILES attribute, one examined A_k defined as a 1-letter code of corresponding amino acids.

Results and discussion

The models obtained for three random splits into the training set (which is association of the active and passive training sets together with the calibration set) and validation set are the following:

Target Function TF₁

p I C_{50} = 5.0637012 (\pm 0.3150527) + 0.9790357 (\pm 0.1064904) * D C W (1, 3)

{pIC}_{50} = 5.3015843 (\pm 0.1783155) + 1.4109089 (\pm 0.1001528) * DCW (1, 3)

{pIC}_{50} = 2.6879582 (\pm 0.2459626) + 1.0011131 (\pm 0.0482456) * DCW (1, 3)

Target Function TF₂

{pIC}_{50} = 4.0179522 (\pm 0.5296001) + 0.4553542 (\pm 0.0634366) * DCW (1, 15)

{pIC}_{50} = 4.8689021 (\pm 0.3087049) + 0.6850851 (\pm 0.0712025) * DCW (1, 15)

{pIC}_{50} = 5.3828941 (\pm 0.4250702) + 0.7649124 (\pm 0.1215301) * DCW (1, 15)

Table 2 contains the statistical characteristics of the models calculated with Eqs. 12–17.

Table 2.

Statistical quality of models for three random splits

Split	Set	R²	Q²	IIC	RMSE
	Optimization with TF₁
1	Active training	0.7625	0.5558	0.8732	0.360
	Passive training	0.8250	0.7065	0.6739	0.395
	Calibration	0.6012	0.4017	0.3695	0.506
	Validation	0.6220	0.4816		0.490
2	Active training	0.8205	0.7052	0.9058	0.333
	Passive training	0.9165	0.8301	0.4709	0.374
	Calibration	0.5223	0.2836	0.4258	0.592
	Validation	0.5481	0.3476		0.515
3	Active training	0.8846	0.8229	0.9406	0.265
	Passive training	0.7283	0.5982	0.8264	0.599
	Calibration	0.5053	0.2612	0.3745	0.927
	Validation	0.5900	0.3277		0.700
	Optimization with TF₂
1	Active training	0.6416	0.3506	0.5340	0.442
	Passive training	0.7231	0.5868	0.4120	0.507
	Calibration	0.9486	0.9157	0.9679	0.142
	Validation	0.7766	0.6298		0.306
2	Active training	0.6976	0.4905	0.5568	0.432
	Passive training	0.9543	0.9192	0.8516	0.332
	Calibration	0.7102	0.5447	0.8406	0.337
	Validation	0.7856	0.6596		0.270
3	Active training	0.5326	0.1846	0.7298	0.533
	Passive training	0.8128	0.6796	0.6251	0.562
	Calibration	0.8743	0.8139	0.8827	0.214
	Validation	0.7909	0.6721		0.248

Open in a new tab

Each set contains ten peptides

The best model is indicated by bold

One can see, the predictive potential of models calculated using the IIC is better.

Having numerical data on correlation weights of different amino acids obtained in several runs of the optimization, one can detect the amino acids of two classes: (1) amino acids with stable positive correlation weights, these are promoters of increase of pIC50; and (2) amino acids with stable negative correlation weights, these are promoters of decrease of pIC50. Thus, the approach gives the statistical mechanistic interpretation of the models. Table 3 contains a collection of amino acids which are promoters of increase/decrease for pIC₅₀. It is to be noted, the prevalence of corresponding amino acids also should be considering.

Table 3.

Amino acids which are promoters of increase / decrease for pIC50 for examined peptides

Comment	A_k	CWsProbe 1	CWsProbe 2	CWsProbe 3	N_AT	N_PT	N_C	d_k
Increase	V……….	0.47695	0.30991	0.26611	10	10	10	0.0000
	L……….	1.29542	0.73164	0.31587	8	5	7	0.0067
	F……….	1.07326	0.70770	0.37614	6	6	7	0.0077
	I……….	0.76211	0.16717	0.34684	6	3	4	0.0200
	A……….	0.54686	0.01821	0.06304	4	3	2	0.0333
	G……….	0.44966	0.52819	0.73395	4	5	4	0.0000
	Y……….	1.46411	0.65332	0.40546	4	5	5	0.0111
	M……….	1.29967	0.55126	0.39601	2	0	3	0.0200
Decrease	T……….	− 0.26044	− 0.28480	− 0.34702	6	9	6	0.0000
	E……….	− 0.62472	− 0.62778	− 0.55954	1	3	1	0.0000

Open in a new tab

N_AT, N_PT, and N_C are the frequencies of an amino acid in the active training set, passive training set, and the calibration set, respectively

Table 4 contains experimental and calculated with Eq. 17 pIC₅₀. Table 5 contains the numerical data on the correlation weights of amino acids to calculate the model with Eq. 17.

Table 4.

Experimental and calculated with Eq. 17 pIC₅₀ for model obtained with split 3 (the best model): “ + ” is the indicator for the active training set; “– ” is the indicator for the passive training set; “#” is the indicator of calibration set; and “*” is the indicator for validation set

Set	ID	Sequence of amino acids	DCW(1,15)	pIC₅₀ Expr	pIC₅₀ Calc	D_J $(\bar{D} = 0.08757)$	Applicability
–	P01	WLEPGPVTA	1.98966	6.0820	6.9048	0.0754	YES
–	P02	ITSQVPFSV	1.62921	6.1960	6.6291	0.1259	YES
#	P03	FLEPGPVTA	2.17966	6.8980	7.0501	0.0485	YES
#	P04	ITAQVPFSV	2.21389	7.0200	7.0763	0.1029	YES
+	P05	YLEPGPVTL	2.98174	7.0580	7.6637	0.0421	YES
#	P06	YTDQVPFSV	2.39417	7.0660	7.2142	0.0862	YES
–	P07	YLEPGPVTI	2.21031	7.1870	7.0736	0.0754	YES
*	P08	YLEPGPVTV	2.20698	7.3420	7.0710	0.0421	YES
#	P09	YLSPGPVTA	3.06834	7.3830	7.7299	0.0651	YES
#	P10	IIDQVPFSV	3.11987	7.3980	7.7693	0.1219	YES
+	P11	ITWQVPFSV	1.93195	7.4630	6.8607	0.1529	YES
+	P12	ITYQVPFSV	2.14528	7.4800	7.0238	0.1195	YES
#	P13	ILSQVPFSV	3.05039	7.6990	7.7162	0.1117	YES
–	P14	IMDQVPFSV	2.69191	7.7190	7.4420	0.0886	YES
*	P15	YLMPGPVTV	3.23638	7.9320	7.8584	0.0421	YES
#	P16	WLDQVPFSV	3.60203	7.9390	8.1381	0.1052	YES
*	P17	YLAPGPVTA	3.65302	8.0320	8.1771	0.0421	YES
+	P18	YLYPGPVTV	3.58840	8.0510	8.1277	0.0587	YES
*	P19	YLWPGPVTV	3.37507	8.1250	7.9645	0.0921	YES
#	P20	ILYQVPFSV	3.56646	8.3100	8.1109	0.1052	YES
–	P21	ILDQVPFSV	3.89130	8.4810	8.3594	0.0886	YES
–	P22	YLFPGPVTA	3.56108	8.4950	8.1068	0.0651	YES
+	P23	YLDQVPFSV	3.81535	8.6380	8.3013	0.0719	YES
–	P24	ILFQVPFSV	3.54314	8.6990	8.0931	0.1117	YES
–	P25	ILWQVPFSV	3.35313	8.7700	7.9477	0.1386	YES
+	P26	WTDQVPFSV	2.18084	6.1450	7.0510	0.1195	YES
*	P27	YLEPGPVTA	2.20298	6.6680	7.0680	0.0421	YES
*	P28	ITDQVPFSV	2.47011	6.9470	7.2723	0.1029	YES
*	P29	ITFQVPFSV	2.12196	7.1790	7.0060	0.1259	YES
*	P30	FTDQVPFSV	2.37085	7.2120	7.1964	0.0926	YES
–	P31	ITMQVPFSV	1.79326	7.3980	6.7546	0.1029	YES
#	P32	YLSPGPVTV	3.07233	7.6420	7.7330	0.0651	YES
+	P33	YLYPGPVTA	3.58440	7.7720	8.1246	0.0587	YES
+	P34	YLAPGPVTV	3.65702	7.8180	8.1802	0.0421	YES
*	P35	ILAQVPFSV	3.63508	7.9390	8.1634	0.0886	YES
*	P36	ILMQVPFSV	3.21445	8.1250	7.8417	0.0886	YES
#	P37	YLFPGPVTV	3.56508	8.2370	8.1099	0.0651	YES
–	P38	YLMPGPVTA	3.23239	8.3670	7.8554	0.0421	YES
+	P39	YLWPGPVTA	3.37107	8.4950	7.9615	0.0921	YES
+	P40	FLDQVPFSV	3.79203	8.6580	8.2835	0.0783	YES

Open in a new tab

Table 5.

Numerical data on the correlation weights to calculate model with Eq. 17

Amino acids, A_k	CW(A_k)	N_AT	N_PT	N_C	d_k
A……….	0.42063	3	3	3	0.0000
D……….	0.67685	3	2	3	0.0000
E……….	− 1.02940	1	2	1	0.0000
F……….	0.32870	5	7	8	0.0231
G……….	0.17820	5	4	4	0.0111
I……….	0.42796	2	7	4	0.0333
L……….	1.19938	7	7	7	0.0000
M……….	0.0	0	3	0	0.0000
P……….	0.43966	10	10	10	0.0000
Q……….	0.13354	5	6	6	0.0091
S……….	− 0.16405	5	6	8	0.0231
T……….	− 0.22180	8	6	6	0.0143
V……….	0.42463	10	10	10	0.0000
W……….	0.13869	3	2	1	0.0500
Y……….	0.35202	7	3	5	0.0167

Open in a new tab

N_AT, N_PT, and N_C are the frequencies of an amino acid in the active training set, passive training set, and the calibration set, respectively

Table 6 contains an example of calculation DCW(1,15) for epitope-peptide “WLEPGPVTA” together with the calculation of corresponding pIC50 using Eq. 17.

Table 6.

Calculation of DCW(1,15) and pIC50 for epitope-peptide = WLEPGPVTA

A_k	Structure	CW(A_k)
W		0.13869
L		1.19938
E		− 1.02940
P		0.43966
G		0.17820
P		0.43966
V		0.42463
T		− 0.22180
A		0.42063
	DCW(1,15) = ∑CW(A_k) pIC₅₀ = 5.3828941 + 0.7649124DCW(1,15)*	1.98966 6.9048

Open in a new tab

Thus, the described approach can be a tool to build up models for pIC₅₀ for epitope-peptides.

Conclusions

The described approach gives a robust model for the biological activity of peptides (Table 4). The results are quite acceptable for three random splits into the training set and validation set. The approach obeys the OECD principles [26]. Once again, the possibility to build up predictive models for endpoints related to complex molecular systems (peptides) is confirmed [5–8]. In addition, the described confirms once more that applying the IIC improves the predictive potential of models [20, 25].

Acknowledgements

AAT and APT are grateful for the contribution of the project LIFE-VERMEER contract (LIFE16 ENV/ES/000167) for financial support.

Compliance with ethical standards

Conflict of interest

The authors confirm they have no conflict of interest.

Footnotes

Published as part of the special collection of articles "Festschrift in honour of Prof. Ramon Carbó-Dorca".

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1.Restrepo G (2016) Mathematical chemistry, a new discipline. In: Scerri E, Fisher G (Eds) Essays in the philosophy of chemistry. Oxford University Press, New York, UK, Chapter 15, pp 332–351.
2.Gutman I, Polansky OE. Mathematical concepts in organic chemistry. SIAM Review. 1988;30(2):348–350. doi: 10.1137/1030083. [DOI] [Google Scholar]
3.Trinajstić N, Gutman I. Mathematical chemistry. Croat Chem Acta. 2002;75:329–356. [Google Scholar]
4.Balaban AT. Reflections about mathematical chemistry. Found Chem. 2005;7:289–306. doi: 10.1007/s10698-005-0779-0. [DOI] [Google Scholar]
5.Restrepo G, Villaveces JL. Mathematical thinking in chemistry. HYLE. 2012;18:3–22. [Google Scholar]
6.Basak SC, Restrepo G, Villaveces JL (Eds) (2015) Advances in mathematical chemistry and applications. Volume 2. Bentham Science eBooks. ISBN: 9781681080529
7.Engel T. Basic overview of chemoinformatics. J Chem Inf Model. 2006;46(6):2267–2277. doi: 10.1021/ci600234z. [DOI] [PubMed] [Google Scholar]
8.Fradera X, Amat L, Besalú E, Carbó-Dorca R. Application of molecular quantum similarity to QSAR. Quant Struct-Act Rel. 1997;16(1):25–32. doi: 10.1002/qsar.19970160105. [DOI] [Google Scholar]
9.Carbó-Dorca R. About the prediction of molecular properties using the fundamental quantum QSPR (QQSPR) equation. SAR QSAR Environ Res. 2007;18(3–4):265–284. doi: 10.1080/10629360701304113. [DOI] [PubMed] [Google Scholar]
10.Poater A, Saliner AG, Carbó-Dorca R, Poater J, Solà M, Cavallo L, Worth AP. Modeling the structure-property relationships of nanoneedles: a journey toward nanomedicine. J Comput Chem. 2009;30(2):275–284. doi: 10.1002/jcc.21041. [DOI] [PubMed] [Google Scholar]
11.Carbó-Dorca R, Besalú E. Construction of coherent nano quantitative structure-properties relationships (nano-QSPR) models and catastrophe theory. SAR QSAR Environ Res. 2011;22(7–8):661–665. doi: 10.1080/1062936X.2011.623319. [DOI] [PubMed] [Google Scholar]
12.Ayers PL, Boyd RJ, Bultinck P, Caffarel M, Carbó-Dorca R, Causá M, Cioslowski J, Contreras-Garcia J, Cooper DL, Coppens P, Gatti C, Grabowsky S, Lazzeretti P, Macchi P, Martín Pendás Á, Popelier PLA, Ruedenberg K, Rzepa H, Savin A, Sax A, Schwarz WHE, Shahbazian S, Silvi B, Solà M, Tsirelson V. Six questions on topology in theoretical chemistry. Comput Theor Chem. 2015;1053:2–16. doi: 10.1016/j.comptc.2014.09.028. [DOI] [Google Scholar]
13.Carbó-Dorca R. Toward a universal quantum QSPR operator. Int J Quantum Chem. 2018;118(15):1. doi: 10.1002/qua.25602. [DOI] [Google Scholar]
14.Carbó-Dorca R, Chakraborty T. Divagations about the periodic table: BOOLEAN hypercube and quantum similarity connections. J Comput Chem. 2019;40(30):2653–2663. doi: 10.1002/jcc.26044. [DOI] [PubMed] [Google Scholar]
15.Carbó-Dorca R, Chakraborty T. Hypercubes defined on n-ary sets, the Erdös–Faber–Lovász conjecture on graph coloring, and the description spaces of polypeptides and RNA. J Math Chem. 2019;57(10):2182–2194. doi: 10.1007/s10910-019-01065-6. [DOI] [Google Scholar]
16.Carbó-Dorca R, Van Damme S. Solutions to the quantum QSPR problem in molecular spaces. Theor Chem Acc. 2007;118(3):673–679. doi: 10.1007/s00214-007-0352-0. [DOI] [Google Scholar]
17.Ponec R, Bultinck P, Van Damme S, Carbó-Dorca R, Tantillo DJ. Geometric and electronic similarities between transition structures for electrocyclizations and sigmatropic hydrogen shifts. Theor Chem Acc. 2005;113(4):205–211. doi: 10.1007/s00214-004-0625-9. [DOI] [Google Scholar]
18.Du Q-S, Huang R-B, Wei Y-T, Wang C-H, Chou K-C. Peptide reagent design based on physical and chemical properties of amino acid residues. J Comput Chem. 2007;28(12):2043–2050. doi: 10.1002/jcc.20732. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Hogeweg P. Multilevel cellular automata as a tool for studying bioinformatic processes. In: Kroc J, Sloot P, Hoekstra A, editors. Simulating complex systems by cellular automata. Springer, Berlin, Heidelberg: Understanding Complex Systems; 2010. pp. 19–28. [Google Scholar]
20.Toropov AA, Toropova AP, Leszczynska D, Leszczynski J. “Ideal correlations” for biological activity of peptides. Biosystems. 2019;181:51–57. doi: 10.1016/j.biosystems.2019.04.008. [DOI] [PubMed] [Google Scholar]
21.Toropova AP, Toropov AA, Benfenati E, Leszczynska D, Leszczynski J. Prediction of antimicrobial activity of large pool of peptides using quasi-SMILES. Biosystems. 2018;169–170:5–12. doi: 10.1016/j.biosystems.2018.05.003. [DOI] [PubMed] [Google Scholar]
22.Toropova AP, Toropov AA, Beeg M, Gobbi M, Salmona M. Utilization of the monte carlo method to build up QSAR models for hemolysis and cytotoxicity of antimicrobial peptides. Curr Drug Discov Technol. 2017;14(4):229–243. doi: 10.2174/1570163814666170525114128. [DOI] [PubMed] [Google Scholar]
23.Toropov AA, Toropova AP, Raska I, Jr, Benfenati E, Gini G. QSAR modeling of endpoints for peptides which is based on representation of the molecular structure by a sequence of amino acids. Struct Chem. 2012;23(6):1891–1904. doi: 10.1007/s11224-012-9995-0. [DOI] [Google Scholar]
24.Weininger D. SMILES, a chemical language and information system: 1: Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988;28(1):31–36. doi: 10.1021/ci00057a005. [DOI] [Google Scholar]
25.Toropov AA, Carbó-Dorca R, Toropova AP. Index of Ideality of correlation: new possibilities to validate QSAR: a case study. Struct Chem. 2018;29(1):33–38. doi: 10.1007/s11224-017-0997-9. [DOI] [Google Scholar]
26.Toropova AP, Toropov AA. CORAL software: prediction of carcinogenicity of drugs by means of the monte carlo method. Eur J Pharm Sci. 2014;52(1):21–25. doi: 10.1016/j.ejps.2013.10.005. [DOI] [PubMed] [Google Scholar]

[CR1] 1.Restrepo G (2016) Mathematical chemistry, a new discipline. In: Scerri E, Fisher G (Eds) Essays in the philosophy of chemistry. Oxford University Press, New York, UK, Chapter 15, pp 332–351.

[CR2] 2.Gutman I, Polansky OE. Mathematical concepts in organic chemistry. SIAM Review. 1988;30(2):348–350. doi: 10.1137/1030083. [DOI] [Google Scholar]

[CR3] 3.Trinajstić N, Gutman I. Mathematical chemistry. Croat Chem Acta. 2002;75:329–356. [Google Scholar]

[CR4] 4.Balaban AT. Reflections about mathematical chemistry. Found Chem. 2005;7:289–306. doi: 10.1007/s10698-005-0779-0. [DOI] [Google Scholar]

[CR5] 5.Restrepo G, Villaveces JL. Mathematical thinking in chemistry. HYLE. 2012;18:3–22. [Google Scholar]

[CR6] 6.Basak SC, Restrepo G, Villaveces JL (Eds) (2015) Advances in mathematical chemistry and applications. Volume 2. Bentham Science eBooks. ISBN: 9781681080529

[CR7] 7.Engel T. Basic overview of chemoinformatics. J Chem Inf Model. 2006;46(6):2267–2277. doi: 10.1021/ci600234z. [DOI] [PubMed] [Google Scholar]

[CR8] 8.Fradera X, Amat L, Besalú E, Carbó-Dorca R. Application of molecular quantum similarity to QSAR. Quant Struct-Act Rel. 1997;16(1):25–32. doi: 10.1002/qsar.19970160105. [DOI] [Google Scholar]

[CR9] 9.Carbó-Dorca R. About the prediction of molecular properties using the fundamental quantum QSPR (QQSPR) equation. SAR QSAR Environ Res. 2007;18(3–4):265–284. doi: 10.1080/10629360701304113. [DOI] [PubMed] [Google Scholar]

[CR10] 10.Poater A, Saliner AG, Carbó-Dorca R, Poater J, Solà M, Cavallo L, Worth AP. Modeling the structure-property relationships of nanoneedles: a journey toward nanomedicine. J Comput Chem. 2009;30(2):275–284. doi: 10.1002/jcc.21041. [DOI] [PubMed] [Google Scholar]

[CR11] 11.Carbó-Dorca R, Besalú E. Construction of coherent nano quantitative structure-properties relationships (nano-QSPR) models and catastrophe theory. SAR QSAR Environ Res. 2011;22(7–8):661–665. doi: 10.1080/1062936X.2011.623319. [DOI] [PubMed] [Google Scholar]

[CR12] 12.Ayers PL, Boyd RJ, Bultinck P, Caffarel M, Carbó-Dorca R, Causá M, Cioslowski J, Contreras-Garcia J, Cooper DL, Coppens P, Gatti C, Grabowsky S, Lazzeretti P, Macchi P, Martín Pendás Á, Popelier PLA, Ruedenberg K, Rzepa H, Savin A, Sax A, Schwarz WHE, Shahbazian S, Silvi B, Solà M, Tsirelson V. Six questions on topology in theoretical chemistry. Comput Theor Chem. 2015;1053:2–16. doi: 10.1016/j.comptc.2014.09.028. [DOI] [Google Scholar]

[CR13] 13.Carbó-Dorca R. Toward a universal quantum QSPR operator. Int J Quantum Chem. 2018;118(15):1. doi: 10.1002/qua.25602. [DOI] [Google Scholar]

[CR14] 14.Carbó-Dorca R, Chakraborty T. Divagations about the periodic table: BOOLEAN hypercube and quantum similarity connections. J Comput Chem. 2019;40(30):2653–2663. doi: 10.1002/jcc.26044. [DOI] [PubMed] [Google Scholar]

[CR15] 15.Carbó-Dorca R, Chakraborty T. Hypercubes defined on n-ary sets, the Erdös–Faber–Lovász conjecture on graph coloring, and the description spaces of polypeptides and RNA. J Math Chem. 2019;57(10):2182–2194. doi: 10.1007/s10910-019-01065-6. [DOI] [Google Scholar]

[CR16] 16.Carbó-Dorca R, Van Damme S. Solutions to the quantum QSPR problem in molecular spaces. Theor Chem Acc. 2007;118(3):673–679. doi: 10.1007/s00214-007-0352-0. [DOI] [Google Scholar]

[CR17] 17.Ponec R, Bultinck P, Van Damme S, Carbó-Dorca R, Tantillo DJ. Geometric and electronic similarities between transition structures for electrocyclizations and sigmatropic hydrogen shifts. Theor Chem Acc. 2005;113(4):205–211. doi: 10.1007/s00214-004-0625-9. [DOI] [Google Scholar]

[CR18] 18.Du Q-S, Huang R-B, Wei Y-T, Wang C-H, Chou K-C. Peptide reagent design based on physical and chemical properties of amino acid residues. J Comput Chem. 2007;28(12):2043–2050. doi: 10.1002/jcc.20732. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Hogeweg P. Multilevel cellular automata as a tool for studying bioinformatic processes. In: Kroc J, Sloot P, Hoekstra A, editors. Simulating complex systems by cellular automata. Springer, Berlin, Heidelberg: Understanding Complex Systems; 2010. pp. 19–28. [Google Scholar]

[CR20] 20.Toropov AA, Toropova AP, Leszczynska D, Leszczynski J. “Ideal correlations” for biological activity of peptides. Biosystems. 2019;181:51–57. doi: 10.1016/j.biosystems.2019.04.008. [DOI] [PubMed] [Google Scholar]

[CR21] 21.Toropova AP, Toropov AA, Benfenati E, Leszczynska D, Leszczynski J. Prediction of antimicrobial activity of large pool of peptides using quasi-SMILES. Biosystems. 2018;169–170:5–12. doi: 10.1016/j.biosystems.2018.05.003. [DOI] [PubMed] [Google Scholar]

[CR22] 22.Toropova AP, Toropov AA, Beeg M, Gobbi M, Salmona M. Utilization of the monte carlo method to build up QSAR models for hemolysis and cytotoxicity of antimicrobial peptides. Curr Drug Discov Technol. 2017;14(4):229–243. doi: 10.2174/1570163814666170525114128. [DOI] [PubMed] [Google Scholar]

[CR23] 23.Toropov AA, Toropova AP, Raska I, Jr, Benfenati E, Gini G. QSAR modeling of endpoints for peptides which is based on representation of the molecular structure by a sequence of amino acids. Struct Chem. 2012;23(6):1891–1904. doi: 10.1007/s11224-012-9995-0. [DOI] [Google Scholar]

[CR24] 24.Weininger D. SMILES, a chemical language and information system: 1: Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988;28(1):31–36. doi: 10.1021/ci00057a005. [DOI] [Google Scholar]

[CR25] 25.Toropov AA, Carbó-Dorca R, Toropova AP. Index of Ideality of correlation: new possibilities to validate QSAR: a case study. Struct Chem. 2018;29(1):33–38. doi: 10.1007/s11224-017-0997-9. [DOI] [Google Scholar]

[CR26] 26.Toropova AP, Toropov AA. CORAL software: prediction of carcinogenicity of drugs by means of the monte carlo method. Eur J Pharm Sci. 2014;52(1):21–25. doi: 10.1016/j.ejps.2013.10.005. [DOI] [PubMed] [Google Scholar]

PERMALINK

The sequence of amino acids as the basis for the model of biological activity of peptides

Alla P Toropova

Maria Raškova

Ivan Raška Jr

Andrey A Toropov

Abstract