Abstract
Quantitative relationships between molecular structure and p56lck protein tyrosine kinase inhibitory activity of 50 flavonoid derivatives were derived by multiple linear regression (MLR) and genetic algorithm-partial least squares (GA-PLS) methods. The QSAR models revealed that substituent electronic descriptor (SED) parameters have a significant impact on the protein tyrosine kinase inhibitory activity of the compounds. Of the two statistical methods employed, GA-PLS gave superior results. The resulting GA-PLS model had a high statistical quality (R2 = 0.74 and Q2 = 0.61) for predicting the activity of the inhibitors. The models proposed in the present work are more useful in describing the QSAR of flavonoid derivatives as p56lck protein tyrosine kinase inhibitors than those provided previously.
Keywords: Protein tyrosine kinase, Flavonoid, QSAR, Chemometrics, SED analysis
1. Introduction
The quantitative structure-activity relationship (QSAR) research field provides medicinal chemists with the ability to predict drug activity through mathematical equations that relate chemical structure to biological activity [1, 2]. These equations take the form y = Xb + e, which relates a set of predictor variables (X) to a predicted variable (y) by means of a regression vector (b), with e denoting the residual [3]. Following the early QSAR studies by Hansch, who showed a correlation between biological activity and the octanol-water partition coefficient [2], it is now assumed that the sum of substituent effects on the steric, electronic and hydrophobic interactions of compounds with their receptor determines their biological activity [4–6]. The first step in constructing a QSAR model is finding one or more molecular descriptors, each of which represents variation in a structural property of the molecules by a number [7]. Nowadays, a wide range of descriptors is used in QSAR studies; following the Karelson approach, they can be classified into categories including constitutional, geometrical, topological, quantum, chemical and so on [8]. Different variable selection and modeling methods are available, including multiple linear regression (MLR), genetic algorithm (GA), principal component or factor analysis (PCA/FA) and so on. The mathematical relationships between molecular descriptors and activity are used to find the parameters affecting the biological activity and/or to estimate the property of other molecules.
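For reference, the linear model mentioned above can be written out with explicit dimensions. The least-squares estimate shown is the standard one used in MLR and is given here only as an illustration; the symbols n (number of compounds) and p (number of descriptors) are introduced for this sketch, and PLS arrives at its regression vector by a different route.

```latex
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e},
\qquad
\mathbf{y},\mathbf{e} \in \mathbb{R}^{n},\;
\mathbf{X} \in \mathbb{R}^{n \times p},\;
\mathbf{b} \in \mathbb{R}^{p}

\hat{\mathbf{b}}_{\mathrm{MLR}}
  = \left(\mathbf{X}^{\mathsf{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y},
\qquad
\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}}_{\mathrm{MLR}}
```

The inverse exists only when the descriptor columns are not collinear, which is one reason collinearity among descriptors matters for MLR.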
It is now well established that protein tyrosine kinases (PTKs) provide a central switching mechanism in cellular signal transduction pathways by catalyzing the transfer of the γ-phosphate of either ATP or GTP to specific tyrosine residues in certain protein substrates [9, 10]. This regulatory control plays a crucial role in signal transduction pathways that regulate several cellular functions under both normal and deregulated conditions [11–14]. PTKs are the intracellular effectors for many growth hormone receptors. The discovery of activated PTKs as the products of dominant viral transforming genes (oncogenes) provided the early hypothesis for a connection between protein tyrosine phosphorylation and cell transformation, and enough evidence is now available to suggest that inappropriate or elevated expression of PTKs contributes to the transformed state of cells in many human malignancies [15–19]. P56lck is a lymphoid-specific protein tyrosine kinase that is principally expressed in T lymphocytes [20]. Association of p56lck with the cytoplasmic tail of various cell surface receptors, as well as associations of p56lck with intracellular targets of phosphorylation, suggests that this tyrosine kinase plays a central role in coordinating early signal transduction events [21]. Based on this knowledge, it is clear that substances which can modulate the activity of PTKs might be effective therapeutic agents. The key step in the mechanism of kinase activity of all PTKs is the recognition and binding of a nucleoside triphosphate (usually ATP) and an appropriate tyrosyl-containing substrate to the enzyme. Direct transfer of phosphate between the two molecules is the next step in PTK function [22]. A variety of compounds can inhibit the function of PTKs in a manner that is competitive with respect to nucleotide binding.
Among such competitive inhibitors are flavonoids, a group of low molecular weight plant natural products that include one of the largest classes of naturally occurring polyphenolic compounds [23, 24]. This group of plant natural products is largely responsible for the colors of many fruits and flowers, and over 4,000 flavonoid pigments have been characterized and classified according to their chemical structure. Chemically they are C6-C3-C6 compounds in which the two C6 groups are substituted benzene rings, and the C3 group is an aliphatic chain which contains a pyran ring. Flavonoids occur as O- or C-glycosides or in the “free” state as aglycones with hydroxyl or methoxyl groups present on the aglycone. The flavonoids may be divided into seven types: flavones, flavonols, flavanones, chalcones, xanthones, isoflavones, and biflavones. Flavonoids have gained wide interest as potential pharmacological agents since some of the best sources of flavonoids are foods: apples, blueberries, bilberries, onions, soy products and tea. Furthermore, numerous medicinal plants contain therapeutic amounts of flavonoids and are used to treat a wide variety of disorders [25].
Here, we consider the inhibitory activity of flavonoids against protein–tyrosine kinase p56lck. Several QSAR studies were reported on this class of molecules using different descriptors and different methods of modeling. Thakur et al. described a QSAR study on p56lck protein tyrosine kinase inhibitor flavonoids using only hydration energy and hydrophobic parameters [26]. Nikolovska-Coleska et al. treated a set of 104 derivatives with standard linear regression technique by the use of classical/quantum descriptors [27]. The same dataset was treated by Novic et al. with a counter propagation neural network by the use of classical/quantum descriptors [28]. Oblak et al. applied a wide variety of descriptors with CODESSA software on the above-mentioned dataset [29]. A quantum chemical/classical QSAR study on a set of 75 flavonoids and closely related compounds tested as p56lck protein tyrosine kinase and AR inhibitors has been carried out by Stefanic et al. and the obtained structure-activity relationships of both enzyme systems were compared [30]. A comprehensive ab initio study of 3D structures of some flavonoids is reported by Meyer [31]. Deeb et al. calculated nodal orientation with program NODANGLE [32].
In the present paper, the QSAR study for a series of 50 flavonoid analogues with the ability to inhibit protein tyrosine kinase has been considered [32]. In a comprehensive study of the PTK system we used a very large descriptor set (more than 600 topological, geometrical, constitutional, functional group, electrostatic, quantum and chemical descriptors) and different analyses: Hansch, Free-Wilson and substituent electronic descriptors (SED), in order to be able to compare the predictive ability of descriptors from different descriptor groups. Multiple linear regression (MLR) and genetic algorithm partial least squares (GA-PLS) methods were applied as methods for modeling.
2. Results and Discussion
The structural features and biological activity of the studied compounds are listed in Table 1. Calculated descriptors for each molecule are summarized in Table 2.
Table 1.
Chemical structure of flavonoid derivatives used in this study and their experimental and predicted activity for protein kinase inhibition.

Chemical structure of flavonoid derivatives.
| Compound | R | Experimental pIC50a | Predicted pIC50 | REP b |
|---|---|---|---|---|
| 1 | 5,7-OH,4′-NH2 | 5.13 | 4.7707 | −0.0753 |
| 2 | 3,5,7,3′,4′-OH | 4.88 | 4.9431 | 0.0128 |
| 3 | 3,7,3′,4′-OH | 4.86 | 4.7707 | −0.0187 |
| 4 | 5,7,4′-OH | 4.83 | 4.4356 | −0.0889 |
| 5 | 5,4′-OH | 4.80 | 4.2603 | −0.1267 |
| 6 | 6,3′-OH | 4.80 | 4.4242 | −0.0849 |
| 7 | 6-OH,5,7,4′-NH2 | 4.74 | 4.1061 | −0.1544 |
| 8 | 5,7-OH | 4.71 | 4.0895 | −0.1518 |
| 9 | 4′-OH,3′,5′-OCH3 | 4.57 | 4.2687 | −0.0706 |
| 10 | 5,7,3′,4′-OH | 4.46 | 4.4172 | −0.0097 |
| 11 | 7,3′-OH | 4.41 | 4.4358 | 0.0058 |
| 12 | 6-OH,5,7,3′-NH2 | 4.34 | 4.3681 | 0.0064 |
| 13 | 6-OCH3,8,3′-NH2 | 4.25 | 4.1649 | −0.0204 |
| 14 | 6-OH,3′,4′,5′-OCH3 | 4.22 | 4.3591 | 0.0319 |
| 15 | 3,5,7,4′-OH,3′,5′-OCH3 | 4.16 | 4.1649 | 0.0012 |
| 16 | 3,5,7,3′,5′-OH | 4.00 | 3.9947 | −0.0013 |
| 17 | 6,4′-NH2 | 3.99 | 3.9613 | −0.0072 |
| 18 | 6,8,4′-NH2 | 3.97 | 3.9764 | 0.0016 |
| 19 | 6-OH,8,4′-NH2 | 3.93 | 3.9446 | 0.0037 |
| 20 | 6,4′-OH | 3.93 | 3.9247 | −0.0013 |
| 21 | 7,8,4′-OH,3′,5′-OCH3 | 3.92 | 3.8990 | −0.0054 |
| 22 | 8,4′-NH2 | 3.91 | 3.8994 | −0.0027 |
| 23 | 6,4′-OH,3′,5′-OCH3 | 3.89 | 3.9133 | 0.0060 |
| 24 | 7-OH,4′-NH2 | 3.86 | 3.8815 | 0.0056 |
| 25 | 7-OH,6,4′-NH2 | 3.85 | 3.8296 | −0.0053 |
| 26 | 7,4′-OH | 3.78 | 3.8621 | 0.0213 |
| 27 | 7,8,3′-OH | 3.75 | 3.6903 | −0.0162 |
| 28 | 6,3′-NH2 | 3.70 | 4.0228 | 0.0803 |
| 29 | 4′-NH2 | 3.68 | 4.1850 | 0.1207 |
| 30 | 5-OH,6,4′-NH2 | 3.65 | 3.9325 | 0.0718 |
| 31 | 3,5,7-OH | 3.53 | 3.9794 | 0.1129 |
| 32 | 5,4′-OH,7-OCH3 | 3.55 | 3.7315 | 0.0487 |
| 33 | 5,3′-OH | 3.50 | 4.1209 | 0.1507 |
| 34 | 7,8-OH | 3.50 | 3.4873 | −0.0036 |
| 35 | 5-OH,8,4′-NH2 | 3.49 | 3.6705 | 0.0492 |
| 36 | 7-OH,8,4′-NH2 | 3.48 | 3.6694 | 0.0516 |
| 37 | 7-OH | 3.47 | 3.8567 | 0.1003 |
| 38 | 6-OCH3,8,4′-NH2 | 3.43 | 3.6709 | 0.0683 |
| 39 | 7,8-OH,3′,4′,5′-OCH3 | 3.40 | 4.0058 | 0.1512 |
| 40 | 3-COOCH3,4′-OH | 3.36 | 3.7081 | 0.0939 |
| 41 | 4′-OH | 3.30 | 3.7081 | 0.1101 |
| 42 | 7-OH,6,3′-NH2 | 3.30 | 3.3419 | 0.0125 |
| 43 | 7-OH,6,8,4′-NH2 | 3.12 | 3.3419 | 0.0664 |
| 44 | 3-COOCH3,4′-NH2 | 3.09 | 3.3419 | 0.0754 |
| 45 | 3-COOH,7-OCH3,4′-OH | 2.99 | 3.3262 | 0.1011 |
| 46 | 7,4′-OH,3′,5′-OCH3 | 2.90 | 3.3262 | 0.1281 |
| 47 | 7-OH,6,8,4′-NO2 | 2.81 | 3.0674 | 0.0839 |
| 48 | 3-COOH,4′-OH | 2.80 | 3.0674 | 0.0872 |
| 49 | 5-OCH3,8,4′-NH2 | 2.79 | 3.0674 | 0.0904 |
| 50 | 7-OH,8,4′-NO2 | 2.73 | 3.3262 | 0.1793 |
a pIC50 = −log (IC50).
b REP = relative error of prediction.
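The text does not spell out how REP is computed. The tabulated values appear consistent with REP = (predicted − experimental) / predicted; this is an inference from the numbers in Table 1, not a definition given by the authors. A minimal sketch under that assumption (the function name is illustrative):

```python
def relative_error_of_prediction(experimental, predicted):
    """REP as inferred from Table 1: (predicted - experimental) / predicted."""
    return (predicted - experimental) / predicted


# (compound, experimental pIC50, predicted pIC50) copied from Table 1
rows = [(1, 5.13, 4.7707), (2, 4.88, 4.9431), (5, 4.80, 4.2603)]

for compound, exp_value, pred_value in rows:
    rep = relative_error_of_prediction(exp_value, pred_value)
    print(compound, round(rep, 4))
```

Small differences in the last digit against the table can arise because the predicted values are themselves rounded.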
Table 2.
Brief description of some descriptors used in this study.
| Descriptor type | Molecular Description |
|---|---|
| Constitutional | Molecular weight, no. of atoms, no. of non-H atoms, no. of bonds, no. of heteroatoms, no. of multiple bonds (nBM), no. of aromatic bonds, no. of functional groups (hydroxyl, amine, aldehyde, carbonyl, nitro, nitroso, etc.), no. of rings, no. of circuits, no. of H-bond donors, no. of H-bond acceptors, no. of nitrogen atoms (nN), chemical composition, sum of Kier-Hall electrotopological states (Ss), mean atomic polarizability (Mp), number of rotatable bonds (RBN), mean atomic Sanderson electronegativity (Me), etc. |
| Topological | Molecular size index, molecular connectivity indices (X1A, X4A, X2v, X1Av, X2Av, X3Av, X4Av), information content index (IC), Kier shape indices, total walk count, path/walk-Randic shape indices (PW3, PW4), Zagreb indices, Schultz indices, Balaban J index (such as MSD), Wiener indices, topological charge indices, sum of topological distances between F..F (T(F..F)), ratio of multiple path count to path counts (PCR), mean information content vertex degree magnitude (IVDM), eigenvalue sum of Z weighted distance matrix (SEigZ), reciprocal hyper-detour index (Rww), eigenvalue coefficient sum from adjacency matrix (VEA1), radial centric information index, 2D Petitjean shape index (PJI2), etc. |
| Geometrical | 3D Petitjean shape index (PJI3), gravitational index, Balaban index, Wiener index, etc. |
| Quantum | Highest occupied molecular orbital energy (HOMO), lowest unoccupied molecular orbital energy (LUMO), most positive charge (MPC), least negative charge (LNC), sum of squares of charges (SSC), sum of squares of positive charges (SSPC), sum of squares of negative charges (SSNC), sum of positive charges (SUMPC), sum of negative charges (SUMNC), sum of absolute charges (SAC), total dipole moment (DMt), molecular dipole moment in the X-direction (DMX), molecular dipole moment in the Y-direction (DMY), molecular dipole moment in the Z-direction (DMZ), electronegativity (χ = −0.5 (HOMO + LUMO)), electrophilicity (ω = χ²/2η), hardness (η = 0.5 (LUMO − HOMO)), softness (S = 1/η). |
| Functional group | Number of total tertiary carbons (nCt), number of H-bond acceptor atoms (nHAcc), number of total hydroxyl groups (nOH), number of unsubstituted aromatic C (nCaH), number of ethers (aromatic) (nRORPh), etc. |
| Chemical | LogP (octanol-water partition coefficient), hydration energy (HE), polarizability (Pol), molar refractivity (MR), molecular volume (V), molecular surface area (SA). |
| Substituent electronic descriptors | RMSQ (root mean square error of charges), SPQ (sum of positive charges), SNQ (sum of negative charges), RMSDM (root mean square of dipole moments at any Cartesian coordinate direction), TDM (total dipole moment), FRMS (root mean square force that any atom in the constituent molecule sees right before the optimization), FMAX (maximum force on molecule), HOMO (highest occupied molecular orbital), LUMO (lowest unoccupied molecular orbital), HD (hardness), SOF (softness), EPH (electrophilicity), EN (electronegativity). |
2.1. MLR analysis
In the first step, separate stepwise selection-based MLR analyses were performed using different types of descriptors, and then an MLR equation was obtained using the pool of all calculated descriptors. The results are summarized in Table 3. The correlation coefficient (r) matrix for the descriptors used in the different MLR equations is shown in Table 4. Collinear descriptors degrade the performance of MLR equations, and models built on them have lower predictive ability.
Table 3.
The results of MLR analysis with different types of descriptors.
| No. | Descriptor source | MLR Equations | N | R2 | SE | RMSCV | Q2 | F |
|---|---|---|---|---|---|---|---|---|
| E1 | Chemical | pIC50 = 4.893 (± 0.735) − 0.056 (± 0.017) HE −0.007 (± 0.003) Mass | 50 | 0.40 | 0.55 | 0.58 | 0.32 | 13.82 |
| E2 | Quantum | pIC50 = 6.362 (± 0.565) − 6.805 (± 1.505) MPC | 50 | 0.43 | 0.53 | 0.54 | 0.38 | 17.44 |
| E3 | Constitutional | pIC50 = 3.139 (± 1.250) − 0.438 (± 0.100) nBM − 0.506 (± 0.205) AMW − 0.584 (± 0.266) nAB | 50 | 0.49 | 0.49 | 0.51 | 0.42 | 19.65 |
| E4 | Topological | pIC50 = 17.242 (± 0.605) − 3.374 (± 0.545) IVDM − 53.95 (± 12.355) X1Av + 2.349 (± 0.696) ICR +24.874 (±9.569) PW4 + 73.575 (±33.719) X4A | 50 | 0.72 | 0.38 | 0.48 | 0.58 | 30.13 |
| E5 | Geometrical | pIC50 = −15.093 (± 3.339) + 19.450 (± 3.406) SPH − 0.010 (± 0.002) G(N...O) | 50 | 0.60 | 0.43 | 0.47 | 0.49 | 17.23 |
| E6 | Functional group | pIC50 = 3.672 (± 0.123) − 0.414 (± 0.130) nNO2 −1.098 (± 0.369) nOHt + 0.160 (± 0.058) nOH | 50 | 0.53 | 0.45 | 0.50 | 0.45 | 12.67 |
| E7 | Hansch | pIC50 = 4.219 (± 0.289) − 0.615 (± 0.202) π5 + 1.462 (± 0.555) ℑR′3 − 1.379 (± 0.490) ℑR8 −0.249 (± 0.111) L3 | 50 | 0.53 | 0.45 | 0.50 | 0.45 | 12.67 |
| E8 | SED | pIC50 = −0.708 (± 1.228) − 9.570 (± 2.500) HOMOA3 + 1.092 (±0.308) SNQ8 | 50 | 0.82 | 0.32 | 0.30 | 0.61 | 51.43 |
| E9 | Molecular descriptor | pIC50 = −19.763 (± 4.304) − 4.785 (± 1.275) MPC + 25.113 (± 4.142) SPH + 0.849 (± 0.264) SNQ8 − 0.357 (± 0.136) L3 | 50 | 0.83 | 0.31 | 0.28 | 0.62 | 52.43 |
Table 4.
Correlation coefficient (r) matrix for the descriptors of flavone derivatives used in the MLR equations.
| | HE | Mass | MPC | nBM | AMW | nAB | ASP | G(N...O) | X1AV | ICR | PW4 | X4A | IVDM | nNO2 | nOHt | nOH | ℑR′3 | L3 | ℑR8 | π5 | pIC50 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HE | 1 | −0.234 | 0.192 | 0.124 | −0.327 | 0.236 | −0.006 | 0.00 | 0.651 | 0.075 | −0.012 | 0.316 | 0.065 | 0.069 | 0.047 | −0.745 | −0.394 | 0.067 | −0.005 | 0.485 | −0.347 |
| Mass | | 1 | 0.531 | 0.580 | 0.512 | 0.136 | −0.269 | 0.328 | −0.655 | 0.416 | 0.541 | −0.631 | 0.816 | 0.554 | 0.099 | 0.211 | 0.326 | 0.196 | 0.487 | 0.040 | −0.268 |
| MPC | | | 1 | 0.953 | 0.715 | 0.366 | −0.233 | 0.623 | −0.539 | 0.304 | 0.050 | −0.329 | 0.904 | 0.876 | 0.259 | −0.227 | −0.286 | 0.289 | 0.595 | 0.156 | −0.547 |
| nBM | | | | 1 | 0.778 | 0.165 | −0.094 | 0.725 | −0.624 | 0.390 | 0.016 | −0.325 | 0.937 | 0.972 | 0.114 | −0.196 | −0.211 | 0.125 | 0.687 | 0.193 | −0.498 |
| AMW | | | | | 1 | 0.050 | −0.200 | 0.356 | 0.897 | 0.037 | 0.116 | −0.206 | 0.718 | 0.775 | 0.116 | 0.434 | 0.136 | 0.125 | 0.620 | 0.065 | −0.191 |
| nAB | | | | | | 1 | −0.684 | −0.127 | 0.069 | −0.192 | 0.257 | −0.397 | 0.235 | −0.073 | 0.692 | −0.086 | −0.198 | 0.930 | −0.108 | 0.185 | −0.364 |
| ASP | | | | | | | 1 | 0.294 | 0.155 | 0.538 | 0.532 | 0.388 | −0.221 | 0.069 | 0.369 | −0.273 | −0.201 | −0.768 | −0.039 | −0.098 | 0.269 |
| G(N...O) | | | | | | | | 1 | −0.379 | 0.578 | 0.299 | 0.348 | 0.618 | 0.763 | −0.138 | −0.478 | −0.437 | −0.182 | 0.508 | 0.034 | −0.329 |
| X1AV | | | | | | | | | 1 | −0.130 | −0.171 | 0.413 | −0.651 | −0.647 | −0.052 | −0.572 | −0.270 | −0.056 | −0.542 | 0.229 | 0.058 |
| ICR | | | | | | | | | | 1 | −0.212 | −0.277 | 0.442 | 0.441 | −0.104 | −0.410 | −0.161 | −0.278 | 0.153 | 0.168 | −0.080 |
| PW4 | | | | | | | | | | | 1 | −0.157 | 0.261 | −0.045 | 0.158 | 0.336 | 0.413 | 0.356 | 0.029 | −0.249 | 0.002 |
| X4A | | | | | | | | | | | | 1 | −0.489 | −0.233 | −0.252 | −0.046 | −0.025 | −0.466 | −0.261 | −0.157 | 0.347 |
| IVDM | | | | | | | | | | | | | 1 | 0.891 | 0.155 | −0.100 | −0.030 | 0.218 | 0.663 | 0.192 | −0.494 |
| nNO2 | | | | | | | | | | | | | | 1 | −0.050 | −0.177 | −0.166 | −0.097 | 0.720 | 0.151 | −0.416 |
| nOHt | | | | | | | | | | | | | | | 1 | 0.061 | −0.137 | 0.513 | −0.075 | 0.128 | −0.306 |
| nOH | | | | | | | | | | | | | | | | 1 | 0.621 | 0.104 | −0.004 | −0.375 | 0.370 |
| ℑR′3 | | | | | | | | | | | | | | | | | 1 | −0.070 | 0.008 | −0.014 | 0.315 |
| L3 | | | | | | | | | | | | | | | | | | 1 | −0.143 | 0.085 | −0.259 |
| ℑR8 | | | | | | | | | | | | | | | | | | | 1 | 0.224 | −0.367 |
| π5 | | | | | | | | | | | | | | | | | | | | 1 | −0.451 |
| pIC50 | | | | | | | | | | | | | | | | | | | | | 1 |
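A correlation matrix such as Table 4 is typically screened for descriptor pairs that are too strongly correlated to be used together in one MLR equation (for example, Table 4 lists 0.953 for MPC and nBM, and 0.972 for nBM and nNO2). The following is a minimal sketch of such a screen. The 0.9 cut-off, the function names and the toy descriptor values are assumptions made for illustration and do not come from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(descriptors, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    names = list(descriptors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(descriptors[a], descriptors[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical descriptor columns (one value per compound), for illustration only
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 3.9, 6.2, 8.0, 9.8],   # nearly proportional to d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(collinear_pairs(toy))
```

Only one descriptor from each flagged pair would normally be retained in a given MLR equation.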
Table 3 lists the QSAR equations derived for the studied compounds using different sets of molecular descriptors. The first equation (E1) was obtained from the chemical descriptors. It indicates a negative effect of hydration energy (HE) and molecular weight (Mass) on protein tyrosine kinase inhibitory activity. Equation E2 shows that, among the quantum descriptors, the most positive charge (MPC) has a negative effect on protein tyrosine kinase inhibitory activity, which points to Coulombic interactions between the ligands and the receptor. The negative sign of the MPC coefficient indicates that ligands with the lowest MPC could interact with the receptor more efficiently. This suggests that there is probably a negative region in the receptor which produces Coulombic interactions with the ligand. Equation E3 demonstrates the effect of constitutional descriptors: average molecular weight (AMW), number of multiple bonds (nBM) and number of aromatic bonds (nAB) all have negative effects on protein tyrosine kinase inhibitory activity. Molecules with lower AMW show better inhibitory activity, and decreasing the number of multiple bonds enhances activity. Equation E4, obtained from the pool of topological descriptors, shows positive effects of the mean information content on the distance equality (ICR), the path/walk 4-Randic shape index (PW4) and the average connectivity index chi-4 (X4A), and negative effects of the mean information content vertex degree magnitude (IVDM) and the average valence connectivity index chi-1 (X1Av) on protein tyrosine kinase inhibitory activity. This equation describes the structure-activity relationship better than those obtained from the chemical, quantum and constitutional descriptors.
The effect of geometrical parameters on the protein tyrosine kinase inhibitory activity of the studied compounds is described by equation E5 of Table 3. It shows a positive effect of spherosity (SPH) and a negative effect of the sum of geometrical distances between N...O, i.e., G(N...O), on inhibitory activity. The effect of functional groups is described by equation E6 of Table 3. This three-parameter equation does not have a high statistical quality, which suggests that the inhibitory activity of the studied molecules does not depend strongly on the type of functional group itself but rather on the structural changes induced by variations in functional groups. The negative signs of nNO2 and nOHt indicate that molecules with fewer nitro groups (aliphatic) and tertiary alcohols (aliphatic) bind to the protein kinase more strongly. On the other hand, the number of hydroxyl groups (nOH) has a direct effect on the inhibitory activity of the compounds. The Hansch equation (E7) shows the importance of steric, electronic and lipophilic factors for protein tyrosine kinase inhibitory activity. These factors are described by L3 (length parameter of the C3 substituent), ℑR′3 and ℑR8 (Swain and Lupton field parameters of the C-R′3 and C-R8 substituents) and π5 (lipophilic parameter of the C5 substituent), respectively. The negative coefficient of π5 indicates that lipophilic substituents at R5 are not favorable for binding affinity. The equation also shows a positive effect of ℑR′3 and a negative effect of ℑR8 on the inhibitory activity of the compounds. In addition, the negative effect of L3 indicates that bulky groups at C3 decrease activity because they hinder strong interaction between the ligands and the enzyme. The SED equation (E8) shows the importance of SED factors for protein tyrosine kinase inhibitory activity.
One of the parameters is the molecular orbital energy HOMOA3 (highest occupied molecular orbital parameter of the C3 substituent) and the other is SNQ8 (sum of negative charges parameter of the C8 substituent). In equation E8 of Table 3, the coefficient of HOMOA3 is negative and that of SNQ8 is positive.
The last equation (E9) was obtained from the pool of all types of calculated descriptors. Stepwise selection and elimination of variables produced a four-parameter QSAR equation. This equation shows that geometrical (SPH), quantum (MPC), Hansch (L3) and SED (SNQ8) parameters are the major factors that affect the protein tyrosine kinase inhibitory activity of the compounds. Among these descriptors, MPC and L3 have negative effects and the others have positive effects on the inhibitory activity.
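To make the signs discussed above concrete, equation E9 of Table 3 can be evaluated directly. The sketch below copies the coefficients from Table 3; the function name and the descriptor values passed to it are hypothetical and serve only to show how each term moves the predicted pIC50.

```python
# Coefficients of equation E9 as listed in Table 3
E9_INTERCEPT = -19.763
E9_COEFFICIENTS = {"MPC": -4.785, "SPH": 25.113, "SNQ8": 0.849, "L3": -0.357}


def predict_pic50_e9(descriptors):
    """Evaluate E9 for one compound given a dict of its four descriptor values."""
    return E9_INTERCEPT + sum(
        coefficient * descriptors[name]
        for name, coefficient in E9_COEFFICIENTS.items()
    )


# Hypothetical descriptor values, not taken from the data set
base = {"MPC": 0.10, "SPH": 0.95, "SNQ8": -0.50, "L3": 2.06}
higher_mpc = dict(base, MPC=0.20)

# A larger most positive charge lowers the predicted activity (negative coefficient)
print(predict_pic50_e9(base) > predict_pic50_e9(higher_mpc))
```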
2.2. Free-Wilson analysis
The simple Free-Wilson analysis (FWA) was considered to indicate which substituents on ring B and chromone moiety contribute to protein tyrosine kinase inhibitory activity and which ones detract from activity [33]. As indicated in Table 1, the molecules used in this study have a phenyl ring (ring B) and chromone moiety with different types of substituents in different positions of the ring. Some important substituents such as methoxyl, hydroxyl and amine are used in calculations. Therefore, the descriptors data matrix built for the FWA has 44 rows (i.e., number of selected molecules for FWA) and 24 columns (i.e., three substituents at eight substitution positions on the flavonoid structure). The elements of the descriptor data matrix are 1 or 0, to indicate the presence or absence of a given substituent in a specified position in a molecule, respectively. The following two-parametric equation was found between the activity data (y) and the Free-Wilson type descriptors data matrix:
(1)
Equation (1) indicates that the protein tyrosine kinase inhibitory activity of the studied compounds is directly affected by the presence of an electron-donating hydroxyl group in the meta position (R′3) of the phenyl ring, and most probably this part of the flavonoid molecule interacts with the catalytic domain of the enzyme. The same result was obtained by other researchers [27]. According to this equation, a methoxyl group on C-R5 detracts from the inhibitory activity.
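The indicator matrix used for the Free-Wilson analysis can be sketched as follows. The text gives only the counts (three substituents at eight positions, 24 columns); the specific position labels below are read off the substitution patterns in Table 1 and are therefore an assumption, as are the function and variable names.

```python
# Eight substitution positions inferred from Table 1 (assumption) and the
# three substituent types named in the text.
POSITIONS = ["3", "5", "6", "7", "8", "3'", "4'", "5'"]
SUBSTITUENTS = ["OH", "OCH3", "NH2"]
COLUMNS = [(position, group) for position in POSITIONS for group in SUBSTITUENTS]


def free_wilson_row(substitution):
    """Build one 0/1 row; `substitution` maps position -> substituent."""
    return [1 if substitution.get(position) == group else 0
            for position, group in COLUMNS]


# Two substitution patterns from Table 1
compound_4 = {"5": "OH", "7": "OH", "4'": "OH"}          # 5,7,4'-OH
compound_9 = {"4'": "OH", "3'": "OCH3", "5'": "OCH3"}    # 4'-OH,3',5'-OCH3

matrix = [free_wilson_row(c) for c in (compound_4, compound_9)]
print(len(COLUMNS), [sum(row) for row in matrix])
```

Regressing the activity on these columns then assigns each (position, substituent) pair an additive contribution, which is how a term such as a hydroxyl at R′3 or a methoxyl at R5 acquires its sign.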
2.3. GA-PLS analysis
In PLS analysis, the descriptor data matrix is decomposed into orthogonal matrices with an inner relationship between the dependent and independent variables. Therefore, unlike in MLR analysis, the multicollinearity problem among the descriptors is avoided in PLS analysis. Because a minimal number of latent variables is used for modeling, PLS copes with noisy data better than MLR. A genetic algorithm was used to find the most suitable set of descriptors for PLS modeling. To do so, many GA-PLS runs were conducted using different initial populations. The data set (n = 50) was divided into two groups: a calibration set (n = 40) and a prediction set (n = 10). For the 40 calibration samples, the leave-one-out cross-validation procedure was used to find the optimum number of latent variables for each PLS model. The GA-PLS model with the best fitness contained 14 indices, four of which were also obtained by MLR. The PLS estimates of the coefficients for these descriptors are given in Figure 1. As can be seen, a combination of quantum, topological, geometrical and Hansch descriptors was selected by GA-PLS to account for the protein tyrosine kinase inhibitory activity of the flavonoid derivatives. The majority of these descriptors are topological indices. The resulting GA-PLS model possessed a high statistical quality (R2 = 0.74 and Q2 = 0.61). The predictive ability of the model was measured by applying it to the 10 molecules of the external test set. The squared correlation coefficient for prediction was 0.82 and the standard error of prediction was 0.30. The pIC50 values predicted by the GA-PLS model (obtained from cross-validation or from the external prediction set), along with the corresponding relative errors of prediction (REP), are shown in Table 1. The small relative errors (between ± 0.40) support the accuracy of the proposed GA-PLS model for modeling the protein tyrosine kinase inhibitory activity of the studied flavonoid derivatives.
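The descriptor-selection step can be pictured with a small genetic algorithm over bit masks, where each bit says whether a descriptor enters the model. The sketch below is not the authors' implementation: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), all parameter values and the stand-in fitness function are assumptions. In a real GA-PLS run the fitness of a mask would be a cross-validated statistic of the PLS model built on the selected descriptors.

```python
import random


def ga_select(n_descriptors, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Evolve 0/1 masks of length n_descriptors; return best mask and fitness history."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_descriptors)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        next_population = [best[:]]                     # elitism: keep the best mask
        while len(next_population) < pop_size:
            # Tournament selection of two parents
            parent_1 = max(rng.sample(population, 3), key=fitness)
            parent_2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randint(1, n_descriptors - 1)     # one-point crossover
            child = parent_1[:cut] + parent_2[cut:]
            # Bit-flip mutation
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Stand-in fitness (hypothetical): reward three "informative" descriptors and
# penalise model size. A GA-PLS run would use a cross-validated PLS statistic.
INFORMATIVE = {1, 4, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, fitness_history = ga_select(10, toy_fitness)
print(best_mask, fitness_history[-1])
```

Because the best mask is always carried over, the best fitness cannot decrease from one generation to the next; running the search from several seeds corresponds to the different initial populations mentioned above.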
Figure 1.
PLS regression coefficients for the variables used in GA-PLS model.
Comparison between the results obtained by the GA-PLS and MLR methods indicates a higher accuracy of the GA-PLS method in describing the inhibitory activity of flavonoid derivatives toward the protein tyrosine kinase enzyme. The difference in accuracy between the two regression methods used in this study is visualized in Figure 2 by plotting the predicted activity (by cross-validation) against the experimental values. For both linear models, the data scatter around a straight line with a slope close to one. The plot of the GA-PLS results shows the lowest scattering, while the plot obtained by MLR analysis (from E9) is less accurate.
Figure 2.
Plots of the cross-validated predicted activity against the experimental activity for the QSAR models obtained by the MLR and GA-PLS methods.
To measure the significance of the 14 selected PLS descriptors for the protein tyrosine kinase inhibitory activity, the VIP was calculated for each descriptor [34]. The VIP analysis of the PLS equation is shown in Figure 3. It shows that HNar and TI2, which are topological indices, and SPH, which is a geometrical parameter, are the most important indices in the QSAR equation derived by PLS analysis. In addition, the quantum parameter HOMO and the Hansch parameter ℑR′3 were found to be moderately influential.
Figure 3.
Plot of variable importance in the projection (VIP) for the descriptors used in the GA-PLS model.
3. Methodology
3.1. Software
The two-dimensional structures of the molecules were drawn using Hyperchem 7.0 software. The final geometries were obtained with the semi-empirical AM1 method in the Hyperchem program. The molecular structures were optimized using the Polak-Ribiere algorithm until the root mean square gradient was 0.01 kcal mol−1. The resulting geometry was transferred into the Dragon program package, which was developed by the Milano Chemometrics and QSAR Group [35]. The z-matrix of the structures was provided by the software and transferred to the Gaussian 98 program. Complete geometry optimization was performed taking the most extended conformation as the starting geometry. Semi-empirical molecular orbital calculations (AM1) of the structures were performed using the Gaussian 98 program [36].
3.2. Activity data & descriptor generation
The biological data used in this study are the protein tyrosine kinase inhibitory activities, −log (IC50), of a set of 50 flavonoid analogues [32]. The structural features and biological activity of these compounds are listed in Table 1; the activities were used as dependent variables in the subsequent QSAR analysis. A large number of molecular descriptors were calculated using Hyperchem, the Dragon package and Gaussian 98. Some chemical parameters, including molecular volume (V), molecular surface area (SA), hydrophobicity (Log P), hydration energy (HE) and molecular polarizability (MP), were calculated using the Hyperchem software. The Dragon software calculated different functional group, topological, geometrical and constitutional descriptors for each molecule. Gaussian 98 was employed for the calculation of different quantum chemical descriptors, including dipole moment (DM), local charges, and HOMO and LUMO energies. Hardness (η), softness (S), electronegativity (χ) and electrophilicity (ω) were calculated according to the method proposed by Thanikaivelan et al. [37]. Classical substituent constants, including the hydrophobic constant (π), the Hammett electronic constants (σ), the Taft field effect (FI), the resonance (R) constant and steric constants (molar refractivity MR and STERIMOL), were also used as descriptors in this study [38]. The calculated descriptors for each molecule are summarized in Table 2.
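The four reactivity indices follow directly from the frontier orbital energies. The sketch below assumes the conventional Koopmans-type definitions, χ = −(E_HOMO + E_LUMO)/2 and η = (E_LUMO − E_HOMO)/2, together with ω = χ²/2η and S = 1/η as listed in Table 2 (some authors define softness as 1/2η instead). The function name and the orbital energies in the example are hypothetical.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Electronegativity, hardness, softness and electrophilicity from
    HOMO and LUMO energies (same energy unit for both inputs)."""
    chi = -0.5 * (e_homo + e_lumo)      # electronegativity
    eta = 0.5 * (e_lumo - e_homo)       # hardness (half the HOMO-LUMO gap)
    softness = 1.0 / eta                # softness, S = 1/eta as in Table 2
    omega = chi ** 2 / (2.0 * eta)      # electrophilicity index
    return {"chi": chi, "eta": eta, "S": softness, "omega": omega}


# Hypothetical orbital energies in eV, for illustration only
print(reactivity_descriptors(e_homo=-9.0, e_lumo=-1.0))
```

With these definitions a bound HOMO-LUMO pair always gives a positive hardness, since the LUMO lies above the HOMO.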
3.3. Data screening & model building
The selected descriptors from each class and the experimental data were analyzed by stepwise regression in SPSS (version 12.0). The calculated descriptors were collected in a data matrix whose numbers of rows and columns were the numbers of molecules and descriptors, respectively. Multiple linear regression (MLR) and partial least squares (PLS) were used to derive the QSAR equations, and feature selection was performed with a genetic algorithm (GA). The resulting models were validated by the leave-one-out cross-validation procedure (using MATLAB software) to check their predictability and robustness. However, this procedure did not produce good results, and therefore we used the genetic algorithm (GA-PLS) to select the best variables.
Application of PLS allows the construction of larger QSAR equations while still avoiding over-fitting and eliminating most variables. PLS is normally used in combination with cross-validation to obtain the optimum number of components [39, 40]. The PLS regression method used in this study was the NIPALS-based algorithm available in the chemometrics toolbox of MATLAB software (version 7.1, MathWorks Inc.). The leave-one-out cross-validation procedure was used to obtain the optimum number of factors based on the Haaland and Thomas F-ratio criterion [41].
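The combination of NIPALS-based PLS and leave-one-out cross-validation can be sketched in plain Python for a single response (PLS1). This is an illustrative re-implementation, not the MATLAB toolbox routine used in the study. It also selects the number of components by the minimum PRESS, which is a simpler stand-in for the Haaland and Thomas F-ratio criterion. The data are synthetic and all names are assumptions.

```python
from math import sqrt


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS on mean-centred data; returns the model as a dict."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]   # X residuals
    f = [value - y_mean for value in y]                          # y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:                  # nothing left to explain
            break
        w = [v / norm for v in w]                                # unit weight vector
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_a = sum(f[i] * t[i] for i in range(n)) / tt            # y-loading
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_a * t[i] for i in range(n)]
        W.append(w)
        P.append(load)
        q.append(q_a)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "q": q}


def pls1_predict(model, x):
    """Predict y for one sample by repeating the deflation used in fitting."""
    e = [x[j] - model["x_mean"][j] for j in range(len(x))]
    y_hat = model["y_mean"]
    for w, load, q_a in zip(model["W"], model["P"], model["q"]):
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q_a * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat


def loo_press(X, y, n_components):
    """Leave-one-out predictive residual sum of squares (PRESS)."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    return press


# Synthetic data following y = 3 + 2*x1 - x2 exactly (illustration only)
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y = [3.0 + 2.0 * a - b for a, b in X]

best_a = min(range(1, 3), key=lambda a: loo_press(X, y, a))
print(best_a, pls1_predict(pls1_fit(X, y, best_a), [7.0, 7.0]))
```

From PRESS, a cross-validated Q2 is commonly obtained as 1 − PRESS/SS, where SS is the sum of squared deviations of y from its mean.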
3.4. Variable importance in the projection (VIP)
In order to investigate the relative importance of the variables that appeared in the final model obtained by the GA-PLS method, the variable importance in the projection (VIP) was employed [34]. VIP values reflect the importance of terms in the PLS model. According to Eriksson et al., X-variables (predictor variables) can be classified according to their relevance in explaining y (predicted variable): VIP > 1.0 indicates a highly influential variable, 0.8 < VIP < 1.0 a moderately influential one, and VIP < 0.8 a less influential one [8].
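For a single-response PLS model, VIP is commonly computed from the unit-length weight vectors w_a, the score vectors t_a and the y-loadings q_a as VIP_j = sqrt(p · Σ_a SS_a w_ja² / Σ_a SS_a), with SS_a = q_a² t_aᵀt_a and p the number of descriptors. The sketch below implements this common form together with the thresholds quoted above. The toy numbers are hypothetical, and assigning the exact boundary values 0.8 and 1.0 to the moderate class is an assumption, since the text leaves them open.

```python
from math import sqrt


def vip_scores(W, T, q):
    """VIP for each descriptor.

    W: list of A unit-length weight vectors (p entries each)
    T: list of A score vectors (one entry per compound)
    q: list of A y-loadings
    """
    p = len(W[0])
    # y-variance explained by each latent variable
    ss = [q_a ** 2 * sum(t * t for t in t_a) for q_a, t_a in zip(q, T)]
    total = sum(ss)
    return [sqrt(p * sum(s * w[j] ** 2 for s, w in zip(ss, W)) / total)
            for j in range(p)]


def classify(vip):
    """Influence classes; boundary values are assigned to 'moderately' (assumption)."""
    if vip > 1.0:
        return "highly influential"
    if vip >= 0.8:
        return "moderately influential"
    return "less influential"


# Hypothetical two-component model with three descriptors
W = [[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]
T = [[1.0, -1.0, 2.0, -2.0], [0.5, 0.5, -0.5, -0.5]]
q = [1.0, 0.5]

for j, value in enumerate(vip_scores(W, T, q)):
    print(j, round(value, 3), classify(value))
```

Because the weight vectors have unit length, the squared VIP values average to one, which is why 1.0 is the natural reference point for the thresholds.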
3.5. Substituent electronic descriptors (SED)
Electronic descriptors obtained from quantum chemical calculations have become very popular, and the choice of quantum chemical method (i.e., semi-empirical or ab initio) involves a trade-off between computational complexity and accuracy [42]. To simplify the quantum chemical calculations, Hemmateenejad et al. recently hypothesized that the calculations could be performed on the substituents instead of the whole molecular structures, and that the resulting electronic features could be used as electronic descriptors in QSAR/QSPR studies [43,44]. Hemmateenejad et al. proposed substituent electronic descriptors (SED) as an alternative to both substituent constants and molecular descriptors [43]. SED analysis for each substituent was used in our study and the calculated descriptors are listed in Table 2. They can be classified into three electronic categories: local charges, dipoles and orbital energies. Since most of the substituents are open-shell species (radicals in a doublet state), the alpha (spin up) and beta (spin down) electronic populations have different energies in the Gaussian 98 output. This provides the additional descriptors HOMOA, HOMOB, LUMOA, LUMOB, HDA, HDB, SOFA, SOFB, ENA, ENB, EPHA and EPHB, which stem from the alpha and beta electronic population energies; the suffixes A and B stand for the alpha and beta populations, respectively. Therefore, a total of 26 electronic descriptors were calculated for each substituent.
4. Conclusions
Quantitative relationships between molecular structure and protein tyrosine kinase inhibitory activity of flavonoid derivatives were established by two chemometric methods: MLR and GA-PLS. The different QSAR models revealed that SED parameters have a significant impact on the protein tyrosine kinase inhibitory activity of the compounds. In this series, a significant role of topological and geometrical parameters in the inhibitory activity was also observed. Using the pool of all types of calculated descriptors, a new QSAR model was derived for these compounds. This model indicated that quantum, geometrical, SED and Hansch parameters all affect protein tyrosine kinase inhibitory activity. A comparison between the two statistical methods showed that GA-PLS gave superior results. The resulting GA-PLS model had high statistical quality (R2 = 0.74 and Q2 = 0.61) for predicting the activity of the inhibitors. The models proposed in the present work are more useful in describing the QSAR of flavonoid derivatives as p56lck protein tyrosine kinase inhibitors than those proposed previously.
Acknowledgments
This work was supported by Isfahan Pharmaceutical Sciences Research Center. The authors wish to thank Dr. Bahram Hemmateenejad for his advice on various aspects of this research.
References
1. Hansch C, Hoekman D, Gao H. Comparative QSAR: Toward a Deeper Understanding of Chemicobiological Interactions. Chem. Rev. 1996;96:1045–1076. doi: 10.1021/cr9400976.
2. Hansch C, Maloney PP, Fujita T, Muir RM. Correlation of Biological Activity of Phenoxyacetic Acids with Hammett Substituent Constants and Partition Coefficients. Nature. 1962;194:178–180.
3. Hemmateenejad B. Correlation Ranking Procedure for Factor Selection in PC-ANN Modeling and Application to ADMETox Evaluation. Chemom. Intell. Lab. Syst. 2005;75:231–245.
4. Fujita T, Iwasa J, Hansch C. A New Substituent Constant, π, Derived from Partition Coefficients. J. Am. Chem. Soc. 1964;86:5175–5180.
5. Hansch C. Quantitative Approach to Biochemical Structure-Activity Relationships. Acc. Chem. Res. 1969;2:232–239.
6. Hansch C, Clayton JM. Lipophilic Character and Biological Activity of Drugs II: The Parabolic Case. J. Pharm. Sci. 1973;62:1–21. doi: 10.1002/jps.2600620102.
7. Agatonovic-Kustrin S, Tucker IG, Zecevic M, Zivanovic LJ. Prediction of Drug Transfer into Human Milk from Theoretically Derived Descriptors. Anal. Chim. Acta. 2000;418:181–195.
8. Mohajeri A, Hemmateenejad B, Mehdipour A, Miri R. Modeling Calcium Channel Antagonistic Activity of Dihydropyridine Derivatives Using QTMS Indices Analyzed by GA-PLS and PC-GA-PLS. J. Mol. Graph. Model. 2008;26:1057–1065. doi: 10.1016/j.jmgm.2007.09.002.
9. Ullrich A, Schlessinger J. Signal Transduction by Receptors with Tyrosine Kinase Activity. Cell. 1990;61:203–212. doi: 10.1016/0092-8674(90)90801-k.
10. Bishop JM. The Molecular Genetics of Cancer. Science. 1987;235:305–311. doi: 10.1126/science.3541204.
11. Blume-Jensen P, Hunter T. Oncogenic Kinase Signalling. Nature. 2001;411:355–365. doi: 10.1038/35077225.
12. Hunter T. Signaling – 2000 and Beyond. Cell. 2000;100:113–127. doi: 10.1016/s0092-8674(00)81688-8.
13. Schlessinger J. Cell Signaling by Receptor Tyrosine Kinases. Cell. 2000;103:211–225. doi: 10.1016/s0092-8674(00)00114-8.
14. Hanahan D, Weinberg RA. The Hallmarks of Cancer. Cell. 2000;100:57–70. doi: 10.1016/s0092-8674(00)81683-9.
15. Cantley LC, Auger KR, Carpenter C, Duckworth B, Graziani A, Kapeller R, Soltoff S. Oncogenes and Signal Transduction. Cell. 1991;64:281–302. doi: 10.1016/0092-8674(91)90639-g.
16. Groundwater PW, Solomons KRH, Drewe JA, Munawar MA. In: Progress in Medicinal Chemistry. Ellis GP, Luscombe DK, editors. Elsevier Science B.V; Amsterdam: 1996. pp. 233–329.
17. Bolen JB, Veillette A, Schwartz AM, DeSeau V, Rosen N. Activation of pp60c-src Protein Kinase Activity in Human Colon Carcinoma. Proc. Natl. Acad. Sci. USA. 1987;84:2251–2255. doi: 10.1073/pnas.84.8.2251.
18. Slamon DJ, Clark GM, Wong SG, Levin WJ, Ullrich A, McGuire WL. Human Breast Cancer: Correlation of Relapse and Survival with Amplification of the HER-2/neu Oncogene. Science. 1987;235:177–182. doi: 10.1126/science.3798106.
19. Yamamoto T, Kamata N, Kawano H, Shimizu S, Kuroki T, Toyoshima K, Rikimaru K, Nomura N, Ishizaki R, Pastan I, Gamou S, Shimizu N. High Incidence of Amplification of the Epidermal Growth Factor Receptor Gene in Human Squamous Carcinoma Cell Lines. Cancer Res. 1986;46:414–416.
20. Weil R, Veillette A. Signal Transduction by the Lymphocyte-Specific Tyrosine Protein Kinase p56lck. Curr. Top. Microbiol. Immunol. 1996;205:63–87. doi: 10.1007/978-3-642-79798-9_4.
21. Anderson SJ, Levin SD, Perlmutter RM. Involvement of the Protein Tyrosine Kinase p56lck in T Cell Signaling and Thymocyte Development. Adv. Immunol. 1994;56:151–178. doi: 10.1016/s0065-2776(08)60451-4.
22. Bishop JM. Cellular Oncogenes and Retroviruses. Annu. Rev. Biochem. 1983;52:301–354. doi: 10.1146/annurev.bi.52.070183.001505.
23. Cushman M, Nagarathnam D, Burg DL, Geahlen RL. Synthesis and Protein-Tyrosine Kinase Inhibitory Activities of Flavonoid Analogues. J. Med. Chem. 1991;34:798–806. doi: 10.1021/jm00106a047.
24. Cushman M, Zhu H, Geahlen RL, Kraker AJ. Synthesis and Biochemical Evaluation of a Series of Aminoflavones as Potential Inhibitors of Protein-Tyrosine Kinases p56lck, EGFr, and p60v-src. J. Med. Chem. 1994;37:3353–3362. doi: 10.1021/jm00046a020.
25. Bylka W, Matlawska I, Pilewski NA. Natural Flavonoids as Antimicrobial Agents. JANA. 2004;7:24–31.
26. Thakur A, Vishwakarma S, Thakur M. QSAR Study of Flavonoid Derivatives as p56lck Tyrosine Kinase Inhibitors. Bioorg. Med. Chem. 2004;12:1209–1214. doi: 10.1016/j.bmc.2003.11.024.
27. Nikolovska-Coleska Ž, Suturkova L, Dorevski K, Krbavcic A, Solmajer T. Quantitative Structure-Activity Relationship of Flavonoid Inhibitors of p56lck Protein Tyrosine Kinase: A Classical/Quantum Chemical Approach. Quant. Struct.-Act. Relat. 1998;17:7–13.
28. Novic M, Nikolovska-Coleska Ž, Šolmajer T. Quantitative Structure-Activity Relationship of Flavonoid p56lck Protein Tyrosine Kinase Inhibitors. A Neural Network Approach. J. Chem. Inf. Comput. Sci. 1997;37:990–998.
29. Oblak M, Randic M, Solmajer T. Quantitative Structure-Activity Relationship of Flavonoid Analogues. 3. Inhibition of p56lck Protein Tyrosine Kinase. J. Chem. Inf. Comput. Sci. 2000;40:994–1001. doi: 10.1021/ci000001a.
30. Stefanic-Petek A, Krbavcic A, Solmajer T. QSAR of Flavonoids: 4. Differential Inhibition of Aldose Reductase and p56lck Protein Tyrosine Kinase. Croatica Chemica Acta. 2002;75:517–529.
31. Meyer M. Ab initio Study of Flavonoid. Int. J. Quantum Chem. 2000;76:724–732.
32. Deeb O, Clare BW. QSAR of Aromatic Substances: Protein Tyrosine Kinase Inhibitory Activity of Flavonoid Analogues. Chem. Biol. Drug Des. 2007;70:437–449. doi: 10.1111/j.1747-0285.2007.00578.x.
33. Free SM Jr, Wilson JW. A Mathematical Contribution to Structure-Activity Studies. J. Med. Chem. 1964;7:395–399. doi: 10.1021/jm00334a001.
34. Olah M, Bologa C, Oprea TI. An Automated PLS Search for Biologically Relevant QSAR Descriptors. J. Comput. Aided Mol. Des. 2004;18:437–449. doi: 10.1007/s10822-004-4060-8.
35. Todeschini R. Milano Chemometrics and QSPR Group. http://michem.disat.unimib.it/, accessed 9 September, 2008.
36. Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, Cheeseman JR, Zakrzewski VG, Montgomery JA Jr, Stratmann RE, Burant JC, et al. Gaussian 98, Revision A.7. Gaussian, Inc; Pittsburgh, PA: 1998.
37. Roy K. QSAR of Adenosine Receptor Antagonists II: Exploring Physicochemical Requirements for Selective Binding of 2-arylpyrazolo[3,4-c]quinoline Derivatives with Adenosine A1 and A3 Receptor Subtypes. QSAR Comb. Sci. 2003;22:614–621.
38. Hansch C, Leo A, Taft RW. A Survey of Hammett Substituent Constants and Resonance and Field Parameters. Chem. Rev. 1991;91:165–195.
39. Bhattacharya P, Roy K. QSAR of Adenosine A3 Receptor Antagonist 1,2,4-triazolo[4,3-a]quinoxalin-1-one Derivatives Using Chemometric Tools. Bioorg. Med. Chem. Lett. 2005;15:3737–3743. doi: 10.1016/j.bmcl.2005.05.051.
40. Leardi R. Genetic Algorithms in Chemometrics and Chemistry: A Review. J. Chemometrics. 2001;15:559–569.
41. Hemmateenejad B. Optimal QSAR Analysis of the Carcinogenic Activity of Drugs by Correlation Ranking and Genetic Algorithm-Based. J. Chemometrics. 2004;18:475–485.
42. Wang J, Zhang L, Yang G, Zhan CG. Quantitative Structure-Activity Relationship for Cyclic Imide Derivatives of Protoporphyrinogen Oxidase Inhibitors: A Study of Quantum Chemical Descriptors from Density Functional Theory. J. Chem. Inf. Comput. Sci. 2004;44:2099–2105. doi: 10.1021/ci049793p.
43. Hemmateenejad B, Sanchooli M. Substituent Electronic Descriptors for Fast QSAR/QSPR. J. Chemometrics. 2007;21:96–107.
44. Smeyers YG, Bouniam L, Smeyers NJ, Ezzamarty A, Hernandez-Laguna A, Sainz-Diaz CI. Quantum Mechanical and QSAR Study of Some α-Arylpropionic Acids as Anti-Inflammatory Agents. Eur. J. Med. Chem. 1998;33:103–112.



