Abstract
The COVID-19 pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remains a serious threat owing to the lack of a specific therapeutic agent. Computational methods are particularly well suited to a rapid response against SARS-CoV-2. The present research aims to systematically explore the interaction mechanism of a series of novel bicycloproline-containing SARS-CoV-2 Mpro inhibitors through integrated computational approaches. We designed six structurally modified novel SARS-CoV-2 Mpro inhibitors based on the QSAR study. The four designed compounds with the highest docking scores were further explored through molecular docking, molecular dynamics (MD) simulations, free energy calculations, and per-residue energy contributions estimated by the MM-PBSA approach, with compound 23 (PDB entry 7D3I) as the reference. This research not only provides robust QSAR models as valuable screening tools for the development of anti-COVID-19 drugs, but also proposes newly designed SARS-CoV-2 Mpro inhibitors with predicted nanomolar activities that are candidates for further characterization as treatments for SARS-CoV-2 infection.
Keywords: QSAR, SARS-CoV-2, Molecular docking, Molecular dynamics simulations, MM/PBSA
1. Introduction
The unprecedented coronavirus disease 2019 (COVID-19), which has spread rapidly around the globe, has led to a severe health crisis [1]. The COVID-19 pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which shares approximately 82% sequence identity with SARS-CoV [2]. The surprising speed at which SARS-CoV-2 spreads is attributed to the large number of mild or asymptomatic infections [3]. Remdesivir, which demonstrates pan-coronavirus suppression potential, has obtained FDA approval for treating COVID-19 [4,5], but more data are still needed to prove its effectiveness. Although repurposing other existing drugs for the treatment of this disease has been explored, effective treatment options are still limited [6,7]. Effective and safe antiviral drugs against SARS-CoV-2 are therefore urgently needed.
In coronaviruses, the cleavage of two polyproteins (pp1a/pp1ab) into individual non-structural proteins is critical for the assembly of the viral replication–transcription complex [8]. This cleavage is carried out by the main protease (Mpro, 3CLpro) and the papain-like protease (PLpro), which are therefore essential for the normal expression of SARS-CoV-2 [9]. In particular, Mpro has been identified as an attractive target for antiviral drug design on account of its vital role and the absence of a human homolog [10,11]. Compared with the complex process of drug synthesis, computational approaches such as QSAR (quantitative structure-activity relationship), molecular docking, and pharmacophore modeling [12], [13], [14] are effective for finding new drug targets and repurposing existing drugs [15] in the development of anti-SARS-CoV-2 drugs. Furthermore, although the Mpro of SARS-CoV and SARS-CoV-2 share approximately 82% sequence similarity, most computational strategies leverage drugs and biomolecules previously reported to have activity against related coronaviruses [16], [17], [18], [19], [20], and few reports draw on data specific to SARS-CoV-2 Mpro. In addition, previous literature on inhibitors of SARS-CoV Mpro has not included infection data from an animal model. Synthesized compounds that have been tested in an animal model are more significant for accelerating the development of novel therapies against COVID-19. Amin et al. [21] performed the first structure-activity relationship analysis on the data of recently reported diverse SARS-CoV-2 Mpro inhibitors to understand the structural requirements for higher inhibitory activity. However, that analysis did not account for the unavoidable deviations in activity between different data sets. Herein, we chose a series of 32 new SARS-CoV-2 Mpro inhibitors synthesized by Qiao et al. [22] to investigate the structural requirements of such inhibitors against SARS-CoV-2 Mpro using computational methods.
Those SARS-CoV-2 Mpro inhibitors were designed on the basis of the reported crystal structures of SARS-CoV-2 Mpro and the cocrystal structures of SARS-CoV-2 Mpro in complex with boceprevir (PDB entry 7COM) and telaprevir (PDB entry 7C7P), two antivirals approved against hepatitis C virus infection, and two of the compounds were tested for antiviral activity in mice. The cocrystal structure of Mpro in complex with the synthesized compound 23, one of the most potent compounds, was released to reveal its interaction mode (PDB entry 7D3I). Its good inhibitory effect and the binding mode revealed in detail provide new information for the rational design of new types of SARS-CoV-2 Mpro inhibitors.
Molecular modeling in combination with molecular dynamics simulation has been widely used in computer-aided drug design [23,24]. QSAR is an effective method for predicting the activity of compounds when data and appropriate experimental facilities are lacking [25], whereas molecular docking is a fast and efficient tool for predicting the conformation that a small molecule adopts in the active site of the target protein [26]. These computational methods are particularly well suited to a rapid response against SARS-CoV-2. Therefore, the present study used QSAR coupled with molecular docking, MD simulation, and MM/PBSA binding free energy analysis to investigate the structural requirements of SARS-CoV-2 Mpro inhibitors and to design novel compounds that may be helpful against SARS-CoV-2 infection. The outcomes could provide insightful direction for designing more active peptide-type compounds that can be synthesized and screened for their potential use in SARS-CoV-2 Mpro inhibition.
2. Materials and methods
2.1. Data set
The experimental activity data used for the molecular modeling, covering 32 new bicycloproline-containing Mpro inhibitors derived from either boceprevir or telaprevir, were taken from the recent literature [22]. The half-maximal inhibitory concentration (IC50, in nM) of each compound was converted into pIC50 (−log IC50, with IC50 in mol/L), which served as the dependent variable in the QSAR models. The 32 compounds were divided into a training set of 24 compounds (75%) for establishing the QSAR models and a test set of 8 compounds (25%) for model validation (Fig. 1). The test and training sets were chosen so that chemical diversity and a good range of activity were maintained in both sets. Outliers are generally removed because they introduce ambiguity into statistical models. The biological activities span 1.99 log units, from 6.13 to 8.12. The structures of all compounds and their activity values (pIC50) are listed in Table 1.
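The unit conversion described above can be sketched in a few lines. This is a minimal illustration, assuming IC50 is supplied in nM so that pIC50 = 9 − log10(IC50); the function name `pic50_from_nm` is hypothetical and is not part of any software used in this study.

```python
import math

def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)

# IC50 values (nM) of compounds 1, 15 and 23 as listed in Table 1
ic50_nm = {1: 453.0, 15: 748.5, 23: 7.6}
pic50 = {comp: round(pic50_from_nm(v), 2) for comp, v in ic50_nm.items()}
```

For these three compounds the conversion gives the pIC50 values reported in Table 1 (6.34, 6.13, and 8.12).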
Fig. 1.
The distribution of experimental inhibitory activity (pIC50) of the training and test set compounds in QSAR models.
Table 1.
Structures and activities of SARS-CoV-2 Mpro inhibitors.
[General scaffold of the inhibitors: image]

| Comp. | R2 | IC50 (nM) | pIC50 | Comp. | R2 | IC50 (nM) | pIC50 |
|---|---|---|---|---|---|---|---|
| 1 | (image) | 453 | 6.34 | 17 | (image) | 298.8 | 6.53 |
| 2 | (image) | 52.1 | 7.28 | 18 | (image) | 525.9 | 6.28 |
| 3* | (image) | 16.5 | 7.78 | 19* | (image) | 195.6 | 6.71 |
| 4 | (image) | 18.5 | 7.73 | 20 | (image) | 375.0 | 6.43 |
| 5 | (image) | 13.2 | 7.88 | 21 | (image) | 7.6 | 8.12 |
| 6 | (image) | 14.5 | 7.84 | 22* | (image) | 17.4 | 7.76 |
| 7* | (image) | 43.3 | 7.36 | 23 | (image) | 7.6 | 8.12 |
| 8 | (image) | 37.2 | 7.43 | 24* | (image) | 378.2 | 6.42 |
| 9 | (image) | 15.2 | 7.82 | 25 | (image) | 36.2 | 7.44 |
| 10 | (image) | 50.8 | 7.29 | 26 | (image) | 69.1 | 7.16 |
| 11* | (image) | 13.3 | 7.88 | 27 | (image) | 93.5 | 7.21 |
| 12 | (image) | 19.0 | 7.72 | 28* | (image) | 9.2 | 8.04 |
| 13 | (image) | 12.4 | 7.91 | 29 | (image) | 34.7 | 7.46 |
| 14 | (image) | 13.0 | 7.89 | 30 | (image) | 17.2 | 7.76 |
| 15* | (image) | 748.5 | 6.13 | 31 | (image) | 30.0 | 7.52 |
| 16 | (image) | 153.1 | 6.82 | 32 | (image) | 19.7 | 7.71 |
*: selected as the test set.
2.2. Molecular optimization
The 3D conformation of each compound was generated with the SketchTool module in the SYBYL software package. Partial atomic charges were calculated by the Gasteiger–Hückel method. The potential energy of each compound was then minimized with the Powell conjugate gradient algorithm under the Tripos force field. The energy convergence criterion was 0.005 kcal/(mol·Å), with a maximum of 1000 iterations [27]. The IC50 values were all converted to pIC50 (−log IC50) and added to the spreadsheet as compound variables. All other parameters were kept at the default values of SYBYL 2.0.
2.3. Molecular alignment
The alignment was carried out with the Distill module implemented in SYBYL 2.0, in which a common core scaffold was identified among all 32 molecules (Fig. 2(a)). The lowest-energy conformation of compound 23 was selected as the template because it is the most active compound in the data set. The remaining compounds were aligned to this template on the basis of the common substructure (Fig. 2(b)).
Fig. 2.
Molecular alignment mode. (a): 2D structure of compound 23 (blue represents the common skeleton); (b): Molecular alignment based on compound 23.
2.4. CoMFA and CoMSIA descriptors
The CoMFA (comparative molecular field analysis) and CoMSIA (comparative molecular similarity index analysis) studies were performed on the aligned molecules using the SYBYL software to explore the contributions of the different interaction fields. CoMFA and CoMSIA fields were calculated for each compound in the data set at every grid point using an sp3-hybridized carbon probe with a charge of +1.0 [28]. The steric field is calculated with the Lennard-Jones potential, Eq. (1), and the electrostatic field with the Coulomb potential, Eq. (2).
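$$E_{vdW} = \sum_{i=1}^{n}\left(A_i r_{ij}^{-12} - C_i r_{ij}^{-6}\right) \tag{1}$$

$$E_{C} = \sum_{i=1}^{n}\frac{q_i q_j}{D\, r_{ij}} \tag{2}$$

where $E_{vdW}$ represents the steric (van der Waals) potential energy of the compound, $A_i$ and $C_i$ represent the corresponding atomic van der Waals constants, $n$ is the total number of atoms in the molecule, $E_{C}$ is the electrostatic field energy of the compound, $q_i$ is the net charge of atom $i$ calculated by the Gasteiger–Hückel method, $q_j$ is the charge of the probe atom, $r_{ij}$ is the distance between atom $i$ and the probe atom at grid point $j$, and $D$ is the dielectric constant.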
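To make the field calculation concrete, the following is a minimal sketch of how the steric and electrostatic energies of a probe at a single grid point could be evaluated, assuming the usual Lennard-Jones 12-6 and Coulomb forms. The function name, the tuple layout of `atoms`, and the numerical constants are hypothetical; SYBYL's internal implementation (units, energy cutoffs, distance-dependent dielectric) is not reproduced here.

```python
import math

def probe_energies(atoms, grid_point, probe_charge=1.0, dielectric=1.0):
    """Return (steric, electrostatic) energy of a probe at one grid point.

    atoms: iterable of (x, y, z, A, C, q), where A and C are hypothetical
    Lennard-Jones constants and q is the partial charge of the atom.
    No unit conversion factor is applied to the Coulomb term.
    """
    e_vdw = 0.0
    e_c = 0.0
    for x, y, z, a, c, q in atoms:
        r = math.dist((x, y, z), grid_point)  # atom-to-probe distance
        e_vdw += a * r ** -12 - c * r ** -6   # Lennard-Jones 12-6 term
        e_c += q * probe_charge / (dielectric * r)  # Coulomb term
    return e_vdw, e_c

# One illustrative atom at the origin, probe placed 2 distance units away
steric, electrostatic = probe_energies([(0.0, 0.0, 0.0, 1.0, 1.0, -0.5)],
                                       (2.0, 0.0, 0.0))
```

Repeating this evaluation over every point of a regular lattice around the aligned molecules yields the descriptor columns that enter the PLS analysis.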
In the CoMSIA analysis, steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen bond acceptor field descriptors were obtained [29]. In contrast to CoMFA, CoMSIA uses a Gaussian-type distance dependence for its similarity indices, which avoids abrupt changes in the grid-based probe-atom interaction [30].
2.5. Topomer CoMFA descriptors
Topomer CoMFA modeling combines the steric, electrostatic, and topological properties of molecules. It eliminates steps that can affect the prediction results, such as molecular superposition, and expresses the activity of a compound as the sum of contributions from the fragments obtained by cutting the molecule [31]. Like CoMFA, Topomer CoMFA generates steric and electrostatic contour maps, computed under the Tripos force field with a probe atom carrying a +1 charge. Importantly, the robustness of the generated QSAR model depends on an appropriate cutting scheme. Herein, we chose the most active Mpro inhibitor (compound 23) as the template for cutting. According to the structural characteristics of the inhibitors, all 32 compounds were divided into fragments by the same cutting scheme (Fig. S1).
2.6. HQSAR descriptors
HQSAR (holographic QSAR) converts the molecular structure into characteristic molecular fingerprints (holograms) according to the types of molecular fragments and labels them with numbers (hologram lengths of 53–401) [32]. These numbers are used as QSAR descriptors to establish the structure-activity relationship and to predict the biological activity of the compounds. Different fragment descriptors, including Atoms (A), Bonds (B), Connections (C), Chirality (Ch), Hydrogen atoms (H), and Donor and Acceptor (DA), were used in combination with adjusted fragment sizes and hologram lengths to obtain an HQSAR model with good predictive ability.
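The idea of folding fragment counts into a fixed-length hologram can be illustrated as follows. This is only a conceptual sketch: the fragment strings are hypothetical placeholders, fragment enumeration is not shown, and the CRC32 hash from the Python standard library is used as a stand-in for the hashing scheme of the actual HQSAR implementation.

```python
import zlib

def hologram(fragments, length=61):
    """Fold a list of fragment strings into a fixed-length count vector.

    Each fragment is hashed to an integer, and the integer modulo the
    hologram length selects the bin whose count is incremented.
    """
    bins = [0] * length
    for frag in fragments:
        index = zlib.crc32(frag.encode("utf-8")) % length
        bins[index] += 1
    return bins

# Hypothetical fragment strings; a real run would enumerate all linear,
# branched and cyclic fragments within the chosen size range.
example_fragments = ["C-C", "C-N", "C=O", "C-C", "N-C=O"]
example_hologram = hologram(example_fragments, length=61)
```

Because different fragments can hash to the same bin, the hologram length is usually scanned over several prime values and the best-performing length is kept.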
2.7. Partial least square (PLS) analysis
The partial least squares (PLS) method [33], an extension of multiple regression analysis, was applied to generate the QSAR models. The generated CoMFA, CoMSIA, Topomer CoMFA, and HQSAR descriptors were set as independent variables and the Mpro inhibitory activity (pIC50) as the dependent variable. The cross-validated correlation coefficient $q^2$ (Eq. (3)) and the optimal number of components (ONC) were obtained by leave-one-out (LOO) cross-validation. To further verify the models internally, non-cross-validated and bootstrap PLS analyses were performed using the ONC obtained from the LOO method. The non-cross-validated correlation coefficient $r^2$ (Eq. (4)), the standard error of estimate (SEE), the F-test value (F), and the contribution of each field were determined to evaluate the accuracy and robustness of the developed models. High $q^2$ and $r^2$ values ($q^2$ > 0.5, $r^2$ > 0.6) are usually taken as evidence that a QSAR model has the expected predictive ability [34].
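$$q^2 = 1 - \frac{\sum_i \left(y_{pred,i} - y_{obs,i}\right)^2}{\sum_i \left(y_{obs,i} - \bar{y}_{obs}\right)^2} \tag{3}$$

$$r^2 = \frac{\left[\sum_i \left(y_{obs,i} - \bar{y}_{obs}\right)\left(y_{pred,i} - \bar{y}_{pred}\right)\right]^2}{\sum_i \left(y_{obs,i} - \bar{y}_{obs}\right)^2 \sum_i \left(y_{pred,i} - \bar{y}_{pred}\right)^2} \tag{4}$$

where $y_{pred}$ and $y_{obs}$ are the predicted and observed activity values, and $\bar{y}_{obs}$ and $\bar{y}_{pred}$ are the mean observed and mean predicted activity values of the training set, respectively.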
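The leave-one-out procedure behind $q^2$ can be sketched as follows. To keep the example dependency-free, a one-descriptor least-squares line is used here as a stand-in for the PLS model of the study; the data are illustrative and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def q2_loo(xs, ys):
    """Leave-one-out q2: each point is predicted by a model fitted without it."""
    mean_obs = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / sum((y - mean_obs) ** 2 for y in ys)

def r2_fit(xs, ys):
    """Non-cross-validated r2: squared correlation of observed vs fitted values."""
    slope, intercept = fit_line(xs, ys)
    pred = [slope * x + intercept for x in xs]
    mo, mp = sum(ys) / len(ys), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(ys, pred))
    return cov ** 2 / (sum((o - mo) ** 2 for o in ys) * sum((p - mp) ** 2 for p in pred))

descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative descriptor values
activity = [0.1, 0.9, 2.1, 2.9, 4.0]     # illustrative activities
q2_value = q2_loo(descriptor, activity)
r2_value = r2_fit(descriptor, activity)
```

For a least-squares model the cross-validated value never exceeds the fitted one, which is why $q^2$ is the stricter of the two internal criteria.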
2.8. Validation of the QSAR models
Validation is essential for understanding the predictive power of the established QSAR models. Critical evaluation of the developed models involves internationally recognized internal and external validation parameters that test the robustness of a model in terms of goodness of fit, stability, and predictability. The external validation parameter $r^2_{pred}$ was calculated by Eq. (5) to judge the quality of the external predictions.

$$r^2_{pred} = \frac{SD - PRESS}{SD} \tag{5}$$

SD represents the sum of squared deviations between the biological activity of the molecules in the test set and the average activity of the molecules in the training set, and PRESS is the sum of squared deviations between the predicted and actual activities of the molecules in the test set. An established QSAR model is considered to have the desired predictive ability when $r^2_{pred}$ > 0.6 [35].
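As a small worked illustration of the SD and PRESS terms defined above, the sketch below computes the external predictive $r^2$ for two hypothetical test compounds. The function name and the numbers are illustrative only and are not taken from the models of this study.

```python
def r2_pred(test_obs, test_pred, train_mean):
    """External predictive r2 = (SD - PRESS) / SD.

    SD uses the mean activity of the TRAINING set as reference, while
    PRESS sums the squared prediction errors of the TEST set.
    """
    sd = sum((y - train_mean) ** 2 for y in test_obs)
    press = sum((y - p) ** 2 for y, p in zip(test_obs, test_pred))
    return (sd - press) / sd

# Two hypothetical test compounds and a hypothetical training-set mean
value = r2_pred([7.0, 8.0], [7.1, 7.9], train_mean=7.4)
```

Here SD = 0.52 and PRESS = 0.02, so the result is 0.50/0.52, roughly 0.96.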
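Other external parameters, including $Q^2_{F1}$, $Q^2_{F2}$, the concordance correlation coefficient (CCC), the $r_m^2$ metrics, the root mean square error (RMSE), and the mean absolute error (MAE) [35] (RMSE and MAE should be close to zero), were calculated for both training and test set compounds by Eqs. (6)-(12):

$$Q^2_{F1} = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}_{tr}\right)^2} \tag{6}$$

$$Q^2_{F2} = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}_{ext}\right)^2} \tag{7}$$

$$CCC = \frac{2\sum_i \left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sum_i \left(y_i - \bar{y}\right)^2 + \sum_i \left(\hat{y}_i - \bar{\hat{y}}\right)^2 + n\left(\bar{y} - \bar{\hat{y}}\right)^2} \tag{8}$$

$$r_m^2 = R^2\left(1 - \sqrt{R^2 - R_0^2}\right) \tag{9}$$

$$\Delta r_m^2 = \left|r_m^2 - r_m'^2\right|, \quad r_m'^2 = R^2\left(1 - \sqrt{R^2 - R_0'^2}\right) \tag{10}$$

$$RMSE = \sqrt{\frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{n}} \tag{11}$$

$$MAE = \frac{\sum_i \left|y_i - \hat{y}_i\right|}{n} \tag{12}$$

where $y_i$ and $\hat{y}_i$ are the observed and predicted activities, $\bar{y}$ and $\bar{\hat{y}}$ are their means, $n$ is the number of compounds, $R^2$, $R_0^2$, and $R_0'^2$ are the squared correlation coefficients between observed and predicted activities with and without intercept, and $\bar{y}_{tr}$ and $\bar{y}_{ext}$ indicate the response means of the training set and the external test set, respectively. Values of $r_m^2$ > 0.5 and $\Delta r_m^2$ < 0.2, together with $Q^2_{F1}$ and $Q^2_{F2}$ values of more than 0.7, usually indicate that the model has ideal external predictability [36].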
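The external metrics named above can be computed together from the observed and predicted test-set activities. The sketch below assumes the conventional definitions of $Q^2_{F1}$, $Q^2_{F2}$, CCC, RMSE, and MAE; the function name and the example numbers are hypothetical.

```python
import math

def external_metrics(obs, pred, train_mean):
    """Q2_F1, Q2_F2, CCC, RMSE and MAE for an external test set."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    press = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_obs = sum((o - mo) ** 2 for o in obs)
    ss_pred = sum((p - mp) ** 2 for p in pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    return {
        # Q2_F1 references the training-set mean, Q2_F2 the test-set mean
        "q2_f1": 1.0 - press / sum((o - train_mean) ** 2 for o in obs),
        "q2_f2": 1.0 - press / ss_obs,
        # CCC penalises both scatter and systematic offset from y = x
        "ccc": 2.0 * cov / (ss_obs + ss_pred + n * (mo - mp) ** 2),
        "rmse": math.sqrt(press / n),
        "mae": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
    }

# Hypothetical predictions with a constant offset of +0.5
metrics = external_metrics([1.0, 2.0, 3.0], [1.5, 2.5, 3.5], train_mean=2.0)
```

In this example the predictions are perfectly correlated with the observations, yet CCC drops to about 0.84 because of the constant offset, which illustrates why CCC complements a plain correlation coefficient.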
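The Golbraikh–Tropsha method [37] was used to calculate further validation parameters of the QSAR models, for which the following conditions should be satisfied: $R^2 > 0.6$; $(R^2 - R_0^2)/R^2 < 0.1$ and $0.85 \le k \le 1.15$, or $(R^2 - R_0'^2)/R^2 < 0.1$ and $0.85 \le k' \le 1.15$; and $|R_0^2 - R_0'^2| < 0.1$. The parameters $k$, $k'$, $R_0^2$, and $R_0'^2$ were calculated by Eqs. (13)-(16).

$$k = \frac{\sum_i y_i \hat{y}_i}{\sum_i \hat{y}_i^2} \tag{13}$$

$$k' = \frac{\sum_i y_i \hat{y}_i}{\sum_i y_i^2} \tag{14}$$

$$R_0^2 = 1 - \frac{\sum_i \left(\hat{y}_i - k' y_i\right)^2}{\sum_i \left(\hat{y}_i - \bar{\hat{y}}\right)^2} \tag{15}$$

$$R_0'^2 = 1 - \frac{\sum_i \left(y_i - k \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2} \tag{16}$$

$R^2$ represents the squared correlation between the experimental and predicted activities of the test set; $k$ and $k'$ represent the slopes of the regression lines through the origin of the experimental versus predicted activities and of the predicted versus experimental activities, respectively. $R_0^2$ and $R_0'^2$ represent the corresponding squared correlation coefficients between the observed and predicted activities of the test set without intercept.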
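A minimal sketch of these through-origin statistics is given below. It assumes one common convention for $R_0^2$ and $R_0'^2$ (conventions differ between papers and software) and checks only the $R^2$, relative-difference, and slope criteria; the function name and the data are hypothetical.

```python
def golbraikh_tropsha(obs, pred):
    """Through-origin slopes and correlation measures for a test set."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sxy = sum(o * p for o, p in zip(obs, pred))
    k = sxy / sum(p * p for p in pred)        # obs ~ k * pred
    k_prime = sxy / sum(o * o for o in obs)   # pred ~ k' * obs
    ss_obs = sum((o - mo) ** 2 for o in obs)
    ss_pred = sum((p - mp) ** 2 for p in pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    r2 = cov ** 2 / (ss_obs * ss_pred)
    # Assumed convention: R0^2 from pred vs k'*obs, R0'^2 from obs vs k*pred
    r0_2 = 1.0 - sum((p - k_prime * o) ** 2 for o, p in zip(obs, pred)) / ss_pred
    r0p_2 = 1.0 - sum((o - k * p) ** 2 for o, p in zip(obs, pred)) / ss_obs
    passed = r2 > 0.6 and (
        ((r2 - r0_2) / r2 < 0.1 and 0.85 <= k <= 1.15)
        or ((r2 - r0p_2) / r2 < 0.1 and 0.85 <= k_prime <= 1.15)
    )
    return {"r2": r2, "k": k, "k_prime": k_prime,
            "r0_2": r0_2, "r0p_2": r0p_2, "passed": passed}

# Hypothetical observed and predicted pIC50 values for three test compounds
gt = golbraikh_tropsha([6.0, 7.0, 8.0], [6.1, 6.9, 8.1])
```

Slopes close to 1 indicate that the predictions are neither systematically compressed nor stretched relative to the observations.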
2.9. Molecular docking
We have implemented molecular docking studies to explore the interaction pattern of compounds with their relevant enzyme (Mpro). Choosing an appropriate docking method is a crucial factor in determining whether the molecular docking simulation is realistic. We attempt to select the most suitable one from the following docking protocols to evaluate the binding mode between protein and ligand:
(1) Surflex-Dock: Surflex-Dock is a protomol-based method for fast and accurate molecular docking, which uses an empirical scoring function and a patented search engine (based on a molecular similarity search engine) to dock the ligand into the binding site of the target protein [38].
(2) AutoDock 4.2: AutoDock 4.2 relies on a number of approximations to predict the conformation and binding free energy during the docking simulation. Ligands are treated as flexible, but, unlike in traditional molecular mechanics methods, only torsional degrees of freedom are explored, while bond angles and bond lengths are kept constant [39]. The empirical free energy force field is based on a molecular mechanics force field and includes typical terms for dispersion/repulsion, hydrogen bonding, electrostatics, desolvation, and torsional entropy.
(3) AutoDock Vina: Vina improves the average accuracy of binding mode prediction and speeds up the search by using a simpler scoring function [40]. Vina's scoring function is completely empirical and includes Gaussian steric interaction, repulsion, hydrogen bonding, hydrophobic, and torsion terms. Furthermore, Vina supports parallel computation and is very easy to use [41].
The prepared proteins and ligands were energy-minimized, and unnecessary water molecules were removed. The protein was minimized for 1000 iterations with the Powell gradient method under the Kollman united-atom force field, with the non-bonded cut-off set to 8.0 Å and the dielectric constant set to 1.0. The grid was chosen to be large enough to include not only the active site but also a significant portion of the surrounding receptor surface, with 60 × 60 × 60 grid points and a grid spacing of 0.375 Å.
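The physical size of such a grid follows directly from the number of points and the spacing. The sketch below assumes the 60 points per axis are counted as intervals, so that each edge is 60 × 0.375 Å = 22.5 Å; the box center used here is a hypothetical placeholder, not the coordinates used in this study.

```python
def grid_box(center, npts=(60, 60, 60), spacing=0.375):
    """Return the lower and upper corners (in Angstrom) of the docking grid."""
    half = [n * spacing / 2.0 for n in npts]
    lower = tuple(c - h for c, h in zip(center, half))
    upper = tuple(c + h for c, h in zip(center, half))
    return lower, upper

def inside_box(point, lower, upper):
    """True if a coordinate lies within the grid box on every axis."""
    return all(lo <= p <= hi for p, lo, hi in zip(point, lower, upper))

# Hypothetical box center; in practice it is placed on the binding site
lower, upper = grid_box(center=(0.0, 0.0, 0.0))
edge_lengths = tuple(hi - lo for lo, hi in zip(lower, upper))
```

A check like `inside_box` is a quick way to confirm that all atoms of the catalytic site and of the reference ligand fall within the searched volume.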
2.10. Molecular dynamics simulation
The MD simulations were performed in Gromacs 5.15 [42] with the CHARMM36 (Feb-2021) force field [43] to investigate the interaction between the receptor and the ligand, with a time step of 2 fs. The topology of the protein was generated with the pdb2gmx command and that of the ligand with the CHARMM General Force Field [44]. Each system was immersed in a dodecahedral box of TIP3P water under periodic boundary conditions (PBC) and neutralized with Na+ and Cl− ions. Energy minimization was performed for 50,000 steps using the steepest descent algorithm. Two equilibration phases, NVT followed by NPT, were then run to equilibrate the system at a temperature of 300 K and a pressure of 1 atm. Each selected complex was simulated for 50 ns.
The molecular dynamics trajectories were analyzed with tools in the GROMACS suite, including root mean square deviation (RMSD), root mean square fluctuation (RMSF), solvent-accessible surface area (SASA), number of hydrogen bonds, and radius of gyration (Rg). Principal component analysis (PCA), also known as essential dynamics, is a statistical tool for detecting patterns and identifying similarities and differences in data sets. The PCA method in Gromacs was used to analyze the eigenvectors and eigenvalues and to obtain the projection onto the first two principal components (PC1 and PC2) [45]. The covariance matrix was constructed with the gmx covar module, and the trajectories were projected onto the eigenvectors with the gmx anaeig module.
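Two of the structural measures listed above have simple closed forms, which the following sketch illustrates on toy coordinates. It is not the GROMACS implementation: in particular, no least-squares superposition is performed before the RMSD, and the function names and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate sets.

    No structural fitting is done here; trajectory tools normally
    superimpose each frame on the reference first.
    """
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the center of mass."""
    m_tot = sum(masses)
    com = tuple(sum(m * c[i] for m, c in zip(masses, coords)) / m_tot
                for i in range(3))
    weighted = sum(m * math.dist(c, com) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted / m_tot)

# Toy two-atom system: the second frame is shifted by 1 unit along z
frame0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
shift_rmsd = rmsd(frame0, frame1)
toy_rg = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0])
```

A stable RMSD and Rg over the production run are the usual first indications that a protein-ligand complex has equilibrated.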
The molecular mechanics/Poisson-Boltzmann surface area (MM-PBSA) method, a widely used endpoint method in free energy calculations [46,47], was used to estimate the binding free energy ($\Delta G_{bind}$) of the SARS-CoV-2 Mpro-ligand complexes. The binding free energies of the complexes in solution were determined with the g_mmpbsa tool [48] by Eq. (17):

$$\Delta G_{bind} = \Delta E_{MM} + \Delta G_{polar} + \Delta G_{nonpolar} - T\Delta S, \quad \Delta E_{MM} = \Delta E_{vdW} + \Delta E_{ele} \tag{17}$$

$\Delta G_{bind}$ represents the net binding free energy, and $\Delta E_{MM}$ indicates the molecular mechanics potential energy, which consists of the van der Waals ($\Delta E_{vdW}$) and electrostatic ($\Delta E_{ele}$) energies. $\Delta G_{polar}$ and $\Delta G_{nonpolar}$ indicate the polar and nonpolar solvation contributions, respectively. $T\Delta S$ refers to the conformational entropy contribution at temperature $T$. The contribution of the $T\Delta S$ term can be ignored when calculating relative binding affinities [49]. To further confirm the key residues involved in the binding modes, the binding free energy was decomposed per residue with the Python script MmPbSaDecomp.py (available at http://rashmikumari.github.io/g_mmpbsa) after every term of the binding energy had been calculated. Amino acid residues whose binding energy contribution is less than −2 kJ/mol or greater than 2 kJ/mol were selected to reveal their favorable and unfavorable contributions during the simulation.
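The bookkeeping of the energy terms and the ±2 kJ/mol residue filter can be sketched as follows. All numbers and residue labels in this example are hypothetical placeholders chosen for illustration; they are not results of this study, and the functions are not part of g_mmpbsa.

```python
def binding_free_energy(e_vdw, e_ele, g_polar, g_nonpolar, t_ds=0.0):
    """Sum the MM-PBSA terms (kJ/mol); the entropy term defaults to zero,
    as is common when only relative affinities are compared."""
    return e_vdw + e_ele + g_polar + g_nonpolar - t_ds

def key_residues(contributions, threshold=2.0):
    """Split per-residue contributions into favorable (< -threshold)
    and unfavorable (> +threshold) groups."""
    favorable = {r: e for r, e in contributions.items() if e < -threshold}
    unfavorable = {r: e for r, e in contributions.items() if e > threshold}
    return favorable, unfavorable

# Hypothetical energy terms and per-residue contributions in kJ/mol
dg_bind = binding_free_energy(e_vdw=-200.0, e_ele=-50.0,
                              g_polar=120.0, g_nonpolar=-20.0)
favorable, unfavorable = key_residues({"RES_A": -5.2, "RES_B": 3.1, "RES_C": -0.4})
```

Residues with contributions between −2 and 2 kJ/mol, such as the third placeholder here, are left out of both groups.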
2.11. In silico ADMET prediction
Evaluation of ADMET (absorption, distribution, metabolism, excretion, and toxicity) pharmacokinetic properties is an essential part of contemporary drug design and drug screening. Early ADMET evaluation can help address the problem of species differences, significantly improve the success rate of drug development, and reduce development costs, drug toxicity, and side effects. The ADMET properties of the newly designed compounds were predicted with the online web server ADMETlab2.0 [50], followed by the SwissADME online tool [51] for the prediction of their drug-likeness and synthetic accessibility.
3. Results and discussion
3.1. Statistical results of CoMFA and Topomer CoMFA models
The statistical values of the CoMFA and Topomer CoMFA models constructed from steric and electrostatic descriptors are listed in Table 2. The CoMFA model gave a cross-validated $q^2$ of 0.590 with nine components, a non-cross-validated correlation coefficient $r^2$ of 0.966, a low SEE of 0.044, and a significant F value of 414.196, which indicates a good statistical correlation. The contributions of the steric and electrostatic fields are 0.664 and 0.336, respectively. Furthermore, the high bootstrapped $r^2_{bs}$ of 0.992 and the low $SD_{bs}$ of 0.013 indicate a high level of confidence in the established CoMFA model. The Topomer CoMFA model gave a $q^2$ of 0.522 with 5 components, a high $r^2$ of 0.952 with a low SEE of 0.139, and an F value of 74.147. These values provide initial evidence that the generated CoMFA and Topomer CoMFA models have competent predictive ability.
Table 2.
Statistical parameters of QSAR models.
| Parameters | CoMFA | Topomer CoMFA | CoMSIA | HQSAR 2–10 |
|---|---|---|---|---|
| $q^2$ | 0.590 | 0.522 | 0.571 | 0.827 |
| $r^2$ | 0.966 | 0.952 | 0.989 | 0.945 |
| SEE | 0.044 | 0.139 | 0.074 | 0.145 |
| N | 9 | 5 | 8 | 4 |
| F | 414.196 | 71.147 | 173.352 | – |
| $r^2_{bs}$ (100 runs) | 0.992 | – | 0.998 | – |
| $SD_{bs}$ | 0.013 | – | 0.003 | – |
| Field contributions | | | | |
| Steric | 0.664 | – | 0.321 | – |
| Electrostatic | 0.336 | – | – | – |
| Hydrophobic | – | – | 0.679 | – |
3.2. Statistical results of CoMSIA models
The CoMSIA models were generated by considering all possible combinations of the CoMSIA field descriptors, namely steric (S), electrostatic (E), hydrophobic (H), hydrogen bond donor (D), and hydrogen bond acceptor (A), using Gasteiger–Hückel partial atomic charges. The different combinations were compared to identify an efficient CoMSIA model (Table 2 and Table S1). The combination of steric and hydrophobic field descriptors showed the best statistical results, with a $q^2$ of 0.571 at 8 components and an $r^2$ of 0.989 with an SEE of 0.072. The contributions of the steric and hydrophobic fields are 0.321 and 0.679, respectively. Considering only the steric field descriptors gives a higher $q^2$, but the low F value of that model makes it less robust than the combination of steric and hydrophobic field descriptors. Additionally, the results of the bootstrap analysis over 100 runs (Table 2) show low standard errors of estimation and the absence of systematic errors within the training set for the CoMSIA model ($r^2_{bs}$ = 0.998, $SD_{bs}$ = 0.003).
3.3. Statistical results of HQSAR
Eighteen HQSAR models were established by combining different fragment descriptors (Table S2). The optimal HQSAR model was obtained by combining the fragment descriptors Atoms (A), Bonds (B), Connections (C), and Chirality (Ch) at the default fragment size of 4–7, which gave a $q^2$ of 0.827 and an $r^2$ of 0.945 with a hologram length of 61 and 4 components. This best model, HQSAR model 9, was then selected for further refinement. For this purpose, fragment sizes between 1 and 10 were explored, with a higher $q^2$ as the selection criterion. As shown in Table S3, none of these settings improved on HQSAR model 9, so the best model remains the one with the default fragment size of 4–7.
3.4. Validation of QSAR model
A high $q^2$ value does not by itself imply that a model has high predictive power, so the independent test set was used for further evaluation of the QSAR models to ensure their robustness and stability. Before the test set is used for external validation, it is crucial to eliminate possible systematic errors. The experimental and predicted pIC50 values and the residuals of the CoMFA, CoMSIA, Topomer CoMFA, and HQSAR models are shown in Table 3 and depicted graphically in Fig. 3. Compound 24 has large residuals in all QSAR models, as reflected in both Table 3 and Fig. 3, but this does not influence the quality of the QSAR models established from the training set. We therefore treated compound 24 as an outlier and removed it when calculating the external validation parameters. As shown in Table 4, the QSAR models for the test set without compound 24 give $R^2$ values of 0.8471 (CoMFA), 0.8226 (Topomer CoMFA), 0.8179 (CoMSIA), and 0.7508 (HQSAR), and regression slopes $k$ and $k'$ close to 1, namely 0.9975 and 1.0019 (CoMFA), 0.9939 and 1.0049 (Topomer CoMFA), 0.9942 and 1.0047 (CoMSIA), and 0.9967 and 1.0022 (HQSAR), respectively. The $R_0^2$ and $R_0'^2$ values are 0.8392 and 0.8078 (CoMFA), 0.8150 and 0.8129 (Topomer CoMFA), 0.8168 and 0.7854 (CoMSIA), and 0.7503 and 0.7255 (HQSAR), respectively. The $(R^2 - R_0^2)/R^2$ and $(R^2 - R_0'^2)/R^2$ values calculated from $R_0^2$ and $R_0'^2$ are 0.0734 and 0.0793 (CoMFA), 0.0051 and 0.0052 (Topomer CoMFA), 0.0754 and 0.0816 (CoMSIA), and 0.0650 and 0.0695 (HQSAR), respectively. The $r_m^2$ and $\Delta r_m^2$ values are 0.6828 and 0.1550 (CoMFA), 0.7572 and 0.0585 (Topomer CoMFA), 0.6841 and 0.1659 (CoMSIA), and 0.6576 and 0.1532 (HQSAR), respectively.
Table 3.
Actual and predicted activities of training and test set compounds.
| Comp. | Exp. pIC50 | CoMFA predicted | CoMFA residual | CoMSIA predicted | CoMSIA residual | Topomer CoMFA predicted | Topomer CoMFA residual | Contribution Ra | Contribution Rb | HQSAR 10–4 predicted | HQSAR 10–4 residual |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 6.34 | 6.342 | −0.002 | 6.334 | 0.006 | 6.40 | −0.06 | 0.02 | −0.64 | 6.364 | −0.024 |
| 2 | 7.28 | 7.303 | −0.023 | 7.278 | 0.002 | 7.34 | −0.06 | 0.02 | 0.31 | 7.311 | −0.031 |
| 3* | 7.78 | 7.523 | 0.257 | 7.765 | 0.015 | 8.05 | −0.27 | 0.02 | 1.02 | 7.819 | −0.039 |
| 4 | 7.73 | 7.696 | 0.034 | 7.814 | −0.084 | 7.74 | −0.01 | 0.02 | 0.70 | 7.743 | −0.013 |
| 5 | 7.88 | 7.862 | 0.018 | 7.805 | 0.075 | 7.83 | 0.05 | 0.02 | 0.80 | 7.697 | 0.183 |
| 6 | 7.84 | 7.849 | −0.09 | 7.801 | 0.039 | 7.93 | −0.09 | 0.02 | 0.90 | 7.637 | 0.203 |
| 7* | 7.36 | 7.141 | −0.051 | 7.682 | −0.322 | 7.61 | −0.25 | 0.02 | 0.57 | 7.943 | −0.583 |
| 8 | 7.43 | 7.384 | 0.046 | 7.459 | −0.029 | 7.56 | −0.13 | 0.02 | 0.52 | 7.610 | −0.18 |
| 9 | 7.82 | 7.817 | 0.003 | 7.821 | −0.001 | 7.66 | 0.16 | 7.66 | 0.02 | 7.691 | 0.129 |
| 10 | 7.29 | 7.340 | −0.050 | 7.364 | −0.074 | 7.58 | −0.29 | 0.02 | 0.55 | 7.580 | −0.29 |
| 11* | 7.88 | 7.893 | −0.013 | 7.998 | −0.118 | 7.81 | 0.07 | 0.02 | 0.78 | 7.805 | 0.075 |
| 12 | 7.72 | 7.807 | −0.087 | 7.784 | −0.064 | 7.61 | 0.11 | 0.02 | 0.58 | 7.738 | −0.018 |
| 13 | 7.91 | 7.839 | 0.071 | 7.724 | 0.186 | 7.61 | 0.3 | 0.02 | 0.58 | 7.861 | 0.049 |
| 14 | 7.89 | 7.878 | 0.012 | 7.933 | −0.043 | 7.85 | 0.04 | 0.02 | 0.82 | 7.874 | 0.016 |
| 15* | 6.13 | 6.541 | −1.011 | 6.421 | −0.291 | 6.74 | −0.61 | −0.04 | −0.24 | 6.591 | −0.461 |
| 16 | 6.82 | 6.827 | −0.007 | 6.843 | −0.023 | 6.79 | 0.03 | −0.04 | −0.19 | 6.836 | −0.016 |
| 17 | 6.53 | 6.538 | −0.014 | 6.509 | 0.021 | 6.54 | −0.01 | −0.04 | −0.44 | 6.450 | 0.08 |
| 18 | 6.28 | 6.293 | −0.013 | 6.332 | −0.052 | 6.34 | −0.06 | −0.04 | −0.63 | 6.397 | −0.117 |
| 19* | 6.71 | 6.918 | −0.208 | 6.747 | −0.037 | 6.44 | 0.27 | −0.04 | −0.53 | 6.614 | 0.096 |
| 20 | 6.43 | 6.380 | 0.050 | 6.368 | 0.062 | 6.39 | 0.04 | −0.04 | −0.59 | 6.238 | 0.192 |
| 21 | 8.12 | 8.105 | 0.015 | 8.079 | 0.041 | 7.99 | 0.13 | −0.04 | 1.02 | 8.065 | 0.055 |
| 22* | 7.76 | 8.120 | −0.360 | 7.995 | −0.235 | 8.17 | −0.41 | −0.04 | 1.19 | 7.596 | 0.164 |
| 23 | 8.12 | 8.137 | −0.017 | 8.151 | −0.031 | 8.08 | 0.04 | −0.04 | 1.11 | 8.152 | −0.032 |
| 24* | 6.42 | 7.038 | −1.018 | 7.550 | −1.13 | 7.31 | −0.89 | −0.04 | 0.33 | 7.298 | −0.878 |
| 25 | 7.44 | 7.469 | −0.029 | 7.442 | −0.002 | 7.46 | −0.02 | −0.04 | 0.48 | 7.318 | 0.122 |
| 26 | 7.16 | 7.189 | −0.029 | 7.103 | 0.057 | 7.02 | 0.14 | −0.04 | 0.04 | 7.299 | −0.139 |
| 27 | 7.21 | 7.013 | 0.017 | 7.058 | 0.152 | 6.97 | 0.24 | −0.04 | −0.00 | 7.241 | −0.031 |
| 28* | 8.04 | 7.736 | 0.304 | 7.465 | 0.575 | 7.56 | 0.48 | −0.04 | 0.58 | 7.424 | 0.616 |
| 29 | 7.46 | 7.466 | −0.006 | 7.504 | −0.044 | 7.53 | −0.07 | −0.04 | 0.56 | 7.553 | −0.093 |
| 30 | 7.76 | 7.766 | −0.006 | 7.781 | −0.021 | 7.80 | −0.04 | −0.04 | 0.82 | 7.637 | 0.123 |
| 31 | 7.52 | 7.477 | 0.043 | 7.502 | 0.018 | 7.55 | −0.03 | −0.04 | 0.58 | 7.624 | −0.104 |
| 32 | 7.71 | 7.724 | −0.014 | 7.710 | 0.000 | 7.93 | −0.22 | −0.04 | 0.95 | 7.583 | 0.127 |
Fig. 3.
The plots of experimental and predicted activities based on training and test sets.
Table 4.
Verification parameter data statistics results.
| Parameters | Criterion | CoMFA | Topomer CoMFA | CoMSIA | HQSAR 10–4 |
|---|---|---|---|---|---|
| $r^2_{pred}$ (external validation) | > 0.6 | 0.832 | 0.674 | 0.805 | 0.678 |
| $R^2$ (Golbraikh–Tropsha method) | > 0.6 | 0.8471 | 0.8226 | 0.8179 | 0.7508 |
| $R_0^2$ | – | 0.8392 | 0.8150 | 0.8168 | 0.7503 |
| $R_0'^2$ | – | 0.8078 | 0.8129 | 0.7854 | 0.7255 |
| $(R^2 - R_0^2)/R^2$ | < 0.1 | 0.0734 | 0.0051 | 0.0754 | 0.0650 |
| $(R^2 - R_0'^2)/R^2$ | < 0.1 | 0.0793 | 0.0052 | 0.0816 | 0.0695 |
| $k$ | 0.85 ≤ $k$ ≤ 1.15 | 0.9975 | 0.9939 | 0.9942 | 0.9967 |
| $k'$ | 0.85 ≤ $k'$ ≤ 1.15 | 1.0019 | 1.0049 | 1.0047 | 1.0022 |
| $r_m^2$ | > 0.5 | 0.6828 | 0.7572 | 0.6841 | 0.6576 |
| $\Delta r_m^2$ | < 0.2 | 0.1550 | 0.0585 | 0.1659 | 0.1532 |
| $Q^2_{F1}$ | > 0.7 | 0.8388 | 0.8123 | 0.8082 | 0.7507 |
| $Q^2_{F2}$ | > 0.7 | 0.8354 | 0.8088 | 0.8054 | 0.7503 |
| RMSE | Close to 0 | 0.2351 | 0.3165 | 0.2648 | 0.3187 |
| MAE | Close to 0 | 0.1657 | 0.2573 | 0.1788 | 0.2383 |
| CCC | ≥ 0.85 | 0.9033 | 0.9726 | 0.8908 | 0.8607 |
The models gave low RMSE and MAE and high CCC values of 0.2351, 0.1657, and 0.9033 (CoMFA), 0.3165, 0.2573, and 0.9726 (Topomer CoMFA), 0.2648, 0.1788, and 0.8908 (CoMSIA), and 0.3187, 0.2383, and 0.8607 (HQSAR) for the training and test sets, respectively. In addition, the $Q^2_{F1}$ and $Q^2_{F2}$ values of the QSAR models are 0.8388 and 0.8354 (CoMFA), 0.8123 and 0.8088 (Topomer CoMFA), 0.8082 and 0.8054 (CoMSIA), and 0.7507 and 0.7503 (HQSAR), respectively. In summary, the four optimal models produced satisfactory internal and external validation parameters, which shows that the QSAR models are stable and have good predictive ability.
3.5. Y-randomization test and applicability domain validation
The Y-randomization test was performed on the QSAR models to evaluate their robustness and to rule out chance correlations [52]. Ten random trials were run for the PLS models (Table S4), and all of them gave small average values of $q^2$ and $r^2$. This shows that the good results of the original models are not due to chance correlation or to a structural dependency of the training set. The applicability domain (AD) is the theoretical region of chemical space, defined by the model descriptors and the model response, in which predictions are reliable [53]. A robust and predictive QSAR model that is to be used for new compounds should be accompanied by a defined applicability domain [54]. The applicability domain test was performed using the standardization method developed by Roy (http://dtclab.webs.com/software-tools). No compound lies outside the AD of any of the QSAR models, which is consistent with the fact that these SARS-CoV-2 inhibitors share a common structural type; no compound diverges from the others.
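The logic of the Y-randomization test can be illustrated with a small sketch. A one-descriptor correlation is used here as a stand-in for the PLS models of the study, and the descriptor values, activities, and function names are hypothetical.

```python
import random

def r_squared(xs, ys):
    """Squared Pearson correlation between a descriptor and the response."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def y_randomization(xs, ys, trials=10, seed=0):
    """Recompute r2 after shuffling the response values `trials` times."""
    rng = random.Random(seed)
    return [r_squared(xs, rng.sample(ys, len(ys))) for _ in range(trials)]

# Illustrative, evenly spaced descriptor and steadily increasing activities
descriptor = [float(i) for i in range(10)]
activity = [6.1, 6.4, 6.5, 6.9, 7.2, 7.3, 7.7, 7.9, 8.0, 8.1]
original_r2 = r_squared(descriptor, activity)
randomized_r2 = y_randomization(descriptor, activity, trials=10)
```

If the scrambled responses gave statistics comparable to the original model, the original correlation would have to be regarded as a chance result.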
3.6. CoMFA and Topomer CoMFA contour map analysis
The contour maps of the CoMFA, Topomer CoMFA, and CoMSIA models explain the important structural features through the favorable and unfavorable regions of each field. In steric contour maps, green contours mark regions where bulky substituents increase activity, while yellow contours mark regions where bulky substituents decrease activity. In electrostatic contour maps, red contours mark regions where electronegative groups are favored, and blue contours mark regions where electropositive groups are favored. In the CoMFA steric map in Fig. 4(a), large green contours cover the 2nd and 3rd positions of the phenyl moiety of compound 23, which suggests that a bulky group is favorable in this region. This agrees with the fact that the activity of compound 19 (pIC50 = 6.293) is higher than compound 18 (pIC50 = 6.918) and compound 20 (pIC50 = 6.380). The green contours at the position where the phenyl ring is introduced indicate that a bulky group is also favorable there; for example, compound 2, with an O atom, has higher activity than compound 1, with a C atom. Furthermore, a large yellow contour around the R1 substituent suggests that a less bulky group here increases the inhibitory activity; for example, the activity of compound 21 (pIC50 = 8.12) is higher than that of compound 3 (pIC50 = 7.78).
Fig. 4.
The 3D contour maps of the steric and electrostatic fields of the CoMFA and Topomer CoMFA models with compound 23 as the template. (a): The steric contour map of the CoMFA model. (b): The electrostatic contour map of the CoMFA model. (c): The steric contour map of the Topomer CoMFA model for the Ra fragment of compound 23. (d): The electrostatic contour map of the Topomer CoMFA model for the Ra fragment of compound 23. (e): The steric contour map of the Topomer CoMFA model for the Rb fragment of compound 23. (f): The electrostatic contour map of the Topomer CoMFA model for the Rb fragment of compound 23.
Fig. 4(b) shows the electrostatic contour map of the CoMFA model for compound 23. A large red area around the 2nd and 3rd positions of the phenyl moiety of compound 23 indicates that an electron-withdrawing (electronegative) substituent here will enhance the biological activity. This might be the reason that the pIC50 of compound 30 (pIC50 = 7.76, 2nd = -F) is higher than that of compound 29 (pIC50 = 7.46, 2nd = -Cl). The blue contours at the position where the phenyl ring is introduced suggest that electron-donating (electropositive) substituents are favorable in this region, which agrees with the order of biological activities: compound 16 (pIC50 = 6.82, R2 = indole substituent) > compound 17 (pIC50 = 6.52, R2 = pyrazolo[1,5-a]pyridine substituent) > compound 18 (pIC50 = 6.28, vinylbenzene substituent).
Fig. 4(c) and (d) show the steric and electrostatic contour maps of the Topomer CoMFA model for the Ra fragment, respectively. Because the Ra fragments vary little across the 32 inhibitors (the R1 groups are the same within compounds 1–14 and within compounds 15–32), their contour maps are similar. As in the CoMFA steric contour map, the large yellow contour around the R1 substituent suggests that a less bulky group is favored here. This is consistent with the Topomer CoMFA contribution results, in which the contribution of the R1 substituent of compounds 1–14 is higher than that of compounds 15–32 (Table 3).
Fig. 4(e) shows the steric contour map of Topomer CoMFA for the Rb fragment. The region where the R2 substituent is connected to the amide group is wrapped in green contours, which indicates that a bulky group is favorable in this region. This explains why the activity of compound 20 (pIC50 = 6.43) is higher than that of compound 24 (pIC50 = 6.42). In addition, the green contours around the 3rd position of the phenyl moiety of compound 23 suggest that a bulky group here increases the SARS-CoV-2 inhibitory activity. For example, the activity of compound 21 (pIC50 = 8.12, 3rd of phenyl = -F) is higher than that of compound 24 (pIC50 = 6.24, 3rd of phenyl = -H, 4th of phenyl = -N(CH3)2). The yellow contours around the 4th position of the phenyl moiety of compound 23 suggest that a less bulky group here increases the SARS-CoV-2 inhibitory activity. For instance, the activity decreases in the order compound 28 (pIC50 = 8.04, 4th of phenyl = Cl) > compound 25 (pIC50 = 7.44, 4th of phenyl = –OCH3) > compound 27 (pIC50 = 7.21, 3rd and 4th of phenyl = 1,4-dioxane).
Fig. 4(f) shows the electrostatic contour map of Topomer CoMFA for the Rb fragment. The red contours near the 2nd, 3rd, and 4th positions of the phenyl ring of compound 23 indicate that introducing electronegative groups will increase the inhibitory activity. This is reflected in the relatively low inhibitory activity of compound 1 (pIC50 = 6.34) and compound 15 (pIC50 = 6.13), which have no electronegative group on the phenyl ring. The blue contours around the position where the phenyl ring is introduced indicate that electropositive groups are favored here, which could explain why compound 2 (pIC50 = 7.28), which has an electropositive group at this position, is more active than compound 1 (pIC50 = 6.34).
3.7. CoMSIA contour map analysis
The CoMSIA steric contour map was similar to the CoMFA and Topomer CoMFA contour maps discussed above (Fig. 5(a)). Thus, only the hydrophobic field of CoMSIA was analyzed. In the hydrophobic contour map (Fig. 5(b)), cyan contours show regions where hydrophobic groups are favorable, while magenta contours show regions where they are unfavorable. The cyan favorable region is found around the 2nd and 3rd positions of the phenyl ring of the R2 substituent, which indicates that adding hydrophobic substituents in this region would increase the activity. For instance, compound 6 (pIC50 = 7.84), with a –OCH3 substituent at the 3rd position of the phenyl ring, has higher activity than compound 8 (pIC50 = 7.43), with a less hydrophobic substituent (3rd of phenyl = -F). The magenta unfavorable region is found around the 4th position of the phenyl ring of the R2 substituent, which indicates that adding hydrophobic substituents in this region would decrease the activity. For instance, the activity of compound 21 (4th of phenyl = -H) is higher than that of compound 22 (4th of phenyl = -F). A magenta unfavorable region is also found at the R1 substituent, which reveals that R1 should be a less hydrophobic substituent. This is further verified by the Topomer CoMFA contribution results.
Fig. 5.
The 3D contour maps of the steric (a) and hydrophobic (b) fields of the CoMSIA model with compound 23 as the template.
3.8. HQSAR color code map analysis
The HQSAR contribution map visualizes the results of the HQSAR model by color-coding individual atoms. The color of each atom reflects its contribution to the overall activity of the compound. The highest-activity compound 21 (pIC50 = 8.12) and the moderate-activity compound 15 (pIC50 = 6.13) were chosen to explore the relationship between activity and atomic contribution. As shown in Fig. 6(a), the orange color at the R1 substituent reveals that the cyclopentyl group makes a negative contribution, which is consistent with the Topomer CoMFA results. In addition, the orange color at the 2nd and 4th positions of the pyridine ring suggests that pyridine substituents here do not increase the SARS-CoV-2 inhibitory activity, and that the five-membered ring in the R2 substituent should not be a pyridine ring. The color code map of the highest-activity compound 21 is shown in Fig. 6(b). The green atom at the junction of the benzene ring and the amide group indicates that this methylene group improves the inhibitory activity; for example, compound 16 (pIC50 = 6.82) and compound 17 (pIC50 = 6.53) have lower inhibitory activity than the compounds with a methylene group. Furthermore, the carbon atoms of the phenyl ring make a positive contribution (yellow or green) except for the C atom at the 2nd position. However, the hydrogen atoms at the 4th and 2nd positions show positive contributions, which suggests that the 4th and 2nd positions of the phenyl ring should carry hydrogen atoms rather than other groups. This is reflected in the inhibitory activities: compound 21 (pIC50 = 8.21, 4th = -H) is higher than compound 22 (pIC50 = 8.21, 4th = -F), and compound 12 (pIC50 = 7.72, 4th = -H) is higher than compound 32 (pIC50 = 7.5, 2nd = -CH3).
Fig. 6.
Atomic contribution maps of compound 15 (a) and compound 23 (b). (Cyan represents the common backbone. Green or yellow indicates a positive contribution, while orange and red indicate a negative contribution. Atoms shown in white make an intermediate contribution to the biological activity.)
3.9. Molecular docking evaluations
Molecular docking analysis was used to clarify the binding modes between the bicycloproline-containing compounds and the SARS-CoV-2 Mpro. The root mean square deviation (RMSD) between the co-crystallized (homologous) ligand and the re-docked ligand is a criterion for docking reliability. Herein, six SARS-CoV-2 Mpro-ligand systems were re-docked to determine the most suitable docking protocol. As shown in Table S5, the RMSD values between the co-crystallized ligand and the re-docked ligand for all six SARS-CoV-2 Mpro complexes processed by Autodock Vina are within 2 Å. The minimum RMSD value is 1.0111 Å (7D3I), which indicates that this system is the most likely to reproduce the native binding mode between the ligand and the protein. As shown in Fig. S2, the re-docked ligand (yellow) and the co-crystallized ligand (blue) are almost completely superimposed. On this basis, the Autodock Vina docking protocol was chosen to reveal the binding modes of SARS-CoV-2 Mpro with the inhibitors.
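The re-docking criterion above can be illustrated with a short sketch of a heavy-atom RMSD calculation. It assumes that both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is applied, and it ignores symmetry-equivalent atoms. The coordinates are made up for illustration and the function name `pose_rmsd` is hypothetical.

```python
import numpy as np

def pose_rmsd(ref_xyz, pose_xyz):
    """RMSD (same unit as the input, e.g. Å) between two poses of one ligand.

    Both arrays have shape (n_atoms, 3) with identical atom ordering.
    No fitting is done, since re-docked and crystal poses share the receptor frame.
    """
    ref = np.asarray(ref_xyz, float)
    pose = np.asarray(pose_xyz, float)
    if ref.shape != pose.shape:
        raise ValueError("poses must contain the same atoms in the same order")
    sq_dist = np.sum((ref - pose) ** 2, axis=1)   # squared displacement per atom
    return float(np.sqrt(sq_dist.mean()))

# Made-up coordinates: the re-docked pose is the crystal pose shifted by 1 Å.
demo_crystal = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
demo_redocked = [[0.6, 0.8, 0.0], [2.1, 0.8, 0.0], [2.1, 2.3, 0.0]]
demo_rmsd = pose_rmsd(demo_crystal, demo_redocked)
demo_within_cutoff = demo_rmsd <= 2.0   # the 2 Å criterion used in the text
```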
Fig. 7 illustrates the binding pocket of the receptor and the key residues that interact with compound 23. Amino acid residues Phe140, Cys145, and His164 form hydrogen bonds with the N and O atoms of the aligned common skeleton (Phe140-OH···N, 3.26 Å, 119.00°; Cys145-N···HO, 2.89 Å, 102.9°; His164-OH···N, 2.88 Å, 117.46°). These multiple independent hydrogen bonds help to keep the ligand stable at the active site. A hydrophobic cavity is formed by amino acid residues His41, Met49, and Met165 around the R1 substituent. This is consistent with the CoMSIA hydrophobic field contour map results. Amino acid residue Glu166 forms a hydrogen bond with the amide group between the R1 and R2 substituents (Glu166-N···HO, 2.91 Å, 125.52°). Importantly, amino acid residue Cys145 also forms a strong covalent bond with the aldehyde warhead, which is essential for antiviral activity [22,55]. In addition, interactions between the F atom on the benzyl ring of the R2 substituent and amino acid residues Pro168 and Thr190 were observed.
Fig. 7.
(a) 3D and (b) 2D views of the binding conformations and ligand interactions of the most active compound 23 at the active site of SARS-CoV-2 Mpro (7D3I). (Visualized by Discovery Studio Visualizer 2021).
Fig. 8 shows the docking results of compound 15, which has moderate activity. Similarly, amino acid residues Phe140, Gly143, Ser144, Cys145, and His164 form hydrogen bonds with the N and O atoms of the aligned common skeleton of compound 15 (Phe140-O···HN, 1.99 Å, 136.55°; Gly143-NH···O, 2.82 Å, 117.67°; Ser144-NH···O, 3.37 Å, 122.33°; Cys145-S···HN, 2.73 Å, 138.17°; His164-O···HN, 2.71 Å, 150.93°). The R1 substituent in the hydrophobic cavity shows a hydrophobic interaction with amino acid residue Met49. In compound 15, there are only weak interactions at the R2 substituent because of the piperidinyl group. Moreover, the direct connection between the piperidine group and the amide group hinders the formation of a hydrogen bond. Strengthening the interactions at the R2 substituent may therefore lead to a more stable binding mode.
Fig. 8.
(a) 3D and (b) 2D views of the binding conformations and ligand interactions of the moderately active compound 15 at the active site of SARS-CoV-2 Mpro.
3.10. Prediction of new compounds with promising activity
The interpretation of the QSAR models provides design guidelines for developing potent SARS-CoV-2 Mpro inhibitors: (a) bulky, electronegative groups at the 2nd, 3rd, and 4th positions of the benzene ring increase the SARS-CoV-2 Mpro inhibitory activity; (b) bulky, electron-donating groups at the position where the phenyl ring is introduced improve the activity; (c) the R1 substituent should be less bulky to improve the inhibitory activity. Based on the validated multi-QSAR modeling approaches, six new potential SARS-CoV-2 Mpro inhibitors were designed by modifying the R2 substituent (the bulky, relatively electronegative groups –CH2CH3, -CN, -F, and -Br were added to the benzene ring, and the group at the position where the phenyl ring is introduced was replaced by bulky, electron-donating groups selected from the data set) and by retaining a less bulky R1 substituent (the R1 group of compounds 1–14). Table 5 shows the structures of the designed compounds N1–N6 along with their predicted pIC50 values for SARS-CoV-2 Mpro. All the newly designed inhibitors showed high predicted pIC50 values compared to compound 23. This suggests that these compounds are potential candidates as SARS-CoV-2 inhibitors. However, these compounds still need to be synthesized and evaluated in vitro and in vivo to confirm their efficacy in inhibiting SARS-CoV-2 Mpro.
Table 5.
Novel-designed compounds with their predicted activities.
| Comp. | Structure | Pred. pIC50 (CoMFA) | Pred. pIC50 (CoMSIA) | Pred. pIC50 (Topomer CoMFA) | Pred. pIC50 (HQSAR) |
|---|---|---|---|---|---|
| N1 | ![]() | 7.659 | 8.012 | 8.06 | 8.269 |
| N2 | ![]() | 7.747 | 7.772 | 7.86 | 7.827 |
| N3 | ![]() | 7.626 | 7.732 | 7.91 | 7.642 |
| N4 | ![]() | 7.638 | 7.708 | 8.01 | 7.665 |
| N5 | ![]() | 7.893 | 7.955 | 7.91 | 8.365 |
| N6 | ![]() | 7.980 | 7.611 | 7.84 | 7.615 |
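To relate the predicted pIC50 values in Table 5 to concentrations, the following sketch converts pIC50 to IC50 in nanomolar units. It assumes the conventional definition pIC50 = −log10(IC50 in mol/L), which this section does not state explicitly. The function name is hypothetical, and only the CoMFA column of Table 5 is used as input.

```python
import math

def pic50_to_ic50_nM(pic50):
    """Convert pIC50 to IC50 in nM, assuming pIC50 = -log10(IC50 [mol/L])."""
    return 10.0 ** (9.0 - pic50)   # 1 mol/L = 1e9 nM

# Predicted pIC50 values from the CoMFA column of Table 5.
comfa_pred = {"N1": 7.659, "N2": 7.747, "N3": 7.626,
              "N4": 7.638, "N5": 7.893, "N6": 7.980}

comfa_ic50_nM = {name: pic50_to_ic50_nM(p) for name, p in comfa_pred.items()}
all_nanomolar = all(1.0 <= v < 1000.0 for v in comfa_ic50_nM.values())
```

Under this assumption, a predicted pIC50 between 7 and 9 corresponds to an IC50 between 100 nM and 1 nM, so every CoMFA prediction in Table 5 falls in the nanomolar range.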
Table 6 lists the Autodock Vina docking scores and ligand interactions of the newly designed compounds, which show favorable docking scores from −7.1 to −8.0 kcal/mol. The 2D and 3D views of the ligand interactions of the newly designed compounds are shown in Fig. S3 and Fig. S4, respectively. The docking poses of the newly designed compounds in the active site of SARS-CoV-2 Mpro are similar to those of compound 23 and compound 15, which further supports that these compounds have strong potential to produce inhibitory effects similar to those of the compounds in the data set. For compound N1, which has a relatively low docking score, the R1 and R2 substituents mainly have hydrophobic interactions with amino acid residues Leu27 and Met49 at the active site of the SARS-CoV-2 Mpro, and the remaining common skeleton forms stable hydrogen bonds with amino acid residues Asn142, Ser144, and His164 (Asn142-OH···O, 3.34 Å, 107.68°; Ser144-OH···O, 2.98 Å, 166.46°; His164-O···HN, 2.91 Å, 135.20°). In addition, unlike in the other compounds, the pyrrolidine group of the common skeleton of compound N1 does not form hydrophobic or hydrogen-bonding interactions at the active site of SARS-CoV-2 Mpro. This explains why compound N1 has a lower docking score than the other newly designed compounds. For compound N6, which has a relatively high docking score, the newly introduced -CN group on the phenyl ring forms a stable hydrogen bond with amino acid residue Gln192 (Gln192-NH···N, 3.23 Å). In addition, strong hydrogen bonds with amino acid residues Phe140, Gly143, Cys145, and His164 are observed at the aligned backbone (Phe140-O···HN, 2.09 Å, 162.19°; Gly143-NH···O, 3.16 Å, 127.44°; Cys145-SH···O, 3.68 Å, 107.5°; Cys145-S···HN, 2.42 Å, 145.19°; His164-O···HN, 2.52 Å, 130.39°). A hydrophobic interaction with amino acid residue Met49 is also observed at the R1 substituent. These hydrophobic interactions and hydrogen bonds stabilize the ligand at the active site of the protein, which also explains the high docking score of compound N6.
Table 6.
Novel designed compounds with their Autodock Vina fitness scores and amino acids interactions.
| Comp. | Structure | Binding energy score (kcal/mol) | Hydrogen bond interacting residues | Hydrophobic interactions |
|---|---|---|---|---|
| N1 | ![]() | −7.1 | Asn142, Ser144, His164 | Leu27, Met49 |
| N2 | ![]() | −7.5 | Phe140, Cys145, His164 | Met49, Leu167, Pro168 |
| N3 | ![]() | −7.4 | Phe140, Cys145, His164 | Met49 |
| N4 | ![]() | −7.9 | Phe140, Gly143, Cys145, His164, Glu166 | Met49, Pro168 |
| N5 | ![]() | −7.8 | Phe140, Gly143, Cys145, His164, Gln192 | Met49 |
| N6 | ![]() | −8.0 | Phe140, Gly143, Cys145, His164, Gln192 | Met49 |
Taken together, the R1 substituents in the active site mainly rely on hydrophobic interactions with amino acid residues Leu27 and Met49 to maintain a stable conformation, whereas the R2 substituents rely on hydrophobic interactions with amino acid residues Met165, Leu167, Pro168, and Thr190 and on a hydrogen bond with amino acid residue Gln192. Several stable hydrogen bonds can be formed at the aligned backbone position with amino acid residues Phe140, Gly143, Cys145, His164, and Glu166.
3.11. Molecular dynamics simulations
To validate the intrinsic atomic interactions and binding conformations of the newly designed compounds, 50 ns MD simulations were performed on the complexes of Mpro with the compounds that have higher docking scores (N3, N4, N5, and N6) and on 7D3I (compound 23 with Mpro). The overall stability and convergence of each protein-ligand system were monitored by the RMSD values of the backbone atoms and ligand atoms relative to the initial structure over the simulation time. The 7D3I and SARS-CoV-2 Mpro-N4 systems are stable after the first 5 ns of simulation, while the other SARS-CoV-2 Mpro-ligand systems gradually stabilize after 20 ns (Fig. 9(a)). The backbone atoms of SARS-CoV-2 Mpro show fluctuations < 3 Å, and the 7D3I and SARS-CoV-2 Mpro-N4 systems show lower fluctuations of < 2 Å. This reveals that the fluctuation of the SARS-CoV-2 Mpro during the simulation is within an acceptable range. The RMSD values of the ligands over time are shown in Fig. 9(b). The N3 ligand fluctuates more during the initial part of the simulation (from 0.1 nm to 0.4 nm) and is stable after 35 ns, whereas the RMSD values of the other ligands remain between 0.1 nm and 0.2 nm throughout the simulation.
Fig. 9.
RMSD values of protein backbone (a) and RMSD values of ligands (b) at the active site of SARS-CoV-2 Mpro during 50 ns MD simulation.
The root mean square fluctuation (RMSF) reflects the movement of each amino acid residue during the MD simulation. The RMSF values versus residue number for the five simulated systems are shown in Fig. 10(a). The five complexes have similar RMSF distributions, which indicates that the binding of these inhibitors has similar effects on the fluctuation patterns of the protein residues. The smaller fluctuations in regions 20–50 and 140–200 are associated with the residues in the active site of SARS-CoV-2 Mpro, such as Leu27 and Met49 in region 20–50, and Gly143, Cys145, His164, Met165, Pro168, and Gln192 in region 140–200. The hydrophobic or hydrogen-bond interactions between the ligand and the residues in these regions reduce the fluctuations. This is consistent with the docking results.
Fig. 10.
Comparison and detail representation of (a) RMSF: Root mean square fluctuations, (b) Rg: Radius of gyration, (c) SASA: Solvent accessible surface area, and (d) H-bond numbers between protein and ligand over the time of 50 ns of 7D3I and all the selected complexes.
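The RMSF shown in panel (a) can be sketched as below for a trajectory stored as a NumPy array. This is an illustrative calculation under stated assumptions: the frames are taken to be already aligned to a common reference, one representative atom per residue is used, and the array and function names are hypothetical.

```python
import numpy as np

def rmsf(traj):
    """Per-atom RMSF from an aligned trajectory of shape (n_frames, n_atoms, 3).

    RMSF_i = sqrt( < |r_i(t) - <r_i>|^2 >_t ), in the unit of the coordinates.
    """
    traj = np.asarray(traj, float)
    mean_pos = traj.mean(axis=0)                      # time-averaged position per atom
    sq_dev = np.sum((traj - mean_pos) ** 2, axis=2)   # squared deviation per frame and atom
    return np.sqrt(sq_dev.mean(axis=0))

# Toy trajectory: atom 0 oscillates between x = -1 and x = +1, atom 1 is static.
demo_traj = [
    [[-1.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
    [[1.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
    [[-1.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
    [[1.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
]
demo_rmsf = rmsf(demo_traj)
```

Residues that are held in place by ligand contacts behave like the static atom in this toy example and therefore show small RMSF values.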
The radius of gyration (Rg) represents the change in the compactness of the protein after the ligand binds. The average Rg values are 2.17, 2.21, 2.20, 2.19, and 2.21 nm for 7D3I, Mpro-N3, Mpro-N4, Mpro-N5, and Mpro-N6, respectively. The radius of gyration of Mpro-N3 fluctuates noticeably and then equilibrates at around 30 ns (Fig. 10(b)). The Rg results show that all Mpro-ligand systems reach a similar level of compactness, which indicates that each compound can interact with SARS-CoV-2 Mpro without affecting its structural folding in the dynamic environment.
Solvent accessible surface area (SASA) is the surface area of the complex that is accessible to solvent molecules, and it is used to examine interactions between the complex and the solvent during the simulation. The SASA values for the five complexes were calculated and plotted versus simulation time in Fig. 10(c); the average values for 7D3I, Mpro-N3, Mpro-N4, Mpro-N5, and Mpro-N6 are 166, 170, 168, 167, and 166 nm², respectively. These similar average SASA values reveal no major differences in the solvent-exposed protein structure.
The hydrogen bonding between the protein and the ligand is a crucial factor in maintaining the stability of the complex conformation. Fig. 10(d) shows the number of hydrogen bonds between the ligand and the protein in the five complexes during the 50 ns simulation. The number of hydrogen bonds varies for 7D3I (0–5), Mpro-N3 (0–3), Mpro-N4 (1–4), Mpro-N5 (0–3), and Mpro-N6 (1–5). Compared with the other complexes, the Mpro-N4 system maintains a more stable 2–3 hydrogen bonds during the simulation. This demonstrates that the N4 ligand has better stability at the active site of SARS-CoV-2 Mpro.
Principal component analysis can describe the relative movement of complex systems, with the overall motion of the protein captured by a few key eigenvectors. The projections onto the first two eigenvectors for SARS-CoV-2 Mpro in complex with the selected compounds, extracted from the respective MD trajectories, are shown as clusters in Fig. 11. The Mpro-ligand systems have similar cluster patterns. A complex that occupies less phase space and shows a stable cluster has higher stability, while a complex that occupies more space and shows an unstable cluster has lower stability. The Mpro-N3 and Mpro-N5 systems occupy more space, followed by the Mpro-N6 system, and show less stable clusters than the other complexes. The 7D3I and Mpro-N4 complexes occupy less phase space than the other complexes and show a stable cluster. These observations suggest that the stability of the newly designed compounds is similar to that of 7D3I, and that they can ultimately restrict the essential protein motions required for enzymatic activity, which is also supported by the RMSD, RMSF, SASA, Rg, protein-ligand interaction profiling, and the docking scores (Table 6, Figs. 9 and 10).
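For reference, the radius of gyration of a single frame can be computed as in the sketch below. It assumes the mass-weighted definition about the center of mass; the coordinates, masses, and function name are illustrative and not taken from the simulations.

```python
import numpy as np

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame.

    coords: (n_atoms, 3) positions; masses: (n_atoms,) atomic masses.
    Rg = sqrt( sum_i m_i |r_i - r_com|^2 / sum_i m_i )
    """
    coords = np.asarray(coords, float)
    masses = np.asarray(masses, float)
    com = np.average(coords, axis=0, weights=masses)   # center of mass
    sq_dist = np.sum((coords - com) ** 2, axis=1)
    return float(np.sqrt(np.sum(masses * sq_dist) / masses.sum()))

# Two equal masses placed 2 length units apart: each is 1 unit from the center of mass.
demo_rg = radius_of_gyration([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [12.0, 12.0])
```

Averaging this quantity over all frames of a trajectory gives a single compactness value per complex, comparable to the averages quoted above.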
Fig. 11.
Projection of the motion of the SARS-CoV-2 Mpro in phase space along the PC1 and PC2.
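The projection onto PC1 and PC2 can be sketched as a standard covariance-matrix PCA. This is an illustration under stated assumptions: the trajectory is already aligned, the coordinates are random numbers rather than simulation output, and the function name is hypothetical.

```python
import numpy as np

def pca_projection(traj, n_components=2):
    """Project an aligned trajectory onto its leading principal components.

    traj: array of shape (n_frames, n_atoms, 3).
    Returns (projections, eigenvalues) with projections of shape
    (n_frames, n_components) and eigenvalues sorted in descending order.
    """
    traj = np.asarray(traj, float)
    X = traj.reshape(traj.shape[0], -1)        # flatten to (n_frames, 3 * n_atoms)
    X = X - X.mean(axis=0)                     # remove the average structure
    cov = X.T @ X / (X.shape[0] - 1)           # covariance matrix of the coordinates
    evals, evecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]
    return X @ evecs[:, order], evals[order]

# Random toy trajectory: 50 frames, 4 atoms.
demo_pca_traj = np.random.default_rng(1).normal(size=(50, 4, 3))
demo_proj, demo_evals = pca_projection(demo_pca_traj)
demo_pc1_variance = float(np.var(demo_proj[:, 0], ddof=1))
```

Each frame becomes one point in the PC1/PC2 plane, and the spread of the resulting cloud is what the text refers to as the phase space occupied by a complex.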
3.12. MM/PBSA binding free energy calculation
The binding free energies of the five complexes were calculated using the MM-PBSA approach, and the results are summarized in Table 7. The binding free energies of the designed compounds N3 (−81.862±13.081 kJ/mol), N4 (−75.300±13.868 kJ/mol), and N6 (−74.266±18.405 kJ/mol) are more negative, and therefore more favorable, than that of the most active compound 23 (7D3I: −69.027±11.484 kJ/mol). The van der Waals energy term is the main driving force for the molecular recognition of SARS-CoV-2 Mpro by the selected ligands, followed by the electrostatic energy term, which also contributes substantially to strengthening the binding mode. These results suggest that the newly designed compounds N3, N4, and N6 bind the residues in the active site of SARS-CoV-2 Mpro more strongly than the most active compound 23, and confirm that the structural modification information provided by the QSAR models for designing novel potent SARS-CoV-2 Mpro inhibitors is credible.
Table 7.
Average binding free energies (kJ/mol) of ligands obtained from MD simulation with their energy terms.
| Energy Terms (kJ/mol) | 7D3I | Mpro-N3 | Mpro-N4 | Mpro-N5 | Mpro-N6 |
|---|---|---|---|---|---|
| Van der Waals energy | −137.326±13.519 | −142.122±15.282 | −155.720±17.691 | −133.824±13.860 | −173.253±11.491 |
| Electrostatic energy | −12.710±7.669 | −37.786±16.039 | −60.982±11.120 | −20.653±9.1232 | −27.571±14.523 |
| Polar solvation energy | 98.593±18.267 | 110.874±27.662 | 160.117±15.581 | 114.815±17.675 | 145.169±22.998 |
| SASA energy | −17.584±1.68 | −14.828±1.961 | −18.716±1.715 | −17.141±1.299 | −18.611±1.061 |
| Binding free energy | −69.027±11.484 | −81.862±13.081 | −75.300±13.868 | −56.803±9.870 | −74.266±18.405 |
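The relation between the rows of Table 7 can be checked with a short calculation. The sketch assumes the usual MM-PBSA decomposition, in which the binding free energy is the sum of the van der Waals, electrostatic, polar solvation, and SASA (nonpolar solvation) terms, and that the four component rows of Table 7 appear in that order. The dictionary keys are hypothetical labels, and only the mean values of two columns are used.

```python
# Mean energy terms (kJ/mol) copied from Table 7, in the assumed row order.
TERMS = {
    "7D3I":    {"vdw": -137.326, "elec": -12.710, "polar_solv": 98.593, "sasa": -17.584},
    "Mpro-N6": {"vdw": -173.253, "elec": -27.571, "polar_solv": 145.169, "sasa": -18.611},
}

def binding_energy(terms):
    """Sum the four MM-PBSA components to obtain the binding free energy (kJ/mol)."""
    return terms["vdw"] + terms["elec"] + terms["polar_solv"] + terms["sasa"]

def gas_phase_energy(terms):
    """Molecular-mechanics part only: van der Waals plus electrostatic energy."""
    return terms["vdw"] + terms["elec"]

totals = {name: round(binding_energy(t), 3) for name, t in TERMS.items()}
```

The sums reproduce the binding free energies reported for 7D3I and Mpro-N6, and they show how the unfavorable polar solvation term offsets part of the favorable van der Waals and electrostatic contributions.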
The amino acid residues involved in the interactions between SARS-CoV-2 Mpro and the ligands were identified by monitoring the per-residue binding free energy contributions (Fig. 12). The amino acid residues Leu27, Met49, Cys145, His164, Met165, and Pro168 make favorable (more negative) contributions to the binding free energy for all five ligands. These residues act through hydrophobic interactions, except for Cys145 and His164, which form H-bond interactions. This pattern is common to almost all the studied SARS-CoV-2 Mpro-ligand systems. The amino acid residues Ala191 and Gln192 make favorable contributions with N5 and N6, which may be because the newly introduced -CN group on the phenyl ring has an H-bond interaction with amino acid residue Gln192 and hydrophobic interactions with Ala191. Furthermore, the amino acid residues Met49, His164, and Met165 contribute more than the others in the five complexes, which indicates their significant roles in ligand binding. These findings confirm the docking results (Table 6). Interestingly, the important amino acid residue Cys145 does not show a high binding free energy contribution in the simulations, and the binding affinity between the ligands and SARS-CoV-2 Mpro depends more on hydrophobic interactions.
Fig. 12.
Plots of binding free energy decomposition for the five complex systems.
3.13. ADMET and drug-likeness prediction
The ADME properties (absorption, distribution, metabolism, and excretion) and toxicity of the newly designed compounds were estimated to assess the drug-likeness of the strong binders, which is an important criterion for advancing any compound to clinical trials. ADMET scores for the newly designed compounds are listed in Table 8. The designed compounds show good human intestinal absorption (HIA+: 0.053–0.31) except N3 (0.574), and good human oral bioavailability (F20%+: 0.023–0.079, with N1 at 0.127), which indicates that the designed compounds are well absorbed and utilized in the human body. Madin-Darby Canine Kidney (MDCK) cell permeability is widely regarded as the gold standard for in vitro evaluation of the absorption efficiency of chemical substances in the body. The designed compounds give medium MDCK permeability values (2 × 10⁻⁶ to 1 × 10⁻⁵ cm/s). Drugs with high protein-binding rates may have a low therapeutic index. The PPB predictions indicate that all designed compounds fall within the optimal plasma protein binding range (PPB < 90%). The BBB+ probabilities are low for all designed compounds (0.069–0.201). The metabolism predictions suggest that the newly designed compounds are potential CYP3A4 substrates except for compound N4. The designed compounds have moderate total clearance rates except for compound N3 (1.35 mL/min/kg), which indicates that N3 is cleared from the body more slowly. Finally, none of the newly designed compounds is predicted to be AMES toxic (probabilities of 0.005–0.009).
Table 8.
ADMET prediction of newly designed compounds.
| Comp. | Absorption: HIA+ | Absorption: F20%+ | Absorption: MDCK permeability (cm/s) | Distribution: PPB (%) | Distribution: BBB+ | Metabolism: CYP2D6 substrate | Metabolism: CYP3A4 substrate | Excretion: total clearance (mL/min/kg) | Toxicity: AMES toxicity |
|---|---|---|---|---|---|---|---|---|---|
| N1 | 0.249 | 0.127 | 3e-06 | 55.03 | 0.116 | 0.225 | 0.556 | 6.434 | 0.007 |
| N2 | 0.053 | 0.038 | 3e-06 | 67.87 | 0.177 | 0.26 | 0.617 | 8.528 | 0.007 |
| N3 | 0.574 | 0.023 | 1e-05 | 83.78 | 0.201 | 0.234 | 0.766 | 1.35 | 0.005 |
| N4 | 0.13 | 0.075 | 2e-06 | 68.45 | 0.07 | 0.117 | 0.298 | 6.697 | 0.009 |
| N5 | 0.31 | 0.037 | 3e-06 | 59.81 | 0.085 | 0.207 | 0.515 | 6.422 | 0.007 |
| N6 | 0.293 | 0.079 | 4e-06 | 59.67 | 0.069 | 0.206 | 0.373 | 6.339 | 0.008 |
*Classification. HIA: Category 1: HIA+ (HIA < 30%); Category 0: HIA- (HIA ≥ 30%); the output value is the probability of being HIA+. F20%: Category 1: F20%+ (bioavailability < 20%); Category 0: F20%- (bioavailability ≥ 20%); the output value is the probability of being F20%+. MDCK permeability: low permeability: < 2 × 10⁻⁶ cm/s; medium permeability: 2–20 × 10⁻⁶ cm/s; high passive permeability: > 20 × 10⁻⁶ cm/s. PPB: optimal: < 90%; drugs with a high protein-binding rate may have a low therapeutic index. BBB: Category 1: BBB+; Category 0: BBB-; the output value is the probability of being BBB+. Clearance: high: > 15 mL/min/kg; moderate: 5–15 mL/min/kg; low: < 5 mL/min/kg.
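The numeric thresholds in the footnote can be applied to the rows of Table 8 as in the following sketch. The cutoffs are those stated in the footnote; the function names are hypothetical, and the example inputs are the N3 and N4 values from Table 8.

```python
def mdck_class(permeability_cm_s):
    """Classify MDCK permeability (cm/s) using the footnote thresholds."""
    if permeability_cm_s < 2e-6:
        return "low"
    if permeability_cm_s <= 20e-6:
        return "medium"
    return "high"

def clearance_class(cl_ml_min_kg):
    """Classify total clearance (mL/min/kg) using the footnote thresholds."""
    if cl_ml_min_kg < 5.0:
        return "low"
    if cl_ml_min_kg <= 15.0:
        return "moderate"
    return "high"

def ppb_is_optimal(ppb_percent):
    """Plasma protein binding is flagged as optimal when it is below 90%."""
    return ppb_percent < 90.0

# Values for N3 and N4 taken from Table 8.
n3_summary = (mdck_class(1e-05), ppb_is_optimal(83.78), clearance_class(1.35))
n4_summary = (mdck_class(2e-06), ppb_is_optimal(68.45), clearance_class(6.697))
```

With these cutoffs, N3 is the only designed compound whose clearance falls in the low class, while its MDCK permeability and PPB fall in the same classes as those of N4.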
Drug-likeness prediction results and synthetic accessibility for all designed compounds are given in Table S6. All designed compounds are accepted by the Lipinski Rule, the Pfizer Rule, and the Golden Triangle, and are rejected by the GSK Rule. This is because the molecular weights of the newly designed compounds, like those of the compounds in the data set, are higher than 400 g/mol, which violates the GSK Rule. However, the molecular weight of Remdesivir, which has been approved by the FDA for treating COVID-19, is also higher than 400 g/mol. Therefore, the GSK Rule is not an absolute condition for determining drug-likeness. Furthermore, all the newly designed compounds are predicted to have convenient synthetic accessibility (4.37–4.71).
4. Conclusion
The quick and efficient development of active antiviral agents for therapeutic use is exceptionally challenging. In the current study, a series of bicycloproline-containing SARS-CoV-2 Mpro inhibitors was used for molecular modeling. Robust QSAR models were established to predict the enzyme inhibition of SARS-CoV-2 Mpro, and the structural modification information they provide was used to design six potent SARS-CoV-2 Mpro inhibitors. The binding modes of this class of SARS-CoV-2 Mpro inhibitors reveal that Met49, Pro168, Met165, Cys145, and His164 might be the key residues for maintaining a stable conformation at the active site of SARS-CoV-2 Mpro. The complexes of Mpro with the designed compounds N3, N4, N5, and N6, which have higher docking scores, were further evaluated in terms of intermolecular interactions, complex stability, and binding affinity against SARS-CoV-2 Mpro, with 7D3I as the reference complex. The evaluation results suggest that all newly designed compounds have a similar mechanism of activity and stability with SARS-CoV-2 Mpro compared to 7D3I. Binding free energy calculations on the SARS-CoV-2 Mpro-ligand complexes also revealed that compounds N3, N4, and N6 have more favorable (more negative) binding free energies than compound 23 (PDB entry 7D3I), which indicates their potential to become lead compounds for treating SARS-CoV-2. These results verify the predictive ability of the QSAR models. The van der Waals interaction energies dominate over the electrostatic contributions in the interactions of the five compounds. Per-residue energy decomposition analysis demonstrates that the binding affinity between the ligands and SARS-CoV-2 Mpro depends more on hydrophobic interactions. Furthermore, all newly designed compounds are predicted to have good ADMET properties by the webserver. Also, compared to recent efforts to identify promising attributes for previous coronavirus inhibitors, the present study deals with existing SARS-CoV-2 Mpro inhibitors and avoids activity test deviation between different data sets [21]. Taken together, the results from this study provide a reference for the further rational design of novel SARS-CoV-2 Mpro inhibitors with high potency.
5. Funding
This work was supported by the National Natural Science Foundation of China (21475081), the Natural Science Foundation of Shaanxi Province of China (2019JM-237), and the Graduate Innovation Fund of Shaanxi University of Science and Technology.
CRediT authorship contribution statement
Ding Luo: . Jian-Bo Tong: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. Xing Zhang: Visualization, Investigation. Xue-Chun Xiao: . Shuai Bian: Supervision.
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
Acknowledgments
The authors sincerely thank Prof. Ran Fang at Shaanxi University of Science and Technology for providing the high-performance computing facilities used in this research.
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.molstruc.2021.131378.
References
- 1. Zhou P., Yang X.-L., Wang X.-G., Hu B., Zhang L., Zhang W., Si H.-R., Zhu Y., Li B., Huang C.-L., Chen H.-D., Chen J., Luo Y., Guo H., Jiang R.-D., Liu M.-Q., Chen Y., Shen X.-R., Wang X., Zheng X.-S., Zhao K., Chen Q.-J., Deng F., Liu L.-L., Yan B., Zhan F.-X., Wang Y.-Y., Xiao G.-F., Shi Z.-L. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579(7798):270–273. doi: 10.1038/s41586-020-2012-7.
- 2. Lu R., Zhao X., Li J., Niu P., Yang B., Wu H., Wang W., Song H., Huang B., Zhu N., Bi Y., Ma X., Zhan F., Wang L., Hu T., Zhou H., Hu Z., Zhou W., Zhao L., Chen J., Meng Y., Wang J., Lin Y., Yuan J., Xie Z., Ma J., Liu W.J., Wang D., Xu W., Holmes E.C., Gao G.F., Wu G., Chen W., Shi W., Tan W. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet. 2020;395(10224):565–574. doi: 10.1016/S0140-6736(20)30251-8.
- 3. Hung I.F.-N., Cheng V.C.-C., Li X., Tam A.R., Hung D.L.-L., Chiu K.H.-Y., Yip C.C.-Y., Cai J.-P., Ho D.T.-Y., Wong S.-C., Leung S.S.-M., Chu M.-Y., Tang M.O.-Y., Chen J.H.-K., Poon R.W.-S., Fung A.Y.-F., Zhang R.R., Yan E.Y.-W., Chen L.-L., Choi C.Y.-K., Leung K.-H., Chung T.W.-H., Lam S.H.-Y., Lam T.P.-W., Chan J.F.-W., Chan K.-H., Wu T.-C., Ho P.-L., Chan J.W.-M., Lau C.-S., To K.K.-W., Yuen K.-Y. SARS-CoV-2 shedding and seroconversion among passengers quarantined after disembarking a cruise ship: a case series. Lancet Infect. Dis. 2020;20(9):1051–1060. doi: 10.1016/S1473-3099(20)30364-9.
- 4. Beigel J.H., Tomashek K.M., Dodd L.E., Mehta A.K., Zingman B.S., Kalil A.C., Hohmann E., Chu H.Y., Luetkemeyer A., Kline S., Lopez de Castilla D., Finberg R.W., Dierberg K., Tapson V., Hsieh L., Patterson T.F., Paredes R., Sweeney D.A., Short W.R., Touloumi G., Lye D.C., Ohmagari N., Oh M.-d., Ruiz-Palacios G.M., Benfield T., Fätkenheuer G., Kortepeter M.G., Atmar R.L., Creech C.B., Lundgren J., Babiker A.G., Pett S., Neaton J.D., Burgess T.H., Bonnett T., Green M., Makowski M., Osinusi A., Nayak S., Lane H.C. Remdesivir for the treatment of Covid-19 — final report. N. Engl. J. Med. 2020;383(19):1813–1826. doi: 10.1056/NEJMoa2007764.
- 5. Wang M., Cao R., Zhang L., Yang X., Liu J., Xu M., Shi Z., Hu Z., Zhong W., Xiao G. Remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus (2019-nCoV) in vitro. Cell Res. 2020;30(3):269–271. doi: 10.1038/s41422-020-0282-0.
- 6. Kneller D.W., Galanie S., Phillips G., O'Neill H.M., Coates L., Kovalevsky A. Malleability of the SARS-CoV-2 3CL Mpro active-site cavity facilitates binding of clinical antivirals. Structure. 2020;28(12):1313–1320.e3. doi: 10.1016/j.str.2020.10.007.
- 7. Bai Y., Ye F., Feng Y., Liao H., Song H., Qi J., Gao G.F., Tan W., Fu L., Shi Y. Structural basis for the inhibition of the SARS-CoV-2 main protease by the anti-HCV drug narlaprevir. Signal Transduct. Target. Ther. 2021;6(1):51. doi: 10.1038/s41392-021-00468-9.
- 8. Hegyi A., Ziebuhr J. Conservation of substrate specificities among coronavirus main proteases. J. Gen. Virol. 2002;83(3):595–599. doi: 10.1099/0022-1317-83-3-595.
- 9. Jin Z., Du X., Xu Y., Deng Y., Liu M., Zhao Y., Zhang B., Li X., Zhang L., Peng C., Duan Y., Yu J., Wang L., Yang K., Liu F., Jiang R., Yang X., You T., Liu X., Yang X., Bai F., Liu H., Liu X., Guddat L.W., Xu W., Xiao G., Qin C., Shi Z., Jiang H., Rao Z., Yang H. Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature. 2020;582(7811):289–293. doi: 10.1038/s41586-020-2223-y.
- 10. Pillaiyar T., Manickam M., Namasivayam V., Hayashi Y., Jung S.-H. An overview of Severe Acute Respiratory Syndrome–Coronavirus (SARS-CoV) 3CL protease inhibitors: peptidomimetics and small molecule chemotherapy. J. Med. Chem. 2016;59(14):6595–6628. doi: 10.1021/acs.jmedchem.5b01461.
- 11. Yang H., Xie W., Xue X., Yang K., Ma J., Liang W., Zhao Q., Zhou Z., Pei D., Ziebuhr J., Hilgenfeld R., Yuen K.Y., Wong L., Gao G., Chen S., Chen Z., Ma D., Bartlam M., Rao Z. Design of wide-spectrum inhibitors targeting coronavirus main proteases. PLoS Biol. 2005;3(10):e324. doi: 10.1371/journal.pbio.0030324.
- 12. Chtita S., Belhassan A., Bakhouch M., Taourati A.I., Aouidate A., Belaidi S., Moutaabbid M., Belaaouad S., Bouachrine M., Lakhlifi T. QSAR study of unsymmetrical aromatic disulfides as potent avian SARS-CoV main protease inhibitors using quantum chemical descriptors and statistical methods. Chemometr. Intell. Lab. Syst. 2021;210. doi: 10.1016/j.chemolab.2021.104266.
- 13. Islam R., Parves M.R., Paul A.S., Uddin N., Rahman M.S., Mamun A.A., Hossain M.N., Ali M.A., Halim M.A. A molecular modeling approach to identify effective antiviral phytochemicals against the main protease of SARS-CoV-2. J. Biomol. Struct. Dyn. 2020:1–12. doi: 10.1080/07391102.2020.1761883.
- 14. Alves V.M., Bobrowski T., Melo-Filho C.C., Korn D., Auerbach S., Schmitt C., Muratov E.N., Tropsha A. QSAR Modeling of SARS-CoV Mpro inhibitors identifies sufugolix, cenicriviroc, proglumetacin, and other drugs as candidates for repurposing against SARS-CoV-2. Mol. Inform. 2021;40(1). doi: 10.1002/minf.202000113.
- 15. Yuan S., Yin X., Meng X., Chan J.F.-W., Ye Z.-W., Riva L., Pache L., Chan C.C.-Y., Lai P.-M., Chan C.C.-S., Poon V.K.-M., Lee A.C.-Y., Matsunaga N., Pu Y., Yuen C.-K., Cao J., Liang R., Tang K., Sheng L., Du Y., Xu W., Lau C.-Y., Sit K.-Y., Au W.-K., Wang R., Zhang Y.-Y., Tang Y.-D., Clausen T.M., Pihl J., Oh J., Sze K.-H., Zhang A.J., Chu H., Kok K.-H., Wang D., Cai X.-H., Esko J.D., Hung I.F.-N., Li R.A., Chen H., Sun H., Jin D.-Y., Sun R., Chanda S.K., Yuen K.-Y. Clofazimine broadly inhibits coronaviruses including SARS-CoV-2. Nature. 2021;593(7859):418–423. doi: 10.1038/s41586-021-03431-4.
- 16. Pathan Mohsin K., Vinay K., Kunal R. In silico modeling of small molecule carboxamides as inhibitors of SARS-CoV 3CL protease: an approach towards combating COVID-19. Comb. Chem. High Throughput Screen. 2020;23:1–19. doi: 10.2174/1386207323666200914094712.
- 17. Ghosh K., Amin S.A., Gayen S., Jha T. Chemical-informatics approach to COVID-19 drug discovery: exploration of important fragments and data mining based prediction of some hits from natural origins as main protease (Mpro) inhibitors. J. Mol. Struct. 2021;1224. doi: 10.1016/j.molstruc.2020.129026.
- 18. De P., Bhayye S., Kumar V., Roy K. In silico modeling for quick prediction of inhibitory activity against 3CLpro enzyme in SARS CoV diseases. J. Biomol. Struct. Dyn. 2020:1–27. doi: 10.1080/07391102.2020.1821779.
- 19. Tong J.-B., Luo D., Xu H.-Y., Bian S., Zhang X., Xiao X.-C., Wang J. A computational approach for designing novel SARS-CoV-2 Mpro inhibitors: combined QSAR, molecular docking, and molecular dynamics simulation techniques. New J. Chem. 2021;45(26):11512–11529.
- 20. Abu-Saleh A.A.-A.A., Awad I.E., Yadav A., Poirier R.A. Discovery of potent inhibitors for SARS-CoV-2's main protease by ligand-based/structure-based virtual screening, MD simulations, and binding energy calculations. Phys. Chem. Chem. Phys. 2020;22(40):23099–23106. doi: 10.1039/d0cp04326e.
- 21. Amin S.A., Banerjee S., Singh S., Qureshi I.A., Gayen S., Jha T. First structure–activity relationship analysis of SARS-CoV-2 virus main protease (Mpro) inhibitors: an endeavor on COVID-19 drug discovery. Mol. Divers. 2021. doi: 10.1007/s11030-020-10166-3.
- 22. Qiao J., Li Y.-S., Zeng R., Liu F.-L., Luo R.-H., Huang C., Wang Y.-F., Zhang J., Quan B., Shen C., Mao X., Liu X., Sun W., Yang W., Ni X., Wang K., Xu L., Duan Z.-L., Zou Q.-C., Zhang H.-L., Qu W., Long Y.-H.-P., Li M.-H., Yang R.-C., Liu X., You J., Zhou Y., Yao R., Li W.-P., Liu J.-M., Chen P., Liu Y., Lin G.-F., Yang X., Zou J., Li L., Hu Y., Lu G.-W., Li W.-M., Wei Y.-Q., Zheng Y.-T., Lei J., Yang S. SARS-CoV-2 Mpro inhibitors with antiviral activity in a transgenic mouse model. Science. 2021;371(6536):1374–1378. doi: 10.1126/science.abf1611.
- 23. Muhseen Z.T., Hameed A.R., Al-Hasani H.M.H., Tahir ul Qamar M., Li G. Promising terpenes as SARS-CoV-2 spike receptor-binding domain (RBD) attachment inhibitors to the human ACE2 receptor: integrated computational approach. J. Mol. Liq. 2020;320. doi: 10.1016/j.molliq.2020.114493.
- 24. Wang Y.-L., Wang F., Shi X.-X., Jia C.-Y., Wu F.-X., Hao G.-F., Yang G.-F. Cloud 3D-QSAR: a web tool for the development of quantitative structure–activity relationship models in drug discovery. Brief Bioinform. 2021;22(4):bbaa276. doi: 10.1093/bib/bbaa276.
- 25. Seetaha S., Boonyarit B., Tongsima S., Songtawee N., Choowongkomon K. Potential tripeptides against the tyrosine kinase domain of human epidermal growth factor receptor (HER) 2 through computational and kinase assay approaches. J. Mol. Graph. Model. 2020;97. doi: 10.1016/j.jmgm.2020.107564.
- 26. Weng G., Gao J., Wang Z., Wang E., Hu X., Yao X., Cao D., Hou T. Comprehensive evaluation of fourteen docking programs on protein–peptide complexes. J. Chem. Theory Comput. 2020;16(6):3959–3969. doi: 10.1021/acs.jctc.9b01208.
- 27. Tong J.-B., Luo D., Zhang X., Bian S. Design of novel SHP2 inhibitors using Topomer CoMFA, HQSAR analysis, and molecular docking. Struct. Chem. 2021;32(3):1061–1076.
- 28. Razmazma H., Ebrahimi A., Hashemi M. Structural insights for rational design of new PIM-1 kinase inhibitors based on 3,5-disubstituted indole derivatives: an integrative computational approach. Comput. Biol. Med. 2020;118. doi: 10.1016/j.compbiomed.2020.103641.
- 29. Lorca M., Morales-Verdejo C., Vásquez-Velásquez D., Andrades-Lagos J., Campanini-Salinas J., Soto-Delgado J., Recabarren-Gajardo G., Mella J. Structure-activity relationships based on 3D-QSAR CoMFA/CoMSIA and design of aryloxypropanol-amine agonists with selectivity for the human β3-adrenergic receptor and anti-obesity and anti-diabetic profiles. Molecules. 2018;23(5). doi: 10.3390/molecules23051191.
- 30. Verma J., Khedkar V.M., Coutinho E.C. 3D-QSAR in drug design - a review. Curr. Top. Med. Chem. 2010;10(1):95–115. doi: 10.2174/156802610790232260.
- 31. Tong J.-B., Luo D., Bian S., Zhang X. Structural investigation of tetrahydropteridin analogues as selective PLK1 inhibitors for treating cancer through combined QSAR techniques, molecular docking, and molecular dynamics simulations. J. Mol. Liq. 2021;335.
- 32. Veríssimo G.C., Menezes Dutra E.F., Teotonio Dias A.L., de Oliveira Fernandes P., Kronenberger T., Gomes M.A., Maltarollo V.G. HQSAR and random forest-based QSAR models for anti-T. vaginalis activities of nitroimidazoles derivatives. J. Mol. Graph. Model. 2019;90:180–191. doi: 10.1016/j.jmgm.2019.04.007.
- 33. David T.S. QSAR and QSPR model interpretation using Partial Least Squares (PLS) analysis. Curr. Comput. Aided Drug Des. 2012;8(2):107–127. doi: 10.2174/157340912800492357.
- 34. Gramatica P., Sangion A. A historical excursus on the statistical validation parameters for QSAR models: a clarification concerning metrics and terminology. J. Chem. Inf. Model. 2016;56(6):1127–1131. doi: 10.1021/acs.jcim.6b00088.
- 35. Roy K., Das R.N., Ambure P., Aher R.B. Be aware of error measures. Further studies on validation of predictive QSAR models. Chemometr. Intell. Lab. Syst. 2016;152:18–33.
- 36. Consonni V., Ballabio D., Todeschini R. Comments on the definition of the Q2 parameter for QSAR validation. J. Chem. Inf. Model. 2009;49(7):1669–1678. doi: 10.1021/ci900115y.
- 37. Golbraikh A., Tropsha A. Beware of q2! J. Mol. Graph. Model. 2002;20(4):269–276. doi: 10.1016/s1093-3263(01)00123-1.
- 38. Meng L., Feng K., Ren Y. Molecular modelling studies of tricyclic triazinone analogues as potential PKC-θ inhibitors through combined QSAR, molecular docking and molecular dynamics simulations techniques. J. Taiwan Instit. Chem. Eng. 2018;91:155–175.
- 39. Cosconati S., Forli S., Perryman A.L., Harris R., Goodsell D.S., Olson A.J. Virtual screening with AutoDock: theory and practice. Expert Opin. Drug Discov. 2010;5(6):597–607. doi: 10.1517/17460441.2010.484460.
- 40. Nguyen N.T., Nguyen T.H., Pham T.N.H., Huy N.T., Bay M.V., Pham M.Q., Nam P.C., Vu V.V., Ngo S.T. Autodock vina adopts more accurate binding poses but autodock4 forms better binding affinity. J. Chem. Inf. Model. 2020;60(1):204–211. doi: 10.1021/acs.jcim.9b00778.
- 41. Trott O., Olson A.J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010;31(2):455–461. doi: 10.1002/jcc.21334.
- 42. Van Der Spoel D., Lindahl E., Hess B., Groenhof G., Mark A.E., Berendsen H.J.C. GROMACS: fast, flexible, and free. J. Comput. Chem. 2005;26(16):1701–1718. doi: 10.1002/jcc.20291.
- 43. Brooks B.R., Brooks III C.L., MacKerell Jr A.D., Nilsson L., Petrella R.J., Roux B., Won Y., Archontis G., Bartels C., Boresch S., Caflisch A., Caves L., Cui Q., Dinner A.R., Feig M., Fischer S., Gao J., Hodoscek M., Im W., Kuczera K., Lazaridis T., Ma J., Ovchinnikov V., Paci E., Pastor R.W., Post C.B., Pu J.Z., Schaefer M., Tidor B., Venable R.M., Woodcock H.L., Wu X., Yang W., York D.M., Karplus M. CHARMM: the biomolecular simulation program. J. Comput. Chem. 2009;30(10):1545–1614. doi: 10.1002/jcc.21287.
- 44. Vanommeslaeghe K., MacKerell A.D. Automation of the CHARMM General Force Field (CGenFF) I: bond perception and atom typing. J. Chem. Inf. Model. 2012;52(12):3144–3154. doi: 10.1021/ci300363c.
- 45. U. N., K. S. Molecular dynamics simulation study on Thermotoga maritima EngA: GTP/GDP bound state of the second G-domain influences the domain–domain interface interactions. J. Biomol. Struct. Dyn. 2020:1–13. doi: 10.1080/07391102.2020.1826359.
- 46. Wang E., Sun H., Wang J., Wang Z., Liu H., Zhang J.Z.H., Hou T. End-point binding free energy calculation with MM/PBSA and MM/GBSA: strategies and applications in drug design. Chem. Rev. 2019;119(16):9478–9508. doi: 10.1021/acs.chemrev.9b00055.
- 47. Sun H., Li Y., Shen M., Tian S., Xu L., Pan P., Guan Y., Hou T. Assessing the performance of MM/PBSA and MM/GBSA methods. 5. Improved docking performance using high solute dielectric constant MM/GBSA and MM/PBSA rescoring. Phys. Chem. Chem. Phys. 2014;16(40):22035–22045. doi: 10.1039/c4cp03179b.
- 48. Kumari R., Kumar R., Lynn A. g_mmpbsa—A GROMACS tool for high-throughput MM-PBSA calculations. J. Chem. Inf. Model. 2014;54(7):1951–1962. doi: 10.1021/ci500020m.
- 49. Hou T., Wang J., Li Y., Wang W. Assessing the performance of the MM/PBSA and MM/GBSA methods. 1. The accuracy of binding free energy calculations based on molecular dynamics simulations. J. Chem. Inf. Model. 2011;51(1):69–82. doi: 10.1021/ci100275a.
- 50. Xiong G., Wu Z., Yi J., Fu L., Yang Z., Hsieh C., Yin M., Zeng X., Wu C., Lu A., Chen X., Hou T., Cao D. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res. 2021. doi: 10.1093/nar/gkab255.
- 51. Daina A., Michielin O., Zoete V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep. 2017;7(1):42717. doi: 10.1038/srep42717.
- 52. Nossa González D.L., Gómez Castaño J.A., Rozo Núñez W.E., Duchowicz P.R. Antiprotozoal QSAR modelling for trypanosomiasis (Chagas disease) based on thiosemicarbazone and thiazole derivatives. J. Mol. Graph. Model. 2021;103. doi: 10.1016/j.jmgm.2020.107821.
- 53. Chatterjee M., Roy K. Prediction of aquatic toxicity of chemical mixtures by the QSAR approach using 2D structural descriptors. J. Hazard. Mater. 2021;408. doi: 10.1016/j.jhazmat.2020.124936.
- 54. Hammoudi N.-E.-H., Sobhi W., Attoui A., Lemaoui T., Erto A., Benguerba Y. In silico drug discovery of Acetylcholinesterase and Butyrylcholinesterase enzymes inhibitors based on Quantitative Structure-Activity Relationship (QSAR) and drug-likeness evaluation. J. Mol. Struct. 2021;1229.
- 55. Dai W., Zhang B., Jiang X.-M., Su H., Li J., Zhao Y., Xie X., Jin Z., Peng J., Liu F., Li C., Li Y., Bai F., Wang H., Cheng X., Cen X., Hu S., Yang X., Wang J., Liu X., Xiao G., Jiang H., Rao Z., Zhang L.-K., Xu Y., Yang H., Liu H. Structure-based design of antiviral drug candidates targeting the SARS-CoV-2 main protease. Science. 2020;368(6497):1331–1335. doi: 10.1126/science.abb4489.