High predictive QSAR models for predicting the SARS coronavirus main protease inhibition activity of ketone-based covalent inhibitors

Bakhtyar Sepehri; Mohammad Kohnehpoushi; Raouf Ghavami

doi:10.1007/s13738-021-02426-2

2021 Oct 26;19(5):1865–1876. doi: 10.1007/s13738-021-02426-2

PMCID: PMC8547569

Abstract

In this research, a dataset of 29 ketone-based covalent inhibitors with SARS-CoV-1 3CL^pro inhibition activity was used to develop highly predictive QSAR models. Twenty-two molecules were placed in the train set and seven in the test set. Stepwise MLR applied to the molecules in the train set selected four molecular descriptors (Mor26p, Hy, GATS7p and Mor04v) for building the QSAR models. MLR and ANN methods were then used to create QSAR models that predict the activity of the molecules in both the train and test sets. Both models were validated by calculating several statistical parameters. The R² values for the test sets of the MLR and ANN models were 0.93 and 0.95, respectively, and the corresponding RMSE values were 0.24 and 0.17. The other statistical parameters (especially $Q_{F3}^{2}$) show that the ANN model has more predictive power than the MLR model with four descriptors. The leverages calculated for all molecules show that the pIC₅₀ values predicted by both QSAR models are acceptable for all molecules, and the residuals plots show that there is no systematic error in either model. The selected molecular descriptors were also interpreted on the basis of the developed MLR model.

Keywords: QSAR, SARS-CoV-1, SARS-CoV-2, 3CL^pro inhibition activity, COVID-19

Introduction

Coronavirus disease 2019 (COVID-19) is a pandemic disease that has affected the health of people all over the world. By May 6, 2021, the World Health Organization (WHO) had reported 155,506,494 confirmed cases of COVID-19, including 3,247,228 deaths [1, 2]. The disease, which spread from Wuhan, China, in late 2019, is caused by a virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Since some coronaviruses have been transmitted from animals to humans, a similar event has probably happened for SARS-CoV-2 [3–6]. Before the COVID-19 pandemic, two coronaviruses, severe acute respiratory syndrome coronavirus (SARS-CoV-1) and Middle East respiratory syndrome coronavirus (MERS-CoV), had been transmitted to humans from animals [7, 8]. Although SARS-CoV-2 has a lower mortality rate (2.3%) than SARS-CoV-1 (10%) and MERS-CoV (35%), it has a higher reproductive number (2.0–2.5) than SARS-CoV-1 (1.7–1.9) and MERS-CoV (< 1) [9–11]. Despite its lower mortality rate, SARS-CoV-2 has killed more people than SARS-CoV-1 and MERS-CoV because of its global spread.

SARS-CoV-2 is present in body fluids such as cerebrospinal fluid and blood and is usually transmitted through respiratory droplets [12, 13]. For this reason, social distancing and wearing masks have been recommended since the beginning of the COVID-19 outbreak to reduce the number of infections [14]. Infected people show a variety of symptoms such as fever, difficulty breathing, loss of taste or smell, headache, muscle ache, sore throat, runny nose and nausea [15]. Most patients (~ 80%) show mild symptoms, and only a small proportion (~ 5%) develop severe disease [16].

Coronaviruses are divided into four genera, α-, β-, δ- and γ-coronaviruses, of which the α- and β-coronaviruses infect mammals. SARS-CoV-1, MERS-CoV and SARS-CoV-2 belong to the β-coronaviruses [17–19]. SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA (+ ssRNA) virus. Spike glycoproteins on the surface of the virus bind to the angiotensin-converting enzyme 2 (ACE2) receptor in the membrane of human cells, which allows the virus to enter the cells [20–23]. In general, drugs designed for COVID-19 treatment can be classified into four groups: drugs that prevent the replication and synthesis of RNA by targeting enzymes critical for viral replication, drugs that block the binding of the spike protein to the ACE2 receptor on human cells, drugs that inhibit coronavirus virulence factors, and drugs that inhibit a receptor or enzyme in human cells [24]. The 3C-like cysteine protease (3CL^pro) is the main protease of SARS-CoV-2; it catalyzes the cleavage of polypeptides into their effector forms and has an essential enzymatic role in the viral life cycle [25, 26]. It can therefore be considered a target for drug design in COVID-19 treatment [27–29].

Quantitative structure–activity relationship (QSAR) modelling is a computer-assisted drug design method that relates the structural features of molecules to their activities. QSAR models are useful in the drug design process because they predict the activity of molecules quantitatively and identify the structural features that increase activity [30]. In this research, we used a series of 29 recently synthesized ketone-based molecules that act as covalent inhibitors of SARS-CoV-1 3CL^pro (synthesized by Hoffman et al. [31]) to develop QSAR models with high predictive power for their 3CL^pro inhibition activities. Hoffman et al. showed that the most active compound in their study (compound 4 in their paper, compound m15 in this research) is a covalent inhibitor of both SARS-CoV-1 3CL^pro (IC₅₀: 0.004 µM) and SARS-CoV-2 3CL^pro (IC₅₀: 0.00027 µM). The crystallographic structure of the complex of this compound with SARS-CoV-2 3CL^pro is available in the Protein Data Bank (PDB ID: 6XHM). Studies by other groups also show that derivatives of the molecules in this dataset are covalent inhibitors of the 3CL^pro enzymes of MERS-CoV and SARS-CoV-2 [32–37]. Since the genomes of SARS-CoV-1 and SARS-CoV-2 are highly similar [38] and derivatives of the molecules in this dataset are active against the 3CL^pro enzymes of both viruses, inhibitors designed and optimized with the QSAR models developed in this research can help in the design of new drugs for treating COVID-19.

Materials and methods

Materials

A series of 29 ketone-based covalent inhibitors of SARS-CoV-1 3CL^pro was selected from the paper published by Hoffman et al. [31]. The chemical structures and activities of the molecules are listed in Table 1. The activities were reported as IC₅₀ values in nanomolar units. These values were first converted to molar units and then to pIC₅₀ by using the following equation:

\mathrm{pIC}_{50} = -\log_{10} (\mathrm{IC}_{50})
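As a quick check of this conversion, the following minimal sketch turns an IC₅₀ given in nanomolar units into pIC₅₀. The function name `pic50_from_nanomolar` is hypothetical and not part of the original workflow; the example value of 4 nM corresponds to the IC₅₀ of 0.004 µM quoted for molecule m15.

```python
import math


def pic50_from_nanomolar(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    ic50_molar = ic50_nm * 1e-9  # nM -> M
    return -math.log10(ic50_molar)


# Molecule m15: IC50 = 0.004 µM = 4 nM against SARS-CoV-1 3CL^pro.
pic50_m15 = pic50_from_nanomolar(4.0)
print(f"pIC50 of m15: {pic50_m15:.2f}")
```

The result agrees with the largest pIC₅₀ value in the dataset (8.40).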

The pIC₅₀ values range from 5.97 to 8.40. This dataset has several features that make it well suited for developing QSAR models:

The dataset covers a wide range of activities (more than 2 log units);
The SARS-CoV-1 3CL^pro inhibition activities are at the nanomolar level;
Molecule m15 in the dataset shows potent inhibition activity against SARS-CoV-1 3CL^pro (IC₅₀: 0.004 µM) and SARS-CoV-2 3CL^pro (IC₅₀: 0.00027 µM);
Molecule m15 is a covalent inhibitor of SARS-CoV-2 3CL^pro (PDB ID: 6XHM);
Molecule m15 shows good selectivity against other proteases [31];
Several studies have indicated that derivatives of the molecules in this dataset are covalent inhibitors of the 3CL^pro enzymes of SARS-CoV-1, SARS-CoV-2 and MERS-CoV [32–37], so the developed models can help in the design of new drugs for treating COVID-19.

Table 1.

The chemical structures and activities of molecules in dataset

To develop the QSAR models, the dataset was divided into a train set of 22 molecules for building the models and a test set of 7 molecules (m3, m8, m13, m14, m17, m21 and m23) for validating them. The split was made manually so that both sets contain molecules with low, moderate and high activities, and the molecules with the lowest and highest activities were placed in the train set.

Programs

The three-dimensional chemical structures of all molecules were built in HyperChem (version 7.1) and optimized with the AMBER force field (the root-mean-square gradient was set to 0.0001 kcal mol⁻¹ Å⁻¹) [39]. Dragon (version 5.5) was used to calculate molecular descriptors for the optimized structures [40]. SPSS (version 16) was used to select informative descriptors by stepwise multiple linear regression (stepwise MLR) [41]. All other chemometric methods for building and validating the models were performed in R (version 3.6.3) [42], with RStudio (version 1.1.463) as the integrated development environment (IDE) [43]. The MLRQSAR package (version 0.1.0) was used to develop the multiple linear regression (MLR) model and to validate it by leave-one-out cross-validation and the Y-randomization test. It was also used to compute the descriptor contributions for the MLR model, to calculate the variance inflation factor (VIF) of the descriptors, to calculate several statistical parameters for validating the train and test sets of the developed QSAR models, and to compute the applicability domain of the models from the leverage matrix [44, 45]. The h2o package (version 3.32.1.2) was used to build the artificial neural network (ANN) model [46], and the ggplot2 package (version 3.3.3) was used to draw the plots [47].

Methods

MLR modelling and validation

An MLR model has the following form:

\mathrm{pIC}_{50} = \beta_{0} + \beta_{1} \mathrm{MD}_{1} + \beta_{2} \mathrm{MD}_{2} + \dots + \beta_{n} \mathrm{MD}_{n}

where $\beta_{0}$ is the constant term (intercept) and $\beta_{1}$ to $\beta_{n}$ are the coefficients of the molecular descriptors $\mathrm{MD}_{1}$ to $\mathrm{MD}_{n}$. The coefficients are chosen so that the sum of squared residuals between the predicted and experimental pIC₅₀ values is minimized. Leave-one-out cross-validation (LOOCV) and Y-randomization tests were also performed on this model to show that it is robust and was not obtained by chance [48, 49].
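The fitting and validation steps described above can be sketched in a few lines of NumPy. This is only an illustration on synthetic data, not the MLRQSAR implementation used in this work: the helper names, the random descriptor matrix and the use of the squared correlation coefficient as R² for the LOOCV predictions are all assumptions made for the example.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bn*xn; returns [b0, b1, ..., bn]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_mlr(beta, X):
    return beta[0] + X @ beta[1:]


def r2_corr(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)


def loo_predictions(X, y):
    """Leave-one-out: predict each molecule from a model fitted without it."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta = fit_mlr(X[keep], y[keep])
        preds[i] = predict_mlr(beta, X[i:i + 1])[0]
    return preds


def y_randomization_max_r2(X, y, n_runs=10, seed=0):
    """Refit the model on shuffled activities and return the largest R2 found."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_runs):
        y_perm = rng.permutation(y)
        beta = fit_mlr(X, y_perm)
        scores.append(r2_corr(y_perm, predict_mlr(beta, X)))
    return max(scores)


# Synthetic stand-in for 22 training molecules with 4 descriptors (not the real data).
rng = np.random.default_rng(42)
X = rng.normal(size=(22, 4))
true_beta = np.array([7.0, -3.0, 0.7, 2.5, 0.2])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=22)

beta = fit_mlr(X, y)
r2_train = r2_corr(y, predict_mlr(beta, X))
r2_loo = r2_corr(y, loo_predictions(X, y))
r2_rand = y_randomization_max_r2(X, y)
print(f"R2 train = {r2_train:.2f}, R2 LOO = {r2_loo:.2f}, max R2 Y-rand = {r2_rand:.2f}")
```

A robust model keeps a LOOCV R² close to the training R², while the Y-randomization R² values stay far below both.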

ANN modelling

To create the ANN model with the h2o package, the h2o.deeplearning function was used. Although this package can build both shallow feedforward ANN models (one hidden layer) and deep feedforward ANN models (more than one hidden layer), we built a shallow model because of the small size of the dataset: in a deep ANN the number of trainable parameters increases, and with a small dataset this leads to overfitting. To counter overfitting, dropout was applied to the network during training and regularization terms were used in its cost function. Dropout randomly removes some neurons from the input and hidden layers during training. L1 (lasso) and L2 (ridge) regularization terms were added to the loss function, and max_w2 (an upper limit on the sum of the squared incoming weights to a neuron) was used as an additional constraint. The loss function in h2o.deeplearning has the following form, which is minimized for each training example j:

L^{\prime} (W, B \mid j) = L (W, B \mid j) + \lambda_{1} R_{1} (W, B \mid j) + \lambda_{2} R_{2} (W, B \mid j)

In Eq. 3, W is the collection $\{W_{i}\}_{1:N-1}$, where $W_{i}$ denotes the weight matrix connecting layers i and i + 1 of a network with N layers, and B is the collection $\{b_{i}\}_{1:N-1}$, where $b_{i}$ denotes the column vector of biases for layer i + 1. The term $L (W, B \mid j)$ was set to the absolute loss, which sums the absolute residuals. $R_{1} (W, B \mid j)$ is the sum of the L1 norms of all weights and biases in the network, and L2 regularization enters through $R_{2} (W, B \mid j)$, the sum of squares of all weights and biases in the network. $\lambda_{1}$ and $\lambda_{2}$ are constants that are generally set to very small values (for example $10^{-5}$). The maxout activation function was used for the neurons in the hidden layer [50–53].
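To make the structure of this loss concrete, the sketch below evaluates it for one training example of a tiny one-hidden-layer network with a maxout hidden layer. It is a hand-written NumPy illustration under stated assumptions, not the h2o implementation: the function names, the parameter layout and the choice of two linear pieces per maxout neuron are assumptions made for the example.

```python
import numpy as np


def maxout_forward(x, W1, b1, W2, b2):
    """One hidden maxout layer followed by a linear output neuron.

    W1 has shape (pieces, n_hidden, n_inputs) and b1 has shape (pieces, n_hidden);
    each hidden neuron outputs the maximum over its linear pieces.
    """
    z = np.einsum("khi,i->kh", W1, x) + b1
    h = z.max(axis=0)
    return float((W2 @ h + b2)[0])


def regularized_loss(x, y, params, lam1=1e-5, lam2=1e-5):
    """Absolute loss for one example plus L1 and L2 penalties on all weights and biases."""
    y_hat = maxout_forward(x, *params)
    data_term = abs(y - y_hat)
    r1 = sum(float(np.abs(p).sum()) for p in params)   # sum of absolute values
    r2 = sum(float((p ** 2).sum()) for p in params)    # sum of squares
    return data_term + lam1 * r1 + lam2 * r2


# Illustrative parameters: 4 inputs, 3 hidden maxout neurons, 2 pieces each.
rng = np.random.default_rng(0)
params = (
    rng.normal(scale=0.1, size=(2, 3, 4)),  # W1
    np.zeros((2, 3)),                       # b1
    rng.normal(scale=0.1, size=(1, 3)),     # W2
    np.array([7.0]),                        # b2
)
x_example = np.array([0.231, 1.525, 0.96, -0.759])  # descriptor values of m1 (Table 3)
loss = regularized_loss(x_example, 6.66, params)
print(f"loss for one training example: {loss:.4f}")
```

With $\lambda_{1}$ and $\lambda_{2}$ as small as $10^{-5}$, the penalty terms only nudge the weights towards smaller values; the data term dominates the loss.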

Applicability domain

The applicability domain of built QSAR models was investigated by calculating leverage matrix (H):

H = X {(X^{T} X)}^{- 1} X^{T}

where X is the descriptor matrix and the diagonal elements of H are the leverages of the objects (molecules). The critical leverage was taken as 3p/n, where p is the number of descriptors in the model plus one and n is the number of molecules in the train set (here 3 × 5/22 = 0.68). If the leverage (h) calculated for a molecule is larger than the critical leverage, the activity predicted for it by the model is not considered acceptable [54, 55].
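The leverage calculation can be sketched as follows. This is a minimal NumPy illustration on random data, not the MLRQSAR routine: the function names are hypothetical, and including a column of ones for the intercept in X is an assumption of the example.

```python
import numpy as np


def leverages(X_train, X_query=None):
    """Diagonal of X_q (X^T X)^-1 X_q^T, computed with respect to the train-set matrix X."""
    Xq = X_train if X_query is None else X_query
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)


def critical_leverage(n_descriptors, n_train):
    """Warning leverage 3p/n with p = number of descriptors + 1."""
    return 3 * (n_descriptors + 1) / n_train


# Random stand-in for 22 training and 7 test molecules with 4 descriptors.
rng = np.random.default_rng(1)
X_tr = np.column_stack([np.ones(22), rng.normal(size=(22, 4))])
X_te = np.column_stack([np.ones(7), rng.normal(size=(7, 4))])

h_train = leverages(X_tr)
h_test = leverages(X_tr, X_te)
h_crit = critical_leverage(n_descriptors=4, n_train=22)
outside_domain = h_test > h_crit
print(f"critical leverage = {h_crit:.2f}")
```

Test-set molecules are evaluated against the train-set matrix, so a test compound whose descriptors lie far from the training data receives a large leverage and is flagged as outside the applicability domain.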

Statistical parameters for validating QSAR models

For validating created QSAR models, several statistical parameters have been calculated for both train and test sets including:

R^{2} = \frac{\left[ \sum_{i=1}^{n} (y_{i} - \bar{y}) (\hat{y}_{i} - \bar{\hat{y}}) \right]^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2} \times \sum_{i=1}^{n} (\hat{y}_{i} - \bar{\hat{y}})^{2}}

r_{0}^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - k \times \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}

r_{0}^{\prime 2} = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_{i} - k^{\prime} \times y_{i})^{2}}{\sum_{i=1}^{n} (\hat{y}_{i} - \bar{\hat{y}})^{2}}

k = \frac{\sum_{i=1}^{n} (y_{i} \times \hat{y}_{i})}{\sum_{i=1}^{n} (\hat{y}_{i})^{2}}

k^{\prime} = \frac{\sum_{i=1}^{n} (y_{i} \times \hat{y}_{i})}{\sum_{i=1}^{n} (y_{i})^{2}}

\overline{r_{m}^{2}} = \frac{r_{m}^{2} + r_{m}^{\prime 2}}{2}

\Delta r_{m}^{2} = |r_{m}^{2} - r_{m}^{\prime 2}|

r_{m}^{2} = r^{2} \times \left( 1 - \sqrt{r^{2} - r_{0}^{2}} \right)

r_{m}^{\prime 2} = r^{2} \times \left( 1 - \sqrt{r^{2} - r_{0}^{\prime 2}} \right)

CCC^{2} = \left[ \frac{2 \sum_{i=1}^{n} (y_{i} - \bar{y}) (\hat{y}_{i} - \bar{\hat{y}})}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2} + \sum_{i=1}^{n} (\hat{y}_{i} - \bar{\hat{y}})^{2} + n (\bar{y} - \bar{\hat{y}})^{2}} \right]^{2}

\mathrm{MAE} = \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})}{n}

Q_{F1}^{2} = 1 - \frac{\sum_{i=1}^{n_{Test}} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n_{Test}} (y_{i} - \bar{y}_{TR})^{2}}

Q_{F2}^{2} = 1 - \frac{\sum_{i=1}^{n_{Test}} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n_{Test}} (y_{i} - \bar{y}_{Test})^{2}}

Q_{F3}^{2} = 1 - \frac{\left[ \sum_{i=1}^{n_{Test}} (y_{i} - \hat{y}_{i})^{2} \right] / n_{Test}}{\left[ \sum_{i=1}^{n_{TR}} (y_{i} - \bar{y}_{TR})^{2} \right] / n_{TR}}

\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n}}

where $y_{i}$ and $\hat{y}_{i}$ are the experimental and predicted activities of molecule i, respectively, and $\bar{y}$ and $\bar{\hat{y}}$ are the means of the experimental and predicted activities, respectively. $\bar{y}_{TR}$ and $\bar{y}_{Test}$ are the mean activities of the train and test sets, respectively. Also, $n$, $n_{TR}$ and $n_{Test}$ are the number of compounds, the number of compounds in the train set and the number of compounds in the test set, respectively. CCC² is the squared concordance correlation coefficient, RMSE is the root-mean-squared error, and MAE is the mean absolute error [44, 56, 57]. Note that in the MAE expression given above the residuals are averaged with their signs, without taking absolute values.
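As a worked example of the external-validation parameters, the sketch below applies the Q²-based formulas and the RMSE to the experimental pIC₅₀ values and the MLR predictions for the test set as they are listed in Table 3. The function names are hypothetical. For $Q_{F3}^{2}$ the reference sum of squares is taken over the train set and divided by $n_{TR}$; this follows the usual definition of the parameter and is stated here as an assumption of the example.

```python
import numpy as np

# Experimental pIC50 of the 22 train-set molecules (Table 3).
y_train = np.array([6.66, 6.74, 7.07, 7.10, 7.06, 7.28, 7.01, 7.13, 6.69, 7.77, 8.40,
                    7.08, 7.47, 7.36, 6.99, 7.70, 6.95, 7.04, 5.97, 7.28, 7.42, 6.88])
# Test set (m3, m8, m13, m14, m17, m21, m23): experimental and MLR-predicted pIC50.
y_test = np.array([6.64, 7.09, 5.99, 8.15, 7.70, 7.46, 6.98])
y_pred = np.array([6.95, 7.06, 5.54, 8.19, 7.77, 7.25, 7.00])


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mean_signed_error(y, y_hat):
    """Average of the residuals with their signs (no absolute values)."""
    return float(np.mean(y - y_hat))


def q2_f1(y, y_hat, y_tr):
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y_tr.mean()) ** 2))


def q2_f2(y, y_hat):
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


def q2_f3(y, y_hat, y_tr):
    press_per_compound = np.sum((y - y_hat) ** 2) / len(y)
    train_variance = np.sum((y_tr - y_tr.mean()) ** 2) / len(y_tr)
    return float(1 - press_per_compound / train_variance)


results = {
    "RMSE": rmse(y_test, y_pred),
    "mean signed error": mean_signed_error(y_test, y_pred),
    "Q2_F1": q2_f1(y_test, y_pred, y_train),
    "Q2_F2": q2_f2(y_test, y_pred),
    "Q2_F3": q2_f3(y_test, y_pred, y_train),
}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Because Table 3 lists predictions rounded to two decimals, values obtained this way can differ slightly from the tabulated parameters; the three Q² values nevertheless come out close to those reported for the MLR model in Table 5.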

Results and discussion

Model building and validation

Molecular descriptors belonging to all 22 descriptor blocks in Dragon were calculated for all molecules. In the first step, molecular descriptors with few repeated values (fewer than 5) across samples and descriptors with many zero values (more than 10) across samples were removed. After this preprocessing step, 1203 molecular descriptors remained. Stepwise MLR in SPSS was used to select informative variables on the basis of the molecules in the train set. Four molecular descriptors were selected for developing the QSAR models; their names and definitions are listed in Table 2, and their values for all molecules are listed in Table 3. The VIF values for the Mor26p, Hy, GATS7p and Mor04v descriptors were 1.06, 1.28, 1.12 and 1.21, respectively, which indicates that these descriptors have no collinearity or multicollinearity problems and are suitable for creating QSAR models. Mor26p has the largest correlation with the activities of the molecules (R² = 0.59), but a predictive model cannot be created with this descriptor alone, so the Hy descriptor was added by stepwise MLR and the following model was obtained:

\mathrm{pIC}_{50} = 6.769 (\pm 0.253) - 3.236 (\pm 0.588) \, \mathrm{Mor26p} + 0.563 (\pm 0.119) \, \mathrm{Hy}

R² and RMSE values for the train set of this model were 0.77 and 0.22, respectively, and those for the test set were 0.79 and 0.32, respectively. R² value for LOO-CV on the train set was 0.72 which indicates that the created model is robust, and the maximum value of R² for ten runs of Y-randomization test was 0.17 which shows that the created model has not been obtained by chance. By adding another descriptor (GATS7p), a model with three descriptors was built:

\mathrm{pIC}_{50} = 3.837 (\pm 0.633) - 3.542 (\pm 0.372) \, \mathrm{Mor26p} + 0.751 (\pm 0.082) \, \mathrm{Hy} + 2.936 (\pm 0.602) \, \mathrm{GATS7p}

Table 2.

The definition of selected descriptors by stepwise MLR

Descriptor	Type	Descriptor block	Definition
Mor26p	3D	3D-MoRSE descriptors	3D-MoRSE—signal 26/weighted by atomic polarizability
Hy	Others	Molecular properties	Hydrophilic factor
GATS7p	2D	2D autocorrelations	Geary autocorrelation-lag 7/weighted by atomic polarizability
Mor04v	3D	3D-MoRSE descriptors	3D-MoRSE—signal 04/weighted by atomic van der Waals volume

Table 3.

Experimental and predicted pIC₅₀, descriptors values and leverage values for molecules (critical leverage value is 0.68)

Train set
Molecule	Experimental pIC₅₀	Predicted pIC₅₀ (MLR model)	Predicted pIC₅₀ (ANN model)	Mor26p	Hy	GATS7p	Mor04v	Leverage
m1	6.66	6.80	6.66	0.231	1.525	0.96	− 0.759	0.0419
m2	6.74	6.94	6.76	0.198	1.478	0.973	− 0.671	0.0460
m4	7.07	7.13	7.09	0.155	1.415	1.038	− 1.095	0.0770
m5	7.10	7.14	7.11	0.199	1.399	1.078	− 0.833	0.0766
m6	7.06	6.94	6.97	0.224	1.395	1.04	− 0.841	0.0592
m7	7.28	7.36	7.27	0.061	1.399	1.012	− 1.166	0.1362
m9	7.01	7.22	7.05	0.11	1.418	0.992	− 0.826	0.0807
m10	7.13	6.92	7.11	0.25	1.377	1.102	− 1.235	0.1022
m11	6.69	6.52	6.63	0.229	1.385	0.957	− 1.457	0.1103
m12	7.77	7.76	7.65	0.002	1.399	1.069	− 1.049	0.2476
m15	8.40	8.12	8.22	0.022	2.336	0.968	− 0.962	0.1625
m16	7.08	7.11	6.94	0.193	1.548	0.987	− 0.431	0.0783
m18	7.47	7.55	7.37	0.224	2.304	1.036	− 1.108	0.1097
m19	7.36	7.42	7.32	0.281	2.243	1.049	− 0.754	0.1550
m20	6.99	7.03	6.90	0.246	2.243	1.049	− 2.854	0.5575
m22	7.70	7.63	7.59	0.129	2.341	0.9	− 0.606	0.2084
m24	6.95	6.90	6.87	0.131	1.52	0.896	− 1.006	0.0330
m25	7.04	6.88	6.91	0.298	1.697	1.001	− 0.493	0.1029
m26	5.97	6.00	5.97	0.379	0.933	0.995	− 0.469	0.1676
m27	7.28	7.36	7.16	0.18	2.336	0.968	− 1.777	0.1678
m28	7.42	7.55	7.37	0.118	2.376	0.926	− 1.544	0.1414
m29	6.88	6.74	6.80	0.185	1.573	0.933	− 1.464	0.0791

Test set
Molecule	Experimental pIC₅₀	Predicted pIC₅₀ (MLR model)	Predicted pIC₅₀ (ANN model)	Mor26p	Hy	GATS7p	Mor04v	Leverage
m3	6.64	6.95	6.79	0.193	1.456	0.981	− 0.745	0.0444
m8	7.09	7.06	6.98	0.123	1.418	0.972	− 1.054	0.0644
m13	5.99	5.54	5.85	0.601	0.899	1.06	0.184	0.5221
m14	8.15	8.19	8.40	− 0.012	2.304	0.939	− 0.741	0.2222
m17	7.70	7.77	7.50	0.146	2.336	1.02	− 1.244	0.0943
m21	7.46	7.25	7.29	0.066	1.522	0.943	− 1.074	0.0762
m23	6.98	7.00	6.97	0.123	1.546	0.91	− 0.943	0.0357

The R² and RMSE values for the train set of this three-descriptor model were 0.85 and 0.17, respectively, and those for the test set were 0.82 and 0.28, respectively. The R² value for LOO-CV on the train set was 0.84, which indicates that the model is robust, and the maximum R² value over ten runs of the Y-randomization test was 0.29, which shows that the model was not obtained by chance. Adding GATS7p has therefore increased the predictive power of the QSAR model. To increase the predictive power further, another descriptor (Mor04v) was added to the model. According to the Topliss and Costello rule (the ratio of molecules in the train set to descriptors used in the model should be at least 5 to 1) [58], this is the last descriptor that can be used here, since 22 molecules and four descriptors give a ratio of 5.5. With all four descriptors, the following equation was obtained in R:

\begin{aligned} \mathrm{pIC}_{50} = {} & 3.837 (\pm 0.633) - 3.542 (\pm 0.372) \, \mathrm{Mor26p} + 0.751 (\pm 0.082) \, \mathrm{Hy} \\ & + 2.936 (\pm 0.602) \, \mathrm{GATS7p} + 0.245 (\pm 0.065) \, \mathrm{Mor04v} \end{aligned}
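As an illustration of how this equation is applied, the short sketch below evaluates it for three train-set molecules using the descriptor values listed in Table 3. The function and variable names are hypothetical; the coefficients are the point estimates of the equation above.

```python
# Point estimates of the four-descriptor MLR equation.
INTERCEPT = 3.837
COEFFICIENTS = {"Mor26p": -3.542, "Hy": 0.751, "GATS7p": 2.936, "Mor04v": 0.245}

# Descriptor values taken from Table 3.
DESCRIPTORS = {
    "m1": {"Mor26p": 0.231, "Hy": 1.525, "GATS7p": 0.960, "Mor04v": -0.759},
    "m15": {"Mor26p": 0.022, "Hy": 2.336, "GATS7p": 0.968, "Mor04v": -0.962},
    "m26": {"Mor26p": 0.379, "Hy": 0.933, "GATS7p": 0.995, "Mor04v": -0.469},
}


def predict_pic50(descriptor_values):
    """Evaluate the linear model: intercept plus the weighted sum of descriptors."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in descriptor_values.items()
    )


predictions = {mol: predict_pic50(vals) for mol, vals in DESCRIPTORS.items()}
for mol, value in predictions.items():
    print(f"{mol}: predicted pIC50 = {value:.2f}")
```

The values obtained for m1, m15 and m26 reproduce the MLR predictions listed for these molecules in Table 3 (6.80, 8.12 and 6.00).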

The R² values for the train and test sets of this model were 0.92 and 0.93, respectively, and the RMSE values were 0.13 and 0.24, respectively. The R² value for LOO-CV was 0.90, which shows that the model is robust, and the maximum R² value over ten runs of the Y-randomization test was 0.37, which indicates that the model was not obtained by chance. The R² and RMSE values for the test sets of the three MLR models show that the model with all four descriptors has the highest predictive power. For further validation of the four-descriptor MLR model, several statistical parameters were calculated for the train and test sets (Tables 4 and 5). Their values show that the model is acceptable and has high predictive power. The pIC₅₀ values predicted by this model for all molecules (in both train and test sets) are listed in Table 3. The leverages calculated for all molecules (Table 3) are smaller than the critical leverage, which shows that the pIC₅₀ values predicted by the four-descriptor MLR model are acceptable for all molecules. The plot of predicted versus experimental pIC₅₀, the Williams plot and the residuals plot for this model are shown in Fig. 1. The Williams plot shows that the model has no outliers and that the predicted pIC₅₀ values are acceptable for all molecules (in both train and test sets), and the residuals plot shows that there is no systematic error in the four-descriptor MLR model.

To develop a QSAR model with more predictive power, these four descriptors were used as input variables for training an ANN model. In the first step, a network with one hidden layer of 10 neurons was created. The network settings were optimized by k-fold cross-validation: the molecules in the train set were divided into three folds, and each time two folds were used for training the ANN model and the remaining fold for validating it, until every fold had served as the validation set. The R² value for each fold and the mean over the folds were calculated. The activation function of the neurons in the hidden layer was set to maxout. Increasing the number of neurons in the hidden layer to 100 (in steps of 10 neurons) increased the average R² over the three folds. Increasing it beyond 100 neurons did not increase the average R² significantly, so an architecture with one hundred neurons in the hidden layer was selected as the best one. The L1 and L2 regularization terms were set to 0.00001, and max_w2 was left at its default value. Dropout ratios from 0 to 0.5 were examined for both the input and hidden layers, and the best results were obtained with dropout ratios of 0.1 for the input layer and 0.3 for the hidden layer. The other parameters were left at their defaults. The final ANN model therefore had four neurons in its input layer, one hundred neurons in its hidden layer (with maxout activation) and one neuron in its output layer (with linear activation).

The pIC₅₀ values predicted by the ANN model for all molecules (in both train and test sets) are listed in Table 3, and the statistical parameters calculated for the train and test sets are listed in Tables 4 and 5. The R² and RMSE values for the train set of the ANN model were 0.99 and 0.06, respectively, and those for the test set were 0.95 and 0.17, respectively. The R² values for folds 1, 2 and 3 were 0.89, 0.69 and 0.68, respectively, with a mean of 0.75, which indicates that the ANN model is robust. The plot of predicted versus experimental pIC₅₀, the Williams plot and the residuals plot for the ANN model are shown in Fig. 2. The residuals plot shows that there is no bias (systematic error) in this ANN model. The Williams plot flags molecule m15 as an outlier; nevertheless, the leverages of all molecules are below the critical value, so on the basis of this plot the pIC₅₀ values predicted by the ANN model are acceptable for all molecules (in both train and test sets).
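The three-fold partitioning of the train set used for tuning the network can be sketched as follows. This is a generic NumPy illustration; the function name is hypothetical, and assigning molecules to folds by a seeded random shuffle is an assumption, since the fold assignment used in this work is not specified.

```python
import numpy as np


def k_fold_indices(n_samples, k=3, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold serves once as the validation set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


# 22 train-set molecules split into three folds.
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(22, k=3), start=1):
    print(f"fold {fold}: {len(train_idx)} molecules for training, {len(val_idx)} for validation")

# Mean of the fold R2 values reported in the text.
fold_r2 = [0.89, 0.69, 0.68]
mean_r2 = float(np.mean(fold_r2))
print(f"mean R2 over folds = {mean_r2:.2f}")
```

With 22 molecules, the folds cannot be equal in size, so one fold holds eight molecules and the other two hold seven each.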

Table 4.

Calculated statistical parameters for validating created QSAR models

Statistical parameter	Threshold value	MLR (train set)	MLR (test set)	ANN (train set)	ANN (test set)
$CCC^{2}$	> 0.6	0.92	0.91	0.96	0.95
$R^{2}$	> 0.6	0.92	0.93	0.99	0.95
$RMSE$	–	0.13	0.24	0.06	0.17
$k$	0.85–1.15	1.00	1.00	1.01	1.00
$k^{\prime}$	0.85–1.15	1.00	1.00	0.99	1.00
$r_{0}^{2}$	> 0.6	0.92	0.89	0.98	0.94
$r_{0}^{\prime 2}$	> 0.6	0.91	0.92	0.98	0.95
$r_{m}^{2}$	> 0.5	0.92	0.74	0.93	0.85
$r_{m}^{\prime 2}$	> 0.5	0.85	0.83	0.92	0.90
$\overline{r_{m}^{2}}$	> 0.5	0.88	0.79	0.92	0.87
$\Delta r_{m}^{2}$	< 0.2	0.08	0.09	0.01	0.05
$(r^{2} - r_{0}^{2}) / r^{2}$	< 0.1	0.00	0.04	0.00	0.01
$(r^{2} - r_{0}^{\prime 2}) / r^{2}$	< 0.1	0.01	0.01	0.00	0.00
$|r_{0}^{2} - r_{0}^{\prime 2}|$	< 0.3	0.01	0.03	0.00	0.01
$MAE$	–	0.00	0.03	0.06	0.03

Table 5.

Calculated Q²-based statistical parameters for validating created QSAR models

Parameter	MLR	ANN
$Q_{F 1}^{2}$	0.89	0.94
$Q_{F 2}^{2}$	0.89	0.94
$Q_{F 3}^{2}$	0.77	0.88

Fig. 1. Plots for the MLR model (train set in blue, test set in red): (A) predicted pIC₅₀ versus experimental pIC₅₀; (B) Williams plot (critical leverage is 0.68); (C) residuals plot

Fig. 2. Plots for the ANN model (train set in blue, test set in red): (A) predicted pIC₅₀ versus experimental pIC₅₀; (B) Williams plot (critical leverage is 0.68); (C) residuals plot

Descriptors interpretation

The contributions of the Mor26p, Hy, GATS7p and Mor04v molecular descriptors to the MLR model with four descriptors were 11.70%, 23.10%, 52.60% and 3.72%, respectively, and this model was used for interpreting the descriptors.

The negative sign of the Mor26p coefficient shows that smaller (more negative) values of this descriptor are favorable for increasing the activity of the molecules. For example, molecules m14 and m15, which have small values of this descriptor, have the most potent activities. Among all molecules, the value of this descriptor is negative only for molecule m14. Mor26p belongs to the family of 3D molecular representations of structure based on electron diffraction (3D-MoRSE) descriptors and is weighted by atomic polarizability. A study by Devinyak et al. [59] shows that weighting these descriptors by atomic polarizability decreases the effect of hydrogen significantly and diminishes the roles of nitrogen, oxygen and fluorine atoms. They also found that although these descriptors carry information about the whole molecule, their final values are derived mostly from short-distance atomic pairs [59]. The methoxy group on the phenyl ring in the R₁ substituent of molecule m21 decreases its Mor26p value and increases its activity relative to molecule m23. Comparing molecules m21, m23 and m26 shows that an R₁ substituent with two fused rings is more favorable for activity than an R₁ substituent with a single ring, because two fused rings decrease the Mor26p value of the molecule. Replacing the hydrogen atom in the R₂ substituent with a methyl group increases the Mor26p value and decreases the activity, so bulky groups in the R₂ substituent are not favorable. Comparing molecules m15 and m17 with m20 shows that longer and bulkier groups in the R₃ substituent increase the Mor26p value, so cyclic and long-chain groups in the R₃ substituent are not favorable for increasing activity. The nitrile group on the phenyl ring in the R₄ substituent of molecules m7 and m12 decreases the Mor26p value of these two molecules, but comparing all molecules does not reveal a specific relationship between the size of the R₄ substituent and the Mor26p values.

Hy is the hydrophilic factor of the molecule, and the MLR model shows that larger Hy values improve the activity. The R² value between the Hy values and the activities of the molecules is 0.52. The data in Table 3 show that molecules with greater activities, such as m14, m15, m17 and m22, have larger Hy values. Hydrophilic groups such as the hydroxyl group increase the Hy value. The presence of atoms with a negative partial charge in the R₁ substituent and less bulky groups in the R₃ substituent also increase the Hy value. The MLR model further shows that larger values of the GATS7p descriptor are favorable for increasing the activity.

The Mor04v descriptor belongs to the 3D-MoRSE descriptors and is weighted by atomic van der Waals volume. This weighting has an effect similar to weighting by atomic polarizability: it decreases the effect of hydrogen significantly and diminishes the roles of nitrogen, oxygen and fluorine atoms [59]. In the MLR model, Mor04v has a positive coefficient, so larger values of this descriptor are favorable for increasing the activity. Except for molecule m13, the values of this descriptor are negative for all molecules (Table 3). Comparing molecules m1 to m14 shows that the larger Mor04v value of molecule m13 is related to the less bulky group in its R₁ substituent. The same situation is seen for molecules m25 and m26. Less bulky groups in the R₁ substituent increase the Mor26p value and decrease the activity. Since Mor04v contributes less to the model than Mor26p, less bulky groups in the R₁ substituent are overall not favorable for increasing the activity.

The contributions of the Mor26p, Hy, GATS7p and Mor04v descriptors to the ANN model were 26.27%, 26.09%, 25.62% and 21.99%, respectively, which differ from those in the MLR model. Although GATS7p shows the largest contribution to the MLR model, all four descriptors contribute comparably to the ANN model. It should also be noted that Mor26p has the largest correlation with the activities of the molecules (R² = 0.59).

Comparing QSAR models

The statistical parameters calculated for the train and test sets of both models (Tables 4 and 5) show that both QSAR models are acceptable and have high predictive power. The calculated $CCC^{2}$, $R^{2}$, $RMSE$, $r_{0}^{2}$, $r_{0}^{\prime 2}$, $r_{m}^{2}$, $r_{m}^{\prime 2}$, $\overline{r_{m}^{2}}$ and Q²-based parameters (especially $Q_{F3}^{2}$) show that the ANN model has more predictive power than the MLR model. The Williams plot in Fig. 2 shows that molecule m15 is an outlier in the ANN model, although, as seen from Table 3, the ANN model predicts its activity better than the MLR model does. This probably happens because the standard deviation of the residuals for the train set is smaller for the ANN model (SD = 0.06) than for the MLR model (SD = 0.13).

Conclusions

The results of this research show that MLR and ANN models built with the Mor26p, Hy, GATS7p and Mor04v molecular descriptors are suitable for predicting the SARS-CoV-1 3CL^pro inhibition activity of these ketone-based molecules. Although both models are acceptable and show high predictive power, the R²- and Q²-based parameters and the RMSE values calculated for the train and test sets of the four-descriptor MLR model and the ANN model show that the ANN model has more predictive power. The interpretation of the descriptors (based on the MLR model with four descriptors) shows that groups with two fused rings in the R₁ substituent are favorable for increasing the activity, bulky groups in the R₂ substituent are not favorable for improving the activity, and cyclic and long-chain groups in the R₃ substituent decrease the activity.

References

1. https://covid19.who.int/
2. Tuncer T, Ozyurt F, Dogan S, Subasi A. Chemometr. Intell. Lab. Syst. 2021;210:104256. doi: 10.1016/j.chemolab.2021.104256.
3. Parsafar G, Reddy V. J. Iran. Chem. Soc. 2021 doi: 10.1007/s13738-021-02299-5.
4. Serte S, Demirel H. Comput. Biol. Med. 2021;132:104306. doi: 10.1016/j.compbiomed.2021.104306.
5. Ton AT, Gentile F, Hsing M, Ban F, Cherkasov A. Mol. Inf. 2020;39:2000028. doi: 10.1002/minf.202000028.
6. Zhang Y, Greer RA, Song Y, Praveen H, Song Y. Eur. J. Pharm. Sci. 2021;160:105771. doi: 10.1016/j.ejps.2021.105771.
7. Alves VM, Bobrowski T, Melo-Filho CC, Korn D, Auerbach S, Schmitt C, Muratov EN, Tropsha A. Mol. Inf. 2021;40:2000113. doi: 10.1002/minf.202000113.
8. Ciotti M, Ciccozzi M, Terrinoni A, Jiang WC, Wang CB, Bernardini S. Crit. Rev. Clin. Lab. Sci. 2020;57:365–388. doi: 10.1080/10408363.2020.1783198.
9. Duverger E, Herlem G, Picaud F. J. Mol. Graph. Model. 2021;104:107834. doi: 10.1016/j.jmgm.2021.107834.
10. Cavasotto CN, Di Filippo JI. Mol. Inf. 2021;40:2000115. doi: 10.1002/minf.202000115.
11. Petrosillo N, Viceconte G, Ergonul O, Ippolito G, Petersen E. Clin. Microbiol. Infect. 2020;26:729–734. doi: 10.1016/j.cmi.2020.03.026.
12. Mills S. Judic. Rev. 2020;25:71–79. doi: 10.1080/10854681.2020.1760575.
13. Kabir MA, Ahmed R, Chowdhury R, Asher Iqbal SM, Paulmurugan R, Demirci U, Asghar W. Microbes Infect. 2021 doi: 10.1016/j.micinf.2021.104832.
14. Hartt M. Cities and Health. 2020 doi: 10.1080/23748834.2020.1788770.
15. https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html
16. Cavasotto CN, Lamas MS, Maggini J. Eur. J. Pharmacol. 2021;890:173705. doi: 10.1016/j.ejphar.2020.173705.
17. Li F. Annu. Rev. Virol. 2016;3:237–261. doi: 10.1146/annurev-virology-110615-042301.
18. Kucukoglu K, Faydal N, Bul D. Med. Chem. Res. 2020;29:1935–1955. doi: 10.1007/s00044-020-02625-1.
19. Sattari A, Ramazani A, Aghahosseini H. J. Iran. Chem. Soc. 2021 doi: 10.1007/s13738-021-02235-7.
20. Wrobel AG, Benton DJ, Hussain S, Harvey R, Martin SR, Roustan C, Rosenthal PB, Skehel JJ, Gamblin SJ. Nat. Commun. 2020;11:5337. doi: 10.1038/s41467-020-19146-5.
21. Yousefi R, Moosavi-Movahedi A. J. Iran. Chem. Soc. 2020;17:1257–1258. doi: 10.1007/s13738-020-01939-6.
22. Barge S, Jade D, Gosavi G, Talukdar NC, Borah J. Eur. J. Pharm. Sci. 2021;162:105820. doi: 10.1016/j.ejps.2021.105820.
23. Lan J, Ge J, Yu J, Shan S, Zhou H, Fan S, Zhang Q, Shi X, Wang Q, Zhang L, Wang X. Nature. 2020;581:215–220. doi: 10.1038/s41586-020-2180-5.
24. Muhammed Y. Biosaf. Health. 2020;2:210–216. doi: 10.1016/j.bsheal.2020.07.002.
25. Ghosh K, Abdul Amin S, Gayen S, Jha T. J. Mol. Struct. 2021;1237:130366. doi: 10.1016/j.molstruc.2021.130366.
26. Zhang S, Krumberger M, Morris MA, Marie C, Parrocha T, Kreutzer AG, Nowick JS. Eur. J. Med. Chem. 2021;218:113390. doi: 10.1016/j.ejmech.2021.113390.
27. Chellapandi P, Saranya S. Med. Chem. Res. 2020;29:1777–1791. doi: 10.1007/s00044-020-02610-8.
28. Chang CK, Lin SM, Satange R, Lin SC, Sun SC, Wu HY, Kehn-Hall K, Hou MH. Comput. Struct. Biotechnol. J. 2021;19:2246–2255. doi: 10.1016/j.csbj.2021.04.003.
29. Mirtaleb MS, Mirtaleb AH, Nosrati H, Heshmatnia J, Falak R, Zolfaghari Emameh R. Biomed. Pharmacother. 2021;138:111518. doi: 10.1016/j.biopha.2021.111518.
30. Ahmadi R, Sepehri B, Ghavami R. J. Recept. Signal Transduct. 2019;39:264–275. doi: 10.1080/10799893.2019.1660898.
31. Hoffman RL, Kania RS, Brothers MA, Davies JF, Ferre RA, Gajiwala KS, He M, Hogan RJ, Kozminski K, Li LY, Lockner JW, Lou J, Marra MT, Mitchell LJ, Jr, Murray BW, Nieman JA, Noell S, Planken SP, Rowe T, Ryan K, Smith GJ, III, Solowiej JE, Steppan CM, Taggart B. J. Med. Chem. 2020;63:12725–12747. doi: 10.1021/acs.jmedchem.0c01063.
32. Zhang L, Lin D, Sun X, Curth U, Drosten C, Sauerhering L, Becker S, Rox K, Hilgenfeld R. Science. 2020;368:409–412. doi: 10.1126/science.abb3405.
33. Dai W, Zhang B, Jiang XM, Su H, Li J, Zhao Y, Xie X, Jin Z, Peng J, Liu F, Li C, Li Y, Bai F, Wang H, Cheng X, Cen X, Hu S, Yang X, Wang J, Liu X, Xiao G, Jiang H, Rao Z, Zhang LK, Xu Y, Yang H, Liu H. Science. 2020;368:1331–1335. doi: 10.1126/science.abb4489.
34. Tomar S, Johnston ML, John SES, Osswald HL, Nyalapatla PR, Paul LN, Ghosh AK, Denison MR, Mesecar AD. J. Biol. Chem. 2015;290:19403–19422. doi: 10.1074/jbc.M115.651463.
35. Dai W, Jochmans D, Xie H, Yang H, Li J, Su H, Chang D, Wang J, Peng J, Zhu L, Nian Y, Hilgenfeld R, Jiang H, Chen K, Zhang L, Xu Y, Neyts J, Liu H. J. Med. Chem. 2021 doi: 10.1021/acs.jmedchem.0c02258.
36. Bai B, Belovodskiy A, Hena M, Kandadai AS, Joyce MA, Saffran HA, Shields JA, Khan MB, Arutyunova E, Lu J, Bajwa SK, Hockman D, Fischer C, Lamer T, Vuong W, van Belkum MJ, Gu Z, Lin F, Du Y, Xu J, Rahim M, Young HS, Vederas JC, Tyrrell DL, Lemieux MJ, Nieman JA. J. Med. Chem. 2021 doi: 10.1021/acs.jmedchem.1c00616.
37. Vuong W, Khan MB, Fischer C, Arutyunova E, Lamer T, Shields J, Saffran HA, McKay RT, van Belkum MJ, Joyce MA, Young HS, Tyrrell DL, Vederas JC, Lemieux MJ. Nat. Commun. 2020;11:4282. doi: 10.1038/s41467-020-18096-2.
38. Chen Z, Boon SS, Wang MH, Chan RWY, Chan PKS. J. Virol. Methods. 2021;289:114032. doi: 10.1016/j.jviromet.2020.114032.
39. HyperChem 7.1. Gainesville, USA: Hypercube, Inc. Available from: http://www.hyper.com
40. Milano chemometrics and QSAR research group, 2007. Available from http://www.talete.mi.it/dragon.htm
41. http://www.spss.com
42. https://www.r-project.org/
43. https://rstudio.com/
44. Sepehri B, Ghavami R, Farahbakhsh S, Ahmadi R. Int. J. Environ. Sci. Technol. 2021 doi: 10.1007/s13762-021-03271-9.
45. https://www.researchgate.net/publication/350459619_MLRQSAR_package_version_010_for_R_programming_language
46. https://cloud.r-project.org/web/packages/h2o/index.html
47. https://cran.r-project.org/web/packages/ggplot2/index.html
48. Ghavami R, Sepehri B. J. Iran. Chem. Soc. 2016;13:519–529. doi: 10.1007/s13738-015-0761-2.
49. Ghavami R, Sepehri B. J. Chromatogr. A. 2012;1233:116–125. doi: 10.1016/j.chroma.2012.01.047.
50. Phil K. Matlab Deep Learning: With Machine Learning, Neural Networks and Artificial Intelligence. New York: Apress; 2017.
51. Cook D. Practical Machine Learning with H2O. Massachusetts: O’Reilly Media Inc; 2017.
52. J. Moolayil, Learn Keras for deep neural networks, (Jojo Moolayil, 2019)
53. A. Candel, E. LeDell, Deep learning with H₂O, (H₂O.ai, Inc, 2020)
54. Sepehri B, Ghavami R. Med. Chem. 2018;14:439–450. doi: 10.2174/1573406414666180321151029.
55. Sepehri B, Ghavami R. J. Mol. Struct. 2017;1130:922–928. doi: 10.1016/j.molstruc.2016.10.079.
56. Sepehri B, Rasouli Z, Hassanzadeh Z, Ghavami R. Med. Chem. Res. 2016;25:2895–2905. doi: 10.1007/s00044-016-1686-8.
57. Sepehri B, Ghavami R. SAR QSAR Environ. Res. 2019;30:21–38. doi: 10.1080/1062936X.2018.1545695.
58. Sepehri B. J. Mol. Liq. 2020;297:112013. doi: 10.1016/j.molliq.2019.112013.
59. Devinyak O, Havrylyuk D, Lesyk R. J. Mol. Graph. Model. 2014;54:194–203. doi: 10.1016/j.jmgm.2014.10.006.

