Abstract
Butyrylcholinesterase (BuChE) is a key enzyme implicated in the pathogenesis of Alzheimer’s disease (AD), and its inhibition represents a promising therapeutic strategy for disease management. Among the various inhibitor classes, carbamate derivatives have attracted significant attention because of their pseudo-irreversible inhibition mechanism and favorable pharmacological profiles, which make them valuable scaffolds in anti-Alzheimer drug discovery. In this study, a dataset of 205 carbamate derivatives was compiled from peer-reviewed literature, and quantitative structure–activity relationship (QSAR) modeling was performed on this dataset for the first time. The QSAR models were constructed to predict BuChE inhibitory activity (pIC50) using Monte Carlo optimization within the CORAL-2023 software framework. Hybrid optimal descriptors derived from SMILES notation and hydrogen-suppressed molecular graphs were used. Sixteen models were developed across four random splits using four distinct target functions (TF0–TF3), among which the TF3-based models exhibited superior statistical performance (validation R2 ranging from 0.80 to 0.86, Q2 between 0.78 and 0.84, and RMSE values from 0.45 to 0.54). Mechanistic interpretation of the model showed that the SMILES-based attributes associated with increased activity correspond to key pharmacophoric regions of the BuChE active site, including the peripheral anionic site (PAS), the acyl pocket, and the catalytic triad. These correlations indicate that aromatic, hydrophobic, and branched fragments enhance inhibitory activity through π–π interactions, hydrophobic anchoring, and optimal orientation toward Ser198.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-33511-8.
Keywords: Butyrylcholinesterase inhibitors, Carbamate derivatives, QSAR modeling, Monte Carlo optimization, SMILES
Subject terms: Chemistry, Computational biology and bioinformatics, Drug discovery
Introduction
Alzheimer’s disease (AD) is a progressive neurodegenerative condition primarily affecting regions of the brain responsible for cognition, memory, language, and awareness. It is most commonly observed in the elderly population1,2. A major pathological feature of AD is the impairment of central cholinergic neurotransmission. This process is tightly regulated by the levels of acetylcholine (ACh), a neurotransmitter that is rapidly degraded by the enzymes acetylcholinesterase (AChE) and butyrylcholinesterase (BuChE)3–6. Inhibiting these enzymes leads to elevated ACh concentrations in the brain, which can ameliorate cognitive symptoms associated with AD. Consequently, AChE and BuChE inhibitors have become a cornerstone of symptomatic treatment for the disease7,8.
Recent research has increasingly highlighted the significance of BuChE in the AD-affected brain, suggesting that selective BuChE inhibitors may hold particular therapeutic promise9–11. Various classes of cholinesterase inhibitors have been developed, with the first clinically approved example being physostigmine, a naturally occurring carbamate compound (Fig. 1)12. Carbamate derivatives, rooted in the cholinergic hypothesis that suggests improving cholinergic transmission by elevating acetylcholine levels may reduce Alzheimer’s disease symptoms, have emerged as strong inhibitors of AChE and BuChE and continue to play a central role in current therapeutic strategies for AD13.
Fig. 1.

The chemical structure of carbamates.
Accumulating experimental evidence supports the effectiveness of these compounds, indicating that carbamate-based inhibitors exhibit strong anti-cholinesterase activity and substantial potential for therapeutic application14.
Given the expanding volume of research in this area, a comprehensive analysis of recent advances in carbamate-based ChE inhibitors is warranted to support and accelerate drug discovery efforts. In this context, Quantitative Structure–Activity Relationship (QSAR) modeling offers a valuable strategy. QSAR techniques establish mathematical relationships between the biological activity or toxicity of chemical compounds and their physicochemical and structural characteristics (descriptors)15. These models enable the prediction of biological activity in compounds that have not yet been experimentally tested, thereby guiding and streamlining the drug development process. QSAR modeling offers several advantages for predicting biological responses, especially when experimental resources are limited. This approach is cost-effective, less time-consuming, and notably reduces the need for animal testing16. One of the primary strengths of QSAR is its ability to predict the biological activity of a large number of chemical compounds, as long as they are within the applicability domain of the model, by utilizing data from only a limited set of experimentally characterized substances17. Among the computational tools available for QSAR modeling, CORAL software (http://www.insilico.eu/coral) presents a particularly efficient option. It utilizes molecular descriptors derived solely from SMILES (Simplified Molecular Input Line Entry System) notation, bypassing the need for additional physicochemical or quantum mechanical descriptors. The models developed via CORAL are grounded in a Monte Carlo optimization algorithm, following the principle that “QSAR is a random event”18–21.
Given the limited availability of experimental data on cholinesterase inhibition, developing predictive computational models is both relevant and timely. At the same time, molecular docking simulations have become widely adopted techniques in drug discovery. These approaches assess the interaction potential between a ligand and a target protein, with the objective of predicting the most favorable binding orientation within the enzyme or receptor’s active site. In most cases, the predicted interaction strength, typically expressed as binding affinity, serves as a key criterion for identifying ligands with potential biological activity.
This study aims to assess the utility of the CORAL software for the development of QSAR models capable of predicting the butyrylcholinesterase (BuChE) inhibitory activity of carbamate derivatives. Molecular structures were represented using the Simplified Molecular Input Line Entry System (SMILES), from which relevant structural descriptors were systematically extracted. Model optimization was carried out using the balance-of-correlation method, employing four distinct target functions (TF0–TF3) to enhance both the predictive accuracy and the statistical robustness of the resulting models.
Data and method
Dataset
In this study, QSAR modeling was conducted using a dataset of 205 carbamate derivatives with experimentally determined inhibitory activity against the butyrylcholinesterase (BuChE) enzyme9,22–30. The IC50 values, defined as the concentration required to inhibit 50% of enzyme activity and originally reported in micromolar (µM), were extracted from ten published studies and transformed into their negative logarithmic form (pIC50), which served as the dependent variable in model construction. Molecular structures were drawn using the free software BIOVIA Draw and represented as SMILES (Simplified Molecular Input Line Entry System) strings for further computational analysis.
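The conversion from IC50 to pIC50 can be illustrated with a minimal sketch. It assumes the common convention pIC50 = −log10(IC50 in mol/L), so a value reported in µM is first multiplied by 10⁻⁶; the function name `pic50_from_micromolar` is hypothetical and is not taken from the study.

```python
import math


def pic50_from_micromolar(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 uM = 1e-6 mol/L (assumed unit convention for the logarithm)
    return -math.log10(ic50_um * 1e-6)


# Illustrative concentrations only, not values from the dataset.
for ic50 in (0.01, 1.0, 100.0):
    print(f"IC50 = {ic50} uM -> pIC50 = {pic50_from_micromolar(ic50):.2f}")
```

Under this convention, a tenfold decrease in IC50 raises pIC50 by one unit, so more potent inhibitors have larger pIC50 values.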
During the initial stage of QSAR model development, compounds exhibiting exceptionally high prediction errors were systematically identified as outliers. Specifically, structures 36, 37, 38, 40, 47, 48, 51, 104, 176, and 180 displayed residuals satisfying the criterion Residual > 3 × SD (three times the standard deviation of the model residuals). These compounds were consequently marked for exclusion from the training dataset to mitigate their disproportionate influence on model parameterization and to ensure the robustness, reliability, and predictive accuracy of the final QSAR model. Following this initial removal, 195 out of the original 205 molecules remained and were subsequently employed for the final model development. The 195 compounds were randomly divided into four subsets: active training (AT) (~ 35%), passive training (PT) (~ 25%), calibration (CAL) (~ 20%), and validation (VAL) (~ 20%) sets.
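The outlier screen and the four-way random split described above can be sketched as follows. This is an illustration under stated assumptions rather than the CORAL procedure itself: the residual is compared in absolute value against three population standard deviations, the split fractions are the approximate values quoted in the text, and the function names `flag_outliers` and `random_split` are hypothetical.

```python
import random
import statistics


def flag_outliers(residuals, k=3.0):
    """Return indices whose absolute residual exceeds k standard deviations."""
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd]


def random_split(ids, fractions=(0.35, 0.25, 0.20, 0.20), seed=0):
    """Shuffle compound ids and cut them into AT, PT, CAL and VAL subsets."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    names = ("AT", "PT", "CAL", "VAL")
    subsets, start = {}, 0
    for i, (name, frac) in enumerate(zip(names, fractions)):
        # The last subset takes whatever remains so that no compound is lost.
        end = len(ids) if i == len(names) - 1 else start + round(frac * len(ids))
        subsets[name] = ids[start:end]
        start = end
    return subsets


# 195 retained compounds, identified here simply by the integers 1..195.
split = random_split(range(1, 196))
print({name: len(members) for name, members in split.items()})
```

Changing `seed` yields a different random split, which is how several independent splits of the same 195 compounds could be produced.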
Table 1 summarizes the percentage of compound overlap between the corresponding randomized subsets (AT, PT, CAL, and VAL) of Splits #1–#4. In all cases, the overlap between the same subset in any two splits remains below 40%, indicating sufficiently low redundancy among the splits. This low degree of overlap ensures that each subset retains a distinct chemical diversity profile, thereby minimizing the risk of information leakage between the training-related sets (AT, PT, CAL) and the external validation set (VAL). Consequently, the reported statistical parameters for model performance, particularly the external validation metrics, can be considered more reliable, as they reflect genuine predictive capability rather than memorization of shared structures. The consistent maintenance of this < 40% threshold across all four splits supports the robustness of the randomization protocol applied in this study.
Table 1.
Percentage overlap between the randomized subsets of Splits #1–#4: active training (AT), passive training (PT), calibration (CAL), and validation (VAL) sets.
| Split | Set | Split#1 | Split#2 | Split#3 | Split#4 |
|---|---|---|---|---|---|
| #1 | AT | 100 | 33.1 | 38.5 | 33.3 |
| | PT | 100 | 16.7 | 21.2 | 19.8 |
| | CAL | 100 | 20.5 | 13.3 | 21.3 |
| | VAL | 100 | 25.7 | 28.0 | 27.8 |
| #2 | AT | | 100 | 30.9 | 33.6 |
| | PT | | 100 | 28.2 | 24.7 |
| | CAL | | 100 | 27.2 | 9.9 |
| | VAL | | 100 | 29.7 | 27.5 |
| #3 | AT | | | 100 | 27.9 |
| | PT | | | 100 | 19.5 |
| | CAL | | | 100 | 20.5 |
| | VAL | | | 100 | 24.1 |
| #4 | AT | | | | 100 |
| | PT | | | | 100 |
| | CAL | | | | 100 |
| | VAL | | | | 100 |
Development of the SMILES-based QSAR model
The QSAR model used to predict the BuChE inhibitory activity (expressed as pIC50) was constructed using a descriptor based on the correlation weights of SMILES attributes, referred to as DCW(T, N). The general form of the model is given by the linear equation:

$$\mathrm{pIC}_{50} = C_0 + C_1 \times DCW(T, N) \tag{1}$$
where C0 and C1 are regression coefficients estimated via the least squares method, and DCW(T, N) is the SMILES-based descriptor optimized through Monte Carlo simulation. This descriptor reflects the contribution of statistically significant molecular features identified from SMILES representations of the compounds31.
The parameters T and N are central to the Monte Carlo optimization process. T is a threshold value used to categorize SMILES attributes into rare (inactive) and frequent (active) groups. Attributes classified as inactive are assigned a correlation weight of zero and excluded from further simulation steps. N denotes the number of optimization iterations. While these parameters are selected empirically, their choice significantly impacts model quality: selecting a high T value may exclude important features and oversimplify the model, whereas an excessively large N can result in unnecessary computational effort with no improvement in predictive performance, as the objective function may converge prematurely or stagnate.
This modeling framework enables efficient selection and weighting of relevant structural attributes, facilitating the development of statistically robust and interpretable QSAR models based solely on SMILES data.
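The role of the threshold T can be made concrete with a small sketch. It assumes that an attribute counts once per molecule in which it occurs, that frequencies are taken over a training set, and that an attribute is "active" when its frequency is at least T; these conventions, and the function name `split_attributes`, are illustrative assumptions and may differ from what CORAL does internally.

```python
from collections import Counter


def split_attributes(training_attribute_lists, threshold):
    """Classify attributes as active (frequent) or rare by training-set frequency.

    Each molecule is given as a list of attribute labels; an attribute is
    counted once per molecule in which it occurs.
    """
    counts = Counter()
    for attributes in training_attribute_lists:
        counts.update(set(attributes))
    active = {a for a, n in counts.items() if n >= threshold}
    rare = set(counts) - active
    return active, rare


# Three toy molecules described by single-symbol attributes (illustrative only).
molecules = [["C", "O"], ["C", "N"], ["C", "O", "Br"]]
active, rare = split_attributes(molecules, threshold=2)
print(sorted(active), sorted(rare))
```

Rare attributes would then receive a correlation weight of zero, so raising the threshold shrinks the set of weights that the optimization has to tune.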
Hybrid optimal descriptor
To model BuChE inhibitory activity (pIC50), a hybrid optimal descriptor, denoted as DCW(T*, N*), was employed by integrating molecular features derived from both SMILES notation and hydrogen-suppressed graphs (HSG). Previous studies have demonstrated that the use of hybrid optimal descriptors enhances the statistical quality and predictive performance of QSAR models compared to those developed using SMILES or graph-based data alone32,33. The hybrid descriptor is computed as the summation of two separate descriptors: one derived from SMILES and the other from HSG. Mathematically, this relationship is defined as:
$$DCW(T^*, N^*) = DCW_{SMILES}(T^*, N^*) + DCW_{HSG}(T^*, N^*) \tag{2}$$
T∗ denotes the threshold used to prevent the inclusion of rare molecular features, while N∗ indicates the number of epochs used during the optimization process. The following mathematical equations are used to calculate the Descriptors of Correlation Weight (DCW) for both SMILES and hydrogen suppressed graph (HSG) representations:
![]() |
3 |
![]() |
4 |
The SMILES component includes contributions from various structural fragments and molecular attributes: Sk, SSk, and SSSk represent one-, two-, and three-atom SMILES fragments, respectively; NOSP and HALO indicate the presence or absence of nitrogen, oxygen, sulfur, phosphorus, and halogens (F, Cl, Br); BOND captures double, triple, and stereochemical bonds; and HARD combines NOSP, HALO, and BOND into a unified structural signal. Additionally, the variable Contr. reflects the contribution of selected atoms (Cl, Br, F, S, N, O) and bond types (= and #) present in the SMILES string. The descriptor APPk (atom pair proportion) quantifies the relative frequency of atom pairs composed of the specified elements and bond types, categorized into three states: state 0 excludes APP from modeling, state 1 includes it using standard correlation weights, and state 2 assigns it a prioritized role with enhanced weights. The Symm. variable represents local symmetry motifs such as XYX, XYYX, and XYZYX. The HSG component incorporates graph-theoretic descriptors including EC0k (Morgan extended connectivity of zero order), pt2k (number of paths of length two), VS2k (valence shell of second order), nnk (nearest neighbor topology), and indicators for five- and six-membered rings (C5 and C6). Each molecular feature, whether SMILES- or graph-based, is assigned a correlation weight CW(X), which is optimized through a Monte Carlo procedure to maximize a target function, thereby improving the overall performance and reliability of the QSAR model34.
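How a hybrid descriptor feeds the linear model can be sketched in a few lines. The sketch assumes that each molecule has already been decomposed into lists of SMILES and HSG attribute labels, that rare or unseen attributes contribute zero, and that the weights and regression coefficients shown are made-up numbers for illustration; `dcw` and `predict_pic50` are hypothetical names.

```python
def dcw(attributes, correlation_weights):
    """Sum the correlation weights of a molecule's attributes.

    Attributes missing from the weight table are treated as rare (weight 0).
    """
    return sum(correlation_weights.get(a, 0.0) for a in attributes)


def predict_pic50(smiles_attrs, hsg_attrs, weights, c0, c1):
    """Hybrid descriptor = SMILES part + HSG part, inserted into the linear model."""
    hybrid = dcw(smiles_attrs, weights) + dcw(hsg_attrs, weights)
    return c0 + c1 * hybrid


# Hypothetical correlation weights and coefficients, not fitted values.
weights = {"C": 0.5, "O": -0.25, "C6": 1.0}
print(predict_pic50(["C", "C", "O", "N"], ["C6"], weights, c0=2.0, c1=1.0))
```

Because the prediction is a plain sum of weights, the sign and size of each weight can be read directly as the contribution of the corresponding fragment, which is what makes the mechanistic interpretation possible.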
Monte Carlo-based optimization of correlation weights
To compute the correlation weights required in Eq. 1, a Monte Carlo optimization algorithm is employed. This method serves as an effective tool for determining the optimal correlation weights necessary for QSAR modeling.
Monte Carlo optimization iteratively refines the numerical values of the correlation weights so as to maximize the target function and, with it, the predictive potential of the model. Figure 2 shows the flowchart of one cycle of Monte Carlo optimization of correlation weights (n is the number of correlation weights that contribute to model construction).
Fig. 2.

Flowchart of one cycle of the Monte Carlo optimization for finding correct correlation weights (n is the number of correlation weights that contribute to model construction).
The flowchart systematically presents each step of the optimization cycle as follows:
Initialization: The iteration index K is initialized to 0.
Increment Iteration: In each cycle, K is incremented by 1. A decision point checks whether K > n, where n is the total number of correlation weights contributing to model construction. If K > n, the algorithm proceeds to the next epoch; otherwise, the process continues.
Weight Adjustment: A step size Δ is set as 0.1 times the current correlation weight (CW). The correlation weight CWk is updated by adding Δ.
Target Function Evaluation: The target function (TF₀–TF₃) is calculated for the updated correlation weights.
Acceptance Criterion:
- If the TF increases after updating CWk, the new weight is accepted.
- If the TF does not increase, CWk is reverted by subtracting Δ, and the step is halved and reversed in sign (Δ = −Δ/2), so that the search converges toward the optimal weight.
Convergence Check: The absolute value of Δ is compared to a small threshold ε. If ∣Δ∣>ε, the loop continues; otherwise, the cycle ends and the algorithm moves to the next iteration.
This flowchart explicitly shows the stepwise Monte Carlo optimization procedure, including weight updates, evaluation of target functions, acceptance criteria, and convergence control. It ensures reproducibility and clarity of the computational workflow by indicating the applied criteria and threshold parameters at each step.
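The cycle described by the flowchart can be sketched as a short routine. This is a minimal reading of the listed steps, not the CORAL source code: the threshold `eps`, the safety cap `max_steps`, and the quadratic `toy_target` that stands in for TF0–TF3 are all assumptions introduced for illustration.

```python
def optimize_cycle(weights, target_function, eps=1e-3, max_steps=1000):
    """Run one cycle: adjust each correlation weight in turn to raise the target."""
    cw = list(weights)  # work on a copy so the caller's list is untouched
    for k in range(len(cw)):
        delta = 0.1 * cw[k]           # initial step: 0.1 times the current weight
        best = target_function(cw)
        steps = 0
        while abs(delta) > eps and steps < max_steps:
            old = cw[k]
            cw[k] = old + delta       # trial update
            trial = target_function(cw)
            if trial > best:
                best = trial          # improvement: keep the new weight
            else:
                cw[k] = old           # no improvement: revert the weight
                delta = -delta / 2    # halve the step and reverse its direction
            steps += 1
    return cw


def toy_target(w, optimum=(1.5, 1.0)):
    """Stand-in target function with a single maximum (illustrative only)."""
    return -sum((wi - oi) ** 2 for wi, oi in zip(w, optimum))


start = [1.0, 2.0]
print(optimize_cycle(start, toy_target))
```

Note that a weight equal to zero gives Δ = 0 and is therefore left unchanged by this rule, and that the accepted updates can only raise the target function, so its value after a cycle is never lower than before.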
In this study, four different target functions were investigated for use in the Monte Carlo optimization process.
The first target function, referred to as TF₀, is based on the balance of correlations between training subsets:
$$TF_0 = R_{AT} + R_{PT} - \left| R_{AT} - R_{PT} \right| \times 0.1 \tag{5}$$
Here, $R_{AT}$ and $R_{PT}$ represent the correlation coefficients between observed and predicted values for the active and passive training sets, respectively35. This function rewards models that perform well in both subsets while penalizing large discrepancies between them. Intuitively, it ensures that the model maintains consistent predictive performance across different regions of the training data, preventing overfitting to only one subset.
The second target function (TF1) builds upon TF0 by incorporating the index of ideality of correlation (IIC):
![]() |
6 |
The IIC has been widely applied in recent studies to enhance the predictive capability of models constructed from optimal descriptors. It is calculated based on a structured training set comprising three subsets: active training, passive training, and calibration36–39. The IIC for the calibration set is defined as40,41:
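$$IIC_{CAL} = R_{CAL} \times \frac{\min\left({}^{-}MAE_{CAL},\, {}^{+}MAE_{CAL}\right)}{\max\left({}^{-}MAE_{CAL},\, {}^{+}MAE_{CAL}\right)} \tag{7}$$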
In this expression, $R_{CAL}$ is the correlation coefficient between experimental and predicted pIC50 values for the calibration set. The mean absolute errors over the negative and positive residuals, ${}^{-}MAE_{CAL}$ and ${}^{+}MAE_{CAL}$, are computed as:
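$${}^{-}MAE_{CAL} = \frac{1}{{}^{-}N} \sum_{k} \left| \Delta_k \right|, \quad \Delta_k < 0 \tag{8}$$

$${}^{+}MAE_{CAL} = \frac{1}{{}^{+}N} \sum_{k} \left| \Delta_k \right|, \quad \Delta_k \ge 0 \tag{9}$$

$$\Delta_k = Obs_k - Calc_k \tag{10}$$

Here, ${}^{-}N$ and ${}^{+}N$ are the numbers of calibration compounds with $\Delta_k < 0$ and $\Delta_k \ge 0$, respectively.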
In these formulas, k denotes the data point index, while Obsk and Calck correspond to the experimental and model-predicted values, respectively. IIC is computed based on the calibration set and reflects both the magnitude and the symmetry of prediction errors. Models with higher IIC values not only exhibit strong correlations but also display well-balanced and systematically distributed errors, enhancing their generalization ability.
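A minimal sketch of the IIC calculation follows. It assumes the usual definition in which the calibration-set correlation coefficient is multiplied by the ratio of the smaller to the larger of the two mean absolute errors (taken over negative and over non-negative residuals); the treatment of the edge cases, and the names `pearson_r` and `iic`, are choices made here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def iic(observed, calculated):
    """Index of ideality of correlation for one set of compounds."""
    deltas = [o - c for o, c in zip(observed, calculated)]
    neg = [abs(d) for d in deltas if d < 0]
    pos = [abs(d) for d in deltas if d >= 0]
    mae_neg = sum(neg) / len(neg) if neg else 0.0
    mae_pos = sum(pos) / len(pos) if pos else 0.0
    largest = max(mae_neg, mae_pos)
    r = pearson_r(observed, calculated)
    if largest == 0.0:
        return r  # all residuals are zero: ratio taken as 1 (assumption)
    return r * min(mae_neg, mae_pos) / largest


obs = [5.0, 6.0, 7.0, 8.0]
balanced = [5.5, 5.5, 7.5, 7.5]   # errors of equal size on both sides
biased = [4.5, 5.5, 6.5, 7.5]     # every prediction too low by 0.5
print(iic(obs, balanced), iic(obs, biased))
```

The second case shows why the index is informative: the biased predictions correlate perfectly with the observations, yet the one-sided errors drive the IIC to zero.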
The third target function (TF2) also stems from TF0, but it introduces the correlation intensity index (CII) instead of IIC:
![]() |
11 |
Similar to the IIC, the CII was developed to enhance the quality of Monte Carlo optimization for generating robust QSAR/QSPR models. The CII for the calibration set is computed as:
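$$CII_{CAL} = 1 - \sum_{k} \mathrm{Protest}_k, \qquad \mathrm{Protest}_k = \begin{cases} R_k^2 - R^2, & \text{if } R_k^2 - R^2 > 0 \\ 0, & \text{otherwise} \end{cases} \tag{12}$$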
In this formulation, $R^2$ is the squared correlation coefficient for the entire calibration set, and $R_k^2$ is the coefficient obtained after removing the kth data point. If removing that point increases $R^2$, the point is labeled an “oppositionist.” A lower cumulative sum of such protests indicates a more stable and intensive correlation structure: the model is less sensitive to individual influential points, which mitigates the risk of overfitting.
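The leave-one-out logic behind the CII can be sketched as follows. The sketch assumes that the index equals one minus the sum of all positive gains in the squared correlation coefficient obtained by deleting single points; it does not guard against sets with zero variance, and the names `r_squared` and `cii` are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def cii(observed, calculated):
    """Correlation intensity index: 1 minus the summed 'protests'."""
    observed, calculated = list(observed), list(calculated)
    r2_all = r_squared(observed, calculated)
    protest_sum = 0.0
    for k in range(len(observed)):
        obs_k = observed[:k] + observed[k + 1:]
        calc_k = calculated[:k] + calculated[k + 1:]
        gain = r_squared(obs_k, calc_k) - r2_all
        if gain > 0:  # removing point k improves the correlation: an oppositionist
            protest_sum += gain
    return 1.0 - protest_sum


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
calc = [1.1, 1.9, 3.2, 3.8, 2.0]  # the last prediction is far off
print(cii(obs, calc))
```

In this toy set the last compound is a clear oppositionist, so the index falls well below one; a set without influential points stays close to one.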
The fourth and final target function (TF3) is a combination of both IIC and CII, aiming to maximize the benefits of each:
![]() |
13 |
Recent studies have confirmed the usefulness of combining IIC and CII to enhance the predictive accuracy of QSAR models. These indices guide the optimization of correlation weights toward improving statistical performance for the calibration set and help to mitigate the risk of overfitting. However, it is essential to recognize that focusing optimization on the calibration set may, in some cases, reduce the statistical performance on the active and passive training sets, which should be considered when evaluating model robustness. By simultaneously rewarding ideal correlation (IIC) and structural robustness (CII), TF₃ provides a comprehensive target that maximizes predictive accuracy, balances error distribution, and enhances model stability. This function is particularly useful in optimizing correlation weights for descriptors to achieve reliable QSAR models across multiple validation subsets.
The hierarchy of target functions (TF0 → TF1/TF2 → TF3) represents a stepwise refinement strategy: starting with basic correlation balance (TF0), incorporating quality and symmetry of prediction errors (TF1), evaluating structural robustness (TF2), and finally integrating both error ideality and robustness (TF3). This approach ensures that Monte Carlo optimization leads to descriptor weights that produce accurate, generalizable, and statistically robust QSAR models.
Applicability domain definition based on SMILES statistical defects
The applicability domain of a QSAR model is a critical concept for assessing the reliability and uncertainty associated with the prediction of a given compound. The applicability domain defines the physicochemical, structural, or biological space in which the model can make valid predictions, based specifically on the knowledge encoded in the training data. In essence, the applicability domain represents the region of chemical space covered by the model’s descriptors, and predictions should ideally be limited to this domain through interpolation rather than extrapolation. Applying a QSAR model to compounds outside its applicability domain undermines the reliability of its predictions, as no single model can realistically account for all chemical diversity.
In the CORAL modeling framework, the applicability domain is assessed by calculating the statistical “defects” of SMILES-based structural attributes (denoted as Sk). These defects quantify the variability in attribute distributions across the active training, passive training, and calibration sets. The defect for a given attribute Sk is defined by the following expression:
![]() |
14 |
![]() |
.
where PATRN(Sk), PPTRN(Sk) and PCAL(Sk) denote the probabilities of attribute Sk in the active training, passive training, and calibration sets, respectively. NATRN(Sk), NPTRN(Sk), and NCAL(Sk) represent their corresponding frequencies.
The overall SMILES-based statistical defect for a molecule is computed as the sum of the individual defects of its contributing attributes:
$$\mathrm{Defect}_{SMILES} = \sum_{k=1}^{NA} \mathrm{Defect}(S_k) \tag{15}$$
where NA is the number of active SMILES attributes in the compound.
According to the CORAL protocol, a molecule is identified as an outlier and considered outside the applicability domain if its overall statistical defect is greater than twice the average defect of the active training set.
$$\mathrm{Defect}_{SMILES} > 2 \times \overline{\mathrm{Defect}}_{AT} \tag{16}$$

where $\overline{\mathrm{Defect}}_{AT}$ is the average statistical defect over the active training set. This criterion helps ensure that predictions are made only for compounds well-represented in the model’s training space, thereby enhancing the credibility of the QSAR output.
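The domain check itself reduces to a sum and a comparison, as the following sketch shows. It takes the per-attribute defects as already computed and uses made-up numbers; the functions `molecule_defect` and `outside_domain` are hypothetical helpers, not part of CORAL.

```python
def molecule_defect(attributes, attribute_defects):
    """Sum the statistical defects of the active attributes of one molecule."""
    return sum(attribute_defects[a] for a in attributes)


def outside_domain(defect, active_training_defects):
    """Flag a molecule whose defect exceeds twice the active-training mean."""
    mean_at = sum(active_training_defects) / len(active_training_defects)
    return defect > 2 * mean_at


# Hypothetical per-attribute defects and active-training molecule defects.
attribute_defects = {"C": 0.01, "O": 0.02, "Br": 0.5}
training_defects = [0.04, 0.05, 0.03]

typical = molecule_defect(["C", "C", "O"], attribute_defects)
unusual = molecule_defect(["C", "Br"], attribute_defects)
print(outside_domain(typical, training_defects), outside_domain(unusual, training_defects))
```

A molecule built from attributes that are unevenly represented across the subsets accumulates a large defect and is flagged, whereas one built from well-represented attributes stays inside the domain.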
Validation strategies for QSAR model reliability
Robust model validation is an essential component in the development of scientifically reliable QSAR models, serving to evaluate both the predictive reliability and statistical robustness of the constructed models. This process ensures that each phase of model construction adheres to internationally recognized standards for reproducibility, transparency, and predictive accuracy. In QSAR/QSPR studies, validation protocols are broadly categorized into internal and external strategies, each addressing distinct yet complementary aspects of model credibility.
In the present study, the predictive performance and robustness of the QSAR models generated via the Monte Carlo optimization algorithm within the CORAL software environment were examined using multiple statistical validation protocols. Three complementary validation strategies were adopted: (i) Leave-One-Out (LOO) internal validation, (ii) external validation employing an independent test set, and (iii) Y-randomization testing to exclude spurious correlations.
For the training set, evaluation metrics included the coefficient of determination (R2), the LOO cross-validation coefficient (Q2LOO), and the mean absolute error for the training set (MAEtrain), among others. Predictions for the test set were assessed using external validation parameters such as Q2F1, Q2F2, $r_m^2$ (including its mean, $\overline{r_m^2}$, and $\Delta r_m^2$), the mean absolute error for the test set (MAEtest)42, and the concordance correlation coefficient (CCC). The mathematical formulations of these statistical measures are presented in Table 2.
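Three of the external metrics can be sketched from their standard literature definitions, which are assumed here and may differ in notation from Table 2: Q2F1 compares the prediction errors with the spread of the test values around the training mean, Q2F2 uses the test mean instead, and CCC is Lin's concordance correlation coefficient. The function names are hypothetical.

```python
def q2_f1(obs_test, pred_test, mean_train):
    """1 - PRESS / sum of squared deviations from the TRAINING-set mean."""
    press = sum((o - p) ** 2 for o, p in zip(obs_test, pred_test))
    return 1.0 - press / sum((o - mean_train) ** 2 for o in obs_test)


def q2_f2(obs_test, pred_test):
    """1 - PRESS / sum of squared deviations from the TEST-set mean."""
    mean_test = sum(obs_test) / len(obs_test)
    press = sum((o - p) ** 2 for o, p in zip(obs_test, pred_test))
    return 1.0 - press / sum((o - mean_test) ** 2 for o in obs_test)


def ccc(obs, pred):
    """Concordance correlation coefficient (agreement with the line y = x)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    spread = sum((o - mo) ** 2 for o in obs) + sum((p - mp) ** 2 for p in pred)
    return 2.0 * cov / (spread + n * (mo - mp) ** 2)


# Illustrative values only.
obs = [5.0, 6.0, 7.0]
pred = [5.5, 6.0, 6.5]
print(q2_f1(obs, pred, mean_train=6.0), q2_f2(obs, pred), ccc(obs, pred))
```

Unlike a plain correlation coefficient, all three metrics penalize systematic offsets between predicted and observed values, which is why they are preferred for judging an external set.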
Table 2.
Mathematical definitions of the validation metrics employed in this study.
| Validation type | Validation metrics |
|---|---|
| Internal validation |
|
| |
| External validation |
|
| |
| |
| |
| |
| |
| Y-scrambling |
|
The Y-randomization procedure was applied to confirm that the observed correlations were not artifacts of random data associations. This involved random permutation of the response variable, retraining the model, and verifying that a substantial reduction in predictive performance occurred, thereby substantiating the statistical validity of the original model. Additionally, the Index of Ideality of Correlation (IIC) and the Correlation Intensity Index (CII) were incorporated as supplementary diagnostic tools to further assess predictive strength, minimize overfitting risk, and enhance the models’ generalization potential.
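The idea of Y-randomization can be sketched on a one-descriptor linear model. This is a deliberate simplification: the descriptor is held fixed and only the least-squares line is refitted after each permutation (for which R2 equals the squared Pearson correlation), whereas a full test would repeat the whole model-building procedure. The cR2p formula R × sqrt(R2 − Rr2) is the commonly cited form; the data, trial count, and function names are illustrative assumptions.

```python
import math
import random


def r_squared(x, y):
    """R2 of the least-squares line y ~ x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_randomization(descriptor, activity, n_trials=100, seed=0):
    """Return (R2, mean R2 of randomized models, cR2p)."""
    rng = random.Random(seed)
    r2 = r_squared(descriptor, activity)
    shuffled = list(activity)
    r2_random = []
    for _ in range(n_trials):
        rng.shuffle(shuffled)  # break the link between structure and response
        r2_random.append(r_squared(descriptor, shuffled))
    r2_r = sum(r2_random) / n_trials
    c_r2_p = math.sqrt(r2) * math.sqrt(max(r2 - r2_r, 0.0))
    return r2, r2_r, c_r2_p


# Synthetic, nearly linear data (illustrative only).
descriptor = [float(i) for i in range(1, 21)]
activity = [2.0 * d + 1.0 + (0.3 if i % 2 else -0.3) for i, d in enumerate(descriptor)]
r2, r2_r, c_r2_p = y_randomization(descriptor, activity)
print(f"R2 = {r2:.3f}, mean randomized R2 = {r2_r:.3f}, cR2p = {c_r2_p:.3f}")
```

A genuine relationship gives a high R2 for the original response and a much lower mean R2 after shuffling, so cR2p stays close to R2; a chance correlation would produce similar values in both cases and a cR2p near zero.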
According to widely accepted guidelines, a statistically robust QSAR/QSPR model should meet the thresholds R2 > 0.6, Q2 > 0.5, and Q2F1 / Q2F2 > 0.543. Todeschini44 further proposed that the cR2p parameter should exceed 0.5 to confirm robustness. Roy et al.45 emphasized that $r_m^2$ is among the most stringent indicators of external predictive power; under recommended QSAR practices, $\overline{r_m^2}$ > 0.5 combined with $\Delta r_m^2$ < 0.2 indicates satisfactory external predictivity46.
Results and discussion
Generation of QSAR models using different target functions
In this study, a total of 16 QSAR models were developed under the guiding hypothesis that “QSAR is a random event.” The modeling approach was based on the balance of correlation methodology and employed four versions of target functions (TF0, TF1, TF2, and TF3) integrated into the Monte Carlo optimization protocol. These models were generated using four independent random splits of the dataset.
To identify the optimal threshold value (T*) and the optimal number of epochs (N*), thresholds ranging from 1 to 10 and epochs from 1 to 40 were explored. For all target function variants (TF0 to TF3), the optimal modeling parameters were identified as T* = 5 and N* = 15. Consequently, the descriptor DCW(5, 15) was used in all modeling experiments. The statistical outcomes for models built using each target function are summarized in Tables 3 (TF0), 4 (TF1), 5 (TF2), and 6 (TF3).
Table 3.
The statistical characteristics of models observed for four splits in the case of the target function TF0.
| Split | Set | n | R2 | CCC | IIC | CII | Q2 | Q2F1 | Q2F2 | Q2F3 | RMSE | MAE | F | $\overline{r_m^2}$ | $\Delta r_m^2$ | Y-Test | CR2p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AT | 67 | 0.9821 | 0.991 | 0.8534 | 0.9878 | 0.981 | 0.164 | 0.123 | 3568 | 0.9782 | ||||||
| PT | 42 | 0.9926 | 0.9298 | 0.6473 | 0.9956 | 0.9918 | 0.392 | 0.297 | 5379 | 0.9764 | |||||||
| CAL | 36 | 0.7408 | 0.8597 | 0.7577 | 0.8391 | 0.6989 | 0.7592 | 0.7212 | 0.8014 | 0.545 | 0.398 | 97 | 0.6396 | 0.0033 | 0.7238 | ||
| VAL | 50 | 0.7124 | 0.8420 | 0.6210 | 0.8323 | 0.6775 | 0.6549 | 0.4778 | 119 | 0.6032 | 0.0767 | 0.0252 | |||||
| 2 | AT | 60 | 0.9742 | 0.9869 | 0.7548 | 0.9826 | 0.9723 | 0.183 | 0.124 | 2191 | 0.9699 | ||||||
| PT | 42 | 0.988 | 0.9905 | 0.7102 | 0.9926 | 0.9868 | 0.152 | 0.123 | 3281 | 0.9770 | |||||||
| CAL | 42 | 0.6805 | 0.8234 | 0.8242 | 0.8001 | 0.6482 | 0.669 | 0.6303 | 0.4896 | 0.794 | 0.587 | 85 | 0.5641 | 0.0446 | 0.6635 | ||
| VAL | 51 | 0.7381 | 0.8548 | 0.6864 | 0.8355 | 0.7140 | 0.6482 | 0.5121 | 138 | 0.6344 | 0.1235 | 0.0224 | |||||
| 3 | AT | 63 | 0.9811 | 0.9905 | 0.9596 | 0.987 | 0.9799 | 0.19 | 0.137 | 3170 | 0.9739 | ||||||
| PT | 43 | 0.9913 | 0.994 | 0.5242 | 0.9935 | 0.9907 | 0.115 | 0.085 | 4678 | 0.9836 | |||||||
| CAL | 39 | 0.6484 | 0.794 | 0.6427 | 0.8314 | 0.6071 | 0.6484 | 0.6415 | 0.7909 | 0.591 | 0.456 | 68 | 0.5179 | 0.1797 | 0.6330 | ||
| VAL | 50 | 0.6484 | 0.8013 | 0.5127 | 0.7935 | 0.6193 | 0.6651 | 0.4428 | 89 | 0.5240 | 0.0553 | 0.0204 | |||||
| 4 | AT | 59 | 0.8517 | 0.9199 | 0.7787 | 0.906 | 0.8426 | 0.486 | 0.378 | 327 | 0.9651 | ||||||
| PT | 39 | 0.8527 | 0.9139 | 0.6517 | 0.9115 | 0.832 | 0.441 | 0.339 | 214 | 0.9782 | |||||||
| CAL | 39 | 0.8458 | 0.8919 | 0.9195 | 0.9124 | 0.8251 | 0.8261 | 0.8259 | 0.8516 | 0.457 | 0.355 | 203 | 0.634 | 0.1851 | 0.3655 | ||
| VAL | 58 | 0.6575 | 0.8031 | 0.5149 | 0.8377 | 0.6306 | 0.7344 | 0.5503 | 108 | 0.5321 | 0.1457 | 0.0222 |
All QSAR models developed in this study satisfied or surpassed the benchmark criteria listed above, confirming their statistical robustness and predictive reliability. The outcomes strongly support the applicability of the models for accurate prediction of the BuChE inhibitory activity of carbamate derivatives. Furthermore, comparative analysis revealed that models optimized with the TF3 target function consistently demonstrated superior predictive accuracy and statistical stability relative to those built with TF0, TF1, and TF2, as evidenced by the validation results summarized in Tables 3, 4, 5 and 6.
Table 4.
The statistical characteristics of models observed for four splits in the case of the target function TF1.
| Split | Set | n | R2 | CCC | IIC | CII | Q2 | Q2F1 | Q2F2 | Q2F3 | RMSE | MAE | F | $\overline{r_m^2}$ | $\Delta r_m^2$ | Y-Test | CR2p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AT | 67 | 0.858 | 0.9236 | 0.5874 | 0.9094 | 0.8497 | 0.462 | 0.337 | 393 | 0.0079 | 0.8541 | |||||
| PT | 42 | 0.8694 | 0.8958 | 0.4353 | 0.923 | 0.8564 | 0.491 | 0.36 | 266 | 0.0134 | 0.8626 | ||||||
| CAL | 36 | 0.8384 | 0.9122 | 0.9155 | 0.8872 | 0.823 | 0.8579 | 0.8354 | 0.8828 | 0.419 | 0.31 | 176 | 0.7664 | 0.1298 | 0.0265 | 0.8250 | |
| VAL | 50 | 0.8402 | 0.9001 | 0.8584 | 0.9020 | 0.8247 | 0.4645 | 0.3917 | 252 | 0.6798 | 0.1671 | 0.0223 | |||||
| 2 | AT | 60 | 0.7285 | 0.8429 | 0.7468 | 0.8423 | 0.707 | 0.594 | 0.472 | 156 | 0.0056 | 0.7257 | |||||
| PT | 42 | 0.8887 | 0.9343 | 0.7856 | 0.9257 | 0.8787 | 0.374 | 0.297 | 319 | 0.0298 | 0.8737 | ||||||
| CAL | 42 | 0.8379 | 0.9123 | 0.9152 | 0.8952 | 0.8235 | 0.8546 | 0.8376 | 0.7759 | 0.526 | 0.413 | 207 | 0.7659 | 0.1383 | 0.0256 | 0.8250 | |
| VAL | 51 | 0.7977 | 0.8737 | 0.7748 | 0.9018 | 0.7737 | 0.5348 | 0.4232 | 193 | 0.6326 | 0.2002 | 0.0202 | |||||
| 3 | AT | 63 | 0.8537 | 0.9211 | 0.8951 | 0.9101 | 0.8442 | 0.528 | 0.385 | 356 | 0.0193 | 0.8440 | |||||
| PT | 43 | 0.8535 | 0.9065 | 0.7571 | 0.908 | 0.8405 | 0.500 | 0.380 | 239 | 0.0147 | 0.8461 | ||||||
| CAL | 39 | 0.7987 | 0.893 | 0.8937 | 0.8715 | 0.7776 | 0.7838 | 0.7795 | 0.8714 | 0.464 | 0.349 | 147 | 0.7151 | 0.0371 | 0.0149 | 0.7912 | |
| VAL | 50 | 0.7733 | 0.8788 | 0.8325 | 0.8721 | 0.7503 | 0.5431 | 0.3985 | 164 | 0.6817 | 0.0478 | 0.0166 | |||||
| 4 | AT | 59 | 0.8665 | 0.9285 | 0.7854 | 0.9097 | 0.8584 | 0.461 | 0.346 | 370 | 0.0195 | 0.8567 | |||||
| PT | 39 | 0.867 | 0.9123 | 0.5452 | 0.909 | 0.8455 | 0.458 | 0.34 | 241 | 0.0174 | 0.8582 | ||||||
| CAL | 39 | 0.7976 | 0.8578 | 0.8931 | 0.8694 | 0.7703 | 0.7773 | 0.7771 | 0.8099 | 0.518 | 0.413 | 146 | 0.5674 | 0.2321 | 0.0149 | 0.7901 | |
| VAL | 58 | 0.8600 | 0.9091 | 0.6827 | 0.9140 | 0.8505 | 0.4869 | 0.3795 | 0.7162 | 0.1456 | 0.0184 |
Table 5.
The statistical characteristics of models observed for four splits in the case of the target function TF2.
| Split | Set | n | R2 | CCC | IIC | CII | Q2 | Q2F1 | Q2F2 | Q2F3 | RMSE | MAE | F | $\overline{r_m^2}$ | $\Delta r_m^2$ | Y-Test | CR2p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AT | 67 | 0.9597 | 0.9794 | 0.8957 | 0.9718 | 0.9573 | 0.246 | 0.176 | 1547 | 0.0201 | 0.9496 | |||||
| PT | 42 | 0.9766 | 0.9538 | 0.4001 | 0.9873 | 0.9739 | 0.331 | 0.257 | 1671 | 0.0365 | 0.9582 | ||||||
| CAL | 36 | 0.8847 | 0.9399 | 0.929 | 0.9605 | 0.8654 | 0.8987 | 0.8827 | 0.9165 | 0.354 | 0.283 | 261 | 0.833 | 0.0345 | 0.0247 | 0.8723 | |
| VAL | 50 | 0.8165 | 0.9017 | 0.6638 | 0.8748 | 0.8024 | 0.5137 | 0.3847 | 214 | 0.7391 | 0.0684 | 0.0225 | |||||
| 2 | AT | 60 | 0.9653 | 0.9823 | 0.8038 | 0.9768 | 0.9626 | 0.212 | 0.153 | 1612 | 0.0055 | 0.9625 | |||||
| PT | 42 | 0.9842 | 0.9892 | 0.5494 | 0.991 | 0.9826 | 0.161 | 0.134 | 2499 | 0.0290 | 0.9696 | ||||||
| CAL | 42 | 0.9043 | 0.9402 | 0.4245 | 0.9622 | 0.8934 | 0.8892 | 0.8762 | 0.8291 | 0.46 | 0.391 | 378 | 0.861 | 0.0185 | 0.0192 | 0.8946 | |
| VAL | 51 | 0.7437 | 0.8493 | 0.6526 | 0.8661 | 0.7186 | 0.6918 | 0.5579 | 142 | 0.6185 | 0.2191 | 0.0243 | |||||
| 3 | AT | 63 | 0.9617 | 0.9805 | 0.95 | 0.9779 | 0.9593 | 0.27 | 0.218 | 1531 | 0.0261 | 0.9485 | |||||
| PT | 43 | 0.9801 | 0.985 | 0.6365 | 0.9866 | 0.9786 | 0.181 | 0.142 | 2021 | 0.0363 | 0.9618 | ||||||
| CAL | 39 | 0.9061 | 0.9484 | 0.8191 | 0.9641 | 0.8935 | 0.9018 | 0.8999 | 0.9416 | 0.312 | 0.25 | 357 | 0.8622 | 0.0849 | 0.0116 | 0.9002 | |
| VAL | 50 | 0.7698 | 0.8758 | 0.7781 | 0.8608 | 0.7487 | 0.5285 | 0.3843 | 160 | 0.6762 | 0.0806 | 0.0164 | |||||
| 4 | AT | 59 | 0.9421 | 0.9702 | 0.8767 | 0.9679 | 0.9379 | 0.304 | 0.244 | 928 | 0.0170 | 0.9336 | |||||
| PT | 39 | 0.9591 | 0.9633 | 0.5951 | 0.9735 | 0.9533 | 0.275 | 0.23 | 868 | 0.0260 | 0.9460 | ||||||
| CAL | 39 | 0.6463 | 0.7677 | 0.4273 | 0.9519 | 0.5975 | 0.6127 | 0.6123 | 0.6694 | 0.683 | 0.533 | 68 | 0.5109 | 0.2392 | 0.0167 | 0.6379 | |
| VAL | 58 | 0.7158 | 0.8445 | 0.6761 | 0.8388 | 0.6851 | 0.6872 | 0.4933 | 141 | 0.6084 | 0.0031 | 0.0191 |
Table 6.
The statistical characteristics of models observed for four splits in the case of the target function TF3.
| Split | Set | n | R2 | CCC | IIC | CII | Q2 | Q2F1 | Q2F2 | Q2F3 | RMSE | MAE | F | | | Y-Test | CR2p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AT | 67 | 0.8344 | 0.9097 | 0.6166 | 0.8904 | 0.8241 | | | | 0.499 | 0.367 | 328 | | | 0.0240 | 0.8223 |
| | PT | 42 | 0.8344 | 0.8793 | 0.5595 | 0.8952 | 0.8176 | | | | 0.527 | 0.419 | 202 | | | 0.0140 | 0.8274 |
| | CAL | 36 | 0.875 | 0.9215 | 0.9354 | 0.9166 | 0.8625 | 0.8842 | 0.8659 | 0.9045 | 0.378 | 0.294 | 238 | 0.7234 | 0.1368 | 0.0147 | 0.8676 |
| | VAL | 50 | 0.8630 | 0.9016 | 0.7244 | 0.9206 | 0.8493 | | | | 0.4510 | 0.3832 | 302 | 0.6493 | 0.1714 | 0.0225 | |
| 2 | AT | 60 | 0.8171 | 0.8993 | 0.7396 | 0.892 | 0.8038 | | | | 0.487 | 0.383 | 259 | | | 0.0230 | 0.8055 |
| | PT | 42 | 0.8843 | 0.9349 | 0.7103 | 0.9252 | 0.8741 | | | | 0.398 | 0.306 | 306 | | | 0.0209 | 0.8738 |
| | CAL | 42 | 0.8862 | 0.9396 | 0.9413 | 0.9262 | 0.876 | 0.8864 | 0.8731 | 0.8248 | 0.465 | 0.377 | 311 | 0.835 | 0.0645 | 0.0165 | 0.8779 |
| | VAL | 51 | 0.8059 | 0.8959 | 0.8869 | 0.8879 | 0.7864 | | | | 0.5162 | 0.4270 | 204 | 0.7248 | 0.0661 | 0.0243 | |
| 3 | AT | 63 | 0.8497 | 0.9187 | 0.893 | 0.9069 | 0.8406 | | | | 0.535 | 0.406 | 345 | | | 0.0213 | 0.8389 |
| | PT | 43 | 0.85 | 0.9044 | 0.5928 | 0.9096 | 0.836 | | | | 0.5 | 0.378 | 232 | | | 0.0157 | 0.8422 |
| | CAL | 39 | 0.8242 | 0.9077 | 0.9079 | 0.9032 | 0.8056 | 0.8222 | 0.8187 | 0.8943 | 0.42 | 0.348 | 174 | 0.7494 | 0.0327 | 0.024 | 0.8121 |
| | VAL | 50 | 0.8004 | 0.8946 | 0.8808 | 0.8747 | 0.7822 | | | | 0.4986 | 0.3774 | 192 | 0.7178 | 0.0005 | 0.0164 | |
| 4 | AT | 59 | 0.8517 | 0.9199 | 0.7787 | 0.906 | 0.8426 | | | | 0.486 | 0.378 | 327 | | | 0.0181 | 0.8426 |
| | PT | 39 | 0.8527 | 0.9139 | 0.6517 | 0.9115 | 0.832 | | | | 0.441 | 0.339 | 214 | | | 0.0310 | 0.8370 |
| | CAL | 39 | 0.8458 | 0.8919 | 0.9195 | 0.9124 | 0.8251 | 0.8261 | 0.8259 | 0.8516 | 0.457 | 0.355 | 203 | 0.634 | 0.1851 | 0.0408 | 0.8252 |
| | VAL | 58 | 0.8175 | 0.8824 | 0.6221 | 0.8867 | 0.8047 | | | | 0.5506 | 0.4074 | 251 | 0.6711 | 0.1788 | 0.0191 | |
These results demonstrate that all models developed across different data splits and target functions were statistically reliable. Notably, a consistent improvement in predictive performance was observed as models transitioned from TF0 to TF3. Among them, models optimized using TF3 exhibited the most favorable predictive power.
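The Y-randomization columns of the tables can be cross-checked with a short calculation. The sketch below assumes that the Y-Test column reports the mean squared correlation of the randomized models (Rr2) and that CR2p follows the commonly used definition CR2p = R·sqrt(R2 − Rr2); both are assumptions, since neither is stated in this section. The function name `c_r2_p` is illustrative only.

```python
import math


def c_r2_p(r2: float, r2_random: float) -> float:
    """CR2p = R * sqrt(R2 - Rr2), with R = sqrt(R2) (assumed definition)."""
    return math.sqrt(r2) * math.sqrt(r2 - r2_random)


# (R2, Y-Test, reported CR2p) copied from Table 6 (TF3), split 1.
rows = {
    "AT": (0.8344, 0.0240, 0.8223),
    "PT": (0.8344, 0.0140, 0.8274),
    "CAL": (0.8750, 0.0147, 0.8676),
}

# Recompute CR2p for each set and compare with the tabulated value.
recomputed = {name: c_r2_p(r2, yr) for name, (r2, yr, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(f"{name}: recomputed {recomputed[name]:.4f}, reported {reported:.4f}")
```

Under these assumptions the recomputed values agree with the tabulated CR2p entries to within rounding, which supports reading the Y-Test column as Rr2.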
Figures 3 and 4 present a comparative analysis of the statistical performance of all developed QSAR models for the validation sets, using determination coefficients (R2, Fig. 3) and root mean square errors (RMSE, Fig. 4) as key performance metrics. Across all four random splits and target functions (TF0–TF3), a clear and consistent pattern emerges: TF3 delivers superior predictive performance compared to TF0, TF1, and TF2. This superiority is manifested in its consistently higher R2 values (0.80–0.86) coupled with lower RMSE values (0.45–0.54), indicating that TF3 optimizes the descriptor selection and weighting in a manner that maximizes correlation while effectively controlling prediction errors.
Fig. 3.
Comparison of the determination coefficients (R2) for validation set calculated by TF0, TF1, TF2, and TF3 of all four splits.
Fig. 4.
Comparison of RMSE for validation set calculated by TF0, TF1, TF2, and TF3 of all four splits.
From a methodological standpoint, TF3’s advantage can be attributed to its multi-objective optimization strategy in CORAL, which integrates both correlation maximization and error minimization into the target function. This balanced optimization likely enhances the signal-to-noise ratio in the descriptor set, reduces overfitting risk, and results in models with better generalization to unseen data.
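How the four target functions relate to one another can be sketched as follows. This is a minimal illustration that assumes the form usually reported for CORAL, in which TF0 is the balance of correlation between the active and passive training sets and TF1 to TF3 add weighted IIC and/or CII terms computed on the calibration set. The penalty factor (`penalty`) and the weights (`w_iic`, `w_cii`) are hypothetical placeholders, not values taken from this study.

```python
import math


def target_functions(r_at, r_pt, iic_cal, cii_cal,
                     w_iic=0.3, w_cii=0.3, penalty=0.1):
    """Return TF0..TF3 for one optimization state (weights are hypothetical)."""
    # Balance of correlation: reward high r on both sets, penalize their gap.
    tf0 = r_at + r_pt - penalty * abs(r_at - r_pt)
    return {
        "TF0": tf0,
        "TF1": tf0 + w_iic * iic_cal,                      # adds IIC only
        "TF2": tf0 + w_cii * cii_cal,                      # adds CII only
        "TF3": tf0 + w_iic * iic_cal + w_cii * cii_cal,    # adds both
    }


# Inputs from Table 6 (TF3, split 1): R2 of AT and PT, IIC and CII of CAL.
r_at = math.sqrt(0.8344)
r_pt = math.sqrt(0.8344)
tf = target_functions(r_at, r_pt, iic_cal=0.9354, cii_cal=0.9166)
for name, value in tf.items():
    print(f"{name} = {value:.4f}")
```

In this form, TF3 differs from TF0 only through the two calibration-set terms, so any gain from TF3 comes from steering the optimization toward correlation weights that also score well on IIC and CII.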
Within the TF3 group, Split #1 stands out as the best-performing configuration, achieving the highest validation R2 (0.86) and the lowest RMSE (0.45) among all models. This indicates that the random distribution of training and validation compounds in Split #1 provided an optimal chemical diversity balance, enabling the model to learn representative patterns without bias toward specific structural subgroups.
Overall, these findings identify the TF3–Split #1 model as the most statistically reliable and practically applicable predictive tool in this study, combining robust internal consistency with high external predictive accuracy.
The regression equations for the models constructed using TF3-based Monte Carlo optimization for splits #1 through #4 are as follows:
![]() (17)
![]() (18)
![]() (19)
![]() (20)
Among these, the model from Split #1 demonstrated the highest predictive performance based on the determination coefficient of its validation set (see Table 6), and it was therefore selected as the representative model.
Figures 5 and 6 present plots of experimental versus predicted pIC50 values and residuals versus experimental pIC50 values for the four TF3-based models. As shown in Fig. 5, a strong linear correlation exists between experimental and predicted outcomes, confirming the reliability of the models. Figure 6 illustrates that residuals are symmetrically distributed around zero, indicating low prediction bias.
Fig. 5.
Graphical representation of predicted versus experimental pIC50 for split 1 to split 4 with TF3.
Fig. 6.
Graphical representation of residual versus experimental pIC50 for split 1 to split 4 with TF3.
An in-depth applicability domain assessment was performed using average defect values calculated via the CORAL software, with the aim of identifying chemical compounds whose structural or descriptor profiles deviate significantly from the representative chemical space of the training data. Such compounds, referred to as outliers, may exhibit reduced prediction reliability due to insufficient similarity to the domain encompassed by the model.
For the training sets of data splits #1 through #4, the number of compounds flagged as outliers was 24, 25, 21, and 14, respectively. These correspond to approximately 83%, 83%, 85%, and 90% of the training set compounds falling within the applicability domain. The consistently high inclusion percentages (> 80%) across all splits indicate that the majority of training compounds are well represented within the model’s descriptor space, suggesting a strong foundation for reliable internal predictions.
For the validation sets, the number of outliers identified for splits #1 through #4 was 6, 3, 13, and 9, respectively, corresponding to approximately 88%, 94%, 74%, and 84% of the validation compounds residing within the applicability domain. Although most validation compounds fall within the applicability domain, the lower coverage observed in split #3 (74%) warrants attention, as it suggests that a notable fraction of compounds in this split may lie outside the well-characterized model space, potentially impacting prediction reliability for those particular cases.
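The coverage percentages quoted above can be reproduced from the set sizes in Table 6 and the reported outlier counts. The sketch below assumes that "training set" here means the union of the active training, passive training, and calibration sets; that assumption is not stated explicitly, but it yields the reported figures.

```python
# Set sizes (AT, PT, CAL, VAL) per split, copied from Table 6.
set_sizes = {1: (67, 42, 36, 50), 2: (60, 42, 42, 51),
             3: (63, 43, 39, 50), 4: (59, 39, 39, 58)}
# Outlier counts reported in the text for each split.
train_outliers = {1: 24, 2: 25, 3: 21, 4: 14}
val_outliers = {1: 6, 2: 3, 3: 13, 4: 9}


def coverage(n_total: int, n_outliers: int) -> float:
    """Percentage of compounds inside the applicability domain."""
    return 100.0 * (n_total - n_outliers) / n_total


train_cov, val_cov = {}, {}
for split, (n_at, n_pt, n_cal, n_val) in set_sizes.items():
    # Assumption: training coverage is computed over AT + PT + CAL.
    train_cov[split] = coverage(n_at + n_pt + n_cal, train_outliers[split])
    val_cov[split] = coverage(n_val, val_outliers[split])
    print(f"split {split}: training {train_cov[split]:.1f}%, "
          f"validation {val_cov[split]:.1f}%")
```

With these inputs the training coverage comes out at about 83.4%, 82.6%, 85.5%, and 89.8%, and the validation coverage at about 88.0%, 94.1%, 74.0%, and 84.5%, in line with the approximate values given in the text.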
Overall, the applicability domain analysis confirms that the majority of both training and validation compounds are contained within the models’ applicability domain, supporting the robustness and generalization capability of the developed QSAR/QSPR models. Nonetheless, the identification of outliers, particularly in certain validation splits, highlights the importance of considering the applicability domain coverage when interpreting predictive performance metrics. Detailed compound-level results and defect value distributions are provided in the Supplementary Material.
Interpretation of the QSAR model
One of the principal advantages of the developed QSAR models lies in their mechanistic interpretability, which enables the identification of molecular features that contribute to either an increase or a decrease in the magnitude of the investigated endpoint. This capability also extends to recognizing structural attributes whose effects remain inconclusive.
By performing Monte Carlo optimization with multiple distinct initializations under identical conditions, namely, the same data splitting strategy and parameterization, the SMILES-derived molecular descriptors can be classified into three categories: (1) features that consistently exhibit positive correlation weights across all runs, referred to as promoters of endpoint increase; (2) features that consistently exhibit negative correlation weights across all runs, referred to as promoters of endpoint decrease; and (3) features that display both positive and negative weights in different runs, indicating an uncertain influence on the endpoint.
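The three-way classification described above amounts to checking the sign of each attribute's correlation weight across runs. A minimal sketch follows; the first two entries use the split #1 correlation weights listed in Table 7, while the last two attributes and their weights are hypothetical and included only to exercise the other two categories.

```python
def classify(weights):
    """Classify an attribute by the sign of its correlation weights across runs."""
    if all(w > 0 for w in weights):
        return "promoter of increase"
    if all(w < 0 for w in weights):
        return "promoter of decrease"
    # Mixed signs (or zero weights) give no consistent direction.
    return "uncertain"


# Correlation weights from three probes of the Monte Carlo optimization.
attributes = {
    "at least one ring": (4.1026, 0.3526, 0.3526),               # Table 7, split 1
    "two successive aromatic carbons": (0.3098, 0.3723, 0.3723),  # Table 7, split 1
    "hypothetical attribute A": (-0.41, -0.12, -0.30),            # invented values
    "hypothetical attribute B": (0.25, -0.18, 0.07),              # invented values
}

labels = {name: classify(cws) for name, cws in attributes.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Treating a zero weight as "uncertain" is a design choice of this sketch; the source does not say how such cases are handled.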
As summarized in Table 7, several structural attributes (SAk) were identified as increasing descriptors, including: the presence of at least one ring (1……….); the combination of aliphatic carbon with branching (C…(…….); Morgan connectivity of first order for carbon equal to 6 (EC1-C…6…); the nearest-neighbor code for a carbon atom equal to 220 (NNC-C…220); a path length of 2 equal to 2 for a carbon atom (PT2-C…2…); and the presence of two successive aromatic carbon atoms (c…c…….).
Table 7.
Significant structural attributes identified as contributors to the increase of pIC50.
| Structural attributes | Split# | CWs Probe 1 | CWs Probe 2 | CWs Probe 3 | NATa | NPTb | NCALc | Defect | Description |
|---|---|---|---|---|---|---|---|---|---|
| 1………. | 1 | 4.1026 | 0.3526 | 0.3526 | 67 | 42 | 36 | 0 | Presence of at least one ring |
| | 2 | 6.2762 | 0.0262 | 0.0262 | 60 | 42 | 42 | 0 | |
| | 3 | 1.6016 | 0.1641 | 0.1641 | 63 | 43 | 39 | 0 | |
| | 4 | 4.6615 | 0.0365 | 0.0365 | 59 | 39 | 39 | 0 | |
| C…(……. | 1 | 0.6782 | 0.0532 | 0.0532 | 67 | 42 | 36 | 0 | Combination of aliphatic carbon with branching |
| | 2 | 0.3598 | 0.4223 | 0.4223 | 60 | 42 | 42 | 0 | |
| | 3 | 0.0854 | 0.0854 | 0.0854 | 63 | 43 | 39 | 0 | |
| | 4 | 0.4065 | 0.4065 | 0.4065 | 59 | 39 | 39 | 0 | |
| EC1-C…6… | 1 | 0.3172 | 0.0672 | 0.0672 | 67 | 42 | 36 | 0 | Morgan connectivity first order for carbon equal to 6 |
| | 2 | 0.3706 | 0.4331 | 0.4331 | 60 | 42 | 42 | 0 | |
| | 3 | 0.3203 | 0.3203 | 0.3203 | 63 | 43 | 39 | 0 | |
| | 4 | 0.6191 | 0.3691 | 0.3691 | 59 | 39 | 39 | 0 | |
| NNC-C…220. | 1 | 0.0688 | 0.3188 | 0.3188 | 67 | 42 | 36 | 0 | Nearest neighbor code for a carbon atom equal to 220 |
| | 2 | 0.2644 | 0.3269 | 0.3269 | 60 | 42 | 42 | 0 | |
| | 3 | 0.0669 | 0.0669 | 0.0669 | 63 | 43 | 39 | 0 | |
| | 4 | 0.007 | 0.257 | 0.257 | 59 | 39 | 39 | 0 | |
| PT2-C…2… | 1 | 0.0151 | 0.0776 | 0.0776 | 67 | 42 | 36 | 0 | The path of length 2 is equal to 2 for a carbon atom |
| | 3 | 0.0378 | 0.2878 | 0.2878 | 60 | 42 | 42 | 0 | |
| | 4 | 0.2204 | 0.2829 | 0.2829 | 59 | 39 | 39 | 0 | |
| c…c……. | 1 | 0.3098 | 0.3723 | 0.3723 | 67 | 42 | 36 | 0 | Presence of two successive aromatic carbons |
| | 2 | 0.1819 | 0.2444 | 0.2444 | 60 | 42 | 42 | 0 | |
| | 3 | 0.4096 | 0.4721 | 0.4721 | 63 | 43 | 39 | 0 | |
| | 4 | 0.3603 | 0.3603 | 0.3603 | 59 | 39 | 39 | 0 | |
a,b,c The occurrence frequencies of SMILES attributes within the active training, passive training, and calibration datasets, respectively.
In agreement with the findings of Nour et al.30, the anti-BuChE activity is mainly influenced by the octanol–water partition coefficient (log P), highest occupied molecular orbital energy (Ehomo), total energy (Et), and dipole moment (µ). A decrease in Ehomo, or an increase in log P through the introduction of hydrophobic (alkyl) substituents, tends to enhance inhibitory activity.
The present analysis further demonstrates that descriptors such as the combination of aliphatic carbon with branching (C…(…….), the nearest-neighbor code for a carbon atom equal to 220 (NNC-C…220), and the presence of two successive aromatic carbon atoms (c…c…….) act as increasing descriptors, which is consistent with an increase in log P and, consequently, with higher anti-BuChE activity.
Table 8 visualizes the structures of several high-activity carbamate derivatives (high pIC50 values) with the corresponding increasing molecular descriptors highlighted. The experimental and predicted pIC50 values obtained from the best QSAR model are also indicated.
Table 8.
Visualization of high-activity carbamate derivatives with highlighted increasing descriptors and their experimental and predicted pIC50 values from the best QSAR model.
| No. | Structure | pIC50 (Exp.) | pIC50 (Prd.) | No. | Structure | pIC50 (Exp.) | pIC50 (Prd.) |
|---|---|---|---|---|---|---|---|
| 39 | | 7.16 | 7.04 | 178 | | 7.82 | 7.70 |
| 59 | | 7.24 | 6.28 | 183 | | 7.72 | 7.57 |
| 70 | | 7.57 | 6.58 | 189 | | 7.82 | 7.28 |
Model interpretation reveals that several SMILES-derived increasing descriptors, such as the presence of two successive aromatic carbons (c…c…….), the presence of at least one ring, and carbon branching patterns, are highly consistent with the known pharmacophore architecture of the BuChE active site47,48. The BuChE peripheral anionic site (PAS), mainly composed of aromatic residues such as Trp82, Tyr332, and Phe329, exhibits strong π–π and hydrophobic interactions with aromatic or bulky rings. Therefore, the increasing descriptors identifying aromatic fragments directly agree with the ability of carbamate derivatives to anchor into the PAS through π-stacking, which stabilizes the initial ligand orientation prior to carbamoylation of the catalytic Ser198.
Descriptors associated with aliphatic carbon branching (C…(…….) and hydrophobic atom-neighbor patterns (NNC-C…220, PT2-C…2…) correlate with an increase in BuChE inhibition. This trend aligns with the structural characteristics of the acyl pocket of BuChE, which is wider and more hydrophobic than in AChE. The acyl pocket (formed by residues such as Leu286, Val288, and Phe398) can effectively accommodate bulky and branched alkyl groups9,24. Thus, the model’s identification of branched aliphatic carbons as activity-promoting features corresponds to the known ability of BuChE to bind bulky carbamate substituents, enabling favorable hydrophobic and van der Waals contacts.
The observation that several increasing descriptors relate indirectly to higher log P values (e.g., extended hydrophobic fragments) also supports mechanistic relevance. More hydrophobic carbamate derivatives are better positioned within the gorge of BuChE, which is dominated by nonpolar residues, leading to a more favorable orientation toward the catalytic triad (Ser198–His438–Glu325). This orientation is crucial for efficient covalent carbamoylation of the catalytic serine (Ser198)30,49. Therefore, the model’s tendency to favor hydrophobic substituents matches the biochemical mechanism of pseudo-irreversible BuChE inhibition by carbamates.
This alignment between model-derived structural features and the established BuChE pharmacophore is consistent with prior SAR and docking studies, which similarly highlight the importance of aromatic anchoring in the PAS, hydrophobic occupation of the acyl pocket, and optimal positioning of the carbamate moiety near the catalytic Ser198 residue50. Consequently, the increasing descriptors identified in Table 8 not only reflect statistical correlations but are also mechanistically meaningful within the known biochemical context of BuChE–ligand interactions.
Comparison of the results with reported models
Several QSAR studies have been reported for modeling BuChE inhibitors using different compound series and computational strategies. Early QSAR efforts, such as that by Uddin et al.51, developed 3D-QSAR models (CoMFA and CoMSIA) based on 39 steroidal alkaloids, achieving strong statistical results (q2 = 0.701, r2 = 0.979). However, their approach required molecular alignment and was limited by a small, structurally homogeneous dataset.
Fang et al.49 and Pang et al.52 applied 2D- and 3D-QSAR methods to berberine and DL0410 derivatives, respectively, with acceptable predictivity (R2 ≈ 0.88). More data-driven approaches were reported by Bitam et al. (2018)53, who used machine learning models (SVR, MLP, and GA-MLR) on 151 tacrine derivatives, achieving R2_test = 0.906, and by Kumar et al.54, who constructed GA-MLR models with R2 > 0.81.
More recently, El Allouche et al.55 explored benzodiazepine-1,2,3-triazole derivatives using a 2D-QSAR model based on MLR, yielding R2 values of 0.77 and 0.81 for the training and test sets, respectively. Their integrated workflow highlighted the role of hydrogen bonding in BuChE inhibition and demonstrated the predictive potential of hybrid chemoinformatics pipelines. Nonetheless, their study was limited to 31 compounds and relied on manually derived molecular descriptors.
Particularly relevant to the present work, a 2D-QSAR study of carbamate derivatives by Nour et al.30 applied MLR with DFT- and Lipinski-derived descriptors to model BuChE inhibition. Their model demonstrated strong predictivity (R2_test = 0.817; Q2_CV = 0.774), identifying molecular features such as logP, HOMO energy, total energy, and dipole moment as key determinants of inhibitory activity. While the 2022 study used a small dataset (36 compounds) and conventional descriptors, it highlighted the potential of QSAR modeling for carbamate-type BuChE inhibitors.
In contrast, the present SMILES-based QSAR model was developed using a significantly larger and more chemically diverse dataset (205 carbamate derivatives) through Monte Carlo optimization within the CORAL-2023 framework. The resulting models exhibited robust statistical quality (R2_val = 0.80–0.86; Q2 = 0.78–0.84; RMSE = 0.45–0.54) and strong external validation. Unlike descriptor-based or alignment-dependent QSAR models, the proposed method is alignment-free, interpretable, and computationally efficient, offering broader chemical coverage and comparable or superior predictivity. Therefore, the present work advances previous QSAR studies by integrating data-driven optimization with simplified molecular representation, leading to a reliable and interpretable model for BuChE inhibition prediction. Moreover, the present model provides structural insights by identifying SMILES fragments and molecular features that contribute positively or negatively to BuChE inhibition, which are consistent with the steric and electronic patterns reported in earlier 2D/3D-QSAR studies.
Conclusion
In this work, QSAR models were developed using the Monte Carlo method to predict the pIC50 of 205 carbamate derivatives acting as butyrylcholinesterase inhibitors, and the models were validated with several statistical parameters.
SMILES notation was used to represent the chemical structures of the carbamate derivatives.
The hybrid optimal descriptor, calculated from SMILES by the Monte Carlo algorithm in the CORAL software, was used to build the models. Four target functions were employed: balance of correlation without IIC or CII (TF0), balance of correlation with IIC (TF1), balance of correlation with CII (TF2), and balance of correlation with both IIC and CII (TF3), applied across the four random splits. The models built with TF3 were more reliable and robust and showed better predictive capacity. The index of ideality of correlation and the correlation intensity index once more confirmed their suitability as tools for improving the predictive potential of a model. The TF3 model for Split #1, with R2Validation = 0.86, outperformed the other models and is therefore regarded as the leading model. Various standard statistical benchmarks, such as R2, CCC, IIC, CII, Q2, Q2F1, Q2F2, Q2F3, MAE, RMSE, CR2p, F, and the Y-test, among others, were computed to judge the predictive potential and robustness of the developed QSAR models. The applicability domain was also studied on the basis of the “statistical defect,” d(S). Mechanistic interpretation was carried out by identifying the SMILES attributes that act as promoters of endpoint increase or promoters of endpoint decrease. The identified activity-promoting descriptors were found to align strongly with the established pharmacophore architecture of BuChE, particularly the PAS, acyl pocket, and catalytic triad. This mechanistic agreement supports the reliability of the developed QSAR model and highlights its practical relevance for guiding structural optimization of future carbamate-based BuChE inhibitors.
Although the developed SMILES-based QSAR models demonstrated strong statistical performance and predictive reliability, several limitations should be acknowledged. First, the models were constructed using the available experimental dataset of carbamate derivatives, which may not fully represent compounds with substantially different structural scaffolds. Second, while the CORAL-generated descriptors provide excellent predictive capacity, they do not explicitly account for three-dimensional conformational or dynamic effects occurring within the BuChE active site. Therefore, further experimental validation and molecular docking or dynamics simulations are recommended to confirm the predictive reliability and general applicability of the proposed models.
Author contributions
Negin Latifi contributed to data curation, methodology, software, visualization, and drafting the original manuscript. Shahin Ahmadi was responsible for conceptualization, supervision, methodology, investigation, validation, and critical revision of the manuscript. Shahram Lotfi contributed to conceptualization, drafting, and critical review. Saeed Akbarzadeh participated in conceptualization and critical revision of the manuscript. All authors have read and approved the final version of the manuscript and agree to be accountable for all aspects of the work.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on request.
Declarations
Ethics statement
This study does not involve any research with human participants or animals conducted by any of the authors.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Li, Y. et al. Development of novel Rivastigmine derivatives as selective BuChE inhibitors for the treatment of AD. Bioorg. Chem. 108245 (2025). [DOI] [PubMed]
- 2.Darras, F. H., Kling, B., Sawatzky, E., Heilmann, J. & Decker, M. Cyclic acyl guanidines bearing carbamate moieties allow potent and dirigible cholinesterase inhibition of either acetyl-or butyrylcholinesterase. Bioorg. Med. Chem.22(17), 5020–5034 (2014). [DOI] [PubMed] [Google Scholar]
- 3.Ahmadi, S. & Ganji, S. Genetic algorithm and self-organizing maps for QSPR study of some N-aryl derivatives as butyrylcholinesterase inhibitors. Curr. Drug Discov Technol.13(4), 232–253 (2016). [DOI] [PubMed] [Google Scholar]
- 4.Berry, A. S. & Harrison, T. M. New perspectives on the basal forebrain cholinergic system in Alzheimer’s disease. Neurosci. Biobehav Rev.150, 105192 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Zhang, Y. L. et al. Carvacrol/thymol derivatives as highly selective BuChE inhibitors with anti-inflammatory activities: discovery and bio-evaluation. Bioorg. Chem.160, 108430 (2025). [DOI] [PubMed] [Google Scholar]
- 6.Bravo, S. O., Henley, J. & Rodriguez-Ithurralde, D. (35) Acetylcholinesterase effects on glutamate receptors. Chem. Biol. Interact.157, 410–411 (2005). [PubMed]
- 7.Nazari, Z., Ahmadi, S. & Almasirad, A. Quantitative structure-activity relationship of some flavonoid derivatives as acetylcholinesterase inhibitors based on Monte Carlo algorithm. (2024).
- 8.Yiannopoulou, K. G. & Papageorgiou, S. G. Current and future treatments for Alzheimer’s disease. Ther. Adv. Neurol. Disord. 6(1), 19–33 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Bajda, M., Łątka, K., Hebda, M., Jończyk, J. & Malawska, B. Novel carbamate derivatives as selective butyrylcholinesterase inhibitors. Bioorg. Chem.78, 29–38 (2018). [DOI] [PubMed] [Google Scholar]
- 10.Greig, N. H. et al. Selective butyrylcholinesterase Inhibition elevates brain acetylcholine, augments learning and lowers Alzheimer β-amyloid peptide in rodent. Proc. Natl. Acad. Sci.102(47), 17213–17218 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Nordberg, A., Ballard, C., Bullock, R., Darreh-Shori, T. & Somogyi, M. A review of butyrylcholinesterase as a therapeutic target in the treatment of Alzheimer’s disease. Prim. Care Companion CNS Disord. 15(2), 26731 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Kovacs, G. G. et al. Linking pathways in the developing and aging brain with neurodegeneration. Neuroscience269, 152–172 (2014). [DOI] [PubMed] [Google Scholar]
- 13.Pizova, H. et al. Proline-based carbamates as cholinesterase inhibitors. Molecules22(11), 1969 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Matošević, A. & Bosak, A. Carbamate group as structural motif in drugs: A review of carbamate derivatives used as therapeutic agents. Arch. Ind. Hyg. Toxicol.71(4), 285 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Ahmadi, S., Ketabi, S. & Jebeli Javan, M. Molecular descriptors in QSPR/QSAR modeling. In QSPR/QSAR Analysis Using SMILES and Quasi-SMILES. 25–56 (Springer, 2023).
- 16.Dearden, J. C. The history and development of quantitative structure-activity relationships (QSARs). In Oncology: Breakthroughs in Research and Practice, 67–117. (IGI Global, 2017).
- 17.Khan, K., Benfenati, E. & Roy, K. Consensus QSAR modeling of toxicity of pharmaceuticals to different aquatic organisms: ranking and prioritization of the drugbank database compounds. Ecotoxicol. Environ. Saf.168, 287–297 (2019). [DOI] [PubMed] [Google Scholar]
- 18.Ghiasi, T., Ahmadi, S., Ahmadi, E., Talei Bavil Olyai, M. R. & Khodadadi, Z. The index of ideality of correlation: QSAR studies of hepatitis C virus NS3/4A protease inhibitors using SMILES descriptors. SAR QSAR Environ. Res.32(6), 495–520 (2021). [DOI] [PubMed] [Google Scholar]
- 19.Toropova, A. P., Toropov, A. A., Benfenati, E. & Gini, G. QSAR models for toxicity of organic substances to daphnia magna built up by using the CORAL freeware. Chem. Biol. Drug Des.79(3), 332–338 (2012). [DOI] [PubMed] [Google Scholar]
- 20.Achary, P. G. R. QSPR modelling of dielectric constants of π-conjugated organic compounds by means of the CORAL software. SAR QSAR Environ. Res.25(6), 507–526 (2014). [DOI] [PubMed] [Google Scholar]
- 21.Živković, J. V., Trutić, N. V., Veselinović, J. B., Nikolić, G. M. & Veselinović, A. M. Monte Carlo method based QSAR modeling of maleimide derivatives as glycogen synthase kinase-3β inhibitors. Comput. Biol. Med.64, 276–282 (2015). [DOI] [PubMed] [Google Scholar]
- 22.Jiang, X. et al. Novel cannabidiol – carbamate hybrids as selective BuChE inhibitors: Docking-based fragment reassembly for the development of potential therapeutic agents against Alzheimer’s disease. Eur. J. Med. Chem.223, 113735 (2021). [DOI] [PubMed] [Google Scholar]
- 23.Krátký, M., Štěpánková, Š., Vorčáková, K., Švarcová, M. & Vinšová, J. Novel cholinesterase inhibitors based on O-aromatic N, N-disubstituted carbamates and thiocarbamates. Molecules21(2), 191 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Panek, D. et al. Discovery of new, highly potent and selective inhibitors of BuChE-design, synthesis, in vitro and in vivo evaluation and crystallography studies. Eur. J. Med. Chem.249, 115135 (2023). [DOI] [PubMed] [Google Scholar]
- 25.Wu, J. et al. Design, synthesis and biological evaluation of naringenin carbamate derivatives as potential multifunctional agents for the treatment of Alzheimer’s disease. Bioorg. Med. Chem. Lett.49, 128316 (2021). [DOI] [PubMed] [Google Scholar]
- 26.Yu, C. et al. Novel anti-neuroinflammatory pyranone-carbamate derivatives as selective butyrylcholinesterase inhibitors for treating Alzheimer’s disease. J. Enzyme Inhib. Med. Chem.39(1), 2313682 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Bak, A. et al. Novel benzene-based carbamates for AChE/BChE inhibition: synthesis and ligand/structure-oriented SAR study. Int. J. Mol. Sci.20(7), 1524 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Yamazaki, D. A. S. et al. Novel arylcarbamate-N-acylhydrazones derivatives as promising BuChE inhibitors: Design, synthesis, molecular modeling and biological evaluation. Bioorg. Med. Chem.32, 115991 (2021). [DOI] [PubMed] [Google Scholar]
- 29.Wu, J., Pistolozzi, M., Liu, S. & Tan, W. Design, synthesis and biological evaluation of novel carbamates as potential inhibitors of acetylcholinesterase and butyrylcholinesterase. Bioorg. Med. Chem.28(5), 115324 (2020). [DOI] [PubMed] [Google Scholar]
- 30.Nour, H. et al. 2D-QSAR and molecular docking studies of carbamate derivatives to discover novel potent anti‐butyrylcholinesterase agents for Alzheimer’s disease treatment. Bull. Korean Chem. Soc.43(2), 277–292 (2022). [Google Scholar]
- 31. Toropov, A. A. & Toropova, A. P. The index of ideality of correlation: A criterion of predictive potential of QSPR/QSAR models? Mutat. Res. - Genet. Toxicol. Environ. Mutagen. 819 (2017).
- 32. Lotfi, S., Ahmadi, S. & Kumar, P. A hybrid descriptor based QSPR model to predict the thermal decomposition temperature of imidazolium ionic liquids using Monte Carlo approach. J. Mol. Liq. 338 (2021).
- 33. Ahmadi, S., Lotfi, S., Hamzehali, H. & Kumar, P. A simple and reliable QSPR model for prediction of chromatography retention indices of volatile organic compounds in peppers. RSC Adv. 14(5), 3186–3201 (2024).
- 34. Ahmadi, S. & Akbari, A. Prediction of the adsorption coefficients of some aromatic compounds on multi-wall carbon nanotubes by the Monte Carlo method. SAR QSAR Environ. Res. 29(11), 895–909 (2018).
- 35. Kumar, A., Kumar, P. & Singh, D. QSRR modelling for the investigation of gas chromatography retention indices of flavour and fragrance compounds on Carbowax 20M glass capillary column with the index of ideality of correlation and the consensus modelling. Chemom. Intell. Lab. Syst. 224 (2022).
- 36. Toropova, A. P., Toropov, A. A., Roncaglioni, A. & Benfenati, E. Monte Carlo technique to study the adsorption affinity of azo dyes by applying new statistical criteria of the predictive potential. SAR QSAR Environ. Res. 33(8), 621–630 (2022).
- 37. Ahmadi, S. Mathematical modeling of cytotoxicity of metal oxide nanoparticles using the index of ideality correlation criteria. Chemosphere 242, 125192 (2020).
- 38. Lotfi, S., Ahmadi, S., Toropova, A. P. & Toropov, A. A. Construction of reliable QSPR models for predicting the impact sensitivity of nitroenergetic compounds using correlation weights of the fragments of molecular structures. Sci. Rep. 15(1), 11160 (2025).
- 39. Toropova, A. P., Toropov, A. A., Kudyshkin, V. O., Leszczynska, D. & Leszczynski, J. Application of monomer structures and fragments of local symmetry for simulation of glass transition temperatures of polymers. SAR QSAR Environ. Res. 1–9 (2025).
- 40. Ahmadi, S., Lotfi, S. & Kumar, P. A Monte Carlo method based QSPR model for prediction of reaction rate constants of hydrated electrons with organic contaminants. SAR QSAR Environ. Res. 31(12) (2020).
- 41. Duhan, M. et al. Quantitative structure activity relationship studies of novel hydrazone derivatives as α-amylase inhibitors with index of ideality of correlation. J. Biomol. Struct. Dyn. 40(11) (2022).
- 42. Roy, K., Das, R. N., Ambure, P. & Aher, R. B. Be aware of error measures. Further studies on validation of predictive QSAR models. Chemom. Intell. Lab. Syst. 152, 18–33 (2016).
- 43. Golbraikh, A. & Tropsha, A. Beware of q2! J. Mol. Graph. Model. (2002).
- 44. Todeschini, R., Ballabio, D. & Grisoni, F. Beware of unreliable Q2! A comparative study of regression metrics for predictivity assessment of QSAR models. J. Chem. Inf. Model. 56(10), 1905–1913 (2016).
- 45. Roy, K. et al. Comparative studies on some metrics for external validation of QSPR models. J. Chem. Inf. Model. (2012).
- 46. Roy, K. et al. Introduction of rm2 (rank) metric incorporating rank-order predictions as an additional tool for validation of QSAR/QSPR models. Chemom. Intell. Lab. Syst. 118, 200–210 (2012).
- 47. Nachon, F., Brazzolotto, X., Trovaslet, M. & Masson, P. Progress in the development of enzyme-based nerve agent bioscavengers. Chem. Biol. Interact. 206(3), 536–544 (2013).
- 48. Nicolet, Y., Lockridge, O., Masson, P., Fontecilla-Camps, J. C. & Nachon, F. Crystal structure of human butyrylcholinesterase and of its complexes with substrate and products. J. Biol. Chem. 278(42), 41141–41147 (2003).
- 49. Fang, J. et al. Molecular modeling on berberine derivatives toward BuChE: An integrated study with quantitative structure–activity relationships models, molecular docking, and molecular dynamics simulations. Chem. Biol. Drug Des. 87, 649–663 (2016).
- 50. Masson, P. & Lockridge, O. Butyrylcholinesterase for protection from organophosphorus poisons: Catalytic complexities and hysteretic behavior. Arch. Biochem. Biophys. 494(2), 107–120 (2010).
- 51. Uddin, R., Yuan, H., Petukhov, P. A., Choudhary, M. I. & Madura, J. D. Receptor-based modeling and 3D-QSAR for a quantitative production of the butyrylcholinesterase inhibitors based on genetic algorithm. J. Chem. Inf. Model. 48(5), 1092–1103 (2008).
- 52. Pang, X. et al. Evaluation of novel dual acetyl- and butyrylcholinesterase inhibitors as potential anti-Alzheimer’s disease agents using pharmacophore, 3D-QSAR and molecular docking approaches. Molecules 22(8), 1254 (2017).
- 53. Bitam, S., Hamadache, M. & Hanini, S. Prediction of therapeutic potency of tacrine derivatives as BuChE inhibitors from quantitative structure–activity relationship modelling. SAR QSAR Environ. Res. 29(3), 213–230 (2018).
- 54. Kumar, S. et al. Exploiting butyrylcholinesterase inhibitors through a combined 3-D pharmacophore modeling, QSAR, molecular docking, and molecular dynamics investigation. RSC Adv. 13(14), 9513–9529 (2023).
- 55. El Allouche, Y. et al. Chemoinformatics study of benzodiazepine-1,2,3-triazole derivatives targeting butyrylcholinesterase. J. Fluoresc. 35(5), 3667–3680 (2025).
Data Availability Statement
The datasets generated and/or analyzed during the current study are available from the corresponding author on request.