In Silico Pharmacology. 2021 Apr 4;9(1):28. doi: 10.1007/s40203-021-00087-w

In silico local QSAR modeling of bioconcentration factor of organophosphate pesticides

Purusottam Banjare 1, Balaji Matore 1, Jagadish Singh 1, Partha Pratim Roy 1
PMCID: PMC8019672  PMID: 33868896

Abstract

The persistent and accumulative nature of indiscriminately used pesticides has made them an ecotoxicological hazard. The bioconcentration factor (BCF) is one of the key elements in environmental assessment of the aquatic compartment. The limited prediction accuracy of global models favors the use of local predictive models in toxicity modeling of emerging compounds. BCF data for diverse organophosphates (n = 55) were collected from the Pesticide Properties Database and used as the model data set in the present study to explore the physicochemical properties and structural alerts related to BCF. The structures were downloaded from the PubChem and ChemSpider databases. Two splitting techniques (biological sorting and structure-based) were used to divide the whole dataset into training and test sets. The QSAR study was carried out with two-dimensional (2D) descriptors calculated with PaDEL, using the genetic algorithm (GA) as the chemometric tool in the QSARINS software. The models were statistically robust both internally and externally (Q2: 0.709–0.722, Q2Ext: 0.717–0.903, CCC: 0.857–0.880). Overall, molecular mass and the presence of fused and heterocyclic rings with electron-withdrawing groups affect the BCF value. The developed models showed a wider applicability domain (AD) and more reliable predictions than the reported models for the studied chemical class. Finally, BCF values were predicted for organophosphate pesticides without experimental data, and their potential hazard is discussed. These findings may be useful to the scientific community for prioritizing organophosphate pesticides with high hazard potential.

Keywords: BCF, QSAR, GA, Database, Aquatic, AD

Introduction

Since 1950, the exponential rise in the world population has increased the demand for food grains and crops, while agricultural land has expanded only to a limited extent. Pesticides are widely used in agriculture without much heed to the consequences of their unregulated and indiscriminate use (Gerwick et al. 2014; Lema et al. 2014; Neve et al. 2009; Oerke 2006). The detection of pesticides and their degradation products in soil, water and air at relevant levels has raised public concern, since they can cause adverse effects in target and non-target organisms. The persistent, bioaccumulative, and toxic nature of agrochemicals is responsible for different kinds of ecotoxicity. Some pesticides (such as DDT and chlordane) persist in the environment for a very long time. More specifically, developing and agriculture-based countries like India consume much higher quantities of these chemicals (Köhler et al. 2013). Organophosphates are among the cheapest and most widely used pesticides. Many active ingredients that are potentially dangerous to health (e.g. chlorpyrifos and malathion) are routinely found in food and breast milk (Gavrilescu 2005). The aquatic environment is often the final destination of many contaminants. European regulations also require bioconcentration factor (BCF) values for the registration of compounds, in order to manage their concentration in water, which in turn affects daily intake through fish (REACH in Brief, European Commission, Environment Directorate General, 2007).

Bioconcentration is a hazard in itself, even without acute or chronic toxicity (Grisoni et al. 2016). In environmental assessments of the aquatic compartment, the chemical property of interest for modeling the fate and persistence of chemicals is the bioconcentration factor (BCF) (Arnot and Gobas 2006; Wang et al. 2014). It describes the partitioning of a compound between an organism and the surrounding environment (Mackay and Fraser 2000; Voutsas et al. 2002). Experimental determination of BCF is expensive and time-consuming. Much attention has therefore been given to in silico techniques such as QSAR, which statistically relate molecular structure to molecular properties. These non-animal models are being adopted by different regulatory agencies (REACH; EC No 1907/2006) to fill data gaps for different ecotoxicological endpoints, including BCF (http://www.epa.gov/), for compounds lacking experimental values. In silico QSAR/QSPR models thus play a pivotal role both in reducing the enormous cost of BCF testing according to OECD guideline 305 and in filling data gaps. The investigation of the BCF of pesticides is therefore of utmost importance.

Numerous studies have indicated a good correlation between log BCF and log P (n-octanol/water partition coefficient). Most computational studies derived linear models from log P, and, to a lesser extent, nonlinear log P-based models have also been reported. Models based on the octanol–water partition coefficient predict well and are suited to lipid-driven BCF, but are inadequate for BCF driven by polar interactions. Several in silico BCF models for pesticides based on chromatographic retention (log k) in biopartitioning micellar chromatography (BMC), multivariate image analysis (MIA) descriptors, an artificial membrane accumulation index, and online descriptors have been reported by different groups (Aranda et al. 2017; Bermúdez-Saldaña et al. 2005; Bintein et al. 1993; Devillers et al. 1996; Freitas et al. 2016; Fujikawa et al. 2009; Garg and Smith 2014; Gramatica and Papa 2003; Gramatica and Papa 2005; Grisoni et al. 2015; Ivanciuc et al. 2006; Mackay 1982; Nendza and Herbst 2011; Papa et al. 2007; Yuan et al. 2016).

Many QSAR models of BCF for large, diverse datasets have been reported by different research groups, as noted above. These models were not specific to a particular chemical class, and therefore unknown compounds of a particular class will often fall outside their applicability domain. This indicates the need for statistically robust, class-specific models with an extended applicability domain for reliable prediction of unknown compounds. The predictive nature of local models therefore motivated their development in the present study.

Nowadays, the exponential increase in computational capability and in online informatics resources helps researchers to develop online models with reproducible performance for a wide range of users (Banjare et al. 2017; Igor et al. 2017). Recently, several freely available web servers and software tools, such as ACFIS for fragment-based drug discovery, the Py-CoMFA web application for 3-D QSAR, DTC-QSAR, and Cloud 3D-QSAR, have been introduced that can be helpful to researchers in this field (Hao et al. 2016; Ragno 2019; Wang et al. 2020a, b; Yang et al. 2020; https://dtclab.webs.com/software-tools). Against this background, we collected BCF data for the organophosphate class of pesticides from the footprint database and developed local QSAR models using online resources and licensed freeware. The applicability domain of our model was compared with that of available reliable models to assess the validity and reliability of our model's predictions. Additionally, for our endpoint (BCF), we compared our predictions with those of reported models: the VEGA-CAESAR, VEGA-Meylan, and VEGA Read-across models (https://www.vegahub.eu/portfolio-item/vega-qsar/) and the EPISuite BCFBAF model (https://www.epa.gov/tsca-screening-tools/epi-suitetm-estimation-program-interface). Finally, our model was applied to organophosphates without experimental BCF values to identify potentially hazardous and bioaccumulative chemicals.

Material and methods

Dataset and descriptor calculation

The starting point of the study was the collection of BCF data for organophosphate pesticides. We collected BCF data for 60 diverse organophosphate agrochemicals (Table S1) from a verified and reliable source, the footprint pesticide database (https://sitem.herts.ac.uk/aeru/ppdb/en/index.htm). Five compounds (acephate, bromophos, formothion, methamidophos, pyrimitate) were removed after examination of the applicability domain and of their potential influence on the predictive power of the model, leaving a final set of 55 compounds as the model dataset (Yuan et al. 2016). The logarithm of BCF was used as the response variable. Additionally, we collected 99 organophosphate pesticides without experimental BCF values. All structures were downloaded from PubChem and ChemSpider (https://pubchem.ncbi.nlm.nih.gov; http://www.chemspider.com/) in .mol and SDF formats, which are input formats for the PaDEL descriptor software. A total of 4759 2D descriptors (topological, electrostatic, fingerprint, etc.) were calculated using the freely available PaDEL descriptor software, version 2.21 (Yap 2011). Descriptors with constant or semi-constant (80%) values and those with a pairwise correlation greater than 0.85 were excluded in QSARINS to reduce redundancy and uninformative data (Gramatica et al. 2013). Finally, a set of 853 descriptors was used as input for QSAR modeling.
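The descriptor pruning step can be illustrated with a minimal sketch. The function name, the greedy choice of which descriptor of a correlated pair to drop, and the exact reading of "semi-constant (80%)" are assumptions made here for illustration; the filter implemented in QSARINS may differ in these details.

```python
import numpy as np

def prune_descriptors(X, names, max_same_frac=0.80, max_abs_corr=0.85):
    """Drop (semi-)constant columns, then one of each highly correlated pair.

    X is an (n_compounds, n_descriptors) array; names labels the columns.
    Thresholds follow the text; the selection order is an assumption.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    keep = []
    for j in range(X.shape[1]):
        # Fraction of compounds sharing the most common value in column j.
        _, counts = np.unique(X[:, j], return_counts=True)
        if counts.max() / n < max_same_frac:
            keep.append(j)
    selected = []
    for j in keep:
        # Keep a column only if it is not too correlated with any kept column.
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= max_abs_corr
               for k in selected):
            selected.append(j)
    return [names[j] for j in selected]
```

Because the second filter is greedy, the surviving descriptor of a correlated pair depends on column order; a different tie-breaking rule would give a different, equally valid reduced set.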

Model development

QSAR models were developed by multiple linear regression (MLR) using the ordinary least squares (OLS) method, and the genetic algorithm-variable subset selection (GA-VSS) procedure included in QSARINS was applied to select the modeling descriptors (Gramatica et al. 2013, 2014). Two different splitting techniques, namely ordered response (biological sorting) and a structure-based approach, were applied for data division. Approximately 25% of the compounds (14) were retained in the test set, and the remaining 75% (41) were used for model development (Golbraikh et al. 2012; Roy et al. 2008). The models were evaluated internally and further validated on the test set compounds. For model selection, we chose the most stable combination of variables, i.e. the one that appeared in both splittings under the same user settings (genetic iterations = 10,000, mutation rate = 50%). All statistical analyses, from descriptor pruning through model development to applicability domain analysis, were carried out in QSARINS running under the Windows operating system (Gramatica et al. 2013). A flowchart of the whole methodology is given in Fig. 1.
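As an illustration of the ordered-response (biological sorting) division, the sketch below ranks compounds by log BCF and sends every fourth one to the test set. The step of 4 and the starting offset are assumptions chosen so that 55 compounds yield the 41/14 split reported here; the exact rule applied in QSARINS is not stated in the text.

```python
def sorted_response_split(log_bcf, step=4):
    """Return (training_idx, test_idx) after ranking compounds by response.

    Every `step`-th compound in the ranking goes to the test set, starting
    from the second-ranked one so that both extremes stay in training.
    """
    order = sorted(range(len(log_bcf)), key=lambda i: log_bcf[i])
    test = [order[r] for r in range(1, len(order), step)]
    test_set = set(test)
    train = [i for i in order if i not in test_set]
    return train, test
```

Keeping the lowest and highest responses in the training set means the test compounds are interpolated in response space, which is the usual motivation for this kind of split.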

Fig. 1. Flowchart of the whole methodology

Statistical qualities

The statistical quality of the equations was judged by parameters such as the squared correlation coefficient (R2), adjusted R2 (Ra2), and variance ratio (F) at specified degrees of freedom (df) (Cochran and Snedecor). R2 is a measure of explained variance. The adjusted form, Ra2, has been proposed because R2 is biased upward as further variables are added. The stability of the regression coefficients was checked by the variance ratio F. The generated QSAR equations were validated internally by the cross-validation coefficients Q2LOO (leave-one-out) and Q2LMO (leave-many-out, i.e. 30% of chemicals excluded in each iteration) and by the predicted residual sum of squares (PRESS) (Debnath et al. 2001; Eriksson et al. 1995; Roy 2007). Additionally, chance correlation between the modeling descriptors and the response was checked by the Y-scrambling method using the averaged scrambled R2 (R2ys) and averaged scrambled Q2 (Q2ys) over 2000 iterations (Gramatica 2007; Gramatica et al. 2012). The performance of the models was then evaluated on the test set compounds. The parameters for external validation included Q2-F1, Q2-F2, Q2-F3, r2m, the concordance correlation coefficient (CCC) (Chirico and Gramatica 2011, 2012), and the mean absolute error (MAE) (Chai and Draxler 2014; Consonni et al. 2009; Lin 1992; Roy and Roy 2008; Shi et al. 2001; Schüürmann et al. 2008). In addition, the root mean squared error (RMSE) (Gramatica 2020) was calculated as a measure of prediction accuracy in the training (RMSETR) and prediction (RMSEP) sets.
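For readers who wish to reproduce the external validation statistics outside QSARINS, the following sketch computes them from observed and predicted log BCF values using their usual textbook definitions. The function and variable names are illustrative, r2m is omitted, and it is assumed rather than verified that QSARINS applies the same formulas.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def external_metrics(y_train, y_test, y_pred):
    """External validation statistics for test-set predictions."""
    n_tr, n_ext = len(y_train), len(y_test)
    m_tr, m_ext, m_pred = mean(y_train), mean(y_test), mean(y_pred)
    # Sum of squared prediction errors on the test set.
    press = sum((o - p) ** 2 for o, p in zip(y_test, y_pred))
    ss_tr = sum((o - m_tr) ** 2 for o in y_train)
    ss_obs = sum((o - m_ext) ** 2 for o in y_test)
    ss_pred = sum((p - m_pred) ** 2 for p in y_pred)
    cov = sum((o - m_ext) * (p - m_pred) for o, p in zip(y_test, y_pred))
    return {
        # Q2-F1 references the training mean, Q2-F2 the test mean.
        "Q2F1": 1 - press / sum((o - m_tr) ** 2 for o in y_test),
        "Q2F2": 1 - press / ss_obs,
        # Q2-F3 scales both sums by their own sample sizes.
        "Q2F3": 1 - (press / n_ext) / (ss_tr / n_tr),
        # Concordance correlation coefficient (Lin 1992).
        "CCC": 2 * cov / (ss_obs + ss_pred + n_ext * (m_ext - m_pred) ** 2),
        "RMSEP": sqrt(press / n_ext),
        "MAE": sum(abs(o - p) for o, p in zip(y_test, y_pred)) / n_ext,
    }
```

Unlike the Q2 variants, CCC penalizes systematic shifts between observed and predicted values as well as scatter, which is why it is reported alongside them.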

Applicability domain

QSAR models should be developed on a defined domain of compounds, characterized by the known properties and structures of the training set. The leverage approach (for structural and response outliers) was applied for the applicability domain analysis of the BCF models (Gramatica 2007). The leverage method is based on the calculation of the hat matrix. Graphically, the plot of hat values (h) versus standardized residuals, i.e. the Williams plot, reveals the response and structural outliers. Compounds with a hat value larger than the warning leverage h* (3p/n, where p is the number of model variables plus one and n is the number of training compounds) are structural outliers, and compounds with cross-validated standardized residuals greater than 2.5 standard deviation units are response outliers. Additionally, Insubria graphs based on the leverage approach were used to study the applicability domain of the compounds (Gramatica et al. 2012). Data predicted for high-leverage chemicals in the prediction set are extrapolations and could be less reliable.
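The leverage calculation can be sketched as follows, assuming the standard definition h = x (X'X)^-1 x' with an intercept column added to the descriptor matrix. The function names are illustrative. For a three-descriptor model trained on 41 compounds, the warning leverage is h* = 3 × 4 / 41 ≈ 0.293; for 55 compounds it is 3 × 4 / 55 ≈ 0.218.

```python
import numpy as np

def leverages(X_train, X_query):
    """Hat values of query compounds relative to the training descriptors."""
    # Prepend an intercept column so that p = number of descriptors + 1.
    Xt = np.column_stack([np.ones(len(X_train)), np.asarray(X_train, dtype=float)])
    Xq = np.column_stack([np.ones(len(X_query)), np.asarray(X_query, dtype=float)])
    xtx_inv = np.linalg.inv(Xt.T @ Xt)
    # Row-wise quadratic form x (X'X)^-1 x'.
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)

def warning_leverage(n_descriptors, n_train):
    """Warning leverage h* = 3p/n, with p = number of descriptors + 1."""
    return 3 * (n_descriptors + 1) / n_train
```

A query compound whose hat value exceeds `warning_leverage(...)` would be flagged as a structural outlier; the same function applied to compounds without experimental data gives the x-axis of an Insubria-type plot.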

Results and discussion

For the study, two splitting techniques (biological sorting and structure-based) were used to divide the whole dataset (n = 55) into training and test sets. The equations for each split, along with their statistical parameters, are listed in Table 1. The GA produced several models for both splits. Most of the developed models contained autocorrelation descriptors, and the descriptors of the selected model were those appearing most frequently across the different populations of models; this was the basis for model selection. The best combination of descriptors from each split, giving Models 1 and 2, was obtained from the GA (10,000 iterations, mutation rate = 50%, and other settings at their defaults).

Table 1. Statistical parameters and equations of the developed models

Model 1 (biological sorting): LogBCF = 19.001 (±1.957) - 0.102 (±0.012) AATS6i + 0.005 (±0.001) AATS5m - 0.221 (±0.070) MAXDP

Model 2 (structure based): LogBCF = 18.826 (±2.015) - 0.100 (±0.012) AATS6i + 0.004 (±0.002) AATS5m - 0.228 (±0.0797) MAXDP

Model 3 (full model): LogBCF = 18.548 (±1.672) - 0.098 (±0.010) AATS6i + 0.005 (±0.001) AATS5m - 0.218 (±0.061) MAXDP

| Parameter | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| Internal validation | | | |
| n (training) | 41 | 41 | 55 |
| R2 | 0.758 | 0.747 | 0.750 |
| Ra2 | 0.738 | 0.727 | 0.735 |
| Q2LOO | 0.721 | 0.709 | 0.722 |
| Q2LMO | 0.711 | 0.691 | 0.712 |
| MAE (training) | 0.379 | 0.419 | 0.379 |
| PRESS | 12.228 | 14.421 | 15.728 |
| F | 38.522 | 36.413 | |
| RMSE (training) | 0.508 | 0.553 | 0.507 |
| R2Yscr | 0.073 | 0.074 | |
| Q2Yscr | -0.154 | -0.154 | |
| External validation | | | |
| n (test) | 14 | 14 | |
| Q2F1 | 0.718 | 0.718 | |
| Q2F2 | 0.717 | 0.757 | |
| Q2F3 | 0.760 | 0.903 | |
| MAE (test) | 0.400 | 0.289 | |
| r2m (test) | 0.644 | 0.699 | |
| RMSEP | 0.506 | 0.342 | |
| CCC | 0.857 | 0.880 | |

The best combination of descriptors appearing in the equations consists of two autocorrelation descriptors (AATS6i and AATS5m) and one electrotopological descriptor (MAXDP). The order of importance of the descriptors based on standardized coefficients is AATS6i > AATS5m > MAXDP. The standard errors of the regression coefficients are given in parentheses. The split models (Models 1 and 2) explained 72.7–73.8% of the variance (adjusted coefficient of determination). The leave-one-out predicted variance was 70.9–72.2%. Prediction accuracy in both training and test sets was reflected by the very small differences between the corresponding RMSE values. For both divisions, the GA selected the same combination of descriptors. The differences lie only in the regression coefficients, owing to the different training set compounds. Finally, a full model based on the same combination of descriptors (using similar GA settings) was obtained, which, with its broader applicability domain, was further applied to the unknown organophosphates (Roy et al. 2011, 2019).
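Applying the full model to a new compound amounts to evaluating the Model 3 equation from Table 1 on its three PaDEL descriptor values. The sketch below uses the published coefficients; the function name and the descriptor values in the usage line are hypothetical and do not correspond to any compound in the dataset.

```python
# Coefficients of the full model (Model 3) as listed in Table 1.
MODEL_3 = {"intercept": 18.548, "AATS6i": -0.098, "AATS5m": 0.005, "MAXDP": -0.218}

def predict_log_bcf(aats6i, aats5m, maxdp, coef=MODEL_3):
    """Evaluate the linear QSAR equation for one compound."""
    return (coef["intercept"]
            + coef["AATS6i"] * aats6i
            + coef["AATS5m"] * aats5m
            + coef["MAXDP"] * maxdp)

# Hypothetical descriptor values, for illustration only.
example = predict_log_bcf(aats6i=160.0, aats5m=100.0, maxdp=3.0)
```

A prediction obtained this way is only meaningful if the compound's leverage lies within the applicability domain of the model.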

The parameter AATS6i (average Broto-Moreau autocorrelation, lag 6, weighted by first ionization potential) contributes negatively to the BCF value. This property is directly related to the hydrophobicity/polarity of the molecules. Compounds containing polar functional groups (–NH2, –OH) or a heterocyclic ring (Dimefox, Trichlorfon, Azamethiphos, Omethoate, Azinphos-methyl, Dimethoate, and Vamidothion) are expected to be susceptible to ionization and showed high values of this descriptor together with low BCF values. On the other hand, organophosphates containing aromatic rings with or without halogen (Cl, Br) substitution (Leptophos, dichlofenthion, sulprofos, bromophos-ethyl, pirimiphos-ethyl, EPN, temephos, chlorpyrifos-methyl, phoxim, chlorpyrifos, profenofos) and Chlorethoxyfos (polychloroalkyl substitution) showed higher BCF values, with values of this descriptor in the lowest range of the dataset. Inclusion of halogens, mainly Cl, I, and Br, increases the lipophilicity of the molecules (Roy et al. 2011). This indicates that the lipophilicity of the compound is responsible for bioconcentration, which is supported by the literature.

The parameter AATS5m (average Broto-Moreau autocorrelation, lag 5, weighted by mass) contributes positively to the BCF value. Compounds with higher molecular weight (Chlorpyrifos-methyl, Chlorethoxyfos, Leptophos, Chlorpyrifos, Bromophos-ethyl, Profenofos, Dichlofenthion, Carbophenothion, Temephos, Phoxim) showed high BCF values compared with low molecular weight compounds (Dimefox, Cadusafos, Dicrotophos, Vamidothion, Terbufos, Mesulfenfos, Omethoate, Fenitrothion).

The parameter MAXDP is related to the electrophilicity of the molecules. This descriptor contributes negatively to the BCF value and is mainly governed by electron-withdrawing (halogen) and electron-donating (alkyl, methoxy, ethoxy) groups. Attachment of halogens to the aromatic ring of an organophosphate (Chlorpyrifos, Chlorpyrifos-methyl, Carbophenothion, Dichlofenthion, Bromophos-ethyl, and Chlorethoxyfos) increases the BCF value, whereas the presence of electron-donating groups such as alkyl, methoxy, and ethoxy (Cadusafos, Fenamiphos, Azinphos-methyl, Azinphos-ethyl, Azamethiphos, Dimefox, Omethoate, Dicrotophos) decreases it. Scatter plots of the developed models (observed vs predicted BCF values) are shown in Fig. 2.

Fig. 2. Scatter plots: A = Model 1, B = Model 2, C = Model 3, D = Consensus (Model 1 and Model 2)

Applicability domain of developed models

The applicability domain was calculated for all models by the leverage approach to determine the reliability of their predictions (Gramatica et al. 2012). The Williams plots are given in Fig. 3. The plots indicated that mecarbam was outside the AD for all models, dimefox was outside the AD for Model 2 and the full model (Model 3), and chlorpyrifos-methyl was outside the AD for the full model (Model 3). We additionally collected 99 organophosphate pesticides without experimental BCF values from the footprint database. The developed full model was applied to these unknown chemicals, and the Insubria plot for the full model allowed their applicability domain to be analyzed more broadly. Almost all compounds were inside the applicability domain of our model; the only exception was Mazidox (a structural outlier). This indicates that the predictions are not extrapolations and can be regarded as reliable estimates. The Insubria plots are given in Fig. 4.

Fig. 3. Williams plots: A = Model 1, B = Model 2, C = Model 3

Fig. 4. Insubria plots: A = Model 1, B = Model 2, C = Model 3

Prediction reliability of the developed models

The prediction reliability of the developed models was first confirmed by the AD analysis. To check our predictions against available standard models, we compared them with the VEGA-CAESAR, VEGA-Meylan, and VEGA Read-across models and the EPISuite BCFBAF model. First, the predictions of these models were compared for the modeled (training) compounds. The AD analysis revealed that 42% (23 compounds) and 33% (18 compounds) of the compounds were inside the AD of the Meylan and Read-across models, respectively (Table 2). For the compounds inside the AD, the coefficients of determination between predicted and experimental BCF values were 0.784 and 0.504 for the Meylan and Read-across models, respectively, with RMSE values of 0.515 and 0.880 (Table 3). For the EPISuite model, all compounds were inside the AD, and the R2 and RMSE against experimental BCF values were 0.614 and 0.738, respectively.

Table 2. Applicability domain of reported models

| Models | Training: inside AD | Training: outside AD | Unknown: inside AD | Unknown: outside AD |
|---|---|---|---|---|
| VEGA (CAESAR model) | 3 | 52 | 1 | 98 |
| VEGA (Meylan model) | 23 | 32 | 36 | 63 |
| VEGA (Read-across model) | 18 | 37 | 18 | 81 |
| EPISuite BCFBAF (Meylan Model) | 55 | 0 | 99 | 0 |
| Our model | 53 | 2 | 98 | 1 |

Table 3. Prediction comparison inside AD compounds (training and unknown) for reported models and developed model

| Models | R2 | RMSE | No. of compounds |
|---|---|---|---|
| Prediction comparison inside AD compounds vs experimental BCF | | | |
| VEGA (CAESAR model) | | | |
| VEGA (Meylan model) | 0.784 | 0.515 | 23 |
| VEGA (Read-across model) | 0.504 | 0.880 | 18 |
| EPISuite BCFBAF (Meylan Model) | 0.614 | 0.738 | 55 |
| Our model | 0.750 | 0.506 | 53 |
| Prediction comparison inside AD compounds: our model vs standard model (known organophosphates) | | | |
| VEGA (Meylan model) | 0.827 | | 23 |
| VEGA (Read-across model) | 0.707 | | 18 |
| EPISuite BCFBAF (Meylan Model) | 0.614 | | 53 |
| Prediction comparison inside AD compounds: our model vs standard model (unknown organophosphates) | | | |
| VEGA (Meylan model) | 0.718 | | 36 |
| VEGA (Read-across model) | 0.704 | | 18 |
| EPISuite BCFBAF (Meylan Model) | 0.503 | | 99 |

The Meylan and Read-across predictions agreed with our predictions by 82.7% and 70.7%, respectively, for compounds inside the AD (Table 3). EPISuite predictions also correlated with our predictions by more than 60% (Table 3). This suggests that our models, with their extended AD, were able to provide more reliable estimates for the organophosphate class of compounds. Guided by these observations, we compared the predictions for the unknown compounds (99 compounds) across the models mentioned above: a correlation of 71.8% was found with the Meylan model for its 36 inside-AD compounds, and 70.4% with the Read-across model for its 18 inside-AD compounds. EPISuite BCFBAF predictions agreed with our predictions by about 50% for all the unknown compounds (Table 3).
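The agreement percentages quoted above correspond to the R2 values in Table 3 multiplied by 100. A minimal sketch of such a comparison is given below, assuming that agreement is measured as the squared Pearson correlation between two sets of predictions restricted to the compounds inside the AD of the reference model; the function names and the boolean mask are illustrative.

```python
def squared_pearson(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov ** 2 / (var_a * var_b)

def agreement_inside_ad(ours, other, inside_ad):
    """Compare two prediction sets on compounds flagged as inside the AD."""
    pairs = [(o, r) for o, r, ok in zip(ours, other, inside_ad) if ok]
    return squared_pearson([p[0] for p in pairs], [p[1] for p in pairs])
```

Note that a high squared correlation shows that two models rank compounds similarly; it does not by itself show that their predicted values coincide.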

On the basis of the predictions, 57 compounds had BCF values between 102 and 4866, i.e. within the range of concern according to the USEPA threshold (Table S2); 40 compounds were found to have low potential (< 100) for health hazard; and one compound (trichlormetaphos-3) was found to have high potential (> 5000). The compound outside the AD was not considered in the classification. According to the predictions, cyanofenphos and prothiofos were on the borderline of high potential (Table S1).
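The classification can be expressed as a simple rule on the back-transformed prediction. The sketch below assumes cut-offs of 100 and 5000, taken from the ranges quoted in the text; the category labels and the treatment of values falling exactly on a cut-off are assumptions.

```python
def classify_bcf(log_bcf, low=100.0, high=5000.0):
    """Assign a bioaccumulation category from a predicted log BCF value."""
    bcf = 10 ** log_bcf  # back-transform from the log scale
    if bcf < low:
        return "low"
    if bcf > high:
        return "high"
    return "concern"
```

For example, a predicted log BCF of 3.0 (BCF = 1000) falls in the range of concern, whereas a value of 4.0 (BCF = 10,000) would be flagged as high potential.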

Overview and conclusion

With the objective of developing a statistically robust and reproducible BCF model for organophosphate pesticides, a set of 55 organophosphates with experimental BCF values was collected from the footprint database. The models were developed in accordance with the OECD principles for QSAR model validation, such as an unambiguous algorithm, statistical robustness, applicability domain analysis, and mechanistic interpretation. Randomization results also indicated that the models were not obtained by chance. The combination of descriptors appearing in the models highlights the importance of autocorrelation descriptors weighted by ionization potential (AATS6i) or mass (AATS5m) and of an electrotopological descriptor (MAXDP) related to electrophilicity. Overall, the BCF value is affected by the molecular mass and by the presence of aromatic (halogen-substituted), fused, and heterocyclic rings with electron-withdrawing or electron-donating groups. The results also showed that organophosphate pesticides fall poorly within the AD of the reported models. Compared with the available established models, the developed models gave reliable estimates in terms of both applicability domain and predictions specifically for the organophosphate class. The results also encourage the application of the models to organophosphates without experimental values for the studied endpoint, for their possible prioritization. The developed open-source model for the BCF of organophosphate pesticides could therefore be a value-added model for prioritization and risk assessment of this class.

Acknowledgments

Financial assistance from the Science & Engineering Research Board (SERB), DST, Govt. of India, New Delhi (File No. EMR/2017/004497) is gratefully acknowledged by Dr. Partha Pratim Roy. The authors acknowledge Prof. Paola Gramatica for the free license of QSARINS.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Aranda JF, Bacelo DE, Leguizamón Aparicio MS, Ocsachoque MA, Castro EA, Duchowicz PR. Predicting the bioconcentration factor through a conformation-independent QSPR study. SAR QSAR Environ Res. 2017;28:749–763. doi: 10.1080/1062936X.2017.1377765. [DOI] [PubMed] [Google Scholar]
  2. Arnot JA, Gobas FA. A review of bioconcentration factor (BCF) and bioaccumulation factor (BAF) assessments for organic chemicals in aquatic organisms. Environ Rev. 2006;14:257–297. doi: 10.1139/a06-005. [DOI] [Google Scholar]
  3. Banjare P, Singh J, Roy PP. Design and combinatorial library generation of 1H 1,4 benzodiazepines 2,5 diones as photosystem-II inhibitors: a public QSAR approach. Beni-SuefUni J Bas App Sci. 2017;6:219–231. [Google Scholar]
  4. Bermúdez-Saldaña J, Escuder-Gilabert ML, Medina-Hernández MJ, Villanueva-Camañas RM, Sagrado S. Modelling bioconcentration of pesticides in fish using biopartitioning micellar chromatography. J Chromatogr A. 2005;1063:153–160. doi: 10.1016/j.chroma.2004.11.074. [DOI] [PubMed] [Google Scholar]
  5. Bintein S, Devillers J, Karcher W. Nonlinear dependence of fish bioconcentration on n-octanol/water partition coefficient. SAR QSAR Environ Res. 1993;1:29–39. doi: 10.1080/10629369308028814. [DOI] [PubMed] [Google Scholar]
  6. Chai T, Draxler RR. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci Model Dev. 2014;7:1247–1250. doi: 10.5194/gmd-7-1247-2014. [DOI] [Google Scholar]
  7. Chirico N, Gramatica P. Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. J ChemInf Model. 2011;51(9):2320–2335. doi: 10.1021/ci200211n. [DOI] [PubMed] [Google Scholar]
  8. Chirico N, Gramatica P (2012) Real external predictivity of QSAR Models. Part2. New inter-comparable thresholds for different validation criteria and the need for scatter plot inspection. J Chem inf Model 52(8):2044–2058 [DOI] [PubMed]
  9. Cochran WG, Snedecor GW (2021) Statistical Methods. Oxford & IBH, New Delhi
  10. Consonni V, Ballabio D, Todeschini R. Comments on the definition of the Q2 parameter for QSAR validation. J ChemInf Model. 2009;49:1669–1678. doi: 10.1021/ci900115y. [DOI] [PubMed] [Google Scholar]
  11. Debnath AK, Ghose AK, Viswanadhan VN. Combinatorial library design and evaluation: principles, software, tools and application in drug discovery. New York: Marcel Dekker Inc; 2001. pp. 73–129. [Google Scholar]
  12. Devillers J, BinteinS DD. Comparison of BCF models based on log P. Chemosphere. 1996;33:1047–1065. doi: 10.1016/0045-6535(96)00246-9. [DOI] [Google Scholar]
  13. Eriksson L, Wold S (1995) In: Waterbeemd, HVD (Eds) Chemometric methods in molecular design. Willy VCH: Weinheim, 312–317
  14. Freitas MR, Barigye SJ, Daré JK, Freitas MP. Quantitative modeling of bioconcentration factors of carbonyl herbicides using multivariate image analysis. Chemosphere. 2016;152:190–195. doi: 10.1016/j.chemosphere.2016.03.011. [DOI] [PubMed] [Google Scholar]
  15. Fujikawa M, Nakao K, Shimizu R, Akamatsu M. The usefulness of an artificial membrane accumulation index for estimation of the bioconcentration factor of Organophosphorus pesticide. Chemosphere. 2009;74:751–757. doi: 10.1016/j.chemosphere.2008.10.046. [DOI] [PubMed] [Google Scholar]
  16. Garg R, Smith CJ. Predicting the bioconcentration factor of highly hydrophobic organic chemicals. Food ChemToxicol. 2014;69:252–259. doi: 10.1016/j.fct.2014.03.035. [DOI] [PubMed] [Google Scholar]
  17. Gavrilescu M. Fate of pesticides in the environment and its bioremediation. Eng Life Sci. 2005;30:497–526. doi: 10.1002/elsc.200520098. [DOI] [Google Scholar]
  18. Gerwick BC, Sparks TC. Natural products for pest control: an analysis of their role, value and future. Pest Manag Sci. 2014;70:1169–1185. doi: 10.1002/ps.3744. [DOI] [PubMed] [Google Scholar]
  19. Golbraikh A, Harten P, Martin TM, Muratov EN, Young DM, Tropsha A, Zhu H. Does rational selection of training and test sets improve the outcome of QSAR modeling? J Chem Inf Mod. 2012;52:2570–2578. doi: 10.1021/ci300338w. [DOI] [PubMed] [Google Scholar]
  20. Gramatica P. Principles of QSAR models validation: internal and external. Qsar Comb Sci. 2007;26:694–770. doi: 10.1002/qsar.200610151. [DOI] [Google Scholar]
  21. Gramatica P. Principles of QSAR modeling: comments and suggestions from personal experience. Int J Quant Struc Prop Relat. 2020;5(3):1–37. [Google Scholar]
  22. Gramatica P, Papa E. QSAR modeling of bioconcentration factor by theoretical molecular descriptors. QSAR Comb Sci. 2003;22:374–385. doi: 10.1002/qsar.200390027. [DOI] [Google Scholar]
  23. Gramatica P, Papa E. An update of the BCF QSAR model based on theoretical molecular descriptors. QSAR Comb Sci. 2005;24:953–960. doi: 10.1002/qsar.200530123. [DOI] [Google Scholar]
  24. Gramatica P, Cassani S, Roy PP, Kovarich S, Yap CW, Papa E. QSAR modeling is not “push a button and find a correlation”: a case study of toxicity of (benzo-)triazoles on algae. Mol Inf. 2012;31:817–835. doi: 10.1002/minf.201200075. [DOI] [PubMed] [Google Scholar]
  25. Gramatica P, Chirico N, Papa E, Kovarich S, Cassani S. QSARINS: a new software for the development, analysis, and validation of QSAR MLR models. J ComputChemSoftw News Updates. 2013;34:2121–2132. [Google Scholar]
  26. Gramatica P, Cassani S, Chirico N. QSARINS-Chem: insubria datasets and new QSAR/QSPR models for environmental pollutants in QSARINS. J ComputChem. 2014;35:1036–1044. doi: 10.1002/jcc.23576. [DOI] [PubMed] [Google Scholar]
  27. Grisoni F, Consonni V, Villa S, Vighi M, Todeschini R. QSAR models for bioconcentration: is the increase in the complexity justified by more accurate predictions? Chemosphere. 2015;127:171–179. doi: 10.1016/j.chemosphere.2015.01.047. [DOI] [PubMed] [Google Scholar]
  28. Grisoni F, Consonni V, Vighi M, Villa S, Todeschini R. Expert QSAR system for predicting the bioconcentration factor under the REACH regulation. Env Res. 2016;148:507–512. doi: 10.1016/j.envres.2016.04.032. [DOI] [PubMed] [Google Scholar]
  29. Hao GF, Jiang W, Ye YN, Wu FX, Zhu XL, Guo FB, Yang GF. ACFIS: a web server for fragment-based drug discovery. Nucl Acid Res. 2016;44(W1):W550–W556. doi: 10.1093/nar/gkw393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Igor TV, Uko M, Tropsha A. Public (Q)SAR services, integrated modeling environments, and model repositories on the web: state of the art and perspectives for future development. Mol Inf. 2017;36:1–14. doi: 10.1002/minf.201600082.
  31. Ivanciuc T, Ivanciuc O, Klein DJ. Modelling the bioconcentration factors and bioaccumulation factors of polychlorinated biphenyls with posetic quantitative super-structure/activity relationships (QSSAR). Mol Divers. 2006;10(2):133–145. doi: 10.1007/s11030-005-9003-3.
  32. Köhler HR, Triebskorn R. Wildlife ecotoxicology of pesticides: can we track effects to the population level and beyond? Science. 2013;341:759–765. doi: 10.1126/science.1237591.
  33. Lema E, Machunda R, Njau KN. Agrochemicals use in horticulture industry in Tanzania and their potential impact to water resources. Int J Biol Chem Sci. 2014;8:831–842. doi: 10.4314/ijbcs.v8i2.38.
  34. Lin L. Assay validation using the concordance correlation coefficient. Biometrics. 1992;48:599–604. doi: 10.2307/2532314.
  35. Mackay D. Correlation of bioconcentration factors. Environ Sci Tech. 1982;16:274–278. doi: 10.1021/es00099a008.
  36. Mackay D, Fraser A. Bioaccumulation of persistent organic chemicals: mechanisms and models. Environ Pollut. 2000;110:375–391. doi: 10.1016/S0269-7491(00)00162-7.
  37. Nendza M, Herbst T. Screening for low aquatic bioaccumulation (2): physico-chemical constraints. SAR QSAR Environ Res. 2011;22:351–364. doi: 10.1080/1062936X.2011.569896.
  38. Neve P, Vila-Aiub M, Roux F. Evolutionary-thinking in agricultural weed management. New Phytol. 2009;184:783–793.
  39. Oerke EC. Crop losses to pest. J Agric Sci. 2006;144:31–43. doi: 10.1017/S0021859605005708.
  40. Papa E, Dearden J, Gramatica P. Linear QSAR regression models for the prediction of bioconcentration factors by physicochemical properties and structural theoretical molecular descriptors. Chemosphere. 2007;67:351–358. doi: 10.1016/j.chemosphere.2006.09.079.
  41. Pliška V, Testa B, Waterbeemd H. Lipophilicity in drug action and toxicology. In: Methods and principles in medicinal chemistry. 2008.
  42. Ragno R. www.3d-qsar.com: a web portal that brings 3-D QSAR to all electronic devices—the Py-CoMFA web application as tool to build models from pre-aligned datasets. J Comp Aid Mole Des. 2019;33:855–864. doi: 10.1007/s10822-019-00231-x.
  43. REACH in Brief. European Commission, Environment Directorate General; 2007.
  44. Roy K. On some aspects of validation of predictive quantitative structure-activity relationship models. Exp Opin Drug Discov. 2007;2:1567–1577. doi: 10.1517/17460441.2.12.1567.
  45. Roy PP, Roy K. On some aspects of variable selection for partial least squares regression models. QSAR Comb Sci. 2008;27:302–313. doi: 10.1002/qsar.200710043.
  46. Roy PP, Leonard JT, Roy K. Exploring the impact of the size of training sets for the development of predictive QSAR models. Chemom Intell Lab Syst. 2008;90:31–42. doi: 10.1016/j.chemolab.2007.07.004.
  47. Roy PP, Kovarich S, Gramatica P. QSAR model reproducibility and applicability: a case study of rate constants of hydroxyl radical reaction models applied to polybrominated diphenyl ethers and (benzo-)triazoles. J Comput Chem. 2011;32(11):2386–2396. doi: 10.1002/jcc.21820.
  48. Roy PP, Banjare P, Verma S, Singh J. Acute rat and mouse oral toxicity determination of anticholinesterase inhibitor carbamate pesticides: a QSTR approach. Mol Inf. 2019;38:1–17. doi: 10.1002/minf.201800151.
  49. Schüurmann G, Ebert RU, Wang B, Kuehne R. External validation and prediction employing the predictive squared correlation coefficient—test set activity mean vs training set activity mean. J Chem Inf Model. 2008;48:2140–2145. doi: 10.1021/ci800253u.
  50. Shi LM, Fang H, Tong WD, Wu J, Perkins R, Blair RM, Branham WS, Dial SL, Moland CI, Sheehan DM. QSAR models using a large diverse set of estrogens. J Chem Inf Comput Sci. 2001;41:186–195. doi: 10.1021/ci000066d.
  51. Voutsas E, Magoulas K, Tassios D. Prediction of the bioaccumulation of persistent organic pollutants in aquatic food webs. Chemosphere. 2002;48:645–651. doi: 10.1016/S0045-6535(02)00144-3.
  52. Wang Y, Wen Y, Li JJ, He J, Qin WC, Su LM, Zhao YH. Investigation on the relationship between bioconcentration factor and distribution coefficient based on class-based compounds: The factors that affect bioconcentration. Environ Toxicol Pharmacol. 2014;38:388–396. doi: 10.1016/j.etap.2014.07.003.
  53. Wang F, Yang JF, Wang MY, Jia CY, Shi XX, Hao GF, Yang GF. Graph attention convolutional neural network model for chemical poisoning of honey bees’ prediction. Sci Bull. 2020;65(14):1–8. doi: 10.1007/BF02900596.
  54. Wang YL, Wang F, Shi XX, Jia CY, Wu FX, Hao GF, Yang GF. Cloud 3D-QSAR: a web tool for the development of quantitative structure–activity relationship models in drug discovery. Brief Bioinf. 2020:1–8.
  55. Yang JF, Wang F, Chen YZ, Hao GF, Yang GF. LARMD: integration of bioinformatic resources to profile ligand-driven protein dynamics with a case on the activation of estrogen receptor. Brief Bioinf. 2020;21(6):2206–2218. doi: 10.1093/bib/bbz141.
  56. Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32:1466–1474. doi: 10.1002/jcc.21707.
  57. Yuan J, Xie C, Zhang T, Sun J, Yuan X, Yu S, Zhang Y, Cao Y, Yu X, Yang X, Yao W. Linear and nonlinear models for predicting fish bioconcentration factors for pesticides. Chemosphere. 2016;156:334–340. doi: 10.1016/j.chemosphere.2016.05.002.

Articles from In Silico Pharmacology are provided here courtesy of Springer
