Toxicology Research. 2016 Feb 29;5(3):773–787. doi: 10.1039/c5tx00493d

In silico prediction of the developmental toxicity of diverse organic chemicals in rodents for regulatory purposes

Nikita Basant a, Shikha Gupta b, Kunwar P Singh b
PMCID: PMC6061034  PMID: 30090388

Graphical abstract: The figure shows the performance of the local and global QSAR and ISC-QSAAR models in predicting the developmental toxicity potential of chemicals in rodents.

Abstract

The experimental determination of the developmental toxicity potential of chemicals, expressed as the lowest effect level (LEL) dose, is tedious and time and resource intensive, and it also involves ethically questionable tests on animals. In this study, we have established quantitative structure–activity relationship (QSAR) models for predicting the developmental toxicity potential of chemicals in rodents following the OECD guidelines. Accordingly, decision tree forest (DTF) and decision tree boost (DTB) based local (L-QSAR), global (G-QSAR) and interspecies quantitative structure activity–activity relationship (ISC QSAAR) models were developed for estimating the LEL dose of chemicals for developmental toxicity in rats and rabbits. The structural features of chemicals responsible for developmental toxicity in rodents were extracted and used in the QSAR/QSAAR analysis. The robustness and predictive power of the developed models were evaluated through internal and external validation procedures. In test data, the L-QSAR models (DTF, DTB) yielded R² values of ≥0.846 (rat) and ≥0.906 (rabbit), whereas in G-QSAR, the correlation value was >0.870 between the measured and predicted endpoint values. In the ISC QSAAR models, the R² values in test data were 0.830 (DTF) and 0.927 (DTB). Values of the various statistical validation coefficients derived from the test data (except rm² in the DTF based rat L-QSAR and ISC QSAAR models) were above their respective threshold limits, which lends high confidence to this analysis. The prediction quality of the developed QSAR/QSAAR models was also assessed using the mean absolute error (MAE) criteria and found to be good. The applicability domains of the constructed models were defined using the descriptor range, leverage, and standardization approaches. The results suggest that the developed QSAR/QSAAR models can reliably predict the developmental toxicity potential of structurally diverse chemicals in rodents, generating useful toxicity data for risk assessment in humans.

1. Introduction

Chemicals are inherently present in the environment in varying concentrations and exposure to many of these has been linked to various toxic effects, including developmental toxicity in animals. Developmental toxicity refers to adverse effects produced by an exposure prior to conception, or during pregnancy and childhood.1 It is one of the most important toxicological endpoints under European Union's REACH2 (Registration, Evaluation, Authorization and Restriction of Chemicals) regulations and USEPA3 (United States Environmental Protection Agency) requirements. The prenatal developmental toxicity study is undertaken to identify chemicals that may pose a risk to the developing fetus if pregnant women are exposed. The results of animal studies are used by regulatory agencies to help set human exposure guidelines.4 The USEPA and Organization for Economic Cooperation and Development (OECD) have outlined the experimental methods for determining a chemical's developmental toxicity potential in rats and rabbits.5,6 EPA's toxicity reference database (ToxRefDB) has been implemented with animal based toxicity data from chronic, multi-generation reproduction and prenatal developmental toxicity studies in rats and rabbits.1,7,8 The LEL (lowest effect level) dose of a chemical is considered an appropriate endpoint in developmental toxicity studies. The LEL is the minimum dose of chemicals for which any specific effect or group of effects1 is observed. These tests give an integrated evaluation of developmental toxicity over a broad dose range, and hence, these are labor intensive, costly and require large number of test animals. 
Although the experimental test methods based on extrapolating across dose and species are generally regarded as effective,9 the large number of chemicals already in use, together with those newly entering commercial use every day, makes it almost impossible to screen all of them for their developmental toxicity potential using the experimental protocols. Therefore, much attention has been devoted to finding in vitro alternatives that can effectively screen a large number of compounds for their effects on complex pathways relevant to developmental processes and toxicities.10 The increasing public interest in reducing animal tests and the industry requirement to respond to the seventh amendment of the European Union (EU) cosmetics directive11 are driving the need to develop alternatives to animal testing, such as in vitro and computational methods. A number of promising in vitro methods12–15 have been developed and used; however, the current in vitro tests are not sufficient to replace whole animal tests for developmental toxicity.16 The European Union's REACH legislation requires toxicological hazard and risk assessments for all new and existing chemicals17 and advocates the use of sufficiently validated computational prediction models based on QSAR (quantitative structure–activity relationship) to fill the toxicity data gaps, thereby saving time and money and helping to reduce the number of animals used for experimental testing purposes.18 QSAR offers an in silico tool for developing predictive models of various activity and property endpoints of a series of chemicals, using experimentally determined response data and molecular structure information.19 The OECD has provided guidelines for QSAR model development and validation for regulatory purposes.20 Therefore, robust QSAR models that are based on an appropriate approach and validated rigorously through the OECD recommended procedures are required for the step-wise screening of chemicals for their developmental toxicity potential (LEL) for safety assessment.

The main objective of this study is to identify relevant structural features of the chemicals responsible for their developmental toxicity potential in rats and rabbits, and to develop reliable local and global QSAR models for predicting the developmental toxicity of chemicals in multiple rodent species in accordance with the OECD guidelines. Accordingly, several structural features of the chemicals considered here were extracted and regression QSAR models based on the decision tree forest (DTF), and decision tree boost (DTB) approaches were constructed, which were rigorously validated using the OECD recommended statistical parameters to ensure their external predictive power. An attempt was also made to develop interspecies correlation (ISC) based quantitative structure activity–activity relationship (QSAAR) models for predicting the developmental toxicity (LEL) in rodent species (rat and rabbit).

2. Materials and methods

In this study, we intend to develop local QSAR (L-QSAR), global QSAR (G-QSAR) and ISC QSAAR models for screening the chemicals for their developmental toxicity potential (LEL) in rats and rabbits in accordance with the OECD guidelines. Separate L-QSAR models were constructed for LEL prediction in rats and rabbits and G-QSAR models were developed using the combined toxicity data (uncommon compounds) of both the rodent species. An ISC QSAAR model was established using the common compounds in the two toxicity datasets. Adhering to the OECD guidelines, strict rules were followed for the selection of a definite dataset with a defined endpoint (principle 1), an explainable model building strategy in view of the nature of the selected data (principle 2), a defined applicability domain of the constructed models (principle 3), appropriate validation strategies corresponding to the goodness of fit, robustness and predictivity (principle 4), and finally offering possible mechanistic interpretation of the developed models (principle 5). A schematic diagram showing the modeling steps is presented in Fig. 1. Our intention here was to save the computational efforts and cost at the end of the user.

Fig. 1. A flow diagram showing QSAR modeling and OECD guidelines for predicting the developmental toxicity potential of chemicals.

2.1. The dataset

Prenatal developmental toxicity data (oral LEL, mg per kg body weight per day) of chemicals in rats and rabbits were collected from the ToxRefDB.21 This database contained developmental toxicities of 1572 compounds, which have been generated according to the OPPTS guideline.5 According to Knudsen et al.,1 the developmental toxicity effects investigated were the maternal body weight gain, maternal pregnancy losses, embryo-fetal losses, fetal weight reduction, and defects such as skeletal, urogenital, orofacial, neurosensory, cardiovascular, visceral (splanchnic), body wall (somatic), etc. This database provides a novel resource for building predictive models of chemical toxicity, with the initial goal of prioritizing chemicals for further testing.22 In order to obtain a high quality dataset, a rigorous screening process was applied here. All the mixtures, duplicates and salts were removed. Finally, a total of 286 chemicals in the rat and 194 chemicals in the rabbit toxicity data were retained for separate L-QSAR analysis. The rat toxicity data contained 224 pesticides, 14 pharmaceuticals and 48 other organic chemicals, whereas the rabbit data contained 170 pesticides, 11 pharmaceuticals and 13 organic chemicals. Among the rat and rabbit toxicity data, 124 chemicals were common to both datasets, and these were used for the ISC QSAAR analysis. Chemicals which were not common in the rat and rabbit toxicity data sets (356) were used for developing the G-QSAR models. Prior to the QSAR analysis, the LEL values were converted into the negative logarithmic scale (pLEL, with LEL in mmol per kg body weight per day). For the two test species, the endpoint toxicity (pLEL) values ranged between –1.87 and 4.48 (rat) and between –1.12 and 4.18 (rabbit) (Table S1, ESI). The chemical toxicity data in rats and rabbits were analyzed statistically by generating box-and-whisker diagrams (Fig. S1, ESI). These diagrams summarize each toxicity dataset with a central point to indicate the central tendency (median), a box to indicate variability around this central tendency (25th and 75th percentiles), and whiskers around the box to indicate the range of the data.
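The unit conversion described above can be illustrated with a minimal sketch. It assumes that the conversion to molar units simply divides the LEL (mg per kg body weight per day) by the molar mass of the compound (g per mol); the function name `plel` and the example values are hypothetical and are not taken from Table S1.

```python
import math

def plel(lel_mg_per_kg_day: float, molar_mass_g_per_mol: float) -> float:
    """Return pLEL = -log10(LEL expressed in mmol per kg body weight per day)."""
    if lel_mg_per_kg_day <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("LEL and molar mass must be positive")
    # mg divided by g/mol gives mmol, so the dose becomes mmol per kg per day.
    lel_mmol = lel_mg_per_kg_day / molar_mass_g_per_mol
    return -math.log10(lel_mmol)

# Hypothetical compound: LEL = 100 mg/kg/day, molar mass = 200 g/mol,
# i.e. 0.5 mmol/kg/day.
example = plel(100.0, 200.0)
```

On this scale a larger pLEL corresponds to a lower LEL dose, that is, to a more potent developmental toxicant.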

2.2. The molecular descriptors

In total, 1444 1D and 2D molecular descriptors were calculated for all the compounds. For calculating the descriptors, the SMILES (simplified molecular input line entry system) strings of the compounds were converted into sdf files and used in the PaDEL program.23 The SMILES of the compounds were obtained from ChemSpider.24 The chemical structures available in ChemSpider corresponding to the SMILES of the considered molecules were compared with those in PubChem.25 For compounds whose chemical structures were found to differ between the two sources, the SMILES were taken from PubChem for descriptor calculation. The calculated descriptors belong to the constitutional, autocorrelation, Basak, BCUT, Burden, connectivity, E-state, Kappa, extended topochemical atom (ETA), molecular property, and topological classes. Although all the descriptors in the pool were used during model development in order to identify the most relevant features, only descriptors that can convey the physical meaning of the structural attributes of molecules were retained in the final QSAR models, to ensure compliance with the OECD principles.

2.3. Dataset processing and descriptor selection

In this study, the developmental toxicity data (rat and rabbit) for L-QSAR, G-QSAR, and ISC QSAAR were split into training (80%) and test (20%) sets using the random distribution approach. This method is intended to give a uniform selection of test set molecules covering the entire range of the activity space of the total dataset and leads to a low bias of the model performance.26 Although we tried our modeling exercises with different divisions of the data set into training and test sets, we report here only the best models obtained from a single division, for brevity. Further, in order to check the distribution of the structural features of compounds in the test and training data, principal component analysis (PCA) was performed27 and score plots were constructed for the L-QSAR, G-QSAR and ISC QSAAR data (Fig. 2). These plots showed that the test compounds were located in close proximity to the training set compounds.
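A random 80/20 division of this kind can be sketched as follows. This is only an illustration under stated assumptions: the seed, the helper names `random_split` and `covered_fraction`, and the evenly spaced pLEL values are hypothetical, and the coverage check is a simple stand-in for the PCA score plots rather than a reproduction of them.

```python
import random

def random_split(compounds, test_fraction=0.2, seed=42):
    """Shuffle the compounds and hold out a fraction of them as the test set."""
    rng = random.Random(seed)  # a fixed seed makes the division reproducible
    shuffled = list(compounds)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def covered_fraction(train_values, test_values):
    """Fraction of the training activity range that the test set spans."""
    train_range = max(train_values) - min(train_values)
    overlap = (min(max(test_values), max(train_values))
               - max(min(test_values), min(train_values)))
    return max(overlap, 0.0) / train_range

# Hypothetical pLEL values spread evenly over the reported rat range.
plel_values = [-1.87 + i * (4.48 + 1.87) / 285 for i in range(286)]
train, test = random_split(plel_values)
coverage = covered_fraction(train, test)
```

A coverage close to 1 indicates that the held-out compounds span nearly the whole activity range of the training set; repeating the split with other seeds corresponds to trying different divisions of the data.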

Fig. 2. Plot showing the distribution of the PCA scores of the descriptors in training and test compounds in (a) L-QSAR, (b) G-QSAR, and (c) ISC QSAAR analyses.

Relevant features for constructing the L-QSAR and G-QSAR models were selected using the model-fitting approach.28 Prior to this, descriptors with a low variation (≤0.5) were excluded from the pool and the retained descriptors were further thinned for the QSAR analysis. Next, the QSAR models were constructed with the respective training datasets using the retained pool of descriptors, and the optimal model parameters were determined through a 5-fold cross-validation (CV). The root mean squared error (RMSE) values were calculated to rank the contribution of the descriptors in the current set for each model. The lowest ranked descriptors (<10% contribution) were then removed in the successive modeling steps.29 The most significant descriptors were retained and the corresponding prediction accuracies were computed. The descriptors finally retained for the L-QSAR and G-QSAR models are presented in Table 1. A brief description of the selected descriptors is available elsewhere.30,31
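The initial low-variation filter can be sketched as below. The text does not state which measure of variation was used, so this example assumes the population variance; the descriptor names and values are hypothetical.

```python
from statistics import pvariance

def drop_low_variance(descriptors, threshold=0.5):
    """Keep only descriptors whose variance across compounds exceeds the threshold.

    `descriptors` maps a descriptor name to its values over all compounds.
    Assumption: "variation" is interpreted here as the population variance.
    """
    return {name: values
            for name, values in descriptors.items()
            if pvariance(values) > threshold}

# Hypothetical descriptor pool for four compounds.
pool = {
    "d_constant": [1.0, 1.0, 1.0, 1.0],  # variance 0.0, removed
    "d_narrow": [0.0, 1.0, 0.0, 1.0],    # variance 0.25, removed
    "d_wide": [0.0, 2.0, 4.0, 6.0],      # variance 5.0, kept
}
retained = drop_low_variance(pool)
```

Descriptors that pass this filter would then enter the iterative model-fitting step, in which the lowest ranked descriptors are removed in successive rounds.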

Table 1. Descriptors used in QSAR and ISC QSAAR modeling.

| Descriptor | QSAR/QSAAR model | Description |
| --- | --- | --- |
| DELS | L-QSAR (Rat) | Sum of all atom intrinsic state differences (measure of total charge transfer in the molecule) |
| ETA_Beta | L-QSAR (Rat) | A measure of electronic features of the molecule |
| ETA_dBeta | L-QSAR (Rabbit) | A measure of relative unsaturation content |
| ETA_Eta | G-QSAR | Composite index Eta |
| ETA_Eta_L | L-QSAR (Rabbit) | Local index Eta_local |
| GGI3 | L-QSAR (Rat), G-QSAR | Topological charge index of order three |
| MAXDN | L-QSAR (Rabbit) | Maximum negative intrinsic state difference in the molecule |
| MDEC-22 | L-QSAR (Rabbit) | Molecular distance edge between all secondary carbons |
| MDEC-33 | L-QSAR (Rabbit) | Molecular distance edge between all tertiary carbons |
| minHBa | G-QSAR | Minimum E-states for (strong) hydrogen bond acceptors |
| minsCH3 | L-QSAR (Rat) | Minimum atom-type E-state: –CH3 |
| nAtomP | G-QSAR | Number of atoms in the largest pi system |
| TopoPSA | L-QSAR (Rat), G-QSAR | Topological polar surface area |
| XLogP | QSAAR | Logarithm of octanol–water partition coefficient by atomic contribution approach |

The structural diversity of the compounds considered was evaluated using the Tanimoto Similarity Index (TSI) method.32 This index provides a measure for identifying which of the mechanistic groups the target chemical is most likely to belong to.33 For a pair of molecules A and B, the TSI is calculated as $\mathrm{TSI}_{AB} = Z_{AB}\,[Z_{AA} + Z_{BB} - Z_{AB}]^{-1}$, where Z is the similarity matrix and A and B are the two molecules being compared. The TSI ranges from 0 (no similarity) to 1 (complete pair-wise similarity). A smaller TSI means that the compounds have good diversity.29 The distribution of the TSI values calculated for the compounds in this study is shown in Fig. S2 (ESI). The plots show that all the chemicals fell in the TSI range of 0.002–0.307 (rat) and 0.011–0.393 (rabbit), which suggests a sufficiently high structural diversity among the considered chemicals.
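For binary structural fingerprints, the standard Tanimoto coefficient divides the number of features shared by two molecules by the number of features present in either one. The sketch below assumes that each molecule is represented by the set of its "on" fingerprint bits, which the text does not specify; the function name and the bit sets are hypothetical.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto similarity of two molecules given their sets of 'on' bits."""
    z_ab = len(bits_a & bits_b)  # features shared by A and B
    z_aa = len(bits_a)           # features present in A
    z_bb = len(bits_b)           # features present in B
    denominator = z_aa + z_bb - z_ab
    # Convention chosen here: two empty fingerprints have similarity 0.
    return z_ab / denominator if denominator else 0.0

# Hypothetical fingerprints of two molecules.
mol_a = {1, 2, 3, 4}
mol_b = {3, 4, 5, 6}
similarity = tanimoto(mol_a, mol_b)  # 2 shared bits out of 6 distinct bits
```

Values near 0 for most pairs, as reported for the rat and rabbit datasets, indicate that few structural features are shared and hence that the set is structurally diverse.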

2.4. Predictive modeling

In this study, the DTF and DTB based L-QSAR, G-QSAR and ISC QSAAR models were established for predicting the developmental toxicity potential (LEL) of the organic chemicals in rodents. A brief account of the modeling methods is provided here.

The DTF34 and DTB35 are ensembles of single decision trees (SDTs), which are considered to be weak learners. The generalization ability of an ensemble is usually much stronger than that of its base learners, so the ensemble is able to make accurate predictions. In DTF, a large number of independent trees are grown in parallel, and they do not interact until after all of them have been built. The DTF gains strength from the bagging technique, which derives bootstrapped replicas of the original data. Separate models are built on these bootstrapped sub-sets and used to predict the entire data. The various estimated models are then aggregated by using the mean for regression problems. In bagging, a bootstrapped sample is constructed36 as $D_i^* = (Y_i^*, X_i^*)$, where $D$ consists of the data $\{(X_i, Y_i),\ i = 1, 2, \ldots, n\}$, $Y_i$ is the real-valued response and $X_i$ is a p-dimensional predictor variable for the ith instance. Then the bootstrapped predictor of $E(Y \mid X = x) = f(x)$ is estimated by the plug-in principle as $\hat{C}_n^*(x) = h_n(D_1^*, \ldots, D_n^*)(x)$, where $C_n(x) = h_n(D_1, \ldots, D_n)(x)$ and $h_n$ is the nth hypothesis. Finally, the bagged predictor is represented as $\hat{C}_{n;B}(x) = E^*[\hat{C}_n^*(x)]$. The bagging technique uses the out-of-bag data rows for model validation and can reduce the variance of the base learner, with a good performance. The stochastic element in the DTF algorithm makes it highly resistant to over-fitting.
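The bagging principle described above can be illustrated with a minimal sketch. It is not the DTF implementation used in this study: for brevity the base learner is assumed to be a one-split regression tree (a stump) on a single descriptor, and the names `fit_stump` and `bagged_predictor` as well as the toy data are hypothetical.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on one feature and return a predictor."""
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: predict the mean response
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def bagged_predictor(xs, ys, n_trees=50, seed=0):
    """Average the predictions of stumps fitted to bootstrap replicas of the data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)  # mean aggregation

# Toy data: the response steps from 0 to 1 between x = 4 and x = 5.
xs = [float(i) for i in range(10)]
ys = [0.0 if x < 5 else 1.0 for x in xs]
predict = bagged_predictor(xs, ys)
```

Each tree sees a different bootstrap replica, so the individual split points vary, and averaging over the trees smooths out this variation.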

The DTB algorithm combines the strengths of two algorithms: the regression tree, a model that relates a response to its predictors by recursive binary splits, and boosting, a technique for improving the accuracy of a predictive function by applying the function repeatedly in a series and combining the output of each function with weighting, so that the total error of prediction is minimized.35 Gradient boosting is an iterative algorithm for finding an additive predictor. In many cases, the predictive accuracy of such a series greatly exceeds the accuracy of the base function used alone. Gradient boosting is a sequential, stage-wise, strictly forward procedure. The boosting algorithm implemented in the DTB creates a tree ensemble and uses randomization during tree creation. The goal is to minimize the loss function on the training set, {x, y}. After each iteration, F represents the sum of all trees built so far: $F_m(x) = F_{m-1}(x) + \mathrm{Tree}_m(x)$, where m indexes the trees in the model. Regardless of the loss function, the trees fitted to the pseudo-residuals (the negative gradient of the loss) are regression trees trained to minimize the mean squared error (MSE). Regularization is controlled through the number of gradient boosting iterations and through shrinkage, which consists of modifying the update rule to $F_m(x) = F_{m-1}(x) + \upsilon\,\gamma_m h_m(x)$, $0 < \upsilon \le 1$, where the parameter $\upsilon$ is called the learning rate, $\gamma_m$ is the step size of the mth stage and $h_m(x)$ is the base learner. In this method, a certain tree population is selected and the first tree is fitted to the data. The residuals from the first tree are then fed into the second tree, which attempts to reduce the error. This process is repeated through a chain of successive trees and the final predicted value is formed by adding the weighted contribution of each tree. The number and depth of trees are the method's parameters for both DTF and DTB. Conceptual diagrams of the architecture of the DTF and DTB models implemented here are given in Fig. 3.
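The stage-wise update with shrinkage can be sketched as follows. This is a simplified illustration rather than the DTB implementation used here: it assumes a squared-error loss (so the pseudo-residuals are the ordinary residuals and the step size is 1), one-split trees on a single descriptor, and no random sub-sampling. The names `fit_stump` and `boost` and the toy data are hypothetical.

```python
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on one feature and return a predictor."""
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: predict the mean response
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def boost(xs, ys, n_trees=100, learning_rate=0.1):
    """Gradient boosting for squared loss: each tree is fitted to the residuals."""
    f0 = mean(ys)                # initial constant model F_0
    preds = [f0] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # pseudo-residuals
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Shrunken update: F_m(x) = F_{m-1}(x) + learning_rate * h_m(x)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(tree(x) for tree in trees)

def mse(model, xs, ys):
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))

# Toy data: a smooth curve that a single stump cannot fit.
xs = [float(i) for i in range(10)]
ys = [x ** 2 for x in xs]
model = boost(xs, ys)
baseline = lambda x: mean(ys)
```

A smaller learning rate makes each tree contribute less, so more trees are needed, which is why the number of trees and the learning rate are usually tuned together by cross-validation.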

Fig. 3. Conceptual diagram of the architecture of (a) DTF and (b) DTB models.

2.5. Model validation metrics

The QSAR and QSAAR models were constructed using the respective training data and selected descriptors (Table 1), while keeping the test data for external validation of the models. The robustness of the constructed models for the training sets was examined by comparing these models to those derived from random data sets, which were generated by rearranging the activities of the molecules in the training set. The models were derived using various randomly rearranged activities with the selected descriptors and the corresponding values of the squared correlation coefficient (R²) were calculated37 as $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$, where $y_i$ and $\hat{y}_i$ are the measured and model computed values of the variable, $\bar{y}$ is the mean of the measured values, and n represents the number of observations. The chance correlation in the developed QSAR/QSAAR models was also checked38 by deriving the value of cRp² for the scrambled models as ${}^{c}R_p^2 = R\sqrt{R^2 - R_r^2}$, where $R_r^2$ represents the squared mean correlation coefficient of the randomized models. The threshold value of cRp² is 0.5, and a model exceeding this value is unlikely to be the outcome of mere chance.
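As a worked check of the chance-correlation metric, the sketch below evaluates cRp² from a model R² and the mean R² of the randomized models. It assumes that the training-set R² of the original model is the value entering the formula; the function name `c_rp2` is hypothetical. The inputs are the rat and rabbit values reported in Section 3.1 and Table 3.

```python
import math

def c_rp2(r2_model: float, r2_randomized: float) -> float:
    """cRp2 = R * sqrt(R^2 - Rr^2), with R the correlation of the original model."""
    return math.sqrt(r2_model) * math.sqrt(abs(r2_model - r2_randomized))

# (training R2 of the model, mean R2 of the Y-randomized models)
reported = {
    ("DTF", "Rat"): (0.935, 0.005),
    ("DTB", "Rat"): (0.945, 0.004),
    ("DTF", "Rabbit"): (0.927, 0.010),
    ("DTB", "Rabbit"): (0.953, 0.008),
}
values = {model: c_rp2(r2, r2_rand) for model, (r2, r2_rand) in reported.items()}
```

Under this assumption the computed values agree with the reported cRp² values (0.933, 0.943, 0.922 and 0.949) to within rounding, and all lie well above the 0.5 threshold.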
Further, in order to evaluate the robustness of their predictivity, the developed QSAR/QSAAR models were validated using multiple strategies, and the results are expressed in the form of various metrics. The concordance correlation coefficient (CCC), Q²F1, Q²F2, Q²F3 and rm² were calculated39–43 as $\mathrm{CCC} = \frac{2\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}(y_i - \bar{y})^2 + n(\bar{x} - \bar{y})^2}$, $Q_{F1}^2 = 1 - \frac{\sum_{i=1}^{n_{test}}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n_{test}}(y_i - \bar{y}_{Tr})^2}$, $Q_{F2}^2 = 1 - \frac{\sum_{i=1}^{n_{test}}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n_{test}}(y_i - \bar{y}_{test})^2}$, $Q_{F3}^2 = 1 - \frac{\left[\sum_{i=1}^{n_{test}}(y_i - \hat{y}_i)^2\right]/n_{test}}{\left[\sum_{i=1}^{n_{Tr}}(y_i - \bar{y}_{Tr})^2\right]/n_{Tr}}$, and $r_m^2 = r^2\left(1 - \sqrt{\left|r^2 - r_0^2\right|}\right)$, where x and y correspond to the abscissa and ordinate values of the graph plotting the experimental and model predicted values; n is the number of chemicals; $\bar{x}$ and $\bar{y}$ correspond to the averages of the abscissa and ordinate values, respectively; $n_{test}$ is the number of compounds in the test set; $y_i$ and $\hat{y}_i$ are the measured and model predicted values of the dependent variable in the test set; $\bar{y}_{Tr}$ is the mean value of the dependent variable in the training set; $\bar{y}_{test}$ represents the mean value of the dependent variable in the test set; $n_{Tr}$ is the number of compounds in the training set; and $r^2$ and $r_0^2$ are the squared correlations between the observed and predicted values with and without an intercept for the least-squares lines. The performance of the proposed QSAR/QSAAR models was also assessed by calculating R² and RMSE in the training and test data. A QSAR model is considered statistically significant if the following conditions are satisfied: R² (training) > 0.5, R² (test) > 0.6, CCC > 0.85, Q²F1, Q²F2, Q²F3 > 0.7, and rm² > 0.65.44,45 In addition to these criteria, the developed models were also assessed for their prediction quality in the test set using the recently proposed mean absolute error (MAE) criterion.46 The MAE is considered to be a simpler and more straightforward determinant of prediction errors.47 The MAE can be calculated as $\mathrm{MAE} = \frac{1}{n_{test}}\sum_{i=1}^{n_{test}}\left|y_i - \hat{y}_i\right|$.
For a good prediction, a QSAR model should meet the criteria MAE ≤ 0.1 × training set range AND MAE + 3σ ≤ 0.2 × training set range, whereas a model is considered a bad predictor if MAE > 0.15 × training set range OR MAE + 3σ > 0.25 × training set range. Here, σ denotes the standard deviation of the absolute error values for the test set data. Predictions that fall under neither of the above two conditions may be considered to be of moderate quality.
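The external validation metrics and the MAE based classification above can be sketched in a few lines. The sketch makes two assumptions that are not stated in the text: σ is taken as the sample standard deviation of the absolute errors, and the whole test set is used without removing any high-residual compounds. The rm² metric is omitted because it additionally requires a regression through the origin. All function names and the toy data are hypothetical.

```python
from statistics import mean, stdev

def press(y_obs, y_pred):
    """Sum of squared prediction errors."""
    return sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))

def q2_f1(y_test, y_pred, y_train):
    m = mean(y_train)  # reference: training-set mean
    return 1 - press(y_test, y_pred) / sum((y - m) ** 2 for y in y_test)

def q2_f2(y_test, y_pred):
    m = mean(y_test)   # reference: test-set mean
    return 1 - press(y_test, y_pred) / sum((y - m) ** 2 for y in y_test)

def q2_f3(y_test, y_pred, y_train):
    m = mean(y_train)
    ss_train = sum((y - m) ** 2 for y in y_train)
    return 1 - (press(y_test, y_pred) / len(y_test)) / (ss_train / len(y_train))

def ccc(x, y):
    """Concordance correlation coefficient between observed x and predicted y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return 2 * cov / (ssx + ssy + len(x) * (mx - my) ** 2)

def mae_stats(y_test, y_pred):
    """Return (MAE, MAE + 3 sigma) for the absolute prediction errors."""
    errors = [abs(y - p) for y, p in zip(y_test, y_pred)]
    mae = mean(errors)
    return mae, mae + 3 * stdev(errors)

def classify_mae(mae, mae_plus_3sigma, training_range):
    if mae <= 0.1 * training_range and mae_plus_3sigma <= 0.2 * training_range:
        return "good"
    if mae > 0.15 * training_range or mae_plus_3sigma > 0.25 * training_range:
        return "bad"
    return "moderate"

# Toy data: every prediction overshoots the measured value by 0.5.
y_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_test = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
```

For example, `classify_mae(0.171, 0.664, 6.352)` applies the criteria to the values listed for the rat DTF model in Table 4 and returns "good".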

2.6. Applicability domain analysis

According to the OECD guideline principle 3, the applicability domain (AD) of the developed QSAR model should be defined. The AD of a QSAR model is a theoretical region in the space defined by the descriptors used in the model for which a given QSAR should make reliable predictions.48 Here, the ADs of the developed QSAR models were determined by the approaches based on the descriptor range and the leverage analysis.49 According to the descriptor range method, a compound with descriptor values within the range of those of the training set compounds is considered to be inside the AD of the model.50 In the leverage approach, the distance of a compound from the centroid of its training set is measured by the leverage of the compound. The leverage value $h_i$ of the ith compound is calculated from the descriptor (i × j) matrix X as51 $h_i = x_i^{T}(X^{T}X)^{-1}x_i$, where $x_i$ is the row vector of molecular descriptors for the ith compound. A value of $h_i$ greater than the critical value $h^*$ indicates that the structure of the compound differs substantially from those used for the calibration, so that the compound is located outside the optimum prediction space. The $h^*$ value can be calculated as $h^* = 3(p + 1)/n$, where p is the number of variables used in the model and n is the number of training compounds.49 In terms of the dimensions of X, a leverage value greater than about 3j/i is therefore considered large. Further, the X-outliers were also identified using the standardization approach.52
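The descriptor range and leverage checks can be sketched as follows. The sketch assumes that the descriptor matrix is used as given, without centering or scaling, and it solves the linear system by a small Gauss-Jordan routine so that only the standard library is needed. The function names and the two-descriptor training matrix are hypothetical.

```python
def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def leverages(x_train, x_query):
    """h = x^T (X^T X)^-1 x for every query row, with X the training matrix."""
    p = len(x_train[0])
    xtx = [[sum(row[i] * row[j] for row in x_train) for j in range(p)]
           for i in range(p)]
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, x))) for x in x_query]

def critical_leverage(p, n):
    """Warning leverage h* = 3(p + 1)/n."""
    return 3 * (p + 1) / n

def in_descriptor_range(x_train, x):
    """True if every descriptor of x lies within its training-set range."""
    columns = list(zip(*x_train))
    return all(min(col) <= v <= max(col) for col, v in zip(columns, x))

# Hypothetical training matrix: five compounds described by two descriptors.
x_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 4.0]]
h_train = leverages(x_train, x_train)
h_star = critical_leverage(p=2, n=5)
```

A query compound would be flagged as outside the AD if its leverage exceeds `h_star` or if `in_descriptor_range` returns False. As a consistency check, the leverages of the training compounds sum to the number of descriptors.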

3. Results and discussion

3.1. L-QSAR modeling and evaluation

In this study, separate L-QSAR models were developed for predicting the prenatal developmental toxicity potential (LEL) of chemicals in rats and rabbits using structural features of chemicals following the OECD guidelines. The L-QSAR models were constructed using the toxicity data in a single test species.

3.1.1. L-QSAR modeling for rat developmental toxicity

Here, L-QSAR models based on the DTF and DTB approaches were constructed to predict the prenatal developmental toxicity potential (LEL) of diverse chemicals in rats using five molecular descriptors (Table 1). The optimal architectures and the model parameters of the DTF and DTB models for the rat data were determined using a 5-fold CV. The RMSE values in the training and CV data were 0.40 and 0.84 (DTF) and 0.31 and 0.78 (DTB), respectively. A 5-fold Y-scrambling was performed to check for any chance correlation in the two models. The low R² and high cRp² values of 0.005 and 0.933 (DTF) and 0.004 and 0.943 (DTB) in the Y-randomization revealed that the original L-QSAR models are unlikely to arise as a result of chance correlation. The architectures and the optimal parameters of the constructed L-QSAR models determined through the internal and external validation are given in Table 2.

Table 2. Optimal model parameters for QSAR and ISC QSAAR models.

| Model | Model parameter | L-QSAR (Rat) | L-QSAR (Rabbit) | G-QSAR | QSAAR |
| --- | --- | --- | --- | --- | --- |
| DTF | Number of trees | 250 | 210 | 210 | 331 |
| DTF | Maximum depth of any tree in the forest | 14 | 15 | 14 | 11 |
| DTF | Average number of group splits in each tree | 116.9 | 63.6 | 145.5 | 29.2 |
| DTB | Number of trees | 400 | 400 | 525 | 400 |
| DTB | Maximum depth of any tree in the series | 8 | 7 | 8 | 7 |
| DTB | Average number of group splits in each tree | 150.1 | 155.8 | 146.4 | 119.1 |

The contributions of the selected descriptors in the DTF and DTB QSAR models developed here are presented in Fig. 4a. In both models (DTF and DTB), ETA_Beta, which is a measure of the electronic features of the molecule, had the highest (100%) contribution. In DTB, the relative importance of each independent variable to the model fit was measured through the reduction in the Huber loss summed across all the internal nodes of all the trees that split on that variable, divided by the total number of internal nodes (number of internal nodes per tree × number of trees), yielding a squared importance for that variable. The relative importance was finally expressed in the range of 0–100 percent.53 In DTF, the contribution measures are based on the number of times a variable is selected for splitting, weighted by the squared improvement to the model as a result of each split, and averaged over all the trees.54 The contributions of the input variables thus represent relative importance values: the contribution of the most important input variable was set to 100% and the contributions of the other input variables were expressed relative to it.55 All the selected descriptors, namely ETA_Beta (r = 0.24), GGI3 (r = 0.20), DELS (r = 0.11), TopoPSA (r = 0.1) and minsCH3 (r = 0.1), correlated positively with the endpoint (pLEL). A positive relationship between a descriptor and the pLEL means that the descriptor has a direct influence on the endpoint toxicity of the chemical. GGI3 is a topological charge descriptor that represents the topological charge index of order 3. DELS and minsCH3 are electrotopological state (E-state) atom type descriptors. DELS represents the sum of all atom intrinsic state differences (a measure of total charge transfer in the molecule),56 whereas minsCH3 represents the minimum atom type E-state of the CH3 group in the molecule. E-state descriptors describe the electronic character and topological environment of a skeletal atom in the molecule.
E-state indices of a certain atom in the molecule provide information on the electronic state of the atom, which depends on π-bonds, lone pair electrons and σ-bonds, and which reflects the quantitative availability of valence electrons for ligand–target interactions.57 The topological polar surface area (TopoPSA) is defined as the part of the surface area of a molecule associated with N, O, and S atoms and with the H atoms bonded to any of these atoms.58 The polar surface area is a descriptor that correlates well with passive molecular transport through membranes and allows the prediction of the transport properties of molecules. Molecules with a lower polar surface area would have higher permeability. A positive correlation of TopoPSA suggests that its higher value for a molecule would affect the activity adversely. The set of selected descriptors used here represented the features of diverse molecules that were responsible for the observed developmental toxicity in rats.
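The scaling of descriptor contributions relative to the most important variable can be illustrated with a short sketch. The raw importance scores below are hypothetical placeholders and are not values from this study; only the normalization step, in which the top descriptor is assigned 100%, follows the description above.

```python
def relative_importance(raw_scores):
    """Scale raw importance scores so that the largest one equals 100 percent."""
    top = max(raw_scores.values())
    if top <= 0:
        raise ValueError("at least one descriptor must have a positive score")
    return {name: 100.0 * score / top for name, score in raw_scores.items()}

# Hypothetical raw scores for the five rat L-QSAR descriptors.
raw = {"ETA_Beta": 0.40, "GGI3": 0.30, "DELS": 0.20, "TopoPSA": 0.16, "minsCH3": 0.10}
scaled = relative_importance(raw)
ranking = sorted(scaled, key=scaled.get, reverse=True)
```

Because the scale is relative, a contribution of 50% means that the descriptor is half as important as the top-ranked one, not that it explains half of the variance.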

Fig. 4. Plot showing the contributions of input descriptors in L-QSAR models (a) Rat L-QSAR and (b) Rabbit L-QSAR.

The selected optimal DTF and DTB QSAR models captured 85.52% and 89.76% of the total data variance in training, respectively. The proportion of the variance captured by the model descriptors is a measure of the closeness of the model predicted and actual values of the endpoint property. The selected L-QSAR models yielded RMSE and R² values of 0.24 and 0.846 (DTF) and 0.14 and 0.935 (DTB) in test data (Table 3). It may be noted that the optimal models yielded a high correlation between the measured and the model predicted values of the endpoint toxicity in both training and test data. Further, the MAE based criteria values for the developed QSAR models are presented in Table 4. According to these criteria, the performance of the L-QSAR models on the whole test set was good. The close agreement between the patterns of variation of the measured and model predicted responses (Fig. S3, ESI) and the reasonably low prediction errors (Table 3) suggest a good fit of the developed QSAR models to the datasets and the adequacy of the selected models for predicting the developmental toxicity of chemicals in rats.

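Table 3. Performance parameters for the L-QSAR models for rodents.

| QSAR model | Species | Data set | RMSE | R² | Q²F1 | Q²F2 | Q²F3 | CCC | rm² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DTF | Rat | Training | 0.38 | 0.935 | | | | | |
| DTF | Rat | Test | 0.24 | 0.846 | 0.821 | 0.819 | 0.945 | 0.885 | 0.600 |
| DTB | Rat | Training | 0.32 | 0.945 | | | | | |
| DTB | Rat | Test | 0.14 | 0.935 | 0.933 | 0.932 | 0.979 | 0.966 | 0.876 |
| DTF | Rabbit | Training | 0.37 | 0.927 | | | | | |
| DTF | Rabbit | Test | 0.24 | 0.906 | 0.841 | 0.840 | 0.937 | 0.894 | 0.663 |
| DTB | Rabbit | Training | 0.29 | 0.953 | | | | | |
| DTB | Rabbit | Test | 0.17 | 0.942 | 0.925 | 0.925 | 0.970 | 0.956 | 0.825 |

Table 4. MAE based criteria parameters for the developed QSAR/QSAAR models in the test set.

| Models | Model type | Training range (TR) | MAE | TR × 0.1 | MAE + 3σ | TR × 0.2 |
| --- | --- | --- | --- | --- | --- | --- |
| L-QSAR (Rat) | DTF | 6.352 | 0.171 | 0.635 | 0.664 | 1.270 |
| L-QSAR (Rat) | DTB | 6.352 | 0.120 | 0.635 | 0.361 | 1.270 |
| L-QSAR (Rabbit) | DTF | 5.296 | 0.174 | 0.530 | 0.689 | 1.059 |
| L-QSAR (Rabbit) | DTB | 5.296 | 0.126 | 0.530 | 0.456 | 1.059 |
| G-QSAR | DTF | 6.352 | 0.144 | 0.635 | 0.547 | 1.270 |
| G-QSAR | DTB | 6.352 | 0.122 | 0.635 | 0.423 | 1.270 |
| ISC-QSAAR | DTF | 5.165 | 0.188 | 0.517 | 0.591 | 1.033 |
| ISC-QSAAR | DTB | 5.165 | 0.124 | 0.517 | 0.353 | 1.033 |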

Further, the external predictivity of the developed QSAR models was evaluated using the test data that were withheld during the model development process. OECD principle 4 advocates a rigorous validation of the constructed QSAR models before they are applied to new chemicals. External validation coefficients, such as CCC, Q²F1, Q²F2, Q²F3 and rm², were therefore derived from the test data (rat) to attain a higher statistical confidence in the constructed models. The values of these coefficients and of the quality metric R² are given in Table 3. Except for rm² in the DTF model (0.600, below the threshold of 0.65), all the validation metrics for the developed QSAR models were above their respective thresholds.44,45 The results also show that the DTB performed relatively better than the DTF approach (Table 3). The better performance of the DTB QSAR model may be due to the fact that it incorporates a stochastic gradient boosting algorithm,59 which enhances the prediction ability of the weak learners.36,60

3.1.2. L-QSAR modeling for rabbit developmental toxicity

Separate L-QSAR models (DTF, DTB) were also established for predicting the developmental toxicity potential (LEL) of chemicals in rabbits using five different molecular descriptors (Table 1). A 5-fold CV was used to determine the optimal architectures of the rabbit L-QSAR (DTF, DTB) models (Table 2). The RMSE values in the training and CV data for these models were 0.38, 0.88 (DTF) and 0.26, 0.88 (DTB), respectively. The low R2 and high cRp2 values of 0.010, 0.922 (DTF) and 0.008, 0.949 (DTB) in the 5-fold Y-randomization test revealed that the original models are unlikely to arise as a result of chance correlation. The contributions of the selected descriptors in rabbit L-QSAR models are shown in Fig. 4b. From Fig. 4b, it is evident that MAXDN had the highest (100%) and MDEC-22 the lowest contribution in both the models. In the rabbit L-QSAR models, all the selected descriptors (except ETA_dBeta) were positively correlated with the endpoint toxicity. MAXDN is an E-state atom type descriptor which represents the maximum negative intrinsic state difference in the molecule (related to the nucleophilicity of the molecule).56 ETA_dBeta and ETA_Eta_L are extended topochemical atom (ETA) descriptors. ETA_dBeta is a measure of relative unsaturation in the molecule, whereas ETA_Eta_L is a local ETA-index. MDEC-22 and MDEC-33 are molecular distance edge descriptors and represent the distance edge between all secondary carbons and between all tertiary carbons, respectively. A molecular distance edge is defined as the through bond distance from atom A to atom B. These descriptors encode the molecular size and branching information of the molecule by taking into account sp3 hybridized carbon atoms.61 It may be inferred that all the selected descriptors are related to the developmental toxicity in rabbits.
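
The cRp2 values quoted above can be related to the model R2 and the Y-randomization R2 through a short calculation. The sketch below assumes the usual definition cRp2 = R × sqrt(R2 − Rr2), where Rr2 is the mean R2 of the Y-randomized models, and it assumes that the training R2 values of Table 3 are the ones entering the formula; neither assumption is stated explicitly in this section.

```python
from math import sqrt


def crp2(r2_model, r2_randomized):
    """cRp2 = R * sqrt(R^2 - Rr^2), assuming the usual definition."""
    return sqrt(r2_model) * sqrt(r2_model - r2_randomized)


# Training R2 from Table 3 (assumed input) and randomized R2 from the text
rabbit_dtf = crp2(0.927, 0.010)
rabbit_dtb = crp2(0.953, 0.008)
print(round(rabbit_dtf, 3), round(rabbit_dtb, 3))
```

With these inputs the calculation gives 0.922 (DTF) and 0.949 (DTB) to three decimals, in agreement with the reported values. A cRp2 close to the square root of R2 times itself, that is, close to R2, indicates that scrambling the response destroys almost all of the fitted relationship.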

The optimal rabbit L-QSAR (DTF, DTB) models captured 85.74% and 90.95% of the total training data variance, respectively. These models yielded RMSE and R2 values of 0.24, 0.906 (DTF) and 0.17, 0.942 (DTB) in test data. The MAE based criteria values for the developed L-QSAR models (Table 4) suggested their good performance on the whole test data set. In addition, a closely followed pattern of variation of the measured and model predicted responses (Fig. S4, ESI) and reasonably low values of prediction errors (Table 3) suggest a good fit of the developed QSAR models to the dataset and the adequacy of the selected models for predicting the developmental toxicity of chemicals in rabbits. Validation coefficients, such as CCC, QF12, QF22, QF32 and rm2, were derived from the test data (rabbit) and the values of these coefficients (Table 3) were above their respective thresholds.44,45 As in the previous case (rat), the performance of the DTB based L-QSAR model was relatively better than that of the DTF approach in rabbits (Table 3).

In order to understand the limitations of the developed L-QSAR models for prediction of the developmental toxicity potential of chemicals in rats and rabbits, the compounds that exhibited larger prediction errors in the test set were analyzed. In rats, four compounds (disulfoton, thiram, p-menthane-3,8-diol, methanesulfonothioic acid S-(2-hydroxypropyl)ester) were predicted with an error larger than 0.5 units by the DTF L-QSAR model and none by the DTB L-QSAR model. In rabbits, only one compound (tebupirimfos) was predicted with an error larger than 0.5 units by the DTF and DTB L-QSAR models. This suggests that the set of selected descriptors could not capture the features of these chemicals appropriately.

3.2. G-QSAR modeling and evaluation

G-QSAR models (DTF, DTB) were developed using the combined developmental toxicity data set (n = 356) of both the rats and rabbits using a separate set of molecular descriptors. Chemicals that were different in the two (rat and rabbit) toxicity data sets were considered in G-QSAR modeling. The applicability domain of the constructed G-QSAR model was thus wider than that of the individual L-QSAR models developed for the two test species. The G-QSAR models (DTF, DTB) were constructed using five descriptors (minHBa, ETA_Eta, nAtomP, GGI3, TopoPSA). A 5-fold CV, Y-scrambling and external validation were performed to select the optimal model architecture, check for chance correlation and assess the external applicability of the G-QSAR models, respectively. In CV, the RMSE values in the training and CV data were 0.41, 0.78 (DTF) and 0.34, 0.82 (DTB), respectively. Low R2 and high cRp2 values of 0.005, 0.927 (DTF) and 0.004, 0.918 (DTB), respectively, in the Y-randomization test indicated that the original G-QSAR models are unlikely to have arisen from chance correlation. The optimal G-QSAR (DTF, DTB) models captured 84.63% and 87.44% of the total training data variance, respectively. These models were applied to test data and yielded R2 and RMSE values of 0.870, 0.20 (DTF) and 0.909, 0.16 (DTB), respectively. Further, the MAE parameter based criteria values for the developed G-QSAR models (Table 4) suggested their good performance for the whole test data set.
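
The 5-fold CV step used for selecting the model architecture can be sketched as follows. This is an illustration of the procedure only: the helper names are hypothetical, and a mean-value predictor stands in for the DTF and DTB learners, whose implementation is not described in this section.

```python
import random
from math import sqrt
from statistics import mean


def kfold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_rmse(x, y, fit, predict, k=5, seed=0):
    """Pooled RMSE over the held-out folds of a k-fold cross-validation."""
    squared_errors = []
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        squared_errors += [(y[i] - predict(model, x[i])) ** 2 for i in fold]
    return sqrt(mean(squared_errors))


# Placeholder learner (assumption): always predicts the training mean.
def fit_mean(x_train, y_train):
    return mean(y_train)


def predict_mean(model, xi):
    return model


# Toy data for illustration only
x_demo = [[float(i)] for i in range(10)]
y_demo = [float(i) for i in range(10)]
print(cv_rmse(x_demo, y_demo, fit_mean, predict_mean))
```

In model selection, `cv_rmse` would be evaluated for each candidate architecture (for example, different numbers of trees) and the setting with the lowest CV RMSE retained.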

The contributions of the selected descriptors in G-QSAR models are shown in Fig. 5. It may be noted that ETA_Eta exhibited the highest (100%) and nAtomP the lowest contribution in both the DTF and DTB based G-QSAR models. ETA_Eta is an extended topochemical atom descriptor and represents a composite index. The minHBa is an E-state descriptor and represents minimum E-states for strong H-bond acceptors. The nAtomP represents the number of atoms in the largest π-system in the molecule, and indicates electron richness in the molecule.62 All the five descriptors considered here have positive correlations with the endpoint property (pLEL). Further, to examine the influence of different descriptors on toxicity, 3D plots were constructed (Fig. 6) by varying two selected inputs at a time while holding all the others constant. Fig. 6a presents the combined influence of minHBa and GGI3 on the endpoint toxicity. It is evident that minHBa has a negative, whereas GGI3 has a positive, influence on the endpoint toxicity. GGI3 represents a topological charge index of order 3. In Fig. 6b, the combined influence of nAtomP and ETA_Eta is demonstrated. It shows that both these descriptors exhibited a positive influence on the endpoint. Fig. 6c shows the combined influence of TopoPSA and minHBa. The TopoPSA has a positive effect, which is in accordance with its correlation with the endpoint.
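
The construction behind such 3D plots, varying two descriptors over a grid while the others stay fixed, can be sketched as below. The function names are hypothetical, and `toy_model` is a stand-in with invented coefficients that carry no chemical meaning; it is not the G-QSAR model of this study.

```python
def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def response_surface(model, baseline, name_a, grid_a, name_b, grid_b):
    """Evaluate the model on a grid of two descriptors, others held at baseline."""
    surface = []
    for a in grid_a:
        row = []
        for b in grid_b:
            point = dict(baseline)   # copy so the baseline is not modified
            point[name_a] = a
            point[name_b] = b
            row.append(model(point))
        surface.append(row)
    return surface


# Hypothetical stand-in model; coefficients are invented for illustration.
def toy_model(d):
    return 1.0 - 0.5 * d["minHBa"] + 0.8 * d["GGI3"] + 0.1 * d["TopoPSA"]


baseline = {"minHBa": 0.0, "GGI3": 0.0, "TopoPSA": 0.0}
surface = response_surface(
    toy_model, baseline, "minHBa", linspace(0.0, 1.0, 2), "GGI3", linspace(0.0, 2.0, 3)
)
for row in surface:
    print(row)
```

Each row of `surface` corresponds to one value of the first descriptor, so the rows and columns can be passed directly to a surface plotting routine.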

Fig. 5. Plot showing the contributions of input descriptors in G-QSAR models.


Fig. 6. 3-D plots of selected descriptors showing interaction trends between (a) GGI3 and minHBa, (b) ETA_Eta and nAtomP, and (c) TopoPSA and minHBa in the G-QSAR model.


The architectures and the optimal parameters of the constructed G-QSAR models are given in Table 2. The G-QSAR models were applied to the test data and the various statistical coefficients calculated are summarized in Table 5. The values of all the statistical coefficients derived from the test set were above their respective thresholds. The plot of the actual and predicted values of the endpoint toxicity (Fig. 7) suggested a high correlation between them. From the results, it is evident that, similar to the L-QSARs, the DTB based G-QSAR performed relatively better than the DTF method and that the performance results are comparable with those of the L-QSAR models. A similar pattern of performance has also been reported earlier.29,63,64 An in-depth investigation of the G-QSAR prediction results revealed that none of the compounds was predicted with an error larger than 0.5 units. This supports a wider applicability of the developed QSAR model to more than a single test species with different toxicity mechanisms.

Table 5. Performance parameters for the G-QSAR models for rodents.

G-QSAR model  Data set  RMSE  R2     QF12   QF22   QF32   CCC    rm2
DTF           Training  0.40  0.929  -      -      -      -      -
DTF           Test      0.20  0.870  0.863  0.856  0.962  0.921  0.688
DTB           Training  0.36  0.920  -      -      -      -      -
DTB           Test      0.16  0.909  0.912  0.907  0.976  0.953  0.845

Fig. 7. Plot showing the distribution of measured and model predicted LEL values of chemicals (a) DTF G-QSAR and (b) DTB G-QSAR models.


3.3. ISC QSAAR modeling

The ISC QAAR model extrapolates data for one toxicity endpoint to those for another toxicity endpoint and can be used to determine the species-specific toxicity of a chemical.65 The QAAR is a mathematical relationship between two different biological endpoints measured in the same species or the same endpoint in different species. Since the experimental developmental toxicity values (pLEL) of chemicals in rats exhibited a low correlation (r = 0.59) with endpoint toxicities in rabbits, a quantitative nonlinear relationship between the biological endpoints in these test species was defined. The rat toxicity was considered as the independent variable and that of the rabbit was taken as the dependent variable. In order to improve the prediction power of the ISC QAAR models, some relevant descriptors were added to the QAAR models (Table 1). Among various selections, the model with XLogP provided the best results in terms of R2 and RMSE values. XLogP refers to the neutral state of the molecule and serves as a quantitative descriptor of lipophilicity. For the compounds with a strong lipophilic character, the lipid bilayer of the cellular membrane is a potential site for interaction. Partitioning of the molecule into the hydrophobic phase (membrane) permits it to interfere with the organism.66 Hence, the XLogP descriptor might be related to the ability of the compounds to penetrate through the cell membrane and reach the target. Here, DTF and DTB methods were selected to develop the ISC QSAAR (quantitative structure activity–activity relationship) models. The optimal architecture of the models was selected using a 5-fold CV with the RMSE as the selection criterion (Table 2). The optimal DTF and DTB based ISC QSAAR models applied to the test data yielded RMSE and R2 values of 0.23, 0.830 and 0.14, 0.927, respectively. The values of the validation coefficients for the test set (Table 6) were above their respective thresholds (except for rm2 in DTF). Further, the MAE parameter based criteria values for the developed ISC QSAAR models (Table 4) suggested their good performance on the complete test data set. The plot of the experimental and model predicted values of the endpoint toxicity (Fig. S5, ESI) suggests the adequacy of the constructed model. Although the performance of the DTB based QSAAR model is relatively better than that of the DTF method, the overall prediction results are comparable with the L-QSAR (rabbit) and G-QSAR models. The prediction results were also analyzed for compounds predicted with errors larger than 0.5 units, and none of the chemicals was predicted beyond this error limit. Further, we analyzed the predictions for the 10% of compounds at each of the two extremes of the pLEL range to assess any over- or underestimation. It is evident that the compounds with higher pLEL values (upper end of plot) were underpredicted.
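
The check at the two extremes of the pLEL range can be expressed as a mean signed error over the lowest and highest 10% of measured values. The sketch below is one possible way to do this; the function name and the toy numbers are assumptions and do not reproduce the study data.

```python
from statistics import mean


def extreme_bias(measured, predicted, fraction=0.10):
    """Mean signed error (predicted - measured) at both ends of the measured range."""
    pairs = sorted(zip(measured, predicted))      # sort by measured value
    k = max(1, round(fraction * len(pairs)))      # compounds per extreme
    low, high = pairs[:k], pairs[-k:]
    return {
        "low_end": mean(p - m for m, p in low),
        "high_end": mean(p - m for m, p in high),
    }


# Toy data for illustration: predictions shrunk towards the mean of 5.5
measured_demo = [float(v) for v in range(1, 11)]
predicted_demo = [5.5 + 0.8 * (m - 5.5) for m in measured_demo]
bias = extreme_bias(measured_demo, predicted_demo)
print(bias)
```

A negative value at the high end indicates underprediction of the compounds with the highest pLEL values, which is the pattern described in the text; a positive value at the low end would indicate overprediction there.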

Table 6. Performance parameters for the ISC QSAAR models for rodents.

ISC QSAAR model  Data set  RMSE  R2     QF12   QF22   QF32   CCC    rm2
DTF              Training  0.42  0.867  -      -      -      -      -
DTF              Test      0.23  0.830  0.827  0.791  0.942  0.865  0.498
DTB              Training  0.26  0.954  -      -      -      -      -
DTB              Test      0.14  0.927  0.931  0.917  0.977  0.960  0.898

3.4. Applicability domain analysis

The methods based on the descriptor range and leverage approaches were used here to define the ADs of the constructed QSAR and QSAAR models. The ranges of the implemented descriptors in these models (Tables S2 and S3, ESI) show that none of the compounds in L-QSARs and G-QSARs were out of the defined limits of the AD. However, in ISC QSAAR models one compound (acequinocyl) was detected out of the AD.
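
The descriptor range approach amounts to a simple bounds check: a query compound lies inside the AD only if each of its descriptor values falls between the minimum and maximum of that descriptor in the training set. A minimal sketch follows; the helper names and the numerical values are hypothetical and serve only to make the example run.

```python
def descriptor_ranges(training_rows):
    """Return {descriptor: (min, max)} over the training compounds."""
    names = training_rows[0].keys()
    return {
        n: (min(r[n] for r in training_rows), max(r[n] for r in training_rows))
        for n in names
    }


def out_of_range(compound, ranges):
    """List the descriptors for which the compound lies outside the training range."""
    return [n for n, (lo, hi) in ranges.items() if not lo <= compound[n] <= hi]


# Hypothetical descriptor values for illustration only
training = [
    {"XLogP": 1.2, "TopoPSA": 40.0},
    {"XLogP": 3.5, "TopoPSA": 75.0},
    {"XLogP": 2.1, "TopoPSA": 20.0},
]
ranges = descriptor_ranges(training)
inside = out_of_range({"XLogP": 2.0, "TopoPSA": 50.0}, ranges)
outside = out_of_range({"XLogP": 6.8, "TopoPSA": 50.0}, ranges)
print(inside, outside)
```

An empty list means the compound is within the range-based AD; any listed descriptor flags it as lying outside.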

Here, to visualize the AD of the constructed QSARs, the Williams plots were examined (Fig. 8) for the detection of both the response outliers (standardized residuals >3) and structurally influential chemicals in the model (h > h*) (in training data). The compounds with high leverage and high standardized residuals in L-QSARs, G-QSARs, and QSAARs are presented in Tables S4 and S5 (ESI). It may be noted that four compounds in rat L-QSAR, five in rabbit L-QSAR, six in G-QSAR, and two in QSAAR models were detected as high leverage compounds. However, a major limitation of this method is that the value of h*, and hence the number of compounds within or out of the AD of a model, depends on the number of compounds in the training data. Further, the outliers in the training and test sets were identified using the standardization approach.52 The analysis identified nine compounds in rat L-QSAR, eight in rabbit L-QSAR, fourteen in G-QSAR and eight compounds in QSAAR models as outliers. Chemical structures of these compounds are presented in Table S6 (ESI). The anomalous behavior of the compounds outside the ADs of the models may be due to the fact that the set of selected descriptors could not capture some relevant structural features present in these molecules and that their biological mechanism is different from the remaining chemicals. For future predictions, predicted toxicity data must be considered reliable only for those chemicals that fall within the AD on which the model is constructed.
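
The leverage values plotted on the horizontal axis of a Williams plot can be computed as in the sketch below. It assumes the usual definitions h = x^T (X^T X)^-1 x, with an intercept column added to the training descriptor matrix X, and a warning leverage h* = 3(p + 1)/n for p descriptors and n training compounds. These conventions and all names are assumptions; this section does not spell them out.

```python
def solve(a, b):
    """Solve a.z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(train, query=None):
    """Leverage of each query row (default: the training rows themselves)."""
    xt = [[1.0] + list(row) for row in train]            # add intercept column
    q = xt if query is None else [[1.0] + list(row) for row in query]
    k = len(xt[0])
    xtx = [[sum(r[i] * r[j] for r in xt) for j in range(k)] for i in range(k)]
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, x))) for x in q]


def warning_leverage(n_train, n_descriptors):
    """h* = 3(p + 1)/n (assumed convention)."""
    return 3 * (n_descriptors + 1) / n_train


# Toy single-descriptor training set for illustration only
train_demo = [[0.0], [1.0], [2.0], [3.0]]
h_train = leverages(train_demo)
h_query = leverages(train_demo, [[10.0]])
h_star = warning_leverage(len(train_demo), 1)
print(h_train, h_query, h_star)
```

The dependence of h* on the training set size is explicit in `warning_leverage`, which illustrates the limitation noted above: the same compound can fall inside or outside the AD depending on how many training compounds are used.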

Fig. 8. Williams plot for the (a) rat L-QSAR, (b) rabbit L-QSAR, (c) G-QSAR, and (d) ISC QSAAR models.


In this work, DTF and DTB based L-QSAR and G-QSAR modeling strategies have been applied in the light of the OECD guidelines (Fig. 1) for developing predictive models for a regulatory purpose. The most important features of our models are the simplicity, reproducibility and interpretability of the descriptors employed. Furthermore, our models comply with the OECD norms, imply reliability while assessing new or existing chemicals, and also support the REACH policies.67 In the present study, the good performance of all the QSAR models may be attributed to the implementation of the bagging (DTF) and boosting (DTB) algorithms, which help to improve the predictivity of weak learners.
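
How boosting improves on a weak learner can be illustrated with a toy example. The sketch below is not the DTB implementation used in this work: it is a generic stochastic gradient boosting loop for squared error, with one-split regression stumps on a single feature as the weak learners, and all names and parameter values are assumptions.

```python
import random
from statistics import mean


def fit_stump(x, r):
    """Fit a one-split regression stump (the weak learner) to residuals r."""
    best = None
    for t in sorted(set(x))[:-1]:            # thresholds leaving both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:                          # all x identical: constant stump
        m = mean(r)
        return x[0], m, m
    return best[1:]


def boost(x, y, rounds=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Add stumps one by one, each fitted to residuals on a random subset."""
    rng = random.Random(seed)
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    k = min(len(y), max(2, int(subsample * len(y))))
    for _ in range(rounds):
        idx = rng.sample(range(len(y)), k)    # subset drawn without replacement
        t, lm, rm = fit_stump([x[i] for i in idx], [y[i] - pred[i] for i in idx])
        stumps.append((t, lm, rm))
        pred = [p + learning_rate * (lm if xi <= t else rm) for p, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if xi <= t else rm for t, lm, rm in stumps)


def mse(model, x, y):
    return mean((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y))


# Toy step-function data for illustration only
x_demo = list(range(20))
y_demo = [0.0] * 10 + [1.0] * 10
one_round = boost(x_demo, y_demo, rounds=1, subsample=1.0)
many_rounds = boost(x_demo, y_demo, rounds=100, subsample=1.0)
stochastic = boost(x_demo, y_demo, rounds=100, subsample=0.5, seed=1)
print(mse(one_round, x_demo, y_demo), mse(many_rounds, x_demo, y_demo))
```

Each stump alone explains little, but the sum of many small corrections fits the data closely. Bagging, as in DTF, would instead fit each tree independently on a bootstrap sample and average the predictions.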

4. Conclusions

Considering the importance of prenatal developmental toxicity in screening new molecules for a regulatory purpose, computational approaches are directed towards their safety assessment. In this study, we have developed DTF and DTB based L-QSAR, G-QSAR and ISC QSAAR models for predicting the developmental toxicity potential of chemicals in rodents in accordance with the OECD guidelines for QSAR modeling. For QSAR modeling, the highly authentic ToxRef database was used, and separate structural features were identified in each of the QSAR analyses. Several statistical validation tests performed on the constructed QSAR/QSAAR models revealed a high predictivity for these methods and rendered high statistical confidence. Performances of all the L-QSAR, G-QSAR, and ISC QSAAR models were excellent and comparable. The excellent prediction and generalization achieved for the QSAR/QSAAR models here may be due to their ability to capture the nonlinearities in the data. The proposed models will help in reducing the cost and the number of animals in the toxicity testing of chemicals and in generating reliable toxicity data in rodents to streamline the risk assessment process of diverse chemicals.

Acknowledgments

The authors thank the Director, CSIR-Indian Institute of Toxicology Research, Lucknow (India) for his keen interest in this work and providing all necessary facilities.

Footnotes

†Electronic supplementary information (ESI) available. See DOI: 10.1039/c5tx00493d

References

  1. Knudsen T. B., Martin M. T., Kavlock R. J., Judson R. S., Dix D. J., Singh A. V. Reprod. Toxicol. 2009;28:209–219. doi: 10.1016/j.reprotox.2009.03.016.
  2. Regulation of (EC) No. 1907/2006 of the European Parliament and of the Council, December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No. 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC, Off. J. Eur. Union, L396 (2007), pp. 1–849
  3. US Environmental Protection Agency, Laws & Regulations, http://www2.epa.gov/laws-regulations.
  4. National Toxicology Programme, Prenatal Developmental Toxicity Study, available at: https://ntp.niehs.nih.gov/testing/types/dev/.
  5. U.S. Environmental Protection Agency, Health effects test guidelines OPPTS 870.3700 prenatal developmental toxicity study, Office of Prevention, Pesticides and Toxic Substances, Washington, DC, EPA Publication 712-C-98–207, 1998.
  6. Organisation for Economic Cooperation and Development. OECD guideline for the testing of chemicals, No. 414: prenatal developmental toxicity study, Paris, France, Organisation for Economic Cooperation and Development, 2001
  7. Martin M. T., Judson R. S., Reif D. M., Kavlock R. J., Dix D. J. Environ. Health Perspect. 2009;117:392–399. doi: 10.1289/ehp.0800074.
  8. Martin M. T., Mendez E., Corum D. G., Judson R. S., Kavlock R. J., Rotroff D. M., Dix D. J. Toxicol. Sci. 2009;110:181–190. doi: 10.1093/toxsci/kfp080.
  9. Daston G. P. Birth Defects Res., Part A. 2007;79:1–7. doi: 10.1002/bdra.20344.
  10. Hartung T. Nature. 2009;460:208–212. doi: 10.1038/460208a.
  11. Council Regulation (EEC) No. 793/93 of 23 March 1993 on the evaluation and control of the risks of existing substances, Off. J. Eur. Commun., 1993, L84, pp. 1–15
  12. Panigel M. Am. J. Obstet. Gynecol. 1962;84:1664–1683.
  13. Ala-Kokko T. I., Myllynen P., Vahakangas K. Int. J. Obstet. Anesth. 2000;9:26–38.
  14. Pienimaki P., Hartikainen A. L., Arvela P., Partanen T., Herva R., Pelkonen O., Vahakangas K. Epilepsia. 1995;36:241–248. doi: 10.1111/j.1528-1157.1995.tb00991.x.
  15. Schneider H., Panigel M., Dancis J. Am. J. Obstet. Gynecol. 1972;114:822–828. doi: 10.1016/0002-9378(72)90909-x.
  16. AltTox.org, Reproductive & Developmental Toxicity: The Way Forward, http://www.alttox.org/ttrc/toxicity-tests/repro-dev-tox/way-forward/.
  17. Worth A. P., Bassan A., DeBruijn J., Gallegos-Saliner A., Netzeva G., Patlewicz G., Pavan M., Tsakovska I., Eisenreich S. SAR QSAR Environ. Res. 2007;18:111–125. doi: 10.1080/10629360601054255.
  18. European Commission, Directive 2006/121/EC of the European Parliament and of the Council of 18 December 2006 amending Council Directive 67/548/EEC on the approximation of laws, regulations and administrative provisions relating to the classification, packaging and labelling of dangerous substances in order to adapt it to Regulation (EC) No 1907/2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) and establishing a European Chemicals Agency. Off. J. Eur. Union (2006), L 396/850 of 30.12.2006, Office for Official Publications of the European Communities (OPOCE), Luxembourg
  19. Roy K., Kar S. and Das R. N., Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment, Academic Press, London, UK, 2015, ISBN: 978-0-12-801505-6.
  20. Organization for Economic Cooperation and Development (OECD), Guidance Document on the Validation of (Quantitative) Structure–activity Relationships [(Q)SAR] Models, ENV/JM/MONO 2 (2007), 2007, 1–154
  21. Animal Toxicity Studies: Effects and Endpoints (Toxicity Reference Database – ToxRefDB) http://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data.
  22. Sipes N. S., Martin M. T., Reif D. M., Kleinstreuer N. C., Judson R. S., Singh A. V., Chandler K. J., Dix D. J., Kavlock R. J., Knudsen T. B. Toxicol. Sci. 2011;124:109–127. doi: 10.1093/toxsci/kfr220.
  23. Yap C. W. J. Comput. Chem. 2011;32:1466–1474. doi: 10.1002/jcc.21707.
  24. ChemSpider, http://www.chemspider.com.
  25. Pubchem, http://pubchem.ncbi.nlm.nih.gov/compound/.
  26. Reitermanov Z., Data splitting, WDS'10 Proceedings of Contributed Papers, Part I, 2010, pp. 31–36.
  27. Singh K. P., Malik A., Singh V. K., Mohan D., Sinha S. Anal. Chim. Acta. 2005;550:82–91.
  28. Basant N., Gupta S., Singh K. P. Chemosphere. 2015;139:246–255. doi: 10.1016/j.chemosphere.2015.06.063.
  29. Basant N., Gupta S., Singh K. P. J. Chem. Inf. Model. 2015;55:1337–1348. doi: 10.1021/acs.jcim.5b00139.
  30. Roy K., Kar S. and Das R. N., Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment, Academic Press, London, UK, 2015, ISBN: 978-0-12-801505-6.
  31. Roy K., Kar S. and Das R. N., A Primer on QSAR/QSPR Modeling Fundamental Concepts, Springer Briefs in Molecular Science, Springer Cham Heidelberg, New York, London, 2015. doi: 10.1007/978-3-319-17281-1.
  32. Zhao C. Y., Zhang H. X., Zhang X. Y., Liu M. C., Hu Z. D., Fan B. T. Toxicology. 2006;217:105–119. doi: 10.1016/j.tox.2005.08.019.
  33. Patlewicz G., Jeliazkova N., Gallegos Saliner A., Worth A. P. SAR QSAR Environ. Res. 2008;19:397–412. doi: 10.1080/10629360802083848.
  34. Breiman L. Mach. Learn. 1996;24:123–140.
  35. Friedman J. H. Comput. Stat. Data Anal. 2002;38:367–378.
  36. Erdal H. I., Karakurt O. J. Hydrol. 2013;477:119–128.
  37. Chenard J. F., Caissie D. Hydrol. Processes. 2008;22:3361–3372.
  38. Mitra I., Saha A., Roy K. Mol. Simul. 2010;36:1067–1079.
  39. Lin L. I. Biometrics. 1992;48:599–604.
  40. Shi L. M., Fang H., Tong W., Wu J., Perkins R., Blair R. M., Branham W. S., Dial S. L., Moland C. L., Sheehan D. M. J. Chem. Inf. Comput. Sci. 2001;41:186–195. doi: 10.1021/ci000066d.
  41. Schuurmann G., Ebert R., Chen J., Wang B., Kuhne R. J. Chem. Inf. Model. 2008;48:2140–2145. doi: 10.1021/ci800253u.
  42. Consonni V., Ballabio D., Todeschini R. J. Chem. Inf. Model. 2009;49:1669–1678. doi: 10.1021/ci900115y.
  43. Roy K., Chakraborty P., Mitra I., Ojha P. K., Kar S., Das R. N. J. Comput. Chem. 2013;34:1071–1082. doi: 10.1002/jcc.23231.
  44. Tropsha A., Golbraikh A., Cho W. J. Bull. Korean Chem. Soc. 2011;32:2397–2404.
  45. Chirico N., Gramatica P. J. Chem. Inf. Model. 2012;52:2044–2058. doi: 10.1021/ci300084j.
  46. Roy K., Das R. N., Ambui P., Aher R. B. Chemom. Intell. Lab. Syst. 2016;152:18–33.
  47. Chai T., Draxler R. R. Geosci. Model Dev. 2014;7:1247–1250.
  48. Netzeva T. I., Worth A. P., Aldenberg A., Benigni R., Cronin M. T. D., Gramatica P., Jaworska J. S., Kahn S., Klopman G., Marchant C. A. Altern. Lab. Anim. 2005;33:155–173. doi: 10.1177/026119290503300209.
  49. Gramatica P. QSAR Comb. Sci. 2007;26:694–701.
  50. Kovarich S., Papa E., Gramatica P. J. Hazard. Mater. 2011;190:106–112. doi: 10.1016/j.jhazmat.2011.03.008.
  51. Puzyn T., Rasulev B., Gajewicz A., Hu X., Dasari T. P., Michalkova A., Hwang H. M., Toropov A., Leszczynska D., Leszczynska J. Nat. Nanotechnol. 2011;6:175–178. doi: 10.1038/nnano.2011.10.
  52. Roy K., Kar S., Ambure P. Chemom. Intell. Lab. Syst. 2015;145:22–29.
  53. Friedman J. H. Ann. Stat. 2001;29:1189–1232.
  54. Friedman J. H., Meulman J. J. Stat. Med. 2003;22:1365–1381. doi: 10.1002/sim.1501.
  55. Hajek P., Olej V. and Myskova R., Forecasting Stock Prices using Sentiment Information in Annual Reports – A Neural Network and Support Vector Regression Approach, BAE, 2013, vol. 10, pp. 293–305.
  56. Gramatica P., Corradi M., Consonni V. Chemosphere. 2000;41:763–777. doi: 10.1016/s0045-6535(99)00463-4.
  57. Samat N. H. A., Abdualkader A. M., Mohamed F., Abdullahie A. D. Int. J. Pharm. Pharm. Sci. 2014;6:284–290.
  58. Ertl P., Rohde B., Selzer P. J. Med. Chem. 2000;43:3714–3717. doi: 10.1021/jm000942e.
  59. Grunwald S., Daroub S. H., Lang T. A., Diaz O. A. Sci. Total Environ. 2009;407:3772–3783. doi: 10.1016/j.scitotenv.2009.02.030.
  60. Chou J. S., Chiu C. K., Farfoura M., AI-Taharwa I. J. Comput. Civ. Eng. 2011;25:242–253.
  61. Serra J. R., Thompson E. D., Jurs P. C. Chem. Res. Toxicol. 2003;16:153–163. doi: 10.1021/tx020077w.
  62. Afantitis A., Melagraki G., Koutentis P. A., Sarimveis H., Kollias G. Eur. J. Med. Chem. 2011;46:497–508. doi: 10.1016/j.ejmech.2010.11.029.
  63. Gupta S., Basant N., Singh K. P. RSC Adv. 2015;5:71153–71163.
  64. Basant N., Gupta S., Singh K. P. Toxicol. Res. 2016;5:340–353. doi: 10.1039/c5tx00321k.
  65. Furuhama A., Hasunuma K., Aoki Y. SAR QSAR Environ. Res. 2015;26:301–323. doi: 10.1080/1062936X.2015.1032347.
  66. Singh K. P., Gupta S., Basant N. Chemom. Intel. Lab. Syst. 2015;140:61–72.
  67. Williams E. S., Panko J., Paustenbach D. J. Crit. Rev. Toxicol. 2009;39:553–675. doi: 10.1080/10408440903036056.


Articles from Toxicology Research are provided here courtesy of Oxford University Press
