Skip to main content
Environmental Health Perspectives logoLink to Environmental Health Perspectives
. 2009 Apr 3;117(8):1257–1264. doi: 10.1289/ehp.0800471

A Novel Two-Step Hierarchical Quantitative Structure–Activity Relationship Modeling Work Flow for Predicting Acute Toxicity of Chemicals in Rodents

Hao Zhu 1, Lin Ye 1, Ann Richard 2, Alexander Golbraikh 1, Fred A Wright 3, Ivan Rusyn 4,*, Alexander Tropsha 1,*,
PMCID: PMC2721870  PMID: 19672406

Abstract

Background

Accurate prediction of in vivo toxicity from in vitro testing is a challenging problem. Large public–private consortia have been formed with the goal of improving chemical safety assessment by the means of high-throughput screening.

Objective

A wealth of available biological data requires new computational approaches to link chemical structure, in vitro data, and potential adverse health effects.

Methods and results

A database containing experimental cytotoxicity values for in vitro half-maximal inhibitory concentration (IC50) and in vivo rodent median lethal dose (LD50) for more than 300 chemicals was compiled by Zentralstelle zur Erfassung und Bewertung von Ersatz- und Ergaenzungsmethoden zum Tierversuch (ZEBET; National Center for Documentation and Evaluation of Alternative Methods to Animal Experiments). The application of conventional quantitative structure–activity relationship (QSAR) modeling approaches to predict mouse or rat acute LD50 values from chemical descriptors of ZEBET compounds yielded no statistically significant models. The analysis of these data showed no significant correlation between IC50 and LD50. However, a linear IC50 versus LD50 correlation could be established for a fraction of compounds. To capitalize on this observation, we developed a novel two-step modeling approach as follows. First, all chemicals are partitioned into two groups based on the relationship between IC50 and LD50 values: One group comprises compounds with linear IC50 versus LD50 relationships, and another group comprises the remaining compounds. Second, we built conventional binary classification QSAR models to predict the group affiliation based on chemical descriptors only. Third, we developed k-nearest neighbor continuous QSAR models for each subclass to predict LD50 values from chemical descriptors. All models were extensively validated using special protocols.

Conclusions

The novelty of this modeling approach is that it uses the relationships between in vivo and in vitro data only to inform the initial construction of the hierarchical two-step QSAR models. Models resulting from this approach employ chemical descriptors only for external prediction of acute rodent toxicity.

Keywords: acute toxicity, computational toxicology, IC50, LD50, LOAEL, NOAEL, QSAR


Development of accurate and predictive in vitro toxicity testing methods that could be used as alternatives for lengthy and costly in vivo experiments has long been an elusive goal for both industry and regulatory agencies (National Research Council 2007). New, bold research programs were recently established at the National Toxicology Program (Xia et al. 2008) and the U.S. Environmental Protection Agency (U.S. EPA) (Dix et al. 2007) and coordinated at the interagency level by the U.S. government (Collins et al. 2008) to address this important challenge in a systematic way. The overall goal of these initiatives is to explore a diverse array of in vitro toxicity assays, such as cell-based and cell-free high-throughput screening (HTS) techniques, as well as toxicogenomic technologies, to evaluate the toxic potential of chemicals and prioritize candidates for animal testing. However, the utility of in vitro data as indicators of in vivo effects will be fully realized only if rigorous correlation between the toxicity of chemicals in vitro and in vivo can be established (National Research Council 2007; Rabinowitz et al. 2008).

Many previous studies have indicated that the correlation between the in vitro toxicity results and animal toxicity test data (e.g., acute, subacute, subchronic, and chronic rodent toxicity test results) is generally poor. Most notably, in 2001, the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) hosted a workshop to assess the relationship between cytotoxicity and rodent acute toxicity for > 300 diverse compounds; the data were compiled by the Zentralstelle zur Erfassung und Bewertung von Ersatz-und Ergaenzungsmethoden zum Tierversuch (ZEBET; the National Center for Documentation and Evaluation of Alternative Methods to Animal Experiments) [ICCVAM and National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) 2001]. It was concluded that there is no clear correlation between cytotoxicity [half-maximal inhibitory concentration (IC50)] and acute toxicity [median lethal dose (LD50)] data in rodents. Similarly, poor correlation was found between in vitro cytotoxicity and in vivo rodent carcinogenicity, even when a diverse set of in vitro end points from HTS was used (Xia et al. 2008; Zhu et al. 2008).

Cheminformatics approaches such as quantitative structure–activity relationship (QSAR) modeling have been widely used in toxicology (Dearden 2003; Johnson et al. 2004). Several software packages, such as Toxicity Prediction by Komputer Assisted Technology (TOPKAT) (Venkatapathy et al. 2004) and Multiple Computer-Automated Structure Evaluation (MultiCASE) (Matthews et al. 2006), have been developed and actively used by both industry and regulatory agencies. However, existing modeling tools generally do not achieve good external accuracy of prediction for compounds not used in model development, and few QSAR models have been successful in predicting in vivo toxicity end points for diverse sets of environmental compounds (Benigni et al. 2007; Stouch et al. 2003).

There are several possible reasons that previous attempts to establish relationships between in vitro and in vivo toxicity data were largely ineffective. These include, among other factors, inadequate attention paid to the chemical diversity of the compounds used for screening and modeling and, consequently, unjustified confidence in the ability of models to extrapolate significantly outside the chemistry space of the training set. Furthermore, the conventional QSAR modeling efforts have been disconnected from the growing efforts to employ in vitro screening (i.e., HTS data) to predict in vivo outcomes. Recently, we have proposed the use of hybrid chemical–biological descriptors, that is, a combination of conventional chemical descriptors with HTS profile data regarded as biological descriptors. We have demonstrated that these hybrid descriptors afford QSAR models with significantly higher accuracy of prediction of rodent carcinogenicity versus models using chemical descriptors alone, and much higher accuracy versus models that used biological in vitro data alone (Zhu et al. 2008).

These recent studies suggest that the explicit consideration of chemical structure (in the form of chemical descriptors) along with in vitro assay data could potentially account for discrepancies between in vitro and in vivo results and produce more accurate predictive models of in vivo toxicity. To validate this hypothesis further, in this study we used the ZEBET data set (ICCVAM and NICEATM 2001) for which previous attempts to establish the direct in vitro/in vivo correlation proved largely unsuccessful (Freidig et al. 2007). We have observed that chemicals can be partitioned into two classes based on comparison between cytotoxicity and acute toxicity data: a) those for which the linear in vitro (IC50)/in vivo (LD50) correlation could be demonstrated and b) those that correlate poorly. Furthermore, and of central importance for applying our models to the external set of chemicals for which no in vitro data exist, we have built binary QSAR models that could discriminate between compounds in these two classes with reasonable accuracy based on their chemical features alone. Finally, we have established rigorous and externally predictive class-specific QSAR models of rodent acute toxicity measured by LD50 values. We show that a two-step hierarchical QSAR modeling work flow where compounds are first assigned to a class using binary QSAR models and then their LD50 values is predicted using class-specific continuous QSAR models affords accurate prediction of LD50 values for compounds not included in the training set. In addition, we show that this two-step model’s statistical prediction accuracy compares favorably with currently available commercial toxicity predictors. Our studies suggest that the two-step QSAR modeling work flow can improve performance of predictive acute toxicity models for diverse organic compounds and aid in prioritizing compounds for rodent toxicity testing.

Materials and Methods

Data sets

The ZEBET database consists of data for 361 chemicals compiled from literature studies and published in a consolidated ICCVAM report (ICCVAM and NICEATM 2001). Every compound in this data set has at least one cytotoxicity result (IC50) and at least one type of rodent acute toxicity value (rat or mouse LD50). We defined ZEBET criteria to select cytotoxicity data for this data set as follows: a) at least two different IC50 values were available, either from different cell types or from different cytotoxicity end points; b) cytotoxicity data were obtained with mammalian cells; c) cytotoxicity data obtained with hepatocytes were not acceptable; and d) chemical exposure time in the cytotoxicity tests was at least 16 hr. Furthermore, only the results obtained from the following cytotoxicity tests were accepted: a) cell proliferation measured by cell number, protein, DNA content, DNA synthesis, or colony formation; b) cell viability and metabolic indicators, including metabolic inhibition test (MIT-24), 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay, 3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulfophenyl)-2H-tetrazolium (MTS) assay, and sodium 3,3-(1-[(phenylamino)carbonyl]-3,4-tetrazolium)-bis(4-emthoxy-6-nitro)benzene sulfonic acid hydrate (XTTC); c) cell viability and membrane indicators, including neutral red uptake, trypan blue exclusion, cell attachment, and cell detachment; and d) differentiation indicators.

For the purpose of this work, we curated the data set to select the subset of organic compounds and excluded inorganic and organometallic compounds, as well as compound mixtures, because conventional chemical descriptors used in QSAR studies could not be computed in these cases. There were 254 and 235 compounds that had rat or mouse LD50 (millimole/kilogram-body weight/day) values, respectively. Only LD50 values published in the Registry of Toxic Effects of Chemical Substances (RTECS) (Norager et al. 1978; Ruden and Hansson 2003) were used. The distributions of log(1/LD50) values of ZEBET compounds, with the exception of a single outlier, were from −2.61 to 2.30 for the rat and from −2.50 to 2.19 for the mouse. We considered one compound, 2,3,7,8-tetrachlorodibenzo-p-dioxin (CAS 1746-01-6), an activity outlier because its log(1/LD50) value was −4.21 for the rat, which deviated significantly from the activity range of the data set. After excluding this single outlier, the data sets used for modeling consisted of 253 compounds for the rat and 235 compounds for the mouse. An additional set of 115 compounds with complete data (both LD50 and IC50) for the rat was recently released by ICCVAM, which we used for validation (referred to as the ICCVAM data set). [For raw data, see Supplemental Material, Table 1 (doi:10.1289/ehp.0800471.S1).]

Table 1.

The results of data partitioning for the compounds with rat LD50, mouse LD50, rat chronic LOAEL, and rat chronic NOAEL data in ZEBET data set using cytotoxicity IC50 values.

Model No. of C1 compounds C1 ratio (%) No. of C2 compounds C2 ratio (%)
Rat LD50 (original set) 137 60 93 40
Mouse LD50 119 56 92 44
Rat LOAEL 21 49 21 51
Rat NOAEL 19 46 22 54
Rat LD50 (full data set) 258 61 167 39

Abbreviations: C1, Class 1; C2, Class 2.

The data on rat chronic lowest observed adverse effect levels (LOAELs) and rat chronic no observed adverse effect levels (NOAELs) were compiled from an internal low-dose toxicity data set established in our laboratory [see Supplemental Material, Table 2 (doi:10.1289/ehp.0800471.S1)]. These data include a combination of multiple toxicity phenotypes, such as liver toxicity and kidney toxicity. Compared with the ZEBET data set, 42 unique compounds have both rat LOAEL and in vitro IC50 values, and 41 compounds have both NOAEL and in vitro IC50 values. Because of limited availability of LOAEL and NOAEL data, we used these two data sets only to illustrate the data partitioning algorithm and did not build any QSAR models for them.

Table 2.

Statistical information for the five most statistically significant kNN QSAR models based on three modeling sets.

Model N-training Pred-training N-test Pred-test NNN
The best kNN classification model for 137 class 1 versus 93 class 2 compounds
 1 173 0.84 55 0.73 1
 2 147 0.86 74 0.70 1
 3 193 0.83 37 0.73 1
 4 165 0.86 59 0.70 1
 5 173 0.81 55 0.75 1

The best kNN continuous model for 137 class 1 compounds
 1 103 0.66 34 0.81 3
 2 103 0.73 34 0.71 2
 3 111 0.71 26 0.74 3
 4 115 0.65 22 0.79 5
 5 77 0.73 60 0.71 2

The best kNN continuous model for 93 class 2 compounds
 1 80 0.61 13 0.84 2
 2 77 0.67 16 0.77 1
 3 80 0.69 13 0.74 1
 4 80 0.65 13 0.76 2
 5 79 0.63 14 0.78 2

Abbreviations: NNN; number of the nearest neighbors used for prediction; N-test, number of compounds in the test set; N-training, number of compounds in the training set; Pred-test, the overall predictivity of the test set (correct classification rate for classification models, R2 for continuous models); Pred-training, the overall predictivity of the training set (correct classification rate for classification models, q2 for continuous models).

QSAR modeling approaches

We used the k-nearest neighbor (kNN) QSAR modeling approach that has been developed in our group (Zheng and Tropsha 2000). In brief, the method is based on the kNN principle and the variable selection procedure. It employs the leave-one-out cross-validation procedure (LOO-CV) and a simulated-annealing algorithm for the variable selection. The procedure starts with the random selection of a predefined number of descriptors from all descriptors. If kNN > 1, the estimated activities ŷi of compounds excluded by the LOO procedure are calculated using the following formula:

graphic file with name ehp-117-1257e1.jpg

where yj is the activity of the jth compound. We define weights wij as

graphic file with name ehp-117-1257e2.jpg

where dij is the Euclidean distance between compound i and its jth nearest neighbor. Further details of the algorithms and work flow are provided elsewhere (Medina-Franco et al. 2005; Ng et al. 2004; Shen et al. 2002; Zheng and Tropsha 2000).

We developed rat and mouse LD50 QSAR models for ZEBET compounds using DRAGON chemical descriptors (DRAGON for Windows, version 5.4; Teleste s.r.l., Milan, Italy). Before model construction, 23 compounds with rat LD50 results and 24 compounds with mouse LD50 results were selected at random to serve as external validation sets. The remaining 230 rat and 211 mouse compounds were used as modeling sets, and each was divided multiple times into training/test sets using the sphere exclusion approach (Golbraikh et al. 2003). We characterized the statistical significance of the models with the standard LOO-CV R2 (q2) for the training sets and the conventional R2 for the test sets when modeling real values (i.e., continuous QSAR). For classification modeling, we used correct classification rates expressed as a fractional value between 0 and 1. The model acceptability cutoff values of the LOO-CV accuracy of the training sets and the prediction accuracy for test sets were both set to 0.65 for classification models. For continuous models, the acceptability thresholds for LOO-CV regression q2 for the training sets and R2 values for the test set were both set at 0.5. Models that did not meet both training and test set cutoff criteria were discarded.

Moving M-regression for data partitioning

We used a novel approach related to a class of M-regression methods (Andersen 2007), which we termed “moving M-regression,” to select compounds for which there is a strong correlation between IC50 and LD50 values (class 1). The approach is a variant of the least squares regression that takes into account only those data points contained within a band around the regression line yregr = ax + b. For each y, only the points within the interval [ydi, y + di] are candidates for class 1, whereas points outside of this band are excluded from class 1. If the line y = ax + b is moved, some new points will enter the band, whereas some other points will leave it, which may result in a higher regression R2; this also explains why we use the term “moving M-regression.” For each point, we define {xi, yi}, i = 1, …, n, the moving M-regression inclusion function, as

graphic file with name ehp-117-1257e3.jpg

Thus, the moving M-regression line can be found by minimizing the following expression:

graphic file with name ehp-117-1257e4.jpg

Function F is not differentiable at all points (xi, yi) such that yi = axi + bd1 and yi = axi + b + d2. For practical purposes, we approximate η(xi, yi) by sums of two sigmoid functions:

graphic file with name ehp-117-1257e5.jpg

where P1 and P2 are large (~ 100) positive parameters. Indeed, as P1 and P2 approach infinity, the expression on the right side of Equation 5 approaches the right side of Equation 3. Small approximation errors in the vicinity of points {axi + bd1, yi} and {axi + b + d2, yi} approach zero as both P1 and P2 approach infinity. It is as if the data points are gradually included within, or excluded from, the band when the regression line is moving. Finally, replacing η(xi, yi) by Equation 3, we obtain

graphic file with name ehp-117-1257e6.jpg

To optimize F(a, b), the following system of equations is to be solved:

graphic file with name ehp-117-1257e7.jpg

Equations 7 are nonlinear, so depending on the data set and parameters P1, P2, d1, and d2, they can have multiple solutions (a, b).

In these studies, xi and yi were the in vitro log(1/IC50) and in vivo log(1/LD50) values, respectively, for a data set of compounds under study. Instead of using Equation 6, we determined the compounds that belong to class 1 by maximizing the number of data points within the band. With this correction, our target function takes the form

graphic file with name ehp-117-1257e8.jpg

To obtain the baseline toxicity regression (see Results), we opted to minimize the number of outliers below the regression line. For this purpose, we added additional terms for the lower border of the band weighted by an arbitrary parameter α. Thus, we minimized the following target function:

graphic file with name ehp-117-1257e9.jpg

The initial point (a, b) for minimization of F** was selected manually. P1 and P2 were equal to 100, d1 and d2 were equal to 0.4, and α was equal to 1. To optimize Equation 9, the system of Equations 7 in which F is replaced by F** should be solved. Figure 1 summarizes the data analytical work flow that we employed in this study for rodent acute toxicity modeling.

Figure 1.

Figure 1

The work flow of the two-step kNN QSAR LD50 modeling.

Model validation

We validated training set models by evaluating their external predictive power on the test sets as described above. Furthermore, a 5-fold external CV analysis was performed for the original ZEBET data set: the data set was randomly split into five equal-size subsets of compounds and five independent sets of calculations were conducted each time using 80% of the whole data set as a modeling set and the remaining 20% compounds as a test set. In addition, robustness of QSAR models was verified using a Y-randomization (randomization of response) approach as follows. We randomly divided the modeling set compounds into class 1 and class 2 subsets and developed kNN QSAR LD50 models for each subset using the same protocol and the same cutoff criteria (q2 and R2 > 0.5) as for compounds in classes 1 and 2 that we generated by means of the moving regression. The purpose of this was to see if statistically significant QSAR models could be obtained for any random division of the original data into two classes. Independently, we applied the test to compounds in unique classes 1 and class 2 by randomizing their LD50 values and redeveloping training set models. Both Y-randomization tests were repeated 10 times.

Results

Failure of conventional QSAR modeling of rodent acute toxicity

The modeling set including 230 compounds with known rat LD50 data was partitioned into 32 training and test sets and the conventional kNN QSAR modeling approach was applied to all training sets as detailed in “Materials and Methods.” We characterized each training set model by its q2 value; for the five best training set models, these values ranged between 0.5 and 0.57. These five models were used for predicting LD50 values for the respective external validation set (23 compounds). However, for each of these models the R2 value for this external set was < 0.5. When we used other types of in-house or commercial QSAR methods (e.g., support vector machine or partial least square) and other types of descriptors (e.g., MolConnZ descriptors or molecular operating environment descriptors), we obtained no statistically significant predictive QSAR models (data not shown). Likewise, modeling of the mouse data set (211 modeling compounds and 24 external validation compounds) was unsuccessful (data not shown). This negative result corroborates the well-known inability of conventional QSAR modeling approaches to arrive at statistically significant and externally predictive models of in vivo toxicity.

Data partitioning using the moving M-regression approach

It is well known that in vitro cytotoxicity correlates poorly with in vivo toxicity end points when any relatively large set of compounds is considered. The ZEBET data set is no exception; cytotoxicity (IC50) correlates with acute toxicity (LD50) for only a fraction of the compounds in either the rat (Figure 2A) or mouse (Figure 2B) data sets. Most of the compounds are more toxic in vivo than in vitro. Similar patterns could be found between cytotoxicity and other in vivo toxicity end points, for example, rat chronic LOAEL and rat chronic NOAEL (Figures 2C, D).

Figure 2.

Figure 2

The identification of the baseline correlation between cytotoxicity (IC50) and various types of in vivo toxicity testing results. (A) Rat LD50. (B) Mouse LD50. (C) Rat LOAEL. (D) Rat NOAEL. C1, class 1; C2, class 2.

To devise a mathematical means for identifying compounds with strong in vitro/in vivo correlation, we extended concepts that have been previously employed in calculating the “baseline regression” that correlated the aquatic toxicity of (some) chemicals with the logarithm of the n-octanol/water partition coefficient (log P) (Klopman et al. 1999, 2000; Mayer and Reichenberg 2006). Here, we have developed a novel approach, termed “moving M-regression,” to identify a subset of compounds with strong IC50 versus LD50 correlation. Using this method, we have partitioned compounds in the modeling set into two classes: class 1, compounds with acute toxicity that linearly correlate with cytotoxicity; and class 2, compounds with acute toxicity that do not correlate well with cytotoxicity, with these points positioned above the regression line.

This analysis for the rat ZEBET data set resulted in 122 compounds assigned to class 1, that is, within the linear regression correlation band between LD50 and IC50 values. The points corresponding to 93 out of 108 remaining compounds are located above the regression line band and are classified as class 2, whereas 15 compounds fall below the regression line (Figure 2A). Although these compounds are likely to be activity outliers, in the absence of an objective rationale for their outlier status, we merged them into class 1 to obtain the highest coverage of the resulting models and to provide a more realistic measure of external predictivity. Figure 2A and Equation 10 show the correlation between the LD50 and IC50 values of the resulting 137 class 1 compounds:

graphic file with name ehp-117-1257e10.jpg

with R2 = 0.74, SE = 0.36, and n = 137. We also applied this approach to analyze the relationship between the in vitro IC50 and other in vivo toxicity data, including mouse LD50, rat chronic LOAEL, and rat chronic NOAEL (Table 1). The same trend was found for all data sets, that is, in all cases the data were partitioned into two classes: a) points on the baseline and b) points off the baseline (Table 1, Figure 2). We found the ratio of class 1 to class 2 compounds to be similar for each of the four in vitro/in vivo toxicity data sets. This result further supports the generality of the “moving M-regression” approach.

Hierarchical QSAR modeling of the partitioned rodent toxicity data

Using class assignments from the data partitioning described above, we employed a two-step QSAR approach (Figure 1) for a) classification modeling (i.e., establishing that compounds assigned to classes 1 and 2 based on their biological activity data could be subdivided into the same classes based on their chemical structure), and b) predictive continuous modeling for all compounds in each class (i.e., estimation of the LD50 based on chemical structure, not IC50 data). For ZEBET rat data, we generated three modeling sets: set 1, 230 compounds (137 class 1 vs. 93 class 2) for classification modeling; set 2, 137 class 1 compounds; and set 3, 93 class 2 compounds for developing two continuous rat LD50 models. The analysis of these three data sets resulted in 252 classification models, as well as 1,207 continuous LD50 models for class 1 compounds and 40 continuous LD50 models for class 2 compounds that satisfied the statistical significance threshold criteria. Table 2 lists the statistical figures of merit for the best kNN models obtained for these three modeling sets.

To demonstrate that these QSAR models have significant external prediction accuracy, we have employed several concurrent approaches for model validation. First, following our general model validation work flow (Tropsha and Golbraikh 2007), we used 23 compounds excluded randomly from the entire data set as an external validation set. The following two-step prediction protocol for external compounds was used: a) kNN classification models were used to assign compounds to class 1 or class 2; and b) depending on the outcome, the respective class-specific continuous QSAR models was employed to predict the LD50 values for each compound. The results demonstrate that the overall accuracy of prediction for this external set is reasonably good. In the first step, the classification model had 65% prediction accuracy (the fraction of correctly identified class 1 and class 2 compounds). In the second step, we obtained R2 = 0.70, mean absolute error(MAE) = 0.39, and prediction coverage (i.e., the fraction of the external set compounds within the applicability domains of the models) of 74% for the external test set when combining the predictions for class 1 and class 2 compounds.

Second, we performed a 5-fold external CV analysis to test the robustness of the modeling outcome using 253 rat ZEBET compounds. The dataset was randomly split into five equal-size subsets of compounds and the modeling procedure was repeated five times, using each subset as a test set and the remaining four subsets as training set, as detailed in “Materials and Methods.” The statistical results of this exercise were as follows: sloperegr = 0.45 ± 0.01, R2regr = 0.71 ± 0.04, R2ext = 0.55 ± 0.05, MAE = 0.44 ± 0.04, coverage = 73 ± 3%.

Third, we performed Y-randomization tests to establish whether our models are statistically robust. Random partitioning of the compounds into two classes (10 times) produced only three (for class 1) and 28 (for class 2) models that satisfied the criteria of q2/R2 > 0.5, compared with 1,207 and 40, respectively, models for “moving M-regression”–assisted partitioning. Randomizing LD50 data generated no model with q2/R2 > 0.5 for class 1 and class 2 compounds.

Fourth, we performed additional Y-randomization analyses by randomly moving or rotating the correlation line (including negative correlation) and redefining compounds into classes 1 and 2. The randomly assigned class 1 and class 2 sets were used to develop QSAR LD50 models individually and the procedure was repeated 10 times. We found that at most, a very small number (< 7) of acceptable (Q2 > 0.5, R2 > 0.5) models could be developed.

Similar modeling results were obtained using the ZEBET mouse LD50 data. After partitioning 211 modeling set compounds into 119 class 1 compounds and 92 class 2 compounds, we developed 843 classification models for class 1 versus class 2, 236 continuous LD50 models for class 1 compounds, and 356 models for class 2 compounds. A two-step prediction protocol for evaluation of the 24 external compounds resulted in similarly good external prediction accuracy: R2 = 0.69, MAE = 0.42, and prediction coverage of 54%.

As a true external validation challenge, we have used our model to make predictions for the 115 compounds with rat LD50 data in the new ICCVAM data set. We compiled this data set after we finished the development of the above-described QSAR LD50 models, so it could be viewed as a true “blind” validation test. The statistical parameters of the prediction results for these compounds were R2 = 0.57, MAE = 0.48, and prediction coverage of 70%. Although somewhat less accurate than the results of the previous external prediction, this validation reinforces the statistical significance and utility of the model.

Y-randomization tests were also performed for the mouse LD50 data set. Similar to the rat data, after 10 random assignments of compounds into the two classes, we developed, at most, 4 (for class 1) and 38 (for class 2) models (q2/R2 > 0.5), compared with 843 and 236 models, respectively, when we used a classification model. Randomization of LD50 values produced no significant models.

Stability of the in vitro and in vivo moving M-regression parameters

Because the regression correlation between in vitro (IC50) and in vivo (LD50) data is required to classify the modeling set compounds and, subsequently, to create the kNN classification models, this linear correlation is an essential factor to determine the robustness of our final models. Hence, the slope of the correlation and associated correlation coefficient (R2) should remain stable when new compounds are added into the modeling set. To validate this supposition, we compiled all available ZEBET and ICCVAM compounds with rat LD50 data to create a new modeling set, including the original modeling set (230 compounds), the external validation set (23 compounds), and additional data (115 compounds). We also included the compounds previously not used for modeling (inorganic, organometallic, and mixtures) because we used no chemical descriptors in this validation. Using the moving M-regression approach for all 425 compounds with IC50 and LD50 values, the resulting in vitro/in vivo correlation parameters are similar to those obtained from our original modeling set in Equation10:

graphic file with name ehp-117-1257e11.jpg

with R2 = 0.71, SE = 0.37, and n = 258. The proportions of class 1 and class 2 compounds and outliers among these 425 compounds were also comparable to those of the original modeling set of 230 compounds (Table 1). We conclude that adding new compounds into the modeling set, which should be important to improve the final model by enriching its chemical and biological diversity, does not affect the in vitro/in vivo regression statistics.

Comparison between the two-step hierarchical LD50 QSAR model and TOPKAT

We compared the performance of our modeling approach with that of TOPKAT software, version 6.1 (Accelrys 2009; Enslein 1988). Two types of comparison were considered. First, we have analyzed 27 of the 115 ICCVAM compounds that have been used neither for building our model nor in the TOPKAT LD50 training set. Figure 3 shows the correlation between the experimental and predicted LD50 values obtained from our model versus TOPKAT. The R2 and MAE of TOPKAT were 0.16 and 0.78, respectively, for all 27 compounds, which is considerably less than the same statistical parameters for prediction of the same data set using our model, R2 and MAE of 0.64 and 0.38, respectively. For seven compounds that were outside of the applicability domain for our model, the R2 and MAE using TOPKAT were 0.60 and 0.50, respectively, whereas our model produced values of 0.86 and 0.29, respectively (Table 3).

Figure 3.

Figure 3

The correlation between experimental and predicted LD50 values for 27 external compounds within the applicability domain (A) using TOPKAT and (B) using the two-step model developed in this study.

Table 3.

Comparison between TOPKAT and the two-step model prediction of the external compounds.

Two-step model
TOPKAT
Measure No applicability domain With applicability domain No applicability domain With applicability domain
Prediction of 27 new ZEBET compounds
R2 0.64 0.86 0.16 0.60
 MAE 0.38 0.29 0.78 0.50
 Coverage (%) 100 67 100 67

Prediction of 1,562 RTECS compounds with 70% confidence level
R2 0.26 0.33 0.19 0.22
 MAE 0.65 0.54 0.76 0.65
 Coverage (%) 100 62 100 62

Prediction of 1,562 RTECS compounds with 90% confidence level
R2 0.42 0.62 0.19 0.26
 MAE 0.60 0.42 0.84 0.66
 Coverage (%) 12 6 12 6

Second, we have used our models to predict acute toxicity compounds in the RTECS (Norager et al. 1978) data set (data were kindly provided by Todd Martin from the U.S. EPA), which contains approximately 7,000 compounds with rat LD50 data. We removed compounds that we found within the ZEBET data set, as well as inorganic compounds and mixtures. This procedure produced a library of 4,003 compounds spanning a diverse chemical space of organic molecules for which experimental rat LD50 data are available.

Because the size of the RTECS library is much larger than that of our original modeling set, we drew from our experience in using QSAR models for virtual screening (Oloff et al. 2005) and narrowed the model applicability domain. Consequently, predictions were made only for compounds that had greater than 70% confidence level in assigning them to either class 1 or class 2 in step 1 of our work flow (i.e., we required that > 70% of all QSAR models meeting our acceptability domain criteria would predict a compound in the same class). We determined that there were 1,562 compounds (out of 4,003) that were not included in the training set of TOPKAT rat LD50 model and for which predictions could be made based on the aforementioned criteria. The TOPKAT model predicted LD50 values for these compounds with an R2 = 0.16 and MAE = 0.78 (Figure 4, Table 3). The same parameters for the two-step QSAR model were 0.26 and 0.65, respectively. After implementing the applicability domain filter, we made predictions for 965 RTECS chemicals. TOPKAT model had parameters of R2 = 0.22 and MAE = 0.65; the same parameters for the two-step model were 0.33 and 0.54, respectively (Table 3), which is better than or comparable to prediction accuracy of various commercial QSAR modeling packages (Moore et al. 2003), albeit there is room for improvement.

Figure 4.

Figure 4

Fraction of compounds versus prediction errors obtained by the two-step rat LD50 model, TOPKAT, and random sampling for 965 and 101 RTECS compounds.

It should be noted that the prediction accuracy of the two-step model can be improved by applying stricter criteria in the classification step. For instance, a 90% cutoff for correct class prediction results in prediction model statistics of R2 = 0.62 and MAE = 0.42, but the coverage of the model diminishes considerably to include 101 compounds (Table 3). The performance of TOPKAT for the same 101 compounds is poor: R2 = 0.26 and MAE = 0.66. Considering that the TOPKAT LD50 training set contains many more compounds (~ 6,000) than the training set used to develop the two-step model (~ 200), it is noteworthy that higher prediction accuracy can be achieved using our modeling approach for a much larger data set. Furthermore, our approach outperforms TOPKAT consistently over a range of error thresholds either for 965 RTECS compounds or for 101 RTECS compounds (Figure 4). In addition, we used the Wilcoxon test to calculate the p-values for the differences in MAEs obtained using two methods. Both for the whole set (965 compounds) and for the reduced set (101 compounds), the improvement achieved by our method, compared with TOPKAT, is statistically significant (p < 0.005).

One obvious reason that the prediction accuracy of our models for RTECS compounds is lower than that obtained from the external validation set of ICCVAM data is the difference in “activity” ranges of compounds in these two data sets. For example, the activity (log 1/activity, in millimolar units) of ZEBET compounds ranges from −2.61 to 2.30, whereas the activity range of RTECS compounds is considerably larger, from −3.34 to 4.21. It should be stressed that the kNN method used in our study cannot extrapolate in the activity space because external compound activity is predicted by averaging the activities of nearest-neighbor compounds in the training set as described in “Materials and Methods.” The MAE for the prediction of RTECS compounds that have experimental activity above 2 or below −2 is 1.14 log units. On the other hand, the MAE for the prediction for RTECS compounds that have experimental activity between −2 and 2 is considerably lower, 0.52 log units. The likely explanation for the better performance of our models in the latter range is that more than 90% of our modeling set compounds have rat LD50 activity in the same range, between −2 and 2. Increasing the diversity and activity range of compounds in the modeling set should significantly improve the prediction accuracy of our models.

Discussion

The conventional wisdom in mechanistic and regulatory toxicology is that predictions of the in vivo toxicity end points from in vitro measures, even within the same species, are difficult. However, an approximate linear correlation between in vitro IC50 and rodent LD50, two of the widely acceptable benchmark parameters used for regulatory purposes, can be established for a significant fraction of the compounds. Indeed, we confirmed this notion by quantitative analysis of the IC50/LD50 relationships and devised an objective, computational means to partition compounds into two groups: those having good linear fit within a defined band, or those falling outside the band and exhibiting, for the most part, higher in vivo than in vitro toxicity. Our hypothesis to explain this observation is that, whereas cytotoxicity assays can reflect some of the toxicity mechanisms resulting in adverse health effects at the whole-animal level, the in vitro tests cannot fully reproduce the complex mechanisms of the in vivo toxicity. For example, it is well known that many compounds are not toxicants themselves but have metabolites that are toxic. We argue that the two-step prediction model based on chemical descriptors only that we developed in our studies also assists in identification of the compound subset that may act directly (i.e., without being biotransformed) and through mechanisms likely to be predictive of the potential in vivo effects. A similar argument was presented previously in ecotoxicity research where log P was found to be a mechanistically relevant predictor (Verhaar et al. 2000).

To further substantiate this argument, we considered the top 10 chemical fragment descriptors that were used most frequently in statistically significant QSAR models, that is, descriptors with the highest discriminatory power [see Supplemental Material, Table 3 (doi:10.1289/ehp.0800471.S1)]. It is noteworthy that the aromatic primary amine, “hydrazine,” and “sulfonamide” moieties, found within compounds that are known to be toxic both in vitro and in vivo (Alaejos et al. 2008; Carr et al. 1993; Toth 1988), were found predominantly in compounds of class 1. On the other hand, “pyrrolidine” and “aromatic tertiary amine” moieties, which require biotransformation (Domagala 1994), were predictors for class 2. We have also demonstrated that this objective division of the data set into two major groups affords robust hierarchical QSAR models, an assertion further supported by successive challenges to the models with external data sets, CV, and randomization of data.

The approach advocated in this study for biologically informed partitioning of structure–activity relationship data differs from conventional cheminformatics clustering approaches. Traditional methods partition compounds into multiple subgroups based on their chemical structure properties only (i.e., chemical descriptors). The underlying reasoning for chemically based clustering is that similar structures are expected to have similar biological properties and mechanisms of activity. However, it is a well-known limitation of structure–activity relationships that the absence or presence of a functional group or other minor change of the chemical structure may result in a large change of biological activity (Maggiora 2006). In our studies, the conventional chemical structure–based clustering method did not yield any statistically meaningful models, either global or local. The distribution of pairwise chemical similarities for all compounds within the modeling sets (class 1 vs. class 2) of rat LD50 values using DRAGON descriptors is very similar (data not shown). This observation reconfirms that chemical clustering would not have partitioned compounds in a way similar to the biological data-based partitioning.

Conclusions

Although the cytotoxicity data generally show weak correlation with rodent acute toxicity, we have demonstrated that these data can be used to inform and improve QSAR modeling of in vivo acute toxicity. We have developed a novel two-step kNN QSAR modeling approach that affords a successful prediction of acute toxicity (LD50) values from chemical structure for both rats and mice. Furthermore, we predicted LD50 values for external compounds with accuracy, exceeding that of previously published QSAR models developed with the commercial (TOPKAT) software. It should be stressed that although in vitro cytotoxicity data have been used to establish the rules for partitioning most compounds into two classes, the ultimate models, both classification and continuous, employ chemical descriptors only. This vital feature of our approach makes it possible to achieve accurate predictions of rodent acute toxicity directly from chemical structure alone, even bypassing the need for in vitro studies of new compounds. We believe that this biological-data–based partitioning approach using in vitro toxicity data for the modeling set only, coupled with subsequent chemical-structure–based classification and continuous QSAR modeling techniques, holds promise for modeling other complex in vivo toxicity end points. This approach charts a future course for combining in vitro screening methods and QSAR modeling to prioritize chemicals for in vivo animal toxicity testing.

Footnotes

Supplemental Material is available online (doi:10.1289/ehp.0800471.S1 via http://dx.doi.org/)

We thank T. Martin [U.S. Environmental Protection Agency (EPA)], and J. Strickland and M. Jackson (ILS, Inc., Durham, NC) for providing some of the data used in this study. We also thank W. Setzer (U.S. EPA) for his interest in this study and valuable comments on the moving M-regression method.

The research described in this article has not been subjected to each funding agency’s peer review and policy review and therefore does not necessarily reflect their views, and no official endorsement should be inferred. The manuscript has been reviewed by the U.S. EPA National Center for Computational Toxicology and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

This work was supported, in part, by grants from the National Institutes of Health (GM076059 and ES005948) and the U.S. EPA (RD83272001 and RD83382501).

References

  1. Accelrys. [[accessed 30 June 2009]];Predictive Toxicology - DS TOPKAT. 2009 Available: http://accelrys.com/products/discovery-studio/toxicology/
  2. Alaejos MS, Pino V, Afonso AM. Metabolism and toxicology of heterocyclic aromatic amines when consumed in diet: influence of the genetic susceptibility to develop human cancer. A review. Food Res Int. 2008;41:327–340. [Google Scholar]
  3. Andersen M. Modern Methods for Robust Regression. Thousand Oaks, CA: Corwin Press; 2007. [Google Scholar]
  4. Benigni R, Netzeva TI, Benfenati E, Bossa C, Franke R, Helma C, et al. The expanding role of predictive toxicology: an update on the (Q)SAR models for mutagens and carcinogens. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev. 2007;25:53–97. doi: 10.1080/10590500701201828. [DOI] [PubMed] [Google Scholar]
  5. Carr A, Tindall B, Penny R, Cooper DA. In vitro cytotoxicity as a marker of hypersensitivity to sulphamethoxazole in patients with HIV. Clin Exp Immunol. 1993;94:21–25. doi: 10.1111/j.1365-2249.1993.tb05971.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Collins FS, Gray GM, Bucher JR. Toxicology. Transforming environmental health protection. Science. 2008;319:906–907. doi: 10.1126/science.1154619. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Dearden JC. In silico prediction of drug toxicity. J Comput Aided Mol Des. 2003;17:119–127. doi: 10.1023/a:1025361621494. [DOI] [PubMed] [Google Scholar]
  8. Dix DJ, Houck KA, Martin MT, Richard AM, Setzer RW, Kavlock RJ. The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol Sci. 2007;95:5–12. doi: 10.1093/toxsci/kfl103. [DOI] [PubMed] [Google Scholar]
  9. Domagala JM. Structure-activity and structure-side-effect relationships for the quinolone antibacterials. J Antimicrob Chemother. 1994;33:685–706. doi: 10.1093/jac/33.4.685. [DOI] [PubMed] [Google Scholar]
  10. Enslein K. An overview of structure-activity relationships as an alternative to testing in animals for carcinogenicity, mutagenicity, dermal and eye irritation, and acute oral toxicity. Toxicol Ind Health. 1988;4:479–498. doi: 10.1177/074823378800400407. [DOI] [PubMed] [Google Scholar]
  11. Freidig AP, Dekkers S, Verwei M, Zvinavashe E, Bessems JG, van de Sandt JJ. Development of a QSAR for worst case estimates of acute toxicity of chemically reactive compounds. Toxicol Lett. 2007;170:214–222. doi: 10.1016/j.toxlet.2007.03.008. [DOI] [PubMed] [Google Scholar]
  12. Golbraikh A, Shen M, Xiao Z, Xiao YD, Lee KH, Tropsha A. Rational selection of training and test sets for the development of validated QSAR models. J Comput Aided Mol Des. 2003;17:241–253. doi: 10.1023/a:1025386326946. [DOI] [PubMed] [Google Scholar]
  13. ICCVAM and NICEATM. Report of the International Workshop on In Vitro Methods for Assessing Acute Systemic Toxicity Interagency Coordinating Committee on the Validation of Alternative Methods and National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods Report 01-4499. Bethesda, MD: National Institutes of Health; 2001. [Google Scholar]
  14. Johnson DE, Smith DA, Park BK. Linking toxicity and chemistry: think globally, but act locally? Curr Opin Drug Discov Devel. 2004;7:33–35. [PubMed] [Google Scholar]
  15. Klopman G, Saiakhov R, Rosenkranz HS. Multiple computer-automated structure evaluation study of aquatic toxicity II. Fathead minnow. Environ Toxicol Chem. 2000;19:441–447. [PubMed] [Google Scholar]
  16. Klopman G, Saiakhov R, Rosenkranz HS, Hermens JLM. Multiple Computer-Automated Structure Evaluation program study of aquatic toxicity 1: guppy. Environ Toxicol Chem. 1999;18:2497–2505. [PubMed] [Google Scholar]
  17. Maggiora GM. On outliers and activity cliffs—why QSAR often disappoints. J Chem Inf Model. 2006;46:1535. doi: 10.1021/ci700332k. [Online 28 December 2007] [DOI] [PubMed] [Google Scholar]
  18. Matthews EJ, Kruhlak NL, Cimino MC, Benz RD, Contrera JF. An analysis of genetic toxicity, reproductive and developmental toxicity, and carcinogenicity data: II. Identification of genotoxicants, reprotoxicants, and carcinogens using in silico methods. Regul Toxicol Pharmacol. 2006;44:97–110. doi: 10.1016/j.yrtph.2005.10.004. [DOI] [PubMed] [Google Scholar]
  19. Mayer P, Reichenberg F. Can highly hydrophobic organic substances cause aquatic baseline toxicity and can they contribute to mixture toxicity? Environ Toxicol Chem. 2006;25(10):2639–2644. doi: 10.1897/06-142r.1. [DOI] [PubMed] [Google Scholar]
  20. Medina-Franco JL, Golbraikh A, Oloff S, Castillo R, Tropsha A. Quantitative structure-activity relationship analysis of pyridinone HIV-1 reverse transcriptase inhibitors using the k nearest neighbor method and QSAR-based database mining. J Comput Aided Mol Des. 2005;19:229–242. doi: 10.1007/s10822-005-4789-8. [DOI] [PubMed] [Google Scholar]
  21. Moore DR, Breton RL, MacDonald DB. A comparison of model performance for six quantitative structure-activity relationship packages that predict acute toxicity to fish. Environ Toxicol Chem. 2003;22:1799–1809. doi: 10.1897/00-361. [DOI] [PubMed] [Google Scholar]
  22. National Research Council. Toxicity Testing in the 21st Century: A Vision and a Strategy. Washington, DC: National Academies Press; 2007. [Google Scholar]
  23. Ng C, Xiao Y, Putnam W, Lum B, Tropsha A. Quantitative structure-pharmacokinetic parameters relationships (QSPKR) analysis of antimicrobial agents in humans using simulated annealing k-nearest-neighbor and partial least-square analysis methods. J Pharm Sci. 2004;93:2535–2544. doi: 10.1002/jps.20117. [DOI] [PubMed] [Google Scholar]
  24. Norager O, Town WG, Petrie JH. Analysis of the registry of toxic effects of chemical substances (RTECS) files and conversion of the data in these files for input to the environmental chemicals data and information network (ECDIN) J Chem Inf Comput Sci. 1978;18:134–140. doi: 10.1021/ci60015a005. [DOI] [PubMed] [Google Scholar]
  25. Oloff S, Mailman RB, Tropsha A. Application of validated QSAR models of D1 dopaminergic antagonists for database mining. J Med Chem. 2005;48:7322–7332. doi: 10.1021/jm049116m. [DOI] [PubMed] [Google Scholar]
  26. Rabinowitz JR, Goldsmith MR, Little SB, Pasquinelli MA. Computational molecular modeling for evaluating the toxicity of environmental chemicals: prioritizing bioassay requirements. Environ Health Perspect. 2008;116:573–577. doi: 10.1289/ehp.11077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Ruden C, Hansson SO. How accurate are the European Union’s classifications of chemical substances. Toxicol Lett. 2003;144:159–172. doi: 10.1016/s0378-4274(03)00204-2. [DOI] [PubMed] [Google Scholar]
  28. Shen M, LeTiran A, Xiao Y, Golbraikh A, Kohn H, Tropsha A. Quantitative structure-activity relationship analysis of functionalized amino acid anticonvulsant agents using k nearest neighbor and simulated annealing PLS methods. J Med Chem. 2002;45:2811–2823. doi: 10.1021/jm010488u. [DOI] [PubMed] [Google Scholar]
  29. Stouch TR, Kenyon JR, Johnson SR, Chen XQ, Doweyko A, Li Y. In silico ADME/Tox: why models fail. J Comput Aided Mol Des. 2003;17:83–92. doi: 10.1023/a:1025358319677. [DOI] [PubMed] [Google Scholar]
  30. Toth B. Toxicities of hydrazines: a review. In Vivo. 1988;2:209–242. [PubMed] [Google Scholar]
  31. Tropsha A, Golbraikh A. Predictive QSAR modeling work-flow, model applicability domains, and virtual screening. Curr Pharm Des. 2007;13:3494–3504. doi: 10.2174/138161207782794257. [DOI] [PubMed] [Google Scholar]
  32. Venkatapathy R, Moudgal CJ, Bruce RM. Assessment of the oral rat chronic lowest observed adverse effect level model in TOPKAT, a QSAR software package for toxicity prediction. J Chem Inf Comput Sci. 2004;44:1623–1629. doi: 10.1021/ci049903s. [DOI] [PubMed] [Google Scholar]
  33. Verhaar HJ, Solbe J, Speksnijder J, van Leeuwen CJ, Hermens JL. Classifying environmental pollutants: part 3. External validation of the classification system. Chemosphere. 2000;40:875–883. doi: 10.1016/s0045-6535(99)00317-3. [DOI] [PubMed] [Google Scholar]
  34. Xia M, Huang R, Witt KL, Southall N, Fostel J, Cho MH, et al. Compound cytotoxicity profiling using quantitative high-throughput screening. Environ Health Perspect. 2008;116:284–291. doi: 10.1289/ehp.10727. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Zheng W, Tropsha A. Novel variable selection quantitative structure–property relationship approach based on the k-nearest-neighbor principle. J Chem Inf Comput Sci. 2000;40:185–194. doi: 10.1021/ci980033m. [DOI] [PubMed] [Google Scholar]
  36. Zhu H, Rusyn I, Richard A, Tropsha A. Use of cell viability assay data improves the prediction accuracy of conventional quantitative structure–activity relationship models of animal carcinogenicity. Environ Health Perspect. 2008;116:506–513. doi: 10.1289/ehp.10573. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Environmental Health Perspectives are provided here courtesy of National Institute of Environmental Health Sciences

RESOURCES